
Handbook of Combinatorial Optimization

VOLUME 1
Handbook of
Combinatorial
Optimization
Volume 1

Edited by

Ding-Zhu Du
University of Minnesota,
Minneapolis, U.S.A.
and
Panos M. Pardalos
University of Florida,
Gainesville, U.S.A.

KLUWER ACADEMIC PUBLISHERS


BOSTON / DORDRECHT / LONDON
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-13: 978-1-4613-7987-4    e-ISBN-13: 978-1-4613-0303-9
DOI: 10.1007/978-1-4613-0303-9

Published by Kluwer Academic Publishers,


P.O. Box, 3300 AA Dordrecht, The Netherlands.

Sold and distributed in North, Central and South America


by Kluwer Academic Publishers,
101 Philip Drive, Norwell, MA 02061, U.S.A.

In all other countries, sold and distributed


by Kluwer Academic Publishers,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved


© 1998 Kluwer Academic Publishers
Softcover reprint of the hardcover 1st edition 1998

No part of the material protected by this copyright notice may be reproduced or


utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Contents

Preface .............................................................. vii

Mixed-Integer Nonlinear Optimization in Process Synthesis .... 1


C. S. Adjiman, C. A. Schweiger, and C. A. Floudas

Approximate Algorithms and Heuristics for MAX-SAT ......... 77


R. Battiti and M. Protasi

Connections between Nonlinear Programming


and Discrete Optimization ....................................... 149
F. Giannessi and F. Tardella

Interior Point Methods for Combinatorial Optimization ...... 189


J.E. Mitchell, P.M. Pardalos, and M. G. C. Resende

Knapsack Problems .............................................. 299


D. Pisinger and P. Toth

Fractional Combinatorial Optimization ......................... 429


T. Radzik

Reformulation-Linearization Techniques
for Discrete Optimization Problems ............................ 479
H.D. Sherali and W.P. Adams

Gröbner Bases in Integer Programming ........................ 533


R.R. Thomas

Applications of Set Covering, Set Packing


and Set Partitioning Models: A Survey ........................ 573
R.R. Vemuganti

Author Index ..................................................... 747

Subject Index ..................................................... 773


Preface

Combinatorial (or discrete) optimization is one of the most active fields


in the interface of operations research, computer science, and applied math-
ematics. Combinatorial optimization problems arise in various applications,
including communications network design, VLSI design, machine vision, air-
line crew scheduling, corporate planning, computer-aided design and man-
ufacturing, database query design, cellular telephone frequency assignment,
constraint directed reasoning, and computational biology. Furthermore,
combinatorial optimization problems occur in many diverse areas such as
linear and integer programming, graph theory, artificial intelligence, and
number theory. All these problems, when formulated mathematically as the
minimization or maximization of a certain function defined on some domain,
have a commonality of discreteness.
Historically, combinatorial optimization starts with linear programming.
Linear programming has an entire range of important applications including
production planning and distribution, personnel assignment, finance, alloca-
tion of economic resources, circuit simulation, and control systems. Leonid
Kantorovich and Tjalling Koopmans received the Nobel Prize (1975) for
their work on the optimal allocation of resources. Two important discover-
ies, the ellipsoid method (1979) and interior point approaches (1984), both
provide polynomial-time algorithms for linear programming. These algo-
rithms have had a profound effect on combinatorial optimization. Many
polynomial-time solvable combinatorial optimization problems are special
cases of linear programming (e.g. matching and maximum flow). In addi-
tion, linear programming relaxations are often the basis for many approxi-
mation algorithms for solving NP-hard problems (e.g. dual heuristics).
Two other developments with a great effect on combinatorial optimiza-
tion are the design of efficient integer programming software and the avail-
ability of parallel computers. In the last decade, the use of integer program-
ming models has changed and increased dramatically. Two decades ago,
only problems with up to 100 integer variables could be solved on a com-
puter. Today we can solve problems to optimality with thousands of integer
variables. Furthermore, we can compute provably good approximate solu-
tions to problems with millions of integer variables. These advances have
been made possible by developments in hardware, software and algorithm
design.


The Handbooks of Combinatorial Optimization deal with several algo-


rithmic approaches for discrete problems as well as with many combinato-
rial problems. We have tried to bring together almost every aspect of this
enormous field with emphasis on recent developments. Each chapter in the
Handbooks is essentially expository in nature, but scholarly in its treatment.
The Handbooks of Combinatorial Optimization are addressed not only to
researchers in discrete optimization, but to all scientists in various disciplines
who use combinatorial optimization methods to model and solve problems.
We are certain that experts in the field as well as nonspecialist readers will
find the material of the Handbooks stimulating and helpful.
We would like to take this opportunity to thank the authors, the anony-
mous referees, and the publisher for helping us produce these volumes of
the Handbooks of Combinatorial Optimization with state-of-the-art chap-
ters. We would also like to thank Ms. Xiuzhen Cheng for preparing the
Author Index and Subject Index for this volume.

Ding-Zhu Du and Panos M. Pardalos


Handbook of Combinatorial Optimization

VOLUME 2
Handbook of
Combinatorial
Optimization
Volume 2

Edited by

Ding-Zhu Du
University of Minnesota,
Minneapolis, U.S.A.

and
Panos M. Pardalos
University of Florida,
Gainesville, U.S.A.

KLUWER ACADEMIC PUBLISHERS


BOSTON / DORDRECHT / LONDON
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-13: 978-1-4613-7987-4    e-ISBN-13: 978-1-4613-0303-9


DOI: 10.1007/978-1-4613-0303-9

Published by Kluwer Academic Publishers,


P.O. Box, 3300 AA Dordrecht, The Netherlands.

Sold and distributed in North, Central and South America


by Kluwer Academic Publishers,
101 Philip Drive, Norwell, MA 02061, U.S.A.

In all other countries, sold and distributed


by Kluwer Academic Publishers,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved


© 1998 Kluwer Academic Publishers
Softcover reprint of the hardcover 1st edition 1998

No part of the material protected by this copyright notice may be reproduced or


utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Contents

Preface .............................................................. vii

Efficient Algorithms for Geometric


Shortest Path Query Problems .................................... 1
Danny Z. Chen

Computing Distances between Evolutionary Trees ............. 35


Bhaskar DasGupta, Xin He, Tao Jiang, Ming Li,
John Tromp, Lusheng Wang, Louxin Zhang

Combinatorial Optimization and Coalition Games .............. 77


Xiaotie Deng

Steiner Minimal Trees: An Introduction,


Parallel Computation, and Future Work ....................... 105
Frederick C. Harris, Jr.

Resource Allocation Problems ................................... 159


Naoki Katoh and Toshihide Ibaraki

Combinatorial Optimization in Clustering ...................... 261


Boris Mirkin and Ilya Muchnik

The Graph Coloring Problem: A Bibliographic Survey ....... 331


Panos M. Pardalos, Thelma Mavridou, and Jue Xue

Steiner Minimal Trees in E3:


Theory, Algorithms, and Applications .......................... 397
J. MacGregor Smith

Dynamical System Approaches to


Combinatorial Optimization ..................................... 471
Jens Starke and Michael Schanz

On-line Dominating Set Problems for Graphs ................. 525


Wen-Guey Tzeng

Optimization Problems in Optical Networks ................... 543


Peng-Jun Wan

Shortest Networks on Surfaces .................................. 589


Jia Feng Weng

Minimum Weight Triangulations ............................... 617


Yin-Feng Xu

Optimization Applications in the Airline Industry ............ 635


Gang Yu and Jian Yang

Author Index ..................................................... 727


Subject Index ..................................................... 749
Preface

Combinatorial (or discrete) optimization is one of the most active fields


in the interface of operations research, computer science, and applied math-
ematics. Combinatorial optimization problems arise in various applications,
including communications network design, VLSI design, machine vision, air-
line crew scheduling, corporate planning, computer-aided design and man-
ufacturing, database query design, cellular telephone frequency assignment,
constraint directed reasoning, and computational biology. Furthermore,
combinatorial optimization problems occur in many diverse areas such as
linear and integer programming, graph theory, artificial intelligence, and
number theory. All these problems, when formulated mathematically as the
minimization or maximization of a certain function defined on some domain,
have a commonality of discreteness.
Historically, combinatorial optimization starts with linear programming.
Linear programming has an entire range of important applications including
production planning and distribution, personnel assignment, finance, alloca-
tion of economic resources, circuit simulation, and control systems. Leonid
Kantorovich and Tjalling Koopmans received the Nobel Prize (1975) for
their work on the optimal allocation of resources. Two important discover-
ies, the ellipsoid method (1979) and interior point approaches (1984), both
provide polynomial-time algorithms for linear programming. These algo-
rithms have had a profound effect on combinatorial optimization. Many
polynomial-time solvable combinatorial optimization problems are special
cases of linear programming (e.g. matching and maximum flow). In addi-
tion, linear programming relaxations are often the basis for many approxi-
mation algorithms for solving NP-hard problems (e.g. dual heuristics).
Two other developments with a great effect on combinatorial optimiza-
tion are the design of efficient integer programming software and the avail-
ability of parallel computers. In the last decade, the use of integer program-
ming models has changed and increased dramatically. Two decades ago,
only problems with up to 100 integer variables could be solved on a com-
puter. Today we can solve problems to optimality with thousands of integer
variables. Furthermore, we can compute provably good approximate solu-
tions to problems with millions of integer variables. These advances have
been made possible by developments in hardware, software and algorithm
design.


The Handbooks of Combinatorial Optimization deal with several algo-


rithmic approaches for discrete problems as well as with many combinato-
rial problems. We have tried to bring together almost every aspect of this
enormous field with emphasis on recent developments. Each chapter in the
Handbooks is essentially expository in nature, but scholarly in its treatment.
The Handbooks of Combinatorial Optimization are addressed not only to
researchers in discrete optimization, but to all scientists in various disciplines
who use combinatorial optimization methods to model and solve problems.
We are certain that experts in the field as well as nonspecialist readers will
find the material of the Handbooks stimulating and helpful.
We would like to take this opportunity to thank the authors, the anony-
mous referees, and the publisher for helping us produce these volumes of
the Handbooks of Combinatorial Optimization with state-of-the-art chap-
ters. We would also like to thank Mr. Arnold Mayaka for preparing the
Author Index and Subject Index for this volume.

Ding-Zhu Du and Panos M. Pardalos


Handbook of Combinatorial Optimization

VOLUME 3
Handbook of
Combinatorial
Optimization
Volume 3

Edited by

Ding-Zhu Du
University of Minnesota,
Minneapolis, U.S.A.

and
Panos M. Pardalos
University of Florida,
Gainesville, U.S.A.

KLUWER ACADEMIC PUBLISHERS


BOSTON / DORDRECHT / LONDON
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-13: 978-1-4613-7987-4    e-ISBN-13: 978-1-4613-0303-9


DOI: 10.1007/978-1-4613-0303-9

Published by Kluwer Academic Publishers,


P.O. Box, 3300 AA Dordrecht, The Netherlands.

Sold and distributed in North, Central and South America


by Kluwer Academic Publishers,
101 Philip Drive, Norwell, MA 02061, U.S.A.

In all other countries, sold and distributed


by Kluwer Academic Publishers,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved


© 1998 Kluwer Academic Publishers
Softcover reprint of the hardcover 1st edition 1998

No part of the material protected by this copyright notice may be reproduced or


utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Contents

Preface .............................................................. vii

Semidefinite Relaxations, Multivariate Normal


Distributions, and Order Statistics ................................ 1
Dimitris Bertsimas and Yinyu Ye

A Review of Machine Scheduling:


Complexity, Algorithms and Approximability ................... 21
Bo Chen, Chris N. Potts, and Gerhard J. Woeginger

Routing and Topology Embedding in Lightwave Networks ... 171


Feng Cao

The Quadratic Assignment Problem ............................ 241


Rainer E. Burkard, Eranda Çela, Panos M. Pardalos,
and Leonidas S. Pitsoulis

Algorithmic Aspects of Domination in Graphs ................ 339


Gerard J. Chang

Selected Algorithmic Techniques for Parallel Optimization ... 407


Ricardo C. Correa, Afonso Ferreira, and Stella C. S. Porto

Multispace Search for Combinatorial Optimization .......... 457


Jun Gu

The Equitable Coloring of Graphs .............................. 543


Ko-Wei Lih

Randomized Parallel Algorithms for


Combinatorial Optimization ..................................... 567
Sanguthevar Rajasekaran and Jose D. P. Rolim

Tabu Search ...................................................... 621


Fred Glover and Manuel Laguna

Author Index ..................................................... 759

Subject Index ..................................................... 779

Author Index of Volumes 1-3 .................................... 783

Subject Index of Volumes 1-3 ................................... 845


Preface

Combinatorial (or discrete) optimization is one of the most active fields


in the interface of operations research, computer science, and applied math-
ematics. Combinatorial optimization problems arise in various applications,
including communications network design, VLSI design, machine vision, air-
line crew scheduling, corporate planning, computer-aided design and man-
ufacturing, database query design, cellular telephone frequency assignment,
constraint directed reasoning, and computational biology. Furthermore,
combinatorial optimization problems occur in many diverse areas such as
linear and integer programming, graph theory, artificial intelligence, and
number theory. All these problems, when formulated mathematically as the
minimization or maximization of a certain function defined on some domain,
have a commonality of discreteness.
Historically, combinatorial optimization starts with linear programming.
Linear programming has an entire range of important applications including
production planning and distribution, personnel assignment, finance, alloca-
tion of economic resources, circuit simulation, and control systems. Leonid
Kantorovich and Tjalling Koopmans received the Nobel Prize (1975) for
their work on the optimal allocation of resources. Two important discover-
ies, the ellipsoid method (1979) and interior point approaches (1984), both
provide polynomial-time algorithms for linear programming. These algo-
rithms have had a profound effect on combinatorial optimization. Many
polynomial-time solvable combinatorial optimization problems are special
cases of linear programming (e.g. matching and maximum flow). In addi-
tion, linear programming relaxations are often the basis for many approxi-
mation algorithms for solving NP-hard problems (e.g. dual heuristics).
Two other developments with a great effect on combinatorial optimiza-
tion are the design of efficient integer programming software and the avail-
ability of parallel computers. In the last decade, the use of integer program-
ming models has changed and increased dramatically. Two decades ago,
only problems with up to 100 integer variables could be solved on a com-
puter. Today we can solve problems to optimality with thousands of integer
variables. Furthermore, we can compute provably good approximate solu-
tions to problems with millions of integer variables. These advances have
been made possible by developments in hardware, software and algorithm
design.


The Handbooks of Combinatorial Optimization deal with several algo-


rithmic approaches for discrete problems as well as with many combinato-
rial problems. We have tried to bring together almost every aspect of this
enormous field with emphasis on recent developments. Each chapter in the
Handbooks is essentially expository in nature, but scholarly in its treatment.
The Handbooks of Combinatorial Optimization are addressed not only to
researchers in discrete optimization, but to all scientists in various disciplines
who use combinatorial optimization methods to model and solve problems.
We are certain that experts in the field as well as nonspecialist readers will
find the material of the Handbooks stimulating and helpful.
We would like to take this opportunity to thank the authors, the anony-
mous referees, and the publisher for helping us produce these volumes of
the Handbooks of Combinatorial Optimization with state-of-the-art chap-
ters. We would also like to thank Mr. Arnold Mayaka for preparing the
Author Index and Subject Index for this volume, and Ms. Xiuzhen Cheng
for preparing the Author Index of Volumes 1-3 and Subject Index of Volumes 1-3.

Ding-Zhu Du and Panos M. Pardalos



HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 1-76
©1998 Kluwer Academic Publishers

Mixed-Integer Nonlinear Optimization in Process Synthesis

C. S. Adjiman, C. A. Schweiger, and C. A. Floudas†
Department of Chemical Engineering
Princeton University, Princeton, NJ 08544-5263
E-mail: claire@titan.princeton.edu, carl@titan.princeton.edu,
floudas@titan.princeton.edu

† Author to whom all correspondence should be addressed.

Contents
1 Introduction 3

2 Optimization Approach in Process Synthesis 5

3 Algorithms for Convex MINLPs 8


3.1 Generalized Benders Decomposition 9
3.1.1 Primal Problem. 10
3.1.2 Master Problem 11
3.1.3 GBD Algorithm 13
3.2 Outer Approximation . 14
3.2.1 Outer Approximation Primal Problem. 15
3.2.2 Outer Approximation Master Problem . 15
3.2.3 OA Algorithm . . . . . . . . . . . . 16
3.3 Outer Approximation/Equality Relaxation. . . 18
3.3.1 OA/ER Algorithm . . . . . . . . . . . 18
3.4 Outer Approximation/Equality Relaxation/Augmented Penalty . 20
3.4.1 OA/ER/AP Algorithm . 21
3.5 Generalized Outer Approximation 23
3.5.1 Primal Problem. . 23
3.5.2 Master Problem . . . . . . 23
3.5.3 GOA Algorithm . . . . . . 25
3.6 Generalized Cross Decomposition . 26

3.6.1 GCD Algorithm . . . . . . . . . . 27


3.7 Branch-and-Bound Algorithms . . . . .. 30
3.7.1 Selection of the Branching Node . 32
3.7.2 Selection of the Branching Variable. 32
3.7.3 Generation of a Lower Bound. 33
3.7.4 Algorithmic Procedure .. 33
3.8 Extended Cutting Plane (ECP) . 34
3.8.1 Algorithmic Procedure .. 36
3.9 Feasibility Approach . . . . . . . 36
3.10 Logic Based Approach . . . . . . 37
3.10.1 Logic-Based Outer Approximation . 38
3.10.2 Logic-Based Generalized Benders Decomposition 40

4 Global Optimization for Nonconvex MINLPs 40


4.1 Branch-and-reduce algorithm . . . . . . . . . . 41
4.2 Interval Analysis Based Algorithm . . . . . . . 42
4.2.1 Node Fathoming Tests for Interval Algorithm 42
4.2.2 Branching Step . . . . . . . . . . . . . . . . . 43
4.3 Extended Cutting Plane for Nonconvex MINLPs .. 44
4.4 Reformulation/Spatial Branch-and-Bound Algorithm. 45
4.5 The SMIN-αBB Algorithm . . . . . . . . . . . . . . 46
4.5.1 Convex Underestimating MINLP Generation 47
4.5.2 Branching Variable Selection 47
4.5.3 Variable Bound Updates . . 48
4.5.4 Algorithmic Procedure . 49
4.6 The GMIN-αBB Algorithm 50

5 Implementation: MINOPT 51
6 Computational Studies 56
6.1 Distillation Sequencing. . . . . . . . . . . . . . . . . . . . . . . . .. 56
6.2 Heat Exchanger Network Synthesis. . . . . . . . . . . . . . . . . .. 60
6.2.1 Solution Strategy with the SMIN-αBB algorithm . . . . . .. 66
6.2.2 Specialized Algorithm for Heat Exchanger Network Problems 70

7 Conclusions 70

8 Acknowledgments 71

References

Abstract
The use of networks allows the representation of a variety of im-
portant engineering problems. The treatment of a particular class
of network applications, the process synthesis problem, is presented in
this paper. Process synthesis seeks to develop systematically process
flowsheets that convert raw materials into desired products. In re-
cent years, the optimization approach to process synthesis has shown
promise in tackling this challenge. It requires the development of a
network of interconnected units, the process superstructure, that rep-
resents the alternative process flowsheets. The mathematical modeling
of the superstructure has a mixed set of binary and continuous vari-
ables and results in a mixed-integer optimization model. Due to the
nonlinearity of chemical models, these problems are generally classified
as Mixed-Integer Nonlinear Programming (MINLP) problems.
A number of local optimization algorithms, developed for the solu-
tion of this class of problems, are presented in this paper: Generalized
Benders Decomposition (GBD), Outer Approximation (OA), General-
ized Cross Decomposition (GCD), Branch and Bound (BB), Extended
Cutting Plane (ECP), and Feasibility Approach (FA). Some recent de-
velopments for the global optimization of nonconvex MINLPs are then
introduced. In particular, two branch-and-bound approaches are dis-
cussed: the Special structure Mixed Integer Nonlinear αBB (SMIN-
αBB), where the binary variables should participate linearly or in
mixed-bilinear terms, and the General structure Mixed Integer Nonlin-
ear αBB (GMIN-αBB), where the continuous relaxation of the binary
variables must lead to a twice-differentiable problem. Both algorithms
are based on the αBB global optimization algorithm for nonconvex
continuous problems.
Once the theoretical issues behind local and global optimization
algorithms for MINLPs have been presented, attention is directed to
their algorithmic development and implementation. The framework
MINOPT is discussed as a computational tool for the solution of
process synthesis problems. It is an implementation of a number of
local optimization algorithms for the solution of MINLPs. The use of
MINOPT is illustrated through the solution of a variety of process
network problems. The synthesis problem for a heat exchanger network
is then presented to demonstrate the global optimization SMIN-αBB
algorithm.

1 Introduction
Network applications exist in many fields including engineering, applied
mathematics, and operations research. These applications include problems

such as facility location and allocation problems, design and scheduling of


batch processes, facility planning and scheduling, topology of transporta-
tion networks, and process synthesis problems. These types of problems are
typically characterized by both discrete and continuous decisions. Thus, the
modeling aspects of these applications often lead to models involving both
integer and continuous variables as well as nonlinear functions. This gives
rise to problems classified as mixed-integer nonlinear optimization problems.
Major advances have been made in the development of mathematical
programming approaches which address mixed-integer nonlinear optimiza-
tion problems. The recent theoretical and algorithmic advances in mixed-
integer nonlinear optimization have made the use of these techniques both
feasible and practical. Because of this, optimization has become a standard
computational approach for the solution of these networking problems.
Some of the major contributions to the development of mixed-integer
nonlinear optimization techniques have come from the field of process syn-
thesis. This is due to the natural formulation of the process synthesis prob-
lem as a mixed-integer nonlinear optimization problem. This has led to
significant algorithmic developments and extensive computational experi-
ence in process synthesis applications. The research in this area has focused
on the overall process synthesis problem as well as subsystem synthesis prob-
lems including heat exchanger network synthesis (HENS), reactor network
synthesis, distillation sequencing, and mass exchange network synthesis, as
well as total process flowsheets.
The process synthesis problem is stated as follows: given the specifi-
cations of the inputs (feed streams) and the specifications of the outputs,
develop a process flowsheet which transforms the given inputs to the desired
products while addressing the performance criteria of capital and operating
costs, product quality, environmental issues, safety, and operability. Three
key issues must be addressed in order to determine the process flowsheet:
which process units should be in the flowsheet, how the process units should
be interconnected, and what the operating conditions and sizes of the pro-
cess units should be. The optimization approach to process synthesis has
been developed to address these issues and has led to some of the major the-
oretical and algorithmic advances in mixed-integer nonlinear optimization.
The next section describes the optimization approach to process synthe-
sis which leads to the formulation of a Mixed-Integer Nonlinear Program.
In Section 3, the optimization algorithms developed for the solution of the
posed optimization problem are presented. Although these methods have
been developed for process synthesis, they are applicable to models that

result in other network applications. Section 4 reports some recent develop-


ments for the global optimization of nonconvex MINLPs. Section 5 describes
the algorithmic framework, MINOPT, which implements a number of the
described algorithms. The final part of the paper describes the application
of both global and local methods to a heat exchanger network synthesis
problem.

2 Optimization Approach in Process Synthesis


A major advance in process synthesis has been the development of the op-
timization approach to the process synthesis problem. This approach leads
to a mathematical programming problem classified as a Mixed Integer Non-
linear Program. Significant progress has been made in the development of
algorithms capable of addressing this class of problems.
The optimization approach to process synthesis involves three steps: the
representation of alternatives through a process superstructure, the mathe-
matical modeling of the superstructure, and the development of an algorithm
for the solution of the mathematical model. Each of these steps is crucial
to the determination of the optimal process flowsheet.
The superstructure is a superset of all process design alternatives of in-
terest. The representation of process alternatives is conceptually based on
elementary graph theory ideas. Nodes are used to represent the inputs, out-
puts, and each unit in the superstructure. One-way arcs represent connec-
tions from inputs to process units, two-way arcs represent interconnections
between process units, and one-way arcs represent connections to the out-
puts. The result is a bipartite planar graph which represents the network of
process units in the superstructure. This network represents all the options
of the superstructure and includes cases where nodes in the graph may or
may not be present. The idea of the process superstructure can be illus-
trated by a process which has one input, two outputs, and potentially three
process units. The network representation of this is shown in Figure 1.
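To make the encoding concrete, such a network can be captured with ordinary
adjacency lists. The sketch below is plain Python with hypothetical node
names, since the exact arcs of Figure 1 are not reproduced here; two-way
arcs between units appear as mutual entries.

    # Hypothetical one-input, three-unit, two-output superstructure,
    # encoded as a directed graph via adjacency lists.
    superstructure = {
        'input': ['unit1', 'unit2', 'unit3'],   # one-way arcs from the input
        'unit1': ['unit2', 'unit3', 'output1'],
        'unit2': ['unit1', 'unit3', 'output2'],
        'unit3': ['unit1', 'unit2', 'output1', 'output2'],
    }

    # A unit may or may not be present in a candidate flowsheet; dropping
    # a node (here unit3) yields one embedded alternative.
    absent = {'unit3'}
    candidate = {n: [a for a in arcs if a not in absent]
                 for n, arcs in superstructure.items() if n not in absent}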
Since all the possible candidates for the optimal process flowsheet are
embedded within this superstructure, the optimal process flowsheet that
can be determined is only as good as the postulated representation of
alternatives. This superstructure must be rich enough to allow for a com-
plete set of alternatives, but it must also be concise enough to eliminate
undesirable structures.
Figure 1: Network representation of superstructure

Another example of a superstructure is illustrated by the two-component
distillation scheme presented by [KG89]. This process consists of two
feed streams of known composition and flowrate and two product streams
with specified purities. The superstructure consists of a flash unit and a
distillation unit and is shown in Figure 2.
Through process synthesis, the flowsheet structure and the optimal
values of the operating parameters are determined. The existence of process
units leads to discrete decisions while the determination of operating pa-
rameters leads to continuous decisions. Thus, the process synthesis problem
is mathematically classified as mixed discrete-continuous optimization.
The next step involves the mathematical modeling of the superstruc-
ture. Binary variables are used to indicate the existence of nodes within the
network and continuous variables represent the levels of values along the
arcs. The resulting formulation is a Mixed Integer Nonlinear Programming
Problem (MINLP):

    min_{x,y}   f(x, y)
    s.t.        h(x, y) = 0
                g(x, y) ≤ 0                                 (1)
                x ∈ X ⊆ R^n
                y ∈ Y integer

where
Figure 2: A Two-Column Distillation Sequence Superstructure
(feeds F1 (55% A, 45% B) and F2 (50% A, 50% B); products P1 (>80% A)
and P2 (>75% B); mixers, splitter, flash and distillation units
interconnected by streams F3-F7)

• x is a vector of n continuous variables representing flow rates, compo-


sitions, temperatures, and pressures of process streams and sizing of
process units.

• y is a vector of integer variables representing process alternatives.

• f(x, y) is the single objective function representing the performance


criterion.

• h(x, y) = 0 are the m equality constraints that represent the mass


and energy balances, and equilibrium expressions.

• g(x, y) ≤ 0 are the p inequality constraints that represent design


specifications, restrictions, and logical constraints.
This formulation is completely general and includes cases where nonlineari-
ties occur in the x space, y space, and joint x - y space.
The integer variables can be expressed as binary variables without loss of
generality. Through an appropriate transformation, the general formulation

can be written as

    min_{x,y}   f(x, y)
    s.t.        h(x, y) = 0
                g(x, y) ≤ 0                                 (2)
                x ∈ X ⊆ R^n
                y ∈ {0, 1}^q

where the y are the q binary variables which represent the existence of
process units.
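To make formulation (2) concrete, the following is a minimal sketch of a
small instance in Pyomo; the modeling library, the solver name, and the
objective and constraints are all assumptions for illustration and do not
describe a real process model.

    import pyomo.environ as pyo

    m = pyo.ConcreteModel()
    m.x = pyo.Var(range(2), bounds=(0, 10))      # continuous variables x
    m.y = pyo.Var(range(2), domain=pyo.Binary)   # binary variables y

    # f(x, y): a nonlinear objective
    m.obj = pyo.Objective(
        expr=(m.x[0] - 3) ** 2 + m.x[1] ** 2 + 2 * m.y[0] + m.y[1])

    # h(x, y) = 0: an equality constraint (e.g. a mass balance)
    m.h = pyo.Constraint(expr=m.x[0] + m.x[1] - 4 * m.y[0] == 0)

    # g(x, y) <= 0: an inequality constraint (e.g. a design specification)
    m.g = pyo.Constraint(expr=m.x[0] ** 2 - 8 * m.y[1] <= 0)

    # Solving requires an MINLP solver; the name below is an assumption:
    # pyo.SolverFactory('bonmin').solve(m)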
The final step of the optimization approach is the development and ap-
plication of algorithms for the solution of the mathematical model. This
step is highly dependent on the properties of the mathematical model and
makes use of the structure of the formulation. This step focuses on the
development of algorithms capable of addressing the MINLPs.
The solution of MINLPs is particularly challenging due to the combina-
torial nature of the problem (y domain) combined with the nonlinearities
in the continuous domain (x domain). The combinatorial nature of the
problem becomes an issue as the number of y variables increases, creating a
large number of possible process structures. In the continuous domain, the
models of chemical processes are generally nonlinear. The nonlinearities in
the problem imply the possible existence of multiple solutions and lead to
challenges in finding the global solution.
Despite the challenges involved in the solution of the MINLPs, there
have been significant advances in the area of MINLPs on the theoretical, al-
gorithmic and computational fronts. Many algorithms have been developed
to address problems with the above form and a review of these developments
is presented in the next section.

3 Algorithms for Convex MINLPs


A number of algorithms have been developed to address problems of
the form (2). Some deal with the formulation as stated, while others deal
with a restricted class of the problem. The following is a chronological listing
of these algorithms.

1. Generalized Benders Decomposition, GBD [Geo72, PF89, FAC89]

2. Branch and Bound, BB [Bea77, Gup80, OOM90, BM91]



3. Outer Approximation, OA [DG86]

4. Feasibility Approach, FA [MM86]

5. Outer Approximation with Equality Relaxation, OA/ER [KG87]

6. Outer Approximation with Equality Relaxation and Augmented Pen-


alty, OA/ER/AP [VG90]

7. Generalized Outer Approximation, GOA [FL94]

8. Generalized Cross Decomposition, GCD [Hol90]

An overview of these MINLP algorithms, with extensive theoretical, algo-
rithmic, and applications-oriented descriptions of the GBD, OA, OA/ER,
OA/ER/AP, GOA, and GCD algorithms, can be found in [Flo95].
Some of these algorithms are applicable only to restricted classes of the
general problem formulation. The general strategy of algorithms used to
solve MINLPs is to formulate subproblems such that the subproblems are
easier to solve than the original problem. This may involve fixing certain
variable types, relaxing certain constraints, using duality, or using lineariza-
tion. The algorithms iterate through solutions of the subproblems which
provide upper and lower bounds on the optimal solution of the original prob-
lem. The nature of the subproblems and the quality of bounds provided by
the subproblems are different for the various algorithms.

3.1 Generalized Benders Decomposition


The work of [Geo72] generalized the work of [Ben62] which exploits the
structure of mathematical programming problems. The algorithm addresses
problems of the form (2). In fact, the algorithm is applicable
to a broader class of problems for which the y variables may be continuous.
The focus here is on MINLP models and thus the y variables will be treated
as binary.
The basic idea behind GBD is the generation of upper and lower bounds
on the solution of the MINLP model through the iterative solution of sub-
problems formulated from the original problem. The upper bound is the result
of the solution of the primal problem while the lower bound is the result
of the solution of the master problem. The primal problem corresponds to
the solution of the original problem (2) with the values of the y variables
fixed. This problem is solved in the x space only and its solution provides

information about the Lagrange multipliers for the constraints. The master
problem is formulated by making use of the Lagrange multipliers and non-
linear duality theory. Its solution provides a lower bound as well as a new
set of y variables. The algorithm iterates between the primal and master
problems generating a sequence of upper and lower bounds which converge
in a finite number of iterations.

3.1.1 Primal Problem

The primal problem results from fixing the values of the y variables. For
values of y fixed to y^k, where k is an iteration counter, the primal problem
has the following formulation:

    min_x   f(x, y^k)
    s.t.    h(x, y^k) = 0                                   (3)
            g(x, y^k) ≤ 0
            x ∈ X ⊆ R^n

The primal formulation is an NLP which can be solved by using existing


algorithms. If the primal problem is feasible, then the optimal solution
provides values for x^k, f(x^k, y^k), and the Lagrange multipliers λ^k and μ^k
for the equality and inequality constraints.
If the primal problem is found to be infeasible when applying a solution
algorithm, a feasibility problem is formulated. This problem can be formu-
lated by minimizing the ℓ1 or ℓ∞ measure of the constraint violations. One possible
formulation of the feasibility problem is the following:

    min_{x,α}   Σ_i α_i + Σ_e (α_e^+ + α_e^-)
    s.t.        g_i(x, y^k) - α_i ≤ 0
                h_e(x, y^k) + α_e^+ - α_e^- = 0             (4)
                x ∈ X ⊆ R^n
                α_i, α_e^+, α_e^- ≥ 0

Another possible form for the infeasible primal problem is the following,
where the equality constraints are not relaxed:

    min_{x,α}   α
    s.t.        g(x, y^k) ≤ α
                h(x, y^k) = 0                               (5)
                x ∈ X ⊆ R^n
                α ≥ 0

The solution of the feasibility problem provides values for x̄^k and the
Lagrange multipliers λ̄^k and μ̄^k for the equality and inequality constraints.
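Continuing the hypothetical Pyomo instance sketched earlier, the ℓ1
feasibility problem (4) for a fixed y^k relaxes each constraint with
nonnegative slacks and minimizes their sum; the values of y^k below are
arbitrary.

    import pyomo.environ as pyo

    y_k = (1, 0)                                  # hypothetical fixed binaries

    m = pyo.ConcreteModel()
    m.x = pyo.Var(range(2), bounds=(0, 10))
    m.a_g = pyo.Var(domain=pyo.NonNegativeReals)  # alpha_i for the inequality
    m.a_p = pyo.Var(domain=pyo.NonNegativeReals)  # alpha_e^+ for the equality
    m.a_m = pyo.Var(domain=pyo.NonNegativeReals)  # alpha_e^- for the equality

    # g(x, y^k) - alpha_i <= 0
    m.g = pyo.Constraint(expr=m.x[0] ** 2 - 8 * y_k[1] - m.a_g <= 0)
    # h(x, y^k) + alpha_e^+ - alpha_e^- = 0
    m.h = pyo.Constraint(
        expr=m.x[0] + m.x[1] - 4 * y_k[0] + m.a_p - m.a_m == 0)

    # Minimize the total constraint violation; a value of zero at the
    # optimum would mean the original primal was in fact feasible.
    m.obj = pyo.Objective(expr=m.a_g + m.a_p + m.a_m)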

3.1.2 Master Problem


The formulation of the master problem for GBD makes use of nonlinear
duality theory. The key aspects of the master problem formulation are the
projection of the problem onto the y space and the dual representation.
For the projection of the problem onto the y space, problem (2) can be
written as

    min_y  inf_x   f(x, y)
    s.t.           h(x, y) = 0
                   g(x, y) ≤ 0                              (6)
                   x ∈ X ⊆ R^n
                   y ∈ Y = {0, 1}^q

Let v(y) and V be defined as follows:

    v(y) = inf_x   f(x, y)
           s.t.    h(x, y) = 0
                   g(x, y) ≤ 0                              (7)
                   x ∈ X ⊆ R^n

    V = {y : h(x, y) = 0, g(x, y) ≤ 0 for some x ∈ X ⊆ R^n}    (8)


The projected problem can now be written as:

    min_y   v(y)
    s.t.    y ∈ Y ∩ V                                       (9)
12 C. S. Adjiman, C. A. Schweiger, and C. A. Floudas

The difficulty in solving this problem is that V and v(y) are known only
implicitly. In order to overcome this, dual representations of V and v(y)
are used.
The dual representation of V is described in terms of a collection of
regions that contain it. An element of Y also belongs to the set V if and
only if it satisfies the system:

    0 ≥ inf_x L̄(x, y, λ̄, μ̄),   ∀ (λ̄, μ̄) ∈ Λ
    where Λ = {λ̄ ∈ R^m, μ̄ ∈ R^p : μ̄ ≥ 0, Σ_{i=1}^p μ̄_i = 1}    (10)

This system corresponds to the set of constraints that have to be incorpo-
rated for the case of infeasible primal problems.
The dual representation of v(y) is the pointwise infimum of a collection
of functions that support it:

    v(y) = [ min_x  f(x, y)
             s.t.   h(x, y) = 0
                    g(x, y) ≤ 0                             (11)
                    x ∈ X ⊆ R^n ]
         = [ sup_{λ, μ ≥ 0}  min_{x ∈ X}  L(x, y, λ, μ) ]   ∀ y ∈ Y ∩ V

Now, the representation (10) for V and the representation (11) for v(y)
are substituted into problem (9) and the scalar μ_B is introduced to obtain
the following master problem:

    min_{y ∈ Y, μ_B}   μ_B
    s.t.   μ_B ≥ min_{x ∈ X} L(x, y, λ, μ)      ∀ λ, ∀ μ ≥ 0     (12)
           0   ≥ min_{x ∈ X} L̄(x, y, λ̄, μ̄)    ∀ (λ̄, μ̄) ∈ Λ

where
    L(x, y, λ, μ)   = f(x, y) + λᵀ h(x, y) + μᵀ g(x, y)          (13)
    L̄(x, y, λ̄, μ̄) = λ̄ᵀ h(x, y) + μ̄ᵀ g(x, y)

The key issue in the development of an algorithmic implementation of


GBD is the solution of the master problem. The master problem consists

of an outer optimization with respect to y whose constraints are two op-


timization problems with respect to x corresponding to the feasible and
infeasible primal problems. These inner optimization problems need to be
considered for all possible values of the Lagrange multipliers which implies
that an infinite number of constraints need to be considered for the master
problem.
One way to solve the master problem is to use relaxation of the problem
where only a few of the constraints are considered. The inner optimiza-
tion problems are considered only for fixed values of the multipliers which
correspond to the multipliers from the solution of the primal problem. Fur-
thermore, the inner optimization problems can be eliminated by evaluating
the Lagrange function for fixed values of the x variables corresponding to
the solution of the primal problem. This elimination assumes that the La-
grange function evaluated at the solution to the corresponding primal is a
valid underestimator of the inner optimization problem. This is true when
the projected problem v(y) is convex in y.
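In code, the resulting GBD cut is cheap to form: with x fixed at a primal
solution x^l, the Lagrange function becomes a function of y alone. A minimal
sketch follows (plain Python with numpy; f, h, g, and the stored multipliers
are hypothetical stand-ins for quantities returned by the primal solve).

    import numpy as np

    def gbd_cut(f, h, g, x_l, lam_l, mu_l):
        """Return the cut y -> L(x^l, y, lam^l, mu^l); this underestimates
        v(y) when the projected problem v(y) is convex in y."""
        def cut(y):
            return f(x_l, y) + lam_l @ h(x_l, y) + mu_l @ g(x_l, y)
        return cut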

3.1.3 GBD Algorithm


Step 1
Obtain initial values: y^1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Initialize the feasibility set: F = ∅
Initialize the infeasibility set: F̄ = ∅
Set the convergence tolerance: ε ≥ 0
Step 2
Solve the primal problem for the fixed values of y = y^k:
    min_x   f(x, y^k)
    s.t.    g(x, y^k) ≤ 0
            h(x, y^k) = 0
Obtain the optimal solution: optimal x^k and optimal Lagrange
multipliers λ^k and μ^k
If the primal is feasible
    Update the feasibility set: F = F ∪ {k}
    If the optimal solution of the primal is less than UBD
        Update the upper bound (UBD)
Else

    Solve the infeasible primal problem for the fixed values of y = y^k:
        min_{x,α}   Σ_i α_i + Σ_e (α_e^+ + α_e^-)
        s.t.        g_i(x, y^k) - α_i ≤ 0
                    h_e(x, y^k) + α_e^+ - α_e^- = 0
                    α_i, α_e^+, α_e^- ≥ 0
    Obtain the optimal x̄^k and the Lagrange multipliers λ̄^k and μ̄^k
    Update the infeasibility set: F̄ = F̄ ∪ {k}
Step 3
Solve the relaxed master problem:
    min_{y, μ_B}   μ_B
    s.t.   μ_B ≥ f(x^l, y) + (λ^l)ᵀ h(x^l, y) + (μ^l)ᵀ g(x^l, y),   l ∈ F
           0   ≥ (λ̄^l)ᵀ h(x̄^l, y) + (μ̄^l)ᵀ g(x̄^l, y),             l ∈ F̄
Obtain optimal y^{k+1} and μ_B
Set the lower bound: LBD = μ_B
If UBD - LBD ≤ ε
    Terminate
Else
    Update the counter: k = k + 1
    Go to Step 2

This algorithm can be applied to general MINLP models; however, it is
only guaranteed to converge to the global solution for problems which meet
specific conditions: X must be a nonempty convex set, the functions
f and g must be convex for each fixed y ∈ Y, and the function h must be
linear in x for each y ∈ Y.
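The steps above can be collected into a short driver loop. The sketch below
is plain Python; solve_primal, solve_feasibility, and solve_master are
hypothetical wrappers around NLP and MILP solvers and are not part of any
particular library.

    def gbd(y_init, solve_primal, solve_feasibility, solve_master,
            eps=1e-6, max_iter=100):
        """Generalized Benders Decomposition driver (a sketch)."""
        LBD, UBD = float('-inf'), float('inf')
        feas_cuts, infeas_cuts = [], []        # (x, lambda, mu) from Step 2
        y, best = y_init, None
        for k in range(max_iter):
            primal = solve_primal(y)           # NLP (3) with y fixed
            if primal.feasible:
                feas_cuts.append((primal.x, primal.lam, primal.mu))
                if primal.obj < UBD:
                    UBD, best = primal.obj, (primal.x, y)
            else:
                fp = solve_feasibility(y)      # problem (4) or (5)
                infeas_cuts.append((fp.x, fp.lam, fp.mu))
            y, LBD = solve_master(feas_cuts, infeas_cuts)   # Step 3
            if UBD - LBD <= eps:               # bounds have converged
                break
        return best, UBD, LBD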

3.2 Outer Approximation


The basic idea behind the Outer Approximation method [DG86] is similar
to those for GBD. At each iteration, upper and lower bounds on the solution
to the MINLP are generated. The upper bound results from the solution of
a primal problem which is formulated identically to the primal problem for
GBD. The lower bound is determined by solving a master problem which
is an outer linearization of the problem around the primal solution.
The outer approximation methods deal with a subclass of MINLP prob-
lems in which the functions f(x, y) and g(x, y) are linear in the y variables
and separable in x and y. The y variables are also strictly binary vari-
ables. The formulation also does not allow for equality constraints. Thus,

any equality constraints must be eliminated either algebraically or numer-


ically in order to apply the OA algorithm. This class of MINLPs has the
following formulation:
    min_{x,y}   f(x) + cᵀy
    s.t.        g(x) + By ≤ 0                               (14)
                x ∈ X ⊆ R^n
                y ∈ {0, 1}^q
The OA algorithm is similar to the GBD algorithm in that it iterates
between upper and lower bounding primal and master subproblems. The
difference is in the formulation of the master problem. The master problem
for this method is formed by a projection onto the y space and an outer
approximation of the objective function and feasible region.

3.2.1 Outer Approximation Primal Problem


As in GBD, the primal problem results from fixing the values of the y
variables to y^k:

    min_x   f(x) + cᵀy^k
    s.t.    g(x) + By^k ≤ 0                                 (15)
            x ∈ X ⊆ R^n
If this problem is feasible, its solution provides an upper bound on the solu-
tion of the MINLP model. If the primal is infeasible, a feasibility problem
similar to those used in GBD is formulated and solved.

3.2.2 Outer Approximation Master Problem


The master problem is formulated by projecting the problem onto the y
space and using an outer approximation of the objective function and feasible
region. The projected problem can be written as

    min_y   v(y)
    s.t.    y ∈ Y ∩ V                                       (16)

where

    v(y) = cᵀy + inf_x   f(x)
                 s.t.    g(x) + By ≤ 0                      (17)
                         x ∈ X ⊆ R^n

and

    V = {y : g(x) + By ≤ 0, for some x ∈ X ⊆ R^n}           (18)

The outer approximation of v(y) is performed by linearizing f(x) and
g(x) around the solution of the primal problem x^k. Provided that the func-
tions f(x) and g(x) are convex, the linearizations represent valid support
functions. Thus, replacing v(y) with its outer approximation and replacing
y ∈ Y ∩ V with y ∈ V, along with an integer cut constraint, the following
master problem results:

    min_{x,y,μ_OA}   cᵀy + μ_OA
    s.t.   μ_OA ≥ f(x^k) + ∇_x f(x^k)(x - x^k)           }
           0    ≥ g(x^k) + ∇_x g(x^k)(x - x^k) + By      }   ∀ k ∈ F
           x ∈ X ⊆ R^n                                      (19)
           y ∈ Y = {0, 1}^q
           Σ_{i ∈ B^k} y_i - Σ_{i ∈ NB^k} y_i ≤ |B^k| - 1,   k ∈ F

where F is the set of all feasible solutions x^k to the primal problem.
Since the y variables participate linearly and are binary variables, this
formulation is an MILP which can be solved by standard branch-and-bound
algorithms. This formulation of the master problem requires that all of
the feasible solutions to the primal problem be known, which implies an
exhaustive enumeration of the binary variables. To overcome
this inefficiency, a relaxation is proposed where only linearizations around
the currently known feasible points are included in the master problem.
Additionally, to ensure that integer combinations which produce infeasible
primal problems are also infeasible in the master problem, linearizations
about the solution to the feasibility problem are also included in the master
problem.
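For convex f and g each linearization is a supporting hyperplane, so the
data of a cut can be assembled from a single gradient evaluation. A small
sketch (numpy; f and grad_f are hypothetical callables supplied by the user):

    import numpy as np

    def oa_objective_cut(f, grad_f, x_k):
        """Return (a, b) so that the OA cut reads mu_OA >= a @ x + b."""
        a = grad_f(x_k)               # gradient at the linearization point
        b = f(x_k) - a @ x_k          # constant term of the supporting plane
        return a, b

    # Example: f(x) = x1^2 + x2^2 linearized at (1, 2)
    a, b = oa_objective_cut(lambda x: x @ x, lambda x: 2 * x,
                            np.array([1.0, 2.0]))
    # cut: mu_OA >= 2*x1 + 4*x2 - 5, valid everywhere because f is convex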

3.2.3 OA Algorithm
Step 1
Obtain starting point: y^1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Set the convergence tolerance: ε ≥ 0

Step 2
Solve the primal problem for the fixed values of y = y^k:
    min_x   f(x) + cᵀy^k
    s.t.    g(x) ≤ -By^k
Obtain the optimal x^k
If the primal is feasible
    If the optimal solution of the primal is less than UBD
        Update the upper bound (UBD)
Else
    Solve the infeasible primal problem for the fixed values of y^k:
        min_{x,α}   Σ_i α_i
        s.t.        g(x) - α ≤ -By^k
                    α ≥ 0
    Obtain the optimal x̄^k
Step 3
Solve the relaxed master problem:
    min_{x,y,μ_OA}   cᵀy + μ_OA
    s.t.   μ_OA ≥ f(x^l) + ∇_x f(x^l)(x - x^l)           }
           0    ≥ g(x^l) + ∇_x g(x^l)(x - x^l) + By      }   ∀ l ∈ F
           0    ≥ g(x̄^l) + ∇_x g(x̄^l)(x - x̄^l) + By          ∀ l ∈ F̄
           x ∈ X ⊆ R^n
           y ∈ Y = {0, 1}^q
           Σ_{i ∈ B^l} y_i - Σ_{i ∈ NB^l} y_i ≤ |B^l| - 1,   ∀ l ∈ F
Obtain the solution and y^{k+1}
If the solution to the master is greater than the current lower bound
    Update the lower bound
If UBD - LBD ≤ ε
    Terminate
Else
    Update the counter: k = k + 1
    Go to Step 2

The stated algorithm may be applied to general problems whose formu-


lation is of the form (14). However, it is not guaranteed to converge to the
global solution unless some additional conditions are met. The functions f
and g must be convex in x. If this is not the case, the linearizations em-

ployed in the master problem may not be valid and may possibly eliminate
part of the feasible region.

3.3 Outer Approximation/Equality Relaxation


The OA/ER algorithm is a generalization of the Outer Approximation
algorithm [KG87] to handle nonlinear equality constraints. The class of
problems this algorithm can address is the following:

    min_{x,y}   f(x) + cᵀy
    s.t.        g(x) + By ≤ 0
                h(x) + Cy = 0                               (20)
                x ∈ X ⊆ R^n
                y ∈ {0, 1}^q
The basic idea behind this algorithm is to relax the equality constraints
into inequalities and apply the OA algorithm. A square diagonal matrix T,
whose diagonal elements are minus one, zero and one, is used for relaxing
the equality constraints.

    T^k (h(x) + Cy) ≤ 0


The matrix T has the same number of rows as the number of equality
constraints and the values of the diagonal elements depend on the signs of
the corresponding multipliers obtained from the primal problem. The values
of the elements are one for the multipliers which are positive, minus one for
the negative multipliers, and zero for the zero valued multipliers.

    T^k = diag(t_ii),   where   t_ii = sign(μ_i^k)


With the equalities now relaxed to inequalities, the principles of the OA
can be applied to the problem.
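In code, forming T^k from the equality-constraint multipliers of the primal
is a one-liner; a sketch with numpy (the multiplier values are hypothetical):

    import numpy as np

    mu_k = np.array([0.7, -1.2, 0.0])    # hypothetical equality multipliers
    T_k = np.diag(np.sign(mu_k))         # diagonal entries in {-1, 0, +1}

    # The relaxed equalities then read: T_k @ (h(x) + C @ y) <= 0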

3.3.1 OA/ER Algorithm


Step 1
Obtain starting point: y^1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Set the convergence tolerance: ε ≥ 0

Step 2
Solve the primal problem for the fixed values of y = y^k:
    min_x   f(x) + cᵀy^k
    s.t.    g(x) ≤ -By^k
            h(x) = -Cy^k
Obtain the optimal x^k and the Lagrange multipliers μ^k
If the primal is feasible
    If the optimal solution of the primal is less than UBD
        Update the upper bound (UBD)
Else
    Solve the infeasible primal problem for the fixed values of y = y^k:
        min_{x,α}   Σ_i α_i + Σ_e (α_e^+ + α_e^-)
        s.t.        g(x) - α ≤ -By^k
                    h(x) + α^+ - α^- = -Cy^k
                    α, α^+, α^- ≥ 0
    Obtain the optimal x̄^k and Lagrange multipliers μ̄^k
Step 3
Determine the matrix T:
    T^k = diag(t_ii)   where   t_ii = sign(μ_i^k)
Solve the relaxed master problem:
    min_{x,y,μ}   μ + cᵀy
    s.t.   μ ≥ f(x^l) + ∇_x f(x^l)(x - x^l)               }
           0 ≥ g(x^l) + ∇_x g(x^l)(x - x^l) + By          }   ∀ l ∈ F
           0 ≥ T^l (h(x^l) + ∇_x h(x^l)(x - x^l) + Cy)    }
           0 ≥ g(x̄^l) + ∇_x g(x̄^l)(x - x̄^l) + By          }   ∀ l ∈ F̄
           0 ≥ T^l (h(x̄^l) + ∇_x h(x̄^l)(x - x̄^l) + Cy)    }
           x ∈ X ⊆ R^n
           y ∈ Y = {0, 1}^q
           Σ_{i ∈ B^l} y_i - Σ_{i ∈ N^l} y_i ≤ |B^l| - 1   ∀ l ∈ F
Obtain the solution and y^{k+1}
If the solution to the master is greater than the current lower bound
    Update the lower bound
If UBD - LBD ≤ ε
    Terminate
Else
    Update the counter: k = k + 1
    Go to Step 2

This algorithm is not guaranteed to determine the global solution unless


certain convexity conditions are met.

3.4 Outer Approximation/Equality Relaxation/Augmented


Penalty

The OA/ER/AP algorithm [VG90] is a modification of the OA/ER al-


gorithm. The objective of this algorithm is to avoid convexity assumptions
necessary for finding the global solution using the OA/ER algorithm. This
algorithm addresses the same class of problems as OA/ER:

    min_{x,y}   f(x) + cᵀy
    s.t.        g(x) + By ≤ 0
                h(x) + Cy = 0                               (21)
                x ∈ X ⊆ R^n
                y ∈ Y = {0, 1}^q

This algorithm uses a relaxation of the linearizations of the master prob-


lem in order to expand the feasible region. Through this expansion of the
feasible region, the probability of cutting part of the feasible region due to
one of the linearizations is reduced. Note that this does not guarantee the
possible elimination of part of the feasible region and thus the determination
of the global solution cannot be guaranteed.
The linearizations in the master problem are relaxed by including slack
variables in the constraints. The violations of the linearizations are penalized
by including weighted sums of the slack variables in the objective function.
The difference between the OA/ER and OA/ER/AP algorithms is in
the master problem formulations. The OA/ER/AP master problem has

the following formulation:

    min_{x,y,s,p,q}   cᵀy + μ + Σ_l u_l s_l + Σ_l Σ_i v_{i,l} p_{i,l} + Σ_l Σ_i w_{i,l} q_{i,l}
    s.t.   μ + s_l ≥ f(x^l) + ∇_x f(x^l)(x - x^l)             }
           p_l ≥ g(x^l) + ∇_x g(x^l)(x - x^l) + By            }   ∀ l ∈ F
           q_l ≥ T^l (h(x^l) + ∇_x h(x^l)(x - x^l) + Cy)      }
           p_l ≥ g(x̄^l) + ∇_x g(x̄^l)(x - x̄^l) + By            }   ∀ l ∈ F̄
           q_l ≥ T^l (h(x̄^l) + ∇_x h(x̄^l)(x - x̄^l) + Cy)      }
           x ∈ X ⊆ R^n                                          (22)
           y ∈ Y = {0, 1}^q
           s_l, p_l, q_l ≥ 0
           Σ_{i ∈ B^l} y_i - Σ_{i ∈ N^l} y_i ≤ |B^l| - 1   ∀ l ∈ F

where s_k is a slack scalar for iteration k, and p_k and q_k are slack vectors at
iteration k for the inequality and relaxed equality constraints respectively.
The weights for the penalty terms, u_l, v_{i,k}, and w_{i,k}, are determined from
the multipliers of the corresponding constraints from the solution of the
primal problem. The correspondence between the constraints in the primal
problem and the multipliers, μ_B^k, μ^k, and λ^k, is as follows:

    min_{x,μ_B}   cᵀy^k + μ_B
    s.t.   f(x) - μ_B ≤ 0       ← μ_B^k
           g(x) + By^k ≤ 0      ← μ^k
           h(x) + Cy^k = 0      ← λ^k

The weights for the slack variables are assigned as follows ([VG90]):

    u_k = 1000 μ_B^k
    v_{i,k} = 1000 μ_i^k
    w_{j,k} = 1000 λ_j^k

3.4.1 OA/ER/ AP Algorithm


Step 1
Obtain starting point: y^1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Set the convergence tolerance: ε ≥ 0


Step 2
Solve the primal problem for the fixed values of y^k:
    min_x   f(x) + cᵀy^k
    s.t.    g(x) ≤ -By^k
            h(x) = -Cy^k
Obtain the optimal x^k and the Lagrange multipliers μ^k
If the primal is feasible
    If the optimal solution of the primal is less than UBD
        Update the upper bound (UBD)
Else
    Solve the infeasible primal problem for the fixed values of y^k:
        min_{x,α}   Σ_i α_i + Σ_e (α_e^+ + α_e^-)
        s.t.        g(x) - α ≤ -By^k
                    h(x) + α^+ - α^- = -Cy^k
                    α, α^+, α^- ≥ 0
    Obtain the optimal x̄^k and Lagrange multipliers μ̄^k
Step 3
Determine the matrix T as follows:
    T^k = diag(t_ii)   where   t_ii = sign(μ_i^k)
Set the values for the penalty parameters u, v, and w
Solve the relaxed master problem:
    min_{x,y,s,p,q}   cᵀy + μ + Σ_l u_l s_l + Σ_l Σ_i v_{i,l} p_{i,l} + Σ_l Σ_i w_{i,l} q_{i,l}
    s.t.   μ + s_l ≥ f(x^l) + ∇_x f(x^l)(x - x^l)             }
           p_l ≥ g(x^l) + ∇_x g(x^l)(x - x^l) + By            }   ∀ l ∈ F
           q_l ≥ T^l (h(x^l) + ∇_x h(x^l)(x - x^l) + Cy)      }
           p_l ≥ g(x̄^l) + ∇_x g(x̄^l)(x - x̄^l) + By            }   ∀ l ∈ F̄
           q_l ≥ T^l (h(x̄^l) + ∇_x h(x̄^l)(x - x̄^l) + Cy)      }
           x ∈ X ⊆ R^n
           y ∈ Y = {0, 1}^q
           s_l, p_l, q_l ≥ 0
           Σ_{i ∈ B^l} y_i - Σ_{i ∈ N^l} y_i ≤ |B^l| - 1   ∀ l ∈ F
Obtain optimal y^{k+1}
If the solution to the master is greater than the current lower bound
    Update the lower bound
If UBD - LBD ≤ ε
    Terminate
Else
    Update the counter: k = k + 1
    Go to Step 2

3.5 Generalized Outer Approximation


The Generalized Outer Approximation, GOA, algorithm [FL94] generalizes
the OA approach to handle the following class of MINLP problems:
    min_{x,y}   f(x, y)
    s.t.        g(x, y) ≤ 0                                 (23)
                x ∈ X ⊆ R^n
                y ∈ {0, 1}^q
The difference between this formulation and that for the OA algorithm is
that there is no restriction on the separability in the x and y variables and
the linearity of the y variables.
The differences between the GOA algorithm and the OA algorithm are
the treatment of infeasibilities, a new master problem formulation, and the
unified treatment of exact penalty functions.

3.5.1 Primal Problem


The primal problem is formulated in the same manner as in OA. However,
if the primal is infeasible, then the following feasibility problem is solved for
y = y^k:

    min_x   Σ_{i ∈ I'} w_i g_i^+(x, y^k)
    s.t.    g_i(x, y^k) ≤ 0,   i ∈ I                        (24)
            x ∈ X ⊆ R^n
where I is the set of feasible inequality constraints and I' is the set of
infeasible inequality constraints. With this formulation of the feasibility
problem, the linearizations of the nonlinear inequality constraints about the
solution of the feasibility problem are violated.

3.5.2 Master Problem


The master problem for GOA is formulated based on the OA ideas of pro-
jection onto the y-space and the outer approximation of the objective func-

tion and feasible region. The difference in the master problem formulation
is in the utilization of the infeasible primal information.
The projection onto the y-space is the same as for OA:

    min_y   v(y)
    s.t.    y ∈ Y ∩ V                                       (25)

where

    v(y) = cᵀy + inf_x   f(x)
                 s.t.    g(x) + By ≤ 0                      (26)
                         x ∈ X ⊆ R^n

and

    V = {y : g(x) + By ≤ 0, for some x ∈ X}                 (27)

The formulation of the master problem follows from the outer approx-
imation of the problem v(y) and a representation of the set V. For
the outer approximation of v(y), the linearizations of the objective function
and constraints are used. The set V is replaced by linearizations of the con-
straints at y^k for which the primal is infeasible, and the feasibility problem
has solution x̄^k. Thus, the master problem has the following formulation:

    min_{x,y,μ_GOA}   cᵀy + μ_GOA
    s.t.   μ_GOA ≥ f(x^k, y^k) + ∇f(x^k, y^k) (x - x^k; y - y^k)    ∀ k ∈ F
           0 ≥ g(x^k, y^k) + ∇g(x^k, y^k) (x - x^k; y - y^k)        ∀ k ∈ F
           0 ≥ g(x̄^k, y^k) + ∇g(x̄^k, y^k) (x - x̄^k; y - y^k)        ∀ k ∈ F̄   (28)
           x ∈ X, y ∈ Y

where F is the set of all yk such that the primal problem is feasible, and F
is the set of all yk such that the primal problem is infeasible.
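A GOA linearization differs from an OA cut only in that the gradient is
taken jointly in (x, y). A small sketch (numpy; f and grad_f are hypothetical
callables):

    import numpy as np

    def goa_cut(f, grad_f, x_k, y_k):
        """Return a callable giving the right-hand side of the GOA cut
        mu_GOA >= f(x^k, y^k) + grad f(x^k, y^k) . (x - x^k, y - y^k)."""
        z_k = np.concatenate([x_k, y_k])
        f_k = f(x_k, y_k)
        g_k = grad_f(x_k, y_k)           # gradient in the joint (x, y) space
        def rhs(x, y):
            return f_k + g_k @ (np.concatenate([x, y]) - z_k)
        return rhs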

As in OA, relaxation and an iterative procedure are used to solve the
problem, since the solution of the complete master problem (28) requires all
feasible and infeasible solutions of the primal problem. For the solution of
the relaxed master problem, the known feasible and infeasible solutions are
used for the outer approximation.

3.5.3 GOA Algorithm

Step 1
Obtain a starting point: y^1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Initialize the feasibility set: F = ∅
Initialize the infeasibility set: F̄ = ∅
Set the convergence tolerance: ε ≥ 0
Step 2
Solve the primal problem for the fixed values of y^k:

    min_x  f(x, y^k)
    s.t.  g(x, y^k) ≤ 0

If the primal is feasible
    Obtain the optimal x^k
    Update the feasibility set: F = F ∪ {k}
    If the optimal solution of the primal is less than UBD
        Update the upper bound (UBD)
Else
    Solve the feasibility problem for the fixed values of y^k:

        min_x  Σ_{i∈I'} w_i g_i^+(x, y^k)
        s.t.  g_i(x, y^k) ≤ 0,  i ∈ I
              x ∈ X ⊆ R^n

    Obtain the optimal x̄^k
    Update the infeasibility set: F̄ = F̄ ∪ {k}
Step 3
Solve the relaxed master problem:

    min_{x,y,μ_GOA}  c^T y + μ_GOA
    s.t.  μ_GOA ≥ f(x^l, y^l) + ∇f(x^l, y^l) (x - x^l; y - y^l)      ∀ l ∈ F
          0 ≥ g(x^l, y^l) + ∇g(x^l, y^l) (x - x^l; y - y^l)          ∀ l ∈ F
          0 ≥ g(x̄^l, y^l) + ∇g(x̄^l, y^l) (x - x̄^l; y - y^l)         ∀ l ∈ F̄
          x ∈ X
          y ∈ Y

Obtain the optimal solution and y^{k+1}
If the solution to the master is greater than the current lower bound
Update the lower bound.
If UBD - LBD ≤ ε
Terminate
Else
Update the counter: k = k + 1
Go to step 2

3.6 Generalized Cross Decomposition


The Generalized Cross Decomposition, GCD, algorithm [Hol90] exploits the
advantages of Dantzig-Wolfe decomposition and GBD and simultaneously
utilizes primal and dual information. This algorithm can address problems
with the general MINLP formulation (2) where the constraints are
partitioned into two sets:

    min_{x,y}  f(x, y)
    s.t.  g_1(x, y) ≤ 0
          g_2(x, y) ≤ 0
          h_1(x, y) = 0                                      (29)
          h_2(x, y) = 0
          x ∈ X ⊆ R^n
          y ∈ {0,1}^q

This algorithm consists of two phases and convergence tests. Phase I


involves the solution of the primal and dual subproblems where the primal
subproblem provides an upper bound on the solution along with Lagrange
multipliers for the dual subproblem. The dual subproblem provides a lower
bound on the solution of the problem and supplies new values of the y vari-
ables for the primal subproblem. Both the primal and dual subproblems
provide cuts for the master problem (Phase II). In Phase II either a primal
master problem or a Lagrange relaxation master problem is solved. The pri-
mal master problem is formulated using the same derivation as for the GBD
master problem while the Lagrange relaxation master problem is formulated
by using Lagrangian duality. The algorithm also uses several convergence
tests to determine whether or not solutions of various subproblems can pro-
vide bound or cut improvement.
The algorithm for GCD is based on the idea that it is desirable to
solve as few master problems as possible since these are generally more
computationally intensive. The algorithm makes extensive use of the primal
and dual subproblems to reduce the number of times the master problem
must be solved to obtain the solution.

3.6.1 GCD Algorithm


Step 1
Obtain a starting point: y^1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Initialize the feasibility set: F = ∅
Initialize the infeasibility set: F̄ = ∅
Step 2
Solve the primal problem for the fixed values of y^k:

    min_x  f(x, y^k)
    s.t.  g_1(x, y^k) ≤ 0
          g_2(x, y^k) ≤ 0
          h_1(x, y^k) = 0
          h_2(x, y^k) = 0
          x ∈ X ⊆ R^n

Obtain the optimal x^k and Lagrange multipliers λ_1^k, λ_2^k, μ_1^k, and μ_2^k
If the primal is feasible
    Update the feasibility set: F = F ∪ {k}
    If the optimal solution of the primal is less than UBD
        Update the upper bound (UBD)
    Perform the CTD test for λ_1^k and μ_1^k
Else
    Solve the infeasible primal problem for the fixed values of y^k:

        min_{x,α}  α_{i1} + α_{i2} + α_{e1}^+ + α_{e1}^- + α_{e2}^+ + α_{e2}^-
        s.t.  g_1(x, y^k) - α_{i1} ≤ 0
              g_2(x, y^k) - α_{i2} ≤ 0
              h_1(x, y^k) + α_{e1}^+ - α_{e1}^- = 0
              h_2(x, y^k) + α_{e2}^+ - α_{e2}^- = 0
              α_{i1}, α_{i2}, α_{e1}^+, α_{e1}^-, α_{e2}^+, α_{e2}^- ≥ 0

    Obtain the optimal x̄^k and the Lagrange multipliers λ̄_1^k, λ̄_2^k, μ̄_1^k and μ̄_2^k
    Update the infeasibility set: F̄ = F̄ ∪ {k}
    Perform the CTDU test for λ̄_1^k and μ̄_1^k
Step 3
If the CTD or CTDU test from Step 2 is not passed
    Solve the Relaxed Lagrange Relaxation Master problem:

        max_{λ_1,μ_1,μ_b}  μ_b
        s.t.  μ_b ≤ f(x^l, y^l) + λ_1^T h_1(x^l, y^l) + μ_1^T g_1(x^l, y^l)      l ∈ F
              0 ≤ λ_1^T h_1(x̄^l, y^l) + μ_1^T g_1(x̄^l, y^l)                      l ∈ F̄
              μ_1 ≥ 0

    Obtain the optimal λ_1^k and μ_1^k
Step 4
If k ∈ F or the CTD or CTDU test from Step 2 is not passed
    Solve the Dual Subproblem:

        min_{x,y}  f(x, y) + (λ_1^k)^T h_1(x, y) + (μ_1^k)^T g_1(x, y)
        s.t.  h_2(x, y) = 0
              g_2(x, y) ≤ 0

    Obtain the optimal y^{k+1}
    If the solution is greater than the lower bound
        Update the lower bound
Else
    Solve the Dual Feasibility Subproblem:

        min_{x,y}  (λ̄_1^k)^T h_1(x, y) + (μ̄_1^k)^T g_1(x, y)
        s.t.  h_2(x, y) = 0
              g_2(x, y) ≤ 0

    Obtain the optimal y^{k+1}
Step 5
Perform the CTP test for y^{k+1}
If the CTP test fails
    Solve the Relaxed Primal Master Problem:

        min_{y,μ_b}  μ_b
        s.t.  μ_b ≥ f(x^l, y) + (λ^l)^T h(x^l, y) + (μ^l)^T g(x^l, y)      l ∈ F
              0 ≥ (λ̄^l)^T h(x̄^l, y) + (μ̄^l)^T g(x̄^l, y)                   l ∈ F̄

    Obtain the optimal y^{k+1}
    If the solution is greater than the lower bound
        Update the lower bound
If UBD - LBD ≤ ε
Terminate
Else
Update the counter: k = k + 1
Go to step 2

The convergence tests are defined as follows:


CTP Test  At the kth iteration, if the solution y^{k+1} of the Dual Subproblem
satisfies

    UBD ≥ f(x^l, y^{k+1}) + (λ^l)^T h(x^l, y^{k+1}) + (μ^l)^T g(x^l, y^{k+1})      l ∈ F
    0 ≥ (λ̄^l)^T h(x̄^l, y^{k+1}) + (μ̄^l)^T g(x̄^l, y^{k+1})                         l ∈ F̄

then y^{k+1} will provide an upper bound or cut improvement. Otherwise,
the Relaxed Primal Master is solved to obtain a new y^{k+1}.
CTD Test  At the kth iteration, if the feasible solution of the Primal
Problem satisfies the corresponding bounding condition, then λ_1^k and μ_1^k
will provide a lower bound or cut improvement. Otherwise, the Relaxed
Lagrange Relaxation Master is solved to obtain new λ_1^k and μ_1^k.
CTDU Test  At the kth iteration, if the Primal Problem is infeasible and
the solution of the Infeasible Primal Problem satisfies the corresponding
condition, then λ̄_1^k and μ̄_1^k will provide cut improvement. Otherwise, the
Relaxed Lagrange Relaxation Master is solved to obtain λ_1^k and μ_1^k.
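To illustrate the mechanics of these tests, the sketch below evaluates the CTP inequalities over the stored feasible and infeasible cuts. The data layout (lists of dictionaries holding the stored points and multipliers, and the callables f, h, and g) is an illustrative assumption rather than the data structures of [Hol90].

    import numpy as np

    def ctp_test(y_new, UBD, f, h, g, feas_cuts, infeas_cuts):
        # y_new passes the CTP test only if every stored cut holds at y_new.
        # feas_cuts/infeas_cuts: dicts with keys 'x', 'lam', 'mu' for the stored
        # feasible points x^l and infeasibility points xbar^l.
        for c in feas_cuts:    # UBD >= f + lam^T h + mu^T g at (x^l, y_new)
            if UBD < f(c['x'], y_new) + c['lam'] @ h(c['x'], y_new) + c['mu'] @ g(c['x'], y_new):
                return False
        for c in infeas_cuts:  # 0 >= lambar^T h + mubar^T g at (xbar^l, y_new)
            if c['lam'] @ h(c['x'], y_new) + c['mu'] @ g(c['x'], y_new) > 0.0:
                return False
        return True

    # Toy check with a single stored feasible cut.
    f = lambda x, y: x[0] + y[0]
    h = lambda x, y: np.array([x[0] - y[0]])
    g = lambda x, y: np.array([x[0] - 1.0])
    print(ctp_test(y_new=[1.0], UBD=3.0, f=f, h=h, g=g,
                   feas_cuts=[{'x': [0.5], 'lam': np.array([0.2]), 'mu': np.array([0.1])}],
                   infeas_cuts=[]))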
The situations where the GCD algorithm reduces to Dantzig-Wolfe
decomposition and to GBD can be observed. First, when the CTD and CTDU
tests fail at all iterations and only the relaxed primal master problem is
used, GCD reduces to GBD. On the other hand, if the CTP test fails at all
iterations and only the relaxed Lagrange relaxation master problem is used,
then GCD reduces to Dantzig-Wolfe decomposition.
The GCD algorithm is well illustrated by the use of a flow diagram such
as that in Figure 3.

3.7 Branch-and-Bound Algorithms


A number of branch-and-bound algorithms have been proposed to identify
the global optimum solution of problems which are convex in the x-space and
in the relaxed y-space [Bea77, GR85, OOM90, BM91, QG92]. These algorithms
can also be used for nonconvex problems of the form (2) but their conver-
gence to the global optimum solution can only be guaranteed for convex
problems. A basic principle common to all these algorithms is the gener-
ation of valid lower bounds on the original MINLP through its relaxation
to a continuous problem. In most algorithms, the continuous problem is
obtained by letting binary variables take on any value between 0 and 1. In
most algorithms, this relaxation is an NLP problem. The only exception is
the algorithm of [QG92], discussed in Section 3.7.3, in which an LP prob-
lem is obtained. If the NLP relaxation has an integer solution, this solution
provides an upper bound on the global solution. The generation of lower
and upper bounds in this manner is referred to as the bounding step of the
algorithm. At first, all the binary variables are relaxed and the continuous
problem corresponds to the first node of a branch-and-bound tree. At the
second level, two new nodes are created by forcing one of the binary vari-
ables to take on a value of 0 or 1. This is the branching step. Nodes in
the tree are pruned when their lower bound is greater than the best upper
bound on the problem, or when the relaxation is infeasible. The algorithm
terminates when the lowest lower bound is within a pre-specified tolerance
of the best upper bound.
Although the size of the branch-and-bound tree is finite, and convergence
is guaranteed, it is desirable to explore as few nodes of the tree as possible.
The selection of the branching node and variable and the solution of the
relaxed problem all affect the convergence characteristics of the algorithm.
Several strategies have been suggested in the literature.
Figure 3: Generalized Cross Decomposition Flow Diagram



3.7.1 Selection of the Branching Node


Since all nodes whose lower bound is less than the best upper bound must be
explored before convergence can be declared, most algorithms use the node
with the lowest lower bound for branching. It is often the strategy which
minimizes computational requirements [LW66].
In some cases, a depth-first approach has been adopted [GR85, QG92].
In this case, the last node created is selected for branching and the branch-
and-bound tree is explored vertically, until an integer solution is obtained
and backtracking can be used to move back towards the root of the tree,
until a node with a child still open for search is identified. This strategy can
lead to the generation of a tighter upper bound as levels in the branch-and-
bound tree where a large fraction of the binary variables are fixed to one of
their bounds are quickly reached. However, it may result in the solution of
an unnecessarily large number of relaxations.
The breadth-first approach can also be followed in exploring the solution
space. In this case, every node on a level is branched on before moving
on to the next level. This approach can be especially useful at the initial
levels of the tree, in order to identify promising branches that should then be
explored through a depth-first approach. Thus, a combination of depth-first
and breadth-first strategies is likely to result in smaller computational
expense than the adoption of either one of these techniques alone.
Finally, a node can be selected based on the "quality" of the solution of
the relaxation, as measured by the estimation of the node [GR85]. In this
case, nodes for which the values of the integer variables at the solution of
the continuous relaxation lie far away from an integer solution are penalized.
This deviation from integrality is combined with the value of the objective
function for each node, to yield a quantity referred to as estimation. The
node with the lowest estimation is then chosen as the next branching node.

3.7.2 Selection of the Branching Variable


The most commonly used criterion for the selection of a branching variable
y^B is the most fractional variable rule [GR85, OOM90]: the variable which
is farthest from its binary bounds at the solution of the node to be explored
is selected for branching.
Other approaches attempt to determine which binary variable has the
greatest effect on the lower bound of the problem. If the user has prior
knowledge of the problem, branching priorities may be set to accelerate the
increase of the best lower bound [GR85]. The quantitative analysis of the
effect of each binary variable has also been proposed through the use of
pseudo-costs [BGG+71, GR85]. The pseudo-cost of a variable y_j is defined
by calculating the relative change in the objective function when y_j is fixed
to 0 or 1. Thus, if the solution of the NLP with 0 ≤ y_j ≤ 1 is y_j = y_j^* with
f(x, y) = f^*, and the optimum objective value with y_j = 0 is f_0, the lower
pseudo-cost associated with y_j is PC_j^L = (f_0 - f^*)/y_j^*. Similarly, if f_1 is
the optimum objective value for y_j = 1, the upper pseudo-cost is defined as
PC_j^U = (f_1 - f^*)/(1 - y_j^*). To avoid excessive computational requirements,
pseudo-costs are only calculated once. At each node, the pseudo-costs of
the fractional variables are updated by computing the minimum of PC_j^L y_j^*
and PC_j^U (1 - y_j^*), where y_j^* is the value of y_j at the solution of the NLP
at the branching node. The variable with the maximum pseudo-cost is
selected for branching, as it is expected to lead to the greatest increase in
the lower bound on the problem. Both selection rules are sketched in code
below.
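The following minimal Python sketch (NumPy-based; the relaxation values and pseudo-costs are invented for illustration) implements the most fractional variable rule and the pseudo-cost rule described above.

    import numpy as np

    def most_fractional(y):
        # Most fractional variable rule: pick the y_j farthest from {0, 1}.
        y = np.asarray(y)
        return int(np.argmax(np.minimum(y, 1.0 - y)))

    def pseudo_cost_branching(y, pc_lo, pc_up):
        # Pseudo-cost rule: score each fractional y_j by
        # min(PC_j^L * y_j, PC_j^U * (1 - y_j)) and branch on the largest score.
        y, pc_lo, pc_up = map(np.asarray, (y, pc_lo, pc_up))
        score = np.minimum(pc_lo * y, pc_up * (1.0 - y))
        score[np.minimum(y, 1.0 - y) < 1e-6] = -np.inf   # skip integral variables
        return int(np.argmax(score))

    y_relax = [0.15, 0.80, 0.50]            # relaxation values at the current node
    print(most_fractional(y_relax))         # -> 2
    print(pseudo_cost_branching(y_relax, pc_lo=[2.0, 1.0, 0.3],
                                pc_up=[0.5, 4.0, 0.2]))   # -> 1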

3.7.3 Generation of a Lower Bound


At each node, a relaxation of the MINLP problem must be solved in order
to generate a lower bound. This procedure can be costly for large problems,
and [BM91] advocate the early detection of non-integer solutions in order
to reduce the time spent solving NLPs. If the binary variables appear to
be converging to values away from their bounds, the NLP solver should be
interrupted before full convergence has been achieved and the current node
should be branched on.
[QG92] suggest further relaxation of the problem to an LP in order to
obtain lower bounds. When the lower bounding LP has an integer solution,
this integer combination is used to formulate an NLP problem which yields
an upper bound on the original problem. The LP problems are generated
by linearizing the nonlinear functions at the solution of each NLP. In or-
der to prevent the LPs from becoming excessively large, a reformulation
which combines all nonlinearities into one inequality constraint is used. The
first linearization is obtained by fixing the binary variables to an arbitrary
combination and solving the resulting NLP.

3.7.4 Algorithmic Procedure


The general algorithmic statement for branch-and-bound approaches is as
follows:

Step 1
Set absolute tolerance ε; set LBD* = -∞ and UBD* = +∞.
Set node counter k = 0.
Initialize the set of nodes to be bounded, J = {0};
Set the relaxed y-space, Y_0 = [0, 1]^q.
N, the list of nodes to be explored, is empty.
Step 2 Bounding
Solve the relaxed NLP problem for each j ∈ J:

    LBD_j = min  f(x, y)
            s.t.  h(x, y) = 0
                  g(x, y) ≤ 0
                  x ∈ X ⊆ R^n
                  y ∈ Y_j

If all y variables are integer at the solution:
    If LBD_j ≤ UBD*, set UBD* = LBD_j.
    Else, fathom the current node.
If some y variables are fractional:
    If LBD_j ≤ UBD*, add the current node to N.
    Else, fathom the current node.
Step 3
Set LBD* to the lowest lower bound from the list N.
If UBD* - LBD* ≤ ε, terminate with solution UBD*.
Otherwise, proceed to Step 4.
Step 4 Branching
Select the next node to be explored, N_i (i < k), from the list N. Its
lower bound is LBD_i and the corresponding y-space is Y_i.
Select a branching variable y^B.
Create two new regions Y_{k+1} = Y_i ∩ {y | y^B = 0} and
Y_{k+2} = Y_i ∩ {y | y^B = 1}.
Set J = {k + 1, k + 2} and k = k + 2. Go back to Step 2.
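The following self-contained toy sketch in Python follows the procedure above, with two simplifications that should be kept in mind: an LP relaxation solved with scipy.optimize.linprog stands in for the NLP relaxation, and nodes are explored depth-first rather than by lowest lower bound. The knapsack data at the end are invented for illustration.

    import numpy as np
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, n_bin, tol=1e-6):
        best_ub, best_y = np.inf, None
        stack = [[(0.0, 1.0)] * n_bin]           # root node: all binaries relaxed
        while stack:
            bounds = stack.pop()                 # depth-first node selection
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
            if not res.success:                  # infeasible relaxation: fathom
                continue
            if res.fun >= best_ub - tol:         # bound no better than incumbent: fathom
                continue
            frac = np.abs(res.x - np.round(res.x))
            j = int(np.argmax(frac))             # most fractional variable rule
            if frac[j] < tol:                    # integer solution: new incumbent
                best_ub, best_y = res.fun, np.round(res.x)
                continue
            for val in (0.0, 1.0):               # branching step: fix y_j
                child = list(bounds)
                child[j] = (val, val)
                stack.append(child)
        return best_ub, best_y

    # Toy knapsack: max 5y1 + 4y2 + 3y3 s.t. 2y1 + 3y2 + y3 <= 4 (minimize the negative).
    print(branch_and_bound(c=[-5.0, -4.0, -3.0], A_ub=[[2.0, 3.0, 1.0]],
                           b_ub=[4.0], n_bin=3))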

3.8 Extended Cutting Plane (ECP)

The cutting plane algorithm proposed by [Kel60] for NLP problems has
been extended to MINLPs [WPG94, WP95]. This Extended Cutting Plane
algorithm (ECP) can address problems of the form:

    min  c_x^T x + c_y^T y
    s.t.  g(x, y) ≤ 0                                        (30)
          x ∈ X ⊆ R^n
          y ∈ Y integer

where c_x and c_y are constant vectors.


Problems with a nonlinear objective function, f(x, y), can be reformulated
by introducing a new variable z such that f(x, y) - z ≤ 0 and minimizing z.
The ECP algorithm relies on the linearization of one of the nonlinear
constraints at each iteration and on the solution of the increasingly tight MILP
made up of these linearizations. The solution of the MILP problem provides
a new point on which to base the choice of the constraint to be linearized
for the next iteration of the algorithm. Unlike the Outer Approximation,
described in Section 3.2, the ECP does not require the solution of any NLP
problems for the generation of an upper bound. As a result, a large number
of linearizations are required for the approximation of highly nonlinear
problems and the algorithm does not perform well in such cases. Due to the use
of linearizations, convergence to the global optimum solution is guaranteed
only for problems involving inequality constraints which are convex in the
x and relaxed y-space.
Given a point (x^0, y^0) in the feasible set, the function g_i(x, y) can be
underestimated by

    g_i(x^0, y^0) + (∂g_i/∂x)|_(x^0,y^0) (x - x^0) + (∂g_i/∂y)|_(x^0,y^0) (y - y^0).      (31)

The function g_{j_0}(x, y), where j_0 = argmax_i g_i(x^0, y^0), is then used to
construct an underestimating MILP problem

    min  c_x^T x + c_y^T y
    s.t.  l_0(x, y) ≤ 0                                      (32)
          x ∈ X ⊆ R^n
          y ∈ Y integer

At any iteration k, a single constraint is chosen for linearization and
the corresponding linearization l_k(x, y) is added to the MILP. All linear
constraints from the original MINLP problem are, of course, incorporated
into the MILP from the start of the algorithm.

3.8.1 Algorithmic Procedure


The algorithmic procedure is:

Step 1
Set absolute tolerance ε; set LBD* = -∞.
Set iteration counter k = 0 and select a starting point (x^0, y^0).
Step 2
Solve the kth MILP problem:

    LBD* = min  c_x^T x + c_y^T y
           s.t.  l_i(x, y) ≤ 0,  i = 0, ..., k-1
                 x ∈ X ⊆ R^n
                 y ∈ Y integer

The optimal solution of the MILP is at (x^k, y^k).
Find j_k = argmax_i g_i(x^k, y^k).
Step 3
If g_{j_k}(x^k, y^k) ≤ ε, convergence has been reached. Terminate with
solution LBD*.
Otherwise, proceed to Step 4.
Step 4
Construct l_k(x, y). Set k = k + 1. Return to Step 2.
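The core operation of each ECP iteration, building l_k from the most violated constraint, can be sketched as follows; the function names are illustrative, and the gradients are supplied analytically.

    import numpy as np

    def ecp_cut(g_funcs, g_grads, z_k):
        # Return j_k and the linearization l_k of the most violated constraint
        # at z_k, where z_k stacks (x^k, y^k); g_funcs[i](z) -> g_i(z),
        # g_grads[i](z) -> the gradient of g_i at z.
        viol = [g(z_k) for g in g_funcs]
        jk = int(np.argmax(viol))                 # j_k = argmax_i g_i(x^k, y^k)
        g0, grad0 = viol[jk], np.asarray(g_grads[jk](z_k))
        def l_k(z):                               # l_k(z) = g_jk(z_k) + grad^T (z - z_k)
            return g0 + grad0 @ (np.asarray(z) - z_k)
        return jk, l_k

    # Example: g_1 = x^2 + y - 2 <= 0 and g_2 = x - y <= 0, at z^k = (1.5, 0).
    g_funcs = [lambda z: z[0]**2 + z[1] - 2.0, lambda z: z[0] - z[1]]
    g_grads = [lambda z: [2.0*z[0], 1.0], lambda z: [1.0, -1.0]]
    jk, cut = ecp_cut(g_funcs, g_grads, np.array([1.5, 0.0]))
    print(jk, cut([1.0, 0.5]))   # the new constraint cut(z) <= 0 is added to the MILP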

3.9 Feasibility Approach


This algorithm was proposed by [MM85] as an extension to MINLP problems
of the MINOS algorithm [MS93] for large-scale nonlinear problems.
The premise of their approach is that the problems to be treated are
sufficiently large that techniques requiring the solution of several NLP
relaxations, such as the branch-and-bound approach of Section 3.7, have
prohibitively large costs. They therefore wish to account for the presence of
the integer variables in the formulation and to solve the mixed-integer
problem directly. This is achieved by fixing most of the integer variables to
one of their bounds (the nonbasic variables) and allowing the remaining
small subset (the basic variables) to take discrete values in order to identify
feasible solutions. After each iteration, the reduced costs of the variables in
the nonbasic set are computed to measure their effect on the objective
function. If a change causes the objective function to decrease, the appropriate
variables are removed from the nonbasic set and allowed to vary for the next
iteration. When no more improvement in the objective function is possible,
the algorithm is terminated. This strategy leads to the identification of a
local solution.
The basic and nonbasic sets are initialized through the solution of a
continuous relaxation of the problem. The solution obtained must then be rounded
to a feasible integer solution through a heuristic approach. The feasibility
approach has been tested on two types of large-scale problems: quadratic as-
signment and gas pipeline network design. The second problem poses more
difficulties as few variables are at either of their bounds when the continu-
ous relaxation is solved. Therefore, the problem has relatively few nonbasic
variables. This trend is preserved throughout the run, thus increasing the
computational complexity of each iteration.

3.10 Logic Based Approach


An alternative to the direct solution of the MINLP problem was proposed
by [TG96]. Their approach stems from the work of [KG89] on a mod-
eling/decomposition strategy which avoids the zero-flows generated by the
non-existence of a unit in a process network. The first stage of the algorithm
is the reformulation of the MINLP into a generalized disjunctive program of
the form:

    min  f(x) + Σ_i c_i
    s.t.  g(x) ≤ 0

          [ Y_i          ]     [ ¬Y_i      ]
          [ h_i(x) ≤ 0   ]  ∨  [ B^i x = 0 ]     i ∈ D       (33)
          [ c_i = γ_i    ]     [ c_i = 0   ]

          Ω(Y) = True
          x ∈ X ⊆ R^n
          c ≥ 0
          Y ∈ {True, False}^q

where c is the variable vector representing fixed charges, x is the vector
representing all other continuous variables (flowrates, temperatures, ...) and
Y is the vector of Boolean variables, which indicate the status of a disjunction
(True or False) and are associated with the units in the network. The
set of disjunctions D allows the representation of different configurations,
depending on the existence of the units. Ω(Y) is the set of logical
relationships between the Boolean variables, representing the interactions between
different network units. Instead of resorting to binary variables within a
single model, the disjunctions are used to generate a different model for
each different network structure. Since all continuous variables associated
with the non-existing units are set to 0 (B^i x = 0, c_i = 0), this representation
helps to reduce the size of the problems to be solved.
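As an illustration of how a fixed Boolean assignment induces a reduced model, the sketch below collects the active constraints for a given Y: for Y_i = True the unit's nonlinear constraints and fixed charge are kept, while for Y_i = False the unit's variables are zeroed out. The data structures are invented for illustration and are not those of [TG96].

    def reduced_model(Y, disjunctions):
        # Build the constraint set of the primal NLP for a fixed Boolean vector Y.
        # disjunctions[i] holds: 'h' (callables with h_i(x) <= 0), 'B_row' (indices
        # of variables zeroed when the unit is absent) and 'gamma' (fixed charge).
        constraints, fixed_charge = [], 0.0
        for Yi, d in zip(Y, disjunctions):
            if Yi:                              # unit exists: keep h_i(x) <= 0, c_i = gamma_i
                constraints.extend(d['h'])
                fixed_charge += d['gamma']
            else:                               # unit absent: equality x_j = 0 for its variables
                for j in d['B_row']:
                    constraints.append(lambda x, j=j: x[j])
        return constraints, fixed_charge

    # Two-unit toy network: unit 0 exists, unit 1 does not.
    disj = [{'h': [lambda x: x[0]**2 - 4.0], 'B_row': [0], 'gamma': 1.5},
            {'h': [lambda x: x[1] - 1.0],    'B_row': [1], 'gamma': 2.0}]
    cons, charge = reduced_model([True, False], disj)
    print(charge, [c([1.0, 0.3]) for c in cons])   # 1.5 and the active constraint values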
Two algorithms are suggested by [TG96] in order to solve problems such
as (33). They are modifications of the Outer Approximation and Generalized
Benders Decomposition presented in Sections 3.2 and 3.1 respectively.

3.10.1 Logic-Based Outer Approximation

In the case of disjunctive programming, the primal problem is obtained
simply by fixing all the Boolean variables to a combination k of True and
False, yielding an NLP problem of the form:

    min  f(x) + Σ_i c_i
    s.t.  g(x) ≤ 0
          h_i(x) ≤ 0,  c_i = γ_i    for Y_i^k = True
          B^i x = 0,   c_i = 0      for Y_i^k = False         (34)
          x ∈ X ⊆ R^n
          c ≥ 0

The master problem is a disjunctive linear program. If K NLP problems
have been solved, the nonlinear part of the objective and the nonlinear
inequality constraints which are not part of disjunctions are linearized at
the K solutions. The nonlinear constraints appearing in the ith disjunction
are linearized at the solutions of the NLPs belonging to the subset K_i =
{k | Y_i^k = True, k = 1, ..., K}. The master problem is then expressed as:

    min  μ_OA + Σ_i c_i
    s.t.  f(x^k) + ∇f(x^k)(x - x^k) ≤ μ_OA,   k = 1, ..., K
          g(x^k) + ∇g(x^k)(x - x^k) ≤ 0,      k = 1, ..., K

          [ Y_i                                         ]     [ ¬Y_i      ]
          [ h_i(x^k) + ∇h_i(x^k)(x - x^k) ≤ 0,  k ∈ K_i ]  ∨  [ B^i x = 0 ]     i ∈ D       (35)
          [ c_i = γ_i                                   ]     [ c_i = 0   ]

          Ω(Y) = True
          x ∈ X ⊆ R^n
          c ≥ 0
          Y ∈ {True, False}^q

This type of problem can be solved as a disjunctive problem [Bea90],
or as an MILP [Bal85, RG94]. To ensure that all nonlinear disjunction
constraints are present in the master problem at every iteration, several
NLPs must be solved at the start of the algorithm. The structures to be
optimized are chosen in such a way that each Boolean variable is True in
at least one structure. A method for identifying the minimum number of
combinations required to satisfy this condition has been developed [TG96].
The algorithmic procedure for the logic-based outer approximation
algorithm is very similar to that of the original outer approximation algorithm:

Step 0
Formulate the generalized disjunctive program of the form (33).
Step 1
Set the counter: k = 1
Set the lower bound: LBD = -∞
Set the upper bound: UBD = +∞
Set the convergence tolerance: ε ≥ 0
Determine the minimum set of structures needed to obtain cuts
for all constraints. Generate the corresponding NLPs.
Step 2
Solve the NLP(s) for the fixed Boolean variables.
Obtain the optimal x and c vectors.
If the optimal solution of the NLP is less than the
current upper bound, update the upper bound.
Step 3
Construct and solve the master problem.
Obtain its solution and Y^{k+1}
If the solution of the master is greater than the current lower bound
    Update the lower bound.
If UBD - LBD ≤ ε
    Terminate
Else
    Update the counter: k = k + 1
    Go to step 2

3.10.2 Logic-Based Generalized Benders Decomposition


The Generalized Benders Decomposition framework is not as readily adapted
to disjunctive programming as the Outer Approximation. The master prob-
lem is generated according to the following scheme:

1. Construct a master problem, as was done for the logic-based OA
algorithm (Problem (35)).

2. Transform the problem to an MILP.

3. Based on the values used for the Boolean variables in the previous
NLPs, fix the binary variables: LPs are obtained.

4. Solve the LPs and obtain the values of the Lagrange multipliers for
the constraints and the optimal continuous variable values.

5. Use these to construct an MILP problem: this is the master problem
for the logic-based GBD.

Both algorithms identify the global solution of problems which are convex
for all combinations of the Boolean variables.

4 Global Optimization for Nonconvex MINLPs


The algorithms discussed so far have a major limitation when dealing with
nonconvex problems. While identification of the global solution for convex
problems can be guaranteed, a local solution is often obtained for nonconvex
problems. A number of algorithms that have been developed to address
different types of nonconvex MINLPs are presented in this section.

4.1 Branch-and-Reduce Algorithm


[RS95] extended the scope of branch-and-bound algorithms to problems for
which valid convex underestimating NLPs can be constructed for the non-
convex relaxations. The range of application of the proposed algorithm
encompasses bilinear problems and separable problems involving functions
for which convex underestimators can be built [McC76, AK90]. Because the
nonconvex NLP must be underestimated at each node, convergence can only
be achieved if the continuous variables are branched on. A number of tests
are suggested to accelerate the reduction of the solution space. They are
summarized here.

Optimality Based Range Reduction Tests  For the first set of tests, an
upper bound U on the nonconvex MINLP must be computed and a convex
lower bounding NLP must be solved to obtain a lower bound L. If a bound
constraint for variable x_i, with x_i^L ≤ x_i ≤ x_i^U, is active at the solution of
the convex NLP and has multiplier λ_i > 0, the bounds on x_i can be updated
as follows:

1. If x_i - x_i^U = 0 at the solution of the convex NLP and κ_i = x_i^U - (U - L)/λ_i
is such that κ_i > x_i^L, then x_i^L = κ_i.

2. If x_i - x_i^L = 0 at the solution of the convex NLP and κ_i = x_i^L + (U - L)/λ_i
is such that κ_i < x_i^U, then x_i^U = κ_i.

If neither bound constraint is active at the solution of the convex NLP for
some variable x_i, the problem can be solved by setting x_i = x_i^U or x_i = x_i^L.
Tests similar to those presented above are then used to update the bounds
on x_i.

Feasibility Based Range Reduction Tests  In addition to ensuring
that tight bounds are available for the variables, the constraint
underestimators are used to generate new constraints for the problem. Consider the
constraint g_i(x, y) ≤ 0. If its underestimating function ḡ_i(x, y) = 0 at the
solution of the convex NLP and its multiplier is μ_i > 0, the constraint

    ḡ_i(x, y) ≥ -(U - L)/μ_i

can be included in subsequent problems.
The branch-and-reduce algorithm has been tested on very small
problems. A small sketch of both tests follows.
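In the sketch below, the inputs U, L, the multipliers, and the active-bound flags would come from the convex lower-bounding NLP solution; all names are illustrative assumptions.

    def optimality_range_reduction(xL, xU, lam, at_upper, U, L):
        # Tighten bounds of variables whose bound constraints are active with
        # multiplier lam[i] > 0 at the solution of the convex lower-bounding NLP.
        xL, xU = list(xL), list(xU)
        for i, (lmb, up) in enumerate(zip(lam, at_upper)):
            if lmb <= 0.0:
                continue
            if up:                               # x_i at x_i^U: raise the lower bound
                kappa = xU[i] - (U - L) / lmb
                if kappa > xL[i]:
                    xL[i] = kappa
            else:                                # x_i at x_i^L: lower the upper bound
                kappa = xL[i] + (U - L) / lmb
                if kappa < xU[i]:
                    xU[i] = kappa
        return xL, xU

    def feasibility_cut(mu_i, U, L):
        # Right-hand side of the new constraint g_under_i(x, y) >= -(U - L)/mu_i.
        return -(U - L) / mu_i

    print(optimality_range_reduction(xL=[0.0], xU=[10.0], lam=[2.0],
                                     at_upper=[True], U=5.0, L=1.0))  # ([8.0], [10.0])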

4.2 Interval Analysis Based Algorithm


An algorithm based on interval analysis was proposed by [VEH96] to solve to
global optimality problems of the form (2) with a twice-differentiable objec-
tive function and once-differentiable constraints. Interval arithmetic allows
the computation of guaranteed ranges for these functions [Moo79, RR88,
Neu90]. Although the algorithm is not explicitly described as a branch-and-
bound approach, it relies on the same basic concepts of successive parti-
tioning of the solution space and bounding of the objective function within
each domain. Branching is performed on the discrete and the continuous
variables. The main difference with the branch-and-bound algorithms de-
scribed in Section 3.7 is that bounds on the problem solution in a given
domain are not obtained through optimization. Instead, they are based on
the range of the objective function in the domain under consideration, as
computed with interval arithmetic. As a consequence, these bounds may be
quite loose and efficient fathoming techniques are required in order to en-
hance convergence. [VEH96] suggest a number of tests to determine whether
the optimal solution lies in the current domain. In addition, they propose
branching strategies based on local solutions to the problem. In order to
avoid combinatorial problems, integrality requirements for the discrete vari-
ables are removed when performing interval calculations. Convergence is
declared when best upper and lower bounds are within a pre-specified toler-
ance and when the width of the corresponding region is below a pre-specified
tolerance.

4.2.1 Node Fathoming Tests for Interval Algorithm


The upper-bound test is a classical criterion used in all branch-and-bound
schemes: if the lower bound for a node is greater than the best upper bound
for the MINLP, the node can be fathomed.
The infeasibility test is also used by all branch-and-bound algorithms.
However, the identification of infeasibility using interval arithmetic differs
from its identification using optimization schemes. Here, an inequality
constraint g_i(x, y) ≤ 0 is declared infeasible if G_i(X, Y), its inclusion over the
current domain, is positive. As soon as a constraint is found to be infeasible,
the current node is fathomed.
The monotonicity test is only used in interval-based approaches. If a
region is feasible, the monotonicity properties of the objective function can
be tested. For this purpose, the inclusions of the gradients of the objective
with respect to each variable are evaluated. If all the gradients have a
constant sign over the current region, the objective function is monotonic
and only one point needs to be retained from the current node.
The nonconvexity test is used to test for the existence of a solution (local
or global) within a region. If such a point exists, the Hessian matrix of the
objective function at this point must be positive semi-definite; a necessary
condition for this is the non-negativity of all the diagonal elements of its
interval Hessian matrix. The interval Hessian matrix is the inclusion of the
Hessian matrix computed for a given domain.
[VEH96] advocate two additional tests to accelerate the fathoming pro-
cess. The first is the so-called lower bound test. It requires the computation
of a valid lower bound on the objective function through a method other
than interval arithmetic. If the upper bound at a node is less than this
lower bound, the region can be eliminated. The generation of such an upper
bound may occur in an interval-based approach as the constraints are not
used when evaluating the objective. Thus, a region may be found feasible
because of the overestimation inherent in interval calculations, and have an
upper bound lower than the optimal solution. For general problems, the
computation of a valid and tight lower bound on the objective function re-
quires the use of rigorous convex lower bounding techniques such as those
described in Section 4.5.
The second test, the distrust-region method, aims to help the algorithm
identify infeasible regions so that they can be removed from consideration.
Based on the knowledge of an infeasible point, interval arithmetic is used to
identify an infeasible hypercube centered on that point.
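To illustrate the infeasibility test, here is a minimal interval-arithmetic sketch in Python: a deliberately tiny Interval class supporting only addition and multiplication (real implementations such as those cited above use directed rounding), together with the fathoming check.

    class Interval:
        # Tiny interval type: just enough arithmetic for an inclusion G_i(X, Y).
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, other):
            o = other if isinstance(other, Interval) else Interval(other, other)
            return Interval(self.lo + o.lo, self.hi + o.hi)
        def __mul__(self, other):
            o = other if isinstance(other, Interval) else Interval(other, other)
            p = [a * b for a in (self.lo, self.hi) for b in (o.lo, o.hi)]
            return Interval(min(p), max(p))

    def infeasibility_test(g_inclusion, box):
        # Fathom the node if the inclusion of g_i over the box is strictly positive,
        # i.e. g_i(x, y) <= 0 cannot hold anywhere in the region.
        return g_inclusion(*box).lo > 0.0

    # g(x, y) = x*y + x - 3 <= 0 over x in [2, 3] and y in [1, 2].
    g = lambda X, Y: X * Y + X + Interval(-3.0, -3.0)
    print(infeasibility_test(g, (Interval(2.0, 3.0), Interval(1.0, 2.0))))  # True: fathom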

4.2.2 Branching Step


The variable with the widest range is selected for branching. It can be a
continuous or a discrete variable. In order to determine where to split the
chosen variable, a relaxation of the MINLP is solved locally.

Continuous Branching Variable  If the optimal value of the continuous
branching variable, x*, is equal to one of the variable bounds, branch at the
midpoint of the interval. Otherwise, branch at x* - β, where β is a very
small scalar.

Discrete Branching Variable  If the optimal value of the discrete
branching variable, y*, is equal to the upper bound on the variable, define
a region with y = y* and one with y^L ≤ y ≤ y* - 1, where y^L is the
lower bound on y. Otherwise, create two regions y^L ≤ y ≤ int(y*) and
int(y*) + 1 ≤ y ≤ y^U, where y^U is the upper bound on y.

This algorithm has been tested on a small example problem and a molec-
ular design problem [VEH96].

4.3 Extended Cutting Plane for Nonconvex MINLPs


The use of the ECP algorithm for nonconvex MINLP problems was suggested
in [WPG94], using a slightly modified algorithmic procedure. The main
changes occur in the generation of new constraints for the MILP at each
iteration (Step 4). In addition to the construction of the linear function
l_k(x, y) at iteration k, the following steps are taken:

1. Remove all constraints for which l_i(x^k, y^k) > g_{j_i}(x^k, y^k). These
correspond to linearizations which did not underestimate the corresponding
nonlinear constraint at all points, due to the presence of nonconvexities.

2. Replace all constraints for which l_i(x^k, y^k) = g_{j_i}(x^k, y^k) = 0 by their
linearization around (x^k, y^k).

3. If constraint i is such that g_i(x^k, y^k) > 0, add its linearization around
(x^k, y^k).

The convergence criterion is also modified. In addition to the test used
in Step 3, the following two conditions must be met:

1. (x^k - x^{k-1})^T (x^k - x^{k-1}) ≤ δ, where δ is a pre-specified tolerance.

2. y^k - y^{k-1} = 0.
The ECP algorithm has been used to address a nonlinear pump configu-
ration problem [WPG94], where it was found to give good results for convex
one-level problems, and to perform poorly for nonconvex problems. It has
also been tested on a small convex MINLP from [DG86]. Finally, a com-
parative study between the Outer Approximation, the Generalized Benders
Decomposition and the Extended Cutting Plane algorithm was presented in
[SHW+96]. A parameter estimation problem from FTIR spectroscopy and
a purely integer problem were addressed.

4.4 Reformulation/Spatial Branch-and-Bound Algorithm


A global optimization branch-and-bound algorithm has been proposed
in [SP97]. It can be applied to problems in which the objective and
constraints are functions involving any combination of binary arithmetic
operations (addition, subtraction, multiplication and division) and functions
that are either concave over the entire solution space (such as ln) or convex
over this domain (such as exp).
The algorithm starts with an automatic reformulation of the original
nonlinear problem into a problem that involves only linear, bilinear, linear
fractional, simple exponentiation, univariate concave and univariate convex
terms. This is achieved through the introduction of new constraints and vari-
ables. The reformulated problem is then solved to global optimality using
a branch-and-bound approach. Its special structure allows the construction
of a convex relaxation at each node of the tree. The integer variables can
be handled in two ways during the generation of the convex lower bounding
problem. The integrality condition on the variables can be relaxed to yield
a convex NLP which can then be solved globally. Alternatively, the integer
variables can be treated directly and the convex lower bounding MINLP can
be solved using a branch-and-bound algorithm as described in Section 3.7.
This second approach is more computationally intensive but is likely to re-
sult in tighter lower bounds on the global optimum solution.
In order to obtain an upper bound on the optimum solution, several
methods have been suggested. A local MINLP algorithm such as those described
in Section 3 can be used. Alternatively, the MINLP can be transformed into
an equivalent nonconvex NLP by relaxing the integer variables. For example,
a variable y ∈ {0,1} can be replaced by a continuous variable z ∈ [0,1] by
including the constraint z - z·z = 0. The nonconvex NLP is then solved
locally to provide an upper bound. Finally, the discrete variables could be
fixed to some arbitrary value and the nonconvex NLP solved locally. A
sketch of the relaxation-based option follows.
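As an illustration, the sketch below relaxes a binary variable and enforces the complementarity-type constraint with a local NLP solver; the toy objective and the use of scipy.optimize.minimize with SLSQP are assumptions made purely for illustration.

    import numpy as np
    from scipy.optimize import minimize

    # Toy problem: min (x - 1.7)^2 + 3*z*x with x in [0, 2] and z binary.
    # The binary z is relaxed to z in [0, 1] with the constraint z - z*z = 0,
    # which forces z to 0 or 1 at any feasible point.
    obj = lambda v: (v[0] - 1.7) ** 2 + 3.0 * v[1] * v[0]
    res = minimize(obj,
                   x0=np.array([1.0, 0.4]),
                   method="SLSQP",
                   bounds=[(0.0, 2.0), (0.0, 1.0)],
                   constraints=[{"type": "eq", "fun": lambda v: v[1] - v[1] * v[1]}])
    print(res.x, res.fun)   # a local solution; its value is an upper bound on the MINLP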
Branching variables in this algorithm can be either continuous or dis-
crete variables. An "approximation error" is computed for each term in
the problem as the distance between the original term and its convex relax-
ation. A variable that participates in the term with the largest such error
is selected for branching. Finally, the authors perform bound updates on
all variables in order to ensure tight underestimators are generated. This
algorithm has been applied to several problems such as reactor selection,
distillation column design, nuclear waste blending, heat exchanger network
design and multilevel pump configuration.

4.5 The SMIN-αBB Algorithm


This algorithm, proposed in [AAF97a], is designed to address the following
class of problems to global optimality:

    min  f(x) + x^T A_0 y + c_0^T y
    s.t.  h(x) + x^T A_1 y + c_1^T y = 0
          g(x) + x^T A_2 y + c_2^T y ≤ 0                      (36)
          x ∈ X ⊆ R^n
          y ∈ Y integer

where c_0, c_1 and c_2 are constant vectors, A_0, A_1 and A_2 are constant
matrices and f(x), h(x) and g(x) are functions with continuous second-order
derivatives.
The solution strategy for problems of type (36) is an extension of the
αBB algorithm for twice-differentiable NLPs [AMF95, AF96, ADFN97]. It
is based on the generation of two converging sequences of upper and lower
bounds on the global optimum solution. A rigorous underestimation and
convexification strategy for functions with continuous second-order deriva-
tives allows the construction of a lower bounding MINLP problem with
convex functions in the continuous variables. If no mixed-bilinear terms are
present (Ai = 0, Vi), the resulting MINLP can be solved to global optimal-
ity using the Outer Approximation algorithm (OA) described in Section 3.2.
Otherwise, the Generalized Benders Decomposition (GBD) can be used, as
discussed in Section 3.1, or the Glover transformations [Glo75] can be ap-
plied to remove these bilinearities and permit the use of the OA algorithm.
This convex MINLP provides a valid lower bound on the original MINLP.
An upper bound on the problem can be obtained by applying the OA al-
gorithm or the GBD to problem (36) to find a local solution. This bound
generation strategy is incorporated within a branch-and-bound scheme: a
lower and upper bound on the global solution are first obtained for the entire
solution space. Subsequently, the domain is subdivided by branching on a
binary or a continuous variable, thus creating new nodes for which upper
and lower bounds can be computed. At each iteration, the node with the
lowest lower bound is selected for branching. If the lower bounding MINLP
for a node is infeasible or if its lower bound is greater than the best upper
bound, this node is fathomed. The algorithm is terminated when the best
lower and upper bound are within a pre-specified tolerance of each other.

Before presenting the algorithmic procedure, an overview of the under-


estimation and convexification strategy is given, and some of the options
available within the algorithm are discussed.

4.5.1 Convex Underestimating MINLP Generation


In order to transform an MINLP problem of the form (36) into a convex
problem which can be solved to global optimality with the OA or GBD
algorithm, the functions f(x), h(x) and g(x) must be convexified. The
underestimation and convexification strategy used in the αBB algorithm
has previously been described in detail [AAMF96, AF96, ADFN97]. Its
main features are exposed here.
In order to construct as tight an underestimator as possible, the
nonconvex functions are decomposed into a sum of convex, bilinear, univariate
concave and general nonconvex terms. The overall function underestimator
can then be built by summing up the convex underestimators for all terms,
according to their type. In particular, a new variable is introduced to
replace each bilinear term, and is bounded by the convex envelope of the term
[AKF83]. The univariate concave terms are linearized. For each nonconvex
term nt(x) with Hessian matrix H_nt(x), a convex underestimator L(x) is
defined as

    L(x) = nt(x) + Σ_i α_i (x_i^L - x_i)(x_i^U - x_i)            (37)

where x_i^U and x_i^L are the upper and lower bounds on variable x_i respectively
and the α_i parameters are nonnegative scalars such that H_nt(x) + 2 diag(α_i)
is positive semi-definite over the domain [x^L, x^U]. The rigorous computation
of the α parameters using interval Hessian matrices is described in [AAMF96,
AF96, ADFN97].
The underestimators are updated at each node of the branch-and-bound
tree as their quality strongly depends on the bounds on the variables.
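The following Python sketch illustrates Eq. (37) for a single general nonconvex term. It estimates a uniform α by sampling the Hessian over the box, which is only an illustrative stand-in for the rigorous interval-Hessian computation cited above, so the resulting underestimator is not guaranteed to be valid.

    import numpy as np

    def alpha_underestimator(nt, hess, xL, xU, samples=500, seed=0):
        # Sketch of the alpha-BB underestimator L(x) of Eq. (37). A uniform
        # alpha = max(0, -0.5 * min sampled eigenvalue of H_nt) is used; sampling
        # is not rigorous, unlike the interval-Hessian methods cited in the text.
        xL, xU = np.asarray(xL, float), np.asarray(xU, float)
        rng = np.random.default_rng(seed)
        pts = rng.uniform(xL, xU, size=(samples, xL.size))
        lam_min = min(np.linalg.eigvalsh(hess(x))[0] for x in pts)
        alpha = max(0.0, -0.5 * lam_min)
        def L(x):
            x = np.asarray(x, float)
            return nt(x) + alpha * np.sum((xL - x) * (xU - x))
        return L, alpha

    # Nonconvex term nt(x) = x1*x2 on [0, 1]^2; its Hessian has eigenvalues -1 and 1.
    nt = lambda x: x[0] * x[1]
    hess = lambda x: np.array([[0.0, 1.0], [1.0, 0.0]])
    L, alpha = alpha_underestimator(nt, hess, [0.0, 0.0], [1.0, 1.0])
    print(alpha, L([0.5, 0.5]))   # alpha = 0.5; L lies below nt over the box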

4.5.2 Branching Variable Selection


An unusual feature of the SMIN-αBB algorithm is the strategy used to
select branching variables. It follows a hybrid approach where branching
may occur both on the integer and the continuous variables in order to fully
exploit the structure of the problem being solved. After the node with the
lowest lower bound has been identified for branching, the type of branching
variable must be determined according to one of the following two criteria:

1. Branch on the binary variables first.

2. Solve a continuous relaxation of the nonconvex MINLP locally. Branch


on a binary variable with a low degree of fractionality at the solution.
If there is no such variable, branch on a continuous variable.

The first criterion results in the creation of an integer tree for the first q
levels of the branch-and-bound tree, where q is the number of binary vari-
ables. At the lowest level of this integer tree, each node corresponds to a
nonconvex NLP and the lower and upper bounding problems at subsequent
levels of the tree are NLP problems. The efficiency of this strategy lies in
the minimization of the number of MINLPs that need to be solved. The
combinatorial nature of the problem and its nonconvexities are handled se-
quentially. If branching occurs on a binary variable, the selection of that
variable can be done randomly or by solving a relaxation of the nonconvex
MINLP and choosing the most fractional variable at the solution.
The second criterion selects a binary variable for branching only if it
appears that the two newly created nodes will have significantly different
lower bounds. Thus, if a variable is close to integrality at the solution of
the relaxed problem, forcing it to take on a fixed value may lead to the
infeasibility of one of the nodes or the generation of a high value for a lower
bound, and therefore the fathoming of a branch of the tree. If no binary
variable is close to integrality, a continuous variable is selected for branching.
A number of rules have been developed for the selection of a continuous
branching variable. Their aim is to determine which variable is responsible
for the largest separation distances between the convex underestimating
functions and the original nonconvex functions. These efficient rules are
exposed in [AAF97b].

4.5.3 Variable Bound Updates


Variable bound updates performed before the generation of the convex
MINLP have been found to greatly enhance the speed of convergence of the
αBB algorithm for continuous problems [AAF97b]. For continuous variables,
the variable bounds are updated by minimizing or maximizing the chosen
variable subject to the convexified constraints being satisfied. In spite of its
computational cost, this procedure often leads to significant improvements
in the quality of the underestimators and hence a noticeable reduction in
the number of iterations required.
In addition to the update of continuous variable bounds, the SMIN-αBB
algorithm also relies on binary variable bound updates. Through simple
computations, an entire branch of the branch-and-bound tree may be elimi-
nated when a binary variable is found to be restricted to 0 or 1. The bound
update procedure for a given binary variable is as follows:

1. Set the variable to be updated to one of its bounds, y = y_B.

2. Perform interval evaluations of all the constraints in the nonconvex
MINLP, using the bounds on the solution space for the current node.

3. If any of the constraints are found infeasible, fix the variable to
y = 1 - y_B.

4. If both bounds have been tested, repeat this procedure for the next
variable to be updated. Otherwise, try the second bound.

4.5.4 Algorithmic Procedure


The algorithmic procedure for the SMIN-αBB algorithm is formalized as
follows:

Step 1
Set absolute tolerance ε; set LBD* = -∞ and UBD* = +∞.
Set node counter k = 0.
Initialize the set of nodes to be bounded, J = {0};
N, the list of nodes to be explored, is empty.
Step 2 Bounding
For each node N_j, j ∈ J:
    Perform variable bound updates if desired.
    Generate a convex lower bounding MINLP.
    Solve the convex MINLP using OA or GBD. The solution is LBD_j.
    If the MINLP is infeasible, fathom the current node.
    If LBD_j ≤ UBD*, add the current node to N.
    Else, fathom the current node.
Step 3
Set LBD* to the lowest lower bound from the list N.
If UBD* - LBD* ≤ ε, terminate with solution UBD*.
Otherwise, proceed to Step 4.
Step 4 Branching
Select the node from the list N with the lowest lower bound for
branching, N_i (i < k). Its lower bound is LBD_i.
Select a branching variable y^B or x^B.
Create two new regions N_{k+1} and N_{k+2}.
Set J = {k + 1, k + 2} and k = k + 2. Go back to Step 2.

4.6 The GMIN-αBB Algorithm


This algorithm operates within a classical branch-and-bound framework as
described in Section 3.7. The main difference with the algorithms of [GR85],
[OOM90] and [BM91] is its ability to identify the global optimum solution
of a much larger class of problems of the form

    min_{x,y}  f(x, y)
    s.t.  h(x, y) = 0
          g(x, y) ≤ 0                                        (38)
          x ∈ X ⊆ R^n
          y ∈ N^q

where N is the set of non-negative integers and the only condition imposed
on the functions f(x, y), g(x, y) and h(x, y) is that their continuous
relaxations possess continuous second-order derivatives.
This increased applicability results from the use of the αBB global
optimization algorithm for continuous twice-differentiable NLPs [AMF95, AF96,
ADFN97]. The basic concepts behind the αBB algorithm were exposed in
Section 4.5.
At each node of the branch-and-bound tree, the nonconvex MINLP is
relaxed to give a nonconvex NLP, which is then solved with the αBB
algorithm. This allows the identification of rigorously valid lower bounds and
therefore ensures convergence to the global optimum. In general, it is not
necessary to let the αBB algorithm run to completion, as each one of its
iterations generates a lower bound on the global solution of the NLP being
solved. A strategy of early termination leads to a reduction in the
computational requirements of each node of the binary branch-and-bound tree and
faster overall convergence.
The GMIN-αBB algorithm selects the node with the lowest lower bound
for branching at every iteration. The branching variable selection strategy
combines several approaches: branching priorities can be specified for some


of the integer variables. When no variable has a priority greater than all
other variables, the solution of the continuous relaxation is used to iden-
tify either the most fractional variable or the least fractional variable for
branching.
Other strategies have been implemented to ensure a satisfactory con-
vergence rate. In particular, bound updates on the integer variables can
be performed at each level of the branch-and-bound tree. These can be
carried out through the use of interval analysis. An integer variable, y*,
is fixed at its lower (or upper) bound and the range of the constraints is
evaluated with interval arithmetic, using the bounds on all other variables.
If the range of any constraint is such that this constraint is violated, the
lower (or upper) bound on variable y* can be increased (or decreased) by
one. Another strategy for bound updates is to relax the integer variables,
to convexify and underestimate the nonconvex constraints and to minimize
(or maximize) a variable y* in this convexified feasible region. The resulting
lower (or upper) bound on relaxed variable y* can then be rounded up (or
down) to the nearest integer to provide an updated bound for y*.
A number of small nonconvex MINLP test problems as well as the pump
configuration problem of [WPG94] have been solved using this strategy.

5 Implementation: MINOPT
Although there are a number of algorithms available for the solution of
MINLPs, there are relatively few implementations of these algorithms. The
recent advances in the development of these algorithms have led to several
automated implementations.
The earliest implementations make use of the modeling system GAMS
[BKM92] which allows algebraic model representation and automatic inter-
facing with linear, nonlinear and mixed integer linear solvers. The algorith-
mic procedure, APROS [PF89], was developed for the automatic solution
of mathematical programming problems involving decomposition techniques
such as those used in the solution of MINLPs. APROS is an implemen-
tation of GBD and OA in GAMS where the modeling language is used
to generate the NLP and MILP subproblems which are solved through the
GAMS interface. GAMS also includes a direct interface to an implementation
of OA/ER in the package DICOPT++ [VG90]. The model can be
written algebraically as an MINLP and the solver will perform the necessary
decomposition.
More recently, a framework, MINOPT [SF97b], has been developed for
the solution of general mathematical programming problems. The primary
motivation for the development of MINOPT (Mixed Integer Nonlinear
OPTimizer) was to provide a user-friendly interface for the solution of MINLPs.
The development has expanded to include an interface for solving many
classes of problems which include both algebraic and differential models.
The next section describes this package in more detail and includes the
results of its application to a number of example problems.
MINOPT has been developed as a framework for the solution of var-
ious classes of optimization problems. Its development has been brought
about by the particular need for implementations of algorithms applicable
to MINLPs. Further development has been done to address the solution
of problems which involve dynamic as well as algebraic models. Extensive
development of MINOPT has led to a highly developed computational tool.
MINOPT has a number of features including:
• Extensive implementation of optimization algorithms

• Front-end parser
• Extensive options
• Expandable platform
• Interface routines callable as a subroutine
MINOPT is capable of handling a wide variety of problems described
by the variable and constraint types employed. MINOPT handles the
following variable types:
• continuous time invariant
• continuous dynamic

• control
• integer
and recognizes the following constraint types:

• linear
• nonlinear
• dynamic
• dynamic path
• dynamic point
Different combinations of variable and constraint types lead to the following
problem classifications:

• Linear Program (LP)


• Nonlinear Program (NLP)
• Mixed Integer Linear Program (MILP)
• Mixed Integer Nonlinear Program (MINLP)
• Nonlinear Program with Differential and Algebraic Constraints
(NLP/DAE)
• Mixed Integer Nonlinear Program with Differential and Algebraic Con-
straints (MINLP /DAE)

• Optimal Control Program (OCP)


• Mixed Integer Optimal Control Program (MIOCP)
The MINOPT program has two phases: problem entry and problem
solution. During the first phase MINOPT reads the input from a file, saves
the problem information, and then determines the structure and consistency
of the problem by analyzing the constraints and variables. After the problem
has been entered, MINOPT proceeds to the second phase to solve the
problem. Based on the problem structure determined by MINOPT and
options supplied by the user, MINOPT employs the appropriate algorithm
to solve the problem.
The entry phase of MINOPT features a parser which reads in the dy-
namic and/or algebraic problem formulation from an input file. The input
file has a clear syntax and allows the user to enter the problem in a concise
form without needing to specify the steps of the algorithm. The input file
includes information such as variable names, variable partitioning (continu-
ous, integer, dynamic), parameter definitions, and option specifications. The
parser features index notation which allows for compact model representa-
tion. The parser allows for general constraint notation and has the ability
to recognize and handle the various constraint types (i.e. linear, nonlinear,
dynamic, point, path) and ultimately the overall structure of the problem.
The MINOPT parser also determines the necessary analytical Jacobian
information from the problem formulation.
The solution phase of MINOPT features extensive implementations
of numerous optimization algorithms. Once the parser has determined the
problem type, the solution phase applies the appropriate method to solve the
problem. MINOPT utilizes available software packages for the solution of
various subproblems. The solution algorithms implemented by MINOPT
are listed in Table 1 and are callable as subroutines from other programs.

Table 1: Solution algorithms implemented by MINOPT

    Problem Type      Algorithm                                   Solver
    LP                Simplex method                              CPLEX
                                                                  MINOS
                                                                  LSSOL
    MILP              Branch and Bound                            CPLEX
    NLP               Augmented Lagrangian/Reduced Gradient       MINOS
                      Sequential Quadratic Programming            NPSOL
                      Sequential Quadratic Programming            SNOPT
    Dynamic           Integration (Backward Difference)           DASOLV
    Optimal Control   Control Parameterization                    DAEOPT
    MINLP             Generalized Benders Decomposition           MINOPT
                      Outer Approximation/Equality Relaxation     MINOPT
                      Outer Approximation/Augmented Penalty       MINOPT
                      Generalized Cross Decomposition             MINOPT

MINOPT has an extensive list of options which allows the user to
fine-tune the various algorithms.

• selection of different algorithms for a problem type

• selection of parameters for various algorithms

• solution of the relaxed MINLP


Figure 4: Program flow for MINOPT

• auto-initialization procedure: the relaxed MINLP is solved to determine
the starting values for the y-variables
• integer cuts for the GBD algorithm
• radial search technique for problems with discrete and continuous y
variables (GBD)
• alternative feasibility formulation for infeasible primal
• solution of the GBD master problem in terms of both x and y rather
than in y alone
• specification of parameters for external solvers
The flow of the program is described in Figure 4. The program is invoked
from the command line, parses the input file, and stores the information
in a problem structure. The program then determines the appropriate
method to solve the problem based on the problem type and options provided
by the user. Based on the algorithm and parameters, MINOPT solves the
problem by formulating and solving various subproblems. When needed,
MINOPT draws necessary information from the problem structure.

The code for MINOPT has been written in portable ANSI C and can
be compiled on any computer. MINOPT has been developed with an
expandable platform in both the entry and solution phases of the program.
The parser can be expanded to recognize additional options, variable types,
commands, and constraint types that may be required of an algorithm. The
solution phase of the program can be expanded to implement additional
algorithms should they become available.

6 Computational Studies
There are numerous MINLP problems that arise in process synthesis and
design. MINOPT has been used as a computational tool to solve a wide
variety of these problems including heat exchanger network synthesis prob-
lems, design and scheduling of multipurpose batch plants, reactor network
synthesis, multicommodity facility location-allocation problems, parameter
estimation, and distillation sequencing problems. The computational results
for these problems run on a Hewlett Packard 9000/780 (C-160) are shown
in Table 2.
Two computational examples are selected from the area of process syn-
thesis. Both of these examples illustrate the use of a superstructure, the
mathematical modeling of the superstructure as well as the implementation
of an appropriate algorithm to solve the problem. The first problem is a
distillation sequencing problem and the second is a heat exchanger network
synthesis problem.

6.1 Distillation Sequencing


The distillation sequencing problem is to determine the configuration of
a separation system which will separate a given feed stream into desired
products which meet desired specifications. The details for the problem
description and model formulation can be found in [AF90].
The input flowrate and composition are given along with the desired
product flowrates and compositions. The problem is to determine the flow-
rates and compositions of the streams and the interconnection of distillation
units which minimizes the annualized cost. The superstructure postulating
two distillation units for one input and two outputs is shown in Figure 5.
The continuous variables for the problem are the flowrates, F, the mole
fractions, x, and the recoveries of the light key and heavy key, r^{lk} and
r^{hk}. The binary variables, y, represent the existence of the distillation
columns.

Table 2: Computational Results

Problem  X  Y  Z  L  N  T  ITER  CPU TIME
gbd_test 1 3 4 1 D 2/2/2 0.06/0.06/0.06
oaer_test 6 3 5 2 D 3/2/3 0.18/0.25
ap_test 2 1 3 1 D 2/-/2 0.07/-/0.07
minutil 206 87 - A 0.09
minmatch 82 18 - 129 - B 0.11
plan 22 14 - A 0.05
schedule 12 16 27 - B 0.13
batdes 10 9 18 2 D 5/2/2 0.28/0.12/0.13
complex 8 3 10 - B 0.08
alky 10 3 5 C 1.8
cart 6 4 3 4 E 0.63
param 12 2 1 10 E 4.38
feedtray 93 9 34 62 D 2/2/2 1.43/1.74/5.38
procsel 7 3 6 2 D 6/4/3 0.31/0.27/0.28
alan 4 4 7 1 D 8/4/6 0.20/0.12/0.17
facility1 16 6 18 1 D 3/3/4 0.14/0.17/0.21
facility2 54 12 33 1 D 5/5/7 0.43/0.73/0.97
ciric1 5 1 6 1 C 15/-/- 0.29/-/-
duran86-1 3 3 4 3 D 4/4/4 0.27/0.28/0.24
duran86-2 6 5 11 4 D 8/4/3 1.16/0.50/0.40
duran86-3 9 8 19 5 D 14/4/7 3.24/0.66/1.30
duran86-4 5 25 6 26 D 69/2/11 42.1/0.4/38.50
kocis88-4a 16 6 60 2 D 16/-/- 3.74/-/-
kocis88-4b 22 24 72 2 D 31/4/3 20.28/1.86/0.89
kocis89-2a 27 2 29 6 D 3/3/2 0.54/0.57/0.50
kocis89-2b 27 8 34 1 D 3/-/- 0.55
meanv1 21 14 44 1 D 10/4/3 0.48/0.33/0.26
meanv2 28 14 51 1 D 7/2/3 0.35/0.21/0.28
nous1 22 28 49 1 D 12/-/- 2.38/-/-
vdv 22 14 4 27 8 F 2/-/- 16.02/-/-
bindis 72 60 114 67 4 F 4/3/- 2860.0/2100.0/-

X, Y, Z, L, N and T indicate the number of x, y, and z variables, the
number of linear and nonlinear constraints, and the type of the problem
(A: LP, B: MILP, C: NLP, D: MINLP, E: NLP/DAE, F: MINLP/DAE).
ITER and CPU TIME indicate the number of iterations and CPU time
for the MINLP problems (GBD/OAER/OAERAP).

Figure 5: Superstructure for Nonsharp Separation System

The derivation of the mathematical model involves a number of indices and
sets. The index set I = {i} denotes the components, N = {k} the streams,
J = {j} the columns, S = {s} the splitters, M^c = {m^c} the mixers prior
to each column, M^l = {m^l} the final mixers prior to each product, and
P = {p} the products. The splitter s^o ∈ S represents the initial splitting
point of the feed stream. The following sets are defined for the connection
of the sets of splitters and mixers with the streams in the superstructure:

S_s^{in}  = {l | l ∈ N is an inlet to splitter s},       s ∈ S
S_s^{out} = {l | l ∈ N is an outlet from splitter s},    s ∈ S
M_{m^l}^{in}  = {l | l ∈ N is an inlet to mixer m^l},    m^l ∈ M^l
M_{m^c}^{out} = {l | l ∈ N is an outlet from mixer m^c}, m^c ∈ M^c

The inlet and outlet streams of the columns are given by the following:

SU_j       = {n | n ∈ N is the inlet to column j},          j ∈ J
SU_j^{top} = {p | p ∈ N is the top product of column j},    j ∈ J
SU_j^{bot} = {q | q ∈ N is the bottom product of column j}, j ∈ J

The key components for the columns are given by the following:

LK_j    = {i | i ∈ I is the light key for column j},              j ∈ J
HK_j    = {i | i ∈ I is the heavy key for column j},              j ∈ J
LHK_j   = {i | i ∈ I is lighter than the heavy key for column j}, j ∈ J
HLK_j   = {i | i ∈ I is heavier than the light key for column j}, j ∈ J
ND_{ik} = {i | i ∈ I is not present in stream k},                 i ∈ I, k ∈ N

The problem formulation from [AF90] follows:

min Σ_{j∈J} { a_{0j} + ( a_{1j} + a_{2j} d_j + a_{3j} r_j^{lk}
                         + Σ_{i∈I} b_{ij} x_{ik} ) F_k },
      i ∈ LK_j, i' ∈ HK_j, k ∈ SU_j

s.t.  Σ_{k∈S_s^{in}} F_k - Σ_{k∈S_s^{out}} F_k = 0,    s ∈ S

      F_p x_{ip} - r_j^{lk} f_{in} = 0,    i ∈ LK_j, n ∈ SU_j,
                                           p ∈ SU_j^{top}, j ∈ J

      F_q x_{iq} - r_j^{hk} f_{in} = 0,    i ∈ HK_j, n ∈ SU_j,
                                           q ∈ SU_j^{bot}, j ∈ J

      f_{in} - F_n x_{in} = 0,    i ∈ I, n ∈ SU_j

      f_{in} - F_p x_{ip} - F_q x_{iq} = 0,    i ∈ I, j ∈ J, n ∈ SU_j,
                                               p ∈ SU_j^{top}, q ∈ SU_j^{bot}

      Σ_{l∈M_{m^c}^{in}} F_l x_{il} - Σ_{l∈M_{m^c}^{out}} f_{il} = 0,
                                               i ∈ I, m^c ∈ M^c

      x_{ik} = 0,    (i,k) ∈ ND_{ik}

      Σ_{l∈M_{m^l}^{in}} f_{il} - c_{ip} = 0,    i ∈ I, p ∈ P

      Σ_{i∈I} x_{ik} = 1

      F_k - U y_j ≤ 0,    j ∈ J, k ∈ SU_j

      F_n - F_p - F_q = 0,    n ∈ SU_j, p ∈ SU_j^{top}, q ∈ SU_j^{bot}

      Σ_{l∈M_{m^c}^{in}} F_l - Σ_{l∈M_{m^c}^{out}} F_l = 0,    m^c ∈ M^c

      Σ_{l∈M_{m^l}^{in}} F_l - Σ_{i∈I} c_{ip} = 0,    p ∈ P

      Σ_{i∈I} f_{il} - F_l = 0,    l ∈ SU_j

      F_l - Σ_{i∈LHK_j} f_{in} - Σ_{i∈HK_j} [1 - (r_j^{hk})^L] f_{in} ≥ 0,
                                               n ∈ SU_j, l ∈ SU_j^{top}

      F_l - Σ_{i∈HLK_j} f_{in} - Σ_{i∈LK_j} [1 - (r_j^{lk})^L] f_{in} ≥ 0,
                                               n ∈ SU_j, l ∈ SU_j^{bot}

      x_{in}, f_{in}, F_n ≥ 0,    i ∈ I, n ∈ (SU_j ∪ SU_j^{top} ∪ SU_j^{bot})

      (r_j^{lk})^L ≤ r_j^{lk} ≤ (r_j^{lk})^U,    i ∈ LK_j, j ∈ J

      (r_j^{hk})^L ≤ r_j^{hk} ≤ (r_j^{hk})^U,    i ∈ HK_j, j ∈ J

The data for the problem are taken from Example 2 in [AF90].
The nonlinearities in the problem are due to bilinear terms. Some of the
continuous variables in the problem are partitioned as y variables along with
the binary variables such that the primal problem is linear and thus convex.
Since the y variables consist of both continuous and binary variables, the
GBD algorithm must be used.
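To make the decomposition concrete, the following is a minimal sketch of
the GBD loop in Python (for a minimization problem); solve_primal and
solve_master are hypothetical helpers standing in for the subproblem
solvers that an implementation such as MINOPT would call internally.

def gbd(y_start, tol=1e-6, max_iter=100):
    y, cuts = y_start, []
    upper, best_y = float("inf"), y_start
    for _ in range(max_iter):
        # Primal problem: the complicating y variables are fixed; here the
        # primal is linear, hence convex. A feasible primal gives an upper
        # bound and the duals needed to build a new Benders cut.
        obj, duals = solve_primal(y)
        if obj < upper:
            upper, best_y = obj, y
        cuts.append((y, obj, duals))
        # Master problem: a relaxation in the y space built from the
        # accumulated cuts; its optimum is a valid lower bound and
        # proposes the next y.
        y, lower = solve_master(cuts)
        if upper - lower <= tol:
            break
    return best_y, upper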
The problem is solved using MINOPT which incorporates a radial
search algorithm into the GBD algorithm. This option specifies that the full
NLP problem with only the binary variables fixed is solved after each primal
problem. The algorithm converges in 3 iterations and takes 1.36 seconds of
CPU time on a Hewlett Packard 9000/780 (C-160). The optimal sequence
utilizes a single column and the optimal flowsheet is shown in Figure 6.

6.2 Heat Exchanger Network Synthesis


The design of a heat exchanger network involving two hot streams, two cold
streams, one hot and one cold utility is studied. The formulation of [YG91]
is used. The annualized cost of the network is expressed as the summation
of the utility costs, the fixed charges for the required heat-exchangers and
an area-based cost for each exchanger. The area is a highly nonlinear
function of the heat duty and the temperature differences at both ends of
the heat exchanger.

Figure 6: Optimal Flowsheet for the Nonsharp Separation Problem

The binary variables, which represent the existence of a given heat
exchanger, participate linearly in the problem. All the constraints
are linear. This nonconvex MINLP therefore provides an opportunity to test
the SMIN-αBB global optimization algorithm proposed in Section 4.5.

The stream data for the problem are summarized in Table 3. There
are two temperature intervals. The steam utility costs $80/kW-yr and the
cooling water costs $15/kW-yr. The fixed charges for the heat exchangers
amount to $5500/yr. The cost coefficient for the area-dependent part of the
heat exchanger costs is $300/yr. The overall heat transfer coefficients are
0.5 kW/m^2 K for the hot stream-cold stream units, 0.83333 kW/m^2 K for
the cold stream-hot utility units and 0.5 kW/m^2 K for the hot stream-cold
utility units.
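Since the nonconvexity of this problem enters only through the area term,
its shape is worth writing out. The sketch below is an illustration under
one assumption: it uses the Chen approximation to the log-mean temperature
difference, the same [ΔT_1 ΔT_2 (ΔT_1 + ΔT_2)/2]^{1/3} expression that
appears in the underestimator of Section 6.2.1.

def exchanger_area(q, u, dt1, dt2):
    """Area (m^2) for duty q (kW), coefficient u (kW/m^2 K) and approach
    temperatures dt1, dt2 (K) at the two ends of the exchanger."""
    lmtd_chen = (dt1 * dt2 * (dt1 + dt2) / 2.0) ** (1.0 / 3.0)
    return q / (u * lmtd_chen)

# Example: a 1000 kW process-process match (u = 0.5) with approaches of
# 20 K and 30 K gives (20*30*25)**(1/3) ~ 24.7 K, i.e. roughly 81 m^2.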

The superstructure for this problem is shown in Figure 7. There are 12
possible matches and therefore 12 binary variables. The global optimum
configuration involves six heat exchangers and is shown in Figure 8. Given
the set ST of K temperature locations, the set HP of hot process streams
and the set CP of cold process streams, the general problem formulation is
as follows:

(39)

(T_{in,i} - T_{out,i}) Fcp_i = Σ_{k∈ST} Σ_{j∈CP} Q_{ijk} + Q_{CU,i},  ∀ i ∈ HP
(T_{out,j} - T_{in,j}) Fcp_j = Σ_{k∈ST} Σ_{i∈HP} Q_{ijk} + Q_{HU,j},  ∀ j ∈ CP
(T_{i,k} - T_{i,k+1}) Fcp_i = Σ_{j∈CP} Q_{ijk},    ∀ k ∈ ST, ∀ i ∈ HP
(T_{j,k} - T_{j,k+1}) Fcp_j = Σ_{i∈HP} Q_{ijk},    ∀ k ∈ ST, ∀ j ∈ CP
T_{in,i} = T_{i,1},                                ∀ i ∈ HP
T_{in,j} = T_{j,K},                                ∀ j ∈ CP
T_{i,k} ≥ T_{i,k+1},                               ∀ k ∈ ST, ∀ i ∈ HP
T_{j,k} ≥ T_{j,k+1},                               ∀ k ∈ ST, ∀ j ∈ CP
T_{out,i} ≤ T_{i,K},                               ∀ i ∈ HP
T_{out,j} ≥ T_{j,1},                               ∀ j ∈ CP
(T_{i,K} - T_{out,i}) Fcp_i = Q_{CU,i},            ∀ i ∈ HP
(T_{out,j} - T_{j,1}) Fcp_j = Q_{HU,j},            ∀ j ∈ CP
Q_{ijk} - Ω z_{ijk} ≤ 0,            ∀ k ∈ ST, ∀ i ∈ HP, ∀ j ∈ CP
Q_{CU,i} - Ω z_{CU,i} ≤ 0,          ∀ i ∈ HP
Q_{HU,j} - Ω z_{HU,j} ≤ 0,          ∀ j ∈ CP
z_{ijk}, z_{CU,i}, z_{HU,j} ∈ {0, 1},    ∀ k ∈ ST, ∀ i ∈ HP, ∀ j ∈ CP
T_{i,k} - T_{j,k} + Γ (1 - z_{ijk}) ≥ ΔT_{ijk},
                                    ∀ k ∈ ST, ∀ i ∈ HP, ∀ j ∈ CP
T_{i,k+1} - T_{j,k+1} + Γ (1 - z_{ijk}) ≥ ΔT_{ijk+1},
                                    ∀ k ∈ ST, ∀ i ∈ HP, ∀ j ∈ CP
T_{i,K} - T_{out,CU} + Γ (1 - z_{CU,i}) ≥ ΔT_{CU,i},    ∀ i ∈ HP
T_{out,HU} - T_{j,1} + Γ (1 - z_{HU,j}) ≥ ΔT_{HU,j},    ∀ j ∈ CP
ΔT_{ijk} ≥ 10,                      ∀ k ∈ ST, ∀ i ∈ HP, ∀ j ∈ CP

Table 3: Stream data for the heat exchanger network problem.

Stream   T_in (K)   T_out (K)   Fcp (kW/K)
Hot 1    650        370         10.0
Hot 2    590        370         20.0
Cold 1   410        650         15.0
Cold 2   350        500         13.0
Steam    680        680         -
Water    300        320         -

where the parameters are C_CU, the per unit cost of the cold utility; C_HU,
the per unit cost of the hot utility; CF, the fixed charge for heat exchangers;
C, the area cost coefficient; T_in, the inlet temperature of a stream; T_out,
the outlet temperature; Fcp, the heat capacity flowrate of a stream; Ω, the
upper bound on heat exchange; Γ, the upper bound on the temperature
difference. The continuous variables are T_{ik}, the temperature of hot
stream i at the hot end of stage k; T_{jk}, the temperature of cold stream j
at the cold end of stage k; Q_{ijk}, the heat exchanged between hot stream i
and cold stream j at temperature location k; Q_{CU,i}, the heat exchanged
between hot stream i and the cold utility; Q_{HU,j}, the heat exchanged
between cold stream j and the hot utility; ΔT_{ijk}, the temperature
approach for the match of hot stream i and cold stream j at temperature
location k; ΔT_{CU,i}, the temperature approach for the match of hot
stream i and the cold utility; ΔT_{HU,j}, the temperature approach for the
match of cold stream j and the hot utility. The binary variables are
z_{ijk}, for the existence of a match between hot stream i and cold stream j
at temperature location k; z_{CU,i}, for the existence of a match between
hot stream i and the cold utility; z_{HU,j}, for the existence of a match
between cold stream j and the hot utility.
Due to the linear participation of the binary variables, the problem can
be solved locally using the Outer Approximation or Generalized Benders
Decomposition algorithms described in Sections 3.2 and 3.1, and globally
using the SMIN-αBB algorithm of Section 4.5.
This problem can be solved locally using MINOPT. For both GBD
and OAER the problem is solved 30 times with random starting values for
the binary variables. The starting values for the continuous variables are

Figure 7: Superstructure for the heat exchanger network problem.



Figure 8: Optimum configuration for the heat exchanger network problem.



set to their lower bounds. The results of these runs are shown in Table 4.
Whereas GBD generally takes more iterations than OAER, it converges
to fewer local minima. Both algorithms obtain the global optimum roughly
the same number of times. When random starting values are used for both
the binary and continuous variables, the global optimum is obtained in all
30 runs.

Table 4: Local solutions for the Heat Exchanger Network Synthesis problem
obtained with MINOPT

GBD
Local solution   Number of times obtained   Average number of iterations
154997           11                         16
155510           18                         18
161010           1                          14

OAER
Local solution   Number of times obtained   Average number of iterations
154997           10                         3.1
155510           6                          3.8
167602           3                          6
180848           1                          5
189521           1                          5
197983           7                          3.6
199196           1                          5
212678           1                          3

6.2.1 Solution Strategy with the SMIN-αBB algorithm

When using the SMIN-αBB algorithm, the area-dependent cost of the heat
exchangers must be underestimated using the general convex lower bounding
function (37), in order to generate valid lower bounds on the objective func-
tion. The Outer Approximation algorithm is used to solve a lower bounding
convex MINLP at each node of the tree. When this MINLP is feasible, an
upper bound on the objective function is obtained by solving the nonconvex
MINLP locally in the same region. For the heat exchanger between hot
stream i and cold stream j, the convex underestimator is expressed as

Q_{ijk} / { U_{ij} [ΔT_{ijk} ΔT_{ijk+1} (ΔT_{ijk} + ΔT_{ijk+1})/2]^{1/3} }
  - α_Q^{ijk} (Q_{ijk}^U - Q_{ijk}) (Q_{ijk} - Q_{ijk}^L)              (40)
  - α_{ΔT}^{ijk} (ΔT_{ijk}^U - ΔT_{ijk}) (ΔT_{ijk} - ΔT_{ijk}^L)
  - α_{ΔT}^{ijk+1} (ΔT_{ijk+1}^U - ΔT_{ijk+1}) (ΔT_{ijk+1} - ΔT_{ijk+1}^L)

where α_Q^{ijk}, α_{ΔT}^{ijk} and α_{ΔT}^{ijk+1} are non-negative scalars
obtained through one of the methods described by [AAF7a]. The convex
underestimator for process stream-utility exchangers is similar, except that
one of the ΔT's is constant and only two α terms are therefore required. At
the first level of the branch-and-bound tree, all binary variables can take
on a value of either 0 or 1. As a result, every nonconvex term in the
objective function must be underestimated to obtain a lower bound valid for
the entire solution space. However, if branching occurs on the binary
variables, the existence of some units is pre-determined for subsequent
levels of the branch-and-bound tree. Thus, if some variable z_{ijk} is fixed
to 0 at a node of the tree, proper updating of the variable bounds yields
Q_{ijk} = Q_{ijk}^L = Q_{ijk}^U = 0. The bounds on ΔT_{ijk} and ΔT_{ijk+1}
become 10 ≤ ΔT_{ijk} ≤ T_{i,k} - T_{j,k} + Γ and
10 ≤ ΔT_{ijk+1} ≤ T_{i,k+1} - T_{j,k+1} + Γ. Since Γ is a large number,
the convex terms corresponding to the ΔT's do not naturally vanish from
Equation (40). Even though the area of unit (ijk) is 0, its cost appears in
the underestimating objective function as

- α_{ΔT}^{ijk} (ΔT_{ijk}^U - ΔT_{ijk}) (ΔT_{ijk} - ΔT_{ijk}^L)         (41)
- α_{ΔT}^{ijk+1} (ΔT_{ijk+1}^U - ΔT_{ijk+1}) (ΔT_{ijk+1} - ΔT_{ijk+1}^L).
In order to eliminate this redundant term, it is therefore necessary to
introduce modified α parameters which account for the non-existence of a
unit. These new parameters are defined as

α'_{ΔT}^{ijk} = z_{ijk}^U α_{ΔT}^{ijk},
α'_{ΔT}^{ijk+1} = z_{ijk}^U α_{ΔT}^{ijk+1}                              (42)

where z_{ijk}^U is the current upper bound on variable z_{ijk}. According
to Equation (42), if z_{ijk} is fixed to 0, its upper bound z_{ijk}^U is 0
and α'_{ΔT}^{ijk} and
α'_{ΔT}^{ijk+1} vanish. The convex underestimator for unit (ijk) no longer
participates in the lower bounding objective function. On the contrary, if
z_{ijk} is fixed to 1 or remains free to take on the value of 0 or 1, the
convex underestimator is preserved.
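As an illustration, the pieces of Equations (40) and (42) combine for a
single match as in the sketch below; all names are ours, and the α values
are assumed to have been computed beforehand by one of the methods of
[AAF7a].

def underestimator(q, dt1, dt2, u, a_q, a_dt1, a_dt2,
                   q_bounds, dt1_bounds, dt2_bounds, z_upper):
    # Nonconvex area term of Eq. (40), with the Chen mean for the LMTD.
    chen = (dt1 * dt2 * (dt1 + dt2) / 2.0) ** (1.0 / 3.0)
    value = q / (u * chen)
    ql, qu = q_bounds
    d1l, d1u = dt1_bounds
    d2l, d2u = dt2_bounds
    # Quadratic relaxation terms; the dT alphas are scaled by the current
    # upper bound on z_ijk (Eq. (42)), so they vanish for units fixed off.
    value -= a_q * (qu - q) * (q - ql)
    value -= z_upper * a_dt1 * (d1u - dt1) * (dt1 - d1l)
    value -= z_upper * a_dt2 * (d2u - dt2) * (dt2 - d2l)
    return value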
This analysis of the objective function emphasizes the importance of the
branching strategy in the generation of tight lower bounds on the objective
function. Several branching strategies were used for this problem. First, the
continuous variables were branched on exclusively (Run 1). Then, for Runs 2
and 3, the binary variables were branched on first, followed by the continuous
variables. Finally, the "almost-integer" strategy described in Section 4.5.2
was used for Runs 4, 5 and 6. A binary variable was declared to have a low
degree of fractionality if its value z* at the solution of the relaxed MINLP
was such that min{z*, 1-z*} :5 zdist. For Run 4, zdist = 0.1 was used and
for Runs 5 and 6, zdist = 0.2 was used.
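The fractionality test itself is a one-liner; in the sketch below, z_relaxed
is a hypothetical mapping from each binary variable to its value at the
relaxed-MINLP solution.

def almost_integer_candidates(z_relaxed, zdist=0.2):
    # Variables within zdist of 0 or 1 have a low degree of fractionality.
    return [v for v, z in z_relaxed.items() if min(z, 1.0 - z) <= zdist]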
A number of variable bound update strategies were also tested for this
problem. In Runs 1 and 2, updates were performed only for the continuous
variables. In all other runs, the bounds on the binary variables were also
updated. In Run 6, the effect of updating the bounds on only a fraction of
the continuous variables was studied.
The results are shown in Figure 9 and Table 5. Branching on the con-
tinuous variables only results in slow asymptotic convergence of the
algorithm to the global optimum solution (Run 1). The rate of convergence
is greatly
improved when the binary variables can be used for branching (Runs 2 to 6).
Although the "almost-integer" branching strategy exhibits the best perfor-
mance in terms of iterations (Runs 4 to 6), the lowest CPU requirements
correspond to Run 3, which branches on all the binary variables before turn-
ing to the continuous variables. The average time spent on each iteration
of the algorithm is therefore greater when the "almost-integer" strategy is
applied. Two factors can account for this increase in the computational
requirements. First, the selection of a binary branching variable requires
the solution of a nonconvex MINLP. In addition, the generation of a lower
bound on the solution at almost every node of the branch-and-bound tree
for Runs 4 to 6 necessitates the solution of a convex MINLP. By comparison,
only 58% of the nodes in the branch-and-bound tree for Run 3 involve the
solution of a convex MINLP. Lower bounds at the remaining nodes are ob-
tained by solving less expensive convex NLPs. Addressing the combinatorial
aspects of the problem first by branching on the binary variables thus leads
to the better performance of the SMIN-aBB algorithm.

Figure 9: Progress of the lower bound for the heat exchanger network

Table 5: Global optimization of the heat exchanger network. Note that
Run 1 converges asymptotically.

Run   Iterations   CPU sec   Deepest level   Binary branches
1     800          2210      60              -
2     753          1116      26              343
3     604          755       23              173
4     451          1041      18              97
5     422          935       26              112
6     547          945       22              127

6.2.2 Specialized Algorithm for Heat Exchanger Network Problems
A global optimization algorithm specifically designed for this type of prob-
lem was proposed in [ZG97]. The basic framework of this approach is a
branch-and-bound algorithm where the branching variables are the stream
temperatures. Special convex underestimators have been devised for the
cost function, provided there is no stream splitting. Upper bounds on the
problem are obtained by fixing the binary variables and solving a noncon-
vex NLP locally or globally. The branch-and-bound search is preceded by
a heuristic local MINLP optimization step which allows the identification
of a good starting point. No computational results using this approach are
known for the example presented here.

7 Conclusions
As was demonstrated in this paper, mathematical programming techniques
are a valuable tool for the solution of process network applications. The
optimization approach to process synthesis illustrates their use for an im-
portant industrial application. It was shown that this procedure generates
Mixed-Integer Nonlinear Programming problems (MINLPs) and a number
of algorithms capable of addressing such problems were presented, including
decomposition-based methods, branch-and-bound and cutting plane tech-
niques. Considerable progress has been made in handling both the combi-
natorial aspects of the problem as well as nonconvexity issues so that the

global solution of increasingly complex problems can be identified. The
development of the SMIN-αBB and GMIN-αBB algorithms has extended the
class of problems that can rigorously be solved to global optimality.
The increasing capability of MINLP algorithms has permitted the de-
velopment of automated frameworks such as MINOPT, in which general
mathematical representations can be addressed. These developments have
led researchers in numerous fields to employ mathematical modeling and nu-
merical solution through MINLP optimization techniques in order to address
their problems.
A number of issues must be resolved in order to develop algorithms that
can handle more complex and realistic problems. Although computational
power has increased, the ability of MINLP algorithms to solve large scale
problems is still limited: a large number of integer variables leads to com-
binatorial problems, and a large number of continuous variables leads to
the generation of large scale NLPs. In addition, rigorous models capable of
accurately describing industrial operations usually involve complex mathe-
matical expressions and result in problems which are difficult to solve using
standard procedures. Finally, approaches to address important challenges
such as the inclusion of dynamic models and optimal control problems into
the MINLP framework are emerging [SF97a].

8 Acknowledgments
The authors gratefully acknowledge financial support from the National Sci-
ence Foundation, the Air Force Office of Scientific Research, the National
Institutes of Health, and Mobil Technology Company.

References
[AAF7a] C. S. Adjiman, I. P. Androulakis, and C. A. Floudas, Global
optimization of MINLP problems in process synthesis and design, Comput.
Chem. Eng. 21 (1997a), S445-S450.

[AAF7b] C. S. Adjiman, I. P. Androulakis, and C. A. Floudas, A global
optimization method, αBB, for general twice-differentiable NLPs - II.
Implementation and computational results, accepted for publication, 1997b.

[AAMF96] C. S. Adjiman, I. P. Androulakis, C. D. Maranas, and C. A.
Floudas, A global optimisation method, αBB, for process design, Comput.
Chem. Eng. Suppl. 20 (1996), S419-S424.

[ADFN97] C. S. Adjiman, S. Dallwig, C. A. Floudas, and A. Neumaier, A
global optimization method, αBB, for general twice-differentiable NLPs -
I. Theoretical advances, accepted for publication, 1997.

[AF90] A. Aggarwal and C. A. Floudas, Synthesis of general distillation
sequences - nonsharp separations, Comput. Chem. Eng. 14 (1990), no. 6,
631-653.

[AF96] C. S. Adjiman and C. A. Floudas, Rigorous convex underestimators
for general twice-differentiable problems, J. Glob. Opt. 9 (1996), 23-40.

[AK90] F. A. Al-Khayyal, Jointly constrained bilinear programs and related
problems: An overview, Comput. Math. Applic. 19 (1990), no. 11, 53-62.

[AKF83] F. A. Al-Khayyal and J. E. Falk, Jointly constrained biconvex
programming, Math. of Oper. Res. 8 (1983), 273-286.

[AMF95] I. P. Androulakis, C. D. Maranas, and C. A. Floudas, αBB: A
global optimization method for general constrained nonconvex problems,
J. Glob. Opt. 7 (1995), 337-363.

[Bal85] E. Balas, Disjunctive programming and a hierarchy of relaxations
for discrete optimization problems, SIAM Journal on Algebraic and
Discrete Methods 6 (1985), 466-486.

[Bea77] E. M. L. Beale, The State of the Art in Numerical Analysis, ch.
Integer programming, pp. 409-448, Academic Press, 1977.

[Bea90] N. Beaumont, An algorithm for disjunctive programs, European
Journal of Operations Research 48 (1990), no. 3, 362-371.

[Ben62] J. F. Benders, Partitioning procedures for solving mixed-variables
programming problems, Numer. Math. 4 (1962), 238.

[BGG+71] M. Benichou, J. M. Gauthier, P. Girodet, G. Hentges, G. Ribiere,
and O. Vincent, Experiments in mixed-integer linear programming, Math.
Prog. 1 (1971), no. 1, 76-94.

[BKM92] A. Brooke, D. Kendrick, and A. Meeraus, GAMS: A user's guide,
Boyd & Fraser, Danvers, MA, 1992.

[BM91] B. Borchers and J. E. Mitchell, An improved branch and bound
algorithm for mixed integer nonlinear programs, Tech. Report RPI Math
Report No. 200, Rensselaer Polytechnic Institute, 1991.

[DG86] M. A. Duran and I. E. Grossmann, An outer-approximation
algorithm for a class of mixed-integer nonlinear programs, Math. Prog. 36
(1986), 307-339.

[FAC89] C. A. Floudas, A. Aggarwal, and A. R. Ciric, Global optimal
search for nonconvex NLP and MINLP problems, Comput. Chem. Eng. 13
(1989), no. 10, 1117.

[FL94] R. Fletcher and S. Leyffer, Solving mixed integer nonlinear
programs by outer approximation, Math. Prog. 66 (1994), no. 3, 327.

[Flo95] C. A. Floudas, Nonlinear and mixed integer optimization:
Fundamentals and applications, Oxford University Press, 1995.

[Geo72] A. M. Geoffrion, Generalized Benders decomposition, J. Opt.
Theory Applic. 10 (1972), no. 4, 237-260.

[Glo75] F. Glover, Improved linear integer programming formulations of
nonlinear integer problems, Management Sci. 22 (1975), no. 4, 445.

[GR85] O. K. Gupta and R. Ravindran, Branch and bound experiments in
convex nonlinear integer programming, Management Sci. 31 (1985),
no. 12, 1533-1546.

[Gup80] O. K. Gupta, Branch and bound experiments in nonlinear integer
programming, Ph.D. thesis, Purdue University, 1980.

[Hol90] K. Holmberg, On the convergence of the cross decomposition,
Math. Prog. 47 (1990), 269.

[Kel60] J. E. Kelley, The cutting plane method for solving convex
programs, Journal of the SIAM 8 (1960), no. 4, 703-712.

[KG87] G. R. Kocis and I. E. Grossmann, Relaxation strategy for the
structural optimization of process flow sheets, Ind. Eng. Chem. Res. 26
(1987), no. 9, 1869.

[KG89] G. R. Kocis and I. E. Grossmann, A modelling and decomposition
strategy for the MINLP optimization of process flowsheets, Comput. Chem.
Eng. 13 (1989), no. 7, 797-819.

[LW66] E. L. Lawler and D. E. Wood, Branch-and-bound methods: A survey,
Oper. Res. (1966), no. 14, 699-719.

[McC76] G. P. McCormick, Computability of global solutions to factorable
nonconvex programs: Part I - convex underestimating problems, Math.
Prog. 10 (1976), 147-175.

[MM85] H. Mawengkang and B. A. Murtagh, Solving nonlinear integer
programs with large-scale optimization software, Annals of Operations
Research 5 (1985), no. 6, 425-437.

[MM86] H. Mawengkang and B. A. Murtagh, Solving nonlinear integer
programs with large scale optimization software, Ann. of Oper. Res. 5
(1986), 425.

[Moo79] R. E. Moore, Interval analysis, Prentice-Hall, Englewood Cliffs,
NJ, 1979.

[MS93] B. A. Murtagh and M. A. Saunders, MINOS 5.4 user's guide,
Systems Optimization Laboratory, Department of Operations Research,
Stanford University, 1993, Technical Report SOL 83-20R.

[Neu90] A. Neumaier, Interval methods for systems of equations,
Encyclopedia of Mathematics and its Applications, Cambridge University
Press, 1990.

[OOM90] G. M. Ostrovsky, M. G. Ostrovsky, and G. W. Mikhailow, Discrete
optimization of chemical processes, Comput. Chem. Eng. 14 (1990), no. 1,
111.

[PF89] G. E. Paules, IV and C. A. Floudas, APROS: Algorithmic
development methodology for discrete-continuous optimization problems,
Oper. Res. 37 (1989), no. 6, 902-915.

[QG92] I. Quesada and I. E. Grossmann, An LP/NLP based branch and bound
algorithm for convex MINLP optimization problems, Comput. Chem. Eng. 16
(1992), no. 10/11, 937-947.

[RG94] R. Raman and I. E. Grossmann, Modeling and computational
techniques for logic based integer programming, Comput. Chem. Eng. 18
(1994), 563-578.

[RR88] H. Ratschek and J. Rokne, Computer methods for the range of
functions, Ellis Horwood Series in Mathematics and its Applications,
Halsted Press, 1988.

[RS95] H. S. Ryoo and N. V. Sahinidis, Global optimization of nonconvex
NLPs and MINLPs with applications in process design, Comput. Chem. Eng.
19 (1995), no. 5, 551-566.

[SF97a] C. A. Schweiger and C. A. Floudas, Interaction of design and
control: Optimization with dynamic models, Optimal Control: Theory,
Algorithms, and Applications (W. W. Hager and P. M. Pardalos, eds.),
Kluwer Academic Publishers, 1997, accepted for publication.

[SF97b] C. A. Schweiger and C. A. Floudas, MINOPT: A software package
for mixed-integer nonlinear optimization, Princeton University,
Princeton, NJ 08544-5263, 1997, Version 2.0.

[SHW+96] H. Skrifvars, I. Harjunkoski, T. Westerlund, Z. Kravanja, and
R. Pörn, Comparison of different MINLP methods applied on certain
chemical engineering problems, Comput. Chem. Eng. Suppl. 20 (1996),
S333-S338.

[SP97] E. M. B. Smith and C. C. Pantelides, Global optimisation of
nonconvex MINLPs, Comput. Chem. Eng. 21 (1997), S791-S796.

[TG96] M. Türkay and I. E. Grossmann, Logic-based MINLP algorithms for
the optimal synthesis of process networks, Comput. Chem. Eng. 20 (1996),
no. 8, 959-978.

[VEH96] R. Vaidyanathan and M. El-Halwagi, Global optimization of
nonconvex MINLPs by interval analysis, Global Optimization in
Engineering Design (I. E. Grossmann, ed.), Kluwer Academic Publishers,
1996, pp. 175-193.

[VG90] J. Viswanathan and I. E. Grossmann, A combined penalty function
and outer approximation method for MINLP optimization, Comput. Chem.
Eng. 14 (1990), no. 7, 769-782.

[WP95] T. Westerlund and F. Pettersson, An extended cutting plane method
for solving convex MINLP problems, Comput. Chem. Eng. Suppl. 19 (1995),
131-136.

[WPG94] T. Westerlund, F. Pettersson, and I. E. Grossmann, Optimization
of pump configuration problems as a MINLP problem, Comput. Chem. Eng. 18
(1994), no. 9, 845-858.

[YG91] T. F. Yee and I. E. Grossmann, Simultaneous optimization model
for heat exchanger network synthesis, Chemical Engineering Optimization
Models with GAMS (I. E. Grossmann, ed.), CACHE Design Case Studies
Series, vol. 6, 1991.

[ZG97] J. M. Zamora and I. E. Grossmann, A comprehensive global
optimization approach for the synthesis of heat exchanger networks with
no stream splits, Comput. Chem. Eng. 21 (1997), S65-S70.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 77-148
©1998 Kluwer Academic Publishers

Approximate Algorithms and Heuristics for
MAX-SAT

Roberto Battiti
Dipartimento di Matematica, Università di Trento
38050 Povo (Trento), Italy
E-mail: battiti@science.unitn.it

Marco Protasi
Dipartimento di Matematica, Università di Roma "Tor Vergata"
Via della Ricerca Scientifica, 00133 Roma, Italy
E-mail: protasi@mat.utovrm.it

Contents
1 Introduction 78
1.1 Notation and graphical representation . . . . . . . . . . . . . . . .. 79

2 Resolution and Linear Programming 80


2.1 Resolution and backtracking for SAT 80
2.2 Integer programming approaches 83

3 Continuous approaches 85
3.1 An interior point algorithm . . . . . . . 85
3.2 Continuous unconstrained optimization 86

4 Approximate algorithms 86
4.1 Definitions and basic results. . . . . . . . . . 87
4.2 Johnson's approximate algorithms . . . . . . 91
4.3 Randomized algorithms for MAX W-SAT . . 99
4.3.1 A randomized 1/2-approximate algorithm for MAX W-SAT 99
4.3.2 A randomized 3/4-approximate algorithm for MAX W-SAT. 102
4.3.3 A variant of the randomized rounding technique 107
4.4 Another 3/4-approximate algorithm by Yannakakis . . . . 110
4.5 Approximate solution of MAX W-SAT: improvements. 115
4.6 Negative results about approximability. . . . . . . . . . 116

5 A different MAX-SAT problem and completeness results 117

6 Local search 118


6.1 Quality of local optima . . . . . . . . . . . . 120
6.2 Non-oblivious local optima . . . . . . . . . 122
6.2.1 An example of non-oblivious search. 124
6.3 Local search satisfies most 3-SAT formulae. 125
6.4 Randomized search for 2-SAT (Markov processes) 126
7 Memory-less Local Search Heuristics 128
7.1 Simulated Annealing . . . . . . . . . . . . . . . . 128
7.2 GSAT with "random noise" strategies . . . . . . 129
7.3 Randomized Greedy and Local Search (GRASP) 130
8 History-sensitive Heuristics 132
8.1 Prohibition-based Search: TS and SAMD 132
8.2 HSAT and "clause weighting" . . . . . . . 133
8.3 Reactive Search . . . . . . . . . . . . . . . 133
8.3.1 The Hamming-Reactive Tabu Search (H-RTS) algorithm. 135
9 Experimental analysis and threshold effects 136
9.1 Models . . . . . . . . . . . . . . 137
9.2 Hardness and threshold effects . 138

References

1 Introduction
In the Maximum Satisfiability (MAX-SAT) problem one is given a Boolean
formula in conjunctive normal form, i.e., as a conjunction of clauses, each
clause being a disjunction. The task is to find an assignment of truth values
to the variables that satisfies the maximum number of clauses.
In our work, n is the number of variables and m the number of clauses,
so that a formula has the following form:

∧_{1≤i≤m} ( ∨_{1≤k≤|C_i|} l_{ik} )

where |C_i| is the number of literals in clause C_i and l_{ik} is a literal,
i.e., a propositional variable u_j or its negation ¬u_j, for 1 ≤ j ≤ n. The
set of clauses in the formula is denoted by C. If one associates a weight
w_i to each clause C_i one obtains the weighted MAX-SAT problem, denoted as

MAX W-SAT: one is to determine the assignment of truth values to the n
variables that maximizes the sum of the weights of the satisfied clauses. Of
variables that maximizes the sum of the weights of the satisfied clauses. Of
course, MAX-SAT is contained in MAX W-SAT (all weights are equal to
one). In the literature one often considers problems with different numbers
k of literals per clause, defined as MAX-k-SAT, or MAX W-k-SAT in
the weighted case. In some papers MAX-k-SAT instances contain up to k
literals per clause, while in other papers they contain exactly k literals per
clause. We consider the second option unless otherwise stated.
MAX-SAT is of considerable interest not only from the theoretical side
but also from the practical one. On one hand, the decision version SAT was
the first example of an NP-complete problem [25], moreover MAX-SAT
and related variants play an important role in the characterization of dif-
ferent approximation classes like APX and PT.AS [8]. On the other hand,
many issues in mathematical logic and artificial intelligence can be expressed
in the form of satisfiability or some of its variants, like constraint satisfac-
tion. Some exemplary problems are consistency in expert system knowledge
bases [69], integrity constraints in databases [7, 35], approaches to inductive
inference [49, 56], asynchronous circuit synthesis [46, 74].
The main purpose of this work is that of summarizing the basic ap-
proaches for the exact or approximated solution of the MAX W-SAT and
MAX-SAT problem. The presentation of algorithms for the related SAT
problem is therefore limited to a quick overview of some basic techniques
and of methods that can be used also for MAX-SAT. Of course, given the
impressive extension of the research in this area, we are not aiming at a
comprehensive survey of the literature, and we have confined ourselves to
citing the sources that we have used, some sources of historical significance,
and some papers that are paradigmatic for the different approaches.

1.1 Notation and graphical representation


A clause will be represented either as C = u ∨ v ∨ z or as a set of
literals, as in C = {u, v, z}.
For the following discussion, it can be useful to help the intuition with a
graphical representation of a formula in conjunctive normal form, as
depicted in Fig. 1. In the figure, one has a case of MAX 3-SAT: all clauses
have three literals and the formula is:

(u_1 ∨ u_3 ∨ u_5) ∧ (u_2 ∨ u_4 ∨ u_5) ∧ (u_1 ∨ u_3 ∨ u_4)

Truth values to variables are assigned by placing a black triangle to the left
if the variable is true, to the right if it is false. Each literal is depicted with

a small circle, placed to the left if the corresponding variable is true, to
the right in the other case. If a literal is matched by the current
assignment (e.g., if the literal asks for a true value and the variable is
set to true, or if it asks for false and the variable is false), it is shown
with a gray shade. The coverage of a clause is the number of literals in the
clause that are matched by the current assignment, and it is illustrated by
placing a black square in the appropriate position of an array with indices
ranging from 0 to the number of literals |C_i| in each clause.

Figure 1: A formula in conjunctive normal form (CNF).

2 Resolution and Linear Programming


2.1 Resolution and backtracking for SAT
The basic method to solve SAT formulae is given by the recursive replace-
ment of a formula by one or more formulae, the solution of which implies
the solution of the original formula.
In resolution a variable is selected and a new clause, called the resolvent
is added to the original formula. The process is repeated to exhaustion or
until an empty clause is generated. The original formula is not satisfiable if
and only if an empty clause is generated [77].
Let us now consider some details: A clause R is the resolvent of clauses
C_1 and C_2 iff there is a literal l ∈ C_1 with ¬l ∈ C_2 such that
R = (C_1 \ {l}) ∪ (C_2 \ {¬l}) and u(l), the variable associated to the
literal, is the only variable appearing both positively and negatively.
For the two clauses C_1 = (l ∨ a_1 ∨ ... ∨ a_A) and
C_2 = (¬l ∨ b_1 ∨ ... ∨ b_B) the resolvent is therefore the clause
R = (a_1 ∨ ... ∨ a_A ∨ b_1 ∨ ... ∨ b_B). The resolvent

is a logical consequence of the logical and of the two clauses. Therefore, if
the resolvent is added to the original set of clauses, the set of solutions
does not change. It is immediate to check that, if both C_1 and C_2 are
satisfied, i.e., have at least one matched literal, the resolvent must also
be satisfied. In fact, if it is not, in the original clauses there are no
matched literals apart from either l or ¬l, but this implies that both
clauses cannot be satisfied (see also Fig. 2 for a graphical illustration).

Figure 2: How to construct a resolvent, an example with variables l, a, b,
c, d.
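Under the signed-integer clause encoding introduced in Section 1, the
resolution rule is a few lines of Python; this is a sketch, with clauses
represented as frozensets of literals.

def resolvent(c1, c2, lit):
    """Resolvent of c1 and c2 on literal lit, with lit in c1, -lit in c2."""
    assert lit in c1 and -lit in c2
    return (c1 - {lit}) | (c2 - {-lit})

# Example: resolving (l v a) with (not-l v b) on l yields (a v b):
# resolvent(frozenset({1, 2}), frozenset({-1, 3}), 1) == frozenset({2, 3})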

Davis and Putnam [29] started in 1960 the investigation of useful strate-
gies for handling resolution. In addition to applying transformations that
preserve the set of solutions they eliminate one variable at a time in a cho-
sen order by using all possible resolvents on that variable. During resolution
the lengths and the number of added clauses can easily increase and become
extremely large.
Davis, Logemann and Loveland [28] avoid the memory explosion of the
original DP algorithm by replacing the resolution rule with the splitting rule
(Davis, Putnam, Logemann and Loveland, or DPLL algorithm for short).
In splitting, a variable u in a formula is selected. Now, if there exists a
satisfying truth assignment for the original formula then either u or ¬u is
true in the assignment. In the first case the formula obtained by
eliminating all clauses containing u and by deleting all occurrences of ¬u
must be satisfied, see Fig. 4. This derived formula is called C(u) in
Fig. 3. In the second case, the formula obtained by eliminating all clauses
containing ¬u and all occurrences of u must be satisfied. Vice versa, if
both derived formulae
cannot be satisfied, neither can the original problem.
A tree is therefore generated. At the root one has the original problem

DPLL(C : set of clauses)
Input: Boolean CNF formula C = {C_1, C_2, ..., C_m}
Output: Yes or No (decision about satisfiability)
1 if C is empty then return Yes
2 if C contains an empty clause then return No
3 if there is a pure literal l in C then return DPLL(C(l))
4 if there is a unit clause {l} ∈ C then return DPLL(C(l))
5 Select a variable u in C
6 if DPLL(C(u)) = Yes then return Yes
7 else return DPLL(C(¬u))

Figure 3: The DPLL algorithm by Davis, Logemann and Loveland in re-
cursive form. The recursive calls are executed on the problems derived after
setting the truth value of the selected variable.
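The same procedure runs in a few lines of Python; a sketch under the
signed-integer clause encoding, with comments referring to the line numbers
of Figure 3.

def dpll(clauses):
    """clauses: set of frozensets of signed integers; True iff satisfiable."""
    if not clauses:
        return True                     # line 1: no clauses left
    if frozenset() in clauses:
        return False                    # line 2: an empty clause was derived
    def simplify(lit):                  # C(lit): fix literal lit to true
        return {c - {-lit} for c in clauses if lit not in c}
    lits = {l for c in clauses for l in c}
    for l in lits:                      # line 3: pure literal
        if -l not in lits:
            return dpll(simplify(l))
    for c in clauses:                   # line 4: unit clause
        if len(c) == 1:
            return dpll(simplify(next(iter(c))))
    u = abs(next(iter(lits)))           # line 5: select a variable
    return dpll(simplify(u)) or dpll(simplify(-u))   # lines 6-7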

Figure 4: Example of splitting on a variable u.

and no variables are assigned values. At each node of the tree one generates
two children by selecting one of the yet unassigned variables in the problem
corresponding to the node and by generating the two problems derived by
setting the variable to true or false. A trivial upper bound on the number
of nodes in the tree is proportional to the number of possible assignments,
i.e., 0(2"). In fact, sophisticated techniques are available to reduce the
number of nodes, that nonetheless remains exponential in the worst case.

The techniques include:

• avoiding the examination of a subtree when the fate of the current
  problem is decided (problems with an empty clause have no solutions,
  problems with no clauses have a solution). If the current problem
  cannot be solved, or if it is solved but one wants all possible solutions,
  one backtracks to the first unexplored branch of the tree. Note that,
  when splitting is combined with a depth-first search of the tree (as in
  the DPLL algorithm) one avoids the memory explosion because only
  one subproblem is active at a given time.
• selecting the next variable for the splitting based on appropriate cri-
  teria. For example, one can prefer variables that appear in clauses
  of length one (unit clause rule), or select a pure literal (one that
  occurs only positively, or only negatively), or select a literal
  occurring in the smallest clause.
A recent review of advanced techniques for resolution and splitting is
presented in [45], and a summary of algorithms for deciding propositional
tautologies is presented in [64].
It is worth noting that approaches based on the Davis-Putnam scheme
tend to achieve the fastest speed when solving SAT problems. A recent
state-of-the-art parallel implementation is given in [16]. Their previous
sequential implementation turned out to be the fastest program in a SAT
competition [18].

2.2 Integer programming approaches


The MAX W-SAT problem has a natural integer linear programming for-
mulation (ILP). Let y_j = 1 if Boolean variable u_j is true, y_j = 0 if it
is false, and let the Boolean variable z_i = 1 if clause C_i is satisfied,
z_i = 0 otherwise. The integer linear program is:

max Σ_{i=1}^{m} w_i z_i

subject to the following constraints:

Σ_{j∈U_i^+} y_j + Σ_{j∈U_i^-} (1 - y_j) ≥ z_i,   i = 1, ..., m

y_j ∈ {0, 1},   j = 1, ..., n
z_i ∈ {0, 1},   i = 1, ..., m

where U_i^+ and U_i^- denote the sets of indices of variables that appear
unnegated and negated in clause C_i, respectively.
Because the sum of the w_i z_i is maximized and because each z_i appears
as the right-hand side of one constraint only, z_i will be equal to one if
and only if clause C_i is satisfied.
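For illustration, the ILP can be stated directly with an off-the-shelf
modeling layer. The sketch below assumes the open-source PuLP package and
the signed-integer clause encoding used earlier; it is one of many possible
ways to write the model, not a prescribed one.

import pulp

def maxsat_ilp(clauses, weights, n):
    prob = pulp.LpProblem("max_w_sat", pulp.LpMaximize)
    y = [pulp.LpVariable("y%d" % j, cat="Binary") for j in range(1, n + 1)]
    z = [pulp.LpVariable("z%d" % i, cat="Binary")
         for i in range(len(clauses))]
    prob += pulp.lpSum(w * zi for w, zi in zip(weights, z))   # objective
    for zi, clause in zip(z, clauses):
        # the satisfied literals of clause C_i must cover z_i
        prob += pulp.lpSum(y[l - 1] if l > 0 else 1 - y[-l - 1]
                           for l in clause) >= zi
    prob.solve()
    return [int(v.value()) for v in y]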
If one neglects the objective function and sets all z_i variables to 1, one
obtains an integer programming feasibility problem associated to the SAT
problem [15].
The integer linear programming formulation of MAX-SAT suggests that
this problem could be solved by a branch-and-bound method. A tree is
generated, see also the DPLL method, where the root corresponds to the
initial instance and two children are obtained by branching, i.e., by
selecting one free variable and setting it true (left child) and false
(right child). An upper bound on the number of satisfied clauses can be
obtained by using a linear programming relaxation: the constraints
y_j ∈ {0, 1} and z_i ∈ {0, 1} are replaced by y_j ∈ [0, 1] and
z_i ∈ [0, 1]. One obtains a Linear Programming (LP) problem that can be
solved in polynomial time and, because the set of admissible solutions is
enlarged with respect to the original problem, one obtains an upper bound.
Unfortunately this is not likely to work well in practice [48] because the
solution y_j = 1/2, j = 1, ..., n, z_i = 1, i = 1, ..., m is feasible for
the LP relaxation unless there exists some constraint containing only one
variable. The bounds so obtained would be very poor.
Better bounds can be obtained by using Chvatal cuts. In [49] it is shown
that the resolvents in the propositional calculus correspond to certain cutting
planes in the integer programming model of inference problems.
A general cutting plane algorithm for ILP, see for example [73], works
as follows. One solves the LP relaxation of the problem: if the solution is
integer the algorithm terminates, otherwise one adds linear constraints to
the ILP that do not exclude integer feasible points. The constraints are
added one at a time, until the solution to the LP relaxation is integer.
The application considered in [49] is to determine whether a formula in
the propositional calculus implies another one. The propositional calculus
is a formal logic involving propositions and logic connectives such as "not",
"and", "or" and "implies". Any formula is equivalent to a conjunction of
clauses (in particular, a rule in a knowledge base, such as "if A and B
then G" can be written as a clause "not-A or not-B or G"). Thus, a set
of clauses can be represented by a particular linear system Ax ~ a, also
called generalized set covering problem. Finally, one can determine whether
Ax ~ a implies a formula F by expressing not-F as a system Bx ~ b of
Approximate Algorithms for MAX-SAT 85

clauses, see for example [65], and checking whether the combined system
Ax ~ a, Bx ~ b has a binary solution. If it does not, Ax ~ a implies F.
In the applications Ax ~ a can represent a knowledge base known to be
consistent.
Cutting planes are generated in [49] by finding separating resolvents, i.e.,
a resolvent (expressed as a linear inequality) that is violated by the current
solution of the LP relaxation. When no separating resolvents are found, the
current system is solved with branch-and-bound. The experimental results
are that the cutting plane algorithm is orders of magnitude faster on prob-
lems in which the premises do not imply the given proposition (the majority
of random problems), and moderately faster on other random problems [49].
LP relaxations of integer linear programming formulations of MAX-SAT
have been used to obtain upper bounds in [47, 84, 39]. A linear program-
ming and rounding approach for MAX 2-SAT is presented in [22]. Their
cutting plane algorithm starts from the LP relaxation of MAX 2-SAT, and
has separation routines for two families of cuts: cycle and wheel inequalities.
Upper and lower bounds are found, the latter by using a rounding proce-
dure to convert a fractional LP solution to a {0, 1} solution. A method for
strengthening the Generalized Set Covering formulation is presented in [70],
where Lagrangian multipliers guide the generation of cutting planes.

3 Continuous approaches
3.1 An interior point algorithm
The ILP feasibility problem obtained from SAT as described in the pre-
vious section is solved with an interior point algorithm in [56, 57]. In the
interior point algorithm one applies a function minimization method based
on continuous mathematics to the inherently discrete SAT problem.
In [57] the application is to a problem of inductive inference, in which
one aims at identifying a hidden Boolean function using outputs obtained
by applying a limited number of random inputs to the hidden function. The
task is formulated as a SAT problem, which is in turn formulated as an
integer linear program:

A^T y ≤ c,   y ∈ {-1, 1}^n                    (1)

where A^T is an m × n real matrix and c a real m-vector.


The interior point algorithm is based on finding a local minimum in the

box -1 ≤ y_j ≤ 1 of the potential function:

(2)

by an iterative method. The denominator of the argument of the log is the
geometric mean of the slacks (a_k is the k-th column of matrix A). It is
shown that, if the integer linear program has a solution, y* is a global
minimum of this potential function if and only if y* solves the integer
program. The next iterate y^{k+1} (an interior point solution, i.e., such
that A^T y < c) is obtained by moving in a descent direction Δy from the
current iterate y^k such that Φ(y^{k+1}) = Φ(y^k + α Δy) < Φ(y^k). Each
iteration in [57] is based on the trust region approach of continuous
optimization where the Riemannian metric used for defining the search
region is dynamically modified. The feasibility of the approach for
inductive inference is demonstrated in [57].

3.2 Continuous unconstrained optimization


In some techniques the MAX-SAT (or SAT) problem is transformed into
an unconstrained optimization problem on the real space R^n and solved by
using existing global optimization techniques.
Some examples of this approach include the UNISAT models [43] and
the neural network approaches [54, 19]. In general, these techniques do not
have performance guarantees because they assure only the local convergence
to a locally optimal point, not necessarily the global optimum.
The local convergence properties of some optimization algorithms are
considered in [44]. The main results are that, for any CNF formula, if y*
is a solution point of the objective function f defined on R^n associated
to the problem, i.e., f(y*) = 0, then the Hessian matrix H(y*) is positive
definite and therefore the convergence ratios of the steepest descent,
Newton's method and coordinate descent methods can be derived, see [44]
for the details. Let us note that, to obtain these results, one assumes
that the initial solution is "sufficiently close" to the optimal solution.

4 Approximate algorithms
The present section presents the first important approximate algorithms for
MAX-SAT. However, in order to evaluate the goodness of the algorithms,
one needs to define the meaning of approximation algorithm with a "guar-
anteed" quality of approximation.

In the following it is assumed that the reader is familiar with the
complexity classes P and NP and with elementary concepts from probability
theory.

4.1 Definitions and basic results


First of all, let us present a general definition of optimization problem.

Definition 4.1 An optimization problem P = (I, sol, m, opt) belongs to the
class NPO if the following holds:

1. the set of instances I is recognizable in polynomial time,
2. given an instance x ∈ I, sol(x) is the set of the feasible solutions of
   x; moreover there exists a polynomial q such that, given an instance
   x ∈ I, for any y ∈ sol(x), |y| < q(|x|) and, besides, for any y such
   that |y| < q(|x|), it is decidable in polynomial time whether
   y ∈ sol(x),
3. given an instance x ∈ I and a feasible solution y of x, m(x, y) is the
   objective function and is computable in (deterministic) polynomial time,
4. opt ∈ {max, min} specifies whether one has a maximization or mini-
   mization problem.

Finally m*(x) will denote the optimal value of instance x. When it is
clear from the context, one will use simply m*.
A problem belonging to NPO will be called an NPO problem.
Note that the difficulty of solving an NPO problem is based on the fact
that, in many cases, the set of feasible solutions is exponentially large.
Even if not explicitly stated, there is a nondeterministic polynomial
time computation model underlying this definition. The nondeterminis-
tic machine of polynomial complexity may run in the following way: in
non-deterministic polynomial time, all strings y such that |y| < q(|x|) are
generated. Afterwards any string is tested for membership in sol(x) in poly-
nomial time. If the test is positive, m(x, y) is computed (again in
polynomial time) and both y and m(x, y) are returned.
The definition of the class NPO formalizes the notion of optimization
problem with an associated decision version which is in NP. In addition,
let us define as PO the subclass of NPO formed by problems that can be
solved in polynomial time. Many classical combinatorial problems belong to
the class NPO; for instance, the traveling salesperson problem, the
knapsack problem, the minimal covering of a graph and so on.

MAX W-SAT is another important example of an NPO problem. In this
case one has:

1. I = sets U of Boolean variables and a collection C = C_1, ..., C_m of
   clauses over U, a set W = w_1, ..., w_m of integers (weights) associated
   to the clauses;
2. given an instance x of I, sol(x) = set of truth assignments U to the
   variables in the problem; moreover |U| < q(|x|) and it is possible to
   decide in polynomial time whether a string is a truth assignment for
   the formula;
3. given an instance x of I and a feasible solution y of x, m(x, y) =
   sum of the weights associated to the satisfied clauses; m is trivially
   computable in polynomial time;
4. opt = max.

A special case is given by the MAX-SAT problem, obtained when all weights
are equal to one. Note that, in this definition, the set of Boolean
variables and a truth assignment are denoted with the same symbol U; even
if formally questionable, this identification will allow to simplify the
presentation of many results. However, when this abuse of notation could
raise problems to the reader, different symbols will be used.
Because an NPO problem that is not in PO cannot be solved in polyno-
mial time unless P = NP, a natural approach consists of looking for "good"
approximate solutions.
Definition 4.2 Given an NPO problem P = (I, sol, m, opt), an algorithm
A is an approximation algorithm if, for any given instance x ∈ I, it
returns an approximate solution, that is a feasible solution A(x) ∈ sol(x).
Because the present work is dedicated to MAX-SAT the following def-
initions will be restricted to the case of maximization problems.
An approximation algorithm can be usefully applied only if it achieves
approximate solutions whose values are "near" to the optimum value. There-
fore one is interested in determining how far from the optimal value is the
value of the achieved solution.
Definition 4.3 Given an NPO problem P, an instance x and a feasible
solution y, the performance ratio of y is

R(x, y) = m(x, y) / m*(x).

When the performance ratio is close to 1, the value of the approximate
solution is close to the optimum one.

Definition 4.4 Given an NPO problem P and an approximation algorithm
A, A is said to be an ε-approximate algorithm if, given any input instance
x, the performance ratio of the approximate solution A(x) verifies the
following relation:

R(x, A(x)) ≥ ε.

In other words, the solution provided by the algorithm must guarantee
at least a value ε m*(x).

Definition 4.5 An NPO problem P is ε-approximable if there exists a
polynomial-time ε-approximate algorithm for P, with 0 < ε < 1.

Definition 4.6 APX is the class of all NPO problems that are ε-approx-
imable.

For a problem to join APX it is sufficient that the performance ratio is
greater than or equal to ε for a particular value of ε. Of course, the
goodness of the approximation algorithm strictly depends on how near ε is
to 1.
In fact, in the class APX there are problems with different values of ε
and therefore with different approximation properties. After taking this
fact into account, the definition of ε-approximate algorithm can be
strengthened in the following way:

Definition 4.7 Let P be an NPO problem. An algorithm A is said to be a
polynomial-time approximation scheme (PTAS) for P if, for any instance
x of P and any rational value 0 < ε < 1, A(x, ε) returns an ε-approximate
solution of x in time polynomial in the size of the instance x.

Definition 4.8 PTAS is the class of NPO problems that allow a polynomial-
time approximation scheme.

The following theorem holds:

Theorem 4.1
• MAX W-SAT belongs to the class APX.
• MAX W-SAT does not belong to the class PTAS unless P = NP.

The first part of the theorem will be demonstrated in the sequel, while
citations will be given for the second part.
Finally, let us conclude this subsection by introducing the important
notion of completeness in an approximation class. As is done in the
NP-completeness theory, one is interested in finding the "hardest"
problems in the classes APX and PTAS, that is, the most difficult ones
from a computational point of view. One looks for problems which cannot
have stronger approximation properties unless P = NP.
In complexity theory, the notion of hardest problem in a class is equiva-
lent to saying that a problem is complete with respect to a suitable
reduction. The same approach can be followed for approximation classes.
Therefore, a definition of approximation-preserving reduction is presented
and one will be able to define a complete problem. Actually, many different
reductions have been proposed. In this paper we consider the reduction
presented in [8] which has the relevant advantage that it can be used for
defining the notion of completeness both in APX and in PTAS.
Intuitively, in order to map an optimization problem P_1 into another
optimization problem P_2, we need not only a function f mapping instances
of P_1 into instances of P_2 but also a second function g mapping back
feasible solutions of P_2 into feasible solutions of P_1, see also Fig. 5.

Definition 4.9 Let P_1 and P_2 be two NPO problems. P_1 is said to be
PTAS-reducible to P_2 (P_1 ≤_PTAS P_2) if three functions f, g, and c exist
such that

1. For any x ∈ I_{P_1}, f(x) ∈ I_{P_2} is computable in polynomial time.

2. For any x ∈ I_{P_1}, for any y ∈ sol_{P_2}(f(x)), and for any ε ∈ (0,1)_Q (the set
of positive rational numbers smaller than 1), g(x, y, ε) ∈ sol_{P_1}(x) is
computable in time polynomial with respect to both |x| and |y|.

3. c : (0,1)_Q → (0,1)_Q is computable and surjective.

4. For any x ∈ I_{P_1}, for any y ∈ sol_{P_2}(f(x)), and for any ε ∈ (0,1)_Q,
R_{P_2}(f(x), y) ≥ 1 − c(ε) implies R_{P_1}(x, g(x, y, ε)) ≥ 1 − ε.

The triple (f, g, c) is said to be a PTAS-reduction from P_1 to P_2.

It is easy to demonstrate the following Lemma:

Lemma 4.2 If P_1 ≤_PTAS P_2 and P_2 ∈ APX (respectively, P_2 ∈ PTAS),
then P_1 ∈ APX (respectively, P_1 ∈ PTAS).

Figure 5: Approximation preserving reduction: the function f maps instances of P_1 to instances of P_2, and g maps solutions back.

For a proof see [8].


Definition 4.10 A problem P_1 ∈ NPO (respectively, P_1 ∈ APX) is NPO-complete
(respectively, APX-complete) if, for any P_2 ∈ NPO (respectively,
P_2 ∈ APX), P_2 ≤_PTAS P_1.

The above Lemma shows that the reduction we have introduced is really
capable of preserving the level of approximability. Moreover, a consequence
of the definition of NPO-completeness (respectively, APX-completeness)
is that an NPO-complete (respectively, APX-complete) problem does not
belong to APX (respectively, to PTAS) unless P = NP.

4.2 Johnson's approximate algorithms

Let us now present the first two approximate algorithms for MAX W-SAT.
They were proposed by Johnson [52] and use greedy construction strategies.

The original paper [52] demonstrated for both of them a performance ratio
1/2. Recently it has been proved in [21] that the second one reaches a
performance ratio 2/3. The two algorithms by Johnson are presented for
the unweighted case: it is a simple exercise to add weights.
The first algorithm chooses, at each step, the literal that occurs in the
maximum number of clauses. If the literal is positive, the corresponding
variable is set to true; if the literal is negative, the corresponding variable
is set to false. The clauses satisfied by the literal are deleted from the
formula and the algorithm stops when the formula is satisfied or all variables
have been assigned values. More formally, this procedure is developed in
algorithm GREEDY JOHNSONl of Fig. 6.
GREEDYJOHNSON1
Input: Boolean CNF formula C = {C_1, C_2, …, C_m};
Output: Truth assignment U;
▷ The satisfied clauses will be incrementally inserted in the set S;
▷ U is the truth assignment;
▷ for every literal l, u(l) is the corresponding variable;
1 S ← ∅; LEFT ← C; V ← {u | u variable in C};
2 repeat
3   Find l, with u(l) ∈ V, that is in the max. no. of clauses in LEFT
4   Solve ties arbitrarily
5   Let {C_{l_1}, …, C_{l_k}} be the clauses in which l occurs
6   S ← S ∪ {C_{l_1}, …, C_{l_k}}
7   LEFT ← LEFT \ {C_{l_1}, …, C_{l_k}}
8   if l is positive then u(l) ← true else u(l) ← false
9   V ← V \ {u(l)}
10 until no literal l with u(l) ∈ V is contained in any clause of LEFT
11 if V ≠ ∅ then forall u ∈ V do u ← true
12 return U

Figure 6: The GREEDYJOHNSON1 algorithm, a k/(k+1)-approximate algorithm.
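For readers who prefer executable code, the following is a minimal Python transcription of Fig. 6. It is only a sketch: the clause encoding (a clause as a list of nonzero integers, +i for u_i and -i for its negation) and all function names are our own conventions, not taken from [52].

    def greedy_johnson1(clauses, n):
        left = set(range(len(clauses)))          # indices of clauses still in LEFT
        unset = set(range(1, n + 1))             # variables still in V
        truth = {}
        while True:
            counts = {}                          # occurrences of eligible literals in LEFT
            for j in left:
                for lit in clauses[j]:
                    if abs(lit) in unset:
                        counts[lit] = counts.get(lit, 0) + 1
            if not counts:
                break                            # line 10: no eligible literal remains
            best = max(counts, key=counts.get)   # lines 3-4: max occurrences, ties arbitrary
            truth[abs(best)] = best > 0          # line 8
            unset.discard(abs(best))             # line 9
            left = {j for j in left if best not in clauses[j]}   # lines 6-7
        for v in unset:                          # line 11: leftover variables default to true
            truth[v] = True
        return truth

Each iteration removes at least the clauses satisfied by the chosen literal, so the loop executes at most n times.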

Theorem 4.3 Algorithm GREEDYJOHNSON1 is a polynomial time 1/2-approximate
algorithm for MAX-SAT.

Proof. One can prove that, given a formula with m clauses, algorithm
GREEDYJOHNSON1 always satisfies at least m/2 clauses, by induction on
the number of variables. Because no optimal solution can be larger than
m, the theorem follows. The result is trivially true in the case of one variable.
Let us assume that it is true in the case of i − 1 variables (i > 1)
and let us consider the case in which one has i variables. Let u be the first
variable to which a truth value is assigned. We can suppose that u
appears positive in k_1 clauses, negative in k_2 clauses and does not appear in
m − k_1 − k_2 clauses. Without loss of generality suppose that k_1 ≥ k_2. Then,
by inductive hypothesis, algorithm GREEDYJOHNSON1 allows us to choose
suitable values for the remaining i − 1 variables in such a way to satisfy at
least (m − k_1 − k_2)/2 clauses; if according to the algorithm we now choose
u = true we satisfy at least

k_1 + (m − k_1 − k_2)/2 ≥ (k_1 + k_2)/2 + (m − k_1 − k_2)/2 = m/2

clauses. ■
Figure 7: Illustration of the GREEDYJOHNSON1 algorithm: the greedy choice satisfies at least half of the clauses containing the chosen variable, the inductive hypothesis accounts for at least half of the remaining clauses, and the unmatched occurrences are the "wounds".

Let us note that the above proof does not use the fact that the chosen literal occurs
in the maximum number of clauses. What is required is
that, given an unset variable that appears in at least one unsatisfied clause,
the variable is set to true or false in a way that maximizes the number of
newly satisfied clauses.
This result can be made more specific by considering the number of
literals in a clause.
Theorem 4.4 Let k be the minimum number of literals occurring in any
clause of the formula. For any integer k ≥ 1, algorithm GREEDYJOHNSON1
achieves a feasible solution y of an instance x such that

m(x, y) / m*(x) ≥ 1 − 1/(k + 1).
Proof. Because of the greediness, when literal l is picked in line 3 of Fig. 6,
the number of newly satisfied clauses is at least as large as the number of
new wounds, defined as the number of occurrences of literal l̄ in clauses
of LEFT that will never be matched in the future steps, given the choice
of l, see Fig. 7. When the algorithm halts, the only clauses remaining in
LEFT are those that have a number of wounds equal to the number of their
literals, and hence are dead. This means that, when the algorithm halts,
there are at least k|LEFT| wounds, and therefore |S| ≥ k|LEFT|. Thus
m* ≤ m = |S| + |LEFT| ≤ ((k+1)/k)|S|. The bound follows. ■

Note that, according to the definition of performance ratio, algorithm
GREEDYJOHNSON1 is k/(k+1)-approximate. In particular, for k = 1, the
performance ratio is 1/2, for k = 2 the performance ratio is 2/3, for k = 3 the
performance ratio is 3/4 and so on. This means that the goodness of the
algorithm improves for larger values of k. Therefore the worst case is given
by k = 1, that is, when one has unit clauses (clauses with just one literal).
Johnson introduced a second algorithm (GREEDYJOHNSON2). This algorithm
improves the performance ratio and obtains a bound 2/3 [21]. Until
very recently, only a performance ratio 1/2 was demonstrated [52]. The original
theorem in [52] is here presented, because of its simplicity and paradigmatic
nature and because it gives a better performance as a function of
k, the minimum number of literals in some clause. In the algorithm one
associates a mass w(C_i) = 2^{−|C_i|} to each clause. The term mass is used
instead of the original term "weight" in order to avoid confusion with the
clause weight in the MAX W-SAT problem. The mass will be proportional
to the weight in the version of the algorithm for the MAX W-SAT problem
(w(C_i) = w_i 2^{−|C_i|}). In [52] the analysis of the performance of algorithm
GREEDYJOHNSON2 leads to the following:

Theorem 4.5 Let k be the minimum number of literals occurring in any
clause of the formula. For any integer k ≥ 1, algorithm GREEDYJOHNSON2
achieves a feasible solution y of an instance x such that

m(x, y) / m*(x) ≥ 1 − 1/2^k.

GREEDYJOHNSON2
Input: Boolean CNF formula C = {C_1, C_2, …, C_m};
Output: Truth assignment U;
▷ The satisfied clauses will be incrementally inserted in the set S;
▷ U is the truth assignment;
▷ for every literal l, let u(l) be the corresponding variable;
1 S ← ∅; LEFT ← C; V ← {u | u variable in C};
2 Assign to each clause C_i a mass w(C_i) = 2^{−|C_i|}
3 repeat
4   Determine u ∈ V, appearing in at least a clause ∈ LEFT; V ← V \ {u}
5   Let CT be the clauses ∈ LEFT containing u, CF those containing ū
6   if Σ_{C_i∈CT} w(C_i) ≥ Σ_{C_i∈CF} w(C_i) then
7     u ← true
8     S ← S ∪ CT
9     LEFT ← LEFT \ CT
10    forall C_i ∈ CF do w(C_i) ← 2·w(C_i)
11  else
12    u ← false
13    S ← S ∪ CF
14    LEFT ← LEFT \ CF
15    forall C_i ∈ CT do w(C_i) ← 2·w(C_i)
16 until no literal l in any clause of LEFT is such that u(l) is in V
17 if V ≠ ∅ then forall u ∈ V do u ← true
18 return U

Figure 8: The GREEDYJOHNSON2 algorithm, a (1 − 1/2^k)-approximate algorithm.
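A corresponding Python sketch of Fig. 8, under the same clause encoding used before; the variable order and the handling of unconstrained variables are our own choices, while ties go to true as in line 6.

    def greedy_johnson2(clauses, n, weights=None):
        # Mass of clause C_i: w_i * 2**(-|C_i|); wounds double the mass (lines 10, 15).
        if weights is None:
            weights = [1.0] * len(clauses)
        mass = {j: weights[j] * 2.0 ** (-len(c)) for j, c in enumerate(clauses)}
        left = set(mass)
        truth = {}
        for v in range(1, n + 1):                    # one pass over the variables
            ct = [j for j in left if v in clauses[j]]    # clauses containing u_v
            cf = [j for j in left if -v in clauses[j]]   # clauses containing its negation
            set_true = sum(mass[j] for j in ct) >= sum(mass[j] for j in cf)  # line 6
            truth[v] = set_true                      # unconstrained variables default to true
            satisfied, wounded = (ct, cf) if set_true else (cf, ct)
            left -= set(satisfied)                   # lines 8-9 / 13-14
            for j in wounded:
                mass[j] *= 2.0                       # lines 10 / 15
        return truth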

Proof. Initially, because each clause has at least k literals, the total mass of
all the clauses in LEFT cannot exceed m/2^k. During each iteration, the total
mass of the clauses in LEFT cannot increase. In fact, the mass removed from
LEFT is at least as large as the mass added to those remaining clauses which
receive new wounds, see lines 6-15 of Fig. 8. Therefore, when the algorithm
halts, the total mass still cannot exceed m/2^k. But each of the dead clauses
in LEFT when the algorithm halts must have been wounded as many times
as it had literals, hence must have had its mass doubled that many times,
and so must have final mass equal to one. Therefore |LEFT| ≤ m/2^k, and
so |S| ≥ m(1 − 1/2^k) and the bound follows. ■


Again, for larger values of k, algorithm GREEDYJOHNSON2 obtains better
performance ratios and, generally speaking, because 1 − 1/2^k > 1 − 1/(k+1) for
any integer k ≥ 2, algorithm GREEDYJOHNSON2 has a better performance
than that of algorithm GREEDYJOHNSON1.
The performance ratio 2/3 has been proved in a paper by Chen, Friesen,
and Zheng [21]. Because they consider the MAX W-SAT problem, line 2
in Fig. 8 must be modified to take the weights w_i into account: the mass
becomes w(C_i) = w_i 2^{−|C_i|}. The preceding bound 1/2 depends on the fact
that the only upper bound used in the above proofs was given by the total
weight of the clauses; of course this upper bound can be far from the optimal
value. The novelty of the approach of [21] is that the performance ratio can
be derived by using the correct value of the optimal solution. In order to
prove that algorithm GREEDYJOHNSON2 has this better performance ratio
let us introduce a generalization of the algorithm. It is important to stress
that this generalization is introduced to perform a more accurate analysis
of the performance ratio and it is used in the following as a theoretical tool.
The difference between GREEDYJOHNSON2 and its generalization is rather
subtle. The generalized algorithm, that we denote as GENJOHNSON2, considers
an arbitrary Boolean array b[1..n] of size n as additional input, and
examines b to decide what to do if an equality is present in line 6 of Fig. 8.
Let us assume that the variable one is considering is u_j. In line 6 of
GREEDYJOHNSON2 in Fig. 8, when Σ_{C_i∈CT} w(C_i) = Σ_{C_i∈CF} w(C_i), the
if condition is true and u_j is set to true. Now, instead, when one obtains
an equality one considers two different cases: if the variable b[j] is true u_j
is set to true; if the variable b[j] is false u_j is set to false.
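In code, the only difference with respect to the sketch of GREEDYJOHNSON2 given after Fig. 8 is the comparison of line 6; a hypothetical helper could read:

    def genjohnson2_choice(ct_mass, cf_mass, b_j):
        # Strict inequalities behave as in GREEDYJOHNSON2; exact ties follow b[j].
        if ct_mass != cf_mass:
            return ct_mass > cf_mass
        return b_j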
This generalized algorithm is then used in the proof with this Boolean
array equal to the optimal assignment. Of course the optimal assignment
cannot be derived in polynomial time but here we are not interested in
running an algorithm but in performing a theoretical analysis.
We will prove that GENJOHNSON2 has a performance ratio 2/3 and this
fact will imply that GREEDYJOHNSON2 also has performance ratio 2/3.
Let us give some definitions needed in the proof.

Definition 4.11
• A literal is positive if it is a Boolean variable u_i for some i.
• A literal is negative if it is the negation ū_i of a Boolean variable for
some i.

Definition 4.12 Assume that algorithm GENJOHNSON2 is applied to a formula
C and consider a fixed moment in the execution.

• A literal l is active if it has not been assigned a truth value yet.
• A clause C_j is killed if all literals in C_j are assigned value false.
• A clause C_j is negative if it is neither satisfied nor killed, and all active
literals in C_j are negative literals.

Definition 4.13 Let 0 ≤ t ≤ n. Assume that in GENJOHNSON2 the t-th
iteration has been completed (a truth assignment has been given to t variables).
Then S^t denotes the set of satisfied clauses, K^t denotes the set of
killed clauses, and N_i^t denotes the set of negative clauses with exactly i active
literals.
Without loss of generality, one assumes that each clause in the formula
has at most r literals. The proof of the performance ratio 2/3 depends on
the following Lemma.
Given a set of clauses C, let us define w(C) as the sum of the weights of
all clauses of C.

Lemma 4.6 For any formula C of MAX W-SAT and for any Boolean array
b[1..n], when the algorithm GENJOHNSON2 is applied on C the following
inequality holds at all iterations 0 ≤ t ≤ n:

w(S^t) ≥ 2w(K^t) + Σ_{i=1}^{r} (1/2^{i−1}) w(N_i^t) − A_0   (3)

where A_0 = Σ_{i=1}^{r} (1/2^{i−1}) w(N_i^0).

The proof of the Lemma proceeds by induction on t and can be found
in [21].

Theorem 4.7 The performance ratio of algorithm GREEDYJOHNSON2 is
2/3.
Proof. Let C be an instance of MAX W-SAT and let U_0 be an optimal truth
assignment for C. Now one considers another formula C' that is derived from
C as follows. If U_0(u_t) = false for a variable u_t then one negates u_t (u_t and
ū_t are interchanged) in C'. No change on the weights is done. Therefore
there exists a one-to-one correspondence between the set of clauses in C
and the set of clauses in C'; moreover the corresponding clauses have the
same weight. In addition, the Boolean array b[1..n] is constructed such that
b[j] = false if and only if U_0(u_j) = false.
It is easy to see (for the details, see again [21]) that

• the weight of an optimal assignment to C' is equal to the weight of an
optimal assignment to C;
• the truth assignment for C found by GREEDYJOHNSON2 and the truth
assignment for C' found by GENJOHNSON2 have the same weight.

This means that, if we prove that GENJOHNSON2 has a performance ratio
2/3 on the formula C', the theorem is shown.
Note that the truth assignment U'_0 for C' that gives value true to all
variables corresponds to the optimal truth assignment U_0 for C. Therefore
U'_0 is optimal for C'.
When GENJOHNSON2 stops, that is, for t = n, S^n is the set of clauses satisfied
by the algorithm and K^n is the set of clauses not satisfied. N_i^n is the empty
set for any i.
Applying the inequality 3 of Lemma 4.6 to this case, one obtains:

w(S^n) ≥ 2w(K^n) − A_0.   (4)

On the other hand, A_0 can be upper bounded in the following way:

A_0 = Σ_{i=1}^{r} (1/2^{i−1}) w(N_i^0) ≤ Σ_{i=1}^{r} w(N_i^0) ≤ 2 Σ_{i=1}^{r} w(N_i^0).   (5)

From inequalities 4 and 5 one has:

(3/2) w(S^n) ≥ w(S^n) + w(K^n) − Σ_{i=1}^{r} w(N_i^0).   (6)

Note that, on one hand, w(S^n) is the weight of the truth assignment
found by GENJOHNSON2. On the other hand, S^n ∪ K^n is the whole set of
clauses in C' and the optimal truth assignment U'_0 for C' that gives value
true to all variables satisfies all clauses in C' except those belonging to N_i^0
for i = 1, 2, …, r.
Therefore an optimal truth assignment for C' has weight exactly

w(S^n) + w(K^n) − Σ_{i=1}^{r} w(N_i^0).

Then the inequality 6 says that the weight of the truth assignment found
by GENJOHNSON2 is at least 2/3 of the weight of an optimal assignment
to C'. In consequence, the weight of the assignment constructed by the
original GREEDYJOHNSON2 algorithm for the instance C is at least 2/3 of
the weight of an optimal assignment to C, thus proving the theorem. ■

Finally, it is worthwhile to note that this performance ratio 2/3 is tight. There
are formulae for which GREEDYJOHNSON2 finds a truth assignment such
that the ratio is 2/3. Therefore this bound cannot be improved. In [21] a
set of formulae with this characteristic has been presented.
Let us consider the following formula C_h formed by 3h clauses, with h an
integer greater than 0, where all the clauses have the same weight:

C_h = ∪_{k=0}^{h−1} { (u_{3k+1} ∨ u_{3k+2}), (u_{3k+1} ∨ u_{3k+3}), ū_{3k+1} }.

GREEDYJOHNSON2 gives value true to all variables, so satisfying the 2h
clauses (u_{3k+1} ∨ u_{3k+2}), (u_{3k+1} ∨ u_{3k+3}) for 0 ≤ k ≤ h − 1. On
the other hand, the truth assignment u_{3k+1} = false, u_{3k+2} = u_{3k+3} = true
for 0 ≤ k ≤ h − 1 satisfies all the 3h clauses of the formula.
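To see why the tie-breaking rule produces this behavior, one can check the masses in a single block of C_h (unit weights), assuming the reconstruction of C_h given above:

w(u_1 ∨ u_2) = 2^{−2} = 1/4,  w(u_1 ∨ u_3) = 1/4,  w(ū_1) = 2^{−1} = 1/2,

so for the variable u_1 one has Σ_{C_i∈CT} w(C_i) = 1/4 + 1/4 = 1/2 = Σ_{C_i∈CF} w(C_i): line 6 of Fig. 8 resolves the tie by setting u_1 to true, which kills ū_1, and exactly 2 of the 3 clauses of the block are satisfied, for a global ratio 2h/3h = 2/3.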

4.3 Randomized algorithms for MAX W-SAT

4.3.1 A randomized 1/2-approximate algorithm for MAX W-SAT

One of the most interesting approaches in the design of new algorithms is the
use of randomization. During the computation, random bits are generated
and used to influence the algorithm process.
In many cases randomization allows one to obtain better (expected) performance
or to simplify the construction of the algorithm. Particularly in
the field of approximation, randomized algorithms are widely used and, for
many problems, the algorithm can be "derandomized" in polynomial time
while preserving the approximation ratio. However, it is important to note
that, often, the derandomization leads to algorithms which are very complicated
in practice.
Let us now use this approach to present more efficient approximate al-
gorithms for MAX W-SAT. More precisely, this section introduces two
different randomized algorithms that achieve a performance ratio of 3/4.
Moreover, it is possible to derandomize these algorithms, that is, to obtain
deterministic algorithms that preserve the same bound 3/4 for every
instance.
The derandomization is based on the method of conditional probabilities
that has revealed its usefulness in numerous cases and is a general technique

that often permits to obtain a deterministic algorithm from a randomized


one while preserving the quality of approximation.
Let us first present the algorithm RANDOM, a simple randomized algo-
rithm, that, while just achieving a performance ratio 1/2, will be used in the
following subsections as an ingredient to reach the performance ratio 3/4.

RANDOM
Input: Set C of weighted clauses in conjunctive normal form
Output: Truth assignment U, C', Σ_{C_j∈C'} w_j
1 Independently set each variable u_i to true with probability 1/2
2 Compute C' = {C_j ∈ C : C_j is satisfied}
3 Compute Σ_{C_j∈C'} w_j

Figure 9: The RANDOM algorithm, a randomized (1 − 1/2^k)-approximate algorithm.

Because the algorithm is randomized, one is interested in the expected


performance when the algorithm is run with different sequences of random
bits (i.e., with different random assignments).

Lemma 4.8 Given an instance of MAX W-SAT in which all clauses have
at least k literals, the expected weight W of the solution found by algorithm
RANDOM is such that

W ≥ (1 − 1/2^k) Σ_{C_j∈C} w_j.

Proof. The probability that any clause with k literals is not satisfied by the
assignment found by the algorithm is 2^{−k} (all possible k matches must fail).
Therefore the probability that a clause is satisfied is at least 1 − 2^{−k}. Then

W = Σ_{C_j∈C} w_j Pr[C_j is satisfied] ≥ (1 − 1/2^k) Σ_{C_j∈C} w_j. ■

As an immediate consequence of Lemma 4.8, one obtains the following
corollary.

Corollary 4.9 Algorithm RANDOM finds a solution for MAX W-SAT whose
expected value is at least one half of the optimum value.

The performance of algorithm RANDOM is the same, in a probabilistic
setting, as that of algorithm GREEDYJOHNSON2.
Actually it is possible to show that, by applying the method of conditional
probabilities to algorithm RANDOM, one essentially obtains algorithm
GREEDYJOHNSON2. In this way the algorithm RANDOM is derandomized.
Let us consider the following greedy algorithm, described informally
and divided in two phases.
First phase (initialization). Assuming (as in algorithm RANDOM) that
every variable u_i is true with probability 1/2, for every clause C_i compute
the probability d_i that C_i is not satisfied. According to the proof of Lemma 4.8 this
probability is 1/2^k, where k is the number of literals occurring in C_i.
Second phase. Given a variable u_j that has not been assigned a value, let
CT be the set of remaining clauses that contain u_j and let CF be the set
of remaining clauses that contain ū_j.
If Σ_{C_i∈CT} w_i d_i ≥ Σ_{C_i∈CF} w_i d_i then assign true to u_j, otherwise assign
false. In the first case remove all clauses of CT from the formula and
double d_i for all clauses C_i of CF; in the second case remove all clauses of
CF from the formula and double d_i for all clauses C_i of CT.
It is immediate to note that the truth assignment computed by such an
algorithm has weight at least equal to the expected value W of the solution
found by algorithm RANDOM. Moreover the algorithm is deterministic.
On the other hand, the two phases correspond to what is done in algorithm
GREEDYJOHNSON2.
The approach shown for derandomizing RANDOM can be applied to many
randomized algorithms. For the sake of simplicity, let us assume that the
input is given by n Boolean variables. Now let E be the expected value of
the solutions achieved by a randomized algorithm A. One is now interested
in finding a deterministic algorithm B that achieves a solution of value E in
polynomial time. In such a way A has been efficiently derandomized.
The algorithm B consists of n iterations and, at each iteration, the value
of a variable is determined. The Boolean value of the i-th variable is found
in the following way: given the values of variables u_1, …, u_{i−1}, we set u_i =
true and we compute the expected weight of clauses satisfied by that truth
assignment; then we compute the expected weight of clauses satisfied by
the truth assignment in which one has u_i = false, given the current assignment
to u_1, …, u_{i−1}. We assign u_i the value that maximizes the conditional
expectation.
After n iterations, a truth assignment is found deterministically. If we
are able to compute each conditional expectation in polynomial time, the
algorithm runs in polynomial time and has found an approximate solution
whose value is at least E.
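As a concrete illustration, the following Python sketch derandomizes algorithm RANDOM in exactly this way; the conditional expectations have a closed form because, for a clause not yet satisfied, the failure probability is the product of 1/2 over its unset literals. The encoding and names are ours.

    def cond_expectation(clauses, weights, fixed):
        # Expected satisfied weight when variables in `fixed` have the given
        # values and every other variable is true with probability 1/2.
        total = 0.0
        for c, w in zip(clauses, weights):
            prob_unsat = 1.0
            for lit in c:
                v = abs(lit)
                if v in fixed:
                    if fixed[v] == (lit > 0):  # a fixed literal already satisfies the clause
                        prob_unsat = 0.0
                        break
                    # a fixed, falsified literal contributes a factor 1
                else:
                    prob_unsat *= 0.5          # an unset literal fails with probability 1/2
            total += w * (1.0 - prob_unsat)
        return total

    def derandomize(clauses, weights, n):
        fixed = {}
        for v in range(1, n + 1):              # the n iterations of algorithm B
            e_true = cond_expectation(clauses, weights, {**fixed, v: True})
            e_false = cond_expectation(clauses, weights, {**fixed, v: False})
            fixed[v] = e_true >= e_false       # keep the larger conditional expectation
        return fixed

Since the larger of the two conditional expectations is never below the current expectation, the final assignment has weight at least E.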


Coming back to algorithm RANDOM, one has that, for k = 1, the algo-
rithm achieves an expected performance ratio 1/2. The performance of the
algorithm improves if we increase the number of literals. In particular, for
k = 2, that is for formulae which do not contain unit clauses, one obtains
an expected value which is at least 3/4 of the optimal value. Therefore if
one could discard unit clauses, one would already have a 3/4-approximate
algorithm for MAX W-SAT, after applying the derandomization. This ob-
servation will reveal its usefulness in the following.

4.3.2 A randomized 3/4-approximate algorithm for MAX W-SAT


This subsection presents an algorithm that considerably improves the per-
formance of algorithm RANDOM, and obtains a performance ratio 3/4.
First of all we consider a generalization of algorithm RANDOM. In the
previous case the value of every variable was chosen randomly and uniformly,
that is, each variable was true with probability 1/2; now variable u_i is set
to true with probability p_i, obtaining algorithm GENRANDOM.
GENRANDOM
Input: Set C of weighted clauses in conjunctive normal form
Output: Truth assignment U, C', Σ_{C_j∈C'} w_j
1 Independently set each variable u_i to true with probability p_i
2 Compute C' = {C_j ∈ C : C_j is satisfied}
3 Compute Σ_{C_j∈C'} w_j

Figure 10: The GENRANDOM algorithm.

The expected weight of the clauses satisfied by algorithm GENRANDOM can
be immediately computed as a function of the p_i.

Lemma 4.10 The expected weight W of the set of clauses C is:

W = Σ_{C_j∈C} w_j (1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i)

where U_j^+ (U_j^−) denotes the set of indices of the variables appearing unnegated
(negated) in the clause C_j.

Proof. It is an obvious generalization of the proof given in the particular
case p_i = 1/2. ■
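The formula of Lemma 4.10 translates directly into code; a small sketch, with the same clause encoding as in the previous sketches and p given as a mapping from variable index to probability:

    def expected_weight(clauses, weights, p):
        # W of Lemma 4.10: a clause fails iff every unnegated variable is false
        # and every negated variable is true.
        W = 0.0
        for c, w in zip(clauses, weights):
            prob_unsat = 1.0
            for lit in c:
                prob_unsat *= (1.0 - p[abs(lit)]) if lit > 0 else p[abs(lit)]
            W += w * (1.0 - prob_unsat)
        return W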


Now, if one manages to find suitable values p_i such that W ≥ (3/4) m*(C)
for every formula C, one would obtain a 3/4-approximate randomized algorithm.
To aim at this result, let us consider the representation of the instances of
MAX W-SAT as instances of an integer linear programming problem (ILP)
already presented in Section 2:

max Σ_{C_j∈C} w_j z_j

subject to:

Σ_{i∈U_j^+} y_i + Σ_{i∈U_j^−} (1 − y_i) ≥ z_j,  ∀C_j ∈ C

y_i ∈ {0, 1}, 1 ≤ i ≤ n

z_j ∈ {0, 1}, ∀C_j ∈ C
Let u_1, …, u_n be the Boolean variables appearing in the formula. An
instance of MAX W-SAT is equivalent to an instance of ILP if we choose
the following conditions:

- y_i = 1 iff variable u_i is true;
- y_i = 0 iff variable u_i is false;
- z_j = 1 iff clause C_j is satisfied;
- z_j = 0 iff clause C_j is not satisfied.
The linear inequality states the fact that a clause can be satisfied (z_j = 1)
only if at least one of its literals is matched.
One cannot compute the optimal value in polynomial time because ILP
is NP-complete. However, let us consider the LP relaxation (by relaxation
one means that the set of admissible solutions increases with respect to that
of the original problem) in which one relaxes the conditions y_i, z_j ∈ {0,1}
to the new constraints 0 ≤ y_i, z_j ≤ 1. It is known that LP can be solved
in polynomial time, finding a solution

(y* = (y*_1, …, y*_n), z* = (z*_1, …, z*_m))

with value m*_LP(x) ≥ m*_ILP(x), for every instance x, where m*_LP(x)
and m*_ILP(x) denote the optimal values of the LP and ILP instances,

respectively. The upper bound is obvious given that the set of admissible
solutions is enlarged by the relaxation.
Let us consider algorithm GENAPPROX, see Fig. 11, that works as follows:
first it solves the linear programming relaxation and so computes the
optimal values (y*, z*); then, given a function g to be specified later, it computes,
for each i, i = 1, …, n, the probabilities p_i = g(y*_i). By Lemma 4.10
we know that a solution of weight

W = Σ_{C_j∈C} w_j (1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i)

must exist; by applying the method of conditional probabilities, such a solution
can be deterministically found.

GENAPPROX
Input: Set C of clauses in conjunctive normal form
Output: Set C' of clauses, W = Σ_{C_j∈C'} w_j
1 Express the input C as an equivalent instance x of ILP
2 Find the optimum value (y*, z*) of x in the linear relaxation
3 Choose p_i ← g(y*_i), i = 1, 2, …, n, for a suitable function g
4 W ← Σ_{C_j∈C} w_j (1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i)
5 Apply the method of conditional probabilities to find
6 a feasible solution C' = {C_j ∈ C : C_j is satisfied} of value W

Figure 11: The GENAPPROX algorithm, deterministic version.

If the function g can be computed in polynomial time then algorithm
GENAPPROX runs in polynomial time. In fact the linear relaxation can
be solved efficiently and the feasible solution can be
computed in polynomial time with the method of conditional probabilities
explained before.
The quality of approximation naturally depends on the choice of the
function g. Let us suppose that this function finds suitable values such that:

1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i ≥ (3/4) z*_j.

If this inequality is satisfied, then the algorithm is a 3/4-approximate algorithm
for MAX W-SAT. In fact one has:

W = Σ_{C_j∈C} w_j (1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i) ≥ (3/4) Σ_{C_j∈C} w_j z*_j
= (3/4) m*_LP(x) ≥ (3/4) m*_ILP(x).

More generally, if one has:

1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i ≥ α z*_j

one obtains an α-approximate algorithm.


A first interesting way of choosing the function g consists of applying
the following technique, called randomized rounding, to get an integral
solution from a linear programming relaxation. In order to get integer values
one rounds the fractional values, that is, each variable y_i is independently
set to 1 (corresponding to the Boolean variable u_i being set to true) with
probability y*_i, for each i = 1, 2, …, n. Hence the use of the randomized
rounding technique is equivalent to choosing p_i = g(y*_i) = y*_i, i = 1, 2, …, n.
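A sketch of this rounding scheme, using scipy's linprog (HiGHS) for the relaxation; the variable layout x = (y_1, …, y_n, z_1, …, z_m) and all names are our own choices, not from the chapter.

    import numpy as np
    from scipy.optimize import linprog
    import random

    def lp_round(clauses, weights, n):
        # LP relaxation: maximize sum_j w_j z_j, i.e. minimize -w.z, subject to
        # z_j - sum_{i in U_j^+} y_i + sum_{i in U_j^-} y_i <= |U_j^-|, 0 <= y, z <= 1.
        m = len(clauses)
        c = np.concatenate([np.zeros(n), -np.asarray(weights, dtype=float)])
        A, b = [], []
        for j, clause in enumerate(clauses):
            row = np.zeros(n + m)
            row[n + j] = 1.0                           # coefficient of z_j
            for lit in clause:
                row[abs(lit) - 1] += -1.0 if lit > 0 else 1.0
            A.append(row)
            b.append(sum(1 for lit in clause if lit < 0))
        res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                      bounds=[(0.0, 1.0)] * (n + m), method="highs")
        y_star = res.x[:n]
        # Randomized rounding: u_i is set to true with probability y*_i.
        return {i + 1: random.random() < y_star[i] for i in range(n)}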

Lemma 4.11 Given the optimal values (y*, z*) to LP and given any clause
C_j with k literals, one has

1 − ∏_{i∈U_j^+} (1 − y*_i) ∏_{i∈U_j^−} y*_i ≥ α_k z*_j

where α_k = 1 − (1 − 1/k)^k.
Proof. Let us consider a clause C_j and, for the sake of simplicity, let us
assume that every variable is unnegated. If a variable u_i appeared
negated in C_j, one could substitute u_i by its negation ū_i in every clause and
also replace y*_i by 1 − y*_i. So we can assume C_j = u_1 ∨ … ∨ u_k with the
associated condition y*_1 + … + y*_k ≥ z*_j. The Lemma is proved by showing
that:

1 − ∏_{i=1}^{k} (1 − y*_i) ≥ α_k z*_j.

In the proof we exploit the geometric inequality based on the properties of
the arithmetic mean: given a finite set of nonnegative numbers {a_1, …, a_k},

(a_1 + … + a_k)/k ≥ (a_1 a_2 ⋯ a_k)^{1/k}.
Now we apply the geometric inequality to the set {1 − y*_1, …, 1 − y*_k}.
Because Σ_{i=1}^{k} (1 − y*_i)/k = 1 − (Σ_{i=1}^{k} y*_i)/k, one has

1 − ∏_{i=1}^{k} (1 − y*_i) ≥ 1 − (1 − (Σ_{i=1}^{k} y*_i)/k)^k ≥ 1 − (1 − z*_j/k)^k.

We note that the function g(z*_j) = 1 − (1 − z*_j/k)^k is concave in the interval
[0,1]; hence it is sufficient to prove that g(z*_j) ≥ α_k z*_j at the extremal points
of the interval. Because one has

g(0) = 0 and g(1) = α_k,

the Lemma is shown. ■
One can conclude that algorithm GENAPPROX with the choice p_i = y*_i
reaches an approximation ratio equal to α_k. In particular for k = 2, the ratio
is 3/4. Note that, because α_k is decreasing with k, algorithm GENAPPROX
is an α_k-approximation algorithm for formulae with at most k literals per
clause.
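For concreteness, the first few values of α_k = 1 − (1 − 1/k)^k are:

α_1 = 1,  α_2 = 3/4,  α_3 = 1 − (2/3)^3 = 19/27 ≈ 0.704,  …,  decreasing towards 1 − 1/e ≈ 0.632.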
Moreover, it is well known that lim_{k→∞} (1 − 1/k)^k = 1/e; hence for arbitrary
formulae one finds approximate solutions whose value is at least 1 − 1/e times
the optimal value. Because 1 − 1/e = 0.632…, the randomized rounding
obtains a better performance than RANDOM, but it looks as if one is far
from achieving a 3/4-approximation ratio.
Luckily, with a suitable merging of the above algorithm with RANDOM
one obtains the desired performance ratio. Firstly let us recall that RANDOM
is a 3/4-approximation algorithm if all clauses have at least two literals. On
the other hand, GENAPPROX is a 3/4-approximation algorithm if we work
with clauses with at most two literals. One algorithm is good for large
clauses, the other for short ones. A simple combination consists of running
both algorithms and choosing the best truth assignment obtained. Let us
now consider the expected value obtained from the combination.
Theorem 4.12 Let W_1 be the expected weight corresponding to p_i = 1/2
and let W_2 be the expected weight corresponding to p_i = y*_i, i = 1, 2, …, n.
Then one has:

max(W_1, W_2) ≥ (3/4) m*_LP(x), for any instance x.



Proof. Because max(W_1, W_2) ≥ (W_1 + W_2)/2, it is sufficient to show that
(W_1 + W_2)/2 ≥ (3/4) m*_LP(x) for any x. Let us denote by C^k the set of
clauses with exactly k literals. By Lemma 4.8, because 0 ≤ z*_j ≤ 1, one has

W_1 ≥ Σ_{k≥1} Σ_{C_j∈C^k} γ_k w_j z*_j   (7)

where γ_k = 1 − 1/2^k.

Moreover, by applying Lemma 4.11, one obtains:

W_2 ≥ Σ_{k≥1} Σ_{C_j∈C^k} α_k w_j z*_j.   (8)

Summing 7 and 8 one has:

(W_1 + W_2)/2 ≥ Σ_{k≥1} Σ_{C_j∈C^k} ((γ_k + α_k)/2) w_j z*_j.

We note that γ_1 + α_1 = γ_2 + α_2 = 3/2 and for k ≥ 3 one has γ_k + α_k ≥
7/8 + 1 − 1/e ≥ 3/2. Therefore:

(W_1 + W_2)/2 ≥ Σ_{k≥1} Σ_{C_j∈C^k} (3/4) w_j z*_j = (3/4) m*_LP(x). ■

Note that it is not necessary to separately apply the two algorithms:
it is sufficient to randomly choose one of the two algorithms with probability
1/2, as is done in algorithm 3/4-APPROXIMATE SAT.

Corollary 4.13 Algorithm 3/4-APPROXIMATE SAT is a 3/4-approximation
algorithm for MAX W-SAT.

Proof. The proof derives from the above theorem and from the use of the
method of conditional probabilities. ■

4.3.3 A variant of the randomized rounding technique

This subsection shows that it is possible to directly design a 3/4-approximate
algorithm for MAX W-SAT based on randomized rounding. However, in order
to reach this aim, one needs to apply some modifications to the standard
technique.

3/4-APPROXIMATE SAT
Input: Set C of clauses in conjunctive normal form
Output: Set C' of clauses, W = Σ_{C_j∈C'} w_j
1 Express the input C as an equivalent instance x of ILP
2 Find the optimum value (y*, z*) of x in the linear relaxation
3 With probability 1/2 choose p_i = 1/2 or p_i = y*_i, i = 1, 2, …, n
4 W ← Σ_{C_j∈C} w_j (1 − ∏_{i∈U_j^+} (1 − p_i) ∏_{i∈U_j^−} p_i)
5 Apply the method of conditional probabilities to find a feasible
6 solution C' = {C_j ∈ C : C_j is satisfied} of value W

Figure 12: The 3/4-APPROXIMATE SAT algorithm: deterministic with performance
ratio 3/4.

Let us start again from algorithm GENAPPROX. One has already seen
that, by choosing g(y*_i) = y*_i, one cannot guarantee a performance ratio 3/4.
Therefore a different choice of g is necessary.
Let us consider the following definition:

Definition 4.14 A function g : [0,1] → [0,1] has property 3/4 if

1 − ∏_{i=1}^{l} (1 − g(y_i)) ∏_{i=l+1}^{k} g(y_i) ≥ (3/4) min(1, Σ_{i=1}^{l} y_i + Σ_{i=l+1}^{k} (1 − y_i))

for any integers k, l with k ≥ l and any y_1, …, y_k ∈ [0,1].

By Lemma 4.11, if a function g with property 3/4 is found, then algorithm
GENAPPROX becomes a 3/4-approximate algorithm. In order to prove the
existence of functions with property 3/4 one needs the following lemma:

Lemma 4.14 A function g : [0,1] → [0,1] has property 3/4 if it satisfies the
following conditions:

i) 1 − ∏_{i=1}^{k} (1 − g(y_i)) ≥ (3/4) min(1, Σ_{i=1}^{k} y_i), for any integer k
and any y_i ∈ [0,1], i = 1, 2, …, k;

ii) g(y) ≤ 1 − g(1 − y).

Proof. Given integers k, l with k ≥ l, let y'_i = y_i for i = 1, …, l and y'_i = 1 − y_i
for i = l + 1, …, k. One has

1 − ∏_{i=1}^{l} (1 − g(y_i)) ∏_{i=l+1}^{k} g(y_i)
≥ 1 − ∏_{i=1}^{l} (1 − g(y_i)) ∏_{i=l+1}^{k} (1 − g(1 − y_i))
= 1 − ∏_{i=1}^{k} (1 − g(y'_i)) ≥ (3/4) min(1, Σ_{i=1}^{k} y'_i)
= (3/4) min(1, Σ_{i=1}^{l} y_i + Σ_{i=l+1}^{k} (1 − y_i)),

where the first inequality uses (ii) and the second uses (i). ■


Lemma 4.15 The following function g_α verifies property 3/4:

g_α(y) = α + (1 − 2α)y

where 2 − 3/∛4 ≤ α ≤ 1/4.
Proof. It is immediate to verify that g_α satisfies (ii). In order to prove that
condition (i) of Lemma 4.14 is also verified, one has

1 − ∏_{i=1}^{k} (1 − g_α(y_i)) = 1 − ∏_{i=1}^{k} (1 − α − (1 − 2α)y_i)
≥ 1 − (1 − α − (1 − 2α)(Σ_{i=1}^{k} y_i)/k)^k,

where one exploits the fact that the arithmetic mean is greater than or equal
to the geometric mean of k numbers. Let us define Y = (Σ_{i=1}^{k} y_i)/k. It is
sufficient to prove that:

h_k(Y) = 1 − (1 − α − (1 − 2α)Y)^k ≥ (3/4) min(1, kY)  ∀Y ∈ [0,1].

Let us first prove the result in the interval [0, 1/k]. Because the function
h_k is concave, the minimum value is reached at one of the extremal points
of the interval. This means that it is sufficient to check that the inequality
holds for Y = 0 and Y = 1/k. This fact is immediately true for Y = 0. In
the other case it is sufficient to prove that:

1 − (1 − α − (1 − 2α)/k)^k ≥ 3/4, ∀k ≥ 1.

For k = 1 the inequality is satisfied if α ≤ 1/4. On the other side, for k = 2
the inequality becomes an identity and therefore it is always satisfied. For
k ≥ 3, by easy algebraic steps, one needs to show that

α ≥ (k − 1 − k·4^{−1/k}) / (k − 2).   (9)

Note that the right-hand side of the inequality is a function decreasing in k.
Moreover, for k = 3, inequality 9 holds for α ≥ 2 − 3/∛4. Therefore the proof
is completed for the interval [0, 1/k].
To finish the proof of the lemma it is sufficient to observe that, for any
given k, the function h_k(Y) is increasing in the interval (1/k, 1]. ■
Lemmata 4.11 and 4.15 imply the following theorem:

Theorem 4.16 Given α such that 2 − 3/∛4 ≤ α ≤ 1/4, algorithm GENAPPROX
with the choice p_i = α + (1 − 2α)y*_i is a 3/4-approximate algorithm for
MAX W-SAT.
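Concretely, the extreme choice α = 1/4 yields the simple rounding rule p_i = 1/4 + y*_i/2; a one-line sketch (names ours):

    def round_probs(y_star, alpha=0.25):
        # Theorem 4.16: any alpha in [2 - 3/4**(1/3), 1/4] gives a 3/4-approximation.
        return [alpha + (1.0 - 2.0 * alpha) * y for y in y_star]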

4.4 Another 3/4-approximate algorithm by Yannakakis

It is possible to achieve a performance ratio 3/4 also by using a very different
approach. In fact Yannakakis [84] introduced an algorithm that exploits
network flow techniques and again obtains the performance bound 3/4.
Because of the complexity of the proofs, we will limit ourselves to consider
the case of MAX W-2-SAT, by showing that MAX W-2-SAT can be
approximated with a performance ratio 3/4. After generalizing the techniques
used for MAX W-2-SAT to an arbitrary formula, it is still possible to obtain
the same bound. About the notation: in this section we consider the
MAX W-2-SAT problem with clauses containing one or two literals. As
already shown, if every clause has at least two literals, a simple greedy algorithm
reaches the ratio 3/4. Therefore, if one can eliminate the unit clauses
from instances of MAX W-2-SAT, one obtains the desired bound.
More precisely, given a set C of clauses with at most two literals per
clause, one fixes the truth value for a subset of the variables and builds a
set C' of clauses with exactly two literals per clause in which the remaining
variables occur. C' is constructed in such a way that a truth assignment
for C' with approximation ratio R gives a truth assignment for C (when
combined with the truth values of fixed variables) with an approximation
ratio at least R.

Formally, one has the following theorem, in which we use this notation:
w(C, ρ) is the weight of the formula C with respect to the truth assignment ρ.
In this subsection, for the sake of clarity, we will use different symbols for
the set of Boolean variables and the truth assignments.

Theorem 4.17 Let C be an instance of MAX W-2-SAT defined over a set
U of Boolean variables. It is possible to find in polynomial time a subset V
of variables, a truth assignment σ for V, a nonnegative constant h, and a
set C' of clauses with exactly two literals per clause in which only variables
belonging to the set U − V occur, such that:

1. For every truth assignment θ to the set U − V of variables one has
w(C, σ ∪ θ) = w(C', θ) + h, where σ ∪ θ is the global truth assignment
obtained applying σ to V and θ to U − V.

2. For every truth assignment ρ to U with restriction θ to U − V one has
w(C, ρ) ≤ w(C', θ) + h.

This theorem says that one does not lose in the level of approximation
by choosing the truth assignment σ for the variables in V; an optimal (or
near-optimal) truth assignment for U − V together with σ for V gives an
optimal (near-optimal) assignment for the entire set U of variables.
Because one knows how to approximate formulae with exactly two literals
per clause with a performance ratio 3/4, the above theorem implies the
following:

Corollary 4.18 If MAX W-2-SAT with exactly two literals per clause can
be approximated with performance ratio R, then the general MAX W-2-SAT
problem can be approximated with performance ratio R.
In particular, MAX W-2-SAT can be approximated with performance ratio
3/4.

Proof. Assume that it is possible to achieve an approximation ratio R for
the formula C', that is, w(C', θ) ≥ R·m*(C'). Then the same ratio holds for
C according to the following calculation. By Part 2 of the above Theorem,
one has m*(C) ≤ m*(C') + h. Moreover, applying Part 1 and because
h ≥ 0 and R ≤ 1, one obtains w(C, σ ∪ θ) = w(C', θ) + h ≥ R·m*(C') + h ≥
R(m*(C') + h) ≥ R·m*(C).
Intuitively, the proof of theorem 4.17 is based on the idea of finding
a correspondence between formulae and networks so that the weight of a
formula is evaluated by computing the maximum flow of a network.

In the sequel we will assume that some basic notions of network flow
theory are known. For a clear introduction to this area see, for instance,
[73].
Given a formula C, a network N(C) is built in the following way. Every
literal in C becomes a node in N(C); moreover two other nodes are introduced,
that is, s (which is the source of the network) and t (which is the
sink of the network).
The arcs are defined as follows. First of all, two arcs (u, v) and (w, z)
are said to correspond to each other if u = z̄ and v = w̄. In this convention
the complement of a complemented literal is the literal itself (and s̄ = t).
Now, given a clause C_j of C with weight w_j, C_j is associated to two
corresponding arcs of N(C), each having capacity w_j/2. If C_j = a is a unit
clause then its associated arcs are (s, a) and (ā, t); if C_j = a ∨ b is a clause
of length two, its associated arcs are (ā, b) and (b̄, a). Finally note that the
source node stands for the constant true and the sink node for false.
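A sketch of this construction in Python; the node representation ('s', 't', and signed literal pairs) is our own choice.

    def build_network(clauses, weights):
        # Arc capacities of N(C): each clause contributes two corresponding
        # arcs of capacity w_j / 2 according to the rules above.
        def node(lit):
            return ('+', lit) if lit > 0 else ('-', -lit)
        def comp(nd):                      # complement; the source and sink are complements
            if nd == 's':
                return 't'
            if nd == 't':
                return 's'
            return ('-' if nd[0] == '+' else '+', nd[1])
        cap = {}
        def add(u, v, w):
            cap[(u, v)] = cap.get((u, v), 0.0) + w
        for clause, w in zip(clauses, weights):
            if len(clause) == 1:           # unit clause a: arcs (s, a) and (a-bar, t)
                a = node(clause[0])
                add('s', a, w / 2.0)
                add(comp(a), 't', w / 2.0)
            else:                          # a or b: arcs (a-bar, b) and (b-bar, a)
                a, b = node(clause[0]), node(clause[1])
                add(comp(a), b, w / 2.0)
                add(comp(b), a, w / 2.0)
        return cap                         # symmetric by construction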
Some further definitions are needed.

Definition 4.15
• A network is symmetric if corresponding arcs have the same capacity.
• A flow is symmetric if corresponding arcs have the same flow.

According to this definition, N(C) is symmetric.
Let f* be the maximum flow in N(C). Now let us consider a new flow
f in N(C). If e_{j1} and e_{j2} are the two arcs associated to a clause C_j, then f
is defined in the following way: f(e_{j1}) = f(e_{j2}) = (f*(e_{j1}) + f*(e_{j2}))/2.
Two introductory lemmata are now presented. The proofs can be found
in [84].

Lemma 4.19 The flow f satisfies the capacity and flow conservation con-
straints and has maximum value.

Definition 4.16 Let G and G' be two formulae defined over the same set
of variables. G and G' are said to be equivalent if every truth assignment
gives the same weight to the two formulae.

In the following lemma one will assume that all the considered clauses
have the same weight.

Lemma 4.20
1. Let us consider the following two formulae:
G = {ū_i ∨ u_{i+1} | i = 0, …, k} and G' = {ū_{i+1} ∨ u_i | i = 0, …, k}, with
indices taken modulo k + 1. Then G and G' are equivalent.

2. Let us consider the following two formulae:
H = {u_1} ∪ {ū_i ∨ u_{i+1} | i = 1, …, k − 1} and H' = {u_k} ∪ {ū_{i+1} ∨ u_i | i =
1, …, k − 1}. Then also H and H' are equivalent.
Let us define the residual network M with respect to the flow f. Given
any arc e = (u, v) of N(C), M contains e with capacity c(e) − f(e) if e is not
saturated, where c denotes the capacity in N(C); moreover M contains the
reverse arc (v, u) with capacity f(e), provided this capacity is not zero.
No arc going into the source s or going out of t is included.
The reversal of two corresponding arcs gives two arcs that still correspond
to each other. Furthermore M is symmetric because the network
N(C) and the flow f are symmetric. Finally note that M is the network
N(Ĉ) of a formula Ĉ on the same set of variables. If M has an arc (s, l)
and hence an arc (l̄, t) of capacity w, then Ĉ has a unit clause l with weight
2w. If M contains the corresponding arcs (ā, b) and (b̄, a) of capacity w, then
Ĉ contains the clause a ∨ b with weight 2w.
Now one is ready to state the following important Lemma:

Lemma 4.21 Let f_opt be the value of the maximum flow f. For any truth
assignment θ, one has

w(C, θ) = w(Ĉ, θ) + f_opt.

Proof. A flow can be decomposed into a set of simple paths P_1, …, P_l
from the source to the sink and into a set of cycles K_1, …, K_m. Given an
arbitrary arc e, the flow f(e) through e is equal to the sum of the weights
associated with the paths and cycles containing e. Moreover f_opt is equal to
the sum of the weights of the paths. To obtain the residual network M from
this decomposition, one has to reverse every path and cycle of N(C).
This operation is performed by subtracting the associated weight from the
capacities of all the arcs of the path or cycle and summing the weight to the
capacities of the arcs in the reverse path or cycle (except for the arcs going
into the source or out of the sink).
Let us consider what happens to the clauses in C after applying this
reversal operation. By reversing a cycle K_j with weight w_j, w_j is subtracted
from all the clauses that correspond to arcs of the cycle and w_j
is added to the clauses corresponding to arcs of the reverse cycle. Part
1 of Lemma 4.20 guarantees that a set of equivalent clauses is obtained.
Considering the paths, assume that an arbitrary path P_j consists of arcs
(s, u_1), …, (u_{k−1}, u_k), (u_k, t). By Part 2 of Lemma 4.20, the corresponding
set of clauses
{u_1} ∪ {ū_i ∨ u_{i+1} | i = 1, …, k − 1} ∪ {ū_k} is equivalent to the set {u_k, ū_k} ∪
{ū_{i+1} ∨ u_i | i = 1, …, k − 1}. {u_k, ū_k} is equivalent to the constant clause
true while {ū_{i+1} ∨ u_i | i = 1, …, k − 1} corresponds to the reverse path of P_j.
Hence, by reversing a path with weight w_j, the weight is subtracted from
the corresponding clauses and added to the clauses of the reverse path in
M; moreover, the weight w_j is given to the constant clause true, thus preserving
the equivalence.
Globally speaking, one obtains an equivalent set of clauses that consists
of Ĉ and the clause true with weight equal to the sum of the weights of the
paths, that is, f_opt. ■

Figure 13: The residual network M, partitioned into the sets D, Z, and D̄.

Note that the residual network M does not contain any path from s to t,
because f is a maximum flow (see [73]). Let D be the set of nodes reachable
from s in M. The symmetry of M implies that there exists a path from s
to a node a if and only if there exists a path from ā to t. One can conclude
that D does not contain any complementary literals because otherwise M
would contain a path from s to t. Again, by symmetry, the set of nodes that
can reach t is given by the set D̄ = {ā | a ∈ D}. The set Z is given by the
remaining nodes of M, that is, by the nodes that do not belong to D or
D̄. By construction, there are no arcs coming out of D and, by symmetry
of M, no arcs going into D̄, see Fig. 13.

So one has obtained the following Lemma.

Lemma 4.22 The set D does not contain any complementary literals. Every
clause of Ĉ that contains the negation ā of a literal a in D also contains
(positively) a literal b in D.

Exploiting this lemma, if one sets every literal in D to true, all the
clauses in Ĉ involving such literals and their negations are satisfied. Let d*
be the total weight of these clauses. Lemma 4.21 implies Theorem 4.17 if
one makes these choices: V is given by those variables with a literal in D, σ
is the above truth assignment for V, C' is the set of remaining clauses of
Ĉ that involve only literals from Z, and h = f_opt + d*. As desired, C' does
not have unit clauses because M does not contain any arc from s to nodes
outside D.
Finally it is worthwhile to note that, because the weights of the clauses
of C are integers, it follows that, in general, the weights of Ĉ and C' are
half-integers. Then these weights can become integers if they are doubled.
This multiplication by two does not change the problem.
The proof of Theorem 4.17 is therefore completed.
A generalization of this approach allows to eliminate the unit clauses in
the case of formulae in which there are clauses of length 3 or more.

4.5 Approximate solution of MAX W-SAT: improvements

The approximation ratio 3/4 can be slightly improved by applying different
relaxation techniques. More precisely, Goemans and Williamson [40] introduce
a new way to approximate another classical optimization problem:
MAX-CUT.

Definition 4.17 MAX-CUT is the following NPO problem: Given a graph
G = (V, E) and a weight w(e) for each e ∈ E, one looks for a subset V' of
V that maximizes the sum of the weights of the edges from E that have one
endpoint in V' and one endpoint in V − V'.

MAX-CUT can be easily approximated with a performance ratio 1/2


(see, for instance, [84]). For many years it was impossible to improve this
bound. Goemans and Williamson devised a new approach. The problem
can be represented as a quadratic programming problem. As in the case
of MAX W-SAT, one can look for some relaxation in order to find a good
approximate algorithm. Actually, in the case of MAX-CUT, a relaxation

based on so-called semidefinite programming allows one to design an algorithm
with approximation ratio .87856, therefore strongly improving the bound 1/2. In fact,
by applying similar techniques, it is possible to reach better results also for
MAX W-SAT, although the improvement is very small in this case:

Theorem 4.23 There exists a polynomial time approximate algorithm for


MAX W-SAT with a performance ratio .7584.

For the restricted case of MAX 2-SAT, one can obtain a more substantial
improvement with the technique of Feige and Goemans [32]. Actually
they have obtained a performance ratio 0.931. Coming back to the general
problem, some other small improvements have been given. Asano [5] (following
[6]) has improved the bound to .77. If one considers only satisfiable
MAX W-SAT instances, Trevisan [83] obtains a 0.8 approximation factor,
while Karloff and Zwick [58] claim a 0.875 performance ratio for satisfiable
instances of MAX W-3-SAT.

4.6 Negative results about approximability

Until now one has shown how to approximate MAX W-SAT, obtaining better
and better approximation ratios. It is natural, in this framework, to
wonder whether it is possible to further improve the approximation properties
of MAX W-SAT, by showing that the problem belongs to PTAS. We
recall that if MAX W-SAT belonged to PTAS, one would be able to introduce
a polynomial time ε-approximate algorithm for every ε between 0 and
1. Unfortunately it is possible to prove that, unless P = NP, this is not
true.
The negative results for MAX W-SAT and, more generally, for NPO
optimization problems are based on the theory of probabilistically checkable
proofs (for short, PCP). This theory was developed in a different context
but, surprisingly, can be applied to the field of approximation algorithms,
allowing one to find very interesting negative results. Because of its complexity,
it is impossible to present this theory here. The reader interested in understanding
this nice relationship can read the pioneering papers that introduced PCP,
for instance [3] and [4]. Exploiting this theory, many negative results for
MAX W-SAT have been given. We limit ourselves to presenting the strongest
one, which is due to Hastad [50].

Theorem 4.24 Unless P = NP, MAX W-SAT cannot be approximated in
polynomial time within a performance ratio greater than 7/8.

More precisely, this result has been obtained for the MAX W-SAT problem
in which each clause is of length exactly three. Since this version is a
particular case of MAX W-SAT, of course the result holds in general.

5 A different MAX-SAT problem and completeness results

In this section we present another optimization problem again having SAT
as the associated recognition problem. While in MAX W-SAT we associate a
weight to each clause, now we associate a weight to each variable. More
formally, we introduce MAX-VAR SAT.
MAX-VAR SAT is an NPO problem in which (I, sol, m, opt) are defined
in the following way:

1. I = sets U = u_1, …, u_n of Boolean variables, a collection C of
clauses over U, and a set W = w_1, …, w_n of integers (weights) associated
to the variables.

2. Given an instance x of I, sol(x) = set of truth assignments to the
variables in U that satisfy all clauses in C.

3. Given an instance x of I and a feasible solution τ of x,
m(x, τ) = max(1, sum of the weights associated to the variables that are
true in τ).

4. opt = max.

We note that, in the case of this new problem, the feasible solutions are
restricted to those truth assignments that satisfy the formula completely.
Formally, in the definition, the measure is found by determining a maximum
between 1 and the sum of the weights associated to the variables that are
true in the truth assignment, because the formula might not be satisfiable.
In this case we directly assume that the optimum value is 1, so as to define
the optimization problem for every instance.
A first important result due to [9] and independently to [71] is the fol-
lowing:

Theorem 5.1 MAX-VAR SAT is NPO-complete.

Proof. Let P be an NPO problem and let us consider the corresponding
nondeterministic Turing machine M associated to P that was presented in
Section 4.1.
According to Cook's Theorem, for any instance x, one can find a Boolean
formula whose satisfying truth assignments are in one-to-one correspondence
with the halting computation paths of M(x). Let y_1, y_2, …, y_r be the
Boolean variables describing the feasible solution y of x and let m_1, …, m_s
be the Boolean variables that correspond to the tape cells on which M prints
the value m_P(x, y). Then a zero weight is assigned to every variable except
the m_i's, which are given weight 2^{s−i}.
Given a satisfying truth assignment, one is able to find a solution for P
just by looking at the values of the y_i's. From the construction m_P(x, y) is
equal to the sum of the weights of the true variables. Therefore it has been
proved that P ≤_PTAS MAX-VAR SAT with, in this case, c(ε) = ε. ■
Considering a particular version of MAX-VAR SAT one can exhibit
an example of a problem which is APX-complete. Let us consider the
problem MAX-VAR BOUNDED SAT in which the total sum of the weights
is between Z and 2Z, where Z is an integer given in input. Consequently
the measure is changed in the following way: m(x, τ) = max(Z, Σ_{i=1}^{n} w_i τ(u_i))
if the formula is satisfied, m(x, τ) = Z otherwise.

Theorem 5.2 MAX-VAR BOUNDED SAT is APX-complete.

The proof of the theorem can be found in [27].

Historically, MAX-VAR BOUNDED SAT is the first example of a problem
that is APX-complete. However, by combining together different techniques
(including PCP) it is possible to prove the following:

Theorem 5.3 MAX W-SAT is APX-complete.

A presentation of the proof can be found in [8]. Let us note that this
theorem is another way of stating that MAX W-SAT does not belong to
PTAS (unless P = NP).

6 Local search

According to [73], "local search is based on what is perhaps the oldest optimization
method - trial and error." The idea is simple and natural and
it is surprising to see how successful local search has been on a variety of
difficult problems. MAX-SAT is among the problems for which local search
has been very effective: different variations of local search with randomness
techniques have been proposed for SAT and MAX-SAT starting from the
late eighties, see for example [42, 81], motivated by previous applications of
"min-conflicts" heuristics in the area of Artificial Intelligence [66].
The general scheme is based on generating a starting point in the set
of admissible solutions and trying to improve it through the application of
simple basic moves. If a move ("trial") is successful one accepts it, otherwise
("error") one keeps the current point. Of course, the success
of a local search technique depends on the neighborhood chosen and there
are often trade-offs between the size of the neighborhood (and the related
computational requirements to calculate it) and the quality of the obtained
local optima.
In addition, as will be demonstrated in Sec. 6.2, the use of a guiding
function different from the original one can in some cases guarantee local
optima of better quality.
Because this presentation is dedicated to the MAX-SAT problem, the
search space that we consider is given by all possible truth assignments.
Of course, a truth assignment can be represented by a binary string. For
this presentation, let us consider the elementary changes to the current
assignment obtained by changing a single truth value. The definitions are
as follows.
Let U be the discrete search space: U = {0,1}^n, and let f : U → R (R
being the real numbers) be the function to be maximized, i.e., in our case, the
number of satisfied clauses. In addition, let U^(t) ∈ U be the current configuration
along the search trajectory at iteration t, and N(U^(t)) the neighborhood
of point U^(t), obtained by applying a set of basic moves μ_i (1 ≤ i ≤ n),
where μ_i complements the i-th bit u_i of the string: μ_i(u_1, u_2, …, u_i, …, u_n) =
(u_1, u_2, …, 1 − u_i, …, u_n). Clearly, each move is its own inverse (μ_i^{−1} = μ_i).

N(U^(t)) = {U ∈ U such that U = μ_i U^(t), i = 1, …, n}


The version of local search (LS) that we consider starts from a random
initial configuration U^(0) ∈ U and generates a search trajectory as follows:

V = BEST-NEIGHBOR(N(U^(t)))   (10)

U^(t+1) = V if f(V) > f(U^(t)),  U^(t+1) = U^(t) if f(V) ≤ f(U^(t))   (11)

where BEST-NEIGHBOR selects V ∈ N(U^(t)) with the best f value and ties
are broken randomly. V in turn becomes the new current configuration if f
improves. Other versions are satisfied with an improving (or non-worsening)
neighbor, not necessarily the best one. Clearly, local search stops as soon as
the first local optimum point is encountered, when no improving moves are
available, see eqn. 11. Let us define as LS+ a modification of LS where a
specified number of iterations are executed and the candidate move obtained
by BEST-NEIGHBOR is always accepted even if the f value remains equal
or worsens.
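A minimal Python sketch of LS for MAX-SAT, with the same clause encoding as in the previous sketches; for simplicity the first best improving flip is taken instead of a random tie-break.

    import random

    def local_search(clauses, n):
        # Oblivious local search: flip the variable that best increases the
        # number of satisfied clauses; stop at the first local optimum (eqn. 11).
        u = {i: random.choice([True, False]) for i in range(1, n + 1)}
        def num_sat(assign):
            return sum(any((lit > 0) == assign[abs(lit)] for lit in c) for c in clauses)
        f = num_sat(u)
        while True:
            best_i, best_f = None, f
            for i in range(1, n + 1):
                u[i] = not u[i]            # tentative move mu_i
                fi = num_sat(u)
                u[i] = not u[i]            # undo
                if fi > best_f:
                    best_i, best_f = i, fi
            if best_i is None:
                return u, f                # local optimum: no improving move
            u[best_i] = not u[best_i]
            f = best_f

Each accepted move strictly increases f ≤ m, so the loop terminates after at most m improvements.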

6.1 Quality of local optima

Let m* be the optimum value and k the minimum number of literals contained
in the problem clauses.
For the following discussion it is useful to consider the different degrees
of coverage of the various clauses for a given assignment. Precisely, let us
define as Cov_s the subset of clauses that have exactly s literals matched
by the current assignment, and by Cov_s(l) the number of clauses in Cov_s
that contain the literal l.
Figure 14: Literal L is changed from true to false (in the example the coverage levels are 0, 1, 2, 3, with Cov_0(L) = Cov_1(L) = Cov_2(L) = 1).

One has the following theorem [48]:

Theorem 6.1 Let m_loc be the number of satisfied clauses at a local opti-
mum of any instance of MAX-SAT with at least k literals per clause. m_loc
satisfies the following bound

    m_loc ≥ (k / (k + 1)) m

and the bound is sharp.

Proof. By definition, if the assignment U is a local optimum, one cannot
flip the truth value of a variable (from true to false or vice versa) and
obtain a net increase in the number of satisfied clauses f. Now, let (Δf)_i
be the increase in f if variable u_i is flipped. By using the above introduced
quantities one verifies that:

    (Δf)_i = |Cov_0(ū_i)| − |Cov_1(u_i)| ≤ 0                        (12)

In fact, when u_i is flipped one loses the clauses that contain u_i as the single
matched literal, i.e., Cov_1(u_i), and gains the clauses that have no matched
literal and that contain ū_i, i.e., Cov_0(ū_i).
After summing over all variables:

    Σ_{i=1}^n |Cov_0(ū_i)| ≤ Σ_{i=1}^n |Cov_1(u_i)|                 (13)

    k |Cov_0| ≤ |Cov_1|                                             (14)

where the equalities Σ_{i=1}^n |Cov_0(ū_i)| = k|Cov_0| and Σ_{i=1}^n |Cov_1(u_i)| = |Cov_1|
have been used. The equalities are demonstrated by counting how many times
a clause in Cov_0 (or Cov_1) is encountered during the sum. For example,
because all literals are unmatched for the clauses in Cov_0, each of them will
be encountered k times during the sum.
The conclusion is immediate: because |Cov_1| ≤ m_loc and m = m_loc + |Cov_0|,
eqn. 14 gives |Cov_0| ≤ m_loc / k and therefore

    m_loc ≥ (k / (k + 1)) m                                         (15)

•

The intuitive explanation is as follows: if there are too many clauses in
Cov_0, because each of them has k unmatched literals, there will be at least
one variable whose flipping satisfies so many of these clauses as to lead to a
net increase in the number of satisfied clauses.
There is therefore a very simple local search algorithm that reaches the
same bound as the GREEDY-JOHNSON1 algorithm. One starts from a truth
assignment and keeps flipping variables that cause a net increase in the
number of satisfied clauses, until a local optimum is encountered. Of course,
because one gains at least one clause at each step, there is an upper bound
of m on the total number of steps executed before reaching the local optimum.
The following corollary is immediate:
Corollary 6.2 If m_loc is the number of satisfied clauses at a local optimum,
then:

    m_loc ≥ (k / (k + 1)) m*                                        (16)
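The improving-flip algorithm just described is short enough to check empirically. The following sketch (under the same signed-integer clause representation as before, with hypothetical helper names) flips improving variables until a local optimum and verifies the k/(k+1) bound of Theorem 6.1 on a random 3-SAT instance:

    import random

    def random_ksat(n, m, k, rng):
        # each clause: k distinct variables, each negated with probability 1/2
        return [[v if rng.random() < 0.5 else -v
                 for v in rng.sample(range(1, n + 1), k)] for _ in range(m)]

    def satisfied(clauses, u):
        return sum(any((lit > 0) == bool(u[abs(lit) - 1]) for lit in c)
                   for c in clauses)

    rng = random.Random(0)
    n, m, k = 40, 300, 3
    clauses = random_ksat(n, m, k, rng)
    u = [rng.randint(0, 1) for _ in range(n)]
    improved = True
    while improved:                     # at most m improving flips in total
        improved = False
        f = satisfied(clauses, u)
        for i in range(n):
            u[i] ^= 1
            if satisfied(clauses, u) > f:
                improved = True         # keep the improving flip
                break
            u[i] ^= 1                   # undo a non-improving flip
    m_loc = satisfied(clauses, u)
    assert m_loc >= k * m / (k + 1)     # Theorem 6.1: m_loc >= k/(k+1) m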

Besides MAX-SAT, many important optimization problems share the
property that the ratio between the value of a local optimum and the
optimal value is bounded by a constant. It is possible to define a class GLO
(Guaranteed Local Optima) composed of these problems. It is of interest to
note that the closure of GLO coincides with APX [10].

6.2 Non-oblivious local optima


In the design of efficient approximation algorithms for MAX-SAT a recent
approach of interest is based on the use of non-oblivious functions, indepen-
dently introduced in [2] and in [59].
Let us consider the classical local search algorithm LS for MAX-SAT,
here redefined as oblivious local search (LS-OB). Clearly, the feasible solution
found by LS-OB typically is only a local and not a global optimum.
Now, a different type of local search can be obtained by using a different
objective function to direct the search, i.e., to select the best neighbor at
each iteration. Local optima of the standard objective function f are not
necessarily local optima of the different objective function. In this event, the
second function causes an escape from a given local optimum. Interestingly
enough, suitable non-oblivious functions f_NOB improve the performance of
LS if one considers both the worst-case performance ratio and, as has been
shown in [13], the actual average results obtained on benchmark instances.
Let us mention a theoretical result for MAX 2-SAT. The d-neighborhood
of a given truth assignment is defined as the set of all assignments where the
values of at most d variables are changed. The theoretically-derived non-
oblivious function for MAX 2-SAT is:

    f_NOB(U) = (3/2) |Cov_1| + 2 |Cov_2|
Theorems 7-8 of [59] state that:

Theorem 6.3 The performance ratio of any oblivious local search algo-
rithm with a d-neighborhood for MAX 2-SAT is 2/3 for any d = o(n).
Non-oblivious local search with a 1-neighborhood achieves a performance
ratio 3/4 for MAX 2-SAT.
Proof. While the reader is referred to the cited papers for the complete details,
let us only demonstrate the second part of the theorem. The proof is a
generalization of that of Theorem 6.1. Let the non-oblivious function be
a weighted linear combination of the number of clauses with one and two
matched literals:

    f_NOB(U) = a |Cov_1| + b |Cov_2|

Let (Δf)_i be the increase in f_NOB if variable u_i is flipped. By using the definition
of local optimum and the quantities introduced in Sec. 6.1, one has that
(Δf)_i ≤ 0 for each possible flip of a variable u_i. After expressing (Δf)_i by
using the above introduced quantities, one obtains:

    −a |Cov_1(u_i)| − (b − a) |Cov_2(u_i)| + a |Cov_0(ū_i)| + (b − a) |Cov_1(ū_i)| ≤ 0   (17)

In fact, when u_i is flipped, all clauses that contain the literal u_i decrease their
coverage by one, while the clauses that contain ū_i increase it by one, see also
Fig. 14. As usual, let us assume that no clause contains both a literal and
its negation.
After summing over all variables and collecting the sizes of the sets Cov_i
one obtains:

    −a |Cov_1| − 2(b − a) |Cov_2| + 2a |Cov_0| + (b − a) |Cov_1| ≤ 0     (18)

    ((b − a)/a) |Cov_2| + ((2a − b)/(2a)) |Cov_1| ≥ |Cov_0|              (19)

Now one can fix the relative sizes of the values a and b in order to get the
best possible bound. This occurs when the coefficients of the terms |Cov_2|
and |Cov_1| in equation 19 are equal, that is, for b = (4/3) a.
For these values one obtains the following bound:

    (1/3) ( |Cov_1| + |Cov_2| ) ≥ |Cov_0|                                (20)

The number of satisfied clauses must be at least three times the number
of unsatisfied ones, which implies that |Cov_0| ≤ (1/4) m, or m_loc ≥ (3/4) m. •
Therefore LS-NOB, by using a function that weights the satisfied clauses
in different ways according to the number of matched literals, improves
the performance ratio considerably, even if the search is restricted to a much
smaller neighborhood. In particular, the "standard" neighborhood where all
possible flips are tried is sufficient.
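With the weights a = 3/2 and b = 2 = (4/3)a derived in the proof, the non-oblivious evaluation takes a few lines of Python (a sketch under the same clause representation as before, not the implementation of [13]):

    def nob_value(clauses, u):
        # f_NOB for MAX 2-SAT: (3/2)|Cov_1| + 2|Cov_2|
        cov = [0, 0, 0]
        for c in clauses:
            matched = sum((lit > 0) == bool(u[abs(lit) - 1]) for lit in c)
            cov[matched] += 1
        return 1.5 * cov[1] + 2.0 * cov[2]

    def best_nob_flip(clauses, u):
        # index of the variable whose flip maximizes f_NOB
        best, best_i = float('-inf'), None
        for i in range(len(u)):
            u[i] ^= 1
            v = nob_value(clauses, u)
            u[i] ^= 1
            if v > best:
                best, best_i = v, i
        return best_i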
With a suitable generalization the above result can be extended: LS-
NOB achieves a performance ratio 1 − 1/2^k for MAX k-SAT. The non-oblivious
function for MAX k-SAT is of the form:

    f_NOB(U) = Σ_{i=1}^k c_i |Cov_i|

and the above given performance ratio is obtained if the quantities Δ_i =
c_{i+1} − c_i satisfy:

    Δ_i = ( 1 / ( (k − i + 1) C(k, i − 1) ) ) [ Σ_{j=0}^{k−i} C(k, j) ]

where C(a, b) denotes the binomial coefficient "a choose b".
Because the positive factors c_i that multiply |Cov_i| in the function f_NOB are
strictly increasing with i, the approximations obtained through f_NOB tend
to be characterized by a "redundant" satisfaction of many clauses. Better
approximations, at the price of a limited number of additional iterations,
can be obtained by a two-phase local search algorithm (NOB&OB): after
a random start f_NOB guides the search until a local optimum is encoun-
tered [13]. As soon as this happens a second phase of LS is started where
the move evaluation is based on f. A further reduction in the number of
unsatisfied clauses can be obtained by a "plateau search" phase following
NOB&OB: the search is continued for a certain number of iterations after
the local optimum of OB is encountered, by using LS+, with f as guiding
function [13].

6.2.1 An example of non-oblivious search


Let us consider the following task with number of variables n = 5, and
clauses m = m* = 4, see also Fig. 15:

    (ū_1 ∨ ū_2 ∨ u_3) ∧ (ū_1 ∨ ū_2 ∨ u_4) ∧ (ū_1 ∨ ū_2 ∨ u_5) ∧ (ū_3 ∨ ū_4 ∨ ū_5)

Let us assume that the assignment U = (11111) is reached by OB local
search. It is immediate to check that U = (11111) is an oblivious local opti-
mum with one unsatisfied clause (clause 4). While OB stops here, a possible
sequence to reach the global optimum starting from U is the following: i)
u_1 is set to false, ii) u_3 is set to false. Now, the first move does not change
the number of satisfied clauses, but it changes the "amount of redundancy"
(in clause 1 two literals are now satisfied, i.e., clause 1 enters Cov_2) and the
move is a possible choice for a selection based on the non-oblivious func-
tion. The oblivious plateau has been eliminated and the search can continue
toward the globally optimal point U = (01011).

[Figure: two panels showing, for the assignments U = (11111) (top) and
U = (01111) (bottom), the coverage (0-3) of the four clauses over the
variables u_1, ..., u_5; after u_1 is set to false, clauses 1-3 move from
Cov_1 to Cov_2.]
Figure 15: Non-oblivious search takes the different coverage into account.
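The example can be replayed mechanically. The following sketch (clause signs as in the formula above, helper names ours) prints the coverage profile of the three assignments visited:

    clauses = [[-1, -2, 3], [-1, -2, 4], [-1, -2, 5], [-3, -4, -5]]

    def coverage(c, u):
        # number of matched literals of clause c under assignment u
        return sum((lit > 0) == bool(u[abs(lit) - 1]) for lit in c)

    for u in ([1, 1, 1, 1, 1], [0, 1, 1, 1, 1], [0, 1, 0, 1, 1]):
        covs = [coverage(c, u) for c in clauses]
        print(u, covs, sum(cv > 0 for cv in covs), "satisfied")
    # (1,1,1,1,1): coverages [1,1,1,0], 3 satisfied (oblivious local optimum)
    # (0,1,1,1,1): coverages [2,2,2,0], still 3 satisfied, clauses 1-3 enter Cov_2
    # (0,1,0,1,1): coverages [1,2,2,1], all 4 satisfied (global optimum)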

6.3 Local search satisfies most 3-SAT formulae


An intriguing result by Koutsoupias and Papadimitriou [63] shows that, for
the vast majority of satisfiable 3-SAT formulae, the local search heuristic
that starts at a random truth assignment and repeatedly flips a variable
that improves the number of satisfied clauses almost always succeeds in
discovering a satisfying truth assignment.
Let us consider all clauses that are satisfied by a given truth assignment
Û and let us pick each of them with probability p = 1/2 to build a 3-SAT
formula. The following theorem [63] is demonstrated:
Theorem 6.4 Let 0 < ε < 1/2. Then there exists c, with
c ≥ (1 − √(1 − (1/2 − ε)²))² / 6, such that for all but a fraction of at most
n 2^n e^(−c n²/2) of the satisfiable 3-SAT formulae with n variables, the probability that
local search succeeds in discovering a satisfying truth assignment in each independent
trial from a random start is at least 1 − e^(−ε² n).

Proof. Let us focus on the structure of the proof, without giving the tech-
nical details. One assumes that there is an assignment Û that satisfies all
clauses and shows that, if one starts from a good initial assignment, i.e., one
that agrees with Û in at least (1/2 − ε)n variables, the probability that the
local search is ever misled is small. By "misled" one means that, when
a variable is flipped, the Hamming distance between U^(t) and Û increases.
The Hamming distance between two binary strings is given by the number
of differing bits.
In detail, the quantity 1 − e^(−ε²n) in the theorem is the probability that
the initial random truth assignment is good (use the Chernoff bound). Then
one demonstrates that, if the initial assignment is good, the probability
that one does not reduce the Hamming distance between U^(t) and Û when
an improving neighbor is chosen is at most 2 e^(−c p n²), the probability being
measured with respect to the random choice of the clauses that build the
original formula (p = 1/2 for the above theorem).
Finally, the probability that local search starting from a good assignment
will ever be misled by flipping a variable during the entire search trajectory
is at most n 2^n e^(−c p n²), since there are at most n 2^(n−1) such possible
flippings, the number of edges of the n-hypercube. •
The original formulation of the above theorem is for a greedy version of
local search, using the function BEST-NEIGHBOR described in eqn. 10, but
the authors note that greediness is not required for the theorem to hold,
although it may be important in practice.
Let us finally note that the result, while of theoretical interest, is valid
for formulae with many clauses (p must be such that the expected number of
clauses is Ω(n²)), while the most difficult formulae have a number of clauses
that is linear in n, see also Sec. 9.2.

6.4 Randomized search for 2-SAT (Markov processes)


A "natural" polynomial-time randomized search algorithm for 2-SAT is
presented in [72]. While it has long been known that 2-SAT is a polynomially
solvable problem, the algorithm is of interest because of its simplicity
and is summarized here also because it motivated the GSAT-WITH-WALK
algorithm of [78], see also Sec. 7.2.
In its "standard" form, local search is guided by the number of satis-
fied clauses and the basic criterion is that of accepting a neighbor only if
more clauses are satisfied. The paper by Papadimitriou [72] changes the
perspective by concentrating the attention on the unsatisfied clauses.
The algorithm for 2-SAT is extremely simple:
MARKOV SEARCH
1 Start with any truth assignment
2 while there are unsatisfied clauses do
3 pick one of them and flip a random literal in it

Figure 16: The MARKOVSEARCH randomized algorithm for 2-SAT .
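A direct transcription of Fig. 16 in Python (a sketch under the usual signed-integer clause representation; the step bound is an addition of ours so that the sketch also terminates on unsatisfiable instances):

    import random

    def markov_search(clauses, n, max_steps=None, seed=None):
        rng = random.Random(seed)
        u = [rng.randint(0, 1) for _ in range(n)]
        for _ in range(max_steps if max_steps else 100 * n * n):
            unsat = [c for c in clauses
                     if not any((l > 0) == bool(u[abs(l) - 1]) for l in c)]
            if not unsat:
                return u                          # satisfying assignment found
            lit = rng.choice(rng.choice(unsat))   # random literal of a random unsat clause
            u[abs(lit) - 1] ^= 1                  # flip it (possibly a worsening move)
        return None                               # step budget exhausted

Scanning all clauses at every step costs O(m); practical implementations keep a list of unsatisfied clauses updated incrementally.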

Let us note that worsening moves, leading to a lower number of satisfied
clauses, can be accepted during the search.
One can prove that:

Theorem 6.5 The MARKOV SEARCH randomized algorithm for 2-SAT, if
the instance is satisfiable, finds a satisfying assignment in O(n²) expected
number of steps.

Proof. The proof involves an aggregation of the states of the Markov chain
so that the chain is mapped to the gambler's ruin chain. A sketch of the
proof is derived from [68]. Given an instance with a satisfying assignment
Û, and the current assignment U^(t), the progress of the algorithm can be
represented by a particle moving between the integers {0, 1, ..., n} on the
real line. The position of the particle indicates how many variables in U^(t)
agree with those of Û. At each iteration the particle's position can change
only by one, from the current position i to i + 1 or i − 1 for 0 < i < n. A
particle at 0 can move only to 1, and the algorithm terminates when the
particle reaches position n, although it may terminate at some other position
with a satisfying assignment different from Û. The crucial fact is that, in
an unsatisfied clause, at least one of the two literals has an incorrect value
and therefore, with probability at least 1/2, the number of correct variables
increases by one when a randomized step is executed.
The random walk on the line is one of the most extensively studied
stochastic processes. In particular, the above process is a version of the
"gambler's ruin" chain with a reflecting barrier (that is, the house cannot
lose its last dollar). The average number of steps for the gambler to be ruined
is O(n²). •


7 Memory-less Local Search Heuristics


State-of-the-art heuristics for MAX-SAT are obtained by complementing
local search with schemes that are capable of producing better approxi-
mations beyond the locally optimal points. In some cases, these schemes
generate a sequence of points in the set of admissible solutions in a way
that is fixed before the search starts. An example is given by multiple runs
of local search starting from different random points. The algorithm does
not take into account the history of the previous phase of the search when
the next points are generated. The term memory-less denotes this lack of
feedback from the search history.
In addition to the cited multiple-run local search, these techniques are
based on Markov processes (Simulated Annealing), see Sec. 7.1, "plateau"
search and "random noise" strategies, see Sec. 7.2, or combinations of ran-
domized constructions and local search, see Sec. 7.3.

7.1 Simulated Annealing


The use of a Markov process (Simulated Annealing or SA for short) to
generate a stochastic search trajectory is adopted for example in [82].

SA
1   for tries ← 1 to MAX-TRIES
2       U ← random truth assignment; iter ← 0
3       forever
4           if U satisfies all clauses then return U
5           temperature ← MAX-TEMP × e^(−iter × decay_rate)
6           if temperature < MIN-TEMP then exit loop
7           for i ← 1 to n
8               δ ← increase of satisfied clauses if u_i is flipped
9               FLIP(u_i) with probability 1/(1 + e^(−δ/temperature))
10          iter ← iter + 1

Figure 17: The Simulated Annealing algorithm for SAT.

The main structure of the algorithm is illustrated in Fig. 17, adapted
from [82]. For a certain number of tries, a random truth assignment is gen-
erated (line 2) and the temperature parameter is set to MAX-TEMP. In
the inner loop, new assignments are generated by probabilistically flipping
each variable based on the improvement δ in the number of satisfied clauses
that would occur after the flip. Of course, the improvement can be negative.
The probability of a flip is given by a logistic function that penalizes smaller
or negative improvements (line 9). The inner loop controls the annealing
schedule: as iter increases the temperature slowly decreases (line 5) until
a minimum of MIN-TEMP is reached and the control exits the loop (line
6). Let us note that, when the temperature is large, the moves are similar
to those produced by a random walk, while, when the temperature is low,
the acceptance criterion of the moves is that of local search and the algo-
rithm resembles GSAT, which will be introduced in Sec. 7.2. Implementation
details, the addition of a "random walk" modification inspired by [78], and
experimental results are described in the cited paper.
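The probabilistic flip of line 9 is easy to isolate. In the sketch below the schedule constants are illustrative placeholders, not the values used in [82]:

    import math

    def flip_probability(delta, temperature):
        # logistic acceptance of Fig. 17, line 9: P = 1/(1 + e^(-delta/temperature))
        x = -delta / temperature
        return 0.0 if x > 700 else 1.0 / (1.0 + math.exp(x))   # guard against overflow

    MAX_TEMP, DECAY = 0.3, 1e-3          # illustrative MAX-TEMP and decay_rate
    for it in (0, 1000, 5000):
        T = MAX_TEMP * math.exp(-it * DECAY)
        print(it, round(T, 4), flip_probability(-1, T))
    # a worsening flip (delta = -1) is accepted ever more rarely as T decreases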

7.2 GSAT with "random noise" strategies


SAT is of special concern to Artificial Intelligence because of its connection
to reasoning. In particular, deductive reasoning is the complement of satis-
fiability: from a collection of base facts A one should deduce a sentence F
if and only if A ∪ {¬F} is not satisfiable, see also Sec. 2.2. The popular and
effective algorithm GSAT was proposed in [81] as a model-finding procedure,
i.e., to find an interpretation of the variables under which the formula comes
out true. GSAT consists of multiple runs of LS+, each run consisting of a
number of iterations that is typically proportional to the problem dimen-
sion n. The experiments in [81] show that GSAT can be used to solve hard
(see Sec. 9.2) randomly generated problems that are an order of magnitude
larger than those that can be solved by more traditional approaches like
Davis-Putnam or resolution. Of course, GSAT is an incomplete procedure:
it could fail to find an optimal assignment. An extensive empirical analysis
of GSAT is presented in [37, 36].
Different "noise" strategies to escape from attraction basins are added to
GSAT in [78, 80]. In particular, the GSAT-WITH-WALK algorithm has been
tested in [80] on the Hansen-Jaumard benchmark of [48], where a better per-
formance with respect to SAMD is demonstrated, although requiring much
longer CPU times. See Sec. 8.1 for the definition of SAMD.
The algorithm is briefly summarized in Fig. 18. A certain number of
tries (MAX-TRIES) is executed, where each try consists of a number of
iterations (MAX-FLIPS). At each iteration a variable is chosen by one of two
possible criteria and then flipped by the function FLIP, i.e., u_i becomes
equal to (1 − u_i). One criterion, active with "noise" probability p, selects a
variable occurring in some unsatisfied clause with uniform probability over
these variables; the other one is the standard method based on the function
f given by the number of satisfied clauses. The first criterion was motivated
GSAT-WITH-WALK
1   for i ← 1 to MAX-TRIES
2       U ← random truth assignment
3       for j ← 1 to MAX-FLIPS
4           if RANDOM NUMBER < p then
5               u ← any variable occurring in some unsat. clause
6           else
7               u ← any variable with largest Δf
8           FLIP(u)

Figure 18: The GSAT-WITH-WALK algorithm. RANDOM NUMBER generates


random numbers in the range [0,1].

by [72], see also Sec. 6.4. For a generic move μ, the quantity Δ_μ f (or Δf for
short) is defined as f(μ U^(t)) − f(U^(t)). The straightforward book-keeping
part of the algorithm is not shown. In particular, the best assignment found
during all trials is saved and reported at the end of the run. In addition,
the run is terminated immediately if an assignment is found that satisfies
all clauses. The original GSAT algorithm can be obtained by setting p = 0
in the GSAT-WITH-WALK algorithm of Fig. 18.
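A compact sketch of Fig. 18 in Python (same signed-integer clause representation; the bookkeeping of the best assignment, left implicit in the figure, is included, and ties among best moves are broken randomly):

    import random

    def gsat_with_walk(clauses, n, max_tries=10, max_flips=None, p=0.5, seed=None):
        rng = random.Random(seed)
        max_flips = max_flips or 10 * n
        sat = lambda c, u: any((l > 0) == bool(u[abs(l) - 1]) for l in c)
        best_u, best_f = None, -1
        for _ in range(max_tries):
            u = [rng.randint(0, 1) for _ in range(n)]
            for _ in range(max_flips):
                unsat = [c for c in clauses if not sat(c, u)]
                if not unsat:
                    return u, len(clauses)               # all clauses satisfied
                if rng.random() < p:                     # "walk" criterion (line 5)
                    var = abs(rng.choice(rng.choice(unsat)))
                else:                                    # largest delta-f criterion (line 7)
                    scores = []
                    for i in range(1, n + 1):
                        u[i - 1] ^= 1
                        scores.append((sum(sat(c, u) for c in clauses),
                                       rng.random(), i))
                        u[i - 1] ^= 1
                    _, _, var = max(scores)              # random tie-breaking
                u[var - 1] ^= 1                          # FLIP(u)
                f = sum(sat(c, u) for c in clauses)
                if f > best_f:
                    best_u, best_f = u[:], f             # save the best assignment
        return best_u, best_f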

7.3 Randomized Greedy and Local Search (GRASP)

A hybrid algorithm that combines a randomized greedy construction phase
to generate initial candidate solutions, followed by a local improvement
phase, is the GRASP scheme proposed in [75] for SAT and generalized
for the MAX W-SAT problem in [76], a work that is briefly summarized in
this section.
GRASP is an iterative process, with each iteration consisting of two
phases, a construction phase and a local search phase.
During each construction, all possible choices are ordered in a candidate
list with respect to a greedy function measuring the (myopic) benefit of
selecting each of them. The algorithm is randomized because one picks in a
random way one of the best candidates in the list, not necessarily the top
candidate. In this way different solutions are obtained at the end of the
construction phase.
Because these solutions are not guaranteed to be locally optimal with
respect to simple neighborhoods, it is usually beneficial to apply a local
search to attempt to improve each constructed solution.
A high-level description of the GRASP algorithm is presented in Fig. 19,
GRASP(RCLSize, MaxIter, RandomSeed)
1   Input instance and initialize data structures
2   for i ← 1 to MaxIter
3       U ← CONSTRUCTGREEDYRAND(RCLSize, RandomSeed)
4       U ← LOCALSEARCH(U)

CONSTRUCTGREEDYRAND(RCLSize, RandomSeed)
1   for k ← 1 to n
2       MAKERCL(RCLSize)
3       s ← SELECTINDEX(RandomSeed)
4       ASSIGNVARIABLE(s)
5       ADAPTGREEDYFUNCTION(s)

Figure 19: The GRASP algorithm (above) and the randomized greedy con-
struction (below).

a summarized version of the more detailed description in [76]. After reading
the instance and initializing the data structures, one repeats for MaxIter
iterations the construction of an assignment U and the application of local
search starting from U to produce a possibly better assignment (lines 2-4).
Of course, the best assignment found during all iterations is saved and re-
ported at the end. In addition to MaxIter, the parameters are RCLSize,
the size of the restricted candidate list of moves out of which a random selec-
tion is executed, and a random seed used by the random number generator.
In detail, see function CONSTRUCTGREEDYRAND in Fig. 19, the restricted
candidate list of assignments is created by MAKERCL, the index of the
next variable to be assigned a truth value is chosen by SELECTINDEX, the
truth value is assigned by ASSIGNVARIABLE, and the greedy function that
guides the construction is changed by ADAPTGREEDYFUNCTION to reflect
the assignment just made.

The remaining details about the greedy function (designed to maximize
the total weight of yet-unsatisfied clauses that become satisfied after a given
assignment), the creation of the restricted candidate list, and local search
(based on the 1-flip neighborhood) are presented in [76], together with ex-
perimental results.
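Under the stated assumptions (clause weights given as input, and the greedy value of a candidate assignment measured as the total weight of yet-unsatisfied clauses it satisfies), the construction phase can be sketched as follows; this is an illustration, not the implementation of [76]:

    import random

    def grasp_construct(clauses, weights, n, rcl_size=3, rng=random):
        u = [None] * n
        unsat = set(range(len(clauses)))      # indices of yet-unsatisfied clauses
        for _ in range(n):                    # one variable is fixed per step
            candidates = []
            for i in range(n):
                if u[i] is not None:
                    continue
                for val in (0, 1):
                    lit = (i + 1) if val else -(i + 1)
                    gain = sum(weights[j] for j in unsat if lit in clauses[j])
                    candidates.append((gain, i, val))
            candidates.sort(reverse=True)     # MAKERCL: rank the possible choices
            _, i, val = rng.choice(candidates[:rcl_size])     # SELECTINDEX
            u[i] = val                                        # ASSIGNVARIABLE
            lit = (i + 1) if val else -(i + 1)
            unsat -= {j for j in unsat if lit in clauses[j]}  # ADAPTGREEDYFUNCTION
        return u

Local search (as sketched earlier) would then be applied to the constructed assignment u.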
8 History-sensitive Heuristics
Different history-sensitive heuristics have been proposed to continue local
search schemes beyond local optimality. These schemes aim at intensifying
the search in promising regions and at diversifying the search into uncharted
territories by using the information collected from the previous phase (the
history) of the search. The history at iteration t is formally defined as the
set of ordered couples (U, s) such that 0 ≤ s ≤ t and U = U^(s).
Because of the internal feedback mechanism, some algorithm parameters
can be modified and tuned in an on-line manner, to reflect the characteristics
of the task to be solved and the local properties of the configuration space
in the neighborhood of the current point. This tuning has to be contrasted
with the off-line tuning of an algorithm, where some parameters or choices
are determined for a given problem in a preliminary phase and they remain
fixed when the algorithm runs on a specific instance.

8.1 Prohibition-based Search: TS and SAMD


Tabu Search (TS) is a history-sensitive heuristic proposed by F. Glover [38]
and, independently, by Hansen and Jaumard, who used the term SAMD
("steepest ascent mildest descent") and applied it to the MAX-SAT prob-
lem in [48]. The main mechanism by which the history influences the search
in TS is that, at a given iteration, some neighbors are prohibited: only a
non-empty subset N_A(U^(t)) ⊆ N(U^(t)) of them is allowed. The general way
of generating the search trajectory that we consider is given by:

    N_A(U^(t)) = ALLOW( N(U^(t)), U^(0), ..., U^(t) )               (21)

    U^(t+1) = BEST-NEIGHBOR( N_A(U^(t)) )                           (22)

The set-valued function ALLOW selects a non-empty subset of N(U^(t))
in a manner that depends on the entire previous history of the search
U^(0), ..., U^(t). Let us note that worsening moves can be produced by eqn. 22,
as it must be in order to exit local optima.
The introduction of the algorithm SAMD is motivated in [48] by contrasting
the technique with Simulated Annealing (SA) [60] for maximization. The
directions of local changes are little explored by SA: for example, if the
objective function increases, the change is always accepted, however small
it may be. On the contrary, it is desirable to exploit the information on
the direction of steepest ascent and yet to retain the property of not being
blocked at the first local optimum found. SAMD performs local changes in
the direction of steepest ascent until a local optimum is encountered; then a
local change along the direction of mildest descent takes place and the reverse
move is forbidden for a given number of iterations to avoid cycling with
high probability. The details of the SAMD technique as well as additional
specialized devices for detecting and breaking cycles are outlined in [48].
A computational comparison with SA and with Johnson's two algorithms
is also presented. A specialized Tabu Search heuristic is used in [51] to
speed up the search for a solution (if the problem is satisfiable) as part of a
branch-and-bound algorithm for SAT that adopts both a relaxation and a
decomposition scheme by using polynomially solvable instances, i.e., 2-SAT
and Horn SAT.

8.2 HSAT and "clause weighting"


In addition to the already cited SAMD [48] heuristic, which uses temporary
prohibitions of recently executed moves, let us mention two variations of
GSAT that make use of the previous history.
HSAT [37] introduces a tie-breaking rule into GSAT: if more moves pro-
duce the same (best) Δf, the preferred move is the one that has not been
applied for the longest span. HSAT can be seen as a "soft" version of Tabu
Search: while TS prohibits recently-applied moves, HSAT discourages recent
moves if the same Δf can be obtained with moves that have been "inac-
tive" for a longer time. It is remarkable to see how this innocent variation
of GSAT can increase its performance on some SAT benchmark tasks [37].
Clause-weighting has been proposed in [79] in order to increase the ef-
fectiveness of GSAT for problems characterized by strong asymmetries. In
this algorithm a positive weight is associated with each clause to determine
how often the clause should be counted when determining which variable to
flip. The weights are dynamically modified during problem solving and the
qualitative effect is that of "filling in" local optima while the search pro-
ceeds. Clause-weighting can be considered as a "reactive" technique where
a repulsion from a given local optimum is generated in order to induce an
escape from a given attraction basin.

8.3 Reactive Search


Different methods to generate prohibitions produce qualitatively different
search trajectories, i.e., sequences of visited configurations U^(t). In partic-
ular, prohibitions based on a list of moves lead to a faster escape from a
locally optimal point than prohibitions based on a list of visited configura-
tions [11]. In this method, prohibitions are determined by the last moves
applied. In detail, the ALLOW function can be specified by introducing a
prohibition parameter T (also called list size) that determines how long a
move will remain prohibited after its execution. The FIXED-TS algorithm
is obtained by fixing T throughout the search [38]. A neighbor is allowed if
and only if it is obtained from the current point by applying a move that
has not been used during the last T iterations. In detail, if LASTUSED(μ)
is the last usage time of move μ (LASTUSED(μ) = −∞ at the beginning):

    N_A(U^(t)) = { U = μ U^(t) such that LASTUSED(μ) < (t − T) }    (23)
The Reactive Tabu Search algorithm [14], REACTIVE-TS for short, de-
fines simple rules to determine the prohibition parameter by reacting to
the repetition of previously-visited configurations. One has a repetition if
U^(t+R) = U^(t), for R ≥ 1. The prohibition period T depends on the iteration
t (therefore the notation is T^(t)), and the discrete dynamical system that
generates the search trajectory comprises an additional evolution equation
for T^(t), that is specified through the function REACT, see eqn. 24 below.
The dynamical system becomes:

    T^(t) = REACT( T^(t−1), U^(0), ..., U^(t) )                         (24)

    N_A(U^(t)) = { U = μ U^(t) such that LASTUSED(μ) < (t − T^(t)) }    (25)

    U^(t+1) = BEST-NEIGHBOR( N_A(U^(t)) )                               (26)
While the reader is referred to [14] for the details, the design principles of
REACTIVE-TS are that T^(t) (in the range 1 ≤ T^(t) ≤ n − 2) increases when
repetitions happen, and decreases when repetitions disappear for a suffi-
ciently long search period. For convenience, let us introduce a "fractional
prohibition" T_f, such that the prohibition is obtained by setting T = ⌊T_f n⌋.
T_f ranges between zero and one, with bounds inherited from those on T.
Larger T values imply larger diversification; in particular the relationship
between T and the diversification is as follows:

• The Hamming distance H between a starting point and successive
points along the trajectory is strictly increasing for T + 1 steps:

    H(U^(t+τ), U^(t)) = τ    for τ ≤ T + 1

• The minimum repetition interval R along the trajectory is 2(T + 1):

    U^(t+R) = U^(t)  ⇒  R ≥ 2(T + 1)

REACTIVE-TS has been applied to various problems with competitive
performance with respect to alternative heuristics like FIXED-TS, SA, Neu-
ral Networks, and Genetic Algorithms; see the review in [11].
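The prohibition mechanism of eqn. 23 amounts to a few lines of bookkeeping, sketched below with a random stand-in for BEST-NEIGHBOR (an illustration with hypothetical names, not the implementation of [14]):

    import random

    n, T = 20, 5
    last_used = [float('-inf')] * n      # LASTUSED(mu_i) = -infinity at the start
    for t in range(100):
        allowed = [i for i in range(n) if last_used[i] < t - T]   # eqn. 23
        i = random.choice(allowed)       # stand-in for BEST-NEIGHBOR on N_A
        last_used[i] = t                 # mu_i is now prohibited for T iterations

Since at most T distinct moves can have been used in the last T iterations, the allowed set is never empty as long as T < n.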
8.3.1 The Hamming-Reactive Tabu Search (H-RTS) algorithm


An algorithm that combines the previously described techniques of local
search (oblivious and non-oblivious), the use of prohibitions (see TS and
SAMD), and a reactive scheme to determine the prohibition parameter is
presented in [12]. The algorithm is called HAMMING-REACTIVE-TS, and
its core is illustrated in Fig. 20.
HAMMING-REACTIVE-TS
1   repeat
2       t_r ← t
3       U ← random truth assignment
4       T ← ⌊T_f n⌋
5       repeat { NOB local search }
6           U ← BEST-MOVE(LS, f_NOB)
7       until largest Δf_NOB = 0
8       repeat
9           repeat { local search }
10              U ← BEST-MOVE(LS, f_OB)
11          until largest Δf_OB = 0
12          U_I ← U
13          for 2(T + 1) iterations { reactive tabu search }
14              U ← BEST-MOVE(TS, f_OB)
15          U_F ← U
16          T_f ← REACT(T_f, U_F, U_I)
17      until (t − t_r) > 10 n
18  until solution is acceptable or max. number of iterations reached

Figure 20: The H-RTS algorithm.

The initial truth assignment is generated in a random way, and non-
oblivious local search (LS-NOB) is applied until the first local optimum
of f_NOB is encountered. LS-NOB obtains local optima of better average
quality than LS-OB, but then the guiding function becomes the standard
oblivious one. This choice was motivated by the success of the NOB&OB
combination [13] and by the poor diversification properties of NOB alone,
see [12].

The search proceeds by repeating phases of local search followed by
phases of TS (lines 8-17 in Fig. 20), until a suitable number of iterations
has been accumulated after starting from the random initial truth assignment
(see line 17 in Fig. 20). A single elementary move is applied at each iteration.
The variable t, initialized to zero, identifies the current iteration and increases
after each local move is applied, while t_r identifies the iteration when the last
random assignment was generated. Some trivial bookkeeping details (like
the increase of t) are not shown in the figure.
During each combined phase, first a local optimum of f is reached,
then 2(T + 1) moves of Tabu Search are executed. The design principle
underlying this choice is that prohibitions are necessary for diversifying the
search only after LS reaches a local optimum. The fractional prohibition
T_f is changed during the run by the function REACT to obtain a proper
balance of diversification and bias [12].
The random restart executed after 10 n moves guarantees that the search
trajectory is not confined in a localized portion of the search space.
Being a heuristic algorithm, H-RTS has no natural termination cri-
terion. In its practical application, the algorithm is therefore run until
either the solution is acceptable, or a maximum number of moves (and
therefore of CPU time) has elapsed. What is demonstrated in the compu-
tational experiments in [12] is that, given a fixed number of iterations,
HAMMING-REACTIVE-TS achieves much better average results than
competitive algorithms (GSAT and GSAT-WITH-WALK). Because, to a
good approximation, the actual running time is proportional to the number
of iterations, HAMMING-REACTIVE-TS should therefore be used to obtain
better approximations in a given allotted number of iterations, or equivalent
approximations in a much smaller number of iterations.

9 Experimental analysis and threshold effects


Given the hardness of the problem and its relevance for applications in
different fields, the emphasis on the experimental analysis of algorithms for
the MAX-SAT problem has been growing in recent years.
In some cases the experimental comparisons have been executed in the
framework of "challenges," with support for the electronic collection and
distribution of software, problem generators and test instances. An example is
the Second DIMACS Algorithm Implementation Challenge on Cliques,
Coloring and Satisfiability, whose results have been published in [53]. The
archive is currently available from:

    http://dimacs.rutgers.edu/Challenges/

Practical and industrial MAX-SAT problems and benchmarks, with signifi-
cant case studies, are also presented in [30], see also the contained review [45].

9.1 Models
Let us describe some basic problem models that are considered both in
theoretical and in experimental studies of MAX-SAT algorithms [45].

• k-SAT model, also called the fixed length clause model. A randomly
generated CNF formula consists of independently generated random
clauses, where each clause contains exactly k literals. Each literal
is chosen uniformly from U = {u_1, ..., u_n} without replacement, and
negated with probability p. The default value for p is 1/2.

• average k-SAT model, also called the random clause model. A ran-
domly generated CNF formula consists of independently generated
random clauses. Each literal has a probability p of being part of a
clause. In detail, each of the n variables occurs positively with proba-
bility p(1 − p), negatively with probability p(1 − p), both positively and
negatively with probability p², and is absent with probability (1 − p)².

Both models have many variations depending on whether the clauses are
required to be different, whether a variable and its negation can be present
in the same clause, etc.
Although superficially similar, the two models differ in the difficulty of
solving the obtained formulae and in their mathematical analysis. In particular,
when the initial formula comes from the average k-SAT model, a step that
fixes the value of a variable produces a set of clauses from the same model,
while if the same step is executed in the k-SAT model, the resulting clauses
do not necessarily have the same length and therefore do not come from the
k-SAT model.
Other structured problem models are derived from the mapping of in-
stances of different problems, like coloring, n-queens, etc. [43]. The per-
formance of algorithms on these more structured models tends to have little
correlation with the performance tested on the above introduced random
problems. Unfortunately, the theoretical analysis of these more structured
problems is very hard. The situation worsens if one considers "real-world"
practical applications, where one is typically confronted with a few instances
and little can be derived about the "average" performance, both because the
probability distribution is not known and because the number of instances
tends to be very small.
A compromise can be reached by having parametrized generators that
capture some of the relevant structure of the "real-world" problems of in-
terest.

9.2 Hardness and threshold effects


Different algorithms demonstrate a different degree of effort, measured by
the number of elementary steps or by CPU time, when solving different kinds of
instances. For example, Mitchell et al. [67] found that some distributions
used in past experiments are of little interest because the generated formulae
are almost always very easy to satisfy. They also reported that one can
generate very hard instances of k-SAT, for k ≥ 3. In addition, they report
the following observed behavior for random fixed length 3-SAT formulae:
if r is the ratio of clauses to variables (r = m/n), almost all formulae
are satisfiable if r < 4, and almost all formulae are unsatisfiable if r > 4.5. A
rapid transition seems to appear for r ≈ 4.2, the same point where the
computational complexity of solving the generated instances is maximized;
see [61, 26] for reviews of experimental results.
A series of theoretical analyses aims at approximating the unsatisfiability
threshold of random formulae. Let us define the notation and summarize
some of the results obtained.
Let C be a random k-SAT formula. The research problem that has
been considered, see for example [62], is to compute the least real number
κ such that, if r is larger than κ, then the probability of C being satisfi-
able converges to 0 as n tends to infinity. In this case one says that C is
asymptotically almost certainly unsatisfiable. Experimentally, κ is a thresh-
old value marking a "sudden" change from probabilistically certain satisfi-
ability to probabilistically certain unsatisfiability. More precisely [1], given
a sequence of events C_n, one says that C_n occurs almost surely (a.s.) if
lim_{n→∞} Pr[C_n] = 1, where Pr[event] denotes the probability of an event.
The behavior observed in experiments with random k-SAT leads to the fol-
lowing conjecture:
For every k ≥ 2, there exists r_k such that for any ε > 0, random instances of
k-SAT with (r_k − ε)n clauses are a.s. satisfiable and random instances with
(r_k + ε)n clauses are a.s. unsatisfiable.
For k = 2 (i.e., for the polynomially solvable 2-SAT) the conjecture
was proved [23, 41], in fact showing that r_2 = 1. For k = 3 much less
progress has been made: neither the existence of r_3 nor its value has been
determined.
In the fixed-length 3-SAT model, the total number of all possible clauses
is 8 C(n, 3) and the probability that a random clause is satisfied by a truth
assignment U is 7/8.
Let U_n be the set of all truth assignments on n variables, and let S_n be
the set of assignments that satisfy the random formula C. Therefore the
cardinality |S_n| is a random variable. Given C, let |S_n(C)| be the number
of assignments satisfying C.

E [ISn!] = I.)Pr [C) ISn(C)1) (27)


C
The probability that a random formula is satisfiable is:

Pr [the random formula is satisfiable] = I)Pr [C) IC) (2S)


C
where IC is 1 if C is satisfiable, 0 otherwise.
From equations (27) and (28) the following Markov's inequality follows:

Pr [the random formula is satisfiable] ~ E [lSnl] (29)

Let us now consider the "first moment" argument to obtain an upper
bound for κ in the 3-SAT model. First one observes that the expected
number of truth assignments that satisfy C is 2^n (7/8)^(rn); then one lets this
expected value converge to zero and uses the above Markov inequality.
From this one obtains

    κ ≤ log_{8/7} 2 ≈ 5.191

This result has been found independently by many people, including [33]
and [24].
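Spelling out the computation behind this bound (a restatement of the argument above): the expectation vanishes exactly when the per-variable factor drops below one,

    E[|S_n|] = 2^n (7/8)^(rn) = [ 2 (7/8)^r ]^n ,

which tends to 0 if and only if 2 (7/8)^r < 1, i.e., if and only if r > log_{8/7} 2 ≈ 5.191.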
The weakness of the above technique is that, in the right-hand side of
equation (27), one can have small probabilities multiplied by large cardinali-
ties; therefore the condition may be unnecessarily strong to ensure only that
C is almost certainly satisfiable. In [62], instead of considering the random
class S_n, which may have a large cardinality for a formula with small proba-
bility, one considers a subset of it obtained by selecting truth assignments
that satisfy a local maximality condition. In particular, one considers the
subset S'_n defined as the random class of assignments U satisfying C such
that any assignment obtained from U by changing exactly one false value
of U to true does not satisfy C.
It is demonstrated in the cited paper that the expected value E[|S'_n|]
is at most (7/8)^(rn) (2 − e^(−3r/7) + o(1))^n. It follows that the unique positive
solution of the equation

    (7/8)^r (2 − e^(−3r/7)) = 1

is an upper bound for κ. This solution is less than 4.667. Better bounds can
be obtained by increasing the range of locality when selecting the local max-
ima that represent S_n. A previous best bound of 4.758 had been obtained
in [55] by non-elementary means. Independently, Dubois and Boufkhad [31]
obtained an upper bound of 4.64.
Unlike the upper bounds, which are based on probabilistic counting argu-
ments, all known lower bounds for r_3 are algorithmic. The UNIT CLAUSE
algorithm for 3-SAT is considered in [20], where it is shown that, for r < 2.9
or r < 8/3, depending on the presence or absence of a "majority rule," it
finds a satisfying assignment with positive probability instead of a.s. (there-
fore this does not imply that r_3 ≥ 2.9). The PURE LITERAL algorithm
succeeds a.s. for r < 1.63, see [17]. A generalization of the analysis [34]
shows that the GUC algorithm succeeds a.s. for r < 3.003, giving the best
known lower bound for r_3. Additional recent developments are presented
in [1].

References
[1] D. Achlioptas, L.M. Kirousis, E. Kranakis, and D. Krizanc, Rigorous
results for random (2+p)-SAT, Proc. Work. on Randomized Algorithms
in Sequential, Parallel and Distributed Computing, Santorini, Greece,
1997.

[2] P. Alimonti, New local search approximation techniques for maximum


generalized satisfiability problems, Proc. Second Italian Conf. on Algo-
rithms and Complexity, Rome, 1994, pp. 40-53.

[3] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof


verification and hardness of approximation problems, Proc. 33rd Annual
IEEE Symp. on Foundations of Computer Science, IEEE Computer
Society, 1992, pp. 14-23.
[4] S. Arora and S. Safra, Probabilistic checking of proofs: a new charac-


terization of NP, Proc. 33rd Annual IEEE Symp. on Foundations of
Computer Science, IEEE Computer Society, 1992, pp. 2-13.
[5] T. Asano, Approximation algorithms for MAX-SAT: Yannakakis vs.
Goemans-Williamson, Proc. 3rd Israel Symp. on the Theory of Com-
puting and Systems, Ramat Gan, Israel, 1997, pp. 24-37.
[6] T. Asano, T. Ono, and T. Hirata, Approximation algorithms for the
maximum satisfiability problem, Proc. 5th Scandinavian Work. on Al-
gorithms Theory, 1996, pp. 110-111.
[7] P. Asirelli, M. de Santis, and A. Martelli, Integrity constraints in logic
databases, Journal of Logic Programming 3 (1985), 221-232.
[8] G. Ausiello, P. Crescenzi, and M. Protasi, Approximate solution of NP
optimization problems, Theoretical Computer Science 150 (1995), 1-55.
[9] G. Ausiello, A. D'Atri, and M. Protasi, Lattice theoretic properties of
NP-complete problems, Fundamenta Informaticae 4 (1981), 83-94.
[10] G. Ausiello and M. Protasi, Local search, reducibility and approxima-
bility of NP-optimization problems, Information Processing Letters 54
(1995),73-79.
[11] R. Battiti, Reactive search: Toward self-tuning heuristics, Modern
Heuristic Search Methods (V. J. Rayward-Smith, I. H. Osman, C. R.
Reeves, and G. D. Smith, eds.), John Wiley and Sons, 1996, pp. 61-83.
[12] R. Battiti and M. Protasi, Reactive search, a history-sensitive heuristic
for MAX-SAT, ACM Journal of Experimental Algorithmics 2 (1997),
no. 2, http://www.jea.acm.org/.
[13] ———, Solving MAX-SAT with non-oblivious functions and history-
based heuristics, Satisfiability Problem: Theory and Applications, DI-
MACS Series in Discrete Mathematics and Theoretical Computer Sci-
ence, no. 35, AMS and ACM Press, 1997.
[14] R. Battiti and G. Tecchiolli, The reactive tabu search, ORSA Journal
on Computing 6 (1994), no. 2, 126-140.
[15] C.E. Blair, R.G. Jeroslow, and J.K. Lowe, Some results and experiments
in programming for propositional logic, Computers and Operations Re-
search 13 (1986), no. 5, 633-645.
[16] M. Boehm and E. Speckenmeyer, A fast parallel sat solver - efficient


workload balancing, Annals of Mathematics and Artificial Intelligence
17 (1996),381-400.

[17] A. Broder, A. Frieze, and E. Upfal, On the satisfiability and maximum


satisfiability of random 3-CNF formulas, Proc. of the 4th Annual ACM-
SIAM Symp. on Discrete Algorithms, 1993.

[18] M. Buro and H. Kleine Buening, Report on a SAT competition, EATCS


Bulletin 49 (1993),143-151.

[19] S. Chakradar, V. Agrawal, and M. Bushnell, Neural net and boolean


satisfiability model of logic circuits, IEEE Design and Test of Computers
(1990),54-57.

[20] M.-T. Chao and J. Franco, Probabilistic analysis of two heuristics for
the 3-satisfiability problem, SIAM Journal on Computing 15 (1986),
1106-1118.

[21] J. Chen, D. Friesen, and H. Zheng, Tight bound on Johnson's algorithm


for MAX-SAT, Proc. 12th Annual IEEE Conf. on Computational Com-
plexity, Ulm, Germany, 1997, pp. 274-281.

[22] J. Cheriyan, W.H. Cunningham, L. Tuncel, and Y. Wang, A linear pro-
gramming and rounding approach to MAX 2-SAT, Proc. of the Second
DIMACS Algorithm Implementation Challenge on Cliques, Coloring
and Satisfiability (M. Trick and D.S. Johnson, eds.), DIMACS Series on
Discrete Mathematics and Theoretical Computer Science, no. 26, 1996,
pp. 395-414.

[23] V. Chvatal and B. Reed, Mick gets some (the odds are on his side),
Proc. 33th Ann. IEEE Symp. on Foundations of Computer Science,
IEEE Computer Society, 1992, pp. 620-627.

[24] V. Chvatal and E. Szemeredi, Many hard examples for resolution, Jour-
nal of the ACM 35 (1988), 759-768.

[25] S.A. Cook, The complexity of theorem-proving procedures, Proc. of the


Third Annual ACM Symp. on the Theory of Computing, 1971, pp. 151-
158.

[26] S.A. Cook and D.G. Mitchell, Finding hard instances of the satisfiabil-
ity problem: a survey, Satisfiability Problem: Theory and Applications
(D.-Z. Du, J. Gu, and P.M. Pardalos, eds.), DIMACS Series in Dis-
crete Mathematics and Theoretical Computer Science, vol. 35, AMS
and ACM Press, 1997.
[27] P. Crescenzi and A. Panconesi, Completeness in approximation classes,
Information and Computation 93 (1991), 241-262.
[28] M. Davis, G. Logemann, and D. Loveland, A machine program for
theorem proving, Communications of the ACM 5 (1962), 394-397.
[29] M. Davis and H. Putnam, A computing procedure for quantification
theory, Journal of the ACM 7 (1960), 201-215.
[30] D.-Z. Du, J. Gu, and P.M. Pardalos (Eds.), Satisfiability problem: Theory
and applications, DIMACS Series in Discrete Mathematics and Theo-
retical Computer Science, vol. 35, AMS and ACM Press, 1997.
[31] O. Dubois and Y. Boufkhad, A general upper bound for the satisfiability
threshold of random r-SAT formulas, Tech. report, LAFORIA, CNRS-
Univ. Paris 6, 1996.
[32] U. Feige and M.X. Goemans, Approximating the value of two prover
proof systems, with applications to MAX-2SAT and MAX-DICUT,
Proc. of the Third Israel Symp. on Theory of Computing and Systems,
1995, pp. 182-189.
[33] J. Franco and M. Paull, Probabilistic analysis of the Davis-Putnam pro-
cedure for solving the satisfiability problem, Discrete Applied Mathe-
matics 5 (1983), 77-87.
[34] A. Frieze and S. Suen, Analysis of two simple heuristics on a random
instance of k-SAT, Journal of Algorithms 20 (1996),312-355.
[35] H. Gallaire, J. Minker, and J. M. Nicolas, Logic and databases: a de-
ductive approach, Computing Surveys 16 (1984), no. 2, 153-185.
[36] I.P. Gent and T. Walsh, An empirical analysis of search in GSAT, Journal
of Artificial Intelligence Research 1 (1993), 47-59.
[37] ———, Towards an understanding of hill-climbing procedures for SAT,
Proc. of the Eleventh National Conf. on Artificial Intelligence, AAAI
Press / The MIT Press, 1993, pp. 28-33.
[38] F. Glover, Tabu search - part I, ORSA Journal on Computing 1 (1989),
no. 3, 190-260.
[39] M.X. Goemans and D.P. Williamson, New 3/4-approximation algo-
rithms for the maximum satisfiability problem, SIAM Journal on Dis-
crete Mathematics 7 (1994), no. 4, 656-666.

[40] , Improved approximation algorithms for maximum cut and


satisfiability problems using semidefinite programming, Journal of the
ACM 42 (1995), no. 6, 1115-1145.

[41] A. Goerdt, A threshold for unsatisfiability, Journal of Computer and


System Sciences 53 (1996),469-486.

[42] J. Gu, Efficient local search for very large-scale satisfiability problem,
ACM SIGART Bulletin 3 (1992), no. 1,8-12.

[43] , Global optimization for satisfiability (SAT) problem, IEEE


Transactions on Data and Knowledge Engineering 6 (1994), no. 3, 361-
381.

[44] J. Gu, Q.-P. Gu, and D.-Z.Du, Convergence properties of optimization


algorithms for the SAT problem, IEEE Transactions on Computers 45
(1996), no. 2, 209-219.

[45] J. Gu, P.W. Purdom, J. Franco, and B.W. Wah, Algorithms for the sat-
isfiability (SAT) problem: A survey, Satisfiability Problem: Theory and
Applications (D.-Z. Du, J. Gu, and P.M. Pardalos, eds.), DIMACS Se-
ries in Discrete Mathematics and Theoretical Computer Science, vol. 35,
AMS and ACM Press, 1997.

[46] J. Gu and R. Puri, Asynchronous circuit synthesis with boolean satis-


fiability, IEEE Transactions on Computer-Aided Design of Integrated
Circuits 14 (1995), no. 8, 961-973.

[47] P.L. Hammer, P. Hansen, and B. Simeone, Roof duality, complementa-


tion and persistency in quadratic 0-1 optimization, Mathematical Pro-
gramming 28 (1984), 121-155.

[48] P. Hansen and B. Jaumard, Algorithms for the maximum satisfiability


problem, Computing 44 (1990), 279-303.

[49] J.N. Hooker, Resolution vs. cutting plane solution of inference problems:
some computational experience, Operations Research Letters 7 (1988),
no. 1, 1-7.
[50] J. Hastad, Some optimal inapproximability results, Proc. 29th Annual
ACM Symp. on Theory of Computing, El Paso, Texas, 1997, pp. 1-10.

[51] B. Jaumard, M. Stan, and J. Desrosiers, Tabu search and a quadratic
relaxation for the satisfiability problem, Proc. of the Second DIMACS
Algorithm Implementation Challenge on Cliques, Coloring and Satisfi-
ability (M. Trick and D.S. Johnson, eds.), DIMACS Series on Discrete
Mathematics and Theoretical Computer Science, no. 26, 1996, pp. 457-
477.

[52] D.S. Johnson, Approximation algorithms for combinatorial problems,


Journal of Computer and System Sciences 9 (1974),256-278.

[53] D.S. Johnson and M. Trick (Eds.), Cliques, coloring, and satisfiability:
Second DIMACS implementation challenge, vol. 26, DIMACS Series in
Discrete Mathematics and Theoretical Computer Science, no. 26, AMS,
1996.

[54] J.L. Johnson, A neural network approach to the 3-satisfiability problem,
Journal of Parallel and Distributed Computing 6 (1989), 435-449.

[55] A. Kamath, R. Motwani, K. Palem, and P. Spirakis, Tail bounds for oc-
cupancy and the satisfiability threshold conjecture, Random Structures
and Algorithms 7 (1995), 59-80.

[56] A.P. Kamath, N.K. Karmarkar, K.G. Ramakrishnan, and M.G.C. Re-
sende, Computational experience with an interior point algorithm on the
satisfiability problem, Annals of Operations Research 25 (1990), 43-58.

[57] , A continuous approach to inductive inference, Mathematical


programming 51 (1992),215-238.

[58] H. Karloff and U. Zwick, A 7/8-approximation algorithm for MAX
3SAT?, Proc. of the 38th Annual IEEE Symp. on Foundations of Com-
puter Science, IEEE Computer Society, 1997, in press.

[59] S. Khanna, R. Motwani, M. Sudan, and U. Vazirani, On syntactic versus
computational views of approximability, Proc. 35th Ann. IEEE Symp.
on Foundations of Computer Science, IEEE Computer Society, 1994,
pp. 819-836.

[60] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi, Optimization by sim-
ulated annealing, Science 220 (1983),671-680.
[61] S. Kirkpatrick and B. Selman, Critical behavior in the satisfiability of
random boolean expressions, Science 264 (1994), 1297-1301.
[62] L. M. Kirousis, E. Kranakis, and D. Krizanc, Approximating the un-
satisfiability threshold of random formulas, Proc. of the Fourth Annual
European Symp. on Algorithms (Barcelona), Springer-Verlag, Septem-
ber 1996, pp. 27-38.
[63] E. Koutsoupias and C.H. Papadimitriou, On the greedy algorithm for
satisfiability, Information Processing Letters 43 (1992),53-55.
[64] O. Kullmann and H. Luckhardt, Deciding propositional tautologies: Al-
gorithms and their complexity, Tech. Report 1596, Johann Wolfgang
Goethe-Univ., Fachbereich Mathematik, Frankfurt, Germany, January
1997.
[65] D.W. Loveland, Automated theorem proving: A logical basis, North-
Holland, 1978.
[66] S. Minton, M. D. Johnston, A. B. Philips, and P. Laird, Solving large-
scale constraint satisfaction and scheduling problems using a heuristic
repair method, Proc. of the 8th National Conf. on Artificial Intelligence
(AAAI-90), 1990, pp. 17-24.
[67] D. Mitchell, B. Selman, and H. Levesque, Hard and easy distributions of
SAT problems, Proc. of the 10th National Conf. on Artificial Intelligence
(AAAI-92) (San Jose, CA), July 1992, pp. 459-465.
[68] R. Motwani and P. Raghavan, Randomized algorithms, Cambridge Uni-
versity Press, New York, 1995.
[69] T.A. Nguyen, W.A. Perkins, T.J. Laffey, and D. Pecora, Checking an
expert system knowledge base for consistency and completeness, Proc.
of the International Joint Conf. on Artificial Intelligence (Los Altos,
CA), 1985, pp. 375-378.
[70] P. Nobili and A. Sassano, Strengthening Lagrangian bounds for the
MAX-SAT problem, Tech. Report 96-230, Institut fuer Informatik, Koln
Univ., Germany, 1996. Proc. of the Work. on the Satisfiability Problem,
Siena, Italy (J. Franco, G. Gallo, and H. Kleine Buening, Eds.).
[71] P. Orponen and H. Mannila, On approximation preserving reductions:
complete problems and robust measures, Tech. Report C-1987-28, Dept.
of Computer Science, Univ. of Helsinki, 1987.
[72] C.H. Papadimitriou, On selecting a satisfying truth assignment (ex-
tended abstract), Proc. of the 32nd Annual Symp. on Foundations of
Computer Science, 1991, pp. 163-169.

[73] C.H. Papadimitriou and K. Steiglitz, Combinatorial optimization, al-


gorithms and complexity, Prentice-Hall, NJ, 1982.

[74] R. Puri and J. Gu, A BDD SAT solver for satisfiability testing: an
industrial case study, Annals of Mathematics and Artificial Intelligence
17 (1996), no. 3-4, 315-337.

[75] M.G.C. Resende and T.A. Feo, A GRASP for satisfiability, Proc. of the
Second DIMACS Algorithm Implementation Challenge on Cliques, Col-
oring and Satisfiability (M. Trick and D.S. Johnson, eds.), DIMACS Se-
ries on Discrete Mathematics and Theoretical Computer Science, no. 26,
1996, pp. 499-520.

[76] M.G.C. Resende, L.S. Pitsoulis, and P.M. Pardalos, Approximate solu-
tion of weighted MAX-SAT problems using GRASP, Satisfiability Prob-
lem: Theory and Applications, DIMACS Series in Discrete Mathemat-
ics and Theoretical Computer Science, no. 35, 1997.

[77] J.A. Robinson, A machine-oriented logic based on the resolution prin-
ciple, Journal of the ACM 12 (1965), 23-41.

[78] B. Selman and H. Kautz, Domain-independent extensions to GSAT:


Solving large structured satisfiability problems, Proc. of the Interna-
tional Joint Conf. on Artificial Intelligence, 1993, pp. 290-295.

[79] B. Selman and H.A. Kautz, An empirical study of greedy local search
for satisfiability testing, Proc. of the 11th National Conf. on Artificial
Intelligence (AAAI-93) (Washington, D. C.), 1993.

[80] B. Selman, H.A. Kautz, and B. Cohen, Local search strategies for satisfi-
ability testing, Proc. of the Second DIMACS Algorithm Implementation
Challenge on Cliques, Coloring and Satisfiability (M. Trick and D.S.
Johnson, eds.), DIMACS Series on Discrete Mathematics and Theoreti-
cal Computer Science, no. 26, 1996, pp. 521-531.

[81] B. Selman, H. Levesque, and D. Mitchell, A new method for solving hard
satisfiability problems, Proc. of the 10th National Conf. on Artificial
Intelligence (AAAI-92) (San Jose, CA), July 1992, pp. 440-446.
148 R. Battiti and M. Prota.si

[82] W.M. Spears, Simulated annealing for hard satisfiability problems,


Proc. of the Second DIMACS Algorithm Implementation Challenge on
Cliques, Coloring and Satisfiability (M. Trick and D. S. Johnson, eds.),
DIMACS Series on Discrete Mathematics and Theoretical Computer
Science, no. 26, 1996, pp. 533-555.
[83] L. Trevisan, Approximating satisfiable satisfiability problems, Proc. of
the 5th Annual European Symp. on Algorithms, Graz, Springer Verlag,
1997, pp. 472-485.
[84] M. Yannakakis, On the approximation of maximum satisfiability, Journal of Algorithms 17 (1994), 475-502.
HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 149-188
©1998 Kluwer Academic Publishers

Connections between Nonlinear Programming and Discrete Optimization¹

Franco Giannessi
Department of Mathematics,
Università di Pisa
Via Buonarroti 2, 56127 Pisa, Italy
E-mail: gianness@dm.unipi.it

Fabio Tardella
Department of Mathematics
Faculty of Economics
University of Rome "La Sapienza"
Via del Castro Laurenziano 9, 00161 Roma, Italy
E-mail: tardella@ime.pi.cnr.it

Contents

1 Introduction 150
2 Equivalence, via Relaxation-Penalization, of two Constrained Extremum Problems 152
3 Equivalence between Integer and Real Optimization 157
4 Functions attaining the Minimum on the Vertices of a Polyhedron 161
   4.1 The Box Case 166
   4.2 The Polymatroid Case 167
5 Piecewise Concave Functions and Minimax 168
6 Location Problems 172
7 Convex-Concave Problems 176
8 Discretization, Extension and Relaxation 178
9 Necessary and Sufficient Optimality Conditions 180
10 Conclusions and Remarks on Further Developments 181
References

¹Sections 2 and 3 are due to F. Giannessi; all the other sections are due to F. Tardella.

7 Convex-Concave Problems 176

8 Discretization, Extension and Relaxation 178

9 Necessary and Sufficient Optimality Conditions 180

10 Conclusions and Remarks on Further Developments 181

References

1 Introduction

Given a set X, a function f : X → R and a subset S of X, we consider the problem:

min f(x)  s.t.  x ∈ S.   (1)
Problem (1) is usually called a combinatorial optimization problem when
S is finite and a discrete optimization problem when the points of S are
isolated in some topology, i.e., every point of S has a neighbourhood which
does not contain other points of S. Obviously, all combinatorial optimization
problems are also discrete optimization problems but the converse is not
true. A simple example is the problem of minimizing a function on the set
of integer points contained in an unbounded polyhedron.
Even though some discrete and combinatorial optimization problems
have been studied since ancient times, the increase of their importance and
their development have been very fast in the last few decades thanks to the
possibility of practically solving them with modern computers and because
of the several applications that they have found in many fields.
In most cases, discrete and combinatorial optimization can be formulated
as linear or nonlinear integer programs and are solved by means of methods
which exploit in a crucial way the finiteness or discreteness of the feasible
region.
In the case where the feasible set S is defined by a family of equalities and
inequalities in R^n and at least one of the constraining or objective functions
is nonlinear, problem (1) is called a nonlinear program. Problems of this
kind have been studied for at least four centuries with tools that exploit the
differential, geometric and topological properties of the functions and sets
involved.

Although the methods employed in integer and nonlinear programming


are quite different in general, they can be complementary in several cases.
In fact, it is often possible to restrict the search for a global solution of a
nonlinear program to a finite set of points. On the other hand, most dis-
crete optimization problems can be reformulated equivalently as nonlinear
programs. Clearly, the restriction to a finite set of points or the nonlinear
reformulation do not always lead to practical solution methods. Neverthe-
less, these approaches provide useful tools for many important classes of
problems, as will be shown in the sequel.

It should be clear that we do not attempt here to make an exhaus-


tive survey of all connections between discrete optimization and nonlinear
programming. We do not cover, e.g., the many types of continuous relax-
ation introduced for integer programming - including the recent and very
promising semidefinite relaxation (see, e.g., [1, 55, 56, 70]) - nor the several
continuous approaches to graph problems already described in [53].

In Section 2 we present a general result on the equivalence between the


problem of minimizing a function on a set and the problem of minimizing
a penalized function on a larger set. This result is then specialized in Sec-
tion 3 and used to reformulate integer programs as concave programs or
as linear complementarity problems. In Section 4 we address the problem
of establishing conditions under which a function achieves its minimum on
the vertices of a polyhedron. The class of piecewise concave functions is
introduced in Section 5 where it is proved that such functions achieve their
minimum on a set of points that is often finite. This result is then applied
to some minimax problems thereby strengthening known recent results. Lo-
cation problems are discussed in Section 6. It is shown that the optimal
solutions for many continuous problems of this type are attained in a fi-
nite subset of the feasible set. In Section 7 we consider a generalization
of bilinear programming problems. Extension and relaxation of a function
from a discrete set to a larger continuous set are briefly discussed in Sec-
tion 8 together with the opposite approach of discretizing a function defined
on a polyhedron. Finally, in Section 9, some connections between global
optimality conditions for continuous and discrete problems are presented.

2 Equivalence, via Relaxation-Penalization, of two Constrained Extremum Problems

In this section we will consider a constrained extremum problem P in the following format:

min f(x),  s.t.  x ∈ R ∩ Z,   (2)

where f : R^n → R, R ⊆ R^n, Z ⊂ R^n. Assume we are given a set X ⊂ R^n such that Z ⊆ X. Replacement of R ∩ Z with R ∩ X is known as relaxation of (2); it leads to a lower bound for the minimum of problem (2), which generally does not equal the minimum. Equality between them may be forced by means of a suitable penalization of the objective function of (2). To this end let us introduce a function φ : R^n → R, and consider the family {P(μ)}_{μ∈R} of problems, where P(μ) is defined as follows:

min [f(x) + μφ(x)],  s.t.  x ∈ R ∩ X,   (3)

and shows, with respect to (2), a relaxation of the feasible region and a penalization of the objective function.
We want to state conditions under which (2) and (3) are equivalent in the sense that they have the same minimum (or infimum; +∞, if R ∩ Z = ∅) and the same set of minimum points. If no assumption is made on φ and if f is bounded, then the answer is trivial: it is sufficient to choose

φ(x) ≜ 0 if x ∈ Z,  φ(x) ≜ 1 if x ∈ X\Z,

and

μ > sup_{x∈X} f(x) − inf_{x∈X} f(x)

to guarantee the equivalence between (2) and (3). Indeed, with the above choice of φ, the equivalence holds iff

μ > inf_{x∈R∩Z} f(x) − inf_{x∈R∩(X\Z)} f(x).

Of course, the function φ above is discontinuous; thus the equivalence is of not much interest. The following theorem [23] gives a condition under which the above equivalence is achieved within the class of continuous functions φ. Figs. 1 and 2 illustrate the sets which appear in Theorem 2.1 by means of two examples.

[Fig. 1 and Fig. 2: two sketches of sets R, Z and X ⊇ Z, and of the intersection R ∩ Z, illustrating the hypotheses of Theorem 2.1.]

Theorem 2.1 Let R ⊆ R^n be closed, Z ⊆ X ⊂ R^n, Z and X be compact, and let the following hypotheses hold:

(H1) f : R^n → R is bounded on X, and there exist an open set A ⊃ Z and real numbers α, L > 0 such that, ∀x, y ∈ A, f fulfils the following Hölder condition:

|f(x) − f(y)| ≤ L ‖x − y‖^α.

(H2) It is possible to find φ : R^n → R such that:
(i) φ is continuous on X;
(ii) φ(x) = 0 ∀x ∈ Z, φ(x) > 0 ∀x ∈ X\Z;
(iii) ∀z ∈ Z, there exist a neighbourhood δ(z) of z and a real ε(z) > 0 such that:

φ(x) ≥ ε(z) ‖x − z‖^α,  ∀x ∈ δ(z) ∩ (X\Z).

Then, a real μ_0 exists such that ∀μ > μ_0 problems (2) and (3) are equivalent.

Proof. To prove the thesis we will show that ∃μ_0 ∈ R such that ∀μ > μ_0 the minimum of f(x) + μφ(x) on R ∩ X is achieved necessarily at a point z ∈ R ∩ Z. Since φ(z) = 0 for every z ∈ R ∩ Z, we can then conclude that the solution sets of problems (2) and (3) are the same whenever μ > μ_0. Let us introduce the sets X̄ ≜ R ∩ X and Z̄ ≜ R ∩ Z. It will be shown that the function

F_z(x) = [f(z) − f(x)] / φ(x),  x ∈ A ∩ (X̄\Z̄), z ∈ Z̄,

is bounded in some neighbourhood of z. To see this, consider δ̄(z) ≜ A ∩ δ(z). For every x ∈ δ̄(z) ∩ (X̄\Z̄) we have:

φ(x) ≥ ε(z) ‖x − z‖^α,  ε(z) > 0,
|f(x) − f(z)| ≤ L ‖x − z‖^α,

so that

|F_z(x)| ≤ L/ε(z) < +∞.

The family {δ̄(z), z ∈ Z̄} is obviously a cover of Z̄. Since Z is compact and Z̄ is a closed subset of Z, there is a finite subfamily, say {δ̄(z_i), i = 1,…,k}, which is a cover of Z̄. Consider the set:

S ≜ [∪_{i=1}^k δ̄(z_i)] ∩ X̄ ⊇ Z̄.

It is clear that μ > β ≜ max{L/ε(z_i), i = 1,…,k} implies:

f(x) + μφ(x) > f(z_i),  ∀x ∈ S\Z̄,  i = 1,…,k.   (4)

On the other side, the set X̄_0 ≜ X̄\S is compact, and we have:

X̄ = X̄_0 ∪ (S\Z̄) ∪ Z̄,  X̄_0 ∩ Z̄ = ∅,  X̄_0 ∩ (S\Z̄) = ∅,  (S\Z̄) ∩ Z̄ = ∅.

Besides, f is bounded on X̄ and thus

M_f ≜ inf_{x∈X̄} f(x) > −∞;

φ is continuous and positive on X̄_0, and thus

M_φ ≜ inf_{x∈X̄_0} φ(x) = min_{x∈X̄_0} φ(x) > 0.

Since f is bounded on X̄, we have:

λ_0 ≜ [sup_{x∈X̄} f(x) − M_f] / M_φ < +∞

and, of course, λ_0 ≥ 0. If λ > λ_0, then, by the definition of λ_0, for any z ∈ Z̄ and x ∈ X̄_0

f(x) + λφ(x) > M_f + λ_0 M_φ ≥ f(z).   (5)

Inequalities (4) and (5) hold if

μ > μ_0 ≜ max{β, λ_0}.

Therefore, f(x) + μφ(x) can have its minimum neither at a point in X̄_0, for this would not agree with (5), nor at one in S\Z̄, for this would not agree with (4). This completes the proof. □
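As a small illustration of the relaxation-penalization mechanism, the following Python sketch minimizes f + μφ over a grid; all data (f, Z, φ and the values of μ) are hypothetical and chosen only so that hypotheses (H1)-(H2) hold.

```python
# A minimal numerical illustration of Theorem 2.1; the data below (f, Z, X,
# phi, the grid and the mu values) are hypothetical and not from the chapter.
import math

Z = [0.0, 0.4, 1.0]                    # discrete feasible set Z inside X = [0,1]
X = [i / 1000 for i in range(1001)]    # grid standing in for the relaxed set

def f(x):
    return math.sin(5 * x) + 0.5 * x   # Lipschitz on [0,1], so (H1) holds with alpha = 1

def phi(x):
    return min(abs(x - z) for z in Z)  # continuous, zero exactly on Z: (H2) holds

for mu in (0.1, 1.0, 10.0):
    xmin = min(X, key=lambda x: f(x) + mu * phi(x))
    print(mu, xmin)                    # for large mu the penalized minimizer lies in Z
```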

In some applications of Theorem 2.1, where f(x) + μφ(x) cannot be convex, it is useful to be able to choose φ strictly concave; this happens, e.g., in the case of 0-1 programs, as will be shown later. An extensive treatment of both the theory and methods of concave minimization problems can be found in [4, 40, 54]. If f is not concave, then (3) may require the minimization of an indefinite form, which may be undesirable. The following theorem states a condition under which the objective function in (3) is strictly concave. Consider the case where

X = X_Q ≜ {x ∈ R^n : 0 ≤ x ≤ e},  φ(x) = x^T(e − x),

with e ≜ (1,…,1)^T. Such a special case is interesting because it represents the classic relaxation for 0-1 extremum problems.
Theorem 2.2 If f ∈ C²(X_Q) and Z ⊂ X_Q, then there exists a real μ_1 such that for all μ > μ_1 problems (2) and (3) with X = X_Q are equivalent and (3) has a strictly concave objective function.

Proof. Let H(x) and H̄(x) be the Hessian matrices of f and of f + μx^T(e − x), respectively. We have:

H̄(x) = H(x) − 2μ I_n.

Moreover, H is continuous and, because of a well known property², the same is true for its eigenvalues, say λ_1(x),…,λ_n(x); these are bounded since X is compact. Thus,

λ̄ ≜ max_{i=1,…,n} sup_{x∈X} |λ_i(x)| < +∞.

For every x ∈ X, ν is a (real) eigenvalue of the (symmetric) matrix H̄(x) iff det[H̄(x) − ν I_n] = det[H(x) − (2μ + ν) I_n] = 0. Thus, ν is an eigenvalue of H̄(x) iff λ ≜ ν + 2μ is an eigenvalue of H(x). If λ_0 and μ_0 are defined as in the proof of Theorem 2.1, and

μ > μ_1 ≜ max{½ λ̄, μ_0},

then (2) and (3) are equivalent and, furthermore,

ν = λ − 2μ ≤ λ̄ − 2μ < 0.

Hence, H̄(x) is negative definite. This completes the proof. □

The closedness of X_Q and the assumption f ∈ C²(X_Q) in Theorem 2.2 cannot be weakened, as the following example shows.

²Let M_n be the space of n × n real matrices, and let T : R^n → M_n be a continuous function. The function λ : R^n → C, where λ(y) is any eigenvalue of T(y), is continuous. This is a straightforward consequence of a known theorem on linear continuous operators in a normed space [15].

Example 2.3 Set n = 1, X_Q = [0,1], φ(x) = x(1 − x), and

f(x) = −x³ sin(1/x) if x ≠ 0,  f(0) = 0,

so that

f′(x) = −3x² sin(1/x) + x cos(1/x) if x ≠ 0,  f′(0) = 0;
f″(x) = (1/x − 6x) sin(1/x) + 4 cos(1/x) if x ≠ 0, undefined if x = 0;
f″(x) + μφ″(x) = (1/x − 6x) sin(1/x) + 4 cos(1/x) − 2μ if x ≠ 0, undefined if x = 0.

For every μ ∈ R_+\{0} it is possible to find x ∈ ]0,1] (close enough to zero) such that f″(x) + μφ″(x) > 0. Indeed, it is sufficient³ to choose x = 2/((8⌈μ⌉ + 13)π), where ⌈μ⌉ denotes the upper integer part of μ. Hence, f(x) + μφ(x) is not concave in ]0,1], even if f ∈ C¹([0,1]) and f has continuous second derivative in ]0,1].

³In this case the above inequality becomes π²(8⌈μ⌉ + 13)² − 4π(8⌈μ⌉ + 13)μ − 24 > 0, and is easily verified for ⌈μ⌉ ≥ 1.
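The threshold μ_1 of Theorem 2.2 can also be checked numerically, as in the following sketch for a quadratic f with a hypothetical Hessian G.

```python
# A numerical check of the threshold mu_1 of Theorem 2.2 for a quadratic f
# with constant Hessian G (a hypothetical instance): the Hessian of
# f + mu x^T(e - x) is G - 2 mu I, negative definite once mu > lambda_bar / 2.
import numpy as np

G = np.array([[2.0, -1.0],
              [-1.0, 3.0]])
lam_bar = max(abs(np.linalg.eigvalsh(G)))   # bound on the eigenvalues of H(x)

for mu in (0.4 * lam_bar, 0.6 * lam_bar):   # just below and above lambda_bar / 2
    print(mu, np.linalg.eigvalsh(G - 2 * mu * np.eye(2)))
# below the threshold one eigenvalue stays positive; above it, both are negative
```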

3 Equivalence between Integer and Real Optimization

We now consider a special case of (2) which, however, embraces most combinatorial extremum problems. Let

R = {x ∈ R^n : g(x) ≥ 0}  and  Z = B^n,   (6)

where g : R^n → R^m, B ≜ {0,1} and B^n ≜ B × … × B (n times). Thus, (2) becomes:

min f(x),  s.t.  g(x) ≥ 0,  x ∈ B^n.   (7)

The case where, in (7), x ∈ B^n is replaced by x ∈ Z^n can be reduced to (7) by means of well known devices like binary expansion. The natural relaxation of Z = B^n and the penalization when R is defined by (6) are the hypercube X_Q and φ(x) = x^T(e − x) of the preceding section, respectively; with this choice (3) becomes:

min [f(x) + μ x^T(e − x)],  s.t.  g(x) ≥ 0,  0 ≤ x ≤ e.   (8)

Theorem 2.1 becomes here:

Theorem 3.1 Let f verify assumption (H1) of Theorem 2.1 with α = 1, i.e., f is bounded on X_Q = {x ∈ R^n : 0 ≤ x ≤ e} and Lipschitz continuous on an open set A ⊃ Z = B^n. Then, there exists μ_0 ∈ R such that for every μ > μ_0 problems (7) and (8) are equivalent.

Proof. We only need to prove that φ(x) = x^T(e − x) satisfies assumption (H2) of Theorem 2.1. Note that (i) and (ii) are trivially true. We will now prove that (iii) holds with δ(z) = {x ∈ R^n : ‖x − z‖ ≤ ρ̄ < 1} and ε(z) = 1 − ρ̄. To see this, consider ρ ∈ R and u = (u_1,…,u_n) satisfying

ρ ≜ ‖x − z‖ ≤ ρ̄,  u ≜ (1/ρ)(x − z).

Then

φ(x) = Σ_{j=1}^n (ρu_j + z_j)(1 − z_j − ρu_j).   (9)

Since z + ρu = x ∈ X_Q, u_j > 0 implies z_j = 0 and u_j < 0 implies z_j = 1; therefore, from (9) we deduce:

φ(x) = Σ_{u_j>0} ρu_j(1 − ρu_j) + Σ_{u_j<0} (1 + ρu_j)(−ρu_j) = Σ_{j=1}^n ρ|u_j|(1 − ρ|u_j|) = ρ Σ_{j=1}^n |u_j| − ρ² Σ_{j=1}^n |u_j|².   (10)

Since

ρ ≤ ρ̄ < 1,  Σ_{j=1}^n |u_j|² = ‖u‖² = 1,  Σ_{j=1}^n |u_j| ≥ ‖u‖ = 1,

from (10) we obtain:

φ(x) ≥ ρ(1 − ρ̄) = ε(z) ‖x − z‖.

This completes the proof. □
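The following brute-force sketch, with hypothetical data, illustrates the penalization of Theorem 3.1 in the case R = R^n: a smooth f whose continuous minimizer is interior to the cube is driven to a 0-1 point by the term μ x^T(e − x).

```python
# A brute-force illustration of Theorem 3.1 with hypothetical data: a convex
# quadratic f with interior minimizer, penalized by mu * x^T(e - x); for mu
# large the grid minimizer is a 0-1 point matching the minimum of f over B^n.
import itertools

a = (0.2, 0.7, 0.5)
f = lambda x: sum((xi - ai) ** 2 for xi, ai in zip(x, a))
phi = lambda x: sum(xi * (1 - xi) for xi in x)

grid = [k / 10 for k in range(11)]
best01 = min(itertools.product((0, 1), repeat=3), key=f)
for mu in (0.0, 5.0):
    x = min(itertools.product(grid, repeat=3), key=lambda x: f(x) + mu * phi(x))
    print(mu, x, "0-1 optimum:", best01)
# mu = 0 returns the interior point a; mu = 5 returns a vertex attaining f(best01)
```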
When f is linear (or quadratic) and g affine, Theorem 3.1 states an equivalence between (7), called the 0-1 linear (or quadratic) programming problem, and (8), which, because of Theorem 2.2, is a strictly concave quadratic real program if μ is large enough. An analogous remark can be made in the more general case f ∈ C²(X_Q). This condition is not redundant, as may be shown by Example 2.3.
It is well known that the satisfiability problem in logic and many problems in Graph Theory can be formulated as 0-1 programming problems. Therefore, such problems can also be formulated as strictly concave programs in a continuous setting, as shown in Theorem 3.1. Other formulations as nonlinear programs for this type of problems can be found in [53].
When the equivalence between (7) and (8) holds, properties and methods valid for one of the two problems can be transferred to the other one. As an instance of this, consider the case

f(x) = c^T x + ½ x^T G x,  g(x) = Ax − b,

where b ∈ R^m, c ∈ R^n, A ∈ R^{m×n} and G ∈ R^{n×n} is a symmetric matrix.


Theorem 3.2 If μ ∈ R is large enough, then the 0-1 quadratic programming problem:

min (c^T x + ½ x^T G x),  s.t.  Ax ≥ b,  x ∈ B^n   (11)

is equivalent to the linear complementarity problem:

min ē^T ξ,  s.t.  Āξ + η = b̄,  ξ ≥ 0,  η ≥ 0,  ξ^T η = 0,   (12)

where ξ, η ∈ R^{2n+m},

ē^T ≜ ½ (c^T + μe^T, b^T, e^T),  b̄ ≜ (c + μe, −b, e)^T,

and

Ā ≜ ( −G + 2μ I_n   A^T   −I_n
       −A            0     0
       I_n           0     0 ).

Proof. Because of Theorem 3.1, whose hypotheses are trivially satisfied, (11) is equivalent to the quadratic problem:

min [(c^T + μe^T)x + ½ x^T(G − 2μ I_n)x],  s.t.  Ax ≥ b,  0 ≤ x ≤ e,   (13)

if μ is large enough. The well known Karush-Kuhn-Tucker necessary condition for (13) is:

c + μe + (G − 2μ I_n)x − A^T y + t − u = 0;   (14)
Ax − v = b;  x + w = e;  x, y, t, u, v, w ≥ 0;   (15)
x^T u = y^T v = t^T w = 0;   (16)

where y, t, u are the vectors of multipliers associated to the inequalities Ax ≥ b, x ≤ e, x ≥ 0, respectively, and v, w are slack variables. Solving (13) is equivalent to finding, among the solutions of the complementarity system (14)-(16) (stationary points), those which minimize the function in square brackets of (13). Such a function, evaluated at the stationary points, becomes linear. In fact, (14) implies

(G − 2μ I_n)x = −(c + μe) + A^T y − t + u;

from (16) we have:

x^T u = 0,  0 = y^T v = y^T Ax − y^T b,  0 = t^T w = t^T e − t^T x,

and therefore:

y^T b = y^T Ax,  t^T e = t^T x.

Now, to achieve the thesis it is sufficient to set

ξ^T = (x^T, y^T, t^T),  η^T = (u^T, v^T, w^T).

This completes the proof. □

Note that no assumption has been made on G, so that the convex case, as well as the nonconvex one, has been considered. See also [53] for a reduction of the mixed integer feasibility problem to a linear complementarity system.
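A sketch assembling the data of the linear complementarity problem (12) from a 0-1 quadratic instance follows; the numerical values are hypothetical, and the block structure is the one displayed in Theorem 3.2 above.

```python
# A sketch assembling the LCP data (e_bar, A_bar, b_bar) of Theorem 3.2 from
# (c, G, A, b, mu); the instance is hypothetical.
import numpy as np

n, m = 3, 2
G = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 2.0]])
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([-1.0, 2.0, -3.0])
mu = 10.0

I, e = np.eye(n), np.ones(n)
A_bar = np.block([[-G + 2 * mu * I, A.T,              -I              ],
                  [-A,              np.zeros((m, m)), np.zeros((m, n))],
                  [I,               np.zeros((n, m)), np.zeros((n, n))]])
b_bar = np.concatenate([c + mu * e, -b, e])
e_bar = 0.5 * np.concatenate([c + mu * e, b, e])
print(A_bar.shape, b_bar.shape, e_bar.shape)   # (2n+m, 2n+m), (2n+m,), (2n+m,)
# A pair (xi, eta) >= 0 with A_bar xi + eta = b_bar and xi^T eta = 0 encodes a
# stationary point of (13); minimizing e_bar^T xi over such pairs solves (11).
```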
4 Functions attaining the Minimum on the Vertices of a Polyhedron

In many cases it is possible to restrict the search for the global solution of a nonlinear program to a finite set of points. One way of doing this consists in considering only the set of points that satisfy some kind of first or second order necessary condition for local optimality. This set is often finite, but in general it is not practical to compute all its elements or to minimize the objective function over it. Another important case where the search for a solution of a nonlinear program can be restricted to a finite set is when at least one solution is guaranteed to be in the set of extreme points of the feasible region and such a set is finite. We recall here the standard definition of extreme point of a set and we introduce a slightly more restrictive property which will be used in Section 5.

Definition 4.1 Let X be a subset of R^n. A point x ∈ X is called an extreme point of X iff y, z ∈ X, α ∈ ]0,1[ and x = αy + (1 − α)z imply y = z. A point x ∈ X is called a convex hull extreme point of X iff it is an extreme point of the convex hull (denoted by co(X)) of X.

Note that the set of extreme points of a polyhedron coincides with the set of its vertices and is finite. Let E(X) denote the set of extreme points of a set X. Then the set of convex hull extreme points is given by E(co(X)). Since X ⊆ co(X), it follows from the above definition that E(co(X)) ⊆ E(X) ⊆ X.
It is well known (see, e.g., [58]) that the convex hull of a compact set is compact. Furthermore, the Krein-Milman Theorem states that for a nonempty convex compact set Y the equality Y = co(E(Y)) holds. Hence, for any nonempty compact set X one has

co(X) = co(E(co(X))) ⊆ co(E(X)) ⊆ co(X),

which trivially implies that

E(co(X)) ≠ ∅  and  co(E(co(X))) = co(E(X)).

One of the basic properties of Linear Programming, which underlies the validity of the Simplex Algorithm, is the following:

(A) If in (1) f attains its global minimum on S and S is a polyhedron, then at least one global minimum point must be in a vertex of S.
This property, which establishes an equivalence between a continuous problem and a combinatorial problem, actually holds also for classes of feasible sets S and of objective functions f that are considerably larger than the classes of polyhedra and of linear functions. Indeed, it is well-known (see, e.g., [37, 69]) that the same property holds for concave functions on closed convex sets and for quasi-concave functions on compact convex sets. Note however that Property (A) does not hold in general for quasi-concave functions on unbounded closed convex sets. A simple counterexample is provided by the quasi-concave (and convex) function f : X = [0, +∞[ → R defined by f(x) = −x for x ∈ [0,1], and f(x) = −1 for x ∈ ]1, +∞[.
We now state a definition of concavity and quasi-concavity for functions defined on a subset X of R^n without the usual assumption of convexity of X. This greater generality is required for the results of Section 5.

Definition 4.2 Let X be a subset of R^n. A function f : X → R is concave (resp. quasi-concave) on X iff for every x ∈ X and for every set of points {x^1,…,x^m} ⊆ X and of coefficients α_1,…,α_m ∈ ]0,1[ such that Σ_{i=1}^m α_i = 1 and x = Σ_{i=1}^m α_i x^i one has

f(x) ≥ Σ_{i=1}^m α_i f(x^i)   (resp. f(x) ≥ min_i f(x^i)).   (17)

A function is called strictly concave or strictly quasi-concave iff the above relations are satisfied with a strict inequality whenever x^i ≠ x for at least one index i.

Note that if the set X coincides with E(co(X)) (as, e.g., in the case of a circle), then every function is strictly concave on X with the above definition. However, when X is convex, Definition 4.2 is equivalent to the standard definition of concavity and strict concavity of a function.
In this section we restrict our attention to the minimization of a function on a polyhedron and, in order to avoid trivial cases, we also assume that all (unbounded) polyhedra have at least one vertex.
It is easily seen that the class of quasi-concave functions is the most general class of functions for which Property (A) holds for every bounded polyhedron S. Indeed, by applying Property (A) to the special polyhedron that coincides with the line segment joining two points x^1 and x^2, one trivially obtains the inequality defining quasi-concave functions.
For a given polyhedron, or class of polyhedra, it is however possible to find classes of functions which properly include quasi-concave functions and satisfy Property (A). We recall here some results obtained in [64] and extend them to the case of unbounded polyhedra and of polymatroids.
In the sequel we denote by ri X and rbd X the relative interior and the relative boundary of a set X ⊂ R^n, respectively, i.e., the interior and the boundary of X with respect to the topology induced on the smallest affine manifold containing X. Furthermore, we denote by dim(X) the dimension of the smallest affine manifold containing X.

Definition 4.3 Let X be a subset of R^n and let f be a function from X into R. We say that f satisfies the Weak Minimum Principle (WMP for short) on X iff, whenever x* ∈ X and f(x*) ≤ f(x) for every x ∈ X, either f(x) = f(x*) for every x ∈ X or x* ∈ rbd X.

Lemma 4.4 Let f be a real function on a polyhedron P and assume that f satisfies WMP on all faces of P belonging to a subset Ω of the set of all faces of P. Then, the set P_min of global minimum points of f over P satisfies the following relation (V_P denotes the set of vertices of P):

P_min ∩ (V_P ∪ ∪_{F∈Ω^c} F) ≠ ∅  ⟺  P_min ≠ ∅.   (18)

Proof. Assume that P_min ≠ ∅ and let F* denote a face of P of minimal dimension among the faces F satisfying the relation P_min ∩ F ≠ ∅. If F* ∈ Ω^c or dim(F*) = 0, then (18) holds. Hence assume that F* ∈ Ω and dim(F*) > 0, and let x ∈ P_min ∩ F*. By the WMP one can then find y ∈ rbd F* such that y ∈ P_min, contradicting the minimality assumption on F*. □

By choosing Ω equal to the set of all faces of dimension greater than m we obtain the following:

Theorem 4.5 If f satisfies WMP on all faces of P of dimension greater than m, then, if P_min ≠ ∅, at least one global minimum point lies in an m-dimensional face of P. In particular, if m = 0, then Property (A) holds.

Theorem 4.5 provides a fairly general condition for the validity of Property (A); however, in general it is not easy to check whether a function f satisfies WMP on some or all faces of P. The following corollaries provide useful sufficient conditions for this hypothesis to hold.
Corollary 4.6 ([30]) If f ∈ C²(P) and the Hessian matrix H_f(x) of f has at least n − m negative eigenvalues at any x ∈ P then, if P_min ≠ ∅, at least one global minimum point lies in an m-dimensional face of P.

Proof. The assumption implies that f cannot satisfy the second order necessary condition for optimality at any point in the relative interior of a face of dimension greater than m. Hence, WMP holds on all faces of P of dimension greater than m. □

Given a point s ∈ R^n define P_s(x) = {z ∈ P : z = x + λs, λ ∈ R}. Given a face F of P define also I_F = {h : s^h is parallel to some edge of F}. Furthermore, let H_1 = {1, 2, …, q_1}, H_2 = {q_1 + 1, …, q_1 + q_2} and H = H_1 ∪ H_2, and let {s^h : h ∈ H} be a set of vectors in R^n such that each bounded edge of P is parallel to some s^h with h ∈ H_1 and each unbounded edge of P is parallel to some s^h with h ∈ H_2.

Lemma 4.7 If P_min ≠ ∅, f is quasi-concave on P_{s^h}(x) for all h ∈ K_1 ⊆ H_1 and x ∈ P, and concave or strictly quasi-concave on P_{s^h}(x) for all h ∈ K_2 ⊆ H_2 and x ∈ P, then a global minimum point of f over P belongs to a face F satisfying I_F ∩ (K_1 ∪ K_2) = ∅.

Proof. Let Ω = {F : F is a face of P and I_F ∩ (K_1 ∪ K_2) ≠ ∅}. Then by Lemma 4.4 we only need to prove that f satisfies WMP on every face F ∈ Ω. To this end, assume that F ∈ Ω and x* ∈ F is a global minimum point for f over F. If x* ∈ rbd F, then WMP trivially holds on F. So suppose that x* ∈ ri F and let h ∈ I_F ∩ (K_1 ∪ K_2). Then x* ∈ ri P_{s^h}(x*). If h ∈ H_1, then P_{s^h}(x*) is bounded and its extreme points x^1 and x^2 belong to rbd F. Furthermore, quasi-concavity of f on P_{s^h}(x*) implies that either x^1 or x^2 is also a global minimum point. Hence, WMP holds. If h ∈ H_2, then P_{s^h}(x*) is unbounded. In this case let x^1 be the only extreme point of P_{s^h}(x*) (which exists because P has vertices) and take any point x^2 on P_{s^h}(x*) such that x* is in the interior of the line segment joining x^1 and x^2. If f is concave on P_{s^h}(x*), then f(x^1) = f(x*) = f(x^2), which establishes WMP. On the other hand, if f is strictly quasi-concave on P_{s^h}(x*), then one should have f(x*) > min{f(x^1), f(x^2)} ≥ f(x*), which is a contradiction. □

Since the only faces of P that satisfy I_F ∩ H = ∅ are the vertices of P, we immediately obtain the following:

Theorem 4.8 If f is quasi-concave on P_{s^h}(x) for all h ∈ H_1 and x ∈ P, and concave or strictly quasi-concave on P_{s^h}(x) for all h ∈ H_2 and x ∈ P, then Property (A) holds.

In order to apply the above proposition one needs to know the directions
of all the edges of P. When such directions are not explicitly known, one can
still guarantee the validity of Property (A) by making some more restrictive
assumptions on f.

Definition 4.9 Let f be a real function defined on a convex set X ⊂ R^n and let m ≤ n. We say that f is m-concave (m-quasi-concave) on X iff

f(αx^1 + (1 − α)x^2) ≥ αf(x^1) + (1 − α)f(x^2)
(resp. f(αx^1 + (1 − α)x^2) ≥ min{f(x^1), f(x^2)}),

for every α ∈ [0,1] and for every x^1, x^2 ∈ X such that x^1_i = x^2_i for at least n − m indices i in {1,…,n}.
Assume now that the polyhedron P is expressed in one of the following ways:

P = {x ∈ R^n : Ax = b, ℓ ≤ x ≤ u}
or   (19)
P = {x ∈ R^n : Ax ≤ b, ℓ ≤ x ≤ u},

where A ∈ R^{m×n}, b ∈ R^m, ℓ, u ∈ R^n, ℓ = (ℓ_1,…,ℓ_n), u = (u_1,…,u_n), −∞ ≤ ℓ_i < u_i ≤ +∞ and min{|ℓ_i|, |u_i|} < +∞. Note that the assumptions on ℓ and u imply that, if P ≠ ∅, then P has at least one vertex.

Theorem 4.10 Let P be a polyhedron defined by (19) and assume that A has rank m − 1. If f is m-concave on P, or f is m-quasi-concave on P and P is bounded, then Property (A) holds.

Proof. Let s = (s_1,…,s_n) be any vector parallel to an edge of P. Observe that every edge of P lies in the intersection of n − 1 linearly independent hyperplanes taken from among those defining P in (19). Since rank(A) = m − 1, we have s_i = 0 for at least n − m indices i. From the m(-quasi)-concavity of f we then derive that f is m(-quasi)-concave on the sets P_s(x) for every x ∈ P. Hence the conclusion follows from Theorem 4.8. □
4.1 The Box Case

We now consider the case where P is a box, i.e., it is defined by P = {x ∈ R^n : ℓ ≤ x ≤ u}, where ℓ, u ∈ R^n, ℓ = (ℓ_1,…,ℓ_n), u = (u_1,…,u_n), −∞ ≤ ℓ_i < u_i ≤ +∞ and min{|ℓ_i|, |u_i|} < +∞. In this case, it is easy to see that a set of vectors parallel to the edges of P is the canonical basis of R^n, denoted by e^1,…,e^n.
Consider the sets of indices J_1 = {i : ℓ_i = −∞}, J_2 = {i : u_i = +∞} and J = {1,…,n} \ (J_1 ∪ J_2), and apply the transformation x_i = (u_i − ℓ_i)y_i + ℓ_i for i ∈ J, x_i = u_i for i ∈ J_1 and x_i = ℓ_i for i ∈ J_2. The problem of minimizing f on the vertices V_P of P is then equivalent to the problem of minimizing the transformed function f̃(y) on the 0-1 hypercube B^ñ = {0,1}^ñ, where ñ = card(J). Hence, when Property (A) holds, the problem of minimizing f over a box can be reduced to a problem of (nonlinear) 0-1 programming, for which several solution methods are available (see, e.g., [31, 33]).
In [19] the equivalence between minimization on P and on V_P in the box case is exploited to efficiently perform a stability test for a system of linear differential equations with uncertain real parameters.
Taking into account the directions of the edges of P, Theorem 4.8 can be reformulated as follows:

Theorem 4.11 If f is quasi-concave with respect to each coordinate x_i such that i ∈ J, and concave or strictly quasi-concave with respect to each coordinate x_i such that i ∈ J_1 ∪ J_2, then Property (A) holds.
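A minimal sketch of this reduction follows; the box and the coordinate-wise concave function below are hypothetical.

```python
# A minimal sketch of the box reduction: under Theorem 4.11 the minimum over
# the box is attained at a vertex, so it can be found by 0-1 enumeration.
import itertools

l = (0.0, -1.0, 2.0)
u = (1.0, 3.0, 5.0)

def f(x):   # concave in each coordinate separately, so Theorem 4.11 applies
    return -(x[0] - 0.2) ** 2 - x[1] ** 2 - (x[2] - 4.0) ** 2

vertices = itertools.product(*zip(l, u))   # the 2^n vertices of the box
xstar = min(vertices, key=f)
print(xstar, f(xstar))
```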

By exploiting the above reduction to the 0-1 case, a function f can be efficiently minimized over a box P if it is submodular on V_P in addition to satisfying Property (A).
We recall that a subset X ⊂ R^n is a sublattice of R^n iff

x, y ∈ X  ⟹  x ∧ y ∈ X and x ∨ y ∈ X,

where

x ∧ y ≜ (min{x_1, y_1},…,min{x_n, y_n}),  x ∨ y ≜ (max{x_1, y_1},…,max{x_n, y_n}).

Note that a box is trivially a sublattice of R^n. Furthermore, a real-valued function g is submodular on a sublattice X iff

g(x ∧ y) + g(x ∨ y) ≤ g(x) + g(y),  ∀x, y ∈ X.
A well-known result in 0-1 optimization is the fact that the problem of minimizing a submodular function over the 0-1 hypercube can be solved in polynomial time [27].
In the case where f ∈ C² it has been proved [67] that f is submodular on a box Q iff the second order mixed partial derivatives f_{x_i x_j}(x), i ≠ j, are nonpositive for all x ∈ Q. Note that f is concave with respect to each coordinate iff f_{x_i x_i}(x) ≤ 0 for all i and x. Hence, by choosing Q equal to the convex hull of V_P we obtain the following:

Theorem 4.12 If f_{x_i x_i}(x) ≤ 0 for every index i and for all x ∈ P, and f_{x_i x_j}(x) ≤ 0 for all i, j ∈ J with i ≠ j, then the problem of minimizing f over P can be solved in polynomial time.
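The following sketch samples the lattice inequality for a hypothetical quadratic whose mixed second derivative is nonpositive, in line with the C² criterion of [67] quoted above.

```python
# A sampling check (hypothetical instance) of the lattice inequality
# g(x ^ y) + g(x v y) <= g(x) + g(y) for a quadratic with nonpositive
# mixed second derivative.
import random

def g(x):
    x1, x2 = x        # g_{x1 x2} = -2 <= 0, so g is submodular on the box
    return -0.5 * x1 * x1 - x2 * x2 - 2.0 * x1 * x2 + x1 + 3.0 * x2

random.seed(1)
for _ in range(1000):
    x = (random.random(), random.random())
    y = (random.random(), random.random())
    meet = tuple(map(min, x, y))
    join = tuple(map(max, x, y))
    assert g(meet) + g(join) <= g(x) + g(y) + 1e-12
print("lattice inequality verified on all sampled pairs")
```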

4.2 The Polymatroid Case

A more complex type of polyhedra for which a simple characterization of the edges is available are polymatroids and base polyhedra. These polyhedra, originally introduced by Edmonds [16], have attracted considerable interest due to their combinatorial structure, their connections with submodular function minimization and the possibility of efficiently minimizing linear and separable functions over them. A polymatroid is a polyhedron P in R^n described by the following inequalities:

Σ_{i∈I} x_i ≤ g(I)  for all I ⊆ N = {1,…,n},
x_i ≥ 0  for all i ∈ N,

where the function g : 2^N → R is

isotone: g(S) ≤ g(T) for all S ⊆ T ⊆ N,
submodular: g(S ∩ T) + g(S ∪ T) ≤ g(S) + g(T) for all S, T ⊆ N,
normalized: g(∅) = 0.

A base polyhedron is the facet of a polymatroid determined by the additional constraint Σ_{i∈N} x_i = g(N). From known results on the characterization of adjacent vertices of polymatroids and base polyhedra (see [18, 25, 68]) one can easily deduce that every edge of a polymatroid is parallel to a vector of the set {e^i : i ∈ N} ∪ {e^i − e^j : i, j ∈ N, i ≠ j}, while every edge of a base polyhedron is parallel to a vector of the set {e^i − e^j : i, j ∈ N, i ≠ j}.
In the case of a base polyhedron and of a twice differentiable function f a sufficient condition for the validity of Property (A) is then the following:

f_{x_i x_i}(x) + f_{x_j x_j}(x) − 2 f_{x_i x_j}(x) ≤ 0,  ∀x and ∀i, j ∈ N, with i ≠ j.
For a polymatroid one must add the condition

f_{x_i x_i}(x) ≤ 0,  ∀x and ∀i ∈ N.

Note that the polyhedra described by non-negativity constraints together with a single "knapsack" constraint of the type Σ_i a_i x_i ≤ b or Σ_i a_i x_i = b, with a_i > 0, can be transformed into special cases of polymatroids and base polyhedra by setting y_i = a_i x_i and choosing g(S) = b for all S ≠ ∅. This type of polyhedra arises naturally in problems of resource allocation (see [38, 43]).
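For linear objectives, the efficiency mentioned above rests on the classical greedy scheme of Edmonds; a minimal sketch for minimizing a linear function over a base polyhedron follows, with a hypothetical submodular g.

```python
# The classical greedy scheme for optimizing a linear function over a base
# polyhedron; the submodular function g below is a hypothetical example.
def greedy_min(c, g, N):
    """min c^T x over the base polyhedron of g: scan indices by increasing
    cost and give each its marginal value g(S + i) - g(S)."""
    x, S = {}, frozenset()
    for i in sorted(N, key=lambda i: c[i]):
        x[i] = g(S | {i}) - g(S)
        S = S | {i}
    return x

g = lambda S: min(len(S), 2)        # rank function of the uniform matroid U(2, 3)
c = {0: 3.0, 1: 1.0, 2: 2.0}
print(greedy_min(c, g, [0, 1, 2]))  # {1: 1, 2: 1, 0: 0}: mass on the two cheapest indices
```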

5 Piecewise Concave Functions and Minimax

In this section we introduce the classes of piecewise concave and piecewise quasi-concave functions on a set X, i.e., those functions which are concave or quasi-concave on each element of a family of subsets of X whose union covers X. These classes of functions arise naturally from several applications, including location theory and various types of minimax problems. We show here that for these functions the search for a global minimum point can be restricted to a special subset of X that, under suitable assumptions, is guaranteed to be finite.
Let X ⊂ R^n and assume that X = ∪_{i∈I} X_i, where each X_i is a closed subset of R^n. Consider a family {f_i}_{i∈I} of functions with f_i : X_i → R and f_i(x) = f_j(x) for every x ∈ X_i ∩ X_j and for all i, j ∈ I. Define a function g : X → R by setting

g(x) ≜ f_i(x),  x ∈ X_i.   (20)

The function g defined above is called (strictly) piecewise concave or piecewise quasi-concave iff all the functions f_i are (strictly) concave or quasi-concave, respectively. We point out that the sets X_i are not required to be convex and that the notions of concavity and quasi-concavity on X_i are those introduced in Definition 4.2. In fact, the sets X_i are not necessarily convex, e.g., in the important problem of minimizing a function defined as the maximum of a family of concave or quasi-concave functions on a convex set. Clearly, piecewise convexity or piecewise quasi-convexity of a function can be defined in a similar manner.
Consider the subset E of X containing all the convex hull extreme points of all sets X_i and its subset D ⊆ E formed by those points that are convex hull extreme points of every set X_i to which they belong. Formally we set:

E = ∪_{i∈I} E(co(X_i))  and  D = {x ∈ E : x ∈ X_i ⟹ x ∈ E(co(X_i))}.   (21)

Denoting by X_min the set of global minimum points of g over X, we can now state the main result of this section, which generalizes and strengthens some well-known results for piecewise concave functions and for minimax problems [66].

Theorem 5.1 (i) If, for all i ∈ I, f_i is quasi-concave on X_i and X_i is compact, then

X_min ≠ ∅  ⟺  X_min ∩ E ≠ ∅.

(ii) If, for all i ∈ I, f_i is strictly quasi-concave on X_i and X_i is compact, then

X_min ≠ ∅  ⟺  X_min ∩ D ≠ ∅.

(iii) If X is compact, g is lower semicontinuous on X and f_i is concave on X_i for all i ∈ I, then

X_min ∩ D ≠ ∅  and  E(co(X_min)) ⊆ D.

Proof.
(i) Let x* be a global minimum point for g over X and assume that x* ∈ X_i. Since X_i is compact, co(X_i) is also compact and hence co(X_i) = co(E(co(X_i))), by the Krein-Milman Theorem. Hence X_i ⊆ co(E(co(X_i))), so that there exist points x^1,…,x^m ∈ E(co(X_i)) and coefficients α_1,…,α_m ∈ ]0,1[ such that x* = Σ_{j=1}^m α_j x^j and Σ_{j=1}^m α_j = 1. Then, by the quasi-concavity of f_i, at least one of the points x^1,…,x^m must be a global minimum point for g.
(ii) Let x* be a global minimum point for g over X and assume that x* ∈ X_i. With an argument similar to the one employed in the proof of (i), if x* ∉ E(co(X_i)) we obtain f_i(x*) > min{f_i(x^1),…,f_i(x^m)}, which contradicts the minimality of x*.
(iii) First note that, by the lower semicontinuity of g, the set X_min is nonempty, closed and hence compact, since it is contained in the compact set X. Hence, E(co(X_min)) ≠ ∅ and co(X_min) = co(E(co(X_min))) by the Krein-Milman Theorem. To complete the proof, we show that if x* ∈ E(co(X_min)), then x* ∈ E(co(X_i)) for every i such that x* ∈ X_i. Ab absurdo, if x* ∈ X_i \ E(co(X_i)), then there exist y, z ∈ co(X_i) and λ ∈ ]0,1[ such that x* = λy + (1 − λ)z. Then, by definition of co(X_i), there exist y^1,…,y^{m_1} and z^1,…,z^{m_2} in X_i and coefficients α_1,…,α_{m_1}, β_1,…,β_{m_2} ∈ ]0,1[ such that y = Σ_{j=1}^{m_1} α_j y^j, z = Σ_{j=1}^{m_2} β_j z^j, Σ_{j=1}^{m_1} α_j = 1 and Σ_{j=1}^{m_2} β_j = 1. Setting γ_j = λα_j for j = 1,…,m_1 and δ_j = (1 − λ)β_j for j = 1,…,m_2, one has x* = Σ_{j=1}^{m_1} γ_j y^j + Σ_{j=1}^{m_2} δ_j z^j and Σ_{j=1}^{m_1} γ_j + Σ_{j=1}^{m_2} δ_j = 1. From the concavity of f_i it then follows that f_i(x*) = f_i(y^j) = f_i(z^j) for all indices j. Hence, g(x*) = g(y) = g(z) and therefore y, z ∈ X_min, contradicting x* ∈ E(co(X_min)). □

Remark 5.2 Note that if co(X_i) is a polytope, then, taking into account Theorem 4.8, the assumption of (i) in Theorem 5.1 can be weakened by requiring only quasi-concavity of f_i on all line segments in co(X_i) which are parallel to some edge of co(X_i).

Remark 5.3 The result of Theorem 5.1(i) holds also when some set X_i is an unbounded polyhedron. In this case, taking into account Theorem 4.8, the assumption of (i) in Theorem 5.1 can be replaced by the requirement of quasi-concavity of f_i on all line segments in X_i which are parallel to some bounded edge of X_i and concavity or strict quasi-concavity of f_i on all line segments in X_i which are parallel to some unbounded edge of X_i.

In several applications (see, e.g., [11, 13]) one has to solve minimax problems of the form

min_{x∈X} max_{i∈I} f_i(x),   (22)

where X is a compact subset of R^n (often a polytope) and each f_i is (quasi-)concave on X. In this case Theorem 5.1 can be applied to the piecewise (quasi-)concave function g(x) ≜ max_{i∈I} f_i(x), which is (quasi-)concave on the sets X_i ≜ {x ∈ X : g(x) = f_i(x)}. Thus we obtain the following result, first stated by Zangwill [71] for the concave case with a somewhat incomplete proof:

Theorem 5.4 If X is a compact subset of R^n and the functions f_i : X → R are continuous and (quasi-)concave, then at least one solution of (22) belongs to the set D (resp. E).

Suppose now that X is a polytope described by a set of linear inequalities a_j^T x ≤ b_j, j ∈ J and, for every x ∈ X, define I(x) = {i ∈ I : g(x) = f_i(x)}

and J(x) = {j ∈ J : a_j^T x = b_j}. It can be easily verified that a point x ∈ X is a vertex (extreme point) of X iff J(x) is maximal, i.e., there does not exist any y ∈ X such that J(x) is a proper subset of J(y). Analogously, a point x ∈ X is called a g-vertex of X iff I(x) ∪ J(x) is maximal. This notion has been introduced by Du and Hwang [12] (see also [11]). They also proved the following result, which has been extended to the case of an infinite index set I by Du and Pardalos [14]:

Theorem 5.5 If X is a polytope in R^n and the functions f_i : X → R are continuous and concave, then at least one solution of (22) belongs to the set G of g-vertices of X.

Theorems 5.4 and 5.5 establish an interesting equivalence between the continuous problem (22) and the problem of minimizing g(x) over the sets D, E or G, which are often finite. Note however that Theorem 5.4 provides a stronger restriction for global minimum points than Theorem 5.5. Indeed, it is clear that D ⊆ E and it has been proved in [66] that D ⊆ G. Furthermore, simple examples, like the following one, show that there are cases where the inclusions D ⊂ E ⊂ G can be strict.
Example 5.6 Consider the interval X = [a, e] ⊂ R and the functions f_1, f_2 : X → R illustrated in Fig. 3. Then we have X_1 = [a, b] ∪ [c, d], X_2 = [b, c] ∪ [d, e], D = {a, e}, E = {a, b, d, e} and G = {a, b, c, d, e}.

[Fig. 3: graphs of f_1 and f_2 over [a, e]; the pointwise maximum g = max{f_1, f_2} determines the pieces X_1 and X_2.]
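In the setting of Example 5.6, the finite candidate set can be built and searched directly; the concave functions in the following sketch are hypothetical.

```python
# A sketch for the setting of Example 5.6: by Theorems 5.4-5.5, the minimum of
# g = max(f1, f2) over [a, e] can be searched in the finite set consisting of
# the endpoints and the crossing points of f1 and f2.
def f1(x): return -(x - 1.0) ** 2 + 1.0
def f2(x): return -(x - 2.5) ** 2 + 1.5
g = lambda x: max(f1(x), f2(x))
a, e = 0.0, 4.0

def bisect(lo, hi, h, it=60):       # locate a sign change of h to high accuracy
    for _ in range(it):
        mid = 0.5 * (lo + hi)
        if (h(lo) > 0) == (h(mid) > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = lambda x: f1(x) - f2(x)
scan = [a + k * (e - a) / 200 for k in range(201)]
crossings = [bisect(p, q, h) for p, q in zip(scan, scan[1:])
             if (h(p) > 0) != (h(q) > 0)]
candidates = [a, e] + crossings
xstar = min(candidates, key=g)
print(xstar, g(xstar))              # the global minimum of the minimax problem
```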
6 Location Problems

Piecewise concavity has some important applications in location theory. Consider, e.g., the rectilinear distance facility location problem, which consists in finding the coordinates of m new facilities x^1,…,x^m in an n-dimensional space so as to minimize a weighted sum of the rectilinear (or L1) distances among them and between them and a set of p existing facilities located in the points a^1,…,a^p. Formally, we seek to minimize over R^{n×m} the function

g(x^1,…,x^m) ≜ Σ_{k=1}^n Σ_{i=1}^m Σ_{j=1}^m w_{ijk} |x^i_k − x^j_k| + Σ_{k=1}^n Σ_{i=1}^m Σ_{j=1}^p v_{ijk} |x^i_k − a^j_k|,

where the w_{ijk} and v_{ijk} are non-negative weights. Note that the function g is separable, i.e., it can be written in the form g(x) = Σ_{k=1}^n g_k(x^1_k,…,x^m_k), where

g_k(x^1_k,…,x^m_k) = Σ_{i=1}^m Σ_{j=1}^m w_{ijk} |x^i_k − x^j_k| + Σ_{i=1}^m Σ_{j=1}^p v_{ijk} |x^i_k − a^j_k|.

Hence, the problem of minimizing g over R^{n×m} can be solved by minimizing each function g_k over R^m separately. For this reason we will restrict our attention to the one-dimensional case where x^i, a^j ∈ R. It is well-known [5] that a global minimum point of g must be contained in the finite set A ≜ {a^1,…,a^p}^m. We will now show that this is true also in a more general case and that it is a consequence of the piecewise concavity of g.
Suppose now that g is defined by

g(x) = Σ_{i=1}^m Σ_{j=1}^m φ_{ij}(|x^i − x^j|) + Σ_{i=1}^m Σ_{j=1}^p ψ_{ij}(|x^i − a^j|),   (23)

where φ_{ij}, ψ_{ij} : R_+ → R are nondecreasing continuous concave functions satisfying φ_{ij}(0) = ψ_{ij}(0) = 0 and lim_{t→+∞} ψ_{ij}(t) = +∞ for all i and j. We assume, without loss of generality, that a^1 < a^2 < … < a^p. Let H denote the set of all bijective mappings (permutations) π from the set of indices L = {1,…,m+p} into itself that verify π^{−1}(i) < π^{−1}(i+1) for all i = m+1,…,m+p−1. Furthermore, for every i ∈ L, define y_i = x^i if i ≤ m and y_i = a^{i−m} if i > m.
Lemma 6.1 The function g defined in (23) is concave on all the sets X_π ≜ {x ∈ R^m : y_{π(i)} ≤ y_{π(i+1)}, i = 1,…,m+p−1}.

Proof. Indeed, on X_π one has |y_i − y_j| = y_i − y_j if π^{−1}(i) > π^{−1}(j) and |y_i − y_j| = y_j − y_i if π^{−1}(i) < π^{−1}(j). Hence, on X_π the functions φ_{ij}(|x^i − x^j|) and ψ_{ij}(|x^i − a^j|) are concave for all i, j, since they are obtained as the composition of a concave function and a linear function. Therefore, the function g is concave on X_π because it is a sum of concave functions. □

Theorem 6.2 At least one global minimum point of g on R^m is achieved at a point in A.

Proof. Note that the sets X_π are polyhedra with vertices in A and that R^m = ∪_{π∈H} X_π. Furthermore, since lim_{t→+∞} ψ_{ij}(t) = +∞ for all i and j and g is continuous, there must be a global minimum point of g in a bounded neighbourhood of 0. Hence, by Theorem 5.1, a global minimum point of g must be contained in A. □
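The following sketch verifies Theorem 6.2 by brute force on a hypothetical instance with m = 2 new and p = 3 existing facilities.

```python
# A brute-force check of Theorem 6.2 on a hypothetical one-dimensional
# instance: the optimum over A = {a^1,...,a^p}^m matches a fine grid search.
import itertools

a = [0.0, 2.0, 5.0]                       # existing facilities
w = [[0.0, 1.0], [1.0, 0.0]]              # weights among the two new facilities
v = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]    # weights from new to existing facilities

def g(x):
    return (sum(w[i][j] * abs(x[i] - x[j]) for i in range(2) for j in range(2))
            + sum(v[i][j] * abs(x[i] - a[j]) for i in range(2) for j in range(3)))

xA = min(itertools.product(a, repeat=2), key=g)
grid = [k * 0.05 for k in range(101)]     # fine grid on [0, 5] as a sanity check
xG = min(itertools.product(grid, repeat=2), key=g)
print(xA, g(xA), xG, g(xG))               # the two minimum values coincide
```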

Another continuous location problem which can be solved by restricting the search to a finite set of candidate solutions [65] is the problem of locating some undesirable facilities in locations x^1,…,x^m ∈ R^n in a way that minimizes the maximum of some decreasing nuisance cost functions of the distances among them and between them and a given set of points b^1,…,b^p ∈ R^n (see [17] for a survey on this type of problems). Formally, we wish to minimize the function:

g(x) = max{ max_{i∈I, j∈J} α_{ij}(‖x^i − b^j‖),  max_{h≠k∈I} β_{hk}(‖x^h − x^k‖) },

where the α_{ij} and β_{ij} are continuous nonincreasing functions, I = {1,…,m}, J = {1,…,p} and ‖·‖ is any norm in R^n. In this case the function g is piecewise quasi-concave, since it is the maximum of a finite family of continuous quasi-concave functions. Clearly, the infimum of g on R^{n×m} is obtained when ‖x^i‖ → ∞ for all i. Hence, this location problem becomes meaningful only when x is required to be in a compact set X. Under this assumption, Theorem 5.4 can be applied to establish the following result.
Theorem 6.3 At least one global minimum point of g on a compact set X is achieved on the set E = ∪_{i∈I, j∈J} E(co(X_{ij})) ∪ ∪_{i,j∈I, i≠j} E(co(Y_{ij})), where

X_{ij} ≜ {x ∈ X : α_{ij}(‖x^i − b^j‖) ≥ max{ max_{h∈I, k∈J} α_{hk}(‖x^h − b^k‖), max_{h≠k∈I} β_{hk}(‖x^h − x^k‖) }},

Y_{ij} ≜ {x ∈ X : β_{ij}(‖x^i − x^j‖) ≥ max{ max_{h∈I, k∈J} α_{hk}(‖x^h − b^k‖), max_{h≠k∈I} β_{hk}(‖x^h − x^k‖) }}.

When only one new undesirable facility has to be established, the objective function g : R^n → R becomes

g(x) = max_{j∈J} α_j(‖x − b^j‖).

In this case, Theorem 6.3 implies that a global minimum point of g on a compact set X is attained at a point in E = ∪_{j∈J} E(co(X ∩ X_j)), where X_j = {x ∈ R^n : α_j(‖x − b^j‖) ≥ α_k(‖x − b^k‖) for all k ≠ j}. The family of sets X_j forms the (generalized) Voronoi diagram with respect to the functions α_j(‖x‖). This geometric structure has received much attention in recent years, especially in the case where α_j(‖x‖) = ‖x‖ or α_j(‖x‖) = λ_j‖x‖ and ‖·‖ is the Euclidean norm (see [2] and references therein). In particular, when α_j(‖x‖) = ‖x‖ and ‖·‖ is the Euclidean norm, the sets X_j are polyhedra described by:

X_j = {x ∈ R^n : 2(b^k − b^j)^T x ≤ ‖b^k‖² − ‖b^j‖²,  ∀k ≠ j}.
In this case, the maximum number N of points in E = ∪_{j∈J} E(X_j) satisfies the following bounds [45]:

(n/2)! p^{n/2} ≤ N ≤ 2((n/2)! p^{n/2})  for n even

and

(⌈n/2⌉ − 1)! p^{⌈n/2⌉} ≤ N ≤ 2(⌈n/2⌉! p^{⌈n/2⌉})  for n odd.

Furthermore, the points of E can be computed in O(p^{⌈n/2⌉+1}) time in the general case [3], and in time O(p log p) for n = 2. If we wish to minimize g on a polyhedron X, then a global solution can be found by simple enumeration of all vertices of the sets X_j ∩ X. In this way we always obtain a finite algorithm, while in [62] an algorithm is presented which requires some additional assumptions to solve the same problem in finitely many iterations.
Facility location on networks is an important field in location theory where much research has concentrated on identifying a finite set of points that necessarily contains an optimal solution. Good surveys on this topic are [29, 47, 50].
In order to formulate location problems on networks, we introduce some definitions and notations taken from [47]. An edge of length ℓ > 0 is the image of an interval [0, ℓ] by a continuous mapping γ from [0, ℓ] to R^d such that γ(θ) ≠ γ(θ′) for any θ ≠ θ′ in [0, ℓ]. A network is defined as a subset N of R^d that satisfies the following conditions: (i) N is the union of a finite number of edges; (ii) any two edges intersect at most at their extremities; (iii) N is connected. The vertices of the network are the extremities of the edges defining N and are denoted by V = {v^1,…,v^n}. The set of edges defining the network is denoted by E. For every edge [v^i, v^j] ∈ E let γ_ij from [0, ℓ_ij] to R^d be the defining mapping and denote by θ_ij the inverse of γ_ij, which maps [v^i, v^j] onto [0, ℓ_ij]. Given two points x^1, x^2 ∈ [v^i, v^j], the subset of points of [v^i, v^j] between and including x^1 and x^2 is a subedge [x^1, x^2]. The length of [x^1, x^2] is given by |θ_ij(x^1) − θ_ij(x^2)|. A path joining two points x^1 ∈ N and x^2 ∈ N is a minimal connected subset of N containing x^1 and x^2. The length of a path is equal to the sum of the lengths of all its constituent edges and subedges. A metric d on N is defined by setting d(x^1, x^2) equal to the length of a shortest path joining x^1 and x^2.
For any point z ∈ N consider the function on [0, ℓ_ij] defined by d(z, γ_ij(θ)). Note that, by the definition of the distance, one has

d(z, γ_ij(θ)) = min{d(z, v^i) + θ,  d(z, v^j) + ℓ_ij − θ}.

Hence, d(z, γ_ij(θ)) is the minimum of two linear functions of θ. Therefore, the following properties hold [47]:

(i) d(z, γ_ij(θ)) is continuous and concave on [0, ℓ_ij];

(ii) d(z, γ_ij(θ)) is linearly increasing with slope +1 on [0, θ_ij(z)[ and linearly decreasing with slope −1 on ]θ_ij(z), ℓ_ij], where θ_ij(z) = ½[ℓ_ij + d(z, v^j) − d(z, v^i)].

The (single) median problem on the network N consists of finding a point in N that minimizes the function

F(x) = Σ_{v^i∈V} w_i(d(v^i, x)),

where the w_i(t) are concave nondecreasing functions from R_+ to R. In 1964 Hakimi [28] showed that at least one solution of this problem belongs to V when w_i(t) = λ_i t with λ_i ≥ 0. This result was extended in [50] to the case where the w_i(t) are concave nondecreasing functions. The proof is based on the remark that, if w_i(t) is concave nondecreasing, then the function w_i(d(v^i, γ_ij(θ))) is concave on [0, ℓ_ij] and hence F(γ_ij(θ)) is also concave on [0, ℓ_ij]. Therefore, the global minimum of F must be attained at one of the points γ_ij(0) or γ_ij(ℓ_ij), i.e., at a vertex of N.
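A minimal sketch of this vertex-enumeration approach follows; the network, edge lengths and weights are hypothetical.

```python
# A sketch of Hakimi-type vertex enumeration for the median problem:
# all-pairs shortest paths by Floyd-Warshall, then F evaluated on V only.
INF = float("inf")
n = 4
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 4.0)]
lam = [1.0, 2.0, 1.0, 1.0]                # weights for w_i(t) = lambda_i * t

d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
for i, j, l in edges:
    d[i][j] = d[j][i] = min(d[i][j], l)
for k in range(n):
    for i in range(n):
        for j in range(n):
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])

F = lambda x: sum(lam[i] * d[i][x] for i in range(n))
median = min(range(n), key=F)
print(median, F(median))
```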
In the (single) center problem on the network N one seeks to minimize the function

G(x) = max_{v^i∈V} d(v^i, x).

In this case the function G(γ_ij(θ)) is not concave on the whole segment [0, ℓ_ij], but only on subsegments thereof. However, the number of such subsegments is bounded by |V|²|E| and their extremities can be explicitly determined (see [47] for details). Hence, also the center problem on a network can be solved by enumerating a finite (and polynomially bounded) number of points.

7 Convex-Concave Problems

Consider a problem of the form:

min f(x, y)  s.t.  (x, y) ∈ S × T,   (24)

where S ⊂ R^p, T ⊂ R^q and f : S × T → R.
When f(x, y) = c^T x + x^T Qy + d^T y, with Q ∈ R^{p×q}, c, x ∈ R^p, d, y ∈ R^q, and S and T are polyhedra, problem (24) is called a bilinear programming problem. This kind of problem, which can be reduced to a concave minimization problem, has been studied by several authors (see, e.g., [4, 40, 54] and references therein). It is well known that at least one solution of a bilinear programming problem is achieved at a vertex of its feasible region. We now show that this property holds for a more general class of functions.
It is straightforward to prove that, if min_{x∈S} f(x, y) exists for every y ∈ T, then problem (24) is equivalent to the following:

min_{y∈T} φ(y),   (25)

where φ(y) ≜ min_{x∈S} f(x, y).
Hence, the solution of problem (24) may be reduced to the solution of the two subproblems min_{x∈S} f(x, y) and min_{y∈T} φ(y). In the case where T is a polyhedron, Theorem 4.8 can be applied to ensure that φ(y) attains its minimum at one of the vertices of T.

Theorem 7.1 If T is a polyhedron and the function y ↦ f(x, y) is quasi-concave on all directions parallel to bounded edges of T and concave or strictly quasi-concave on all directions parallel to unbounded edges of T, then, if the function φ(y) attains its minimum on T, at least one global minimum point lies in a vertex of T.

Proof. It is sufficient to notice that, since φ(y) is defined as a minimum of functions that are concave or quasi-concave on the directions parallel to the edges of T, φ shares the same properties of such functions, and hence Theorem 4.8 may be employed to complete the proof. □
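The following sketch carries out this reduction for a bilinear f over two unit boxes; the data are hypothetical.

```python
# A sketch of the reduction (24) -> (25) for a bilinear f over two unit boxes:
# for fixed y, f is linear in x, so phi(y) is computed on the vertices of S;
# phi is then minimized over the vertices of T, as Theorem 7.1 allows.
import itertools

Q = [[1.0, -2.0], [0.5, 1.0]]
c, d = [1.0, -1.0], [0.5, 0.0]

def f(x, y):
    bil = sum(x[i] * Q[i][j] * y[j] for i in range(2) for j in range(2))
    lin = sum(ci * xi for ci, xi in zip(c, x)) + sum(di * yi for di, yi in zip(d, y))
    return lin + bil

VS = list(itertools.product((0.0, 1.0), repeat=2))   # vertices of S = [0, 1]^2
VT = list(itertools.product((0.0, 1.0), repeat=2))   # vertices of T = [0, 1]^2

phi = lambda y: min(f(x, y) for x in VS)   # linear in x, so attained on V_S
ystar = min(VT, key=phi)
print(ystar, phi(ystar))
```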

Clearly, if min_{x∈S} f(x, y) and min_{y∈T} φ(y) can be evaluated efficiently, then problem (24) can also be solved in an efficient manner. We now show that this is the case for a special class of indefinite quadratic programs. Consider the problem

min f(x, y) = x^T Bx + x^T Cy + y^T Dy + c^T x + d^T y
s.t.  x ∈ S = {x : A_1 x ≤ b^1},  y ∈ T = {y : A_2 y ≤ b^2},   (26)

where B ∈ R^{p×p}, C ∈ R^{p×q}, D ∈ R^{q×q}, A_1 ∈ R^{m_1×p}, A_2 ∈ R^{m_2×q}, c, x ∈ R^p, d, y ∈ R^q, b^1 ∈ R^{m_1} and b^2 ∈ R^{m_2}. Note that, if B is positive semidefinite, then φ(y) = min_{x∈S} f(x, y) can be evaluated in polynomial time [46] and, if s^T Ds ≤ 0 for all vectors s parallel to an edge of T, then φ(y) achieves its minimum on a vertex of T. Hence, problem (26) can also be solved in polynomial time if the set V_T of vertices of T is small or if φ(y) can be minimized in polynomial time over V_T. This is the case, e.g., if T = {y ∈ R^q : ℓ ≤ y ≤ u} and b_{ij} ≤ 0, d_{ij} ≤ 0 for all i ≠ j and c_{ij} ≤ 0 for all i and j. In fact, these assumptions imply submodularity of f(x, y), which in turn implies submodularity of φ(y) by a result of Topkis [67]. We have thus proved the following result:
Theorem 7.2 Problem (26) can be solved in polynomial time if T = {y ∈ R^q : ℓ ≤ y ≤ u}, B is positive semidefinite, b_{ij}, d_{ij} ≤ 0 for all i ≠ j and c_{ij} ≤ 0 for all i and j.

8 Discretization, Extension and Relaxation

A function f defined on a subset X of Z^n can be extended in several ways to a piecewise linear function f̄ on the convex hull co(X) of X. In [20, 49, 63] some extensions of this type are presented which also satisfy the condition that the global minimum of f over X and that of f̄ over co(X) coincide. When f̄ is convex, the problem of minimizing f on X can be efficiently solved by minimizing f̄ on co(X). This approach has been exploited in [49] to prove that the problem of minimizing a submodular function over the 0-1 hypercube B^n can be solved in polynomial time and in [20] to show that submodular functions for which a particular extension is convex can be minimized in polynomial time over any box in Z^n.
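One such piecewise linear extension, the one used in [49] and usually attributed to Lovász, can be evaluated as in the following sketch; the submodular set function is a hypothetical example.

```python
# Evaluate the piecewise linear extension f_hat of a set function f (with
# f(empty) = 0) at x in [0,1]^n: sort the coordinates decreasingly and combine
# the values of f along the resulting chain of sets. For submodular f this
# extension is convex and agrees with f on the 0-1 points.
def extension(f, x):
    order = sorted(range(len(x)), key=lambda i: -x[i])
    xs = [x[i] for i in order] + [0.0]
    S, val = frozenset(), 0.0
    for k, i in enumerate(order):
        S = S | {i}
        val += (xs[k] - xs[k + 1]) * f(S)   # coefficient x_(k) - x_(k+1) >= 0
    return val

f = lambda S: 1.5 * min(len(S), 2)                 # a submodular set function
print(extension(f, [0.2, 0.9, 0.5]))               # 2.1
print(extension(f, [0.0, 1.0, 1.0]), f({1, 2}))    # agreement at a 0-1 point
```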
A typical method for finding lower bounds for problems of the form

min f(x)  s.t.  x ∈ S ∩ Z^n   (27)

consists in solving the following continuous relaxation, obtained by dropping the integrality constraint:

min f(x)  s.t.  x ∈ S.   (28)

This approach is frequently adopted when f is linear and S is a polyhedron since, in this case, problem (28) is considerably easier to solve than (27). However, even in the linear case, the distance ‖x* − x′‖ between an optimal solution x* of (28) and any optimal solution x′ of (27) can be quite large, and the same thing can happen to the difference f(x′) − f(x*) between the optimal objective values. Some upper bounds on the distance between the optimal solutions of (27) and (28) have been established in [9] for the linear case and in [26] for the quadratic case.
Recently, much attention has been devoted to polyhedral methods for integer programming. The key step of these methods is the addition of valid constraints to the set S in order to find a smaller feasible region S′ ⊆ S which satisfies S′ ∩ Z^n = S ∩ Z^n. The final aim of polyhedral methods is that of finding a polyhedral representation S′ of the convex hull of the integer points in S, i.e., S′ = co(S ∩ Z^n), so that problem (27) can be solved by minimizing f on S′. See, e.g., [51] for an introduction to these methods and [60, 61] for a procedure which constructs a hierarchy of sets S_0 = S ⊇ S_1 ⊇ … ⊇ S_n = co(S ∩ Z^n).
A different type of relaxation for 0-1 programming, which gives tighter bounds than the standard continuous relaxation, has been introduced and analyzed in the last few years. The main idea consists in reformulating the 0-1 program as a linear program in a space with n² variables, with the addition of the condition that a symmetric matrix formed with the n² variables is positive semidefinite. Such problems are called semidefinite programs and can be solved efficiently both in theory and in practice. See [1, 55, 56, 70] for an introduction to these problems and for their connections with 0-1 and combinatorial optimization.
Another active field of research consists in the investigation of conditions
that guarantee integrality of the vertices of S. Several classes of constraint
matrices for which this property holds have been identified and the problem
of recognizing them has also been addressed. The book of Schrijver [59]
contains many results of this type, while [8] is a very recent survey on this
topic.
When f is linear (or concave) and equality between the optimal solutions of (27) and (28) does not hold, problem (27) is often solved by means of a branch and bound procedure which exploits the bounds obtained from the solution of several continuous relaxations of type (28). When f is not linear, however, the roles might be exchanged. If f is separable and convex and S is a polyhedron [39], or f is separable and S is a polymatroid [38], then problem (27) can be solved efficiently, and problem (28) may then be solved to within any prescribed accuracy by solving a sequence of discretized problems of the form

min f(x)  s.t.  x ∈ S ∩ λZ^n,   (29)

where λ is a positive scaling factor which enlarges the grid of integer points when λ > 1 and shrinks it when λ < 1. The methods developed in [38, 39] are based on proximity results between the optimal solutions of problems (28) and (29) which make it possible to efficiently improve the scaling factor λ in order to obtain a point that is at distance at most ε from an optimal solution of problem (28).

9 Necessary and Sufficient Optimality Conditions


Some of the main tools in nonlinear programming are necessary and/or
sufficient optimality conditions. However, unless some kind of convexity
assumption is made, such conditions only concern local solutions and one is
then faced with the (combinatorial) problem of finding the global optimal
solution among them.
Recently, some new global optimality conditions have been proposed for
nonlinear minimization problems, which do not require any convexity as-
sumptions (see [10, 34, 35, 52]). Since all 0-1 programs and many other
discrete optimization problems can be equivalently reformulated as nonlin-
ear programs, it seems natural to try to exploit global optimality conditions
in the nonlinear setting to obtain analogous conditions in the discrete case.
A first step in this direction has been made by Hiriart-Urruty and
Lemarechal [36] who observed that, since 0-1 quadratic minimization prob-
lems can be equivalently formulated as concave quadratic minimization prob-
lems on the hypercube [0,1]^n (see Sect. 3), one could deduce from the gen-
eral global optimality condition established in [34] for concave minimization
problems a global optimality condition for the 0-1 quadratic case. This con-
dition has been further analyzed in [6, 7], where some computational results
are provided.
Another global optimality condition for concave minimization problems
which might be fruitfully employed in discrete optimization has been intro-
duced by A. Strekalovski and refined by Y. Ledyaev (see [35]). When f is
a concave differentiable function on a convex set S this condition can be
formulated as follows:

Theorem 9.1 A point x̄ ∈ S is a global minimum point for f over S if and
only if

(s - x)^T ∇f(x) ≥ 0     (30)

for all s ∈ S and for all x ∈ S satisfying f(x) = f(x̄). Furthermore, if
x ∈ S satisfies f(x) = f(x̄) and (s - x)^T ∇f(x) < 0 for some s ∈ S, then
f(s) < f(x̄).

Note that the first part of Theorem 9.1 provides an optimality test
for a candidate solution x̄. The second part suggests a method for improving
the candidate solution, if the optimality test (30) fails. This is illustrated in
the following example.

Example 9.2 Consider the problem of minimizing the function
f(x_1, x_2) = -(x_1 - 1/4)^2 - (x_2 - 1/2)^2 on the 0-1 square B^2 = {0,1}^2. Since f is
concave, this is equivalent to the problem of minimizing f over S = co(B^2) =
[0,1]^2. The point x̄ = (0,1) is a local minimum point for f over S, since it
satisfies (30) for every s ∈ S. However, the point x̂ = (1/2, 1) satisfies f(x̂) =
f(x̄) and min_{s∈S} s^T ∇f(x̂) = (1,1)^T ∇f(x̂) < x̂^T ∇f(x̂). Hence, x̄ is
not a global minimum point, by the first part of Theorem 9.1. Furthermore,
a better solution is provided by the point s̄ = (1,1). Since s̄ satisfies (30), and
the only other point in S satisfying f(s) = f(s̄) is (1,0), which also satisfies
condition (30), we can conclude that (1,1) and (1,0) are global minimum
points for f over S and hence, a fortiori, over B^2.

The previous example gives the flavor of how global optimality conditions
for nonlinear programs can be used for developing global solution methods
for continuous and discrete optimization problems. However, we feel that
the development of global optimality conditions in nonlinear programming
and their application to discrete optimization is still at a fairly early stage
and deserves further investigation.

10 Conclusions and Remarks on Further Developments
It is a common belief that in order to find a good solution to a problem it is
advisable to look at it from various angles. In this paper we presented sev-
eral ways of approaching discrete optimization problems from a continuous
viewpoint and vice versa. These connections have already been exploited in
several efficient solution methods. We now discuss some ideas for further
developments in this direction.
Some of the assumptions of Theorem 2.1 might be weakened in the general
case or, at least, in special cases such as that of Section 3. In the general
case, R^n might be replaced with a metric space. For instance, isoperimetric
problems or optimal control problems, where the feasible region has only a
discrete or even finite number of curves, might be equivalently transformed
into a classic differentiable problem which admits the Euler equation as a
necessary condition. To this end, compactness might be usefully replaced with
bicompactness, and continuity with lower semicontinuity.
In Section 3 we have noted that integer programs can be reduced to (7).
However, such a reduction implies, in general, an enormous increase in the
number of variables. Hence, it is interesting to extend the analysis to
the case where x ∈ B^n is replaced by x ∈ Z^n. In this case, functions like
∑_{j=1}^n sin^2(π x_j) or ∑_{j=1}^n ∏_{s=0}^{L_j} (x_j - s)^2, with L_j an upper bound on x_j, might
be used as a penalization function φ(x). Furthermore, for problem (7), or,
more generally, when Z is finite, it is interesting to investigate other types
of functions φ, e.g., in the field of gauge functions.
The idea of resolvent for a Boolean problem [41] might be transferred
to a concave minimization problem through the equivalence established in
Sections 3 and 4.
Theorem 3.2 states a connection between 0-1 programming problems and
complementarity problems which are a special case (when the domain is a
cone) of Variational Inequalities. Hence, we have the possibility of connect-
ing two fields very far from each other. For instance, fixed-point methods
or gap function methods, which have been designed for Variational Inequal-
ities, may be adapted to 0-1 problems. On the other hand, cutting plane
methods and the related group theory, as well as other methods conceived
for integer programs, may be transferred to Variational Inequalities.
It is well known that the concept of saddle point of the Lagrangian
function associated with a problem like (8) can be used to obtain sufficient
optimality conditions (besides necessary ones). Through the equivalence,
such a concept can be introduced into 0-1 programs.
The equivalence between nonlinear and 0-1 programs can be used also
in the context of duality. For instance, it can be used to close the duality
gap when Lagrangian duality is applied to a facial constraint [48].
Since the optimality of a constrained extremum problem can be put in
terms of the impossibility of a suitable system of inequalities, it might be
useful to extend the connections between continuous and discrete problems
to a system of inequalities, recovering constrained extremum problems as
special cases. This might let us embrace vector extremum problems.
In Section 4 we described general classes of functions which attain their
minimum on the vertices of a polyhedron. It is interesting to investigate
the possibility of extending the methods employed for concave minimization
problems (see, e.g., [4, 40, 54]) to this more general case.

References
[1] F. Alizadeh, Interior point methods in semidefinite programming with
applications to combinatorial optimization, SIAM Journal on Optimization
Vol.5 (1995) pp. 13-51.

[2] F. Aurenhammer, Voronoi diagrams - A survey of a fundamental geometric
data structure, ACM Computing Surveys Vol.23 (1991) pp. 345-405.

[3] D. Avis and B.K. Bhattacharya, Algorithms for computing d-dimensional
Voronoi diagrams and their duals, Adv. Comput. Res. Vol.1
(1983) pp. 159-180.

[4] H.P. Benson, Concave minimization: theory, applications and algo-


rithms, in R. Horst and P.M. Pardalos (eds.), Handbook of Global Opti-
mization, (Boston, Kluwer Academic Publisher, 1995) pp. 43-148.

[5] V. Cabot, R.L. Francis and M.A. Stuart, A network flow solution to a
rectilinear distance facility location problem, AIIE Trans. Vol.2 (1970)
pp. 132-141.

[6] P. Carraresi, F. Farinaccio and F. Malucelli, Testing optimality for
quadratic 0-1 problems, Technical Report 11/95 (1995), Dept. of Computer
Science, Univ. of Pisa.

[7] P. Carraresi, F. Malucelli and M. Pappalardo, Testing optimality for
quadratic 0-1 problems, ZOR Mathematical Methods of Op. Res. Vol.42
(1995) pp. 295-312.

[8] M. Conforti, G. Cornuejols, A. Kapoor, and K. Vuskovic, Perfect, ideal


and balanced matrices, In M. Dell'Amico, F. Maffioli and S. Martello
(eds.), Annotated bibliographies in combinatorial optimization, (New
York, Wiley, 1997).

[9] W. Cook, A.M. Gerards, A. Schrijver and E. Tardos, Sensitivity theorems
in linear integer programming, Mathematical Programming Vol.34
(1986) pp. 251-264.

[10] G. Danninger, Role of copositivity in optimality criteria for nonconvex
optimization problems, Journal of Optimization Theory and Applications
Vol.75 (1992) pp. 535-558.

[11] D.-Z. Du, Minimax and its applications, in R. Horst and P.M. Parda-
los (eds.), Handbook of Global Optimization, (Boston, Kluwer Academic
Publisher, 1995), pp. 339-367.

[12] D.-Z. Du and F.K. Hwang, A proof of Gilbert-Pollak conjecture on the
Steiner ratio, Algorithmica Vol.7 (1992) pp. 121-135.

[13] D.-Z. Du and P.M. Pardalos (eds.), Minimax and applications, (Boston,
Kluwer Academic Publisher, 1995).

[14] D.-Z. Du and P.M. Pardalos, A continuous version of a result of Du


and Hwang, Journal of Global Optimization Vol.5 (1994) pp. 127-130.

[15] N. Dunford and J.T. Schwartz, Linear operators. Part I, (New York,
Interscience, 1958).

[16] J. Edmonds, Submodular functions, matroids and certain polyhedra,
in R. Guy et al. (eds.), Combinatorial Structures and their Applications
(Proceedings Calgary International Conference, 1969), (New York, Gordon
and Breach, 1970).

[17] E. Erkut and S. Neumann, Analytical models for locating undesirable


facilities, European Journal of Operational Research Vol.40 (1989) pp.
275-291.

[18] S. Fujishige, Submodular Functions and Optimization, (Amsterdam,


North Holland, 1991).

[19] P. Gahinet, P. Apkarian and M. Chilali, Parameter-dependent Lyapunov
functions for real parametric uncertainty, IEEE Trans. Automat.
Contr. Vol.41 (1996) pp. 436-442.

[20] P. Favati and F. Tardella, Convexity in nonlinear integer programming,
Ricerca Operativa Vol.53 (1990) pp. 3-44.

[21] A. Frank and E. Tardos, Generalized polymatroids and submodular
flows, Mathematical Programming Vol.42 (1988) pp. 489-563.

[22] A. Frank, Matroids and submodular functions, In M. Dell'Amico, F.


Maffioli and S. Martello (eds.), Annotated bibliographies in combinatorial
optimization, (New York, Wiley, 1997).

[23] F. Giannessi and F. Niccolucci, Connections between nonlinear and in-


teger programming problems, in Symposia Mathematica, Vol. XIX, (Lon-
don, Academic Press, 1976) pp. 161-176.

[24] F. Giannessi, On some connections among variational inequalities, combinatorial
and continuous optimization, Annals of Operations Research
Vol.58 (1995) pp. 181-200.

[25] E. Girlich and M. Kovalev, Classification of polyhedral matroids, ZOR
Mathematical Methods of Op. Res. Vol.43 (1996) pp. 143-160.

[26] F. Granot and J. Skorin-Kapov, Some proximity and sensitivity results
in quadratic integer programming, Mathematical Programming Vol.47
(1990) pp. 259-268.
[27] M. Grötschel, L. Lovász and A. Schrijver, The ellipsoid method and its
consequences in combinatorial optimization, Combinatorica Vol.1 (1981)
pp. 169-197.

[28] S.L. Hakimi, Optimum location of switching centers and the absolute
centers and medians of a graph, Operations Research Vol.12 (1964) pp.
450-459.
[29] G.Y. Handler and P.B. Mirchandani, Location on Networks: Theory
and Algorithms, (Cambridge, MIT Press, 1979).
[30] W.W. Hager, P.M. Pardalos, I.M. Roussos, and H.D. Sahinoglou, Active
constraints, indefinite quadratic test problems, and complexity, Journal
of Optimization Theory and Applications Vol.68 (1991) pp. 499-511.

[31] P. Hansen, Methods of nonlinear 0-1 programming, Annals of Discrete
Math. Vol.5 (1979) pp. 53-70.
[32] P. Hansen, D. Peeters and J.-F. Thisse, On the location of an obnoxious
facility, Sistemi Urbani Vol.3 (1981) pp. 299-317.
[33] P. Hansen, B. Jaumard and V. Mathon, Constrained nonlinear 0-1
programming, Rutcor Research Report 47-89 (1989), Rutgers University.
[34] J.-B. Hiriart-Urruty, From convex optimization to nonconvex optimization.
Part I: necessary and sufficient conditions for global optimality, in
R. Horst and P.M. Pardalos (eds.), Nonsmooth Optimization and Related
Topics, (New York, Plenum Press, 1995) pp. 219-239.
[35] J.-B. Hiriart-Urruty, Conditions for global optimality, in R. Horst and
P.M. Pardalos (eds.), Handbook of Global Optimization, (Boston, Kluwer
Academic Publisher, 1995) pp. 1-26.
[36] J.-B. Hiriart-Urruty and C. Lemarechal, Testing necessary and suf-
ficient conditions for global optimality in the problem of maximizing
a convex quadratic function over a convex polyhedron, Tech. Report
(1990), Univ. Paul Sabatier of Toulouse, Seminar of Numerical Analysis.

[37] W.M. Hirsch and A.J. Hoffman, Extreme varieties, concave functions
and the fixed charge problem, Communications on Pure and Applied
Mathematics Vol.14 (1961) pp. 355-369.

[38] D.S. Hochbaum, Lower and upper bounds for the allocation problem
and other nonlinear optimization problems, Mathematics of Operations
Research Vol.19 (1994) pp. 309-409.

[39] D.S. Hochbaum and J.G. Shanthikumar, Convex separable optimization
is not much harder than linear optimization, J. Assoc. Comput. Mach.
Vol.37 (1990) pp. 843-862.

[40] R. Horst and H. Tuy, Global Optimization. Deterministic Approaches,
(Berlin, Springer-Verlag, 1990).

[41] P.L. Hammer and S. Rudeanu, Méthodes Booléennes en Recherche
Opérationnelle, (Paris, Dunod, 1970).

[42] F.K. Hwang and U.G. Rothblum, Directional quasi-convexity, asymmetric
Schur-convexity and optimality of consecutive partitions, Mathematics
of Operations Research Vol.21 (1996) pp. 540-554.

[43] T. Ibaraki and N. Katoh, Resource Allocation Problems: Algorithmic


Approaches, (Boston, MIT Press, 1988).

[44] B. Kalantari and J.B. Rosen, Penalty for zero-one integer equivalent
problem, Mathematical Programming Vol.24 (1982) pp. 229-232.

[45] V. Klee, On the complexity of d-dimensional Voronoi diagrams, Arkiv


der Mathematik Vol.34 (1980) pp. 75-80.

[46] M.K. Kozlov, S.P. Tarasov and L.G. Hacijan, Polynomial solvability of
convex quadratic programs, Soviet Math. Dokl. Vol.20 (1979) pp. 1108-
1111.

[47] M. Labbé, D. Peeters and J.-F. Thisse, Location on networks, in M.O.
Ball et al. (eds.), Handbooks in OR and MS Vol.8, (Amsterdam, Elsevier,
1995) pp. 551-624.

[48] C. Larsen and J. Tind, Lagrangean duality for facial programs with
applications to integer and complementarity problems, Operations Re-
search Letters Vol.11 (1992) pp. 293-302.

[49] L. Lovász, Submodular functions and convexity, in A. Bachem et al.
(eds.), Mathematical Programming - The State of the Art, (Berlin,
Springer, 1983) pp. 235-257.

[50] P.B. Mirchandani and R.L. Francis (eds.), Discrete Location Theory,
(New York, Wiley, 1990).

[51] G.L. Nemhauser and L.A. Wolsey, Integer and Combinatorial Optimization,
(New York, Wiley, 1988).

[52] A. Neumaier, Second-order sufficient optimality conditions for local


and global nonlinear programming, Journal of Global Optimization Vol.9
(1996) pp. 141-151.

[53] P.M. Pardalos, Continuous approaches to discrete optimization prob-


lems, in G. Di Pillo and F. Giannessi (eds.), Nonlinear Optimization and
Applications, (New York, Plenum Press, 1996) pp. 313-328.

[54] P.M. Pardalos and J.B. Rosen, Constrained Global Optimization: Algorithms
and Applications, (Berlin, Springer, 1987).

[55] P.M. Pardalos and H. Wolkowicz (eds.), Topics in Semidefinite and


Interior-Point Methods, Fields Institute Communications Series, Amer-
ican Mathematical Society (in press 1997).

[56] S. Poljak, F. Rendl, and H. Wolkowicz, A Recipe for Best Semidefinite
Relaxation for (0,1)-Quadratic Programming, Journal of Global Optimization
Vol.7 (1995) pp. 51-73.

[57] M. Raghavachari, On the connections between zero-one integer programming
and concave programming under linear constraints, Operations
Research Vol.17 (1969) pp. 680-683.

[58] R.T. Rockafellar, Convex Analysis, (Princeton, Princeton University


Press, 1970).

[59] A. Schrijver, Theory of Linear and Integer Programming, (New York,


Wiley, 1986).

[60] H.D. Sherali and W.P. Adams, A hierarchy of relaxations between the
continuous and convex hull representations for zero-one programming
problems, SIAM J. Discrete Math. Vol.3 (1990) pp. 411-430.

[61] H.D. Sherali and W.P. Adams, A hierarchy of relaxations and convex
hull characterizations for mixed-integer zero-one programming problems,
Discrete Applied Mathematics Vol.52 (1994) pp. 83-106.
[62] J. Shi and Y. Yoshitsugu, A D.C. approach to the largest empty sphere
problem in higher dimension, in C.A. Floudas and P.M. Pardalos (eds.),
State of the Art in Global Optimization, (Boston, Kluwer Academic Pub-
lisher, 1996) pp. 395-411.
[63] I. Singer, Extension of functions of 0-1 variables and applications to
combinatorial optimization, Numer. Funct. Anal. and Optimiz. Vol. 7
(1984-85) pp. 23-62.
[64] F. Tardella, On the equivalence between some discrete and continuous
optimization problems, Rutcor Research Report 30-90 (1990), Rutgers
University. Published in Annals of Operations Research Vol.25 (1990)
pp. 291-300.

[65] F. Tardella, Discretization of continuous location problems, manuscript


(1997).

[66] F. Tardella, Piecewise concavity and minimax problems, manuscript


(1997).
[67] D.M. Topkis, Minimizing a submodular function on a lattice, Opera-
tions Research Vol.26 (1978) pp. 305-321.

[68] D.M. Topkis, Adjacency on polymatroids, Mathematical Programming


Vol.30 (1984) pp. 229-237.

[69] H. Tuy, Concave programming under linear constraints, Soviet Math.
Dokl. Vol.5 (1964) pp. 1437-1440.

[70] L. Vandenberghe and S. Boyd, Semidefinite Programming, SIAM Re-


view Vol.38 (1996) pp. 49-95.

[71] W.I. Zangwill, The piecewise concave function, Management Science


Vol.13 (1967) pp. 900-912.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 189-297
©1998 Kluwer Academic Publishers

Interior Point Methods for Combinatorial


Optimization
John E. Mitchell
Mathematical Sciences
Rensselaer Polytechnic Institute, Troy, NY 12180 USA
E-mail: mitchj@rpi.edu

Panos M. Pardalos
Center for Applied Optimization, ISE Department
University of Florida, Gainesville, FL 32611 USA
E-mail: pardalos@ufl.edu

Mauricio G. C. Resende
Information Sciences Research
AT&T Labs Research, Florham Park, NJ 07932 USA
E-mail: mgcr@research.att.com

Contents
1 Introduction 191

2 Combinatorial optimization 191


2.1 Examples of combinatorial optimization problems .. 191
2.2 Scope and computational efficiency . . . . . . . . . . . 196

3 Solution techniques 202


3.1 Combinatorial approach . . . . . . . . . . . . . . . . . . . . . . .. 202
3.2 Continuous approach . . . . . . . . . . . . . . . . . . . . . . . . .. 203
3.2.1 Examples of embedding . . . . . . . . . . . . . . . . . . .. 204
3.2.2 Global approximation . . . . . . . . . . . . . . . . . . . .. 205
3.2.3 Continuous trajectories 205
3.2.4 Topological properties . . . . . . . . . . . . . . . . . . . . . . 207

4 Interior point methods for linear and network programming 208


4.1 Linear programming . . . . . . . . . . . . . . . . . . 208
4.2 Network programming. . . . . . . . . . . . . . . . . 212
4.3 Components of interior point network flow methods 217
4.3.1 The dual affine scaling algorithm. . . . . . . 217
4.3.2 Computing the direction . . . . . . . . . . . . 218
4.3.3 Network preconditioners for conjugate gradient method 221
4.3.4 Identifying the optimal partition 225
4.3.5 Recovering the optimal flow. . . . . . . . . . . . . . . . 229

5 Branch and bound methods 233


5.1 General concepts . . . . 233
5.2 An example: The QAP . 236

6 Branch and cut methods 242


6.1 Interior point cutting plane methods 246
6.2 Solving the relaxations approximately 249
6.3 Restarting.............. 251
6.4 Primal heuristics and termination. 252
6.5 Separation routines. . 253
6.6 Fixing variables. . . . . 253
6.7 Dropping constraints . . 254
6.8 The complete algorithm 254
6.9 Some computational results 254
6.10 Combining interior point and simplex cutting plane algorithms 256
6.11 Interior point column generation methods for other problems 257
6.12 Theoretical issues and future directions . . . . . . . . . . . . . 258

7 Nonconvex potential function minimization 259


7.1 Computing the descent direction . . . . . 265
7.2 Some computational considerations. . . . 269
7.3 Application to combinatorial optimization 270

8 A lower bounding technique 272

9 Semidefinite Programming Relaxations 276

10 Concluding Remarks 280

11 Acknowledgement 282

References

1 Introduction
Interior-point methods, originally invented in the context of linear program-
ming, have found a much broader range of applications, including discrete
problems that arise in computer science and operations research as well as
continuous computational problems arising in the natural sciences and en-
gineering. This chapter describes the conceptual basis and applications of
interior-point methods for discrete problems in computing.
The chapter is organized as follows. Section 2 explains the nature and
scope of combinatorial optimization problems and illustrates the use of inte-
rior point approaches for these problems. Section 3 contrasts the combinato-
rial and continuous approaches for solving discrete problems and elaborates
on the main ideas underlying the latter approach. The continuous approach
constitutes the conceptual foundation of interior-point methods. Section 4 is
dedicated to interior point algorithms for linear and network optimization.
Sections 5 and 6 discuss branch-and-bound and branch-and-cut methods
based on interior point approaches. Sections 7 and 8 discuss the application
of interior point techniques to minimize nonconvex potential functions to
find good feasible solutions to combinatorial optimization problems as well
as good lower bounds. In Section 9, a brief introduction to semidefinite pro-
gramming techniques and their application to combinatorial optimization is
presented. We conclude the paper in Section 10 by observing the central role
played by optimization in both natural and man-made sciences. We provide
selected pointers to web sites containing up-to-date information on interior
point methods and their applications to combinatorial optimization.

2 Combinatorial optimization
In this section, we discuss several examples of combinatorial optimization
problems and illustrate the application of interior point techniques to the
development of algorithms for these problems.

2.1 Examples of combinatorial optimization problems


As a typical real-life example of combinatorial optimization, consider the
problem of operating a flight schedule of an airline at minimum cost. A
flight schedule consists of many flights connecting many cities, with spec-
ified arrival and departure times. There are several operating constraints.
Each plane must fly a round trip route. Each pilot must also fly a round trip

route, but not necessarily the same route taken by the plane, since at each
airport the pilot can change planes. There are obvious timing constraints
interlocking the schedules of pilots, planes and flights. There must be ade-
quate rest built into the pilot schedule and periodic maintenance built into
the plane schedule. Only certain crews are qualified to operate certain types
of planes. The operating cost consists of many components, some of them
more subtle than others. For example, in an imperfect schedule, a pilot
may have to fly as a passenger on some flight. This results in lost revenue
not only because a passenger seat is taken up but also because the pilot
has to be paid even when riding as a passenger. How does one make an
operating plan for an airline that minimizes the total cost while meeting all
the constraints? A problem of this type is called a combinatorial optimiza-
tion problem, since there are only a finite number of combinations possible,
and in principle, one can enumerate all of them, eliminate the ones that do
not meet the conditions and among those that do, select the one that has
the least operating cost. Needless to say, one needs to be more clever than
simple enumeration, due to the vast number of combinations involved.
As another example, consider a communication network consisting of
switches interconnected by trunks (e.g. terrestrial, oceanic, satellite) in a
particular topology. A telephone call originating in one switch can take
many different paths (of switches) to terminate in another switch, using
up trunks along the path. The problem is to design a minimum cost net-
work that can carry the expected traffic. After a network is designed and
implemented, operating the network involves various other combinatorial
optimization problems, e.g. dynamic routing of calls.
As a third example, consider inductive inference, a central problem in ar-
tificial intelligence and machine learning. Inductive inference is the process
of hypothesizing a general rule from examples. Inductive inference involves
the following steps: (i) Inferring rules from examples, finding compact ab-
stract models of data or hidden patterns in the data; (ii) Making predictions
based on abstractions; (iii) Learning, i.e. modifying the abstraction based
on comparing predictions with actual results; (iv) Designing questions to
generate new examples. Consider the first step of the above process, i.e.
discovering patterns in data. For example, given the sequence 2,4,6,8, ... ,
we may ask, "What comes next?" One could pick any number and justify it
by fitting a fourth degree polynomial through the 5 points. However, the an-
swer "10" is considered the most "intelligent." That is so because it is based
on the first-order polynomial 2n, which is linear and hence simpler than a
fourth degree polynomial. The answer to an inductive inference problem is

not unique. In inductive inference, one wants a simple explanation that fits
a given set of observations. Simpler answers are considered better answers.
One therefore needs a way to measure simplicity. For example, in finite au-
tomaton inference, the number of states could be a measure of simplicity. In
logic circuit inference, the measure could be the number of gates and wires.
Inductive inference, in fact leads to a discrete optimization problem, where
one wants to maximize simplicity, or find a model, or set of rules, no more
complex than some specified measure, consistent with already known data.
As a further example, consider the linear ordering problem, an important
problem in economics, the social sciences, and also archaeology. In this
problem, we are given several objects that we wish to place in order. There
is a cost associated with placing object i before object j and a cost for
placing object j before object i. The objective is to order the objects to
minimize the total cost. There are methods for ranking sports teams that
can be formulated as linear ordering problems: if team A beats team B then
team A should go ahead of team B in the ranking, but it may be that team
B beat team C, who in turn beat team A, so the determination of the "best"
ordering is a non-trivial task, usually depending on the margin of victory.
Even though the four examples given above come from four different
facets of life and look superficially to be quite different, they all have a
common mathematical structure and can be described in a common math-
ematical notation called integer programming. In integer programming, the
unknowns are represented by variables that take on a finite or discrete set
of values. The various constraints or conditions on the problem are cap-
tured by algebraic expressions of these variables. For example, in the airline
crew assignment problem discussed above, let us denote by variable x_ij the
decision quantity that assigns crew i to flight j. Let there be m crews and
n flights. If the variable x_ij takes on a value 1, then we say that crew i is
assigned to flight j, and the cost of that assignment is c_ij. If the value is 0,
then crew i is not assigned to flight j. Thus, the total crew-scheduling cost
for the airline is given by the expression
∑_{i=1}^m ∑_{j=1}^n c_ij x_ij     (1)

that must be minimized. The condition that every flight should have exactly
one crew is expressed by the equations
∑_{i=1}^m x_ij = 1,  for every flight j = 1, ..., n.     (2)

Figure 1: Geometric view of linear programming

We should also stipulate that the variables should take on only values 0 or
1. This condition is denoted by the notation

x_ij ∈ {0, 1},  1 ≤ i ≤ m; 1 ≤ j ≤ n.     (3)

Other conditions on the crew can be expressed in a similar fashion. Thus,


an integer programming formulation of the airline crew assignment problem
is to minimize the operating cost given by (1) subject to various conditions
given by other algebraic equations and inequalities. The formulations of
the network design problem, the inductive inference problem, as well as
the linear ordering problem, look mathematically similar to the above
problem.
Linear programming is a special and simpler type of combinatorial op-
timization problem in which the integrality constraints of the type (3) are
absent and we are given a linear objective function to be minimized subject
to linear inequalities and equalities. A standard form of linear program is
stated as follows:
mlR·n{cT xlAx ~ bj I ~ x ~ u}, (4)
xE n

where c, u, l,x ERn, bERm and A E Rmxn. In (4), x is the vector of decision
variables, Ax ~ b and I ~ x ~ u represent constraints on the decision

variables, and c^T x is the linear objective function to be minimized. Figure 1


shows a geometric interpretation of a linear program on the Euclidean plane.
Each linear inequality is represented by a line that partitions the plane into
two half-spaces. Each inequality requires that for a solution to be feasible, it
must lie in one of the half-spaces. The feasible region is the intersection of the
half-spaces and is represented in the figure by the hashed area. The objective
function, that must be minimized over the feasible region, is represented by
a sliding line. This line intersects the feasible region in a set of points,
all having the same objective value. As the line is swept across the feasible
region in the direction of improvement, its objective value decreases. The set
of points determined by the intersection of the sliding line with the feasible
region that attains the best objective function value is called the optimal
solution. In the example of the figure there is a unique optimal solution. In
fact, a fundamental theorem in linear programming states that the optimal
solution of a linear program occurs at a vertex of the polytope defined by
the constraints of the linear program. This result gives linear programming
its combinatorial nature. Even though the linear programming decision
variables are continuous in nature, this result states that only a discrete and
finite number of points in the solution space need to be examined.
Linear programming has a wide range of applications, including per-
sonnel assignment, production planning and distribution, refinery planning,
target assignment, medical imaging, control systems, circuit simulation,
weather forecasting, signal processing and financial engineering. Many
polynomial-time solvable combinatorial problems are special cases of linear
programming (e.g. matching and maximum flow). Linear programming has
also been the source of many theoretical developments, in fields as diverse
as economics and queueing theory.
Combinatorial problems occur in diverse areas. These include graph
theory (e.g. graph partitioning, network flows, graph coloring), linear in-
equalities (e.g. linear and integer programming), number theory (e.g. fac-
toring, primality testing, discrete logarithm), group theory (e.g. graph iso-
morphism, group intersection), lattice theory (e.g. basis reduction), and
logic and artificial intelligence (e.g. satisfiability, inductive and deductive
inference boolean function minimization). All these problems, when ab-
stracted mathematically have a commonality of discreteness. The solution
approaches for solving these problems also have a great deal in common. In
fact, attempts to come up with solution techniques revealed more common-
ality of the problems than was revealed from just the problem formulation.
The solution of combinatorial problems has been the subject of much re-

search. There is a continuously evolving body of knowledge, both theoretical


and practical, for solving these problems.

2.2 Scope and computational efficiency


We illustrate with some examples the broad scope of applications of the
interior-point techniques and their computational effectiveness. Since the
most widely applied combinatorial optimization problem is linear program-
ming, we begin with this problem. Each step of the interior-point method
as applied to linear programming involves the solution of a linear system
of equations. While a straightforward implementation of solving these lin-
ear systems can still outperform the Simplex Method, more sophisticated
implementations have achieved orders of magnitude improvement over the
Simplex Method. These advanced implementations make use of techniques
from many disciplines such as linear algebra, numerical analysis, computer
architecture, advanced data structures, and differential geometry. Tables 1-
2 show the performance comparison between implementations of the Sim-
plex (CPLEX) and interior-point (ADP [79]) methods on a class of linear
programming relaxations of the quadratic assignment problems [127, 122].
Similar relative performances have been observed in problems drawn from
disciplines such as operations research, electrical engineering, computer sci-
ence, and statistics [79]. As the table shows, the relative superiority of
interior-point method over the Simplex Method grows as the problem size
grows and the speed-up factor can exceed 1000. Larger problems in the table
could only be solved by the interior-point method because of impracticality
of running the Simplex Method. In fact, the main practical contribution
of the interior-point method has been to enable the solution of many large-
scale real-life problems in fields such as telecommunication, transportation
and defense, that could not be solved earlier by the Simplex Method.
From the point of view of efficient implementation, interior-point meth-
ods have another important property: they can exploit parallelism rather
well [60, 131]. A parallel architecture based on multi-dimensional finite pro-
jective geometries, particularly well suited for interior-point methods, has
been proposed [75].
We now illustrate computational experience with an interior point based
heuristic for integer programming [74, 77]. Here again, the main compu-
tational task at each iteration, is the solution of one or more systems of
linear equations. These systems have a structure similar to the system
solved in each iteration of interior point algorithms for linear programming

Table 1: LP relaxations of QAP integer programming formulation

LP relaxation simplex int. pt.


name rows vars itr time itr time time ratio
nug05 210 225 103 0.2s 14 1.6s 0.1
nug06 372 486 551 2.3s 17 2.6s 0.9
nug07 602 931 2813 22.0s 19 6.2s 3.5
nug08 912 1632 5960 91.3s 18 9.5s 9.6
nug12 3192 8856 57524 9959.1s 29 754.1s 13.2
nug15 6330 22275 239918 192895.2s 36 5203.8s 37.1
nug20 15240 72600 est. time: > 2 months 31 6745.5s -
nug30 52260 379350 did not run 36 35058.0s -

Table 2: CPLEX 3.0 and ADP runs on selected QAPLIB instances

LP relaxation primal simplex dual simplex ADP


prob rows vars itr time itr time itr time
nug05 1410 825 265 1.7s 370 1.1s 48 3.2s
nug06 3972 2886 7222 604.3s 1872 22.2s 55 12.2s
nug07 9422 8281 39830 47970.3s 6057 720.3s 59 43.3s
nug08 19728 20448 did not run 16034 37577.1s 63 139.1s
nug12 177432 299256 did not run did not run 91 6504.2s

and therefore software developed for linear programming can be reused in


integer programming implementations.
Consider, as a first example, the Satisfiability (SAT) Problem in propo-
sitional calculus, a central problem in mathematical logic. During the last
decade, a variety of heuristics have been proposed for this problem [61, 30].
A Boolean variable x can assume only values 0 or 1. Boolean variables can
be combined by the logical connectives or (∨), and (∧) and not (x̄) to form
Boolean formulae (e.g. x_1 ∧ x̄_2 ∨ x_3). A variable or a single negation of
the variable is called a literal. A Boolean formula consisting of only literals
combined by the ∨ operator is called a clause. SAT can be stated as follows:
Given m clauses C_1, ..., C_m involving n variables x_1, ..., x_n, does the formula
C_1 ∧ ... ∧ C_m evaluate to 1 for some Boolean input vector [x_1, ..., x_n]?
If so, the formula is said to be satisfiable. Otherwise it is unsatisfiable.
SAT can be formulated as the integer programming feasibility problem

∑_{i∈I_C} x_i - ∑_{i∈J_C} x_i ≥ 1 - |J_C|,   C = C_1, ..., C_m,     (5)

where

I_C = {i | literal x_i appears in clause C},
J_C = {i | literal x̄_i appears in clause C}.

If an integer vector x ∈ {0,1}^n is produced satisfying (5), the corresponding
SAT problem is said to be satisfiable.
An interior point implementation was compared with an approach based
on the Simplex Method to prove satisfiability of randomly generated in-
stances of SAT [65]. Instances with up to 1000 variables and 32,000 clauses
were solved. Compared with the Simplex Method approach on small prob-
lems (Table 3), speedups of over two orders of magnitude were observed.
Furthermore, the interior point approach was successful in proving satisfia-
bility in over 250 instances that the Simplex Method approach failed.
As a second example, consider inductive inference. The interior point ap-
proach was applied to a basic model of inductive inference [68]. In this model
there is a black box (Figure 2) with n Boolean input variables x_1, ..., x_n
and a single Boolean output variable y. The black box contains a hidden
Boolean function F : {0,1}^n → {0,1} that maps inputs to outputs. Given
a limited number of inputs and corresponding outputs, we ask: Does there
exist an algebraic sum-of-products expression with no more than K product
terms that matches this behavior? If so, what is it? It turns out that this
problem can be formulated as a SAT problem.

Table 3: SAT: Comparison of Simplex and interior point methods

SAT Problem Size Speed


Variables Clauses (|C|) Avg Lits/Clause Up
50 100 5 5
100 200 5 22
200 400 7 66
400 800 10 319

Figure 2: Black box with hidden logic (F : {0,1}^n → {0,1}, inputs x_1, ..., x_n, output y)

Table 4: Inductive inference SAT problems: 32-variable hidden logic

I/O       SAT Size         CPU
Samples   Vars    |C|      itr   time    Inferred Logic                              Prediction Accuracy
50        332     2703     49    66s     y = x2 x22 x28 x29 + x12 x17 x25 x27 +      .74
                                             x3 x9 x20 + x11 x12 x16 x32
100       404     5153     78    178s    y = x9 x11 x22 x29 + x4 x11 x22 +           .91
                                             x3 x9 x20 + x12 x15 x16 x29
400       824     19478    147   1227s   y = x4 x11 x15 x22 + x10 x11 x29 x32 +      exact
                                             x3 x9 x20 + x2 x12 x15 x29

Table 5: Efficiency on inductive inference problems: interior point and com-


binatorial approaches

Variables SAT Problem Interior Method Combinatorial
Hidden Logic vars |C| itr time Method (time)
8 396 2798 1 9.33s 43.05s
8 930 6547 13 45.72s 11.78s
8 1068 8214 33 122.62s 9.48s
16 532 7825 89 375.83s 20449.20s
16 924 13803 98 520.60s *
16 1602 23281 78 607.80s *
32 228 1374 1 5.02s 159.68s
32 249 2182 1 9.38s 176.32s
32 267 2746 1 9.76s 144.40s
32 450 9380 71 390.22s *
32 759 20862 1 154.62s *
* Did not find satisfiable assignment in 43200s.

Consider the hidden logic described by the 32-input, 1-output Boolean


expression y = x4 x11 x15 x22 + x2 x12 x15 x29 + x3 x9 x20 + x10 x11 x29 x32. This
function has 2^32 ≈ 4.3 × 10^9 distinct input-output combinations. Table 4
summarizes the computational results for this instance, where subsets of
input-output examples of size 50, 100 and 400 were considered and the num-
ber of terms in the expression to be synthesized was fixed at K = 4. In all
instances, the interior point algorithm synthesized a function that described
completely the behavior of the sample. With a sample of only 400 input-
output patterns the approach succeeded in exactly describing the hidden
logic. The prediction accuracy given in the table was computed with Monte
Carlo simulation, where 10,000 random vectors were input to the black box
and to the inferred logic and their outputs compared. Table 5 illustrates
the efficiency of the interior-point method compared to the combinatorial
Davis-Putnam Method [24].
As another example of an application of the continuous approach to
combinatorial problems, consider the wire routing problem for gate arrays,
an important subproblem arising in VLSI design. As shown in Figure 3, a
gate array can be abstracted mathematically as a grid graph. Input to the
wire routing problem consists of a list of wires specified by end points on a
rectangular grid. Each edge of the graph, also known as a channel, has a pre-

Figure 3: Wire routing

specified capacity representing the maximum number of wires it can carry.


The combinatorial problem is to find a wiring pattern without exceeding ca-
pacity of horizontal and vertical channels. This problem can be formulated
as an integer programming problem. The interior-point approach has suc-
cessfully obtained provably optimal global solutions to large-scale problems
of this type having more than 20,000 wires [109]. On the other hand, com-
binatorial heuristics, such as simulated annealing are not comparable either
in the quality of the solution they can find or in terms of the computational
cost.
A further example of the successful application of interior point methods
to solve combinatorial optimization problems comes from statistical physics.
The problem of finding the ground state of an Ising spin glass is related to
the magnetism of materials. Finding the ground state can be modelled as the
problem of finding the maximum cut in a graph whose vertices and edges are
those of a grid on a torus. It can be formulated as an integer programming
problem and solved using a cutting plane approach. If the weights on the
edges are ±1 then the linear programs suffer from degeneracy, which limits
the size of problems that can be solved efficiently using the Simplex Method.
The use of an interior point algorithm to solve the relaxations allows the

solution of far larger problems. For example, solving problems on a 70 x 70


toroidal grid using simplex required up to a day on a Sun SPARCstation
10 [26], whereas problems on a 100 x 100 grid could be solved in an average
of about 3 hours 20 minutes on a Sun SPARC 20/71 when using an interior
point code [98].
In the case of many other combinatorial problems, numerous heuristic
approaches have been developed. Many times, the heuristic merely encodes
the prior knowledge or anticipation of the structure of solution to a specific
class of practical applications into the working of the algorithm. This may
make a limited improvement in efficiency without really coming to grips with
the problem of exponential growth that plagues the combinatorial approach.
Besides, one needs to develop a wide variety of heuristics to deal with dif-
ferent situations. Interior-point methods have provided a unified approach
to create efficient algorithms for many different combinatorial problems.

3 Solution techniques
Solution techniques for combinatorial problems can be classified into two
groups: combinatorial and continuous approaches. In this section, we con-
trast these approaches.

3.1 Combinatorial approach


The combinatorial approach creates a sequence of states drawn from a dis-
crete and finite set. Each state represents a suboptimal solution or a partial
solution to the original problem. It may be a graph, a vertex of a polytope,
a collection of subsets of a finite set or some other combinatorial object. At
each major step of the algorithm, the next state is chosen in an attempt to
improve the current state. The improvement may be in the quality of the
solution measured in terms of the objective function, or it may be in making
the partial solution more feasible. In any case, the improvement is guided
by local search. By local search we mean that the solution procedure only
examines a neighboring set of configurations and greedily selects one that
improves the current solution. Thus, local search is quite myopic, with no
consideration given to evaluate whether this move may make any sense glob-
ally. Indeed, a combinatorial approach often lacks the information needed
for making such an evaluation. In many cases, the greedy local improvement
may trap the solution in a local minimum that is qualitatively much worse

than a true global minimum. To escape from a local minimum, the com-
binatorial approach needs to resort to techniques such as backtracking or
abandoning the sequences of states created so far altogether and restarting
with a different initial state. Most combinatorial problems suffer from the
property of having a large number of local minima when the search space
is confined to a discrete set. For a majority of combinatorial optimization
problems, the phenomenon of multiple local minima may create a problem
for the combinatorial approach.
On the other hand, for a limited class of problems, one can rule out
the possibility of local minima and show that local improvement also leads
to global improvement. For many problems in this class, polynomial-time
algorithms (i.e. algorithms whose running time can be proven to be bounded
from above by polynomial functions of the lengths of the problems) have
been known for a long time. Examples of problems in this class are bipartite
matching and network flows. It turns out that many of these problems
are special cases of linear programming, which is also a polynomial-time
problem. However, the Simplex Method, which employs a combinatorial
approach to solving linear programs, has been shown to be an exponential-
time algorithm. In contrast, all polynomial-time algorithms for solving the
general linear programming problem employ a continuous approach. These
algorithms use either the Ellipsoid Method [81] or one of the variants of the
Karmarkar Method.

3.2 Continuous approach


In the continuous approach to solving discrete problems, the set of candidate
solutions to a given combinatorial problem is embedded in a larger continu-
ous space. The topological and geometric properties of the continuous space
play an essential role in the construction of the algorithm as well as in the
analysis of its efficiency. The algorithm involves the creation of a sequence
of points in the enlarged space that converges to the solution of the original
combinatorial problem. At each major step of the algorithm, the next point
in the sequence is obtained from the current point by making a good global
approximation to the entire set of relevant solutions and solving it. Usually
it is also possible to associate a continuous trajectory or a set of trajectories
with the limiting case of the discrete algorithm obtained by taking infinites-
imal steps. Topological properties of the underlying continuous space such
as connectivity of the level sets of the function being optimized are used for
bounding the number of local minima and choosing an effective formulation

Figure 4: Discrete set embedded into continuous set

of the continuous optimization problem. The geometric properties such as


distance, volume and curvature of trajectories are used for analyzing the
rate of convergence of an algorithm, whereas the topological properties help
determine if a proposed algorithm will converge at all. We now elaborate
further on each of the main concepts involved in the continuous approach.

3.2.1 Examples of embedding

Suppose the candidate solutions to a discrete problem are represented as


points in the n-dimensional real space Rn. This solution set can be em-
bedded into a larger continuous space by forming the convex hull of these
points (Figure 4). This is the most common form of continuous embed-
ding and is used for solving linear and integer programming problems. As
another example, consider a discrete problem whose candidate solution set
is a finite cyclic group. This can be embedded in a continuous Lie group
{e^{iθ} | 0 ≤ θ < 2π} [20]. A Lie group embedding is useful for the problem
of graph isomorphism or automorphism. In this problem, let A denote the
adjacency matrix of the graph. Then the discrete solution set of the automorphism
problem is the permutation group given by {P | AP = PA; P is a
permutation matrix}. This can be embedded in a larger continuous group
given by {U | AU = UA; U is a complex unitary matrix}.

3.2.2 Global approximation


At each major step of the algorithm, a subproblem is solved to obtain the
next point in the sequence. The subproblem should satisfy two properties:
(i) it should be a global approximation to the original problem; and (ii)
should be efficiently solvable. In the context of linear programming, the
Karmarkar Method contains a way of making a global approximation having
both of the above desirable properties and is based on the following theorem.

Theorem 3.1 Given any polytope P and an interior point x ∈ P, there
exists a projective transformation T that transforms P to P' and x to x' ∈
P', so that it is possible to find in the transformed space a circumscribing
ball B(x', R) ⊇ P', of radius R and center x', containing P', and an inscribed
ball B(x', r) ⊆ P', of radius r and center x', contained in P', such that the ratio
R/r is at most n.

The inverse image (under T) of the inscribed ball is used as the opti-
mization space for the subproblem and satisfies the two properties stated
above, leading to a polynomial-time algorithm for linear programming. The
effectiveness of this global approximation is also borne out in practice since
Karmarkar's Method and its variants take very few approximation steps to
find the global optimum of the original problem. Extension of this global
approximation step to integer programming have led to new algorithms for
solving NP-complete problems, as we shall see later.

3.2.3 Continuous trajectories


Suppose an interior-point method produces an iteration of the following
type,

x^(k+1) = x^(k) + α f^(k),     (6)

where x^(k) is the k-th iterate, f^(k) is the k-th direction of improvement, and
α is the step-length parameter. Then by taking the limit as α → 0, we get
the infinitesimal version of the algorithm whose continuous trajectories are
given by the differential equation

dx/dα = f(x),     (7)

where f(x) defines a vector field. Thus, the infinitesimal version of the
algorithm can be thought of as a nonlinear dynamical system. For the

projective method for linear programming, the differential equation is given


by

dx/dt = -[D - x x^T] P_AD D c,

where

P_AD = I - D A^T (A D^2 A^T)^{-1} A D,
D = diag{x_1, x_2, ..., x_n}.
Similarly, continuous trajectories and the corresponding differential equa-
tions can be derived for other interior-point methods. These trajectories
have a rich mathematical structure in them. Many times they also have
algebraic descriptions and alternative interpretations. The continuous tra-
jectory given above for the linear programming problem converges to an
optimal solution of the problem corresponding to the objective function
vector c. Note that the vector field depends on c in a smooth way and as
the vector c is varied one can get to each vertex of the polytope as limit of
some continuous trajectory. If one were to attempt a direct combinatorial
description of the discrete solution set of a linear programming problem,
it would become enormously complex since the number of solutions can be
exponential with respect to the size of the problem. In contrast, the simple
differential equation given above implicitly encodes the complex structure
of the solution set. Another important fact to be noticed is that the differ-
ential equation is written in terms of the original input matrix A defining
the problem. Viewing combinatorial objects as limiting cases of continuous
objects often makes them more accessible to mathematical reasoning and
also permits construction of more efficient algorithms.
The power of the language of differential equations in describing com-
plex phenomena is rather well-known in the natural sciences. For example,
if one were to attempt a direct description of the trajectories involved in
planetary motion, it would be enormously complex. However, a small set
of differential equations written in terms of the original parameters of the
problem is able to describe the same motion. One of the most important
accomplishments of Newtonian mechanics was finding a simple description
of the apparently complex phenomena of planetary motion, in the form of
differential equations.
In the context of combinatorial optimization, the structure of the solution
set of a discrete problem is often rather complex. As a result, a straightfor-
ward combinatorial approach to solving these problems has not succeeded
in many cases and has led to a belief that these problems are intractable.

Even for linear programming, which is one of the simplest combinatorial op-
timization problems, the best known method, in both theory and practice, is
based on the continuous approach rather than the combinatorial approach.
Underlying this continuous approach is a small set of differential equations,
capable of encoding the complicated combinatorial structure of the solution
set. As this approach is extended and generalized, one hopes to find new
and efficient algorithms for many other combinatorial problems.

3.2.4 Topological properties


There are many ways to formulate a given discrete problem as a continuous
optimization problem, and it is rather easy to make a formulation that
would be difficult to solve even by means of continuous trajectories. How
does one make a formulation that is solvable? The most well-known class of
continuous solvable problems is the class of convex minimization problems.
This leads to a natural question: Is convexity the characteristic property
that separates the class of efficiently solvable minimization problems from
the rest? To explore this question we need to look at topological properties.
Topology is the study of properties invariant under any continuous, one-to-
one transformation of space having a continuous inverse.
Suppose we have a continuous optimization problem that is solvable by
means of continuous trajectories. It may be a convex problem, for exam-
ple. Suppose we apply a nonlinear transformation to the space that is a
diffeomorphism. The transformed problem need not be convex, but it will
continue to be solvable by means of continuous trajectories. In fact, the
image of the continuous trajectories in the original space, obtained by ap-
plying the same diffeomorphism gives us a way of solving the transformed
problem. Conversely, if the original problem was unsolvable, it could not be
converted into a solvable problem by any such transformation. Hence any
diffeomorphism maps solvable problems onto solvable problems and unsolv-
able problems onto unsolvable problems. This argument suggests that the
property characterizing the solvable class may be a topological property and
not simply a geometric property such as convexity.
The simplest topological property relevant to the performance of interior-
point methods is connectivity of the level sets of the function being opti-
mized. Intuitively, a subset of continuous space is connected if any two
points of the subset can be joined by a continuous path lying entirely in the
subset. In the context of function minimization, the significance of connec-
tivity lies in the fact that functions having connected level sets do not have
spurious local minima. In other words, every local minimum is necessarily a global minimum. A continuous formulation of NP-complete problems
having such desirable topological properties is given in [74]. The approach
described there provides a theoretical foundation for constructing efficient
algorithms for discrete problems on the basis of a common principle. Al-
gorithms for many practical problems can now be developed which differ
mainly in the way the combinatorial structure of the problem is exploited to
gain additional computational efficiency.

4 Interior point methods for linear and network


programming

4.1 Linear programming

Interior point methods were first described by Dikin [29] in the mid 1960s,
and current interest in them started with Karmarkar's algorithm in the mid
1980s [72]. As the name suggests, these algorithms generate a sequence of
iterates which moves through the relative interior of the feasible region, in
marked contrast to the simplex method [22], where each iterate is an ex-
treme point. Like the ellipsoid method [81], many interior point methods
have polynomial complexity, whereas every known variant of the simplex
method can take an exponential number of iterations in the worst case.
Computationally, interior point methods usually require far less time than
their worst-case bounds, and they appear to be superior to the simplex
method, at least for problems with a large number of constraints and vari-
ables (say, more than one thousand). Recent books discussing interior point
methods for linear programming include [132, 140, 146, 148].
The dual affine scaling method is similar to Dikin's original method and
is discussed in section 4.3.1. In this section, we consider a slightly more com-
plicated interior point method, namely the primal-dual predictor-corrector
method PDPCM [92,90]. This is perhaps the most popular and widely imple-
mented interior point method. The basic idea with an interior point method
is to enable the method to take long steps, by choosing directions that do
not immediately run into the boundary. With the PDPCM this is achieved by
considering a modification of the original problem, with a penalty term for
approaching the boundary. Thus, for the standard form linear programming
problem
        min   c^T x
        subject to   Ax = b,  x ≥ 0,

where c and x are n-vectors, b is an m-vector, and A is dimensioned appropriately, the barrier function subproblem is constructed:

        min   c^T x - μ Σ_{i=1}^n log(x_i)
        subject to   Ax = b,  x ≥ 0.

Here, log denotes the natural logarithm, and μ is a positive constant. Note that if x_i approaches zero for any component, then the objective function value approaches ∞.
If the original linear program has a compact set of optimal solutions then
each barrier subproblem will have a unique optimal solution. The set of all
these optimal solutions is known as the central trajectory. The limit point of
the central trajectory as μ tends to zero is an optimal point for the original
linear program. If the original linear program has more than one optimal
solution, then the limit point is in the relative interior of the optimal face.
Fiacco and McCormick [34] suggested following the central trajectory
to the optimal solution. This requires solving a barrier subproblem for a
particular choice of μ, decreasing μ, and repeating. The hope is that knowing
the solution to one subproblem will make it easy to solve the next one. It
also suffices to only solve the subproblems approximately, both theoretically
and practically. Monteiro and Adler [102] showed that if μ is decreased by a sufficiently small amount then an approximate solution to one subproblem can be used to obtain an approximate solution to the next one in just one iteration, leading to an algorithm that requires O(n^{1/2}) iterations. With a more aggressive reduction in μ (for example, μ is halved at each iteration), more iterations are required to obtain an approximate solution to the new subproblem, and the best complexity result that has been proved for such algorithms is that they require O(n) iterations.

There are several issues that need to be resolved in order to specify the algorithm, including the choice of values of μ, methods for solving the subproblems, and the desired accuracy of the solutions of the subproblems.
Many different choices have been investigated, at least theoretically. Much
of the successful computational work has focussed on the PDPCM, and so we
now describe that algorithm.
The optimality conditions for the subproblems can be written:

        Ax  =  b                                              (8)
        A^T y + z  =  c                                       (9)
        XZe  =  μe                                            (10)

where e denotes the vector of all ones of appropriate dimension, y is an m-vector, z is a nonnegative n-vector, and X and Z are n × n diagonal matrices with X_ii = x_i and Z_ii = z_i for i = 1, ..., n. Equation (9) together with the nonnegativity restriction on z corresponds to dual feasibility. Notice that the last condition is equivalent to saying that x_i z_i = μ for each component i. Complementary slackness for the original linear program would require x_i z_i = 0 for each component i. The duality gap is x^T z, so if a point satisfies the optimality conditions (8)-(10) then the duality gap will be nμ.

We assume we have a strictly positive feasible solution x and a dual feasible solution (y, z) satisfying A^T y + z = c, z > 0. If these assumptions are not satisfied, the algorithm can be modified appropriately; see, for example, [90] or Zhang [150].
An iteration of PDPCM consists of three parts:

• A Newton step to move towards the solution of the linear program is calculated (but not taken). This is known as the predictor step.

• This predictor step is used to update μ.

• A corrector step is taken, which combines the decrease of the predictor step with a step that tries to stay close to the central trajectory.

The predictor step gives an immediate return on the value of the objective function, and the corrector step brings the iterate back towards the central trajectory, making it easier to obtain a good decrease on future steps.
The calculation of the predictor step requires solving the following system of equations to obtain the search directions Δ^p x, Δ^p y, and Δ^p z:

        A Δ^p x  =  0                                         (11)
        A^T Δ^p y + Δ^p z  =  0                               (12)
        Z Δ^p x + X Δ^p z  =  -XZe.                           (13)

One method to solve this system is to notice that it requires

        A Z^{-1} X A^T Δ^p y  =  b.                           (14)

The Cholesky factorization of the matrix A Z^{-1} X A^T can be calculated, and this can be used to obtain Δ^p y. The vectors Δ^p z and Δ^p x can then be calculated from the equations (12) and (13).
It can be shown that if we choose a step of length α then the duality gap is reduced by a factor of α, so if we took a step of length one then the duality gap would be reduced to zero. However, it is not usually possible to take such a large step and still remain feasible. Thus, we calculate ᾱ_P and ᾱ_D, the maximum possible step lengths to maintain primal and dual feasibility.

We use these steplengths to aid in the adaptive update of μ. If the steplengths are close to one, then the duality gap can be decreased dramatically, and the new μ should be considerably smaller than the old value. Conversely, if the steplengths are short then the iterates are close to the boundary, so μ should only be decreased slightly and the iterates should be pushed back towards the central trajectory. This leads to one possible update of μ as

        μ^+  =  (g_p / g)^2 (g_p / n),                        (15)

where g = x^T z is the current duality gap and g_p is the duality gap that would result if primal and dual steps of lengths ᾱ_P and ᾱ_D, respectively, were taken.
A corrector step (Δx, Δy, Δz) is used to bring the iterates back towards the central trajectory. This involves solving the system of equations

        A Δx  =  0                                            (16)
        A^T Δy + Δz  =  0                                     (17)
        Z Δx + X Δz  =  μ^+ e - XZe - v^p                     (18)

where v^p is an n-vector with components v_i^p = Δ^p x_i Δ^p z_i. This system can be solved using the Cholesky factors of the matrix A Z^{-1} X A^T, which were already formed when calculating the predictor step.

Once the direction has been calculated, the primal and dual step lengths α_P and α_D are chosen to ensure that the next iterate has x^+ > 0 and z^+ > 0. The iterates are updated as:

        x^+  =  x + α_P Δx                                    (19)
        y^+  =  y + α_D Δy                                    (20)
        z^+  =  z + α_D Δz.                                   (21)

Typically, α_P and α_D are chosen to move the iterates as much as 99.95% of the way to the boundary.
The predictor-corrector method [103, 92] can be thought of as finding a direction by using a second order approximation to the central trajectory. A second order solution to (8)-(10) would require that the direction satisfy

        Z Δx + X Δz + v  =  μ^+ e - XZe

where v is a vector to be determined with v_i = Δx_i Δz_i. It is not possible to easily solve this equation together with (16)-(17), so it is approximated by the system of equations (16)-(18).

The method makes just one iteration for each reduction in μ, but typically μ is decreased very quickly, by perhaps at least a constant factor at each iteration. The duality gap is usually close to nμ, so the algorithm is terminated when the duality gap drops below some tolerance. The algorithm typically takes only 40 or so iterations even for problems with thousands of constraints and/or variables.
The computational work in an iteration is dominated by the factorization
of the matrix AZ-I XAT. The first step in the factorization is usually to
permute the rows of A to reduce the number of nonzeroes in the Cholesky
factors - this step need only be performed once in the algorithm. Once the
ordering is set, a numerical factorization of the matrix is performed at each
iteration. These factors are then used to calculate the directions by means
of backward and forward substitutions. The use of the correction direction
was found to decrease the number of iterations required to solve a linear
program by enough to justify the extra work at each iteration of calculating
an extra direction. Higher order approximations [13, 14, 3, 76] have proven
useful for some problems where the cost of the factorization is far greater
than the cost of backward and forward substitutions - see [148]. For a
discussion of the computational issues involved in implementing an interior
point algorithm, see, for example, Adler et al. [3] and Andersen et al. [7].
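
To make the pieces above concrete, the following sketch assembles one PDPCM iteration from equations (11)-(21) using dense linear algebra. It is an illustrative toy rather than any of the implementations cited above: the function name pdpcm_step and its arguments are ours, a strictly feasible starting pair and a full row rank A are assumed, the μ update is the standard Mehrotra-style choice matching (15), and a practical code would replace the dense Cholesky factorization with a sparse one.

    import numpy as np

    def pdpcm_step(A, b, c, x, y, z, gamma=0.9995):
        # One predictor-corrector iteration (sketch); assumes Ax = b,
        # x > 0, A^T y + z = c, z > 0 at entry, and A of full row rank.
        m, n = A.shape
        d = x / z                              # diagonal of Z^{-1} X
        L = np.linalg.cholesky((A * d) @ A.T)  # factor A Z^{-1} X A^T once
        solve = lambda r: np.linalg.solve(L.T, np.linalg.solve(L, r))

        # predictor direction, equations (11)-(14)
        dy_p = solve(b)
        dz_p = -A.T @ dy_p
        dx_p = -x - d * dz_p

        def max_step(v, dv):                   # largest step keeping v + a*dv >= 0
            neg = dv < 0
            return min(1.0, (-v[neg] / dv[neg]).min()) if neg.any() else 1.0

        # adaptive update of mu, as in (15)
        g = x @ z
        g_p = (x + max_step(x, dx_p) * dx_p) @ (z + max_step(z, dz_p) * dz_p)
        mu = (g_p / g) ** 2 * (g_p / n)

        # corrector direction, equations (16)-(18), reusing the factors
        r = mu * np.ones(n) - x * z - dx_p * dz_p
        dy = solve(-A @ (r / z))
        dz = -A.T @ dy
        dx = (r - x * dz) / z

        a_p = gamma * max_step(x, dx)          # step lengths as in (19)-(21)
        a_d = gamma * max_step(z, dz)
        return x + a_p * dx, y + a_d * dy, z + a_d * dz

The single factorization serves both the predictor and the corrector solves, which is exactly the saving that makes the extra corrector direction cheap.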

4.2 Network programming


A large number of problems in transportation, communications, and manu-
facturing can be modeled as network flow problems. In these problems one
seeks to find the most efficient, or optimal, way to move flow (e.g. materials,
information, buses, electrical currents) on a network (e.g. postal network,
computer network, transportation grid, power grid). Among these optimiza-
tion problems, many are special classes of linear programming problems,
with combinatorial properties that enable development of efficient solution
techniques. In this section, we limit our discussion to these linear network
flow problems. For a treatment of classes of nonlinear network flow problems, the reader is referred to [31, 50, 51, 114] and references therein.
Given a directed graph G = (N,A), where N is a set of m nodes and
A a set of n arcs, let (i,j) denote a directed arc from node i to node j.
Every node is classified in one of the following three categories. Source
nodes produce more flow than they consume. Sink nodes consume more
flow than they produce. Transshipment nodes produce as much flow as
they consume. Without loss of generality, one can assume that the total
flow produced in the network equals the total flow consumed. Each arc has
associated with it an origination node and a destination node, implying a
direction for flow to follow. Arcs have limitations (often called capacities
or bounds) on how much flow can move through them. The flow on arc
(i,j) must be no less than l_ij and can be no greater than u_ij. To set up the problem in the framework of an optimization problem, a unit flow cost c_ij, incurred by each unit of flow moving through arc (i,j), must be defined.
Besides being restricted by lower and upper bounds at each arc, flows must
satisfy another important condition, known as Kirchhoff's Law (conservation
of flow), which states that for every node in the network, the sum of all
incoming flow together with the flow produced at the node must equal the
sum of all outgoing flow and the flow consumed at the node. The objective
of the minimum cost network flow problem is to determine the flow on each
arc of the network, such that all of the flow produced in the network is
moved from the source nodes to the sink nodes in the most cost-effective
way, while not violating Kirchhoff's Law and flow limitations on the arcs.
The minimum cost network flow problem can be formulated as the following linear program:

        min   Σ_{(i,j)∈A} c_ij x_ij                           (22)

subject to:

        Σ_{(j,k)∈A} x_jk - Σ_{(k,j)∈A} x_kj  =  b_j,   j ∈ N,   (23)

        l_ij ≤ x_ij ≤ u_ij,   (i,j) ∈ A.                      (24)


In this formulation, x_ij denotes the flow on arc (i,j) and c_ij is the cost of transporting one unit of flow on arc (i,j). For each node j ∈ N, let b_j denote a quantity associated with node j that indicates how much flow is produced or consumed at the node. If b_j > 0, node j is a source. If b_j < 0, node j is a sink. Otherwise (b_j = 0), node j is a transshipment node. For each arc (i,j) ∈ A, as before, let l_ij and u_ij denote, respectively, the lower and upper
bounds on flow on arc (i,j). The case where u_ij = ∞ for all (i,j) ∈ A gives rise to the uncapacitated network flow problem. Without loss of generality, l_ij can be set to zero. Most often, the problem data (i.e. c_ij, u_ij, l_ij, for (i,j) ∈ A, and b_j, for j ∈ N) are assumed to be integer, and many codes adopt this assumption. However, there can exist applications where the data are real numbers, and algorithms should be capable of handling problems with real data.
Constraints of type (23) are referred to as the flow conservation equa-
tions, while constraints of type (24) are called the flow capacity constraints.
In matrix notation, the above network flow problem can be formulated as a linear program of the special form

        min { c^T x | Ax = b, l ≤ x ≤ u },

where A is the m × n node-arc incidence matrix of the graph G = (N, A), i.e. for each arc (i,j) in A there is an associated column in matrix A with exactly two nonzero entries: an entry 1 in row i and an entry -1 in row j. Note that of the mn entries of A, only 2n are nonzero, and because of this the node-arc incidence matrix is not a space-efficient representation of the network.
There are many other ways to represent a network. A popular representation
is the node-node adjacency matrix B. This is an m × m matrix with an entry 1 in position (i,j) if arc (i,j) ∈ A and 0 otherwise. Such a representation
is efficient for dense networks, but is inefficient for sparse networks. A more
efficient representation of sparse networks is the adjacency list, where for
each node i ∈ N there exists a list of arcs emanating from node i, i.e. a list of nodes j such that (i,j) ∈ A. The forward star representation is a multi-
array implementation of the adjacency list data structure. The adjacency
list enables easy access to the arcs emanating from a given node, but not
the incoming arcs. The reverse star representation enables easy access to
the list of arcs incoming into i. Another representation that is much used in
interior point network flow implementations is a simple arc list, where the
arcs are stored in a linear array. The complexity of an algorithm for solving
network flow problems depends greatly on the network representation and
the data structures used for maintaining and updating the intermediate
computations.
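
As a small illustration of two of these representations, the sketch below (function names and conventions ours) builds the dense node-arc incidence matrix and a forward star structure for a directed graph with nodes numbered 0, ..., m-1:

    import numpy as np

    def node_arc_incidence(m, arcs):
        # Dense node-arc incidence matrix: +1 at the tail, -1 at the head.
        A = np.zeros((m, len(arcs)))
        for k, (i, j) in enumerate(arcs):
            A[i, k] = 1.0
            A[j, k] = -1.0
        return A

    def forward_star(m, arcs):
        # Forward star: arc indices sorted by tail node, plus a pointer
        # array; order[point[i]:point[i+1]] are the arcs leaving node i.
        order = sorted(range(len(arcs)), key=lambda k: arcs[k][0])
        point = [0] * (m + 1)
        for (i, _) in arcs:
            point[i + 1] += 1
        for i in range(m):
            point[i + 1] += point[i]
        return order, point

The forward star structure answers "which arcs leave node i" in constant time per arc, which is why it is preferred over the incidence matrix for sparse networks.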
We denote the i-th column of A by A_i, the i-th row of A by A_{i·}, and the submatrix of A formed by the columns with indices in set S by A_S. If graph G is disconnected and has p connected components, there are exactly p redundant flow conservation constraints, which are sometimes removed from the problem formulation. We rule out a trivially infeasible problem by assuming

        Σ_{j∈N_k} b_j  =  0,   k = 1, ..., p,                 (25)

where N_k is the set of nodes for the k-th component of G.


Often it is further required that the flow x_ij be integer, i.e. we replace (24) with

        l_ij ≤ x_ij ≤ u_ij,   x_ij integer,   (i,j) ∈ A.      (26)

Since the node-arc incidence matrix A is totally unimodular, when the data is integer all vertex solutions of the linear program are integer. An algorithm that finds a vertex solution, such as the simplex method, will necessarily produce an integer optimal flow. In certain types of network flow problems, such as the assignment problem, one may be only interested in solutions having integer flows, since fractional flows do not have a logical interpretation.

In the remainder of this section we assume, without loss of generality, that l_ij = 0 for all (i,j) ∈ A and that c ≠ 0. A simple change of variables can transform the original problem into an equivalent one with l_ij = 0 for all (i,j) ∈ A. The case where c = 0 is a simple feasibility problem, and can be handled by solving a maximum flow problem [4].
Many important combinatorial optimization problems are special cases
of the minimum cost network flow problem. Such problems include the linear
assignment and transportation problems, and the maximum flow and short-
est path problems. In the transportation problem, the underlying graph is bipartite, i.e. there exist two sets S and T such that S ∪ T = N and S ∩ T = ∅, and arcs occur only from nodes of S to nodes of T. Set S is usually called the set of source nodes and set T is the set of sink nodes. For the transportation problem, the right hand side vector in (23) is given by

        b_j  =  {  s_j    if j ∈ S,
                { -t_j    if j ∈ T,

where s_j is the supply at node j ∈ S and t_j is the demand at node j ∈ T. The assignment problem is a special case of the transportation problem, in which s_j = 1 for all j ∈ S and t_j = 1 for all j ∈ T.
The computation of the maximum flow from node s to node t in G = (N, A) can be done by computing a minimum cost flow in G' = (N', A'), where N' = N and A' = A ∪ (t, s), where

        c_ij  =  {  0    if (i,j) ∈ A,
                 { -1    if (i,j) = (t, s),

and

        u_ij  =  { cap(i,j)   if (i,j) ∈ A,
                 { ∞          if (i,j) = (t, s),

where cap(i,j) is the capacity of arc (i,j) in the maximum flow problem.
The shortest paths from node s to all nodes in N \ {s} can be computed by solving an uncapacitated minimum cost network flow problem in which c_ij is the length of arc (i,j) and the right hand side vector in (23) is given by

        b_j  =  {  m - 1   if j = s,
                 { -1      if j ∈ N \ {s}.
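
The max-flow construction above can be encoded directly as data. In the hypothetical sketch below (names ours), arcs, capacities and costs are plain Python containers, and the "infinite" capacity of the return arc is replaced by a finite bound that no feasible flow can exceed:

    def max_flow_as_min_cost_flow(arcs, cap, s, t):
        # Encode the maximum s-t flow problem as a minimum cost flow
        # instance: original arcs get cost 0 and their capacities; the
        # added return arc (t, s) gets cost -1 and an effectively
        # infinite capacity; all supplies b_j are zero (a circulation).
        big = sum(cap[a] for a in arcs) + 1    # finite stand-in for infinity
        arcs2 = list(arcs) + [(t, s)]
        cost = {a: 0 for a in arcs}
        cost[(t, s)] = -1
        ub = dict(cap)
        ub[(t, s)] = big
        return arcs2, cost, ub

Minimizing cost then maximizes flow on the return arc (t, s), which by conservation equals the s-t flow.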
Although all of the above combinatorial optimization problems are for-
mulated as minimum cost network flow problems, several specialized algo-
rithms have been devised for solving them efficiently.
In many practical applications, flows in networks with more than one commodity need to be optimized. In the multicommodity network flow problem, k commodities are to be moved in the network. The set of commodities is denoted by K. Let x_ij^k denote the flow of commodity k in arc (i,j). The multicommodity network flow problem can be formulated as the following linear program:

        min   Σ_{k∈K} Σ_{(i,j)∈A} c_ij^k x_ij^k               (27)

subject to:

        Σ_{(j,l)∈A} x_jl^k - Σ_{(l,j)∈A} x_lj^k  =  b_j^k,   j ∈ N, k ∈ K,   (28)

        Σ_{k∈K} x_ij^k  ≤  u_ij,   (i,j) ∈ A,                 (29)

        x_ij^k  ≥  0,   (i,j) ∈ A, k ∈ K.                     (30)

The minimum cost network flow problem is a special case of the multicommodity network flow problem, in which there is only one commodity.
In the 1940s, Hitchcock [58] proposed an algorithm for solving the trans-
portation problem and later Dantzig [23] developed the Simplex Method for
linear programming problems. In the 1950s, Kruskal [83] developed a minimum spanning tree algorithm and Prim [118] devised another algorithm for the same problem. During that decade, commercial digital computers
were introduced widely. The first book on network flows was published by
Ford and Fulkerson [37] in 1962. Since then, active research produced a
variety of algorithms, data structures, and software for solving network flow
problems. For an introduction to network flow problems and applications,
see the books [4, 15, 31, 37, 80, 84, 133, 137].

4.3 Components of interior point network flow methods


Since Karmarkar's breakthrough in 1984, many variants of his algorithm,
including the dual affine scaling, with and without centering, reduced dual
affine scaling, primal path following (method of centers), primal-dual path
following, predictor-corrector primal-dual path following, and the infeasible
primal-dual path following, have been used to solve network flow problems.
Though these algorithms are, in some sense, different, they share many
of the same computational requirements. The key ingredients for efficient
implementation of these algorithms are:
1. The solution of the linear system A D A^T u = t, where D is a diagonal n × n scaling matrix, and u and t are m-vectors. This requires an iterative algorithm for computing approximate directions, preconditioners, stopping criteria for the iterative algorithm, etc.

2. The recovery of the desired optimal solution. This may depend on how the problem is presented (integer data or real data), and what type of solution is required (fractional or integer solution, ε-optimal or exact solution, primal optimal or primal-dual optimal solution, etc.).
In this subsection, we present in detail these components, illustrating
their implementation in the dual affine scaling network flow algorithm DLNET
of Resende and Veiga [130].

4.3.1 The dual affine scaling algorithm


The dual affine scaling (DAS) algorithm [12, 29, 135, 141] was one of the first interior point methods to be shown to be competitive computationally with the simplex method [2, 3]. As before, let A be an m × n matrix, c, u, and x be n-dimensional vectors and b an m-dimensional vector. The DAS algorithm solves the linear program

        min { c^T x | Ax = b, 0 ≤ x ≤ u }

indirectly, by solving its dual

        max { b^T y - u^T z | A^T y - z + s = c, z ≥ 0, s ≥ 0 },        (31)

where z and s are n-dimensional vectors and y is an m-dimensional vector. The algorithm starts with an initial interior solution (y^0, z^0, s^0) such that

        A^T y^0 - z^0 + s^0 = c,   z^0 > 0,   s^0 > 0,

and iterates according to

        (y^{k+1}, z^{k+1}, s^{k+1})  =  (y^k, z^k, s^k) + α (Δy, Δz, Δs),

where the search directions Δy, Δz, and Δs satisfy

        A (Z_k^2 + S_k^2)^{-1} A^T Δy  =  b - A Z_k^2 (Z_k^2 + S_k^2)^{-1} u,
        Δz  =  Z_k^2 (Z_k^2 + S_k^2)^{-1} (A^T Δy - S_k^2 u),
        Δs  =  Δz - A^T Δy,

where

        Z_k = diag(z_1^k, ..., z_n^k)   and   S_k = diag(s_1^k, ..., s_n^k),

and α is such that z^{k+1} > 0 and s^{k+1} > 0, i.e. α = γ × min{α_z, α_s}, where 0 < γ < 1 and

        α_z  =  min{ -z_i^k / (Δz)_i  |  (Δz)_i < 0,  i = 1, ..., n },
        α_s  =  min{ -s_i^k / (Δs)_i  |  (Δs)_i < 0,  i = 1, ..., n }.

The dual problem (31) has a readily available initial interior point solution:

        y_i^0  =  0,         i = 1, ..., m,
        s_i^0  =  c_i + λ,   i = 1, ..., n,
        z_i^0  =  λ,         i = 1, ..., n,

where λ is a scalar such that λ > 0 and λ > -c_i, i = 1, ..., n. The algorithm described above has two important parameters, γ and λ. For example, in DLNET, γ = 0.95 and λ = 2 ||c||_2.
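
A corresponding sketch of one DAS iteration, under the same caveats as before (dense algebra, function and variable names ours, a feasible interior starting point assumed):

    import numpy as np

    def das_step(A, b, u, y, z, s, gamma=0.95):
        # One dual affine scaling iteration on (31) (dense sketch);
        # maintains A^T y - z + s = c with z > 0 and s > 0.
        d = 1.0 / (z * z + s * s)              # diagonal of (Z_k^2 + S_k^2)^{-1}
        dy = np.linalg.solve((A * d) @ A.T, b - A @ (z * z * d * u))
        dz = z * z * d * (A.T @ dy - s * s * u)
        ds = dz - A.T @ dy

        def ratio(v, dv):                      # largest step keeping v + a*dv > 0
            neg = dv < 0
            return (-v[neg] / dv[neg]).min() if neg.any() else np.inf

        a = gamma * min(ratio(z, dz), ratio(s, ds))
        return y + a * dy, z + a * dz, s + a * ds

In a network implementation the dense solve is replaced by the preconditioned conjugate gradient method of the next subsections.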

4.3.2 Computing the direction


The computational efficiency of interior point network flow methods relies
heavily on a preconditioned conjugate gradient algorithm to solve the direc-
tion finding system at each iteration. The preconditioned conjugate gradient
algorithm is used to solve

        M^{-1} A D_k A^T Δy  =  M^{-1} b̄,                    (32)
procedure pcg(A, D_k, b̄, ε_cos, Δy)
1     Δy_0 := 0;
2     r_0 := b̄;
3     z_0 := M^{-1} r_0;
4     p_0 := z_0;
5     i := 0;
6     do stopping criterion not satisfied →
7         q_i := A D_k A^T p_i;
8         α_i := z_i^T r_i / p_i^T q_i;
9         Δy_{i+1} := Δy_i + α_i p_i;
10        r_{i+1} := r_i - α_i q_i;
11        z_{i+1} := M^{-1} r_{i+1};
12        β_i := z_{i+1}^T r_{i+1} / z_i^T r_i;
13        p_{i+1} := z_{i+1} + β_i p_i;
14        i := i + 1
15    od;
16    Δy := Δy_i
end pcg;

Figure 5: The preconditioned conjugate gradient algorithm

where M is a positive definite matrix and, in the case of the DAS algorithm, b̄ = b - A Z_k^2 D_k u and D_k = (Z_k^2 + S_k^2)^{-1} is a diagonal matrix of positive elements. The objective is to make the preconditioned matrix

        M^{-1} A D_k A^T                                      (33)

less ill-conditioned than A D_k A^T, and improve the convergence of the conjugate gradient algorithm.
The pseudo-code for the preconditioned conjugate gradient algorithm is
presented in Figure 5. The computationally intensive steps in the precondi-
tioned conjugate gradient algorithm are lines 3, 7 and 11 of the pseudo-code.
These lines correspond to a matrix-vector multiplication (7) and solving lin-
ear systems of equations (3 and 11). Line 3 is computed once and lines 7
and 11 are computed once every conjugate gradient iteration. The matrix-
vector multiplications are of the form A D_k A^T p_i, carried out without forming A D_k A^T explicitly. One way to compute the above matrix-vector multiplication is to decompose it into three sparse matrix-vector multiplications. Let

        w_1  =  A^T p_i   and   w_2  =  D_k w_1.

Then

        A D_k A^T p_i  =  A w_2.

The complexity of this matrix-vector multiplication is O(n), involving n additions, 2n subtractions and n floating point multiplications.
The preconditioned residual is computed in lines 3 and 11 when the system of linear equations

        M z  =  r                                             (34)

is solved, where M is a positive definite matrix. An efficient implementation requires a preconditioner that makes (34) easy to solve. On the other hand, one needs a preconditioner that makes (33) well conditioned. In the next subsection, we show several preconditioners that satisfy, to some extent, these two criteria.
To determine when the approximate direction Δy_i produced by the conjugate gradient algorithm is satisfactory, one can compute the angle θ between (A D_k A^T) Δy_i and b̄ and stop when |1 - cos θ| < ε_cos, where ε_cos is some small tolerance. In practice, one can initially use ε_cos = 10^{-3} and tighten the tolerance as the interior point iterations proceed, as ε_cos = ε_cos × 0.95. The exact computation of

        cos θ  =  | b̄^T (A D_k A^T) Δy_i |  /  ( ||b̄||_2 · ||(A D_k A^T) Δy_i||_2 )

has the complexity of one conjugate gradient iteration and is therefore expensive if computed at each conjugate gradient iteration. One way to proceed is to compute the cosine every l_cos conjugate gradient iterations. A more efficient procedure [116] follows from the observation that (A D_k A^T) Δy_i is approximately equal to b̄ - r_i, where r_i is the estimate of the residual at the i-th conjugate gradient iteration. Using this approximation, the cosine can be estimated by

        cos θ  ≈  | b̄^T (b̄ - r_i) |  /  ( ||b̄||_2 · ||b̄ - r_i||_2 ).

Since, in practice, the conjugate gradient method finds good directions in few iterations, this estimate has been shown to be effective and can be computed at each conjugate gradient iteration.
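
A compact runnable sketch of the procedure in Figure 5 with this residual-based cosine test (the helper names, the fixed iteration cap, and the omission of the tolerance-tightening schedule are ours):

    import numpy as np

    def pcg(A, d, rhs, apply_Minv, eps_cos=1e-3, max_iter=500):
        # Preconditioned conjugate gradients for (A D A^T) dy = rhs;
        # d is the diagonal of D and apply_Minv(r) applies M^{-1}.
        dy = np.zeros(len(rhs))
        r = rhs.copy()
        zv = apply_Minv(r)
        p = zv.copy()
        for _ in range(max_iter):
            q = A @ (d * (A.T @ p))            # A D A^T p, three sparse products
            alpha = (zv @ r) / (p @ q)
            dy += alpha * p
            zr_old = zv @ r
            r = r - alpha * q
            w = rhs - r                        # approximates (A D A^T) dy
            cos = abs(rhs @ w) / (np.linalg.norm(rhs) * np.linalg.norm(w))
            if abs(1.0 - cos) < eps_cos:
                break
            zv = apply_Minv(r)
            beta = (zv @ r) / zr_old
            p = zv + beta * p
        return dy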
4.3.3 Network preconditioners for conjugate gradient method


A useful preconditioner for the conjugate gradient algorithm must be one
that allows the efficient solution of (34), while at the same time causing
the number of conjugate gradient iterations to be small. Five precondi-
tioners have been found useful in conjugate gradient based interior point
network flow methods: diagonal, maximum weighted spanning tree, incom-
plete QR decomposition, the Karmarkar-Ramakrishnan preconditioner for
general linear programming, and the approximate Cholesky decomposition
preconditioner [93].
A diagonal matrix constitutes the most straightforward preconditioner used in conjunction with the conjugate gradient algorithm [45]. It is simple to compute, taking O(n) double precision operations, and can be effective [129, 131, 149]. In the diagonal preconditioner, M = diag(A D_k A^T), and the preconditioned residue systems of lines 3 and 11 of the conjugate gradient pseudo-code in Figure 5 can each be solved in O(m) double precision divisions.
A preconditioner that is observed to improve with the number of interior point iterations is the maximum weighted spanning tree preconditioner. Since the underlying graph G is not necessarily connected, one can identify a maximal forest using as weights the diagonal elements of the current scaling matrix,

        w  =  D_k e,                                          (35)

where e is the unit n-vector. In practice, Kruskal's and Prim's algorithms have been used to compute the maximal forest. Kruskal's algorithm, implemented with the data structures in [137], has been applied to arcs ordered approximately with a bucket sort [62, 130], or exactly using a hybrid QuickSort [64]. Prim's algorithm is implemented in [116] using the data structures presented in [4].
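
A maximal weighted forest can be computed with a Kruskal-style sweep; the sketch below (names ours) sorts arcs by decreasing weight and grows the forest with a union-find structure:

    def max_weight_forest(m, arcs, w):
        # Kruskal-style maximal weighted spanning forest with union-find;
        # w[k] is the weight of arc k (here, a diagonal element of the
        # current scaling matrix), m is the number of nodes.
        parent = list(range(m))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        forest = []
        for k in sorted(range(len(arcs)), key=lambda k: -w[k]):
            ri, rj = find(arcs[k][0]), find(arcs[k][1])
            if ri != rj:                       # arc joins two components
                parent[ri] = rj
                forest.append(k)
        return forest                          # arc indices t_1, ..., t_q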
At the k-th interior point iteration, let S_k be the submatrix of A with columns corresponding to arcs in the maximal forest, t_1, ..., t_q. The preconditioner can be written as

        M  =  S_k 𝒟_k S_k^T,

where, for example, in the DAS algorithm

        𝒟_k  =  diag( 1/((z_{t_1}^k)^2 + (s_{t_1}^k)^2), ..., 1/((z_{t_q}^k)^2 + (s_{t_q}^k)^2) ).

For simplicity of notation, we include in S_k the linearly dependent rows corresponding to the redundant flow conservation constraints. At each conjugate
gradient iteration, the preconditioned residue system

        S_k 𝒟_k S_k^T z  =  r                                (36)

is solved with the variables corresponding to redundant constraints set to zero. As with the diagonal preconditioner, (36) can be solved in O(m) time, as the system coefficient matrix can be ordered into a block triangular form.
Portugal et al. [116] introduced a preconditioner based on an incomplete QR decomposition (IQRD) for use in interior point methods to solve transportation problems. They showed empirically, for that class of problems, that this preconditioner mimics the diagonal preconditioner during the initial iterations of the interior point method, and the spanning tree preconditioner in the final interior point method iterations, while causing the conjugate gradient method to take fewer iterations than either method during the intermediate iterations. In [117], the use of this preconditioner is extended to general minimum cost network flow problems. In the following discussion, we omit the iteration index k from the notation for the sake of simplicity. Let T̄ = {1, 2, ..., n} \ T be the index set of the arcs not in the computed maximal spanning tree, and let

        D  =  [ D_T       ]
              [       D_T̄ ],

where D_T ∈ R^{q×q} is the diagonal matrix with the arc weights of the maximal spanning tree and D_T̄ ∈ R^{(n-q)×(n-q)} is the diagonal matrix with the weights of the arcs not in the maximal spanning tree. Then

        A D A^T  =  [ A_T  A_T̄ ] [ D_T       ] [ A_T^T  ]
                                 [       D_T̄ ] [ A_T̄^T ]

                 =  [ A_T D_T^{1/2}   A_T̄ D_T̄^{1/2} ] [ D_T^{1/2} A_T^T  ]
                                                       [ D_T̄^{1/2} A_T̄^T ].

The Cholesky factorization of A D A^T can be found by simply computing the QR factorization of

        Ā  =  [ D_T^{1/2} A_T^T  ]
              [ D_T̄^{1/2} A_T̄^T ].                           (37)
In fact, if Q Ā = R, then A D A^T = R^T R. The computation of the QR factorization is not recommended here, since besides being more expensive than a Cholesky factorization, it also destroys the sparsity of the matrix Ā. Instead, Portugal et al. [116] propose an incomplete QR decomposition of Ā. Applying Givens rotations [39] to Ā, using the diagonal elements of D_T^{1/2} A_T^T, the elements of D_T̄^{1/2} A_T̄^T become null. No fill-in is incurred in this factorization. See [116] for an example illustrating this procedure. After the factorization, we have the preconditioner

        M  =  F V F^T,

where F is a matrix with a diagonal of ones that can be reordered to triangular form, and V is a diagonal matrix with positive elements.

To avoid square root operations, V and F are obtained without explicitly computing V^{1/2} F^T. Suppose that the maximum spanning tree is rooted at node r, corresponding to the flow conservation equation that has been removed from the formulation. Furthermore, let A_T denote the subset of arcs belonging to the tree and let p_i represent the predecessor of node i in the tree. The procedure used to compute the nonzero elements of V and the off-diagonal nonzero elements of F is presented in the pseudo-code in Figure 6.

The computation of the preconditioned residual with F V F^T requires O(m) divisions, multiplications, and subtractions, since V is a diagonal matrix and F can be permuted into a triangular matrix with diagonal elements equal to one. The construction of F and V, that constitute the preconditioner, requires O(n) additions and O(m) divisions.
In practice, the diagonal preconditioner is effective during the initial iterations of the DAS algorithm. As the DAS iterations progress, the spanning tree preconditioner is more effective, as it becomes a better approximation of the matrix A D_k A^T. Arguments as to why this preconditioner is effective are given in [62, 116]. The DLNET implementation begins with the diagonal preconditioner and monitors the number of iterations required by the conjugate gradient algorithm. When the conjugate gradient takes more than β√m iterations, where β > 0, DLNET switches to the spanning tree preconditioner. Upper and lower limits to the number of DAS iterations using a diagonal preconditioned conjugate gradient are specified.
procedure iqrd(T, T̄, N, D, V, F)
1     do i ∈ N \ {r} →
2         j = p_i;
3         if (i,j) ∈ A_T → V_ii = D_ij fi;
4         if (j,i) ∈ A_T → V_ii = D_ji fi;
5     od;
6     do (i,j) ∈ A_T̄ →
7         if i ∈ N \ {r} → V_ii = V_ii + D_ij fi;
8         if j ∈ N \ {r} → V_jj = V_jj + D_ij fi;
9     od;
10    do i ∈ N \ {r} →
11        j = p_i;
12        if j ∈ N \ {r} →
13            if (i,j) ∈ A_T → F_ij = D_ij / V_ii fi;
14            if (j,i) ∈ A_T → F_ji = D_ji / V_ii fi;
15        fi;
16    od;
end iqrd;

Figure 6: Computing the F and V matrices in IQRD


In [93], a Cholesky decomposition of an approximation of the matrix A Θ A^T is proposed as preconditioner. This preconditioner has the form

        M  =  L L^T,                                          (38)

with L the lower triangular Cholesky factor of the matrix

        (39)

where B and N are such that A = [B N] with B a basis matrix, Θ_B and Θ_N are the diagonal submatrices of Θ corresponding to B and N, respectively, and ρ is a parameter.
Another preconditioner used in an interior point implementation is the
one for general linear programming, developed by Karmarkar and Ramakr-
ishnan and used in [79, 120]. This preconditioner is based on a dynamic
scheme to drop elements of the original scaled constraint matrix DA, as well as from the factors of the matrix A D A^T of the linear system, and
use the incomplete Cholesky factors as the preconditioner. Because of the
way elements are dropped, this preconditioner mimics the diagonal pre-
conditioner in the initial iterations and the tree preconditioner in the final
iterations of the interior point algorithm.

4.3.4 Identifying the optimal partition


One way to stop an interior point algorithm before the (fractional) interior
point iterates converge is to estimate (or guess) the optimal partition of arcs
at each iteration, attempt to recover the flow from the partition and, if a
feasible flow is produced, test if that flow is optimal. In the discussion that
follows we describe a strategy to partition the set of arcs in the dual affine
scaling algorithm. The discussion follows [128] closely, using a dual affine
scaling method for uncapacitated networks to illustrate the procedure.
Let A ∈ R^{m×n}, c, x, s ∈ R^n and b, y ∈ R^m. Consider the linear programming problem

        minimize   c^T x
        subject to Ax = b,  x ≥ 0                             (40)

and its dual

        maximize   b^T y
        subject to A^T y + s = c,  s ≥ 0.                     (41)
The dual affine scaling algorithm starts with an initial dual solution y^0 ∈ {y : s = c - A^T y > 0} and obtains iterate y^{k+1} from y^k according to y^{k+1} = y^k + α_k d_y^k, where the search direction is d_y^k = (A D_k^{-2} A^T)^{-1} b and D_k = diag(s_1^k, ..., s_n^k). A step moving a fraction γ of the way to the boundary of the feasible region is taken at each iteration, namely,

        α_k  =  γ × min{ -s_i^k / (d_s^k)_i  |  (d_s^k)_i < 0,  i = 1, ..., n },   (42)

where d_s^k = -A^T d_y^k is the displacement vector in the space of slack variables. At each iteration, a tentative primal solution is computed by

        x^k  =  D_k^{-2} A^T (A D_k^{-2} A^T)^{-1} b.

The set of optimal solutions is referred to as the optimal face. We use the index set N* for the always-active index set on the optimal face of the primal, and B* for its complement. It is well-known that B* is the always-active index set on the optimal face of the dual, and N* is its complement. An indicator is a quantity to detect whether an index belongs to N* or B*. We next describe three indicators that can be implemented in the DAS algorithm. For pointers to other indicators, see [33].
Under a very weak condition, the iterative sequence of the DAS algorithm converges to a relative interior point of a face on which the objective function is constant, i.e. the sequence {y^k} converges to an interior point of a face on which the objective function is constant. Let B be the always-active index set on the face and N be its complement, and let b^∞ be the limiting objective function value. There exists a constant c_0 > 0 such that

        lim sup_{k→∞}  s_i^k / (b^∞ - b^T y^k)  ≤  c_0        (43)

for all i ∈ B, while

        s_i^k / (b^∞ - b^T y^k)                               (44)

diverges to infinity for all i ∈ N. Denote by s^∞ the limiting slack vector. Then s_N^∞ > 0 and s_B^∞ = 0. The vector

        u^k  =  -D_k^{-1} d_s^k / (b^∞ - b^T y^k)  =  D_k x^k / (b^∞ - b^T y^k)   (45)

plays an important role, since

        lim_{k→∞} (u^k)^T e  =  lim_{k→∞} (s^k)^T x^k / (b^∞ - b^T y^k)  =  1.   (46)
Consequently, in the limit, b^∞ - b^T y^k can be estimated by (s^k)^T x^k asymptotically, and (43) can be stated as

        lim sup_{k→∞}  s_i^k / ((s^k)^T x^k)  ≤  c_0.

Then, if i ∈ B, for any β such that 0 < β < 1,

        lim sup_{k→∞}  s_i^k / ((s^k)^T x^k)^β  =  0,

since ((s^k)^T x^k)^β converges to zero at a slower rate than (s^k)^T x^k for any β such that 0 < β < 1. Therefore, if β = 1/2, the following indicator has the property that lim_{k→∞} N^k = N*.
Indicator 1: Let c_1 > 0 be any constant, and define

        N^k  =  { i :  s_i^k / ((s^k)^T x^k)^{1/2}  ≥  c_1 }.   (47)

This indicator is available under very weak assumptions, so it can be used to detect B* and N* without any substantial restriction on step-size. On the other hand, it gives the correct partition only if the limit point y^∞ happens to be a relative interior point of the optimal face of the dual and thus lacks a firm theoretical justification. However, since we know by experience that y^∞ usually lies in the relative interior of the optimal face, we may expect that it should work well in practice. Another potential problem with this indicator is that it is not scaling invariant, so that it will behave differently if the scaling of the problem is changed.
Now we assume that the step-size γ is asymptotically less than or equal to 2/3. Then the limiting point exists in the interior of the optimal face and b^∞ is the optimal value. Specifically, {y^k} converges to an interior point of the optimal face of the dual problem, {x^k} converges to the analytic center of the optimal face of the primal problem, and {b^T y^k} converges linearly to the optimal value b^∞ asymptotically, where the (asymptotic) reduction rate is exactly 1 - γ. Furthermore, one can show that

        lim_{k→∞} u_i^k  =  1/|B*|   for i ∈ B*,              (48)
        lim_{k→∞} u_i^k  =  0        otherwise.               (49)
The vector u^k is not available because the exact optimal value is unknown a priori, but b^∞ - b^T y^k can be estimated by (s^k)^T x^k to obtain

        lim_{k→∞}  s_i^k x_i^k / ((s^k)^T x^k)  =  1/|B*|   for i ∈ B*,   (50)
        lim_{k→∞}  s_i^k x_i^k / ((s^k)^T x^k)  =  0        otherwise.    (51)

On the basis of this fact, we have the following procedure to construct N^k, which asymptotically coincides with N*:

Indicator 2: Let d be a constant between 0 and 1. We obtain N^k according to the following procedure:

• Step 1: Sort y_i^k = s_i^k x_i^k / (s^k)^T x^k according to its order of magnitude. Denote by i_l the index of the l-th largest component.

• Step 2: For p := 1, 2, ..., compare y_{i_p}^k and d^p, and let p* be the first number such that y_{i_{p*}}^k ≤ d^{p*}. Then set

        N^k  =  { i_p :  p ≥ p* }.                            (52)
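
Under this reconstruction of (52), Indicator 2 admits a short sketch (names and conventions ours):

    import numpy as np

    def indicator2(s, x, d=0.5):
        # Indicator 2: sort y_i = s_i x_i / s^T x in decreasing order and
        # start the tentative N^k at the first p with y_{i_p} <= d^p.
        y = s * x / (s @ x)
        order = np.argsort(-y)                 # i_1, i_2, ... by decreasing y
        for p in range(1, len(y) + 1):
            if y[order[p - 1]] <= d ** p:
                return set(order[p - 1:].tolist())
        return set()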

To state the third, and most practical, indicator, let us turn our attention to the asymptotic behavior of s_i^{k+1} / s_i^k. If i ∈ N*, then s_i^k converges to a positive value, and hence

        lim_{k→∞}  s_i^{k+1} / s_i^k  =  1.                   (53)

If i ∈ B*, s_i^k converges to zero. Since

        lim_{k→∞}  s_i^k x_i^k / (b^∞ - b^T y^k)  =  1/|B*|,   (54)

x_i^k converges to a positive number, and the objective function reduces with a rate of 1 - γ, then

        lim_{k→∞}  s_i^{k+1} / s_i^k  =  1 - γ,               (55)

which leads to the following indicator:

Indicator 3: Take a constant η such that 1 - γ < η < 1. Then let

        N^k  =  { i :  s_i^{k+1} / s_i^k  ≥  η }              (56)
be defined as the index set. Then N^k = N* holds asymptotically.

Of the three indicators described here, Indicators 2 and 3 stand on the firmest theoretical basis. Furthermore, unlike Indicator 1, both are scaling invariant. The above discussion can be easily extended to the case of capacitated network flow problems. DLNET uses Indicator 3 to identify the set of active arcs defining the optimal face by examining the ratio between subsequent iterates of each dual slack. At the optimum, the flow on each arc can be classified as being at its upper bound, lower bound, or as active. From the discussion above, if the flow on arc i converges to its upper bound,

        lim_{k→∞} s_i^k / s_i^{k-1}  =  1 - γ   and   lim_{k→∞} z_i^k / z_i^{k-1}  =  1.

If the flow on arc i converges to its lower bound,

        lim_{k→∞} s_i^k / s_i^{k-1}  =  1   and   lim_{k→∞} z_i^k / z_i^{k-1}  =  1 - γ.

If the flow on arc i is active,

        lim_{k→∞} s_i^k / s_i^{k-1}  =  1 - γ   and   lim_{k→∞} z_i^k / z_i^{k-1}  =  1 - γ.

From a practical point of view, scale invariance is the most interesting feature of this indicator. An implementable version can use constants which depend only on the step size factor γ. Let κ_0 = 0.7 and κ_1 = 0.9. At each iteration of DLNET, the arcs are classified as follows:

• If s_i^k / s_i^{k-1} < κ_0 and z_i^k / z_i^{k-1} > κ_1, arc i is set to its upper bound.

• If s_i^k / s_i^{k-1} > κ_1 and z_i^k / z_i^{k-1} < κ_0, arc i is set to its lower bound.

• Otherwise, arc i is set active, defining the tentative optimal face.
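
A direct transcription of this classification rule (function name and array conventions ours; inputs are the current and previous dual slack iterates):

    def classify_arcs(s_prev, s, z_prev, z, k0=0.7, k1=0.9):
        # DLNET-style arc classification from dual slack ratios (sketch).
        upper, lower, active = [], [], []
        for i in range(len(s)):
            rs, rz = s[i] / s_prev[i], z[i] / z_prev[i]
            if rs < k0 and rz > k1:
                upper.append(i)       # flow tends to its upper bound
            elif rs > k1 and rz < k0:
                lower.append(i)       # flow tends to its lower bound
            else:
                active.append(i)      # tentative optimal face
        return upper, lower, active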

4.3.5 Recovering the optimal flow


The simplex method restricts the sequence of solutions it generates to nodes
of the linear programming polytope. Since the matrix A of the network
linear program is totally unimodular, when a simplex variant is applied to a
network flow problem with integer data, the optimal solution is also integer.
On the other hand, an interior point algorithm generates a sequence of
interior point (fractional) solutions. Unless the primal optimal solution is
unique, the primal solution that an interior point algorithm converges to
is not guaranteed to be integer. In an implementation of an interior point
network flow method, one would like to be capable of recovering an integer flow even when the problem has multiple optima. We discuss below the stopping strategies implemented in DLNET and used to recover an integer optimal solution.

Besides the indicator described in subsection 4.3.4, DLNET uses the arcs of the spanning forest of the tree preconditioner as an indicator. If there exists a unique optimal flow, this indicator correctly identifies an optimal primal basic sequence, and an integer flow can be easily recovered by solving a triangular system of linear equations. In general, however, the arc indices do not converge to a basic sequence. Let T = {t_1, ..., t_q} denote the set of arc indices in the spanning forest. To obtain a tentative primal basic solution, first set the flow on arcs not in the forest to either their upper or lower bound, i.e. for all i ∈ A \ T:

        x_i*  =  { 0      if s_i^k > z_i^k,
                 { u_i    otherwise,

where s^k and z^k are the current iterates of the dual slack vectors as defined in (31). The remaining basic arcs have flows that satisfy the linear system

        A_T x_T  =  b - Σ_{i∈Ω^-} u_i A_i,                    (57)

where Ω^- = { i ∈ A \ T :  s_i^k ≤ z_i^k }. Because A_T can be reordered in a triangular form, (57) can be solved in O(m) operations. If u_T ≥ x_T ≥ 0 then the primal solution is feasible and optimality can be tested.

Optimality can be verified by producing a dual feasible solution (y*, s*, z*) that is either complementary or that implies a duality gap less than 1. The first step to build a tentative optimal dual solution is to identify the set of dual constraints defining the supporting affine space of the dual face complementary to x*,

        F  =  { i ∈ T :  0 < x_i* < u_i },

i.e. the set of arcs with zero dual slacks. Since, in general, x* is not feasible, F is usually determined by the indicators of subsection 4.3.4, as the index-set of active arcs. To ensure a complementary primal-dual pair, the current dual interior vector y^k is projected orthogonally onto this affine space. The solution y* of the least squares problem

        min { ||y - y^k||_2  :  A_F^T y = c_F }               (58)
is the projected dual iterate.


Let G_F = (N, F) be the subgraph of G with F as its set of arcs. Since this subgraph is a forest, its incidence matrix, A_F, can be reordered into a block triangular form, with each block corresponding to a tree in the forest. Assume G_F has p components, with T_1, ..., T_p as the sets of arcs in each component tree. After reordering, the incidence matrix can be represented as

        A_F  =  diag( A_{T_1}, ..., A_{T_p} ).
The supporting affine space of the dual face can be expressed as the sum of orthogonal one-dimensional subspaces. The operation in (58) can be performed by computing the orthogonal projections onto each individual subspace independently, and therefore can be completed in O(m) time. For i = 1, ..., p, denote the number of arcs in T_i by m_i, and the set of nodes spanned by those arcs by N_i. A_{T_i} is an (m_i + 1) × m_i matrix and each subspace

        Ψ_i  =  { y_{N_i} ∈ R^{m_i+1} :  A_{T_i}^T y_{N_i} = c_{T_i} }

has dimension one. For all y_{N_i} ∈ Ψ_i,

        y_{N_i}  =  y_{N_i}^0 + α_i y_{N_i}^h,                (59)

where y_{N_i}^0 is a given solution in Ψ_i and y_{N_i}^h is a solution of the homogeneous system A_{T_i}^T y_{N_i} = 0. Since A_{T_i} is the incidence matrix of a tree, the unit vector is a homogeneous solution. The given solution y_{N_i}^0 can be computed by selecting v ∈ N_i, setting y_v^0 = 0, removing the row corresponding to node v from matrix A_{T_i} and solving the resulting triangular system

        Ā_{T_i}^T y_{N_i \ {v}}^0  =  c_{T_i}.

With the representation in (59), the orthogonal projection of y_{N_i}^k onto subspace Ψ_i is

        y_{N_i}*  =  y_{N_i}^0 + ( e_{N_i}^T (y_{N_i}^k - y_{N_i}^0) / (m_i + 1) ) e_{N_i},

where e is the unit vector. The orthogonal projection, as indicated in (58), is obtained by combining the projections onto each subspace,

        y*  =  ( y_{N_1}*, ..., y_{N_p}* ).
A feasible dual solution is built by computing the slacks as

        z_i*  =  { -d_i   if d_i < 0,          s_i*  =  { d_i   if d_i > 0,
                 {  0     otherwise,                    {  0    otherwise,

where d_i = c_i - A_i^T y*.

If the solution of (57) is feasible, optimality can be checked at this point, using the projected dual solution as a lower bound on the optimal flow. The primal and dual solutions, x* and (y*, s*, z*), are optimal if complementary slackness is satisfied, i.e. if for all i ∈ A \ T either s_i* > 0 and x_i* = 0 or z_i* > 0 and x_i* = u_i. Otherwise, the primal solution, x*, is still optimal if the duality gap is less than 1, i.e. if c^T x* - b^T y* + u^T z* < 1.
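
In code, the gap test is one line; the hypothetical helper below assumes numpy vectors for the data and the primal and dual iterates:

    import numpy as np

    def gap_certifies_optimality(c, x, b, y, u, z):
        # With integer data, a duality gap below 1 certifies optimality:
        # c^T x - (b^T y - u^T z) < 1.
        return c @ x - (b @ y - u @ z) < 1.0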
However, in general, the method proceeds attempting to find a feasible flow x* that is complementary to the projected dual solution y*. Based on the projected dual solution y*, a refined tentative optimal face is selected by redefining the set of active arcs as

        F̂  =  { i ∈ A :  c_i - A_i^T y*  =  0 }.

Next, the method attempts to build a primal feasible solution, x*, complementary to the tentative dual optimal solution by setting the inactive arcs to lower or upper bounds, i.e., for i ∈ A \ F̂,

        x_i*  =  { 0      if i ∈ Ω^+ = { i ∈ A \ F̂ :  c_i - A_i^T y* > 0 },
                 { u_i    if i ∈ Ω^- = { i ∈ A \ F̂ :  c_i - A_i^T y* < 0 }.

By considering only the active arcs, a restricted network is built, represented by the constraint set

        A_F̂ x_F̂  =  b̂  =  b - Σ_{i∈Ω^-} u_i A_i,             (60)

        0 ≤ x_F̂ ≤ u_F̂.                                       (61)

Clearly, from the flow balance constraints (60), if a feasible flow x_F̂ for the restricted network exists, it defines, along with x_{Ω^+} and x_{Ω^-}, a primal feasible solution complementary to y*. A feasible flow for the restricted network can be determined by solving a maximum flow problem on the augmented network defined by the underlying graph Ĝ = (N̂, Â), where

        N̂  =  {σ} ∪ {θ} ∪ N
and

        Â  =  Σ ∪ Θ ∪ F̂.

In addition, for each arc (i,j) ∈ F̂ there is an associated capacity u_ij. The additional arcs are such that

        Σ  =  { (σ, i) :  i ∈ N^+ },

with associated capacity b̂_i for each arc (σ, i), and

        Θ  =  { (i, θ) :  i ∈ N^- },

with associated capacity -b̂_i for each arc (i, θ), where N^+ = { i ∈ N : b̂_i > 0 } and N^- = { i ∈ N : b̂_i < 0 }. It can be shown that if M_{σ,θ} is the maximum flow value from σ to θ, and x̂ is a maximal flow on the augmented network, then M_{σ,θ} = Σ_{i∈N^+} b̂_i if and only if x_F̂ is a feasible flow for the restricted network. Therefore, finding a feasible flow for the restricted network involves the solution of a maximum flow problem. Furthermore, this feasible flow is integer, as we can select a maximum flow algorithm [4] that provides an integer solution.

5 Branch and bound methods


Branch and bound methods are exact algorithms for integer programming
problems - given enough time, they are guaranteed to find an optimal solu-
tion. If there is not enough time available to solve a given problem exactly,
a branch and bound algorithm can still be used to provide a bound on the
optimal value. These methods can be used in conjunction with a heuristic
algorithm such as local search, tabu search, simulated annealing, GRASP,
genetic algorithms, or more specialized algorithms, to give a good solution
to a problem, with a guarantee on the maximum possible improvement avail-
able over this good solution. Branch and bound algorithms work by solving
relaxations of the integer programming problem, and selectively partitioning
the feasible region to eventually find the optimal solution.

5.1 General concepts


Consider an integer programming problem of the form

        min   c^T x
        subject to   Ax ≤ b,  x ≥ 0,  x integer,
where A is an m × n matrix, c and x are n-vectors, and b is an m-vector. The linear programming relaxation (LP relaxation) of this problem is

        min   c^T x
        subject to   Ax ≤ b,  x ≥ 0.

If the optimal solution x* to the LP relaxation is integral then it solves the integer programming problem also. Generally, the optimal solution to the LP relaxation will not be an integral point. In this case, the value of the LP relaxation provides a lower bound on the optimal value of the integer program, and we attempt to improve the relaxation.
In a branch and bound method, the relaxation is improved by dividing
the relaxation into two subproblems, where one of the variables is restricted
to take certain values. For example, if xi = 0.4, we may set up one subprob-
lem where Xi must be zero and another subproblem where Xi is restricted
to take a value of at least one. We think of the subproblems as forming a
tree, rooted at the initial relaxation.
If the solution to the relaxation of one of the subproblems in the tree is
integral then it provides an upper bound on the optimal value of the complete
integer program. If the solution to the relaxation of another subproblem has
value larger than this upper bound, then that subproblem can be pruned,
as no feasible solution for it can be optimal for the complete problem. If
the relaxation of the subproblem is infeasible then the subproblem itself is
infeasible and can be pruned. The only other possibility at a node of the tree
is that the solution to the relaxation is fractional, with value less than that
of the best known integral solution. In this case, we further subdivide the
subproblem. There are many techniques available for choosing the branching
variable and for choosing the next subproblem to examine; for more details,
see, for example, Parker and Rardin [115].
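
The following generic skeleton sketches this process in a best-bound order; the function name and callback signatures are ours, and any LP solver (interior point or simplex) can play the role of solve_relax:

    import heapq

    def branch_and_bound(root, solve_relax, branch):
        # Generic best-bound branch and bound skeleton (minimization).
        # solve_relax(node) -> (bound, x, is_integral), x = None if infeasible;
        # branch(node, x) -> child subproblems for a fractional x.
        incumbent, best_x = float("inf"), None
        heap, tag = [(float("-inf"), 0, root)], 1    # (parent bound, tiebreak, node)
        while heap:
            parent_bound, _, node = heapq.heappop(heap)
            if parent_bound >= incumbent:
                continue                             # prune by bound
            bound, x, integral = solve_relax(node)
            if x is None or bound >= incumbent:
                continue                             # infeasible or dominated
            if integral:
                incumbent, best_x = bound, x         # new best integer solution
            else:
                for child in branch(node, x):        # subdivide fractional node
                    heapq.heappush(heap, (bound, tag, child))
                    tag += 1
        return incumbent, best_x

With an interior point solver, the early-termination devices discussed next let solve_relax return a usable bound well before the relaxation is solved to optimality.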
Interior point methods are good for linear programming problems with
a large number of variables, so they should also be useful for large integer
programming problems. Unfortunately, large integer programming problems
are often intractable for a general purpose method like branch and bound,
because the tree becomes prohibitively large. Branch and bound interior
point methods have proven successful for problems such as capacitated fa-
cility location problems [19, 25], where the integer variables correspond to
the decision as to whether to build a facility at a particular location, and
there are a large number of continuous variables corresponding to trans-
porting goods from the facilities to customers. For these problems, the LP
relaxations can be large even for instances with only a few integer variables.
As with interior point cutting plane methods (see section 6), the most
important technique for making an interior point branch and bound method
competitive is early termination. There are four possible outcomes at each
node of the branch and bound tree; for three of these, it suffices to solve
the relaxation approximately. The first outcome is that the relaxation has
value greater than the known upper bound on the optimal value, so the
node can be pruned by bounds. Usually, an interior point method will get
close to the optimal value quickly, so the possibility of pruning by bounds
can be detected early. The second possible outcome is that the relaxation
is infeasible. Even if this is not detected quickly, we can usually iterate
with a dual feasible algorithm (as with interior point cutting plane algo-
rithms), so if the dual value becomes larger than the known bound we can
prune. The third possible outcome is that the optimal solution to the re-
laxation is fractional. In this case, there are methods (including the Tapia
indicator [33] and other indicators discussed in section 4.3.4) for detecting
whether a variable is converging to a fractional value, and these can be used
before optimality is reached. The final possible outcome is that the optimal
solution to the relaxation is integral. In this situation, we can prune the
node, perhaps resulting in an improvement in the value of the best known
integral solution. Thus, we are able to prune in the only situation where it
is necessary to solve the relaxation to optimality.
If the optimal solution to the relaxation is fractional, then the subprob-
lem must be subdivided. The iterate for the parent problem will be dual
feasible but primal infeasible for the child problems. The solution process
can be restarted at these child problems either by using an infeasible inte-
rior point method or by using methods similar to those described for interior
point cutting plane methods in section 6. For very large or degenerate prob-
lems, the interior point method has proven superior to simplex even when
the interior point code is started from scratch at each node.
The first interior point branch and bound code was due to Borchers and
Mitchell [19]. This method was adapted by De Silva and Abramson [25]
specifically for facility location problems. Ramakrishnan et al. [119] have
developed a branch and bound algorithm for the quadratic assignment prob-
lem. The linear programming relaxations at the nodes of the tree for this
problem are so large that it was necessary to use an interior point method
to solve them. Lee and Mitchell have been developing a parallel interior
point branch and cut algorithm for mixed integer nonlinear programming
problems [85].

5.2 An example: The QAP


The quadratic assignment problem (QAP) can be stated as

        min_{p∈Π}  Σ_{i=1}^n Σ_{j=1}^n  a_ij b_{p(i)p(j)},

where Π is the set of all permutations of {1, 2, ..., n}, A = (a_ij) ∈ R^{n×n}, and B = (b_ij) ∈ R^{n×n}.
Resende, Ramakrishnan, and Drezner [127] consider the following linear program as a lower bound (see also [1]) to the optimal solution of a QAP:

        min   Σ_{i∈I} Σ_{r∈I} Σ_{j∈I, j>i} Σ_{s∈I, s≠r}  (a_ij b_rs + a_ji b_sr) y_irjs

subject to:

        Σ_{j∈I, j>i} y_irjs + Σ_{j∈I, j<i} y_jsir  =  x_ir,    i ∈ I, r ∈ I, s ∈ I (s ≠ r),

        Σ_{s∈I, s≠r} y_irjs  =  x_ir,    i ∈ I, r ∈ I, j ∈ I (j > i),

        Σ_{s∈I, s≠r} y_jsir  =  x_ir,    i ∈ I, r ∈ I, j ∈ I (j < i),

        Σ_{i∈I} x_ir  =  1,    r ∈ I,

        Σ_{r∈I} x_ir  =  1,    i ∈ I,

        0 ≤ x_ir ≤ 1,    i ∈ I, r ∈ I,

        0 ≤ y_irjs ≤ 1,    i ∈ I, r ∈ I, j ∈ I, s ∈ I,

where the set I = {1, 2, ..., n}. This linear program has n^2 (n-1)^2 / 2 + n^2 variables and 2 n^2 (n-1) + 2n constraints. Table 6 shows the dimension of these linear programs for several values of n.
Table 6: Dimension of lower bound linear programs

n constraints variables
2 12 6
3 42 27
4 104 88
5 210 225
6 372 486
7 602 931
8 912 1632
9 1314 2673
10 1820 4150
11 2442 6171
12 3192 8856
13 4082 12337
14 5124 16758
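As a quick sanity check (not from the chapter), the two dimension formulas can be evaluated directly and compared against Table 6:

def lp_dimensions(n):
    # Dimension formulas for the QAP lower-bound LP quoted above.
    variables = n**2 * (n - 1)**2 // 2 + n**2
    constraints = 2 * n**2 * (n - 1) + 2 * n
    return constraints, variables

for n in range(2, 15):
    print(n, *lp_dimensions(n))   # e.g. n=4 gives (104, 88), n=10 gives (1820, 4150)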

The linear programs were solved with ADP [79], a dual interior point
algorithm (see Subsection 2.2). The solver produces a sequence of lower
bounds (dual interior solutions), each of which can be compared with the
best upper bound to decide if pruning of the search tree can be done at
the node on which the lower bound is computed. Figure 7 illustrates the
sequence of lower bounds produced by ADP, compared to the sequence of
feasible primal solutions produced by the primal simplex code of CPLEX on
QAPLIB test problem nug15. The figure suggests that the algorithm can be
stopped many iterations prior to convergence to the optimal value and still
be close in value to the optimal solution. This is important in branch and
bound codes, where the lower bound needed to prune the search tree is often
less than the optimal value of the relaxation, so the node can be pruned
before the relaxation is solved to optimality.
Pardalos, Ramakrishnan, Resende, and Li [110] describe a branch and
bound algorithm used to study the effectiveness of a variance reduction
based lower bound proposed by Li, Pardalos, Ramakrishnan, and Resende
[87]. This branch and bound algorithm is used by Ramakrishnan, Pardalos,
and Resende [121] in conjunction with the LP-based lower bound described
earlier.
In the first step, an initial upper bound is computed and an initial
branch-and-bound search tree is set up. The branch and bound tree is a
Figure 7: CPLEX simplex and ADP iterates on nug15 (objective value plotted
against time in seconds on a logarithmic scale, comparing the ADP lower
bounds, the simplex primal values, and the optimal QAP bound)



binary tree, each node having a left and right child. For the purpose of
describing the branching process, denote, at any node of the branch and
bound tree, S_A to be the set of already assigned facilities in the partial as-
signment and S_E the facilities that will never be in the partial assignment in
any node of the subtree rooted at the current node. Let S_A^l, S_E^l and S_A^r, S_E^r
be the corresponding sets for the left and right children of the current node.
Let q denote the partial assignment at the current node. Each node of the
branch and bound tree is organized as a heap with a key that is equal to the
lower bound on the solution to the original QAP obtainable by any node in
the subtree rooted at this node. The binary tree is organized in maximum
order, i.e. the node with the largest lower bound is first.
The initial best known upper bound is computed by the GRASP heuristic
described in [88, 126]. The initial search tree consists of n nodes with S_A =
{i} and S_E = ∅ for i = 1, ..., n, with q(i) = p(i), where p is the permutation
obtained by the GRASP, q(k) = 0 for k ≠ i, and a key of 0.
In the second step, the four procedures of the branch-and-bound as de-
scribed earlier are:
• Selection: The selection procedure simply chooses the partial permutation
stored in the root of the heap, i.e. we pick the node with the maximum
key.
• Branching: The branching procedure creates two children, the left and
right children, as follows:
pick i ∉ S_A
S_A^l = S_A
S_E^l = S_E ∪ {i}
S_A^r = S_A ∪ {i}
S_E^r = ∅
q^l = q
q^r = q and q^r(i) = p(i), where p is the incumbent,
and the key of the left child is the same as the key of the current node and
the key of the right child is the newly computed lower bound.
• Elimination: The elimination procedure compares the newly computed
lower bound of the right child to the incumbent and deletes the right
child if its key is greater than the incumbent, thus pruning the entire
subtree rooted at the right child.

Table 7: Branch and bound algorithm on nug05

node UB LB permutation
1 52 58 1 - - - -
2 52 55 2 - - - -
3 52 52 5 - - - -
4 52 57 3 - - - -
5 52 50 4 - - - -
6 52 57 4 3 - - -
7 52 50 4 5 - - -
8 52 56 4 5 3 - -
9 52 56 4 5 2 - -
10 52 50 4 5 1 - -
11 52 60 4 5 1 3 -
12 52 50 4 5 1 2 -
50 - 4 5 1 2 3
13 50 56 4 2 - - -
14 50 50 4 1 - - -

• Termination Test: The algorithm stops if, and only if, the heap is empty.

In the final step, a best permutation found is taken as the global optimal
permutation.
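The steps above can be summarized in a short Python skeleton; this is an illustration under stated assumptions, not the authors' implementation. Here lower_bound is a hypothetical oracle standing in for the LP- or GLB-based bounding routine, and incumbent updates at complete assignments are omitted for brevity.

import heapq, itertools

def qap_bnb(n, p, lower_bound, incumbent_cost):
    # Skeleton of the heap-ordered branch and bound described above.
    # p is the GRASP permutation; lower_bound(S_A, q) bounds a partial assignment.
    tie = itertools.count()                     # tiebreaker so sets are never compared
    heap = []                                   # max-heap simulated with negated keys
    for i in range(n):                          # initial tree: S_A = {i}, S_E = empty
        heapq.heappush(heap, (0.0, next(tie), {i}, set(), {i: p[i]}))   # key 0
    best = incumbent_cost
    while heap:                                 # termination test: heap empty
        negkey, _, S_A, S_E, q = heapq.heappop(heap)    # selection: maximum key
        free = set(range(n)) - S_A - S_E
        if -negkey >= best or not free:
            continue
        i = min(free)                           # branching: pick i not in S_A
        # left child: same key, facility i excluded from the partial assignment
        heapq.heappush(heap, (negkey, next(tie), set(S_A), S_E | {i}, dict(q)))
        # right child: assign i as in the incumbent and recompute the bound
        qr = dict(q); qr[i] = p[i]
        lb = lower_bound(S_A | {i}, qr)
        if lb < best:                           # elimination: prune if lb >= incumbent
            heapq.heappush(heap, (-lb, next(tie), S_A | {i}, set(), qr))
    return best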
As an example of the branch and bound algorithm, consider the QAPLIB
instance nug05. The iterations of the branch and bound algorithm are sum-
marized in Table 7. The GRASP approximation algorithm produced a so-
lution (UB) having cost 52. The branch and bound algorithm examined 14
nodes of the search tree. In the first five nodes, each facility was fixed to lo-
cation 1 and the lower bounds of each branch computed. The lower bounds
corresponding to branches rooted at nodes 1 through 4 were all greater than
or equal to the upper bound, and thus those branches of the tree could
be pruned. At node 6 a level-2 branching begins; a lower bound less than
the upper bound is produced at node 7. Deeper branchings are done at
nodes 8, 11, and 12, at which point a new upper bound is computed having
value 50. Nodes 13 and 14 complete the search. The same branch and bound
algorithm using the Gilmore-Lawler bound (GLB) scans 44 nodes of the tree to prove optimality.
We tested the codes on several instances from the QAP library QAPLIB.

Table 8: QAP test instances: LP-based vs. GLB-based B&B algorithms


                  LP-based B&B     GLB-based B&B      time      nodes
problem   dim    nodes     time     nodes      time   ratio     ratio
nug05       5       12     11.7        44       0.1   117.0       3.7
nug06       6        6      9.5        82       0.1    95.0      13.7
nug07       7        7     16.6       115       0.1   166.0      16.4
nug08       8        8     35.1       895       0.2   175.5     111.9
nug12      12      220   5238.2     49063      14.6   358.8     223.0
nug15      15     1195  87085.7   1794507     912.4    95.4    1501.7
scr10      10       19    202.1      1494       0.6   336.8      78.6
scr12      12      252   5118.7     12918       4.8  1066.4      51.3
scr15      15      228   3043.3    506360     274.7    11.1    2220.9
rou10      10       52    275.7      2683       0.8   344.6      51.6
rou12      12      152   2715.9     37982      12.3   220.8     249.9
rou15      15      991  30811.7   4846805    2240.3    13.8    4890.8
esc08a      8        8     37.4     57464       7.0     5.3    7183.0
esc08b      8      208    491.1      7352       0.7   701.6      35.3
esc08c      8        8     42.7      2552       0.3   142.3     319.0
esc08d      8        8     38.1      2216       0.3   127.0     277.0
esc08e      8       64    251.0     10376       1.0   251.0     162.1
esc08f      8        8     37.6      1520       0.3   125.3     190.0
chr12a     12       12    312.0       672       0.7   445.7      56.0
chr12b     12       12    289.4       318       0.6   482.3      26.5
chr12c     12       12    386.1      3214       1.5   257.4     267.8
chr15a     15       15   1495.9    413825     235.5     6.4   27588.3
chr15b     15       15   1831.9    396255     217.8     8.4   26417.0
chr15c     15       15   1908.5    428722     240.0     8.0   28581.5
chr18a     18       35   1600.0  >1.6×10⁹     >10⁶  <1/648.0  >45×10⁶

Table 8 summarizes the runs on both algorithms. For each instance it dis-
plays the name and dimension of the problem, as well as the solution times
and number of branch and bound search tree nodes examined by each of the
algorithms. The ratio of CPU times is also displayed.
The number of GRASP iterations was set to 100,000 for all runs.
Table 9 shows statistics for the LP-based algorithm. For each run, the
table lists the number of nodes examined, the number of nodes on which
the lower bound obtained was greater than the best upper bound at that
moment, the number of nodes on which the lower bound obtained was less
than or equal to the best upper bound at that moment, and the percentage
of nodes examined that were of levels 1, 2, 3, 4, and 5 or greater.

6 Branch and cut methods


For some problems, branch and bound algorithms can be improved by re-
fining the relaxations solved at each node of the tree, so that the relaxation
becomes a better and better approximation to the set of integral feasible
solutions. In a general branch and cut method, many linear programming
relaxations are solved at each node of the tree. Like branch and bound, a
branch and cut method is an exact algorithm for an integer programming
problem.
In a cutting plane method, extra constraints are added to the relaxation.
These extra constraints are satisfied by all feasible solutions to the integer
programming problem, but they are violated by the optimal solution to the
LP relaxation, so we call them cutting planes. As the name suggests, a
branch and cut method combines a cutting plane approach with a branch
and bound method, attacking the subproblems at the nodes of the tree using
a cutting plane method until it appears that no further progress can be made
in a reasonable amount of time.
Consider, for example, the integer programming problem

min   −2x₁ − x₂
s.t.   x₁ + 2x₂ ≤ 7
       2x₁ − x₂ ≤ 3
       x₁, x₂ ≥ 0, integer.

This problem is illustrated in figure 8. The feasible integer points are indi-
cated. The LP relaxation is obtained by ignoring the integrality restrictions;
this is given by the polyhedron contained in the solid lines. The boundary

Table 9: QAP test instances: B&B tree search


          nodes of B&B tree       percentage of nodes of level
problem   scan    good    bad       1      2      3      4     ≥5
nug05       14      10      4    35.7   28.6   21.4   14.3    0.0
nug06        6       6      0   100.0    0.0    0.0    0.0    0.0
nug07        7       7      0   100.0    0.0    0.0    0.0    0.0
nug08        8       8      0   100.0    0.0    0.0    0.0    0.0
nug12      220     200     20     5.5   45.0   45.5    4.1    0.0
nug15     1195    1103     92     1.3   17.6   56.6   21.1    3.5
scr10       19      18      1    52.6   47.4    0.0    0.0    0.0
scr12      252     228     24     4.8   43.7   23.8   21.4    6.3
scr15      228     211     17     6.6   49.1   11.4   10.5   22.4
rou10       54      46      8    18.5   16.7   14.8   13.0   37.0
rou12      154     137     17     7.8   57.1    6.5    5.8   22.7
rou15      991     912     79     1.5   21.2   69.5    1.2    6.6
esc08a       8       8      0   100.0    0.0    0.0    0.0    0.0
esc08b     208     176     32     3.8   26.9   69.2    0.0    0.0
esc08c       8       8      0   100.0    0.0    0.0    0.0    0.0
esc08d       8       8      0   100.0    0.0    0.0    0.0    0.0
esc08e      64      56      8    12.5   87.5    0.0    0.0    0.0
esc08f       8       8      0   100.0    0.0    0.0    0.0    0.0
chr12a      12      12      0   100.0    0.0    0.0    0.0    0.0
chr12b      12      12      0   100.0    0.0    0.0    0.0    0.0
chr12c      12      12      0   100.0    0.0    0.0    0.0    0.0
chr15a      15      15      0   100.0    0.0    0.0    0.0    0.0
chr15b      15      15      0   100.0    0.0    0.0    0.0    0.0
chr15c      15      15      0   100.0    0.0    0.0    0.0    0.0
chr18a      35      17     18    51.4   48.6    0.0    0.0    0.0

Figure 8: A cutting plane example

of the convex hull of the feasible integer points is indicated by dashed lines
and can be described by the inequalities

x₁ − x₂ ≤ 1
x₁ ≤ 2
x₁ + x₂ ≤ 4
x₂ ≤ 3
x₁, x₂ ≥ 0.
When solving this problem using a cutting plane algorithm, the linear
programming relaxation is first solved, giving the point x₁ = 2.6, x₂ = 2.2,
which has value −7.4. The inequalities x₁ + x₂ ≤ 4 and x₁ ≤ 2 are satisfied by
all the feasible integer points but they are violated by the point (2.6, 2.2).
Thus, these two inequalities are valid cutting planes. Adding these two

inequalities to the relaxation and solving again gives the point x₁ = 2,
x₂ = 2, with value −6. Notice that this point is feasible in the original
integer program, so it must actually be optimal for that problem, since it is
optimal for a relaxation of the integer program.
If instead of adding both inequalities, we had just added the inequality
x₁ ≤ 2, the optimal solution to the new relaxation would have been x₁ = 2,
x₂ = 2.5, with value −6.5. We could then have looked for a cutting plane
that separates this point from the convex hull, for example x₁ + x₂ ≤ 4,
added this to the relaxation and solved the new relaxation. This illustrates
the basic structure of a cutting plane algorithm (a code sketch follows the
list):
• Solve the linear programming relaxation.
• If the solution to the relaxation is feasible in the integer programming
problem, STOP with optimality.
• Else, find one or more cutting planes that separate the optimal solution
to the relaxation from the convex hull of feasible integral points, and add
a subset of these constraints to the relaxation.

• Return to the first step.
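A minimal rendering of this loop, with solve_lp, separate, and is_integral as hypothetical problem-specific callbacks:

def cutting_plane(solve_lp, separate, is_integral):
    # Generic form of the loop above.  solve_lp(cuts) returns an optimal
    # solution and value of the current relaxation; separate(x) returns
    # violated valid inequalities (empty when x lies in the convex hull).
    # All three callbacks are hypothetical problem-specific routines.
    cuts = []
    while True:
        x, value = solve_lp(cuts)       # value is a lower bound on the IP optimum
        if is_integral(x):
            return x, value             # feasible for the IP, hence optimal
        violated = separate(x)
        if not violated:                # no cut found: fall back on branching
            return None, value
        cuts.extend(violated)           # add (a subset of) the cutting planes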


Notice that the values of the relaxations provide lower bounds on the
optimal value of the integer program. These lower bounds can be used
to measure progress towards optimality, and to give performance guaran-
tees on integral solutions. None of the constraints in the description of the
convex hull given in the example above can be omitted; such inequalities are
called facets of the convex hull. Cutting planes that define facets are the strongest possible cutting
planes, and they should be added to the relaxation in preference to non-facet
defining inequalities, if possible. Families of facet defining inequalities are
known for many classes of integer programming problems (for example, the
traveling salesman problem [47, 108], the matching problem [32], the linear
ordering problem [48], and the maximum cut problem [27, 28]). Junger et
al. [63] contains a survey of cutting plane methods for various integer pro-
gramming problems. Nemhauser and Wolsey [105] gives more background
on cutting plane methods for integer programming problems.
Traditionally, Gomory cutting planes [46] were used to improve the re-
laxation. These cuts are formed from the optimal tableau for the LP relax-
ation of the integer program. Cutting plane methods fell out of favour for
many years because algorithms using Gomory cuts showed slow convergence.
The resurgence of interest in these methods is due to the use of specialized

methods that search for facets, enabling the algorithm to converge far more
rapidly. The cutting planes are determined using a separation routine, which
is usually very problem specific. General integer programming problems
have been solved by using cutting planes based on facets of the knapsack
problem min{c^T x : a^T x ≥ b, x ≥ 0, x integer}: each constraint of the general
problem can be treated as a knapsack constraint [59]. Other general cut-
ting plane techniques include lift-and-project methods [10]. Gomory cutting
planes have also been the subject of a recent investigation [11]. It appears
that they are not as bad as originally thought, and they in fact work quite
well if certain modifications are made, such as adding many constraints at
once.
The separation problem for the problem min{c^T x : Ax ≥ b, x integer}
can be defined as:

Given a point x̄, either determine that x̄ is in the convex hull Q of the
feasible integer points, or find a cutting plane that separates x̄ from the
convex hull.

Grotschel et al. [49] used the ellipsoid algorithm to show that if the separa-
tion problem can be solved in polynomial time then the problem (IP) itself
can also be solved in polynomial time. It follows that the separation prob-
lem for an N P-hard problem cannot be solved in polynomial time, unless
P = NP. Many of the separation routines in the literature are heuristics
designed to find cutting planes belonging to certain families of facets; there
are many undiscovered families of facets, and, for N P-hard problems, it is
unlikely that a complete description of the facets of the convex hull will
be discovered. Such a description would certainly contain an exponential
number of facets (provided P ≠ NP). Even small problems can have many
facets. For example, the convex hull of the travelling salesman problem with
only nine cities has over 42 million facets [21].

6.1 Interior point cutting plane methods


We now assume that our integer programming problem takes the form

min   c^T x
subject to   Ax = b
             0 ≤ x ≤ u                                    (IP)
             x_i binary for i ∈ I
             x satisfies some additional conditions

where A is an m × n matrix of rank m, c, u, and x are n-vectors, b is an
m-vector, and I is a subset of {1, ..., n}. We assume that these additional
conditions can be represented by linear constraints, perhaps by an expo-
nential number of such constraints. For example, the traveling salesman
problem can be represented in this form, with the additional conditions be-
ing the subtour elimination constraints [47, 108], and the conditions Ax = b
representing the degree constraints that the tour must enter and leave each
vertex exactly once. It is also possible that the problem does not need any
such additional conditions. Of course, problems with inequality constraints
can be written in this form by including slack variables. Note that we allow
a mixture of integer and continuous variables. In this section, we describe
cutting plane methods to solve (IP) where the LP relaxations are solved
using interior point methods. Computational experience with interior point
cutting plane methods is described in [99, 96, 101]. Previous surveys on
interior point cutting plane methods include [94, 95].
It has been observed that interior point algorithms do not work very well
when started from close to a nonoptimal extreme point. Of course, this is
exactly what we will have to do if we solve the LP relaxation to optimality,
since the fractional optimal solution to the relaxation will be a nonoptimal
infeasible extreme point after adding a cutting plane. The principal method
used to overcome this drawback is to only solve the relaxation approximately.
We use this approximate solution to generate an integral feasible point that
is, with luck, close to the optimal integral solution. The best integral solution
found so far gives an upper bound on the optimal value of (IP) and the value
of the dual solution gives a lower bound. A conceptual interior point cutting
plane algorithm is given in Figure 9.
To make this algorithm practical, we have to decide how accurately to
solve the relaxations. Notice that if the entries in c are integral then it
is sufficient to reduce the gap between the integral solution and the lower
bound to be less than one. Other refinements include methods for choosing
which cuts to add, generating good integral solutions, dropping unimportant
constraints, and fixing variables at their bounds. We discuss all of these
issues, and conclude by presenting the complete algorithm, and some typical
computational results.
In what follows, we refer several times to the linear ordering problem
and to finding the ground state of an Ising spin glass with no external force,
which we call the Ising spin glass problem. We now define those problems.

The linear ordering problem:



1. Solve the current relaxation of (I P) approximately using


an interior point method.
2. Generate an integral feasible solution from the approximate
primal solution.
3. If the gap between the best integral solution found so far
and the best lower bound provided by a dual solution is
sufficiently small, STOP with an optimal solution to the
original problem.
4. Otherwise, use a separation routine to generate cutting
planes, add these constraints to the LP relaxation, and
return to Step 1.

Figure 9: A conceptual interior point cutting plane algorithm

Given p sectors with costs g_{ij} for placing sector i before sector j for each
pair i and j of sectors, find a permutation of the sectors with minimum
total cost.

This problem can be represented algebraically as follows:

min   Σ_{1≤i,j≤p, i≠j} g_{ij} x_{ij}
subject to   x_{ij} + x_{ji} = 1   for all pairs i and j
             x binary
             x satisfies the triangle inequalities,

where the triangle inequalities require that

x_{ij} + x_{jk} + x_{ki} ≤ 2

for each triple (i, j, k). When this problem is solved using a cutting plane
approach, the triangle inequalities are used as cuts. They define facets of
the convex hull of feasible solutions. Other facets are known (see Grotschel
et al. [48]), but these prove to be unnecessary for many problems.
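Since there are only O(p³) triangle inequalities, separation can be done by complete enumeration; a small sketch (not the chapter's code), where x maps ordered pairs to the current fractional values:

from itertools import combinations

def separate_triangle(x, p, tol=1e-6):
    # Enumerate all triples and return the triangle inequalities
    # x_ij + x_jk + x_ki <= 2 violated by the fractional point x,
    # given as a dict over ordered pairs.  A sketch, not the chapter's code.
    cuts = []
    for i, j, k in combinations(range(p), 3):
        for a, b, c in ((i, j, k), (i, k, j)):        # the two directed 3-cycles
            if x[a, b] + x[b, c] + x[c, a] > 2 + tol:
                cuts.append((a, b, c))
    return cuts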

The Ising spin glass problem:


Given a grid of points on a torus, and given interaction forces Cij be-
tween each point and each of its neighbours, partition the vertices into

two sets to minimize the total cost, where the total cost is the sum of
all interaction forces between vertices that are in different sets.

The physical interpretation of this problem is that each point possesses either
a positive or a negative charge, and the interaction force will be either +1 or
-1 depending on the charges on the neighbours. The interactions between
the points can be measured, but the charges at the points cannot and need
to be determined. The Ising spin glass problem is a special case of the
maximum cut problem:

Given a graph G = (V, E) and edge weights w_{ij}, partition the vertices
into two sets to maximize the value of the cut, that is, the sum of the
weights of edges where the two ends of the edge are in opposite sets of
the partition.
This problem can be represented as an integer program, where x_e indicates
whether e is in the cut:

min   c^T x
subject to   x binary
             x satisfies the cycle/cut inequalities.

The cycle/cut inequalities exploit the fact that every cycle and every cut
intersect in an even number of edges. They can be stated as

x(F) − x(C \ F) ≤ |F| − 1

for subsets F ⊆ C of odd cardinality, where C is a cycle in the graph, and
x(S) := Σ_{e∈S} x_e for any subset S of the edges. An inequality of this form
defines a facet if the cycle C is chordless.

6.2 Solving the relaxations approximately


The principal technique used to make an interior point cutting plane algo-
rithm practical is early termination: the current relaxation is only solved
approximately. Typically, the relaxations are solved more exactly as the
algorithm proceeds.
There are two main, related, advantages to early termination. In the first
place, iterations are saved on the current relaxation and the early solution
is usually good enough to enable the efficient detection of cutting planes,
so solving the relaxation to optimality would not provide any additional

information but would require additional computational effort. Secondly,


the approximate solution provides a better starting point for the method on
the next relaxation, because it is more centered than the optimal solution.
The disadvantages result from the fact that the solution to the current
relaxation may be the optimal solution to the integer program. It is possible
that the approximate solution is not in the convex hull of feasible integral
solutions, even though the optimal solution is in this set, and so cutting
planes may be generated and the relaxation may be modified unnecessarily.
On the other hand, if the approximate solution is in the convex hull, the
separation routines will not find cutting planes, but time will be wasted in
trying to find cuts. The effect of the first drawback can be mitigated by
initializing the relaxation with a point that is not too far from the center of
the convex hull, and by solving the relaxation to optimality occasionally, for
example on every tenth relaxation. This last technique proved to be very
useful in the experiments on Ising spin glass problems described in [96].
One way to reduce the cost of the first drawback is to control how accu-
rately the relaxations are solved by using a dynamically adjusted tolerance
for the duality gap: one searches for cutting planes once the duality gap
falls below this tolerance. If many cutting planes are found, then perhaps
one did not need to solve the current relaxation so accurately, so one can
increase this tolerance. On the other hand, if only a few cutting planes are
found then the tolerance should be decreased. In most of those experiments,
the tolerance was initialized with a value of 0.3 on the relative duality gap
and then was modified by multiplying by a power of 1.1, with the power
depending on the number of cuts found and on how badly these cuts were
violated by the current iterate.
Other ways to control the accuracy include requiring that the current
primal solution should have better value than the best known integral solu-
tion, and that the dual solution should be better than the best known lower
bound. Perhaps surprisingly, it was observed that the condition based on
the dual value is in general too restrictive, forcing the algorithm to perform
more iterations than necessary on the current relaxation without resulting
in a reduction in the total number of relaxations solved. Mitchell has even
had mixed results with the condition on the primal solution: for the linear
ordering problem, this condition resulted in an increase in computational
times, but it improved runtimes for Ising spin glass problems. (The Ising
spin glass problems are harder than the linear ordering problems.)
A more sophisticated condition is to require that the relaxations be
solved to an accuracy such that it appears that the optimal value to the

relaxation would not be sufficiently good to enable the algorithm to termi-


nate, unless it provided an optimal integral solution. For example, one can
require that the average of the primal and dual values should be at least
one less than the best known integral value, if all the data is integral. If
one solves such a relaxation to optimality, the lower bound would not be
sufficient to prove optimality with the current best known integral solution.
A similar condition was helpful for the Ising spin glass problems.

6.3 Restarting
When cutting planes are added, the current primal iterate is no longer fea-
sible, and the algorithm must be restarted. It is possible to restart from the
current iterate using a primal-dual infeasible interior point method, perhaps
with an initial centering step, but it has been observed that other techniques
have proved superior in practice.
After adding cutting planes, the primal relaxation becomes

min   c^T x
subject to   Ax = b
             A^0 x + x^0 = b^0                            (LPnew)
             0 ≤ x ≤ u
             0 ≤ x^0 ≤ u^0,

where x^0 is the vector of slack variables for the cutting planes given by
A^0 x ≤ b^0. The dual (LDnew) to this problem can be written as

max   b^T y + b^{0T} y^0 − u^T w − u^{0T} w^0

subject to

A^T y + A^{0T} y^0 + z − w = c
y^0 + z^0 − w^0 = 0
z, z^0, w, w^0 ≥ 0.

Since one uses an interior point method, and did not solve the last relaxation
to optimality, the last iterate is a primal dual pair x̄, (ȳ, z̄, w̄) satisfying
Ax̄ = b, 0 < x̄ < u, A^T ȳ + z̄ − w̄ = c, z̄ > 0, w̄ > 0.
A feasible interior solution to (LDnew) is obtained by setting y^0 = 0 and
z^0 = w^0 = ε for any positive ε (a typical value is 10^{-3}). It is often beneficial
to update the dual solution to an older iterate than (ȳ, z̄, w̄), which will be
more centered. It is also useful to increase any small components of w̄ or z̄
up to ε if necessary; if w̄_i is increased, then z̄_i is also increased to maintain
dual feasibility, and vice versa (a sketch of this update follows).
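A sketch of that dual restart, assuming dense numpy vectors; the pairing of the z and w increases is what preserves A^T y + z − w = c. This is an illustration of the update described above, not the authors' implementation.

import numpy as np

def restart_dual(y, z, w, n_cuts, eps=1e-3):
    # Extend the last (more centered) dual iterate to a strictly feasible point
    # of (LDnew): y0 = 0 and z0 = w0 = eps for the new rows, and any small
    # component of z or w pushed up in pairs so that A^T y + z - w = c is kept.
    y0 = np.zeros(n_cuts)
    z0 = np.full(n_cuts, eps)
    w0 = np.full(n_cuts, eps)                     # y0 + z0 - w0 = 0 holds
    bump = np.maximum(eps - np.minimum(z, w), 0.0)
    return y, z + bump, w + bump, y0, z0, w0      # z_i and w_i raised together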

Primal feasibility is harder to maintain, since A^0 x̄ > b^0. Possible updating


schemes are based upon knowing a point x^Q in the interior of the convex hull
of feasible integer points. Of course, such a point will be an interior point in
(LPnew). One can either update to this point, or to an appropriate convex
combination of this point and x̄. It is often straightforward to initialize x^Q:
for the linear ordering problem and for the maximum cut problem, one can
take x^Q to be the vector of halves; for the undirected traveling salesman
problem on a complete graph with n cities, one can take each component of
x^Q to be 2/(n − 1) (each component corresponds to an edge; an edge e is
in the tour if and only if x_e = 1). The point x^Q can be updated by moving
towards the current primal iterate or by moving towards the best integral
solution found so far. For the Ising spin glass problem, Mitchell found it
best not to update x^Q, but to restart by taking a convex combination of
x^Q and x̄ which was 95% of the way from x^Q to the boundary of the new
relaxation (see the sketch after this paragraph). On the other hand, updating
x^Q by moving towards the current primal iterate worked well for the linear
ordering problem. Another possible restarting scheme is to store earlier
iterates, and take the most recent iterate that is feasible in the current
relaxation. This works well on some instances, but it is generally
outperformed by methods based on x^Q.
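The 95% restart rule can be sketched as a ratio test against the new cuts only, since the equality and box constraints hold at both endpoints of the segment; again this is an illustration under stated assumptions, not the authors' code.

import numpy as np

def restart_primal(xQ, x_bar, A0, b0, frac=0.95):
    # Move from the interior point xQ toward x_bar, stopping at frac = 95% of
    # the way to the boundary of the new relaxation, the rule reported above
    # for the Ising problems.  Assumes dense numpy data; the equality and box
    # constraints hold at both points, so any convex combination keeps them.
    d = x_bar - xQ
    slack = b0 - A0 @ xQ                 # positive, since xQ satisfies the new cuts
    rate = A0 @ d                        # speed at which each new slack is consumed
    steps = np.full(len(slack), np.inf)
    pos = rate > 0
    steps[pos] = slack[pos] / rate[pos]
    sigma = min(1.0, steps.min())        # largest feasible step toward x_bar
    return xQ + frac * sigma * d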

6.4 Primal heuristics and termination


A good primal heuristic can save many iterations and stages, especially if
the objective function data is integral. The algorithm terminates when the
gap between the best known integral solution and the best lower bound
drops below some tolerance. If the data is integer, then a tolerance of one is
sufficient, so it is not necessary to refine the relaxation to such a degree that
it completely describes the convex hull in the region of the optimal integral
solution.
The importance of the primal heuristic varies from problem to prob-
lem. Mitchell found that his runtimes improved dramatically for the Ising
spin glass problem when he implemented a good local search heuristic, even
though the heuristic itself required as much as 60% of the total runtime on
some large instances. The primal heuristic was not nearly so important for
the linear ordering problem, where it was relatively easy to generate the opti-
mal ordering from a very good fractional solution. Another indication of the
difference in the importance of the primal heuristic for these two problems
could be observed when Mitchell tried to solve them so that the gap between
the best integral solution and the lower bound was less than, say, 10^{-6}. The

linear ordering problems could be solved almost as easily as before, but the
larger spin glass problems became computationally intractable.

6.5 Separation routines


Separation routines are problem specific. Good routines for simplex based
cutting plane algorithms can usually be adapted to interior point cutting
plane methods. Because the iterates generated by the interior point ap-
proach are more centered, it may be possible to find deeper cuts and cutting
planes that are more important. This is a topic that warrants further inves-
tigation.
One issue that is specific to separation routines for interior point cutting
plane algorithms is the effect of the cutting planes on the sparsity of the ma-
trix AA^T. (Here, A represents the whole constraint matrix.) If the structure
of this matrix is unfavourable, then a simplex method will outperform an
interior point method based on Cholesky factorization, even for the linear
programming relaxation (see, for example, [90]). For this reason, it is often
useful to add cuts that are variable-disjoint, that is, a particular Xi appears
in just one of the constraints added at a particular stage.

6.6 Fixing variables


When using a simplex cutting plane algorithm, it is well known that a vari-
able can be fixed at zero or one if the corresponding reduced cost is suffi-
ciently large (see, for example [108]). The dual variables can be used for the
same purpose if an interior point method is used.
When using a cutting plane algorithm, an upper bound v_U on the optimal
value is provided by a feasible integral solution. Let v̄ be the value of the
current dual iterate (ȳ, z̄, w̄). It was shown in [97] that if z̄_i > v_U − v̄ then
x_i must be zero in any optimal integral solution. Similarly, if w̄_i > v_U − v̄
then x_i must be one in any optimal solution.
These techniques can be very useful for reducing the size of the relax-
ations. They are most useful when the objective function data is fractional,
since the gap between the upper and lower bounds has to become small in
order to prove optimality, so many of the dual variables will eventually be
large enough that the integral variables can be fixed.
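In code, the test from [97] is a simple comparison against the current gap; a sketch:

def fix_variables(z, w, v_upper, v_dual):
    # The test of [97]: with gap = v_U - v_bar, a dual slack z_i > gap forces
    # x_i = 0 and w_i > gap forces x_i = 1 in every optimal integral solution.
    # A sketch; z and w are the current dual slack vectors.
    gap = v_upper - v_dual
    fixed_to_zero = [i for i, zi in enumerate(z) if zi > gap]
    fixed_to_one = [i for i, wi in enumerate(w) if wi > gap]
    return fixed_to_zero, fixed_to_one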
Notice that if a variable is fixed, and thus eliminated from the relaxation,
the point xQ is no longer feasible. Therefore care has to be taken when
restarting the algorithm. In particular, it is useful to examine the logical

implications of fixing a variable; it may be possible to fix further variables,


or to impose constraints on the remaining variables. For example, when
solving a maximum cut problem, if one fixes two of the edges of a cycle of
length 3 then the third edge can also be fixed. If one fixes one edge of a cycle
of length 3, then the variables for the other two edges can be constrained
to either take the same values as each other, or to take opposite values,
depending on the value of the fixed edge. Fixing variables and adding these
logical constraints can worsen the conditioning of the constraint matrix,
perhaps introducing rank deficiencies. Thus, care must be exercised.

6.7 Dropping constraints


Dropping unimportant constraints reduces the size of the relaxation and so
enables the relaxation to be solved more quickly. It is possible to develop
tests based on ellipsoids to determine when a constraint can be dropped,
but the cost of these tests outweighs the computational savings. Therefore,
an implementation will generally drop a constraint based on the simple
test that its slack variable is large. Of course, it is undesirable to have a
constraint be repeatedly added and dropped; a possible remedy is to insist
that a constraint cannot be dropped for several stages.
The development of efficient, rigorous tests for dropping constraints
would be useful.

6.8 The complete algorithm


The complete algorithm is contained in figure 10.
If a primal feasible solution is known, v_U can be initialized in Step 1 to
take the value of that solution; otherwise v_U should be some large number.
If all the objective function coefficients c_i correspond to binary variables,
then the lower bound v_L can be initialized to be Σ_i min{c_i, 0}; otherwise,
the lower bound can be taken to be a negative number with large absolute
value.

6.9 Some computational results


We present some computational results for Ising spin glass problems on grid
of sizes up to 100 x 100 in Table 10. For comparison, De Simone et al. [26]
have solved problems of size up to 70 x 70 with a simplex cutting plane
algorithm using CPLEX3.0 on a Sun Sparc 10 workstation, requiring up to
a day for each problem. The results in Table 10 were obtained on a Sun

1. Initialize: Read in the problem. Set up the initial relax-


ation. Find initial interior feasible primal and dual points.
Find a point xQ in the interior of the convex hull of feasi-
ble integral solutions. Choose a tolerance τ on optimality
for the integer program. Choose a tolerance ρ on the du-
ality gap for the relaxation. Initialize the upper and lower
bounds v_U and v_L on the optimal value appropriately.
2. Iterate: Take a primal-dual predictor-corrector step from
the current iterate.
3. Add cuts? If the relative duality gap δ is smaller than
ρ (and perhaps if other conditions on the primal and dual
values are met), then go to Step 4; otherwise, return to
Step 2.
4. Primal heuristic: Search for a good integral solution,
starting from the current primal iterate. Update v U if a
solution is found which is better than this bound.
5. Check for optimality: If v_U − v_L < τ, STOP: the best
integer solution found so far is optimal.
6. Search for cutting planes: Use the separation routines
to find cutting planes. If cutting planes are found, go to
Step 7. If none are found and δ ≥ 10^{-8}, reduce ρ and
return to Step 2. If none are found and δ < 10^{-8} then
STOP with a nonoptimal solution; use branch and bound
to find the optimal solution.
7. Modify the relaxation: Add an appropriate subset of
the violated constraints to the relaxation. Increase ρ if it
appears that the relaxations do not need to be solved so ac-
curately. Decrease ρ if it appears that the relaxations need
to be solved more accurately. Fix any variables if possi-
ble, and add any resulting constraints. Drop unimportant
constraints.
8. Restart: Update the primal and dual solutions to give
feasible interior points in the new relaxation. Return to
Step 2.

Figure 10: An interior point cutting plane algorithm



Size   Sample   Mean   Std Dev   Minimum   Maximum
10 100 0.42 0.20 0.17 1.17
20 100 4.87 2.01 1.30 12.48
30 100 24.32 11.84 7.42 87.00
40 100 88.46 43.68 32.50 259.02
50 100 272.86 151.59 96.35 795.50
60 100 860.57 969.79 227.38 7450.18
70 100 1946.14 1286.13 593.57 8370.37
80 100 5504.11 4981.00 1403.27 32470.40
90 100 10984.82 6683.37 2474.20 28785.30
100 100 12030.69 3879.55 3855.02 21922.60

Table 10: Time (seconds) to solve Ising spin glass problems

Sparc 20/71, and are taken from [98]. As can be seen, even the largest
problems were solved in an average of less than 3½ hours. They needed
approximately nine iterations per relaxation - the later relaxations required
more iterations and the earlier relaxations fewer. The primal heuristic took
approximately 40% of the total runtime.

6.10 Combining interior point and simplex cutting plane algorithms
Practical experience with interior point cutting plane algorithms has shown
that often initially they add a large number of constraints at a time (hun-
dreds or even thousands), and the number of added constraints decreases to
just a handful at a time towards the end. The number of iterations to reop-
timize increases slightly as optimality is approached, because the relaxations
are solved to a higher degree of accuracy.
When a simplex method is used to solve the relaxations, the number of
iterations to reoptimize depends greatly on the number of added constraints.
Initially, when many constraints are added, the dual simplex method can
take a long time to reoptimize, but towards the end it can reoptimize in very
few iterations, perhaps as few as ten.
Because of the time required for an iteration of an interior point method,
it is very hard to compete with the speed of simplex for solving these last

few relaxations. Conversely, the interior point method is considerably faster


for the first few stages. The interior point method may also make a better
selection of cutting planes in these initial stages, because it is cutting off an
interior point that is well-centered, a property that is intensified because it
is looking for cutting planes before termination.
Mitchell and Borchers [100] investigated solving linear ordering problems
with a cutting plane code that uses an interior point method for the first few
stages and a dual simplex method for the last few stages. Computational
results are contained in table 11. These problems have up to 250 sectors,

n % zeros Interior Simplex Combined


150 0 206 75 68
200 0 755 385 209
250 0 4492 3797 592
100 20% 1405 1296 230
150 10% 2247 1294 208
200 10% N/A 9984 879

Table 11: Preliminary Results on Linear Ordering Problems.

with a percentage of the cost entries zeroed out. The nonzero costs above
the diagonal were uniformly distributed between 0 and 99, and those be-
low the diagonal were uniformly distributed between 0 and 39. The table
contains runtimes in seconds on a Sun SPARC 20/71 for an interior point
cutting plane code, a simplex cutting plane code using CPLEX 4.0, and a
combined cutting plane code. The interior point code was unable to solve
the problems with 200 sectors and 20% of the entries zeroed out because of
space limitations. As can be seen the combined code is more than 10 times
faster than the simplex code on the largest problems, and the interior point
and simplex codes require similar amounts of time, at least on the harder
problems.

6.11 Interior point column generation methods for other problems
A cutting plane method can be regarded as a column generation method
applied to the dual problem. Interior point methods have been successfully
applied in several other situations amenable to solution by a column gener-

ation approach. Goffin et al. [42] have solved nondifferentiable optimization


problems. Bahn et al. [9] have used an interior point method within the
L-shaped decomposition method of Van Slyke and Wets [136] for stochastic
programming problems. Goffin et al. [41] have also solved multicommodity
network flow problems using an interior point column generation approach.
In this method, the columns correspond to different paths from an origin to
a destination, and they are generated by solving a shortest path problem
with an appropriate cost vector.

6.12 Theoretical issues and future directions


As mentioned earlier, the ellipsoid algorithm can be used to solve an integer
programming problem in polynomial time if the separation problem can be
solved in polynomial time. It is not currently known how to use an inte-
rior point method in an exactly analogous way. Atkinson and Vaidya [8]
developed an interior point algorithm for this process, but their algorithm
requires that unimportant constraints be dropped, unlike the ellipsoid algo-
rithm. Vaidya later obtained a similar result for an algorithm that used the
volumetric center [138]. Goffin et al. [44] have proposed a fully polynomial
algorithm that does not require that unimportant constraints be removed.
It is an interesting open theoretical question to find an interior point al-
gorithm that does not require that unimportant constraints be removed,
and also solves the optimization problem in polynomial time provided the
separation problem can be solved in polynomial time.
The algorithms proposed in [8, 138, 44] required that only a single con-
straint be added at a time, and that the constraint be added far from the
current iterate. These algorithms have been extended to situations where
many cuts are added at once, and the constraints are added right through the
current iterate, with no great increase in the complexity bound [124, 125, 43].
It has been shown that if p constraints are added through the analytic center
then the analytic center of the new feasible region can be found in O(√p)
iterations [124].
There are several open computational questions with interior point cut-
ting plane methods. Combining interior point methods with the simplex
algorithm needs to be investigated further. When a direct method is used
to calculate the Newton direction, it is necessary to choose an ordering of
the columns of AA^T to reduce fill in the Cholesky factor; it would be in-
teresting to see if the ordering from one stage can be efficiently modified
to give an ordering for the next stage, rather than calculating an ordering

from scratch. When the constraint matrix contains many dense columns, it
becomes expensive to use a direct method to calculate the Newton direc-
tion; it would be interesting to examine whether it is efficient to switch to a
preconditioned conjugate gradient method in the later stages.

7 Nonconvex potential function minimization


Consider the problem of maximizing a convex quadratic function defined as
max   w^T w = Σ_{i=1}^m w_i²                              (62)

subject to

A^T w ≤ b.                                                (63)
The significance of this optimization problem is that many combinatorial
optimization problems can be formulated as above with the additional re-
quirement that the variables are binary.
In [73, 77] a new affine scaling algorithm was proposed for solving the
above problem using a logarithmic potential function. Consider the noncon-
vex optimization problem
min {φ(w) | A^T w ≤ b},                                   (64)

where

φ(w) = log [ (m − w^T w)^{1/2} / (Π_{i=1}^n d_i(w))^{1/n} ]        (65)

     = (1/2) log(m − w^T w) − (1/n) Σ_{i=1}^n log d_i(w),          (66)

and where

d_i(w) = b_i − a_i^T w,   i = 1, ..., n,                  (67)

are the slacks. The denominator of the log term of φ(w) is the geometric
mean of the slacks and is maximized at the analytic center of the polytope
defined by

L = {w ∈ R^m | A^T w ≤ b}.
To find a local (perhaps global) solution of (64), an approach similar to
the classical Levenberg-Marquardt methods [86, 91] is used. Let

w^0 ∈ L^0 = {w ∈ R^m | A^T w < b}

be a given initial interior point. The algorithm generates a sequence of


interior points of L.
Let w^k ∈ L^0 be the k-th iterate. Around w^k a quadratic approximation
of the potential function is set up. Let D = diag(d_1(w), ..., d_n(w)), e =
(1, ..., 1)^T, f_0 = m − w^T w and C be a constant. The quadratic approximation
of φ(w) around w^k is given by

Q(w) = (1/2)(w − w^k)^T H (w − w^k) + h^T (w − w^k) + C            (68)

where the Hessian is

H = (1/n) A D^{-2} A^T − (2/f_0²) w^k w^{kT} − (1/f_0) I           (69)

and the gradient is

h = −(1/f_0) w^k + (1/n) A D^{-1} e.                               (70)
Recall that minimizing (68) over a polytope is NP-complete. However, if
the polytope is substituted by an inscribed ellipsoid, the resulting approx-
imate problem can be solved in polynomial time [147]. Since preliminary
implementations of this algorithm indicate that trust region methods are
more efficient for solving these problems, in the discussion that follows we
consider a trust region approach.
Consider the ellipsoid

E(r) = {w ∈ R^m | (w − w^k)^T A D^{-2} A^T (w − w^k) ≤ r²}.

To see that the ellipsoid E(r) is inscribed in the polytope L, assume that
r = 1 and let y ∈ E(1). Then

(y − w^k)^T A D^{-2} A^T (y − w^k) ≤ 1

and consequently, componentwise,

D^{-1} A^T (y − w^k) ≤ e,

where w^k ∈ L^0. Denoting the i-th row of A^T by a_i^T, we have

a_i^T (y − w^k) / (b_i − a_i^T w^k) ≤ 1,   ∀i = 1, ..., n.

Hence

a_i^T y ≤ b_i,   ∀i = 1, ..., n,

i.e. A^T y ≤ b, showing that y ∈ L. This shows that E(1) ⊂ L and since
E(r) ⊂ E(1) for 0 ≤ r < 1, then E(r) ⊂ L, i.e. E(r) is an inscribed ellipsoid
in L.
Substituting the polytope by the appropriate inscribed ellipsoid and let-
ting Δw ≡ w − w^k results in the minimization of a quadratic function over
an ellipsoid, i.e.

min   Q(Δw) = (1/2) Δw^T H Δw + h^T Δw                             (71)

subject to

Δw^T A D^{-2} A^T Δw ≤ r².                                         (72)
The optimal solution Δw* to (71-72) is a descent direction of Q(w) from w^k.
For a given radius r > 0, the value of the original potential function φ(w)
may increase by moving in the direction Δw*, because of the higher order
terms ignored in the approximation. It can be easily verified, however, that
if the radius is decreased sufficiently, the value of the potential function will
decrease by moving in the new Δw* direction. We shall say a local minimum
to (64) has been found if the radius must be reduced below a tolerance ε to
achieve a reduction in the value of the potential function.
The following result, proved in [73], characterizes the optimal solution
of (71-72). Using a linear transformation, the problem is transformed into
the minimization of a quadratic function over a sphere.
Consider the optimization problem

min   (1/2) x^T Q x + c^T x                                        (73)

subject to

x^T x ≤ r²,                                                        (74)

where Q ∈ R^{m×m} is symmetric and indefinite, x, c ∈ R^m and 0 < r ∈ R.
Let u_1, ..., u_m denote a full set of orthonormal eigenvectors spanning R^m
and let λ_1, ..., λ_m be the corresponding eigenvalues, ordered so that λ_1 ≤
λ_2 ≤ ... ≤ λ_{m−1} ≤ λ_m. Denote 0 > λ_min = min{λ_1, ..., λ_m} and u_min the
corresponding eigenvector. Furthermore, let q be such that λ_min = λ_1 =
... = λ_q < λ_{q+1}. To describe the solution to (73-74) consider two cases:
Case 1: Assume Σ_{i=1}^q (c^T u_i)² > 0. Let the scalar λ ∈ (−∞, λ_min) and
consider the parametric family of vectors

x(λ) = − Σ_{i=1}^m (c^T u_i) u_i / (λ_i − λ).

For any r > 0, denote by λ(r) the unique solution of the equation
x(λ)^T x(λ) = r² in λ. Then x(λ(r)) is the unique optimal solution of (73-74).
Case 2: Assume c^T u_i = 0, ∀i = 1, ..., q. Let the scalar λ ∈ (−∞, λ_min) and
consider the parametric family of vectors

x(λ) = − Σ_{i=q+1}^m (c^T u_i) u_i / (λ_i − λ).                    (75)

Let

r_max = ||x(λ_min)||_2.

If r < r_max then, for any 0 < r < r_max, denote by λ(r) the unique solution
of the equation x(λ)^T x(λ) = r² in λ. Then x(λ(r)) is the unique optimal
solution of (73-74).
If r ≥ r_max, then let a_1, a_2, ..., a_q be any real scalars such that

Σ_{i=1}^q a_i² = r² − r_max².

Then

x = Σ_{i=1}^q a_i u_i − Σ_{i=q+1}^m (c^T u_i) u_i / (λ_i − λ_min)

is an optimal solution of (73-74). Since the choice of a_i's is arbitrary, this
solution is not unique.
This shows the existence of a unique optimal solution to (73-74) if r <
r_max. The proof of this result is based on another fact, used to develop the
algorithm described in [73, 77], that we state next.
Let the length of x(λ) be

l(x(λ)) ≡ ||x(λ)||₂² = x(λ)^T x(λ).

To see how the length behaves, consider two cases. First, assume
Σ_{i=1}^q (c^T u_i)² > 0 and consider the parametric family of vectors

x(λ) = − Σ_{i=1}^m (c^T u_i) u_i / (λ_i − λ),

for λ ∈ (−∞, λ_min). Now, assume that c^T u_i = 0, ∀i = 1, ..., q, and consider
the parametric family of vectors

x(λ) = − Σ_{i=q+1}^m (c^T u_i) u_i / (λ_i − λ),                    (76)

for λ ∈ (−∞, λ_min), and assume r < r_max. In both cases l(x(λ)) is
monotonically increasing in λ in the interval λ ∈ (−∞, λ_min).
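This monotonicity is exactly what makes the equation x(λ)^T x(λ) = r² solvable by a one-dimensional search on λ; the following numpy sketch handles Case 1 by bisection (an illustration only, since it uses a full eigendecomposition, which the algorithm of [73, 77] avoids):

import numpy as np

def sphere_subproblem(Q, c, r, iters=200):
    # Minimize (1/2) x^T Q x + c^T x over ||x||_2 <= r for symmetric indefinite
    # Q, under the Case 1 assumption that c is not orthogonal to the minimal
    # eigenspace.  Bisects on lam in (-inf, lam_min) using the monotone length
    # l(x(lam)).  A sketch, not the algorithm of [73, 77].
    lams, U = np.linalg.eigh(Q)                  # eigenvalues in ascending order
    cu = U.T @ c
    x_of = lambda lam: U @ (-cu / (lams - lam))  # x(lam) in the original basis
    lo = lams[0] - np.linalg.norm(c) / r         # here ||x(lo)|| <= r by construction
    hi = lams[0] - 1e-14 * max(1.0, abs(lams[0]))  # ||x(hi)|| > r under Case 1
    for _ in range(iters):                       # solve ||x(lam)||^2 = r^2
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) < r:
            lo = mid
        else:
            hi = mid
    return x_of(lo)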
The above result suggests an approach to solve the nonconvex optimiza-
tion problem (64). At each iteration, a quadratic approximation of the
potential function φ(w) around the iterate w^k is minimized over an ellipsoid
inscribed in the polytope {w ∈ R^m | A^T w ≤ b} and centered at w^k. Either a
descent direction Δw* of φ(w) is produced or w^k is declared a local mini-
mum. A new iterate w^{k+1} is computed by moving from w^k in the direction
Δw* such that φ(w^{k+1}) < φ(w^k). This can be done by moving a fixed step
α in the direction Δw* or by doing a line search to find the α that minimizes
the potential function φ(w^k + αΔw*) [134].
Figure 11 shows a pseudo-code procedure cmq for finding a local mini-
mum of the convex quadratic maximization problem. Procedure cmq takes as
input the problem dimension n, the matrix A, the right hand side vector b, an
initial estimate μ⁰ of parameter μ, and initial lower and upper bounds on the
acceptable length, l_lo⁰ and l_hi⁰, respectively. In line 2, get_start_point
returns a strict interior point of the polytope under consideration, i.e. w^k ∈ L^0.
The algorithm iterates in the loop between lines 3 and 13, terminating
when a local optimum is found. At each iteration, a descent direction of
the potential function φ(w) is produced in lines 4 through 8. In line 4,
the minimization of a quadratic function over an ellipsoid (71-72) is solved.
Because of higher order terms, the direction returned by descent_direction
may not be a descent direction for φ(w). In this case, loop 5 to 8 is repeated
until an improving direction for the potential function is produced or the
largest acceptable length falls below a given tolerance ε.
If an improving direction for φ(w) is found, a new point w^{k+1} is defined
(in line 10) by moving from the current iterate w^k in the direction Δw* by
a step length α < 1.

procedure cmq(n, A, b, μ⁰, l_lo⁰, l_hi⁰)
1     k = 0; γ = 1/(μ⁰ + 1/n); l_lo = l_lo⁰; l_hi = l_hi⁰;
2     w^k = get_start_point(A, b);
3     do l_hi > ε →
4         Δw* = descent_direction(γ, w^k, l_lo, l_hi);
5         do φ(w^k + αΔw*) ≥ φ(w^k) and l_hi > ε →
6             l_hi = l_hi / l_r;
7             Δw* = descent_direction(γ, w^k, l_lo, l_hi);
8         od;
9         if φ(w^k + αΔw*) < φ(w^k) →
10            w^{k+1} = w^k + αΔw*;
11            k = k + 1;
12        fi;
13    od;
end cmq;

Figure 11: Procedure cmq: Algorithm for nonconvex potential function min-
imization

7.1 Computing the descent direction


Now consider in more detail the computation of the descent direction for
the potential function. The algorithm described in this section is similar to
the trust region method described in More and Sorensen [104].
As discussed previously, the algorithm solves the optimization problem

min   Q(Δw) = (1/2) Δw^T H Δw + h^T Δw                             (77)

subject to

Δw^T A D^{-2} A^T Δw ≤ r²                                          (78)

to produce a descent direction Δw* for the potential function φ(w). A
solution Δw* ∈ R^m to (77-78) is optimal if and only if there exists μ ≥ 0
such that

(H + μ A D^{-2} A^T) Δw* = −h                                      (79)

μ ((Δw*)^T A D^{-2} A^T Δw* − r²) = 0                              (80)

H + μ A D^{-2} A^T is positive semidefinite.                        (81)


With the change of variables γ = 1/(μ + 1/n) and substituting the
Hessian (69) and the gradient (70) into (79) we obtain

Δw* = − (A D^{-2} A^T − (2γ/f_0²) w^k w^{kT} − (γ/f_0) I)^{-1} γ (−(1/f_0) w^k + (1/n) A D^{-1} e)   (82)

that satisfies (79). Note that r does not appear in (82) and that (82) is not
defined for all values of γ. However, if the radius r of the ellipsoid (78) is
kept within a certain range, then there exists an interval 0 ≤ γ ≤ γ_max such
that

M(γ) = A D^{-2} A^T − (2γ/f_0²) w^k w^{kT} − (γ/f_0) I             (83)

is nonsingular. Next, we show that for γ small enough Δw* is a descent
direction of φ(w). Note that

Δw* = − [A D^{-2} A^T (I − γ (A D^{-2} A^T)^{-1} ((2/f_0²) w^k w^{kT} + (1/f_0) I))]^{-1} γ (−(1/f_0) w^k + (1/n) A D^{-1} e)

     = −γ [I − γ (A D^{-2} A^T)^{-1} ((2/f_0²) w^k w^{kT} + (1/f_0) I)]^{-1} (A D^{-2} A^T)^{-1} (−(1/f_0) w^k + (1/n) A D^{-1} e)

     = γ [I − γ (A D^{-2} A^T)^{-1} ((2/f_0²) w^k w^{kT} + (1/f_0) I)]^{-1} (A D^{-2} A^T)^{-1} (−h).   (84)


Let γ = ε > 0 and consider lim_{ε→0⁺} h^T Δw*. Since

lim_{ε→0⁺} Δw* = ε (A D^{-2} A^T)^{-1} (−h),

then

lim_{ε→0⁺} h^T Δw* = −ε h^T (A D^{-2} A^T)^{-1} h.

Since, by assumption, ε > 0 and h^T (A D^{-2} A^T)^{-1} h > 0, then

lim_{ε→0⁺} h^T Δw* < 0,

showing that there exists γ > 0 such that the direction Δw*, given in (82),
is a descent direction of φ(w).
The idea of the algorithm is to solve (77-78), more than once if necessary,
with the radius r as a variable. Parameter γ is varied until r takes a value
in some given interval. Each iteration of this algorithm is comprised of two
tasks. To simplify notation, let

H_c = A D^{-2} A^T                                                 (85)

H_o = −(2/f_0²) w^k w^{kT} − (1/f_0) I                             (86)

and define

M = H_c + γ H_o.

Given the current iterate w^k, we first seek a value of γ such that MΔw = −γh
has a solution Δw*. This can be done by binary search, as we will see shortly.
Once such a parameter γ is found, the linear system

MΔw* = −γh                                                         (87)

is solved for Δw* ≡ Δw*(γ(r)). As was shown previously, the length
l(Δw*(γ)) is a monotonically increasing function of γ in the interval 0 ≤
γ ≤ γ_max. Optimality condition (80) implies that r = √(l(Δw*(γ))) if μ > 0.
Small lengths result in small changes in the potential function, since r is
small and the optimal solution lies on the surface of the ellipsoid. A length
that is too large may not correspond to an optimal solution of (77-78), since
this may require r > 1. An interval (l_lo, l_hi), called the acceptable length
region, is defined such that a length l(Δw*(γ)) is accepted if l_lo ≤ l(Δw*(γ)) ≤
l_hi. If l(Δw*(γ)) < l_lo, γ is increased and (87) is re-solved with the new M
matrix and h vector. On the other hand, if l(Δw*(γ)) > l_hi, γ is reduced
and (87) is re-solved. Once an acceptable length is produced we use Δw*(γ)
as the descent direction.
Figure 12 presents pseudo-code for procedure descent_direction, where
(77-78) is optimized. As input, procedure descent_direction is given an
estimate for parameter γ, the current iterate w^k around which the inscrib-
ing ellipsoid is to be constructed, and the current acceptable length region
defined by l_lo and l_hi. The value of γ passed to descent_direction at minor
iteration k of cmq is the value returned by descent_direction at minor
iteration k − 1. It returns a descent direction Δw* of the quadratic ap-
proximation of the potential function Q(w) from w^k, the next estimate for
parameter γ, and the current lower bound of the acceptable length region l_lo.
In line 1, the length l is set to a large number and several logical keys
are initialized: LDkey is true if a linear dependency in the rows of M is ever
found during the solution of the linear system (87) and is false otherwise;
γ_hi_key (γ_lo_key) is true if an upper (lower) bound for an acceptable γ has
been found and false otherwise.
The problem of minimizing a nonconvex quadratic function over an ellip-
soid is carried out in the loop going from line 2 to 19. The loop is repeated
until either a length l is found such that l_lo ≤ l ≤ l_hi or l ≤ l_lo due to a linear
dependency found during the solution of (87), i.e. if LDkey = true. Lines
3 to 8 produce a descent direction that may not necessarily have an accept-
able length. In line 3 the matrix M and the right hand side vector b are
formed. The linear system (87) is tentatively solved in line 4. The solution
procedure may not be successful, i.e. M may be singular. This implies that
parameter γ is too large, and parameter γ is reduced in line 5 of loop 4-7,
which is repeated until a nonsingular matrix M is produced.
Once a nonsingular M matrix is available, a descent direction Δw* is
computed in line 8 along with its corresponding length l. Three cases can

procedure descent_direction(γ, w^k, l_lo, l_hi)
1     l = ∞; LDkey = false; γ_hi_key = false; γ_lo_key = false;
2     do l > l_hi or (l < l_lo and LDkey = false) →
3         M = H_c + γH_o; b = −γh;
4         do MΔw = b has no solution →
5             γ = γ / γ_r; LDkey = true;
6             M = H_c + γH_o; b = −γh;
7         od;
8         Δw* = M^{-1} b; l = (Δw*)^T A D^{-2} A^T Δw*;
9         if l < l_lo and LDkey = false →
10            γ_lo = γ; γ_lo_key = true;
11            if γ_hi_key = true → γ = √(γ_lo γ_hi) fi;
12            if γ_hi_key = false → γ = γ · γ_r fi;
13        fi;
14        if l > l_hi →
15            γ_hi = γ; γ_hi_key = true;
16            if γ_lo_key = true → γ = √(γ_lo γ_hi) fi;
17            if γ_lo_key = false → γ = γ / γ_r fi;
18        fi;
19    od;
20    do l < l_lo and LDkey = true → l_lo = l_lo / l_r od;
21    return(Δw*, γ, l_lo);
end descent_direction;

Figure 12: Procedure descent_direction: Algorithm to compute the descent
direction in nonconvex potential function minimization

Three cases can occur: (i) the length is too small even though no linear dependency was
detected in the factorization; (ii) the length is too large; or (iii) the
length is acceptable. Case (iii) is the termination condition for the main
loop 2-19. In lines 9-13 the first case is considered. The current value of $\gamma$ is a
lower bound on an acceptable value of $\gamma$; it is recorded as $\underline{\gamma}$ in line 10 and the
corresponding logical key is set. If an upper bound $\bar{\gamma}$ for an acceptable value
of $\gamma$ has been found, the new estimate for $\gamma$ is set to the geometric mean of
$\underline{\gamma}$ and $\bar{\gamma}$ in line 11. Otherwise $\gamma$ is increased by a fixed factor in line 12.
Similar to the treatment of case (i), case (ii) is handled in lines 14-18.
The current value of $\gamma$ is an upper bound on an acceptable value of $\gamma$; it is
recorded as $\bar{\gamma}$ in line 15 and the corresponding logical key is set. If a lower bound
$\underline{\gamma}$ for an acceptable value of $\gamma$ has been found, the new estimate for $\gamma$ is set
to the geometric mean of $\underline{\gamma}$ and $\bar{\gamma}$ in line 16. Otherwise $\gamma$ is decreased by a
fixed factor in line 17.
Finally, in line 20, the lower bound $\underline{l}$ may have to be adjusted if $l < \underline{l}$
and LDkey = true. Note that the key LDkey is used only to allow the
adjustment in the range of the acceptable length, so that the range returned
contains the current length $l$.
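
The $\gamma$-bracketing logic of Figure 12 is straightforward to prototype. The sketch below (in Python) follows the same case analysis; the helpers solve_system and length_of are hypothetical stand-ins for the solution of (87) and the computation of the ellipsoidal length, so this is an illustration of the search rather than the authors' implementation.

    import math

    def descent_direction(gamma, solve_system, length_of, l_lo, l_hi,
                          gamma_r=10.0, l_r=10.0):
        # solve_system(gamma) is assumed to return the direction dw solving (87),
        # or None if M = Hc + gamma*Ho turns out to be singular;
        # length_of(dw) is assumed to return the ellipsoidal length l(dw).
        LDkey, g_lo, g_hi = False, None, None
        while True:
            dw = solve_system(gamma)
            while dw is None:              # singular M: gamma is too large
                gamma /= gamma_r
                LDkey = True
                dw = solve_system(gamma)
            l = length_of(dw)
            if l < l_lo and not LDkey:     # case (i): step too short
                g_lo = gamma
                gamma = math.sqrt(g_lo * g_hi) if g_hi else gamma * gamma_r
            elif l > l_hi:                 # case (ii): step too long
                g_hi = gamma
                gamma = math.sqrt(g_lo * g_hi) if g_lo else gamma / gamma_r
            else:                          # case (iii): acceptable length
                break
        while l < l_lo and LDkey:          # line 20: relax the lower bound
            l_lo /= l_r
        return dw, gamma, l_lo

The multiplicative factor gamma_r and the geometric-mean update mirror lines 11-12 and 16-17 of Figure 12; once both a lower and an upper bracket on $\gamma$ are known, the search behaves like bisection on $\log \gamma$.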

7.2 Some computational considerations


The density of the linear system solved at each iteration of descent_direction
is determined by the density of the Hessian matrix. Using the potential function
described in the previous section, this Hessian is totally dense, because of a
rank-one component proportional to $w^k (w^k)^T$. Consequently,
direct factorization solution techniques must be ruled out for large instances.
However, in the case where the matrix $A$ is sparse, iterative methods can be
applied to approximately solve the linear system. In [71], a preconditioned
conjugate gradient algorithm, using diagonal preconditioning, was used to
solve the system efficiently, taking advantage of the special structure of the
coefficient matrix. In this approach, the main computational effort is the
multiplication of the coefficient matrix $M$ and a dense vector $\xi$, i.e. $M\xi$.
This multiplication can be done efficiently by considering the fact that $M$ is
the sum of three matrices, each of which has special structure. The first
multiplication is simply a scaling of $\xi$. The second product, involving the
rank-one term, is done in two steps: first, an inner product $(w^k)^T \xi$ is
computed; then the vector $w^k$ is scaled by that inner product. The third
product, $A D^{-2} A^T \xi$, is done in three steps: first the product $A^T \xi$
is carried out; the resulting vector is scaled by $D^{-2}$ and then multiplied
by $A$. Therefore, if $A$ is sparse, the entire matrix-vector multiplication
can be done efficiently.
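
As an illustration, the three-term product can be coded as a matrix-free operator and handed directly to SciPy's conjugate gradient solver. In the sketch below the scalars alpha and beta are placeholders for the coefficients multiplying the identity and rank-one terms of the Hessian (they are not recoverable from the discussion above), and $A$ is assumed to be a SciPy sparse matrix; this is a sketch of the technique of [71], not its actual code.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def make_M_operator(A, d, w, alpha, beta):
        # Matrix-free product M*xi = alpha*xi + beta*(w'xi)*w + A D^{-2} A' xi.
        dinv2 = 1.0 / d**2
        m = A.shape[0]
        def matvec(xi):
            t1 = alpha * xi                  # first term: a simple scaling
            t2 = beta * (w @ xi) * w         # rank-one term: inner product, then scale
            t3 = A @ (dinv2 * (A.T @ xi))    # third term in three sparse steps
            return t1 + t2 + t3
        return LinearOperator((m, m), matvec=matvec)

    # usage with diagonal (Jacobi) preconditioning, in the spirit of [71]:
    # M_op = make_M_operator(A, d, w, alpha, beta)
    # dw, info = cg(M_op, rhs, M=LinearOperator(M_op.shape, matvec=lambda r: r / diag_M))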
In a recent study, Warners et al. [142] describe a new potential function
$$\varphi_p(w) = m - w^T w - \sum_{i=1}^{n} p_i \log d_i(w),$$
whose gradient and Hessian are given by
$$h = -2w + A D^{-1} p$$
and
$$H = -2I + A D^{-1} P D^{-1} A^T,$$
where $p = (p_1, \ldots, p_n)$ and $P = \mathrm{diag}(p)$. Note that the density of the Hessian
depends only on the density of $A A^T$. Consequently, direct factorization
methods can be used efficiently when the density of $A A^T$ is small.
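
For concreteness, the following sketch evaluates this gradient and Hessian, assuming the slacks are $d(w) = b - A^T w$ (so that $D = \mathrm{diag}(d(w))$) and that $A$ is stored as a SciPy sparse matrix; it makes visible that the Hessian inherits the sparsity pattern of $A A^T$.

    import scipy.sparse as sp

    def warners_grad_hess(A, b, w, p):
        # Gradient h = -2w + A D^{-1} p and Hessian H = -2I + A D^{-1} P D^{-1} A'
        # of the potential of Warners et al. [142]; requires d = b - A'w > 0.
        d = b - A.T @ w                      # slack vector d(w), assumed positive
        h = -2.0 * w + A @ (p / d)
        Dinv = sp.diags(1.0 / d)
        H = -2.0 * sp.eye(A.shape[0]) + A @ Dinv @ sp.diags(p) @ Dinv @ A.T
        return h, H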

7.3 Application to combinatorial optimization


The algorithms discussed in this section have been applied to the following
integer programming problem: Given $A' \in \mathbb{R}^{m \times n'}$ and $b' \in \mathbb{R}^{n'}$, find $w \in \mathbb{R}^m$
such that
$$A'^T w \leq b' \qquad (88)$$
$$w_i \in \{-1, 1\}, \quad i = 1, \ldots, m. \qquad (89)$$

The more common form of integer programming, where the variables $x_i$ take on
$\{0,1\}$ values, can be converted to the above form with the change of variables
$$x_i = \frac{1 + w_i}{2}, \quad i = 1, \ldots, m.$$
More specifically, let $I$ denote an $m \times m$ identity matrix, let
$$A = [\, A' \;\; I \;\; -I \,] \quad \mbox{and} \quad b = (b'^T, e^T, e^T)^T,$$
where $e$ is the $m$-vector of all ones, and let
$$\mathcal{I} = \{\, w \in \mathbb{R}^m \mid A^T w \leq b \ \mbox{and} \ w_i \in \{-1, 1\} \,\}.$$
With this notation, we can state the integer programming problem as: Find
$w \in \mathcal{I}$.
As before, let
$$\mathcal{L} = \{\, w \in \mathbb{R}^m \mid A^T w \leq b \,\}$$
and consider the linear programming relaxation of (88-89), i.e. find $w \in \mathcal{L}$.
One way of selecting $\pm 1$ integer solutions over fractional solutions in linear
programming is to introduce the quadratic objective function
$$\max\; w^T w = \sum_{i=1}^{m} w_i^2$$
and solve the nonconvex quadratic programming problem (62-63). Note that
$w^T w \leq m$, with equality occurring only when $w_j = \pm 1$, $j = 1, \ldots, m$.
Furthermore, if $w \in \mathcal{I}$ then $w \in \mathcal{L}$ and $w_i = \pm 1$, $i = 1, \ldots, m$, and therefore
$w^T w = m$. Hence, if $w$ is the optimal solution to (62-63) then $w \in \mathcal{L}$. If
$w^T w = m$ then $w_i = \pm 1$, $i = 1, \ldots, m$, and therefore $w \in \mathcal{I}$. Consequently,
this shows that if $w \in \mathcal{L}$ then $w \in \mathcal{I}$ if and only if $w^T w = m$.
In place of (62-63), one solves the nonconvex potential function minimization
$$\min\; \varphi(w) \quad \mbox{subject to} \quad w \in \mathcal{L}, \qquad (90)$$
where $\varphi(w)$ is given by (65-67). The generally applied scheme rounds each
iterate to an integer solution, terminating if a feasible integer solution is
produced. If the algorithm converges to a nonglobal local minimum of (90),
then the problem is modified by adding a cut and the algorithm is applied
to the augmented problem. Let $v$ be the integer solution rounded off from
the local minimum. A valid cut is
$$v^T w \leq m - 2. \qquad (91)$$
Observe that if $w = v$ then $v^T w = m$. Otherwise, $v^T w \leq m - 2$. Therefore,
the cut (91) excludes $v$ but does not exclude any other feasible integral
solution of (88-89).
We note that adding a cut of the type above will not, theoretically, pre-
vent the algorithm from converging to the same local minimum twice. In
practice [77], the addition of the cut changes the objective function, conse-
quently altering the trajectory followed by the algorithm.
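
A rounding-and-cut step of this kind takes only a few lines. The sketch below is a hypothetical helper, not the authors' code: it rounds an iterate to $v \in \{-1,1\}^m$ and returns the data of cut (91). The cut is valid because for any other $u \in \{-1,1\}^m$, $v^T u = m - 2 \cdot (\mbox{number of disagreements}) \leq m - 2$.

    import numpy as np

    def round_and_cut(w):
        # Round iterate w to v in {-1,1}^m and build the cut v'w <= m - 2 of (91).
        v = np.where(w >= 0.0, 1.0, -1.0)
        rhs = len(v) - 2
        return v, rhs    # append v as a new column of A with right-hand side rhs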
Most combinatorial optimization problems have very natural equivalent
integer and quadratic programming formulations [113]. The algorithms de-
scribed in this section have been applied to a variety of problems, including
maximum independent set [78], set covering [77], satisfiability [71, 134], in-
ductive inference [69, 70], and frequency assignment in cellular telephone
systems [143].

8 A lower bounding technique


A lower bound for the globally optimal solution of the quadratic program
$$\min\; q(x) = \frac{1}{2} x^T Q x + c^T x \qquad (92)$$
subject to
$$x \in P = \{\, x \in \mathbb{R}^n \mid Ax = b, \; x \geq 0 \,\}, \qquad (93)$$
where $Q \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, and $b \in \mathbb{R}^m$, can be obtained by
minimizing the objective function over the largest ellipsoid inscribed in $P$.
This technique can be applied to quadratic integer programming, a problem
that is NP-hard in the general case. Kamath and Karmarkar [66] proposed a
polynomial time interior point algorithm for computing these bounds. This
is one of the first computational approaches to solving semidefinite programming
relaxations. The problem is solved as a minimization of the trace of a
matrix subject to positive definiteness conditions. The algorithm takes no
more than $O(nL)$ iterations (where $L$ is the number of bits required to
represent the input) and performs two matrix inversions per iteration.
Consider the quadratic integer program
$$\min\; f(x) = x^T Q x \qquad (94)$$
subject to
$$x \in S = \{-1, 1\}^n, \qquad (95)$$
where $Q \in \mathbb{R}^{n \times n}$ is symmetric. Let $f_{\min}$ be the value of the optimal solution
of (94-95).
Consider the problem of finding good lower bounds on $f_{\min}$. To apply
an interior point method to this problem, one needs to embed the discrete
set $S$ in a continuous set $T \supseteq S$. Clearly, the minimum of $f(x)$ over $T$ is a
lower bound on $f_{\min}$.
A commonly used approach is to choose the continuous set to be the box
$$B = \{\, x \in \mathbb{R}^n \mid -1 \leq x_i \leq 1, \; i = 1, \ldots, n \,\}.$$

However, if $f(x)$ is not convex, the problem of minimizing $f(x)$ over $B$ is
NP-hard. Consider this difficult case, and therefore assume that $Q$ has at
least one negative eigenvalue. Since optimizing over a box can be hard,
instead enclose the box in an ellipsoid $E$. Let
$$U = \{\, w = (w_1, \ldots, w_n) \in \mathbb{R}^n \mid \textstyle\sum_{i=1}^{n} w_i = 1 \ \mbox{and} \ w_i > 0, \; i = 1, \ldots, n \,\},$$
and consider the parameterized ellipsoid
$$E(w) = \{\, x \in \mathbb{R}^n \mid x^T W x \leq 1 \,\},$$
where $w \in U$ and $W = \mathrm{diag}(w)$.


Clearly, the set $S$ is contained in $E(w)$, since $x^T W x = \sum_{i=1}^n w_i = 1$ for
every $x \in S$. If $\lambda_{\min}(w)$ is the minimum eigenvalue of $W^{-1/2} Q W^{-1/2}$, then
$$x^T Q x \geq \lambda_{\min}(w)\, x^T W x \quad \mbox{for all } x \in \mathbb{R}^n,$$
and therefore
$$f(x) \geq \lambda_{\min}(w) \quad \mbox{for all } x \in E(w).$$
Hence, the minimum value of $f(x)$ over $E(w)$ can be obtained by simply
computing the minimum eigenvalue of $W^{-1/2} Q W^{-1/2}$. Improving
the bound on $f_{\min}$ further requires that $\lambda_{\min}(w)$ be maximized over the set $U$.
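
For a fixed weight vector $w \in U$, this bound is a single eigenvalue computation; a minimal sketch:

    import numpy as np

    def ellipsoid_bound(Q, w):
        # lambda_min(W^{-1/2} Q W^{-1/2}): a lower bound on f_min for any w in U
        # (w > 0, sum(w) = 1), since S lies inside E(w) and Q is indefinite.
        s = 1.0 / np.sqrt(w)                 # diagonal of W^{-1/2}
        return np.linalg.eigvalsh(Q * np.outer(s, s)).min()

For example, the uniform choice $w = e/n$ gives the bound $n\, \lambda_{\min}(Q)$.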
Therefore, the problem of finding a better lower bound is transformed into
the optimization problem
$$\max\; \mu$$
subject to
$$\frac{x^T Q x}{x^T W x} \geq \mu, \quad \forall x \in \mathbb{R}^n \setminus \{0\}, \quad w \in U.$$
One can further simplify the problem by defining $d = (d_1, \ldots, d_n) \in \mathbb{R}^n$ such
that $\sum_{i=1}^{n} d_i = 0$. Let $D = \mathrm{diag}(d)$. If
$$\frac{x^T (Q - D) x}{x^T W x} \geq \mu,$$
then, since $\sum_{i=1}^{n} w_i = 1$ and $\sum_{i=1}^{n} d_i = 0$,
$$x^T Q x \geq \mu$$
for $x \in S$. Now, define $z = \mu w + d$ and let $Z = \mathrm{diag}(z)$. For all $x \in S$,
$$x^T Z x = \mu\, x^T W x + x^T D x = \mu,$$
and therefore the problem becomes
$$\max\; \sum_{i=1}^{n} z_i$$
subject to
$$x^T (Q - Z) x \geq 0 \quad \mbox{for all } x \in \mathbb{R}^n.$$
Let $M(z) = Q - Z$. Observe that, since $\mathrm{tr}(M(z)) = \mathrm{tr}(Q) - \sum_{i=1}^{n} z_i$,
solving the above problem amounts to minimizing the trace of $M(z)$ while keeping
$M(z)$ positive semidefinite. Since $M(z)$ is real and symmetric, it has $n$ real
eigenvalues $\lambda_i(M(z))$, $i = 1, \ldots, n$. To ensure positive semidefiniteness, the
eigenvalues of $M(z)$ must be nonnegative. Hence, the above problem is reformulated as
$$\min\; \mathrm{tr}(M(z))$$
subject to
$$\lambda_i(M(z)) \geq 0, \quad i = 1, \ldots, n.$$
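
This trace minimization is itself a small semidefinite program, so before turning to the specialized method of Figure 13 it can be prototyped with an off-the-shelf modeling tool. A sketch assuming the cvxpy package with an SDP-capable solver installed; the returned value $\sum_i z_i$ is the lower bound $\mu$ on $f_{\min}$ derived above.

    import cvxpy as cp
    import numpy as np

    def trace_min_bound(Q):
        # max sum(z) subject to M(z) = Q - diag(z) PSD, i.e. min tr(M(z));
        # by the derivation above, sum(z) <= f_min = min over {-1,1}^n of x'Qx.
        n = Q.shape[0]
        z = cp.Variable(n)
        M = Q - cp.diag(z)
        cp.Problem(cp.Minimize(cp.trace(M)), [M >> 0]).solve()
        return float(np.sum(z.value))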

procedure qplb($Q$, $\epsilon$, $z$, opt)
1    $z^{(0)} = (\lambda_{\min}(Q) - 1)e$;
2    $v^{(0)} = 0$;
3    $M(z^{(0)}) = Q - Z^{(0)}$; $k = 0$;
4    do $\mathrm{tr}(M(z^{(k)})) - v^{(k)} \geq \epsilon$ $\rightarrow$
5        Construct $H^{(k)}$, where $H^{(k)}_{ij} = (e_i^T M(z^{(k)})^{-1} e_j)^2$;
6        $f^{(k)}(z) = 2n \ln(\mathrm{tr}(M(z^{(k)})) - v^{(k)}) - \ln \det M(z^{(k)})$;
7        $g^{(k)} = \nabla f^{(k)}(z^{(k)})$;
8        $\beta = \sqrt{0.5 / (g^{(k)T} H^{(k)-1} g^{(k)})}$;
9        Solve $H^{(k)} \Delta z = -\beta g^{(k)}$;
10       if $g^{(k)T} \Delta z < 0.5$ $\rightarrow$
11           Increase $v^{(k)}$ until $g^{(k)T} \Delta z = 0.5$;
12       fi;
13       $z^{(k+1)} = z^{(k)} + \Delta z$; $k = k + 1$;
14   od;
15   $z = z^{(k)}$; opt $= \mathrm{tr}(Q) - v^{(k)}$;
end qplb;

Figure 13: Procedure qplb: Interior point algorithm for computing lower bounds

Kamath and Karmarkar [66, 67] proposed an interior point approach to
solve the above trace minimization problem that takes no more than $O(nL)$
iterations, with two matrix inversions per iteration. Figure 13 shows
pseudo-code for this algorithm.
To analyze the algorithm, consider the parametric family of potential
functions given by
$$g(z, v) = 2n \ln(\mathrm{tr}(M(z)) - v) - \ln \det(M(z)),$$
where $v \in \mathbb{R}$ is a parameter. The algorithm generates a monotonically
increasing sequence of parameters $v^{(k)}$ that converges to the optimal value
$v^*$. The sequence $v^{(k)}$ is constructed together with the sequence $z^{(k)}$ of
interior points, as shown in the pseudo-code in Figure 13. Since $Q - Z^{(0)}$ is
a positive definite matrix, $v^{(0)} = 0 \leq v^*$ is used as the initial point in the
sequence.
Let $g_1^{(k)}(z, v)$ be the linear approximation of $g(z, v)$ at $z^{(k)}$. Then
$$g_1^{(k)}(z, v) = -\frac{2n}{\mathrm{tr}(M(z^{(k)})) - v}\, e^T z + \nabla \ln \det M(z^{(k)})^T z + C,$$
where $C$ is a constant. Kamath and Karmarkar show how $g_1^{(k)}(z, v)$ can
be reduced by a constant amount at each iteration. They prove that it is
possible to compute $v^{(k+1)} \in \mathbb{R}$ and a point $z^{(k+1)}$ in a closed ball of radius
$\delta$ centered at $z^{(k)}$ such that $v^{(k)} \leq v^{(k+1)} \leq v^*$ and
$$g_1^{(k)}(z^{(k+1)}, v^{(k+1)}) - g_1^{(k)}(z^{(k)}, v^{(k+1)}) \leq -\delta.$$

Using this fact, they show that, if $z^{(k)}$ is the current interior point and
$v^{(k)} \leq v^*$ is the current estimate of the optimal value, then
$$g(z^{(k+1)}, v^{(k+1)}) - g(z^{(k)}, v^{(k)}) \leq -\delta + \frac{\delta^2}{2(1 - \delta)},$$
where $z^{(k+1)}$ and $v^{(k+1)}$ are the new interior point and new estimate, respectively.
For example, with $\delta = 1/3$ the potential function decreases by at least $1/4$
at every iteration. This proves polynomial-time complexity for the algorithm.

9 Semidefinite Programming Relaxations


There has been a great deal of interest recently in solving semidefinite programming
relaxations of combinatorial optimization problems [5, 6, 40, 144,
145, 152, 54, 53, 57, 112, 123, 111]. The semidefinite relaxations are solved
by an interior point approach. These papers have shown the strength of
the relaxations, and some of these papers have discussed cutting plane and
branch and cut approaches using these relaxations. The bounds obtained
from semidefinite relaxations are often better than those obtained using lin-
ear programming relaxations, but they are also usually more expensive to
compute.
Semidefinite programming relaxations of some integer programming prob-
lems have proven to be very powerful, and they can often provide better
bounds than those given by linear programming relaxations. There has
been interest in semidefinite programming relaxations since at least the sev-
enties (see Lovász [89]). These were regarded as being purely of theoretical
interest until the recent development of interior point methods for semidefi-
nite programming problems [6, 106, 56, 107, 82, 139]. Interest was increased
further by Goemans and Williamson [40], who showed that the bounds gen-
erated by semidefinite programming relaxations for the maximum cut and
satisfiability problems were considerably better than those that could be
obtained from a linear programming relaxation, in the worst case, and that
the solutions to these relaxations could be exploited to generate good integer
solutions.
For an example of a semidefinite programming relaxation, consider the
quadratic integer programming problem
$$\min\; x^T Q x$$
subject to
$$x \in S = \{-1, 1\}^n,$$
where $Q \in \mathbb{R}^{n \times n}$ is symmetric, first discussed in equations (94) and (95) in
section 8.
We let $\mathrm{trace}(M)$ denote the trace of a square matrix $M$. By exploiting the
fact that $\mathrm{trace}(AB) = \mathrm{trace}(BA)$, we can rewrite the product $x^T Q x$:
$$x^T Q x = \mathrm{trace}(x^T Q x) = \mathrm{trace}(Q x x^T).$$
This lets us reformulate the quadratic program as
$$\begin{array}{lll} \min & \mathrm{trace}(QX) & \\ \mbox{subject to} & X = x x^T & \\ & X_{ii} = 1, & i = 1, \ldots, n. \end{array}$$

The constraint that $X = x x^T$ is equivalent to saying that $X$ must be positive
semidefinite and have rank equal to one. The rank constraint is hard to
enforce, so it is relaxed, keeping only the requirement that $X$ be positive
semidefinite, written $X \succeq 0$. This gives the following semidefinite
programming relaxation:
$$\begin{array}{lll} \min & \mathrm{trace}(QX) & \\ \mbox{subject to} & X_{ii} = 1, & i = 1, \ldots, n \\ & X \succeq 0. & \end{array}$$
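
A sketch of this relaxation, again assuming cvxpy with an SDP solver; the randomized rounding of a Gram factor at the end is in the spirit of Goemans and Williamson [40] and shows how the relaxation's solution can be turned into a feasible $\pm 1$ point (an illustration, not the exact procedure of any paper cited above).

    import cvxpy as cp
    import numpy as np

    def sdp_relaxation(Q, seed=0):
        # Solve min trace(QX), X_ii = 1, X PSD; return the lower bound and a
        # +/-1 point obtained by random-hyperplane rounding of a Gram factor.
        n = Q.shape[0]
        X = cp.Variable((n, n), symmetric=True)
        prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                          [cp.diag(X) == 1, X >> 0])
        prob.solve()
        lam, V = np.linalg.eigh(X.value)            # X = G G'
        G = V @ np.diag(np.sqrt(np.maximum(lam, 0.0)))
        r = np.random.default_rng(seed).standard_normal(n)
        x = np.where(G @ r >= 0.0, 1.0, -1.0)       # rounded +/-1 solution
        return prob.value, x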

Once we have a good relaxation, we can then (in principle) use a branch
and bound (or branch and cut) method to solve the problem to optimality.
Helmberg et al. [53] showed that the semidefinite programming relaxations
of general constrained 0-1 quadratic programming problems could
be strengthened by using valid inequalities of the cut-polytope. There are
a large number of such inequalities, and in [54], a branch and cut approach
using semidefinite relaxations is used to solve quadratic $\{-1, 1\}$ problems
with dense cost matrices. They branch using the criterion that two variables
either take the same value or they take opposite values. This splits
the current SDP relaxation into two SDP subproblems, each corresponding
to quadratic $\{-1, 1\}$ problems of dimension one less. They are able to solve
problems with up to about 100 variables in a reasonable amount of time.
Helmberg et al. [57] contains a nice discussion of different families of
constraints for semidefinite relaxations of the quadratic knapsack problem.
They derive semidefinite constraints from both the objective function and
from the knapsack constraint. Many of the semidefinite constraints derived
from the objective function are manipulations of the linear constraints for
the Boolean quadric polytope. Similarly, they derive semidefinite constraints
from known facets of the knapsack polytope. They attempt to determine
the relative importance of different families of constraints.
The bottleneck with the branch and cut approach is the time required
to solve each relaxation, and in particular to calculate the interior point
directions. One way to reduce this time is to fix variables at $-1$ or $1$, in much
the same way that variables with large reduced costs can be fixed when we
use a branch and cut algorithm that solves linear programming relaxations
at each node. Helmberg [52] has proposed a method to determine whether
a variable can be fixed when solving an SDP relaxation. This method
examines the dual to the SDP relaxation. If it appears that a variable should
be fixed at 1, say, then the effect of adding an explicit constraint that the
variable should take the value $-1$ is examined. The change in the dual value
that would result is then bounded; if this change is large enough then the
variable can be fixed at 1.
The papers [54, 53, 57, 18] all contain semidefinite relaxations of quadratic
programming problems with at most one constraint. By contrast, Wolkowicz
and Zhao [144, 145] and Zhao et al. [152] have looked at semidefinite relax-
ations of more complicated integer programming problems. This required
the development of some techniques that appear to be widely applicable.
For these problems, the primal semidefinite programming relaxation does
not have an interior feasible point, that is, there is no positive definite ma-
trix that satisfies all the constraints. This implies that the dual problem
will have an unbounded optimal face, so the problem is computationally
intractable for an interior point method. To overcome this difficulty, the au-
thors recast the problem in a lower dimensional space, where the barycenter
of the known integer solutions corresponds to an interior point. In particu-
lar, if X is the matrix of variables for the original semidefinite formulation, a
constant matrix V is determined so that the problem can be recast in terms
of a matrix Z of variables, with X = V ZVT and Z is of smaller dimension
than X. To ensure that the new problem corresponds to the original prob-
lem, a gangster operator is used, which forces some components of V ZVT
to be zero. With this reformulation, an interior point method can be used
successfully to solve the semidefinite relaxations. An extension of the gang-
ster operator may make it possible to use these relaxations in a branch and
cut approach.
Another interesting aspect of [145] is the development of an alternative,
slightly weaker, semidefinite relaxation that allows the exploitation of some
sparsity in the original matrix for the set covering problem. The resulting re-
laxation contains both semidefinite constraints and linear constraints. This
may make the semidefinite approach viable for problems of this type which
are far larger than those previously tackled with semidefinite programming
approaches. Whether this approach can be extended to other problems is
an interesting question.
Some work on attempting to exploit sparsity in the general setting has
been performed by Fujisawa et al. [38], and by Helmberg et al. [56] in
their MATLAB implementation. Zhao et al. [152, 151] propose using a
preconditioned conjugate gradient method to calculate the directions for
the quadratic assignment problem (QAP) within a primal-infeasible dual-
feasible variant of the method proposed in [56]. In the setting of solving
a QAP, the semidefinite relaxation is used to obtain a lower bound on the
optimal value; this bound is provided by the dual solution. Thus, only dual
feasibility is needed to get a lower bound, and so primal feasibility is not as


important, and it is possible to solve the Newton system of equations only
approximately while still maintaining dual feasibility. It should be possible
to extend this approach to other problems.
Zhao et al. [152] also developed another relaxation for the QAP which
contains a large number of constraints. This second relaxation is stronger
than the relaxation that uses the gangster operator, but because of the
number of constraints, they could only use it in a cutting plane algorithm.
Due to memory limitations, the gangster approach provided a better lower
bound than the other relaxation for larger problems.
One way to solve sparse problems using semidefinite programming tech-
niques is to look at the dual problem. Benson et al. [16] and Helmberg
and Rendl [55] have both recently proposed methods that obtain very good
bounds and sometimes optimal solutions for sparse combinatorial optimiza-
tion problems by looking at the dual problem or relaxations of the dual.
There are several freely available implementations of SDP methods.
Many of these codes are written in MATLAB. One of the major costs in
an iteration of an SDP algorithm is constructing the Newton system of
equations, with a series of for loops. MATLAB does not appear to han-
dle this well, because of the slowness of its interpreted loops: in compiled
C code, each iteration of these loops takes half a dozen machine language
instructions, while in the interpreted code, each pass through one of these
loops takes 100 or more instructions. For details of a freely available C
implementation, see [17].

10 Concluding Remarks
Optimization is of central importance both in the natural sciences, such as
physics, chemistry and biology, and in the artificial or man-made sciences,
such as computer science and operations research. Nature inherently seeks
optimal solutions. For instance, crystalline structure is the minimum energy
state for a set of atoms, and light travels through the shortest path. The be-
havior of nature can often be explained on the basis of variational principles.
Laws of nature then simply become optimality conditions. Concepts from
continuous mathematics have always played a central role in the description
of these optimality conditions and in analysis of the structure of their so-
lutions. On the other hand, in artificial sciences, the problems are stated
using the language of discrete mathematics or logic, and a simple-minded
search for their solution confines one to a discrete solution set.


With the advent of interior point methods, the picture is changing, be-
cause these methods do not confine their working to a discrete solution set,
but instead view combinatorial objects as limiting cases of continuous ob-
jects and exploit the topological and geometric properties of the continuous
space. As a result, the number of efficiently solvable combinatorial prob-
lems is expanding. Also, the interior-point methods have revealed that the
mathematical structure relevant to optimization in the natural and artificial
sciences has a great deal in common. Recent conferences on global optimization
(e.g. [35, 36]) are attracting researchers from diverse fields, ranging
from computer science to molecular biology, thus merging the development
paths of the natural and artificial sciences. The phenomenon of multiple solutions
to combinatorial problems is intimately related to the multiple configurations a
complex molecule can assume. Thus, understanding the structure of solu-
tion sets of nonlinear problems is a common challenge faced by both natural
and artificial sciences, to explain natural phenomena in the former case and
to create more efficient interior-point algorithms in the latter case.
In the last decade we have witnessed computational breakthroughs in
the approximate solution of large scale combinatorial optimization prob-
lems. Many of these breakthroughs are due to the development of interior
point algorithms and implementations. Starting with linear programming
in 1984 [72], these developments have spanned a wide range of problems,
including network flows, graph problems, and integer programming. There
is a continuing activity with new papers and codes being announced almost
on a daily basis. The interested reader can consult the following web sites:

1. http://www.mcs.anl.gov:80/home/otc/InteriorPoint
is an archive of technical reports and papers on interior-point methods,
maintained by S.J. Wright at Argonne National Laboratory.

2. http://www.zib.de/helmberg/semidef.html
contains a special home page for semidefinite programming organized by
C. Helmberg at the Berlin Center for Scientific Computing, Konrad-Zuse-Zentrum
für Informationstechnik, Berlin.

3. ftp://orion.uwaterloo.ca/pub/henry/reports/psd.bib.gz
contains a bib file with papers related to SDP.

Acknowledgement
The first author acknowledges support in part by ONR grant N00014-94-
1-0391, and by a grant from the Dutch NWO and by Delft University of
Technology for 1997-98, while visiting TWI/SSOR at Delft University of
Technology. The second author acknowledges support in part by NSF grant
BIR-9505913 and U.S. Department of the Air Force grant F08635-92-C-0032.

References
[1] W.P. Adams and T.A. Johnson. Improved linear programming-based
lower bounds for the quadratic assignment problem. In P.M. Pardalos
and H. Wolkowicz, editors, Quadratic assignment and related prob-
lems, volume 16 of DIMACS Series on Discrete Mathematics and
Theoretical Computer Science, pages 43-75. American Mathematical
Society, 1994.
[2] I. Adler, N. Karmarkar, M.G.C. Resende, and G. Veiga. Data struc-
tures and programming techniques for the implementation of Kar-
markar's algorithm. ORSA Journal on Computing, 1:84-106, 1989.
[3] I. Adler, N. Karmarkar, M.G.C. Resende, and G. Veiga. An implemen-
tation of Karmarkar's algorithm for linear programming. Mathematical
Programming, 44:297-335, 1989.
[4] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Net-
work Flows. Prentice Hall, Englewood Cliffs, NJ, 1993.
[5] F. Alizadeh. Optimization over positive semi-definite cone: Interior-
point methods and combinatorial applications. In P.M. Pardalos, ed-
itor, Advances in Optimization and Parallel Computing, pages 1-25.
North-Holland, Amsterdam, 1992.
[6] F. Alizadeh. Interior point methods in semidefinite programming with
applications to combinatorial optimization. SIAM Journal on Opti-
mization, 5(1):13-51, 1995.
[7] E. D. Andersen, J. Gondzio, C. Meszaros, and X. Xu. Implementa-
tion of interior point methods for large scale linear programming. In
T. Terlaky, editor, Interior Point Methods in Mathematical Program-
ming, chapter 6. Kluwer Academic Publishers, 1996.

[8] D. S. Atkinson and P. M. Vaidya. A cutting plane algorithm for convex


programming that uses analytic centers. Mathematical Programming,
69:1-43, 1995.

[9] O. Bahn, O. Du Merle, J. L. Goffin, and J. P. Vial. A cutting plane


method from analytic centers for stochastic programming. Mathemat-
ical Programming, 69:45-73, 1995.

[10] E. Balas, S. Ceria, and G. Cornuejols. A lift-and-project cutting


plane algorithm for mixed 0-1 programs. Mathematical Programming,
58:295-324, 1993.

[11] E. Balas, S. Ceria, G. Cornuejols, and N. Natraj. Gomory cuts revis-


ited. Operations Research Letters, 19:1-9, 1996.

[12] E.R. Barnes. A variation on Karmarkar's algorithm for solving lin-


ear programming problems. Mathematical Programming, 36:174-182,
1986.

[13] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear pro-


gramming, I. Affine and projective scaling trajectories. Transactions
of the American Mathematical Society, 314:499-526, 1989.

[14] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear


programming, II. Legendre transform coordinates and central trajec-
tories. Transactions of the American Mathematical Society, 314:527-
581, 1989.

[15] M.S. Bazaraa, J.J. Jarvis, and H.D. Sherali. Linear Programming and
Network Flows. Wiley, New York, NY, 1990.

[16] S. J. Benson, Y. Ye, and X. Zhang. Solving large-scale sparse semidef-


inite programs for combinatorial optimization. Technical report, De-
partment of Management Sciences, University of Iowa, Iowa City, Iowa
52242, September 1997.

[17] B. Borchers. CSDP, a C library for semidefinite programming. Tech-


nical report, Mathematics Department, New Mexico Tech, Socorro,
NM 87801, March 1997.

[18] B. Borchers, S. Joy, and J. E. Mitchell. Three methods to the exact
solution of max-sat problems. Talk given at INFORMS Conference,
Atlanta, 1996. Slides available from the URL
http://www.nmt.edu/~borchers/atlslides.ps.

[19] B. Borchers and J. E. Mitchell. Using an interior point method in


a branch and bound algorithm for integer programming. Technical
Report 195, Mathematical Sciences, Rensselaer Polytechnic Institute,
Troy, NY 12180, March 1991. Revised July 7, 1992.

[20] C. Chevalley. Theory of Lie Groups. Princeton University Press,


Princeton, New Jersey, 1946.

[21] T. Christof and G. Reinelt. Parallel cutting plane generation for the
tsp (extended abstract). Technical report, IWR Heidelberg, Germany,
1995.

[22] G. B. Dantzig. Maximization of a linear function of variables subject


to linear inequalities. In Tj. C. Koopmans, editor, Activity Analysis
of Production and Allocation, pages 339-347. Wiley, New York, 1951.

[23] G.B. Dantzig. Application of the simplex method to a transportation


problem. In T.C. Koopmans, editor, Activity Analysis of Production
and Allocation. John Wiley and Sons, 1951.

[24] M. Davis and H. Putnam. A computing procedure for quantification


theory. Journal of the ACM, 7:201-215, 1960.

[25] A. de Silva and D. Abramson. A parallel interior point method and its
application to facility location problems. Technical report, School of
Computing and Information Technology, Griffith University, Nathan,
QLD 4111, Australia, 1995.

[26] C. De Simone, M. Diehl, M. Jünger, P. Mutzel, G. Reinelt, and G. Ri-


naldi. Exact ground states of two-dimensional ±J Ising spin glasses.
Journal of Statistical Physics, 84:1363-1371, 1996.

[27] M. Deza and M. Laurent. Facets for the cut cone I. Mathematical
Programming, 56:121-160, 1992.

[28] M. Deza and M. Laurent. Facets for the cut cone II: Clique-web
inequalities. Mathematical Programming, 56:161-188, 1992.

[29] I. I. Dikin. Iterative solution of problems of linear and quadratic pro-


gramming. Doklady Akademiia Nauk SSSR, 174:747-748, 1967. En-
glish Translation: Soviet Mathematics Doklady, 1967, Volume 8, pp.
674-675.
[30] D. Z. Du, J. Gu, and P. M. Pardalos, editors. Satisfiability Problem:
Theory and Applications, volume 35 of DIMACS Series on Discrete
Mathematics and Theoretical Computer Science. American Mathe-
matical Society, 1997.
[31] Ding-Zhu Du and Panos M. Pardalos, editors. Network Optimization
Problems: Algorithms, Applications and Complexity. World Scientific,
1993.
[32] J. Edmonds. Maximum matching and a polyhedron with 0, 1 vertices.
Journal of Research National Bureau of Standards, 69B:125-130, 1965.
[33] A. S. El-Bakry, R. A. Tapia, and Y. Zhang. A study of indicators for
identifying zero variables in interior-point methods. SIAM Review,
36:45-72, 1994.
[34] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequen-
tial Unconstrained Minimization Techniques. John Wiley and Sons,
New York, 1968. Reprinted as Volume 4 of the SIAM Classics in
Applied Mathematics Series, 1990.

[35] C. Floudas and P. Pardalos. Recent Advances in Global Optimization.


Princeton Series in Computer Science. Princeton University Press,
1992.

[36] C. Floudas and P. Pardalos. State of the Art in Global Optimization:


Computational Methods and Applications. Kluwer Academic Publish-
ers, 1996.
[37] L.R. Ford and D.R. Fulkerson. Flows in Networks. Princeton Univer-
sity Press, Princeton, NJ, 1990.
[38] K. Fujisawa, M. Kojima, and K. Nakata. Exploiting sparsity in primal-
dual interior-point methods for semidefinite programming. Technical
report, Department of Mathematical and Computing Sciences, Tokyo
Institute of Technology, 2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152,
Japan, January 1997.

[39] A. George and M. Heath. Solution of sparse linear least squares prob-
lems using Givens rotations. Linear Algebra and Its Applications,
34:69-83, 1980.

[40] Michel X. Goemans and David P. Williamson. Improved Approxima-


tion Algorithms for Maximum Cut and Satisfiability Problems Using
Semidefinite Programming. J. Assoc. Comput. Mach., 42:1115-1145,
1995.

[41] J.-L. Goffin, J. Gondzio, R. Sarkissian, and J.-P. Vial. Solving non-
linear multicommodity network flow problems by the analytic center
cutting plane method. Mathematical Programming, 76:131-154, 1997.
[42] J.-L. Goffin, A. Haurie, and J.-P. Vial. Decomposition and nondif-
ferentiable optimization with the projective algorithm. Management
Science, 38:284-302, 1992.
[43] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. Further complexity analysis of
a primal-dual column generation algorithm for convex or quasiconvex
feasibility problems. Technical report, Faculty of Management, McGill
University, Montreal, Quebec, Canada, November 1993.
[44] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. On the complexity of a column
generation algorithm for convex or quasiconvex problems. In Large
Scale Optimization: The State of the Art. Kluwer Academic Publish-
ers, 1993.
[45] G.H. Golub and C.F. van Loan. Matrix Computations. The Johns
Hopkins University Press, Baltimore, MD, 1983.
[46] R. E. Gomory. Outline of an algorithm for integer solutions to linear
programs. Bulletin of the American Mathematical Society, 64:275-278,
1958.

[47] M. Grötschel and O. Holland. Solution of large-scale travelling sales-


man problems. Mathematical Programming, 51(2):141-202, 1991.
[48] M. Grötschel, M. Jünger, and G. Reinelt. A cutting plane algorithm
for the linear ordering problem. Operations Research, 32:1195-1220,
1984.

[49] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and


Combinatorial Optimization. Springer-Verlag, Berlin, Germany, 1988.

[50] G.M. Guisewite. Network problems. In Reiner Horst and Panos M.


Pardalos, editors, Handbook of global optimization. Kluwer Academic
Publishers, 1994.
[51] G.M. Guisewite and P.M. Pardalos. Minimum concave cost network
flow problems: Applications, complexity, and algorithms. Annals of
Operations Research, 25:75-100, 1990.

[52] C. Helmberg. Fixing variables in semidefinite relaxations. Techni-


cal Report SC-96-43, Konrad-Zuse-Zentrum fuer Informationstechnik,
Berlin, December 1996.

[53] C. Helmberg, S. Poljak, F. Rendl, and H. Wolkowicz. Combin-


ing semidefinite and polyhedral relaxations for integer programs. In
E. Balas and J. Clausen, editors, Integer Programming and Combina-
torial Optimization, Lecture Notes in Computer Science, volume 920,
pages 124-134. Springer, 1995.
[54] C. Helmberg and F. Rendl. Solving quadratic (0,1)-problems by
semidefinite programs and cutting planes. Technical Report SC-95-
35, Konrad-Zuse-Zentrum fuer Informationstechnik, Berlin, 1995.

[55] C. Helmberg and F. Rendl. A spectral bundle method for semidefinite


programming. Technical Report SC-97-37, Konrad-Zuse-Zentrum fuer
Informationstechnik, Berlin, August 1997. Revised: October 1997.

[56] C. Helmberg, F. Rendl, R. J. Vanderbei, and H. Wolkowicz. An in-


terior point method for semidefinite programming. SIAM Journal on
Optimization, 6:342-361, 1996.

[57] C. Helmberg, F. Rendl, and R. Weismantel. Quadratic knapsack relax-


ations using cutting planes and semidefinite programming. In W. H.
Cunningham, S. T. McCormick, and M. Queyranne, editors, Inte-
ger Programming and Combinatorial Optimization, Lecture Notes in
Computer Science, volume 1084, pages 175-189. Springer, 1996.

[58] F.L. Hitchcock. The distribution of a product from several sources to numerous
localities. Journal of Mathematics and Physics, 20:224-230, 1941.
[59] K. L. Hoffman and M. Padberg. Improving LP-representation of zero-
one linear programs for branch-and-cut. ORSA Journal on Computing,
3(2):121-134, 1991.

[60] E. Housos, C. Huang, and L. Liu. Parallel algorithms for the AT&T
KORBX System. AT&T Technical Journal, 68:37-47, 1989.
[61] D. S. Johnson and M. A. Trick, editors. Cliques, Coloring, and Sat-
isfiability: Second DIMACS Implementation Challenge, volume 26 of
DIMACS Series on Discrete Mathematics and Theoretical Computer
Science. American Mathematical Society, 1996.
[62] A. Joshi, A.S. Goldstein, and P.M. Vaidya. A fast implementation
of a path-following algorithm for maximizing a linear function over a
network polytope. In David S. Johnson and Catherine C. McGeoch,
editors, Network Flows and Matching: First DIMACS Implementation
Challenge, volume 12 of DIMACS Series in Discrete Mathematics and
Theoretical Computer Science, pages 267-298. American Mathemati-
cal Society, 1993.
[63] M. Jünger, G. Reinelt, and S. Thienel. Practical problem solving with
cutting plane algorithms in combinatorial optimization. In Combi-
natorial Optimization: DIMACS Series in Discrete Mathematics and
Theoretical Computer Science, pages 111-152. AMS, 1995.
[64] J.A. Kaliski and Y. Ye. A decomposition variant of the potential
reduction algorithm for linear programming. Management Science,
39:757-776, 1993.

[65] A. P. Kamath, N. K. Karmarkar, K. G. Ramakrishnan, and M. G. C.


Resende. Computational experience with an interior point algorithm
on the Satisfiability problem. Annals of Operations Research, 25:43-
58, 1990.

[66] A.P. Kamath and N. Karmarkar. A continuous method for computing


bounds in integer quadratic optimization problems. Journal of Global
Optimization, 2:229-241, 1992.
[67] A.P. Kamath and N. Karmarkar. An O(nL) iteration algorithm for
computing bounds in quadratic optimization problems. In P.M. Parda-
los, editor, Complexity in Numerical Optimization, pages 254-268.
World Scientific, Singapore, 1993.
[68] A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Re-
sende. A continuous approach to inductive inference. Mathematical
Programming, 57:215-238, 1992.

[69] A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Re-


sende. A continuous approach to inductive inference. Mathematical
Programming, 57:215-238, 1992.
[70] A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Re-
sende. An interior point approach to Boolean vector function synthe-
sis. In Proceedings of the 36th MSCAS, pages 185-189, 1993.
[71] A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Resende.
Computational experience with an interior point algorithm on the Sat-
isfiability problem. Annals of Operations Research, 25:43-58, 1990.
[72] N. Karmarkar. A new polynomial-time algorithm for linear program-
ming. Combinatorica, 4:373-395, 1984.
[73] N. Karmarkar. An interior-point approach for NP-complete problems.
Contemporary Mathematics, 114:297-308, 1990.
[74] N. Karmarkar. An interior-point approach to NP-complete prob-
lems. In Proceedings of the First Integer Programming and Combinato-
rial Optimization Conference, pages 351-366, University of Waterloo,
1990.
[75] N. Karmarkar. A new parallel architecture for sparse matrix compu-
tation based on finite projective geometries. In Proceedings of Super-
computing '91, pages 358-369. IEEE Computer Society, 1991.
[76] N. Karmarkar, J. Lagarias, L. Slutsman, and P. Wang. Power se-
ries variants of Karmarkar-type algorithms. AT&T Technical Journal,
68:20-36, 1989.

[77] N. Karmarkar, M.G.C. Resende, and K. Ramakrishnan. An interior


point algorithm to solve computationally difficult set covering prob-
lems. Mathematical Programming, 52:597-618, 1991.
[78] N. Karmarkar, M.G.C. Resende, and K.G. Ramakrishnan. An interior
point approach to the maximum independent set problem in dense
random graphs. In Proceedings of the XIII Latin American Conference
on Informatics, volume 1, pages 241-260, Santiago, Chile, July 1989.
[79] N. K. Karmarkar and K. G. Ramakrishnan. Computational results of
an interior point algorithm for large scale linear programming. Math-
ematical Programming, 52:555-586, 1991.

[80] J.L. Kennington and R.V. Helgason. Algorithms for network program-
ming. John Wiley and Sons, New York, NY, 1980.

[81] L. G. Khachiyan. A polynomial algorithm in linear programming.


Doklady Akademiia Nauk SSSR, 224:1093-1096, 1979. English Trans-
lation: Soviet Mathematics Doklady, Volume 20, pp. 1093-1096.

[82] M. Kojima, S. Shindoh, and S. Hara. Interior point methods for the
monotone semidefinite linear complementarity problem in symmetric
matrices. SIAM Journal on Optimization, 7:86-125, 1997.

[83] J.B. Kruskal. On the shortest spanning subtree of a graph and the traveling
salesman problem. Proceedings of the American Mathematical Society,
7:48-50, 1956.

[84] Eugene Lawler. Combinatorial Optimization: Networks and Matroids.


Holt, Rinehart and Winston, 1976.

[85] E. K. Lee and J. E. Mitchell. Computational experience in nonlinear


mixed integer programming. In Proceedings of Symposium on Opera-
tions Research, August 1996, Braunschweig, Germany, pages 95-100.
Springer-Verlag, 1996.

[86] K. Levenberg. A method for the solution of certain problems in least


squares. Quart. Appl. Math., 2:164-168, 1944.

[87] Y. Li, P.M. Pardalos, K.G. Ramakrishnan, and M.G.C. Resende.


Lower bounds for the quadratic assignment problem. Annals of Oper-
ations Research, 50:387-410, 1994.

[88] Y. Li, P.M. Pardalos, and M.G.C. Resende. A greedy randomized


adaptive search procedure for the quadratic assignment problem. In
P.M. Pardalos and H. Wolkowicz, editors, Quadratic assignment and
related problems, volume 16 of DIMACS Series on Discrete Mathe-
matics and Theoretical Computer Science, pages 237-262. American
Mathematical Society, 1994.

[89] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions


on Information Theory, 25:1-7, 1979.

[90] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Interior point meth-


ods for linear programming: Computational state of the art. ORSA
Journal on Computing, 6(1):1-14, 1994. See also the following commentaries
and rejoinder.

[91] D. Marquardt. An algorithm for least-squares estimation of nonlinear


parameters. SIAM J. Appl. Math., 11:431-441, 1963.

[92] S. Mehrotra. On the implementation of a (primal-dual) interior point


method. SIAM Journal on Optimization, 2(4):575-601, 1992.

[93] S. Mehrotra and J. Wang. Conjugate gradient based implementation


of interior point methods for network flow problems. In L. Adams
and J. Nazareth, editors, Linear and Nonlinear Conjugate Gradient
Related Methods. SIAM, 1995.

[94] J. E. Mitchell. Interior point algorithms for integer programming. In


J. E. Beasley, editor, Advances in Linear and Integer Programming,
chapter 6, pages 223-248. Oxford University Press, 1996.

[95] J. E. Mitchell. Interior point methods for combinatorial optimization.


In Tamas Terlaky, editor, Interior Point Methods in Mathematical
Programming, chapter 11, pages 417-466. Kluwer Academic Publish-
ers, 1996.

[96] J. E. Mitchell. Computational experience with an interior point cut-


ting plane algorithm. Technical report, Mathematical Sciences, Rens-
selaer Polytechnic Institute, Troy, NY 12180-3590, February 1997.
Revised: April 1997.

[97] J. E. Mitchell. Fixing variables and generating classical cutting planes


when using an interior point branch and cut method to solve integer
programming problems. European Journal of Operational Research,
97:139-148, 1997.

[98] J. E. Mitchell. An interior point cutting plane algorithm for Ising spin
glass problems. Technical report, Mathematical Sciences, Rensselaer
Polytechnic Institute, Troy, NY 12180-3590, July 1997.

[99] J. E. Mitchell and B. Borchers. Solving real-world linear ordering prob-


lems using a primal-dual interior point cutting plane method. Annals
of Operations Research, 62:253-276, 1996.

[100] J. E. Mitchell and B. Borchers. Solving linear ordering problems with


a combined interior point/simplex cutting plane algorithm. Technical
report, Mathematical Sciences, Rensselaer Polytechnic Institute, Troy,
NY 12180-3590, September 1997.

[101] J. E. Mitchell and M. J. Todd. Solving combinatorial optimization


problems using Karmarkar's algorithm. Mathematical Programming,
56:245-284, 1992.

[102] R. D. C. Monteiro and I. Adler. Interior path following primal-dual


algorithms. Part I: Linear programming. Mathematical Programming,
44(1):27-41, 1989.

[103] R. D. C. Monteiro, I. Adler, and M. G. C. Resende. A polynomial-time


primal-dual affine scaling algorithm for linear and convex quadratic
programming and its power series extension. Mathematics of Opera-
tions Research, 15(2):191-214, 1990.

[104] J.J. Moré and D.C. Sorensen. Computing a trust region step. SIAM
J. Sci. Stat. Comput., 4:553-572, 1983.

[105] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Opti-


mization. John Wiley, New York, 1988.

[106] Y. E. Nesterov and A. S. Nemirovsky. Interior Point Polynomial Meth-


ods in Convex Programming : Theory and Algorithms. SIAM Publi-
cations. SIAM, Philadelphia, USA, 1993.

[107] Y. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-


point methods for convex programming. Mathematics of Operations
Research, 22:1-42, 1997.

[108] M. W. Padberg and G. Rinaldi. A branch-and-cut algorithm for the


resolution of large-scale symmetric traveling salesman problems. SIAM
Review, 33(1):60-100, 1991.

[109] R. Pai, N. K. Karmarkar, and S. S. S. P. Rao. A global router for gate-


arrays based on Karmarkar's interior point methods. In Proceedings
of the Third International Workshop on VLSI System Design, pages
73-82, 1990.

[110] P. M. Pardalos, K. G. Ramakrishnan, M. G. C. Resende, and Y. Li. Im-


plementation of a variance reduction-based lower bound in a branch-
and-bound algorithm for the quadratic assignment problem. SIAM
Journal on Optimization, 7:280-294, 1997.
[111] P. M. Pardalos and M.G.C. Resende. Interior point methods for global
optimization problems. In T. Terlaky, editor, Interior Point Methods
of Mathematical Programming, pages 467-500. Kluwer Academic Pub-
lishers, 1996.

[112] P. M. Pardalos and H. Wolkowicz, editors. Topics in Semidefinite


and Interior-Point Methods. Fields Institute Communications Series.
American Mathematical Society, New Providence, Rhode Island, 1997.
[113] P.M. Pardalos. Continuous approaches to discrete optimization prob-
lems. In G. Di Pillo and F. Giannessi, editors, Nonlinear optimization
and applications. Plenum Publishing, 1996.
[114] P.M. Pardalos and H. Wolkowicz, editors. Quadratic assignment and
related problems, volume 16 of DIMACS Series in Discrete Mathe-
matics and Theoretical Computer Science. American Mathematical
Society, 1994.
[115] R. G. Parker and R. L. Rardin. Discrete Optimization. Academic
Press, San Diego, CA 92101, 1988.

[116] L. Portugal, F. Bastos, J. Judice, J. Paixao, and T. Terlaky. An


investigation of interior point algorithms for the linear transportation
problem. SIAM J. Sci. Computing, 17:1202-1223, 1996.

[117] L. Portugal, M.G.C. Resende, G. Veiga, and J. Judice. An efficient im-


plementation of an infeasible primal-dual network flow method. Tech-
nical report, AT&T Bell Laboratories, Murray Hill, New Jersey, 1994.

[118] R.C. Prim. Shortest connection networks and some generalizations.


Bell System Technical Journal, 36:1389-1401, 1957.
[119] K. G. Ramakrishnan, M. G. C. Resende, and P. M. Pardalos. A branch
and bound algorithm for the quadratic assignment problem using a
lower bound based on linear programming. In C. Floudas and P.M.
Pardalos, editors, State of the Art in Global Optimization: Computa-
tional Methods and Applications. Kluwer Academic Publishers, 1995.

[120] K.G. Ramakrishnan, N.K. Karmarkar, and A.P. Kamath. An approx-


imate dual projective algorithm for solving assignment problems. In
David S. Johnson and Catherine C. McGeoch, editors, Network Flows
and Matching: First DIMACS Implementation Challenge, volume 12
of DIMACS Series in Discrete Mathematics and Theoretical Computer
Science, pages 431-451. American Mathematical Society, 1993.

[121] K.G. Ramakrishnan, M.G.C. Resende, and P.M. Pardalos. A branch


and bound algorithm for the quadratic assignment problem using a
lower bound based on linear programming. In State of the Art in
Global Optimization: Computational Methods and Applications, pages
57-73. Kluwer Academic Publishers, 1996.

[122] K.G. Ramakrishnan, M.G.C. Resende, B. Ramachandran, and J.F.


Pekny. Tight QAP bounds via linear programming. In From Local to
Global Optimization. Kluwer Academic Publishers, 1998. To appear.

[123] M. Ramana and P. M. Pardalos. Semidefinite programming. In T. Ter-


laky, editor, Interior Point Methods of Mathematical Programming,
pages 369-398. Kluwer Academic Publishers, 1996.

[124] S. Ramaswamy and J. E. Mitchell. On updating the analytic center


after the addition of multiple cuts. Technical Report 37-94-423, DSES,
Rensselaer Polytechnic Institute, Troy, NY 12180, October 1994.

[125] S. Ramaswamy and J. E. Mitchell. A long step cutting plane algorithm


that uses the volumetric barrier. Technical report, DSES, Rensselaer
Polytechnic Institute, Troy, NY 12180, June 1995.

[126] M.G.C. Resende, P.M. Pardalos, and Y. Li. FORTRAN subroutines


for approximate solution of dense quadratic assignment problems using
GRASP. ACM Transactions on Mathematical Software, To appear.

[127] M.G.C. Resende, K.G. Ramakrishnan, and Z. Drezner. Computing


lower bounds for the quadratic assignment problem with an inte-
rior point algorithm for linear programming. Operations Research,
43(5):781-791, 1995.

[128] M.G.C. Resende, T. Tsuchiya, and G. Veiga. Identifying the optimal


face of a network linear program with a globally convergent interior
point method. In W.W. Hager, D.W. Hearn, and P.M. Pardalos, edi-
tors, Large scale optimization: State of the art, pages 362-387. Kluwer
Academic Publishers, 1994.
[129] M.G.C. Resende and G. Veiga. Computing the projection in an in-
terior point algorithm: An experimental comparison. Investigación
Operativa, 3:81-92, 1993.
[130] M.G.C. Resende and G. Veiga. An efficient implementation of a net-
work interior point method. In David S. Johnson and Catherine C. Mc-
Geoch, editors, Network Flows and Matching: First DIMACS Imple-
mentation Challenge, volume 12 of DIMACS Series in Discrete Math-
ematics and Theoretical Computer Science, pages 299-348. American
Mathematical Society, 1993.
[131] M.G.C. Resende and G. Veiga. An implementation of the dual affine
scaling algorithm for minimum cost flow on bipartite uncapacitated
networks. SIAM Journal on Optimization, 3:516-537, 1993.
[132] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Lin-
ear Optimization: An Interior Point Approach. John Wiley, Chich-
ester, 1997.
[133] Günther Ruhe. Algorithmic Aspects of Flows in Networks. Kluwer
Academic Publishers, Boston, MA, 1991.
[134] C.-J. Shi, A. Vannelli, and J. Vlach. An improvement on Karmarkar's
algorithm for integer programming. In P.M. Pardalos and M.G.C. Re-
sende, editors, COAL Bulletin - Special issue on Computational As-
pects of Combinatorial Optimization, volume 21, pages 23-28. Mathe-
matical Programming Society, 1992.
[135] L.P. Sinha, B.A. Freedman, N.K. Karmarkar, A. Putcha, and K.G.
Ramakrishnan. Overseas network planning. In Proceedings of the
Third International Network Planning Symposium - NETWORKS'86,
pages 8.2.1-8.2.4, June 1986.
[136] R. Van Slyke and R. Wets. L-shaped linear programs with applications
to optimal control and stochastic linear programs. SIAM Journal on
Applied Mathematics, 17:638-663, 1969.
[137] R.E. Tarjan. Data Structures and Network Algorithms. Society for
Industrial and Applied Mathematics, Philadelphia, PA, 1983.

[138] P. M. Vaidya. A new algorithm for minimizing convex functions over


convex sets. Mathematical Programming, 73:291-341, 1996.
[139] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM
Review, 38:49-95, 1996.
[140] R. J. Vanderbei. Linear Programming: Foundations and Extensions.
Kluwer Academic Publishers, Boston, 1996.
[141] R.J. Vanderbei, M.S. Meketon, and B.A. Freedman. A modification of
Karmarkar's linear programming algorithm. Algorithmica, 1:395-407,
1986.
[142] J.P. Warners, T. Terlaky, C. Roos, and B. Jansen. Potential reduc-
tion algorithms for structured combinatorial optimization problems.
Operations Research Letters, 21:55-64, 1997.
[143] J.P. Warners, T. Terlaky, C. Roos, and B. Jansen. A potential reduc-
tion approach to the frequency assignment problem. Discrete Applied
Mathematics, 78:251-282, 1997.
[144] H. Wolkowicz and Q. Zhao. Semidefinite programming relaxations for
the graph partitioning problem. Technical report, Combinatorics and
Optimization, University of Waterloo, Waterloo, Ontario, N2L 3Gl
Canada, October 1996.
[145] H. Wolkowicz and Q. Zhao. Semidefinite programming relaxations for
the set partitioning problem. Technical report, Combinatorics and
Optimization, University of Waterloo, Waterloo, Ontario, N2L 3Gl
Canada, October 1996.
[146] S. Wright. Primal-dual interior point methods. SIAM, Philadelphia,
1996.
[147] Y. Ye. On affine scaling algorithms for nonconvex quadratic program-
ming. Mathematical Programming, 56:285-300, 1992.
[148] Y. Ye. Interior Point Algorithms: Theory and Analysis. John Wiley,
New York, 1997.
[149] Quey-Jen Yeh. A reduced dual affine scaling algorithm for solving
assignment and transportation problems. PhD thesis, Columbia Uni-
versity, New York, NY, 1989.

[150] Y. Zhang. On the convergence of a class of infeasible interior-point


methods for the horizontal linear complementarity problem. SIAM
Journal on Optimization, 4(1):208-227, 1994.

[151] Q. Zhao. Semidefinite programming for assignment and partitioning


problems. PhD thesis, Combinatorics and Optimization, University of
Waterloo, Waterloo, Ontario, N2L 3G1, Canada, 1996.

[152] Q. Zhao, S. E. Karisch, F. Rendl, and H. Wolkowicz. Semidefinite pro-


gramming relaxations for the quadratic assignment problem. Technical
Report 95-27, Combinatorics and Optimization, University of Water-
loo, Waterloo, Ontario, N2L 3G1, Canada, September 1996.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 299-428
©1998 Kluwer Academic Publishers

Knapsack Problems
David Pisinger
DIKU, University of Copenhagen
Universitetsparken 1
DK-2100 Copenhagen
E-mail: pisinger@diku.dk

Paolo Toth
DEIS, University of Bologna
Viale Risorgimento 2
I-40136 Bologna
E-mail: paolo@deis.unibo.it

Contents
1 Introduction 302
  1.1 Historical Overview 302
  1.2 Applications 304
  1.3 The Problems 306
  1.4 NP-hardness and Solvability 309
  1.5 Fundamental Properties of the Knapsack Problems 310
  1.6 Experimental Comparisons 314
  1.7 Notation 316
  1.8 Overview of the Chapter 317

2 0-1 Knapsack Problem 318
  2.1 Upper Bounds 320
    2.1.1 Lagrangian Relaxation 322
    2.1.2 Tighter Bounds 323
    2.1.3 Bounds from Minimum and Maximum Cardinality 324
  2.2 Heuristics 329
  2.3 Reduction 331
  2.4 Branch-and-bound Algorithms 332
  2.5 Dynamic Programming Algorithms 336
    2.5.1 Primal-dual Dynamic Programming 336
    2.5.2 Horowitz and Sahni Subdivision 337
    2.5.3 Other Dynamic Programming Algorithms 337
  2.6 Solution of Large-sized Problems 338
    2.6.1 Deriving a Core 339
    2.6.2 Fixed-core Algorithms 342
    2.6.3 Expanding-core Algorithms 343
    2.6.4 Inconveniences in Core Problems 344
  2.7 Solution of Hard Problems 345
  2.8 Approximation Schemes 346
  2.9 Computational Experiments 348

3 Subset-sum Problem 351
  3.1 Upper Bounds 353
  3.2 Dynamic Programming Algorithms 354
    3.2.1 Horowitz and Sahni Decomposition 355
    3.2.2 Balancing 356
  3.3 Hybrid Algorithms 359
  3.4 Solution of Large-sized Instances 361
  3.5 Computational Experiments 362

4 Multiple-choice Knapsack Problem 364
  4.1 Upper Bounds 367
    4.1.1 Linear Time Algorithms for the Continuous Problem 368
    4.1.2 Bounds from Lagrangian Relaxation 371
    4.1.3 Other Bounds 372
  4.2 Heuristics 372
  4.3 Class Reduction 373
  4.4 Branch-and-bound Algorithms 373
  4.5 Dynamic Programming Algorithms 374
  4.6 Reduction of States 376
  4.7 Solution of Large-sized Instances 377
  4.8 Computational Experiments 379

5 Bounded Knapsack Problem 382
  5.1 Upper Bounds 383
  5.2 Heuristics 385
  5.3 Reduction 386
  5.4 Branch-and-bound Algorithms 387
  5.5 Dynamic Programming 388
  5.6 Reduction of States 390
  5.7 Solution of Large-sized Problems 391
  5.8 Computational Experiments 392

6 Unbounded Knapsack Problem 394
  6.1 Upper Bounds 395
  6.2 Heuristics 396
  6.3 Dynamic Programming 397
  6.4 Branch-and-bound 397
  6.5 Reduction Algorithms 398
    6.5.1 Reductions from Bounds 399
  6.6 Solution of Large-sized Instances 400
  6.7 Computational Experiments 400

7 Multiple Knapsack Problem 402
  7.1 Upper Bounds 403
  7.2 Tightening Constraints 406
  7.3 Reduction Algorithms 407
  7.4 Heuristics and Approximate Algorithms 408
  7.5 Dynamic Programming 409
  7.6 Branch-and-bound Algorithms 410
    7.6.1 The mtm Algorithm 410
    7.6.2 The mulknap Algorithm 412
  7.7 Computational Experiments 413

8 Conclusion and Future Trends 417

References

Abstract
Knapsack Problems are the simplest NP-hard problems in Combinatorial
Optimization, as they maximize an objective function subject to a single
resource constraint. Several variants of the classical 0-1 Knapsack Problem
will be considered with respect to relaxations, bounds, reductions and other
algorithmic techniques for their exact solution. Computational results are
presented to compare the actual performance of the most effective algorithms
published.

1 Introduction
Knapsack Problems have been intensively studied since the emergence of
Combinatorial Optimization, both because of their immediate applications
in industry and financial management, and even more for theoretical rea-
sons, as Knapsack Problems often occur by relaxation of different integer
programming problems. In such applications, we need to solve a Knapsack
Problem each time a bounding function is derived, demanding extremely
fast solution times of the Knapsack algorithm.
The problems in the knapsack family all consider a set of items, each item j
having an associated profit p_j and weight w_j. The problem is then to choose
a subset of the given items such that the corresponding profit sum is maxi-
mized without exceeding the capacity c of the knapsack(s). Different types
of Knapsack Problems occur depending on the distribution of items and
knapsacks: In the 0-1 Knapsack Problem each item may be chosen at most
once, while in the Bounded Knapsack Problem we have a bounded amount
of each item type. The Multiple-choice Knapsack Problem occurs when the
items should be chosen from disjoint classes and, if several knapsacks are to
be filled simultaneously we get the Multiple Knapsack Problem. The most
general form is the Multi-constrained Knapsack Problem, which is basically
a general Integer Programming (IP) Problem with non-negative coefficients.
Although Knapsack Problems, from a theoretical point of view, are al-
most intractable as they belong to the family of NP-hard problems, several
of the problems may be solved to optimality in fractions of a second. This
surprising result is the outcome of several decades of research which has ex-
posed the special structural properties of Knapsack Problems that make the
problems so easy to solve. The intention of this chapter is to state several
of these properties and show how they influence the solution methods.

1.1 Historical Overview


Knapsack Problems are some of the most intensively studied discrete opti-
mization problems. The reason for such interest basically derives from three
facts: (a) they can be viewed as the simplest Integer Linear Programming
problem; (b) they appear as a subproblem in many more complex problems;
(c) they represent many practical situations.
Recently, Knapsack Problems have been used for generating minimal
cover induced constraints (see, e.g., Crowder, Johnson and Padberg [16])
and in several coefficient reduction procedures for strengthening LP bounds

in general integer programming (see, e.g., Dietrich and Escudero [19, 20]).
During the last few decades, Knapsack Problems have been studied through
different approaches, according to the theoretical development of Combina-
torial Optimization.
In the fifties, Bellman's dynamic programming theory produced the first
algorithms to exactly solve the 0-1 Knapsack Problem. In 1957 Dantzig gave
an elegant and efficient method to determine the solution of the continuous
relaxation of the problem, and hence a bound on the integer solution value.
The bound was used in the following twenty years in almost all studies on
KP.
In the sixties, the dynamic programming approach to the KP and other
knapsack-type problems was deeply investigated by Gilmore and Gomory.
In 1967 Kolesar experimented with the first branch-and-bound algorithm for
the problem.
In the seventies, the branch-and-bound approach was further developed,
proving to be capable of solving problems with a large number of variables.
The most well-known algorithm of this period was developed by Horowitz
and Sahni. In 1973 Ingargiola and Korsh presented the first reduction pro-
cedure, a preprocessing algorithm which significantly reduces the number
of variables. In 1974 Johnson gave the first polynomial-time approximation
scheme for the subset-sum problem; the result was extended by Sahni to
the 0-1 Knapsack Problem. The first fully polynomial-time approximation
scheme was obtained by Ibarra and Kim in 1975. In 1977 Martello and Toth
proposed the first upper bound dominating the value of the continuous re-
laxation.
The main results of the eighties concern the solution of large-sized prob-
lems, for which sorting of the variables (required to solve the continuous
solution) takes a very high percentage of the running time. In 1980 Balas
and Zemel presented a new approach to solve the problem by sorting, in
many cases, only a small subset of the variables (the core problem). Several
efficient algorithms were designed based on this idea.
The current decade has been focused on solving difficult instances of
reasonable size instead of extremely large easy instances. Martello and Toth
showed in 1993 how upper bounds derived from Lagrangian relaxation of
cardinality bounds may help solve several difficult problems.
Dynamic Programming has also been accepted as an efficient solution
technique for two reasons: by incorporating bounding rules in the enumer-
ation, the number of states can often be held at a reasonable level, and new
improved recursions can focus the enumeration on those items which are

most interesting.
The outcome of the latest research has been some algorithms with very
stable overall performance. On the theoretical frontier the nineties have
brought algorithms with improved worst-case time bounds. In 1995 Pisinger
was the first to present a dynamic programming recursion for the subset-sum
problem which has better worst-case complexity than Bellman's classical
recursion, and many of these results can be generalized to other Knapsack
Problems.

Each year numerous papers on Knapsack Problems are presented, and sev-
eral new variants of the classical problem are considered. Since every prob-
lem with a single weight constraint can be seen as some kind of Knapsack
Problem, recent papers have considered quadratic versions of the KP, as well
as collapsing, nested, bottleneck, graph and tree Knapsack Problems, to
mention only a few. The techniques applied for these more sophisticated
problems may vary a lot, but it is interesting to see that many ideas from
the 0-1 Knapsack Problem are applicable even in the generalized versions.
While the nineties have brought efficient algorithms to solve difficult
problems in reasonable time, the theoretical work is far behind. For Knap-
sack Problems there is a very large gap between the theoretical worst-case
performance of the best algorithms and the practical ability to solve large
and difficult problems in reasonable time. Thus, there are still several the-
oretical and practical problems to be solved.

1.2 Applications
Knapsack Problems have numerous applications in theory as well as in prac-
tice. From a theoretical point of view, the simple structure calls for exploita-
tion of numerous interesting properties, that can make the problems easier to
solve. Knapsack Problems also arise as subproblems in several more complex
algorithms in combinatorial optimization, and these algorithms will benefit
from any improvement in this field.
Despite the name, practical applications of Knapsack Problems are not
limited to packing problems: Assume that you may invest in n projects,
each giving the profit p_j. It costs w_j to invest in project j, and you have
only c dollars available. The best projects for investment may be found by
solving a 0-1 Knapsack Problem.
Another application appears in a restaurant, where a person has to
choose k courses without surpassing the amount of c calories that his diet

prescribes. Assume that there are N_i dishes to choose from for each course
i = 1, \ldots, k, that w_{ij} is the nutritive value and p_{ij} a rating saying how good
each dish tastes. Then an optimal meal may be found by solving the corre-
sponding Multiple-choice Knapsack Problem [109].

A two-processor scheduling problem, where a number of jobs have to be


divided among two processors such that the completion time is minimized,
may be solved as a Subset-sum Problem [77]. Cassette recorders have been
introduced which are able to select a number of songs from a CD such that
the longest possible play time will be recorded. This algorithm also solves
some kinds of Subset-sum Problem.

Apart from these simple illustrations, Knapsack Problems are frequently


used in the following industrial fields: Problems in cargo loading, cutting
stock, budget control, and financial management may be formulated as
Knapsack Problems, where the specific model depends on the side con-
straints present. Sinha and Zoltners [109] proposed using Multiple-choice
Knapsack Problems to select which components should be linked in series
in order to maximize fault tolerance. In several two- and three-dimensional
cutting and packing problems, the Knapsack Problem is used heavily as a
subproblem to find an optimal partition in layers or strips (Gilmore and Go-
mory [43]); Diffie and Hellman [21] designed a public-key cryptographic scheme
whose security relies on the difficulty of solving the Subset-sum Problem.

The more theoretical applications appear either where a general problem


is transformed into a Knapsack Problem, or where the Knapsack Problem
appears as a subproblem, e.g. for deriving bounds in a branch-and-bound
algorithm intended to solve a more complex problem. In the first cate-
gory, Mathews [84] a century ago showed how several constraints may be
aggregated into one single knapsack constraint, making it possible to solve
any IP Problem as a 0-1 Knapsack Problem. Moreover, Nauss [89] proposed
transforming nonlinear Knapsack Problems into Multiple-choice Knapsack
Problems. In the second category we should mention that the 0-1 Knapsack
Problem appears as a subproblem when solving the Generalized Assignment
Problem, which is also heavily used when solving Vehicle Routing Problems
[70]. Knapsack Problems also occur as a subproblem in airline scheduling
problems [53], production planning problems [108], clustering and graph
partitioning problems [32] and in the design of some electronic circuits [31].

1.3 The Problems


All Knapsack Problems consider a set of items with associated profit p_j
and weight w_j. A subset of the items is to be chosen such that the weight
sum does not exceed the capacity c of the knapsack, and such that the
largest possible profit sum is obtained. We will assume that all coefficients
p_j, w_j, c are positive integers, although weaker assumptions may sometimes
be handled in the individual problems.
The 0-1 Knapsack Problem is the problem of choosing some of the n
items such that the corresponding profit sum is maximized without the
weight sum exceeding the capacity c. Thus, it may be formulated as the
following maximization problem:

    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (1)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n,
where x_j is a binary variable having value 1 if item j should be included in
the knapsack, and 0 otherwise. If we have a bounded amount m_j of each
item type j, then the Bounded Knapsack Problem appears:

    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (2)
                x_j \in \{0, 1, \ldots, m_j\}, \quad j = 1,\ldots,n.
Here x_j gives the amount of each item type that should be included in the
knapsack in order to obtain the largest objective value. The Unbounded
Knapsack Problem is a special case of the Bounded Knapsack Problem,
since an unlimited amount of each item type is available:

    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (3)
                x_j \ge 0 \text{ integer}, \quad j = 1,\ldots,n.
Actually, any variable x_j of an Unbounded Knapsack Problem will be bounded
by the capacity c, as the weight of each item is at least one. But generally
there is no benefit in transforming an Unbounded Knapsack Problem into
its bounded version.
Another generalization of the 0-1 Knapsack problem is to choose exactly
one item j from each class N_i, i = 1, \ldots, k, such that the profit sum is max-
imized. This gives the Multiple-choice Knapsack Problem which is defined
Knapsack Problems 307

as
    maximize    \sum_{i=1}^{k} \sum_{j \in N_i} p_{ij} x_{ij}
    subject to  \sum_{i=1}^{k} \sum_{j \in N_i} w_{ij} x_{ij} \le c,
                \sum_{j \in N_i} x_{ij} = 1, \quad i = 1,\ldots,k,   (4)
                x_{ij} \in \{0,1\}, \quad i = 1,\ldots,k, \ j \in N_i.
Here p_{ij} and w_{ij} are the profit and weight of item j in class i, while the
binary variable x_{ij} is one if item j is chosen in class i, and zero otherwise.
The constraint \sum_{j \in N_i} x_{ij} = 1, i = 1,\ldots,k, ensures that exactly one item is
chosen from each class.
If the profit p_j equals the weight w_j for each item j in a 0-1 Knapsack
Problem, then we obtain the Subset-sum Problem, which may be formulated
as:
    maximize    \sum_{j=1}^{n} w_j x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (5)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n.
The name indicates that it may also be seen as the problem of choosing
a subset of the values w_1, \ldots, w_n such that the sum is the largest possible
without exceeding c.
Now imagine a cashier who has to give back an amount of money c by
using the least possible number of coins of values w_1, \ldots, w_n. The Change-
making Problem is then defined as

    minimize    \sum_{j=1}^{n} x_j
    subject to  \sum_{j=1}^{n} w_j x_j = c,                          (6)
                x_j \ge 0 \text{ integer}, \quad j = 1,\ldots,n,

where w_j is the face value of coin j, and we assume that an unlimited amount
of each coin is available. The optimal number of coins of each type j that
should be used is then given by x_j.
If we choose some of n items to pack in m knapsacks of (maybe) different
capacity c_i, such that the largest possible profit sum is obtained, then we
get the Multiple Knapsack Problem:

    maximize    \sum_{i=1}^{m} \sum_{j=1}^{n} p_j x_{ij}
    subject to  \sum_{j=1}^{n} w_j x_{ij} \le c_i, \quad i = 1,\ldots,m,
                \sum_{i=1}^{m} x_{ij} \le 1, \quad j = 1,\ldots,n,   (7)
                x_{ij} \in \{0,1\}, \quad i = 1,\ldots,m, \ j = 1,\ldots,n.

Here x_{ij} = 1 says that item j should be packed into knapsack i (x_{ij} = 0
otherwise), while the constraint \sum_{j=1}^{n} w_j x_{ij} \le c_i ensures that the capacity
constraint of each knapsack i is respected. The constraint \sum_{i=1}^{m} x_{ij} \le 1
ensures that each item j is chosen at most once.
A very useful model is the Bin-packing Problem where all the n items
should be packed in a number of equally large bins, such that the number
of bins used is the smallest possible. Thus we have
minimize "n
.L..i=1 y,.
subject to E']=1 WjXij ::5 CYi, i = 1, ... ,n,
'Er=1 Xij = 1, j = 1, ... ,n, (8)
Yi E {0,1}, i = 1, ... ,n,
Xij E {0,1}, i = 1, ... ,n, j = 1, ... ,n,
where y_i = 1 indicates that bin i is used (y_i = 0 otherwise), and x_{ij} says
that item j should be packed in bin i. The constraint \sum_{i=1}^{n} x_{ij} = 1 ensures
that every item j is packed exactly once, while the inequalities \sum_{j=1}^{n} w_j x_{ij} \le c y_i
ensure that the capacity constraint is respected for all bins that are actually
used.
The most general form of a Knapsack Problem is the Multi-constrained
Knapsack Problem, which is basically a general Integer Programming Prob-
lem where all coefficients p_j, w_{ij} and c_i are nonnegative integers. Thus, it
may be formulated as
    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} w_{ij} x_j \le c_i, \quad i = 1,\ldots,m,   (9)
                x_j \ge 0 \text{ integer}, \quad j = 1,\ldots,n.
Gavish and Pirkul [41] consider different relaxations of this problem, and
propose an exact algorithm. Approximation algorithms are considered in
Frieze and Clarke [38] as well as Plotkin, Shmoys and Tardos [107]. If the
number of constraints is m = 2 then the Bidimensional Knapsack Problem
appears. Exact solution techniques for this problem are presented in e.g.
Freville and Plateau [37].
The Quadratic Knapsack Problem presented by Gallo, Hammer and
Simeone [39] is an example of a Knapsack Problem with a quadratic ob-
jective function. It may be stated as
    maximize    \sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij} x_i x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (10)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n.

Here p_{ij} is the profit obtained if both items i and j are chosen, while w_j is
the weight of item j. The Quadratic Knapsack Problem is a knapsack coun-
terpart to the Quadratic Assignment Problem, and the problem has sev-
eral applications in telecommunication and hydrological studies. Caprara,
Pisinger and Toth [11] have recently presented an exact algorithm for large
instances with up to n = 400 variables.
Other related problems within the family of Knapsack Problems are: The
Collapsing Knapsack Problem which is considered in Fayard and Plateau [30]
and Pferschy, Pisinger, Woeginger [95], and the Nested Knapsack Problem,
treated in Dudzinski and Walukiewicz [24] together with several general-
izations. Morin and Marsten [85] as well as Hochbaum [52] report some
results on Nonlinear Knapsack Problems. Burkard and Pferschy consider
the Inverse-parametric Knapsack Problem in [10]. Bottleneck versions of
the 0-1 Knapsack Problem are considered in Martello and Toth [82] where
polynomial algorithms are proposed for their solution. Although not usually
grouped as a Knapsack Problem, Martello and Toth treat the Generalized
Assignment Problem in [80] using the terminology of Knapsack Problems.
Numerous other Knapsack Problems appear by combining the above
constraints in some way, e.g. the Bounded Multiple-choice Knapsack Prob-
lem [104], the Multiple-choice Subset-sum Problem [99], the Multiple-choice
Nested Knapsack Problem [24] etc.
We have presented the above Knapsack Problems in the maximization
form (with the change-making problem as an important exception), although
equivalent minimization versions can be defined. For several of the problems
the maximization problem may however be transformed into an equivalent
minimization problem and vice versa. We will describe these transformations
when dealing with each problem.
For all the problems (apart from change-making), a feasible solution
may be obtained in polynomial time. If the capacity constraint is, however,
changed to a demand for equality, even finding a feasible solution becomes an
NP-hard problem.

1.4 NP-hardness and Solvability


For most of the Knapsack Problems it is easy to derive a recursive formula-
tion, expressing the objective value as the maximum of the optimal solution
values for a number of subproblems. Since the solution space for several
of these subproblems overlaps, dynamic programming yields an effective so-
lution technique. Basically, the recursion is evaluated in an iterative way,

saving the intermediate solutions (so-called states) in a table. By dynamic


programming, many of the Knapsack Problems can be solved in pseudo-
polynomial time, i.e. in a time bounded by a polynomial in the size of the
instance and the magnitude of the coefficients. For the 0-1 Knapsack Prob-
lem a straightforward dynamic programming algorithm yields a time bound
of O(nc).
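
To make the pseudo-polynomial behavior concrete, the following is a minimal
sketch of Bellman's recursion in Python (the rendering and names are ours,
not taken from the literature surveyed here):

    def bellman(p, w, c):
        # f[d] holds the best profit achievable with weight sum at most d
        f = [0] * (c + 1)
        for pj, wj in zip(p, w):
            # scan capacities downwards so each item is used at most once
            for d in range(c, wj - 1, -1):
                f[d] = max(f[d], f[d - wj] + pj)
        return f[c]

For instance, bellman([6, 5, 4], [3, 2, 1], 4) returns 10 (items 1 and 3).
Both the running time and the table size grow linearly with the magnitude
of c, which is exactly the pseudo-polynomial behavior discussed above.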
Since most Knapsack Problems are pseudo-polynomially solvable, we are
balancing on the border between NP-hard problems and the class P of
polynomially solvable problems. One could claim that since c in all practical applications
is bounded by the word-length of a computer, and thus by a constant, all
Knapsack Problems considered in practice are polynomially solvable. This
does not however give a clue to efficient solution techniques, since the word-
length may be large.
Most of the test-instances considered in this chapter have coefficients
bounded by a constant, and thus we are basically dealing with polynomially
solvable problems. But even for coefficients of modest magnitude, instances
can be constructed which tend to be difficult. Avis [4] found a class of
problems where the magnitude of the weights is bounded by O(n^2), but any
branch-and-bound algorithm not using dominance and cuts will perform
badly. Chvatal [14] proved that if all coefficients of a Knapsack Problem
are exponentially growing, and if the profit equals the weight for each item,
then no bounding and no dominance relations will stop the enumerative
process before at least 2^{n/10} nodes have been enumerated, thus implying
strictly exponentially growing computational times. Lagarias and Odlyzko
[69] however showed that more general algorithms can handle such instances
in a better way.
Recently Krass, Sethi and Sorger [68] investigated properties that make
a Knapsack Problem hard to solve. Although considering only Knapsack
Problems where the weights are elements of a given subset S of the positive
integers, they show that the family of Knapsack Problems obtained by vary-
ing the parameter S in the power set of the positive integers Z+ contains
polynomially solvable problems and NP-complete problems, even when S
is restricted to the class of polynomially recognizable sets.

1.5 Fundamental Properties of the Knapsack Problems


Knapsack Problems are highly structured, which fortunately implies that
several instances may be solved in fractions of a second despite the worst-
case complexity. These structural properties give rise to different techniques

which either lead to algorithms with known worst-case performance, or


which are able to limit the solution space considerably such that real life
problems are generally easily solved despite bad worst-case behavior. It is
important to distinguish between these two directions of research, since when
dealing with NP-hard problems it often happens that solution techniques
with good worst-case performance are outperformed by less mathematically
founded techniques when practical applications are considered.
Perhaps the most important property of the Knapsack Problems is that
the continuous relaxation of the problems, where the constraints on the
variables, x_j \in \{0,\ldots,m_j\}, are relaxed to 0 \le x_j \le m_j, is so fast to solve.
Back in 1957, Dantzig [17] gave an elegant solution for the continuous 0-1
Knapsack Problem, by ordering the items according to their profit-to-weight
ratios,

    \frac{p_1}{w_1} \ge \frac{p_2}{w_2} \ge \cdots \ge \frac{p_n}{w_n},   (11)

and using a greedy algorithm for filling the knapsack. The first item which
does not fit into the knapsack is denoted the break item b (also called critical
item) and an optimal solution is found by choosing all items j < b plus a
fraction of item b corresponding to the residual capacity. The ordering (11)
takes O(n \log n) time, but Balas and Zemel [6] showed that the continuous
0-1 Knapsack Problem may be solved in linear time without sorting, since b
may be found as a weighted median. This result has later been generalized
to several of the Knapsack Problems, such that we now know linear time
algorithms for the continuous version of problems (1) to (8).
The existence of quickly obtainable upper bounds makes it possible to
solve large-sized problems through branch-and-bound. Since branch-and-
bound techniques are basically based on a complete enumeration, their per-
formance strictly depends on the quality of the bounds applied. For many
applications, the bounds obtained by continuous relaxation are sufficient,
but instance classes exist where these bounds overestimate the integer solu-
tion value, and thus do not efficiently cut off unpromising branches in the
search tree. Recently, much effort has been made to improve the bounds in
order to obtain branch-and-bound algorithms with stable performance for a
large class of instances.
Another essential property is that, once the continuous relaxation has been
solved, generally only a few decision variables need to be changed in order
to obtain the optimal solution. Fig. 1 shows a typical solution to a 0-1
Knapsack Problem compared to the continuous solution. It can be seen
that most of the solution values are the same, while the differing variables
312 D. Pisinger and P. Toth

are generally close to the break item. To be more specific, Fig. 2 shows how
often the integer solution differs from the continuous solution as an average over
1000 "easy" instances constructed such that b is the same for all instances.
The figure shows that the only differences are close to b, while other variables
maintain their solution value from the continuous problem.

[Figure 1: A typical solution to the 0-1 Knapsack Problem compared to the
continuous solution. The break item is b = 10, and those variables where the
two solution values differ are generally close to b.]
[Figure 2: Frequency of items j where the optimal solution x_j^* differs from
the continuous solution \bar{x}_j, plotted against the item index j; the peak is
located at the break item b.]

This behavior motivated Balas and Zemel [6] to propose that only a few
variables around b be considered in order to solve the Knapsack Problem
to optimality. This problem was denoted the core problem and has been
an essential part of all efficient algorithms for Knapsack Problems. Recent
results have, however, shown that it is important how a core is chosen, as
degeneration may occur such that all item weights in the core will be almost
the same. Obviously this makes it very difficult to obtain a solution which
fills the knapsack well, and thus a tight lower bound.
For all the Knapsack Problems, efficient reduction algorithms have been
developed, which enable one to fix some decision variables at their optimal
values before the problem is solved, thus considerably decreasing the size of

an instance. Basically, these tests may be viewed as a special case of the


branch-and-bound technique; for each 0-1 variable, we test both branches,
fathoming one of them if a bounding test shows that a better solution cannot
be found.
Already in the seventies Balas [5], Padberg [93] and Wolsey [115] in-
vestigated polyhedral properties of the knapsack polytope. Among their
results we find that if the weights are ordered increasingly and we let
a = \min\{h : \sum_{j=1}^{h} w_j > c\}, then the inequality

    \sum_{j=1}^{n} x_j \le a - 1                                     (12)

defines a facet of the knapsack polytope.
defines a facet of the knapsack polytope. Balas and Zemel [6] made the first
experiment with adding such constraints to the formulation in order to get
solutions for the continuous problem which are closer to the integer optimum.
But only recently have these cardinality constraints been used efficiently in
an enumerative algorithm by relaxation with the original weight constraint.
This technique often closes the gap between the integer and continuous
solutions of ill-conditioned problems (i.e. problems where the continuous
solution is far from the integer solution).
Several Knapsack Problems are solvable in pseudo-polynomial time by
dynamic programming. We do in fact know that problems (1) to (6) are
pseudo-polynomially solvable, while the remaining problems are NP-hard
in the strong sense. The dominance relations are generally very efficient,
making it possible to fathom several unpromising states. By incorporating
bounding tests in the dynamic programming, very efficient algorithms may
be developed.
Another important property of Knapsack Problems is that they are sep-
arable, as observed by Horowitz and Sahni [54], which means that a 0-1
Knapsack Problem may be solved in O(2^{n/2}) worst-case time. The idea is
to divide the problem into two equally sized parts, and enumerate each sub-
problem through dynamic programming. The two tables of optimal profit
sums that may be obtained for each capacity are easily combined in time
proportional to the size of the sets by considering pairs of states which have
the largest possible weight sum not exceeding c. The technique gives an
improvement over a complete enumeration by a factor of a square-root. Al-
though this bound is still exponential, the consequence of this observation
is that we may solve a 0-1 Knapsack Problem through parallel computation
by recursively dividing the problem into two parts. The resulting algo-
rithm runs in O(\log n \log c) time, which is probably the best one can hope
for, as mentioned in Kindervater and Lenstra [65], but the number of
processors required is huge: O(n c^2 / (\log n (\log c)^2)).
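
A sketch of the splitting idea for the 0-1 Knapsack Problem might look as
follows (our Python rendering, not the original formulation from [54]): each
half is enumerated into a dominance-free list of (weight, profit) states, and
the two lists are merged with opposite-direction scans.

    def hs_split(p, w, c):
        def pareto_states(items):
            # all (weight, profit) pairs of subsets of `items`, with
            # dominated states (heavier, not more profitable) removed
            states = [(0, 0)]
            for pj, wj in items:
                states = states + [(ws + wj, ps + pj) for ws, ps in states]
            states.sort()
            front, best = [], -1
            for ws, ps in states:
                if ps > best:
                    front.append((ws, ps))
                    best = ps
            return front

        half = len(p) // 2
        A = pareto_states(list(zip(p, w))[:half])
        B = pareto_states(list(zip(p, w))[half:])
        # scan A by increasing weight and B by decreasing weight; for each
        # state in A the heaviest feasible state of B is also the best one,
        # since the dominance-free lists have profits increasing with weight
        best, j = 0, len(B) - 1
        for wa, pa in A:
            while j >= 0 and wa + B[j][0] > c:
                j -= 1
            if j < 0:
                break
            best = max(best, pa + B[j][1])
        return best

Each list contains at most 2^{n/2} states, which is exactly the square-root
improvement over complete enumeration mentioned above.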
A final technique to limit the search is balancing. A balanced solution
is loosely speaking a solution which is sufficiently filled, i.e. the weight sum
does not differ from the capacity by more than the weight of a single item.
Obviously, an optimal solution is balanced, and thus the enumeration may
be limited to only consider balanced states. Using balancing, several prob-
lems from the knapsack family are solvable in linear time provided that the
weights w_j are bounded by a constant r [99]. For the Subset-sum Problem
one gets the attractive solution time of O(nr), which is an improvement
over the Bellman recursion running in O(nc) = O(n^2 r). For 0-1 Knapsack
Problems balancing yields solution times of O(n r^2), which can only compete
with the Bellman recursion for large-sized problems having small weights.
Finally, fully polynomial approximation schemes have been derived for
some of the Knapsack Problems. The algorithms are generally based on dy-
namic programming where we make some kind of scaling of the profit sums
in order to limit the number of states in the dynamic programming. The
scaling introduces some kind of relative error for each state; thus two tech-
niques are used for limiting the error: 1) The items are divided into "large"
and "small" items, where the dynamic programming is done for the large
items only. 2) At each stage of the dynamic programming, states are deleted
which are "sufficiently" close to each other. In practice fully polynomial
approximation schemes do, however, have quite disappointing performance
compared to well-designed heuristics with no formal performance guarantee
[80].

1.6 Experimental Comparisons


Due to the large gap between the worst-case performance of algorithms and
their practical solution times, one cannot study algorithms for Knapsack
Problems without performing some experiments.
In the following sections we will end each presentation of a problem with
comprehensive computational comparisons showing the performance of the
best algorithms available. The experimental work is also intended to expose
other properties which may be relevant for the design of new algorithms,
such as measuring the quality of the upper bounds, showing the size of the
minimal core, etc.
For the experimental comparisons we will apply different groups of ran-

domly generated instances, which have been constructed to reflect special


properties that may influence the solution process. Thus, we will discuss the
nature of each group of instances. In all instances the weights are randomly
distributed in a given interval while the profits are expressed as a function
of the weights, yielding the specific properties of each group. The instance
groups are graphically illustrated in Fig. 3.

[Figure 3: The test instances considered. Each of the six panels (a)-(f)
plots the profits p_j against the weights w_j for one instance group: (a)
uncorrelated, (b) weakly correlated, (c) subset-sum, (d) strongly correlated,
(e) inverse strongly correlated, (f) almost strongly correlated.]

(a) Uncorrelated instances: In these instances there is no correlation
between the profit and weight of an item. Such instances illustrate
those situations where it is reasonable to assume that the profit does
not depend on the weight. Uncorrelated instances are generally easy to
solve, as there is a large variation between the weights, making it easy
to obtain a filled knapsack. Moreover, it is easy to fathom numerous
variables by upper bound tests or by dominance relations.

(b) Weakly correlated instances: Despite their name, weakly correlated
instances have a very high correlation between the profit and weight
of an item. Typically the profit differs from the weight by only a couple
of percent. Such instances are perhaps the most realistic in manage-
ment, as it is well-known that the return of an investment is generally
proportional to the sum invested, within some small variations. The
high correlation means that it is generally difficult to fathom variables
by upper bound tests. In spite of this, weakly correlated instances are
generally easy to solve, since there is a large variation in the weights,
making it easy to obtain a filled knapsack, and filled solutions are
generally very close to an optimal solution due to the correlation.

(c) Subset-sum instances: These instances reflect the situation where the
profit of each item is proportional to its weight. Thus, our only goal is
to obtain a filled knapsack. Subset-sum instances are however difficult
to solve, as any upper bound returns the same trivial value c, so we
cannot use bounding rules for cutting off branches before an optimal
solution has been found. On the other hand, large randomly generated
instances generally have many optimal solutions, meaning that any
permutation algorithm will easily reach the optimum.

(d) Strongly correlated instances: Such instances correspond to a real-
life situation where the return is proportional to the investment plus
some fixed charge for each project. The strongly correlated instances
are hard to solve for two reasons: 1) The instances are ill-conditioned
in the sense that there is a large gap between the continuous and
integer solutions of the problem. 2) Sorting the items according to
nonincreasing profit-to-weight ratios corresponds to a sorting according
to the weights. Thus for any interval of the ordered items there is a
small variation in the weights, making it difficult to satisfy the capacity
constraint with equality.

(e) Inverse strongly correlated instances: These instances are like strongly
correlated instances, but the fixed charge is negative.

(f) Almost strongly correlated instances: These are a kind of fixed-charge
problem with some variance. Thus they reflect the properties of both
strongly and weakly correlated instances.

1.7 Notation
For the following presentation we define x(P) and z(P) to represent the
optimal solution of a problem P and its objective value, respectively. The
continuous relaxation of a problem P will be denoted by C(P), while
L(P, \lambda) denotes the Lagrangian relaxation of the same problem using mul-
tiplier \lambda. Finally, S(P, \lambda) denotes the surrogate relaxed problem of P using
multiplier \lambda. Other symbols used during the presentation are:

    x^*, z^*                      optimal solution and the corresponding objective value
    \hat{x}, \hat{p}, \hat{w}     break solution and the corresponding profit and weight sums
    \lambda                       surrogate or Lagrangian multiplier
    \delta                        core size
    \gamma                        median, middle value

In the following sections, we will consider several upper and lower bounds
for the Knapsack Problem. The worst-case performance ratio \rho(U) of an
upper bound U is defined as the smallest real number such that

    \frac{U(P)}{z^*} \le \rho(U)                                     (13)

for any problem P with optimal solution value z^*. Similarly, assume that a
lower bound z^h on P has been obtained with an approximation algorithm H.
Then the worst-case performance ratio of algorithm H is defined as the
largest real number r(H) such that

    \frac{z^h(P)}{z^*} \ge r(H).                                     (14)

1.8 Overview of the Chapter


In this chapter we will consider theoretical aspects of several problems from
the knapsack family, giving outlines of the most efficient algorithms. Several
new and unpublished results will be presented, and old results are presented
in a new and systematical way, using simplified proofs and algorithms where
possible. All the codes presented have been tested on the same computer
(HP9000 model 735/99, performance index according to SPEC [110]: 3.27
specint95, 3.98 specfp95). This gives an up-to-date performance index of
the different algorithms, as well as making it possible to compare algorithms
across sections.
Numerous results have been presented for Knapsack Problems, thus due
to the limited space, we have chosen to focus on those techniques which lead
to effective solution methods. We will however, as far as possible, also give
references to important theoretical results, which have not yet been applied
in the best algorithms.

Since the family of Knapsack Problems is so wide, only some of the


problems can be covered in this chapter. We have chosen to focus on the
most fundamental Knapsack Problems, since most research goes on in this
field, and since the solution techniques presented may also be used in some
of the more general cases. Thus in Section 2 we consider the 0-1 Knap-
sack Problem and show several fundamental properties. Upper and lower
bounds are presented, and these are combined to show the foremost solu-
tion techniques from the literature. The following sections generalize these
results to other kinds of knapsack problems. First, the Subset-sum Problem
is considered in Section 3. Although this problem is a special case of the 0-1
Knapsack Problem, specialized solution techniques are developed to fully
exploit the structure of the problem. In Section 4 we consider the knapsack
problem with multiple-choice constraints. Since it is a generalization of the
0-1 Knapsack Problem, it is interesting to see which principles are adapt-
able to the more general case, and which techniques are difficult to utilize.
Sections 5 and 6 consider the Bounded Knapsack Problem and Unbounded
Knapsack Problem, respectively. Despite being quite similar in their for-
mulation, these problems have very different properties and thus solution
techniques. Finally, Section 7 considers the Multiple Knapsack Problem,
which is NP-hard in the strong sense, and thus differs from the previous
pseudopolynomially solvable problems.
The text-book by Martello and Toth [80] is the most comprehensive work
on Knapsack Problems, and the reader is referred to it for additional de-
tails. Other surveys published recently are: Martello and Toth [74] consider
exact algorithms for the 0-1 Knapsack Problem and their average compu-
tational performance; the study is extended to the other linear Knapsack
Problems and to approximate algorithms in Martello and Toth [78]. Dudzin-
ski and Walukiewicz [24] analyze dual methods for solving Lagrangian and
linear programming relaxations. In addition, almost all books on Integer
Programming contain a section on Knapsack Problems: Papadimitriou and
Steiglitz [94], Syslo, Deo and Kowalik [111], Nemhauser and Wolsey [92] and
Ibaraki [56, 57] are some of the more recent ones.

2 0-1 Knapsack Problem


The 0-1 Knapsack Problem (KP) is the problem of choosing some of the
n items such that the corresponding profit sum is maximized without the
weight sum exceeding the capacity c. Thus it may be formulated as the
Knapsack Problems 319

following maximization problem:


    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (15)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n,

where x_j is a binary variable having value 1 if item j should be included in
the knapsack, and 0 otherwise.
Many industrial problems can be formulated as 0-1 Knapsack Problems:
Cargo loading, cutting stock, project selection, and budget control to men-
tion a few examples. Several combinatorial problems can be reduced to KP,
and KP has important applications as a subproblem of several algorithms
of integer linear programming.
To avoid trivial cases we assume that w_j \le c for all j = 1,\ldots,n and
\sum_{j=1}^{n} w_j > c. Items violating the first constraint will not be present in any
solution, and thus we can set x_j = 0 for such items. Problems violating the
second constraint have a trivial solution with all items chosen.
Without loss of generality, we may also assume that p_j, w_j and c are
positive integers, since fractions can be handled by multiplying the coeffi-
cients by a proper factor, while instances with nonpositive coefficients can
be treated as described in Glover [45]: Let

    N^0 = \{j \in N : p_j \le 0,\ w_j \ge 0\},   N^1 = \{j \in N : p_j \ge 0,\ w_j \le 0\},
    N^- = \{j \in N : p_j \le 0,\ w_j \le 0\},   N^+ = \{j \in N : p_j \ge 0,\ w_j \ge 0\}.   (16)

All items j \in N^0 will never be chosen, thus x_j = 0 for these items. Similarly,
all items j \in N^1 must be present in an optimal solution, thus for all these
items x_j = 1. The remaining items are transformed as follows:

    \tilde{p}_j = -p_j,  \tilde{w}_j = -w_j,  \tilde{x}_j = 1 - x_j   for j \in N^-,
    \tilde{p}_j = p_j,   \tilde{w}_j = w_j,   \tilde{x}_j = x_j       for j \in N^+.   (17)

The transformed problem satisfies the nonnegativity of the coefficients and
thus may be solved as

    maximize    \sum_{j \in N^- \cup N^+} \tilde{p}_j \tilde{x}_j + \sum_{j \in N^1 \cup N^-} p_j
    subject to  \sum_{j \in N^- \cup N^+} \tilde{w}_j \tilde{x}_j \le c - \sum_{j \in N^1 \cup N^-} w_j,   (18)
                \tilde{x}_j \in \{0,1\}, \quad j \in N^- \cup N^+.
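
As an illustration of transformation (16)-(18), the following sketch (in
Python; the function and variable names are ours, not taken from [45])
normalizes an instance with coefficients of arbitrary sign into one with
nonnegative coefficients plus a constant profit term:

    def glover_normalize(p, w, c):
        # Split the items by the sign pattern of (16) and apply (17)-(18).
        # Returns the transformed items, the new capacity, and the constant
        # profit that must be added back to the transformed optimum.
        items, fixed, const = [], {}, 0
        for j, (pj, wj) in enumerate(zip(p, w)):
            if pj <= 0 and wj >= 0:        # N^0: never chosen
                fixed[j] = 0
            elif pj >= 0 and wj <= 0:      # N^1: always chosen
                fixed[j] = 1
                const += pj
                c -= wj
            elif pj <= 0 and wj <= 0:      # N^-: complement x_j := 1 - x_j
                items.append((j, -pj, -wj))
                const += pj
                c -= wj
            else:                          # N^+: kept unchanged
                items.append((j, pj, wj))
        return items, c, const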

Throughout this section we will consider KP in the maximization form,
as it is equivalent to the minimization form

    minimize    \sum_{j=1}^{n} p_j y_j
    subject to  \sum_{j=1}^{n} w_j y_j \ge d,                        (19)
                y_j \in \{0,1\}, \quad j = 1,\ldots,n,

by setting y_j = 1 - x_j and d = \sum_{j=1}^{n} w_j - c. Intuitively, we here minimize
the profit of the items not inserted in the knapsack.
In Section 2.1 we will present different upper bounds for the knapsack
problem and show how they may be tightened by addition of cardinality
bounds. Heuristic solutions for KP are discussed in Section 2.2, and Section
2.3 shows how the size of a KP may be reduced by using some upper bound
tests. Enumerative algorithms are presented in Sections 2.4 and 2.5, consid-
ering respectively branch-and-bound algorithms and dynamic programming
algorithms. In Section 2.6 we consider the solution of large-sized instances,
where a core is used to focus the enumeration on the most interesting items.
In Section 2.7 we consider the solution of hard problems, where basically all
the previous techniques are applied to obtain a stable and well-performing
algorithm. Although this text is mainly focused on exact solutions of Knap-
sack Problems, we show a few results on approximation schemes in Section
2.8. Finally, comprehensive computational experiments are presented in
Section 2.9.

2.1 Upper Bounds


The continuous relaxation C(KP) of (15) leads to the so-called Dantzig
bound [17], which may be derived as follows: Order the items according to
nonincreasing profit-to-weight ratios,

    \frac{p_1}{w_1} \ge \frac{p_2}{w_2} \ge \cdots \ge \frac{p_n}{w_n},   (20)

and then, using the greedy principle, add items j = 1, 2, \ldots to the
knapsack as long as

    \sum_{h=1}^{j} w_h \le c.                                        (21)

The first item b which cannot be included in the knapsack is denoted the
break item, and an optimal solution \bar{x} to the continuous problem is given by
\bar{x}_j = 1 for j = 1,\ldots,b-1 and \bar{x}_j = 0 for j = b+1,\ldots,n, while \bar{x}_b is
given by

    \bar{x}_b = \left( c - \sum_{j=1}^{b-1} w_j \right) / w_b.       (22)

The continuous solution has objective value

    U_1 = z(C(KP)) = \sum_{j=1}^{b-1} p_j + \left( c - \sum_{j=1}^{b-1} w_j \right) \frac{p_b}{w_b}.   (23)

Since any integer solution has an integral objective value, we may
actually tighten this bound to \lfloor U_1 \rfloor. An integer solution may be obtained
by truncating \bar{x}_b to zero. This solution is denoted the break solution \hat{x};
its profit sum is \hat{p} = \sum_{j=1}^{b-1} p_j while its weight sum is \hat{w} = \sum_{j=1}^{b-1} w_j.
Notice that if \bar{x}_b = 0 in the continuous solution, then the break solution \hat{x}
is an optimal solution x^* to (15).
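
A straightforward O(n \log n) sketch computing the break item, the break
solution value \hat{p} and the bound \lfloor U_1 \rfloor by sorting might look as follows
(our code; the linear-time alternative is discussed next):

    import math

    def dantzig_bound(p, w, c):
        # sort item indices by nonincreasing profit-to-weight ratio, cf. (20)
        order = sorted(range(len(p)), key=lambda j: p[j] / w[j], reverse=True)
        p_hat = w_hat = 0
        for j in order:
            if w_hat + w[j] > c:              # j is the break item b
                u1 = p_hat + (c - w_hat) * p[j] / w[j]
                return math.floor(u1), p_hat  # floor(U_1) and break solution value
            w_hat += w[j]
            p_hat += p[j]
        return p_hat, p_hat                   # all items fit: break solution optimal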
Balas and Zemel [6], independently of Fayard and Plateau [28], showed
that the Dantzig bound may be found in O(n) time through a partitioning
algorithm. Basically the problem is to find a break item b which satisfies
the following criteria:

    p_j / w_j \ge p_b / w_b   for j = 1,\ldots,b-1,
    p_j / w_j \le p_b / w_b   for j = b+1,\ldots,n.                  (24)

Thus finding b may be seen as a kind of weighted median problem, which
can be solved as follows: Let [s, t] (initially equal to [1, n]) be the current
interval of items. Choose \gamma as the median of p_s/w_s, \ldots, p_t/w_t. Partition the
items into two intervals [s, i-1] and [i, t] such that p_j/w_j \ge \gamma for j \in [s, i-1]
and p_j/w_j \le \gamma for j \in [i, t]. Let w = \sum_{j=1}^{i-1} w_j. If w > c then the break item
b cannot be in [i, t], thus this interval is discarded, while otherwise b cannot
be in [s, i-1] and we discard the latter. The process is repeated until [s, t]
contains only one item, which is obviously the break item.

algorithm bal_zem;
  s ← 1; t ← n; W ← 0;
  while (s < t) do
    \gamma ← median(p_s/w_s, \ldots, p_t/w_t);
    i ← s; j ← t; w ← W;
    repeat
      while (p_i/w_i \ge \gamma) and (i < j) do w ← w + w_i; i ← i + 1;
      while (p_j/w_j \le \gamma) and (i < j) do j ← j - 1;
      if (i < j) then swap(i, j);
    until (i \ge j);
    if (w > c) then t ← i - 1 else s ← i; W ← w;
  b ← t;

The partitioning of the interval [s, t] is done in the same way as in most
sorting algorithms: Search forward from s until an item i is met which does
not satisfy the ordering p_i/w_i \ge \gamma. Similarly, starting from t and searching
backward, find the first item j violating p_j/w_j \le \gamma. By swapping the items
i and j the ordering is satisfied, and the search can continue.
When \gamma is chosen as the median of the remaining items, each iteration
discards half of the items, thus leading to a computational effort of O(n).
Balas and Zemel, however, experimentally showed that better average-case
performance is obtained by choosing \gamma as a median-of-three.

2.1.1 Lagrangian Relaxation


The Lagrangian relaxation of KP with a nonnegative multiplier \lambda leads to
the problem L(KP, \lambda):

    maximize    \sum_{j=1}^{n} p_j x_j + \lambda \left( c - \sum_{j=1}^{n} w_j x_j \right)   (25)
    subject to  x_j \in \{0,1\}, \quad j = 1,\ldots,n.

The objective function can be restated as z(L(KP, \lambda)) = \max \sum_{j=1}^{n} \tilde{p}_j x_j + \lambda c,
where \tilde{p}_j = p_j - \lambda w_j for j = 1,\ldots,n, and the optimal solution to (25) is
easily determined in O(n) time as

    x_j = \begin{cases} 1 & \text{if } \tilde{p}_j > 0, \\ 0 & \text{if } \tilde{p}_j < 0, \end{cases}   (26)

where x_j may be set to either of the two values when \tilde{p}_j = 0. The corre-
sponding objective value is

    z(L(KP, \lambda)) = \sum_{\{j :\ p_j/w_j > \lambda\}} \tilde{p}_j + \lambda c.   (27)

Notice that the continuous relaxation of (25) will produce the same (integer)
solution as above, thus we have

    z(L(KP, \lambda)) = z(C(L(KP, \lambda))) \ge z(C(KP)) \ge z(KP).   (28)

This means that the bound obtained by Lagrangian relaxation will never be
tighter than the continuous bound z(C(KP)). The value of \lambda producing the
minimal value of z(L(KP, \lambda)) is \lambda^* = p_b/w_b, in which case we exactly obtain
the continuous bound, thus z(L(KP, \lambda^*)) = z(C(KP)) = U_1 [80].
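
Evaluating z(L(KP, \lambda)) for a given multiplier thus takes a single pass
over the items, as in this sketch (our code):

    def lagrangian_bound(p, w, c, lam):
        # z(L(KP, lam)) of (27): pick every item with positive reduced profit
        return sum(pj - lam * wj
                   for pj, wj in zip(p, w) if pj - lam * wj > 0) + lam * c

Choosing lam = p_b/w_b, the minimizer over lam >= 0, recovers the continuous
bound U_1 in accordance with (28).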

2.1.2 Tighter Bounds


Martello and Toth [72] derived a tighter bound on KP than the continuous
bound by exploiting the integrality of x_b. Obviously, in an optimal solution
either x_b = 0 or x_b = 1. Thus, first assuming x_b = 0, a valid upper bound on
KP is

    U' = \hat{p} + \left\lfloor (c - \hat{w}) \frac{p_{b+1}}{w_{b+1}} \right\rfloor,   (29)

and with x_b = 1 we have the bound

    U'' = \left\lfloor \hat{p} + p_b + (c - \hat{w} - w_b) \frac{p_{b-1}}{w_{b-1}} \right\rfloor,   (30)

where \hat{p} and \hat{w} are the profit and weight sums of the break solution. Both
bounds have been derived by relaxing the constraints on x_{b-1} and x_{b+1} to
x_{b-1} \ge 0 resp. x_{b+1} \ge 0. Now the Martello and Toth upper bound is given
by

    U_2 = \max\{U', U''\}.                                           (31)

Obviously U_2 can be derived in O(n) time through the bal_zem algorithm, and
it is easy to see that U_2 \le U_1 [72].
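
A sketch of the computation of U_2 for items already sorted according to
(20) might look as follows (our code; for simplicity it assumes that the
break item is neither the first nor the last item, so that items b-1 and
b+1 both exist, and that the total weight exceeds c):

    import math

    def u2_bound(p, w, c):
        # find the break item b by a greedy scan over the sorted items
        b, p_hat, w_hat = 0, 0, 0
        while w_hat + w[b] <= c:
            p_hat += p[b]; w_hat += w[b]; b += 1
        r = c - w_hat                      # residual capacity, 0 <= r < w[b]
        u0 = p_hat + math.floor(r * p[b + 1] / w[b + 1])                  # (29), x_b = 0
        u1 = math.floor(p_hat + p[b] + (r - w[b]) * p[b - 1] / w[b - 1])  # (30), x_b = 1
        return max(u0, u1)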
Fayard and Plateau [29] used the same dichotomy to derive the bound

    U_3 = \max\{ z(C(KP, x_b = 1)),\ z(C(KP, x_b = 0)) \},           (32)

where the continuous KP with the additional constraint x_b = a can be solved
in O(n) time by using the bal_zem algorithm. Obviously U_3 \le U_2 \le U_1.
Generalizing the above principle, Martello and Toth [79] proposed the
derivation of arbitrarily tight upper bounds by using partial enumeration.
Let M \subseteq N be a subset of the items, and let X^M = \{\tilde{x} : \tilde{x}_j \in \{0,1\},\ j \in M\}.

Then every optimal solution to KP must follow one of the paths through
XM, and an upper bound on KP is thus given by
UM = _max U(x), (33)
XEXM

where U(X) is any upper bound on (15) with additional constraint Xj = Xj


for j E M. For M = N we obtain the upper bound UM = z* i.e. the
optimal solution value. The computational effort is however large: O(2n).
A good trade-off between tightness and computational effort may however
be obtained by choosing M as a small subset of items with profit-to-weight
ratios close to that of the break item.
Müller-Merbach [86] proposed a bound based on the fact that, if the con-
tinuous solution given by (22) becomes integral, then either \bar{x}_b is truncated
to zero, or a variable \bar{x}_j, j \ne b, changes value to 1 - \bar{x}_j. Thus we have

    U_4 = \max\left\{ \hat{p},\ \max_{j<b} \lfloor u'_j \rfloor,\ \max_{j>b} \lfloor u''_j \rfloor \right\}.   (34)

The first term comes from assuming that x_b = 0, in which case we get the
objective value \hat{p}. If we set x_j = 0 for j < b, then an upper bound is given
by u'_j = \hat{p} - p_j + (c - \hat{w} + w_j) p_b/w_b. Similarly, if we set x_j = 1 for j > b,
then an upper bound is given by u''_j = \hat{p} + p_j + (c - \hat{w} - w_j) p_b/w_b. Both
bounds can be derived by relaxing the integrality constraint on the break
item to x_b \in \mathbb{R}.
The Müller-Merbach bound is derived in O(n) time, and we have U_4 \le U_1,
but no dominance exists between U_3 and U_4. Dudzinski and Walukiewicz [24]
further exploited the above technique, obtaining a bound which dominates
all of the above, and which can be derived in O(n) time.

2.1.3 Bounds from Minimum and Maximum Cardinality


Polyhedral theory has recently found application in the field of Knapsack
Problems, where it has led to some efficient solution techniques for classes of
hard problems. The study of facets of the knapsack polytope however dates
back to 1975 when Balas [5], Hammer, Johnson and Peled [50] and Wolsey
[115] gave necessary and sufficient conditions for a canonical inequality to
be a facet of the knapsack polytope. Already in 1980 Balas and Zemel
[6] made the first experiments adding facets to the formulation of strongly
correlated knapsack problems, but the solution times were too large to show
the benefits of this approach. For a recent treatment of Knapsack Polyhedra,
see e.g. Weismantel [113].
Knapsack Problems 325

Valid inequalities do not exclude any integer solution, and are thus re-
dundant for the integer problem. The continuous relaxation is however
strengthened by addition of valid inequalities, leading to tighter continuous
bounds. We will here consider different inequalities which may be obtained
from minimum and maximum cardinality of an optimal solution. Inequali-
ties from minimum and maximum cardinality are very tight, and professional
mixed-integer programming solvers like CPLEX automatically impose car-
dinality constraints for each Boolean inequality.
Cardinality bounds as presented by e.g. Balas [5] and Martello and Toth
[83] are derived as follows: Assume that the items are ordered according
to nondecreasing weights, and define k as

    k = \min\left\{ h : \sum_{j=1}^{h} w_j > c \right\} - 1.         (35)

Since items 1,\ldots,k have the smallest weights, every feasible solution will
comprise no more than k items, and thus we can impose a maximum cardi-
nality constraint of the form

    \sum_{j=1}^{n} x_j \le k                                         (36)

without excluding any feasible solution. However, adding a constraint which
is not violated will not tighten the formulation, thus for problem (15) we
will only add a maximum cardinality constraint if k = b - 1. Notice that
since the break solution is a feasible solution, we must always have k \ge b - 1.
In a similar way we may define minimum cardinality constraints. Assume
that the current lower bound is given by z, and that the items are ordered
according to nonincreasing profits. We set

    k = \max\left\{ h : \sum_{j=1}^{h} p_j \le z \right\} + 1,       (37)

and thus we have the constraint

    \sum_{j=1}^{n} x_j \ge k                                         (38)

for any solution with objective value larger than z. As before, there is no
reason to add this constraint to (15) if it is not violated, thus we will only
use the constraint when k = b. Notice that in every case k \le b, since an
ordering according to (20) gives k = b in (37), and an ordering according to
nonincreasing profits can only decrease this value.
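
Both thresholds are simple prefix-sum computations over sorted weights
resp. profits, as in the following sketch (our code; z denotes the current
lower bound as above):

    def max_cardinality(w, c):
        # (35): largest k such that the k smallest weights still fit in c
        acc, k = 0, 0
        for wj in sorted(w):
            if acc + wj > c:
                break
            acc += wj
            k += 1
        return k

    def min_cardinality(p, z):
        # (37): smallest cardinality forced by the lower bound z
        acc, h = 0, 0
        for pj in sorted(p, reverse=True):
            if acc + pj > z:
                break
            acc += pj
            h += 1
        return h + 1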

Adding the constraints (36) or (38) to our model leads to a two-constraint
knapsack problem, which may be difficult to solve as it is NP-hard in the
strong sense. The continuous relaxation may be solved by any LP-solver,
which is however time-consuming for large instances. Thus Martello and
strong sense. The continuous relaxation may be solved by any LP-solver
which is however time-consuming for large instances. Thus Martello and
Toth [83] presented a specialized algorithm for the continuous solution as
follows:

We consider the KP with the maximum cardinality constraint added, thus
having the problem (KP')

    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} w_j x_j \le c,
                \sum_{j=1}^{n} x_j \le k,                            (39)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n.
By Lagrangian relaxing the cardinality constraint using a nonnegative multi-
plier \lambda \ge 0, and by relaxing the integrality constraints, one gets the problem
L(KP', \lambda) given by

    maximize    \sum_{j=1}^{n} p_j x_j - \lambda \left( \sum_{j=1}^{n} x_j - k \right) = \sum_{j=1}^{n} \tilde{p}_j x_j + \lambda k
    subject to  \sum_{j=1}^{n} w_j x_j \le c,                        (40)
                0 \le x_j \le 1, \quad j = 1,\ldots,n,

which is a continuous KP where the items have profits \tilde{p}_j = p_j - \lambda that may
be nonpositive. The above problem is easily solved for each value of \lambda by
using transformation (16) for the nonpositive values of \tilde{p}_j, and then sorting
the remaining items according to

    \frac{p_1 - \lambda}{w_1} \ge \frac{p_2 - \lambda}{w_2} \ge \cdots \ge \frac{p_n - \lambda}{w_n}   (41)

in order to find the Dantzig bound. This can be done in O(n) time using the
bal_zem algorithm. Let K(x, \lambda) denote the cardinality of the solution x to
L(KP', \lambda).

We are interested in deriving the multiplier \lambda^* which leads to the smallest
objective value of L(KP', \lambda), i.e.

    z(L(KP', \lambda^*)) = \min_{\lambda \ge 0} \{ z(L(KP', \lambda)) \}   (42)

or, equivalently,

    |K(x, \lambda^*) - k| = \min_{\lambda \ge 0} |K(x, \lambda) - k|,   (43)

since in this way the cardinality constraint (36) has the greatest effect on the
continuous solution. It can be proved that a multiplier \lambda^* exists such that
K(x, \lambda^*) = k. Equation (43) may be solved efficiently due to the following

Theorem 2.1 (Martello and Toth [83]) K(x, \lambda) is monotonically non-
increasing as \lambda increases.

Proof. First note that if we are given two items i, j and two multipliers
\lambda' < \lambda'' such that

    \frac{p_i - \lambda'}{w_i} \ge \frac{p_j - \lambda'}{w_j}   and   \frac{p_i - \lambda''}{w_i} \le \frac{p_j - \lambda''}{w_j},   (44)

then w_j \ge w_i, which can be seen by subtracting the second inequality from
the first. But this means that when \lambda increases from \lambda' to \lambda'', items i, j
may change places in the ordering (41), but according to (44) bigger items
will move to the first positions, thus never increasing the cardinality of the
solution. \square

The theorem means that we can use binary search to minimize (43).
Thus, initially set \lambda_1 = 0 and \lambda_2 = the k'th largest p_j value. Now, repeatedly
derive \lambda = (\lambda_1 + \lambda_2)/2 and solve (40). If K(x, \lambda) = k, we have found the
optimal multiplier \lambda^* = \lambda. Otherwise, if K(x, \lambda) > k set \lambda_1 = \lambda, and if
K(x, \lambda) < k set \lambda_2 = \lambda.
For each value of \lambda we need to solve (40), which can however be done
in O(n) time. Martello and Toth [83] investigated further properties of
L(KP', \lambda), and derived an algorithm which finds the optimal multiplier \lambda^*
in O(n^2) time. Obviously the derived bound

    U_5 = z(L(KP', \lambda^*)) = z(C(KP'))                            (45)

is tighter than the continuous bound U_1, since it contains U_1 as the special
case \lambda = 0.

It is interesting to note that if we instead Lagrangian relax the capacity
constraint in (39) with \lambda \ge 0, without relaxing the integrality constraints,
we get

    maximize    \sum_{j=1}^{n} (p_j - \lambda w_j) x_j + \lambda c
    subject to  \sum_{j=1}^{n} x_j \le k,                            (46)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n,

which can be solved in O(n) time by selecting the items corresponding to
the k largest positive Lagrangian profits p_j - \lambda w_j. It can be easily proved,
however, that the best upper bound which can be obtained through this
relaxation is equal to U_5.
Martello and Toth [83] derive symmetrical results for bounds from min-
imum cardinality, thus the reader is referred to this paper for more details.

Surrogate relaxation of the capacity constraint with the maximum cardinal-
ity constraint (36), using a multiplier \lambda \ge 0, gives the problem

    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} (w_j + \lambda) x_j \le c + k\lambda,   (47)
                x_j \in \{0,1\}, \quad j = 1,\ldots,n,

which is also a Knapsack Problem. The latter problem may, however, be
easier to solve, since with a well-chosen multiplier \lambda the continuous bounds
are tighter than for the original problem. In general, it is however sufficient
to look at the continuous relaxation C(S(KP)),

    maximize    \sum_{j=1}^{n} p_j x_j
    subject to  \sum_{j=1}^{n} (w_j + \lambda) x_j \le c + k\lambda,   (48)
                0 \le x_j \le 1, \quad j = 1,\ldots,n,

in order to derive a tight bound on KP. For C(S(KP)), Pisinger [103] an-
alytically proved that the best choice of multiplier is \lambda = \alpha for strongly

correlated Knapsack Problems with p_j = w_j + \alpha. In this way the problem
becomes a Subset-sum Problem, and specialized algorithms can be used for
its solution. For less structured instances, the surrogate multiplier \lambda must
be determined through iterative techniques. Monotonicity of the continuous
solution to (47) can however be proved as in [83], thus making it possible
to use a binary search to efficiently derive the optimal multiplier. With
the best multiplier \lambda for C(S(KP)) one however gets the same bound U_5 as
found in (45).

2.2 Heuristics
In Section 2.1 we derived a feasible solution from the continuous solution
by truncating x_b to zero. The corresponding break solution x̂ has objective
value

    z' = p̂ = Σ_{j=1}^{b−1} p_j    (49)

On average z' is quite close to the optimal solution value z. In fact z' ≤
z ≤ U₁ ≤ z' + p_b, i.e. the absolute error is bounded by p_b. The worst-case
performance ratio, however, is arbitrarily bad. This is shown by the series
of problems with n = 2, p₁ = w₁ = 1, p₂ = w₂ = k and c = k, for which
z' = 1 and z = k, so the ratio z'/z is arbitrarily close to 0 for sufficiently
large k.
Noting that the above pathology occurs when p_b is relatively large, we
can obtain a heuristic with improved performance ratio by also considering
the feasible solution given by the break item alone and taking the best of
the two solution values, i.e.

    z^h = max{z', p_b}    (50)

The worst-case performance ratio of the new heuristic is ½. We have already
noted, in fact, that z ≤ z' + p_b, so, from (50), z ≤ 2z^h. To see that ½ is tight,
consider the series of problems with n = 3, p₁ = w₁ = 1, p₂ = w₂ = p₃ =
w₃ = k and c = 2k: we have z^h = k + 1 and z = 2k, so z^h/z is arbitrarily
close to ½ for sufficiently large k.
The computation of z^h requires O(n) time, once the break item is known.
If the items are sorted as in (20), a more effective approach is to start from
the break solution x̂ and repeatedly insert the next item j ≥ b if it fits. This
algorithm is known as the Greedy Algorithm, and the obtained solution
will have an objective value not smaller than z' since we start from
the break solution. The worst-case performance can however again be
arbitrarily bad (take e.g. the series of problems introduced for z').

The performance ratio of the greedy algorithm can be improved to
½ if we also consider the solution given by the item of maximum profit
alone. Assume that the items are ordered according to (20) and let ℓ =
arg max{p_j : j = 1,…,n} be the index of the most profitable item. Then
the greedy lower bound is given by

    z^g = max{z̄, p_ℓ}    (51)

where z̄ is the objective value of the greedy solution. The worst-case
performance ratio is ½ since: (a) p_ℓ ≥ p_b, so z^g ≥ z^h; (b)
the series of problems introduced for z^h proves the tightness. The time
complexity is O(n), plus O(n log n) for the initial sorting.
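As a small illustration, the following sketch computes z^g under the assumptions above (items presorted by nonincreasing p_j/w_j, every item fitting alone into the knapsack); the function name is ours.

    def greedy_bound(p, w, c):
        """Greedy solution from scratch, improved by the most profitable
        single item; worst-case performance ratio 1/2."""
        z, cap = 0, c
        for pj, wj in zip(p, w):
            if wj <= cap:        # insert every item that still fits
                z += pj
                cap -= wj
        return max(z, max(p))    # compare with the single item of maximum profit

    # The tight series p = w = (1, k, k), c = 2k with k = 5:
    print(greedy_bound([1, 5, 5], [1, 5, 5], 10))   # -> 6, while the optimum is 10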
When a 0-1 knapsack problem in minimization form (see Section 2) is
heuristically solved by applying the Greedy Algorithm to its equivalent maximization instance, we of course obtain a feasible solution, but the worst-case
performance is not preserved. Consider, in fact, the series of minimization
problems with n = 3, p₁ = w₁ = k, p₂ = w₂ = 1, p₃ = w₃ = k and q = 1,
for which the optimal solution value is 1. Applying the Greedy Algorithm
to the maximization version (with c = 2k), we get z^g = k + 1 and hence an
arbitrarily bad heuristic solution of value k for the minimization problem.
Fixed-cardinality heuristics have been proposed in [96]. These are es-
pecially well-suited for strongly correlated problems, since it can be shown
that optimal solutions of strongly correlated problems are characterized by
choosing the largest possible number of items and filling the knapsack as
much as possible.
The forward greedy solution is the best value of the objective function
when adding one item to the break solution:

    z^f = max_{j=b,…,n} { p̂ + p_j : ŵ + w_j ≤ c },    (52)

and the backward greedy solution is the best value of the objective function
when adding the break item to the break solution and removing another
item:

    z^b = max_{j=1,…,b−1} { p̂ + p_b − p_j : ŵ + w_b − w_j ≤ c }.    (53)

The time complexity of the bounds z^f, z^b is O(n), but the performance ratio is
arbitrarily close to zero.
The 0-1 Knapsack Problem admits a fully polynomial approximation
scheme, as will be described in Section 2.8.

2.3 Reduction
The size of a Knapsack Problem may be reduced by applying a reduction
algorithm to fix variables at their optimal value. For the following discussion
assume that a lower bound z has been found through any kind of heuristic
and that the corresponding solution vector x has been saved. The reason
for this assumption is that it allows us to use tighter reductions.
Ingargiola and Korsh [59] presented the first reduction algorithm for the
0-1 Knapsack Problem. This reduction algorithm and all similar algorithms
are based on the dichotomy of each variable x_j. If one of the two branches
cannot lead to an improved solution, we can fix the variable at the opposite
value.
Thus let u_j⁰ be an upper bound for (15) with the additional constraint x_j = 0.
Similarly let u_j¹ be an upper bound for (15) with constraint x_j = 1. Then
the set of items which can be fixed at the optimal value x_j = 1 is

    N¹ = {j ∈ N : u_j⁰ < z + 1}    (54)

In a similar way the set of items which can be fixed at x_j = 0 is

    N⁰ = {j ∈ N : u_j¹ < z + 1}    (55)
The remaining items F = N \ (N¹ ∪ N⁰) are called free, and a solution with
objective value better than z may be found by solving the reduced problem

    maximize   z' = Σ_{j∈F} p_j x_j + P
    subject to Σ_{j∈F} w_j x_j ≤ c − W,    (56)
               x_j ∈ {0,1},  j ∈ F,

where P is the profit of the items fixed at one, P = Σ_{j∈N¹} p_j, and W is the
corresponding weight sum, W = Σ_{j∈N¹} w_j. The optimal solution value is
now found as z* = max{z, z'}, where z is the lower bound used in the
reduction.
reduction.
The time complexity of the reduction depends on the complexity of the
applied bound. Dembo and Hammer [18] used the bounds

    u_j⁰ = p̂ − p_j + (c − ŵ + w_j) p_b/w_b,    u_j¹ = p̂ + p_j + (c − ŵ − w_j) p_b/w_b    (57)

which, as we saw in (34), are both obtained by relaxing the integrality constraint on the break item. Since each bound is derived in constant
time, the whole reduction may be performed in O(n).
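A Python sketch of this O(n) reduction is given below; it assumes 0-indexed items sorted by nonincreasing p_j/w_j, a break item b with break sums p̂, ŵ, integer profits (so that the test u < z + 1 is valid), and a known lower bound z. Only u_j⁰ for j < b and u_j¹ for j > b are computed, for the reason explained below. The function name is ours.

    def dembo_hammer_reduce(p, w, c, b, z):
        """Fix variables by the bounds (57); returns (N1, N0, F) = items
        fixed at 1, fixed at 0, and free items (indices 0..n-1)."""
        p_hat, w_hat = sum(p[:b]), sum(w[:b])
        slope = p[b] / w[b]                   # profit-to-weight ratio of the break item
        N1, N0, F = [], [], [b]
        for j in range(b):                    # bound u0 with x_j = 0
            u0 = p_hat - p[j] + (c - w_hat + w[j]) * slope
            (N1 if u0 < z + 1 else F).append(j)
        for j in range(b + 1, len(p)):        # bound u1 with x_j = 1
            u1 = p_hat + p[j] + (c - w_hat - w[j]) * slope
            (N0 if u1 < z + 1 else F).append(j)
        return N1, N0, F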

Tighter reductions have been proposed by Ingargiola and Korsh [59].
Here u_j⁰ and u_j¹ are derived by using the Dantzig bound for (15) with the additional constraint x_j = 0, resp. x_j = 1. Since the break item of the constrained problem may differ from b, a linear search is used to find the new
break item b_j⁰ (resp. b_j¹) for each of the problems. This means that each
bound is found in O(n) time, demanding in total O(n²) for the reduction. If
the weights are randomly distributed, the new break item will be close to b,
meaning that the whole reduction can be done in O(n) on average.

Notice that there is no reason for deriving u_j¹ for j < b, since for these
items u_j¹ = U₁, which cannot be used for fixing x_j at 0. Similarly u_j⁰ should
not be derived for j > b, as u_j⁰ = U₁.
A further improvement proposed by Ingargiola and Korsh is to try to
improve the lower bound z during the reduction. Since each break item
corresponds to a greedy solution, we may set z ← max{z, z⁰, z¹} with

    (58)

Martello and Toth [79] further improved the approach by using the bound
U₂ instead of the Dantzig upper bound U₁, and by using binary search to
find the break item. The latter needs a preprocessing of the items where we
set P_j = Σ_{i=1}^{j} p_i resp. W_j = Σ_{i=1}^{j} w_i. Then for any capacity d one may find
the break item b' in O(log n) time as the index which satisfies

    W_{b'−1} ≤ d < W_{b'}    (59)

where d = c − w_j if u_j¹ is computed and d = c + w_j for the opposite bound.


The Martello and Toth reduction algorithm also improves the order of the
execution by first finding the break items for all additionally constrained
problems, and during this process improving the lower bound z. Having
found and saved all break items, the reductions (54) and (55) are performed.
This gives a complexity of O(n log n), and generally the reductions are more
effective than those of the Ingargiola and Korsh scheme, since a better lower
bound is generally applied.
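The O(log n) break-item search is plain binary search on the prefix sums W_j; a minimal sketch using Python's standard bisect module:

    import bisect

    def break_item(W, d):
        """Return the 1-based index b' satisfying (59), i.e. the first index
        whose prefix weight sum W[b'] exceeds the capacity d."""
        return bisect.bisect_right(W, d) + 1

    W = [3, 7, 12, 20, 31]     # prefix sums of weights (3, 4, 5, 8, 11)
    print(break_item(W, 15))   # -> 4: W_3 = 12 <= 15 < W_4 = 20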

2.4 Branch-and-bound Algorithms


The first branch-and-bound algorithm for the 0-1 Knapsack Problem was
presented by Kolesar [67] back in 1967. Since then several variants have
emerged. The most efficient of these are based on depth-first search to
limit the number of live nodes to O(n) at any stage of the enumeration.
Moreover, a greedy strategy for choosing the next item for branching seems
to lead to the best solution times. Among the greedy depth-first algorithms
we may distinguish between two variants: a primal method which builds
feasible solutions from scratch until the weight constraint is exceeded, and
a primal-dual approach which accepts infeasible solutions in a transition
stage.

The primal method dates back to Horowitz and Sahni [54] and several
improved versions have emerged since then. In a recursive formulation, each
iteration corresponds to a branch on x_i, where x_i = 1 is investigated before
x_i = 0 according to the greedy principle. The profit and weight sum of the
currently fixed variables x_j, j < i, is p resp. w; thus if w > c for a given node,
we may backtrack. Also, if the Dantzig upper bound U₁ for the remaining
problem is less than or equal to the current lower bound z, we may backtrack.
The lower bound z is improved during the search, while the optimal solution
vector x = x* is defined during the backtracking.

algorithm hs_branch(i, p, w): boolean;
var improved;
if (w > c) then return false;
if (p > z) then z ← p; improved ← true; else improved ← false;
if (i > n) or (c − w < W_i) or (p + (c − w)p_i/w_i < z + 1)
    then return improved;
if hs_branch(i + 1, p + p_i, w + w_i) then x_i ← 1; improved ← true;
if hs_branch(i + 1, p, w) then x_i ← 0; improved ← true;
return improved;

The original formulation of the Horowitz and Sahni algorithm may insert
more than one item for each branching node, but from a computational
point of view the above formulation is equivalent. To execute the algorithm
properly, all decision variables x_i must be set to 0 and the lower bound set
to z = 0 before calling hs_branch(1, 0, 0).
Notice how the optimal solution vector x is defined during the backtracking of the algorithm. This idea of lazy updating of the solution vector
is due to Martello and Toth [72]. Their mt1 algorithm is based on the same
frame as above, but the Martello-Toth upper bound is used instead of the
Dantzig bound, and a table is used for storing the last derived profit and
weight sums at each branching node. This is implicitly done by the above
recursive formulation. The use of a minimal weight table W_i = min_{j≥i} w_j is
also due to Martello and Toth. Obviously, we may backtrack whenever no
item j ≥ i fits into the knapsack. Finally, the original mt1 algorithm contains a special dominance step: whenever an item is removed, only specific
items can replace it in order to obtain a better solution.

A comprehensive comparison of algorithms built over the above frame
is presented in Martello and Toth [81], showing that there are only minor
differences among the performances of the algorithms for easy instances.
A different approach is what we could call the primal-dual algorithm by
Pisinger [96]. The algorithm starts from the break item b and repeatedly inserts an item if the current weight sum is less than c, and otherwise removes
an item. In this way one only needs to consider solutions which are appropriately filled, but on the other hand infeasible solutions must be considered
in a transition stage. The variables fixed at some value are x_{s+1},…,x_{t−1},
where in the profit and weight sums p, w we implicitly assume that all items
j ≤ s also have been chosen. Thus we insert an item when w ≤ c and remove
an item when w > c. As opposed to the hs_branch algorithm we cannot
backtrack whenever w > c, as infeasible solutions may become feasible by
removal of some items, but must rely on the Dantzig bound U₁ to state when
backtracking should occur.
algorithm pd_branch(s, t, p, w): boolean;
var i, improved;
improved ← false;
if (w ≤ c) then                              {insert some item i ≥ t}
    if (p > z) then z ← p; improved ← true; A ← ∅;
    for i ← t to n do
        if (p + (c − w)p_i/w_i < z + 1) then return improved;
        if pd_branch(s, i + 1, p + p_i, w + w_i)
            then A ← A ∪ {x_i}; improved ← true;
else                                         {remove some item i ≤ s}
    for i ← s downto 1 do
        if (p + (c − w)p_i/w_i < z + 1) then return improved;
        if pd_branch(i − 1, t, p − p_i, w − w_i)
            then A ← A ∪ {x_i}; improved ← true;
return improved;
The algorithm is called as pd_branch(b, b, p̂, ŵ), and initially the lower bound
must be set to z = 0.

As opposed to the hs_branch algorithm we cannot directly define the
solution vector during the backtracking. Instead, all changes performed are

pushed to a stack A. Then the solution vector x is easily defined by first
setting x_j = 1 for j = 1,…,b−1 resp. x_j = 0 for j = b,…,n, and then
changing

    x_j ← 1 − x_j  for x_j ∈ A    (60)
Notice that only solutions with weight sum close to the capacity c are con-
sidered:
Definition 2.1 A balanced filling is a solution x obtained from the break
solution through balanced operations as follows:
• The break solution x̂ is a balanced filling.
• Balanced insert: If we have a balanced filling x with Σ_{j=1}^{n} w_j x_j ≤ c
and change a variable x_t (t ≥ b) from x_t = 0 to x_t = 1, then the new
filling is also balanced.
• Balanced remove: If we have a balanced filling x with Σ_{j=1}^{n} w_j x_j > c
and change a variable x_s (s < b) from x_s = 1 to x_s = 0, then the new
filling is balanced.

Theorem 2.2 (Pisinger [99]) An optimal solution to (15) is a balanced
filling, i.e. it may be obtained through balanced operations.

Proof. Assume that the optimal solution is given by x*. Let s₁,…,s_α be
the indices s_i < b where x*_{s_i} = 0, and t₁,…,t_β be the indices t_i ≥ b where
x*_{t_i} = 1. Order the indices such that s_α < … < s₁ < b ≤ t₁ < … < t_β.
Starting from the break solution x = x̂ we perform balanced operations in
order to reach x*. As the break solution satisfies Σ_{j=1}^{n} w_j x_j ≤ c we must
insert an item t₁, thus setting x_{t₁} = 1. If the hereby obtained weight sum
Σ_{j=1}^{n} w_j x_j is greater than c we remove item s₁ by setting x_{s₁} = 0, otherwise
we insert the next item t₂. Continue this way till one of the following
three situations occurs: (1) All the changes corresponding to {s₁,…,s_α}
and {t₁,…,t_β} have been made, meaning that we have reached the optimal
solution x* through balanced operations. (2) We reach a situation where
Σ_{j=1}^{n} w_j x_j > c and all indices {s_i} have been used but some {t_i} have not
been used. This however implies that x* could not be a feasible solution
from the start, as the knapsack is filled and we still have to insert items. (3)
A similar situation occurs when Σ_{j=1}^{n} w_j x_j ≤ c is reached and all indices
{t_i} have been used, but some {s_i} are missing. This implies that x* cannot
be an optimal solution, as a better feasible solution can be obtained by not
removing the remaining items s_i. □

2.5 Dynamic Programming Algorithms


Bellman [8] presented the first exact solution technique for the Knapsack
Problem by using dynamic programming. The solution technique is based
on a recursive formulation of the problem, which is then evaluated in an
iterative way. Since several subsolutions overlap, these can be saved in a
table of size O(c), thus reaching the pseudo-polynomial complexity O(nc)
for solving KP.

At any stage we let f_i(c̃) for 0 ≤ i ≤ n and 0 ≤ c̃ ≤ c be the optimal
solution value to the subproblem of KP defined on items 1,…,i
with capacity c̃. Then the recursion by Bellman is given by

    f_i(c̃) = f_{i−1}(c̃)                                   for c̃ = 0,…,w_i − 1
    f_i(c̃) = max{f_{i−1}(c̃), f_{i−1}(c̃ − w_i) + p_i}      for c̃ = w_i,…,c    (61)

while we set f₀(c̃) = 0 for c̃ = 0,…,c.
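As an illustration, the recursion (61) can be implemented in a few lines; the sketch below keeps a single table indexed by capacity and scans it downwards, which computes the same values f_n(c) with O(c) space.

    def bellman_kp(p, w, c):
        """Bellman recursion (61) for the 0-1 Knapsack Problem, O(nc) time."""
        f = [0] * (c + 1)                       # f[ct] = best profit using capacity ct
        for pj, wj in zip(p, w):
            for ct in range(c, wj - 1, -1):     # downwards: each item used at most once
                f[ct] = max(f[ct], f[ct - wj] + pj)
        return f[c]

    print(bellman_kp([6, 5, 5], [5, 4, 3], 7))  # -> 10 (items 2 and 3)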


Toth [112] improved the Bellman recursion in several respects: First of all,
dynamic programming by reaching is used (see Ibaraki [57] for a definition)
and dominance is used to delete unpromising states. Moreover, if at the i-th
stage a state has weight sum μ satisfying

    μ < c − Σ_{j=i+1}^{n} w_j    (62)

then the state will never reach a filled solution with μ = c and thus can be
fathomed. Moreover, if

    (63)

then the state cannot be improved further and can thus be fathomed.

2.5.1 Primal-dual Dynamic Programming


The Bellman recursion builds the optimal solution from scratch by repeatedly adding a new item to the problem. A more efficient recursion [99]
should however take into account that generally only a few items around b
need to be changed from their LP-optimal values in order to obtain the IP-optimal values. Thus, assume that the items are ordered according to (20),
and let f_{s,t}(c̃), (s ≤ b, t ≥ b − 1, 0 ≤ c̃ ≤ 2c) be the optimal solution value to the
problem:

    f_{s,t}(c̃) = max{ Σ_{j=1}^{s−1} p_j + Σ_{j=s}^{t} p_j x_j :
                       Σ_{j=1}^{s−1} w_j + Σ_{j=s}^{t} w_j x_j ≤ c̃,
                       x_j ∈ {0,1} for j = s,…,t }.    (64)

We may use the following recursion for the enumeration

    f_{s,t}(c̃) = max{ f_{s,t−1}(c̃)                  if t ≥ b,
                       f_{s,t−1}(c̃ − w_t) + p_t      if t ≥ b, c̃ − w_t ≥ 0,
                       f_{s+1,t}(c̃)                  if s < b,
                       f_{s+1,t}(c̃ + w_s) − p_s      if s < b, c̃ + w_s ≤ 2c }    (65)

setting f_{b,b−1}(c̃) = −∞ for c̃ = 0,…,ŵ − 1 and f_{b,b−1}(c̃) = p̂ for c̃ =
ŵ,…,2c. Thus the enumeration starts at (s, t) = (b, b − 1) and continues
by either removing an item s from the knapsack, or inserting an item t into
the knapsack. An optimal solution to KP is found as f_{1,n}(c).

Since generally far fewer states than 2c need be considered at each stage,
the algorithm may be modified to dynamic programming by reaching, obtaining a time bound O(2^{t−s+1}) for enumerating an interval [s, t] of items
around the break item. The pseudo-polynomial time bound is also valid,
giving a different bound O(c(t − s + 1)).

2.5.2 Horowitz and Sahni Subdivision


Horowitz and Sahni [54] presented a different recursion based on the subdivision of the original problem of n variables into two subproblems of n/2
variables each. For each of the subproblems a normal dynamic programming recursion
(61) is used to enumerate all states, and the two sets are easily combined in
time proportional to the set sizes by considering pairs of states which have
the largest possible weight sum not exceeding c.

The worst-case complexity of the Horowitz and Sahni algorithm is O(2^{n/2}),
as well as the pseudo-polynomial bound O(nc). Thus, for difficult problems
the recursion improves by a square root over a complete enumeration of
magnitude O(2^n).
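A compact sketch of the idea follows; for brevity it enumerates each half by brute force instead of recursion (61), prunes dominated (weight, profit) pairs, and then combines the two sorted lists with a single backwards scan, so the combination step is linear in the list sizes. All names are ours.

    from itertools import combinations

    def half_states(p, w, c):
        """All undominated (weight, profit) pairs of subsets of one half."""
        states = sorted(
            (sum(w[j] for j in s), sum(p[j] for j in s))
            for r in range(len(p) + 1)
            for s in combinations(range(len(p)), r)
            if sum(w[j] for j in s) <= c)
        pruned, best = [], -1
        for wt, pr in states:          # keep profit strictly increasing with weight
            if pr > best:
                pruned.append((wt, pr))
                best = pr
        return pruned

    def hs_kp(p, w, c):
        h = len(p) // 2
        A = half_states(p[:h], w[:h], c)     # first half, increasing weight
        B = half_states(p[h:], w[h:], c)     # second half, increasing weight
        best, i = 0, len(B) - 1
        for wa, pa in A:                     # as wa grows, the B-pointer only moves left
            while i >= 0 and B[i][0] > c - wa:
                i -= 1
            if i < 0:
                break
            best = max(best, pa + B[i][1])
        return best

    print(hs_kp([6, 5, 5, 4], [5, 4, 3, 2], 7))   # -> 10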

2.5.3 Other Dynamic Programming Algorithms


A final recursion can be obtained based on the principle of balanced solutions
described in Definition 2.1. Pisinger [99] presented a dynamic programming
algorithm running in O(n r₁ r₂), where r₁ = max w_j and r₂ ≤ max p_j is the
gap between any upper and lower bounds on the problem. The algorithm is
however nontrivial, so a simplified version will be presented in the section
on Subset-sum Problems.
A topic which has not yet been dealt with is how the solution vector x
corresponding to the optimal solution value z should be found. Bellman gave
the simple answer that one should follow the states backwards and monitor
those choices that lead to the optimal solution. This however means that
all states must be saved during the solution process, which may be quite a
hindrance for large-sized problems.

A different approach is to save only the last two levels of the recursion at
any stage, thus reducing the space complexity to O(c). In this way one can
get information about the optimal value of the last variable x_n, and previous
solution values can be found by repeatedly solving the problem with the last
variable removed. In this way, the computational effort however increases
to O(n²c).

Munro and Ramirez [87] improved this technique further. In their version, every state f_i(c̃) of recursion (61) with i > n/2 should save the corresponding profit and weight sum at level i = n/2. When the optimal solution
value is found as f_n(c), then the problem of finding the solution vector may
be decomposed into two equally large subproblems of n/2 items each. Recursively applying this approach one gets the space bound O(c) and a time
bound O(cn log n).
The Munro and Ramirez technique cannot be directly used for primal-dual dynamic programming algorithms like recursion (65), since these are
designed to terminate the enumeration before all items are considered, and
thus the subdivision at n/2 is not optimal. Instead, for each state f_{s,t}(c̃),
one should save the corresponding profit and weight sum from the point where the core size
t − s + 1 most recently became a power of two. This basically means that
each time t − s + 1 = 2^k, the current profit and weight sums are saved
as part of all states, and these values follow all succeeding states. The space
bound is again O(c) and the time bound for deriving an optimal solution
becomes O(c|C| log |C|) for a final core C.

2.6 Solution of Large-sized Problems


When large-sized knapsack problems are solved it may be beneficial to con-
sider a core problem - basically a problem defined on a subset of the items
where there is a high probability of finding an optimal solution. The first
algorithm of this kind was presented by Balas and Zemel [6] where the core

problem was used to avoid a complete sorting of the items according to (20).
The concept of solving a core problem unfortunately has some draw-
backs, as pointed out by Pisinger [97] since it tends to "count the chickens
before they are hatched". However, the technique is important as it allows
one to focus the search on those items that are most interesting according
to general knowledge. In Fig. 2 we saw that an optimal solution generally
corresponds to the break solution apart from some items close to the break
item which have been changed. Thus obviously it is more interesting to start
the enumeration around the break item than, say, at item 1.
Balas and Zemel [6] presented it as follows: Assume that the items are
ordered according to nonincreasing profit-to-weight ratios as in (20), and let
an optimal solution be given by x*. Now let

    j₁ = min{j : x*_j = 0},    j₂ = max{j : x*_j = 1}    (66)

then the core is given by the items in the interval C = [j₁, j₂] and the core
problem is defined as

    maximize   Σ_{j∈C} p_j x_j + P
    subject to Σ_{j∈C} w_j x_j ≤ c − W,    (67)
               x_j ∈ {0,1},  j ∈ C

where P = Σ_{j<j₁} p_j and W = Σ_{j<j₁} w_j. For several classes of instances,
with strongly correlated problems as an important exception, the size of the
core is a small fraction of n. Hence if we knew a priori the values of j₁ and
j₂, we could easily solve the complete problem by setting x*_j = 1 for j < j₁
and x*_j = 0 for j > j₂, and simply solving the core problem through any
enumerative algorithm.

2.6.1 Deriving a Core


The definition of the core C is based on knowledge of the optimal solution
x*. Since this knowledge is obviously not present, most algorithms rely on
an approximated core, basically choosing 2δ items around the break item.
The following algorithm finds the break item b and sorts the 2δ items around
b such that the core is given by C = [s, t] = [b − δ, b + δ]. The algorithm
is a quicksort algorithm with slight modifications such that the interval
containing b is always partitioned first. This means that b is well-defined

when the second interval is considered upon backtracking, and thus it is
trivial to check whether the interval is outside the wanted core C.

algorithm findcore(s, t, w);
comment: [s, t] is an interval to be partitioned, and w = Σ_{j=1}^{s−1} w_j.
As in bal_zem, partition [s, t] into two intervals satisfying p_j/w_j ≥ γ
for j ∈ [s, i − 1] and p_j/w_j ≤ γ for j ∈ [i, t].
Moreover set w̄ = Σ_{j=1}^{i−1} w_j.
if (w̄ ≤ c) and (c < w̄ + w_i) then b ← i;
if (w̄ > c) then
    findcore(s, i − 1, w);
    if (i ≤ b + δ) then findcore(i, t, w̄) else L ← L ∪ (i, t)
else
    findcore(i, t, w̄);
    if (i − 1 ≥ b − δ) then findcore(s, i − 1, w) else H ← H ∪ (s, i − 1)

The procedure is called findcore(1, n, 0), and it returns with the items in C =
[b − δ, b + δ] sorted, while all preceding items have profit-to-weight ratios
larger than those of the items in the core, and succeeding items have smaller
profit-to-weight ratios.

As already seen, the findcore algorithm is a normal sorting algorithm
apart from the fact that only intervals containing the break item are partitioned further. Thus all discarded intervals represent a partial ordering
of the items according to p_j/w_j, which may be used later in an enumerative
algorithm. Hence, Pisinger [96] proposed to push the delimiters of discarded
intervals onto two stacks H and L as sketched in the algorithm.
Several proposals have been made as to how large a core should be
chosen. Balas and Zemel [6] claimed that the size of the core is constant,
|C| = 25. Martello and Toth [79] however chose |C| = √n, since more items
are needed in order to prove optimality. Pisinger [105] found the minimal
core size according to definition (66) for several problems. These are given
in Table 1 as average values of 100 instances. For comparison, Table 2 gives
the minimal core size which is necessary in order to prove optimality of the
solution by a branch-and-bound algorithm using continuous bounds. The
instances are generated with weights distributed in [1, R], having different
problem sizes n. Notice that difficult problems demand a very large core
to find the optimal solution, and even more items need to be enumerated
before optimality can be proved. The entries considered are average values
for different capacities c, and there may be a large variation in the core size
depending on c.

Table 1: Minimal core size (number of items). Average of 100 instances

         uncorr.    weak.corr   str.corr   inv.str.corr  al.str.corr  subs.sum   sim.w
  n\R   10³   10⁴   10³   10⁴   10³   10⁴   10³   10⁴    10³   10⁴    10³   10⁴   10⁵
    50    4     3     8     8    10    12     7     9     12    12     11    15     0
   100    5     5    10    12    12    13    11    12     17    20     10    14     0
   200    6     7    12    14    18    18    14    17     24    26     10    13     0
   500    8     8    15    17    25    25    21    23     32    41     10    13     0
  1000    9    11    14    17    34    36    31    32     41    59     10    13     0
  2000   11    13    13    20    46    44    43    40     53    72     10    13     0
  5000   12    14    11    21    76    79    65    62     68    97     10    13     0
 10000   11    17    16    25   105   104    93    88     71   106     10    13     0

Table 2: Core size needed to prove optimality (number of items). Average of 100 instances

         uncorr.    weak.corr   str.corr     inv.str.corr  al.str.corr  subs.sum  sim.w
  n\R   10³   10⁴   10³   10⁴   10³    10⁴    10³    10⁴    10³   10⁴   10³  10⁴   10⁵
    50    5     5    11    11     20     21     19     20     20    19    12   16    15
   100    7     7    13    14     38     39     35     37     38    40    12   15    32
   200    9     9    16    16     76     81     68     80     73    70    11   15    64
   500   12    12    17    22    192    194    180    197    159   170    11   15   153
  1000   13    16    17    24    392    407    369    393    261   332    11   15   302
  2000   15    18    16    27    739    693    722    649    439   526    11   14   595
  5000   15    21    12    28   1956   2034   1774   1876    731   998    11   15  1460
 10000   14    24    12    29   3889   3768   3633   3926    804  1301    11   15  2585
Goldberg and Marchetti-Spaccamela [46] claimed that the core size increases with n, and thus the core size |C| = 25 proposed by Balas and
Zemel is not correct. Table 1 does not give a unique answer
to this question, as the core size apparently depends on n as well as on the
instance type and on the range R of the weights. Apart from the strongly
correlated instances, choosing |C| = 50 should be sufficient.

2.6.2 Fixed-core Algorithms


Algorithms which solve a fixed core have been presented by Balas and Zemel
[6], Fayard and Plateau [29], and Martello and Toth [79]. The latter, which
is the most general of these algorithms, may be sketched as follows:

algorithm mt2

1 Determine an approximate core C = [b − δ, b + δ] using the findcore
algorithm. Sort the items in the core and solve the core problem to
optimality, deriving a heuristic solution z_C.

2 Derive the enumerative bound U_C given by (33). If U_C ≤ Σ_{j=1}^{b−δ−1} p_j +
z_C, then the core solution is optimal, and we may terminate.

3 Reduce the problem by using the Martello-Toth version of reduction
(58).

4 If all variables j ∉ C have been fixed at their optimal value, the core
solution is optimal and we may terminate.

5 Sort the items not fixed at their optimal value, and solve the remaining
problem to optimality.

The core size in Step 1 is chosen as δ = √n for n ≥ 100, while |C| = n
for smaller instances, meaning that all items are considered for small problems. The enumerative algorithm used in Steps 1 and 5 is the mt1 algorithm
described in Section 2.4.
The algorithm by Balas and Zemel [6] differs from the above frame by
only finding an approximate solution to the core problem in Step 1, and by
choosing a core size of |C| = 50. The Dantzig upper bound is used in Step 2
instead of the tighter enumerative bound. Finally, the enumeration in Step
5 is based on a branch-and-bound algorithm by Zoltners [118].

The Fayard and Plateau algorithm [29] basically solves a core of size
one, since only the break item is derived, using algorithm bal_zem. However, knowing b, a heuristic solution may be derived by using the greedy
algorithm without sorting. The items are then reduced using the Dembo
and Hammer bounds (57). Finally, the remaining problem is enumerated by
using a branch-and-bound algorithm specialized for this problem.

A detailed description of all the above algorithms can be found in Martello
and Toth [80], including some computational comparisons of the algorithms.

2.6.3 Expanding-core Algorithms


Realizing that the core is only a guess at which items may be the most
interesting to consider, a different principle is to use expanding-core algorithms, which simply start with an empty core and add more items when
needed. Using branch-and-bound for the enumeration, one gets the expknap
algorithm presented by Pisinger [96]:

procedure expknap

1 Find the core C = [b, b] through algorithm findcore, pushing all discarded
intervals onto two stacks H and L containing respectively items with
higher or lower profit-to-weight ratios than that of the break item b.

2 Find a heuristic solution z using the forward and backward heuristics
(52) and (53).

3 Run algorithm pd_branch, at each step testing whether the set [s, t] is within
the current core C. If this is not the case, the next interval from H or
L is popped, and the items are reduced using (57) for fixing variables
at their optimal values. The remaining variables are sorted according
to nonincreasing profit-to-weight ratios and added to the core C.

Since the sorting and reduction are done only as they are needed, no more effort
will be made than absolutely necessary. Also, the reductions are tighter when
postponed in this way, since one may expect that a better lower bound is
available the later a reduction test is performed in an enumeration.
The branch-and-bound algorithm may however follow an unpromising
branch to completion, thus basically extending the core with all items, even
if the instance could be solved with a smaller core by following a different
branch. This inadequate behavior is avoided by using dynamic programming,
since the breadth-first search basically ensures that all variations of the
solution vector have been considered before a new item is added to the core.
This idea is due to [105] and can be sketched as:

algorithm minknap

1 Find an initial core C = [b, b] as in the expknap algorithm.

2 Run the dynamic programming recursion (65), alternately inserting an
item t or removing an item s. Check whether s, t are within the current core
C; otherwise pick the next interval from H or L, reduce and sort the
items, finally adding the remaining items to the core.

3 The process stops when all states in the dynamic programming have
been fathomed due to an upper bound test, or all items j ∉ C have
been fixed at their optimal value.

Both algorithms have the property that the core size adapts to the problem,
but due to the breadth-first search of the dynamic programming, one can
show that

Theorem 2.3 (Pisinger [105]) The core enumerated by algorithm minknap
is minimal.

In this context, "minimal" means that one cannot find a smaller
core of symmetrical size around the break item b, such that a fixed-core algorithm like mt2 will terminate before reaching the last enumeration
in Step 5. Tables 1 and 2 are immediate results of the above theorem.

2.6.4 Inconveniences in Core Problems


For large-sized problems it is generally beneficial to focus the search on those
items which are most interesting, i.e. to solve a core problem. But in some
situations this focusing makes the problem more difficult than the original
complete KP. This behavior has been studied by Pisinger [97].

A core is by definition a collection of items whose profit-to-weight
ratios are close to that of the break item b. Pisinger's experiments however
demonstrated that in some cases this definition of a core results in many
items with similar or proportional weights. This degeneration of the core
results in a difficult problem, since with many proportional weights it is
difficult to obtain a solution which fills the knapsack. With this lack of a
good lower bound, many algorithms get stuck in the enumeration of the
core. The paradox is that in most situations one can find an item which fills
the knapsack to completion by considering items outside the core, but the
algorithm which solves the core problem cannot "see" this item.

[Figure 4: Average log computational times for mt2 in seconds, as a function
of the capacity c. Weakly correlated instances, n = 3000, R = 100. The core
has size |C| = 100.]

A computational comparison in [97] showed that branch-and-bound algorithms are sensitive to the choice of the core, while dynamic programming algorithms are generally not affected in their performance, since similar weights are handled efficiently by dominance criteria. The degeneration
of a core however only occurs for large-sized instances and for some choices
of the capacity, as illustrated in Fig. 4. When c is about 45% of the total
weight sum Σ_{j=1}^{n} w_j, the core problem is 10 times more difficult to solve
than when c is 50% of the weight sum. Methods to avoid such inadequate
behavior are either choosing a larger core, or putting a time limit on the
enumeration of the core (see e.g. Martello and Toth [83]).

2.7 Solution of Hard Problems


Martello and Toth [83] recently published an algorithm mthard specially
designed for hard problems, which uses cardinality bounds to effectively
limit the enumeration.

Before starting the branch-and-bound algorithm with cardinality bounds,
an attempt is made to solve the problem by less demanding techniques. Thus
the mt2 algorithm is initially called with a limit on the number of branching
steps it may perform. If mt2 terminates with an optimal solution z, the
algorithm stops.

If the mt2 algorithm halts without proving optimality of the solution
found, the branching process starts. The upper bounds are derived by using
the cardinality bounds described in Section 2.1.3. If the upper bound u
derived satisfies u > z, then a branching step is performed by exploring the
dichotomy on the next binary variable x_j. Let x̄_j be the value of x_j in a
continuous solution to KP. Then the branch x_j = ⌈x̄_j⌉ is made first, thus
following the greedy principle.

The algorithm is further improved by using greedy techniques to improve
z, dominance steps to fathom dominated nodes, and finally partial enumeration to build a table of optimal profit sums for small residual capacities.

2.8 Approximation Schemes


As KP is NP-hard, some instances may be impossible to solve to optimality within a reasonable amount of time. In such situations one may
be interested in an approximate solution with objective value z, where the
relative error is bounded from above by a certain constant ε, i.e.

    (z* − z)/z* ≤ ε,    (68)

where z* is the optimal objective value. Fully polynomial approximation
schemes must satisfy the condition that for any ε > 0 they find a feasible
solution satisfying (68) in time polynomially bounded by the size of the
problem and by 1/ε.
Ibarra and Kim [58] presented the first fully polynomial approximation
scheme for the 0-1 Knapsack Problem. The algorithm is based on dynamic
programming, where state space relaxation (see Ibaraki [57] for a thorough
treatment of this field) is used in order to limit the number of possible states.
Since the relative error caused by scaling profits is larger for the small profits,
Ibarra and Kim divide the items into those with large profits and those
with small profits. The first group of items is enumerated through dynamic
programming, and a greedy algorithm is used to improve the enumerated
states by adding some additional items from the second group.

algorithm iba_kim
Assume that the items are ordered according to nonincreasing profit-to-weight ratios (20) and let the break item be defined by b = min{h : Σ_{j=1}^{h} w_j >
c}. If Σ_{j=1}^{b−1} w_j = c then the break solution z = p̂ is optimal, and the algorithm stops. Otherwise let z̄ = Σ_{j=1}^{b} p_j be an upper bound on the objective
value. Obviously an optimal solution z* must satisfy

    z̄/2 ≤ z* < z̄,    (69)

since z* ≥ max{z̄ − p_b, p_b}, thus 2z* ≥ z̄. The upper bound z̄ is used to
partition the items into two groups such that

    p_j > z̄ε/3,  for j = 1,…,s,
    p_j ≤ z̄ε/3,  for j = s + 1,…,n    (70)

still preserving the ordering (20) on each of the intervals.
State space relaxation is used for the dynamic programming algorithm;
thus scale the profits by a factor δ = z̄(ε/3)², obtaining p̃_j = ⌊p_j/δ⌋
for j = 1,…,s. As z̄ is an upper bound on the objective value, it is not
possible to obtain scaled profits larger than q = ⌊z̄/δ⌋ = ⌊(3/ε)²⌋ in the dynamic
programming. Let f_i(π), (0 ≤ π ≤ q, 0 ≤ i ≤ s) be the smallest weight
sum such that a solution with scaled profit sum equal to π can be obtained
on the variables j = 1,…,i. Thus

    f_i(π) = min{ Σ_{j=1}^{i} w_j x_j : Σ_{j=1}^{i} p̃_j x_j = π, x_j ∈ {0,1}, j = 1,…,i }.    (71)

The normal Bellman recursion can be used to evaluate f_i as

    f_i(π) = f_{i−1}(π)                                  for π = 0,…,p̃_i − 1
    f_i(π) = min{f_{i−1}(π), f_{i−1}(π − p̃_i) + w_i}     for π = p̃_i,…,q    (72)

setting f₀(π) = ∞ for π = 1,…,q and f₀(0) = 0. Now for all states
f_s(π), π = 0,…,q, where f_s(π) ≠ ∞, a greedy solution is found by inserting
some items j = s+1,…,n into the knapsack to fill the residual capacity
c − f_s(π). Let z be the objective value of the best heuristic solution obtained
in this way.
Theorem 2.4 The space and time complexity of iba_kim is O(n/ε²), and
hence polynomial in n and 1/ε.

Proof. The dynamic programming part considers q = ⌊(3/ε)²⌋ states at each
stage i = 1,…,s, which gives the space complexity O(nq), i.e. O(n/ε²).
The time complexity is O(nq) for the dynamic programming, while the
heuristic filling demands considering n − s items for each value of π =
0,…,q, giving the complexity O(nq). Thus the time bound becomes O(n/ε²).
Actually the time bound should also embrace the initial sorting, which is
however obviously polynomial in n. □

Theorem 2.5 For any instance of KP we have (z* − z)/z* ≤ ε, where z* is
the optimal solution value and z is the heuristic value returned by the above
algorithm.

See Ibarra and Kim [58] or Martello and Toth [80] for a proof.
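A simplified sketch of the scheme in Python is shown below. It follows the steps above (split at z̄ε/3, scale by δ, dynamic programming over the large items, greedy completion with the small ones) but omits the chapter's refinements, so it should be read as an illustration rather than a faithful implementation; all names are ours.

    def iba_kim(p, w, c, eps):
        """Fully polynomial approximation scheme, simplified sketch."""
        order = sorted(range(len(p)), key=lambda j: p[j] / w[j], reverse=True)
        p = [p[j] for j in order]; w = [w[j] for j in order]
        zbar, cap, b = 0, c, len(p)
        for j in range(len(p)):              # find the break item and the break profit
            if w[j] > cap:
                b = j
                break
            cap -= w[j]; zbar += p[j]
        if b < len(p):
            zbar += p[b]                     # zbar = p_1 + ... + p_b, upper bound (69)
        delta = zbar * (eps / 3) ** 2        # scaling factor
        q = int(zbar / delta)                # largest scaled profit, floor((3/eps)^2)
        big = [j for j in range(len(p)) if p[j] > zbar * eps / 3]
        small = [j for j in range(len(p)) if p[j] <= zbar * eps / 3]
        INF = float('inf')
        f = [0.0] + [INF] * q                # f[pi] = min weight with scaled profit pi
        prof = [0.0] * (q + 1)               # true profit of the state behind f[pi]
        for j in big:                        # recursion (72), single-row form
            pt = int(p[j] // delta)
            for pi in range(q, pt - 1, -1):
                if f[pi - pt] + w[j] < f[pi]:
                    f[pi] = f[pi - pt] + w[j]
                    prof[pi] = prof[pi - pt] + p[j]
        best = 0.0
        for pi in range(q + 1):
            if f[pi] > c:
                continue
            z, cap = prof[pi], c - f[pi]
            for j in small:                  # greedy fill with the small items
                if w[j] <= cap:
                    z += p[j]; cap -= w[j]
            best = max(best, z)
        return best

    print(iba_kim([9, 8, 6, 3, 2], [10, 9, 7, 4, 3], 16, 0.5))   # -> 14, the optimum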

2.9 Computational Experiments


We will compare the performance of minknap, mthard and mt2. With mt2
as a reference point, readers can go further back using [80].

We will consider how the algorithms behave for different problem sizes,
instance types, and data ranges. Six types of randomly generated data
instances are considered, as sketched below. Each type will be tested with
data ranges R = 1000 and 10 000 for different problem sizes n. We consider
the following problems: Uncorrelated instances are generated by choosing
p_j, w_j randomly in [1, R]. Weakly correlated instances: the weights w_j are
distributed in [1, R], and the profits p_j in [w_j − R/10, w_j + R/10] such that
p_j ≥ 1. Strongly correlated instances: the weights w_j are distributed in
[1, R], and p_j = w_j + R/10. Inverse strongly correlated instances: the profits
p_j are distributed in [1, R], and w_j = p_j + R/10. Almost strongly correlated
instances: the weights w_j are distributed in [1, R], and the profits p_j in
[w_j + R/10 − R/500, w_j + R/10 + R/500]. Subset-sum problems: the weights
w_j are randomly distributed in [1, R] and p_j = w_j. Uncorrelated problems
with similar weights: the weights are distributed in [100 000, 100 100] and the
profits p_j in [1, 1000].

We consider series of H = 100 instances for each instance type.
The capacity of each instance is chosen as c = (h/(H+1)) Σ_{j=1}^{n} w_j, for test h in
a series of H tests. This is to smooth out variations due to the choice of
capacity, as described in Section 2.6.4. To respect the assumption that every
item fits into the knapsack, the capacity is however chosen not smaller than
the largest weight.
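For concreteness, the following sketch generates these instance families (the function and its argument names are ours; randint draws uniformly):

    import random

    def make_instance(kind, n, R, h, H):
        """Generate profits, weights and capacity for test h of a series of H."""
        w = [random.randint(1, R) for _ in range(n)]
        if kind == 'uncorrelated':
            p = [random.randint(1, R) for _ in range(n)]
        elif kind == 'weakly_correlated':
            p = [max(1, random.randint(wj - R // 10, wj + R // 10)) for wj in w]
        elif kind == 'strongly_correlated':
            p = [wj + R // 10 for wj in w]
        elif kind == 'inverse_strongly_correlated':
            p = [random.randint(1, R) for _ in range(n)]
            w = [pj + R // 10 for pj in p]
        elif kind == 'almost_strongly_correlated':
            p = [random.randint(wj + R // 10 - R // 500, wj + R // 10 + R // 500) for wj in w]
        elif kind == 'subset_sum':
            p = w[:]
        elif kind == 'similar_weights':
            w = [random.randint(100000, 100100) for _ in range(n)]
            p = [random.randint(1, 1000) for _ in range(n)]
        c = max(max(w), h * sum(w) // (H + 1))   # series capacity, >= largest weight
        return p, w, c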
Tables 3 to 5 compare the solution times of the three algorithms. All
tests were run on a HP9000/735, and a time limit of 5 hours was assigned
to each instance type for all H instances. If some of the instances were not
solved within the time limit, this is indicated by a dash in the table.
The oldest code, i.e. mt2, has the overall worst performance, although it
is quite fast on easy instances like the uncorrelated and weakly correlated
ones. Moreover it is the fastest code for the Subset-sum instances. For
strongly correlated instances, mt2 performs badly, as it is able to solve only
tiny instances.
The dynamic programming algorithm minknap has an overall stable per-
formance, as it is able to solve all instances within reasonable time. It is
the fastest code for uncorrelated and weakly correlated instances, and it
has almost as good performance for the Subset-sum instances as mt2. The
strongly correlated instances take considerably more time to be solved but
Table 3: Average solution times in seconds (mt2)

         uncorr.     weak.corr    str.corr      inv.str.corr  al.str.corr   subs.sum    sim.w
  n\R   10³   10⁴   10³   10⁴    10³    10⁴    10³    10⁴    10³    10⁴    10³   10⁴    10⁵
    50  0.00  0.00  0.00  0.00   0.06   0.04   0.01   0.02   0.03   0.03   0.00  0.01   0.02
   100  0.00  0.00  0.00  0.00  26.26  24.78   4.44      -   5.90  16.02   0.00  0.01   3.28
   200  0.00  0.00  0.00  0.00      -      -      -      -      -      -   0.00  0.02      -
   500  0.00  0.00  0.01  0.01      -      -      -      -      -      -   0.00  0.02      -
  1000  0.00  0.01  0.01  0.02      -      -      -      -      -      -   0.00  0.02      -
  2000  0.01  0.01  0.01  0.04      -      -      -      -      -      -   0.00  0.02      -
  5000  0.01  0.02  0.01  0.08      -      -      -      -      -      -   0.01  0.02      -
 10000  0.02  0.05  0.02  0.13      -      -      -      -      -      -   0.01  0.03      -

Table 4: Average solution times in seconds (minknap)

         uncorr.     weak.corr    str.corr      inv.str.corr  al.str.corr   subs.sum    sim.w
  n\R   10³   10⁴   10³   10⁴    10³     10⁴    10³     10⁴   10³    10⁴    10³   10⁴    10⁵
    50  0.00  0.00  0.00  0.00   0.00   0.02   0.00    0.02   0.00   0.01   0.00  0.03   0.00
   100  0.00  0.00  0.00  0.00   0.02   0.17   0.01    0.18   0.01   0.03   0.00  0.03   0.00
   200  0.00  0.00  0.00  0.00   0.05   0.82   0.04    0.65   0.04   0.15   0.00  0.03   0.01
   500  0.00  0.00  0.00  0.00   0.20   2.52   0.19    2.80   0.16   0.88   0.00  0.03   0.03
  1000  0.00  0.00  0.00  0.01   0.48   8.30   0.45    7.59   0.37   3.18   0.00  0.03   0.10
  2000  0.00  0.00  0.00  0.01   0.96  13.17   1.09   14.16   0.72   8.57   0.00  0.03   0.35
  5000  0.00  0.01  0.01  0.02   3.73  54.11   3.20   54.66   1.63  26.57   0.01  0.04   1.32
 10000  0.01  0.01  0.01  0.0?   ?.??  115.4?  6.57  12?.84   1.83  48.33   0.01  0.04   1.57

(Entries marked "?" are illegible in the source.)

Table 5: Average solution times in seconds (mthard)

         uncorr.     weak.corr    str.corr    inv.str.corr  al.str.corr   subs.sum    sim.w
  n\R   10³   10⁴   10³   10⁴   10³    10⁴   10³    10⁴    10³    10⁴    10³   10⁴    10⁵
    50  0.00  0.00  0.00  0.00  0.01   0.01  0.01   0.01   0.01   0.03   0.00  0.01   0.00
   100  0.00  0.00  0.00  0.00  0.01   0.02  0.01   0.01   0.03   0.14   0.00  0.01   0.01
   200  0.00  0.00  0.00  0.00  0.04   0.05  0.03   0.04   0.06   0.36   0.00  0.02   0.03
   500  0.00  0.00  0.01  0.01  0.09   0.09  0.08   0.09   0.10   0.75   0.00  0.01   0.06
  1000  0.01  0.01  0.01  0.02  0.15   0.23  0.14   0.16   0.19   1.01   0.00  0.02   0.11
  2000  0.01  0.02  0.01  0.03  0.17   0.38  0.18   0.23   0.31   0.81   0.00  0.01   0.18
  5000  0.02  0.04  0.02  0.06  0.17   1.75  0.33   0.66   2.55   1.46   0.00  0.02   0.24
 10000  0.04  0.08  0.02  0.10  0.28   5.89  0.48   1.64      -      -   0.01  0.68   0.35

although minknap uses simple bounds from continuous relaxation, the pseudo-polynomial time complexity gets it safely through these instances.

The mthard algorithm has an excellent overall performance, as it is able
to solve nearly all problems within seconds. The short solution times are
mainly due to the cardinality bounds described in Section 2.1.3, which make
it possible to terminate the branching after a few nodes have been explored.
There are however a few anomalous entries for large-sized almost strongly
correlated problems, where the cardinality bounds somehow fail. Also for
some large-sized Subset-sum problems the mthard algorithm takes an unnecessarily long time.

3 Subset-sum Problem
Assume that a knapsack of capacity c is given, and that a subset of n items
should be selected for the knapsack. Item j has weight w_j, and we want
to obtain the largest weight sum not exceeding c. Thus the Subset-sum
Problem (SSP) can be stated as:

    maximize   Σ_{j=1}^{n} w_j x_j
    subject to Σ_{j=1}^{n} w_j x_j ≤ c,    (73)
               x_j ∈ {0,1},  j = 1,…,n,

where x_j = 1 if item j is selected and x_j = 0 otherwise. The Subset-sum
Problem is related to the diophantine equation

    Σ_{j=1}^{n} w_j x_j = c,    (74)
    x_j ∈ {0,1},  j = 1,…,n,

in the sense that the optimal solution value of SSP is the largest capacity not
exceeding c for which (74) has a solution. SSP is also called the Value-independent Knapsack Problem,
since it is a special case of the 0-1 Knapsack Problem arising when p_j = w_j
for all j = 1,…,n. Hence without loss of generality we may assume that all
weights w_j and the capacity c are nonnegative integers, that Σ_{j=1}^{n} w_j > c,
and finally that w_j < c for all j = 1,…,n. Violations of these assumptions
may be handled as described in Section 2.

The Subset-sum Problem appears in several cutting problems where
as much material as possible should be cut from a length c. SSP also
appears as a subproblem in Multiple Knapsack Problems [101], and it can be
used to strengthen constraints in the formulation of an integer programming
problem [19, 20, 27].

For SSP, all upper bounds based on some kind of continuous relaxation
as described in Sections 2.1-2.1.2 give the trivial bound u = c. Thus although SSP in principle can be solved by every KP algorithm, the lack of
tight bounds may imply an unacceptably large computational effort. This is
however seldom the case when the range of the weights is not large, as such
instances generally have many solutions to (74), and thus the enumeration
may be terminated as soon as such a solution is found (see e.g. Tables 3-5 in
Section 2.9).

Also, owing to the lack of tight bounds, it may be necessary to apply heuristic techniques to obtain a reasonable solution within limited time. Several
approximate algorithms are considered in Martello and Toth [80]. Recently
Gens and Levner [42] presented a fully polynomial approximation scheme
with an improved time bound. The algorithm finds an approximate solution with relative error less than ε in time O(min{n/ε, n + 1/ε³}) and space
O(min{n/ε, n + 1/ε²}).

Probabilistic results for SSP are considered in d'Atri and Puech [3]. Assuming that the weights are independently drawn from a uniform distribution over {1,…,c(n)} and the capacity from a uniform distribution over
{1,…,nc(n)}, where c(n) is an upper bound on the weight values, they
proved that a simple variant of the greedy algorithm solves SSP with probability tending to 1.

We will start the treatment of SSP by considering upper bounds tighter


than u = c in Section 3.1. Then Section 3.2 deals with different dynamic
programming algorithms: If the coefficients are not too large, dynamic pro-
gramming algorithms are generally able to solve Subset-sum Problems even
where the diophantine equation has no solution. In Section 3.3 we will
present some hybrid algorithms which combine branch-and-bound with dy-
namic programming. Section 3.4 shows how large-sized instances can be
solved by defining a core problem, and finally Section 3.5 gives a computa-
tional comparison of the algorithms presented.

3.1 Upper Bounds


As previously mentioned, all upper bounds for SSP based on some kind of
continuous relaxation yield the trivial bound u = c. The sets of feasible solutions of a KP and an SSP are identical, thus the same polyhedral properties
apply. In particular, this means that minimum and maximum cardinality
bounds are defined as in Section 2.1.3. Starting with the maximum cardinality bound, assume that the weights are ordered by w₁ ≤ w₂ ≤ … ≤ w_n,
and let the break item b be defined by b = min{h : Σ_{j=1}^{h} w_j > c}. We may
now impose the maximum cardinality constraint on (73) as

    Σ_{j=1}^{n} x_j ≤ b − 1.    (75)

An obvious way of using this constraint is to choose the b − 1 largest weights,
w̄ = Σ_{j=n−b+2}^{n} w_j, getting the bound

    u = min{c, w̄}.    (76)

If we surrogate relax the cardinality constraint, using a nonnegative multiplier λ, we get the problem S(SSP, λ) given by

    maximize   Σ_{j=1}^{n} w_j x_j
    subject to Σ_{j=1}^{n} (w_j + λ)x_j ≤ c + λ(b − 1),    (77)
               x_j ∈ {0,1},  j = 1,…,n,

which is an inverse strongly correlated knapsack problem. Similarly, Lagrangian relaxing the cardinality constraint leads to a strongly correlated
knapsack problem. Since in Section 2 we saw that these instances may be
difficult to solve, one may consider the continuous relaxation instead. For
C(S(SSP, λ)) the items are already ordered according to nondecreasing ratios w_j/(w_j + λ); thus let the break item be defined by b' = max{h :
Σ_{j=h}^{n} w_j > c}. The continuous solution is then given by

    z(C(S(SSP, λ))) = Σ_{j=b'+1}^{n} w_j + (c + λ(b − 1) − Σ_{j=b'+1}^{n} (w_j + λ)) · w_{b'}/(w_{b'} + λ)    (78)

Since n − b' ≤ b − 1 we get

    z(C(S(SSP, λ))) ≤ Σ_{j=b'+1}^{n} w_j + (c − Σ_{j=b'+1}^{n} w_j) · w_{b'}/(w_{b'} + λ)
                    = c · w_{b'}/(w_{b'} + λ) + Σ_{j=b'+1}^{n} w_j (1 − w_{b'}/(w_{b'} + λ))    (79)

which is bounded by cα + w̄(1 − α) with 0 ≤ α = w_{b'}/(w_{b'} + λ) ≤ 1; thus
z(C(S(SSP, λ))) is a convex combination of c and w̄, and hence cannot be
tighter than u given by (76).
If we instead Lagrangian relax the weight constraint of SSP using a multiplier λ > 0, we get the problem L(SSP, λ)

    maximize   Σ_{j=1}^{n} w_j x_j − λ (Σ_{j=1}^{n} w_j x_j − c)
    subject to Σ_{j=1}^{n} x_j ≤ b − 1,    (80)
               x_j ∈ {0,1},  j = 1,…,n.

The objective function may be written as

    (1 − λ) Σ_{j=1}^{n} w_j x_j + λc    (81)

For λ ≥ 1 the optimal solution is found by not choosing any items; thus we
get the value z(L(SSP, λ)) = λc ≥ c. For 0 ≤ λ < 1 the optimal value is
found by choosing the b − 1 largest values w_j. The objective value is again a
convex combination of w̄ and c, and it will never be tighter than the bound
(76).

Thus the trivial bound (76) is the tightest one can get by simple relaxations of the cardinality constraint. Similar results are obtained for minimum cardinality bounds.

3.2 Dynamic Programming Algorithms


Due to the lack of tight bounds for SSP, dynamic programming may be
the only way of obtaining an optimal solution in reasonable time when no
solution to (74) exists. In addition, dynamic programming is used in branch-and-bound algorithms to avoid a repeated search for subsolutions with the
same capacity. The dominance rule for SSP is extremely simple, since one
state dominates another state if and only if they represent the same weight
sum.

The Bellman recursion presented in Section 2.5 for the 0-1 Knapsack
Problem is trivially generalized to the SSP: At any stage we let f_i(c̃), for
0 ≤ i ≤ n and 0 ≤ c̃ ≤ c, be the optimal solution value to the subproblem of
SSP defined on items 1,…,i with capacity c̃. Then the Bellman
recursion becomes

    f_i(c̃) = f_{i−1}(c̃)                                    for c̃ = 0,…,w_i − 1
    f_i(c̃) = max{f_{i−1}(c̃), f_{i−1}(c̃ − w_i) + w_i}       for c̃ = w_i,…,c    (82)

where f₀(c̃) = 0 for c̃ = 0,…,c. This yields a time and space complexity
of O(nc). Using the improved techniques by Toth [112] as described in
Section 2.5, the complexity may be brought down to O(min{nc, 2ⁿ}) by
using dynamic programming by reaching.
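A sketch of recursion (82) in "reaching" form follows: only the attainable weight sums are stored, so the state set never exceeds min{c, 2^i} entries after i items (a Python set plays the role of the list of live states; the function name is ours).

    def ssp_reaching(w, c):
        """Dynamic programming by reaching for the Subset-sum Problem."""
        reachable = {0}                 # weight sums attainable so far
        for wj in w:
            reachable |= {s + wj for s in reachable if s + wj <= c}
            if c in reachable:          # a solution of the diophantine equation (74)
                return c
        return max(reachable)

    print(ssp_reaching([4, 9, 6, 7], 20))   # -> 20 (4 + 9 + 7)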

3.2.1 Horowitz and Sahni Decomposition


Since SSP is a special case of KP, it is straightforward to adapt the Horowitz
and Sahni decomposition algorithm described in Section 2.5 to Subset-sum
Problems.

Ahrens and Finke [1] proposed an algorithm where the decomposition
technique is combined with a branch-and-bound algorithm to reduce the
space requirements. Furthermore a replacement technique (Knuth [66]) is
used in order to combine the dynamic programming lists obtained by partitioning the variables into four subsets. The space bound of the Ahrens
and Finke algorithm is O(2^{n/4}), making it best suited for small but difficult
instances.

Pisinger [101] presented a modified version of the Horowitz and Sahni
decomposition, named decomp, which combines good worst-case properties
with quick solution times for easy problems. Let b be the break item for
(73), thus

    b = min{h : Σ_{j=1}^{h} w_j > c}.    (83)

The break solution x̂ has weight sum ŵ = Σ_{j=1}^{b−1} w_j. Now let f_t(c̃), (b − 1 ≤
t ≤ n, 0 ≤ c̃ ≤ c) be the optimal solution value to (73) restricted to variables
b,…,t as follows:

    f_t(c̃) = max{ Σ_{j=b}^{t} w_j x_j : Σ_{j=b}^{t} w_j x_j ≤ c̃;
                   x_j ∈ {0,1},  j = b,…,t }.    (84)

Let g_s(c̃), (1 ≤ s ≤ b, 0 ≤ c̃ ≤ c) be the optimal solution value to the
problem defined on variables s,…,b−1 with the additional constraint x_j = 1
for j = 1,…,s−1, thus

    g_s(c̃) = max{ Σ_{j=1}^{s−1} w_j + Σ_{j=s}^{b−1} w_j x_j :
                   Σ_{j=1}^{s−1} w_j + Σ_{j=s}^{b−1} w_j x_j ≤ c̃;
                   x_j ∈ {0,1},  j = s,…,b−1 }.    (85)

The recursion for f_t will repeatedly insert an item into the knapsack, while
the recursion for g_s will remove one item, thus

    f_t(c̃) = f_{t−1}(c̃)                                    for c̃ = 0,…,w_t − 1
    f_t(c̃) = max{f_{t−1}(c̃), f_{t−1}(c̃ − w_t) + w_t}       for c̃ = w_t,…,c    (86)

where f_{b−1}(c̃) = 0 for c̃ = 0,…,c. The corresponding recursion for g_s
becomes

    g_s(c̃) = g_{s+1}(c̃)                                    for c̃ = c − w_s + 1,…,c
    g_s(c̃) = max{g_{s+1}(c̃), g_{s+1}(c̃ + w_s) − w_s}       for c̃ = 0,…,c − w_s    (87)

with initial values g_b(c̃) = 0 for c̃ ≠ ŵ and g_b(c̃) = ŵ for c̃ = ŵ.
Now starting from (s, t) = (b, b − 1), the decomp algorithm repeatedly
uses the above recursions, each time decreasing s and increasing t. At each
iteration the two sets are merged in O(c) time, in order to find the best
current solution

    z = max_{c̃=0,…,c} { f_t(c̃) + g_s(c − c̃) },    (88)

and the process is terminated if z = c, or (s, t) = (1, n) has been reached.

The algorithm may be improved in those cases where v = Σ_{j=1}^{n} w_j − ŵ <
c. Since there is no need for removing more items j < b than can be
compensated for by inserting items j ≥ b, the recursion g_s may be restricted
to consider states c̃ = c − v,…,c, while the recursion f_t only will consider states
c̃ = 0,…,v.

The decomp algorithm has basically the same complexity O(nc) as the
Bellman recursion, but if dynamic programming by reaching is used, only the
live states need be saved, thus giving the complexity O(min{nc, nv, 2^b +
2^{n−b}}).
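The merging step (88) can be sketched as follows, assuming the live states of f_t and g_s are kept as sorted lists of weight sums (dynamic programming by reaching); scanning one list upwards and the other downwards keeps the combination linear in the list sizes. The function name is ours.

    def merge_states(F, G, c):
        """F: increasing attainable weight sums of f_t (inserted items);
        G: increasing attainable weight sums of g_s; best f + g <= c."""
        best, i = 0, len(G) - 1
        for fw in F:                    # as fw grows, the G-pointer only moves left
            while i >= 0 and G[i] > c - fw:
                i -= 1
            if i < 0:
                break
            best = max(best, fw + G[i])
        return best

    print(merge_states([0, 3, 5], [7, 9, 12], 14))   # -> 14 (5 + 9)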

3.2.2 Balancing
If all weights w_j are bounded by a fixed constant r, the complexity of the
Bellman recursion may be written O(n²r), i.e. quadratic time for constant
r. A linear time algorithm may however be derived by using balancing as
defined in Section 2.4.

Due to the weight constraint in Theorem 2.2 in Section 2.4, let f_{s,t}(c̃),
(s ≤ b, t ≥ b − 1, c − r < c̃ ≤ c + r) be the optimal solution value to
the subproblem of SSP which is defined on the variables j = s,…,t of the
problem:

    f_{s,t}(c̃) = max{ Σ_{j=1}^{s−1} w_j + Σ_{j=s}^{t} w_j x_j :
                       Σ_{j=1}^{s−1} w_j + Σ_{j=s}^{t} w_j x_j ≤ c̃;
                       x_j ∈ {0,1} for j = s,…,t; x is a balanced filling }    (89)
We will only consider those states (s, t, μ) where μ = f_{s,t}(μ), i.e. those
weight sums μ which can be obtained by balanced operations on x_s,…,x_t,
applying the following (unusual) dominance relation:

Definition 3.1 Given two states (s, t, μ) and (s', t', μ'). If μ = μ', s ≥ s'
and t ≤ t', then state (s, t, μ) dominates state (s', t', μ').

If a state (s, t, μ) dominates another state (s', t, μ'), then we may fathom the
latter. Using the dominance rule, we will enumerate the states for t running
from b − 1 to n. Thus at each stage t and for each value of μ we will have
only one index s, which is actually the largest s such that a balanced filling
with weight sum μ can be obtained on the variables x_s,…,x_t. Therefore
let s_t(μ) for t = b − 1,…,n and c − r < μ ≤ c + r be defined as

    s_t(μ) = max{ s : there exists a balanced filling x which satisfies
                  Σ_{j=1}^{s−1} w_j + Σ_{j=s}^{t} w_j x_j = μ; x_j ∈ {0,1}, j = s,…,t }    (90)

where we set s_t(μ) = 0 if no balanced filling exists. Notice that for t = b−1
only one value of s_t(μ) is positive, namely s_t(ŵ) = b, as only the break
solution is a balanced filling at this stage. An optimal solution to SSP is
found as z = max{μ ≤ c : s_n(μ) > 0}.

After each iteration of t we will ensure that all states are feasible by
removing a sufficient number of items j < s_t(μ) from those solutions where
μ > c. Thus only states s_t(μ) with μ ≤ c need be saved, but in order to
improve efficiency, we use s_t(μ) for μ > c to memorize that items j < s_t(μ)
have been removed once before. We get:

1 Algorithm balsub
2 for p. +- C - r + 1 to C do Sb-l(P.) +- OJ
358 D. Pisinger and P. Toth

I~j I~
2 3 4 5 6 c= 15
4 2 6 4 3

JJ 83(1') 84(1') 85(1') 86(1')


10 0 1 1 1
11 0 0 0 1
12 4 4 4 4
13 0 0 0 2
14 0 2 3 3
15 0 c 0 0 4 c
16 1 3 4 4
17 1 1 1 3
18 1 4 4 4
19 1 1 1 1
20 1 1 1 1
21 1 1 1 1

Figure 5: The items and table St(Jl.) for a given instance.

3 for J.L f- c+ 1 to c+r do Sb-l(J.L) f-l;


4 Sb-l(W) f- b;
5 for t f- b to n do
6 for Jl. f- c - r + 1 to c + r do St(J.L) f- St-l(J.L);
7 for J.L f- C - r + 1 to c do J.L' f- Jl. + Wt;
St (Jl.') f- max {St (J.L'), St-l (J.L)};
8 for Jl. f- C + Wt downto C + 1 do
9 for j f- St(J.L} - 1 downto St-l(J.L} do Jl.' f- J.L - Wj;
St(J.L'} f- max{St(J.L'}, j};

Algorithm balsub does the following (see Fig. 5 for an example): For t =
b - 1 we only have one balanced solution, the break solution, thus St (Jl.) is
initialized according to this in lines 2-4. Since the table St(Jl.) for Jl. > C is
used for memorizing which items have been removed previously, we may set
St (J.L) = 1 for Jl. > c as no items j < St (J.L) have ever been removed.
Now we consider the items t = b, ... , n in lines 5-9. In each iteration
item t may be added to the knapsack or omitted. Line 6 corresponds to the
latter case, thus the states St-l(Jl.} are copied to St(Jl.} without changes. Line
7 adds item t to each feasible state, obtaining the weight Jl.'. According to
(90), St(Jl.') is the maximum of the previous value and the current balanced
solution.
In lines 8-9 we complete the balanced operations by removing items
Knapsack Problems 359

j < St(J.L) from states with J.L > c. As it may be necessary to remove several
items in order to maintain feasibility of the solution, we consider the states
for decreasing J.L, thus allowing for several removals.

Theorem 3.1 Algorithm balsub finds the optimal solution x*.

Proof. We just need to show that the algorithm performs unrestricted bal-
anced operations: 1) It starts from the break solution x'. 2) For each state
with J.L ~ c we perform a balanced insert, as each item t may be added
or omitted. 3) For each state with J.L > c we perform a balanced remove
by removing an item j < St(J.t). As the hereby obtained weight J.L' satisfies
J.L' < J.L, we must consider the weights J.L in decreasing order in line 8 in order
to allow multiple removals. This ensures that all states will be feasible after
each iteration of t.
The only restriction in balanced operations is line 9, where we pass by
items j < St-l(J.L) when items are removed. But due to the memorizing we
know that items j < St-l(J.t) have been removed once before, meaning that
St(J.L - Wj) ~ j for j = 1, ... , St-t{J.L). Thus repeating the same operations
will not contribute to an increase in St(J.L - Wj). 0

Theorem 3.2 The complexity of Algorithm balsub is O(nr) in time and


space.

Proof. Space: The array St(J.L) has size (n - b + 1)(2r), thus O(nr). Time:
Lines 2-3 demand 2r operations. Line 6 is executed 2r(n - b+ 1) times. Line
7 is executed r(n - b + 1) times. Finally, for each J.t > c, line 9 is executed
sn(J.L) ~ b times in all. Thus during the whole process, line 9 is executed at
most rb times. 0

3.3 Hybrid Algorithms


Despite the lack of tight bounds, branch-and-bound algorithms give excel-
lent solution times in those cases where many solutions to the diophan-
tine equation (74) exist. If no solutions to the diophantine equation exist,
branch-and-bound algorithms yield very bad performance, and dynamic pro-
gramming may be a better alternative. Several hybrid algorithms have been
proposed which combine the best properties of dynamic programming and
branch-and-bound. The most successful are those by Plateau and Elkihel
[106] and Martello and Toth [77]. We will go into the latter in detail.
360 D. Pisinger and P. Toth

The mts algorithm assumes that the weights are ordered in nonincreasing
order,
(91)
and apply dynamic programming to enumerate the last (small) weights,
while branch-and-bound is used to search through combinations of the first
(large) weights. The motivation is that during branch-and-bound a very
great effort is used to search for solutions which fill a small residual capacity.
Thus by building a table of such solutions through dynamic programming,
repeated calculations can be avoided.
The mts algorithm actually builds two lists of partial solutions. First,
the recursion (82) is used to enumerate items (3, ... ,n, for all weight sums
not greater than c. Then the same recursion is used to enumerate items
a, ... ,n, but only for weight sums up to c. The constants a < (3 < nand
c < c were experimentally determined to be
a = n-min{2Iogr, 0.7n} (3 = n - min{2.Slog r, 0.8n} c= 1.3w,8
(92)
where r = WI is the largest weight. Thus we have two tables

f (c) = max { Ej=a WjXj : Ej=a WjXj ~ c, j = a, ... ,n} , c = 0, ... , c

g(c} = max { Ej=,8 WjXj : Ej=,8 WjXj ~ c,j = (3, ... ,n} , c = 0, ... ,c
(93)
We will assume that if c < 0 the above tables return -00. The branch-
and-bound part of mts repeatedly sets a variable Xi to 1 or 0, backtracking
when either the current weight sum exceeds cor i = (3. For each branching
node, the dynamic programming lists are used to find the largest weight
sum which fits into the residual capacity. Assuming that i is the next item
to be inserted, and that W is the weight sum of the currently fixed items, a
simplified sketch of the mts algorithm becomes:
algorithm mts(i,w);
Use the dynamic programming lists to complete the solution:
w' = W + f(c - w}; w" = W + g(c - w};
if (w' > z) or (w" > z) then
z ~ maxi w', W"};
Construct solution vector by setting x;
~ Xj for j = 1, ... ,i - 1 and
remaining xi's as given by the dynamic programming tables.
if (i < (3) and (c - W ~ Wi) then
Knapsack Problems 361

if (w + Wi ~ c) then Xi +-1j mts(i + 1,w + Wi)j


Xi +- OJ mts(i + 1, W)j

The algorithm is called mts(1, 0) and it returns the optimal solution vector
x*. The minimum weight table must be initialized as Wi = minj~i Wj.
Since dynamic programming by reaching is used for constructing the
tables f and g, only weight sums which are obtainable will be saved. Thus
a binary search method is used to look up values in the tables. This binary
search algorithm also checks whether the solution vector X corresponding
to g(c - w) satisfies Xj = 0 for j < i, and returns g(c - w) = 0 if this
is not the case. This is a necessary restriction since the solution space of
branch-and-bound and dynamic programming are not allowed to overlap.
The mts algorithm also allows several insertions in each iteration, such
that the tables f and 9 are only accessed when a sufficiently filled solution
has been obtained. A table Vh = Ej=h Wj is used to avoid forward moves
when c - W ~ Vi, or when the insertion of all items j ~ i will not lead to an
improved solution.

3.4 Solution of Large-sized Instances


Many randomly generated instances have several optimal solutions which
satisfy the weight constraint with equality. For such classes of instances one
may expect that an optimal solution to large-sized instances can be obtained
by considering only a relatively small subset of the items. Martello and Toth
[77] presented such an algorithm which solves a core problem similar to the
one for KP, although special properties for SSP make things much simpler.
The mtsl algorithm may be outlined as follows:
For a given instance of SSP, let b = min{ h : E7=1 Wj > c} be the break
item, and define a core as the interval of items [b - 6, b + 6], where 6 is a
appropriately chosen value. Martello and Toth experimentally found that
6 = 45 is a sufficiently large value. With this choice of the core, the core
problem becomes:

b+6
maximize L: WjXj
j=b-6
b+6 b-6-1 (94)
subject to L: WjXj ~ c- L: Wj,
j=b-6 j=1
XjE{O,I}, j=b-6, ... ,b+6.
362 D. Pisinger and P. Toth

The above problem is solved using the mts algorithm. If a solution is ob-
tained which satisfies the weight constraint with equality, we may obtain an
optimal solution to the main problem by setting Xj = 1 for j < b - 8 and
x j = 0 for j > b + 8. Otherwise 8 is increased to twice the size, and the
process is repeated.

3.5 Computational Experiments


We will compare the presented algorithms mtsl, balsub, bellman and decomp.
A comparison between the Ahrens and Finke algorithm and mtsl can be
found in [80]. Five types of data instances presented in Martello and Toth
[80] are considered:
Problems P3: Wj randomly distributed in [1,103 ], and c = ln10 3 /4J.
Problems P6: Wj randomly distributed in [1,10 6 ], and c = ln106 /4J. Prob-
lems evenodd: Wj even, randomly distributed in [1, 103 ], and c = 2ln103 /8J +
1 (odd). Jeroslow [62] showed that every branch-and-bound algorithm enu-
merates an exponentially increasing number of nodes when solving even odd
problems. Problems avis: Wj = n(n+1)+j, and c = n(n+1) l(n-1)/2J +
n(n-1)/2. Avis [14] showed that any recursive algorithm which does not use
dominance will perform poorly for the avis problems. Finally the todd prob-
lems: set k = llog2 nJ then Wj = 2k+n +1 + 2k+j + 1, and c = l~ Ej=l WjJ.
Todd [14] constructed these problems such that any algorithm which uses
upper bounding tests, dominance relations, and rudimentary divisibility ar-
guments will have to enumerate an exponential number of states.
Each instance type was tested with 100 instances and a time limit of 10
hours was assigned for the solution of all problems in the series. For the avis
and todd problems 100 permutations of the same problem were tested.
The running times of four different algorithms are compared in Table 6:
The bellman recursion, the decomp dynamic programming algorithm, the
balsub algorithm, and finally the mtsl algorithm. A dash indicates that
the 100 instances could not be solved within the time limit.
For the randomly distributed problems P3, P6 and evenodd we have r
bounded by a (large) constant. Thus the bellman recursion runs in O(n2 )
time, while balsub has linear solution time. The problems P3 and P6 have
the property that several solutions to L.:j=l WjXj = c exist when n is large,
thus generally the algorithms may terminate before a complete enumeration
has been performed. The bellman recursion has to enumerate all states up
to at least t = b before it can terminate. It is seen that the mtsl and decomp
algorithms are superior for these problems.
Knapsack Problems 363

Table 6: Solution times m seconds, as averages of 100 instances


(hp9000/735).
algorithm n P9 P6 even odd avis todd

·-··
10 0.00 0.00 0.00 0.00 0.00
30 0.02 - 0.01 0.01 -
100 0.33 - 0.21 1.57 -
-

·-··
300 4.11 4.84 -

··
bellman 1000 52.86 - 69.34 - -

··
3000 505.38 - 723.25 - -
-

10000 - - -
- - -•
·
30000 -
100000 - -• - -• -

··
10 0.00 0.00 0.00 0.00 0.00
30 0.00 0.01 3.84 12.39 -
100 0.00 0.00 - - -
- •
··
300 0.00 0.00 - -
mtsl 1000 0.00 0.00 - - -
-•
·· ·· ··
3000 0.00 0.01 - -
10000 0.00 - - - -
- -
· ·
30000 0.00 - -
100000 0.02 - - -• -


10 0.00 5.37 0.00 0.00 0.12

··
30 0.00 8.68 0.01 0.01
100 0.00 4.21 0.02 0.26 -

··
300 0.00 2.62 0.07 15.21 -

··
balsub 1000 0.00 2.12 0.22 562.38 -

·· ··
3000 0.00 2.11 0.66 - -

·
10000 0.00 - 2.22 - -


30000 0.00 - 6.66 - -
100000 0.02 -• 23.76 -•
10 0.00 0.00 0.00 0.00 0.00
30 0.00 0.00 0.01 0.00 -•
-•
··
100 0.00 0.00 0.24 0.25
300 0.00 0.00 2.90 23.94 -
-
·· ···
decomp 1000 0.00 0.00 34.20 -

··
3000 0.00 0.00 311.28 - -
-
··
10000 0.00 - - -
-
· -·
30000 0.00 - - -
.
100000 0.02 - - -
Could not be generated on a 32-blt computer.
364 D. Pisinger and P. Toth

For the evenodd problems no solutions satisfying 'L//=1 WjXj = c do exist,


meaning that we get strict linear solution time for balsub. The bellman
recursion has complexity O(n 2 ), and thus cannot solve problems larger than
n = 3000, and the same applies for the decomp algorithm. The mtsl al-
gorithm cannot prove optimality of the solution, before an almost complete
enumeration has been done in the branch-and-bound part, thus it is not able
to solve problems larger than n = 30.
The avis problems have weights of magnitude O(n2 ) while the capacity is
of magnitude O(n 3 ), so the bellman and decomp algorithms demand O(n4)
time, while balsub solves the problem in O(n 3 ). Algorithm mtsl again
needs a complete enumeration to prove optimality, and thus cannot solve
problems larger than n = 30.
Finally the todd problems are considered. Due to the exponentially in-
creasing weights, none of the algorithms are able to solve more than tiny
instances. However, the decomp and mtsl algorithms are able to solve the
largest instances, since they use some kind of decomposition to limit the
enumeration.
The comparisons do not show a clear winner, since all algorithms have
different properties, but the bellman recursion has worst solution times for
all instances. Both decomp and mtsl are good algorithms for randomly gen-
erated instances P3 and P6, but decomp is also able to solve some of the
difficult problems. On the other hand balsub is excellent for the non-fill
problems: evenodd and avis. Thus we may conclude that decomp domi-
nates mtsl and bellman, while balsub dominates bellman. No dominance
however exists between decomp and balsub.

4 Multiple-choice Knapsack Problem


Consider k classes Nt, ... , Nk of items to be packed into a knapsack of
capacity c. Each item j E Ni has a profit Pij and a weight Wij, and the
problem is to choose one item from each class such that the profit sum
is maximized without the weight sum exceeding c. The Multiple-choice
Knapsack Problem (MCKP) may thus be formulated as:
k
maximize z = LL PijXij
i=lje N i
k
subject to LL WijXij ~ c, (95)
i=ljeNi
Knapsack Problems 365

L Xij = 1, i = 1, ... , k,
JENi
Xij E {O, I}, i = 1, ... , k, j E Ni,

where Xij = 1 when item j is chosen in class Ni and Xij = 0 otherwise. All co-
efficients Pij, Wij, and c are nonnegative integers, and the classes N I , ... , Nk
are mutually disjoint, class Ni having size ni. The total number of items
is n = 2:f=1 ni. The MCKP is also denoted as Knapsack Problem with
Generalized Upper Bound Constraints or for short Knapsack Problem with
GUB.
Negative coefficients Pij, Wij in (95) may be handled by adding a suffi-
ciently large constant to all items in the corresponding class as well as to c.
Fractions can be handled by multiplying through with an appropriate fac-
tor. If the multiple-choice constraints in (95) are replaced by L-jENi Xij ~ 1
as considered in [63] then this problem can be transformed into the above
form by adding a dummy item (Pi,ni+ l , Wi,ni+ l ) = (0,0) to each class.
To avoid unsolvable or trivial situations we assume that
k k
""
LJ min
. N W··
I} <
- c < ""
LJ max
. N W··
I}
(96)
i=I}E i i=I}E i

and moreover we assume that every item i', j' satisfies


(97)

as otherwise it may be discarded.


If we relax the integrality constraint on Xij in (95) we obtain the Contin-
uous Multiple-choice Knapsack Problem C(MCKP). If each class has two
items, where (pil, Wid = (0,0), i = 1, ... , k, the problem (95) corresponds
to the 0-1 Knapsack Problem KP, and thus MCKP is NP-hard.
The MCKP in a minimization form may be transformed into the above
formulation (95) by finding for each class Ni the values Pi = maxjENi Pij,
Wi = maxjENi Wij, and by setting Pij = Pi -Pij and Wij = Wi -Wij for j E Ni
and c = 2:f=1 Wi - c. Then the equivalent maximization problem is defined
in P,w and c.
The MCKP problem has a large range of applications: Nauss [89] men-
tions Capital Budgeting and transformation of nonlinear KP to MCKP as
possible applications, while Sinha and Zoltners [109] propose MCKP used for
Menu Planning or to determine which components should be linked in series
366 D. Pisinger and P. Toth

in order to maximize fault tolerance. Witzgal [114] proposes MCKP used


to accelerate ordinary LP /GUB problems by the dual simplex algorithm.
Moreover MCKP appears by Lagrangian relaxation of several integer pro-
gramming problems, as described by Fisher [36]. Recently MCKP has been
used for solving Generalized Assignment Problems by Barcia and J6rnsten
[7].
When dealing with Multiple-choice Knapsack Problems, two kinds of
dominance appear:

Definition 4.1 If two items rand s in the same class Ni satisfy

WiT ::; Wis and PiT ~ Pis, (98)

then we say that item r dominates item s. Similarly if some items r, s, t E Ni


with WiT ::; Wis ::; Wit and PiT ::; Pis ::; Pit satisfy
Pit - PiT > Pis - PiT
(99)
Wit - WiT Wis - WiT

then we say that item s is continuously dominated (for short LP-dominated)


by items rand t.

Theorem 4.1 (Sinha and Zoltners [109]) Given two items r, s E Ni. If
item r dominates item s then an optimal solution to MGKP with Xis = 0
exists. If two items r, t E Ni LP-dominate an item s E Ni then an optimal
solution to C(MGKP) with Xis = 0 exists.

;:--: o

/00 o
o

L-------_w
Figure 6: LP-undominated items R;, (black) form the upper convex hull of
Ni·

In Section 4.1 we will use these dominance relations to derive different


upper bounds, while Section 4.2 considers some heuristic solution methods.
Knapsack Problems 367

Reduction of classes is discussed in Section 4.3, and these results are in-
corporated into exact solution techniques presented in Section 4.4 and 4.5.
Finally some reduction rules to fathom states in an enumerative algorithm
are given in Section 4.6, and algorithms for large-sized instances are con-
sidered in Section 4.7. We end the section by bringing some computational
results in Section 4.8.

4.1 Upper Bounds


The simplest bound is obtained from the continuous relaxation C(MCKP)
of the formulation (95). In this case dominated and LP-dominated items
may be fathomed, which is easily obtained by ordering the items in each
class Ni according to increasing weights, and successively testing the items
according to criteria (98) and (99). For each class Ni the reduction takes
O(ni log ni) time due to the sorting. The remaining items ~ are called
the LP-undominated items, and they form the upper convex hull of Ni, as
illustrated in Fig. 6. We will assume that Wil < Wi2 < ... < wi,IRiI in ~.
The continuous bound z(C(MKP)) can now be derived in O(nlogn)
time by using the greedy algorithm:

algorithm greedy

1 Choose the lightest item from each class (i.e. set Xii +- 1, Xij +- 0 for
j = 2, ... , I~I, i = 1, ... , k) and define the chosen weight and profit
sum as P +- Ef=IPib resp. W +- Ef=l Wil. For all items j :/: 1 define
the slope lij as

lij +- Pij - Pi,j-l , t. = 1, ... , k ID·I


, ·J = 2, ... ,.L "i • (100)
Wij - Wi,j-l

This slope is a measure of the profit-to-weight ratio obtained by choos-


ing item j instead of item j -1 in class~. Using the greedy principle,
order the slopes {,ij} in nondecreasing order. With each value of lij
we associate the indices i, j during the sorting.

2 Assume that lij is the next slope in hij}. If W + Wij > C go to


Step 3. Otherwise set Xij +- 1, Xi,j-l +- 0 and update the sums
P +- P + Pij - Pi,j-l, W +- W + Wij - Wi,j-l. Repeat Step 2.
3 If W = c we have an integer solution and the optimal objective value
to C(MCKP) (and to MCKP) is z* = P. Otherwise set 1* = lij.
368 D. Pisinger and P. Toth

We have two fractional variables xii +- (c - W)/(wii - Wi,j-l) and


Xi,j-l +- 1 - xii' which both belong to the same class. The optimal
objective value is
z· = p + -y. (c - W) . (101)

The greedy algorithm is based on the transformation principles presented


in Zemel [116]. The time complexity of the greedy algorithm is O(nlogn)
due to the sorting and preprocessing of each class.
The LP-optimal choices bi obtained by the greedy algorithm are the
variables for which Xibi = 1. The class containing two fractional variables in
Step 3 will be denoted the fractional class N a , and the fractional variables
are Xaba , Xall.a possibly with Xall.a = o. An initial feasible solution to MCKP
may be constructed by choosing the LP-optimal variables, i.e. by setting
Xibi = 1 for i = 1, ... , k and xii = 0 for i = 1, ... , k, j # bi. The solution
will be denoted the break solution and the corresponding weight and profit
sums are p = p = Ef=1 Pibi and w = W = Ef=1 Wibi' respectively. As a
consequence of the greedy algorithm we have:

Theorem 4.2 An optimal solution x· to C(MCKP) satisfies the following:


1) x· has at most two fractional variables Xaba and xafl... 2) If x· has two
fractional variables they must be adjacent variables within the same class
N a • 3) If x· has no fractional variables, then the break solution is an optimal
solution to MCKP.

4.1.1 Linear Time Algorithms for the Continuous Problem


Dyer [25] and Zemel [117] independently developed O(n) algorithms for
C(MCKP) which do not use the time-consuming preprocessing of classes
Ni to ~. Both algorithms are based on the convexity of the LP-dual prob-
lem to (95), which makes it possible to pair the dual line segments, so that
at each iteration at least 1/6 of the line segments are deleted. The fol-
lowing primal algorithm is more intuitively appealing and can be seen as a
generalization of Algorithm baLzem in Section 2.1.
As a consequence of the greedy algorithm just described, an optimal
solution to C(MCKP) is characterized by an optimal slope -y. which satisfies:

k k
LWit/>i ~ C < LWit/Ji (102)
i=1 i=1
Knapsack Problems 369

1/Ji
.- -
-
ifJi .....
..
(1, ')')

"""-------_w

Figure 7: Projection of (Wij,Pij) E Ni on (-,""(,1). Here we have Mi


{¢i,1/!i}

where
¢i = arg min Wij 1/!i = arg max Wij (103)
jEMi(-Y·) jEMi(-Y·)

and
Mi(r) = {j E Ni : (Pij - '""(Wij) = Te~(Pil- '""(Wil)} (104)

Here (102) is the termination criteria from Step 2, while (103) and (104)
ensure that '""(* is a slope corresponding to two items j - l,j E ~ as given
by (100). This is graphically illustrated in Fig. 7. The optimal slope '""(* may
be found by the following partitioning algorithm:

algorithm dye-zem

1 For all classes Ni: pair the items two by two as (iii, ih). Order each
pair such that Wijl ~ wih breaking ties such that Piit ~ Pih when
Wiit = wih' If item j1 dominates item h then delete item h from
Ni and pair item j1 with another item from the class. Continue this
process till all items in Ni have been paired (apart from one last item
if INil is odd). Set P ~ 0, W ~ O.

2 For all classes Ni: if the class has only one item j left, then set P ~
P + Pij, W ~ W + Wij, and fathom class Ni.

3 For all pairs (ijI, ih) derive the slope '""(iith = !i~2:::::~~
')2 'Jl
.
4 Let ')' be the median of the slopes {'""(ijtj2}'

5 Derive Mi(r) and ¢i, 1/!i for i = 1, ... , k according to (104) and (103).
370 D. Pisinger and P. Toth

6 If 'Y is optimal according to (102), i.e. if W + L:f=1 Wit/>, ~ c < W +


L:f=1 Wi"" then set W +- W + L:f=1 Wit/>i and P +- P + L:f=1 Pit/>i' An
optimal solution to C{MCKP) is z* = P + {c - Who Stop.

7 If L:f=1 Wit/>i ~ c then for all pairs (iiI, ih) with 'Yijd2 ~ 'Y delete item

8 If L:f=1 Wi"" < c then for all pairs with 'Yiilh ~ 'Y delete item iI.
9 Go to step 1.

In Step 7 we know that the chosen 'Y is too small according to the greedy
algorithm, thus the validity of deleting h can be stated as follows: If iI, h E
I4., then since 'Yiilh ~ 'Y we may delete h· If h ¢ I4. then h cannot be in
an optimal solution of C{MCKP), and thus can be deleted. Thus the final
case is it ¢ I4. while h E I4.. Let i~ be the predecessor of h in I4.. Since
'Yij~h < 'YiiIh ~ 'Y we may delete h. The validity of Step 8 is confirmed in
a similar way.

Theorem 4.3 The time complexity of the dye..zem algorithm is O{n).

Proof. Assume that all items and all classes are represented as lists, such
that deletions can be made in O{l). At any stage, ni refers to the current
number of items in class Ni and k is the current number of classes. Notice
that each iteration 1-9 can be performed in O{n) time, where n is the current
number of items, since no step demands a higher complexity.
There are L:f=1 Lni/2J pairs of items (iiI, ih). Since'Y is the median of
biilh}, half of the pairs will satisfy the criteria in step 7 or 8, and thus one
item from these pairs will be deleted; i.e. at least ! L:f=1 L ni/2J items are
deleted out of n = L:f=1 ni ~ L:~=1{2Lni/2J + 1). So each iteration deletes
at least
!L:f=ILni/2J > L:f=ILni/2J >! (105)
L:f=1 (2lni/ 2J + 1) - 2k + 4L:f=1 Lni/2J - 6
items since lni/2J ~ 1, which gives the stated. o

Pisinger [102] proposed to improve the above algorithm by also deleting in


Step 7 all items with Wij ~ Wit/>, for all classes Ni. In Step 8, all items
with Pij ~ Pi"" can be deleted. The validity of these reductions follows from
dominance of items and the criterion (102).
Knapsack Problems 371

4.1.2 Bounds from Lagrangian Relaxation


Two different bounds can be obtained from Lagrangian relaxation. If we
relax the weight constraint with a multiplier A ~ 0 we get LI (MCKP, A)
defined as

maximize tL
i=l jENj
PijXij +A (c -t L i=l jENj
WijXij )

subject to LXij = 1, i = 1, ... ,k, (106)


jENj
Xij E {O, I}, i = 1, ... , k, j E Ni.

The objective function may be written


k k
z(LI(MCKP,A) = AC+ LL (Pij - AWij) Xij = AC+ LIP-we (Pij - AWij)
i=l jENi i=l JE •
(107)
due to the multiple-choice constraint, thus the bound may be derived in O(n)
time. The corresponding solution vector is Xij = 1 for j = arg maxjENi (Pij -
AWij) and 0 otherwise. Notice that an optimal solution to the continuously
relaxed problem (106) will yield the same integer solution, thus we have

z(LI(MCKP, A)) = z(C(LI(MCKP, A))) ~ z(C(MCKP))


z(MCKP) ~
(108)
A different relaxation can be obtained by relaxing the multiple-choice con-
straints. In this case, by using multipliers AI, ... ,Ak, we get L 2 (MCKP, Ai):

maximize t.L
z=l JENi
PijXij + t (1 - .L
z=l
Ai
JENi
Xij)

k
subject to L L WijXij ~ c (109)
i=ljENi
XijE{O,l}, i=l, ... ,k, jENi.

The objective function may be rewritten as


k k k k
z(L 2 (MCKP, Ai)) = L Ai + L L (Pij - Ai)Xij = L Ai + L L PijXij
i=l i=ljENi i=l i=ljENi
(110)
372 D. Pisinger and P. Toth

thus the relaxation leads to a 0-1 Knapsack Problem defined in profits Pij =
Pij - Ai· This means that any polynomial bounding procedure for KP as
described in Section 2.1 can be used to obtain polynomial upper bounds for
MCKP.

4.1.3 Other Bounds


Enumerative bounds (see Section 2.1.2) have been considered in [102]. Some
work on the polyhedral properties of MCKP have been presented by Johnson
and Padberg [63] as well as Ferreira, Martin and Weismantel [33] - although
only for the case where the multiple-choice constraint is in the weaker form
'EjENi Xij :$ 1. No bounds derived from polyhedral properties have however
been presented.

4.2 Heuristics
The break solution X, taking the LP-optimal choices Xib; in each class is
generally a good heuristic solution of value z' = p. The relative performance
of the heuristic is however arbitrarily bad, as can be seen with the instance
k = 2, nl = n2 = 2, c = d and items (Pll,Wll) = (0,0), (P12,W12) = (d,d),
(P21,W2d = (0,0) and (P22,W22) = (2,1). The break solution is given by
items 1,1 and 2,2 yielding z' = p = 2 although the optimal objective value
is z* = d.
Dyer, Kayal and Walker [26] presented an improvement algorithm which
runs in O(l::~=l nd = O(n) time. Let f3i = bi for classes i =I a and f3a = b~.
Thus, by setting XiP; = 1 in each class, we obtain an infeasible solution.
Now, for each class i = 1, ... , k, find the item ii which when replacing item
f3i give a feasible solution with the largest profit. Thus ii = arg maxjENi {Pij :
Wij - WiP; + 11) ~ c} and let

zr = . max (p + Pili - Pip;), (111)


'=l, ... ,k
This heuristic however also has an arbitrarily bad performance ratio which
can be seen with the instance Nl = N2 = N3 = ((O,O), (1,1), (d, d + I)}
and k = 3, c = d + 1. We have z' = p = 3, and also zr = 3 although z* = d.
It is possible to obtain a heuristic solution with worst-case.performance
~ by setting
(112)
where zb = Pall,. + 'Ei¥bPi,Ct; with £¥i = argminjEN;{wij}. Notice that zb is
a feasible solution according to (97). Obviously z* :$ z' +zb, thus z* ~ 2zh.
Knapsack Problems 373

To see that the bound is tight, consider the KP instance given below (50)
transformed to a MCKP.
A fully polynomial approximation scheme for MCKP has been presented
by e.g. Chandra, Hirschberg and Wong [12].

4.3 Class Reduction


As for the 0-1 Knapsack Problem, several decision variables may be fixed
a-priori at their optimal value through class reduction. Let utj be an upper
bound on MCKP with additional constraint Xij = 1. If Uij < Z + 1 then
we may fix Xij at zero. Similarly, if u?j is an upper bound on MCKP with
additional constraint Xij = 0 and u?j < z+ 1, then we may fix Xij at one, and
thus all other decision variables in the class at zero. Using continuous bounds
for deriving u?j will however lead to the trivial bound u?j = C(MCKP), thus
the latter test is seldom used.
A bound similar to the Dembo and Hammer bound for KP may be
derived as follows [26]: Relax the constraint on the fractional variables
ba , b~ E Na in (95) to Xab a , Xab~ E n. Then the bound utj may be derived in
constant time as
1
Uij = P - Pibi
A
+ Pij + 'Y*( c - W
A
+ Wibi - Wij ,
)
(113)

The complexity ofreducing class Ni is O(ni), and ifthe reduced set has only
one item left, say j, we fathom the class, fixing Xij at 1.
Tighter, but more time-consuming bounds can be obtained from contin-
uous relaxation. For these reductions it is appropriate to solve C(MCKP)
through the zemel algorithm, as then it is easy to find the new break class
by considering the 'Yij in decreasing order from 'Y. The worst-case complexity
of the reduction becomes O(n 2 ) which however in practice is O(n) as only
few 'Yij need be considered.
Dudzinski and Walukiewicz [24] propose bounds obtained from the La-
grangian relaxation L1 (MCKP) in (106) obtaining an O(n) reduction.

4.4 Branch-and-bound Algorithms


Several enumerative algorithms for MCKP have been presented during the
last two decades: Nauss [89], Sinha and Zoltners [109], Dyer, Kayal and
Walker [26], Dudzinski and Walukiewicz [24]. Most of these algorithms start
by solving C(MCKP) in order to obtain an upper bound for the problem.
The C(MCKP) is solved in two steps: 1) The LP-dominated items are
374 D. Pisinger and P. Toth

removed as described in Section 4.1. 2) The reduced C(MCKP) is solved


by a greedy algorithm. After these two initial steps, upper bound tests may
be used to fix several variables in each class to their optimal value. The
reduced MCKP problem is then solved to optimality through enumeration.
We will go into the algorithm by Dyer, Kayal and Walker in detail, since
this is a well-designed branch-and-bound algorithm. The algorithm may be
sketched as follows:

procedure dyer..kay_llal
1 Remove LP-dominated items as described in Section 4.1. Solve the
continuous relaxation C(MCKP) in order to derive an upper bound.
This is done by finding a good candidate 'Y for 'Y* as an average value of
the slopes 'Yij. Now, starting from 'Y = 'Y search forward or backward
through the set {'Yij} as described in greedy until the weight constraint
(102) is satisfied. Heaps are used to efficiently access the slopes hij}.
2 Reduce the classes as described in Section 4.3.
3 Solve the remaining problem through branch-and-bound: Bounds are
derived by solving C(MCKP) defined on the free variables. If Xaba , Xab~
are the fractional variables of the continuous solution, branching is
performed by first setting Xab'a = 1 and then xa~ = O. Backtracking
is performed if either the upper bound u = z(G(MCKP)) is not larger
than the current lower bound z, or if the MCKP problem with the
current variables fixed is infeasible. The branch-and-bound algorithm
follows a depth-first search to limit the space consumption.

Dyer, Kayal and Walker furthermore improve the lower bound z at every
branching node by using the heuristic (111). For classes larger than ni ~
25 they also propose to use the class reduction from Section 4.3 at every
branching node.

4.5 Dynamic Programming Algorithms


MCKP can be solved in pseudo-polynomial time through dynamic program-
ming as follows [24]. Let h(c) be an optimal solution value to MCKP defined
on the first l classes and with restricted capacity c. Thus

f l(-)c ={m
L1=1 LjENi PijXij
ax.
: L1=1 LjENi WijXij $ c; } , (114)
LjENi Xij = 1, z = 1, ... ,i; Xij E {O, I}
Knapsack Problems 375

where we assume that h(c) = -00 if no solution exists. Initially we set

I
fo(c) = 0 for all c = 0, ... , c, Then for .e = 1, ... , k we can use the recursion

h-l(C - Wll) + Pll if 0 ~ C- Wil ,


h-l(C - Wl2) + Pl2 if 0 ~ C- Wl2 ,
h(c) = max . (115)

ii-l(C - Wlnt ) + Pint if 0 ~ C- Wint ,

where we assume that the maximum operator returns -00 if we are maximiz-
ing over an empty set. An optimal solution to MCKP is found as z = fk(C)
and we obtain z = -00 if assumption (96) is violated. The space bound of
the dynamic programming algorithm is O(kc), while each iteration of (115)
takes ni time, demanding totally O(c Ef=l ni) = O(nc) operations.
The recursion presented has the drawback that an optimal solution is
not reached before all classes have been enumerated, meaning that we have
to pass through all O(nc) steps. To avoid this problem, Pisinger [102] pro-
posed a generalization of the primal-dual dynamic programming described
in Section 2.5.1. Assume that the classes are reordered according to some
global considerations, guessing that the last classes have a large probability
for being fixed at their LP-optimal value.
Let h(c), C = 0, ... ,2c be an optimal solution to the following MCKP
problem defined on the first .e classes, and where variables in classes after I!
are fixed at their LP-optimal values:

(116)

Initially we set fo(c) = P for all C ~ w, and fo(c) = -00 for all c < w. Then
the following recursion is applied:

h-l(C - Wll + Wlbt) + Pil - Pibt


if 0 ~ C- Wll + Wibt ~ 2c,
h-l(C - Wl2 + Wlbt) + P12 - Pibt
fl(C) = max if 0 ~ C - Wl2 + Wlbt ~ 2c, (117)

h-l(C - Wlnt + Wlbt) + Plnt - Plbt


if 0 ~ C- Wlnt + Wlbt ~ 2c.
376 D. Pisinger and P. Toth

An optimal solution to MCKP is found as z = h(c), obtaining z = -00 if


assumption (96) is violated. The recursion (117) demands O(nd operations
for each class in the core and for each capacity c, yielding the complexity
O(L:f=12cni) = O(nc) for a complete enumeration. The space complexity
is O(2kc). However if optimality of a state can be proved after enumerat-
ing classes up to Ne, then we may terminate the process, having used the
computational effort O(c L:i<t ni).
Other dynamic programming algorithms are: The separability property
as presented by Horowitz and Sahni [54] is easily generalized to the MCKP.
The classes should be separated in two classes such that nl . n2'" nt and
ntH . ntH' .. nk are of same magnitude and then use normal dynamic pro-
gramming on each of the two problems. As for KP, the states are easily
merged in linear time. This leads to a time bound of O(min{nc, nl'" ne,
ntH" ·nd)·
No balanced algorithms for the MCKP have been published, but if Pij =
Wij for all items, we have a Multiple-choice Subset-sum Problem. For this
problem Pisinger [99] presented a balanced algorithm which has complexity
O(nr) where r = maxi,j Wij.
As in dynamic programming algorithms for KP, it is convenient to rep-
resent states as a list of triples (7ri,J.Li,Vi), where !e(J.Li) = 7ri and Vi is a
representation of the solution vector.

4.6 Reduction of States


During a branch-and-bound or dynamic programming algorithm, it is nec-
essary to derive upper bounds for the considered branching nodes or states.
Bounds could be derived in O(n) using the Algorithm dye-zem, but special-
ized methods yield a better performance.
Dudzinski and Walukiewicz [23] derived an efficient bounding technique
as follows: Initially the classes are reduced according to (98) and (99) ob-
taining classes~. Then, each time an upper bound should be derived, a
median search algorithm is used in the sorted classes ~, finding the con-
tinuous bound in O(k log2(n' jk)) where n' is the number of undominated
items.
Weaker bounds, which are however derived in constant time, were pro-
posed in Pisinger [102]. Assume that an expanding core is enumerated using
recursion (117) up to class Nt. Moreover define for each class i the extreme
gradients
r~ = max'V+
1 l>i Il
r:- = min 'Yl-
I t>i
(118)
Kna.psa.ck Problems 377

Then the bound on a state with profit 1r and weight p. may be found as

1r + (c - p.)rt if p. ~ c,
u(1r,P.) = { (119)
1r + (c - p.)ri if p. > c.

Notice that the best bounds are obtained by ordering the classes such that
the classes with "It, "Ii- closest to "1* are ordered first.

4.7 Solution of Large-sized Instances


As for other knapsack problems it may be beneficial to focus the enumeration
on a core when dealing with large-sized problems. Basically one should
distinguish between two kinds of core when dealing with MCKP: A core
where only a subset G c K = {I, ... ,k} of the classes is enumerated, or a
core where only some items j E Gi C Ni in each class are enumerated. The
first approach is suitable for problems with many classes, while the second
approach is more appropriate for problems with many items in each class.
No work has however been published on the second approach, thus we will
here consider a core consisting of a subset of the classes.
p

_b~ •
y...
• •
-
• •

L..---------+w

Figure 8: Gradients "It, "Ii in class Ni.

Pisinger [102] defined a core problem based on positive and negative


gradients "It and "Ii for each class Ni, i #: a. The gradients are defined as
(see Fig. 8):

(120)

and we set "It = 0 (resp. "Ii = 00) if the set we are maximizing (resp.
minimizing) over is empty. Note that "It and "Ii- can be derived in O(ni)
378 D. Pisinger and P. Toth

for each class Ni and they do not demand any preprocessing. The gradients
are a measure of the expected gain (resp. loss) per weight unit by choosing
a heavier (resp. lighter) item from Ni instead of the LP-optimal choice bi.
,t % differences

~~-----";:=--=l~O~O class Ni
(a)
,i % differences

)("'frequency
L..,..~-------l"'!:O~O class Ni
(b)
Figure 9: Frequency of classes Ni where IP-optimal choice differs from LP-
optimal choice, compared to gradient 'Yt.

In Fig. 9 (a) we have ordered the classes according to decreasing 'Yt


and show how often the IP-optimal solution to MCKP differs from the LP-
optimal choice in each class Ni. The figure is a result of 5000 randomly
generated data instances (k = 100, ni = 10), where we have measured how
often the IP-optimal choice j (satisfying Wij > Wibi since we are considering
forward gradients) differs from the LP-optimal choice bi in each class Ni. It
can be seen that when 'Yt decreases, so does the probability that bi is not
the IP-optimal choice. Similarly, in Fig. 9 (b) we have ordered the classes
according to increasing 'Yi to show how the probability for changes decreases
with increased 'Yi- .
This observation motivates considering only a small number of the classes
Ni, namely those classes where 'Yt or 'Yi are sufficiently close to 'Y* given
by (102) to (104). Thus a fixed-core algorithm for MCKP can be derived by
straightforward generalization of KP algorithms: Choose a subset C C K of
the classes where the gradients are close to 'Y. These classes may be obtained
Knapsack Problems 379

in O(k) by choosing the 6 classes with the largest value of ,t.In a similar
way additional 6 classes with smallest value of Ii are chosen. This gives a
core C of at most 26 classes, since some classes may have been chosen twice,
and thus should be eliminated.
A fixed-core algorithm should enumerate the classes in the core, and then
try to fix the remaining classes at their optimal values through the reduction
rules described in Section 4.3. If all items in classes Ni ¢ C can be fixed
at their LP-optimal values, the solution to the core problem is optimal.
Otherwise the reduced classes must be enumerated to completion either by
branch-and-bound or dynamic programming. However no algorithm has
presently been published which uses this approach.
An expanding core algorithm for MCKP has been published by Pisinger
[102]: Initially the algorithm mcknap solves the continuous problem, using a
simplified version of dye...zem. Then the classes are ordered such that classes
Ni with gradient It or Ii closest to I· are first. Initially only the break
class Na is enumerated, while other classes are introduced consecutively in
recursion (117). Upper bound tests are used to fathom states which cannot
lead to an optimal solution, hoping that optimality of the problem can be
proved with classes Ni ¢ C fixed at their LP-optimal value. Before a class
is added to the core, it is reduced by using the tests (113), and if some
items cannot be fixed at their LP-optimal values, these are sorted and then
reduced according to (98) and (99).

4.8 Computational Experiments


We will compare the dyer_kay _val and mcknap algorithms. Five types of
randomly generated data instances are considered, each instance tested with
data-range Rl = 1 000 or R2 = 10 000 for different number of classes k and
sizes ni as follows. Un correlated data instances: In each class we generate
ni items by choosing Wij and Pij uniformly in [1, R]. Weakly correlated
data instances: In each class, Wij is uniformly distributed in [1, R] and Pij
is uniformly distributed in [Wij - 10, Wij + 10], such that Pij ~ 1. Strongly
correlated data instances: For KP these instances are generated as Wj uni-
formly distributed in [1, R] and P; = Wj + 10, which are difficult instances
for KP. These instances are however trivial for MCKP, since they degen-
erate to subset-sum data instances, but hard instances for MCKP may be
constructed by cumulating strongly correlated KP-instances: For each class
generate ni items (wj,Pj) as for KP, and order these by increasing weights.
The data instance for MCKP is then Wi; = L~=l w~, Pij = L~=lPh' j =
380 D. Pisinger and P. Toth

1, ... ,ni. Such instances have no dominated items, and form an upper con-
vex hull. Subset-sum data instances: Wi; uniformly distributed in [1, R] and
Pi; = Wi;· Zig-zag instances: Sinha and Zoltners [109] consider some zig-zag
instances with very few dominated items. For each class construct ni items
as (wj,Pj) uniformly distributed in [1, R]. Order the profits and weights in
. . ord
mcreasmg er, an d t
se Wi; ='
Wi' Pi; = Pi'' ·J1= , ... , ni·
The FORTRAN code dyer..kay_val was obtained from Prof. Dyer. Both
algorithms were run on a HP9000/735. The computational times obtained
are given in Table 7 and 8. A time limit of one hour was given to each
instance, and a dash in the tables indicates that some (or all) instances
could not be solved within this limit. It follows that mcknap has an overall
stable behavior, as it is able to solve all the instances. There are however
several instances where the dyer_kay _val algorithm is not able to solve the
problem within the time limit, and in one situation, the algorithm does not
find the optimal solution.
The dyer ..kay _val algorithm is sometimes better than the mcknap algo-
rithm for small instances with very large classes Ni. This may be explained
by the fact that mcknap is designed to enumerate a few classes instead of
a few items. But when the classes are very large, this strategy demands a
large enumeration. Other definitions of a core can however be applied in
these situations as discussed at the end of Section 4.7.
~
{;
Table 7: Total computing time (mcknap) in seconds. Averages of 100 instances. g;
Uncorrelated Weakly corr. Strongly corr. Subset sum Zig-zag
k ni Rl R2 Rl R2 Rl R2 Rl R2 Rl R2
10 10 0.00 0.00 0.00 0.04 0.01 0.08 0.00 0.14 0.00 0.00 ~
*
o
0-
100 10 0.00 0.00 0.01 0.24 0.32 5.03 0.00 0.09 0.01 0.01
1000 10 0.02 0.02 0.02 0.19 6.35 90.37 0.01 0.07 0.02 0.04 [
Ul
10000 10 0.20 0.25 0.19 0.34 151.30 1461.16 0.13 0.13 0.27 0.33
10 100 0.00 0.00 0.02 0.50 0.02 0.18 0.05 0.92 0.01 0.01
100 100 0.01 0.01 0.02 0.46 0.27 6.68 0.01 0.59 0.03 0.05
1000 100 0.10 0.12 0.12 0.35 8.28 183.22 0.09 0.09 0.19 0.24
10 1000 0.02 0.02 0.09 2.38 1.96 0.11 0.02 11.26 0.15 0.61
100 1000 0.09 0.11 0.14 0.94 160.56 2.70 0.09 0.11 0.32 2.21

Table 8: Total computing time (dyer.ltay_val) in seconds. Averages of 100 instances.


Uncorrelated Weakly corr. Strongly corr. Subset sum Zig-zag
k ni Rl R2 Rl R2 Rl R2 Rl R2 Rl R2
10 10 0.00 0.00 0.07 0.64 5.24 7.66 0.02 0.13 0.00 0.00
100 10 0.01 0.01 0.12 - - - 0.03 0.29 0.05 0.07
1000 10 0.11 0.21 0.09 - - - 0.04 0.24 0.15 1.22
10000 10 0.66 2.64 0.67 1.42 - - 0.42 0.44 0;84 4.76
10 100 0.00 0.00 0.05 - - - 0.04 0.44 0.01 0.01
100 100 0.04 0.04 0.13 1.05 - - 0.04 2.84 0.07 0.12
1000 100 0.41 0.95 0.53 9.57 - - 0.37 ·2.43 0.66 1.58
10 1000 0.03 0.03 0.08 - - - 0.04 2.31 0.09 0.33
100 1000 0.35 0.36 0.44 3.70 34.52 - 0.41 0.43 0.52 0.67
.
Optimal solution not found in one instance. w
00
I-'
382 D. Pisinger and P. Toth

5 Bounded Knapsack Problem


We consider the problem where a knapsack of capacity c should be filled
by using n given item types, where type j has a profit Pj, a weight Wj,
and a bound mj on the availability. The problem is to select a number Xj
(0 :S Xj :S mj) of each item type j such that the profit sum of the included
items is maximized without the weight sum exceeding c. The Bounded
Knapsack Problem (BKP) may thus be defined as
n
maximize z = LPjXj
j=1
n
subject to L WjXj :S c, (121)
j=1
Xj E {0,1, ... ,mj}, j = 1, ... ,n,

where all coefficients are positive integers. Without loss of generality we


may assume that wjmj :S c for j = 1, ... ,n so all items available of a given
type fit into the knapsack, and that E'1=1 wjmj > c to ensure a nontrivial
problem. If the coefficients are negative or fractional, this can be handled by
a straightforward adaptation of the Glover [45] method presented in Section
2. Also the transformation of a KP in minimization form into a KP in
maximization form described in Section 2 can be immediately extended to
BKP. If mj = 1 for all item types, we get the ordinary KP and thus BKP is
NP-hard.
Several industrial problems which are usually solved as 0-1 Knapsack
Problems may equally well be formulated as Bounded Knapsack Problems,
thus taking advantage of the fact that most products come from series of
identical item types. Many combinatorial problems can be reduced to BKP,
and the problem arises also as a subproblem in several algorithms for Integer
Linear Programming.
A close connection between BKP and KP is self-evident, so all the math-
ematical and algorithmic techniques analyzed in Section 2 could be extended
to the present case. Only a few papers have however focused on generalizing
the KP results to BKP, which may be due to the fact that BKP may be
transformed into an equivalent KP, and thus solved effectively this way.
The transformation of BKP into an equivalent KP is based on a binary
encoding of the bound mj on each item type j (see Martello and Toth [80]).
Each item type j is replaced by Llog2 mj + 1J items in the KP case, whose
Knapsack Problems 383

profits and weights are:

(Pj, Wj), (2pj,2wj), (4pj, 4wj), ... , (2 a - 1pj, 2a - 1wj), (dpj, dWj), (122)

where a = llog2 mj J, and d = mj - Ef::-J 2i. Thus the equivalent KP will


contain Ej=lllog2 mj + 1J variables. It can be seen that the transforma-
tion introduces 2a+1 binary combinations, i.e. 2a+1 - (mj + 1) redundant
representations of possible Xj values, since the values from d to 2a - 1 have
a double representation. However, a + 1 is the minimum number of binary
variables needed to represent the integers from 0 to mj, thus any alternative
transformation must introduce the same number of redundancies.
A different transformation into a Multiple-choice KP is possible, by hav-
ing for each item type j a class of mj items, each representing one of the
mj choices of Xj. This transformation however demands Ej=1 mj variables,
which is at first sight less attractive than the transformation to KP. But due
to the simple structure of each class, it is not necessary to store all variables
of the class, thus a simplified version of e.g. the mcknap algorithm may be
used to solve the problem.
In the following we will present some upper bounds in Section 5.1, while
lower bounds obtained through different heuristics and approximation al-
gorithms are discussed in Section 5.2. Section 5.3 shows how the size of
an instance may be reduced by applying the well-known reduction rules
from KP, as well as specific techniques used for BKP. Branch-and-bound
algorithms are considered in Section 5.4, while dynamic programming ap-
proaches are outlined in Section 5.5. Fathoming of states in enumerative
algorithms is briefly considered in 5.6, while Section 5.7 deals with the solu-
tion of large-sized instances. Finally Section 5.8 gives a comparison of the
best codes for BKP, considering specialized algorithms for BKP as well as
approaches based on a transformation into KP.

5.1 Upper Bounds


The Continuous relaxation of BKP is easily solved by a greedy algorithm:
Order the item types according to nonincreasing profit-to-weight ratios

PI > P2 > ... >Pn (123)


WI - W2 - - Wn

and define the break item type as b = min{j : E{=1 wimi > c}. Then an
optimal solution to C(BKP) is defined as Xj = mj for j = 1, ... , b -1, and
384 D. Pisinger and P. Toth

Xj = 0 for j = b + 1, ... , n, while Xb = (c - E~:l wjmj)/wb. Thus the


Dantzig bound for BKP becomes

Ul = l
~pjmj + (c-
b-l

3=1
b-l
~wjmj ) ~b
3=1 b
j (124)

The break item type b may be found by adapting the baLzem algorithm
presented in Section 2.1, meaning that Ul can be derived in O(n) time. By
truncating the continuous solution to Xb = 0 we obtain the break solution
X, W h'ICh h as profit sum P = "b-l
A A
L..Jj=IPjmj and ' ht sum W = "b-l
weIg L..Jj=1 wjmj.
A

A tighter bound than the Dantzig bound has been derived by Martello
and Toth [73] by generalizing the results in Section 2.1.2. Let a = L(c-
W)/WbJ be the number of items of type b which additionally fit into the break
solution. Then the Martello and Toth upper bound is based on the fact that
in an integer-optimal solution either Xb :$ a or Xb ~ a + 1. A valid upper
bound for BKP with the first constraint added is

U' = lAP + apb + (A


c- W- aWb ) Wb+l
PHI J' (125)

while an upper bound on BKP with constraint Xb ~ a + 1 becomes

Hence
U2 = max{U', U"} (127)
is an upper bound for BKP. Obviously U2 satisfies U2 :$ Ul and thus the
Martello and Toth upper bound is tighter than the Dantzig bound [73].
The bound U2 may be found in O(n) time, using the bal....zem algorithm for
finding b.
The continuous solution of BKP in the transformed version to a 0-1 KP
produces the same continuous bound Ul. This is however not the case for
bound U2, since U' and U" are tighter than the corresponding values given
by (29) and (30) in the KP version.
Enumerative bounds as presented in Section 2.1.2 have been considered
in Pisinger [100]. Let MeN be a subset of item types, and XM = {Xj E
{O, ... , mj}, j E M}. Then an upper bound on BKP is given by

(128)
Knapsack Problems 385

where U(X) is an upper bound on (121) with additional constraint Xj = Xj


for j EM.
All the bounds introduced in Section 2.1 for the 0-1 Knapsack Problem
can be generalized to obtain upper bounds for BKP. This could be done
either in a straightforward way, by applying the formulae in Section 2.1 to
BKP in 0-1 form (as was done for Ul) or, better, by exploiting the peculiar
nature of the problem.
Maximum cardinality constraints are derived as follows: Assume that
the weights are ordered according to nondecreasing weights and let f3 =

l-
min{h: Ej=1 wjmj > c}. Then setting

fJ-l "fJ-l
LJj=1 wJmJ oj
'"
+ C 0

k = L.J mj (129)
j=1 wfJ

we may add the additional constraint

(130)

to the formulation (121) without excluding any integer solutions. An in-


teresting observation is that if an instance of BKP is transformed into a
KP defined in p', w' ,c according to (122), then the corresponding cardinal-
ity constraint Ej~1 xj ::; k' will have k' ::; k since the weights in the KP
case are not smaller than those in the BKP case. Adding the cardinality
constraint (130) to the BKP however leads to tighter bounds from contin-
uous relaxation, than in the KP case. This can be seen with the instance
PI = WI = 10, P2 = W2 = 11 with bounds ml = m2 = 3 and capacity c = 40.
We find k = 3 and thus we add the constraint E~=1 Xj ::; 3 to the problem,
obtaining the continuous solution z(G(BKP)) = 33. If we instead transform
the instance into a KP, we get p~ = w~ = 10, p~ = w~ = 20, p~ = w~ = 11
and P4 = w4 = 22. Here k' = 2, and thus adding the constraint E1=I xj ::; 2
to the problem gives the continuous solution z(C(KP)) = 40.

5.2 Heuristics

l
The integer solution corresponding to bound Ul is given by

I b-l
'"
z = L.JPjmj + C- "b-l
LJj=1 wjmj JPb (131)
j=1 Wb
386 D. Pisinger and P. Toth

The absolute error z* - z' of this heuristic solution is bounded by Pb since


z' ::; z* ::; U1 ::; z' + Pb, but the ratio z' / z can be arbitrarily close to 0 as can
be proved by using the same instance as given below (49). The worst-case
!
performance ratio, however, can be improved to by computing

(132)

The same instance as in (49) can be used to prove the performance ratio.
A greedy heuristic for BKP assumes that the items are ordered accord-
ing to (123). Now repeatedly take the largest amount of each item type
j = 1, ... ,n by setting Xj = min{Lc/wjJ,mj} where c = c-EttwiXi.
The performance of the greedy heuristic may be improved by deriving l =
arg maxj=l, ... ,n {pjmj} and setting

(133)

!
The relative performance ratio of z9 is which can be proved as in Section
2.2.
Thansforming BKP into an equivalent 0-1 problem and then applying
any of the Polynomial-time or fully polynomial approximation schemes of
Section 2.8 leads to approximate solutions with worst-case bound as defined
by such schemes.

5.3 Reduction
The reductions presented in Section 2.3 are easily generalized to BKP: An
item type j may be fathomed if either all the available items of that type
have to be included in the knapsack (j < b) or if none of the available items
of that type can be included in the knapsack (j 2: b). Thus let be anuJ
upper bound on BKP with additional constraint Xj ::; mj -1, and u} be an
upper bound with additional constraint Xj 2: 1. If uJ < z + 1 then we may
fix Xj at mj and similarly if u} < z + 1 then we fix Xj at O.
Pisinger [100] used a generalization of the Dembo and Hammer bound
(57) getting the bounds

Ujo =P-Pj+ c-w + Wj ) Wb


A ( APb u} =P+Pj+(c-W-Wj) ~: (134)

where uJ is derived for j < band u} is derived for j 2: b. Both bounds may
be derived in constant time for each item type.
Knapsack Problems 387

A different kind of reduction was presented in Pisinger [100], where some


bounding tests are used to tighten the bound mj on an item type j. In this
way the bound Xj E {O, ... , mj} may be restricted to Xj E {O, ... , dj} when
j ~ b or to Xj E {mj - dj, ... ,mj} when j < b. If dj = 0 then the decision
variable Xj can only take one value, and thus we may fathom item type j.
It is only fruitful to insert d items of type j ~ b in the knapsack if an
upper bound uj(d) on the BKP with additional constraint Xj = d exceeds
the current lower bound z, i.e. if uj(d) ~ z + 1. We use a generalization of
the bound by Dembo and Hammer [18] for this purpose, obtaining the test

p + dpj + (c - W - dWj) :: ~ z + 1. (135)

From this inequality we may obtain the maximum number of an item type,
which may be included into the knapsack, as:

d .=
3 l{Z+l-P)Wb-(C-W)PbJ
PjWb - WjPb
h . b
wenJ>, (136)

where we set dj = mj whenpj/wj = Pb/Wb or when the right side of equation


(136) is larger than mj. A similar result is obtained for items of type j < b,
which have to be removed from the knapsack. Here an upper bound on the
number of removed items is

dj=
l(Z + 1 - p)Wb - (c - W)PbJ
-PjWb +WjPb
h . b
wenJ<, (137)

with the same conventions as for equation (136).


Pisinger [100] also derived a tighter reduction method than the above by
using the enumerative bound (128). These bounds are derived in O{IXMI)
and thus are computationally expensive to derive, but they are able to
tighten the bound mj on item type j by about 10% more than the bounds
(136) and (137). In most applications the simple and cheaply evaluated
bounds are however sufficient.

5.4 Branch-and-bound Algorithms


Martello and Toth [73] adapted the rot 1 procedure of Section 2.4 to BKP.
The obtained algorithm rotb could be sketched in simplified form as

algorithm mtb(i, p, w): boolean;
var improved;
if (w > c) then return false;
if (p > z) then z ← p; improved ← true; else improved ← false;
if (i > n) or (c − w < W_i) or (U2 < z + 1) then return improved;
for a ← m_i downto 0 do
    if mtb(i + 1, p + a·p_i, w + a·w_i) then x_i ← a; improved ← true;
return improved;

where the table W is given by W_i = min_{j≥i} w_j. All decision variables x_j must be initialized to 0, and the lower bound set to z = 0, before calling mtb(1, 0, 0). The Martello and Toth bound U2 is derived by equation (127) restricted to the variables j ≥ i and with reduced capacity c̄ = c − w.
Ingargiola and Korsh [61] presented a different branch-and-bound algorithm, using the reductions described in Section 5.3 to a priori fix some variables at their optimal values. Bulfin, Parker and Shetty [9] presented a branch-and-bound algorithm where penalties were used to improve the bounding phase. According to Aittoniemi [2] these two branch-and-bound algorithms are outperformed by the mtb algorithm.

5.5 Dynamic Programming


Let f_i(c̄), 0 ≤ i ≤ n, 0 ≤ c̄ ≤ c, be the optimal solution value of the following subproblem of BKP, defined on the first i variables of the problem and with the capacity restricted to c̄:

    f_i(c̄) = max{ Σ_{j=1}^i p_j x_j :  Σ_{j=1}^i w_j x_j ≤ c̄;  x_j ∈ {0,...,m_j}, j = 1,...,i }.    (138)
Generalizing the results by Bellman [8] for the 0-1 Knapsack Problem, we obtain the following recursion

    f_i(c̄) = max{ f_{i−1}(c̄ − a·w_i) + a·p_i :  a = 0,...,m_i,  c̄ − a·w_i ≥ 0 },    (139)

setting f_0(c̄) = 0 for c̄ = 0,...,c. Thus the Bellman recursion has time complexity O(c·Σ_{j=1}^n m_j) and space complexity O(nc). Dynamic programming algorithms based on this recursion have been presented by Gilmore and Gomory [44] and Nemhauser and Ullmann [91].
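A direct implementation of recursion (139) could be sketched as follows; by keeping only the previous row of the table, the space usage drops to O(c) when the solution vector itself is not needed (the function name is ours).

    def bkp_bellman(p, w, m, c):
        # Bellman recursion (139) for BKP in O(c * sum(m)) time.
        f = [0] * (c + 1)                        # f_0(cbar) = 0 for all cbar
        for j in range(len(p)):                  # item types j = 1, ..., n
            g = f                                # g is f_{j-1}
            f = g[:]                             # the a = 0 term of (139)
            for cbar in range(c + 1):
                amax = min(m[j], cbar // w[j])   # copies of type j that fit
                for a in range(1, amax + 1):
                    f[cbar] = max(f[cbar], g[cbar - a * w[j]] + a * p[j])
        return f[c]                              # f_n(c), the optimal value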

In Section 2.5.1 we saw that a primal-dual dynamic programming algorithm for the 0-1 Knapsack Problem is in general more efficient than the Bellman recursion. A primal-dual algorithm for BKP assumes that the items are ordered according to nonincreasing efficiencies, and f_{s,t}(c̄), s ≤ b, t ≥ b − 1, 0 ≤ c̄ ≤ 2c, is the optimal solution value of the problem

    f_{s,t}(c̄) = max{ Σ_{j=1}^{s−1} p_j m_j + Σ_{j=s}^t p_j x_j :
                      Σ_{j=1}^{s−1} w_j m_j + Σ_{j=s}^t w_j x_j ≤ c̄;
                      x_j ∈ {0,...,m_j},  j = s,...,t }.    (140)
We get the recursion

    f_{s,t}(c̄) = max{ f_{s,t−1}(c̄ − a·w_t) + a·p_t :  t ≥ b,  a = 0,...,m_t,  c̄ − a·w_t ≥ 0;
                      f_{s+1,t}(c̄ + a·w_s) − a·p_s :  s < b,  a = 0,...,m_s,  c̄ + a·w_s ≤ 2c }.    (141)
If p̂ and ŵ are the profit and weight sums of the break solution, we may initially set f_{b,b−1}(c̄) = p̂ for c̄ = ŵ,...,2c and f_{b,b−1}(c̄) = −∞ for c̄ = 0,...,ŵ − 1. Thus the enumeration starts at (s, t) = (b, b − 1) and continues by either removing some items of type s from the knapsack or inserting some items of type t into the knapsack. An optimal solution to BKP is found as f_{1,n}(c).
As for 0-1 Knapsack Problems, it is convenient to represent each state by the corresponding profit and weight sums (π, μ) where π = f_{s,t}(μ). Any state (π, μ) with μ > c + Σ_{j=1}^{s−1} w_j m_j may be fathomed, since even if we removed all items of types j < s in the forthcoming iterations, the state would never become a feasible solution. This observation implies that no states with weight μ > 2c can occur, giving the algorithm space complexity O(n·2c) = O(nc).
An efficient iterative version of recursion (141) is obtained by applying the transformation (122) to the current item type, such that each insertion/removal of an item type results in ⌊log₂ m_j + 1⌋ mergings of length O(c) (see Pisinger [105] for details on the merging). This means that the time complexity of the dynamic programming is O(c·Σ_{j=1}^n ⌊log₂ m_j + 1⌋), i.e. O(nc log c) in the worst case. For moderate core sizes |C| = t − s + 1 a tighter

bound is obtained, as only O(|C|·c·log c) steps are necessary. Moreover, by using dynamic programming by reaching, at most O(m_s · m_{s+1} ⋯ m_t) states need be considered. Thus we obtain the time bound

    O( min{ |C|·c·log c,  m_s · m_{s+1} ⋯ m_t } )    (142)

on the enumeration of a core C = [s, t].


A dynamic programming algorithm based on balancing has been presented in Pisinger [99]. The algorithm is based on a transformation to KP, and since the solution times of balanced algorithms are bounded by the magnitude of the coefficients, a linear transformation is used instead of the binary transformation (122). Thus each item type j in BKP is replaced by m_j individual items in the KP case. The obtained solution times are O(r_1·r_2·Σ_{j=1}^n m_j) where r_1 = max w_j and r_2 is a constant satisfying r_2 ≥ max p_j. Since the capacity c in the Bellman recursion is bounded by c ≤ Σ_{j=1}^n w_j m_j, balancing becomes attractive when r_1·r_2 ≪ Σ_{j=1}^n w_j m_j and no bounding rules are able to terminate the process before complete enumeration.
Concerning the solution vector corresponding to an optimal solution
value, the same principles as described in Section 2.5.3 may be applied to
the above recursions.

5.6 Reduction of States


During dynamic programming or branch-and-bound it is necessary to derive
upper bounds in an attempt to fathom the current node or state. Bounds
can obviously be derived as described in Section 5.1, but specialized methods
may often improve the performance.
Assume e.g. that the items [s, t] have been enumerated in (141), and that a given state has the profit and weight sums (π, μ). We may then fathom the state if u(π, μ) < z + 1, where the upper bound u is obtained by relaxing the constraints on x_{s−1} and x_{t+1} to x_{s−1} ≥ 0 and x_{t+1} ≥ 0, yielding:

    u(π, μ) = π + (c − μ)·p_{t+1}/w_{t+1}    if μ ≤ c,
    u(π, μ) = π + (c − μ)·p_{s−1}/w_{s−1}    if μ > c.    (143)

This bound may also be used for deriving a global upper bound on (121) by applying u in equation (128).
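In code the test might look as follows; a small sketch in which p and w are indexed by item type and the neighbouring types s − 1 and t + 1 are assumed to exist.

    def state_bound(pi, mu, s, t, p, w, c):
        # Upper bound (143) on a state (pi, mu) of recursion (141).
        # Fathom the state whenever the returned value is < z + 1.
        if mu <= c:
            return pi + (c - mu) * p[t + 1] / w[t + 1]   # room left: insert type t+1
        return pi + (c - mu) * p[s - 1] / w[s - 1]       # overfilled: remove type s-1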

5.7 Solution of Large-sized Problems


A core of the Bounded Knapsack Problem can be derived by adapting the principles described in Section 2.6. The findcore algorithm in Section 2.6.1 can be generalized in a straightforward way, returning a fixed-size core C = [s, t] = [b − δ, b + δ]. The core may then be solved using a specialized algorithm like mtb, or the core may be transformed into a 0-1 Knapsack Problem and then solved using one of the algorithms described in Sections 2.4 or 2.5. If optimality of the solution can be proved, an optimal solution to the original problem is found by setting the variables x_j with j < s to m_j, and the variables with j > t to 0. If optimality cannot be proved, then the reduction algorithm in Section 5.3 is used, and the remaining problem is solved through enumeration. No algorithms based on this approach have however been presented.
Martello and Toth [80] presented an algorithm mtb2 for BKP which
transforms the instance into an equivalent 0-1 KP. The transformed instance
is then solved through the mt2 algorithm, which is designed for large-sized
instances of KP.
Pisinger [100] presented an expanding-core algorithm for BKP, adapting
the principles of Section 2.6.3. The bouknap algorithm is based on dynamic
programming and can be sketched as follows:

algorithm bouknap
1 Find a core C = [b, b] through an algorithm similar to findcore, pushing all discarded intervals into two stacks H and L.
2 Run the dynamic programming recursion (141), alternately inserting an item type t or removing an item type s. Unpromising states with u < z + 1 are deleted, where u is obtained from (143).
3 Before an item type is added to the core, check whether s, t is within the current core C; otherwise pick the next interval from H or L, and reduce the item types using (134). Those items which could not be fixed at their optimal values are sorted and finally added to the core.
4 Before an item type is enumerated in the dynamic programming, tests (136) and (137) are used to tighten the bound on the item type.
5 The process stops when all states in the dynamic programming have been fathomed due to an upper bound test, or all item types j ∉ C have been fixed at their optimal value.

As in Section 2.6.3 it can be proved that the core enumerated by bouknap is minimal.

5.8 Computational Experiments


In order to compare specialized approaches for BKP with those based on a transformation to KP, we have chosen to focus on the algorithms bouknap and mtb2. The latter was tested in two versions: the published code mtb2, which solves the KP through algorithm mt2, and a new technique which solves the KP through the mthard code. Using Martello and Toth's naming tradition we could call the last approach mtbhard.
The following data instances are considered. Uncorrelated data instances: p_j and w_j are randomly distributed in [1, R]. Weakly correlated data instances: w_j randomly distributed in [1, R] and p_j randomly distributed in [w_j − R/10, w_j + R/10] such that p_j ≥ 1. Strongly correlated data instances: w_j randomly distributed in [1, R] and p_j = w_j + 10. Subset-sum data instances: w_j randomly distributed in [1, R] and p_j = w_j. The data range R will be tested with the values R = 100, 1000 and 10000, while the bounds m_j are randomly distributed in [5, 10].
Each problem type is tested with a series of 200 instances, such that the
capacity c of the knapsack varies from 1% to 99% of the total weight sum of
the items. This approach takes into account that the computational times
may depend on the chosen capacity. Since mtb2 transforms the BKP into
an equivalent 0-1 KP, instances larger than n = 50000 cannot be solved due
to memory limitations. Similarly the mtbhard code cannot solve instances
larger than n = 10000. In the following tables a dash means that the 200
instances could not be solved in a total of 10 hours. The results have been
achieved on a HP9000/735 computer.
The computational times are presented in Table 9. The mtb2 algorithm has substantial stability problems for low-ranged data instances, even of relatively small size. Subset-sum data instances also show a few anomalous occurrences. The mtbhard algorithm generally has a more stable behavior, but it is interesting to note that although mthard is excellent at solving strongly correlated KP instances, it is not able to solve strongly correlated instances when these are transformed from BKP into KP. As mentioned in Section 5.1, the effect of cardinality bounds is weakened by the transformation. Finally, bouknap has a very stable behavior, solving most instances within a fraction of a second. The strongly correlated instances demand more computational time, but the effort increases linearly with the problem size n and the data range R.
Table 9: Computing times in seconds, bouknap (top), mtb2 (mid) and mtbhard (bottom). Averages of 200 instances.
Uncorrelated Weakly corr. Strongly corr. Subset-sum
n\R 100 1000 10000 100 1000 10000 100 1000 10000 100 1000 10000
100 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.16 1.55 0.00 0.02 0.27
300 0.00 0.00 0.00 0.00 0.01 0.01 0.06 0.62 7.36 0.00 0.02 0.27
1000 0.00 0.01 0.01 0.00 0.01 0.03 0.22 2.24 25.70 0.00 0.02 0.28
3000 0.00 0.01 0.02 0.01 0.01 0.09 0.81 9.37 109.67 0.01 0.03 0.30
10000 0.01 0.02 0.06 0.02 0.03 0.16 3.03 39.71 - 0.02 0.04 0.30
30000 0.04 0.05 0.12 0.08 0.08 0.20 13.59 116.96 - 0.05 0.06 0.31
100000 0.20 0.16 0.27 0.40 0.26 0.37 107.90 450.46 - 0.17 0.19 0.45
100 0.00 0.00 0.00 0.00 0.01 0.02 - - - 0.00 0.02 0.15
300 0.01 0.01 0.01 0.00 0.03 0.06 - - - 0.00 - 40.96
1000 0.01 0.03 0.04 0.01 0.05 0.26 - - - 0.00 0.01 0.09
3000 0.02 0.05 0.14 62.58 0.04 0.57 - - - 0.01 0.02 0.10
10000 14.72 0.11 0.43 - 0.17 0.98 - - - 0.05 0.06 0.14
30000 - 0.31 1.09 - - 1.04 - - - 0.18 0.19 0.26
100 0.00 0.00 0.00 0.00 0.01 0.02 0.62 - - 0.00 0.01 0.14
300 0.00 0.01 0.01 0.00 0.01 0.02 - - - 0.00 0.01 0.47
1000 0.01 0.03 0.03 0.01 0.03 0.17 - - - 0.00 0.02 3.32
3000 0.02 0.06 0.09 0.05 0.07 0.51 - - - 0.01 0.03 15.01
10000 0.11 0.22 0.43 0.22 0.19 1.05 - - - 0.05 0.13 39.15



Thus the computational experiments indicate that specialized approaches
for BKP are more efficient than those based on a transformation to KP, since
tighter upper bounds can be derived, and specialized reductions can be ap-
plied as described in Section 5.3.

6 Unbounded Knapsack Problem


The Unbounded Knapsack Problem (UKP) is a special case of the Bounded Knapsack Problem in which there is an unlimited availability of each item type. Thus assume that n given item types are characterized by their profit p_j and weight w_j, and we want to choose a number x_j of each type j such that the chosen profit sum is the largest possible, but such that the weight sum does not exceed a given capacity c. Thus the problem may be formulated as

    maximize    Σ_{j=1}^n p_j x_j
    subject to  Σ_{j=1}^n w_j x_j ≤ c,    (144)
                x_j ≥ 0, integer,  j = 1,...,n.
The problem has several applications in financial management, cargo loading and cutting stock, and it also appears as a surrogate relaxation of IP problems with nonnegative coefficients. The Unbounded Knapsack Problem may be transformed into a Bounded Knapsack Problem by imposing the constraint x_j ≤ ⌊c/w_j⌋ for each variable considered, but according to Martello and Toth [80], algorithms for the BKP perform rather poorly on instances of this kind.
The problem is NP-hard, as proved in Lueker [71] by transformation from SSP. However, it can be solved in polynomial time in the case n = 2, as proved by Hirschberg and Wong [51] and Kannan [64]. Notice that the result is not trivial, since a naive algorithm, testing x_1 = i, x_2 = ⌊(c − i·w_1)/w_2⌋ for i taking on integer values from 0 to ⌊c/w_1⌋, would require time O(c), which is exponential in the input length.
Despite the fact that UKP is closely related to BKP, several unique
techniques can be used to solve the unbounded case. Thus in Section 6.1
we present some upper bounds for UKP, while lower bounds are consid-
ered in Section 6.2. Exact solution methods are presented in the following

two sections, first dealing with dynamic programming in Section 6.3, then
branch-and-bound in Section 6.4. Reduction algorithms are considered in
Section 6.5, concerning both reductions known from other Knapsack Prob-
lems, and also some reductions which are unique for UKP. Section 6.6 deals
with the solution of large-sized problems, and we conclude the presentation
of UKP in Section 6.7 by comparing the performance of some of the most
efficient algorithms.

6.1 Upper Bounds


As usual assume that the item types are ordered according to

    p_1/w_1 ≥ p_2/w_2 ≥ ⋯ ≥ p_n/w_n,    (145)

breaking ties such that w_j ≤ w_{j+1} when p_j/w_j = p_{j+1}/w_{j+1}. The continuous relaxation of UKP has the optimal solution x_1 = c/w_1 and x_j = 0 for j = 2,...,n. This yields the trivial upper bound U_0 = ⌊c·p_1/w_1⌋.
Since x_1 ≤ m = ⌊c/w_1⌋ in any integer solution, imposing this constraint on UKP we get the continuous solution

    x_1 = m,   x_2 = r/w_2,   x_j = 0,  j = 3,...,n,    (146)

where the residual capacity is r = c − m·w_1. We get the improved upper bound

    U_1 = ⌊m·p_1 + r·p_2/w_2⌋,    (147)

which is a counterpart of the Dantzig bound in Section 5.1. Note that the break item is always b = 2, and thus the profit and weight sums of the break solution are

    p̂ = m·p_1,   ŵ = m·w_1.    (148)

Setting a = ⌊r/w_2⌋ and noting that in every optimal solution either x_2 ≤ a or x_2 ≥ a + 1, we get a counterpart of the Martello and Toth upper bound (31) as

    U_2 = max{U', U''}    (149)

where

    U' = ⌊m·p_1 + a·p_2 + (r − a·w_2)·p_3/w_3⌋    (150)

and

    U'' = ⌊m·p_1 + (a + 1)·p_2 + (r − (a + 1)·w_2)·p_1/w_1⌋.    (151)

In UKP we can exploit the fact that b = 2 to obtain a better bound. Since U'' is a bound on UKP with the constraint x_2 ≥ a + 1, this solution can only be obtained if at least β = ⌈((a + 1)·w_2 − r)/w_1⌉ items of type 1 are removed, and thus r − (a + 1)·w_2 + β·w_1 units of capacity are available for the items of type 2. Hence a valid upper bound can be obtained by replacing U'' with

    Ū'' = ⌊m·p_1 + (a + 1)·p_2 − β·p_1 + (r − (a + 1)·w_2 + β·w_1)·p_2/w_2⌋.    (152)

Furthermore, Ū'' ≤ U'' since we are "moving" some capacity units from items of type 1 to the less efficient items of type 2. Thus we have

Theorem 6.1 (Martello and Toth [81]) U_3 = max{U', Ū''} is an upper bound for UKP, and U_3 ≤ U_2.

The time complexity for the computation of U_0, U_1, U_2 and U_3 is O(n), since only the three largest ratios p_j/w_j are needed.
The worst-case performance ratio of all four bounds is ρ(U_0) = ρ(U_1) = ρ(U_2) = ρ(U_3) = 2, since U_3 ≤ U_2 ≤ U_1 ≤ U_0 ≤ p̂ + p_1 ≤ 2z*. The tightness is shown by considering a series of problems with n = 3, p_j = w_j = k for all j, and c = 2k − 1. Here we get U_0 = U_1 = U_2 = U_3 = 2k − 1 and z* = k, so the ratio U/z* can be arbitrarily close to 2 for sufficiently large k.
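The bounds U_0, U_1 and U_3 could be computed as in the following sketch; U_2 is obtained analogously by using U'' of (151) in place of Ū''. The arrays are 0-based and assumed sorted according to (145) with n ≥ 3; all names are of our choosing.

    def ukp_bounds(p, w, c):
        m = c // w[0]                                # copies of the best type
        r = c - m * w[0]                             # residual capacity
        U0 = (c * p[0]) // w[0]
        U1 = m * p[0] + (r * p[1]) // w[1]           # bound (147)
        a = r // w[1]
        # U' of (150): at most a copies of type 2, rest at type-3 efficiency
        Up = m * p[0] + a * p[1] + ((r - a * w[1]) * p[2]) // w[2]
        # Ubar'' of (152): force a+1 copies of type 2, remove beta of type 1
        beta = -((r - (a + 1) * w[1]) // w[0])       # ceil(((a+1)w_2 - r)/w_1)
        cap = r - (a + 1) * w[1] + beta * w[0]       # capacity freed for type 2
        Ubpp = m * p[0] + (a + 1) * p[1] - beta * p[0] + (cap * p[1]) // w[1]
        U3 = max(Up, Ubpp)
        return U0, U1, U3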

6.2 Heuristics
The break solution of value z' = p̂ is an obvious heuristic solution. But where the break solution for KP and BKP can provide an arbitrarily bad approximation of z*, the situation is different for UKP, as we have z'/z* ≥ 1/2. The proof is immediate by observing that z* − z' ≤ p_1, while from the assumption w_j ≤ c we get z' ≥ p_1. The series of problems with n = 2, p_1 = w_1 = k + 1, p_2 = w_2 = k and c = 2k shows that 1/2 is tight, since z'/z* = (k + 1)/(2k) and this ratio can be arbitrarily close to 1/2 for sufficiently large k. The same property holds for the simpler heuristic z'' = ⌊c/w_1⌋·p_1.
A counterpart of the greedy heuristic (133) thus gets the simpler form: repeatedly take the largest amount of each item type j = 1,...,n by setting x_j = ⌊c̄/w_j⌋ where c̄ = c − Σ_{i=1}^{j−1} w_i x_i. Thus

    z^g = Σ_{j=1}^n p_j x_j.    (153)

UKP admits a fully polynomial approximation scheme, as shown by e.g. Ibarra and Kim [58].

6.3 Dynamic Programming


An immediate adaptation of the BKP recursion from Section 5.5 yields

    f_i(c̄) = max{ f_{i−1}(c̄ − a·w_i) + a·p_i :  a = 0,...,⌊c̄/w_i⌋ },    (154)

with f_0(c̄) = 0 for c̄ = 0,...,c. Since c̄/w_i may be of magnitude O(c), the time complexity for determining z = f_n(c) is O(nc²). Gilmore and Gomory [43] derived a better recursion as

    f_i(c̄) = max{ f_{i−1}(c̄)           if c̄ ≥ 0,
                  f_i(c̄ − w_i) + p_i    if c̄ − w_i ≥ 0 },    (155)

which reduces the time complexity to O(nc). Specialized dynamic programming algorithms have been presented by e.g. Greenberg and Feldman [49] and Greenberg [47, 48], to mention some of the more recent ones.
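Recursion (155) translates into a few lines of Python; a sketch with names of our choosing, where updating the single row f in place while scanning the capacities upwards realizes the term f_i(c̄ − w_i) + p_i.

    def ukp_dp(p, w, c):
        # Gilmore-Gomory recursion (155) for UKP in O(nc) time, O(c) space.
        f = [0] * (c + 1)                        # f_0(cbar) = 0
        for j in range(len(p)):                  # item types, any order
            for cbar in range(w[j], c + 1):
                f[cbar] = max(f[cbar], f[cbar - w[j]] + p[j])
        return f[c]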

6.4 Branch-and-bound
Several branch-and-bound algorithms for UKP have been proposed, but Martello and Toth [73] showed that their algorithm mtu is the most efficient. Assuming that the items are sorted according to (145), let p and w be the profit and weight sums of the currently chosen items. Then the mtu algorithm can be sketched as:

algorithm mtu(i, p, w): boolean;
var improved;
if (w > c) then return false;
if (p > z) then z ← p; improved ← true; else improved ← false;
if (i > n) or (c − w < W_i) or (p + (c − w)·p_i/w_i < z + 1) then return improved;
m ← ⌊(c − w)/w_i⌋;
for a ← m downto 0 do
    if mtu(i + 1, p + a·p_i, w + a·w_i) then x_i ← a; improved ← true;
return improved;

All decision variables x_j must be initialized to 0, and the lower bound set to z = 0, before calling mtu(1, 0, 0). The table W_j of minimal weights is initialized as W_j = min_{i≥j} w_i.

6.5 Reduction Algorithms


An efficient way of solving UKP is to first apply some dominance relations to fathom unpromising item types, and then to solve the remaining problem through enumerative techniques. The general form of dominance (Pisinger [98]) is defined as follows:

Definition 6.1 Given an item type j and a set of item types i_1,...,i_d, where i_k ≠ j for k = 1,...,d but the indices i_k are not necessarily distinct. If

    Σ_{k=1}^d w_{i_k} ≤ w_j   and   Σ_{k=1}^d p_{i_k} ≥ p_j,    (156)

then item type j is said to be dominated by the item types i_1,...,i_d.

Obviously a dominated item type j may be fathomed from the problem, as any optimal solution with x_j > 0 may be replaced by a solution in which x_j copies of the set i_1,...,i_d are chosen instead.
A complete test for dominated items may be performed in pseudo-polynomial time through dynamic programming, where recursion (155) is used to enumerate the left-hand sides in (156). Thus assume that the items are ordered according to (145). Now, running through the items j = 1,...,n, recursion (155) is used to enumerate the states f_j(c̄). Before considering an item j ≥ 2, let π = f_{j−1}(w_j). If π ≥ p_j then item j is dominated by an integer combination of some items i_1,...,i_m < j, and thus item j may be fathomed. Since we do not need to consider states with weight larger than r = max w_j in the reduction, the time complexity of this approach is O(n + mr), where m is the number of undominated items.
Martello and Toth [81] consider a computationally cheaper dominance test, which assumes that the indices i_1,...,i_d all correspond to the same item, say i, in equation (156). The tightest choice of d in this case is d = ⌊w_j/w_i⌋, as this results in the largest profit sum whose weight does not exceed w_j.

Theorem 6.2 (Martello and Toth [81]) For two given item types i, j where i ≠ j, item type j is dominated by item type i if

    ⌊w_j/w_i⌋·p_i ≥ p_j.    (157)

An optimal solution exists where x_j = 0 for a dominated item type j, and thus item type j may be fathomed.
Martello and Toth gave a simple algorithm for reducing the dominated items by first sorting the items according to (145). Then an O(n²) procedure is performed where each item type i attempts to dominate the item types j > i. Dudzinski [22] improved this reduction to O(n log n + mn), where m is the number of undominated items. Finally Pisinger [98] improved the reduction to O(n log n + min{mn, n·w_max/w_min}), where w_max = max w_j and w_min = min w_j.
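The simple O(n²) reduction could be sketched as follows; the sorting step and the function name are ours.

    def reduce_dominated(p, w):
        # Sort by nonincreasing p/w, breaking ties by nondecreasing w, so an
        # item type can only be dominated by one occurring earlier.
        order = sorted(range(len(p)), key=lambda j: (-p[j] / w[j], w[j]))
        kept = []
        for j in order:
            # test (157): j is dominated by i if floor(w_j/w_i) * p_i >= p_j
            if all((w[j] // w[i]) * p[i] < p[j] for i in kept):
                kept.append(j)
        return kept                              # undominated item types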

6.5.1 Reductions from Bounds


The reductions presented in the previous section are unique to UKP, since they are based on the fact that any number of items can be chosen from each dominating subset. More common reduction techniques can however also be used for this problem. Thus, for j ≥ 3 let u_j^1 be an upper bound for (144) with the additional constraint x_j ≥ 1. If u_j^1 < z + 1 then we may fix x_j at 0 and thus fathom item type j.
Bounds derived in constant time can be obtained from a generalization of the Dembo and Hammer bound (57), getting

    u_j^1 = p̂ + p_j + (c − ŵ − w_j)·p_2/w_2.    (158)

Martello and Toth [81] proposed using the tighter bound U_1 in the reduction. By adding the constraint x_j ≥ 1 the capacity decreases to c̃ = c − w_j, but the break item is still b = 2, so U_1 can be derived in constant time. If it is not possible to fathom an item type by using bound U_1, then Martello and Toth proposed using U_3 instead.
In Section 5.3 we saw that upper bound tests may be used to tighten the bound on an item type. Thus we may impose on each item type the bound

    m_j = min{ ⌊c/w_j⌋,  ⌊((z + 1 − p̂)·w_2 − (c − ŵ)·p_2) / (p_j·w_2 − w_j·p_2)⌋ },    (159)

obtained directly from (136). By imposing the constraints x_j ≤ m_j on all item types, we get an ordinary BKP.

6.6 Solution of Large-sized Instances


The concept of a core problem described in Section 2.6 can be directly extended to UKP by recalling that, in this case, the break item type b is always the second one. Hence a core may be defined as a collection of the δ first item types according to the ordering (145).
The core C = [1, δ] can be derived in O(n) time by a straightforward adaptation of the findcore algorithm described in Section 2.6.1: The break item type may initially be set to b = 2, and thus no weight sums need be maintained. Left intervals [s, i − 1] are always partitioned further, while right intervals [i, t] are only partitioned when i ≤ δ. Martello and Toth [81] experimentally found that δ' = max{100, ⌊n/100⌋} is a good initial core size.
The only algorithm published for large-sized UKP is the mtu2 algorithm by Martello and Toth [81]. It may be outlined as follows:

algorithm mtu2

1 Set δ = δ'.
2 Determine a core C = [1, δ] using a modified findcore algorithm.
3 Sort the item types in the core and use the dominance rule (157) for all items in C to fathom unpromising item types.
4 Exactly solve the reduced core problem using algorithm mtu. If the obtained solution z equals the bound U_3, stop.
5 Reduce the items j ∉ C by applying the reduction rules from the previous section.
6 If all items j ∉ C were fathomed, then stop. Otherwise increase the core size to δ ← δ + δ' and go to Step 2.

In Step 2, the core C can also be found in O(n) time by looking for the δ'-th largest element in a set (see e.g. Fischetti and Martello [34]).

6.7 Computational Experiments


We compare the solution times of the specialized algorithm mtu2 with those of the general algorithm bouknap designed for Bounded Knapsack Problems. The latter was run with the trivial bounds m_j = ⌊c/w_j⌋ on each item type.

Table 10: Computing times in seconds as averages of 100 instances.

            Uncorrelated     Weakly corr.     Strongly corr.   Subset-sum
 R1      n  mtu2  bouknap    mtu2  bouknap    mtu2  bouknap    mtu2  bouknap
 10    100  0.00   0.00      0.00   0.00      0.00   0.00      0.01   0.09
 10    300  0.00   0.00      0.00   0.00      0.00   0.01      0.01   0.31
 10   1000  0.00   0.01      0.00   0.02      0.00   0.19      0.01   1.21
 10   3000  0.01   0.02      0.01   0.02      0.01   2.63      0.01   -
 10  10000  0.02   0.08      0.02   0.05      0.02  37.87      0.02   -
 10  30000  0.07   0.16      0.07   -*        0.07   -*        0.07   -
 50    100  0.00   0.00      0.00   0.00      0.00   0.00      0.01   0.08
 50    300  0.00   0.00      0.00   0.00      0.00   0.01      0.01   0.32
 50   1000  0.00   0.00      0.00   0.00      0.01   0.05      0.01   1.47
 50   3000  0.01   0.01      0.01   0.01      0.13   0.47      0.01   -*
 50  10000  0.03   0.03      0.03   0.22      0.20   7.31      0.02   -*
 50  30000  0.10   -*        0.10   -*        0.25  71.66      0.08   -*
100    100  0.00   0.00      0.00   0.00      0.00   0.01      0.01   0.12
100    300  0.00   0.00      0.00   0.01      0.01   0.06      0.01   0.42
100   1000  0.00   0.00      0.00   0.01      2.20   0.17      0.01   1.54
100   3000  0.01   0.01      0.01   0.04      -      0.51      0.01   6.77
100  10000  0.03   0.04      0.03   -*        -      4.86      0.02   -*
100  30000  0.12   0.13      0.16   -         -     41.87      0.08   -*
500    100  0.00   0.00      0.00   0.00      -      0.16      0.01   0.13
500    300  0.00   0.01      0.00   0.02      -      0.72      0.01   0.63
500   1000  0.01   0.04      0.04   0.08      -      3.30      0.01   2.01
500   3000  0.04   0.31     15.35   0.77      -     13.11      0.01   7.04
500  10000  0.95   0.55      -      1.06      -      -         0.02   -
500  30000  4.22   -         -      -         -      -         0.09   -
 * Insufficient space

Both algorithms were tested with a number of randomly generated instances as follows: The weights w_j are randomly distributed in [R1, R2], while the distribution of the profits depends on the problem type. Uncorrelated data instances: p_j randomly distributed in [1, R2]. Weakly correlated data instances: p_j randomly distributed in [w_j − 100, w_j + 100] such that p_j ≥ 1. Strongly correlated data instances: p_j = w_j + 100. Subset-sum data instances: p_j = w_j.
We consider instances with R2 = 1000 and R1 having the values 10, 50, 100 and 500. The capacity was chosen as c = h/(H + 1)·Σ_{j=1}^n w_j for instance h in a series of H instances. Table 10 gives the computational times of mtu2 and bouknap. The results have been obtained on a HP9000/735, and a maximum time limit of 1 hour was assigned to each series of 100 instances.
It can be seen that mtu2 is able to quickly solve all instances for small R1, while when R1 becomes larger it starts to have substantial problems for

weakly and strongly correlated instances. This may be due to the fact that the reduction (157) is very effective for small values of R1, leaving only a few items. When R1 becomes large, it is not possible to fathom as many items, and thus the enumerative part gets more difficult. Notice especially that for the strongly correlated instances, the core [1, δ] described in Section 6.6 will only contain items with weight close to R1. Thus the enumeration gets stuck in replacing items of similar weight. The same happens in large-sized weakly correlated instances, as the ordering according to profit-to-weight ratios will also choose the smallest weights first.
The bouknap algorithm generally has worse computational times than mtu2, and in many cases it runs out of space in the dynamic programming part. This is due to the fact that at least c/w_1 states will be enumerated already after considering the first item. But bouknap is able to solve some of the difficult weakly and strongly correlated problems that mtu2 was not able to solve, which somehow confirms that dynamic programming algorithms are more tolerant of instances where many items have similar weights.

7 Multiple Knapsack Problem


We consider the problem where n given items should be packed into m knapsacks of distinct capacities c_i, i = 1,...,m. Each item j has an associated profit p_j and weight w_j, and the problem is to select m disjoint subsets of items, such that subset i fits into knapsack i and the total profit of the selected items is maximized. Thus the 0-1 Multiple Knapsack Problem (MKP) may be defined as the optimization problem

    maximize    Σ_{i=1}^m Σ_{j=1}^n p_j x_{ij}
    subject to  Σ_{j=1}^n w_j x_{ij} ≤ c_i,   i = 1,...,m,    (160)
                Σ_{i=1}^m x_{ij} ≤ 1,         j = 1,...,n,
                x_{ij} ∈ {0, 1},   i = 1,...,m,  j = 1,...,n,

where x_{ij} = 1 if item j is assigned to knapsack i, and x_{ij} = 0 otherwise. It is usual to assume that the coefficients p_j, w_j and c_i are positive integers, as fractional values can be handled by multiplying through by a proper factor, and items with p_j ≤ 0 as well as knapsacks with c_i ≤ 0 may be eliminated.

There is however no easy way of transforming an instance so as to handle negative weights, as described in Section 2.
To avoid trivial cases it is assumed that w_j ≤ max_{i=1,...,m} c_i for j = 1,...,n, to ensure that each item j fits into at least one knapsack, as otherwise it may be removed from the problem. Also c_i ≥ min_{j=1,...,n} w_j may be assumed for i = 1,...,m, as any knapsack into which no item fits may be discounted. Finally it may be assumed that Σ_{j=1}^n w_j > c_i for i = 1,...,m, to avoid the trivial solution where all items fit into one of the knapsacks.
There are several applications of MKP, as the problem directly reflects a situation of loading m ships/containers or e.g. packing m envelopes. Martello and Toth [80] also proposed using the problem for deciding how to load n liquids into m tanks when the liquids may not be mixed.
Several branch-and-bound algorithms for MKP have been presented dur-
ing the last two decades, among which we should mention Hung and Fisk
[55], Martello and Toth [75], Pisinger [101], Neebe and Dannenbring [90],
and Christofides, Mingozzi and Toth [13]. The first three are best-suited for
problems where several items are filled into each knapsack, while the last
two are designed for problems with many knapsacks and few items. In the
following we will mainly focus on the first kind of problems where the ratio
n/m is assumed to be large.
In Section 7.1 we will consider different upper bounds for MKP, and
see how these may be tightened by some polyhedral properties in Section
7.2. Reduction algorithms, attempting to fix some variables at their optimal
values, are described in Section 7.3. Heuristics and approximate algorithms
are considered in the following section, while Sections 7.5 and 7.6 deal with
exact solution methods. A comparison of the most effective algorithms for
MKP is finally presented in Section 7.7.

7.1 Upper Bounds

Different upper bounds for MKP may be obtained by surrogate, Lagrangian or continuous relaxation. We will first consider surrogate relaxation. Let λ_1,...,λ_m be some nonnegative multipliers; then the surrogate relaxed problem S(MKP, λ) becomes:

    maximize    Σ_{i=1}^m Σ_{j=1}^n p_j x_{ij}
    subject to  Σ_{i=1}^m λ_i Σ_{j=1}^n w_j x_{ij} ≤ Σ_{i=1}^m λ_i c_i,    (161)
                Σ_{i=1}^m x_{ij} ≤ 1,  j = 1,...,n,
                x_{ij} ∈ {0, 1},   i = 1,...,m,  j = 1,...,n.
The best choice of multipliers λ_i is the one producing the smallest objective value in (161). It can be found analytically as follows:

Theorem 7.1 (Martello and Toth [76]) For any instance of MKP, the optimal choice of multipliers λ_1,...,λ_m in (161) is λ_i = k, i = 1,...,m, where k is any positive constant.

With this choice of multipliers, S(MKP, λ) becomes the following 0-1 Knapsack Problem:

    maximize    Σ_{j=1}^n p_j x̃_j
    subject to  Σ_{j=1}^n w_j x̃_j ≤ c,    (162)
                x̃_j ∈ {0, 1},  j = 1,...,n,

where the introduced decision variables x̃_j = Σ_{i=1}^m x_{ij} indicate whether item j is chosen for any of the knapsacks i = 1,...,m, and c = Σ_{i=1}^m c_i may be seen as the capacity of the united knapsacks.
By continuous relaxation of the variables x_{ij} the problem C(MKP) appears. Martello and Toth [80] proved that z(C(MKP)) = z(C(S(MKP, 1))), and thus the objective value of the continuously relaxed problem may be found in O(n) time as the Dantzig bound for (162); it can however never be less than z(S(MKP, 1)).
Two different Lagrangian relaxations of MKP are possible. By relaxing the constraints Σ_{i=1}^m x_{ij} ≤ 1 using nonnegative multipliers λ_1,...,λ_n, the problem L_1(MKP, λ) becomes:

    maximize    Σ_{i=1}^m Σ_{j=1}^n p_j x_{ij} − Σ_{j=1}^n λ_j (Σ_{i=1}^m x_{ij} − 1)
    subject to  Σ_{j=1}^n w_j x_{ij} ≤ c_i,   i = 1,...,m,    (163)
                x_{ij} ∈ {0, 1},   i = 1,...,m,  j = 1,...,n.

By setting p̃_j = p_j − λ_j for j = 1,...,n, the relaxed problem decomposes into m independent 0-1 Knapsack Problems, where problem i has the form

    maximize    z_i = Σ_{j=1}^n p̃_j x_{ij}
    subject to  Σ_{j=1}^n w_j x_{ij} ≤ c_i,    (164)
                x_{ij} ∈ {0, 1},  j = 1,...,n,

for i = 1,...,m. All the problems have the same profits and weights; only the capacity distinguishes the individual instances. An optimal solution to L_1(MKP, λ) then has the value z = Σ_{i=1}^m z_i + Σ_{j=1}^n λ_j.
As opposed to surrogate relaxation, there is however no analytically optimal choice of the multipliers λ_j for Lagrangian relaxation, so approximate values may be found by subgradient optimization. Hung and Fisk [55] used predefined multipliers given by

    λ̄_j = p_j − w_j·p_b/w_b    if j < b,
    λ̄_j = 0                    if j ≥ b,    (165)

where b is the break item of S(MKP, 1). With this choice of λ_j, in (164) we have p̃_j/w_j = p_b/w_b for j ≤ b, and p̃_j/w_j ≤ p_b/w_b for j > b. Thus it follows that

    z(C(L_1(MKP, λ̄))) = (p_b/w_b)·Σ_{i=1}^m c_i + Σ_{j=1}^{b−1} λ̄_j
                       = Σ_{j=1}^{b−1} p_j + (Σ_{i=1}^m c_i − Σ_{j=1}^{b−1} w_j)·p_b/w_b,    (166)
and thus we get

    z(C(L_1(MKP, λ̄))) = z(C(S(MKP, 1))) = z(C(MKP)),    (167)

i.e. the multipliers λ̄_j are the best multipliers for C(L_1(MKP, λ)). In addition, both L_1(MKP, λ̄) and the surrogate relaxation S(MKP, 1) dominate

continuous relaxation. There is however no dominance between the Lagrangian and the surrogate relaxation.
Another Lagrangian relaxation L_2(MKP, λ) appears by relaxing the weight constraints. Using positive multipliers λ_1,...,λ_m, the relaxed problem becomes

    maximize    Σ_{i=1}^m Σ_{j=1}^n p_j x_{ij} − Σ_{i=1}^m λ_i (Σ_{j=1}^n w_j x_{ij} − c_i)
    subject to  Σ_{i=1}^m x_{ij} ≤ 1,  j = 1,...,n,    (168)
                x_{ij} ∈ {0, 1},   i = 1,...,m,  j = 1,...,n.
It is important that λ_i > 0 for all i = 1,...,m, as otherwise the above problem has a useless solution where all items are put into a knapsack with λ_i = 0. The objective function can be rewritten as

    maximize    Σ_{i=1}^m Σ_{j=1}^n (p_j − λ_i w_j)·x_{ij} + Σ_{i=1}^m λ_i c_i,    (169)

which shows that the optimal solution can be found by selecting the knapsack κ with the smallest associated value of λ_i and choosing all items with p_j − λ_κ w_j > 0 for this knapsack. Since this is also the optimal solution of the continuous relaxation of the same problem, i.e. z(C(L_2(MKP, λ))) = z(L_2(MKP, λ)), we get z(L_2(MKP, λ)) ≥ z(C(MKP)), and thus this Lagrangian relaxation cannot produce a bound tighter than the continuous one.
Note that any of the polynomially obtainable upper bounds presented
in Section 2.1 can be used for the Knapsack Problems (162) and (164) to
obtain upper bounds for MKP in polynomial time.
The most natural polynomially computable upper bound for MKP is

    U_1 = ⌊z(C(MKP))⌋ = ⌊z(C(S(MKP, 1)))⌋ = ⌊z(C(L_1(MKP, λ̄)))⌋.    (170)

Martello and Toth [80] have shown that the worst-case performance of bound U_1 is ρ(U_1) = m + 1, where ρ(U) is defined by (13).
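Computing U_1 thus amounts to merging the capacities and evaluating the Dantzig bound of the resulting 0-1 Knapsack Problem, as in the following sketch (names are ours).

    def mkp_u1(p, w, caps):
        c = sum(caps)                            # capacity of the united knapsacks
        order = sorted(range(len(p)), key=lambda j: -p[j] / w[j])
        profit, weight = 0, 0
        for j in order:
            if weight + w[j] <= c:
                profit += p[j]
                weight += w[j]
            else:                                # break item: fractional part
                return profit + ((c - weight) * p[j]) // w[j]
        return profit                            # all items fit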

7.2 Tightening Constraints


Polyhedral properties may be used to tighten the upper bound obtained by
any of the above relaxations. Pisinger [101] used the following technique to
tighten the constraints:

Considering the capacity constraints in (160) for each knapsack separately, the largest possible filling c̃_i of knapsack i is found by solving the following Subset-sum Problem:

    c̃_i = max{ Σ_{j=1}^n w_j x_{ij} :  Σ_{j=1}^n w_j x_{ij} ≤ c_i;  x_{ij} ∈ {0, 1}, j = 1,...,n }.    (171)

As no optimal solution x can have weight sum Σ_{j=1}^n w_j x_{ij} > c̃_i, we may tighten the constraints in (160) to

    Σ_{j=1}^n w_j x_{ij} ≤ c̃_i,   i = 1,...,m.    (172)

Since all the problems (171) are defined on the same weights, dynamic programming may be used to derive all capacities c̃_i in O(nc) time, where c = max_{i=1,...,m} c_i.
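A sketch of this tightening: a single boolean subset-sum table of size max_i c_i is filled once and then read off for each knapsack (the function name is ours).

    def tighten_capacities(w, caps):
        # One 0-1 subset-sum DP over the common weights, O(n * max(caps)).
        cmax = max(caps)
        reach = [False] * (cmax + 1)
        reach[0] = True
        for wj in w:
            for s in range(cmax, wj - 1, -1):    # downwards: each item once
                if reach[s - wj]:
                    reach[s] = True
        tight = []
        for ci in caps:
            s = ci
            while not reach[s]:                  # largest reachable sum <= ci
                s -= 1
            tight.append(s)
        return tight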
Within a branch-and-bound algorithm it is possible to use a tighter version of the above criterion, since only the variables not fixed by the enumeration need be considered in (171). The tightening yields tighter upper bounds, and it may also help in obtaining better lower bounds, as the capacities are more realistic. Pisinger [101] experimentally showed that the tightening is able to decrease the capacity by several hundreds of weight units for a small problem with n = 50, m = 5, w_j ∈ [10, 10000] and the so-called dissimilar capacities c_i given by (177).
Notice, for instance, that if knapsack i has been tightened to capacity c̃_i, then problem (164) with the multipliers λ̄_j given by (165) is bounded by z_i ≤ c̃_i·p_b/w_b, which is the continuous solution value of (164).
Other polyhedral properties of MKP are considered in Ferreira, Martin and Weismantel [33].

7.3 Reduction Algorithms


The size of a Multiple Knapsack Problem may be reduced by preprocessing as described in Section 2.3: Assume that the solution vector corresponding to the current lower bound z has been saved. For a given item j, let u_j^0 be any upper bound on (160) with the additional constraint Σ_{i=1}^m x_{ij} = 0. If u_j^0 < z + 1, then the constraint Σ_{i=1}^m x_{ij} = 1 may be added to the problem, i.e. in every improved solution to MKP, item j must be included in some knapsack. In a similar way, let u_j^1 be any upper bound on (160) with the additional constraint Σ_{i=1}^m x_{ij} = 1. If u_j^1 < z + 1, then x_{ij} may be fixed at 0 for i = 1,...,m; thus in any improved solution, item j cannot be included in any of the knapsacks. Notice that the latter test is able to fix all the variables x_{ij}, i = 1,...,m, at their optimal value, while the former only rules out one possibility out of m + 1. Pisinger [101] used the Dembo and Hammer [18] bound with respect to (162) for the reduction, getting an O(n) reduction procedure. Only the bounds u_j^1 were derived, attempting to exclude an item from all knapsacks. This reduction was performed at each branching node.
Ingargiola and Korsh [60] presented a reduction procedure based on dominance of items. An item j dominates another item k if p_j ≥ p_k and w_j ≤ w_k. Whenever an item j is excluded from a given knapsack i during the branching process, all items dominated by j may also be excluded from knapsack i. Similarly, whenever an item j is included in a knapsack, all items dominating j must be included in one of the knapsacks.

7.4 Heuristics and Approximate Algorithms


Since MKP is NP-hard in the strong sense, no fully polynomial approximation scheme can exist unless P = NP [40]. The same conclusion holds for MKP in its minimization form. Martello and Toth [80] moreover proved that MKP in the minimization form cannot have a polynomial-time approximation scheme. It is open whether the same holds for the maximization form, but currently no polynomial-time approximation scheme is known for MKP in the maximization form. For fixed m, however, Chandra, Hirschberg and Wong [12] showed that a fully polynomial approximation scheme exists for MKP in the maximization form.
Greedy algorithms are considered in [80]. The most natural one is to use the continuous solution of S(MKP, 1) to produce a feasible solution. Assume that the items are ordered according to

    p_1/w_1 ≥ p_2/w_2 ≥ ⋯ ≥ p_n/w_n,    (173)

and let b = min{h : Σ_{j=1}^h w_j > c} be the break item. For each knapsack i define the i-th break item b_i as b_i = min{h : Σ_{j=b_{i−1}}^h w_j > c_i}, where b_0 = 1. Then each knapsack i can be filled with the items b_{i−1},...,b_i − 1, and the objective value is z' = Σ_{j=1}^{b_m−1} p_j. The absolute error of z' is less than Σ_{j=b_m}^b p_j, where b − b_m < m. The relative error is however arbitrarily bad.
A polynomial-time approximation algorithm was proposed by Martello and Toth [76]. The items are assumed to be sorted as above, while the capacities are ordered c_1 ≤ c_2 ≤ ⋯ ≤ c_m. An initial feasible solution is determined by calling the greedy algorithm described in Section 2.2 for the first knapsack, removing the chosen items, and then calling it for the next knapsack, until all m knapsacks have been considered. In the second phase a number of exchanges are performed: if two items assigned to different knapsacks can be interchanged such that one more item fits into one of the knapsacks, then this exchange is made. The resulting algorithm runs in O(n²) time. Computational results given in [80] show a good performance on randomly generated instances.

7.5 Dynamic Programming


The Multiple Knapsack Problem is NP-hard in the strong sense, meaning that we cannot expect to find pseudo-polynomial algorithms for the problem unless P = NP. However, if we only consider cases with a fixed number m of knapsacks, then the problem may in fact be solved in pseudo-polynomial time through dynamic programming. For instance, an MKP with two knapsacks (m = 2) may be solved in O(n·c_1·c_2) time as follows: Let f_k(c̄_1, c̄_2) be the optimal solution value of the problem defined on the first k items, with capacities c̄_1, c̄_2, i.e.

    f_k(c̄_1, c̄_2) = max{ Σ_{j=1}^k p_j (x_{1j} + x_{2j}) :  Σ_{j=1}^k w_j x_{1j} ≤ c̄_1;
                         Σ_{j=1}^k w_j x_{2j} ≤ c̄_2;    (174)
                         x_{1j} + x_{2j} ≤ 1,  j = 1,...,k;
                         x_{1j}, x_{2j} ∈ {0, 1},  j = 1,...,k }.

The following recursion may be used to evaluate f_k:

    f_k(c̄_1, c̄_2) = max{ f_{k−1}(c̄_1, c̄_2),  f_{k−1}(c̄_1 − w_k, c̄_2) + p_k,  f_{k−1}(c̄_1, c̄_2 − w_k) + p_k },    (175)

with initial values f_0(c̄_1, c̄_2) = 0 for all c̄_1, c̄_2, and with f_k(c̄_1, c̄_2) = −∞ if c̄_1 < 0 or c̄_2 < 0. An optimal solution is found as f_n(c_1, c_2) by evaluating f_k(·,·) for all values of k = 1,...,n, yielding the complexity O(n·c_1·c_2). The above may be generalized to an MKP with m knapsacks, where the time bound becomes O(n·m·Π_{i=1}^m c_i). Since this approach is not attractive for large values of m, most of the literature has focused on branch-and-bound techniques, although Fischetti and Toth [35] used some kind of dominance tests to speed up the solution process.
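For illustration, recursion (175) could be implemented as in the following sketch (names are ours).

    def mkp2_dp(p, w, c1, c2):
        # Recursion (175) for MKP with two knapsacks, O(n * c1 * c2) time.
        f = [[0] * (c2 + 1) for _ in range(c1 + 1)]
        for k in range(len(p)):
            g = [row[:] for row in f]                        # f_{k-1}
            for a in range(c1 + 1):
                for b in range(c2 + 1):
                    best = g[a][b]                           # item k unused
                    if a >= w[k]:
                        best = max(best, g[a - w[k]][b] + p[k])  # knapsack 1
                    if b >= w[k]:
                        best = max(best, g[a][b - w[k]] + p[k])  # knapsack 2
                    f[a][b] = best
        return f[c1][c2]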

7.6 Branch-and-bound Algorithms


Hung and Fisk [55] proposed a depth-first branch-and-bound algorithm where the upper bounds were derived by Lagrangian relaxation, and branching was performed on the item which in the relaxed problem had been selected in most knapsacks. Each branching item was alternately assigned to the knapsacks in increasing index order, where the knapsacks were ordered nonincreasingly, c_1 ≥ c_2 ≥ ⋯ ≥ c_m. When all the knapsacks had been considered, a last branch was considered where the item was excluded from the problem.
A different branch-and-bound algorithm was proposed by Martello and Toth [75], where at each decision node MKP was solved with the constraints Σ_{i=1}^m x_{ij} ≤ 1 omitted, and the branching item was chosen as an item which had been packed into k > 1 knapsacks in the relaxed problem. The branching operation generated k nodes by assigning the item to one of the corresponding k − 1 knapsacks or excluding it from all of these. A parametric technique was used to speed up the computation of the upper bound at each node.
In a later work, Martello and Toth [76] however focused on three aspects that make MKP difficult to solve:

• Generally it is difficult to verify feasibility of an upper bound obtained


either by surrogate relaxation or Lagrangian relaxation.

• The branch-and-bound algorithm needs good lower bounds for fath-


oming nodes in the enumeration.

• Some knowledge to guide the branching towards good feasible solutions


is needed.

In order to avoid these problems, Martello and Toth proposed a bound-and-


bound algorithm for MKP, where at each node of the branching tree not
only an upper bound, but also a lower bound is derived. This technique
is well-suited for problems where it is easy to find a fast heuristic solution
which yields good lower bounds, and where it is difficult to verify feasibility
of the upper bound.

7.6.1 The mtm Algorithm


The mtm bound-and-bound algorithm derives upper bounds by solving the surrogate relaxed problem (162), while lower bounds are found by solving m individual 0-1 Knapsack Problems as follows: The first knapsack i = 1 is filled optimally, the chosen variables are removed from the problem, and then the next knapsack i = 2 may be filled. This process is continued until all m knapsacks have been filled. The branching scheme follows this greedy solution, as a greedy solution should be better at guiding the branching process than individual choices at each branching node. Thus at each node two branching nodes are generated: the first one assigns the next item j of the greedy solution to the chosen knapsack i, while the other branch excludes item j from knapsack i.
Martello and Toth assume that the capacities are ordered nondecreasingly, c_1 ≤ c_2 ≤ ⋯ ≤ c_m, while the items are ordered according to nonincreasing profit-to-weight ratios (173), so every branching takes place on the item with the largest profit-to-weight ratio. At any stage, the knapsacks up to k have to be filled before knapsack k + 1 is considered. The current solution vector is x, while the optimal solution vector is denoted x*.
At any stage the list T = {(i_1, j_1),...,(i_d, j_d)} contains the indices of the variables x_{ij} which have been fixed at either 0 or 1 during the branching process. The profit sum of the currently fixed variables is P, while c̄_1,...,c̄_m refer to the current residual capacities.

algorithm mtm(k, P, c̄_1,...,c̄_m);

Find an upper bound u by solving S(MKP, 1) defined on the free items and capacity c = Σ_{i=1}^m c̄_i.
Find a lower bound l and the corresponding greedy solution y.
if (P + l > z) then x*_{ij} ← x_{ij} for (i,j) ∈ T and x*_{ij} ← y_{ij} for (i,j) ∉ T; set z ← P + l.
if (P + u > z) then
    Choose the first item j with y_{kj} = 1 which has not been excluded from knapsack k, i.e. (k,j) ∉ T.
    If no such item exists, increase k and repeat the search.
    Set T ← T ∪ {(k,j)};
    Set x_{kj} ← 1; {assign item j to knapsack k}
    mtm(k, P + p_j, c̄_1,...,c̄_k − w_j,...,c̄_m);
    Set x_{kj} ← 0; {exclude item j from knapsack k}
    mtm(k, P, c̄_1,...,c̄_m);
    Set T ← T \ {(k,j)};

Initially the variables x_{ij}, x*_{ij} are set to 0, and the lower bound is set to z = 0. Then the recursion mtm(1, 0, c_1,...,c_m) is called. Martello and Toth used the mt1 algorithm to solve S(MKP, 1) in the first part of the algorithm.

Since every forward move setting x_{kj} = 1 follows the greedy solution y, the lower bound l and the solution vector y do not change by this branching. Thus l and y need only be determined after branches of the form x_{kj} = 0.

7.6.2 The mulknap Algorithm


Pisinger [101] noted that a solution to S(MKP, 1) may be validated by solving a series of Subset-sum Problems in which the chosen items are attempted distributed among the m knapsacks. If this attempt succeeds, the lower bound equals the upper bound, and a backtracking occurs. Otherwise a feasible solution has been obtained which contains some (but not necessarily all) of the items selected by S(MKP, 1). Thus the algorithm is merely an ordinary branch-and-bound approach, since the upper bounds yield feasible solutions. Pisinger also used Subset-sum Problems for tightening the capacity constraints corresponding to each knapsack, as described in Section 7.2.
The branching scheme is based on a binary splitting where an item j is either assigned to knapsack i or excluded from the knapsack. The knapsacks are ordered nondecreasingly, c_1 ≤ c_2 ≤ ⋯ ≤ c_m, and the smallest knapsack is filled completely before the next knapsack is started. At any stage, the items j ≤ h have been fixed by the branching process, so only items j > h are considered when upper and lower bounds are determined. To keep track of which items are excluded from some knapsacks, a variable d_j for each item j indicates that the item may only be assigned to knapsacks i ≥ d_j. The current solution vector is x, while P is the profit sum of the currently fixed items.

algorithm mulknap(h, P, c_1,...,c_m);

Tighten the capacities c̃_i by solving m Subset-sum Problems defined on the items h + 1,...,n.
Solve the surrogate relaxed problem with capacity c = Σ_{i=1}^m c̃_i. Let y be the solution to this problem, with objective value u.
if (P + u > z) then
    Split the solution y into the m knapsacks by solving a series of Subset-sum Problems defined on the items with y_j = 1. Let y_{ij} be the optimal filling of c̃_i, with corresponding profit sum z_i.
    Improve the heuristic solution by greedily filling the knapsacks with Σ_{j=h+1}^n w_j y_{ij} < c̃_i.
    if (P + Σ_{i=1}^m z_i > z) then x*_{ij} ← y_{ij} for j > h and x*_{ij} ← x_{ij} for j ≤ h;
        set z ← P + Σ_{i=1}^m z_i.
if (P + u > z) then
    Reduce the items as described in Section 7.3, and swap the reduced items to the first positions, increasing h.
    Let i be the smallest knapsack with c̃_i > 0. Solve an ordinary 0-1 Knapsack Problem with c = c̃_i defined on the free variables; the solution vector is y. Choose the branching item k as the item with the largest profit-to-weight ratio among the items with y_j = 1.
    Swap k to position h + 1 and set j ← h + 1.
    Set x_{ij} ← 1; {assign item j to knapsack i}
    mulknap(h + 1, P + p_j, c_1,...,c̃_i − w_j,...,c_m);
    Set x_{ij} ← 0; d' ← d_j; d_j ← i + 1; {exclude item j from knapsack i}
    mulknap(h, P, c_1,...,c_m);
    Find j again, and set d_j ← d'.

The surrogate relaxed problem is solved using the minknap algorithm described in Section 2.6, and the Subset-sum Problems are solved by the decomp algorithm described in Section 3.2.1. The algorithm does not demand any special ordering of the variables, as the minknap algorithm performs the necessary ordering itself. This however implies that the items are permuted at each call to minknap, so the last line of mulknap cannot assume that item j is at the same position as before.
The main algorithm thus only has to order the capacities nondecreasingly, set z = 0, and initialize d_j = 1 and x_{ij} = x*_{ij} = 0 for all i, j, before calling mulknap(0, 0, c_1,...,c_m).

7.7 Computational Experiments


We will compare the performance of the mtm algorithm with that of the mulknap algorithm. Four different types of randomly generated data instances are considered for the ranges R = 100, 1000 and 10000. Uncorrelated data instances: p_j and w_j are randomly distributed in [1, R]. Weakly correlated data instances: w_j randomly distributed in [1, R] and p_j randomly distributed in [w_j − R/10, w_j + R/10] such that p_j ≥ 1. Strongly correlated data instances: w_j randomly distributed in [1, R] and p_j = w_j + 10. Subset-sum data instances: w_j randomly distributed in [1, R] and p_j = w_j.
Table 11: Total computing time mulknap, small problems with m = 5.

        Uncorrelated        Weakly corr.        Strongly corr.      Subset-sum
 n\R   100  1000 10000    100  1000 10000    100  1000 10000    100  1000 10000
  25  0.57  0.70  0.44   5.02 15.29  9.49   0.10  2.45  3.85   0.00  1.82 13.16
  50  0.00  0.00  0.02   0.00  0.00 15.67   0.00  0.01  0.12   0.00  0.01  0.10
  75  0.00  0.00  0.01   0.00  0.00  0.01   0.01  0.02  0.22   0.00  0.01  0.04
 100  0.00  0.00  0.00   0.00  0.00  0.01   0.00  0.03  0.24   0.00  0.01  0.04
 200  0.00  0.00  0.00   0.00  0.00  0.01   0.01  0.11  1.17   0.00  0.01  0.03
  25  0.01  0.12  0.03   0.20  0.21  0.87   0.00  0.13  0.47   0.00  0.12  1.29
  50  0.00  0.00  0.01   0.00  0.34  2.80   0.00  0.02  0.69   0.00  0.01  0.38
  75  0.00  0.00  0.01   0.00  0.01  0.10   0.01  0.08  1.70   0.00  0.02  0.28
 100  0.00  0.00  0.01   0.01  0.03  0.25   0.01  0.03  0.44   0.00  0.01  0.16
 200  0.00  0.00  0.01   0.00  0.01  0.02   0.02    ?     ?    0.00  0.01  0.12
 (? marks entries that are illegible in the source)

Table 12: Total computing time mtm, small problems with m = 5.

        Uncorrelated          Weakly corr.            Strongly corr.         Subset-sum
 n\R   100  1000 10000     100   1000  10000      100    1000  10000     100  1000  10000
  25  0.34  0.80  0.75    3.96   7.87   4.42     0.37    9.74  29.12    0.00 19.64 111.36
  50  0.06  2.13  6.21    0.63  62.06 181.67     3.02    6.68 522.53    0.00  0.01   0.06
  75  0.05  1.19 13.38    0.41  21.04 123.00      -    126.91 322.34    0.00  0.00   0.05
 100  0.05  1.27 10.35    0.11  21.36 242.46      -    623.90    -      0.00  0.00   0.05
 200  0.03  1.26  5.81    0.02  16.10 278.77      -       -      -      0.00  0.01   0.05
  25  0.02  0.04  0.06    0.17   0.14   0.80     0.03    0.36   1.49    0.00  0.67   5.82
  50  0.02  0.13  1.06    0.18   2.82  22.60     0.16    2.58 151.76    0.00  0.28  14.47
  75  0.02  0.29  1.17    0.02   3.54  21.49     8.68  111.48 169.50    0.00  0.08   4.97
 100  0.01  0.23  2.63    0.03   4.17  21.71      -    197.25 212.37    0.00  0.02   1.22
 200  0.01  0.09  0.80    0.01  11.73 153.46      -       -      -      0.00  0.01   0.41

Martello and Toth [80] proposed considering two different classes of capacities as follows: Similar capacities have the first m − 1 capacities c_i randomly distributed in

    [0.4·Σ_{j=1}^n w_j/m,  0.6·Σ_{j=1}^n w_j/m]   for i = 1,...,m − 1,    (176)

while dissimilar capacities have c_i distributed in

    [0,  0.5·(Σ_{j=1}^n w_j − Σ_{k=1}^{i−1} c_k)]   for i = 1,...,m − 1.    (177)

The last capacity c_m is in both classes chosen as c_m = 0.5·Σ_{j=1}^n w_j − Σ_{i=1}^{m−1} c_i, to ensure that the sum of the capacities is half of the total weight sum. For each instance a check is made on whether the assumptions stated below (160) are respected, and a new instance is generated otherwise.
A maximum amount of 1 hour was given to each algorithm for solving the ten instances in each class, and a dash in the following tables indicates that the ten problems could not be solved within this time limit. The mtm algorithm is only designed for problems up to n = 1000 and cannot solve larger instances without a modification of the code; thus no tests with n > 1000 have been run with mtm. Small data instances are tested with m = 5 knapsacks, while large instances have m = 10, as none of the algorithms is able to solve problems with small values of n/m. In each of the following tables, instances with similar capacities are considered in the upper part of the table, while the lower part considers instances with dissimilar capacities.
Tables 11 and 12 compare the solution times of the two algorithms for small instances up to n = 200. Both algorithms solve uncorrelated and subset-sum type problems in reasonable time, while mtm has considerable problems with large-sized and large-ranged problems of weak and strong correlation. If the individual entries are compared, it is seen that mulknap generally has faster solution times than mtm, and the larger the problems become, the more efficient mulknap gets.

Table 13: Total computing time mulknap, large problems with m = 10.
Uncorrelated Weakly corr. Strongly corr. Subset-sum
n\R 100 1000 10000 100 1000 10000 100 1000 10000 100 1000 10000
100 0.00 0.01 0.03 0.00 0.01 0.04 0.01 0.04 0.32 0.00 0.01 0.11
300 0.00 0.01 0.02 0.01 0.01 0.03 0.02 0.23 1.75 0.01 0.01 0.08
1000 0.00 0.00 0.01 0.00 0.01 0.01 0.10 0.93 8.24 0.01 0.01 0.06
3000 0.01 0.01 0.01 0.01 0.01 0.03 0.23 3.04 29.60 0.01 0.01 0.06
10000 0.02 0.02 0.04 0.01 0.02 0.07 1.30 11.04 188.61 0.02 0.03 0.08
30000 0.05 0.05 0.10 0.05 0.06 0.09 4.63 48.79 684.22 0.06 0.07 0.14
100000 0.20 0.21 0.28 0.19 0.21 0.25 16.30 214.26 1623.33 0.24 0.25 0.36
100 0.04 0.01 0.87 0.02 0.62 27.88 0.01 2.51 16.10 0.00 0.39 10.35
300 0.01 0.01 0.07 0.01 0.02 0.29 0.03 4.00 4.99 0.01 0.25 0.47
1000 0.00 0.01 0.03 0.01 0.06 0.04 0.08 0.58 52.66 0.01 0.02 0.22
3000 0.01 0.01 0.03 0.01 0.01 0.11 0.21 2.48 31.08 0.01 0.02 0.10
10000 0.01 0.02 0.05 0.01 0.02 0.06 0.75 8.28 162.32 0.02 0.03 0.08
30000 0.04 0.04 0.10 0.04 0.04 0.08 3.23 38.55 477.18 0.05 0.07 0.12
100000 0.17 0.17 0.24 0.16 0.17 0.21 14.57 169.21 1287.07 0.20 0.21 0.30

Table 14: Total computing time mtm, large problems with m = 10.

         Uncorrelated           Weakly corr.            Strongly corr.     Subset-sum
 n\R    100   1000  10000     100    1000  10000     100   1000 10000    100  1000  10000
  100  1.78 337.84    -      54.25     -      -       -      -     -     0.00  0.01   0.08
  300  0.12  31.71 502.31     0.11  391.93    -       -      -     -     0.00  0.01   0.10
 1000  0.03  37.02 828.39     0.01  105.97    -       -      -     -     0.01  0.02   0.11
  100  0.18   1.20  18.80     0.74   56.63 696.98   542.83   -     -     0.00  2.04 515.92
  300  0.07   2.47  30.36     0.07  145.85    -       -      -     -     0.00  0.25   6.09
 1000  0.08   2.49  75.87     0.02   24.76    -       -      -     -     0.01  0.04   4.79

The same situation appears for the large instances with n ≥ 100 presented in Tables 13 and 14. Here mulknap is able to solve all of the large-sized instances, while mtm can only solve low-ranged problems. For very large instances with n ≥ 10000, mulknap is actually able to solve the problems in times comparable with the best solution times for the 0-1 Knapsack Problem. The computational results indicate that large Multiple Knapsack Problems, despite their NP-hardness, are generally as easy to solve as ordinary 0-1 Knapsack Problems. For n < 100 and m = 10, both algorithms however have problems in solving the instances, as the n/m ratio becomes too small.
The performance of mtm can be improved by using a more recent algorithm for solving the 0-1 KP. Using mthard instead of mt1 would especially improve the performance on strongly correlated problems.

8 Conclusion and Future Trends


Knapsack Problems is a field of research where theoretical results and computational experience go hand in hand. The present chapter has shown that recent techniques make it possible to solve several large-sized difficult problems within reasonable time. But the computational results also show that several instances remain impossible to solve and thus require new solution techniques to be developed. One could mention real-valued knapsack problems, quadratic knapsack problems, non-fill problems, and knapsack problems with large weights as some of the fields where more research is required.
Where the 80s were mainly concerned with the solution of large-sized
instances, the past decade has focused on developing algorithms which can
solve a large spectrum of problems. In this development, one may notice that
solving the core problem is not so attractive, since this technique is mainly
applicable to easy problems. The core problem is however an elegant way of
quickly obtaining a good lower bound, but for difficult instances it is neces-
sary to involve other techniques: Facets lead to tight bounds for some diffi-
cult problems and dynamic programming gives useful worst-case bounds. A
future research topic could be combining dynamic programming algorithms
with the tight bounds developed for branch-and-bound algorithms, in or-
der to derive effective algorithms with worst-case complexity given by the
dynamic programming recursion.
It is well-known that the Bellman recursion for the 0-1 knapsack problem
has some overhead: the problem is not only solved for the given capacity c,
but for all capacities ĉ ≤ c. This indicates that the time bound O(nc) of
the Bellman recursion is not tight. One attempt to derive faster recursions
is balancing, but we may see algorithms based on other properties, which
obtain tighter worst-case time bounds.
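
To make the overhead concrete, the following is a small Python sketch of
the Bellman recursion (our own illustration, not code from the works
discussed here); p[j] and w[j] denote the profit and weight of item j.

    def bellman(p, w, c):
        # z[cap] = optimal profit of the 0-1 knapsack with capacity cap
        z = [0] * (c + 1)
        for pj, wj in zip(p, w):
            # traverse capacities downwards so each item is used at most once
            for cap in range(c, wj - 1, -1):
                z[cap] = max(z[cap], z[cap - wj] + pj)
        return z[c]   # optimal value, computed in time O(nc)

The table z is exactly the source of the overhead: the recursion delivers the
optimal solution value for every capacity ĉ ≤ c, although only z[c] is needed.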
Knapsack Problems have recently been applied [27] with great success
to tighten the formulation of difficult real-life combinatorial problems. The
problems that appear in this context are however not pure Knapsack Problems,
but Knapsack Problems with some additional constraints like cardinality
bounds, incompatibility constraints, clique constraints, etc. Thus it
is important to find effective solution techniques for these combined
problems.
When Knapsack Problems appear as subproblems in more general
combinatorial problems, it is not unusual that several Knapsack Problems are
solved at every branching node of the master problem. It is however often
the case that the knapsack instances solved are quite similar, e.g. they differ
only by a single item which has been removed or inserted. This motivates
the development of dynamic knapsack algorithms which, by maintaining some
extra information about the previous problems, are able to solve the modified
problem faster than if one started from scratch. Dynamic programming is an
obvious technique to apply for such algorithms, but dynamic data structures
may also be effective tools.

References
[1] J. H. Ahrens and G. Finke (1975), "Merging and Sorting Applied to the Zero-One Knapsack Problem", Operations Research, 23, 1099-1109.

[2] L. Aittoniemi (1982), "Computational comparison of knapsack algorithms", presented at the XIth International Symposium on Mathematical Programming, Bonn, August 23-27.

[3] G. d'Atri and C. Puech (1982), "Probabilistic analysis of the subset-sum problem", Discrete Applied Mathematics, 4, 329-334.

[4] D. Avis (1980), Theorem 4 in V. Chvátal, "Hard knapsack problems", Operations Research, 28, 1410-1411.

[5] E. Balas (1975), "Facets of the Knapsack Polytope", Mathematical Programming, 8, 146-164.

[6] E. Balas and E. Zemel (1980), "An Algorithm for Large Zero-One Knapsack Problems", Operations Research, 28, 1130-1154.

[7] P. Barcia and K. Jørnsten (1990), "Improved Lagrangean decomposition: An application to the generalized assignment problem", European Journal of Operational Research, 46, 84-92.

[8] R. E. Bellman (1957), Dynamic Programming, Princeton University Press, Princeton, NJ.

[9] R. L. Bulfin, R. G. Parker and C. M. Shetty (1979), "Computational results with a branch and bound algorithm for the general knapsack problem", Naval Research Logistics Quarterly, 26, 41-46.

[10] R. E. Burkard and U. Pferschy (1995), "The Inverse-parametric Knapsack Problem", European Journal of Operational Research, 83, 376-393.

[11] A. Caprara, D. Pisinger and P. Toth (1997), "Exact Solution of Large Scale Quadratic Knapsack Problems", Abstracts ISMP'97, EPFL, Lausanne, 24-29 August 1997.

[12] A. K. Chandra, D. S. Hirschberg and C. K. Wong (1976), "Approximate algorithms for some generalized knapsack problems", Theoretical Computer Science, 3, 293-304.

[13] N. Christofides, A. Mingozzi and P. Toth (1979), "Loading Problems". In N. Christofides, A. Mingozzi, P. Toth, C. Sandi (eds.), Combinatorial Optimization, Wiley, Chichester, 339-369.

[14] V. Chvátal (1980), "Hard Knapsack Problems", Operations Research, 28, 1402-1411.

[15] T. H. Cormen, C. E. Leiserson and R. L. Rivest (1990), Introduction to Algorithms, MIT Press, Massachusetts.

[16] H. Crowder, E. L. Johnson and M. W. Padberg (1983), "Solving large-scale zero-one linear programming problems", Operations Research, 31, 803-834.

[17] G. B. Dantzig (1957), "Discrete Variable Extremum Problems", Operations Research, 5, 266-277.

[18] R. S. Dembo and P. L. Hammer (1980), "A Reduction Algorithm for Knapsack Problems", Methods of Operations Research, 36, 49-60.
[19] B. L. Dietrich and L. F. Escudero (1989), "More coefficient reduction for knapsack-like constraints in 0-1 programs with variable upper bounds", IBM T. J. Watson Research Center, RC-14389, Yorktown Heights, NY.

[20] B. L. Dietrich and L. F. Escudero (1989), "New procedures for preprocessing 0-1 models with knapsack-like constraints and conjunctive and/or disjunctive variable upper bounds", IBM T. J. Watson Research Center, RC-14572, Yorktown Heights, NY.

[21] W. Diffie and M. E. Hellman (1976), "New directions in cryptography", IEEE Transactions on Information Theory, IT-22, 644-654.

[22] K. Dudzinski (1991), "A note on dominance relations in unbounded knapsack problems", Operations Research Letters, 10, 417-419.

[23] K. Dudzinski and S. Walukiewicz (1984), "A fast algorithm for the linear multiple-choice knapsack problem", Operations Research Letters, 3, 205-209.

[24] K. Dudzinski and S. Walukiewicz (1987), "Exact Methods for the Knapsack Problem and its Generalizations", European Journal of Operational Research, 28, 3-21.

[25] M. E. Dyer (1984), "An O(n) algorithm for the multiple-choice knapsack linear program", Mathematical Programming, 29, 57-63.

[26] M. E. Dyer, N. Kayal and J. Walker (1984), "A branch and bound algorithm for solving the multiple choice knapsack problem", Journal of Computational and Applied Mathematics, 11, 231-249.

[27] L. F. Escudero, S. Martello and P. Toth (1995), "A framework for tightening 0-1 programs based on an extension of pure 0-1 KP and SS problems". In E. Balas and J. Clausen (eds.), Integer Programming and Combinatorial Optimization, Fourth IPCO Conference, Lecture Notes in Computer Science, 920, 110-123.

[28] D. Fayard and G. Plateau (1977), "Reduction algorithm for single and multiple constraints 0-1 linear programming problems", Conference on Methods of Mathematical Programming, Zakopane, Poland.

[29] D. Fayard and G. Plateau (1982), "An Algorithm for the Solution of the 0-1 Knapsack Problem", Computing, 28, 269-287.

[30] D. Fayard and G. Plateau (1994), "An exact algorithm for the 0-1 collapsing knapsack problem", Discrete Applied Mathematics, 49, 175-187.

[31] C. E. Ferreira, M. Grötschel, S. Kiefl, C. Krispenz, A. Martin and R. Weismantel (1993), "Some integer programs arising in the design of mainframe computers", ZOR, 38, 77-100.

[32] C. E. Ferreira, A. Martin, C. de Souza, R. Weismantel and L. Wolsey (1994), "Formulations and valid inequalities for the node capacitated graph partitioning problem", CORE discussion paper, 9437, Université Catholique de Louvain.

[33] C. E. Ferreira, A. Martin and R. Weismantel (1996), "Solving multiple knapsack problems by cutting planes", SIAM Journal on Optimization, 6, 858-877.

[34] M. Fischetti and S. Martello (1988), "A hybrid algorithm for finding the kth smallest of n elements in O(n) time". In B. Simeone, P. Toth, G. Gallo, F. Maffioli, S. Pallottino (eds.), Fortran Codes for Network Optimization, Annals of Operations Research, 13, 401-419.

[35] M. Fischetti and P. Toth (1988), "A new dominance procedure for combinatorial optimization problems", Operations Research Letters, 7, 181-187.

[36] M. L. Fisher (1981), "The Lagrangian Relaxation Method for Solving Integer Programming Problems", Management Science, 27, 1-18.

[37] A. Fréville and G. Plateau (1993), "An exact search for the solution of the surrogate dual for the 0-1 bidimensional knapsack problem", European Journal of Operational Research, 68, 413-421.

[38] A. M. Frieze and M. R. B. Clarke (1984), "Approximation algorithms for the m-dimensional 0-1 knapsack problem: Worst-case and probabilistic analysis", European Journal of Operational Research, 15, 100-109.

[39] G. Gallo, P. L. Hammer and B. Simeone (1980), "Quadratic knapsack problems", Mathematical Programming, 12, 132-149.

[40] M. R. Garey and D. S. Johnson (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco.

[41] B. Gavish and H. Pirkul (1985), "Efficient algorithms for solving multiconstraint zero-one knapsack problems to optimality", Mathematical Programming, 31, 78-105.

[42] G. Gens and E. Levner (1994), "A fast approximation algorithm for the subset-sum problem", INFOR, 32, 143-148.

[43] P. C. Gilmore and R. E. Gomory (1965), "Multistage cutting stock problems of two and more dimensions", Operations Research, 13, 94-120.

[44] P. C. Gilmore and R. E. Gomory (1966), "The theory and computation of knapsack functions", Operations Research, 14, 1045-1074.

[45] F. Glover (1965), "A multiphase dual algorithm for the zero-one integer programming problem", Operations Research, 13, 879-919.

[46] A. V. Goldberg and A. Marchetti-Spaccamela (1984), "On finding the exact solution to a zero-one knapsack problem", Proc. 16th Annual ACM Symposium on Theory of Computing, 359-368.

[47] H. Greenberg (1985), "An algorithm for the periodic solutions in the knapsack problem", Journal of Mathematical Analysis and Applications, 111, 327-331.

[48] H. Greenberg (1986), "On equivalent knapsack problems", Discrete Applied Mathematics, 14, 263-268.

[49] H. Greenberg and I. Feldman (1980), "A better-step-off algorithm for the knapsack problem", Discrete Applied Mathematics, 2, 21-25.

[50] P. L. Hammer, E. L. Johnson and U. N. Peled (1975), "Facets of regular 0-1 polytopes", Mathematical Programming, 8, 179-206.

[51] D. S. Hirschberg and C. K. Wong (1976), "A polynomial time algorithm for the knapsack problem with two variables", Journal of ACM, 23, 147-154.

[52] D. S. Hochbaum (1995), "A Nonlinear Knapsack Problem", Operations Research Letters, 17, 103-110.

[53] K. L. Hoffman and M. Padberg (1993), "Solving airline crew-scheduling problems by branch and cut", Management Science, 39, 657-682.

[54] E. Horowitz and S. Sahni (1974), "Computing partitions with applications to the Knapsack Problem", Journal of ACM, 21, 277-292.

[55] M. S. Hung and J. C. Fisk (1978), "An algorithm for 0-1 multiple knapsack problems", Naval Research Logistics Quarterly, 24, 571-579.

[56] T. Ibaraki (1987), "Enumerative Approaches to Combinatorial Optimization - Part 1", Annals of Operations Research, 10.

[57] T. Ibaraki (1987), "Enumerative Approaches to Combinatorial Optimization - Part 2", Annals of Operations Research, 11.

[58] O. H. Ibarra and C. E. Kim (1975), "Fast approximation algorithms for the knapsack and sum of subset problems", Journal of ACM, 22, 463-468.

[59] G. P. Ingargiola and J. F. Korsh (1973), "A Reduction Algorithm for Zero-One Single Knapsack Problems", Management Science, 20, 460-463.

[60] G. P. Ingargiola and J. F. Korsh (1975), "An algorithm for the solution of 0-1 loading problems", Operations Research, 23, 752-759.

[61] G. P. Ingargiola and J. F. Korsh (1977), "A general algorithm for the one-dimensional knapsack problem", Operations Research, 25, 752-759.

[62] R. G. Jeroslow (1974), "Trivial Integer Programs Unsolvable by Branch-and-Bound", Mathematical Programming, 6, 105-109.

[63] E. L. Johnson and M. W. Padberg (1981), "A note on the knapsack problem with special ordered sets", Operations Research Letters, 1, 18-22.

[64] R. Kannan (1980), "A polynomial algorithm for the two-variable integer programming problem", Journal of ACM, 27, 118-122.

[65] G. A. P. Kindervater and J. K. Lenstra (1986), "An introduction to parallelism in combinatorial optimization", Discrete Applied Mathematics, 14, 135-156.

[66] D. E. Knuth (1973), The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, MA.

[67] P. J. Kolesar (1967), "A branch and bound algorithm for the knapsack problem", Management Science, 13, 723-735.

[68] D. Krass, S. P. Sethi and G. Sorger (1994), "Some complexity issues in a class of knapsack problems: What makes a Knapsack Problem 'hard'?", INFOR, 32, 149-162.

[69] J. C. Lagarias and A. M. Odlyzko (1983), "Solving low-density subset sum problems", Proc. 24th Annual Symposium on Foundations of Computer Science, Tucson, Arizona, 7-9 November 1983, 1-10.

[70] G. Laporte (1992), "The Vehicle Routing Problem: An overview of exact and approximate algorithms", European Journal of Operational Research, 59, 345-358.

[71] G. S. Lueker (1975), "Two NP-complete problems in nonnegative integer programming", Report No. 178, Computer Science Laboratory, Princeton University, Princeton, NJ.

[72] S. Martello and P. Toth (1977), "An Upper Bound for the Zero-One Knapsack Problem and a Branch and Bound Algorithm", European Journal of Operational Research, 1, 169-175.

[73] S. Martello and P. Toth (1977), "Branch and bound algorithms for the solution of general unidimensional knapsack problems". In M. Roubens (ed.), Advances in Operations Research, North-Holland, Amsterdam, 295-301.

[74] S. Martello and P. Toth (1979), "The 0-1 knapsack problem". In N. Christofides, A. Mingozzi, P. Toth, C. Sandi (eds.), Combinatorial Optimization, Wiley, Chichester, 237-279.

[75] S. Martello and P. Toth (1980), "Solution of the zero-one multiple knapsack problem", European Journal of Operational Research, 4, 276-283.

[76] S. Martello and P. Toth (1981), "A bound and bound algorithm for the zero-one multiple knapsack problem", Discrete Applied Mathematics, 3, 275-288.

[77] S. Martello and P. Toth (1984), "A mixture of dynamic programming and branch-and-bound for the subset-sum problem", Management Science, 30, 765-771.

[78] S. Martello and P. Toth (1987), "Algorithms for Knapsack Problems". In S. Martello, G. Laporte, M. Minoux and C. Ribeiro (eds.), Surveys in Combinatorial Optimization, Annals of Discrete Mathematics, 31, North-Holland, Amsterdam, 213-257.

[79] S. Martello and P. Toth (1988), "A New Algorithm for the 0-1 Knapsack Problem", Management Science, 34, 633-644.

[80] S. Martello and P. Toth (1990), Knapsack Problems: Algorithms and Computer Implementations, Wiley, Chichester, England.

[81] S. Martello and P. Toth (1990), "An exact algorithm for large unbounded knapsack problems", Operations Research Letters, 9, 15-20.

[82] S. Martello and P. Toth (1995), "The bottleneck generalized assignment problem", European Journal of Operational Research, 83, 621-638.

[83] S. Martello and P. Toth (1997), "Upper Bounds and Algorithms for Hard 0-1 Knapsack Problems", Operations Research, 45, 768-778.

[84] G. B. Mathews (1897), "On the Partition of Numbers", Proc. of the London Mathematical Society, 28, 486-490.

[85] T. L. Morin and R. E. Marsten (1976), "An algorithm for nonlinear knapsack problems", Management Science, 22, 1147-1158.

[86] H. Müller-Merbach (1979), "Improved upper bound for the zero-one knapsack problem. A note on the paper by Martello and Toth", European Journal of Operational Research, 2, 212-213.

[87] J. I. Munro and R. J. Ramirez (1982), "Reducing Space Requirements for Shortest Path Problems", Operations Research, 30, 1009-1013.

[88] R. M. Nauss (1976), "An Efficient Algorithm for the 0-1 Knapsack Problem", Management Science, 23, 27-31.

[89] R. M. Nauss (1978), "The 0-1 knapsack problem with multiple choice constraint", European Journal of Operational Research, 2, 125-131.

[90] A. Neebe and D. Dannenbring (1977), "Algorithms for a specialized segregated storage problem", University of North Carolina, Technical Report 77-5.

[91] G. L. Nemhauser and Z. Ullmann (1969), "Discrete dynamic programming and capital allocation", Management Science, 15, 494-505.

[92] G. L. Nemhauser and L. A. Wolsey (1988), Integer and Combinatorial Optimization, Wiley, Chichester.

[93] M. W. Padberg (1975), "A note on zero-one programming", Operations Research, 23, 833-837.

[94] C. H. Papadimitriou and K. Steiglitz (1982), Combinatorial Optimization: Algorithms and Complexity, Prentice Hall, Englewood Cliffs, New Jersey.

[95] U. Pferschy, D. Pisinger and G. J. Woeginger (1997), "Simple but Efficient Approaches for the Collapsing Knapsack Problem", Discrete Applied Mathematics, 77, 271-280.

[96] D. Pisinger (1995), "An expanding-core algorithm for the exact 0-1 knapsack problem", European Journal of Operational Research, 87, 175-187.

[97] D. Pisinger (1994), "Core Problems in Knapsack Algorithms", DIKU, University of Copenhagen, Denmark, Report 94/26. Submitted to Operations Research, conditionally accepted.

[98] D. Pisinger (1994), "Dominance Relations in Unbounded Knapsack Problems", DIKU, University of Copenhagen, Denmark, Report 94/33. Submitted to European Journal of Operational Research.

[99] D. Pisinger (1995), "An O(nr) Algorithm for the Subset Sum Problem", DIKU, University of Copenhagen, Denmark, Report 95/6.

[100] D. Pisinger (1995), "A minimal algorithm for the Bounded Knapsack Problem". In E. Balas and J. Clausen (eds.), Integer Programming and Combinatorial Optimization, Fourth IPCO Conference, Lecture Notes in Computer Science, 920, 95-109.

[101] D. Pisinger (1995), "The Multiple Loading Problem", Proc. NOAS'95, University of Reykjavik, 18-19 August 1995. Submitted to European Journal of Operational Research.

[102] D. Pisinger (1995), "A minimal algorithm for the Multiple-choice Knapsack Problem", European Journal of Operational Research, 83, 394-410.

[103] D. Pisinger (1996), "Strongly correlated knapsack problems are trivial to solve", Proc. CO96, Imperial College of Science, Technology and Medicine, London, 27-29 March 1996. Submitted to Discrete Applied Mathematics.

[104] D. Pisinger (1996), "The Bounded Multiple-choice Knapsack Problem", Proc. AIRO'96, Perugia, 16-20 September 1996, 363-365.

[105] D. Pisinger (1997), "A minimal algorithm for the 0-1 knapsack problem", Operations Research, 45, 758-767.

[106] G. Plateau and M. Elkihel (1985), "A hybrid method for the 0-1 knapsack problem", Methods of Operations Research, 49, 277-293.

[107] S. A. Plotkin, D. B. Shmoys and É. Tardos (1991), "Fast approximation algorithms for fractional packing and covering problems", Proc. 32nd Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, 1-4 October 1991, 495-504.

[108] T. J. van Roy and L. A. Wolsey (1987), "Solving mixed integer programming problems using automatic reformulation", Operations Research, 35, 45-57.

[109] A. Sinha and A. A. Zoltners (1979), "The multiple-choice knapsack problem", Operations Research, 27, 503-515.

[110] The Standard Performance Evaluation Corporation, http://www.specbench.org

[111] M. M. Syslo, N. Deo and J. S. Kowalik (1983), Discrete Optimization Algorithms, Prentice Hall, Englewood Cliffs, New Jersey.

[112] P. Toth (1980), "Dynamic programming algorithms for the zero-one knapsack problem", Computing, 25, 29-45.

[113] R. Weismantel (1995), "Knapsack Problems, Test Sets and Polyhedra", Habilitationsschrift, TU Berlin, June 1995.

[114] C. Witzgall (1977), "On One-Row Linear Programs", Applied Mathematics Division, National Bureau of Standards.

[115] L. A. Wolsey (1975), "Facets of linear inequalities in 0-1 variables", Mathematical Programming, 8, 165-178.

[116] E. Zemel (1980), "The linear multiple choice knapsack problem", Operations Research, 28, 1412-1423.

[117] E. Zemel (1984), "An O(n) algorithm for the linear multiple choice knapsack problem and related problems", Information Processing Letters, 18, 123-128.

[118] A. A. Zoltners (1978), "A direct descent binary knapsack algorithm", Journal of ACM, 25, 304-311.
HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 429-478
©1998 Kluwer Academic Publishers

Fractional Combinatorial Optimization

Tomasz Radzik
Department of Computer Science
King's College London, London WC2R 2LS, UK
E-mail: radzik@dcs.kcl.ac.uk

Contents
1 Introduction 430

2 Fractional Combinatorial Optimization - the General Case 431

3 The Newton method 437

4 The Newton Method for the Linear Case 441

5 Megiddo's Parametric Search 448

6 Maximum Profit-to-Time Ratio Cycles 458

7 Maximum Mean Cycles 460

8 Maximum Mean-Weight Cuts 462

9 Concluding Remarks 472

References
1 Introduction

An instance of a fractional combinatorial optimization problem F consists
of a specification of a set X ⊆ {0,1}^p, and two functions f : X → R and
g : X → R. The task is to

    F:  maximize  f(x)/g(x),  for x ∈ X.

We assume that f(x) > 0 for some x ∈ X, and g(x) > 0 for all x ∈ X.
The elements of the discrete domain X are often called structures, since in
concrete fractional combinatorial optimization problems they represent
combinatorial structures like cycles or spanning trees of a graph. Examples of
fractional combinatorial optimization include the minimum-ratio spanning-
tree problem [4], the maximum profit-to-time cycle problem [9, 13], the
maximum-mean cycle problem [28], the maximum mean-weight cut problem
[38], and the fractional 0-1 knapsack problem [24].
Fractional combinatorial optimization is a special case of (general)
fractional optimization: maximize f(x)/g(x) over a subset D of R^n. Fractional
optimization problems have been extensively studied and many algorithms
have been designed and analyzed (see, for example, surveys [41, 42, 43, 44]).
Computational methods designed for general fractional optimization can
usually be adapted to fractional combinatorial optimization, but the analysis
of a method for the combinatorial case may be considerably different
from the analysis of the same method for the general case. For example,
when general fractional optimization is considered, it is often assumed that
the domain D is a compact subset of R^n and the functions f and g are
continuous. Such assumptions clearly do not have equivalent counterparts in
fractional combinatorial optimization. Computational methods for general
fractional optimization converge to the optimum objective value, but they
cannot guarantee that this value is actually reached. In most cases, the
corresponding methods for fractional combinatorial optimization do produce
optimal solutions. Note that the domain X is finite, so an optimal solution
can be found in finite time by simply checking all structures x ∈ X.

This chapter is focused on two main methods for fractional combinatorial
optimization, the Newton method and Megiddo's parametric search method.
Both these methods, as well as most of the existing methods for general
fractional optimization, are based on a reduction from fractional optimization
to non-fractional, parametric optimization. The Newton method was
introduced for general fractional optimization by Dinkelbach [10], and there are
a number of results concerning the fast convergence of this method and its
variants (see for example [23, 35, 40]). The analyses of the Newton method
for fractional combinatorial optimization presented in this chapter are based
on [36] and [38]. Megiddo's parametric search method [31, 32] was designed
for the case of fractional combinatorial optimization when both functions
f and g are linear. This method yielded, for example, the first strongly
polynomial algorithm for the maximum profit-to-time cycle problem. A
strongly polynomial bound related to problem F is a bound which depends
polynomially on p and does not depend on any other input parameters.
This chapter is organized in the following way. In Section 2 we describe
basic properties of fractional combinatorial optimization, the reduction from
fractional optimization to parametric optimization, and the binary search
method. We also introduce in this section the maximum-ratio path problem
for acyclic graphs, which we later use to illustrate both the Newton method
and Megiddo's parametric search. In Section 3 we introduce the Newton
method for fractional combinatorial optimization and present some general
results about its convergence. In Section 4 we show that if both functions
f and g of problem F are linear, then the Newton method finds an optimal
solution in a strongly polynomial number of iterations, regardless of
the structure of the domain X. In Section 5 we discuss in depth Megiddo's
parametric search method. Sections 6 and 8 contain two case studies. In
Section 6 we show how Megiddo's parametric search yields fast algorithms
for the maximum profit-to-time ratio cycle problem. In Section 8 we present
an analysis of the Newton method for the maximum mean-weight cut problem,
which gives the best known bound on the computational complexity
of this problem. Most of the fastest known algorithms for fractional
combinatorial optimization problems are based on either the Newton method, or
Megiddo's parametric search, or the basic binary search method. The most
notable exception to this rule is Karp's algorithm for the maximum-mean
cycle problem [26]. We present this algorithm and its analysis in Section 7.
Section 9 contains a few final comments and suggestions for further research.

2 Fractional Combinatorial Optimization - the General Case

For x ∈ X, the numbers f(x), g(x), and f(x)/g(x) are called the cost, the
weight, and the mean-weight cost of structure x. Using this terminology,
problem F is to compute the maximum mean-weight cost of a structure in
domain X. We also want to find a structure which achieves this maximum
mean-weight cost.
Consider the following problem.

    P:  minimize δ ∈ R, subject to f(x) − δg(x) ≤ 0, for all x ∈ X.

A pair (δ*, x*) ∈ R × X is an optimal solution of problem P if and only if

    f(x) − δ*g(x) ≤ 0 = f(x*) − δ*g(x*),  for each x ∈ X.

This condition is equivalent to

    f(x)/g(x) ≤ δ* = f(x*)/g(x*),  for each x ∈ X,

which means that δ* is the optimum objective value and x* is an optimal
solution of problem F. Thus problems F and P are equivalent. Many
iterative methods for solving problem F generate and solve a sequence of
instances of the following problem, where δ ∈ R is an additional input
parameter.

    P(δ):  maximize  f(x) − δg(x),  for x ∈ X.

Problem P(δ) is called the parametric problem corresponding to problem F,
and sometimes also the non-fractional version of problem F. Let h(δ) and
x_δ denote the optimum objective value and an optimal solution of problem
P(δ). We have h(δ) = 0 if and only if (δ, x_δ) is an optimal solution of
problem P, that is, if and only if δ and x_δ are the optimum objective value
and an optimal solution of problem F. Hence we have another equivalent
formulation Z of problem F.

    Z:  find δ ∈ R such that h(δ) = 0, where

        h(δ) = max{f(x) − δg(x) : x ∈ X}.    (1)


This formulation suggests that one can try to design algorithms for problem
:F by applying classical methods for finding a root of a function. The fol-
lowing properties of function h can be easily obtained from the fact that h
is the maximum of a finite number of decreasing linear functions.

(i) function h is continuous on (-00, +00) and strictly decreasing from


+00 to -00.
Fractional Combinatorial Optimization 433

(ii) h(O) > 0 (this follows from the assumption that f(x) > 0, for some
XEX).
(iii) function h has exactly one root 6*, and 6* > o.
(iv) If 61 < 62 < ... < 6q denote all values of 6 for which two lines in
{f(x) - 6g(x) I x E X} intersect, then function h is linear on each
interval [-00,61], [6i' 6i+l] , for i = 1,2, ... , q - 1, and [6q ,00].
(v) function h is convex.
Some methods for solving problem F require a subroutine only for the
following weaker version of problem P(δ).

    P_0(δ):  find y ∈ X such that
             sign(f(y) − δg(y)) = sign(max{f(x) − δg(x) : x ∈ X}).

That is, problem P_0(δ) is only to find the sign of the optimum objective
value of problem P(δ). In our discussion of properties of computational
methods for problem F, we use the following parameters:

    MAX_f = max{f(x) : x ∈ X},
    MAX_g = max{g(x) : x ∈ X},
    MIN_g = min{g(x) : x ∈ X},
    GAP = min{ |f(x')/g(x') − f(x'')/g(x'')| : x', x'' ∈ X and f(x')/g(x') ≠ f(x'')/g(x'') }.

Our assumptions on functions f and g imply that the optimum objective
value of problem F is contained in the interval (0, MAX_f/MIN_g]. Parameter
GAP is the smallest difference between two different mean-weight costs.
If the weight function g counts the number of ones, then we have a
uniform fractional combinatorial optimization problem:

    F_U:  maximize  f(x_1, x_2, ..., x_p) / (x_1 + x_2 + ... + x_p),
          for (x_1, x_2, ..., x_p) ∈ X.

If both the cost function f and the weight function g are linear, then we
have a linear fractional combinatorial optimization problem:

    F_L:  maximize  (a_1 x_1 + a_2 x_2 + ... + a_p x_p) / (b_1 x_1 + b_2 x_2 + ... + b_p x_p),
          subject to (x_1, x_2, ..., x_p) ∈ X.
An instance of problem F_L consists of a specification of a set of structures
X ⊆ {0,1}^p and two real vectors a = (a_1, a_2, ..., a_p) and b = (b_1, b_2, ..., b_p).
Throughout this chapter, we denote the inner product c_1 z_1 + c_2 z_2 + ... + c_p z_p
of two vectors c = (c_1, c_2, ..., c_p) and z = (z_1, z_2, ..., z_p) by cz. Thus in
problem F_L, the cost, the weight, and the mean-weight cost of a structure x
are equal to ax, bx, and (ax)/(bx), respectively.

Maximum-ratio paths in acyclic graphs

As an example of fractional combinatorial optimization, we consider the
following MaxRatioPath problem. An input instance of this problem consists
of an integer n ≥ 1, a set of edges E of an acyclic directed graph with n
vertices 1, 2, ..., n, an edge-cost function c : E → R, and an edge-weight
function w : E → R. To keep this example simple, we make the following
assumptions: there are no multiple edges, the vertices are numbered
according to a topological sort of the graph (that is, if (v, u) ∈ E, then
v < u), and each vertex is reachable from vertex 1. For a path P, let
f(P) = Σ_{(v,u)∈P} c(v, u), g(P) = Σ_{(v,u)∈P} w(v, u), and let f(P)/g(P) be the
cost, the weight, and the mean-weight cost of this path. The task is to find
a path from vertex 1 to vertex n which has the maximum mean-weight cost.

    MaxRatioPath:  maximize  f(P)/g(P),
                   over all paths P ⊆ E from vertex 1 to vertex n.

This is a linear fractional combinatorial optimization problem. Using our
terminology of general fractional combinatorial optimization, the set of
structures X is here the set of characteristic vectors x ∈ {0,1}^E corresponding
to paths from vertex 1 to vertex n.
The parametric problem corresponding to the MaxRatioPath problem is

    MaxPath(δ):  maximize  f(P) − δg(P),
                 over all paths P ⊆ E from vertex 1 to vertex n.

For a fixed δ ∈ R and a path P, f(P) − δg(P) = Σ_{e∈P} (c(e) − δw(e)), so the
optimum value of problem MaxPath(δ) is the maximum cost of a path from
vertex 1 to vertex n according to the edge-cost function c − δw. An example
of an input instance of the MaxRatioPath problem and the corresponding
function h are shown in Figure 1.
[Figure 1: An input for the MaxRatioPath problem and the corresponding
function h(δ). Lines 9 − 6δ (path 1-2-3-4), 8 − 4δ (path 1-2-4), 6 − 2δ
(path 1-4), and 4 − δ (path 1-3-4) correspond to the four paths from
vertex 1 to vertex 4. The optimal path is 1-3-4.]
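
This example also illustrates formulation Z: h is the upper envelope of the
four lines, and its root is δ* = 4, since h(4) = max{9 − 24, 8 − 16, 6 − 8, 4 − 4}
= 0. The maximizing line 4 − δ corresponds to the optimal path 1-3-4, whose
mean-weight cost is 4/1 = 4.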

In an acyclic graph, the shortest paths, as well as the longest ones, can
be computed by considering the vertices according to a topological order [8,
Chapter 25]. Algorithm MAXCOST described below computes the maximum
cost of a path from vertex 1 to vertex n, for a set of edges E and an arbitrary
edge-cost function a, by considering the vertices from 1 to n − 1. While
considering vertex v, the algorithm examines all edges outgoing from v.
During the computation, the boolean variable seen[u] indicates whether
vertex u has already been encountered. At the end of the computation,
the number d[v], for each v = 1, 2, ..., n, is equal to the maximum cost of a
path from vertex 1 to vertex v. The predecessor pointers form a "longest-
paths" tree rooted at vertex 1.

MAXCOST(n, E, a)
1) d[1] ← 0; seen[1] ← true
2) for v ← 2 to n do seen[v] ← false
3) for v ← 1 to n − 1 do
4)    for each u such that (v, u) ∈ E do
5)       if (not seen[u]) or (d[v] + a(v, u) − d[u] > 0) then
6)          d[u] ← d[v] + a(v, u); predecessor[u] ← v; seen[u] ← true
7) let P = (v_1, v_2, ..., v_k) where v_1 = 1, v_k = n, and
      v_i = predecessor[v_{i+1}] for i = 1, 2, ..., k − 1
8) return d[n] and path P
The running time of algorithm MAXCOST is clearly O(m). Thus for each
δ ∈ R, problem MaxPath(δ) can be solved in O(m) time. We use the
MaxRatioPath problem later in this chapter to illustrate both the Newton
method and Megiddo's parametric search. Some details of algorithm
MAXCOST may look a bit awkward, but we choose this particular
presentation to simplify the explanation of Megiddo's parametric search method in
Section 5.
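
For concreteness, here is a direct Python transcription of MAXCOST (our
own code; the input format, a dict mapping each vertex v to a list of
(u, cost) pairs with v < u, is our choice and not from the chapter).

    def max_cost(n, edges):
        # vertices are 1..n, numbered topologically; every vertex is
        # assumed reachable from vertex 1, as in the chapter
        d = [None] * (n + 1)      # d[v] = maximum cost of a path 1 -> v
        pred = [None] * (n + 1)   # predecessor pointers ("longest-paths" tree)
        seen = [False] * (n + 1)
        d[1], seen[1] = 0.0, True
        for v in range(1, n):
            for (u, cost) in edges.get(v, []):
                if not seen[u] or d[v] + cost > d[u]:
                    d[u], pred[u], seen[u] = d[v] + cost, v, True
        path, u = [], n           # reconstruct the path 1 -> n
        while u is not None:
            path.append(u)
            u = pred[u]
        return d[n], path[::-1]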

The binary search method

To apply the binary search method to the problem of finding the root of
function h defined in (1), we need an algorithm A_0 which for a given δ ∈ R
solves problem P_0(δ). During the binary search, we maintain an interval
(α, β) containing the root δ* of function h, and a structure x_α such that
f(x_α) − αg(x_α) > 0. Structure x_α is a "witness" that δ* > α. During one
iteration of the binary search, algorithm A_0 is run for value δ = (α + β)/2.
If it returns a structure x such that f(x) − δg(x) = 0, then x is an optimal
solution of problem F, and the computation terminates. If the returned
structure x is such that f(x) − δg(x) > 0, then δ* must be in interval (δ, β),
so α is set to δ and x_α is set to x. Otherwise δ* is in interval (α, δ), so β is
set to δ.

The computation proceeds until an optimal solution has been found or
the desired precision has been reached. At the end of the current iteration,
α < f(x_α)/g(x_α) ≤ δ* < β, so δ* − f(x_α)/g(x_α) < β − α. Therefore,
after ⌈log((β_0 − α_0)/ε)⌉ iterations, δ* − f(x_α)/g(x_α) < ε, where (α_0, β_0) is
the initial interval (α, β). Throughout this chapter, the base of logarithms
is 2. If the computation is continued for a sufficient number of iterations,
it actually finds an optimal solution of problem F. Observe that when
δ* − f(x_α)/g(x_α), which is equal to f(x*)/g(x*) − f(x_α)/g(x_α) for an optimal
structure x*, eventually becomes less than GAP, then f(x_α)/g(x_α) = δ*,
so x_α is an optimal structure. This means that the binary search finds an
optimal solution of problem F in at most ⌈log((β_0 − α_0)/GAP)⌉ iterations.
The initial interval (α_0, β_0) can be (0, U/L), where U and L are an upper
bound on MAX_f and a positive lower bound on MIN_g, respectively. For
concrete fractional combinatorial optimization problems, such bounds are
usually readily available. If algorithm A_0 is somewhat weaker than we
assumed and finds only a structure x with nonnegative f(x) − δg(x), provided
that such a structure exists, then the binary search procedure described here
can be modified in a straightforward way.
[Figure 2: The Newton method for solving h(δ) = 0.]

As an example, consider the special case of the MaxRatioPath problem
when all edge costs and weights are integers not greater than U. For this
problem we have MAX_f ≤ nU, MIN_g ≥ 1, and GAP ≥ 1/(nU)^2, so the
binary search method yields an O(m log(nU))-time algorithm.
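
The following Python sketch shows the generic binary search (our own
illustration; solve_p0 stands for the assumed algorithm A_0, returning a
structure whose value f(x) − δg(x) has the same sign as the optimum of P(δ)).

    def binary_search(solve_p0, f, g, alpha, beta, gap, x_alpha):
        # (alpha, beta): interval known to contain delta*;
        # x_alpha: witness structure with f(x_alpha) - alpha*g(x_alpha) > 0;
        # gap: any positive lower bound on the parameter GAP.
        while beta - alpha >= gap:
            delta = (alpha + beta) / 2.0
            x = solve_p0(delta)
            val = f(x) - delta * g(x)
            if val == 0:
                return x                   # f(x)/g(x) = delta = delta*: optimal
            if val > 0:
                alpha, x_alpha = delta, x  # delta* lies in (delta, beta)
            else:
                beta = delta               # delta* lies in (alpha, delta)
        return x_alpha                     # beta - alpha < GAP, so x_alpha is optimal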

3 The Newton method

The Newton method for a fractional combinatorial optimization problem F,
also called the Dinkelbach method [10], is an application of the classical
Newton method to the problem of finding the root of function h defined
in (1). To use this method we need an algorithm A_max which for a given
δ ∈ R computes h(δ) and a structure x ∈ X such that f(x) − δg(x) = h(δ).
That is, we need an algorithm A_max which for a given δ ∈ R solves
problem P(δ).

The Newton method for problem F is an iterative process which in each
iteration generates a new, better lower estimate of the optimum objective
value δ*. During iteration i, the current estimate δ_i ≤ δ* is considered and
the following computation is performed. We first run algorithm A_max for
δ = δ_i and obtain h_i = h(δ_i) and a structure x_i ∈ X such that f(x_i) −
δ_i g(x_i) = h_i. If h_i = 0, then δ_i = δ* and x_i is an optimal solution of
problem F, so the computation terminates. Otherwise we compute the new
estimate δ_{i+1} ← f(x_i)/g(x_i) (observe that δ_i < δ_{i+1} ≤ δ*, since in this case
h_i > 0) and we proceed to the next iteration. This process is illustrated in
Figure 2. The computation begins with δ_1 = 0.
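
In code, one iteration of the method is only a few lines. The sketch below
is our own illustration (solve_p stands for the assumed algorithm A_max,
returning a structure maximizing f(x) − δg(x) over X); with exact arithmetic
the loop terminates by Lemma 3.1 below, while a floating-point
implementation would stop once h falls below a tolerance.

    def newton(solve_p, f, g):
        delta = 0.0                   # delta_1 = 0
        while True:
            x = solve_p(delta)        # optimal structure of problem P(delta)
            h = f(x) - delta * g(x)   # h(delta)
            if h == 0:
                return delta, x       # delta = delta*, x is optimal
            delta = f(x) / g(x)       # new, larger estimate delta_{i+1}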
Our aim is to derive bounds on the number of iterations. Let t be the
index of the last iteration, or +∞ if the computation does not terminate,
and let f_i = f(x_i) and g_i = g(x_i), for each 1 ≤ i < t + 1. From the
description of the computation, we have

    h_i = f_i − δ_i g_i,    for each 1 ≤ i < t + 1,    (2)
    δ_{i+1} = f_i / g_i,    for each 1 ≤ i < t.        (3)

Lemma 3.1 The Newton method terminates in a finite number of iterations
and
(A) h_1 > h_2 > ... > h_{t−1} > h_t = 0,
(B) 0 = δ_1 < δ_2 < ... < δ_{t−1} < δ_t = δ*,
(C) g_1 > g_2 > ... > g_{t−1} ≥ g_t.
Proof. This lemma follows immediately from the following claim. For each
1 ≤ i < t + 1,
(a) h_1 > h_2 > ... > h_{i−1} > h_i ≥ 0,
(b) 0 = δ_1 < δ_2 < ... < δ_{i−1} < δ_i ≤ δ*,
(c) g_1 > g_2 > ... > g_{i−1} ≥ g_i,
and the equalities at the end of these three chains are possible only if
iteration i is the last iteration in the whole computation. We prove this claim
by induction on i. For i = 1 the claim follows from properties (ii)
and (iii) of function h. Assume now that the claim is true for i = j, for
some 1 ≤ j < t. Using (2) and (3), we have

    δ_{j+1} = f_j/g_j = δ_j + h_j/g_j > δ_j,    and    δ_{j+1} = f_j/g_j ≤ δ*,

so (b) is true for i = j + 1. Since δ_j < δ_{j+1} ≤ δ*, the monotonicity of
function h implies that h_{j+1} = h(δ_{j+1}) < h(δ_j) = h_j, and h_{j+1} = h(δ_{j+1}) ≥
h(δ*) = 0, so (a) is true for i = j + 1. To show that also (c) is true for
i = j + 1, we use the following two inequalities:

    f_{j+1} − δ_j g_{j+1} ≤ max{f(x) − δ_j g(x) : x ∈ X} = h_j = f_j − δ_j g_j,

    f_{j+1} − δ_{j+1} g_{j+1} = h_{j+1} = max{f(x) − δ_{j+1} g(x) : x ∈ X} ≥ f_j − δ_{j+1} g_j.

Subtracting these two inequalities, we obtain g_{j+1} ≤ g_j, and the equality is
possible only if h_{j+1} = f_j − δ_{j+1} g_j. Since f_j − δ_{j+1} g_j is by definition equal
to 0 (see (3)), g_{j+1} = g_j only if h_{j+1} = 0, that is, only if iteration j + 1 is
the last iteration of the whole computation. Hence (c) is true for i = j + 1.
This concludes the proof of the claim.

The claim implies that all elements of sequence (g_i)_{1≤i<t} are distinct, so
t must be finite (the numbers g_i are weights of structures from X, so t − 1 ≤
|X| < +∞). □

The following lemma indicates that the sequences (h_i)_{1≤i≤t} and (g_i)_{1≤i≤t}
should geometrically decrease to zero. This lemma is the basic tool in all
analyses of the Newton method presented in this chapter.

Lemma 3.2 For each i = 1, 2, ..., t − 1,

    h_{i+1}/h_i + g_{i+1}/g_i ≤ 1.    (4)

Proof. Let 1 ≤ i ≤ t − 1. Structure x_i maximizes f(x) − δ_i g(x) over X, so

    h_i = f_i − δ_i g_i = f(x_i) − δ_i g(x_i)
        ≥ f(x_{i+1}) − δ_i g(x_{i+1}) = f_{i+1} − δ_i g_{i+1}.

Therefore, using (2) and (3), we have

    h_i ≥ f_{i+1} − δ_i g_{i+1} = h_{i+1} + (δ_{i+1} − δ_i) g_{i+1} = h_{i+1} + (h_i/g_i) g_{i+1}.

This inequality immediately implies Inequality (4). □

We present now one general bound on the number of iterations of the
Newton method, and two bounds for special classes of fractional combinatorial
optimization problems.

Theorem 3.3 The Newton method solves problem F in at most
log(MAX_f) + log(MAX_g) − log(MIN_g) − log(GAP) + 2 iterations.

Proof. Lemma 3.2 implies that for i = 1, 2, ..., t − 1,

    (h_{i+1} g_{i+1}) / (h_i g_i) ≤ 1/4.    (5)

Hence h_{t−1} g_{t−1} ≤ (1/4^{t−2}) h_1 g_1 and

    t ≤ log_4(h_1 g_1) − log_4(h_{t−1} g_{t−1}) + 2.    (6)

Since h_1 = MAX_f, we have

    h_1 g_1 ≤ MAX_f · MAX_g.    (7)

Equations (2) and (3) imply that for each i = 1, 2, ..., t − 1, δ_{i+1} − δ_i = h_i/g_i.
Hence

    h_{t−1} g_{t−1} = (δ_t − δ_{t−1}) g_{t−1}^2 = (f_{t−1}/g_{t−1} − f_{t−2}/g_{t−2}) g_{t−1}^2 ≥ GAP · (MIN_g)^2.    (8)

Inequalities (6), (7), and (8) imply the bound on the number of iterations
stated in this theorem. □

The following two theorems have been frequently rediscovered in the
context of many different fractional combinatorial optimization problems.

Theorem 3.4 For a linear fractional combinatorial optimization problem F_L
such that all input numbers a_i and b_i, 1 ≤ i ≤ p, are integers not greater
than U, the Newton method runs in O(log(pU)) iterations.

Proof. For such a problem F_L, we have MAX_f ≤ pU, MAX_g ≤ pU,
MIN_g ≥ 1, and GAP ≥ 1/(pU)^2, so Theorem 3.3 implies an O(log(pU))
bound on the number of iterations. □

Theorem 3.5 For a uniform fractional combinatorial optimization problem
F_U, the Newton method runs in at most p + 1 iterations.

Proof. Sequence (g_i)_{1≤i≤t−1} is strictly decreasing. For a uniform fractional
optimization problem, each number g_i is a positive integer not greater than p,
so t ≤ p + 1. □

4 The Newton Method for the Linear Case

In this section we show that for every linear fractional combinatorial
optimization problem F_L, the Newton method runs in a strongly polynomial
number of iterations. We use the notation introduced in Section 3.
Inequality (5), a direct consequence of Lemma 3.2, says that sequence (h_i g_i)
geometrically decreases to zero. Equations (2) and (3) imply that h_1 g_1 = f_1 g_1
and for 2 ≤ i ≤ t,

    h_i g_i = (f_i − δ_i g_i) g_i = (f_i − (f_{i−1}/g_{i−1}) g_i) g_i.    (9)

For each 1 ≤ i ≤ t, the numbers f_i and g_i are sums of some numbers drawn from
the 2p numbers a_1, a_2, ..., a_p, and b_1, b_2, ..., b_p. Thus the elements of sequence
(h_i g_i) are created from fixed 2p numbers using O(p) additions/subtractions
and up to three multiplications/divisions. Bounds on the numbers of
iterations of the Newton method for linear fractional combinatorial optimization
come from the fact that a geometric sequence whose elements are constructed
in such a limited way cannot be long.

To establish intuition why such sequences cannot be long, let us assume
for a moment that b_1 ≥ b_2 ≥ ... ≥ b_p ≥ 0, and that there exists a constant
α < 1 such that for every i = 1, 2, ..., t − 1, g_{i+1}/g_i ≤ α. That is, we
assume that sequence (g_i)_{1≤i≤t} on its own geometrically decreases to zero.
Each number g_i is equal to the sum of distinct elements drawn from the
multi-set {b_1, b_2, ..., b_p}. Obviously g_1 ≤ p b_1. Since sequence (g_i) decreases
geometrically, we must have g_k < b_1 for some k = O(log p). This means
that number b_1 is not a term in the sum g_k, nor does it occur in any sum g_i
for i ≥ k, so it can be excluded from further considerations. Since g_k < b_1,
we must have g_k ≤ (p − 1) b_2, and after the next O(log p) iterations we can
exclude b_2, then b_3, and so on. Therefore, the length of sequence (g_i) is only
O(p log p). (In fact, one can show that the length of such a sequence is O(p).
See Lemma 8.4.)

The actual analysis of the number of iterations is somewhat more
involved than the previous paragraph may suggest, because the numbers h_i g_i
have a "more complicated structure" than the numbers g_i (see Equation (9)).
Moreover, we have to deal with sums of numbers which are not necessarily
nonnegative. Even if all numbers a_1, a_2, ..., a_p, b_1, b_2, ..., b_p are positive,
negative terms may appear since subtractions are used in the definition of
the numbers h_i. The following lemma, which gives a bound on the length of a
geometrically decreasing sequence of sums of numbers, is our main tool in
bounding the number of iterations of the Newton method. In the statement
of this lemma, the elements of sequence (y_k c)_{k≥1} are constructed by adding
and/or subtracting numbers drawn from the set of the components of vector
c.

Lemma 4.1 [15] Let c = (c_1, c_2, ..., c_p) be a p-dimensional vector with
nonnegative real components, and let y_1, y_2, ..., y_q be vectors from {−1, 0, 1}^p.
If for all i = 1, 2, ..., q − 1,

    0 < y_{i+1} c ≤ (1/2) y_i c,

then q = O(p log p).

Proof. Consider the following polyhedron

    P = { z = (z_1, z_2, ..., z_p) ∈ R^p :
          (y_i − 2 y_{i+1}) z ≥ 0, for i = 1, 2, ..., q − 1,
          y_q z = 1,
          z_i ≥ 0, for i = 1, 2, ..., p }.

Let A and b denote the coefficient matrix and the right-hand-side vector of
the system defining polyhedron P. This polyhedron is not empty because
it contains vector c/(y_q c). From polyhedral theory we know that there
exists a vector c* = (c*_1, c*_2, ..., c*_p) ∈ P such that A'c* = b' for some
nonsingular p × p submatrix A' of matrix A and a subvector b' of vector b
(the vertices of the polyhedron are such vectors). Cramer's rule says that
for each i = 1, 2, ..., p,

    c*_i = det A'_i / det A',

where matrix A'_i is obtained from matrix A' by replacing the i-th column
with vector b'.

The determinant of a p × p matrix M = (m_ij) is equal to

    det M = Σ_σ ε_σ m_{1,σ(1)} m_{2,σ(2)} ... m_{p,σ(p)},

where the summation is over all permutations σ of the set of indices
{1, 2, ..., p}, and each number ε_σ is equal to either 1 or −1. Thus
|det M| ≤ m^p p!, where m is the maximum absolute value of an entry of
matrix M. The entries of matrix A and vector b, and consequently the entries
of all matrices A'_i, are integers from the interval [−3, 3]. Thus |det A'_i| ≤ 3^p p!,
for each i = 1, 2, ..., p. Since we also have |det A'| ≥ 1, then c*_i ≤ 3^p p!, for
each i = 1, 2, ..., p, and

    y_j c* ≤ Σ_{i=1}^{p} c*_i ≤ p 3^p p!,

for each j = 1, 2, ..., q. Finally we have

    1 = y_q c* ≤ (1/2^{q−1}) y_1 c* ≤ (1/2^{q−1}) p 3^p p!,

so q ≤ log(p 3^p p!) + 1 = O(p log p). □


In the analysis of the Newton method we also use the following rephrasing
of Lemma 4.1.

Corollary 4.2 Let C = (CI, C2, ••• , ep) E RP, and let Yll Y2, ... , Yq be vec-
tors from {O, l}p. II lor all i = 1,2, ... , q - 1
1
o < Y"+IC < -Y"C
I 2 ' ,-

then q = O(plogp).

We show two bounds on the number of iterations of the Newton method
for linear fractional combinatorial optimization. First we show an O(p^3 log p)
bound (Theorem 4.3), which comes from a direct analysis of sequence (h_i g_i).
This analysis uses Inequality (5), Equation (9), and Lemma 4.1. Then, by
analyzing the sequences (h_i) and (g_i) separately, we show an O(p^2 log^2 p) bound
(Theorem 4.6).

Theorem 4.3 The Newton method solves a linear fractional combinatorial
optimization problem F_L in O(p^3 log p) iterations.

Proof. For each 1 ≤ i ≤ t − 1, h_i > 0. This fact, Equation (9), and
Inequality (5) imply that for each i = 2, 3, ..., t − 2,

    0 < (f_{i+1} − (f_i/g_i) g_{i+1}) g_{i+1} ≤ (1/4) (f_i − (f_{i−1}/g_{i−1}) g_i) g_i,

and since 0 < g_i ≤ g_{i−1}, we further have

    0 < f_{i+1} g_{i+1} g_i − f_i g_{i+1}^2 ≤ (1/4) (f_i g_i g_{i−1} − f_{i−1} g_i^2).

Hence, putting s_i = f_i g_i g_{i−1} − f_{i−1} g_i^2, we have for each i = 2, 3, ..., t − 2,

    0 < s_{i+1} ≤ (1/4) s_i.

For a p^3-dimensional vector c whose set of components is {a_j b_k b_l : 1 ≤
j, k, l ≤ p}, there exist p^3-dimensional vectors y_2, y_3, ..., y_{t−1} with
components from {−1, 0, 1} such that s_i = y_i c, for each i = 2, 3, ..., t − 1. Applying
Lemma 4.1, we conclude that t = O(p^3 log p). □

The O(p^2 log^2 p) bound on the number of iterations is based on the
following two lemmas.

Lemma 4.4 There are at most O(p log p) iterations k such that g_{k+1} ≤
(2/3) g_k.

Proof. Let k_1 < k_2 < ... < k_q be the indices k such that g_{k+1} ≤ (1/2) g_k.
Since sequence (g_i)_{1≤i≤t} is non-increasing, we have for each i = 1, 2, ..., q − 1,

    0 < g_{k_{i+1}} ≤ g_{k_i + 1} ≤ (1/2) g_{k_i}.

Applying Corollary 4.2 with c = b and y_i = x_{k_i}, for i = 1, 2, ..., q, we
conclude that q = O(p log p). The monotonicity of sequence (g_i)_{1≤i≤t} further
implies that there are at most 2q = O(p log p) iterations k such that g_{k+1} ≤
(2/3) g_k. □

Lemma 4.5 There are at most O(p log p) consecutive iterations k such that
g_{k+1} ≥ (2/3) g_k.

Proof. Consider a sequence of q consecutive iterations j + 1, j + 2, ..., j + q
such that for each k = j + 1, j + 2, ..., j + q − 1,

    g_{k+1} ≥ (2/3) g_k.    (10)

We show that Corollary 4.2 can be applied to vector c = δ_{j+q} b − a and
the sequence of q − 2 vectors x_k, k = j + 1, ..., j + q − 2. Inequalities (10)
and (4) imply that for k = j + 1, j + 2, ..., j + q − 1,

    h_{k+1} ≤ (1/3) h_k.    (11)

Using Equations (2) and (3), and Inequalities (10) and (11), we obtain for
each k = j + 1, j + 2, ..., j + q − 2,

    δ_{k+2} − δ_{k+1} = h_{k+1}/g_{k+1} ≤ ((1/3) h_k) / ((2/3) g_k) = (1/2)(δ_{k+1} − δ_k).    (12)

Inequality (12) implies

    δ_{j+q} − δ_{k+1} = (δ_{j+q} − δ_{j+q−1}) + (δ_{j+q−1} − δ_{j+q−2}) + ... + (δ_{k+2} − δ_{k+1})
                     ≤ (1/2)(δ_{j+q−1} − δ_{j+q−2}) + ... + (1/2)(δ_{k+1} − δ_k)
                     = (1/2)(δ_{j+q−1} − δ_k)
                     ≤ (1/2)(δ_{j+q} − δ_k).    (13)

Using Equation (3), we obtain for k = j + 1, j + 2, ..., j + q − 2,

    x_k (δ_{j+q} b − a) = δ_{j+q} g_k − f_k = (δ_{j+q} − δ_{k+1}) g_k.

Thus, using Inequality (13) and the monotonicity of sequence (g_i), we
obtain for k = j + 1, j + 2, ..., j + q − 3,

    0 < x_{k+1} (δ_{j+q} b − a) = (δ_{j+q} − δ_{k+2}) g_{k+1}
      ≤ (1/2)(δ_{j+q} − δ_{k+1}) g_k = (1/2) x_k (δ_{j+q} b − a).

Applying Corollary 4.2 to vector c = δ_{j+q} b − a and the sequence of q − 2
vectors x_k, k = j + 1, ..., j + q − 2, we conclude that q = O(p log p). □

Lemmas 4.4 and 4.5 immediately imply the following bound on the number
of iterations of the Newton method.

Theorem 4.6 The Newton method solves a linear fractional combinatorial
optimization problem F_L in O(p^2 log^2 p) iterations.

Strongly polynomial bounds on the number of iterations of the Newton
method for problem F_L are somewhat unexpected because, as the following
example demonstrates, function h(δ) may consist of an exponential number
of linear segments. Let a = (a_1, a_2, ..., a_{3p}) and b = (b_1, b_2, ..., b_{3p}) be
vectors such that

    a_i = 2^{i−1},     for i = 1, 2, ..., p,
    a_i = 0,           for i = p + 1, ..., 3p,
    b_i = 0,           for i = 1, 2, ..., p,
    b_i = 2^{i−p−1},   for i = p + 1, ..., 3p.

Let U ⊆ {0, 1}^{3p} (respectively, W ⊆ {0, 1}^{3p}) be the set such that a binary
vector x = (x_1, ..., x_{3p}) belongs to U (respectively, W) if and only if x_i = 0
for each p < i ≤ 3p (respectively, for each 1 ≤ i ≤ p). The 2^p numbers au,
for u ∈ U, are the numbers 0, 1, ..., 2^p − 1, and the 2^{2p} numbers bw, for w ∈ W,
are the numbers 0, 1, ..., 2^{2p} − 1. For u ∈ U, let w_u denote the vector for which
b w_u = (au)^2. Finally, let X = {(u, w_u) : u ∈ U, u ≠ 0}. For this instance
(X, a, b) of linear fractional combinatorial optimization, function h is equal
to

    h(δ) = max{ k − δ k^2 : k = 1, 2, ..., 2^p − 1 },

and consists of 2^p − 1 linear segments.
The bound stated in Theorem 4.6 is the best known bound on the number
of iterations of the Newton method for a general problem F_L. However, when
we consider a concrete problem of type F_L, it often happens that special
properties of this problem enable us to derive a better bound. For example,
Theorem 4.6 implies that the Newton method solves the MaxRatioPath
problem in O(m^2 log^2 m) iterations and, consequently, in O(m^3 log^2 m) total
time. Using the properties of the MaxRatioPath problem, we show below
that if the edge weights are nonnegative, then the Newton method requires
actually only O(m log n) iterations. A more advanced example of such an
analysis is presented in Section 8. We need one more technical lemma about
the Newton method for the general case.

Lemma 4.7 For each 1 ≤ i ≤ t − 2 and δ ≥ δ_{i+2},

    f_i − δ g_i ≤ −h_{i+1}.    (14)

Proof. Since sequence (δ_i) monotonically increases and the numbers g_i are
positive, it is enough to show that Inequality (14) is true for each 1 ≤ i ≤ t − 2
and δ = δ_{i+2}. Using Equations (2) and (3) and Lemma 3.1(C), we have for
each 1 ≤ i ≤ t − 2,

    h_{i+1} = f_{i+1} − δ_{i+1} g_{i+1} = (δ_{i+2} − δ_{i+1}) g_{i+1}
           ≤ (δ_{i+2} − δ_{i+1}) g_i = δ_{i+2} g_i − f_i.

Thus Inequality (14) is true for each 1 ≤ i ≤ t − 2 and δ = δ_{i+2}. □

Theorem 4.8 The Newton method solves the MaxRatioPath problem in
O(m log n) iterations and in O(m^2 log n) total time, if the weights of the
edges are nonnegative.

Proof. We use the notation from the description of the MaxRatioPath
problem in Section 2. Consider an input instance (n, E, c, w) of the MaxRatioPath
problem. We introduce the following definitions. For δ ∈ R, a vertex v, and
an edge (v, u) ∈ E,

    c_δ(v, u)  =  c(v, u) − δw(v, u),
    d_δ(v)     =  the maximum length of a path from vertex 1 to
                  vertex v according to the edge-cost function c_δ,
    ĉ_δ(v, u)  =  c_δ(v, u) + d_δ(v) − d_δ(u).

For a subset of edges Q ⊆ E, c_δ(Q) (respectively, ĉ_δ(Q)) denotes the sum of
c_δ(e) (respectively, ĉ_δ(e)) over all edges e ∈ Q. The introduced definitions
imply that for each path P ⊆ E,

    c_δ(P) = f(P) − δg(P).

It is also easy to verify that for each edge (v, u) ∈ E,

    ĉ_δ(v, u) ≤ 0,    (15)

and for each path P from vertex 1 to vertex n,

    ĉ_δ(P) = c_δ(P) − d_δ(n).    (16)

Consider the computation of the Newton method on input (n, E, c, w).
For i = 1, 2, ..., t, let P_i be the path from vertex 1 to vertex n computed
during iteration i. We have for each 1 ≤ i ≤ t,

    c_{δ_i}(P_i) = f(P_i) − δ_i g(P_i) = h_i = d_{δ_i}(n).    (17)

We say that an edge e is essential at the beginning of iteration i if it belongs
to at least one of the paths P_i, P_{i+1}, ..., P_t. To prove the theorem, we
show that a sequence of k = ⌈log n⌉ + 1 consecutive iterations decreases the
number of essential edges at least by one. Consider a sequence of iterations
i, i + 1, ..., i + k. Inequality (5) implies that

    h_{i+k} g_{i+k} ≤ (1/n^2) h_{i+1} g_{i+1}.    (18)

If g_{i+k} ≤ (1/n) g_i, then let e ∈ P_i be an edge such that

    w(e) ≥ (1/|P_i|) Σ_{a∈P_i} w(a) = (1/|P_i|) g(P_i) ≥ (1/(n−1)) g_i > g_{i+k}.

Since sequence (g_j) is non-increasing, for each j ≥ i + k we have w(e) > g_j =
Σ_{a∈P_j} w(a), so edge e does not belong to path P_j. Thus edge e is essential
at the beginning of iteration i (since e ∈ P_i) but is not essential at the
beginning of iteration i + k (since e ∉ P_j, for each j ≥ i + k).

If g_{i+k} > (1/n) g_i, then also g_{i+k} > (1/n) g_{i+1}, and Inequality (18) implies
that h_{i+k} < (1/n) h_{i+1}. This inequality and Lemma 4.7 imply that

    c_{δ_{i+k}}(P_i) = f(P_i) − δ_{i+k} g(P_i) = f_i − δ_{i+k} g_i ≤ −h_{i+1} < −n h_{i+k}.

Therefore, using (16) and (17), we have

    ĉ_{δ_{i+k}}(P_i) = c_{δ_{i+k}}(P_i) − d_{δ_{i+k}}(n) < −n h_{i+k} − h_{i+k} = −(n + 1) h_{i+k}.

This means that there must be an edge e ∈ P_i such that ĉ_{δ_{i+k}}(e) < −h_{i+k}.
Let e be such an edge. For any path P from vertex 1 to vertex n which
contains edge e, we have, using (15) and (16),

    c_{δ_{i+k}}(P) = d_{δ_{i+k}}(n) + ĉ_{δ_{i+k}}(P) ≤ d_{δ_{i+k}}(n) + ĉ_{δ_{i+k}}(e) < h_{i+k} − h_{i+k} = 0.

Hence none of the paths P_j, for j ≥ i + k, contains edge e, because δ_j ≥ δ_{i+k}
and

    c_{δ_{i+k}}(P_j) ≥ c_{δ_j}(P_j) = h_j ≥ 0.

Thus edge e is essential at the beginning of iteration i but is not essential
at the beginning of iteration i + k. □

5 Megiddo's Parametric Search

In this section we present the parametric search method for fractional
combinatorial optimization proposed by Megiddo [31, 32]. We introduce this
method using the MaxRatioPath problem as an example. Let (n, E, c, w)
be an input instance of this problem and let δ* denote the optimum value
for this input. See Section 2 for the notation concerning the MaxRatioPath
problem. The maximum cost of a path from vertex 1 to vertex n according
to the edge-cost function c − δ*w is equal to 0. Thus if we know δ* and run
algorithm MAXCOST on input (n, E, c − δ*w), then it returns value 0 and a
path P which is optimal for the instance (n, E, c, w) of the MaxRatioPath
problem.
Now we try to run algorithm MAXCOST on input (n, E, c − δ*w) but
without knowing the value of δ*. We assume at first that each time we need
the outcome of the comparison in line 5 of the algorithm, we get it from an
oracle which is always correct. Having such an oracle, we can easily proceed
with the computation, and the only change over the case with known δ* is
that at each point of the computation, each d[v] is now not a number but a
function of the unknown δ*. Observe that the way variables d[v] are updated in
line 6 implies that these functions are linear functions of δ*. Therefore, the
comparison

    d[v] + (c(v, u) − δ*w(v, u)) − d[u] > 0 ?

performed in line 5 is always of the form "s − δ*t > 0 ?" for some known
numbers s and t. To find out the outcome of such a comparison, we only need
to know the relation between the numbers s/t and δ*. The three possibilities,
s/t < δ*, s/t = δ*, and s/t > δ*, are equivalent to the optimum value of
problem MaxPath(s/t) being positive, equal to 0, or negative, respectively.
Thus instead of the oracle, we can use algorithm MAXCOST itself. We obtain
the following algorithm for the MaxRatioPath problem.
WEIGHTEDMAXCOST(n, E, c, w)
1) d[1] ← 0; seen[1] ← true
2) for v ← 2 to n do seen[v] ← false
3) for v ← 1 to n − 1 do
4)    for each u such that (v, u) ∈ E do
5)       let s and t be the numbers such that
            s − δ*t = d[v] + c(v, u) − δ*w(v, u) − d[u]
6)       if t ≠ 0 then
7)          x ← the number returned by MAXCOST(n, E, c − (s/t)w)
8)       if (not seen[u]) or (t = 0 and s > 0) or (tx < 0) then
9)          d[u] ← d[v] + c(v, u) − δ*w(v, u)
10)         predecessor[u] ← v; seen[u] ← true
11) let P = (v_1, v_2, ..., v_k) where v_1 = 1, v_k = n, and
       v_i = predecessor[v_{i+1}] for i = 1, 2, ..., k − 1
12) return path P

Theorem 5.1 Algorithm WEIGHTEDMAXCOST solves the MaxRatioPath problem in O(m²) time.

Proof. The computation MAXCOST(n, E, c − δ*w) returns an optimal path P for the instance (n, E, c, w) of the MaxRatioPath problem. If we trace, in parallel, the computations MAXCOST(n, E, c − δ*w) and WEIGHTEDMAXCOST(n, E, c, w), then we see that the predecessor pointers change in exactly the same way in both computations, because the outcomes of the corresponding conditions checked in line 5 of the first algorithm and in line 8 of the second one are always the same. Thus both computations return the same path, so the computation WEIGHTEDMAXCOST(n, E, c, w) returns an optimal path P for the instance (n, E, c, w) of the MaxRatioPath problem. The running time of algorithm WEIGHTEDMAXCOST is O(m²) because algorithm MAXCOST is executed at most m times. □
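
For concreteness, the following is a minimal Python sketch of this simulation. It assumes, as algorithm MAXCOST does, that the vertices 1, ..., n are topologically ordered, and it stores each label d[v] as a pair (a, b) representing the linear function a − δ*b of the unknown optimum δ*; the function and variable names are illustrative, not from the text.

NEG_INF = float("-inf")

def max_cost(n, edges, delta):
    # Optimum of MaxPath(delta): maximum cost of a path from vertex 1
    # to vertex n under the edge costs c - delta*w.
    d = [NEG_INF] * (n + 1)
    d[1] = 0.0
    for v in range(1, n):
        if d[v] > NEG_INF:
            for (u, c, w) in edges.get(v, ()):
                d[u] = max(d[u], d[v] + c - delta * w)
    return d[n]

def weighted_max_cost(n, edges):
    # Megiddo-style simulation of MAXCOST with unknown delta*.
    d = {1: (0.0, 0.0)}
    pred = {}
    for v in range(1, n):
        if v not in d:
            continue
        for (u, c, w) in edges.get(v, ()):
            if u in d:
                s = d[v][0] + c - d[u][0]   # s - delta*t is the value of
                t = d[v][1] + w - d[u][1]   # d[v] + c - delta*w - d[u]
                if t != 0:
                    x = max_cost(n, edges, s / t)  # resolve the comparison
                    improve = t * x < 0
                else:
                    improve = s > 0
            else:
                improve = True
            if improve:
                d[u] = (d[v][0] + c, d[v][1] + w)
                pred[u] = v
    path = [n]                       # rebuild the optimal path from the
    while path[-1] != 1:             # predecessor pointers (vertex n is
        path.append(pred[path[-1]])  # assumed reachable from vertex 1)
    return list(reversed(path))

Each resolved comparison costs one run of max_cost, in line with the O(m²) bound of Theorem 5.1.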

To generalize the above idea, we need the notion of a linear algorithm. Let Q(δ) denote a problem whose input includes a parameter δ ∈ R. An algorithm A is a linear algorithm for problem Q(δ), if it satisfies the following conditions.

1. For each fixed δ ∈ R, algorithm A computes a correct solution of problem Q(δ).

2. The value of each arithmetic expression on each possible execution path of algorithm A is a linear function of parameter δ.

Observe that in our example we do not actually need an algorithm which computes a maximum cost path. Everything works equally well if we use an algorithm which only finds a positive-cost path, if such a path exists. In general terms, this means that instead of considering problem P(δ), it is enough to consider problem P₀(δ). The following theorem summarizes the basic form of Megiddo's parametric search.

Theorem 5.2 If there exists a linear algorithm A for the parametric problem P₀(δ) corresponding to a fractional combinatorial optimization problem F which runs in time T, then problem F can be solved in O(T²) time.
Proof. Run algorithm A for problem P₀(δ*) with unknown δ*. Since A is a linear algorithm, each comparison encountered during the computation is equivalent to a comparison "s − δ*t > 0 ?" for some numbers s and t, and can be evaluated by solving problem P₀(s/t). If each such problem is solved by running algorithm A with δ = s/t, then the whole computation runs in O(T²) time. □

If we exploit possible independence of comparisons in algorithm A, we may obtain a better bound than the bound stated in Theorem 5.2. Consider, for example, iteration v, 1 ≤ v ≤ n−1, of the outer loop of the computation MAXCOST(n, E, c − δ*w). Each comparison in line 5 performed during this iteration would give exactly the same outcome if it were performed right at the beginning of the iteration, because it does not depend on any updates done during this iteration. (Note that this statement would not be true if the graph had multiple edges.) Let s_k − δ*t_k, for k = 1, 2, ..., deg(v), be the expressions which have to be compared with 0 during this iteration, where deg(v) denotes the number of edges outgoing from vertex v. To know the outcome of these comparisons, we have to establish the relation between the unknown number δ* and all numbers s_k/t_k. We can either establish the relation between s_k/t_k and δ* for each k independently, by computing MAXCOST(n, E, c − δw) for each δ = s_k/t_k, or we can first sort the numbers s_k/t_k and then establish the position of δ* with respect to the elements of the sorted sequence (s'_k/t'_k) using binary search. The latter method requires only O(log(deg(v))) applications of algorithm MAXCOST, one application per iteration of the binary search. The running time of sorting deg(v) numbers and O(log(deg(v))) applications of algorithm MAXCOST is O(deg(v) log(deg(v))) + O(m log(deg(v))) = O(m log(deg(v))). Thus the running time of the whole computation is

O( m Σ_{v=1}^{n−1} log(deg(v)) ) = O(mn log(m/n)),    (19)

since Σ_{v=1}^{n−1} deg(v) = m and the logarithmic function is concave.


To generalize this idea, we introduce the notion of a stage of the computation. Consider a problem Q whose input includes a parameter δ ∈ R. A stage of the computation of an algorithm for problem Q is a (continual) part of the computation which has the following property. For each comparison performed during one stage, if this comparison is moved to the beginning of this stage, the outcome of the comparison does not change. We consider only comparisons whose outcomes may depend on the value of parameter δ. We say that an algorithm consists of r stages, if the computation of this algorithm can be partitioned into r stages, independently of the actual input values. In the theorem below, we distinguish between the main algorithm A1, which guides the whole computation and which must be linear and should ideally consist of a small number of stages, and the "subordinate" algorithm A2, which is used to resolve individual comparisons performed by algorithm A1. Algorithms A1 and A2 may of course be the same, but they are usually different, if they are to provide the best possible running time of the whole computation (see Section 6).
Theorem 5.3 If there exist

(1) a linear algorithm A1 for the parametric problem P₀(δ) corresponding to a fractional combinatorial optimization problem F, which runs in time T1 and consists of r stages with q_i comparisons during stage i, and

(2) an algorithm A2 for problem P₀(δ) which runs in time T2,

then problem F can be solved in O(T1 + T2 Σ_{i=1}^r log q_i) time.
Proof. Run algorithm A1 for problem P₀(δ*) with unknown δ*. Consider stage i of this computation. During this stage we are to resolve q_i independent comparisons of the form s − δ*t. Computing the outcomes of all these comparisons can be reduced to computing the relation between δ* and q_i numbers x_1, x_2, ..., x_{q_i}. If we sort these numbers and locate δ* in the sorted sequence using binary search, as suggested in our example, then the running time of this stage is O(q_i log q_i + T2 log q_i). This gives, however, only an O(T1 max_i{log q_i} + T2 Σ_{i=1}^r log q_i) bound on the running time of the whole computation.

To obtain the bound claimed in this theorem, observe that sorting the numbers x_1, x_2, ..., x_{q_i} can be avoided. We can find the median of these numbers in O(q_i) time [2] and compare this median with δ* by running algorithm A2. The outcome of this comparison reveals the relation between δ* and half of the numbers x_1, x_2, ..., x_{q_i}. We can continue this process by finding the median of the remaining half. This way we eventually partition the numbers x_1, x_2, ..., x_{q_i} into the numbers smaller than δ* and the numbers greater than δ*. The running time of this computation is

O( Σ_{j=0}^{log q_i} (q_i/2^j + T2) ) = O(q_i + T2 log q_i).

Summing this bound over all stages i, we obtain the bound claimed in this theorem. □
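
The halving process in this proof can be sketched as follows; solve_p0 is a hypothetical stand-in for one run of algorithm A2 on P₀(δ), assumed to return the sign of the optimum value (+1 if δ < δ*, 0 if δ = δ*, −1 if δ > δ*).

import statistics

def resolve_comparisons(xs, solve_p0):
    # Partition the numbers xs into those below and those above the
    # unknown delta*, using O(log |xs|) runs of solve_p0.
    below, above, live = [], [], list(xs)
    while live:
        # statistics.median_low sorts internally; a linear-time
        # selection algorithm [2] gives the O(q_i) per-round cost
        # assumed in the proof.
        med = statistics.median_low(live)
        sign = solve_p0(med)
        if sign > 0:                        # med < delta*
            below += [x for x in live if x <= med]
            live = [x for x in live if x > med]
        elif sign < 0:                      # med > delta*
            above += [x for x in live if x >= med]
            live = [x for x in live if x < med]
        else:                               # med = delta*: optimum found
            return med, below, above
    return None, below, above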

To minimize the bound of Theorem 5.3 for a given fractional combinatorial optimization problem, algorithm A2 should obviously be the fastest known algorithm for the non-fractional version of this problem. To find good candidates for algorithm A1, we should look among parallel algorithms, since such algorithms are especially designed to have a small number of stages. A parallel algorithm which runs in time T and uses P processors can be viewed as an algorithm which runs in sequential time TP and consists of T stages with at most P operations per stage. Thus the following theorem is an immediate corollary of Theorem 5.3.

Theorem 5.4 If there exist

(1) a linear parallel algorithm A1 for the parametric problem P₀(δ) corresponding to a fractional combinatorial optimization problem F, which runs in time T1 and uses P processors, and

(2) a (sequential) algorithm A2 for problem P₀(δ) which runs in time T2,

then problem F can be solved in O(T1·P + T2·T1·log P) time.

Consider another example of fractional combinatorial optimization, the maximum-ratio spanning-tree problem. An input instance of this problem consists of an undirected connected graph G = (V, E), an edge-cost function c : E → R, and an edge-weight function w : E → R. The cost and the weight of a spanning tree in graph G are equal to the sum of the costs and the sum of the weights of the edges of this tree, respectively. The problem is to find a spanning tree in graph G with the maximum cost-to-weight ratio, or, using our terminology, a spanning tree with the maximum mean-weight cost. The corresponding parametric problem P(δ) is the problem of computing a maximum spanning tree for the edge-cost function c − δw. The maximum-cost spanning-tree problem, which is equivalent to the minimum-cost spanning-tree problem, can be solved in T_MST = O(min{m log log n, m + n log n}) time [49, 14], where m is the number of edges and n is the number of vertices in the input graph.

A maximum-ratio spanning tree is a maximum-cost spanning tree for the edge-cost function c − δ*w. Maximum-cost spanning trees depend only on the order of the edges by their costs. Thus to find a maximum-cost spanning tree for the edge-cost function c − δ*w, it is enough to sort the m numbers c(e) − δ*w(e), e ∈ E. Sorting m numbers can be done in O(log m) parallel time using m processors [7]. Therefore, Theorem 5.4 implies that the maximum-ratio spanning-tree problem can be solved in O(m log m + T_MST log² m) = O(T_MST log² m) time.
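
Here the parametric problem P(δ) is an ordinary maximum spanning tree computation. A minimal sketch of this oracle, using Kruskal's algorithm on the edge costs c − δw (the input conventions are illustrative, not from the text):

def max_spanning_tree_cost(n, edges, delta):
    # Total cost of a maximum spanning tree of a connected graph under
    # the edge costs c - delta*w; edges is a list of (x, y, c, w)
    # tuples over vertices 0..n-1.
    parent = list(range(n))

    def find(x):                  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for (x, y, c, w) in sorted(edges, key=lambda e: e[2] - delta * e[3],
                               reverse=True):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry
            total += c - delta * w
    return total

Megiddo's method runs the parallel sorting network symbolically and resolves each comparison between two parametric edge costs c(e) − δ*w(e) with one such call.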
We return now to the general case. Since we are interested in maximization problems, many sub-tasks appearing during the computation of algorithm A1 of Theorem 5.3 may be of the form

u(δ*) ← max{ u_1(δ*), u_2(δ*), ..., u_q(δ*) },    (20)

where each u_i is a linear function of parameter δ. We discuss two different approaches to computing such a maximum. If the maximum (20) is calculated by comparing each u_i, for i = 2, 3, ..., q, with the already known maximum of u_1 through u_{i−1}, then this computation has q−1 stages with one comparison per stage, and the maximum can be computed in O(T2·q) time. Here T2 denotes, as in Theorem 5.3, the running time of an algorithm A2, which is used to resolve the comparisons performed during the computation of algorithm A1. Computation of the maximum (20) can be organized in ⌈log q⌉ stages by scheduling the comparisons in a tournament tree. Each stage consists of O(q) comparisons, so Theorem 5.3 implies that the maximum (20) can be computed in O(q + T2 log² q) time. The number of stages needed to compute the maximum can be further reduced to O(log log q) in the following way. Follow the tournament for O(log log log q) stages to reduce the number of live elements to k_1 ≤ q/(log log q). Live elements are those which so far have won all their comparisons. Now perform a sequence of the following stages. During stage i, partition the set of k_i live elements into k_i²/(2k_1) equal-size groups, and perform all pairwise comparisons in each group. This stage requires 2k_1 = O(q/log log q) comparisons and reduces the number of live elements to k_{i+1} = k_i²/(2k_1). Thus k_i = k_1/2^{2^{i−1}−1}, so we have only O(log log q) such stages. Therefore, this algorithm for computing the maximum of q numbers consists of O(q) comparisons partitioned into O(log log q) stages. Using this algorithm as algorithm A1, Theorem 5.3 implies that the maximum (20) can be computed in O(q + T2 log q log log q) time. The above O(log log q)-stage algorithm for computing the maximum of q numbers is based on Valiant's parallel algorithm which runs in O(log log q) time and uses q processors [47].
There is an alternative approach to computing the maximum (20). The maximum of q linear functions is a convex, piecewise linear function with at most q breakpoints. If we have two convex, piecewise linear functions π_1 and π_2 with q_1 and q_2 breakpoints, respectively, and the breakpoints of each function are sorted by their x-coordinates, then we can compute the sorted sequence of the breakpoints of function max{π_1, π_2} in O(q_1 + q_2) time by a straightforward merging process. Therefore, by following the merging scheme of the merge-sort algorithm, we can compute all breakpoints of the maximum of q linear functions u_i in O(q log q) time. Finding the linear segment of function u which contains δ* can then be done by binary search over the breakpoints of function u. Using this approach, the maximum (20) can be computed in O(q log q + T2 log q) time.
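
The breakpoints of the maximum of q linear functions can also be computed by sorting the lines by slope and pruning dominated lines with a stack, within the same O(q log q) bound as the merging scheme just described; the following sketch is illustrative, not from the text.

def upper_envelope(lines):
    # Upper envelope of u(d) = max_i (a_i + b_i * d); lines are
    # (a_i, b_i) pairs.  Returns the envelope lines from left to right
    # and the x-coordinates of the breakpoints between them.
    def dominated(l1, l2, l3):
        # l2 never attains the maximum if l3 overtakes l1 no later
        # than l2 does
        (a1, b1), (a2, b2), (a3, b3) = l1, l2, l3
        return (a3 - a1) * (b2 - b1) >= (a2 - a1) * (b3 - b1)

    lines = sorted(lines, key=lambda ab: (ab[1], ab[0]))
    filtered = []                 # keep only the highest of parallel lines
    for a, b in lines:
        if filtered and filtered[-1][1] == b:
            filtered.pop()
        filtered.append((a, b))
    hull = []
    for ln in filtered:
        while len(hull) >= 2 and dominated(hull[-2], hull[-1], ln):
            hull.pop()
        hull.append(ln)
    breakpoints = [(hull[i][0] - hull[i + 1][0]) /
                   (hull[i + 1][1] - hull[i][1])
                   for i in range(len(hull) - 1)]
    return hull, breakpoints

Locating δ* then reduces to binary search over the returned breakpoints, each probe being one run of algorithm A2.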
Consider an algorithm A for a problem Q whose input includes a parameter δ ∈ R. A maxima-computing phase of an algorithm A is a (continual) part of the computation of A such that all comparisons performed during this part of the computation are (implicitly) involved in computing maxima, and all these maxima are independent, that is, they can be performed in arbitrary order without changing the outcome of the whole computation. As in the definition of a stage of an algorithm, we consider here only comparisons whose outcomes may depend on the value of parameter δ. The following theorem is based on the two approaches to computing the maximum (20) described above. The size of a maximum of q elements is equal to q.

Theorem 5.5 If there exist

(1) a linear algorithm A1 for the parametric problem P₀(δ) corresponding to a fractional combinatorial optimization problem F, which runs in time T1, consists of r maxima-computing phases, and the total size of the maxima in phase i is at most q_i, and

(2) an algorithm A2 for problem P₀(δ) which runs in time T2,

then problem F can be solved in

(i) O(T1 + T2 Σ_{i=1}^r log q_i log log q_i) time, and

(ii) O(T1 + Σ_{i=1}^r (q_i + T2) log q_i) time.

Proof. Run algorithm A1 for problem P₀(δ*) with unknown δ* and consider phase i of this computation. During this phase we have to compute independent maxima of the total size at most q_i. If we compute these maxima in parallel using the O(log log q)-stage algorithm described above, then the running time of this phase is O(q_i + T2 log q_i log log q_i) and the running time of the whole computation is as in part (i) of the theorem. If we first compute the breakpoints of each maximum, sort together all breakpoints, and locate δ* in the sorted list by binary search, then the running time of this phase is O((q_i + T2) log q_i) and the running time of the whole computation is as in part (ii) of the theorem. □

The best bound for the MaxRatioPath problem which we have shown so far is O(mn log(m/n)) (see (19)). Using Theorem 5.5, we show now an O(mn) bound. This bound, however, is based on another algorithm for computing a maximum-cost path than algorithm MAXCOST. Algorithm MAXCOST considers vertices 1, 2, ..., n−1, and while considering vertex v, it examines all edges outgoing from v. The following algorithm MAXCOST2 considers vertices 2, 3, ..., n and while considering vertex v, it examines all edges incoming into v. This algorithm returns only the maximum cost of a path from vertex 1 to vertex n, but can be easily modified so that it returns also a maximum-cost path.

MAXCOST2(n, E, a)
1) d[1] ← 0
2) for v ← 2 to n do d[v] ← −∞
3) for v ← 2 to n do
4)   d[v] ← max{ d[u] + a(u,v) : (u,v) ∈ E }
5) return d[n]

Algorithm MAXCOST2 is still not quite what we need. If we apply Theorem 5.5 with algorithm MAXCOST2 as algorithm A1, then we obtain bound O(mn log(m/n) log log(m/n)), from part (i) of the theorem, and bound O(mn log(m/n)), from part (ii) of the theorem, so we do not improve bound (19).

By rearranging the computation of algorithm MAXCOST2, we obtain the recursive algorithm MAXCOSTRECURSIVE shown below. This algorithm considers first, recursively, the subgraph induced by vertices u = 1, 2, ..., k = ⌊(n+1)/2⌋, then considers the edges (u,v) ∈ E such that 1 ≤ u ≤ k < v ≤ n (observe that all maxima of this part of the computation - lines 7-8 below - are independent), and finishes up by considering, again recursively, the subgraph induced by vertices v = k, k+1, ..., n.

MAXCOSTRECURSIVE(n, E, a)
1) d[1] ← 0
2) for v ← 2 to n do d[v] ← −∞
3) COMPUTEMAXIMA(1, n)
4) return d[n]

COMPUTEMAXIMA(p, r)
5) k ← ⌊(p + r)/2⌋
6) if k > p then COMPUTEMAXIMA(p, k)
7) for v ← k + 1 to r do
8)   d[v] ← max{ d[v], d[u] + a(u,v) : p ≤ u ≤ k, (u,v) ∈ E }
9) if k + 1 < r then COMPUTEMAXIMA(k + 1, r)
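
A direct Python transcription of the two procedures may help to see the phase structure; here in_edges maps a vertex v to the (u, cost) pairs of its incoming edges, and the vertices 1, ..., n are assumed topologically ordered, as in the text (the input format is illustrative).

NEG_INF = float("-inf")

def max_cost_recursive(n, in_edges):
    # Maximum cost of a path from vertex 1 to vertex n; each edge is
    # examined exactly once, in O(m) total time.
    d = [NEG_INF] * (n + 1)
    d[1] = 0.0

    def compute_maxima(p, r):
        k = (p + r) // 2
        if k > p:
            compute_maxima(p, k)
        # lines 7-8: one maxima-computing phase, all maxima independent
        for v in range(k + 1, r + 1):
            for (u, cost) in in_edges.get(v, ()):
                if p <= u <= k and d[u] > NEG_INF:
                    d[v] = max(d[v], d[u] + cost)
        if k + 1 < r:
            compute_maxima(k + 1, r)

    compute_maxima(1, n)
    return d[n]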

Algorithm MAXCOSTRECURSIVE runs in O(m) time (each edge is considered only once) and, for each k = 1, 2, ..., log n, it has r_k = n/2^k maxima-computing phases (assume for convenience that n is a power of 2). Each of these r_k phases considers only edges (u,v) ∈ E such that a+1 ≤ u ≤ a + 2^{k−1} < v ≤ a + 2^k, for some integer a, so the total size of the maxima in one such phase is at most q_k = min{2^{2k}, m}. Therefore, part (ii) of Theorem 5.5 implies that the MaxRatioPath problem can be solved within the time bound

O( m + Σ_{k=1}^{log n} r_k (q_k + m) log q_k ) = O( m + Σ_{k=1}^{log n} (n/2^k) · m · 2k ) = O(nm).

Bound (i) of Theorem 5.5 gives here the same time bound. We use Theorem 5.5 in the next section to show low time bounds for computing a maximum profit-to-time ratio cycle.

We end this section with a comment about implementation of Megiddo's parametric search. Throughout the whole computation of algorithm A1, we should maintain an interval (α, β) containing the optimum value δ*. Initially (α, β) = (0, +∞). If during the computation we need to know the relation between a number x and unknown δ*, we first check if x belongs to (α, β). If x ∉ (α, β), then we know the relation between x and δ*. If x ∈ (α, β), then we run algorithm A2 and find out whether x = δ*, δ* ∈ (α, x), or δ* ∈ (x, β). If x = δ*, we should stop the entire computation since we have just found the optimum objective value we were looking for. In the other two cases we update interval (α, β) and proceed with the computation. Maintaining interval (α, β) does not improve the worst-case running times but it should significantly improve the average running times.

6 Maximum Profit-to-Time Ratio Cycles


In this section we consider the following classical fractional combinatorial optimization problem, which we call the MaxRatioCycle problem. An input instance of this problem consists of a directed graph G = (V, E), an edge-cost function c : E → R, and an edge-weight function w : E → R. Let n and m denote the number of vertices and the number of edges in graph G, respectively. As in the MaxRatioPath problem, if P is a path in G, then f(P) = Σ_{(i,j)∈P} c(i,j), g(P) = Σ_{(i,j)∈P} w(i,j), and f(P)/g(P) are the cost, the weight, and the mean-weight cost of this path. The task is to find a cycle in graph G with the maximum mean-weight cost.

MaxRatioCycle: maximize f(Γ)/g(Γ) over all cycles Γ in graph G.

We assume that graph G is not acyclic, the weight of each cycle is positive, and there exists a cycle with positive cost. A cycle Γ = (v_0, v_1, ..., v_j = v_0) is simple if the vertices v_0, v_1, ..., v_{j−1} are distinct. There are infinitely many cycles in G, but to find a cycle with the maximum mean-weight cost, we can limit the search only to simple cycles. If a cycle Γ is not simple, then its set of edges can be partitioned into a number of simple cycles Γ_1, Γ_2, ..., Γ_k, and

f(Γ)/g(Γ) = Σ_{i=1}^k (g(Γ_i)/g(Γ)) · (f(Γ_i)/g(Γ_i)) ≤ max_{1≤i≤k} f(Γ_i)/g(Γ_i).

The inequality holds because the numbers g(Γ_i)/g(Γ) are positive and sum up to 1. Therefore, there must be a simple cycle which has the maximum mean-weight cost among all cycles. For convenience, however, instead of considering only simple cycles, we define the set of feasible solutions X_G as the set of the cycles in graph G which contain at most n edges.

The MaxRatioCycle problem models the following tramp-steamer problem [9, 29]. A trip from a port v to a port u takes w(v,u) time and gives a profit of c(v,u) units. To maximize the mean daily profit, the steamer should follow a cycle which maximizes the ratio of the total profit to the total time. The MaxRatioCycle problem is often called the maximum profit-to-time ratio cycle problem.
The parametric problem corresponding to the MaxRatioCycle problem is

MaxCycle(δ): maximize f(Γ) − δg(Γ), over all cycles Γ ∈ X_G.



An optimal solution of problem MaxCycle(δ) is a cycle from X_G with the maximum cost according to the edge-cost function c − δw. To be able to directly apply the Newton method to the MaxRatioCycle problem, we need an algorithm which for given edge costs finds a cycle Γ ∈ X_G with the maximum cost. (Observe that if we restricted ourselves only to simple cycles, this problem would be NP-hard.) A weaker algorithm, which finds a positive-cost cycle, if there exists one, is sufficient for Megiddo's parametric search.
Let a : E → R be an arbitrary edge-cost function. Let d_k(v,u) denote the maximum cost of a path from vertex v to vertex u containing at most k edges. If there is no path from v to u, then d_k(v,u) = −∞. We have

d_1(v,u) = a(v,u), if (v,u) ∈ E, and d_1(v,u) = −∞, if (v,u) ∉ E,

and

d_{2k}(v,u) = max{ d_k(v,u), max_{x∈V} { d_k(v,x) + d_k(x,u) } }.    (21)
To abstract from technical details, we assume that n is a power of 2. For n other than a power of 2, we could simply change the set of feasible solutions to the set of all cycles containing at most 2^⌈log n⌉ edges. The recursive relation (21) gives immediately an O(n³ log n) algorithm for computing numbers d_n(v,u), for each pair of vertices v and u. If during the computation we store the information about the numbers which define the maxima in Equation (21), then we can also find a maximum-cost path from v to u, for each pair of vertices v and u. The maximum cost of a cycle Γ ∈ X_G is equal to max_{v∈V} {d_n(v,v)}. Hence we have an O(n³ log n) algorithm which for a given δ solves problem MaxCycle(δ). Using this algorithm and the Newton method, we obtain an algorithm for the MaxRatioCycle problem which runs in O(n³m² log³ n) time for arbitrary edge costs and weights (see Theorem 4.6) and in O(n³ log n log(nU)) time, if all edge costs and weights are integers not greater than U (see Theorem 3.4). Using a similar analysis to the analysis presented for the MaxRatioPath problem in the proof of Theorem 4.8, one can show that if the weights of the edges are nonnegative, then the Newton method solves the MaxRatioCycle problem in O(m log n) iterations and, consequently, in O(n³m log² n) time. This O(m log n) bound on the number of iterations for the MaxRatioCycle problem, as well as for the MaxRatioPath problem, can be further improved to O(m) (an example of analysis which can give such a bound is presented in Section 8). However, we do not include here a proof of this bound, since this bound anyway yields a slower algorithm than algorithms which can be obtained by using Megiddo's parametric search.
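
Relation (21) is a repeated squaring of the matrix (d_k(v,u)) in the (max, +) algebra. The following sketch computes the optimum of MaxCycle(δ) this way in O(n³ log n) time; the matrix input format is an assumption made for illustration only.

NEG_INF = float("-inf")

def max_cycle_value(n, cost, weight, delta):
    # Maximum of f(C) - delta*g(C) over cycles with at most n edges
    # (n a power of 2); cost and weight are n x n matrices with None
    # marking non-edges.
    d = [[NEG_INF] * n for _ in range(n)]
    for v in range(n):                      # d = d_1 under c - delta*w
        for u in range(n):
            if cost[v][u] is not None:
                d[v][u] = cost[v][u] - delta * weight[v][u]
    k = 1
    while k < n:                            # relation (21): d_k -> d_2k
        d2 = [row[:] for row in d]
        for v in range(n):
            for x in range(n):
                if d[v][x] > NEG_INF:
                    for u in range(n):
                        if d[x][u] > NEG_INF:
                            d2[v][u] = max(d2[v][u], d[v][x] + d[x][u])
        d = d2
        k *= 2
    return max(d[v][v] for v in range(n))

A positive return value certifies a positive-cost cycle, which is all that problem MaxCycle₀(δ) below requires.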
Let MaxCycle₀(δ) denote the problem of detecting a positive-cost cycle according to the edge-cost function c − δw, if there exists such a cycle. To apply Megiddo's parametric search to the MaxRatioCycle problem, we have to combine two algorithms for the parametric problem MaxCycle₀(δ): a linear algorithm A1, which is to be run with unknown δ* and should ideally have a small number of stages/phases, and a fast algorithm A2 used for resolving comparisons performed by the first algorithm (see Theorems 5.3 and 5.5). The problem of detecting a positive-cost cycle is equivalent to the problem of detecting a negative-cost cycle (simply negate the edge costs). All known algorithms for detecting a negative-cost cycle are actually based on single-source shortest-paths algorithms. The best known bound for shortest-paths algorithms on graphs with arbitrary edge costs is O(nm). This bound is achieved, for example, by the Bellman-Ford algorithm [1, 12]. Goldberg's shortest-paths algorithm [16] runs in O(m√n log U) time, if the edge costs are integers greater than −U, for a positive number U. Thus for arbitrary edge costs, the best known bound on the running time of algorithm A2 is O(nm). It turns out that the best candidate for algorithm A1 is the algorithm which is based on the computation of numbers d_k(v,u) using the recursive relation (21). This algorithm runs in O(n³ log n) time, consists of O(log n) maxima-computing phases, and the total size of all maxima in one phase is O(n³). Therefore, Theorem 5.5 implies that the MaxRatioCycle problem can be solved in O(n³ log n + nm log² n log log n) time (part (i) of the theorem) and in O(n³ log² n) time (part (ii) of the theorem).

7 Maximum Mean Cycles


The maximum-mean cycle problem is the uniform MaxRatioCycle problem. An input instance of this problem consists of a directed graph G = (V, E) and an edge-cost function c : E → R. We want to find a maximum-mean cycle in G, that is, a cycle which has the maximum mean cost. The mean cost of a cycle Γ is equal to (Σ_{e∈Γ} c(e))/|Γ|. In this section we describe the O(nm) algorithm for the maximum-mean cycle problem designed by Karp [26]. This algorithm is an example showing that for some fractional combinatorial optimization problems there are specialized algorithms which are faster than the algorithms yielded by general methods such as the Newton method and Megiddo's parametric search.

Let n and m denote the number of vertices and the number of edges in graph G. We assume that graph G is not acyclic. Let s be a vertex in graph G such that all other vertices are reachable from s. If such a vertex does not exist, we can add to graph G a new vertex s and edges (s,v) with arbitrary costs, for each v ∈ V. This modification does not change the cycles or their costs. For each vertex v ∈ V and integer k ≥ 0, let d_k(v) be the maximum cost of a path of length k from vertex s to vertex v. If no such path exists, then d_k(v) = −∞. For a subset of edges A ⊆ E, c(A) denotes the sum Σ_{e∈A} c(e). Let δ* denote the maximum mean cost of a cycle in graph G.

Theorem 7.1

δ* = max{ min_{0≤k≤n−1} { (d_n(v) − d_k(v)) / (n − k) } : v ∈ V and d_n(v) > −∞ }.    (22)
Proof. If we decrease the cost of each edge by the same amount a, then for each vertex v ∈ V and each k ≥ 0, d_k(v) decreases by k·a and, consequently, (d_n(v) − d_k(v))/(n − k) decreases by a. Thus both sides of Equation (22) decrease by a. Therefore, it is enough to prove the theorem for the case when δ* = 0.

If δ* = 0, then Equation (22) is equivalent to

0 = max{ min_{0≤k≤n−1} { d_n(v) − d_k(v) } : v ∈ V and d_n(v) > −∞ },

which is equivalent to

0 = max{ d_n(v) − max_{0≤k≤n−1} {d_k(v)} : v ∈ V and d_n(v) > −∞ }.    (23)

Let v be a vertex such that d_n(v) > −∞, and let P be a path of length n and cost d_n(v) from s to v. Path P is not simple (that is, at least one vertex occurs on P more than once), so the edges of P can be partitioned into a cycle Γ and a, possibly trivial, path R from s to v. Let k = |R| ≤ n−1. Since δ* = 0, we have c(Γ) ≤ 0, and hence

d_n(v) = c(Γ) + c(R) ≤ c(R) ≤ d_k(v) ≤ max_{0≤k≤n−1} {d_k(v)}.

Thus the right-hand side of (23) is at most zero.


Now we show that the right-hand side of (23) is at least zero. Let Γ be a zero-cost cycle in graph G. Since δ* = 0, such a cycle must exist. Let v be an arbitrary vertex on cycle Γ, and let P_1 be a maximum cost path from s to v. Such a path must exist because the graph does not have positive-cost cycles. Let P_2 be a path of length at least n which consists of path P_1 followed by some number of repetitions of cycle Γ. Since the cost of cycle Γ is equal to 0, the costs of paths P_1 and P_2 are the same, so path P_2 is a maximum cost path from s to v. Let P_3 be the path which consists of the first n edges of path P_2. Path P_3 is a maximum cost path from vertex s to some vertex w, because any initial part of a maximum cost path must be a maximum cost path. Thus we have

d_n(w) = c(P_3) ≥ d_k(w), for each 0 ≤ k ≤ n−1.

This means that the right-hand side of (23) is at least zero. □


Theorem 7.1 gives an algorithm for the maximum-mean cycle problem which runs in O(nm) time. We have d_0(s) = 0 and d_0(v) = −∞, for each v ≠ s. Numbers d_k(v), for each v ∈ V and k = 1, 2, ..., n, can be computed in O(nm) time using the following relation:

d_k(v) = max{ d_{k−1}(u) + c(u,v) : (u,v) ∈ E }.    (24)

Knowing all numbers d_k(v), for v ∈ V and 0 ≤ k ≤ n, we can compute δ* in O(n²) time using Equation (22). Let v be a vertex which gives the maximum in (22), and let P be a maximum-cost path from s to v of length n. We can construct such a path, if during the computation of numbers d_k(x) we store the information about the vertices which give the maxima in Equation (24). Each cycle extracted from path P, and there must be at least one, is a maximum-mean cycle.
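
The whole algorithm fits in a few lines of Python; the sketch below computes δ* from relations (24) and (22), assuming every vertex is reachable from s (the input format is illustrative).

NEG_INF = float("-inf")

def max_mean_cycle_value(n, edges, s=0):
    # delta* = max_v min_k (d_n(v) - d_k(v)) / (n - k), Equation (22);
    # edges is a list of (u, v, c) triples over vertices 0..n-1.
    d = [[NEG_INF] * n for _ in range(n + 1)]
    d[0][s] = 0.0
    for k in range(1, n + 1):               # relation (24), O(nm) total
        for (u, v, c) in edges:
            if d[k - 1][u] > NEG_INF:
                d[k][v] = max(d[k][v], d[k - 1][u] + c)
    best = NEG_INF
    for v in range(n):                      # Equation (22), O(n^2) total
        if d[n][v] > NEG_INF:
            best = max(best,
                       min((d[n][v] - d[k][v]) / (n - k)
                           for k in range(n) if d[k][v] > NEG_INF))
    return best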

8 Maximum Mean-Weight Cuts


In Section 6 we discussed a problem (the MaxRatioCycle problem) for which the algorithms obtained by applying Megiddo's parametric search are considerably better than the algorithms yielded by the Newton method, at least for the worst-case inputs. In this section we consider the MaxRatioCut problem, which is an example of the reverse situation. Megiddo's parametric search method does not yield fast algorithms for this problem, because there are no fast parallel algorithms for the maximum flow problem, which is de facto the non-fractional version of the MaxRatioCut problem. The main aim of this section is to show a linear bound on the number of iterations of the Newton method for the MaxRatioCut problem with non-negative edge weights. This linear bound gives the fastest known algorithm for this problem.
A network G = (V, E, u, d) is a directed, strongly connected graph with a set of vertices V, a set of edges E, a nonnegative edge-capacity function u : E → R, and a demand function d : V → R such that Σ_{v∈V} d(v) = 0. Without loss of generality, we assume that the set of edges is symmetric, that is, if (x,y) ∈ E, then also (y,x) ∈ E. If d(v) is negative, then v is a source - a node with supply. If d(v) is positive, then v is a sink - a node with demand. As before, n and m denote the number of vertices and the number of edges in network G, respectively.
For a subset of vertices W ⊆ V, d(W) ≝ Σ_{v∈W} d(v) is the net demand in W. If S ⊆ V, T = V − S, S ≠ ∅ and T ≠ ∅, then cut (S,T) is the set of edges (x,y) such that x ∈ S and y ∈ T. The capacity and the surplus of a cut (S,T) are defined as, respectively,

u(S,T) ≝ Σ_{e∈(S,T)} u(e),

surplus(S,T) ≝ d(T) − u(S,T).

If network G is viewed as a model for designing shipment of the commodity from the sites with supply (the sources) to the sites with demand (the sinks), then a positive surplus(S,T) means that not all demand in T can be satisfied. The amount of the demand in T which cannot be satisfied without violating the edge capacities is at least surplus(S,T).
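
These definitions translate directly into code; a small sketch, with illustrative container types:

def surplus(S, vertices, capacity, demand):
    # surplus(S, T) = d(T) - u(S, T) for the cut (S, T), T = V - S;
    # capacity maps edge pairs (x, y) to u(x, y), and demand maps
    # vertices to d(v).
    T = vertices - S
    d_T = sum(demand[v] for v in T)
    u_ST = sum(c for (x, y), c in capacity.items() if x in S and y in T)
    return d_T - u_ST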
If we have an edge-weight function w : E → R, then g(S,T) ≝ Σ_{e∈(S,T)} w(e) and surplus(S,T)/g(S,T) are the weight and the mean-weight surplus of cut (S,T). The assumption that the underlying graph is strongly connected guarantees that each cut has at least one edge. We further assume, as in the general model of fractional combinatorial optimization, that the edge weights are such that the weight of each cut is positive. The MaxRatioCut problem is the problem of finding a cut with the maximum mean-weight surplus in a given network G and for a given edge-weight function w. The corresponding parametric problem MaxSurplusCut(δ) is the problem of finding a maximum-surplus cut in network G_δ = (V, E, u + δw, d), that is, in network G with the edge-capacity function changed to function u + δw. The problem of finding a maximum-surplus cut in network G can be reduced in O(m) time to the problem of computing a maximum flow in network G. All known algorithms for computing maximum-surplus cuts are based on maximum-flow algorithms. Let T_MaxFlow(n,m) denote the worst-case time complexity of computing a maximum flow in a network with n vertices and m edges. The results from [17] and [27] imply that

T_MaxFlow(n,m) = O( min{ nm log(n²/m), nm + n^{2+ε} } ),    (25)

where ε is an arbitrary positive constant.


The MaxMeanCut problem is the uniform MaxRatioCut problem, that
is, the special case of the MaxRatioCut problem when all edge weights are
equal to 1. The MaxMeanCut and the MaxRatioCut problems appear, for
example, in the context of the classical minimum-cost flow problem. Gold-
berg and Tarjan [18] showed a simple strongly polynomial iterative method
for solving the minimum-cost flow problem, which is based on computa-
tion of minimum-mean cycles. Ervolina and McCormick [11] (see also [39])
showed an analogous method for solving the dual problem of the minimum-
cost flow problem, which is based on computation of maximum mean-surplus
cuts. Wallacher [48] used minimum mean-weight cost cycles in a generaliza-
tion of Goldberg and Tarjan's method. Analogously, maximum mean-weight
surplus cuts can be used in a generalization of Ervolina and McCormick's
method.
Theorem 5.2 implies that Megiddo's parametric search solves the MaxRatioCut problem in O((T_MaxFlow(n,m))²) = O(n²m² log² n) time. The first parallel algorithm for the maximum flow problem is due to Shiloach and Vishkin [45]. This algorithm runs in O(n² log n) time and uses n processors. Theorem 5.4 implies that Megiddo's parametric search based on Shiloach and Vishkin's algorithm solves the MaxRatioCut problem in O(n³m log³ n) time. So far no parallel maximum-flow algorithm has been designed which would lead to a better bound for Megiddo's parametric search for the MaxRatioCut problem than O*(n³m). Notation O*(·) hides a poly-logarithmic factor.
Theorem 4.6 gives an O(m² log² n) bound on the number of iterations of the Newton method for the MaxRatioCut problem. In the remaining part of this section we show that special properties of the MaxRatioCut problem, notably the "maximum flow - minimum cut" duality, imply that the number of iterations is actually only O(m), provided that the edge weights are nonnegative. This bound implies that the Newton method solves the MaxRatioCut problem with nonnegative edge weights in O(m·T_MaxFlow(n,m)) = O(nm² log n) time. Observe that in the case of the MaxMeanCut problem, an O(m) bound on the number of iterations follows from Theorem 3.5 (in this case a stronger O(n) bound can be shown, see [38]). From now on we assume that the edge weights are nonnegative.
A function r : E → R is a flow in network G if for each edge (x,y) ∈ E,

r(x,y) ≤ u(x,y) and r(x,y) = −r(y,x).

For a flow r, if A ⊆ E, then r(A) ≝ Σ_{e∈A} r(e), and if v ∈ V, then r^(in)(v) denotes the net flow into vertex v:

r^(in)(v) ≝ Σ_{(x,v)∈E} r(x,v).

We define u_r ≝ u − r, d_r ≝ d − r^(in), and, for a cut (S,T),

surplus_r(S,T) ≝ d_r(T) − u_r(S,T).

Values u_r(e) and d_r(v) are commonly called the residual capacity of edge e and the residual demand at vertex v with respect to flow r. A flow r is a maximum flow if it minimizes Σ_{v∈V} |d_r(v)|. We need the following facts from the network-flows theory. Let r, r_max, and (S_max, T_max) be a flow, a maximum flow, and a maximum surplus cut in network G = (V, E, u, d).

F1. For each cut (S,T), surplus_r(S,T) = surplus(S,T).

F2. u_{r_max}(S_max, T_max) = 0 (a maximum flow saturates a maximum-surplus cut).

F3. For each T ⊆ V, d_{r_max}(T) ≤ d_{r_max}(T_max).

F4. Let a new edge-capacity function u' be such that u'(e) ≥ u(e), for each edge e ∈ E. There exists a maximum flow r'_max in network G' = (V, E, u', d) such that

(i) for each vertex v ∈ V, d_{r'_max}(v) and d_{r_max}(v) have the same sign,

(ii) for each edge e ∈ E, |r'_max(e) − r_max(e)| ≤ d_{r_max}(T_max), and

(iii) for each cut (S,T) in G, |r'_max(S,T) − r_max(S,T)| ≤ d_{r_max}(T_max).

These facts can be derived from the properties of network flows described in [34] and [8, Chapter 27]. Fact F1 follows actually directly from the definitions we have introduced (for each flow r and cut (S,T), the differences d(T) − d_r(T) and u(S,T) − u_r(S,T) are the same). Facts F2 and F3

are closely related to the maximum-flow/minimum-cut theorem. Informally speaking, the conditions listed in Fact F4 are satisfied by a maximum flow r'_max in network G' which "extends" the maximum flow r_max in network G.

In our analysis of the Newton method for the MaxRatioCut problem we use the same notation as in Section 3, where this method is introduced. Thus

f_i = surplus(S_i, T_i),   g_i = g(S_i, T_i),
h_i = h(δ_i) = surplus(S_i, T_i) − δ_i g(S_i, T_i) = surplus_{δ_i}(S_i, T_i),    (26)

where (S_i, T_i) is the maximum-surplus cut in network G_{δ_i} = (V, E, u + δ_i w, d) computed in iteration i. Subscript δ will always indicate that the underlying network is network G_δ, that is, network G with the edge-capacity function changed to function u + δw. In particular,

surplus_δ(S,T) ≝ d(T) − (u + δw)(S,T) = surplus(S,T) − δg(S,T),    (27)

surplus_{δ,r}(S,T) ≝ d_r(T) − u_{δ,r}(S,T),    (28)

where r is a flow in network G_δ. The analysis in this section is similar to the analysis of the Newton method for the MaxRatioPath problem presented in the proof of Theorem 4.8. We say that an edge e is essential at the beginning of iteration i, if it belongs to a cut (S_j, T_j), for some j ≥ i. As the computation proceeds, the number of essential edges decreases and we want to analyze this decrement. As in the proofs of the other bounds on the number of iterations of the Newton method, we use Lemma 3.2 and its direct consequence expressed in Inequality (5) to measure the progress of the computation.

Let r_i, for i = 1, 2, ..., t (t is the index of the last iteration) be maximum flows in networks G_{δ_i} such that for each i, all conditions of Fact F4 above are satisfied with G = G_{δ_i}, G' = G_{δ_{i+1}}, r_max = r_i, and r'_max = r_{i+1}. Using (28), Facts F1 and F2, and then (26), we have

d_{r_i}(T_i) = surplus_{δ_i,r_i}(S_i, T_i) + u_{δ_i,r_i}(S_i, T_i) = surplus_{δ_i}(S_i, T_i) = h_i.    (29)

The following lemma gives a simple, sufficient condition for an edge not to be essential.

Lemma 8.1 If

u_{δ_i,r_i}(e) > h_i,    (30)

then edge e is not essential at the beginning of iteration i.

Proof. Let (S,T) be a cut containing an edge e which satisfies Inequality (30). Using Facts F1 and F3, and Equation (29), we obtain

surplus_{δ_i}(S,T) = surplus_{δ_i,r_i}(S,T) = d_{r_i}(T) − u_{δ_i,r_i}(S,T)
  ≤ d_{r_i}(T_i) − u_{δ_i,r_i}(S,T) = h_i − u_{δ_i,r_i}(S,T)
  ≤ h_i − u_{δ_i,r_i}(e) < 0.

Therefore none of the cuts (S_j, T_j), for j ≥ i, can contain edge e, because δ_j ≥ δ_i and

surplus_{δ_j}(S,T) ≤ surplus_{δ_i}(S,T) < 0 ≤ h_j = surplus_{δ_j}(S_j, T_j).

This means that edge e is not essential at the beginning of iteration i. □

The proof of the O(m) bound on the number of iterations shown below in Theorem 8.3 involves quite a few technical details. Therefore, to highlight the main line of the argumentation, we first show a simpler O(m log m) bound.

Theorem 8.2 The Newton method solves the MaxRatioCut problem in O(m log m) iterations, if the edge weights are nonnegative.

Proof. We show that a sequence of k = ⌊log m⌋ + 2 consecutive iterations decreases the number of essential edges at least by 1. This claim immediately implies the O(m log m) bound on the number of iterations. Consider iterations i, i+1, ..., i+k. Inequality (5) implies that

h_{i+k} g_{i+k} ≤ (1/m²) h_{i+1} g_{i+1}.    (31)

If g_{i+k} < (1/m) g_i, then, using the assumption that the edge weights are nonnegative, we know that for some edge e ∈ (S_i, T_i),

w(e) ≥ (1/|(S_i,T_i)|) Σ_{a∈(S_i,T_i)} w(a) ≥ (1/m) g_i > g_{i+k}.
Such an edge e cannot belong to cut (S_{i+k}, T_{i+k}) or to any cut (S_j, T_j), for j ≥ i+k (since the sequence (g_j) is nonincreasing), so e is essential at the beginning of iteration i but is not essential at the beginning of iteration i+k.

If g_{i+k} ≥ (1/m) g_i, then also g_{i+k} ≥ (1/m) g_{i+1}, and Inequality (31) implies that h_{i+k} < (1/m) h_{i+1}. Fact F3 implies that for each vertex v ∈ T_i, d_{r_i}(v) is nonnegative. Hence Fact F4(i) further implies that for each vertex v ∈ T_i, d_{r_{i+k}}(v) is nonnegative, so d_{r_{i+k}}(T_i) must be nonnegative as well. Therefore, using Lemma 4.7 and Fact F1, we obtain

−m h_{i+k} > −h_{i+1} ≥ f_i − δ_{i+k} g_i
  = surplus(S_i, T_i) − δ_{i+k} g(S_i, T_i)
  = surplus_{δ_{i+k}}(S_i, T_i)
  = surplus_{δ_{i+k},r_{i+k}}(S_i, T_i)
  = d_{r_{i+k}}(T_i) − u_{δ_{i+k},r_{i+k}}(S_i, T_i)
  ≥ −u_{δ_{i+k},r_{i+k}}(S_i, T_i).    (32)

The above inequality implies that there is an edge e ∈ (S_i, T_i) such that u_{δ_{i+k},r_{i+k}}(e) > h_{i+k}. Such an edge is essential at the beginning of iteration i, but Lemma 8.1 implies that it is not essential at the beginning of iteration i+k. □

Theorem 8.3 The Newton method solves the MaxRatioCut problem in O(m) iterations, if the edge weights are nonnegative.

Proof. Lemma 3.2 implies that for each iteration i except the last one,

g_{i+1}/g_i ≤ 1/2    (33)

or

h_{i+1}/h_i ≤ 1/2.    (34)

We separately bound the number of iterations for which Inequality (33) holds and the number of iterations for which Inequality (34) holds. Let i_1, i_2, ..., i_q be the indices of the iterations for which Inequality (33) holds. Let μ_j = g_{i_j}, for j = 1, 2, ..., q, and let (a_k)_{k=1}^m be the sequence of the edge weights in nonincreasing order. Sequences (a_k)_{k=1}^m and (μ_j)_{j=1}^q satisfy the conditions of Lemma 8.4 below, so q ≤ m.

Now we bound the number of iterations for which Inequality (34) holds. Here the argument is more involved, because numbers h_i are not simply subsums of a fixed set of m numbers. From now on we consider only iterations for which Inequality (34) holds. To avoid towering subscripts, we number these iterations from 1 to t' ≤ t. Thus "iteration i" and subscripts i refer now to the i-th iteration for which Inequality (34) holds. We use the same notation and definitions as in the proof of Theorem 8.2, taking into account the new numbering of the iterations.

Consider cut (S_i, T_i) = {e_1, e_2, ..., e_p} computed in iteration i. For technical reasons, we assume that there are at least log p + 5 iterations following iteration i, that is, t' ≥ i + log p + 5. The proof of Theorem 8.2 was based on showing that at least one edge from cut (S_i, T_i) becomes unessential by the beginning of iteration i + ⌊log m⌋ + 2. Here we show that there exists l ≥ 3 such that at least l−2 edges from cut (S_i, T_i) become unessential by the beginning of iteration i + l + 2. This claim immediately implies an O(m) bound on the number of iterations.
Fact F4(ii), Equation (29), and Inequality (34) imply that for each edge e ∈ E and index l such that i + 3 ≤ l ≤ t',

|u_{δ_{i+3},r_{l+1}}(e) − u_{δ_{i+3},r_l}(e)| = |r_{l+1}(e) − r_l(e)| ≤ d_{r_l}(T_l) = h_l ≤ (1/2^{l−i−2}) h_{i+2}.    (35)

Analogously, using Fact F4(iii), we have for each cut (S,T) and index l such that i + 3 ≤ l ≤ t',

|u_{δ_{i+3},r_{l+1}}(S,T) − u_{δ_{i+3},r_l}(S,T)| ≤ (1/2^{l−i−2}) h_{i+2}.    (36)

The derivations (32) for k = 3 and Inequality (34) give

u_{δ_{i+3},r_{i+3}}(S_i, T_i) ≥ h_{i+1} ≥ 2 h_{i+2}.    (37)

Using this inequality and Inequality (36), we have for each index l such that i + 4 ≤ l ≤ t',

u_{δ_{i+3},r_l}(S_i, T_i) = u_{δ_{i+3},r_{i+3}}(S_i, T_i)
  + (u_{δ_{i+3},r_{i+4}}(S_i, T_i) − u_{δ_{i+3},r_{i+3}}(S_i, T_i))
  + ... + (u_{δ_{i+3},r_l}(S_i, T_i) − u_{δ_{i+3},r_{l−1}}(S_i, T_i))
  ≥ u_{δ_{i+3},r_{i+3}}(S_i, T_i) − h_{i+2} (1/2 + 1/2² + ... + 1/2^{l−i−3})
  > h_{i+2}.    (38)

For each l = 1, 2, ..., t'−i−2 and j = 1, 2, ..., p, let

a_{l,j} ≝ u_{δ_{i+3},r_{i+2+l}}(e_j) / h_{i+2}.

Inequalities (37) and (38) imply that matrix (a_{l,j}) satisfies Condition 2 of Lemma 8.5 below (since we have Σ_{j=1}^p a_{l,j} = u_{δ_{i+3},r_{i+2+l}}(S_i, T_i)/h_{i+2}). Inequality (35) implies that Condition 3 is satisfied as well. Condition 1 holds because we assumed that t' ≥ i + log p + 5. Thus we can apply Lemma 8.5 to matrix (a_{l,j}). According to the definition of "good entries" in the statement of Lemma 8.5, if a_{l,j} is a good entry, then there exists 1 ≤ l' ≤ l such that a_{l',j} > 1/2^{l'}. If a_{l',j} > 1/2^{l'}, then

u_{δ_{i+2+l'},r_{i+2+l'}}(e_j) ≥ u_{δ_{i+3},r_{i+2+l'}}(e_j) = a_{l',j} h_{i+2} > (1/2^{l'}) h_{i+2} ≥ h_{i+2+l'},

so Lemma 8.1 implies that edge e_j is unessential at the beginning of iteration i + l + 2. Lemma 8.5 implies that there exists l ≥ 3 such that there are at least l−2 good entries among entries a_{l,1}, a_{l,2}, ..., a_{l,p}. The l−2 edges corresponding to these l−2 entries are essential at the beginning of iteration i but become unessential by iteration i + l + 2. □

It remains to prove the following two technical lemmas.

Lemma 8.4 Let a_1 ≥ a_2 ≥ ... ≥ a_m ≥ 0 and μ_1 > μ_2 > ... > μ_q > 0 be such that

1. μ_{j+1} ≤ (1/2)μ_j, for j = 1, 2, ..., q−1, and

2. μ_j ≤ Σ{ a_k | a_k ≤ μ_j }, for j = 1, 2, ..., q.

Then q ≤ m.

Proof. Let ā_k = a_k + a_{k+1} + ... + a_m. Condition 2 implies that μ_1 ≤ ā_1, and if μ_j < a_m = ā_m, then μ_j ≤ 0. Thus all numbers μ_j lie in the interval [ā_m, ā_1]. To prove that q ≤ m, we show that each of the intervals (ā_m, ā_{m−1}], (ā_{m−1}, ā_{m−2}], ..., (ā_2, ā_1] contains at most one element from sequence (μ_j). Let μ_j ∈ (ā_{k+1}, ā_k], for some 1 ≤ j ≤ q−1 and 1 ≤ k ≤ m−1. We show that μ_{j+1} ≤ ā_{k+1}, so μ_{j+1} ∉ (ā_{k+1}, ā_k]. If ā_{k+1} ≥ (1/2)ā_k, then

μ_{j+1} ≤ (1/2)μ_j ≤ (1/2)ā_k ≤ ā_{k+1}.

If ā_{k+1} < (1/2)ā_k, then

μ_{j+1} ≤ (1/2)μ_j ≤ (1/2)ā_k < a_k.

The above inequality and Condition 2 imply that μ_{j+1} ≤ ā_{k+1}. □

Lemma 8.5 Let (a_{l,j}) be a q × p matrix such that

1. q ≥ log p + 3,

2. Σ_{j=1}^p a_{l,j} ≥ 1, for each 1 ≤ l ≤ q, and

3. |a_{l+1,j} − a_{l,j}| ≤ 1/2^l, for each 1 ≤ l ≤ q−1 and 1 ≤ j ≤ p.

If a_{l,j} > 1/2^l, then this and all subsequent entries in column j are "good entries." There exists l such that 3 ≤ l ≤ q and row l of the matrix contains at least l−2 good entries.

Proof. Condition 3 implies that if 1 ≤ l' < l'' ≤ q and 1 ≤ j ≤ p, then

a_{l'',j} < a_{l',j} + 1/2^{l'} + 1/2^{l'+1} + ... + 1/2^{l''−1} < a_{l',j} + 1/2^{l'−1}.    (39)

If a_{l,j} is the first good entry in column j and l ≥ 2, then

a_{l,j} ≤ a_{l−1,j} + 1/2^{l−1} ≤ 1/2^{l−1} + 1/2^{l−1} = 1/2^{l−2}.    (40)

The first inequality above follows from Condition 3, and the second one holds because entry a_{l−1,j} is not a good entry.

Assume that for each 3 ≤ l ≤ q, row l contains at most l−3 good entries. (In particular, there are no good entries in rows 1, 2 and 3.) We show that this assumption implies that the sum of the entries in the last row is less than 1, which contradicts Condition 2 of the lemma. Let j_1, j_2, ..., j_r be the indices of all columns which have at least one good entry. Let a_{l_1,j_1}, a_{l_2,j_2}, ..., a_{l_r,j_r} be the first good entries in these columns. Let j_1, j_2, ..., j_r be ordered in such a way that l_1 ≤ l_2 ≤ ... ≤ l_r. Hence for each k = 1, 2, ..., r, entries a_{l_k,j_1}, a_{l_k,j_2}, ..., a_{l_k,j_k} are good entries, so row l_k contains at least k good entries. Our assumption says that row l_k contains at most l_k − 3 good entries, so

k ≤ l_k − 3.    (41)

The sum of the entries in the last row is at most

Σ_{j∉{j_1,...,j_r}} a_{q,j} + Σ_{k=1}^r a_{q,j_k} < p·(1/2^q) + Σ_{k=1}^r ( a_{l_k,j_k} + 1/2^{l_k−1} ) ≤ 1/8 + Σ_{k=1}^r 3/2^{l_k−1} ≤ 1/8 + Σ_{k=1}^r 3/2^{k+2} < 1.

The first inequality follows from Inequality (39), the second one from Condition 1 and Inequality (40), and the third one from Inequality (41). □

9 Concluding Remarks
The binary search, the Newton method, and Megiddo's parametric search
form together a powerful set of methods for solving fractional combinatorial
optimization problems. All three methods reduce a fractional problem to
a sequence of instances of the corresponding non-fractional problem. Im-
plementations of the binary search and the Newton method are straightfor-
ward once a procedure for solving the non-fractional problem is provided.
Implementation of Megiddo's parametric search is a bit more complicated,
because this method, unlike the other two, does not treat a procedure for
the underlying non-fractional problem as a black box, but has to examine
its structure and modify it appropriately.
Megiddo originally proposed his parametric search method in the context
of fractional combinatorial optimization problems such as the minimum-
ratio spanning-tree problem and the maximum profit-to-time cycle prob-
lem [31]. Since then, however, his method has been extended to far more
general optimization problems; see for example [5, 6, 21, 33, 46]. If we use

Megiddo's parametric search method to obtain a fast algorithm for a given fractional combinatorial optimization problem, our main task is to find a parallel algorithm for the underlying non-fractional problem which exhibits the right trade-off between the parallel running time and the total amount of computation performed.

The Newton method for general fractional optimization has been extensively studied since Dinkelbach [10] introduced it in 1967, and a number of modifications and extensions of this method have been proposed and analyzed [23, 35, 40]. However, analyzing the Newton method for fractional combinatorial optimization problems is a relatively young research direction (the results presented in Sections 4 and 8 come from the early 90's [37, 38]) and there are quite a few questions which remain to be answered. For example, how tight is the O(p² log² p) bound on the number of iterations of the Newton method for linear fractional combinatorial optimization shown in Theorem 4.6? The main tool used in the derivation of this bound is Lemma 4.1, which gives an O(p log p) bound on the length of a geometric sequence of subsums of a p-element set. We know that this O(p log p) bound is tight [19], but there may be a better way of using it in the analysis of the Newton method.
For problems such as MaxRatioPath, MaxRatioCycle, and MaxRatioCut, the Newton method requires only a linear number of iterations, provided that the edge weights are non-negative, and the proofs of these three bounds are remarkably similar to each other (compare the proofs of Theorems 4.8 and 8.2). However, a unified framework which would generalize these three examples has not been proposed yet. Such a framework would have to be based on the "primal-dual" structure of problems such as the three problems mentioned here. Another related task is to examine if the assumption that the edge weights are non-negative is really necessary to obtain a linear bound on the number of iterations.
The Newton method for fractional combinatorial optimization is an application of the classical Newton method to computing the root of a piecewise linear, convex function h defined in (1). The number of linear segments of function h is an obvious upper bound on the number of iterations of this method. The artificial example presented in Section 4 shows that function h may consist of an exponential number of linear segments, even for a linear fractional combinatorial optimization problem. The results of Gusfield [20] (upper bound) and Carstensen [3] (lower bound) imply that for the MaxRatioPath and the MaxRatioCycle problems function h consists, in the worst case, of n^{Θ(log n)} linear segments. Such tight bounds, however, are known only for relatively few problems. For example, no tight bounds on the complexity of function h for the MaxRatioCut problem and the maximum-ratio spanning-tree problem have been reported yet.
The Newton method constructs a sequence of improving lower bounds
converging to the optimum objective value, but since it does not provide
upper bounds, no bound on the error is available at any given iteration.
There are methods for general fractional optimization, which are based on
the Newton method but construct also improving upper bounds (see, for
example, [23, 35, 40]). It is not clear yet if these methods can lead also to
interesting results for fractional combinatorial optimization.
The methods for solving fractional combinatorial optimization problems rely on methods for solving the corresponding non-fractional optimization problems. The underlying non-fractional problem may, however, be difficult on its own, and a practical algorithm for solving this problem exactly may not exist. Hashizume et al. [22] analyze Megiddo's parametric search method for maximizing (a_0 + ax)/(b_0 + bx), when the algorithm for the underlying non-fractional problem of maximizing (a − δb)x returns only approximate solutions. They show that the accuracy of the approximate solution obtained for the fractional problem is at least as good as the accuracy of the solutions computed for the non-fractional problem. As an example of their analysis, they present a fully polynomial approximation scheme for the fractional 0-1 knapsack problem. Examples of analogous analyses for the binary search method are shown in [25, 30]. However, a general approximation analysis of the Newton method has not been proposed yet.

References

[1] R. E. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87-90, 1958.

[2] M. Blum, R. W. Floyd, V. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448-461, 1973.

[3] P. J. Carstensen. The Complexity of Some Problems in Parametric, Linear, and Combinatorial Programming. PhD thesis, Department of Mathematics, University of Michigan, Ann Arbor, Michigan, 1983.

[4] R. Chandrasekaran. Minimum ratio spanning trees. Networks, 7:335-342, 1977.

[5] E. Cohen and N. Megiddo. Maximizing concave functions in fixed dimension. In P. Pardalos, editor, Complexity in Numerical Optimization, pages 74-87. World Scientific, 1993.

[6] E. Cohen and N. Megiddo. Strongly polynomial time and NC algorithms for detecting cycles in dynamic graphs. J. Assoc. Comput. Mach., 40(4):791-830, 1993.

[7] R. Cole. Parallel merge sort. SIAM J. Comput., 17(4):770-785, August 1988.

[8] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.

[9] G. B. Dantzig, W. O. Blattner, and M. R. Rao. Finding a cycle in a graph with minimum cost-to-time ratio with application to a ship routing problem. In P. Rosenstiehl, editor, Theory of Graphs, pages 77-84. Gordon and Breach, New York, 1967.

[10] W. Dinkelbach. On nonlinear fractional programming. Management Science, 13:492-498, 1967.

[11] T. R. Ervolina and S. T. McCormick. Two strongly polynomial cut cancelling algorithms for minimum cost network flow. Discrete Applied Math., 46:133-165, 1993.

[12] L. R. Ford, Jr. and D. R. Fulkerson. Flows in Networks. Princeton Univ. Press, Princeton, NJ, 1962.

[13] B. Fox. Finding minimal cost-time ratio circuits. Oper. Res., 17:546-551, 1969.

[14] M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. J. Assoc. Comput. Mach., 34:596-615, 1987.

[15] M. Goemans. Personal communication, 1992.

[16] A. V. Goldberg. Scaling algorithms for the shortest paths problem. SIAM J. Comput., 24(3):494-504, 1995.

[17] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum flow problem. J. Assoc. Comput. Mach., 35:921-940, 1988.

[18] A. V. Goldberg and R. E. Tarjan. Finding minimum-cost circulations by canceling negative cycles. J. Assoc. Comput. Mach., 36:388-397, 1989.

[19] M. Goldmann. On a subset-sum problem. Personal communication, 1994.

[20] D. Gusfield. Sensitivity analysis for combinatorial optimization. Technical Report UCB/ERL M90/22, Electronics Research Laboratory, University of California, Berkeley, May 1990.

[21] D. Gusfield. Parametric combinatorial computing and a problem of program module distribution. J. Assoc. Comput. Mach., 30(3):551-563, 1983.

[22] S. Hashizume, M. Fukushima, N. Katoh, and T. Ibaraki. Approximation algorithms for combinatorial fractional programming problems. Math. Programming, 37:255-267, 1987.

[23] T. Ibaraki. Parametric approaches to fractional programs. Math. Programming, 26:345-362, 1983.

[24] H. Ishii, T. Ibaraki, and H. Mine. Fractional knapsack problems. Math. Programming, 13:255-271, 1976.

[25] K. Iwano, S. Misono, S. Tezuka, and S. Fujishige. A new scaling algorithm for the maximum mean cut problem. Algorithmica, 11(3):243-255, 1994.

[26] R. M. Karp. A characterization of the minimum cycle mean in a digraph. Discrete Math., 23:309-311, 1978.

[27] V. King, S. Rao, and R. Tarjan. A faster deterministic maximum flow algorithm. J. Alg., 17(3):447-474, 1994.

[28] E. L. Lawler. Optimal cycles in doubly weighted directed linear graphs. In P. Rosenstiehl, editor, Theory of Graphs, pages 209-213. Gordon and Breach, New York, 1967.

[29] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, New York, NY, 1976.

[30] S. T. McCormick. A note on approximate binary search algorithms for mean cuts and cycles. UBC Faculty of Commerce Working Paper 92-MSC-021, University of British Columbia, Vancouver, Canada, 1992.

[31] N. Megiddo. Combinatorial optimization with rational objective functions. Math. Oper. Res., 4:414-424, 1979.

[32] N. Megiddo. Applying parallel computation algorithms in the design of serial algorithms. J. Assoc. Comput. Mach., 30:852-865, 1983.

[33] C. Haibt Norton, S. A. Plotkin, and E. Tardos. Using separation algorithms in fixed dimension. J. Alg., 13:79-98, 1992.

[34] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.

[35] P. M. Pardalos and A. T. Phillips. Global optimization of fractional programs. J. Global Opt., 1:173-182, 1991.

[36] T. Radzik. Newton's method for fractional combinatorial optimization. In Proc. 33rd IEEE Annual Symposium on Foundations of Computer Science, pages 659-669, 1992.

[37] T. Radzik. Newton's method for fractional combinatorial optimization. Technical Report STAN-CS-92-1406, Department of Computer Science, Stanford University, January 1992.

[38] T. Radzik. Parametric flows, weighted means of cuts, and fractional combinatorial optimization. In P. Pardalos, editor, Complexity in Numerical Optimization, pages 351-386. World Scientific, 1993.

[39] T. Radzik and A. Goldberg. Tight bounds on the number of minimum-mean cycle cancellations and related results. Algorithmica, 11:226-242, 1994.

[40] S. Schaible. Fractional programming 2. On Dinkelbach's algorithm. Management Sci., 22:868-873, 1976.

[41] S. Schaible. A survey of fractional programming. In S. Schaible and W. T. Ziemba, editors, Generalized Concavity in Optimization and Economics, pages 417-440. Academic Press, New York, 1981.

[42] S. Schaible. Bibliography in fractional programming. Zeitschrift für Operations Res., 26(7), 1982.

[43] S. Schaible. Fractional programming. Zeitschrift für Operations Res., 27:39-54, 1983.

[44] S. Schaible and T. Ibaraki. Fractional programming. Europ. J. of Operational Research, 12, 1983.

[45] Y. Shiloach and U. Vishkin. An O(n² log n) parallel max-flow algorithm. J. Algorithms, 3:128-146, 1982.

[46] S. Toledo. Maximizing non-linear concave functions in fixed dimension. In P. Pardalos, editor, Complexity in Numerical Optimization, pages 429-447. World Scientific, 1993.

[47] L. G. Valiant. Parallelism in comparison problems. SIAM J. Comput., 4:348-355, 1975.

[48] C. Wallacher. A generalization of the minimum-mean cycle selection rule in cycle canceling algorithms. Unpublished manuscript, Institut für Angewandte Mathematik, Technische Universität Carolo-Wilhelmina, Germany, November 1989.

[49] A. C. Yao. An O(|E| log log |V|) algorithm for finding minimum spanning trees. Information Processing Lett., 4:21-23, 1975.
479

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)
D.-Z. Du and P.M. Pardalos (Eds) pp. 479-532
©1998 Kluwer Academic Publishers

Reformulation-Linearization Techniques for
Discrete Optimization Problems

Hanif D. Sherali
Department of Industrial and Systems Engineering
Virginia Polytechnic Institute and State University,
Blacksburg, VA 24061-0118
E-mail: hanifs@vt.edu

Warren P. Adams
Department of Math Sciences
Clemson University, Clemson, SC 29634-1907

1 Introduction
Discrete and continuous nonconvex programming problems arise in a host
of practical applications in the context of production, location-allocation,
distribution, economics and game theory, process design, and engineering
design situations. Several recent advances have been made in the develop-
ment of branch-and-cut algorithms for discrete optimization problems and
in polyhedral outer-approximation methods for continuous nonconvex pro-
gramming problems. At the heart of these approaches is a sequence of linear
programming problems that drive the solution process. The success of such
algorithms is strongly tied in with the strength or tightness of the linear
programming representations employed.
This paper addresses the issue of generating tight linear programming
(LP) representations via automatic reformulation techniques in solving dis-
crete mixed-integer 0-1 linear (and polynomial) programming problems. The
particular approach that we focus on in this paper is called the Reformu-
lation Linearization Technique (RLT) , a procedure that can be used
to generate tight linear (or sometimes convex) programming representation,
480 H.D. Sherali and W.P. Adams

for constructing not only exact solution algorithms, but also to design pow-
erful heuristic procedures. Actually, this methodology can be extended to
solve classes of continuous nonconvex programs as well, and we refer the
interested reader to Sherali and Tuncbilek (1992, 1995, 1996), and Sherali
(1996) for an introduction to this subject. (See Sherali and Adams, 1996,
for a general survey on this topic.) The tight relaxations produced can often
be used to derive good quality solutions (in polynomial time) to problems of
practical sizes that arise in the aforementioned applications. Also, the RLT
procedure is capable of generating representations of increasing degrees of
strength, but with an accompanied increase in problem size. Coupled with
the recent advances in LP technology, this permits one to incorporate tighter
RLT based representations within the context of exact or heuristic methods.
The motivation for constructing "good" models, that is, models that
have tight underlying linear programming representations, rather than sim-
ply "mathematically correct" models, has led to some crucial and critical
research on the model formulation process as in Balas (1985), Jeroslow
and Lowe (1984, 1985), Johnson (1989), Meyer (1981), Sherali and Adams
(1989, 1990), Williams (1985), and Wolsey (1989). Much work has also
been done in converting classes of separable or polynomial nonlinear integer
programming problems into equivalent linear integer programs, and for gen-
erating tight, valid inequalities for such problems, as in Adams and Sherali
(1986, 1987a,b), Balas and Mazzola (1984a,b), Glover (1975), and Sherali
and Adams (1989, 1990). Sometimes, a special variable redefinition tech-
nique may be applicable depending on the problem structure as in Martin
(1987). In this approach, a linear transformation is defined on the variables
to yield an equivalent formulation that tightens the continuous relaxation
by constructing a partial convex hull of a specially structured subset of
constraints. A more generally applicable technique is to augment the for-
mulation through the addition of valid or implied inequalities that typically
provide some partial characterization for the convex hull of feasible solu-
tions. Some cutting plane generation schemes in this vein include the ones
described in Nemhauser and Wolsey (1990), Padberg (1980), Van Roy and
Wolsey (1983), and Wolsey (1976). Automatic reformulation procedures
utilizing such constraint generation schemes within a branch-and-bound or
branch-and-cut framework are presented in Crowder et al. (1983), Hoffman
and Padberg (1991), Johnson et al. (1985), Nemhauser et al. (1991), Oley
and Sjouquist (1982), Padberg and Rinaldi (1991), and Van Roy and Wolsey
(1987). Besides these studies, ample evidence is available in the literature
on the efficacy of providing tight linear programming relaxations for pure
and mixed zero-one programming problems as in Adams and Sherali (1986,
1987a,b), Crowder and Padberg (1986), Geoffrion and McBryde (1979), and
Magnanti and Wong (1981), among many others.
In this paper, we shall be discussing in detail the Reformulation - Lin-
earization Technique (RLT) of Sherali and Adams (1989, 1990, 1994) along
with its several enhancements and extensions. This procedure is also an
automatic reformulation technique that can be used to derive tight LP rep-
resentations as well as to generate strong valid inequalities. Consider a
mixed-integer zero-one linear programming problem whose feasible region
X is defined in terms of some inequalities and equalities in binary variables
x = (Xl, ... , xn) and a set of bounded continuous variables Y = (YI, ... , Ym).
Given a value of d E {I, ... , n} , this RLT procedure constructs various
polynomial factors of degree d comprised of the product of some d binary
variables or their complements. These factors are then used to multiply
each of the constraints defining X (including the variable bounding restric-
tions), to create a (nonlinear) polynomial mixed-integer zero-one program-
ming problem. Using the relationship $x_j^2 = x_j$ for each binary variable
$x_j$, $j = 1, \dots, n$, substituting a variable $w_J$ and $v_{Jk}$, respectively, in place
of each nonlinear term of the type $\prod_{j \in J} x_j$ and $y_k \prod_{j \in J} x_j$, and relaxing
integrality, the nonlinear polynomial problem is re-linearized into a higher
dimensional polyhedral set Xd defined in terms of the original variables
(x, y) and the new variables (w, v). For $X_d$ to be equivalent to $X$, it is only
necessary to enforce $x$ to be binary valued and the remaining variables may
be treated as continuous valued, since the binariness on the x-variables is
shown to automatically enforce the required product relationships on the
w- and v-variables. Denoting the projection of $X_d$ onto the space of the
original (x, y)-variables as $X_{Pd}$, it can be shown that as $d$ varies from 1 to
$n$, we get

$$X_{P0} \supseteq X_{P1} \supseteq X_{P2} \supseteq \cdots \supseteq X_{Pn} = \mathrm{conv}(X),$$

where $X_{P0}$ is the ordinary linear programming relaxation, and $\mathrm{conv}(X)$


represents the convex hull of X. The hierarchy of higher-dimensional rep-
resentations produced in this manner markedly strengthens the usual con-
tinuous relaxation, as is evidenced not only by the fact that the convex hull
representation is obtained at the highest level, but that in computational
studies on many classes of problems, even the first level representation helps
design algorithms that significantly dominate existing procedures in the lit-
erature (see Sherali and Adams, 1996). The theoretical implications of this
hierarchy are noteworthy; the resulting representations subsume and unify
many published linearization methods for nonlinear 0-1 programs, and the
algebraic representation available at level n promotes new methods for iden-
tifying and characterizing facets and valid linear inequalities in the original
variable space, as well as for providing information that directly bridges
the gap between discrete and continuous sets. Indeed, since the level-n for-
mulation characterizes the convex hull, all valid inequalities in the original
variable space must be obtainable via a suitable projection; thus such a
projection operation serves as an all-encompassing tool for generating valid
inequalities.
Sherali and Adams (1989) also demonstrate the relationship between
their hierarchy of relaxations and that which can be generated through dis-
junctive programming techniques. Balas (1985) has shown how a hierar-
chy spanning the spectrum from the linear programming relaxation to the
convex hull of feasible solutions can be generated for linear mixed-integer
zero-one programming problems by inductively representing the feasible re-
gion at each stage as a conjunction of disjunctions, and then taking its hull
relaxation. This hull relaxation amounts to constructing the intersection of
the individual convex hulls of the different disjunctive sets, and hence yields
a relaxation. Sherali and Adams show that their hierarchy produces a dif-
ferent, stronger set of intermediate relaxations that lead to the underlying
convex hull representation. To view their relaxations as Balas' hull relax-
ations requires manipulating the representation at any stage d to write it as
a conjunction of a non-standard set of disjunctions. Moreover, by the nature
of the RLT approach, Sherali and Adams also show (see Section 5) how one
can readily construct a hierarchy of linear relaxations leading to the con-
vex hull representation for mixed-integer zero-one polynomial programming
problems having no cross-product terms among continuous variables.
In connection with the final comment above, we remark that Boros et al.
(1989) have also independently developed a similar hierarchy for the special
case of the unconstrained, quadratic pseudo-Boolean programming problem.
For this case, they construct a standard linear programming relaxation that
coincides with our relaxation at level d = 1 , and then show in an existen-
tial fashion how a hierarchy of relaxations indexed by d = 1, ... , n leading
up to the convex hull representation at level n can be generated. This is
done by including at level d, constraints corresponding to the extreme di-
rections of the cone of nonnegative quadratic pseudo-Boolean functions that
involve at most d of the n-variables. Each such relaxation can be viewed
as the projection of one of our explicitly stated higher-order relaxations
onto the variable space of the first level for this special case. Moreover, our
approach also permits one to consider general pseudo-Boolean polynomials,
constrained problems, as well as mixed-integer situations.
Lovász and Schrijver (1989) have also independently proposed a similar
hierarchy of relaxations for linear, pure 0-1 programming problems, which
essentially amounts to deriving Xl from Xo, finding the projection XPI,
and then repeating this step by replacing Xo with XPI. Continuing in
this fashion, they show that in n steps, conv(X) is obtained. However,
from a practical viewpoint, while the relaxations $X_1, X_2, \dots$ of Sherali and
Adams are explicitly available and directly implementable, the projections
required by Lovász and Schrijver for level two and higher order relaxations
are computationally burdensome, necessitating the potentially exponential
task of vertex enumeration. Moreover, extensions to mixed-integer or to
nonlinear zero-one problems are not evident using this development.
Another hierarchy along the same lines has been proposed by Balas et al.
(1993). In this hierarchy, the set $X_1$ is generated as in Sherali and Adams
(1989), but using factors involving only one of the binary variables, say, $x_1$.
Projecting the resulting formulation onto the space of the original variables,
produces the convex hull of feasible solutions to the original LP relaxation
with the added restriction that $x_1$ is binary valued. This follows from Sher-
ali and Adams (1989) since it is equivalent to treating only $x_1$ as binary
valued and the remaining variables as continuous, and then generating the
convex hull representation. Using the fact that $x_1$ is now binary valued at all
vertices, this process is then repeated using another binary variable, say, $x_2$,
in order to determine the convex hull of feasible vertices at which both $x_1$
and $x_2$ are binary valued. Continuing with the remaining binary variables
$x_3, \dots, x_n$ in this fashion, produces the convex hull representation at the fi-
nal stage. Based on this construct, that amounts to a specialized application
of Sherali and Adams' hierarchy, Balas et al. describe and test a cutting
plane algorithm. Encouraging computational results are reported, despite
the fact that this specialization produces weaker relaxations than the origi-
nal Sherali-Adams' relaxations. Since this cutting plane generation scheme
is based simply on the first level of Sherali and Adams' hierarchy in which
binariness is enforced on only a single variable at a time, the prospect of
enhancing computational performance by considering multiple binary vari-
ables at a time appears to be promising.
The remainder of this paper is organized as follows. We begin in Sec-
tion 2 by presenting the basic RLT procedure of Sherali and Adams (1989,
1990, 1994). Its various properties are summarized in Section 3, and ideas
for characterizing and generating facets of the convex hull of feasible solu-
tions (or strong valid inequalities) are presented in Section 4. Extensions to
multilinear programs and specializations for equality constrained situations
are addressed in Section 5 and are illustrated using the popular quadratic
assignment problem. Section 6 deals with a different RLT process proposed
by Sherali, Adams and Driscoll (1996) that can be used to exploit special
structures inherent in the problem, and Section 7 provides illustrations for
some such structures, including generalized/variable upper bounding, cover-
ing, partitioning and packing constraints, and even problem sparsity. In any
RLT approach, a significant additional tightening can be obtained by using
conditional logic while generating the RLT constraints. This is discussed in
Section 8 and is illustrated on the extensively-studied Traveling Salesman
Problem. Section 9 concludes the paper by briefly discussing some special
persistency results, extensions to general integer programs, and various com-
putational guidelines. To lighten the reading, we omit proofs in this paper,
and we refer the reader to the original papers for such details.

2 RLT Hierarchy for Mixed Integer Zero-One Problems
Consider a linear mixed integer 0-1 programming problem whose (nonempty)
feasible region is given as follows:
$$X = \Big\{(x, y) : \sum_{j=1}^{n} \alpha_{rj} x_j + \sum_{k=1}^{m} \gamma_{rk} y_k \ge \beta_r, \text{ for } r = 1, \dots, R,$$
$$0 \le x \le e_n,\ x \text{ integer},\ 0 \le y \le e_m\Big\}, \qquad (1)$$
where $e_n$ and $e_m$ are, respectively, column vectors of $n$ and $m$ entries of
1, and where the continuous variables $y_k$ are assumed to be bounded and
appropriately scaled to lie in the interval [0, 1] for $k = 1, \dots, m$. (Upper
bounds on the continuous variables are imposed here only for convenience
in exposition, as we comment on later in the discussion.) Note that any
equality constraints present in the formulation can be accommodated in
a similar manner as are the inequalities in the following derivation, and
we omit writing them explicitly in (1) only to simplify the presentation.
However, we will show later that the equality constraints can be treated in
a special manner which, in fact, might sometimes encourage the writing of
the R inequalities in (1), as equalities by using slack variables.
In this section, we present the basic construction process of the Refor-
mulation Linearization Technique (RLT). For the region described in (1),
given any level $d \in \{0, \dots, n\}$, this technique first converts the constraint
set into a polynomial mixed-integer zero-one set of restrictions by multiply-
ing the constraints with some suitable d-degree polynomial factors involving
the n binary variables and their complements, and subsequently linearizes
the resulting problem through appropriate variable transformations. For
each level d, this produces a higher dimensional representation of the fea-
sible region (I) in terms of the original variables x and y, and some new
variables (w, v) that are defined to linearize the problem. Relaxing integral-
ity, the projection, or "shadow," of this higher dimensional polyhedral set
on the original variable space produces a tighter envelope for the convex hull
of feasible solutions to (1), than does its ordinary linear programming (LP)
or continuous relaxation. In fact, as d varies from zero to n, we obtain a
hierarchy of such relaxations or shadows, each nested within the previous
one, spanning the spectrum from the ordinary linear programming relax-
ation to the convex hull of feasible solutions. As mentioned earlier, the first
level relaxation has itself proven to be sufficiently tight to benefit solution
algorithms for several classes of problems. The RLT process for constructing
a relaxation $X_d$ of the region $X$ defined in (1) corresponding to any level
$d \in \{0, 1, \dots, n\}$ proceeds as follows. For $d = 0$, the relaxation $X_0$ is simply
the LP relaxation obtained by deleting the integrality restrictions on the
x-variables. In order to construct the relaxation for any level $d \in \{1, \dots, n\}$,
let us consider the bound-factors $x_j \ge 0$ and $(1 - x_j) \ge 0$ for $j = 1, \dots, n$,
and let us compose bound-factor products of degree (or order) $d$ by selecting
some $d$ distinct variables from the set $x_1, \dots, x_n$, and by using either the
bound-factor $x_j$ or $(1 - x_j)$ for each selected variable in a product of these
terms. Mathematically, for any $d \in \{1, \dots, n\}$, and for each possible se-
lection of $d$ distinct variables, these (nonnegative polynomial) bound-factor
products of degree (or order) $d$ are given by

$$F_d(J_1, J_2) = \Big[\prod_{j \in J_1} x_j\Big]\Big[\prod_{j \in J_2} (1 - x_j)\Big] \text{ for each } J_1, J_2 \subseteq N \equiv \{1, \dots, n\}$$
$$\text{such that } J_1 \cap J_2 = \emptyset \text{ and } |J_1 \cup J_2| = d. \qquad (2)$$

Any $(J_1, J_2)$ satisfying the conditions in (2) will be said to be of order
$d$. For example, for $n = 3$ and $d = 2$, these factors are $x_1 x_2$, $x_1 x_3$, $x_2 x_3$,
$x_1(1 - x_2)$, $x_1(1 - x_3)$, $x_2(1 - x_1)$, $x_2(1 - x_3)$, $x_3(1 - x_1)$, $x_3(1 - x_2)$,
$(1 - x_1)(1 - x_2)$, $(1 - x_1)(1 - x_3)$, and $(1 - x_2)(1 - x_3)$. In general, there are
$\binom{n}{d} 2^d$ such factors. For convenience, we will consider the single factor of
degree zero to be $F_0(\emptyset, \emptyset) \equiv 1$, and accordingly assume products over null
sets to be unity. Using these factors, let us construct a relaxation $X_d$ of $X$,
for any given $d \in \{0, \dots, n\}$, using the following two steps that comprise the
Reformulation-Linearization Technique (RLT).
Step 1 (Reformulation Phase): Multiply each of the inequalities in
(1), including $0 \le x \le e_n$ and $0 \le y \le e_m$, by each of the factors $F_d(J_1, J_2)$
of degree $d$ as defined in (2). Upon using the identity $x_j^2 \equiv x_j$ (and so
$x_j(1 - x_j) = 0$) for each binary variable $x_j$, $j = 1, \dots, n$, this gives the
following set of additional, implied, nonlinear constraints:

$$\Big[\sum_{j=1}^{n} \alpha_{rj} x_j + \sum_{k=1}^{m} \gamma_{rk} y_k - \beta_r\Big] F_d(J_1, J_2) \ge 0$$
$$\text{for } r = 1, \dots, R, \text{ and for each } (J_1, J_2) \text{ of order } d \qquad (3)$$
$$F_D(J_1, J_2) \ge 0 \text{ for each } (J_1, J_2) \text{ of order } D \equiv \min\{d + 1, n\}, \qquad (4)$$
$$F_d(J_1, J_2) \ge y_k F_d(J_1, J_2) \ge 0$$
$$\text{for } k = 1, \dots, m \text{ and for each } (J_1, J_2) \text{ of order } d. \qquad (5)$$

Step 2 (Linearization Phase): Viewing the constraints in (3)-(5) in
expanded form as a sum of monomials, linearize them by substituting the
following variables for the corresponding nonlinear terms for each $J \subseteq N$:

$$w_J = \prod_{j \in J} x_j \text{ and } v_{Jk} \equiv y_k \prod_{j \in J} x_j, \text{ for } k = 1, \dots, m. \qquad (6)$$

We will assume the notation that

$$w_j \equiv x_j \text{ for } j = 1, \dots, n, \quad w_\emptyset \equiv 1, \text{ and } v_{\emptyset k} \equiv y_k \text{ for } k = 1, \dots, m. \qquad (7)$$

Denoting by $f_d(J_1, J_2)$ and $f_d^k(J_1, J_2)$ the respective linearized forms of the
polynomial expressions $F_d(J_1, J_2)$ and $y_k F_d(J_1, J_2)$ under such a substitu-
tion, we obtain the following polyhedral set $X_d$ whose projection onto the
(x, y) space is claimed to yield a relaxation for $X$:

$$X_d = \Big\{(x, y, w, v) :$$
$$\Big[\sum_{j \in J_1} \alpha_{rj} - \beta_r\Big] f_d(J_1, J_2) + \sum_{j \in N - (J_1 \cup J_2)} \alpha_{rj} f_{d+1}(J_1 + j, J_2)$$
$$+ \sum_{k=1}^{m} \gamma_{rk} f_d^k(J_1, J_2) \ge 0$$
$$\text{for } r = 1, \dots, R, \text{ and for each } (J_1, J_2) \text{ of order } d \qquad (8)$$
$$f_D(J_1, J_2) \ge 0 \text{ for each } (J_1, J_2) \text{ of order } D \equiv \min\{d + 1, n\}, \qquad (9)$$
$$f_d(J_1, J_2) \ge f_d^k(J_1, J_2) \ge 0$$
$$\text{for } k = 1, \dots, m, \text{ and for each } (J_1, J_2) \text{ of order } d\Big\}. \qquad (10)$$

Let us denote the projection of $X_d$ onto the space of the original variables
(x, y) by

$$X_{Pd} = \{(x, y) : (x, y, w, v) \in X_d\} \text{ for } d = 0, 1, \dots, n. \qquad (11)$$

Then, Sherali and Adams (1989, 1994) show that

$$X_{P0} \equiv X_0 \supseteq X_{P1} \supseteq X_{P2} \supseteq \cdots \supseteq X_{Pn} \equiv \mathrm{conv}(X) \qquad (12)$$

where $X_{P0} \equiv X_0$ (for $d = 0$) denotes the ordinary linear programming relax-
ation, and $\mathrm{conv}(X)$ denotes the convex hull of $X$.
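
To make the construction concrete, the following small Python sketch (an
illustration of ours, not part of the original development; all function names
are hypothetical) enumerates the bound-factor products $F_d(J_1, J_2)$ of (2)
and expands their linearizations $f_d(J_1, J_2)$ over the $w_J$ variables of (6),
using the identity $f_d(J_1, J_2) = \sum_{S \subseteq J_2} (-1)^{|S|} w_{J_1 \cup S}$ that follows from
multiplying out the factors and applying $x_j^2 = x_j$.

from itertools import combinations
from math import comb

def bound_factors(n, d):
    # Yield all (J1, J2) of order d: disjoint index tuples with |J1 u J2| = d.
    N = range(1, n + 1)
    for U in combinations(N, d):            # choose the d distinct variables
        for r in range(d + 1):
            for J1 in combinations(U, r):   # J1 -> factor x_j; J2 -> (1 - x_j)
                J2 = tuple(j for j in U if j not in J1)
                yield J1, J2

def linearize(J1, J2):
    # f_d(J1, J2) = sum over S subset of J2 of (-1)^|S| * w_{J1 union S}.
    terms = {}
    for r in range(len(J2) + 1):
        for S in combinations(J2, r):
            J = tuple(sorted(set(J1) | set(S)))
            terms[J] = terms.get(J, 0) + (-1) ** r
    return terms

n, d = 3, 2
factors = list(bound_factors(n, d))
assert len(factors) == comb(n, d) * 2 ** d   # the C(n,d) 2^d count noted above
print(linearize((1,), (2,)))                 # x1(1 - x2): {(1,): 1, (1, 2): -1}

For $n = 3$ and $d = 2$ this reproduces the twelve factors listed earlier, and
the printed expansion reads $f_2(\{1\}, \{2\}) = x_1 - w_{12}$.
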
Notice that the nonlinear product constraints generated by this RLT
process are implied by the original constraints, and were it not for the fact
that we have explicitly imposed $x_j^2 = x_j$ (or $x_j(1 - x_j) = 0$) for each binary
variable, this process would have simply produced a relaxation that is equiv-
alent to the ordinary LP relaxation. Hence, the key to the tightening lies in
the recognition of binariness of $x_j$ in replacing $x_j^2$ by $x_j$ for each $j = 1, \dots, n$.
Example 2.1. Consider the following mixed-integer 0-1 constraint re-
gion.

$$X = \{(x, y) : x + y \le 2,\ -x + y \le 1,\ 2x - 2y \le 1,\ x \text{ binary and } y \ge 0\}. \qquad (13)$$

The corresponding LP relaxation of X is depicted in Figure 1. Note


that no explicit upper bound on the continuous variable y is included in
(13), although the other defining constraints of X imply the boundedness of
y. The RLT process can be applied directly to X as above, without creating
any explicit upper bound on y. Notice that for this instance, we have n = 1,
and so by our foregoing discussion, the relaxation $X_1$ at level $d = 1$ should
produce the convex hull representation. Let us verify this fact.
For $d = 1$, the bound-factor (products) of order 1 are simply $x$ and
$(1 - x)$. Multiplying the constraints of $X$ by $x$ and by $(1 - x)$, and using
$x^2 = x$ along with the linearizing substitution $v = xy$ as given by (6), we
get the following constraints in the higher dimensional space of the variables
(x, y, v). This represents the set $X_1$.

$x + y \le 2$: $\times\, x \Rightarrow -x \le -v$; $\times\, (1 - x) \Rightarrow 2x + y \le 2 + v$
$-x + y \le 1$: $\times\, x \Rightarrow -2x \le -v$; $\times\, (1 - x) \Rightarrow x + y \le 1 + v$
$2x - 2y \le 1$: $\times\, x \Rightarrow x \le 2v$; $\times\, (1 - x) \Rightarrow x - 2y \le 1 - 2v$
$y \ge 0$: $\times\, x \Rightarrow v \ge 0$; $\times\, (1 - x) \Rightarrow -y \le -v$

along with $0 \le x \le 1$. To examine the projection $X_{P1}$ of $X_1$ onto the space


of the (x, y) variables, let us rewrite $X_1$ as follows.

$$X_1 = \{(x, y, v) : v \ge 2x + y - 2,\ v \ge x + y - 1,\ v \ge x/2,\ v \ge 0,$$
$$v \le x,\ v \le 2x,\ v \le (1 - x + 2y)/2,\ v \le y\}.$$

This yields its projection (using Fourier-Motzkin elimination) as

$$X_{P1} = \{(x, y) : \max\{2x + y - 2,\ x + y - 1,\ x/2,\ 0\}$$
$$\le \min\{x,\ 2x,\ (1 - x + 2y)/2,\ y\}\}.$$

Writing out the corresponding equivalent linear inequalities and dropping
redundant constraints, we obtain

$$X_{P1} \equiv X_{Pn} = \{(x, y) : x \le 2y,\ 0 \le x \le 1,\ y \le 1\}$$


[Figure 1: Illustration of RLT relaxations for Example 2.1, showing the LP
relaxation and the projection $X_{P1} = \mathrm{conv}(X)$.]

which describes $\mathrm{conv}(X)$ as seen in Figure 1.
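
As a quick numerical companion to this example (a sketch of ours, not part
of the original text, assuming NumPy and SciPy are available), one can state
$X_1$ as a system $Az \le b$ in $z = (x, y, v)$ and verify that linear optimization
over it returns vertices with $x$ binary, as the convex hull result (12)
guarantees for $n = 1$.

import numpy as np
from scipy.optimize import linprog

# Rows encode X_1: v >= 2x+y-2, v >= x+y-1, 2v >= x, v >= 0,
# v <= x, v <= 2x, 2v <= 1-x+2y, v <= y, all written as A z <= b, z = (x, y, v).
A = np.array([[ 2,  1, -1], [ 1,  1, -1], [ 1,  0, -2], [ 0,  0, -1],
              [-1,  0,  1], [-2,  0,  1], [ 1, -2,  2], [ 0, -1,  1]])
b = np.array([2, 1, 0, 0, 0, 0, 1, 0])
bounds = [(0, 1), (0, None), (None, None)]   # 0 <= x <= 1, y >= 0, v free

for c in ([-1, -1, 0], [1, -1, 0], [-1, 1, 0]):  # a few objectives in (x, y)
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    x, y, v = res.x
    print(f"c = {c}: x = {x:.2f}, y = {y:.2f}, v = {v:.2f}")  # x is 0 or 1
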


Example 2.2. Consider the set

$$X = \{(x, y) : \alpha_1 x_1 + \alpha_2 x_2 + \gamma_1 y_1 + \gamma_2 y_2 \ge \beta,\ 0 \le x \le e_2,\ x \text{ integer},\ 0 \le y \le e_2\}.$$

Hence, we have $n = m = 2$. Let us consider $d = 2$, so that $D =
\min\{d + 1, n\} = 2$ as well. The various sets $(J_1, J_2)$ of order 2 and the
corresponding factors are given below:

$(J_1, J_2)$:      $(\{1,2\}, \emptyset)$   $(\{1\}, \{2\})$        $(\{2\}, \{1\})$        $(\emptyset, \{1,2\})$
$F_2(J_1, J_2)$:   $x_1 x_2$          $x_1(1 - x_2)$        $x_2(1 - x_1)$        $(1 - x_1)(1 - x_2)$
$f_2(J_1, J_2)$:   $w_{12}$           $x_1 - w_{12}$        $x_2 - w_{12}$        $1 - (x_1 + x_2) + w_{12}$
$f_2^k(J_1, J_2)$: $v_{12,k}$         $v_{1,k} - v_{12,k}$  $v_{2,k} - v_{12,k}$  $y_k - (v_{1,k} + v_{2,k}) + v_{12,k}$

Hence, we obtain the following constraints (14)-(16) corresponding to
(8)-(10), respectively, where $v_{Jk}$ has been written as $v_{J,k}$ for clarity:

$$X_2 = \{(x, y, w, v) :$$
$$(\alpha_1 + \alpha_2 - \beta) w_{12} + \gamma_1 v_{12,1} + \gamma_2 v_{12,2} \ge 0,$$
$$(\alpha_1 - \beta)(x_1 - w_{12}) + \gamma_1 (v_{1,1} - v_{12,1}) + \gamma_2 (v_{1,2} - v_{12,2}) \ge 0,$$
$$(\alpha_2 - \beta)(x_2 - w_{12}) + \gamma_1 (v_{2,1} - v_{12,1}) + \gamma_2 (v_{2,2} - v_{12,2}) \ge 0,$$
$$\beta[-1 + (x_1 + x_2) - w_{12}] + \gamma_1 [y_1 - (v_{1,1} + v_{2,1}) + v_{12,1}]$$
$$+ \gamma_2 [y_2 - (v_{1,2} + v_{2,2}) + v_{12,2}] \ge 0, \qquad (14)$$
$$f_2(J_1, J_2) \ge 0 \text{ for each } (J_1, J_2) \text{ of order } 2, \qquad (15)$$
$$\text{and } f_2(J_1, J_2) \ge f_2^k(J_1, J_2) \ge 0, \text{ e.g., } [1 - (x_1 + x_2) + w_{12}] \ge [y_k - (v_{1,k} + v_{2,k}) + v_{12,k}] \ge 0,$$
$$\text{for } k = 1, 2 \text{ and each } (J_1, J_2) \text{ of order } 2\}. \qquad (16)$$
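
The $f_2$ and $f_2^k$ rows of the above table can be generated mechanically by
the same inclusion-exclusion expansion used for (6); the short hypothetical
helper below (ours, not the authors') prints both rows, with w[0] standing
for $w_\emptyset = 1$ and v_k[0] for $v_{\emptyset k} = y_k$ as in (7).

from itertools import combinations

def expand(J1, J2, var):             # var names the substituted symbol family
    parts = []
    for r in range(len(J2) + 1):     # terms of size r carry sign (-1)^r
        for S in combinations(J2, r):
            J = "".join(str(j) for j in sorted(set(J1) | set(S)))
            parts.append(("+" if r % 2 == 0 else "-") + f"{var}[{J or '0'}]")
    return " ".join(parts)

for J1, J2 in [((1, 2), ()), ((1,), (2,)), ((2,), (1,)), ((), (1, 2))]:
    print(J1, J2, "->", expand(J1, J2, "w"), "|", expand(J1, J2, "v_k"))
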
Some comments and illustrations are in order at this point. First, note
that we could have computed inter-products of other defining constraints
in constructing the relaxation at level d, so long as the degree of the inter-
mediate polynomial program generated at the Reformulation step remains
the same and no nonlinearities are created with respect to the y-variables.
Hence, linearity would be preserved upon using the substitution (6,7). While
the additional inequalities thus generated would possibly yield tighter relax-
ations for levels 1, ... , n - 1, by our convex hull assertion in (12), these
constraints would be implied by those defining X n . Hence, our hierarchy
results can be directly extended to include such additional constraints, but
for simplicity, we omit these types of constraints. Nonetheless, one may
include them in a computational scheme employing the sets X d , d < n.
Second, as alluded to earlier, note that for the case $d = 0$, using the fact
that $f_0(\emptyset, \emptyset) \equiv 1$, that $f_0^k(\emptyset, \emptyset) \equiv y_k$ for $k = 1, \dots, m$, and that $f_1(j, \emptyset) \equiv x_j$
and $f_1(\emptyset, j) \equiv (1 - x_j)$ for $j = 1, \dots, n$, it follows that $X_0$ given by (8)-(10)
is precisely the continuous (LP) relaxation of $X$ in which the integrality
restrictions on the x-variables are dropped. Finally, for $d = n$, note that the
inequalities (9) are implied by (10) and can therefore be omitted from the
representation $X_n$.
3 Properties of the RLT Relaxation


Equation (12) asserts the main RLT result that for d = 0,1, ... , n, the
(conceptual) projections of the sets Xd represent a sequence of nested, valid
relaxations leading up to the convex hull representation. In this section, we
summarize some of the salient properties of the RLT relaxations that lead
to this result.
Property 3.1. The constraints (9) and (10) for any level's relaxation
can be surrogated in a particular fashion using nonnegative multipliers to
produce the constraints of the lower level relaxations. Hence, the hierarchy
among these relaxations is established via this property whereby lower level
constraints are directly implied by higher level constraints.
Property 3.2. Consider the RLT constraints (10) that are generated
by multiplying the bound factors of order $d$ with the constraints $0 \le y \le e_m$.
If $x$ is any binary vector for which $(x, y, w, v)$ is feasible to these constraints,
then we must have $0 \le y \le e_m$ and moreover, the identities (6) must hold
true. Hence, these constraints serve to make the nonlinear relationship (6)
hold true within the linear set $X_d$ for any solution having binary values for
$x$. In fact, for any binary value of $x$, we have that $(x, y, w, v) \in X_d$ if and
only if $(x, y) \in X$ and (6) holds true.
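
A brute-force check of Property 3.2 on the set $X_1$ of Example 2.1 (an
illustrative sketch, not from the original text): fixing $x$ at a binary value
and scanning a grid of $(y, v)$ values confirms that every point feasible to
the RLT constraints satisfies $v = xy$, i.e., identity (6).

def in_X1(x, y, v, tol=1e-9):
    # Feasibility in X_1 written as max{...} <= v <= min{...}.
    lower = max(2*x + y - 2, x + y - 1, x / 2, 0.0)
    upper = min(x, 2*x, (1 - x + 2*y) / 2, y)
    return lower <= v + tol and v <= upper + tol

for x in (0, 1):                       # enforce binariness on x only
    for i in range(101):
        y = i / 100
        for j in range(101):
            v = j / 100
            if in_X1(x, y, v):
                assert abs(v - x * y) <= 1e-6, (x, y, v)
print("every feasible point with binary x satisfies v = x*y, as claimed")
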
Property 3.3. The highest level relaxation is equivalent via a nonsin-
gular affine transformation to the polytope generated by taking the convex
hull of all feasible solutions to $X$. Hence, this result along with Property
3.1 establishes (12). In fact, $(x, y, w, v)$ is a vertex of $X_n$ if and only if $x$
is binary valued, (6) holds true, and $y$ is an extreme point of the set
$\{y : (x, y) \in X\}$.
A few remarks are pertinent at this point.
Remark 3.1. Observe that the upper bounds on the y-variables are not
needed for (12) to hold true, and that the foregoing analysis holds by sim-
ply eliminating any RLT product constraints that correspond to nonexistent
bounding restrictions. Given feasibility of the underlying mixed-integer pro-
gram, we have that x is binary valued at an optimum to any linear program
$\max\{cx + dy : (x, y) \in X_{Pn}\}$ for which there exists an optimal solution, that
is, for which this LP is not unbounded. Hence, by the foregoing properties,
we have that (12) holds true in this case as well.
Remark 3.2. A direct consequence of the convex hull result stated in
(12) is that the convex hull representation over subsets of the x-variables
can be computed in an identical fashion. In particular, suppose that the first
$p \le n$ variables are treated as binary in the set $X$ of (1), with the remaining
$(n - p)$ x-variables relaxed to be continuous. Then it directly follows that the
$p$th level linearization of this relaxed problem produces $\mathrm{conv}(X_0 \cap \{(x, y) :
x_j \text{ binary } \forall j = 1, \dots, p\})$. The special case in which $p = 1$ precisely
recovers the main step in the convexification argument of Balas et al. (1993)
mentioned in Section 1.
Remark 3.3. In concluding this section, let us comment on the situa-
tion in which there exist certain constraints from the first set of inequalities
in (1) which involve only the x-variables. In this case, one can multiply
such constraints with the factors $y_k$ and $(1 - y_k)$ for $k = 1, \dots, m$, and then
linearize the resulting constraints using (6,7) as with the other constraints
(3,4,5). By Property 3.2, these additional constraints are implied when x is
binary in any feasible solution, but they might serve to tighten the contin-
uous relaxations at the intermediate levels. Naturally, these constraints are
implied by the other constraints defining the highest level relaxation X n . In
a likewise fashion, inequalities defining X that involve only the x-variables
can be used in the same spirit as the factors $x_j \ge 0$ and $(1 - x_j) \ge 0$ to
generate products of inequalities up to a given order level in order to further
tighten intermediate level relaxations.

4 Generating Valid Inequalities and Facets Using RLT
Thus far, we have presented a hierarchy of relaxations leading up to the
convex hull representation for zero-one mixed-integer programming prob-
lems. A key advantage of this development is that the RLT produces an
algebraically explicit convex hull representation at the highest level. While
it might not be computationally feasible to actually generate and solve the
linear program based on this convex hull representation because of its po-
tentially exponential size, there are other ways to exploit this information
or facility to advantage.
One obvious tactic might be to identify a subset of variables along with
either original or implied constraints involving these variables for which such
a convex hull representation (or some higher order relaxation) would be
manageable. Another approach might be to exploit any special structures
in order to derive a simplified algebraic form of the convex hull represen-
tation for which the polyhedral cone that identifies all defining facets itself
possesses a special structure. Exploiting this structure, it might then be
possible to generate specific extreme directions of this cone, and hence iden-
Reformulation-Linearization for Discrete Optimization 493

tify certain (classes of) facets that could be used to strengthen the original
problem formulation. Perhaps, this algebraic simplification could yield more
compact, specially structured, representations of each level in the hierarchy,
hence enabling an actual implementation of a full or partial application of
RLT at some intermediate level in order to compose tight relaxations.
Each of these ideas has been explored in particular contexts. For ex-
ample, for the Boolean Quadric Polytope, Sherali et al. (1995) have derived
a characterization for all its defining facets via the extreme directions of
a specially structured polyhedral cone. They then demonstrate how vari-
ous known classes of facets including the clique, cut, and generalized cut
inequality facets, emerge via the enumeration of different types of extreme
directions. By examining a particular lower-dimensional projected restric-
tion of this polyhedral cone, they discover a new class of facets. (This class
can also be recovered via a combination of two types of facets for the related
cut cone.) This type of an analysis was also used by Sherali and Lee (1995)
for generalized upper bounding (GUB) constrained knapsack polytopes for
which new classes of facets have been characterized through a polynomial-
time sequential-simultaneous lifting procedure. Known lower bounds on the
coefficients of lifted facets derived from minimal covers associated with the
ordinary knapsack polytope have also been tightened using this framework.
For the set partitioning polytope, Sherali and Lee (1996) show that a number
of published valid inequalities along with constraint tightening procedures
are automatically captured within the first- and second-level relaxations
themselves. A variety of partial applications of the RLT scheme have also
been developed in order to delete fractional linear programming solutions
while tightening the relaxation in the vicinity of such solutions, and various
simplified forms of RLT relaxations have been derived by exploiting its spe-
cial structures. The polyhedral structure of many other specially structured
combinatorial optimization problems can be studied using such constructs,
and tight relaxations and strong valid inequalities can be generated for such
problems. As recently shown by Sherali (1996), even convex envelopes of
some special multilinear nonconvex functions can be generated using this
approach.
We conclude this section by pointing out another strategy that can be
used to derive strong valid inequalities in the space of the original variables
through the intermediate higher dimensional RLT relaxations $X_d$, for $d \ge 1$.
Note that by linear programming duality, the set of inequalities that define
the projection XPd onto the (x, y) space of the higher dimensional set Xd
are characterized via the set of surrogates of the constraints defining Xd that
494 H.D. Sherali and W.P. Adams

zero out the coefficients of the new variables (w,v) in the lifted space. While
the enumeration of all such projected inequalities might be prohibitive, it is
possible to formulate linear programming separation problems in this fashion
to derive specific facets of XPd that delete a given infeasible continuous
solution. The lift-and-project cutting plane algorithm of Balas et al. (1993)
precisely does this for the special case of d = 1 and using RLT bound
factor products involving only a single binary variable that turns out to
be fractional at an optimum to the current LP relaxation. Stronger level
one or higher relaxations can potentially lead to improved cuts within this
framework.
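
To illustrate the separation idea concretely on Example 2.1 (a sketch of
ours assuming SciPy; the fractional point $(\hat x, \hat y) = (0.8, 0.35)$ is our
own choice of a point lying in $X_0$ but outside $\mathrm{conv}(X)$): nonnegative
multipliers $\lambda$ on the constraints $Az \le b$ of $X_1$ with $\lambda^T A_v = 0$
yield inequalities valid for the projection $X_{P1}$, and maximizing their
violation under the normalization $\sum \lambda = 1$ is a linear program.

import numpy as np
from scipy.optimize import linprog

A = np.array([[ 2,  1, -1], [ 1,  1, -1], [ 1,  0, -2], [ 0,  0, -1],
              [-1,  0,  1], [-2,  0,  1], [ 1, -2,  2], [ 0, -1,  1],
              [ 1,  0,  0], [-1,  0,  0]])   # last two rows: x <= 1, -x <= 0
b = np.array([2, 1, 0, 0, 0, 0, 1, 0, 1, 0])
xhat, yhat = 0.8, 0.35

c = -(A[:, 0] * xhat + A[:, 1] * yhat - b)   # maximize the total violation
A_eq = np.vstack([A[:, 2], np.ones(len(b))]) # zero out v; normalize sum(lam)=1
res = linprog(c, A_eq=A_eq, b_eq=[0.0, 1.0], bounds=[(0, None)] * len(b))
lam = res.x
ax, ay, rhs = lam @ A[:, 0], lam @ A[:, 1], lam @ b
print(f"cut: {ax:.3f} x + {ay:.3f} y <= {rhs:.3f}, violation = {-res.fun:.4f}")

For this particular point the optimal multipliers recover a scaling of the
facet $x \le 2y$ derived in Example 2.1.
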

5 Multilinear Programs and Equality Constraints with
Applications to the Quadratic Assignment Problem
In this section, we present two important extensions of the RLT procedure
described in Section 2. The first of these concerns multilinear mixed-integer
zero-one polynomial programming problems in which the continuous vari-
ables $0 \le y \le e_m$ appear linearly in the constraints and the objective func-
tion. This is discussed below.
Extension 1. Multilinear mixed-integer zero-one polynomial program-
ming problems.

Consider the set

$$X = \Big\{(x, y) : \sum_{t \in T_{r0}} \alpha_{rt}\, p(J_{1t}, J_{2t}) + \sum_{k=1}^{m} y_k \sum_{t \in T_{rk}} \gamma_{rkt}\, p(J_{1t}, J_{2t}) \ge \beta_r,$$
$$r = 1, \dots, R,\ 0 \le x \le e_n \text{ and integer},\ 0 \le y \le e_m\Big\}, \qquad (17)$$

where for all $t$, $p(J_{1t}, J_{2t}) = [\prod_{j \in J_{1t}} x_j][\prod_{j \in J_{2t}} (1 - x_j)]$ are polynomial terms
for the various sets $(J_{1t}, J_{2t})$, indexed by the sets $T_{r0}$ and $T_{rk}$, in (17).
For $d = 0, 1, \dots, n$, we can construct a polyhedral relaxation $X_d$ for $X$ by
using the factors $F_d(J_1, J_2)$ to multiply the first set of constraints as before,
where $(J_1, J_2)$ are of order $d$. However, denoting $\delta_1$ as the maximum degree
of the polynomial terms in $x$ not involving the y-variables, and $\delta_2$ as the
maximum degree of the polynomial terms in $x$ that are associated with
products involving y-variables, in lieu of (4), we now use $F_{D_1}(J_1, J_2) \ge 0$
for $(J_1, J_2)$ of order $D_1 = \min\{d + \delta_1, n\}$, and in lieu of (5), we employ the
constraints $F_{D_2}(J_1, J_2) \ge y_k F_{D_2}(J_1, J_2) \ge 0$, $k = 1, \dots, m$, for all $(J_1, J_2)$
of order $D_2 = \min\{d + \delta_2, n\}$. Of course, if $D_2 \ge D_1$, then the former
restrictions are unnecessary, as they are implied by the latter. Note that in
computing $\delta_1$ and $\delta_2$ in an optimization context, we consider the terms in
the objective function as well, and that for the linear case, we have $\delta_1 = 1$
and $\delta_2 = 0$.
Now, linearizing the resulting constraints under the substitution (6,7)
produces the desired set $X_d$. Because of Property 3.2, which again holds
true here, when the integrality on the x-variables is enforced, each such
set $X_d$ is equivalent to the set $X$. Moreover, similar to Property 3.1, each
constraint from the first set of inequalities in $X_d$ for any $d < n$ is obtainable
by surrogating two appropriate constraints from $X_{d+1}$. In fact, we again
obtain the hierarchy of relaxations

$$X_{P0} \supseteq X_{P1} \supseteq X_{P2} \supseteq \cdots \supseteq X_{Pn} = \mathrm{conv}(X).$$

Extension 2: Construction of Relaxations Using Equality Constraint


Representations.

Note that given any set X of the form (1), by adding slack variables to the
first R constraints, determining upper bounds on these slacks as the sum
of the positive constraint coefficients minus the right-hand side constant,
and accordingly scaling these slack variables onto the unit interval, we may
equivalently write the set X as

$$X = \Big\{(x, y) : \sum_{j=1}^{n} \alpha_{rj} x_j + \sum_{k=1}^{m} \gamma_{rk} y_k + \gamma_{r(m+r)} y_{m+r} = \beta_r \text{ for } r = 1, \dots, R,$$
$$0 \le x \le e_n,\ x \text{ integer},\ 0 \le y \le e_{m+R}\Big\}. \qquad (18)$$

Now, for any $d \in \{1, \dots, n\}$, observe that the factor $F_d(J_1, J_2)$ for any
$(J_1, J_2)$ of order $d$ is a linear combination of the factors $F_p(J, \emptyset)$ for $J \subseteq
N$, $p \equiv |J| = 0, 1, \dots, d$. Hence, the constraint derived by multiplying an
equality from (18) by $F_d(J_1, J_2)$ and then linearizing it via (6,7) is obtainable
via an appropriate surrogate (with mixed-sign multipliers) of the constraints
derived similarly, but using the factors $F_p(J, \emptyset)$ for $J \subseteq N$, $p \equiv |J| =
0, 1, \dots, d$. Hence, these latter factors produce constraints that can generate
the other constraints, and so $X_d$ defined by (8)-(10) corresponding to $X$ as
in (18) is equivalent to the following, where $f_p(J, \emptyset) \equiv w_J$ and $f_p^k(J, \emptyset) \equiv v_{Jk}$
for all $p = 0, 1, \dots, d + 1$, as in (6,7):
496 H.D. Sherali and W.P. Adams

$$X_d = \Big\{(x, y, w, v) : \text{constraints of type (9) and (10) hold, and for } r = 1, \dots, R,$$
$$\Big[\sum_{j \in J} \alpha_{rj} - \beta_r\Big] w_J + \sum_{j \in N - J} \alpha_{rj} w_{J+j} + \sum_{k=1}^{m} \gamma_{rk} v_{Jk} + \gamma_{r(m+r)} v_{J(m+r)} = 0$$
$$\text{for all } J \subseteq N \text{ with } |J| = 0, 1, \dots, d\Big\}. \qquad (19)$$

Note that the savings in the number of constraints in (19) over that in
(8)-(10) corresponding to the set X as in (18) is given by

$$R\Big[\binom{n}{d} 2^d - \sum_{p=0}^{d} \binom{n}{p}\Big].$$

Also, observe that for $J = \emptyset$, the equalities in (19) are precisely the
original equalities defining X in (18). Of course, because (19) is equivalent
to the set of the type (8)-(10) which would have been derived using the
factors Fd(J1 , J2) of degree d, all the foregoing results continue to hold true
for (19).
While the approach in Section 2 developed for the inequality constrained
case is convenient for theoretical purposes as it avoids the manipulation of
surrogates that would be required for the equalities in (19), note that from
a computational viewpoint, when $d < n$, the representation in (19) has
fewer type (8) "complicating" constraints and variables (including slacks
in (8)) than does (8)-(10), as given by the above savings expression, but has
$R \times 2^d \binom{n}{d}$ additional constraints of the type (10), counting the nonneg-
ativity restrictions on the slacks in (8), for the inequality constrained case.
Hence, depending on the structure, either form of the representation of these
relaxations may be employed, as convenient.
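
The trade-off just described is easy to tabulate; the following sketch (ours,
illustrative only) counts, for $R$ structural constraints, the type-(8) products
under the inequality form versus the equality products in (19).

from math import comb

def counts(n, d, R):
    ineq_form = R * comb(n, d) * 2 ** d                  # one per (J1,J2) of order d
    eq_form = R * sum(comb(n, p) for p in range(d + 1))  # one per J with |J| <= d
    return ineq_form, eq_form

for n, d in [(10, 1), (10, 2), (20, 2)]:
    i, e = counts(n, d, R=1)
    print(f"n = {n}, d = {d}: inequality form {i}, equality form (19) {e}")
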
To summarize, when constructing the level d relaxation Xd in the pres-
ence of equality constraints, each equality constraint needs to be multiplied
by the factors $F_p(J, \emptyset)$ for $J \subseteq N$, $p \equiv |J| = 0, 1, \dots, d$ in the Reformulation
step. Naturally, factors $F_p(J, \emptyset)$ that are known to be zeros, i.e., any such
factor for which we know that there does not exist a feasible solution that
has $x_j = 1$ $\forall j \in J$, need not be used in constructing these product con-
straints. The Linearization Phase is then applied as before.
Reformulation-Linearization for Discrete Optimization 497

Example 5.1. To illustrate both the foregoing extensions, let us con-


sider the celebrated quadratic assignment problem defined as follows.

$$\text{QAP: Minimize } \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k > i} \sum_{l \ne j} c_{ijkl}\, x_{ij} x_{kl} \qquad (20)$$
$$\text{subject to } \sum_{i=1}^{m} x_{ij} = 1 \quad \forall j = 1, \dots, m \qquad (21)$$
$$\sum_{j=1}^{m} x_{ij} = 1 \quad \forall i = 1, \dots, m \qquad (22)$$
$$x \ge 0 \text{ and integer}. \qquad (23)$$
First Level Relaxation:
To construct the relaxation at the first level, in addition to the con-
straints (21,22), and the nonnegativity restrictions on x in (23), we include
the product constraints obtained by multiplying each constraint in (21) and
(22) by each $x_{kl} = F_1(\{kl\}, \emptyset)$, and then apply the rest of the RLT pro-
cedure as usual, using the substitution $w_{ijkl} = x_{ij} x_{kl}$ $\forall i, j, k, l$. Note that
for any $j \in \{1, \dots, m\}$, multiplying (21) by $x_{kj}$ for each $k \in \{1, \dots, m\}$,
produces upon using $x_{kj}^2 = x_{kj}$ that

$$\sum_{i \ne k} w_{ijkj} = 0 \quad \forall k = 1, \dots, m. \qquad (24)$$

Hence, since $w \ge 0$, this implies that $w_{ijkj} = 0$ $\forall i \ne k$. Similarly, from
(22), we get $w_{ijil} = 0$ $\forall j \ne l$. Furthermore, noting that $w_{ijkl} = w_{klij}$, we
only need to define $w_{ijkl}$ $\forall i < k$, $j \ne l$. (For convenience in exposition below,
however, we define $w_{ijkl}$ $\forall i \ne k$, $j \ne l$, and explicitly impose $w_{ijkl} = w_{klij}$.)
Additionally, since we only need to impose $x_{ij} \ge 0$ in (23) for the continuous
relaxation, since $x_{ij} \le 1$ is then implied by the constraints (21,22), we only
need to multiply the factors $x_{kl} \ge 0$ and $(1 - x_{kl}) \ge 0$ with each constraint
$x_{ij} \ge 0$ in the reformulation phase. This produces the restrictions $w_{ijkl} \le x_{ij}$
and $w_{ijkl} \le x_{kl}$ $\forall i, j, k, l$, along with $w \ge 0$. However, the former variable
upper bounding constraints are implied by the constraints (28,29,30) below.
Hence, the first level relaxation (that would yield a lower bound on QAP)
is given as follows.

$$\text{Minimize } \sum_{i} \sum_{j} \sum_{k > i} \sum_{l \ne j} c_{ijkl}\, w_{ijkl} \qquad (25)$$
$$\text{subject to } \sum_{i} x_{ij} = 1 \quad \forall j \qquad (26)$$
$$\sum_{j} x_{ij} = 1 \quad \forall i \qquad (27)$$
$$\sum_{i \ne k} w_{ijkl} = x_{kl} \quad \forall j, k, l \ne j \qquad (28)$$
$$\sum_{j \ne l} w_{ijkl} = x_{kl} \quad \forall i, l, k \ne i \qquad (29)$$
$$x \ge 0,\ w \ge 0,\ w_{ijkl} \equiv w_{klij} \quad \forall i < k,\ j \ne l. \qquad (30)$$
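
For concreteness, a small illustrative sketch (our own, not the authors'
implementation) that enumerates the surviving $w_{ijkl}$ variables and the
product equalities (28)-(29) for a given $m$, using the symmetry convention
of (30); for $m = 3$ it yields 18 w-variables and 36 equalities.

from itertools import product

def key(i, j, k, l):
    # Canonical index for w_ijkl under the symmetry w_ijkl == w_klij of (30).
    return (i, j, k, l) if i < k else (k, l, i, j)

def level1_rlt(m):
    # Surviving w-variables: i < k and j != l (the rest are null or duplicates).
    w = {(i, j, k, l) for i, j, k, l in product(range(m), repeat=4)
         if i < k and j != l}
    eqs = []
    for j, k, l in product(range(m), repeat=3):   # (28): sum_{i != k} w = x_kl
        if l != j:
            eqs.append(([key(i, j, k, l) for i in range(m) if i != k], (k, l)))
    for i, l, k in product(range(m), repeat=3):   # (29): sum_{j != l} w = x_kl
        if k != i:
            eqs.append(([key(i, j, k, l) for j in range(m) if j != l], (k, l)))
    return w, eqs

w, eqs = level1_rlt(3)
print(len(w), "w-variables and", len(eqs), "product equalities for m = 3")
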
Second Level Relaxation:
Following the same procedure as above, and eliminating null variables as
well as null and redundant constraints, we would obtain the same relaxation
as in (25)-(30) with additional constraints (and variables) corresponding
to multiplying each equality constraint in (21) by each factor $x_{kl} x_{pq}$ for
$k < p$, $l \ne q \ne j$, and by multiplying each constraint in (22) by each factor
$x_{kl} x_{pq}$ for $k < p$, $k \ne i \ne p$, $l \ne q$. This would produce the additional
constraints

$$\sum_{i \ne k, p} w_{ijklpq} = w_{klpq} \quad \forall j, k, l, p, q, \text{ where } k < p \text{ and } l \ne q \ne j \qquad (31)$$
$$\sum_{j \ne l, q} w_{ijklpq} = w_{klpq} \quad \forall i, k, l, p, q, \text{ where } k < p,\ k \ne i \ne p, \text{ and } l \ne q \qquad (32)$$

along with $w_{ij,kl,pq} \ge 0$ and that this is the same variable for each permuta-
tion of

$$ij,\ kl,\ pq, \quad \forall \text{ distinct } i, k, p \text{ and } j, l, q. \qquad (33)$$


Such relaxations have been computationally tested by Ramakrishnan et
al. (1996), and Ramachandran and Pekny (1996) with promising results.
Extensions of such specializations for general set partitioning problems are
given by Sherali and Lee (1996). In fact, Adams and Johnson (1994) show
that the lower bound produced by the first level relaxation itself subsumes
a multitude of known lower bounding techniques in the literature, including
a host of matrix reduction strategies. By designing a heuristic dual ascent
procedure for the level-one relaxation, and by incorporating dual-based cut-
ting planes within an enumerative algorithm, an exact solution technique
has been developed and tested by Adams and Johnson (1996), that can
competitively solve problems up to size 17. In an effort to make this algo-
rithm generally applicable, no special exploitation of flow and/or distance
symmetries was considered. As far as the strength of the RLT relaxation is
concerned, on a set of standard test problems of sizes 8-20, the lower bounds
produced by the dual ascent procedure uniformly dominated 12 other com-
peting lower bounding schemes except for one problem of size 20, where our
procedure yielded a lower bound of 2142, while an eigenvalue-based proce-
dure produced a lower bound of 2229, the optimum value being 2570 for this
problem. Recently, Resende et al. (1995) have been able to solve the first
level RLT relaxation exactly for problems of size up to 30 using an interior-
point method that employs a preconditioned conjugate gradient technique
to solve the system of equations for computing the search directions. (For
the aforementioned problem of size 20, the exact solution value of the lower
bounding RLT relaxation turned out to be 2182, compared to our dual as-
cent value of 2142.) Sherali and Brown (1994) have also applied RLT to the
problem of assigning aircraft to gates at an airport, with the objective of
minimizing passenger walking distances. The problem is modeled as a vari-
ant of the quadratic assignment problem with partial assignment and set
packing constraints. The quadratic problem is then equivalently linearized
by applying the first-level of the RLT. In addition to simply linearizing the
problem, the application of this technique generates additional constraints
that provide a tighter linear programming representation. Since even the
first-level relaxation can get quite large, we investigate several alternative
relaxations that either delete or aggregate classes of RLT constraints. All
these relaxations are embedded in a heuristic that solves a sequence of such
relaxations, automatically selecting at each stage the tightest relaxation that
can be solved with an acceptable estimated effort, and based on the solution
obtained, it fixes a suitable subset of variables to 0-1 values. This process
is repeated until a feasible solution is constructed. The procedure was com-
putationally tested using realistic data obtained from USAir for problems
having up to 7 gates and 36 flights. For all the test problems ranging from
4 gates and 36 flights to 7 gates and 14 flights, for which the size of the
first-level relaxation was manageable (having 14,494 and 4,084 constraints,
respectively, for these two problem sizes), this initial relaxation itself al-
ways produced an optimal 0-1 solution. Finally, we mention that Adams
and Sherali (1986) had earlier developed a technique for solving general 0-1
quadratic programming problems using a relaxation that turns out to be
precisely the level-one RLT relaxation discussed above. This relaxation was
shown to theoretically dominate other existing linearizations and was shown


to computationally produce far tighter lower bounds. In these computations,
we solved quadratic set covering problems having up to 70 variables and 40
constraints. For example, for this largest size problem, where the optimum
objective value was 1312, our relaxation produced an initial lower bound of
1289 at the root node, and enumerated 14 nodes to solve the problem in
79 cpu seconds on an IBM 3081 Series D24 Group K computer. When the
same algorithmic strategies were used on a relaxation that did not include
the special RLT constraints, the initial lower bound obtained was only 398,
and the algorithm enumerated 2130 nodes, consuming 197 cpu seconds.

6 Exploiting Special Structures to Generate Tighter
RLT Representations
In Section 2, we discussed a technique for generating a hierarchy of relax-
ations that span the spectrum from the continuous LP relaxation to the
convex hull of feasible solutions for linear mixed-integer 0-1 programming
problems. The key construct was to compose a set of multiplication factors
based on the bounding constraints $0 \le x \le e_n$ on the binary variables $x$,
and to use these factors to generate implied nonlinear product constraints,
then tighten these constraints using the fact that $x_j^2 \equiv x_j$ $\forall j = 1, \dots, n$, and
subsequently linearize the resulting polynomial problem through a variable
substitution process. This process yielded a family of tighter representations
of the problem in higher-dimensional spaces.
It should seem intuitive that if one were to identify a set S of constraints
involving the x-variables that imply the bounding restrictions $0 \le x \le e_n$,
then it might be possible to generate a similar hierarchy by applying a set
of multiplicative factors that are composed using the constraints defining
$S$. Indeed, as we exemplify in this section, such generalized, so-called
S-factors can not only be used to construct a hierarchy of relaxations leading
to the convex hull representation, but they also provide an opportunity to
exploit inherent special structures. Through an artful application of this
strategy, one can design relaxations that are more compact than the ones
available through the RLT process of Section 2, while at the same time
being tighter, as well as affording the opportunity to construct the convex
hull representation at lower levels in the hierarchy in certain cases.
Consider the feasible region of a mixed-integer 0-1 programming problem
stated in the form
Reformulation-Linearization for Discrete Optimization 501

$$X = \{(x, y) \in R^n \times R^m : Ax + Dy \ge b,\ x \in S,\ x \text{ binary},\ y \ge 0\} \qquad (34)$$

where

$$S = \{x : g^i x - g_{0i} \ge 0 \text{ for } i = 1, \dots, p\}. \qquad (35)$$

Here, the constraints defining the set S have been specially composed to
generate useful, tight relaxations as revealed in the sequel. For now, in
theory, all that we require of the set S is that it implies the bounds $0 \le x \le
e_n$ on the x-variables, where $e_n$ is a vector of $n$ ones. Specifically, we assume
that for each $t = 1, \dots, n$,

$$\min\{x_t : x \in S\} = 0 \text{ and } \max\{x_t : x \in S\} = 1. \qquad (36)$$

Note that if $\min(x_t) > 0$, we can fix $x_t = 1$, and if $\max(x_t) < 1$, we can
fix $x_t = 0$, and if both these conditions hold for any $t$, then the problem
is infeasible. Therefore, without loss of generality we will assume that the
equalities of (36) hold for all $t$. We further note that (36) ensures that
$p \ge n + 1$.
Now, define the sets $P$ and $\bar{P}$ as follows, where $\bar{P}$ duplicates each index
in $P$ $n$ times:

$$P = \{1, \dots, p\}, \text{ and } \bar{P} = \{n \text{ copies of } P\}. \qquad (37)$$


The construction of the new hierarchy proceeds in a manner similar to that
of Section 2. At any chosen level of relaxation $d \in \{1, \dots, n\}$, we construct
a higher dimensional relaxation $\bar{X}_d$ by considering the S-factors of order $d$
defined as follows:

$$g(J) = \prod_{i \in J} (g^i x - g_{0i}) \text{ for each distinct } J \subseteq \bar{P},\ |J| = d. \qquad (38)$$
Note that to compose the S-factors of order $d$, we examine the collection
of constraint-factors $g^i x - g_{0i} \ge 0$, $i = 1, \dots, p$, and construct the product of
some $d$ of these constraint-factors, including possible repetitions. To permit
such repetitions as $d$ varies from 1 to $n$, we have defined $\bar{P}$ as in (37) for use
in (38). Since the relaxation described below will recover the convex hull
representation for $d = n$, we need not consider $d > n$. Using $(d - 1)$ suitable
dummy indices to represent duplications, it can be easily verified that there
are a total of

$$\binom{p + d - 1}{d} = (p + d - 1)! / [d!\,(p - 1)!]$$

distinct factors of this type at level $d$.
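
This count is easily confirmed computationally (an illustrative sketch of
ours): S-factors of order $d$ correspond to multisets of $d$ indices drawn
from $P$, which Python's combinations_with_replacement enumerates directly.

from itertools import combinations_with_replacement
from math import comb

p, d = 4, 3
factors = list(combinations_with_replacement(range(1, p + 1), d))
assert len(factors) == comb(p + d - 1, d)   # matches the count stated above
print(len(factors), "S-factors of order", d, "from", p, "constraint-factors")
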


These factors are then used in a Special Structures Reformulation-Lin-
earization Technique, abbreviated SSRLT, as stated below, in order to gen-
erate the relaxation $\bar{X}_d$.
(a) Reformulation Phase. Multiply each inequality defining the feasible
region (34) (including the constraints defining S) by each S-factor $g(J)$
of order $d$, and apply the identity $x_j^2 = x_j$ for all $j \in \{1, \dots, n\}$.
(b) Linearization Phase. Linearize the resulting polynomial program
by using the substitution defined in (39) below. This produces the $d$th
level relaxation $\bar{X}_d$.

$$w_J = \prod_{j \in J} x_j \ \forall J \subseteq N, \qquad v_{Jk} = y_k \prod_{j \in J} x_j \ \forall J \subseteq N,\ \forall k. \qquad (39)$$

For conceptual purposes, as in Section 2, define the projection of $\bar{X}_d$ onto
the space of the original variables as

$$\bar{X}_{Pd} = \{(x, y) : (x, y, w, v) \in \bar{X}_d\} \quad \forall d = 1, \dots, n. \qquad (40)$$

Additionally, as before, we will denote $\bar{X}_{P0} = \bar{X}_0$ (for $d = 0$) as the ordinary
linear programming relaxation. Sherali, Adams and Driscoll (1996) then
show that similar to (12),

$$\bar{X}_{P0} \equiv \bar{X}_0 \supseteq \bar{X}_{P1} \supseteq \bar{X}_{P2} \supseteq \cdots \supseteq \bar{X}_{Pn} = \mathrm{conv}(X) \qquad (41)$$

where $\mathrm{conv}(\cdot)$ denotes the convex hull of feasible solutions. Moreover, this
hierarchy dominates that of Section 2 in the sense that

$$\bar{X}_{Pd} \subseteq X_{Pd} \quad \forall d = 1, \dots, n. \qquad (42)$$

For convenience in notation, let us henceforth denote by $\{\cdot\}_L$ the process
of linearizing a polynomial expression $\{\cdot\}$ in $x$ and $y$ via the substitution
defined in (39), following the use of the identity $x_j^2 = x_j$ $\forall j = 1, \dots, n$.
Before proceeding further, let us highlight some important comments
that pertain to the application of SSRLT. First, in an actual implementation,
note that under the substitution $x_j^2 = x_j$ for all $j$, several terms defining the
factors in (38) might be zeros, either by definition, or due to the restrictions
of the set $X$ in (34). For example, if $S \equiv \{x : 0 \le x \le e_n\}$, when $d = 2$,
one such factor of type (38) is $x_j(1 - x_j)$ for $j \in \{1, \dots, n\}$, which is clearly
null when $x_j^2$ is equated with $x_j$. Additionally, if $S = \{x : 0 \le x \le e_n\}$, then
multiplying the inequalities defining $X$ in (34) by any factor $F_d(J_1, J_2)$
for which the remaining constraints of $\bar{X}_d$ enforce $f_d(J_1, J_2) = 0$ will yield
null constraints. Secondly, some of these factors might be implied in the
sense that they can be reproduced as a nonnegative surrogate of other such
factors that are generated in (38). For example, when $d = 3$ for $S \equiv \{x : 0 \le
x \le e_n\}$, the factor $x_t^2 x_r \ge 0$ of order 3 is equivalent to $x_t x_r \ge 0$ of order
2, which is implied by other nonnegative factors of order 3 generated by
the RLT constraints by Property 3.1. All such null and implied factors and
terms should be eliminated in an actual application of SSRLT. Third, if any
constraint in $X$ of (34) is implied by the remaining constraints, then we
can simply remove this constraint from $X$ without changing any resulting set
$\bar{X}_d$. (This same logic holds relative to the foregoing RLT process and the sets
$X_d$.) To illustrate, any single constraint in (21) or (22) can be removed from
the QAP formulation of Example 5.1 while preserving the strengths of the
prescribed relaxations at levels 1 and 2, saving $n(n-1)$ and $n(n-1)^2(n-2)$
constraints respectively.
As evident from the foregoing comments, the RLT process described in
Section 2 is a special case of SSRLT when $S = \{x : 0 \le x \le e_n\}$. Note
that one obvious scheme for generating tighter relaxations via SSRLT is to
include in the latter set S certain suitable additional constraints depending
on the problem structure, and hence generate S-factors at level $d$ that in-
clude $F_d(J_1, J_2)$ of order $d$ as defined in Section 2, along with any collection
of additional S-factors as obtained via (38). Eliminating any null terms
or implied factors thus generated, a hierarchy can be generated using SS-
RLT that would dominate the ordinary RLT hierarchy of relaxations at each
level. Our focus here will largely reside on less obvious, and richer, instances
where the set S possesses a special structure that implies the restrictions
$0 \le x \le e_n$, without explicitly containing these bounding constraints.
Observe that we could conceptually think of SSRLT as being an inductive
process, with the relaxation at level (d + 1) being produced by multiplying
each of the constraints in the relaxation at level d with each S-factor defin-
ing S. Constraints produced by this process that effectively use null (zero)
factor expressions g( J) of order d are null constraints. Constraints produced
by this process that effectively use factors $g(J)$ that are implied by other
factors in (38) are themselves implied by the constraints generated using the
latter factors. Hence, the process of reducing the set of factors based on
eliminating null or implied factors from use at the reformulation step, or
that of eliminating the corresponding redundant constraints generated by
504 H.D. Sherali and W.P. Adams

such factors, are equivalent steps. It follows that an equivalent relaxation at


level d would be produced by using only the non-null, non-implied factors,
recognizing any zero variable terms in the resulting relaxation as identified
by the S-factors. Furthermore, such non-redundantfnon- null factors can be
generated inductively through the levels, recognizing zero terms revealed at
previous levels. This latter relaxation is what should actually be generated
in practice. Sections 7 and 8 provide several examples. In a similar fashion
to the RLT process of Section 2, a hierarchy leading to the convex hull repre-
sentation can be generated in a piecewise manner by sequentially enforcing
binariness on sets of variables, constructing the highest level relaxation for
each considered set in turn.
Finally, let us comment on the treatment of equality constraints and
equality factors. Note that whenever an equality constraint-factor defines
an S-factor, any resulting product constraint is an equality restriction. Con-
sequently, in the presence of equality constrained factors, in general, it is
only necessary to multiply the corresponding equality constraint-factors sim-
ply with each x and y variable alone, as well as the constant 1, since the
product with any other expression in x and y can be composed using these
resulting products. Moreover, since the products with the x-variables are
already being generated via other SSRLT constraints by virtue of the cor-
responding defining equality constraints of S already being included within
X, and since $x \ge 0$ is implied by the inequality restrictions of $x \in S$, only
products using y variables are necessary.

Furthermore, in this connection, note that if X contains equality struc-
tural constraints, in general, then these can be treated as in Section 5. That
is, at level d, these equality constraints would simply need to be multiplied
by the factors $F_p(J, \emptyset)$ for $J \subseteq N, \ p = |J| = 0, 1, \ldots, d$. Naturally, factors
$F_p(J, \emptyset)$ that are known to be zeros, i.e., any such factor for which we know
that no feasible solution exists that has $x_j = 1 \ \forall j \in J$, need not be used in
constructing these product constraints, and can be set equal to zero in the
relaxation.

Example 6.1. For example, suppose that we have a set $S = \{x : e_n \cdot x = 1, \ x \ge 0\}$. Then the S-factors of order 1 are the expressions that define the
restrictions

$\{(1 - e_n \cdot x) = 0, \ x_1 \ge 0, \ldots, x_n \ge 0\},$    (43)

which include the equality constraint-factor along with the bound-factors
$x_1 \ge 0, \ldots, x_n \ge 0$. To compose the S-factors of order 2, note that

$x_t(1 - e_n \cdot x) = 0$ yields $\sum_{j \ne t} w_{(jt)} = 0 \ \forall t,$    (44)

upon using $x_t^2 \equiv x_t$ and substituting $w_{(jt)}$ for $x_j x_t \ \forall j \ne t$ according to (39).
(Note that we only need to define $w_{jt}$ for $j < t$, and accordingly, we will
denote $w_{(jt)} \equiv w_{jt}$ if $j < t$ and $w_{(jt)} \equiv w_{tj}$ if $t < j$.) Equation (44) along
with

$w_{(jt)} \ge 0 \ \forall j \ne t,$    (45)

produced by the other S-factors of order 2, implies that $w_{(jt)} \equiv 0 \ \forall j \ne t$,
hence yielding null factors via (44) and (45). The only non-null S-factors of
order 2 are therefore produced by pairwise self-products of the constraints
defining S. But $(1 - e_n \cdot x)^2 = 0$ and $x_j^2 \ge 0, \ j = 1, \ldots, n$, respectively
yield $(1 - e_n \cdot x) = 0$ and $x_j \ge 0 \ \forall j = 1, \ldots, n$, upon using $x_j^2 \equiv x_j$ and
$x_j x_t = 0 \ \forall j \ne t$ as above. Hence, the reduced set of factors of order 2 are
precisely the same as those of order 1, and this continues for all levels
2, ..., n. Consequently, by (41), the convex hull representation would
necessarily be produced at level 1 itself for this example.
To produce this level 1 representation, we would multiply all the con-
straints defining X (including the ones in S) by each factor $x_j \ge 0, \ j = 1, \ldots, n$, from (43). However, for the equality factor $(1 - e_n \cdot x) = 0$, by the
foregoing discussion, we would only need to construct the RLT constraints
$[y_k(1 - e_n \cdot x)]_L = 0$ and retain $e_n \cdot x = 1$. The resulting relaxation would
produce the convex hull representation, as asserted above.
To further reinforce some of the preceding ideas before presenting addi-
tional specific details, we use another example that includes an equality con-
straint in S, but also explicitly includes the bound restrictions $0 \le x \le e_n$.
As mentioned above, since the S-factors would now include the regular RLT
bound-factors, any S-factors other than these bound-factor products are op-
tional.

Example 6.2. Suppose that n = 4 and consider $S = \{x \in R^4 : x_1 + x_2 + x_3 + x_4 = 2, \ 0 \le x \le e_4\}$. The following factors are derived that can
be applied in SSRLT, noting the equality constraint defining S.

(a) Level 1 factors: $x_j \ge 0$ and $(1 - x_j) \ge 0, \ j = 1, \ldots, 4$, and optionally,
$(e_4 \cdot x - 2) = 0$ (to be multiplied by 1 and by each y variable alone as
noted above).

(b) Level 2 factors: Bound factors of order 2 given by $\{x_i x_j, \ (1 - x_i)x_j, \ x_i(1 - x_j)$, and $(1 - x_i)(1 - x_j) \ \forall 1 \le i < j \le 4\}$, and optionally, any factors
(to be applied to $y \ge 0$ alone) from the set $\{x_i - \sum_{j \ne i} x_i x_j = 0 \ \forall i = 1, \ldots, 4$, obtained by multiplying $e_4 \cdot x = 2$ by each $x_i, \ i = 1, \ldots, 4$,
and $(e_4 \cdot x - 2) = 0$ itself, obtained from $(e_4 \cdot x - 2)^2 = 0$ upon using
$\sum_{j \ne i} x_i x_j = x_i \ \forall i\}$.

(c) Level d factors, d = 3, 4: Bound factors $F_d(J_1, J_2) \ge 0$ of order d, with
the additional restriction that all 3rd and 4th order terms are zeros, plus
optionally, factors from the optional set at level 2. Note that the valid
implication of polynomial terms of order 3 being zero, for example, is
obtained through the RLT process by multiplying $x_i - \sum_{j \ne i} x_i x_j = 0$
with $x_k$, for each $i, k, \ i \ne k$. This gives $\sum_{j \ne i,k} x_i x_j x_k = 0$ which, by
the nonnegativity of each triple product term, implies that $x_i x_j x_k = 0$
for all distinct $i, j, k$ (this vanishing is confirmed by the enumeration
sketch below).
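
The zero-product claims in item (c) are easy to confirm by enumeration; the following small check (ours, not from the chapter) runs over the feasible binary points of the set S of Example 6.2.

    # Check: over S = {x in {0,1}^4 : x1+x2+x3+x4 = 2}, every product of
    # three distinct variables vanishes, so the level-3 and level-4 bound
    # factors reduce as stated in Example 6.2(c).
    from itertools import product, combinations

    points = [x for x in product((0, 1), repeat=4) if sum(x) == 2]
    for triple in combinations(range(4), 3):
        assert all(x[triple[0]] * x[triple[1]] * x[triple[2]] == 0
                   for x in points)
    print("all order-3 products vanish on the feasible set")
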
In like fashion, for set partitioning problems, for example, any quadratic
or higher order products of variables that involve a pair of variables that ap-
pear together in any constraint are zeros. More generally, any product term
that contains variables or their complements that cannot simultaneously
take on a value of 1 in any feasible solution can be restricted to zero. Sherali
and Lee (1996) use this structure to present a specialization of RLT to
derive explicit reduced level-d representations in their analysis of set parti-
tioning problems.

7 Applications of RLT for Some Particular Special Structures
We now demonstrate how some specific special structures can be exploited in
designing an application of the general framework of SSRLT. This discussion
will also illuminate the relationship between RLT and SSRLT, beyond the
simple dominance result stated in (42). For this purpose, we employ various
commonly occurring special structures such as generalized upper bounding
(GUB) constraints, variable upper bounding (VUB) constraints, and to a
lesser degree of structure, problem sparsity. These illustrations are by no
means exhaustive; our motivation is to present the basic framework for this
approach, and encourage the reader to design similar constructs for other
applications on a case-by-case basis.

7.1 Generalized Upper Bounding (GUB) or Multiple Choice Constraints

Suppose that the set S of (35) is given as follows,

$S = \{x : \sum_{j \in N_i} x_j \le 1 \ \forall i \in Q = \{1, \ldots, q\}, \ x \ge 0\},$    (46)

where $\bigcup_{i \in Q} N_i \equiv N \equiv \{1, \ldots, n\}$. Problems possessing this particular spe-
cial structure arise in various settings including maximum cardinality node
packing, set packing, capital budgeting, and menu planning problems among
others (see Nemhauser and Wolsey, 1988).
First, let us suppose that q = 1 in (46), so that

$S \equiv \{x : e_n \cdot x \le 1, \ x \ge 0\}.$

The S-factors of various orders for this particular set can be derived as
follows:

(a) S-factors at level 1. These factors are directly obtained from the set
S via the constraint factors $(1 - e_n \cdot x) \ge 0$ and $x_j \ge 0 \ \forall j = 1, \ldots, n$.

(b) S-factors at level 2. The linearization operation $[x_t(1 - e_n \cdot x)]_L \ge 0$
produces an expression

$\sum_{j \ne t} w_{(jt)} \le 0 \ \forall t = 1, \ldots, n,$

where $w_{(jt)} \equiv w_{tj}$ if $t < j$, and $w_{(jt)} \equiv w_{jt}$ if $j < t$. Moreover, via the
pairwise products of $x_j$ and $x_t$, $j \ne t$, we obtain factors of the type

$(x_t x_j) \ge 0 \ \forall j \ne t$, or $w_{(jt)} \ge 0 \ \forall j \ne t$.

Similar to (44) and (45), the foregoing two sets of inequalities imply that

$w_{(jt)} = 0 \ \forall j \ne t.$    (47)

Consequently, under (47), the only S-factors of order 2 that survive such a
cancellation are self-product factors of the type $(x_t x_t) \ge 0 \ \forall t$, and
$(1 - e_n \cdot x) \cdot (1 - e_n \cdot x) \ge 0$. These yield the same factors as at level 1, upon
using $x_t^2 = x_t \ \forall t$ along with (47) as seen in Section 6. Hence, we only need
to use the factors

$(1 - e_n \cdot x) \ge 0$ and $x_j \ge 0, \ j = 1, \ldots, n,$
to construct the equivalent set $\bar{X}_2$. Notice that $\bar{X}_2 \equiv \bar{X}_1$, and this equiv-
alence relation continues through all levels of relaxations up to $\bar{X}_n$. Hence,
the first level relaxation itself produces the convex hull representation in
this case. There are two insightful points worthy of note in the context of
this example. First, as illustrated next, although RLT recognizes that (47)
holds true at each relaxation level, it may not produce the convex hull rep-
resentation at the first level as does SSRLT.

Example 7.1. Let

$X = \{(x_1, x_2) : 6x_1 + 3x_2 \ge 2, \ x_1 + x_2 \le 1, \ x \text{ binary}\},$

and consider the generation of the first level RLT relaxation. Note that the
factors used in this context are $x_j$ and $(1 - x_j)$ for j = 1, 2. Examining
the product of $x_1 + x_2 \le 1$ with $x_1$ yields $w_{12} \le 0$, which together with
$w_{12} \equiv [x_1 x_2]_L \ge 0$ yields $w_{12} = 0$. Other products of the factors $x_j$ and
$(1 - x_j), \ j = 1, 2$, with $x_1 + x_2 \le 1$ and $0 \le x \le e_2$ simply reproduce these
same latter constraints.

Figure 2: The first level relaxation using RLT.

Examining the products of $6x_1 + 3x_2 \ge 2$ with these first level factors
yields nonredundant inequalities when the factors $(1 - x_1)$ and $(1 - x_2)$ are
used, generating the constraints $2x_1 + 3x_2 \ge 2$ and $3x_1 + x_2 \ge 1$, respectively.
Hence, we obtain the first level relaxation (directly in projected form in this
case) as

$X_{P_1} = \{x : 2x_1 + 3x_2 \ge 2, \ 3x_1 + x_2 \ge 1, \ x_1 + x_2 \le 1, \ x \ge 0\}.$

Figure 2 depicts the region $X_{P_1}$. However, using SSRLT, by the above
argument, we would obtain $\bar{X}_{P_1} \equiv \text{conv}(X) \equiv \{x : x_1 + x_2 = 1, \ x \ge 0\}$,
which is a strict subset of $X_{P_1}$.
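
As a quick numerical illustration of this strict dominance (our sketch, assuming the scipy package is available), minimizing $x_1 + x_2$ over the two level-1 regions of Example 7.1 reproduces the gap between $X_{P_1}$ and conv(X):

    from scipy.optimize import linprog

    c = [1, 1]
    # RLT level-1 region: 2x1+3x2 >= 2, 3x1+x2 >= 1, x1+x2 <= 1,
    # written as A_ub x <= b_ub (the >= rows are negated).
    rlt = linprog(c, A_ub=[[-2, -3], [-3, -1], [1, 1]], b_ub=[-2, -1, 1],
                  bounds=[(0, 1), (0, 1)])
    # SSRLT level-1 region = conv(X): x1 + x2 = 1, 0 <= x <= 1.
    ssrlt = linprog(c, A_eq=[[1, 1]], b_eq=[1], bounds=[(0, 1), (0, 1)])
    print(rlt.fun, ssrlt.fun)   # 5/7 = 0.714... versus 1.0
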
A second point to note is that we could have written the generalized
upper bounding inequality $e_n \cdot x \le 1$ as an equality $e_n \cdot x + x_{n+1} = 1$ by
introducing a slack variable $x_{n+1}$, and then recognizing the binariness of
this slack variable, we could have used this additional variable in composing
the bound-factor products while applying RLT. Although this would have
produced the same relaxation as with SSRLT, the process would have gen-
erated several more redundant constraints while applying the factors $x_j$ and
$(1 - x_j)$ for $j = 1, \ldots, n+1$, to the constraints, as opposed to using the fewer
factors $(1 - e_n \cdot x)$ and $x_j, \ j = 1, \ldots, n$, as needed by SSRLT. However, in
more general cases of the set S, such a transformation that yields the same
representation using RLT as obtained via SSRLT may not be accessible.
(See Example 8.2 below, for instance.)

Example 7.2. Next, let us consider the case of q = 2 in (46). For the
sake of illustration, suppose that

$S = \{x \in R^5 : x_1 + x_2 + x_3 \le 1, \ x_3 + x_4 + x_5 \le 1, \text{ and } x \ge 0\}.$    (48)

(a) S-factors at level 1: These are simply the constraints defining S.

(b) S-factors at level 2: As before, the pairwise products within each
GUB set of variables will reproduce the same factors as at the first
level, since (47) holds true within each GUB set. However, across the
two GUB sets, we would produce the nonnegative quadratic bound
factor products $x_1x_4, \ x_1x_5, \ x_2x_4$, and $x_2x_5$ along with the following
factor products, recognizing that any quadratic product involving $x_3$
is zero, as this variable appears in both GUB sets:

$x_4 \cdot (1 - x_1 - x_2 - x_3) \ge 0$ yielding $x_4 - x_1x_4 - x_2x_4 \ge 0$,
$x_5 \cdot (1 - x_1 - x_2 - x_3) \ge 0$ yielding $x_5 - x_1x_5 - x_2x_5 \ge 0$,
$x_1 \cdot (1 - x_3 - x_4 - x_5) \ge 0$ yielding $x_1 - x_1x_4 - x_1x_5 \ge 0$,
$x_2 \cdot (1 - x_3 - x_4 - x_5) \ge 0$ yielding $x_2 - x_2x_4 - x_2x_5 \ge 0$,
and
$(1 - x_1 - x_2 - x_3) \cdot (1 - x_3 - x_4 - x_5) \ge 0$ yielding
$1 - x_1 - x_2 - x_3 - x_4 - x_5 + x_1x_4 + x_1x_5 + x_2x_4 + x_2x_5 \ge 0.$

These can now be applied to the constraints defining X, recognizing
the terms that have been identified to be zeros.

(c) S-factors at levels $\ge 3$: Since there are only 2 GUB sets in this exam-
ple, and since any triple product of distinct factors must involve a pair
of factors coming from the defining constraints corresponding to the
same GUB set, and the latter product is zero, all such products must
vanish. Hence, all factors at level 3, and similarly at levels 4 and 5,
coincide with those at level 2. In other words, the relaxation at level
2 (defined as $\bar{X}_2$) itself yields the convex hull representation.

In general, the level equal to the independence number of the underlying in-
tersection graph corresponding to the GUB constraints, which simply equals
the maximum number of variables that can simultaneously be 1, is sufficient
to generate the convex hull representation. In the case of (46), the convex
hull representation would be obtained at level q, or earlier.
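
The independence-number bound is easy to check by enumeration for the two-GUB set in (48); the following sketch (ours, not from the chapter) confirms that at most two variables can equal 1 simultaneously, consistent with level 2 sufficing here.

    # Largest number of simultaneously-1 variables for the GUB system (48).
    from itertools import product

    feasible = [x for x in product((0, 1), repeat=5)
                if x[0] + x[1] + x[2] <= 1 and x[2] + x[3] + x[4] <= 1]
    print(max(sum(x) for x in feasible))   # prints 2
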

An enlightening special case of (46) deals with the vertex packing prob-
lem. Given a graph G = (V, E) with vertex set $V = \{v_1, v_2, \ldots, v_n\}$, an
edge set E connecting pairs of vertices in V, and a weight $c_j$ associated
with each vertex $v_j$, the vertex packing problem is to select a maximum
weighted subset of vertices such that no two vertices are connected by an
edge. By denoting, for each $j = 1, \ldots, n$, the binary variable $x_j$ to equal 1
if vertex j is chosen and 0 otherwise, the vertex packing problem can be
stated as maximize $\{\sum_j c_j x_j : x_i + x_j \le 1 \ \forall (i, j) \in E, \ x \text{ binary}\}$. The
convex hull representation over any subset P of the variables can be obtained
as above, by considering any clique cover of the subgraph induced by the
corresponding vertices, with each set $N_i$ corresponding to the variables
defining some clique i. In fact, given a cover that has q cliques where each
edge of E is included in some clique, the S-factors of level q themselves
define the convex hull representation, since their products with the packing
constraints, as well as with the nonnegativity restrictions on x, are implied
by these factors. To illustrate, the inequalities of (48) can be considered as
a (maximum cardinality) clique cover of the vertex packing problem on the
graph in Figure 3, and so, the stated S-factors of level 2 themselves define
the convex hull representation. This general packing observation may have
widespread applicability since, as noted by Garfinkel and Nemhauser (1973),
any finite integer linear program can be reformulated as a packing problem.

Figure 3: Vertex packing graph.
To illustrate the computational benefits of SSRLT over RLT in this
particular context, we conducted the following experiment using pseudo-
randomly generated set packing problems of the type maximize $\{\sum_{j=1}^{n} c_j x_j : \sum_{j \in N_i} x_j \le 1 \ \forall i = 1, \ldots, q, \ x \text{ binary}\}$.

For several instances of such problems, we computed the optimal value
of the 0-1 packing problem, that of its ordinary LP relaxation, as well as the
optimal values of the first level relaxations produced by applying RLT and
SSRLT, where the latter was generated by using all of the defining clique
constraints, together with $x \ge 0$, to represent the set S. Table 1 gives the
percentage gaps obtained for the latter three upper bounds with respect
to the optimal 0-1 value, along with the sizes of the respective relaxations
(m' = number of constraints, n' = number of variables), and the number
of simplex iterations (Iter) needed by the OSL solver version 2.001 to
achieve optimality. Note that for all instances, the first level application
of SSRLT was sufficient to solve the underlying integer program. On the
other hand, although RLT appreciably improved the upper bound produced
by the ordinary LP relaxation, it still left a significant gap that remains
to be resolved in these problem instances. Moreover, the relatively simpler
structure of SSRLT results in far fewer simplex iterations being required to
solve this relaxation as compared with the effort required to solve the RLT
relaxation.

Table 1: Computational results for set packing problems.

                  Ordinary LP relaxation     Level-one via RLT         Level-one via SSRLT
  (q,n)   Density   % gap    Iter           (m',n')   % gap   Iter    (m',n')   % gap   Iter
 (15,25)     66      42.0     22           (170,25)    16.8    160   (123,25)     0      35
 (20,30)     56      26.7     36           (319,30)     7.2    267   (214,30)     0      32
 (25,35)     52      57.3     43           (481,35)    20.2    152   (328,35)     0     113
 (55,45)     27      37.0    103          (2256,63)     2.0    344  (1896,63)     0     171
 (35,35)     56      36.5     48           (607,35)    10.7    292   (631,35)     0      60

7.2 Variable Upper Bounding Constraints


This example points out that in the presence of variable upper bounding
(VUB) types of restrictions, a further tightening of relaxations via SSRLT,
beyond that of RLT, can be similarly produced.

Example 7.3. Consider a set S that is composed as follows in a partic-
ular problem instance:

$S = \{x : 0 \le x_1 \le x_2 \le x_3 \le 1, \ 0 \le x_4 \le x_5 \le 1, \ 0 \le x_6 \le 1\}.$

The first level factors for this instance are given by $x_1 \ge 0, \ x_2 - x_1 \ge 0,$
$x_3 - x_2 \ge 0, \ 1 - x_3 \ge 0, \ x_4 \ge 0, \ x_5 - x_4 \ge 0, \ 1 - x_5 \ge 0, \ x_6 \ge 0$, and
$1 - x_6 \ge 0$. Compared with the RLT factors, these yield tighter constraints
as they imply the RLT factors. For $d \in \{1, \ldots, 6\}$, taking these factors d at
a time, including self-products, and simplifying these factors by eliminating
null or implied factors, would produce the relaxation $\bar{X}_d$.

It is interesting to note in this connection that the VUB constraints
of the type $0 \le x_1 \le x_2 \le \cdots \le x_k \le 1$ used in this example can
be equivalently transformed into a GUB constraint via the substitution
$z_j = x_j - x_{j-1}$ for $j = 1, \ldots, k$, where $x_0 \equiv 0$. The inverse transforma-
tion yields $x_j = \sum_{t=1}^{j} z_t$ for $j = 1, \ldots, k$, thereby producing the equivalent
representation $z_1 + z_2 + \cdots + z_k \le 1, \ z \ge 0$. Under this transformation, and
imposing binary restrictions on all the z-variables, the reformulation strate-
gies described in Section 7.1 above can be employed. However, note that
the process of applying RLT to the original or to the transformed problem
can produce different representations. This is illustrated next.
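
A small sketch (ours) of this substitution and its inverse, checked on all monotone binary vectors for k = 3:

    # VUB-to-GUB transformation z_j = x_j - x_{j-1} and its inverse
    # x_j = z_1 + ... + z_j, verified on 0 <= x1 <= x2 <= x3 <= 1.
    from itertools import product

    def to_z(x):
        return tuple(x[j] - (x[j - 1] if j else 0) for j in range(len(x)))

    def to_x(z):
        return tuple(sum(z[:j + 1]) for j in range(len(z)))

    for x in product((0, 1), repeat=3):
        if list(x) == sorted(x):                # monotone: x1 <= x2 <= x3
            z = to_z(x)
            assert to_x(z) == x and sum(z) <= 1 and all(v >= 0 for v in z)
    print("each monotone x maps to a GUB-feasible z and back")
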

Example 7.4. Consider the set

$X = \{(x_1, x_2) : 1 + 6x_1 - 3x_2 \ge 0, \ 0 \le x_1 \le x_2 \le 1, \ x \text{ binary}\}.$

The convex hull of feasible solutions is given by $0 \le x_1 = x_2 \le 1$ (see Figure
4(a)). This representation is produced by the level-1 SSRLT relaxation using
the VUB constraints to define S, where the relevant constraint $x_1 \ge x_2$, which
yields $x_1 = x_2$, is obtained by noting that the factor products $[x_1(x_2 - x_1)]_L \ge 0$
and $[x_1(1 - x_2)]_L \ge 0$ respectively give $w_{12} \ge x_1$ and $w_{12} \le x_1$, or that
$w_{12} = x_1$. This together with the constraint $[(x_2 - x_1)(1 + 6x_1 - 3x_2)]_L \ge 0$
yields $-2(x_2 - x_1) \ge 0$, or that $x_1 \ge x_2$.
On the other hand, constructing RLT at level 1 by applying the factors
$x_j$ and $(1 - x_j), \ j = 1, 2$, to the inequality restrictions of X, produces the
relaxation (directly in projected form)

$X_{P_1} = \{x : 3x_2 \le 1 + 2x_1, \ x_2 \le 3x_1, \ 0 \le x_1 \le x_2 \le 1\},$

where $w_{12} = x_1$ is produced as with SSRLT, and where the first two con-
straints defining $X_{P_1}$ result from the product constraints $[(1 + 6x_1 - 3x_2)(1 - x_1)]_L \ge 0$ and $[(1 + 6x_1 - 3x_2)x_2]_L \ge 0$, respectively. Figure 4(a) depicts the
region defined by this relaxation.
However, if we were to apply the transformation $z_1 = x_1, \ z_2 = x_2 - x_1$ to
X, where the inverse transformation is given by $x_1 = z_1$ and $x_2 = z_1 + z_2$,
the problem representation in z-space becomes

$Z = \{(z_1, z_2) : 1 + 3z_1 - 3z_2 \ge 0, \ z_1 + z_2 \le 1, \ z \ge 0, \ z \text{ binary}\},$

where the binariness on $z_2$ has been additionally recognized. Figure 4(b)
illustrates that the set conv(Z) is given by the constraints $0 \le z_1 \le 1, \ z_2 = 0$.
Figure 4: Depiction of the first level relaxations using RLT for Example 7.4.
(In (a), conv(X) is the segment $0 \le x_1 = x_2 \le 1$ inside $X_{P_1}$; in (b),
$Z_{P_1} = \text{conv}(Z)$ in the $(z_1, z_2)$-space.)

Now, applying RLT to this transformed space, the relevant constraint $z_2 = 0$
is produced via $z_2 \ge 0$ and the first level product constraint $[(1 + 3z_1 - 3z_2)z_2]_L \ge 0$, which yields $-2z_2 \ge 0$, where $[z_1z_2]_L \equiv 0$ from $[z_1z_2]_L \ge 0$ and
$[z_1(1 - z_1 - z_2)]_L \ge 0$. Hence, for this transformed problem, RLT produces
the first level relaxation $Z_{P_1} = \text{conv}(Z)$, while we had $X_{P_1} \supset \text{conv}(X)$ when
applying RLT to the original problem. However, as with Example 7.1 for
the case q = 1 (treating that as the transformed z-variable problem), we
could have possibly obtained $Z_{P_1} \supset \text{conv}(Z)$ as well, whereas SSRLT would
necessarily produce the convex hull representation at level one in either case.

7.3 Sparse Constraints

In this example, we illustrate how one can exploit problem sparsity. Sup-
pose that in some 0-1 mixed-integer problem, we have the knapsack con-
straint (either inherent in the problem or implied by it) given by $2x_1 + x_2 + 2x_3 \ge 3$. The facets of the convex hull of $\{(x_1, x_2, x_3) : 2x_1 + x_2 + 2x_3 \ge 3, \ x_i \text{ binary}, \ i = 1, 2, 3\}$ can be readily obtained as $\{x_1 + x_2 + x_3 \ge 2, \ x_1 \le 1, \ x_2 \le 1, \ x_3 \le 1\}$. Similarly, another knapsack constraint might be of the
type $x_4 + 2x_5 + 2x_6 \le 2$, and the corresponding facets of the convex hull
of feasible 0-1 solutions can be obtained as $\{x_4 + x_5 + x_6 \le 1, \ x_4 \ge 0, \ x_5 \ge 0, \ x_6 \ge 0\}$. The set S can now be composed of these two sets of facets,
along with other similar constraints involving the remaining variables on
which binariness is being enforced, including perhaps, simple bound con-
straint factors. Note that in order to generate valid tighter relaxations, we
can simply enforce binariness on variables that fractionate in the original
linear programming relaxation in the present framework. Furthermore, en-
tire convex hull representations of underlying knapsack polytopes are not
necessary - simply, the condition (36) needs to be satisfied, perhaps by
explicitly including simple bounding constraints. This extends the strategy
of Crowder et al. (1983) in using facets obtained as liftings of minimal cov-
ers from knapsack constraints within this framework, in order to generate
tighter relaxations.
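
Under the inequality directions as read above, the validity of the two facet families is easy to confirm by brute force; the following check is ours, not the chapter's.

    # Every binary solution of 2x1 + x2 + 2x3 >= 3 satisfies x1+x2+x3 >= 2,
    # and every binary solution of x4 + 2x5 + 2x6 <= 2 satisfies x4+x5+x6 <= 1.
    from itertools import product

    assert all(sum(x) >= 2 for x in product((0, 1), repeat=3)
               if 2*x[0] + x[1] + 2*x[2] >= 3)
    assert all(sum(x) <= 1 for x in product((0, 1), repeat=3)
               if x[0] + 2*x[1] + 2*x[2] <= 2)
    print("both facet inequalities are valid on the respective 0-1 sets")
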

8 Using Conditional Logic to Strengthen RLT Constraints: Application to the Traveling Salesman Problem
In all of the foregoing discussion, depending on the structure of the problem,
there is another idea that we can exploit to even further tighten the RLT
constraints that are generated. This trick deals with the use of conditional
logic in the generation of RLT based constraints, which can enable the tight-
ening of a relaxation at any level that would otherwise have been possible
only at some higher level in the hierarchy. In effect, this captures within the
RLT process the concepts of branching and logical preprocessing, features
that are critical to the efficient solution of discrete optimization problems.

To introduce the basic concept involved, for simplicity, consider the fol-
lowing first level SSRLT constraint that is generated by multiplying a factor
$(\alpha x - \beta) \ge 0$ with a constraint $(\gamma x - \delta) \ge 0$, where x is supposed to be
binary valued, and where the data is all-integer. (Similar extensions can
be developed for mixed-integer constraints, as well as for higher-order SS-
RLT constraints in which some factor is being applied to some other valid
constraint- or bound-factor product of order greater than one.)

$[(\alpha x - \beta)(\gamma x - \delta)]_L \ge 0.$    (49)

Observe that if $\alpha x = \beta$, then $(\alpha x - \beta)(\gamma x - \delta) \ge 0$ is valid regardless of
the validity of $\gamma x \ge \delta$. Otherwise, we must have $\alpha x \ge \beta + 1$ (or possibly
greater than $\beta + 1$, if the structure of $\alpha x \ge \beta$ so permits), and we can
then perform standard logical preprocessing tests (zero-one fixing, coefficient
reduction, etc. - see Nemhauser and Wolsey, 1988, for example) on the
set of constraints $\alpha x \ge \beta + 1, \ \gamma x \ge \delta$, x binary, along with possibly other
constraints, to tighten the form of $\gamma x \ge \delta$ to the form $\gamma' x \ge \delta'$. For example,
if $\alpha x \ge \beta$ is of the type $(1 - x_j) \ge 0$, for some $j \in \{1, \ldots, n\}$, then the
restriction $\alpha x \ge \beta + 1$, x binary, asserts that $x_j = 0$, and so, $\gamma x \ge \delta$
can be tightened under the condition that $x_j = 0$. (Similarly, in a higher-
order constraint, if a factor $F_d(J_1, J_2)$ multiplies $\gamma x \ge \delta$, then the latter
constraint can be tightened under conditional logical tests based on setting
$x_j = 1 \ \forall j \in J_1$ and $x_j = 0 \ \forall j \in J_2$.)
Additionally, the resulting constraint $\gamma' x \ge \delta'$ can potentially be further
tightened by finding the maximum $\theta \ge 0$ for which $\gamma' x \ge \delta' + \theta$ is valid when
$\alpha x \ge \beta + 1$ is imposed, by considering the problem

$\theta = -\delta' + \min\{\gamma' x : \gamma' x \ge \delta', \ \alpha x \ge \beta + 1, \text{ any other valid inequalities}, \ x \text{ binary}\},$    (50)

and by increasing $\delta'$ by this quantity $\theta$. Note that, of course, we can simply
solve the continuous relaxation of (50) and use the resulting value after
rounding it upwards (using $\theta = 0$ if this value is negative), in order to
impose the following SSRLT constraint, in lieu of the weaker restriction
(49), within the underlying problem:

$[(\alpha x - \beta)(\gamma' x - \delta' - \theta)]_L \ge 0.$    (51)

Observe that this also affords the opportunity to now tighten the factor
$(\alpha x - \beta)$ in a similar fashion in (51), based on the valid constraint $\gamma' x \ge \delta' + \theta$. Interestingly, the sequential lifting process studied by Balas and
Zemel (1978) for lifting a minimal cover inequality into a facet for a full-
dimensional knapsack polytope can be viewed as a consequence of this RLT
construct (see Sherali et al., 1996).
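
The following Python sketch (ours) mimics the computation of $\theta$ in (50) by brute force over binary points rather than via the LP relaxation; the function name and interface are our own illustration, not the chapter's.

    # theta = max(0, -delta + min{g.x : a.x >= beta+1, other valid ineqs,
    # x binary}), as in (50), enumerated over {0,1}^n for small n.
    from itertools import product

    def theta(a, beta, g, delta, valid, n):
        pts = [x for x in product((0, 1), repeat=n)
               if sum(ai * xi for ai, xi in zip(a, x)) >= beta + 1
               and all(v(x) for v in valid)]
        if not pts:                    # a.x >= beta+1 unattainable
            return None
        return max(0, min(sum(gi * xi for gi, xi in zip(g, x))
                          for x in pts) - delta)

    # Example 8.1 flavor: factor x3 >= 0, constraint 2x1+3x2+3x3 >= 4,
    # with the knapsack itself retained among the "other valid inequalities":
    knap = lambda x: 2*x[0] + 3*x[1] + 3*x[2] >= 4
    print(theta((0, 0, 1), 0, (2, 3, 3), 4, [knap], 3))   # prints 1
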

Example 8.1. To illustrate, consider the following knapsack constraint
in binary variables $x_1, x_2$, and $x_3$: $2x_1 + 3x_2 + 3x_3 \ge 4$. Let us examine the
RLT constraint

$[x_3(2x_1 + 3x_2 + 3x_3 - 4)]_L \ge 0.$    (52)

Applying the foregoing idea, we can tighten (52) under the restriction that
$x_3 = 1$, knowing that it is always valid when $x_3 = 0$ regardless of the
nonnegativity of any expression contained in $[\cdot]$. However, when $x_3 = 1$,
the given knapsack constraint becomes $2x_1 + 3x_2 \ge 1$, which, by coefficient
reduction, can be tightened to $x_1 + x_2 \ge 1$. Hence, (52) can be replaced by
the tighter restriction

$[x_3(x_1 + x_2 - 1)]_L \ge 0.$    (53)

Similarly, consider the RLT constraint

$[(1 - x_3)(2x_1 + 3x_2 + 3x_3 - 4)]_L \ge 0.$    (54)

This time, imposing $(1 - x_3) \ge 1$, or $x_3 = 0$, the knapsack constraint becomes
$2x_1 + 3x_2 \ge 4$, which implies via standard logical tests that $x_1 = x_2 = 1$.
Hence, we can impose the equalities

$[(1 - x_3)(1 - x_1)]_L = 0 \ \text{ and } \ [(1 - x_3)(1 - x_2)]_L = 0$    (55)

in lieu of (54), which is now implied. Observe that in this example, the
sum of the RLT constraints in (53) and (55) yields $x_1 + x_2 + x_3 \ge 2$, which
happens to be a facet of the knapsack polytope conv$\{x : 2x_1 + 3x_2 + 3x_3 \ge 4, \ x \text{ binary}\}$. This facet can alternatively be obtained by lifting the minimal
cover inequality $x_1 + x_2 \ge 1$ as in Balas and Zemel (1978).
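
The closing facet claim of Example 8.1 is easily verified by enumeration (a check of ours, not from the chapter): the inequality is valid over all binary solutions and is tight at three affinely independent points.

    from itertools import product

    sols = [x for x in product((0, 1), repeat=3)
            if 2*x[0] + 3*x[1] + 3*x[2] >= 4]
    assert all(sum(x) >= 2 for x in sols)        # validity of x1+x2+x3 >= 2
    print([x for x in sols if sum(x) == 2])      # (1,1,0), (1,0,1), (0,1,1)
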

Example 8.2. To illustrate the use of the foregoing conditional logic
based tightening procedure in the context of solving an optimization problem
using general S-factors, consider the following problem:

Minimize $\{x_1 + x_2 : 6x_1 + 3x_2 \ge 2, \ x \in S, \ x \text{ binary}\}$    (56)

where $S = \{(x_1, x_2) : 2x_1 + x_2 \le 2, \ x_1 + 2x_2 \le 2, \ x \ge 0\}$.

Figure 5 depicts this problem graphically. The integer problem has the op-
timal value v(IP) = 1, attained at the solution (1, 0) or (0, 1). The ordinary
LP relaxation has the optimal value v(LP) = 1/3, attained at the solution
(1/3, 0).
Next, let us consider the first level relaxation (RLT-1) produced by RLT,
using the factors $x_j$ and $(1 - x_j)$, j = 1, 2, to multiply all the problem
constraints. Note that $[(2 - 2x_1 - x_2)x_1]_L \ge 0$ yields $w_{12} \le 0$, which together
with $[x_1x_2]_L \equiv w_{12} \ge 0$ gives $w_{12} = 0$. The other non-redundant constraints
defining this relaxation are produced via the product constraints $[(1 - x_1)(1 - x_2)]_L \ge 0$, $[(6x_1 + 3x_2 - 2)(1 - x_1)]_L \ge 0$, and $[(6x_1 + 3x_2 - 2)(1 - x_2)]_L \ge 0$,
which respectively yield (using $w_{12} = 0$) $x_1 + x_2 \le 1$, $2x_1 + 3x_2 \ge 2$,
and $3x_1 + x_2 \ge 1$. This gives (directly in projected form)

$X_{P_1} = \{x : 2x_1 + 3x_2 \ge 2, \ 3x_1 + x_2 \ge 1, \ x_1 + x_2 \le 1, \ x \ge 0\}.$

Figure 5 depicts this region. The optimal value using this relaxation is
given by v(RLT-1) = 5/7, attained at the solution (1/7, 4/7).
On the other hand, to construct the first level relaxation (SSRLT-1)
produced by SSRLT, we would employ the S-factors given by the constraints
in (56). As in RLT, we obtain $w_{12} = 0$, and using this, the other non-
redundant constraints defining this relaxation are produced via the product
constraints

$[(2 - 2x_1 - x_2)(2 - x_1 - 2x_2)]_L \ge 0,$
$[(6x_1 + 3x_2 - 2)(2 - 2x_1 - x_2)]_L \ge 0, \ \text{and}$
$[(6x_1 + 3x_2 - 2)(2 - x_1 - 2x_2)]_L \ge 0,$
Figure 5: Depiction of the various first level RLT relaxations. (Here B =
(1/7, 4/7) and C = (1/6, 2/3); region ABD = $X_{P_1}$, region ACD = $\bar{X}_{P_1}$,
and segment AD = conv(X) = the enhanced relaxation.)

which respectively yield $x_1 + x_2 \le 1$, $4x_1 + 5x_2 \ge 4$, and $2x_1 + x_2 \ge 1$. This
gives (directly in projected form)

$\bar{X}_{P_1} = \{x : 4x_1 + 5x_2 \ge 4, \ 2x_1 + x_2 \ge 1, \ x_1 + x_2 \le 1, \ x \ge 0\}.$

Figure 5 depicts this region. Note that $\bar{X}_{P_1} \subset X_{P_1}$ and that the optimal
value of this relaxation is given by v(SSRLT-1) = 5/6 > v(RLT-1), and is
attained at the solution (1/6, 2/3).
Now, consider an enhancement of SSRLT-1 using conditional logic (a
similar enhancement can be exhibited for RLT-1). Specifically, consider the
product constraint $[(2 - 2x_1 - x_2)(6x_1 + 3x_2 - 2)]_L \ge 0$ of the form (49).
Imposing $(2 - 2x_1 - x_2) \ge 1$ as for (49), i.e., $2x_1 + x_2 \le 1$, yields $x_1 = 0$ by a
standard logical test. This together with $6x_1 + 3x_2 \ge 2$ implies that $x_2 = 1$.
Hence, the tightened form of this constraint is $(2 - 2x_1 - x_2)(x_1) = 0$ and
$(2 - 2x_1 - x_2)(1 - x_2) = 0$. The first constraint yields $w_{12} = 0$ (as before),
while the second constraint states that $x_1 + x_2 = 1$. This produces the
convex hull representation, and so, this enhanced relaxation now recovers
an optimal integer solution.

8.1 Application to the Traveling Salesman Problem


In this example, we demonstrate the potential utility of the various concepts
developed in Sections 6-8 by applying them to the celebrated traveling sales-
man problem (TSP). We assume the case of a general (asymmetric) TSP
defined on a totally dense graph (see Lawler et al., 1985, for example). For
this problem, Desrochers and Laporte (1991) have derived a strengthened
version of the Miller-Tucker-Zemlin (MTZ) formulation obtained by lifting
the MTZ subtour elimination constraints into facets of the underlying TSP
polytope. While the traditional MTZ formulation of TSP is well known to
yield weak relaxations, Desrochers and Laporte exhibit computationally that
their lifted-MTZ formulation significantly tightens this representation. We
show below that an application of SSRLT concepts to the MTZ formulation
of TSP, used in concert with the conditional logic based strengthening pro-
cedure, automatically recovers the formulation of Desrochers and Laporte.
Toward this end, consider the following statement of the asymmetric
traveling salesman problem, where $x_{ij} = 1$ if the tour proceeds from city i
to city j, and is 0 otherwise, for all $i, j = 1, \ldots, n, \ i \ne j$.

ATSP: Minimize $\sum_i \sum_{j \ne i} c_{ij} x_{ij}$

subject to: $\sum_{j \ne i} x_{ij} = 1 \quad \forall i = 1, \ldots, n$    (57)

$\sum_{i \ne j} x_{ij} = 1 \quad \forall j = 1, \ldots, n$    (58)

$u_j \ge (u_i + 1) - (n - 1)(1 - x_{ij}) \quad \forall i, j \ge 2, \ i \ne j$    (59)

$1 \le u_j \le (n - 1) \quad \forall j = 2, \ldots, n$    (60)

$x_{ij} \text{ binary} \quad \forall i, j = 1, \ldots, n, \ i \ne j.$    (61)

Note that (57), (58), and (61) represent the assignment constraints, and (59) and
(60) are the MTZ subtour elimination constraints. These latter constraints
are derived based on letting $u_j$ represent the rank order in which city j is
visited, using $u_1 = 0$, and enforcing that $u_j = u_i + 1$ whenever $x_{ij} = 1$ in any
binary feasible solution. Now, in order to construct a suitable reformulation
using SSRLT, let us compose the set S as follows, and include this set of
implied inequalities within the problem ATSP stated above:

$S \equiv \{x : x_{ij} + x_{ji} \le 1 \ \forall i, j \ge 2, \ i < j; \ \ x_{1j} + x_{j1} \le 1 \ \forall j \ge 2;$
$\ \ x_{ij} \ge 0 \ \forall i, j, \ i \ne j\}.$    (62)
Note that S is comprised of simple two-city subtour elimination constraints
of the form proposed by Dantzig, Fulkerson, and Johnson (1954). Next,
let us construct the following selected S-factor constraint products. First,
consider the product constraint generated by multiplying (59) with the S-
factor $x_{ij}$. Using conditional logic as with (49), and noting that we can
impose $u_j = (u_i + 1)$ when $x_{ij} = 1$, this yields the constraint

$[x_{ij}(u_j - u_i - 1)]_L = 0 \quad \forall i, j \ge 2, \ i \ne j.$    (63)

Similarly, considering $[x_{1j}(u_j - 1)]_L \ge 0$ from (60), and enhancing this by
the conditional logic that $u_j = 1$ when $x_{1j} = 1$, we get

$x_{1j}u_j = x_{1j} \quad \forall j = 2, \ldots, n.$    (64)

Repeating this with the upper bounding constraint in (60), we can enhance
$[x_{j1}(n - 1 - u_j)]_L \ge 0$ to the following constraint, noting that $u_j = (n - 1)$
if $x_{j1} = 1$:

$x_{j1}u_j = (n - 1)x_{j1} \quad \forall j = 2, \ldots, n.$    (65)
Next, let us consider the product of (59) with the S-factor $(1 - x_{ij} - x_{ji}) \ge 0$.
This gives the constraint $[(1 - x_{ij} - x_{ji})(u_j - u_i - 1 + (n - 1)(1 - x_{ij}))]_L \ge 0$.
Using $x_{ij}^2 = x_{ij}$ and $x_{ij}x_{ji} = 0$ (since $x_{ij} + x_{ji} \le 1$), the foregoing constraint
becomes

$u_j - u_i - 1 + (n - 1)(1 - x_{ij}) - [(u_j - u_i)x_{ij}]_L + x_{ij} - [(u_j - u_i)x_{ji}]_L + x_{ji} - (n - 1)x_{ji} \ge 0.$

From (63), we get $(u_j - u_i)x_{ij} = x_{ij}$, and upon interchanging i and j,
this yields $(u_j - u_i)x_{ji} = -x_{ji}$. Substituting this into the foregoing SSRLT
constraint, we obtain the valid inequality

$u_j \ge (u_i + 1) - (n - 1)(1 - x_{ij}) + (n - 3)x_{ji} \quad \forall i, j \ge 2, \ i \ne j.$    (66)

Similarly, multiplying (60) with the S-factor $(1 - x_{1j} - x_{j1}) \ge 0$ yields
$[(1 - x_{1j} - x_{j1})(u_j - 1)]_L \ge 0$ and $[(1 - x_{1j} - x_{j1})(n - 1 - u_j)]_L \ge 0$. Using
the conditional logic procedure, under $x_{1j} = x_{j1} = 0$, these constraints can
be respectively tightened to $[(1 - x_{1j} - x_{j1})(u_j - 2)]_L \ge 0$ and $[(1 - x_{1j} - x_{j1})(n - 2 - u_j)]_L \ge 0$. Simplifying these products and using (64) and (65)
yields the constraints

$1 + (1 - x_{1j}) + (n - 3)x_{j1} \le u_j \le (n - 1) - (1 - x_{j1}) - (n - 3)x_{1j}.$    (67)


Observe that (66) and (67) are tightened versions of (59) and (60), re-
spectively, and are precisely the facet-defining, lifted-MTZ constraints de-
rived by Desrochers and Laporte (1991). Hence, we have shown that a
rived by Desrochers and Laporte (1991). Hence, we have shown that a
selected application of SSRLT used in concert with our conditional logic
based strengthening procedure automatically generates this improved MTZ
formulation. Sherali and Driscoll (1997) have developed further enhance-
ments that tighten and subsume this formulation for both the ordinary as
well as for the precedence constrained version of the asymmetric traveling
salesman problem, using these SSRLT constructs along with conditional
logic implications.
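
As a sanity check (ours, not from the chapter), one can enumerate all tours for a small n and verify the lifted constraints (66) and (67) directly:

    # For every tour on n = 5 cities, set x from the tour and u_j = rank of
    # city j (with u_1 = 0), and check (66) and (67).
    from itertools import permutations

    n = 5
    for order in permutations(range(2, n + 1)):      # tours starting at city 1
        tour = (1,) + order
        u = {j: tour.index(j) for j in tour}         # u_1 = 0, ranks 1..n-1
        x = {(tour[k], tour[(k + 1) % n]): 1 for k in range(n)}
        g = lambda i, j: x.get((i, j), 0)
        for i in range(2, n + 1):
            for j in range(2, n + 1):
                if i != j:   # constraint (66)
                    assert u[j] >= u[i] + 1 - (n - 1)*(1 - g(i, j)) \
                                   + (n - 3)*g(j, i)
        for j in range(2, n + 1):   # constraint (67)
            assert 1 + (1 - g(1, j)) + (n - 3)*g(j, 1) <= u[j]
            assert u[j] <= (n - 1) - (1 - g(j, 1)) - (n - 3)*g(1, j)
    print("(66) and (67) hold on all tours for n = 5")
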

9 Conclusions and Extensions: Persistency, General Integer Programs, and Computational Expedients
The hierarchy of relaxations emerging from the RLT can be intuitively
viewed as "stepping stones" between continuous and discrete sets, leading
from the usual linear programming relaxation to the convex hull at level
n. By inductively progressing along these stepping-stone formulations, we
have studied some novel persistency issues for certain constrained and un-
constrained pseudo-Boolean programming problems. Given the tight linear
programming relaxations afforded by RLT, a pertinent question that can be
raised is that if we solve a particular dth level representation in the RLT hier-
archy, and some of the n variables turn out to be binary valued at optimality
to the underlying linear program, then can we expect these binary values
to "persist" at optimality to the original problem? Adams, Lassiter and
Sherali (1993) derive sufficient conditions in terms of the dual solution that
guarantee such a persistency result. For the unconstrained pseudo-Boolean
program, we show that for d = 1 or for $d \ge n - 2$, persistency always holds.
However, using an example with d = 2 and n = 5, we show that without
the additional prescribed sufficient conditions, persistency will not hold in
general. These results are also extended to constrained polynomial 0-1 pro-
gramming problems. In particular, the analysis here reveals a class of 0-1
linear programs that possess the foregoing persistency property. Included
within this class as a special case is the popular vertex packing problem,
shown earlier in the literature to possess this property.
Thus far, we have been confining our attention to binary discrete vari-
ables. Conceptually, for a more general discrete integer program, one could

employ a binary transformation to rewrite the problem as a 0-1 mixed-


integer program, and then apply the foregoing RLT constructs. However,
Adams and Sherali (1997) have shown that a more novel direct approach
can lead to more insightful and compact representations that are nonethe-
less equivalent to those obtained via the usual RLT process applied to the
transformed binary problem.
To see the basic form of this approach, consider the following feasible
region of a discrete mixed-integer program:

$X = \{(x, y) \in R^n \times R^m : Ax + Dy \ge b,$    (68)
$\quad x_j \in S_j \equiv \{\theta_{jk}, \ k = 1, \ldots, k_j\} \ \forall j = 1, \ldots, n,$    (69)
$\quad y \ge 0\},$    (70)

where $\theta_{jk}, \ k = 1, \ldots, k_j, \ j = 1, \ldots, n$, are discrete real numbers (of either
sign). As a generalization of the bound factors $x_j$ and $(1 - x_j)$ when $x_j$ is
binary, let us define the Lagrange Interpolation Polynomials (LIP)

$L_{jk} = \prod_{p \ne k} \dfrac{(x_j - \theta_{jp})}{(\theta_{jk} - \theta_{jp})} \quad \forall k = 1, \ldots, k_j, \ j = 1, \ldots, n.$    (71)

The RLT process employs LIP factors of order d composed by selecting
some d distinct indices j, and for each selected index, choosing some LIP
$L_{jk}, \ k \in \{1, \ldots, k_j\}$. The procedure then proceeds as follows for construct-
ing the relaxation $X_d$, for any $d \in \{0, 1, \ldots, n\}$.

Reformulation Phase:

1. Multiply (68) and (70) by all possible LIP factors of order d.

2. Include nonnegativities on all possible LIP factors of order D, where
$D = \min\{d + 1, n\}$.

3. Use the following identity in the resulting nonlinear program:

$x_j L_{jk} = \theta_{jk} L_{jk}$ (or $(x_j - \theta_{jk})L_{jk} = 0$) $\ \forall k = 1, \ldots, k_j, \ j = 1, \ldots, n.$    (72)

Note that (72) corresponds to the step of letting $x_j^2 = x_j$, or setting
$x_j(1 - x_j) = 0$, whenever $x_j$ is binary as in the previous analysis.

4. Model (69) by including the following constraints via the introduction
of binary variables $x_{jk}, \ k = 1, \ldots, k_j, \ j = 1, \ldots, n$ (note that binari-
ness on the $x_{jk}$-variables is relaxed below for defining the polyhedron $X_d$):

$x_j = \sum_{k=1}^{k_j} \theta_{jk} x_{jk} \quad \forall j = 1, \ldots, n,$

$\sum_{k=1}^{k_j} x_{jk} = 1 \quad \forall j = 1, \ldots, n,$

$x_{jk} \ge 0 \quad \forall k = 1, \ldots, k_j, \ j = 1, \ldots, n.$

Linearization Phase: Linearize the resulting problem obtained via
Steps 1-4 of the Reformulation Phase by substituting a particular variable
for each distinct product of the x-variables thus generated, and similarly,
for each distinct product of $y_t$ with the x-variables, $\forall t$. (Note that these
product terms can include self-products of variables.) This produces a
polyhedron $X_d$ in higher dimensions whose projection $X_{P_d}$ onto the (x, y)
space can be shown to satisfy the usual RLT hierarchy

$X_{P_0} \supseteq X_{P_1} \supseteq \cdots \supseteq X_{P_n} = \text{conv}(X).$

Moreover, the relationship between the binary transformed problem and the
original discrete problem can be used to translate valid inequalities that are
derived in the former space to those represented in the original (x, y) variable
space. This construct can be useful in implementing a cutting plane or a
branch-and-cut approach for general integer programs.
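
A small numerical sketch (ours) of the LIP factors in (71) and the identity (72), using an illustrative value set {0, 2, 5} for one variable:

    # L_jk takes value 1 at theta_jk and 0 at the other admissible values,
    # and x_j * L_jk = theta_jk * L_jk holds at every admissible value.
    thetas = [0, 2, 5]

    def L(k, x):
        v = 1.0
        for p, tp in enumerate(thetas):
            if p != k:
                v *= (x - tp) / (thetas[k] - tp)
        return v

    for k, tk in enumerate(thetas):
        for p, tp in enumerate(thetas):
            assert abs(L(k, tp) - (1.0 if p == k else 0.0)) < 1e-9
            assert abs(tp * L(k, tp) - tk * L(k, tp)) < 1e-9   # identity (72)
    print("LIP factors behave as (71)-(72) require on the value set")
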
As far as the use of RLT as a practical computational aid for solving
mixed-integer 0-1 problems is concerned, one may simply work with the
relaxation $X_1$ itself, which has frequently proven to be beneficial. Using
variations on the first-order implementation of RLT (a partial generation
of $X_1$ enhanced by additional constraint products), Adams and Johnson
(1994), Adams and Sherali (1984, 1986, 1990, 1991, 1993), Sherali and
Brown (1994), Sherali, Krishnamurthy, and Al-Khayyal (1996), and Sherali,
Ramachandran, and Kim (1994), have shown how highly effective algorithms
can be constructed for various classes of discrete problems and applications.
A study of special cases of this type has provided insights into useful imple-
mentation strategies based on first-order products. Additionally, techniques
for generating tight valid inequalities that are implied by higher order re-
laxations may be devised, or explicit convex hull representations or facetial
inequalities could be generated by applying the highest order RLT scheme
to various subsets of sparse constraints that involve a manageable number of

variables. The lattermost strategy would be a generalization of using facets


of the knapsack polytope from minimal covers (see Balas and Zemel, 1978,
and Zemel, 1987), so successfully implemented by Crowder et al. (1983) and
Hoffman and Padberg (1991). An alternative would be to apply a higher
order scheme on a subset of binary variables that turn out to be fractional
in the initial linear programming solution. Letting $x_j, \ j \in J_f$, represent
such a subset, it follows from Sherali and Adams (1989) that by deriving
$X_{|J_f|}$ based on this subset, which is manageable if $|J_f|$ is relatively small,
the resulting linear program will yield binary values for $x_j \ \forall j \in J_f$. This
would be akin to employing judicious partial convex hull representations.
The development and testing of many such implementation strategies are
under present study.
Finally, we address the task of solving RLT-based relaxations. While the
RLT process leads to tight linear programming relaxations for the under-
lying discrete problem being solved as discussed above, one has to contend
with the repeated solutions of such large-scale linear programs. By the na-
ture of the RLT process, these linear programs possess a special structure
induced by the replicated products of the original problem constraints (or
its subset) with certain designated variables. At the same time, this process
injects a high level of degeneracy in the problem, since blocks of constraints
automatically become active whenever the factor expression that generated
them turns out to be zero at any feasible solution, and the condition number
of the bases can become quite large. As a result, simplex-based procedures
and even interior-point methods experience difficulty in coping with such re-
formulated linear programs (see Adams and Sherali, 1993, for some related
computational experience). On the other hand, a Lagrangian duality based
scheme can not only exploit the inherent special structures, but can quickly
provide near optimal primal and dual solutions that serve the purpose of ob-
taining tight lower and upper bounds. However, for a successful use of this
technique, there are two critical issues. First, an appropriate formulation
of the underlying Lagrangian dual must be constructed (see Fisher, 1981).
Sherali and Myers (1989) also discuss and test various strategies and provide
guidelines for composing suitable Lagrangian dual formulations. Second, an
appropriate nondifferentiable optimization technique must be employed to
solve the Lagrangian dual problem. For the size of problems that are encoun-
tered by us in the context of RLT, it appears imperative to use conjugate
subgradient methods as in Camerini et al. (1975), Sherali and Ulular (1989),
and Sherali et al. (1995), that employ higher-order information, but in a
manner involving minor additional effort and storage over traditional sub-
gradient algorithms. Since these types of algorithms are not usually dual
adequate (Geoffrion, 1972), for algorithms that require primal solutions for
partitioning purposes, some extra work becomes necessary. For this pur-
pose, one can either use powerful LP solvers such as CPLEX (1990) or OB1
(Marsten, 1991) on suitable surrogate versions of the problem based on the
derived dual solution, or apply primal solution recovery procedures as in
Sherali and Choi (1995) along with primal penalty function techniques as in
Sherali and Ulular (1989).
In closing, we anticipate that automatic reformulation techniques, such
as the RLT scheme discussed in this paper, will play a crucial role over the
next decade in enhancing problem solving capability. Ongoing advances
in solving large-scale linear programming problems will provide a further
impetus to such techniques, which typically tend to derive tighter represen-
tations in higher dimensions and with additional constraints. Furthermore,
newer relaxations based on semidefinite programming approaches as moti-
vated by the work of Lovasz and Schrijver (1991), along with interior point
approaches for solving such relaxations (see Overton and Wolkowicz, 1997),
hold promise for future advancements.

Acknowledgement: This paper is based on excerpts from the book by
the authors entitled, "A Reformulation-Linearization Technique for Solving
Discrete and Continuous Nonconvex Problems," that is to be published by
Kluwer Academic Publishers, 1997. Thanks are also due to the research
support of the National Science Foundation under Grant Number DMI-
9521398 and the Air Force Office of Scientific Research under Grant Number
F49620-96-1-0274.

References
[1] Adams, W. P. and T. A. Johnson, An Exact Solution Strategy for the
Quadratic Assignment Problem Using RLT-Based Bounds, Working pa-
per, Department of Mathematical Sciences, Clemson University, Clem-
son, SC, 1996.

[2] Adams, W. P. and T. A. Johnson, Improved Linear Programming-Based


Lower Bounds for the Quadratic Assignment Problem, DIMACS Series
in Discrete Mathematics and Theoretical Computer Science, Quadratic
Assignment and Related Problems, eds. P. M. Pardalos and H. Wolkow-
icz, 16, 43-75, 1994.

[3] Adams, W. P. and H. D. Sherali, A Tight Linearization and an Algorithm


for Zero-One Quadratic Programming Problems, Management Science,
32(10), 1274-1290, 1986.
[4] Adams, W. P. and H. D. Sherali, Linearization Strategies for a Class of
Zero-One Mixed Integer Programming Problems, Operations Research,
38(2), 217-226, 1990.
[5] Adams, W. P. and H. D. Sherali, Mixed-Integer Bilinear Programming
Problems, Mathematical Programming, 59(3), 279-305, 1993.
[6] Adams, W. P., A. Billionnet, and A. Sutter, Unconstrained 0-1 Optimiza-
tion and Lagrangean Relaxation, Discrete Applied Mathematics, 29(2-3),
131-142, 1990.
[7] Adams, W. P., J. B. Lassiter, and H. D. Sherali, Persistency in 0-1
Optimization, under revision for Mathematics of Operations Research,
Manuscript, 1993.
[8] Balas, E., Disjunctive Programming and a Hierarchy of Relaxations for
Discrete Optimization Problems, SIAM Journal on Algebraic and Dis-
crete Methods, 6(3), 466-486, 1985.
[9] Balas, E. and J. B. Mazzola, Nonlinear 0-1 Programming: I. Lineariza-
tion Techniques, Mathematical Programming, 30, 2-12, 1984a.
[10] Balas, E. and J. B. Mazzola, Nonlinear 0-1 Programming: II. Domi-
nance Relations and Algorithms, Mathematical Programming, 30, 22-45,
1984b.
[11] Balas, E. and E. Zemel, Facets of the Knapsack Polytope from Minimal
Covers, SIAM Journal of Applied Mathematics, 34, 119-148, 1978.
[12] Balas, E., S. Ceria, and G. Cornuejols, A Lift-and-Project Cutting Plane
Algorithm for Mixed 0-1 Programs, Mathematical Programming, 58(3),
295-324, 1993.
[13] Boros, E., Y. Crama and P. L. Hammer, Upper Bounds for Quadratic
0-1 Maximization Problems, RUTCOR Research Report RRR # 14-89,
Rutgers University, New Brunswick, NJ 08903, 1989.
[14] Camerini, P. M., L. Fratta, and F. Maffioli, On Improving Relaxation
Methods by Modified Gradient Techniques, Mathematical Programming
Study 3, North-Holland Publishing Co., New York, NY, 26-34, 1975.

[15] CPLEX, Using the CPLEX Linear Optimizer, CPLEX Optimization,


Inc., Suite 279, 930 Tahoe Blvd., Bldg. 802, Incline Village, NV 89451,
1990.
[16] Crowder, H., E. L. Johnson, and M. W. Padberg, Solving Large-Scale
Zero-One Linear Programming Problems, Operations Research, 31, 803-
834, 1983.
[17] Desrochers, M. and G. Laporte, Improvements and Extensions to the
Miller-Tucker-Zemlin Subtour Elimination Constraints, Operations Re-
search Letters, 10(1), 27-36, 1991.
[18] Fisher, M. L., The Lagrangian Relaxation Method for Solving Integer
Programming Problems, Management Science, 27(1), 1-18, 1981.
[19] Garfinkel, R. S. and G. L. Nemhauser, A Survey of Integer Programming
Emphasizing Computation and Relations Among Models, In Mathemat-
ical Programming: Proceedings of an Advanced Seminar, T. C. Hu and
S. Robinson (eds.), Academic Press, New York, NY, 77-155, 1973.
[20] Geoffrion, A. M., Lagrangian Relaxation for Integer Programming,
Mathematical Programming Study 2, M. L. Balinski (ed.), North-Hol-
land Publishing Co., Amsterdam, 82-114, 1974.
[21] Geoffrion, A. M. and R. McBryde, Lagrangian Relaxation Applied to
Facility Location Problems, AIIE Transactions, 10, 40-47, 1979.
[22] Glover, F., Improved Linear Integer Programming Formulations of Non-
linear Integer Problems, Management Science, 22(4), 455-460, 1975.
[23] Hoffman, K. L. and M. Padberg, Improving LP-Representations of
Zero-One Linear Programs for Branch-and-Cut, ORSA Journal on Com-
puting, 3(2), 121-134, 1991.
[24] Jeroslow, R. G. and J. K. Lowe, Modeling with Integer Variables, Math-
ematical Programming Study 22, 167-184, 1984.
[25] Jeroslow, R. G. and J. K. Lowe, Experimental Results on New Tech-
niques for Integer Programming Formulations, Journal of the Opera-
tional Research Society, 36, 393-403, 1985.
[26] Johnson, E. L., Modeling and Strong Linear Programs for Mixed Inte-
ger Programming, Algorithms and Model Formulations in Mathematical
Programming, NATO ASI 51, (ed.) S. Wallace, Springer-Verlag, 3-43,
1989.

[27] Johnson, E. L., M. M. Kostreva, and U. H. Suhl, Solving 0-1 Inte-


ger Programming Problems Arising From Large Scale Planning Models,
Operations Research, 33(4), 803-819, 1985.

[28] Lovasz, L. and A. Schrijver, Cones of Matrices and Set Functions, and
0-1 Optimization, SIAM J. Opt., 1, 166-190, 1991.

[29] Magnanti, T. L. and R. T. Wong, Accelerating Benders Decomposition:


Algorithmic Enhancement and Model Selection Criteria, Operations Re-
search, 29, 464-484, 1981.

[30] Martin, K. R., Generating Alternative Mixed-Integer Programming


Models Using Variable Redefinition, Operations Research, 35, 820-831,
1987.

[31] Meyer, R. R., A Theoretical and Computational Comparison of 'Equiv-


alent' Mixed-Integer Formulations, Naval Research Logistics Quarterly,
28, 115-131, 1981.

[32] Nemhauser, G. L. and L. A. Wolsey, Integer and Combinatorial Opti-


mization, John Wiley & Sons, New York, 1988.

[33] Nemhauser, G. L. and L. A. Wolsey, A Recursive Procedure for Gener-


ating all Cuts for Mixed-Integer Programs, Mathematical Programming,
46, 379-390, 1990.

[34] Oley, L. A. and R. J. Sjouquist, Automatic Reformulation of Mixed and


Pure Integer Models to Reduce Solution Time in Apex IV, Presented at
the ORSA/TIMS Fall Meeting, San Diego, 1982.

[35] Overton, M. and H. Wolkowicz, Semidefinite Programming, Mathe-
matical Programming, 77(2), 105-110, 1997.

[36] Padberg, M. W., (1,k)-Configurations and Facets for Packing Problems,
Mathematical Programming, 18, 94-99, 1980.

[37] Padberg, M. and G. Rinaldi, A Branch-and-Cut Algorithm for the Res-


olution of Large-Scale Symmetric Traveling Salesman Problems, SIAM
Review, 33, 60-100, 1991.

[38] Ramachandran, B. and J. F. Pekny, Dynamic Factorization Methods


for Using Formulations Derived from Higher Order Lifting Techniques
in the Solution of the Quadratic Assignment Problem, in State of the Art
in Global Optimization, eds. C. A. Floudas and P. M. Pardalos, Kluwer
Academic Publishers, 7, 75-92, 1996.

[39] Ramakrishnan, K. G., M. G. C. Resende, and P. M. Pardalos, A Branch
and Bound Algorithm for the Quadratic Assignment Problem Using a
Lower Bound Based on Linear Programming, in State of the Art in
Global Optimization, eds. C. A. Floudas and P. M. Pardalos, Kluwer
Academic Publishers, 7, 57-74, 1996.

[40] Sherali, H. D., On the Derivation of Convex Envelopes for Multilinear
Functions, Working Paper, Department of Industrial and Systems Engi-
neering, Virginia Polytechnic Institute and State University, Blacksburg,
VA 24061-0118, 1996.

[41] Sherali, H. D. and W. P. Adams, A Decomposition Algorithm for a


Discrete Location-Allocation Problem, Operations Research, 32(4), 878-
900, 1984.

[42] Sherali, H. D. and W. P. Adams, A Hierarchy of Relaxations Between


the Continuous and Convex Hull Representations for Zero-One Program-
ming Problems, SIAM Journal on Discrete Mathematics, 3(3), 411-430,
1990.

[43] Sherali, H. D. and W. P. Adams, A Hierarchy of Relaxations and Con-


vex Hull Characterizations for Mixed- Integer Zero-One Programming
Problems, Discrete Applied Mathematics, 52, 83-106, 1994. (Manuscript,
1989).

[44] Sherali, H. D. and E. L. Brown, A Quadratic Partial Assignment and


Packing Model and Algorithm for the Airline Gate Assignment Problem,
DIMACS Series in Discrete Mathematics and Theoretical Computer Sci-
ence, Quadratic Assignment and Related Problems, eds. P. M. Pardalos
and H. Wolkowicz, 16, 343-364, 1994.

[45] Sherali, H. D. and G. Choi, Recovery of Primal Solutions When Using


Subgradient Optimization Methods to Solve Lagrangian Duals of Linear
Programs, Operations Research Letters, 19(3), 105-113, 1996.

[46] Sherali, H. D. and P. J. Driscoll, On Tightening the Relaxations of


Miller-Tucker-Zemlin Formulations for Asymmetric Traveling Salesman
Problems, Working Paper, Department of Industrial and Systems Engi-
neering, Virginia Polytechnic Institute and State University, Blacksburg,
Virginia, 1996.

[47] Sherali, H. D. and Y. Lee, Sequential and Simultaneous Liftings of Mini-


mal Cover Inequalities for GUB Constrained Knapsack Polytopes, SIAM
Journal on Discrete Mathematics, 8(1), 133-153, 1995.

[48] Sherali, H. D. and Y. Lee, Tighter Representations for Set Partitioning


Problems, Discrete Applied Mathematics, 68, 153-167, 1996.

[49] Sherali, H. D. and D. C. Myers, Dual Formulations and Subgradient
Optimization Strategies for Linear Programming Relaxations of Mixed-
Integer Programs, Discrete Applied Mathematics, 20, 51-68, 1989.

[50] Sherali, H. D. and O. Ulular, A Primal-Dual Conjugate Subgradient Al-


gorithm for Specially Structured Linear and Convex Programming Prob-
lems, Applied Mathematics and Optimization, 20, 193-221, 1989.

[51] Sherali, H. D. and C. H. Tuncbilek, A Global Optimization Algorithm
for Polynomial Programming Problems Using a Reformulation-Lin-
earization Technique, Journal of Global Optimization, 2, 101-112, 1992.

[52] Sherali, H. D. and C. H. Tuncbilek, A Reformulation-Convexification
Approach for Solving Nonconvex Quadratic Programming Problems,
Journal of Global Optimization, 7, 1-31, 1995.

[53] Sherali, H. D. and C. H. Tuncbilek, New Reformulation-Linearization
Technique Based Relaxations for Univariate and Multivariate Polyno-
mial Programming Problems, Operations Research Letters, to appear,
1996.

[54] Sherali, H. D., W. P. Adams, and P. Driscoll, Exploiting Special Struc-


tures in Constructing a Hierarchy of Relaxations for 0-1 Mixed Integer
Problems, Operations Research, to appear, 1996.

[55] Sherali, H. D., G. Choi, and C. H. Tuncbilek, A Variable Target Value
Method, under revision for Mathematical Programming, 1995.

[56] Sherali, H. D., R. Krishnamurthy, and F. A. Al-Khayyal, A Reformula-
tion-Linearization Approach for the General Linear Complementarity
Problem, Presented at the Joint National ORSA/TIMS Meeting,
Phoenix, Arizona, 1993.
[57] Sherali, H. D., R. S. Krishnamurthy, and F. A. Al-Khayyal, An En-
hanced Intersection Cutting Plane Approach for Linear Complementarity
Problems, Journal of Optimization Theory and Applications, to appear,
1995.
[58] Sherali, H. D., Y. Lee, and W. P. Adams, A Simultaneous Lifting Strat-
egy for Identifying New Classes of Facets for the Boolean Quadric Poly-
tope, Operations Research Letters, 17(1), 19-26, 1995.

[59] Van Roy, T. J. and L. A. Wolsey, Solving Mixed Integer Programs by


Automatic Reformulation, Operations Research, 35, 45-57, 1987.

[60] Van Roy, T. J. and L. A. Wolsey, Valid Inequalities for Mixed 0-1 Pro-
grams, CORE Discussion Paper No. 8316, Center for Operations Re-
search and Econometrics, Universite Catholique de Louvain, Belgium,
1983.
[61] Williams, H. P., Model Building in Mathematical Programming. John
Wiley and Sons (Wiley Interscience), Second Edition, New York, NY,
1985.
[62] Wolsey, L. A. Facets and Strong Valid Inequalities for Integer Programs,
Operations Research, 24, 367-373, 1976.
[63] Wolsey, L. A., Strong Formulations for Mixed Integer Programming: A
Survey, Mathematical Programming, 45, 173-191, 1989.

[64] Wolsey, L. A., Valid Inequalities for 0-1 Knapsacks and MIPs with
Generalized Upper Bound Constraints, Discrete Applied Mathematics,
29, 251-262, 1990.
533

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 533-572
©1998 Kluwer Academic Publishers

Gröbner Bases in Integer Programming

Rekha R. Thomas
Department of Mathematics
Texas A&M University, College Station, TX 77843
E-mail: rekha@math.tamu.edu

Contents

1 Introduction 534

2 Parametric linear programming and regular triangulations 537

3 Parametric integer programming and Gröbner bases 542
   3.1 Algebraic fundamentals 542
   3.2 The Conti-Traverso algorithm 543
   3.3 Test sets in integer programming 546

4 Universal test sets for linear and integer programming 550

5 Variation of cost functions in linear and integer programming 558

6 Unimodular matrices 565

7 Implementation Issues 568

References

1 Introduction

Recently, application of the theory of Gröbner bases to integer programming has given rise to new tools and results in this field. Here we present this algebraic theory as the natural integer analog of the simplex approach to linear programming. Although couched in algebra, the theory of Gröbner bases and its consequences for integer programming are intimately intertwined with polyhedral geometry and lattice arithmetic, which are staples of the traditional approach to this subject.
Throughout this paper we fix a d × n integral matrix A of full row rank. Let ker(A) denote the (n − d)-dimensional subspace {u ∈ R^n : Au = 0} and ker_Z(A) the saturated lattice {u ∈ Z^n : Au = 0} of rank n − d. For technical simplicity we assume that ker(A), and hence ker_Z(A), has no non-trivial intersection with the non-negative orthant of R^n. We often identify A with the point configuration A = {a_1, ..., a_n} ⊂ Z^d where a_j is the jth column of A. Let cone(A) = {Au : u ∈ R^n_{≥0}} be the d-dimensional polyhedral cone in R^d generated by the elements of A and cone_N(A) = {Au : u ∈ N^n} the additive monoid in Z^d generated by A. Throughout this article, N denotes the non-negative integers and R_{≥0} the non-negative real numbers. For x ∈ R^n, we let supp(x) = {i : x_i ≠ 0} denote the support of x.
We study LP_{A,c}(b) := minimize {c · x : Ax = b, x ≥ 0}, the linear program in which the coefficient matrix A is as above, the right hand side vector b ∈ R^d, and the cost vector c ∈ R^n. The set of feasible solutions to LP_{A,c}(b) is the polyhedron P_b = {x ∈ R^n_{≥0} : Ax = b}, which is non-empty if and only if b lies in cone(A). Notice that P_b is the fiber of b under the linear map π_A : R^n_{≥0} → R^d, x ↦ Ax, i.e., P_b = π_A^{−1}(b). We call P_b the b-fiber of π_A. The assumption ker(A) ∩ R^n_{≥0} = {0} implies that P_b is a polytope (bounded polyhedron) for all b ∈ cone(A). We let LP_{A,c} denote the family of programs LP_{A,c}(b) obtained by varying b ∈ cone(A) while keeping A and c fixed, and LP_A the family obtained by keeping only A fixed. The cost vector c is generic with respect to LP_A if LP_{A,c}(b) has a unique optimum (which is automatically a vertex of P_b) for each b ∈ cone(A).
Let IP_{A,c}(b) := minimize {c · x : Ax = b, x ∈ N^n} be the integer program where A and c are as above. We may assume that b is integral since IP_{A,c}(b) is feasible if and only if b ∈ cone_N(A). Let P_b^I = conv{x ∈ N^n : Ax = b} where conv stands for "convex hull". Since P_b^I ⊆ P_b and P_b is a polytope, P_b^I is again a polytope for each b ∈ cone_N(A). By an abuse of nomenclature, we call P_b^I the b-fiber of π_A^I where π_A^I : N^n → Z^d, x ↦ Ax. As above, we let IP_{A,c} denote the family of programs obtained by varying b ∈ cone_N(A) while keeping A and c fixed and IP_A the family obtained by keeping only A fixed. The cost vector c is generic with respect to IP_A if IP_{A,c}(b) has a unique optimum (which is automatically a vertex of P_b^I) for each b ∈ cone_N(A). In Section 5 we will see that if c is generic for IP_A then it is also generic for LP_A, but not conversely in general.
We now recall some polyhedral facts that will be needed. The reader is referred to the books [24] and [33] for details. A polyhedral complex Δ is a finite collection of polyhedra such that (i) if P ∈ Δ, then every face of P is in Δ and (ii) if P_1, P_2 ∈ Δ, then P_1 ∩ P_2 is a face of P_1 and of P_2. Elements of Δ are called cells or faces of Δ and the support of Δ is the union of all faces in Δ. A polyhedral complex is completely specified by its maximal (with respect to inclusion) cells. If all cells in Δ are cones, then Δ is called a polyhedral fan. A fan in R^n is complete if its support is all of R^n.

For a polyhedron P ⊆ R^n and cost vector c ∈ R^n we write face_c(P) for the face of P at which c is minimized, i.e., face_c(P) = {v ∈ P : c · v ≤ c · u for all u ∈ P}. If F is any face of P, then N(F; P) denotes the cone of (inner) normals, called the inner normal cone of P at F. In symbols, N(F; P) = {c ∈ R^n : c · x ≤ c · y for all x ∈ F, y ∈ P}. The collection of cones N(F; P) is denoted N(P) and called the (inner) normal fan of P. The normal fan of P is a polyhedral fan in R^n that is complete if and only if P is a polytope.

Figure 1: Inner normal cone and fan. (Panels: the inner normal cone of P at vertex 1; the inner normal fan of P.)

We say that two polytopes are normally equivalent if they have the same normal fan. Given two polytopes P and Q in R^n, their Minkowski sum is the polytope P + Q = {p + q : p ∈ P, q ∈ Q} ⊂ R^n, and P and Q are called Minkowski summands of P + Q. For all c ∈ R^n, face_c(P + Q) = face_c(P) + face_c(Q). This implies that every vertex of P + Q is a sum of vertices of the summands and that every edge of P + Q is a parallel translate of an edge of some summand. The definition of Minkowski sum extends to the case of finitely many summands and, as in the usual extension of addition to integration, the operation of taking Minkowski sums of finitely many polytopes extends naturally to the operation of taking Minkowski integrals of infinitely many polytopes. See [5] for details. The common refinement of two fans F and G in R^n, denoted F ∩ G, is the fan of all intersections of cones from F and G. We say that F ∩ G is a refinement of F (respectively G). The following are two useful facts in this context: (i) for polytopes P and Q in R^n, N(P + Q) = N(P) ∩ N(Q), and (ii) N(P) is a refinement of N(Q) if and only if λQ is a Minkowski summand of P for some positive real number λ. For a hyperplane H = {x ∈ R^n : ax = 0} in R^n, let H^+ denote the closed half space {x ∈ R^n : ax ≥ 0} and H^− denote {x ∈ R^n : ax ≤ 0}. A hyperplane arrangement in R^n is the common refinement of finitely many fans of the form {H^+, H^−}. The arrangement is usually specified by listing the associated hyperplanes. The Minkowski sum of finitely many line segments is called a zonotope and, by (i), its normal fan is a hyperplane arrangement.

Figure 2: Minkowski sum of polytopes. (Panels: P + Q; the normal fans of P, Q, and P + Q.)


This article is organized as follows: In Section 2 we introduce the regular triangulation Δ_c of A with respect to c and show that Δ_c determines the optimal solutions of programs in LP_{A,c}. This is a reformulation of the usual duality theory for linear programming. We also introduce test sets for linear programming which give rise to a generalization of the simplex algorithm for programs in LP_{A,c}. In Section 3 we develop the Gröbner basis algorithm for IP_{A,c} and examine the underlying geometry. Section 4 constructs and studies universal test sets for both LP_A and IP_A. In Section 5 we introduce the secondary and state polytopes associated with A. The normal fans of these polytopes model the effect of varying cost functions in linear and integer programming. The layout of Sections 2 to 5 is strongly guided by the intention to reaffirm the philosophy that integer programming is an arithmetic refinement of linear programming. Section 6 examines unimodular matrices and we conclude in Section 7 with a brief discussion of implementation issues associated with solving integer programs using Gröbner bases.

2 Parametric linear programming and regular triangulations

In this section we study the family of linear programs LP_{A,c} from the point of view of regular triangulations ([3], [4] and [16]) and test sets [24]. We identify the point configuration A = {a_1, ..., a_n} with the index set {1, ..., n}. A subdivision of A is a collection of subsets σ of {1, ..., n} such that the cones cone(σ) = cone({a_i : i ∈ σ}) form a polyhedral fan with support cone(A). If cone(σ) is k-dimensional, then σ is called a k-cell of the subdivision. A subdivision is a triangulation if for each cell σ, cone(σ) is simplicial.

Every cost vector c ∈ R^n induces a subdivision Δ_c of A as follows: {i_1, ..., i_k} is a cell of Δ_c if and only if there exists a row vector y ∈ R^d such that y · a_j = c_j if j ∈ {i_1, ..., i_k} and y · a_j < c_j if j ∈ {1, ..., n} \ {i_1, ..., i_k}. Subdivisions obtained in this way are called regular (or coherent).
The regular subdivision Δ_c can also be constructed geometrically:
1. Let Ā = {(a_1, c_1), (a_2, c_2), ..., (a_n, c_n)} ⊂ R^{d+1} be the point configuration obtained by lifting each point a_i of A to height c_i in one higher dimensional space.
2. The set of "lower" faces of cone(Ā) forms a d-dimensional polyhedral complex on {1, ..., n}: a lower face is represented by the set of indices of points in Ā that lie on it. The set of these faces is Δ_c. Equivalently, Δ_c is the projection onto cone(A) of the lower faces of cone(Ā).
A face is "lower" if it has a normal vector with negative last coordinate. One should verify that the above construction gives Δ_c. The algebraic theory of Gröbner bases for integer programming is especially suited for configurations A whose elements span an affine hyperplane in R^d: the toric ideal I_A constructed in Section 3 is homogeneous in this case. It then suffices to construct conv(Ā) instead of cone(Ā) and project the lower faces of the (d + 1)-polytope conv(Ā) onto conv(A), to obtain Δ_c.
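For configurations spanning an affine hyperplane, this lifting construction can be carried out numerically with a convex hull routine. The following is a minimal sketch (ours, assuming scipy is available; it is not part of the original text) that lifts the pentagon of Example 2.2 below to the heights given by a generic cost vector and keeps the lower facets, i.e., those whose outward normal has negative last coordinate.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Example 2.2: the configuration projected onto the plane x1 = 1
pts = np.array([[0, 0], [1, 0], [2, 1], [1, 2], [0, 1]])
c = np.array([0, 1, 0, 1, 1])            # a generic cost vector

# Lift each point a_i to height c_i and take the convex hull in R^3
lifted = np.column_stack([pts, c])
hull = ConvexHull(lifted)

# A facet is "lower" if its outward normal has negative last coordinate;
# projecting the lower facets back down gives the cells of Delta_c.
cells = [sorted(int(i) + 1 for i in simplex)
         for simplex, eq in zip(hull.simplices, hull.equations)
         if eq[2] < 0]
print(cells)   # [1,2,3], [1,3,4], [1,4,5] in some order
```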

Example 2.1 Let A = {1, 4, 6, 10} ⊂ R and c = (5, 1, 3, 5). Then cone(A) is the non-negative real line and the unique lower face of cone(Ā) is the extreme ray generated by (a_2, c_2). Hence Δ_c = {{2}}.

Example 2.2 Let A = {(1,0,0), (1,1,0), (1,2,1), (1,1,2), (1,0,1)} be the three dimensional configuration whose affine span is the hyperplane x_1 = 1 in R^3. Then conv(A) is a pentagon with five distinct regular triangulations that may be represented as Δ_{(2,3,5,0,0)}, Δ_{(3,2,1,0,0)}, Δ_{(2,3,1,0,0)}, Δ_{(0,1,0,1,1)} and Δ_{(0,1,2,0,1)}. Notice that the cost vector (0,0,0,0,1) is not generic since Δ_{(0,0,0,0,1)} = {{1,4,5}, {1,2,3,4}} is not a triangulation. It may be refined to either Δ_{(0,1,0,1,1)} = {{1,4,5}, {1,3,4}, {1,2,3}} or Δ_{(0,1,2,0,1)} = {{1,4,5}, {1,2,4}, {2,3,4}}. Figure 3 shows the five triangulations and the above subdivision of A.

We now wish to relate the regular subdivision Δ_c to the optimal solutions of programs in the family LP_{A,c}. Let LP_{A,c}(b)-dual denote the linear program dual to LP_{A,c}(b):

maximize {y · b : y · A ≤ c, y ∈ R^d},

with feasible region Q_c = {y ∈ R^d : yA ≤ c}.

Theorem 2.3 For each b ∈ cone(A), a feasible solution x to LP_{A,c}(b) is optimal if and only if the support of x, supp(x), is a subset of a cell of Δ_c.

Proof. Let x be an optimal solution of LP_{A,c}(b) and y an optimal solution of LP_{A,c}(b)-dual. By complementary slackness, x_j > 0 implies y · a_j = c_j, which means that supp(x) lies in a face of Δ_c. Conversely, let x be any feasible solution to LP_{A,c}(b) such that supp(x) is a subset of a face of Δ_c. Then there exists y ∈ R^d with supp(x) ⊆ {j : y · a_j = c_j}. This implies c · x = y · A · x = y · b and hence x is an optimal solution of LP_{A,c}(b). □

Figure 3: Regular triangulations and subdivision of conv(A). (Panels labeled by the cost vectors c = (2,3,5,0,0), (3,2,1,0,0), (2,3,1,0,0), (0,1,2,0,1), (0,0,0,0,1) and (0,1,0,1,1).)

Corollary 2.4 For a cost vector c ∈ R^n, the following are equivalent:
(a) c is generic with respect to LP_A;
(b) Δ_c is a triangulation;
(c) Q_c is a simple polyhedron.

Proof. (a) ⇔ (b): The subdivision Δ_c is a triangulation if and only if for each b ∈ cone(A), the smallest cone in Δ_c containing b is simplicial. The latter condition is equivalent to the existence, for each b ∈ cone(A), of a unique point in P_b with support in a cell of Δ_c. By Theorem 2.3 this is the unique optimal solution to LP_{A,c}(b). Therefore Δ_c is a triangulation if and only if c is generic with respect to LP_A.
(b) ⇔ (c): By definition, F ⊆ {1, ..., n} is a face of Δ_c if and only if {y ∈ R^d : a_j · y = c_j for all j ∈ F, a_j · y < c_j for all j ∉ F} is non-empty. But every set of this form is a, possibly empty, face of Q_c. Using the relationship between the dimension of the face given by F and the cardinality of F, it can be seen that Δ_c is polar to the boundary complex of Q_c, and Δ_c is a triangulation if and only if Q_c is a simple polyhedron. (The boundary complex of a polyhedron P is the collection (complex) of all proper faces of P.) □

Example 2.2 continued. Consider the regular subdivision Δ_{(0,0,0,0,1)} = {{1,4,5}, {1,2,3,4}} and b = (25,34,18) ∈ cone({1,2,3,4}). Then P_b is a quadrilateral (see Figure 4) with vertices (23/3, 0, 50/3, 2/3, 0), (7, 0, 17, 0, 1), (0, 23/2, 9, 9/2, 0), and (0, 7, 27/2, 0, 9/2), with

face_{(0,0,0,0,1)}(P_b) = [(23/3, 0, 50/3, 2/3, 0), (0, 23/2, 9, 9/2, 0)].

This reaffirms that (0,0,0,0,1) is not generic for LP_A. The generic costs (0,1,0,1,1) and (0,1,2,0,1) are optimized at the vertices (23/3, 0, 50/3, 2/3, 0) and (0, 23/2, 9, 9/2, 0) respectively.

For the remainder of this section we assume that c is generic, so that Δ_c is a triangulation. Our goal will be to use Δ_c to construct a test set to solve all programs in LP_{A,c}.

Definition 2.5 A test set for the family LP_{A,c} is any finite subset T_c of ker(A) such that c · t > 0 for all t ∈ T_c and, for every b ∈ cone(A) and every x ∈ P_b, either x is the optimal solution of LP_{A,c}(b) or there exists t ∈ T_c and ε > 0 such that x − εt ≥ 0.

A test set is minimal if it has minimal cardinality. We say that I ⊆ {1, ..., n} is a minimal non-face of Δ_c if I is not a face of Δ_c but every proper subset of I is a face of Δ_c.

Proposition 2.6 A finite subset T_c ⊂ ker(A) is a minimal test set for LP_{A,c} if and only if for every minimal non-face I of Δ_c there is a unique vector t ∈ T_c such that I = {i : t_i > 0}.

Proof. By Theorem 2.3, a feasible solution x of LP_{A,c}(b) is non-optimal if and only if supp(x) contains a minimal non-face of Δ_c. Hence T_c ⊂ ker(A) is a minimal test set for LP_{A,c} if and only if for every minimal non-face I of Δ_c there is a unique vector t ∈ T_c such that I = {i : t_i > 0}. This property of T_c is necessary and sufficient to guarantee the existence of an improving direction in T_c for every non-optimal solution to a program in LP_{A,c}. □

Example 2.2 continued. For the generic cost vector c = (0,1,0,1,1), we had Δ_c = {{1,4,5}, {1,2,3}, {1,3,4}}. The minimal non-faces of Δ_c are {2,5}, {2,4} and {3,5}. A minimal test set for LP_{A,c} is therefore
T_c = {2e_2 + e_5 − 2e_1 − e_3, 3e_2 + e_4 − 2e_1 − 2e_3, e_3 + 3e_5 − 2e_1 − 2e_4} ⊂ ker(A),
where e_i denotes the ith unit vector in R^n.
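The minimal non-faces can be enumerated mechanically from the maximal cells of Δ_c. Here is a brief Python sketch (our illustration, not from the original text): a subset is a face exactly when it is contained in some maximal cell, and we keep the non-faces all of whose proper subsets are faces.

```python
from itertools import combinations

cells = [{1, 4, 5}, {1, 2, 3}, {1, 3, 4}]   # maximal cells of Delta_c
n = 5

def is_face(S):
    # a subset is a face iff it is contained in some maximal cell
    return any(S <= cell for cell in cells)

minimal_nonfaces = [set(S) for k in range(1, n + 1)
                    for S in combinations(range(1, n + 1), k)
                    if not is_face(set(S))
                    and all(is_face(set(S) - {i}) for i in S)]
print(minimal_nonfaces)   # [{2, 4}, {2, 5}, {3, 5}]
```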

We may assume without loss of generality that T_c ⊂ ker_Z(A). It is sometimes convenient to view u ∈ T_c as the line segment [u^+, u^−] where u^+ and u^− are the unique vectors in N^n with disjoint supports such that u = u^+ − u^−. Since c · u^+ > c · u^−, we may assume that [u^+, u^−] is directed from u^+ to u^−. Then solving LP_{A,c}(b) using T_c is a generalization of the simplex algorithm for this linear program since, in this method, the directed segments of T_c trace a monotone path from every non-optimal solution of the program to the optimum that stays entirely within P_b. Unlike in the simplex method, this path is not required to be along the edge-skeleton of P_b. Figure 4 is an illustration of this method on LP_{A,c}(b) where A is as in Example 2.2, c = (0,1,0,1,1), b = (25,34,18) and T_c is as above. All objects have been projected onto the (x_4, x_5)-plane. The dotted line shows a monotone path from the initial solution (0, 7, 27/2, 0, 9/2) to the optimum (23/3, 0, 50/3, 2/3, 0) traced by the directed segments of T_c.

Figure 4: The generalized simplex method. (Test set directions are shown beside the fiber; the labeled vertices include (23/3, 0, 50/3, 2/3, 0) and (0, 23/2, 9, 9/2, 0).)



3 Parametric integer programming and Gröbner bases

3.1 Algebraic fundamentals

Our goal in this section is to present the Gröbner basis algorithm for integer programs in the family IP_{A,c}, due to Conti and Traverso [8]. (See also Section 2.8 in [1].) We use this subsection to introduce the essential algebraic notions that will be needed. We refer the reader to [1] and [11] for the theory of Gröbner bases and to [27] for an algebraic development of the connections between Gröbner bases and convex polytopes in general.
Let k[x] = k[x_1, ..., x_n] be the polynomial ring in the variables x_1, ..., x_n where k is any field. We identify a monomial x^α = x_1^{α_1} x_2^{α_2} ··· x_n^{α_n} ∈ k[x] with its exponent vector α = (α_1, ..., α_n) ∈ N^n. A term order ≻ on N^n is a total order on N^n such that 0 ≺ α for all α ∈ N^n \ {0} and α + δ ≻ β + δ for all δ ∈ N^n whenever α ≻ β. For a polynomial f ∈ k[x], there exists a unique term c_α x^α in f such that x^α has the most expensive exponent vector with respect to ≻ among all monomials in f. Then c_α x^α is the initial term and x^α the initial monomial of f with respect to ≻. We may assume without loss of generality that the scalar coefficient of the initial monomial is one. For an ideal I ⊆ k[x], the initial ideal of I with respect to ≻ is the monomial ideal in_≻(I) = ⟨in_≻(f) : f ∈ I⟩ and a Gröbner basis for I with respect to ≻ is any finite set G_≻ ⊂ I such that in_≻(I) = ⟨in_≻(g) : g ∈ G_≻⟩. A Gröbner basis G_≻ is reduced if for any pair of elements g_i, g_j ∈ G_≻, in_≻(g_j) does not divide any term of g_i. Every term order ≻ has a unique reduced Gröbner basis that can be computed via Buchberger's algorithm [7], implemented in most computer algebra packages. The monomials in k[x] that lie outside in_≻(I) are the standard monomials with respect to ≻. The division algorithm rewrites every f ∈ k[x] uniquely as a k-linear combination of standard monomials called the normal form of f with respect to ≻, denoted nf_≻(f).
Just as we identify N^n with the monomials in k[x], we may identify Z^d with the monomials in k[t^{±1}] = k[t_1, t_1^{−1}, ..., t_d, t_d^{−1}]. The map π_A^I then lifts to a homomorphism of monoid algebras via

π̂_A^I : k[x] → k[t^{±1}], x_j ↦ t^{a_j},

where a_j is the jth column of A. The kernel of π̂_A^I, denoted I_A, is a prime ideal in k[x] called the toric ideal of A, a name that follows from the fact that the affine variety V(I_A) ⊂ k^n is a (not necessarily normal) toric variety. For more details of this connection see [15] and [27].

Clearly, if u ∈ ker_Z(A) then the binomial x^{u^+} − x^{u^−} ∈ I_A. Conversely, every binomial in I_A is of the form x^{u^+} − x^{u^−} for some u ∈ ker_Z(A).

Lemma 3.1 The toric ideal I_A = ⟨x^{u_i^+} − x^{u_i^−} : u_i ∈ ker_Z(A), i = 1, ..., t⟩.

Sketch of proof. The ideal I_A is generated as a k-vector space, and hence as an ideal, by the infinite set of binomials {x^{u^+} − x^{u^−} : u ∈ ker_Z(A)}. Therefore, by the Hilbert basis theorem, I_A has a finite generating set of the form {x^{u_i^+} − x^{u_i^−} : u_i ∈ ker_Z(A), i = 1, ..., t} for some t ∈ N \ {0}. □

The mechanics of the Buchberger algorithm assures that if a set of binomials is input to the algorithm then, regardless of the term order, every intermediate polynomial created during the course of the algorithm, as well as the Gröbner basis that is output, again consists of binomials. For the toric ideal I_A, the following stronger result holds.

Corollary 3.2 Every reduced Gröbner basis of I_A consists of a finite set of binomials of the form x^{u^+} − x^{u^−}, where u ∈ ker_Z(A).

Corollary 3.3 While dividing (reducing) a monomial x^p ∈ k[x] by a reduced Gröbner basis of I_A, every remainder formed (and in particular the normal form of x^p) is a monomial whose exponent vector lies in P^I_{Ap}.

Proof. An element x^{u^+} − x^{u^−} ∈ G_≻ divides a monomial x^p if and only if the initial monomial x^{u^+} divides x^p. The remainder is (x^p / x^{u^+}) · x^{u^−} = x^{p−u}, which is again a monomial in k[x] since p ≥ u^+ and hence p − u ∈ N^n. Further, since Au = 0, it follows that Ap = A(p − u). □

3.2 The Conti-Traverso algorithm

The Conti-Traverso algorithm requires that the cost vector c in IP_{A,c} is generic with respect to IP_A. Every vector c ∈ R^n orders the points in N^n via the inner product c · x. Whenever this is a partial order, we fix a term order ≻ to break ties and denote by ≻_c the refinement of c by ≻. The total order ≻_c is such that α ≻_c β if either c · α > c · β, or c · α = c · β and α ≻ β. The optimal cost values of IP_{A,c}(b) and IP_{A,≻_c}(b) are the same, but we now have the advantage that ≻_c is generic with respect to IP_A. Since every non-generic c ∈ R^n can be made generic as above without affecting the optimal value of the programs in IP_{A,c}, we assume in the rest of this subsection that c is generic with respect to IP_A.
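As a concrete illustration (ours, not from the original text), the refinement ≻_c amounts to comparing cost values first and breaking ties with any fixed term order; below, plain lexicographic comparison of exponent vectors plays the role of the tie-breaking term order ≻.

```python
def succ_c(alpha, beta, c):
    """Return True iff alpha >_c beta: compare c-weights first,
    then break ties lexicographically on the exponent vectors."""
    wa = sum(a * w for a, w in zip(alpha, c))
    wb = sum(b * w for b, w in zip(beta, c))
    return wa > wb or (wa == wb and tuple(alpha) > tuple(beta))

# With c = (0,1,0,1,1): the monomial x2*x4 outweighs x3*x5, so its
# exponent vector is larger in the refined order >_c.
print(succ_c((0, 1, 0, 1, 0), (0, 0, 1, 0, 1), (0, 1, 0, 1, 1)))  # True
```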

In the first step of the algorithm, we need to compute the reduced Gröbner basis G_c of the toric ideal I_A. In usual Gröbner basis computations one requires a term order as input to the Buchberger algorithm. However, the order induced by c is often not a term order since it can fail to satisfy the condition c · α > c · 0 = 0 for all non-zero α ∈ N^n. It is this property that usually guarantees that the Buchberger algorithm will terminate after finitely many steps. In our situation, the termination of the Buchberger algorithm follows from the assumption that P_b^I is a polytope for all b ∈ cone_N(A).

Algorithm 3.4 The Conti-Traverso algorithm for IP_{A,c}.
(i) Compute the reduced Gröbner basis G_c of I_A.
(ii) To solve IP_{A,c}(b) for b ∈ cone_N(A), find any solution u of the program and compute x^{u'} = nf_c(x^u). Then u' is the unique optimum of IP_{A,c}(b).

Proof of Algorithm. Let G_c = {x^{α_i^+} − x^{α_i^−} : i = 1, ..., p}, where x^{α_i^+} is the initial monomial of each binomial. By Corollary 3.3, u' is a solution to IP_{A,c}(b). Suppose there exists v ∈ P_b^I ∩ N^n such that c · u' > c · v. Then x^{u'} is the initial term of x^{u'} − x^v ∈ I_A with respect to c and hence there exists a binomial in G_c whose initial monomial divides x^{u'}. This contradicts that x^{u'} is the normal form of x^u with respect to G_c. □

Example 2.2 continued. For

A = ( 1 1 1 1 1
      0 1 2 1 0
      0 0 1 2 1 )

and the cost vector c = (0,1,0,1,1), which is generic with respect to IP_A,

G_c = { x_2x_4 − x_3x_5, x_2^2x_5 − x_1^2x_3, x_2x_5^2 − x_1^2x_4, x_3x_5^3 − x_1^2x_4^2 },

where the first term of each binomial is its initial term. For b = (25,34,18), the point (1,10,10,4,0) is a solution of IP_{A,c}(b). The normal form of the monomial x_1x_2^{10}x_3^{10}x_4^4 with respect to G_c is x_1^7x_3^{17}x_5, and hence the optimal solution of IP_{A,c}(b) is (7,0,17,0,1). A possible reduction path of x_1x_2^{10}x_3^{10}x_4^4 is given by the following chain of monomials: applying x_2x_4 → x_3x_5 four times gives x_1x_2^6x_3^{14}x_5^4, and then applying x_2^2x_5 → x_1^2x_3 three times gives x_1^7x_3^{17}x_5. □

There are a number of issues that have to be dealt with while implementing the above version of the Conti-Traverso algorithm. In particular, finding a generating set for I_A to be used as input to the Buchberger algorithm in Step (i) and finding an initial solution u to IP_{A,c}(b) are both non-trivial. (We will briefly address the former problem in Section 7.) Once these two issues are taken care of, the reduced Gröbner basis G_c and the normal form of x^u can be computed using a software package that does Gröbner basis computations, like Macaulay 2 [18] or GRIN [21]. We note that Algorithm 3.4 is a condensed version of the original Conti-Traverso algorithm in [8], useful for highlighting the main computational steps involved. The original algorithm uses a single Gröbner basis computation on a larger ideal to bypass the computation of generators for I_A. Then a reduction phase takes care of finding both an initial solution u and the optimal solution u' of IP_{A,c}(b).

Algorithm 3.5 The original Conti-Traverso algorithm.
Consider the ideal J = ⟨x_j t^{a_j^−} − t^{a_j^+}, j = 1, ..., n, t_0t_1···t_d − 1⟩ in the polynomial ring k[x_1, ..., x_n, t_0, t_1, ..., t_d]. Let t = {t_0, t_1, ..., t_d}.
(i) Compute the reduced Gröbner basis G_{≻'} of J with respect to any elimination term order ≻' such that {t_0, t_1, ..., t_d} ≻' {x_1, ..., x_n} and ≻' restricted to k[x] induces the same total order as c.
(ii) In order to solve IP_{A,c}(b), form the monomial t^b := t_0^β t^{b+β(e_1+···+e_d)} where β = max{|b_j| : b_j < 0} and e_i is the i-th unit vector in R^d. Compute nf_{≻'}(t^b) = t^γ x^{u'}. If γ = 0 then IP_{A,c}(b) is feasible with optimal solution u'. Else IP_{A,c}(b) is infeasible.
Proof. The ideal J = ⟨x_j t^{a_j^−} − t^{a_j^+}, j = 1, ..., n, t_0t_1···t_d − 1⟩ has the property that J ∩ k[x] = I_A, which in turn implies that G_{≻'} ∩ k[x] = G_c. For a proof of this fact see Theorem 2 in Section 2.2 of [11]. Therefore, Step (i) of Algorithm 3.5 indirectly achieves Step (i) of Algorithm 3.4. Since ≻' is an elimination order with t ≻' x, once a monomial in k[x] has been encountered during the reduction of t^b, all subsequent monomials also lie in k[x] and their exponent vectors are all solutions to IP_{A,c}(b). The algorithm will reduce t^b to a monomial x^u ∈ k[x] if and only if IP_{A,c}(b) is feasible. The only if direction is clear. To see the if direction, suppose the normal form t^γ x^{u'} of t^b has γ ≠ 0 and IP_{A,c}(b) has a solution v. Then since ≻' was an elimination order with {t_0, t_1, ..., t_d} ≻' {x_1, ..., x_n}, the binomial t^γ x^{u'} − x^v ∈ J has t^γ x^{u'} as initial term. This contradicts that t^γ x^{u'} is the normal form of t^b with respect to G_{≻'}. Hence, if γ ≠ 0 we may conclude that IP_{A,c}(b) is infeasible. However, if γ = 0, by the same argument as in the proof of Algorithm 3.4, u' is the optimal solution to IP_{A,c}(b). □
If all entries of the matrix A are non-negative, we do not need the variable t_0 and the binomial t_0t_1···t_d − 1 in the above computation. We refer the reader to [8] for computational tests of the above algorithm. Algorithm 3.5 is like a "Phase 1 - Phase 2" procedure for IP_{A,c}(b) while Algorithm 3.4 is just "Phase 2". Building the ideal J is equivalent to considering the "extended" integer program:

minimize { M · t + c · x : −1·t_0 + I·t + A·x = b, t_0, t, x ≥ 0, integer }

for which (t_0, t) = (β, b + β(e_1 + ··· + e_d)), x = 0 is a solution. Here M is a vector with sufficiently large entries, I is the d × d identity matrix and −1 is a column vector of minus ones. In the reduction step, we start at the above "artificial solution" of IP_{A,c}(b) and reduce this, whenever IP_{A,c}(b) is feasible, to an initial solution of IP_{A,c}(b) and then to the optimum of IP_{A,c}(b).
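When A has non-negative entries, the elimination step of Algorithm 3.5 can be reproduced with a general-purpose computer algebra system. The following is a minimal sketch (ours, assuming sympy is available) using a plain lex elimination order in place of a cost-refined order; the x-monomial returned is therefore a feasible solution whose optimality is with respect to the order that lex induces on the x-variables, not a prescribed c.

```python
from sympy import symbols, groebner

# Example 2.2: columns of A are (1,0,0), (1,1,0), (1,2,1), (1,1,2), (1,0,1)
t1, t2, t3 = symbols('t1 t2 t3')
x1, x2, x3, x4, x5 = symbols('x1 x2 x3 x4 x5')

# Generators x_j - t^{a_j} of the ideal J (A non-negative, so t0 is not needed)
J = [x1 - t1,
     x2 - t1*t2,
     x3 - t1*t2**2*t3,
     x4 - t1*t2*t3**2,
     x5 - t1*t3]

# lex with the t-variables first is an elimination order {t} > {x}
G = groebner(J, t1, t2, t3, x1, x2, x3, x4, x5, order='lex')

# Reduce t^b for b = (25, 34, 18); the remainder is a pure x-monomial
# exactly when b lies in cone_N(A), and its exponents solve Ax = b.
_, r = G.reduce(t1**25 * t2**34 * t3**18)
print(r)
```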

3.3 Test sets in integer programming

We now examine the Conti-Traverso algorithm and establish the integer analogs of the results in Section 2. For a general w ∈ R^n and a polynomial f ∈ k[x], we define the initial form of f with respect to w, denoted in_w(f), to be the sum of all terms in f whose monomials are of maximal w-weight. For an ideal I ⊂ k[x], the initial ideal of I with respect to w is in_w(I) = ⟨in_w(f) : f ∈ I⟩, which in general may not be a monomial ideal (an ideal generated by monomials). The reduced Gröbner basis G_w may be defined as before and all monomials that lie outside in_w(I) are said to be standard with respect to w. The following theorem is the integer analog of Theorem 2.3.

Theorem 3.6 For each b ∈ cone_N(A), a feasible solution u to IP_{A,c}(b) is optimal if and only if x^u is standard with respect to c.

Proof. A solution u to IP_{A,c}(b) is non-optimal if and only if nf_c(x^u) ≠ x^u, which happens if and only if x^u ∈ in_c(I_A). □

For a generic cost vector c with reduced Gröbner basis G_c = {x^{α_i^+} − x^{α_i^−} : i = 1, ..., p}, the initial ideal in_c(I_A) = ⟨x^{α_i^+} : i = 1, ..., p⟩ is a monomial ideal. The monomials in in_c(I_A) are in bijection with the lattice points in the "staircase"-like subset S_c = ∪_{i=1}^p (α_i^+ + N^n) of N^n. The optimal solutions to the programs in IP_{A,c} are precisely the lattice points of N^n that do not lie in S_c, and there is precisely one optimal solution per program.
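For a fixed generic c, optimality of a lattice point can thus be tested purely combinatorially against the staircase S_c. A minimal sketch (ours, using the initial ideal of Example 2.2 for c = (0,1,0,1,1), assuming numpy is available):

```python
import numpy as np

# exponent vectors of the minimal generators of in_c(I_A) for Example 2.2
gens = [np.array(a) for a in [(0, 1, 0, 1, 0),    # x2*x4
                              (0, 2, 0, 0, 1),    # x2^2*x5
                              (0, 1, 0, 0, 2),    # x2*x5^2
                              (0, 0, 1, 0, 3)]]   # x3*x5^3

def is_optimal(x):
    """x is optimal for its right hand side Ax iff the monomial x^x is
    standard, i.e. x does not lie in the staircase region S_c."""
    return not any(np.all(x >= a) for a in gens)

print(is_optimal(np.array([7, 0, 17, 0, 1])))    # True
print(is_optimal(np.array([1, 10, 10, 4, 0])))   # False: x >= (0,1,0,1,0)
```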

Corollary 3.7 A cost vector c ∈ R^n is generic with respect to IP_A if and only if in_c(I_A) is a monomial ideal.

Proof. A cost vector c is not generic with respect to IP_A if and only if there exists a minimal generator of the form x^{u^+} − x^{u^−} for in_c(I_A). Such a generator would have c · u^+ = c · u^− and would allow multiple optimal solutions in certain fibers of π_A^I. □

Example 2.2 continued. For the non-generic cost vector c = (0,0,0,0,1), in_c(I_A) = ⟨x_3x_5, x_2x_5^2, x_2^2x_5, x_2^3x_4 − x_1^2x_3^2⟩. For b = (25,34,18), P_b^I is a quadrilateral with vertices (7,0,17,0,1), (7,1,16,1,0), (1,10,10,4,0) and (1,6,14,0,4), with face_{(0,0,0,0,1)}(P_b^I) = [(7,1,16,1,0), (1,10,10,4,0)]. □

For the rest of this section we assume that c is generic with respect to IP_A.

Definition 3.8 A set R_c ⊆ ker_Z(A) is called a test set for the family IP_{A,c} if c · r > 0 for all r ∈ R_c and, for each non-optimal solution u to a program IP_{A,c}(b), there exists r ∈ R_c such that u − r ∈ P_b^I.

The existence of a finite test set R_c for IP_{A,c} implies a trivial solution method for all programs in IP_{A,c}. Starting at a solution u of IP_{A,c}(b), we can successively move to improved solutions of the program by subtracting appropriate elements of R_c. A solution u' is optimal for IP_{A,c}(b) if and only if there does not exist r ∈ R_c such that u' − r is feasible for IP_{A,c}(b).
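This augmentation procedure is a few lines of code once a test set is in hand. The sketch below (ours, not from the original text; the test set is the reduced Gröbner basis of Example 2.2, written as vectors α^+ − α^−) walks from any feasible point down to the optimum.

```python
import numpy as np

def augment(x, test_set):
    """Greedy improvement: subtract test-set vectors (each with c.r > 0)
    as long as the result stays non-negative; A(x - r) = b holds
    automatically since every r lies in ker(A)."""
    x = np.array(x)
    improved = True
    while improved:
        improved = False
        for r in test_set:
            if np.all(x - r >= 0):
                x = x - r
                improved = True
                break
    return x

# G_c for c = (0,1,0,1,1) in Example 2.2, as vectors alpha+ - alpha-
Gc = [np.array(v) for v in [(0, 1, -1, 1, -1),    # x2x4 - x3x5
                            (-2, 2, -1, 0, 1),    # x2^2x5 - x1^2x3
                            (-2, 1, 0, -1, 2),    # x2x5^2 - x1^2x4
                            (-2, 0, 1, -2, 3)]]   # x3x5^3 - x1^2x4^2
print(augment((1, 10, 10, 4, 0), Gc))   # -> [7 0 17 0 1], the optimum
```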

Proposition 3.9 A finite subset R_c ⊆ ker_Z(A) is a minimal test set for IP_{A,c} if and only if for every minimal generator α of S_c, there is a unique vector r in R_c such that r^+ = α.

Proof. By Theorem 3.6, the non-optimal solutions to the programs in IP_{A,c} are precisely the lattice points in S_c. □

As before, we may interpret each element x^{α_i^+} − x^{α_i^−} of the reduced Gröbner basis G_c = {x^{α_i^+} − x^{α_i^−} : i = 1, ..., p} uniquely, as either the line segment [α_i^+, α_i^−] in P^I_{Aα_i^+} directed from α_i^+ to α_i^−, or as the vector α_i = α_i^+ − α_i^− ∈ ker_Z(A).

Corollary 3.10 The reduced Gröbner basis G_c of I_A is a uniquely defined minimal test set for IP_{A,c}.

Proof. Since G_c is reduced, for every minimal generator of in_c(I_A), there is a unique binomial in G_c with initial monomial equal to this generator. Hence G_c is a minimal test set by Proposition 3.9. For a fixed initial monomial x^{α_i^+}, the trailing monomial x^{α_i^−} is such that α_i^− is the unique optimal solution to IP_{A,c}(Aα_i^+). Else, the trailing monomial would be divisible by some initial monomial of G_c, contradicting that G_c is reduced. □

The above theorems show that in_c(I_A) is to IP_{A,c} what Δ_c is to LP_{A,c}. As might be expected, these two entities are related as shown in the following theorem, which first appeared in [25].

Theorem 3.11 A subset F ⊆ {1, ..., n} is a face of Δ_c if and only if there does not exist a minimal generator x^{α^+} of the initial ideal in_c(I_A) such that supp(α^+) is contained in F.

Example 2.2 continued. For Δ_{(0,1,0,1,1)} = {{1,4,5}, {1,3,4}, {1,2,3}} and G_{(0,1,0,1,1)} = {x_3x_5^3 − x_1^2x_4^2, x_2x_5^2 − x_1^2x_4, x_2x_4 − x_3x_5, x_2^2x_5 − x_1^2x_3}, no minimal generator of in_{(0,1,0,1,1)}(I_A) is supported on a face of Δ_{(0,1,0,1,1)}.

Proof of Theorem 3.11. (taken from [27]). For a subset F ⊆ {1, ..., n} the following are equivalent:

F is a face of Δ_c
⇔ there exists y ∈ R^d feasible for LP_{A,c}(b)-dual such that F = {j : a_j · y = c_j}
⇔ there exists b ∈ Z^d such that an optimal solution y of LP_{A,c}(b)-dual satisfies F = {j : a_j · y = c_j}
⇔ there exists b ∈ Z^d such that an optimal solution u of LP_{A,c}(b) has supp(u) = F
⇔ there exists b ∈ Z^d such that an optimal solution u of LP_{A,c}(b) has supp(u) = F and is integral
⇔ there exists a monomial x^u such that F = supp(u) and no power of x^u is in in_c(I_A)
⇔ F does not contain the support of any minimal generator of in_c(I_A).

The first equivalence follows from the definition of Δ_c. Since every feasible solution of LP_{A,c}(b)-dual is optimal for some right hand side b (the cost vector of the dual), we get the second equivalence. The third equivalence follows from complementary slackness and the fourth from the fact that scaling b by a suitable integer multiple until the optimal solution to LP_{A,c}(b) is integral does not change any of the preceding arguments. However, then u is optimal for IP_{A,c}(b) and all its integer multiples are optimal for the integer programs IP_{A,c}(·) in which they are solutions. This condition is equivalent to the fact that x^u and all its powers are standard with respect to c, which is equivalent to saying that no minimal generator of in_c(I_A) has support contained in F = supp(u). □

In other words, in_c(I_A) is determined by Δ_c at the level of its radical ideal, although typically this correspondence is many-to-one, reflecting the philosophy that integer programming is an arithmetic refinement of linear programming.

Example 2.2 continued. The triangulation Δ_{(0,1,0,1,1)} = Δ_{(2,7,8,0,0)} supports both the initial ideals in_{(0,1,0,1,1)}(I_A) = ⟨x_3x_5^3, x_2x_5^2, x_2x_4, x_2^2x_5⟩ and in_{(2,7,8,0,0)}(I_A) = ⟨x_3x_5, x_2x_5^2, x_2^2x_5, x_2^3x_4⟩, in the sense of Theorem 3.11. □

If the elements of G_c are thought of as directed line segments, then the above results imply that there exists a directed path in P_b^I (possibly more than one) from every non-optimal solution of IP_{A,c}(b) to the unique optimum, comprised of translates of elements in G_c. In fact, the elements of G_c build a connected directed graph in each fiber P_b^I wherein the nodes are the lattice points in that fiber and the edges are elements of G_c. The graph in a fiber P_b^I is connected since every non-optimal solution to IP_{A,c}(b) has out-degree at least one and the graph has a unique sink at the optimal solution to IP_{A,c}(b). If the directions on all edges of this graph are reversed, then starting at the unique optimum of IP_{A,c}(b), one can trace a directed path to every lattice point in P_b^I. This allows all lattice points in P_b^I to be enumerated. These ideas have been applied to a class of stochastic integer programs from manufacturing in [29] and to statistical sampling in [13].

Figure 5: Elements of G_c, drawn as directed segments; the labeled lattice points include (1,0,0,4,0), (2,0,0,1,0), (0,1,0,1,0) and (2,0,1,0,0).

Example 2.2 continued. Figure 5 shows the projections onto the (x_4, x_5)-plane of the elements in G_{(0,1,0,1,1)}, interpreted as directed line segments. Figure 6 shows the directed path corresponding to the particular reduction discussed earlier, of x_1x_2^{10}x_3^{10}x_4^4 to its normal form x_1^7x_3^{17}x_5. The black points in the figure are the projections onto the (x_4, x_5)-coordinates of the lattice points in the polytope P^I_{(25,34,18)}. The two types of directed segments in this path correspond to the two distinct binomials used in the reduction. □

Figure 6: A reduction path. (The path runs from (1,10,10,4,0) through (1,6,14,0,4) to the optimum (7,0,17,0,1); the fourth labeled vertex is (7,1,16,1,0).)

Given that the idea of solving integer programs via test sets is rather natural, it is not surprising that a number of test sets can be found in the integer programming literature. In 1975, Graver [17] showed the existence of a finite set of vectors that solves integer programs of the form IP_{A,c}(b) for all b and all c. We call this test set the Graver basis of A and it will be discussed further in Section 4. Variants of the Graver basis appear in both [6] and [9]. In 1981, Scarf [23] introduced a test set for integer programs called the neighbors of the origin. Relationships among these test sets (including Gröbner bases) are discussed in [30].

4 Universal test sets for linear and integer programming

Thus far we examined test sets for linear and integer programs in which the cost vector c was fixed. A universal test set for LP_A (respectively IP_A) is a finite subset of ker(A) (respectively ker_Z(A)) that contains a test set for LP_{A,c} (respectively IP_{A,c}) for all c ∈ R^n that are generic with respect to LP_A (respectively IP_A). In this section we construct universal test sets for LP_A and IP_A, examine their geometry and establish their relationships. We say that a lattice point is primitive if the g.c.d. of its coordinates is one.

Definition 4.1 A circuit of A is a primitive non-zero vector u in ker_Z(A) such that its support is minimal with respect to inclusion.

For the purposes of linear programming it suffices to define a circuit of A as any non-zero vector of minimal support in ker(A), where two circuits t and t' are equivalent if one is a real multiple of the other. The above definition fixes a representative from each equivalence class (up to sign) and is more precise for integer programming. We denote the set of circuits of A by C_A and, as before, we may interpret u ∈ C_A as the line segment [u^+, u^−] ⊂ P_{Au^+} = P_{Au^−}. The polytope P_b is called a circuit fiber of π_A (respectively circuit fiber of π_A^I) if b = Au^+ for some u ∈ C_A.

Figure 7: Circuit fibers of π_A, each labeled by its right hand side vector (e.g., b = (3,1,2) and b = (3,2,1)).

Example 2.2 continued. The circuits of A are 2e_2 + e_5 − 2e_1 − e_3, 3e_2 + e_4 − 2e_1 − 2e_3, e_3 + 3e_5 − 2e_1 − 2e_4, e_2 + 2e_5 − 2e_1 − e_4, and e_2 + e_4 − e_3 − e_5. Figure 7 shows the circuit fibers of π_A, each accompanied by the right hand side vector it corresponds to. It may be observed that every circuit is a primitive edge in its fiber. Proposition 4.2 confirms that this is true in general.
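Circuits can also be computed directly from maximal minors: for every (d+1)-subset τ of columns, the vector of signed d × d minors of the submatrix A_τ spans the kernel of A_τ. The following sketch (our illustration using sympy, not from the original text) builds these vectors, makes them primitive, and keeps those of inclusion-minimal support; on the matrix of Example 2.2 it returns the five circuits just listed, up to sign.

```python
from itertools import combinations
from functools import reduce
from math import gcd
import sympy as sp

def circuits(A):
    A = sp.Matrix(A)
    d, n = A.shape
    cand = set()
    for tau in combinations(range(n), d + 1):
        u = [0] * n
        for i, j in enumerate(tau):
            # signed maximal minor omitting column j of the submatrix A_tau
            u[j] = (-1) ** i * A[:, [k for k in tau if k != j]].det()
        if any(u):
            g = reduce(gcd, (int(abs(v)) for v in u if v))
            u = [int(v) // g for v in u]            # make the vector primitive
            if next(v for v in u if v) < 0:         # fix a sign convention
                u = [-v for v in u]
            cand.add(tuple(u))
    supp = lambda u: frozenset(i for i, v in enumerate(u) if v)
    # circuits are exactly the kernel vectors of inclusion-minimal support
    return [u for u in cand if not any(supp(v) < supp(u) for v in cand)]

A = [[1, 1, 1, 1, 1], [0, 1, 2, 1, 0], [0, 0, 1, 2, 1]]
for u in circuits(A):
    print(u)
```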

Proposition 4.2 If u is a circuit of A then [u^+, u^−] is a primitive edge of the circuit fiber P_{Au^+}.

Proof. We need only show that [u^+, u^−] is an edge of P_{Au^+}. Let c ∈ R^n be such that c_j = 0 for all j ∈ supp(u) and c_j = 1 otherwise. Then [u^+, u^−] lies in face_c(P_{Au^+}). Suppose v ∈ face_c(P_{Au^+}) \ [u^+, u^−]. Then u^+ − v is a non-zero vector in ker(A) whose support is contained in supp(u). Since u ∈ C_A, there exists λ ∈ R \ {0} such that u^+ − v = λu. Therefore, v = (1 − λ)u^+ + λu^−, which shows that v lies in the affine span of u^+ and u^−. Since v ∉ [u^+, u^−], either u^+ ∈ ]v, u^−[ or u^− ∈ ]u^+, v[. However, neither of these can occur since supp(u^+) ∩ supp(u^−) = ∅. Hence no such v exists and face_c(P_{Au^+}) = [u^+, u^−]. □

The dashed lines in Figure 7 are the edges in each circuit fiber corre-
sponding to the circuit it contains.

Theorem 4.3 The circuits of A form a minimal universal test set for LP_A.

Proof. Let T_c be a minimal test set for LP_{A,c} and u ∈ T_c. Either u ∈ C_A or u = Σ λ_i p_i where λ_i ∈ R_{≥0} and the p_i are circuits of A that are sign compatible with u. (We say that x is sign compatible with y if x_i = 0 whenever y_i = 0 and if x_i ≠ 0 then sign(x_i) = sign(y_i).) Since c · u > 0 there exists some p_i in the above sum such that c · p_i > 0. However, then we can replace u by p_i and still have a minimal test set for LP_{A,c}. Therefore all minimal test sets of LP_A can be chosen to be subsets of C_A.

To see that C_A is a minimal universal test set for LP_A we need to show that every circuit is necessary in some minimal test set associated with LP_A. Consider the circuit u and the cost vector c in the proof of Proposition 4.2 for which face_c(P_{Au^+}) = [u^+, u^−]. For an appropriate refinement ĉ of c, face_ĉ(P_{Au^+}) = u^−, which shows that u is necessary in T_ĉ to improve the non-optimal solution u^+. □

Corollary 4.4 For every generic cost vector c ∈ R^n, there exists a minimal test set for LP_{A,c} that consists only of edges of certain fibers of π_A.

We now describe a universal test set for IP_A due to Graver [17]. For each σ ∈ {+, −}^n, consider the monoid S_σ = ker_Z(A) ∩ R^n_σ, where R^n_σ is the orthant with sign pattern σ. Let H_σ be the unique minimal Hilbert basis of the pointed polyhedral (n − d)-cone cone(S_σ) in R^n. The Hilbert basis of a polyhedral cone K in R^n is a minimal subset of K ∩ Z^n such that every integral vector in K can be written as a non-negative integral combination of the elements in the basis. Pointed cones have unique Hilbert bases (see Chapter 16, [24]). We call Gr_A := ∪_σ H_σ \ {0} the Graver basis of A.

Theorem 4.5 [17] The Graver basis of A is a universal test set for IP_A.

Proof. Consider an arbitrary generic cost vector c ∈ R^n and a right hand side vector b ∈ cone_N(A). Let u be a non-optimal solution to IP_{A,c}(b) for which u' is the optimal solution. If u − u' ∈ S_σ then there exist h_i ∈ H_σ and n_i ∈ N \ {0} such that u − u' = Σ n_i h_i. Since c · (u − u') > 0 there exists some h_i in the above sum such that c · h_i > 0. Subtracting this h_i from u, we get an improved solution to IP_{A,c}(b). □

The elements of Gr_A are precisely the non-zero u ∈ ker_Z(A) for which there does not exist a non-zero v ∈ ker_Z(A), v ≠ u, such that v^+ ≤ u^+ and v^− ≤ u^−. However, one can verify that the elements in a reduced Gröbner basis of I_A also have this property, giving rise to the following result.
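This characterization suggests a naive but instructive computation: enumerate kernel vectors in a finite box and discard any vector that conformally dominates another. The sketch below (ours; the box bound is an assumption that happens to suffice for this small matrix, and Algorithm 4.9 below is the proper method) recovers the Graver basis of the matrix A = (1 1 1 1; 0 1 2 3) of Example 4.16(i).

```python
import itertools
import numpy as np

A = np.array([[1, 1, 1, 1], [0, 1, 2, 3]])
B = 3   # box half-width; an assumption that suffices for this small A

# all non-zero integer kernel vectors with entries in [-B, B]
ker = [u for u in itertools.product(range(-B, B + 1), repeat=4)
       if any(u) and not A.dot(u).any()]

def conformal_leq(v, u):
    """v <= u conformally: sign-compatible and componentwise no larger."""
    return all(vi * ui >= 0 and abs(vi) <= abs(ui) for vi, ui in zip(v, u))

# Graver candidates: kernel vectors dominated by no other kernel vector
graver = [u for u in ker
          if not any(v != u and conformal_leq(v, u) for v in ker)]
print(graver)   # 10 vectors: the 5 elements of Example 4.16(i), up to sign
```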

Theorem 4.6 Every reduced Gröbner basis of I_A is a subset of the Graver basis of A.

Corollary 4.7 There exist only finitely many distinct reduced Gröbner bases for I_A as the cost function is varied.

Let UGB_A denote the union of all the finitely many distinct reduced Gröbner bases of I_A. Then UGB_A is a subset of Gr_A and is a universal test set for IP_A. We call UGB_A the universal Gröbner basis of A. If g ∈ UGB_A (respectively Gr_A), we call the fiber P^I_{Ag^+} a Gröbner fiber (respectively Graver fiber) of π_A^I.

Both Gr_A and UGB_A can be computed using Gröbner basis calculations as described in [28]. In order to compute Gr_A we consider the Lawrence lifting of A, which is the enlarged matrix

Λ(A) = ( A 0 ; I I ) ∈ Z^{(n+d)×2n},

where 0 is a d × n matrix of all zeros and I is an n × n identity matrix. The matrices A and Λ(A) have isomorphic kernels: ker_Z(Λ(A)) = {(u, −u) : u ∈ ker_Z(A)}. The toric ideal I_{Λ(A)} is the homogeneous prime ideal

I_{Λ(A)} = ⟨x^α y^β − x^β y^α : α, β ∈ N^n, Aα = Aβ⟩

in the polynomial ring k[x_1, ..., x_n, y_1, ..., y_n].

Theorem 4.8 For the matrix Λ(A), the following sets coincide:
(i) the Graver basis of Λ(A),
(ii) the universal Gröbner basis of Λ(A),
(iii) any reduced Gröbner basis of I_{Λ(A)},
(iv) any minimal generating set of I_{Λ(A)} (up to scalar multiples), and
(v) the set of binomials x^α y^β − x^β y^α supported on primitive one-dimensional fibers [(α, β), (β, α)].

Proof. The Graver bases Gr_A and Gr_{Λ(A)} are related as follows: Gr_{Λ(A)} = {x^α y^β − x^β y^α : α, β ∈ N^n, x^α − x^β ∈ Gr_A}. Since Gr_{Λ(A)} is the Graver basis of Λ(A), it is a generating set of I_{Λ(A)} and, by Theorem 4.6, it is a Gröbner basis of I_{Λ(A)} (not necessarily reduced) with respect to every generic cost vector. Notice that it suffices to show that Gr_{Λ(A)} is the unique minimal generating set of I_{Λ(A)} in order to prove the equality of the sets in (i), (ii), (iii) and (iv). This is because of Theorem 4.6, the definition of UGB_{Λ(A)}, and the fact that every reduced Gröbner basis of I_{Λ(A)} contains a minimal generating set for I_{Λ(A)}. We show below that the sets in (i) and (iv) coincide. Choose any element g := x^α y^β − x^β y^α of Gr_{Λ(A)}, and fix σ ∈ {−, +}^n such that α − β lies in S_σ = ker_Z(A) ∩ R^n_σ. Let B be the set of all binomials x^γ y^δ − x^δ y^γ in I_{Λ(A)} except g. Suppose that B generates I_{Λ(A)}. Then x^α y^β − x^β y^α can be written as a linear combination of elements in B. But this is only possible if there exists a binomial x^γ y^δ − x^δ y^γ in B such that x^γ y^δ divides x^α y^β. This implies that γ − δ lies in the semigroup S_σ. Moreover, since γ ≤ α and δ ≤ β, the non-zero vector (α − β) − (γ − δ) lies in S_σ as well. Therefore α − β cannot be an element in the Hilbert basis of cone(S_σ). This is a contradiction, and we conclude that every minimal generating set of I_{Λ(A)} requires (a scalar multiple of) the binomial g.

For the equality of (i) and (v) we shall prove that every Graver fiber contains precisely two lattice points. Let g ∈ Gr_{Λ(A)} be as above. Suppose that the common fiber of (α, β) and (β, α) contains a third point (γ, δ) ∈ N^{2n}. Then α, β, γ, δ ∈ N^n all lie in the same fiber of π_A^I and α + β = γ + δ. This implies that α − β = (γ − β) + (δ − β). We will show that the non-zero vectors γ − β and δ − β are sign compatible with α − β. This contradicts α − β ∈ Gr_A and thus completes the proof. Let j ∈ {1, ..., n}. If α_j > 0 then β_j = 0, and this implies (α − β)_j = α_j > 0, (γ − β)_j = γ_j ≥ 0, and (δ − β)_j = δ_j ≥ 0. If α_j = 0 then (α − β)_j = −β_j ≤ 0, (γ − β)_j = γ_j − β_j = −δ_j ≤ 0, and (δ − β)_j = δ_j − β_j = −γ_j ≤ 0. □

Algorithm 4.9 How to compute the Graver basis of A.
(i) Compute the reduced Gröbner basis G of I_{Λ(A)} with respect to any term order.
(ii) The Graver basis Gr_A consists of all elements α − β such that x^α y^β − x^β y^α appears in G.

Proof. By Theorem 4.8, any reduced Gröbner basis of I_{Λ(A)} is also the Graver basis of Λ(A). The bijection between the kernels of A and Λ(A) implies that a reduced Gröbner basis of I_{Λ(A)}, with the variables y_j set to one, is the Graver basis of A. □
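Algorithm 4.9 can be reproduced with any Gröbner engine that supports elimination orders. The sketch below (ours, assuming sympy; it may be slow for larger matrices) encodes the Lawrence lifting of A = (1 1 1 1; 0 1 2 3) through the monomial maps x_j ↦ t^{a_j} s_j and y_j ↦ s_j, eliminates the auxiliary variables, and prints the binomials x^α y^β − x^β y^α from which the Graver basis is read off.

```python
from sympy import symbols, groebner

t1, t2 = symbols('t1 t2')
s = symbols('s1:5')          # s1..s4, one per column of A
x = symbols('x1:5')
y = symbols('y1:5')

# Lawrence lifting of A = [[1,1,1,1],[0,1,2,3]]: x_j -> t^{a_j} s_j, y_j -> s_j
gens = [x[j] - t1 * t2**j * s[j] for j in range(4)] + \
       [y[j] - s[j] for j in range(4)]

# lex with the t,s variables up front is an elimination order
G = groebner(gens, t1, t2, *s, *x, *y, order='lex')

# basis elements free of t and s are the binomials x^a y^b - x^b y^a;
# reading off a - b for each gives the Graver basis of A (Algorithm 4.9)
for p in G.exprs:
    if not p.free_symbols & set((t1, t2) + s):
        print(p)
```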

Since UGB_A ⊆ Gr_A, all we need now for the computation of UGB_A is a way to identify the elements of UGB_A from among the elements of Gr_A. To do this, we establish the integer analog of Proposition 4.2. A second test for whether an element in the Graver basis lies in UGB_A can be found in [28].

Proposition 4.10 If a vector u lies in the universal Gröbner basis UGB_A then [u^+, u^−] is a primitive edge of the Gröbner fiber P^I_{Au^+}.

Lemma 4.11 Let x^α be a minimal generator of the initial monomial ideal in_c(I_A), and let δ be any lattice point in P^I_{Aα} such that c · α ≥ c · δ. Then supp(δ) ∩ supp(α) = ∅.

Proof. Suppose k ∈ supp(α) ∩ supp(δ) for a lattice point δ in the Aα-fiber of π_A^I for which c · α ≥ c · δ. Then α − e_k and δ − e_k are lattice points in the same fiber of π_A^I and c · (α − e_k) ≥ c · (δ − e_k). This implies that x^α / x_k lies in the initial monomial ideal in_c(I_A), which is a contradiction to x^α being a minimal generator. □

Lemma 4.12 For an element u of UGB_A, both u^+ and u^− are vertices of the Au^+-fiber of π_A^I.

Proof. By definition, u^− is the optimal vertex with respect to some cost function c in the Au^−-fiber of π_A^I. Recall our assumption that the integer programs IP_{A,c}(b) are bounded. This implies the existence of an integral vector M with all coordinates positive in the row space of A. After replacing M by a multiple if necessary, we may assume that M − c has all coordinates positive. Clearly, the cost function w := M − c attains its maximum over P^I_{Au^+} at u^−. Let v denote the restriction of w to the support of u^+ (i.e., v_i = w_i if u_i^+ > 0 and v_i = 0 if u_i^+ = 0). We claim that v attains a unique maximum over P^I_{Au^+} at u^+. If not, then there exists another lattice point δ in P^I_{Au^+} with v · δ ≥ v · u^+. Since v · u^+ > 0, the set supp(v) ∩ supp(δ) = supp(u^+) ∩ supp(δ) is not empty. By Lemma 4.11, this implies c · u^+ < c · δ. In view of M · u^+ = M · δ, we conclude that v · u^+ = w · u^+ > w · δ ≥ v · δ, as desired. □

Proof of Proposition 4.10. Let g ∈ UGB_A and choose w, v ∈ N^n as in the proof of Lemma 4.12. Consider the cost vector u := (v · g) w + (w · (−g)) v ∈ N^n. We have u · g^+ = u · g^−. It suffices to show that u · g^+ > u · γ for all lattice points γ other than g^+ and g^− in the Ag^+-fiber of π_A^I. If supp(γ) ∩ supp(g^+) = ∅, then supp(γ) ∩ supp(v) = ∅, which implies that u · g^− = (v · g)(w · g^−) > (v · g)(w · γ) = u · γ. If supp(γ) ∩ supp(g^+) ≠ ∅, then by Lemma 4.11, w · g^+ > w · γ. This implies that u · g^+ = (v · g)(w · g^+) + (w · (−g))(v · g^+) > (v · g)(w · γ) + (w · (−g))(v · γ) = u · γ. Therefore, [g^+, g^−] is an edge of the Ag^+-fiber of π_A^I with outer normal vector u. □

Corollary 4.13 For every generic cost vector c ∈ R^n, the reduced Gröbner basis G_c consists only of edges of certain fibers of π_A^I.

Algorithm 4.14 How to compute UGB_A.
1. Compute the Graver basis Gr_A using Algorithm 4.9.
2. For each element x^α − x^β of Gr_A, decide whether [α, β] is an edge of P^I_{Aα}.

Example 2.2 continued. Figure 8 shows the Gröbner fibers of π_A^I. As before, the dashed lines indicate the edge defined by the elements of UGB_A in each Gröbner fiber. In general, it is possible for two or more elements of UGB_A to come from a given Gröbner fiber. □

The following proposition ties together the three universal test sets introduced in this section to give the result: C_A ⊆ UGB_A ⊆ Gr_A. However, as Example 4.16 shows, any of these containments may or may not be strict.

Figure 8: Gröbner fibers of π_A^I, each labeled by its right hand side vector (e.g., b = (3,1,2), (2,2,2) and (3,2,1)).

Proposition 4.15 The circuits of A are contained in UGB_A.

Proof. Let u = u^+ − u^− be a circuit of A. Consider the cost function c := Σ {e_i : i ∉ supp(u)}. After refining c to be generic, we may suppose that c · u^+ > c · u^−. Then the monomial x^{u^+} lies in in_c(I_A), and there exists a binomial x^{α^+} − x^{α^−} in G_c such that x^{α^+} divides x^{u^+}. Since c · α^+ ≥ c · α^− and supp(α^+) ⊆ supp(u^+) ⊆ supp(u), we conclude that supp(α^−) ⊆ supp(u). Since u is a circuit, these facts imply u = α ∈ UGB_A. □

Example 4.16 (i) For A = ( 1 1 1 1 ; 0 1 2 3 ), C_A = {x_2^2 − x_1x_3, x_3^2 − x_2x_4, x_3^3 − x_1x_4^2, x_2^3 − x_1^2x_4} while UGB_A = Gr_A = C_A ∪ {x_1x_4 − x_2x_3}.

(ii) For A = ( 1 1 1 1 ; 0 5 6 10 ), C_A = UGB_A = {x_3^5 − x_2^4x_4, x_3^5 − x_1^2x_4^3, x_2^2 − x_1x_4, x_2^6 − x_1x_3^5} while Gr_A = C_A ∪ {x_3^5 − x_1x_2^2x_4^2}.



It is an important problem to bound the degree of the elements in the three test sets given above. The degree of a binomial x^{u^+} − x^{u^−} ∈ I_A is the 1-norm of the vector u ∈ ker_Z(A). It was shown in [25] that the degree of an element in Gr_A is at most (n − d)(d + 1)D(A), where D(A) denotes the largest maximal minor of A (in absolute value). However, this bound is not sharp, and further discussions on degree issues can be found in [20].

5 Variation of cost functions in linear and integer programming

We now study the effect of varying cost vectors in linear and integer programming and the relationships between the two settings. There is a natural equivalence relation on the space of all (not just generic) cost vectors with respect to IP_A.

Definition 5.1 Two cost vectors c and c' in R^n are equivalent (with respect to IP_A) if the integer programs IP_{A,c}(b) and IP_{A,c'}(b) have the same set of optimal solutions for all b in cone_N(A).

We establish a structure theorem for these equivalence classes (Theorem 5.10). This theorem can also be derived from more general results of Mora-Robbiano [22] and Bayer-Morrison [2] on Gröbner fans and state polytopes for graded polynomial ideals. What we present here is an alternate construction for toric ideals which is self-contained and provides more precise information for integer programming.
Recall that a cost vector c is generic for IP_A if the optimal solution with respect to c in every fiber P_b^I of π_A^I is a unique vertex. Generic equivalence classes are characterized as follows:

Proposition 5.2 Given two generic cost vectors c and c' in R^n, the following are equivalent:
(i) For every b ∈ cone_N(A), the programs IP_{A,c}(b) and IP_{A,c'}(b) have the same optimal solution.
(ii) The cost vectors c and c' support the same optimal vertex in each fiber P_b^I of π_A^I.
(iii) The reduced Gröbner bases G_c and G_{c'} of I_A are equal.

Proof. Conditions (i) and (ii) are equivalent since the optimal solution of IP_{A,c}(b) is the vertex of P_b^I supported by c. The sets of all non-optimal solutions to the programs IP_{A,c}(·) and IP_{A,c'}(·) are the monomial ideals in_c(I_A) and in_{c'}(I_A) respectively. Then (i) holds if and only if in_c(I_A) = in_{c'}(I_A). This is equivalent to (iii) since an initial ideal uniquely determines the reduced Gröbner basis associated with it. □

Lemma 5.3 For the reduced Gröbner basis G_c ⊂ Z^n, span_Z(G_c) = ker_Z(A).

Proof. Every vector g_i = g_i^+ − g_i^− in G_c lies in ker_Z(A). Hence span_Z(G_c) ⊆ ker_Z(A). Let α ∈ ker_Z(A). We can write α uniquely as α^+ − α^− where α^+, α^− are vectors in N^n with disjoint supports. Further, Aα^+ = Aα^−, and hence α^+ and α^− lie in the same fiber of π_A^I. Let β be the unique optimum in this fiber with respect to c. Since G_c is a test set for IP_{A,c}, there exist non-negative integral multipliers n_i and n_i' such that α^+ − β = Σ_{g_i ∈ G_c} n_i g_i and α^− − β = Σ_{g_i ∈ G_c} n_i' g_i. Hence α = Σ_{g_i ∈ G_c} (n_i − n_i') g_i, which implies that span_Z(G_c) = ker_Z(A) ≅ Z^{n−d}. □

Let St(A) denote the Minkowski sum of all Gröbner fibers of π_A^I. This is a well-defined polytope in R^n which we call the state polytope of A. Lemma 5.3 implies dim(St(A)) = n − d. The complete polyhedral fan N(St(A)) is called the Gröbner fan of A.

Lemma 5.4 Every fiber of π_A^I is a Minkowski summand of St(A).

Proof. It suffices to show that N(St(A)) is a refinement of N(P_b^I) for all b ∈ cone_N(A). Let c be a generic cost vector and let w be any vector in the interior of the cone N(face_c(St(A)); St(A)). Then w lies in N(β_i; P^I_{Aβ_i}) for each element α_i − β_i in the reduced Gröbner basis G_c. This implies w · α_i > w · β_i for all i, and therefore G_w = G_c.

Now consider an arbitrary b ∈ cone_N(A). Let u be the unique optimum of IP_{A,c}(b). The equality of test sets G_w = G_c implies that u is also the unique optimum of IP_{A,w}(b). Hence w lies in the interior of N(u; P_b^I). Therefore, N(face_c(St(A)); St(A)) ⊆ N(u; P_b^I), as desired. □

Proposition 5.5 Let db denote any probability measure with support cone_N(A) such that ∫_b b db is finite. Then the Minkowski integral ∫_b P_b^I db is a polytope normally equivalent to St(A).

Proof. The hypothesis ∫_b b db < ∞ guarantees that ∫_b P_b^I db is bounded. By Lemma 5.4, ∫_b P_b^I db is a summand of St(A) and is hence a polytope. However, each Gröbner fiber is a summand of ∫_b P_b^I db, and hence ∫_b P_b^I db is an (n − d)-polytope in R^n that has the same normal fan as St(A). □

Corollary 5.6 There exist only finitely many facet directions among the fibers of π_A^I.

From now on we shall use the term state polytope for any polytope normally equivalent to ∫_b P_b^I db. We define the Gröbner cone associated with G_c to be the closed convex polyhedral cone

K_c := {x ∈ R^n : g_i · x ≥ 0 for all g_i ∈ G_c}.

Observation 5.7 The Gröbner cone K_c has full dimension n. Its lineality space K_c ∩ −K_c equals rowspan(A) ≅ R^d.

Proof. We have dim(K_c) = n because c lies in the interior of K_c. The lineality space K_c ∩ −K_c equals the orthogonal complement of span(G_c) in R^n, which coincides with the row span of A by Lemma 5.3. □
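Deciding which equivalence class a cost vector belongs to is thus a finite membership test. The following sketch (ours, not from the original text) checks whether a vector w lies in the closed Gröbner cone K_c cut out by the inequalities g_i · x ≥ 0, using the reduced Gröbner basis of Example 2.2 for c = (0,1,0,1,1) written as vectors.

```python
import numpy as np

def in_groebner_cone(w, G):
    """w lies in the closed Groebner cone K_c iff g.w >= 0 for every
    vector g = g+ - g- in the reduced Groebner basis G_c."""
    return all(np.dot(g, w) >= 0 for g in G)

# G_c for c = (0,1,0,1,1) in Example 2.2, as vectors
G5 = [(0, 1, -1, 1, -1), (-2, 2, -1, 0, 1),
      (-2, 1, 0, -1, 2), (-2, 0, 1, -2, 3)]

print(in_groebner_cone((0, 1, 0, 1, 1), G5))   # True: c lies in its cone K_c
print(in_groebner_cone((2, 3, 5, 0, 0), G5))   # False: an inequivalent cost
```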

Proposition 5.8 The Gröbner fan of A is the collection of all Gröbner cones K_c together with their faces, as c varies over all generic cost vectors.

Proof. The argument in the proof of Lemma 5.4 shows that, for c generic, the Gröbner cone K_c equals N(face_c(St(A)); St(A)). □

We remark that each cone in the Gröbner fan has the same lineality space rowspan(A) ≅ R^d and it is often more convenient to work with its image in ker(A) ≅ R^n / rowspan(A) ≅ R^{n−d}. We call this image of the Gröbner fan in R^{n−d} the pointed Gröbner fan of A.

Corollary 5.9 The equivalence classes of cost functions with respect to IP_A (cf. Definition 5.1) are precisely the cells of the Gröbner fan.

Proof. By Proposition 5.2, two cost vectors c and c' are equivalent if and only if they support the same optimal face in each fiber of π_A^I. Using Propositions 5.5 and 5.8, it follows that c and c' are equivalent if and only if they lie in the relative interior of the same cell in N(St(A)). □

Example 2.2 continued. Adding the five Gröbner fibers of π_A^I we see that St(A) is an octagon in R^5. Figure 9 shows the pointed Gröbner fan of A in R^2. The numbers of the Gröbner cones correspond to the numbers indexing the eight distinct Gröbner bases of I_A, which are given below along with a representative cost vector. □

Figure 9: The pointed Gröbner fan of A.

1. c = (2,3,5,0,0): {x_3x_5 − x_2x_4, x_1^2x_4 − x_2x_5^2, x_1^2x_3 − x_2^2x_5}
2. c = (3,2,1,0,0): {x_2x_4 − x_3x_5, x_1^2x_4 − x_2x_5^2, x_1^2x_3 − x_2^2x_5}
3. c = (2,3,1,0,0): {x_2x_4 − x_3x_5, x_1^2x_4 − x_2x_5^2, x_2^2x_5 − x_1^2x_3}
4. c = (2,5,3,0,0): {x_2x_4 − x_3x_5, x_2^2x_5 − x_1^2x_3, x_2x_5^2 − x_1^2x_4, x_1^2x_4^2 − x_3x_5^3}
5. c = (0,1,0,1,1): {x_2x_4 − x_3x_5, x_2^2x_5 − x_1^2x_3, x_2x_5^2 − x_1^2x_4, x_3x_5^3 − x_1^2x_4^2}
6. c = (2,7,8,0,0): {x_3x_5 − x_2x_4, x_2x_5^2 − x_1^2x_4, x_2^2x_5 − x_1^2x_3, x_2^3x_4 − x_1^2x_3^2}
7. c = (0,1,2,0,1): {x_3x_5 − x_2x_4, x_2x_5^2 − x_1^2x_4, x_2^2x_5 − x_1^2x_3, x_1^2x_3^2 − x_2^3x_4}
8. c = (0,2,5,1,0): {x_3x_5 − x_2x_4, x_2x_5^2 − x_1^2x_4, x_1^2x_3 − x_2^2x_5}

(In each binomial the first monomial is the initial monomial with respect to the listed cost vector.)

A cost vector w lies in the interior of a Gröbner cone K_c if and only if w is generic and equivalent to c. Hence the interiors of the top-dimensional cells in the Gröbner fan are precisely the equivalence classes of generic cost vectors. The following theorem summarizes the above discussion.

Theorem 5.10
(i) There are only finitely many equivalence classes of cost vectors with respect to IP_A.
(ii) Each equivalence class is the relative interior of a convex polyhedral cone in R^n.
(iii) The collection of these cones defines a polyhedral fan that covers R^n. This fan is called the Gröbner fan of A.
(iv) Let db denote any probability measure with support cone_N(A) such that ∫ b db < ∞. Then the Minkowski integral St(A) = ∫ P_b^I db is an (n − d)-dimensional convex polytope, called the state polytope of A. The normal fan of St(A) equals the Gröbner fan of A.

Proposition 5.11 Every primitive edge direction of a fiber of π_A^I is an element of the universal Gröbner basis UGB_A.

Proof. Proposition 5.5 says that the edge directions of the fibers of π_A^I are precisely the edge directions of the state polytope. Therefore it suffices to show that every primitive edge direction of St(A) is an element of UGB_A. Suppose u is primitive and defines an edge direction of the state polytope St(A). Then u is the normal vector to a facet of a maximal cone K_c in the Gröbner fan N(St(A)). Therefore u appears in the inequality presentation of K_c. In other words, u is equal to one of the elements u_i of the reduced Gröbner basis G_c. □

Theorem 5.12 The universal Gröbner basis of A consists precisely of the primitive edge directions in the fibers of π_A^I.

Proof. This follows from Propositions 4.10 and 5.11. □


Corollary 5.13 For an element u in UGB_A, there exist two cost vectors c and c′ in R^n such that u ∈ G_c and −u ∈ G_{c′}.

Proof. Every element in UGB_A appears as a facet normal of some cell in the Gröbner fan. Take as G_c and G_{c′} the Gröbner bases associated with the two Gröbner cones that share this facet. □

Direct translations of the work of Gel'fand-Kapranov-Zelevinsky ([16], Chapter 7) and Billera-Gel'fand-Sturmfels [4] give an analogous theory for linear programming. Their work predates the integer results above and had different motivations. However, their results also follow from the natural linear analog of the above line of reasoning.
Analogous to Definition 5.1 we have the following definition for LP_A.

Definition 5.14 Two cost vectors c and c′ in R^n are equivalent (with respect to LP_A) if the linear programs LP_{A,c}(b) and LP_{A,c′}(b) have the same set of optimal solutions for all b in cone(A).

Theorem 5.15 For two generic (with respect to LP_A) cost vectors c, c′ ∈ R^n, the following conditions are equivalent:
(i) c and c′ are equivalent with respect to LP_A.
(ii) c and c′ support the same optimal vertex in each fiber P_b of π_A.
(iii) The vectors c and c′ define the same regular triangulation Δ_c = Δ_{c′}.

The above theorem follows from Theorem 2.3 and says that the equivalence class of a generic c ∈ R^n is

{c′ ∈ R^n : Δ_{c′} = Δ_c},

which is a full dimensional open polyhedral cone in R^n whose closure L_c is called the secondary cone of c. The set of all secondary cones together with their faces is the secondary fan of A, a term that is justified in the following theorem from [16].

Theorem 5.16 There exists an (n − d)-dimensional polytope, called the secondary polytope of A and denoted Σ(A), whose inner normal fan N(Σ(A)) is the secondary fan of A.

This result then establishes the following results for linear programming.

Corollary 5.17 The equivalence classes of cost functions with respect to LP_A are precisely the cells of the secondary fan.

Theorem 5.18
(i) There are only finitely many equivalence classes of cost vectors with respect to LP_A.
(ii) Each equivalence class is the relative interior of a convex polyhedral cone in R^n.
(iii) The collection of these cones defines a polyhedral fan that covers R^n. This fan is called the secondary fan of A.

(iv) Let db denote any probability measure with support cone(A) such that ∫ b db < ∞. Then the Minkowski integral Σ(A) = ∫ P_b db is an (n − d)-dimensional convex polytope, called the secondary polytope of A. The normal fan of Σ(A) equals the secondary fan of A.

We call any (n − d)-polytope that is normally equivalent to Σ(A) the secondary polytope of A. A specific secondary polytope of A is obtained by taking the Minkowski sum of all circuit fibers of π_A, which we shall call Σ(A) in the rest of this section. Using this secondary polytope we can prove Theorem 5.18 by establishing the following three linear analogs of results stated earlier.

Lemma 5.19 Every fiber of π_A is a Minkowski summand of Σ(A).

Proposition 5.20 Let db denote any probability measure with support cone(A) such that ∫ b db is finite. Then the Minkowski integral ∫ P_b db is a polytope normally equivalent to Σ(A).

Proposition 5.21 The fan N(Σ(A)) is the collection of all secondary cones L_c together with their faces, as c varies over all generic cost vectors.

As for Gröbner cones, each secondary cone has the same lineality space rowspan(A), and one can work with its image in R^{n−d}. We call this image the pointed secondary fan of A.

Example 2.2 continued. We saw that A in our running example had five distinct regular triangulations. Figure 10 shows the pointed secondary fan of A. The Minkowski sum of the five circuit fibers of π_A from Figure 7 is a pentagon in R^5. □

Proposition 4.2 and Proposition 5.20 together imply the following theorem, which is the linear counterpart of Theorem 5.12.

Theorem 5.22 The circuits of A consist precisely of the primitive edge directions in the fibers of π_A.

We conclude this section by establishing the relationship between the secondary and state polytopes associated with a matrix A, and hence between the secondary and Gröbner fans of A.

Proposition 5.23 (i) The Gröbner fan of A is a refinement of the secondary fan of A.
(ii) The secondary polytope of A is a summand of the state polytope of A.

Figure 10: The pointed secondary fan of A.

Proof. From the general facts about Minkowski sums in the introduction, it suffices to prove (i). Let c, c′ be generic cost vectors with respect to IP_A that belong to the same Gröbner cone K_c. Then in_c(I_A) = in_{c′}(I_A) and hence, by Theorem 3.11, Δ_c = Δ_{c′}. This shows that every top-dimensional cell of the Gröbner fan lies entirely within a top-dimensional cell of the secondary fan, and hence the Gröbner fan is a refinement of the secondary fan. □

The above proposition proves our claim in the introduction that cost functions that are generic with respect to IP_A are also generic with respect to LP_A, since it proves that the interior of every Gröbner cone is contained in the interior of a secondary cone. The converse is true if and only if the secondary fan and the Gröbner fan coincide.

6 Unimodular matrices
Unimodular matrices occupy a very special place among all matrices, in the
context of the theory developed above.

Definition 6.1 An integral matrix A of full row rank is called unimodular if each of its maximal minors is one of −p, 0 or p, where p is a positive integral constant.

Theorem 6.2 [26] If A is unimodular, then C_A = UGB_A = Gr_A, but the converse is false.

Proof. It suffices to show that C_A = Gr_A. Since a circuit u of A lies in ker_Z(A), by Cramer's rule every coordinate of u is the ratio of two maximal minors of A, of which the denominator is necessarily non-zero. Hence if A is unimodular, every coordinate of a circuit is one of 0, 1 or −1. This implies that the extreme rays of cone(S_σ) are generated by 0, ±1 vectors for all σ ∈ {+, −}^n. Therefore, each H_σ and hence the Graver basis Gr_A does not contain any integral vector that is not a circuit of A.
To see that the converse is false, consider our running example, for which C_A = UGB_A = Gr_A. However, this matrix is not unimodular since its maximal minors have absolute value one, two and three. □

Theorem 6.3 [26] If A is unimodular, then the secondary polytope of A coincides with the state polytope of A, but the converse is false.

Proof. If A is unimodular then the integer programming fiber P_b^I coincides with the linear programming fiber P_b for all b ∈ cone_N(A) ([24], Theorem 19.2). Moreover, if b ∈ cone(A) \ cone_N(A), then there exists b′ ∈ cone_N(A) such that P_b and P_{b′} are normally equivalent. Therefore the Minkowski integrals in Theorems 5.18 (iv) and 5.10 (iv) coincide, which proves that St(A) = Σ(A).
To see that the converse is false, consider the matrix

A =
 1 -2  1  0  0  0  0  0
 1 -1  0  1  0  0  0  0
 1  0  0  0  1  0  0  0
 0  1  0  0  0  1  0  0
 0  0  1  0  0  0  1  0
 0  0  0  1  0  0  0  1
This matrix is the Lawrence lifting of the two by four matrix in its top left corner. Its Graver basis consists precisely of the four circuits. There are eight distinct reduced Gröbner bases associated with this matrix, each of which corresponds to a distinct triangulation. This implies that the state and secondary polytopes coincide. However, A is not unimodular since it has maximal minors of absolute value zero, one and two. □

We now examine unimodular matrices that are of Lawrence type. A matrix is of Lawrence type if it is the Lawrence lifting of another matrix.
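For concreteness, the block structure Λ(B) = [[B, 0], [I, I]] can be written down directly. The following minimal sketch (Python with NumPy; the code and the function name are illustrative additions, not part of the original text) builds the lifting, and reproduces the 6 × 8 matrix of Theorem 6.3 from the 2 × 4 block in its top left corner:

import numpy as np

def lawrence_lifting(B):
    # Lawrence lifting Lambda(B) = [[B, 0], [I_n, I_n]] of a d x n matrix B.
    d, n = B.shape
    top = np.hstack([B, np.zeros((d, n), dtype=B.dtype)])
    bottom = np.hstack([np.eye(n, dtype=B.dtype), np.eye(n, dtype=B.dtype)])
    return np.vstack([top, bottom])

B = np.array([[1, -2, 1, 0],
              [1, -1, 0, 1]])
print(lawrence_lifting(B))   # the 6 x 8 matrix displayed above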

Definition 6.4 The circuit arrangement of A is the hyperplane arrangement consisting of the hyperplanes in R^n which are orthogonal to the circuits of A.

The following result was stated in Lemma 5.2 of [3].

Proposition 6.5 (i) The circuit arrangement of A is a refinement of the secondary fan of A.
(ii) If A is of Lawrence type, then the circuit arrangement equals the secondary fan.

Definition 6.6 The Graver arrangement of A is the arrangement consisting of the hyperplanes in R^n which are orthogonal to the elements in the Graver basis Gr_A.

The following proposition is then a direct consequence of Theorem 4.6 and Proposition 4.15.

Proposition 6.7 (i) The Graver arrangement of A is a refinement of the Gröbner fan of A.
(ii) The Graver arrangement of A is a refinement of the circuit arrangement of A.

Invoking Theorem 4.8 we get the following easy corollary.

Corollary 6.8 If A is of Lawrence type, then the state polytope of A is a zonotope (i.e., a Minkowski sum of line segments). The Gröbner fan of A coincides with the Graver arrangement of A.

However, if a matrix A is unimodular and of Lawrence type, then the above results imply the following.

Corollary 6.9 If the matrix A is unimodular and of Lawrence type, then its
secondary fan, circuit arrangement, Grabner Ian and Graver arrangement
all coincide.

Example 6.10 Let A_G be the vertex-edge incidence matrix of a directed graph G = (V, E), and consider the capacitated transshipment problem:

minimize c · x subject to A_G · x = b and 0 ≤ x ≤ b′, x ∈ Z^E.

When rewriting this integer program in the form (2.1), we get the enlarged coefficient matrix Λ(A_G) = [[A_G, 0], [I, I]]. This matrix has format (|E| + |V|) × 2|E| and it is unimodular and of Lawrence type. Hence, for the family of flow problems IP_{Λ(A_G)}, the secondary fan, the circuit arrangement, the Gröbner fan and the Graver arrangement all coincide.
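As a small illustration, the incidence matrix A_G itself is easy to assemble; the sketch below (Python, an illustrative addition; the +1/−1 sign convention for leaving/entering arcs is our assumption) builds it, and combining it with the lifting sketch above yields the enlarged matrix:

import numpy as np

def incidence_matrix(n_vertices, arcs):
    # Vertex-edge incidence matrix of a directed graph: one column per arc,
    # +1 at the vertex the arc leaves, -1 at the vertex it enters.
    A = np.zeros((n_vertices, len(arcs)), dtype=int)
    for j, (u, v) in enumerate(arcs):
        A[u, j] += 1
        A[v, j] -= 1
    return A

# A directed 4-cycle on vertices 0..3:
print(incidence_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))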

7 Implementation Issues
We conclude in this section with a brief discussion of issues relating to the implementation of the Gröbner basis method for integer programming.
In Section 3 we saw some of the problems that have to be faced while solving an integer program using the Conti-Traverso algorithm. The first issue is finding a generating set for the toric ideal I_A, which can be a stumbling block for large integer programs. By Theorem 4.8, finding a generating set for a toric ideal is in the worst case as hard as finding a universal Gröbner basis of the ideal. In theory, generators for I_A can be found using the original Conti-Traverso algorithm via a Gröbner basis computation on a larger ideal in at most d + 1 extra variables. In practice, this method works only when the integer programs at hand are reasonably small. Newer algorithms given in [14] and [21] run entirely within the polynomial ring k[x] and have been found to be faster (as one might expect). Both these algorithms use many relatively short Gröbner basis computations.
Once a generating set for I_A has been found, the reduced Gröbner basis G_c has to be calculated. This can be done in principle using any software package that computes Gröbner bases, like Macaulay 2. As the problem size increases this computation may become difficult or sometimes impossible. In many situations, one can push the computations further by exploiting the specific structure of the integer program at hand. One such methodology was adopted in [29], where it was possible to decompose the Gröbner basis computation into smaller sub-computations that could be carried out efficiently. The perfect 1-matching problem was studied in [12], where it was shown that the Gröbner basis elements can be interpreted graph-theoretically. Recently Hayer and Hochstättler [19] investigated Gröbner bases of vertex cover problems, where once again graphical interpretations of the Gröbner basis are possible. In these situations combinatorial arguments aid both in the computation and understanding of G_c and UGB_A. Many other special cases can be found in [27]. The recent paper [10] introduces certain decomposition techniques that aid in the computation of test sets.
The software GRIN (GRöbner bases for INteger programming) exploits the combinatorial interpretation of the Buchberger algorithm possible in the case of toric ideals, along with some of the usual criteria used to speed up Gröbner basis computations. A detailed discussion of the special features implemented in GRIN can be found in [21]. That paper also reports experiments with randomly generated integer matrices of sizes ranging from 4 × 8 to 8 × 16 with non-negative entries in the range 0 to 20, and comparisons with CPLEX. According to their results, GRIN becomes competitive with CPLEX when the matrix A is dense and randomly generated with entries that are not all 0/1. In these situations, the optimal solution of the integer program is typically far from the optimal solution of its linear relaxation. The expensive part of the work done by GRIN is the computation of G_c, after which reducing a non-optimal solution in a given fiber to the optimum was found to be extremely fast. The computation of G_c should be thought of as a preprocessing step: given the matrix A and cost vector c, once G_c is available, any IP_{A,c}(b) for b ∈ cone_N(A) can be solved very quickly. Traditional algorithms for integer programming typically need to restart calculations if the right-hand side vector is changed.
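The reduction step just mentioned is simple to state explicitly. The following minimal sketch (Python; the names are illustrative and this is not GRIN's actual code) improves a feasible point of a fixed fiber using a precomputed test set:

import numpy as np

def reduce_with_test_set(x, test_set, c):
    # Each test-set vector g satisfies A g = 0 and c . g > 0, so x - g stays
    # in the same fiber and is strictly cheaper whenever it stays nonnegative.
    # With test_set equal to the reduced Groebner basis G_c, the loop can stop
    # only at the optimum of IP_{A,c}(b) (cf. the geometric Buchberger
    # algorithm of [30]).
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=int).copy()
    assert all(np.dot(c, np.asarray(g)) > 0 for g in test_set)
    improved = True
    while improved:
        improved = False
        for g in test_set:
            g = np.asarray(g, dtype=int)
            while np.all(x - g >= 0):
                x = x - g
                improved = True
    return x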
Often in practice one is interested in solving IP_{A,c}(b) for a fixed b. In this situation typically only a small fraction of G_c is required. So a natural question to ask is whether the Buchberger algorithm can be truncated with respect to the right-hand side vector b to output a subset of G_c that is sufficient to solve IP_{A,c}(b). A combinatorial algorithm in this direction was given in [32], which was generalized and placed in an algebraic setting in [31]. The idea involves imposing a multivariate grading on I_A with respect to cone_N(A) and a partial order on the elements of cone_N(A). The Buchberger algorithm is then truncated with respect to a given b to produce a test set sufficient for IP_{A,c}(b′), where b′ is any right-hand side vector less than or equal to b in the above partial order.

References
[1] W.W. Adams and P. Loustaunau, An Introduction to Gröbner Bases, American Mathematical Society, Graduate Studies in Mathematics, Vol. 3, 1994.
[2] D. Bayer and I. Morrison, Gröbner bases and geometric invariant theory I, Journal of Symbolic Computation Vol. 6 (1988) pp. 209-217.
[3] L.J. Billera, P. Filliman and B. Sturmfels, Constructions and complexity
of secondary polytopes, Advances in Mathematics Vol. 83 (1990) pp. 155-
179.
[4] L.J. Billera, I.M. Gel'fand and B. Sturmfels, Duality and minors of secondary polyhedra, Journal of Combinatorial Theory B Vol. 57 (1993) pp. 258-268.
[5] L.J. Billera and B. Sturmfels, Fiber polytopes, Annals of Mathematics
Vol. 135 (1992) pp. 527-549.
[6] C.E. Blair and R.G. Jeroslow, The value function of an integer program, Mathematical Programming Vol. 23 (1982) pp. 237-273.
[7] B. Buchberger, On Finding a Vector Space Basis of the Residue Class Ring Modulo a Zero Dimensional Polynomial Ideal (German), Ph.D. Thesis, University of Innsbruck, Austria, 1965.
[8] P. Conti and C. Traverso, Gröbner bases and integer programming, Proceedings AAECC-9 (New Orleans), Springer Verlag, LNCS Vol. 539 (1991) pp. 130-139.
[9] W. Cook, A.M.H. Gerards, A. Schrijver and E. Tardos, Sensitivity theo-
rems in integer linear programming, Mathematical Programming Vol.34
(1986) pp. 251-264.
[10] G. Cornuejols, R. Urbaniak, R. Weismantel and L. Wolsey, Decomposition of integer programs and of generating sets, Fifth Annual European Symposium on Algorithms (ESA'97), Graz, Austria, 1997. To appear in LNCS, Springer-Verlag.
[11] D. Cox, J. Little and D. O'Shea, Ideals, Varieties, and Algorithms,
Second edition, Springer-Verlag, New York, 1996.
[12] J. de Loera, B. Sturmfels and R.R. Thomas, Gröbner bases and triangulations of the second hypersimplex, Combinatorica Vol. 15 (1995) pp. 409-424.

[13] P. Diaconis and B. Sturmfels, Algebraic algorithms for sampling from


conditional distributions, Annals of Statistics, to appear.
[14] F. Di Biase and R. Urbanke, An algorithm to calculate the kernel of certain polynomial ring homomorphisms, Experimental Mathematics Vol. 4 (1995) pp. 227-234.
[15] W. Fulton, Introduction to Toric Varieties, Princeton University Press,
Princeton, New Jersey, 1993.
[16] I.M. Gel'fand, M. Kapranov and A. Zelevinsky, Discriminants, Resultants and Multidimensional Determinants, Birkhäuser, Boston, 1994.
[17] J.E. Graver, On the foundations of linear and integer programming I,
Mathematical Programming Vol.8 (1975) pp. 207-226.
[18] D. Grayson and M. Stillman, Macaulay 2: a computer algebra system, available from http://www.math.uiuc.edu/~dan/.
[19] M. Hayer and W. Hochstättler, personal communication.
[20] S. Hosten, Degrees of Gröbner Bases of Integer Programs, Ph.D. Thesis, Cornell University, 1997.
[21] S. Hosten and B. Sturmfels, GRIN: An implementation of Gröbner bases for integer programming, in Integer Programming and Combinatorial Optimization (E. Balas and J. Clausen, eds.), LNCS Vol. 920 (1995) pp. 267-276.
[22] T. Mora and L. Robbiano, The Gröbner fan of an ideal, Journal of Symbolic Computation Vol. 6 (1988) pp. 183-208.
[23] H.E. Scarf, Neighborhood systems for production sets with indivisibilities, Econometrica Vol. 54 (1986) pp. 507-532.

[24] A. Schrijver, Theory of Linear and Integer Programming, Wiley-


Interscience Series in Discrete Mathematics and Optimization, New
York, 1986.
[25] B. Sturmfels, Gröbner bases of toric varieties, Tohoku Math. Journal Vol. 43 (1991) pp. 249-261.
[26] B. Sturmfels, Asymptotic analysis of toric ideals, Memoirs of the Fac-
ulty of Science, Kyushu University Ser.A, Vol. 46 (1992) pp. 217-228.

[27] B. Sturmfels, Gröbner Bases and Convex Polytopes, American Mathematical Society, Providence, RI, 1995.
[28] B. Sturmfels and R.R. Thomas, Variation of cost functions in integer programming, Mathematical Programming Vol. 77 (1997) pp. 357-387.
[29] S.R. Tayur, R.R. Thomas and N.R. Natraj, An algebraic geometry algorithm for scheduling in the presence of setups and correlated demands, Mathematical Programming Vol. 69 (1995) pp. 369-401.
[30] R.R. Thomas, A geometric Buchberger algorithm for integer programming, Mathematics of Operations Research Vol. 20 (1995) pp. 864-884.

[31] R.R. Thomas and R. Weismantel, Truncated Gröbner bases for integer programming, Applicable Algebra in Engineering, Communication and Computing, to appear.
[32] R. Urbaniak, R. Weismantel and G. Ziegler, A variant of Buchberger's algorithm for integer programming, SIAM Journal on Discrete Mathematics Vol. 10 (1997) pp. 96-108.

[33] G. Ziegler, Lectures on Polytopes, Graduate Texts in Mathematics, Springer Verlag, New York, 1995.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 1)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 573-746
©1998 Kluwer Academic Publishers

Applications of Set Covering, Set Packing and Set Partitioning Models: A Survey

R.R. Vemuganti
Merrick School of Business
University of Baltimore
1420 North Charles Street
Baltimore, Maryland 21201
E-mail: rvemuganti@ubmail.ubalt.edu

Contents
1 Introduction 575
2 SC, SP and SPT Models and Their Variants 576
3 Transformation of the Models 578
  3.1 SPT to SC 579
  3.2 SPT to SP 579
  3.3 SC to GSP and SP to GSC 580
  3.4 Zero-one MDK to GSP 580
  3.5 GSP to SP 582
  3.6 Zero-one LIP to Zero-one MD Knapsack 585
  3.7 Mixed SPT and SP to SP 586
4 Graphs and Networks 586
  4.1 Vertex (Node) Packing Problem 588
  4.2 Maximum Matching 589
  4.3 Minimum Covering Problem 592
  4.4 Chromatic Index and Chromatic Number 592
  4.5 Multi-Commodity Disconnecting Set Problem 594
  4.6 Steiner Problem in Graphs 596
5 Personnel Scheduling Models 598
  5.1 Days Off or Cyclical Scheduling 599
  5.2 Shift Scheduling 601
  5.3 Tour Scheduling 602
6 Crew Scheduling 603
7 Manufacturing 609
  7.1 Assembly Line Balancing Problem 609
  7.2 Discrete Lot Sizing and Scheduling Problem 610
  7.3 Ingot Size Selection 611
  7.4 Spare Parts Allocation 613
  7.5 Pattern Sequencing in Cutting Stock Operations 613
  7.6 Constrained Network Scheduling Problem 615
  7.7 Cellular Manufacturing 617
8 Miscellaneous Operations 618
  8.1 Frequency Planning 618
  8.2 Timetable Scheduling 619
  8.3 Testing and Diagnosis 620
  8.4 Political Districting 622
  8.5 Information Retrieval and Editing 623
  8.6 Check Clearing 625
  8.7 Capital Budgeting Problem 626
  8.8 Fixed Charge Problem 627
  8.9 Mathematical Problems 628
9 Routing 628
  9.1 Traveling Salesman Problem 629
  9.2 Single Depot Vehicle Routing 631
  9.3 Multiple Depots and Extensions 633
10 Location 633
  10.1 Plant Location Problem 634
  10.2 Lock Box Location Problem 636
  10.3 P-Center Problem 637
  10.4 P-Median Problem 638
  10.5 Service Facility Location Problem 638
11 Review of Bibliography 640
  11.1 Theory 641
  11.2 Transformations 642
  11.3 Graphs 642
  11.4 Personnel Scheduling 643
  11.5 Crew Scheduling 644
  11.6 Manufacturing 645
  11.7 Miscellaneous Operations 646
  11.8 Routing 647
  11.9 Location 652
12 Conclusions 658
References

Abstract
Set covering, set packing and set partitioning models are a special class of
linear integer programs. These models and their variants have been used to
formulate a variety of practical problems in such areas as capital budget-
ing, crew scheduling, cutting stock, facilities location, graphs and networks,
manufacturing, personnel scheduling, vehicle routing and timetable schedul-
ing among others. Based on the special structure of these models, efficient
computational techniques have been developed to solve large size problems
making it possible to solve many real world applications. This paper is a
survey of the applications of the set covering, set packing, set partitioning
models and their variants, including generalizations.

1 Introduction
Set covering (SC), set packing (SP) and set partitioning (SPT) problems are
a very useful and important class of linear integer programming models. A
variety of practical problems have been formulated as one of these models
or their variants. Because of their special structure, it has been possible to
develop efficient techniques (see the list of references under the theory) to
solve large size problems. Due to the facility to solve large problems and
the flexibility to model a variety of systems coupled with the advances in
computer technology, these models have been used to solve many real world
problems. This paper is a survey of SC, SP and SPT formulations (including
their variants) of capital budgeting, crew scheduling, cutting stock, facilities
location, graphs and networks, manufacturing, personnel scheduling, vehicle
routing and time table scheduling problems among others. In addition,

relationships among these models and the conversion of finite linear integer
programs (LIP) to one of these models are presented.
Another objective of this paper is to provide an extensive bibliography
on both theory and applications of SC, SP and SPT models. For conve-
nience, the reference list is divided into nine groups namely, general, the-
ory, graphs, personnel scheduling, crew scheduling, manufacturing, miscel-
laneous operations, routing and location. The general list consists of useful
references related to general integer programming techniques and concepts
including Lagrangian relaxation, surrogate constraints, subgradient meth-
ods, tabu search, disjunctive programming, branch and bound and cutting
plane methods. These references are selected from the papers listed in the
other eight groups which deal with both application and theory of SC, SP
and SPT models. The theory list contains articles exclusively devoted to
algorithms and mathematical properties of the SC, SP and SPT models.
The remaining seven groups deal with papers related to the specific application area the title suggests, except for the miscellaneous operations group, which contains papers related to a variety of applications including timetable scheduling, information retrieval, political redistricting, diagnostic systems and distribution of broadcasting frequencies, among others. It should be noted that many papers listed in the application groups contain theoretical contributions as well. Also, some papers in the application groups, especially in location and routing, may be concerned with nonlinear programming models and general integer programming models. The purpose of including such articles is to provide as comprehensive a bibliography as possible. The list of references included in this paper is by no means complete. The interested reader may review the papers included in the bibliography for additional references. The author apologizes if some useful and relevant papers are not included.
The next section deals with the basic SC, SP, and SPT models and their
variants. The subsequent sections address relationships among these models
including LIP and each application group. Several numerical examples are
provided throughout the paper to illustrate various applications.

2 SC, SP and SPT Models and Their Variants

Consider a finite set M = {1, 2, ..., m} and M_j, j ∈ N, a collection of subsets of the set M, where N = {1, 2, ..., n}. A subset F ⊆ N is called a cover of M if ∪_{j∈F} M_j = M. The subset F ⊆ N is called a packing of M if M_j ∩ M_k = ∅ for all j, k ∈ F and j ≠ k. If F ⊆ N is both a cover and a packing then it is called a partitioning.
Suppose c_j is the cost associated with M_j. Then the set covering problem is to find a minimum cost cover. If c_j is the value or weight of M_j, then the set packing problem is to find a maximum weight or value packing. Similarly, the set partitioning problem is to find a partitioning with minimum cost. These problems can be formulated as zero-one linear integer programs as shown below. For all i ∈ M and j ∈ N let

a_ij = 1 if i ∈ M_j, and a_ij = 0 otherwise,

and

x_j = 1 if j ∈ F, and x_j = 0 otherwise.
Then the set covering (1), set packing (2) and set partitioning (3) formulations are given by

min Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≥ 1, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (1)

max Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≤ 1, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (2)

min Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j = 1, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (3)

The following numerical example illustrates the above models.

Example 1 Suppose M = {1,2,3,4,5,6}, M_1 = {1,2}, M_2 = {1,3,4}, M_3 = {2,4,5}, M_4 = {3,5,6}, M_5 = {4,5,6}, c_1 = 5, c_2 = 4, c_3 = 6, c_4 = 2, and c_5 = 4. The formulation of the set covering model is given by

min 5x_1 + 4x_2 + 6x_3 + 2x_4 + 4x_5
s.t. x_1 + x_2 ≥ 1
x_1 + x_3 ≥ 1
x_2 + x_4 ≥ 1
x_2 + x_3 + x_5 ≥ 1
x_3 + x_4 + x_5 ≥ 1
x_4 + x_5 ≥ 1
and x_1, x_2, x_3, x_4, x_5 ∈ {0, 1}

The formulation of the corresponding set packing and set partitioning models is straightforward.
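Since the instance is tiny, the minimum cost cover can even be verified by enumeration. The following sketch (Python, an illustrative addition to this survey, not part of the models above) checks all subsets of Example 1:

from itertools import combinations

# Data of Example 1: M = {1..6}, subsets M_j with costs c_j.
M = set(range(1, 7))
subsets = {1: {1, 2}, 2: {1, 3, 4}, 3: {2, 4, 5}, 4: {3, 5, 6}, 5: {4, 5, 6}}
cost = {1: 5, 2: 4, 3: 6, 4: 2, 5: 4}

best = None
for r in range(1, 6):
    for F in combinations(subsets, r):
        if set().union(*(subsets[j] for j in F)) == M:   # F covers M
            c = sum(cost[j] for j in F)
            if best is None or c < best[0]:
                best = (c, F)
print(best)   # -> (11, (1, 2, 4)), i.e. the cover {M_1, M_2, M_4} of cost 11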
Clearly, the three models are special structured zero-one linear integer programs, since all the elements of the constraint coefficient matrix A = (a_ij) are 0 or 1 and the right hand sides (RHS) of the constraints are all unity. For some applications the RHS of the constraints may not all be unity but other positive integers. The corresponding models are called general set covering (GSC), general set packing (GSP) and general set partitioning (GSPT). For these general models, while the variables are required to be non-negative integers, they need not be constrained to zero or one. For some other applications the constraint set may include two or all three types of inequalities. Such models are called mixed models. For example, a model with both less than or equal to and greater than or equal to constraints with all RHS equal to unity is called a mixed SC and SP model.
It should be noted that the integer programming formulations of the SC, SP and SPT models, including their generalizations, are NP-complete, except for a few special cases. The relationships among these models and the transformation of linear integer programs, including the zero-one multi-dimensional knapsack problem, into a SP problem are presented next.

3 Transformation of the Models

Transformation of the models is useful in comparing various computational


techniques to generate the optimal solutions. Several transformations to
convert one model to another model including the conversion of a zero-
one LIP to a zero-one multi-dimensional knapsack (MDK) problem and the
conversion of a MDK problem to a SP problem are explored in this section.

3.1 SPT to SC
Subtracting artificial surplus variables y_i from the ith constraint of the SPT model formulation (3), the constraints of the SPT model can be written as:

Σ_{j=1}^n a_ij x_j − y_i = 1.

To insure all artificial surplus variables remain at zero level in any optimal solution (when SPT is feasible), the objective function is changed to

Σ_{j=1}^n c_j x_j + θ Σ_{i=1}^m y_i

where θ is any number greater than Σ_j c_j. Substituting for y_i in the objective function and eliminating y_i from the constraints, the following SC formulation yields an optimal solution to the SPT problem:

min Σ_{j=1}^n c′_j x_j − mθ
s.t. Σ_{j=1}^n a_ij x_j ≥ 1, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (4)

where c′_j = c_j + θ Σ_{i=1}^m a_ij.
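In matrix terms the cost change in (4) is a one-liner; a minimal sketch (Python with NumPy; the function name is an illustrative assumption) is:

import numpy as np

def spt_to_sc_costs(A, c):
    # Costs c'_j = c_j + theta * sum_i a_ij for the SC reformulation (4);
    # theta is any number exceeding sum_j c_j (here sum_j c_j + 1).
    A = np.asarray(A)
    c = np.asarray(c, dtype=float)
    theta = c.sum() + 1.0
    return c + theta * A.sum(axis=0), theta

# Solving min c'x s.t. Ax >= 1, x binary, and subtracting m*theta then
# recovers the SPT optimum whenever the SPT problem is feasible.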


3.2 SPT to SP
Adding artificial slack variables y_i to the ith constraint of the SPT model formulation (3), the constraints of the SPT model can be written as:

Σ_{j=1}^n a_ij x_j + y_i = 1.

As noted earlier, the modified objective function

Σ_{j=1}^n c_j x_j + θ Σ_{i=1}^m y_i

guarantees that all artificial slack variables remain at zero level in any optimal solution. Substituting for y_i in the objective function and eliminating y_i from the constraints, the following formulation yields an optimal solution to the SPT problem:

min Σ_{j=1}^n c′_j x_j + mθ
s.t. Σ_{j=1}^n a_ij x_j ≤ 1, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (5)

where c′_j = c_j − θ Σ_{i=1}^m a_ij. Noting the fact that the c′_j are negative numbers and changing the objective function to

max Σ_{j=1}^n (−c′_j) x_j

results in a SP formulation.

3.3 SC to GSP and SP to GSC

Substituting x_j = 1 − y_j, the formulation (1) of the SC problem is equivalent to

max Σ_{j=1}^n c_j y_j − Σ_{j=1}^n c_j
s.t. Σ_{j=1}^n a_ij y_j ≤ b_i, i = 1, 2, ..., m
and y_j = 0, 1, j = 1, 2, ..., n    (6)

where b_i = Σ_{j=1}^n a_ij − 1. Noting the fact that all b_i are nonnegative integers (when SC is feasible), the above is a GSP formulation of the SC problem. A similar transformation can be used to convert a SP problem into a GSC problem.

3.4 Zero-one MDK to GSP

The LIP formulation of a zero-one MDK problem is

max Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≤ b_i, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (7)

where a_ij, c_j and b_i are all nonnegative numbers. When all numbers involved are rational they can be converted to integers by multiplying them with an appropriate positive integer. In transforming this problem to a GSP problem it is assumed that all a_ij, c_j and b_i are nonnegative integers. Let

a_j = max_{i=1}^m a_ij, j = 1, 2, ..., n, and
x_jk = 0, 1, j = 1, 2, ..., n, k = 1, 2, ..., a_j.

If all x_jk are guaranteed to be equal for a given j, clearly (when a_ij > 0)

a_ij x_j = Σ_{k=1}^{a_ij} x_jk.

Then an equivalent formulation of the MDK problem is given by

max Σ_{j=1}^n Σ_{k=1}^{a_j} c_jk x_jk
s.t. Σ_{j=1}^n Σ_{k=1}^{a_ij} x_jk ≤ b_i, i = 1, 2, ..., m
x_jk + z_j = 1, j = 1, 2, ..., n, k = 1, 2, ..., a_j
x_jk = 0, 1, j = 1, 2, ..., n, k = 1, 2, ..., a_j
and z_j = 0, 1, j = 1, 2, ..., n    (8)

where c_jk = c_j/a_j for k = 1, 2, ..., a_j. It should be noted that the second set of constraints guarantees that all x_jk are equal for a given j. Add artificial slack variables to the equality constraints with a very large negative coefficient in the objective function, as in the conversion of the SPT problem to a SP problem, and eliminate the artificial slack variables from both the objective function and the constraints to obtain the required formulation.
Example 2. Consider the following zero-one two dimensional knapsack problem:

max 10x_1 + 4x_2 + 12x_3 + x_4
s.t. 3x_1 + x_2 + 4x_3 + x_4 ≤ 4
x_1 + 2x_2 + 2x_3 + 2x_4 ≤ 3
and x_j = 0, 1, j = 1, 2, 3, 4.

Since a_1 = max(3,1) = 3, a_2 = max(1,2) = 2, a_3 = max(4,2) = 4, and a_4 = max(1,2) = 2, by defining eleven x_jk (x_11, x_12, x_13, x_21, x_22, x_31, x_32, x_33, x_34, x_41 and x_42) and four z_j (z_1, z_2, z_3 and z_4) zero-one variables, the problem can be reformulated as

max (10/3)(x_11 + x_12 + x_13) + 2(x_21 + x_22) + 3(x_31 + x_32 + x_33 + x_34) + (1/2)(x_41 + x_42)
s.t. x_11 + x_12 + x_13 + x_21 + x_31 + x_32 + x_33 + x_34 + x_41 ≤ 4
x_11 + x_21 + x_22 + x_31 + x_32 + x_41 + x_42 ≤ 3
x_1k + z_1 = 1, k = 1, 2, 3
x_2k + z_2 = 1, k = 1, 2
x_3k + z_3 = 1, k = 1, 2, 3, 4
x_4k + z_4 = 1, k = 1, 2
z_j = 0, 1, j = 1, 2, 3, 4
x_1k = 0, 1, k = 1, 2, 3
x_2k = 0, 1, k = 1, 2
x_3k = 0, 1, k = 1, 2, 3, 4
and x_4k = 0, 1, k = 1, 2.

The equality constraints can be changed to less than or equal to constraints by subtracting the expression

θ(Σ_{k=1}^3 x_1k + Σ_{k=1}^2 x_2k + Σ_{k=1}^4 x_3k + Σ_{k=1}^2 x_4k + 3z_1 + 2z_2 + 4z_3 + 2z_4) − 11θ

from the objective function, where θ > (10 + 4 + 12 + 1) = 27.
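The bookkeeping behind the split can be sketched as follows (Python, an illustrative addition; the arrays simply restate the data of Example 2):

import numpy as np

# Each x_j is split into a_j copies x_{j1},...,x_{j a_j}, where a_j is the
# column maximum of A, and each copy gets objective weight c_j / a_j.
A = np.array([[3, 1, 4, 1],
              [1, 2, 2, 2]])
c = np.array([10, 4, 12, 1], dtype=float)
a = A.max(axis=0)                         # a = [3, 2, 4, 2]
split_costs = {j + 1: c[j] / a[j] for j in range(len(c))}
print(a.sum(), split_costs)               # 11 split variables, as in Example 2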

3.5 GSP to SP
The LIP formulation of a GSP problem when the variables are restricted to binary is

max Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≤ b_i, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (9)

where the b_i are positive integers. In order to transform this to a SP problem, the ith constraint is replaced by b_i inequality constraints with all right hand sides equal to one. Let

y_ijk = 0, 1 for j = 1, 2, ..., n, i = 1, 2, ..., m, k = 1, 2, ..., b_i,
z_j = 0, 1 for j = 1, 2, ..., n,
and b = max_{i=1}^m b_i.

Rearrange the constraints if necessary so that b_1 = b and consider the following mixed SP and SPT formulation:

max Σ_{j=1}^n c_j Σ_{k=1}^{b_1} y_1jk
s.t. Σ_{j=1}^n a_ij y_ijk ≤ 1, i = 1, 2, ..., m, k = 1, 2, ..., b_i
Σ_{k=1}^{b_i} y_ijk + z_j = 1, i = 1, 2, ..., m, j = 1, 2, ..., n
y_ijk = 0, 1, j = 1, 2, ..., n, i = 1, 2, ..., m, k = 1, 2, ..., b_i
and z_j = 0, 1, j = 1, 2, ..., n    (10)

To see the equivalence between these two formulations, summing up the inequalities over k for a given i yields

Σ_{j=1}^n a_ij Σ_{k=1}^{b_i} y_ijk ≤ b_i.

From the equality constraints it is clear that

Σ_{k=1}^{b_i} y_ijk = x_j

is zero-one for all i = 1, 2, ..., m. It is straightforward to verify that x_j constructed from a feasible solution to the mixed SP and SPT formulation yields a feasible solution to the GSP problem. Now suppose x_j is a feasible solution to the GSP. If a_ij = 0 and x_j = 1, then set any one of the variables y_ijk, k = 1, 2, ..., b_i equal to 1 and the remaining to zero. If x_j(t) and a_ij(t) are equal to one for t = 1, 2, ..., h (note that h ≤ b_i), then set

y_ij(t)t = 1 for all t = 1, 2, ..., h

and the rest of the variables to zero. This provides a feasible solution to the mixed SP and SPT formulation with the objective values of both formulations the same. Since there is a one to one correspondence, the mixed SP and SPT formulation yields an optimal solution to the GSP problem. The equalities can be replaced by inequalities to obtain the SP formulation using the procedure described in converting SPT to SP.
When the x_j are not binary, they can be replaced by binary variables y_jk, k = 1, 2, ..., t_j using the transformation

x_j = Σ_{k=1}^{t_j} 2^{k−1} y_jk

where t_j is suitably chosen to insure all possible values of x_j are included. Since all x_j are bounded above (≤ b) it is possible to select such t_j. Now substituting for x_j in the GSP formulation yields a zero-one MD knapsack problem which can be converted to a binary GSP problem.
Example 3. Consider the following binary GSP problem:

max 4x_1 + 6x_2 + 7x_3 + 8x_4 + 10x_5
s.t. x_2 + x_3 + x_4 + x_5 ≤ 3
x_1 + x_2 + x_5 ≤ 2
x_1 + x_3 + x_4 ≤ 2
x_j = 0, 1, j = 1, 2, ..., 5.

Using several zero-one variables y_ijk, an equivalent mixed SP and SPT formulation is

max 4 Σ_{k=1}^3 y_11k + 6 Σ_{k=1}^3 y_12k + 7 Σ_{k=1}^3 y_13k + 8 Σ_{k=1}^3 y_14k + 10 Σ_{k=1}^3 y_15k
s.t. y_12k + y_13k + y_14k + y_15k ≤ 1, k = 1, 2, 3
y_21k + y_22k + y_25k ≤ 1, k = 1, 2
y_31k + y_33k + y_34k ≤ 1, k = 1, 2
Σ_{k=1}^3 y_1jk + z_j = 1, j = 1, 2, ..., 5
Σ_{k=1}^2 y_2jk + z_j = 1, j = 1, 2, ..., 5
Σ_{k=1}^2 y_3jk + z_j = 1, j = 1, 2, ..., 5
y_1jk = 0, 1, j = 1, 2, ..., 5, k = 1, 2, 3
y_2jk = 0, 1, j = 1, 2, ..., 5, k = 1, 2
y_3jk = 0, 1, j = 1, 2, ..., 5, k = 1, 2
and z_j = 0, 1, j = 1, 2, ..., 5

3.6 Zero-one LIP to Zero-one MD Knapsack

When the variables are bounded above, any LIP can be converted to a zero-one LIP using the standard binary transformation. Consider the zero-one LIP

max Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≤ b_i, i = 1, 2, ..., m
and x_j = 0, 1, j = 1, 2, ..., n    (11)

where all b_i ≥ 0 and are integers. When all a_ij > 0, no changes are necessary to convert the problem to a zero-one MD knapsack problem. When some a_ij ≤ 0, let

a_ij1 = max(0, a_ij) ≥ 0 and a_ij2 = min(0, a_ij) ≤ 0

for i = 1, 2, ..., m and j = 1, 2, ..., n. For each variable x_j (only for j for which some a_ij < 0),

a_ij x_j = a_ij1 x_j1 + a_ij2 x_j2

when the binary variables x_j1 and x_j2 are equal. Replacing x_j2 with 1 − y_j2, an equivalent formulation of the zero-one LIP is

max Σ_{j=1}^n c_j x_j1
s.t. Σ_{j=1}^n (a_ij1 x_j1 + (−a_ij2) y_j2) ≤ b_i + Σ_{j=1}^n (−a_ij2), i = 1, 2, ..., m
x_j1 + y_j2 = 1, j = 1, 2, ..., n
x_j1 = 0, 1, j = 1, 2, ..., n
and y_j2 = 0, 1, j = 1, 2, ..., n    (12)

Since all a_ij2 ≤ 0, all the coefficients involved are nonnegative, including the right hand sides of the inequalities. Using the standard procedure of adding artificial slack variables, the equality constraints can be converted to inequality constraints, which yields the equivalent zero-one MD knapsack formulation.
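A sketch of the sign-splitting step in (12) follows (Python; the small matrix is an illustrative stand-in rather than an example from the text):

import numpy as np

# a_ij1 = max(0, a_ij) and a_ij2 = min(0, a_ij); the new right-hand sides
# b_i + sum_j (-a_ij2) are nonnegative by construction.
A = np.array([[2, -3, 0],
              [-1, 4, 5]])
b = np.array([4, 6])
A1 = np.maximum(A, 0)
A2 = np.minimum(A, 0)
rhs = b + (-A2).sum(axis=1)
print(A1, -A2, rhs)    # rhs = [7, 7] for this illustrative instance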

3.7 Mixed SPT and SP to SP

The equality constraints can be converted to less than or equal to constraints by adding artificial slack variables and eliminating them from both the constraints and the objective function; the other mixed formulations can be handled similarly using the transformations presented in this section. Even though it is possible to convert one model to another, it should be noted that some conversions require a large number of additional variables and constraints. In the next section, models related to graphs and networks are presented.

4 Graphs and Networks

Let N be a finite set of points and A be a finite set of ordered pairs of points (i, j) from the set N. The pair N and A is called a directed graph and is denoted by G = (N, A). The elements of N are called nodes (or vertices) and the elements of A are called arcs. For any arc (i, j) ∈ A, i is called the beginning node and j is called the ending node. If the elements of A are unordered, they are called edges and the corresponding graph, denoted by G = (N, E), is called an undirected graph. Usually no distinction is made between graphs and networks. However, when a subset of nodes is singled out for a specific purpose, such as sources and sinks to transport some commodity, the corresponding graph is called a network. A graph is called a bipartite graph if the nodes can be partitioned into two sets such that the beginning node of every arc belongs to one set and the ending node of every arc belongs to the other set. The cardinalities of the sets N and A (or E) are denoted by n and m, which represent the number of nodes and arcs (edges) respectively.
Two arcs (or two edges) are called adjacent if they have at least one node in common. Similarly, two nodes are adjacent to each other if they are connected by an edge or arc. A chain is a sequence of arcs (A_1, A_2, ..., A_r) such that each arc has one node in common with its successor and predecessor, with the exception of A_1 and A_r, which have a common node with the successor and predecessor respectively. If i is the beginning node of A_1 and j is the ending node of A_r, then it is a chain from node i to node j. If all the nodes encountered are distinct then it is called an elementary chain. If the beginning and end points of an elementary chain are the same then it is called a cycle. A path is a sequence of arcs (A_1, A_2, ..., A_r) such that the ending node of every arc in the sequence is the beginning node of the next arc. If i is the beginning node of A_1 and j is the ending node of A_r, then it is a path from node i to node j. The path is elementary if all nodes encountered are distinct. If the beginning and ending nodes of a path are the same then it is called a circuit. An undirected graph G = (N, E) is called a tree if it has exactly (n − 1) edges and has no cycles. A directed graph G = (N, A) is called a tree if it contains exactly m = n − 1 arcs, has no circuits, and every node is the ending node of exactly one arc except one node which is the beginning node of one or many arcs but not the ending node of any arc. The following examples are used to illustrate these concepts.
Example 4. Suppose N = {1, 2, 3, 4, 5} and A = {(1,2), (1,3), (4,1), (1,5), (2,3), (4,2), (3,4), (3,5), (4,5)}. In the above graph the arc set A_1 = {(1,2), (2,3), (3,4)} is an elementary path from node 1 to node 4, the arc set A_2 = {(1,2), (2,3), (3,4), (4,1)} is a circuit, the arc set A_3 = {(1,5), (3,5)} is an elementary chain, and the arc set A_4 = {(1,5), (4,5), (4,1)} is a cycle. Clearly, in undirected graphs there are only chains and cycles.

Figure 1: A Directed Graph

Example 5. Suppose N = {1, 2, 3, 4, 5, 6} and A = {(1,2), (1,3), (3,4), (1,5), (4,6)}. Clearly the above graph is a tree (in fact it is called a rooted tree with root 1).

Figure 2: A Directed Tree
There are many problems related to graphs and networks, such as vertex packing (stability number), maximum matching, minimum covering, chromatic index, chromatic number, multi-commodity minimum disconnecting set and the Steiner problem, which can be formulated as one of the SC, SP and SPT models or their variants.
set and Steiner problem which can be formulated as one of the SC, SP and
SPT models or their variants.

4.1 Vertex (Node) Packing Problem

Consider an undirected graph G = (N, E). A subset of nodes P of N is called a vertex packing if no two nodes of the set P are adjacent to each other. The vertex packing problem is to find a packing of maximum cardinality. Let x_i = 1 if node i is included in a packing and x_i = 0 otherwise. Since two nodes connected by an edge cannot both be included in a packing, a SP formulation of the vertex packing problem is given by

max Σ_{i=1}^n x_i
s.t. x_i + x_j ≤ 1, (i,j) ∈ E
and x_i = 0, 1, i = 1, 2, ..., n    (13)

The maximum value of the objective function is also called the stability number of a graph. To illustrate the usefulness of this model, consider a franchise business whose objective is to maximize the number of profitable franchises in a given area. Certain locations are so close to each other that if franchises are open in both, neither will make a profit. Represent each location by a node and connect any two nodes by an edge if the corresponding locations are unprofitable when franchises are open in both. To provide another example (even though esoteric), consider placing eight queens on a chessboard so that no queen can capture another queen. In order to determine the feasibility of placing eight queens, construct a graph with 64 nodes, each node representing a square on the chessboard, and connect two nodes by an edge if the corresponding squares are in the same row, in the same column or in the same diagonal. If the corresponding stability number is eight or more, it is possible to place eight queens on a chessboard without one capturing another.
When each node i is assigned a nonnegative weight c_i and the coefficient of x_i in the objective function of the formulation (13) is c_i, then it is called a weighted vertex packing problem. An interesting application of this model is the transformation of a SP problem to a weighted vertex packing problem. To see the connection between these two models, construct an undirected graph with n nodes, each node representing a variable in the formulation (2), and connect two nodes j and k by an edge if there is a constraint i such that a_ij = a_ik = 1. The equivalence between the SP problem and the weighted vertex packing problem generated by the corresponding graph is illustrated below.

Example 6.

max Σ_{j=1}^5 c_j x_j
s.t. x_1 + x_2 + x_3 ≤ 1
x_2 + x_3 + x_4 ≤ 1
x_3 + x_4 + x_5 ≤ 1
x_1 + x_3 + x_5 ≤ 1
and x_j = 0, 1, j = 1, 2, ..., 5

The undirected graph with 5 nodes and 8 edges corresponding to the above SP problem is shown below. From the first constraint it is clear that no two of the variables x_1, x_2, x_3 can equal 1 in any feasible solution; this is equivalent to vertex packing constraints on the node pairs (1,2), (2,3) and (1,3).
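The construction of the graph from the constraints can be sketched as follows (Python, an illustrative addition), reproducing the 8 edges of Figure 3:

from itertools import combinations

# Variables j and k are joined by an edge whenever some row of the SP
# instance has a_ij = a_ik = 1; rows restate the constraints of Example 6.
rows = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (1, 3, 5)]
edges = set()
for row in rows:
    edges.update(combinations(sorted(row), 2))
print(sorted(edges))   # 8 edges, matching the graph of Figure 3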

Figure 3: A Set Packing Graph

4.2 Maximum Matching

Consider an undirected graph G = (N, E). Two edges (or arcs) are said to be adjacent to each other if they have a node in common. A subset of edges D is called a matching if no two edges in D are adjacent to each other. The maximum matching problem is to determine a matching of maximum cardinality. Let x_j = 1 if edge j is included in the matching and x_j = 0 otherwise. Also, for each edge (i,j), let a_ij = a_ji = 1. Then the SP formulation of the maximum matching problem is given by

max Σ_{j=1}^m x_j
s.t. Σ_{j=1}^m a_ij x_j ≤ 1, i = 1, 2, ..., n
and x_j = 0, 1, j = 1, 2, ..., m    (14)

Example 7. To illustrate the above model, consider the following graph with numbers on each edge representing the number assigned to each edge. The corresponding maximum matching problem is

max Σ_{j=1}^7 x_j
s.t. x_1 + x_2 ≤ 1
x_1 + x_3 + x_4 ≤ 1
x_3 + x_5 ≤ 1
x_4 + x_5 + x_6 ≤ 1
x_6 + x_7 ≤ 1
x_2 + x_7 ≤ 1
and x_j = 0, 1, j = 1, 2, ..., 7

Figure 4: An Undirected Graph

To illustrate the application of this model, suppose there are p workers and q jobs and each worker is trained to perform at least one job. The problem is to determine whether each worker can be assigned to a job for which the individual is qualified. Consider a bipartite graph with p nodes corresponding to the workers in one group and q nodes corresponding to the jobs in the second group. Connect a node in the first group with an edge to a node in the second group if the individual is qualified to perform the corresponding job. If the value of the maximum matching for this graph is p, then all workers can be assigned to jobs for which they are qualified.
A closely related problem is the standard assignment problem. Suppose there are n workers and n jobs. It costs c_ij if worker i is assigned to job j. The problem is to assign each worker to one job and each job to one worker so that the total cost is a minimum. Let x_ij = 1 if worker i is assigned to job j and x_ij = 0 otherwise. The SPT formulation of this problem is given by

min Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij
s.t. Σ_{j=1}^n x_ij = 1, i = 1, 2, ..., n
Σ_{i=1}^n x_ij = 1, j = 1, 2, ..., n
and x_ij = 0, 1, i = 1, 2, ..., n, j = 1, 2, ..., n    (15)

Efficient techniques have been developed to solve this problem which require polynomial computational time to determine the optimum solution.
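Polynomial-time codes for (15) are widely available; as an illustration (assuming SciPy is available, which is not part of this survey; the cost matrix is made up), a Hungarian-type solver can be invoked as follows:

import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.array([[4, 2, 8],
              [4, 3, 7],
              [3, 1, 6]])
rows, cols = linear_sum_assignment(C)          # minimizes total cost
print(list(zip(rows, cols)), C[rows, cols].sum())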

4.3 Minimum Covering Problem

Given an undirected graph G = (N, E), an edge covering F is a subset of E such that every node in N is the end point of at least one edge in F. The problem is to determine the minimum number of edges needed to cover all nodes. Let x_j = 1 if edge j is in F and x_j = 0 otherwise. Also let a_ij = 1 if node i is an end point of edge j and a_ij = 0 otherwise. Then the SC formulation of the minimum covering problem is given by

min Σ_{j=1}^m x_j
s.t. Σ_{j=1}^m a_ij x_j ≥ 1, i = 1, 2, ..., n
and x_j = 0, 1, j = 1, 2, ..., m    (16)

For a simple application of this model, consider a fort with towers at the endpoints of each wall. A guard stationed at a wall can watch both towers at the ends of the wall. The problem is to determine the minimum number of guards needed to watch all towers. Define a node for each tower and connect any two nodes by an edge if they are connected by a wall. Clearly, the minimum number of edges to cover all nodes yields the minimum number of guards needed to watch all the towers. Many more important and useful applications of this model are included in the section related to location problems.
Another related model is to cover all edges by nodes. That is, find a subset of nodes P of N such that at least one node of every edge belongs to P. The problem is to determine the minimum number of nodes needed to cover all edges. Let x_i = 1 if node i is included in the cover and x_i = 0 otherwise. Then the SC formulation of the node covering problem is given by

min Σ_{i=1}^n x_i
s.t. x_i + x_j ≥ 1, (i,j) ∈ E
and x_i = 0, 1, i = 1, 2, ..., n    (17)

It is easy to see that the vertex packing problem (13) and the node covering problem (17) are closely related [set x_i = 1 − y_i in formulation (17) to obtain the formulation (13)].

4.4 Chromatic Index and Chromatic Number

The chromatic index of an undirected graph is the minimum number of colors needed to color all edges of the graph so that no two adjacent edges receive the same color. It is clear that no more than m colors are needed, since m represents the number of edges. Let y_k = 1 if color k is used and y_k = 0 otherwise, for k = 1, 2, ..., m. Also let x_jk = 1 if edge j is given color k and x_jk = 0 otherwise, for j = 1, 2, ..., m and k = 1, 2, ..., m. Finally, let a_ij = 1 if node i is an endpoint of edge j and a_ij = 0 otherwise, for i = 1, 2, ..., n and j = 1, 2, ..., m. Then an integer programming formulation is given by

min Σ_{k=1}^m y_k
s.t. Σ_{k=1}^m x_jk = 1, j = 1, 2, ..., m
Σ_{j=1}^m a_ij x_jk ≤ y_k, k = 1, 2, ..., m, i = 1, 2, ..., n
y_k = 0, 1, k = 1, 2, ..., m
and x_jk = 0, 1, j = 1, 2, ..., m, k = 1, 2, ..., m    (18)

The first set of constraints ensures that every edge is assigned a color, and the second set of constraints guarantees that at most one of the edges incident to a given node receives color k, and only if color k is used. Substituting z_k = 1 − y_k, the problem can be transformed to a mixed SP and SPT model.
A related problem is to find the minimum number of colors needed to color all nodes of an undirected graph so that no two adjacent nodes receive the same color. The minimum number of colors needed is called the chromatic number. Let y_k = 1 if color k is used and y_k = 0 otherwise, for k = 1, 2, ..., n. Also let x_ik = 1 if node i is given color k and x_ik = 0 otherwise, for i = 1, 2, ..., n and k = 1, 2, ..., n. In addition, let a_ij = a_ji = 1 if (i,j) ∈ E, a_ij = 0 otherwise, and a_ii = 1 for all i = 1, 2, ..., n. Then a linear integer programming formulation of the problem is given by

min Σ_{k=1}^n y_k
s.t. Σ_{k=1}^n x_ik = 1, i = 1, 2, ..., n
Σ_{j=1}^n a_ij x_jk ≤ y_k, i = 1, 2, ..., n, k = 1, 2, ..., n
y_k = 0, 1, k = 1, 2, ..., n
and x_ik = 0, 1, i = 1, 2, ..., n, k = 1, 2, ..., n    (19)
The first set of constraints guarantees that every node receives exactly one color and the second set of constraints ensures that none of the nodes adjacent to a given node receives the same color. Substituting z_k = 1 − y_k, this problem also can be transformed to a mixed SP and SPT model.
To illustrate a simple application of this model, suppose at the end of an academic year several students must take oral exams from several professors. The problem is to determine the minimum number of periods needed to schedule the oral examinations, given that during any period a professor can examine only one student. To model this problem, construct a bipartite graph with N_1 nodes representing the students and N_2 nodes representing the professors. Connect a node i ∈ N_1 and a node j ∈ N_2 with an edge if student i must be examined by professor j. If the edges are colored so that no two adjacent edges receive the same color, each color can correspond to a period. Clearly, the chromatic index of the graph yields the minimum number of time periods needed to complete all oral examinations. Other models of timetable scheduling problems are discussed in the miscellaneous operations section.
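A simple greedy heuristic illustrates this interpretation (Python, an illustrative addition with made-up data; greedy coloring only gives an upper bound on the chromatic index, not necessarily the optimum):

# Color the edges of a bipartite student/professor graph so that adjacent
# edges (sharing a student or a professor) receive different period numbers.
exams = [("s1", "p1"), ("s1", "p2"), ("s2", "p1"), ("s2", "p3"), ("s3", "p2")]
period = {}
for e in exams:
    used = {period[f] for f in period if set(f) & set(e)}   # adjacent edges
    k = 1
    while k in used:
        k += 1
    period[e] = k
print(period)   # period numbers play the role of colors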

4.5 Multi-Commodity Disconnecting Set Problem

Consider a directed network G = (N, A) and let S = {s_1, s_2, ..., s_k} ⊆ N and T = {t_1, t_2, ..., t_k} ⊆ N be the source set and sink set. A set of arcs D ⊆ A is called a disconnecting set if its removal from the network blocks all paths from s_i to t_i for i = 1, 2, ..., k. To disrupt communications from each s_i to t_i, all arcs in a disconnecting set must be removed from the network. Suppose it costs c_j to remove (destroy) the arc e_j, for j = 1, 2, ..., m. The problem of interest is to find a disconnecting set D which costs the least. Such a disconnecting set is called a multi-commodity minimum disconnecting set and is useful in attacking an enemy network to disrupt all communications between the sources and the corresponding sinks. Suppose P_1, P_2, ..., P_r represent all elementary paths from every point in S to the corresponding point in T. Let x_j = 1 if the jth arc is selected for removal from the network and x_j = 0 otherwise. Also let a_ij = 1 if path P_i contains arc j and a_ij = 0 otherwise, for i = 1, 2, ..., r and j = 1, 2, ..., m. A SC formulation of the multi-commodity minimum disconnecting set problem is given by

min Σ_{j=1}^m c_j x_j
s.t. Σ_{j=1}^m a_ij x_j ≥ 1, i = 1, 2, ..., r
and x_j = 0, 1, j = 1, 2, ..., m    (20)

Even for a network of moderate size the number of paths could be prohibitively large. A method called the row generation scheme may be used (for k ≥ 3) to solve this problem, which does not require explicit knowledge of all the constraints. Efficient computational techniques are available when the number of sources k is equal to 1 or 2. The following numerical example is used to illustrate the model.
Example 8. Consider the network of Figure 5, with source set and sink set consisting of three nodes each, and with numbers on each arc representing the arc number assigned to it. There is only one (elementary) path from each source to the corresponding sink, and these are listed below.

Figure 5: A Multi-Commodity Network

P_1 = {(s_1,3), (3,1), (1,2), (2,t_1)}
P_2 = {(s_2,2), (2,3), (3,1), (1,t_2)}
P_3 = {(s_3,1), (1,2), (2,3), (3,t_3)}

The corresponding SC formulation is given below.

min Σ_{j=1}^9 c_j x_j

s.t. x_4 + x_3 + x_1 + x_7 ≥ 1
x_5 + x_2 + x_3 + x_8 ≥ 1
x_6 + x_1 + x_2 + x_9 ≥ 1
and x_j = 0, 1, j = 1, 2, ..., 9

4.6 Steiner Problem in Graphs


Consider an undirected graph G = (N, E) with cost c_j associated with edge
j, j = 1, 2, ..., m. Suppose N = S ∪ P and the node set P is designated as
the set of Steiner points. The problem is to determine a tree T = (N_1, E_1)
such that N_1 and E_1 are subsets of N and E respectively, N_1 contains
all nodes in S, and the total cost of the edges included in E_1 is a minimum.
It should be noted that in a tree every two nodes are connected by a single
path. When P is empty, the problem (minimal spanning tree) can be solved
very efficiently. Consider a partitioning of the nodes N = N_1 ∪ N_2 such
that both N_1 and N_2 contain some nodes of S (N_1 ∩ N_2 = ∅, N_1 ∩ S ≠ ∅
and N_2 ∩ S ≠ ∅). To span all the nodes in S, at least one of the edges
from the node set N_1 to the node set N_2 must be included in the tree.
Suppose E_1, E_2, ..., E_r represent the edge sets corresponding to all partitionings
(also called cut sets) of the nodes N with the specified property. Let x_j = 1
if the jth edge is included in the tree and x_j = 0 otherwise, for j = 1, 2, ..., m. Also
let a_ij = 1 if edge set E_i contains the edge j and a_ij = 0 otherwise, for all
i = 1, 2, ..., r and j = 1, 2, ..., m. A SC formulation of the Steiner problem
in graphs is given by

min Σ_{j=1}^{m} c_j x_j

s.t. Σ_{j=1}^{m} a_{ij} x_j ≥ 1,  i = 1, 2, ..., r
and x_j = 0 or 1,  j = 1, 2, ..., m.   (21)
This problem can also be solved using the row generation scheme, which does
not require the explicit knowledge of all constraints, similar to the multi-
commodity disconnecting set problem. Models of this type can be used to
determine the minimum cost set of communication links between several
locations so that communication is possible between any pair of
locations.
Example 9. Consider the following undirected graph with 5 nodes and
8 edges, with the numbers on each edge representing the number assigned to it,
and S = {1, 2, 3}. The list of all possible partitionings of the nodes and
the edges corresponding to each partitioning are given in Table 1.

Figure 6: An Undirected Graph for the Steiner Problem

Table 1:

N_1       N_2         Edges
(1)       (2,3,4,5)   E_1 = (1,2,3,4)
(1,4)     (2,3,5)     E_2 = (1,2,4,7,8)
(1,5)     (2,3,4)     E_3 = (1,2,3,5,8)
(1,4,5)   (2,3)       E_4 = (1,2,5,7)
(2)       (1,3,4,5)   E_5 = (1,5,6)
(2,4)     (1,3,5)     E_6 = (1,3,5,6,7,8)
(2,5)     (1,3,4)     E_7 = (1,4,6,8)
(2,4,5)   (1,3)       E_8 = (1,4,6)
(3)       (1,2,4,5)   E_9 = (2,6,7)
(3,4)     (1,2,5)     E_10 = (2,3,6,8)
(3,5)     (1,2,4)     E_11 = (2,4,6,7,8)
(3,4,5)   (1,2)       E_12 = (2,3,4,6)

The SC formulation, consisting of 8 variables (one variable corresponding to each
edge) and 12 constraints (one constraint corresponding to each partitioning),
is straightforward.
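As a sanity check, this small Python sketch enumerates all 2^8 edge subsets against the cut sets of Table 1, again assuming unit edge costs since the example does not give the c_j.

from itertools import product

cuts = [  # cut sets E_1, ..., E_12 from Table 1
    {1, 2, 3, 4}, {1, 2, 4, 7, 8}, {1, 2, 3, 5, 8}, {1, 2, 5, 7},
    {1, 5, 6}, {1, 3, 5, 6, 7, 8}, {1, 4, 6, 8}, {1, 4, 6},
    {2, 6, 7}, {2, 3, 6, 8}, {2, 4, 6, 7, 8}, {2, 3, 4, 6},
]

best = None
for bits in product((0, 1), repeat=8):
    edges = {j + 1 for j, v in enumerate(bits) if v}
    if all(edges & c for c in cuts):          # constraint set of (21)
        if best is None or len(edges) < len(best):
            best = edges
print(len(best), sorted(best))   # with unit costs: 2 edges, e.g. [1, 2]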

5 Personnel Scheduling Models

Scheduling personnel is an important activity in many organizations in both


manufacturing and service sectors. The scheduling problems arise in a vari-
ety of service delivery settings such as scheduling nurses in hospitals, airline
and hotel reservations personnel, telephone operators, patrol officers, work-
ers in a postal facility, checkout clerks in supermarkets, ambulance and fire
service personnel, toll collectors, check encoders in banks, train and bus
crew, personnel in fast food restaurants, and others. The scheduling prob-
lems also arise in manufacturing activities especially those requiring conti-
nuity in the production process such as chemicals and steel. Operational
performance of both service and manufacturing operations, such as quality
and level of service, labor cost and productivity, is affected by employee
scheduling decisions. Tardiness, turnover and absenteeism further complicate the
situation when work schedules are unsatisfactory. Providing satisfactory work
schedules to employees, maintaining the required service levels, ensuring
human needs such as breaks for lunch and rest, meeting the governmental
and union legal requirements, achieving the production goals and control-
ling the labor costs are some of the major issues in modeling the employee
scheduling problem.
The solution to the scheduling problem is relatively simple in organiza-
tions which operate five days a week and one standard shift a day, since all
employees are required to follow one schedule except possibly lunch breaks.
This section deals with scheduling problems which arise in organizations
which operate six or more days a week, with one or more shifts per day.
There are three basic models called cyclical or days off scheduling, shift
scheduling and tour scheduling to structure a variety of personnel schedul-
ing problems. Many criteria such as the total number of employees, total
number of hours of labor, total cost of labor, unscheduled labor costs, over-
staffing, understaffing, number of schedules with consecutive days off, num-
ber of different work schedules utilized may be used in conjunction with the
three basic models to suit a particular application.

5.1 Days Off or Cyclical Scheduling


Consider an organization which operates seven days a week with one shift
per day. The number of employees required on each day of the week may
vary but is stable from week to week. There is only one employee category
and the number of employees required on any given day is determined (or
estimated) on the basis of the required level of service. Every employee must
be given two consecutive days off in a week. The problem of interest is to
determine the minimum number of employees required to meet the daily
demand for their services. Suppose the number of employees needed on the
ith day of the week is bi, i = 1,2, ... , 7. Clearly there are 7 possible schedules
which satisfy the two consecutive days off requirement. Let Xi represent the
number of employees who start work on ith day and continue work for a
total of 5 consecutive days. For example an employee who begins work on
the 7th day will also work on the first four days of the week and is off on the
5th and 6th days of the week. This problem can be formulated as a GSC
and is given below.
min Σ_{j=1}^{7} x_j
s.t. x_1 + x_4 + x_5 + x_6 + x_7 ≥ b_1
x_1 + x_2 + x_5 + x_6 + x_7 ≥ b_2
x_1 + x_2 + x_3 + x_6 + x_7 ≥ b_3
x_1 + x_2 + x_3 + x_4 + x_7 ≥ b_4
x_1 + x_2 + x_3 + x_4 + x_5 ≥ b_5
x_2 + x_3 + x_4 + x_5 + x_6 ≥ b_6
x_3 + x_4 + x_5 + x_6 + x_7 ≥ b_7
and x_i is a nonnegative integer for i = 1, 2, ..., 7   (22)
Since the unused number of man-days is given by

5 Σ_{i=1}^{7} x_i − Σ_{i=1}^{7} b_i,

minimizing the total number of employees will also minimize the overstaffing. If the la-
bor costs vary depending upon the days off, it is possible to obtain a minimum
cost solution by changing the objective function to

Σ_{i=1}^{7} c_i x_i

where c_i is the cost of the employee who starts work on the ith day of
the week. If nonconsecutive days off are permitted, there are a total of 21
possible schedules. The corresponding model can be formulated with 21
variables.
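For a feel of model (22), the sketch below solves a tiny instance by bounded enumeration; the demand vector b is an illustrative assumption, and an integer programming solver would be used in practice.

from itertools import product

b = (3, 2, 2, 3, 3, 1, 2)        # assumed daily requirements b_1, ..., b_7
# An employee starting on day s works days s, s+1, ..., s+4 (mod 7).
works = [[(d - s) % 7 < 5 for d in range(7)] for s in range(7)]

best = None
for x in product(range(max(b) + 1), repeat=7):   # a box large enough here
    if all(sum(x[s] for s in range(7) if works[s][d]) >= b[d] for d in range(7)):
        if best is None or sum(x) < sum(best):
            best = x
print(sum(best), best)           # 4 employees meet these demands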
If the optimal solution of the model (22) is implemented, it is possible
that some employees may never get a weekend off. Such solutions can be
avoided by extending the planning horizon to several weeks and incorpo-
rating only those schedules which satisfy the required minimum number of
weekends off. For example if the planning horizon consists of four weeks
and the schedules are restricted to consecutive days off in each week, there
are a total of (7)4 = 2401 possible schedules. Many of these schedules may
not include even one weekend off. In addition, some schedules may require
a long work stretch. For example, a work schedule with days 1 and 2 off in
the first week, days 6 and 7 off in the next three weeks requires an individual
to work 10 consecutive days without a break. This schedule also provides 4
consecutive days off if repeated once in four weeks. Such undesirable sched-
ules may be eliminated in formulating the problem. For a given planning
horizon consisting of m days, suppose the total number of schedules which
meet the requirements is n. Let a_ij = 1 if the day i in the planning horizon
is a work day in the schedule j and a_ij = 0 otherwise. Also let x_j be the
number of employees with work schedule j. A GSC formulation of this model
is given by

min Σ_{j=1}^{n} x_j
s.t. Σ_{j=1}^{n} a_{ij} x_j ≥ b_i,  i = 1, 2, ..., m
and x_j is a nonnegative integer, j = 1, 2, ..., n   (23)

where bi is the required number of employees on ith day of the planning


horizon. When there are several categories of tasks to be performed and
each employee can perform only one task, the scheduling problem can be
solved by treating each category of tasks separately. The interesting case is
when employees can perform multiple tasks. Models (22) and (23) can easily
be extended to incorporate the ability of the employees to perform multiple
tasks by defining a_ijk = 1 if day i of the planning horizon is a work day
in schedule j and the employee is required to perform task k. Obviously,
the number of schedules, and consequently the number of variables in model
(23), increases substantially with the length of the planning horizon and the
number of tasks.
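The screening of multi-week schedules described above is easy to mechanize; the following sketch enumerates the 7**4 = 2401 four-week consecutive-days-off schedules and keeps those with at least one full weekend off and a work stretch of at most 7 days, both limits being assumed for illustration.

from itertools import product

def week(off):                        # off on days off and (off+1) mod 7
    days_off = {off, (off + 1) % 7}
    return [0 if d in days_off else 1 for d in range(7)]

def max_stretch(v):                   # longest run of consecutive work days
    best = run = 0
    for w in v:
        run = run + 1 if w else 0
        best = max(best, run)
    return best

kept = 0
for combo in product(range(7), repeat=4):         # one off-pair per week
    vec = [d for w in combo for d in week(w)]
    weekend_off = 5 in combo                      # days 6 and 7 both off
    if weekend_off and max_stretch(vec) <= 7:     # assumed stretch limit
        kept += 1
print(kept, "of 2401 schedules survive the screening")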

5.2 Shift Scheduling

Many service facilities and manufacturing companies operate more than one
shift a day. For example hospitals operate twenty four hours a day and
seven days a week. Shift scheduling deals with problems related to start
time, work span, lunch breaks and rest periods in assigning shifts to em-
ployees. The work day is divided into several periods of equal duration such
as an hour. Based on shift length, constraints on work span (number of
periods of continuous work), lunch breaks, rest and start time, several fea-
sible schedules can be generated. For example, if the work day consists of
14 hours from 8 am through 10 pm and is divided into 28 half hour periods,
each work schedule can be represented by a sequence of ones and zeroes
with one corresponding to work period and zero corresponding to nonwork
period. The sequence (0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,0,0, 0,0)
corresponds to start work at 12 pm, take one hour break at 4 pm after
working for 4 hours, resume work again at 5 pm till 8 pm and quit for the
day. Assuming that all employees must start at the beginning of a period
and work eight hours at a stretch (including breaks), the number of possible
schedules or shifts is 11. However, if there is flexibility in the work stretch
many more schedules are possible. If part time employees are permitted,
additional schedules can be added to the list of feasible schedules. An im-
portant factor in determining the length of the period is the availability of
reliable estimates of the personnel requirement during each period of the
work day. Suppose bi represents the number of personnel needed during the
ith period and aij = 1, if ith period is a work period in the jth schedule. A
GSC formulation of the shift scheduling problem is given by

min Σ_{j=1}^{n} c_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j ≥ b_i,  i = 1, 2, ..., m
and x_j is a nonnegative integer, j = 1, 2, ..., n   (24)

where c_j is the cost of schedule j, n is the total number of feasible schedules
and m is the number of periods in a work day.
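The 0-1 period encoding behind the a_ij coefficients can be manipulated directly in code; this small sketch decodes a 28-period vector back into clock-time work intervals and reproduces the sample schedule quoted earlier.

def decode(vec, day_start=8.0, period=0.5):
    """Turn a 0/1 period vector into (start_hour, end_hour) work intervals."""
    intervals, start = [], None
    for i, w in enumerate(list(vec) + [0]):      # sentinel closes a final run
        t = day_start + i * period
        if w and start is None:
            start = t
        elif not w and start is not None:
            intervals.append((start, t))
            start = None
    return intervals

vec = [0]*8 + [1]*8 + [0]*2 + [1]*6 + [0]*4
print(decode(vec))    # [(12.0, 16.0), (17.0, 20.0)]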
While this model provides the number of personnel required to work on
each shift during each work day to minimize the total costs, it does not give
individual employee schedules for a week including the days off. A model
which integrates the shift scheduling and days off scheduling models is more
useful.

5.3 Tour Scheduling


The combined model of days-off and shift scheduling is known as tour
scheduling. Suppose a planning horizon such as a week or a month is cho-
sen. Each working day of the planning horizon is divided into several periods
of equal length. Several feasible tours are selected taking into account the
days off, work stretch, starting time of a shift, lunch breaks and rest, legal
constraints, part-time employees, and other restrictions. Suppose there are
several tasks to be performed during each period of the planning horizon
and b_ik is the number of personnel required during period i for task k. Also
let a_ijk = 1 if task k is performed during period i of tour j and a_ijk = 0
otherwise. Further, suppose m is the total number of periods in the plan-
ning horizon, n is the total number of feasible tours and r is the number of
tasks. The GSC model of the tour scheduling problem is given by

min Σ_{j=1}^{n} c_j x_j

s.t. Σ_{j=1}^{n} a_{ijk} x_j ≥ b_{ik},  i = 1, 2, ..., m; k = 1, 2, ..., r
and x_j is a nonnegative integer, j = 1, 2, ..., n   (25)

where c_j is the cost of tour j. The number of variables (feasible schedules)
increases significantly with the length of the planning horizon.
All the models described so far attempt to determine the number of
schedules of each type to minimize some chosen criterion. Individual employ-
ees or their preferences are not incorporated. Suppose x_j^* is the optimal
solution to problem (25), w_ij is the preference index of employee i for
tour j and the total number of employees is p. Also let y_ij = 1 if employee i
is assigned to tour j and y_ij = 0 otherwise. The GSPT model (also called the
generalized assignment or transportation problem) to determine the optimal
assignment of tours to employees is given by

max Σ_{i=1}^{p} Σ_{j=1}^{n} w_{ij} y_{ij}

s.t. Σ_{j=1}^{n} y_{ij} = 1,  i = 1, 2, ..., p
Σ_{i=1}^{p} y_{ij} = x_j^*,  j = 1, 2, ..., n
and y_ij = 0 or 1,  i = 1, 2, ..., p; j = 1, 2, ..., n   (26)

It is possible to explicitly incorporate each employee in model (25). Let
x_tj = 1 if employee t is assigned to tour j and x_tj = 0 otherwise, and let c_tj be
the corresponding cost. A mixed GSC and SPT formulation is given by

min Σ_{t=1}^{p} Σ_{j=1}^{n} c_{tj} x_{tj}

s.t. Σ_{j=1}^{n} x_{tj} = 1,  t = 1, 2, ..., p
Σ_{t=1}^{p} Σ_{j=1}^{n} a_{ijk} x_{tj} ≥ b_{ik},  i = 1, 2, ..., m; k = 1, 2, ..., r
and x_tj = 0 or 1,  t = 1, 2, ..., p; j = 1, 2, ..., n   (27)

Clearly the size of problem (27) is p times the size of problem (25) with
respect to the number of variables.
Incorporating additional constraints, such as a limit on the ratio of part
time employees to the total number of employees, productivity of the em-
ployees, requires general integer programming formulation. The following
numerical example is used to illustrate tour generation.
Example 10. Consider a facility which operates Monday through Sat-
urday of every week with two shifts per day. Every employee is off on
Sunday and must work one shift per day for five days a week. In addition,
at least four shifts during a week must be either the first or the second shift.
Suppose an employee is off on Saturday. There are two tours which corre-
spond to either all first shifts or all second shifts. There are five tours
with four first shifts and one second shift. Similarly, there are five tours
with four second shifts and one first shift; therefore, the number of possible
tours with Saturday off is 12. Since any one of the six days can be selected
as the day off, the total number of tours is 72.
time period, each week consists of twelve time periods. A twelve dimension
vector may be used to represent each tour. For example, the vector T =
(0,1,0,1,0,1,0,1,1,0,0,0) represents a tour working on the first shift Monday
through Thursday, second shift on Friday and is off on Saturday.
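The tour count in Example 10 can be confirmed by enumeration; the sketch below uses one possible ordering of the twelve periods (day 1 first shift, day 1 second shift, ..., day 6 second shift), which need not be the ordering used in the example's vector T.

from itertools import product

tours = []
for off in range(6):                              # day off among Mon..Sat
    for shifts in product((1, 2), repeat=5):      # shift worked on each on-day
        if max(shifts.count(1), shifts.count(2)) >= 4:   # at least 4 alike
            week = dict(zip((d for d in range(6) if d != off), shifts))
            vec = []
            for d in range(6):
                s = week.get(d)                   # None on the day off
                vec += [int(s == 1), int(s == 2)]
            tours.append(tuple(vec))
print(len(tours))                                 # 72 feasible tours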

6 Crew Scheduling
Crew scheduling is an important problem in many transportation systems
such as airlines, cargo and package carriers, mass transit systems, buslines
and trains since a significant portion of the cost of operations is due to the
payments to the crews which include salaries, benefits and expenses. The
primary objective in crew scheduling is to sequence the movements of the
crew in time and locations so as to staff the desired vehicle movements at a

minimum cost. As in the employee scheduling problem generating a set of


feasible schedules is necessary to formulate the corresponding SC and SPT
(or their variants) for the crew scheduling problem. However, generating a
set of feasible schedules for the crew problem is much more complex, since
the crew has to be paired with sequences of trip or flight legs
over time and space, in addition to incorporating overnight stays
away from the home base, return to the home base at least once in a specified
number of days, constraints on rest periods, restrictions on work stretch and
many other legal and safety rules and regulations.
to scheduling airline crews is used to describe the crew scheduling problem
which can be easily interpreted in the context of train, busline, ship, mass
transit and other crew scheduling problems.
A flight leg or flight segment is an airborne trip between an origin and
destination city pair. Each flight segment has a specified departure time
from the city of origin and a scheduled arrival time at the city of the des-
tination. The duration of the flight segment is the difference between the
arrival time and the departure time. Each flight segment must be assigned a
crew consisting of a specified number of pilots and flight attendants. Crews
reside in various cities called bases. The number and location of the bases
depend upon the size of the operation. The union work rules and govern-
ment regulations for assigning crews vary depending upon the crew type
(pilot or flight attendant), crew size, aircraft type and type of operations
(international or domestic). A duty period consists of one or several flight
segments with a limit on both the duration of the flight and the number of
flight segments. A duty period is similar to shift in the employees scheduling
problem. During a duty period a crew might be attending to their duties
or traveling as passengers to reposition themselves for other assignments
which is called deadheading. During deadheading the crew may be assigned
flights operated by another carrier. Two duty periods must be separated by
rest periods which are called overnights. There are minimum and maximum
limits on the duration of the overnights, and the minimum limit depends
upon the duration of the duty period. Time away from base, which is
the elapsed time from the crew's departure until its return to the base, cannot ex-
ceed a specified number of days as mandated by the work rules. A pairing
is a sequence of flight segments which may be grouped into duty and rest
periods, with the first flight segment beginning and the last flight segment ending
at the crew base.
Calculating the cost of pairing may vary from one organization to an-
other. One may use the salary paid to the crew, plus hotel, per-diem, ground

transportation and deadheading fare paid during the rest periods. One may
also use the opportunity cost which is obtained by calculating the difference
between salary of the crew and the actual salary earned during the flying
time plus expenses incurred during rest periods. An adjustment has to be
made to account for carryover flights not covered in the current planning
horizon. Clearly the planning horizon has a significant impact on the num-
ber of pairings, since the number of flight segments included in calculating
the pairings increases with the length of the planning horizon.
Determining all pairings is fairly time consuming. All legs are linked
together to form resolved legs which must be flown as a unit without chang-
ing the crew. Resolved legs are linked together to form trips which can be
completed in one duty period. In the third level the trips are linked into
pairings. After determining the optimal pairings, they are grouped into bid-
line to form monthly schedules for the crew. Suppose the total number of
pairings is n and c_j is the cost of the jth pairing. Also suppose there are m flight
segments during the planning horizon. Let a_ij = 1 if the jth pairing includes
the flight segment i and a_ij = 0 otherwise. Let x_j = 1 if the jth pairing is
selected and x_j = 0 otherwise. Then a SC model formulation of the crew
scheduling problem is given by

min Σ_{j=1}^{n} c_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j ≥ 1,  i = 1, 2, ..., m

and x_j = 0 or 1,  j = 1, 2, ..., n   (28)

Inequalities are used in the constraint set to allow deadheading which could
be used to move a given crew from one place to another (from one base
to another) to reduce the costs. If deadheading is not permitted, an SPT
formulation can be obtained by converting the inequalities into equalities. A
major problem in solving the crew pairing problem is the enormous number
of pairings generated in real life applications.
The crew base concept is not meaningful in a mass-transport system. As
in the employee scheduling problem, days off are relevant, and a crew pairing
problem may be viewed as one of constructing all feasible weekly or monthly
schedules depending on the length of the planning horizon.
Example 11. To illustrate the construction of crew pairings consider a
small domestic airline operating between three cities with two morning and
two afternoon flights between each pair of cities during each day. Suppose
the flight time between the pairs 1 and 2 is 5 hours, 1 and 3 is 4 hours and
2 and 3 is 3 hours. The departure and arrival schedules of the daily flights

Table 2:
From Departure To Arrival
City Time City Time
1 08 2 13
1 15 2 20
1 10 3 14
1 16 3 20
2 09 1 14
2 14 1 19
2 07 3 10
2 15 3 18
3 08 1 12
3 16 1 20
3 09 2 12
3 18 2 21

are given in Table 2.


The planning horizon is three days which consists of 36 flight legs. Each
duty period cannot exceed 12 hours. The minimum duration of the overnight
is 8 hours. Time away from the base is limited to three days. Since all
flights reach destinations on the same day and considerable time is available
for rest (overnight) before returning to duty the following day, duty periods
can be constructed treating each day separately. There are eight activities
including four departures and four arrivals at each city. The times of these
activities during each day for city 1, city 2, and city 3 are (08, 10, 12,14, 15,
16, 19, 20), (07, 09, 12, 13, 14, 15, 20, 21) and (08, 09, 10, 14, 16, 18, 20)
respectively. To generate all possible duty periods, first construct a network
consisting of 23 nodes representing the departure and arrival times of each
flight and connect the departure node and the corresponding arrival node
by an arc. Also, join two consecutive nodes corresponding to each city by
an arc as shown in the network below where the first number associated
with each node represents the city and the second number the departure or
arrival time. Set the length of each arc equal to the difference between the
time of the ending node and the beginning node. Clearly the length of each
arc represents either the duration of a flight or wait time at an airport to
catch another flight. Starting with any departure node enumerate all paths





of length 12 hours or less. Table 3 is a list of 20 duty periods.
Figure 7: Graph of a Flight Schedule

Table 3:

Duty Period   List of Nodes                    Duration
Number        in the Path                      of the Path
1             (1,08), (2,13)                   5
2             (1,08), (2,13), (2,14), (1,19)   11
3             (1,10), (3,14)                   4
4             (1,10), (3,14), (3,16), (1,20)   10
5             (1,10), (3,14), (3,18), (2,21)   11
6             (1,15), (2,20)                   5
7             (1,16), (3,20)                   4
8             (2,07), (3,10)                   3
9             (2,09), (1,14)                   5
10            (2,09), (1,14), (1,15), (2,20)   11
11            (2,09), (1,14), (1,15), (3,20)   11
12            (2,14), (1,19)                   5
13            (2,15), (3,18)                   3
14            (3,08), (1,12)                   4
15            (3,08), (1,12), (1,15), (2,20)   12
16            (3,09), (2,12)                   3
17            (3,09), (2,12), (2,14), (1,19)   10
18            (3,09), (2,12), (2,15), (3,18)   9
19            (3,16), (1,20)                   4
20            (3,18), (2,21)                   3


Combining duty periods into pairings is a combinatorial problem. To
generate all possible pairings, construct a network with 66 nodes represent-
ing the 20 duty periods for three days, 3 source nodes s_1, s_2 and s_3, and 3 sink
nodes t_1, t_2 and t_3. A node on day 1 is connected by an arc to a node on
day 2 provided the city of destination on day 1 and the city of origin on
day 2 are the same. Similarly the nodes on day 2 and day 3 are connected.
Finally, connect source node s_i with an arc to all nodes on day 1 with city
of origin i, and connect all nodes on day 3 with city of destination i to t_i.
All paths from s_i to t_i for i = 1, 2, 3 yield the required pairings. Duty period
5 on day 1, duty period 11 on day 2, and duty period 14 on day 3 is an
example of a pairing.
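The mechanical part of duty-period generation can be sketched directly from Table 2: chain flights that connect in place and time within a day and keep chains whose elapsed time is at most 12 hours. The enumeration below is only this raw step; the 20 duty periods of Table 3 reflect additional selection rules not fully spelled out above.

flights = [  # (origin, departure, destination, arrival) from Table 2
    (1, 8, 2, 13), (1, 15, 2, 20), (1, 10, 3, 14), (1, 16, 3, 20),
    (2, 9, 1, 14), (2, 14, 1, 19), (2, 7, 3, 10), (2, 15, 3, 18),
    (3, 8, 1, 12), (3, 16, 1, 20), (3, 9, 2, 12), (3, 18, 2, 21),
]

def extend(chain):
    """Yield the chain and every legal continuation of it."""
    yield chain
    last = chain[-1]
    for f in flights:
        connects = f[0] == last[2] and f[1] >= last[3]
        if connects and f[3] - chain[0][1] <= 12:   # 12-hour duty limit
            yield from extend(chain + [f])

duties = [c for f in flights for c in extend([f])]
for d in duties:
    print([(f[0], f[1]) for f in d], "duration", d[-1][3] - d[0][1])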

7 Manufacturing
A variety of problems related to manufacturing activity such as assembly
line balancing, discrete lot size and scheduling, ingot size selection, spare
parts allocation and cutting stock, which can be formulated as SC, SP and
SPT models or their variants, are presented in this section.

7.1 Assembly Line Balancing Problem


In an assembly line balancing problem there are a set of tasks to be per-
formed in a specified order determined by a set of precedence relations.
Given the time required for processing each activity, the problem is to de-
termine the minimum number of stations, so that the total time required
for processing all activities assigned to a station does not exceed a specified
number called cycle time without violating the precedence relations. Sup-
pose the number of tasks is n with processing times ti, i = 1,2, ... , n. The set
of precedence relations P is specified by ordered pairs (i,j), which imply
that the task i must be completed before the task j. Also suppose c is the
cycle length. Obviously the number of stations required is no more than n.
To implement the precedence relations, if (i,j) ∈ P and tasks i and j are
assigned to stations s(i) and s(j), then s(i) ≤ s(j).
Let Xik = 1, if task i is assigned to station k and Xik = 0 otherwise. Also
let Yk = 1, if station k is open for assigning activities and Yk = 0 otherwise.
An integer programming formulation of the problem is

min Σ_{k=1}^{n} y_k
s.t. Σ_{k=1}^{n} x_{ik} = 1,  i = 1, 2, ..., n
Σ_{i=1}^{n} t_i x_{ik} ≤ c y_k,  k = 1, 2, ..., n
Σ_{k=1}^{h} x_{ik} ≥ x_{jh},  (i,j) ∈ P, h = 1, 2, ..., n
y_k = 0 or 1,  k = 1, 2, ..., n
and x_ik = 0 or 1,  i = 1, 2, ..., n; k = 1, 2, ..., n   (29)

The first set of constraints ensures that every task is assigned to exactly one
station. The second set guarantees that tasks are assigned to a station only if it
is open and that the total time of the activities does not exceed the cycle length.
The third set maintains the precedence relations. Defining complementary
variables v_k and z_ik such that

v_k + y_k = 1
and x_ik + z_ik = 1,

the second set of constraints can be converted to binary knapsack con-
straints and the third set of constraints can be transformed to SC con-
straints. Converting the binary knapsack constraints to SP constraints as
noted earlier results in a mixed SP, SC and SPT model.
of this model are available.
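For intuition, the following sketch applies a simple station-opening heuristic to illustrative data shaped like model (29); it is a heuristic giving an upper bound on the number of stations, not the integer program itself.

t = {1: 4, 2: 3, 3: 5, 4: 2, 5: 4}        # assumed task times
P = {(1, 3), (2, 3), (3, 4), (3, 5)}      # assumed precedence pairs
c = 8                                     # assumed cycle time

order, remaining = [], set(t)
while remaining:                          # precedence-feasible order
    ready = [i for i in remaining
             if all(a in order for (a, b) in P if b == i)]
    nxt = min(ready)                      # deterministic tie-break
    order.append(nxt)
    remaining.remove(nxt)

stations, load = [[]], 0
for i in order:
    if load + t[i] > c:                   # current station full: open another
        stations.append([])
        load = 0
    stations[-1].append(i)
    load += t[i]
print(len(stations), stations)            # e.g. 3 stations: [[1, 2], [3, 4], [5]]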

7.2 Discrete Lot Sizing and Scheduling Problem


Consider a production scheduling problem where several items are manufac-
tured on a single machine over a finite planning horizon consisting of several
time periods. During each time period either the machine is idle or the en-
tire duration is devoted to the production of a single item. Each item may
require a set up time of zero, one or several time periods before production
can start if this item is not produced in the previous period. The setup cost
and time are item dependent but independent of the item produced in the
previous period. Given that an entire duration of a period is devoted to a
single item, the demand for each item can be measured in terms of the num-
ber of time periods of production needed to satisfy the demand. Inventory
cost is incurred when excess production is carried from one time period to
the next period. Shortages are not permitted and the setup cost, inventory
cost and production cost may vary from period to period. The problem is to
determine a production schedule for each item which satisfies the demands
at a minimum cost.
Without loss of generality the demand for any item can be assumed to
be 0 or 1 during any time period. Clearly, the demand must be an integer
since it is measured in terms of the number of time periods of production
needed to meet the demand. Suppose the demand is 3 units in time period
6. Since only 1 unit of demand can be met from the production of time
period 6, the demand for the remaining two units must be met from
units produced in earlier periods. Make the demand equal to one in
period 6 and increase the demands in periods 4 and 5 by 1. Examine the
total demand in period 5; if it is more than 1, continue the procedure.
Finally, if in period 1 the demand is more than 1 and cannot be met by the
initial inventory, the problem is infeasible.
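The normalization just described is a backward sweep over the demand vector; a minimal sketch, assuming zero initial inventory:

def normalize(demand):
    """Shift per-period demand in excess of one unit into earlier periods."""
    d = list(demand)
    for t in range(len(d) - 1, 0, -1):
        if d[t] > 1:
            d[t - 1] += d[t] - 1     # earlier production must cover the excess
            d[t] = 1
    if d[0] > 1:
        raise ValueError("infeasible: excess demand reaches period 1")
    return d

print(normalize([0, 0, 0, 0, 0, 3]))     # [0, 0, 0, 1, 1, 1], as in the text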

Suppose the planning horizon consists of m time periods and the number
of items is n. Let k_i be the number of feasible schedules for product i,
x_ij = 1 if the jth schedule is selected for product i and x_ij = 0 otherwise. Also
let c_ij be the cost of the jth schedule of product i, and a_ijt = 1 if period t
is used either to set up or to produce product i in the jth schedule and
a_ijt = 0 otherwise. A mixed SP and SPT formulation of this problem is
given by

min Σ_{i=1}^{n} Σ_{j=1}^{k_i} c_{ij} x_{ij}

s.t. Σ_{i=1}^{n} Σ_{j=1}^{k_i} a_{ijt} x_{ij} ≤ 1,  t = 1, 2, ..., m
Σ_{j=1}^{k_i} x_{ij} = 1,  i = 1, 2, ..., n
and x_ij = 0 or 1,  i = 1, 2, ..., n; j = 1, 2, ..., k_i   (30)

The first set of constraints ensures that no more than one product is made
during any time period and the second set of constraints guarantees that
exactly one schedule is selected for each product i. The number of variables
required to formulate this model grows exponentially with the number of
time periods.

7.3 Ingot Size Selection


In the steel industry, ingot is an intermediate product which is a mass of
metal shaped in a bar or a block. Ingot size or dimensions is an important
factor in producing a finished product to customer specifications. In order to
make the finished product an ingot must be processed to scale the dimensions
resulting in scrap metal. Clearly, producing an optimal ingot size within
the technological constraints to manufacture a specific finished product, will
reduce the scrap and waste. However, when each finished product requires
a different ingot size, producing too many sizes of ingots may result in
significant increase in inventory and material handling costs and in logistical
problems. One way to deal with the problem is to produce the minimum
number of ingots sizes necessary to produce all the finished products. If
every finished product can be made from all ingot sizes the problem is trivial;
however, this is not usually the case. Suppose the total number of ingot
sizes is n and the number of finished products to be made is m. Let a_ij = 1
if the finished product i can be produced from an ingot of size j and a_ij = 0
otherwise. Also, let y_j = 1 if ingot size j is used to manufacture some
finished product and y_j = 0 otherwise. Finally, let x_ij = 1 if ingot size j
is used to manufacture the finished product i and x_ij = 0 otherwise. The
following formulation provides the minimum number of ingot sizes needed
to make all finished products.

min Σ_{j=1}^{n} y_j
s.t. Σ_{j=1}^{n} a_{ij} x_{ij} = 1,  i = 1, 2, ..., m
x_ij ≤ y_j,  i = 1, 2, ..., m; j = 1, 2, ..., n
y_j = 0 or 1,  j = 1, 2, ..., n
and x_ij = 0 or 1,  i = 1, 2, ..., m; j = 1, 2, ..., n   (31)

Substituting z_j = 1 − y_j yields the mixed SP and SPT model. The objective
function can be replaced by

Σ_{i=1}^{m} Σ_{j=1}^{n} c_{ij} x_{ij}

to minimize the total scrap, where c_ij is the scrap generated when finished
product i is made from an ingot of size j. When a_ij = 0, the corresponding
c_ij value can be considered to be infinite (a large number).
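A quick feasible solution to this kind of minimum-size covering model can be obtained with the greedy heuristic sketched below on an assumed compatibility matrix; unlike formulation (31), it carries no optimality guarantee.

a = {   # a[j] = set of finished products producible from ingot size j (assumed)
    1: {1, 2, 3}, 2: {3, 4}, 3: {2, 4, 5}, 4: {5, 6}, 5: {1, 6},
}
uncovered = set().union(*a.values())
chosen = []
while uncovered:
    j = max(a, key=lambda s: len(a[s] & uncovered))   # most new products covered
    chosen.append(j)
    uncovered -= a[j]
print(chosen)    # e.g. [1, 3, 4]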
A similar formulation may be used to minimize the number of various
metallurgical grades of ingots needed to make a variety of steel plates used in
production of railroad cars, ships, pipes and boilers. When customer orders
are received for steel plates, they are categorized by sales grade based on the
required chemical and mechanical properties. The ingots produced in a basic
oxygen furnace are also assigned metallurgical grade called met grade based
on the chemical or metallurgical composition. Assigning one met grade to
each sales grade and producing all customer orders within a sales grade
category using only one met grade may require a substantial number of met
grades. Producing a variety of met grades can reduce productivity due to
change over time required in switching from one met grade to another, in
addition to maintaining a large inventory of different metallurgical grades.
When more than one met grade can be used to satisfy a customer order,
minimizing the number of met grades required to satisfy all customer orders
is useful in increasing the productivity and reducing the inventory. If the
size is replaced by grade, the formulation of this problem is identical to the
above problem.

7.4 Spare Parts Allocation


Consider a repair shop where several types of engines are repaired. Each engine
may require one or several types of modules to repair. Given the available
number of spare modules of each type, the problem is to determine the
optimal allocation of modules in order to maximize the number of engines
repaired. Suppose m is the number of various types of modules, bi is the
number of spares of module type i, n is the number of engines requiring
repair and a_ij = 1 if engine j requires module i and a_ij = 0 otherwise. Let
x_j = 1 if engine j is repaired and x_j = 0 otherwise. A GSP formulation of
this model is

max Σ_{j=1}^{n} x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j ≤ b_i,  i = 1, 2, ..., m

and x_j = 0 or 1,  j = 1, 2, ..., n   (32)
When more than one module of the same type is required for repair, the
corresponding zero-one MD knapsack formulation can be converted to a
GSP problem.
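A brute-force check of model (32) on assumed data (three module types, five engines) is immediate:

from itertools import product

b = (2, 1, 2)                       # spares of module types 1..3
need = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]  # a_ij by engine

best = max(
    (x for x in product((0, 1), repeat=5)
     if all(sum(x[j] * need[j][i] for j in range(5)) <= b[i] for i in range(3))),
    key=sum,
)
print(sum(best), best)              # 3 of the 5 engines can be repaired here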

7.5 Pattern Sequencing in Cutting Stock Operations


Certain types of products such as rolls of paper are produced in a variety of
large dimensions due to the economies of scale and technological restrictions.
These rolls must be cut to specifications using various patterns to meet
customer demands. With respect to a given pattern, there is a certain
amount of trim loss due to lack of demand for the left over roll because of
its small width. Given the demands for various width of rolls, the cutting
stock problem is to determine the number of rolls to be cut using a specific
pattern to minimize the total trim loss. This problem can be formulated as
a linear integer program. Having determined the optimal patterns and the
number of rolls to be cut using a specific pattern, determining the sequence
in which to cut the different patterns is an important problem.
Suppose there are p optimal patterns to meet the demands for m different
widths. Let d_ki equal the number of rolls of width k to be cut in
pattern i and P_i = (d_1i, d_2i, ..., d_mi). In order to cut a pattern, slitting
knives are to be set at appropriate locations. The number of slitter knife
settings required to cut pattern i is equal to b_i = Σ_k d_ki, the number of
rolls to be cut. When one pattern is followed by another, the settings of
some slitter knives have to be changed to match the required pattern. The

objective in sequencing the patterns is to minimize the total number of slitter


knife settings required for all patterns combined. The slitter knives can be
arranged beginning from the left of the roll (called a single-ended slitting plan),
where the trim loss occurs at the right end of the roll. One may also use
a double-ended plan in which widths may be matched from either or both
ends of the roll.
Obviously the maximum number of settings required is Σ b_i, which cor-
responds to slitting each pattern separately. Suppose a single-ended slitting
plan is used and consider a subset s of patterns, s = (i_1, i_2, ..., i_t). Let
h_k(s) represent the number of rolls of width k common to all patterns in the
subset s, which is given by

h_k(s) = min_{i ∈ s} d_ki.

Also let c(s) represent the total number of widths of all sizes common to all
patterns in the subset. Clearly

c(s) = Σ_{k=1}^{m} h_k(s)

and only subsets for which c(s) > 0 contribute to the reduction of slitter
knife settings. From these subsets an optimal sequence can be obtained by
selecting a combination of subsets in such a way that each pattern appears
in exactly one subset. Suppose the number of subsets for which c(s) > 0
is n and f_j is the optimal number of slitter knife settings required for the
subset j. Also let a_ij = 1 if the subset j contains the pattern i and a_ij = 0
otherwise. For feasibility each individual pattern is included in the list of
the sets. A SPT formulation of the pattern sequencing problem is

min Σ_{j=1}^{n} f_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j = 1,  i = 1, 2, ..., p
and x_j = 0 or 1,  j = 1, 2, ..., n   (33)

where x_j = 1 if the subset j is selected and x_j = 0 otherwise.


Example 12. Suppose rolls of 215" width are cut to satisfy the demand
for 8 rolls of 64" width, 4 rolls of 60" width, 3 rolls of 48" width, 3 rolls of
45" width, 7 rolls of 33" width, 6 rolls of 32" width and 1 roll of 16" width.
Cutting one roll for each one of the 7 patterns listed in Table 4 will satisfy
the demand.

Table 4: PATTERNS

Width   No. of Rolls   1  2  3  4  5  6  7
64"     8              2  0  0  1  2  0  3
60"     4              0  3  0  1  0  0  0
48"     3              1  0  1  0  0  1  0
45"     3              0  0  0  2  1  0  0
33"     7              1  0  5  0  1  0  0
32"     6              0  1  0  0  0  5  0
16"     1              0  0  0  0  0  0  1

Consider the subset s = (1,3) of patterns. The individual numbers of slitter
knives required for patterns 1 and 3 are 4 and 6 respectively. Since the set s
has two settings in common, the number of settings required for the subset
is 8. The subsets, the corresponding optimal arrangements and the number
of settings required are listed in Table 5.
Table 5:

Subset    Patterns in    Optimal        Number of
Number    the Subset     Arrangement    Settings (f_j)
1         (1,3)          (1,3)          8
2         (1,4)          (1,4)          7
3         (1,5)          (1,5)          5
4         (1,6)          (1,6)          9
5         (1,7)          (1,7)          6
6         (2,4)          (2,4)          7
7         (2,6)          (2,6)          9
8         (3,5)          (3,5)          9
9         (3,6)          (3,6)          11
10        (4,5)          (4,5)          6
11        (4,7)          (4,7)          7
12        (5,7)          (5,7)          6
13        (1,3,5)        (1,5,3)        10
14        (1,3,6)        (1,3,6)        13
15        (1,4,5)        (1,5,4)        8
16        (1,4,7)        (1,7,4)        4
17        (1,5,7)        (1,5,7)        7
18        (4,5,7)        (4,5,7)        9
19        (1,4,5,7)      (1,5,7,4)      10
20        (1)            (1)            4
21        (2)            (2)            4
22        (3)            (3)            6
23        (4)            (4)            4
24        (5)            (5)            3
25        (6)            (6)            6
26        (7)            (7)            4
The SPT model corresponding to this example requires 26 variables and 7
constraints.
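Under the min-over-patterns reading of h_k(s) given above, the claim about the subset s = (1,3) can be verified directly from Table 4:

patterns = {  # rolls per width (64", 60", 48", 45", 33", 32", 16"), from Table 4
    1: (2, 0, 1, 0, 1, 0, 0),
    3: (0, 0, 1, 0, 5, 0, 0),
}

def c(s):
    """c(s) = sum over widths of h_k(s) = min over patterns in s of d_ki."""
    return sum(min(patterns[i][k] for i in s) for k in range(7))

b = {i: sum(v) for i, v in patterns.items()}     # b_1 = 4, b_3 = 6
print(c((1, 3)))                                 # 2 settings in common
print(b[1] + b[3] - c((1, 3)))                   # 8 settings for the pair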

7.6 Constrained Network Scheduling Problem


Consider a scheduling problem involving m jobs and job i consisting of
several tasks. Further suppose that p_ij, a nonnegative integer, is the time
required for processing task j of job i and no preemption of a task is allowed.
A set of K resources (or machines) are available to process the tasks and
a specified amount of resource of each type is required to process each task
of each job. For each job i, there are a set of precedence relations which
require that a certain task be completed before processing can begin on
another task. The amount of resource k available in period t is R_kt during
a planning horizon consisting of T time periods. The cost of completing
the job i in f units of time is g_i(f). The problem is to determine the time
at which the processing of each task should begin in order to minimize the
total cost of completing all jobs in T units of time or less without violating
the precedence relations and the resource constraints.
Suppose the number of schedules to complete job i in T units of time
or less without violating the precedence relations is n_i, f_ih is the completion
time of job i under schedule h and c_ih = g_i(f_ih). Further suppose that a_ihkt


is the amount of resource k required for all tasks of job i in process at time
t in schedule h. Let x_ih = 1 if schedule h is selected for job i and x_ih = 0
otherwise. An integer programming formulation of the resource constrained
network scheduling problem is given by

min Σ_{i=1}^{m} Σ_{h=1}^{n_i} c_{ih} x_{ih}

s.t. Σ_{h=1}^{n_i} x_{ih} = 1,  i = 1, 2, ..., m
Σ_{i=1}^{m} Σ_{h=1}^{n_i} a_{ihkt} x_{ih} ≤ R_{kt},  k = 1, 2, ..., K; t = 1, 2, ..., T
and x_ih = 0 or 1,  i = 1, 2, ..., m; h = 1, 2, ..., n_i   (34)
When each task requires the use of a single machine and only one machine
of each type is available during each period of the planning horizon, clearly
aihkt = 0 or 1 and Rkt = 1 for all k and t. For this special case (a version
of the job-shop problem), the formulation corresponds to a mixed SP and
SPT model.

7.7 Cellular Manufacturing


Consider a manufacturing operation where several parts are produced and
each part requires processing on a specified set of machines. The machines
are grouped into cells and several cells may contain the same machine type.
The problem of interest is to determine the minimum number of cells that
each part must visit to complete processing on all the required machines.
Since there is no interaction between parts, each part can be treated inde-
pendently. Let M(i) be the set of machines required for processing part i
and a_jk = 1 if cell k contains the machine j and a_jk = 0 otherwise. Also
let y_ik = 1 if part i visits cell k and y_ik = 0 otherwise. A SC model to
determine the minimum number of cells for part i is

min Σ_{k=1}^{n} y_{ik}

s.t. Σ_{k=1}^{n} a_{jk} y_{ik} ≥ 1,  j ∈ M(i)
and y_ik = 0 or 1,  k = 1, 2, ..., n   (35)
where n is the total number of cells.
More realistic problems, with machine capacity limitations to process the
parts, can be formulated as linear integer programs, including the optimal
allocation of machines to cells to minimize the total number of cells visited
by all parts combined. The above formulation may be used as a subproblem
in developing efficient techniques for the optimal allocation of machines to
cells.

8 Miscellaneous Operations
As the title of this section suggests, a variety of unique and unrelated prob-
lems are discussed in this section.

8.1 Frequency Planning


Transmit and receive sites, links to connect the sites and frequency bands
available for transmission are important components of any satellite com-
munication system. Each ground terminal in a communication system can
transmit and receive communications. Because of restrictions on the avail-
ability of channels in the region where the transmitter and receiver are lo-
cated, constraints due to interference and technological limitations of the
satellite, the number of channels available to a system is limited. A fre-
quency plan is an assignment of a separate frequency interval within the
available channels to each link in a communication system.
Suppose r is the number of ground terminals, each of which can be both a trans-
mitter and a receiver. Each link j is an ordered pair of stations. Obviously
the maximum number of links required is n = r(r − 1). If x_j is the frequency
assigned to the transmitter of link j, then the corresponding frequency of the
receiver must be x_j + s, where s is a specified number. Due to the highly nonlinear
form of the link interference function, the available range of frequencies for the
entire satellite system is divided into m intervals of equal but small band-
widths. In addition, the link interference constraint requires that if a frequency
interval i is assigned to a link j, then none of the intervals i, i+1, ..., i+m_j−1
can be assigned to any other link. This is equivalent to assigning all fre-
quency intervals (i, i+1, ..., i+m_j−1) to link j. Suppose p_ij is a measure
of link interference representing the transmitter and link interference if the
interval i is assigned to link j. Let x_ij = 1 if interval i is assigned to link j
and x_ij = 0 otherwise. A SPT formulation of this model is given by

min Σ_{j=1}^{n} Σ_{i=1}^{m−m_j+1} p_{ij} x_{ij}

s.t. Σ_{i=1}^{m−m_j+1} x_{ij} = 1,  j = 1, 2, ..., n

Σ_{j=1}^{n} Σ_{i=k−m_j+1}^{k} x_{ij} + s_k = 1,  k = 1, 2, ..., m

x_ij = 0 or 1,  i = 1, ..., m − m_j + 1; j = 1, 2, ..., n

and s_k = 0 or 1,  k = 1, 2, ..., m   (36)

where s_k is the slack variable. The first set of constraints ensures exactly one
bandwidth is assigned to link j and the second set of constraints guarantees
that the interference constraints are satisfied. Note that x_ij = 0 if i >
m − m_j + 1 in the above formulation.

8.2 Timetable Scheduling


Scheduling classes and examinations to avoid conflicts, without violating the
resource availabilities such as the number of classrooms and room capacities, and other
constraints such as no consecutive examinations, is a computationally difficult
problem. One simple model for each scheduling problem is discussed below.
Suppose there are m classrooms and n classes to be assigned to class-
rooms each day. Further, suppose each day is divided into t periods and
a_ik = 1 if class i is required to be scheduled during period k and a_ik = 0
otherwise. Let x_ij = 1 if class i is scheduled in room j and x_ij = 0 otherwise.
A mixed SP and SPT model of this problem is given by

min Σ_{i=1}^{n} Σ_{j=1}^{m} c_{ij} x_{ij}

s.t. Σ_{j=1}^{m} x_{ij} = 1,  i = 1, 2, ..., n
Σ_{i=1}^{n} a_{ik} x_{ij} ≤ 1,  j = 1, 2, ..., m; k = 1, 2, ..., t
and x_ij = 0 or 1,  i = 1, 2, ..., n; j = 1, 2, ..., m   (37)

where Cij is the cost of assigning class i to room j. The first set of constraints
ensures that every class is assigned a classroom and the second set guarantees
that no more than one class is scheduled during any period in any classroom.
Scheduling examinations is a difficult problem in large universities be-
cause of the number of students and the number of courses involved. Simple
versions of this problem can be typically formulated in one of the two ways.
Given a set of examinations, determine the minimum number of time
periods necessary to schedule all examinations with no conflict. A conflict
occurs when two examinations are scheduled concurrently and one or more
students must take both examinations. A model of this problem is discussed
in the section graphs and networks (see Chromatic Number).

A second formulation of the problem is more detailed: having determined
the groups of examinations to be scheduled simultaneously, called examina-
tion blocks, assign at most one block to a time slot on a given day. Suppose
there are m examination blocks and t time periods during a given day. If all
examinations are to be completed in D days then Dt ≥ m. Adding dummy
examination blocks if necessary, it can be assumed that Dt = m. Clearly
there are n = C(m, t) possible combinations (m ≥ t) of examination schedules
on any day. Suppose c_j is the total number of students having two or more
examinations in the jth combination. Let x_j = 1 if the jth combination is se-
lected and x_j = 0 otherwise. Also let a_ij = 1 if examination block i is in the jth
combination and a_ij = 0 otherwise. A GSPT formulation of this problem is
given by

min Σ_{j=1}^{n} c_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j = 1,  i = 1, 2, ..., m
Σ_{j=1}^{n} x_j = D
and x_j = 0 or 1,  j = 1, 2, ..., n   (38)

The first set of constraints ensures that every examination block is scheduled
and the second constraint guarantees that exactly D days are used for all exam-
ination blocks.

8.3 Testing and Diagnosis


Testing and diagnosis are very important problems in medicine, repair ser-
vice, software reliability and others. In this section two models are presented
one related blood analysis and the second related to diagnostic inference.
Blood Analysis Several tests have to be performed on a blood speci-
men. To execute the tests, several cuvettes are filled with the blood specimen and
any necessary testing agents are added. Because of the testing equipment
configuration, the tests must be partitioned into r clusters of m tests each,
with the maximum number of tests n = rm. Clearly the total number of
possible clusters is s = C(n, m). A priori it is not possible to determine the com-
bination of tests to be performed for a blood specimen. If the clusters are
grouped improperly, a specimen requiring three tests may require three clus-
ters, which is time consuming compared to performing all three tests
in the same cluster. Given p historical data sets of tests, the problem is to
determine the cluster configuration which minimizes the expected number of
clusters required per specimen.

Suppose the vector z = (z_1, z_2, ..., z_n) denotes the test composition of an
arbitrary blood specimen, where z_k = 1 if test k is required and z_k = 0
otherwise. Further suppose z_ik = 1 if test k is performed in sample i and
z_ik = 0 otherwise. Since at least one test is performed for any specimen, the
minimum number of clusters required for any specimen is 1. Let a_kj = 1 if
test k is in cluster j and a_kj = 0 otherwise. The number of tests performed
for sample i in cluster j is

Σ_{k=1}^{n} z_ik a_kj

and therefore the fraction of samples using cluster j is equal to

c_j = (1/p) Σ_{i=1}^{p} min(1, Σ_{k=1}^{n} z_ik a_kj).

Let x_j = 1 if cluster j is selected and x_j = 0 otherwise. A SPT formulation
of the model is given by

min Σ_{j=1}^{s} c_j x_j
s.t. Σ_{j=1}^{s} a_{kj} x_j = 1,  k = 1, 2, ..., n
and x_j = 0 or 1,  j = 1, 2, ..., s   (39)
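A minimal sketch of computing the usage fractions c_j from historical data, under the indicator reading of c_j above; the sample and cluster data are assumptions for illustration.

samples = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 0, 1, 1)]   # z_ik, p = 3 specimens
clusters = [(1, 1, 0, 0), (0, 0, 1, 1)]                # a_kj, r = 2 clusters

c = [sum(1 for z in samples
         if any(zk and ak for zk, ak in zip(z, col))) / len(samples)
     for col in clusters]
print(c)    # [0.666..., 0.666...]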

Diagnostic Expert System Diagnostic systems deal with identifying
causes or reasons for various symptoms, examination findings or laboratory
test results. Such problems are of interest in determining diseases based on
various tests or symptoms, repairs needed to correct automobile problems,
and others.
Suppose a set of disorders or diseases and the manifestations caused
by each disorder is specified by experts. Given a set of manifestations the
problem is to determine the minimum possible number of disorders causing
the manifestations. Let n be the number of disorders, m be the number of
manifestations and aij = 1, if disorder j causes manifestation i and aij = 0
otherwise. Also let Xj = 1, if disorder j is selected and Xj = 0, otherwise.
An SC formulation of this problem is given by

min Σ_{j=1}^{n} x_j
s.t. Σ_{j=1}^{n} a_{ij} x_j ≥ 1,  i = 1, 2, ..., m
and x_j = 0 or 1,  j = 1, 2, ..., n   (40)

8.4 Political Districting


Dividing a region such as a state into small areas such as a district to elect
political representatives is called political districting. Suppose the region
consists of m population units such as counties (or census tracts) and the
population units must be grouped together to form r districts. Due to court
rulings and regulations, the deviation of the population per district cannot
exceed a certain proportion of the average population. In addition, each
district must be contiguous and compact. A district is contiguous if it is
possible to reach any two places of the district without crossing another
district. Compactness essentially means, the district is somewhat circular
or a square in shape rather than a long and thin strip. Such shapes reduce
the distance of the population units to the center of the district or between
two population centers of a district.
Suppose P_i, i = 1, 2, ..., m is the population of the unit i. The mean
population is

p̄ = (Σ_{i=1}^{m} P_i)/m.

Then for feasibility every district j must satisfy

|p(j) − p̄| ≤ α p̄

where p(j) is the population of district j and 0 < α < 1.
Suppose a_ij = 1 if the unit i is included in district j and a_ij = 0
otherwise. Clearly p(j) is given by

p(j) = Σ_{i=1}^{m} a_ij P_i.

To test for the contiguity of a district, construct an undirected graph whose
nodes are the units of the district and connect two nodes by an edge if they
have a common border. The district is contiguous if there is a path between
any two nodes. For compactness, consider any two population units i and
k of a district. If population units i and k are included in a district j,
a_ij = a_kj = 1 and

d_j = max_{i,k} (d_ik a_ij a_kj)

is the distance between the two units which are farthest apart, where d_ik is the
distance between units i and k. If A_j is the area of the district j, d_j^2 / A_j may
be used as a measure of the compactness. Suppose n represents the number

of all feasible districts. Suppose c_j, a measure of the deviation of the population
of district j, is

c_j = |p(j) − p̄| / (α p̄).

A GSPT formulation of the political districting problem is given by

min Σ_{j=1}^{n} c_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j = 1,  i = 1, 2, ..., m

Σ_{j=1}^{n} x_j = r
and x_j = 0 or 1,  j = 1, 2, ..., n   (41)

where x_j = 1 if the district j is selected and x_j = 0 otherwise. If the
objective function is changed to

min max_{j=1,...,n} c_j x_j

the problem is called a "bottleneck" problem.

8.5 Information Retrieval and Editing


Consider a multiple file data storage system with a distinct file for each
supercategory of information. Each file contains several records with items
corresponding to more detailed categories. Multiple files or super categories
of files are typically overlapping. A record may contain information relevant
to various supercategories. Because of the overlapping nature of informa-
tion stored, a request for certain specified items of information related to a
category can be obtained by interrogating any of the several different files.
Given the time required to search a file (depends on the number of records
stored) and several requests for information related to various categories the
problem is to select the files which provide information in the least amount
of time.
Suppose there are n files and f_j is the length of the file j, j = 1, 2, ..., n.
Also suppose there are m requests and a_ij = 1 if request i can be met from
file j and a_ij = 0 otherwise. A SC formulation of the model is

min Σ_{j=1}^{n} f_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j ≥ 1,  i = 1, 2, ..., m

and x_j = 0 or 1,  j = 1, 2, ..., n   (42)
The constraints ensure that at least one file is selected for each request.

Table 6:

Edit Fields
1 2 3 4 5 6
1 - (0,1) (0) - (0,1) -
2 (1) - (1) (0,1) - (2,3)
3 - (1,2) - (1,2,3) - -
4 - (0,2) - - - (0,1)
5 (1) - - (0) (1,2) -

Another application related to data arises in a database generated
from surveys or questionnaires. Usually the responses to surveys or ques-
tionnaires contain categorical data, meaning that the magnitude of the coded
information, such as 1 = single, 2 = married, has no intrinsic value. Re-
sponses to surveys contain large amounts of incorrect data. To check the
accuracy of the data, all data sets are examined through a set of edits or tests.
Failed data sets are corrected by making as few changes as possible.
Suppose each questionnaire requires n fields (y_1, y_2, ..., y_n) to represent
the data and R_j represents all possible entries in field j. An edit E_i consists
of a set of logically unacceptable values R_ij ⊆ R_j for some j ∈ F_i ⊆
{1, 2, ..., n}. A data set y fails, or is inaccurate, if y ∈ E_i. Given a set of m edits
E_i with corresponding R_ij and F_i, and y ∈ E_i, the problem is to determine
the minimum number of fields to be corrected to obtain a meaningful data set.
Let x_j = 1 if field j is selected and x_j = 0 otherwise. Also let a_ij = 1
if j ∈ F_i and a_ij = 0 otherwise. An SC formulation of the problem is

min Σ_{j=1}^{n} x_j
s.t. Σ_{j=1}^{n} a_{ij} x_j ≥ 1,  i = 1, 2, ..., m
x_j = 0 or 1,  j = 1, 2, ..., n   (43)

Solution to this problem may not generate a feasible record. In generating
the constraints, in addition to the m edits, all implied edits must be included.
Example 13. Suppose a questionnaire contains 6 fields and the possible
values for each field are R_1 = (0,1), R_2 = (0,1,2), R_3 = (0,1), R_4 =
(0,1,2,3), R_5 = (0,1,2,3). The five edits selected are listed in Table 6.
Now consider y = (1,0,0,0,1,0). Clearly this data set fails E_1, E_4 and E_5.

Without adding the implied edits, the set covering formulation is

min x_1 + x_2 + x_3 + x_4 + x_5 + x_6
s.t. x_2 + x_3 + x_5 ≥ 1
x_2 + x_6 ≥ 1
x_1 + x_4 + x_5 ≥ 1
and x_j = 0 or 1,  j = 1, 2, ..., 6
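A brute-force check of this small model confirms that changing two fields suffices:

from itertools import product

edits = [{2, 3, 5}, {2, 6}, {1, 4, 5}]     # field sets of the failed edits
best = None
for bits in product((0, 1), repeat=6):
    fields = {j + 1 for j, v in enumerate(bits) if v}
    if all(fields & e for e in edits):     # every failed edit gets a changed field
        if best is None or len(fields) < len(best):
            best = fields
print(len(best), sorted(best))             # 2, e.g. [5, 6]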

Generating implied edits requires considerable effort due to the combinato-


rial nature of the problem.

8.6 Check Clearing


Clearing checks is an important activity in commercial banks since cleared
checks guarantee the availability of funds to the bank in which the check
is deposited. Checks drawn on out-of-town banks (transit) require consid-
erably longer duration for clearance in comparison with checks drawn on
local banks or checks drawn on the bank itself. A check not cleared in time
costs the bank since funds must be made available to the customer. Various
methods are available for check clearance such as clearing the checks through
the Federal Reserve System or shipping checks directly to the bank using
various transportation modes. Deciding which method to use for clearing
is complicated by additional factors such as the time and day of the week
the check is deposited and bank availability schedule. Each bank has an
availability schedule which outlines the number of days required to clear
checks in each region of the country. In addition, checks are grouped based
on the drawee check classification and each group or type must be treated
separately. The selection of the time period during which a check is sent for
clearance is also an important factor.
Suppose the number of time periods, the number of modes for clearance and
the number of types of checks are t, m and n respectively. Let x_ijk = 1
if check type i is sent for clearance by mode j in period k and x_ijk = 0
otherwise.
Also let y_jk = 1 if clearing mode j is used in period k and y_jk = 0
otherwise. Suppose c_ijk is the opportunity cost of check type i cleared by
mode j in period k, v_j is the variable cost, f_jk is the fixed cost for clearing
method j in period k, and d_ik is the number of checks of type i available for
clearance in period k. Let a_ij = 1 if mode j can be used to clear check type
i and a_ij = 0 otherwise.

A mixed SP and SPT model of the check clearing problem is given by
(substitute z_jk = 1 − y_jk)

min Σ_{i=1}^{n} Σ_{j=1}^{m} Σ_{k=1}^{t} (c_ijk + v_j d_ik) x_ijk + Σ_{j=1}^{m} Σ_{k=1}^{t} f_jk y_jk
s.t. Σ_{j=1}^{m} a_ij x_ijk = 1,  i = 1, 2, ..., n; k = 1, 2, ..., t
x_ijk ≤ y_jk,  i = 1, 2, ..., n; j = 1, 2, ..., m; k = 1, 2, ..., t
x_ijk = 0 or 1,  i = 1, 2, ..., n; j = 1, 2, ..., m; k = 1, 2, ..., t
and y_jk = 0 or 1,  j = 1, 2, ..., m; k = 1, 2, ..., t   (44)

8.7 Capital Budgeting Problem


Suppose there are n investment projects and v_j is the net-present value
of the project j, for j = 1, 2, ..., n. Let a_ij be the cash outlay or capital
expenditure required during period i, for i = 1, 2, ..., m. Given a budget
b_i for period i, the problem is to determine a subset of projects which
maximizes the total net-present value without violating the budget restriction
in each period. Let x_j = 1 if project j is selected and x_j = 0 otherwise. A
LIP formulation of the problem is

max Σ_{j=1}^{n} v_j x_j

s.t. Σ_{j=1}^{n} a_{ij} x_j ≤ b_i,  i = 1, 2, ..., m
and x_j = 0 or 1,  j = 1, 2, ..., n   (45)


In this formulation both nonnegative and negative values of a_{ij} are permitted. A positive value implies that the project requires capital expenditure, and a negative value corresponds to the situation where the income generated is greater than the capital expenditure. As noted earlier, this problem can be converted to a zero-one MDK problem which in turn can be transformed to a GSP problem. It is possible to incorporate SP constraints such as

x_k + x_j \leq 1

which implies that at most one of the projects k or j may be selected.
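A minimal Python sketch of formulation (45) with one added packing constraint of the kind just described; all data are hypothetical and the exhaustive search is only for illustration.

```python
from itertools import product

# Hypothetical instance: 4 projects, 2 budget periods.
v = [12, 10, 7, 4]                    # net-present values v_j
a = [[5, 4, 3, 2],                    # outlay in period 1
     [4, 3, 2, 1]]                    # outlay in period 2
b = [8, 6]                            # period budgets b_i
conflict = [(0, 1)]                   # SP constraint: at most one of projects 0, 1

best_value, best_x = -1, None
for x in product([0, 1], repeat=len(v)):
    if any(x[k] + x[j] > 1 for k, j in conflict):
        continue                      # violates a packing constraint
    if all(sum(a[i][j] * x[j] for j in range(len(v))) <= b[i]
           for i in range(len(b))):
        value = sum(v[j] * x[j] for j in range(len(v)))
        if value > best_value:
            best_value, best_x = value, x
print(best_value, best_x)             # -> 19 (1, 0, 1, 0)
```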



8.8 Fixed Charge Problem


A fixed charge bounded linear programming problem may be formulated as

min   \sum_{j=1}^{n} c_j x_j + \sum_{j=1}^{n} f_j y_j
s.t.  \sum_{j=1}^{n} a_{ij} x_j \geq b_i,    i = 1, 2, ..., n
      x_j \leq u_j y_j,    j = 1, 2, ..., n
      x_j \geq 0,    j = 1, 2, ..., n
and   y_j = 0, 1,    j = 1, 2, ..., n                          (46)

where u_j is an upper bound on the value of x_j and f_j > 0 is the fixed cost which is incurred only when the corresponding x_j > 0. In this formulation it is assumed that all b_i are nonnegative. Consider the following related SC problem.

min   \sum_{j=1}^{n} f_j y_j
s.t.  \sum_{j=1}^{n} b_{ij} y_j \geq 1,    i = i_1, i_2, ..., i_t
and   y_j = 0, 1,    j = 1, 2, ..., n                          (47)

where b_{ij} = 1 if a_{ij} > 0 and b_{ij} = 0 otherwise (the rows i = i_1, i_2, ..., i_t are those constraints for which b_i > 0). Clearly any feasible y_j for problem (46) is also a feasible solution to problem (47), but the converse is not true. When an optimal solution from (47) substituted in (46) yields an optimal x, the corresponding value of the objective function yields an upper bound for the fixed charge problem. When the optimal solution to (47) does not lead to a feasible solution to (46), suppose J* is the set of indices j for which y_j = 1 in the current optimal solution, and let t_j = 0 if j \in J* and t_j = 1 otherwise. The SC constraint

\sum_{j=1}^{n} t_j y_j \geq 1

can be added to formulation (47) to eliminate the current optimal solution.


The procedure can be continued to generate a good feasible solution. In addition, formulation (47) can also be embedded in branch and bound algorithms to solve fixed charge linear programs, including the fixed charge transportation problem.
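The sketch below mimics this bounding procedure on a hypothetical instance, using SciPy's linprog for the continuous part of (46) and brute force for the SC problem (47). Visiting SC-feasible y vectors in order of increasing fixed cost plays the role of repeatedly adding the t_j cuts; all data are made up.

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog

# Hypothetical instance of (46): two covering rows, three activities.
a = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([1.0, 2.0, 1.5])   # variable costs c_j
f = np.array([5.0, 4.0, 3.0])   # fixed costs f_j
u = np.array([3.0, 3.0, 3.0])   # upper bounds u_j

def lp_value(y):
    """Solve (46) with y fixed; return its objective value, or None if the
    continuous part is infeasible for this choice of open activities."""
    res = linprog(c, A_ub=-a, b_ub=-b,
                  bounds=[(0.0, u[j] * y[j]) for j in range(len(y))],
                  method="highs")
    return (c @ res.x + f @ y) if res.success else None

# Visit SC-feasible y in order of increasing fixed cost, mimicking the
# cut-generation loop: each infeasible y would be cut off by a t_j constraint.
bij = (a > 0).astype(int)                  # coefficients b_ij of problem (47)
for y in sorted(product([0, 1], repeat=3), key=lambda y: f @ np.array(y)):
    y = np.array(y)
    if np.all(bij @ y >= 1):               # feasible for the SC problem (47)
        val = lp_value(y)
        if val is not None:
            print("upper bound:", val, "with y =", y)
            break
```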

8.9 Mathematical Problems


Suppose A = (a_{ij}) is an m x n matrix with all a_{ij} = 0 or 1. The α-width of such a matrix is the minimum number of columns of the matrix A necessary so that the sum of each row of the resulting submatrix is at least equal to an integer α. Clearly, α cannot exceed

α* = \min_{1 \leq i \leq m} \sum_{j=1}^{n} a_{ij}

and the GSC formulation of this problem is straightforward.


Suppose S is a (finite) set of integers, S_i \subseteq S for i = 1, 2, ..., n, and each S_i is an arithmetic progression. In an arithmetic progression the difference between any two consecutive numbers is the same. For example, (3, 5, 7, 9) is an arithmetic progression which can be expressed as (2i + 1), i = 1, 2, 3 and 4. If all elements of S can be covered by the sets S_i, i = 1, 2, ..., n, it is called an n-cover by arithmetic progressions. The formulation of this problem is also straightforward. A two-dimensional version of this problem is useful in production operations of VLSI chips.
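For the α-width, a brute-force Python sketch on a hypothetical 0-1 matrix:

```python
from itertools import combinations

import numpy as np

# Hypothetical 0-1 matrix; the alpha-width is the fewest columns whose
# submatrix has every row sum >= alpha.
A = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])
alpha = 2

def alpha_width(A, alpha):
    m, n = A.shape
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.all(A[:, cols].sum(axis=1) >= alpha):
                return k, cols
    return None  # alpha exceeds the minimum row sum

print(alpha_width(A, alpha))   # -> (3, (0, 1, 2))
```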

9 Routing
In distribution management, strategic decisions regarding the location of plants, warehouses and depots, tactical decisions concerning the fleet size and mix, and operational decisions dealing with the routing and scheduling of vehicles have a significant impact on the cost of delivering goods and services to customers and on maintaining a satisfactory level of service. Because of considerable capital requirements, it is not possible to relocate facilities frequently or, to any great extent, to change the fleet size and mix. Consequently, the selection of routes and the scheduling of vehicles is an important problem in adapting to changing market conditions in many operations such as supermarkets,
department stores, package delivery, cargo pickup and delivery, newspaper
delivery, preventive maintenance tours and others.
Consider a distribution system with one or several depots delivering a
product to customers located over a network using several vehicles. Each
customer requires a specified amount of the product to be delivered and
each vehicle has a capacity which limits the amount of the product that can
be delivered in one trip. Usually a vehicle starts from a given depot and
must return to the same depot. Given the distances between customers, and
the distance between the depots and customers, the routing and scheduling
problem is to determine the number of vehicles needed and the assignment
of customers to each vehicle without violating the capacity constraints which
minimizes the total distance traveled by all vehicles. If the list of customers
assigned to each vehicle is known, minimizing the total distance traveled by
each vehicle separately and combining the results for all vehicles provides
the desired solution. Given a list of customers, each route corresponds to
the order in which the customers are visited. Finding the optimal order of
visiting customers is the well known Traveling Salesman Problem which is
presented next.

9.1 Traveling Salesman Problem


Consider a directed graph G = (N, A). Let Cij be the distance (length or
cost) of the arc (i,j) E A. A tour (Hamiltonian cycle) is an elementary
circuit which is also equivalent to starting at any given node, visiting every
other node exactly once and returning to the starting node. The sum of the
distances of the arcs in the circuit is the length of the tour. The objective
of the Traveling Salesman Problem (TSP) is to determine a tour of shortest
length. Assuming all possible arcs are included in the arc set A, the total number of distinct tours is (n-1)!. The TSP is a difficult combinatorial problem because of the enormous number of possible tours for a large n. The
following mixed SC and SPT formulation of the TSP is useful in developing
models for a variety of routing problems. Let Xij = 1, if arc (i,j) is in the
tour and Xij = 0 otherwise.

min   \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}
s.t.  \sum_{i=1}^{n} x_{ij} = 1,    j = 1, 2, ..., n
      \sum_{j=1}^{n} x_{ij} = 1,    i = 1, 2, ..., n
      \sum_{i \in q} \sum_{j \notin q} x_{ij} \geq 1,    for all nonempty q \subset N
and   x_{ij} = 0, 1,    (i, j) \in A                          (48)

The first two sets of constraints ensure that each node is visited exactly
once and the third set of constraints eliminates subtours. When c_{ij} = c_{ji}, the problem is called a symmetric TSP. The distance matrix is Euclidean if the distances satisfy the triangle inequality c_{ij} \leq c_{ik} + c_{kj} for all (i, j), (i, k), (k, j) \in A. Other formulations of this problem are available. An
extension of the problem is called the M Traveling Salesman Problem (MTSP), where M salesmen are to visit the nodes in such a way that the total distance traveled by all salesmen is a minimum. Each node, except the common node, must be visited by exactly one salesman. Each salesman must travel along a subtour of the nodes which includes a node common to all salesmen. The MTSP can be formulated as a TSP by creating M copies of the common node and connecting each copy of the node with the rest of the nodes as the original node. The M copies of the node are either not connected or connected by an arc with a distance exceeding

\sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij}.

The path connecting two copies of the common node forms a subtour which can be assigned to a salesman. A direct integer programming formulation of the MTSP is given by

min   \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}
s.t.  \sum_{j=1}^{n} x_{1j} = M
      \sum_{j=1}^{n} x_{ij} = 1,    i = 2, ..., n
      \sum_{i=1}^{n} x_{i1} = M
      \sum_{i=1}^{n} x_{ij} = 1,    j = 2, ..., n
      \sum_{i \in q} \sum_{j \notin q} x_{ij} \geq 1,    for all nonempty q \subseteq N - {1}
and   x_{ij} = 0, 1,    (i, j) \in A                          (49)

In this formulation node 1 is assumed to be the common starting node


for all salesmen. This formulation is also a mixed GSPT and SC model.
Because of the number of variables and constraints involved, the computational effort required to determine an optimal tour using integer programming formulations grows exponentially with the number of nodes. Several techniques
based on the relaxations of the TSP such as assignment, 2-matching, 1-tree, 1-arborescence and n-path have been successful in generating optimal tours
for moderate size problems. Heuristic methods which generate good tours
(not necessarily optimal) are useful in practical applications. One approach
called the k-opt method seems to work well in generating good tours. Start-
ing with any tour, the k-opt method is a systematic search for better tours
by deleting and adding a specified number of arcs. The minimum number
of arcs to be replaced to generate a new tour is two. Select any two arcs
of the tour and replace them with two new arcs not in the tour if the new
tour is better than the current one. After examining all combinations of
two arcs, combinations of three arcs can be examined. In a 3-opt method,
the search procedure is stopped after examining all combinations of three
arcs. Clearly for large values of k, the k-opt method also requires substantial
computational effort.
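A minimal Python sketch of the 2-opt exchange just described, on a hypothetical symmetric distance matrix; production implementations use candidate lists and incremental length updates rather than recomputing tour lengths from scratch.

```python
def tour_length(tour, dist):
    """Length of a closed tour under a symmetric distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Replace pairs of arcs while an improving exchange exists (2-opt)."""
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue                      # would remove the same arcs
                # Reversing tour[i+1 .. j] deletes arcs (i, i+1) and (j, j+1)
                # and inserts arcs (i, j) and (i+1, j+1).
                new = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                if tour_length(new, dist) < tour_length(tour, dist):
                    tour, improved = new, True
    return tour

# Hypothetical symmetric distance matrix on 5 nodes.
dist = [[0, 2, 9, 10, 7],
        [2, 0, 6, 4, 3],
        [9, 6, 0, 8, 5],
        [10, 4, 8, 0, 6],
        [7, 3, 5, 6, 0]]
best = two_opt(list(range(5)), dist)
print(best, tour_length(best, dist))
```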

9.2 Single Depot Vehicle Routing


Suppose G = (N, A) is a directed graph and all customers are located at the nodes of the graph. Without loss of generality suppose node 1 is the single depot. If the capacity of a single vehicle is sufficient to satisfy the demand of all customers, the problem can be formulated as a TSP. When several vehicles are needed to satisfy the demand and the capacities of the vehicles are different, each individual vehicle must be treated explicitly in developing a model. To get a feel for the size of the problem, a direct integer programming formulation is presented below.
Let Xijk = 1, if vehicle k is used to visit node j directly after visiting
node i and Xijk = 0 otherwise. Let qi be the demand at node i, Qk be the
capacity of vehicle k and v be the number of vehicles used.

min   \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{v} c_{ijk} x_{ijk}
s.t.  \sum_{i=1}^{n} \sum_{k=1}^{v} x_{ijk} = 1,    j = 1, 2, ..., n
      \sum_{j=1}^{n} \sum_{k=1}^{v} x_{ijk} = 1,    i = 1, 2, ..., n
      \sum_{j=1}^{n} x_{1jk} \leq 1,    k = 1, 2, ..., v
      \sum_{i=1}^{n} x_{ijk} - \sum_{i=1}^{n} x_{jik} = 0,    k = 1, 2, ..., v,  j = 2, ..., n
      \sum_{i=1}^{n} q_i \sum_{j=1}^{n} x_{ijk} \leq Q_k,    k = 1, 2, ..., v
      \sum_{i \in s} \sum_{j \in s} x_{ijk} \leq |s| - 1,    for all nonempty s \subseteq {2, ..., n},  k = 1, 2, ..., v
and   x_{ijk} = 0, 1,    i = 1, 2, ..., n,  j = 1, 2, ..., n,  k = 1, 2, ..., v                          (50)

The first and second set of constraints ensure that a vehicle enters and
departs each node. The third set guarantees that each vehicle is used at
most once. The fourth set ensures that if a vehicle enters a node it must
also depart from that node. The fifth set guarantees that the total demand
of the nodes visited by a vehicle is no more than the capacity of the vehicle.
The last set corresponds to the usual subtour elimination constraints for each
vehicle. Other formulations of this problem including additional constraints
on total travel time are available.
It is possible to formulate the above problem as a mixed SP and SPT
problem. Suppose r is the total number of feasible tours. For feasibility the
combined demand of the nodes in a tour (demand of the tour) must not
exceed the maximum capacity of the vehicles. Suppose aij = 1, if node i
is included in tour j and aij = 0 otherwise. Let bkj = 1, if the vehicle k
can carry the demand of tour j and bkj = 0 otherwise. Also let Cj be the
minimum cost of tour j, x_j = 1 if tour j is selected and x_j = 0 otherwise. Then the problem is

min   \sum_{j=1}^{r} c_j x_j
s.t.  \sum_{j=1}^{r} a_{ij} x_j = 1,    i = 2, 3, ..., n
      \sum_{j=1}^{r} b_{kj} x_j \leq 1,    k = 1, 2, ..., v
and   x_j = 0, 1,    j = 1, 2, ..., r                          (51)
The first set of constraints ensures that every node is included in exactly
one tour and the second set guarantees that each vehicle is assigned no more
than one tour.
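The sketch below illustrates formulation (51) end to end on a hypothetical instance: it generates every feasible tour, prices each column with a brute-force TSP, and then picks a minimum cost assignment of tours to vehicles. All data are made up.

```python
from itertools import combinations, permutations

# Hypothetical instance: depot 0, customers 1..4, two vehicles.
demand = {1: 3, 2: 2, 3: 4, 4: 1}
capacity = [5, 6]                         # vehicle capacities Q_k
dist = [[0, 4, 6, 5, 8],
        [4, 0, 3, 7, 6],
        [6, 3, 0, 4, 5],
        [5, 7, 4, 0, 3],
        [8, 6, 5, 3, 0]]
customers = list(demand)

def tour_cost(nodes):
    """Minimum cost of a depot tour through `nodes` (brute-force TSP)."""
    best = float("inf")
    for order in permutations(nodes):
        c = dist[0][order[0]] + dist[order[-1]][0] + \
            sum(dist[order[i]][order[i + 1]] for i in range(len(order) - 1))
        best = min(best, c)
    return best

# One column per feasible customer set (demand within the largest vehicle).
tours = [frozenset(s) for r in range(1, len(customers) + 1)
         for s in combinations(customers, r)
         if sum(demand[i] for i in s) <= max(capacity)]
cost = {s: tour_cost(tuple(s)) for s in tours}

# Assign at most one tour to each vehicle so the chosen tours partition the
# customers and respect each vehicle's capacity (formulation (51)).
best = (float("inf"), None)
for assign in permutations(tours + [frozenset()] * len(capacity), len(capacity)):
    if set().union(*assign) == set(customers) and \
       sum(len(s) for s in assign) == len(customers) and \
       all(sum(demand[i] for i in s) <= capacity[k] for k, s in enumerate(assign)):
        total = sum(cost[s] for s in assign if s)
        if total < best[0]:
            best = (total, assign)
print(best)
```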
Generating all feasible tours is a very time consuming process. In addi-
tion, one TSP has to be solved for each tour to determine the cost associated
with each tour. A variety of heuristics have been developed to generate so-
lutions to the routing problem which can be grouped into four categories
namely cluster first-route second, route first-cluster second, savings or in-
sertion and improvement or exchange. In cluster first-route second approach
the nodes are first assigned to vehicles and one TSP is solved for each vehicle
to determine the optimal tour. In the route first-cluster second approach
first a long tour is constructed through all nodes, then the tour is partitioned
into pieces or segments which can be assigned to vehicles. In the savings or
insertion approach, unassigned nodes are inserted into the existing route or
routes based on the least cost or maximum savings, taking into account the
vehicle capacity constraints. Having generated a set of feasible routes for each vehicle, improvement or exchange procedures examine exchanging arcs within a given route or exchanging nodes between two routes, leading to improved feasible solutions. Exact procedures to solve the routing problems
include specialized branch and bound methods, dynamic programming and
cutting plane algorithms.

9.3 Multiple Depots and Extensions


When there are multiple depots a vehicle starting at a specific depot may
be required to return to the same depot or permitted to visit another depot.
In some applications the delivery of goods at each node must be completed
within the time window defined by the earliest and latest time between which
the product must be delivered. When the time windows are full days and the
delivery must occur on a specified number of days of the planning horizon,
the problem is called multi-period vehicle or periodic routing problem. The
integer programming formulation (50) can be modified to incorporate multi-
ple depots, and time window constraints. In formulation (51), the additional
constraints are taken into account when generating the tours. However, it
is no longer feasible to determine the minimum cost of each tour using TSP
when time window constraints are imposed. One may have to resort to
heuristics to determine the best possible arrangement of each tour. Other
extensions include the integrated inventory and routing model and the inte-
grated depot location and routing model which can be formulated as mixed
integer and integer programs. The formulation (51) may also be used in scheduling a fleet of ships to pick up and deliver cargo, as well as to pick up passengers and deliver them to their destinations (the dial-a-ride problem).

10 Location
The selection of sites for locating facilities has a major impact on both public
and private sector operations. Examples of facilities in manufacturing activ-
ity include plants, warehouses and retail outlets for producing, distributing
and selling products. The location of centers for processing checks (lock
box) has significant impact on banking operations. The locations of emer-
gency medical services, fire stations, social service centers, day care centers,
post offices, bus stops, shopping centers and hospitals are a major concern
of most regional and urban planners. Distance measure, cost structure, time
to travel to service centers, supply and demand for services and products
are some of the important factors in modeling location problems. When the
location of the facilities is unrestricted (can be located on a two dimensional
plane) both Euclidean metric and Rectilinear metric can be used to calcu-
late the distance between two points. Given two points PI = (Xl, YI) and
P_2 = (x_2, y_2), the Euclidean distance between the two points is

\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

and the rectilinear distance is

|x_1 - x_2| + |y_1 - y_2|.

If the location of facilities is restricted to the nodes of a graph, the distance between two nodes is the shortest distance between them using the arcs or edges of the graph. When the location of facilities is permitted on the edges (or arcs) of a graph, the shortest distance between two points can be calculated using the following approach. Suppose x is a point on the edge (a, b), y is a point on the edge (g, h) and d_{ij} is the shortest distance between nodes i and j. Then the shortest distance between the points x and y is given by

d_{xy} = min( d(x, a) + d_{ag} + d(g, y),  d(x, a) + d_{ah} + d(h, y),
              d(x, b) + d_{bg} + d(g, y),  d(x, b) + d_{bh} + d(h, y) )

where d(x, a), d(x, b), d(g, y) and d(h, y) are the lengths of the edge segments.
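A direct Python transcription of this distance calculation, with hypothetical segment lengths and node-to-node distances:

```python
def point_to_point(dx_a, dx_b, dy_g, dy_h, d):
    """Shortest distance between point x on edge (a, b) and point y on
    edge (g, h), given the edge-segment lengths and the node-to-node
    shortest distances d (a dict keyed by node pairs)."""
    return min(dx_a + d['a', 'g'] + dy_g,
               dx_a + d['a', 'h'] + dy_h,
               dx_b + d['b', 'g'] + dy_g,
               dx_b + d['b', 'h'] + dy_h)

# Hypothetical numbers: x sits 1 unit from a, 2 from b; y sits 3 from g, 1 from h.
d = {('a', 'g'): 4, ('a', 'h'): 6, ('b', 'g'): 5, ('b', 'h'): 2}
print(point_to_point(1, 2, 3, 1, d))   # -> min(8, 8, 10, 5) = 5
```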
Depending on the selection criteria of the locations, the distance measure used and the restrictions on the location of the facilities, a variety of models have been developed to determine the optimum location of facilities. Models based on the Euclidean distance measure are not included since they require nonlinear programming formulations. In the majority of the models, the location of facilities is restricted to the nodes or edges of a graph (which may be used to represent transportation networks) due to the physical travel involved in delivering the goods and services. The location models are divided into five categories, namely the plant location problem, lock box location problem, p-center problem, p-median problem and service facility location problem, which are presented next.

10.1 Plant Location Problem


Consider a manufacturing operation with m potential sites for plants to
produce a single commodity and ship a specified number of units of the
product to each one of the n customers. Suppose fi is the fixed cost of
opening a plant at location i, Cij is the unit shipping cost from plant i to
customer j, bj is the demand at customer j and the capacity of the plant
is unlimited. The problem is to determine the location of the plants which
minimizes the total cost. Since the capacity of each plant is unlimited, it is optimal to ship all the quantity to a customer from one location. Let x_{ij} = 1 if customer j is supplied from plant i and x_{ij} = 0 otherwise. Also let y_i = 1 if plant i is open and y_i = 0 otherwise. An integer programming formulation of the plant location problem is given by

min   \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} b_j x_{ij} + \sum_{i=1}^{m} f_i y_i
s.t.  \sum_{i=1}^{m} x_{ij} = 1,    j = 1, 2, ..., n
      x_{ij} \leq y_i,    i = 1, 2, ..., m,  j = 1, 2, ..., n
      x_{ij} = 0, 1,    i = 1, 2, ..., m,  j = 1, 2, ..., n
and   y_i = 0, 1,    i = 1, 2, ..., m                          (52)

To transform this problem into a SP problem, define z_i = 1 - y_i and select a large number M such that

M > \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} b_j.
Then problem (52) is equivalent to

max   \sum_{i=1}^{m} \sum_{j=1}^{n} (M - c_{ij} b_j) x_{ij} + \sum_{i=1}^{m} f_i z_i - \sum_{i=1}^{m} f_i - mM
s.t.  \sum_{i=1}^{m} x_{ij} \leq 1,    j = 1, 2, ..., n
      x_{ij} + z_i \leq 1,    i = 1, 2, ..., m,  j = 1, 2, ..., n
      x_{ij} = 0, 1,    i = 1, 2, ..., m,  j = 1, 2, ..., n
and   z_i = 0, 1,    i = 1, 2, ..., m                          (53)

It is also possible to transform problem (53) into a SC problem. Set c_{ij}b_j = M, a large number, if plant i cannot supply customer j, and arrange the values c_{ij}b_j for each customer in increasing order. Suppose r_{kj} for k = 1, 2, ..., j* are the distinct values of the sequence and r_{kj} \leq r_{k+1,j}. Let z_{kj} = 1 if customer j is not served by a facility with transportation cost less than or equal to r_{kj} and z_{kj} = 0 otherwise. Also let a_{ijk} = 1 if c_{ij}b_j < r_{kj} and a_{ijk} = 0 otherwise. A SC formulation is given by

min   \sum_{i=1}^{m} f_i y_i + \sum_{j=1}^{n} \sum_{k=1}^{j*} (r_{k+1,j} - r_{kj}) z_{kj}
s.t.  \sum_{i=1}^{m} a_{ijk} y_i + z_{kj} \geq 1,    k = 1, 2, ..., j*,  j = 1, 2, ..., n
      y_i = 0, 1,    i = 1, 2, ..., m
and   z_{kj} = 0, 1,    k = 1, 2, ..., j*,  j = 1, 2, ..., n                          (54)
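A brute-force Python sketch of the plant location problem (52) on a hypothetical instance; with unlimited capacities, once the set of open plants is fixed each customer is served entirely from its cheapest open plant, as noted above.

```python
from itertools import product

# Hypothetical uncapacitated plant location instance: 3 plants, 4 customers.
f = [8, 6, 9]                                   # fixed opening costs f_i
cb = [[3, 5, 4, 6],                             # shipping costs c_ij * b_j
      [5, 2, 6, 4],
      [4, 6, 2, 3]]

best = (float("inf"), None)
for y in product([0, 1], repeat=len(f)):
    if not any(y):
        continue                                # at least one plant must open
    open_plants = [i for i in range(len(f)) if y[i]]
    # With unlimited capacity each customer uses its cheapest open plant.
    ship = sum(min(cb[i][j] for i in open_plants) for j in range(len(cb[0])))
    total = ship + sum(f[i] for i in open_plants)
    if total < best[0]:
        best = (total, y)
print(best)   # -> (23, (0, 1, 0)): open only plant 1
```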

10.2 Lock Box Location Problem


The number of days required to clear a check drawn on a bank depends upon the city in which the check is cashed. For a company which pays bills to many clients, it is profitable to maintain accounts at various strategically located banks and to pay the clients with checks drawn on one of the banks so that large clearing times are achieved. It costs the company to maintain an account (lock box) in a bank. Suppose there are n potential lock box locations and m client locations. Suppose c_{ij} is the monetary value per dollar of a check issued in city i and cashed in city j, and suppose b_i is the dollar volume of checks paid in city i. Let x_{ij} = 1 if the customer in city i is paid from an account in city j and x_{ij} = 0 otherwise. Also let y_j = 1 if an account is maintained in city j and y_j = 0 otherwise. An integer programming formulation of the lock box location problem is given by

max   \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} b_i x_{ij} - \sum_{j=1}^{n} f_j y_j
s.t.  \sum_{j=1}^{n} x_{ij} = 1,    i = 1, 2, ..., m
      x_{ij} \leq y_j,    i = 1, 2, ..., m,  j = 1, 2, ..., n
      x_{ij} = 0, 1,    i = 1, 2, ..., m,  j = 1, 2, ..., n
and   y_j = 0, 1,    j = 1, 2, ..., n                          (55)

where f_j is the fixed cost of maintaining an account at location j. This model is similar to the plant location model (52), which can be transformed into a SP model.

10.3 P-Center Problem

Suppose the demand for a service is located at the points x_i, i = 1, 2, ..., m, and locations y = (y_1, y_2, ..., y_p) for p service centers are to be selected from a set of points S. For any two points x and y, suppose d(x, y) is the shortest distance between them. For each demand point x_i the distance to the closest service center is given by

d_i(y) = \min_{1 \leq k \leq p} d(x_i, y_k).

The maximum closest distance between demand points and service centers is

d(y) = \max_{1 \leq i \leq m} d_i(y).

The P-Center problem is to determine y* \in S which minimizes d(y). The problem of locating p emergency service facilities which can be reached from demand points in the shortest possible time is the P-Center problem.
When the number of points in the set S is finite, such as the nodes of a graph, the optimal locations of the centers can be obtained by solving a series of SC problems. Suppose the number of points in S is n (n \geq p) and h_{ij} is the shortest distance between demand point i and location j. Select any p points from the set S and calculate the corresponding value of the objective function d_1 = d(y). To determine if the objective function can be improved, set a_{ij} = 1 if h_{ij} < d_1 and a_{ij} = 0 otherwise, and solve the following SC problem

min   \sum_{j=1}^{n} x_j
s.t.  \sum_{j=1}^{n} a_{ij} x_j \geq 1,    i = 1, 2, ..., m
and   x_j = 0, 1,    j = 1, 2, ..., n                          (56)

If the value of the objective function of SC is greater than p then the existing
solution is optimal. Otherwise calculate d2 corresponding to the new centers,
generate the corresponding SC problem and continue the procedure.
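The following Python sketch runs this iterative procedure on a hypothetical 5-node instance, solving each SC problem (56) by brute force; the starting centers are arbitrary.

```python
from itertools import combinations

# Hypothetical symmetric distances; the 5 nodes are both demand points and
# candidate sites, and p = 2 centers are to be located.
h = [[0, 3, 7, 8, 4],
     [3, 0, 5, 6, 2],
     [7, 5, 0, 2, 6],
     [8, 6, 2, 0, 5],
     [4, 2, 6, 5, 0]]
m = n = 5
p = 2

def coverage_number(d):
    """Optimal value of the SC problem (56) for radius d, by brute force."""
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if all(any(h[i][j] < d for j in cols) for i in range(m)):
                return k, cols
    return n + 1, None

centers = (0, 1)                                  # arbitrary initial p centers
while True:
    d = max(min(h[i][j] for j in centers) for i in range(m))
    k, cols = coverage_number(d)
    if k > p:
        break                                     # current centers are optimal
    centers = cols                                # strictly better radius
print(centers, d)                                 # -> (1, 2) with radius 3
```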
Clearly, when the service centers are restricted to the nodes of a graph,
the set of possible locations is finite. Even when the service centers are
permitted along the edges, it is possible to reduce the number of possible
locations to a finite set. When the locations of service centers are permitted
along the edges of a graph the problem is called absolute P-center problem.

10.4 P-Median Problem


Suppose there are m demand points with a user population of ai at demand
point i and there are n possible locations for service centers. Also, suppose
dij is the distance between demand point i and location j. Given a set
of permissible locations for each demand point, the problem is to assign
one or more service centers to each demand point so that the sum of the
population weighted distance for all demands points from the respective
service centers is a minimum. To formulate this problem suppose aij = 1 if
the demand point i can receive service from location j and aij = 0 otherwise.
Also let Xij be the fraction of the population at node i receiving service from
location j. When the number of service centers required is p, a mixed integer
programming formulation is given by
min   \sum_{i=1}^{m} \sum_{j=1}^{n} a_i d_{ij} x_{ij}
s.t.  \sum_{j=1}^{n} a_{ij} x_{ij} = 1,    i = 1, 2, ..., m
      x_{ij} \leq y_j,    i = 1, 2, ..., m,  j = 1, 2, ..., n
      \sum_{j=1}^{n} y_j = p
      x_{ij} \geq 0,    i = 1, 2, ..., m,  j = 1, 2, ..., n
and   y_j = 0, 1,    j = 1, 2, ..., n                          (57)
where y_j = 1 if a service center is open in location j and y_j = 0 otherwise. Since the service centers have no capacity restrictions and assigning the nearest selected service center to each demand point is feasible, an optimal solution to problem (57) can always be found with all x_{ij} = 0, 1. The model (57) is called the generalized p-median problem. When the locations of service centers and demand points are restricted to the nodes of a graph, the problem is called a p-median problem. Even when the location of service centers is permitted along the edges of the graph, an optimal solution can always be found by restricting the locations to the nodes.
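Since an optimal solution assigns each demand point to its nearest open center, small instances of the p-median problem can be solved by enumerating the center sets, as in this hypothetical sketch:

```python
from itertools import combinations

# Hypothetical p-median instance: 4 demand points, 3 candidate sites, p = 2.
a = [10, 20, 15, 5]                 # user population a_i at each demand point
d = [[2, 5, 9],                     # distances d_ij
     [4, 1, 7],
     [8, 6, 2],
     [3, 7, 4]]
p = 2

best = (float("inf"), None)
for sites in combinations(range(3), p):
    # Each demand point is served by its nearest open site (no capacities).
    cost = sum(a[i] * min(d[i][j] for j in sites) for i in range(4))
    if cost < best[0]:
        best = (cost, sites)
print(best)                         # -> (120, (1, 2))
```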

10.5 Service Facility Location Problem


In all the applications discussed in this section the demand points and the service facility locations are restricted to the nodes of a graph. Suppose d_{ij} and t_{ij} represent the shortest distance and the time required to travel from node i to node j. The set covering location problem is to find the minimum number of facilities required so that every demand point has at least one facility which can be reached within a specified distance or time or both. Let a_{ij} = 1 if a facility located at node j can be reached from demand point i within the specified distance or time and a_{ij} = 0 otherwise. Also, let x_j = 1 if a service facility is located at node j and x_j = 0 otherwise. The SC formulation of this model is straightforward.
A related problem, called the maximal covering problem, is to maximize the coverage when the number of service centers is restricted to a specified number p. An integer programming formulation of this model is

max   \sum_{i=1}^{m} y_i
s.t.  \sum_{j=1}^{n} a_{ij} x_j \geq y_i,    i = 1, 2, ..., m
      \sum_{j=1}^{n} x_j = p
      x_j = 0, 1,    j = 1, 2, ..., n
      y_i = 0, 1,    i = 1, 2, ..., m                          (58)

Substituting y_i = 1 - z_i, the problem can be transformed to a mixed SC and GSPT model.
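A brute-force sketch of the maximal covering problem (58) on a hypothetical instance:

```python
from itertools import combinations

# Hypothetical maximal covering instance: 5 demand points, 4 sites, p = 2.
# a[i][j] = 1 if a facility at site j covers demand point i.
a = [[1, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 1, 0, 1]]
p = 2

best = (-1, None)
for sites in combinations(range(4), p):
    covered = sum(1 for row in a if any(row[j] for j in sites))
    if covered > best[0]:
        best = (covered, sites)
print(best)   # -> (5, (1, 3)): sites 1 and 3 cover all demand points
```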
Another useful model, called the hierarchical objective set covering model, has multiple objectives. One is to minimize the number of facility locations needed to cover all demand points and the second is to maximize the excess coverage. A model to maximize excess coverage is given by

max   \sum_{i=1}^{m} y_i
s.t.  \sum_{j=1}^{n} a_{ij} x_j - y_i \geq 1,    i = 1, 2, ..., m
and   x_j = 0, 1,    j = 1, 2, ..., n
      y_i = 0, 1,    i = 1, 2, ..., m                          (59)

Both objectives can be incorporated by changing the objective function to

min   w \sum_{j=1}^{n} x_j - \sum_{i=1}^{m} y_i

where w > 0 is some chosen weight. Substituting y_i = (1 - z_i), this model can be transformed to a mixed SPT and GSC model.
Another extension of the set covering location problem is to incorporate backup coverage. Suppose a backup coverage must be located within a specified distance or time. Let b_{ij} = 1 if node j is within the distance or time to provide backup coverage to demand point i and b_{ij} = 0 otherwise. A SC formulation of this model is

max   \sum_{i=1}^{m} y_i
s.t.  \sum_{j=1}^{n} a_{ij} x_j \geq 1,    i = 1, 2, ..., m
      \sum_{j=1}^{n} b_{ij} x_j - y_i \geq 1,    i = 1, 2, ..., m
      \sum_{j=1}^{n} x_j = p*
and   x_j = 0, 1,    j = 1, 2, ..., n
      y_i = 0, 1,    i = 1, 2, ..., m                          (60)

where p* is the optimal number of service centers needed for the primary
coverage. The first set of constraints ensures the primary coverage and the
second set represents the backup coverage.
The most general extension is the multiple response units model. Suppose multiple response units or several facilities can be located at a given node. Also suppose that demand node i requires r_i response units and that response unit k must be located within a distance (or time) of h_{ik}. Let x_j be the number of response units located at node j and let a_{ijk} = 1 if location j can provide response unit k to demand point i and a_{ijk} = 0 otherwise. Note that it is reasonable to assume that the maximum distance for the first response unit is less than or equal to the maximum distance for the second response unit, the maximum distance for the second response unit is less than or equal to the maximum distance for the third response unit, and so on. This assumption implies that if a location can provide a first response unit, it can also provide a second response unit, a third response unit and so on. A GSC formulation is given by

min   \sum_{j=1}^{n} x_j
s.t.  \sum_{j=1}^{n} a_{ijk} x_j \geq k,    k = 1, 2, ..., r_i,  i = 1, 2, ..., m
and   x_j a nonnegative integer,    j = 1, 2, ..., n                          (61)

11 Review of Bibliography
The list of references in each category, except the general list, is divided into subgroups related to the topics discussed in each section. An attempt is made to include articles containing information relevant to more than one subgroup in all appropriate subgroups. A few articles may not have been included in any subgroup due to the inability of the author to secure copies of the articles. Even though the papers have been reviewed carefully, it is very possible that some of them may have been listed in the wrong category. The list of references included in this paper is by no means complete. However, it is the hope of the author that all significant papers have been included. The subgroups and the references related to each subgroup for each category are presented next.

11.1 Theory
Set Covering: Avis (80) Baker (81), Balas and Ng (89), Balas (84), Balas and Ho (80), Balas (80), Balas and Padberg (76, 75b), Beasley (87), Bellmore and Ratliff (71), Benveniste (82) Chang and Nemhauser (85), Chaudary, Moon and McCormick (87), Christofides and Paixo (86), Christofides and Korman (75), Chvatal (79), Conforti, Corneil and Mahjoub (86), Cornuejols and Sassano (89), Crama, Hammer and Ibaraki (90) El-Darzi (88), El-Darzi and Mitra (88a, 88b), Etcheberry (77) Fisher and Kedia (90), Fisher and Wolsey (82), Fowler, Paterson and Tanimoto (81) Garfinkel and Nemhauser (72) Hammer and Simeone (87), Hammer, Johnson and Peled (79), Ho (82), Hochbaum (80) John and Kochenberger (88), Johnson (74) Lawler (66), Leigh, Ali, Ezell and Noemi (88), Lemke, Salkin and Spielberg (71), Lovasz (75) Murty (73) Padberg (79), Peled and Simeone (85), Pierce and Lasky (75), Pierce (68) Roth (69), Roy (72) Salkin (75), Salkin and Koncal (73) Vasko and Wolfe (88), and Vasko and Wilson (86, 84a, 84b)
Set Packing: Balas and Padberg (76, 75b) Chang and Nemhauser (85), Coffman and Lueker (91), Crama, Hammer and Ibaraki (90) Fisher and Wolsey (82), Fowler, Paterson and Tanimoto (81), Fox and Scudder (86) Padberg (79, 73), Pierce (68)
Set Partitioning: Albers (80), Anily and Federgruen (91) Balas and Padberg (76, 75a, 75b) Chan and Yano (92), Coffman and Lueker (91), Crama, Hammer and Ibaraki (90) El-Darzi (88), El-Darzi and Mitra (88a, 88b) Fisher and Kedia (90, 86) Garfinkel and Nemhauser (69) Hammer and Simeone (87), Hwang, Sum and Yao (85) Marsten (74), Michaud (72) Nemhauser, Trotter and Nauss (74) Padberg (79, 73), Pierce and Lasky (75), Pierce (68) Ryan and Falkner (88) Trubin (69)
The constraint co-efficient matrix for all the three models SC, SP and
SPT including their generalizations and the mixed models is a matrix of
zeroes and ones. Properties of the zero-one matrices in developing solution
strategies for linear models with zero-one constraint co-efficient matrix have
been explored by Balas and Padberg (76), Balas (72), Berge (72), Fulkerson,
Hoffman and Oppenheim (74), Padberg (74a, 74b) and Ryan and Falkner
(88).

11.2 Transformations
The transformations and model conversions presented in this section can be
found in Balas and Padberg (76), Hammer, Johnson and Peled (79), Lemke,
Salkin and Spielberg (71), and Padberg (79) of the list of references on
theory and Garfinkel and Nemhauser (75, 73) and Karp (72) of the general
list of references.

11.3 Graphs
Vertex Packing: Berge (73) Chang and Nemhauser (85), Chvatal (77)
Edmonds (62) Houck and Vemuganti (77) Nemhauser and Trotter (75)
Maximum Matching: Balinski (70), Berge (73) Edmonds (62) Norman and Rabin (59)
Minimum Cover: Balinski (70), Berge (73) Edmonds (62) Norman and Rabin (59) Weinberger (76)
Chromatic Index and Chromatic Number: Berge (73), Brelaz (79),
Brown (72) Corneil and Graham (73) Leighton (79) Mehta (81) Salazar and
Oakford (74) Wang (74), Wood (69)
Multi-Commodity Disconnecting Set: Aneja and Vemuganti (77)
Bellmore and Ratliff (71)
Steiner Problem On Graphs: Aneja (80) Beasley (89, 84) Chopra (92), Cockayne and Melzak (69) Dreyfus and Wagner (71) Gilbert and Pollack
(68) Hakimi (71), Hanan (66), Hwang and Richards (92) Khoury, Pardalos
and Hearn (93), Khoury, Pardalos and Du (93) Maculan (87) Winter (87),
Wong (84), Wu, Widmayer and Wong (86)
The formulations and applications of the Vertex Packing, Maximum
Matching, Minimum Cover, Chromatic Index and Chromatic Number mod-
els can be found in Berge (73), Balinski (70), Edmonds (62), Houck and
Vemuganti (77), Nemhauser and Trotter (75) and Weinberger (76). The
book by Nemhauser and Wolsey (88) from the general list of references is
a good source for additional information on these four models. The Multi-
Commodity Disconnecting Set Problem and the Steiner Problem on graphs
formulations are due to Aneja and Vemuganti (77) and Aneja (80). Chopra
(92) and Khoury, Pardalos and Hearn (93) present many formulations of the
Steiner Problem on graphs. Implementation of the Examination Schedul-
ing Problem at Cedar Crest College, Allentown, Pennsylvania is reported in
Mehta (81).

11.4 Personnel Scheduling


Days Off (Or Cyclical) Scheduling: Abernathy, Baloff and Hershey (74)
Bailey (85), Bailey and Field (85), Baker, Burns and Carter (79), Baker and
Magazine (77), Baker (76, 74), Bartholdi (81), Bartholdi, Orlin and Ratliff
(80), Bartholdi and Ratliff (78), Bechtold (88, 81), Bechtold and Showalter
(87, 85), Bennett and Potts (68), Bodin (73), Brown and Tibrewala (75),
Brownell and Lowerre (75), Burns and Koop (87), Burns and Carter (85),
Burns (78) Emmons and Burns (91), Emmons (85) Howell (66) Koop (86),
Krajewski and Ritzman (77) Miller, Pierskalla and Rath (76), Morris and
Showalter (83) Rothstein (73, 72) Tibrewala, Phillippe and Browne (72)
Vohra (88)
Shift Scheduling: Abernathy, Baloff and Hershey (74), Altman, Bel-
trami and Rappaport (71) Bailey (85), Bailey and Field (85), Baker (76),
Baker, Crabil and Magazine (73), Bartholdi (81), Bechtold and Jacobs (90),
Bechtold and Showalter (87,85), Bodin (73), Browne (79), Byrne and Potts
(73), Dantzig (54) Gaballa and Pearce (79) Henderson and Berry (77,
76) Ignall, Kolesar and Walker (72) Keith (79), Koop (88), Krajewski, Ritz-
man and McKenzie (80), Krajewski and Ritzman (77) Lessard, Rousseau and
DuPuis (81), Lowerre (79, 77) Mabert (79), Mabert and Raedels (77), Maier-
Rothe and Wolfe (73), Moondra (76), Morris and Showalter (83) Paixo and
Pato (89) Segal (74), Shepardson and Marsten (80) Vohra (88)
Tour Scheduling: Abernathy, Baloff and Hershey (74), Abernathy,
Baloff, Hershey and Wandell (73) Bailey (85), Bechtold, Brusco and Showal-
ter (91), Bechtold (88), Bechtold and Showalter (87, 85), Bodin (73), Buffa,
Cosgrove and Luce (76) Easton and Rossin (91a, 91b) Francis (66) Glover
and McMillan (86), Glover, McMillan and Glover (84), Guha and Browne
(75) Hagberg (85), Holloran and Byrn (86), Hung and Emmons (90) Krajew-
ski and Ritzman (77) Li, Robinson and Mabert (91) Mabert and Watts (82),
Mabert and McKenzie (80), McGinnis, Culver and Deane (78), Megeath
(78), Monroe (70), Morris and Showalter (83), Morrish and O'Connor (70)
Ozkarahan and Bailey (88), Ozkarahan (87) Papas (67) Ritzman, Krajewski
and Showalter (76) Showalter and Mabert (88), Smith (76), Smith and Wig-
gins (77), Stern and Hersh (80) Taylor and Huxley (89), Tien and Kamiyama
(82) Warner (76), Warner and Prawda (72)
Miscellaneous: Abernathy, Baloff and Hershey (71), Ahuja and Shep-
pard (75) Bechtold (91, 79), Bechtold and Sumners (88), Bechtold, Janaro
and Sumners (84) Chelst (81, 78), Chen (78), Church (73) Eilon (64) Gent-
zler, Khalil and Sivazlian (77), Green and Kolesar (84) Hershey, Abernathy
and Baloff (74) Klasskin (73) Linder (69), Loucks and Jacobs (91) McGrath
(80) Price (70) Showalter, Krajewski and Ritzman (78) Wolfe and Young
(65a, 65b)
The SC formulation of the personnel scheduling problem is due to Dantzig (54). Scheduling models of telephone operators at the General Telephone Company of California and the Illinois Bell Telephone Company are described in Buffa, Cosgrove and Luce (76) and Keith (79). Applications of scheduling models to encode and process checks at the Ohio National Bank, Chemical Bank and the Purdue National Bank are reported in Krajewski, Ritzman and McKenzie (80), Mabert (79), and Mabert and Raedels (77). Applications of scheduling models to staffing nursing personnel at the Pediatrics ward of the Colorado General Hospital and at Harper Hospital (Detroit) are reported in Megeath (78), and Morrish and O'Connor (70). Models of scheduling patrol officers in San Francisco, aircraft cleaning crews for an international airline, sanitation workers (household refuse collection) in New York and bus drivers in Quebec City are described in Taylor and Huxley (89), Stern and Hersh (80), Altman, Beltrami and Rappaport (71) and Lessard (85). Scheduling of sales personnel and clerical employees at Qantas Airlines and United Airlines is reported in Gaballa and Pearce (79) and Holloran and Byrn (86).

11.5 Crew Scheduling


Airline Crew Scheduling: Anbil, Gelman, Patty and Tanga (91), Arabeyre,
Fearnley, Steiger and Teather (69), Arabeyre (66) Baker and Fisher (81),
Baker and Frey (80), Baker, Bodin, Finnegan and Ponder (79), Ball and
Roberts (85), Barnhart, Johnson, Anbil and Hatay (91), Bronemann (70)
Darby-Dowman and Mitra (85) Evers (56) Gerbracht (78), Gershkoff (90, 89,
87) Jones (89) Kabbani and Patty (93), Kolner (66) Lavoie, Minoux and
Odier (88) Marsten and Shepardson (81), Marsten, Muller and Killion (79),
McCloskey and Hansman (57), Minoux (84) Niederer (66) Rannou (86), Ru-
bin (73) Spitzer (87, 61), Steiger (65) Bodin, Golden, Assad and Ball (83)
of the Routing references.
Mass Transit Crew Scheduling: Amar (85) Ball, Bodin and Dial (85,
83,81,80), Belletti and Davani (85), Bodin, Ball, Duguid and Mitchell (85),
Bodin, Rosenfield and Kydes (81), Bodin and Dial (80), Booler (75), Borrett
and Roes (81) Carraresi and Gallo (84), Ceder (85) Edwards (80) Falkner
and Ryan (87) Hartley (85, 81), Henderson (75), Heurgon and Hervillard
(75), Hoffstadt (81), Howard and Moser (85) Keaveny and Burbeck (81),
Koutsopoulos (85) Leprince and Mertens (85), Lessard, Rousseau and DuPuis
(81), Leudtke (85) Marsten and Shepardson (81), Mitchell (85), Mitra and
Darby-Dowman (85), Mitra and Welsh (81) Paixo, Branco, Captivo, Pato,
Eusebio and Amado (86), Parker and Smith (81), Piccione, Cherici, Bielli
and LaBella (81) Rousseau and Lessard (85), Ryan and Foster (81) Scott
(85), Shepardson (85), Stern and Ceder (81), Stern (80) Tykulsker, O'Neil, Ceder and Sheffi (85) Ward, Durant and Hallman (81), Wren, Smith and
Miller (85), Wren (81) Bodin, Golden, Assad and Ball (83) of the Routing
references.
Application of the Airline Crew Scheduling models at American Airlines,
Flying Tiger, Continental Airlines and Swiss Air are reported in Anbil, Gel-
man, Patty and Tanga (91), Gershkoff (89, 87), Kabbani and Patty (93),
Marsten and Shepardson (81), Marsten, Muller and Killion (79), and Steiger
(65).
Modelling and implementation of the Mass Transit Crew Scheduling
systems in various metropolitan areas (Amsterdam, Christchurch in New
Zealand, Hamburg, New York, Helsinki, Los Angeles, Dublin and Rome)
are described in Borrett and Roes (81), Falkner and Ryan (87), Hoffstadt
(81), Howard and Moser (85), Marsten and Shepardson (81), Mitchell (85),
Mitra and Darby-Dowman (85), Piccione, Cherici, Bielli and LaBella (81)
and Ryan and Foster (81).

11.6 Manufacturing
Assembly Line Balancing: Baybars (86a, 86b), Bowman (60) Freeman
and Jucker (67) Gutjahr and Nemhauser (64) Hackman, Magazine and Wee
(89), Hoffman (92) Ignall (65) Johnson (83, 81) Kilbridge (62) Patterson and
Albracht (75) Salveson (55) Talbot, Patterson and Gehrlein (86), Talbot
and Patterson (84) White (61)
Discrete Lot Sizing and Scheduling Problem: Cattrysse, Salomon, Kuik and Van Wassenhove (93), Cattrysse, Maes and Van Wassenhove (90, 88) Dzielinski and Gomory (65) Lasdon and Terjung (71) Manne (58)
Ingot Size Selection: Vasko, Wolfe and Scott (89, 87)
Spare Parts Allocation: Scudder (84)
Pattern Sequencing In Cutting Stock Operations: Pierce (70)
Resource Constrained Network Scheduling Problem: Fisher (73)
Cellular Manufacturing: Stanfel (89)
The Assembly Line Balancing problem formulation (29) is based upon the formulations of Bowman (60) and White (61). Other formulations of this model can be found in the rest of the references listed under this topic. The Discrete Lot Sizing and Scheduling model is due to Cattrysse, Salomon, Kuik and Van Wassenhove (93). Generalizations of this model are described in Cattrysse, Maes and Van Wassenhove (90, 88), Dzielinski and Gomory (65), Lasdon and Terjung (71), and Manne (58). Models and formulations of Ingot Size Selection, Spare Parts Allocation, Pattern Sequencing in Cutting Stock Operations, the Resource Constrained Network Scheduling Problem and Cellular Manufacturing are described in Vasko, Wolfe and Scott (89, 87), Scudder (84), Pierce (70), Fisher (73), and Stanfel (89). Implementation of the Ingot Size Selection models at the Bethlehem Steel Corporation is reported in Vasko, Wolfe and Scott (89, 87).

11.7 Miscellaneous Operations


Frequency Planning: Thuve (81)
Timetable Scheduling: Almond (69, 66), Arani and Lotfi (89), Aubin (89), Aust (76) Barham and Westwood (78), Broder (64) Carter and Tovey (92), Carter (86), Csima and Gotlieb (64) Dempster (71), de Werra (78, 75) Even, Itai and Shamir (76) Ferland and Roy (85) Gans (81), Glassey and Mizrach (86), Gosselin and Trouchon (86), Grimes (70) Hall and Acton (67), Hertz (92) Knauer (74) LaPorte and Desroches (84), Lions (67) Mehta (81), Mulvey (82) Tripathy (84, 80) White and Chan (79), Wood (69)
Testing and Diagnosis: Nawijn (88) Reggia, Nau and Wang (83)
Political Districting: Garfinkel and Nemhauser (70)
Information Retrieval and Editing: Day (65) Garfinkel, Kunnathur
and Liepins (86)
Check Clearing: Markland and Nauss (83) Nauss and Markland (85)
Capital Budgeting: Valenta (69)
Fixed Charge Problem: Aneja and Vemuganti (74) Frank (72) McKeown (81)
Mathematical Problems: Fulkerson, Nemhauser and Trotter (74)
Heath (90)
The Models presented in this section on Frequency Planning, Testing and
Diagnosis, Political Districting, Information Retrieval and Editing, Check
Clearing, Capital Budgeting, Fixed Charge Problem and Mathematical Prob-
lems are based upon Thuve (81), Nawijn (88), Reggia, Nau and Wang (83),
Garfinkel and Nemhauser (70), Day (65), Garfinkel, Kunnathur and Liepins
(86), Markland and Nauss (83), Valenta (69), McKeown (81) and Fulkerson,
Nemhauser and Trotter (74) and Heath (90).

Applications of Timetable Scheduling models at SUNY Buffalo, the University of Waterloo, the Ontario School System and Cedar Crest College are reported in Arani and Lotfi (89), Carter (89), Lions (67) and Mehta (81). Implementation of the Check Clearing model at the Maryland National Bank is presented in Markland and Nauss (83).

11.8 Routing
Traveling Salesman Problem: Bellmore and Hong (74), Bodin, Golden,
Assad and Ball (83) Christofides (85b), Christofides, Mingozzi and Toth
(81b), Christofides and Eilon (73) Eilon, Watson-Gandy and Christofides
(71) Gavish and Shlifer (78), Golden, Levy and Dahl (81), Golden, Magnanti
and Nguyen (77) Held and Karp (70) LaPorte (92b), Lenstra and Rinnooy
Kan (75), Lin and Kernighan (73), Lin (65) Magnanti (81), Malandraki and
Daskin (89) Russell (77) Solomon and Desrosiers (88)
Single And Multiple Depots: Agarwal, Mathur and Salkin (89),
Agin (75), Altinkemer and Gavish (91, 90, 87), Anily and Federgruen (90),
Averbakh and Berman (92) Baker (92), Balinski and Quandt (64), Ball,
Golden, Assad, Bodin (83), Bartholdi, Platzman, Collins and Warden (83),
Beasley (84, 83, 81), Bell, Dalberto, Fisher, Greenfield, Jaikumar, Kedia,
Mack and Prutzman (83), Bellman (58), Beltrami and Bodin (74), Bert-
simas (92, 88), Bertsimas and Ryzin (91), Bodin, Golden, Assad and Ball
(83), Bodin and Golden (81), Bodin and Kursh (79, 78), Bodin (75), Bon-
der, Cassell and Andros (70), Bramel and Simchi-Levi (93), Bramel, Coff-
man, Shor and Simchi-Levi (92), Brown and Graves (81), Butt and Cavalier
(91) Chard (68), Cheshire, Melleson and Naccache (82), Christofides (85a,
85b), Christofides, Mingozzi and Toth (81a, 81b, 79), Christofides (76, 71),
Christofides and Eilon (69), Clark and Wright (64), Crawford and Sinclair
(77), Cullen, Jarvis and Ratliff (81), Cunto (78) Daganzo (84), Dantzig and Ramser (59), Doll (80), Dror and Trudeau (90) Eilon, Watson-Gandy
and Christofides (71), Eilon and Christofides (69), Etezadi and Beasley (83),
Evans and Norbeck (85) Ferebee (74), Ferguson and Dantzig (56), Fisher,
Greenfield, Jaikumar and Lester (82), Fisher and Jaikumar (81), Fletcher
(63), Fleuren (88), Foster and Ryan (76), Frederickson, Hecht and Kim (78)
Garvin, Crandall, John and Spellman (57), Gaskell (67), Gavish and Shlifer
(78), Gavish, Schweitzer and Shlifer (78), Gendreau, Hertz and LaPorte (92),
Gheysens, Golden and Assad (84), Gillett and Johnson (76), Gillett and
Miller (74), Golden and Assad (88, 86a), Golden, Bodin and Goodwin (86),
Golden and Baker (85), Golden, Gheysens and Assad (84), Golden and Wong
(81), Golden, Magnanti and Nguyen (77), Golden (77) Haimovich, Rinnooy
Kan and Stouge (88), Haimovich and Rinnooy Kan (85), Hauer (71), Holmes
and Parker (76), Hyman and Gordon (68) Kirby and McDonald (72), Kirby
and Potts (69), Krolak, Felts and Nelson (72) Labbe, LaPorte and Mercure
(91), Lam (70), LaPorte (92a), LaPorte, Nobert and Taillefer (88, 87), La-
Porte and Nobert (87, 84), LaPorte, Mercure and Nobert (86), LaPorte and
Nobert and Derochers (85), LaPorte, Derochers and Nobert (84), Lenstra
and Rinnooy Kan (81, 76, 75), Levary (81), Levy, Golden and Assad (80),
Li and Simchi-Levi (93, 90), Lecena (86) Magnanti (81), Malandraki and
Daskin (89), Male, Liebman and Orloff (77), Marquez Diez- Canedo and
Escalante (77), Minas and Mitten (58), Minieka (79), Mole (83, 79), Mole,
Johnson and Wells (83), Mole and Jameson (76) Nelson, Nygard, Griffin
and Shreve (85), Norbeck and Evans (84) Orloff (76a, 76b, 74a, 74b), Orloff
and Caprera (76) Passens (88), Psaraftis (89, 88, 83c), Pullen and Webb (67) Robertson (69), Ronen (92) Savelsbergh (90), Schrage (83), Solomon
and Desrosiers (88), Stern and Dror (79), Stewart and Golden (84), Stricker
(70), Sumichrast and Markham (93), Sutcliffe and Board (91) Tillman and
Cain (72), Tillman and Hering (71), Tillman (69), Tillman and Cochran
(68), Turner, Ghare and Foulds (76), Turner and Hougland (75), Tyagi (68)
Unwin (68) Van Leeuwen (83) Watson-Gandy and Foulds (72), Webb (72),
Williams (82), Wren and Holliday (72) Yellow (70)

Routing With Time Windows: Baker and Schaffer (86), Bodin,


Golden, Assad, Ball (83), Bodin and Golden (81) Cassidy and Bennett
(72) Desrochers, Desrosiers and Solomon (92), Desrochers, Lenstra, Savels-
bergh and Soumis (88), Desrochers, Soumis, Desrosiers and Sauve (85),
Desrochers, Soumis, Desrosiers (84), Dumas, Desrosiers and Soumis (91)
El-Azm (85) Fleuren (88) Gertsbach and Gurevich (77), Golden and Assad (88, 86a, 86b), Golden and Wasil (87), Golden and Baker (85), Golden,
Magnanti and Nguyen (77) Jaw, Odoni, Psaraftis and Wilson (86) Knight
and Hofer (68), Kolen, Rinnooy Kan and Trienekens (87), Koskosidis, Powell and Solomon (92) Malandraki and Daskin (89) Potvin and Rousseau (93), Potvin, Kervahut and Rousseau (92) Savelsbergh (85), Schrage (83), Sexton and Choi (72), Solomon and Desrosiers (88), Solomon (87, 86)
Periodic Routing Problem: Bodin, Golden, Assad, Ball (83) Cheshire,
Melleson and Naccache (82), Christofides (85b, 84) Foster and Ryan (76)
Gaudioso and Paletta (92), Golden and Assad (88) Hausman and Gilmour
(67) Russell and Igo (79), Raft (82) Sexton and Bodin (85a, 85b), Solomon
and Desrosiers (88) Tan and Beasley (84)

Integrated Inventory And Routing: Anily and Federgruen (93, 92,


90a, 90b), Arisawa and Elmaghraby (77a, 77b) Bramel and Simchi-Levi (93)
Chien, Balakrishnan and Wong (89), Christofides (85b) Dror and Ball (87),
Dror and Levy (86), Dror, Ball and Golden (86) Farvolden, LaPorte and Xu
(93), Federgruen and Simchi-Levi (92), Federgruen and Zipkin (84) Golden
and Assad (86a), Golden and Baker (85), Golden, Assad and Dahl (84) Hall
(91)
Dial-A-Ride Problem: Angel, Caudle, Noonan and Whinston (72)
Bennett and Gazis (72), Bodin and Sexton (86), Bodin and Berman (79)
Cullen, Jarvis and Ratliff (81) Daganzo (78), Desrosiers, Dumas and Soumis
(86), Dulac, Ferland and Forgues (80), Dumas, Desrosiers and Soumis (91)
Fleuren (88), Foulds, Read and Robinson (77) Golden and Assad (88) Jaw,
Odoni, Psaraftis and Wilson (86) McDonald (72) Newton and Thomas (74,
69) Psaraftis (86, 83a, 83b, 80) Solomon and Desrosiers (88), Stein (78),
Stewart and Golden (81, 80)
Location And Routing: Jacobsen and Madsen (80) LaPorte, Nobert
and Taillefer (88), LaPorte, Nobert and Arpin (86), LaPorte and Nobert
(81)
Scheduling A Fleet Of Ships, Aircraft, Trains and Buses: Appelgren (71, 69), Assad (81, 80) Bartlett (57), Bartlett and Charnes (57),
Barton and Gumer (68), Bodin, Golden, Schuster and Romig (80), Brown,
Graves and Ronen (87) Ceder and Stern (81), Charnes and Miller (56) Fisher
and Rosenwein (89), Florian, Guerin and Bushell (76) Laderman, Gleiber-
man and Egan (66), Levin (71) Martin-Lof (70), McKay and Hartley (74)
Nemhauser (69) Peterson and Fullerton (73), Pierce (69), Pollak (77) Rao
and Zionts (68), Richardson (76), Ronen (86, 83) Saha (70), Salzborn (74,
72a, 72b, 70, 69), Simpson (69), Smith and Wren (81), Soumis, Ferland and
Rousseau (80), Spaccamela, Rinnooy Kan and Stougie (84), Szpiegel (72)
White and Bomberault (69), Wolters (79), Wren (81) Young (70)
The routing problem was introduced by Dantzig and Ramser (59). The SPT formulation of the routing problem is due to Balinski and Quandt (64). A variety of routing applications are listed in Table 7.
Table 7: APPLICATIONS

Bartholdi, Platzman, Collins and Warden (83): Meals-on-Wheels, Senior Citizens Inc., Atlanta
Bell, Dalberto, Fisher, Greenfield, Jaikumar, Kedia, Mack and Prutzman (83): Distribution of oxygen, hydrogen etc. at Air Products and Chemicals, Inc.
Bodin and Berman (79): School bus routing at Brentwood School District, Long Island, New York
Bodin and Kursh (78): Routing and scheduling of street sweepers in New York City and Washington, D.C.
Brown, Graves and Ronen (87): Scheduling of crude oil tankers for a major oil company
Brown and Graves (81): Routing petroleum tank trucks at Chevron, USA
Cassidy and Bennett (72): Catering of meals to the schools of the Inner London Education Authority
Ceder and Stern (81): Scheduling bus trips at Egged, the Israel National Bus Carrier
Crawford and Sinclair (72): Scheduling beer tankers at WAIKATO Brewers Ltd., Hamilton, New Zealand
Cunto (78): Routing of boats to sample oil wells at Lake Maracaibo, Venezuela
Evans and Norbeck (85): Food distribution at KRAFT
Fisher and Rosenwein (89): Military Sealift Command of the U.S. Navy
Fisher, Greenfield, Jaikumar and Lester (82): Distribution of a major product at DUPONT
Golden, Magnanti and Nguyen (77): Distributing newspapers with large circulation
Golden and Wasil (87): Distribution of soft drinks at Joyce Beverages, Baltimore Division of Mid-Atlantic Coca-Cola, Pepsi-Cola Bottling Group of Purchase, New York and others
Gavish, Schweitzer and Shlifer (78): Scheduling buses for a large bus company
Jacobsen and Madsen (80): Designing transfer points and routes for distributing newspapers for a company in Denmark
Jaw, Odoni, Psaraftis and Wilson (86): Dial-A-Ride model application at Rufbus GmbH Bodenseekreis, Friedrichshafen, Germany
Knight and Hofer (68): Routing vehicles to collect and deliver small consignments for a contract transport undertaking in London
McDonald (72): Transporting specimens from a hospital to laboratories
McKay and Hartley (74): Distribution of bulk petroleum products at the Defense Fuel Supply Center (DFSC) and the Military Sealift Command (MSC)
Salzborn (70): Scheduling trains at the Adelaide Metropolitan Passenger Service of South Australian Railways
Smith and Wren (81): Bus scheduling at the West Yorkshire Passenger Transport System
Stern and Dror (79): Reading electric meters in the City of Beersheva, Israel

11.9 Location

Plant (Warehouse) Location and Allocation: Akinc and Khumawala


(77), Atkins and Shriver (68) Baker (74), Ballou (68), Barcelo and Casanovas
(84), Baumol and Wolfe (58), Bilde and Krarup (77), Brown and Gibson
(72), Burstall, Leaver and Sussams (62) Cabot, Francis and Stary (70), Cer-
veny (80), Cho, Johnson, Padberg and Rao (83), Cho, Padberg and Rao
(83), Cohon, ReVelle, Current, Eagles, Eberhart and Church (80), Cooper
(64,63), Cornuejols, Nemhauser and Wolsey (90) Davis and Ray (69), Dear-
ing (85), Drysdale and Sandiford (69), Dutton, Hinman and Millham (74)
Efroymson and Ray (66), Ellwein and Gray (71), El-Shaieb (73), Elson
(72), Erlenkotter (78, 73), Feldman, Lehrer and Ray (66) Geoffrion and
McBride (78), Gelders, Printelon and Van Wassenhove (87), Guignard (80),
Guignard and Spielberg (79, 77) Hammer (68), Hoover (67), Hormozi and
Khumawala (92) Khumawala, Neebe and Dannenbring (74), Khumawala
(73a, 72), Khumawala and Whybark (71), Kolen (83), Koopmans and Beck-
man (74), Krarup and Pruzan (83), Kuehn and Hamburger (63), Kuhn
and Kuenne (62) LaPorte, Nobert and Arpin (86), Levy (67), Louveaux
and Peeters (92) Manne (64), Maranzana (64), Marks, ReVelle and Lieb-
man (70) Nambiar, Gelders and Van Wassenhove (89, 81) Perl and Daskin
(85, 84), Polopolus (65) ReVelle, Marks and Liebman (70) Sa (69), Saedat
(81), Scott (70), Shannon and Ignizio (70), Spielberg (70, 69a, 69b), Smith,
Mangelsdorf, Luna and Reid (89), Swain (74) Tapiero (71) Van Roy and Erlenkotter (82), Vergin and Rogers (67) Wendell and Hurter (73)
Lock Box Location: Cornuejols, Fisher and Nemhauser (77a, 77b)
Kramer (66), Kraus, Janssen and McAdams (70) Maier and Vanderwede (76,
74), Malczewski (90), Mavrides (79), McAdams (68) Nauss and Markland
(81, 79) Shankar and Zoltners (72), Stancil (68)
P-Center Problem: Aneja, Chandrasekaran and Nair (88)
Chandrasekaran and Daugherty (81), Chhajed and Lowe (92), Christofides
and Viola (71) Dearing (85), Drezner (86, 84), Dyer and Frieze (85) Eise-
mann (62) Garfinkel, Neebe and Rao (77), Goldman (72a, 69) Hakimi,
Schmeichel and Pierce (78), Hakimi and Maheshwari (72), Hakimi (64),
Halfin (74), Halpern (76), Handler (73), Hansen, Labbe, Peters and Thisse
(87), Hooker, Garfinkel and Chen (91) Kariv and Hakimi (79a), Kolen (85)
Lin (75) Masuyama, Ibaraki and Hasegawa (81), Minieka (77, 70), Moon
and Chaudary (84) Richard, Beguin and Peeters (90) Tansel, Francis and
Lowe (83a, 83b) Vijay (85)
P-Median Problem: Chhajed and Lowe (92), Church and Weaver
(86), Church and Meadows (77), Church and ReVelle (76) Dearing (85)
Erkut, Francis and Lowe (88) Goldman (72b, 71), Goldman and Witzgall
(70) Hakimi (65, 64), Halpern (76), Hansen, Labbe, Peters and Thisse (87),
Hooker, Garfinkel and Chen (91) Jarvinen, Rajala and Sinerro (72) Kariv
and Hakimi (79a), Khumawala (73b) Mavrides (79), Minieka (77), Mirchan-
dani (79), Moon and Chaudary (84) Narula, Ogbu and Samuelsson (77),
Neebe (78) ReVelle and Elzinga (89), ReVelle and Hogan (89b), Richard,
Beguin and Peeters (90), Rydell (71, 67) Snyder (71b) Tansel, Francis and
Lowe (83a, 83b), Teitz and Bart (68), Toregas, Swain, ReVelle and Bergman (71), Weaver and Church (85)
Service Facilities Location: Ball and Lin (93),
Berlin and Liebman (74) Chrissis, Davis and Miller (82), Church and Mead-
ows (79), Church and ReVelle (76), Current and Storbeck (88) Dee and Lieb-
man (72), Deighton (71), Drezner (86) Erlenkotter (73) Foster and Vohra
(92), Francis, Lowe and Ratliff (78) Goodchild and Lee (89), Gunawardane
(82) Holmes, Williams and Brown (72) Kolen (85), Kolesar and Walker (74)
Marks, ReVelle and Liebman (70), Moon and Chaudary (84), Mukundan
and Daskin (91) Neebe (88) Orloff (77) Patel (79) Rao (74), Ratick and
White (88), ReVelle (89), ReVelle and Hogan (89a), ReVelle, Toregas and
Falkson (76), ReVelle, Marks and Liebman (70), ReVelle and Swain (70),
Richard, Beguin and Peeters (90), Rojeski and ReVelle (70) Saatcioglue (82),
Saydam and McKnew (85), Schilling, Jayaraman and Barkhi (93), Schilling
(82), Schilling, ReVelle, Cohen and Elzinga (80), Schilling (80), Schreuder
(81), Slater (81), Storbeck (82) Toregas and ReVelle (73), Toregas, Swain,
ReVelle and Bergman (71), Toregas and ReVelle (70) Valinsky (55) Wagner
and Falkson (75), Walker (74), White and Case (74), Weaver and Church
(83)
Maximal Covering Problem: Balakrishnan and Storbeck (91), Balas (83), Batta, Dolan and Krishnamurty (89), Bennett, Eaton and Church (82), Church and Weaver (86), Church and Roberts (83), Church and Meadows (79), Church and ReVelle (76), Current and O'Kelly (92), Current and Schilling (90), Current and Storbeck (88), Daskin, Haghani, Khanal and Malandraki (89), Daskin (83, 82), Eaton, Daskin, Simmons, Bulloch and Jansma (85), Fujiwara, Makjamroen and Gupta (87), Klastorin (79), Megiddo, Zemel and Hakimi (83), Mehrez and Stulman (84, 82), Mehrez (83), Meyer and Brill (88), Pirkul and Schilling (91), ReVelle and Hogan (88), Schilling, Jayaraman and Barkhi (93), Storbeck and Vohra (88)
Hierarchical Objective Set Covering Model: Charnes and Storbeck (80), Church, Current and Storbeck (91), Church and Eaton (87), Daskin, Hogan and ReVelle (88), Daskin and Stern (81), Flynn and Ratick (88), Moore and ReVelle (82), Mukundan and Daskin (91), Plane and Hendrick (77)
Backup Coverage Model: Church and Weaver (86), Daskin, Hogan and ReVelle (88), Hogan and ReVelle (86, 83), Pirkul and Schilling (89), Storbeck and Vohra (88)
Multiple Response Unit Model: Batta and Mannur (90), Marianov and ReVelle (91), Schilling, Elzinga, Cohen, Church and ReVelle (79)
Miscellaneous: Alao (71), Armour and Buffa (63), Beckman (63), Bell and Church (85), Bellman (65), Bertsimas (88), Bindschedler and Moore (61), Bouliane and LaPorte (92), Chan and Francis (76), Chaudry, McCormick and Moon (86), Chaiken (78), Church and Garfinkel (78), Conway and Maxwell (61), Cooper (72, 68, 67), Current and Schilling (89), Current and Storbeck (87), Current, ReVelle and Cohen (85), Dearing and Francis (74a, 74b), Eiselt (92), Elzinga, Hearn and Randolph (76), Elzinga and Hearn (73, 72a, 72b), Erkut, Francis, Lowe and Tamir (89), Eyster, White and Wierwille (73), Fitzsimmons and Allen (83), Fitzsimmons (69), Francis and Mirchandani (89), Francis, McGinnis and White (83), Francis and Goldstein (74), Francis and White (74), Francis and Cabot (72), Francis (72, 67a, 67b, 64, 63), Frank (66), Gavett and Plyter (66), Ghosh and Craig (86), Gleason (75), Goldberg and Paz (91), Goldberg, Dietrich, Chen, Mitwasi, Valenzuela and Criss (90), Handler and Mirchandani (79), Hansen, Thisse and Wendell (86), Hitchings (69), Hodgson (90, 81), Hogg (68), Hopmans (86), Hsu and Nemhauser (79), Hurter, Schaeffer and Wendell (75), Keeny (72), Kimes and Fitzsimmons (90), Kirca and Erkip (88), Larson (75, 72), Lawrence and Pengilly (69), Leamer (68), Love, Morris and Wesolowsky (88), Love, Wesolowsky and Kraemer (73), Love and Morris (72), Love (72, 69, 67), McKinnon and Barber (72), McHose (61), Mirchandani (80, 79), Mole (73), Mycielski and Trzechiakowske (63), Nair and Chandrasekaran (71), Osleeb, Ratick, Buckley, Lee and Kuby (86), Palermo (61), Picard and Ratliff (78), Price and Turcotte (86), Pritsker (73), Pritsker and Ghare (70), Rand (76), ReVelle and Serra (91), ReVelle (86), Roodman and Schwartz (77, 75), Rosing (92), Ross and Soland (77), Rushton (89), Schaefer and Hurter (74), Schneider (71), Schniederjans, Kwak and Helmer (82), Simmons (71, 69), Snyder (71a), Storbeck (90, 88), Tansel and Yesilkokeen (93), Tansel, Francis and Lowe (80), Taylor (69), Teitz (68), Tewari and Jena (87), Tideman (62), Volz (71), Watson-Gandy (82, 72), Watson-Gandy and Eilon (72), Wesolowsky (73a, 73b, 72), Wesolowsky and Love (72, 71a, 71b), Weston (82), Wirasinghe and Waters (83), Weaver and Church (83), Young (63)

Table 8: APPLICATIONS

Bennett, Eaton and Church (82): Selecting sites for rural health workers; Valle del Cauca, Colombia
Cerveny (80): Location of bloodmobile operations
Cohon, ReVelle, Current, Eagles, Eberhart and Church: Power plant locations in a six-state region of U.S.
Current and O'Kelly (92): Locating emergency warning sirens in a midwestern city of U.S.
Daskin (82): Emergency medical vehicles located in Austin, Texas
Daskin and Stern (81): Emergency medical vehicles location in Austin, Texas
Drysdale and Sandiford (69): Locating warehouses for RCA Victor Company Ltd., Canada
Dutton, Hinman and Millham (74): Locating electrical power generating plants in the Pacific Northwest
Eaton, Daskin, Simmons, Bulloch and Jansma (85): Emergency medical service vehicle location in Texas
Fitzsimmons and Allen (83): Selection of out-of-state audit offices
Flynn and Ratick (88): Air service locations for small communities in North and South Dakota
Fujiwara, Makjamroen and Gupta (87): Ambulance deployment; a case study in Bangkok
Goldberg, Dietrich, Chen, Mitwasi, Valenzuela and Criss (90): Locating emergency medical services in Tucson, Arizona
Hogg (68): Siting of fire stations in Bristol County Borough, England
Holmes, Williams and Brown (72): Location of public day care facilities in Columbus, Ohio
Hopmans (86): Locating bank branches in the Netherlands
Kimes and Fitzsimmons (90): Selecting profitable sites at La Quinta Inns
Kirca and Erkip (88): Selecting solid waste transfer points in Turkey
Kolesar and Walker (74): Dynamic relocation of fire companies in New York City
Nambiar, Gelders and Van Wassenhove (89, 81): Location of rubber processing factories in Malaysia
Patel (79): Locating rural social service centers in India
Plane and Hendrick (77): Location of fire companies for Denver Fire Department
Price and Turcotte (86): Location of blood bank in Canada
Saedat (81): Location of grass drying plants in the Netherlands
Schniederjans, Kwak and Helmer (82): Locating a trucking terminal
Schreuder (81): Locating fire stations in Rotterdam
Smith, Mangelsdorf, Luna and Reid (89): Supplying Ecuador's health workers just in time
Tewari and Jena (87): Location of high schools in rural India
Volz (71): Ambulance location in semi-rural areas of Washtenaw County, Michigan
Walker (74): Location of fire stations in New York City
Weston (82): Telephone answering sites in a service industry (U.S.)
Wirasinghe and Waters (83): Location of solid waste transfer points (Canada)

The models and formulations addressed in this section can be found in Dearing (85), Louveaux and Peeters (92) and Manne (64) (Plant Location), Cornuejols, Fisher and Nemhauser (77a), Kraus, Janssen and McAdams (70) and Nauss and Markland (81) (Lock Box), Garfinkel, Neebe and Rao (77) and Minieka (70) (P-Center), Hakimi (64), Narula, Ogbu and Samuelsson (77) and ReVelle and Swain (70) (P-Median), Toregas and ReVelle (70) and Toregas, Swain, ReVelle and Bergman (71) (Service Facilities), Church and ReVelle (74) (Maximal Covering), Daskin and Stern (81) (Hierarchical Objective Set Covering), Hogan and ReVelle (86) (Backup Coverage) and Batta and Mannur (90) (Multiple Response Unit).

12 Conclusions
In this paper a variety of applications of set covering, set packing, and set partitioning models, including their variants and generalizations, are presented. In addition, transformations to convert one model to another, including the transformation of the MD Knapsack problem and the LIP to one of these models, are discussed. It should be noted that some applications, such as the Travelling Salesman Problem, the Multicommodity Disconnecting Problem and the Steiner Problem in Graphs, require an enormous number of constraints, whereas the formulations of the Personnel Scheduling, Crew Scheduling, and Routing models require a very large number of variables. The transformations of the MD Knapsack problem and the LIP require both a very large number of variables and constraints.
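
For quick reference, the three models surveyed here share a single skeleton. Writing A for an m x n zero-one matrix, c for a cost vector, e for the m-vector of ones, and x for a binary n-vector (standard notation, which may differ slightly from the symbols used earlier in this survey):

    SC:  minimize cx subject to Ax >= e,
    SP:  maximize cx subject to Ax <= e,
    SPT: optimize cx subject to Ax = e.

A small worked example suggests why the knapsack transformation is expensive. One standard device (minimal covers, not necessarily the exact construction referenced above) replaces a knapsack constraint by one covering constraint per minimal cover. For the constraint 3x1 + 2x2 + 2x3 <= 4 on binary variables, the minimal covers are {1, 2} and {1, 3}; substituting yj = 1 - xj yields the equivalent covering system y1 + y2 >= 1 and y1 + y3 >= 1. Since the number of minimal covers can grow exponentially with the number of variables, the resulting covering systems can be exponentially large, consistent with the remark above.
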
Moderate size SC, SP and SPT models can be solved efficiently with the existing algorithms and techniques and have been used in many real life situations. Efficient solution techniques to solve very large SC, SP and SPT models will enhance the application of these models to many real life logistics problems. Clearly, these special structured models are a very useful and important class of linear integer programs and deserve the effort devoted by many researchers.

References
Theory

Albers, S., "Implicit Enumeration Algorithms for the Set Partitioning Prob-
lem", OR Spektrum, Vol. 2, pp. 23-32 (1980).

Anily, S., and Federgruen, A., "Structured Partitioning Problems", Operations Research, Vol. 39, pp. 130-149 (1991).

Avis, D., "A Note on Some Computationally Difficult Set Covering Prob-
lems", Mathematical Programming, Vol. 18, pp. 138-145 (1980).

Baker, E., "Efficient Heuristic Algorithms for the Weighted Set Covering
Problem", Computers and Operations Research, Vol. 8, pp. 303-310
(1981).

Balas, E., and Ng, S.M., "On the Set Covering Polytope: 1. All the Facets
With Coefficients in 0, 1,2", Mathematical Programming, Vol. 43, pp.
57-69 (1989).

Balas, E., "A Sharp Bound on the Ratio Between Optimal Integer and
Fractional Covers", Mathematics of Operations Research, Vol. 9, pp.
1-5 (1984).

Balas, E., and Ho, A., "Set Covering Algorithms Using Cutting Planes, Heuristics and Subgradient Optimization: A Computational Study", Mathematical Programming Study, Vol. 12, pp. 37-60 (1980).

Balas, E., "Cutting Planes from Conditional Bounds: A New Approach to Set Covering", Mathematical Programming Study, Vol. 12, pp. 19-36 (1980).

Balas, E., and Padberg, M.W., "Set Partitioning: A Survey", Combinatorial Optimization (Edited by N. Christofides, A. Mingozzi, P. Toth and C. Sandi), John Wiley and Sons, New York, pp. 151-210 (1979).

Balas, E., "Set Covering with Cutting Planes for Conditional Bounds",
Ninth International Symposium on Mathematical Programming (1976).

Balas, E., and Padberg, M.W., "Set Partitioning: A Survey", SIAM Re-
view, Vol. 18, pp. 710-760 (1976).

Balas, E., and Padberg, M.W., "On Set Covering Problem II: An Algorithm for Set Partitioning", Operations Research, Vol. 22, No. 1, pp. 74-90 (1975a).

Balas, E., and Padberg, M.W., "Set Partitioning", Combinatorial Programming: Methods and Applications (Edited by B. Roy), Reidel Publishing Co., pp. 205-258 (1975b).

Balas, E., and Padberg, M.W., "On the Set-Covering Problem", Operations Research, Vol. 20, pp. 1152-1161 (1972).

Beasley, J.E., "An Algorithm for Set Covering Problem", European Journal
of Operational Research, Vol. 31, pp. 85-93 (1987).
Bellmore, M., and Ratliff, H.D., "Set Covering and Involutory Basis", Man-
agement Science, Vol. 18, pp. 194-206 (1971).
Benveniste, R., "A Note on the Set Covering Problem", Journal of the Operational Research Society, Vol. 33, pp. 261-265 (1982).
Berge, C., "Balanced Matrices", Mathematical Programming, Vol. 2, pp.
19-31 (1972).
Chan, T.J., and Yano, C.A., "A Multiplier Adjustment Approach for the
Set Partitioning Problem", Operations Research, Vol. 40, pp. S40-S47
(1992).
Chaudry, S.S., Moon, I.D., and McCormick, S.T., "Conditional Covering: Greedy Heuristics and Computational Results", Computers and Operations Research, Vol. 14, pp. 11-18 (1987).

Christofides, N., and Paixao, J., "State Space Relaxation for the Set Covering Problem", Faculdade de Ciencias, Lisboa, Portugal (1986).

Christofides, N., and Korman, S., "A Computational Survey of Methods for the Set Covering Problem", Management Science, Vol. 21, pp. 591-599 (1975).

Chvatal, V., "A Greedy-Heuristic for the Set Covering Problem", Mathe-
matics of Operations Research, Vol. 4, pp. 233-235 (1979).
Coffman, E.G., Jr., and Lueker, G.S., "Probabilistic Analysis of Packing
and Partitioning Algorithms", John Wiley (1991).

Conforti, M., Corneil, D.G., and Mahjoub, A.R., "K-Covers I: Complexity and Polytope", Discrete Mathematics, Vol. 58, pp. 121-142 (1986).

Cornuejols, G., and Sassano, A., "On the (0,1) Facets of the Set Covering Polytope", Mathematical Programming, Vol. 43, pp. 45-55 (1989).

Crama, Y., Hammer, P., and Ibaraki, T., "Packing, Covering and Partitioning Problems with Strongly Unimodular Constraint Matrices", Mathematics of Operations Research, Vol. 15, pp. 258-269 (1990).

El-Darzi, E., "Methods for Solving the Set Covering and Set Partitioning Problems using Graph Theoretic (Relaxations) Algorithms", PhD Thesis, Brunel University, Uxbridge (UK) (1988).

El-Darzi, E., and Mitra, G., "A Tree Search for the Solution of Set Problems Using Alternative Relaxations", TR/03/88, Department of Mathematics and Statistics, Brunel University, Uxbridge (UK) (1988a).

El-Darzi, E., and Mitra, G., "Set Covering and Set Partitioning: A Collection of Test Problems", TR/01/88, Department of Mathematics and Statistics, Brunel University, Uxbridge (UK) (1988b).

Etcheberry, J., "The Set Covering Problem: A New Implicit Enumeration Algorithm", Operations Research, Vol. 25, pp. 760-772 (1977).

Fisher, M.L., and Kedia, P., "Optimal Solution of Set Covering/Partitioning Problems Using Dual Heuristics", Management Science, Vol. 36, pp. 674-688 (1990).

Fisher, M.L., and Kedia, P., "A Dual Algorithm for Large Scale Set Parti-
tioning", Paper No. 894, Purdue University, West Lafayette, Indiana
(1986).

Fisher, M.L., and Wolsey, L., "On the Greedy Heuristic for Continuous
Covering and Packing Problems", SIAM Journal On Algebraic and
Discrete Methods, Vol. 3, pp. 584-591 (1982).

Fowler, R.J., Paterson, M.S., and Tanimoto, S.L., "Optimal Packing and Covering in the Plane are NP-Complete", Information Processing Letters, Vol. 12, pp. 133-137 (1981).

Fox, G.W., and Scudder, G.D., "A Simple Strategy for Solving a Class of 0-1 Integer Programming Models", Computers and Operations Research, Vol. 13, pp. 707-712 (1986).

Fulkerson, D.R., Hoffman, A.J., and Oppenheim, R., "On Balanced Matrices", Mathematical Programming Study, Vol. 1, pp. 120-132 (1974).

Garfinkel, R., and Nemhauser, G.L., "Optimal Set Covering: A Survey", Perspectives on Optimization (Edited by A.M. Geoffrion), Addison-Wesley, pp. 164-183 (1972).

Garfinkel, R.S., and Nemhauser, G.L., "The Set Partitioning Problem: Set Covering with Equality Constraints", Operations Research, Vol. 17, pp. 848-856 (1969).

Hammer, P.L., and Simeone, B., "Order Relations of Variables in 0-1 Programming", Annals of Discrete Mathematics, Vol. 31, pp. 83-112 (1987).

Hammer, P.L., Johnson, F.L., and Peled, U.N., "Regular 0-1 Programs", Cahiers du Centre d'Etudes de Recherche Operationnelle, Vol. 16, pp. 267-276 (1974).

Ho, A., "Worst Case of a Class of Set Covering Heuristic", Mathematical Programming, Vol. 23, pp. 170-181 (1982).

Hochbaum, D., "Approximation Algorithms for the Weighted Set Covering and Node Cover Problems", GSIA, Carnegie-Mellon University (1980).

Hwang, F.K., Sum, J., and Yao, E.Y., "Optimal Set Partitioning", SIAM
Journal On Algebraic Discrete Methods, Vol. 6, pp. 163-170 (1985).

John, D.G., and Kochenberger, G.A., "Using Surrogate Constraints in a Lagrangian Relaxation Approach to Set Covering Problems", Journal of Operational Research Society, Vol. 39, pp. 681-685 (1988).

Johnson, D., "Approximation Algorithms for Combinatorial Problems", Journal of Computer and System Sciences, Vol. 9, pp. 256-278 (1974).

Lawler, E.L., "Covering Problems: Duality Relations and a New Method of Solutions", SIAM Journal of Applied Mathematics, Vol. 14, pp. 1115-1132 (1966).

Leigh, W., Ali, D., Ezell, C., and Noemi, P., "A Branch and Bound Al-
gorithm for Implementing Set Covering Model Expert System", Com-
puters and Operations Research, Vol. 11, pp. 464-467 (1988).

Lemke, C.E., Salkin, H.M., and Spielberg, K., "Set Covering by Single
Branch Enumeration with Linear Programming Subproblems", Oper-
ations Research, Vol. 19, pp. 998-1022 (1971).

Lovasz, L., "On the Ratio of Optimal Integer and Fractional Covers", Dis-
crete Mathematics, Vol. 13, pp. 383-390 (1975).

Marsten, R.E., "An Algorithm for Large Set Partitioning Problems", Management Science, Vol. 20, pp. 774-787 (1974).

Michaud, P., "Exact Implicit Enumeration Method for Solving the Set
Partitioning Problem", IBM Journal of Research and Development,
Vol. 16, pp. 573-578 (1972).

Murty, K., "On the Set Representation and Set Covering Problem", Symposium on the Theory of Scheduling and Its Applications (Edited by S.E. Elmaghraby), Springer-Verlag (1973).

Nemhauser, G.L., Trotter, L.E., and Nauss, R.M., "Set Partitioning and
Chain Decomposition", Management Science, Vol. 20, pp. 1413-1423
(1974).

Padberg, M., "Covering, Packing and Knapsack Problems", Annals of Discrete Mathematics, Vol. 4, pp. 265-287 (1979).

Padberg, M., "Perfect Zero-One Matrices", Mathematical Programming, Vol. 6, pp. 180-196 (1974a).

Padberg, M., "Characterization of Totally Unimodular, Balanced and Perfect Matrices", Combinatorial Programming: Methods and Applications (Edited by B. Roy), D. Reidel Publishing, pp. 275-284 (1974b).

Padberg, M., "On the Facial Structure of Set Packing Polyhedra", Math-
ematical Programming, Vol. 5, pp. 199-215 (1973).

Peled, U.N., and Simeone, B., "Polynomial-Time Algorithms for Regular Set-Covering and Threshold Synthesis", Discrete Applied Mathematics, Vol. 12, pp. 57-69 (1985).

Pierce, J.F., and Lasky, J.S., "Improved Combinatorial Programming Algorithms for a Class of All Zero-One Integer Programming Problems", Management Science, Vol. 19, pp. 528-543 (1975).

Pierce, J.F., "Applications of Combinatorial Programming to a Class of All-Zero-One Integer Programming Problems", Management Science, Vol. 15, pp. 191-209 (1968).

Roth, R., "Computer Solutions to Minimum Cover Problems", Operations Research, Vol. 17, pp. 455-465 (1969).

Roy, B., "An Algorithm for a General Constrained Set Covering Problem", Computing and Graph Theory, Academic Press (1972).

Ryan, D.M., and Falkner, J.C., "On the Integer Properties of Scheduling Set Partitioning Models", European Journal of Operational Research, Vol. 35, pp. 442-456 (1988).

Ryzhkov, A.P., "On Certain Covering Problems", Engineering Cybernetics, Vol. 2, pp. 543-548 (1973).

Salkin, H.M., "The Set Covering Problem", Integer Programming, Addison Wesley, pp. 439-481 (1975).

Salkin, H.M., and Koncal, R.D., "Set Covering by an All-Integer Algorithm: Computational Experience", ACM Journal, Vol. 20, pp. 189-193 (1973).

Trubin, V.A., "On a Method of Integer Linear Programming of a Special Kind", Soviet Math. Dokl., Vol. 10, pp. 1544-1546 (1969).

Vasko, F.J., and Wolfe F.E., "Solving Large Set Covering Problems on a
Personal Computer", Computers and Operations Research, Vol. 15,
pp. 115-121 (1988).

Vasko, F.J., and Wilson, G.R., "Hybrid Heuristics for Minimum Cardinality Set Covering Problems", Naval Research Logistic Quarterly, Vol. 33, pp. 241-250 (1986).

Vasko, F.J., and Wilson, G.R., "Using Facility Location Algorithms to Solve Large Set Covering Problems", Operations Research Letters, Vol. 3, pp. 85-90 (1984a).

Vasko, F.J., and Wilson, G.R., "An Efficient Heuristic for Large Set Cov-
ering Problems", Naval Research Logistics Quarterly, Vol. 31, pp.
163-171 (1984b).
Graphs

Aneja, Y.P., "An Integer Programming Approach to the Steiner Problem in Graphs", Networks, Vol. 10, pp. 167-178 (1980).

Aneja, Y.P., and Vemuganti, R.R., "A Row Generation Scheme for Finding Multi-Commodity Minimum Disconnecting Set", Management Science, Vol. 23, pp. 652-659 (1977).

Balinski, M.L., "On Maximum Matching, Minimum Covering and Their Connections", Proceedings of the International Symposium on Mathematical Programming (Edited by H.W. Kuhn), Princeton University Press, pp. 303-311 (1970).

Beasley, J.E., "An SST-Based Algorithm for the Steiner Problem on Graphs",
Networks, Vol. 19, pp. 1-16 (1989).
Beasley, J.E., "An Algorithm for the Steiner Problem in Graphs", Net-
works, Vol. 14, pp. 147-159 (1984).

Bellmore, M., and Ratliff, H.D., "Optimal Defence of Multi-Commodity Networks", Management Science, Vol. 18, pp. 174-185 (1971).

Berge, C., "Graphs and Hypergraphs", Translated by E. Minieka, North-
Holland Publishing Company, Amsterdam-London (1973).
Brelaz, D., "New Method to Color the Vertices of a Graph", Communications of the ACM, Vol. 22, pp. 251-256 (1979).

Brown, J.R., "Chromatic Scheduling and the Chromatic Number Problem", Management Science, Vol. 19, pp. 456-463 (1972).

Chang, G.J., and Nemhauser, G.L., "Covering, Packing and Generalized
Perfection", SIAM Journal on Algebraic and Discrete Methods, Vol.
6, pp. 109-132 (1985).
Chopra, S., "Comparison of Formulations and Heuristics for Packing Steiner
Trees on a Graph", Technical Report, J.L. Kellogg Graduate School
of Management, Northwestern University, Illinois (1992).

Chvatal, V., "Determining the Stability Number of a Graph", SIAM Journal on Computing, Vol. 6, pp. 643-662 (1977).

Cockayne, E.J., and Melzak, Z.A., "Steiner's Problem for Set Terminals",
Quarterly of Applied Mathematics, Vol. 26, pp. 213-218 (1969).

Corneil, D.G., and Graham, B., "An Algorithm for Determining the Chromatic Number of a Graph", SIAM Journal of Computing, Vol. 2, pp. 311-318 (1973).

Dreyfus, S.E., and Wagner, R.A., "The Steiner Problem in Graphs", Networks, Vol. 1, pp. 195-207 (1971).

Edmonds, J., "Maximum Matching and a Polyhedron with 0,1 Vertices", Journal of Research of National Bureau of Standards, Vol. 69B, pp. 125-130 (1965a).

Edmonds, J., "Paths, Trees and Flowers", Canadian Journal of Mathematics, Vol. 17, pp. 449-467 (1965b).

Edmonds, J., "Covers and Packing in a Family of Sets", Bulletin of the American Mathematical Society, Vol. 68, pp. 494-499 (1962).

Gilbert, E.N., and Pollak, H.O., "Steiner Minimal Trees", SIAM Journal
of Applied Mathematics, Vol. 16, pp. 1-29 (1968).

Hakimi, S.L., "Steiner's Problem in Graphs and Its Implications", Networks, Vol. 1, pp. 113-133 (1971).

Hanan, M., "On Steiner's Problem with Rectilinear Distance", SIAM Jour-
nal of Applied Mathematics, Vol. 14, pp. 255-265 (1966).

Houck, D.J., and Vemuganti, R.R., "An Algorithm for the Vertex Packing Problem", Operations Research, Vol. 25, pp. 773-787 (1977).

Hwang, F.K., and Richards, D.S., "Steiner Tree Problems", Networks, Vol.
22, pp. 55-89 (1992).

Khoury, B.N., Pardalos, P.M., and Hearn, D.W., "Equivalent Formulations for the Steiner Problem on Graphs", Network Optimization Problems (Edited by D.-Z. Du and P.M. Pardalos), World Scientific Publishing Co., pp. 111-124 (1993).

Khoury, B.N., Pardalos, P.M., and Du, D.-Z., "A Test Problem Generator for the Steiner Problem in Graphs", Department of Industrial and Systems Engineering Working Paper, University of Florida, Gainesville, Florida (1993).

Leighton, F., "A Graph-Coloring Algorithm for Large Scheduling Problems", Journal of Research of the National Bureau of Standards, Vol. 84, pp. 489-506 (1979).

Maculan, N., "The Steiner Problem in Graphs", Annals of Discrete Mathematics, Vol. 31, pp. 185-212 (1987).

Mehta, N.K., "The Application of Graph Coloring Model to an Examination Scheduling Problem", Interfaces, Vol. 11, pp. 57-64 (1981).

Nemhauser, G.L., and Trotter, L.E., "Vertex Packing: Structural Properties and Algorithm", Mathematical Programming, Vol. 8, pp. 232-248 (1975).

Nemhauser, G.L., and Trotter, L.E., "Properties of Vertex Packing and Independent Systems Polyhedra", Mathematical Programming, Vol. 6, pp. 48-61 (1974).

Norman, R.Z., and Rabin, M.O., "An Algorithm for a Minimum Cover of a Graph", Proceedings of the American Mathematical Society, Vol. 10, pp. 315-319 (1959).

Salazar, A., and Oakford, R.V., "A Graph Formulation of a School Scheduling Algorithm", Communications of the ACM, Vol. 17, pp. 696-698 (1974).

Wang, C.C., "An Algorithm for the Chromatic Number of a Graph", Journal of ACM, Vol. 21, pp. 385-391 (1974).

Weinberger, D.B., "Network Flows, Minimum Coverings, and the Four Color Conjecture", Operations Research, Vol. 24, pp. 272-290 (1976).

Winter, P., "Steiner Problem in Networks: A Survey", Networks, Vol. 17,
pp. 129-167 (1987).
Wong, R.T., "A Dual Ascent Approach to Steiner Tree Problems on a Directed Graph", Mathematical Programming, Vol. 28, pp. 271-287 (1984).

Wood, D.C., "A Technique for Coloring a Graph Applicable to Large Scale Time-Tabling Problems", Computing Journal, Vol. 12, pp. 317-319 (1969).

Wu, Y.F., Widmayer, P., and Wong, C.K., "A Faster Approximation Algorithm for the Steiner Problem in Graphs", Acta Informatica, Vol. 23, pp. 223-229 (1986).

Personnel Scheduling

Abernathy, W., Baloff, N., and Hershey, J., "A Variable Nurse Staffing
Model", Decision Sciences, Vol. 5, pp. 58-72 (1974).

Abernathy, W., Baloff, N., Hershey, J., and Wandel, S., "A Three Stage Manpower Planning and Scheduling Model - A Service Sector Example", Operations Research, Vol. 21, pp. 693-711 (1973).

Abernathy, W., Baloff, N., and Hershey, J., "The Nurse Staffing Problem:
Issues and Prospects", Sloan Management Review, Vol. 12, pp. 87-
99 (1971).

Ahuja, H., and Sheppard, R., "Computerized Nurse Scheduling", Industrial Engineering, Vol. 7, pp. 24-29 (1975).

Altman, S., Beltrami, E.J., Rappoport, S.S., and Schoepfle, G.K., "Nonlinear Programming Model of Crew Assignments for Household Refuse Collection", IEEE Transactions on Systems, Man and Cybernetics, Vol. 1, pp. 289-291 (1971).

Bailey, J., "Integrated Days Off and Shift Personnel Scheduling", Comput-
ers and Industrial Engineering, Vol. 9, pp. 395-404 (1985).

Bailey, J., and Field, J., "Personnel Scheduling with Flexshift Models",
Journal of Operations Management, Vol. 5, pp. 327-338 (1985).

Baker, K.R., Burns, R.N., and Carter, M., "Staff Scheduling with Days-Off and Work Stretch Constraints", AIIE Transactions, Vol. 11, pp. 286-292 (1979).

Baker, K.R., and Magazine, M.J., "Workforce Scheduling with Cyclic Demands and Days-Off Constraints", Management Science, Vol. 24, pp. 161-167 (1977).

Baker, K.R., "Workforce Allocation in Cyclic Scheduling Problem: A Survey", Operational Research Quarterly, Vol. 27, pp. 155-167 (1976).

Baker, K.R., "Scheduling a Full-Time Workforce to Meet Cyclic Staffing Requirements", Management Science, Vol. 20, pp. 1561-1568 (1974).

Baker, K.R., Crabil, T.B., and Magazine, M.J., "An Optimal Procedure for Allocating Manpower with Cyclic Requirements", AIIE Transactions, Vol. 5, pp. 119-126 (1973).

Bartholdi, J.J., "A Guaranteed-Accuracy Round-off Algorithm for Cyclic Scheduling and Set Covering", Operations Research, Vol. 29, pp. 501-510 (1981).

Bartholdi, J.J., Orlin, J.B., and Ratliff, H.D., "Cyclic Scheduling via Integer Programs with Circular Ones", Operations Research, Vol. 29, pp. 1074-1085 (1980).
Bartholdi, J.J., and Ratliff, H.D., "Unnetworks, With Applications to Idle
Time Scheduling", Management Science, Vol. 24, pp. 850-858 (1978).
Bechtold, S., Brusco, M., and Showalter, M., "A Comparative Evaluation
of Labor Tour Scheduling Methods", Decision Sciences, Vol. 22, pp.
683-699 (1991).
Bechtold, S.E., "Optimal Work-Rest Schedules with a Set of Fixed-Duration
Rest Periods", Decision Sciences, Vol. 22, pp. 157-170 (1991).
Bechtold, S.E., and Jacobs, L.W., "Implicit Modelling of Flexible Break
Assignments in Optimal Shift Scheduling", Management Science, Vol.
36, pp. 1339-1351 (1990).
Bechtold, S.E., and Sumners, D.L., "Optimal Work-Rest Scheduling with
Exponential Work Rate Decay", Management Science, Vol. 34, pp.
547-552 (1988).
Bechtold, S.E., "Implicit Optimal and Heuristic Labor Staffing in a Multi-
objective Multilocation Environment", Decision Sciences, Vol. 19, pp.
353 - 372 (1988).
Bechtold, S.E., and Showalter, M., "A Methodology for Labor Scheduling
in a Service Operating System", Decision Sciences, Vol. 18, pp. 89-107
(1987).

Bechtold, S.E., and Showalter, M., "Simple Manpower Scheduling Methods for Managers", Production and Inventory Management, Vol. 26, pp. 116-132 (1985).

Bechtold, S.E., Janaro, R.E., and Sumners, D.L., "Maximization of Labor Productivity through Optimal Rest-Break Schedules", Management Science, Vol. 30, pp. 1442-1458 (1984).

Bechtold, S.E., "Work Force Scheduling for Arbitrary Cyclic Demand", Journal of Operations Management, Vol. 1, pp. 205-214 (1981).

Bechtold, S.E., "Quantitative Models for Optimal Rest Period Scheduling: A Note", OMEGA, The International Journal of Management Science, Vol. 7, pp. 565-566 (1979).

Bennett, B.T., and Potts, R.B., "Rotating Roster for a Transit System",
Transportation Science, Vol. 2, pp. 14-34 (1968).

Bodin, L.D., "Toward a General Model for Manpower Scheduling: Parts 1 and 2", Journal of Urban Analysis, Vol. 1, pp. 191-245 (1973).

Browne, J.J., "Simplified Scheduling of Routine Work Hours and Days Off", Industrial Engineering, Vol. 11, pp. 27-29 (1979).

Browne, J.J., and Tibrewala, R.K., "Manpower Scheduling", Industrial Engineering, Vol. 7, pp. 22-23 (1975).

Brownell, W.S., and Lowerre, J.M., "Scheduling of Workforce Required in Continuous Operations Under Alternate Labor Policies", Management Science, Vol. 22, pp. 597-605 (1976).

Buffa, E.S., Cosgrove, M.J., and Luce, B.J., "An Integrated Work Shift Scheduling System", Decision Sciences, Vol. 7, pp. 620-630 (1976).

Burns, R.N., and Koop, G.J., "A Modular Approach to Optimal Shift Manpower Scheduling", Operations Research, Vol. 35, pp. 100-110 (1987).

Burns, R.N., and Carter, M.W., "Work Force Size and Single Shift Schedules with Variable Demands", Management Science, Vol. 31, pp. 599-607 (1985).

Burns, R.N., "Manpower Scheduling with Variable Demands and Alternate Weekends Off", INFOR, Canadian Journal of Operations Research and Information Processing, Vol. 16, pp. 101-111 (1978).

Byrne, J.L., and Potts, R.B., "Scheduling of Toll Collectors", Transportation Science, Vol. 7, pp. 224-245 (1973).

Chaiken, J., and Dormont, P., "A Patrol Car Allocation Model: Background and A Patrol Car Allocation Model: Capabilities and Algorithms", Management Science, Vol. 24, pp. 1280-1300 (1978).

Chelst, K., "Deployment of One vs. Two-Officer Patrol Units: A Comparison of Travel Times", Management Science, Vol. 27, pp. 213-230 (1981).

Chelst, K., "An Algorithm for Deploying a Crime Directed Patrol Force", Management Science, Vol. 24, pp. 1314-1327 (1978).

Chen, D., "A Simple Algorithm for a Workforce Scheduling Model", AIIE Transactions, Vol. 10, pp. 244-251 (1978).

Church, J.G., "Sure Staff: A Computerized Staff Scheduling System for Telephone Business Offices", Management Science, Vol. 20, pp. 708-720 (1973).

Dantzig, G.B., "A Comment on Edie's Traffic Delays at Toll Booths", Operations Research, Vol. 2, pp. 339-341 (1954).

Easton, F.F., and Rossin, D.F., "Sufficient Working Subsets for the Tour
Scheduling Problem", Management Science, Vol. 37, pp. 1441-1451
(1991a).

Easton, F.F., and Rossin, D.F., "Equivalent Alternate Solutions for the
Tour Scheduling Problems", Decision Sciences, Vol. 22, pp. 985-1007
(1991b).

Eilon, S., "On a Mechanistic Approach to Fatigue and Rest Periods", International Journal of Production Research, Vol. 3, pp. 327-332 (1964).

Emmons, H., and Burns, R.N., "Off-Day Scheduling with Hierarchical Worker Categories", Operations Research, Vol. 39, pp. 484-495 (1991).

Emmons, H., "Workforce Scheduling with Cyclic Requirements and Constraints on Days Off, Weekends Off and Workstretch", IIE Transactions, Vol. 17, pp. 8-16 (1985).

Frances, M.A., "Implementing a Program of Cyclical Scheduling of Nursing Personnel", Hospitals, Vol. 40, pp. 108-123 (1966).

Gaballa, A., and Pearce, W., "Telephone Sales Manpower Planning at Qantas", Interfaces, Vol. 9, pp. 1-9 (1979).

Gentzler, G.L., Khalil, T.M., and Sivazlian, B.B., "Quantitative Methods for Optimal Rest Period Scheduling", OMEGA, The International Journal of Management Science, Vol. 5, pp. 215-220 (1977).

Glover, F., and McMillan, C., "The General Employee Scheduling Problem: An Integration of Management Science and Artificial Intelligence", Computers and Operations Research, Vol. 13, pp. 563-573 (1986).

Glover, F., McMillan, C., and Glover, R., "A Heuristic Programming Approach to the Employee Scheduling Problem and Some Thoughts on 'Managerial Robots'", Journal of Operations Management, Vol. 4, pp. 113-128 (1984).

Green, L., and Kolesar, P., "The Feasibility of One Officer Patrol in New York City", Management Science, Vol. 30, pp. 964-981 (1984).

Guha, D., and Browne, J., "Optimal Scheduling of Tours and Days Off", Preprints, Workshop on Automated Techniques for Scheduling of Vehicle Operators for Urban Public Transportation Services (Edited by Bodin and Bergman), Chicago, Illinois (1975).

Hagberg, B., "An Assignment Approach to the Rostering Problem: An Application to Taxi Vehicles", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), pp. 313-318, North-Holland (1985).

Henderson, W.B., and Berry, W.L., "Determining Optimal Shift Schedules for Telephone Traffic Exchange Operators", Decision Sciences, Vol. 8, pp. 239-255 (1977).

Henderson, W.B., and Berry, W.L., "Heuristic Methods for Telephone Operator Shift Scheduling: An Experimental Analysis", Management Science, Vol. 22, pp. 1372-1380 (1976).

Hershey, J.C., Abernathy, W.J., and Baloff, N., "Comparison of Nurse Allocation Policies - A Monte Carlo Model", Decision Sciences, Vol. 5, pp. 58-72 (1974).

Holloran, T.J., and Byrn, J.E., "United Airline Station Manpower Planning
System", Interfaces, Vol. 16, pp. 39-50 (1986).
Howell, J.P., "Cyclical Scheduling of Nursing Personnel," Hospitals, J.A.H.A.,
Vol. 40, pp. 77 - 85 (1966).
Hung, R., and Emmons, H., "Multiple-Shift Workforce Scheduling Un-
der the 3-4 Compressed Workweek With a Hierarchical Workforce",
Department of Operations Research Working Paper, Case Western
Reserve University, Cleveland, Ohio (1990).
Ignall, E., Kolesar, P., and Walker, W., "Linear Programming Models of Crew Assignments for Refuse Collection", IEEE Transactions on Systems, Man and Cybernetics, Vol. 2, pp. 664-666 (1972).

Keith, E.G., "Operator Scheduling", AIIE Transactions, Vol. 11, pp. 37-41 (1979).

Klasskin, P.M., "Operating to Schedule Operators", Telephony, pp. 29-31 (1973).

Koop, G.J., "Multiple Shift Workforce Lower Bounds", Management Sci-
ence, Vol. 34, pp. 1221-1230 (1988).
Koop, G.J., "Cyclic Scheduling of Weekends", Operations Research Let-
ters, Vol. 4, pp. 259-263 (1986).
Krajewski, L.J., Ritzman, L.P., and McKenzie, P., "Shift Scheduling in
Banking Operations: A Case Application", Interfaces, Vol. 10, pp.
1-8 (1980).
Krajewski, L.J., and Ritzman, L.P., "Disaggregation in Manufacturing and Service Organizations: Survey of Problems and Research", Decision Sciences, Vol. 8, pp. 1-18 (1977).

Lessard, R., Rousseau, J.-M., and DuPuis, D., "HASTUS I: A Mathematical Programming Approach to the Bus Driver Scheduling Problem", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 255-267 (1981).

Li, C., Robinson, E.P., and Mabert, V.A., "An Evaluation of Tour Scheduling Heuristics with Differences in Employee Productivity and Cost", Decision Sciences, Vol. 22, pp. 700-718 (1991).

Linder, R.W., "The Development of Manpower and Facilities Planning Methods for Airline Telephone Reservation Offices", Operational Research Quarterly, Vol. 20, No. 1 (1969).

Loucks, J.S., and Jacobs, F.R., "Tour Scheduling and Task Assignment of
a Heterogeneous Work Force", Decision Sciences, Vol. 22, pp. 719-738
(1991).

Lowerre, J.M., "On Personnel Budgeting on Continuous Operations (With Emphasis on Hospitals)", Decision Sciences, Vol. 10, pp. 126-135 (1979).

Lowerre, J.M., "Work Stretch Properties for the Scheduling of Continuous Operations Under Alternative Labor Policies", Management Science, Vol. 23, pp. 963-971 (1977).

Mabert, V.A., and Watts, C.A., "A Simulation Analysis of Tour Shift Construction Procedures", Management Science, Vol. 28, pp. 520-532 (1982).

Mabert, V.A., and McKenzie, J.P., "Improving Bank Operations: A Case Study at Bank Ohio/Ohio National Bank", OMEGA, The International Journal of Management Science, Vol. 8, pp. 345-354 (1980).

Mabert, V.A., "A Case Study of Encoder Shift Scheduling Under Uncertainty", Management Science, Vol. 25, pp. 623-631 (1979).

Mabert, V.A., and Raedels, A., "The Detail Scheduling of a Part-Time Work Force: A Case Study of Teller Staffing", Decision Sciences, Vol. 8, pp. 109-120 (1977).

Maier-Rothe, C., and Wolf, H.B., "Cyclical Scheduling and Allocation of Nursing Staff", Socio-Economic Planning Sciences, Vol. 7, pp. 471-487 (1973).

McGinnis, L.F., Culver, W.D., and Deane, R.H., "One and Two-Phase
Heuristics for Workforce Scheduling", Computers and Industrial En-
gineering, Vol. 2, pp. 7-15 (1978).

McGrath, D., "Flextime Scheduling: A Survey", Industrial Management, Vol. 22, pp. 1-4 (1980).

Megeath, J.D., "Successful Hospital Personnel Scheduling", Interfaces, Vol. 8, pp. 55-59 (1978).

Miller, H.E., Pierskalla, W.P., and Rath, G.J., "Nurse Scheduling Using Mathematical Programming", Operations Research, Vol. 24, pp. 857-870 (1976).

Monroe, G., "Scheduling Manpower for Service Operations", Industrial Engineering, Vol. 2, pp. 10-17 (1970).

Moondra, S.L., "An L.P. Model for Work Force Scheduling for Banks", Journal of Bank Research, Vol. 6, pp. 299-301 (1976).

Morris, J.G., and Showalter, M.J., "Simple Approaches to Shift, Days-Off and Tour Scheduling Problems", Management Science, Vol. 29, pp. 942-950 (1983).

Morrish, A.R., and O'Connor, A.R., "Cyclic Scheduling", Hospitals, J.A.H.A., Vol. 14, pp. 66-71 (1970).

Ozkarahan, I., and Bailey, J.E., "Goal Programming Model Subsystem of Flexible Nurse Scheduling Support System", IIE Transactions, Vol. 20, No. 3, pp. 306-316 (1988).

Ozkarahan, I., "A Flexible Nurse Scheduling Support System", Ph.D. Dissertation, Arizona State University (1987).

Paixao, J., and Pato, M., "A Structural Lagrangean Relaxation for Two-Duty Period Bus Driver Scheduling Problems", European Journal of Operational Research, Vol. 39, No. 2, pp. 213-222 (1989).

Pappas, I.A., "Dynamic Job Assignment for Railway Personnel", Management Science, Vol. 13, pp. B809-B816 (1967).

Price, E., "Techniques to Improve Staffing", American Journal of Nursing, Vol. 70, pp. 2112-2115 (1970).

Ritzman, L.P., Krajewski, L.J., and Showalter, M.J., "The Disaggregation of Aggregate Manpower Plans", Management Science, Vol. 22, pp. 1204-1214 (1976).

Rothstein, M., "Hospital Manpower Shift Scheduling by Mathematical Programming", Health Service Research, Vol. 8, pp. 60-66 (1973).

Rothstein, M., "Scheduling Manpower by Mathematical Programming", Industrial Engineering, Vol. 4, pp. 29-33 (1972).

Segal, M., "The Operator-Scheduling Problem: A Network Flow Approach", Operations Research, Vol. 22, pp. 808-823 (1974).

Shepardson, F., and Marsten, R.E., "A Lagrangean Relaxation Algorithm
for the Two Duty Period Scheduling Problem", Management Science,
Vol. 26, pp. 274-281 (1980).
Showalter, M.J., and Mabert, V.A., "An Evaluation of A Full-Part Time
Tour Scheduling Methodology", International Journal of Operations
and Production Management, Vol. 8, pp. 54-71 (1988).
Showalter, M.J., Krajewski, L. J., and Ritzman, L.P., "Manpower Alloca-
tion in U.S. Postal Facilities: A Heuristic Approach", Computers and
Operations Research, Vol. 2, pp. 141-13 (1978).
Smith, D.L., "The Application of an Interactive Algorithm to Develop Cyclical Schedules for Nursing Personnel", INFOR, Canadian Journal of Operations Research and Information Processing, Vol. 14, pp. 53-70 (1976).

Smith, H.L., Mangelsdorf, K.R., Luna, J.C., and Reid, R.A., "Supplying
Ecuador's Health Workers Just in Time", Interfaces, Vol. 19, pp. 1-12
(1989).
Smith, L., and Wiggins, A., "A Computer-Based Nursing Scheduling Sys-
tem", Computers and Operations Research, Vol. 4, pp. 195-212
(1977).
Stern, H.I., and Hersh, M., "Scheduling Aircraft Cleaning Crews", Trans-
portation Science, Vol. 14, pp. 277 - 291 (1980).
Taylor, P.E., and Huxley, S.J., "A Break from Tradition for the San Francisco Police: Patrol Officer Scheduling Using an Optimization-Based Decision Support System", Interfaces, Vol. 19, pp. 4-24 (1989).

Tibrewala, R., Phillippe, D., and Browne, J., "Optimal Scheduling of Two
Consecutive Idle Periods", Management Science, Vol. 19, pp. 71-75
(1972).

Tien, J.M., and Kamiyama, A., "On Manpower Scheduling Algorithms", SIAM Review, Vol. 24, pp. 275-287 (1982).

Vohra, R.V., "A Quick Heuristic for Some Cyclic Staffing Problems with Breaks", Journal of Operations Research Society, Vol. 39, pp. 1057-1061 (1988).

Warner, D.M., "Scheduling Nursing Personnel According to Nursing Preference: A Mathematical Programming Approach", Operations Research, Vol. 24, pp. 842-856 (1976).

Warner, D.M., and Prawda, J., "A Mathematical Programming Model for Scheduling Nursing Personnel in Hospitals", Management Science, Vol. 19, pp. 411-422 (1972).

Wolfe, H., and Young, J.P., "Staffing the Nursing Unit, Part I: Controlled Variable Staffing", Nursing Research, Vol. 14, pp. 236-243 (1965a).

Wolfe, H., and Young, J.P., "Staffing the Nursing Unit, Part II: The Multiple Assignment Technique", Nursing Research, Vol. 14, pp. 299-303 (1965b).

Crew Scheduling
Amar, G., "New Bus Scheduling Methods at RATP", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North Holland, pp. 415-426 (1985).

Anbil, R., Gelman, E., Patty, B., and Tanga, R., "Recent Advances in Crew-Pairing Optimization at American Airlines", Interfaces, Vol. 21, pp. 62-74 (1991).

Arabeyre, J.P., Fearnley, J., Steiger, F.C., and Teather, W., "The Airline Crew Scheduling Problem: A Survey", Transportation Science, Vol. 3, pp. 140-163 (1969).

Arabeyre, J.P., "Methods of Crew Scheduling", Proceedings, 6th AGIFORS (Airline Group of International Federation of Operations Research) Symposium, Air France (1966).

Baker, E., and Fisher, M., "Computational Results for Very Large Air Crew Scheduling Problems", OMEGA, The International Journal of Management Science, Vol. 9, pp. 613-618 (1981).

Baker, E.K., and Frey, K., "A Heuristic Set Covering Based System for Scheduling Air Crews", Proceedings, SE AIDS (1980).

Baker, E.K., Bodin, L.D., Finnegan, W.F., and Ponder, R., "Efficient Heuristic Solutions to an Airline Crew Scheduling Problem", AIIE Transactions, Vol. 11, pp. 79-85 (1979).

Ball, M.O., Bodin, L.D., and Greenberg, J., "Enhancement to the RUCUS II Crew Scheduling System", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland Publishing Company, pp. 279-293 (1985).

Ball, M., and Roberts, A., "A Graph Partitioning Approach to Airline Crew
Scheduling", Transportation Science, Vol. 19, pp. 106-126 (1985).

Ball, M., Bodin, L., and Dial, R., "A Matching Based Heuristic for Scheduling Mass Transit Crews and Vehicles", Transportation Science, Vol. 17, pp. 4-31 (1983).

Ball, M.O., Bodin, L.D., and Dial, R., "Experimentation with Computerized System for Scheduling Mass Transit Vehicles and Crews", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 313-334 (1981).

Ball, M., Bodin, L., and Dial, R., "Scheduling of Drivers for Mass Transit Systems Using Interactive Optimization", World Conference on Transportation Research, London, England (April 1980).

Barnhart, C., Johnson, E., Anbil, R., and Hatay, L., "A Column Generation Technique for the Long-Haul Crew Assignment Problem", ORSA/TIMS (1991).

Belletti, R., and Davani, A., "BDROP: A Package for the Bus Drivers' Rostering Problem", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland Publishing Company, pp. 319-324 (1985).

Bodin, L., Ball, M., Duguid, R., and Mitchell, M., "The Vehicle Scheduling Problem with Interlining", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland (1985).

Bodin, L., Rosenfield and Kydes, A., "Scheduling and Estimation Tech-
niques for Transportation Planning", Computers and Operations Re-
search, Vol. 8, pp. 25-38 (1981).
Bodin, L., and Dial, R., "Hierarchical Procedures for Determining Vehicle
and Crew Requirements for Mass Transit Systems", Transportation
Research Record, 746, pp. 58-64 (1980).
Booler, J.M., "A Method for Solving Crew Scheduling Problems", Opera-
tional Research Quarterly, Vol. 26, pp. 55-62 (1975).
Borret, J.M.J., and Roes, A.W., "Crew Scheduling by Computer: A Test on
the Possibility of Designing Duties for a Certain Bus Line" , Computer
Scheduling Public Transport, (Edited by A. Wren), North- Holland
Publishing Company, pp. 237-253 (1981).
Bronemann, D.R., "A Crew Planning and Scheduling System", Proceed-
ings, 10th AGIFORS (Airline Group of International Federation of
Operations Research) Symposium, (1970).
Carraresi, P., and Gallo, G., "Network Models for Vehicle and Crew Schedul-
ing" , European Journal of Operational Research, Vol. 16, pp. 139-151
(1984).
Ceder, A., "The Variable Trip Procedure Used in the Automobile Vehicle Scheduler", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland Publishing Company, pp. 371-390 (1985).

Darby-Dowman, K., and Mitra, G., "An Extension of Set Partitioning
with Applications to Scheduling Problems" , European Journal of Op-
erational Research, Vol. 21, pp. 200-205 (1985).
Edwards, G.R., "An Approach to the Crew Scheduling Problem", New
Zealand Operational Research, Vol. 8, pp. 153-171 (1980).
Evers, G.H.E., "Relevant Factors Around Crew-Utilization", AGIFORS
(Airline Group of International Federation of Operations Research)
Symposium, KLM (1956).
Falkner, J.C., and Ryan, D.M., "A Bus Crew Scheduling System Using a Set Partitioning Model", Asia-Pacific Journal of Operations Research, Vol. 4, pp. 39-56 (1987).

Gerbracht, R., "A New Algorithm for Very Large Crew Pairing Problems", Proceedings, 18th AGIFORS (Airline Group of the International Federation of Operations Research) Symposium (1978).

Gershkoff, I., "Overview of the Crew Scheduling Problem", ORSA/TIMS National Conference (1990).

Gershkoff, I., "Optimizing Flight Crew Schedules", Interfaces, Vol. 19, pp. 29-43 (1989).

Gershkoff, I., "American's System for Building Crew Pairings", Airline Executive, Vol. 11, pp. 20-22 (1987).

Hartley, T., "Two Complementary Bus Scheduling Programs", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, pp. 345-367 (1985).

Hartley, T., "A Glossary of Terms in Bus and Crew Scheduling", Computer
Scheduling of Public Transport {Edited by A. Wren}, North-Holland
Publishing Company, pp. 353-359 {1981}.

Henderson, W., "Relationships Between the Scheduling of Telephone Op-


erators and Public Transportation Vehicle Drivers", Preprint, Work-
shop on Automated Techniques for Schedules of Vehicle Operators for
Urban Public Transportation Services {Edited by L. Bodin, and D.
Bergmann}, Chicago, Illinois {1975}.

Heurgon, E., and Hervillard, R, "Preparing Duty Rosters for Bus by Com-
puters", UITP Revue, Vol. 24, pp. 33 - 37 (1975).

Hoffstadt, J., "Computerized Vehicle and Driver Scheduling for the Ham-
burger Hochbahn Aktiengesellschaft", Computer Scheduling of Pub-
lic Transport: Urban Passenger and Crew Scheduling (Edited by A.
Wren), North-Holland Publishing Company, pp. 35 - 52 {1981}.

Howard, S.M., and Moser, P.L, "Impacs: A Hybrid Interactive Approach


to Computerized Crew Scheduling", Computer Scheduling of Public
Transport 2 {Edited by J.-M.Rousseau}, Elsevier Science Publishers,
North-Holland Publishing Company, pp. 211-220, {1985}.

Jones, R.D., "Development of an Automated Airline Crew Bid Generation System", Interfaces, Vol. 19, pp. 44-51 (1989).

Kabbani, N.M., and Patty, B.W., "Aircraft Routing at American Airlines", ORSA/TIMS Joint National Meeting (1993).

Keaveny, I.T., and Burbeck, S., "Automatic Trip Scheduling and Optimal Vehicle Assignments", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 125-145 (1981).

Kolner, T.K., "Some Highlights of a Scheduling Matrix Generator System", Proceedings, 6th AGIFORS (Airline Group of the International Federation of Operations Research) Symposium (1966).

Koutsopoulos, H.N., Odoni, A.R., and Wilson, N.H.M., "Determination of Headways as a Function of Time Varying Characteristics on a Transit Network", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland, pp. 391-413 (1985).

Lavoie, S., Minoux, M., and Odier, E., "A New Approach for Crew Pairing
Problems by Column Generation Scheme with An Application to Air
Transportation", European Journal of Operational Research, Vol. 35,
pp. 45-58 (1988).

LePrince, M., and Mertens, W., "Vehicle and Crew Scheduling at the Societe Des Transports Intercommunaux De Bruxelles", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland, pp. 149-178 (1985).

Lessard, R., Rousseau, J.-M., and DuPuis, D., "HASTUS I: A Mathematical Programming Approach to the Bus Driver Scheduling Problem", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 255-267 (1981).

Leudtke, L.K., "RUCUS II: A Review of System Capabilities", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland Publishing Company, pp. 61-116 (1985).

Marsten, R.E., Muller, M.R., and Killion, D.L., "Crew Planning at Flying Tiger: A Successful Application of Integer Programming", Management Science, Vol. 25, pp. 1175-1183 (1979).

Marsten, R.E., and Shepardson, F., "Exact Solution of Crew Scheduling Problems Using the Set Partitioning Model: Recent Successful Applications", Networks, Vol. 11, pp. 165-177 (1981).

McCloskey, J.F., and Hanssman, F., "An Analysis of Stewardess Requirements and Scheduling for a Major Airline", Naval Research Logistic Quarterly, Vol. 4, pp. 183-192 (1957).

Minoux, M., "Column Generation Techniques in Combinatorial Optimization, A New Application to Crew-Pairing Problems", Proceedings, 24th AGIFORS (Airline Group of the International Federation of Operations Research) Symposium, Strasbourg, France (1984).

Mitchell, R., "Results and Experience of Calibrating HASTUS-MACRO for Work Rule Cost at the Southern California Rapid Transit District, Los Angeles", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland (1985).

Mitra, G., and Darby-Dowman, K., "CRU-SCHED: A Computer Based Bus Crew Scheduling System Using Integer Programming", Computer Scheduling in Public Transport (Edited by J.-M. Rousseau), Elsevier Publishers, North-Holland Publishing Company, pp. 223-232 (1985).

Mitra, G., and Welsh, A., "A Computer Based Crew Scheduling System Using a Mathematical Programming Approach", Computer Scheduling of Public Transport: Urban Passenger Vehicle and Crew Scheduling (Edited by A. Wren), North Holland Publishing Company, pp. 281-296 (1981).

Niederer, M., "Optimization of Swissair's Crew Scheduling by Heuristic Methods Using Integer Linear Programming Models", Proceedings, 6th AGIFORS (Airline Group of the International Federation of Operations Research) Symposium (1966).

Paixao, J.P., Branco, M.I., Captivo, M.E., Pato, M.V., Eusebio, R., and Amado, L., "Bus and Crew Scheduling on a Microcomputer" (Edited by J.D. Coelho and L.V. Tavers), North-Holland Publishing Company (1986).

Parker, M.E., and Smith, B.M., "Two Approaches to Computer Crew Scheduling", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 193-221 (1981).

Piccione, C., Cherici, A., Bielli, M., and LaBella, A., "Practical Aspects in
Automatic Crew Scheduling", Computer Scheduling of Public Trans-
port: Urban Passenger Vehicle and Crew Scheduling (Edited by A.
Wren), North-Holland Publishing Company, pp. 223-236 (1981).
Rannou, B., "A New Approach to Crew Pairing Optimization", Proceed-
ings, 26th AGIFORS (Airline Group of the International Federation
of Operations Research) Symposium, England (1986).

Rousseau, J.-M. (Ed.), Computer Scheduling of Public Transport 2, Elsevier Publishers, North Holland (1985).

Rousseau, J.-M., and Lessard, R., "Enhancements to the HASTUS Crew Scheduling Algorithm", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland, pp. 295-310 (1985).

Rubin, J., "A Technique for the Solution of Massive Set Covering Problems with Application to Airline Crew Scheduling", Transportation Science, Vol. 7, No. 1, pp. 34-48 (1973).

Ryan, D.M., and Foster, B.A., "An Integer Programming Approach to
Scheduling", Computer Scheduling of Public Transport: Urban Pas-
senger Vehicle and Crew Scheduling (Edited by A. Wren), North-
Holland Publishing Company, pp. 269-280 (1981).
Scott, D., "A Large Scale Linear Programming Approach to the Public Transport Scheduling and Costing Problem", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland Publishing Company, pp. 473-491 (1985).

Shepardson, F., "Modelling the Bus Crew Scheduling Problem", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), Elsevier Science Publishers, North-Holland Publishing Company, pp. 247-261 (1985).

Stern, H.I., and Ceder, A., "A Deficit Function Approach for Bus Scheduling", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 85-96 (1981).

Spitzer, M., "Crew Scheduling with Personal Computer", Airline Executive, Vol. 11, pp. 24-27 (1987).

Spitzer, M., "Solution to the Crew Scheduling Problem", Proceedings, 1st AGIFORS (Airline Group of International Federation of Operations Research) Symposium (1961).

Steiger, F., "Optimization of Swissair's Crew Scheduling by an Integer Linear Programming Model", Swissair, O.R. SDK 3.3.911 (1965).

Stern, H., "Bus and Crew Scheduling (Note)", Transportation Research, Vol. 14A, pp. 154- (1980).

Tykulsker, R.J., O'Niel, K.K., Ceder, A., and Sheffi, Y., "A Commuter Rail Crew Assignment/Work Rules Model", Computer Scheduling of Transport 2 (Edited by J.-M. Rousseau), Elsevier Publishers, North-Holland, pp. 232-246 (1985).

Ward, R.E., Durant, P.A., and Hallman, A.B., "A Problem Decomposition Approach to Scheduling the Drivers and Crews of Mass Transit Systems", Computer Scheduling of Public Transport (Edited by A. Wren), North-Holland Publishing Company, pp. 297-316 (1981).

Wren, A., Smith, B.M., and Miller, A.J., "Complementary Approaches to Crew Scheduling", Computer Scheduling of Transport 2 (Edited by J.-M. Rousseau), North-Holland Publishing Company, pp. 263-278 (1985).

Wren, A., "General Review of the Use of Computers in Scheduling Buses


and Their Crews", Computer Scheduling of Public Transport: Urban
Passenger Vehicle and Crew Scheduling, (Edited by A. Wren), North-
Holland Publishing Company, pp. 3-16 (1981).
Manufacturing

Baybars, I., "A Survey of Exact Algorithms for the Simple Assembly Line
Balancing Problem", Management Science, Vol. 32, pp. 909-932
(1986a).

Baybars, I., "An Efficient Heuristic Method for the Simple Assembly Line
Balancing Problem", International Journal of Production Research,
Vol. 24, pp. 149-166 (1986b).

Bowman, E.H., "Assembly-Line Balancing by Linear Programming", Operations Research, Vol. 8, pp. 385-389 (1960).
Set Covering, Set Packing and Set Partitioning 685

Cattrysse, D., Salomon, M., Kuik, R., and Van Wassenhove, L.N., "A Dual Ascent and Column Generation Heuristic for the Discrete Lotsizing and Scheduling Problem with Setup Times", Management Science, Vol. 39, pp. 477-486 (1993).

Cattrysse, D., Maes, J., and Van Wassenhove, L.N., "Set Partitioning and Column Generation Heuristics for Capacitated Dynamic Lotsizing", European Journal of Operational Research, Vol. 46, pp. 38-47 (1990).

Cattrysse, D., Maes, J., and Van Wassenhove, L.N., "Set Partitioning Heuristic for Capacitated Lotsizing", Working Paper 88-12, Division of Industrial Management, Katholieke Universiteit Leuven, Belgium (1988).

Dzielinski, B.P., and Gomory, R.E., "Optimal Programming of Lot Sizes, Inventory and Labor Allocations", Management Science, Vol. 11, pp. 874-890 (1965).

Fisher, M.L., "Optimal Solution of Scheduling Problems Using Lagrange Multipliers - Part I", Operations Research, Vol. 21, pp. 1114-1127 (1973).

Freeman, D.R., and Jucker, J.V., "The Line Balancing Problem", Journal of Industrial Engineering, Vol. 18, pp. 361-364 (1967).

Gutjahr, A.L., and Nemhauser, G.L., "An Algorithm for the Line Balancing Problem", Management Science, Vol. 11, pp. 308-315 (1964).

Hackman, S.T., Magazine, M.J., and Wee, T.S., "Fast Effective Algorithms
for Simple Assembly Line Balancing Problems", Operations Research,
Vol. 37, pp. 916-924 (1989).

Hoffmann, T.R., "Eureka: A Hybrid System for Assembly Line Balancing", Management Science, Vol. 38, pp. 39-47 (1992).

Ignall, E.J., "A Review of Assembly Line Balancing", The Journal of In-
dustrial Engineering, Vol. 16, pp. 244-254 (1965).

Johnson, RV., "A Branch and Bound Algorithm for Assembly Line Balanc-
ing Problems with Formulation Irregularities", Management Science,
Vol. 29, pp. 1309-1324 (1983).

Johnson, R.V., "Assembly Line Balancing Algorithms: Computational Comparisons", International Journal of Production Research, Vol. 19, pp. 277-287 (1981).

Kilbridge, M.D., and Webster, L., "A Review of Analytical Systems of Line Balancing", Operations Research, Vol. 10, pp. 626-638 (1962).

Lasdon, L.S., and Terjung, R.C., "An Efficient Algorithm for Multi-Item Scheduling", Operations Research, Vol. 19, pp. 946-969 (1971).

Manne, A.S., "Programming of Economic Lot Sizes", Management Science, Vol. 4, pp. 115-135 (1958).

Patterson, J.H., and Albracht, J.J., "Assembly-Line Balancing: Zero-One Programming with Fibonacci Search", Operations Research, Vol. 23, pp. 166-174 (1975).

Pierce, J.F., "Pattern Sequencing and Matching in Stock Cutting Operations", Tappi, Vol. 53, pp. 668-678 (1970).

Salveson, M.E., "The Assembly Line Balancing Problem", Journal of Industrial Engineering, Vol. 6, pp. 18-25 (1955).

Scudder, G.D., "Priority Scheduling and Spare Parts Stocking Policies for a Repair Shop: The Multiple Failure Case", Management Science, Vol. 30, pp. 739-749 (1984).

Stanfel, L.E., "Successive Approximation Procedures for a Cellular Manufacturing Problem with Machine Loading Constraints", Engineering Costs and Production Economics, Vol. 17, pp. 135-147 (1989).

Talbot, F.B., and Gehrlein, W.V., "A Comparative Evaluation of Heuristic Line Balancing Techniques", Management Science, Vol. 32, pp. 430-454 (1986).

Talbot, F.B., and Patterson, J.H., "An Integer Programming Algorithm with Network Cuts for Solving the Assembly Line Balancing Problem", Management Science, Vol. 30, pp. 85-99 (1984).

Vasko, F.J., Wolf, F.E., and Scott, K.L., Jr., "A Set Covering Approach to Metallurgical Grade Assignment", European Journal of Operational Research, Vol. 38, pp. 27-34 (1989).

Vasko, F.J., Wolf, F.E., and Scott, K.L., "Optimal Selection of Ingot Sizes Via Set Covering", Operations Research, Vol. 35, pp. 346-353 (1987).

White, W.W., "Comments on a Paper by Bowman", Operations Research, Vol. 9, pp. 274-276 (1961).
Miscellaneous Operations
Almond, M., "A University Faculty Time Table", Computer Journal, Vol.
12, pp. 215-217 (1969).
Almond, M., "An Algorithm for Constructing University Time-Table", Computer Journal, Vol. 8, pp. 331-340 (1966).
Aneja, Y.P., and Vemuganti, R.R., "Set Covering and Fixed Charge Transportation Problem", Technical Report, University of Baltimore, Maryland (1974).
Arani, T., and Lotfi, V., "A Three Phased Approach to Final Exam Scheduling", IIE Transactions, Vol. 21, pp. 86-96 (1989).
Aubin, J., and Ferland, J.A., "A Large Scale Timetabling Problem", Computers and Operations Research, Vol. 16, pp. 67-77 (1989).
Aust, R.J., "An Improvement Algorithm for School Timetabling", Computer Journal, Vol. 19, pp. 339-345 (1976).
Barham, A.M., and Westwood, J.B., "A Simple Heuristic to Facilitate
Course Timetabling", Journal of the Operational Research Society,
Vol. 29, pp. 1055-1060 (1978).
Broder, S., "Final Examination Scheduling", Communications of the ACM, Vol. 7, pp. 494-498 (1964).
Carter, M.W., and Tovey, C.A., "When is the Classroom Assignment Prob-
lem Hard?", Operations Research, Vol. 40, pp. S28-S39 (1992).
Carter, M.W., "A Lagrangian Relaxation Approach to Classroom Assign-
ment Problem", INFOR, Canadian Journal of Operations Research
and Information Processing, Vol. 27, pp. 230-246 (1989).
Carter, M.W., "A Survey of Practical Applications of Examination Timetable
Scheduling", Operations Research, Vol. 34, pp. 193-202 (1986).

Csima, J., and Gotlieb, C.C., "Tests on a Computer Method for Constructing Timetables", Communications of the ACM, Vol. 7, pp. 160-163 (1964).

Day, R.H., "On Optimal Extracting from a Multiple Data Storage System:
An Application of Integer Programming", Operations Research, Vol.
13, pp. 482-494 (1965).

Dempster, M.A.H., "Two Algorithms for the Timetabling Problem", Combinatorial Mathematics and Applications (Edited by D.J.A. Welsh), Academic Press, pp. 63-85 (1971).

DeWerra, D., "Some Comments on a Note About Timetabling", INFOR, Canadian Journal of Operations Research and Information Processing, Vol. 16, pp. 90-92 (1978).

DeWerra, D., "On a Particular Conference Scheduling Problem", INFOR, Canadian Journal of Operations Research and Information Processing, Vol. 13, pp. 308-315 (1975).

Even, S., Itai, A., and Shamir, A., "On the Complexity of Timetable and Multicommodity Flow Problems", SIAM Journal on Computing, Vol. 5, pp. 691-703 (1976).

Ferland, J.A., and Roy, S., "Timetabling Problem for University as Assignment of Activities to Resources", Computers and Operations Research, Vol. 12, pp. 207-218 (1985).

Frank, R.S., "On the Fixed Charge Hitchcock Transportation Problem", Ph.D. Dissertation, The Johns Hopkins University, Baltimore, Maryland (1972).

Fulkerson, D.R., Nemhauser, G.L., and Trotter, L.E., "Two Computationally Difficult Set Covering Problems that Arise in Computing the 1-Width of Incidence Matrices of Steiner Triple Systems", Mathematical Programming Study, Vol. 2, pp. 72-81 (1974).

Gans, O.B. de, "A Computer Timetabling System for Secondary Schools in the Netherlands", European Journal of Operational Research, Vol. 7, pp. 175-182 (1981).

Garfinkel, R.S., Kunnathur, A.S., and Liepins, G.E., "Optimal Imputation of Erroneous Data: Categorical Data, General Edits", Operations Research, Vol. 34, pp. 744-751 (1986).

Garfinkel, R.S., and Nemhauser, G.L., "Optimal Political Districting by Implicit Enumeration Techniques", Management Science, Vol. 16, pp. B495-B508 (1970).

Glassey, C.R., and Mizrach, M., "A Decision Support System for Assigning Classes to Rooms", Interfaces, Vol. 16, pp. 92-100 (1986).
Gosselin, K., and Truchon, M., "Allocation of Classrooms by Linear Programming", Journal of Operational Research Society, Vol. 37, pp. 561-569 (1986).
Grimes, J., "Scheduling to Reduce Conflict in Meetings", Communications of the ACM, Vol. 13, pp. 351-352 (1970).

Hall, A., and Acton, F., "Scheduling University Course Examination by Computer", Communications of the ACM, Vol. 10, pp. 235-238 (1967).

Heath, L.S., "Covering a Set with Arithmetic Progressions is NP-Complete", Information Processing Letters, Vol. 34, pp. 293-298 (1990).

Hertz, A., "Finding a Feasible Course Schedule Using Tabu Search", Discrete Applied Mathematics, Vol. 35, pp. 255-270 (1992).

Knauer, B.A., "Solution of Timetable Problem", Computers and Operations Research, Vol. 1, pp. 363-375 (1974).

LaPorte, G., and Desroches, S., "Examination Timetabling by Computer", Computers and Operations Research, Vol. 11, pp. 351-360 (1984).

Lions, J., "The Ontario School Scheduling Problem", Computer Journal, Vol. 10, pp. 14-21 (1967).

Markland, R.E., and Nauss, R.M., "Improving Transit Clearing Operations at Maryland National Bank", Interfaces, Vol. 13, pp. 1-9 (1983).

McKeown, P.G., "A Branch-and-Bound Algorithm for Solving Fixed-Charge Problems", Naval Research Logistics Quarterly, Vol. 28, pp. 607-617 (1981).

Mehta, N.K., "The Application of a Graph Coloring Method to an Examination Scheduling Problem", Interfaces, Vol. 11, pp. 57-64 (1981).
Mulvey, J.M., "A Classroom/Time Assignment Model", European Journal
of Operational Research, Vol. 9, pp. 64-70 (1982).
Nawijn, W.M., "Optimizing the Performance of a Blood Analyzer: Application of the Set Partitioning Problem", European Journal of Operational Research, Vol. 36, pp. 167-173 (1988).
Reggia, J.A., Nau, D.S., and Wang, P.Y., "Diagnostic Expert Systems Based on Set Covering Model", International Journal of Man-Machine Studies, Vol. 19, pp. 437-460 (1983).
Thuve, H., "Frequency Planning as a Set Partitioning Problem", European Journal of Operational Research, Vol. 6, pp. 29-37 (1981).
Tripathy, A., "School Timetabling - A Case in Large Binary Linear Integer
Programming", Management Science, Vol. 30, pp. 1473-1489 (1984).
Tripathy, A., "A Lagrangian Relaxation Approach to Course Timetabling",
Journal of the Operational Research Society, Vol. 31, pp. 599-603
(1980).
Valenta, J.R., "Capital Equipment Decision: A Model for Optimal Sys-
tems Interfacing", M.S. Thesis, Massachusetts Institute of Technology
(1969).
Van Slyke, R., "Redundant Set Covering in Telecommunications Networks", Proceedings of the 1982 IEEE Large Scale Systems Symposium, pp. 217-222 (1982).
White, G.M., and Chan, P.W., "Towards the Construction of Optimal Ex-
amination Schedules", INFOR, Canadian Journal of Operations Re-
search and Information Processing, Vol. 17, pp. 219-229 (1979).
Wood, D.C., "A Technique for Coloring a Graph Applicable to Large Scale
Time-Tabling Problems", Computer Journal, Vol. 12, pp. 317-319
(1969).
Woodbury, M., Ciftan, E., and Amos, D., "HLA Serum Screening Based on an Heuristic Solution of the Set Cover Problem", Computer Programs in Biomedicine, Vol. 9, pp. 263-273 (1979).

Routing
Agarwal, Y., Mathur, K., and Salkin, H.M., "A Set-Partitioning Based
Algorithm for the Vehicle Routing Problem", Networks, Vol. 19, pp.
731-750 (1989).
Agin, N., and Cullen, D., "An Algorithm for Transportation Routing and
Vehicle Loading", Logistics (Edited by M. Geisler), North Holland,
Amsterdam, pp. 1-20 (1975).
Altinkemer, K., and Gavish, B., "Parallel Savings Based Heuristics for the
Delivery Problem", Operations Research, Vol. 39, pp. 456-469 (1991).
Altinkemer, K., and Gavish, B., "Heuristics for the Delivery Problem with Constant Error Guarantees", Transportation Science, Vol. 24, pp. 294-297 (1990).
Altinkemer, K., and Gavish, B., "Heuristics for Unequal Weight Delivery Problem with a Fixed Error Guarantee", Operations Research Letters, Vol. 6, pp. 149-158 (1987).
Angel, R., Caudle, W., Noonan, R., and Whinston, A., "A Computer
Assisted School Bus Scheduling", Management Science, Vol. 18, pp.
279-288 (1972).
Anily, S., and Federgruen, A., "Two-Echelon Distribution Systems with
Vehicle Routing Costs and Central Inventories", Operations Research,
Vol. 41, pp. 37-47 (1993).
Anily, S., and Federgruen, A., "Rejoinder to Comments on One-Warehouse Multiple Retailer Systems with Vehicle Routing Costs", Management Science, Vol. 37, pp. 1497-1499 (1991).
Anily, S., and Federgruen, A., "A Class of Euclidean Routing Problems with General Route Cost Functions", Mathematics of Operations Research, Vol. 15, pp. 268-285 (1990a).
Anily, S., and Federgruen, A., "One-Warehouse Multiple Retailer Systems with Vehicle Routing Costs", Management Science, Vol. 36, pp. 92-114 (1990b).
Appelgren, L., "Integer Programming Methods for a Vessel Scheduling
Problem", Transportation Science, Vol. 5, pp. 64-78 (1971).

Appelgren, L., "A Column Generation Algorithm for a Ship Scheduling Problem", Transportation Science, Vol. 3, pp. 53-68 (1969).

Arisawa, S., and Elmaghraby, S.E., "The 'Hub' and 'Wheel' Scheduling Problems, I. The Hub Scheduling Problem: The Myopic Case", Transportation Science, Vol. 11, pp. 124-146 (1977a).

Arisawa, S., and Elmaghraby, S.E., "The 'Hub' and 'Wheel' Scheduling Problems, II. The Hub Operation Scheduling Problem (HOSP): Multi-Period and Infinite Horizon and the Wheel Operation Scheduling Problem (WOSP)", Transportation Science, Vol. 11, pp. 147-165 (1977b).

Assad, A., "Analytic Models in Rail Transportation: An Annotated Bibliography", INFOR, Canadian Journal of Operations Research and Information Processing, Vol. 19, pp. 59-80 (1981).

Assad, A., "Models for Rail Transportation", Transportation Research, Vol. 14A, pp. 205-220 (1980).

Averbakh, I., and Berman, O., "Sales-Delivery Man Problems on Tree-Like Networks", Working Paper, Faculty of Management, University of Toronto, Canada (1992).

Baker, B.M., "Further Improvements to Vehicle Routing Heuristics", Journal of the Operational Research Society, Vol. 43, pp. 1009-1012 (1992).

Baker, E., and Schaffer, J., "Solution Improvement Heuristics for the Vehicle Routing and Scheduling Problem with Time Window Constraints", American Journal of Mathematical and Management Sciences, Vol. 6, pp. 261-300 (1986).

Balinski, M., and Quandt, R., "On an Integer Program for the Delivery Problem", Operations Research, Vol. 12, pp. 300-304 (1964).

Ball, M., Golden, B., Assad, A., and Bodin, L., "Planning for Truck Fleet
Size in the Presence of a Common Carrier Option", Decision Sciences,
Vol. 14, pp. 103-130 (1983).

Bartholdi, J., Platzman, L., Collins, R., and Warden, W., "A Minimal
Technology Routing System for Meals on Wheels", Interfaces, Vol.
13, pp. 1-8 (1983).

Bartlett, T., "An Algorithm for the Minimum Number of Transport Units
to Maintain a Fixed Schedule", Naval Research Logistics Quarterly,
Vol. 4, pp. 139-149 (1957).

Bartlett, T., and Charnes, A., "Cyclic Scheduling and Combinatorial Topology: Assignment of Routing and Motive Power to Meet Scheduling Maintenance Requirements: Part II, Generalizations and Analysis", Naval Research Logistics Quarterly, Vol. 4, pp. 207-220 (1957).

Barton, R., and Gumaer, R., "The Optimum Routing for an Air Cargo
Carrier's Mixed Fleet", Transportation - A Series, pp. 549-561. New
York Academy of Science, New York (1968).

Beasley, J., "Fixed Routes", Journal of the Operational Research Society, Vol. 35, pp. 49-55 (1984).

Beasley, J.E., "Route First-Cluster Second Methods for Vehicle Routing", OMEGA, The International Journal of Management Science, Vol. 11, pp. 403-408 (1983).

Beasley, J., "Adapting the Savings Algorithm for Varying Inter-Customer Travel Times", OMEGA, The International Journal of Management Science, Vol. 9, pp. 658-659 (1981).

Bell, W., Dalberto, L., Fisher, M., Greenfield, A., Jaikumar, R., Kedia, P., Mack, R., and Prutzman, P., "Improving the Distribution of Industrial Gases with an On-Line Computerized Routing and Scheduling Optimizer", Interfaces, Vol. 13, pp. 4-23 (1983).

Bellman, R., "On a Routing Problem", Quarterly of Applied Mathematics, Vol. 16, pp. 87-90 (1958).

Bellmore, M., and Hong, S., "Transformation of the Multisalesman Problem to the Standard Traveling Salesman Problem", Journal of the ACM, Vol. 21, pp. 500-504 (1974).

Beltrami, E., and Bodin, L., "Networks and Vehicle Routing for Municipal Waste Collection", Networks, Vol. 4, pp. 65-94 (1974).

Bennett, B., and Gazis, D., "School Bus Routing by Computer", Transportation Research, Vol. 6, pp. 317-325 (1972).

Bertsimas, D.J., "A Vehicle Routing Problem with Stochastic Demand", Operations Research, Vol. 40, pp. 574-585 (1992).
Bertsimas, D.J., and Van Ryzin, G., "A Stochastic and Dynamic Vehicle Routing Problem in the Euclidean Plane", Operations Research, Vol. 39, pp. 601-615 (1991).
Bertsimas, D., "The Probabilistic Vehicle Routing Problem", Sloan Work-
ing Paper No. 2067-88, Massachusetts Institute of Technology, Cam-
bridge, Massachusetts (1988).
Bodin, L., and Sexton, T., "The Multiple-Vehicle Subscriber Dial-A-Ride Problems", TIMS Studies in the Management Sciences, Vol. 22, pp. 73-76 (1986).
Bodin, L.D., Golden, B.L., Assad, A.A., and Ball, M.O., "Routing and Scheduling of Vehicles and Crews: The State of the Art", Computers and Operations Research, Vol. 10, pp. 65-211 (1983).
Bodin, L., and Golden, B., "Classification in Vehicle Routing and Scheduling", Networks, Vol. 11, pp. 97-108 (1981).
Bodin, L., Golden, B., Schuster, A.D., and Romig, W., "A Model for the Blocking of Trains", Transportation Research, Vol. 14B, pp. 115-120 (1980).
Bodin, L., and Berman, L., "Routing and Scheduling of School Buses by Computer", Transportation Science, Vol. 13, pp. 113-129 (1979).
Bodin, L., and Kursh, S., "A Detailed Description of a Computer System
for the Routing and Scheduling of Street Sweepers", Computers and
Operations Research, Vol. 16, pp. 181-198 (1979).
Bodin, L., and Kursh, S., "A Computer-Assisted System for the Routing
and Scheduling of Street Sweepers", Operations Research, Vol. 26, pp.
527-637 (1978).
Bodin, L., "A Taxonomic Structure for Vehicle Routing and Scheduling
Problems", Computers and Urban Society, Vol. 1, pp. 11-29 (1975).
Bodner, R., Cassell, E., and Andros, P., "Optimal Routing of Refuse Collection Vehicles", Journal of the Sanitary Engineering Division, ASCE, 96(SA4), Proceedings Paper 7451, pp. 893-904 (1970).

Bramel, J., and Simchi-Levi, D., "A Location Based Heuristic for General Routing Problems", Technical Report, Graduate School of Business, Columbia University (1993).

Bramel, J., Coffman, E.G., Shor, P.W., and Simchi-Levi, D., "Probabilistic Analysis of the Capacitated Vehicle Routing Problem With Unsplit Demands", Operations Research, Vol. 40, pp. 1095-1106 (1992).

Brown, G.B., Graves, G.W., and Ronen, D., "Scheduling Ocean Transportation of Crude Oil", Management Science, Vol. 33, pp. 335-346 (1987).

Brown, G., and Graves, G., "Real Time Dispatch of Petroleum Tank Trucks", Management Science, Vol. 27, pp. 19-32 (1981).

Butt, S., and Cavalier, T.M., "A Heuristic for the Multiple Tour Maximum Collection Problem", Department of Industrial and Managerial Systems Working Paper, The Pennsylvania State University, University Park, Pennsylvania (1991).

Cassidy, P.J., and Bennett, H.S., "Tramp - A Multiple Depot Vehicle Scheduling System", Operational Research Quarterly, Vol. 23, pp. 151-163 (1972).

Ceder, A., and Stern, H., "Deficit Function Bus Scheduling and Deadheading Trip Insertions for Fleet Size Reduction", Transportation Science, Vol. 15, pp. 338-363 (1981).

Chard, R., "An Example of An Integrated Man-machine System for Truck Scheduling", Operational Research Quarterly, Vol. 19, pp. 108 (1968).

Charnes, A., and Miller, M.H., "A Model for the Optimal Programming of Railway Freight Train Movements", Management Science, Vol. 3, pp. 74-92 (1956).

Cheshire, I.M., Malleson, A.M., and Naccache, P.F., "A Dual Heuristic for Vehicle Scheduling", Journal of the Operational Research Society, Vol. 33, pp. 51-61 (1982).

Chien, T.W., Balakrishnan, A., and Wong, R.T., "An Integrated Inventory Allocation and Vehicle Routing Problem", Transportation Science, Vol. 23, pp. 67-76 (1989).

Christofides, N., "Vehicle Routing", Traveling Salesman Problem (Edited by E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan and D.S. Shmoys), John Wiley & Sons, pp. 431-448 (1985a).

Christofides, N., "Vehicle Routing", Combinatorial Optimization: Annotated Bibliographies (Edited by M. O'Heigeartaigh, J.K. Lenstra and A.H.G. Rinnooy Kan), pp. 148-163, Centre for Mathematics and Computer Science, Amsterdam (1985b).

Christofides, N., and Beasley, J., "The Period Routing Problem", Net-
works, Vol. 14, pp. 237-256 (1984).

Christofides, N., Mingozzi, A., and Toth, P., "Exact Algorithms for the Ve-
hicle Routing Problem, Based on Spanning Tree and Shortest Path Re-
laxations", Mathematical Programming, Vol. 20, pp. 255-282 (1981a).

Christofides, N., Mingozzi, A., and Toth, P., "State Space Relaxation Pro-
cedures for the Computation of Bounds to Routing Problems", Net-
works, Vol. 11, pp. 145-164 (1981b).

Christofides, N., Mingozzi, A., and Toth, P., "The Vehicle Routing Prob-
lem", In Christofides et al., Combinatorial Optimization, Wiley and
Sons, New York, pp. 315-338 (1979).

Christofides, N., "The Vehicle Routing Problem", Revue Francaise d'Automatique, Informatique et Recherche Operationnelle (RAIRO), Vol. 10, pp. 55-70 (1976).

Christofides, N., and Eilon, S., "Algorithms for Large Scale Traveling Sales-
man Problem", Operational Research Quarterly, Vol. 23, pp. 511-518
(1973).

Christofides, N., "Fixed Routes and Areas of Delivery Operations", International Journal of Physical Distribution, Vol. 1, pp. 87-92 (1971).

Christofides, N., and Eilon, S., "An Algorithm for the Vehicle-Dispatching
Problem", Operational Research Quarterly, Vol. 20, pp. 309-318
(1969).

Clarke, G., and Wright, J.W., "Scheduling of Vehicles from a Central Depot to a Number of Delivery Points", Operations Research, Vol. 12, pp. 568-581 (1964).

Crawford, J.L., and Sinclair, G.B., "Computer Scheduling of Beer Tanker Deliveries", International Journal of Physical Distribution, Vol. 7, pp. 294-304 (1977).
Cullen, F.H., Jarvis, J.J., and Ratliff, H.D., "Set Partitioning Based Heuristic for Interactive Routing", Networks, Vol. 11, pp. 125-143 (1981).
Cunto, E., "Scheduling Boats to Sample Oil Wells in Lake Maracaibo", Operations Research, Vol. 26, pp. 183-196 (1978).
Daganzo, C., "The Distance Traveled to Visit N Points With A Maximum of C Stops Per Vehicle: An Analytical Model and An Application", Transportation Science, Vol. 18, pp. 331-350 (1984).
Daganzo, C., "An Approximate Analytic Model of Many-to-Many Demand Responsive Transportation Systems", Transportation Research, Vol. 12, pp. 325-333 (1978).
Dantzig, G.B., and Ramser, J.H., "The Truck Dispatching Problem", Management Science, Vol. 6, pp. 80-91 (1959).
Desrochers, M., Desrosiers, J., and Solomon, M., "A New Optimization Algorithm for the Vehicle Routing Problem with Time Windows", Operations Research, Vol. 40, pp. 342-354 (1992).
Desrochers, M., Lenstra, J.K., Savelsbergh, M.W.P., and Soumis, F., "Vehicle Routing with Time Windows: Optimization and Approximation", Vehicle Routing: Methods and Studies (Edited by B.L. Golden and A.A. Assad), North-Holland, Amsterdam, pp. 65-84 (1988).
Desrosiers, J., Dumas, Y., and Soumis, F., "A Dynamic Programming Solution of the Large-Scale Single-Vehicle Dial-A-Ride Problem with Time Windows", The American Journal of Mathematical and Management Sciences, Vol. 6, pp. 301-325 (1986).
Desrosiers, J., Soumis, F., Desrochers, M., and Sauve, M., "Routing and Scheduling with Time Windows Solved by Network Relaxation and Branch-and-Bound on Time Variables", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), pp. 451-471, Elsevier Science Publishers (1985).
Desrosiers, J., Soumis, F., and Desrochers, M., "Routing with Time Windows by Column Generation", Networks, Vol. 14, pp. 545-565 (1984).

Doll, L.L., "Quick and Dirty Vehicle Routing Procedures", Interfaces, Vol.
10, pp. 84-85 (1980).
Dror, M., and Trudeau, P., "Split-Delivery Routing", Naval Research Lo-
gistics Quarterly, Vol. 37, pp. 383- 402 (1990).
Dror, M., and Ball, M., "Inventory/Routing: Reduction from an Annual
to a Short-Period Problem", Naval Research Logistics Quarterly, Vol.
34, pp. 891-905 (1987).
Dror, M., and Levy, L., "A Vehicle Routing Improvement Algorithm Comparison of a Greedy and Matching Implementation of Inventory Routing", Computers and Operations Research, Vol. 13, pp. 33-45 (1986).
Dror, M., Ball, M., and Golden, B., "A Computational Comparison of
Algorithms for the Inventory Routing Problem" , Annals of Operations
Research, Vol. 4, pp. 3-23 (1986).
Dulac, G., Ferland, J., and Forgues, P., "School Bus Routes Generator in
Urban Surroundings", Computers and Operations Research, Vol. 1,
pp. 199-213 (1980).
Dumas, Y., Desrosiers, J., and Soumis, F., "The Pick-up and Delivery
Problem with Time Windows", European Journal of Operational Re-
search, Vol. 54, pp. 7-22 (1991).
Eilon, S., Watson-Gandy, C., and Christofides, N., "Distribution Management", Griffin, London (1971).
Eilon, S., and Christofides, N., "An Algorithm for the Vehicle Dispatching Problem", Operational Research Quarterly, Vol. 20, pp. 309-318 (1969).
El-Azm, A., "The Minimum Fleet Size Problem and Its Applications to Bus Scheduling", Computer Scheduling of Public Transport 2 (Edited by J.-M. Rousseau), pp. 493-512, Elsevier Science Publishers (1985).
Etezadi, T., and Beasley, J., "Vehicle Fleet Composition", Journal of the
Operational Research Society, Vol. 34, pp. 87-91 (1983).
Evans, S.R., and Norback, J.P., "The Impact of a Decision Support System
for a Vehicle Routing in a Food Service Supply Situation", Journal of
the Operational Research Society, Vol. 36, pp. 467-472 (1985).

Farvolden, J.M., Laporte, G., and Xu, J., "Solving an Inventory Allocation and Routing Problem Arising in Grocery Distribution", CRT-886, Centre De Recherche Sur Les Transports, Universite De Montreal, Montreal, Canada (1993).
Federgruen, A., and Simchi-Levi, D., "Analytical Analysis of Vehicle Routing and Inventory Routing Problems", Handbooks in Operations Research and Management Science, the volume on "Networks and Distribution" (Edited by M. Ball, C. Magnanti, C. Monma and G. Nemhauser) (1992).
Federgruen, A., and Zipkin, P., "A Combined Vehicle Routing and Inventory Allocation Problem", Operations Research, Vol. 32, pp. 1019-1037 (1984).
Ferebee, J., "Controlling Fixed-Route Operations", Industrial Engineering, Vol. 6, pp. 28-31 (1974).
Ferguson, A., and Dantzig, G., "The Allocation of Aircraft to Routes - An Example of Linear Programming Under Uncertain Demand", Management Science, Vol. 3, pp. 45-73 (1956).
Fisher, M.L., and Rosenwein, M.B., "An Interactive Optimization System for Bulk Cargo Ship Scheduling", Naval Research Logistics Quarterly, Vol. 36, pp. 27-42 (1989).
Fisher, M.L., Greenfield, A.J., Jaikumar, R., and Lester, J.T. III, "A Computerized Vehicle Routing Application", Interfaces, Vol. 12, No. 4, pp. 42-52 (1982).
Fisher, M.L., and Jaikumar, R., "A Generalized Assignment Heuristic for Vehicle Routing", Networks, Vol. 11, pp. 109-124 (1981).
Fletcher, A., "Alternative Routes Round the Delivery Problem", Data and Control, Vol. 1, pp. 20-22 (1963).
Fleuren, H.A., "A Computational Study of the Set Partitioning Approach for Vehicle Routing and Scheduling Problems", Ph.D. Dissertation, Universiteit Twente, Netherlands (1988).
Florian, M., Guerin, G., and Bushell, G., "The Engine Scheduling Problem in a Railway Network", INFOR, Canadian Journal of Operations Research and Information Processing, Vol. 15, pp. 121-138 (1976).

Foster, B.A., and Ryan, D.M., "An Integer Programming Approach to the
Vehicle Scheduling Problem", Operational Research Quarterly, Vol.
27, pp. 367-384 (1976).

Foulds, L.R., Read, E.G., and Robinson, D.F., "A Manual Solution Pro-
cedure for the School Bus Scheduling Problem", Australian Road Re-
search, Vol. 7, pp. 1-35 (1977).

Frederickson, G., Hecht, M., and Kim, C., "Approximation Algorithms for Some Routing Problems", SIAM Journal on Computing, Vol. 7, pp. 178-193 (1978).

Garvin, W., Crandall, H., John, J., and Spellman, R., "Applications of Vehicle Routing in the Oil Industry", Management Science, Vol. 3, pp. 407-430 (1957).

Gaskell, T., "Bases for Vehicle Fleet Scheduling", Operational Research Quarterly, Vol. 18, pp. 281-295 (1967).

Gaudioso, M., and Paletta, G., "A Heuristic for the Periodic Vehicle Rout-
ing Problem", Transportation Science, Vol. 26, pp. 86-92 (1992).

Gavish, B., and Shlifer, E., "An Approach for Solving a Class of Transportation Scheduling Problems", European Journal of Operational Research, Vol. 3, pp. 122-134 (1978).

Gavish, B., Schweitzer, P., and Shlifer, E., "Assigning Buses to Schedules in a Metropolitan Area", Computers and Operations Research, Vol. 5, pp. 129-138 (1978).

Gendreau, M., Hertz, A., and LaPorte, G., "A Tabu Search Heuristic for
the Vehicle Routing Problem", CRT 777, Centre De Recherche Sur
Les Transports, Universite De Montreal, Montreal, Canada (1992).

Gertsbach, I., and Gurevich, Y., "Constructing an Optimal Fleet for a Transportation Schedule", Transportation Science, Vol. 11, pp. 20-36 (1977).

Gheysens, F., Golden, B., and Assad, A., "A Comparison of Techniques
for Solving the Fleet Size and Mix Vehicle Routing Problem", OR
Spektrum, Vol. 6, pp. 207-216 (1984).

Gillett, B.E., and Johnson, J., "Multi-Terminal Vehicle-Dispatch Algorithm", OMEGA, The International Journal of Management Science, Vol. 4, pp. 711-718 (1976).

Gillett, B.E., and Miller, L.R., "A Heuristic Algorithm for the Vehicle-Dispatch Problem", Operations Research, Vol. 22, pp. 340-349 (1974).

Golden, B.L., and Assad, A.A., "Vehicle Routing: Methods and Studies", North-Holland, Amsterdam (1988).

Golden, B., and Wasil, E., "Computerized Vehicle Routing in the Soft Drink Industry", Operations Research, Vol. 35, pp. 6-17 (1987).

Golden, B., and Assad, A.A., "Perspectives on Vehicle Routing: Exciting New Developments", Operations Research, Vol. 34, pp. 803-810 (1986a).

Golden, B., and Assad, A., "Vehicle Routing with Time Window Constraints", American Journal of Mathematical and Management Sciences, Vol. 15, pp. (1986b).

Golden, B., Bodin, L., and Goodwin, T., "Micro Computer-Based Vehicle
Routing and Scheduling Software", Computers and Operations Re-
search, Vol. 13, pp. 277-285 (1986).

Golden, B., and Baker, E., "Future Directions in Logistics Research", Transportation Research, Vol. 19A, pp. 405-409 (1985).

Golden, B., Gheysens, F., and Assad, A., "On Solving the Vehicle Fleet
Size and Mix Problem", Operations Research, (Edited by J.P. Barnes),
North Holland (1984).

Golden, B., Assad, A., and Dahl, R., "Analysis of a Large-Scale Vehicle Routing Problem with An Inventory Component", Large Scale Systems, Vol. 7, pp. 181-190 (1984).

Golden, B., Levy, L., and Dahl, R., "Two Generalizations of the Traveling Salesman Problem", OMEGA, The International Journal of Management Science, Vol. 9, pp. 439-441 (1981).

Golden, B., and Wong, R., "Capacitated Arc Routing Problems", Networks, Vol. 11, pp. 305-315 (1981).

Golden, B., Magnanti, T.L., and Nguyen, H.Q., "Implementing Vehicle Routing Algorithms", Networks, Vol. 7, pp. 113-148 (1977).
Golden, B., "Evaluating a Sequential Vehicle Routing Algorithm", AIIE Transactions, Vol. 9, pp. 204-208 (1977).
Haimovich, M., Rinnooy Kan, A.H.G., and Stougie, L., "Analysis of Heuristics for Vehicle Routing Problem", Vehicle Routing: Methods and Studies (Edited by B.L. Golden, A.A. Assad), Elsevier Science Publishers, pp. 47-61 (1988).
Haimovich, M., and Rinnooy Kan, A., "Bounds and Heuristics for Capacitated Routing Problems", Mathematics of Operations Research, Vol. 10, pp. 527-542 (1985).
Hall, R.W., "Comments on One-Warehouse Multiple Retailer Systems with Vehicle Routing Costs", Management Science, Vol. 37, pp. 1496-1497 (1991).
Hauer, E., "Fleet Selection for Public Transportation Routes", Transporta-
tion Science, Vol. 5, pp. 1-21 (1971).
Hausman, W., and Gilmour, P., "A Multi Period Truck Delivery Problem", Transportation Research, Vol. 1, pp. 349-357 (1967).
Held, M., and Karp, R.M., "The Traveling Salesman Problem and Minimum Spanning Trees", Operations Research, Vol. 18, pp. 1138-1162 (1970).
Holmes, R.A., and Parker, R.G., "A Vehicle Scheduling Procedure Based Upon Savings and A Solution Perturbation Scheme", Operational Research Quarterly, Vol. 27, pp. 83-92 (1976).
Hyman, W., and Gordon, L., "Commercial Airline Scheduling Technique",
Transportation Research, Vol. 2, pp. 23-30 (1968).
Jacobsen, S.K., and Madsen, O., "A Comparative Study of Heuristics for a Two-Level Routing Location Problem", European Journal of Operational Research, Vol. 5, pp. 378-387 (1980).
Jaw, J., Odoni, A., Psaraftis, H., and Wilson, N., "A Heuristic Algorithm for the Multi-Vehicle Advance-Request Dial-A-Ride Problem with Time Windows", Transportation Research, Vol. 20B, pp. 243-257 (1986).

Kirby, R.F., and McDonald, J.J., "The Savings Method for the Vehicle
Scheduling", Operational Research Quarterly, Vol. 24, pp. 305-306
(1972).

Kirby, R.F., and Potts, R.B., "The Minimization Route Problem with Turn
Penalties and Prohibitions", Transportation Research, Vol. 3, pp. 397-
408 (1969).
Knight, K.W., and Hofer, J.P., "Vehicle Scheduling with Timed and Con-
nected Calls: A Case Study", Operational Research Quarterly, Vol.
19, pp. 299-309 (1968).

Kolen, A., Rinnooy Kan, A., and Trienekens, H., "Vehicle Routing with
Time Windows", Operations Research, Vol. 35, pp. 266-273, (1987).

Koskosidis, Y.A., Powell, W.B., and Solomon, M.M., "An Optimization-Based Heuristic for Vehicle Routing and Scheduling with Soft Time Window Constraints", Transportation Science, Vol. 26, pp. 69-85 (1992).

Krolak, P., Felts, W., and Nelson, J., "A Man-Machine Approach Toward
Solving the Generalized Truck Dispatching Problem", Transportation
Science, Vol. 6, pp. 149-169 (1972).

Labbe, M., Laporte, G., and Mercure, H., "Capacitated Vehicle Routing
on Trees", Operations Research, Vol. 39, pp. 616-622 (1991).

Laderman, J.L., Gleiberman, L., and Egan, J.F., "Vessel Allocation by Linear Programming", Naval Research Logistics Quarterly, Vol. 13, pp. 315-320 (1966).

Lam, T., "Comments on a Heuristic Algorithm for the Multiple Terminal Delivery Problem", Transportation Science, Vol. 4, pp. 403-405 (1970).

LaPorte, G., "The Vehicle Routing Problem: An Overview of Exact and Approximate Algorithms", European Journal of Operational Research, Vol. 59, pp. 345-358 (1992a).

LaPorte, G., "The Traveling Salesman Problem: An Overview of Exact and Approximate Algorithms", European Journal of Operational Research, Vol. 59, pp. 231-248 (1992b).

LaPorte, G., Nobert, Y., and Taillefer, S., "Solving a Family of Multi-Depot
Vehicle Routing and Location Routing Problems" , Transportation Sci-
ence, Vol. 22, pp. 161-172 (1988).

LaPorte, G., Nobert, Y., and Taillefer, S., "A Branch-and-Bound Algorithm for the Asymmetrical Distance Constrained Vehicle Routing Problem", Mathematical Modelling, Vol. 9, pp. 857-868 (1987).

LaPorte, G., and Nobert, Y., "Exact Algorithms for the Vehicle Routing
Problem", Annals of Discrete Mathematics, Vol. 31, pp. 147-184
(1987).

Laporte, G., Mercure, H., and Nobert, Y., "An Exact Algorithm for the Asymmetrical Capacitated Vehicle Routing Problem", Networks, Vol. 16, pp. 33-46 (1986).

LaPorte, G., Nobert, Y., and Arpin, D., "An Exact Algorithm for Solving Capacitated Location-Routing Problem", Annals of Operations Research, Vol. 6, pp. 239-310 (1986).

LaPorte, G., Nobert, Y., and Desrochers, M., "Optimal Routing Under Capacity and Distance Restrictions", Operations Research, Vol. 33, pp. 1050-1073 (1985).

LaPorte, G., Desrochers, M., and Nobert, Y., "Two Exact Algorithms for
the Distance - Constrained Vehicle Routing Problem", Networks, Vol.
14, pp. 161-172 (1984).

LaPorte, G., and Nobert, Y., "Comb Inequalities for the Vehicle Routing
Problem", Methods of Operations Research, Vol. 51, pp. 271-276
(1984).

LaPorte, G., and Nobert, Y., "An Exact Algorithm for Minimizing Rout-
ing and Operating Costs in Depot Locations", European Journal of
Operational Research, Vol. 6, pp. 224-226 (1981).

Lenstra, J.K., and Rinnooy Kan, A.H.G., "Complexity of Vehicle Routing and Scheduling Problems", Networks, Vol. 11, pp. 221-227 (1981).

Lenstra, J.K., and Rinnooy Kan, A.H.G., "On General Routing Problems", Networks, Vol. 6, pp. 273-280 (1976).

Lenstra, J.K., and Rinnooy Kan, A.H.G., "Some Simple Applications of the Traveling Salesman Problem", Operational Research Quarterly, Vol. 26, pp. 717-733 (1975).

Levary, R., "Heuristic Vehicle Scheduling", OMEGA, The International Journal of Management Science, Vol. 9, pp. 660-663 (1981).

Levin, A., "Scheduling and Fleet Routing Models for Transportation Systems", Transportation Science, Vol. 5, pp. 232-255 (1971).

Levy, L., Golden B., and Assad, A., "The Fleet Size and Mix Vehicle
Routing Problem" , Management Science and Statistics Working Paper
80-011, College of Business and Management, University of Maryland
College Park, Maryland (1980).

Li, Chung-Lun, Simchi-Levi, D., and Desrochers, M., "On the Distance Constrained Vehicle Routing Problem", Operations Research, Vol. 40, pp. 790-799 (1992).

Li, Chung-Lun, and Simchi-Levi, D., "Worst Case Analysis of Heuristics for
Multidepot Capacitated Vehicle Routing Problems", ORSA Journal
on Computing, Vol. 2, pp. 64-73 (1990).

Lin, S., and Kernighan, B., "An Effective Heuristic Algorithm for the Trav-
eling Salesman Problem", Operations Research, Vol. 21, pp. 498-516
(1973).

Lin, S., "Computer Solutions of the Traveling Salesman Problem", Bell System Technical Journal, Vol. 44, pp. 2245-2269 (1965).

Lucena Filho, A.P., "Exact Solution Approaches for the Vehicle Routing
Problem", Ph.D. Thesis, Imperial College, London, (1986).

Magnanti, T.L., "Combinatorial Optimization and Vehicle Fleet Planning: Perspectives and Prospects", Networks, Vol. 11, pp. 179-213 (1981).

Malandraki, C., and Daskin, M.S., "Time Dependent Vehicle Routing Problems: Formulations and Heuristic Algorithms", Technical Report, Department of Civil Engineering, Northwestern University (1989).

Male, J., Liebman, J., and Orloff, C., "An Improvement of Orloff's General
Routing Problem" , Networks, Vol. 7, pp. 89-92 (1977).

Marquez Diez-Canedo, J., and Escalante, O., "A Network Solution to a General Vehicle Scheduling Problem", European Journal of Operational Research, Vol. 1, pp. 255-261 (1977).

Martin-Lof, A., "A Branch-And-Bound Algorithm for Determining the Minimal Fleet Size of a Transportation System", Transportation Science, Vol. 4, pp. 159-163 (1970).

McDonald, J.J., "Vehicle Scheduling: A Case Study", Operational Research Quarterly, Vol. 23, pp. 433-444 (1972).

McKay, M.D., and Hartley, H.O., "Computerized Scheduling of Seagoing Tankers", Naval Research Logistics Quarterly, Vol. 21, pp. 255-264 (1974).
Minas, J.G., and Mitten, L.G., "The Hub Operation Scheduling Problem",
Operations Research, Vol. 6, pp. 329-345 (1958).

Minieka, E., "The Chinese Postman Problem for Mixed Networks", Management Science, Vol. 25, pp. 643-648 (1979).

Mole, R., "The Curse of Unintended Rounding Error, A Case from the Vehicle Scheduling Literature", Journal of the Operational Research Society, Vol. 34, pp. 607-613 (1983).

Mole, R., Johnson, D.G., and Wells, R., "Combinatorial Analysis for the Route First-Cluster Second Vehicle Routing", OMEGA, The International Journal of Management Science, Vol. 11, pp. 507-512 (1983).

Mole, R., "A Survey of Local Delivery Vehicle Routing Methodology", Journal of the Operational Research Society, Vol. 30, pp. 245-252 (1979).

Mole, R., and Jameson, S., "A Sequential Route-Building Algorithm Employing a Generalized Savings Criterion", Operational Research Quarterly, Vol. 27, pp. 503-511 (1976).

Nelson, M.D., Nygard, K.E., Griffin, J.H., and Shreve, W.E., "Implemen-
tation Techniques for the Vehicle Routing Problem", Computers and
Operations Research, Vol. 12, pp. 273-283 (1985).
Nemhauser, G., "Scheduling Local and Express Trains", Transportation
Science, Vol. 3, pp. 164-175 (1969).

Newton, R., and Thomas, W., "Bus Routing in a Multi-School System", Computers and Operations Research, Vol. 1, pp. 213-222 (1974).

Newton, R., and Thomas, W., "Design of School Bus Routes by Computer", Socio-Economic Planning Sciences, Vol. 3, pp. 75-85 (1969).

Norback, J.P., and Evans, S.R., "An Heuristic Method for Solving Time-Sensitive Routing Problems", Journal of the Operational Research Society, Vol. 35, pp. 407-414 (1984).

Orloff, C., "On General Routing Problems: Comments", Networks, Vol. 6, pp. 281-284 (1976a).

Orloff, C., "Route Constrained Fleet Scheduling", Transportation Science, Vol. 10, pp. 149-168 (1976b).

Orloff, C., and Caprera, D., "Reduction and Solution of Large Scale Vehicle Routing Problems", Transportation Science, Vol. 10, pp. 361-373 (1976).

Orloff, C., "A Fundamental Problem in Vehicle Routing", Networks, Vol. 4, pp. 35-64 (1974a).

Orloff, C., "Routing A Fleet of M Vehicles to/from a Central Facility", Networks, Vol. 4, pp. 147-162 (1974b).

Olson, C.A., Sorenson, E.E., and Sullivan, W.J., "Medium Range Scheduling for Freighter Fleet", Operations Research, Vol. 17, pp. 255-264 (1969).

Paessens, H., "The Savings Algorithms for the Vehicle Routing Problem", European Journal of Operational Research, Vol. 34, pp. 336-344 (1988).

Peterson, F.R., and Fullerton, H.V., "An Optimizing Network Model for the Canadian Railways", Rail Int., Vol. 4, pp. 1187-1192 (1973).

Pierce, J.F., "Direct Search Algorithms for Truck Dispatching Problems - Part I", Transportation Science, Vol. 3, pp. 1-42 (1969).

Pollack, M., "Some Elements of the Airline Fleet Planning Problem", Transportation Research, Vol. 11, pp. 301-310 (1977).

Potvin, J.-Y., and Rousseau, J.-M., "A Parallel Route Building Algorithm for the Vehicle Routing and Scheduling Problem with Time Windows", European Journal of Operational Research, Vol. 66, pp. 331-340 (1993).

Potvin, J.-Y., Kervahut, T., and Rousseau, J.-M., "A Tabu Heuristic for the Vehicle Routing Problem with Time Windows", CRT-855, Centre De Recherche Sur Les Transports, Universite De Montreal, Montreal, Canada (1992).

Psaraftis, H.N., "Dynamic Vehicle Routing: Is It A Simple Extension of Static Routing?", CORS/TIMS/ORSA, Vancouver, Working Paper MIT-OR-89-1 (1989).

Psaraftis, H.N., "Dynamic Vehicle Routing Problems", Vehicle Routing: Methods and Studies (Edited by B. Golden and A. Assad), North Holland (1988).

Psaraftis, H.N., "Scheduling Large Scale Advance Request Dial-A-Ride Systems", American Journal of Mathematical and Management Science, Vol. 6, pp. (1986).

Psaraftis, H.N., "An Exact Algorithm for the Single Vehicle Many-to-Many Dial-A-Ride Problem with Time Windows", Transportation Science, Vol. 17, pp. 351-357 (1983a).

Psaraftis, H.N., "Analysis of an O(N²) Heuristic for the Single Vehicle Many-to-Many Euclidean Dial-a-Ride Problem", Transportation Research, Vol. 17B, pp. 133-145 (1983b).

Psaraftis, H.N., "K-Interchange Procedures for Local Search in a Precedence-Constrained Routing Problem", European Journal of Operational Research, Vol. 13, pp. 391-402 (1983c).

Psaraftis, H.N., "A Dynamic Programming Solution to the Single Vehicle Many-to-Many Immediate Request Dial-A-Ride Problem", Transportation Science, Vol. 14, pp. 130-154 (1980).

Pullen, H.G.M., and Webb, M.H.J., "A Computer Application to a Transport Scheduling Problem", Computer Journal, Vol. 10, pp. 10-13 (1967).

Raft, O.M., "A Modular Algorithm for an Extended Vehicle Scheduling Problem", European Journal of Operational Research, Vol. 11, pp. 67-76 (1982).
Rao, M.R., and Zionts, S., "Allocation of Transportation Units to Alternative Trips - A Column Generation Scheme with Out-of-Kilter Subproblems", Operations Research, Vol. 16, pp. 52-63 (1968).
Richardson, R., "An Optimization Approach to Routing Aircraft", Trans-
portation Science, Vol. 10, pp. 52-71 (1976).
Robertson, W.C., "Route and Van Scheduling in the Newspaper Industry",
Operational Research Quarterly, Special Conference Issue, Vol. 20, pp.
99 - (1969).
Ronen, D., "Allocation of Trips to Trucks Operating from a Single Ter-
minal", Computers and Operations Research, Vol. 19, pp. 445-451
(1992).
Ronen, D., "Short-Term Scheduling of Vessels for Shipment of Bulk and Semi-Bulk Commodities Originating in a Single Area", Operations Research, Vol. 34, pp. 164-173 (1986).
Ronen, D., "Cargo Ships Routing and Scheduling: A Survey of Models and
Problems", European Journal of Operational Research, Vol. 12, pp.
119-126 (1983).
Russell, R., and Igo, W., "An Assignment Routing Problem", Networks,
Vol. 9, pp. 1-17 (1979).
Russell, R.A., "An Effective Heuristic for the M-Tour Traveling Salesman
Problem with Some Side Constraints", Operations Research, Vol. 25,
pp. 517-524 (1977).
Saha, J.L., "An Algorithm for Bus Scheduling Problems", Operational Research Quarterly, Vol. 21, pp. 463-474 (1970).
Salzborn, F., "Minimum Fleet Size Models for Transportation Systems",
Transportation and Traffic Theory (Edited by D. Buckley), Reed, Syd-
ney, pp. 607-624 (1974).
Salzborn, F., "A Note on Fleet Routing", Transportation Research, Vol. 7, pp. 335-355 (1972a).

Salzborn, F., "A Note on Fleet Routing Models for Transportation Systems", Transportation Science, Vol. 6, pp. 335-337 (1972b).

Salzborn, F., "The Minimum Fleet Size for a Suburban Railways System",
Transportation Science, Vol. 4, pp. 383-402 (1970).

Salzborn, F., "Timetables for a Suburban Rail Transit System", Transportation Science, Vol. 3, pp. 297-316 (1969).

Savelsbergh, M.W.P., "An Efficient Implementation of Local Search Algorithms for Constrained Routing Problems", European Journal of Operational Research, Vol. 47, pp. 75-85 (1990).

Savelsbergh, M.W.P., "Local Search in Routing Problems with Time Windows", Annals of Operations Research, Vol. 4, pp. 285-305 (1985).

Schrage, L., "Formulation and Structure of More Complex/Realistic Routing and Scheduling Problems", Networks, Vol. 11, pp. 229-232 (1981).

Schultz, H., "A Practical Method for Vehicle Scheduling", Interfaces, Vol.
9, pp. 13-19 (1979).

Sexton, T., and Choi, Y., "Pick-up and Delivery of Partial Loads with Time
Windows", The American Journal of Mathematical and Management
Sciences, Vol. 6, pp. 369-398 (1986).

Sexton, T., and Bodin, L., "Optimizing Single Vehicle Many-to-Many Op-
erations with Desired Delivery Times: II Routing", Transportation
Science, Vol. 19, pp. 411-435 (1985a).

Sexton, T., and Bodin, L., "Optimizing Single Vehicle Many-to-Many Op-
erations with Desired Delivery Times: I, Scheduling", Transportation
Science, Vol. 19, pp. 378-410 (1985b).

Sexton, T., and Choi, Y., "Routing and Scheduling Problems with Time Windows, Partial Loads and Dwell Times", American Journal of Mathematical and Management Sciences, Vol. pp. (1972).

Simpson, R., "A Review of Scheduling and Routing Models for Airline
Scheduling", Proceedings, Ninth AGIFORS Symposium, Operations
Research Division, American Airlines, New York (1969).

Smith, B., and Wren, A., "VAMPIRES and TASC: Two Successfully Applied Bus Scheduling Programs", Computer Scheduling of Public Transport: Urban Passenger Vehicle and Crew Scheduling (Edited by A. Wren), North-Holland Publishing Company, Amsterdam, pp. 97-124 (1981).

Solomon, M., and Desrosiers, J., "Time Window Constrained Routing and
Scheduling Problems", Transportation Science, Vol. 22, pp. 1-13
(1988).

Solomon, M., "Algorithms for the Vehicle Routing and Scheduling Prob-
lems with Time Windows Constraints", Operations Research, Vol. 35,
pp. 254-265 (1987).

Solomon, M.M., "On the Worst-Case Performance of Some Heuristics for the Vehicle Routing and Scheduling Problems with Time Window Constraints", Networks, Vol. 16, pp. 161-174 (1986).

Soumis, F., Ferland, J., and Rousseau, J., "A Model for Large Scale Aircraft Routing and Scheduling Problems", Transportation Research, Vol. 14B, pp. 191-201 (1980).

Spaccamela, M.A., Rinnooy Kan, A., and Stougie, L., "Hierarchical Vehicle
Routing Problems", Networks, Vol. 14, pp. 571-586 (1984).

Stein, D., "Scheduling Dial-A-Ride Transportation Systems", Transportation Science, Vol. 12, pp. 232-249 (1978).

Stein, D.M., "An Asymptotic, Probabilistic Analysis of a Routing Problem", Mathematics of Operations Research, Vol. 3, pp. 89-101 (1978).

Stern, H., and Dror, M., "Routing Electric Meter Readers", Computers and Operations Research, Vol. 6, pp. 209-233 (1979).

Stewart, W.R., and Golden, B., "A Lagrangian Relaxation Heuristic for Vehicle Routing", European Journal of Operational Research, Vol. 15, pp. 84-88 (1984).

Stewart, W., and Golden, B., "Computing Effective Subscriber Bus Routes", Proceedings, 1980 SE TIMS Conference (Edited by P. Dearing, G. Worm), Virginia Beach, pp. 170-178 (1981).

Stewart, W., and Golden, B., "The Subscriber Bus Routing Problem", Pro-
ceedings, IEEE International Conference on Circuits and Computers,
Port Chester, New York, pp. 153-156 (1980).
Stricker, R., "Public Sector Vehicle Routing: The Chinese Postman Prob-
lem" , M.S. Thesis, Massachusetts Institute of Technology, Cambridge,
Massachusetts (1970).
Sumichrast, R.T., and Markham, I.S., "Routing Delivery Vehicles with
Multiple Sources, Destinations and Depots", TIMS/ORSA Joint Na-
tional Meeting (1993).
Sutcliffe, C., and Board, J., "The Ex-Ante Benefits of Solving Vehicle-
Routing Problems", Journal of the Operational Research Society, Vol.
42, pp. 135-143 (1991).
Szpigel, V., "Optimal Train Scheduling on a Single Track Railway", Operations Research, Vol. 20, pp. 343-352 (1972).
Tan, C., and Beasley, J., "A Heuristic Algorithm for the Period Vehicle Routing Problem", OMEGA, The International Journal of Management Science, Vol. 12, pp. 497-504 (1984).
Tillman, F., and Cain, T., "An Upper Bounding Algorithm for the Single
and Multiple Terminal Delivery Problem", Management Science, Vol.
18, pp. 664-682 (1972).
Tillman, F., and Hering, R., "A Study of a Look-Ahead Procedure for Solv-
ing the Multiterminal Delivery Problem", Transportation Research,
Vol. 5, pp. 225-229 (1971).
Tillman, F.A., "The Multiple Terminal Delivery Problem with Probabilis-
tic Demands", Transportation Science, Vol. 3, pp. 192-204 (1969).
Tillman, F.A., and Cochran, H., "A Heuristic Approach for Solving the
Delivery Problem", Journal of Industrial Engineering, Vol. 19, pp.
354-358 (1968).
Turner, W., Ghare, P., and Foulds, L., "Transportation Routing Problem - A Survey", AIIE Transactions, Vol. 6, pp. 288-301 (1976).
Turner, W., and Hougland, E., "The Optimal Routing of Solid Waste Collection Vehicles", AIIE Transactions, Vol. 7, pp. 427-431 (1975).

Tyagi, M., "A Practical Method for the Truck Dispatching Problem", Journal of the Operations Research Society of Japan, Vol. 10, pp. 76-92 (1968).
Unwin, E., "Bases for Vehicle Fleet Scheduling", Operational Research
Quarterly, Vol. 19, pp. 201-202 (1968).
Van Leeuwen, P., and Volgenant, A., "Solving Symmetric Vehicle Routing Problems Asymmetrically", European Journal of Operational Research, Vol. 12, pp. 388-393 (1983).

Watson-Gandy, C.D.T., and Foulds, L.R., "The Vehicle Scheduling Problem: A Survey", New Zealand Operational Research Quarterly, Vol. 23, pp. 361-372 (1972).

Webb, M.H.J., "Relative Performance of Some Sequential Methods of Planning Multiple Delivery Journeys", Operational Research Quarterly, Vol. 23, pp. 361-372 (1972).

White, W., and Bomberault, A., "A Network Algorithm for Empty Freight
Car Allocation", IBM Systems Journal, Vol. 9, pp. 147-169 (1969).

Williams, B., "Vehicle Scheduling: Proximity Searching", Journal of Operational Research Society, Vol. 33, pp. 961-966 (1982).

Wolters, J., "Minimizing the Number of Aircraft for a Transportation Network", European Journal of Operational Research, Vol. 3, pp. 394-402 (1979).
Wren, A., and Holliday, A., "Computer Scheduling of Vehicles from One or More Depots to a Number of Delivery Points", Operational Research Quarterly, Vol. 23, pp. 333-344 (1972).

Wren, A., (Ed.), Computer Scheduling of Public Transport: Urban Passenger Vehicle and Crew Scheduling, North-Holland Publishing Company, Amsterdam (1981).

Yellow, P., "A Computational Modification to the Savings Method of Vehicle Scheduling", Operational Research Quarterly, Vol. 21, pp. 281-283 (1970).
Young, D., "Scheduling a Fixed Schedule, Common Carrier Passenger
Transportation System", Transportation Science, Vol. 4, pp. 243-269
(1970).

Location
Akinc, V., and Khumawala, B.M., "An Efficient Branch and Bound Al-
gorithm for the Capacitated Warehouse Location Problem", Manage-
ment Science, Vol. 23, pp. 585-594 (1977).
Alao, N., "Two Classes of Distance Minimization Problems: A Review, Some Interpretations and Extensions", Geographical Analysis, Vol. 3, pp. 299-319 (1971).
Aneja, Y.P., Chandrasekaran, R., and Nair, R.P.K., "A Note on the M-Center Problem with Rectilinear Distances", European Journal of Operational Research, Vol. 35, pp. 118-123 (1988).
Armour, G.C., and Buffa, E.S., "A Heuristic Algorithm and Simulation
Approach to the Relative Location of Facilities" , Management Science,
Vol. 9, pp. 294-309 (1963).
Atkins, R.J., and Shriver, R.H., "New Approaches to Facilities Location", Harvard Business Review, pp. 70-79 (1968).
Baker, K.R., "A Heuristic Approach to Locating a Fixed Number of Fa-
cilities", Logistics and Transportation Review, Vol. 10, pp. 195-205
(1974).
Balakrishnan, P.V., and Storbeck, J.E., "MCTHRESH: Modeling Maxi-
mum Coverage with Threshold Constraints", Environment and Plan-
ning B: Planning and Design, Vol. 18, pp. 459-472 (1991).
Balas, E., "A Class of Location, Distribution and Scheduling Problems: Modelling and Solution Methods", Revue Belge de Statistique, d'Informatique et de Recherche Operationnelle, Vol. 22, pp. 36-57 (1983).
Ball, M.O., and Lin, F.L., "A Reliability Model Applied to Emergency Service Vehicle Location", Operations Research, Vol. 41, pp. 18-36 (1993).
Ballou, R., "Locating Warehouses in a Logistics System", The Logistics
Review, Vol. 4, pp. 23-40 (1968).
Barcelo, J., and Casanovas, J., "A Heuristic Lagrangian Algorithm for the Capacitated Plant Location Problem", European Journal of Operational Research, Vol. 15, pp. 212-226 (1984).

Batta, R., and Mannur, N.R., "Covering-Location Models for Emergency Situations that Require Multiple Response Units", Management Science, Vol. 36, pp. 16-23 (1990).
Batta, R., Dolan, J.M., and Krishnamurty, N.N., "The Maximal Expected Covering Location Problem: Revisited", Transportation Science, Vol. 23, pp. 277-287 (1989).
Baumol, W.J., and Wolfe, P., "A Warehouse-Location Problem", Operations Research, Vol. 6, pp. 252-263 (1958).
Beckmann, M., "Principles of Optimum Location for Transportation Networks", Quantitative Geography, Atherton Press, NY (1963).
Bell, T., and Church, R., "Location and Allocation Modelling in Archeological Settlement Pattern Search: Some Preliminary Applications", World Archeology, Vol. 16, pp. 354-371 (1985).
Bellman, R., "An Application of Dynamic Programming to Location-Allocation Problems", SIAM Review, Vol. 7, pp. 126-128 (1965).
Bennett, V.L., Eaton, D.J., and Church, R.L., "Selecting Sites for Rural Health Workers", Social Sciences of Medicine, Vol. 16, pp. 63-72 (1982).
Berlin, G.N., and Liebman, J.C., "Mathematical Analysis of Emergency Ambulance Location", Socio-Economic Planning Sciences, Vol. 8, pp. 323-328 (1971).
Bertsimas, D., "Two Traveling Salesman Facility Location Problems", Sloan Working Paper No. 2068-88, Massachusetts Institute of Technology, Cambridge, Massachusetts (1988).
Bilde, O., and Krarup, J., "Sharp Lower Bounds and Efficient Algorithms for the Simple Plant Location Problem", Annals of Discrete Mathematics, Vol. 1, pp. 79-97 (1977).
Bindschedler, A.E., and Moore, J.M., "Optimal Location of New Machines in Existing Plant Layouts", The Journal of Industrial Engineering, Vol. 12, pp. 41-47 (1961).
Bouliane, J., "Locating Postal Relay Boxes Using a Set Covering Algorithm", American Journal of Mathematical and Management Sciences, Vol. 12, pp. 65-74 (1992).

Brandeau, M., and Chiu, S., "An Overview of Representative Problems
in Location Research", Management Science, Vol. 35, pp. 645-673
(1989).

Brown, P.A., and Gibson, D.F., "A Quantified Model for Facility Site
Selection-Application to a Multiplant Location Problem", AIIE Trans-
actions, Vol. 4, pp. 1-10 (1972).

Burstall, R.M., Leaver, R.A., and Sussams, J.E., "Evaluation of Trans-
port Costs for Alternative Factory Sites - A Case Study", Operational
Research Quarterly, Vol. 13, pp. 345-354 (1962).

Cabot, A.V., Francis, R.L., and Stary, M.A., "A Network Flow Solu-
tion to a Rectilinear Distance Facility Location Problem", AIIE Trans-
actions, Vol. 2, pp. 132-141 (1970).

Cerveny, R.P., "An Application of Warehouse Location Technique to Blood-
mobile Operations", Interfaces, Vol. 10, pp. 88-94 (1980).

Chaiken, J.M., "Transfer of Emergency Service Deployment Models to Op-
erating Agencies", Management Science, Vol. 24, pp. 719-731 (1978).

Chan, A.W., and Francis, R.L., "A Round-Trip Location Problem on a
Tree Graph", Transportation Science, Vol. 10, pp. 35-51 (1976).

Chandrasekaran, R., and Daughety, A., "Location on Tree Networks: P-
Center and N-Dispersion Problems", Mathematics of Operations Re-
search, Vol. 6, pp. 50-57 (1981).

Charnes, A., and Storbeck, J., "A Goal Programming Model for the Siting
of Multilevel EMS Systems", Socio-Economic Planning Sciences, Vol.
14, pp. 155-161 (1980).

Chaudry, S.S., McCormick, S., and Moon, D., "Locating Independent Fa-
cilities with Maximum Weight: Greedy Heuristics", OMEGA, The
International Journal of Management Science, Vol. 14, pp. 383-389
(1986).

Chhajed, D., and Lowe, T.J., "M-Median and M-Center Problems with
Mutual Communication: Solvable Special Cases", Operations Research,
Vol. 40, pp. S56-S66 (1992).

Cho, D.C., Johnson, E.L., Padberg, M., and Rao, M.R., "On the Unca-
pacitated Plant Location Problem I: Valid Inequalities and Facets",
Mathematics of Operations Research, Vol. 8, pp. 579-589 (1983).

Cho, D.C., Padberg, M., and Rao, M.R., "On the Uncapacitated Plant
Location Problem II: Facets and Lifting Theorems", Mathematics of
Operations Research, Vol. 8, pp. 590-612 (1983).

Chrissis, J.W., Davis, R.P., and Miller, D.M., "The Dynamic Set Covering
Problem", Applied Mathematics Modelling, Vol. 6, pp. 2-6 (1982).

Christofides, N., and Viola, P., "The Optimum Location of Multi-Centers
on a Graph", Operational Research Quarterly, Vol. 22, pp. 145-154
(1971).

Church, R., Current, J., and Storbeck, J., "A Bicriterion Maximal Covering
Location Formulation Which Considers the Satisfaction of Uncovered
Demand", Decision Sciences, Vol. 22, pp. 38-52 (1991).

Church, R.L., and Eaton, D.J., "Hierarchical Location Analysis Using
Covering Objectives", Spatial Analysis and Location-Allocation Mod-
els (Edited by A. Ghosh and G. Rushton), Van Nostrand Reinhold
Company, Inc., New York, pp. 163-185 (1987).

Church, R.L., and Weaver, J.R., "Theoretical Links Between Median and
Coverage Location Problems", Annals of Operations Research, Vol. 6,
pp. 1-19 (1986).

Church, R.L., and Roberts, K.L., "Generalized Coverage Models and Pub-
lic Facility Location", Papers of the Regional Science Association, Vol.
53, pp. 117-135 (1983).

Church, R.L., and Meadows, M.E., "Location Modeling Utilizing Maximum
Service Distance Criteria", Geographical Analysis, Vol. 11, pp. 358-
373 (1979).

Church, R.L., and Garfinkel, R.S., "Locating an Obnoxious Facility on a
Network", Transportation Science, Vol. 12, pp. 107-118 (1979).

Church, R.L., and Meadows, M.E., "Results of a New Approach to Solving
the P-Median Problem with Maximum Distance Constraints", Geo-
graphical Analysis, Vol. 9, pp. 364-378 (1977).

Church, R.L., and ReVelle, C., "Theoretical and Computational Links Be-
tween the p-median, Location Set-Covering, and the Maximal Cover-
ing Location Problem", Geographical Analysis, Vol. 8, pp. 406-415
(1976).

Church, R.L., and ReVelle, C., "The Maximal Covering Location Prob-
lem", Papers of the Regional Science Association, Vol. 32, pp. 101-
118 (1974).

Cohon, J.L., ReVelle, C.S., Current, J., Eagles, T., Eberhart, R., and
Church, R., "Application of Multiobjective Facility Location Model to
Power Plant Siting in a Six-State Region of the U.S.", Computers and
Operations Research, Vol. 7, pp. 107-123 (1980).

Conway, R.W., and Maxwell, W.L., "A Note on the Assignment of Facility
Locations", The Journal of Industrial Engineering, Vol. 12, pp. 34-36
(1961).

Cooper, L., "The Transportation-Location Problem", Operations Research,
Vol. 20, pp. 94-108 (1972).

Cooper, L., "An Extension of the Generalized Weber Problem", The Jour-
nal of Regional Science, Vol. 8, pp. 181-197 (1968).

Cooper, L., "Solutions of Generalized Location Equilibrium Models", Jour-
nal of Regional Science, Vol. 7, pp. 1-18 (1967).

Cooper, L., "Heuristic Methods for Location and Allocation Problems",
SIAM Review, Vol. 6, pp. 37-53 (1964).

Cooper, L., "Location-Allocation Problems", Operations Research, Vol.
11, pp. 331-343 (1963).

Cornuejols, G., Nemhauser, G.L., and Wolsey, L.A., "The Uncapacitated
Facility Location Problem", In Discrete Location Theory (Edited by
R.L. Francis and P. Mirchandani), Wiley Interscience (1990).

Cornuejols, G., Fisher, M.L., and Nemhauser, G.L., "Location of Bank
Accounts to Optimize Float: An Analytic Study of Exact and Ap-
proximate Algorithms", Management Science, Vol. 23, pp. 789-810
(1977a).

Cornuejols, G., Fisher, M.L., and Nemhauser, G.L., "On the Uncapacitated
Location Problem", Annals of Discrete Mathematics, Vol. 1, pp. 163-
177 (1977b).

Current, J., and O'Kelly, M., "Locating Emergency Warning Sirens", De-
cision Sciences, Vol. 23, pp. 221-234 (1992).

Current, J.R., and Schilling, D.A., "Analysis of Errors Due to Demand Data
Aggregation in the Set Covering and Maximal Covering Location Prob-
lems", Geographical Analysis, Vol. 22, pp. 116-126 (1990).

Current, J.R., and Schilling, D.A., "The Covering Salesman Problem",
Transportation Science, Vol. 23, pp. 208-213 (1989).

Current, J.R., and Storbeck, J.E., "Capacitated Covering Models", Envi-
ronment and Planning B: Planning and Design, Vol. 15, pp. 153-163
(1988).

Current, J.R., and Storbeck, J.E., "Satisfying Solutions to Infeasible Set
Partitions", Environment and Planning B: Planning and Design, Vol.
14, pp. 182-192 (1987).

Current, J.R., ReVelle, C.S., and Cohon, J.L., "The Maximum/Shortest
Path Problem: A Multiobjective Network Design and Routing For-
mulation", European Journal of Operational Research, Vol. 21, pp.
189-199 (1985).

Daskin, M., Haghani, A.E., Khanal, M., and Malandraki, C., "Aggrega-
tion Effects in Maximum Covering Models", Annals of Operations Re-
search, Vol. 18, pp. 115-140 (1989).

Daskin, M., Hogan, K., and ReVelle, C., "Integration of Multiple, Excess,
Backup and Expected Covering Models", Environment and Planning
B: Planning and Design, Vol. 15, pp. 15-35 (1988).

Daskin, M.S., "A Maximum Expected Covering Location Model: Formula-
tion, Properties, and Heuristic Solution", Transportation Science, Vol.
17, pp. 48-70 (1983).

Daskin, M.S., "Application of an Expected Covering Model to Emergency
Medical Service Systems Design", Decision Sciences, Vol. 13, pp. 416-
439 (1982).
439 (1982).

Daskin, M., and Stern, E., "A Hierarchical Objective Set Covering Model
for Emergency Medical Service Vehicle Deployment", Transportation
Science, Vol. 15, pp. 137-152 (1981).

Davis, P.S., and Ray, T.L., "A Branch and Bound Algorithm for the Capac-
itated Facilities Location Problem", Naval Research Logistics Quar-
terly, Vol. 16, pp. 331-344 (1969).

Dearing, P.M., "Location Problems", Operations Research Letters, Vol. 4,
pp. 95-98 (1985).

Dearing, P.M., and Francis, R.L., "A Minimax Location Problem on a
Network", Transportation Science, Vol. 8, pp. 333-343 (1974a).

Dearing, P.M., and Francis, R.L., "A Network Flow Solution to a Multi-
Facility Minimax Location Problem Involving Rectilinear Distances",
Transportation Science, Vol. 8, pp. 126-141 (1974b).

Dee, N., and Liebman, J.C., "Optimal Location of Public Facilities", Naval
Research Logistics Quarterly, Vol. 19, pp. 753-760 (1972).

Deighton, D., "A Comment on Location Models", Management Science,
Vol. 18, pp. 113-115 (1971).

Drezner, Z., "The P-Cover Problem", European Journal of Operational
Research, Vol. 26, pp. 312-313 (1985).

Drezner, Z., "The P-Center Problem-Heuristic and Optimal Algorithms",
Journal of the Operational Research Society, Vol. 35, pp. 741-748
(1984).

Drysdale, J.K., and Sandiford, P.J., "Heuristic Warehouse Location - A
Case History Using a New Method", Canadian Operations Research
Society, Vol. 7, pp. 45-61 (1969).

Dutton, R., Hinman, G., and Millham, C.B., "The Optimal Location of Nu-
clear Power Facilities in the Pacific Northwest", Operations Research,
Vol. 22, pp. 478-487 (1974).

Dyer, M.E., and Frieze, A.M., "A Simple Heuristic for the P-Center Prob-
lem", Operations Research Letters, Vol. 3, pp. 285-288 (1985).

Eaton, D., Daskin, M., Simmons, D., Bulloch, B., and Jansma, G., "De-
termining Emergency Medical Service Deployment in Austin, Texas",
Interfaces, Vol. 15, pp. 96-108 (1985).

Efroymson, M., and Ray, T., "A Branch and Bound Algorithm for Plant
Location", Operations Research, Vol. 14, pp. 361-368 (1966).

Eiselt, H.A., "Location Modeling in Practice", American Journal of Math-
ematical and Management Sciences, Vol. 12, pp. 3-18 (1992).

Eiselt, H.A., and Pederzoli, G., "A Location Problem in Graphs", New
Zealand Journal of Operations Research, Vol. 12, pp. 49-53 (1984).

Eisemann, K., "The Optimum Location of a Center", SIAM Review, Vol.
4, pp. 394-401 (1962).

Ellwein, L.B., and Gray, P., "Solving Fixed Charge Location-Allocation
Problems with Capacity and Configuration Constraints", AIIE Trans-
actions, Vol. 3, pp. 290-298 (1971).

El-Shaieb, A.M., "A New Algorithm for Locating Sources Among Destina-
tions", Management Science, Vol. 20, pp. 221-231 (1973).

Elson, D.G., "Site Location via Mixed Integer Programming", Operational
Research Quarterly, Vol. 23, pp. 31-43 (1972).

Elzinga, J., Hearn, D., and Randolph, W.D., "Minimax Multifacility Loca-
tion with Euclidean Distances", Transportation Science, Vol. 10, pp.
321-336 (1976).

Elzinga, J., and Hearn, D., "A Note on a Minimax Location Problem",
Transportation Science, Vol. 7, pp. 100-103 (1973).

Elzinga, J., and Hearn, D., "Geometrical Solutions for Some Minimax
Location Problems", Transportation Science, Vol. 6, pp. 379-394
(1972a).

Elzinga, J., and Hearn, D., "The Minimum Covering Sphere Problem",
Management Science, Vol. 19, pp. 96-104 (1972b).

Erkut, E., Francis, R.L., Lowe, T.J., and Tamir, A., "Equivalent Mathe-
matical Programming Formulations of Monotonic Tree Network Loca-
tion Problems", Operations Research, Vol. 37, pp. 447-461 (1989).

Erkut, E., Francis, R.L., and Lowe, T.J., "A Multimedian Problem with
Interdistance Constraints", Environment and Planning B: Planning
and Design, Vol. 15, pp. 181-190 (1988).

Erlenkotter, D., "A Dual-Based Procedure for Uncapacitated Facility Lo-
cation", Operations Research, Vol. 26, pp. 992-1009 (1978).

Erlenkotter, D., "Facility Location with Price-Sensitive Demand, Private,
Public and Quasi-Public", Management Science, Vol. 24, pp. 378-386
(1977).

Erlenkotter, D., "A New Algorithm for Locating Sources Among Destina-
tions", Management Science, Vol. 20, pp. 221-231 (1973).

Eyster, J.W., White, J.A., and Wierwille, W.W., "On Solving Multifacility
Location Problems Using a Hyperboloid Approximation Procedure",
AIIE Transactions, Vol. 5, pp. 1-6 (1973).

Feldman, E., Lehrer, F., and Ray, T., "Warehouse Location Under Contin-
uous Economies of Scale", Management Science, Vol. 12, pp. 670-684
(1966).

Fitzsimmons, J.A., and Allen, L.A., "A Warehouse Location Model Helps
Texas Comptroller Select Out-of-State Audit Offices", Interfaces, Vol.
13, pp. 40-46 (1983).

Fitzsimmons, J.A., "A Methodology for Emergency Ambulance Deploy-
ment", Management Science, Vol. 15, pp. 627-636 (1969).

Flynn, J., and Ratick, S., "A Multiobjective Hierarchical Covering Model
for the Essential Air Services Program", Transportation Science, Vol.
22, pp. 139-147 (1988).

Foster, D.P., and Vohra, R.K., "A Probabilistic Analysis of the K-Location
Problem", American Journal of Mathematical and Management Sci-
ences, Vol. 12, pp. 75-87 (1992).

Francis, R.L., and Mirchandani, P.B., (Eds.), "Discrete Location Theory",
John Wiley & Sons (1989).

Francis, R.L., McGinnis, L.F., and White, J.A., "Locational Analysis", Eu-
ropean Journal of Operational Research, Vol. 12, pp. 220-252 (1983).

Francis, R.L., Lowe, T.J., and Ratliff, H.D., "Distance Constraints for Tree
Network Multifacility Location Problem", Operations Research, Vol.
26, pp. 570-596 (1978).

Francis, R.L., and Goldstein, J.M., "Location Theory: A Selective Bibli-
ography", Operations Research, Vol. 22, pp. 400-410 (1974).

Francis, R.L., and White, J.A., "Facilities Layout and Location", Prentice
Hall, Inc., (1974).

Francis, R.L., and Cabot, A.V., "Properties of a Multifacility Location
Problem Involving Euclidean Distances", Naval Research Logistics
Quarterly, Vol. 19, pp. 335-353 (1972).

Francis, R.L., "A Geometric Solution Procedure for a Rectilinear Distance
Minimax Location", AIIE Transactions, Vol. 4, pp. 328-332 (1972).

Francis, R.L., "Some Aspects of Minimax Location Problem", Operations
Research, Vol. 15, pp. 1163-1168 (1967a).

Francis, R.L., "Sufficient Conditions for Some Optimum-Property Facility
Design", Operations Research, Vol. 15, pp. 448-466 (1967b).

Francis, R.L., "On the Location of Multiple New Facilities With Respect
to Existing Facilities", The Journal of Industrial Engineering, Vol. 15,
pp. 106-107 (1964).

Francis, R.L., "A Note on the Optimum Location of New Machines in
Existing Plant Layouts", The Journal of Industrial Engineering, Vol.
14, pp. 57-59 (1963).

Frank, H., "Optimum Location on a Graph with Probabilistic Demand",
Operations Research, Vol. 14, pp. 409-421 (1966).

Fujiwara, O., Makjamroen, T., and Gupta, K.K., "Ambulance Deployment
Analysis: A Case Study of Bangkok", European Journal of Opera-
tional Research, Vol. 31, pp. 9-18 (1987).

Garfinkel, R.S., Neebe, A.W., and Rao, M.R., "The M-Center Prob-
lem: Minimax Facility Location", Management Science, Vol. 23, pp.
1133-1142 (1977).

Gavett, J.W., and Plyter, N.V., "The Optimal Assignments of Facilities to
Locations by Branch and Bound", Operations Research, Vol. 14, pp.
210-231 (1966).

Gelders, L.F., Pintelon, L.M., and Van Wassenhove, L.N., "A Location-
Allocation Problem in a Large Belgian Brewery", European Journal
of Operational Research, Vol. 28, pp. 196-206 (1987).

Geoffrion, A.M., and McBride, R., "Lagrangian Relaxation Applied to Ca-
pacitated Facility Location Problem", AIIE Transactions, Vol. 10, pp.
40-47 (1978).

Ghosh, A., and Craig, C.S., "An Approach to Determining Optimal Lo-
cation of New Services", Journal of Marketing Research, Vol. 23, pp.
354-362 (1986).

Gleason, J., "A Set Covering Approach to Bus Stop Location", OMEGA,
The International Journal of Management Science, Vol. 3, pp. 605-608
(1975).

Goldberg, J., and Paz, L., "Locating Emergency Vehicle Bases When Ser-
vice Time Depends On Call Location", Transportation Science, Vol.
25, pp. 264-280 (1991).

Goldberg, J.R., Dietrich, R., Chen, J.M., Mitwasi, M.G., Valenzuela, T.,
and Criss, E., "Validating and Applying a Model for Locating Emer-
gency Medical Vehicles in Tucson, AZ (Case Study)", European Jour-
nal of Operational Research, Vol. 49, pp. 308-324 (1990).

Goldman, A.J., "Minimax Location of a Facility on a Network", Trans-
portation Science, Vol. 6, pp. 407-418 (1972a).

Goldman, A.J., "Approximate Localization Theorems for Optimal Facility
Placement", Transportation Science, Vol. 6, pp. 195-201 (1972b).

Goldman, A.J., "Optimal Center Locations in Simple Networks", Trans-
portation Science, Vol. 5, pp. 212-221 (1971).

Goldman, A.J., and Witzgall, C.J., "A Localization Theorem for Opti-
mal Facility Placement", Transportation Science, Vol. 4, pp. 406-408
(1970).
{1970}.

Goldman, A.J., "Optimum Location for Centers in a Network", Trans-
portation Science, Vol. 3, pp. 352-360 (1969).
Goodchild, M., and Lee, J., "Coverage Problems and Visibility Regions on
Topographic Surfaces", Annals of Operations Research, Vol. 18, pp.
175-186 (1989).
Guignard, M., "Fractional Vertices, Cuts, Facets of the Simple Plant Loca-
tion Problem", Mathematical Programming Study, Vol. 12, pp. 150-
162 (1980).
Guignard M., and Spielberg, K., "A Direct Dual Method for the Mixed
Plant Location Problem with Some Side Constraints", Mathematical
Programming, Vol. 17, pp. 198-228 (1979).
Guignard, M., and Spielberg, K., "Algorithms for Exploiting the Structure
of Simple Plant Location Problems" , Annals of Discrete Mathematics,
Vol. 1, pp. 247-271 (1977).
Gunawardane, G., "Dynamic Version of Set Covering Type Public Facility
Location Problems", European Journal of Operational Research, Vol.
10, pp. 190-195 (1982).
Hakimi, S., Schmeichel, E., and Pierce, J., "On P-Centers in Networks",
Transportation Science, Vol. 12, pp. 1-15 (1978).
Hakimi, S., and Maheshwari, S.N., "Optimum Locations of Centers in Net-
works", Operations Research, Vol. 20, pp. 967-973 (1972).
Hakimi, S.L., "Optimum Distribution of Switching Centers in a Commu-
nication Network and Some Related Graph Theoretic Problems", Op-
erations Research, Vol. 13, pp. 462-475 (1965).
Hakimi, S., "Optimum Locations of Switching Centers and the Absolute
Centers and Medians of a Graph", Operations Research, Vol. 12, pp.
450-459 (1964).
Halfin, S., "On Finding the Absolute and Vertex Centers of a Tree with
Distances", Transportation Science, Vol. 8, pp. 75-77 (1974).
Halpern, J., "The Location of a Center-Median Convex Combination on an
Undirected Tree", Journal of Regional Science, Vol. 16, pp. 237-245
(1976).

Hammer, P.L., "Plant Location - A Pseudo-Boolean Approach", Israel


Journal of Technology, Vol. 6, pp. 330-332 (1968).

Handler, G.Y., and Mirchandani, P.B., "Location on Networks, Theory


and Algorithms", MIT Press, Cambridge (1979).

Handler, G., "Minimax Location Facility in an Undirected Tree Graph",


Transportation Science, Vol. 7, pp. 287-293 (1973).

Hansen, P., Labbe, M., Peters, D., and Thisse, J-F., "Single Facility Lo-
cation on Networks", Annals of Discrete Mathematics, Vol. 31, pp.
113-146 (1987).

Hansen, P., Thisse, J.F., and Wendell, R.E., "Equivalence of Solutions to


Network Location Problems", Mathematics of Operations Research,
Vol. 11, pp. 672-678 (1986).

Hitchings, G.F., "Analogue Techniques for the Optimal Location of a Main
Facility in Relation to Ancillary Facilities", International Journal of
Production Research, Vol. 7, pp. 189-197 (1969).

Hodgson, M.J., "A Flow Capturing Location and Allocation Model", Ge-
ographical Analysis, Vol. 22, pp. 270-279 (1990).

Hodgson, M.J., "The Location of Public Facilities Intermediate to the Jour-


ney to Work", European Journal of Operational Research, Vol. 6, pp.
199-204 (1981).

Hogan, K., and Revelle, C., "Concepts and Applications of Backup Cover-
age", Management Science, Vol. 32, pp. 1434-1444 (1986).

Hogan, K., and ReVelle, C.S., "Backup Coverage Concepts in the Location
of Emergency Services", Modeling and Simulation, Vol. 14, pp. 1423
(1983).

Hogg, J., "The Siting of Fire Stations", Operational Research Quarterly,


Vol. 19, pp. 275-287 (1968).

Holmes, J., Williams, F.B., and Brown, L.A., "Facility Location Under
Maximum Travel Restriction: An Example Using Day Care Facilities" ,
Geographical Analysis, Vol. 4, pp. 258-266 (1972).

Hooker, J.N., Garfinkel, R.S., and Chen, C.K., "Finite Dominating Sets
for Network Location Problems", Operations Research, Vol. 39, pp.
100-118 (1991).

Hoover, E.M., "Some Programmed Models of Industry Location", Land
Economics, Vol. 43, pp. 303-311 (1967).

Hopmans, A.C.M., "A Spatial Interaction Model for Branch Bank Ac-
counts", European Journal of Operational Research, Vol. 27, pp. 242-
250 (1986).

Hormozi, A.M., and Khumawala, B.M., "An Improved Multi-Period Facil-
ity Location Model", CBA Working Paper Series - 252, University of
Houston (1992).

Hsu, W.L., and Nemhauser, G.L., "Easy and Hard Bottleneck Location
Problems", Discrete Applied Mathematics, Vol. 1, pp. 209-215 (1979).

Hurter, A.P., Schaeffer, M.K., and Wendell, R.E., "Solution of Constrained
Location Problems", Management Science, Vol. 22, pp. 51-56 (1975).

Jarvinen, P., Rajala, J., and Sinerro, J., "A Branch-and-Bound Algorithm
for Seeking P-Median", Operations Research, Vol. 20, pp. 173-178
(1972).

Klastorin, T.D., "On the Maximal Covering Location Problem and the
Generalized Assignment Problem", Management Science, Vol. 25, pp.
106-112 (1979).

Kariv, O., and Hakimi, S.L., "An Algorithmic Approach to Network Lo-
cation Problems I: The P-Centers", SIAM Journal of Applied Mathe-
matics, Vol. 37, pp. 513-538 (1979a).

Kariv, O., and Hakimi, S.L., "An Algorithmic Approach to Network Loca-
tion Problems II: The P-Medians", SIAM Journal of Applied Mathe-
matics, Vol. 37, pp. 539-560 (1979b).

Keeney, R.L., "A Method for Districting Among Facilities", Operations
Research, Vol. 20, pp. 613-618 (1972).

Khumawala, B.M., Neebe, A., and Dannenbring, D.G., "A Note on El-
Shaieb's New Algorithm for Locating Sources Among Destinations",
Management Science, Vol. 21, pp. 230-233 (1974).

Khumawala, B.M., "An Efficient Heuristic Procedure for the Uncapacitated
Location Problem", Naval Research Logistics Quarterly, Vol. 20, pp.
109-121 (1973a).

Khumawala, B.M., "An Efficient Algorithm for the P-Median Problem with
Maximum Distance Constraints", Geographical Analysis, Vol. 5, pp.
309-321 (1973b).

Khumawala, B.M., "An Efficient Branch and Bound Algorithm for the
Warehouse Location Problem", Management Science, Vol. 18, pp.
718-731 (1972).

Khumawala, B.M., and Whybark, W.E., "A Comparison of Some Recent
Warehouse Location Techniques", Logistics Review, Vol. 7, pp. 3-19
(1971).

Kimes, S.E., and Fitzsimmons, J.A., "Selecting Profitable Sites at La
Quinta Inns", Interfaces, Vol. 20, pp. 12-20 (1990).

Kirca, O., and Erkip, N., "Selecting Transfer Station Locations for Large
Solid Waste Systems", European Journal of Operational Research, Vol.
38, pp. 339-349 (1988).

Kolen, A., "The Round-Trip P-Center and Covering Problem on a Tree",
Transportation Science, Vol. 19, pp. 222-234 (1985).

Kolen, A., "Solving Covering Problems and the Uncapacitated Plant Loca-
tion Problem on Trees", European Journal of Operational Research,
Vol. 12, pp. 266-278 (1983).

Kolesar, P., and Walker, W.E., "An Algorithm for the Dynamic Relocation
of Fire Companies", Operations Research, Vol. 11, pp. 244-274 (1974).

Koopmans, T.C., and Beckmann, M., "Assignment Problems and the
Location of Economic Activities", Econometrica, Vol. 25, pp. 53-76
(1957).

Kramer, R.L., "Analysis of Lock-Box Locations", Bankers Monthly Maga-
zine, pp. 50-55 (1966).

Krarup, J., and Pruzan, P.M., "The Simple Plant Location Problem: Sur-
vey and Synthesis", European Journal of Operational Research, Vol.
12, pp. 36-81 (1983).

Kraus, A., Janssen, C., and McAdams, A.K., "The Lock-Box Location
Problem: A Class of Fixed Charge Transportation Problem", Journal
of Bank Research, Vol. 1, pp. 51-58 (1970).
Kuehn, A., and Hamburger, M., "A Heuristic Program for Locating Ware-
houses", Management Science, Vol. 9, pp. 643-666 (1963).
Kuhn, H.W., and Kuenne, R.E., "An Efficient Algorithm for the Numerical
Solution of the Generalized Weber Problem in Spatial Economics,"
Journal of Regional Science, Vol. 4, pp. 21-33 (1962).

Laporte, G., Nobert, Y., and Arpin, D., "An Exact Algorithm for Solving
Capacitated Location - Routing Problems", In Location Decisions:
Methodology and Applications, (Edited by J.C. Baltzer), Scientific
Publishing Company, pp. 246-257 (1986).

Larson, R.C., "Approximating the Performance of Urban Emergency Ser-


vice Systems", Operations Research, Vol. 23, pp. 845-869 (1975).

Larson, R.C., and Stevenson, K.A., "On Insensitivities in Urban Redistrict-
ing and Facility Location", Operations Research, Vol. 20, pp. 595-612
(1972).

Lawrence, R.M., and Pengilly, P.J., "The Number and Location of Depots
Required for Handling Products for Distribution to Retail Stores in
South-East England", Operational Research Quarterly, Vol. 20, pp.
23-32 (1969).
Leamer, E.E., "Locational Equilibria," Journal of Regional Science, Vol.
8, pp. 229-242 (1968).
Levy, J., "An Extended Theorem for Location on a Network", Operational
Research Quarterly, Vol. 18, pp. 433-442 (1967).

Lin, C.C., "On Vertex Addends in Minimax Location Problems", Trans-


portation Science, Vol. 9, pp. 165-168, (1975).

Louveaux, F.V., and Peters, D., "A Dual-Based Procedure for Stochastic
Facility Location", Operations Research, Vol. 40, pp. 564-573 (1992).

Love, R.F., Morris, J.G., and Wesolowsky, G.O., "Facilities Location:
Models and Methods", North Holland Publishing Company, New York
(1988).

Love, R.F., Wesolowsky, G.O., and Kraemer, S.A., "A Multi-Facility Mini-
max Location Method for Euclidean Distances", International Journal
of Production Research, Vol. 11, pp. 37-46 (1973).

Love, R.F., and Morris, J.G., "Modelling Inter-City Road Distances by


Mathematical Functions", Operational Research Quarterly, Vol. 23,
pp. 61-71 (1972).

Love, R.F., "A Computational Procedure for Optimally Locating a Facility


with Respect to Several Rectangular Regions," Journal of Regional
Science, Vol. 12, pp. 233-243 (1972).

Love, R.F., "Locating Facilities in Three-Dimensional Space by Convex


Programming," Naval Research Logistics Quarterly., Vol. 16, pp. 503-
516 (1969).

Love, R.F., "A Note on the Convexity of the Problem of Siting Depots" , In-
ternational Journal Production Research, Vol. 6, pp. 153-154 (1967).

MacKinnon, R.G., and Barber, G.M., "A New Approach to Network
Generation and Map Representation: The Linear Case of the Location-
Allocation Problem", Geographical Analysis, Vol. 4, pp. 156-158
(1972).

Maier, S.F., and Vanderweide, J.H., "A Unified Location Model for Cash
Disbursements and Lock- Box Collections" , Journal of Bank Research,
Vol. 7, pp. 166-172 (1976).

Maier, S.F., and Vanderweide, J.H., "The Lock-Box Location Problem: A


Practical Reformulation" , Journal of Bank Research, Vol. 5, pp. 92-95
(1974).

Malczewski, J., and Ogryczak, W., "An Interactive Approach to the Cen-
tral Facility Location Problem", Geographical Analysis, Vol. 22, pp.
244-258 (1990).

Manne, A.S., "Plant Location Under Economics-of-Scale-Decentralization


and Computation", Management Science, Vol. 11, pp. 213-235 (1964).

Maranzana, F., "On the Location of Supply Points to Minimize Transport


Costs", Operational Research Quarterly, Vol. 15, pp. 261-270 (1964).

Marks, D., ReVelle, C.S., and Liebman, J.C., "Mathematical Models of
Location: A Review", Journal of Urban Planning and Development
Division, Vol. 96, pp. 81-93 (1970).

Marianov, V., and ReVelle, C., "The Standard Response Fire Protection
Siting Problem", Information Systems and Operations Research, Vol.
29, pp. 116-129 (1991).

Masuyama, S., Ibaraki, T., and Hasegawa, T., "The Computational Com-
plexity of the M-Center Problem in the Plane", Transactions of IECE
Japan, Vol. E64, pp. 57-64 (1981).

Mavrides, L.P., "An Indirect Method for the Generalized K-Median Prob-
lem Applied to Lock-Box Location", Management Science, Vol. 25,
pp. 990-996 (1979).

McAdams, A.K., "Critique of: A Lock-Box Location Model", Management
Science, Vol. 15, pp. 888-890 (1968).

McHose, A.H., "A Quadratic Formulation of the Activity Location Prob-
lem", The Journal of Industrial Engineering, Vol. 12, pp. 334 (1961).

Megiddo, N., Zemel, E., and Hakimi, S.L., "The Maximum Coverage Lo-
cation Problem", SIAM Journal on Algebraic and Discrete Methods,
Vol. 4, pp. 253-261 (1983).

Mehrez, A., and Stulman, A., "An Extended Continuous Maximal Cov-
ering Location Problem With Facility Placement", Computers and
Operations Research, Vol. 11, pp. 19-23 (1984).

Mehrez, A., "A Note on the Linear Integer Formulation of the Maximal
Covering Location Problem with Facility Placement on the Entire
Plane", Journal of Regional Science, Vol. 23, pp. 553-555 (1983).

Mehrez, A., and Stulman, A., "The Maximal Covering Location Problem
With Facility Placement on the Entire Plane", Journal of Regional
Science, Vol. 22, pp. 361-365 (1982).

Meyer, P.D., and Brill, E.D., Jr., "A Method for Locating Wells in a
Groundwater Monitoring Network Under Conditions of Uncertainty",
Water Resources Research, Vol. 24, pp. 1277-1282 (1988).

Minieka, E., "The Centers and Medians of a Graph", Operations Research,
Vol. 25, pp. 641-650 (1977).

Minieka, E., "The M-Center Problem", SIAM Review, Vol. 12, pp. 138-139
(1970).

Mirchandani, P.B., "Locational Decisions on Stochastic Networks", Geo-
graphical Analysis, Vol. 12, pp. 172-183 (1980).

Mirchandani, P.B., and Odoni, A.R., "Locating New Passenger Facilities
on a Transportation Network", Transportation Research, Vol. 13-B,
pp. 113-122 (1979a).

Mirchandani, P.B., and Odoni, A.R., "Location of Medians on Stochastic
Networks", Transportation Science, Vol. 13, pp. 85-97 (1979b).

Mole, R.H., "Comments on the Location of Depots", Management Science,
Vol. 19, pp. 832-833 (1973).

Moon, I.D., and Chaudhry, S.S., "An Analysis of Network Location Prob-
lems with Distance Constraints", Management Science, Vol. 30, pp.
290-307 (1984).

Moore, G., and ReVelle, C., "The Hierarchical Service Location Problem",
Management Science, Vol. 28, pp. 775-780 (1982).

Mukundan, S., and Daskin, M., "Joint Location/Sizing Maximum Profit
Covering Models", Information Systems and Operations Research,
Vol. 29, pp. 139-152 (1991).

Mycielski, J., and Trzeciakowski, W., "Optimization of the Size and Lo-
cation of Service Stations", Journal of Regional Science, Vol. 5, pp.
59-68 (1963).

Nair, K.P.K., and Chandrasekaran, R., "Optimal Location of a Single Ser-
vice Center of Certain Types", Naval Research Logistics Quarterly,
Vol. 18, pp. 503-510 (1971).

Nambiar, J.M., Gelders, L.F., and Van Wassenhove, L.N., "Plant Location
and Vehicle Routing in the Malaysian Rubber Smallholder Sector: A
Case Study", European Journal of Operational Research, Vol. 38, pp.
14-26 (1989).

Nambiar, J.M., Gelders, L.F., and Van Wassenhove, L.N., "A Large-
Scale Location-Allocation Problem in a Natural Rubber Industry",
European Journal of Operational Research, Vol. 6, pp. 181-189
(1981).

Narula, S.C., Ogbu, U.I., and Samuelsson, H.M., "An Algorithm for the P-
Median Problems", Operations Research, Vol. 25, pp. 709-713 (1977).

Nauss, R.M., and Markland, R.E., "Theory and Application of an Optimiz-
ing Procedure for Lock-Box Location Analyses", Management Science,
Vol. 27, pp. 855-865 (1981).

Nauss, R.M., and Markland, R.E., "Solving Lock-Box Location Problems",
Financial Management, pp. 21-31 (1979).

Neebe, A.W., "A Procedure for Locating Emergency-Service Facilities For
All Possible Response Distances", Journal of the Operational Research
Society, Vol. 39, pp. 743-748 (1988).

Neebe, A.W., "A Branch and Bound Algorithm for the P-Median Trans-
portation Problem", Journal of the Operational Research Society, Vol.
29, pp. 989 (1978).

Orloff, C.S., "A Theoretical Model of Net Accessibility in Public Facility
Location", Geographical Analysis, Vol. 9, pp. 244-256 (1977).

Osleeb, J.P., Ratick, S.J., Buckley, P., Lee, K., and Kuby, M., "Evaluation
of Dredging and Offshore Loading Locations for U.S. Coal Exports
Using Local Logistics System", Annals of Operations Research, Vol.
6, pp. 163-180 (1986).

Palermo, F.P., "A Network Minimization Problem", IBM Journal, pp.
335-337 (1961).

Patel, N., "Locating Rural Social Service Centers in India", Management
Science, Vol. 28, pp. 775-780 (1979).

Perl, J., and Daskin, M., "A Warehouse Location-Routing Problem", Trans-
portation Research, Vol. 19B, pp. 381-396 (1985).

Perl, J., and Daskin, M.S., "A Unified Warehouse Location-Routing Method-
ology", Journal of Business Logistics, Vol. 5, pp. 92-111 (1984).

Picard, J.C., and Ratliff, H.D., "A Cut Approach to the Rectilinear Dis-
tance Facility Location Problem", Operations Research, Vol. 28, pp.
422-433 (1978).

Pirkul, H., and Schilling, D., "The Maximal Covering Location Problem
With Capacities on Total Workload", Management Science, Vol. 37,
pp. 233-248 (1991).

Pirkul, H., and Schilling, D., "The Capacitated Maximal Covering Location
Problem With Backup Service", Annals of Operations Research, Vol.
18, pp. 141-154 (1989).

Plane, D.R., and Hendrick, T.E., "Mathematical Programming and the
Location of Fire Companies for the Denver Fire Department", Oper-
ations Research, Vol. 25, pp. 563-578 (1977).

Polopolus, L., "Optimum Plant Numbers and Locations for Multiple Pro-
duce Processing", Journal of Farm Economics, Vol. 47, pp. 287-295
(1965).

Price, W.L., and Turcotte, M., "Locating a Blood Bank", Interfaces, Vol.
16, pp. 17-26 (1986).

Pritsker, A.A.B., "A Note to Correct the Procedure of Pritsker and Ghare
for Locating Facilities with Respect to Existing Facilities", AIIE Trans-
actions, Vol. 5, pp. 84-86 (1973).

Pritsker, A.A., and Ghare, P.M., "Locating New Facilities with Respect to
Existing Facilities", AIIE Transactions, Vol. 2, pp. 290-297 (1970).

Rand, G.K., "Methodological Choices in Depot Location Studies", Opera-
tional Research Quarterly, Vol. 27, pp. 241-249 (1976).

Rao, A., "Counterexamples for the Location of Emergency Service Facili-
ties", Operations Research, Vol. 22, pp. 1259-1261 (1974).

Ratick, S.J., and White, A.L., "A Risk-Sharing Model for Locating Noxious
Facilities", Environment and Planning B: Planning and Design, Vol.
15, pp. 165-179 (1988).

ReVelle, C., and Serra, D., "The Maximum Capture Problem Including
Relocation", Information Systems and Operations Research, Vol. 29,
pp. 130-138 (1991).

ReVelle, C., "Review, Extension and Prediction in Emergency Service Sit-


ing Models", European Journal of Operational Research, Vol. 40, pp.
58-69 (1989).

ReVelle, C., and Elzinga, D.J., "An Algorithm for Facility Location in
A Districted Region", Environment and Planning B: Planning and
Design, Vol. 16, pp. 41-50 (1989).

ReVelle, C., and Hogan, K., "The Maximum Availability Location Prob-
lem", Transportation Science, Vol. 23, pp. 192-200 (1989).

ReVelle, C., and Hogan, K., "A Reliability-Constrained Siting Model
with Local Estimates of Busy Fractions", Environment and Planning
B: Planning and Design, Vol. 15, pp. 143-152 (1988).

ReVelle, C., "The Maximum Capture or 'Sphere of Influence' Location


Problem: Hotelling Revisited on a Network", Journal of Regional Sci-
ence, Vol. 26, pp. 343-358 (1989b).

ReVelle, C., and Hogan, K., "The Maximum Reliability Location Problem
and Alpha-Reliable P-Center Problem: Derivatives of the Probabilis-
tic Location Set Covering Problem", Annals of Operations Research,
Vol. 18, pp. 155-174 (1986).

ReVelle, C., Toregas, C., and Falkson, L., "Applications of the Location
Set-Covering Problem", Geographical Analysis, Vol. 8, pp. 65-76
(1976).

ReVelle, C., Marks, D., and Liebman, J.C., "An Analysis of Private and
Public Sector Location Models", Management Science, Vol. 16, pp.
692-707 (1970).

ReVelle, C., and Swain, R., "Central Facilities Location", Geographical


Analysis, Vol. 2, pp. 30-42 (1970).

Richard, D., Beguin, H., and Peeters, D., "The Location of Fire Stations
in a Rural Environment: A Case Study", Environment and Planning
A, Vol. 22, pp. 39-52 (1990).

Rojeski, P., and Revelle, C., "Central Facilities Location Under an Invest-
ment Constraint", Geographical Analysis, Vol. 2, pp. 343-360 (1970).

Roodman, G.M., and Schwarz, L.B., "Extensions of the Multi-Period Fa-
cility Phase-Out Model: New Procedures and Application to a Phase-
In/Phase-Out Problem", AIIE Transactions, Vol. 9, pp. 103-107
(1977).

Roodman, G.M., and Schwarz, L.B., "Optimal and Heuristic Facility Phase-
Out Strategies", AIIE Transactions, Vol. 7, pp. 177-184 (1975).

Rosing, K.E., "The Optimal Location of Steam Generators in Large Heavy
Oil Fields", American Journal of Mathematical and Management Sci-
ences, Vol. 12, pp. 19-42 (1992).

Ross, G.T., and Soland, R.M., "Modeling Facility Location Problems as
Generalized Assignment Problems", Management Science, Vol. 24,
pp. 345-357 (1977).

Rushton, G., "Applications of Location Models", Annals of Operations
Research, Vol. 18, pp. 25-42 (1989).

Rydell, P.C., "A Note on the Principle of Median Location: Comments",
Journal of Regional Science, Vol. 11, pp. 395-396 (1971).

Rydell, P.C., "A Note on a Location Principle: Between the Median and
the Mode", Journal of Regional Science, Vol. 7, pp. 185-192 (1967).

Sa, G., "Branch-and-Bound and Approximate Solutions to the Capacitated
Plant-Location Problem", Operations Research, Vol. 17, pp. 1007-
1016 (1969).

Saatcioglu, O., "Mathematical Programming Models for Airport Site Se-
lection", Transportation Research-B, Vol. 16B, pp. 435-447 (1982).

Saedt, A.H.P., "The Siting of Green Forage Drying Plants for Use by a
Large Number of Farms - A Location-Allocation Case Study", Euro-
pean Journal of Operational Research, Vol. 6, pp. 190-194 (1981).

Saydam, C., and McKnew, M., "A Separable Programming Approach to
Expected Coverage: An Application to Ambulance Location", Deci-
sion Sciences, Vol. 16, pp. 381-397 (1985).

Schaefer, M.K., and Hurter, A.P., "An Algorithm for the Solution of a
Location Problem", Naval Research Logistics Quarterly, Vol. 4, pp.
625-636 (1974).

Schilling, D.A., Jayaraman, V., and Barkhi, R., "A Review of Covering
Problems in Facility Location", Location Science, Vol. 1, pp. 25-55
(1993).

Schilling, D., "Strategic Facility Planning: The Analysis of Options", De-
cision Sciences, Vol. 13, pp. 1-14 (1982).

Schilling, D., ReVelle, C., Cohen, J., and Elzinga, D.J., "Some Models for
Fire Protection Location Decisions", European Journal of Operations
Research, Vol. 5, pp. 1-7 (1980).

Schilling, D., "Dynamic Location Modeling for Public Sector Facilities:
A Multicriteria Approach", Decision Sciences, Vol. 11, pp. 714-724
(1980).

Schilling, D., Elzinga, D.J., Cohen, J., Church, R., and ReVelle, C., "The
Team/Fleet Models for Simultaneous Facility and Equipment Siting",
Transportation Science, Vol. 13, pp. 163-175 (1979).

Schneider, J.B., "Solving Urban Location Problems: Human Intuition Ver-
sus the Computer", American Institute of Planners, Vol. 37, pp. 95-99
(1971).

Schneiderjans, M.J., Kwak, N.K., and Helmer, M.C., "An Application of
Goal Programming to Resolve A Site Location Problem", Interfaces,
Vol. 12, pp. 65-72 (1982).

Schreuder, J.A.M., "Application of a Location Model for Fire Stations in
Rotterdam", European Journal of Operational Research, Vol. 6, pp.
212-219 (1981).

Scott, A.J., "Location-Allocation Systems: A Review", Geographical Anal-
ysis, Vol. 2, pp. 95-119 (1970).

Shanker, R.J., and Zoltners, A.A., "An Extension of the Lock-Box Prob-
lem", Journal of Bank Research, Vol. 2, pp. 62-62 (1972).

Shannon, R.D., and Ignizio, J.P., "A Heuristic Programming Algorithm for
Warehouse Location", AIIE Transactions, Vol. 2, pp. 334-339 (1970).

Simmons, D.M., "A Further Note on One-Dimensional Space Allocation",
Operations Research, Vol. 19, pp. 249 (1971).

Simmons, D.M., "One-Dimensional Space Allocation: An Ordering Algo-
rithm", Operations Research, Vol. 17, pp. 812-826 (1969).

Slater, P.J., "On Locating a Facility to Service Areas Within a Network",
Operations Research, Vol. 29, pp. 523-531 (1981).

Smith, H.L., Mangelsdorf, K.R., Luna, J.S., and Reid, R.A., "Supplying
Ecuador's Health Workers Just in Time", Interfaces, Vol. 19, pp. 1-12
(1989).

Snyder, R.D., "A Note on the Location of Depots", Management Science,
Vol. 18, pp. 97 (1971a).

Snyder, R.D., "A Note on the Principles of Median Location", Journal of
Regional Science, Vol. 11, pp. 391-394 (1971b).

Spielberg, K., "On Solving Plant Location Problems", In Applications
of Mathematical Programming Techniques (Edited by E.M. Beale),
American Elsevier (1970).

Spielberg, K., "Plant Location with Generalized Search Origin", Manage-
ment Science, Vol. 16, pp. 165-178 (1969a).

Spielberg, K., "Algorithms for the Simple Plant Location Problem with
Some Side Constraints", Operations Research, Vol. 17, pp. 85-111
(1969b).

Stancill, J.M., "A Decision Rule Model for Establishment of a Lock-Box",
Management Science, Vol. 15, pp. 884-887 (1968).

Storbeck, J.E., "Classical Central Places As Protected Thresholds", Geo-
graphical Analysis, Vol. 22, pp. 4-21 (1990).

Storbeck, J.E., "The Spatial Structuring of Central Places", Geographical
Analysis, Vol. 20, pp. 93-110 (1988).

Storbeck, J.E., and Vohra, V., "A Simple Trade-Off Model for Maximal
and Multiple Coverage", Geographical Analysis, Vol. 20, pp. 220-230
(1988).

Storbeck, J.E., "Slack, Natural Slack, and Location Covering", Socio-
Economic Planning Sciences, Vol. 16, pp. 99-105 (1982).

Swain, R.W., "A Parametric Decomposition Approach for the Solution of
Uncapacitated Location Problems", Management Science, Vol. 21, pp.
189-198 (1974).

Tansel, B.C., and Yesilkokcen, G., "Composite Regions of Feasibility for
Certain Classes of Distance Constrained Network Location Problems",
IEOR-9313, Department of Industrial Engineering, Bilkent Univer-
sity, Ankara, Turkey (1993).

Tansel, B.C., Francis, R.L., and Lowe, T.J., "Location on Networks, Part
I, the P-Center and P-Median Problems", Management Science, Vol.
29, pp. 482-497 (1983a).

Tansel, B.C., Francis, R.L., and Lowe, T.J., "Location on Networks, Part
II, Exploiting Tree Network Structure", Management Science, Vol. 29,
pp. 498-511 (1983b).

Tansel, B.C., Francis, R.L., and Lowe, T.J., "Binding Inequalities for Tree
Network Location Problems with Distance Constraints", Transporta-
tion Science, Vol. 14, pp. 107-124 (1980).

Tapiero, C.S., "Transportation-Location-Allocation Problems Over Time",
Journal of Regional Science, Vol. 11, pp. 377-386 (1971).

Taylor, P.J., "The Location Variable in Taxonomy", Geographical Analysis,
Vol. 1, pp. 181-195 (1969).

Teitz, M., and Bart, P., "Heuristic Methods for Estimating Generalized
Vertex Median of a Weighted Graph", Operations Research, Vol. 16,
pp. 955-961 (1968).

Teitz, M., "Towards a Theory of Urban Public Facility Location", Papers
of the Regional Science Association, Vol. 11, pp. 35-51 (1968).

Tewari, V.K., and Jena, S., "High School Location Decision Making in
Rural India and Location-Allocation Models", Spatial Analysis and
Location-Allocation Models (Edited by A. Ghosh and G. Rushton),
Van Nostrand Reinhold Company, Inc., New York, pp. 137-162 (1987).

Tideman, M., "Comment on a Network Minimization Problem", IBM
Journal, pp. 259 (1962).

Toregas, C., and ReVelle, C., "Binary Logic Solutions to a Class of Location
Problems", Geographical Analysis, Vol. 5, pp. 145-155 (1973).

Toregas, C., ReVelle, C., Swain, R., and Bergman, L., "The Location of
Emergency Service Facilities", Operations Research, Vol. 19, pp. 1363-
1373 (1971).

Toregas, C., and ReVelle, C., "Optimal Location Under Time or Distance
Constraints", Papers of the Regional Science Association, Vol. 28, pp.
133-143 (1970).

Valinsky, D., "A Determination of the Optimum Location of the Firefight-
ing Units in New York City", Operations Research, Vol. 3, pp. 494-512
(1955).

Van Roy, T.J., and Erlenkotter, D., "A Dual Based Procedure for Dy-
namic Facility Location", Management Science, Vol. 28, pp. 1091-1105
(1982).

Vergin, R.C., and Rogers, J.D., "An Algorithm and Computational Proce-
dure for Locating Economic Facilities", Management Science, Vol. 13,
pp. 240-254 (1967).

Vijay, J., "An Algorithm for the P-Center Problem in the Plane", Trans-
portation Science, Vol. 19, pp. 235-245 (1985).

Volz, R.A., "Optimum Ambulance Location in Semi-Rural Areas", Trans-
portation Science, Vol. 5, pp. 193-203 (1971).

Wagner, J.L., and Falkson, L.M., "The Optimal Nodal Location of Public
Facilities with Price-Sensitive Demand", Geographical Analysis, Vol.
7, pp. 69-83 (1975).

Walker, W., "Using the Set-Covering Problem to Assign Fire Companies
to Fire Houses", Operations Research, Vol. 22, pp. 275-277 (1974).

Watson-Gandy, C.D.T., "Heuristic Procedures for the M-Partial Cover
Problem on a Plane", European Journal of Operational Research, Vol.
11, pp. 149-157 (1982).

Watson-Gandy, C.D.T., "A Note on the Centre of Gravity in Depot Loca-
tion", Management Science, Vol. 18, pp. B478-B481 (1972).

Watson-Gandy, C.D.T., and Eilon, S., "The Depot Siting Problem with
Discontinuous Delivery Cost", Operational Research Quarterly, Vol.
23, pp. 277-287 (1972).

Weaver, J.R., and Church, R.L., "A Median Location Model with Non-
closest Facility Service", Transportation Science, Vol. 19, pp. 58
(1985).

Weaver, J.R., and Church, R.L., "A Comparison of Solution Procedures for
Covering Location Problems", Modeling and Simulation, Vol. 14, pp.
147 (1983).

Weaver, J.R., and Church, R.L., "Computational Procedures for Location
Problems on Stochastic Networks", Transportation Science, Vol. 17,
pp. 168 (1983).

Wendell, R.E., and Hurter, Jr., A.P., "Optimal Location on a Network",
Transportation Science, Vol. 7, pp. 18-33 (1973).

Wesolowsky, G.O., "Dynamic Facility Location", Management Science,
Vol. 19, pp. 1241-1248 (1973).

Wesolowsky, G.O., "Location in Continuous Space", Geographical Analy-
sis, Vol. 5, pp. 95-112 (1973).

Wesolowsky, G.O., and Love, R.F., "A Nonlinear Approximation for Solv-
ing a Generalized Rectangular Distance Weber Problem", Manage-
ment Science, Vol. 18, pp. 656-663 (1972).

Wesolowsky, G.O., "Rectangular Distance Location Under the Minimax
Optimality Criterion", Transportation Science, Vol. 6, pp. 103-113
(1972).

Wesolowsky, G.O., and Love, R.F., "Location of Facilities with Rectangu-
lar Distances Among Point and Area Destinations", Naval Research
Logistics Quarterly, Vol. 18, pp. 83-90 (1971).

Wesolowsky, G.O., and Love, R.F., "The Optimal Location of New Facili-
ties Using Rectangular Distances", Operations Research, Vol. 19, pp.
124-129 (1971).

Weston, Jr., F.C., "Optimal Configuration of Telephone Answering Sites in
a Service Industry", European Journal of Operational Research, Vol.
10, pp. 395-405 (1982).

White, J.A., and Case, K.E., "On Covering Problems and the Central Facil-
ities Location Problem", Geographical Analysis, Vol. 6, pp. 281-293
(1974).

Wirasinghe, S.C., and Waters, N.M., "An Approximate Procedure for De-
termining the Number, Capacities and Locations of Solid Waste Trans-
fer Stations in an Urban Region", European Journal of Operational
Research, Vol. 12, pp. 105-111 (1983).

Young, H.A., "On the Optimum Location of Checking Stations", Opera-
tions Research, Vol. 11, pp. 721-731 (1963).

General

Balas, E., "Disjunctive Programming", Annals of Discrete Mathematics,
Vol. 5, pp. 3-51 (1979).

Balas, E., "Machine Sequencing via Disjunctive Graphs: An Implicit Enu-
meration Algorithm", Operations Research, Vol. 17, pp. 941-957
(1969).

Bland, R.G., "New Finite Pivoting Rules for the Simplex Method", Math-
ematics of Operations Research, Vol. 2, pp. 103-107 (1977).

Balinski, M., and Spielberg, K., "Methods for Integer Programming: Al-
gebraic, Combinatorial and Enumerative", Progress in Operations Re-
search, Vol. III, Wiley (1969).

Balinski, M., "Integer Programming: Methods, Uses, Computation", Man-
agement Science, Vol. 12, pp. 253-313 (1965).

Balinski, M.L., "Fixed Cost Transportation Problem", Naval Research Lo-
gistics Quarterly, Vol. 8, pp. 41-54 (1961).

Barr, R.S., Glover, F., and Klingman, D., "A New Optimization Method
for Large-Scale Fixed-Charge Transportation Problems", Operations
Research, Vol. 29, pp. 448-463 (1981).

Breu, R., and Burdet, C.A., "Branch and Bound Experiments in 0-1 Pro-
gramming", Mathematical Programming Study, Vol. 2, pp. 1-50
(1974).

Camerini, P., Fratta, L., and Maffioli, F., "On Improving Relaxation Meth-
ods by Modified Gradient Techniques", Mathematical Programming
Study, Vol. 3, pp. 26-34 (1975).

Christofides, N., "Zero-one Programming Using Non-Binary Tree Search",
The Computer Journal, Vol. 14, pp. 418-421 (1971).

Dyer, M.E., "Calculating Surrogate Constraints", Mathematical Program-
ming, Vol. 19, pp. 255-278 (1980).

Fisher, M.L., "An Applications Oriented Guide to Lagrangian Relaxation",
Interfaces, Vol. 15, pp. 10-21 (1985).

Fisher, M.L., "The Lagrangian Relaxation Method for Solving Integer
Programming Problems", Management Science, Vol. 27, pp. 1-18
(1981).

Fisher, M.L., "Worst-Case Analysis of Integer Programming Heuristic Al-
gorithms", Management Science, Vol. 26, pp. 1-17 (1980).

Fisher, M.L., Nemhauser, G.L., and Wolsey, L., "An Analysis of Approxi-
mations for Finding a Maximum Weight Hamiltonian Circuit", Opera-
tions Research, Vol. 27, pp. 799-809 (1979).

Fisher, M.L., Northup, W.D., and Shapiro, J.F., "Using Duality to Solve
Discrete Optimization Problems: Theory and Computational Experi-
ence", Mathematical Programming Study, Vol. 3, pp. 56-94 (1975).

Fisher, M.L., and Shapiro, J., "Constructive Duality in Integer Program-
ming", SIAM Journal of Applied Mathematics, Vol. 27, pp. 31-52
(1974).

Fitzpatrick, D.W., "Scheduling on Disjunctive Graphs", Ph.D. Dissertation,
The Johns Hopkins University, Baltimore, Maryland (1976).

Garfinkel, R.S., and Nemhauser, G.L., "Integer Programming", Wiley, New
York (1975).

Garfinkel, R.S., and Nemhauser, G.L., "A Survey of Integer Programming
Emphasizing Computation and Relations Among Models", Mathemati-
cal Programming (Edited by T.C. Hu and S.M. Robinson), Academic
Press (1973).

Gavish, B., "On Obtaining the 'Best' Multipliers for a Lagrangean Re-
laxation For Integer Programming", Computers and Operations Re-
search, Vol. 5, pp. 55-71 (1978).

Geoffrion, A.M., "Lagrangean Relaxation for Integer Programming", Math-
ematical Programming Study, Vol. 2, pp. 82-114 (1974).

Geoffrion, A.M., and Marsten, R.E., "Integer Programming Algorithms: A
Framework and State-of-the-Art Survey", Management Science, Vol.
18, pp. 465-491 (1972).

Glover, F., "Tabu Search - Part II", ORSA Journal on Computing, Vol. 2,
pp. 4-32 (1990).

Glover, F., "Tabu Search - Part I", ORSA Journal on Computing, Vol. 1,
pp. 190-206 (1989).

Glover, F., "Heuristics for Integer Programming Using Surrogate Con-
straints", Decision Sciences, Vol. 8, pp. 156-166 (1977).

Glover, F., "Surrogate Constraint Duality in Mathematical Programming",
Operations Research, Vol. 23, pp. 434-451 (1975).

Goffin, J.L., "On the Convergence Rates of Subgradient Optimization Meth-
ods", Mathematical Programming, Vol. 13, pp. 329-348 (1977).

Harrison, T.P., "Micro Versus Mainframe Performance for a Selected Class
of Mathematical Programming Problems", Interfaces, Vol. 15, pp.
14-19 (1985).

Held, M., Wolfe, P., and Crowder, H.P., "Validation of Subgradient Opti-
mization", Mathematical Programming, Vol. 6, pp. 62-88 (1974).

Jeroslow, R.G., "Cutting Plane Theory: Disjunctive Methods", Annals of
Discrete Mathematics, Vol. 1, pp. 293-330 (1977).

Johnson, D.S., Demers, A., Ullman, J.D., Garey, M.R., and Graham, R.L.,
"Worst-Case Performance Bounds for Simple One-Dimensional Pack-
ing Problems", SIAM Journal on Computing, Vol. 3, pp. 299-326
(1974).

Karp, R.M., "On the Computational Complexity of Combinatorial Prob-
lems", Networks, Vol. 5, pp. 45-68 (1975).

Karp, R.M., "Reducibility Among Combinatorial Problems", Complexity
of Computer Computations (Edited by R.E. Miller and J.W. Thatcher),
Plenum Press, New York, pp. 85-103 (1972).

Karwan, M.H., and Rardin, R., "Surrogate Dual Multiplier Search Pro-
cedures in Integer Programming", Operations Research, Vol. 32, pp.
52-69 (1984).

Karwan, M.H., and Rardin, R.L., "A Lagrangian Surrogate Duality in a
Branch and Bound Procedure", Naval Research Logistics Quarterly,
Vol. 28, pp. 93-101 (1973).

Klee, V., "Combinatorial Optimization: What is the State of the Art?",
Mathematics of Operations Research, Vol. 5, pp. 1-26 (1980).

Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G., and Shmoys, D.B.,
"Traveling Salesman Problem", John Wiley and Sons (1985).

Lawler, E.L., "Combinatorial Optimization", Holt, Rinehart and Winston
(1976).

McKeown, P.G., "A Branch-and-Bound Algorithm for Solving Fixed-Charge
Problems", Naval Research Logistics Quarterly, Vol. 28, pp. 607-617
(1981).

Mitra, G., "Investigation of some Branch and Bound Strategies for the
Solution of Mixed Integer Linear Programs", Mathematical Program-
ming, Vol. 4, pp. 150-170 (1973).

Nemhauser, G.L., and Wolsey, L.A., "Integer Programming", Handbooks
in Operations Research and Management Science, Vol. 1, Optimiza-
tion (Edited by G.L. Nemhauser, A.H.G. Rinnooy Kan, and M.J. Todd),
North-Holland, Amsterdam (1989).

Nemhauser, G.L., and Wolsey, L.A., "Integer and Combinatorial Optimiza-
tion", John Wiley and Sons, Inc. (1988).

Owen, J., "Cutting Planes for Programs with Disjunctive Constraints",
Journal of Optimization Theory and Applications, Vol. 11, pp. 49-55
(1973).

Padberg, M., "Perfect Zero-One Matrices", Mathematical Programming,
Vol. 6, pp. 180-196 (1974).

Ragsdale, C.T., and McKeown, P.G., "An Algorithm for Solving Fixed-
Charge Problems with Surrogate Constraints", Computers and Opera-
tions Research, Vol. 18, pp. 87-96 (1991).

Raimond, J.F., "Minimaximal Paths in Disjunctive Graphs by Direct Search",
IBM Journal of Research and Development, Vol. 13, pp. 391-399
(1969).

Shapiro, J., "A Survey of Lagrangian Techniques for Discrete Optimiza-
tion", Annals of Discrete Mathematics, Vol. 5, pp. 113-138 (1979).

Singhal, J., Marsten, R.E., and Morin, T.L., "Fixed Order Branch and
Bound Methods for Mixed-Integer Programming: The ZOOM System",
ORSA Journal on Computing, Vol. 1, pp. 44-51 (1989).

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (eds.) pp. 1-33
©1998 Kluwer Academic Publishers

Efficient Algorithms for Geometric Shortest Path Query


Problems
Danny Z. Chen
Department of Computer Science and Engineering, University of Notre Dame,
Notre Dame, IN 46556, U.S.A.
chen@cse.nd.edu

Contents

1 Introduction 2

2 The Gateway Paradigm 5

3 Exact Shortest Path Queries 6
3.1 The Visibility-Sensitive Approach 6
3.2 The Reduced Visibility Graph Approach 8
3.2.1 Reduced Visibility Graphs 9
3.2.2 Characterization of Gateways 10
3.2.3 Algorithms for Rectilinear Path Queries 13
3.3 Other Related Path Problems 14

4 Approximate Shortest Path Queries 15
4.1 Main Ideas 16
4.2 Planar Spanners 17
4.3 Short Paths in Planar Graphs 19
4.3.1 A General Approximation Paradigm 20
4.3.2 3-Short Paths in Planar Graphs 21
4.3.3 ε-Short Paths in Planar Graphs 21
4.4 Short Path Queries amid Obstacles in the Plane 22
4.5 A Special Case 25

5 Concluding Remarks 26

References

Abstract Computing shortest paths in a geometric environment is a fun-
damental topic in computational geometry and finds applications in many
other areas. The problem of processing geometric shortest path queries
is concerned with constructing an efficient data structure for quickly an-
swering on-line queries for shortest paths connecting any two query points
in a geometric setting. This problem is a generalization of the well-
studied problem of computing a geometric shortest path connecting only
two specified points. This paper covers the newly-developed algorith-
mic paradigms for processing geometric shortest path queries and related
problems. These general paradigms have led to efficient techniques for de-
signing algorithms and data structures for processing a variety of queries
on exact and approximate shortest paths in a number of geometric and
graphical settings. Some open problems and promising directions for fu-
ture research are also discussed.

1 Introduction

The problems of computing optimal or near-optimal paths in a given environment arise in many disciplines, and are in fact among the most powerful tools for modeling combinatorial optimization problems. A well-known example of a path planning problem is that of computing shortest paths in a graph [40, 48]. A
geometric version of this shortest path problem is, given a d-dimensional space
scattered with geometric obstacles, to compute a path connecting two speci-
fied locations such that the path does not intersect the interior of any obstacle
and such that the total length of the path (based on a certain metric such as
the Euclidean) is minimized. There are many variations and generalizations of
this geometric shortest path problem. Figure 1 gives an example of a shortest
obstacle-avoiding path in the plane.
Computing geometric shortest paths is a very fundamental topic in computa-
tional geometry, a field that studies the development of algorithms and analysis
of complexity for problems with useful geometric structures [42, 44, 68, 72].
Geometric path planning problems play an important role in many practical
applications, such as computer-aided design, geographical information systems,
intelligent transportation systems, operations research, pattern matching, plant
and facility layout, robotics, and VLSI. Furthermore, these problems have sig-
nificant connections with other fundamental topics in computational geometry
(e.g., convexity, geometric graphs, geometric optimization, triangulation, vis-
ibility, and Voronoi diagrams) and with other disciplines (e.g., graph theory,
combinatorial optimization, and networks). Problems in these areas or geomet-
ric topics often appear as subproblems in solving some geometric shortest path
problems, and vice versa. For example, geometric path planning problems address quite naturally the combinatorial and algorithmic aspects of robot motion and navigation.

Figure 1: A shortest obstacle-avoiding path connecting points p and q.
Consequently, a great deal of work has been done on solving various geo-
metric shortest path problems with respect to different types of environment
specifications, path constraints, obstacle natures, optimization criteria, espe-
cially for planar and 3-dimensional (3-D) settings [63, 64, 66]. In particular, for
computing shortest paths in the plane with polygonal obstacles, Hershberger
and Suri [54] obtained an optimal O(n log n) time algorithm (where n is the
total number of obstacle vertices). For computing shortest paths in the 3-D
space with polyhedral obstacles, Canny and Reif [20] showed that the problem
is NP-hard, and several efficient approximation algorithms have been discovered
[36, 37, 69].
In this paper, we mainly consider the following geometric shortest path query
problem:

Given a set of (possibly weighted) obstacles in a geometric space S, construct an efficient data structure so that on-line queries on shortest paths (and their "lengths") connecting any two query points in S can be quickly answered (e.g., in polylogarithmic time for a length query). If one of the two points for a path query is fixed, then the queries are called single-source path queries.
Despite the fact that there are numerous geometric algorithms known for
computing a shortest path between two specified points and for computing
single-source shortest paths, few results were known, until very recently, for
solving geometric shortest path query problems, even in the plane. In fact, the
only known geometric path query algorithms before our recent results that are
to be discussed in this paper are for several cases of shortest paths inside a
simple polygon [10, 35, 41, 49, 51, 53, 74]. Clearly, the general problems which
are concerned with geometric shortest path queries among many obstacles are
of much more theoretical and practical interest, and are certainly much more
challenging.
It is clear that geometric shortest path query problems are natural gener-
alizations of the well-studied problem of computing a geometric shortest path
connecting only two specified points. Geometric shortest path queries are use-
ful in databases for several application areas such as geographical information
systems and intelligent transportation systems. These problems model the prac-
tical situations in which quality routes (under certain criteria) between many
pairs of different locations or between two constantly-changing locations in a
geometric environment are sought. For example, it is very common for a police car patrolling an area to need to get quickly to a spot where an accident has occurred.
In this situation, it is natural to treat both the locations of the police car and
the accident as on-line input data to a path query. In graph theory, a solution
to the shortest path query problem is nothing more than computing and storing
the all-pairs shortest paths in a graph. However, in a geometric environment,
a solution based straightforwardly on computing and storing all-pairs geomet-
ric shortest paths will not work well because there are uncountably infinitely
many points in a geometric space. Furthermore, unlike the graph version of
the problems, it was not immediately clear until recently how to use geomet-
ric single-source shortest path query algorithms and data structures to obtain
reasonably efficient solutions for processing geometric shortest path queries be-
tween arbitrary points.
In this paper, we discuss the newly-developed algorithmic paradigms for pro-
cessing geometric shortest path queries and related problems. (We were involved
in most of the recent results on geometric shortest path query problems.) These
general paradigms have led to very efficient techniques for designing algorithms
and data structures for a variety of queries on exact and approximate geometric
shortest paths in planar settings. More importantly, these frameworks offer the
promise of achieving new efficient algorithmic techniques for geometric shortest
path queries and for other related problems. We also discuss some open prob-
lems and promising directions for future research on geometric shortest path
problems.
The focus of our discussion will be on the planar settings that have polygonal
obstacles. Let n denote the number of obstacle vertices in the plane.
It should be pointed out that it is possible to solve some geometric short-
est path query problems in the plane by using known computational geometry
techniques (e.g., by reducing such a problem to higher dimensional point loca-
tion). However, such an approach tends to yield data structures with a storage
space and construction time of polynomials of rather high degrees, which would
hardly be usable to practical applications. In contrast, the algorithmic solutions
we present in this paper all take either near quadratic or even sub quadratic space
and construction time, and support in most cases polylogarithmic query time.
Hence, in addition to their theoretical merits, our solutions are likely to have
an impact to practical applications.
We will mainly discuss length queries (i.e., reporting the length of a shortest

or approximate shortest path). The length of a path is often determined by the Lp metric for some positive integer p. Recall that for two points a and b in the plane, the Lp distance Dp(a, b) between a and b is defined as Dp(a, b) = (|ax - bx|^p + |ay - by|^p)^(1/p), where ax and ay denote the x- and y-coordinates of the point a. When an actual shortest path is desired, our solutions, after finding its length, can report such an actual path P in an additional O(|P|) time, where |P| is the number of segments of P.
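As a small self-contained illustration (ours, not from the original text), the following Python fragment evaluates the Lp distance just defined and the Lp length of a polygonal path; once a length query has been answered, reporting an actual path P is a similar linear scan over its |P| segments.

    def lp_distance(a, b, p):
        # Lp distance between points a = (ax, ay) and b = (bx, by).
        return (abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p) ** (1.0 / p)

    def path_length(path, p=2):
        # Total Lp length of a polygonal path given by its vertex list;
        # the sum visits each of the |P| segments exactly once.
        return sum(lp_distance(u, v, p) for u, v in zip(path, path[1:]))

For example, path_length([(0, 0), (1, 0), (1, 1)], p=1) returns 2.0.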
The rest of the paper is organized as follows. Section 2 presents a general
paradigm, called the gateway paradigm, for processing geometric shortest path
queries. Section 3 applies this paradigm to several problems of processing exact
geometric shortest path queries. Section 4 applies this paradigm to problems
of processing approximate geometric shortest path queries. Section 5 discusses
some open problems and promising directions for future research.

2 The Gateway Paradigm


The following paradigm, which was introduced in [29, 30, 25], is a key to our algorithmic techniques for processing geometric shortest path queries. Let s and t be two query points between which a shortest path in a geometric environment S is sought.

The Gateway Paradigm: For a query point s, identify a point set Ws in the geometric space S, such that a shortest path in S connecting s and the other query point t passes through some point in Ws. The points in Ws are called the gateways of s. Once Ws is available, a shortest path between s and t can be obtained from the single-source path query data structures for the gateways of s (with each point in Ws being the source of such a data structure).
For the gateway paradigm to work well, several difficulties must be resolved (a small sketch of the resulting query procedure follows this list):

1. Ws must have a reasonably small size, because this will give rise to an O(|Ws| x f(n)) query time, where f(n) is the time for a single-source path query and |Ws| is the size of Ws.

2. It must be possible to quickly identify the gateway points of Ws for any query point s (this is part of the cost of the query time).

3. It must be known how to build efficient single-source data structures for the desired shortest paths (this is a key factor in the overall complexity bounds of the shortest path query data structure).

4. Let W be the union of Ws over all points s of the geometric space S (that is, W is the union of the gateways of all possible query points in S). The size of W must be finite and, furthermore, cannot be too large (this is because a single-source path query data structure needs to be built for each point of W).
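To make the paradigm concrete, here is a minimal Python sketch of the resulting query (our illustration; identify_gateways, connect_cost, and sssp_length are hypothetical stand-ins for the structures demanded by items 1-3):

    def gateway_query_length(s, t, identify_gateways, connect_cost, sssp_length):
        # Shortest s-to-t path length via the gateways of s.
        #   identify_gateways(s): returns the gateway set Ws (item 2);
        #   connect_cost(s, g):   length of the s-to-g connection;
        #   sssp_length(g, t):    query on the single-source data structure
        #                         rooted at gateway g (item 3).
        # The loop makes the O(|Ws| x f(n)) query time of item 1 explicit.
        return min(connect_cost(s, g) + sssp_length(g, t)
                   for g in identify_gateways(s))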
Several different forms of the above general gateway paradigm emerge in our
solutions to various geometric shortest path query problems, due to the specific
constraints and geometric structures of each of these problems. These solutions
will be discussed in the next two sections.

3 Exact Shortest Path Queries


In this section, we discuss efficient algorithmic techniques for processing various
length queries on exact geometric shortest paths among polygonal obstacles in
the plane. The lengths of the paths we consider are determined according to
the L1, the L2 (i.e., the Euclidean), and the link (i.e., the number of edges) metrics, or
some combinations of these metrics. Our algorithms are crucially based on the
gateway paradigm.
One thing needs to be mentioned before we proceed: For any two query points s and t, if the line segment st whose endpoints are s and t does not cross any obstacle boundary edge, then the shortest path between s and t in the L2 metric is simply st. In this case, we say that s and t are visible to each other. It is easy to detect whether s and t are visible to each other by using the ray shooting technique [1, 2, 3] in computational geometry. A ray shooting operation is the following: given a point p in an obstacle scene and a direction d, "shoot" a ray Ray(p, d) from p in the direction d, until Ray(p, d) hits an obstacle boundary for the first time. To find out whether two query points s and t are visible to each other, one only needs to shoot a ray from s in the direction of t. It is known [1, 2, 3] that for a given polygonal obstacle scene, one can build a data structure in O(n^2) space and O(n^2 log n) time that supports any ray shooting operation in logarithmic time. Hence, it is easy to handle the case in which s and t are visible to each other by using a ray shooting data structure. In the rest of this section, we assume without loss of generality (WLOG) that the two query points s and t are not visible to each other.
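We do not reproduce the ray shooting structures here; as a hedged stand-in, the following naive O(n) Python test decides visibility by checking the segment st against every obstacle boundary edge (degenerate contacts at endpoints are ignored for brevity):

    def ccw(a, b, c):
        # Sign of the cross product (b - a) x (c - a).
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    def properly_cross(p, q, a, b):
        # True iff open segments pq and ab cross at an interior point.
        return (ccw(p, q, a) * ccw(p, q, b) < 0 and
                ccw(a, b, p) * ccw(a, b, q) < 0)

    def visible(s, t, obstacle_edges):
        # s and t are visible iff st crosses no obstacle boundary edge.
        return not any(properly_cross(s, t, a, b) for a, b in obstacle_edges)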

3.1 The Visibility-Sensitive Approach


We first discuss a general technique, called the visibility-sensitive approach [25], for various geometric path queries. This approach is based on the computation of the visibility polygons of the query points s and t.
The visibility polygon Qs of a point s is the (possibly unbounded) polygonal region of the plane that is visible from s, with the obstacle boundaries being the "opaque" objects (see Figure 2 for an example). The polygon Qs is represented as the counterclockwise sequence of vertices of the obstacle scene that are visible from s, obtained by rotating around s a ray that originates at s.
The simple observation we use here is that when the query points s and t are not visible to each other, various optimal paths between s and t pass through one of the vertices visible from s (resp., t). Therefore, we can use the vertices of Qs (resp., Qt) as the gateways of s (resp., t). In this way, the set of gateways for the whole obstacle scene consists of all the n obstacle vertices.

Figure 2: The visibility polygon Qs of a point s.
Based on the visibility complex of Pocchiola and Vegter [71, 70], we developed in [25] the following visibility-sensitive approach:

Construct a geometric path data structure that supports, for any two query points s and t in the plane, an O(min{|Qs|, |Qt|} x log n) query time on path lengths.

This approach enables us to identify and use only O(min{|Qs|, |Qt|}) instead of O(|Qs| + |Qt|) gateway points, without having to compute both Qs and Qt completely. Note that min{|Qs|, |Qt|} can be much smaller than n and can be as small as O(1). The visibility complex of a polygonal obstacle scene with n vertices can be constructed in O(n^2) time and space [71].
Using this approach, we achieve data structures for length queries on various geometric paths that support a query time of O(min{|Qs|, |Qt|} x log n). In [25], this approach has been applied to solving three path query problems among polygonal obstacles: (1) Euclidean shortest paths, (2) approximate minimum-link paths, and (3) monotone paths (among convex polygonal obstacles).
• Euclidean shortest obstacle-avoiding paths among polygonal obstacles.
The data structure uses O(n^2 log n) time and O(n^2) space. This solution makes use of Hershberger and Suri's single-source path data structure [54].
• Approximate minimum-link paths among polygonal obstacles. The prob-
lem is to compute an obstacle-avoiding approximate minimum-link path
between any two query points. The path that we obtain has at most
two more links than the exact optimal path. The data structure takes O(n((|E| + ln)^{2/3} n^{2/3} l^{1/3} log^{3.11} n + |E| log^3 n)) time and O(n^2) space to build, where E is the edge set of the visibility graph of the obstacle scene and l is the longest link length among all the minimum-link paths between any two obstacle vertices. This solution makes use of Mitchell, Rote, and Woeginger's single-source path data structure [65].
• Monotone paths among convex polygonal obstacles. The problem is to find a direction d such that there is an obstacle-avoiding path between the two query points that is monotone with respect to d (it was shown in [9] that for convex polygonal obstacles, such a direction always exists). The data structure uses O(n^2 log n) time and O(n^2) space to build. This solution makes use of Arkin, Connelly, and Mitchell's single-source path data structure [9].

3.2 The Reduced Visibility Graph Approach


The visibility-sensitive approach discussed in the previous subsection is quite
general and gives rise to several reasonably efficient path query data structures.
But, the query time of these data structures depends on the size of the smaller
visibility polygon of the query points, which can still be as big as n in the
worst case. In an attempt to reduce this query time while still maintaining
the efficiency of the path query data structures, we developed a different approach, called the reduced visibility graph approach [29, 30, 25]. This approach is particularly useful for queries on various rectilinear optimal paths, and has yielded data
structures that use near quadratic construction time and space and support
polylogarithmic query time.
A rectilinear geometric object is one whose boundary edges are parallel to a
Cartesian coordinate axis (i.e., horizontal or vertical). Note that in rectilinear
path problems, the polygonal obstacles need not be rectilinear. Since rectilinear
shortest paths have many applications, especially in VLSI design, considerable
work has been done on designing a variety of rectilinear shortest path algorithms
[58]. Notably, Mitchell [62, 61] obtained an optimal O(n log n) time algorithm
for building an O(n) space data structure for planar rectilinear single-source
shortest path queries among general (non-rectilinear) polygonal obstacles.
Our reduced visibility graph approach is quite powerful and has been applied
to the following optimal path query problems [29, 30, 25].

• L1 shortest paths amid general polygonal obstacles.

• Shortest A-distance paths amid general polygonal obstacles. A-distance paths are paths whose edges can be in arbitrary directions but whose edge lengths are measured based on |A| given orientations, where A is a finite set of allowed orientations [75].

• Rectilinear shortest paths amid weighted rectilinear polygonal obstacles, such that the paths are allowed to penetrate "obstacles" with "weight factors" by paying higher costs for the path lengths (more on this later).

• A rectilinear path with the minimum number of links among all the rectilinear shortest paths between any two query points s and t, amid rectilinear polygonal obstacles.
• A rectilinear shortest path among all the rectilinear minimum-link paths between s and t, amid rectilinear polygonal obstacles.

• The rectilinear minimum-cost path between s and t amid rectilinear polygonal obstacles, where the cost is a nondecreasing function of the number of links and the length of a path.

The above paths whose "lengths" are determined according to a combination


of the L1 and link metrics are called bicriteria paths.
In a plane with weighted obstacles, each obstacle is associated with a non-
negative weight factor, so that a path in the plane, if it intersects the interior
of an obstacle, is charged extra cost based on the weight of that obstacle in
addition to the cost of the L1 length of the path. Specifically, the "length" of
a path in the interior of an obstacle with a weight factor of W is (1 + W)D 1 ,
where D1 is the length of the path in the L1 metric. Weighted obstacles often
appear in path planning problems in real applications. The weight factors, for
example, can represent speed limits or traffic conditions of streets in a city. Note
that shortest paths amid weighted obstacles are in fact a generalization of the
shortest paths that completely avoid the interior of obstacles, because shortest
paths become obstacle-avoiding if we let the weight of each obstacle be +00.
The rest of this subsection discusses the reduced visibility graph approach
and the resulting optimal path query algorithms.

3.2.1 Reduced Visibility Graphs

A key component of our rectilinear path query data structures is the reduced visibility graph G = (V(G), E(G)), which captures the necessary information about shortest paths among the n obstacle vertices. This graph was introduced by Clarkson, Kapoor, and Vaidya [38, 39] for computing L1 shortest paths amid general polygonal obstacles, and was generalized by Lee, Yang, and Chen [57] to computing rectilinear shortest paths amid weighted rectilinear polygonal obstacles. Yang, Lee, and Wong [76] also used this graph to compute rectilinear optimal bicriteria paths amid rectilinear polygonal obstacles.

The vertex set V(G) of the reduced visibility graph G can be partitioned into the set Vo of obstacle vertices and the set Vs of Steiner points. The following recursive procedure is used to generate the Steiner points of the graph G (a compressed sketch of the recursion follows the procedure):

1. Draw a vertical (resp., horizontal) line L, which we call a cut-line, at the median of the x- (resp., y-) coordinates of all the vertices in Vo.

2. Project onto L all the vertices in Vo that are visible in the horizontal (resp., vertical) direction from the cut-line L. The projection points of Vo on L are the Steiner points of Vs on L. Figure 3 gives an example of this step.

3. Use the cut-line L to partition Vo into two subsets S1 and S2, one on each side of L.

4. Perform this procedure recursively on the vertex sets S1 and S2, respectively, until the size of each vertex set becomes 1.

Figure 3: Creating Steiner points on the cut-line L.
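A compressed Python sketch of this recursion follows (our illustration, under simplifying assumptions noted in the comments; in particular, the visibility test of step 2 is omitted, so strictly more Steiner points may be produced than in the actual construction):

    def steiner_points(vertices, vertical=True):
        # Recursive median cut-lines, as in steps 1-4; for brevity this
        # sketch projects every vertex onto the cut-line (no visibility
        # test) and keeps one fixed cut-line orientation.
        if len(vertices) <= 1:
            return []
        axis = 0 if vertical else 1
        vs = sorted(vertices, key=lambda v: v[axis])
        mid = len(vs) // 2
        cut = vs[mid][axis]                  # median coordinate: the cut-line
        projections = [(cut, v[1]) if vertical else (v[0], cut) for v in vs]
        # Steps 3-4: split at the cut-line and recurse on both sides.
        return (projections + steiner_points(vs[:mid], vertical)
                + steiner_points(vs[mid:], vertical))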
Because the recursive procedure above has O(log n) recursion levels, it clearly generates O(n log n) Steiner points in Vs. Since |Vo| = O(n), there are in total O(n log n) vertices in V(G) = Vo ∪ Vs. We associate with each cut-line L a level number, which is the number of the recursion level at which L is used in the above procedure (with the root level being level 1).

The edge set E(G) of G consists of the set Ev of line segments between every Steiner point in Vs and its corresponding vertex in Vo, and the set EL of line segments between consecutive Steiner points on every cut-line such that the two consecutive Steiner points are visible to each other. Clearly, |E(G)| = O(n log n).
The algorithms in [38, 57] run Dijkstra's shortest path algorithm [48] on G to find a shortest path between two points s and t in the plane (with both s and t being included in V(G)), in O(n log^2 n) time and O(n log n) space. A modified version G' of the graph G was then used in [39, 57], such that G' consists of O(n√(log n)) vertices and O(n log^{1.5} n) edges. Consequently, running Dijkstra's shortest path algorithm [48] on G' takes O(n log^{1.5} n) time and space.

As will be seen in the next subsubsection, we use the vertex set V(G) of G as the gateways for the whole plane. Hence, we need to compute shortest paths in the reduced visibility graph G (with O(n log n) vertices and edges). In [29, 30, 25], we are able to compute single-source and all-pairs shortest paths in G without having to straightforwardly apply the shortest path algorithm of [48] to the graph G. Actually, by exploiting the special structures of G, we compute single-source shortest paths in G in O(n log^{1.5} n) time and O(n log n) space, and all-pairs shortest paths in G in O(n^2 log^2 n) time and space, both of which are improvements over the previous algorithms in [38, 39, 57].
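For concreteness, the baseline computation run on G by [38, 57] is Dijkstra's algorithm; a generic Python version (the textbook algorithm, not the improved ones just mentioned) looks as follows:

    import heapq

    def dijkstra(adj, source):
        # Textbook heap-based Dijkstra: adj maps each vertex to a list of
        # (neighbor, nonnegative edge cost) pairs; returns the shortest
        # path length from source to every reachable vertex.
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float('inf')):
                continue                  # stale heap entry, already settled
            for v, w in adj.get(u, ()):
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist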

3.2.2 Characterization of Gateways


Our idea for computing a set Ws of gateways for a query point s is that of
"inserting" s into the reduced visibility graph G. That is, we treat s as if it were

one of the obstacle vertices, and compute the edges of G adjacent to s that would have resulted if the graph construction procedure of the previous subsubsection had been applied to s. Note that such an "insertion" of a point into G does not essentially change the shortest path information contained in G (i.e., s is not a new obstacle); furthermore, the resulting "graph" (after the "insertions") contains shortest path information among the inserted points and the vertices of G, if a shortest path between the inserted points does intersect some vertices of G. Also, note that our "insertion" process does not actually modify the graph G, because the points are never truly inserted into G.

Figure 4: The gateways and gateway region R(s) of a point s.
To "insert" a point s into the reduced visibility graph G, it is necessary to
project the point s onto the relevant cut-lines based on the graph construction
procedure. A fixed set of O(n) cut-lines are used in the construction of G.
These cut-lines subdivide the plane recursively and each cut-line is associated
with a particular recursion level. In consequence, all points in each region of the
resulting planar subdivision can be projected onto the same subset of cut-lines.
If, according to the graph construction procedure, s would have been projected
onto a cut-line L, then we say L is a projection cut-line of s. Note that if a
cut-line L is a projection cut-line of s, then s is visible in the horizontal or
vertical direction from L. But, not every cut-line visible from s is a projection
cut-line of s. It is sufficient for our discussion to use vertical cut-lines only.
The lemmas and observations in this subsection are proved in [29, 30, 25].
The set Ws of gateways of a point s is determined as follows: For each projection cut-line L of s, if z is the vertex of G on L that is immediately above (resp., below) the projection point of s on L, then z ∈ Ws (see Figure 4). Observe that if s were "inserted" into G, then the only edges adjacent to s in the graph would be those connecting s with its projection points. Because each projection point of s is adjacent to at most two neighboring vertices of G on its cut-line, Ws controls every path in G from s to any other vertex of G. Since at each recursion level of the graph construction procedure, s can be projected onto at most one cut-line, there are O(log n) projection cut-lines for s. Along each projection cut-line of s, there can be at most two neighboring vertices of G. Therefore, we have the following two lemmas.

Lemma 3.1 For each point s in the plane, |Ws| = O(log n).


Lemma 3.2 For each point s and each vertex v of G, there is a shortest s-to-v path in the plane that goes through a vertex of Ws.

For every vertex z ∈ Ws, we define an "edge" (s, z) in the graph as follows: Let z be on a projection cut-line L of s; then the edge (s, z) consists of the segments sp(s) and p(s)z, where p(s) is the projection point of s on L.
The gateway region R(s) (Figure 4) of a point s is the area defined by an interconnection of the vertices of Ws. WLOG, we only show how the gateways of Ws that are in the first quadrant of s are connected together:

1. Let a vertical "pseudo" cut-line pass through s.

2. For each vertex v of Ws in the first quadrant of s, project v horizontally onto the projection cut-line L of s that is immediately to the left of v (it can be shown that such a projection is always possible, even among weighted obstacles). Note that L can be the "pseudo" cut-line passing through s.

3. Let the projection point of v on L be p(v), and let the vertex of Ws that is on L and in the first quadrant of s be u (if u exists).

4. Connect u and v by the line segments up(v) and p(v)v.

5. If v is the rightmost gateway of s, then connect v with the point (vx, sy) by a segment.

6. If L is the pseudo cut-line passing through s (and hence u does not exist on L), then connect v and p(v) by a segment.

The area in the first quadrant of s enclosed together by the polygonal chain so defined, the x-axis, and the y-axis (with s as the origin) is part of R(s). R(s) is the union of the areas so defined in the four quadrants of s.
A region R in the plane is said to be rectilinearly convex if R is a connected region and, for any horizontal or vertical line L, L ∩ R is either a single segment or empty. The following lemma characterizes some crucial structures of R(s).

Lemma 3.3 The gateway region R(s) of any point s has the following structures:

• R(s) is rectilinearly convex.

• For any point z in R(s), the points (zx, sy) and (sx, zy) are both in R(s).

• R(s) contains no vertices of Vo ∪ Vs in its interior.

Lemma 3.3 implies that when obstacles are unweighted (i.e., their weights are all +∞), the interior of R(s) is either completely free of obstacles or is completely contained within an obstacle. However, when obstacles are weighted, there is an additional case in which the interior of R(s) contains parallel "strips" of obstacles which all cross R(s) (see Figure 5).
From the above discussion, one can see that the set of gateways for the whole plane is the vertex set V(G) of the reduced visibility graph G.

Figure 5: Horizontal obstacle strips across the gateway region R(s).

3.2.3 Algorithms for Rectilinear Path Queries

By making use of the structures of a gateway region R(s) (Lemma 3.3), Ws can be computed efficiently for any query point s, especially when R(s) is completely free of obstacles or is completely contained within an obstacle (e.g., in O(log^2 n) time by doing a binary search on the vertices of G along each projection cut-line of s, or in O(log n) time by using a fractional cascading data structure [21, 22]). But there are still some difficulties. For example, amid weighted obstacles, each edge connecting s with a gateway point of Ws can penetrate as many as O(n) obstacle strips across the gateway region R(s) (Figure 5), making the computation of such a weighted edge length seemingly quite costly. However, by exploiting numerous geometric observations on these path problems, we are able to compute Ws and the costs of all the edges adjacent to s in O(log n) time [29, 30, 25].
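As a small sketch (ours) of the binary search mentioned above, the following Python fragment locates, on one projection cut-line whose G-vertices are kept y-sorted, the two candidate gateways around the projection point of s; repeating this over the O(log n) projection cut-lines yields Ws, and fractional cascading removes the per-line logarithmic factor.

    from bisect import bisect_left

    def cutline_gateways(ys, sy):
        # ys: sorted y-coordinates of the G-vertices on this cut-line;
        # sy: y-coordinate of the projection point of s on the cut-line.
        # Returns the vertices immediately below and above the projection
        # point, i.e., the (at most two) gateways this cut-line contributes.
        i = bisect_left(ys, sy)
        below = ys[i - 1] if i > 0 else None
        above = ys[i] if i < len(ys) else None
        return below, above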
Our algorithms for various path query problems are summarized as follows.

• L1 shortest paths amid general polygonal obstacles [29, 30]. Here, the path edges can be in arbitrary directions but the edge lengths are measured in the L1 metric. We make use of Mitchell's single-source L1 shortest path data structure [62, 61]. Our L1 path query data structure takes O(n^2 log^2 n) time and O(n^2 log n) space to build, and supports a length query in O(log^2 n) time.

• Shortest A-distance paths amid general polygonal obstacles [25]. We construct a generalized reduced visibility graph, which has O(|A| n log n) vertices and edges. We show that there are O(|A| log n) gateways for each query point. Since no single-source shortest path data structure was available before, we need to develop both the single-source and the general path query data structures. Our general path query data structure takes O(|A|^2 n^2 log^2 n) time and space to build, and supports a length query in O(|A|^2 log^2 n) time.

• Rectilinear shortest paths amid weighted rectilinear polygonal obstacles [29, 30]. We need to develop both the single-source and the general path query data structures. Our general path query data structure takes O(n^2 log^2 n) time and space to build, and supports a length query in O(log^2 n) time.

• Rectilinear optimal bicriteria paths amid rectilinear polygonal obstacles, based on combinations of the L1 and link metrics [25]. We need to develop both the single-source and the general path query data structures. In addition, we generalize Yang, Lee, and Wong's special segment-dragging data structure [76] for our gateway-based procedure for answering path queries. Our general path query data structure takes O(n^2 log^2 n) time and space to build, and supports a length query in O(log^2 n) time.

3.3 Other Related Path Problems


We studied in [13, 14] a simpler case of the problem of processing geometric shortest path queries: processing rectilinear shortest path queries among disjoint rectangular rectilinear obstacles. The techniques we developed involve a fast parallel computation of staircase separators, and a parallel scheme for partitioning the boundaries of the obstacles in a way that ensures that the resulting path length matrices have a special monotonicity property that was apparently absent before applying our partitioning scheme. In particular, we presented in [13, 16] optimal parallel algorithms for partitioning a set of m disjoint rectangles into two subsets of size m/2 each, by using an obstacle-avoiding staircase, called a staircase separator (see Figure 6 for an example). We showed that for rectilinear shortest path queries among rectangular obstacles, each query point needs to use at most two gateway points. The data structure was constructed sequentially in O(n^2) time and space, and in parallel in O(log^2 n) time using O(n^2/log n) CREW PRAM processors, supporting an O(log n) length query time [13, 14].

Figure 6: A staircase separator partitioning a set of rectangular obstacles.
Our staircase separator algorithms have been used, by us and by others, in solving various geometric path and Voronoi diagram problems, sequentially and in parallel [16, 28, 45, 50, 67].
Matrices with the special monotonicity property that we used in [13, 14] are called monotone matrices or Monge matrices [4, 5, 7]. Monotone matrices can be multiplied in the (min, +) semi-ring very efficiently, even in parallel. For example, two n x n monotone matrices can be multiplied in parallel in O(log n) time with O(n^2/log n) CREW PRAM processors [5, 7]. Monotone matrices find numerous applications in many areas. In particular, we have discovered that monotone matrices appear in many geometric and graph shortest path problems, yielding very efficient algorithms for such path problems [13, 14, 15, 17, 18, 27].
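For reference (our addition, stating the standard Monge condition rather than anything specific to [13, 14]): a matrix M is Monge when M[i][j] + M[i+1][j+1] <= M[i][j+1] + M[i+1][j] holds for all adjacent index pairs, and this local condition already implies the general one for non-adjacent rows and columns. A direct Python check:

    def is_monge(M):
        # Checking adjacent 2x2 submatrices suffices, since the condition
        # for arbitrary i < i', j < j' follows by summing the local ones.
        rows, cols = len(M), len(M[0])
        return all(M[i][j] + M[i + 1][j + 1] <= M[i][j + 1] + M[i + 1][j]
                   for i in range(rows - 1) for j in range(cols - 1))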

4 Approximate Shortest Path Queries


In this section, we discuss a family of efficient algorithmic solutions for process-
ing various length queries on approximate geometric shortest obstacle-avoiding
paths among polygonal obstacles in the plane. The lengths of the paths we con-
sider are determined according to the Lp metric, where p is any positive integer.
These algorithms achieve an interesting trade-off between the approximation
factor, the query time, and the space and construction time of the data struc-
tures. In addition to the gateway paradigm, our algorithms are also based on
other important paradigms for approximation of geometric and graph shortest
paths.
Note that in a plane with multiple polygonal obstacles, the best known data
structures for even relatively simple cases of exact shortest path queries take
super-quadratic space, as shown in the previous section. However, solutions
of super-quadratic space can still be too expensive for many practical applica-
tions. Moreover, solutions suitable for many practical problems (e.g., in data
compression and geographical information systems) often need not use exact
shortest paths. In such situations, practitioners may be willing to sacrifice, to
a certain degree, the optimality of their solutions to gain algorithmic simplicity
and efficiency. Therefore, it makes sense to develop efficient solutions (e.g., of
sub-quadratic space) for queries on approximate geometric shortest paths whose lengths are within a guaranteed small constant factor c of the lengths of their corresponding exact shortest paths. We will call such approximate shortest paths c-short paths, or simply short paths. For example, the exact shortest paths are 1-short paths.
One of our goals in this section, therefore, is to illustrate some trade-off
between the approximation factor, the query time, and the space and construc-
tion time of the approximate shortest path query data structures that we have
developed [23, 8].
Very few previous results were known for computing approximate geometric
shortest paths in the plane. Notably, Clarkson [37] presented an algorithm for
processing Euclidean (1 + ε)-short path queries amid polygonal obstacles, for any positive constant ε. Clarkson's algorithm uses O(n log n) time and O(n) space for building the data structure, and answers a short path query in O(n log n) time. As was mentioned in [37], it is possible to extend Clarkson's result as follows: in O(n^2 log n) time, an O(n^2) space data structure can be constructed for supporting a length query in O(log n) time; the details of this solution are actually provided in [23]. Mitchell [62, 61] gave a different approach for computing
(1 + ε)-short obstacle-avoiding paths in the plane.

4.1 Main Ideas


We first discuss the main ideas of our approximation solutions. Let V denote the set of n obstacle vertices. Let Gv = (V, E) be the visibility graph of the obstacle scene with vertex set V and edge set E, such that for any two distinct vertices u, v ∈ V, (u, v) is an edge in E if and only if the line segment uv does not intersect the interior of any obstacle. The cost of an edge (u, v) ∈ E is the Lp distance d(u, v) between u and v for a metric Lp. Clearly, |E| = O(n^2) in the worst case. Let Length(P) denote the length of an obstacle-avoiding path P.

We will use the vertex set V of the visibility graph Gv as the set of gateway points for the whole plane. To be able to compute efficiently a c-short path Ps(p, q) between any two query points p and q in the plane, we need to overcome two difficulties:

(1) "Discretize" the computation of Ps(p, q), i.e., compute Ps(p, q) based on short paths Ps(u, v) for some obstacle vertices u, v ∈ V.

(2) Obtain short paths between obstacle vertices quickly.

Handling difficulty (1) in a fast manner (e.g., in polylogarithmic time) for each pair of query points requires the query procedure to examine Ps(u, v) for only a small number of pairs of obstacle vertices u and v. To answer short path queries between arbitrary query points in the plane in polylogarithmic time and with a data structure of subquadratic space and construction time, one problem we must get around is to avoid using general ray shooting. (In contrast, recall that many of our solutions for processing exact path queries in the previous section do make use of general ray shooting.) When two query points are visible to each other, the true shortest path between them is found naturally by ray shooting (in polylogarithmic time). However, a data structure that supports general ray shooting operations in logarithmic time takes O(n^2) space and O(n^2 log n) time to build [1], or one can use subquadratic space data structures and perform ray shooting operations in superpolylogarithmic time (e.g., see [1, 2, 3]). Neither of those ray shooting data structures gives us a satisfactory solution.
Difficulty (2), in a sense, is the problem of finding a good "spanner". Given
a set S of n points in a d-dimensional space, a t-spanner is a graph G whose
vertex set includes the points of S, such that for any two points u and v of S,
there is a u-to-v path in the graph G of length at most t times the distance
between u and v in a chosen Lp metric. The problem of constructing various
"good" spanners has attracted a considerable amount of attention recently (e.g.,
see [11] and the references given there). Our "spanner" is not only for a finite
set of points (which is the case for most spanner results), but for the entire
obstacle-scattered plane. To resolve difficulty (2), we need an efficient scheme
for computing and maintaining short paths between any two obstacle vertices using only subquadratic space and time. The scheme we use actually makes use of certain spanners for discrete points.
Our scheme for resolving difficulty (2), called approximation of approximation, is outlined as follows (a brute-force check of the underlying spanner property is sketched after the outline):

Approximation 1: Use a planar spanner for the gateway point set V to approximate shortest paths between any two gateway points in the plane.

Approximation 2: Develop efficient algorithms and data structures for computing approximate shortest paths in such a spanner graph.

The rest of this section unfolds our solutions for resolving these difficulties.
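As a brute-force illustration (ours) of the spanner property that Approximation 1 relies on, the following Python check verifies a claimed stretch factor t for a candidate spanner over a point set (Floyd-Warshall is used for clarity, not efficiency; dist is the chosen Lp distance function):

    def is_t_spanner(points, edges, t, dist):
        # Run Floyd-Warshall on the spanner graph and verify that every
        # pair of sites satisfies graph_distance(u, v) <= t * dist(u, v).
        n = len(points)
        INF = float('inf')
        g = [[INF] * n for _ in range(n)]
        for i in range(n):
            g[i][i] = 0.0
        for (i, j), cost in edges.items():   # undirected spanner edges
            g[i][j] = min(g[i][j], cost)
            g[j][i] = min(g[j][i], cost)
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if g[i][k] + g[k][j] < g[i][j]:
                        g[i][j] = g[i][k] + g[k][j]
        return all(g[i][j] <= t * dist(points[i], points[j])
                   for i in range(n) for j in range(n) if i != j)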

4.2 Planar Spanners


As will be shown in the next subsection, planar spanners are particularly useful for computing geometric short paths. For our gateway point set V, a planar spanner is a graph GP = (VP, EP) such that:

1. the set V of obstacle vertices is a subset of VP;

2. the edges of GP represent straight-line segments that do not intersect the interior of any obstacle;

3. for any two obstacle vertices u, v ∈ V, there is a u-to-v path in GP which corresponds to a c-short obstacle-avoiding path in the plane in an Lp metric;

4. GP is a planar graph.

If VP = V, then we call GP a planar Lp c-spanner. Otherwise, if GP contains additional vertices (called Steiner vertices), we call GP a planar Steiner Lp c-spanner. The real number c ≥ 1, representing the approximation factor of the short paths, is called the stretch factor of the spanner. Figure 7 gives an example of a planar spanner of the visibility graph for a point set in the plane with no obstacles.
There were already several algorithms that construct planar L2 c-spanners in O(n log n) time. The best known stretch factor in L2 is c = 2, which was discovered by Chew [33, 31].

Ideally, one would like to have planar spanners with a stretch factor of 1 + ε for any given positive constant ε. So one interesting question is: what kind of planar spanners can achieve a 1 + ε stretch factor? The following lemma, proved in [8], sheds some light on the answer to this question.
Lemma 4.1 There exist point sets of size n in the plane such that any planar L1 spanner that is a subgraph of the visibility graph on such a point set (modeling the L1 distances among the given points) has a stretch factor ≥ 2. Furthermore, the stretch factor of 2 is tight, since there is an O(n log n) time algorithm for constructing such a planar L1 2-spanner for the vertices of any polygonal obstacle scene in the plane.

Figure 7: The visibility graph (a) and a planar spanner (b) of five points in the plane.

Figure 8: Our convex distance function is based on the isosceles triangle within an L1 unit diamond.
Our O(n log n) time algorithm for constructing a planar L1 2-spanner (without using any Steiner vertices) hinges on computing a constrained Delaunay triangulation using a convex distance function [34, 32, 33] based on an isosceles triangle that lies within an L1 unit diamond (shown in Figure 8). This approach is similar to those used in [32, 33, 31].

Lemma 4.1 shows that if one wants a planar L1 spanner with a stretch factor less than 2, then it is necessary to use Steiner vertices. Since any Lp metric is related to the L1 metric by a fixed constant factor, this conclusion also holds for planar Lp spanners (i.e., Steiner vertices are needed for achieving planar Lp spanners with a 1 + ε stretch factor).
The following lemma, proved in [8], shows that planar Steiner L1 (1 + ε)-spanners exist and can be constructed efficiently.

Lemma 4.2 Given a collection of disjoint polygonal obstacles in the plane with n vertices, and given any constant ε > 0, a planar L1 (1 + ε)-spanner with O(n/ε^2) Steiner vertices can be constructed in O(n log n + n/ε^2) time.
We have extended the results of Lemma 4.2 to the L2 metric (the constant factor associated with the number of Steiner vertices used by the resulting planar L2 spanner depends on a higher-degree polynomial of 1/ε). It seems possible to further extend this lemma to any Lp metric.

Our O(n log n) time algorithm for Lemma 4.2 is quite involved. It is based on a special planar subdivision introduced by Arya et al. [12], which was originally used for solving nearest neighbor searching problems. Interestingly, we are able to apply this planar subdivision to constructing our planar Steiner spanners for obstacle vertices. In [8], we first construct the planar subdivision of Arya et al. [12] for a polygonal obstacle scene. Doing so in a straightforward manner, however, may result in a planar subdivision with O(n^2) regions. We are able to show that it is possible to group the O(n^2) regions into only O(n) regions, in O(n log n) time. Finally, we carefully create grids on the resulting O(n) regions of the subdivision. This gives us a planar Steiner spanner with a 1 + ε stretch factor.

In comparison with the planar L1 and L2 2-spanners (without any Steiner vertices) in [8, 33, 31], the planar Steiner (1 + ε)-spanners we obtain in Lemma 4.2 have a better stretch factor, but at the price of possibly using a large number of Steiner vertices.
Before we conclude this subsection, we should mention that all our O(n log n) time algorithms for constructing planar spanners (with or without Steiner vertices) are optimal in the algebraic computation tree model [19]. Specifically, we prove the following lower bound results in [26]:

Lemma 4.3 The problem of constructing geometric spanners, possibly containing Steiner points, for sets of points in the d-dimensional space, and the problem of computing approximate shortest paths amid a collection of polygonal obstacles in the plane, for any given approximation factor, both require Ω(n log n) time to solve in the algebraic computation tree model.

4.3 Short Paths in Planar Graphs


This subsection discusses several algorithmic techniques for computing and
maintaining approximate shortest paths in weighted undirected planar graphs
with nonnegative real edge costs. Our algorithms all construct certain data
structures for short path queries in planar graphs in subquadratic time and
space, and achieve an interesting trade-off between the approximation factor,
the query time, and the space and construction time of the data structures.
These solutions, of course, are immediately applicable to the planar spanners
discussed in the previous subsection because such planar spanners are simply
instances of weighted undirected planar graphs.
Let G = (V, E) be a weighted undirected planar graph with nonnegative
real costs on its edges, and let n denote the number of vertices of G. Note
that we cannot simply store the all-pairs shortest paths in G, since that would require O(n^2) space. Hence, we must use different approaches for computing short paths in G.
4.3.1 A General Approximation Paradigm
Our first approach for computing short paths in undirected planar graphs hinges
crucially on the following key observation which was proved in [23].

Lemma 4.4 For two elements p and q in an environment R, let R' be a subset of R such that a shortest path P(p, q) between p and q in R goes through an element of R'. For p (resp., q), let p' (resp., q') be an element of R' such that the length of the shortest path P(p, p') (resp., P(q, q')) in R is the shortest among all the paths between p (resp., q) and the elements of R'; i.e., Length(P(p, p')) = min{Length(P(p, w)) | w ∈ R'} (resp., Length(P(q, q')) = min{Length(P(q, w)) | w ∈ R'}). Then Length(P(p, p') ∪ P(p', q') ∪ P(q', q)) ≤ 3 x Length(P(p, q)).
Lemma 4.4 states that the concatenation of the three shortest paths P(p, p'), P(p', q'), and P(q', q) is a good approximation of the shortest path P(p, q) (within a factor of at most three). This observation is quite general, since the environment R can be geometric or graphical, so long as the paths involved in the observation are undirected. It immediately leads to the following general paradigm for processing short path queries in R (a small sketch of the resulting query follows the paradigm):

1. Find a subset R' of R (we call R' a separator of R), and use R' to partition R into connected components R1, R2, ..., Rk. Ideally, R' would give rise to a "balanced" partition of R.

2. Compute and store shortest paths in R from each element of R' to the other elements of R' (i.e., an all-pairs shortest path computation among the elements of R'). Also, compute and store, for each element p of R, a shortest path in R from p to its closest element p' in R' (i.e., the Voronoi diagram on R whose sites are the elements of R'). Then the shortest path information so prepared is sufficient for reporting an approximate shortest path (within a factor of 3 of the exact shortest path) between any p ∈ Ri and q ∈ Rj, 1 ≤ i < j ≤ k.

3. For every i = 1, 2, ..., k, build the data structure recursively for Ri, until Ri becomes easily manageable.
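A minimal Python rendering of the query implied by step 2 (our illustration; closest, dist_to_sep, and allpairs name the hypothetical tables that step 2 prepares):

    def three_short_length(p, q, closest, dist_to_sep, allpairs):
        # closest[x]:      the separator element x' in R' nearest to x
        #                  (the Voronoi information of step 2);
        # dist_to_sep[x]:  Length(P(x, x'));
        # allpairs[a][b]:  exact shortest path length between separator
        #                  elements a and b (also prepared in step 2).
        # By Lemma 4.4, the result is at most 3 * Length(P(p, q)).
        pp, qq = closest[p], closest[q]
        return dist_to_sep[p] + allpairs[pp][qq] + dist_to_sep[q]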
In addition to the above general approximation paradigm, Lemma 4.4 has
several other important implications. For example, the size of a short path query
data structure that is based on the above paradigm can be affected greatly by
the size of the separator R', because the all-pairs shortest paths among the
elements of R' may need to be represented explicitly. Therefore, "small sized"
separators such as Lipton and Tarjan's planar separators [59] are particularly
useful in this framework. However, large sized separators, such as our staircase
separators [13, 14, 16], can also be appropriate, especially when the all-pairs
shortest paths among the elements of such separators can be made available
without paying high computational costs. Hence, with this framework, results
on various graphical and geometric separators can have a significant impact on
algorithms for computing approximate shortest paths.
Now it is clear that this approximation paradigm is especially applicable to
undirected planar graphs, due to Lipton and Tarjan's separator theorem [59]. In
fact, the approximation paradigm has led to a number of results on computing
short geometric and graph paths in [23, 8].

4.3.2 3-Short Paths in Planar Graphs


When this paradigm is applied to a weighted undirected planar graph G, we
develop a two-phase recursive algorithm for constructing a 3-short path query
data structure [23]. At each recursion level of the algorithm, a planar separator
[59], denoted by S, is computed and is used to partition a region of the graph
G into smaller regions. The Voronoi diagram on the region of G whose sites are
the vertices of S is computed, based on Mehlhorn's notion of Voronoi regions in
graphs [60] and making use of the optimal single-source shortest path algorithm
in planar graphs by Klein et al. [56]. The (global) all-pairs shortest paths among
the vertices of S are computed in the bottom-up and top-down phases of the
algorithm, by repeatedly using Dobosiewicz's all-pairs shortest path algorithm
[43]. Our algorithm maintains a structure for the recursive graph partitioning
process, called the partition tree of G. Queries on 3-short paths between any
two vertices of G are answered by first computing their lowest common ancestor
[52, 73] in the partition tree. .
As a result, our algorithm in [23] blJ,ilds the 3-short path query data structure
in O{n log n) space and 0{n 3 / 2 h/logn) time, supporting a length query in 0(1)
time. The data structure construction algorithm is further improved by using
Frederickson's hammock decomposition technique [47], to O{nlogn) space and
O{n log n + q3/2 /..;rc;g;j) time, where q, with 1 ~ q ~ n, is the minimum number
of faces needed to cover all the vertices of the planar graph.

4.3.3 c-Short Paths in Planar Graphs

The ideas and algorithm for computing 3-short paths in [23] inspired a further study of c-short paths in weighted undirected planar graphs, where 1 ≤ c ≤ 3 [8]. In particular, the following results are presented in [8].

Lemma 4.5 Given an n-vertex planar graph G and an arbitrary integer p with 1 ≤ p ≤ √n, it is possible to construct a data structure in O(n^2/p) time and space, such that a length query on the exact shortest path between any two vertices in G can be answered in O(p) time.

Note that the algorithm for Lemma 4.5 is for computing exact shortest paths (i.e., 1-short paths) in G. It is based on the planar separators [59] and Frederickson's notion of r-divisions [46]. It partitions the planar graph G into a division
consisting of a number of regions (depending on the value of p), and then computes shortest path trees in the entire graph G, with each of these shortest path trees rooted at a vertex on the boundary of such a region.
The following lemma, proved in [8], is for computing 2-short paths in planar
graphs.
Lemma 4.6 Let u and v be any two vertices of a planar graph G, and let S be a separator of G. If a shortest u-to-v path in G passes through a vertex of S, then min{Length(P(u, bu)) + Length(P(bu, v)), Length(P(v, bv)) + Length(P(bv, u))} ≤ 2 x Length(P(u, v)), where P(u, v) is a shortest u-to-v path in G and bu, called a closest separator vertex of u on the separator S, is a vertex of S which satisfies Length(P(u, bu)) ≤ Length(P(u, w)) for any vertex w in S.
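In code form (our illustration), the estimate of Lemma 4.6 reads as follows, with closest, dist_to_sep, and sssp_from standing for the tables prepared by the construction described next:

    def two_short_length(u, v, closest, dist_to_sep, sssp_from):
        # closest[x]:     the closest separator vertex b_x of x;
        # dist_to_sep[x]: Length(P(x, b_x));
        # sssp_from[b]:   exact shortest path lengths from separator
        #                 vertex b (single-source trees, see below).
        # By Lemma 4.6, the smaller detour is a 2-approximation.
        bu, bv = closest[u], closest[v]
        return min(dist_to_sep[u] + sssp_from[bu][v],
                   dist_to_sep[v] + sssp_from[bv][u])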

Based on Lemma 4.6, we develop an algorithm for constructing a data structure for 2-short path queries in planar graphs. The 2-short path data structure construction algorithm is actually somewhat similar to the algorithm for the 3-short paths in the previous subsubsection. One main difference between these two algorithms is that the computation of a Voronoi diagram in a region of G with the separator vertices as the sites is replaced by the computation of single-source shortest path trees in the region, rooted at each separator vertex. Our 2-short path algorithm achieves the following result.

Lemma 4.7 Given an n-vertex planar graph G, it is possible to construct a data structure in O(n^{3/2}) time and space, such that a length query on a 2-short path between any two vertices in G can be answered in O(log n) time.

The query procedure supported by this 2-short path data structure takes O(log n) time because it needs to perform an O(1) time computation (based on Lemma 4.6) at each of the O(log n) levels of the partition tree of G.

4.4 Short Path Queries amid Obstacles in the Plane


In this subsection, we show how to put together the results on planar Steiner Lp c-spanners (Subsection 4.2) and on all-pairs short path queries in undirected planar graphs (Subsection 4.3), as well as other useful geometric structures, to solve the problem of processing Lp short path queries amid disjoint polygonal obstacles in the plane (with p = 1, 2). We give several algorithms and data structures for the problem, which differ from each other in the approximation factors of the short paths they obtain and in their complexity bounds. The complexity bounds of these solutions also depend on a given value ε > 0.

Our Lp short path query data structures are also based on the gateway paradigm (Section 2). Recall that our gateway points in this case are simply all the n obstacle vertices. These path query data structures all consist of the following two major components:
Part I. A data structure for answering queries on all-pairs short paths in a planar Steiner Lp c-spanner, as discussed in Subsection 4.2.

Part II. A data structure that, given any two query points s and t in the plane, quickly reduces the computation of a short s-to-t path in the plane to the computation of short paths between a constant number of pairs of gateway points (the number of pairs of gateway points involved depends on the value of ε).
Given the results in Subsections 4.2 and 4.3, the data structure of Part I can be easily constructed as follows. Let ε1 > 0 be a value depending on the given ε (ε1 will be decided later). We first obtain a planar Steiner Lp (1 + ε1)-spanner with O(n) Steiner vertices, denoted by GP, and then build a data structure for answering all-pairs short path queries in the planar graph GP.

Assume that, in O(log n) time, we can reduce the computation of a short path query between any two query points in the plane to the computation of short paths between O(1) pairs of gateways (we will show later how to perform this O(log n) time reduction with the O(n) space data structure of Part II). Once the data structures of Parts I and II are available, we can, for example, process a short path query between any two query points in the plane in O(n) time. This is done by first reducing such a query to computing O(1) exact shortest paths in GP, and then applying to GP the O(n) time single-source shortest path algorithm of [56]. Note that, in this case, the data structure of Part I needs to consist of only the graph GP itself, and hence it takes only O(n) space and O(n log n) time to build. This O(n) time query-answering algorithm compares favorably with Clarkson's O(n log n) time and O(n) space algorithm for L2 (1 + ε)-short path queries among polygonal obstacles in the plane [37]. If the data structure of Part I supports queries on c-short paths in GP (Subsubsections 4.3.2 and 4.3.3), then the length query time for a short path between any two query points in the plane is O(log n + fc), where fc is the length query time for a c-short path in GP.
Let ε2 > 0 be a value depending on the given ε (ε2 will be decided later).
The data structure of Part II takes O((n log n)/ε2) time and O(n/ε2) space to
build. This data structure serves the purpose of, given two query points s and
t in the plane, computing a c-short s-to-t path from the short paths between
O((1/ε2)^2) pairs of obstacle vertices. This data structure is given in [23] for
processing L2 short path queries in the plane. It is based on a modified version
of Clarkson's cone system [37], which we sketch below.
For a given value ε', 0 < ε' ≤ 1, Clarkson [37] constructed a set of cones in
the plane with a common corner point a, called the apex, as follows: Let ψ =
min{π/12, ε'π/2} if ε' < 1/2, and ψ = ε'π/6 if 1/2 ≤ ε' ≤ 1; partition the plane
into a set F_ψ^a of cones with the point a as their apex, such that the angle of each
cone C_a ∈ F_ψ^a is smaller than ψ (see Figure 9 for an example). If no particular
apex point is referred to, the set of cones is denoted by F_ψ.

Figure 9: Clarkson's cone system: to the apex a, u is the closest visible obstacle
vertex in the cone containing u.

For two points a and b that are visible to each other, with b lying inside a cone
C_a ∈ F_ψ^a, the approximate L2 distance D^a_b from a to b is defined by
D^a_b = (b − a) · u_{C_a}, where u_{C_a} is a fixed unit vector contained in C_a.
Based on this cone system, Clarkson built a sparse subgraph G'_v of the
visibility graph G_v in a plane with polygonal obstacles, such that for any two
obstacle vertices a and u, (a, u) is an edge of G'_v if u is visible from a and there
is a cone C_a ∈ F_ψ^a containing u with D^a_u being the minimum (approximate)
distance among all the obstacle vertices in C_a − {a} that are visible from a (see
Figure 9 for an example). G'_v hence consists of all the vertices of G_v and of
O(n/ε') edges, and can be obtained in O((n log n)/ε') time by using certain
variants of Voronoi diagrams in the plane, called conical Voronoi diagrams,
whose sites are the vertices of G_v [37]. By using G'_v and an O(n/ε') space data
structure for a set of O(1/ε') conical Voronoi diagrams, a (1 + ε')-short path
between any two points in the plane can be reported in O(n log n + n/ε') time
[37]. Our geometric short path query data structures here include a modified
version of Clarkson's cone system and several additional components, such as
those for oriented ray shooting and for planar point location used in [23]. One
of our modifications to Clarkson's cone system is to carefully choose a set of
unit vectors for computing the approximate Lp distances in each cone (the
choices of these unit vectors ensure the correctness of the query procedure [23]).
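To make the cone construction concrete, here is a minimal Python sketch of the
idea (our own illustration, not code from [37] or [23]): the plane around an apex
is split into equal cones of angle smaller than ψ, each cone gets a fixed unit
vector (we use its bisector, one valid choice), and the sparse cone graph keeps
one edge per vertex and cone to the vertex of minimum approximate distance.
Here `visible` is an assumed predicate, and the quadratic scan stands in for the
conical-Voronoi-diagram construction that achieves the O((n log n)/ε') bound.

    import math

    def cone_setup(eps):
        # Opening angle psi as defined above, and a number k of equal
        # cones so that each cone's angle 2*pi/k is smaller than psi.
        psi = min(math.pi / 12, eps * math.pi / 2) if eps < 0.5 else eps * math.pi / 6
        k = math.floor(2 * math.pi / psi) + 1
        return k, 2 * math.pi / k

    def cone_index(a, b, k, angle):
        # Index of the cone with apex a that contains point b.
        theta = math.atan2(b[1] - a[1], b[0] - a[0]) % (2 * math.pi)
        return int(theta // angle)

    def approx_dist(a, b, k, angle):
        # Approximate distance (b - a) . u, with u the fixed unit vector
        # of b's cone (here: the cone's bisector).
        phi = (cone_index(a, b, k, angle) + 0.5) * angle
        return (b[0] - a[0]) * math.cos(phi) + (b[1] - a[1]) * math.sin(phi)

    def cone_graph(vertices, eps, visible):
        # One edge per (vertex, cone): to the visible vertex of minimum
        # approximate distance in that cone (naive O(n^2) stand-in).
        k, angle = cone_setup(eps)
        edges = []
        for a in vertices:
            best = {}   # cone index -> (approx distance, vertex)
            for b in vertices:
                if b == a or not visible(a, b):
                    continue
                i = cone_index(a, b, k, angle)
                d = approx_dist(a, b, k, angle)
                if i not in best or d < best[i][0]:
                    best[i] = (d, b)
            edges.extend((a, b) for (_, b) in best.values())
        return edges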
Our data structure for Part II enables us to perform the following computation
in O((log n)/ε2) time: Given any point s in the plane, compute a set P_s
of O(1/ε2) gateway points, such that P_s is a subset of the obstacle vertices and
such that each v ∈ P_s (v ≠ s) is the closest point among all the obstacle vertices
in a cone C_s ∈ F_ψ^s (based on the approximate Lp distance defined by a unit
vector for C_s) that are visible from s.
Given Parts I and II of our data structures, we can answer the length query
for a geometric short path between any two query points s and t as follows:

1. Compute the gateway point sets P_s and P_t using the set of O(1/ε2) conical
Voronoi diagrams.

2. For each of the O((1/ε2)^2) pairs of gateway points u ∈ P_s and v ∈ P_t,
compute the length of a short u-to-v path in the planar spanner G_p (with
the desired stretch factor).

3. Find the shortest length among all the short obstacle-avoiding paths from
s to t via u and v so obtained. This gives the length of the short s-to-t path.

The time complexity of this query procedure is O((log n)/ε2) (for Step 1)
plus O((1/ε2)^2) times the time for a short path length query in the planar graph
G_p (for Step 2).
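As a schematic illustration of Steps 1-3 (our own sketch, not code from [23]),
where `gateways_of` and `spanner_query` are assumed callables standing in for
the conical-Voronoi-diagram lookup (Step 1) and the Part I all-pairs short path
structure on the planar spanner G_p (Step 2):

    def short_path_length(s, t, gateways_of, spanner_query):
        gs = list(gateways_of(s))   # pairs (gateway vertex, distance from s)
        gt = list(gateways_of(t))   # pairs (gateway vertex, distance to t)
        # Step 3: minimize over all O((1/eps2)^2) gateway pairs.
        return min(d_su + spanner_query(u, v) + d_vt
                   for (u, d_su) in gs for (v, d_vt) in gt)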
To complete the discussion of our geometric short path query data structures,
we must decide the values of ε1 and ε2 based on ε. Note that, because G_p is a
planar Steiner Lp (1 + ε1)-spanner with O(n) Steiner vertices and because the
stretch factors of the short paths in G_p that our all-pairs short path query data
structures report are k ∈ {1, 2, 3}, a short path in G_p so computed corresponds
to a short obstacle-avoiding path in the plane with a stretch factor of k(1 + ε1)
= k + kε1. Also, note that we have another approximation error induced by our
geometric short path query procedure above, and this error depends on ε2. The
stretch factor of the short obstacle-avoiding paths we compute, therefore, can
be bounded by k + k(ε1 + ε2), and we would like to have k + k(ε1 + ε2) ≤ k + ε.
Choosing ε1 = ε2 = ε/(2k) will be sufficient, and this only increases the time
and space complexity bounds of our geometric short path query data structures
by a constant factor.
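A one-line check of this choice (our restatement): with ε1 = ε2 = ε/(2k),

    k + k(ε1 + ε2) = k + k(ε/(2k) + ε/(2k)) = k + ε,

so the desired bound k + k(ε1 + ε2) ≤ k + ε holds with equality.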
We summarize the results of our geometric short path query data structures
in the following table.

  Stretch Factor   Construction Time                Space        Query Time
  1 + ε            O(n^2/p)                         O(n^2/p)     O(log n + p)
  1 + ε            O(n log n)                       O(n)         O(n)
  2 + ε            O(n^{3/2})                       O(n^{3/2})   O(log n)
  3 + ε            O(n log n + q^{3/2}/√(log q))    O(n log n)   O(log n)

TABLE 1: Our results for the geometric short path query data structures in the L1
and L2 metrics. Here, ε is an arbitrarily small positive constant, p is an arbitrary
integer such that 1 ≤ p ≤ √n, and q is the minimum number of faces needed to cover
all the vertices of the planar spanner used by a data structure, 1 ≤ q ≤ n.

4.5 A Special Case

As mentioned in Subsubsection 4.3.1, not only are small-sized separators (e.g., [59])
useful in our approximation paradigm; large-sized graphical and geometric
separators can also be effective. For example, Mitra and Bhattacharya [67]
and we [28] have used staircase separators, which we introduced for parallel
geometric shortest path algorithms and for other geometric path problems
[13, 14, 16], in designing data structures for approximate rectilinear shortest
path queries among rectilinear rectangular obstacles. Mitra and Bhattacharya's
data structure in [67] takes O(n log^2 n) space and O(n log^3 n) construction time,
and supports O(log^2 n) time length queries for 7-short paths. We presented
an improved data structure that takes O(n log n) space and O(n log^2 n)
construction time, and supports an O(log n) time length query for 3-short paths
[28]. Observe that an obstacle-avoiding staircase separator S (e.g., see Figure 6),
although it can be of size O(n), has the nice property that for any two points on S,
there is a rectilinear shortest obstacle-avoiding path connecting the two points
along S. Hence, the data structures for approximate shortest path queries in
[28, 67] need not explicitly compute and store the all-pairs shortest paths among
points on any such staircase separator, resulting in a significant saving in their
space and construction time. We also showed in [28] that in this special case,
each query point needs to use at most two gateway points.

5 Concluding Remarks
In this paper, we have discussed several newly-developed algorithmic paradigms
for processing geometric shortest path queries and related problems. We have
shown how these general paradigms lead to efficient techniques for designing
algorithms and data structures for processing a variety of queries on exact and
approximate shortest paths in a number of geometric and graphical settings.
There are many exciting and important problems on computing various ge-
ometric shortest paths that are yet to be solved. In [24], we raise a number of
research issues on geometric optimal path planning. In addition to the research
issues discussed in [24], the following open problems and research directions are
likely to receive considerable attention in the future:
• In [8], we show that for the L1 metric, it is possible to achieve a planar
spanner (without any Steiner vertices) with a stretch factor of 2, and
this stretch factor is tight. However, what about other Lp metrics for
p > 1? For example, it is already known how to construct planar spanners
(without any Steiner vertices) for the L2 metric with a stretch factor of 2
[33, 31]. Can this stretch factor be improved for the L2 metric? What is
the tight lower bound for this stretch factor?

• Our visibility-sensitive approach for processing queries on exact geometric
shortest paths (Subsection 3.1) relates the query time to the visibility
polygons of the query points, whose sizes can still be large. Can one reduce
this query time for exact geometric shortest paths in the L2 metric while
still being able to use a data structure with near-quadratic space and
construction time?

• Our approaches for processing queries on approximate geometric shortest
paths (Section 4) crucially hinge on efficient schemes for computing
approximate shortest paths in planar spanners of the obstacle vertices. There
are many other geometric spanners that have a variety of useful properties
(e.g., see [11, 37]). Can one develop efficient schemes for computing
approximate shortest paths in these geometric spanners?

• The geometric shortest path query algorithms in this paper all assume that
the plane is perfectly "flat". However, path planning problems in practical
situations are likely to occur in non-flat planes or even terrains. Distances
in such non-flat geometric settings are usually more general than the standard
Lp distances. For example, in [6], we define a distance (called the skew distance)
in a non-flat plane as the Euclidean distance plus a signed difference in
height between two points. The skew distance is a natural generalization
of the Euclidean distance. Efficient algorithms for processing shortest
path queries in non-flat geometric settings with such generalized distance
functions need to be developed.

• It is well known that computing geometric shortest paths in 3-D space
with polyhedral obstacles is NP-hard [20]. Several efficient approximation
algorithms have been discovered for computing 3-D shortest paths [36, 37,
69]. However, there is so far no efficient approximation algorithm known
for processing 3-D shortest path queries among polyhedral obstacles, even
for the L1 metric.

• All geometric shortest path query algorithms in this paper assume that the
geometric settings are static. However, in real applications, the geometric
environments are likely to be dynamic (e.g., obstacles are moving, the
weight factors of the regions are changing, etc.). Shortest path problems in
dynamic geometric environments are much harder than the static versions.
So far, no efficient algorithm is known for processing shortest path queries
in dynamic geometric environments.

Acknowledgement. The work of the author was supported in part by the
National Science Foundation under Grant CCR-9623585.

References
[1] P.K. Agarwal, Ray shooting and other applications of spanning trees with
low stabbing number, SIAM J. Computing Vol.21 No.3 (1992) pp. 540-570.

[2] P.K. Agarwal and J. Matousek, Ray shooting and parametric search, SIAM
J. Computing Vol.22 No.4 (1993) pp. 794-806.

[3] P.K. Agarwal and M. Sharir, Applications of a new space partitioning tech-
nique, Proc. of 2nd Workshop on Algorithms and Data Structures, 1991, pp.
379-391.

[4] A. Aggarwal, M.M. Klawe, S. Moran, P. Shor, and R. Wilber, Geometric
applications of a matrix searching algorithm, Algorithmica Vol.2 (1987) pp.
209-233.

[5] A. Aggarwal and J. Park, Parallel searching in multidimensional monotone


arrays, Proc. of 29th Annual IEEE Symp. Foundations of Computer Science,
1988, pp. 497-512.

[6] O. Aichholzer, F. Aurenhammer, D.Z. Chen, D.T. Lee, A. Mukhopadhyay,


and E. Papadopoulou, Voronoi diagrams for direction-sensitive distances,
Proc. of 13th Annual ACM Symp. Computational Geometry, Nice, France,
1997, pp. 418-420.

[7] A. Apostolico, M.J. Atallah, L. Larmore, and H.S. McFaddin, Efficient par-
allel algorithms for string editing and related problems, SIAM J. on Com-
puting Vol.19 (1990), pp. 968-988.

[8] S. Arikati, D.Z. Chen, L.P. Chew, G. Das, M. Smid, and C.D. Zaroliagis, Pla-
nar spanners and approximate shortest path queries among obstacles in the
plane, Proc. of 4th Annual European Symposium on Algorithms, Barcelona,
Spain, 1996, pp. 514-528.

[9] E.M. Arkin, R. Connelly, and J.S.B. Mitchell, On monotone paths among
obstacles, with applications to planning assemblies, Proc. of 5th Annual
ACM Symp. Computational Geometry, 1989, pp. 334-343.

[10] E.M. Arkin, J.S.B. Mitchell, and S. Suri, Optimal link path queries in a
simple polygon, Proc. of 3rd Annual ACM-SIAM Symp. Discrete Algorithms,
1992, pp. 269-279.

[11] S. Arya, G. Das, D.M. Mount, J.S. Salowe, and M. Smid, Euclidean span-
ners: Short, thin, and lanky, Proc. of 27th Annual ACM Symp. Theory of
Computing, 1995, pp. 489-498.

[12] S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, and A. Wu, An


optimal algorithm for approximate nearest neighbor searching, Proc. of 5th
ACM-SIAM Symp. Discrete Algorithms, 1994, pp. 573-582.

[13] M.J. Atallah and D.Z. Chen, Parallel rectilinear shortest paths with rect-
angular obstacles, Computational Geometry: Theory and Applications Vol.1
No.2 (1991) pp. 79-113.

[14] M.J. Atallah and D.Z. Chen, On parallel rectilinear obstacle-avoiding paths,
Computational Geometry: Theory and Applications Vol.3 No.6 (1993) pp.
307-313.
[15] M.J. Atallah and D.Z. Chen, Computing the all-pairs longest chains in
the plane, International Journal of Computational Geometry & Applications
Vol.5 No.3 (1995) pp. 257-271.
[16] M.J. Atallah and D.Z. Chen, Applications of a numbering scheme for polyg-
onal obstacles in the plane, Proc. of 7th Annual International Symp. Algo-
rithms and Computation, Osaka, Japan, 1996, pp. 1-24.
[17] M.J. Atallah, D.Z. Chen, and O. Daescu, Efficient parallel algorithms for
planar st-graphs, Proc. of 8th Annual International Symp. Algorithms and
Computation, Singapore, 1997, pp. 223-232.
[18] M.J. Atallah, D.Z. Chen, and K.S. Klenk, Parallel algorithms for longest
increasing chains in the plane and related problems, Proc. of 9th Canadian
Conference Computational Geometry, Kingston, Canada, 1997, pp. 59-64.
[19] M. Ben-Or, Lower bounds for algebraic computation trees. Proc. of 15th
Annual ACM Symp. Theory of Computing, 1983, pp. 80-86.
[20] J. Canny and J.H. Reif, New lower bound techniques for robot motion plan-
ning problems, Proc. of 28th IEEE Annual Symp. Foundations of Computer
Science, 1987, pp. 49-60.
[21] B. Chazelle and L.J. Guibas, Fractional cascading: I. A data structuring
technique, Algorithmica Vol.1 No.2 (1986) pp. 133-162.
[22] B. Chazelle and L.J. Guibas, Fractional cascading: II. Applications, Algo-
rithmica Vol.1 No.2 (1986) pp. 163-191.
[23] D.Z. Chen, On the all-pairs Euclidean short path problem, Proc. of 6th
Annual ACM-SIAM Symp. Discrete Algorithms, San Francisco, 1995, pp.
292-301.
[24] D.Z. Chen, Developing algorithms and software for geometric path planning
problems, ACM Computing Surveys Vol.28 No.4es (1996) Article 18,
http://www.acm.org/pubs/citations/journals/surveys/1996-28-4es/a18-chen/
[25] D.Z. Chen, O. Daescu, and K.S. Klenk, On geometric path query problems,
Proc. of Fifth International Workshop on Algorithms and Data Structures,
Halifax, Canada, 1997, pp. 248-257.
[26] D.Z. Chen, G. Das and M. Smid, Lower bounds for computing geometric
spanners and approximate shortest paths, Proc. of 8th Canadian Conference
on Computational Geometry, Ottawa, Canada, 1996, pp. 155-160.

[27] D.Z. Chen and X. Hu, Efficient approximation algorithms for floorplan area
minimization, Proc. of 33rd ACM/IEEE Design Automation Conference,
Las Vegas, 1996, pp. 483-486.

[28] D.Z. Chen and K.S. Klenk, Rectilinear short path queries among rectangu-
lar obstacles, Information Processing Letters Vo1.57 No.6 (1996) pp. 313-319.

[29] D.Z. Chen, K.S. Klenk, and H.-Y.T. Tu, Shortest path queries among
weighted obstacles in the rectilinear plane, Proc. of 11th Annual ACM Symp.
Computational Geometry, Vancouver, Canada, 1995, pp. 370-379.
[30] D.Z. Chen, K.S. Klenk, and H.-Y.T. Tu, Shortest path queries among
weighted obstacles in the rectilinear plane, submitted, 1996.

[31] L.P. Chew, Planar graphs and sparse graphs for efficient motion planning
in the plane, Computer Science Tech. Report, PCS-TR90-146, Dartmouth
College, 1987.

[32] L.P. Chew, Constrained Delaunay triangulations, Algorithmica Vol.4 (1989)


pp. 97-108.

[33] L.P. Chew, There are planar graphs almost as good as the complete graph,
J. of Computer and System Sciences Vo1.39 (1989) pp. 205-219.

[34] L.P. Chew and R.L. Drysdale, Voronoi diagrams based on convex distance
functions, Proc. of 1st Annual ACM Symp. Computational Geometry, 1985,
pp. 235-244.

[35] Y.-J. Chiang, F.P. Preparata, and R. Tamassia, A unified approach to


dynamic point location, ray shooting, and shortest paths in planar maps,
Proc. of 4th Annual ACM-SIAM Symp. Discrete Algorithms, 1993, pp. 44-
53.

[36] J. Choi, J. Sellen, and C.-K. Yap, Approximate Euclidean shortest path in
3-space, Proc. of 10th Annual ACM Symp. Computational Geometry, 1994,
pp. 41-48.

[37] K.L. Clarkson, Approximation algorithms for shortest path motion plan-
ning, Proc. of 19th Annual ACM Symp. Theory of Computing, 1987, pp.
56-65.

[38] K.L. Clarkson, S. Kapoor, and P.M. Vaidya, Rectilinear shortest paths
through polygonal obstacles in O(n(log n)^2) time, Proc. of 3rd Annual ACM
Symp. Computational Geometry, 1987, pp. 251-257.
[39] K.L. Clarkson, S. Kapoor, and P.M. Vaidya, Rectilinear shortest paths
through polygonal obstacles in O(n log^{3/2} n) time, manuscript.

[40] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms,
(McGraw-Hill Book Company, New York, 1990).

[41] M. de Berg, On rectilinear link distance, Computational Geometry: Theory


and Applications Vol.1 (1991) pp. 13-34.

[42] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwartzkopf, Compu-


tational Geometry: Algorithms and Applications, (Springer-Verlag, 1997).

[43] W. Dobosiewicz, A more efficient algorithm for the min-plus multiplication,


International Journal of Computer Mathematics Vol.32 No.1 (1990) pp. 49-
60.

[44] H. Edelsbrunner, Algorithms in Combinatorial Geometry, (Springer-Verlag,


Heidelberg, Germany, 1987).

[45] H. ElGindy and P. Mitra, Orthogonal shortest route queries among axes
parallel rectangular obstacles, International J. of Computational Geometry
& Applications Vol.4 No.1 (1994) pp. 3-24.

[46] G.N. Frederickson, Fast algorithms for shortest paths in planar graphs,
with applications, SIAM J. on Computing Vol.16 (1987) pp. 1004-1022.

[47] G.N. Frederickson, Planar graph decomposition and all pairs shortest paths,
J. of the ACM Vol.38 No.1 (1991) pp. 162-204.

[48] M.L. Fredman and R.E. Tarjan, Fibonacci heaps and their uses in improved
network optimization algorithms, Journal of the ACM Vol.34 No.3 (1987)
pp. 596-615.

[49] M.T. Goodrich and R. Tamassia, Dynamic ray shooting and shortest paths
via balanced geodesic triangulations, Proc. of 9th Annual ACM Symp. Com-
putational Geometry, 1993, pp. 318-327.

[50] S. Guha and I. Suzuki, Proximity problems for points on a rectilinear plane
with rectangular obstacles, Algorithmica Vol.17 (1997) pp. 281-307.

[51] L.J. Guibas and J. Hershberger, Optimal shortest path queries in a simple
polygon, J. of Computer and System Sciences Vol.39 (1989) pp. 126-152.

[52] D. Harel and R.E. Tarjan, Fast algorithms for finding nearest common
ancestors, SIAM J. Computing Vol.13 (1984) pp. 338-355.

[53] J. Hershberger, A new data structure for shortest path queries in a simple
polygon, Information Processing Letters Vol.38 (1991) pp. 231-235.

[54] J. Hershberger and S. Suri, Efficient computation of Euclidean shortest


paths in the plane, SIAM J. on Computing (1998).

[55] M. Iwai, H. Suzuki, and T. Nishizeki, Shortest path algorithm in the plane
with rectilinear polygonal obstacles (in Japanese), Proc. of SIGAL Work-
shop, Ryukoku University, Japan, 1994.

[56] P.N. Klein, S. Rao, M. Rauch, and S. Subramanian, Faster shortest-path


algorithms for planar graphs, Proc. of 26th Annual ACM Symp. Theory of
Computing, 1994, pp. 27-37.

[57] D.T. Lee, C.D. Yang, and T.H. Chen, Shortest rectilinear paths among
weighted obstacles, International J. of Computational Geometry & Applica-
tions, Vol.1 No.2 (1991) pp. 109-124.

[58] D.T. Lee, C.D. Yang, and C.K. Wong, Rectilinear paths among rectilinear
obstacles, Discrete Applied Mathematics Vol.70 (1996) pp. 185-215.

[59] R.J. Lipton and R.E. Tarjan, A separator theorem for planar graphs, SIAM
J. Applied Mathematics Vol.36 (1979) pp. 177-189.

[60] K. Mehlhorn, A faster approximation algorithm for the Steiner problem in


graphs, Information Processing Letters Vol.27 (1988) pp. 125-128.

[61] J.S.B. Mitchell, An optimal algorithm for shortest rectilinear paths among
obstacles, Proc. of 1st Canadian Conf. Computational Geometry, Montreal,
Canada, 1989.

[62] J.S.B. Mitchell, L1 shortest paths among polygonal obstacles in the plane,
Algorithmica Vol.8 (1992) pp. 55-88.

[63] J.S.B. Mitchell, Shortest paths and networks, in J.E. Goodman and J.
O'Rourke (eds.) Handbook of Discrete and Computational Geometry, (CRC
Press LLC, 1997), pp. 445-466.

[64] J.S.B. Mitchell, Geometric shortest paths and network optimization, in


J.-R. Sack and J. Urrutia (eds.) Handbook of Computational Geometry, (El-
sevier Science, Amsterdam, 1998).

[65] J.S.B. Mitchell, G. Rote, and G. Woeginger, Minimum-link paths among


obstacles in the plane, Algorithmica Vol.8 (1992) pp. 431-459.

[66] J.S.B. Mitchell and S. Suri, Geometric algorithms, in M.O. Ball, T.L. Mag-
nanti, C.L. Monma, and G.L. Nemhauser (eds.) Network Models, Handbook
of Operations Research/Management Science, (Elsevier Science, Amster-
dam, 1995), pp. 425-479.

[67] P. Mitra and B. Bhattacharya, Efficient approximation shortest-path


queries among isothetic rectangular obstacles, Proc. of 3rd Workshop on
Algorithms and Data Structures, 1993, pp. 518-529.

[68] K. Mulmuley, Computational Geometry: An Introduction through Randomized
Algorithms, (Prentice Hall, New York, 1993).

[69] C.H. Papadimitriou, An algorithm for shortest-path motion in three
dimensions, Information Processing Letters Vol.22 (1985) pp. 259-263.

[70] M. Pocchiola and G. Vegter, Computing the visibility graph via pseudo-
triangulations, Proc. of 11th Annual ACM Symp. Computational Geometry,
1995, pp. 248-257.
[71] M. Pocchiola and G. Vegter, The visibility complex, International J. of
Computational Geometry & Applications Vol.6 No.3 (1996) pp. 279-308.

[72] F.P. Preparata and M.I. Shamos, Computational Geometry: An Introduction,
(Springer-Verlag, Berlin, 1985).

[73] B. Schieber and U. Vishkin, On finding lowest common ancestors: Simplification
and parallelization, SIAM J. Computing Vol.17 (1988) pp. 1253-1262.

[74] S. Schuierer, An optimal data structure for shortest rectilinear path queries
in a simple rectilinear polygon, International J. of Computational Geometry
& Applications Vol.6 No.2 (1996) pp. 205-225.
[75] P. Widmayer, Y.F. Wu, and C.K. Wong, On some distance problems in
fixed orientations, SIAM J. on Computing Vol.16 No.4 (1987) pp. 728-746.
[76] C.D. Yang, D.T. Lee, and C.K. Wong, Rectilinear path problems among
rectilinear obstacles revisited, SIAM J. Computing Vol.24 No.3 (1995) pp.
457-472.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (Vol. 2)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 35-76
©1998 Kluwer Academic Publishers

Computing Distances between Evolutionary Trees


Bhaskar DasGupta 1
Rutgers University. E-mail: bhaskar@crab.rutgers.edu

Xin He 2
SUNY at Buffalo. E-mail: xinhe@cs.buffalo.edu

Tao Jiang 3
McMaster University. E-mail: jiang@maccs.mcmaster.ca

Ming Li 4
City University of Hong Kong and University of Waterloo. E-mail:
mli@math.uwaterloo.ca

John Tromp 5
CWI. E-mail: tromp@cwi.nl

Lusheng Wang 6
City University of Hong Kong. E-mail: lwang@cs.cityu.edu.hk

Louxin Zhang 7
National University of Singapore. E-mail: lxzhang@iss.nus.sg

1 Supported in part by a CGAT (Canadian Genome Analysis and Technology) grant.
Work done while the author was at McMaster University and University of Waterloo.
2 Supported in part by NSF grant 9205982 and CGAT.
3 Supported in part by NSERC Operating Grant OGP0046613 and CGAT.
4 Supported in part by the NSERC Operating Grant OGP0046506 and CGAT.
5 Supported in part by CGAT and an NSERC International Fellowship.
6 Supported in part by the Hong Kong Research Council.
7 Supported in part by CGAT.

Contents
1 Introduction 36

2 The Nni and Subtree-transfer Distances 38


2.1 The Case of Unweighted Trees 39
2.2 The Case of Weighted Trees . . . . . . . . 40

3 Computing the Nni Distance 41


3.1 Unweighted trees: Computing nni distance exactly 41
3.2 Unweighted trees: Computing nni distance approximately 49
3.3 Weighted trees: generalizing the nni distance . . . . . . . 52

4 Computing the Subtree-Transfer Distance 53


4.1 The NP-hardness . . . . . . . . . . . . . . 54
4.2 An Approximation Algorithm of Ratio 3 .. 55

5 Linear-Cost Subtree-Transfer Distance on Weighted Phylogenies 57


5.1 An NP-hardness Result ... 57
5.2 An Approximation Algorithm . . . . . . . . . . . . . . . . . . . . .. 59

6 The Rotation Distance 65


6.1 Rotation and its equivalences . . . . . . . . . . . . . . . 65
6.2 Upper and lower bounds for the rotation distance ... . 68
6.3 Approximating the rotation and diagonal flip distances. 69
6.4 Miscellaneous remarks . . . . . . . . . . . . . . . . . . . 72

7 Open Questions 72

References

1 Introduction
Comparing objects to find their similarities or, equivalently, dissimilarities,
is a fundamental issue in many fields including pattern recognition, image
analysis, drug design, the study of thermodynamic costs of computing, cog-
nitive science, etc. Various models have been introduced to measure the
degree of similarity or dissimilarity in the literature. In the latter case the
degree of dissimilarity is also often referred to as the distance. While some
distances are straightforward to compute, e.g. the Hamming distance for bi-
nary strings, the Euclidean distance for geometric objects; some others are
formulated as combinatorial optimization problems and thus pose nontrivial
and challenging algorithmic problems; some distances are even uncomputable,
such as the universal information distance between two objects [4].
Distances based on the notion of economic transformation usually fall
in the latter category. In a nutshell, a transform-based distance model
assumes a set of transformation operations or moves, each associated with a
fixed cost, which can be applied to the objects in the domain studied. The
set of transformation operations should be complete in the sense that any
object can be transformed into any other object by performing a sequence
of such operations. The distance between two objects is then defined as the
minimum cost of any sequence of operations transforming one object into
the other.8 The best known transform-based distances are perhaps the
edit distances for strings [40], labeled trees [43, 48] and graphs [49], using
the operations insertion, deletion, and replacement. The edit distances have
applications in many fields including computational molecular biology and
text processing, and have been studied extensively in both the literature
and practical settings. For example, the UNIX command diff is essentially
based on string edit distance. String edit distance is also a particularly
suitable model for biological molecular sequence comparison because the edit
operations often represent the most common form of evolutionary events.
In this chapter, we survey recent results on some transformation-based
distances for evolutionary trees (also called phylogenies). Such a tree is an
unordered tree with uniquely labeled leaves and unlabeled interior nodes; it
can be unrooted, or rooted if the evolutionary origin is known; it can be
unweighted, or weighted if the evolutionary length on each edge is known; and
it usually has internal nodes of degree 3. Reconstructing the correct evolution-
ary tree for a set of species is one of the fundamental yet difficult problems in
evolutionary genetics. Over the past few decades, many approaches for re-
constructing evolutionary trees have been developed, including (not exhaus-
tively) parsimony [12, 15, 39], compatibility [32], distance [16, 38], and maximum
likelihood [12, 13, 3]. The outcomes of these methods usually depend on the
data and the amount of computational resources applied. As a result, in
practice they often lead to different trees on the same set of species [28].
It is thus of interest to compare evolutionary trees produced by different
methods, or by the same method on different data. Several distance models
for evolutionary trees have been proposed in the literature. Among them,

8 Usually the operations are reversible, so we do not have to specify the direction of a
transformation.

the best known is perhaps the nearest neighbor interchange (nni) distance
introduced independently in [37] and [35]. We will focus on nni and a closely
related distance called the subtree-transfer distance introduced in [19, 20] for
dealing with evolutionary histories involving events like recombinations or
gene conversions. Some variants of these distances will also be discussed.
Since computing each such distance is NP-hard, our main interest is in the
design of efficient approximation algorithms with guaranteed performance
ratios.
The rest of the chapter is organized as follows. We first formally define
the nni and subtree-transfer distances as well as a variant of subtree-transfer
distance, called the linear-cost subtree-transfer distance, in Section 2. It is
also demonstrated that the nni distance coincides with the linear-cost subtree-
transfer distance on unweighted evolutionary trees. Section 3 presents results
concerning the nni distance on both weighted and unweighted evolutionary
trees. In particular, we give some tight upper and lower bounds on the nni
distance, prove that computing the nni distance is NP-hard, which was a
long-standing open problem, and give some logarithmic ratio approximation
algorithms. Section 4 is concerned with the subtree-transfer distance on un-
weighted evolutionary trees. The main results include the NP-hardness of
computing the subtree-transfer distance and an approximation algorithm
with ratio 3. In Section 5, we consider the linear-cost subtree-transfer dis-
tance on weighted evolutionary trees and present a ratio 2 approximation
algorithm. In Section 6, we discuss a variant of the nni distance for rooted,
ordered trees, called the rotation distance, and present a nontrivial approx-
imation algorithm. Some open problems are listed in Section 7.
We assume the reader has the basic knowledge of algorithms and com-
putational complexity (such as NP and P). Consult, e.g., [17] otherwise.
Unless otherwise mentioned, all the trees in this paper are degree-3 trees
with unique labels on leaves. An edge of a tree is external if it is incident on
a leaf, otherwise it is internal.

2 The Nni and Subtree-transfer Distances

In this section, we first define the nni, subtree-transfer, and linear-cost


subtree-transfer distances for unweighted trees. Then we extend the nni
and linear-cost subtree-transfer distances to weighted trees.

2.1 The Case of Unweighted Trees


An nni operation swaps two subtrees that are separated by an internal edge
(u, v), as shown in Figure 1.

Figure 1: The two possible nni operations on an internal edge (u, v):
exchange B <-> C or B <-> D.

The nni operation is said to operate on this

internal edge. The nni distance, Dnni(T1, T2), between two trees T1 and T2 is
defined as the minimum number of nni operations required to transform one
tree into the other. Although the distance has been studied extensively in the
literature [37, 35, 47, 6, 10, 5, 25, 26, 29, 42, 30, 31, 33], the computational
complexity of computing it puzzled the research community for nearly
25 years until recently [7].
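To make the operation concrete, here is a minimal Python sketch (our own
illustration, with hypothetical names, not code from the papers cited above) of
one nni move on an unrooted tree stored as an adjacency map; it exchanges the
subtrees hanging from a neighbor b of u and a neighbor c of v across the
internal edge (u, v):

    def nni(adj, u, v, b, c):
        # One nni across edge (u, v): b is a neighbor of u other than v,
        # c a neighbor of v other than u; detach both and re-attach them
        # on the opposite endpoints, exchanging the subtrees below them.
        assert v in adj[u] and b in adj[u] and c in adj[v]
        adj[u].remove(b); adj[b].remove(u)
        adj[v].remove(c); adj[c].remove(v)
        adj[u].append(c); adj[c].append(u)
        adj[v].append(b); adj[b].append(v)

    # The situation of Figure 1: subtrees A, B at u and C, D at v.
    adj = {"u": ["A", "B", "v"], "v": ["u", "C", "D"],
           "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"]}
    nni(adj, "u", "v", "B", "C")   # the exchange B <-> C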
An nni operation can also be viewed as moving a subtree past a neigh-
boring internal node. A more general operation is to transfer a subtree from
one place to another arbitrary place. Figure 2 shows such a subtree-transfer
operation.

Figure 2: An example of subtree-transfer.

The subtree-transfer distance, Dst(T1, T2), between two trees T1 and T2 is the
minimum number of subtrees we need to move to transform T1 into T2
[19, 20, 22, 8, 7].

It is sometimes appropriate in practice to discriminate among subtree-


transfer operations as they occur with different frequencies. In this case, we
can charge each subtree-transfer operation a cost equal to the distance (the
number of nodes passed) that the subtree has moved in the current tree.
The linear-cost subtree-transfer distance, Dlcst(T1, T2), between two trees
Tl and T2 is then the minimum total cost required to transform Tl into
T2 by subtree-transfer operations [8, 7]. Clearly, both subtree-transfer and
linear-cost subtree-transfer models can also be used as alternative measures
for comparing evolutionary trees generated by different tree reconstruction
methods.
It is easy to demonstrate that the linear-cost subtree-transfer and nni
distances in fact coincide. As mentioned before, an nni move is just a re-
stricted subtree-transfer where a subtree is only moved across a single node.
(In Figure 1, the first exchange can alternatively be seen as moving node v
together with subtree C past node u towards subtree A, or vice-versa.) On
the other hand, a subtree-transfer over a distance d can always be simulated
by a series of d nni moves. Hence the linear-cost subtree-transfer distance
is in fact identical to the nni distance. However, it will soon become clear
that the two models are different on weighted trees.

2.2 The Case of Weighted Trees


An evolutionary tree may also have weights on its edges, where an edge weight
(more popularly known as branch length in genetics) could represent the evo-
lutionary distance along the edge. Many evolutionary tree reconstruction
methods, including the distance and maximum likelihood methods, actually
produce weighted evolutionary trees. Comparison of weighted evolutionary
trees has recently been studied in [28]. The distance measure adopted is
based on the difference in the partitions of the leaves induced by the edges
in both trees, and has the drawback of being somewhat insensitive to the
tree topologies [14]. Both the linear-cost subtree-transfer and nni mod-
els can be naturally extended to weighted trees. The extension for nni is
straightforward: An nni is simply charged a cost equal to the weight of the
edge it operates on. In the case of linear-cost subtree-transfer, although the
idea is immediate, i.e. a moving subtree should be charged for the weighted
distance it travels, the formal definition needs some care and is given below.
Consider (unrooted) trees in which each edge e has a weight w(e) ≥ 0. To
ensure feasibility of transforming a tree into another, we require the total
weight of all edges to equal one. A subtree-transfer is defined as follows.

Select a subtree S of T at a given node u and select an edge e ∉ S. Split the
edge e into two edges e1 and e2 with weights w(e1) and w(e2) (w(e1), w(e2) ≥
0, w(e1) + w(e2) = w(e)), and move S to the common end-point of e1 and
e2. Finally, merge the two remaining edges e' and e'' adjacent to u into one
edge with weight w(e') + w(e''). The cost of this subtree-transfer is the total
weight of all the edges over which S is moved. Figure 3 gives an example.
The edge-weights of the given tree are normalized so that their total sum is
1. The subtree S is transferred to split the edge e4 into e6 and e7 such that
w(e6), w(e7) ≥ 0 and w(e6) + w(e7) = w(e4); finally, the two edges e1 and e2
are merged to e5 such that w(e5) = w(e1) + w(e2). The cost of transferring
S is w(e2) + w(e3) + w(e6).

Figure 3: Subtree-transfer on weighted phylogenies. Tree (b) is obtained
from tree (a) with one subtree-transfer.
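To fix the cost accounting, here is a tiny sketch (our own, with purely
illustrative numbers, not values from the chapter) of the charge for the
transfer in Figure 3:

    def transfer_cost(traversed_weights, split_part):
        # traversed_weights: weights of edges fully crossed by S;
        # split_part: the portion w(e6) of the split edge crossed last.
        return sum(traversed_weights) + split_part

    # Figure 3's example with made-up weights: S crosses e2 and e3 fully
    # and the part w(e6) of the split edge e4, so the cost is
    # w(e2) + w(e3) + w(e6).
    print(transfer_cost([0.10, 0.20], 0.05))   # -> 0.35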

Note that for weighted trees, the linear-cost subtree-transfer model is


more general than the nni model in the sense that we can slide a subtree
along an edge with subtree-transfers. Such an operation is not realizable
with nni moves. Intuitively both these measures, especially the nni distance,
are more sensitive to the tree topologies than the one in [28].

3 Computing the Nni Distance


In this section, we discuss the complexity of computing the nni distance
between labeled or unlabeled trees, either exactly or approximately. We first
discuss the case of unweighted trees, and then consider the more general case
of weighted trees.

3.1 Unweighted trees: Computing nni distance exactly


The nearest neighbor interchange (nni) distance was introduced indepen-
dently in [37] and [35]. The complexity of computing the nni distance has
been open for 25 years (since [37]). The problem is surprisingly subtle, given
the history of many erroneous results, disproved conjectures, and a faulty
NP-completeness proof [47, 5, 25, 26, 29, 30, 33].9
K. Culik II and D. Wood [6] (improved later by [33]) proved that n log n +
O(n) nni moves are sufficient to transform a tree of n leaves into any other
tree with the same set of leaves. D. Sleator, R. Tarjan, and W. Thurston [42]
proved an Ω(n log n) lower bound for most pairs of trees. A restricted version
of the nni operation, known as the tree rotation operation (discussed in
Section 6), was considered in [41], and a trivial approximation algorithm
with an approximation ratio of 2 was given. But given an individual pair
of trees, computing the nni distance between them (either for labeled or
unlabeled trees) had been a long-standing open question until recently, when
this problem was settled (for both labeled and unlabeled trees) in [7, 9].

Theorem 1 Computing the nni distance (between two labeled or unlabeled
trees) is NP-complete.

We provide a rough sketch of the proof of Theorem 1 for labeled trees
(which is the more difficult case). The proof is by a reduction from Exact
Cover by 3-Sets (X3C), which is known to be NP-complete [17], to our
problem. Recall that, given an instance S = {s1, ..., sm}, where m = 3q,
and C1, ..., Cn, where Ci = {si1, si2, si3}, the X3C problem is to find disjoint
sets Ci1, ..., Ciq such that the union of Ci1, ..., Ciq equals S. We will construct
two trees T1 and T2 with unique leaf labels, such that transforming T1 into T2
requires at most N (to be specified later) nni moves iff an exact cover of S exists.
Here is an outline of our reduction. We can perform sorting with nni
moves and thus view nni as a special sorting problem. A sequence x1 ... xk
can be represented as a linear tree as in Figure 4. For convenience, such
a linear tree will be simply called a sequence of length k. Sorting such a
sequence means transforming it by nni operations into a linear tree whose
leaves are in ascending order.


Figure 4: A linear tree with k leaves.

9 In [29], the author reduced the Partition problem to nni by constructing a tree of i
nodes for a number i, in an attempt to prove the NP-hardness of computing the nni
distance between unlabeled trees.

To construct the first tree T1, for each si ∈ S, we create a sequence Si
of leaves that takes a "large" number of nni moves to sort. We will make
sure that Si and Sj are "very different" permutations for each pair i ≠ j, in
the sense that we cannot hope to have the sequence Si sorted for free while
sorting the sequence Sj by nni moves, and vice versa. Then for each set
Ci = {si1, si2, si3}, we create three sequences with the same permutations
as the sequences Si1, Si2, Si3, respectively, but with distinct labels. Such n
groups of sequences for C1, ..., Cn, each consisting of three sequences, will
be placed "far away" from each other and from the m sequences S1, ..., Sm
in tree T1. Tree T2 has the same structure as T1 except that all sequences
are sorted.
Here is the connection between exactly covering S and transforming T1
into T2 by nni moves. To transform T1 into T2, all we need is to sort the
sequences defined above. If there is an exact cover Ci1, ..., Ciq of S, we can
partition the m sequences S1, ..., Sm into m/3 = q groups, according to the
cover. For each Cj (j = i1, ..., iq) in the cover, we send the corresponding
group of sequences Sj1, Sj2, Sj3 to their counterparts, merge the three pairs of
sequences with identical permutations, sort the three permutations, and then
split the pairs and transport the three sorted versions of Sj1, Sj2, Sj3 back
to their original locations in the tree. Thus, instead of sorting six sequences
separately, we do three merges, three sortings, three splits, and a round-trip
transportation of three sequences. Our construction will guarantee that the
latter is significantly cheaper. If there is no exact cover of S, then either
some sequence Si will be sorted separately or we will have to send at least
q + 1 groups of sequences back and forth. The construction guarantees that
both cases will cost significantly more than the previous case.
We now give more details. Apparently many difficult questions have to
be answered: How can we find these m sequences S1, ..., Sm that are hard
to sort by nni moves? How do we make sure that sorting one such sequence
will never help to sort others? How can we ensure that it is most beneficial
to bring the sequences Sj1, Sj2, Sj3 to their counterparts defined for Cj to
get sorted, and not the other way?
We begin with the construction of the sequences S1, ..., Sm. Recall that
each such sequence is actually a linear tree, as in Figure 4. Intuitively, it
would be a good idea to take a long and difficult-to-sort sequence and break
it into m pieces of equal length. But this simple idea does not work, for
two reasons. First, such a sequence probably cannot be found in polynomial
time. Second, even if we find such a sequence, because the upper bound
in [6, 33] and the lower bound in [42] (see [33]) do not match, these pieces may
still help each other in sorting, possibly by merging, sorting together, and
then splitting. The following lemma states that there exist two sequences
of constant size that are hard to sort and do not help each other in sorting.
We will build our m sequences using these two sequences.
Lemma 2 For any positive constant ε > 0, there exist infinitely many k
for which there is a constant c and two sequences x and y of length k such
that (i) each of them takes at least (c − ε)k log k nni moves to sort, (ii) each
of them takes at most ck log k nni moves to sort, and (iii) it takes at least
(1 − ε)c(2k) log(2k) nni moves to sort both of them together, i.e. the sequence
xy.
Proof. Note that for any c, k, x, y, statements (ii) and (iii) imply statement
(i). So it suffices to prove the existence of a constant c and an infinite
number of k's that satisfy conditions (ii) and (iii).
From the results in [6, 33, 42], we know that for each k, there exists a
sequence of k leaves such that sorting the sequence takes at most k log k + O(k)
nni moves and at least (1/4)k log k − O(k) nni moves. Let us define ck, for any
k, as the maximum number of nni steps to sort any sequence of length k,
divided by k log k. Since 1/4 − o(1) ≤ ck ≤ 1 + o(1), there must be infinitely
many k satisfying c2k ≥ (1 − ε)ck. Taking x and y to be the two halves of a
hardest sequence of length 2k, for large enough such k, and taking c = ck,
one can see that conditions (ii) and (iii) are satisfied. □

Let ε = 1/2, let k be a sufficiently large integer satisfying Lemma 2, and let
c, x, y be the corresponding constant and sequences. Next we use x and y, each
of length k, to construct m long sequences S1, ..., Sm. Choose m distinct
binary sequences in {0, 1}^⌈log m⌉. Replace each letter 0 with the sequence
x^{m^3} and each letter 1 with the sequence y^{m^3}. Give each occurrence of x and
y unique labels. Insert in front of every x and y block a delimiter sequence of
length k^2 with unique labels. This results in sequences S1, ..., Sm, all with
distinct labels. We can show that these sequences have the desired properties
concerning sorting. The m sequences will have specific orientations in the
tree; let us refer to one end as the head and the other end as the tail.
We are now ready to do the reduction. From the sets S = {s1, ..., sm}
and C1, C2, ..., Cn, we construct the two trees T1 and T2 as follows. For
each element si, T1 has a sequence Si as defined above. For each set
Ci = {si1, si2, si3}, we create three sequences Si,i1, Si,i2, Si,i3, with the same
permutations as Si1, Si2, Si3, respectively, but with different and unique
labels (we are not allowed to repeat labels).

Figure 5 outlines the structure of tree T1. Here a thick solid line represents
a sequence Si or Si,j with the circled end as its head; a dotted line
represents a toll sequence of m^2 uniquely labeled leaves; a small black
rectangle represents a one-way circuit as illustrated in Figure 6(i). The heads
of the m sequences at the left of Figure 5 are connected by two full binary trees,
connected root-to-root, of depth log m + log n to the n toll sequences, each
leading to the entrance of a one-way circuit. The exit of each such one-way
circuit is connected to the entrances of three one-way circuits leading finally
to the three sequences corresponding to some set Ci.

Figure 5: Structure of tree T1.

Figure 6: One-way circuit.

A one-way circuit is designed for the purpose of giving free rides to
subtrees moving first from 'a' to 'b' and then later from 'b' to 'a', while
imposing a large extra cost for subtrees first moving from 'b' to 'a' and
then from 'a' to 'b'. We will choose r so large (i.e. r = m^4) that it is not
worthwhile to move any sequence Si,j, corresponding to some Ci, to the left
through the one-way circuits to sort and then move it back to its original
location in T1. This can be seen as follows. The counterpart of the one-way
circuit in T2 is as shown in Figure 6(ii).
In any optimal transformation of circuit (i) to (ii), the u's are paired
up with the z's first, and then the v's are paired with the u-z pairs. This
requires u_r and v_1 to move up and out of the way. The pairing of the u's
essentially provides a shortcut for u_r to reach z_r in half as many steps, and
similarly for v_1.


In the following, sorting a sequence Si or Si,j means having each of its
x/y blocks sorted and then the whole sequence flipped. The tree T2 has the
same structure as T1 except that

• all sequences Si and Si,j are sorted;

• each circuit in Figure 6(i) is changed to (ii).

Let M be the cost of sorting a sequence Si,j optimally (M can be
computed easily). The following lemma completes the reduction and thus
the proof of Theorem 1.

Lemma 3 The set S has no exact cover iff Dnni(T1, T2) ≥ N + m^2/2, where
N = q(log m + log n) + qm^2 + 28nm^4 − 28n + O(q) + 3nM + (k^2 + 6k)m^3 log m +
O(1).

We provide an informal sketch of the proof of Lemma 3; the reader is
referred to [7, 9] for more formal proofs. Assume that we have an exact cover
for S. First, we show that the one-way circuit in Figure 6 behaves as was
claimed. This can be seen as follows. The counterpart of the one-way circuit
in T2 is as shown in Figure 6(ii). Consider any optimal transformation of
circuit (i) to (ii). A precise breakdown of the cost is as follows: (r − 3)/2
steps to move u_r up, then (r − 1)/2 times 6 steps to move each u pair down
between the proper z's and pair them up, and one final step to pair u_r.
The exact same number of steps is needed for the symmetric pairing of the v's.
Hence, assuming r is odd, in total we need 2((r − 3)/2 + 6(r − 1)/2 + 1) = 7r − 7 nni
moves. Note that a subtree situated at 'a' can initially pair up with u_r in
2 steps and move together with it, spending 3 more steps to pop off just
before u_r pairs with z_r, to end up at 'b'. It can later spend another 5 steps
to move together with v_1, ending up back at 'a'. Going first from 'b' to 'a'
and then back to 'b' could only be done 'for free' by pairing with v_1 first
and with u_r later, since these are the only leaves to move away from 'b' and
'a', respectively, in an optimal transformation. But for v_1 to reach 'a' with
minimum cost requires collapsing all the v's, which imposes an extra cost on
pairing u's with z's later. The least penalty for moving from 'b' to 'a' and back
to 'b' is thus for v_1 not to take the shortcut, which costs an extra r/2 steps.
In order to transform T1 into T2, we need to sort the sequences Si and Si,j
and convert each one-way
circuit to the structure shown in Figure 6(ii). If the set S has an exact cover
Ci1, ..., Ciq, we can do the transformation efficiently as follows. For each Cj,
j = i1, ..., iq, in the cover, we send the three sequences Sj1, Sj2, Sj3 to their
counterparts Sj,j1, Sj,j2, Sj,j3, merge each pair and sort them together, then
move the sorted Sj1, Sj2, Sj3 sequences back. During this process we also
get each one-way circuit involved into the correct shape. We then sort the
other sequences Si,j and get their leading one-way circuits into the correct
shape.
The total cost N for this process is calculated as follows. Recall that we
send precisely q groups of sequences to the right.

1. The overhead for these q groups to cross the tree connection network:
q(log m + log n) + O(1) nni moves.

2. The cost of crossing the q toll sequences of length m^2 before the first
batch of one-way circuits: qm^2 nni moves.

3. Converting each one-way circuit to the structure in Figure 6(ii) costs
7r − 7 nni's. Since we select r = m^4 and there are in all 4n one-way
circuits, the total cost is 28nm^4 − 28n.

4. Moving a group of sequences across a one-way circuit and back costs
O(1) extra nni moves, for each of the q groups. The total cost is
therefore O(q).

5. Let M be the cost of sorting a sequence Si,j optimally. M can be
computed easily, given optimal ways to sort an x block and a y block.
Observe that the length-k^2 delimiter sequences inserted in front of each x/y
block prevent the folding of any sequence Si,j in an optimal sorting
procedure, i.e., it will not be beneficial for two blocks on the same
sequence to be merged and sorted together, because it costs at most
ck log k nni moves to sort a block and k^2 nni moves to bring a block
across a delimiter sequence. Similarly, shrinking any sequence Si,j
does not help either. So in total we need 3nM nni moves to sort the
3n sequences defined for C1, ..., Cn.

6. The extra cost of merging each sequence Si with its counterpart Sj,i
while sorting the latter, and splitting it out when the sorting is done.
The process is as follows. We sort Sj,i block by block from head to
tail. Before processing each block, we first merge this block with the
corresponding block of Si. After sorting this pair of blocks, we split out
the sorted block of Si and move down to the next block of Si, passing
a delimiter path of length k^2. So the extra cost of sorting Sj,i
is (k^2 + 6k)m^3 log m. Observe that the above process automatically
reverses Sj,i and Si.
Conversely, suppose that S has no exact cover. Then to transform T1
into T2, either we have to send q + 1 groups to the right crossing the one-way
circuits, or some sequence Si is sorted separately from the Sj,i's, or some
sequence Si is sorted together with a "wrong" sequence Sj,h, where h ≠ i.
In the first case, the cost will be increased by m^2 nni moves, which is the
cost of moving an extra group past a delimiter sequence of length m^2. In the
last case, at least one segment of m^3 x's is sorted together with a segment
of y's. By Lemma 2 and the choice ε = 0.5, this is not much better than
sorting the two segments separately and costs at least 0.5cm^3 k log k − m^3 k
more nni moves than sorting one such segment, which is larger than m^2
for sufficiently large k and m. The second case introduces an extra cost of
(0.5cm^3 k log k log m) − m^3 k log m − m^2 by Lemma 2, which is again larger
than m^2 for sufficiently large k and m.
Notice that in the above definition of N, the bounds in items 2, 3, 5, 6 are
all optimal. The bounds in items 1 and 4 are worst-case overheads and
may not be optimal. But these two items only account for O(m(log m +
log n)) nni moves, which is not sufficient to compensate for the extra cost
m^2 given above. This completes the sketch of the proof of Lemma 3.
In practice, however, the trees to be compared usually have small nni
distances between them, and it is of interest to devise efficient algorithms
for computing the optimal nni sequence when the nni distance is small, say
d. An n^{O(d)} algorithm for this problem is trivial. With careful inspection,
one can derive an algorithm that runs in O(n^{O(1)} · d^{O(d^2)}) time. It turns out
that by using the results in [42, 33], we can improve this asymptotically to
O(n^2 log n + n · 2^{23d/2}) time. To be precise, the following result was proved
in [7, 8].

Theorem 4 Suppose that the nni distance between T1 and T2 is at most d.
Then an optimal sequence of nni operations transforming T1 into T2 can be
computed in O(n^2 log n + n · 2^{23d/2}) time.

A sketch of the proof of Theorem 4 is as follows. Let T1 and T2 be the two
trees being compared. An edge e1 ∈ T1 is good if there is another edge
e2 ∈ T2 such that e1 and e2 partition the leaf labels of T1 and T2 identically;
e1 is bad otherwise. It is easy to see that T1 contains at least 1 and at
most d bad edges. Moreover, assume that these bad edges form t connected
components B1, ..., Bt (1 ≤ t ≤ d). As observed in [33], in an optimal nni
transformation, sometimes one or more nni operations are needed across a
good internal edge of T1. Consider the set of at most d − 1 good edges in
T1 across which at least one nni operation is performed in an optimal nni
sequence. This set of good edges forms at most d − 1 connected components
in T1. Consider any one such connected component S. Since good edges in
T1 and T2 partition the trees in a similar manner, it is easy to see that
there must be at least one connected component Bi sharing a vertex with
S.
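The good/bad edge test is easy to make concrete. Below is a small
self-contained Python sketch (our own illustration; a naive scan rather than
the O(n^2 log n) procedure mentioned later), where each tree is an adjacency
map whose degree-1 nodes are the commonly labeled leaves:

    def side_leaves(adj, u, v):
        # Leaf labels reachable from u without crossing the edge (u, v).
        stack, seen, out = [u], {u, v}, set()
        while stack:
            x = stack.pop()
            if len(adj[x]) == 1:          # degree-1 node = labeled leaf
                out.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return frozenset(out)

    def canonical_cut(adj, u, v, leaves):
        # Name a bipartition by its lexicographically smaller side.
        s = side_leaves(adj, u, v)
        return min(s, leaves - s, key=sorted)

    def good_edges(adj1, adj2):
        # Edges of T1 whose leaf bipartition also occurs in T2.
        leaves = frozenset(x for x in adj1 if len(adj1[x]) == 1)
        cuts2 = {canonical_cut(adj2, u, v, leaves)
                 for u in adj2 for v in adj2[u]}
        return {frozenset((u, v)) for u in adj1 for v in adj1[u]
                if canonical_cut(adj1, u, v, leaves) in cuts2}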
Using these observations, one can devise the algorithm NNI-d shown below.

    For every choice of integers k1, ..., kt ≥ 1 with k1 + ... + kt ≤ d do
        For every choice of subtrees A1, ..., At of T1 such that
        Ai has at most ki edges and contains the component Bi do
            Examine all sequences of nni transformations across edges
            of all Ai's such that no more than ki nni operations
            are performed across the edges of Ai
    Among all sequences examined, select the one of shortest length that
    transforms T1 into T2

    Algorithm NNI-d for the case when the nni distance is bounded

Figure 7 illustrates how the algorithm works. Figure 7(a) shows two bad
edges α, β in T1 (shown by thick lines) forming two connected components
(t = 2). In Figure 7(b) we show one choice of two subtrees containing k1
and k2 edges and including the edges α and β, respectively. For each subtree,
algorithm NNI-d computes all possible nni sequences such that at most 3 nni
operations are performed across the edges of each subtree.

Figure 7: Illustration of how Algorithm NNI-d works (d = 6, k1 = k2 = 3, t = 2).
How fast does the algorithm run? There are at most (d+t−1 choose t−1) < 2^{5d/2}
choices for the integers k1, ..., kt (using the fact that (n choose j) ≤ (2.8n/j)^j). Note
that any subtree of k edges including a fixed edge can be represented by a
rooted binary tree on k + 2 nodes (the root corresponding to the middle of
the fixed edge); hence there are at most C_{k+2} = (1/(k+3))(2k+4 choose k+2) ≤ 2^{2k} such trees.
It follows that the total number of choices for the subtrees A1, ..., At (for
any particular value of k1, ..., kt) is at most 2^{Σ_{i=1}^{t}(2k_i+1)} ≤ 2^{3d}. For each
tree Ai, the number of sequences of ki nni operations to consider is at most
3^{k_i−1} 2^{4k_i} < 2^{6k_i} by Lemma 1 of [33]. Combining everything, the number
of trees we have to examine is at most 2^{5d/2} · 2^{3d} · 2^{6d} < 2^{23d/2}. The set of
all good edges of T1 can be found in O(n^2 log n) time, and this time bound
is also sufficient to find the connected components of good edges. Using
the adjacency-list representation of trees, updating a tree during a single
nni operation can easily be done in O(1) time, and whether two trees are
isomorphic can easily be checked in O(n) time. Hence, this algorithm finds
an optimal nni sequence in O(n^2 log n + n · 2^{23d/2}) time.


For every choice of integers k_1, …, k_t ≥ 1 with Σ_{i=1}^{t} k_i ≤ d do
    For every choice of subtrees A_1, …, A_t of T_1 such that
    A_i has at most k_i edges and contains the component B_i do
        Examine all sequences of nni transformations across edges
        of all A_i's such that no more than k_i nni operations
        are performed across the edges of A_i
Among all sequences examined, select the one of shortest length that
transforms T_1 into T_2

Algorithm NNI-d for the case when the nni distance is bounded

Figure 7: Illustration of how Algorithm NNI-d works (d = 6, k_1 = k_2 = 3, t = 2).

3.2 Unweighted trees: Computing nni distance approximately

Since computing the nni distance is NP-hard, the next obvious question is: can we get a good approximation of the distance? The following result appeared in [33].
Theorem 5 The nni distance can be approximated in polynomial time within a factor of log n + O(1).

Proof. Given two trees T_0, T_1, we first identify the bad edges in T_0 with respect to T_1. These edges induce a subgraph of T_0 consisting of one or more components, each of which is a subtree of T_0. Each bad-edge component links up the same set of neighboring shared-edge components in T_0 and T_1, but it does so in different ways.

The algorithm transforms T_0 into T_1 by transforming each non-shared edge component separately. Consider a component consisting of k non-shared edges in T_0. It links up k + 3 shared-edge components, which we can consider as leaves for the purpose of linking them up differently. So we want to transform G_0 into G_1, where G_i is the (k + 3)-tree corresponding to the component in T_i. By the 'compression' method of [6], the distance between G_0 and G_1 is at most 4(k+3) log(k+3) + (4 − log 3)(k+3) − 12.
On the other hand, it is clear that any transformation from T_0 into T_1 must use at least one nni operation on every non-shared edge. The approximation factor of this algorithm is therefore at most

    Σ [4(k+3) log(k+3) + (4 − log 3)(k+3) − 12] / Σ k  <  (4n log n + O(n)) / (n − 3),

since Σ k is at most the number of internal edges, which is n − 3. □

As is apparent from the previous two sections, the question of the com-
putability of the nni distance measure, which we will denote by d, has gen-
erated a lot of interest. Of course, a brute force method can be employed
which searches all (or a significant fraction of) trees in exponential time and
space ([33] implemented a C program that uses O(n) space to find the dis-
tance of any tree to a given one using a brute-force approach and could run it
for trees up to size 11). In an attempt to improve efficiency, Waterman and
Smith in [47] propose another distance measure, "closest partition" which
they conjecture is actually equal to d. The closest partition distance c(T, S)
for trees sharing a partition is defined recursively as the sum of the two
distances between the corresponding smaller parts resulting from splitting
each tree into two. For trees T, S not sharing a partition it is defined as
k + c(R, S), where k is the minimum number of nni operations required
to transform a tree T into a tree R that shares a partition with tree S.
Note that the nondeterminism in choosing R makes this measure somewhat ill-defined. They base their conjecture on what [10] aptly calls a decomposability property (DP) of nni. Informally, DP says that if two trees can each be split at some internal edge into identical subsets of leaves, then an optimal transformation of one into the other can be found in which no nni operation affects that internal edge. This claim appears in [47] as Theorem 4. Its proof, however, appeals to their Theorem 3, which was shown invalid in [26] with a 6-node counterexample. Consequently, [26] concludes that the status of Theorem 4 is unresolved, and observes that Theorem 5 of [47] is a single-step version of Waterman and Smith's conjecture that c = d. This conjecture was shown to fail in [26] and [5] in a weak sense (for some choices that c allows), and shortly thereafter in [25] in a strong sense
(for all choices in defining c). These papers also point out that the computation of c appears to require exponential time as well, since there is no obvious bound on k in the definition of c. The work in [30] shows a logarithmic gap between the measures c and d. Their example is a pair of trees, each on n = 2^k nodes equidistant from the central internal edge. In one tree, the leaves can
be drawn in normal order, while in the other, the leaves can be drawn in bit-reverse order (e.g., 0,4,2,6,1,5,3,7 for k = 3). For this pair of trees one can show d = Θ(n), whereas c = Θ(n log n) (in the weak sense at least). Finally, the following result was proved in [33], serving as a counterexample to all three theorems 3, 4, and 5 of [47].

Lemma 6 There are trees T_0, T_1 sharing a partition which is not shared by any intermediate tree on a shortest path from T_0 to T_1.

3.3 Weighted trees: generalizing the nni distance

In this section we discuss how to generalize the nni distance to the case when both T_1 and T_2 are weighted. The cost of an nni operation is now the weight of the edge across which two subtrees are swapped. As mentioned before, many phylogeny reconstruction methods produce weighted phylogenies; hence the weighted nni distance problem is also very important in computational molecular biology. The NP-completeness of the (unweighted) nni distance problem (Section 3.1) implies the NP-completeness of the weighted nni distance problem as well.
The authors in [7, 9] present a polynomial time algorithm with logarithmic approximation ratio for computing the nni distance on weighted phylogenies, generalizing the logarithmic-ratio approximation algorithm in [33] (discussed in Section 3.2). The approximation for the weighted case is considerably more complicated. Note that nni operations can be performed only across internal edges. For feasibility of a weighted nni transformation between two given weighted trees T_1 and T_2, we require in this section that the following conditions are satisfied: (1) for each leaf label a, the weight of the edge in T_1 incident on a is the same as the weight of the edge in T_2 incident on a; (2) the multisets of weights of internal edges of T_1 and T_2 are the same.
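
These two feasibility conditions are easy to test. Here is a minimal Python sketch, under an assumed (hypothetical) encoding in which each tree is given as a pair: a dict mapping each leaf label to the weight of its incident external edge, and a list of the internal edge weights.

    from collections import Counter

    def weighted_nni_feasible(t1, t2):
        # t1, t2 = (ext, internal), as described in the lead-in above.
        ext1, int1 = t1
        ext2, int2 = t2
        if ext1 != ext2:            # condition (1): leaf-by-leaf equality
            return False
        return Counter(int1) == Counter(int2)   # condition (2): multisets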

Theorem 7 Let T_1 and T_2 be two weighted phylogenies, each with n leaves. Then, D_nni(T_1, T_2) can be approximated to within a factor of 6 + 6 log n in O(n^2 log n) time.

Note that the approximation ratio does not depend on the weights. Intuitively, the idea of the algorithm is as follows. We first identify "bad" components in the tree that need a lot of nni moves in the transformation process. Then, for each bad component, we put things in the correct order by first converting them into balanced shapes. Notice, however, that we cannot afford to perform nni operations many times on heavy edges. Furthermore, not only the leaf nodes need to be moved to the right places, so do the weighted edges. The main difficulty of the algorithm is the careful coordination of the transformations so that at most O(log n) nni operations are performed on each heavy edge.

4 Computing the Subtree-Transfer Distance

Figure 8: The operations.

In this section, we show that computing the subtree-transfer distance between two evolutionary trees is NP-hard, and give an approximation algorithm with performance ratio 3. Before we prove the results, it is again convenient to reformulate the problem. Let T_1 and T_2 be two evolutionary trees on a set S. An agreement forest of T_1 and T_2 is any forest which can be obtained from both T_1 and T_2 by cutting k edges (in each tree), for some k, and applying forced contractions in each resulting component tree. Define the size of a forest as the number of components it contains. Then the maximum agreement forest (MAF) problem is to find an agreement forest of the smallest size. The following lemma shows that MAF is equivalent to computing the subtree-transfer distance.
Lemma 8 The size of a MAF of T_1 and T_2 is one more than their subtree-transfer distance.

The lemma can be proven by a simple induction on the number of leaves. Intuitively, the lemma says that the transfer operations can be broken down into two stages: first we cut off the subtrees to be transferred from the rest of T_1 (not worrying where to put them), and then we assemble them appropriately to obtain T_2. This separation will simplify the proofs.

Figure 9: (a) The tree T_1. (b) The subtree A_i.

4.1 The NP-hardness

Theorem 9 It is NP-hard to compute the subtree-transfer distance between two binary trees.

Proof. (Sketch) The reduction is from Exact Cover by 3-Sets. Let S = {s_1, s_2, …, s_m} be a set and C_1, …, C_n be an instance of this problem. Assume m = 3q.

The tree T_1 is formed by inserting n subtrees A_1, …, A_n into a chain containing 2n + 2m leaves x_1, …, x_{2n}, y_1, …, y_{2m} uniformly. (See Figure 9(a).) Each A_i corresponds to C_i = {c_{i,1}, c_{i,2}, c_{i,3}}, and has 9 leaves as shown in Figure 9(b). Suppose that c_{j,j'}, c_{k,k'} and c_{l,l'} are the three occurrences of an s_i ∈ S in C. Then in T_2, we have a subtree B_i as shown in Figure 10(a). For each C_i, we also have a subtree D_i in T_2, as shown in Figure 10(b). The subtrees are arranged as a linear chain, as shown in Figure 10(c).

Note that each adjacent pair of subtrees A_i and A_{i+1} in T_1 is separated by a chain of length 2 which also appears in T_2. Thus, to form a
Figure 10: (a) The subtree B_i. (b) The subtree D_i. (c) The tree T_2.

MAF of T_1 and T_2, our best strategy is clearly to cut off A_1, A_2, …, A_n in T_1 and similarly cut off B_1, B_2, …, B_m in T_2. This then forces us to cut off D_1, D_2, …, D_n in T_2. Now in each A_i, we can either cut off the leaves u_{i,1}, v_{i,1}, u_{i,2}, v_{i,2}, u_{i,3}, v_{i,3} to form a subtree containing the three leaves a_{i,1}, a_{i,2}, a_{i,3} (yielding 6 + 1 = 7 components in total), or we can cut off a_{i,1}, a_{i,2}, and a_{i,3}. In the second case, we will be forced to also cut the links between the three subtrees containing the leaves {u_{i,1}, v_{i,1}}, {u_{i,2}, v_{i,2}} and {u_{i,3}, v_{i,3}}, respectively, as the B_i's are already separated. Hence in this case the best we can hope for is 3 + 3 = 6 components (if we can keep all three 2-leaf subtrees in the agreement forest).

It can be shown that C has an exact cover of S if and only if T_1 and T_2 have an agreement forest of size 1 + 6q + 7(n − q) = 7n − q + 1. □

4.2 An Approximation Algorithm of Ratio 3

Our basic idea is to deal with one pair of sibling leaves a, b of the first tree T_1 at a time. If a and b are also siblings in the second tree T_2, we replace this pair with a new leaf labeled (a, b) in both trees. Otherwise, we will cut T_2 until a and b become siblings or separated. Eventually both trees will be cut into the same forest. Five cases need to be considered. Figure 11 illustrates the first four cases; the last case (Case (v)) is that a and b are also siblings in T_2.

Figure 11: The first four cases of a and b in T_2.


The approximation algorithm is given in Figure 12. The variable N records the number of components (i.e., the number of cuts plus 1).

Theorem 10 The approximation ratio of the algorithm in Figure 12 is 3, i.e., it always produces an agreement forest of size at most three times the size of a MAF for T_1 and T_2.

The NP-hardness proof can easily be strengthened to establish MAX SNP-hardness. Thus, unless P = NP, there is no polynomial-time approximation scheme for this problem. Moreover, the small-distance exact algorithm described in Section 3 for nni also works here.
Input: T_1 and T_2.

0. N := 1;
1. For a pair of sibling leaves a, b in T_1,
   consider how they appear in T_2 and cut the trees:
   Case (i): Cut off the middle subtree A in T_2; N := N + 1;
   Case (ii): Cut off a and b in both T_1 and T_2; N := N + 2;
   Case (iii): Cut off a and b in both T_1 and T_2; N := N + 2;
   Case (iv): Cut off b in T_1;
   Case (v): Replace this pair with a new leaf labeled (a, b)
             in both T_1 and T_2;
2. If some component in the forest for T_1 has size larger than 1,
   repeat Step 1.

Output: The forest and N.

Figure 12: The approximation algorithm of ratio 3.
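
The driver of this algorithm — repeatedly taking a sibling pair of T_1 and dispatching on how it appears in T_2 — is easy to picture in code. The following Python sketch implements only the easy Case (v), merging a pair that are siblings in both trees; trees are in a hypothetical nested-tuple encoding with string leaves, and the cutting cases (i)–(iv) depend on details of the decomposition and are left out.

    def find_sibling_pair(t):
        # Return a pair of sibling leaves in a binary tree of nested tuples.
        if isinstance(t, tuple):
            l, r = t
            if isinstance(l, str) and isinstance(r, str):
                return (l, r)
            return find_sibling_pair(l if isinstance(l, tuple) else r)
        return None

    def siblings_in(t, a, b):
        # True if leaves a and b are siblings somewhere in tree t.
        if isinstance(t, tuple):
            l, r = t
            if {l, r} == {a, b}:
                return True
            return siblings_in(l, a, b) or siblings_in(r, a, b)
        return False

    def merge_pair(t, a, b):
        # Replace the sibling pair (a, b) by the single leaf "(a,b)".
        if isinstance(t, tuple):
            l, r = t
            if {l, r} == {a, b}:
                return f"({a},{b})"
            return (merge_pair(l, a, b), merge_pair(r, a, b))
        return t

For example, with t1 = (("a", "b"), "c") and t2 = ("c", ("b", "a")), find_sibling_pair(t1) returns ("a", "b"), siblings_in(t2, "a", "b") is True, and merge_pair contracts the pair in both trees.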

5 Linear-Cost Subtree-Transfer Distance on Weighted Phylogenies

In this section we investigate the linear-cost subtree-transfer model on weighted phylogenies.

5.1 An NP-hardness Result

It is open whether the linear-cost subtree-transfer problem is NP-hard for weighted phylogenies. However, we can show that the problem is NP-hard for weighted trees with non-uniquely labeled leaves.

Theorem 11 Let T_1 and T_2 be two weighted trees with (not necessarily uniquely) labeled leaves. Then, computing D_st(T_1, T_2) is NP-hard.

Proof. Our proof is by a reduction from the following Exact Cover by 3-Sets (X3C) problem.

INSTANCE: S = {s_1, …, s_m}, where m = 3q, and C_1, …, C_n, where C_i = {s_{i_1}, s_{i_2}, s_{i_3}} ⊆ S.

QUESTION: Are there q disjoint sets C_{i_1}, …, C_{i_q} such that ∪_{j=1}^{q} C_{i_j} = S?
X3C is known to be NP-complete [17]. Given an instance of the X3C problem, we will construct two trees T_1 and T_2 with leaf labels (not necessarily unique), as shown in Figure 13, such that transforming T_1 into T_2 requires subtree-transfers of total cost exactly 1 iff an exact cover of S exists.

Figure 13: Trees T_1 and T_2 used in the proof of Theorem 11. The leaf labels are shown beside the corresponding leaves, and the notations for some of the internal edges are shown beside the corresponding edges. The edge weights are as follows: w(e_{α_1}) = w(e_{α_2}) = ⋯ = w(e_{α_n}) = w(e_{β_1}) = w(e_{β_2}) = ⋯ = w(e_{β_{n−q}}) = k, w(e_{γ_1}) = w(e_{γ_2}) = ⋯ = w(e_{γ_m}) = k/3, and all other edges have zero weights.

T_1 has n long arms, α_1, …, α_n. T_2 has n − q long arms, β_1, …, β_{n−q}, and m short arms, γ_1, …, γ_m. Each long (resp. short) arm consists of an edge of weight k (resp. k/3), with three leaves (resp. one leaf) labeled by the same label x (x ∉ S) connected to it, as shown in Figure 13. For notational convenience, let e_{α_i} (resp. e_{β_i}, e_{γ_i}) denote the edge of non-zero weight in the long arm α_i (resp. in the long arm β_i, in the short arm γ_i). In T_1, at the bottom of the i-th long arm α_i, we attach a subtree t_i consisting of three leaves, as shown in Figure 13, labeled by the three elements s_{i_1}, s_{i_2} and s_{i_3} of C_i. At the bottom of each long arm of T_2, there are no additional subtrees attached. The labeling of the remaining leaves of T_2 is as follows:
• At the bottom of the i-th short arm γ_i, we attach a leaf labeled by s_i.

• The remaining 3n − m leaf labels (each leaf label is an element of S) are associated (in any order) with the 3n − m leaves in the middle of T_2, between the long and the short arms.

Note that the trees T_1 and T_2 are not uniquely labeled. The following claim establishes the correctness of the NP-hardness reduction.

D_st(T_1, T_2) = 1 iff there is a solution of the X3C problem.

A proof of the above claim can be found in [8]. □

5.2 An Approximation Algorithm

In this section, we present an approximation algorithm for computing the linear-cost subtree-transfer distance on weighted trees. First, we introduce some notation and a lower bound on the subtree-transfer distance which will be useful in subsequent proofs. For any tree T, let E(T) (resp. V(T)) denote the edge set (resp. node set) of T, and let L(T) denote the set of leaf nodes of T. The external edge of T incident on a leaf node a is denoted by e_T(a). Let E_int(T) and E_ext(T) denote the set of internal and external edges of T, respectively. For a subset E' ⊆ E(T), define w(E') = Σ_{e∈E'} w(e). Define W_int(T) = w(E_int(T)) and W_ext(T) = w(E_ext(T)). Partition E_ext(T_1) into three subsets as follows:

    E_{ext,T1>T2}(T_1) = { e_{T_1}(a) : w(e_{T_1}(a)) > w(e_{T_2}(a)) }
    E_{ext,T1=T2}(T_1) = { e_{T_1}(a) : w(e_{T_1}(a)) = w(e_{T_2}(a)) }
    E_{ext,T1<T2}(T_1) = { e_{T_1}(a) : w(e_{T_1}(a)) < w(e_{T_2}(a)) }

and define

    W_{ext,T1>T2}(T_1) = Σ_{e_{T_1}(a) ∈ E_{ext,T1>T2}(T_1)} [ w(e_{T_1}(a)) − w(e_{T_2}(a)) ].

Similarly, E_ext(T_2) can be partitioned into E_{ext,T1>T2}(T_2), E_{ext,T1=T2}(T_2), and E_{ext,T1<T2}(T_2), and W_{ext,T1<T2}(T_2) is defined analogously. The following lemma is easy to prove.
We next define the notion of good edge pairs as follows:

Definition 13 Let e_1 ∈ E_int(T_1) and e_2 ∈ E_int(T_2). Let T_1' and T_1'' be the two subtrees of T_1 partitioned by e_1, and let T_2' and T_2'' be the two subtrees of T_2 partitioned by e_2. Then e_1 and e_2 are called a good pair of T_1 and T_2 iff the following two conditions hold:

1. L(T_1') = L(T_2') and L(T_1'') = L(T_2'').

2. One of the following two conditions holds:

   (a) w(E(T_1')) ≤ w(E(T_2')) < w(E(T_1')) + w(e_1); or

   (b) w(E(T_2')) ≤ w(E(T_1')) < w(E(T_2')) + w(e_2).

Lemma 14 If T_1 and T_2 share no good edge pairs, then:

(1) D_st(T_1, T_2) ≥ W_int(T_1) + W_{ext,T1>T2}(T_1);
(2) D_st(T_1, T_2) ≥ W_int(T_2) + W_{ext,T1<T2}(T_2).

Proof. We only prove (1); the proof of (2) follows from (1) and Lemma 12. For each edge e ∈ E(T_1), we determine the minimum portion of e over which some subtrees of T_1 must be transferred in order to transform T_1 into T_2. First, consider an edge e_1 ∈ E_int(T_1). By the assumption of the lemma, there is no edge e_2 in T_2 such that e_1 and e_2 form a good pair. There are two cases:

Case 1. The partition of L(T_1) induced by e_1 is different from the partition of L(T_2) induced by any edge in T_2. Then, in order to transform T_1 into T_2, some leaf nodes of T_1 must be transferred across the entire length of e_1.

Case 2. The partition of L(T_1) induced by e_1 is the same as the partition of L(T_2) induced by an edge e_2 in T_2. Let T_1' and T_1'' be the two subtrees of T_1 partitioned by e_1, and let T_2' and T_2'' be the two subtrees of T_2 partitioned by e_2, where L(T_1') = L(T_2') and L(T_1'') = L(T_2'').

Case 2.1. w(E(T_2')) ≥ w(E(T_1')) + w(e_1). In this case, in order to transform T_1' into T_2', some subtree in T_1' must be transferred across the entire length of e_1.

Case 2.2. w(E(T_1')) ≥ w(E(T_2')) + w(e_2). This implies w(E(T_1'')) + w(e_1) ≤ w(E(T_2'')). In order to transform T_1'' into T_2'', some subtree in T_1'' must be transferred across the entire length of e_1.
In either case, some subtree of T_1 must be transferred across the entire length of e_1, with cost w(e_1).

Next consider an edge e_{T_1}(a) ∈ E_{ext,T1>T2}(T_1). In order to transform e_{T_1}(a) to e_{T_2}(a), a subtree of T_1 must be transferred across a portion of e_{T_1}(a) of length w(e_{T_1}(a)) − w(e_{T_2}(a)). Thus:

    D_st(T_1, T_2) ≥ Σ_{e ∈ E_int(T_1)} w(e) + Σ_{e_{T_1}(a) ∈ E_{ext,T1>T2}(T_1)} [ w(e_{T_1}(a)) − w(e_{T_2}(a)) ].  □
We say that nodes connected by 0-weight edges are equivalent, and we call the resulting equivalence classes super-nodes. Let e_1, …, e_k be all the positive weight edges incident to a super-node o. With 0 cost, we can re-connect the edges e_1, …, e_k by any subtree consisting of only 0-weight edges. In particular, the following observation will be useful in our subsequent descriptions.

Observation. Let o be a super-node of T. Let e_1, …, e_k be all the positive weight edges incident on o. Pick any e_i and e_j. We can assemble {e_1, …, e_k} − {e_i, e_j} into a single subtree S with 0 cost, and then transfer S along e_i by a distance d ≤ w(e_i). The effect of this operation is that the edges e_1, …, e_k are still incident on a super-node, and a portion of e_i of length d is moved into e_j. The total cost of this operation is d. We denote this operation by move(e_i, d, e_j). It can be implemented in O(k) time using the adjacency-list representation of the tree (where the weight of each edge is also stored in the adjacency list).

Figure 14: The operation move(e_1, 0.2, e_3). (1) e_2, e_4, e_5 are assembled into a tree S; (2) S is moved along e_1 by a length of 0.2.

Figure 14 shows an example of this operation. In the figure, the thin lines denote 0-weight edges and the heavy lines denote positive weight edges.
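
In terms of bookkeeping, the net effect of move(e_i, d, e_j) on a weighted adjacency-list representation is simply a transfer of length d between two edges sharing the super-node. A minimal Python sketch of that bookkeeping (it tracks only edge weights, not the 0-cost reassembly of the subtree S):

    def move(adj, ei, d, ej):
        # adj maps node -> {neighbor: weight}; ei = (o, u) and ej = (o, v)
        # are two positive-weight edges incident on the same super-node o.
        o, u = ei
        o2, v = ej
        assert o == o2 and 0 <= d <= adj[o][u]
        adj[o][u] -= d; adj[u][o] -= d   # a portion of ei of length d ...
        adj[o][v] += d; adj[v][o] += d   # ... is moved into ej
        return d                          # the cost of the operation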
62 B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp, L. Wang, L. Zhang

A tree T is called a super-star if all of its internal edges have 0 weight. In other words, all external edges of a super-star T are incident to a single super-node. In the rest of this section, we prove the following theorem.

Theorem 15 For any two weighted phylogenies T_1 and T_2, D_st(T_1, T_2) can be approximated to within a factor of 2 in O(n^2 log n) time.

First, we describe an algorithm DST which approximates D_st(T_1, T_2) to within a factor of 2 in the special case when T_1 and T_2 do not have any good edge pairs. Then we will show how to apply the algorithm in the general case.

The algorithm transforms T_1 into a super-star (by moving the weight of internal edges into external edges), and similarly transforms T_2 into a super-star; the transformations are chosen so that the two super-stars coincide. To transform T_1 into T_2, we first transform T_1 into this common super-star and then transform it into T_2. Let T_1' (resp. T_2') denote the tree during the transformation of T_1 (resp. T_2); T_1' (resp. T_2') is initialized to be T_1 (resp. T_2).
Algorithm DST:

Step 0. Initialize T_1' = T_1 and T_2' = T_2.

Step 1. While T_1' is not yet a super-star and there is an external edge e_{T_1'}(a) = (a, u) in T_1' such that w(e_{T_1'}(a)) < w(e_{T_2'}(a)), do:

• Let e_1 be any positive weight internal edge of T_1' incident on the super-node containing u. Let d = min{ w(e_1), w(e_{T_2'}(a)) − w(e_{T_1'}(a)) }.

• Perform the operation move(e_1, d, e_{T_1'}(a)) in T_1'. (Note: after this move operation, either the entire length of e_1 has been moved into e_{T_1'}(a) or w(e_{T_1'}(a)) = w(e_{T_2'}(a)).)

(Note: after the loop terminates, either T_1' is a super-star or w(e_{T_1'}(a)) ≥ w(e_{T_2'}(a)) for all leaf nodes a. Also, we perform subtree-transfers only on internal edges of T_1.)

Step 2. Similar to Step 1, with the roles of T_1' and T_2' swapped.

Step 3. We transform T_1' and T_2' into two super-stars such that w(e_{T_1'}(a)) = w(e_{T_2'}(a)) for all leaf nodes a. There are two possible cases.
Case 3.1. w(e_{T_1'}(a)) = w(e_{T_2'}(a)) for all leaf nodes a. Perform the following loop to transform both T_1' and T_2' into super-stars. During the execution of the loop, we maintain the condition w(e_{T_1'}(a)) = w(e_{T_2'}(a)) for all leaf nodes a (this condition implies that T_1' is a super-star iff T_2' is a super-star).

Repeat
    Pick any edge e_{T_1'}(a) = (a, u_1) in T_1'. Suppose that the corresponding edge e_{T_2'}(a) in T_2' is (a, u_2). Let e_1 be any positive weight internal edge of T_1' incident on the super-node containing u_1, and let e_2 be any positive weight internal edge of T_2' incident on the super-node containing u_2. Let d = min{ w(e_1), w(e_2) }. In T_1', perform the operation move(e_1, d, e_{T_1'}(a)). In T_2', perform the operation move(e_2, d, e_{T_2'}(a)). (After this, we have moved the entire length of either e_1 or e_2 into external edges.)
Until both T_1' and T_2' are super-stars.

(Note: during this step, we perform subtree-transfers only on internal edges of T_1 and T_2.)

Case 3.2. There exists a leaf node a such that w(e_{T_1'}(a)) ≠ w(e_{T_2'}(a)). This can happen only if both T_1' and T_2' are super-stars already. We need to make w(e_{T_1'}(a)) = w(e_{T_2'}(a)) for all leaf nodes a. This is done as follows. Partition L(T_1') into three subsets A, B, and C: A (resp. B, C) is the set of leaf nodes a (resp. b, c) such that w(e_{T_1'}(a)) = w(e_{T_2'}(a)) (resp. w(e_{T_1'}(b)) < w(e_{T_2'}(b)), w(e_{T_1'}(c)) > w(e_{T_2'}(c))).

Repeat
    Pick any edge e_{T_1'}(b) with b ∈ B and e_{T_1'}(c) with c ∈ C. Let d = min{ w(e_{T_1'}(c)) − w(e_{T_2'}(c)), w(e_{T_2'}(b)) − w(e_{T_1'}(b)) }. In T_1', perform move(e_{T_1'}(c), d, e_{T_1'}(b)). Then:
    • If d = w(e_{T_2'}(b)) − w(e_{T_1'}(b)), remove b from B and put b into A.
    • If d = w(e_{T_1'}(c)) − w(e_{T_2'}(c)), remove c from C and put c into A.
    • If d = w(e_{T_1'}(c)) − w(e_{T_2'}(c)) = w(e_{T_2'}(b)) − w(e_{T_1'}(b)), remove b from B, remove c from C, and put both b and c into A.
Until B = C = ∅.
Step 4. Now both T_1' and T_2' are super-stars and w(e_{T_1'}(a)) = w(e_{T_2'}(a)) for all leaf nodes a. We adjust the topology of the super-nodes of T_1' and T_2' so that T_1' and T_2' become identical.
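
The weight-balancing loop of Case 3.2 is simple enough to sketch on its own. Below is a minimal Python version operating directly on the external edge weights of the two super-stars (a hypothetical dict encoding, leaf → weight); it assumes the two weight totals are equal, as the earlier steps guarantee, and returns the total transfer cost incurred.

    def balance_super_stars(w1, w2):
        # w1, w2: external edge weight of each leaf in the two super-stars.
        B = {a for a in w1 if w1[a] < w2[a]}   # under-long edges of T1'
        C = {a for a in w1 if w1[a] > w2[a]}   # over-long edges of T1'
        cost = 0.0
        while B and C:
            b, c = next(iter(B)), next(iter(C))
            need_b = w2[b] - w1[b]
            excess_c = w1[c] - w2[c]
            d = min(need_b, excess_c)
            cost += d                # one move(e(c), d, e(b)) in T1'
            w1[b] += d
            w1[c] -= d
            if d == need_b:
                B.discard(b); w1[b] = w2[b]   # snap, avoiding float drift
            if d == excess_c:
                C.discard(c); w1[c] = w2[c]
        return cost

Each iteration settles at least one leaf, so the loop performs at most |L(T_1')| moves, matching the O(n) bound used in the proof of Lemma 16 below.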

Lemma 16 Assume that T_1 and T_2 do not share any good edge pairs. Then, algorithm DST approximates D_st(T_1, T_2) to within a factor of 2 in O(n^2) time.

Proof. We analyze the cost and running time of each step of the algorithm. We use the adjacency-list representation of a tree. Steps 0 and 4 incur no cost and can easily be implemented in O(n) time. During Steps 1, 2 and 3.1, we only transfer subtrees across internal edges of T_1 and T_2. Over any portion of such an edge e, at most one subtree-transfer operation occurs. So the total cost of these steps is bounded above by W_int(T_1) + W_int(T_2). Moreover, it is easy to see that at most O(n) moves are performed during Steps 1, 2, and 3.1, and since each move operation can be implemented in O(n) time, the total time for all these steps is at most O(n^2).

Next, consider Step 3.2. Before the repeat loop is entered, for any c ∈ C, we have:

• w(e_{T_1'}(c)) = w(e_{T_1}(c)). (This is because no additional weight is moved to the edge e_{T_1'}(c) during Steps 1 and 2.)

• w(e_{T_2'}(c)) ≥ w(e_{T_2}(c)).

During Step 3.2, we only transfer subtrees across the edges e_{T_1'}(c) for c ∈ C. Fix such an edge. Note that any portion of e_{T_1'}(c) is traversed at most once during Step 3.2. Once the length of e_{T_1'}(c) is reduced to w(e_{T_2'}(c)), c is removed from C. So the portion of e_{T_1'}(c) traversed during Step 3.2 is w(e_{T_1'}(c)) − w(e_{T_2'}(c)) = w(e_{T_1}(c)) − w(e_{T_2'}(c)) ≤ w(e_{T_1}(c)) − w(e_{T_2}(c)). So the total cost of Step 3.2 is at most Σ_{c∈C} [w(e_{T_1'}(c)) − w(e_{T_2'}(c))] ≤ Σ_{c∈C} [w(e_{T_1}(c)) − w(e_{T_2}(c))] ≤ W_{ext,T1>T2}(T_1). Also, we perform at most O(n) move operations during Step 3.2, and hence this step can also be implemented in O(n^2) time.

Thus the total cost of the algorithm is bounded above by W_int(T_1) + W_int(T_2) + W_{ext,T1>T2}(T_1), which is at most 2·D_st(T_1, T_2) by Lemma 14. □
Next, we show how to apply algorithm DST to achieve an approximation ratio of 2 when T_1 and T_2 may share some good edge pairs. We concentrate on the algorithm and omit implementation details. Let K be the number of good edge pairs in T_1 and T_2. Our algorithm is by induction on K. If K = 0, algorithm DST works by Lemma 16. Suppose K > 0. Let e_1 = (u_1, v_1) ∈ E(T_1) and e_2 = (u_2, v_2) ∈ E(T_2) be a good pair. Let T_1' and T_1'' be the two subtrees of T_1 partitioned by e_1, and let T_2' and T_2'' be the two subtrees of T_2 partitioned by e_2, where L(T_1') = L(T_2') and L(T_1'') = L(T_2'').

Assume w(E(T_1')) ≤ w(E(T_2')) < w(E(T_1')) + w(e_1). (The other case can be handled in a similar way.) Add a new edge (u_1, x) to T_1' and assign w((u_1, x)) = w(E(T_2')) − w(E(T_1')). Add a new edge (x, v_1) to T_1'' and assign w((x, v_1)) = w(e_1) − w((u_1, x)). Add a new edge (u_2, x) to T_2' and assign w((u_2, x)) = 0. Add a new edge (x, v_2) to T_2'' and assign w((x, v_2)) = w(e_2). (See Figure 15.) Note that the weights of all new edges are non-negative.

Figure 15: Cut each of T_1 and T_2 into two smaller trees.

Now we have L(T_1') = L(T_2') and w(T_1') = w(T_2'), and we can normalize the weights of T_1' and T_2' so that their sum is 1. By the induction hypothesis, we can transform T_1' into T_2' with cost at most 2·D_st(T_1', T_2'). Similarly, we can transform T_1'' into T_2'' with cost at most 2·D_st(T_1'', T_2''). Combining the two transfer sequences, we can transform T_1 into T_2 with cost at most 2·D_st(T_1, T_2). The complete algorithm takes O(n^2 log n) time. This completes the proof of Theorem 15.

6 The Rotation Distance

6.1 Rotation and its equivalences

A rotation is an operation that changes one rooted binary tree into another of the same size. Figure 16 shows the general rotation rule. Note that rotation is an invertible operation: if a tree T is changed into T' by a rotation, then T' can be changed back into T by another rotation. In a rooted binary tree of size n, there are n − 1 possible rotations, each corresponding to a non-root node.

Figure 16: The definition of rotation (rotation at v).

A symmetric order traversal of a rooted tree visits all of the nodes exactly once. The order can be described recursively as follows: for a node in the tree, traverse its left subtree (if there is one), visit the node itself, and then traverse its right subtree (if there is one). A rotation maintains the symmetric order of the nodes.
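
As a quick illustration, here is a minimal Python sketch of the standard tree-rotation primitive: rotating a child up to its parent's position rearranges three subtrees but leaves the symmetric (in-order) sequence of keys unchanged.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def rotate_up(parent, side):
        # Rotate parent's child on the given side ('left' or 'right') up,
        # returning the new root of this subtree.
        if side == 'left':
            x = parent.left
            parent.left = x.right    # x's inner subtree re-attaches
            x.right = parent
        else:
            x = parent.right
            parent.right = x.left
            x.left = parent
        return x

    def inorder(t):
        # The symmetric order; it is invariant under rotate_up.
        return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)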
The rotation on binary trees can be formulated with respect to different
systems of combinatorial objects and their transformations. The diagonal-
flip operation in triangulations is perhaps more intuitive and so supplies
more insight.
Consider the standard convex (n + 2)-gon. We choose an edge of the polygon as a distinguished edge, called the "root edge", and label its ends 0 and n + 1. We also label the other n vertices from 1 to n counterclockwise. Any triangulation of the (n + 2)-gon has n triangles and n − 1 diagonals. From a triangulation of the (n + 2)-gon, we derive a binary tree of size n by assigning a node to each triangle and connecting two nodes if the corresponding triangles share a common diagonal. The root of the tree corresponds to the triangle containing the root edge. It is not difficult to see that the i-th node of the binary tree in symmetric order corresponds to the triangle with vertices i, j and k such that j < i < k. In this way, we obtain a 1-1 correspondence between n-node binary trees and triangulations of the (n + 2)-gon, as illustrated in Figure 17.
Figure 17: A triangulation and its corresponding n-node rooted binary tree.

A diagonal-flip is an operation that transforms one triangulation of a convex polygon into another, as shown in Figure 18. A diagonal inside the polygon is removed, creating a quadrilateral. Then the opposite diagonal of this quadrilateral is inserted in place of the one removed, restoring a triangulation of the polygon. It is not difficult to see that diagonal-flips in a triangulation correspond one-to-one to rotations in the corresponding binary tree.

Figure 18: A diagonal flip in a triangulation of the hexagon.

Given a triangulation π of a polygon, we define the internal degree of a vertex v as the number of diagonals adjacent to v, denoted id(v). Now let us see how id(v) is reflected in the corresponding binary tree. In a rooted binary tree T, the left (resp. right) path is the maximal sequence of nodes that form a path starting at the root all of whose edges go in the left (resp. right) direction. For a node v ∈ T, the left and right subtrees rooted at v are denoted by LT_v and RT_v, respectively. Recall that all non-leaf nodes are internal nodes in a tree. The following result is of interest in itself and, to the best of the authors' knowledge, has not appeared in the literature.

Theorem 17 ([34]) Suppose that the (n + 2)-gon P is oriented by labeling its vertices from 0 to n + 1, and (0, n + 1) is the root edge. Let π be a triangulation of P and T be the corresponding n-node rooted binary tree. Then:

(1) The internal degree id(0) of vertex 0 in P equals the number of internal nodes on the left path of T;

(2) The internal degree id(n + 1) of vertex n + 1 in P equals the number of internal nodes on the right path of T;

(3) The internal degree id(i) of vertex i ∈ P (0 < i < n + 1) equals the number of subtrees at node i, which is at most 2, plus the number of internal nodes on the right path of the left subtree LT_i and the number of internal nodes on the left path of the right subtree RT_i.

Other interesting relationships between a triangulation of a convex polygon and its corresponding rooted binary tree can be found in the nice survey article [1].

6.2 Upper and lower bounds for the rotation distance

Any rooted binary tree of size n can be converted into any other of the same size by performing an appropriate sequence of rotations. Therefore, we can define the rotation distance between two trees as the minimum number of rotations required to convert one tree into the other. Let rd(T_1, T_2) denote the rotation distance between two trees T_1 and T_2, and define rd(n) to be the maximum of rd(T_1, T_2) over all pairs of rooted binary trees T_1, T_2 of size n. Similarly, we can define the diagonal flip distance between two triangulations of the n-gon, and denote the maximum distance between any pair of such triangulations by fd(n). Obviously, rd(n) = fd(n + 2).

Culik and Wood showed that rd(n) ≤ 2n − 2 [6]. Sleator, Tarjan and Thurston improved this bound to 2n − 6, and showed that the bound is tight for all sufficiently large n, using hyperbolic geometry.

Theorem 18 ([41]) rd(n) = fd(n + 2) ≤ 2n − 6 for all n > 10. Furthermore, equality holds for all sufficiently large n.

The exact values of rd(n) for n ≤ 16 are listed below [41]:

    n      1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
    rd(n)  0  1  2  4  5  7  9  11 12 15  16  18  20  22  24  26

However, little is known about lower bounds for the rotation distance rd(T_1, T_2) of two given trees T_1 and T_2. The following two theorems are the only known lower bounds; they are presented in terms of the diagonal flip distance for simplicity. The first one is a variant of Lemma 3 in [41].

Theorem 19 Let π_1 and π_2 be two triangulations of the n-gon. If π_1 and π_2 have k different diagonals, then fd(π_1, π_2) ≥ k.

Let π_1 and π_2 be two triangulations of the n-gon. Consider a sequence Π of diagonal-flips that transforms π_1 into π_2. A diagonal-flip (ab, cd) ∈ Π is auxiliary if cd ∉ π_2. We also say that the flip (ab, cd) touches the vertices a, b, c, d. Let A(Π) denote the set of all auxiliary diagonal-flips in Π, and let |Π| denote the number of flips in Π. Then

    |Π| ≥ |A(Π)| + n − 3.    (1)

Finally, a triangle of a triangulation is said to be internal if it contains three diagonals of the triangulation.

Theorem 20 ([34]) Let π_1 and π_2 be two triangulations of a convex polygon, and let v be a vertex of the polygon. Suppose that the following conditions are satisfied:

(a) v is an end of at least two diagonals in π_2,
(b) v is not a vertex of any internal triangle in π_1 or π_2,
(c) v is not connected by a π_2-diagonal to any vertex of an internal triangle in π_2, and
(d) flipping any π_1-diagonal adjacent to v does not create a π_2-diagonal.

Then, there is at least one auxiliary diagonal-flip touching v in any sequence Π of diagonal-flips that converts π_1 into π_2.

As shown in the next subsection, Theorem 20 is useful for estimating the diagonal flip distance between two triangulations.

6.3 Approximating the rotation and diagonal flip distances

Since the rotation and diagonal flip distances are equivalent, we just state the results in terms of the diagonal flip distance. Consider two triangulations π_1 and π_2 with k different diagonals. Since every different π_1-diagonal has to be flipped, any diagonal-flip transformation from one into the other contains at least k flips. On the other hand, by Theorem 18, 2k flips are enough to transform π_1 into π_2. This implies an approximation with ratio 2.

Theorem 21 The diagonal flip distance can be approximated with ratio 2 in polynomial time.
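
The resulting approximation is trivial to compute once the two diagonal sets are in hand. A minimal Python sketch, with each diagonal represented as a frozenset of its two endpoint labels (a hypothetical encoding):

    def flip_distance_bounds(d1, d2):
        # d1, d2: the diagonal sets of two triangulations of one polygon.
        # Every differing diagonal must be flipped at least once (lower
        # bound k), and 2k flips always suffice (Theorem 18 applied inside
        # each cell), which gives the ratio-2 approximation.
        assert len(d1) == len(d2)
        k = len(d1 - d2)
        return k, 2 * k      # (lower bound, achievable upper bound)

For two fan triangulations of a hexagon at opposite vertices, e.g. d1 = {frozenset({0, 2}), frozenset({0, 3}), frozenset({0, 4})} and d2 = {frozenset({1, 3}), frozenset({1, 4}), frozenset({1, 5})}, this returns (3, 6).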

However, it is very hard to develop a polynomial approximation algo-


rithm with constant ratio < 2 for the distance. In what follows, we prove a
slightly better approximation.

Theorem 22 ([34]) There is a polynomial approximation algorithm that,


on the input of two triangulations 7r1 and 7r2 of the n-gon, output a diagonal
flip transformation of length at most (2 - 4(d-I)(2d+6)+1 )fd{7rI, 7r2), where d
is the maximum number of diagonals adjacent a vertex in one of the given
triangulations.

Proof. Let e be a diagonal in π_1 or π_2. The diagonal e is said to be isolated if there is only one diagonal (in the other triangulation) crossing e. Two triangulations of the n-gon may in general have some diagonals in common. All the common diagonals divide the rest of the diagonals into disjoint subclasses; each disjoint subclass, together with the common diagonals around it, is called a cell. The desired algorithm is presented in Table 1. Obviously, the algorithm takes polynomial time. We analyze its approximation ratio as follows.

Without loss of generality, we assume that π_1 and π_2 initially have no common diagonals. Suppose that the Do loop runs m times for isolated diagonals. Then, after the Do loop, π_1 and π_2 have been transformed into triangulations π_1' and π_2' which have m diagonals in common. Without loss of generality, we may assume that the different diagonals in π_1' and π_2' form two triangulations of a convex (n − m)-gon. Note that fd(π_1, π_2) = m + fd(π_1', π_2').
A vertex v ∈ P is pure with respect to π_i' if it is an end only of π_i'-diagonals. Let V_1 and V_2 denote the sets of pure vertices with respect to π_1' and π_2', respectively. We first prove that

    fd(π_1', π_2') ≥ (n − m) − 3 + |V_1|/4.

Consider a shortest sequence Π of diagonal flips that transforms π_1' into π_2'. Since there are no isolated diagonals in either π_1' or π_2', each vertex in V_1 is an end of at least two π_1'-diagonals. By Theorem 20, for each node in V_1, there is at least one auxiliary flip touching it. Since each auxiliary flip can touch at most 4 vertices, there are at least |V_1|/4 auxiliary flips in Π. Hence, by Inequality (1), fd(π_1', π_2') ≥ (n − m) − 3 + |V_1|/4.
Input: Two triangulations π_1 and π_2;

Do until the following 'if' condition fails
    if there are isolated diagonals then
        pick such a diagonal e;
        let e' be the unique diagonal that intersects e;
        if e' ∈ π_1 then π_1 := π_1 + e − e' else π_2 := π_2 + e − e';
Enddo

Let the resulting polygon triangulations have k cells P_i (i ≤ k), and let π_j|P_i denote the restriction of π_j to P_i for j = 1, 2 and i ≤ k; assume P_i has n_i vertices.

For each cell P_i
    pick a node v;
    transform π_1|P_i into the unique triangulation π all of whose
    diagonals have one end at v, using at most n_i steps;
    transform π into π_2|P_i reversely.
Endfor

Table 1: Transformation algorithm.

Similarly, by considering pure vertices with respect to π_2', we can prove that fd(π_1', π_2') ≥ (n − m) − 3 + |V_2|/4. Combining these two bounds, we obtain

    fd(π_1', π_2') ≥ (n − m) − 3 + (|V_1| + |V_2|)/8.    (2)

On the other hand, one can prove that there are at least (n − m)/(d − 1) − 3|V_1| − (d + 2)|V_2| vertices satisfying the conditions in Theorem 20. Thus, by Inequality (1),

    fd(π_1', π_2') ≥ (n − m) − 3 + (n − m)/(4(d − 1)) − 3|V_1|/4 − (d + 2)|V_2|/4.

Similarly,

    fd(π_1', π_2') ≥ (n − m) − 3 + (n − m)/(4(d − 1)) − 3|V_2|/4 − (d + 2)|V_1|/4.

Combining these two inequalities, we have

    fd(π_1', π_2') ≥ (n − m) − 3 + (n − m)/(4(d − 1)) − (d + 5)(|V_1| + |V_2|)/8.    (3)

By (2) and (3), fd(π_1, π_2) = m + fd(π_1', π_2') ≥ m + (n − m)(1 + 1/(4(d − 1)(d + 6))) − 3.



The algorithm transforms π_1 into π_2 using at most M = m + 2(n − m) = 2n − m flips, which is less than (2 − 1/(4(d − 1)(d + 6) + 1))·fd(π_1, π_2). This finishes the proof. □
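
Returning to the algorithm in Table 1: the per-cell work of its For loop is the classical fan argument — flip π_1's diagonals into the fan at the chosen vertex v, then run the fan-to-π_2 transformation in reverse. A small Python sketch of the resulting flip count, assuming the standard fact that in a convex polygon each diagonal not incident to v can be brought to v with one flip:

    def cell_transform_length(d1, d2, v):
        # d1, d2: diagonal sets of the two triangulations of one cell
        # (each diagonal a frozenset of its endpoints); v: the chosen apex.
        to_fan = sum(1 for d in d1 if v not in d)     # fan d1 at v
        from_fan = sum(1 for d in d2 if v not in d)   # reverse-fan into d2
        return to_fan + from_fan

Summed over the cells, this gives the 2(n − m) term in the bound M = m + 2(n − m) used above.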
Furthermore, [34] also presented a polynomial approximation algorithm with ratio 1.97 for the diagonal flip transformation between two triangulations without internal triangles. Such a triangulation corresponds to a rooted binary tree without degree-3 internal nodes.

6.4 Miscellaneous remarks

The diagonal flip operation was studied early on by Wagner [45] in the context of arbitrary triangulated planar graphs, and by Dewdney [11] in the case of graphs of genus one. They showed that any such graph can be transformed into any other by diagonal flips. However, they did not try to accurately estimate how many flips are necessary. After [41] was published, the rotation and diagonal flip operations were studied in several respects. Pallo [36] proposed a heuristic search algorithm to compute the rotation distance between two given trees. Hurtado, Noy and Urrutia [24] studied diagonal flips in arbitrary polygons. Guibas and Hershberger [18] abstracted polygon morphing as a sequence of rotations on weighted binary trees, and Hershberger and Suri [23] proved that a weighted rooted binary tree can be converted into any other in at most O(n log n) rotations.

7 Open Questions

In this section, we list some open questions concerning the nni and subtree-transfer distances.

1. Give a constant-ratio approximation algorithm for the nni distance on unweighted evolutionary trees, or prove that the ratio O(log n) is the best possible.

2. Is the linear-cost subtree-transfer distance NP-hard to compute on weighted evolutionary trees?

3. Can we improve the approximation ratios for the subtree-transfer distance on unweighted or weighted evolutionary trees?

4. Are there simple approximation algorithms for the rotation distance with nontrivial ratios?
References
[1] D. Aldous, Triangulating the circle, at random, Amer. Math. Monthly, 89, pp. 223-234, 1994.

[2] M.A. Armstrong, Groups and Symmetry, Springer Verlag, New York
Inc., 1988.

[3] D. Barry and J.A. Hartigan, Statistical analysis of hominoid molecular


evolution, Stat. Sci., 2, pp. 191-210, 1987.

[4] C.H. Bennett, P. Gács, M. Li, P. Vitányi, and W. Zurek, Information Distance, to appear in IEEE Trans. Inform. Theory.

[5] R. P. Boland, E. K. Brown and W. H. E. Day, Approximating minimum-


length-sequence metrics: a cautionary note, Math. Soc. Sci., 4, pp.
261-270, 1983.

[6] K. Culik II and D. Wood, A note on some tree similarity measures,


Inform. Proc. Let., 15, pp. 39-42, 1982.

[7] B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp and L. Zhang, On


distances between phylogenetic trees, Proc. 8th Annual ACM-SIAM
Symposium on Discrete Algorithms, pp. 427-436, 1997.

[8] B. DasGupta, X. He, T. Jiang, M. Li, and J. Tromp, On the linear-cost


subtree-transfer distance, Algorithmica, submitted, 1997.

[9] B. DasGupta, X. He, T. Jiang, M. Li, J. Tromp, and L. Zhang, On


computing the nearest neighbor interchange distance, Preprint, 1997.

[10] W. H. E. Day, Properties of the nearest neighbor interchange metric


for trees of small size, Journal of Theoretical Biology, 101, pp. 275-288,
1983.

[11] A. K. Dewdney, Wagner's theorem for torus graphs, Discrete Math., 4,


pp. 139-149, 1973.

[12] A.W.F. Edwards and L.L. Cavalli-Sforza, The reconstruction of evolu-


tion, Ann. Hum. Genet., 27, 105, 1964. (Also in Heredity 18, 553.)

[13] J. Felsenstein, Evolutionary trees for DNA sequences: a maximum like-


lihood approach. J. Mol. Evol., 17, pp. 368-376, 1981.
[14] J. Felsenstein, personal communication, 1996.

[15] W.M. Fitch, Toward defining the course of evolution: minimum change
for a specified tree topology, Syst. Zool., 20, pp. 406-416, 1971.

[16] W.M. Fitch and E. Margoliash, Construction of phylogenetic trees, Sci-


ence, 155, pp. 279-284, 1967.

[17] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide


to the Theory of NP-Completeness, W. H. Freeman, 1979.

[18] L. Guibas and J. Hershberger, Morphing simple polygons, Proceeding of


the ACM 10th Annual Sym. of Comput Geometry, pp. 267-276, 1994.

[19] J. Hein, Reconstructing evolution of sequences subject to recombination


using parsimony, Math. Biosci., 98, pp. 185-200, 1990.

[20] J. Hein, A heuristic method to reconstruct the history of sequences


subject to recombination, J. Mol. Evol., 36, pp. 396-405, 1993.

[21] J. Hein, personal email communication, 1996.

[22] J. Hein, T. Jiang, L. Wang, and K. Zhang, On the complexity of com-


paring evolutionary trees, Discrete Applied Mathematics, 71, pp. 153-
169, 1996.

[23] J. Hershberger and S. Suri, Morphing binary trees. Proceeding of the


ACM-SIAM 6th Annual Symposium of Discrete Algorithms, pp. 396-
404, 1995.

[24] F. Hurtado, M. Noy, and J. Urrutia, Flipping edges in triangulations,


Proc. of the ACM 12th Annual Sym. of Comput. Geometry, pp. 214-223,
1996.

[25] J. P. Jarvis, J. K. Luedeman and D. R. Shier, Counterexamples in mea-


suring the distance between binary trees, Mathematical Social Sciences,
4, pp. 271-274, 1983.

[26] J. P. Jarvis, J. K. Luedeman and D. R. Shier, Comments on computing


the similarity of binary trees, Journal of Theoretical Biology, 100, pp.
427-433, 1983.
[27] J. Kececioglu and D. Gusfield, Reconstructing a history of recombi-


nations from a set of sequences, Proc. 5th Annual ACM-SIAM Symp.
Discrete Algorithms, 1994.

[28] M. Kuhner and J. Felsenstein, A simulation comparison of phylogeny


algorithms under equal and unequal evolutionary rates, Mol. Biol. Evol., 11(3), pp. 459-468, 1994.

[29] M. Krivanek, Computing the nearest neighbor interchange metric for


unlabeled binary trees is NP-complete, Journal of Classification, 3, pp.
55-60, 1986.

[30] V. King and T. Warnow, On Measuring the nni distance between two
evolutionary trees, DIMACS mini workshop on combinatorial structures
in molecular biology, Rutgers University, Nov 4, 1994.

[31] S. Khuller, Open Problems: 10, SIGACT News, 24(4), p. 46, Dec 1994.

[32] W.J. Le Quesne, The uniquely evolved character concept and its cladis-
tic application, Syst. Zool., 23, pp. 513-517, 1974.

[33] M. Li, J. Tromp, and L.X. Zhang, On the nearest neighbor interchange
distance between evolutionary trees, Journal of Theoretical Biology,
182, pp. 463-467, 1996.

[34] M. Li and L. Zhang, Better Approximation of Diagonal-Flip Transformation and Rotation Transformation, Manuscript, 1997.

[35] G. W. Moore, M. Goodman and J. Barnabas, An iterative approach


from the standpoint of the additive hypothesis to the dendrogram prob-
lem posed by molecular data sets, Journal of Theoretical Biology, 38,
pp. 423-457, 1973.

[36] J. Pallo, On rotation distance in the lattice of binary trees, Inform. Proc. Let., 25, pp. 369-373, 1987.

[37] D. F. Robinson, Comparison of labeled trees with valency three, Journal


of Combinatorial Theory, Series B, 11, pp. 105-119, 1971.

[38] N. Saitou and M. Nei, The neighbor-joining method: a new method


for reconstructing phylogenetic trees, Mol. Biol. Evol., 4, pp. 406-425, 1987.
[39] D. Sankoff, Minimal mutation trees of sequences, SIAM J. Appl. Math.,


28, pp. 35-42, 1975.
[40] D. Sankoff and J. Kruskal (Eds), Time Warps, String Edits, and Macro-
molecules: the Theory and Practice of Sequence Comparison, Addison
Wesley, Reading Mass., 1983.

[41] D. Sleator, R. Tarjan, W. Thurston, Rotation distance, triangulations,


and hyperbolic geometry, J. Amer. Math. Soc., 1, pp. 647-681, 1988.

[42] D. Sleator, R. Tarjan, W. Thurston, Short encodings of evolving struc-


tures, SIAM J. Discr. Math., 5, pp. 428-450, 1992.

[43] K.C. Tai, The tree-to-tree correction problem, J. ACM, 26, pp. 422-433,
1979.
[44] A. von Haeseler and G.A. Churchill, Network models for sequence evolution, J. Mol. Evol., 37, pp. 77-85, 1993.

[45] K. Wagner, Bemerkungen zum Vierfarbenproblem, Jahresber. Deutsch. Math.-Verein., 46, pp. 26-32, 1936.

[46] M. S. Waterman, Introduction to computational biology: maps, se-


quences and genomes, Chapman & Hall, 1995.

[47] M. S. Waterman and T. F. Smith, On the similarity of dendrograms,


Journal of Theoretical Biology, 73, pp. 789-800, 1978.

[48] K. Zhang and D. Shasha, Simple fast algorithms for the editing distance
between trees and related problems, SIAM J. Comput. 18, pp. 1245-
1262, 1989.
[49] K. Zhang, J. Wang and D. Shasha, On the editing distance between undirected acyclic graphs, International J. of Foundations of Computer Science, 7(1), March 1996.
HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 77-103
©1998 Kluwer Academic Publishers

Combinatorial Optimization and Coalition Games


Xiaotie Deng
Department of Computer Science
City University of Hong Kong, Kowloon, Hong Kong, China
E-mail: deng@cs.cityu.edu.hk

Contents

1 Introduction 78

2 Solution Concepts in Cooperative Games 79
  2.1 Computational Complexity 80
  2.2 The Core and Related Concepts 81
  2.3 Bargaining Set and von Neumann-Morgenstern Solution 82
  2.4 Concepts with Unique Solution 84

3 Problems and Models 86
  3.1 Combinatorial Optimization Games 86
  3.2 Totally Balanced Game Models 88
  3.3 Other Models 90

4 Algorithms and Complexity of the Core 91
  4.1 The Game of Assignment, Partition and Packing/Covering 91
  4.2 Characterization of Total Balancedness 94

5 Beyond Complexity for Solution Concepts 95
  5.1 PC-Core 96
  5.2 NE-Core 97

6 Remarks and Discussion 98

References
1 Introduction

Studies of games in coalition form deal with the power of cooperation among the participants; in this sense the subject is often referred to as cooperative game theory. In a simple mathematical formulation, we have a set N of agents and a value function v : 2^N → R where, for each subset S ⊆ N, v(S) represents the value obtained by the coalition of agents in S without assistance of other agents, with v(∅) = 0. Individual income can be represented by a vector x : N → R. We consider games with side payments. The main issue here is how to fairly distribute the income collectively earned by a group of cooperating participants in the game. For simplicity, we write x(S) = Σ_{i∈S} x_i. A vector x is called an imputation if x(N) = v(N) and ∀i ∈ N: x_i ≥ v({i}) (individual rationality).
Much of cooperative game theory is built around the question of what kinds of imputations are fair, stable and rational. Different philosophies result in different solution concepts [54]. For example, the core is defined on the concept of subgroup rationality: an imputation x is in the core if it satisfies the subgroup rationality condition ∀S ⊆ N: x(S) ≥ v(S). A concept particularly interesting to us is the thesis of bounded rationality, which argues that decisions made by real-life agents may not spend unbounded amounts of resources to evaluate all the possibilities for an optimal outcome [55, 43, 45]. Methodologies in algorithms and computational complexity, from theoretical computer science, provide exactly this type of tool for discussing whether solutions can be obtained within a reasonable amount of computational resources. For cooperative games, it has been suggested that polynomial time algorithms (as good algorithms, following Jack Edmonds [15]) be sought for solutions [38], and that computational complexity be taken into consideration as a measure for evaluating and comparing different solution concepts [11].
The field of combinatorial optimization has much to offer for the study of cooperative games. First of all, the value of a subgroup is often the solution to a combinatorial optimization problem, subject to constraints on the resources controlled by members of the subgroup; such games are commonly referred to as combinatorial optimization games. Moreover, the number of players in cooperative games is in general fixed and finite, and solution concepts are usually defined on a structure/relation over this finite set of players. This often results in discrete optimization problems of a combinatorial nature. For example, the definition of an extensively studied solution concept, the core, requires the validity of an exponential number of linear constraints, a situation reminiscent of linear systems for combinatorial optimization problems in polyhedral combinatorics [16]. Indeed, to decide whether an imputation is in the core of a convex game, one can minimize a submodular function; this can be done in oracle polynomial time via the ellipsoid method [26] (see [6] for a combinatorial algorithm which is pseudo-polynomial time).

2 Solution Concepts in Cooperative Games

In a cooperative game setting, let N be the set of players in the game and n = |N|. The value v(S) represents the gain obtained by the collective action of a coalition formed by a subset S of players, without any assistance of the other players in N − S. In general, v(S) may be a vector specifying the outcome of each player in the coalition S. For games with side payments, the income of players can be redistributed; therefore, we may consider v as a function from subsets of N to real numbers, v : 2^N → R, called the characteristic function of the cooperative game. In this treatise, we assume that v(∅) = 0. Though one may consider cases with v(∅) not equal to zero, setting it to zero will not change most of the interesting results discussed here. We represent a cooperative game by (N, v).

A cooperative game (N, v) is convex if for any pair of subsets S and T of N, S, T ⊆ N, we have

    v(S) + v(T) ≤ v(S ∪ T) + v(S ∩ T).

This covers a large collection of cooperative games that often present themselves in practical situations. As we shall see later, convex games have very interesting properties.
We denote the income distributed to the individual players by a vector x : N → R, and let x(S) = Σ_{i∈S} x_i. The focus of cooperative game theory has always been how to fairly distribute the collective income and how to calculate a fair solution. We focus on vectors x which meet the following conditions:

1. x(N) = v(N),

2. individual rationality: ∀i ∈ N: x_i ≥ v({i}).

A vector satisfying the above conditions is called an imputation.

2.1 Computational Complexity

The characteristic function v : 2^N → R, however, requires space exponential in the number n of players if presented explicitly. We assume that we have an oracle which, given a subset S ⊆ N, returns the value v(S) of this subset S. In some cases we are going to discuss, v(S) can be calculated in polynomial time according to the oracle description; in some other cases, calculating v(S) can be difficult. In general, an algorithm is oracle-polynomial time if it makes inquiries to the oracle about the values v(S) for a polynomial (in n) number of subsets of players S ⊆ N and runs in polynomial time. Therefore, an oracle-polynomial time algorithm is a polynomial time algorithm whenever v(S) can be evaluated in polynomial time.
The problem of finding solutions of cooperative games according to the different concepts introduced below naturally brings in issues of algorithms and complexity. Littlechild, and Littlechild with Owen, for example, have discussed special games for which some solution concepts can be represented by a simple formula [35, 34]. Megiddo was the first to explicitly demand that the solution of a cooperative game be found by good algorithms (following Edmonds [15]), i.e., within time polynomial in the number of players [38].

Deng and Papadimitriou have suggested applying the computational complexity of finding a solution as another measure of fairness (in a way similar to bounded rationality) for comparing and evaluating solution concepts in cooperative games [9, 11]. To illustrate this point of view, the following game is used as an example. In this game, ∀S ⊆ N:

1. if |S| = 1, v(S) = 0;

2. for |S| = 2, the value v(S) is explicitly given;

3. if |S| ≥ 3, v(S) = Σ_{{i,j}⊆S} v({i,j}).


This game can be related to a graph such that its nodes represent players
and edges represent the characterization function values for pairs of players.
The value for a subset S of players is the sum of edge values in the subgraph
induced by the corresponding subset nodes the graph. Thus, we may name
it as a game of sum of edge weights, and denote it by SEW(G,w) for a graph
G = (V, E) with edge weights w : E ~ R.
There are realistic situations this game may arise. Consider a telephone
network between different countries. Each player is a long distance tele-
phone company for a country. Income is generated from lines connecting
Combinatorial Optimization and Coalition Games 81

phone calls between two countries. The companies will have to negotiate
how to distribute the total income. (As an alternative example, consider
the problem how to share the cost when a network of highways are built
between cities.) To illustrate many concepts described in this section, we
will frequently make use of SEW(G,w) as an example.
For an SEW(G,w) game with n players, the input size is O(n²). The
value v(S) can be calculated in polynomial time (in fact, in O(|S|²) time), and
SEW(G,w) is convex if and only if ∀e ∈ E : w(e) ≥ 0. To see this, if there
is an edge e = (u, v) of negative weight, let S = {u} and T = {v}. Then
v(S) + v(T) = 0 + 0 = 0 but v(S ∪ T) + v(S ∩ T) = w(e) + 0 < 0. This
contradicts the condition

v(S) + v(T) ≤ v(S ∪ T) + v(S ∩ T)

for convex games. On the other hand, it is not hard to see that, ∀S, T ⊆ N,
we have

v(S ∪ T) + v(S ∩ T) − v(S) − v(T) = Σ_{i∈S−T, j∈T−S} w({i,j}),

which is no less than zero if all edge weights are nonnegative.
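Both the characteristic function and the convexity test are simple enough to state in code. The following is a minimal Python sketch, with a hypothetical edge-weight dictionary as input; it is our own illustration, not taken from any of the cited papers:

```python
def sew_value(S, weights):
    """v(S) for SEW(G, w): total weight of edges with both ends in S.
    `weights` maps frozenset({i, j}) -> w(i, j)."""
    return sum(w for e, w in weights.items() if e <= frozenset(S))

def sew_is_convex(weights):
    # By the argument above, SEW(G, w) is convex iff no edge weight is negative.
    return all(w >= 0 for w in weights.values())

# tiny illustrative instance
w = {frozenset({1, 2}): 3.0, frozenset({2, 3}): -1.0}
print(sew_value({1, 2, 3}, w))   # 2.0
print(sew_is_convex(w))          # False
```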

2.2 The Core and Related Concepts


The core of a game (N,v) was introduced by Gillies [21] and is based on
the concept of subgroup rationality. An imputation x is in the core if
and only if ∀S ⊆ N : x(S) ≥ v(S). The principle behind its definition is very
similar to, and can be seen as an extension of, that of the Nash equilibrium [41].
For a Nash equilibrium point, the main principle is that no individual would
gain by deviating from it. For a point in the core, no subgroup of players
can do so. Given an imputation x, we have 2ⁿ − 1 linear inequalities to
satisfy in order for it to be in the core. Alternatively, this is equivalent to the
following condition:

min{ x(S) − v(S) : S ⊆ N } ≥ 0.

We call this problem the membership test problem for the core. The problem
of deciding whether there is no imputation in the core is called the emptiness
test problem for the core (though classically one says that the game has no
core in this case, rather than that the core is empty [54]).

When the game is convex, it is not hard to see that x − v is a submodular
function. Applying the submodular function minimization algorithms in [26],
the membership test problem for the core can be solved in oracle-polynomial
time. In case x is not in the core, the same algorithm also finds an
unsatisfied constraint. Notice that finding an imputation in the core is
equivalent to finding a feasible solution to the above 2ⁿ − 1 linear constraints.
Using the above algorithm and applying the revised ellipsoid algorithm of [26]
for linear systems with more than polynomially many constraints, this
again can be done in oracle-polynomial time.
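For intuition, the membership test is easy to state as a naive enumeration over all coalitions; the point of the submodular-minimization route above is precisely to avoid this exponential loop for convex games. A minimal Python sketch with our own hypothetical interface (v takes a frozenset of players):

```python
import itertools

def in_core(x, players, v, tol=1e-9):
    """Naive core membership test: check efficiency and x(S) >= v(S) for all
    2^n - 1 nonempty coalitions S.  Exponential time; illustrative only."""
    if abs(sum(x[i] for i in players) - v(frozenset(players))) > tol:
        return False                     # violates x(N) = v(N)
    for r in range(1, len(players)):
        for S in itertools.combinations(players, r):
            if sum(x[i] for i in S) < v(frozenset(S)) - tol:
                return False             # coalition S would object
    return True
```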
For SEW(G,w), it is known that the membership test problem for the
core is NP-hard, and the same is true for the emptiness problem for the
core [11]. On the other hand, for any cooperative game such that v(S) can be
calculated in polynomial time, given an imputation not in the core, it takes
polynomial time to verify a violated inequality. In general, for cooperative
games such that v(S) can be calculated in polynomial time, the computational
complexity of these questions will depend on the game. In the worst case, the
membership test problem for the core is co-NP-complete; the emptiness
problem for the core, however, can be harder.
Several related concepts derive from the core. The ε-core is defined to
contain the imputations such that ∀S ⊆ N : x(S) ≥ v(S) − ε. The least-core is
defined to be the ε-core for the minimum value of ε such that the ε-core
is not empty. For convex games, the value of ε for the least-core can be
calculated in polynomial time in a way similar to the above; we leave the
verification to the reader. The same is true of the membership test problem
and the emptiness problem for these solution concepts in convex games.
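For small games, the least-core can also be computed directly as a linear program with one constraint per proper coalition. The following Python sketch (using scipy; exponential in the number of players, so illustrative only, and imposing only efficiency plus the relaxed coalition constraints) assumes a characteristic function v over frozensets; the polynomial-time route for convex games is the one indicated above:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def least_core(players, v):
    """min eps  s.t.  x(N) = v(N)  and  x(S) + eps >= v(S)
    for every nonempty proper coalition S; variables (x_1,...,x_n, eps)."""
    n = len(players)
    c = np.zeros(n + 1); c[-1] = 1.0            # minimize eps
    A_ub, b_ub = [], []
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            row = np.zeros(n + 1)
            row[list(S)] = -1.0                 # -x(S) - eps <= -v(S)
            row[-1] = -1.0
            A_ub.append(row)
            b_ub.append(-v(frozenset(players[i] for i in S)))
    eff = np.ones(n + 1); eff[-1] = 0.0         # x(N) = v(N)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=eff.reshape(1, -1), b_eq=[v(frozenset(players))],
                  bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]                 # payoff vector and eps
```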

2.3 Bargaining Set and von Neumann-Morgenstern Solution


The bargaining set, introduced by Aumann and Maschler [1], is an effort
to capture the bargaining of players negotiating a deal. An imputation x is in
the bargaining set if, for all players i and j, whenever i has an objection against j,
then j has a counter-objection against the objection of i. Formally, if there
is an imputation y and a coalition S containing i but not j such that

1. y(S) ≤ v(S),

2. x_k < y_k for all k ∈ S,

then there is another imputation z and a set T containing j but not i such
that

1. z(T) ≤ v(T);

2. z_k ≥ y_k for k ∈ T ∩ S;

3. z_k ≥ x_k for k ∈ T − S.

This can be seen as an extension of the concept of the core. In fact, points
in the core are all in the bargaining set, since no player can have an objection
against any other player. The (historically) first solution concept, defined in
the classical work in game theory of von Neumann and Morgenstern, the
von Neumann-Morgenstern solution set, also contains the core (when it exists).
This concept is based on the notion of domination. Given two imputations
x and y, x is said to dominate y if there is a coalition S such that (a) x(S) ≤
v(S), and (b) x_i > y_i for all i ∈ S. A set of imputations is a solution, as
suggested by von Neumann and Morgenstern, if it has the following properties:

1. no two imputations in this set dominate each other;

2. any imputation not in this set is dominated by some imputation in this set.

Again, because a point in the core is not dominated by any other imputation,
the core is contained in any von Neumann-Morgenstern solution set.
A very interesting result for convex games is that the above trivial
containments between the core and the bargaining set, and between the core and
the von Neumann-Morgenstern solution, also hold in the opposite direction.

Theorem 2.1 The core, the bargaining set, and the von Neumann-Morgenstern
solution set are the same for convex games [51, 40].

To provide some insight into the above results, we present a proof for the
convex case of SEW(G,w). As discussed in the last section, SEW(G,w) is
convex if and only if all the edge weights are nonnegative. Suppose there is
an imputation x in a von Neumann-Morgenstern solution set which is not
in the core. Since x is not in the core, there exists a subset S of players
such that x(S) < v(S); we choose S to be a minimal subset with this property.
Obviously S is not the same as N since x(N) = v(N). It is also trivial to
see that S is not empty. Denote ε = (v(S) − x(S))/|S|. We define an
imputation x₁ on S: ∀i ∈ S : x₁(i) = x(i) + ε. By the minimality of S, x₁ is in
the core of the game restricted to S. Let x₂ be an imputation in the core
of the game restricted to N − S (x₂ exists since the game is convex).
Now, we define an imputation y such that ∀i ∈ N − S : y(i) = x₂(i) and
∀i ∈ S : y(i) = x₁(i) + Σ_{j∈N−S} w(i,j). It is easy to see that y is in the core of
SEW(G,w). Thus, x is dominated by an imputation in the core. Hence,
x cannot be in any von Neumann-Morgenstern solution set, since the latter
contains the core and no imputation in a solution set can be dominated
by another one in the same solution set. This construction also provides
a proof that the bargaining set is the same as the core. Suppose x is
an imputation in the bargaining set but not in the core. Then, with the
imputation y constructed above, any player i ∈ S can make an objection
against any player j ∈ N − S. However, since y is in the core, j cannot make
a counter-objection. This contradicts the assumption that x is in the
bargaining set. Therefore, the bargaining set must be the same as the
core.
A surprising result about the von Neumann-Morgenstern solution set is
a construction by Lucas showing a game with a core but no von Neumann-
Morgenstern solution.

Theorem 2.2 There is a game which has a core but no von Neumann-
Morgenstern solution [36].

Concerning complexity, the membership test problem for the bargaining set
of SEW(G,w) is shown to be NP-hard [11]. On the other hand, for games
for which v(S) can be calculated in polynomial time for all S ⊆ N, it is not
hard to see that the membership test problem is in Π₂ᵖ. It would not be
surprising if there were some game with polynomial-time evaluation of v(S)
for all S ⊆ N for which the membership test problem for the bargaining
set is Π₂ᵖ-hard, but this remains open. The complexity status of the von
Neumann-Morgenstern solution set is quite open; it is not even known to
be decidable [11].

2.4 Concepts with Unique Solution


One source of dissatisfaction among mathematicians with the above and many
other solution concepts is that there is no definite outcome, though this
may allow for flexibility in applying these concepts to other areas such
as economics and political science. Among the various attempts to obtain a
unique solution, the Shapley value and the nucleolus are the most well known.
The Shapley value, introduced to yield a unique solution satisfying several
axioms [50], is a weighted average of the marginal contributions of a player to
coalitions:

φ_i = Σ_{S ⊆ N : i ∈ S} [(n − |S|)! (|S| − 1)! / n!] · [v(S) − v(S − {i})].
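Read literally, the formula gives an exponential-time procedure: sum over every coalition containing i. A direct Python transcription (a sketch; v is assumed to take frozensets):

```python
import itertools
from math import factorial

def shapley(players, v):
    """Shapley value by direct enumeration of the formula above.
    Exponential in the number of players; illustrative only."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):                      # coalitions S containing i
            for T in itertools.combinations(others, r):
                S = frozenset(T) | {i}
                w = factorial(n - len(S)) * factorial(len(S) - 1) / factorial(n)
                total += w * (v(S) - v(S - {i}))
        phi[i] = total
    return phi
```

For SEW(G,w), linearity of the Shapley value in v splits each edge's weight evenly between its two endpoints (all other players are null players for that edge), so φ_i is half the total edge weight incident to node i; this is one way to see the polynomial-time claim made below.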

The nucleolus [48], on the other hand, trying to capture the intuition of
minimizing the dissatisfaction of players, is defined to be the imputation which
lexicographically minimizes the excess vector (e(S₁), e(S₂), ..., e(S_m)), where
e(S) = v(S) − x(S) and the vector is arranged in non-increasing order, i.e.,
e(S₁) ≥ e(S₂) ≥ ... ≥ e(S_m), with m = 2ⁿ. Even though the definition does
not by itself rule out multiple points, the nucleolus is in fact a unique solution.
The computation of the nucleolus requires solving up to 2ⁿ linear programs
with a number of constraints exponential in n [38, 27]. Thus, it has been a
challenge to obtain polynomial time algorithms for the nucleolus of cooperative
games. A very interesting algorithmic result for cooperative game solution
concepts is a recent work by J. Kuipers:

Theorem 2.3 The nucleolus of convex games (with v(S) given by an oracle)
can be computed in polynomial time [33].

For SEW(G,w), interestingly, the Shapley value is the same as the
nucleolus [11]. However, computing the Shapley value of convex games is
regarded as more difficult than the nucleolus, since it involves a (weighted)
summation of game values over all subsets of players. Though there are cases
where the Shapley value can be calculated in polynomial time (e.g., SEW(G,w)),
in general it is very hard: it is #P-complete to evaluate the Shapley
value of a version of simple games, the majority game (the characteristic
function of the game can be evaluated in polynomial time, but it is not a
convex game) [11].
A classical but less known solution concept is the Banzhaf index of power,
developed for voting games [2], which is very closely related to the
Shapley value [13, 47]. Tijs has suggested a τ-value [60]. The kernel,
contained in the bargaining set, can also be seen as an effort to remove
possibly excessive solutions [8]. There are still many other solution concepts,
some more useful and practical in particular application areas, which
we have not discussed. Since our focus is on the interplay of combinatorial
optimization and cooperative games in coalition form, we choose not to
attempt an exhaustive discussion of solution concepts, so as to keep the
material balanced for readers from both combinatorial optimization and
game theory backgrounds.

3 Problems and Models


An important application of cooperative games is that they provide a
mathematical formulation for collective decision-making and optimization
problems. In such cases, very often, the characteristic function value of a subset
of players can be presented succinctly as the optimal value of an optimization
problem for this subset of players. Therefore, the input size of the
problem may be polynomial in the number of players even though the
characteristic function value v(S) of each subset of players may not be obtainable
in polynomial time (the complexity depends on that of the underlying
optimization problem).

3.1 Combinatorial Optimization Games


In the classic work of von Neumann and Morgenstern, a simple game is
considered, for which the value of a subset of players is either zero or one [61].
In the general case, the input size could be exponential. However, for such
games associated with practical application areas, the input size can often be
reduced. For example, in the majority voting game, the value is one if and
only if the number of players in a coalition exceeds a threshold value.
Both the input size and the time to calculate the characteristic function
value of any coalition are polynomial in the number of players.
In an effort to associate cooperative game theory with realistic economic
problems, Shapley and Shubik studied a market game in which players
start with an initial amount of commodities and wish to trade among
themselves in order to maximize their (concave) utility functions. They
formulate it as a cooperative game by defining the characteristic function of a
subset of players as the maximum of the sum of their utility function values over
all possible distribution schemes among themselves [52]. Therefore, in
this case, evaluating the characteristic function is a maximization problem
for a concave function under linear inequality constraints. A major result of
their work is a proof of the nonexistence of the von Neumann-Morgenstern
solution for a market game, derived from the counterexample of Lucas. This
establishes the possibility of nonexistence of the von Neumann-Morgenstern
solution in realistic situations.
Another game associated with markets is the assignment game introduced
by Shapley and Shubik. It models a two-sided market with buyers and
sellers. It is shown that the core is exactly the set of optimal solutions to a
linear program dual to the optimal assignment problem [53]. A major factor
in this result is the fact that the optimal solution to the corresponding linear
program can be achieved at integer points. This approach has been exploited
(and, it seems, rediscovered independently by authors unaware of each other's
work) in several other situations [56, 20, 10]. Granot [23], and Tamir and
Mitchell [59], have studied the core of the roommate problem as a cooperative
game. Faigle et al. have studied a matching game for general graphs [19].
Deng, Ibaraki and Nagamochi have studied matching games as an application
of a general integer programming formulation for combinatorial optimization
games [10].
Kalai and Zemel studied games of network flows [30, 31]. In this game,
the players are associated with the arcs of a network. The value of a subgroup
is the maximum flow from s to t (a fixed source-sink pair) in the
subgraph consisting of the original node set and those arcs corresponding to
the subgroup of players. For a simple network game in which the arc capacities
are all one, they also show that the core is exactly the set of optimal
solutions to a linear program dual to a linear programming formulation of the
network flow problem.
Bird [3], and independently Claus and Granot [4], formulated a minimum
cost spanning tree game for the cost allocation of communication networks
to their users; the problem was studied initially by Claus and Kleitman, who
introduced several cost allocation criteria [5]. In this game, each player
corresponds to a node of the graph, and there is one additional external node 0.
The cost for a subset of players is the weight of the minimum spanning tree of
the subgraph induced by their corresponding nodes together with node 0. The
characteristic function of this game, the weight of a minimum spanning tree in
a graph, can be calculated in polynomial time. An imputation in the core can
be derived easily from this minimum spanning tree and thus can be found
in polynomial time [5, 3, 25]. Faigle et al. have the interesting and surprising
result that the membership test problem for the core of the minimum
spanning tree game is co-NP-complete [18].
In another direction, Megiddo has formulated this problem differently by
defining the cost of a subset of players to be the cost of a minimum Steiner
tree containing all the corresponding nodes in the original graph [39]. This
results in a computationally harder problem, since, given a subset S, it is
NP-hard to evaluate the cost of a minimum Steiner tree. There are cases where
the core is empty. It would not be surprising if one could show that the
emptiness test problem for the core of this game is NP-hard; however, this is
not known yet. Kuipers has an interesting generalization of these problems to
a minimum cost forest game [32].
In a similar way, for a traveling salesman cost allocation game, the nodes
of the graph are the players, together with an extra node 0. The value of a
subgroup S of players is the minimum Hamiltonian tour in the subgraph
induced by S ∪ {0} [12, 46, 57]. Again, evaluating the characteristic function
is NP-hard, and there are cases where the core is empty [57]. It would be
nice to know the complexity of finding a member of the core, which we believe
to be NP-hard. Faigle et al. have studied a Euclidean space version [17]
with a solution concept modified from the core. Hamers, Granot and Tijs have
studied a cooperative game associated with the Chinese postman problem
[28].
Deng and Papadimitriou have discussed the game SEW(G,w), for which
the game value of any subset of players is the total weight of the edges in
the subgraph induced by them [9, 11]. With this game, the complexity, proposed
there as another measure of rationality and fairness, of different solution
concepts is studied: whether the solution concept is empty; if not, how to
find an imputation in the solution concept; and, given an imputation, whether
it satisfies the solution concept. In a similar way, Nagamochi et al. have
studied a minimum base game on matroids (an extension of the minimum
spanning tree game) [42]. In this line of approach, one may be tempted to
classify solution concepts by their complexity. However, very often,
they may display different orders in the complexity hierarchy from game to
game: some concepts may be easier to compute than others in one game
but more difficult in another. Still, we may ask this question: among all
the games for which v(S) can be computed in polynomial time, what is the
worst-case complexity of a solution concept? With all the algorithms we know
of, the concepts core, bargaining set, and von Neumann-Morgenstern solution
should be in order of increasing complexity [9, 11]. However, it would be
nice to have a definite proof in terms of lower bounds. The same questions
can be asked for all the games for which the input size is polynomial in the
number of players, which is the case for many combinatorial optimization
games as noted above.

3.2 Totally Balanced Game Models


Balancedness is a concept introduced to characterize games with nonempty
cores through their characteristic functions [49]. For all our purposes, it is
enough to know that a game is balanced if and only if it has a nonempty core.
Similarly, totally balanced games are exactly those for which all the subgames
have nonempty cores. Therefore, all convex games are totally balanced.
For the game SEW(G,w), if there is an edge (i,j) with negative weight,
the subgame on {i,j} has an empty core. Therefore, an SEW(G,w) game
is convex if and only if it is totally balanced. However, this is not true in
general. The game constructed by Shapley and Shubik is totally balanced
but has no von Neumann-Morgenstern solution [52]; for all convex games,
the von Neumann-Morgenstern solution is the same as the core [51]. This
result implicitly exhibits an example of a nonconvex totally balanced game.
The work [18] of Faigle et al. shows a totally balanced game for which the
membership test problem for the core is co-NP-hard (in contrast, it is easy
to find an imputation in the core). There is another example, a cooperative
game associated with the edge connectivity of a graph, with players as the
edges of the graph, for which the membership test problem for the core is
co-NP-complete but an imputation in the core can be found in polynomial
time, by Deng, Ibaraki and Nagamochi [10].
The market game (in a pure exchange market) studied by Shapley and
Shubik consists of players who each own an initial amount of commodities.
Each of them has a (concave) utility function to maximize, and they may
trade among themselves in a market. At the end, under an equilibrium price,
the bundle of commodities a player owns maximizes its utility function over
all the possible bundles of commodities with value no more than that of his
or her initial endowment. Moreover, for an allocation to be in the core, the
sum of the utility functions of the players in each subgroup must be maximal
over all the feasible sets of commodities they may own. Shapley and Shubik
show:

Theorem 3.1 A game is a market game if and only if it is totally balanced [52].

In another direction for formulating cooperative games, Owen introduced
the linear production game [44]. A resource vector b^j is owned by
player j, j ∈ N. The characteristic function of a subset S of players is defined
to be the optimal value of a linear program: max{cx : Ax ≤ Σ_{j∈S} b^j}. It is
shown that the core of a linear production game is always nonempty [44]. It
immediately follows that a linear production game is totally balanced. In
fact, the converse is also true.

Theorem 3.2 A game is totally balanced if and only if it is a linear production
game [44, 7].
This is not yet the end of the equivalence class. Consider the class of
combinatorial optimization games associated with the maximum flow from a
source to a sink in a network, defined by Kalai and Zemel, in which each player
controls one arc of the network. The value of a subset of players is the maximum
flow from the source to the sink in the graph induced by the corresponding
subset of players. This class is also general enough to coincide with the same
equivalence class.

Theorem 3.3 A game is totally balanced if and only if it is a network flow
game [30, 31].
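To make the flow game concrete, here is a small Python sketch of its characteristic function using the networkx library; the arc-ownership encoding is our own hypothetical interface, not taken from [30, 31]:

```python
import networkx as nx

def flow_game_value(S, arcs, s, t):
    """v(S) for the Kalai-Zemel flow game: the maximum s-t flow when only
    the arcs owned by coalition S are available.
    `arcs` maps each player to a triple (tail, head, capacity)."""
    G = nx.DiGraph()
    G.add_nodes_from([s, t])              # keep s, t present even if isolated
    for player in S:
        u, v, cap = arcs[player]
        G.add_edge(u, v, capacity=cap)
    return nx.maximum_flow_value(G, s, t)
```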

However, one weakness of the above equivalences is that the reductions
and formulations may require a number of operations exponential in the
number of players. Consequently, the computational complexity issues related
to these models may still remain independent of each other.

3.3 Other Models


It is clear that not all combinatorial optimization games are totally balanced.
Several general models have been introduced to formulate combinatorial
optimization games. Dubey and Shapley have studied games related to
certain nonlinear programs which result in totally balanced games [14]. The
work of Dubey and Shapley requires a resource vector b(S) for each subset S
of players (thus leading to an exponential number of linear inequality
constraints).
Granot has generalized the linear production model by allowing the resources
in the possession of each coalition not to be the sum of those of its players.
This can be applied to several combinatorial optimization games which cannot
be handled by Owen's model, including the minimum cost spanning
tree game, its directed version, a network synthesis game, and a weighted
matching game [24]. This model may introduce an exponential number of
linear inequalities. However, it has been known in polyhedral combinatorics
that allowing the corresponding linear program to have an exponential number
of constraints may allow it to have integer solutions [16]. This property
is studied and exploited in the proofs of the existence of a core for these games.
Tamir has studied several combinatorial optimization games which, or whose
special cases, can be reduced to Owen's linear production game model.
For these cases, an integer solution always exists for the linear program, and
the nonemptiness of the core of the corresponding combinatorial optimization
game follows from Owen's result [57, 58].
Alternatively, it is natural to include integrality conditions in the definition
of linear production models. Thus, one may define the characteristic
function v(S) of a subset S of players to be the optimal value of an integer
program: max{cx : Ax ≤ Σ_{j∈S} b^j, x integer}. The study of Shapley and
Shubik on the assignment game is a typical example. In this example, the
integer program reduces to a linear program, since there is always an integer
solution to the linear programming relaxation. Moreover, the particular
structure of the linear constraints makes it possible to characterize the core by
the optimal solutions of its dual linear program [53]. Faigle and Kern have
studied this approach for a partition game [20]. Deng, Ibaraki and Nagamochi
have studied it for packing/covering games [10].

4 Algorithms and Complexity of the Core


Studies of the solution concepts of combinatorial optimization games, as well
as of their computational complexity, have been as diverse as the corresponding
combinatorial optimization problems. The diversity and complexity demand
even more research effort because the various solution concepts add extra
dimensions of difficulty through their required conditions. In this treatise,
we focus on the case of the core to explore some general methods that have
been applied to resolve the issues of complexity. In particular, as in [11, 42, 10],
we are interested in the computational complexity of: 1) deciding whether the
core is empty (or, to put it in classical terminology, whether there is no
core); 2) if the core is not empty, constructing an imputation in the core; 3)
given an imputation x, deciding whether x is in the core.

4.1 The Game of Assignment, Partition and Packing/Covering


In the assignment game [53], a bipartite graph is used to represent M
customers and N merchants in a market. An edge (i,j) with weight a_ij
represents the joint profit if customer i buys from merchant j. Every customer
buys from at most one merchant and every merchant sells to at most one
customer. Define x_ij to be one if customer i buys from merchant j, and zero
otherwise. Therefore, the game value v(M ∪ N) is the optimal value of the
following integer program:

max  Σ_{i∈M, j∈N} a_ij x_ij
s.t. Σ_{i∈M} x_ij ≤ 1,  j ∈ N,
     Σ_{j∈N} x_ij ≤ 1,  i ∈ M,
     x_ij ∈ {0, 1}.

It turns out that the linear relaxation of this integer program always has
an integer optimal solution, since the coefficient matrix is totally unimodular [26].
Thus, we can remove the integrality conditions when evaluating the game
value:

max  Σ_{i∈M, j∈N} a_ij x_ij
s.t. Σ_{i∈M} x_ij ≤ 1,  j ∈ N,
     Σ_{j∈N} x_ij ≤ 1,  i ∈ M,
     x ≥ 0.

The dual linear program is:

min  Σ_{i∈M} y_i + Σ_{j∈N} z_j
s.t. y_i + z_j ≥ a_ij,  i ∈ M, j ∈ N,
     y ≥ 0, z ≥ 0.

Let x* be an optimal solution to the linear program for the game value,
and let (y*, z*) be an optimal solution to the dual program. By linear
programming duality,

Σ_{i∈M, j∈N} a_ij x*_ij = Σ_{i∈M} y*_i + Σ_{j∈N} z*_j,

which is the game value v(M ∪ N).


On the other hand, the characteristic function value of a subset S of players
is the optimal value of the linear program:

max  Σ_{i∈S∩M, j∈S∩N} a_ij x_ij
s.t. Σ_{i∈S∩M} x_ij ≤ 1,  j ∈ S ∩ N,
     Σ_{j∈S∩N} x_ij ≤ 1,  i ∈ S ∩ M,
     x ≥ 0.

By linear programming duality, this value v(S) is the same as the optimal
value of the dual linear program:

min  Σ_{i∈S∩M} y_i + Σ_{j∈S∩N} z_j
s.t. y_i + z_j ≥ a_ij,  i ∈ M ∩ S, j ∈ N ∩ S,
     y ≥ 0, z ≥ 0.

It is trivial to see that {(y*_i, z*_j) : i ∈ S ∩ M, j ∈ S ∩ N} is a feasible solution
to the above dual program. Therefore, Σ_{i∈M∩S} y*_i + Σ_{j∈N∩S} z*_j ≥ v(S). That
is, (y*, z*) is in the core of the assignment game. It is not hard to see that
the converse is also true.

Theorem 4.1 The core of the assignment game is the same as the set of
optimal solutions of a linear program dual to the linear program of the cor-
responding assignment problem [53].
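Theorem 4.1 is directly operational: solving the dual LP produces a core allocation. A small Python sketch with scipy follows; the function name and interface are ours, not from [53]:

```python
import numpy as np
from scipy.optimize import linprog

def assignment_core_point(a):
    """Given the profit matrix a (customers x merchants), solve the dual of
    the assignment LP:  min sum(y) + sum(z)  s.t.  y_i + z_j >= a_ij,
    y, z >= 0.  The optimal (y*, z*) is a core allocation by Theorem 4.1."""
    m, n = a.shape
    c = np.ones(m + n)
    A_ub = np.zeros((m * n, m + n))       # -(y_i + z_j) <= -a_ij
    b_ub = np.zeros(m * n)
    for i in range(m):
        for j in range(n):
            A_ub[i * n + j, i] = -1.0
            A_ub[i * n + j, m + j] = -1.0
            b_ub[i * n + j] = -a[i, j]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (m + n))
    return res.x[:m], res.x[m:]           # payoffs to customers, to merchants

y, z = assignment_core_point(np.array([[5.0, 8.0], [7.0, 2.0]]))
```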

This approach has since been adopted for various other problems by several
authors [56, 20, 10] (though it seems that these authors were unaware of
Shapley and Shubik's work and found the same method independently for
the problems they studied).
Tamir has considered a cost allocation game for a location problem for
which he was able to reduce the exponential number of constraints for the
core (x(S) ≤ c(S)) to a linear number. In the special case where the
underlying graph is a tree, a result similar to that in [53] is obtained: the core
is exactly the set of optimal solutions of the linear program dual to this
minimization problem [56]. This is because the particular location problem
on a tree always has an integer solution, a situation in essence the same as
that in [53].
Faigle and Kern have defined a class of partition games with the cost
function defined partially, on a subcollection of the subsets of players. They
extend the cost function to every other subset by defining it to be the minimum
over all partitions. Here they are able to show that the existence of the core
is equivalent to the linear programming relaxation of the corresponding
integer programming formulation of the partition games having an integer
optimal solution [20], with techniques very similar to those applied
in [53, 56]. Deng, Ibaraki, and Nagamochi have the same result for packing
games and covering games. A packing game Game(c, A, max) is associated
with an integer program. The rows of A are indexed by M, and the columns
of A are indexed by N, where N is the set of players. ∀S ⊆ N, v(S) is the value
of the following integer program:

max xtc
s.t. llSI'
xt AM,S:::; xt AM,S ~ O~_ISI'
XE{O,l}m,

where AT,S is the submatrix of A with row set T and column set S, and
v(0) is defined to be O.
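A brute-force transcription of this integer program may help fix the roles of M, N, and S; it enumerates all x ∈ {0,1}^m, so it is purely illustrative (a Python sketch with our own interface, where S is a set of column indices):

```python
import itertools
import numpy as np

def packing_value(S, A, c):
    """v(S) for the packing game Game(c, A, max): rows of A are items that
    can be picked, columns are players; a row may be picked only if it uses
    no column outside S, and each column in S is used at most once."""
    m, n = A.shape
    best = 0.0
    for bits in itertools.product((0, 1), repeat=m):
        x = np.array(bits)
        use = x @ A                              # x^T A, one entry per column
        if all(use[j] <= (1 if j in S else 0) for j in range(n)):
            best = max(best, float(x @ c))
    return best
```

For instance, taking A to be the edge-node incidence matrix of a graph and c the all-ones vector gives the matching game discussed in Section 4.2 below: v(S) is the size of a maximum matching in the subgraph induced by S.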
A class of covering games is defined in a similar way through a minimization
integer program.

Theorem 4.2 The core of a partition, packing, or covering game is nonempty
if and only if the linear relaxation of the corresponding optimization
problem has an integer optimal solution [20, 10]. In case the core is not
empty, it is the same as the set of optimal solutions of a linear program
dual to the linear relaxation of the corresponding optimization problem
[53, 56, 20, 10].

The proofs are all very similar to the proof for the assignment game.
With this theorem, Deng, Ibaraki and Nagamochi have studied several
fundamental graph-theoretic problems to characterize the subclasses of graphs
for which the linear relaxation of the corresponding integer programming
formulation has an integer solution, a question of independent interest. For
details, the reader is referred to [10].

4.2 Characterization of Total Balancedness


As discussed above, totally balanced games are a very interesting class of
games: the class of games for which all the subgames have nonempty cores.
Therefore, convex games are totally balanced. However, this class may still
include games with unexpected properties. For example, there is a totally
balanced game for which the von Neumann-Morgenstern solution does not
exist [52]. There are examples for which the game is totally balanced but
the membership test problem for the core is co-NP-complete (the minimum
spanning tree game in [18] and the connectivity game in [10]).
There are, of course, cases for which totally balanced games can be well
characterized. For example, for the partition game of Faigle and Kern, the
structure behind the totally balanced property is very simple and easy to
determine.

Theorem 4.3 The partition game is totally balanced if and only if the cost
function is defined for every single player [62].

The same problem remains open for packing and covering games in general,
though some progress has been made in [10]. For many of the examples
discussed in [10], total balancedness can be clearly characterized.
For the matching game, the players are the nodes of the graph, and the
value of a subset of players is the maximum matching value of the subgraph
induced by the nodes corresponding to these players. It is known from the
assignment game that the game is totally balanced if the graph is bipartite.
It is not hard to see that the converse is also true (in contrast, there are
non-bipartite graphs for which the core is nonempty; in fact, a full
characterization of the graphs for which the core is nonempty is given in [10]).
The same is true of the vertex covering game, where the players are the edges
of the graph and the game value is the minimum vertex set covering all the
edges.

Theorem 4.4 The matching game and the vertex cover game are totally
balanced if and only if the graph is bipartite.

The proof is by contradiction: the subgame on an odd induced cycle has an
empty core in both of these games, and there is always an odd induced cycle
if the graph is not bipartite. (For instance, in the matching game on a
triangle, v(N) = 1 while each of the three pairs has value 1, so any core
allocation x would have to satisfy x(S) ≥ 1 for all three pairs S, forcing
x(N) ≥ 3/2 > 1.)
In [10], the total balancedness of a chromatic number game is associated
with perfect graphs. Recently, Hamers, Granot and Tijs have discussed a
delivery game associated with the Chinese postman problem. It is shown
that this game is convex if and only if it is totally balanced; furthermore,
this class is identified with the weakly cyclic graphs [28]. This characterization
remains open for the two classes of games mentioned above: the Steiner tree
game and the traveling salesman game.
Very often these nice characterizations lead to polynomial time algorithms.
It would be interesting to have an example for which no polynomial time
algorithm is known.

5 Beyond Complexity for Solution Concepts


In the study of cooperative games, it has been suggested that one should have
polynomial time algorithms (good algorithms, following Edmonds [15]) to find
solutions for solution concepts [38]. It has further been suggested that
computational complexity be taken as one extra factor in considering the
rationality and fairness of a solution concept, and in comparing different
solution concepts [9, 11], in a way derived from the concept of bounded
rationality [55]. Many authors have found polynomial time algorithms for
some solution concepts in some cases and have established NP-hardness
proofs in others.
Still, it is hard to draw a commonly agreed-upon conclusion from these
algorithmic and computational complexity results for cooperative games. This
seems to imply that, outside the political and economic context from which
these game-theoretic issues arise, it is difficult to evaluate exactly what
role computational complexity should play. The prisoner's dilemma is a
successful example of interpreting bounded rationality as computational
complexity [43, 45]: limitations in the computational power of players may
result in cooperative behavior in a game for which the Nash equilibrium
would otherwise be non-cooperative behavior.
To make our study of the complexity and algorithms of cooperative games
meaningful to the corresponding application areas, it is vital to make
computational complexity an integrated part of the theoretical treatment of
solution concepts. Even when solution concepts are difficult to compute
for some game, it may not be easy to simply dismiss the problem as hopeless,
especially when the game arises from important applications. What can
we say about their solutions from the point of view of computational
complexity in such cases?
For the emptiness problem, we favor a polynomial time algorithm because
it would allow the participants of the game to learn in a reasonable amount of
time that there is no solution. In case the solution concept set is not empty,
a polynomial time algorithm would again allow the participants of the game
to construct a solution in a reasonable amount of time. For the membership
test problem, a polynomial time algorithm would allow participants to decide
whether they are taken advantage of by a proposed solution.
Even though these three problems may have the same computational
complexity for some games, there are cases where their computational
complexities differ. For the minimum spanning tree game, it has been shown
that the membership test problem is co-NP-complete [18], while the emptiness
problem is trivial and finding an imputation in the core is solvable in
polynomial time [25]. A similar situation occurs for the edge connectivity
game [10]. One may wonder, when all the participants know how to find an
imputation in the core, what their reaction would be to a proposal of an
imputation for which verification of its membership in the core is out of reach.
Naturally, to have computationally rational decisions in this situation, we
may propose to restrict the core to those imputations in the core for which
membership is verifiable in polynomial time.

5.1 PC-Core

In general, we consider a class of cooperative games G = {(N,v) : |N| =
1, 2, ...}. In this class, we consider a subset C_N of the core for each game
(N,v), given through a polynomial time Turing machine T_S such that,

1. given N, T_S generates an imputation in C_N;

2. given N and an imputation x ∈ R^N, it returns TRUE if x ∈ C_N and
FALSE if x is not in C_N.

Then, {C_N : N ∈ G} is defined to be a class of polynomially checkable
cores (PC-core) for the class G of games if ∀N : C_N is a subset of the core
of (N,v) and there is a polynomial time Turing machine T_C which outputs
TRUE for every x ∈ C_N, and FALSE otherwise.
An immediate result is that for convex games the PC-core remains the same
as the core. The class of imputations whose membership in the core of
minimum spanning tree games is co-NP-complete, discussed by Faigle et al.,
is now excluded from the PC-core. The same holds for the class of imputations
for the edge connectivity games discussed in [10].
This shows only one possibility for introducing computational complexity
as an integrated part of the rationality conditions for solution concepts.
Here one may change the computational complexity requirements on the
Turing machines T_S and T_C as fits the situation. We may apply this
methodology to other solution concepts.

5.2 NE-Core
There is another and even more innovative way to introduce computational
complexity when arguing for the rationality of a solution concept. For the
game SEW(G,w), Deng and Papadimitriou considered the particular solution
that assigns each node a value equal to half the sum of the weights of all
the edges incident to the node, and showed that this imputation is in the
core if and only if the core is nonempty [11]. Moreover, the emptiness question
for the core is shown to be NP-complete. Therefore, if the core is nonempty,
this particular imputation is a fair solution (in the same way as the
justification for the core) since it is in the core. If the core is empty, this
imputation is not a fair solution, since it would violate subgroup rationality.
Testing the emptiness of the core is equivalent to verifying that this particular
imputation is not in the core; both are NP-hard. Therefore, we may define it
to be a member of the extended core.
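Constructing this candidate imputation is trivial, which is exactly the point: the hardness lies in checking it, not computing it. A minimal Python sketch, using the same hypothetical edge-weight dictionary as in the earlier SEW examples:

```python
def half_incident_imputation(nodes, weights):
    """The candidate solution from [11] for SEW(G, w): each node receives
    half the total weight of its incident edges.  It always sums to v(N),
    and it lies in the core exactly when the core is nonempty."""
    x = {i: 0.0 for i in nodes}
    for e, w in weights.items():
        for i in e:                 # e is a frozenset {i, j}
            x[i] += w / 2.0
    return x
```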
Let {C_N : N ∈ G} be the sets of imputations generated by a Turing machine
T_S. It is defined to be a class of NP-completely extended cores (NE-core)
for the class G of games if ∀N : C_N is a subset of the core if and only if
the core of (N,v) is nonempty, and the emptiness problem for the cores of G
is NP-complete.
In this case, the question whether any vector in C_N is in the core is
equivalent to the question whether the core is nonempty, and both are
co-NP-complete. Therefore, for any imputation in C_N, finding a dissatisfied
subgroup of players is equivalent to establishing that there is no way to satisfy
all the subgroups. In a way this is similar to the argument supporting the
concept of the bargaining set: there, an imputation is thought rational if,
whenever anyone has an objection against anyone else, the latter has a
counter-objection against the former. Here an imputation x in C_N is thought
to be rational because, if any subgroup is dissatisfied with x, then for any
imputation there is a dissatisfied subgroup, and the computational difficulty
of finding either is the same.
In this sense, the particular imputation for SEW(G,w) discussed above
will be in its NE-core.

6 Remarks and Discussion


In this treatise, we have discussed some of the progress on combinatorial
optimization games, in particular the work involving computational complexity
issues. Several open problems are raised where the corresponding problems
are discussed. An especially interesting class of problems is the characterization
of the total balancedness of combinatorial optimization games, and the
associated complexity issues. This line of work seems to be related to the
theory of graph minors, since forbidden graphs very often come in as an
important factor in the characterizations for the cases already known. We
have also seen examples where computational complexity can be integrated
into solution concepts. We expect this to play an important role as we try
to understand more about how computational complexity, as a measure of
bounded rationality, can inform decision making.

References
[1] R.J. Aumann and M. Maschler, The Bargaining Set of Cooperative
Games, in M. Dresher, L.S. Shapley and A.W. Tucker (eds.) Advances
in Game Theory, (Princeton University Press, Princeton, 1964) pp. 443-
447.

[2] J. F. Banzhaf, Weighted Voting Doesn't Work: A Mathematical Analysis,
Rutgers Law Review Vol. 19 (1965) pp. 317-343.

[3] C.G. Bird, Cost-allocation for a spanning tree, Networks 6 (1976) pp.
335-350.

[4] A. Claus, and D. Granot, Game Theory Application to Cost Allocation


for a Spanning Tree, Working Paper No 402, (Faculty of Commerce and
Business Administration, University of British Columbia, June 1976).

[5] A. Claus, and D.J. Kleitman, Cost Allocation for a Spanning Tree, Net-
works Vol. 3 (1973) pp. 289-304.

[6] W. H. Cunningham, On Submodular Function Minimization, Combina-


torica Vol. 5 (1985) pp. 185-192.

[7] I. J. Curiel, Cooperative Game Theory and Applications, Ph.D. dissertation,
(University of Nijmegen, the Netherlands, 1988).

[8] M. Davis, and M. Maschler, The Kernel of a Cooperative Game, Naval


Research Logistics Quarterly Vol. 12 (1965) pp.223-295.

[9] X. Deng, Mathematical Programming: Complexity and Algorithms,


PhD Thesis, Department of Operations Research, Stanford University,
California (1989).

[10] X. Deng, T. Ibaraki and H. Nagamochi, Combinatorial Optimization


Games, Proceedings 8th Annual ACM-SIAM Symposium on Discrete Al-
gorithms, (New Orleans, LA, 1997) pp. 720-729.

[11] X. Deng and C. Papadimitriou, On the Complexity of Cooperative


Game Solution Concepts, Mathematics of Operations Research Vol. 19,
No.2 (1994) pp. 257-266.

[12] M. Dror, Cost Allocation: The Traveling Salesman, Bin Packing, and
the Knapsack, Technical Report, INRS Telecommunications, Quebec,
Canada (1987).

[13] P. Dubey and L.S. Shapley, Mathematical Properties of the Banzhaf
Power Index, Mathematics of Operations Research Vol. 4 (1979) pp. 99-131.

[14] P. Dubey, and L.S. Shapley, Totally Balanced Games Arising from
Controlled Programming Problems, Mathematical Programming Vol. 29
(1984) pp.245-267.

[15] J. Edmonds, Paths, Trees, and Flowers, Canadian Journal of Mathematics
Vol. 17 (1965) pp. 449-467.

[16] J. Edmonds, Optimum Branchings, National Bureau of Standards Jour-


nal of Research Vol. 69B (1967) pp.125-130.

[17] U. Faigle, S. Fekete, W. Hochstättler and W. Kern, On Approximately
Fair Cost Allocation in Euclidean TSP Games, (Technical Report,
Department of Applied Mathematics, University of Twente, The
Netherlands, 1994).
[18] U. Faigle, S. Fekete, W. Hochstättler and W. Kern, On the Complexity
of Testing Membership in the Core of Min-cost Spanning Tree Games,
International Journal of Game Theory Vol. 26 (1997) pp. 361-366.
[19] U. Faigle, S. Fekete, W. Hochstättler and W. Kern, The Nucleon of
Cooperative Games and an Algorithm for Matching Games, (Technical
Report #94.178, Universität zu Köln, Germany, 1994).
[20] U. Faigle and W. Kern, Partition games and the core of hierarchi-
cally convex cost games, (Universiteit Twente, faculteit der toegepaste
wiskunde, Memorandum, No. 1269, June 1995).
[21] D. Gillies, Solutions to General Nonzero Sum Games, Annals of
Mathematical Studies Vol. 40 (1959) pp. 47-85.
[22] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide
to the Theory of NP-completeness, (W.H. Freeman & Company, Publish-
ers, San Francisco, 1979).
[23] D. Granot, A Note on the Roommate Problem and a Related Revenue
Allocation Problem, Management Science Vol. 30 (1984) pp.633-643.
[24] D. Granot, A Generalized Linear Production Model: A Unified Model,
Mathematical Programming Vol. 34 (1986) pp.212-222.
[25] D. Granot, and G. Huberman, On the Core and Nucleolus of Minimum
Cost Spanning Tree Games, Mathematical Programming Vol. 29 (1984)
pp.323-347.
[26] M. Grötschel, L. Lovász and A. Schrijver, Geometric Algorithms and
Combinatorial Optimization, (Springer-Verlag, 1988).

[27] A. Hallefjord, R. Helming, and K. Jørnsten, Computing the Nucleolus
when the Characteristic Function is Given Implicitly: A Constraint
Generation Approach, International Journal of Game Theory Vol. 24
(1995) pp. 357-372.

[28] Herbert Hamers, Daniel Granot, and Stef Tijs, On Some Balanced,
Totally Balanced and Submodular Delivery Games, Program and Abstracts
of the 16th International Symposium on Mathematical Programming
(1997) p. 118.

[29] E. Kalai, Games, Computers, and O.R., Proceedings of the 6th


ACM/SIAM Symposium on Discrete Algorithms, (1995) pp. 468-473.

[30] E. Kalai and E. Zemel, Totally Balanced Games and Games of Flow,
Mathematics of Operations Research Vol. 7 (1982) pp. 476-478.

[31] E. Kalai and E. Zemel, Generalized Network Problems Yielding Totally


Balanced Games, Operations Research Vol. 30 (1982) pp. 998-1008.

[32] Jeroen Kuipers, Minimum Cost Forest Games, International Journal


of Game Theory Vol. 26 (1997) pp.367-377.

[33] Jeroen Kuipers, A Polynomial Time Algorithm for Computing the Nu-
cleolus of Convex Games, Program and Abstracts of the 16th Interna-
tional Symposium on Mathematical Programming (1997) p. 156.

[34] S.C. Littlechild, A Simple Expression for the Nucleolus in a Special


Case, International Journal of Game Theory Vol. 3 (1974) pp.21-29.
[35] S.C. Littlechild and G. Owen, A Simple Expression for the Shapley
Value in a Special Case, Management Science Vol. 20 (1973) pp. 370-
372.
[36] W. F. Lucas, The Proof that a Game May Not Have a Solution,
Transactions of the American Mathematical Society Vol. 137 (1969)
pp. 219-229.

[37] A. Mas-Colell, An Equivalence Theorem for a Bargaining Set, Journal


of Mathematical Economics Vol. 18 (1989) pp.129-139.

[38] N. Megiddo, Computational Complexity and the game theory approach


to cost allocation for a tree, Mathematics of Operations Research Vol. 3
(1978) pp. 189-196.

[39] N. Megiddo, Cost Allocation for Steiner Trees, Networks Vol. 8 (1978)
pp. 1-6.

[40] M. Maschler, B. Peleg, L. S. Shapley, The Kernel and Bargaining Set


for Convex Games, International Journal of Game Theory Vol. 1 (1972)
pp. 73-93.

[41] J. F. Nash, Jr., Equilibrium Points in n-person Games, Proceedings of
the National Academy of Sciences U.S.A. Vol. 36 (1950) pp. 48-49.

[42] H. Nagamochi, D. Zeng, N. Kabutoya and T. Ibaraki, Complexity of


the Minimum Base Games on Matroids, to appear in Mathematics of
Operations Research.

[43] A. Neyman, Bounded Complexity Justifies Cooperation in the Finitely
Repeated Prisoner's Dilemma, Economics Letters Vol. 19 (1985) pp.
227-229.

[44] G. Owen, On the core of Linear Production Games, Mathematical Pro-


gramming Vol. 9 (1975) pp.358-370.

[45] C. H. Papadimitriou and M. Yannakakis, On Complexity as Bounded


Rationality, Proceedings of the 26th ACM Symposium on the Theory of
Computing, (1994) pp. 726-733.

[46] J. Potters, I. Curiel, and S. Tijs, Traveling Salesman Games, Mathe-


matical Programming Vol. 53 (1992) pp.199-211.

[47] F. Sanchez S., Balanced Contribution Axiom in the Solution of Cooper-


ative Games, Games and Economic Behavior Vol. 20 (1997) pp.161-168.

[48] D. Schmeidler, The Nucleolus of a Characteristic Function Game, SIAM


Journal of Applied Mathematics Vol. 17 (1969) pp.1163-1170.

[49] L. S. Shapley, On Balanced Sets and Cores, Naval Research Logistics


Quarterly Vol. 14 (1967) pp. 453-460.

[50] L. S. Shapley, A Value for n-person Games, in H. Kuhn and A.W.


Tucker (eds.) Contributions to the Theory of Games Vol. II (Princeton
University Press, Princeton, 1953) pp.307-317.

[51] L. S. Shapley, Cores of Convex Games, International Journal of Game
Theory Vol. 1 (1972) pp. 11-26.

[52] L. S. Shapley and M. Shubik, On Market Games, Journal of Economic
Theory Vol. 1 (1969) pp. 9-25.

[53] L. S. Shapley, and M. Shubik, The Assignment Game, International


Journal of Game Theory Vol. 1 (1972) pp. 111-130.

[54] M. Shubik, Game Theory Models and Methods in Political Economy,


in Arrow and Intriligator (eds.) Handbook of Mathematical Economics,
Vol. I, (North-Holland, New York, 1981) pp. 285-330.

[55] H. Simon, Theories of Bounded Rationality, in R. Radner (eds.) Deci-


sion and Organization, (North Holland, 1972).

[56] A. Tamir, On the Core of Cost Allocation Games Defined on Location
Problems, Preprints, Second International Conference on Locational
Decisions (ISOLDE 81) (Skodsborg, Denmark, 1981) pp. 387-402.

[57] A. Tamir, On the Core of a Traveling Salesman Cost Allocation Game,
Operations Research Letters Vol. 8 (1989) pp. 31-34.

[58] A. Tamir, On the Core of Network Synthesis Games, Mathematical
Programming Vol. 50 (1991) pp. 123-135.

[59] A. Tamir and J.S.B. Mitchell, A maximum b-matching problem arising


from median location models with applications to the roommate prob-
lem, To appear in Mathematical Programming.

[60] S. Tijs, Bounds for the Core and the τ-value, in O. Moeschlin and D.
Pallaschke (eds.) Game Theory and Mathematical Economics, (North
Holland Publishing Company, 1981) pp. 123-132.

[61] J. von Neumann and O. Morgenstern, Theory of Games and Economic


Behavior (Princeton University Press, Princeton, 1944).

[62] W. Zang and X. Deng, manuscript.



HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 105-157
©1998 Kluwer Academic Publishers

Steiner Minimal Trees: An Introduction, Parallel
Computation, and Future Work

Frederick C. Harris, Jr.
Department of Computer Science
University of Nevada, Reno, NV 89557
E-mail: fredh@cs.unr.edu

Contents

1 Introduction 106

2 The First Solution 108

3 A Proposed Heuristic 110
  3.1 Background and Motivation 110
  3.2 Adding One Junction 111
  3.3 The Heuristic 111
  3.4 Results 112

4 Problem Decomposition 116
  4.1 The Double Wedge Theorem 117
  4.2 The Steiner Hull 118
  4.3 The Steiner Hull Extension 120

5 Winter's Sequential Algorithm 121
  5.1 Overview and Significance 121
  5.2 Winter's Algorithm 122

6 A Parallel Algorithm 123
  6.1 An Introduction to Parallelism 123
  6.2 Overview and Proper Structure 124
  6.3 First Approach 124
  6.4 Current Approach 126

7 Extraction of the Correct Answer 127
  7.1 Introduction and Overview 127
  7.2 Incompatibility Matrix 127
  7.3 Decomposition 129
  7.4 Forest Management 130

8 Computational Results 131
  8.1 Previous Computation Times 131
  8.2 The Implementation 133
    8.2.1 The Significance of the Implementation 133
    8.2.2 The Platform 133
    8.2.3 Errors Encountered 134
  8.3 Random Problems 135
    8.3.1 100 Point Random Problems 135
    8.3.2 Larger Random Problems 137
  8.4 Grids 138
    8.4.1 2 × m and Square Grids 138
    8.4.2 3 × m Grids 139
    8.4.3 4 × m Grids 140
    8.4.4 5 × m Grids 143
    8.4.5 6 × m Grids 145
    8.4.6 7 × m Grids 147

9 Future Work 149
  9.1 Grids 149
  9.2 Further Parallelization 150
  9.3 Additional Problems 151
    9.3.1 1-Reliable Steiner Tree Problem 151
    9.3.2 Augmenting Existing Plane Networks 151

References

1 Introduction
Minimizing a network's length is one of the oldest optimization problems
in mathematics and, consequently, it has been worked on by many of the
leading mathematicians in history. In the mid-seventeenth century a simple
problem was posed: find the point P that minimizes the sum of the
distances from P to each of three given points in the plane. Solutions to
this problem were derived independently by Fermat, Torricelli, and Cavalieri.
They all deduced that either P is inside the triangle formed by the
given points and the angles at P formed by the lines joining P to the
three points are all 120°, or P is one of the three vertices and the angle at
P formed by the lines joining P to the other two points is greater than or
equal to 120°.
In the nineteenth century a mathematician at the University of Berlin,
named Jakob Steiner, studied this problem and generalized it to include an
arbitrarily large set of points in the plane. This generalization created a
star when P was connected to all the given points in the plane, and is a
geometric approach to the 2-dimensional center of mass problem.
In 1934 Jarník and Kössler generalized the network minimization problem
even further [38]: given n points in the plane, find the shortest possible
connected network containing these points. This generalized problem, however,
did not become popular until the book What Is Mathematics? by
Courant and Robbins [14] appeared in 1941. Courant and Robbins linked
the name Steiner with this form of the problem proposed by Jarník and
Kössler, and it became known as the Steiner Minimal Tree problem. The
general solution to this problem allows multiple points to be added, each of
which is called a Steiner point, creating a tree instead of a star.
Much is known about the exact solution to the Steiner Minimal Tree
problem. Those who wish to learn about some of the spin-off problems
are invited to read the introductory article by Bern and Graham [3], the
excellent survey paper on this problem by Hwang and Richards [34], or the
recent volume of The Annals of Discrete Mathematics devoted completely
to Steiner tree problems [35]. Some of the basic pieces of information about
the Steiner Minimal Tree problem that can be gleaned from these articles
are: (i) all of the original n points will be of degree 1, 2, or 3;
(ii) the Steiner points are all of degree 3; (iii) any two edges meet at an
angle of at least 120° in the Steiner Minimal Tree; and (iv) at most n − 2
Steiner points will be added to the network.
This paper concentrates on the Steiner Minimal Tree problem, henceforth
referred to as the SMT problem. We present several algorithms for
calculating Steiner Minimal Trees, including the first parallel algorithm for
doing so. Several implementation issues are discussed, some new results are
presented, and several ideas for future work are proposed.
In Section 2 we review the first fundamental algorithm for calculating
SMTs. In Section 3 we present a proposed heuristic for SMTs. In Section 4,
problem decomposition for SMTs is outlined. In Section 5 we present
Winter's sequential algorithm, which has been the basis for most computerized
calculation of SMTs to the present day. In Section 6 we present a parallel
algorithm for SMTs. Extraction of the correct answer is discussed in
Section 7. Computational results are presented in Section 8, and future work
and open problems are presented in Section 9.

2 The First Solution


A typical problem-solving approach is to begin with the simple cases and
expand to a general solution. As we saw in Section 1, the trivial three point
problem had already been solved in the 1600s, so all that remained was the
work toward a general solution. As with many interesting problems, this is
harder than it appears on the surface.
The method proposed by the mathematicians of the mid-seventeenth
century for the three point problem is illustrated in Figure 1. This method
states that in order to calculate the Steiner point given points A, B, and
C, you first construct an equilateral triangle (ACX) using the longest edge
between two of the points (AC) such that the third point (B) lies outside the
triangle. A circle is circumscribed around the triangle, and a line is
constructed from the third point (B) to the far vertex of the triangle (X). The
location of the Steiner point (P) is the intersection of this line (BX) with
the circle.

Figure 1: AP + CP = PX.
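This construction translates directly into a short computation. The following Python sketch is our own transcription of the construction just described, using complex numbers for points and ignoring the degenerate case in which some angle of the triangle reaches 120° (where the minimizer is a vertex):

```python
import cmath, math

def three_point_steiner(a, b, c):
    """Steiner point of a, b, c (complex numbers): build the equilateral
    triangle ACX with apex X on the far side from B, then intersect line BX
    with the circumcircle of ACX."""
    rot = cmath.exp(1j * math.pi / 3)
    x1 = a + (c - a) * rot                # the two candidate apexes
    x2 = a + (c - a) * rot.conjugate()
    x = x1 if abs(x1 - b) >= abs(x2 - b) else x2
    o = (a + c + x) / 3                   # circumcenter of an equilateral
    r = abs(a - o)                        # triangle is its centroid
    d, f = x - b, b - o
    # |b + t*d - o|^2 = r^2 is quadratic in t; t = 1 gives X itself,
    # so the product of the roots isolates the Steiner point P.
    t = (abs(f) ** 2 - r * r) / abs(d) ** 2
    return b + t * d

# equilateral example: P is the centroid, 0.5 + 0.2887j
print(three_point_steiner(0j, 0.5 + 0.866j, 1 + 0j))
```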
The next logical extension of the problem, going to four points, is attributed
to Gauss. In a letter to his friend, Gauss discussed various cases for
four points and also pointed out that the problem is worth studying. It is
interesting to note at this point that a general solution to the SMT problem
has recently been uncovered in the archives of a school in Germany [25].
For the next thirty years after Kössler and Jarník presented the general
form of the SMT problem, only heuristics were known to exist. The heuris-
tics were typically based upon the Minimum-Length Spanning Tree (MST),
which is a tree that spans or connects all vertices whose sum of the edge
lengths is as small as possible, and tried in various ways to join three vertices
with a Steiner Point. In 1968 Gilbert and Pollak [24] linked the length of
the SMT to the length of a MST. It was already known that the length of
an MST is an upper bound for the length of an SMT, but their conjecture
stated that the length of an SMT would never be any shorter than √3/2 times
the length of an MST. This conjecture was recently proved [15], and has
led to the MST being the starting point for most of the heuristics that have
been proposed in the last 20 years including a recent one that achieves some
very good results [28].
In 1961 Melzak developed the first known algorithm for calculating an
SMT [41]. Melzak's Algorithm was geometric in nature and was based upon
some simple extensions to Figure 1. The insight that Melzak offered was the
fact that you can reduce an n point problem to a set of n -1 point problems.
This reduction in size is accomplished by taking every pair of points, A and
C in our example, calculating where the two possible points, Xl and X2,
would be that form an equilateral triangle with them, and creating two
smaller problems, one where Xl replaces A and C, and the other where X2
replaces A and C. Both Melzak and Cockayne pointed out however that
some of these sub-problems are invalid. Melzak's algorithm can then be
run on the two smaller problems. This recursion, based upon replacing two
points with one point, finally terminates when you reduce the problem from
three to two vertices. At this termination the length of the tree will be the
length of the line segment connecting the final two points. This is due to
the fact that BP + AP + CP = BP + PX. This is straightforward to prove
using the law of cosines, for when P is on the circle, ∠APX = ∠CPX = 60°.
This allows the calculation of the last Steiner Point (P) and allows you to
back up the recursive call stack to calculate where each Steiner Point in that
particular tree is located.
This reduction is important in the calculation of an SMT, but the algo-
rithm still has exponential order, since it requires looking at every possible
reduction of a pair of points to a single point. The recurrence relation for
an n-point problem is stated quite simply in the following formula:

T(n) = 2 * C(n, 2) * T(n - 1) = n(n - 1) * T(n - 1).


This yields what is obviously a non-polynomial time algorithm. In fact,
Garey, Graham, and Johnson [16] have shown that the Steiner Minimal
Tree problem is NP-Hard (NP-Complete if the distances are rounded up to
discrete values).
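To see how quickly this recurrence grows, a few lines of C suffice. This is
a toy illustration only, taking T(2) = 1 as the base case:

    #include <stdio.h>

    /* Growth of Melzak's recurrence T(n) = 2*C(n,2)*T(n-1)
       = n(n-1) T(n-1), with T(2) = 1 assumed as the base case. */
    int main(void)
    {
        double t = 1.0;                   /* T(2) */
        for (int n = 3; n <= 12; n++) {
            t *= (double)n * (n - 1);     /* one level of the recursion */
            printf("T(%2d) = %.3e\n", n, t);
        }
        return 0;
    }

Already at n = 12 the count exceeds 10^15, which makes the exponential
behavior of the plain reduction vivid.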
In 1967, just a few years after Melzak's paper, Cockayne [9] clarified
some of the details from Melzak's proof. This clarified algorithm proved to
be the basis for the first computer program to calculate SMTs. The program
was developed by Cockayne and Schiller [13] and could compute an SMT
for any placement of up to 7 vertices.

3 A Proposed Heuristic
3.1 Background and Motivation
By exploring a structural similarity between stochastic Petri nets (see [42]
and [43]) and Hopfield neural nets (see [26] and [32]), Geist was able to pro-
pose and take part in the development of a new computational approach for
attacking large, graph-based optimization problems. Successful applications
of this mechanism include I/O subsystem performance enhancement through
disk cylinder remapping [21, 22], file assignment in a distributed network to
reduce disk access conflict [20], and new computer graphics techniques for
digital halftoning [19] and color quantization [18]. The mechanism is based
on maximum-entropy Gibbs measures, which is described in Reynold's dis-
sertation [47], and provides a natural equivalence between Hopfield nets and
the simulated annealing paradigm. This similarity allows you to select the
method that best matches the problem at hand. For the SMT problem we
will implement the Simulated Annealing approach.
Simulated Annealing [39] is a probabilistic algorithm that has been ap-
plied to many optimization problems in which the set of feasible solutions
is so large that an exhaustive search for an optimum solution is out of the
question. Although Simulated Annealing does not necessarily provide an op-
timum solution, it usually provides a good solution in a user-selected amount
of time. Hwang and Richards [34] have shown that the optimal placement
of s Steiner Points to n original vertices yields a feasible solution space of
the size
    C(n, s+2) * (n + s - 2)! / (2^s * s!)

provided that none of the original points have degree 3 in the SMT. If the
degree restriction is removed they showed that the number is even larger.
The SMT problem is therefore a good candidate for this approach.
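A small C program can evaluate this count for moderate n and s; using
the log-gamma function keeps the factorials from overflowing. This is an
illustrative sketch, assuming the count formula as reconstructed above
(valid for 0 <= s <= n - 2):

    #include <stdio.h>
    #include <math.h>

    /* Evaluate C(n, s+2) * (n + s - 2)! / (2^s * s!) via lgamma(). */
    double topology_count(int n, int s)
    {
        double lg = lgamma(n + 1.0) - lgamma(s + 3.0) - lgamma(n - s - 1.0)
                  + lgamma(n + s - 1.0) - s * log(2.0) - lgamma(s + 1.0);
        return exp(lg);
    }

    int main(void)
    {
        printf("n = 10, s = 4:  %.4e\n", topology_count(10, 4));
        printf("n = 20, s = 10: %.4e\n", topology_count(20, 10));
        return 0;
    }

For instance, n = 10 and s = 4 already gives roughly 2.6 x 10^8 feasible
placements, which is why an exhaustive enumeration is out of the question.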

3.2 Adding One Junction


Georgakopoulos and Papadimitriou [23] have provided an O(n²) solution to
the 1-Steiner problem, wherein exactly one Steiner Point is added to the
original set of points. Since at most n - 2 Steiner Points are needed in
an SMT solution, repeated application of the algorithm offers a "greedy"
O(n³) approach. Using their method, the first Steiner Point is selected by
partitioning the plane into Oriented Dirichlet Cells, which they describe in
detail. Since these cells do not need to be discarded and recalculated for each
addition, subsequent additions can be accomplished in linear time. Deletion
of a candidate Steiner Point requires regeneration of the MST, which Shamos
showed can be accomplished in O(n log n) time if the points are in the
plane [44], followed by the cost for a first addition (O(n²)). This approach
can be regarded as a natural starting point for Simulated Annealing by
adding and deleting different Steiner Points.

3.3 The Heuristic


The Georgakopoulos and Papadimitriou 1-Steiner algorithm and the Shamos
MST algorithm are both difficult to implement. As a result, we have chosen
to investigate the potential effectiveness of this Annealing Algorithm using
a more direct, but slightly more expensive, O(n³) approach. As previously
noted, all Steiner Points have degree 3 with edges meeting at angles of
120°. We consider all C(n,3) triples where the largest angle is less than 120°,
compute the Steiner Point for each (a simple geometric construction), select
the Steiner Point giving the greatest reduction, or least increase, in the length
of the modified tree (increases are allowed since the Annealing Algorithm may
go uphill), and update the MST accordingly. Again, only the first addition
requires this (now O(n³)) step. We use the straightforward O(n²) Prim's
algorithm to generate the MST initially and after each deletion of a Steiner
Point.
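The O(n²) Prim step over points in the plane can be sketched as follows;
the names and the fixed capacity are illustrative choices, not the chapter's
implementation:

    #include <math.h>

    #define MAXN 1024                    /* illustrative capacity, n <= MAXN */

    typedef struct { double x, y; } Point;

    /* O(n^2) Prim over points in the plane with Euclidean edge weights.
       parent[] receives the tree edges; the total MST length is returned. */
    double prim_mst(const Point *p, int n, int *parent)
    {
        double dist[MAXN];               /* best distance into the tree */
        int    in_tree[MAXN];
        double total = 0.0;

        for (int i = 0; i < n; i++) {
            dist[i] = HUGE_VAL; in_tree[i] = 0; parent[i] = -1;
        }
        dist[0] = 0.0;
        for (int k = 0; k < n; k++) {
            int u = -1;                  /* nearest vertex not yet in tree */
            for (int i = 0; i < n; i++)
                if (!in_tree[i] && (u < 0 || dist[i] < dist[u])) u = i;
            in_tree[u] = 1;
            total += dist[u];
            for (int i = 0; i < n; i++) {   /* relax distances through u */
                double d = hypot(p[i].x - p[u].x, p[i].y - p[u].y);
                if (!in_tree[i] && d < dist[i]) { dist[i] = d; parent[i] = u; }
            }
        }
        return total;
    }

The dense O(n²) form is a natural fit here because the point set induces a
complete Euclidean graph, so no adjacency structure is needed.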

The Annealing Algorithm can be described as a non-deterministic walk
on a surface. The points on the surface correspond to the lengths of all
feasible solutions, where two solutions are adjacent if they can be reached
through the addition or deletion of one Steiner Point. The probability of
going uphill on this surface is higher when the temperature is higher but
decreases as the temperature cools. The rate of this cooling typically will
determine how good your solution will be. The major portion of this algo-
rithm is presented in Figure 2. This non-deterministic walk, starting with
the MST, has led to some very exciting results.

3.4 Results
Before we discuss large problems, a simple introduction into the results from
a simple six point problem is in order. The Annealing Algorithm is given
the coordinates for six points: (0,0), (0,1), (2,0), (2,1), (4,0), (4,1). The first
step is to calculate the MST, which has a length of 7, as shown in Figure 3.
The output of the Annealing Algorithm for this simple problem is shown in
Figure 4. In this case the Annealing Algorithm calculates the exact SMT
solution which has a length of 6.616994.
We propose as a measure of accuracy the percentage of the difference be-
tween the length of the MST and the exact SMT solution that the Annealing
Algorithm achieves. This is a new measure which has not been discussed (or
used) because exact solutions have not been calculated for anything but the
most simple layouts of points. For the six point problem discussed above
this percentage is 100.0% (the exact solution is obtained).
After communicating with Cockayne, data sets were obtained for exact
solutions to randomly generated 100 point problems that were developed
for [12]. This allows us to use the measure of accuracy previously described.
Results for some of these data sets provided by Cockayne are shown in
Table 1.
An interesting aspect of the Annealing Algorithm that cannot be shown
in the table is the comparison of execution times with Cockayne's program.
Whereas Cockayne mentioned that his results had an execution cut-off of 12
hours, these results were obtained in less than 1 hour. The graphical output
for the first line of the table, which reaches over 96% of the optimal value,
appears as follows: the data points and the MST are shown in Figure 5, the
Simulated Annealing Result is in Figure 6, and the Exact SMT Solution is
in Figure 7. The solution presented here is obtained in less than 1/10th of the
time with less than 4% of the possible range not covered. This indicates that

#define EQUILIBRIUM ((accepts >= 100 AND rejects >= 200) OR
                     (accepts + rejects > 500))
#define FROZEN ((temperature < 0.5) OR ((temperature < 1.0)
                     AND (accepts == 0)))

while (not(FROZEN)) {
    accepts = rejects = 0;
    old_energy = energy();
    while (not(EQUILIBRIUM)) {
        operation = add_or_delete();
        switch (operation) {
        case ADD:
            ΔE = energy_change_from_adding_a_node();
            break;
        case DELETE:
            ΔE = energy_change_from_deleting_a_node();
            break;
        }
        if (rand(0,1) < exp(min(0.0, -ΔE/temperature))) {
            accepts++;
            old_energy = new_energy;
        } else {
            /* put them back */
            undo_change(operation);
            rejects++;
        }
    }
    temperature = temperature * 0.8;
}

Figure 2: Simulated Annealing algorithm.



Figure 3: Spanning tree for 6 point problem.

Figure 4: 6 point solution.

we could hope to extend our Annealing Algorithm to much larger problems,
perhaps as large as 1,000 points. If we were to extend this approach to
larger problems we would definitely need to implement the Georgakopoulos-
Papadimitriou 1-Steiner Algorithm and the Shamos MST Algorithm.

Table 1: Results from 100 point problems.

Exact Solution Spanning Tree Simulated Annealing Percent Covered


6.255463 6.448690 6.261797 96.39%
6.759661 6.935189 6.763495 98.29%
6.667217 6.923836 6.675194 96.89%
6.719102 6.921413 6.721283 99.01%
6.759659 6.935187 6.763493 98.29%
6.285690 6.484320 6.289342 98.48%

Figure 5: Spanning tree.

Figure 6: Simulated Annealing solution.



Figure 7: Exact solution.

4 Problem Decomposition
After the early work by Melzak [41], many people began to work on the
Steiner Minimal Tree problem. The first major effort was to find some
kind of geometric bound for the problem. In 1968 Gilbert and Pollak [24]
showed that the SMT for a set of points, S, must lie within the Convex Hull
of S. This bound has since served as the starting point of every bounds
enhancement for SMT's.
As a brief review, the Convex Hull is defined as follows: Given a set of
points S in the plane, the Convex Hull is the convex polygon of smallest area
containing all the points of S. A polygon is defined to be convex if a line
segment connecting any two points inside the polygon lies entirely within
the polygon. An example of the Convex Hull for a set of 100 randomly
generated points is shown in Figure 8.
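Many O(n log n) methods compute the Convex Hull; Andrew's monotone-
chain variant is sketched below for concreteness (the chapter does not
specify which hull algorithm its implementation uses, so this is an
illustrative choice):

    #include <stdlib.h>

    typedef struct { double x, y; } Point;

    static int cmp(const void *a, const void *b)
    {
        const Point *p = a, *q = b;
        if (p->x != q->x) return (p->x < q->x) ? -1 : 1;
        return (p->y < q->y) ? -1 : (p->y > q->y);
    }

    /* > 0 when o->a->b makes a left (counter-clockwise) turn */
    static double cross(Point o, Point a, Point b)
    {
        return (a.x - o.x)*(b.y - o.y) - (a.y - o.y)*(b.x - o.x);
    }

    /* Monotone chain: sort by x, build lower then upper hull.
       hull[] must hold at least n + 1 points; returns the hull size,
       vertices in counter-clockwise order. */
    int convex_hull(Point *pts, int n, Point *hull)
    {
        qsort(pts, n, sizeof(Point), cmp);
        int k = 0;
        for (int i = 0; i < n; i++) {              /* lower hull */
            while (k >= 2 && cross(hull[k-2], hull[k-1], pts[i]) <= 0) k--;
            hull[k++] = pts[i];
        }
        for (int i = n - 2, lo = k + 1; i >= 0; i--) {   /* upper hull */
            while (k >= lo && cross(hull[k-2], hull[k-1], pts[i]) <= 0) k--;
            hull[k++] = pts[i];
        }
        return k - 1;    /* last point repeats the first */
    }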
Shamos in his PhD thesis [48] proposed a Divide and Conquer Algorithm
which has served as the basis for many parallel algorithms calculating the
Convex Hull. One of the first such approaches appeared in the PhD thesis by
Chow [6]. This approach was refined and made to run in optimal O(log n)
time by Aggarwal et al. [1] and Atallah and Goodrich [2].


Figure 8: The Convex Hull for a random set of points.

The next major work on the SMT problem was in the area of problem
decomposition. As with any non-polynomial algorithm, the most important
theorems are those that say "If property P exists, then the problem may
be split into the following sub-problems." For the Steiner Minimal Tree
Problem property P will probably be geometric in nature. Unfortunately,
decomposition theorems have been few and far between for the SMT prob-
lem. In fact, at this writing there have been only three such theorems.

4.1 The Double Wedge Theorem


The first decomposition theorem, known as the Double Wedge Theorem,
was proposed by Gilbert and Pollak [24]. This is illustrated in Figure 9 and
can be summarized quite simply as follows: if two lines intersect at a point
X and meet at 120°, they split the plane into two 120° wedges and two 60°
wedges. If R1 and R2 denote the two 60° wedges, and all the points of S are
contained in R1 ∪ R2, then the problem can be decomposed. There are two
cases to be considered. In case 1, X is not a point in S; therefore, the Steiner
Minimal Tree for S is the SMT for R1, the SMT for R2, and the shortest
edge connecting the two trees. In case 2, X is a point in S; therefore, the
Steiner Minimal Tree for S is the SMT for R1 and the SMT for R2. Since

Figure 9: An illustration of the Double Wedge.

X is in both R1 and R2, the two trees are connected.

4.2 The Steiner Hull


The next decomposition theorem is due to Cockayne [10] and is based upon
what he termed the Steiner Hull. The Steiner Hull is defined as follows: let
P1 be the Convex Hull. Pi+1 is constructed from Pi by finding an edge (p, r)
of Pi that has a vertex q near it such that ∠pqr ≥ 120° and there is no
vertex inside the triangle pqr. The final polygon, Pi, that can be created in
such a way is called the Steiner Hull. The algorithm for this construction is
shown in Figure 10. The Steiner Hull for the 100 points shown in Figure 8
is given in Figure 11.
After defining the Steiner Hull, Cockayne showed that the SMT for S
must lie within the Steiner Hull of S. This presents us with the following
decomposition: The Steiner Hull can be thought of as an ordered sequence
of points, {PI,P2,'" ,Pn}, where the hull is defined by the sequence of line
segments, {PIP2,P2P3,'" ,PnpI}. If there exists a point Pi that occurs twice
in the Steiner Hull, then the problem can be decomposed at point Pi. If a
Steiner Hull contains such a point, then the Steiner Hull is referred to as
degenerate. This decomposition is accomplished by showing that the Steiner
Hull splits S into two contained subsets, R1 and R2, where R1 is the set of
points contained in the Steiner Hull from the first time pi appears until the
last time pi appears, and R2 is the set of points contained in the Steiner
Hull from the last time pi appears until the first time pi appears. With this
decomposition it can be shown that S = R1 ∪ R2, and the SMT for S is

The initial Steiner Polygon, P1, is the Convex Hull.

Repeat
    Create the next Steiner Polygon Pi+1 from Pi by
    1) finding points p, q, r in S such that:
           p and r are adjacent on Pi,
           ∠pqr ≥ 120°, and
           no point of S lies inside the triangle pqr;
    2) removing the edge pr;
    3) adding the edges pq and qr.
Until (Pi+1 == Pi)
Steiner Hull = Pi

Figure 10: The Steiner Hull algorithm.

Figure 11: The Steiner Hull for a random set of 100 points.

the union of the SMT for R1 and the SMT for R2. This decomposition
is illustrated in Figure 12. Cockayne also proved that the Steiner Hull
decomposition includes every decomposition possible with the Double Wedge
Theorem.
In their work on 100 point problems, Cockayne and Hewgill [12] men-
tion that approximately 15% of the randomly generated 100 point problems
have degenerate Steiner Hulls. The Steiner Hull shown in Figure 11 is not
degenerate, while that in Figure 12 is.

Figure 12: An illustration of the Steiner Hull decomposition.
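Detecting whether a Steiner Hull is degenerate reduces to finding a repeated
vertex in the ordered sequence; a minimal sketch, assuming the hull is given
as an array of point indices (an illustrative representation, not the
chapter's data structure):

    /* Find a vertex that appears twice on the Steiner Hull.  On success
       the repeated vertex is returned and *first / *last receive its
       first and last positions, so R1 = hull[*first..*last] and R2 is
       the remainder of the cycle; returns -1 for a non-degenerate hull. */
    int find_degenerate_vertex(const int *hull, int h, int *first, int *last)
    {
        for (int i = 0; i < h; i++) {
            *first = i; *last = -1;
            for (int j = i + 1; j < h; j++)
                if (hull[j] == hull[i]) *last = j;   /* last repeat wins */
            if (*last >= 0) return hull[i];
        }
        return -1;    /* hull is not degenerate */
    }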

4.3 The Steiner Hull Extension


The final decomposition belongs to Hwang, Song, Ting, and Du [36]. They
proposed an extension to the Steiner Hull as defined by Cockayne. Their
extension is as follows:
If there exist four points a, b, c, d on a Steiner Hull such that:
1. a, b, c, d form a convex quadrilateral,
2. there does not exist a point from S in the quadrilateral (a, b, c, d),
3. ∠a ≥ 120° and ∠b ≥ 120°,
4. the two diagonals (ac) and (bd) meet at O, and ∠bOa ≥ ∠a + ∠b − 150°,
then the SMT for S is the union of the SMT's for R1 and R2 and the edge ab,
where R1 is the set of points contained in the Steiner Hull from c to b with
the edge bc, and R2 is the set of points contained in the Steiner Polygon
from a to d with the edge ad. This decomposition is illustrated in Figure 13.
These three decomposition theorems were combined into a parallel algo-
rithm for decomposition presented in [27].

Figure 13: An illustration of the Steiner Hull extension.

5 Winter's Sequential Algorithm


5.1 Overview and Significance
The development of the first working implementation of Melzak's algorithm
sparked a move into the computerized arena for the calculation of SMT's. As
we saw in Section 2, Cockayne and Schiller [13] had implemented Melzak's
Algorithm and could calculate the SMT for all arrangements of 7 points.
This was followed almost immediately by Boyce and Seery's program which
they called STEINER72 [4]. Their work, done at Bell Labs, could calculate
the SMT for all 10 point problems. They continued to work on the problem
and in personal communication with Cockayne said they could solve 12 point
problems with STEINER73. Yet even with quite a few people working on
the problem, the number of points that any program could handle was still
very small.
As mentioned towards the end of Section 2, Melzak's algorithm yields
invalid answers and invalid tree structures for quite a few combinations
of points. It was not until 1981 that anyone was able to characterize a
few of these invalid tree structures. These characterizations were accom-
plished by Pawel Winter and were based upon several geometric construc-

tions which enable one to eliminate many of the possible combinations pre-
viously generated. He implemented these improvements in a program called
GeoSteiner [52]. In his work he was able to calculate in under 30 seconds
SMT's for problems having up to 15 vertices and stated that "with further
improvements, it is reasonable to assert that point sets up to 30 V-points
could be solved in less than an hour [52]."

5.2 Winter's Algorithm


Winter's breakthrough was based upon two things: the use of extended
binary trees, and what he termed pushing. Winter proposed an extended
binary tree as a means of constructing trees only once and readily identifying
a Full Steiner Tree (FST: a tree with n vertices and n - 2 Steiner Points) on
the same set of vertices.
Pushing came from the geometric nature of the problem and is illustrated
in Figure 14. It was previously known that the Steiner Point for a pair of
points, a and b, would lie on the circle that circumscribed that pair and
their equilateral third point. Winter set out to limit this region even further.
This limitation was accomplished by placing a pair of points, a' and b', on
the circle at a and b respectively, and attempting to push them closer and
closer together. In his work Winter proposed and proved various geometric
properties that would allow you to push a' towards b and b' towards a. If
the two points ever crossed then it was impossible for the current branch of
the sample space tree to contain a valid answer.
Unfortunately, the description of Winter's algorithm is not as clear as
one would hope, since the presence of goto statements rapidly makes his
program difficult to understand, and almost impossible to modify. Winter's
goal is to build a list of FST's which are candidates for inclusion in the final
answer. This list, called Tlist, is primed with the edges of the MST, thereby
guaranteeing that the length of the SMT does not exceed the length of the
MST.
The rest of the algorithm sets about expanding what Winter termed the
Qlist, which is a list of partial trees that the algorithm attempts to com-
bine until no combinations are possible. Qlist is primed with the original
input points. The legality of a combination is determined in the Construct
procedure, which uses pushing to eliminate cases. While this combination
proceeds, the algorithm also attempts to take newly created members of
Qlist and create valid FST's out of them. These FST's are then placed
onto Tlist.

Figure 14: An illustration of Winter's pushing.

This algorithm was a turning point in the calculation of SMT's. It
sparked renewed interest into the calculation of SMT's in general. This
renewed interest has produced new algorithms such as the Negative Edge
Algorithm [51] and the Luminary Algorithm [33]. Winter's algorithm has
also served as the foundation for most computerized computation for cal-
culating SMT's and is the foundation for the parallel algorithm we present
next.

6 A Parallel Algorithm
6.1 An Introduction to Parallelism
Parallel computation is allowing us to look at problems that have previously
been impossible to calculate, as well as to solve problems we have studied for
a long time faster than ever before. It is with this in
mind that we begin to look at a parallel algorithm for the Steiner Minimal
Tree problem.
There have been volumes written on parallel computation and parallel
algorithms; therefore, we will not rehash the material that has already been
so excellently covered by many others more knowledgeable on the topic, but
will refer the interested readers to various books currently available. For
a thorough description of parallel algorithms, and the PRAM Model the

reader is referred to the book by Joseph JaJa [37], and for a more practical
approach to implementation on a parallel machine the reader is referred to
the book by Vipin Kumar et al. [40], the book by Michael Quinn [45], or the
book by Justin Smith [49].

6.2 Overview and Proper Structure


When attempting to construct a parallel algorithm for a problem the sequen-
tial code for that problem is often the starting point. In examining sequential
code, major levels of parallelism may become self-evident. Therefore for this
problem the first thing to do is to look at Winter's algorithm and convert
it into structured code without gotos. The Initialization (Step 1) does not
change, and the translation of steps 2 through 7 appears in Figure 15.
Notice that the code in Figure 15 lies within a for loop. In a first
attempt to parallelize anything you typically look at loops that can be split
across multiple processors. Unfortunately, upon further inspection, the loop
continues while p<q, and the large if statement in the body of the loop
contains the statement q++ (line 30). This means that the number of iterations is
data dependent and is not fixed at the outset. This loop cannot be easily
parallelized.
Since the sequential version of the code does not lend itself to easy par-
allelization, the next thing to do is back up and develop an understanding
of how the algorithm works. The first thing that is obvious from the code
is that you select a left subtree and then try to mate it with possible right
subtrees. Upon further examination we come to the conclusion that a left
tree will mate with all trees that are shorter than it, and all trees of the
same height that appear after it on Qlist, but it will never mate with any
tree that is taller.
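This mating rule can be captured in a small predicate. The following is a
sketch of the rule as just stated, with illustrative parameter names (the
heights and Qlist positions of the two candidate subtrees):

    /* Left subtree p mates with candidate r when r is strictly shorter,
       or of equal height but later on the Qlist; never with a taller tree. */
    int can_mate(int height_p, int pos_p, int height_r, int pos_r)
    {
        return (height_r < height_p) ||
               (height_r == height_p && pos_r > pos_p);
    }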

6.3 First Approach


The description of this parallel algorithm is given from a master-slave
perspective. This perspective was taken due to the structure of most parallel
architectures at the time of its development, as well as the fact that all
nodes on the Qlist need a sequencing number assigned to them. The master
will therefore be responsible for numbering the nodes and maintaining the
main Qlist and Tlist.
The description from the slave's perspective is quite simple. A process is
spawned off for each member of Qlist that is a proper left subtree (Winter's

/* Step 2 */
 1  for(p=0; p<q; p++){
 2      AP = A(p);
 3      /* Step 3 */
 4      for(r=0; ((H(p) > H(r)) AND (r!=q)); r++){
 5          if((H(p) == H(r)) AND (r<p))
 6              r = p;
 7          if(Subset(V(r), AP)){
 8              p_star = p;
 9              r_star = r;
10              for(Label = PLUS; Label <= MINUS; Label++){
11                  /* Step 4 */
12                  AQ = A(q);
13                  if(Construct(p_star, r_star, &(E(q)))){
14                      L(q) = p;
15                      R(q) = r;
16                      LBL(q) = Label;
17                      LF(q) = LF(p);
18                      H(q) = H(p) + 1;
19                      /* next line is different */
20                      Min(q) = max(Min(p)-1, H(r));
21                      if(Lsp(p) != 0)
22                          Lsp(q) = Lsp(p);
23                      else
24                          Lsp(q) = Lsp(r);
25                      if(Rsp(r) != 0)
26                          Rsp(q) = Rsp(r);
27                      else
28                          Rsp(q) = Rsp(p);
29                      q_star = q;
30                      q++;
31                      /* Step 5 */
32                      if(Proper_to_Add_Tree_to_Tlist(q_star)){
33                          for_all(j in AP with LF(R(q_star)) < j){
34                              SRoot(t) = j;
35                              Root(t) = q_star;
36                              t++;
37                          }
38                      }
39                  }
40                  /* Step 6 */
41                  p_star = r;
42                  r_star = p;
43              }
44          }
45      }
46  }

Figure 15: The main loop properly structured.



algorithm allows members of Qlist that are not proper left subtrees). Each
new process is then given all the current nodes on Qlist. With this infor-
mation the slave can then determine with which nodes its left subtree could
mate. This mating creates new nodes that are sent back to the master, as-
signed a number, and added to the master's Qlist. The slave also attempts
to create an FST out of the new Qlist member, and if it is successful, this
FST is sent to the master to be added to the Tlist. When a process runs
out of Qlist nodes to check, it sends a request for more nodes to the master.
The master also has a simple job description. It has to start a process
for each initial member of the Qlist, send them all the current members of
the Qlist, and wait for their messages.
This structure worked quite well for smaller problems (up to about 15
points), but for larger problems it ground to a halt quite rapidly. This
was due to various reasons, such as the fact that for each slave started the
entire Qlist had to be sent. This excessive message passing quickly bogged
down the network. Secondly, in their work on 100 point problems Cockayne
and Hewgill [12] made the comment that Tlist has an average length of 220,
but made no comment about the size of Qlist, which is the number of slaves
that would be started. From our work on 100 point problems this number
easily exceeds 1,000, which means that over 1,000 processes are starting,
each being sent the current Qlist. From these few problems, it is quite easy
to see that some major changes needed to be made in order to facilitate the
calculation of SMT's for large problems.

6.4 Current Approach


The idea for a modification to this approach came from a paper by Quinn
and Deo [46], on parallel algorithms for Branch-and-Bound problems. Their
idea was to let the master have a list of work that needs to be done. Each
slave is assigned to a processor. Each slave requests work, is given some,
and during its processing creates more work to be done. This new work is
placed in the master's work list, which is sorted in some fashion. When a
slave runs out of work to do, it requests more from the master. They noted
that this leaves some processors idle at times (particularly when the problem
was starting and stopping), but this approach provides the best utilization
if all branches are independent.
This description almost perfectly matches the problem at hand. First,
we will probably have a fixed number of processors which can be determined
at runtime. Secondly, we have a list of work that needs to be done. The hard

part is implementing a sorted work list in order to obtain a better utilization.


This was implemented in what we term the Proclist, which is a list of the
processes that either are currently running or have not yet started. This list
is primed with the information about the initial members of Qlist, and for
every new node put on the Qlist, a node containing information about
the Qlist node is placed on the Proclist in sorted order.
The results for this approach are quite exciting, and the timings are
discussed in Section 8.

7 Extraction of the Correct Answer


7.1 Introduction and Overview
Once the Tlist discussed in Section 5 is formed, the next step is to extract
the proper answer from it. Winter described this in step 7 of his algorithm.
His description stated that unions of FST's saved in Tlist were to be formed
subject to constraints described in his paper. The shortest union is the SMT
for the original points.
The constraints he described were quite obvious considering the defi-
nition of an SMT. First, the answer had to cover all the original points.
Second, the union of FST's could not contain a cycle. Third, the answer is
bounded in length by the length of the MST.
This led Winter to implement a simple exhaustive search algorithm over
the FST's in Tlist. This approach yields a sample space of size O(2^m)
(where m is the number of trees in Tlist) that has to be searched. This
exponentiality is borne out in his work, where he stated that for problems
with more than 15 points "the computation time needed to form the union
of FST's dominates the computation time needed for the construction of the
FST's [52]." An example of the input the last step of Winter's algorithm
receives (Tlist) is given in Figure 16. The answer it extracts (the SMT) is
shown in Figure 17.
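The exhaustive union-of-FSTs search can be sketched as follows. Here each
FST is reduced to the set of original points it spans (a bitmask) plus its
length, and a union-find structure detects cycles through shared points;
this toy representation stands in for the real geometric checks, and the
names are illustrative:

    #define N 5    /* number of original points (toy size for illustration) */

    typedef struct { unsigned pts; double len; } Fst;

    static int find(int *uf, int v)
    {
        return uf[v] == v ? v : (uf[v] = find(uf, uf[v]));
    }

    /* O(2^m) extraction: try every union of FSTs, keep the shortest one
       that connects all N points without a cycle.  The MST length primes
       `best`, mirroring the third constraint above. */
    double extract(const Fst *t, int m, double mst_len)
    {
        double best = mst_len;
        for (unsigned mask = 1; mask < (1u << m); mask++) {
            int uf[N], ok = 1;
            double len = 0.0;
            for (int v = 0; v < N; v++) uf[v] = v;
            for (int i = 0; ok && i < m; i++) {
                if (!(mask & (1u << i))) continue;
                len += t[i].len;
                int first = -1;      /* merge the FST's spanned points */
                for (int v = 0; v < N; v++) {
                    if (!(t[i].pts & (1u << v))) continue;
                    if (first < 0) { first = v; continue; }
                    int a = find(uf, first), b = find(uf, v);
                    if (a == b) { ok = 0; break; }   /* shared points close a cycle */
                    uf[a] = b;
                }
            }
            int comps = 0;
            for (int v = 0; v < N; v++)
                if (find(uf, v) == v) comps++;
            if (ok && comps == 1 && len < best) best = len;  /* spans all points */
        }
        return best;
    }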

7.2 Incompatibility Matrix


Once Cockayne published the clarification of Melzak's proof in 1967 [9] and
Gilbert and Pollak published their paper giving an upper bound on the
SMT length [24], many people were attracted to this problem. From this
time until Winter's work was published in 1985 [52] quite a few papers were
published dealing with various aspects of the SMT Problem, but the attempt

Figure 16: Tlist for a random set of points.

Figure 17: SMT extracted from Tlist for a random set of points.

to computerize the solution of the SMT problem bogged down around 12
vertices. It wasn't until Winter's algorithm was published that the research
community received the spark it needed to work on computerized compu-
tation of the SMT problem with renewed vigor. With the insight Winter
provided into the problem, an attempt to computerize the solution of the
SMT problem began anew.
Enhancement of this algorithm was first attempted by Cockayne and
Hewgill at the University of Victoria. For this implementation Cockayne
and Hewgill spent most of their work on the back end of the problem, or
the extraction from Tlist, and used Winter's algorithm to generate Tlist.
This work on the extraction focused on what they termed an incompatibil-
ity matrix. This matrix had one row and one column for each member of
Tlist. The entries in this matrix were flags corresponding to one of three
possibilities: compatible, incompatible, or don't know. The rationale behind
the construction of this matrix is the fact that it is faster to look up the
value in a matrix than it is to check for the creation of cycles and improper
angles during the union of FST's.
The first value calculations for this matrix were straightforward. If two
trees do not have any points in common then we don't know if they are
incompatible or not. If they have two or more points in common then they
form a cycle and are incompatible. If they have only one point in common
and the angle at the intersection point is less than 120° then they are also
incompatible. In all other cases they are compatible.
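These first-pass rules amount to a three-valued function. A minimal sketch
with illustrative names (the real check works from the FST geometry rather
than from precomputed arguments):

    typedef enum { COMPATIBLE, INCOMPATIBLE, DONT_KNOW } Flag;

    /* First-pass entry for a pair of FSTs: >= 2 shared points means a
       cycle; 1 shared point is legal only if the meeting angle reaches
       120 degrees; disjoint trees stay undetermined. */
    Flag first_pass(int common_points, double angle_deg)
    {
        if (common_points >= 2) return INCOMPATIBLE;
        if (common_points == 1)
            return (angle_deg < 120.0) ? INCOMPATIBLE : COMPATIBLE;
        return DONT_KNOW;
    }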
This simple enhancement to the extraction process enabled Cockayne
and Hewgill to solve all randomly generated problems of size up to 17 vertices
in a little over three minutes [11].

7.3 Decomposition
The next focus of Cockayne and Hewgill's work was in the area of the de-
composition of the problem. As was discussed earlier in Section 4, the best
theorems for any problem, especially non-polynomial problems, are those of
the form "If property P exists then the problem can be decomposed." Since
the formation of unions of FST's is exponential in nature any theorem of
this type is important.
Cockayne and Hewgill's theorem states: "Let A1 and A2 be subsets of
A satisfying (i) A1 ∪ A2 = A, (ii) |A1 ∩ A2| = 1, and (iii) the leaf set of each
FST in Tlist is entirely contained in A1 or A2. Then any SMT on A is
the union of separate SMT's on A1 and A2 [11]." This means that if you

break Tlist into biconnected components, the SMT will be the union of the
SMT's on those components.
Their next decomposition theorem allowed further improvements in the
calculation of SMT's. This theorem stated that if you had a component
of Tlist left from the previous theorem and if the Tlist members of that
component form a cycle, then it might be possible to break that cycle and
apply the previous algorithm again. The cycle could be broken if there
existed a vertex v whose removal would change that component from one
biconnected component to more than one.
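Such a vertex is an articulation point, which the standard DFS low-link
method finds in time linear in the size of the component. The sketch below
uses an illustrative adjacency representation and is not the chapter's code:

    #define MAXV 256                    /* illustrative capacity */

    int adj[MAXV][MAXV], deg[MAXV];     /* adjacency lists of the component */
    int disc[MAXV], low[MAXV], timer;   /* DFS discovery times and low-links */

    /* Marks is_cut[u] = 1 when removing u splits the component --
       exactly the situation the cycle-breaking theorem exploits.
       Call as dfs(root, -1, is_cut) with disc[] zeroed. */
    void dfs(int u, int parent, int *is_cut)
    {
        int children = 0;
        disc[u] = low[u] = ++timer;
        for (int k = 0; k < deg[u]; k++) {
            int v = adj[u][k];
            if (!disc[v]) {
                children++;
                dfs(v, u, is_cut);
                if (low[v] < low[u]) low[u] = low[v];
                if (parent != -1 && low[v] >= disc[u]) is_cut[u] = 1;
            } else if (v != parent && disc[v] < low[u]) {
                low[u] = disc[v];       /* back edge to an ancestor */
            }
        }
        if (parent == -1 && children > 1) is_cut[u] = 1;   /* root case */
    }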
With these two decomposition theorems, Cockayne and Hewgill were able
to calculate the SMT for 79 of 100 randomly generated 30 point problems.
The remaining 21 would not decompose into blocks of size 17 or smaller, and
thus would have taken too much computation time [11]. This calculation
was implemented in the program they called EDSTEINER86.

7.4 Forest Management

Cockayne and Hewgill's next work focused on improvements to the incom-
patibility matrix previously described and was implemented in a program
called EDSTEINER89. Their goal was to reduce the number of don't know's
in the matrix and possibly remove some FST's from Tlist altogether.
They proposed two refinements for calculating the entry into the in-
compatibility matrix and one Tree Deletion Theorem. The Tree Deletion
Theorem stated that if there exists an FST in Tlist that is incompatible
with all FST's containing a certain point a then the original FST can be
deleted since at least one FST containing a will be in the SMT.
This simple change allowed Cockayne and Hewgill to calculate the SMT
for 77 of 100 randomly generated 100 point problems [12]. The other 23
problems could not be calculated in less than 12 hours and were therefore
terminated. For those that did complete, the computation time to generate
Tlist had become the dominant factor in the overall computation time.
So the pendulum had swung back from the extraction of the correct
answer from Tlist to the generation of Tlist dominating the computation
time. In Section 8 we will look at the results of the parallel algorithm
presented in Section 6 to see if the pendulum can be pushed back the other
way one more time.

8 Computational Results
8.1 Previous Computation Times
Before presenting the results for the parallel algorithm presented in Sec-
tion 6, it is worthwhile to review the computation times that have preceded
this algorithm in the literature. The first algorithm for calculating FST's
was discussed in a paper by Cockayne [10] where he mentioned that prelimi-
nary results indicated his code could solve any problem up to 30 points that
could be decomposed with the Steiner Hull into regions of 6 points or less.
As we saw in Section 2, the next computational results were presented by
Cockayne and Schiller [13]. Their program, called STEINER, was written
in FORTRAN on an IBM 360/50 at the University of Victoria. STEINER
could calculate the SMT for any 7 point problem in less than 5 minutes of
cpu time. When the problem size was increased to 8 it could solve them if 7
of the vertices were on the Steiner Hull. When this condition held it could
calculate the SMT in under 10 minutes, but if this condition did not hold it
would take an unreasonable amount of time.
Cockayne called STEINER a prototype for calculating SMT's and al-
lowed Boyce and Seery of Bell Labs to obtain a copy of his code to improve
the work. They improved the code, renamed it STEINER72, and were able to
calculate the SMT for all 9 point problems and most 10 point problems in
a reasonable amount of time [4]. Boyce and Seery continued their work
and developed another version of the code that they thought could solve
problems of size up to 12 points, but no computation times were given.
The breakthrough we saw in Section 5 was by Pawel Winter. His program
called GEOSTEINER [52] was written in SIMULA 67 on a UNIVAC-1100.
GEOSTEINER could calculate SMT's for all randomly generated sets
with 15 points in under 30 seconds. This improvement was put into focus
when he mentioned that all previous implementations took more than an
hour for non-degenerate problems of size 10 or more. In his work, Winter
tried randomly generated 20 point problems but did not give results since
some of them did not finish in his cpu time limit of 30 seconds. The only
comment he made for problems bigger than size 15 was that the extraction
discussed in Section 7 was dominating the overall computation time.
The next major program, EDSTEINER86, was developed in FORTRAN
on an IBM 4381 by Cockayne and Hewgill [11]. This implementation was
based upon Winter's results, but had enhancements in the extraction pro-
cess. EDSTEINER86 was able to calculate the SMT for 79 out of 100 ran-
domly generated 32 point problems. For these problems the cpu time for
Tlist varied from 1 to 5 minutes, while for the 79 problems that finished
the extraction time never exceeded 70 seconds.

Cockayne and Hewgill subsequently improved their SMT program and
renamed it EDSTEINER89 [12]. This improvement was completely focused
on the extraction process. EDSTEINER89 was still written in FORTRAN,
but was run on a SUN 3/60 workstation. They randomly generated 200 32-
point problems to solve and found that the generation of Tlist dominated
the computation time for problems of this size. The average time for Tlist
generation was 438 seconds, while forest management and extraction
together averaged only 43 seconds. They then focused on 100 point
problems and set a cpu limit of 12 hours. The average cpu time to generate
Tlist was 209 minutes for these problems, but only 77 finished the extraction
in the cpu time limit. These programs and their results are summarized in
Table 2.

Table 2: SMT Programs, authors, and results.

Program        Author(s)                               Points
STEINER        Cockayne & Schiller (Univ of Victoria)       7
STEINER72      Boyce & Seery (AT&T Bell Labs)              10
STEINER73      Boyce & Seery (AT&T Bell Labs)              12
GEOSTEINER     Winter (Univ of Copenhagen)                 15
EDSTEINER86    Cockayne & Hewgill (Univ of Victoria)       30
EDSTEINER89    Cockayne & Hewgill (Univ of Victoria)      100
PARSTEINER94   Harris (Univ of Nevada)                    100

8.2 The Implementation


8.2.1 The Significance of the Implementation
The parallel algorithm we presented has been implemented in a program
called PARSTEINER94 [27, 30]. This implementation is only the second
SMT program since Winter's GEOSTEINER in 1981 and is the first parallel
code. The major reason that the number of SMT programs is so small is
due to the fact that any implementation is necessarily complex.
PARSTEINER94 currently has over 13,000 lines of C code. While there
is a bit of code dealing with the parallel implementation, certain sections of
Winter's Algorithm have a great deal of code buried beneath the simplest
statements. For example line 13 of Figure 15 is the following:

if(Construct(p_star,r_star,&(E(q)))){.

To implement the function Construct(), over 4,000 lines of code were nec-
essary, and this does not include the geometry library with functions such
as equilateral_third_point(), center_of_equilateral_triangle(),
line_circle_intersect(), and a host more.
Another important aspect of this implementation is the fact that there
can now be comparisons made between the two current SMT programs.
This would allow verification checks to be made between EDSTEINER89
and PARSTEINER94. This verification is important since with any complex
program it is quite probable that there are a few errors hiding in the code.
This implementation would also allow other SMT problems, such as those we
will discuss in Section 9, to be explored independently, thereby broadening
the knowledge base for SMT's even faster.

8.2.2 The Platform


In the design and implementation of parallel algorithms you are faced with
many decisions. One such decision is what will your target architecture be?
There are times when this decision is quite easy due to the machines at hand
or the size of the problem. In our case we decided not to target a specific
machine, but an architectural platform called PVM [17].
PVM, which stands for Parallel Virtual Machine, is a software package
available from Oak Ridge National Laboratory. This package allows a col-
lection of parallel or serial machines to appear as a large distributed memory
computational machine (MIMD model). This is implemented via two major

pieces of software: a library of PVM interface routines, and a PVM daemon
that runs on every machine that you wish to use.
The library interface comes in two languages, C and Fortran. The func-
tions in this library are the same no matter which architectural platform you
are running on. This library has functions to spawn off (start) many copies
of a particular program on the parallel machine, as well as functions to allow
message passing to transfer data from one process to another. Application
programs must be linked with this library to use PVM.
The daemon process, called pvmd in the user's guide, can be considered
the back end of PVM. As with any back end, such as the back end of a
compiler, when it is ported to a new machine the front end can interface to
it without change. The back end of PVM has been ported to a variety of
machines, such as a few versions of Crays, various Unix machines such as Sun
workstations, HP machines, Data General workstations, and DEC Alpha
machines. It has also been ported to a variety of true parallel machines
such as the iPSC/2, iPSC/860, CM2, CM5, BBN Butterfly and the Intel
Paragon.
With this information it is easy to see why PVM was picked as the
target platform. Once a piece of code is implemented under PVM it can
be re-compiled on the goal machine, linked with the PVM interface library
on that machine, and run without modification. In our case we designed
PARSTEINER94 on a network of SUN workstations, but, as just discussed,
moving to a large parallel machine should be trivial.
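As a flavor of what PVM code looks like, the following is a minimal,
generic master/worker skeleton using the PVM 3 library calls just
described. The task name, message tags, and payloads are hypothetical,
and this is emphatically not the PARSTEINER94 source; it only sketches
the spawn/pack/send/receive pattern the implementation rests on:

    #include <stdio.h>
    #include "pvm3.h"

    #define TAG_WORK   1
    #define TAG_RESULT 2
    #define NSLAVES    4

    /* The same binary acts as master when it has no PVM parent, and as
       a slave otherwise; "smt_worker" would have to be this program's
       registered task name for pvm_spawn() to find it. */
    int main(void)
    {
        if (pvm_parent() == PvmNoParent) {           /* master side */
            int tids[NSLAVES], i, work, result;
            pvm_spawn("smt_worker", (char **)0, PvmTaskDefault,
                      "", NSLAVES, tids);
            for (i = 0; i < NSLAVES; i++) {          /* one unit each */
                work = i;
                pvm_initsend(PvmDataDefault);
                pvm_pkint(&work, 1, 1);
                pvm_send(tids[i], TAG_WORK);
            }
            for (i = 0; i < NSLAVES; i++) {          /* gather results */
                pvm_recv(-1, TAG_RESULT);
                pvm_upkint(&result, 1, 1);
                printf("a slave produced %d new nodes\n", result);
            }
        } else {                                     /* slave side */
            int work, produced = 0;
            pvm_recv(pvm_parent(), TAG_WORK);
            pvm_upkint(&work, 1, 1);
            /* ... mate left subtree `work` against the Qlist here ... */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&produced, 1, 1);
            pvm_send(pvm_parent(), TAG_RESULT);
        }
        pvm_exit();
        return 0;
    }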

8.2.3 Errors Encountered

When attempting to implement any large program from another person's
description you often reach a point where you don't understand something.
At first you always question yourself, but as you gain an understanding of
the problem you learn that there are times when the description you were
given is wrong. Such was the case with the SMT problem. Therefore, to
help some of those that may come along and attempt to implement this
problem after us we recommend that you look at the list of errors we found
while implementing Winter's Algorithm [27].

8.3 Random Problems

8.3.1 100 Point Random Problems

From the literature it is obvious that the current standard for calculating
SMT's has been established by Cockayne and Hewgill. Their work on SMT's
has pushed the boundary of computation out from the 15 point problems of
Winter to being able to calculate SMT's for a large percentage of 100 point
problems.
Cockayne and Hewgill, in their investigation of the effectiveness of ED-
STEINER89, randomly generated 100 problems with 100 points inside the
unit square. They set up a CPU limit of 12 hours, and 77 of 100 prob-
lems finished within that limit. They described the average execution times
as follows: Tlist construction averaged 209 minutes, Forest Management
averaged 27 minutes, and Extraction averaged 10.8 minutes.
While preparing the code for this project, Cockayne and Hewgill were
kind enough to supply us with 40 of the problems generated for [12] along
with their execution times. These data sets were given as input to the
parallel code PARSTEINER94, and the calculation timed. The Wall Clock
time necessary to generate Tlist for the two programs appears in Table 3.
Over all 40 cases, the average time to generate Tlist was less than 20 minutes.
This is exciting because we have been able to generate Tlist properly, while
cutting an order of magnitude off the time.
These results are quite promising for various reasons. First, the parallel
implementation presented in this work is quite scalable, and therefore could
be run with many more processors, thereby enhancing the speedup provided.
Secondly, with the PVM platform used, we can in the future port this work
to a real parallel MIMD machine, which will have much less communication
overhead, or to a shared memory machine, where the communication could
all but be eliminated, and expect the speedup to improve much more.
It is also worth noting that proper implementation of the Cycle Break-
ing which Cockayne and Hewgill presented in [11] is important if extrac-
tion of the proper answer is to be accomplished. In their work, Cockayne
and Hewgill mentioned that 58% of the problems they generated were solv-
able without the Cycle Breaking being implemented, which is approximately
what we have found with the data sets they provided. An example of such
a Tlist that would need cycles broken (possibly multiple times) is provided
in Figure 18.

Table 3: Comparison of Tlist generation times (in seconds).

Test Case   PARSTEINER94   EDSTEINER89


1 650 8597
2 1031 13466
3 1047 15872
4 1687 17061
5 874 13258
6 1033 15226
7 1164 12976
8 1109 16697
9 975 15354
10 554 8650
11 660 9894
12 946 13057
13 858 13687
14 978 17132
15 819 11333
16 752 12766
17 896 13815
18 788 10508
19 618 10550
20 724 11193
21 983 11357
22 889 12999
23 1449 15028
24 890 14417
25 912 17562
26 1125 12395
27 943 15721
28 583 10014
29 1527 18656
30 681 10033
31 873 16401
32 791 10217
33 1132 18635
34 1097 18305
35 1198 19657
36 803 11174
37 923 15256
38 824 12920
39 826 12538
40 972 15570
Avg.        939            13748

Figure 18: Tlist with more than one cycle.

8.3.2 Larger Random Problems


Once the 100 point problems supplied by Cockayne and Hewgill had been
successfully completed, the next step was to try a few larger problems. This
was done with the hope of gaining an insight into the changes that would
be brought about from the addition of more data points.
For this attempt we generated several random sets of 110 points each.
The length of Tlist increased by approximately 38%, from an average of 210
trees to an average of 292 trees. The time to compute Tlist also increased,
but the growth more than doubled, going from an average of 15 minutes to
an average of more than 40 minutes.
The interesting thing that jumped out the most was the increase in the
number of large bi-connected components. Since the extraction process must
do a complete search of all possibilities, the larger the component the longer
it will take. This is a classic example of an exponential problem, where
when the problem size increases by 1 the time doubles. With this increased
component size, none of the random problems generated finished inside a 12
hour cut off time.
This rapid growth puts into perspective the importance of the work pre-
viously done by Cockayne and Hewgill. Continuation of their work with

incompatibility matrices, as well as decomposition of Tlist components, ap-
pears at this point to be very important for the future of SMT calculations.

8.4 Grids
The problem of determining SMTs for grids was mentioned to the author
by Ron Graham. In this context we are thinking of a grid as a regular
lattice of unit squares. The literature has little information regarding
SMTs on grids, and most of the information that is given is conjectured and
not proven. In Section 8.4.1 we will look at what is known about SMTs on
grids. In the following sub-sections we will introduce new results for grids
up through 7 x m in size. The results presented are computational results
from PARSTEINER94 [27, 29, 30] which was discussed previously.

8.4.1 2 x m and Square Grids


The first proof for anything besides a 2 x 2 grid came in a paper by Chung and
Graham [8] in which they proved the optimality of their characterization of
SMTs for 2 x m grids. The only other major work was presented in a paper
by Chung, Gardner, and Graham [7]. They argued the optimality of the
SMT on 2 x 2, 3 x 3, and 4 x 4 grids and gave conjectures and constructions
for those conjectures for SMTs on all other square lattices.
In their work Chung, Gardner, and Graham specified three building
blocks from which all SMTs on square (n x n) lattices were constructed.
The first, labeled I, is just a K2, or a path on two vertices. This building
block is given in Figure 19-A. The second, labeled Y, is a Full Steiner Tree
(FST) (n vertices and n - 2 Steiner points) on 3 vertices of the unit square.
This building block is given in Figure 19-B. The third, labeled X, is an FST
on all 4 vertices of the unit square. This building block is given in Figure 19-
C. For the generalizations we are going to make here, we need to introduce
one more building block, which we will label S. This building block is an
FST on a 3 x 2 grid and appears in Figure 19-D.
SMTs for grids of size 2 x m have two basic structures. The first is an
FST on all the vertices in the 2 x m grid. An example of this for a 2 x 3
grid is given in Figure 19-D. The other structure is constructed from the
building blocks previously described. We hope that these building blocks,
when put in conjunction with the generalizations for 3 x m, 4 x m, 5 x m,
6 x m, and 7 x m will provide the foundation for a generalization of m x n
grids in the future.

Figure 19: Building Blocks: (A) I, (B) Y, (C) X, (D) S.

In their work on ladders (2 x m grids) Chung and Graham established
and proved the optimality of their characterization for 2 x m grids. Before
giving their characterization, a brief review of the first few 2 x m SMTs is
in order. The SMT for a 2 x 2 grid is shown in Figure 19-C, the SMT for a
2 x 3 grid is shown in Figure 19-D, and the SMT for a 2 x 4 grid is given in
Figure 20.
Chung and Graham [8] proved that SMTs for ladders fell into one of two
categories. If the length of the ladder was odd, then the SMT was the FST
on the vertices of the ladder. The SMT for the 2 x 3 grid in Figure 19-D
is an example of this. If the length of the ladder was even, the SMT was
made up of a series of (m/2 - 1) XI's followed by one last X. The SMT for
the 2 x 4 grid in Figure 20 is an example of this.

Figure 20: SMT for a 2 x 4 Grid

8.4.2 3 x m Grids
The SMT for 3 x m grids has a very easy characterization which can be
seen once the initial cases have been presented. The SMT for the 3 x 2 grid
is presented in Figure 19-D. The SMT for the 3 x 3 grid is presented in
Figure 21.
From here we can characterize all 3 x m grids. Except for the 3 x 2
grid, which is a single S building block, there will be only two basic building

Figure 21: SMT for a 3 x 3 Grid

blocks present, X's and I's. There will be exactly two I's and (m - 1) X's.
The two I's will appear on each end of the grid. The X's will appear in a
staggered checkerboard pattern, one on each column of the grid the same
way that the two X's are staggered in the 3 x 3 grid. The 3 x 5 grid is a
good example of this and is shown in Figure 22.

Figure 22: SMT for a 3 x 5 Grid

8.4.3 4 x m Grids

The foundation for the 4 x m grids has already been laid. In their most
recent work, Cockayne and Hewgill presented some results on Square Lattice
Problems [12]. They looked at 4 x m grids for m = 2 to m = 6. They also
looked at the SMTs for these problems when various lattice points in that
grid were missing. What they did not do, however, was characterize the
structure of the SMT's for all 4 x m grids.
The 4 x 2 grid is given in Figure 20. From the work of Chung, Gardner,
and Graham [7], we know that the SMT for a 4 x 4 grid is a checkerboard
pattern of 5 X's. This layout gives us the first two patterns we will need
to describe the 4 x m generalization. The first pattern, which we will call

pattern A, is the same as the 3 x 4 grid without the two I's on the ends.
This pattern is given in Figure 23. The second pattern, denoted as pattern
B, is the 2 x 4 grid in Figure 20 without the connecting I. This is shown in
Figure 24.

Figure 23: 4 x m Pattern A

Figure 24: 4 x m Pattern B

Before the final characterization can be made, three more patterns are
needed. The first one, called pattern C, is a 4 x 3 grid where the pattern is
made up of two non-connected 2 x 3 SMTs, shown in Figure 25. The next
pattern, denoted pattern D, is quite simply a Y centered in a 2 x 4 grid.
This is shown in Figure 26. The final pattern, denoted E, is just an I on
the right side of a 2 x 4 grid. This is shown in Figure 27.
Now we can begin the characterization. The easiest way to present the
characterization is with some simple string rewriting rules. Since the 4 x 2,
4 x 3, and 4 x 4 patterns have already been given, the rules will begin with
a 4 x 5 grid. This grid has the string AC. The first rule is that whenever
there is a C on the right end of your string, replace it with BDB. Therefore
a 4 x 6 grid is ABDB. The next rule is that whenever there is a B on
the right end of your string, replace it with a C. The final rule is whenever
there is a DC on the right end of your string, replace it with EAB. These
rules are summarized in Table 4. A listing of the strings for m from 5 to 11
is given in Table 5.

Figure 25: 4 x m Pattern C

Figure 26: 4 x m Pattern D

Figure 27: 4 x m Pattern E

1   B  -> C
2   C  -> BDB
3   DC -> EAB

Table 4: Rewrite rules for 4 x m Grids.



m =       5    6     7     8      9      10       11
String    AC   ABDB  ABDC  ABEAB  ABEAC  ABEABDB  ABEABDC

Table 5: String Representations for 4 x m Grids
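The rewrite process is mechanical enough to state in a few lines of C. The
following sketch generates the 4 x m strings of Table 5 from the rules of
Table 4 (pattern letters as normalized above; an illustration, not part of
the chapter's implementation):

    #include <stdio.h>
    #include <string.h>

    /* Build the 4 x m building-block string (m >= 5), starting from
       "AC" and applying one rewrite per unit of added length; the
       DC rule is tested first since it subsumes the C rule. */
    void grid4_string(int m, char *s)
    {
        strcpy(s, "AC");
        for (int k = 6; k <= m; k++) {
            size_t n = strlen(s);
            if (n >= 2 && s[n-2] == 'D' && s[n-1] == 'C')
                strcpy(s + n - 2, "EAB");     /* rule 3: DC -> EAB */
            else if (s[n-1] == 'C')
                strcpy(s + n - 1, "BDB");     /* rule 2: C  -> BDB */
            else if (s[n-1] == 'B')
                s[n-1] = 'C';                 /* rule 1: B  -> C   */
        }
    }

    int main(void)
    {
        char s[128];
        for (int m = 5; m <= 11; m++) {
            grid4_string(m, s);
            printf("4 x %2d : %s\n", m, s);
        }
        return 0;
    }

Running the sketch reproduces the row of Table 5, ending with ABEABDC
for m = 11.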

8.4.4 5 x m Grids
For the 5 x m grids there are 5 building blocks (and their mirror images
which are denoted with a prime ') that are used to generate any 5 x m grid.
These building blocks appear in Figure 28, Figure 29, Figure 30, Figure 31,
and Figure 32.

xx "

"

Figure 28: 5 x m Pattern A

x "
o
"
o
o

Figure 29: 5 x m Pattern B

With the building blocks in place, the characterization of 5 x m grids is
quite easy using grammar rewrite rules. The rules used for rewriting strings
representing a 5 x m grid are given in Table 6. The SMTs for 5 x 2, 5 x 3,
and 5 x 4 have already been given. For a 5 x 5 grid the SMT is made up
of the following string: EA'BD. As a reminder, the A' signifies the mirror

Figure 30: 5 x m Pattern C

Figure 31: 5 x m Pattern D

Figure 32: 5 x m Pattern E

1   C  -> B'D'
2   D  -> A'E
3   E  -> AC
4   C' -> BD
5   D' -> AE'
6   E' -> A'C'

Table 6: Rewrite rules for 5 x m Grids

of building block A. A listing of the strings for m from 5 to 11 is given in
Table 7.

m =      5        6          7           8
String   EA'BD    EA'BA'E    EA'BA'AC    EA'BA'AB'D'

m =      9              10                11
String   EA'BA'AB'AE'   EA'BA'AB'AA'C'   EA'BA'AB'AA'BD

Table 7: String Representations for 5 x m Grids

8.4.5 6 x m Grids

For the 6 x m grids there are 5 building blocks that are used to generate any
6 x m grid. These building blocks appear in Figure 33, Figure 34, Figure 35,
Figure 36, and Figure 37.

KKK
Figure 33: 6 x m Pattern A
146 F. C. Harris, Jr.

o xx o

Figure 34: 6 x m Pattern B

Figure 35: 6 x m Pattern C

Figure 36: 6 x m Pattern V


Figure 37: 6 x m Pattern E



The solution for 6 x m grids can now be characterized by using grammar
rewrite rules. The rules used for rewriting strings representing a 6 x m grid
are given in Table 8. The basis for this rewrite system is the SMT for the
6 x 3 grid, which is AC. It is also nice to see that for the 6 x m grids there
is a simple regular expression which can characterize what the string will
be. That regular expression has the form: A(BE)*(C|BD). A listing of the
strings for m from 6 to 11 is given in Table 9.

1  C → BD
2  D → EC

Table 8: Rewrite rules for 6 x m Grids

m=      6      7       8
String  ABEBD  ABEBEC  ABEBEBD

m=      9         10         11
String  ABEBEBEC  ABEBEBEBD  ABEBEBEBEC

Table 9: String Representations for 6 x m Grids
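
The rules of Table 8 and the regular expression above can be cross-checked
mechanically. The following Python sketch (ours, for illustration only)
generates the 6 x m strings from the 6 x 3 basis AC and asserts that each
one matches A(BE)*(C|BD):

import re

def next_6xm(s):
    if s.endswith("C"):
        return s[:-1] + "BD"
    if s.endswith("D"):
        return s[:-1] + "EC"
    raise ValueError("unexpected suffix in " + s)

s = "AC"                # the string for the 6 x 3 grid
for m in range(3, 12):
    assert re.fullmatch(r"A(BE)*(C|BD)", s)
    print(m, s)         # reproduces Table 9 for m >= 6
    s = next_6xm(s)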

8.4.6 7 x m Grids
For the 7 x m grids there are 6 building blocks that are used to generate any
7 x m grid. These building blocks appear in Figure 38, Figure 39, Figure 40,
Figure 41, Figure 42, and Figure 43.


Figure 38: 7 x m Pattern A




Figure 39: 7 x m Pattern B


Figure 40: 7 x m Pattern C

Figure 41: 7 x m Pattern D


Figure 42: 7 x m Pattern E

Figure 43: 7 x m Pattern F

The grammar rewrite rules for strings representing a 7 x m grid are given
in Table 10. The basis for this rewrite system is the SMT for the 7 x 5 grid,
which is FA'E'F'. A listing of the strings for m from 6 to 12 is given in
Table 11.

1  E'F' → BA'F
2  F → CD
3  CD → AEF
4  EF → B'AF'
5  F' → C'D'
6  C'D' → A'E'F'
Table 10: Rewrite rules for 7 x m Grids

m=      6        7         8          9
String  FA'BA'F  FA'BA'CD  FA'BA'AEF  FA'BA'AB'AF'

m=      10              11                12
String  FA'BA'AB'AC'D'  FA'BA'AB'AA'E'F'  FA'BA'AB'AA'BA'F

Table 11: String Representations for 7 x m Grids

9 Future Work
9.1 Grids
In this work we reviewed what is known about SMTs on grids and then
presented results from PARSTEINER94 [27, 30] which characterize SMTs
for 3 x m to 7 x m grids. The next obvious question is what is the charac-
terization for an 8 x m grid, or an n x m grid? Well, this is where things
start getting nasty. Even though PARSTEINER94 cuts the computation
time of the previous best program for SMTs by an order of magnitude, the
computation time for an NP-Hard problem blows up sooner or later, and
8 x m is where we run into the computation wall.
We have been able to make small chips in this wall, though, and have
some results for 8 x m grids. The pattern for this seems to be based upon

Figure 44: 8 x 8

repeated use of the 8 x 8 grid, which is shown in Figure 44. This grid solution
seems to be combined with smaller 8 x m solutions in order to build larger
solutions. However, until better computational approaches are developed,
further characterizations of SMTs on grids will be very hard and tedious.

9.2 Further Parallelization


There remains a great deal of work that can be done on the Steiner Minimal
Tree problem in the parallel arena. The first thing to consider is whether
there are other ways to approach the parallel generation of T_list that would
be more efficient. Improvement in this area would push the computation
pendulum even further away from T_list generation and towards SMT ex-
traction.
The next thing to consider is the entire extraction process. The initial
generation of the incompatibility matrix has the appearance of easy paral-
lelization. The forest management technique introduced by Cockayne and
Hewgill could also be put into a parallel framework, thereby speeding up
the preparation for extraction quite a bit.
With this initialization out of the way, decomposition could then be
considered. The best possible enhancement here might be the addition of
thresholds. As with most parallel algorithms, for any problem smaller than
a particular size it is usually faster to solve it sequentially. These thresholds
could come into play in determining whether to call a further decomposition,
such as the cycle decomposition introduced by Cockayne and Hewgill that
was discussed in Section 7.

The final option for parallelization is one that may yield the best re-
sults, and that is in the extraction itself. Extraction is basically a branch
and bound process, using the incompatibility matrix. This branch and bound
is primed with the length of the MST as the initial bound, and continues
until all possible combinations have been considered. The easiest implemen-
tation here would probably be the idea presented in the paper by Quinn and
Deo [46] that served as the basis for the parallel algorithm in Section 6.

9.3 Additional Problems


9.3.1 1-Reliable Steiner Tree Problem

If we would like to be able to sustain a single failure of any vertex, without
interrupting communication among the remaining vertices, the minimum length
network problem takes on a decidedly different structure. For example, in
any FST all of the original vertices are of degree 1, and hence any one can
be disconnected from the network by a single failure of the adjacent Steiner
point.
We would clearly like a minimum length 2-connected network. The an-
swer can be the minimum length Hamiltonian cycle (consider the vertices
of the unit square), but it need not be, as shown in the theta graph given in
Figure 45.
Here we can add Steiner points near the vertices of degree 3, and reduce
the network length without sacrificing 2-connectivity. This is not just a sin-
gle graph, but is a member of a family of graphs that look like ladders, where
the theta graph has only one internal rung. We hope to extend earlier work
providing constructions on 2-connected graphs [31] to allow effective appli-
cation of an Annealing Algorithm that could walk through graphs within
the 2-connected class.

9.3.2 Augmenting Existing Plane Networks


In practical applications, it frequently happens that new points must be
joined to an existing Steiner Minimal Tree. Although a new and larger
SMT can, in principle, be constructed which connects both the new and the
existing points, this is typically impractical, e.g., in cases where a fiber optic
network has already been constructed. Thus the only acceptable approach
is to add the new points to the network as cheaply as possible. Cockayne
has presented this problem, which we can state as follows:

Figure 45: Theta graph.

Augmented Steiner Network: Given a connected plane graph G =
(V, E) (i.e. an embedding of a connected planar graph in the Euclidean
plane) and a set V' of points in the plane which are not on edges of G,
construct a connected plane supergraph G'' = (V'', E''), such that V''
contains V ∪ V', E'' contains E, and the sum of the Euclidean lengths of
the set of edges in E'' − E is a minimum. In constructing the plane graph
G'' it is permitted to add an edge connecting a point in V' to an interior
point of an edge in G. It is also permitted to add Steiner points. Thus,
strictly speaking, G'' need not be a supergraph of G.
The Augmented Steiner Network Problem clearly has applications in
such diverse areas as canal systems, rail systems, housing subdivisions, irri-
gation networks and computer networks. For example, given a (plane) fiber
optic computer network G = (V, E) and a new set V' of nodes to be added
to the network, the problem is to construct a set F' of fiber optic links with
minimum total length that connects V' to G. The set F' of new links is
easily seen to form a forest in the plane, because the minimum total length
requirement ensures that there cannot be cycles in F'.
As an example, consider the situation in Figure 46 where G consists of
a single, long edge and V' = {v1, ..., v6}. The optimal forest F' consists of
three trees joining G at f1, f2 and f3. It is necessary that extra Steiner
points s1, s2 and s3 be added so that F' has minimum length.
While we are aware of several algorithms for solving special cases of the
Augmented Existing Plane Network Problem, such as those by Chen [5] and
Trietsch [50] or the special case where the graph G consists of a single vertex,
in which case the problem is equivalent to the classical Steiner Minimal Tree


Figure 46: An Optimal Forest.

Problem, we are not aware of any algorithms or computer programs available


for exact solutions to the general form of this problem. Here, "exact" means
provably optimal except for round-off error and machine representation of
real numbers. Non-exact (i.e. heuristic) solutions are sub-optimal although
they may often be found considerably faster.

References
[1] A. Aggarwal, B. Chazelle, L. Guibas, C. O'Dunlaing, and C. Yap. Par-
allel computational geometry. Algorithmica, 3(3):293-327, 1988.
[2] M.J. Atallah and M.T. Goodrich. Parallel algorithms for some functions
of two convex polygons. Algorithmica, 3(4):535-548, 1988.
[3] M.W. Bern and R.L. Graham. The shortest-network problem. Sci.
Am., 260(1):84-89, January 1989.
[4] W.M. Boyce and J.R. Seery. STEINER 72 - an improved version of
Cockayne and Schiller's program STEINER for the minimal network
problem. Technical Report 35, Bell Labs., Dept. of Computer Science,
1975.
[5] G. X. Chen. The shortest path between two points with a (linear)
constraint [in Chinese]. Knowledge and Appl. of Math., 4:1-8, 1980.
[6] A. Chow. Parallel Algorithms for Geometric Problems. PhD thesis,
University of Illinois, Urbana-Champaign, IL, 1980.
[7] F.R.K. Chung, M. Gardner, and R.L. Graham. Steiner trees on a
checkerboard. Math. Mag., 62:83-96, 1989.

[8] F.R.K. Chung and R.L. Graham. Steiner trees for ladders. In
B. Alspach, P. Hell, and D.J. Miller, editors, Annals of Discrete Math-
ematics 2, pages 173-200. North-Holland Publishing Company, 1978.

[9] E.J. Cockayne. On the Steiner problem. Canad. Math. Bull., 10(3):431-
450, 1967.

[10] E.J. Cockayne. On the efficiency of the algorithm for Steiner minimal
trees. SIAM J. Appl. Math., 18(1):150-159, January 1970.

[11] E.J. Cockayne and D.E. Hewgill. Exact computation of Steiner minimal
trees in the plane. Inform. Process. Lett., 22(3):151-156, March 1986.

[12] E.J. Cockayne and D.E. Hewgill. Improved computation of plane
Steiner minimal trees. Algorithmica, 7(2/3):219-229, 1992.

[13] E.J. Cockayne and D.G. Schiller. Computation of Steiner minimal trees.
In D.J.A. Welsh and D.R. Woodall, editors, Combinatorics, pages 52-
71, Maitland House, Warrior Square, Southend-on-Sea, Essex SS1 2J4,
1972. Mathematical Institute, Oxford, Inst. Math. Appl.

[14] R. Courant and H. Robbins. What is Mathematics? An elementary
approach to ideas and methods. Oxford University Press, London, 1941.

[15] D.Z. Du and F.K. Hwang. A proof of the Gilbert-Pollak conjecture on
the Steiner ratio. Algorithmica, 7(2/3):121-135, 1992.

[16] M.R. Garey, R.L. Graham, and D.S. Johnson. The complexity of com-
puting Steiner minimal trees. SIAM J. Appl. Math., 32(4):835-859,
June 1977.

[17] Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert


Manchek, and Vaidy Sunderam. PVM: Parallel Virtual Machine - A
User's guide and tutorial for networked parallel computing. MIT Press,
Cambridge, MA, 1994.

[18] R. Geist, R. Reynolds, and C. Dove. Context sensitive color quantiza-
tion. Technical Report 91-120, Dept. of Comp. Sci., Clemson Univ.,
Clemson, SC 29634, July 1991.

[19] R. Geist, R. Reynolds, and D. Suggs. A Markovian framework for digital
halftoning. ACM Trans. Graphics, 12(2):136-159, April 1993.

[20] R. Geist and D. Suggs. Neural networks for the design of distributed,
fault-tolerant, computing environments. In Proc. 11th IEEE Symp. on
Reliable Distributed Systems (SRDS), pages 189-195, Houston, Texas,
October 1992.

[21] R. Geist, D. Suggs, and R. Reynolds. Minimizing mean seek distance in
mirrored disk systems by cylinder remapping. In Proc. 16th IFIP Int.
Symp. on Computer Performance Modeling, Measurement, and Evalu-
ation (PERFORMANCE '93), pages 91-108, Rome, Italy, September
1993.

[22] R. Geist, D. Suggs, R. Reynolds, S. Divatia, F. Harris, E. Foster, and
P. Kolte. Disk performance enhancement through Markov-based cylin-
der remapping. In Cherri M. Pancake and Douglas S. Reeves, editors,
Proc. of the ACM Southeastern Regional Conf., pages 23-28, Raleigh,
North Carolina, April 1992. The Association for Computing Machinery,
Inc.

[23] G. Georgakopoulos and C. Papadimitriou. A 1-Steiner tree problem. J.
Algorithms, 8(1):122-130, March 1987.

[24] E.N. Gilbert and H.O. Pollak. Steiner minimal trees. SIAM J. Appl.
Math., 16(1):1-29, January 1968.

[25] RL. Graham. Private Communication.

[26] S. Grossberg. Nonlinear neural networks: Principles, mechanisms, and


architectures. Neural Networks, 1:17-61, 1988.

[27] F.C. Harris, Jr. Parallel Computation of Steiner Minimal Trees. PhD
thesis, Clemson University, Clemson, SC 29634, May 1994.

[28] F.C. Harris, Jr. A stochastic optimization algorithm for Steiner minimal
trees. Congr. Numer., 105:54-64, 1994.

[29] F.C. Harris, Jr. An introduction to Steiner minimal trees on grids.
Congr. Numer., 111:3-17, 1995.

[30] F.C. Harris, Jr. Parallel computation of Steiner minimal trees. In
David H. Bailey, Petter E. Bjorstad, John R. Gilbert, Michael V.
Mascagni, Robert S. Schreiber, Horst D. Simon, Virginia J. Torczon,
and Layne T. Watson, editors, Proc. of the 7th SIAM Conf. on Parallel
Process. for Sci. Comput., pages 267-272, San Francisco, California,
February 1995. SIAM.
[31] S. Hedetniemi. Characterizations and constructions of minimally
2-connected graphs and minimally strong digraphs. In Proc. 2nd
Louisiana Conf. on Combinatorics, Graph Theory, and Computing,
pages 257-282, Louisiana State Univ., Baton Rouge, Louisiana, March
1971.
[32] J.J. Hopfield. Neurons with graded response have collective computa-
tional properties like those of two-state neurons. Proc. Nat. Acad. Sci.,
81:3088-3092, 1984.

[33] F.K. Hwang and J.F. Weng. The shortest network under a given
topology. J. Algorithms, 13(3):468-488, Sept. 1992.

[34] F.K. Hwang and D.S. Richards. Steiner tree problems. Networks,
22(1):55-89, January 1992.

[35] F.K. Hwang, D.S. Richards, and P. Winter. The Steiner Tree Problem,
volume 53 of Ann. Discrete Math. North-Holland, Amsterdam, 1992.

[36] F.K. Hwang, G.D. Song, G.Y. Ting, and D.Z. Du. A decomposition
theorem on Euclidean Steiner minimal trees. Disc. Comput. Geom.,
3(4):367-382, 1988.

[37] J. JaJa. An Introduction to Parallel Algorithms. Addison-Wesley Pub-
lishing Company, Reading, MA, 1992.

[38] V. Jarnik and O. Kossler. O minimalnich grafech obsahujicich n danych
bodu [in Czech]. Casopis Pest. Mat. Fys., 63:223-235, 1934.

[39] S. Kirkpatrick, C. Gelatt, and M. Vecchi. Optimization by simulated
annealing. Science, 220(4598):671-680, May 1983.

[40] V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to
Parallel Computing: Design and Analysis of Algorithms. The Ben-
jamin/Cummings Publishing Company, Inc., Redwood City, CA, 1994.

[41] Z.A. Melzak. On the problem of Steiner. Canad. Math. Bull., 4(2):143-
150, 1961.

[42] Michael K. Molloy. Performance analysis using stochastic Petri nets.
IEEE Trans. Comput., C-31(9):913-917, September 1982.

[43] J.L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-
Hall, Englewood Cliffs, NJ, 1981.

[44] F.P. Preparata and M.I. Shamos. Computational Geometry: an intro-
duction. Springer-Verlag, New York, NY, 1988.

[45] Michael J. Quinn. Parallel Computing: Theory and Practice. McGraw-


Hill Inc., New York, NY, 1994.

[46] M.J. Quinn and N. Deo. An upper bound for the speedup of parallel
best-bound branch-and-bound algorithms. BIT, 26(1):35-43, 1986.

[47] W.R. Reynolds. A Markov Random Field Approach to Large Com-
binatorial Optimization Problems. PhD thesis, Clemson University,
Clemson, SC 29634, August 1993.

[48] M.I. Shamos. Computational Geometry. PhD thesis, Department of
Computer Science, Yale University, New Haven, CT, 1978.

[49] Justin R. Smith. The Design and Analysis of Parallel Algorithms. Ox-
ford University Press, Inc., New York, NY, 1993.

[50] D. Trietsch. Augmenting Euclidean networks - the Steiner case. SIAM


J. Appl. Math., 45:855-860, 1985.

[51] D. Trietsch and F. K. Hwang. An improved algorithm for Steiner trees.


SIAM J. Appl. Math., 50:244-263, 1990.

[52] P. Winter. An algorithm for the Steiner problem in the Euclidean plane.
Networks, 15(3):323-345, Fall 1985.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 159-260
©1998 Kluwer Academic Publishers

Resource Allocation Problems


Naoki Katoh
Department of Architecture and Architectural Systems
Graduate School of Engineering
Kyoto University, Sakyo, Kyoto, Japan 606-8501
E-mail: naoki@archi.kyoto-u.ac.jp

Toshihide Ibaraki
Department of Applied Mathematics and Physics
Graduate School of Informatics
Kyoto University, Sakyo, Kyoto, Japan 606-8501
E-mail: ibaraki@kuamp.kyoto-u.ac.jp

Contents
1 Introduction 160

2 Preliminaries and Problem Classification 164


2.1 Preliminaries . . . . . . . 164
2.2 Problem Classification . . . . . . . . . . . . . . 166

3 Fundamental Algorithms 172


3.1 SC/Simple/D............................ 172
3.1.1 Incremental algorithm . . . . . . . . . . . . . . . . . . . . . . 172
3.1.2 Polynomial algorithm . . . . . . . . . . . . . . . 174
3.2 SC/SM/D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
3.2.1 Incremental algorithm . . . . . . . . . . . . . . . . . . . . . . 184
3.2.2 Decomposition algorithm . . . . . . . . . . . . . . . . . . 187
3.3 SC/GUB/D, SC/Nested/D, SC/Tree/D and SC/Network/D . . . . . 192
3.4 Minimax and Maximin Problems . . . . . . . . . . . . . 194
3.5 Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

4 Proximity Theorems 197


4.1 Proximity theorem for general linear constraints 198
4.2 Algorithm for problem IP . . . . . . . . . . . . 203
4.3 Proximity theorem for submodular constraints 205
4.4 Notes . . . . . . . . . . . . . . . . . . . . . . . 208

5 Lower Bounds on Time Complexity and Improved Algorithms 209


5.1 Improved algorithms for SC/GUB/D, SC/Nested/D and SC/Tree/D 210
5.2 Lower bounds . . . . . . . . . . . . . . . . . . . . 214
5.2.1 Lower bound on a comparison model . . . . . . . . . . . . . . 214
5.2.2 Lower bound on an algebraic tree model . . . . . . . . . . . . 215
5.3 Strongly polynomial algorithms for separable quadratic convex ob-
jective functions .. 216
5.4 Notes and references . . . . . . . . . . . . . . . . . . . . . . . . .. 220

6 Nonseparable Convex Resource Allocation 220


6.1 M-convex functions . . . . . . . . . . . . . . . . . . . . . . . . 221
6.2 A polynomial algorithm for minimizing an M-convex function. . 222

7 Applications 225
7.1 Computer science applications. 225
7.2 Reliability applications . . . . . 232
7.3 Production planning applications 234
7.4 Other applications . 236
7.5 Notes and references . . . . . . . 238

8 Further Topics 239


8.1 Multiple resource allocation . . . . . . . . . . . . . . . . . . . . .. 240
8.2 Multiperiod resource allocation . . . . . . . . . . . . . . . . . . .. 245
8.3 Minimizing a separable convex function under a nonlinear constraint 247
8.4 Other variants of resource allocation problems. . . . . . . . . 248
8.4.1 Resource allocation problem with smoothing objective 248
8.4.2 Minimum variance resource allocation problem 249
8.5 Notes and References. . . . . . . . . . . . . . . . . . . . . . . 251

References

1 Introduction
The resource allocation problem seeks to find an optimal allocation of a fixed
amount of resources to activities so as to minimize the cost incurred by the

allocation. The simplest form of the problem is to minimize a separable convex
function under a single constraint concerning the total amount of resources
to be allocated. The amount of resources to be allocated to each activity
is treated as a continuous or integer variable, depending on the case. This
can be viewed as a special case of the nonlinear programming problem or
the nonlinear integer programming problem.
Due to its simple structure, this problem is encountered in a variety
of application areas, including load distribution, production planning, com-
puter resource allocation, queueing control, portfolio selection, and appor-
tionment. The first explicit investigation of the resource allocation problem
was due to the paper by Koopman [76] of 1953 that dealt with the optimal
distribution of efforts which arises in the problem of searching for an object
whose position is a random variable. Since then, a great number of papers
have been published on resource allocation problems. Efficient algorithms
have also been developed, depending on the form of objective functions and
constraints or on the type of variables (i.e., continuous or integer).
In 1988, the present authors published a book [58] that gave a
comprehensive review of the state-of-the-art of the problems. Since then,
ten years have passed, during which many papers have been published on
resource allocation problems. Significant progress has been made on the
algorithm side. Also, new generalizations and variants of the problem have
been investigated, and new application fields have been discovered. The
main purpose of this paper is to give a brief overview of the recent progress
on the theory and applications, putting emphasis on the cases with discrete
variables.
In order to give a perspective view of these years, let us start with the
simplest type of the problem, which is the separable convex resource allocation
problem with discrete variables, and is described as follows:

minimize    Σ_{j=1}^n f_j(x_j)
subject to  Σ_{j=1}^n x_j = N,
            x_j ≥ 0, integer, j = 1, ..., n.

Because of its simple constraint, this problem is also referred to as the sim-
ple resource allocation problem. For this problem, a simple greedy-type

algorithm was developed by Gross [45] (the same algorithm has been redis-
covered by many researchers, e.g., Fox [33] and Shih [105]). The running
time of this algorithm is O(N log n+n). However, this time complexity is not
polynomial but pseudo-polynomial in the input size, which is O(n + log N).
Thus, efficient polynomial time algorithms have been studied by [34, 41, 66].
The fastest among them is due to Frederickson and Johnson [34] and runs
in O(max{n,nlog(N/n)}) time.
In the 1980's, the constraint in the above problem was generalized,
while preserving the polynomial time solvability. Polynomial time algo-
rithms have been developed for the one with submodular constraints, which
include as special cases the problems with nested, tree and network con-
straints [35, 37, 44, 58]. It was shown that a greedy procedure that works for
the simple allocation problem also works for the one with submodular con-
straints [31, 58]. The book by Fujishige [37] gave a comprehensive overview
of the resource allocation problem under submodular constraints.
Recently, Ando, Fujishige and Naitoh [5, 6] considered the separable
convex resource allocation problem for a bisubmodular system and for a
finite jump system whose underlying constraint can be viewed as a gener-
alization of submodular constraint. They developed greedy algorithms for
such problems. For the case of a bisubmodular system, a polynomial time
algorithm has been developed by Fujishige [38]. Also, Hochbaum and Shan-
thikumar [55] showed that, for a class of general linear constraints, efficient
algorithms can be developed. The running time of their algorithm depends
on the maximum absolute value of the subdeterminants, Δ, and if Δ = 1
(i.e., the constraint matrix is totally unimodular), the running time becomes
polynomial. The idea is based on the proximity result between the integral
and continuous optimal solutions. For the case of Δ = 1, Karzanov and
McCormick [64] very recently proposed another polynomial time algorithm.
Hochbaum [51] has developed another stronger proximity result to improve
the previous algorithm for the separable convex resource allocation problem
with a submodular constraint. It has also been shown that, if specialized
to the separable convex resource allocation problem with nested and tree
constraints, the existing algorithms can be improved.
In addition to these efforts to generalize the constraints, new progress
has recently been made towards generalizing objective functions for which
efficient algorithms can still be developed. This research was done by Murota
[90, 91], who identified a subclass of nonseparable convex functions, M-convex
functions, which are defined on the base polyhedron of a submodular system.
The M-convex functions enjoy nice theorems of discrete convex analysis

in a parallel manner to the traditional convex analysis. A polynomial time


algorithm has been developed for this class of problems [106].
Another interesting research field is the minimax resource allocation
problem. The simplest of this type is described as follows:

minimize    max_{1≤j≤n} f_j(x_j)
subject to  Σ_{j=1}^n x_j = N,
            x_j ≥ 0, integer, j = 1, ..., n.

Here all f_j(x_j) are nonincreasing (or nondecreasing) in x_j. This class of prob-
lems has been extensively studied due to its simple structure as well as rich
applications. Theoretically, the minimax resource allocation problem can
be equivalently transformed into the above simple resource allocation prob-
lem with a separable convex objective function. Therefore, equally efficient
algorithms can be developed for minimax problems. As for the extensions
and generalizations, minimax resource allocation problems with multiple
resources and multiperiod resource allocation problems have recently been
studied in connection with applications to production planning in high-tech
industries [70, 74, 71, 72, 73, 84, 75].
As for applications of resource allocation problems, a few new fields have
been pointed out, such as (1) optimal allocation of computer resources,
e.g., determining optimal buffer size or optimal cache size, (2) computer
scheduling, (3) software reliability, and (4) queueing control.
The organization of this paper is as follows. Section 2 prepares basic
concepts, and defines several types of resource allocation problems to be
discussed in this paper. Section 3 gives fundamental algorithms for the sep-
arable convex and minimax resource allocation problems with several differ-
ent types of constraints. All the algorithms presented in Section 3 are more
than ten years old, and were surveyed in the book of [58]. The correctness
proofs given in Section 3 for these problems with submodular constraints
are however much simplified, by using the recent result by Murota [91]. Sec-
tions 4 through 6 review rather new results developed in 1990's. In Section
4, we explain proximity theorems between integer and continuous optimal
solutions for separable convex resource allocation problems under submod-
ular constraints as well as under linear constraints with totally unimodular
constraint matrix. In Section 5, based on the proximity theorem given in

Section 4, we present the improved algorithms developed by Hochbaum [51]
for several types of separable convex resource allocation problems. None of
these algorithms is strongly polynomial. We then show two lower bound
results for the simple resource allocation problem that deny the existence
of a strongly polynomial time algorithm even if each f_j is a polynomial of
degree three. This result is also due to Hochbaum [51]. We then present
strongly polynomial time algorithms of Hochbaum and Hong [53], which are
based on the proximity theorems, for several types of separable convex re-
source allocation problems in which each f_j is quadratic. Section 6 reviews
a new development of the nonseparable convex resource allocation problem
related to minimizing an M-convex function. Section 7 explains several new
applications of resource allocation problems. Section 8 briefly touches upon
further related topics, including allocation of multiple resources, multiperiod
allocation, and other variants of resource allocation problems.

2 Preliminaries and Problem Classification


2.1 Preliminaries
We first prepare necessary definitions and basic concepts for the discussion
in this chapter.
Let R and Z be the set of reals and the set of integers, respectively.
A function f(x) : R^n → R is convex if f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)
holds for all x, y ∈ R^n and all α with 0 ≤ α ≤ 1. For a function
f(x) : Z^n → R ∪ {+∞}, f is defined to be convex if it can be extended
to a convex function in the usual sense; i.e., there exists a convex function
F : R^n → R ∪ {+∞} such that F(x) = f(x) for all x ∈ Z^n. We often consider
a function f(x) : [l, u] ∩ Z → R, where l, u ∈ Z and [l, u] denotes the closed
interval from l to u. Let us define

d(x) ≡ f(x) − f(x − 1) for x ∈ [l + 1, u] ∩ Z,

which is called the increment of f at x. A function f(x) : [l, u] ∩ Z → R is
called convex if d(x) is nondecreasing in x (this definition coincides with the
above if l = −∞ and u = +∞).
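
As a small illustration of this definition, the following Python sketch (our
own, not part of the original text) tests whether a function on [l, u] ∩ Z is
convex by checking that its increments d(x) are nondecreasing:

def is_discrete_convex(f, l, u):
    # f is convex on [l, u] iff d(x) = f(x) - f(x - 1) is
    # nondecreasing for x = l + 1, ..., u.
    d = [f(x) - f(x - 1) for x in range(l + 1, u + 1)]
    return all(d[i] <= d[i + 1] for i in range(len(d) - 1))

print(is_discrete_convex(lambda x: x * x, -5, 5))     # True
print(is_discrete_convex(lambda x: -abs(x), -5, 5))   # False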
[Computation Model] Since f(x) is assumed to be real-valued, it may
take any real number, implying that the output of f(x) may require an
infinite length. Thus, it is impossible to represent f(x) exactly in finite
digits. On the other hand, in order to analyze the computational complexity

of an algorithm, we adopt, unless otherwise stated, a computation model
in which the evaluation of f(x) can be done in a unit of time. Although this
sounds unrealistic, it does not produce serious troubles in practice, because
an approximate evaluation of f(x) (in finite digits) can usually be done in a
unit of time. If an evaluation of f(x) requires not constant but O(g(x)) time,
it is usually straightforward to reevaluate the computation time obtained
under the unit time assumption for this more general situation. We also assume
that basic operations, such as an arithmetic operation and a comparison
operation concerning real numbers, require a unit of time.
[Submodular System] Let n be a positive integer and let

E = {1, 2, ..., n}.

Let 2^E denote the family of all subsets of E, and let D ⊆ 2^E be given. If D
is closed under the union and intersection operations, D is called a distributive
lattice. We assume throughout this chapter that D denotes a distributive
lattice with ∅, E ∈ D, and that r(∅) = 0. A function r : D → Z is submodular
over D if
over V if

r(X) + r(Y) ≥ r(X ∪ Y) + r(X ∩ Y)  (2.1)


for all X, Y ∈ D. Such a pair (D, r) is called a submodular system. At
first glance, it may be difficult to understand the intuitive meaning of (2.1),
but it captures the essence of combinatorial structures that are common
to apparently different problems [37, 81]. In fact, as will be described in
Sections 2.2 and 3.2, the set of additional constraints that are often required
in real applications can be defined in terms of a submodular system.
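
For intuition, submodularity of a concrete set function over a small ground
set can be verified directly against (2.1). The Python sketch below is our
own example; the rank function shown (the rank of the uniform matroid
U(2,3)) is chosen only for illustration:

from itertools import combinations

def all_subsets(E):
    E = list(E)
    return [frozenset(c) for k in range(len(E) + 1)
            for c in combinations(E, k)]

def is_submodular(r, D):
    # Check r(X) + r(Y) >= r(X | Y) + r(X & Y) for all X, Y in D.
    return all(r(X) + r(Y) >= r(X | Y) + r(X & Y)
               for X in D for Y in D)

D = all_subsets({1, 2, 3})
print(is_submodular(lambda X: min(len(X), 2), D))    # True
print(is_submodular(lambda X: len(X) ** 2, D))       # False (supermodular)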
Define

R^E = {x = (x_1, x_2, ..., x_n) | x_j ∈ R, j ∈ E},  (2.2)
Z^E = {x = (x_1, x_2, ..., x_n) | x_j ∈ Z, j ∈ E}.  (2.3)

The unit basis vector e(j) = (e_1(j), ..., e_n(j)) ∈ Z^E is defined by e_i(j) = 1
for i = j and 0 for i ≠ j. For vectors x, y ∈ R^E, we write x ≤ y if x_j ≤ y_j
for all j ∈ E, and write x < y if x ≤ y and x_j < y_j for some j ∈ E. For
x ∈ R^E and S ⊆ E, we use the notation

x(S) ≡ Σ_{j∈S} x_j,

where x(∅) = 0 is assumed by convention. For a submodular system (D, r),

P(r) = {x | x ∈ R^E, x(S) ≤ r(S) for all S ∈ D}  (2.4)

is called the submodular polyhedron of (D, r). A subset of P(r),

B(r) = {x | x ∈ P(r) and x(E) = r(E)}  (2.5)

is called the base polyhedron of (D, r), and x ∈ B(r) is a base of (D, r). In
particular, x ∈ B(r) ∩ Z^E is called an integral base of (D, r). B(r) ∩ Z^E
is called an integral base polyhedron, and we also use the notation B(r) to
denote B(r) ∩ Z^E unless confusion occurs.
In Section 3.2, we shall provide properties of a submodular system that
are necessary to develop efficient algorithms for the resource allocation prob-
lem with a submodular constraint.

2.2 Problem Classification


A generic form of the resource allocation problem discussed in this paper is
described as follows:

RESOURCE: minimize    f(x_1, x_2, ..., x_n)  (2.6)
           subject to  Σ_{j=1}^n x_j = N,  (2.7)
                       x_j ≥ 0, j = 1, 2, ..., n.  (2.8)

That is, given one type of resource whose total amount is equal to N, we
want to allocate it to n activities so that the objective value f(x_1, x_2, ..., x_n)
is minimized. The objective value may be interpreted as the cost or loss, or
the profit or reward, incurred by the resulting allocation. In case of profit or
reward, it is natural to maximize f, and we shall sometimes consider maxi-
mization problems. The difference between maximization and minimization
is not essential, because maximizing f is equal to minimizing −f.
Each variable x_j represents the amount of resource allocated to activity
j. If the resource is divisible, x_j is a continuous variable that can take any
nonnegative value. If it represents persons, processors or trucks, however,
variable x_j becomes a discrete variable that takes nonnegative integer values,
and the constraint

x_j : integer, j = 1, 2, ..., n  (2.9)


is added to (2.7) and (2.8). The resource allocation problem with this con-
straint is referred to as the discrete resource allocation problem.

As for the objective function, it usually has some special structure ac-
cording to the intended applications. Typically, the following special case,
called separable, is often considered:

Σ_{j=1}^n f_j(x_j).  (2.10)

If each f_j is convex, the objective function is called separable convex.
Resource allocation problems are classified according to the types of
objective functions, constraints and variables. We shall first describe the
classification scheme we use in this paper, and several types of problem
formulations according to the classification scheme. In general, we use the
notation α/β/γ to denote the type of a resource allocation problem. Here
α specifies the type of objective function, β the constraint type, and γ the
variable type. Here γ = D and γ = C denote the integer variable and the
continuous variable, respectively. We shall now explain the notations for α
and β.
(1) α: Objective functions
The objective function f(x_1, ..., x_n) may take the following special struc-
tures:

1. Separable (S for short): Σ_{j=1}^n f_j(x_j), where each f_j is a function of one
variable.

2. Separable and convex (SC for short): Σ_{j=1}^n f_j(x_j), where each f_j is a
convex function of one variable. In particular, if each f_j is quadratic
and convex, we denote such a subclass by SQC.

3. Minimax: minimize max_{1≤j≤n} f_j(x_j),
Maximin: maximize min_{1≤j≤n} f_j(x_j),
where all f_j are monotone nondecreasing in x_j.

4. Lexicographically minimax (Lexico-Minimax for short): Since the ob-
jective value of Minimax is determined by the single variable x_k sat-
isfying f_k(x_k) = max_j f_j(x_j), there may be many optimal solutions.
To remove such ambiguity, we introduce the lexicographical ordering
for n-dimensional vectors: given a = (a_1, ..., a_n) and b = (b_1, ..., b_n),
a is lexicographically smaller than b (i.e., b is lexicographically greater
than a) if a_j = b_j for j = 1, ..., k − 1 and a_k < b_k for some k. This
is denoted by a <_lex b or b >_lex a. For a = (a_1, ..., a_n), denote by
DEC(a) (respectively, INC(a)) the n-tuple of a_j, j = 1, ..., n, ar-
ranged in nonincreasing order (respectively, nondecreasing order) of
their values (e.g., for a = (4, 3, 1, 5), we have DEC(a) = (5, 4, 3, 1)
and INC(a) = (1, 3, 4, 5)). The objective of Lexico-Minimax is to find
an allocation vector x = (x_1, ..., x_n) such that DEC(x) is lexicographically
minimum. Notice that an optimal solution to Lexico-Minimax is also
optimal to Minimax, but the converse is not generally true. This is a
refined objective of Minimax. Similarly, we define Lexico-Maximin as
the one that lexicographically maximizes INC(x).

5. Fair: minimize g(max_{1≤j≤n} f_j(x_j), min_{1≤j≤n} f_j(x_j)), where g(u, v) is
nondecreasing (resp. nonincreasing) in u (resp. v). This objective is
a generalization of Minimax and Maximin.

(2) β: Constraints

In addition to the simple resource constraint of (2.7), other additional con-
straints are also imposed. Typical additional constraints which appear in
various resource allocation problems are as follows. We refer to the case of
no additional constraints as Simple (see Figure 2.1(1)).

1. Lower and upper bounds (LUB for short): l_j ≤ x_j ≤ u_j, j = 1, ..., n.


2. Generalized upper bounds (GUB for short): Σ_{j∈S_i} x_j ≤ b_i, i = 1, ..., m,
where S_1, S_2, ..., S_m is a partition of {1, 2, ..., n} (see Figure 2.1(2)).

3. Nested constraints (Nested for short): Σ_{j∈S_i} x_j ≤ b_i, i = 1, ..., m,
where S_1 ⊂ S_2 ⊂ ... ⊂ S_m. We can assume b_1 ≤ b_2 ≤ ... ≤ b_m since
if b_i > b_{i+1}, the i-th constraint is redundant (see Figure 2.1(3)).

4. Tree constraints (Tree for short): Σ_{j∈S_i} x_j ≤ b_i, i = 1, ..., m, where
the sets S_i are derived by some hierarchical decomposition of E into
disjoint subsets (see Figure 2.1(4)).

5. Network constraints (Network for short): The constraint is defined in
terms of a directed network with a single source and multiple sinks (see
Figure 2.1(5)). Given a directed graph G = (V, A) with node set V and
arc set A, let s ∈ V be the source and T ⊆ V be the set of sinks. The
amount of supply from the source is N > 0, and the capacity of arc
(u, v) is c(u, v). Let A_+(v) and A_−(v) denote the sets of arcs leaving
and entering node v, respectively. Denote the flow vector by φ =
{φ(u, v) | (u, v) ∈ A}. φ is a feasible flow in G if it satisfies

0 ≤ φ(u, v) ≤ c(u, v), (u, v) ∈ A,  (2.11)

Σ_{(v,w)∈A_+(v)} φ(v, w) − Σ_{(u,v)∈A_−(v)} φ(u, v) = 0, v ∈ V − T − {s},  (2.12)

Σ_{(s,v)∈A_+(s)} φ(s, v) − Σ_{(u,s)∈A_−(s)} φ(u, s) = N,  (2.13)

x_t(φ) ≡ Σ_{(u,t)∈A_−(t)} φ(u, t) − Σ_{(t,v)∈A_+(t)} φ(t, v) ≥ 0, t ∈ T,  (2.14)

Σ_{t∈T} x_t(φ) = N.  (2.15)

The value x_t(φ) denotes the amount of flow entering a sink t ∈ T. For
a feasible flow φ, the vector {x_t(φ) | t ∈ T} is called the feasible flow
vector with respect to φ. For instance, the problem SC/Network/C
(i.e., the separable convex resource allocation problem under network
constraints) is defined as follows:

minimize    Σ_{t∈T} f_t(x_t(φ))  (2.16)
subject to  (2.11) − (2.15),

where f_t for each t ∈ T is a convex function.


6. Submodular constraints (8M for short): A set offeasible solutions is de-
fined by a base polyhedron B{r) = {x E RE I x{S) :::; r(S), for all S E
D and x(E) = r{E)} of (2.5) for a submodular system (D,r), i.e.,

x E B(r). (2.17)

In this case, notice that the constraint (2.7) is included in the con-
straints of (2.17) as x{E) = r{E) in the above definition. If we consider
the case of integer variables, the constraint is defined by

x E B(r) n ZE.
We assume throughout this paper that B{r) of the constraint (2.17)
is not explicitly given as an input, but is implicitly given through an
oracle that tells us the value r(X) when X is given.

7. General linear constraints (Linear for short): Constraints are defined
by a set of linear inequalities,

Σ_{j=1}^n a_{ij} x_j ≤ b_i, i = 1, 2, ..., m.  (2.18)

No other special assumption is imposed on the structure of the con-
straints.

Notice that all the constraints LUB, GUB, Nested, Tree and Network are
special cases of submodular constraints (see [58]), and SM is a special case
of Linear.

Figure 2.1: Illustration of several constraints: (1) Simple, (2) GUB, (3) Nested, (4) Tree, (5) Network.



3 Fundamental Algorithms
In this section, we first deal with the simplest problem SC/Simple/D and
review two basic algorithms. We then consider a more general problem
SC/SM/D and present two algorithms. We also discuss a relationship be-
tween SC/·/D and Minimax (or Maximin)/·/D.

3.1 SC/Simple/D
3.1.1 Incremental algorithm
We first introduce an incremental algorithm for the simple resource allo-
cation problem, SC/Simple/D. We assume that each f_j is defined over the
interval [0, N]. Since f_j is convex, we have

d_j(y) ≤ d_j(y + 1), y = 1, 2, ..., N − 1,  (3.1)

where

d_j(y) = f_j(y) − f_j(y − 1).
The incremental algorithm is a kind of greedy algorithm, and is also called a
marginal allocation method. Starting with the initial solution x = (0, 0, ..., 0),
one unit of resource is allocated at each iteration to the most favorable ac-
tivity (in the sense of minimizing the increase in the current objective value)
until Σ_j x_j = N is attained.

Procedure INCREMENT
Input: An instance of SC/Simple/D.
Output: An optimal solution x*.
Let x := (0, 0, ..., 0) and k := 0;
while k < N do
begin
  Find j* such that d_{j*}(x_{j*} + 1) = min_{1≤j≤n} d_j(x_j + 1);
  x_{j*} := x_{j*} + 1;
  k := k + 1
end;
Output x as x*.
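
As an illustration, INCREMENT can be implemented with a binary heap
over the increments d_j(x_j + 1), which realizes the O(N log n + n) bound of
Theorem 3.2 below. The Python sketch is our own, and the quadratic
objective in the example is hypothetical:

import heapq

def increment(d, n, N):
    # d(j, y) returns d_j(y) = f_j(y) - f_j(y - 1); by convexity
    # (3.1), d(j, .) is nondecreasing for each j.
    x = [0] * n
    heap = [(d(j, 1), j) for j in range(n)]   # next increment per activity
    heapq.heapify(heap)
    for _ in range(N):
        _, j = heapq.heappop(heap)
        x[j] += 1
        heapq.heappush(heap, (d(j, x[j] + 1), j))
    return x

# Example: f_j(x) = c_j x^2, so d_j(y) = c_j (2y - 1).
c = [1.0, 2.0, 0.5]
print(increment(lambda j, y: c[j] * (2 * y - 1), n=3, N=6))   # [2, 1, 3]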

Theorem 3.1 Given an instance of SC/Simple/D, define the following N × n
matrix M, whose j-th column lists the increments of f_j:

      ( d_1(1)  d_2(1)  ...  d_n(1) )
M =   ( d_1(2)  d_2(2)  ...  d_n(2) )   (3.2)
      (  ...     ...    ...   ...   )
      ( d_1(N)  d_2(N)  ...  d_n(N) )

Let D be the set of N smallest elements in M, where if d_j(y − 1) = d_j(y),
d_j(y − 1) is given a higher priority to be chosen in D. Then the following
x* is an optimal solution of SC/Simple/D.

        { 0,  d_j(1) ∉ D,
x*_j =  { N,  d_j(N) ∈ D,   (3.3)
        { y,  d_j(y) ∈ D and d_j(y + 1) ∉ D.

PROOF. Suppose that x* is not optimal, and let y be an optimal solution
such that

δ = Σ_{j=1}^n |y_j − x*_j|

is minimum. Then from Σ_{j=1}^n y_j = Σ_{j=1}^n x*_j = N, there exist i and k such
that x*_i > y_i and x*_k < y_k. From (3.1), we have

d_i(y_i + 1) ≤ d_i(x*_i), d_k(x*_k + 1) ≤ d_k(y_k).

Since d_i(x*_i) ≤ d_k(x*_k + 1) from d_k(x*_k + 1) ∉ D and d_i(x*_i) ∈ D, it follows
that d_i(y_i + 1) ≤ d_k(y_k). For the feasible solution y' = y + e(i) − e(k), we have

Σ_{j=1}^n f_j(y'_j) − Σ_{j=1}^n f_j(y_j) = d_i(y_i + 1) − d_k(y_k) ≤ 0.

This says that y' is also an optimal solution of SC/Simple/D, contradicting
the minimality of δ. □

Theorem 3.2 Procedure INCREMENT correctly computes an optimal so-
lution in O(N log n + n) time.

PROOF. Let x^k denote the solution after k increments are made in the while
loop. Then it is an easy exercise to prove by induction that

D_k = {d_j(y) | y = 1, 2, ..., x^k_j, j = 1, 2, ..., n}

is a set of k smallest elements in M. Thus, the correctness immediately
follows from Theorem 3.1. For the running time, we maintain the set

S = {d_j(x_j + 1) | j = 1, 2, ..., n}

in an appropriate data structure such as a heap. In each iteration of the while
loop, the minimum element d_{j*}(x_{j*} + 1) is chosen and S is thereby updated.
Since the initial construction of S requires O(n) time, finding the min-
imum element of S and the update of S require O(log n) time, and the while
loop is iterated N times, the running time of O(N log n + n) is derived. □

Algorithm INCREMENT is easy to implement, and is efficient if N is not
too large. However, it is not polynomial in the input size, because the input
size is O(n + log N), where it is assumed that the description of each f_j
requires a constant input length.

3.1.2 Polynomial algorithm


Several polynomial time algorithms have been developed for problem
SC/Simple/D [34, 41, 66]. The fastest among them is proposed by Fred-
erickson and Johnson [34]. Its running time is O(max{n, n log(N/n)}), but
the algorithm is very complicated. For this reason, here we present the
O(n(log N)^2) time algorithm by Galil and Megiddo [41], which is a divide-
and-conquer method to find the N-th smallest element in the matrix M of (3.2).
Recall that each column M_j of M is already sorted in nondecreasing or-
der by property (3.1).
Once the N-th smallest element d_{j*}(y*) in M is found, we let λ = d_{j*}(y*)
and construct an optimal solution x* as follows (where we have to take care
of the situation that many d_j(y) may satisfy d_j(y) = λ). First compute
p_j(λ) and q_j(λ) for each j with j = 1, 2, ..., n (defined below) by binary
search.

p_j(λ) = max{y | y ∈ {1, 2, ..., N} and d_j(y) < λ},
q_j(λ) = max{y | y ∈ {1, 2, ..., N} and d_j(y) ≤ λ}.  (3.4)

Then compute the index j' such that

j' = max{j | Σ_{k=1}^j q_k(λ) + Σ_{k=j+1}^n p_k(λ) ≤ N},

and let

        { q_j(λ),  j ≤ j',
x*_j =  { N − (Σ_{k=1}^{j'} q_k(λ) + Σ_{k=j'+2}^{n} p_k(λ)),  j = j' + 1,   (3.5)
        { p_j(λ),  j ≥ j' + 2.

Notice that, after λ = d_{j*}(y*) is computed, the time required to compute x*
in this way is O(n log N), because the computation of p_j(λ) and q_j(λ) can be
done in O(log N) time for each j by applying binary search over [1, N].
Then it is easy to see that the set {d_j(y) | 1 ≤ y ≤ x*_j, j = 1, ..., n} is the
set of N smallest elements of M. Thus, the x* defined above is optimal to
SC/Simple/D.
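
For concreteness, the binary searches for p_j(λ) and q_j(λ) over the sorted
column d_j(1) ≤ ... ≤ d_j(N) can be sketched in Python as follows (our own
illustration; we adopt the convention that 0 is returned when no y qualifies):

def p_q(d_j, N, lam):
    # Returns (p_j(lam), q_j(lam)) of (3.4); each search is O(log N).
    def last(pred):                 # largest y in [1, N] with pred(y)
        lo, hi = 0, N
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if pred(mid):
                lo = mid
            else:
                hi = mid - 1
        return lo
    return last(lambda y: d_j(y) < lam), last(lambda y: d_j(y) <= lam)

# Example with d_j(y) = 2y - 1 (from f_j(x) = x^2):
print(p_q(lambda y: 2 * y - 1, 10, 7))   # (3, 4)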
Now, we shall focus on how to compute the N-th smallest element d_{j*}(y*)
in M. Let s denote the size of M (initially, s = nN). First, compute the
medians d_j(y_j^m) of the columns M_j for all j, which can be computed in O(1)
time for each j because each M_j is already sorted. We sort these medians
in nondecreasing order,

d_{j_1}(y_{j_1}^m) ≤ d_{j_2}(y_{j_2}^m) ≤ ... ≤ d_{j_n}(y_{j_n}^m),

and compute the k such that

Σ_{i=1}^{k−1} |M_{j_i}| < s/2 and Σ_{i=1}^{k} |M_{j_i}| ≥ s/2.  (3.6)

We then choose

λ = d_{j_k}(y_{j_k}^m)

as a candidate for the N-th smallest element of M. For this λ, define

M1 = {d_j(y) ∈ M | d_j(y) < λ},
M2 = {d_j(y) ∈ M | d_j(y) = λ},
M3 = {d_j(y) ∈ M | d_j(y) > λ}.

If |M1| ≥ N, the N-th smallest element of M must be in M1, and the
above procedure is recursively applied to M1. If |M1| < N ≤ |M1| + |M2|,
λ = d_{j_k}(y_{j_k}^m) is the N-th smallest element of M. If |M1| + |M2| < N, the N-
th smallest element is in M3. In this case, the above procedure is recursively
applied to M3 after setting N := N − (|M1| + |M2|).

At each iteration, the size of the current matrix M systematically de-
creases, while maintaining the property that it contains the target element
selected in the previous iteration, and we eventually find the N-th smallest
element of M. We claim that, at each iteration, at most three-fourths of
the elements in the current matrix M remain in the M of the next it-
eration. To see this, first note that by the choice of k, the set M' of Figure
3.1 contains Σ_{i=1}^k ⌈|M_{j_i}|/2⌉ (≥ |M|/4) elements. Since each element in M' is
not larger than λ = d_{j_k}(y_{j_k}^m), M' ⊆ M1 ∪ M2 follows. Hence,

|M3| ≤ (3/4)|M|

holds. Applying a similar argument to M'', we obtain

|M1| ≤ (3/4)|M|.

From these, we conclude

max{|M1|, |M3|} ≤ (3/4)|M|,

proving the above claim.

From this observation, it is concluded that O(log N) iterations are re-
quired before |M| ≤ n holds. At this point, a linear time selection algorithm
is directly applied to the current M of size O(n) to identify the N-th smallest
element. The algorithm is described as follows:

Procedure SELECT
Input: An instance of problem SC/Simple/D.
Output: An optimal solution x*.
Let l_j := 1 and u_j := N for j = 1, 2, ..., n;
Let L_j := u_j − l_j + 1 for j = 1, 2, ..., n and s := Σ_{j=1}^n L_j;
while s > n do
begin
  Find the medians d_j(y_j^m) of M_j, j = 1, 2, ..., n, and sort them
  in nondecreasing order, d_{j_1}(y_{j_1}^m) ≤ d_{j_2}(y_{j_2}^m) ≤ ... ≤ d_{j_n}(y_{j_n}^m);
  Compute the k such that Σ_{i=1}^{k−1} |M_{j_i}| < s/2 and Σ_{i=1}^{k} |M_{j_i}| ≥ s/2,
  and let λ := d_{j_k}(y_{j_k}^m);
  Compute p_j(λ) and q_j(λ) of (3.4) for j = 1, 2, ..., n, and let
  L1 := Σ_{j=1}^n (p_j(λ) − l_j + 1), and L2 := Σ_{j=1}^n (q_j(λ) − l_j + 1);
  if L1 < N ≤ L2 then output λ = d_{j_k}(y_{j_k}^m) and halt

Figure 3.1: Illustration of M, M' and M''. (M' consists of the upper halves,
up to and including the median, of columns M_{j_1}, ..., M_{j_k}; M'' of the lower
halves, from the median on, of columns M_{j_k}, ..., M_{j_n}.)

  else
    if L1 ≥ N then let u_j := p_j(λ), j = 1, ..., n
    else let N := N − L2 and l_j := q_j(λ) + 1, j = 1, ..., n
end;
Find the N-th smallest element λ = d_{j*}(y*) in M, and construct an optimal
solution x* by (3.5).

We assume in the algorithm that M_j denotes the j-th column of the
current M with lower and upper bounds, i.e., M_j = {d_j(l_j), ..., d_j(u_j)}.
Thus, the current M is given by M = ∪_{1≤j≤n} M_j.

Theorem 3.3 Given an instance of problem SC/Simple/D, procedure SE-
LECT correctly finds the N-th smallest element in M of (3.2) in O(n(log N)^2)
time.

PROOF. The correctness directly follows from the discussion prior to the
description of the algorithm. The running time is analyzed as follows. First,
the medians d_j(y_j^m) of M_j, j = 1, 2, ..., n are computed in O(n) time,
because each column is already sorted as in (3.1). Sorting these n medians
requires O(n log n) time. Computation of p_j(λ) and q_j(λ) for each j is
done by binary search and requires O(log |M_j|) = O(log N) time. Thus
O(n log N) time is required in total. As mentioned earlier, the while loop
of SELECT is iterated O(log N) times, and the total running time is

O((n log N + n log n) log N).

This running time is further improved by repeatedly applying a linear time
median finding algorithm for identifying the k of (3.6) and λ = d_{j_k}(y_{j_k}^m),
instead of sorting all medians d_j(y_j^m). The details are omitted (see [41] or
the book [58] for details). Using this strategy, the running time is reduced
to

O((n log N + n) log N) = O(n(log N)^2).


It is clear that the two algorithms INCREMENT and SELECT can be
generalized in a straightforward manner to deal with the constraints of LUB
without changing the time complexity.

3.2 SC/SM/D
The problem SC/SM/D is described as follows:

minimize    Σ_{j∈E} f_j(x_j)
subject to  x = (x_1, x_2, ..., x_n) ∈ B(r),   (3.7)
            x_j : nonnegative integer, j ∈ E,

where E = {1, ..., n} and B(r) denotes the base polyhedron of a submodular
system, as defined in Section 2.1.
Before describing algorithms for solving (3.7), we shall give some defi-
nitions and properties concerning a submodular system, which are used to
develop the algorithms. Proofs of the properties are not given except for
Lemma 3.7.

Lemma 3.4 For a submodular system (D, r) and a vector x ∈ R^E, define
a function r^x : 2^E → R by

r^x(X) = min{r(Z) + x(X − Z) | Z ⊆ X, Z ∈ D}, X ⊆ E.  (3.8)

Then r^x is a submodular function defined over 2^E. □



This submodular system (2^E, r^x) is called the restriction of (D, r) by
vector x. The base polyhedron B(r^x) of (2^E, r^x) is given by

{y | y ∈ B(r), y ≤ x}.  (3.9)

Lemma 3.5 For a submodular system (D, r) and a vector x ∈ R^E, define
a function r_x : 2^E → R by

r_x(X) = min{r(Z) − x(Z − X) | X ⊆ Z, Z ∈ D}, X ⊆ E.  (3.10)

Then r_x is a submodular function defined over 2^E.

This submodular system (2^E, r_x) is called the contraction of (D, r) by
vector x. The base polyhedron B(r_x) of (2^E, r_x) is given by

{y | y ∈ B(r), y ≥ x}.  (3.11)


Since we are interested in the set of integral bases in B(r) which also sat-
isfy the nonnegativity constraints on x_j, we assume throughout this section
that the underlying submodular system is equal to (2^E, r_0), which is obtained
by contracting a submodular system (D, r) by the zero vector 0. Therefore, we
assume in what follows that B(r) denotes the base polyhedron corresponding
to (2^E, r_0), and that we consider the following problem instead of (3.7):

minimize    Σ_{j∈E} f_j(x_j)
subject to  x: an integral base of B(r).   (3.12)

For a submodular system (D, r) and a vector x ∈ P(r) (see (2.4) for the
definition of P(r)), define the saturation function sat : P(r) → 2^E by

sat(x) = {j | j ∈ E, x + d·e(j) ∉ P(r) for any d > 0}.  (3.13)

We also define the dependence function dep : P(r) × E → 2^E by

dep(x, j) = {j' | j' ∈ E, x + d·(e(j) − e(j')) ∈ P(r) for some d > 0}  (3.14)

if j ∈ sat(x), and dep(x, j) = ∅ otherwise. The following lemma character-
izes sat and dep (see [37] for its proof).

Lemma 3.6 (i) For x ∈ P(r), we have

sat(x) = ∪_{X∈D: x(X)=r(X)} X.  (3.15)

(ii) For x ∈ P(r) and j ∈ E, we have

dep(x, j) = ∩_{X∈D: j∈X, x(X)=r(X)} X.  (3.16)

For x ∈ P(r) and j ∈ E, define the saturation capacity as

c(x, j) = max{d | d ≥ 0, x + d·e(j) ∈ P(r)}.  (3.17)

Similarly, for x ∈ P(r), j ∈ sat(x) and j' ∈ dep(x, j) − {j}, define the
exchange capacity as

c(x, j, j') = max{d | d > 0, x + d·(e(j) − e(j')) ∈ P(r)}.  (3.18)

The following lemma is a basis of the algorithms presented in this section


and Section 6.

Lemma 3.7 Let B(r) be a base polyhedron as discussed above.
(i) For any integral base x of B(r), j ∈ E, and j' ∈ dep(x, j) − {j},

x + e(j) − e(j')

is also an integral base of B(r).
(ii) For any two distinct integral bases x, y ∈ B(r), and for any i ∈ E
with x_i > y_i, there exists j ∈ E with x_j < y_j such that

x − e(i) + e(j) ∈ B(r) and y + e(i) − e(j) ∈ B(r).

PROOF. We only give the proof of (ii), since the proof of (i) is done in almost
the same fashion as that of Claim 3.8 below. We begin with the following
two claims.

Claim 3.8 For any two distinct integral bases x, y ∈ B(r), and for any
i ∈ E with x_i > y_i, there exists j ∈ E with x_j < y_j such that

y + e(i) − e(j) ∈ B(r).



PROOF. To prove the claim, let Y = dep(y, i). First we show that y(Y) =
r(Y) holds. Notice that, for any X, X' with i ∈ X ∩ X' such that y(X) =
r(X) and y(X') = r(X') hold, y(X ∩ X') = r(X ∩ X') also holds, because

y(X ∩ X') ≤ r(X ∩ X')
          ≤ r(X) + r(X') − r(X ∪ X')   (from the submodularity of r(·))
          ≤ r(X) + r(X') − y(X ∪ X')
          = y(X) + y(X') − y(X ∪ X')   (from y(X) = r(X) and
                                         y(X') = r(X'))
          = y(X ∩ X').

Therefore, repeating this argument, y(Y) = r(Y) follows from (3.16). Also,
|Y| ≥ 2 holds, since otherwise Y = {i} holds and thus r(Y) = y(Y) (= y_i) <
x_i = x(Y) holds, but this violates x ∈ B(r). Since x(Y) ≤ r(Y) = y(Y) and
x_i > y_i, there exists j ∈ Y with j ≠ i and x_j < y_j. Then we show that
y + e(i) − e(j) ∈ B(r). From the definition of Y, every X with i ∈ X and
y(X) = r(X) satisfies X ⊇ Y. Thus (y + e(i) − e(j))(X) = r(X) for all
such X. For X with i ∈ X and y(X) < r(X), (y + e(i) − e(j))(X) ≤ r(X)
follows from the integrality of y and r; and for X with i ∉ X, we have
(y + e(i) − e(j))(X) ≤ y(X) ≤ r(X). This proves the claim. □

Claim 3.9 For any two distinct integral bases x, y ∈ B(r) with ||x − y|| (=
Σ_j |x_j − y_j|) = 4, the lemma assertion (ii) holds.

PROOF. The claim is proved by case analysis. There are two cases.
Case 1. There exists exactly one i such that x_i > y_i. From ||x − y|| = 4,
x_i − y_i = 2. From Claim 3.8, there exists j with x_j < y_j such that y + e(i) −
e(j) ∈ B(r). For this j, applying Claim 3.8 with the roles of x and y
interchanged, x − e(i) + e(j) ∈ B(r) must follow, because i is the unique
element such that x_i > y_i.
Case 2. There are i, i' with i ≠ i' such that x_i > y_i and x_{i'} > y_{i'}. From
||x − y|| = 4, x_i − y_i = 1 and x_{i'} − y_{i'} = 1. Then, the claim can be proved
in a manner similar to Case 1. □

Now we shall prove the lemma. Assuming that the following set is not
empty, we will derive a contradiction:

S = {(x, y) | x, y ∈ B(r), ∃i* with x_{i*} > y_{i*} such that, for all j with
     x_j < y_j, it holds that x − e(i*) + e(j) ∉ B(r) or y + e(i*) − e(j) ∉ B(r)}.  (3.19)

Let us choose (x, y) from S such that ||x − y|| is minimum. We assume
||x − y|| ≥ 6, since otherwise the lemma is immediate from Claims 3.8 and

3.9. Let i* be the index satisfying the condition of (3.19) for this (x, y). We
first consider the case such that i* is the unique element satisfying x_{i*} > y_{i*}.
From Claim 3.8, there exists j_0 with y + e(i*) − e(j_0) ∈ B(r), and again, by
applying Claim 3.8 with the roles of x and y interchanged, x − e(i*) + e(j_0) ∈
B(r) follows, since i* is the unique element satisfying x_{i*} > y_{i*}. Thus, the
lemma follows. Therefore let us assume that there exists another i_0 with
i_0 ≠ i* and x_{i_0} > y_{i_0}. Let

X = {j ∈ E | x − e(i*) + e(j) ∈ B(r)},
Y = {j_0 ∈ E | x_{j_0} < y_{j_0}, y + e(i_0) − e(j_0) ∈ B(r)}.

Recall that Y ≠ ∅ holds from Claim 3.8.

Case 1: X ∩ Y ≠ ∅. Let us fix a j_0 such that j_0 ∈ X ∩ Y. Let y' =
y + e(i_0) − e(j_0). We shall derive a contradiction by showing (x, y') ∈ S,
because ||x − y'|| = ||x − y|| − 2. Consider an arbitrary j with x_j < y'_j and
x − e(i*) + e(j) ∈ B(r), and let

y'' = y' + e(i*) − e(j) = y + e(i_0) + e(i*) − e(j_0) − e(j).

Notice that y + e(i*) − e(j_0) ∉ B(r) and y + e(i*) − e(j) ∉ B(r) follow
from (x, y) ∈ S. Thus, y'' ∉ B(r) follows, since otherwise at least one of
y + e(i*) − e(j_0) or y + e(i*) − e(j) belongs to B(r), if we apply Claim 3.8
to y and y''.
Case 2: X ∩ Y = ∅. Then, y + e(i*) − e(j) ∉ B(r) holds from (x, y) ∈ S,
and y + e(i_0) − e(j) ∉ B(r) holds from X ∩ Y = ∅ and the definition of Y.
Thus, y'' ∉ B(r) follows, since otherwise at least one of y + e(i*) − e(j) or
y + e(i_0) − e(j) belongs to B(r), if we apply Claim 3.9 to y and y''. □

The following lemma is well-known (see [46, 37, 58] for its proof).

Lemma 3.10 Given a submodular system (D, r), the following computa-
tions can be done in polynomial time in |E| and log r(E), provided that an
oracle to compute r(X) for a given X in polynomial time is available.
1. Membership test of x ∈ P(r) for a given x ∈ R^E.

2. Computation of sat(x) and dep(x, j) for a given x ∈ P(r) and j ∈ E.

3. Computation of c(x, j) for given x ∈ P(r) and j ∈ E.

4. Computation of r^x(X) of (3.8) and r_x(X) of (3.10) for x ∈ R^E and
X ∈ 2^E. □

For a general submodular system, polynomial algorithms based on the ellip-
soid method can be used for all the above computations (see [46]). However,
for most of the particular cases of our interest, more efficient polynomial time
algorithms are available.
The following fundamental results are useful to develop efficient algo-
rithms, which will be presented below.

Lemma 3.11 Consider the base polyhedron B(r) of a submodular system (2^E, r₀) and a function f(x) = Σ_{j∈E} f_j(x_j), where the f_j are convex for all j. Then, for any two distinct integral bases x, y ∈ B(r) and for any i, j ∈ E with x_i > y_i and x_j < y_j such that

x − e(i) + e(j), y + e(i) − e(j) ∈ B(r),   (3.20)

we have

f(x) + f(y) ≥ f(x − e(i) + e(j)) + f(y + e(i) − e(j)).   (3.21)

PROOF. Inequality (3.21) follows from

f(x) + f(y) − (f(x − e(i) + e(j)) + f(y + e(i) − e(j))) = d_i(x_i) − d_i(y_i + 1) + d_j(y_j) − d_j(x_j + 1),

and the convexity of f_i and f_j. □


The following theorem demonstrates that local optimality guarantees global optimality for a submodular system.

Theorem 3.12 Let B(r) be the base polyhedron of a submodular system (2^E, r₀). An integral base x ∈ B(r) is an optimal solution of problem SC/SM/D if and only if

f(x − e(i) + e(j)) ≥ f(x)  (i.e., d_j(x_j + 1) ≥ d_i(x_i))   (3.22)

holds for all i, j ∈ E with x − e(i) + e(j) ∈ B(r).


PROOF. Since the "only if" part is obvious, we shall only prove the "if"
part. We shall prove

J(y) ~ J(x) for all y E B{r). (3.23)



Suppose that the x satisfying assumption (3.22) is not optimal. Let y be an optimal solution such that ‖x − y‖ (= Σ_{j∈E} |x_j − y_j|) is minimum. By Lemma 3.7(ii), there exist two integral bases x′ = x − e(i) + e(j), y′ = y + e(i) − e(j) with x_i > y_i and x_j < y_j. Then, f(x) + f(y) ≥ f(x′) + f(y′) holds from (3.21). Since f(x′) ≥ f(x) from (3.22), f(y′) ≤ f(y) must follow. Thus, y′ is also an optimal solution and ‖x − y′‖ < ‖x − y‖ holds, contradicting the minimality of ‖x − y‖. □

We now present two algorithms for finding an optimal solution of problem SC/SM/D.

3.2.1 Incremental algorithm


We shall show that the incremental algorithm presented in Section 3.1.1 also works for problem SC/SM/D. In this case, among all the elements such that x + e(j) ∈ P(r) (i.e., feasible except for the constraint (x + e(j))(E) = r(E)), the x_j with the minimum increase in f_j(x_j) is incremented by one. This process is repeated until x(E) = r(E) is finally attained.

Procedure SM-INCREMENT
Input: An instance of problem SC/SM/D.
Output: An optimal solution x*.
Let x_j := 0 for all j ∈ E, E′ := E and k := 0;
while k < r(E) do
begin
  Find j* ∈ E′ such that d_{j*}(x_{j*} + 1) = min{d_j(x_j + 1) | j ∈ E′};
  if x + e(j*) ∈ P(r) then
    let x_{j*} := x_{j*} + 1, k := k + 1;
  else delete j* from E′
end;
Output x as x*.
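
To make the greedy step concrete, the following Python sketch mimics SM-INCREMENT on a tiny instance. The rank oracle, the increment functions d_j, and in particular the brute-force membership test for P(r) are illustrative assumptions only; the brute-force test enumerates all subsets and is exponential, so a real implementation would replace it by a problem-specific oracle, as discussed after Lemma 3.10.

from itertools import combinations

def sm_increment(E, rank, d):
    # A sketch of procedure SM-INCREMENT (greedy incremental algorithm).
    # E    : list of ground-set elements.
    # rank : oracle giving r(X) for a frozenset X (submodular, rank(empty) = 0).
    # d    : d[j](x) = f_j(x) - f_j(x - 1), nondecreasing in x since f_j is convex.
    def in_P(x):
        # Brute-force test of x in P(r): x(X) <= r(X) for all nonempty X.
        return all(sum(x[j] for j in X) <= rank(frozenset(X))
                   for k in range(1, len(E) + 1)
                   for X in combinations(E, k))

    x = {j: 0 for j in E}
    active = set(E)                      # the set E' of the procedure
    k, target = 0, rank(frozenset(E))    # target = r(E)
    while k < target and active:
        jstar = min(active, key=lambda j: d[j](x[j] + 1))  # cheapest increment
        x[jstar] += 1
        if in_P(x):
            k += 1
        else:                            # j* saturated: never increased again
            x[jstar] -= 1
            active.discard(jstar)
    return x

# Tiny example: r(X) = min(2|X|, 3), f_j(x) = j*x^2, so d_j(x) = j*(2x - 1).
E = [1, 2, 3]
print(sm_increment(E, lambda X: min(2 * len(X), 3),
                   {j: (lambda x, j=j: j * (2 * x - 1)) for j in E}))
# The output is an integral base with total value r(E) = 3, e.g. {1: 2, 2: 1, 3: 0}.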

To prove the correctness of SM-INCREMENT, we remark here that, during the execution of the algorithm, it arrives at the following situation (A) from time to time:

(A) For the solution x obtained in the while loop, some X ∈ 2^E becomes newly saturated (i.e., x(X) = r(X)) as a result of increasing x_i by one (where i = j*).

Once (A) occurs, such an x_i will never be increased again. Let E_A denote the set of all i for which (A) occurs before the algorithm halts. Without loss of generality, let us assume E_A = {1, 2, ..., p}, for which (A) has occurred in the order i = 1, 2, ..., p. Define

S_i = ∪ {Y ∈ 2^E | x(Y) = r(Y)},   (3.24)

whenever condition (A) has occurred for solution x and i ∈ E, where we define S₀ = ∅ by convention. Thus S_i satisfies x(S_i) = r(S_i) by the submodularity of r. When SM-INCREMENT halts, i = p and S_p = E hold.
Now we shall prove three lemmas and two theorems, in which x* denotes the solution obtained by SM-INCREMENT.

Lemma 3.13 If condition (A) holds for the solution x and i ∈ E during the execution of SM-INCREMENT, the optimal solution x* output by SM-INCREMENT satisfies x*_j = x_j for all j ∈ S_i, and hence x*(S_i) = r(S_i).

PROOF. Once condition (A) holds with i ∈ E, x_j for any j ∈ S_i cannot be incremented later, by the test x + e(j*) ∈ P(r) in the algorithm. Thus, the lemma follows. □

Lemma 3.14 For the subsets S_i, i = 1, 2, ..., p, constructed in SM-INCREMENT, the following properties hold.
(i) S₀ ⊂ S₁ ⊂ ... ⊂ S_p = E.
(ii) For all i with 1 ≤ i ≤ p, {x*_j | j ∈ S_i − S_{i−1}} is an optimal solution of the following problem of SC/Simple/D:

Q_i: minimize Σ_{j∈S_i−S_{i−1}} f_j(x_j)
  subject to Σ_{j∈S_i−S_{i−1}} x_j = r(S_i) − r(S_{i−1}),   (3.25)
  x_j: nonnegative integer, j ∈ S_i − S_{i−1}.

PROOF. (i) Immediate from the definition of S_i, i = 1, ..., p.
(ii) Before S_i becomes saturated, it holds that j ∉ sat(x) for all j ∈ S_i − S_{i−1}. Hence, x + e(j) ∈ P(r). Thus, SM-INCREMENT selects the j* ∈ S_i − S_{i−1} such that

d_{j*}(x_{j*} + 1) = min{d_j(x_j + 1) | j ∈ S_i − S_{i−1}}   (3.26)

in the succeeding situations. Since x*(S_i) = r(S_i) and x*(S_{i−1}) = r(S_{i−1}) hold, we have x*(S_i − S_{i−1}) = r(S_i) − r(S_{i−1}). Thus, what SM-INCREMENT does to the variables x_j for j ∈ S_i − S_{i−1} is exactly the same as what procedure INCREMENT does if problem Q_i of (3.25) is input. The correctness of INCREMENT then proves property (ii). □

Lemma 3.15 In the execution of SM-INCREMENT,

d_j(x*_j) ≤ min{d_k(x*_k + 1) | k ∈ E − S_{i−1}}   (3.27)

holds for all i ∈ E_A and any j ∈ S_i − S_{i−1}.

PROOF. For a j ∈ S_i − S_{i−1}, let x = {x_k | k ∈ E} denote the vector obtained just before x_j = x*_j − 1 is increased to x_j = x*_j. Then all k ∈ E − S_{i−1} satisfy x + e(k) ∈ P(r) by the definition of S_i and S_{i−1}, and thus

d_j(x*_j) = d_j(x_j + 1) = min{d_k(x_k + 1) | k ∈ E − S_{i−1}}

holds. This proves the lemma, because d_k(x*_k + 1) ≥ d_k(x_k + 1) holds for all k ∈ E − S_{i−1} by x_k ≤ x*_k. □

Theorem 3.16 SM-INCREMENT correctly computes an optimal solution of SC/SM/D in O(r(E) log n + (r(E) + n)τ) time, where τ denotes the time required to perform the membership test of x ∈ P(r).

PROOF. Let x* be the solution output by SM-INCREMENT. This x* is clearly a feasible solution of SC/SM/D. From Theorem 3.12 we only need to show that

f(x* − e(i) + e(j)) ≥ f(x*)   (3.28)

holds for all i, j ∈ E with x* − e(i) + e(j) ∈ B(r). Let x′ = x* − e(i) + e(j). Suppose i ∈ S_k − S_{k−1}. Then j ∈ E − S_{k−1} must hold, since otherwise x′(S_{k−1}) > x*(S_{k−1}) = r(S_{k−1}) holds from i ∉ S_{k−1} and j ∈ S_{k−1} (the equality is from Lemma 3.13), contradicting the feasibility of x′. Then (3.28) follows from Lemma 3.15.
As for the running time, we need to test whether x + e(j*) ∈ P(r). Once j* is removed from E′, it is no longer considered. Thus, the number of executions of such feasibility tests is at most r(E) + n. The other part of the procedure is analyzed in a manner similar to the analysis of procedure INCREMENT. This completes the proof. □

Notice that the running time of the above procedure SM-INCREMENT is not polynomial but pseudo-polynomial, since it depends on the value r(E) given as part of the input.

3.2.2 Decomposition algorithm

This section presents a polynomial time algorithm for solving SC/SM/D. It first solves a problem of SC/Simple/D type, which is obtained from the original problem by considering only the simple constraint x(E) = r(E) and disregarding the rest. If the obtained solution y is feasible, we are done; i.e., it is an optimal solution of the original problem. Otherwise, the problem is decomposed into subproblems using the information obtained from the vector y and the submodular constraints. For this, we first compute a maximal vector v ∈ R^E satisfying

v ∈ P(r) and v ≤ y.   (3.29)

If v = y, it means y ∈ B(r), and y is a feasible solution of SC/SM/D; i.e., y is an optimal solution. Otherwise, let

E₁ = sat(v) and E₂ = E − E₁,

l_j = 0, u_j = y_j for j ∈ E₁,
l_j = y_j, u_j = +∞ for j ∈ E₂.   (3.30)

As will be proved later, there exists an optimal solution x* of SC/SM/D such that

l_j ≤ x*_j ≤ u_j for j ∈ E₁,
l_j ≤ x*_j ≤ u_j for j ∈ E₂,   (3.31)
x*(E₁) = r(E₁).

This means that the original problem is equivalently solved by the following two subproblems, Q₁ and Q₂.

Q₁: minimize Σ_{j∈E₁} f_j(x_j)
  subject to x(E₁) = r(E₁),
  x(X) ≤ r(X), X ∈ 2^{E₁},   (3.32)
  x_j ≤ y_j and x_j: nonnegative integer, j ∈ E₁,

Q₂: minimize Σ_{j∈E₂} f_j(x_j)
  subject to x(E₂) = r(E) − r(E₁),
  x(X) ≤ r(X ∪ E₁) − r(E₁), X ∈ 2^{E₂},   (3.33)
  x_j ≥ y_j and x_j: nonnegative integer, j ∈ E₂.

Note that the constraint of problem Q₁ is obtained by adding the constraints x_j ≤ y_j, j ∈ E₁, to the original constraint x ∈ B(r), and the constraint of problem Q₂ is obtained by adding the constraints x_j ≥ y_j, j ∈ E₂, to the original constraint x ∈ B(r). The constraints of Q₁ and Q₂ are again submodular, because the former is equivalent to the restriction of (2^{E₁}, r) by the vector y (see Section 3.2), and the latter is equivalent to the contraction of (2^{E₂}, r) by the vector y (see Section 3.2). Thus, in order to solve these problems, we can recursively execute the algorithm. In a general step of the recursion, it solves the following problem SM(E′, S, l, u) of SC/SM/D for some E′, S ⊆ E and l, u ∈ R^{E′}, by first solving its relaxation, SI(E′, S, l, u).

SM(E′, S, l, u): minimize Σ_{j∈E′} f_j(x_j)
  subject to x ∈ R^{E′},
  x(E′) = r(E′ ∪ S) − r(S),
  x(X) ≤ r(X ∪ S) − r(S), X ∈ 2^{E′},
  l ≤ x ≤ u,
  x_j: integer, j ∈ E′.   (3.34)

SI(E′, S, l, u): minimize Σ_{j∈E′} f_j(x_j)
  subject to x ∈ R^{E′},
  x(E′) = r(E′ ∪ S) − r(S),   (3.35)
  l ≤ x ≤ u,
  x_j: integer, j ∈ E′.


The entire procedure is called SM, in which subroutine DA(E′, S, l, u) solves the above problem SM(E′, S, l, u).

Procedure SM
Input: An instance of problem SC/SM/D.
Output: An optimal solution x*.
Step 1: Let l_j := 0 and u_j := +∞ for j ∈ E.
Step 2: Call procedure DA(E, ∅, l, u) to obtain a solution x*.
Step 3: Output x* and halt.

Procedure DA(E′, S, l, u)
Step 1: Compute an optimal solution y of SI(E′, S, l, u).
Step 2: Find a maximal vector v ∈ P′(r) such that v ≤ y, where P′(r) is the submodular polyhedron of the underlying submodular system of problem SM(E′, S, l, u).
Step 3: If v = y, let x* := y and return x*.
Step 4: Compute sat(v) for P′(r), and let E′₁ := sat(v) and E′₂ := E′ − E′₁.
Step 5: Define two vectors l¹, u¹ ∈ Z^{E′₁} by l¹_j := l_j, u¹_j := y_j for j ∈ E′₁, and call DA(E′₁, S, l¹, u¹) to obtain an optimal solution x¹ of Q′₁ = SM(E′₁, S, l¹, u¹) (which is the above Q₁ in the case of the first call).
Step 6: Define two vectors l², u² ∈ Z^{E′₂} by l²_j := y_j, u²_j := u_j for j ∈ E′₂, and call DA(E′₂, S ∪ E′₁, l², u²) to obtain an optimal solution x² of Q′₂ = SM(E′₂, S ∪ E′₁, l², u²) (which is the above Q₂ in the first call).
Step 7: Return the optimal solution x* = (x¹_j, j ∈ E′₁; x²_j, j ∈ E′₂).
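
The recursive structure of DA can be summarized by the following Python skeleton. It is only a structural sketch: the three callbacks solve_SI, maximal_vector and sat are hypothetical problem-specific oracles (for Step 1, Step 2 and Step 4, respectively) that must be supplied for a concrete constraint class such as the network case of Section 3.3.

def DA(Eprime, S, l, u, solve_SI, maximal_vector, sat):
    # Recursive decomposition (a sketch of procedure DA).
    # solve_SI(E', S, l, u)    -> optimal y of the relaxation SI(E', S, l, u)
    # maximal_vector(y, E', S) -> maximal v in P'(r) with v <= y
    # sat(v, E', S)            -> the saturated set sat(v) for P'(r)
    y = solve_SI(Eprime, S, l, u)                     # Step 1
    v = maximal_vector(y, Eprime, S)                  # Step 2
    if all(v[j] == y[j] for j in Eprime):             # Step 3: y is feasible
        return dict(y)
    E1 = sat(v, Eprime, S)                            # Step 4
    E2 = [j for j in Eprime if j not in E1]
    # Step 5: on E'_1, cap each variable at y_j and recurse.
    x = DA(E1, S, {j: l[j] for j in E1}, {j: y[j] for j in E1},
           solve_SI, maximal_vector, sat)
    # Step 6: on E'_2, raise each lower bound to y_j and contract by E'_1.
    x.update(DA(E2, set(S) | set(E1), {j: y[j] for j in E2},
                {j: u[j] for j in E2}, solve_SI, maximal_vector, sat))
    return x                                          # Step 7: merged solution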

Before proving the correctness of procedure SM, we prove the following lemma.

Lemma 3.17 If DA(E′, S, l, u) returns x* in Step 3, it is an optimal solution of problem SM(E′, S, l, u).

PROOF. Step 1 of DA(E′, S, l, u) computes an optimal solution y of problem SI(E′, S, l, u), which belongs to class SC/LUB/D. Since the problem SI(E′, S, l, u) is a relaxation of problem SM(E′, S, l, u), the y is optimal to SM(E′, S, l, u) if it is feasible to SM(E′, S, l, u). □

Theorem 3.18 Procedure SM correctly computes an optimal solution of SC/SM/D.

PROOF. Let us consider the computation of DA(E′, ∅, l, u). If v < y, E′ is partitioned into E′₁ and E′₂, and the original problem is accordingly decomposed into two subproblems. This decomposition is recursively applied, if necessary, and the entire process is represented by a decomposition tree T_E, where E = {1, 2, ..., n}. The proof is done by induction on the height of T_E. At a leaf of T_E, the corresponding subset E′ is not decomposed further because the condition of Step 3 holds. Thus, due to Lemma 3.17, the subproblem can be correctly solved at the leaf. Consider an intermediate node of T_E, and let SM(E′, S, l, u) denote the problem considered at the node. Let Q′₁ and Q′₂ denote the subproblems of SC/SM/D considered in Steps 5 and 6 of DA(E′, S, l, u), respectively. By the induction hypothesis, problems Q′₁ and Q′₂ are correctly solved; let {x*_j | j ∈ E′₁} (resp., {x*_j | j ∈ E′₂}) denote the obtained optimal solution of Q′₁ (resp., Q′₂). We shall now prove that {x*_j | j ∈ E′ (= E′₁ ∪ E′₂)} is an optimal solution of SM(E′, S, l, u).
From Theorem 3.12 we only need to show that

f(x* − e(i) + e(j)) ≥ f(x*)   (3.36)

for all i, j ∈ E′ with x* − e(i) + e(j) ∈ B(r). Let x′ = x* − e(i) + e(j). If i, j ∈ E′₁ or i, j ∈ E′₂, the inequality (3.36) clearly holds from the induction hypothesis. If i ∈ E′₂ and j ∈ E′₁, we have

x′(E′₁) > x*(E′₁) = r(E′₁ ∪ S) − r(S)

(the right equality follows from the constraint of (3.34)), contradicting the feasibility of x′. Thus, let us consider the case of i ∈ E′₁ and j ∈ E′₂. From the way of decomposition used in Steps 5 and 6, we have

x*_i ≤ y_i and x*_j ≥ y_j.   (3.37)

Since y is an optimal solution of the simple resource allocation problem SI(E′, S, l, u),

d_j(y_j + 1) ≥ d_i(y_i)   (3.38)

holds from Theorem 3.1. Then from the convexity of f_i and f_j, and from (3.37), we have

d_j(x*_j + 1) ≥ d_j(y_j + 1) ≥ d_i(y_i) ≥ d_i(x*_i),

which is (3.36). This completes the proof. □

Theorem 3.19 The running time of procedure SM is polynomial in |E| and log r(E). The time complexity is O(τ₁|E| + τ₂|E|² log r(E) + τ₃|E|), where τ₁, τ₂ and τ₃ denote the running times to solve SI(E′, S, l, u), to test x ∈ P(r) for a given x ∈ R^E, and to compute sat(x) for a given x ∈ P(r), respectively.

PROOF. We first evaluate the running time of each execution of procedure DA. Step 1 can be done in τ₁ time. The computation of a maximal vector v in Step 2 is done by iteratively increasing the j-th component of v by d_j, j ∈ E, where

d_j = max{d | v + d·e(j) ∈ P(r)}.

The computation of this d_j can be done by binary search (i.e., in O(log r(E)) iterations), and testing whether v + d·e(j) ∈ P(r) or not can be done in τ₂ time. Therefore Step 2 requires O(τ₂|E| log r(E)) time. Finally, sat(v) is computed in τ₃ time. These computations clearly dominate all the other parts of DA, showing that the time to execute all steps of DA once is O(τ₁ + τ₂|E| log r(E) + τ₃). Since the number of vertices in the decomposition tree T_E is O(|E|), we need to call DA O(|E|) times, and the stated time complexity of SM follows.

From Theorem 3.3, τ₁ is polynomial in |E| and log r(E). Also, from Lemma 3.10, τ₂ and τ₃ are polynomial in |E| and log r(E). Therefore, the running time of procedure SM is polynomial in |E| and log r(E). □

3.3 SC/GUB/D, SC/Nested/D, SC/Tree/D and SC/Network/D

We shall briefly explain how separable convex resource allocation problems under generalized upper bound, nested, tree and network constraints can be solved efficiently, when the decomposition algorithm SM of the previous subsection is specialized to these cases. Since generalized upper bound, nested and tree constraints are special cases of network constraints, we shall only consider problem SC/Network/D.
Let us call a vector {x_t(φ) | t ∈ T} for a feasible flow φ of (2.11)–(2.15) an inflow vector. For X ⊆ T, define

w(X) = max{Σ_{t∈X} x_t(φ) | {x_t(φ) | t ∈ T} is an inflow vector}.   (3.39)

Then, it is easy to see from the well-known max-flow min-cut theorem [32] that

w(X) = min{c(U, V − U) | s ∈ U, X ⊆ V − U}   (3.40)

holds, where c(U, V − U) = Σ_{u∈U, v∈V−U} c(u, v) represents the capacity of a cut (i.e., a partition) (U, V − U) separating X from the source s. It is also well known in network theory (see [57]) that the submodular property holds for w(X); i.e.,

w(X) + w(Y) ≥ w(X ∪ Y) + w(X ∩ Y) for X, Y ⊆ T.

Then, it is not difficult to see that {x_t(φ) | t ∈ T} is an inflow vector of some feasible flow φ if and only if

x(X) ≤ w(X) for all X ⊆ T, and x(T) = N.

(See e.g., [58] for the proof.)
Now, we shall analyze the running time of procedure SM when specialized to SC/Network/D. Step 1 of DA can be executed by solving a problem of SC/Simple/D. For the obtained optimal solution y, Step 2 computes a maximal vector v ≤ y. In the case of SC/Network/D, this is done by introducing a supersink t* as well as arcs (t, t*) for all t ∈ T with capacity c(t, t*) = y_t, and finding a maximum flow φ* from s to t* in the modified network. (In general, while constructing the modified network, we also have to introduce lower bounds on the flow values of the arcs (t, t*) in order to reflect the constraints implied by a vector l.) Then we obtain the vector v as

v_t = φ*(t, t*), t ∈ T.

Thus, v can be computed by applying an appropriate max-flow algorithm. The times required for the other steps are dominated by those for Steps 1 and 2.
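
A small sketch of this Step 2 computation follows, using the networkx package as an off-the-shelf max-flow routine (an assumption of convenience; any maximum flow algorithm would do). The lower bounds coming from a vector l, mentioned above, are ignored in this sketch.

import networkx as nx

def maximal_inflow_vector(G, s, T, y):
    # Add a supersink t* with arcs (t, t*) of capacity y_t, compute a maximum
    # flow from s to t*, and read off v_t as the flow on the arc (t, t*).
    H = G.copy()
    for t in T:
        H.add_edge(t, 't*', capacity=y[t])
    _, flow = nx.maximum_flow(H, s, 't*')
    return {t: flow[t]['t*'] for t in T}

# Tiny example: s -> a -> t1 and s -> t2, all capacities 2.
G = nx.DiGraph()
G.add_edge('s', 'a', capacity=2)
G.add_edge('a', 't1', capacity=2)
G.add_edge('s', 't2', capacity=2)
print(maximal_inflow_vector(G, 's', ['t1', 't2'], {'t1': 3, 't2': 1}))
# -> {'t1': 2, 't2': 1}, the maximal v <= y = (3, 1)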
Thus, we have the following theorem.

Theorem 3.20 Problem SC/Network/D can be solved in O(|T|(T(n, m, C_max) + |T| log(N/|T|))) time, where T(n, m, C_max) denotes the running time of a maximum flow algorithm for a graph with n vertices, m arcs and maximum arc capacity C_max.

PROOF. Since SC/Network/D is a special case of SC/SM/D, the running time is analyzed by following the proof of Theorem 3.19. Assuming that a maximum flow algorithm requires O(mn) time, we have |T| ≤ T(n, m, C_max). The computation of the vector y in Step 1 of procedure DA reduces to solving problem SC/Simple/D, which can be done in O(|T| log(N/|T|)) time by using Frederickson and Johnson's algorithm [34]. The computation of the vector v ≤ y in Step 2 is done by computing a maximum flow, which requires T(n, m, C_max) time. These computations clearly dominate all the rest of DA. Thus, the theorem follows from Theorem 3.19. □

Existing max-flow algorithms run in time T(n, m, C_max) = O(n³), O(nm + n^{3/2+ε} m^{1/2}) or O(nm + n² log C_max) [69, 3, 2]. Since the maximum flow in a tree network can be computed in O(m) time (see [58]) and m = O(n) holds for a tree, we have the following corollary.

Corollary 3.21 Problems SC/GUB/D, SC/Nested/D and SC/Tree/D can be solved in O(n² log(N/n)) time. □

Faster algorithms have been developed for SC/Nested/D and SC/Tree/D. For SC/Nested/D, Dyer and Walker's algorithm [27] runs in O(n log n log²(N/n)) time, and for SC/Tree/D, Dyer and Frieze [26] developed an O(n log² n log N) time algorithm. Hochbaum [51] further improved the running time of these two algorithms to O(n log n log(N/n)). The idea of the improvement is based on a general and beautiful proximity theorem between integral and continuous optimal solutions for SC/SM/D and SC/SM/C. We will describe in Sections 4 and 5 the proximity theorem and the resulting efficient algorithms.

3.4 Minimax and Maximin Problems


We shall show here that minimax and maximin resource allocation problems, Minimax/·/D and Maximin/·/D, are equivalently transformed into problems of SC/·/D, where · denotes any type of constraints defined in Section 2.2. Therefore, equally efficient algorithms can be developed for minimax and maximin problems. It suffices to show this fact for the most general case, i.e., Minimax/SM/D and Maximin/SM/D, which are described as follows:

MINIMAX: minimize max_{j∈E} f_j(x_j)
  subject to x: an integral base of B(r),   (3.41)

MAXIMIN: maximize min_{j∈E} f_j(x_j)
  subject to x: an integral base of B(r).   (3.42)
Here all f_j, j ∈ E, are assumed to be nondecreasing. When all f_j are nonincreasing, problems MINIMAX and MAXIMIN are mutually transformed into MAXIMIN and MINIMAX, respectively, by the following identities:

−min_x max_{j∈E} f_j(x_j) = max_x min_{j∈E} (−f_j(x_j)),
−max_x min_{j∈E} f_j(x_j) = min_x max_{j∈E} (−f_j(x_j)).

Define for j ∈ E

g_j(x_j) = Σ_{y=0}^{x_j} f_j(y),  h_j(x_j) = Σ_{y=−1}^{x_j−1} f_j(y),  x_j = 0, 1, 2, ...,   (3.43)

where f_j(−1) = f_j(0) is assumed. Note that

g_j(x_j) − g_j(x_j − 1) = f_j(x_j),
h_j(x_j) − h_j(x_j − 1) = f_j(x_j − 1)   (3.44)

hold for each x_j = 0, 1, .... From the nondecreasingness of f_j, it follows that g_j and h_j are convex over the nonnegative integers. Now consider the following problems of SC/SM/D:
Q_g: min{Σ_{j∈E} g_j(x_j) | x: an integral base of B(r)},   (3.45)

Q_h: min{Σ_{j∈E} h_j(x_j) | x: an integral base of B(r)}.   (3.46)
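
As a small illustration of the reduction (3.43), the following Python sketch builds g_j from a given nondecreasing f_j by partial summation; the example functions are assumptions for illustration only. By (3.44), the increments of g_j are exactly the values of f_j, so minimizing Σ_j g_j(x_j) over B(r), e.g. with SM-INCREMENT, solves MINIMAX.

def build_minimax_objective(f):
    # f : dict j -> f_j, each f_j nondecreasing on the nonnegative integers.
    # Returns g with g_j(x) = f_j(0) + f_j(1) + ... + f_j(x), so that
    # g_j(x) - g_j(x - 1) = f_j(x) and each g_j is convex (eq. (3.44)).
    def make_g(fj):
        return lambda x: sum(fj(y) for y in range(0, x + 1))
    return {j: make_g(fj) for j, fj in f.items()}

# Example: f_1(x) = x and f_2(x) = 2x, both nondecreasing.
g = build_minimax_objective({1: lambda x: x, 2: lambda x: 2 * x})
print(g[1](3), g[2](3))  # 6 and 12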
Theorem 3.22 An optimal solution of problem Q_g (resp., Q_h) is optimal to MINIMAX (resp., MAXIMIN).

PROOF. We only give the proof for Q_g (the case of Q_h can be treated similarly). Let x^g be an optimal solution of Q_g. By Theorem 3.12 and the property f_j(x_j) = g_j(x_j) − g_j(x_j − 1) of (3.44), it holds that

f_j(x^g_j + 1) ≥ f_i(x^g_i)   (3.47)

for all i, j ∈ E with x^g − e(i) + e(j) ∈ B(r). Suppose that x^g is not optimal to MINIMAX, and let x* be an optimal solution such that ‖x^g − x*‖ (= Σ_{j∈E} |x^g_j − x*_j|) is minimum. By (3.21) with x = x^g and y = x*, we have

f_i(x^g_i) − f_i(x*_i + 1) + f_j(x*_j) − f_j(x^g_j + 1) ≥ 0   (3.48)

for any i, j with x^g_i > x*_i, x^g_j < x*_j such that x^g − e(i) + e(j), x* + e(i) − e(j) ∈ B(r). Hence, together with (3.47), we have

f_j(x*_j) − f_i(x*_i + 1) ≥ 0.   (3.49)

Let x̃ = x* + e(i) − e(j). Since f_j is nondecreasing,

f_j(x*_j) ≥ f_i(x*_i + 1) = f_i(x̃_i).   (3.50)

Thus, from f_j(x̃_j) = f_j(x*_j − 1) ≤ f_j(x*_j), and f_k(x̃_k) = f_k(x*_k) for k ≠ i, j, it follows that

max_{j∈E} f_j(x̃_j) ≤ max_{j∈E} f_j(x*_j).

This implies that x̃ is also an optimal solution to MINIMAX, contradicting the minimality of ‖x* − x^g‖. □

3.5 Notes and References


Most of the proofs of fundamental properties of a submodular system can be found in [37, 58]. The proof of Lemma 3.7(ii) is due to Murota [92], although this property has been known as folklore among researchers in this field. Also, Murota [92] has shown that the condition of Lemma 3.7(ii) is a necessary and sufficient condition for a set B of vectors in Z^E to be the base polyhedron of a submodular system.

Incremental and decomposition algorithms of Sections 3.2.1 and 3.2.2 are due to Federgruen and Groenevelt [31] and Groenevelt [44]. The original idea of the decomposition algorithm is due to Fujishige [35], who considered a special case of SC/SM/C in which f_j(x_j) = w_j x_j² (w_j > 0) holds for each j. The correctness proof given here is based on Theorem 3.12, which is due to Murota [90]. Theorem 3.12 helps simplify the proofs given in the original papers. The proof of Theorem 3.22 for MINIMAX is also based on the result of [90] for M-convex functions.
The incremental algorithm for SC/SM/D was extended to more general constraints, i.e., to the separable convex resource allocation problem by Ando, Fujishige and Naitoh for a bisubmodular system and for a finite jump system (see papers [5, 6, 38] for the details). Notice that a finite jump system is a generalization of a bisubmodular system. The existence of a polynomial time algorithm similar to SM of Section 3.2.2 is known for a bisubmodular system [38], while it is still to be investigated for a finite jump system.
The decomposition algorithm of Section 3.2.2 also works for the case of continuous variables, i.e., problem SC/SM/C, with a minor modification in Step 1 of procedure DA: we solve problem SC/Simple/C instead of SC/Simple/D (see the book by Fujishige [37]). Fujishige [35, 37] showed the equivalence between Lexico-Minimax/SM/C and SC/SM/C in a sense similar to Theorem 3.22. Also, it has been shown in [35, 37] that an optimal solution of MINIMAX of (3.41) obtained through problem Q_g of (3.45) is lexicographically minimax.

Minimax/Simple/C has been further studied in [18, 19, 29, 78] since the book by Ibaraki and Katoh [58] was published.

Fujishige, Katoh and Ichimori [39] proposed an efficient algorithm for Fair/SM/D by making use of the proximity result that relates an optimal solution of Fair/SM/D to optimal solutions of Minimax/SM/D and Maximin/SM/D. Namikawa and Ibaraki [93] also derived an efficient algorithm for Fair/SM/D using the continuous relaxation.
The resource allocation problem with continuous variables has also been extensively studied, although it is omitted in this paper (see [58] for details). Among its variants, the separable convex quadratic resource allocation problem with continuous variables has been well studied, where, instead of the constraint Σ_{j=1}^n x_j = N, the following knapsack constraint is considered:

Σ_{j=1}^n a_j x_j ≤ N,   (3.51)

where all a_j are nonnegative. This problem is called the quadratic knapsack problem. Notice that the problem with this constraint can be transformed into SQC/Simple/C because, due to the continuous variables, introducing new variables x′_j = a_j x_j and a slack variable s turns the above knapsack constraint into an equality constraint of the form Σ_{j=1}^n x′_j + s = N without increasing the computational difficulty. The LUB constraint can also be easily incorporated without increasing the difficulty. For this problem, Brucker [20] developed a linear time algorithm. Recently, Nielsen and Zenios [95], Pardalos and Kovoor [99], Pardalos, Ye and Han [100], and Robinson, Jiang and Lerme [102] further studied this problem. Problem SQC/GUB/C has also been studied by Bretthauer and Shetty [15], who performed extensive computational experiments for various problem instances with up to a hundred thousand variables, and showed that their algorithm is extremely efficient compared with general-purpose software for nonlinear programs. Megiddo and Tamir [88] developed a linear time algorithm for the separable convex quadratic resource allocation problem with more general constraints. We shall discuss problem SQC/·/C in Section 5.

4 Proximity Theorems
In this section, we present two types of proximity theorems between integral and continuous optimal solutions. The first is due to Hochbaum and Shanthikumar [55], and the second is due to Hochbaum [51]. The first theorem deals with SC/Linear/· and the second with SC/SM/·. The first proximity result is described in terms of the maximum absolute value of the subdeterminants, Δ, of the constraint matrix. If Δ = 1 (i.e., the constraint matrix is totally unimodular), the theorem leads to a polynomial time algorithm for the problem with integer variables. Thus this result identifies a new class of resource allocation problems with integer variables for which a polynomial time algorithm can be developed. Notice that a system of linear inequalities defined by a totally unimodular matrix is different from one defined by submodular constraints, because the constraint matrix defined by submodular functions (i.e., (2.5)) is totally dual integral but is not totally unimodular (see e.g., [56, 28]).
From the second proximity theorem, for SC/SM/·, Hochbaum [51] developed efficient algorithms for SC/GUB/D, SC/Nested/D and SC/Tree/D, improving the running times of the algorithms given in Section 3.2. The algorithms use so-called scaling techniques, in which scaled versions of incremental algorithms are iteratively applied. These algorithms will be described in Section 5.

4.1 Proximity theorem for general linear constraints


For simplicity, denote

f(x) = Σ_{j=1}^n f_j(x_j),   (4.1)

where all f_j : R → R, j = 1, ..., n, are convex. We are interested in the following problem,

IP: minimize f(z)
  subject to Az ≥ b,   (4.2)
  z ≥ 0,
  z: integer,

as well as its continuous version,

RP: minimize f(x)
  subject to Ax ≥ b,   (4.3)
  x ≥ 0.

Here A is an integral m × n matrix and b is an m-dimensional integral vector. We assume that there exists an optimal solution for each of IP and RP. For any scaling constant s (a positive integer), let the scaled problem IP/s be defined by

IP/s: minimize f(sy)
  subject to Ay ≥ b/s,   (4.4)
  y ≥ 0,
  y: integer.

In problem IP/s, we require the solution x = sy to be an integer multiple of s. Although RP and the continuous version of IP/s are the same, we need to define the linearized function f^{L:s} for s:

f_j^{L:s} : R → R is a piecewise linear convex function such that f_j^{L:s}(s y_j) = f_j(s y_j) holds for all integers y_j.

Now define

f^{L:s}(x) = Σ_{j=1}^n f_j^{L:s}(x_j),   (4.5)

and consider the following problem:

LP/s: minimize f^{L:s}(x)
  subject to Ax ≥ b,   (4.6)
  x ≥ 0.
Note that an optimal solution to the following problem,

IP′/s: minimize f^{L:s}(sy)
  subject to Ay ≥ b/s,   (4.7)
  y ≥ 0,
  y: integer,

is also optimal to IP/s, since f^{L:s}(sy) = f(sy) holds for all integer vectors y.
Now we assume without loss of generality (see [98]) that we know lower and upper bounds on the variables within which an optimal solution exists, i.e.,

l_j ≤ x_j ≤ u_j, j = 1, 2, ..., n.

Remark 4.1: LP/s can be formulated as a linear programming problem by introducing N continuous variables for each j, corresponding to the line segments of the piecewise linear convex function f_j^{L:s}, where N = (u_j − l_j)/s (see [25]).
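
The following sketch makes the linearization of Remark 4.1 explicit. Writing x_j = l_j + s(t_1 + ... + t_N) with 0 ≤ t_k ≤ 1, the LP cost of segment k is its slope f_j(l_j + ks) − f_j(l_j + (k−1)s); convexity of f_j makes these slopes nondecreasing, so a linear program automatically fills the cheaper segments first. The concrete function and bounds in the example are assumptions for illustration.

def segment_slopes(fj, lj, uj, s):
    # Slopes of the N = (u_j - l_j)/s line segments of f_j^{L:s} on [l_j, u_j].
    N = (uj - lj) // s
    return [fj(lj + k * s) - fj(lj + (k - 1) * s) for k in range(1, N + 1)]

# Example: f(x) = x^2 on [0, 8] with s = 2 gives nondecreasing slopes.
print(segment_slopes(lambda x: x * x, 0, 8, 2))  # [4, 12, 20, 28]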
Let us use the notation x ∨ y and x ∧ y for x, y ∈ Rⁿ to denote

x ∨ y = (max{x₁, y₁}, ..., max{x_n, y_n}),
x ∧ y = (min{x₁, y₁}, ..., min{x_n, y_n}).

Lemma 4.1 For the function f of (4.1) and for any x, x′, y, y′ ∈ Rⁿ satisfying

(i) x ∧ x′ ≤ y ≤ x ∨ x′,
(ii) x ∧ x′ ≤ y′ ≤ x ∨ x′,   (4.8)
(iii) x + x′ = y + y′,

we have

f(x) + f(x′) ≥ f(y) + f(y′).   (4.9)

PROOF. Let z = x ∧ x′ and z′ = x ∨ x′. Since either x_j = z_j, x′_j = z′_j or x_j = z′_j, x′_j = z_j holds for all j from the definition of z and z′,

{x_j, x′_j} = {z_j, z′_j}   (4.10)

and

f_j(x_j) + f_j(x′_j) = f_j(z_j) + f_j(z′_j)   (4.11)

hold for all j. Then from (i), (ii), (iii) and (4.10), we have z_j ≤ y_j ≤ z′_j, z_j ≤ y′_j ≤ z′_j and z_j + z′_j = y_j + y′_j for all j. From the convexity of f_j, it is easy to see

f_j(z_j) + f_j(z′_j) ≥ f_j(y_j) + f_j(y′_j),   (4.12)

and hence

f(z) + f(z′) ≥ f(y) + f(y′).   (4.13)

Since f(z) + f(z′) = f(x) + f(x′) holds from (4.11), the lemma follows. □
Now let Δ be the maximum absolute value of the subdeterminants of the constraint matrix A, and let ‖x‖∞ = max_{1≤j≤n} |x_j| be the L∞ norm of x ∈ Rⁿ. The proximity theorem between IP and RP is stated as follows.

Theorem 4.2 (i) For every optimal solution x̄ of RP, there exists an optimal solution z* of IP such that ‖x̄ − z*‖∞ ≤ nΔ.
(ii) For every optimal solution z̄ of IP, there exists an optimal solution x* of RP such that ‖z̄ − x*‖∞ ≤ nΔ.
PROOF. We first define a cone with respect to z̄ and x̄. Let (S₁, S₂) be the partition of {1, ..., n} such that z̄_j ≥ x̄_j for all j ∈ S₁ and z̄_j < x̄_j for all j ∈ S₂. Also partition the matrix A into submatrices A₁ and A₂ such that A₁z̄ < A₁x̄ and A₂z̄ ≥ A₂x̄. Define the cone C = {y ∈ Rⁿ | A₁y ≤ 0, A₂y ≥ 0, y_j ≥ 0 for j ∈ S₁, y_j ≤ 0 for j ∈ S₂}. Let U ⊂ C be a finite set of integral vectors that generate C. Then, because of the integrality of the components of A, ‖u‖∞ ≤ Δ holds for all u ∈ U, and since z̄ − x̄ ∈ C, there exist t (≤ n) vectors u⁽ⁱ⁾ ∈ U and αᵢ > 0, i = 1, ..., t, such that z̄ − x̄ = Σ_{i=1}^t αᵢ u⁽ⁱ⁾ (see [23]); i.e.,

z̄ = x̄ + Σ_{i=1}^t αᵢ u⁽ⁱ⁾.   (4.14)
Let

βᵢ = αᵢ − ⌊αᵢ⌋, i = 1, ..., t,   (4.15)

and define

z* = x̄ + Σ_{i=1}^t βᵢ u⁽ⁱ⁾   (4.16)

and

x* = z̄ − Σ_{i=1}^t βᵢ u⁽ⁱ⁾.   (4.17)
From (4.14), (4.16) and (4.17), we have

z* = z̄ − Σ_{i=1}^t (αᵢ − βᵢ) u⁽ⁱ⁾,   (4.18)

x* = x̄ + Σ_{i=1}^t (αᵢ − βᵢ) u⁽ⁱ⁾.   (4.19)
Now, we shall show that these z* and x* are feasible for IP and RP, respectively. Observe from (4.18) that

A₁z* = A₁z̄ − Σ_{i=1}^t (αᵢ − βᵢ) A₁u⁽ⁱ⁾ ≥ A₁z̄,   (4.20)

since αᵢ − βᵢ ≥ 0 and u⁽ⁱ⁾ ∈ C (i.e., A₁u⁽ⁱ⁾ ≤ 0, i = 1, ..., t). Similarly, from (4.16),

A₂z* = A₂x̄ + Σ_{i=1}^t βᵢ A₂u⁽ⁱ⁾ ≥ A₂x̄.   (4.21)

Hence, Az* ≥ b holds from Az̄ ≥ b and Ax̄ ≥ b. Since αᵢ − βᵢ = ⌊αᵢ⌋ and the u⁽ⁱ⁾ are integral, the integrality of z* then follows from (4.18). Analogously,

A₁x* = A₁z̄ − Σ_{i=1}^t βᵢ A₁u⁽ⁱ⁾ ≥ A₁z̄,   (4.22)

A₂x* = A₂x̄ + Σ_{i=1}^t (αᵢ − βᵢ) A₂u⁽ⁱ⁾ ≥ A₂x̄.   (4.23)
Hence, Ax* ≥ b. Since u⁽ⁱ⁾ ∈ C, we have u⁽ⁱ⁾_j ≥ 0 for j ∈ S₁ and u⁽ⁱ⁾_j ≤ 0 for j ∈ S₂. Combining this with z̄_j ≥ x̄_j for j ∈ S₁ and z̄_j < x̄_j for j ∈ S₂, we have from (4.16) and (4.18) that

z̄_j ≥ z*_j ≥ x̄_j, j ∈ S₁,
z̄_j ≤ z*_j ≤ x̄_j, j ∈ S₂,   (4.24)

and similarly from (4.17) and (4.19) that

z̄_j ≥ x*_j ≥ x̄_j, j ∈ S₁,
z̄_j ≤ x*_j ≤ x̄_j, j ∈ S₂.   (4.25)
These (4.24) and (4.25) imply

z̄ ∧ x̄ ≤ z* ≤ z̄ ∨ x̄   (4.26)

and

z̄ ∧ x̄ ≤ x* ≤ z̄ ∨ x̄.   (4.27)

Furthermore, (4.16) and (4.17) imply

z̄ + x̄ = z* + x*.   (4.28)

From (4.26), (4.27), (4.28) and Lemma 4.1, we have

f(z̄) + f(x̄) ≥ f(z*) + f(x*).   (4.29)

Therefore

f(z*) ≤ f(z̄) + f(x̄) − f(x*) ≤ f(z̄),

since x̄ is an optimal solution of RP and x* is feasible to RP. Hence, z* is also optimal to IP and f(z*) = f(z̄) holds. Therefore, f(x̄) = f(x*) follows, and x* is also optimal to RP. Finally, from (4.16) and (4.17), we have

‖x̄ − z*‖∞ = ‖z̄ − x*‖∞ = ‖Σ_{i=1}^t βᵢ u⁽ⁱ⁾‖∞ ≤ nΔ. □

The proofs of the following proximity theorems, between IP/s and IP, and between IP and LP/s, can be done by using arguments similar to that of Theorem 4.2. These will be used to develop the algorithm in the next subsection.
Theorem 4.3 (i) For every optimal solution ȳ of IP/s, there exists an optimal solution z* of IP such that ‖sȳ − z*‖∞ ≤ nsΔ.
(ii) For every optimal solution z̄ of IP, there exists an optimal solution y* of IP/s such that ‖z̄ − sy*‖∞ ≤ nsΔ. □

Theorem 4.4 For every optimal solution x̄ of LP/s, there exists an optimal solution z* of IP such that

‖x̄ − z*‖∞ ≤ 2nsΔ.   (4.30)

PROOF. Since LP/s is a continuous relaxation of IP′/s with x = sy, we know from Theorem 4.2 that there exists an optimal solution y* of IP′/s (and hence of IP/s) such that ‖x̄/s − y*‖∞ ≤ nΔ. From Theorem 4.3, it is clear that there exists an optimal solution z* of IP such that ‖sy* − z*‖∞ ≤ nsΔ. Combining these proximity results, we obtain (4.30). □

4.2 Algorithm for problem IP


As pointed out earlier, we can assume that the polyhedron defined by {x | Ax ≥ b, x ∈ Rⁿ} is bounded. This implies that there exists a positive integer γ such that an optimal solution of the problem

IP(1): minimize f(z)
  subject to Az ≥ b,
  −2^{γ+1} nΔ e ≤ z ≤ 2^{γ+1} nΔ e,
  z: integer,

where e = (1, ..., 1)ᵀ, is also optimal to IP (see the book [98] for the proof of the existence of such a γ). In particular, we can choose

γ = ⌈log₂((m/n)‖b‖∞)⌉ − 1.

Now the algorithm for IP is described as follows:

Algorithm SOLVE_IP
Input: An instance of problem IP.
Output: An optimal solution z̄.
Step 1: Let s_k := 2^{γ−k}, k = 0, ..., γ, and x⁽⁰⁾ := 0.
Step 2: For each k = 1, ..., γ, compute an optimal solution x⁽ᵏ⁾ of the following problem:

LP(k)/s_k: minimize f^{L:s_k}(x)
  subject to Ax ≥ b,
  x⁽ᵏ⁻¹⁾ − 2n s_{k−1} Δ e ≤ x ≤ x⁽ᵏ⁻¹⁾ + 2n s_{k−1} Δ e.

Step 3: Obtain an optimal solution z̄ by solving the following integer program with lower bound L_j = x⁽ᵞ⁾_j − 2nΔ and upper bound U_j = x⁽ᵞ⁾_j + 2nΔ on z_j:

IP*: minimize f(z)
  subject to Az ≥ b,
  x⁽ᵞ⁾ − 2nΔ e ≤ z ≤ x⁽ᵞ⁾ + 2nΔ e,
  z: integer.

Output z̄ as an optimal solution of IP.
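
The control flow of SOLVE_IP is summarized by the Python skeleton below. The two solvers solve_LP (for the linearized problem LP(k)/s_k over a box) and solve_IP_box (for the final small integer program IP*) are hypothetical oracles, not part of the algorithm itself; the skeleton only shows how the proximity bounds of Theorems 4.3 and 4.4 shrink the search box as the scale s_k halves.

from math import ceil, log2

def solve_ip(n, m, Delta, b_inf, solve_LP, solve_IP_box):
    # Proximity-scaling skeleton of Algorithm SOLVE_IP.
    gamma = max(0, ceil(log2((m / n) * b_inf)) - 1)
    x = [0.0] * n                                        # x^(0)
    for k in range(1, gamma + 1):
        s_prev, s = 2 ** (gamma - k + 1), 2 ** (gamma - k)
        lo = [xj - 2 * n * s_prev * Delta for xj in x]   # proximity box of
        hi = [xj + 2 * n * s_prev * Delta for xj in x]   # Theorem 4.4
        x = solve_LP(s, lo, hi)                          # optimal x^(k) of LP(k)/s_k
    lo = [xj - 2 * n * Delta for xj in x]                # final box around x^(gamma)
    hi = [xj + 2 * n * Delta for xj in x]
    return solve_IP_box(lo, hi)                          # optimal z of IP*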
Let T(n, m, Δ) denote the running time of a linear programming algorithm for solving min{cx | Ax ≥ b, 0 ≤ x_j ≤ 1 for j = 1, 2, ..., n}, where Δ denotes the maximum absolute value of the subdeterminants of A, and let T_I(n, m, A) denote the running time to solve the integer linear programming problem min{cx | Ax ≥ b, 0 ≤ x_j ≤ 1 for all j, x: integer}. Let A_k denote the m × kn matrix in which each column of A appears k times.

Theorem 4.5 The solution z̄ obtained by Algorithm SOLVE_IP is an optimal solution of problem IP. The running time of SOLVE_IP is O(log₂((m/n)‖b‖∞) · T(8n²Δ, m, Δ) + T_I(4n²Δ, m, A_{4nΔ})).
PROOF. Consider the problem LP(1)/s₁ with s₁ = 2^{γ−1}, which is solved in Step 2 for k = 1. This LP(1)/s₁ is a relaxation of IP(1), since f^{L:s₁}(x) is a piecewise linearization of the original function f(x). From Theorem 4.4, we see that there exists an optimal solution z⁽¹⁾ of IP(1) such that ‖x⁽¹⁾ − z⁽¹⁾‖∞ ≤ 2n s₁ Δ = 2^γ nΔ. Therefore, an optimal solution of the integer program

IP(2): minimize f(z)
  subject to Az ≥ b,
  x⁽¹⁾ − 2^γ nΔ e ≤ z ≤ x⁽¹⁾ + 2^γ nΔ e,
  z: integer,

is also optimal to IP(1) (and hence to IP). Observe that problem LP(2)/s₂ with s₂ = 2^{γ−2} is a linearized relaxation of IP(2). Hence, again from Theorem 4.4, we see that there exists an optimal solution z⁽²⁾ of IP(2) (and hence of IP) such that ‖x⁽²⁾ − z⁽²⁾‖∞ ≤ 2n s₂ Δ = 2^{γ−1} nΔ. Repeating this argument, we see that, for each k = 1, ..., γ, there exists an optimal solution z⁽ᵏ⁾ of IP such that ‖x⁽ᵏ⁾ − z⁽ᵏ⁾‖∞ ≤ 2^{γ+1−k} nΔ. Therefore, for each k = 1, ..., γ, an optimal solution of IP(k) (defined similarly to IP(2)) is also optimal to IP. Since Theorem 4.4 tells us that there exists an optimal solution ẑ of IP such that ‖x⁽ᵞ⁾ − ẑ‖∞ ≤ 2nΔ, the correctness of Algorithm SOLVE_IP follows.

As for the time complexity, Step 2 is repeated γ times. Here, γ = ⌈log₂((m/n)‖b‖∞)⌉ − 1 < log₂((m/n)‖b‖∞). At every iteration, each variable x_j is replaced by 8nΔ variables (see Remark 4.1), because x_j lies in an interval of length 4n s_{k−1} Δ by the definition of LP(k)/s_k, and the number of variables used for the linearization is 4n s_{k−1} Δ / s_k = 8nΔ. Thus, the time complexity for solving the corresponding linear program is T(8n²Δ, m, Δ). In order to solve problem IP* in Step 3, we first transform the problem into an integer linear program. For this transformation, we need to introduce 4nΔ 0–1 variables for each j, since x_j for all j lies in an interval of length 4nΔ. Thus, the number of resulting 0–1 variables is 4n²Δ, and the matrix A has each column duplicated 4nΔ times. Hence Step 3 requires T_I(4n²Δ, m, A_{4nΔ}) time. □

Corollary 4.6 If A is totally unimodular, problem IP can be solved in polynomial time. The time complexity is O(log₂((m/n)‖b‖∞) · T(8n², m, 1)).

PROOF. If A is totally unimodular, problem IP* can be solved by applying a polynomial time algorithm for the linear program. The corollary then immediately follows from Δ = 1. □

4.3 Proximity theorem for submodular constraints


As remarked at the beginning of this section, the constraint matrix defining
the base polyhedron of a submodular system is not totally unimodular. Thus
we cannot directly apply the algorithm SOLVE..IP of the previous subsection
to SC/SM/D in order to obtain a polynomial time algorithm. However, as
will be shown below, we can establish a stronger proximity result for this
case.
Let us recall the following problem of SCISM/D.

Q: minimize Σ_{j∈E} f_j(x_j)
  subject to x: an integral base of B(r).   (4.31)

The following algorithm SM-INCREMENT(s), obtained by modifying the incremental algorithm of Section 3.2.1, produces a solution x⁽ˢ⁾ of Q that satisfies all the constraints except x(E) = r(E) (in fact, x⁽ˢ⁾(E) < r(E) may occur). This x⁽ˢ⁾ is then used to establish a strong proximity theorem. Also, notice that this algorithm will be used as a subroutine in the algorithm presented in Section 5.3.

Procedure SM-INCREMENT(s)
Input: An instance of problem Q of SC/SM/D and a positive integer s.
Output: A solution x⁽ˢ⁾ of SC/SM/D (which may not be feasible) and a vector δ ∈ R^E.
Let x_j := 0 for all j ∈ E, E′ := E and k := 0;
while k < r(E) and E′ ≠ ∅ do
begin
  Find j* ∈ E′ such that d_{j*}(x_{j*} + 1) = min{d_j(x_j + 1) | j ∈ E′};
  if x + e(j*) ∉ P(r) then E′ := E′ − {j*} and δ_{j*} := s
  else if x + s·e(j*) ∉ P(r) then
    E′ := E′ − {j*}, x_{j*} := x_{j*} + 1, k := k + 1 and δ_{j*} := 1
  else x_{j*} := x_{j*} + s, k := k + s and δ_{j*} := s
end;
Output x⁽ˢ⁾ := x and δ = (δ_j, j ∈ E).
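
A Python sketch of the scaled pass follows; as before, the membership oracle in_P and the increment functions d_j are assumed to be supplied. The pass returns both x⁽ˢ⁾ and the vector δ of last increments used in Theorem 4.7.

def sm_increment_s(E, rank_E, in_P, d, s, x0=None):
    # A sketch of SM-INCREMENT(s): greedy with step s, falling back to a
    # single unit step just before an element saturates.
    x = dict(x0) if x0 is not None else {j: 0 for j in E}
    delta = {j: s for j in E}
    active = set(E)                       # the set E' of the procedure
    k = sum(x.values())
    while k < rank_E and active:
        j = min(active, key=lambda j: d[j](x[j] + 1))
        one = dict(x); one[j] += 1
        if not in_P(one):                 # not even one unit fits
            active.discard(j); delta[j] = s
        else:
            big = dict(x); big[j] += s
            if not in_P(big):             # only a last single unit fits
                active.discard(j); x[j] += 1; k += 1; delta[j] = 1
            else:                         # a full step of s units fits
                x[j] += s; k += s; delta[j] = s
    return x, delta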

Let x* denote an optimal solution of Q. The vector δ records the last increments executed by SM-INCREMENT(s). Note that δ_j = 1 or δ_j = s implies that x⁽ˢ⁾_j is equal to 1 or 0 modulo s, respectively. In general, x⁽ˢ⁾ may not be feasible to problem Q. However, the feature of allowing a single unit as the last increment of a variable x_j enables us to strengthen the statement of the proximity theorem.

Theorem 4.7 For the solution x⁽ˢ⁾ and the vector δ output by SM-INCREMENT(s), there exists an optimal solution x* of Q such that x* ≥ x⁽ˢ⁾ − δ ≥ x⁽ˢ⁾ − s·e.

PROOF. Since the second inequality is obvious from δ_j ≤ s, we shall concentrate on the proof of the first inequality. Let x′⁽ˢ⁾ be the solution obtained by applying procedure SM-INCREMENT of Section 3.2.1 with x⁽ˢ⁾ as an initial solution. Notice that x′⁽ˢ⁾ becomes a feasible solution of Q even if x⁽ˢ⁾ is not. We define δ′ by δ′_j = 1 if x⁽ˢ⁾_j is incremented by at least one unit when SM-INCREMENT is applied, and δ′_j = δ_j otherwise. Thus, δ′_j > δ_j cannot happen. Therefore, x′⁽ˢ⁾ − δ′ ≥ x⁽ˢ⁾ − δ follows. In order to prove the first inequality, it is now sufficient to show x* ≥ x′⁽ˢ⁾ − δ′.

Let x** denote an optimal solution of Q that does not satisfy the assertion of the theorem. Let x̂ be defined by x̂ = x** ∧ x′⁽ˢ⁾. Modify problem Q to Q′ by adding the constraint x ≥ x̂. Since x** ≥ x̂, Q′ has the same optimal objective value as Q. Let x* be an optimal solution of Q′ obtained by SM-INCREMENT of Section 3.2.1 with x̂ as an initial solution. Now, in applying SM-INCREMENT we give a higher priority to j with x_j < x′⁽ˢ⁾_j − δ′_j, whenever possible, for the next increment. If there exist a j with x_j < x′⁽ˢ⁾_j − δ′_j and a k such that x + e(k) ∈ P(r) and x_k ≥ x′⁽ˢ⁾_k, then d_j(x_j + 1) ≤ d_k(x_k + 1) must hold from the way x′⁽ˢ⁾ is computed in SM-INCREMENT. Therefore, unless x ≥ x′⁽ˢ⁾ − δ′ holds, we must have x ≤ x′⁽ˢ⁾. Now denote by x* the solution obtained by applying SM-INCREMENT with x̂ as an initial solution. This x* satisfies x* ≤ x′⁽ˢ⁾. Thus, we must have x* = x′⁽ˢ⁾, since otherwise Σ_{j∈E} x*_j < Σ_{j∈E} x′⁽ˢ⁾_j. This proves x* ≥ x′⁽ˢ⁾ − δ′. □

Corollary 4.8 For any optimal solution x* of Q, there exists an optimal solution x̄ of the continuous relaxation of Q such that

x* − e ≤ x̄ ≤ x* + n e,   (4.32)

and, vice versa, for any optimal solution x̄ of the continuous relaxation of Q, there exists an optimal solution x* of Q that satisfies (4.32).

PROOF. Let x̄ denote a continuous optimal solution. Given any ε > 0 such that 1/ε is an integer, let us consider an optimal solution x(ε) of the following problem:

Q(ε): minimize Σ_{j∈E} f_j(ε x_j)
  subject to ε x: a base of B(r),   (4.33)
  x_j: nonnegative integer, j ∈ E.

It is easy to see that an optimal solution x* of Q is derived by applying procedure SM-INCREMENT(1/ε) to Q(ε). Hence from Theorem 4.7,

x* − e ≤ ε x(ε).

Since x̄ = lim_{ε→0} ε x(ε), this becomes

x* − e ≤ x̄.

In order to prove the second inequality, suppose

x̄_i > x*_i + n

for some i ∈ E. Since x̄_j + 1 ≥ x*_j holds for all j by x* − e ≤ x̄, we have

Σ_{j≠i} x̄_j + (n − 1) ≥ Σ_{j≠i} x*_j.

Summing up the above two inequalities, we have

Σ_{j∈E} x̄_j + (n − 1) > Σ_{j∈E} x*_j + n,

which is a contradiction, since x*(E) = x̄(E) = r(E). This proves the second inequality. □

This corollary says in particular that ‖x* − x̄‖∞ ≤ n. This is a tighter proximity result than the one given in Theorem 4.2, which says ‖x* − x̄‖∞ ≤ nΔ, since Δ for the submodular constraints is generally greater than one. This proximity result will be used to develop efficient algorithms for SC/·/D and SQC/·/D in the next section.

4.4 Notes
Proximity results between continuous and integral optimal solutions for some optimization problems have been known before the theorems in this section. In Chapter 4 of the book by Ibaraki and Katoh [58], a proximity theorem between SC/Simple/C and SC/Simple/D was shown, based on which an efficient algorithm for SC/Simple/D was developed. The efficient algorithm for Fair/SM/D by Fujishige, Katoh and Ichimori [39] is based on the proximity theorem developed therein among Fair/SM/D, Minimax/SM/D and Maximin/SM/D. Namikawa and Ibaraki [93] proved a similar theorem between Fair/SM/C and Fair/SM/D, based on which an efficient algorithm for Fair/SM/D was derived.
These proximity results come from the special structures of the objective functions and/or constraints. Extending this line of development, we may be able to show many other types of proximity results.
Notice that Algorithm SOLVE_IP explained in Section 4.2 does not specify an algorithm that solves the linear program LP(k)/s_k. When the constraint matrix represents a network, Hochbaum [50] proposed a network flow algorithm for this case, which is an adaptation of the successive shortest path method for minimum-cost flow problems. For a general totally unimodular constraint matrix, Karzanov and McCormick [64] very recently proposed another polynomial time algorithm. The idea of their algorithm is a generalization of the minimum mean cancellation method developed for minimum-cost flow problems. Thus, their algorithm does not rely on linear programming, either. Karzanov and McCormick [64] further considered specializations and generalizations by varying the form of the objective functions and the constraints.

5 Lower Bounds on Time Complexity and Improved Algorithms
We first explain an alternative algorithm for SC/SM/D, which is based on the proximity theorem (Theorem 4.7) and was developed by Hochbaum [51]. This algorithm, when specialized to problems SC/GUB/D, SC/Nested/D and SC/Tree/D, gives better time bounds than the previous corresponding algorithms. The running times of the resulting algorithms are O(n log(N/n)) for SC/GUB/D, O(n log n log(N/n)) for SC/Nested/D, and O(n log n log(N/n)) for SC/Tree/D, respectively. However, none of these algorithms is strongly polynomial. Here, a polynomial time algorithm is called strongly polynomial if its running time depends only on the dimension of the problem (i.e., the number of input values; e.g., n in our case) rather than on the input size, and is called weakly polynomial otherwise. For instance, the O(n log(N/n)) time bound of SC/Simple/D is weakly polynomial because the number of input values is O(n), but the running time depends on N. We then show a lower bound result for SC/Simple/D obtained by Hochbaum [51], which says that, under a certain computation model, there cannot exist a strongly polynomial algorithm even for SC/Simple/D, if we assume that the f_j's are general convex functions (e.g., polynomials of degree at least three). This is then contrasted with the cases of linear and quadratic objective functions. In the latter cases, there are strongly polynomial algorithms for SC/·/D, where · denotes Simple, GUB, Nested, Tree and Network, as demonstrated by Hochbaum and Hong [53]. In fact, Hochbaum and Hong [53] developed algorithms with O(n) running time for GUB constraints, O(n log n) time for Nested and Tree constraints, and O(nm log(n²/m)) time for Network constraints.
We shall describe in this section the underlying basic ideas of how the above results are derived.

5.1 Improved algorithms for SC/GUB/D, SC/Nested/D and SC/Tree/D
The following algorithm is based on the scaling technique and the proximity theorem, Theorem 4.7, and solves problem SC/SM/D in O(n(log n + τ) log(r(E)/n)) time, where τ denotes the time required to test whether x + e(j) ∈ P(r) holds or not for a given x ∈ P(r). The algorithm is described as follows:

Procedure SCALING
Input: An instance of problem SC/SM/D.
Output: An optimal solution x*.
Let s := ⌈r(E)/2n⌉ and l := x := (0, 0, ..., 0);
while s > 1 do
begin
  Call SM-INCREMENT(s, l), and let x⁽ˢ⁾ be its output;
  Let l := x⁽ˢ⁾ − s·e;
  Let s := ⌈s/2⌉
end;
Call SM-INCREMENT with x⁽ˢ⁾ as an initial solution;
Output the obtained solution as x*.

In the above procedure, we use the notation e = (1, 1, ..., 1), and SM-INCREMENT(s, l) is the same as procedure SM-INCREMENT(s) of Section 4.3 except that it starts with x = l as an initial solution.
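
The driver loop then looks as follows in Python; it assumes scaled and unit-step subroutines in the style of the earlier sketches (the unit-step SM-INCREMENT extended to accept an initial solution), passed in as parameters. The clamping of the lower bound at 0 is an implementation detail added here, since the variables are nonnegative.

from math import ceil

def scaling(E, rank_E, in_P, d, sm_increment_s, sm_increment_from):
    # A sketch of procedure SCALING.
    n = len(E)
    s = ceil(rank_E / (2 * n))
    l = {j: 0 for j in E}
    x_s = dict(l)
    while s > 1:
        x_s, _ = sm_increment_s(E, rank_E, in_P, d, s, l)   # scaled pass from l
        l = {j: max(0, x_s[j] - s) for j in E}              # bound of Theorem 4.7
        s = ceil(s / 2)
    return sm_increment_from(E, rank_E, in_P, d, x_s)       # final unit-step pass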
Theorem 5.1 Given an instance of SC/SM/D, procedure SCALING correctly finds an optimal solution x* in O(n(log n + τ) log(r(E)/n)) time.

PROOF. Since the vector x⁽ˢ⁾ − s·e bounds an optimal solution from below by Theorem 4.7, adding the constraint x ≥ x⁽ˢ⁾ − s·e does not change the optimal value of the problem. Thus, the correctness of the algorithm is immediate.
As for the running time, the algorithm calls SM-INCREMENT(s, l) O(log⌈r(E)/2n⌉) times and SM-INCREMENT once. Each time a call to SM-INCREMENT(s, l) is made, at most ⌈(r(E) − Σ_{j∈E} l_j)/s⌉ increments are executed. We first note that Σ_{j∈E} x⁽ˢ⁾_j ≥ r(E) − sn holds, since, for the solution x⁽ˢ⁾ obtained by SM-INCREMENT(s, l), x⁽ˢ⁾ + s·e(j) ∉ P(r) holds for any j from the way SM-INCREMENT(s, l) is executed (see Section 4.3). Thus, we have

Σ_{j∈E} l_j = Σ_{j∈E} (x⁽ˢ⁾_j − s) = Σ_{j∈E} x⁽ˢ⁾_j − sn ≥ r(E) − sn − sn = r(E) − 2sn.

Hence, at each iteration of the while-loop, we have ⌈(r(E) − Σ_j l_j)/(s/2)⌉ ≤ 4n. Thus at each call to SM-INCREMENT(s, l), at most O(n) increments are executed. The last call to SM-INCREMENT after the while-loop also requires O(n) increments. Thus, from Theorem 3.16, the running time of procedure SCALING is O(n(log n + τ) log(r(E)/n)). □

When specialized to problems SC/GUB/D, SC/Nested/D and SC/Tree/D, a direct consequence of the above theorem gives an O(n² log(N/n)) time algorithm for all three problems, because there are O(n) constraints and hence the feasibility test requires O(n) time. However, Hochbaum [51] improved upon this running time by making use of the special structure of the problems. The results obtained in [51] are summarized as follows:
Theorem 5.2 (i) SC/GUB/D can be solved in O(n log(N/n)) time.
(ii) SC/Nested/D can be solved in O(n log n log(N/n)) time.
(iii) SC/Tree/D can be solved in O(n log n log(N/n)) time. □

This improvement is attained by devising a systematic method for efficiently checking feasibility. We do not give the details here but, in order to illustrate her idea, we shall briefly explain how to deal with the case of problem SC/Nested/D.
Suppose we have the following n nested constraints,

Σ_{j∈S_i} x_j ≤ b_i, i = 1, ..., n,   (5.1)

where S_i = {1, 2, ..., i}, i = 1, ..., n, are assumed for simplicity. We assume b₁ ≤ b₂ ≤ ... ≤ b_n without loss of generality.
Now we explain how procedure SCALING can be sped up. Since there are n constraints, a straightforward implementation of the feasibility test requires O(n) time. This can be improved, however, to O(1) time on average. For this purpose, we associate with each variable a nonnegative "slack" s_i, initialized as s_i⁰ = b_i − b_{i−1} (where b₀ is assumed to be 0), corresponding to the initial solution x = (0, 0, ..., 0). Some of these slacks s_i⁰ are zero already at the beginning if b_i = b_{i−1}. During the execution, the algorithm implicitly maintains the feasibility of constraints (5.1) through the slack vector s. For the current feasible solution x, the set of constraints (5.1) is equivalent to

b̃_i ≥ 0, i = 1, 2, ..., n.   (5.2)

Here b̃_i is defined by

b̃_i = min{b_i − Σ_{j=1}^i x_j, b̃_{i+1}}, i = n, n − 1, ..., 1,   (5.3)

where b̃_n is set equal to b_n − Σ_{j=1}^n x_j. We maintain the slack vector s so that

s_i = b̃_i − b̃_{i−1}   (5.4)
holds for all i, where b̃₀ = 0 is assumed by convention. Since b̃_i is monotone nondecreasing in i by definition, the feasibility test of the constraints (5.1) reduces to checking whether s_i ≥ 0 holds for all i or not. Now suppose that the current solution x is feasible and a vector s satisfying (5.4) is given. For the test of whether the increment of x_i by one unit is feasible or not, we compute

k(i) := max{j | 0 ≤ j ≤ i, s_j > 0},   (5.5)

where s₀ = 1 is assumed. If k(i) = 0, the increment of x_i by one unit is clearly infeasible from the definition of b̃ and s. Otherwise, it is feasible and we let

s_{k(i)} := s_{k(i)} − 1.   (5.6)

It is easy to see that this update correctly maintains the vector b̃ for the new feasible solution x + e(i) implicitly through the updated slack vector s, because b̃_{k(i)} = ... = b̃_i holds from the definition of k(i), and updating s by (5.6) implicitly decreases all of b̃_{k(i)}, ..., b̃_i by one unit, which maintains the correct b̃ defined by (5.3) for the new feasible solution x + e(i). We shall explain how this update of the vector s can be done in O(1) time on average, after describing the algorithm.
The input of the algorithm consists of a feasible vector x, the index i of the variable to be increased, the current slack vector s, and a 0–1 labeling vector label defined by

label_i = 1 if s_i > 0,  label_i = 0 if s_i = 0.   (5.7)

The role of the vector label is to allow an efficient implementation of the algorithm, and will become clear later.

Procedure FEASIBILITY-CHECK
Input: An instance of SC/Nested/D, a feasible solution x, an index i, a slack vector s, and a 0–1 labeling vector label.
Output: "feasible" and the updated slack vector s if x + e(i) is feasible, and "infeasible" otherwise.
Let s₀ := 1;
Compute k(i) by (5.5);
if k(i) > 0 then
begin
  Let s_{k(i)} := s_{k(i)} − 1 and label_{k(i)} := min{label_{k(i)}, s_{k(i)}};
  Output "feasible" and (s₁, ..., s_n)
end
else output "infeasible"
The implementation of FEASIBILITY-CHECK requires maintaining the labels label_j for all j, and identifying the index k(i) for each queried i. This can be done in linear time in total, as follows. Regarding the vector label as a one-dimensional array, label consists of intervals of 0's separated by 1's. We maintain these intervals so that the interval to which a given index i belongs can be efficiently retrieved. The left-end and right-end indices of an interval are associated with the interval id, which enables us to find k(i) efficiently. When label_{k(i)} is updated from 1 to 0, the two adjacent intervals are merged. This is a special case of the "union-find" problem, and a sequence of O(n) such operations can be executed in O(n) time using Gabow and Tarjan's algorithm [40]. Thus, the running time τ in Theorem 5.1 is O(1) on average.
Therefore, from Theorem 5.1, the total complexity of algorithm SCALING for the nested problem is O((n log n + n)·log(N/n)) = O(n log n log(N/n)).
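
The slack maintenance can be sketched compactly in Python with a union-find structure in place of the interval bookkeeping; plain path compression is used here, which gives a near-linear rather than the strictly linear total bound of Gabow and Tarjan, but the amortized behavior is the same for practical purposes. The class below and its example are illustrative assumptions, not the authors' implementation.

class NestedFeasibility:
    # Feasibility tests for nested constraints sum_{j<=i} x_j <= b_i,
    # maintaining the slacks s_i of (5.4).  Index 0 is a sentinel.
    def __init__(self, b):
        n = len(b)
        self.slack = [float('inf')] + [b[0]] + [b[i] - b[i - 1] for i in range(1, n)]
        self.parent = list(range(n + 1))
        for i in range(1, n + 1):       # link zero-slack indices downwards
            if self.slack[i] == 0:
                self.parent[i] = i - 1

    def find(self, i):                  # k(i) of (5.5) via path compression
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def try_increment(self, i):
        # Return True and update the slacks if x + e(i) remains feasible.
        k = self.find(i)
        if k == 0:
            return False                # every S_j with j <= i is tight
        self.slack[k] -= 1              # the update (5.6)
        if self.slack[k] == 0:
            self.parent[k] = k - 1      # merge with the interval below
        return True

# Example: b = (1, 1, 3), i.e. x_1 <= 1, x_1 + x_2 <= 1, x_1 + x_2 + x_3 <= 3.
fc = NestedFeasibility([1, 1, 3])
print([fc.try_increment(i) for i in [1, 1, 2, 3, 3, 3]])
# -> [True, False, False, True, True, False]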

5.2 Lower bounds


We shall present two types of weakly polynomial lower bounds for SC/Simple/D, which tell us that no strongly polynomial algorithm exists for SC/Simple/D. The first one is based on a comparison model and the second on an algebraic tree model.

5.2.1 Lower bound on a comparison model


Consider the following problem of SC/Simple/D, where c > 0 is a given constant.

Q: minimize Σ_{j=1}^n f_j(x_j) + c·x_{n+1}
  subject to Σ_{j=1}^{n+1} x_j = N,
  x_j: nonnegative integer, j = 1, ..., n + 1.

Let the functions f_j be monotone decreasing in the interval [0, ⌊N/n⌋], and constant in [⌊N/n⌋, N]. Let us define a vector x̄ = (x̄₁, x̄₂, ..., x̄_{n+1}) as follows:

x̄_j = max{y | d_j(y) ≥ c, y ∈ [1, ⌊N/n⌋]}, j = 1, ..., n,   (5.8)
x̄_{n+1} = N − Σ_{j=1}^n x̄_j.
We claim that x̄ is the unique optimal solution of Q. First note that x̄ is feasible to Q from (5.8). Since d_{n+1}(y) = c holds for all y, condition (5.8) tells us that x̄ satisfies the optimality condition of Theorem 3.12; i.e.,

d_j(x̄_j + 1) ≥ d_i(x̄_i)   (5.9)

holds for all i, j with 1 ≤ i, j ≤ n + 1. Since all f_j for j with 1 ≤ j ≤ n are assumed to be monotone decreasing in the interval [0, ⌊N/n⌋], condition (5.8) also tells us that no other feasible solution x ≠ x̄ satisfies the condition of Theorem 3.12. Thus, the above claim holds. Therefore, solving problem Q is equivalent to computing x̄_j for all j with 1 ≤ j ≤ n. Now, the well-known information-theoretic lower bound (see e.g., the book [7] by Baase for its proof) for finding such an x̄_j is Ω(log⌊N/n⌋). Since each column (d_j(y), y = 1, 2, ..., ⌊N/n⌋) is independent of the others, an Ω(n log⌊N/n⌋) lower bound follows for the n variables.
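
The matching upper bound per variable is a plain binary search against the threshold c, sketched below under the assumption that each d_j is monotone on [1, ⌊N/n⌋], so that {y | d_j(y) ≥ c} is a prefix of the interval; the example increment function is illustrative only.

def xbar_j(d_j, c, U):
    # Binary search for max{y in [1, U] : d_j(y) >= c} (eq. (5.8));
    # returns 0 if the set is empty.  Uses Theta(log U) comparisons,
    # matching the information-theoretic lower bound per variable.
    lo, hi = 0, U
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if d_j(mid) >= c:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Example: d(y) = 100 - 3y with c = 10: the largest feasible y is 30.
print(xbar_j(lambda y: 100 - 3 * y, 10, 1000))  # 30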

Theorem 5.3 Based on a comparison model, a lower bound on the time complexity of SC/Simple/D is Ω(n log⌊N/n⌋). □

5.2.2 Lower bound on an algebraic tree model


The comparison model may be considered too restrictive. If we allow arithmetic operations such as addition, subtraction, multiplication and division, we may have a possibility of constructing a strongly polynomial algorithm for SC/Simple/D. In fact, if all f_j are quadratic, an O(n) time algorithm for SQC/Simple/D (SQC means "separable quadratic convex") is possible by making use of a continuous optimal solution of SQC/Simple/C (see Section 4.6 of [58]), which can be obtained in O(n) time by the algorithm for SQC/Simple/C of Brucker [20]. However, this approach is ruled out in the general case, as will be shown below, even if the f_j are polynomials of degree 3.
The lower bound proof is based on Renegar's result [101] saying that Ω(log log(R/ε)) is a lower bound on the time complexity of finding an ε-accurate single real root of a polynomial of a single variable of degree ≥ 2 in an interval [0, R], even if the polynomial is monotone in that interval. Here x ∈ R is called an ε-accurate root of an equation p(x) = 0 if x satisfies |x − x*| ≤ ε for an exact root x* of p(x) = 0. Let p₁(x), ..., p_n(x) be n monotone polynomials, each with a single root in the interval [0, N/n] for the equation p_j(x) = c, where c is a fixed constant. Since the choice of these polynomials is arbitrary, the lower bound on finding the n roots of these n equations is Ω(n log log(N/(nε))). Let

f_j(x_j) = ∫₀^{x_j} p_j(x) dx.

These f_j are then polynomials of degree ≥ 3. Now let us consider the following problem Q_ε for ε > 0 such that 1/ε is an integer:

Q_ε: minimize Σ_{j=1}^n f_j(ε x_j) + cε·x_{n+1}
  subject to Σ_{j=1}^{n+1} x_j = N/ε,   (5.10)
  x_j: nonnegative integer, j = 1, ..., n + 1.

Problem Q_ε is an instance of SC/Simple/D. We claim that, for any optimal solution x* of Q_ε, y = ε·x* is an ε-accurate vector of roots for the following system of nonlinear equations:

p₁(y₁) = c,
p₂(y₂) = c,
  ...
p_n(y_n) = c.

This means that the computational complexity of Q_ε is not less than that of finding an ε-accurate vector of roots for the above system of equations. From Theorem 3.12, we have

d_j(x*_j + 1) ≥ d_{n+1}(x*_{n+1}) (= c·ε)

and

d_j(x*_j) ≤ d_{n+1}(x*_{n+1} + 1) (= c·ε)

for all j, i.e.,

d_j(x*_j) ≤ c·ε ≤ d_j(x*_j + 1)

for all j, implying that there are y_j ∈ [εx*_j − ε, εx*_j + ε] such that p_j(y_j) = c, from the monotonicity of p_j. This proves the above claim. Hence, a lower bound on the time complexity of solving Q_ε is Ω(n log log(N/(nε))). For ε = 1, Q_ε is an instance of SC/Simple/D, and thus we get the following result.

Theorem 5.4 Based on the algebraic computation tree model, a lower bound on the computational complexity of SC/Simple/D is Ω(n log log(N/n)). □

5.3 Strongly polynomial algorithms for separable quadratic convex objective functions
Although no strongly polynomial time algorithm for SCjSimplejD exists
even for polynomials of degree 3, as proved in the previous subsection, there
is a possibility of a strongly polynomial algorithm for the case of a separable
quadratic objective function. In fact, Hochbaum and Hong [53] constructed
such algorithms for several cases of constraints. Results obtained by them
are summarized as follows:
Resource Allocation Problems 217

Theorem 5.5 Let n be the number of variables.

(i) SQC/Simple/D can be solved in O(n) time.
(ii) SQC/GUB/D can be solved in O(n) time.
(iii) SQC/Nested/D can be solved in O(n log n) time.
(iv) SQC/Tree/D can be solved in O(n log n) time.
(v) SQC/Network/D can be solved in O(nm log(n²/m)) time, where n
and m denote the numbers of nodes and arcs in a network. □

We shall now explain the underlying idea used to derive the above results. For
this purpose, we consider SQC/SM/D, because all five problems listed
in the above theorem are special cases of SQC/SM/D. The first idea is to
make use of an optimal solution of the corresponding continuous problem
SQC/SM/C. We assume here that SQC/SM/C can be solved in strongly
polynomial time, since the continuous versions of all five problems above
can be solved in strongly polynomial time using the quadratic form of the
objective function. The continuous optimal solution can then be used to
bound an integer optimal solution from above, based on the proximity theorem
given in Section 4. The algorithm then applies procedure SCALING
of Section 5.1 in order to solve SQC/SM/D. This is strongly polynomial
because the upper bound obtained is not far from the integer optimal solution,
as will be seen below.
Let us first solve SQC/SM/C and let x̄ be its continuous optimal solution.
Suppose that x̄ is not an integral vector, since otherwise we are already done.
Then, from the proximity result of Corollary 4.8, there exists an optimal
solution x* of SQC/SM/D such that

x* ≤ ⌊x̄ + e⌋. (5.11)

Here we use the notation ⌊x⌋ for a vector x = (x_1, ..., x_n) to denote
(⌊x_1⌋, ..., ⌊x_n⌋). Notice that the vector ⌊x̄ + e⌋ does not belong to the
submodular polyhedron P(r) because ∑_{j∈E} ⌊x̄_j + 1⌋ > r(E). Therefore,
in order to find x* from ⌊x̄ + e⌋, we formulate a kind of dual problem of
SQC/SM/D, which can be solved in the same fashion as SQC/SM/D.

This is done by noticing the following result (cf. the book by Fujishige
[37]). We begin with some necessary notations. For a submodular system
(D, r) on E, define a function r#: D̄ → Z as follows:

D̄ = {E − X | X ∈ D}, (5.12)

r#(E − X) = r(E) − r(X), X ∈ D. (5.13)

Then r# is a supermodular function on the dual lattice D̄ of D, where a
function r′ defined over a distributive lattice D′ is supermodular if it satisfies

r′(X) + r′(Y) ≤ r′(X ∪ Y) + r′(X ∩ Y) (5.14)

for all X, Y ∈ D′. All the properties related to a submodular system can
be extended to a supermodular system (see the book by Fujishige [37]) by
reversing the direction of the inequality ≤ over R. The pair (D̄, r#) is called
the dual supermodular system of (D, r), and the following theorem is known
(see [37]).

Theorem 5.6 For a submodular system (D, r) and its dual supermodular
system (D̄, r#), it holds that B(r) = B(r#). □

This implies

min{∑_{j∈E} f_j(x_j) | x ∈ B(r)} = min{∑_{j∈E} f_j(x_j) | x ∈ B(r#)}.

Therefore the separable convex minimization defined over a submodular system
is equivalent to the one defined over its dual supermodular system. Also,
an algorithm for SC/SM/D can be turned into an algorithm for the separable
convex minimization with supermodular constraints, without changing
the time complexity. Taking constraint (5.11) into account, the problem we
want to solve is:

min{∑_{j∈E} f_j(x_j) | x ∈ B(r#), x ≤ ⌊x̄ + e⌋}. (5.15)

This problem can be solved by any of the modifications of SM-INCREMENT
of Section 3.2.1, SM of Section 3.2.2 and SCALING of Section 5.1.

Now for the supermodular function r# contracted by the vector ⌊x̄ + e⌋,

r#(E) = ∑_{j∈E} ⌊x̄_j + 1⌋ − r(E) (5.16)

holds from Lemma 3.5, which then implies

r#(E) ≤ n (5.17)

since ∑_{j∈E} ⌊x̄_j + 1⌋ ≤ ∑_{j∈E} x̄_j + n = r(E) + n. The inequality (5.17) together
with Theorem 3.16 imply that, if we apply SM-INCREMENT in a reverse
manner to problem (5.15) with the initial solution x = ⌊x̄ + e⌋, problem (5.15)
can be solved in O(n(log n + T)) time since r#(E) ≤ n holds from (5.17),
where T denotes the time required to test whether x − e(j) ∈ P(r#) holds
or not for a given x ∈ P(r#). Combining this with Theorem 5.2, we have
the following lemma.
Lemma 5.7 Suppose that we are given an optimal solution of SQC/·/C,
where · denotes Simple, GUB, Nested or Tree. Then
(i) SQC/Simple/D and SQC/GUB/D can be solved in O(n) time.
(ii) SQC/Nested/D and SQC/Tree/D can be solved in O(n log n) time.

We also have the next lemma.

Lemma 5.8 Given an optimal solution of SQC/Network/C, SQC/Network/D
can be solved in O(nm) time, where n and m are the numbers of nodes
and arcs in a network, respectively.
PROOF. Similarly to Lemma 5.7, we use procedure SM-INCREMENT in
a reverse manner to compute an optimal solution x* of SQC/Network/D
from a continuous optimal solution x̄. As mentioned earlier, this is done by
considering the problem (5.15). From (5.17), we need only O(n) decrements.
Whether a decrement is feasible can be checked in O(m) time, as is
well known in network flow theory (see the book [2]). This proves the
lemma. □

In order to efficiently compute a continuous optimal solution for the
above five problems, we need a technique based on the Kuhn-Tucker conditions
involving the original variables and Lagrange multipliers. Since the
objective function is quadratic, the Kuhn-Tucker conditions contain only
linear terms with respect to the variables, and optimal Lagrange multipliers can
be efficiently computed using the special structure of the constraints. We
omit further details about how this task can be done efficiently (see [53]), and
only cite the results here.

Theorem 5.9 Let n be the number of variables.

(i) SQC/Simple/C can be solved in O(n) time.
(ii) SQC/GUB/C can be solved in O(n) time.
(iii) SQC/Nested/C can be solved in O(n log n) time.
(iv) SQC/Tree/C can be solved in O(n log n) time.
(v) SQC/Network/C can be solved in O(nm log(n²/m)) time, where n
and m denote the numbers of nodes and arcs in a network, respectively. □

Combining Theorem 5.9 with Lemmas 5.7 and 5.8, we have Theorem 5.5.

In Hochbaum and Hong [53], the proof of Theorem 5.9(v) is done by
reducing the problem to the parametric flow problem, which is solved by a
generalization of the algorithm given by Gallo, Grigoriadis and Tarjan [42]
(since the latter algorithm is applicable only to the case in which each
quadratic function f_j(x_j) does not contain a linear term). Also, Hochbaum
and Hong [53] found a flaw in the algorithm of [42] and fixed it.

5.4 Notes and references

Strongly polynomial algorithms have also been developed for problems
SQC/Linear/C (or D) which have specially structured constraints other than
submodular constraints: the transportation problem with a fixed number of
sources (or sinks) [24] and its extension [88], and the minimum-cost flow in
a series-parallel network with a single source and sink. In these cases, the
constraint matrix is a special case of a totally unimodular matrix, and the
problems can be viewed as separable convex cost flow problems. A strongly
polynomial algorithm has also been developed for the transportation problem
with a certain type of nonseparable quadratic convex objective function
[54].

6 Nonseparable Convex Resource Allocation

We consider in this section a class of nonseparable convex resource allocation
problems. Recall that a separable convex function f defined over a base
polyhedron B(r) satisfies Lemma 3.11, i.e.,

f(x) + f(y) ≥ f(x − e(i) + e(j)) + f(y + e(i) − e(j))

for any bases x, y ∈ B(r) and any i, j ∈ E with x_i > y_i and x_j < y_j such that
x − e(i) + e(j), y + e(i) − e(j) ∈ B(r). Conversely, the class of convex functions
f (not separable in general) that satisfy the assertion of Lemma
3.11 is defined as the class of M-convex functions, which was recently introduced by
Murota [90]. There are several beautiful theorems concerning M-convex
functions that can be viewed as discrete analogues of theorems in classical
convex analysis. Since the class of M-convex functions contains a potentially
much wider class of problems than the separable convex functions, it is
interesting to ask whether there exists an efficient general algorithm for
minimizing an M-convex function over the base polyhedron of a submodular
system. Very recently, Shioura [106] proposed a polynomial time algorithm
for this problem, which will be explained in this section after preparing some
notions and properties in the next subsection.

6.1 M-convex functions

Let E = {1, 2, ..., n}. A function f: Z^E → R ∪ {∞} is said to be M-convex
if it satisfies the following property.

(M-EXC) For any x, y ∈ dom f, and for any i ∈ supp⁺(x − y),
there exists j ∈ supp⁻(x − y) such that

f(x) + f(y) ≥ f(x − e(i) + e(j)) + f(y + e(i) − e(j)), (6.1)

where

dom f = {x ∈ Z^E | f(x) < +∞},
supp⁺(x − y) = {k ∈ E | x_k > y_k},
supp⁻(x − y) = {k ∈ E | x_k < y_k}.

Notice that a separable convex function satisfies (6.1) (see the inequality
(3.21)) as was shown in Lemma 3.11. The property (M-EXC) implies that
dom f is an integral base polyhedron as shown by Murota [90], because
B ⊆ Z^E is an integral base polyhedron if and only if B satisfies the following
property (see Lemma 3.7(ii)):

(B-EXC): For any x, y ∈ B, and for any i ∈ supp⁺(x − y), there
exists j ∈ supp⁻(x − y) such that x − e(i) + e(j), y + e(i) − e(j) ∈ B.

Before describing an algorithm for minimizing an M-convex function,
we start with the following results. A vector x ∈ Z^E that minimizes an
M-convex function f is called a minimizer of f, which is assumed to exist
in the following discussion. Let the underlying base polyhedron be denoted
by B. We also assume that B is bounded.

Lemma 6.1 Given an x ∈ dom f, it holds that f(x) ≤ f(y) for all y ∈ dom f
if and only if f(x) ≤ f(x − e(i) + e(j)) for all i, j ∈ E.

PROOF. The proof is done essentially in the same manner as the proof of
Theorem 3.12. □

Lemma 6.2 Consider an M-convex function f: Z^E → R ∪ {∞}.

(i) Given an x ∈ dom f and a j ∈ E, let i ∈ E satisfy f(x − e(i) + e(j)) =
min_{h∈E} f(x − e(h) + e(j)). Then for x′ = x − e(i) + e(j), there exists a
minimizer x* of f that satisfies x*_i ≤ x′_i.
(ii) Given an x ∈ dom f and an i ∈ E, let j ∈ E satisfy f(x − e(i) +
e(j)) = min_{l∈E} f(x − e(i) + e(l)). Then for x′ = x − e(i) + e(j), there
exists a minimizer x* of f that satisfies x*_j ≥ x′_j.

PROOF. We only prove (i). Let x* be a minimizer of f such that x*_i is
minimum among all minimizers of f. Suppose x*_i > x′_i. From (M-EXC),
there exists a j′ ∈ supp⁻(x* − x′) such that f(x*) + f(x′) ≥ f(x* − e(i) +
e(j′)) + f(x′ + e(i) − e(j′)). However, from the assumptions on x* and x′, we
have f(x′ + e(i) − e(j′)) ≥ f(x′), and therefore f(x* − e(i) + e(j′)) ≤ f(x*),
implying that x* − e(i) + e(j′) is also a minimizer of f. This contradicts the
assumption on x*. □

Corollary 6.3 Let x ∈ dom f be not a minimizer of an M-convex function
f, and let i, j ∈ E satisfy

f(x − e(i) + e(j)) = min_{h,l∈E} f(x − e(h) + e(l)).

Then there exists a minimizer x* of f satisfying x*_i ≤ x_i − 1 and x*_j ≥ x_j + 1.
□

6.2 A polynomial algorithm for minimizing an M-convex function

Lemma 6.1 suggests a simple algorithm for finding a minimizer of f, in a manner
similar to the steepest-descent method for minimizing a convex function
over continuous variables. That is, starting from x ∈ B, we check whether

f(x) = min_{h,l∈E} f(x − e(h) + e(l))

holds. If so, x is a minimizer of f by Lemma 6.1. Otherwise, repeat the
above procedure after updating x := x − e(i) + e(j), where i, j ∈ E
satisfy f(x − e(i) + e(j)) = min_{h,l∈E} f(x − e(h) + e(l)). Since the value of
f(x) strictly decreases in every iteration, we can eventually find a minimizer
of f. However, this simple approach cannot guarantee convergence in
polynomial time, and we need to devise a more sophisticated method.
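
To illustrate, here is a minimal Python sketch of this steepest-descent scheme (our own illustration, not part of [106]); the callable f, assumed to return +∞ outside dom f, and the starting base x are supplied by the caller.

def naive_descent(f, x):
    # Local-exchange descent for an M-convex function f (sketch).
    # Stops when no exchange x - e(h) + e(l) improves f; the result
    # is then a minimizer by Lemma 6.1.  The iteration count is
    # bounded only by the range of f, not by a polynomial.
    n = len(x)
    def exchange(y, h, l):
        z = list(y)
        z[h] -= 1          # y - e(h)
        z[l] += 1          # ... + e(l)
        return z
    x = list(x)
    while True:
        best_val, best_h, best_l = f(tuple(x)), None, None
        for h in range(n):
            for l in range(n):
                if h != l:
                    val = f(tuple(exchange(x, h, l)))
                    if val < best_val:
                        best_val, best_h, best_l = val, h, l
        if best_h is None:
            return x       # minimizer found (Lemma 6.1)
        x = exchange(x, best_h, best_l)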
For this, we define the narrowed base polyhedron B_narrow ⊆ B as follows.
For each j ∈ E, define

l_j = min{x_j | x ∈ B},  u_j = max{x_j | x ∈ B}, (6.2)

and then

l′_j = ⌊(1 − 1/n)l_j + (1/n)u_j⌋,  u′_j = ⌈(1/n)l_j + (1 − 1/n)u_j⌉. (6.3)

The B_narrow is defined as

B_narrow = {x ∈ B | l′_j ≤ x_j ≤ u′_j, j ∈ E}.

In other words, B_narrow is the base polyhedron obtained from B by performing
the restriction by u′ = (u′_1, ..., u′_n) and the contraction by l′ =
(l′_1, ..., l′_n) to B.

Theorem 6.4 For a base polyhedron B, the narrowed base polyhedron B_narrow
satisfies B_narrow ≠ ∅.

PROOF. Note that l_j = r(E) − r(E − {j}) and u_j = r({j}) hold for all j ∈ E
(see [37]). It suffices to show the following (see [37]):

(i) l′(X) ≤ r(X) for all X ⊆ E,
(ii) u′(X) ≥ r(E) − r(E − X) for all X ⊆ E,

where l′(X) = ∑_{j∈X} l′_j and u′(X) = ∑_{j∈X} u′_j. Since (ii) can be proved
similarly, we only prove (i). Let |X| = k. We claim

k·r(X) + k ∑_{j∈X} [r(E − {j}) − r(E)] ≥ ∑_{j∈X} [r({j}) + r(E − {j}) − r(E)]. (6.4)

In fact,

(the left-hand side of (6.4))
= k·r(X) + ∑_{j∈X} ∑_{h∈X−{j}} [r(E − {h}) − r(E)] + ∑_{j∈X} [r(E − {j}) − r(E)]
≥ k·r(X) + ∑_{j∈X} [r(E − (X − {j})) − r(E)] + ∑_{j∈X} [r(E − {j}) − r(E)]
≥ (the right-hand side of (6.4)),

where the inequalities follow from the submodularity of r. Since the right-hand
side of (6.4) is nonnegative, the k in (6.4) can be replaced by n (≥ k).
Thus, we conclude

r(X) ≥ (1 − 1/n) ∑_{j∈X} [r(E) − r(E − {j})] + (1/n) ∑_{j∈X} r({j})
     = (1 − 1/n) ∑_{j∈X} l_j + (1/n) ∑_{j∈X} u_j ≥ l′(X). □

The following algorithm for finding a minimizer x* of f iteratively reduces
dom f, until dom f contains a single base.

Procedure REDUCTION
Input: M-convex function f.
Output: A minimizer x* of f.

Let B := dom f, and compute vectors l and u by (6.2);
while u_j − l_j ≥ 1 for some j do
begin
  Compute vectors l′ and u′ of (6.3) to define B_narrow;
  Find a vector x ∈ B_narrow;
  Choose i, j ∈ E attaining f(x − e(i) + e(j)) = min_{h,l∈E} f(x − e(h) + e(l));
  if f(x) = f(x − e(i) + e(j)) then output x* := x and halt
  else let u_i := x_i − 1, l_j := x_j + 1, and B := B ∩ {y ∈ Z^E | y_i ≤ u_i, y_j ≥ l_j};
end.

The correctness immediately follows from Corollary 6.3 and Theorem
6.4. Let us analyze the running time.

Lemma 6.5 Let B^k, l^k and u^k denote the set B and the vectors l and u at the
beginning of the k-th iteration of the while-loop of REDUCTION, respectively.
Then u_h^{k+1} − l_h^{k+1} ≤ (1 − 1/n)(u_h^k − l_h^k) holds for h = i or h = j, where
i, j ∈ E are the elements found in the k-th execution of the while-loop.

PROOF. We only consider the case of h = i (the case of h = j is similar),
and let x ∈ B^k_narrow be the vector chosen in the while-loop. Then,

u_i^{k+1} − l_i^{k+1} ≤ x_i − 1 − l_i^k
                    ≤ ⌈(1/n)l_i^k + (1 − 1/n)u_i^k⌉ − 1 − l_i^k
                    ≤ (1 − 1/n)(u_i^k − l_i^k). □

Lemma 6.6 Procedure REDUCTION halts in O(n² log L) iterations, where

L = max_{j∈E} (u_j^1 − l_j^1). (6.5)

PROOF. Since u_j^k − l_j^k is a nonnegative integer for every j ∈ E, the procedure
halts if u_j^k − l_j^k < 1 holds for all j ∈ E. Let k be the minimum integer such
that (1 − 1/n)^k (u_j^1 − l_j^1) < 1. Then,

k ≤ −ln(u_j^1 − l_j^1)/ln(1 − 1/n) + 1
  ≤ n ln(u_j^1 − l_j^1) + 1 ≤ n ln L + 1

by the inequality ln z ≤ z − 1 for z > 0. Thus, the lemma follows. □

The most time-consuming part of the above procedure is the computation
of x ∈ B_narrow. We assume that a subroutine to test whether a vector x
belongs to the submodular polyhedron P or not is available. Notice that this
test can be done in polynomial time in general, by Lemma 3.10. So, the l and
u satisfying (6.2) can be computed by applying the subroutine O(n log L)
times, because l_j and u_j for each j can be computed by O(log L) applications
of the subroutine using binary search. The computation of x ∈ B_narrow
is done, starting from x = l, by iteratively updating x as x + α·e(j) for
each j, where α = max{d | x + d·e(j) ∈ B}. (See [37] for the justification
of this procedure.) Therefore, x ∈ B_narrow can be computed by applying
the subroutine O(n log L) times. Hence we have the following theorem from
Lemma 6.6.

Theorem 6.7 A minimizer of an M-convex function f with n variables
can be computed in polynomial time. It is required to check whether a given
vector x belongs to B or not O(n³ log² L) times, where L is given by (6.5). □
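
As an illustration of the coordinate-raising step described above, the following Python sketch raises each coordinate by the largest feasible amount; the membership oracle in_P and the bound vector hi are our assumed interface.

def find_base(in_P, x, hi):
    # Greedy sketch: starting from a vector x inside the (narrowed)
    # submodular polyhedron, raise each coordinate by the largest
    # feasible step, found by binary search with the membership
    # oracle in_P (O(log L) oracle calls per coordinate).
    x = list(x)
    for j in range(len(x)):
        lo, up = 0, hi[j] - x[j]
        while lo < up:
            mid = (lo + up + 1) // 2
            y = list(x)
            y[j] += mid
            if in_P(y):
                lo = mid
            else:
                up = mid - 1
        x[j] += lo
    return x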

7 Applications

As mentioned in the Introduction, the resource allocation problem has a wide
range of applications. In addition to standard ones (see e.g., [58]), we shall
introduce here several new, interesting applications.

7.1 Computer science applications

(1) Optimal batching policies for a video-on-demand storage server (Aggarwal, Wolf and Yu [1]).

In a video-on-demand environment, batching of video requests is often
used to reduce I/O demand and improve throughput. The basic idea of
batching is to delay the requests for some videos for a certain amount of
time, so that more requests for the same video arriving during the current
batching interval may be serviced using the same stream. Thus, when a
stream becomes available at the server end, a question arises as to which
video is best to schedule at that particular moment.

There may be several scheduling policies for implementing batching.
Since viewers may defect if they experience long waits, a good video
scheduling policy needs to consider not only the batch size but also the
viewer defection probabilities and wait times. Two conventional scheduling
policies for batching are the first-come-first-served (FCFS) policy, which
always schedules the video with the longest waiting request, and the maximum
queue length (MQL) policy, which selects the video with the maximum number
of waiting requests. Neither of these policies leads to entirely satisfactory
results. MQL tends to be too aggressive in scheduling popular videos, while
FCFS has the opposite effect by completely ignoring the queue length.

Let us formulate the problem of determining a best batching policy as an
optimization problem after introducing the following performance measure.
Suppose that the server contains a database of n different videos from which
the clients may make requests.

• Average latency time: The latency of a viewer is the period that elapses
between the arrival of the video request and the time when the service
to the display device is actually initiated.

Assume that the frequency of requests for the j-th video is known, and is
denoted by f_j. If L_j denotes the length of video j, then L = (∑_{j=1}^n f_j ·
L_j)/(∑_{j=1}^n f_j) is the average video length. Assume that the server capacity
(in terms of the number of streams) is S. Consequently, the average number
of streams which are scheduled by the server every minute at full capacity is
S/L. Suppose that t_1, t_2, ..., t_n are the average time intervals at which the
videos 1, 2, ..., n are batched. Then the average latency for a video of type
j is equal to t_j/2. Our objective in this model is to minimize the average
latency of the viewers. Thus, modulo a constant that we can ignore, we
want to minimize the sum of the expected latency times of all the requests
which arrive in a unit interval of time: ∑_{j=1}^n f_j · t_j/2.
We need a constraint that limits the number of video streams which can
be scheduled at any moment of time. On average, the number of streams for
video j scheduled per unit of time is approximately equal to 1/t_j. (This is not
entirely rigorous because E[1/X] ≠ 1/E[X] holds for a random variable X,
in general. However, if X has a relatively small deviation, the approximation
E[1/X] ≈ 1/E[X] is fairly accurate.) Thus, the total number of streams of
all video types scheduled per unit of time at full capacity is ∑_{j=1}^n 1/t_j, which
must be equal to S/L. Thus, we have the following constrained nonlinear
program:

minimize ∑_{j=1}^n f_j · t_j,
subject to ∑_{j=1}^n 1/t_j = S/L,
t_j ≥ 0, j = 1, ..., n.

This is a typical problem of SC/Simple/C. There exists a unique optimal
solution, which satisfies

t_j · √f_j = t_k · √f_k for all j, k.

In the absence of any defections from the system, the average queue length
q_j of the j-th video at the time of batching is equal to q_j = t_j · f_j. Thus,
from the above equations, we have the following relationship for the queue
lengths at the time of batching:

q_1/√f_1 = q_2/√f_2 = ... = q_n/√f_n.
In other words, we have the following intuitive result:

Theorem 7.1 In order to achieve the best average latency time at the above
video-on-demand server, the queue lengths q_j at the time of batching should
be approximately proportional to √f_j. □

Theorem 7.1 suggests the following greedy scheduling policy, called the maximum
factored queue length policy, or MFQ policy, for the video-on-demand
problem.

Whenever the server capacity becomes available for scheduling a
stream, choose the video with the largest q_j/√f_j.
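
A minimal Python sketch of the MFQ selection rule follows; the function name and calling convention are ours.

import math

def mfq_pick(queue_lengths, frequencies):
    # MFQ rule: return the index j maximizing the factored queue
    # length q_j / sqrt(f_j).  All frequencies are assumed positive.
    return max(range(len(queue_lengths)),
               key=lambda j: queue_lengths[j] / math.sqrt(frequencies[j]))

# Example: with q = (4, 9, 1) and f = (1, 9, 4), the factored
# lengths are 4, 3 and 0.5, so video 0 is scheduled next.
print(mfq_pick([4, 9, 1], [1, 9, 4]))   # -> 0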

Aggarwal, Wolf and Yu [1] performed a simulation study to establish the


effectiveness of the proposed MFQ policy against existing MQL and FCFS
policies from several points of view.

(2) Optimal buffer partition of the nested block join algorithm (Wolf, Iyer,
Pattipati and Yu [117]).

Nested loop is a well-known method for query processing in a relational
database, when queries require complex join predicates. There are
two alternatives for efficient execution of join operations: sort merge join
and nested loop join. It is said that nested loop join is preferable when only
a small amount of main memory is available. Nested loop join is executed
as follows: for every row of the outer relation, we examine the inner relation
in order to seek a match. We loop over the inner relation once for every
row of the outer relation. This generalizes to the case of more than two
relations. If no relation fits into main memory, we have to fetch each row of
the inner relation from disk for each row of the outer relation. This requires
too many I/O accesses.

In order to reduce the number of I/O accesses, nested block join has
been proposed by Kim [67]. Suppose that we are employing a block (which
is usually called a page) as the unit of I/O access. Each relation is partitioned
into consecutive blocks of rows of the same size so that the sum of the block sizes
of two relations does not exceed the memory size. One block of each relation
is brought into memory. A naive loop join is performed among the rows
contained within the blocks in memory. Suppose that we have R relations
and the total buffer size in memory is B. Let p_i and x_i denote the size
of relation i and its block size, respectively; i.e., relation i is divided into
⌈p_i/x_i⌉ blocks. Let us assume that relation 1 is outermost, and so on. Since
relation 1 is fetched into memory once, the number of I/Os for relation 1 is
p_1. For each block of relation 1, we need to fetch relation 2 into memory
once. Thus, the number of I/Os for relation 2 is p_2⌈p_1/x_1⌉. Similarly, the
number of I/Os for relation 3 is p_3⌈p_2/x_2⌉⌈p_1/x_1⌉, and so forth. Thus,
the total number of I/Os is given by

∑_{i=1}^R p_i ∏_{j=1}^{i−1} ⌈p_j/x_j⌉. (7.1)

We want to minimize this objective function under the buffer size constraint B. It
is easy to see from the form of the objective function (7.1) that we can set
x_R = 1. Thus, the problem is formulated as follows:

Q: minimize ∑_{i=1}^R p_i ∏_{j=1}^{i−1} ⌈p_j/x_j⌉ (7.2)
subject to ∑_{i=1}^{R−1} x_i = B − 1, (7.3)
x_i: positive integer, i = 1, ..., R − 1. (7.4)

We assume here for simplicity that the ordering of relations is fixed. Since
the objective function of Q is not separable, we need a general algorithm
such as a dynamic programming algorithm with an O(RB²) time bound [58],
which can be improved to O(RB^{1.5}) by using a clever analysis based on
number theory [117]. Here, however, we consider the following approximate
problem:

Q′: minimize ∑_{i=1}^R p_i ∏_{j=1}^{i−1} (p_j/x_j) (7.5)
subject to ∑_{i=1}^{R−1} x_i = B − 1, (7.6)
x_i: positive integer, i = 1, ..., R − 1. (7.7)
Although the objective function is still nonseparable, surprisingly, the incremental
algorithm of Section 3.1 works for this problem, as will be seen below.
Thus this is another class of problems for which an incremental algorithm
works.

Now we shall describe a class of problems, including problem Q′, for
which the incremental algorithm works. Let a_1, ..., a_n and b_1, ..., b_n be
integers satisfying a_i ≤ b_i for all i. Let F(x_1, ..., x_n) be a function defined
on {a_1, a_1 + 1, ..., b_1} × {a_2, a_2 + 1, ..., b_2} × ... × {a_n, a_n + 1, ..., b_n}, and
let K be an integer satisfying ∑_{i=1}^n a_i ≤ K ≤ ∑_{i=1}^n b_i. Consider the problem
R_K of minimizing

F(x_1, ..., x_n) (7.8)
subject to ∑_{i=1}^n x_i = K, (7.9)
x_i ∈ {a_i, a_i + 1, ..., b_i}, i = 1, ..., n. (7.10)

An incremental algorithm for problem R_K starts from (x_1, ..., x_n) =
(a_1, ..., a_n), and, in each iteration, increments the variable x_{i*} by one, where
i* is chosen so that the cost increase incurred by the increment of x_{i*} is
minimum among all variables x_i with x_i < b_i. This process is repeated until
∑_{i=1}^n x_i = K is satisfied.
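
To make the scheme concrete, here is a Python sketch of the incremental algorithm for R_K; treating F as a black-box callable on integer tuples is our assumption, and exactness is guaranteed only for objectives covered by the theory below (Theorems 7.2 and 7.3).

import math

def incremental_minimize(F, a, b, K):
    # Incremental algorithm for R_K: minimize F(x_1,...,x_n)
    # subject to sum x_i = K and a_i <= x_i <= b_i.
    x = list(a)
    while sum(x) < K:
        best_i, best_val = None, math.inf
        for i in range(len(x)):
            if x[i] < b[i]:
                x[i] += 1               # tentative increment of x_i
                val = F(tuple(x))
                x[i] -= 1
                if val < best_val:
                    best_i, best_val = i, val
        x[best_i] += 1                  # commit the cheapest increment
    return x

For problem Q′ one would take a_i = 1 and b_i = B − 1 for i = 1, ..., R − 1 and let F be the objective (7.5); the greedy increments then stop after B − R steps.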
Definition: Given a function F(x_1, ..., x_n) with the domain as described
above, define, for each K with ∑_{i=1}^n a_i ≤ K ≤ ∑_{i=1}^n b_i, F̂(K)
to be equal to the optimal objective value of problem R_K. Thus, F̂(K) is a
single-variable function, called the surrogate function of F.

The following theorem can be viewed as describing how to glue together
functions which can be minimized by incremental algorithms into a new
function which can again be minimized by an incremental algorithm.

Theorem 7.2 (i) For each i = 1, ..., M, let F_i(x_{i,1}, ..., x_{i,n_i}) be a function
to which an incremental algorithm can be applied, and suppose that
each of the resulting surrogate functions F̂_i is convex. Then an incremental
algorithm can also be applied to

F_Σ({x_{ij}}) = ∑_{i=1}^M F_i(x_{i,1}, ..., x_{i,n_i}),

and the surrogate function F̂_Σ of F_Σ is also convex.

(ii) For each i = 1, ..., M, let F_i(x_{i,1}, ..., x_{i,n_i}) be a positive-valued function
to which an incremental algorithm can be applied, and suppose that
each of the resulting surrogate functions F̂_i has a convex logarithm, i.e.,
log F̂_i(K) is convex in K. Then an incremental algorithm can be applied to

F_Π({x_{ij}}) = ∏_{i=1}^M F_i(x_{i,1}, ..., x_{i,n_i}),

and the surrogate function F̂_Π also has a convex logarithm.

PROOF. (i) The fact that an incremental algorithm can be applied is evident
by applying Theorem 3.2 to the objective function, which can be rewritten
in the convex, separable form ∑_{i=1}^M F̂_i(y_i), where y_i = ∑_{j=1}^{n_i} x_{ij}. To show the
convexity of F̂_Σ, let us look at the values of the surrogate function F̂_Σ at
three consecutive integers, say K, K + 1 and K + 2. Suppose

F̂_Σ(K) = ∑_{i=1}^M F̂_i(y_i)

for some choice (y_1, ..., y_M) whose sum satisfies ∑_{i=1}^M y_i = K. Since an
incremental algorithm can be applied to the right-hand side, there exists an
index i_1 ∈ {1, ..., M} for which F̂_Σ(K + 1) = F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 1) +
... + F̂_M(y_M). Similarly, there will exist an index i_2 ∈ {1, ..., M} which
corresponds to the value of F̂_Σ(K + 2) when incremented by one more from
(y_1, ..., y_{i_1} + 1, ..., y_M). There are two cases: either i_1 = i_2 or i_1 ≠ i_2. In
the first case, we get

F̂_Σ(K + 2) − F̂_Σ(K + 1)
= [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 2) + ... + F̂_M(y_M)] −
  [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 1) + ... + F̂_M(y_M)]
= F̂_{i_1}(y_{i_1} + 2) − F̂_{i_1}(y_{i_1} + 1)
≥ F̂_{i_1}(y_{i_1} + 1) − F̂_{i_1}(y_{i_1})
= [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 1) + ... + F̂_M(y_M)] −
  [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1}) + ... + F̂_M(y_M)]
= F̂_Σ(K + 1) − F̂_Σ(K). (7.11)

Here, the inequality at the center is due to the convexity of F̂_{i_1}. In the
second case we get, analogously,

F̂_Σ(K + 2) − F̂_Σ(K + 1)
= [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 1) + ... + F̂_{i_2}(y_{i_2} + 1) + ... + F̂_M(y_M)] −
  [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 1) + ... + F̂_{i_2}(y_{i_2}) + ... + F̂_M(y_M)]
= F̂_{i_2}(y_{i_2} + 1) − F̂_{i_2}(y_{i_2})
≥ F̂_{i_1}(y_{i_1} + 1) − F̂_{i_1}(y_{i_1})
= [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1} + 1) + ... + F̂_M(y_M)] −
  [F̂_1(y_1) + ... + F̂_{i_1}(y_{i_1}) + ... + F̂_M(y_M)]
= F̂_Σ(K + 1) − F̂_Σ(K). (7.12)

Here, the inequality at the center is due to the rule of the incremental
algorithm. The results (7.11) and (7.12) tell us that F̂_Σ(K) is convex in K.

(ii) Note that minimizing a function amounts to minimizing the logarithm
of the function. Since

log ∏_{i=1}^M F̂_i(y_i) = ∑_{i=1}^M log F̂_i(y_i), (7.13)

and each log F̂_i(y_i) is convex by assumption, part (ii) follows from part (i). □

We will say that a function with a convex logarithm is logarithmic convex.
A positive logarithmic convex function over the integers can be shown to be
convex. This follows easily from the fact that the arithmetic mean of two
numbers exceeds the geometric mean. But the converse is not true, so
the convexity condition in part (ii) of Theorem 7.2 is slightly stronger than
that of part (i). However, functions of the form p/K, where K is a positive
integer, are logarithmic convex. Furthermore, adding a positive constant
to a logarithmic convex function yields a new logarithmic convex function.
Now we can state the following theorem.

Theorem 7.3 The incremental algorithm of Section 3.1 correctly solves problem
Q′ of (7.5).

PROOF. Rewrite the objective function of problem Q′ of (7.5) as

p_1 + (p_1/x_1){p_2 + (p_2/x_2){p_3 + (p_3/x_3){p_4 + ... + (p_{R−2}/x_{R−2}){p_{R−1} + (p_{R−1}/x_{R−1})·p_R}...}}}. (7.14)

Notice that a function f(x) = a/x + b for constants a > 0 and b ≥ 0 is
logarithmic convex, and that the product of two such functions is also
logarithmic convex. Thus, applying this argument from the innermost terms
of (7.14) out to the outermost, it follows that the function of (7.14) is
logarithmic convex. Thus, from Theorem 7.2(ii), the theorem follows. □

The computational complexity of this procedure is O(RB), since employing
Theorem 7.2(ii) at one stage takes O(R) time, and there are O(B) such
stages.

Notice that the polynomial algorithm of Section 3.1.2 for SC/Simple/D
cannot be applied to problem Q′, and it is not known yet whether Q′ can be
solved in polynomial time or not. However, to our knowledge, Q′ is the first
nonseparable convex resource allocation problem that has been shown to be
solvable by an incremental algorithm.

7.2 Reliability applications

(1) Optimal allocation for software-testing resources (Ohtera and Yamada


[96]).

During the module testing stage of the software development process,
the manager has to determine an optimal allocation of resources such as
manpower, CPU hours, and test cases to be executed, so that the
software quality and reliability are maximized. This problem is modeled as
a resource allocation problem on the basis of the software reliability growth
model developed by Kubat and Koch [77], which describes the relationship
between the applied resource and the detected software errors. In the model
of [77], the average cumulative number of software errors m(t) detected
in the time interval (0, t] is expressed in terms of W(t), the cumulative
resource expenditure used until time t; i.e.,

m(t) = a(1 − exp[−rW(t)]), (7.15)

where a denotes the mean number of initial errors, and r (> 0) denotes the
error detection rate per unit resource expenditure.
Based on this model, we can define a resource allocation problem to
minimize software errors under the following assumptions:
1. The software system is composed of M independent modules. The num-
ber of software errors remaining in each module can be estimated by
the model of [77].
2. The manager has to allocate the total amount of testing resource R among
the software modules so as to minimize the number of software errors
remaining in the system.
From (7.15), the number of software errors remaining in module j is
estimated as

z_j = a_j · exp[−r_j q_j], j = 1, 2, ..., M, (7.16)

where a_j and r_j denote the average initial number of errors in module
j and the error detection rate per unit resource expenditure for module j,
respectively.

Thus, the testing-resource allocation problem is formulated as:

minimize ∑_{j=1}^M v_j a_j · exp[−r_j q_j] (7.17)
subject to ∑_{j=1}^M q_j = R, (7.18)
q_j ≥ 0, j = 1, 2, ..., M, (7.19)

where v_j stands for the relative importance of module j. This problem is a
special case of SC/Simple/C.
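
Since the objective (7.17) is separable convex and the constraint (7.18) is a single equality, the Kuhn-Tucker conditions give q_j = max{0, (1/r_j) ln(v_j a_j r_j/λ)} for a multiplier λ chosen so that the q_j sum to R. The following Python sketch, with hypothetical names, finds λ by bisection.

import math

def allocate_testing_resource(v, a, r, R, iters=100):
    # Solve (7.17)-(7.19) via the KKT conditions: for a multiplier
    # lam, set q_j = max(0, ln(v_j a_j r_j / lam) / r_j) and bisect
    # on lam until the q_j sum to R (the sum is decreasing in lam).
    M = len(v)
    def q(lam):
        return [max(0.0, math.log(v[j] * a[j] * r[j] / lam) / r[j])
                for j in range(M)]
    lo, hi = 1e-12, max(v[j] * a[j] * r[j] for j in range(M))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(q(mid)) > R:
            lo = mid
        else:
            hi = mid
    return q((lo + hi) / 2)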

7.3 Production planning applications

To illustrate the rich applications in production planning mentioned in [58],
we describe a new application below.
(1) Service-constrained optimal stocking policy for spare parts (Cohen,
Kleindorfer and Lee [22]).

The problem discussed here is to determine the optimal stocking policy
for a facility that stocks various spare parts for repairs, so as to satisfy an overall
service level requirement. Each product consists of modules or field replaceable
units (FRUs for short). The problem here is to determine the number of
each FRU to be kept so as to minimize the expected holding and ordering
costs for the facility while meeting the specified service levels for products
and customers. Suppose that a product is made up of n FRUs. We denote
the FRUs by subscripts i ∈ N = {1, 2, ..., n}. Demands for parts in
any given period are represented by nonnegative integer-valued independent
random variables {D_i | i ∈ N} whose probability density function and cumulative
distribution function are denoted by f_i and F_i, respectively. We want
to determine the stocks at the beginning of each period S = {S_i | i ∈ N} for
the n parts, where the S_i are nonnegative integers. Stock replenishment then
occurs instantaneously (i.e., with zero lead time) at the end of each period to
restore the stock levels depleted through the period demands.

The objective is to minimize the expected total cost per period:

minimize G(S) = ∑_{i∈N} G_i(S_i), (7.20)

where G_i(S_i) is the expected cost of part i ∈ N per period, defined by

G_i(S_i) = E{ K_i δ(D_i)                       (ordering cost)
            + (c_{ih}/2)[S_i + (S_i − D_i)^+]  (holding cost)      (7.21)
            + c_{it} min[S_i, D_i]             (transportation cost)
            + c_{is} (D_i − S_i)^+ }.          (shortage cost)

Here x^+ = max(x, 0) and δ(x) = 0 (respectively, 1) if x ≤ 0 (respectively,
x > 0). In (7.21), K_i is the fixed order cost, which will be neglected since it
is independent of the stocking policy S_i. The holding cost is the average of the
initial inventory S_i and the final inventory (S_i − D_i)^+ times the holding cost
per unit c_{ih}. The transportation cost per unit c_{it} denotes the cost of issuing a
part, if available, and transporting it to the customer. Finally, the shortage
cost c_{is} is associated with the incremental cost of meeting the demand from
a remote location. We assume that

c_{is} ≥ c_{it} > 0, and c_{ih} > 0 (7.22)

hold for all i. We wish to minimize (7.20) subject to the following service
constraint:

Pr{∑_{i∈N} (D_i − S_i)^+ > 0} ≤ β · Pr{∑_{i∈N} D_i > 0}. (7.23)

The importance of (7.23) is that, in the long run, excess demand occurs
in at most a fraction β of the periods in which demand is nonzero.
This type of constraint is often used in this field. Now note that the event
{∑_{i∈N}(D_i − S_i)^+ > 0} is equivalent to the event {∪_{i∈N}[D_i > S_i]}. Moreover,
Pr{∑_{i∈N} D_i > 0} = 1 − ∏_{i∈N} F_i(0). Thus, (7.23) can be expressed
as

1 − ∏_{i∈N} F_i(S_i) ≤ β[1 − ∏_{i∈N} F_i(0)]. (7.24)

Assuming that α = 1 − β[1 − ∏_{i∈N} F_i(0)] is nonzero, we obtain the
following equivalent, separable constraint:

∑_{i∈N} log(F_i(S_i)) ≥ log α. (7.25)


Therefore, the above problem has a separable objective function and a
separable nonlinear constraint. Since the derivative of Gi(Si) is shown to be

it follows that Gi(Sd is convex because Cis - Cit + Cih/2 > 0 from (7.22)
and Fi (Si) in non decreasing in Si.
Efficient solution algorithm similar to the incremental algorithm has been
developed by [22] although the constraint is nonlinear.
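
For illustration only (this is not the algorithm of [22]), a greedy sketch in this spirit could increment, at each step, the part with the best marginal gain in the constraint (7.25) per unit of marginal cost; all names below are ours, and F_i(s) > 0 is assumed for every stock level considered.

import math

def stock_greedy(G, F, parts, alpha):
    # Start from S_i = 0 and raise stocks until (7.25) holds:
    # sum_i log F_i(S_i) >= log(alpha).
    S = {i: 0 for i in parts}
    def slack():
        return sum(math.log(F[i](S[i])) for i in parts)
    while slack() < math.log(alpha):
        def score(i):
            gain = math.log(F[i](S[i] + 1)) - math.log(F[i](S[i]))
            cost = G[i](S[i] + 1) - G[i](S[i])
            return gain / max(cost, 1e-12)
        S[max(parts, key=score)] += 1
    return S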

7.4 Other applications

(1) An optimal scheduling of mass screening tests (Lee and Pierskalla
[79]).

The policy maker is faced with designing mass screening programs
for contagious diseases, taking into account the tradeoff between the frequency
of test applications and the types of tests used (as different technologies
may have different reliability characteristics and costs), against the
benefits to be achieved from detecting the defect at an earlier stage of
development. Moreover, because different subpopulations may have different
susceptibility to the disease, the problem of optimally allocating a fixed
testing budget among subpopulations must be considered. This problem
will be formulated as SC/Simple/C under the assumptions stated below:

1. The latent period of the disease is negligible.

2. The infectious period of the disease is exponentially distributed. Note


that the termination of the infectious period can be a result of natural
deaths.

3. The size of the population is large enough so that the number of suscep-
tibles is relatively constant.

4. The incidence rates and the rates of transmission of the disease are
stationary over time, and are independent of each other.

5. The probability of an increase in the number of infected units at time t


is directly proportional to the number of units in the infectious period
of the disease at time t.

6. Once an infected unit is identified as having the disease after a mass


screening test, it will be removed from the population and subsequently
treated. As a result, it is assumed to be no longer infectious.

The assumption that the infectious periods are exponentially distributed


is common in most of the related models. Assumption 5 is also generally
used in epidemics. The following notation is used.

λ: natural incidence rate of the disease for a unit,

γ: rate of transmission of the disease from a contagious unit to a susceptible,

μ: rate of infected units ending the infectious period.

In the time interval [t, t′] between two consecutive mass screening
tests, the number of contagious units is considered to be a stochastic process
described by

X(t + Δt) = X(t) + λΔt + γX(t)Δt − μX(t)Δt.

From this equation and the above assumptions, we can obtain a closed-form
expression for the average number of infected units over [t, t′]. Now suppose
we want to perform M mass screening tests over the planning time horizon
[0, T]. Then [79] has shown that the best scheduling policy for M tests is
to perform the tests at equal interval lengths. It has then been shown
that the average number of infected units in the population over a planning
horizon of T time units can be written as

[λ/(γ − μ)²](M/T){exp[(γ − μ)(T/M)] − 1} − [λ/(γ − μ)], (7.26)

where M is the number of mass screening tests performed over the planning
time horizon [0, T]. As is easily verified, given T, (7.26) is decreasing in M.
Now, suppose that each mass screening costs G, and B is the amount that
the society can afford or is willing to pay per unit time for mass screening.
Then, given a budget, the society would try to increase M as much as
possible. For argument's sake, suppose M is increased so that M/T = B/G
holds. The average number of infected units in the population under the
optimal screening schedule becomes:

ψ(B) = [λ/(γ − μ)²](B/G){exp[(γ − μ)(G/B)] − 1} − [λ/(γ − μ)]. (7.27)

As T → ∞, (7.27) can be viewed as the long-range average number of
infected units in the population under the optimal mass screening schedule.
Notice that ψ(B) is convex in B.
Suppose that there are J subpopulations, where there is little or no interaction
among the subpopulations as far as the transmission of the disease
is concerned. Thus, each subpopulation may be viewed as an independent
population in terms of the etiology of the disease under screening. Let D_j(·)
be the respective utility function for subpopulation j. Suppose B_j and G_j
are the amount per unit time allotted to mass screening and the cost of
each mass screening for subpopulation j, respectively. Let B be the amount
per unit time available to all the subpopulations. Thus, the problem facing
the policy maker is to determine the B_j's by solving the following resource
allocation problem of type SC/Simple/C, under the assumption that D_j(y)
is concave in y:

maximize ∑_{j=1}^J D_j([λ_j/(γ_j − μ_j)²](B_j/G_j)·[exp{(γ_j − μ_j)(G_j/B_j)} − 1] − λ_j/(γ_j − μ_j)) (7.28)
subject to ∑_{j=1}^J B_j = B, (7.29)
B_j ≥ 0, j = 1, ..., J.

7.5 Notes and references


There are many other applications of the resource allocation problem. Among
them, applications to computer science are found in [110, 111, 112, 113, 114,
118, 122].
Baldick and Wu [8] considered the problem of optimally designing the
coordination of capacitors and regulators, to which they applied the algorithm
of Hochbaum and Shanthikumar [55].
Eu [30] considered an optimal allocation of sample sizes in a real-time
monitoring system to detect errors, by sampling, of digital signals in telecom-
munication systems so as to minimize the sum of the so-called bit error ratios
of multiple signals. He formulated the problem as SC/LUB/C.
Li and Haimes [80] considered an optimization problem that arises in
designing a high-reliability large-scale system that is also viewed as a variant
of the resource allocation problem.
Shanthikumar and Yao [104] considered the optimal control problem for
a class of multi-class queueing systems. They found a certain submodular
structure of performance vectors that allows us to efficiently compute an
optimal performance vector.
Shanthikumar and Yao [103] studied the resource allocation problem in
queueing networks, which models certain types of manufacturing systems
where each node in the network represents a machine or a workstation. The
problem discussed in [103] seeks to find an optimal allocation of servers to
the stations in order to maximize the throughput, under the constraint that
the total number of servers to be allocated is fixed.
Bitran and Tirupati [12, 13] studied a manufacturing capacity planning
problem in manufacturing systems that are also regarded as queueing net-
works (see also Buzacott and Shanthikumar [21]). The problem involves the

minimum cost selection of the service rate (or capacity) at each workstation
subject to an upper limit on the total dollar value of work in process in the
system. Under certain assumptions, they have formulated the problem as
the separable convex resource allocation problem under a separable convex
constraint. They presented efficient heuristic algorithms for the continuous
variable case [12] and for the discrete variable case [13].
We often encounter resource allocation problems with nonlinear con-
straints in production planning under uncertainty. One such application is
found in [119]. We shall discuss this topic again in Section 8.3.

8 Further Topics
In this section, we deal with some extensions and generalizations of resource
allocation problems.
The first topic is on the multiple resource allocation problem in which
each activity requires more than one resource. The modeling capability is
substantially enhanced by this generalization. An application of this prob-
lem is typically found in production planning which seeks to find an optimal
allocation of many types of resources such as raw materials, machines and
manpower among competing production lines so as to meet the demands.
This problem has been well studied [70, 71, 72, 73, 74, 82, 83, 84, 85, 87,109].
For most of multiple resource allocation problems discussed in the literature,
the performance of each activity (the production level of a certain prod-
uct in the context of production planning) is measured by the deviation of
the planned activity level (the amount of products to be produced) from
a given demand. Minimax objective is usually adopted because minimiz-
ing the maximum deviation among all activities balances deviations over
all activities, and is thus intuitively appealing to practitioners engaged in
production planning. Also, decision variables can be usually regarded as
continuous. Efficient algorithms for such problems have been developed.
Among them we shall review two algorithms proposed by Luss [82, 83] in
the next subsection.
The second topic is on the multiperiod resource allocation problem. A
typical application is also found in production planning. For each activity,
a cumulative target level is prespecified. Each decision variable represents
the cumulative level selected for an activity up to some period. Similarly
to the multiple resource allocation problem, this type of problem is usually
formulated as the one with minimax objective and continuous variables.

This will be discussed in Section 8.2.


The third topic deals with the resource allocation problem under a single
nonlinear constraint. We have mentioned an application of this problem
in Section 7.3, and, as pointed out [14], this type of problem has many
applications in various fields. This topic will be dealt with in Section 8.3.
Finally, we shall discuss in Section 8.4 miscellaneous problems related
to resource allocation, including (1) multiple resource allocation problem
with smoothing objective, and (2) minimum variance resource allocation
problem.

8.1 Multiple resource allocation

As remarked above, the multiple resource allocation problem usually has
a minimax objective function and continuous variables. We first give a
problem formulation, and then briefly explain two algorithms. We use the
following notations:

i: index for resources, i = 1, 2, ..., m,
j: index for activities, j = 1, 2, ..., n,
a_ij: amount of resource i used by one unit of activity j; a_ij ≥ 0,
b_i: amount of available resource i; b_i > 0,
x_j: activity level to be chosen for j; x_j ≥ 0,
f_j(x_j): cost function associated with activity j.

The problem is formulated as follows.

MRP: minimize max_{1≤j≤n} f_j(x_j) (8.1)
subject to ∑_{j=1}^n a_ij x_j ≤ b_i, i = 1, 2, ..., m, (8.2)
x_j ≥ 0, j = 1, 2, ..., n. (8.3)

We assume that f_j(x_j) is strictly decreasing and continuous, and denote
its inverse function by f_j^{-1}(·); i.e., f_j(x_j) = y implies f_j^{-1}(y) = x_j. Note
that f_j^{-1}(y) is also strictly decreasing and continuous. Let the indices j be
reordered so that

f_1(0) ≤ f_2(0) ≤ ... ≤ f_n(0)

holds. In the following, we denote the optimal objective value by V*.
Theorem 8.1 There exists an optimal solution x* of MRP with the objective
value V*, such that there exists a set J* = {j*, j* + 1, ..., n} satisfying

x*_j > 0, j ∈ J*, (8.4)
x*_j = 0, j ∉ J*, (8.5)

and

f_j(x*_j) = V*, j ∈ J*, (8.6)
f_j(0) ≤ V*, j ∉ J*. (8.7)

PROOF. Suppose there is an optimal solution x* of problem MRP with
x*_{j_1} = 0 and x*_j > 0 for some j < j_1. Since f_j(0) ≤ f_{j_1}(0) ≤ V* for all j ≤ j_1
and all a_ij are nonnegative, reducing x*_j to zero for all j < j_1 preserves
feasibility and optimality; so (8.4) and (8.5) hold for some j*. Now, suppose
f_j(x*_j) < V* for some j ∈ J*. Reducing each such x*_j until f_j(x*_j) = V*
or until x*_j reaches zero, whichever occurs first, preserves feasibility and
optimality; so (8.6) holds. (Note that if x*_{j_1} is reduced to zero, then all
x*_j's for j ≤ j_1 are also reduced to zero, so that (8.4)-(8.5) still hold.) Since
f_j(x*_j) ≤ V* for all j, (8.5) implies (8.7). □

An optimal solution for MRP is not necessarily unique. However, for an
optimal solution that satisfies (8.4), (8.5), (8.6) and (8.7), none of the x*_j's
can be reduced without violating the nonnegativity constraints or optimality.
Thus, there is a unique optimal set J* that satisfies (8.4), (8.5), (8.6) and
(8.7). Since f_j(x_j) is strictly decreasing, (8.4), (8.5), (8.6) and (8.7) also
imply

f_{j*}(0) > V*, (8.8)
f_{j*−1}(0) ≤ V*. (8.9)

If j* = 1, only (8.8) is relevant. Conditions (8.8) and (8.9) thus characterize
the optimal index j*.

The first algorithm solves the problem by identifying the above set J*, by
considering a sequence of relaxed versions MRP_h without the nonnegativity
constraints x_j ≥ 0, each of which is defined on a different set J = {h, h +
1, ..., n} of activities:

MRP_h: minimize max_{j∈J} f_j(x_j) (8.10)
subject to ∑_{j∈J} a_ij x_j ≤ b_i, i = 1, 2, ..., m. (8.11)

Since all f_j are decreasing and all a_ij are nonnegative, one of the constraints
(8.11) holds with equality for an optimal solution of MRP_h. Thus, from
(8.6), in order to solve MRP_h, we first compute values k_i (i = 1, 2, ..., m)
such that

∑_{j∈J} a_ij f_j^{-1}(k_i) = b_i, i = 1, 2, ..., m. (8.12)

The optimal objective value V*_h of MRP_h is then obtained by

V*_h = max_{1≤i≤m} k_i.

If

f_l(0) > V*_l and f_{l−1}(0) ≤ V*_l

hold for J′ = {l, l + 1, ..., n}, then this J′ gives the optimal set J*. Theorem
8.1 guarantees the existence of such a J′. Furthermore, the optimal objective
value V* of MRP is equal to V*_l, and the positive x*_j's are given by x*_j =
f_j^{-1}(V*_l). As shown in [82], for certain classes of nonlinear functions (with
parameters β_j > 0, γ_j ≥ 0 and a common exponent δ > 0) the k_i's are given
as closed-form expressions. However, in general (for example, if the common
δ is replaced by activity-dependent exponents δ_j), solving equation (8.12)
requires numerical search methods such as Newton's method.
We now explain another algorithm, which facilitates the search for J* without
explicitly solving equations of the type (8.12). We first show the
properties used for this purpose.

Theorem 8.2 The optimal set J* = {j*, j* + 1, ..., n} of problem MRP
satisfies the following conditions:

∑_{j∈J*} a_ij f_j^{-1}(f_{j*}(0)) < b_i for i = 1, 2, ..., m. (8.13)

Furthermore, if j* > 1, it holds that

∑_{j∈J*} a_ij f_j^{-1}(f_{j*−1}(0)) ≥ b_i for some i. (8.14)

PROOF. From (8.2) and (8.6) and the nonnegativity of a_ij, we have

∑_{j∈J*} a_ij x*_j = ∑_{j∈J*} a_ij f_j^{-1}(V*) ≤ b_i

for all i. Since f_{j*}(0) > V* holds by (8.8), f_j^{-1}(y) is strictly decreasing, and
all a_ij's are nonnegative, it follows that

∑_{j∈J*} a_ij f_j^{-1}(f_{j*}(0)) < ∑_{j∈J*} a_ij f_j^{-1}(V*),

which then implies (8.13). To prove (8.14), note that there is some i, say i_1,
such that

∑_{j∈J*} a_{i_1 j} x*_j = b_{i_1}.

Furthermore, since (8.9) implies f_{j*−1}(0) ≤ V*, we have

f_j^{-1}(f_{j*−1}(0)) ≥ f_j^{-1}(V*), j ∈ J*.

Therefore,

∑_{j∈J*} a_{i_1 j} f_j^{-1}(f_{j*−1}(0)) ≥ ∑_{j∈J*} a_{i_1 j} f_j^{-1}(V*)
= ∑_{j∈J*} a_{i_1 j} x*_j = b_{i_1}

follows. □

Remark 8.1: If J* = {n}, condition (8.13) is always satisfied since b_i > 0
for all i and f_n^{-1}(f_{j*}(0)) = 0.

Theorem 8.3 For problem MRP, if a set J = {h, h + 1, ..., n} does not
satisfy (8.13) (i.e., ∑_{j∈J} a_ij f_j^{-1}(f_h(0)) ≥ b_i for some i), it holds that

J* ⊂ J.

PROOF. Let J′ = {l, l + 1, ..., n} be a set such that J′ ⊃ J (i.e., l < h),
and suppose J′ satisfies (8.13) (i.e., ∑_{j∈J′} a_ij f_j^{-1}(f_l(0)) < b_i for all i). This
implies that

∑_{j∈J} a_ij f_j^{-1}(f_h(0)) ≤ ∑_{j∈J} a_ij f_j^{-1}(f_l(0))
≤ ∑_{j∈J′} a_ij f_j^{-1}(f_l(0)) < b_i

holds for all i.

The first inequality follows since l < h implies f_l(0) ≤ f_h(0) and f_j^{-1}(·)
is strictly decreasing. The second inequality follows since

f_j^{-1}(f_l(0)) ≥ 0 for all j ≥ l

holds from f_j^{-1}(f_j(0)) = 0, f_j(0) ≥ f_l(0) for j ≥ l, and the monotone
decreasingness of f_j^{-1}(·). These inequalities imply that J satisfies (8.13),
which is a contradiction. Hence, no J′ ⊃ J can satisfy (8.13), and thus J* ⊂ J
follows. □

Theorem 8.4 For problem MRP, if a set J = {h, h + 1, ..., n} (for h ≥ 2)
does not satisfy condition (8.14) (i.e., ∑_{j∈J} a_ij f_j^{-1}(f_{h−1}(0)) < b_i for all i),
then it holds that

J* ⊃ J.

PROOF. Let J′ = {l, l + 1, ..., n} satisfy J′ ⊂ J (i.e., l > h). Then

∑_{j∈J′} a_ij f_j^{-1}(f_{l−1}(0)) ≤ ∑_{j∈J′} a_ij f_j^{-1}(f_{h−1}(0))
≤ ∑_{j∈J} a_ij f_j^{-1}(f_{h−1}(0)) < b_i

holds for all i. The first inequality is from f_{h−1}(0) ≤ f_{l−1}(0). The second
inequality holds since

f_j^{-1}(f_{h−1}(0)) ≥ 0 for j ≥ h,

which follows from f_j^{-1}(f_j(0)) = 0, f_{h−1}(0) ≤ f_j(0) for j ≥ h, and the monotone
decreasingness of f_j^{-1}(·). Hence, no J′ ⊂ J satisfies (8.14), implying that
J* ⊃ J. □
As a direct consequence of Theorems 8.3 and 8.4, we can compute J*
efficiently by applying binary search over the index set {1, 2, ..., n}
using (8.13) and (8.14). It should be remarked here that, unlike conditions
(8.8) and (8.9), testing whether a set J satisfies (8.13) and (8.14) does not
require the costly computation of V*_h, which is obtained by solving the nonlinear
equations (8.12) as in the first algorithm. Instead, we only need to solve
equations (8.12) once, in order to determine V* after J* is found. This
greatly reduces the computation time.
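
A Python sketch of this binary search follows; the data interface (activities indexed 0, ..., n−1, oracles f0[j] = f_j(0) and f_inv[j] = f_j^{-1}) is our assumption.

def find_jstar(f0, f_inv, a, b, n, m):
    # Binary search for j* using the tests (8.13) and (8.14).
    def use(h, y, i):
        # resource-i usage when every activity j >= h is set to f_j^{-1}(y)
        return sum(a[i][j] * f_inv[j](y) for j in range(h, n))
    lo, hi = 0, n - 1
    while lo < hi:
        h = (lo + hi) // 2
        if any(use(h, f0[h], i) >= b[i] for i in range(m)):
            lo = h + 1     # (8.13) fails: J* is a proper subset (Theorem 8.3)
        elif h > 0 and all(use(h, f0[h - 1], i) < b[i] for i in range(m)):
            hi = h - 1     # (8.14) fails: J* is a proper superset (Theorem 8.4)
        else:
            return h       # both tests pass: j* = h
    return lo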

8.2 Multiperiod resource allocation

We consider the multiperiod resource allocation problem, which seeks an
optimal allocation of resources over multiple periods so as to minimize
the cost incurred by the resulting allocation.

As in the problem discussed in the previous subsection, activities may
require more than one resource type. It is assumed that all resources
are storable, i.e., excess resources in one period can be used in the subsequent
periods.

As mentioned at the beginning of this section, a cumulative target level
is prespecified for each activity. This means that each decision variable represents
the cumulative level selected for the corresponding activity up to
some period, implying that we have an ordering constraint on the decision
variables of each activity: the cumulative activity levels are nondecreasing
over time. As in the previous subsection, all variables are assumed to be
continuous. The objective is again minimax, as in the previous subsection,
i.e., the maximum among all costs of activities is minimized.

We have a set of resources I and a set of activities A. Resources in I
are represented by the index i, and activities in A are represented by the
index j. The model considers discrete time periods t = 1, 2, ..., T, and
T = {1, 2, ..., T}. Activity j at time t is represented by the pair of indices (j, t)
for j ∈ A and t ∈ T. In production planning, (j, t) represents product j in
time period t. The set of all activity-period pairs is denoted by AP. The
following additional notations are used:
a_ijt: amount of resource i used by one unit of activity (j, t); a_ijt ≥ 0,
b_i: amount available of resource i; b_i > 0,
x_jt: cumulative level selected for activity (j, t),
l_jt: lower bound on the level selected for activity (j, t); l_jt ≥ 0,
u_jt: upper bound on the level selected for activity (j, t); u_jt ≥ 0,
f_jt(x_jt): performance function associated with activity (j, t).

The multiperiod minimax resource allocation problem is formulated as follows.

MPMRP: minimize max_{(j,t)∈AP} f_jt(x_jt) (8.15)
subject to ∑_{(j,t)∈AP} a_ijt x_jt ≤ b_i, i ∈ I, (8.16)
x_jt ≥ x_{j,t−1}, (j, t) ∈ AP, (8.17)
l_jt ≤ x_jt ≤ u_jt, (j, t) ∈ AP, (8.18)

where x_j0 = 0 is assumed for all j ∈ A by convention. Denote the optimal
objective value by V*, and assume that f_jt(x_jt) is strictly decreasing and
continuous in x_jt. The inverse function of f_jt is denoted by f_jt^{-1}.
Betts, Brown and Luss [9] have developed an efficient algorithm for this
problem, based on the following theorem (the proof is omitted here; see [9]).
For a given constant V ∈ R, we call V feasible if there exists
a feasible solution x of MPMRP whose objective value is less than
or equal to V.

Theorem 8.5 Given a V ∈ R for problem MPMRP, let x(V) = {x_jt(V) |
(j, t) ∈ AP} satisfy the following:

x_jt(V) = max{l_jt, x_{j,t−1}(V), f_jt^{-1}(V)}, (j, t) ∈ AP. (8.19)

(This x(V) can be obtained by executing (8.19) in increasing order of t,
starting from x_j0(V) = 0, j ∈ A.) Then
(i) V is feasible to problem MPMRP if and only if x(V) satisfies constraints
(8.16), (8.17) and (8.18).
(ii) If V = V_1 is feasible to problem MPMRP and V = V_2 is not, then
V_2 < V_1 holds. □

Theorem 8.5 immediately suggests a binary search method to find an optimal
solution of MPMRP. Since we can compute x(V) using the recursive formula
(8.19), the entire algorithm is efficient.
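
A bisection sketch based on Theorem 8.5 follows; the data layout, the initial bracket [v_lo, v_hi] (with v_hi feasible and v_lo infeasible), and the tolerance eps are our assumptions. Note that by construction (8.19) already enforces (8.17) and the lower bounds of (8.18), so only the upper bounds and (8.16) need checking.

def mpmrp_bisect(f_inv, l, u, a, b, A, T, I, v_lo, v_hi, eps=1e-6):
    # Bisection on V for MPMRP; returns an eps-approximation of V*
    # together with the corresponding solution x(V).
    def build_x(V):
        x = {}
        for j in A:
            prev = 0.0
            for t in range(1, T + 1):
                prev = max(l[j][t], prev, f_inv[j][t](V))  # formula (8.19)
                x[(j, t)] = prev
        return x
    def feasible(V):
        x = build_x(V)
        if any(x[(j, t)] > u[j][t] for j in A for t in range(1, T + 1)):
            return False                    # upper bounds in (8.18)
        return all(sum(a[i][(j, t)] * x[(j, t)]
                       for j in A for t in range(1, T + 1)) <= b[i]
                   for i in I)              # resource constraints (8.16)
    while v_hi - v_lo > eps:
        mid = (v_lo + v_hi) / 2
        if feasible(mid):
            v_hi = mid
        else:
            v_lo = mid
    return v_hi, build_x(v_hi)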
Resource Allocation Problems 247

8.3 Minimizing a separable convex function under a nonlin-


ear constraint

The problem considered here has a single nonlinear constraint together with
lower and upper bounds on all variables, which is formulated as follows.

n
SC /NLP /D : minimize L fj(xj) (8.20)
j=l
n
subject to Lgj(Xj) ~ b, (8.21)
j=l
lj ~ Xj ~ Uj, j = 1, ... ,n, (8.22)
Xj: integer, j = 1, ... , n, (8.23)

where fi(Xj) and gj(Xj), j = 1, ... ,n, are differentiable convex functions
over R, b is a constant, and lj and Uj (lj < Uj), j = 1, ... , n, are lower and
upper bounds on the integer variables. We assume that the feasible region
is nonempty and bounded. Note that the assumptions on fj(xj) and gj(Xj)
imply that the continuous relaxation of SC /NLP /D is a convex program.
Bretthauer and Shetty [14] pointed out that problem SC/NLP /D has
a wide variety of applications, including production planning, capital bud-
geting, capacity planning in manufacturing networks, capacity planning in
computer networks, stratified sampling, and marketing. They proposed a
branch-and-bound algorithm based on the continuous relaxation, and re-
ported that the proposed algorithm could solve problem instances with up
to two hundred variables all of which were taken from the real applications
mentioned above.
Kodialam and Luss [75] studied SC/NLP/C, the continuous version of
SC/NLP/D, and developed an algorithm by generalizing the algorithms for
SC/LUB/C developed by Bitran and Hax [11] and Zipkin [123]. They car-
ried out computational experiments on problem instances taken from pro-
duction planning applications, and showed that the proposed algorithms are
efficient in practice.
Hochbaum [52] proposed a fully polynomial time approximation scheme for
SC/NLP/D that runs in O((1/ε)(n log N + log(1/ε) log n + (1/ε) log(1/ε)))
time. She also proposed an O(n log(N/ε)) time algorithm that computes an
ε-accurate solution for SC/NLP/C. Both algorithms assume that the g_j are
nondecreasing.

8.4 Other variants of resource allocation problems


8.4.1 Resource allocation problem with smoothing objective
This problem is a variant of the multiperiod resource allocation problem
discussed in Section 8.2. For simplicity, let us assume that we have a single
resource type. The objective is to allocate the resource as smoothly as pos-
sible over the T periods, while satisfying the demands in all periods. More
specifically, we attempt to minimize the sum of the absolute values of the
changes in resource allocation in adjacent periods. A perfectly smooth al-
location of the resource implies that the same amount of resource is made
available in each period. However, this ideal situation is, in general, infeasi-
ble, as it may incur temporary shortages or require additional resources that will
become surplus at the end of the planning horizon. There are two models.
The first model assumes that the allocation decisions are continuous vari-
ables, and the second model considers discrete variables. Both models have
been studied by Luss and Rosenwein [85].
As indicated in [85], an application of such smoothing problems can
be found, for example, in a manufacture-to-order environment, where the
orders are known for a finite planning horizon. Consider a complex product
like a communications switching system. Each system is custom-made and
requires a different amount of many subassemblies. The sequence of systems
assembled in the final assembly shop during the planning horizon imposes
varying demands on each feeder shop that manufactures a subassembly for
that system. The question then arises of how smooth the production of
each subassembly can be, subject to meeting the demands in all periods without
holding unnecessary inventory.
Here we present a problem formulation in the simplest case. We use the
following notations.
T: the planning horizon,
t, r: indices for time periods; t, r = 1, 2, ..., T,
d_t: demand for the resource in period t; d_t ≥ 0,
x_t: amount of resource made available in period t.
In production planning applications, d_t is the demand for the product
(subassembly or component) in period t, and x_t is the amount of that product
produced in period t. The model assumes that resource allocations and
demands occur instantaneously and simultaneously at the beginning of each
period. Further, no resources are available prior to the first period, and
the resource made available in period t is storable. Thus, it can be used to
satisfy the demand of period t or of later periods. Demands must be met
on time; i.e., the demand of period t must be satisfied by resources made
available in period t or earlier. The objective is to allocate the resource as
smoothly as possible among successive periods. More precisely, the objective
is to minimize the sum of the absolute value of the changes in the amounts of
resource made available in adjacent periods. The formulation of this model
is as follows:
    minimize    Σ_{t=2}^T |x_{t-1} - x_t|                            (8.24)

    subject to  Σ_{r=1}^t x_r ≥ Σ_{r=1}^t d_r,   t = 1, 2, ..., T-1, (8.25)

                Σ_{r=1}^T x_r = Σ_{r=1}^T d_r,                       (8.26)

                x_t ≥ 0,   t = 1, 2, ..., T.                         (8.27)

The case of discrete variables is similarly defined. Luss and Rosenwein
[85] proposed an O(T²) time algorithm for the continuous case. For the
discrete case, they proposed an algorithm with the same running time that
converts a continuous optimal solution to a discrete one with an extra
postprocessing time of O(T).
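Since the continuous case is a linear program once the absolute values in
(8.24) are split into positive and negative parts, the formulation can be
checked directly; the sketch below is such an LP illustration (not the O(T²)
algorithm of [85]), using scipy's linprog:

    # Solve the smoothing problem (8.24)-(8.27) as an LP: write
    # x_{t-1} - x_t = p_t - q_t with p_t, q_t >= 0 and minimize Σ (p_t + q_t).
    import numpy as np
    from scipy.optimize import linprog

    def smooth_allocation(d):
        T = len(d)
        n = T + 2 * (T - 1)               # variables: x_1..x_T, p_2..p_T, q_2..q_T
        c = np.r_[np.zeros(T), np.ones(2 * (T - 1))]
        # Equalities: x_{t-1} - x_t - p_t + q_t = 0 (t = 2..T), and (8.26).
        A_eq = np.zeros((T, n)); b_eq = np.zeros(T)
        for k in range(T - 1):
            A_eq[k, k] = 1.0; A_eq[k, k + 1] = -1.0
            A_eq[k, T + k] = -1.0; A_eq[k, T + (T - 1) + k] = 1.0
        A_eq[T - 1, :T] = 1.0; b_eq[T - 1] = sum(d)
        # Inequalities (8.25): -Σ_{r<=t} x_r <= -Σ_{r<=t} d_r, t = 1..T-1.
        A_ub = np.zeros((T - 1, n)); b_ub = np.zeros(T - 1)
        for t in range(T - 1):
            A_ub[t, :t + 1] = -1.0
            b_ub[t] = -sum(d[:t + 1])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * n)
        return res.x[:T]                  # success not checked in this sketch

    print(np.round(smooth_allocation([3, 1, 4, 1, 5]), 3))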

8.4.2 Minimum variance resource allocation problem


All the objectives of Minimax, Maximin, Lexico-Minimax, Lexico-Maximin,
and Fair can be viewed as seeking a fair allocation of resources
to activities. However, when we are interested in an allocation of discrete
resources, perfect fairness cannot be attained in general. We consider
here the variance as an alternative measure expressing the degree of unfair-
ness or imbalance of the resulting allocation. Suppose that function f_j(x_j)
represents the profit resulting from allocating x_j units of resource to ac-
tivity j. Then the variance of the profit vector (f_1(x_1), f_2(x_2), ..., f_n(x_n))
for an allocation vector x = (x_1, x_2, ..., x_n) is defined as

    Var(x) = Σ_{j=1}^n (1/n) [f_j(x_j) - (1/n) Σ_{k=1}^n f_k(x_k)]².   (8.28)

Then, the minimum variance resource allocation problem under a simple
constraint is formulated as follows.

    Q:  minimize    Var(x)
        subject to  Σ_{j=1}^n x_j = N,
                    x_j ≥ 0, integer,   j = 1, ..., n.

This problem has been studied by Katoh [65], who presented a pseudo-
polynomial time algorithm. Note that Var(x) = (1/n) Σ_{j=1}^n [f_j(x_j)]² -
((1/n) Σ_{j=1}^n f_j(x_j))², so only the second term is nonseparable. The pro-
posed algorithm is based on a technique that transforms the problem into
an equivalent parametric problem whose objective function is separable:

    Q(λ):  minimize    Σ_{j=1}^n ([f_j(x_j)]² - λ f_j(x_j))
           subject to  Σ_{j=1}^n x_j = N,
                       x_j ≥ 0, integer,   j = 1, ..., n.

Katoh [65] showed that there exists a parameter λ = λ* such that an optimal
solution of Q(λ*) is optimal to Q. However, no efficient method to search for
λ* has been developed; thus, we have to solve Q(λ) for all λ in a certain range.
It was shown in [58] that, in continuously changing the parameter λ, the
number of different optimal solutions for Q(λ) is O(n²N²). Therefore all
optimal solutions in the range can be obtained in O(n²N²·r) time, where r
denotes the time to solve Q(λ) for a fixed λ. Notice here that each term
[f_j(x_j)]² - λ f_j(x_j) in the objective function may not be convex even if
f_j(x_j) is convex. Thus, r is not polynomial in general. However, even if the
objective function is not convex, we can apply dynamic programming to
solve Q(λ) in O(nN²) time (see, e.g., [58]). Thus, problem Q can be solved
in pseudo-polynomial time.
This idea can be generalized to problems with more general constraints,
such as submodular constraints.
Recently, Ichimori and Katoh [61] studied a special case of Q where
each f_j(x_j) equals x_j/s_j (s_j > 0). In this case, [f_j(x_j)]² - λ f_j(x_j) is convex,
and hence Q(λ) reduces to problem SC/Simple/D. They proposed a very
efficient algorithm for this case by devising a technique similar to branch-
and-bound procedures, and exhibited its efficiency through computational
experiments.
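The O(nN²) dynamic programming step mentioned above is the standard
one for a separable objective under the constraint Σ_{j=1}^n x_j = N; the
following minimal Python sketch (names illustrative) instantiates it with
the Ichimori-Katoh special case f_j(x) = x/s_j:

    # O(n N^2) dynamic programming for Q(lambda):
    #   minimize Σ_j h_j(x_j)  s.t.  Σ_j x_j = N,  x_j >= 0 integer,
    # with h_j(x) = f_j(x)^2 - lam * f_j(x).  No convexity of h_j is needed.

    def solve_q_lambda(f, N, lam):
        n = len(f)
        INF = float('inf')
        dp = [0.0] + [INF] * N        # dp[m]: best value with Σ x = m so far
        choice = []                   # choice[j][m]: optimal amount for j
        for j in range(n):
            h = [f[j](x) ** 2 - lam * f[j](x) for x in range(N + 1)]
            new = [INF] * (N + 1)
            arg = [0] * (N + 1)
            for m in range(N + 1):            # total allocated so far
                for x in range(m + 1):        # amount given to activity j
                    v = dp[m - x] + h[x]
                    if v < new[m]:
                        new[m], arg[m] = v, x
            choice.append(arg)
            dp = new
        xs, m = [0] * n, N                    # backtrack the allocation
        for j in range(n - 1, -1, -1):
            xs[j] = choice[j][m]
            m -= xs[j]
        return dp[N], xs

    # Example: f_j(x) = x / s_j (the Ichimori-Katoh special case), lam = 2.
    s = [1.0, 2.0, 4.0]
    f = [lambda x, sj=sj: x / sj for sj in s]
    print(solve_q_lambda(f, N=6, lam=2.0))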

8.5 Notes and References


We give here some additional references not mentioned in the previous sec-
tions. We start with multiple resource allocation problem MRP. King [68]
developed an interactive software tool for solving problem MRP with linear
objective function in support of Manufacturing Resource Planning (MRP
II) systems. Ichimori [59] developed a polynomial time algorithm for a spe-
cial class of two-resource allocation problems with a quadratic convex objective
function. Klein and Luss [70] and Klein, Luss and Rothblum [71] consid-
ered an extension of problem MRP in which certain substitutions among
resources are possible, and proposed efficient algorithms for the problem.
It was mentioned in [70, 71] that such substitution is an important capa-
bility in the manufacturing of high-tech electronic products because shortages
of parts are often incurred due to rapid changes in technology. Nguyen
and Stone [94] and Pang and Yu [97] considered similar minimax resource
allocation problems with substitutional resources.
A special case of Problem MPMRP was examined by Luss and Smith [86]
and Klein, Luss and Smith [74], where f_jt(x_jt) is linear in x_jt. In [74], the
objective is Lexico-Minimax, and efficient algorithms have been developed
for this case in [86, 74]. The algorithms in [86, 74] first solve a relaxed
problem obtained by neglecting the constraints (8.17) and (8.18). When
a variable x_jt in the optimal solution x of the relaxed problem violates the
constraint (8.17) or the lower bound constraint of (8.18), x_jt is fixed to x_jt =
max{l_jt, x_{j,t-1}} or x_jt = l_jt, respectively. We then repeat this process by
solving the relaxed problem with the above equality constraints added until
all variables are fixed. It was shown in [86, 74] that this algorithm produces
an optimal solution. This algorithm can be viewed in a certain sense as a
generalization of the algorithm for problem SC/LUB/C by Bitran and Hax
[11] and Zipkin [123] (see also Chapter 2 of [58]). Luss [84] developed post-
optimization schemes and parametric analysis, which may be employed once
an optimal solution of the problem is obtained.
Klein, Luss and Rothblum [72] have developed a general methodology
to solve multiple resource minimax problems including MRP and MPMRP
discussed in this section, based on the so-called relaxation method.
Betts, Brown, and Luss [9] considered minimax resource allocation prob-
lems with ordering constraints by generalizing the constraints (8.17) of prob-
lem MPMRP, and proposed an efficient algorithm that solves a sequence of
problems without ordering constraints.
Hackman and Platzman [49] considered another generalization of MRP
in which the set of resources required by an activity is not fixed; rather, the
allocation of resource types to an activity is itself a decision variable. This
problem is formulated as a nonlinear mixed-integer program, which may be
solved by a branch-and-bound algorithm. The study was motivated by a stock
positioning problem in a large distribution center (see [48] for the details).
Karabati, Kouvelis and Yu [62, 63] considered the so-called Min-Max-
Sum resource allocation problem with discrete variables that includes as
special cases both SC/Simple/D and Minimax/Simple/D. Yüceer [121] pro-
posed a variant of the incremental algorithm for the resource allocation
problem with a nonseparable objective function and a single knapsack con-
straint (3.51), which produces a non-dominated solution. He discussed an
application of this problem to a spare-kit problem. Moré and Vavasis [89]
considered separable concave discrete minimization under a simple con-
straint Σ_{j=1}^n x_j = N, and developed an algorithm for the problem.

References
[1] C.C. Aggarwal, J.L. Wolf and P.S. Yu, On optimal batching policies for video-
on-demand storage servers, Proc. of IEEE International Conference on Multi-
media Computing and Systems, Hiroshima, Japan, 1996.
[2] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin: Network Flows - Theory, Algo-
rithms, and Applications, Prentice-Hall, 1993.
[3] R.K. Ahuja and J.B. Orlin, A fast and simple algorithm for the maximum flow
problem, Operations Research, Vol. 37 (1989) 748-759.
[4] K. Ando and S. Fujishige, On structures of bisubmodular polyhedra. Mathe-
matical Programming Vol. 74 (1996) 293-317.
[5] K. Ando, S. Fujishige and T. Naitoh, A greedy algorithm for minimizing a
separable convex function over an integral bisubmodular polyhedron. Journal
of the Operations Research Society of Japan Vol. 37 (1994) 188-196.
[6] K. Ando, S. Fujishige and T. Naitoh, A greedy algorithm for minimizing a
separable convex function over a finite jump system. Journal of the Operations
Research Society of Japan Vol. 38 (1995) 362-375.
[7] S. Baase, Computer Algorithms: Introduction to Design and Analysis, 2nd
Edition, Addison Wesley, 1988.
[8] R. Baldick and F.F. Wu, Efficient integer optimization algorithms for optimal
coordination of capacitors and regulators, IEEE Trans. on Power Systems,
Vol. 5 (1990) 805-812.

[9] L.M. Betts, J.R. Brown, and H. Luss, Minimax resource allocation for problems
with ordering constraints, Naval Research Logistics, Vol. 41 (1994) 719-738.
[10] P.P. Bhattacharya, L. Georgiadis, P. Tsoucas, and I. Viniotis, Adaptive lexico-
graphic optimization in multi-class M/GI/1 queues, Mathematics of Opera-
tions Research, Vol. 18, No.3 (1993) 705-740.
[11] G.R. Bitran and A.C. Hax, Disaggregation and resource allocation using con-
vex knapsack problems with bounded variables, Management Science, Vol. 27
(1981) 431-441.
[12] G.R. Bitran and D. Tirupati, Tradeoff curves, targeting and balancing in man-
ufacturing queueing networks, Operations Research, Vol. 37 (1989) 547-564.
[13] G.R. Bitran and D. Tirupati, Capacity planning in manufacturing networks,
Annals of Operations Research, Vol. 17 (1989) 119-135.
[14] K.M. Bretthauer and B. Shetty, The nonlinear resource allocation problem,
Operations Research, Vol. 43, No.4 (1995) 670-683.
[15] K.M. Bretthauer and B. Shetty, Quadratic resource allocation with generalized
upper bounds, Operations Research Letters, Vol. 20 (1997) 51-57.
[16] K.M. Bretthauer, B. Shetty and S. Syam, Branch and bound algorithm for
integer quadratic knapsack problem, ORSA J. Computing, Vol. 7 (1995) 109-
116.
[17] K.M. Bretthauer, B. Shetty, S. Syam and S. White, A model for resource
constrained production and inventory management, Decision Sciences, Vol. 25
(1994) 561-580.
[18] J.R. Brown, Solving knapsack sharing with general tradeoff functions, Math-
ematical Programming, Vol. 51 (1991) 55-73.
[19] J.R. Brown, Bounded knapsack sharing, Mathematical Programming, Vol. 67,
No.3 (1994) 343-382.
[20] P. Brucker, An O(n) algorithm for quadratic knapsack problems, Operations
Research Letters, Vol. 3 (1984) 163-166.
[21] J.A. Buzacott and J.G. Shanthikumar, Stochastic Models for Manufacturing
Systems, Prentice-Hall, New Jersey, 1993.
[22] M.A. Cohen, P.R. Kleindorfer and H.L. Lee, Near-optimal service-constrained
stocking policies for spare parts, Operations Research, Vol. 37 No.1 (1989)
104-117.
[23] W. Cook, A.M.H. Gerards, A. Schrijver, and E. Tardos, Sensitivity results
in integer linear programming, Mathematical Programming, Vol. 34 (1986)
251-264.

[24] S. Cosares and D.S. Hochbaum, Strongly polynomial algorithms for the
quadratic transportation problem with fixed number of sources, Mathemat-
ics of Operations Research, Vol. 19 No.1 (1994) 94-111.
[25] G.B. Dantzig, Linear Programming and Extensions: Princeton University
Press, Princeton, N.J., 1963.
[26] M.E. Dyer and A.M. Frieze, On an optimization problem with nested con-
straints, Discrete Applied Mathematics, Vol. 26 (1990) 159-173.
[27] M.E. Dyer and J. Walker, An algorithm for a separable integer program-
ming problem with cumulatively bounded variables, Discrete Applied Mathe-
matics, Vol. 16 (1987) 135-149.
[28] J. Edmonds and R. Giles, A min-max relation for submodular functions on
graphs, Annals of Discrete Mathematics, Vol. 1 (1977) 185-204.
[29] H.A. Eiselt, Continuous maximin knapsack problems with GLB constraints,
Mathematical Programming, Vol. 36 (1986) 114-121.
[30] J.E. Eu, The sampling resource allocation problem, IEEE Trans. on Commu-
nications, Vol. 39 No. 9 (1991) 1277-1279.
[31] A. Federgruen and H. Groenevelt, The greedy procedure for resource alloca-
tion problems - necessary and sufficient conditions for optimality, Operations
Research, Vol. 34 (1986) 909-918.
[32] L.R. Ford and D.R. Fulkerson, Flows in Networks, Princeton University Press,
Princeton, New Jersey, 1962.
[33] B.L. Fox, Discrete optimization via marginal analysis, Management Science,
Vol. 13 (1966) 210-216.
[34] G.N. Frederickson and D.B. Johnson, The complexity of selection and ranking
in X + Y and matrices with sorted columns, Journal of Computer and System
Sciences, Vol. 24 (1982) 197-208.
[35] S. Fujishige, Lexicographically optimal base of a polymatroid with respect to
a weight vector, Mathematics of Operations Research, Vol. 5 (1980) 186-196.
[36] S. Fujishige: Linear and nonlinear optimization problems with submodular
constraints. In: Mathematical Programming (M. Iri and T. Tanabe, eds., KTK
Scientific Publishers, Tokyo, 1989) 203-225.
[37] S. Fujishige, Submodular Functions and Optimization, North-Holland, 1991.
[38] S. Fujishige, A min-max theorem for bisubmodular polyhedra, SIAM Journal
on Discrete Mathematics, Vol. 10 No.2 (1997) 294-308.
[39] S. Fujishige, N. Katoh and T. Ichimori, The fair resource allocation problem
with submodular constraints, Mathematics of Operations Research, Vol. 13,
No.1 (1988) 164-173.

[40] H.N. Gabow and R.E. Tarjan, Linear time algorithm for a special case of
disjoint set union, Journal of Computer and System Sciences, Vol. 30 (1985)
209-221.
[41] Z. Galil and N. Megiddo, A fast selection algorithm and the problem of opti-
mum distribution of effort, Journal of ACM, Vol. 26 (1979) 58-64.
[42] G. Gallo, M.E. Grigoriadis and R.E. Tarjan, A fast parametric maximum flow
algorithm and applications, SIAM Journal on Computing, Vol. 18 (1989) 30-
55.
[43] A. Galperin and Z. Waksman, A separable integer programming problem
equivalent to its continual version, J. Comput. Appl. Math., Vol. 7 (1981)
173-179.
[44] H. Groenevelt, Two algorithms for maximizing a separable concave function
over a polymatroid feasible region, European Journal of Operational Research,
Vol. 54 (1991) 227-236.
[45] O. Gross, A class of discrete type minimization problems, RM-1644, RAND-
Corp., 1956.
[46] M. Grotschel, L. Lovasz, and A. Schrijver, Geometric Algorithms and Com-
binatorial Optimization (Algorithms and Combinatorics 2), Springer-Verlag,
Berlin, 1988.
[47] O.K. Gupta and A. Ravindran, Branch and bound experiments in convex non-
linear integer programming problems, Management Science, Vol. 31 (1985)
1533-1546.
[48] S.T. Hackman and L.K. Platzman, Allocating items to an automated storage
and retrieval system, IIE Transactions, Vol. 22 (1989) 7-14.
[49] S.T. Hackman and L.K. Platzman, Near-optimal solution of generalized re-
source allocation problems with large capacities, Operations Research, Vol. 38,
No.5 (1990) 902-910.
[50] D.S. Hochbaum, Polynomial algorithms for convex network optimization, in
Network Optimization Problems: Algorithms, Complexity and Applications,
edited by D. Du and P. M. Pardalos, World Scientific, (1993) 63-92.
[51] D.S. Hochbaum, Lower and upper bounds for the allocation problem and
other nonlinear optimization problems, Mathematics of Operations Research,
Vol. 19, No.2 (1994) 390-409.
[52] D.S. Hochbaum, A nonlinear knapsack problem, Operations Research Letters,
Vol. 17 (1995) 103-110.
[53] D.S. Hochbaum and S. Hong, About strongly polynomial time algorithms for
quadratic optimization over submodular constraints, Mathematical Program-
ming Vol. 69 (1995) 269-309.

[54] D.S. Hochbaum, R. Shamir and J.G. Shanthikumar, A polynomial algorithm


for an integer quadratic nonseparable transportation problem, Mathematical
Programming, Vol. 55 No.3 (1992) 359-372.
[55] D.S. Hochbaum and J.G. Shanthikumar, Nonlinear separable optimization is
not much harder than linear optimization, Journal of ACM, Vol. 37, No.4
(1990) 843-862.
[56] A.J. Hoffman, A generalization of max-flow min-cut, Mathematical Program-
ming, Vol. 6 (1974) 352-359.
[57] T.C. Hu, Integer Programming and Network Flows, Addison-Wesley, New
York, 1969.
[58] T. Ibaraki and N. Katoh, Resource Allocation Problems: Algorithmic Ap-
proaches, The MIT Press, Cambridge, MA, 1988.
[59] T. Ichimori, A two-resource allocation problem with a quadratic objective
function, Transactions of the Japan Society for Industrial and Applied Math-
ematics, Vol. 3 No.3 (1993) 199-215 (in Japanese).
[60] T. Ichimori and N. Katoh, A two-commodity sharing problem on networks,
Networks, Vol. 21 (1991) 547-564.
[61] T. Ichimori and N. Katoh, Minimum variance discrete resource allocation prob-
lem, unpublished manuscript (in Japanese) 1997.
[62] S. Karabati, P. Kouvelis, and G. Yu, The discrete allocation problem in flow
lines, Management Science, Vol. 41 No.9 (1995) 1417-1430.
[63] S. Karabati, P. Kouvelis, and G. Yu, A min-max-sum resource allocation prob-
lem and its application, Working Paper 1996-08, Koc University, Istanbul,
Turkey, 1996.
[64] A.V. Karzanov and S.T. McCormick, Polynomial methods for separable convex
optimization in totally unimodular linear spaces with applications, SIAM J.
on Computing, Vol. 26 No.4 (1997) 1245-1275.
[65] N. Katoh, An ε-approximation scheme for minimum variance problems, Jour-
nal of the Operations Research Society of Japan, Vol. 33 No. 1 (1990) 46-65.
[66] N. Katoh, T. Ibaraki and H. Mine, A polynomial time algorithm for the re-
source allocation problem with a convex objective function, Journal of Oper-
ational Research Society, Vol. 30 (1979) 449-455.
[67] W. Kim, A new way to compute the product and join of relations, Proc. of
the ACM SIGMOD Conference, Santa Monica, 1980.
[68] J.H. King, Allocation of scarce resources in manufacturing facilities, AT&T
Technical Journal, Vol. 68, No.3 (1989) 103-113.

[69] V. King, S. Rao and R. Tarjan, A faster deterministic maximum flow algo-
rithm, Proc. of the 3rd Annual ACM-SIAM Symposium on Discrete Algo-
rithms, (1992) 157-163.
[70] R.S. Klein and H. Luss, Minimax resource allocation with tree structured
substitutable resources, Operations Research, Vol. 39 No.2 (1991) 285-295.
[71] R.S. Klein, H. Luss, and U.G. Rothblum, Minimax resource allocation prob-
lems with resource-substitutions represented by graphs, Operations Research
Vol. 41 (1993) 959-971.
[72] R.S. Klein, H. Luss, and U.G. Rothblum, Relaxation-based algorithms for
minimax optimization problems with resource allocation applications, Mathe-
matical Programming, Vol. 64 (1994) 337-363.
[73] R.S. Klein, H. Luss, and U.G. Rothblum, Multiperiod allocation of substi-
tutable resources, European J. of Operational Research, Vol. 85 (1995) 488-503.
[74] R.S. Klein, H. Luss, and D.R. Smith, A lexicographic minimax algorithm for
multiperiod resource allocation, Mathematical Programming, Vol. 55 (1992)
213-234.
[75] M.S. Kodialam and H. Luss, Algorithms for separable nonlinear resource allo-
cation problems, to appear in Operations Research.
[76] B.O. Koopman, The optimum distribution of effort, Operations Research,
Vol. 1 (1953) 52-63.
[77] P. Kubat and H.S. Koch, Managing test-procedures to achieve reliable soft-
ware, IEEE Transactions on Reliability, Vol. R-32 (1983) 299-303.
[78] T. Kuno, H. Konno and E. Zemel, A linear time algorithm for solving contin-
uous knapsack problems, Operations Research Letters, Vol. 10 (1991) 23-26.
[79] H.L. Lee and W.P. Pierskalla, Mass screening models for contagious diseases
with no latent period, Operations Research, Vol. 36 No.1 (1988) 917-928.
[80] D. Li and Y. Y. Haimes, A decomposition method for optimization of large-
system reliability, IEEE Trans. on Reliability, Vol. 41 No.2, (1992) 183-189.
[81] L. Lovasz, Submodular functions and convexity, in: A. Bachem, M. Grotschel
and B. Korte, eds., Mathematical Programming - The State of the Art
(Springer-Verlag, Berlin, 1983) 235-257.
[82] H. Luss, An algorithm for separable nonlinear minimax problems, Operations
Research Letters, Vol. 6 (1987) 159-162.
[83] H. Luss, A nonlinear minimax allocation problem with multiple knapsack con-
straints, Operations Research Letters, Vol. 10 (1991) 183-187.
[84] H. Luss, Minimax resource allocation problems: Optimization and parametric
analysis, European J. of Operational Research Vol. 60 (1992) 76-86.

[85] H. Luss and M.B. Rosenwein, Multiperiod resource allocation with a smoothing
objective, Naval Research Logistics, Vol. 42 (1995) 1007-1020.
[86] H. Luss and D.R. Smith, Resource allocation among competing activities: A
lexicographic minimax approach, Operations Research Letters, Vol. 5 (1986)
227-231.
[87] H. Luss and D.R. Smith, Multiperiod allocation of limited resources: A mini-
max approach, Naval Research Logistics, Vol. 35 (1988) 493-501.
[88] N. Megiddo and A. Tamir, Linear time algorithms for some separable quadratic
programming problems, Operations Research Letters, Vol. 13 (1993) 203-211.
[89] J.J. Moré and S.A. Vavasis, On the solution of concave knapsack problems,
Mathematical Programming, Vol. 49 (1991) 397-411.
[90] K. Murota, Convexity and Steinitz's exchange property, Advances in Mathe-
matics, Vol. 124 (1996) 272-311.
[91] K. Murota, Discrete convex analysis, to appear in Mathematical Programming.
[92] K. Murota, Discrete convex analysis, to appear in Discrete Structure and Al-
gorithms V, edited by S. Fujishige, Kindai-Kagakusha, 1998 (in Japanese).
[93] K. Namikawa and T. Ibaraki, An algorithm for the fair resource allocation
problem with a submodular constraint, Japan J. of Industrial and Applied
Mathematics, Vol. 8 (1991) 377-387.
[94] Q.C. Nguyen and R.E. Stone, A multiperiod resource allocation problem with
storable and substitutable resources, Management Science, Vol. 39 (1993) 964-
974.
[95] S.S. Nielsen and S.A. Zenios, Massively parallel algorithms for singly con-
strained convex program, ORSA J. Computing, Vol. 4 (1992) 166-181.
[96] H. Ohtera and S. Yamada, Optimal allocation & control problems for software-
testing resources, IEEE Trans. on Reliability, Vol. 39 No.2, (1990) 171-176.
[97] J.-S. Pang and C.-S. Yu, A min-max resource allocation problem with substi-
tutions, European Journal of Operational Research, Vol. 41 (1989) 218-223.
[98] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms
and Complexity, Prentice-Hall, Englewood Cliffs, N.J., 1982.
[99] P.M. Pardalos and N. Kovoor, An algorithm for a singly constrained class of
quadratic programs subject to upper and lower bounds, Mathematical Pro-
gramming, Vol. 46 (1990) 321-328.
[100] P.M. Pardalos, Y. Ye and C.G. Han, Algorithms for the solution of quadratic
knapsack problems, Linear Algebra Appl., Vol. 152 (1991) 69-91.
[101] J. Renegar, On the worst case arithmetic complexity of approximating zeros
of polynomials, Journal of Complexity, Vol. 3 (1987) 90-113.

[102] A.G. Robinson, N. Jiang and C.S. Lerme, On the continuous quadratic knap-
sack problem, Mathematical Programming, Vol. 55 (1992) 99-108.
[103] J.G. Shanthikumar and D.D. Yao, Second-order stochastic properties in
queueing systems, Proceedings of IEEE, Vol. 77 No.1 (1989) 162-170.
[104] J.G. Shanthikumar and D.D. Yao, Multiclass queueing systems: Polyma-
troidal structure and optimal scheduling control, Operations Research, Vol. 40
No.2 (1992) 293-299.
[105] W. Shih, A new application of incremental analysis in resource allocations,
Operational Research Quarterly, Vol. 25 (1974) 587-597.
[106] A. Shioura, Minimization of an M-convex function, to appear in Discrete
Applied Mathematics.
[107] H.S. Stone, J. Turek, and J.L. Wolf, Optimal partitioning of cache memory,
IEEE Transactions on Computers, Vol. 41 No.9 (1992) 1054-1068.
[108] A. Tamir, A strongly polynomial algorithm for minimum convex separable
quadratic cost flow problems on series-parallel networks, Mathematical Pro-
gramming, Vol. 59 (1993) 117-132.
[109] C.S. Tang, A max-min allocation problem: Its solutions and applications,
Operations Research, Vol. 36 (1988) 359-367.
[110] A.N. Tantawi, D. Towsley, and J.L. Wolf, An algorithm for a class constrained
resource allocation problem, 1987.
[111] D. Thiebaut, H.S. Stone, and J.L. Wolf, Improving disk cache hit-ratios
through cache partitioning, IEEE Transactions on Computers, Vol. 41 No.9,
(1992) 665-676.
[112] J. Turek, W. Ludwig, J.L. Wolf, L. Fleischer, P. Tiwari, J. Glasgow,
U. Schwiegelshohn, and P.S. Yu, Scheduling parallelizable tasks to minimize
average response time, Proc. of ACM Symp. on Parallel Algorithms and Ar-
chitectures, 1994.
[113] J. Turek, J.L. Wolf, K.R. Pattipati, and P.S. Yu, Scheduling parallelizable
tasks: Putting it all on the shelf, Proc. of ACM SIGMETRICS '92, 225-236,
June 1992.
[114] J. Turek, J.L. Wolf, and P.S. Yu, Approximate algorithms for scheduling par-
allelizable tasks, Proc. of the 4th Annual Symposium on Parallel Algorithms
and Architectures, San Diego, 323-332, June 1992.
[115] J.A. Ventura and C.M. Klein, A note on multi-item inventory systems with
limited capacity, Operations Research Letters, Vol. 7 (1988) 71-75.
[116] J.L. Wolf, D.M. Dias, and P.S. Yu, A parallel sort merge join algorithm for
managing data skew, IEEE Trans. on Parallel and Distributed Systems, Vol. 4
No.1 (1993) 70-86.

[117] J.L. Wolf, B.R. Iyer, K.R. Pattipati, and P.S. Yu, Optimal buffer partitioning
for the nested block join algorithm, Proc. of 7th International Conference on
Data Engineering, 1991, 510-519.
[118] J.L. Wolf, P.S. Yu, and H. Shachnai, DASD dancing: A disk load balancing
optimization scheme for video-on-demand computer systems, Proc. of ACM
Sigmetrics Conference, Ottawa, Canada, 1995.
[119] D.D. Yao, Optimal run quantities for an assembly system with random yields,
IIE Transactions, Vol. 20 No.4 (1988) 399-403.
[120] G. Yu and P. Kouvelis, On min-max optimization of a collection of classical
discrete optimization problems, in Minimax and Applications, D.-Z. Du and
P.M. Pardalos (eds.), (1995) 157-171.
[121] U. Yüceer, Marginal allocation algorithm for nonseparable functions, unpub-
lished manuscript 1995.
[122] Tak-S. Yum, Mon-S. Chen, and Yiu-W. Leung, Bandwidth allocation for
multimedia teleconferences, Proc. of ICC'91, 1991, 852-858.
[123] P.H. Zipkin, Simple ranking methods for allocation of one resource, Manage-
ment Science, Vol. 26 (1980) 34-43.

HANDBOOK OF COMBINATORIAL OPTIMIZATION


D.-Z. Du and P.M. Pardalos (Eds.) pp. 261-329
©1998 Kluwer Academic Publishers

Combinatorial Optimization in Clustering


Boris Mirkin
Center for Discrete Mathematics & Theoretical Computer Science
(DIMACS)
Rutgers University, P.O. Box 1179, Piscataway, NJ, 08855
and Central Economics-Mathematics Institute (CEMI), Moscow, Russia.
E-Mail: mirkin@dimacs.rutgers.edu

Ilya Muchnik
RUTCDR and DIMACS
Rutgers University, P.O. Box 1179, Piscataway, NJ, 08855
E-Mail: muchnik@rutcor.rutgers.edu

Contents

1 Introduction                                                  262
2 Types of Data                                                 265
3 Cluster Structures                                            274
4 Clustering Criteria                                           275
5 Single Cluster Clustering                                     276
  5.1 Clustering Approaches                                     276
      5.1.1 Definition-based Clusters                           276
      5.1.2 Direct Algorithms                                   278
      5.1.3 Optimal Clusters                                    280
  5.2 Single and Monotone Linkage Clusters                      281
      5.2.1 MST and Single Linkage Clustering                   281
      5.2.2 Monotone Linkage Clusters                           283
      5.2.3 Modeling Skeletons in Digital Image Processing      285
      5.2.4 Linkage-based Convex Criteria                       287
  5.3 Moving Center and Approximation Clusters                  289
      5.3.1 Criteria for Moving Center Methods                  289
      5.3.2 Principal Cluster                                   289
      5.3.3 Additive Cluster                                    292
      5.3.4 Seriation with Returns                              294
6 Partitioning                                                  295
  6.1 Partitioning Column-Conditional Data                      295
      6.1.1 Partitioning Concepts                               295
      6.1.2 Cohesive Clustering Criteria                        296
      6.1.3 Extreme Type Typology Criterion                     297
      6.1.4 Correlate/Consensus Partition                       297
      6.1.5 Approximation Criteria                              298
      6.1.6 Properties of the Criteria                          299
      6.1.7 Local Search Algorithms                             300
  6.2 Criteria for Similarity Data                              303
      6.2.1 Uniform Partitioning                                303
      6.2.2 Additive Partition Clustering                       305
      6.2.3 Structured Partitioning                             306
      6.2.4 Graph Partitioning                                  308
  6.3 Overlapping Clusters                                      309
7 Hierarchical Structure Clustering                             310
  7.1 Approximating Binary Hierarchies                          310
  7.2 Indexed Hierarchies and Ultrametrics                      313
  7.3 Fitting in Tree Metrics                                   314
8 Clustering for Aggregable Data                                316
  8.1 Box Clustering                                            316
  8.2 Bipartitioning                                            319
  8.3 Aggregation of Flow Tables                                319
9 Conclusion                                                    321

References

1 Introduction
Clustering is a mathematical technique designed for revealing classification
structures in the data collected on real-world phenomena. A cluster is a
piece of data (usually, a subset of the objects considered, or a subset of
the variables, or both) consisting of the entities which are much "alike", in
terms of the data, versus the other part of the data. The term itself was
coined in psychology back in the thirties, when a heuristic technique was sug-
gested for clustering psychological variables based on pair-wise coefficients
of correlation. However, two more disciplines should also be credited for the
outburst of clustering that occurred in the sixties: numerical taxonomy in biology
and pattern recognition in machine learning. Among relevant sources are
Hartigan (1975), Jain and Dubes (1988), Mirkin (1996). Simultaneously,
industrial and computational applications gave rise to graph partitioning
problems which are touched below in 6.2.4.
Combinatorial optimization and graph theory are closely connected with
clustering issues through such combinatorial concepts as connected compo-
nent, clique, graph coloring, min-cut, and location problems having obvious
clustering flavor. A concept interweaving the two areas of research is the
minimum spanning tree (MST), which emerged initially in clustering (within a bi-
ologically oriented method called Wroclaw taxonomy; see a late reference in
Florek et al. (1951)) and has become a cornerstone in computer science.
In the follow-up review of combinatorial clustering, we employ the most
natural bases for systematization of the abundant material available: the
types of input data and of output cluster structures. This slightly differs from
the conventional taxonomy of clustering (hierarchic versus nonhierarchic,
overlapping versus nonoverlapping), in which a confusion between cluster-
ing structures and algorithms may occur. In section 2, five types of data
tables are considered according to the extent of admitted comparability among
the data entries: column-conditional, comparable, aggregable, Boolean, and
spatial ones. In section 3, five types of discrete cluster structures are defined:
subsets (single clusters), partitions, hierarchies, structured partitions and bi-
partite structures, as those the most of references deal with. A very short
section 4 describes what kind of clustering criteria is the present authors'
best choice, though some other criteria are also considered in the further
text. A major problem with clustering criteria is that usually they cannot
be substantiated in a clear-cut way (except for those emerging in specific engineering
problems): the criteria relate quite indirectly to the major goal of cluster-
ing, which is improving our understanding of the world. This is why a
great deal of clustering research is devoted to problems of substantiation
of clustering criteria with instance or Monte-Carlo studies or mathematical
investigation of their properties and interconnections.
Section 5 is devoted to problems of separating a single cluster from the
data (single cluster clustering). Two major ad hoc algorithms, greedy seri-
ation and moving center separation, are discussed in the contexts of corre-
sponding criteria and their properties. Two related kinds of criteria, mono-
tone linkage based set functions and data approximation, are discussed at
length in subsections 5.2 and 5.3, respectively. The seriation and moving
center methods appear to be local search algorithms for the criteria.
Partitioning problems are considered in section 6. In subsection 6.1, the
problems of partitioning for column-conditional data are discussed. The
authors try to narrow down the overwhelming number of clustering crite-
ria that have been or can be suggested. A number of different approaches
are unified via a set of equivalent (under certain conditions) criteria. Several
ad hoc clustering methods (agglomerative clustering, K-Means,
exchange, conceptual clustering) are discussed as those which appear to be
local search techniques for these criteria. From the user's point of view,
a major conclusion from this discussion is that the methods (along with
the parameters suggested), applied to a data set, will yield similar results.
The optimal partitioning problem in the coordinate-based framework seems
understudied and in need of more effort. In subsection 6.2, partitioning of
(dis)similarity (comparable) data matrices is covered. The topics of interest
are: uniform partitioning, additive partitioning, and graph partitioning dis-
cussed mostly in the context of data approximation. The last part is devoted
to the problem of structured partitioning (block modeling). In subsection
6.3, the approximation approach is applied to clustering problems with no
nonoverlapping restrictions imposed.
Hierarchies as clustering structures are discussed in section 7. In sub-
section 7.1, an approximation model is shown to lead to some known ad
hoc divisive clustering techniques. The other subsections deal with indexed
hierarchies (ultrametrics) and tree metrics, the subject of particular interest
in molecular evolution studies (Setubal and Meidanis (1997)).
Section 8 is devoted to three approximation clustering problems for
aggregable (co-occurence) data: box clustering (revealing associated row-
column sets), bipartitioning/ aggregation of rectangular matrices, and ag-
gregation of square interaction (flow) matrices. The aggregable data seem
of importance in a predictable future since they present information about
very large or massive data sets in a treatable format of counts or volumes.
The material is illustrated with examples which are printed with a smaller
font.
For additional coverage, see Brucker (1978), Arabie and Hubert (1992),
Crescenzi and Kann (1995), Arabie, Hubert and De Soete [Eds.] (1996),
Day (1996) and Mirkin (1996).

2 Types of Data
Mathematical formulations for clustering problems much depend on the ac-
cepted format of input data. Though in the real world more and more
data are of a continuous nature, such as images and signals, the computationally
treated cases usually involve discrete or digitized data. The discrete data
are usually arranged in a table format.
To get an intuition on that, let us consider the data set presented in
Table 1, which is just a 7 × 7 matrix, X = (x_ik), i ∈ I, k ∈ K. Three features
of the table are due to the authors' wish to use the same data set
for illustrating many problems. In general, the entries may be any reals, not
just zeros or ones. There may be no symmetry in the matrix entries, and
the number of rows may differ from the number of columns. Data in Tables
6 and 7 are instances of such more general data sets.

Table 1: An illustrative data set.

Columns  1  2  3  4  5  6  7
Rows
   1     1  1  1
   2     1  1  1     1
   3     1  1              1
   4                 1  1  1
   5        1     1     1  1
   6           1  1  1  1  1
   7              1  1  1  1

Depending on the extent of comparability among the data entries, it is


useful to distinguish among the following data types:
(A) Column-Conditional Data.
(B) Comparable Data.
(C) Aggregable Data.
(D) Boolean Data.
(E) Spatial Data.
The meaning of these follows.
(A) Column-Conditional Data
The columns are considered different noncomparable variables so that
their entries may be compared only within the columns. For instance, sup-
pose every row is a record of the values of some variables for an individual,
so that the first column of X relates to sex (0 - male, 1 - female) while the
second to the respondent's attitude toward a particular kind of cereal (1 -
liking, 0 - not liking).
In such a situation, a preliminary transformation is usually performed
to standardize the columns so that they could be thought of as comparable,
to some extent. Such a standardizing transformation usually is

    y_ik := (x_ik - a_k)/b_k                                         (2.1)

to shift the origin (to a_k) and change the scale (by factor b_k), where a_k is
a central or normal point in the range of the variable (column) k and b_k is
a measure of the variable's dispersion. When a hypothesis about a prob-
abilistic distribution as the variable's generating mechanism can be admitted
without much violation of the data's nature, the standardizing parameters
can be taken from distribution theory: the average for a_k and the standard
deviation for b_k when the distribution is Gaussian. When no reliable
and reproducible mechanism for the data generation can be assumed, the
choice of the parameters should be based on a different way of thinking, such
as the approximation considerations in Mirkin (1996). The least-squares ap-
proximation also leads to the average and standard deviation as the most
appropriate values. The standardized matrix Y = (y_ik) obtained with these
shift and scale parameters is presented in Table 2, which is not symmetric
anymore. However, other approximation criteria may lead to differently de-
fined a_k and b_k. For instance, the least-maximum-deviation criterion yields
b_k as the range and a_k as the mid-range of the variable k.

Table 2: Matrix Y obtained from X via least-squares standardization.

1 1.15 0.87 1.15 -0.87 -1.15 -1.58 -1.15


2 1.15 0.87 1.15 -0.87 0.87 -1.58 -1.15
3 1.15 0.87 -0.87 -0.87 -1.15 0.63 -1.15
4 -0.87 -1.15 -0.87 -0.87 0.87 0.63 0.87
5 -0.87 0.87 -0.87 1.15 -1.15 0.63 0.87
6 -0.87 -1.15 1.15 1.15 0.87 0.63 0.87
7 -0.87 -1.15 -0.87 1.15 0.87 0.63 0.87
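For illustration, a small numpy sketch of transformation (2.1); with the
mean/standard-deviation pair it reproduces Table 2 from Table 1, and the
least-maximum-deviation pair (mid-range and range) is included as the
alternative discussed above (function name and interface are ours):

    # Standardize the columns of a data matrix as in (2.1): y = (x - a) / b.
    import numpy as np

    def standardize(X, kind='least_squares'):
        X = np.asarray(X, dtype=float)
        if kind == 'least_squares':       # a_k = mean, b_k = standard deviation
            a, b = X.mean(axis=0), X.std(axis=0)
        else:                             # least-maximum-deviation criterion:
            lo, hi = X.min(axis=0), X.max(axis=0)
            a, b = (lo + hi) / 2.0, hi - lo   # a_k = mid-range, b_k = range
        return (X - a) / b
    # standardize(X) reproduces Table 2 from Table 1 (up to rounding).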

(B) Comparable Data


A data table X = (x_ik) is comparable if all the values x_ik (i ∈ I, k ∈ K;
sometimes I = K) across the table are comparable, which also means that
the user considers it meaningful to average any subset of the entries.
The original data in Table 1 can be considered comparable if they present,
say, an account of mutual liking among seven individuals. Also, compara-
ble data tables are frequently obtained from the column-conditional tables
as between-item distances, similarities or dissimilarities. Similarity differs
from dissimilarity by direction: increase in difference between two items
corresponds to a smaller similarity and larger dissimilarity value. A dissim-
ilarity table is called a distance if it satisfies the metric space axioms (for more
on dissimilarities see [76], [61]). A graph with weighted edges can be
considered as a nonnegative comparable |I| × |I| matrix (of the weights).
Tables 3 and 4 present a dissimilarity and a similarity matrix obtained from
Table 1 considered as a column-conditional table. Table 3 is a distance matrix. Its
(i, j)-th entry h_ij is the number of noncoinciding components in the row-
vectors, which is called the Hamming distance. Other preferred distances
are the Euclidean distance squared,

    d_ij² := Σ_{k∈K} |x_ik - x_jk|²,

and the city-block metric,

    d_ij := Σ_{k∈K} |x_ik - x_jk|.

Curiously, because of binary entries, these latter distances coincide, in this
particular case, with the Hamming distance.
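A small numpy check of this coincidence for 0/1 data (an illustrative
sketch; the function name is ours):

    # For 0/1 rows, |x - y| = |x - y|^2 entrywise, so Hamming, squared
    # Euclidean and city-block distances all coincide.
    import numpy as np

    def pairwise(X, power):
        X = np.asarray(X, dtype=float)
        # D[i, j] = sum_k |x_ik - x_jk| ** power
        return (np.abs(X[:, None, :] - X[None, :, :]) ** power).sum(axis=2)

    X = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
    assert np.array_equal(pairwise(X, 1), pairwise(X, 2))  # binary case only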

Table 3: Matrix H of Hamming distances between the rows of X.

Entity  1  2  3  4  5  6  7
1       0  1  2  6  5  6  7
2       1  0  3  5  6  5  6
3       2  3  0  4  3  6  5
4       6  5  4  0  3  2  1
5       5  6  3  3  0  3  2
6       6  5  6  2  3  0  1
7       7  6  5  1  2  1  0

Table 4 is the matrix A = YY^T of scalar products of the rows of matrix Y
in Table 2. It is a similarity matrix.

Table 4: Similarity matrix A = YY^T.

Entity 1 2 3 4 5 6 7
1 1.33 1.00 0.50 -0.75 -0.42 -0.67 -1.00
2 1.00 1.25 0.17 -0.50 -0.75 -0.42 -0.75
3 0.50 0.17 0.95 -0.30 0.03 -0.80 -0.55
4 -0.75 -0.50 -0.30 0.78 -0.05 0.28 0.53
5 -0.42 -0.75 0.03 -0.05 0.87 0.03 0.28
6 -0.67 -0.42 -0.80 0.28 0.03 0.95 0.62
7 -1.00 -0.75 -0.55 0.53 0.28 0.62 0.87

There exists an evident connection between the Euclidean distance
(squared) and the scalar product similarity measure derived from the same
entity-to-variable table:

    d_ij² = a_ii + a_jj - 2a_ij,                                     (2.2)

where a_ij := (y_i, y_j) := Σ_{k∈K} y_ik y_jk, which allows for converting the scalar
product similarity matrix A = YY^T into the distance matrix D = (d_ij)
rather easily. The reverse transformation, converting
the distances into the scalar products, can be defined when all columns in
Y are centered, which means that the sum of all the row-vectors is equal to
the zero vector, Σ_{i∈I} y_i = 0. In this case,

    a_ij = -(d_ij² - d_i+² - d_+j² + d_++²)/2,                       (2.3)

where d_i+², d_+j², and d_++² denote the within-row mean, within-column mean,
and the grand mean, respectively, of the array (d_ij²).
Frequently, the diagonal entries (i.e., the (dis)similarities of the entities
with themselves) are of no interest or just unmeasured; this does not much
affect the problems and algorithms; in the remainder, having the diagonal entries
present will be the default option.
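The two conversions (2.2) and (2.3) are easy to verify numerically; a numpy
sketch (ours), assuming the columns of Y are centered:

    # Convert scalar products A = Y Y^T to squared distances via (2.2),
    # and back via double centering (2.3); Y must have centered columns.
    import numpy as np

    def products_to_sq_distances(A):
        d = np.diag(A)
        return d[:, None] + d[None, :] - 2.0 * A          # (2.2)

    def sq_distances_to_products(D2):
        r = D2.mean(axis=1, keepdims=True)                # within-row means
        c = D2.mean(axis=0, keepdims=True)                # within-column means
        return -0.5 * (D2 - r - c + D2.mean())            # (2.3)

    Y = np.random.randn(7, 7)
    Y -= Y.mean(axis=0)                                   # center the columns
    A = Y @ Y.T
    assert np.allclose(sq_distances_to_products(products_to_sq_distances(A)), A)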
For standardizing a dissimilarity matrix, there is no need to change the
scale factor since all the entries are comparable across the table. On the
other hand, shifting the origin by subtracting a threshold value a, that is,
taking b_ij := a_ij - a as the shifted index, may allow a better manifestation
of the structure of the data. Fig. 1 illustrates how shifting affects the
shape of a similarity index a_ij whose values are the ordinates while the pairs
ij are put somehow on the abscissa: shifting by a_4 does not make much
difference since all the similarities remain positive; shifting by a_2, a_3, or a_1
makes many similarities negative, leaving just a couple of the higher-similarity
"islands" positive. We can see that an intermediate a = a_2 manifests all the
three humps in the picture, while increasing the shift to a_1 loses some (or all)
of the islands in the negative depth.

Figure 1: The effect of shifting the origin of a similarity measure.
Quite a clear clustering structure is seen in Table 4 (whose mean is
obviously zero): its positive entries correspond to almost all similarities
within two groups, one consisting of the entities 1, 2, and 3, and the other
of the rest.
(C) Aggregable Data
When the data entries measure or count the number of occurrences (as in
contingency tables) or volume of some matter (money, liquid, etc.) so that
all of them can be summed up to the total value, the data table is referred to
as the aggregable (summable) one. In such a table the row or/and column
items can be aggregated, according to their meaning, in such a way that the
corresponding rows and columns are just summed together.
Example. Considering Table 1 as a data set on patterns of phone calls made by
the row-individuals to the column-individuals, and aggregating the rows into V_1 =
{1, 2, 3}, V_2 = {4, 5}, V_3 = {6, 7}, and the columns into W_1 = {1, 3, 5} and W_2 =
{2, 4, 6, 7}, we get the aggregate phone call chart on the group level in Table 5.
□

Table 5: Table X aggregated.

        W_1  W_2
V_1      6    4
V_2      1    6
V_3      3    6
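Such aggregation is just a block summation of the matrix; a short numpy
sketch (ours) reproduces Table 5 from Table 1 and the groups of the example:

    # Aggregate a 0/1 (or count) matrix over row groups V and column
    # groups W by summing the corresponding rows and columns.
    import numpy as np

    def aggregate(X, row_groups, col_groups):
        X = np.asarray(X)
        return np.array([[X[np.ix_(V, W)].sum() for W in col_groups]
                         for V in row_groups])

    # Positions of the ones in Table 1 (0-based indices):
    X = np.zeros((7, 7), dtype=int)
    ones = {0: [0, 1, 2], 1: [0, 1, 2, 4], 2: [0, 1, 5], 3: [4, 5, 6],
            4: [1, 3, 5, 6], 5: [2, 3, 4, 5, 6], 6: [3, 4, 5, 6]}
    for i, cols in ones.items():
        X[i, cols] = 1
    V = [[0, 1, 2], [3, 4], [5, 6]]       # V_1, V_2, V_3
    W = [[0, 2, 4], [1, 3, 5, 6]]         # W_1, W_2
    print(aggregate(X, V, W))             # reproduces Table 5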

Example. A somewhat more realistic data set is presented in Table 6, reporting


results of a psychophysical experiment on confusion between segmented numerals
(Keren and Baggen (1981)).

Table 6: Confusion: Keren and Baggen (1981) data on confusion of the segmented
numeral digits 0 to 9.

Response
Stimulus 1 2 3 4 5 6 7 8 9 0
1 877 7 7 22 4 15 60 0 4 4
2 14 782 47 4 36 47 14 29 7 18
3 29 29 681 7 18 0 40 29 152 15
4 149 22 4 732 4 11 30 7 41 0
5 14 26 43 14 669 79 7 7 126 14
6 25 14 7 11 97 633 4 155 11 43
7 269 4 21 21 7 0 667 0 4 7
8 11 28 28 18 18 70 11 577 67 172
9 25 29 111 46 82 11 21 82 550 43
0 18 4 7 11 7 18 25 71 21 818

□
Example. Yet another, rectangular, contingency data matrix is in Table 7 (from
L. Guttman, 1971, as presented in Mirkin (1996)), which cross-tabulates 1554 Israeli
adults according to their living places as well as, in some cases, those of their
fathers (column items) and "principal worries" (row items). There are 5 column
items considered: EUAM - living in Europe or America, IFEA - living in Israel,
father living in Europe or America, ASAF - living in Asia or Africa, IFAA - living
in Israel, father living in Asia or Africa, IFI - living in Israel, father also living
in Israel. The principal worries are: POL, MIL, ECO - political, military and
economical situation, respectively; ENR - enlisted relative, SAB - sabotage, MTO
- more than one worry, PER - personal economics, OTH - other worries.

Table 7: Worries: The original data on cross-classification of 1554 individuals by


their worries and origin places.

EUAM IFEA ASAF IFAA IFI


POL 118 28 32 6 7
MIL 218 28 97 12 14
ECO 11 2 4 1 1
ENR 104 22 61 8 5
SAB 117 24 70 9 7
MTO 42 6 20 2 0
PER 48 16 104 14 9
OTH 128 52 81 14 12

□
This kind of data traditionally has not been distinguished from the others,
which leads us to discuss it in more detail. Let us consider an aggregable data
table P = (p_ij) (i ∈ I, j ∈ J) where Σ_{i∈I} Σ_{j∈J} p_ij = 1, which means that
all the entries have been divided by the total p_++ = Σ p_ij. Since the matrix
is non-negative, this allows us to treat the p_ij as frequencies or probabilities of
the simultaneous occurrence of row i ∈ I and column j ∈ J (though no proba-
bilistic estimation problems will be considered in this chapter). Note that
the rows and columns of such a table are usually some categories.
The only transformation we suggest for the aggregable data is

    q_ij = p_ij/(p_i+ p_+j) - 1 = (p_ij - p_i+ p_+j)/(p_i+ p_+j),    (2.4)
where p_i+ = Σ_{j∈J} p_ij and p_+j = Σ_{i∈I} p_ij are the so-called marginals, equal
to the totals in the corresponding rows and columns.
When the interpretation of p_ij as co-occurrence frequencies is maintained,
q_ij means the relative change of probability (RCP) of i when column j be-
comes known: RCP(i/j) = (p(i/j) - p(i))/p(i). Here, p(i/j) := p_ij/p_+j, p(i) =
p_i+, and p(j) = p_+j. Symmetrically, it can also be interpreted as RCP(j/i).
The ratio p_ij/(p_i+ p_+j) is frequently referred to as the odds ratio. In the gen-
eral setting, p_ij may be considered as an amount of flow, or transaction, from
i ∈ I to j ∈ J. In this case, p_++ = Σ_{i,j} p_ij is the total flow, p(j/i), de-
fined as p(j/i) = p_ij/p_i+, is the share of j in the total transactions of i, and
p(j) = p_+j/p_++ is the share of j in the overall transactions. This means
that the ratio p(j/i)/p(j) = p_ij p_++/(p_i+ p_+j) compares the share of j in i's
transactions with the share of j in the overall transactions. Then,

    q_ij = p(j/i)/p(j) - 1

shows the deviation of the transaction p_ij from the "general" behavior:
q_ij = 0 means that there is no difference between p(j/i) and p(j); q_ij > 0 means
that i favors j in its transactions, while q_ij < 0 shows that the level of
transactions from i to j is less than it is "in general"; the value q_ij expresses
the extent of the difference and can be called the flow index. The equation q_ij = 0
is equivalent to p_ij = p_i+ p_+j, which means that row i and column j are
statistically independent (under the probabilistic interpretation). In the data
analysis context, q_ij = 0 means that knowledge of j adds nothing to our
ability to predict i, or, in the flow terms, that there is no difference
between the pattern of transactions from i to j and the general pattern of
transactions to j.
The smaller p_i+ and/or p_+j, the larger q_ij grows. For instance, when
p_i+ and p_+j are some 10^{-6}, q_ij may jump to a million while the other
entries stay around unity. This shows that the transformation (2.4), along
with the analyses based on it, should not be applied when the marginal
probabilities are too different.
Example. The table Q = (q_ij) for the Worries data set is in Table 8.

Table 8: Values of the relative changes of probability (RCP), multiplied by 1000,
for the Worries data.

EUAM IFEA ASAF IFAA IFI


POL 222 280 -445 -260 36
MIL 168 -338 -129 -234 72
ECO 145 -81 -302 239 487
ENR 28 -40 11 -58 -294
SAB 19 -77 22 -66 -129
MTO 186 -252 -53 -327 -1000
PER -503 -269 804 726 331
OTH -118 582 -65 149 181

□

Taking into account the summability of the data (to unity), the distance
between the row (or column) entities should be defined by weighting the
columns (or rows) with their "masses" p_+j (or, respectively, p_i+), as, for
instance,

    X²(i, i') = Σ_{j∈J} p_+j (q_ij - q_i'j)².                        (2.5)

This is equal to the so-called chi-squared distance considered in the the-
ory of a major visualization technique, correspondence analysis (see, for
example, Benzecri (1973) and Lebart, Morineau and Piron (1995)), and de-
fined, in that theory, through the profiles of the conditional probability vectors
y_i = (p_ij/p_i+) and y_i' = (p_i'j/p_i'+):

    X²(i, i') := Σ_{j∈J} (y_ij - y_i'j)²/p_+j = Σ_{j∈J} (p_ij/p_i+ - p_i'j/p_i'+)²/p_+j.
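To make (2.4) and (2.5) concrete, here is a numpy sketch (ours) that
reproduces Table 8 from the Worries data of Table 7 and evaluates the
chi-squared distance between two rows:

    # Flow index / RCP transformation (2.4) and chi-squared distance (2.5).
    import numpy as np

    def rcp(counts):
        P = np.asarray(counts, dtype=float)
        P = P / P.sum()                    # normalize so that p_++ = 1
        pi = P.sum(axis=1, keepdims=True)  # row marginals p_i+
        pj = P.sum(axis=0, keepdims=True)  # column marginals p_+j
        return P / (pi * pj) - 1.0         # q_ij of (2.4)

    def chi2_dist(Q, pj, i, k):
        # X^2(i, i') = sum_j p_+j (q_ij - q_i'j)^2, formula (2.5)
        return float((pj * (Q[i] - Q[k]) ** 2).sum())

    worries = np.array([
        [118, 28,  32,  6,  7],    # POL
        [218, 28,  97, 12, 14],    # MIL
        [ 11,  2,   4,  1,  1],    # ECO
        [104, 22,  61,  8,  5],    # ENR
        [117, 24,  70,  9,  7],    # SAB
        [ 42,  6,  20,  2,  0],    # MTO
        [ 48, 16, 104, 14,  9],    # PER
        [128, 52,  81, 14, 12]])   # OTH
    Q = rcp(worries)
    print(np.round(1000 * Q))              # compare with Table 8
    pj = worries.sum(axis=0) / worries.sum()
    print(round(chi2_dist(Q, pj, 0, 1), 4))  # X^2 between POL and MIL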

(D) Boolean Data


Boolean (yes/no or one/zero) data are supposed to carry, basically, set-
theoretic information. For such a table X = (x_ij), any row i ∈ I is
associated with the set W_i of columns j for which x_ij = 1, while any column
j ∈ J is associated with the row set V_j consisting of those i for which x_ij = 1.
Supposedly there is no other information in the table beyond that. This is
usually presented in the graph-theoretic format to make all the graph-theory
constructions applicable. Considering Table 1 as a Boolean similarity table,
it corresponds to the graph presented in Fig. 2.

Figure 2: Graph corresponding to Table 1.

However, due to its binary nature, this kind of data can also be treated as any
other type considered above, especially as comparable or aggregable data.
In VLSI or parallel computing applications, the entities are nodes of a
two- or more-dimensional grid (mesh) which is frequently irregular. This
coordinate-based information can be translated into a sparse graph format
in the following way (see Miller, Teng, Thurston, and Vavasis (1993)). A
k-ply neighborhood system for a data matrix Y is defined as a set of closed
balls in R^n such that no point y_i, i ∈ I, is strictly interior to more than
k balls. An (α, k) overlap graph is a graph defined in terms of a constant
α ≥ 1 and a k-ply neighborhood system {B_1, ..., B_q}. There are q nodes,
each corresponding to a ball B_m. There is an edge (m, l) in the graph if
expanding the radius of the smaller of B_m and B_l by a factor α causes the
two balls to overlap.
(E) Spatial Data
These are tables reflecting the planar continuity of a two-dimensional
space, so that the rows and columns represent sequential strips of the plane,
and the entries correspond to observations in their intersection zones. A
typical example: any 2-D digitized image presented via a brightness value
at every pixel (cell) of a raster (grid). For instance, the data in Table 1
can be thought of as a 7 × 7 grid with the unities standing for darker
cells. Formally, the spatiality is reflected in the fact that both rows and
columns are totally ordered according to the grid, so that comparing two
cells should involve all the intermediates.
We do not consider here what are called multiway tables, related to more
than two index sets of the data tables (as, for instance, 3-D images or the
same table measurements made in different locations/times).

3 Cluster Structures
The following categories of combinatorial cluster structures to be revealed
in the data can be found in the literature:

1. Subsets (Single Clusters). A subset S ⊆ I is a simple classification
structure which can be represented in any of three major forms:
a) enumeration, S = {i_1, i_2, ..., i_m};
b) Boolean indicator vector, s = (s_i), i ∈ I, where s_i = 1 if i ∈ S and
s_i = 0 otherwise;
c) intensional predicate, P(i), defined for all i ∈ I, which is true if and
only if i ∈ S.
The latter format can be considered as belonging to the class of "con-
ceptual structures".
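For illustration, the three forms for a small subset in Python (a sketch;
the particular subset is arbitrary):

    # Three equivalent representations of the subset S = {2, 3, 5} of
    # I = {1, ..., 7}: enumeration, 0/1 indicator vector, and predicate.
    I = range(1, 8)
    S_enum = {2, 3, 5}                          # a) enumeration
    s = [1 if i in S_enum else 0 for i in I]    # b) Boolean indicator vector
    P = lambda i: i in (2, 3, 5)                # c) intensional predicate
    assert all(P(i) == (i in S_enum) == bool(s[i - 1]) for i in I)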

2. Partitions. A set of nonempty subsets S = {S_1, ..., S_m} is called a par-
tition if and only if every element i ∈ I belongs to one and only one of
these subsets, called classes; that is, S is a partition when ∪_{t=1}^m S_t = I
and S_t ∩ S_u = ∅ for t ≠ u.

3. Hierarchies. A hierarchy is a set S_W = {S_w : w ∈ W} of subsets S_w ⊆
I, w ∈ W (where W is an index set), called clusters and satisfying
the following conditions: 1) I ∈ S_W; 2) for any S_1, S_2 ∈ S_W, either
they are nonoverlapping (S_1 ∩ S_2 = ∅) or one of them includes the
other (S_1 ⊆ S_2 or S_2 ⊆ S_1), which can be expressed as S_1 ∩ S_2 ∈
{∅, S_1, S_2}. Throughout this chapter, yet one more condition will be
assumed: 3) for each i ∈ I, the corresponding singleton is a cluster,
{i} ∈ S_W. This latter condition guarantees that any non-terminal
cluster is the union of the singletons it contains. Such a hierarchy can
be represented graphically by a rooted tree: its nodes correspond to
the clusters (the root, to I itself), and its leaves (also called terminal
or pendant nodes), to the minimal clusters of the hierarchy, which
is reflected in the corresponding labeling of the leaves. Since this
picture very much resembles a genealogy tree, the immediate
subordinates of a cluster are called its children while the cluster itself
is referred to as their parent.

4. Structured Partition (Block Model). A structured partition is a par-
tition S = {S_t}, t ∈ T, on I, for which a supplementary relation
(graph) w ⊆ T × T is given to represent "close" association between
the corresponding subsets S_t, S_u when (t, u) ∈ w (so that (S, w) is a
"small" graph).

5. Bipartite Clustering Structures. This concept is defined when the data
index sets, I and J (or K), are considered as different ones. The fol-
lowing bipartite clustering structures involve single subsets, partitions,
and hierarchies to interconnect I and J: (1) box (V, W), V ⊆ I, W ⊆ J;
(2) bipartition, a pair of partitions (S, T), with S defined on I and T
on J, along with a correspondence between the classes of S and T; (3)
bihierarchy, a pair of interconnected hierarchies (S_F, T_H), S_F ⊂ 2^I,
T_H ⊂ 2^J.

4 Clustering Criteria
When a data set is given and a type of clustering structure has been chosen,
one needs a criterion to estimate the degree of fit between the structure and
the data. Initially, a lot of ad hoc criteria were suggested in clustering (see, for
instance, Brucker (1978) and Arabie, Hubert and De Soete [Eds.] (1996)).
Currently, the following way of thinking seems more productive.
To measure goodness-of-fit, the cluster structure sought, A, should be
employed to formally reconstruct the data matrix, X(A), as if it had been
produced by the structure A only, with no noise and no other influences
interfering. In this case, the relation between the original data matrix, X, and
the cluster structure, A, can be stated as the following equation:

X = X(A) + E    (4.1)

where E stands for the matrix of residuals, E = X − X(A), which should
be minimized with regard to the admissible cluster structures A.
Though equation (4.1) may be considered as appealing to a statistical
framework, no statistical model for the residuals has been developed so far in
such a setting. The operations research multigoal perspective (compromise
minimization of all residuals simultaneously) also seems foreign to clustering.
The only framework being widely developed is approximation clustering, in
which the clustering problems are considered as those of minimizing a norm
of E with regard to the admissible cluster structures. Three norms are
currently in use: L2, the sum of the squared residual entries; L1, the sum of
the absolute values of the residuals; and L∞, the maximum absolute value in E.
The problem of minimizing one of these criteria is referred to as the
least-squares, least-moduli, or least-deviation method, respectively.
In the remainder, we will describe clustering problems according to the
type of cluster structure to be revealed. When the data and cluster structure
types are chosen, a criterion of fit may be defined based on substantive or
heuristic considerations, which will also be discussed when appropriate.

5 Single Cluster Clustering


5.1 Clustering Approaches
There are three major approaches to determining a cluster, based on: (a)
definition, (b) a direct algorithm, and (c) an optimality criterion.

5.1.1 Definition-based Clusters


A cluster is thought of as a subset S ~ I consisting of very "similar" entities.
Its dual, an "anti-cluster", is to consist of mutually remote entities.

Let Bi be a subset of entities which are "similar" to i E I. Such a subset


can be defined as the set of adjacent vertices in a graph connecting entities
or as a "ball" of entities whose (dis)similarity to i is (not) greater than a
threshold.
A subset S ⊆ I can be referred to as a component cluster if, for any i ∈ S,
it contains B_i, and as a clique cluster if, for any i ∈ S, it is contained in B_i
and no larger set satisfies this property. The component and clique clusters
are components and cliques, respectively, in the graph (I, B) defined by the
adjacency subsets B_i. Anti-cluster concepts involve independent subsets in
graphs.
Some more cluster concepts in terms of dissimilarities:
(1) A clump cluster is a subset S such that, for every i, j ∈ S and
k, l ∈ I − S, d_ij < d_kl. Obviously, any clump is a clique and a connected
component in a threshold graph (I, B) defined by the condition (i, j) ∈ B iff
d_ij < π, where the threshold π is taken between max_{i,j∈S} d_ij and min_{k,l∈I−S} d_kl.
(2) A strong cluster is a subset S such that, for every i, j ∈ S and
k ∈ I − S, d_ij < d_ik, or, which is the same, d_ij < min(d_ik, d_jk). Any strong
cluster is simultaneously a clique and a connected component in the graph (I, B)
whose adjacency sets B_i are defined by the condition d_ij < π_i, where the threshold
π_i is taken between max_{j∈S} d_ij and min_{k∈I−S} d_ik. If two strong clusters
overlap, then one of them is a part of the other. They form a (strong)
hierarchy: S_1 ∩ S_2 ∈ {∅, S_1, S_2} for any strong clusters S_1, S_2; I may be
considered a strong cluster as well. Obviously, the clump clusters also enjoy
these properties.
(3) A weak cluster is defined by a weak form of the condition above:
d_ij < max(d_ik, d_jk) for all i, j ∈ S and k ∉ S. Weak clusters form a weak
hierarchy: S_1 ∩ S_2 ∩ S_3 ∈ {∅, S_1 ∩ S_2, S_2 ∩ S_3, S_3 ∩ S_1}, for any weak clusters
S_1, S_2, S_3 (Bandelt and Dress 1989).
(4) A π-cluster S ⊆ I is defined by the condition that d(S) ≤ π, where
d(S) = Σ_{i,j∈S} d_ij / |S||S| is the average dissimilarity within S.
(5) A strict cluster is an S ⊆ I such that, for any k ∈ I − S and l ∈ S,
d(l, S) ≤ 2d(S) < d(k, S), where d(i, S) is the average dissimilarity between i
and S.
All these concepts are trivially redefined in terms of similarities, except
for the strict clusters, whose defining condition becomes: for any k ∈ I − S
and l ∈ S, a(l, S) ≥ a(S)/2 > a(k, S). (Here a(S) and a(k, S) are the
average similarities.)
Finding clumps and component clusters involves finding cliques and com-
ponents in graphs; the other concepts are not as well developed.

5.1.2 Direct Algorithms


In clustering, it is not uncommon to use a cluster designing technique with
no explicit model behind it at all: such a technique itself can be considered
a model of the clustering process. Two such direct clustering techniques
are seriation and moving center, both imitating some processes in typology
making.
A seriation procedure involves a (dis)similarity linkage measure d(i, S)
evaluating the degree of dissimilarity between a subset S ⊂ I and an entity
i ∈ I − S.

Seriation
Initial setting: S = ∅ if d(i, ∅) is defined for all i ∈ I; otherwise,
S = {i_0}, i_0 ∈ I being an arbitrary entity.
General step: given S, find i* ∈ I − S minimizing the dissimilarity
linkage measure d(i, S) over all i ∈ I − S and join i* to S as the
last element seriated.
The general step is repeated until a stop-condition is satisfied. It is also
possible that the final cluster(s) is cut out of the ordering of the entire set I
resulting from the seriation process.
The seriation is, actually, a greedy procedure.
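To make the general step concrete, here is a minimal sketch of the seriation procedure in Python (our illustration, not part of the original text). It assumes a full symmetric dissimilarity matrix, uses the single linkage measure sl(i, S) = min_{j∈S} d_ij (defined among the examples below) as d(i, S), and applies no stop-condition, so the whole of I gets ordered; the sample matrix is made up.

```python
# A minimal sketch of the greedy seriation procedure (illustrative only).
# Assumptions: a full symmetric dissimilarity matrix d, single linkage
# sl(i, S) = min_{j in S} d[i][j] as the linkage measure, and seriation
# starting from an arbitrary entity i0 since sl(i, {}) is undefined.

def seriate(d, i0=0):
    """Order the entities of range(len(d)) greedily; return the order
    together with the linkage value at which each entity joined."""
    n = len(d)
    order, links = [i0], []
    rest = set(range(n)) - {i0}
    while rest:
        # general step: pick i* in I - S minimizing sl(i, S)
        i_star = min(rest, key=lambda i: min(d[i][j] for j in order))
        links.append(min(d[i_star][j] for j in order))
        order.append(i_star)
        rest.remove(i_star)
    return order, links

if __name__ == "__main__":
    d = [[0, 1, 2, 7, 8],
         [1, 0, 2, 7, 9],
         [2, 2, 0, 6, 8],
         [7, 7, 6, 0, 1],
         [8, 9, 8, 1, 0]]
    print(seriate(d))  # ([0, 1, 2, 3, 4], [1, 2, 6, 1])
```

Cutting the resulting order at its largest recorded linkage value (6 here) separates the well-connected starting set {0, 1, 2} from the rest, in line with the minimum split clusters of subsection 5.2.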
Examples of linkage functions:
A. For dissimilarity matrices, D = (d_ij):

1. Single linkage, or nearest neighbor:
sl(i, S) = min_{j∈S} d_ij;

2. Summary linkage:
su(i, S) = Σ_{j∈S} d_ij;

B. For similarity matrices, A = (a_ij):

3. Average linkage, or average neighbor:
al(i, S) = Σ_{j∈S} a_ij / |S|;

4. Threshold linkage:
l_π(i, S) = Σ_{j∈S} (a_ij − π) = Σ_{j∈S} a_ij − π|S|,

where π is a fixed threshold value.

C. For column-conditional matrices, Y = (y_ik):

5. Entity-to-center scalar product:
a(i, S) = (y_i, c(S)),

where y_i = (y_ik), k ∈ K, is the i-th row vector, c(S) is the gravity cen-
ter of S, c(S) = Σ_{j∈S} y_j / |S|, and (x, y) stands for the scalar product
of vectors x and y.

This measure obviously coincides with the average linkage (3) when
the similarity matrix is defined as a_ij = (y_i, y_j).

6. Holistic linkage:
hl(i, S) = Σ_{k∈K} min_{j∈S} |y_ik − y_jk|.

D. For spatial data arrays:

7. Window linkage:
dw(i, S) = d(i, S ∩ W_i),

where W_i is a window around i in the data array (usually, the window
W_i is an m × m square of cells put on the data grid so that i is in the
window center).

A natural stopping rule in the seriation process can be used when the
threshold linkage is employed: i* is not added to S if l_π(i*, S) < 0. It is not
difficult to prove that the result is a π-cluster.
Another single cluster clustering technique separates a cluster from the
"main body" of the entities.

Moving Center
Initial setting: a tentative center of the cluster to form, c, and a
constant center of the main body (a "reference point"), a. Usually
the origin is considered shifted so that a = 0.
The general procedure iterates the following two steps:
1) (Updating the cluster) define the cluster as the ball of radius
R(a, i) around c: S = {i : d(i, c) ≤ R(a, i)}. The two most common
definitions: (a) constant radius, R(a, i) = r, where r is a constant;
(b) distance from the reference point, R(a, i) = d(a, i).
2) (Updating the center) define c = c(S), a centroid of S.
The process stops when the newly found cluster S coincides with
that found at the previous iteration.
The algorithm may or may not converge (see subsection 5.3 below).
Curiously, in the reference-point-based version of the algorithm, the clus-
ter size depends on the distance between c and a: the less the distance, the
less the cluster radius. This feature can be useful, for example, when a mov-
ing robotic device classifies the elements of its environment: the greater the
distance, the greater the clusters, since differentiation among the nearest
objects matters more for the robot's moving and acting.
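The two updating steps translate into a short sketch as follows; this is an illustration under assumed conventions (Euclidean data, gravity-center centroids, the reference point a at the origin, and the distance-from-reference-point radius rule), not the authors' code.

```python
# Sketch of the moving center method with the reference-point radius rule
# R(a, i) = d(a, i); the data, the centroid concept (gravity center) and
# the Euclidean distance are assumptions made for the illustration.
import math

def dist(x, y):
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def moving_center(points, c, a=None, max_iter=100):
    a = a if a is not None else [0.0] * len(points[0])  # reference point
    S, prev = set(), None
    for _ in range(max_iter):
        # step 1: the cluster is the ball of radius d(a, i) around c
        S = {i for i, y in enumerate(points) if dist(y, c) <= dist(a, y)}
        if S == prev or not S:
            break
        prev = S
        # step 2: move the center to the gravity center c(S)
        dim = len(points[0])
        c = [sum(points[i][k] for i in S) / len(S) for k in range(dim)]
    return S, c

if __name__ == "__main__":
    pts = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (-4.0, 0.0), (0.5, 0.2)]
    print(moving_center(pts, c=[5.5, 5.5]))  # cluster {0, 1, 2}
```

With this radius rule the cluster shrinks as c approaches the reference point, which is exactly the behavior noted above.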

5.1.3 Optimal Clusters


Clustering criteria may come from particular problems in engineering or
another applied area, or from clustering considerations proper.
The two best known "engineering" clustering problems are those of knap-
sack and location.
The knapsack problem is that of finding a subset S maximizing its weight,
Σ_{i∈S} w_i, while keeping other parameter(s) constrained.
This problem is NP-complete, though it admits a polynomial time approx-
imation (see Garey and Johnson (1979) and Crescenzi and Kann (1995)).
The location problem is to find a subset S minimizing the cost

l(S) = Σ_{i∈S} l_i + Σ_j min_{i∈S} c_ij,

where i is a location, l_i its cost, and c_ij the cost of transporting the product
from warehouse i to customer j.
When C = (c_ij), i, j ∈ I, is non-negative, the function l is super-
modular; that is, it satisfies the inequality

l(S_1 ∪ S_2) + l(S_1 ∩ S_2) ≥ l(S_1) + l(S_2)

for any S_1, S_2 ⊆ I. This is also a hard problem (see Hsu and Nemhauser
(1979) and Gondran and Minoux (1984), p. 461).
Among the problems formalizing single cluster clustering goals as such,
the two most popular are those of maximum clique and maximum density
subgraph.
The problem of finding a maximum size clique in a graph belongs to
the core of combinatorial optimization. It is NP-complete, but admits some
approximations (see the latest news in Johnson and Trick [Eds.] (1996) and
Crescenzi and Kann (1995)).
The maximum density subgraph problem is to find a subset S maximiz-
ing the ratio

g(S) = Σ_{i,j∈S} a_ij / |S|

of the total weight of the edges within S to the number of vertices in S. (Here
A = (a_ij) is the edge weight matrix.) The problem can be reduced to
solving, a restricted number of times, the problem of maximizing a linearized
version of g, G(S, λ) = Σ_{i,j∈S} a_ij − λ|S|. The function G(S, λ) is supermodular,
so that the problem can be solved in polynomial time (see Gallo,
Grigoriadis, and Tarjan (1989), where a max-flow technique is exploited for
the problem).
The maximum density function obviously has something to do with the
average linkage function: g(S) = Σ_{i∈S} al(i, S). This illustrates that many
single cluster clustering criteria can be obtained by integrating linkage
(dis)similarity functions over their argument subsets: D_d(S) =
INT_{i∈S} d(i, S), where INT can be any operation with reals, such as summation,
averaging, or taking the minimum.
In the remainder of this section, two topics related to the direct clustering
techniques will be treated in more detail: (a) single linkage clustering and
its extensions (subsection 5.2), and (b) models underlying the moving center
algorithm (subsection 5.3).

5.2 Single and Monotone Linkage Clusters


5.2.1 MST and Single Linkage Clustering
Let D = (d_ij) be a symmetric N × N matrix of dissimilarities d_ij between
elements i, j ∈ I.
The concept of a minimum spanning tree (MST) is one of the best known
in combinatorial optimization. A spanning tree T = (I, V), with I the set
of its vertices, is said to be an MST if its length, d(T) = Σ_{(i,j)∈V} d_ij, is
minimum over all possible spanning trees on I. Two approaches to
finding an MST are to be mentioned here. In one of them, Kruskal's
algorithm, an MST is produced by starting with the empty edge set, V = ∅,
and greedily adding edges (i, j) to V in the order of increasing
d_ij (maintaining V with no cycles). The other approach, Prim's algorithm,
works differently: it processes vertices, one at a time, starting with S = ∅
and updating S at each step by adding to S an element i ∈ I − S minimizing
its single linkage distance to S, sl(i, S) = min_{j∈S} d_ij.
The former approach has been generalized into a theory of greedy opti-
mization of linear set functions based on matroid theory (Welsh (1976)):
the set of all edge subsets with no cycles is a matroid, and greedily adding
edge after edge (Kruskal's algorithm) produces an MST.
As to the latter approach, its important properties can be stated as
follows (Delattre and Hansen (1980)). Let us define the so-called minimum
split set function, L(S) = min_{j∈I−S} min_{i∈S} d_ij, as a measure of dissimilarity
between S and I − S. Let us refer to an S ⊂ I as a minimum split cluster
if S is a maximizer of L(S) over the set P⁻(I) of all non-empty proper
subsets S ⊂ I. All inclusion-minimal minimum split clusters can be found by
cutting any MST at all its maximum weight edges. Obviously, all unions of
these minimal clusters are also minimum split clusters.
There have been no attempts made to generalize this approach until
recently. Note that the seriation algorithm above is a natural extension
of the Prim algorithm. Based on the single linkage dissimilarity function,
sl(i, S) := min_{j∈S} d_ij, the seriation algorithm defines a single linkage series,
s = (i_1, i_2, ..., i_N), by the condition that for every k = 1, ..., N − 1, the
element i_{k+1} is a minimizer of sl(i, S_k) with regard to i ∈ I − S_k. Here
S_k := {i_1, ..., i_k} is a starting set of the series s (k = 1, 2, ..., N − 1). The
Prim algorithm finds a minimum linkage series s and an MST associated
with it: its edges connect, for k = 1, ..., N − 1, the vertex i_{k+1} with just one
of the vertices j ∈ S_k that have d_{i_{k+1} j} = sl(i_{k+1}, S_k). The minimum split
clusters are what can be called single linkage clusters, that is, starting sets
S_k of single linkage series s, which are maximally separated from the other
elements along the series. More explicitly, a single linkage cluster is an S_k in
a single linkage series s = (i_1, i_2, ..., i_N) such that sl(i_{k+1}, S_k) is maximum
over all k = 1, ..., N − 1 (which is a "greedy" definition).
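The following sketch (again an illustration with made-up data, not the original authors' code) builds a single linkage series Prim-wise and cuts it at the maximum split value, returning one inclusion-minimal minimum split cluster as a starting set of the series.

```python
# Sketch: a Prim-style single linkage series and the minimum split
# cluster obtained by cutting it at the maximum split value.
# Assumptions: d is a full symmetric dissimilarity matrix; the series
# starts at entity 0; ties are broken arbitrarily.

def single_linkage_series(d, start=0):
    n = len(d)
    series, splits = [start], []
    sl = [d[i][start] for i in range(n)]   # sl(i, S) = min_{j in S} d_ij
    in_S = [False] * n
    in_S[start] = True
    for _ in range(n - 1):
        i_star = min((i for i in range(n) if not in_S[i]),
                     key=lambda i: sl[i])
        splits.append(sl[i_star])          # sl(i_{k+1}, S_k)
        series.append(i_star)
        in_S[i_star] = True
        for i in range(n):                 # maintain single linkage values
            sl[i] = min(sl[i], d[i][i_star])
    k = max(range(n - 1), key=lambda k: splits[k])
    return series, splits, set(series[:k + 1])  # a single linkage cluster

if __name__ == "__main__":
    d = [[0, 2, 3, 9, 9],
         [2, 0, 1, 8, 9],
         [2, 1, 0, 9, 8],
         [9, 8, 9, 0, 2],
         [9, 9, 8, 2, 0]]
    print(single_linkage_series(d))  # cluster {0, 1, 2}, split value 8
```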
Example. The only MST for the matrix in table 3 of Hamming distances between
the rows of the Boolean matrix in table 1 is presented in Fig. 3.

Figure 3: Minimum spanning tree for the distance data in table 3.

The similarity matrix in table 4 implies the same tree (with different edge weights) as the maximum
spanning tree. There is only one maximum length edge, (3,5); by cutting it out, we
obtain two inclusion-minimal minimum split clusters, {1, 2, 3} and {4, 5, 6, 7}. □
In the next subsection, this construction will be extended to the problem
of greedy optimization of yet another class of set functions, quasi-concave
rather than linear ones, as described in Kempner, Mirkin and Muchnik (1997).

5.2.2 Monotone Linkage Clusters


A version of the greedy seriation algorithm finds minimum split clusters
for a class of minimum split functions defined with the so-called monotone
linkage functions. Let us refer to a linkage function d(i, S), S ∈ P⁻(I),
i ∈ I − S, as a monotone linkage if d(i, S) ≥ d(i, T) whenever S ⊆ T (for
all i ∈ I − T). Given a linkage function d, a set function M_d, called the
minimum split function for d, is defined by

M_d(S) := min_{i∈I−S} d(i, S).    (5.1)

It appears that the set of minimum split functions of the monotone
linkages coincides with the set of ∩-concave set functions (Kempner, Mirkin
and Muchnik (1997)). A set function F : P⁻(I) → R will be referred to as
∩-concave if

F(S_1 ∩ S_2) ≥ min(F(S_1), F(S_2))    (5.2)

for any overlapping S_1, S_2 ∈ P⁻(I).
Any ∩-concave set function, F, defines a monotone linkage, d_F, by

d_F(i, S) := max_{S⊆T⊆I−i} F(T)    (5.3)

for any S ∈ P⁻(I) and i ∈ I − S.

These functions are interconnected so that for any ∩-concave F : P⁻(I) →
R, M_{d_F} = F. The other way leads to a weaker property: for any monotone
linkage d, d_{M_d} ≤ d.

Given a monotone linkage function d(i, S), a series (i_1, ..., i_N) is referred
to as a d-series if d(i_{k+1}, S_k) = min_{i∈I−S_k} d(i, S_k) = M_d(S_k) for any starting
set S_k = {i_1, ..., i_k}, k = 1, ..., N − 1. This definition describes the seriation
algorithm as a greedy procedure for constructing a d-series starting with
i_1 ∈ I: having defined S_k, take any i minimizing d(i, S_k) over all i ∈ I − S_k
as i_{k+1}, k = 1, ..., N − 1. A subset S ∈ P⁻(I) will be referred to as a d-cluster
if there exists a d-series, s = (i_1, ..., i_N), such that S is a maximizer of M_d(S_k)
over all starting sets S_k of s. Greedily found, d-clusters play an important
part in maximizing the associated ∩-concave set function, F = M_d. If, for
a d-series s = (i_1, i_2, ..., i_N), a subset S ⊂ I contains i_1, and i_{k+1} is the first
element in s not contained in S (for some k = 1, ..., N − 1), then

F(S) ≤ F(S_k),

where S_k = {i_1, ..., i_k}. In particular, if S is an inclusion-minimal maximizer
of F (with regard to P⁻(I)), then S = S_k; that is, S is a d-cluster (Kempner,
Mirkin, Muchnik, 1997).
All the minimal maximizers of an ∩-concave set function F = M_d on
P⁻(I) for a monotone linkage d can be found by using the following three-
step extended greedy procedure:

Extended Greedy Procedure
(A) For each i ∈ I, define a d-series p_i greedily, starting from i as its
first element.
(B) For each d-series p_i = (i_1 := i, i_2, ..., i_N), take T_i equal to its
smallest starting set with F(T_i) = max_k d(i_{k+1}, S_k).
(C) Among the non-coinciding minimal d-clusters T_i, i ∈ I, choose
those maximizing F.

Moreover, every non-minimal maximizer of F is a union of the minimal
maximizers. The reverse statement, in general, is not true: some unions
of minimal maximizers can be non-maximizers. Also, the minimal clusters,
though nonoverlapping, may cover only a part of I.
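A compact sketch of the extended greedy procedure follows; it is our illustration under the definitions above, with the monotone linkage supplied as a Python function and F = M_d evaluated only through the linkage values along the greedy d-series.

```python
# Sketch of the extended greedy procedure for F = M_d, with the monotone
# linkage d(i, S) given as a function. Only the linkage values along each
# greedy d-series are needed, so F is never evaluated explicitly.

def d_series(linkage, items, first):
    series, rest, values = [first], set(items) - {first}, []
    while rest:
        i = min(rest, key=lambda j: linkage(j, series))
        values.append(linkage(i, series))  # d(i_{k+1}, S_k) = M_d(S_k)
        series.append(i)
        rest.remove(i)
    return series, values

def minimal_maximizers(linkage, items):
    clusters, best = {}, None
    for i in items:                        # step (A): a d-series per start
        series, values = d_series(linkage, items, i)
        k = max(range(len(values)), key=lambda k: values[k])
        clusters[frozenset(series[:k + 1])] = values[k]   # step (B)
        best = values[k] if best is None else max(best, values[k])
    # step (C): keep the clusters reaching the overall maximum of F
    return [set(T) for T, v in clusters.items() if v == best]

if __name__ == "__main__":
    # holistic linkage hl(i, S) on a tiny 1-D data table (made up)
    y = [0.0, 0.2, 0.3, 5.0, 5.1]
    hl = lambda i, S: min(abs(y[i] - y[j]) for j in S)
    print(minimal_maximizers(hl, range(len(y))))  # [{0, 1, 2}, {3, 4}]
```

On this toy data the two well-separated groups come out as the two minimal maximizers, illustrating that minimal clusters are nonoverlapping but need not be unique.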
Example. Let us apply the extended greedy procedure to the holistic linkage func-
tion, hl(i, S) = Σ_{k∈K} min_{j∈S} |y_ik − y_jk|, and table 1 considered as matrix X. Two
of the hl-series produced are 1(1)2(2)3(3)4(1)5(0)6(0)7 and 7(1)6(1)4(2)5(2)3(0)2(0)1
(as started from rows 1 and 7, respectively). The values hl(i_{k+1}, S_k) are put in the
parentheses. We can see that the hl-cluster S = {1, 2, 3} is the only maximizer of
the minimum split function M_hl(S); no maximizer starts with 7 (nor with 4, 5,
or 6). □
The problem of maximizing ∩-concave set functions is exponentially hard
when they are oracle-defined: every set indicator function F_A(S) (A ⊂ I),
which is equal to 0 when S ≠ A and 1 when S = A, is obviously ∩-concave
(a note by V. Levit). However, ∩-concave set functions can be maximized
greedily when they are defined in terms of monotone linkage. Thus,
the monotone linkage format may well serve as an easy-to-interpret and
easy-to-maximize input for dealing with ∩-concave set functions.


Figure 4: An illustration of the definition of the monotone linkage function
l(i, S): I is the grey rectangle; S is the subset of black cells; i ∈ I − S
is the crossed cell at the center of the 5 × 5 window shown by the dashed
borderline.

5.2.3 Modeling Skeletons in Digital Image Processing


In image analysis, there is a problem of skeletonization (or thinning) of
planar patterns, that is, extracting a stick-like representation of a pattern
to serve as a pattern descriptor for further processing (for a review, see
Arcelli and Sanniti di Baja (1996)).
The linkage-based concave functions suggest the set of inclusion-
minimal maximizers as a skeleton model. This can be illustrated with the
spatial data patterns in Figures 4 to 6.
In Fig. 4, the set of cells in the grey rectangle is I while the black cells
constitute S. A cell i ∈ I − S is at the center of a 5 × 5 window (shown by
the dashed border line; its center cell, i, is marked by the cross); its linkage
to S, l(i, S), is defined as the number of grey cells in the window, which
obviously decreases as S grows. It should be added to the definition
that l(i, S) is set equal to 25 when the window does not overlap
S, that is, when no black cell is present in the window around i.

In the example shown in Fig. 4, l(i, S) = 11. This can be decreased by


moving i. The minimum value, l(i, S) = 6, is reached when the crossed cell
is moved to the left border (within its row).

Figure 5: Two single-cell sets, A and B, illustrating different minimum
values of l(i, S) (over i ∈ I − S).

Obviously, set S must be a single cell to get the minimum of l(i, S) (over
all i ∈ I − S) maximally increased, as presented in Fig. 5.
The windows A and B represent the minimum values of l(i, S) for each
of the two subsets. The minimum value is 8 (for A) and 14 (for B), which
makes B by far better than A with regard to the aim of maximizing M_l(S).
Obviously, S = B is a minimal maximizer of M_l(S). The set of minimal
maximizers, for the given I, is the black skeleton strip presented in Fig. 6.

Figure 6: The strip of minimal maximizers of M_l(S) being a skeleton of the
grey I.

We leave it to the reader to analyze the changes of the set of minimal


maximizers with respect to changes of the window size.
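For concreteness, the window linkage just described can be coded as follows (a sketch on a toy grid; the 5 × 5 window and the grid contents are our assumptions).

```python
# Sketch of the window linkage l(i, S) on a binary grid (illustrative).
# grid[r][c] == 1 marks a cell of I; the cells listed in S are black,
# the remaining cells of I are grey.

def window(cell, half=2):              # a (2*half+1) x (2*half+1) window
    r0, c0 = cell
    return [(r, c) for r in range(r0 - half, r0 + half + 1)
                   for c in range(c0 - half, c0 + half + 1)]

def l(cell, S, grid):
    win = window(cell)
    if not any(c in S for c in win):
        return 25                      # the window does not overlap S
    inside = lambda r, c: 0 <= r < len(grid) and 0 <= c < len(grid[0])
    return sum(1 for (r, c) in win
               if inside(r, c) and grid[r][c] == 1 and (r, c) not in S)

if __name__ == "__main__":
    grid = [[1] * 8 for _ in range(5)]     # a grey 5 x 8 rectangle I
    S = {(2, 3), (2, 4)}                   # two black cells
    print(l((2, 1), S, grid))              # 19 grey cells in the window
    print(l((2, 7), S, grid))              # 25: the window misses S
```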

5.2.4 Linkage-based Convex Criteria


A ∪-concave set function F(S) can be introduced via a monotone-increasing
linkage function, that is, a p(i, S) with i ∈ S such that p(i, S) ≤ p(i, S ∪ T)
for any T ⊆ I. Let us define D_p(S) = min_{i∈S} p(i, S), the diameter of S.
Obviously, D_p(S) = M_{d_p}(I − S), where d_p(i, S), i ∈ I − S, is the monotone-
decreasing linkage function defined as d_p(i, S) = p(i, I − S). This implies that
the diameter functions are those (and only those) satisfying the condition
of ∪-concavity,

F(S_1 ∪ S_2) ≥ min(F(S_1), F(S_2)).    (5.4)

Inequality (5.4) shows that the structure of the maximizers, S*, of a ∪-
concave function is dual to the structure of the maximizers, I − S*, of the
corresponding ∩-concave function. In particular, the set of maximizers of
a ∪-concave function is closed with regard to the union of subsets, and there
exists only one inclusion-maximum maximizer for every ∪-concave function.
This implies that an ∩-concave function may have several inclusion-minimal
maximizers if and only if I itself is the only maximizer of the corresponding
∪-concave function.
Finding the maximum maximizer of a ∪-concave function D_p can be
done greedily with a version of the seriation algorithm involving p(i, S).
Let us consider a p-series, s = (i_1, ..., i_N), where every i_k is a minimizer of
p(i, I − S_{k−1}) with regard to i ∈ I − S_{k−1} (k = 1, ..., N; S_0 is defined as
the empty set). The maximum maximizer of D_p(S) is that I − S_{k−1} which gives
the maximum of p(i_k, I − S_{k−1}) (k = 1, ..., N). In this version, computation
starts with I and goes on by extracting entities from the set one by one.
Two more kinds of set functions can be introduced dually, by switching
between the operations of minimum and maximum (or just substituting the
linkage functions d and p by MM − d and MM − p, where MM is a constant).
This way we obtain the classes of what can be called ∩- and ∪-convex functions,
defined by the conditions

F(S_1 ∩ S_2) ≤ max(F(S_1), F(S_2)) and F(S_1 ∪ S_2) ≤ max(F(S_1), F(S_2)),

respectively. These functions are to be minimized. All the theory above
remains applicable (up to obvious changes). An example of application of
the convex functions to feature selection in regression problems has been
provided by Muchnik and Kamensky (1993).

The monotone linkage functions were introduced, in the framework of
clustering, by Mullat (1976), who considered ∪-convex functions G(S) :=
max_{i∈S} d(i, S) as greedily minimizable and called them "monotone systems".
Constrained optimization clustering problems with this kind of functions
were considered in Muchnik and Schwarzer (1989, 1990). This theory still
needs to be polished. We will limit ourselves to an example based on
table 1 considered as the adjacency matrix of the graph in Fig. 2.
Example. Let us consider two monotone-increasing linkage functions,

p(i, S) = Σ_{j∈S} x_ij (Σ_{k∈S−j} x_jk)²

and

π(i, S) = Σ_{j∈S} x_ij,

and define a constrained version of the diameter function,

D_pπ3(S) = min_{i∈S & π(i,S)≤3} p(i, S).

This function still satisfies the condition of ∪-concavity (5.4) and, moreover,
can be maximized with the seriation algorithm above, starting with S = I and
removing entities one by one.
First step: Put S = I and find the set Π(S) = {i : i ∈ S & π(i, S) ≤ 3}, which is,
obviously, Π(I) = {1, 3, 4}. Find p(1, S) = 4 + 9 + 9 = 22, p(3, S) = 4 + 9 + 16 = 29,
and p(4, S) = 16 + 16 + 9 = 41. Thus, D_pπ3(I) = 22, and entity 1 is the first to be
extracted, so that, at the next step, S := I − {1}.
Second step: Determine Π(S) = {2, 3, 4}. Find p(2, S) = 4 + 4 + 16 = 24,
p(3, S) = 4 + 16 = 20, and p(4, S) = 16 + 16 + 9 = 41. Thus, D_pπ3(I − {1}) = 20,
and S becomes S := I − {1, 3}.
Third step: Π(S) = {2, 4}. Find p(2, S) = 1 + 16 = 17 and p(4, S) = 16 + 9 + 9 =
34. Thus, D_pπ3(I − {1, 3}) = 17, and S becomes S := I − {1, 2, 3}.
Fourth step: Π(S) = {4, 5}. We have p(4, S) = p(5, S) = 9 + 9 + 9 = 27, which
makes D_pπ3(I − {1, 2, 3}) = 27, and S can be reduced by extracting either 4 or 5.
Let, for instance, S := I − {1, 2, 3, 4}.
Fifth step: Π(S) = {5, 6, 7} = S. We have p(5, S) = 4 + 4 = 8 and p(6, S) =
p(7, S) = 4 + 4 + 4 = 12. Thus, D_pπ3(I − {1, 2, 3, 4}) = 8, and next S := {6, 7},
which further reduces D_pπ3(S).
Thus, the maximum of D_pπ3(S) is D_pπ3(I − {1, 2, 3}) = 27; the optimal S is the four-
element set S* = {4, 5, 6, 7}.
Curiously, this result is rather stable with regard to data changes. For instance,
all the loops (diagonal entries) can be removed or added with no change in the
optimum cluster. □

The constructions described involve only ordering information in both
the domain and the range of the set linkage functions and, also, they rely on the
fact that every subset is uniquely decomposable into its elements. There-
fore, they can be extended to distributive lattice structures with the set of
irreducible elements as I (see, for instance, Libkin, Muchnik and Schwarzer
(1989)).

5.3 Moving Center and Approximation Clusters


5.3.1 Criteria for Moving Center Methods
Let us say that a centroid concept, c(S), corresponds to a dissimilarity mea-
sure, d, if c(S) minimizes Σ_{i∈S} d(i, c). For example, the gravity center (aver-
age point) corresponds to the squared Euclidean distance d²(y_i, c), since the
minimum of Σ_{i∈S} d²(y_i, c) is reached when c = Σ_{i∈S} y_i / |S|. Analogously,
the median vector corresponds to the city-block distance.
For a subset S ⊂ I and a centroid vector c, let us define

D(c, S) = Σ_{i∈S} d(i, c) + Σ_{i∈I−S} d(i, a)    (5.5)

to be minimized over both kinds of variables (one related to c, the other
to S). Here a is a reference point.
The alternating minimization of (5.5) consists of two steps reiterated:
(1) given S, determine its centroid, c, by minimizing Σ_{i∈S} d(i, c); (2) given
c, determine S = S(c) by minimizing D(c, S) over all S. It appears that, when
the centroid concept corresponds to the dissimilarity measure, the moving
center method is equivalent to the alternating minimization algorithm. In
the case when the radius, r, is constant, the reference point a in (5.5) is set
as a particular distinct point ∞ added to I with all the distances d(i, ∞)
equal to the radius r.
This guarantees convergence of the method to a locally optimal solution
in a finite number of steps.

5.3.2 Principal Cluster


Criterion (5.5) can be motivated by the following approximation clustering
model. Let Y = (y_ik), i ∈ I, k ∈ K, be an entity-to-variable data matrix.
A cluster can be represented by its standard point c = (c_k), k ∈ K, and a
Boolean indicator function s = (s_i), i ∈ I (both of them may be unknown).

Let us define a bilinear model connecting the data and the cluster with each
other:

y_ik = c_k s_i + e_ik,    (5.6)

where the e_ik are residuals whose values show how well or ill the cluster structure
fits the data. The equations (5.6), when the e_ik are small, mean that the
rows of Y are of two different types: a row i resembles c when s_i = 1, and
it has all its entries small when s_i = 0.
Consider the problem of fitting the model (5.6) with the least-squares
criterion:

L²(c, s) = Σ_{i∈I} Σ_{k∈K} (y_ik − c_k s_i)²    (5.7)

to be minimized with regard to Boolean s_i and/or arbitrary c_k.


A minimizing cluster structure is referred to as a principal cluster because
of the analogy between this type of clustering and the principal component
analysis: a solution to the problem (5.7) with no Booleanity restriction
applied gives the principal component score vector S and factor loadings c
corresponding to the maximum singular value of matrix Y (see, for data
analysis terminology, Jain and Dubes (1988), Lebart, Morineau, and Piron
(1995), Mirkin (1996)). It can be easily seen that criterion (5.7) is equivalent
to criterion (5.5) with d being the Euclidean distance squared, S = {i :
Si = I}, and a = 0, so that the moving center method entirely fits into
the principal cluster analysis model (when it is modified by adding a given
constant vector a to the right part). Obviously, changing L2(e, s) in (5.7)
for other Minkowski criteria leads to (5.5) with corresponding Minkowski
distances.
On the other hand, presenting (5.7) in matrix form, L² = Tr[(Y −
sc^T)^T (Y − sc^T)], and putting there the optimal c = Y^T s / s^T s (for s fixed),
we have

L² = Tr(Y^T Y) − s^T Y Y^T s / s^T s,

leading to the decomposition of the square scatter of the data, (Y, Y) = Tr(Y^T Y)
= Σ_{i,k} y_ik², into the "explained" term, s^T Y Y^T s / s^T s, and the "unexplained"
one, L² = Tr(E^T E) = (E, E), where E = (e_ik):

(Y, Y) = s^T Y Y^T s / s^T s + (E, E).    (5.8)

The matrix A = Y Y^T is an N × N entity-to-entity similarity matrix having its
entries equal to the row-to-row scalar products a_ij = (y_i, y_j). Let us denote
the average similarity within a subset S ⊆ I by a(S) = Σ_{i,j∈S} a_ij / |S||S|.

Then (5.8) implies that the principal cluster is a Boolean maximizer of the
set function

g(S) = s^T Y Y^T s / s^T s = (1/|S|) Σ_{i,j∈S} a_ij = |S| a(S),    (5.9)

which extends the concept of the subgraph density function onto arbitrary Gram
matrices A = Y Y^T.

Figure 7: A graph with a four-element clique which is not the first choice in
the corresponding eigenvector.

As is well known, maximizing the criterion s^T Y Y^T s / s^T s with no constraints
on s yields the optimal s to be the eigenvector of A corresponding to its
maximum eigenvalue (Janich (1994)). This may suggest that there
must be a correspondence between the components of the globally optimal
solution (the eigenvector) and the solution to the restricted problem when s
is required to be Boolean. However, even if such a correspondence exists, it is far from
straightforward. For example, there is no correspondence between the
largest components of the eigenvector and the non-zero components of the
optimal Boolean s: the first eigenvector for the 20-vertex graph in Fig. 7
has its maximum value corresponding to vertex 5 which, obviously, does not
belong to the maximum density subgraph, the clique {1, 2, 3, 4}.
Let us consider a local search algorithm for maximizing g(S) starting
with S = ∅ and adding entities from I one by one. This means that the
algorithm exploits the neighborhood N(S) := {S + i : i ∈ I − S} and is a
seriation algorithm based on the increment δg(i, S) := g(S + i) − g(S), which
is

δg(i, S) = (a_ii + 2|S| al(i, S) − g(S)) / (|S| + 1),    (5.10)

where al(i, S) is the average linkage function defined above as al(i, S) :=
Σ_{j∈S} a_ij / |S|.
A dynamic version of the seriation algorithm with this linkage function
is as follows.

Local Search for g(S)

At every iteration, the values Al(i, S) = a_ii + 2|S| al(i, S) (i ∈ I − S)
are calculated and their maximum, Al(i*, S), is found. If Al(i*, S) >
g(S), then i* is added to S; if not, the process stops and S is the
resulting cluster.
To start a new iteration, all the values are recalculated:
al(i, S) ⇐ (|S| al(i, S) + a_{i i*}) / (|S| + 1),
g(S) ⇐ (|S| g(S) + Al(i*, S)) / (|S| + 1),
|S| ⇐ |S| + 1.
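The iteration translates directly into code; the sketch below (our illustration, with a made-up similarity matrix) maintains |S| al(i, S) = Σ_{j∈S} a_ij incrementally, exactly as in the recalculation rules above.

```python
# Sketch of the local search (seriation) algorithm for g(S) = |S| a(S).
# A is a symmetric similarity matrix (made up below); sums[i] holds
# |S| * al(i, S) = sum_{j in S} a_ij and is updated incrementally.

def principal_cluster(A, start):
    n = len(A)
    S = {start}
    g = A[start][start]                    # g({start})
    sums = [A[i][start] for i in range(n)]
    while True:
        rest = [i for i in range(n) if i not in S]
        if not rest:
            break
        i_star = max(rest, key=lambda i: A[i][i] + 2 * sums[i])
        Al = A[i_star][i_star] + 2 * sums[i_star]
        if Al <= g:
            break                          # no improving addition left
        g = (len(S) * g + Al) / (len(S) + 1)
        S.add(i_star)
        for i in range(n):
            sums[i] += A[i][i_star]
    return S, g

if __name__ == "__main__":
    A = [[1, 2, 2, -1],
         [2, 1, 2, -1],
         [2, 2, 1, -1],
         [-1, -1, -1, 1]]
    print(principal_cluster(A, start=0))   # ({0, 1, 2}, 5.0)
```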
Since criteria (5.9) and (5.5) (with a = 0) are equivalent, the results of
these two seemingly different techniques, the seriation and moving centers
procedures, usually will be similar.
Example. The algorithm applied to the data in table 2 produces a two-entity
principal cluster, S = {1, 2}, whose contribution to the data scatter is 32.7%.
Reiterating the process on the set of yet unclustered entities, we obtain a partition
whose classes are given in table 9. The clusters explain some 85% of the data scatter.

Table 9: The sequential principal clusters for 7 entities by the column-conditional
data in table 2.

Cluster Entities Contribution, %


1 1,2 32.7
2 3 13.6
3 4,6,7 26.0
4 5 12.4

5.3.3 Additive Cluster


Let A = (a_ij), i, j ∈ I, be a given similarity or association matrix and
λss^T = (λ s_i s_j) a weighted set indicator matrix, which means that s = (s_i) is the
indicator of a subset S ⊆ I taken along with its intensity weight λ. The following
model is applicable when A can be considered as noisy information on λss^T:

a_ij = λ s_i s_j + e_ij,    (5.11)

where the e_ij are the residuals to be minimized. Usually, matrix A must be
centered (thus having zero as its grand mean) to make the model look fair.
The least-squares criterion for fitting the model,

L²(λ, s) = Σ_{i,j∈I} (a_ij − λ s_i s_j)²,    (5.12)

is to be minimized with regard to the unknown Boolean s = (s_i) and/or real λ
(in some problems, λ may be predefined). When λ is not subject to change,
the criterion can be presented as

L²(λ, s) = Σ_{i,j∈I} a_ij² − 2λ Σ_{i,j∈I} (a_ij − λ/2) s_i s_j,

which implies that, for λ > 0 (which is assumed for the sake of simplicity),
the problem in (5.12) is equivalent to maximizing

L(λ/2, s) = Σ_{i,j∈S} (a_ij − λ/2) = Σ_{i,j∈S} a_ij − (λ/2)|S|²,    (5.13)

which is just the summary threshold linkage criterion, L(π, S) = Σ_{i∈S} l_π(i, S),
with π = λ/2. This implies that the seriation techniques based on l_π(i, S)
can be applied for locally maximizing (5.13).
In general, the task of optimizing criterion (5.13) is NP-complete.
Let us now turn to the case when λ is not predefined and may be
adjusted based on the least-squares criterion. There are two optimizing
options available here.
The first option is based on the representation of the criterion as a func-
tion of two variables, s and λ, which allows using the alternating optimization
technique.

Alternating Optimization for Additive Clustering


Each iteration includes: first, finding a (locally) optimal S for
L(π, S) with π = λ/2; second, determining the optimal λ = λ(S),
for fixed S, by the formula below. The process ends when no change
of the cluster occurs.

The other option is based on another form of the criterion. For any given
S, the optimal λ can be determined (by setting the derivative of L²(λ, s) with
respect to λ equal to zero) as the average of the similarities within S:

λ(S) = a(S) = Σ_{i,j∈I} a_ij s_i s_j / Σ_{i,j∈I} s_i s_j.

The value of L² in (5.12), with λ = λ(S) substituted, becomes:

L²(λ, s) = Σ_{i,j} a_ij² − (Σ_{i,j} a_ij s_i s_j)² / Σ_{i,j} s_i s_j.    (5.14)

Since the first item on the right-hand side is constant (just the square scatter of
the similarity coefficients), minimizing L² is equivalent to maximizing the
second item, which is the average linkage criterion squared, g²(S), where g(S)
is defined in (5.9). Thus, the other option is just maximizing this criterion
with the local search techniques described.
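As a small numeric illustration (ours, not from the text), the optimal intensity λ(S) = a(S) and the "explained" part g²(S) of decomposition (5.14) can be computed directly for a given S:

```python
# Sketch: the optimal intensity lambda(S) = a(S) and the "explained"
# part g(S)^2 of decomposition (5.14) for a given subset S.

def additive_cluster_fit(A, S):
    S = list(S)
    total = sum(A[i][j] for i in S for j in S)   # sum over i, j in S
    lam = total / (len(S) ** 2)                  # lambda(S) = a(S)
    g = total / len(S)                           # g(S) = |S| a(S)
    scatter = sum(a * a for row in A for a in row)
    return lam, g * g, scatter                   # L2 = scatter - g^2

if __name__ == "__main__":
    A = [[0, 3, 3, -2],
         [3, 0, 3, -2],
         [3, 3, 0, -2],
         [-2, -2, -2, 0]]
    print(additive_cluster_fit(A, {0, 1, 2}))    # (2.0, 36.0, 78)
```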
Some more comments:
(1) the criterion is the additive contribution of the cluster to the square
scatter of the similarity data, which can be employed to judge how important
the cluster is (in its relation to the data);
(2) since the function g(S) here is squared, the optimal solution may
correspond to the situation when g(S) is negative, as well as L(π, S) and
a(S). Since the similarity matrix A normally is centered, this means that
such a subset consists of the most disassociated entities and should be called
an anti-cluster. However, using local search algorithms allows us to obtain
whichever sign of a(S) we wish, either positive or negative: the initial extremal
similarity just has to be selected from only positive or only negative values;
(3) in a local search procedure, the change of the squared criterion when
an entity is added/removed may behave slightly differently from that of the
original g(S) (an account of this is given in Mirkin (1990));
(4) when A = Y Y^T, where Y is a column-conditional matrix, the additive
cluster criterion is just the principal cluster criterion squared, which implies
that the optimizing clusters must be the same in this case.

5.3.4 Seriation with Returns


Considered as a clustering algorithm, the seriation procedure has a draw-
back: every particular entity, once caught in the sequence, can never
be relocated, even when it has low similarities to the later added elements.
After the optimization criteria have been introduced, such a drawback can
be easily overcome. To allow exclusion of elements at any step of the
seriation process, the algorithm is modified by extending its neighborhood
system.
system.
Let, for any S ⊂ I, its neighborhood N(S) consist of all the subsets
differing from S by an entity i ∈ I being added to or removed from S. The
local search techniques can be formulated for any criterion as based on this
modification. In particular, criterion g(S) has its increment in the new N(S)
equal to

δg(i, S) = (a_ii + 2 z_i |S| al(i, S) − z_i g(S)) / (|S| + z_i),    (5.15)

where z_i = 1 if i has been added to S or z_i = −1 if i has been removed
from S. Thus, the only difference between this formula and that in (5.10)
is the change of sign in some terms. This allows for a modified algorithm.

Local Search with Return for g(S)

At every iteration, the values Al(i, S) = a_ii + 2 z_i |S| al(i, S) (i ∈ I) are
calculated and their maximum, Al(i*, S), is found. If Al(i*, S) >
z_{i*} g(S), then i* is added to or removed from S by changing the
sign of z_{i*}; if not, the process stops and S is the resulting cluster.
To start the next iteration, all the values are updated:
al(i, S) ⇐ (|S| al(i, S) + z_{i*} a_{i i*}) / (|S| + z_{i*}),
g(S) ⇐ (|S| g(S) + z_{i*} Al(i*, S)) / (|S| + z_{i*}),
|S| ⇐ |S| + z_{i*}.
The cluster found with the modified local search algorithm is a strict
cluster since the stopping criterion involves the numerator of (5.15) and
implies the inequality z_i (al(i, S) − g(S)/(2|S|)) ≤ 0 for any i ∈ I.
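A sketch of the return-enabled local search (our illustration; the sample matrix is assumed centered, with zero diagonal) differs from the previous one only in scanning removals as well as additions:

```python
# Sketch of the local search with returns for g(S): at each iteration the
# best addition or removal is tried; z = +1 for entities outside S and
# z = -1 inside, as in (5.15).

def local_search_with_return(A, start):
    n = len(A)
    S = {start}
    g = A[start][start]
    sums = [A[i][start] for i in range(n)]   # sum_{j in S} a_ij
    while True:
        best, move = None, None
        for i in range(n):
            z = -1 if i in S else 1
            if z == -1 and len(S) == 1:
                continue                     # keep S non-empty
            Al = A[i][i] + 2 * z * sums[i]
            if Al > z * g and (best is None or Al > best):
                best, move = Al, (i, z)
        if move is None:
            return S, g
        i, z = move
        g = (len(S) * g + z * best) / (len(S) + z)
        (S.discard if z < 0 else S.add)(i)
        for j in range(n):
            sums[j] += z * A[j][i]

if __name__ == "__main__":
    A = [[0, 3, 3, -2],
         [3, 0, 3, -2],
         [3, 3, 0, -2],
         [-2, -2, -2, 0]]
    print(local_search_with_return(A, start=0))  # ({0, 1, 2}, 6.0)
```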

6 Partitioning
6.1 Partitioning Column-Conditional Data
6.1.1 Partitioning Concepts
There are several different approaches to partitioning:

A Cohesive Clustering: (a) within cluster similarities must be maximal;


(b) within cluster dissimilarities must be minimal; (c) within cluster
dispersions must be minimal.

B Extreme Type Typology: cluster prototypes (centroids) must be as far
from the grand mean as possible.

C Correlate (Consensus) Partition: the correlation (consensus) between the
clustering partition and given variables/categories must be maximal.

D Approximate Structure: the differences between the data matrix and the
cluster structure matrix must be minimal.

An infinite number of criteria can be suggested within each of these
loose requirements. Among them, there exists a list of criteria that
are equivalent to each other. This fact and numerous experimental results
are in favor of the criteria listed in the following four subsections.

6.1.2 Cohesive Clustering Criteria


Let S = {S_1, ..., S_m} be a partition of I to be found with regard to the data
given. A criterion of within-cluster similarity to maximize:

g(S) = Σ_{t=1}^m Σ_{i,j∈S_t} a_ij / |S_t| = Σ_{t=1}^m g(S_t),    (6.1)

where A = (a_ij) is a similarity matrix.
A criterion of within-cluster dissimilarity to minimize:

D(S) = Σ_{t=1}^m Σ_{i,j∈S_t} d_ij / |S_t|,    (6.2)

where D = (d_ij) is a dissimilarity matrix.
Two criteria of within-cluster dispersion to minimize:

D(c, S) = Σ_{t=1}^m Σ_{i∈S_t} d(c_t, y_i),    (6.3)

where d(c_t, y_i) is a dissimilarity measure between the row-point y_i, i ∈ I,
and the cluster centroid c_t; and

σ(S) = Σ_{t=1}^m p_t σ_t²,    (6.4)

where p_t = |S_t|/|I| is the proportion of the entities in S_t and σ_t² = Σ_k Σ_{i∈S_t} (y_ik −
c_tk)² / |S_t| is the total variance in S_t with regard to the within-cluster means,
c_tk = Σ_{i∈S_t} y_ik / |S_t|.
The versions of criteria (6.1), (6.2) and (6.4) with no cluster coefficients
p_t, |S_t| have also been considered.

6.1.3 Extreme Type Typology Criterion


A criterion to maximize is

T(c, S) = Σ_{v∈V} Σ_{t=1}^m c_tv² |S_t|,    (6.5)

where c_tv is the average of the category/variable v in S_t.

6.1.4 Correlate/Consensus Partition


A criterion to maximize is

C(S) = Σ_{k∈K} μ(S, k),    (6.6)

where μ(S, k) is a correlation or contingency coefficient (index of consensus)
between S and a variable k ∈ K. Important examples of such coefficients:
(i) Correlation ratio (squared) when k is quantitative,

η²(S, k) = (σ_k² − Σ_{t=1}^m p_t σ_tk²) / σ_k²,    (6.7)

where p_t is the proportion of entities in S_t, and σ_k² or σ_tk² is the variance of
variable k in the entire set I or within cluster S_t, respectively. The correlation ratio
η²(S, k) is between 0 and 1; η²(S, k) = 1 if and only if variable k is constant
in each cluster S_t.
(ii) Pearson X² (chi-square) when k is qualitative,

X²(S, k) = Σ_{v∈k} Σ_{t=1}^m (p_vt − p_v p_t)² / (p_v p_t) = Σ_{v∈k} Σ_{t=1}^m p_vt² / (p_v p_t) − 1,    (6.8)

where p_v, p_t, and p_vt are the frequencies of observing a category v (in variable
k), cluster S_t, or both. Actually, the original Pearson coefficient was intro-
duced as a measure of deviation of an observed bivariate distribution, p_vt, from
the hypothetical statistical independence: it is very well known in statis-
tics that when the deviation is due to sampling only, the distribution of N X²
is asymptotically chi-square. However, in clustering this index may have
different meanings (Mirkin (1996)).
(iii) Reduction of the proportional prediction error when k is qualitative,

Δ(S/k) = Σ_{v∈k} Σ_{t=1}^m (p_vt − p_v p_t)² / p_t = Σ_{v∈k} Σ_{t=1}^m p_vt² / p_t − Σ_{v∈k} p_v².    (6.9)

The proportional prediction error is defined as the probability of error in the
so-called proportional prediction rule applied to randomly coming entities,
when any v is predicted with frequency p_v. The average error of propor-
tional prediction is equal to Σ_v p_v(1 − p_v) = 1 − Σ_v p_v², which is also called
the Gini coefficient. Δ(S/k) is the reduction of this error when the proportional
prediction of v is made under the condition that t is known.
The coefficient C(S) with μ(S, k) = Δ(S/k) is proven in Mirkin (1996) to
be equivalent to yet another criterion to maximize which is frequently used
in conceptual clustering, the so-called Category Utility Function, applied
only when all the variables are nominal (see, for instance, Fisher (1987)).
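For illustration, the three coefficients can be computed from a partition and a variable as follows (a sketch with made-up data, implementing formulas (6.7)-(6.9) directly).

```python
# Sketch: correlation ratio, Pearson chi-square and the reduction of the
# proportional prediction error for a partition S and a variable k.

def correlation_ratio(partition, values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    within = 0.0
    for cls in partition:
        m = sum(values[i] for i in cls) / len(cls)
        within += sum((values[i] - m) ** 2 for i in cls) / n
    return (var - within) / var            # eta^2(S, k), eq. (6.7)

def chi_squared(partition, labels):
    n, cats = len(labels), set(labels)
    x2 = -1.0
    for cls in partition:
        pt = len(cls) / n
        for v in cats:
            pv = sum(1 for l in labels if l == v) / n
            pvt = sum(1 for i in cls if labels[i] == v) / n
            x2 += pvt ** 2 / (pv * pt)     # eq. (6.8)
    return x2

def delta(partition, labels):
    n, cats = len(labels), set(labels)
    red = -sum((sum(1 for l in labels if l == v) / n) ** 2 for v in cats)
    for cls in partition:
        pt = len(cls) / n
        for v in cats:
            pvt = sum(1 for i in cls if labels[i] == v) / n
            red += pvt ** 2 / pt           # eq. (6.9)
    return red

if __name__ == "__main__":
    S = [{0, 1, 2}, {3, 4, 5}]
    print(correlation_ratio(S, [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]))
    print(chi_squared(S, ['a', 'a', 'b', 'b', 'b', 'b']))
    print(delta(S, ['a', 'a', 'b', 'b', 'b', 'b']))
```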

6.1.5 Approximation Criteria


Let the data be represented as a data matrix Y = (y_iv), i ∈ I, v ∈ V,
where the rows y_i = (y_iv), v ∈ V, correspond to the entities i ∈ I, the columns
v to quantitative variables or qualitative categories, and the entries y_iv
are quantitative values associated with the corresponding variables/categories
v ∈ V. A category v is represented in the original data table by a binary
column vector with ones assigned to the entities satisfying v and zeros to
the others. The sizes of these sets will be denoted, as usual, |I| = N and
|V| = n.
To present a partition S = {S_1, ..., S_m} as a matrix of the same size,
let us assign centroids c_t = (c_tv) to the clusters S_t presented by the corresponding
binary indicators s_t = (s_it), where s_it = 1 if i ∈ S_t and s_it = 0 if i ∉ S_t. Then
the matrix Σ_t s_t c_t^T represents the cluster structure, so that comparison of
the structure and the data can be done via the equations

y_iv = Σ_{t=1}^m c_tv s_it + e_iv,    (6.10)

where the e_iv are residuals to be minimized by the cluster structure using, for
instance, the least-squares criterion:

L²(S, c) = Σ_{i∈I} Σ_{v∈V} (y_iv − Σ_{t=1}^m c_tv s_it)².    (6.11)

Since S is a partition, a simpler formula holds for the criterion:

D²(S, c) = Σ_{t=1}^m Σ_{v∈V} Σ_{i∈S_t} (y_iv − c_tv)².    (6.12)

6.1.6 Properties of the Criteria


Let us assume that a standardizing transformation (2.1) has been applied to
each variable/category v ∈ V, with a_v the grand mean and b_v the standard
deviation if v is a quantitative variable. When v is a binary category, a_v
is still the grand mean, equal to the frequency of v, p_v. For b_v, one of the
following two options is suggested: b_v = 1 (first option) or b_v = √p_v (second
option). Then the following statement holds.
The following criteria are equivalent to each other:
(6.1) with a_ij = (y_i, y_j), that is, A = Y Y^T;
(6.2) with d_ij being the squared Euclidean distance;
(6.3) with d(y_i, c_t) being the squared Euclidean distance;
(6.4);
(6.5);
(6.6) where μ(S, k) = η²(S, k) if k is quantitative and μ(S, k) = Δ(k/S)
if k is qualitative and the first standardizing option has been applied, or
μ(S, k) = X²(S, k) if k is qualitative and the second standardizing option
has been applied;
(6.11); and
(6.12).
The proof can be found in Mirkin (1996); it is based on the following
decomposition of the data scatter in model (6.10) when the c_tv are least-squares
optimal (thus being the within-cluster means):

Σ_{i∈I} Σ_{v∈V} y_iv² = Σ_{v∈V} Σ_{t=1}^m c_tv² |S_t| + Σ_{i∈I} Σ_{v∈V} e_iv².    (6.13)

Those of the criteria to be maximized correspond to the "partition-explained"
part of the data scatter, and those to be minimized correspond to the "un-
explained" residuals.
Some properties of the criteria:
(1) The larger m, the better the criterion value. This implies that either m or a
criterion value (as the proportion of the "explained" data scatter) should be
predefined as a stopping criterion.
(2) When the c_t are given, the optimal clusters S_t satisfy the so-called min-
imal distance rule: for every i ∈ S_t, d(y_i, c_t) ≤ d(y_i, c_q) for all q ≠ t. This
means that the optimal clusters are within nonoverlapping balls (spheres),
which are convex bodies. This drastically reduces the potential number of candi-
date partitions in enumeration algorithms. In the case of m = 2, the optimal
clusters must be linearly separated. By shifting the separating hyperplane
toward one of the clusters until it touches an entity of I, we get a number of
the entity points belonging to the shifted hyperplane. Since the total number
of points defining the normal vector is |V|, the total number of separating
hyperplanes is not larger than C(N, 1) + C(N, 2) + ... + C(N, |V|) ≤ N^|V|.
This guarantees a "polynomial"-time solution to the problem by just enu-
merating the separating hyperplanes. Regretfully, there is nothing known
on the problem beyond that. Probably an m-cluster optimal partition can
be found by enumerating not more than N^(|V|m/2) separating hyperplanes.

(3) Criterion (6.1) is an extension of the maximum density subgraph
problem; this time the total of the within-cluster densities must be maximized.

(4) Different expressions fit differently into different neighborhoods for
local search. For instance, formula (6.12) fits into the alternating minimization
strategy (given c, adjust S; given S, adjust c). Formula (6.6) is preferable
in conceptual clustering when partitioning is done by consecutively dividing
I by the variables.

6.1. 7 Local Search Algorithms

Among the many clustering heuristics suggested, those seem to have the better
chances for survival that: (a) are local search algorithms for convenient cri-
teria, and (b) can be interpreted as models of a human classification-making
process. We present here several partition neighborhood systems applicable
to any criteria.

Agglomerative Clustering. This procedure models establishing a bi-
ological taxonomy via the similarity of species. The neighborhood of a partition
S = {S_1, ..., S_m} here is N(S) = {S^{tu} : t, u = 1, ..., m; t ≠ u}, where S^{tu} is
obtained from S by merging its classes S_t and S_u. For criterion (6.11), the
local search algorithm with this neighborhood system can be formulated as
follows (starting with the matrix of squared Euclidean distances):

Agglomerative Clustering

Step 1. Find the minimal value d_{i*j*} in the dissimilarity matrix
and merge i* and j*.
Step 2. Reduce the distance matrix, substituting one new row (and
column) i* ∪ j* for the rows and columns i*, j*, with its
dissimilarities defined according to

d(S_t, S_u) = (|S_t||S_u| / (|S_t| + |S_u|)) d²(c_t, c_u),    (6.14)

where c_t and c_u are the cluster gravity centers. If the number of
clusters is larger than 2, go to Step 1, else End.
The value d_{i, i*∪j*} is exactly the increment of criterion D² (6.12) when S is
changed for S^{tu} (Ward (1963)). The algorithm is known as the Ward clustering
algorithm.
Lance and Williams (1967) suggested a family of agglomerative clustering
algorithms by extending equation (6.14) to a linear function of the for-
mer distances d_{i,i*} and d_{i,j*}. Among the most popular in the Lance-Williams
family are the single linkage and complete linkage agglomerative clustering
algorithms, where d_{i,i*∪j*} = min(d_{i,i*}, d_{i,j*}) and d_{i,i*∪j*} = max(d_{i,i*}, d_{i,j*}),
respectively. The Ward algorithm also belongs to this family.
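A sketch of the agglomerative scheme with the single linkage, complete linkage and Ward update rules may be helpful (our illustration; for the Ward rule the initial distances are assumed to be squared Euclidean, and the Lance-Williams form of (6.14) is used).

```python
# Sketch of agglomerative clustering with three Lance-Williams update
# rules for the reduced distance d(w, t U u): single linkage, complete
# linkage, and Ward's rule (with initial squared Euclidean distances):
# ward: ((n_w+n_t) d_wt + (n_w+n_u) d_wu - n_w d_tu) / (n_w+n_t+n_u).

def agglomerate(dmat, rule, m_final=2):
    sizes = {i: 1 for i in range(len(dmat))}
    d = {(i, j): dmat[i][j] for i in sizes for j in sizes if i < j}
    merges = []
    while len(sizes) > m_final:
        t, u = min(d, key=d.get)           # the closest pair of clusters
        d_tu = d.pop((t, u))
        merges.append((t, u, d_tu))
        nt, nu = sizes.pop(t), sizes.pop(u)
        sizes[t] = nt + nu                 # the merged cluster keeps label t
        for w in list(sizes):
            if w == t:
                continue
            dwt = d.pop((min(w, t), max(w, t)))
            dwu = d.pop((min(w, u), max(w, u)))
            d[min(w, t), max(w, t)] = rule(dwt, dwu, d_tu, sizes[w], nt, nu)
    return merges, sorted(sizes)

single = lambda dwt, dwu, dtu, nw, nt, nu: min(dwt, dwu)
complete = lambda dwt, dwu, dtu, nw, nt, nu: max(dwt, dwu)
ward = lambda dwt, dwu, dtu, nw, nt, nu: (
    ((nw + nt) * dwt + (nw + nu) * dwu - nw * dtu) / (nw + nt + nu))

if __name__ == "__main__":
    pts = [0.0, 0.1, 0.3, 5.0, 5.2]
    dmat = [[(a - b) ** 2 for b in pts] for a in pts]
    print(agglomerate(dmat, ward))  # ends with clusters {0,1,2} and {3,4}
```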
Alternating Square-Error Clustering. This algorithm can be con-
sidered a model for typology making. It also corresponds to the most pop-
ular technique in clustering, the so-called K-Means (moving centers, nuée
dynamique) method for direct clustering.

Alternating Square-Error Clustering

Starting with a list of tentative centers c_t, the following two steps
are iterated until the partition is stabilized:
Step 1. Cluster membership. Having the c_t fixed, find the clusters S_t min-
imizing Σ_{t=1}^m Σ_{i∈S_t} d²(y_i, c_t) with the minimal distance rule.
Step 2. Standard points. Having the clusters S_t fixed, find the gravity
centers, c_t, of the S_t, t = 1, ..., m.
The method converges in a finite number of iterations since the number
of minimal distance rule partitions is finite and the criterion decreases at
each step. Other versions of the algorithm involve different dissimilarity
measures and centroid concepts.
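A minimal sketch of the alternating square-error (K-Means) iteration, with made-up 2-D data and squared Euclidean distances, follows.

```python
# Sketch of alternating square-error clustering (K-Means). Assumptions:
# squared Euclidean distances, gravity-center centroids, a fixed number
# of clusters given by the initial list of centers.

def k_means(points, centers, max_iter=100):
    dim = len(points[0])
    clusters = []
    for _ in range(max_iter):
        # Step 1: cluster membership by the minimal distance rule
        clusters = [[] for _ in centers]
        for p in points:
            t = min(range(len(centers)),
                    key=lambda t: sum((p[k] - centers[t][k]) ** 2
                                      for k in range(dim)))
            clusters[t].append(p)
        # Step 2: standard points (gravity centers of the clusters)
        new = [[sum(p[k] for p in cl) / len(cl) for k in range(dim)]
               if cl else centers[t] for t, cl in enumerate(clusters)]
        if new == centers:                 # partition stabilized
            break
        centers = new
    return clusters, centers

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 8), (8, 9)]
    clusters, centers = k_means(pts, centers=[[0.0, 0.0], [1.0, 1.0]])
    print(centers)   # roughly [0.33, 0.33] and [8.67, 8.67]
```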
Exchange Algorithm. This algorithm can be considered as a proce-
dure for the one-by-one correction of a predefined partition S. Its neighborhood
system can be defined as N(S) = {S(i, t) : i ∈ I, t = 1, ..., m}, where S(i, t)
is the partition obtained from S by putting entity i into class S_t.

Exchange Algorithm
Step 0. Take the initial S.
Step 1. Find a pair (i*, t*) maximizing the criterion value in N(S).
Step 2. If the criterion value is better than that for S, put S ←
S(i*, t*) and go to Step 1. Otherwise end.
To make the algorithm more time-efficient, an order on the entity set
can be specified so that, at each iteration, only one i is considered at Step
1; the next iteration deals with the next entity, and so on (the first entity
goes after the last one), until all the entities have been tried a predefined number
of times.
Another, quite popular, version of the exchange procedure is exploited in
those applications in which the cluster cardinalities are not supposed to
vary. In this case, the neighborhood system is N(S) := {S(i, j) : i, j ∈ I},
where S(i, j) is the partition obtained from S by switching the entities i and
j between their classes. Again, a prespecified ordering of I can reduce
computations so that at each iteration only one i is taken according to the
ordering (next iteration, next entity). In the problems of dividing a graph into
two even parts (bisection problems), this version is well known as the Kernighan-
Lin (1970) heuristic. In clustering research, it was known somewhat
earlier (see, for instance, a review by Dorofeyuk (1971) referring to a work
of one of the authors published in 1968, in Russian).
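The basic exchange step can be sketched as follows (our illustration for the square-error criterion (6.12); for brevity it applies the first improving move it finds rather than the best pair, and recomputes the criterion from scratch at each trial).

```python
# Sketch of the exchange algorithm for the square-error criterion (6.12).
# The neighborhood is N(S) = {S(i, t)}: entity i is moved into class t.

def sq_error(points, classes):
    dim, total = len(points[0]), 0.0
    for cls in classes:
        if not cls:
            continue
        c = [sum(points[i][k] for i in cls) / len(cls) for k in range(dim)]
        total += sum((points[i][k] - c[k]) ** 2
                     for i in cls for k in range(dim))
    return total

def exchange(points, classes):
    classes = [set(c) for c in classes]
    current, improved = sq_error(points, classes), True
    while improved:
        improved = False
        for i in range(len(points)):
            home = next(t for t, c in enumerate(classes) if i in c)
            if len(classes[home]) == 1:
                continue                   # keep every class non-empty
            for t in range(len(classes)):
                if t == home:
                    continue
                classes[home].discard(i)
                classes[t].add(i)
                value = sq_error(points, classes)
                if value < current:
                    current, improved, home = value, True, t
                else:                      # undo the move
                    classes[t].discard(i)
                    classes[home].add(i)
    return classes, current

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 8)]
    print(exchange(pts, [{0, 1, 4}, {2, 3}]))  # ends near ({0,1,2},{3,4})
```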
Conceptual Clustering
The conceptual clustering algorithms construct decision trees divisively,
from top to bottom (with the root representing the universe considered), using
one variable at each step, so that the following problems are addressed:
1. Which class (node of the tree) to split, and by which variable?

2. When to stop splitting?

3. How to prune/aggregate the tree if it becomes too large?
4. Which class to assign to a terminal node?
Traditionally, conceptual clustering is considered an independent set
of machine learning techniques. However, in the framework of this presen-
tation, at least some of the conceptual clustering techniques can be considered
as yet other local search procedures. To decide which class S to split and by
which variable it is to be split, the goodness-of-split criterion (6.6) implied
by the theory above can be utilized.
Example. The tree in Fig. 8 shows the results of the chi-square based criterion (6.8)
applied to the task of splitting the set of seven entities by the data in table 1
considered as binary categories.

Figure 8: A conceptual tree for the binary data in table 1 found with the
summary chi-square coefficient as the goodness-of-split criterion.

6.2 Criteria for Similarity Data


6.2.1 Uniform Partitioning
Given a similarity matrix A = (a_ij), this clustering model can be expressed
with the equation A = λSS^T + E, where λ is a real and S = (s_it) is the indicator
matrix of a partition. Minimizing the least-squares criterion, (A − λSS^T, A −
λSS^T), is equivalent to maximizing a_w² n_w, where a_w and n_w are, respectively,
the average and the number of the similarities within all the clusters. This problem will
be referred to as the uniform partitioning model. No prespecified number of clusters m
is needed in this model.
With λ predefined (for the sake of brevity, assume λ > 0), the uniform
partitioning criterion is equivalent to

F(S, λ) = Σ_{t=1}^m Σ_{i,j∈S_t} (a_ij − λ/2) = Σ_{t=1}^m L(λ/2, S_t),    (6.15)

to be maximized, where L(λ/2, S_t) is the single cluster criterion (5.13). The value
λ/2 is a "soft" similarity threshold requiring that, in general, the larger
similarities fall within the clusters, and the smaller similarities, between
the clusters. In general, the larger λ, the smaller the clusters, though it is not
always so. There exists an index of the optimal partition closely following
the changes of λ: the larger λ, the smaller the error of proportional prediction
(Gini index), 1 − Σ_{t=1}^m p_t², of the optimal partition (Kupershtoh, Mirkin, and
Trofimov (1976)).
When the data matrix is not symmetric, a_ij ≠ a_ji, it can be symmetrized
with the transformation a_ij ⇐ (a_ij + a_ji)/2, since criterion (6.15) is invariant with
regard to this transformation. The transformation is applicable any time
the approximating clustering structure is symmetric.
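For completeness, here is a tiny sketch (ours) that symmetrizes a similarity matrix and evaluates the uniform partitioning criterion F(S, λ) of (6.15), with diagonal entries excluded as in the example below.

```python
# Sketch: symmetrize a similarity matrix and evaluate the uniform
# partitioning criterion F(S, lambda) of (6.15); the diagonal entries
# are excluded, and the sample data are made up.

def uniform_criterion(A, partition, lam):
    n = len(A)
    B = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
    return sum(B[i][j] - lam / 2
               for cls in partition for i in cls for j in cls if i != j)

if __name__ == "__main__":
    A = [[0, 5, 4, 1],
         [3, 0, 4, 0],
         [4, 4, 0, 1],
         [1, 2, 1, 0]]
    for lam in (2, 6):
        print(lam, uniform_criterion(A, [{0, 1, 2}, {3}], lam))  # 18, 6
```

As the sketch shows, raising the threshold λ/2 penalizes keeping weakly similar pairs together, which is why larger λ tends to produce smaller clusters.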
Example. The optimal uniform partitions of 10 digits for different thresholds
λ/2 are presented in table 10. The Confusion data matrix has been preliminarily
symmetrized; its diagonal entries have been excluded.

Table 10: Uniform partitions of 10 segmented digits (the Confusion data set).

Threshold m Partition Residual Variance


-20 2 1-4-7, 2-3-5-6-8-9-0 0.754
0 4 1-4-7,2,3-5-9,6-8-0 0.476
30 6 1-7,2,3-9,4,5-6,8-10 0.439
50 6 1-7,2,3-9,4,5-6,8-10 0.439
60 7 1-7,2,3-9,4,5,6,8-10 0.468
90 8 1-7,2,3-9,4,5,6,8, 10 0.593

□
The least-squares optimal λ is the within-partition average similarity,
a_w.
This problem can also be addressed in the framework of alternating opti-
mization: (1) reiterating the steps of optimization of criterion (6.15), with
λ fixed, and (2) calculating the within-class average λ for the partition
found.
Rationales for considering the uniform partitioning problem include the
following (Mirkin (1996)):
(1) In an optimal partition, the average within-class similarity is not
smaller than λ/2, and the average between-class similarity is not larger than
λ/2. This gives an exact meaning of "clusterness" to the uniform partition
classes.
(2) In a thorough experimental study, G. Milligan (1981) demon-
strated that the usual correlation coefficient between A and SS^T belongs to
the best goodness-of-fit indices of clustering results. On the other hand, this
coefficient characterizes the quality of the matrix (bi)linear regression model,
A = λSS^T + μU + E (where U is the matrix with all its entries equal to
unity). Therefore, the experimental results may be considered as justi-
fying use of the latter model as a clustering model; the uniform partitioning
problem is just a shortened version of it. Curiously, this shortened version
better fits the cluster-wise meaning of the optimal partition than the
original matrix regression model.
(3) Criterion (6.15) appears to be equivalent to that of the index-driven
consensus problem in various settings. (A partition S is called an index-driven
consensus partition if it maximizes $\sum_{k=1}^{n} \xi(S^k, S)$ where $S^1, \ldots, S^n$
are some given partitions on I and $\xi(S^k, S)$ is a between-partition correlation
index.) In particular, this is true for the index being the number of
noncoinciding edges in corresponding graphs (the Hamming distance between
adjacency matrices). In this way, the problem of approximation of a graph by
a graph consisting of cliques (Zahn (1963)) fits within this one.
(4) In the context of Lance-Williams agglomerative clustering, the
uniform partitioning criterion appears to be the only one leading to the
flexible Lance-Williams algorithms (with constant coefficients) as optimization
ones. We refer to an agglomerative clustering algorithm as an
optimization one if its every agglomeration step, merging $S_u$ and $S_v$ into
$S_u \cup S_v$, maximizes the increment $\delta F(u, v) = F(S_u \cup S_v) - F(S_u) - F(S_v)$ of a
set function F.
(5) Criterion (6.15) extends that of graph partitioning in 6.2.4 (when the
threshold is zero), also providing a compromise with the requirement of getting
a balanced partition (the larger the threshold, the more uniform the
cluster sizes as measured by the Gini index).

6.2.2 Additive Partition Clustering


To fit situations where real-world cluster patterns may show a great
difference in cluster "diameters", a model with the clusters having distinct
intensity weights can be considered (Shepard and Arabie (1979), Mirkin
(1987)):

$$a_{ij} = \sum_{t=1}^{m} \lambda_t s_{it} s_{jt} + e_{ij} \qquad (6.16)$$

where $e_{ij}$ are the residuals to be minimized.
In matrix terms, the model is $A = S \Lambda S^T + E$ where S is the $N \times m$
matrix of cluster indicator functions, $\Lambda$ is the $m \times m$ diagonal matrix of the $\lambda_t$'s,
and $E = (e_{ij})$.
When the clusters are assumed mutually nonoverlapping (that is, the
indicator functions $s_t$ are mutually orthogonal) or/and when the fitting of the
model is made with the sequential fitting SEFIT strategy (see section 6.3),
the data scatter decomposition holds as follows:

$$(A, A) = \sum_{t=1}^{m} \left[ s_t^T A s_t / s_t^T s_t \right]^2 + (E, E) \qquad (6.17)$$

where the least-squares optimal $\lambda_t$'s have been put as the within cluster
averages of the (residual) similarities.
It can be seen from (6.17) that the least-squares fitting of the additive
clustering model (under the nonoverlapping assumption) requires maximizing
the intermediate term in (6.17), which differs from (6.1) only in that
the terms are squared here. No mathematical results on maximizing this
criterion are known.

6.2.3 Structured Partitioning


Let $B = (b_{ij})$ be a matrix of association or influence coefficients and (S, w)
a structured partition in which w is a digraph of "important" between-class
associations. Such a structured partition can be represented by the Boolean
matrix $S_w = (s_{ij})$ where $s_{ij} = 1$ if and only if $(t, u) \in w$ for $i \in S_t$ and
$j \in S_u$. Then the linear model of the associations, approximated by $S_w$, is:

$$B = \lambda S_w + E \qquad (6.18)$$
This model suggests uniting in the same class those entities that identically
interact with the others (being, perhaps, non-associated among themselves).
The problem can be thought of as that of approximation of a large graph,
B, by a smaller graph, (S, w). This smaller graph is frequently called a blockmodel
in social psychology (Arabie, Boorman, and Levitt (1978)). The
approximation model (6.18) is considered in Mirkin (1996).
When $\lambda$ is positive, the least squares fitting problem for model (6.18)
can be equivalently represented as the problem of maximizing

$$\delta U(\pi, S, w) = \sum_{(t,u) \in w} \sum_{i \in S_t} \sum_{j \in S_u} (b_{ij} - \pi) \qquad (6.19)$$

by (S, w) for $\pi = \lambda/2$.

When there are no constraints on w and S is fixed, the optimal w (for given
S) can be easily identified depending on the summary proximity values

$$b(\pi, t, u) = \sum_{i \in S_t} \sum_{j \in S_u} (b_{ij} - \pi).$$
The structure w maximizing (6.19) for given S is

$$w(S) = \{(t, u) : b(\pi, t, u) > 0\}.$$

This implies that, with no constraints on w, maximizing criterion (6.19) is
equivalent to maximizing the criterion

$$AS(\pi, S) = \sum_{t,u=1}^{m} |b(\pi, t, u)| \qquad (6.20)$$

which does not depend on w and, thus, can be locally optimized by local
search algorithms (Kupershtoh and Trofimov, 1975).
Optimizing the threshold $\lambda$ when S is given can be done with the aggregate
matrix (b(0, t, u)).
Example. Let us consider the Confusion data (between 10 segmented integer digits)
from Table 6, p. 270, with the diagonal entries eliminated. The matrix B, centered
by subtracting its grand mean, 33.4556 (computed with no diagonal entries), is as follows:

   -   -26.5 -26.5 -11.5 -29.5 -18.5  26.5 -33.5 -29.5 -29.5
-19.5    -   13.5 -29.5   2.5  13.5 -19.5  -4.5 -26.5 -15.5
 -4.5  -4.5    -  -26.5 -15.5 -33.5   6.5  -4.5 118.5 -18.5
115.5 -11.5 -29.5    -  -29.5 -22.5  -3.5 -26.5   7.5 -33.5
-19.5  -7.5   9.5 -19.5    -   45.5 -26.5 -26.5  92.5 -19.5
 -8.5 -19.5 -26.5 -22.5  63.5    -  -29.5 121.5 -22.5   9.5
235.5 -29.5 -12.5 -12.5 -26.5 -33.5    -  -33.5 -29.5 -26.5
-22.5  -5.5  -5.5 -15.5 -15.5  36.5 -22.5    -   33.5 138.5
 -8.5  -4.5  77.5  12.5  48.5 -22.5 -12.5  48.5    -    9.5
-15.5 -29.5 -26.5 -22.5 -26.5 -15.5  -8.5  37.5 -12.5    -
   1     2     3     4     5     6     7     8     9     0

Figure 9: Structured partitions for the Confusion data.
An intuitively defined aggregate graph of major confusions between the digits
is presented in Fig. 9 (a): the non-singleton classes, 4-7, 5-8, and 6-9, unite
unconnected entities. The structure comprises 18 entries of A, some of them
negative, such as, for instance, $a_{05} = -26.5$. This structure is not optimal for
$\pi = 0$. The optimal structure, w(S) with $\pi = 0$, must include more associations, as
shown in Fig. 9 (b). The average of all the 25 within-structure entries is equal to
46.3, which makes $\pi = 23.15$ cut out of the structure the weakest connections,
such as that from 2 to 3, with $a_{23} = 13.5 < \pi$. The final structured partition, shown in (c)
($\lambda = 68.4$), differs from (a) by just one arrow deleted. □
In Mirkin (1996), an interpretation of a similar approximation criterion
in terms of organization design is presented.

6.2.4 Graph Partitioning


Graph partitioning is a discipline devoted to the problem of partitioning
the set of nodes of a graph whose nodes and/or edges are weighted by
nonnegative reals. The partition must be balanced (the classes are to be of
a predefined [usually equal] weight, or size when the nodes are of constant
weight) and minimize the total weight of between-class communications
(edges). This is important, for instance, for parallel computations (making
the load per processor equal while minimizing interprocessor communication)
and in very large scale integrated (VLSI) circuit layout (the nodes correspond
to chips and the edges to wires connecting them).
Since the total weight of between-node connections is constant, minimizing
between-class connections is equivalent to maximizing within-class
connections as expressed by the uniform partition criterion (6.15) (with zero
threshold).
The NP-complete problem of graph bisection (splitting a graph into two
equal size parts) has received most attention in graph partitioning. Besides
the Lin-Kernighan exchange procedure discussed above, there are three
kinds of heuristics which have attracted most efforts: (a) min-cut extensions, (b)
bisection by a vertex separator, and (c) spectral partitioning. Let us describe
them in turn.
As is well known, the Ford-Fulkerson max-flow algorithm splits the
graph in the optimal way. However, the sizes of the two split parts may be far
from equal. This can be corrected with heuristics based on the exchange
procedure (Krishnamurthy (1984)). Recently, this has become accompanied
with an option of replicating some nodes (to have them in both of the parts)
so that more interconnections are reduced (Hwang (1995), Liu et al. (1995)).
A vertex separator of a graph (I, A) (A is the matrix of edge weights,
$a_{ij} \geq 0$) is a subset $V \subset I$ such that removing it along with all incident
edges from the graph results in two disconnected subgraphs of (almost)
equal sizes. An existence theorem (Miller et al. (1993), extending Lipton and
Tarjan (1979)) says that if G is an $(\alpha, k)$ overlap graph in n dimensions with
q nodes, then there exists a vertex separator with at most
$O(\alpha k^{1/n} q^{(n-1)/n})$ nodes, while each of the disconnected parts has at most
q(n + 1)/(n + 2) nodes. This result underlies algorithms for finding vertex
separators by projecting the nodes onto lines or spheres so that the separator
corresponds to the points projected into the midst (see Miller et al. (1993),
Demmel (1996) and references therein).
Spectral partitioning is applied to an ordinary graph (the edge weights
are zeros or unities), based on the idea that the eigenvector of the data
matrix corresponding to its minimal positive eigenvalue corresponds to the
direction in which the data "cloud" is most elongated. If we can find this
direction, the problem is solved by just cutting it in half. This idea was
elaborated by Pothen, Simon and Liou (1990) based on earlier results by
Fiedler (1975). Let the number of edges be M. The basic construction
is the $N \times M$ incidence matrix In(G) whose columns correspond to the
edges, having all zero entries except for the two incident row-vertices: one is 1,
the other is -1. The Laplacian $N \times N$ matrix, $L(G) = In(G)\,In(G)^T$, has
its diagonal entries, (i, i), equal to the number of incident edges, and its
off-diagonal entries, (i, j), equal to -1 (if (i, j) is an edge) or 0 (if (i, j) is not
an edge). It is the minimal positive eigenvalue of L(G) and the corresponding
eigenvector, x, which determine the sought bisection: all entities $i \in I$ with
$x_i > 0$ go into one part while the entities with $x_i < 0$ go into the other.
Computation of x (approximately) can be done cost-effectively if the graph
G is sparse (see also Hagen and Kahng (1992), Demmel (1996)).
The bisections found can be improved with the exchange (Kernighan-Lin)
heuristics. Applying bisections iteratively to a sequence of "coarser" graphs
(found by aggregating the graph adjacency matrix around vertices belonging
to locally maximal matchings) can make computations faster (see Demmel
(1996), Agrawal et al. (1995)).
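As an illustration of the basic spectral step alone (without the coarsening or the Kernighan-Lin refinement), here is a minimal sketch; the dense eigensolver is an illustrative shortcut, since sparse iterative solvers are what make the method practical on large graphs.

```python
import numpy as np

def spectral_bisection(adj):
    """A sketch of spectral bisection.  `adj` is a symmetric 0/1
    adjacency matrix of a connected ordinary graph.  The Laplacian
    L = D - A equals In(G) In(G)^T; the eigenvector of its smallest
    positive eigenvalue (the Fiedler vector) defines the cut by sign.
    """
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    vals, vecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
    fiedler = vecs[:, 1]                   # second smallest; the first is constant
    part_pos = np.where(fiedler > 0)[0]
    part_neg = np.where(fiedler <= 0)[0]
    return part_pos, part_neg
```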

6.3 Overlapping Clusters


The problem of revealing overlapping clusters with no predefined overlap
structure can be put in the approximation clustering framework. In the
additive cluster model (6.16), the clusters may overlap (as well as in the
approximation model (6.10)). The clustering strategies developed so far
exploit the additivity of the model, which allows processing one cluster at a
time. The strategies thus involve two nested cycles: (a) a major one dealing
with sequentially preparing the data for revealing (updating) a cluster, and (b)
a minor one dealing with finding (updating) a single cluster. A straightforward
implementation of this idea is the method of sequential fitting
SEFIT (Mirkin (1987, 1990)), which can be applied to any approximation
multiclustering model. The method will be explained here only for the additive
clustering model:
(a) a residual (similarity) data matrix is obtained by the transformation
$a_{ij} \leftarrow a_{ij} - \lambda s_i s_j$ where $\lambda$ and $s_i$ are the intensity and membership of the
cluster found at the preceding step (all $s_i = 1$ and $\lambda$ equal to the grand mean are
taken at the first step);
(b) a cluster is found by minimizing a corresponding single cluster
clustering criterion, (5.12) in this case,

$$L^2 = \sum_{i,j \in I} (a_{ij} - \lambda s_i s_j)^2$$

with regard to an unknown real (positive) $\lambda$ and Boolean $s_i$, $i \in I$. This can
be done with a single cluster clustering method;
(c) a stopping criterion is applied (based on a prespecified threshold: the
number of clusters or the explained contribution to the data scatter). If No,
go to (a); if Yes, end.
The decomposition (6.17) holds with this method, which allows one to
evaluate the contributions, thus saliences, of the different clusters in the data
scatter. A sketch of the strategy is given below.
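The following minimal Python sketch follows steps (a)-(c); the greedy toggle search in step (b) and the seeding from the largest residual pair are illustrative choices for the single cluster method, not prescribed by the text.

```python
import numpy as np

def sefit_additive_clusters(a, n_clusters=7):
    """A minimal sketch of SEFIT for the additive clustering model (6.16).

    One cluster is extracted per major step by a greedy single cluster
    search (toggle entities while the cluster's contribution grows); then
    lambda * s s^T is subtracted from the residuals, per step (a).
    `a` is assumed symmetric; diagonal entries are kept out of play.
    """
    n = a.shape[0]
    off = ~np.eye(n, dtype=bool)
    resid = a - a[off].mean()              # the first step subtracts the grand mean
    np.fill_diagonal(resid, 0.0)
    total = (resid[off] ** 2).sum()        # data scatter, for contributions (6.17)

    def contrib(s):                        # returns (w^2/m, lambda = w/m)
        idx = np.where(s)[0]
        if len(idx) < 2:
            return 0.0, 0.0
        block = resid[np.ix_(idx, idx)]
        w = block.sum()                    # diagonal is zero, so off-diagonal sum
        m = len(idx) * (len(idx) - 1)
        return w * w / m, w / m

    clusters = []
    for _ in range(n_clusters):
        seed = np.unravel_index(np.argmax(np.where(off, resid, -np.inf)),
                                resid.shape)
        s = np.zeros(n, dtype=bool)
        s[list(seed)] = True               # seed: the largest residual pair
        best, lam = contrib(s)
        improved = True
        while improved:                    # greedy add/remove of single entities
            improved = False
            for i in range(n):
                s[i] = not s[i]
                val, l = contrib(s)
                if val > best:
                    best, lam, improved = val, l, True
                else:
                    s[i] = not s[i]
        idx = np.where(s)[0]
        resid[np.ix_(idx, idx)] -= lam     # step (a): residual data
        np.fill_diagonal(resid, 0.0)
        clusters.append((idx, lam, 100.0 * best / total))
    return clusters                        # (entities, intensity, contribution %)
```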
Example. Applying the sequential fitting method to partitioning the ten stylized digits
by the Confusion data considered as a similarity matrix (the diagonal entries removed,
the matrix symmetrized, and the grand mean subtracted), we find the cluster
sequence presented in Table 11.

Table 11: The SEFIT clusters for 10 digits (Confusion data).

Cluster   Entities   Intensity   Contribution, %

1         1-7        131.04      25.4
2         3-9         98.04      14.2
3         6-8-0       54.71      13.3
4         5-9         70.54       7.4
5         5-6         54.54       4.4
6         1-4         52.04       4.0
7         8-9-0       24.31       2.6

□
A doubly alternating minimization technique is employed by Chaturvedi
and Carroll (1994) within the same strategy. A somewhat wider approach
is suggested in Hartigan (1972).

7 Hierarchical Structure Clustering


7.1 Approximating Binary Hierarchies
A hierarchy $S_W$ is called binary if every nonsingleton cluster $S_w \in S_W$ has
exactly two children: $S_w = S_{w1} \cup S_{w2}$, $S_{w1}, S_{w2} \in S_W$. Such a cluster can
be assigned a ternary nest indicator, $\phi_w(i)$, which equals $a_w$ for
$i \in S_{w1}$, $b_w$ for $i \in S_{w2}$, and 0 for $i \notin S_w$. The values of $a_w$ and $b_w$
are chosen so that $\phi_w$ is centered and normed:

$$a_w = \sqrt{\frac{n_{w2}}{n_w n_{w1}}}, \qquad b_w = -\sqrt{\frac{n_{w1}}{n_w n_{w2}}} \qquad (7.1)$$

where $n_w$, $n_{w1}$, and $n_{w2}$ are the cardinalities of $S_w$ and its two children, $S_{w1}$
and $S_{w2}$, respectively.
It turns out that the vectors $\phi_w$ are mutually orthogonal, which makes the set
$\Phi_W = \{\phi_w\}$ an orthonormal basis of the space of all N-dimensional centered vectors.
Any data matrix Y (preliminarily column-centered) can thus be decomposed
by the basis:

$$Y = \Phi C + E \qquad (7.2)$$

where $\Phi = (\phi_{iw})$ is the $N \times (N-1)$ matrix of the values of the nest indicators
$\phi_w(i)$ and $C = (c_{wk})$ is an $(N-1) \times n$ matrix. The number N - 1 is the
number of nonsingleton clusters in any binary hierarchy. This implies that
$C = \Phi^T Y$, that is,

$$c_{wk} = \sqrt{\frac{n_{w1} n_{w2}}{n_w}} \,(\bar{y}_{w1k} - \bar{y}_{w2k}) \qquad (7.3)$$

where $\bar{y}_{w1k}$ and $\bar{y}_{w2k}$ are the averages of the k-th variable in $S_{w1}$ and $S_{w2}$,
respectively.
An incomplete binary hierarchy which does not satisfy (a) $I \in S_W$ or/and
(b) every singleton is in $S_W$ will be called a cluster hierarchy: divisive if (a)
holds, or agglomerative if (b) holds. The corresponding nest indicators still form
a basis, though of a space of smaller dimensionality than N - 1.
For a cluster hierarchy $S_W$, the corresponding bilinear clustering model
is similar to that for single cluster clustering and partitioning:

$$y_{ik} = \sum_{w \in W} c_{wk} \phi_{iw} + e_{ik} \qquad (7.4)$$

The square data scatter decomposition here is

$$(Y, Y) = \sum_{w \in W} \mu_w^2 + (E, E) \qquad (7.5)$$

where $\mu_w^2 = \phi_w^T Y Y^T \phi_w / \phi_w^T \phi_w$, with $\mu_w$ an analogue of the singular value
concept. It is equal to

$$\mu_w^2 = \frac{n_{w1} n_{w2}}{n_w} \, d^2(\bar{y}_{w1}, \bar{y}_{w2}) \qquad (7.6)$$

where $\bar{y}_{w1}$ and $\bar{y}_{w2}$ are the gravity centers of $S_{w1}$ and $S_{w2}$ and d(x, y) is the
Euclidean distance between the vectors x, y.
Therefore, finding an optimal (partial) $\Phi$ requires maximizing $\sum_{w \in W} \mu_w^2$,
the weighted sum of all between-children-center distances. The optimal
splitting system probably does not satisfy the minimal distance rule (see p.
299), which makes the problem hard.
However, the sequential fitting greedy strategy can be applied to sequentially
maximize one $\mu_w^2$ at a time. For a divisive cluster hierarchy, we may
start with $W = \{I\}$, adding optimal splits to it one by one. The criterion to
optimize by splitting $S_w$ into $S_{w1}$ and $S_{w2}$ is

$$\mu_w^2 = \frac{n_{w1} n_{w2}}{n_w} \, d^2(\bar{y}_{w1}, \bar{y}_{w2}) \qquad (7.7)$$

Maximizing (7.7) is equivalent to finding a two class partition of $S_w$
minimizing the square error clustering criterion $D^2$ in (6.12). Equivalently, the
problem is similar to that of single principal cluster clustering. The optimal
splits satisfy the minimal distance rule, which guarantees no more than Nn
separations to enumerate. An alternating optimization algorithm can also be
applied. Depending on which of its equivalent formulations is employed, the
starting setting is taken as the two most distant points in $S_w$ or the most
"extreme" point in $S_w$.
The value (7.7) is exactly the dissimilarity measure that appeared in
the context of Ward agglomerative clustering (see subsection 6.1.7). Thus,
the Ward algorithm is exactly the sequential least-squares fitting procedure for
model (7.4) starting with the agglomerative cluster hierarchy consisting of
singletons only.
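A minimal sketch of one divisive step follows; the alternating 2-means-style refinement and the start from the two most distant points are one of the starting options mentioned above, and the simple dense distance computation is purely illustrative.

```python
import numpy as np

def ward_split(y):
    """A sketch of one divisive step on a cluster with data matrix `y`
    (rows are entities): an alternating 2-means-style refinement,
    started from the two most distant points, returning the split and
    its value mu_w^2 = (n1*n2/n) * d^2 between the children's gravity
    centers, i.e. its contribution to the data scatter, per (7.7).
    """
    n = y.shape[0]
    dist = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    centers = y[[i, j]].astype(float)      # the two most distant points
    labels = np.full(n, -1)
    while True:
        new = np.argmin(np.linalg.norm(y[:, None, :] - centers[None, :, :],
                                       axis=2), axis=1)
        if (new == labels).all():
            break
        labels = new
        centers = np.array([y[labels == k].mean(axis=0) for k in (0, 1)])
    n1, n2 = int((labels == 0).sum()), int((labels == 1).sum())
    mu2 = n1 * n2 / n * float(np.linalg.norm(centers[0] - centers[1]) ** 2)
    return labels, mu2
```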
Example. The sequential split strategy produced the pattern of splits presented in
Fig. 10. The relative cluster value $\mu_w^2$ (which is equal to the contribution of the split
to the square data scatter) is assigned to each of the three initial splits. □

[Tree display: the first split contributes 57.2% of the data scatter.]

Figure 10: A divisive tree for the data in Table 2 found with the least-squares
criterion.

7.2 Indexed Hierarchies and Ultrametrics


To draw a hierarchy $S_W$ as a rooted tree, one needs an index function,
$h: W \to \mathbf{R}$, assigned to the clusters in such a way that $S_{w'} \subset S_w$ implies
$h(w') < h(w)$; the value h(w) corresponds to the height of the node w in a
drawn representation of $S_W$ (see Figures 12 (d), 14, and 15).
For any indexed hierarchy, a dissimilarity measure $D = (d(i,j))$ on the
set of the entities can be defined as $d(i,j) = h(n[i,j])$, where n[i,j] is the
minimum cluster being the ancestor of both $i, j \in I$. Such a measure
is special: it is an ultrametric (a concept introduced by R. Baire in the late
nineteenth century); that is, for any $i, j, k \in I$,

$$d(i,k) \leq \max[d(i,j), d(j,k)] \qquad (7.8)$$

Moreover, any ultrametric $D = (d(i,j))$ with the range of distance
values $0 = d_0 < d_1 < \cdots < d_q$ determines an indexed hierarchy $S_D$ whose
clusters $S_w$ with $h(w) = d_l$ are cliques/connected components in the threshold
graphs $G_l = \{(i,j) : d(i,j) \leq d_l\}$ ($l = 0, \ldots, q$). Every graph $G_l$ is an
equivalence relation graph; its connected components are cliques. Thus, the
concepts of ultrametric and indexed hierarchy are equivalent. The underlying
hierarchy is defined up to any monotone increasing transformation of
the ultrametric (index function).
This makes it meaningful to consider the problem of reconstructing an
indexed hierarchy from a dissimilarity matrix in the approximation framework.
The results found for the problem of approximating a given dissimilarity
matrix $(d_{ij})$ with an ultrametric $(d(i,j))$ satisfying the inequality $d(i,j) \leq d_{ij}$
are as follows (Johnson (1967), Gower and Ross (1969), Leclerc (1995)).
Any spanning tree T on I defines an ultrametric, $d_T(i,j) = \max\{d_{i'j'} :
(i',j') \in T(i,j)\}$, where T(i,j) is the unique path between the vertices i and
j in the tree T. When T is an MST for the dissimilarity d, $d_T$ is the
maximum ultrametric satisfying the inequality $d_T \leq d$, which implies that $d_T$
is an optimal fit according to any criterion monotonically depending on
the absolute differences between d and $d_T$. The clusters of the hierarchy
corresponding to $d_T$ (when T is an MST) are the connected components of the
threshold graphs for the original dissimilarity data. Moreover, the hierarchy is
the one found with the single linkage method; a sketch of computing $d_T$ is
given below.
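A minimal sketch of computing the subdominant ultrametric $d_T$; Prim's MST construction and the per-pair path traversal are for clarity, not efficiency.

```python
import numpy as np

def subdominant_ultrametric(d):
    """A sketch: compute d_T(i,j) = max edge weight on the MST path
    between i and j, the maximum ultrametric below the dissimilarity d
    (the single linkage ultrametric).  Prim's algorithm builds the MST;
    a plain traversal per source propagates the path maxima.
    """
    n = d.shape[0]
    in_tree, best, link = {0}, d[0].copy(), np.zeros(n, dtype=int)
    edges = []
    while len(in_tree) < n:
        out = [v for v in range(n) if v not in in_tree]
        v = min(out, key=lambda u: best[u])
        edges.append((int(link[v]), v, float(best[v])))
        in_tree.add(v)
        for u in out:
            if u != v and d[v, u] < best[u]:
                best[u], link[u] = d[v, u], v
    adj = {v: [] for v in range(n)}
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    dt = np.zeros((n, n))
    for s in range(n):
        stack, seen = [(s, 0.0)], {s}
        while stack:
            v, m = stack.pop()
            dt[s, v] = m                  # max edge weight seen along the path
            for u, w in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append((u, max(m, w)))
    return dt
```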
The problem of unconstrained approximation of a given dissimilarity by
an ultrametric using the least-squares criterion is NP-complete (Day
(1987, 1996)), while it is polynomial when the $L_\infty$ norm is employed (Agarwala
et al. (1995)).

7.3 Fitting in Tree Metrics


The between-entity distances in an ultrametric are controlled by the
corresponding index function, which may seem too restrictive in some
substantive problems. A concept of tree metric as a less restrictive clustering
structure has emerged. For a tree T on I whose edges are weighted by a
positive weight function $w_{ij}$, let us define a metric $d_{Tw}$ by the equation
$d_{Tw}(i,j) = \sum_{(i',j') \in T(i,j)} w_{i'j'}$, where T(i,j) is the only path between i and j in
T. It appears (Zaretsky (1965), Buneman (1971)) that a metric $(d_{ij})$ is a tree
metric if and only if the following so-called four-point condition is satisfied
for every $i, j, k, l \in I$:

$$d_{ij} + d_{kl} \leq \max[d_{ik} + d_{jl}, \; d_{il} + d_{jk}] \qquad (7.9)$$

The proof, actually, can be easily derived from the picture in Fig. 11,
presenting the general pattern of the tree paths pairwise joining the four
vertices involved in (7.9). The tree metrics are related to ultrametrics via
Figure 11: Four-point pattern in an edge weighted tree.

the so-called FKE-transform (Farris, Kluge and Eckart (1970)): let us pick
an arbitrary $c \in I$ and define yet another distance on $I - \{c\}$,

$$d_c(i,j) = \frac{d(i,j) - d(i,c) - d(j,c)}{2} + M \qquad (7.10)$$

where M > 0 is chosen to make all the values of $d_c$ non-negative. It
appears that the following properties are equivalent:
(1) d is a tree metric;
(2) $d_c$ is an ultrametric for any $c \in I$;
(3) $d_c$ is an ultrametric for some $c \in I$.
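A minimal sketch of the transform, under the reconstruction of (7.10) given above (the exact printed form of (7.10) did not survive extraction, so this formula is an assumption); M is taken just large enough to make the values non-negative.

```python
import numpy as np

def fke_transform(d, c):
    """A sketch of the FKE-transform on I - {c}, assuming
    d_c(i,j) = (d(i,j) - d(i,c) - d(j,c))/2 + M as in (7.10) above.
    If d is a tree metric, d_c is an ultrametric.
    """
    rest = [i for i in range(d.shape[0]) if i != c]
    dc = np.array([[(d[i, j] - d[i, c] - d[j, c]) / 2.0 for j in rest]
                   for i in rest])
    dc += max(0.0, -float(dc.min()))      # the constant M
    np.fill_diagonal(dc, 0.0)             # d_c is a distance on I - {c}
    return dc, rest                       # the matrix and the entity order it uses
```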
In Fig. 12 an example is presented to show the correspondence between a
weighted tree and its FKE-transform.

Figure 12: Weighted tree (a) and its tree metric (b) transformed into an
ultrametric (c) and indexed hierarchy (d) by the FKE-transformation; (e) is the
tree resulting from the inverse transformation.

There is not much known about problems of approximation of a
dissimilarity by tree metrics. However, there is a handful of algorithms for
constructing a convenient tree from dissimilarity data in such a way that,
if the dissimilarity is a tree metric, the algorithm recovers the corresponding
tree. Let us formulate the following algorithm of this kind (Sattath and
Tversky (1977), Studier and Keppler (1988)).

Neighbor Joining Algorithm

1. Pick a pair of immediate neighbors, i and j.
2. Form a new node u with its distances $d_{u,k} = (d_{ik} + d_{jk} -
d_{ij})/2$, $k \in I - \{i,j\}$; put it in I and remove i and j (after deleting
i and j, u becomes a leaf).
3. If there are still some entities of I unremoved, go to step 1 (with
the data reduced); otherwise end.

To find a pair of immediate neighbors, the following centrality index
can be employed: $c(i,j) = \sum_{k \in I} d(k, T(i,j))$, where d(k, T(i,j)) is the tree
distance from $k \in I$ to the path T(i,j) between $i, j \in I$ in an underlying tree
T. The index can be calculated when T is unknown by using the following
formula:

$$c(i,j) = \frac{d_{i+} + d_{j+} - N d_{ij}}{2}$$

where $d_{i+} = \sum_{k \in I} d_{ik}$ for any $i \in I$ and N = |I| (see Mirkin (1996), Mirkin et al. [Eds.]
(1997) for further detail and references).
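A minimal sketch of the algorithm follows, driven by the centrality index as reconstructed above (the displayed formula for c(i,j) did not survive extraction, so that closed form is an assumption); only the merge events are recorded, and the edge weights of the recovered tree are omitted for brevity.

```python
import numpy as np

def neighbor_joining(d, names=None):
    """A sketch of Neighbor Joining: repeatedly merge the pair of
    immediate neighbors (the pair maximizing the centrality index
    c(i,j) = (d_i+ + d_j+ - N*d_ij)/2), forming a new node u with
    d(u,k) = (d(i,k) + d(j,k) - d(i,j))/2 as in step 2.
    """
    d = np.asarray(d, dtype=float).copy()
    names = list(names) if names is not None else list(range(d.shape[0]))
    merges = []
    while len(names) > 2:
        n = d.shape[0]
        sums = d.sum(axis=1)
        c = (sums[:, None] + sums[None, :] - n * d) / 2.0
        np.fill_diagonal(c, -np.inf)
        i, j = np.unravel_index(np.argmax(c), c.shape)
        i, j = int(min(i, j)), int(max(i, j))
        merges.append((names[i], names[j]))
        du = (d[i] + d[j] - d[i, j]) / 2.0     # step 2 of the algorithm
        keep = [k for k in range(n) if k not in (i, j)]
        d = np.vstack([np.column_stack([d[np.ix_(keep, keep)],
                                        du[keep][:, None]]),
                       np.append(du[keep], 0.0)])
        names = [names[k] for k in keep] + [merges[-1]]
    merges.append(tuple(names))
    return merges
```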
Further extensions of the hierarchic cluster structures are, first of all, the
concepts of the Robinson matrix and weak hierarchies (see Diday (1986),
Hubert and Arabie (1994), Bandelt and Dress (1989, 1992)); a review can be
found in Mirkin (1996).

8 Clustering for Aggregable Data


8.1 Box Clustering
We refer to a data matrix $P = (p_{ij})$, $i \in I$, $j \in J$, as an aggregable one if it
makes sense to add the entries up to their total, $p_{++} = \sum_{i \in I} \sum_{j \in J} p_{ij}$, as is
the case for contingency, flow or mobility data.
There can be two different goals for aggregable data analysis: 1)
analysis of within row or column set similarities; 2) analysis of between row and
column set interrelations.
To analyze row/column interrelations, a cluster structure called box
clustering should be utilized. Two subsets, $V \subseteq I$ and $W \subseteq J$, and a real, $\mu$,
represent a box cluster, presented as the $|I| \times |J|$ matrix having its entries
equal to $\mu v_i w_j$ where v and w are the Boolean indicators of the subsets V and
W, respectively.
For aggregable data, a specific approximation clustering strategy
emerges, based on the following two features:
(1) it is the transformed data entries, $q_{ij} = p_{ij}/(p_{i+} p_{+j}) - 1$, that are
approximated rather than the original data $p_{ij}$;
(2) it is a weighted least-squares criterion that is employed rather than the
common unweighted one (see Mirkin (1996)).
In the latter reference, the following box clustering model is considered.
The model is a bilinear equation,

$$q_{ij} = \sum_{t=1}^{m} \mu_t v_{it} w_{jt} + e_{ij} \qquad (8.1)$$

to be fit by minimizing

$$L^2 = \sum_{i \in I} \sum_{j \in J} p_{i+} p_{+j} \left( q_{ij} - \sum_{t=1}^{m} \mu_t v_{it} w_{jt} \right)^2 \qquad (8.2)$$

with regard to real $\mu_t$ and Boolean $v_{it}$, $w_{jt}$, $t = 1, \ldots, m$.


The following rationales can be suggested to support the box clustering
model:
(1) It is a clustering extension of the method of correspondence analysis
(widely acknowledged to be a genuine method for the analysis and visualization
of contingency data; see, for instance, Lebart, Morineau, and Piron (1995)).
(2) When a box $\mu_t v_t w_t^T$ is orthogonal to the other boxes, the optimal $\mu_t$
is also a flow index applied, this time, to the subsets $V_t$ and $W_t$:

$$\mu_t = q_{V_t W_t} = \frac{p_{V_t W_t}}{p_{V_t +} \, p_{+ W_t}} - 1 \qquad (8.3)$$

where $p_{V_t W_t} = \sum_{i \in V_t} \sum_{j \in W_t} p_{ij}$.
(3) The box clusters found with a doubly-greedy SEFIT-based strategy
(box clusters are extracted one-by-one, and each box cluster is formed with
sequential adding/removing of a row/column entity) represent quite deviant
fragments of the data table (Mirkin (1996)).
The problem of finding an optimal box, at a single SEFIT step, by
maximizing

$$\frac{\left( \sum_{i \in V} \sum_{j \in W} p_{i+} p_{+j} q_{ij} \right)^2}{\sum_{i \in V} p_{i+} \sum_{j \in W} p_{+j}} \qquad (8.4)$$
Figure 13: Positive RCP boxes in the correspondence analysis factor plane.

over $V \subseteq I$ and $W \subseteq J$, seems to be a combinatorial problem deserving
further investigation.
Example. Applied to the Worries data in Table 7, the aggregable box clustering
algorithm produces 6 clusters; the total contribution of the clusters to the initial
data scatter equals some 90% (see Table 12).

Table 12: Box cluster structure of the Worries data set.

Box   Columns                 Rows                  RCP, %   Contrib., %

1     ASAF, IFAA              PER                    79.5     34.5
2     EUAM, IFEA              PER                   -46.0     20.8
3     ASAF, IFAA              POL, ECO              -40.5      9.9
4     IFEA                    OTH, POL               46.1      9.7
5     EUAM                    POL, MIL, ECO, MTO     18.5      9.3
6     IFEA, ASAF, IFAA, IFI   MIL, MTO              -17.5      5.5

The content of Table 12 corresponds to the traditional joint display given by
the first two correspondence analysis factors (see Fig. 13, where the columns and the
rows are presented by the circles and the squares, respectively).
Due to the model's properties, all the boxes with positive aggregate flow index
(RCP) values (clusters 1, 4, and 5) correspond to contiguous fragments of the
display (shown in Fig. 13); the boxes with negative RCP values are associated with
distant parts of the picture. □
Box clustering problems can arise for other data types. Levit (1988)
provides a simple (matching based) solution to the problem of finding an
all-unity box of maximum perimeter in a Boolean data table.

8.2 Bipartitioning
We refer to a box clustering problem as that of bipartitioning when the
boxes are generated by partitions of each of the sets I and J. Let $S = \{V_t\}$
be a partition of I, and $T = \{W_u\}$ one of J, so that every pair (t, u) labels a
corresponding box $(V_t, W_u)$ and its weight $\mu_{tu}$. In the corresponding specification
of the model (8.1)-(8.2), the optimal values $\mu_{tu}$ are $q_{V_t W_u}$ as in (8.3).
Due to the mutual orthogonality of the boxes $(V_t, W_u)$, a decomposition of
the weighted squared scatter of the data $q_{ij}$ into the minimized criterion $L^2$
(8.2) and the bipartition part, which is just the sum of terms having the format of
(8.4), can be made analogously to that in (6.17). The optimization problem
here is an analogue of that related to (6.17). An equivalent reformulation of
the problem involves aggregation of the data based on the Pearson contingency
coefficient. Let us aggregate the $|I| \times |J|$ table $P = (p_{ij})$ into the $|S| \times |T|$
table $P(S,T) = (p_{tu})$ where $p_{tu} = \sum_{i \in V_t} \sum_{j \in W_u} p_{ij}$. In this notation, the
original table is just P = P(I, J). Then the contingency coefficient is

$$X^2(S,T) = \sum_{t,u} \frac{(p_{tu} - p_{t+} p_{+u})^2}{p_{t+} p_{+u}}.$$

It is not difficult to see that the data scatter decomposition, due to the
bilinear model under consideration, is nothing but

$$X^2(I,J) = X^2(S,T) + L^2 \qquad (8.5)$$

which means that the bipartitioning problem is equivalent to that of finding
such an aggregate P(S,T) which maximizes $X^2(S,T)$.
Alternating and agglomerative optimization clustering procedures can
be easily extended to this case (Mirkin (1996)). Reformulated in geometric
clustering terms, they involve the chi-squared distance defined in section 2.
A sketch of computing the aggregate coefficient is given below.
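The following minimal sketch computes $X^2(S,T)$ for given row and column partitions; the normalization of P to a total of 1 is an assumed convention, as is the representation of partitions by label arrays.

```python
import numpy as np

def aggregate_chi2(p, row_labels, col_labels):
    """A sketch of the Pearson contingency coefficient X^2(S, T) of the
    table aggregated by a row partition S and a column partition T.
    The table is first normalized so its entries sum to 1.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    agg = np.array([[p[np.ix_(row_labels == s, col_labels == t)].sum()
                     for t in np.unique(col_labels)]
                    for s in np.unique(row_labels)])
    r = agg.sum(axis=1, keepdims=True)     # aggregate row marginals
    c = agg.sum(axis=0, keepdims=True)     # aggregate column marginals
    return float((((agg - r * c) ** 2) / (r * c)).sum())
```

An agglomerative aggregation step would then merge the pair of classes whose merger decreases this value the least, as described for the flow tables below.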

8.3 Aggregation of Flow Tables


The flow table is an aggregable data table $P = (p_{ij})$ where I = J, as, for
instance, in brand switching, digit confusion, or input-output tables. The
aggregation problem for such a table can be stated as that of bipartitioning with
coinciding partitions, S = T, or equivalently, of finding an aggregate table
P(S, S) maximizing the corresponding Pearson contingency coefficient $X^2(S,S)$.
Figure 14: Hierarchical biclustering results for the Worries data.

Another formulation involves finding such a partition, $S = \{V_1, \ldots, V_m\}$, that
the aggregate flow index values, $q_{tu}$, satisfy the equations

$$q_{ij} = q_{tu} + e_{ij}, \quad i \in V_t, \; j \in V_u \qquad (8.6)$$

and minimize the criterion $\sum_{t,u} \sum_{i \in V_t} \sum_{j \in V_u} p_{i+} p_{+j} (q_{ij} - q_{tu})^2$.
Applying the agglomerative clustering algorithm (minimizing the decrement of
$X^2(S,S)$ at each agglomeration step) to the Confusion data table (all the entries taken
into account), we obtain the hierarchy presented in Fig. 15. The hierarchy is
indexed by the level of unexplained $X^2$ at each level of aggregation.

[Leaf order of the tree: 1 7 4 3 9 5 6 8 0 2.]

Figure 15: Results of the agglomerative chi-square based aggregation for the
Confusion data.
9 Conclusion
Historically, clustering appeared mostly as a set of ad hoc techniques
such as K-Means and the Lance-Williams hierarchical clustering algorithms.
This is why finding appropriate optimization problems to substantiate,
specify, modify and extend clustering algorithms is an important part of
clustering theory. The clustering techniques are also local, which naturally
leads to the problem of revealing classes of criteria that can be globally
optimized with local search. The matroids are a well known example of such
a class. The concave/convex set functions touched upon in 5.2 are another class of
this kind.
Another issue for theoretical development is the analysis of the properties of
the optimality criteria and the interrelations between them, since they usually
have no independent meaning except in specific industrial or computational
applications. As we have tried to demonstrate, there is an intrinsic
similarity between many clustering techniques traditionally viewed as different
but being, in fact, different local search techniques for the same criterion.
There is a two-way interconnection between combinatorial optimization
and clustering. Combinatorial theory gives clustering well established
concepts and methods, while clustering pays back by supplying a stock of
relevant problems and heuristic techniques that are computationally
efficient in solving hard combinatorial optimization problems. It
seems every polynomially solved combinatorial problem (not only min-cut
or maximum-density-subgraph, but also matching, assignment, etc.) can
contribute to clustering. On the other hand, it should be expected that the
cluster-based search and optimization techniques for combinatorial problems,
already under testing, will expand as larger data sets are processed.
There are a few concepts, such as the minimum spanning tree
and greedy algorithms, that have a good theoretical standing in both
fields. We also can see an imbalance in the support provided by combinatorial
optimization theory to optimal clustering problems: the similarity-based
(graph-theoretic) constructions are much better explored than the
coordinate-based ones. In particular, the problem of estimating the efficiency of the
alternating minimization partitioning algorithm (K-Means) seems a good
subject for potential analysis: the only estimate known, NIVI, exploits only
one feature, linear separability, of the clusters, while there are more to take
into account. Though, some work on combinatorial analysis in coordinate
spaces is also being done (see, for instance, Callahan and Kosaraju (1995),
Edelsbrunner (1987), Yao (1982)).
Among the other topics of interest are the following:
(a) theoretical and computational support for greedily optimized classes of
set functions, such as those mentioned above (sub-modular, concave/convex),
and corresponding single and other cluster structures;
(b) interconnections between the traditional eigen/singular value decompositions
of matrices and those restricted by discrete cluster structures (such as
the nest or partition indicator bases);
(c) further exploring heuristics for hard clustering problems (see, for
instance, Guenoche, Hansen, and Jaumard (1991), Hansen, Jaumard, and
Mladenovic (1995), Hsu and Nemhauser (1979), Johnson and Trick (1996),
McGuinness (1994), Pardalos, Rendl, Wolkowicz (1994));
(d) developing mathematical theories for clustering and window-based
clustering in spatial data sets;
(e) finding other discrete clustering structures (set systems) and related
problems as they emerge in application areas.
In general, the optimization clustering problems have extensive overlaps
with those in semidefinite programming (Vandenberghe and Boyd
(1996), Pardalos and Wolkowicz (1997)) and the quadratic assignment (Hubert
(1987), Pardalos, Rendl, Wolkowicz (1994)). There has not been much
done in applying these global optimization techniques to clustering problems.

References
[1] R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and
M. Thorup, On the approximability of numerical taxonomy, (DIMACS
Technical Report 95-46, 1995).

[2] A. Agrawal and P. Klein, Cutting down on fill using nested dissec-
tion: Provably good elimination orderings, in A. George, J.R. Gilbert,
and J.W.H. Liu (eds.) Sparse Matrix Computation, (London, Springer-
Verlag, 1993).

[3] P. Arabie, S.A. Boorman, and P.R. Levitt, Constructing block models:
how and why, Journal of Mathematical Psychology Vol.17 (1978) pp.
21-63.

[4] P. Arabie and L. Hubert, Combinatorial data analysis, Annu. Rev.
Psychol. Vol.43 (1992) pp. 169-203.
[5] P. Arabie, L. Hubert, G. De Soete (eds.) Classification and Clustering,


(River Edge, NJ: World Scientific Publishers, 1996).

[6] C. Arcelli and G. Sanniti di Baja, Skeletons of planar patterns, in T.Y.


Kong and A. Rosenfeld (eds.) Topological Algorithms for Digital Image
Processing, (Amsterdam, Elsevier, 1996) pp. 99-143.

[7] H.-J. Bandelt and A.W.M. Dress, Weak hierarchies associated with sim-
ilarity measures - an additive clustering technique, Bulletin of Mathe-
matical Biology Vol. 51 (1989) pp. 133-166.

[8] H.-J. Bandelt and A.W.M. Dress, A canonical decomposition theory


for metrics on a finite set, Advances of Mathematics Vol. 92 (1992) pp.
47-105.

[9] J.-P. Benzécri, L'Analyse des Données, (Paris, Dunod, 1973).

[10] P. Brucker, On the complexity of clustering problems, in R. Henn
et al. (eds.) Optimization and Operations Research, (Berlin, Springer,
1978) pp. 45-54.

[11] P. Buneman, The recovery of trees from measures of dissimilarity, in F.


Hodson, D. Kendall, and P. Tautu (eds.) Mathematics in Archeological
and Historical Sciences, (Edinburg, Edinburg University Press, 1971)
pp. 387-395.

[12] P.B. Callahan and S.R. Kosaraju, A decomposition of multidimensional
point sets with applications to k-nearest neighbors and n-body potential
fields, Journal of ACM Vol.42 (1995) pp. 67-90.

[13] A. Chaturvedi and J.D. Carroll, An alternating optimization approach


to fitting INDCLUS and generalized INDCLUS models, Journal of
Classification Vol. 11 (1994) pp. 155-170.

[14] P. Crescenzi and V. Kann, A compendium of NP optimization problems,
(URL site: http://www.nada.kth.se/viggo/problemlist/compendium2,
1995).

[15] W.H.E. Day, Computational complexity of inferring phylogenies from


dissimilarity matrices, Bulletin of Mathematical Biology Vol. 49 (1987)
pp. 461-467.
[16] W.H.E. Day, Complexity theory: An introduction for practitioners
of classification, in P. Arabie, L.J. Hubert, and G. De Soete (eds.)
Clustering and Classification, (River Edge, NJ, World Scientific, 1996)
pp. 199-233.
[17] M. Delattre and P. Hansen, Bicriterion cluster analysis, IEEE Trans-
actions on Pattern Analysis and Machine Intelligence (PAMI) Vol. 4
(1980) pp. 277-291.
[18] J. Demmel, Applications of Parallel Computers, (Lectures posted at
web site: http://HTTP.CS.Berkeley.EDU/demmel/cs267/, 1996).
[19] E. Diday, Orders and overlapping clusters by pyramids, in J. de Leeuw,
W. Heiser, J. Meulman, and F. Critchley (eds.) Multidimensional Data
Analysis, (Leiden, DSWO Press, 1986) pp. 201-234.
[20] A.A. Dorofeyuk, Methods for automatic classification: A Review, Au-
tomation and Remote Control Vol.32 No.12 (1971) pp. 1928-1958.
[21] A.W.M. Dress and W. Terhalle, Well-layered maps - a class of greedily
optimizable set functions, Appl. Math. Lett. Vol.8 No.5 (1995) pp. 77-
80.
[22] H. Edelsbrunner, Algorithms in Combinatorial Geometry (New York,
Springer Verlag, 1987).
[23] M. Fiedler, A property of eigenvectors of nonnegative symmetric
matrices and its application to graph theory, Czech. Math. Journal Vol.25
(1975) pp. 619-637.
[24] D.W. Fisher, Knowledge acquisition via incremental conceptual clus-
tering, Machine Learning Vol.2 (1987) pp. 139-172.
[25] K. Florek, J. Lukaszewicz, H. Perkal, H. Steinhaus, and S. Zubrzycki,
Sur la liason et la division des points d'un ensemble fini, Colloquium
Mathematicum Vol.2 (1951) pp. 282-285.
[26] G. Gallo, M.D. Grigoriadis, and R.E. Tarjan, A fast parametric maximum
flow algorithm and applications, SIAM Journal on Computing
Vol.18 (1989) pp. 30-55.
[27] M.R. Garey and D.S. Johnson, Computers and Intractability: a guide
to the theory of NP-completeness, (San Francisco, W.H.Freeman and
Company, 1979).
[28] M. Gondran and M. Minoux, Graphs and Algorithms, (New-York,


J.Wiley & Sons, 1984).

[29] J.C. Gower and G.J.S. Ross, Minimum spanning tree and single linkage
cluster analysis, Applied Statistics Vol.18 (1969) pp. 54-64.

[30] D. Gusfield, Efficient algorithms for inferring evolutionary trees, Net-


works Vol.21 (1991) pp. 19-28.

[31] A. Guenoche, P. Hansen, and B. Jaumard, Efficient algorithms for di-


visive hierarchical clustering with the diameter criterion, Journal of
Classification Vol.8 (1991) pp. 5-30.

[32] L. Hagen and A.B. Kahng, New spectral methods for ratio cut partitioning
and clustering, IEEE Transactions on Computer-Aided Design Vol.11
No.9 (1992) pp. 1074-1085.

[33] P. Hansen, B. Jaumard, and N. Mladenovic, How to choose K entities
among N, in I.J. Cox, P. Hansen, and B. Julesz (eds.) Partitioning
Data Sets, DIMACS Series in Discrete Mathematics and Theoretical
Computer Science, (Providence, American Mathematical Society, 1995)
pp. 105-116.

[34] J.A. Hartigan, Direct clustering of a data matrix, Journal of American


Statistical Association Vol. 67 (1972) pp. 123-129.

[35] J.A. Hartigan, Clustering Algorithms, (New York, J.Wiley & Sons,
1975).
[36] W.-L. Hsu and G.L. Nemhauser, Easy and hard bottleneck location
problems, Discrete Applied Mathematics VoLl (1979) pp. 209-215.

[37] L.J. Hubert, Assignment Methods in Combinatorial Data Analysis,


(New York, M. Dekker, 1987).

[38] L. Hubert and P. Arabie, The analysis of proximity matrices through


sums of matrices having (anti)-Robinson forms, British Journal of
Mathematical and Statistical Psychology Vol.47 (1994) pp. 1-40.

[39] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, (Englewood
Cliffs, NJ, Prentice Hall, 1988).

[40] K. Janich, Linear Algebra, (New York, Springer-Verlag, 1994).


[41] D.S. Johnson and M.A. Trick (eds.), Cliques, Coloring, and Satisfiability,
DIMACS Series in Discrete Mathematics and Theoretical Computer
Science, Vol. 26, (Providence, RI, AMS, 1996) 657 p.

[42] S.C. Johnson, Hierarchical clustering schemes, Psychometrika Vol.32
(1967) pp. 241-245.

[43] Y. Kempner, B. Mirkin, and I. Muchnik, Monotone linkage clustering
and quasi-concave set functions, Applied Mathematics Letters Vol.10
No.4 (1997) pp. 19-24.

[44] G. Keren and S. Baggen, Recognition models of alphanumeric charac-


ters, Perception and Psychophysics (1981) pp. 234-246.

[45] B. Kernighan and S. Lin, An efficient heuristic procedure for partitioning
graphs, The Bell System Technical Journal Vol.49 No.2
(1970) pp. 291-307.

[46] B. Krishnamurthy, An improved min-cut algorithm for partitioning
VLSI networks, IEEE Transactions on Computers Vol.C-33 No.5 (1984)
pp. 438-446.

[47] V. Kupershtoh, B. Mirkin, and V. Trofimov, Sum of within partition
similarities as a clustering criterion, Automation and Remote Control
Vol.37 No.2 (1976) pp. 548-553.

[48] V. Kupershtoh and V. Trofimov, An algorithm for analysis of the
structure in a proximity matrix, Automation and Remote Control Vol.36
No.11 (1975) pp. 1906-1916.

[49] G.N. Lance and W.T. Williams, A general theory of classificatory
sorting strategies: 1. Hierarchical systems, Comp. Journal Vol.9 (1967) pp.
373-380.

[50] L. Lebart, A. Morineau, and M. Piron, Statistique Exploratoire Multi-


dimensionnelle, (Paris, Dunod, 1995).
[51] B. Leclerc, Minimum spanning trees for tree metrics: abridgments and
adjustments, Journal of Classification VoLl2 (1995) pp. 207-242.

[52] V. Levit, An algorithm for finding a maximum perimeter submatrix
containing only unities in a zero/one matrix, in V.S. Pereverzev-Orlov
(ed.) Systems for Transmission and Processing of Data, (Moscow,
Institute of Information Transmission Science Press, 1988) pp. 42-45 (in
Russian).

[53] L. Libkin, I. Muchnik, and L. Shvarzer, Quasi-linear monotone systems,
Automation and Remote Control Vol.50 (1989) pp. 1249-1259.

[54] R.J. Lipton and R.E. Tarjan, A separator theorem for planar graphs,
SIAM Journal of Appl. Math. Vol.36 (1979) pp. 177-189.

[55] S. McGuinness, The greedy clique decomposition of a graph, Journal


of Graph Theory VoLl8 (1994) pp. 427-430.

[56] G.L. Miller, S.-H. Teng, W. Thurston, and S.A. Vavasis, Automatic mesh
partitioning, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse
Matrix Computations: Graph Theory Issues and Algorithms, (London,
Springer-Verlag, 1993).

[57] G.W. Milligan, A Monte Carlo study of thirty internal criterion mea-
sures for cluster analysis, Psychometrika Vo1.46 (1981) pp. 187-199.

[58] B. Mirkin, Additive clustering and qualitative factor analysis methods


for similarity matrices, Journal of Classification Vol.4 (1987) pp. 7-31;
Erratum Vo1.6 (1989) pp. 271-272.

[59] B. Mirkin, A sequential fitting procedure for linear data analysis mod-
els, Journal of Classification Vo1.7 (1990) pp. 167-195.
[60] B. Mirkin, Approximation of association data by structures and clus-
ters, in P.M. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment
and Related Problems. DIMACS Series in Discrete Mathematics and
Theoretical Computer Science, (Providence, American Mathematical
Society, 1994) pp. 293-316.

[61] B. Mirkin, Mathematical Classification and Clustering, (Dordrecht-


Boston-London, Kluwer Academic Publishers, 1996).

[62] B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (eds.) Mathematical
Hierarchies and Biology, DIMACS Series in Discrete Mathematics and
Theoretical Computer Science, (Providence, RI, AMS, 1997) 389 p.
[63] I. Muchnik and V. Kamensky, MONOSEL: a SAS macro for model
selection in linear regression analysis, in Proceedings of the Eighteenth
Annual SAS Users Group International Conference, (Cary, NC, SAS
Institute Inc., 1993) pp. 1103-1108.

[64] I.B. Muchnik and L.V. Schwarzer, Nuclei of monotone systems on set
semilattices, Automation and Remote Control Vol.52 (1989) pp.
1095-1102.
[65] I.B. Muchnik and L.V. Schwarzer, Maximization of generalized char-
acteristics of functions of monotone systems, Automation and Remote
Control Vo1.53 (1990) pp. 1562-1572.
[66] J. Mullat, Extremal subsystems of monotone systems: I, II, Automation
and Remote Control Vol.37 (1976) pp. 758-766 and pp. 1286-1294.

[67] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization:
Algorithms and Complexity, (Englewood Cliffs, NJ, Prentice-Hall, 1982).
[68] P.M. Pardalos, F. Rendl, and H. Wolkowicz, The quadratic assignment
problem: a survey and recent developments, in P. Pardalos and
H. Wolkowicz (eds.) Quadratic Assignment and Related Problems,
DIMACS Series in Discrete Mathematics and Theoretical Computer
Science, Vol. 16, (Providence, American Mathematical Society, 1994).
[69] Panos M. Pardalos and Henry Wolkowicz (Eds.) Topics in Semidefi-
nite and Interior-Point Methods. Fields Institute Communications Se-
ries (Providence, American Mathematical Society, 1997).
[70] A. Pothen, H.D. Simon, and K.-P. Liou, Partitioning sparse matrices with
eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications
Vol.11 (1990) pp. 430-452.

[71] S. Sattath and A. Tversky, Additive similarity trees, Psychometrika
Vol.42 (1977) pp. 319-345.
[72] J. Setubal and J. Meidanis, Introduction to Computational Molecular
Biology, (Boston, PWS Publishing Company, 1997).
[73] R.N. Shepard and P. Arabie, Additive clustering: representation of sim-
ilarities as combinations of overlapping properties, Psychological Review
Vo1.86 (1979) pp. 87-123.
[74] J.A. Studier and K.J. Keppler, A note on the neighbor-joining algorithm
of Saitou and Nei, Molecular Biology and Evolution Vol.5 (1988) pp.
729-731.
[75] L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Re-


view Vol. 38 (1996) pp. 49-95.

[76] B. Van Cutsem (Ed.), Classification and Dissimilarity Analysis, Lecture


Notes in Statistics, 93 (New York, Springer-Verlag, 1994).

[77] J.H. Ward, Jr, Hierarchical grouping to optimize an objective function,


Journal of American Statist. Assoc. Vol.58 (1963) pp. 236-244.

[78] D.J.A. Welsh, Matroid Theory, (London, Academic Press, 1976).

[79] A.C. Yao, On constructing minimum spanning trees in k-dimensional
space and related problems, SIAM J. Comput. Vol.11 (1982) pp. 721-736.

[80] C.T. Zahn, Approximating symmetric relations by equivalence relations,
J. Soc. Indust. Appl. Math. Vol.12 No.4 (1963).

[81] K.A. Zaretsky, Reconstruction of a tree from the distances between its
pendant vertices, Uspekhi Math. Nauk (Russian Mathematical Surveys)
Vol.20 (1965) pp. 90-92 (in Russian).

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 331-395
©1998 Kluwer Academic Publishers

The Graph Coloring Problem: A Bibliographic


Survey
Panos M. Pardalos and Thelma Mavridou
Center for Applied Optimization, ISE Department
University of Florida, Gainesville, FL 32611
E-mail: pardalos@ufl.edu

Jue Xue
Department of Management Sciences
The City University of Hong Kong
Kowloon, Hong Kong
E-mail: msjxue@cityu.edu.hk

Contents

1 Introduction 332
  1.1 Problem Definition 332
  1.2 Bounds for the Coloring Problem 334
  1.3 Problem Formulations 335

2 Complexity 338

3 Applications 340
  3.1 Timetabling and Scheduling 340
  3.2 Register Allocation 340
  3.3 Frequency Assignment 341
  3.4 Printed Circuit Board Testing 342

4 Algorithms for the Graph Coloring Problem 342
  4.1 Heuristic Algorithms 343
    4.1.1 Sequential Greedy Coloring Heuristics 343
    4.1.2 Iterative Improvement Heuristic 345
    4.1.3 k-Interchange 346
    4.1.4 Simulated Annealing and Tabu Search 346
  4.2 Exact Algorithms 348

5 Generalized Graph Coloring 350
  5.1 Internal Generalizations 350
  5.2 External Generalization 350
    5.2.1 List Colorings 351
    5.2.2 T-Colorings 351
    5.2.3 Set Colorings 352
    5.2.4 Combined External Generalizations 353

6 Concluding Remarks 353

References

1 Introduction
1.1 Problem Definition
In this chapter G = (V, E) denotes an arbitrary undirected graph without
loops, where $V = \{v_1, v_2, \ldots, v_n\}$ is its vertex set and $E = \{e_1, e_2, \ldots, e_m\} \subseteq
(V \times V)$ is its edge set. Two edges are adjacent if they connect to a common
vertex. Two vertices $v_i$ and $v_j$ are adjacent if there is an edge $e = (v_i, v_j) \in
E$. Finally, if $e = (v_i, v_j) \in E$, we say e is incident to the vertices $v_i$, $v_j$.
For a subset $V' \subseteq V$, $G(V') = (V', E \cap (V' \times V'))$ is the subgraph induced
by V'. $\bar{G} = (V, \bar{E})$ is the complementary graph of G if $\bar{E}$ satisfies

$$\forall i \neq j, \quad (v_i, v_j) \in \bar{E} \text{ if and only if } (v_i, v_j) \notin E.$$

It is clear that G is also the complementary graph of $\bar{G}$ under this definition.
A line graph L(G) of G is a graph with its vertices corresponding to the edges
of G. Two vertices of L(G) are adjacent if their corresponding edges in G
are adjacent.
A vertex coloring of G is an assignment of colors (e.g. $\{1, 2, \ldots\}$) to the
vertices of G so that no adjacent vertices get the same color. Equivalently, a
vertex coloring is a mapping $f: V \to \{1, 2, \ldots\}$ such that for any $(v_i, v_j) \in
E$, $f(v_i) \neq f(v_j)$.
When integer weights $W = (w_1, w_2, \ldots, w_n)$ are associated with the
vertices $(v_1, v_2, \ldots, v_n)$, we can extend the definition of a vertex coloring to its
weighted version. That is, a weighted vertex coloring assigns $w_i$ different
colors to vertex $v_i$ such that any adjacent vertices $v_i$ and $v_j$ will not share
the same color.
An edge coloring of G is defined as an assignment of colors (e.g. $\{1, 2, \ldots\}$)
to the edges of G so that no adjacent edges have the same color. Equivalently,
an edge coloring is a mapping $g: E \to \{1, 2, \ldots\}$ such that for all adjacent
$e_1 = (v_i, v_j)$, $e_2 = (v_i, v_k) \in E$, $g(e_1) \neq g(e_2)$. One can extend the definition of
an edge coloring to that of a weighted edge coloring when there are integer
weights associated with the edges.
From the above definitions, the edge coloring problem of G is the same
as the vertex coloring problem of L(G), hence a special case of the vertex
coloring problem. Some of the special structures of line graphs may be used
in finding an edge coloring. For example, in an edge coloring problem, every
clique (see below) of L(G) consists of edges of G that form a triangle or are
incident to the same vertex.
A total coloring of G, introduced by Behzad [29] and Vizing [470], is a
coloring of both the vertices and the edges of G such that adjacent pairs of
vertices, edges, or a vertex and an incident edge have different colors. Similar
to the edge coloring problem, one can transform a total coloring problem
of G into a vertex coloring problem of a total graph G'. The vertices of G'
correspond to the vertices or edges of G. Two vertices of G' are adjacent if
and only if their corresponding vertices, edges, or a vertex and an incident
edge are adjacent. There is a one-to-one correspondence between a vertex
coloring of G' and a total coloring of G.
In this chapter, we will concentrate on the unweighted vertex coloring
problem. To simplify our notation, all colorings will refer to unweighted
vertex colorings unless specified otherwise. Results on other coloring
problems will be referenced when appropriate.
If there exists a coloring of G using no more than k different colors, then
G has a k-coloring and G is k-colorable. The chromatic number $\chi(G)$ of G
is the smallest integer k for which G is k-colorable. Given a graph G and an
integer k, the graph coloring problem refers to the problem of determining
whether $\chi(G) \leq k$. Similarly, one can define the edge and total chromatic
numbers of a graph, and the edge and the total coloring problems.
An independent set, or a stable set, of G = (V, E) is a subset S of V
where no two vertices of S are adjacent. An independent set is maximal if it
is not a subset of any other independent set of G. A maximum independent
set, denoted by MIS, is an independent set of largest cardinality. The
independence (stable) number of G, denoted by $\alpha(G)$, is the number of vertices
of a MIS.
A clique, or a complete subgraph, of G is an independent set of $\bar{G}$.
Similarly, we can define a maximal clique, a maximum clique and the clique
number, $\omega(G)$, of G.
A k-coloring naturally induces a partition of V into k color classes such
that the members of each class are assigned the same color, i.e., they are
pairwise non-adjacent. Therefore, a partition of V into k independent sets
is equivalent to a k-coloring of G. We will use a stable set, a color class, or
simply a color interchangeably with an independent set.

1.2 Bounds for the Coloring Problem


The following bounds on the chromatic number of G are immediate results
from the above definitions:

• $\chi(G) \geq \omega(G)$.
Since all vertices of a maximum clique of G are pairwise adjacent, they
must have different colors in any coloring of G. For certain classes of
graphs this bound is tight (e.g. perfect graphs, where the equality
holds for all subgraphs of G). However, there are also graphs whose
chromatic numbers are strictly larger than their clique numbers (e.g.
odd cycles), or even as far away as one desires (e.g. the Mycielski
graphs, Korman [269]).

• $\chi(G) \geq n/\alpha(G)$, where n = |V|.
Since every color class is an independent set, its cardinality is at most
$\alpha(G)$.

Let us call $d(v) = |\{u \mid (u,v) \in E\}|$ the degree of $v \in V$ and let $\Delta =
\max_{v \in V} d(v)$ be the maximum degree of G = (V, E). Then, in addition to
the above simple lower bounds, other bounds on the chromatic number of a
graph of n vertices and m edges (see Korman [269]) include the following:

• $\chi(G) \geq n^2/(n^2 - 2m)$.

• $\chi(G) \leq 1 + (2m(n-1)/n)^{1/2}$.

• If G is not a clique or an odd cycle, $\chi(G) \leq \Delta$ (Brooks [61], Lovász
[301]).

Since every feasible coloring of G provides an upper bound on the
value of $\chi(G)$, it is not hard to find an upper bound for $\chi(G)$ of a given
graph. For example, given a graph of n vertices, one simple upper bound
can be obtained by applying a greedy heuristic to an ordered list $(v_1, \ldots, v_n)$
of V:

The vertices will be colored one at a time, following the given
order, and each vertex is colored by the smallest feasible color.

Despite the simplicity of this greedy heuristic, it is known that there
exists at least one ordering of the vertices such that the above heuristic
coloring is optimal (see section 4.1.1). However, finding such an ordering
does not seem easier than the coloring problem itself. Heuristics trying to
find a "good" ordering are abundant in the literature (see section 4).
Let $(v_1, \ldots, v_n)$ be an ordering of the vertices of G and $d_j(v_j)$ be the
degree of $v_j$ in the subgraph $G(v_1, \ldots, v_j)$ induced by $\{v_1, \ldots, v_j\}$. Then a
feasible coloring obtained by applying the above heuristic to $(v_1, \ldots, v_n)$
will have value bounded by $1 + \max_{1 \leq j \leq n} d_j(v_j)$. Hence, we have

$$\chi(G) \leq 1 + \max_{1 \leq j \leq n} d_j(v_j).$$
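A minimal sketch of this sequential greedy heuristic follows; the dictionary-of-neighbor-sets representation and the 5-cycle illustration are our own choices, not anything prescribed by the text.

```python
def greedy_coloring(adj, order):
    """A sketch of the sequential greedy heuristic: following `order`,
    color each vertex with the smallest color not used by an
    already-colored neighbor.  `adj` maps a vertex to its neighbor set.
    """
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# Illustration on a 5-cycle (chromatic number 3):
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(greedy_coloring(adj, range(5)))   # {0: 1, 1: 2, 2: 1, 3: 2, 4: 3}
```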

For the edge coloring problem, Vizing's theorem [469] shows that the
edge chromatic number is either $\Delta$ or $\Delta + 1$. When G is a bipartite graph
(i.e., $\chi(G) = 2$), the edge chromatic number is $\Delta$ (König [266]).
If we use $\chi(G)$, $\chi'(G)$, and $\chi''(G)$ to denote the vertex, the edge, and the
total chromatic numbers of a graph G, then it is easy to see that $\chi''(G) \leq
\chi(G) + \chi'(G)$. Other bounds on $\chi''(G)$ can be found in the literature, for
example [28, 89, 103, 274, 396, 397, 466]. A conjecture for the total chromatic
number asks whether $\chi''(G) \leq \Delta + 2$ (Chetwynd [89], Vizing [470]).

1.3 Problem Formulations


Similar to many combinatorial optimization problems, the graph coloring
problem has several mathematical programming formulations. Six such for-
mulations are presented in the following.
(F-1):

$$\min \sum_{k=1}^{n} y_k$$

s.t. $\sum_{k=1}^{n} x_{ik} = 1$, $\forall \; v_i \in V$ (1)
$x_{ik} + x_{jk} \leq 1$, $\forall \; (v_i, v_j) \in E$, $k = 1, \ldots, n$ (2)
$y_k \geq x_{ik}$, $\forall \; v_i \in V$, $k = 1, \ldots, n$ (3)
$y_k, x_{ik} \in \{0, 1\}$, $\forall \; v_i \in V$, $k = 1, \ldots, n$. (4)

In the above model, $y_k = 1$ if color k is used. The binary variables $x_{ik}$
are associated with vertex $v_i$: $x_{ik} = 1$ if and only if color k is assigned to
vertex $v_i$. Constraints (1) ensure that exactly one color is assigned to each
vertex. Constraints (2) prevent adjacent vertices from having the same color.
Constraints (3) guarantee that no $x_{ik}$ can be 1 unless color k is used. The
optimal objective function value gives the chromatic number of the graph.
Moreover, the sets $S_k = \{i \mid x_{ik} = 1\}$, for all k, comprise a partition of the
vertices into a (minimum number of) independent sets.
vertices into (minimum number of) independent sets.
(F-2):

$$\min \; \gamma$$

s.t. $x_i \leq \gamma$ (5)
$x_i - x_j - 1 \geq -n \delta_{ij}$, $\forall \; (v_i, v_j) \in E$ (6)
$x_j - x_i - 1 \geq -n(1 - \delta_{ij})$, $\forall \; (v_i, v_j) \in E$ (7)
$\delta_{ij} \in \{0, 1\}$, $x_i \in Z_+$, $\forall \; v_i, v_j \in V$

The value of $x_i$ indicates which color is assigned to $v_i$, for $i = 1, \ldots, n$.
Constraints (6) and (7) prevent two adjacent vertices from having the same
color. This can be seen by noting that if $x_i = x_j$, then no feasible assignment
of $\delta_{ij}$ will satisfy both (6) and (7). The optimal objective function value
equals the chromatic number of the graph under consideration.
The coloring problem can also be formulated as a set partitioning problem (Korman [269]). Let S_1, S_2, ..., S_t be all the independent sets of G. Let the rows of the 0-1 matrix A_S be the characteristic vectors of S_j, j = 1, ..., t. Define variables s_j and constants e_ij as follows:

s_j = 1 if S_j is a chosen color class, and s_j = 0 otherwise;

e_ij = 1 if vertex v_i ∈ S_j, and e_ij = 0 otherwise.

(F-3):

min Σ_{j=1}^{t} s_j

s.t. s A_S = 1,    (8)
s_j ∈ {0, 1}, j = 1, ..., t,

where s = (s_1, ..., s_t), and 1 = (1, ..., 1) is of dimension n.


(F-4):

min Σ_{j=1}^{t} s_j
s.t. Σ_{j=1}^{t} e_ij s_j = 1, i = 1, ..., n,    (9)
s_j ∈ {0, 1}, j = 1, ..., t.

The graph coloring problem can also be formulated as a special case of


the quadratic assignment problem (Pardalos and Wolkowicz [362]). Given
an integer r and a graph G = (V, E), the optimal objective function value
of the following problem is zero if and only if G is r-colorable.
(F-5):

min Σ_{k=1}^{r} Σ_{(i,j)∈E} x_ik x_jk

s.t. Σ_{k=1}^{r} x_jk = 1, j = 1, ..., n    (10)
x_jk ∈ {0, 1}, ∀ j, k.

In a paper by Karger et al. [249] a semidefinite optimization problem is constructed whose optimum is −1/(k − 1), where k is the smallest number such that a matrix k-coloring of G exists. The optimal solution provides a matrix k-coloring of G.
(F-6):

min α

s.t. q_ij ≤ α, if (v_i, v_j) ∈ E    (11)
q_ij = q_ji, ∀ i, j    (12)
q_ii = 1, ∀ i    (13)

where the matrix Q = {q_ij} is positive semidefinite.


In the above formulations, (F-1) and (F-2) have polynomial numbers of variables and constraints. (F-3) has an exponential number of variables (one for each independent set), and (F-4) is the same as (F-3), except its constraints are written with the transpose of the matrix in (F-3). (F-5) has a quadratic objective function with very simple constraints, and (F-6) uses the concept of a semidefinite program.
Although all these formulations are for the same graph coloring problem,
each formulation provides its unique perspective and distinctive advantages
in designing solution methods for the graph coloring problem. For example,
formulation (F-4) provides a convenient starting point for a column gen-
eration approach (Mehrotra and Trick [332]), while the linear relaxation of
(F-3) provides a tighter convex hull than that of (F-1) (Balas and Xue [20],
Grotschel et al. [191]).
Before we end this section, let us introduce a class of random graphs
that have been studied and tested extensively, especially for the coloring
problem. This class of random graphs is denoted by G_{n,p}, where n refers to the number of vertices and p refers to the "density". An instance of G_{n,p} is generated by creating n vertices and adding an edge between each pair of vertices independently with probability p. Numerous results
concerning the coloring problem on this class of graphs can be found in the
literature, for example, in [47, 148, 149, 160, 162, 190, 228, 251, 308, 316,
325, 324, 415, 416].
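
For concreteness, here is a minimal sketch of such a generator (the function name and interface are ours):

import random

def gnp(n, p, seed=None):
    # Each of the n(n-1)/2 possible edges is included independently
    # with probability p.
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]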

2 Complexity
In 1972 Karp [250] showed that the graph coloring problem is NP-complete.
Garey et al. [170] and Stockmeyer [429] strengthened this result by show-
ing that the k-colorable graph problem remains NP-complete for any fixed
k ≥ 3. Given the difficulty of the general graph coloring problem, many
researchers have developed polynomial-time approximation algorithms with
performance guarantees. The performance guarantee is the largest ratio,
taken over all graphs with n vertices, of the number of colors used by a
heuristic over the minimum number required.
The earliest such algorithm appeared in 1974 by Johnson [240] which, for any graph with n vertices, gives a performance guarantee of O(n/log n). It was not until 1983 that an improvement to the above guarantee was achieved: Wigderson [480] developed a procedure with a performance guarantee of O(n(log log n)²/(log n)²). Further improvement was achieved by Berger and Rompel [35], who provided a heuristic with a performance guarantee of O(n(log log n)³/(log n)³). Most recently, Halldórsson [205] gave an efficient heuristic with a performance guarantee of O(n(log log n)²/(log n)³).
In Karger et al. [249] a randomized polynomial time algorithm is presented that uses at most min{O(Δ^{1−2/k}), O(n^{1−3/(k+1)})} colors to color a k-colorable graph, where Δ is the maximum degree among all vertices. This not only gives the best known approximation ratio in terms of n but also gives the first non-trivial approximation results in terms of Δ. It is reasonable to believe that there exists some lower bound on the performance guarantee that can be achieved with a polynomial-time approximation algorithm. Indeed, as early as 1976, Garey and Johnson [169] showed that unless P = NP, no polynomial-time algorithm can have a performance guarantee of less than s·χ(G) colors for any constant s < 2.

Based on the work of Arora et al. and Arora and Safra [14, 15], Lund and Yannakakis [309] showed that for two constants c_1 and c_2, 0 < c_1 < c_2 < 1, it is NP-hard to color n^{c_1}-colorable graphs with n^{c_2} colors. In addition, it is shown that for every constant h there is a constant c_h such that it is NP-hard to color a c_h-colorable graph with h·c_h colors. Khanna et al. [254] proved that approximating the chromatic number of a graph to within n^ε, ε > 0, is NP-hard. More recently, Bellare and Sudan showed that it is NP-hard to approximate the chromatic number to within n^{1/10} [31].

Rather than examining worst-case results some researchers have con-


ducted probabilistic analyses to evaluate an algorithm's average perfor-
mance. It has been observed that random k-colorable dense graphs tend to be easy to color optimally: Kucera [283] and Turner [459] give polynomial-
time algorithms which almost surely optimally color random k-colorable
graphs (a property of a graph with n vertices is said to occur almost surely
if the probability of occurrence of this property approaches 1 as n goes to
infinity). Algorithms with probabilistic analysis of their performance can be
found in [4, 32, 44, 190, 282, 368, 482].

On the other hand, if a graph has some special properties, then its color-
ing problem may be solvable in polynomial time. For example, if a graph is
an interval graph, or a triangulated graph (Golumbic [185]), then its color-
ing problem, together with other closely related problems, can be solved in
polynomial time. For example, Gavril [172] proposed algorithms to solve the
coloring problem, the maximum clique problem, the minimum clique cover-
ing problem, and the maximum stable set problem of triangulated graphs.
Their weighted versions can also be solved in polynomial time (Balas and Xue [21], Golumbic [185], Rose et al. [395]). The coloring problem on graphs other than triangulated graphs may also be solved in polynomial time. For example, the graph coloring problem on perfect graphs is shown to be polynomially solvable by Grotschel et al. [191]. It should be noted that
the class of perfect graphs includes, among others, Meyniel graphs (Hertz
[215]), triangulated graphs and bipartite graphs.

3 Applications
Many practical problems can be modeled as graph coloring problems. The
general form of these models is a graph with vertices representing items of
interest and edges connecting pairs of items with an (un)desirable binary
relationship.

3.1 Timetabling and Scheduling


Scheduling problems often involve restrictions in which pairs of activities
cannot be performed simultaneously. For example, in scheduling courses at
a university, two courses taught by the same individual cannot be scheduled
at the same time. If the courses to be scheduled are represented by the
vertices of a graph and every pair of courses that cannot be scheduled at the
same time are connected by an edge, then a coloring of this graph provides
a feasible schedule of the courses. If the goal is to minimize the number of
time slots needed, then the problem is that of finding the chromatic number
of the graph (assuming each course takes the same amount of time).
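
As a toy illustration (course and instructor names invented), one can build such a conflict graph and reuse the greedy_coloring sketch given earlier to produce a feasible timetable:

courses = {"Calculus": "Smith", "Algebra": "Smith",
           "Physics": "Jones", "Statics": "Jones", "Drawing": "Kim"}
# Two courses conflict when the same instructor teaches both.
adj = {c: [d for d in courses
           if d != c and courses[d] == courses[c]] for c in courses}
slots = greedy_coloring(list(courses), adj)   # color index = time slot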
This kind of scheduling problem is also called the timetabling problem.
An introduction to the timetabling problems can be found, for example,
in the work of de Werra [116]. Timetabling problems have been studied
extensively by many researchers including [11, 59, 99, 119, 117, 116, 127,
134, 289, 344, 63, 404, 478, 491, 490]. Schmidt and Ströhlein [408] provide
an annotated bibliography for the timetabling problem.
Similar problems such as scheduling meetings in the New York State As-
sembly (Bodin and Friedman [45]), traffic lights scheduling (phasing) (Stof-
fers [430], Opsut and Roberts [352]), scheduling of tasks requiring the same
resources (Opsut and Roberts [352], Rafaeli et al. [373]), scheduling of fleet
maintenance (Golumbic [185], Opsut and Roberts [352]), scheduling of flights
(Feo and Resende [146]), and scheduling of municipal waste collection (Bel-
trami and Bodin [30], Tucker [463]) have also been formulated and studied via the graph coloring model.

3.2 Register Allocation


During the execution of a computer program, both the number of instruc-
tions and the execution time per instruction may be reduced by changing
memory operands to register operands. Since the number of hardware regis-
ters is limited and is usually less than the number of variables, it is necessary
to assign multiple variables to the same register. A register allocator, usually run near the end of compilation, is used to assign variables to hardware registers. The register allocator attempts to minimize the number of memory references.
For register allocation, a conflict graph G_C is constructed from the program code. Vertices represent variables. Two vertices are connected by an edge if they are in conflict, i.e., if one variable is used both before and after another variable within a short period of time. A coloring of the conflict graph is then a conflict-free assignment of registers to variables. If the chromatic number of G_C is no more than the number of available hardware registers, then a conflict-free assignment is possible. If the chromatic number of G_C is greater than the number of available hardware registers, then spill code must be added. Spilling a variable means keeping it in memory rather than in a register. In that case, vertices corresponding to the spilled variables are deleted from G_C, one at a time, until the chromatic number of G_C is equal to the number of available hardware registers. At this point, a (partial) assignment of the variables to the registers is found. Early work on the graph coloring model for the register allocation problem can be found in Chaitin [81] and Chaitin et al. [82]. Further work on this application appears in [17, 58, 93, 92, 154, 265, 346, 369, 509].

3.3 Frequency Assignment


When different communication links (users) are assigned either the same
frequencies or similar frequencies, interference among the links may occur.
In the telecommunications industry, the increasing demand for the number of different frequencies (channels) has not been met by the increasing number of usable frequencies. The goal of the frequency assignment problem is to assign different frequencies to users in such a way that communication among the links has no, or the least possible, interference. Graph coloring and
generalized graph coloring models have been useful for such problems. Two
of the simplest constraints in assigning frequencies to users are co-channel
constraints and adjacent channel constraints.
Co-channel constraints: Two transmitters cannot use the same frequency
during the same time period if they are located within a given distance
of each other. A geometric graph can be constructed where the vertices
represent locations of transmitters and two vertices are connected by an
edge if they are within the given distance of each other. Then the minimum
number of frequencies needed to ensure no interference among the users
equals the chromatic number of the geometric graph (Baybars [27], Hale
[203], and Metzger [334]).
Adjacent channel constraints: It is often the case that two transmitters
cannot use adjacent frequencies. This type of constraint necessitates a gen-
eralized graph coloring model. Namely, a list of taboo colors (frequencies)
is associated with different vertices. Incorporating this type of constraint
has motivated the development of generalized graph coloring problems (see
section 5).
The pervasiveness of mobile and wireless communications in the world
today has made the graph coloring problem even more important. It has
become a critical part of the design step in the creation of mobile radio
telephone systems. Extensive research has been done with respect to this
application, see [27, 109, 108, 166, 167, 165, 178, 203, 230, 286, 290, 310,
352, 367, 366, 378, 386, 384, 407, 410, 412, 411, 421, 477, 511, 510].

3.4 Printed Circuit Board Testing

In Garey and Johnson [169] an application of graph coloring to printed circuit board testing is described. In printed circuit board construction, connections
are made by joining electrically common nodes into nets. Shorts occur if
careless soldering results in two nets being erroneously connected. One way
of testing for shorts is to check all pairs of nets. However, nets that are
some distance apart need not be tested. Let G be a graph with vertices
representing the nets and edges representing adjacent nets. Then an efficient
test for shorts begins by coloring G with k colors (in [169] it is shown that k =
12, 8, or 5 colors are sufficient depending on one's definition of adjacency).
A yoke is then constructed connecting each net in a particular color class.
Now, since adjacent nets are in different yokes (color classes), testing only
the k(k − 1)/2 pairs of yokes will uncover any shorts.

4 Algorithms for the Graph Coloring Problem

During the last few decades, extensive research on the solution of a wide class
of graph coloring problems has resulted in a variety of exact, heuristic, and
approximate methods. In this section we describe some of the well known
algorithms.

4.1 Heuristic Algorithms


Extensive work has been presented in the literature on solving the vertex coloring problem on general graphs using heuristic methods. For example, some of the recent work can be found in the 1993 DIMACS Implementation Challenge [112, 183, 291, 337]. The objective of these heuristics is to find good colorings in a reasonable amount of time. Two basic approaches have dominated this category, especially for solving coloring problems on large random graphs: the Successive Augmentation Algorithms and the Iterative Improvement Algorithms.

4.1.1 Sequential Greedy Coloring Heuristics


The sequential greedy coloring heuristics (SGCH) extend a partial (possibly empty) coloring by successively augmenting the number of colored vertices (coloring an uncolored vertex). In SGCH, once a color is assigned to a vertex, it will not change.
Typically, SGCH will first order the vertices according to a certain criterion (e.g., in decreasing order of their degrees, Welsh and Powell [478]). Then it will color each vertex, in the order specified, with the smallest feasible color. The quality of the colorings provided by SGCH depends on the initial ordering of the vertices. For a graph of n vertices, there exist n! orderings of the vertices of V.
Various ordering schemes have been proposed and tested. For example,
Welsh and Powell [478] have sorted the vertices in order of decreasing degree
and called their heuristic the Largest First (LF).
Since SGCH applies the greedy coloring heuristic to an a priori ordering (v_1, ..., v_n) of the vertices of G, a bound for the resulting coloring can be derived; this bound is 1 + max_{1≤j≤n} d_j(v_j), where d_j(v_j) is the degree of v_j in the subgraph G_j = G(v_1, ..., v_j) induced by {v_1, ..., v_j}. To minimize this value, the ordering of the vertices (v_1, ..., v_n) should satisfy

d_j(v_j) = min_{v ∈ G_j} d_j(v), j = n, n − 1, ..., 1.
The Smallest Last (SL) ordering of Matula et al. [321] produces such an ordering of the vertices through the following two steps (a code sketch follows):

1. Let v_n be a vertex of minimum degree in G = G_n.

2. For j = n − 1 down to 1, let v_j be a vertex of minimum degree in G_j.
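
A minimal sketch of the SL ordering, reusing the adjacency-list representation from the earlier examples (the function name is ours):

def smallest_last_order(adj):
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    removed = []
    while remaining:
        # v_j: a vertex of minimum degree in the current subgraph G_j
        v = min(remaining, key=lambda u: deg[u])
        removed.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return removed[::-1]   # (v_1, ..., v_n)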



In both LF and SL, the vertices are pre-ordered before the application of
SGCH. A simple and well-known SGCH that does not pre-order the vertices
is Brelaz's Dsatur heuristic [55]. In Dsatur, the vertex to be colored next
is the one that has the maximum saturation degree, i.e. the vertex whose
colored neighbors contain the largest number of different colors. Dsatur will
color a vertex with the minimal feasible color.
Another example of a coloring heuristic that does not pre-order the vertices is proposed by Leighton [289]. It is called the Recursive Largest First (RLF) heuristic. RLF generates color classes one at a time, and, similar to Dsatur, the vertex to be colored next is determined during the coloring process. To generate a color class, RLF introduces two sets V_R and U, which form a partition of the current uncolored vertices. Initially U = ∅ (V_R contains all uncolored vertices). The members of U are those uncolored vertices that cannot become a member of the color class under construction. Color classes are generated in this way until every vertex belongs to some color class. The following two steps are used in RLF to generate a color class S_j (a code sketch follows the two steps):

Step 1 Choose a vertex v with the maximum degree in the subgraph G(V_R) induced by V_R. Place v in S_j and move every u ∈ V_R with (u, v) ∈ E from V_R to U.

Step 2 While V_R ≠ ∅ do:

Choose a vertex v ∈ V_R that has the largest number of neighbors in U and add v to S_j.
Move all neighbors of v from V_R to U.
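
A compact sketch of this class-generation loop (the set names follow the description above; the function name is ours):

def rlf_color_class(adj, uncolored):
    VR, U, Sj = set(uncolored), set(), set()
    # Step 1: seed the class with a maximum-degree vertex of G(VR).
    v = max(VR, key=lambda w: sum(1 for u in adj[w] if u in VR))
    while True:
        Sj.add(v)
        VR.discard(v)
        for u in adj[v]:
            if u in VR:          # neighbors of v can no longer join Sj
                VR.remove(u)
                U.add(u)
        if not VR:
            return Sj
        # Step 2: pick the vertex with the most neighbors already in U.
        v = max(VR, key=lambda w: sum(1 for u in adj[w] if u in U))

Calling rlf_color_class repeatedly, removing each returned class from the set of uncolored vertices, yields a complete coloring.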

The implementation of the above SGCH is relatively simple. For example, efficient implementations can be found in Syslo et al. [435] and Morgenstern and Shapiro [340]. Basically, these SGCH find a coloring of G in a single try. If the ordering of the vertices is not carefully designed, the quality of the solution may suffer.
There is no known ordering scheme that is superior to all other ordering schemes. However, it is known that for every graph G, there always exists an ordering of the vertices such that the application of SGCH to the ordering will find an optimal coloring. To see this, let an optimal coloring be a partition S_1, ..., S_t of V, where the S_j's are color classes. If SGCH is applied to any ordering of the form (S_1, ..., S_t), where the members of each S_j are arbitrarily ordered, it will generate an optimal coloring of G.

It is unlikely that such an ordering can be found efficiently. However, research on how one may start from a given ordering of V and gradually move towards an "optimal ordering" is available (for example, see the work of Biggs [39], Chams et al. [83], and White [479]).
Using an idea of Johri and Matula [243], Johnson et al. [241] have proposed a successive augmentation algorithm called XRLF, which is a generalized version of the RLF heuristic with several control parameters. XRLF first constructs many independent sets to be used as potential candidates for color classes. Then XRLF iteratively chooses as the next color class an independent set whose removal minimizes the edge density of the remaining subgraph. When the number of uncolored vertices becomes relatively small, the algorithm switches to an exhaustive search. The algorithm has been tested on random graphs G_{n,p} with n ≤ 1000 and p = 0.5. The test results in [241] seem to indicate that XRLF spends a considerable amount of CPU time and produces promising results. However, on other randomly generated graphs, its results are not as good as those of: a) Morgenstern's [339] iterative improvement algorithm for p ≤ 0.5, and b) Dsatur (Brelaz [55]) for other p values.

4.1.2 Iterative Improvement Heuristic

Let {S_1, S_2, ..., S_r} be a coloring of G (the S_i's are color classes). We can order the vertices of V so that the members of each color class are in consecutive positions. Such an ordering will have the form (S_{i_1}, S_{i_2}, ..., S_{i_r}) (a permutation of {S_1, ..., S_r}). The ordering of the members inside each S_{i_j} is arbitrary. The total number of such orderings is |S_1|! · |S_2|! · ... · |S_r|! · r!.

The iterated greedy coloring heuristic (IG) of Culberson [111] applies SGCH iteratively to such vertex orderings, where the vertex ordering for the next application of SGCH is based on the graph coloring found in the current iteration. The idea of IG originates from the observation that the coloring obtained by applying SGCH to such a vertex ordering will not be worse than the previous one. In [111], various ordering schemes of (S_{i_1}, ..., S_{i_r}) have been proposed and tested. It was found that, in general, mixing different ordering schemes is more effective than a pure ordering scheme. A possible reason for this observation is that the mixing of ordering schemes allows IG to search a "larger" neighborhood than that of a pure ordering scheme; hence the solution quality tends to be better.
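
As an illustration, here is a minimal sketch of one IG iteration scheme (we randomly permute the classes; Culberson's mixes of reordering rules are richer, and greedy_coloring is the sketch given earlier):

import random

def iterated_greedy(order, adj, rounds=100):
    best = greedy_coloring(order, adj)
    for _ in range(rounds):
        classes = {}
        for v, c in best.items():
            classes.setdefault(c, []).append(v)
        groups = list(classes.values())
        random.shuffle(groups)        # one of the many class orderings
        order = [v for g in groups for v in g]
        cand = greedy_coloring(order, adj)
        if len(set(cand.values())) <= len(set(best.values())):
            best = cand               # the new coloring is never worse
    return best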

4.1.3 k-Interchange
Another improvement to a given coloring has been that of interchanges. What an interchange does is to switch previously colored vertices to different color classes. If a color class can be removed after the switching process, then a better coloring is found. Matula et al. [321] have proposed the procedures LFI and SLI, which apply interchanges to the heuristic colorings found by applying SGCH to vertex sequences in Largest-First (LF) and Smallest-Last (SL) orders [321].

4.1.4 Simulated Annealing and Tabu Search


One main reason for the difficulty of many combinatorial optimization prob-
lems is the massive number of feasible, or even optimal, solutions. When an
(implicit) exhaustive search method is applied to such a problem, it could
be very time consuming if the structural property of the solution space is
not fully exploited. When solution time is of concern, partial search of the
feasible solution space becomes a natural and practical alternative to the
(implicit) exhaustive search. That is, it may be worthwhile to trade guaranteed optimality for improved solution time.
One of the most important questions in designing a partial search method
concerns what part of the solution space to search. In most cases, the
characteristics of the feasible solution space are not known before the search
starts, or even after the search is complete. Therefore, most traditional
search methods are designed to improve a current solution, i.e., to move
from a feasible solution to a better one in the solution space until there is
no better solution in a "neighborhood". One potential trap of this strategy
is when the solution space has many locally optimal points, i.e., points x* that are optimal only within a small "neighborhood" of x*. Once the search moves into such a "neighborhood", it will stay there; hence, the search fails to find an optimal solution.
To avoid these kinds of traps, many recent search methods use some
mechanism to "escape" (jump) out of a "neighborhood". Such jumping may
temporarily deteriorate the objective value. However, it will lead the search
to different parts of the solution space, i.e., outside a particular "neighbor-
hood".
In the past ten years, many heuristic search methods have been pro-
posed. Simulated Annealing (SA) and Tabu Search (TS) are two heuristic
techniques designed to avoid local traps and cycling. They enable the search
of various parts of the solution space. In SA, random factors are introduced
to let the search jump out of the "neighborhood" of a locally optimal point
and to prevent cycling, while traditional TS uses its own mechanisms for
the same purpose. Efforts to combine these approaches are emerging.
SA was first introduced by Metropolis et al. [333] to simulate the cool-
ing process (annealing) of solid material after being heated over its melting
point. The structural properties of the cooled solid material can be con-
trolled by the cooling temperature during the annealing process. Kirkpatrick et al. [261] suggested that the above simulation process could be used to
search for a better solution of an optimization problem. The idea is to in-
troduce some (controlled) random factors (as temperature in an annealing
process) to move the search into different parts of the solution space.
TS was first introduced by Glover [182] to imitate the memory process.
TS uses a list (tabu, or tabu list) to record certain "recently visited" feasible
solutions (states). It forbids the re-visiting of the states on the list. By
controlling the tabu list, TS can influence the degree of diversification and
the intensity of the search. Four key parameters of a TS method are recency, frequency, quality, and influence (Reeves [380]).
Several SA based heuristics have been proposed for the graph coloring
problem. For example, Chams et al. [83] have presented a pure simulated
annealing heuristic and a heuristic that combines SA and RLF. This com-
bined heuristic, called SA-RLF, first uses RLF to construct color classes
until the number of uncolored vertices reaches a specific level (a user spec-
ified parameter). Once this level is reached, the SA-RLF switches to color
the remaining graph by a simulated annealing heuristic. SA-RLF heuristic
has been tested on different families of random graphs with up to 1,000
vertices. The computational results from SA-RLF were compared with the
results from other coloring heuristics such as Dsatur (Brelaz [55]) and RLF
(Brown [63]). According to the results contained in Chams et al. [83],
SA-RLF seems to give the best results in terms of CPU time and solution
quality.
Johnson et al. [241] have also applied a simulated annealing heuristic
to graph coloring and other combinatorial optimization problems. More
recently, Klimowicz and Kubale [262] presented experimental results of TS and SA heuristics on the graph coloring problem. They compared their results with those from the sequential coloring heuristic with interchange (SLI) (Matula et al. [321]) on random graphs G_{n,p} with up to 500 vertices and density up to 0.75.
Finding heuristic coloring solutions by Tabu Search methods can be
found in de Werra [115], Hertz [215], and Hertz and de Werra [217].

4.2 Exact Algorithms


As we have seen in section 2, the general graph coloring problem is NP-complete (Karp [250]), and the problem of determining the chromatic number of an arbitrary graph within a worst-case ratio of less than O(n^ε), ε > 0, is also NP-hard (Garey and Johnson [169], Khanna et al. [254]). So it is reasonable to believe that no polynomial-time algorithm exists for the exact graph coloring problem. The most popular and efficient methods for solving the graph coloring problem exactly appear to be based on implicit enumeration. The first algorithm of this approach is due to Brown [63]. Brown's algorithm colors the vertices sequentially (the forward phase) according to a pre-specified order, similar to that of SGCH. To find an alternative feasible coloring, it uses backtracking to find the starting point of the next forward phase.
Let us consider a graph G = (V, E). Initially, a feasible coloring using ub colors is obtained after the vertices of G are ordered by some procedure (e.g., v_1, v_2, ..., v_n). Based on the ordering, each vertex is assigned the lowest feasible color. Using ub as an upper bound, the algorithm backtracks in the reverse order of (v_1, ..., v_n) until it encounters a vertex v_i that is recolorable by an alternative feasible color (< ub) that has not been used for v_i. It colors v_i with the smallest such alternative feasible color and proceeds to a new forward phase, up to either a vertex v_j that requires the color ub, or to v_n, which can then be assigned a color < ub. In the latter case, a better coloring is found and ub can be updated. In both cases, the algorithm backtracks with the (new) bound ub. The algorithm terminates when the backtracking reaches vertex v_1.
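
The following is a compact recursive sketch in the spirit of this implicit enumeration (simplified: plain recursion replaces the explicit forward/backtracking phases, and a new color may only be one larger than the largest color used so far; the names are ours):

def exact_coloring(order, adj):
    n = len(order)
    best = {"ub": n + 1, "coloring": None}
    color = {}

    def forward(pos):
        if pos == n:                          # full coloring better than ub
            best["ub"] = max(color.values(), default=-1) + 1
            best["coloring"] = dict(color)
            return
        v = order[pos]
        used = {color[u] for u in adj[v] if u in color}
        top = max(color.values(), default=-1)
        # Try feasible colors strictly below the current bound ub.
        for c in range(min(best["ub"] - 2, top + 1) + 1):
            if c not in used:
                color[v] = c
                forward(pos + 1)
                del color[v]

    forward(0)
    return best["coloring"]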
One improvement to this implicit enumeration algorithm is to use a "look-ahead" step in the forward phase. In this variant, the next vertex to color is decided dynamically during the forward phase. Also, the color used to color a vertex is not necessarily the smallest feasible color.
Modified versions of Brown's algorithm can be found in Brelaz [55] and
Korman [269]. There, the concept of "dynamic reordering" has been incor-
porated in the original algorithm in order to improve its efficiency. The main
idea of this "dynamic reordering" is to choose the next vertex (to color) as
the one with the smallest number of alternative feasible color assignments.
When coloring a vertex, the algorithm chooses the color that is inadmissible
to the largest number of uncolored vertices. By doing this, the search tree
size is reduced dramatically ([55, 269]).


Kubale and Jackowski [279] presented a generalized implicit enumeration
algorithm for graph coloring based on Brown's approach [63]. Their method
was tested on random graphs G_{n,p} with n = 10(10)60 vertices and graph density p = .1(.2).9, where a(b)c indicates the starting value a, ending value c, and step b of the graph sizes and densities of the tested problems.
Their computational results were presented and compared with those of
Christofides's [94], Brelaz's [55], and Korman's [269].
Campers et al. [74] presented another implicit enumeration algorithm and compared it with the exact coloring algorithms of Wang [476] and Brown [63], on 80 different "Leighton" graphs with up to 100 vertices and density up to 0.8.
In 1995, Mehrotra and Trick [332] proposed an exact algorithm. Their algorithm uses formulation (F-4) as the basis and combines a column generation approach (for the linear relaxation of (F-4)) with branch and cut. Comparing their computational results with a Dsatur based exact algorithm on various classes of graphs, their algorithm seems promising. Furthermore, their results seem to suggest that the graph coloring problem on G_{n,p} with up to 75 vertices can be solved routinely.
Recently, Sewell [414] proposed another exact algorithm based on the branch and bound method. It uses a maximum clique of the graph and heuristic feasible colorings to serve as lower and upper bounds for the branch and bound method. An improved branching rule is also introduced to cut down the number of subproblems at a given node of the search tree. The computational results reported in [414] seem to confirm that graph coloring problems with up to 75 vertices can be solved routinely.
As for the exact solution of the minimum weighted coloring problem, especially when the weights are integers, there is a transformation that changes such a problem into an unweighted coloring problem. Hence all the above methods can be applied to solve the minimum integer weighted coloring problem. However, the tradeoff is an increase of the graph size from n to Σ_{v∈V} w_v, i.e., by a factor of the average vertex weight, where w_v is the weight of v ∈ V and n = |V| is the size of the original graph. In [495], Xue proposed an algorithm specially designed for the weighted coloring problem. Computational results in [495] seem to indicate the advantage of solving the minimum integer weighted coloring problem directly instead of using the above transformation.

5 Generalized Graph Coloring


From the previous sections it is evident that many practical problems can
be formulated as graph coloring problems. These applications have also led
to interesting generalizations of the graph coloring problem.
There seem to be two camps of generalizations of the graph coloring problem: those that change the internal structure of the color classes, and those that change the external relationships between color classes. We will call these generalizations "internal" and "external", respectively.
1. Internal Generalization: a color class does not have to be a stable set. Instead, it must have a certain structural property (see below). There is no requirement on the relationships among different color classes.
2. External Generalization: each color class is still a stable set. How-
ever, additional relationships among color classes are imposed.

5.1 Internal Generalizations


This class of generalized graph coloring problems changes the requirement on the internal structure of a color class. Instead of requiring a color class to be a stable set, the subgraph induced by a color class should have a specific property P. That is, the problem is to partition G into subgraphs G_1, ..., G_k such that all G_i's have property P. The objective is to minimize k (the number of the G_i's). This class of generalized coloring problems looks like a partition problem where "color" is replaced by a general property P. Clearly, the ordinary graph coloring problem is the special case of this problem where the property P requires all G_i's to be stable sets.
Two such interesting generalizations are the (r, d)*-coloring and the (r, k)^−-coloring. In an (r, d)*-coloring, a vertex can be adjacent to at most d vertices of the same color. If a graph G can be colored with r colors under this relaxation, then G is (r, d)*-colorable (see [8, 84, 209, 293, 492]). When a color class is allowed to contain a path of length no more than k and G can be colored (under this relaxation) with r colors, then G is (r, k)^−-colorable. Properties of (r, d)*- and (r, k)^−-colorable graphs can be found, for example, in [7, 8, 12, 84, 107, 157, 293, 492].

5.2 External Generalization


A few examples of the external generalizations of the graph coloring problem are briefly introduced here. For more details, please consult the surveys by Roberts [384], Gionfriddo [179], and Neufeld [343]. In particular, Roberts [384] gives a detailed discussion of generalized graph colorings in conjunction with recent results and applications.

5.2.1 List Colorings


Let R(v) be a restricted list of colors a vertex v ∈ V can have. Then an R-list coloring is a coloring where every vertex v is colored by a member of R(v). List colorings were first proposed by Erdos et al. [142] with |R(v)| = k. In particular, a graph G is said to be k-choosable if it can be R-list colored for any assignment of k elements to each R(v). The 2-choosable graphs are characterized by Erdos et al. [142]. Mahadev et al. [312] have obtained some results for 3-choosable graphs. However, the characterization of 3-choosable graphs is not complete. Two conjectures made by Erdos et al. [142] are:

• Every planar graph is 5-choosable.

• There is a planar graph which is not 4-choosable.

List colorings are important in the channel assignment problem when


acceptable channels are specified.
A closely related coloring problem is the R-amenable coloring problem. A graph G is said to be R-amenable if each vertex v can be colored using a color not in R(v). Brown et al. [64] and Mahadev and Roberts [311] have studied this class of coloring problems.

5.2.2 T-Colorings
Let T be a set of non-negative integers with 0 ∈ T. A T-coloring of G is an assignment of a positive integer f(v) to each vertex v ∈ V such that |f(v_i) − f(v_j)| ∉ T for any adjacent vertices v_i and v_j. When T = {0}, a T-coloring is just an ordinary coloring. The T-chromatic number, χ_T(G), is the minimum number of colors for which G is T-colorable. The "span" of a T-coloring of G is max_{(v_i,v_j)∈E} |f(v_i) − f(v_j)|, and the span of G, denoted by sp_T(G), is the minimum span among all possible T-colorings of G.

The T-coloring problem was first introduced by Hale [203] and has subsequently been studied by Cozzens and Roberts [108], Cozzens and Wang [109], Raychaudhuri [376, 378], Roberts [382, 383], Tesman [443], and Wang [477]. For an excellent survey of work on T-colorings, see Roberts [384].

An early application of the T-coloring problem comes from the channel assignment problem, where the vertices of G = (V, E) are transmitters or radio stations, and the edges indicate possible interference. f(v) of a T-coloring corresponds to the channel assigned to the transmitter v. Hence, T is a set of disallowed channel separations between pairs of possibly interfering transmitters.

5.2.3 Set Colorings

The set coloring problem was introduced by Roberts [386] in terms of scheduling and channel assignment problems. It considers the case where a set S(v) of colors, satisfying a property A, should be assigned to each v ∈ V. For example, in a channel assignment problem, some transmitters may operate on more than one frequency; in a scheduling problem, some tasks may require more than one time period. In these cases, the property A imposed on S(v) is simply |S(v)| = k_v > 1. It is easy to show that the weighted integer coloring problem (defined in section 1) is a special case of the set coloring problem. What separates the set coloring problem from the weighted coloring problems is that property A can be more general.
Formally speaking, a set coloring is a pair (S, A), where S is a function and A is a property. S assigns a set of colors S(v) to each v ∈ V such that

S(u) ∩ S(v) = ∅ for all (u, v) ∈ E,

and the sets S(v) satisfy property A.


Two measurements may be of interest for a set coloring: the order of a set coloring and the score of a set coloring. The former refers to the cardinality of ∪_v S(v), and the latter refers to the sum of all |S(v)|, i.e., Σ_{v∈V} |S(v)|. In some cases, one may want to minimize the order, while in other cases, one may want to maximize the score for a given order. In the channel assignment problem, the order and the score correspond to the bandwidth and the actual number of channels used subject to a given bandwidth, respectively.
Depending on the definition of property A, the set coloring problems can be very different. Several variants have been studied under the names of n-tuple coloring [178, 426, 387, 234, 352, 350, 376, 386], consecutive coloring [121, 352, 350, 386], I-coloring [352, 351], J-coloring [376, 384], and D-coloring [455].

5.2.4 Combined External Generalizations


In addition to the above external generalizations of the coloring problem, many researchers are interested in combinations of the above generalized coloring problems. For example, the Set T-colorings problem, and in particular the n-tuple T-colorings problem, have been studied by Tesman [443] and by Füredi, Griggs, and Kleitman [161]. Hale [203], Roberts [382], and Tesman [443] combined the idea of list colorings with that of T-colorings into the List T-colorings problem. The Set List T-colorings problem was suggested by Roberts [384].

6 Concluding Remarks
In this chapter, we presented an overview of recent results on graph coloring with an extensive bibliographic survey. Graph coloring is an important problem in computer science and operations research. Although many algorithms and heuristics have been proposed for solving graph coloring problems, only medium size problems can be solved to optimality. It is also difficult to compute a provably good approximate solution. Solving the coloring problem for large graphs remains a computational challenge.

References
[1] H.L. Abbott and B. Zhou, The Star Chromatic Number of a Graph, Journal of Graph Theory, 17 (1993) pp. 349-360.

[2] S.B. Akers, Jr., Fault Diagnosis as a Graph Coloring Problem, IEEE Transactions on Computers, C-23(4) (1974) pp. 706-712.

[3] J. Akiyama and J. Urrutia, A Note on Balanced Colorings for Lattice Points, Discrete Mathematics, 83 (1990) pp. 123-126.

[4] N. Alon and N. Kahale, A Spectral Technique for Coloring Random 3-Colorable Graphs, Proceedings of the Twenty-Sixth Annual ACM Symposium on the Theory of Computing, (1994) pp. 346-355.

[5] N. Alon and M. Tarsi, A Note on Graph Coloring and Graph Polynomial, Journal of Combinatorial Theory Series B, 70(1) (1997) pp. 197-201.

[6] B. Andrasfai, P. Erdos, and V.T. Sos, On the Connection Between Chro-
matic Number, Maximal Clique and Minimal Degree of a Graph, Dis-
crete Math. (The Netherlands), 8 (1974) pp. 205-218.
[7] J.A. Andrews and M.S. Jacobson, On a Generalization of Chromatic Number and Two Kinds of Ramsey Numbers, Ars Combinatoria, 23 (1987) pp. 97-102.
[8] J.A. Andrews and M.S. Jacobson, On a Generalization of Chromatic
Number, Congressus Numerantium, 47 (1985) pp. 33-48.
[9] K. Appel and W. Haken, Every Planar Map is Four Colorable: Part 1, Discharging, Illinois Journal of Mathematics, 21 (1977) pp. 429-490.
[10] K. Appel, W. Haken, and J. Koch, Every Planar Map is Four Colorable: Part 2, Reducibility, Illinois Journal of Mathematics, 21 (1977) pp. 491-567.
[11] J.S. Appleby, D.V. Blake, and E.A. Newman, Techniques for Produc-
ing School Timetables on a Computer and Their Application to Other
Scheduling Problems, The Computer Journal, 3 (1961) pp. 237-245.
[12] D. Archdeacon, A Note on Defective Colorings of Graphs in Surfaces,
J. Graph Theory, 11 (1987) pp. 517-519.
[13] E. Arjomandi, An Efficient Algorithm for Colouring the Edges of a Graph with Δ+1 Colors, Technical Report, York University, Department of Computer Science, 1980.
[14] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof Verification and Hardness of Approximation Problems, Proc. 33rd IEEE Annual Symp. on Foundations of Computer Science, (1992) pp. 14-23.
[15] S. Arora and S. Safra, Probabilistic Checking of Proofs: A New Characterization of NP, Proc. 33rd IEEE Annual Symp. on Foundations of Computer Science, (1992) pp. 1-13.
[16] B. Aspvall and J.R. Gilbert, Graph Coloring using Eigenvalue Decomposition, SIAM Journal on Algebraic and Discrete Methods, 5(4) (1984) pp. 526-538.
[17] L. Avra, Allocation and Assignment in High-Level Synthesis for Self Testable Data Paths, Digest of Papers, International Test Conference, IEEE, (1992) pp. 463-472.

[18] L. Babel and G. Tinhofer, Hard-to-Color Graphs for Connected Sequential Colorings, Discrete Applied Mathematics, 51(1-2) (1994) pp. 3-25.

[19] E. Balas, S. Ceria, G. Cornuejols, and G. Pataki, Polyhedral Methods for the Maximum Clique Problem, in Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, Johnson and Trick (eds.), 26 (1996) pp. 11-28.

[20] E. Balas and J. Xue, Weighted and Unweighted Maximum Clique Al-
gorithms with Upper Bounds from Fractional Coloring, Algorithmica 15
(1996) pp. 397-412.

[21] E. Balas and J. Xue, Minimum Weighted Coloring of Triangulated


Graphs, with Application to Maximum Weight Vertex Packing and
Clique Finding in Arbitrary Graphs, SIAM J. Comput. Vol. 20, No.
2 (1991) pp. 209-221.

[22] P. Baldi, On a Generalized Family of Colorings, Graphs and Combinatorics, 6 (1990) pp. 95-110.

[23] P. Baldi and E.C. Posner, Graph Coloring Bounds for Cellular Radio,
Computers and Mathematics with Applications, 19 (1990) pp. 91-97.

[24] J. Bang-Jensen and P. Hell, On the Effect of Two Cycles on the Com-
plexity of Colouring, Discrete Applied Mathematics, 26 (1990) pp. 1-23.

[25] R. Bauernoppel and H. Jung, Fast Parallel Vertex Coloring, in Fundamentals of Computation Theory, FCT '85, L. Budach (ed.), Cottbus, GDR, (1985) pp. 28-35.

[26] B. Bauslaugh, The Complexity of Infinite H-Coloring, Journal of Com-


binatorial Theory Series B, 61(2) (1994) pp. 141-154.

[27] I. Baybars, Optimal Assignment of Broadcasting Frequencies, European


Journal of Operations Research, 9 (1982) pp. 257-263.

[28] M. Behzad, The Total Chromatic Number of a Graph, a Survey, in


Combinatorial Mathematics and its Applications, (ed. D.J.A. Welsh),
Academic Press, New York, (1971) pp. 1-8.

[29] M. Behzad, Graphs and Their Chromatic Numbers, Ph.D. Thesis,


Michigan State University, 1965.

[30] E. Beltrami and L. Bodin, Networks and Vehicle Routing for Municipal
Waste Collection, Networks, 4 (1973) pp. 65-94.

[31] M. Bellare, and M. Sudan, Improved Non-approximability Results,


Proc. Twenty sixth Ann. ACM Symp. on Theory of Comp., ACM (1994),
pp. 184-193.

[32] E.A. Bender and H.S. Wilf, A Theoretical Analysis of Backtracking in the Graph Coloring Problem, Journal of Algorithms, 6(2) (1985) pp. 275-282.

[33] C. Berge, Algorithms and Extremal Problems for Equipartite Colorings in Graphs and Hypergraphs, volume 2 of Annals of Discrete Mathematics, North-Holland Publishing Co. (1978) pp. 149-150.

[34] C. Berge and P. Duchet, Strongly Perfect Graphs, Annals of Discrete


Mathematics, 21 (1984) pp. 57-61.

[35] B. Berger and J. Rompel, A Better Performance Guarantee for Approximate Graph Coloring, Algorithmica, 5(4) (1990) pp. 459-466.

[36] K.A. Berman and J.L. Paul, The Bounded Chromatic Number for Graphs of Genus g, Journal of Combinatorial Theory Series B, 56 (1992) pp. 183-196.

[37] C. Bernardi, On a Theorem about Vertex Colorings of Graphs, Discrete


Math., 64 (1987) pp. 95-96.

[38] D. Bernstein, D.Q. Goldin, M.C. Golumbic, H. Krawczyk, Y. Mansour, I. Nahshon, and R.Y. Pinter, Spill Code Minimization Techniques for Optimizing Compilers, Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, Sigplan Notices, 24(6) (1989) pp. 258-263.

[39] N. Biggs, Some Heuristics for Graph Coloring, Graph Colorings, R.


Nelson and R. Wilson (eds.), Pitman Research Notes in Mathematics
Series, Wiley (1990) pp. 87-96

[40] J.R. Bitner and E. Reingold, Backtrack Programming Techniques,


Communications of the ACM, 18 (1975) pp. 651-656.

[41] A. Blum, New Approximation Algorithms for Graph Coloring, Journal of the Association for Computing Machinery, 41(3) (1994) pp. 470-516.

[42] A. Blum, Some Tools for Approximate 3-Coloring, in 31st Annual Sym-
posium on Foundations of Computer Science, (1990) pp. 554-562.

[43] A. Blum, An O(n^{0.4})-Approximation Algorithm for 3-Coloring (and Improved Approximation Algorithms for k-Coloring), in Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, (1989) pp. 535-542.

[44] A. Blum and J.H. Spencer, Coloring Random and Semi-random k-


Colorable Graphs, Journal of Algorithms, 19 (1995), 204-234.

[45] L.D. Bodin and A.J. Friedman, Scheduling of Committees for the New
York State Assembly, Technical Report USE No. 71-9, Urban Science
and Engineering, State University of New York, Stony Brook, 1971.

[46] B. Bollobas, The Chromatic Number of Random Graphs, Combinator-


ica, 8 (1988) pp. 49-55.

[47] B. Bollobas, Random Graphs, Academic Press, 1985.

[48] B. Bollobas, Chromatic Number, Girth, and Maximal Degree, Discrete


Mathematics, 24 (1978) pp. 311-314.

[49] B. Bollobas and A.J. Harris, List Colourings of Graphs, Graphs and
Combinatorics, 1 (1985) pp. 115-127.

[50] B. Bollobas and A. Thomason, Random Graphs of Small Order, in


Random Graphs '83, 28 of Annals of Discrete Mathematics, Section
6: "Colouring Large Random Graphs". North-Holland Publishing Co.,
(1985) pp. 47-97.
[51] J.A. Bondy, Bounds for the Chromatic Number of a Graph, Journal of
Combinatorial Theory, 7 (1969) pp. 96-98.

[52] O.V. Borodin and A.V. Kostochka, On an Upper Bound of a Graph's


Chromatic Number, Depending on the Graph's Degree and Density, J.
Combinatorial Theory (B), 23 (1977) pp. 247-250.

[53] A.A. Borovikov and V.A. Gorbatov, A Criterion for Coloring of the
Vertices of a Graph, Eng. Cybernetics, 10 (1972) pp. 683-686.

[54] J.F. Boyar and H.J. Karloff, Coloring Planar Graphs in Parallel, Jour-
nal of Algorithms, 8 (1987) pp. 470-479.

[55] D. Brelaz, New Methods to Color Vertices of a Graph, Communications


of the ACM, 22(4) (1979) pp. 251-256.

[56] R. Brewster, The Complexity of Coloring Symmetrical Relational Systems, Discrete Applied Mathematics, 49 (1994) pp. 95-105.
[57] P. Briggs, K.D. Cooper, and L. Torczon, Improvements to Graph-Coloring Register Allocation, ACM Transactions on Programming Languages and Systems, 16(3) (1994) pp. 428-455.

[58] P. Briggs, K. Cooper, and L. Torczon, Coloring Register Pairs, ACM Letters on Programming Languages and Systems, 1(1) (1992) pp. 3-13.

[59] S. Broder, Final Examination Scheduling, Communications of the


ACM, 7(8) (1964) pp. 494-498.

[60] H.J. Broersma and F. Gobel, Coloring a Graph Optimally with 2 Col-
ors, Discrete Mathematics, 118 (1993) pp. 23-31.
[61] R.L. Brooks, On Coloring the Nodes of a Network, Mathematical Proceedings of the Cambridge Philosophical Society, 37 (1941) pp. 194-197.

[62] J.I. Brown, The Complexity of Generalized Graph Colorings, Discrete


Applied Mathematics, 69(3) (1996) pp. 257-270.

[63] J.R. Brown, Chromatic Scheduling and the Chromatic Number Problem, Management Science, Part I, 19(4) (1972) pp. 456-463.
[64] J.I. Brown, D. Kelly, J. Schonheim, and R.E. Woodrow, Graph Coloring Satisfying Restraints, Discrete Mathematics, 80(2) (1990) pp. 123-143.
[65] R.A. Brualdi and J.J.Q. Massey, Incidence and Strong Edge Colorings of Graphs, Discrete Mathematics, 122 (1993) pp. 51-58.
[66] E. Burattini, A. Massarotti, and A. Santaniello, A Graph Colouration
Technique, In Second International Conference on Information Sciences
and Systems (Univ. Patras, Patras, 1979), Vol. III, Reidel (1980) pp.
326-333.

[67] L. Caccetta and N.J. Pullman, Regular Graphs with Prescribed Chro-
matic Number, Journal of Graph Theory, 14 (1990) pp. 65-71.
[68] L.Z. Cai and J.A. Ellis, Edge Coloring Line Graphs of Unicyclic Graphs, Discrete Applied Mathematics, 36 (1992) pp. 75-82.

[69] L. Cai and J.A. Ellis, NP-Completeness of Edge-Coloring Some Restricted Graphs, Discrete Applied Mathematics, 30 (1991) pp. 15-27.

[70] D. Callahan and B. Koblenz, Register Allocation via Hierarchical Graph Coloring, Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, Sigplan Notices, 26(6) (1991) pp. 192-203.

[71] S.H. Cameron, The Solution of the Graph-coloring Problem as a Set-covering Problem, IEEE Trans. Electromagnetic Compatibility, EMC-19 (1977) pp. 320-322.

[72] J. Cameron and G. Thomas, An Approximate Graph Partitioning Algorithm and its Computational Complexity, Congressus Numerantium, 49 (1985) pp. 287-293.

[73] G. Campers, O. Henkes, and J.P. Leclercq, Graph Coloring Heuristics: A Survey, Some New Propositions and Computational Experiences on Random and "Leighton's" Graphs, in Operations Research '87 (Buenos Aires, 1987), North-Holland Publishing Co., 50 (1988) pp. 917-932.

[74] G. Campers, O. Henkes, and J.P. Leclercq, Sur les Methodes Exactes de Coloration de Graphes (On Exact Methods of Graph Coloring), Cahiers du Centre d'Etudes de Recherche Operationnelle, 29 (1987) pp. 19-30.

[75] M.C. Carlisle and E.L. Lloyd, On the K-Coloring of Intervals, Discrete
Applied Mathematics, 59 (1995) pp. 225-235.

[76] J.D. Carpinelli and A.Y. Oruc, Applications of Matching and Edge-Coloring Algorithms to Routing in Clos Networks, Networks, 24 (1994) pp. 319-326.

[77] Y. Caspi and E. Dekel, Edge-Coloring Series-Parallel Graphs, Journal of Algorithms, 18 (1995) pp. 296-321.

[78] P.A. Catlin, Homomorphisms as a Generalization of Graph Coloring,


Congressus Numerantium, (1985) pp. 179-186.

[79] P.A. Catlin, Another Bound on the Chromatic Number of a Graph, Discrete Math. (The Netherlands), 24 (1978) pp. 1-6.

[80] P.A. Catlin, A Bound on the Chromatic Number of a Graph, Discrete Math. (The Netherlands), 22 (1977) pp. 81-83.

[81] G.J. Chaitin, Register Allocation and Spilling via Graph Coloring, in
Proceedings of the ACM SIGPLAN 82 Symposium on Compiler Con-
struction (Boston, June 1982). ACM, New York, (1982) pp. 98-105.

[82] G.J. Chaitin, M.A. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, and P. Markstein, Register Allocation via Coloring, Computer Languages, 6 (1981) pp. 47-57.

[83] M. Chams, A. Hertz, and D. de Werra, Some Experiments with Simulated Annealing for Coloring Graphs, European Journal of Operational Research, 32(2) (1987) pp. 260-266.

[84] G. Chartrand, D.P. Geller, and S. Hedetniemi, Graphs with Forbidden


Subgraphs, J. Combinatorial Theory (B), 10 (1971) pp. 12-41.

[85] Y. I. Cheban, Investigation of a Generalized Graph Coloring Problem,


In Investigation of Methods for Solving Extremal Problems (Russian),
vii, Akad Nauk Ukrain. SSR, Inst. Kibernet Kiev (B) (1986) pp. 77-81.

[86] P. Cheeseman, B. Kanefsky, and W.M. Taylor, Where the Really Hard Problems Are, in International Joint Conference on Artificial Intelligence, (1991) pp. 331-337.

[87] B.L. Chen and K.W. Lih, Equitable Coloring of Trees, Journal of Com-
binatorial Theory Series B, 61 (1994) pp. 83-97.

[88] G. Chen, R.H. Schelp and W.E. Shreve, A New Game Chromatic Num-
ber, European Journal of Combinatorics, 18 (1997) pp. 1-9.

[89] A. Chetwynd, Total Colourings of Graphs, in Graph Colourings, R. Nelson and R.J. Wilson (eds.), Pitman Research Notes in Mathematics Series 218, John Wiley & Sons, New York, (1990) pp. 65-77.

[90] D. Chhajed, Edge-Coloring a k-Tree into 2 Smaller Trees, Networks, 29


(1997) pp. 191-194.

[91] M. Chrobak and M. Slusarek, Problem 84-23, Journal of Algorithms, 5 (1984) p. 588.

[92] F. Chow and J. Hennessy, The Priority-Based Coloring Approach to Register Allocation, ACM Transactions on Programming Languages and Systems, 12(4) (1990) pp. 501-536.

[93] F. Chow and J. Hennessy, Register Allocation by Priority-Based Color-


ing, in Proceedings of the ACM SIGPLAN 84 Symposium on Compiler
Construction (Montreal, June 1984), ACM, New York, (1984) pp. 222-
232.

[94] N. Christofides, Graph Theory: An Algorithmic Approach, Academic Press, London, (1975) pp. 70-71.

[95] N. Christofides, An Algorithm for the Chromatic Number of a Graph,


Computing, 14 (1971) pp. 38-39.

[96] V. Chvatal, Perfectly Ordered Graphs, in C. Berge and V. Chvatal,


editors, Topics on Perfect Graphs, Vol. 21 of Annals of Discrete Mathe-
matics. North-Holland Publishing Co., (1984) pp. 63-65.

[97] V. Chvatal, M.R. Garey, and D.S. Johnson, Two Results Concerning
Multicoloring, in Aspects of Combinatorics Alspach et al. (eds), Vol 2 of
Annals of Discrete Mathematics, North-Holland Publishing Co. (1978)
pp. 151-154.

[98] V. Chvatal, T. Hoang, N.V.R. Mahadev, and D. de Werra, Four Classes


of Perfectly Orderable Graphs, Journal of Graph Theory, 11 (1987) pp.
481-495.

[99] A.J. Cole, The Preparation of Examination Time-Tables Using a Small


Store Computer, The Computer Journal, 7 (1964) pp. 117-121.

[100] T.F. Coleman and J.J. More, Estimation of Sparse Hessian Matrices
and Graph Coloring Problems, Mathematical Programming, 28 (1984)
pp. 243-270.

[101] T.F. Coleman and J.J. More, Estimation of Sparse Jacobian Matrices and Graph Coloring Problems, SIAM Journal on Numerical Analysis, 20 (1983) pp. 187-209.

[102] R.J. Cook, Chromatic Number and Girth, Periodica Mathematica


Hungarica, 6(1) (1975) pp. 103-107.

[103] R.J. Cook, Complementary Graphs and Total Chromatic Numbers,


SIAM J. Appl. Math., 27 (1974) pp. 626-628.

[104] D.G. Corneil and B. Graham, An Algorithm for Determining the Chromatic Number of a Graph, SIAM Journal on Computing, 2(4) (1973) pp. 311-318.

[105] D. Costa and A. Hertz, Ants Can Colour Graphs, Journal of the Operational Research Society, 47 (1996) pp. 1-11.

[106] D. Costa, A. Hertz, and O. Dubuis, Embedding a Sequential Procedure


Within an Evolutionary Algorithm for Coloring Problems, Journal of
Heuristics, 1 (1995) pp. 105-128.

[107] L.J. Cowen, R.H. Cowen, and D.R. Woodall, Defective Colorings of
Graphs in Surfaces: Partitions into Subgraphs of Bounded Valency, J.
Graph Theory, 10 (1986) pp. 187-195.

[108] M.B. Cozzens and F.S. Roberts, T-Colorings of Graphs and the Chan-
nel Assignment Problem, Congressus Numerantium, 35 (1982) pp. 191-
208.

[109] M.B. Cozzens and D.I. Wang, The General Channel Assignment Prob-
lem, Congressus Numerantium, 41 (1984) pp. 115-129.

[110] J. Culberson, A. Beacham, and D. Papp, Hiding our Colors, in CP'95
Workshop on Studying and Solving Really Hard Problems, (1995) pp.
31-42.

[111] J.C. Culberson, Iterated Greedy Graph Coloring and the Difficulty
Landscape, Technical Report TR 92-07, University of Alberta, Dept. of
Computer Science, Edmonton, Alberta, Canada T6G 2H1, 1992,
ftp://ftp.cs.ualberta.ca/pub/TechReports.

[112] J.C. Culberson and F. Luo, Exploring the k-colorable Landscape With
Iterated Greedy, in Cliques, Coloring, and Satisfiability: Second DI-
MACS Implementation Challenge, Johnson and Trick (eds.), 26 (1996)
pp. 245-284.

[113] A.R. Curtis, M.J.D. Powell, and J.K. Reid, On the Estimation of
Sparse Jacobian Matrices, Journal Inst. Maths. Applications, 13 (1974)
pp. 117-119.

[114] D.P. Dailey, Uniqueness of Colorability and Colorability of Planar 4-
Regular Graphs are NP-Complete, Discrete Mathematics, 30 (1980) pp.
289-293.
[115] D. de Werra, Heuristics for Graph Coloring, Computing, 7 (1990) pp.
191-208.
[116] D. de Werra, An Introduction to Timetabling, European Journal of
Operational Research, 19 (1985) pp. 151-162.
[117] D. de Werra, On a Particular Conference Scheduling Problem,
INFOR-Canadian Journal of Operational Research and Information
Processing, 13 (1975) pp. 308-315.
[118] D. de Werra, How to Color a Graph, in B. Roy (editor) Combinato-
rial Programming-Methods and Applications, NATO Advanced Science
Institute Series C: Mathematical and Physical Sciences, 19 (1974) pp.
305-325.
[119] D. de Werra, Some Results in Chromatic Scheduling, Z. Operations
Res. Ser. A (1974) pp. 167-175.
[120] D. de Werra and Y. Gay, Chromatic Scheduling and Frequency As-
signment, Discrete Applied Mathematics, 49 (1994) pp. 165-174.
[121] D. de Werra and A. Hertz, Consecutive Colorings of Graphs,
Zeitschrift für Operations Research, 32 (1988) pp. 1-8.
[122] P. Dencker, K. Durre, and J. Heuft, Optimization of Parser Tables for
Portable Compilers, ACM Transactions on Programming Languages and
Systems, 6 (1984) pp. 546-572.
[123] B. Descartes, Solution to Advanced Problem 4526, The American
Mathematical Monthly, 61 (1954) pp. 352. Proposed by P. Ungar;
pseudonym for Tutte.
[124] B. Descartes, A Three Colour Problem, Eureka, April 1947; solution
March 1948. Pseudonym for Tutte.
[125] K. Diks, A Fast Parallel Algorithm for Six-Colouring of Planar Graphs
(extended abstract), in J. Gruska, B. Rovan, and J. Wiedermann (edi-
tors) Mathematical Foundations of Computer Science 1986; Proceedings
of the Twelfth Symposium Held in Bratislava, Vol. 233 of Lecture Notes in
Computer Science, Springer-Verlag, (1986) pp. 273-282.

[126] G.A. Dirac, The Structure of k-Chromatic Graphs, Fundamenta
Mathematicae, 40 (1953) pp. 42-55.

[127] K. Dowsland, Timetabling Problem in which Clashes are Inevitable,
Journal of the Operational Research Society, 41(10) (1990) pp. 907-918.

[128] F.D.J. Dunstan, Sequential Colourings of Graphs, in Proceedings of
Fifth British Combinatorial Conference, Utilitas Mathematica, (1976)
pp. 151-158.

[129] K. Durre, An Algorithm for Coloring the Vertices of an Arbitrary
Graph, in P. Deussen, editor, unknown, vol. 78 of Lecture Notes in Eco-
nomics and Mathematical Systems, Springer-Verlag, (1973) pp. 82-89.

[130] K. Durre, J. Heuft, and H. Muller, Worst and Best Case Behavior of
an Approximate Graph Coloring Algorithm, in J.R. Muhlbacher, editor,
Proc. 7th Conference on Graphtheoretic Concepts in Computer Science
(Linz, Austria, June 15-17, 1981), Carl Hanser Verlag, Munich, (1982)
pp. 339-348.

[131] R.D. Dutton and R.C. Brigham, A New Graph Coloring Algorithm,
The Computer Journal, 24(1) (1981) pp. 85-86.

[132] M.E. Dyer and A.M. Frieze, The Solution to some Random NP-
Hard Problems in Polynomial Expected Time, Journal of Algorithms,
10 (1989) pp. 451-489.

[133] M.E. Dyer and A.M. Frieze, The Solution to some Random NP-Hard
Problems, in 27th Annual Symposium on Foundations of Computer Sci-
ence, (1986) pp. 331-336.

[134] S. Early, Evaluating a Timetabling Algorithm Based on Graph Re-
colouring. Ph.D. thesis, University of Oxford, 1968, B. Phil. Diss.

[135] K. Edwards and C. Mcdiarmid, The Complexity of Harmonious Col-
oring for Trees, Discrete Applied Mathematics, 57 (1995) pp. 133-144.

[136] R.W. Eglese and G.K. Rand, Conference Seminar Timetabling, J. Opl.
Res. Soc., 38 (1987) pp. 591-598.

[137] J.A. Ellis and P.M. Lepolesa, A Las Vegas Coloring Algorithm, The
Computer Journal, 32(5) (1989) pp. 474-476.

[138] P. Erdos, Graph Theory and Probability, Canadian Journal of Math-
ematics, 11 (1959) pp. 34-38.

[139] M. Erne and P. Erdos, Clique Numbers of Graphs, Discrete Mathe-
matics, 59(3) (1986) pp. 235-241.

[140] P. Erdos and D.J. Kleitman, On coloring graphs to maximize the pro-
portion of multicolored k-edges, J. Combin. Theory, 5 (1968) pp. 164-
169.

[141] P. Erdos and R.J. Wilson, On the Chromatic Index of Almost all
Graphs, Journal of Combinatorial Theory Series B, 23 (1977) pp. 255-
257.

[142] P. Erdos, A. Rubin, and H. Taylor, Choosability in Graphs, Congressus
Numerantium, 26 (1979) pp. 125-157.

[143] T. Etzion and S. Bitan, On the Chromatic Number, Colorings and
Codes of the Johnson Graph, Discrete Applied Mathematics, 70 (1996)
pp. 163-175.

[144] S. Even, A. Itai and A. Shamir, On the Complexity of Time-table
and Multicommodity Flow Problems, SIAM J. Comput., 5 (1976) pp.
691-703.

[145] U. Feige, S. Goldwasser, L. Lovasz, S. Safra, and M. Szegedy, Approx-
imating Clique is Almost NP-Complete, in 32nd Annual Symposium on
Foundations of Computer Science, (1991) pp. 2-3.

[146] T.A. Feo and M.G.C. Resende, Flight Scheduling and Maintenance
Base Planning, Management Science, 35(12) (1989) pp. 1415-1432.

[147] T.A. Feo, M.G.C. Resende, and S.H. Smith, A Greedy Randomized
Adaptive Search Procedure for Maximum Independent Set, Technical
Report, ORP89-19, Operations Research Group, Mechanical Engineer-
ing Department, University of Texas at Austin.

[148] W. Fernandez de la Vega, Random Graphs Almost Optimally Col-
orable in Polynomial Time, in Random Graphs '83, Vol. 28 of Annals
of Discrete Mathematics, North-Holland Publishing Co., (1985) pp. 311-
317.

[149] W. Fernandez de la Vega, On the Chromatic Number of Sparse Ran-
dom Graphs, in B. Bollobas, editor, Graph Theory and Combinatorics,
Proceedings Cambridge Combinatorial Conference in Honour of Paul
Erdos, Academic Press (Inc) (1984) pp. 321-328.

[150] S. Fiorini and R.J. Wilson, Edge-Colorings of Graphs, Lecture Notes
in Mathematics, 6, Pitman, (1977) pp. 102-126.

[151] C. Fleurent and J. A. Ferland, Object-oriented implementation of
heuristic search methods for graph coloring, maximum clique, and satis-
fiability, in Cliques, Coloring, and Satisfiability: Second DIMACS Imple-
mentation Challenge, Johnson and Trick (eds.), 26 (1996) pp. 619-652.

[152] C. Fleurent and J. A. Ferland, Genetic and hybrid algorithms for graph
coloring, Annals of Operations Research, to appear, 1995.

[153] J.A. Formby, A Computer Procedure for Bounding the Chromatic
Number of a Graph, in D.J.A. Welsh, editor, Combinatorial Mathematics
and its Applications, Proceedings of a Conference in Oxford, Academic
Press, Inc., (1969) pp. 111-114.

[154] C.E. Foster and H.C. Grossman, An Empirical Investigation of the
Haifa Register Allocation Technique in the GNU C Compiler, Proc. IEEE
SOUTHEASTCON, 2 pp. 776-779.

[155] M. Franklin and K.K. Saluja, Hypergraph Coloring and Reconfigured
RAM Testing, IEEE Transactions on Computers, 43 (1994) pp. 725-736.

[156] E.C. Freuder, A Sufficient Condition for Backtrack-Free Search, Jour-
nal of the ACM, 29(1) (1982) pp. 24-32.

[157] M. Frick and M.A. Henning, Various Results on Defective Colorings of
Graphs, Research Report 93/90/(3), Department of Mathematics, Ap-
plied Mathematics and Astronomy, University of South Africa, Pretoria,
1990.

[158] C. Friden, A. Hertz, and D. de Werra, STABULUS: A Technique for
Finding Stable Sets in Large Graphs with Tabu Search, Computing, 42
(1989) pp. 35-44.

[159] A.M. Frieze, On the Independence Number of Random Graphs, Dis-
crete Mathematics, 81 (1990) pp. 171-175.

[160] A.M. Frieze, Parallel Colouring of Random Graphs, in M. Karonski,
J. Jaworski, and A. Rucinski, editors, Random Graphs '87, John Wiley
& Sons, Inc., (1990) pp. 41-52.
[161] Z. Füredi, J.R. Griggs, and D.J. Kleitman, Pair Labellings with Given
Distance, Institute for Mathematics and its Applications, University of
Minnesota, Minneapolis, MN.

[162] M. Fürer and C.R. Subramanian, Coloring Random Graphs, in The
Third Scandinavian Workshop on Algorithm Theory (SWAT '92), (1992)
pre-print.

[163] H.N. Gabow and O. Kariv, Algorithms for Edge Coloring Bipartite
Graphs and Multigraphs, SIAM Journal on Computing, 11(1) (1982)
pp. 117-129.

[164] H.N. Gabow, Using Euler Partitions to Edge Color Bipartite Multi-
graphs, Inter. J. Comp. and Inf. Sci., 5 (1976) pp. 345-355.

[165] A. Gamst, Application of Graph Theoretical Methods to GSM Radio
Network Planning, Proc. IEEE International Symposium on Circuits and
Systems, 2 (1991) pp. 942-945.

[166] A. Gamst, Some Lower Bounds for a Class of Frequency Assignment
Problems, IEEE Transactions on Vehicular Technology, VT-35 (1986)
pp. 8-14.

[167] A. Gamst and K. Ralf, Computational Complexity of some Interfer-
ence Graph Calculations, IEEE Transactions on Vehicular Technology,
39 (1990) pp. 140-149.
[168] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide
to the Theory of NP-Completeness, W.H. Freeman, 1979.
[169] M.R. Garey and D.S. Johnson, The Complexity of Near-Optimal
Graph Coloring, Journal of the ACM, 23 (1976) pp. 43-49.

[170] M.R. Garey, D.S. Johnson, and L. Stockmeyer, Some Simplified NP-
Complete Graph Problems, Theoretical Computer Science, 1 (1976) pp.
237-267.

[171] M.R. Garey and D.S. Johnson, On Salazar and Oakford, Communi-
cations of the ACM, 18 (1975) pp. 240-241.

[172] F. Gavril, Algorithms for Coloring, Maximum Clique, Minimum Cov-
ering by Cliques, and Maximum Independent Set of a Chordal Graph,
SIAM Journal on Applied Mathematics, 1(2) (1972) pp. 181-187.
[173] J. A. George, Solution of Linear Systems of Equations: Direct Methods
for Finite Element Problems, Sparse Matrix Techniques: Copenhagen
1976, V.A. Barker, ed., Lecture Notes in Mathematics, 572, Springer-
Verlag, New York, (1977) pp. 79-91.
[174] I.M. Gessel, A Coloring Problem, American Mathematical Monthly,
98(6) (1991) pp. 530-533.
[175] L.E. Gibbons, D.W. Hearn, P.M. Pardalos, and M. Ramana, A Con-
tinuous Characterization of the Maximum Clique Problem, Math. of
Oper. Res., 22 (1997) pp. 754-768.
[176] L.E. Gibbons, D.W. Hearn, and P.M. Pardalos, A Continuous Based
Heuristic for the Maximum Clique Problem, in Cliques, Coloring, and
Satisfiability: Second DIMACS Implementation Challenge, Johnson and
Trick (eds.), 26 (1996) pp. 103-124.
[177] A. Gibbons and W. Rytter, Optimally Edge-Coloring Outerplanar
Graphs Is in NC, Theoretical Computer Science, 71 (1990) pp. 401-411.
[178] E.N. Gilbert, Unpublished Technical Memorandum, Bell Telephone
Laboratories, Murray Hill, NJ, (1972).
[179] M. Gionfriddo, A Short Survey of Some Generalized Colourings of
Graphs, Ars Combinatoria, 21 (1979) pp. 295-322.
[180] M. Gionfriddo and G. Lo Faro, 2-Colorings in S(t, t+1, v), Discrete
Mathematics, 111 (1993) pp. 263-268.
[181] R.K. Gjertsen, M.T. Jones and P.E. Plassmann, Parallel Heuristics
for Improved, Balanced Graph Colorings, Journal of Parallel and Dis-
tributed Computing, 37 (1996) pp. 171-186.
[182] F. Glover, Future Paths for Integer Programming and Links to Ar-
tificial Intelligence, Computers and Operations Research, 13 (1986) pp.
533-549.
[183] F. Glover, M. Parker, and J. Ryan, Coloring by tabu branch and
bound, in Cliques, Coloring, and Satisfiability: Second DIMACS Imple-
mentation Challenge, Johnson and Trick (eds.), 26 (1996) pp. 285-307.

[184] S.W. Golomb and L.D. Baumert, Backtrack Programming, Journal of
the ACM, 12 (1965) pp. 516-524.

[185] M.C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, Aca-
demic Press, New York, (1980).

[186] M.C. Golumbic, The Complexity of Comparability Graph Recognition
and Coloring, Computing (Austria), 18 (1977) pp. 199-208.

[187] T. Gonzalez, A Note on Open Shop Preemptive Schedules, IEEE
Trans. Comput., C-28 (1979) pp. 782-786.

[188] D.A. Grable and A. Panconesi, Nearly Optimal Distributed Edge-
Coloring in O(log log N) Rounds, Random Structures & Algorithms, 10
(1997) pp. 385-405.

[189] J.R. Griggs, Lower Bounds on the Independence Number in Terms of
the Degrees, Journal of Combinatorial Theory Series B, 34 (1983) pp.
22-39.

[190] G.R. Grimmett and C.J.H. McDiarmid, On Colouring Random
Graphs, Mathematical Proceedings of the Cambridge Philosophical So-
ciety, 77 (1975) pp. 313-324.

[191] M. Grotschel, L. Lovasz and A. Schrijver, Polynomial Algorithms for
Perfect Graphs, Annals of Discrete Mathematics, 21 (1989) pp. 325-356.

[192] B. Grünbaum, A Problem in Graph Colouring, The American Math-
ematical Monthly, 77 (1970) pp. 1088-1092.

[193] D.R. Guichard, No-Hole k-Tuple (R+1)-Distant Colorings of Odd Cy-
cles, Discrete Applied Mathematics, 64 (1996) pp. 87-92.

[194] D.R. Guichard, Acyclic Graph-Coloring and the Complexity of the
Star Chromatic Number, Journal of Graph Theory, 17 (1993) pp. 129-
134.

[195] R. Gupta, M.L. Soffa and D. Ombres, Efficient Register Allocation via
Coloring Using Clique Separators, ACM Transactions on Programming
Languages and Systems, 16 (1994) pp. 370-386.

[196] W. Gutjahr, E. Welzl and G. Woeginger, Polynomial Graph-Colorings,
Discrete Applied Mathematics, 35 (1992) pp. 29-45.

[197] A. Gyarfas, Problems from the World Surrounding Perfect Graphs,
Research Report 177, Computer and Automation Institute Studies,
(1985).

[198] A. Gyarfas and J. Lehel, On-Line Coloring of P5-Free Graphs, Com-
binatorica, 11 (1991) pp. 181-184.

[199] A. Gyarfas and J. Lehel, First Fit and On-line Chromatic Number of
Families of Graphs, Ars Combinatoria, 29 B, 1990.

[200] A. Gyarfas and J. Lehel, On-Line and First Fit Colorings of Graphs,
Journal of Graph Theory, 12 (1988) pp. 217-227.

[201] Hacene Ait Haddadene and Frederic Maffray, Coloring perfect degen-
erate graphs, Discrete Mathematics, 163 (1997) pp. 211-215.

[202] R. Haggkvist, P. Hell, D.J. Miller, and V. Neumann-Lara, On Multi-
plicative Graphs, and the Product Conjecture, Combinatorica, 8 (1988)
pp. 63-74.

[203] W.K. Hale, Frequency Assignment: Theory and Applications, Pro-
ceedings of the IEEE, 68 (1980) pp. 1497-1514.

[204] M. M. Halldórsson, Parallel and on-line graph coloring, Journal of
Algorithms, 23 (1997) pp. 265-280.

[205] M. M. Halldórsson, A Still Better Performance Guarantee for Approx-
imate Graph Coloring, Information Processing Letters, 45 (1993) pp.
19-23.

[206] M.M. Halldórsson and M. Szegedy, Lower Bounds for Online Graph-
Coloring, Theoretical Computer Science, 130 (1994) pp. 163-174.

[207] P. Hansen and M. Delattre, Complete-link Cluster Analysis by Graph
Coloring, J. of the American Stat. Assoc., 73 (1978) pp. 397-403.

[208] P. Hansen, A. Hertz and J. Kuplinsky, Bounded Vertex Colorings of
Graphs, Discrete Mathematics, 111 (1993) pp. 305-312.

[209] F. Harary, Conditional Colorability in Graphs, Graphs and Applica-
tions, John Wiley & Sons, New York, (1985).

[210] R. Hassin and S. Lahav, Maximizing the number of unused colors in
the vertex coloring problem, Information Processing Letters, 52 (1994)
pp. 87-90.

[211] X. He, An Efficient Algorithm for Edge Coloring Planar Graphs with
Delta-Colors, Theoretical Computer Science, 74 (1990) pp. 299-312.

[212] P. Hell and D.G. Kirkpatrick, Scheduling, Matching, and Coloring, in
Algebraic Methods in Graph Theory, Vol. I, II, Colloq. Math. Soc. Janos
Bolyai, 25, North-Holland Publishing Co., (1981) pp. 273-279.

[213] P. Hell and J. Nesetril, On the Complexity of H-coloring, Journal
Comb. Theory, B48 (1990) pp. 92-110.

[214] A. Hertz, A New Polynomial-Time Algorithm for the Maximum
Weighted (X(G)-1)-Coloring Problem in Comparability Graphs, Math-
ematical Systems Theory, 27 (1994) pp. 357-363.

[215] A. Hertz, Cosine: A new graph coloring algorithm, Operations Re-
search Letters, 10 (1991) pp. 411-415.

[216] A. Hertz, A Fast Algorithm for Coloring Meyniel Graphs, J. of Com-
bin. Theory (B), 50 (1990) pp. 231-240.

[217] A. Hertz and D. de Werra, Using Tabu Search Techniques for Graph
Coloring, Computing, 39 (1987) pp. 345-351.

[218] A. Hertz, B. Jaumard and M.P. de Aragao, Local Optima Topology for
the K-Coloring Problem, Discrete Applied Mathematics, 49 (1994) pp.
257-280.

[219] A.J.W. Hilton, Recent Results on the Total Chromatic Number, Dis-
crete Mathematics, 111 (1993) pp. 323-331.

[220] A.J.W. Hilton and H.R. Hind, The Total Chromatic Number of
Graphs Having Large Maximum Degree, Discrete Mathematics, 117
(1993) pp. 127-140.

[221] A.J.W. Hilton, R. Rado, and S.H. Scott, A (< 5)-Color Theorem for
Planar Graphs, Bull. London Math. Soc., 5 (1973) pp. 302-306.

[222] A.J.W. Hilton and R.J. Wilson, Edge Colorings of Graphs: A Progress
Report, in Graph Theory and its Applications East and West: Proceed-
ings of the First China- USA International Graph Theory Conference,
Vol. 576 of Annals of the New York Academy of Sciences, (1989) pp.
241-249.

[223] H.R. Hind, An Upper Bound for the Total Chromatic Number of Dense
Graphs, Journal of Graph Theory, 16 (1992) pp. 197-203.

[224] C.T. Hoang, Efficient Algorithms for Minimum Weighted Coloring
of Some Classes of Perfect Graphs, Discrete Applied Mathematics, 55
(1994) pp. 133-143.

[225] C.T. Hoang, A Parallel Algorithm for Minimum Weighted Coloring
of Triangulated Graphs, Theoretical Computer Science, 99 (1992) pp.
335-344.

[226] A.J. Hoffman, On Eigenvalues and Colorings of Graphs, in Graph
Theory and Its Applications, Bernard Harris (ed.), Academic Press, New
York, (1970) pp. 79-91.

[227] F. Hoffmann and K. Kriegel, A Graph-Coloring Result and Its Con-
sequences for Polygon-Guarding Problems, SIAM Journal on Discrete
Mathematics, 9 (1996) pp. 210-224.

[228] P. Holgate, Majorants of the Chromatic Number of a Random Graph,
J. Roy. Statist. Soc. Ser. B., 31 (1969) pp. 303-309.
[229] G. Hopkins and W. Staton, Graphs with Unique Maximum Indepen-
dent Sets, Discrete Mathematics, 57 (1985) pp. 245-251.

[230] K. Hung and T. Yum, An Efficient Code Assignment Algorithm
for Multihop Spread Spectrum Packet Radio Networks, IEEE Global
Telecommunications Conference and Exhibition, 1 (1990) pp. 271-274.

[231] J.P. Hutchinson, 3-Coloring Graphs Embedded on Surfaces with All
Faces Even-Sided, Journal of Combinatorial Theory Series B, 65 (1995)
pp. 139-155.

[232] S. Irani, Coloring Inductive Graphs On-Line, Algorithmica, 11 (1994)
pp. 53-72.

[233] S. Irani, Coloring Inductive Graphs On-Line, in 31st Annual Sympo-
sium on Foundations of Computer Science, (1990) pp. 470-479.

[234] R.W. Irving, NP-Completeness of a Family of Graph-Coloring Prob-
lems, Discrete Applied Mathematics, 5 (1983) pp. 111-117.

[235] K. Jansen and P. Scheffler, Generalized Coloring for Tree-Like Graphs,
Discrete Applied Mathematics, 75 (1997) pp. 135-155.

[236] T. R. Jensen and B. Toft, Graph Coloring Problems, Wiley Inter-
science Series in Discrete Mathematics and Optimization, John Wiley
& Sons, Inc., (1995).

[237] R. Jeurissen and W. Layton, Load Balancing by Graph-Coloring, an
Algorithm, Computers and Mathematics with Applications, 27 (1994)
pp. 27-32.

[238] G. Johns and F. Saba, On the Path-Chromatic Number of a Graph,
in Graph Theory and its Applications East and West: Proceedings of the
First China- USA International Graph Theory Conference, Vol. 576 of
Annals of the New York Academy of Sciences, (1989) pp. 275-280.

[239] D.S. Johnson, Approximation Algorithms for Combinatorial Prob-
lems, Journal of Computer and System Sciences, 9 (1974) pp. 256-278.

[240] D.S. Johnson, Worst-Case Behavior of Graph-Coloring Algorithms, in
Proceedings of 5th Southeastern Conference on Combinatorics, Graph
Theory and Computing, Winnipeg, (1974) pp. 513-528.

[241] D.S. Johnson, C.R. Aragon, L.A. McGeoch, and C. Schevon, Opti-
mization by Simulated Annealing: An Experimental Evaluation, part II,
Graph Coloring and Number Partitioning, Operations Research, 39(3)
(1991) pp. 378-406.

[242] D.S. Johnson, M. Yannakakis, and C.H. Papadimitriou, On Generat-
ing all Maximal Independent Sets, Information Processing Letters, 27
(1988) pp. 119-123.

[243] A. Johri and D.W. Matula, Probabilistic Bounds and Heuristic Al-
gorithms, Technical Report 82-CSE-06, Southern Methodist University,
Department of Computer Science, (1982).

[244] M.T. Jones and P.E. Plassmann, A Parallel Graph-Coloring Heuristic,
SIAM Journal on Scientific Computing, 14 (1993) pp. 654-669.

[245] J. Kahn, Asymptotically Good List-Colorings, Journal of Combinato-
rial Theory Series A, 73 (1996) pp. 1-59.

[246] J. Kahn, Coloring Nearly-Disjoint Hypergraphs with N+o(N) Colors,
Journal of Combinatorial Theory Series A, 59 (1992) pp. 31-39.

[247] A.A. Kalnins, The Coloring of Graphs in a Linear Number of Steps,
Cybern., 7 (1971) pp. 691-700.

[248] V. Kann, On the Approximability of the Maximum Common Subgraph
Problem, in STACS 92, (1992) pp. 377-388.

[249] D. Karger, R. Motwani, and M. Sudan, Approximate graph coloring by
semidefinite programming, in 35th Annual Symposium on Foundations
of Computer Science, IEEE (1994) pp. 2-13.

[250] R.M. Karp, Reducibility Among Combinatorial Problems, in R.E.
Miller and J.W. Thatcher, editors, Complexity of Computer Computa-
tions, Plenum Press, New York, (1972) pp. 85-104.

[251] R.M. Karp and D.W. Matula, Probabilistic Behaviour of a Naive Col-
oring Algorithm on Random Graphs, Bull. Oper. Res. Soc. Amer., 23
Suppl. 2 (1975) pp. 264-264.

[252] T. Kawaguchi, H. Nakano, and Y. Nakanishi, Probabilistic Analysis
of a Heuristic Graph Coloring Algorithm, Electron. Com. Japan, 65(6)
(1982) pp. 12-18.

[253] J.B. Kelly and L.M. Kelly, Paths and Circuits in Critical Graphs,
American Journal of Mathematics, 76 (1954) pp. 786-792.

[254] S. Khanna, N. Linial and S. Safra, On the Hardness of Approximating
the Chromatic Number, to appear.

[255] E.M. Kheifets, Planning of Operation of Communications Links in
Packet Radio Networks Using a Graph-Coloring Algorithm, Automatic
Control and Computer Sciences, 22(5) (1988) pp. 34-37.

[256] S. Khuller, Coloring Algorithms for K5-Minor Free Graphs, Informa-
tion Processing Letters, 34 (1990) pp. 203-208.

[257] S. Khuller, Extending Planar Graph Algorithms to K3,3-Free Graphs,
Information and Computation, 84(1) (1990) pp. 13-25.

[258] H.A. Kierstead, S.G. Penrice and W.T. Trotter, Online and First-Fit
Coloring of Graphs That Do Not Induce P5, SIAM Journal on Discrete
Mathematics, 8 (1995) pp. 485-485.

[259] H.A. Kierstead, S.G. Penrice and W.T. Trotter, Online Coloring and
Recursive Graph-Theory, SIAM Journal on Discrete Mathematics, 7
(1994) pp. 72-89.
[260] K. Kilakos and O. Marcotte, Fractional and Integral Colorings, Math-
ematical Programming, 76 (1997) pp. 333-347.

[261] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi, Optimization by
Simulated Annealing, Science, 220(4598) (1983) pp. 671-679.
[262] D. Klimowicz and M. Kubale, Graph Coloring by Tabu Search and
Simulated Annealing, Archives of Control Sciences, 2(1-2) (1993) pp.
41-54.
[263] J.H. Kim, On 3-Colorings of E(Kn), Discrete Mathematics, 118 (1993)
pp. 269-273.
[264] S. Kim and S.L. Kim, A 2-Phase Algorithm for Frequency Assignment
in Cellular Mobile Systems, IEEE Transactions on Vehicular Technology,
43 (1994) pp. 542-548.

[265] P. Kolte and M.J. Harrold, Load/Store Range Analysis for Global
Register Allocation, Proc. ACM SIGPLAN Conference on Programming
Language Design Implementation, (1993) pp. 268-277.
[266] D. Konig, Theorie der Endlichen und Unendlichen Graphen, Chelsea,
New York, (1950).
[267] A.A. Kooshesh and B.M.E. Moret, 3-Coloring the Vertices of a Trian-
gulated Simple Polygon, Pattern Recognition, 25 (1992) pp. 443-443.
[268] R.R. Korfhage and D.W. Matula, On S and O; More on the Salazar
and Oakford Paper, Communications of the ACM, 18 (1975) pp. 240-240.
[269] S.M. Korman, The Graph-Colouring Problem, in N. Christofides, A.
Mingozzi, P. Toth, and C. Sandi, editors, Combinatorial Optimization,
John Wiley & Sons, Inc., (1979) pp. 211-235.

[270] A.D. Korshunov, The Chromatic Number of n-Vertex Graphs,
Diskret. Analiz, 35 (in Russian), (1980) pp. 15-44.
[271] J. Korst, E. Aarts, J.K. Lenstra and J. Wessels, Periodic Assignment
and Graph-Coloring, Discrete Applied Mathematics, 51 (1994) pp. 291-
305.
[272] V.P. Korzhik, A Lower-Bound for the One-Chromatic Number of a
Surface, Journal of Combinatorial Theory Series B, 61 (1994) pp. 40-56.

[273] A.V. Kostochka, List Edge Chromatic Number of Graphs with Large
Girth, Discrete Mathematics, 101 (1992) pp. 189-201.

[274] V. Kostochka, The Total Colouring of a Multigraph with Maximal
Degree 4, Discrete Math., 17 (1977) pp. 161-163.
[275] J. Kratochvil and Z. Tuza, Algorithmic Complexity of List Colorings,
Discrete Applied Mathematics, 50 (1994) pp. 297-302.
[276] M. Kubale, Interval Edge-Coloring of a Graph with Forbidden Colors,
Discrete Mathematics, 121 (1993) pp. 135-143.
[277] M. Kubale, Some Results Concerning the Complexity of Restricted
Colorings of Graphs, Discrete Applied Mathematics, 36 (1992) pp. 35-
46.
[278] M. Kubale and J. Dabrowski, Empirical Comparison of Efficiency of
Some Graph Colouring Algorithms, Arch. Autom. Telemech. (Poland),
23 (1978) pp. 129-139.
[279] M. Kubale and B. Jackowski, A Generalized Implicit Enumeration Al-
gorithm for Graph Coloring, Communications of the ACM, 28(4) (1985)
pp. 412-418.

[280] M. Kubale and E. Kusz, Computer Experiences with Implicit Enumer-
ation Algorithms for Graph Coloring, in M. Nagl and J. Perl, editors,
International Workshop on Graph Theoretic Concepts in Computer Sci-
ence, Proceedings of the WG '83, Linz, Austria, 1983, Universitatsverlag
Rudolf Trauner, (1983) pp. 167-176.
[281] L. Kucera, The greedy coloring is a bad probabilistic algorithm, Jour-
nal of Algorithms, 12 (1991) pp. 674-684.

[282] L. Kucera, Graphs With Small Chromatic Number are Easy to Color,
Information Processing Letters, 30 (1989) pp. 233-236.

[283] L. Kucera, Expected Behavior of Graph Coloring Algorithms, in FCT
'77, Vol. 56 of Lecture Notes in Computer Science, Springer-Verlag,
Berlin, (1977) pp. 447-451.
[284] K.R. Kumar, A. Kusiak and A. Vannelli, Grouping of Parts and Compo-
nents in Flexible Manufacturing Systems, EJOR, 24 (1986) pp. 387-397.
[285] T.V. Lakshman, A. Bagchi and K. Rastani, A Graph-Coloring Scheme
for Scheduling Cell Transmissions and Its Photonic Implementation,
IEEE Transactions on Communications, 42 (1994) pp. 2062-2070.

[286] T.A. Lanfear, Graph Theory and Radio Assignment, Allied Radio
Frequency Agency, NATO EMC Analysis Program, Project no. 5, (1989).
[287] E.L. Lawler, A Note on the Complexity of the Chromatic Number
Problem, Information Processing Letters, 5 (1976) pp. 66-67.
[288] J. Lee and J. Leung, A Comparison of 2 Edge-Coloring Formulations,
Operations Research Letters, 13 (1993) pp. 215-223.
[289] F.T. Leighton, A Graph Colouring Algorithm for Large Scheduling
Problems, J. Res. Nat. Bur. Stand., 84 (1979) pp. 489-496.
[290] D.S.P. Leung, Application of the Partial Backtracking Technique to
the Frequency Assignment Problem, IEEE International Symposium on
Electromagnetic Compatibility, (1981) pp. 70-74.
[291] G. Lewandowski and A. Condon, Experiments with parallel graph col-
oring heuristics and applications of graph coloring, in Cliques, Coloring,
and Satisfiability: Second DIMACS Implementation Challenge, Johnson
and Trick (eds.), 26 (1996) pp. 309-334.
[292] W.F. Liang, X.J. Shen and Q. Hu, Parallel Algorithms for the Edge-
Coloring and Edge-Coloring Update Problems, Journal of Parallel and
Distributed Computing, 32 (1996) pp. 66-73.
[293] D.R. Lick and A.T. White, k-degenerate Graphs, Canad. J. Math., 22
(1970) pp. 1082-1096.
[294] V. Linek, Coloring Steiner Quadruple Systems, Journal of Combina-
torial Theory Series A, 70 (1995) pp. 45-55.

[295] N. Linial and U. Vazirani, Graph Products and Chromatic Numbers,
in 30th Annual Symposium on Foundations of Computer Science, (1989)
pp. 124-128.

[296] N. Linial, Graph Coloring and Monotone Functions for Posets, Dis-
crete Mathematics, 58(1) (1986) pp. 97-98.

[297] D.D.F. Liu, T-Colorings of Graphs, Discrete Mathematics, 101 (1992)
pp. 203-212.

[298] V. Lotfi and S. Sarin, A Graph Coloring Algorithm for Large Scale
Scheduling Problems, Computers and Operations Research, 13(1) (1986)
pp. 27-32.

[299] E. Loukakis, A New Backtracking Algorithm for Generating the Fam-
ily of Maximal Independent Sets of a Graph, Computers and Mathemat-
ics with Applications, 9 (1983) pp. 583-589.

[300] E. Loukakis and C. Tsouros, Determining the Number of Internal Sta-
bility of a Graph, International Journal of Computer Mathematics, 11
(1982) pp. 207-220.

[301] L. Lovász, Three Short Proofs in Graph Theory, J. Combinatorial
Theory (B), 19 (1975) pp. 269-271.

[302] L. Lovász, On the Chromatic Number of Finite Set Systems, Acta
Math. Acad. Sci. Hungar., 19 (1968) pp. 59-67.

[303] L. Lovász, On Decomposition of Graphs, Studia Sci. Math. Hungar.,
1 (1966) pp. 237-238.

[304] L. Lovász, M. Saks, and W.T. Trotter, An On-Line Graph Color-
ing Algorithm with Sublinear Performance Ratio, Discrete Mathematics,
75(1-3) (1989) pp. 319-325.

[305] Z.K. Lu, The Harmonious Chromatic Number of a Complete Binary
and Trinary Tree, Discrete Mathematics, 118 (1993) pp. 165-172.

[306] Z.K. Lu, On an Upper Bound for the Harmonious Chromatic Number
of a Graph, Journal of Graph Theory, 15 (1991) pp. 345-347.
[307] T. Luczak, The Chromatic Number of Random Graphs, Combinator-
ica, 11(1) (1991) pp. 45-54.

[308] T. Luczak, A Note on the Sharp Concentration of the Chromatic Num-
ber of Random Graphs, Combinatorica, 11(3) (1991) pp. 295-297.

[309] C. Lund and M. Yannakakis, On the Hardness of Approximating Mini-
mization Problems, in Conference Proceedings of the Annual ACM Sym-
posium on Theory of Computing, Publ. by ACM, New York, (1993) pp.
286-293.

[310] V.H. MacDonald, Advanced Mobile Phone Service: The cellular con-
cept, Bell Syst. Tech. J., 58 (1979) pp. 15-41.

[311] N.V.R. Mahadev and F.S. Roberts, Amenable Colorings, Technical
Report 92-26, DIMACS, Rutgers University, New Brunswick, NJ, (1992).

[312] N.V.R. Mahadev, F.S. Roberts, and P. Santhanakrishnan, 3-
Choosable Complete Bipartite Graphs, Technical Report 91-62, DI-
MACS, Rutgers University, New Brunswick, NJ, (1991).

[313] B. Manvel, Extremely Greedy Coloring Algorithms, in Graphs and
Applications (Boulder, CO, 1982), John Wiley & Sons, Inc., New York,
(1985) pp. 257-270.

[314] B. Manvel, Coloring Large Graphs, in F. Harary and J.S. Maybee, edi-
tors, Proceedings of the 12th Southeastern Conference on Combinatorics,
Graph Theory and Computing, (1981) pp. 197-204.

[315] M.V. Marathe, H.B. Hunt and S.S. Ravi, Efficient Approximation Al-
gorithms for Domatic Partition and Online Coloring of Circular Arc
Graphs, Discrete Applied Mathematics, 64 (1996) pp. 135-149.

[316] D.W. Matula, Expose-and-Merge Exploration and the Chromatic
Number of a Random Graph, Combinatorica, 7(3) (1987) pp. 275-284.

[317] D.W. Matula, Bounded Color Functions on Graphs, Networks, 2,
(1972) pp. 29-44.

[318] D.W. Matula, A Min-Max Theorem for Graphs with Application to
Coloring, SIAM Review, 10 (1968) pp. 481-482.

[319] D.W. Matula and L.L. Beck, Smallest-Last Ordering and Clustering
and Graph Coloring Algorithms, Journal of the ACM, 30(3) (1983) pp.
417-427.

[320] D.W. Matula and L. Kucera, An Expose-and-Merge Algorithm and the
Chromatic Number of a Random Graph, in M. Karonski, J. Jaworski,
and A. Rucinski, editors, Random Graphs '87, John Wiley & Sons, Inc.,
(1990) pp. 175-187.

[321] D.W. Matula, G. Marble, and J.D. Isaacson, Graph Coloring Algo-
rithms, in Graph Theory and Computing, Academic Press, Inc., (1972)
pp. 109-122.

[322] H.A. Maurer, A. Salomaa, and D. Wood, Colorings and Interpreta-
tions: A Connection Between Graphs and Grammar Forms, Discrete
Applied Mathematics, 3 (1981) pp. 119-135.

[323] S.T. McCormick, Optimal Approximation of Sparse Hessians and Its
Equivalence to a Graph Coloring Problem, Mathematical Programming,
26(2) (1983) pp. 153-171.

[324] C.J.H. McDiarmid, Colouring Random Graphs, Annals of Operations
Research, 1 (1984) pp. 183-200.

[325] C.J.H. McDiarmid, On the Chromatic Forcing Number of a Random
Graph, Discrete Applied Mathematics, 5 (1983) pp. 123-132.

[326] C.J.H. McDiarmid, Achromatic Numbers of Random Graphs, Math-
ematical Proceedings of the Cambridge Philosophical Society, 92 (1982)
pp. 21-28.

[327] C.J.H. McDiarmid, Colouring Random Graphs Badly, in Graph The-
ory and Combinatorics (Proc. Conf., Open Univ., Milton Keynes, 1978),
Res. Notes in Mathematics 34, Pitman, San Francisco, (1979) pp. 76-86.

[328] C.J.H. McDiarmid, Determining the Chromatic Number of a Graph,
SIAM Journal on Computing, 8 (1979) pp. 1-14.

[329] C. Mcdiarmid and X.H. Luo, Upper-Bounds for Harmonious Color-
ings, Journal of Graph Theory, 15 (1991) pp. 629-636.

[330] C. Mcdiarmid and B. Reed, On Total Colorings of Graphs, Journal of
Combinatorial Theory Series B, 57 (1993) pp. 122-130.

[331] C.J.H. Mcdiarmid and A. Sanchez-Arroyo, An Upper Bound for Total
Coloring of Graphs, Discrete Mathematics, 111 (1993) pp. 389-392.

[332] A. Mehrotra and M.A. Trick, A Column Generation Approach for
Graph Coloring, Technical Report, GSIA, Carnegie Mellon University,
Pittsburgh, PA, (1995).

[333] N. Metropolis, A.W. Rosenbluth, A.H. Teller, and E. Teller, Equation
of State Calculation by Fast Computing Machines, J. of Chem. Phys.,
21 (1953) pp. 1087-1091.

[334] B.H. Metzger, Spectrum Management Technique, Presented at 38th
National ORSA Meeting Detroit Michigan, Fall 1970.

[335] D.M. Miller, An Algorithm for Determining the Chromatic Number of
a Graph, in Proceedings 5th Manitoba Conference on Numerical Math.,
(1975) pp. 533-548.

[336] J. Mitchem, On Various Algorithms for Estimating the Chromatic
Number of a Graph, The Computer Journal, 19 (1976) pp. 182-183.

[337] C. Morgenstern, Distributed Coloration Neighborhood Search, in
Cliques, Coloring, and Satisfiability: Second DIMACS Implementation
Challenge, Johnson and Trick (eds.), 26 (1996) pp. 335-358.

[338] C. Morgenstern, Improved Implementations of Dynamic Sequential
Algorithms, Technical report, Texas Christian University, Fort Worth,
Texas, (1991).

[339] C. Morgenstern, Algorithms for General Graph Coloring, Technical
report, University of New Mexico, Albuquerque, (1989).

[340] C. Morgenstern and H. Shapiro, Improved Implementations of Dy-
namic Sequential Algorithms, Technical report, Texas Christian Univer-
sity, Fort Worth, Texas, (1991).

[341] B.R. Myers and R. Liu, A Lower Bound on the Chromatic Number of
a Graph, Networks, 1 (1972) pp. 273-277.

[342] R. Nelson and R.J. Wilson, editors, Graph Colourings, Pitman Research
Notes in Mathematics, Longman Scientific and Technical, 1990. A one-
day conference: survey papers.

[343] G.A. Neufeld and J. Tartar, Generalized Graph Coloration, SIAM
Journal on Applied Mathematics, 29 (1975) pp. 91-98.

[344] G.A. Neufeld and J. Tartar, Graph Coloring Conditions for the Ex-
istence of Solutions to the Timetable Problem, Communications of the
ACM, 17 (1974) pp. 450-453.

[345] G.N. Newsam and J.D. Ramsdell, Estimation of Sparse Jacobian Ma-
trices, SIAM Journal on Algebraic and Discrete Methods, 4(3) (1983)
pp. 404-418.

[346] B.R. Nickerson, Graph Coloring Register Allocation for Processors
with Multiregister Operands, SIGPLAN Notices: ACM Special Interest
Group on Programming Languages, 25(6) (1990) pp. 40-52.

[347] U.J. Nieminen, A Viewpoint to the Minimum Coloring Problem of
Hypergraphs, Kybernetika (Prague), 10 (1974) pp. 504-508.

[348] E.A. Nordhaus and J.W. Gaddum, On Complementary Graphs,
Amer. Math. Monthly, 63 (1956) pp. 175-177.

[349] S. Olariu and J. Randall, Welsh-Powell Opposition Graphs, Informa-
tion Processing Letters, 31(1) (1989) pp. 43-46.

[350] R.J. Opsut, Optimization of Set Assignments for Graphs, Ph.D. The-
sis, Dept. of Mathematics, Rutgers University, New Brunswick, NJ,
(1984).

[351] R.J. Opsut and F.S. Roberts, I-colorings, I-phasings, and I-intersection
Assignments for Graphs, and their Applications, Networks, 13 (1983) pp.
327-345.

[352] R.J. Opsut and F.S. Roberts, On the Fleet Maintenance, Mobile Radio
Frequency, Task Management, and Traffic Phasing Problems, in G. Char-
trand, Y. Alavi, D.L. Goldsmith, L. Lesniak-Foster, and D.L. Lick, edi-
tors, The Theory and Applications of Graphs, Wiley, New York, (1981)
pp. 479-492.

[353] P.R.J. Ostergard, A Coloring Problem in Hamming-Spaces, European
Journal of Combinatorics, 18 (1997) pp. 303-309.

[354] A. Panconesi and A. Srinivasan, Randomized Distributed Edge-
Coloring via an Extension of the Chernoff-Hoeffding Bounds, SIAM
Journal on Computing, 26 (1997) pp. 350-368.

[355] A. Panconesi and A. Srinivasan, The Local Nature of Delta-Coloring
and Its Algorithmic Applications, Combinatorica, 15 (1995) pp. 255-280.

[356] A. Panconesi and A. Srinivasan, Improved Distributed Algorithms for
Coloring and Network Decomposition Problems, in Proceedings of the
24th Annual ACM Symposium on Theory of Computing, (1992) pp. 581-
592.
[357] C.H. Papadimitriou and M. Yannakakis, Optimization, Approxima-
tion and Complexity Classes, in Proceedings of the Twentieth Annual
ACM Symposium on Theory of Computing, (1988) pp. 229-234.

[358] P.M. Pardalos and N. Desai, An Algorithm for Finding a Maximum
Weighted Independent Set in an Arbitrary Graph, International Journal
of Computer Mathematics, 38 (1991) pp. 163-175.

[359] P.M. Pardalos and A. Phillips, A Global Optimization Approach to
Solving the Maximum Clique Problem, International Journal of Com-
puter Mathematics, 33 (1990) pp. 209-216.

[360] P.M. Pardalos and G.P. Rodgers, A Branch and Bound Algorithm
for the Maximum Clique Problem, Computers and Operations Research,
19(5) (1992) pp. 363-376.

[361] P.M. Pardalos and J. Xue, The Maximum Clique Problem, Journal of
Global Optimization, 3 (1994) pp. 463-482.

[362] P.M. Pardalos and H. Wolkowicz, Quadratic Assignment and Related
Problems, DIMACS Series, American Mathematical Society, (1994).
[363] C. Payan, On the Chromatic Number of Cube-Like Graphs, Discrete
Mathematics, 103 (1992) pp. 271-277.

[364] J. Peemoller, Numerical Experiences with Graph Coloring Algorithms,
European Journal of Operations Research, 24(1), First EURO VI special
issue, (1986) pp. 146-151.

[365] J. Peemoller, A Correction to Brelaz's Modification of Brown's Color-
ing Algorithm, Communications of the ACM, 26(8) (1983) pp. 595-597.

[366] R.J. Pennotti and R.B. Boorstyn, Channel Assignment for Cellu-
lar Mobile Telecommunication Systems, Proc. NTC, (1976) pp. 16.5.1-
16.5.5.

[367] R.J. Pennotti, Channel Assignment in Cellular Mobile Telecommuni-
cation Systems, Ph.D. Dissertation, Polytech. Inst. New York, (1976).
[368] A.D. Petford and D.J. Welsh, A Randomized 3-Coloring Algorithm,
Discrete Mathematics, 74 (1989) pp. 253-261.
[369] S.S. Pinter, Register Allocation with Instruction Scheduling: A New
Approach, ACM SIGPLAN Conference on Programming Language De-
sign Implementation, (1993) pp. 248-257.
[370] B. Pittel and R.S. Weishaar, Online Coloring of Sparse Random
Graphs and Random Trees, Journal of Algorithms, 23 (1997) pp. 195-
205.
[371] C. Pommerell, M. Annaratone and W. Fichtner, A Set of New Map-
ping and Coloring Heuristics for Distributed-Memory Parallel Proces-
sors, SIAM Journal on Scientific and Statistical Computing, 13 (1992)
pp. 194-226.
[372] J. Preater, A passage time for greedy-coloring cycles, Random Struc-
tures and Algorithms, 6 (1995) pp. 105-111.

[373] D. Rafaeli, D. Mahalel and J. Prashker, Heuristic Approach to Task
Scheduling: 'Weight' and 'Improve' Algorithms, International Journal
of Production Economics, 29(2) (1993) pp. 175-186.
[374] P. Rajcani, Optimal parallel 3-coloring algorithm for rooted trees and
its applications, Information Processing Letters, 41 (1992) pp. 153-156.
[375] C.P. Ravikumar and R. Aggarwal, Parallel Search-and-Learn Tech-
niques and Graph-Coloring, Knowledge-based Systems, 9 (1996) pp. 3-
13.
[376] A. Raychaudhuri, Further Results on T-Coloring and Frequency As-
signment Problems, SIAM Journal on Discrete Mathematics, 7 (1994)
pp. 605-613.
[377] A. Raychaudhuri, Optimal Multiple Interval Assignments in Fre-
quency Assignment and Traffic Phasing, Discrete Applied Mathematics,
40 (1992) pp. 319-332.
[378] A. Raychaudhuri, Intersection Assignments, T-Colorings, and Powers
of Graphs, Ph.D. thesis, Dept. of Mathematics, Rutgers University, New
Brunswick, NJ, (1985).

[379] A.A. Razborov, The Gap Between the Chromatic Number of a Graph
and the Rank of Its Adjacency Matrix Is Superlinear, Discrete Mathe-
matics, 108 (1992) pp. 393-396.

[380] C.R. Reeves, Modern Heuristic Techniques, Modern Heuristic Search
Methods, Rayward-Smith, Osman, Reeves, and Smith (eds.), Wiley,
(1996) pp. 1-26.

[381] F.S. Roberts, New Directions in Graph Theory, in Quo Vadis, Graph
Theory?, J. Gimbel, J.W. Kennedy, and L.V. Quintas, editors, Annals
of Discrete Mathematics, 55 (1993) pp. 13-44.

[382] F.S. Roberts, T-Colorings of Graphs: Recent Results and Open Prob-
lems, Discrete Mathematics, 93 (1991) pp. 229-245.

[383] F.S. Roberts, Set, T-, and List Colorings, in The First Workshop
on Combinatorial Optimization in Science and Technology (COST), E.
Boros and P.L. Hammer, editors, DIMACS/RUTCOR Tech. Rep. 3-91,
Rutgers University, New Brunswick, NJ, (1991) pp. 290-297.

[384] F.S. Roberts, From Garbage to Rainbows: Generalizations of Graph
Coloring and their Applications, in Graph Theory, Combinatorics, and
Applications, Vol. 2, Y. Alavi, G. Chartrand, O.R. Oellermann, and A.J.
Schwenck, editors, Wiley, New York, (1991) pp. 1031-1052.

[385] F.S. Roberts, Applied Combinatorics, Prentice-Hall, Englewood Cliffs,
NJ, (1984).
[386] F.S. Roberts, On the Mobile Radio Frequency Assignment Problem
and the Traffic Light Phasing Problem, Annals New York Acad. Sci.,
319 (1979) pp. 466-483.

[387] F.S. Roberts, Graph Theory and its Applications to Problems of Soci-
ety, CBMS-NSF Monograph No. 29, SIAM, Philadelphia, (1978).

[388] F.S. Roberts, Discrete Mathematical Models, with Applications to So-
cial, Biological, and Environmental Problems, Prentice-Hall, Englewood
Cliffs, NJ, (1976).

[389] N. Robertson, P. Seymour and R. Thomas, Tutte's Edge-Coloring
Conjecture, Journal of Combinatorial Theory Series B, 70 (1997) pp.
166-183.

[390] N. Robertson, D. Sanders, P. Seymour and R. Thomas, The Four-
Colour Theorem, Journal of Combinatorial Theory Series B, 70 (1997)
pp. 2-44.
[391] J.M. Robson, An Estimate of the Store Size Necessary for Dynamic
Storage Allocation, Journal of the ACM, 18 (1971) pp. 416-423.

[392] A. Rosa, On a Class of Computable Partial Edge-Colorings, Discrete
Applied Mathematics, 35 (1992) pp. 293-299.

[393] S.I. Roschke and A.L. Furtado, An Algorithm for Obtaining the Chro-
matic Number and an Optimal Coloring of a Graph, Inf. Process. Lett.
(The Netherlands), 2 (1973) pp. 34-38.
[394] D.J. Rose, A Graph-Theoretic Study of the Numerical Solution of
Sparse Positive Definite Systems of Linear Equations, in R.C. Read, editor,
Graph Theory and Computing, Academic Press, (1972) pp. 183-217.
[395] D.J. Rose, R.E. Tarjan, and G.S. Lueker, Algorithmic Aspects of Ver-
tex Elimination on Graphs, SIAM J. Comput., 5 (1976) pp. 266-283.

[396] M. Rosenfeld, The Total Chromatic Number of Certain Graphs, No-
tices Amer. Math. Soc., 15 (1986) pp. 360-365.

[397] M. Rosenfeld, On the Total Coloring of Certain Graphs, Israel J.
Math., 9 (1971) pp. 396-402.
[398] R.L. Roth, Perfect Colorings of Multipatterns in the Plane, Discrete
Mathematics, 122 (1993) pp. 269-286.

[399] T. Rus and S. Pemmaraju, Using Graph-Coloring in an Algebraic
Compiler, Acta Informatica, 34 (1997) pp. 191-209.
[400] V. Rutenberg, Complexity of Generalized Graph Coloring, in J.
Gruska, B. Rovan, and J. Wiedermann, editors, Mathematical Foun-
dations of Computer Science 1986; Proceedings of the Twelfth Sympo-
sium Held in Bratislava, Vol. 233 of Lecture Notes in Computer Science,
Springer-Verlag (1986) pp. 537-581.

[401] J. Ryan, The depth and width of local minima in discrete solution
spaces, Discrete Applied Mathematics, 56 (1995) pp. 75-82.
[402] D. Sakai and C. Wang, No-Hole (R+1)-Distant Colorings, Discrete
Mathematics, 119 (1993) pp. 175-189.

[403] T. Sakaki, K. Nakashima, and Y. Hattori, Algorithms for Finding in
Lump Both Bounds of the Chromatic Number of a Graph, The Computer
Journal, 19 (1976) pp. 329-332.

[404] A. Salazar and R.V. Oakford, A Graph Formulation of a School
Scheduling Algorithm, Communications of the ACM, 18 (1974) pp. 241-
242.

[405] N.Z. Salvi, A Note on the Line-Distinguishing Chromatic Number and
the Chromatic Index of a Graph, Journal of Graph Theory, 17 (1993) pp.
589-591.

[406] S. Sensarma and B.K. Bandyopadhyay, Some Sequential Graph
Colouring Algorithms, International Journal of Electronics, 67(2) (1989)
pp. 187-199.

[407] L. Schiff, Traffic Capacity of Three Types of Common-User Mobile
Radio Communication Systems, IEEE Transactions on Communications
Technology, COMM-18 (1970) pp. 12-21.

[408] G. Schmidt and T. Strohlein, Timetable Construction - an Anno-
tated Bibliography, The Computer Journal, 23 (1980) pp. 307-316.

[409] T.B. Scott, Graph Colouring with Preassignment and Unavailability
Constraints, Ars Combinatoria, 2 (1976) pp. 25-32.

[410] M. Sengoku, On Telephone Traffic in a Mobile Radio Communication
System using Dynamic Frequency Assignment, Trans. IECE Japan, E61
(1978) pp. 850-851.

[411] M. Sengoku, H. Tamura, S. Shinoda and T. Abe, Graph and Network
Theory and Cellular Mobile Communications, Proc. IEEE International
Symposium on Circuits and Systems, 4 (1993) pp. 2208-2211.

[412] M. Sengoku, H. Tamura, S. Shinoda, T. Abe, and Y. Kajitani, Graph
Theoretical Considerations of Channel Offset Systems in a Cellular Mo-
bile System, IEEE Transactions Vehicular Technology, 40(2) (1991) pp.
405-411.

[413] S. Sensarma, R. Mandal and A. Seth, Some Sequential Graph-Coloring
Algorithms for Restricted Channel Routing, International Journal of
Electronics, 77 (1994) pp. 81-93.

[414] E. C. Sewell, An Improved Algorithm for Exact Coloring, in Cliques,
Coloring, and Satisfiability: Second DIMACS Implementation Chal-
lenge, Johnson and Trick (eds.), 26 (1996) pp. 359-373.
[415] E. Shamir and J. Spencer, Sharp Concentration of the Chromatic
Number of Random Graphs Gn,p, Combinatorica, 7 (1987) pp. 121-129.
[416] E. Shamir and E. Upfal, Sequential and Distributed Graph Coloring
Algorithms with Performance Analysis in Random Graph Spaces, Jour-
nal of Algorithms, 5(4) (1984) pp. 488-501.
[417] S. Khuller and V.V. Vazirani, Planar Graph Coloring is not Self-
Reducible Assuming P Not Equal to NP, Technical Report 89-1064, Cor-
nell University, Dept. of Computer Science, 1989.
[418] G.J. Simmons, The Chromatic Number of Planar Graphs, in F. Harary
and J.S. Maybee, editors, Graphs and Applications: Proceedings of the
First Colorado Symposium on Graph Theory, John Wiley & Sons, Inc.,
(1985) pp. 295-316.
[419] H. Simon, On Approximate Solutions for Combinatorial Optimization
Problems, SIAM Journal on Discrete Mathematics, 3(2) (1990) pp. 294-
310.
[420] F.W. Sinden, Coloring Drawings of Bipartite Graphs - A Problem
in Automated Assembly, Discrete Applied Mathematics, 41 (1993) pp.
55-68.
[421] D.H. Smith, Graph Coloring and Frequency Assignment, Ars Combi-
natoria, 25c (1988) pp. 205-212.
[422] A. Soifer, A 6-Coloring of the Plane, Journal of Combinatorial Theory
Series A, 61 (1992) pp. 292-294.
[423] D. Springer and D. E. Thomas, New methods for coloring and clique
partitioning in data path allocation, Integration, 12 (1991) pp. 267-292.
[424] J.P. Spinrad and G. Vijayan, Worst Case Analysis of a Graph Coloring
Algorithm, Discrete Applied Mathematics, 12(1) (1985) pp. 89-92.
[425] P.K. Srimani, B.P. Sinha, and A.K. Choudhury, A New Method to
Find Out the Chromatic Partition of a Symmetric Graph, Int. J. Syst.
Sci., 9 (1978) pp. 1425-1437.

[426] S. Stahl, n-tuple Colorings and Associated Graphs, Journal of Com-
binatorial Theory, 20 (1976) pp. 185-203.

[427] K. Stecke, Design Planning, Scheduling and Control Problems of Flex-
ible Manufacturing Systems, Annals of Oper. Res., 3 (1985) pp. 3-12.

[428] M. Stiebitz, Subgraphs of Colour-critical Graphs, Combinatorica, 7
(1987) pp. 303-312.

[429] L. Stockmeyer, Planar 3-Colorability is Polynomial Complete, ACM
SIGACT News, 5(3) (1973) pp. 19-25.

[430] K.E. Stoffers, Scheduling Traffic Lights - A New Approach, Transporta-
tion Res., 2 (1968) pp. 199-234.

[431] G. Stoyan and R. Stoyan, Coloring the Discretization Graphs Arising
in the Multigrid Method, Computers & Mathematics with Applications,
22 (1991) pp. 55-62.

[432] D.P. Sumner, Subtrees of a Graph and the Chromatic Number, in:
Theory and Applications of Graphs, ed., Chartrand, (1981) pp. 557-576.

[433] L. Sun, A note on Coloring of Complete Graphs, Quarterly Journal of
Mathematics, 46 (1995) pp. 235-237.

[434] S. Sur and P.K. Srimani, A Self-Stabilizing Algorithm for Coloring
Bipartite Graphs, Information Sciences, 69 (1993) pp. 219-227.

[435] M.M. Syslo, N. Deo, and J.S. Kowalik, Discrete Optimization Algo-
rithms with PASCAL Programs, Prentice-Hall, Englewood, NJ, (1983).

[436] G. Szekeres and H.S. Wilf, An Inequality for the Chromatic Number
of a Graph, Journal of Combinatorial Theory, 4 (1968) pp. 1-3.

[437] Y. Takefuji and K.C. Lee, Artificial Neural Networks for 4-Coloring
Map Problems and K-Colorability Problems, IEEE Transactions on Cir-
cuits and Systems, 38 (1991) pp. 326-333.

[438] Y. Takefuji and K.C. Lee, Parallel Algorithms for Finding a Near-
Maximum Independent Set of a Circle Graph - Reply, IEEE Transactions
on Neural Networks, 2 (1991) pp. 329-329.

[439] T.K. Tan and J.K. Pollard, Determination of Minimum Number of
Wavelengths Required for All-Optical WDM Networks Using Graph-
Coloring, Electronics Letters, 31 (1995) pp. 1895-1897.

[440] R.E. Tarjan, Decomposition by Clique Separators, Discrete Mathe-
matics, 55(2) (1985) pp. 221-232.

[441] R.E. Tarjan and A.E. Trojanowski, Finding a Maximum Independent
Set, SIAM Journal on Computing, 6(3) (1977) pp. 537-546.

[442] M. Tateishi and S. Tamura, Artificial Neural Networks for 4-Coloring
Map Problems and K-Colorability Problems, IEEE Transactions on Cir-
cuits and Systems I-Fundamental Theory and Applications, 41 (1994) pp.
248-249.

[443] B.A. Tesman, List T-colorings of Graphs, Discrete Applied Mathemat-
ics, 45 (1993) pp. 277-289.

[444] J. Thepot and G. Lechenault, A Note on the Application of Hierar-
chical Clustering to Graph Coloring, RAIRO Rech. Oper., 15(1) (1981)
pp. 73-83.

[445] C. Thomassen, 3-List-Coloring Planar Graphs of Girth-5, Journal of
Combinatorial Theory Series B, 64 (1995) pp. 101-107.

[446] C. Thomassen, 5-Coloring Graphs on the Torus, Journal of Combina-
torial Theory Series B, 62 (1994) pp. 11-33.

[447] C. Thomassen, 5-Coloring Maps on Surfaces, Journal of Combinatorial
Theory Series B, 59 (1993) pp. 89-105.

[448] B. Toft, Colouring, Stable Sets and Perfect Graphs, in Handbook of
Combinatorics (R.L. Graham et al., editors), Elsevier (1995) pp. 223-288.

[449] B. Toft, Color-critical Graphs and Hypergraphs, J. Combinat. Theory
B, 16 (1974) pp. 145-161.

[450] B. Toft, On Critical Subgraphs of Colour-critical graphs, Discrete
Math., 7 (1974) pp. 377-392.

[451] B. Toft, On the Maximal Number of Edges of Critical k-chromatic
Graphs, Studia Sci. Math. Hungar., 5 (1970) pp. 461-470.

[452] I. Tomescu, An Algorithm for Determining the Chromatic Number of
a Finite Graph, Econ. Comput. Cybern. Stud. Res. (Rumania), 1 (1969)
pp. 69-81.
[453] I. Tomescu, Sur Le Probleme du Coloriage des Graphes Generalises,
C. R. Acad. Sci. Paris Ser. A, 267 (1968) pp. 250-252.

[454] J. Topp and L. Volkmann, Some Upper-Bounds for the Product of the
Domination Number and the Chromatic Number of a Graph, Discrete
Mathematics, 118 (1993) pp. 289-292.

[455] W.T. Trotter and F. Harary, On Double and Multiple Interval Graphs,
Journal of Graph Theory, 3 (1979) pp. 205-211.

[456] D.S. Troxell, No-Hole k-Tuple (R+1)-Distant Colorings, Discrete Ap-
plied Mathematics, 64 (1996) pp. 67-85.

[457] M. Truszczynski, Generalized Local Colorings of Graphs, Journal of
Combinatorial Theory Series B, 54 (1992) pp. 178-188.

[458] P. Turan, On the Theory of Graphs, Colloq. Math., 3 (1954) pp. 19-30.

[459] J.S. Turner, Almost All k-Colorable Graphs are Easy to Color, Journal
of Algorithms, 9 (1988) pp. 63-82.

[460] J.S. Turner, On the Probable Performance of Graph Colouring Algo-
rithms, Technical Report WUCS-85-7, Washington University, Dept. of
Computer Science, 1985.
[461] J.S. Turner, Point Symmetric Graphs with a Prime Number of Points,
Journal of Combinatorial Theory, 3 (1967) pp. 136-145.

[462] A. Tucker, Coloring a Family of Circular Arcs, SIAM J. Appl. Math.,
29 (1975) pp. 493-502.
[463] A.C. Tucker, Perfect Graphs and an Application to Optimizing Mu-
nicipal Services, SIAM Rev., 15 (1973) pp. 585-590.
[464] Z. Tuza, Graph-Coloring in Linear Time, Journal of Combinatorial
Theory Series B, 55 (1992) pp. 236-243.

[465] R. Venkatesan and L.A. Levin, Random Instances of a Graph Col-
oring Problem are Hard, in Proceedings of the Twentieth Annual ACM
Symposium on Theory of Computing, (1988) pp. 217-222.

[466] N. Vijayaditya, On Total Chromatic Number of a Graph, J. London
Math. Soc., 3 (1971) pp. 405-408.

[467] S. Vishwanathan, Randomized Online Graph Coloring, Journal of Al-
gorithms, 13 (1992) pp. 657-669.

[468] S. Vishwanathan, Randomized Online Graph Coloring, in 31st Annual
Symposium on Foundations of Computer Science, (1990) pp. 464-469.

[469] V.G. Vizing, Some Unsolved Problems in Graph Theory, Russian
Math. Surveys, 23 (1968) pp. 125-141.
[470] V.G. Vizing, The Chromatic Class of a Multigraph, Cybernetics (Rus-
sian), 3 (1965) pp. 32-41.
[471] M. Voigt, List Colorings of Planar Graphs, Discrete Mathematics, 120
(1993) pp. 215-219.
[472] M. Voigt and H. Walther, Chromatic Number of Prime Distance
Graphs, Discrete Applied Mathematics, 51 (1994) pp. 197-209.
[473] S. Wagon, A Bound on the Chromatic Number of Graphs Without
Certain Induced Subgraphs, Journal of Combinatorial Theory Series B,
29 (1980) pp. 345-346.
[474] A. Waller, An Upper Bound for List T-Colorings, Bulletin of the Lon-
don Mathematical Society, 28 (1995) pp. 337-342.
[475] W.D. Wallis and G.H. Zhang, On the Partition and Coloring of a
Graph by Cliques, Discrete Mathematics, 120 (1993) pp. 191-203.
[476] C.C. Wang, An Algorithm for the Chromatic Number of a Graph,
Journal of the ACM, 21 (1974) pp. 385-391.

[477] D-1. Wang, The Channel Assignment Problem and Closed Neigh-
borhood Containment Graphs, Ph.D. Thesis, Northeastern University,
Boston, MA, (1985).
[478] D.J.A. Welsh and M.B. Powell, An Upper Bound for the Chromatic
Number of a Graph and its Applications to Timetabling Problems, The
Computer Journal, 10 (1967) pp. 85-86.
[479] A.T. White, Graphs, Groups and Surfaces, North-Holland, Amsterdam,
(1984).

[480] A. Wigderson, Improving the Performance Guarantee for Approximate
Graph Coloring, Journal of the ACM, 30 (1983) pp. 729-735.

[481] A. Wigderson, A New Approximate Graph Coloring Algorithm, in
Proceedings of the Fourteenth Annual ACM Symposium on Theory of
Computing, (1982) pp. 325-329.

[482] H.S. Wilf, Backtrack: An O(1) Expected Time Algorithm for the
Graph Coloring Problem, Information Processing Letters, 18 (1984) pp.
119-121.

[483] H.S. Wilf, Spectral Bounds for the Clique and Independence Numbers
of Graphs, Journal of Combinatorial Theory Series B, 40 (1986) pp.
113-117.

[484] H.S. Wilf, The Eigenvalues of a Graph and its Chromatic Number, J.
London Math. Soc., 42 (1967) pp. 330-332.

[485] R.S. Wilkov and W.H. Kim, A Practical Approach to the Chromatic
Partition Problem, J. Franklin Inst., 289 (1970) pp. 333-349.

[486] M.R. Williams, The Colouring of Very Large Graphs, in R. Guy et al.,
editor, Combinatorial Structures and Their Applications, Gordon and
Breach, (1970) pp. 477-478.

[487] C. P. Williams and T. Hogg, Extending deep structure, in Proc. of
AAAI93, AAAI Press, Menlo Park, CA, (1993) pp. 152-157.

[488] C. P. Williams and T. Hogg, Using deep structure to locate hard
problems, in Proc. of AAAI92, AAAI Press, Menlo Park, CA, (1992)
pp. 472-477.

[489] T.K. Woo, S.Y.W. Su and R. Newman-Wolfe, Resource-Allocation in a
Dynamically Partitionable Bus Network Using a Graph-Coloring Algo-
rithm, IEEE Transactions on Communications, 39 (1991) pp. 1794-1801.

[490] D.C. Wood, A Technique for Coloring a Graph Applicable to Large
Scale Timetabling Problems, The Computer Journal, 12 (1969) pp. 317-
319.

[491] D.C. Wood, A System for Computing University Examination Timeta-
bles, The Computer Journal, 11 (1968) pp. 41-47.

[492] D.R. Woodall, Improper Colourings of Graphs, in Graph Colourings,
R. Nelson and R.J. Wilson (eds.), Pitman Research Notes in Mathemat-
ics Series 218, John Wiley & Sons, New York, (1990) pp. 45-63.

[493] D.R. Woodall and R.J. Wilson, The Appel-Haken Proof of the Four
Color Theorem, Selected Topics in Graph Theory, Vol. 16 of Lecture
Notes in Mathematics, Pitman, (1977) pp. 83-101.

[494] A.R. Woods, Coloring Rules for Finite Trees, and Probabilities of
Monadic Second-Order Sentences, Random Structures & Algorithms, 10
(1997) pp. 453-485.

[495] J. Xue, Solving the Minimum Weighted Integer Coloring Problem, J.
of Computational Optimization and Applications, to appear.

[496] J. Xue, Edge-Maximal Triangulated Subgraphs and Heuristics for the
Maximum Clique Problem, NETWORKS, 24 (1994) pp. 109-120.

[497] H.P. Yap, B.L. Chen and H.L. Fu, Total Chromatic Number of Graphs
of Order 2N+1 having Maximum Degree 2N-1, Journal of the London
Mathematical Society-Second Series, 52 (1995) pp. 434-446.

[498] S.S.W. Yeh, Odd Cycles, Bipartite Subgraphs, and Approximate
Graph Coloring, Publ. Princeton University, Princeton, NJ, (1989). UMI
order no: GAX89-17763.

[499] J.W.T. Youngs, Remarks on the Four Color Problem, in Richard Guy,
Haim Hanani, Norbert Sauer, and Jonathan Schonheim, editors, Combi-
natorial Structures and Their Applications: Proceedings of the Calgary
International Conference on Combinatorial Structures and Their Appli-
cations, Gordon and Breach, (1969) pp. 479-480.

[500] T. Zaslavsky, The Signed Chromatic Number of the Projective Plane
and Klein Bottle and Antipodal Graph-Coloring, Journal of Combina-
torial Theory Series B, 63 (1995) pp. 136-145.

[501] T. Zaslavsky, Signed Graph Coloring, Discrete Mathematics, 39(2)
(1982) pp. 215-228.

[502] J. Zerovnik, A Parallel Variant of a Heuristical Algorithm for Graph
Coloring, Parallel Computing, 13 (1990) pp. 95-100.

[503] J. Zerovnik, A Randomised Heuristical Algorithm for Estimating the
Chromatic Number of a Graph, Information Processing Letters, 33
(1989) pp. 213-219.

[504] Z.F. Zhang, J.F. Wang, W.F. Wang and L.X. Wang, The Complete
Chromatic Number of Some Planar Graphs, Science in China series A-
Mathematics Physics Astronomy & Technological Sciences 36 (1993) pp.
1169-1177.

[505] X. Zhou, S. Nakano and T. Nishizeki, Edge-Coloring Partial k-Trees,
Journal of Algorithms, 21 (1996) pp. 598-617.

[506] X. Zhou, H. Suzuki and T. Nishizeki, An NC Parallel Algorithm for
Edge-Coloring Series-Parallel Multigraphs, Journal of Algorithms, 23
(1997) pp. 359-374.

[507] X. Zhou, H. Suzuki and T. Nishizeki, A Linear Algorithm for Edge-
Coloring Series-Parallel Multigraphs, Journal of Algorithms, 20 (1996)
pp. 174-201.

[508] A.X. Zhu, Non-coloring Contradiction Graphs and a Graph Coloring
Algorithm, Chinese Journal of Computers, 8(3) (in Chinese, English
summary), (1985) pp. 189-197.

[509] A. Zobel, Program Structure as Basis for the Parallelization of Global
Register Allocation, Computer Languages, 19(2) (1993) pp. 135-155.

[510] J.A. Zoellner and C.L. Beall, A Breakthrough in Spectrum Conserving
Frequency Assignment Technology, IEEE Transactions on Electromag-
netic Compatibility, EMC-19 (1977) pp. 313-319.

[511] J.A. Zoellner, Frequency Assignment Games and Strategies, IEEE
Transactions on Electromagnetic Compatibility, EMC-15 (1973) pp. 191-
196.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 397-470
©1998 Kluwer Academic Publishers

Steiner Minimal Trees in E3:


Theory, Algorithms, and Applications
J. MacGregor Smith
Department of Mechanical and Industrial Engineering
University of Massachusetts, Amherst MA 01003
E-mail: jmsmith@ecs.umass.edu

Figure 1: N = 6 3-Sausage dual construction



Contents

1 Introduction                                              399

2 Background                                                401
  2.1 Assumptions                                           401
  2.2 Notation                                              401
  2.3 Steiner Properties in E2                              402
  2.4 E2 Heuristic                                          403
  2.5 Results from the Unit Sphere                          406
  2.6 Steiner Points in a Spherical Δ                       409
  2.7 Heuristic on Φ                                        410

3 E3 Problem                                                410
  3.1 N = 3 Case                                            411
  3.2 N = 4 Case                                            412
  3.3 Primal Problem                                        413
  3.4 Dual Problem                                          414
  3.5 N = 5 Case                                            419
  3.6 N = 6 Case                                            421
      3.6.1 N = 6 Dual Construction                         421
      3.6.2 Conjectured Optimal Construction in E3          422
  3.7 R-Sausages                                            423
  3.8 R-Sausage Properties                                  424
  3.9 Sausage Experiments                                   426

4 Heuristics                                                427
  4.1 CG Approach                                           427
  4.2 Ribbon Decomposition Problem                          429
  4.3 Heuristic Algorithm                                   431
  4.4 General Algorithm Description                         432
  4.5 Heuristic Example                                     434
  4.6 Complexity Analysis                                   438
  4.7 Experiments                                           439

5 Applications                                              441
  5.1 Minimum Energy Configurations (MEC's)                 442
  5.2 Lower Bounds                                          443
  5.3 Protein Modeling                                      444
  5.4 Protein Experiments                                   447
  5.5 Protein Objective Function Description                449
  5.6 Experimental Results                                  450
      5.6.1 Collagen Results                                451
      5.6.2 Silk Results                                    454
      5.6.3 Actin Results                                   456
  5.7 Other Applications                                    460

6 Summary and Conclusions                                   460

7 Appendix I                                                467

References

1 Introduction
Let's say that you are going to design a new electronic circuit at the molec-
ular/atomic level where you know beforehand the basic number of atomic
elements of the circuit and you wish to arrange them so that they achieve
an optimal configuration that is both compact, stable, and integrated elec-
trically. How would you configure the atoms? If one assumes that the cost
of putting together the molecular structure is proportional to distance, i.e.,
the potential energy in the system, then you would want to minimize the
overall interconnecting distance between the atoms. This is fundamentally
the Steiner problem in E3. Figure 1 demonstrates a geometric construction
for the Steiner Minimal Tree of N=6 vertices in E3.
The Steiner problem seeks to minimize the total length of a network,
given a fixed set of vertices V that must be in the network and another set
S from which vertices may be added. The cardinality of S is not known
beforehand which makes the problem very difficult, and the focus of this
Chapter is on a better understanding of the theory, algorithms, and potential
applications of Steiner trees in E3. The Euclidean Minimal Spanning Tree
(EMST) is the optimal solution to the problem for which no vertices may
be added from the set S. The Steiner ratio is the ratio of the length of
the Steiner tree (or a heuristic approximation) to that of the EMST of
the given set. The smaller this ratio is, the more advantage there is in
finding the Steiner tree, since it indicates the configuration of vertices which
achieves the minimum possible value. Thus, studying the problem of finding
sets of vertices for which this ratio is a minimum is of central interest.
The algorithmic process is designed to examine configurations which have a
Steiner ratio close to the minimum. Configurations which have a ratio equal
to the conjectured minimum for the same number of vertices are called
optimal configurations.

Recent research results [64] have revealed some surprising insights into
the properties of Steiner trees and their influence on the design of algo-
rithms for computing them, and these all will be explored in this Chapter.
The decomposition and recomposition processes underlying the algorithmic
approach also have some impact on the potential applications of these results
and these will be briefly discussed.
Lastly, the problem of computing Steiner minimal trees is viewed from
two fundamental questions or issues:

Issue I: Given a set of vertices V in E3, how should they be decomposed
into subsets so that optimal and suboptimal solutions for each subset
may be combined to yield an effective solution to the Steiner Minimal
Tree for the whole set?
Issue II: On the other hand, how should a set of vertices V in E3 be ar-
ranged or comprised to yield the best configuration possible?
The first issue arises where the vertex sets either have some fixed special
structure or are generated randomly from some probability distribution.
The second issue arises from the need to determine how the optimal config-
urations may be identified and combined to yield the best possible configu-
rations for large vertex sets. This is important for potential applications as
will be shown. Both issues overlap and their interplay integrates much of
what follows.
One of the truly fascinating features of Steiner trees in E3 is how all
the algebraic, geometric, and optimization issues are intertwined. Because
of the complexity of these networks, algebraic, Computational Geometry,
and numerical optimization concepts and tools are often necessary for un-
derstanding, constructing, and analyzing these networks. It is hoped that
this Chapter, in some small way, may illuminate the understanding of how
the methodological issues underlying Steiner trees are woven together.

Chapter Outline In §2, a detailed background is provided of known
results in E2 as well as for the unit sphere. In §3, the optimization and ge-
ometric constructions possible for N = 3,4,5, and 6 vertices are examined
in detail, since a geometric understanding of the problem for small vertex
sets gives one a good intuitive grasp of both the complexity of computing
Steiner trees in E3 but also some insights as to ways of computing them. §4
describes one heuristic for computing Steiner trees in E3 along with compu-
tational results. In §5, potential applications of Steiner trees for minimizing

energy configurations are provided, e.g. protein modelling and computa-


tional biology issues, along with a small sample of experimental results and
open research questions, and, finally, a brief summary and conclusions is
provided in §6.

2 Background
It is well-known that the complexity of computing Steiner Minimal Trees
in the plane is NP-Hard [32, 33]. Also, because the Euclidean version is
not known to be in NP, then the complexity of computing optimal Steiner
Minimal Trees in d-space, d ≥ 3, is demonstrably even more difficult since
there is no inherent combinatorial structure present in the problem [62].

2.1 Assumptions
• A finite (or possibly infinite) set of vertices V = {v1, v2, ...} is given or
randomly generated within a bounded region of the Euclidean plane or
space E2, E3, with Cartesian coordinates (xi, yi, zi) for i = 1, 2, ....
The bounded region is often necessary for the probabilistic analysis
of the algorithms. In the Steiner network problems described later in
§5, the set V will be augmented with points from the set S so that the
entire vertex set for the network design problem may contain Z = V ∪ S
vertices. For the sake of clarity, "points" will refer to the additional
vertices from the Steiner set S, while "vertices" will refer to those from
the given set V.

• When necessary, it is assumed that the total cost of the network is a
linear function of the flows in the network unless specified otherwise. In
many problems to be described, one assumes that the cost of an edge
is proportional to its length. In §5 on applications, the relationship of
distance to potential energy minimization is shown.

• For the purposes of this Chapter, no obstacles impede or restrict the
location of nodes and arcs in the network, although for many practical
applications, such restrictions are very important.

2.2 Notation
The following additional notation is useful:

ESMT(V) := Euclidean Steiner Minimal Tree of a given vertex set V


EMST(V) := Euclidean Minimum Spanning Tree of a given vertex
set V

Pd := the minimal Steiner ratio of all vertex sets V in dimension d,
i.e., Pd = inf_{V ⊂ E^d} p(V), where p(V) = L(ESMT(V)) / L(EMST(V)).

|S| := cardinality of the set S of Steiner points in the Steiner tree.

M := number of Steiner points from set S.

N := number of given vertices from set V.

FST := Full Steiner Tree, with |S| = N − 2.

As demonstrated, FSTs are more prevalent in E3 than they are in E2. In
E2, there are more degenerate situations, even where optimal solutions to
vertex sets may occur, see Figure 2. In Figure 2, a FST would have |S| = 9
Steiner points, whereas the optimal solution here has only |S| = 5.

2.3 Steiner Properties in E2

There are certain elemental facts in the planar Steiner Tree problem which
are applicable. They are:

• |S| ≤ N − 2 [35]. This is an important upper bound on the cardinality
of Steiner points, although the minimum number of Steiner points for
a given vertex set is unknown.

• P2 = √3/2, ∀V [35, 26]. This optimal ratio exists for a single equilat-
eral triangle, certain collections of equilateral triangles and lattices.

For two dimensions and three given vertices, the mathematical program-
ming problem in E2 is a classical nonlinear programming problem to find
the coordinates of a single Steiner point s with coordinates (x, y) such that:

    Min Z = Σ_{i=1}^{3} [(x − xi)² + (y − yi)²]^{1/2}
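As a concrete numerical sketch of this minimization (an illustration assuming
numpy is available, not the geometric construction of Figure 3), the classical
Weiszfeld iteration converges to the single Steiner point whenever no interior
angle of the triangle reaches 120°:

import numpy as np

def fermat_point(vertices, iters=200, tol=1e-12):
    # Weiszfeld iteration for the point minimizing total distance
    # to the given vertices.
    pts = np.asarray(vertices, dtype=float)
    x = pts.mean(axis=0)                      # centroid as a starting guess
    for _ in range(iters):
        d = np.linalg.norm(pts - x, axis=1)
        if np.any(d < tol):                   # landed on a given vertex
            break
        w = 1.0 / d
        x_new = (pts * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Equilateral triangle with side 2: the ratio L(SMT)/L(EMST) printed
# below should approach sqrt(3)/2 ~ 0.8660, as stated above.
tri = [(0.0, 0.0), (2.0, 0.0), (1.0, np.sqrt(3.0))]
s = fermat_point(tri)
smt = sum(np.linalg.norm(np.array(v) - s) for v in tri)
print(s, smt / 4.0)                           # EMST here uses two sides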

In E2 (see Figure 3), geometric constructions are possible for construct-
ing optimal Steiner trees for small vertex sets.

• In the plane, the ESMT is a


union of disjoint FSTs. Even
for lattice configurations with
special structure, computation
of the optimal configurations is
non-trivial for large n.
• There are many interesting spe-
cial structures besides the equi-
lateral decomposition such as
checkerboards and ladders, see
[10] for additional details and
examples.

Figure 2: Lattice Solution, N=l1

Figure 3 indicates three alternative methods in the plane for construct-
ing the Steiner point in a triangle. Methods (a) and (b) are similar since
by reflection of equilateral triangles on the sides of the original triangle, one
creates three new vertices γi, i = 1, 2, 3 [20]. In method (a), one passes a cir-
cle through the reflected vertex γi and the base of the edge which generated
it, while in method (b) one joins the reflected vertex γi with the vertex of the
given triangle opposite it. Method (c) is carried out by solving a geometric
dual problem which entails constructing the circumscribing triangle of the
given triangle with maximum altitude [43, 41].
Some of these planar algorithmic and geometric concepts carryover to
E3, although the construction problems become measurably more difficult
since an optimization step becomes necessary. Heuristic approaches in E2
also are extendable to E3 and one heuristic approach which will be briefly
described in this Chapter is directly applicable in E3 with a slight twist.

2.4 E2 Heuristic

One heuristic for E2 is based on Computational Geometry (cg) data struc-
tures of the Voronoi Diagram VOR(V) and the Delaunay Triangulation

(a) (b) (c)

Figure 3: Alternative E2 Constructions of a Steiner point in a Triangle


(v1, v2, v3)

DT(V). There are also other related data structures such as Gabriel Graphs
and Relative Neighborhood Graphs that could be utilized in the design of
a heuristic, and these are discussed in [57]. The DT(V) provides a com-
binatorial framework for the construction of the ESMT solution, while the
VOR(V) provides locus information for concatenating local optimal solu-
tions for small vertex set clusters. The data structures of the VOR(V) and
DT(V) are now defined, as they will become important in higher dimensions.

Let vi and vj denote two distinct vertices. Furthermore, let H2(vi, vj)
denote the set of points not farther from vi than from vj; H2(vi, vj) is a
half-plane. Let a Voronoi polygon around a given vertex vi be defined as:

    VOR(vi) = ∩_{vj ∈ V\{vi}} H2(vi, vj)

Further, let P(vi) denote the boundary of VOR(vi). The boundary edges are
called Voronoi edges. Vertices where Voronoi edges meet are called Voronoi
points. Voronoi points are similar to Steiner points in that they are extra

• The polygon VOR(vi) is con-
vex and its interior is the locus
of points closer to vi than to
any other V-vertex (Figure 4).
• The union of these polygonal
boundaries forms the Voronoi
diagram for V (Figure 4), de-
noted by VOR(V).

Figure 4: Voronoi Diagram for N=lO

vertices with degree three, but their similarity to Steiner points ends there,
at least for the Euclidean case. Voronoi diagrams form the foundation of
many cg algorithms in geometric computing. For a detailed survey on Voronoi
diagrams, the reader is referred to Boots [8] and Aurenhamer [2] and most
recently [29, 48].
One of the most important properties of the DT(V) is that the EMST
is a subgraph of the DT(V). Thus, once the DT(V) is constructed, one has
an O(N) algorithm for constructing the EMST. Since the EMST provides
an upper bound on the ESMT, then using the DT(V) and VOR(V) results
in a performance guaranteed heuristic. In fact, the algorithm provides a
2/√3 ≈ 1.1547 performance ratio guarantee.
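Because the EMST is a subgraph of DT(V), a spanning tree need only be
sought among the O(N) Delaunay edges. A minimal sketch of this step
(an illustration assuming scipy is available; the function name is mine, not
from [55]):

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import Delaunay

def emst_via_delaunay(points):
    # EMST of planar points, using the fact that the EMST is a
    # subgraph of the Delaunay triangulation DT(V).
    pts = np.asarray(points, dtype=float)
    tri = Delaunay(pts)
    edges = set()                      # undirected Delaunay edges
    for simplex in tri.simplices:
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = sorted((simplex[a], simplex[b]))
                edges.add((i, j))
    i_idx, j_idx = zip(*edges)
    w = np.linalg.norm(pts[list(i_idx)] - pts[list(j_idx)], axis=1)
    graph = coo_matrix((w, (i_idx, j_idx)), shape=(len(pts), len(pts)))
    mst = minimum_spanning_tree(graph)  # MST over O(N) candidate edges
    return mst.sum()                    # total EMST length

rng = np.random.default_rng(0)
print(emst_via_delaunay(rng.random((50, 2))))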
An O(N log N) geometric heuristic based on the concatenation of appro-
priately chosen ESMTs for clusters of 2, 3 and 4 V-vertices was given by
Smith et al. [55]. The identification of 2, 3 and 4 V-vertex sets is based
on the information available from the Delaunay Triangulation DT(V) and
the Voronoi Diagram VOR(V) along with the EMST for V. The Delaunay
and Voronoi Diagrams act as data structures for the Steiner constructions.
More specifically, only the following cluster subsets of size k = 2,3,4 are
considered during the concatenation phase.

• k = 2: pairs of vertices connected by an edge of the EMST;

• k = 3: triples of vertices on a common triangle of DT(V) and with
two of its edges in the EMST;

• The straight-line dual of
VOR(V) obtained by connect-
ing a pair vi and vj of vertices
if and only if P(vi) and P(vj)
share a common side yields in
fact the DT(V).

• Hence, the DT(V) can be de-
termined in Θ(N log N) time.
The optimality of this algo-
rithm follows from the fact that
sorting of n numbers is linear-
time transformable to the prob-
lem of finding a triangulation of
V-vertices (Figure 5).

Figure 5: Delaunay Triangulation for N=10

• k = 4: quadruples of vertices based on minimizing the Voronoi edge


distance of two adjacent edge-sharing triangles of the DT(V) which
are also connected by EMST edges.

The heuristic, in fact, first attempts to construct the k = 4 clusters based
on the Voronoi information, then the k = 3, and finally the k = 2 edges.
The efficacy of the heuristic is based upon the data structure provided by
the DT(V) and VOR(V), since the Delaunay tends to create "equilateral"
triangles in its decomposition, which closely approximate the "best" solution
in E2, i.e. P2 = √3/2. This essential idea of decomposing the vertex set
around the "best" configuration possible is important in higher dimensions,
and especially in E3 as will be demonstrated.

2.5 Results from the Unit Sphere

The unit sphere Φ [22] provides a logical transition between E2 and E3.
Computing Steiner trees even for triangles on the sphere becomes more diffi-
cult because the distance metric is more complex and the Steiner vertices are
affected by the curvature and wraparound effects of the spherical surface.
The Geodesic Minimum Spanning Tree (GMST) problem and the Geodesic

Steiner Minimal Tree (GSMT) problem on Φ will both be examined. The
problems are computationally complex and the focus of concern will be the
development of algorithms and heuristics for these problems from a cg point
of view [52]. For the problem on the sphere, locations of the given vertices
are defined by their latitude and longitude, namely (φi, θi) for i = 1, 2, ..., n.
In order to discuss the minimal networks on the sphere, one needs a metric
for length. The standard metric, which corresponds to the L2 metric in Eu-
clidean space, is the great circle distance. A great circle is any circle having
a radius equal to that of the sphere. The distance between a pair of vertices
on the sphere is the length (in the L2 sense) of the lesser arc of a great circle
between the vertices.
Figure 6 nicely illustrates the minimization problem and some of the
natural difficulties of finding vertices inside spherical triangles. The problem
is to identify the additional vertex si(φi, θi), if it exists, where the sum of
the geodesic distances is minimized [1, 42, 24, 67, 43].
The following unconstrained optimization problem is presented for three
given vertices on Φ:

    Minimize Z(X) = Σ_{j=1}^{3} D(X, vj)                          (1)

where X = (x1, x2) are the latitude and longitude coordinates of the
Steiner point and D(X, vj) is the shortest geodesic or great circle distance
on the surface of the sphere between the Steiner point X and an existing
vertex j located at vj = (φj, θj). In fact, the geodesic distance is given by
[23]:

    D(X, vj) = arccos[sin x1 sin φj + cos x1 cos φj cos(x2 − θj)]   (2)
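A small numerical sketch of problem (1) (an illustration assuming scipy;
a derivative-free method is used to sidestep the trigonometric derivatives
of D, and the arccos argument is clipped to guard against round-off):

import numpy as np
from scipy.optimize import minimize

def great_circle(a, b):
    # Great-circle distance between (lat, lon) points on the unit sphere.
    (p1, t1), (p2, t2) = a, b
    c = np.sin(p1) * np.sin(p2) + np.cos(p1) * np.cos(p2) * np.cos(t1 - t2)
    return np.arccos(np.clip(c, -1.0, 1.0))

# Three given vertices (latitude, longitude) in radians.
V = [(0.1, 0.0), (0.2, 1.0), (0.6, 0.5)]

Z = lambda x: sum(great_circle(x, v) for v in V)
res = minimize(Z, x0=np.mean(V, axis=0), method="Nelder-Mead")
print(res.x, res.fun)   # candidate Steiner point and total geodesic length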

There are a number of issues related to the geometry of the sphere that
complicate attempts to directly compute GMST and GSMT solutions on Φ.

• First and most obviously, the sphere embodies degeneracies usually as-
sociated with antipodes, e.g., the multiplicity of geodesics between an-
tipodal vertices. Such events require special tests and conventions to
handle.

• Second, the sphere is bounded, i.e., the wraparound that exists com-
plicates the neighbor relation. For example, a triangle of vertices on

Figure 6: Steiner Minimal Tree on Φ

the sphere defines two spherical triangles, usually a major and a minor
one. Without information about the rest of the spanning network, it is
unclear in which of these to search for a Steiner point. The Delaunay
triangulation and Voronoi diagrams, however, provide this oriented
proximity information, since by first computing these data structures,
the task of constructing spanning trees is simplified and ultimately the
basis for the GSMT is formed.

• Third, the sphere lacks a natural ordering by means of which posi-


tional relations such as left-of and right-of are defined. The use of
stereographic projection imparts such an ordering during the Delaunay
construction process.

2.6 Steiner Points in a Spherical Δ

Some of the known properties of Steiner points in spherical triangles that
have already been developed and form the foundation for the heuristic search
for Steiner trees on Φ are now presented.

Theorem 2.1 [37, 18] Let v1, v2, v3 be three distinct vertices on the surface
of a sphere, not all of which are on the same great circle; then any vertex X,
X ∈ Δv1v2v3, which minimizes the sum of the minor great circle arcs as in (1):

    Σ_{j=1}^{3} D(X, vj)

is a Steiner point.

Theorem 2.2 [18] Either there is a unique point X which minimizes

    Σ_{j=1}^{3} D(X, vj)

or X coincides with one of the given vertices v1, v2, v3; and if

    ∠v1v2v3 of Δv1v2v3 < 2π/3

then the minimum point of v1v2v3 ≠ v2.

Theorems 2.1 and 2.2 provide the analogues for locating Steiner points on Φ
as they occur in E2. Also, under an azimuthal equidistant projection, with
a Steiner vertex as a pole, an optimum Steiner point for three given vertices
on Φ is the optimum for the corresponding problem in the plane E2 [42]. In
fact, it is this local projection property for three and four vertices on Φ that
will be explored in the development of an heuristic for the general Steiner
problem on Φ.
Finally, there are numerical difficulties with directly solving for the
Steiner points on Φ as a continuous nonlinear optimization problem, as can
be surmised from the trigonometric relations of the distance function [43].
In fact, in problems where the given fixed vertices cannot be covered by a
disc of radius r, local minima, maxima and flat points must be dealt with
[24]. As an alternative to solving for the GSMT as a continuous nonlinear
optimization problem, the heuristic based on a Computational Geometry
approach is briefly summarized.

2.7 Heuristic on Φ

As in E2, the approach utilizes the DT(V) and VOR(V) as data structures.
Stereographic projection is utilized to project V onto the plane where the
DT(V) and VOR(V) are constructed and then are mapped back to the
unit sphere. See [22] for more details. The heuristic for the GSMT works
by utilizing the simplicial decomposition of the DT(V) and the GMST to
decompose the vertex set into local optimal solutions of the GSMT, then
concatenates these local optimal solutions into a sub-optimal solution for V.
The concatenation part of the heuristic is largely based on the ideas
for solving the ESMT in E2 [55], although there are some unique problems
associated with solving for the local three and four vertex cases on Φ. As in
the planar solution, only 2,3, and 4-vertex cluster components are identified,
since experimentally, they have the most significant reduction in GSMT
solutions.
Thus, to briefly summarize the algorithmic approach, the Delaunay Tri-
angulation is utilized to partition the vertex set and help avoid degeneracies,
the Voronoi Diagram is then employed to provide locus information, and fi-
nally, stereographic projection is employed to provide a natural ordering of
the vertices for solutions to the GMST and GSMT problems.
All in all, the overall worst case running time of the algorithm is bounded
by the time complexity of constructing the Delaunay triangulation, which is
O(N log N). All the remaining major steps and sub-steps of the algorithm
for the GMST and the sub-optimal solution for the GSMT are either O(1)
or in the worst case O(N). Storage is also O(N).

3 E3 Problem
Now as one transitions to E3, most properties and features of Steiner trees
in the plane will carryover, some will be different, and some new properties
will emerge. In particular, the maximum number of Steiner points in E3 is
also N - 2. For a collection of other properties see [62]. Not all optimal
configurations require FST's as will be demonstrated, since some of the
given vertices can act as degenerate Steiner points. Things that one should
be especially prepared for are the dynamic nature of P3, the recurring sets
of planar Steiner solutions found in E3, and the elongated optimal nature
of some of the Steiner point sets. These features will occur over and over
again not only in this section but in §4 and 5.
To start, the three vertex case in E3 is presented.

3.1 N = 3 Case

Take for example the following three regular vertices:

    vi: (1, 1, 1);  vj: (1, −1, −1);  vk: (−1, 1, −1)

An assumption about these vertices is that the largest interior angle is
less than 2π/3; otherwise the Steiner point would be degenerate with one of the
three given vertices. For three vertices in E3, they define a plane which thus
reduces the three vertex problem to the planar version of the Steiner mini-
mal tree problem. By the fundamental properties of constructing a Steiner
vertex in the plane, see Figure 2.3, any reflected equilateral vertex and cor-
responding Melzak circle define the length of the SMT and the location of
the Steiner point [44, 35].
Given that the three vertices lie in a plane, the arbitrary oriented plane
is found in E3 and then one utilizes one of the previous methods in E2
to construct an ESMT solution. The optimal length of this solution is
defined as SMT*. As a consequence of the Melzak Circle construction in
the arbitrary oriented plane, the equilateral vertex will be located farthest
from the third given vertex Vk.
To understand why this occurs, as the Melzak Circle is extended from say
vertices Vi, Vj, the equilateral vertex on the circle will trace a sphere in space,
see Figure 7. The distance from the equilateral vertex on the Melzak Circle
to the opposite vertex Vk represents the length of the Steiner tree and is
sometimes known as the Simpson line. As the Melzak Circle rotates in space,
the Simpson line, and, the length of the Steiner tree will vary in length.
Further, as a consequence of this rotation process, the length of the Steiner
tree will be longest when the Circle lies in the arbitrary oriented plane
because, if it did not lie in this space plane, then the planar construction
would yield a Steiner point without the 120° angle property, a contradiction
to the optimal solution.
Thus, one alternative to solve the problem is to find the orientation of the
equilateral vertex and Melzak Circle, by establishing the following quadratic
optimization problem, where the location of tij with coordinates xij, yij, zij
is sought:

    Maximize Φ = ||tij − vk||                                     (3)

s.t.

    [(xij − xi)² + (yij − yi)² + (zij − zi)²]^{1/2} = eij         (4)
    [(xij − xj)² + (yij − yj)² + (zij − zj)²]^{1/2} = eij         (5)

where eij denotes the length of the edge (vi, vj). The solution to this
optimization problem will yield a chord length Φ*, with tij and the Melzak
Circle lying in the same plane; otherwise Φ* < SMT*, a contradiction to
the optimality of SMT*. Thus Φ* = SMT*, and tij is the
furthest from vk.
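The rotation argument can be checked numerically. The sketch below (an
illustration, written against the constraints (4)-(5) as reconstructed above)
parametrizes the circle of admissible equilateral vertices tij and scans for
the point furthest from vk; for the example triangle the maximum chord
approaches 2√6:

import numpy as np

vi = np.array([1.0, 1.0, 1.0])
vj = np.array([1.0, -1.0, -1.0])
vk = np.array([-1.0, 1.0, -1.0])

e = np.linalg.norm(vi - vj)           # edge length e_ij = 2*sqrt(2)
mid = 0.5 * (vi + vj)
axis = (vj - vi) / e                  # axis of the Melzak Circle rotation
u = np.cross(axis, [1.0, 0.0, 0.0])   # any vector orthogonal to the axis
if np.linalg.norm(u) < 1e-9:
    u = np.cross(axis, [0.0, 1.0, 0.0])
u /= np.linalg.norm(u)
w = np.cross(axis, u)
r = e * np.sqrt(3.0) / 2.0            # radius of the circle of points at
                                      # distance e from both vi and vj
theta = np.linspace(0.0, 2.0 * np.pi, 200001)
pts = mid + r * (np.cos(theta)[:, None] * u + np.sin(theta)[:, None] * w)
d = np.linalg.norm(pts - vk, axis=1)
print(d.max(), 2.0 * np.sqrt(6.0))    # Simpson line length vs. 2*sqrt(6)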

• That it is the furthest vertex is
an important property in look-
ing at the N ≥ 4 cases, which
will now be examined.
• The construction example for
the 3 given vertices is nicely il-
lustrated in Figure 7 showing
the equilateral vertex, Melzak
circle, and SMT length con-
necting tij and Vk.

Figure 7: Triangle in E3

In the example construction, the length of the Simpson line is 2√6 and,
since the length of each edge of the equilateral triangle is 2√2, the optimal
Steiner solution of P3(V) = √3/2 results.

3.2 N = 4 Case

Starting with an equilateral tetrahedron with the following four vertices, a
description of the Primal approach to constructing a solution for the N = 4
case will be presented, followed by a Dual formulation which is based on
some of the distance optimization ideas found for the N = 3 case.

    vi: (1, 1, 1);   vj: (1, −1, −1)
    vk: (−1, 1, −1); vl: (−1, −1, 1)

There are three alternative topologies for a SMT in a tetrahedron. It
will be useful to consider the three topologies in relation to the edge pair
orientations of the tetrahedron.

Vi Vk Vi Vj Vi Vj
\ / \ / \ /
\ / \ / \ /
0-----0 0-----0 0-----0
/ \ / \ / \
/ \ / \ / \
Vj Vl Vk VI VI Vk

Figure 8: Vertex Set

The three alternative topologies for the edge pairs are given in relation
to the direction in which the (si, sj) edge length exists:

3.3 Primal Problem

Because an equilateral tetrahedron is being examined, the first topology on
the left of Figure 8 with vertices as given will be conveniently assumed.
Since the Steiner link geometry depends upon the topology of how the given
vertices and Steiner points interact, establishing the topology is essential in
formulating the optimization problem for constructing the Steiner tree.
If one wishes to find the SMT geometry directly along with the coordi-
nates of the Steiner points, one can minimize the following unconstrained
nonlinear programming problem:

    Min Z = Σ_{i ∈ V∪S} Σ_{j ∈ S} [(xi − sjx)² + (yi − sjy)² + (zi − sjz)²]^{1/2}    (6)

where the indices in the summation are dependent on the three alternative
topologies possible for the Steiner tree. In particular, writing out the primal
problem, the following formulation is obtained (see Figure 10 for how the
topology and the constraint equations are interrelated).

• It was postulated by Gilbert
and Pollack in 1968 [35] that
the simplex in d-dimensional
space would yield the optimal
Steiner tree ratio.

• Most recently, the Gilbert-
Pollack conjecture was shown
to be false [27, 28].

Figure 9: Tetrahedron P3(V) = 0.81305

Directly solving the primal problem is carried out through an algorithm
such as Warren Smith's [62], where a branch-and-bound code plays off
iteratively searching the alternative FST topologies against numerical esti-
mates of the Steiner points within the FSTs. Interestingly enough, Warren
Smith's algorithm does not use a lower bounding schema, which impedes its
efficiency in pruning the enumeration tree. An alternative to the primal al-
gorithmic approach is through the following dual quadratic programming
formulation, which must be generated for each topology as in the primal.
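For the equilateral tetrahedron, the primal (6) under the assumed first
topology can also be minimized directly with a general-purpose optimizer.
The sketch below is only an illustration (not Warren Smith's branch-and-
bound code) and should recover a length near 2√6 + 2 ≈ 6.899:

import numpy as np
from scipy.optimize import minimize

V = {"vi": np.array([1.0, 1.0, 1.0]),   "vj": np.array([1.0, -1.0, -1.0]),
     "vk": np.array([-1.0, 1.0, -1.0]), "vl": np.array([-1.0, -1.0, 1.0])}

def tree_length(x):
    # Total length for the fixed topology: s1-{vi,vj}, s2-{vk,vl}, s1-s2.
    s1, s2 = x[:3], x[3:]
    return (np.linalg.norm(V["vi"] - s1) + np.linalg.norm(V["vj"] - s1)
            + np.linalg.norm(V["vk"] - s2) + np.linalg.norm(V["vl"] - s2)
            + np.linalg.norm(s1 - s2))

# Start the Steiner points at the midpoints of the paired edges.
x0 = np.concatenate([0.5 * (V["vi"] + V["vj"]), 0.5 * (V["vk"] + V["vl"])])
res = minimize(tree_length, x0, method="BFGS")
print(res.fun, 2.0 * np.sqrt(6.0) + 2.0)   # ~6.899 for both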

3.4 Dual Problem

In this dual formulation, related to the geometric construction in Figure 7,
the location of the Melzak Circle vertices tij and tkl that are furthest apart
from each other is sought [62]. The argument for using this optimization
formulation to find the furthest Melzak Circle vertices follows a similar one
as in the N = 3 case [62]. Finding these Melzak Circle vertices and the
resulting line segment between them as depicted in Figure 11 yields a lower
bound on the length of the Steiner problem for the particular topology.
The Dual formulation is edge-based, and the Steiner points are found
after the optimization of the edges, while the Primal is vertex-based and one
directly seeks the location of the Steiner vertices. The following optimization
problem, with the constraints ensuring the equilateral edges are satisfied, is

    a = [(xi − six)² + (yi − siy)² + (zi − siz)²]^{1/2}
    b = [(xj − six)² + (yj − siy)² + (zj − siz)²]^{1/2}
    c = [(xk − sjx)² + (yk − sjy)² + (zk − sjz)²]^{1/2}
    d = [(xl − sjx)² + (yl − sjy)² + (zl − sjz)²]^{1/2}
    f = [(six − sjx)² + (siy − sjy)² + (siz − sjz)²]^{1/2}

         Vi           Vk
           \         /
          a \       / c
             \     /
         Si o---f---o Sj
             /     \
          b /       \ d
           /         \
         Vj           Vl

Figure 10: Primal Tetrahedron Topology

Figure 11: Dual Simpson Line Construction

presented:

    Maximize Φ = ||tij − tkl||                                    (7)

subject to:

    [(xij − xi)² + (yij − yi)² + (zij − zi)²]^{1/2} = eij         (8)
    [(xij − xj)² + (yij − yj)² + (zij − zj)²]^{1/2} = eij         (9)
    [(xkl − xk)² + (ykl − yk)² + (zkl − zk)²]^{1/2} = ekl         (10)
    [(xkl − xl)² + (ykl − yl)² + (zkl − zl)²]^{1/2} = ekl         (11)

The SMT is equal to the distance between tij and tkl; in fact, SMT =
2√6 + 2 and the Steiner ratio P3(V) = (2√6 + 2)/(6√2) ≈ 0.81305, which is the optimal

Figure 12: N = 4 Equilateral Triangles and Melzak Circles

Steiner ratio of a regular tetrahedron [64]. In the construction of Figure 12,
the twist angle between the two Melzak circles is 90°, which is due to the
mutual orthogonality of the edge pairs (vi, vj) and (vk, vl).
This quadratic optimization formulation does not admit a polynomial-
time algorithm for its solution and, thus, remains a very difficult optimiza-
tion problem even for four vertices. Even though the problem is convex,
the objective function is a maximization one, so there may not be a unique
optimum.
The beauty of solving this Dual optimization problem is that one clearly
sees the underlying geometry for constructing the Steiner tree in E3. Solving

the quadratic optimization problem will work for any irregular tetrahedron,
not just equilateral ones. In the Dual problem, one can establish a relaxed
version of the general problem if we use a distance squared formulation. The
Lagrangian of the distance squared formulation of the Dual is:

    L(x, λ) = f(x) + λh(x)

If the above is then differentiated, generating 10 nonlinear equations
∂L/∂xij, ∂L/∂λi in the 10 independent variables, and setting them equal
to zero:

    ∂L/∂xij = 0,   ∂L/∂xkl = 0,
    ∂L/∂yij = 0,   ∂L/∂ykl = 0,
    ∂L/∂zij = 0,   ∂L/∂zkl = 0,
    ∂L/∂λ1 = 0,    ∂L/∂λ2 = 0,
    ∂L/∂λ3 = 0,    ∂L/∂λ4 = 0.

In the above, the first six equations represent the Dual Feasibility conditions,
while the latter four represent the Primal feasibility conditions. Including
the Complementary Slackness conditions together in the Lagrangian objec-
tive function, λh(x) = 0, we see that indeed the Dual does solve the Primal,
since maximizing the distance between two extreme vertices is equivalent to
the Primal objective function.
As an example of the irregular four vertex case, the optimal length of a
general tetrahedron is given as follows (this is a slight perturbation of the
equilateral tetrahedron):

    vi: (1.25, 1, 1);  vj: (1, −1.75, −1);  vk: (−1, 1, −2);  vl: (−1, −1, 1)

Carrying out the modified algorithm with the nonlinear programming
approach embedded in Mathematica, which is based on a Newton-type pro-
cedure for solving a set of nonlinear equations, the coordinates were found
to be:

    tij = (−0.200706, 0.366419, 3.56057);
    tkl = (0.374448, −1.17195, −4.44045)

The Lagrange multipliers for this irregular configuration are λ1 = −1.36566,
λ2 = −1.75906, λ3 = −1.53361, λ4 = −1.11645. The chordal distance be-
tween the extreme vertices was found to yield an SMT = 8.16789 with a
Steiner ratio P3(V) = 0.861507.
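The same dual can also be checked with a generic constrained optimizer.
The sketch below (an illustration, not the Mathematica run above) maxi-
mizes the squared chord length under the squared forms of (8)-(11); from
the chosen starting point it should land near the coordinates and SMT ≈
8.168 reported above:

import numpy as np
from scipy.optimize import minimize

vi = np.array([1.25, 1.0, 1.0]);  vj = np.array([1.0, -1.75, -1.0])
vk = np.array([-1.0, 1.0, -2.0]); vl = np.array([-1.0, -1.0, 1.0])
eij, ekl = np.linalg.norm(vi - vj), np.linalg.norm(vk - vl)

def neg_chord_sq(x):
    # Negated squared Simpson-line length (SLSQP minimizes).
    return -np.sum((x[:3] - x[3:]) ** 2)

cons = [  # squared-distance versions of the equilateral constraints
    {"type": "eq", "fun": lambda x: np.sum((x[:3] - vi) ** 2) - eij ** 2},
    {"type": "eq", "fun": lambda x: np.sum((x[:3] - vj) ** 2) - eij ** 2},
    {"type": "eq", "fun": lambda x: np.sum((x[3:] - vk) ** 2) - ekl ** 2},
    {"type": "eq", "fun": lambda x: np.sum((x[3:] - vl) ** 2) - ekl ** 2},
]
# Offset the start along z to steer toward the reported solution.
x0 = np.concatenate([0.5 * (vi + vj) + [0, 0, 3], 0.5 * (vk + vl) - [0, 0, 3]])
res = minimize(neg_chord_sq, x0, method="SLSQP", constraints=cons)
tij, tkl = res.x[:3], res.x[3:]
print(tij, tkl, np.linalg.norm(tij - tkl))   # SMT should be ~8.168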

Figure 13: Dual Construction N = 4



3.5 N = 5 Case

As the next logical vertex set, the case for N = 5 vertices is considered.
Two adjacent equilateral tetrahedra form the foundation of this vertex set.
In general for the N = 5 vertex case, there are a total of 15 possible FST
topologies, so that the solution procedure becomes slightly more complex
with the additional vertex. Let's consider the following regular vertex set:

    vi: (1, 1, 1);  vj: (1, −1, −1);  vc: (−1, 1, −1);
    vk: (−1, −1, 1);  vl: (−5/3, −5/3, −5/3)

One of the topologies appears in Figure 14 and this is the one used in
the example discussion. For the above topology, the Melzak Circles are
constructed iteratively: pairing up edges (vi, vj) and finding tij, pairing up
edges (vk, vl) and finding tkl, then taking tij and tkl to find the vertex tijkl,
and the resulting maximum distance between tijkl and vc, the central vertex.

• Figure 14 illustrates the map-
ping between the example con-
figuration and the optimization
problem.

• The difficulty of this opti-
mization problem stems from
the need to simultaneously lo-
cate tijkl, tij, tkl.

        Vk        Vc        Vi
          \        |        /
           \       |       /
  t(kl) ... o------o------o ... t(ij)
           /              \
          /                \
        Vl                  Vj

               t(ijkl)

Figure 14: Dual Construction for Vertex Set N = 5

The dual formulation is again dependent upon specifying the topology
of the vertices in the network. The dual formulation for N = 5 requires the
solution of the following quadratic optimization problem (n.b. the squared
distance function is used to simplify somewhat the optimization problem):

    Maximize Φ = ||tijkl − vc||                                   (12)

subject to:

    (xij − xi)² + (yij − yi)² + (zij − zi)² = eij²                (13)
    (xij − xj)² + (yij − yj)² + (zij − zj)² = eij²                (14)
    (xkl − xk)² + (ykl − yk)² + (zkl − zk)² = ekl²                (15)
    (xkl − xl)² + (ykl − yl)² + (zkl − zl)² = ekl²                (16)
    (xijkl − xij)² + (yijkl − yij)² + (zijkl − zij)²
        − [(xkl − xijkl)² + (ykl − yijkl)² + (zkl − zijkl)²] = 0  (17)
    (xijkl − xij)² + (yijkl − yij)² + (zijkl − zij)²
        − [(xkl − xij)² + (ykl − yij)² + (zkl − zij)²] = 0        (18)

Figure 15 illustrates the resulting optimal solution for the N = 5 vertex
case. In order to solve this difficult nonlinear programming problem, a
Sequential Quadratic Programming (SQP) algorithm available in the IMSL
Library was utilized. For the N = 5 equilateral configuration, the P3(V) =
0.815469, which is larger than the N = 4 case. In what follows, it will be
shown that for a certain configuration of N = 6 equilateral vertices, one can
reduce this P3(V) value below the N = 5 vertex case.

• It is clear from Figure 15 that
the edge segment between tijkl
and vc is the lower bound length
that is sought.

• The difficulty here is in
the simultaneous location of the
vertices tij, tkl, tijkl.

• An important feature of the
dual construction is the result-
ing conjunction of the planes of
the Melzak Circles and the inter-
secting Simpson lines in E3.

Figure 15: N = 5 Dual Construction



3.6 N = 6 Case

Now consider three equilateral tetrahedra with the following vertex set co-
ordinates and corresponding topology:

    vi: (1, 1, 1);  vj: (1, −1, −1);  vc: (−1, 1, −1);
    vd: (−1, −1, 1);  vk: (−5/3, −5/3, −5/3);  vl: (−3.444, −1/9, −1/9)

      k     d     c     i
      |     |     |     |
      o-----o-----o-----o
     /                   \
    l                     j

Figure 16: Vertex Set N = 6


Figure 16 represents one of the 120 possible topologies interconnecting
the set of six given vertices and the Steiner points. This particular topo-
logical arrangement, however, represents a special property of the vertex
set, referred to as a "path topology" which will be defined formally later.
This "path topology" is very important for vertex sets N ~ 6 as will be
demonstrated.

3.6.1 N = 6 Dual Construction

The detailed dual optimization formulation for N = 6 is similar to the
N = 5 case and will not be repeated here for the sake of brevity. It again
requires the construction of Melzak Circles for adjacent pairs of vertices and
further Melzak Circles built upon these pairs. The quadratic optimization
formulation was also solved with the SQP algorithm described earlier for
the N = 5 case.
For this special configuration of 3 tetrahedra, it is curious why it should
decrease the Steiner ratio after 2 tetrahedra raised it above the simplex,
but, in one sense, this is related to the planar case, since the N = 5 optimal
configuration in E2 is the degenerate N = 4 tetrahedron with the fifth vertex
an ε-distance removed from one of the four given vertices.

• Figures 1 and 17 reveal the
rather interesting solutions for
the N = 6 case.

• Again, the planes of the Melzak
Circles and intersecting Simp-
son lines are a unique trade-
mark of what happens in the
ESMT solutions in E3.

Figure 17: N = 6 Optimal Dual Construction P3(V) = 0.80865

3.6.2 Conjectured Optimal Construction in E3

While for N = 6 vertices one can construct an optimal solution using the
past dual formulations, there are other properties of this vertex set which
have even further significance for Steiner trees in E3.
have even further significance for Steiner trees in E3.
In order to add additional vertices to the N = 6 base construction,
the following construction technique, referred to as the "Bucky-ball" rule, is
presented [64]:

Rule: Successively add vertices so that the Nth vertex added is always
equidistant to the min(3, N − 1) most recently added vertices.
This rule affords a way and a corresponding data structure to configure
chains of tetrahedra built upon the N = 6 basic equilateral structure. In
fact, as vertices are added to this basic structure, one creates a ribbon
sausage of vertices which dramatically decreases the P3 value. The names
of the tetrahedra structures generated are based on the fact that a single
carbon atom forms a tetrahedron and more complex carbon molecules are
simply built upon this single carbon atom base via the Bucky-ball rule.
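A sketch of the rule (an illustration of mine; it assumes ties are broken by
always growing away from the vertex preceding the current face, which
reproduces the face-sharing chain of regular tetrahedra):

import numpy as np

def r_sausage(n, edge=1.0):
    # Bucky-ball rule: each new vertex is equidistant (at the edge
    # length) from the three most recently added vertices, placed on
    # the side of their triangle facing away from the vertex before it.
    a = edge
    pts = [np.array([0.0, 0.0, 0.0]),          # seed: regular tetrahedron
           np.array([a, 0.0, 0.0]),
           np.array([a / 2, a * np.sqrt(3) / 2, 0.0]),
           np.array([a / 2, a * np.sqrt(3) / 6, a * np.sqrt(2.0 / 3.0)])]
    while len(pts) < n:
        p1, p2, p3 = pts[-3], pts[-2], pts[-1]  # three most recent vertices
        c = (p1 + p2 + p3) / 3.0                # centroid of their triangle
        nrm = np.cross(p2 - p1, p3 - p1)
        nrm /= np.linalg.norm(nrm)
        h = a * np.sqrt(2.0 / 3.0)              # apex height over the face
        prev = pts[-4]                          # vertex behind the face
        sign = -1.0 if np.dot(nrm, prev - c) > 0 else 1.0
        pts.append(c + sign * h * nrm)          # apex away from prev
    return np.array(pts)

P = r_sausage(20)
# Every new vertex sits one edge length from the three before it.
print(np.linalg.norm(P[4] - P[3]), np.linalg.norm(P[4] - P[1]))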
This monotonically decreasing property of these structures is truly re-
markable. One possible explanation of why 6 vertices are needed is that the
convex hull of the 3 tetrahedra becomes the foundation of the triple helix
geometry which forms the basis of the conjectured optimal configuration.

    n         rho                   name
    4         0.813052529585        (regular tetrahedron)
    5         0.815469669674        (triangular bipyramid)
    6         0.808064936179        ("propane")
    7         0.802859896946        (1 of 2 "chain butanes")
    8         0.800899342742        pentane
    9         0.798704227344        hexane
    10        0.797013231353        heptane
    11        0.795785747249        octane
    12        0.794720989050        nonane
    13        0.793838038891        decane
    14        <= 0.7934
    15        <= 0.7926
    infinity  <= 0.784190373377122  R-sausage

Table 1: Steiner Ratios for N-Vertex R-Sausages.

3.7 R-Sausages

The search for the conjectured optimal configuration for minimizing P3 in
E3, Issue II raised at the beginning of the Chapter, began by looking at
two and three tetrahedra as well as gluing various types and combinations
of Platonic solids (cubes, octahedra, etc.) together [62].
As an aside, Fejes Tóth conjectured in 1975 [60] that the optimal config-
uration of spheres in d ≥ 3 for the penny packing problem was a set of unit
balls on a line. This is the famous "sausage" conjecture. The conjectured
configuration for the Steiner problem in E3 is slightly different, since
the unit balls with each tetrahedron are slightly offset from one another
and not in a line. To distinguish the problem and conjecture from that of
the sausage conjecture, since it is really different, the conjectured optimal
configuration will be called a "ribbon/sausage" or R-sausage for short.
The most interesting property of this R-Sausage is that, if it is
the optimal configuration of vertices in E3, it is thus the E3 equivalent of
the best configuration in E2, the equilateral triangle, which as was shown
earlier is critical in any heuristic algorithm for vertex set decomposition.

Conjecture 3.1 The R-Sausage achieves P3, and

    P3 ≈ 0.784190373377122247108395477.


3.8 R-Sausage Properties

For a known ribbon topology, the following properties within the R-Sausage
exist, namely:

Figure 18: N = 20 Helix Geometry and Path Topology

Path Topology As can be seen in Figure 18, the R-Sausage has a unique
path topology which relates the given vertex set V with the Steiner

points in S. For the R-Sausage, there are (N − 2) Steiner points,
Steiner point i being connected to Steiner point i + 1 for i = 1, ..., N − 3.
Also, sausage vertex i is attached to Steiner point i − 1 for i = 2, ..., N − 1;
sausage vertex 1 is attached to Steiner point 1, and sausage
vertex N is attached to Steiner point N − 2. This path topology is an
important indicator of the fundamental structure of a particular set of
vertices.

Monotonically Decreasing [64] The Steiner ratio is monotonically de-
creasing as the number of vertices in V increases in the R-Sausage.
This is exemplified in Table 1.
This dynamic nature of the Steiner ratio implies that the longer the
R-Sausage the better. This would seem to imply some importance for
the computational biological applications examined in §5.

FSTs In particular, the maximum number of Steiner points in E3 is also
N − 2. For a collection of other properties see [62]. Not all optimal
configurations require FSTs, as will be shown in §4 and 5, since some
of the given vertices act as degenerate Steiner points.

Angles All the angles at the Steiner junctions are 120°. This is the same
property as in the plane. When the applications to biochemical pro-
teins are discussed, this angular requirement will not hold in all cases
since the mass of the atoms and the electrodynamic forces interrelating
them are not uniformly equivalent.
Steiner point Degree All the Steiner points have 3 arcs incident to each
vertex, or δ(sj) = 3, ∀j. All given vertices have δ(vi) = 1, ∀i. Figure
18 illustrates the convex hull and Steiner tree for N = 20 vertices. The
diagram clearly indicates the triple helix construction. Notice that all
vertices from V lie on the convex hull of the R-Sausage while all the
Steiner points lie in the interior.

Helical Axis There is a well-defined axis of rotation about which both the
V and S vertices rotate.
Figures 19 and 20 illustrate two end-views of the vertex set for n =
75, the former very close to the start of the R-Sausage, while the
latter from a distance. The chords across this end view in Figure 20
represent the Steiner points and line segments at both ends of this
finite R-Sausage, yet for an infinite R-Sausage these would not exist:
one would have two concentric convex hulls, or 3d onions as they are
called in Computational Geometry.

• Figure 19 illustrates that, for
vertices propagating out along
the R-Sausage, they appear in
clusters of essentially 7 ver-
tices.

• Since there are a total of n =
75 vertices, two clusters have
6 vertices, while there are nine
clusters of 7 vertices.

Figure 19: R-Sausage Screw Symmetry

Periodicity Curiously enough, it has been shown that the R-Sausage is
not periodic. If one selects a single vertex as a starting vertex for
the R-Sausage, then after 28 tetrahedra or 11 chords along the helix,
because of the screw symmetry of the R-Sausage it should reappear
after a 360° revolution, but it reappears slightly off, at 354° 20'. This
was demonstrated by Coxeter, p. 412 [20].

3.9 Sausage Experiments

Many hundreds of thousands of computational experiments were tried to see
if there was a better optimal configuration of vertices which has the smallest
P3, and the remarkable result so far is that nothing surpasses the R-Sausage
for minimizing P3 [64]. While one cannot conclusively prove it is the optimal
configuration at this point in time, at least it yields a very strong upper
bound on the Steiner ratio. This is important in the next section of this
Chapter for the development of a methodology to decompose an arbitrary
vertex set and eventually develop a heuristic algorithm. As was the case in
E2 and on the unit sphere, decomposing according to the optimal vertex set
configuration should yield an effective heuristic.

• Figure 20 is another view of the
same R-Sausage from a further
distance, where it is clear that
all the given vertices V lie on
the convex hull of V and all the
Steiner points S lie in an interior
convex hull.

• This view also reconfirms the
distance symmetries which are
present in the R-Sausage along
with the existence of a cylindri-
cal Steiner convex hull inside the
given vertex set V.

Figure 20: End-View of R-Sausage

4 Heuristics
In general, when faced with computing a Steiner tree for a large randomly
generated vertex set in E3 with no special structure, optimal solutions will
probably not be possible. Thus, some type of decomposition procedure is
necessary to make solutions tractable. Of course the most natural decompo-
sition tool is the R-Sausage itself, since as was the case in E2 and the unit
sphere, the equivalent optimal configuration vertex set should be a good
decomposition strategy. Once the subsets of vertices are carved out of the
larger set, then fast computation of these small subsets is important to the
overall Steiner construction for the entire vertex set. If this construction can
be done in better time than with an exponential worst case time algorithm,
then the overall running time and performance of a heuristic will be greatly
enhanced.

4.1 CG Approach
The general strategy followed here in constructing E3 Steiner trees is a locus
based strategy and is largely based on the approaches in the past in E2 [55]
and on the unit sphere [22]. These past approaches rely on Computational
Geometry (cg) data structures from which the local optimal solutions are
formed. The crucial concepts needed in E3 are the following [29]:

• Figure 21 illustrates a Steiner


tree construction where there
are subsets of small vertex clus-
ters N = 2,3,4,5,6 which com-
prise the Steiner tree solution.
• Thus, the discussion of the
properties of Steiner trees in §3
carry over to the discussion of
an effective heuristic algorithm.

Figure 21: Steiner Construction N = 18

Voronoi Polyhedra: Given a set of vertices V in E3, the Voronoi cell of a
vertex vi is the set of points in E3 closer to vi than to any other given
vertex vj, j ∈ V. This set can also be described as an intersection of
half-spaces.

Voronoi diagram: is the collection of all non-empty Voronoi cells. The
intersection of two adjacent Voronoi polyhedra is a Voronoi plane, and
the Voronoi planes intersect in Voronoi edges and eventually Voronoi
vertices.

Delaunay tetrahedization: is also a cluster of simplicial polyhedra in E3
and is the mathematical dual of the Voronoi diagram. In addition,
the facets of the Delaunay tetrahedization can be characterized by the
property: for T a proper triangulation, let maxrad(T) be the maxi-
mum radius of any minimum containment sphere of any simplex in T;
then over all proper triangulations T in E3, the Delaunay triangulation
minimizes maxrad(T) [29].
The Delaunay tetrahedralization is utilized as the basis for the algorithms
since the equilateral tetrahedra are crucial for the optimal Steiner point
set configurations as discussed in §3. Beasley and Goffinet's recent use
of the Delaunay Triangulation in their heuristic for the SMT problem in
E2 yields the best results so far for the plane [6] and reinforces the
approach for E3. There are other methods of tetrahedralization possible,
yet for the purposes of computing Steiner trees, the more equilateral the
tetrahedralization the better, in light of the R-sausage properties.

4.2 Ribbon Decomposition Problem

Given the fact that the infinite ribbon or triple helix is conjectured to be
the optimal configuration, one needs to utilize these ribbons to effectively
decompose a vertex set so that an overall solution to the problem can be
achieved. Even if it turns out that the infinite ribbon is not optimal, it still
provides a valid decomposition strategy for an heuristic to the problem. In
this section, vertex sets which are comprised of ribbons of regular tetrahedra
joined together are examined. Although this is still not the most general
case, one needs to illustrate how the tetrahedra join together and it is clearer
to illustrate the algorithmic concepts with equilateral tetrahedra.
Keeping in mind how the Bucky-ball rule generates ribbons, it is instruc-
tive to consider junctions of these ribbons when this rule is not followed. If
the next vertex in generating a ribbon is chosen to be equidistant from three
vertices that are not the ones most recently added, then one achieves a two-
way junction. For example, if the ribbon has 10 vertices, then choosing an
11th vertex equidistant from 7, 9, 10 would break the R-sausage structure. If
in addition the 12th vertex were chosen to be equidistant from 7, 8, 10, one
would get a three-way junction.¹ This construction can be generalized to
non-equilateral tetrahedra as well.
Some additional notation and definitions are useful at this point:

Centroid Spanning Tree (CST) Given the Delaunay Triangulation, the
MST of edges (which is a subgraph of the Delaunay) has for each tetra-
hedron a centroid. For the equilateral tetrahedra it is also a Voronoi
vertex. The collection of tetrahedra with these centroids is referred
to as a Centroid Spanning Tree (CST). With irregular tetrahedra, the
Voronoi vertex which is the center of a sphere through the tetrahedral
vertices may lie outside the tetrahedral convex hull, yet with regular
tetrahedra, this problem will not be encountered.

¹ Following the Bucky-ball rule, a one-way junction is simply the R-sausage.

Ribbon ('R.) Within the CST, there are chains (links) of tetrahedra which
for the sake of the argument will be called ribbons 'R.. They are rib-
bons of varying length, li and corresponding pi which for the regular
tetrahedra can be estimated without actually calculating the Steiner
tree solution since the P3 is a decreasing function as was shown earlier.

Ribbon Junctions: Within the CST, the ribbons intersect at junctions
δ0, δ2, δ3, δ4 of degree 0, 2, 3, 4 tetrahedra respectively.

δ0-Junction: An "empty" junction of 2 ribbons with no common tetrahedron
between them; however, the two ribbons share a common degenerate
Steiner point.

δ2-Junction: A junction of 2 Delaunay tetrahedra with p2j = 0.8155.

δ3-Junction: A junction of 3 Delaunay tetrahedra with p3j = 0.8288.

δ4-Junction: A junction of 4 Delaunay tetrahedra with p4j = 0.9005.

• Figure 22 represents a junction of 9 Delaunay tetrahedra with p3j = 0.8288.

• The solid circles are vertices from the original set. Empty circles are additional Steiner points.

Figure 22: 3-Way Junction P3(V) = 0.8288

The example illustrates an optimal SMT configuration where two of the
given vertices act as degenerate Steiner points. While N = 16, there are
only M = 12 Steiner points. Empty junctions are interesting because they
allow the R-sausage topologies to be independent, i.e. there is no loss in
the Steiner ratio across the ribbons. These empty junctions deserve a study
in their own right, since they may exist in natural phenomena and may also

• Figure 23 illustrates one cluster of three separate chains where each chain has three tetrahedra formed via the Bucky-ball rule, and there is no degradation in P3 for the entire vertex set, i.e. the P3 for each separate ribbon is the optimal P3 expected for 3 tetrahedra.

• The shaded triangles in the upper 3-d figure are the large ones in the lower figure, which is the 2-d representation of the same vertex set.

Figure 23: 3 disjoint ribbons with Empty Junctions P3(V) = 0.808065

be the foundation for certain organic and inorganic applications, e.g. single-
and double-chain silicate compounds and other ceramic structures [45].
The other δi-junctions are problematic, since the sausage topologies get
corrupted and the P3 becomes less than the expected optimal value. Indeed,
for the heuristic algorithm, it is exactly how the junctions are
dealt with that is the key problem in the ribbon decomposition process.

4.3 Heuristic Algorithm


The algorithm is essentially a greedy one based on using the MST as a de-
composition guide and the Delaunay Triangulation to construct the ribbons
of tetrahedra. It seeks to decompose the Ribbon data structures in such
a way as to minimize the Steiner ratio for each piece. The Steiner ratio
depends on two factors: the length of each ribbon and the degree of each
junction. Therefore, the algorithm tries to maximize the minimum length
ribbon, but at the same time it seeks to reduce the degree of each junction in
the Ribbon data structure. Figure 24 illustrates diagrammatically how the
heuristic works. The saw-tooth line of Figure 24 represents the MST and
the tetrahedral Ribbon data structure defined by the MST. The entire rib-
bon structure is to be decomposed while seeking to maximize the minimum
length of ribbon, unioning the largest junctions of ribbons wherever possible,
and concatenating the local optimal solutions within the decomposition
process.

Figure 24: R-Sausage Decomposition Process
In the following section, a formal description of the algorithm is pre-
sented. Data structures used in the algorithm are described in [56].

4.4 General Algorithm Description


The algorithm carefully examines the junctions of the ribbons and attempts
to union the local optimal solutions as efficiently as possible.

Step 1.0 Establish the CG data structure: Construct the Convex Hull and
Delaunay Triangulation data structures in E3, DT(V).

Step 2.0 Establish an Upper Bound on the SMT: Solve the MST with the
DT(V) edge set.

Step 3.0 Construct the R Data Structure: Identify the tetrahedra ti, tj, ..., tq
sharing edges in the MST.

Step 3.1: Utilize the Voronoi locus information to determine adjacent
tetrahedra in the MST.
Step 3.2: Utilize the CST to construct the longest chains Ri of tetrahedra
in the MST.
Step 3.3: Identify pi within each Ri.
Step 3.4: Identify the junctions of the Ri.

Step 4.0 Local Optimal Solutions: Create a Priority Queue Q of the Ri sorted
on their Steiner ratio.

Step 4.1: Select an Ri and determine the adjacent Rj incident to Ri,
at both ends of Ri if possible.
Step 4.2: Choose the largest junction and determine the adjacent Rj
which maximize the minimum length Rj. If the sets of ribbons
can be unioned together, go to Step 4.3; otherwise:
4.2.1: Using the longest Ri at the junction, determine the face
Fi of the tetrahedron to which Ribbon Ri is adjacent.
4.2.2: Using this face Fi, determine the vertex of the tetrahedral
junction nearest to the first three vertices of Ri. Call this
vertex vk, the critical root vertex of Ri.
4.2.3: Using vk, find a local optimal solution for Ri.
4.2.4: Find the local optimal solutions for the remaining Rj, j =
2, 3, 4, using the appropriate root vertex of the junction
tetrahedra.
Step 4.3: Union the Ri and Rj, j = 2, 3, 4, and find the SMT of the
union.
Step 4.4: Store the SMT solution, set k ← k + 1, and return to Step 4.1.
Step 4.5: The process is complete once the priority queue Q is exhausted.
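As a rough illustration of the control flow of Steps 4.0-4.5, the following Python sketch drives the decomposition from a priority queue. The inputs (precomputed ribbons with their Steiner ratios, a junction adjacency map, and a `local_smt` callable standing in for the exact branch-and-bound solver of [62]) are hypothetical.

```python
import heapq

def decompose(ribbons, junction_of, local_smt):
    """ribbons: dict id -> (steiner_ratio, vertex_list)
    junction_of: dict id -> iterable of ribbon ids sharing a junction
    local_smt: callable(vertex_list) -> local SMT solution"""
    # Step 4.0: priority queue Q of ribbons keyed on their Steiner ratio
    queue = [(ratio, rid) for rid, (ratio, _) in ribbons.items()]
    heapq.heapify(queue)
    tagged, solutions = set(), []
    while queue:                              # Step 4.5: stop when Q is empty
        _, rid = heapq.heappop(queue)         # Step 4.1: select a ribbon R_i
        if rid in tagged:
            continue
        # Step 4.2: union the untagged neighbours incident at the junction
        group = [rid] + [j for j in junction_of.get(rid, ()) if j not in tagged]
        tagged.update(group)
        vertices = [v for g in group for v in ribbons[g][1]]
        solutions.append(local_smt(vertices))  # Steps 4.3-4.4: solve and store
    return solutions
```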

4.5 Heuristic Example


The working of the heuristic is best described by way of an example. Con-
sider the case of three ribbons connected by two 2-way junctions, as shown
in Figure 25. Figure 25 is a two-dimensional projection representation of
the MST passing through the Delaunay data structure. The length of each
ribbon is 3 tetrahedra, and they are numbered 0, 1, & 2.

[Figure: the three ribbons of the example, joined by two junction tetrahedra (Junction 0 and Junction 1); the points in Data Set 1 and Data Set 2 are indicated.]

Figure 25: Example 2-way junctions.

1. Locate the longest ribbon. The program searches the list of ribbons,
looking for the longest one. Since in this instance all ribbons are
of identical length, the "chosen one" is R0, the first one in the list of
ribbons.

2. Identify the ends of the chosen ribbon. Check whether:

• The chosen ribbon has junctions at both ends. If so, choose the
junction with the larger number of ribbons and call it the "Join."
• If all the ribbons at the Join tetrahedron are as yet untagged,
by which is meant that they have not yet been included in any
collection of data vertices, then concatenate all the ribbons at
this Join, as well as the Join itself, into one data file. Mark all the
concatenated ribbons as tagged.


• If any of the ribbons that start/end at the Join are tagged, go to
the other junction of the chosen ribbon, and repeat the above
process. If this junction too has at least one previously tagged
ribbon, return to the junction originally selected. In such cases,
the vertices of the chosen ribbon will be the only vertices in the
data set.

In the example, the chosen ribbon, R0, has junctions at both ends,
both of which are 2-way junctions. Neither of the other two ribbons
has as yet been tagged, and so R0 and R1 are chosen as the ribbons to
concatenate into a single data set, along with the junction tetrahedron,
i.e. tetrahedron number 0.

Figure 26: Convex Hull of Example Vertex Set N=14
3. For each of the ribbons that are to be concatenated together, one
must check whether it has a junction at the other end. If so, one will

need to "split out" that junction.


4. Print out the relevant vertices from each of the concatenated ribbons.
These vertices are run through the SMT program, and the resultant
Tree is shown in Figure 27. The lighter lines represent the edges of
the entire set of tetrahedra.

Figure 27: SMT of top part of the data set.

5. The remaining vertices. The algorithm is now run again on the
remaining vertices in the data set, i.e. it searches among the
untagged ribbons for the longest one, examines its junctions, etc. In the
example, the only remaining ribbon is R2. Therefore, the 2nd file
of data vertices will consist of the vertices from R2, along with the
vertices of Junction 1, which is the connector to the data vertices in
the 1st file. These vertices are now passed to the SMT program, and

the resultant Tree is in Figure 28.

Figure 28: The bottom part of the data set.

6. Calculating the P3 ratio. For each of the segregated data sets that
the heuristic creates, a P3 value is calculated. The SMT/MST ratio
for the entire data set is then taken as the average of the individual
P3's.
7. Splitting out a junction. This is a critical part of the heuristic.
When a junction such as Junction 1 in the example above is to be split
out, it is important to do it in a manner that allows one to maintain
continuity between the different vertex sets, and also does not introduce
any additional distance into the overall tree. In the heuristic, this is
achieved by examining the junction tetrahedron to be split out and
identifying the vertex that is closest to the ribbon from which it is
being separated, as sketched below. This vertex is then concatenated with the separated
ribbon, so that a good connection vertex is available to link the Steiner
Minimal Trees obtained from the different data sets.
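A minimal sketch of this splitting rule, assuming the junction tetrahedron and the ribbon are given as NumPy coordinate arrays (the helper name is ours):

```python
import numpy as np

def split_out_junction(junction_pts, ribbon_pts):
    """Keep the junction vertex nearest to the ribbon it is being separated
    from, and concatenate it with the ribbon as the connector vertex."""
    dists = [min(np.linalg.norm(j - r) for r in ribbon_pts)
             for j in junction_pts]
    connector = junction_pts[int(np.argmin(dists))]
    return np.vstack([ribbon_pts, connector])
```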

The overall SMT is obtained by the superposition of the individual trees,
and Figure 29 on the left shows the overall SMT for the sample data set.
The average P3 value for the entire data set is 0.804496 and was obtained in
31 seconds.

Figure 29: Heuristic and Optimal Algorithm Solutions

By contrast, the tree that would have been obtained had the entire data
set been passed directly to the SMT program is shown on the right in Figure
29. The P3 value thus obtained is 0.809361 and was obtained in 10 minutes
(600 seconds) of computing time. This solution is obviously not guaranteed
to be optimal.

4.6 Complexity Analysis


The worst case time bound for the three-dimensional DT(V) is O(N²)
[4]. An implementation of the Delaunay Triangulation based on the work
of Beichel and Sullivan [7] was employed. Step 2.0 requires O(N²) time
in the worst case, although some implementations of the MST in E3 will run
faster. Step 3.0 is also O(N²), because there could be O(N²) tetrahedra.

Step 4.0 and its substeps can be carried out in constant time if the search
process is bounded above by a fixed constant amount of time. Otherwise, it
will be exponential; in fact, it will require O(2^N N!) time, since the numerical
solution procedure is a branch and bound algorithm [62].
For the most general case, where V are uniformly distributed within the
unit cube and where the local optimal solution procedure is bounded by a
constant search time CR, one can show:

Lemma 4.1 The greedy heuristic will require O(C1N²) time.

The result follows from the previous complexity analysis. The data structure
is the key to the underlying complexity of the algorithm.
Since optimality is not being sought, and one knows the approximate
length of the CST chains, the local optimal search can be controlled
and one can achieve reasonably good solutions in polynomial time. The key
factor here will be the expected ribbon length.
For the special case where equilateral tetrahedra are studied, one knows
P3 as a function of the length of Ri, and then one can also achieve:

Lemma 4.2 For the equilateral tetrahedra case, the greedy heuristic will
require O(C2N²) time.
It is this latter case that is tested in the heuristic procedure of this Chapter,
where the V have special structure. While one could be criticized for testing
the heuristic on this special equilateral case, this is a case where many
degeneracies occur, so it is actually quite general.
Unfortunately, one cannot provide a performance guarantee like that
provided in E2, because the optimal P3 in E3 is not known.

4.7 Experiments
As one measure of the worst case running time performance of the heuristic
process, the capability of the optimal seeking subroutine algorithm to
solve long chains of tetrahedra needs to be assessed. To estimate the feasibility
of this, the following experiments were performed on R-sausage
chains of lengths N = 25, 50, 100. Those values which are blank in
Table 2 represent the fact that the algorithm stopped without any solution
obtained. Thus, as one sees, relatively short chains must be examined
in order to keep the computing times reasonable, and this underscores the
need for the heuristic.

Time     N=25      N=50      N=100
1 min    0.793215  -         -
2 min    0.793215  -         -
3 min    0.793215  0.787845  -
10 min   0.793215  0.787845  -
15 min   0.793215  0.787845  0.786049
20 min   0.793215  0.787845  0.786049

Table 2: Optimal Seeking Experiments

One also sees that, for N = 100, one is coming quite close to the conjectured
optimal P3(V) for E3, although the above results are not guaranteed
to be optimal. In order to test the heuristic, the following
experimental design was established; see Figure 30:

• Two-Way R-Sausage Networks
• Three-Way R-Sausage Networks
• Four-Way R-Sausage Networks
• Many-Way R-Sausage Networks

Figure 30 represents 2-d projections of the MST and the tetrahedral ribbons
through which it passes. No δ0 ribbon junction examples were included,
since that would bring one into the domain of examining how the optimal
sausage/ribbon topology should be combined to yield the best possible vertex
sets à la Issue #2. This will be examined in a future monograph.
Also, the lengths of the R-sausage chains were kept relatively small so
that the local optimal search procedure is reasonably bounded in running
time. Larger vertex sets are feasible, with a consequent yet very reasonable
increase in running time. Figure 31 on the left illustrates the performance
of the heuristic in relation to the optimal seeking algorithm for
short symmetric vertex sets of 2-way, 3-way, 4-way, and multi-way junctions.
Figure 31 also illustrates on the right a series of experiments for
short asymmetric vertex sets. Figure 32 illustrates the longer experiments
for symmetric and asymmetric vertex sets. Running times for the heuristic
on all these vertex sets are arrayed in Table 3. Certainly for some of the
smaller vertex sets, the optimal seeking program can do better, yet the percentage
improvement over the heuristic is not significantly different. Also,
for some of the longer, more complex configurations, the run times and the
P3 values for the optimal seeking algorithm are much worse.

[Figure: 2-d projections of the four test configurations (Two-Way, Three-Way, Four-Way, and Multi-Way), each showing ribbons R1, ..., RM meeting at a junction.]

Figure 30: Experimental Design

5 Applications

Given what has been shown so far about Steiner trees in E3, it is quite
natural to wonder whether the long extended structure of the conjectured
optimal configuration might imply or have relevance to certain physical,
biochemical, or engineering applications. The most obvious application that
occurred was that the Steiner problem in E3 might help explain the reason
for the long molecular chains of atoms in proteins and DNA and other
molecular structures found in biochemistry [59]. In order to examine this
potential application area and others related to it, possible linkages between
the objective function of the Steiner problem and objective functions of these
applications in the physical, biochemical or engineering sciences need to be
examined.

Topology    Short Symmetric  Short Asymmetric  Long Symmetric  Long Asymmetric
2-Way       65/120           57/120            120/240         129/240
3-Way       120/240          129/240           120/240         165/360
4-Way       120/240          131/240           121/240         273/360
Multi-Way   128/600          174/600           300/600         678/720

Table 3: Computation Times (secs.), Heuristic vs. Optimal Algorithm

5.1 Minimum Energy Configurations (MEC's)


To clarify and delineate the connection between the scientific and engineering
applications and the Steiner problem, one needs another property of the
Steiner problem which was first shown in the classic paper on Steiner trees
by Gilbert and Pollak [35]. It is recounted as Maxwell's Theorem, after the
famous physicist.
Let F1, F2, F3, F4 be unit forces acting at fixed vertices v1, v2, v3, v4 respectively.
If one tried to design a network with moveable Steiner
vertices to link up the fixed ends with elastic bands, where each band carries
a tensile force, then one seeks the network in which these tensile
forces are in equilibrium. Figure 33 nicely illustrates the MEC.

Theorem 5.1 If one draws unit vectors from a Steiner tree in the direction
of each of the lines incident to v1, v2, ..., vn and lets Fi denote the sum of the
unit vectors at vi, then, in mechanical terms, Fi is the external force needed
at vi to hold the tree in equilibrium. The length of the tree T has the simple
formula

T = Σᵢ₌₁ⁿ vᵢ · Fᵢ

The proof is in their paper [35].


What Maxwell's Theorem implies is that the minimal length Steiner tree
is equivalent to the equilibrium configuration of vertices which minimizes the
potential energy between them. Maxwell's application was to determine the
minimum weight truss made from pin-jointed rigid rods and holding a given
set of forces {F1, ..., Fn}. Maxwell's Theorem is more general in that it
applies to circuits as well as trees, and the forces need not all be uniform,
although for the Steiner problem, uniform forces are required [36].
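A quick numerical check of Maxwell's formula (the snippet and its example tree are illustrative, not from the original text): for three terminals at 120° about a single Steiner point, the formula reproduces the total edge length.

```python
import numpy as np

def maxwell_length(terminals, edges, pts):
    """T = sum_i v_i . F_i, where F_i is the sum of unit vectors at terminal
    v_i along its incident edges, oriented outward toward v_i."""
    total = 0.0
    for i in terminals:
        F = np.zeros(3)
        for a, b in edges:
            if i in (a, b):
                u = pts[i] - pts[b if a == i else a]
                F += u / np.linalg.norm(u)
        total += np.dot(pts[i], F)
    return total

pts = {0: np.array([1.0, 0.0, 0.0]),
       1: np.array([-0.5, np.sqrt(3) / 2, 0.0]),
       2: np.array([-0.5, -np.sqrt(3) / 2, 0.0]),
       3: np.zeros(3)}                      # Steiner point at the origin
edges = [(0, 3), (1, 3), (2, 3)]
print(maxwell_length([0, 1, 2], edges, pts))   # 3.0 = sum of edge lengths
```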

[Figure: 2-d projections of the short-chain test sets, with the measured ratios as printed:
S-2 way: heuristic p = 0.800900, optimal p = 0.803092 (symmetric); heuristic p = 0.804155, optimal p = 0.797509 (asymmetric).
S-3 way: heuristic p = 0.811104, optimal p = 0.815175 (symmetric); heuristic p = 0.821838, optimal p = 0.814619 (asymmetric).
S-4 way: heuristic p = 0.813008, optimal p = 0.822783 (symmetric); heuristic p = 0.8211n, optimal p = 0.839517 (asymmetric).
S-M way: heuristic p = 0.816002, optimal p = 0.825n1 (symmetric); heuristic p = 0.804226, optimal p = 0.655412 (asymmetric).]

Figure 31: Short Chain Experiments

5.2 Lower Bounds

The importance of the Steiner tree for MECs is that if there are unit forces
acting at the nodes, then the Steiner tree which is the minimal length net-
work yields the optimal MEC. On the other hand, if the forces are not all
uniform, e.g. some forces are in compression as well as tension or vary in or-
der of magnitude, then the Steiner Minimal Tree yields only a lower bound.
What will be shown in the experiments on proteins which follow is that this
lower bound provided by the SMTs is surprisingly tight.

[Figure: 2-d projections of the long-chain test sets, with the measured ratios:
L-2 way (20 pts./20 pts. symmetric; 17 pts./10 pts. asymmetric): heuristic p = 0.802911, optimal p = 0.798970 (symmetric); heuristic p = 0.807855, optimal p = 0.800090 (asymmetric).
L-3 way: heuristic p = 0.831902, optimal p = 0.809665 (symmetric); heuristic p = 0.821328, optimal p = 0.806060 (asymmetric).
L-4 way (26 pts./19 pts. asymmetric): heuristic p = 0.822211, optimal p = 0.814605 (symmetric); heuristic p = 0.818221, optimal p = 0.808800 (asymmetric).
L-M way: heuristic p = 0.822948, optimal p = 0.853852 (symmetric); heuristic p = 0.815410, optimal p = 0.844916 (asymmetric).]

Figure 32: Long Chain Experiments

5.3 Protein Modeling


To preface the approach that follows, a brief review of the basics of protein
modelling will be presented. See one of the basic texts in microbiology for
additional details [9, 66].

Proteins: These are long connected chains of molecular structures comprised
of elemental units called amino acids [38].

Geometry: Many of the protein structures are well known for their geometric
structures or topology. See the books [49, 21] for some examples.


X-Ray Crystallography: When biochemists seek to characterize the structure
of a protein, they utilize two-dimensional images from x-ray crystallography
and neutron diffraction [46], and from these two-dimensional
representations transform the coordinates of the atoms into a
three-dimensional representation.

[Figure: the MEC problem and its minimal network.]

Figure 33: Maxwell's Theorem Illustration

The backbone or network structure of a protein is a linked sequence of
rigid peptide groups; see Figure 34.² That the rigid planes of atoms occurring
in proteins resemble the planes of Steiner points and Melzak Circles found
in the experiments for the small vertex sets of §3 is remarkable.
Figure 35 illustrates the 3-dimensional orientations possible with two
amide planes and the degree of freedom they have with variations in the Φ
and Ψ torsion angles, while Figure 36 illustrates the typical conjoining of
the amino acids in a protein with the amide planes and side chains.

²After I. Geis.

[Figure: the trans-peptide group.]

Figure 34: Peptide or Amide Plane
The six atoms in the rigid plane of Figure 34 essentially form an FST topology
in the plane with n - 2 Steiner points, with the Carbon and Nitrogen
atoms in the amide plane acting as 2 Steiner vertices connecting
the 4 atoms on the boundary of the amide plane. While the bond angles are
not exactly 120°, the FST topology of this planar group is very important
to the overall topology of the entire chain, and the p(6) for the six atoms in
the amide plane is ≈ 1. It is exactly this Steiner geometry which forms the
foundation of the rest of the long chains of amino acids.

5.4 Protein Experiments


When we first discovered the R-Sausage, we thought it might help explain
why protein structures such as Collagen or DNA assume the long helical
shapes they do. In order to shed some light on this topic, it is important
to summarize some of the definitions and properties within the literature
that we found on the subject. Given the previously mentioned R-Sausage
Steiner properties and equivalencies to MECs, we decided to test whether or
not we could describe the molecular structure of Collagen with the Steiner
algorithms. While we are excited about the properties of this new tool
for verifying and checking these protein structures, the following caveats
must be identified here: the ESMT algorithms do not take into account
differences in the mass of the atoms, nor do we worry about impurities or
obstacles that might exist in such structures.
Biochemists classify the protein structure problem into four basic classes:
primary, secondary, tertiary, and quaternary. The primary structure is concerned
with the basic amino acid sequence, while the secondary structure
is concerned with the basic topological network geometry (e.g. alpha helix,
beta sheet, triple helix, etc.) [9]. Tertiary and quaternary structures are more
complex, and in the experimental results in this Chapter, they will not be a
main concern.
The fundamental hypothesis in the experimental results is to determine
whether the secondary protein structures are minimal length networks, i.e.
Steiner trees. The hypothesized coordinates of the protein structures found
on the Internet will be tested as to whether or not they are Steiner.
The algorithm used to test the hypothesis is a branch and bound algo-
rithm which examines whether a particular FST topology of n - 2 Steiner
points minimizes the overall length of the network [62].

Figure 35: 3d Structure of Two Amide Planes



[Figure: side chains attached to a backbone of linked amide planes.]

Figure 36: Network of Amide Planes in a Protein

5.5 Protein Objective Function Description


It is very interesting to note the form and complexity of the objective function
for minimizing total energy Etot, which is the basis of most protein
conformation studies. One form of the objective function appears below
[12]:

Etot = Ebs + Eab + Eop + Etor + EvdW + Ee + E14vdW + E14e + Ehb    (19)

where:

Ebs: the sum of energies arising from bond stretching or compression beyond
the optimum bond length.

Eab: the sum of energies for angles which are distorted from their optimum
values.

Eop: the sum of energies for the bending of planar atoms out of the plane.

Etor: the sum of the torsional energies which arise from rotations about each
respective dihedral angle.

EvdW: the sum of energies due to nonbonded van der Waals interactions.

Ee: the sum of non-bonded electrostatic interaction energies.

E14vdW, E14e: the sums of energies due to van der Waals and electrostatic
interactions, respectively, for atoms connected by three bonds.

Ehb: the sum of energies due to hydrogen bond interactions.
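Since Eq. (19) is simply a sum of independent contributions, a toy aggregation can be sketched as follows; the term callables are hypothetical stand-ins for a real force field such as ECEPP/3 [47].

```python
ENERGY_TERMS = ('bs', 'ab', 'op', 'tor', 'vdW', 'e', '14vdW', '14e', 'hb')

def total_energy(conformation, term_fns):
    """E_tot of Eq. (19): the sum of the nine contributions listed above;
    term_fns maps each label in ENERGY_TERMS to a callable evaluating
    that term for the given conformation."""
    return sum(term_fns[t](conformation) for t in ENERGY_TERMS)
```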

One might argue that even though this objective is very complex, with
many energy terms, the end result is to dampen the influence of any one
particular force so that, ceteris paribus, a uniformly distributed potential
energy function acts throughout.
It is important to realize that the above objective function is related to
the theoretical values of the amide plane model discussed earlier. Thus, the
numerical results of the protein models are subject to numerical round-off
errors due to the nature of the computational optimization procedures.

5.6 Experimental Results


In the experimental results that follow, the protein Collagen will be exam-
ined first, then followed by results with the proteins Fibroin (Silk) and Actin.
Collagen was chosen because it is a triple helix geometry, Silk because it is
comprised of a geometry referred to as antiparallel beta sheets, and Actin
because it represents an alpha helix structure. So for the sake of the argu-
ment the proteins that will be examined represent a broad range of chemical
structures. Other proteins are related to the above structures and some are
also combinations of the triple helix, antiparallel beta sheets, and alpha helix
geometries. All three of these proteins are categorized as structural proteins
because they transmit tensile and compressive forces in organic material.
Other proteins can be categorized as catalytic, rather than structural, since
they perform chemical functions. Experiments on these catalytic proteins
have also been carried out, but for the sake of brevity and the argument are
not included here. The results for the catalytic proteins are similar to
the structural protein results [58].
The experimental results are also divided into two sets: optimal results
and heuristic results. The optimal results are possible for small subsets of
atoms while the heuristic results are due to the larger number of examined
atoms. As will be presented, the heuristic results are very similar to the
optimal ones, even though a proof of optimality cannot be provided.

5.6.1 Collagen Results


Strictly speaking, Collagen is itself a class of proteins. In the present context,
however, this term is identified with those class members that exhibit the
triple-helix geometry.
Some of the properties of Collagen worth noting are the following:

• Collagen is a well-known triple helix geometry.


• Collagen is a protein which occurs in vertebrate and invertebrate species
in bone, skin, tendon, cornea, and basement membrane [46], and is a
rigid, strong connective ligament for transmitting the structural forces
in these tissues [21].
That Collagen is the connective network for transmitting structural forces
in human and animal tissues is remarkable when one compares this with the
mathematical properties and objective of the ESMT problem and the recent
discussion of Maxwell's Theorem. Collagen is a natural implementation of
the Steiner network problem. Two of the most useful papers were those of
Nemethy et al. [47] and Chen et al. [12], because they represent some of the
most recent models of Collagen, and the data sets of Cartesian coordinates
in E3 of their collagen models were available from the Protein Data Bank
(PDB) on the Internet.³ In an appendix to this paper, sample input from
the Nemethy data set is included.
In Table 4 are arrayed ten experimental results from N = 6 randomly
generated vertices from the unit cube. As can be seen from these vertex sets,

Vertex Set p Vertex Set p


RP1 0.916348 RP6 0.946819
RP2 0.998736 RP7 0.913691
RP3 0.920773 RP8 0.925921
RP4 0.962201 RP9 0.936643
RP5 0.930199 RP10 0.981655
Mean 0.9432986 Std. Dev. (S.D.) (0.028949)

Table 4: Random Vertex Sets N=6

the p can vary widely. The average reduction over the EMST of these random
vertex sets is 5.75%, when in fact the conjectured optimal configuration

of N = 6 (Table 1), with p = 0.808064936179, shows that up to 19.2%
improvement is possible.

³The protein data sets discussed in this paper are readily available for other researchers
to test, simply by logging onto the PDB and typing "collagen".
In contrast to these experiments and the theoretical optimal P3 value for
N = 6, Table 5 arrays the Nemethy and Chen results for selected sets of
N = 6 atomic data sets. In Table 5 and subsequent ones, the chain from the
Collagen is noted, along with the number of each atom. The differences in
the atom numbers are due to the differences in the location of the Glycine
atoms. In the Chen set of data, 5 atoms from Glycine and 1 atom
of Proline are extracted from the chains, since only 5 Glycine atoms occur.
This is probably why the p values are not identical with those of Nemethy.

Nemethy Data p Chen Data p


A: 7-12 0.997986 A: 4-9 0.999297
A: 42-47 0.998025 A: 23-28 0.998668
A: 77-82 0.998004 A: 42-47 0.998599
A:112-117 0.998023 A: 61-66 0.998558
B: 160-165 0.998014 A: 80-85 0.998804
B: 195-200 0.997990 A: 99-104 0.998693
B:230-235 0.997989 A: 118-123 0.998514
B: 265-270 0.998023 A: 137-142 0.998692
C: 313-318 0.998023 A: 156-161 0.998839
C: 348-355 0.997998 A: 176-180 0.998697
Mean 0.9980075 Mean 0.9987361
(Std. Dev.) (0.0000195) (Std. Dev.) (0.0002207)

Table 5: Optimal Results, Glycine Atoms N=6

As can be seen in Tables 5 and 6, along with the means and standard
deviations, the atoms from the Nemethy and Chen data sets reveal almost
identical p values, with little or almost no variation. This is very interesting,
because any sequence of similar atoms in the chain reveals a signature p
value. Besides this, the topology or fundamental structural relations of the
sets V and S are identical in all the comparisons. In one sense this should be
expected, but because one does not know where the Steiner points are to be
located and what the topology of V ∪ S will be, it is not at all obvious that
the topologies would be identical. Also, the % improvement is only 0.2%,
which is surprisingly low. Finally, the fact that the p value is ≈ 1 indicates
that the atoms are packed very tightly together. What these results reveal is
that one can use the Steiner algorithm as an effective lower bound to predict
the p values of the atoms in a single chain within the protein.
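A minimal sketch of this lower-bound check, assuming SciPy is available; the SMT length would come from an exact solver such as [62] and is here simply an input.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def steiner_ratio(coords, smt_length):
    """p(V) = L(SMT) / L(MST) for a set of atomic coordinates (an n x 3
    array); values near 1 mean the atoms are already nearly Steiner."""
    dist = squareform(pdist(coords))                # pairwise distances
    mst_length = minimum_spanning_tree(dist).sum()  # EMST of the point set
    return smt_length / mst_length
```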

Nemethy Data p Chen Data p


A: 14-19 0.985001 A: 9-14 0.983205
A: 49-54 0.984993 A: 28-33 0.983043
A: 84-89 0.984973 A: 47-52 0.983111
A: 119-124 0.985015 A: 66-71 0.983104
B: 167-172 0.985002 A: 85-90 0.983081
B: 202-207 0.985011 A: 104-109 0.983092
B: 237-242 0.984969 A: 123-128 0.983121
B: 272-277 0.985005 A: 142-147 0.983121
C: 320-325 0.985016 A: 161-166 0.982990
C: 355-360 0.984994 A: 180-185 0.983156
Mean 0.9849979 Mean 0.9831024
(Std. Dev.) (0.0000147) (Std. Dev.) (0.000588)

Table 6: Optimal Results, Proline Atoms N=6

As another check on the predictive ability of the Steiner algorithm, Proline
atoms from the two data sets were selected from the proteins. Again,
as can be seen in Table 6, there is almost no variability in the p values.
To determine if one could still predict both the topology and the p with
larger sets of atoms, additional experiments were carried out with N = 9, 12
atoms respectively for the Nemethy and Chen data sets. First come the results
for the Glycine atoms of the Nemethy data set: Table 7 illustrates the three
data sets of N = 9, 12 respectively, with the atoms selected from the chain,
the p values, and the algorithm run times.

Data Set N=9 Data Set N=12

kasc1.dat 0.996717 kats1.dat 0.995595
#7-15 (56 sec.) #7-18 (54 min.)
kasc2.dat 0.996725 kats2.dat 0.995598
#42-50 (57 sec.) #42-53 (53 min.)
kasc3.dat 0.996715 kats3.dat 0.995586
#77-85 (56 sec.) #77-88 (53 min.)

Table 7: Nemethy Glycine Outputs, N=9,12

Notice that the computer run times increase exponentially with the size of
N. Figure 37 illustrates two of the identical topological outputs of the optimal
Steiner trees for two different Glycine data sets from the Nemethy chains.
Figure 37: Nemethy Glycine atoms N=12

In Table 8 are the optimal outputs for the Proline atoms of the Chen
data set. Again the consistent p values are indicated for the different sets of
atoms.

Data Set N=9 Data Set N=12

kach1.dat 0.980170 katc1.dat 0.983445
#9-17 (51 sec.) #9-20 (47 min.)
kach2.dat 0.980180 katc2.dat 0.983436
#28-36 (51 sec.) #28-39 (53 min.)
kach3.dat 0.980116 katc3.dat 0.983403
#47-56 (51 sec.) #47-58 (51 min.)

Table 8: Chen Proline Outputs, N=9,12

Figure 38 illustrates two views of the identical topological outputs of the


optimal Steiner trees for the Proline data sets from the Chen chain.

5.6.2 Silk Results


The second structural protein examined is one of the most thoroughly studied
proteins, Fibroin, the material of silk [9, 66]. The silk protein is a
structural protein used by insects and spiders to create cocoons, webs, nests,
etc. The peptide chains are arranged in antiparallel β pleated structures in
which the chains extend parallel to the fibre axis [66].
While the topology of the Steiner tree for the silk protein is different from
the Collagen triple helix, it is essentially Steiner, as the numerical value
for p indicates.

Figure 38: Chen Proline atoms N=12


That the polypeptide chain is most suitable for tensile structures is a
natural implementation of the properties of Steiner minimal trees. Figure
39 clearly reveals the amide planes, thus supporting the argument that the
fundamental building blocks of the amide planes provide the foundation for
the Steiner topology.
Alanine Data p Glycine Data p
A: 7-12 0.996483 A: 17-22 0.997177
A: 24-29 0.996525 A: 34-39 0.997115
A: 41-46 0.996552 A: 51-56 0.997102
B:71-76 0.996487 B: 81-86 0.997147
B: 88-93 0.996523 B: 98-103 0.997152
B: 105-110 0.996532 B: 115-120 0.997162
C:135-140 0.996455 C: 145-150 0.997156
C: 152-157 0.996535 C: 162-167 0.997121
C: 169-174 0.996529 C: 179-184 0.997153
Mean 0.9965134 Mean 0.9971428
(Std. Dev.) (.0000324) (Std. Dev.) (.0000238)

Table 9: Silk Optimal Results, Atoms N=6

In reading the tables, the identity of the atoms and the chain they belong
to are included in columns 1 and 3. Alongside each entry is the p(V) result.
The silk data set is from the Protein Data Bank entry provided by [30].
In examining the optimal results for subsets of silk atoms (Tables 9, 10), it
is surprising that the p(V) value is so consistent, with little variation. It is
not only this constancy which occurs in the p(V) value, but also the Steiner
path topology which remains invariant for each of the atomic subsets.

Figure 39: Fibroin Silk Protein p(V) ≈ 1.009263
While one is able to achieve optimal results within reasonable run times
for small subsets of atoms in the previous tables, when N = 16 only suboptimal
results are possible within reasonable run times. Nevertheless, the
consistency of the p(V) values and their low standard deviations probably
indicate that these results also are optimal.
Finally, another structural protein, namely Actin, will be examined.

5.6.3 Actin Results


This is a small protein that normally occurs in muscle but also in many other
tissues. Actin and another protein, Myosin, work together to create the forces
responsible for muscle contraction. Myosin and Actin are the major components
of muscle and account for 60-70% and 20-25% of total muscle protein,
respectively (pg. 1122) [66].

Alanine Data p Gly+Ala Data p


A: 7-16 0.993938 A: 17-26 0.995814
A: 24-33 0.993945 A: 34-43 0.995797
A: 41-50 0.993991 B: 81-90 0.995786
B: 71-80 0.993927 B: 98-107 0.995843
B: 88-97 0.993966 C: 145-154 0.995815
B: 105-114 0.993968 C: 162-171 0.995828
C: 135-144 0.993900 D: 209-218 0.995821
C: 152-161 0.993998 D: 226-235 0.995830
C: 169-178 0.994008 E: 273-282 0.995810
Mean 0.9939601 Mean 0.9958157
(Std. Dev.) (.0000348 ) (Std. Dev.) ( .0000200)

Table 10: Silk Optimal Results N=10

Alanine Data p Gly+Ala Data p


A: 7-22 0.996098 A: 17-32 0.995596
A: 24-39 0.996056 A: 34-49 0.995603
A: 41-56 0.996066 B: 81-96 0.995578
B: 71-86 0.996083 B: 98-113 0.995625
B: 88-103 0.996060 C: 145-160 0.995594
B: 105-120 0.996081 C: 162-177 0.995618
C: 135-150 0.996071 D: 209-224 0.995628
C: 152-167 0.996058 D: 226-241 0.995643
C: 169-184 0.996087 E: 273-288 0.995606
Mean 0.9960733 Mean 0.9956101
(Std. Dev.) (.0000180 ) (Std. Dev.) ( .0000190)

Table 11: Silk Suboptimal Results N=16

Table 12 illustrates the optimal results with the Steiner algorithm for
subsets of Actin atoms. Again, the striking thing is the tightness of the
Steiner p value.
In examining these results, there are a number of important points to
summarize:

• First of all, in the single-chain optimization results, there is a remarkable
regularity in both p for the subsets of atoms throughout the chain
as well as a consistency in the Steiner topology.

• Second of all, it is surprising that in almost all problem instances, P3(n) → 1.
Certainly the result is affected by the number of atoms and their
locations in the chains, yet by and large P3(n) ≈ 1.

• All topologies represent, with certain exceptions, degenerate solutions to
the Steiner problem. Thus, certain of the atoms, namely the Carbon
and Nitrogen atoms, are acting as Steiner points.

• Because of this degeneracy, the bond angles in the Collagen protein are
not exactly 120°; some are larger (this is already known), which would
explain why the degeneracy occurs and why some of the given atoms
act as Steiner points.

• Figure 40 represents the Steiner tree output of a sample of N = 99 atoms
of Actin from rabbit [40].

• Examining the experimental data, p(V) ≈ 1.002394 reveals that the Actin
protein has p(V) ≈ 1, which indicates two things. First, the algorithm
has not converged to optimality, largely because of the cardinality of
the vertex set; and, second, it appears that no additional Steiner points
are necessary to decrease the overall length of the network.

Figure 40: Actin p(V) ≈ 1.002394

There are two major open questions:

• Why are no additional Steiner points necessary?

• Why does P3(V) ≈ 1?

The first issue seems to relate to the fact that, in order to conserve space
in the molecule, the atoms are squeezed together to minimize the volume
between them while at the same time minimizing the potential energy
function.

Atoms Acid p Atoms Acid p


A: 4-11 ASP 0.998511 A: 12-21 GLU 0.997636
A: 21-28 ASP 0.998373 A: 29-38 GLU 0.997770
A: 78-85 ASP 0.997868 A: 408-417 GLU 0.996896
A:161-168 ASP 0.998173 A: 528-537 GLU 0.995880
A:363-370 ASP 0.997320 A: 620-629 GLU 0.997921
A:400-407 ASP 0.996971 A:718-727 GLU 0.997441
A:596-603 ASP 0.998015 A:765-774 GLU 0.997259
A: 1192-1199 ASP 0.997700 A: 830-839 GLU 0.997508
A:1383-1390 ASP 0.998323 A: 907-916 GLU 0.998521
Mean 0.9979171 Mean 0.9974258
Std.Dev. (0.0005124) Std. Dev. (0.000734)

Table 12: Actin Optimal Results, Atoms N=8,9,10

However, one must also realize that the space is not completely
filled between the atoms, because there are attractive and repelling forces at
work in the minimum energy configuration [53].
The second issue seems to occur because the backbone chain of atoms
is made up of atoms in the amide plane, which are essentially FSTs with
p ≈ 1. Of course, the atoms not in the amide plane interact with those
in the plane and probably cause the natural variation in p which has been
measured experimentally.
Additional experimentation with other proteins, both structural and catalytic,
is underway in order to see how extensively and pervasively the Steiner
properties found with Collagen, Silk, and Actin occur in other proteins.
Thus, it appears from all the experiments that the secondary structure
of a protein is essentially a Steiner network topology. The Steiner tree
acts as a very tight lower bound on the MEC configuration, which would
seem to encourage the use of Steiner tree algorithms for protein conformation.
How the Steiner algorithms might be used in the tertiary and quaternary
classifications remains an open question.
Finally, it appears that almost all the atoms are optimally located in order
to minimize the length of the interconnecting network, and, since minimizing
the length of the interconnecting network is tantamount to minimizing the
potential energy between all the atoms, the resulting structure is a stable
protein.

5.7 Other Applications


While the application of E3 ESMT trees to other fields of science and engineering
is speculative at this juncture, some mention of what is already
known is worth exploring.
The structure of glass silicates exhibits a similar geometry to the triple
helix pattern found with the conjectured optimal configuration for the Steiner
problem [45].
As might be expected from what is known of Collagen's structural prop-
erties, this may be applied to the design of columnar structures and space
frame trusses. Buckminster Fuller [31, 50] laid down groundwork in this
area, although his viewpoint was not from a Steiner perspective, but simply
from the geometry of putting together tetrahedra. Fuller did not know the
relationship of the triple helix to the Steiner problem and was simply fascinated
by the helical shape when putting the tetrahedra together. He felt that
such a regular repeating structure might have some stability characteristics
other skeletal structures might not have, but he was not able to realize it in
any manner. It would appear from the previous discussion on MECs and
Maxwell's Theorem that the Steiner coordinates and their network topology
are crucial to the stability of the R-Sausage in any structural engineering
application.
Of related interest to the above structural system properties is
the design of building systems, i.e. the heating, ventilating, air-conditioning,
plumbing and electrical systems that provide the utilities in normal building
construction. Since minimizing overall length [54] tends to be the predomi-
nant objective of these building systems, the understanding of the optimal
topology of ESMTs should be of major import for this application.
Other areas of science and engineering such as the design of Polymers,
Antenna Design (often helical antennas are employed), VLSI and Massively
Parallel Computer network design would seem to be ideal candidates for ex-
amining how the ESMT and MEC problems relate, yet no more speculation
at this point will be attempted.

6 Summary and Conclusions


Many of the properties, algorithmic methodologies, and applications of Steiner
trees in E3 have been collected together in this Chapter. While the results so
far are very new and just developing, they should give the reader an idea of
the potential for Steiner trees for future theoretical and algorithmic research

as well as the potential for their applications in science and engineering.


That all theory, methodological approaches (algebraic, geometric, and
optimization) and application areas related to MEC's and Steiner trees are
closely intertwined is both fascinating and compelling. The tools of minimal
length network algorithms should help in the future verification and illumi-
nation of computational biological structures and perhaps other problems
and applications in science and engineering.

References
[1] Aly, A.A., D.C. Kay and D.W. Litwhiler, 1979. "Location Dominance
on Spherical Surfaces," Operations Research 27, 972-981.

[2] F. Aurenhammer, "Voronoi Diagrams - A Survey of a Fundamental Geometric
Data Structure," ACM Comput. Surveys 23, 345-405, 1991.
Technical Report B-90-09, Department of Mathematics, Free University
of Berlin (1990).

[3] Bern, M. and R. Graham, 1989. "The Shortest-Network Problem," Sci-


entific American 260 (1), 84-89.
[4] Bern, M. and D. Eppstein, 1992. "Mesh Generation and Optimal Triangulation,"
in Computing in Euclidean Geometry, eds. D.Z. Du and
F.K. Hwang. World Scientific Press: Singapore, pp. 23-90.

[5] Beasley, J.E., 1992 "A Greedy Heuristic for the Euclidean and Rectilin-
ear Steiner Problem," EJOR 58, 284-292.

[6] Beasley, J.E. and F. Goffinet, 1994. "A Delaunay Triangulation based
heuristic for the Euclidean Steiner Problem." Networks 24, 215-224.

[7] Beichel, I. and F. Sullivan, 1992. "Fast Triangulation via Empty


Spheres." Working Paper, Computing and Applied Mathematics Lab-
oratory National Institute of Standards and Technology, Gaithersburg,
MD 20899.

[8] B.N. Boots, "Voronoi (Thiessen) Polygons" Norwich, England: Geo


Books, W.H. Hutchins & Sons (1986).

[9] Brandon, C. and J. Tooze, 1991. Introduction to Protein Structure.


New York: Gabriel Publishing.

[10] Brazil, M., Rubinstein, J.H., Wormald, N., Thomas, D., Weng, J., Cole,
T., 1996. "Minimal Steiner Trees for 2k×2k Square Lattices." J. Comb.
Theory Series A 73, 91-110.

[11] Chang, S.K., 1972 "The Generation of Minimal Trees with a Steiner
Topology," JACM 19 (4) 699-711.

[12] Chen, J.M., C.E. Kung, S.E. Feairheller, and E.M. Brown, 1991. "An
Energetic Evaluation ... Collagen Microfibril Model." J. Protein Chemistry
10, 535.

[13] Cheriton, D. and R. E. Tarjan, 1976. "Finding Minimal Spanning


Trees," Siam J. of Computing 5 (4), 724-742.

[14] Chung, F.R.K. and F.K. Hwang, 1976. "A Lower Bound for the Steiner
Tree Problem," Siam J. Appl. Math 34(1), 27-36.

[15] Chung, F.R.K. and R.L. Graham, 1978. "Steiner Trees for Ladders,"
Annals Of Discrete Mathematics 2, 173-200.

[16] Cockayne, E.J. and Z.A. Melzak, 1968. "Steiner's Problem for Set Ter-
minals," J.Appl.Math., 26 (2), 213-218.

[17] Cockayne, E.J. and Z.A. Melzak, 1969. "Euclidean Constructability in


Graph Minimization Problems." Math.Mag.42, 206-208.

[18] Cockayne, E.J., 1972 "On Fermat's Problem on the Surface of a


Sphere." Mathematics Magazine Sept-Oct, 216-219.

[19] Courant, D.R. and H. Robbins, 1941. What Is Mathematics? New


York: Oxford University Press.

[20] Coxeter, H.S.M., 1961. Introduction to Geometry. Wiley.

[21] Dickerson, R.E. and I. Geis, 1969. The Structure and Action of Proteins.
New York: Harper and Row, Publishers.

[22] Dolan, J., R. Weiss, and J. M. Smith, 1991. "Minimal Length Tree
Networks on the Unit Sphere." Annals of Operations Research 33, 503-535.

[23] Donnay, J.D.H. 1945. Spherical Trigonometry, in Encyclopedia of
Mathematics, Interscience: New York.

[24] Drezner, Z. and G.O. Wesolowsky, 1979. "Facility Location on a


Sphere," Journal of the Operational Research Society 29, 997-1004.

[25] Du, D.Z., F.K. Hwang, and J.F. Weng, 1982. "Steiner Minimal Trees on
Zig-Zag Lines." Trans. Amer. Math. Soc. 278 (1), 149-156.

[26] Du, D.Z. and F.K. Hwang. "A Proof of the Gilbert-Pollak Conjecture
on the Steiner Ratio," Algorithmica 7, 121-135, 1992.

[27] Du, D.Z. "Disproving Gilbert-Pollak Conjecture in higher dimensional
spaces." Manuscript, Computer Science Department, University of Minnesota,
September 1992.

[28] Du, D.Z. and W. Smith, 1992. "Three Disproofs of the Gilbert-Pollak
Conjecture on Steiner ratio in three or more dimensions." In review.

[29] Fortune, S.J. "Voronoi Diagrams and Delaunay Triangulations." In
Computing in Euclidean Geometry, eds. D.Z. Du and F.K. Hwang.
World Scientific Press: Singapore, pp. 193-233.

[30] S.A. Fossey, G. Nemethy, K.D. Gibson, H.A. Scheraga, 1991. "Conformational
Energy Studies Of Beta-Sheets Of Model Silk Fibroin Peptides. I.
Sheets Of Poly(Ala-Gly) Chains." Biopolymers 31, 1529.

[31] Fuller, R.B., 1963. No More Secondhand God. Carbondale: Southern
Illinois University Press.

[32] Garey, M.R., R.L. Graham and D.S. Johnson, 1977. "The Complexity
of Computing Steiner Minimal Trees," Siam J. Appl. Math 32 (4), 835-
859.

[33] Garey, M.R. and D.S. Johnson, 1979. Computers And Intractability; A
Guide To The Theory Of Np-completeness. (San Francisco:W.H.Freeman
and Company.)

[34] Graham, R.L. and F.K. Hwang, 1976. "Remarks on Steiner Minimal
Trees." Bull. Inst. Math. Acad. Sinica 4, 177-182.

[35] Gilbert, E.N. and H.O. Pollak, 1968, "Steiner Minimal Trees." Siam J.
Appl. Math 16, 1-29.

[36] Private communication.



[37] Greening, M.G., 1971. "Solution to Problem E2233," Amer. Math.


Monthly, 4, 303-304.

[38] Hendrickson, B.A., 1990. "The Molecule Problem: Determining Confor-


mation from Pairwise Distances," Ph.D. Thesis #90-1159. Department
of Computer Science, Cornell University, Ithaca, NY 14853-7501.

[39] Hwang, F.K. and D. Richards, 1989. "Steiner Tree Problems," Networks.

[40] Kabsch, W. et al. "Atomic Structure of the Actin:DNase I Complex,"
Nature 347, 37.

[41] Kuhn, H.W., 1973. "A Note on Fermat's Problem." Math. Programming
4, 98-107.

[42] Litwhiler, D.W., 1980, "Steiner's Problem and Fagnano's Result on the
Sphere." Mathematical Programming 18, 286-290.

[43] Love, R.F. J.G. Morris and G.O. Wesolowsky, 1988. Facilities Location,
North-Holland.

[44] Melzak, Z.A., 1961. "On the Problem of Steiner," Can Math Bulletin
4, 143-148.

[45] McColm, I.J., 1983. Ceramic Science for Materials Technologists. New
York: Chapman and Hall.

[46] Miller, M.H. and H. A. Scheraga, 1976. "Calculation of the Structures
of Collagen Models ...." J. Polym. Sci., Polym. Symp. 54, 171-200.

[47] Nemethy, George, et al., 1992. "Energy Parameters in Polypeptides ...."
J. Phys. Chem. 96, 6472-6484.

[48] Okabe, A., B. Boots, and K. Sugihara, Spatial Tessellations: Concepts
and Applications of Voronoi Diagrams. Wiley (1992).

[49] Pauling, Linus, 1963. The Architecture of Molecules.

[50] Pearce, Peter. Structure in Nature is a Strategy for Design. MIT Press.

[51] Prim, R.C., 1957. "Shortest Connecting Networks and Some Generalizations."
BSTJ 36, 1389-1401.

[52] Preparata, F. and M.I. Shamos, 1985. Computational Geometry,


Springer-Verlag.
[53] Schorn, Peter. 1994. Private Communication.
[54] Smith, J. MacGregor and J.S. Liebman, 1979. "Steiner Trees, Steiner
Circuits, and the Interference Problem in Building Design." Eng Opt 4
(1), 15-36.
[55] Smith, J. MacGregor, Lee, D.T. and J.S. Liebman, 1981. "An O(NlogN)
Heuristic for Steiner Minimal Tree Problems on the Euclidean Metric."
Networks 11 (1), 23-39.

[56] Smith, J. MacGregor, R. Weiss, and M. Patel, "An O(N²) Heuristic
for Steiner Minimal Trees in E3," Networks 25, 273-289.

[57] Smith, J. MacGregor and P. Winter, 1995. "Computational Geometry


and Topological Network Design," Computing in Euclidean Geom-
etry Lecture Note Series in Computing, Volume 4, 2nd edition, eds.
Ding-Zhu Du and Frank Hwang, World Scientific, 351-451.

[58] Smith, J. MacGregor, R. Weiss, B. Toppur, and N. Maculan, 1996.


"Characterization of Protein Structure and 3-d Steiner Networks," Pro-
ceedings of the II ALIO/EURO Workshop on Practical Combinatorial
Optimization, Catholic University of Valpariso, Faculty of Engineering,
School of Industrial Engineering, November 1996, 37-44.

[59] Smith, J. MacGregor and B. Toppur, 1996. "Euclidean Steiner Minimal
Trees, Minimum Energy Configurations, and the Embedding Problem of
Weighted Graphs in E3," Discrete Applied Mathematics 71, 187-215.

[60] Tóth, Fejes, 1975. "Research Problem 13," Period. Math. Hungar. 6,
197-199.

[61] Wills, J.M., 1985. "On the density of Finite Packing," Acta. Math.
Hung. 46, 205-210.

[62] Smith, Warren D., "How to find Steiner minimal trees in Euclidean
d-space," Algorithmica 7, 137-177, 1992.

[63] Smith, Warren D. "Two disproofs of the Gilbert-Pollak Steiner ratio
conjecture in d-space for d ≥ 3." Accepted for publication, Journal of
Combinatorial Theory, Series A.

[64] Smith, Warren D. and J. MacGregor Smith. "On the Steiner Ratio in
3-Space." Journal of Combinatorial Theory, Series A 69(2), 301-332.

[65] Vaidya, Pravin M., 1988. "Minimum Spanning Trees in k-Dimensional
Space," SIAM J. Comput. 17 (3), 572-582.

[66] Voet, D. and J. Voet, 1990 Biochemistry, Wiley.

[67] Wesolowsky, G.O., "Location Problems on a Sphere." Regional Science


and Urban Economics, 12, 495-508.

[68] Winter, P., 1987. "Steiner Problem in Networks: A Survey." Networks
17, 129-167.

7 Appendix I
PRELIMINARY 29-APR-92 P1BBE
Collagen - triple helix where each strand
consists of 2 (GLY-PRO-PRO)4 (MODEL 1)

THEORETICAL MODEL

G.Nemethy,K.D.Gibson,K.A.Palmer,C.N.Yoon,
G.Paterlini,A.Zagari,S.Rumsey,H.A.Scheraga

"Energy parameters in polypeptides. Improved


geometrical parameters and nonbonded
interactions for use in the ecepp/3 algorithm,
with application to proline-containing peptides."

J.PHYS.CHEM. V.96 6472 1992


ASTM JPCHAX US ISSN 0022-3654

These coordinates were generated by molecular
modeling. Protein Data Bank conventions require
that *cryst1* and *scale* records be included,
but the values on these records are meaningless.

The coordinates presented in this entry are


those described as model rs in the paper cited
as reference 1 above.
1 A 14 ACE GLY PRO PRO GLY PRO PRO GLY PRO
PRO GLY PRO PRO
2 A 14 NME
1 B 14 ACE GLY PRO PRO GLY PRO PRO GLY PRO
PRO GLY PRO PRO
2 B 14 NME
1 C 14 ACE GLY PRO PRO GLY PRO PRO GLY PRO
PRO GLY PRO PRO
2 C 14 NME
%
468 J. MacGregor Smith

% CHAIN A
%
ATOMS:

1. C ACE A 1 2.061 -1.849 -9.496


2. 0 ACE A 1 3.286 -1.739 -9.526
3. CH3 ACE A 1 1.377 -2.908 -10.290
4. lHH3 ACE A 1 2.119 -3.490 -10.836
5. 2HH3 ACE A 1 0.822 -3.564 -9.619
6. 3HH3 ACE A 1 0.688 -2.444 -10.9
7. N GLY A 2 1.255 -1.073 -8.786
8. CA GLY A 2 1.784 0.002 -7.964
9. C GLY A 2 2.529 -0.553 -6.748
10. 0 GLY A 2 2.308 -1.696 -6.349
11. H GLY A 2 0.260 -1.169 -8.767
12. lHA GLY A 2 2.459 0.620 -8.556
13. 2HA GLY A 2 0.970 0.647 -7.633
14. N PRO A 3 3.420 0.241 -6.139
15. CA PRO A 3 4.169 -0.224 -4.969
16. C PRO A 3 3.281 -0.399 -3.747
17. 0 PRO A 3 2.168 0.122 -3.705
18. CB PRO A 3 5.219 0.868 -4.754
19. CG PRO A 3 4.574 2.100 -5.322
20. CD PRO A 3 3.782 1.623 -6.505
21. HA PRO A 3 4.589 -1.212 -5.159
22. lHB PRO A 3 5.458 0.992 -3.698
23. 2HB PRO A 3 6.152 0.632 -5.266
24. lHG PRO A 3 3.931 2.582 -4.586
%
% CHAIN B
%
25. C ACE B 1 -2.353 -1.458 -6.568
26. 0 ACE B 1 -2.592 -2.665 -6.598
27. CH3 ACE B 1 -3.177 -0.504 -7.362
28. lHH3 ACE B 1 -3.945 -1.053 -7.908
29. 2HH3 ACE B 1 -3.651 0.212 -6.691
30. 3HH3 ACE B 1 -2.539 0.027 -8.068
31. N GLY B 2 -1.382 -0.903 -5.858
32. CA GLY B 2 -0.499 -1.713 -5.036
Steiner Minimal Trees in E3 469

33. C GLY B 2 -1.241 -2.271 -3.820


34. 0 GLY B 2 -2.276 -1.739 -3.421
35. H GLY B 2 -1.195 0.079 -5.839
36. lHA GLY B 2 -0.096 -2.534 -5.629
37. 2HA GLY B 2 0.348 -1.113 -4.705
38. N PRO B 3 -0.730 -3.350 -3.211
39. CA PRO B 3 -1.386 -3.938 -2.041
40. C PRO B 3 -1.305 -3.037 -0.820
41. 0 PRO B 3 -0.492 -2.115 -0.777
42. CB PRO B 3 -0.634 -5.253 -1.826
43. CG PRO B 3 0.730 -4.980 -2.394
44. CD PRO B 3 0.495 -4.086 -3.577
45. HA PRO B 3 -2.452 -4.064 -2.231
46. lHB PRO B 3 -0.582 -5.517 -0.770
47. 2HB PRO B 3 -1.122 -6.082 -2.338
48. lHG PRO B 3 1.373 -4.498 -1.658
%
% CHAIN C
%

49. C ACE C 1 -0.738 2.668 -3.640


50. 0 ACE C 1 -1.829 3.236 -3.670
51. CH3 ACE C 1 0.409 3.191 -4.434
52. lHH3 ACE C 1 0.097 4.082 -4.980
53. 2HH3 ACE C 1 1.230 3.445 -3.763
54. 3HH3 ACE C 1 0.739 2.429 -5.140
55. N GLY C 2 -0.479 1.580 -2.930
56. CA GLY C 2 -1.504 0.961 -2.108
57. C GLY C 2 -1.831 1.830 -0.892
58. 0 GLY C 2 -1.029 2.673 -0.493
59. H GLY C 2 0.411 1.124 -2.911
60. lHA GLY C 2 -2.405 0.804 -2.701
61. 2HA GLY C 2 -1.166 -0.021 -1.777
62. N PRO C 3 -3.010 1.641 -0.283
63. CA PRO C 3 -3.390 2.437 0.887
64. C PRO C 3 -2.548 2.106 2.108
65. 0 PRO C 3 -1.891 1.066 2.151
66. CB PRO C 3 -4.863 2.084 1.102
67. CG PRO C 3 -4.985 0.699 0.534
470 J. MacGregor Smith

68. CD PRO C 3 -4.060 0.673 -0.649


69. HA PRO C 3 -3.211 3.495 0.697
70. 1HB PRO C 3 -5.131 2.108 2.158
71. 2HB PRO C 3 -5.522 2.786 0.590
72. 1HG PRO C 3 -4.703 -0.054 1.270
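For readers who wish to recompute the ratios from this listing, a small parsing sketch (the column-layout assumption and function name are ours):

```python
def read_atoms(path):
    """Parse lines of the form 'idx. NAME RES CHAIN RESNO x y z' from the
    appendix listing, returning (label, (x, y, z)) pairs; comment lines
    beginning with '%' are skipped."""
    atoms = []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) >= 8 and parts[0].rstrip('.').isdigit():
                label = ' '.join(parts[1:5])
                xyz = tuple(float(v) for v in parts[-3:])
                atoms.append((label, xyz))
    return atoms
```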

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (eds.) pp.471-524
©1998 Kluwer Academic Publishers

Dynamical System Approaches to Combinatorial Optimization

Jens Starke
Institute of Applied Mathematics, University of Heidelberg, Germany
jens.starke@iwr.uni-heidelberg.de

Michael Schanz
Institute of Parallel and Distributed High-Performance Systems, University of Stuttgart, Germany
michael.schanz@informatik.uni-stuttgart.de

Contents

1 Introduction 473

2 Assignment Problems 474

3 Dynamical Systems 477

4 Dynamical Systems Based on Penalty Methods 478
4.1 Classical Approach 478
4.1.1 Constraints of Assignment Problems 480
4.2 Variants of Penalty Terms 481

5 Statistical Approaches 482
5.1 Metropolis Algorithm 482
5.2 Simulated Annealing 483
5.3 Evolutionary Strategies and Genetic Algorithms 484

6 Neural Networks 484
6.1 Hopfield Model 485
6.1.1 The Hopfield Model for the TSP 486
6.1.2 Adaptation to the Assignment Problem 488
6.2 Self-Organizing Maps and Kohonen Networks 489
6.2.1 Self-Organizing Maps for the TSP 489
6.3 Adaptation to the Assignment Problem 490

7 Double Bracket Flows 490

8 Coupled Selection Equations 490
8.1 Selection Equations 491
8.2 How the Coupled Selection Equations Work 492
8.2.1 Problem Information as Initial Values 492
8.3 Coupled Selection Equations of Pattern Formation 493
8.3.1 Two-Dimensional Assignment Problem 493
8.3.2 Corresponding Potential Function 494
8.3.3 Interpretation as Penalty Terms 495
8.3.4 Three-Dimensional Assignment Problem 497
8.4 Coupled, Piecewise Continuous Selection Equations 498
8.4.1 Two-Dimensional Assignment Problem 498
8.4.2 Analysis of the Basins of Attraction 499
8.4.3 Criterion for Global Optimal Solutions 501
8.4.4 Three-Dimensional Assignment Problem 501
8.4.5 Criterion for Global Optimal Solutions 502

9 Numerical Results 503
9.1 Generation of Data-Sets 503
9.2 Generation of Pseudo-Random Initial Values 505
9.3 Numerical Solution of Dynamical Systems 506
9.4 Parameter Settings for the Numerical Simulations 507
9.4.1 Dual Forest Algorithm 508
9.4.2 Branch and Bound 508
9.4.3 Penalty Methods 508
9.4.4 Simulated Annealing 509
9.4.5 Hopfield and Tank Approach 509
9.4.6 Coupled Selection Equations 509
9.4.7 Coupled Selection Equations with Cost Terms 510
9.5 Comparison of Several Methods 510
9.5.1 Two-Dimensional Assignment Problem 511
9.5.2 Three-Dimensional Assignment Problem 513

10 Conclusions and Outlook 516

References

Abstract

This article describes and compares several dynamical system approaches to combinatorial optimization problems. These include penalty methods, the approach of HOPFIELD and TANK, self-organizing maps, i.e., KOHONEN networks, coupled selection equations, and hybrid methods. Many of them are investigated analytically, and the costs of the solutions are compared numerically with those of solutions obtained by simulated annealing and with the costs of a global optimal solution. In order to get reproducible simulation results, a pseudo-random number generator with integer arithmetic is used to produce the data sets.

Using dynamical systems, a solution to the combinatorial optimization problem emerges in the limit of large times as an asymptotically stable point of the dynamics. These are often not global optimal solutions but good approximations of them. Dynamical system and neural network approaches are appropriate methods for distributed and parallel processing. Because of the parallelization, these techniques are able to complete a given task much faster than algorithms running sequentially on a traditional digital computer.

The analysis focuses on the linear two-dimensional (two-index) assignment problem and the NP-hard three-dimensional (three-index) assignment problem. These and other assignment problems can be used as models for many industrial problems like manufacturing planning and optimization of flexible manufacturing systems (FMS).

1 Introduction

The key idea of a dynamical system approach to combinatorial optimization problems is that the reached asymptotically stable point of a dynamical system is identified with a solution to the combinatorial optimization problem. Clearly, this stable point has to be a feasible solution to the combinatorial optimization problem, i.e., all constraints have to be respected in order to get a global optimal solution or at least a good approximation of it. The discrete positions of the asymptotically stable points allow the use of continuous dynamical systems for discrete optimization problems. Here, the question arises of how to define a dynamical system with appropriate properties.

There are many different dynamical system approaches to combinatorial optimization. The most fundamental differences are, first, the initial conditions, second, the set of stable points and the shape of the basins of attraction, and, third, the deterministic properties, i.e., the presence or absence of additional stochastic terms in the dynamical system.

To initialize the dynamical system, one can use random or calculated initial values. The detailed information about the specific optimization problem, like the costs of a single assignment, can be used as initial values and/or for the definition of the dynamics, i.e., the equation of motion. Examples of approaches with random initial values are penalty methods [42] and the method of HOPFIELD and TANK [49] and variations of it. In contrast to this, the approaches of coupled selection equations in [79] use calculated initial values.
The second difference, which deals with the stability properties of the dynamical system, is the most crucial point. In order to make dynamical systems without cost terms in the equation of motion applicable to combinatorial optimization problems, there must be a one-to-one mapping from the set of asymptotically stable points of the dynamical system to the set of feasible solutions to the combinatorial optimization problem. The reason for this is that they have to be universal for all problems of the same size. Methods which contain cost terms in the equation of motion do not require this one-to-one mapping, i.e., not every feasible solution has to be represented as a stable point. This is due to the cost terms, because they define a specific dynamical system for each problem. Nevertheless, both approaches, with or without cost terms in the equation of motion, must not have stable states which are not feasible solutions. These important stability conditions of the dynamical system, which are necessary to always obtain feasible solutions, are fulfilled only in some approaches (see, e.g., [79]). Others, like [49], may produce spurious states.
The third difference is about the use of additional stochastic terms to prevent
the system from getting stuck in a local minimum of the objective function. For
this purpose, stochastic terms and simulated annealing techniques are used in
[85], [86] to improve the solutions of [49].
The resulting solutions are usually not global optimal but nonetheless good
approximations. Compared to other heuristics (see, e.g., [36]), the advantage
of dynamical system approaches which include neural networks is that they are
appropriate for distributed and parallel processing. The reason for this is that
the dynamical system can be distributed among many coupled simple process-
ing units. Because the communication effort is low, the parallelization is very
effective. The use of parallel hardware for dynamical system approaches results
therefore in very fast problem solvers. Another advantage of many dynamical
system and neural network approaches is their error resistivity, i.e., the pro-
cess of finding a solution can be continued even after disturbances which often
appear for example in robotics (see, e.g., [81]).
To be able to compare different approaches, the following analysis focuses
on two- and three-dimensional assignment problems. Nevertheless, most of the
methods described below can be adapted to other combinatorial optimization
problems. Furthermore, it is worth mentioning that some of the approaches
like penalty methods have their origin in continuous optimization. To apply
these methods to combinatorial optimization problems, the set of solutions is
restricted to discrete variables by using specific constraints.

2 Assignment Problems
In this section two important special cases of assignment problems, the linear
two-dimensional assignment problem and the linear NP-hard three-dimensional
assignment problem, are considered. Assignment problems are simple models
for industrial optimization problems from the simple assignment of jobs to ma-
chines up to the organization of complex flexible manufacturing systems (FMS).
This practical relevance justifies and motivates the investigation of assignment
problems.

The linear two-dimensional assignment problem allows comparison of the results with the optimal solution even for large problem sizes by using polynomial time algorithms, e.g., the dual forest algorithm [2] or the Hungarian method [19], [69], [27]. As a second test example the three-dimensional assignment problem is investigated as representative of NP-hard problems.

Definition. For given costs $(c_{ij}) \in \mathbb{R}^{n \times n}$ the task of the linear two-dimensional assignment problem is to find BOOLEan variables $x_{ij} \in \{0,1\}$ with $i,j \in N := \{1,\ldots,n\}$ so that the total costs
$$C := \sum_{i,j} c_{ij} \cdot x_{ij} \qquad (1)$$
are minimal with respect to the constraints
$$\sum_i x_{ij} = 1 \quad \forall j \in N \quad \text{and} \qquad (2)$$
$$\sum_j x_{ij} = 1 \quad \forall i \in N. \qquad (3)$$

The constraints (2) and (3) imply that $(x_{ij})$ has to be a permutation matrix.

In graph theory, the assignment problem is known as the weighted bipartite matching problem. The standard example is the assignment of jobs to machines. Here, $n$ jobs are given which can be worked on each of $n$ machines. The real number $c_{ij}$ defines the costs which are incurred by the execution of job $i$ on machine $j$:
$$\begin{pmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \cdots & c_{nn} \end{pmatrix} \quad \begin{array}{l} \text{rows: job } i \\ \text{columns: machine } j \end{array}$$
The BOOLEan matrix
$$x_{ij} := \begin{cases} 1 & \text{if job } i \text{ is worked on machine } j \\ 0 & \text{otherwise} \end{cases}$$
is a permutation matrix, that is, every job has to be done once and only once and every machine has to be used once and only once at the same time.

Lemma 2.1 The assignment problem (1), (2), (3) is invariant under adding a constant to each element of a row or column.
Proof. The proof is straightforward: Suppose the constant $a_{i'}$ is added to all elements of row $i'$ and the constant $\beta_{j'}$ is added to all elements of column $j'$. Using the KRONECKER symbol
$$\delta_{ii'} := \begin{cases} 1 & \text{for } i = i' \\ 0 & \text{for } i \neq i' \end{cases}$$
and $G := \{x^{(1)}, \ldots, x^{(n!)}\}$, which is the set of all $n!$ permutation matrices, it follows with (2) and (3) that
$$\begin{aligned}
&\arg\min_{x^{(r)} \in G} \left( \sum_{i,j} \left( c_{ij} + a_{i'} \delta_{ii'} + \beta_{j'} \delta_{jj'} \right) \cdot x_{ij}^{(r)} \right) \\
&= \arg\min_{x^{(r)} \in G} \left( \sum_{i,j} c_{ij} \cdot x_{ij}^{(r)} + \sum_{i,j} a_{i'} \delta_{ii'} \cdot x_{ij}^{(r)} + \sum_{i,j} \beta_{j'} \delta_{jj'} \cdot x_{ij}^{(r)} \right) \\
&= \arg\min_{x^{(r)} \in G} \left( \sum_{i,j} c_{ij} \cdot x_{ij}^{(r)} + a_{i'} \sum_j x_{i'j}^{(r)} + \beta_{j'} \sum_i x_{ij'}^{(r)} \right) \\
&= \arg\min_{x^{(r)} \in G} \left( \sum_{i,j} c_{ij} \cdot x_{ij}^{(r)} + a_{i'} + \beta_{j'} \right) \\
&= \arg\min_{x^{(r)} \in G} \left( \sum_{i,j} c_{ij} \cdot x_{ij}^{(r)} \right). \qquad \square
\end{aligned}$$
This property is used in the solution of the assignment problem by the Hungarian method [19], [69], [27] to transform the cost matrix into an appropriate form for the further steps. Without loss of generality it can be assumed that $c_{ij} \geq 0$ $\forall i,j$. Using the transformation described above, one can always obtain a cost matrix with nonnegative entries such that a permutation matrix can be selected out of the vanishing entries. The necessary computation time of the Hungarian method is of order $O(n^4)$, instead of the exponential dependence on $n$ incurred by comparing all $n!$ feasible solutions with exhaustive search.
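For numerical experiments it is convenient to have the optimal reference solution available. The following sketch uses SciPy's linear_sum_assignment, a polynomial time solver for (1) - (3); it is not one of the methods discussed in this chapter, and the random instance is an arbitrary illustration.

```python
# A reference solver for the linear two-dimensional assignment problem
# (1)-(3), used only to obtain the optimal total costs for comparison.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
c = rng.random((5, 5))                     # cost matrix (c_ij)

rows, cols = linear_sum_assignment(c)      # optimal assignment
x = np.zeros_like(c)
x[rows, cols] = 1.0                        # permutation matrix (x_ij)
print("optimal total costs C =", c[rows, cols].sum())
```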
Alternatively to the minimization problem (1), it is sometimes useful to work with the dual maximization problem, where
$$W := \sum_{i,j} w_{ij} \cdot x_{ij} \qquad (4)$$
has to be maximized with winnings $w_{ij} \geq 0$ and the constraints (2) and (3) for $x_{ij} \in \{0,1\}$. One gets (4) from (1) with the linear transformation
$$w_{ij} = -a \cdot c_{ij} + b \quad \text{with } a > 0 \quad \forall i,j. \qquad (5)$$

The condition $w_{ij} \geq 0$ is no restriction because of Lemma 2.1: The optimization problems (1) and (4) are invariant under adding a constant to every element of a specific row or column, so that $w_{ij} \geq 0$ can always be fulfilled.

Definition. In the three-dimensional (three-index) assignment problem, the BOOLEan variables $x_{ijk} \in \{0,1\}$ with $i,j,k \in N := \{1,\ldots,n\}$ have to be determined such that the total costs
$$C := \sum_{i,j,k} c_{ijk} \cdot x_{ijk} \qquad (6)$$
are minimal with respect to the constraints
$$\sum_{i,j} x_{ijk} = 1 \quad \forall k \in N, \qquad (7)$$
$$\sum_{i,k} x_{ijk} = 1 \quad \forall j \in N, \qquad (8)$$
$$\sum_{j,k} x_{ijk} = 1 \quad \forall i \in N. \qquad (9)$$

The three-dimensional assignment problem is NP-hard [29].


The standard example is the assignment of jobs, machines, and workers.
A generalization of the three-dimensional assignment problem leads to multi-
dimensional assignment problems, which can be used to model some problems in
flexible manufacturing systems (FMS), distributed autonomous robotic systems,
and cellular robotic systems (CEBOT) [79], [81].

3 Dynamical Systems

Dynamical systems are defined by the change of a state $x(t) \in \mathbb{R}^n$ with respect to the time $t$. For example, in physical systems, the state vector $x(t) \in \mathbb{R}^6$ may indicate the time dependent position in space and velocity of a mass point.

In the case of continuous time the time evolution of the state $x(t)$ is given by the equation of motion
$$\dot{x} = f(x) \qquad (10)$$
with $t \in \mathbb{R}$. Here, $\dot{x}$ denotes the time derivative $\frac{d}{dt} x$ of the state $x$, and $f : \mathbb{R}^n \mapsto \mathbb{R}^n$ determines this change in dependence of the state $x$. In the case of discrete time the equation of motion is given by
$$x(t+1) = f(x(t)) \qquad (11)$$
with $t \in \mathbb{Z}$.
In this article mainly continuous time systems are considered. Starting with an initial condition $x(t=0)$, the equation of motion leads to a solution curve $x(t)$ in the state space which is called a trajectory; the set of all solution curves is called the phase flow of the dynamical system. For large times the trajectory is often restricted to a low-dimensional attractor. There exist several kinds of attractors, like limit cycles or strange attractors. The most simple but nevertheless important case of an attractor is an asymptotically stable point. Various techniques for the analysis of the asymptotic behaviour have been developed. For details see [44], [8], [9], [7], [10], [88] or [75].

In the dynamical system approach to combinatorial optimization one can restrict oneself to systems with asymptotically stable points $x^*$. This asymptotic behaviour is defined as the limit of large times, i.e., $x^* = \lim_{t \to \infty} x(t)$. These asymptotically stable points are taken as candidates for solutions of the combinatorial optimization problems. As already pointed out in the introduction, for some dynamical system approaches there exists a one-to-one mapping of the asymptotically stable points to the solutions of the combinatorial optimization problem (see for example [79]). Others, like, e.g., [49], may produce spurious states which are not feasible solutions.

4 Dynamical Systems Based on Penalty Methods

Nonlinear optimization problems are defined by a cost function which has to be
minimized and by separately given constraints which have to be respected. By
adding the cost function and further terms which contain the constraints, an op-
timization problem without separately given constraints is obtained. Clearly, by
regarding a minimization problem, these terms have to vanish if the constraints
are respected and have to be large, i.e., have to cause costs or penalties, if the
constraints are violated. Therefore, this procedure is called penalty method.
Related to penalty methods are barrier methods (see, e.g., [60] and [51]) and
interior-point methods (see, e.g., [66]).

4.1 Classical Approach

Penalty methods can be used to solve optimization problems with constraints which are given as equalities or inequalities. Here, only problems with equality constraints are considered. Furthermore, this section focuses on nonlinear optimization problems, i.e., either the objective function or cost function itself or the constraints are nonlinear. For further reading see, e.g., [42], [11], [51], [77], [35].

Suppose the function which has to be minimized is given by $f : \mathbb{R}^n \mapsto \mathbb{R}$, and the equality constraints are given by $h : \mathbb{R}^n \mapsto \mathbb{R}^m$. Thus the set of feasible points is defined as $G := \{x \in \mathbb{R}^n : h(x) = 0\}$. By using the penalty method, the nonlinear optimization problem
$$x_{\mathrm{opt}} = \arg\min_{x \in G} f(x), \qquad (12)$$
where the constraints are given by $G$, is transformed into the unrestricted optimization problem
$$x_{\mathrm{opt}} = \arg\min_{x \in \mathbb{R}^n} \tilde{f}(x) \qquad (13)$$
with
$$\tilde{f}(x) = f(x) + \tilde{h}(x). \qquad (14)$$
The function $\tilde{h}(x)$ contains the constraints and has to fulfill the condition
$$\tilde{h}(x) = \begin{cases} 0 & \text{for } h(x) = 0 \\ \text{large positive value} & \text{for } h(x) \neq 0. \end{cases} \qquad (15)$$
In order to fulfill the conditions (15), often the square of $h(x)$ multiplied by a large constant $p$ is used:
$$\tilde{h}(x) = p \cdot (h(x))^2. \qquad (16)$$
The following theorem states that the penalty method converges globally to a minimum with respect to the constraints (see, e.g., [42]).

Theorem 4.1 Let $x(p)$ be a minimum point of (14) with (16), i.e., of
$$\tilde{f}(x,p) = f(x) + p \cdot (h(x))^2 \qquad (17)$$
on a compact set. Then
$$\lim_{p \to \infty} x(p) = x_0 \quad \text{and} \qquad (18)$$
$$\lim_{p \to \infty} p \cdot (h(x(p)))^2 = 0, \qquad (19)$$
where $x_0$ is the minimum point of $f$ on $G$. From $0 \leq p < p'$ it follows that the inequalities
$$\tilde{f}(x(p),p) \leq \tilde{f}(x(p'),p') \leq f(x_0), \qquad (20)$$
$$f(x(p)) \leq f(x(p')) \leq f(x_0) \quad \text{and} \qquad (21)$$
$$(h(x(p)))^2 \geq (h(x(p')))^2 \geq 0 \qquad (22)$$
are fulfilled.
See [42] for a proof.

In spite of this theorem, for the numerical handling of the penalty method the constant $p$ has to be chosen appropriately, depending on the finite step size of numerical schemes like the gradient descent method. This is really the weak point of these methods.

The large positive value in (15), which has to be attained by the function $\tilde{h}$, causes a fast relaxation of the initial point $x(t=0)$ to the set of feasible solutions which respect the constraint $h(x) = 0$. This initial point $x(t=0)$ is usually chosen randomly. See section 8.3.1 for an exception to this. The penalty term $\tilde{h}(x)$ has to be chosen smooth in order to be able to find the minimum using gradient descent methods
$$\dot{x} = -\frac{\partial \tilde{f}}{\partial x}. \qquad (23)$$
By using a digital computer a finite step size is necessary, which makes the use of gradient descent methods problematic [60], [51], and one has to accept low convergence rates because of the unfavourable eigenvalue structure of the problem. To improve the convergence rate, conjugate gradient methods, the NEWTON method or variants of it can be used (see, e.g., [60]).

4.1.1 Constraints of Assignment Problems

The constraints of two-dimensional assignment problems are considered in detail in this section. Other constraints can be treated similarly.

The constraint of BOOLEan variables $x_{ij} \in \{0,1\}$ can be achieved by using
$$x_{ij} \cdot (x_{ij} - 1) = 0 \quad \Leftrightarrow \quad x_{ij} = 0 \text{ or } x_{ij} = 1. \qquad (24)$$
To satisfy the condition (15) for all values of $x_{ij}$, instead of (24) the condition
$$\left( x_{ij} \cdot (x_{ij} - 1) \right)^2 = 0 \qquad (25)$$
can be used. Certainly, this is in some sense just arbitrary; there are many more ways to respect (15) than just to take the square. For an alternative see the next section.

Dealing with the two-dimensional assignment problem, the constraints (2) and (3) are rewritten as
$$h_j^{(2)} = \left( \sum_i x_{ij} - 1 \right)^2 = 0 \quad \text{and} \qquad (26)$$
$$h_i^{(3)} = \left( \sum_j x_{ij} - 1 \right)^2 = 0. \qquad (27)$$
Again, for $h_j^{(2)}$ and $h_i^{(3)}$ the square is used, so that the constraints of the two-dimensional assignment problem are respected with the penalty term
$$\tilde{h}(x) = p_1 \sum_{i,j} \left( x_{ij} \cdot (x_{ij} - 1) \right)^2 + p_2 \sum_j \left( \sum_i x_{ij} - 1 \right)^2 + p_2 \sum_i \left( \sum_j x_{ij} - 1 \right)^2. \qquad (28)$$
Because of their similar meaning, the constraints $h_j^{(2)}$ and $h_i^{(3)}$ have to be weighted equally. Therefore, the constant $p_2$ is chosen for both terms.

The cost function (1) of the two-dimensional assignment problem is treated as the function
$$f(x) = \sum_{i,j} c_{ij} x_{ij} \qquad (29)$$
which has to be minimized. Using a gradient descent method to minimize (14), the resulting equation of motion is
$$\dot{x}_{ij} = -c_{ij} - p_1 \left( 4x_{ij}^3 - 6x_{ij}^2 + 2x_{ij} \right) - 2p_2 \left( \sum_{i'} x_{i'j} - 1 \right) - 2p_2 \left( \sum_{j'} x_{ij'} - 1 \right). \qquad (30)$$
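As an illustration, the gradient system (30) can be integrated with an explicit Euler scheme. The following is a minimal sketch; the weights p1, p2, the step size dt and the number of steps are ad-hoc choices for illustration, not tuned recommendations, and the signs follow the gradient form of (30) given above.

```python
# Explicit Euler integration of the penalty-method dynamics (30) for
# the two-dimensional assignment problem.
import numpy as np

def penalty_flow(c, p1=10.0, p2=10.0, dt=1e-3, steps=20000, seed=0):
    n = c.shape[0]
    x = np.random.default_rng(seed).random((n, n))   # random x(t=0)
    for _ in range(steps):
        row = x.sum(axis=1, keepdims=True)           # sum_j' x_ij'
        col = x.sum(axis=0, keepdims=True)           # sum_i' x_i'j
        dx = (-c
              - p1 * (4*x**3 - 6*x**2 + 2*x)         # Boolean term (25)
              - 2*p2 * (col - 1.0)                   # column constraint
              - 2*p2 * (row - 1.0))                  # row constraint
        x += dt * dx
    return x                                         # approximately Boolean
```

Rounding the result gives a candidate solution; feasibility is not guaranteed, which is exactly the spurious-state problem discussed in this chapter.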

4.2 Variants of Penalty Terms

As mentioned before, there are many ways to respect (15). Instead of just taking the square of the constraints $h$, one can use several other kinds of functions $\tilde{h}$; see [77], [35] or [84] for examples.

Furthermore, one can use the fact that (24) implies $x_{ij} = x_{ij}^2$ to obtain the penalty term, resulting in
$$\tilde{h}(x) = p_1 \sum_j \left( \sum_i x_{ij}^2 - 1 \right)^2 + p_1 \sum_i \left( \sum_j x_{ij}^2 - 1 \right)^2 + p_2 \sum_{i,j} \sum_{i' \neq i} x_{ij}^2 \cdot x_{i'j}^2 + p_2 \sum_{i,j} \sum_{j' \neq j} x_{ij}^2 \cdot x_{ij'}^2. \qquad (31)$$
Similarly, because of $x_{ij} = x_{ij}^2$ the original cost function can be changed into
$$f(x) = \sum_{i,j} c_{ij} x_{ij}^2. \qquad (32)$$
The penalty term (31) is used in [40] to assign buildings to appropriate sites in regional planning. The resulting equation of motion is
$$\dot{x}_{ij} = (8p_1 - 2c_{ij}) \, x_{ij} + 8p_2 \, x_{ij}^3 - 4(p_1 + p_2) \, x_{ij} \left( \sum_{j'} x_{ij'}^2 + \sum_{i'} x_{i'j}^2 \right). \qquad (33)$$
By using a gradient descent method of $\tilde{h}$ alone, one obtains a dynamical system which is identical to the coupled selection equations described in section 8. This allows further analysis, pointed out in detail in section 8.3.1.

5 Statistical Approaches

The following statistical approaches are usually not regarded as dynamical systems but can be interpreted as very special cases of stochastic dynamical systems with discrete time dynamics. Furthermore, stochastic methods are often used in addition to deterministic methods to avoid stagnation of the dynamics in spurious states. For this reason, we regard their inclusion in this article as justified. The METROPOLIS algorithm, simulated annealing, evolutionary strategies and genetic algorithms are considered as a small selection of statistical approaches. Further examples are the greedy randomized adaptive search procedure for the three-dimensional assignment problem proposed in [59] and the so-called tabu search (see, e.g., [31], [32], [33], [20]). See [16] for an overview of several stochastic methods applied to optimization.

5.1 Metropolis Algorithm

The METROPOLIS algorithm [63] is a modified Monte-Carlo method. By so-called importance sampling the number of random points in promising areas of the state space is increased. The cost function which has to be minimized is called energy $E$ because of the physical problem considered in [63]. An ensemble of states is considered, of which each state is assigned an energy value. Starting from an initial state, a randomly chosen test state $x'$ in its local neighbourhood is considered. If the energy $E(x')$ of the test state $x'$ is less than or equal to the energy $E(x)$ of the old state $x$, the test state will be used as the new state. In the other case, i.e., if the energy of the test state is larger, this state is taken with probability proportional to $e^{-\Delta E / T}$. Herein, $\Delta E = E(x') - E(x)$ is the energy difference between the test state and the current state. The parameter $T \in \mathbb{R}_+$ is called temperature. This behaviour can be summarized by using the conditional probability
$$P(x \to x') = \begin{cases} 1 & \text{for } \Delta E = E(x') - E(x) \leq 0 \\ e^{-\Delta E / T} & \text{otherwise,} \end{cases} \qquad (34)$$
which gives the probability for changing from state $x$ to state $x'$.

The fact that states with higher energy values can be assumed gives the possibility of escaping from a spurious state. By repetition of this procedure a (thermodynamic) equilibrium is reached. The ensemble reaches a BOLTZMANN distribution $n_r \propto e^{-E_r / T}$ [63]. Herein, $n_r$ is the number of elements of the ensemble in the state $r$.
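The acceptance rule (34) translates directly into a few lines of code. The sketch below is a generic helper; the neighbourhood generation and the energy function $E$ are left to the specific problem.

```python
# The acceptance rule (34): a test state with energy difference
# dE = E(x') - E(x) is always accepted if dE <= 0 and otherwise with
# probability exp(-dE/T), where T is the temperature.
import math, random

def metropolis_accept(dE, T):
    return dE <= 0 or random.random() < math.exp(-dE / T)
```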
5.2 Simulated Annealing

The method of simulated annealing is a special case of a METROPOLIS algorithm. The idea is taken from material science. There, defects in crystals are eliminated by heating and a subsequent slow cooling process. A state of a crystal with defects has higher energy than a state without defects. Fast cooling processes result in the movement of the current state into a state with defects, which corresponds to a local energy minimum.

The basic idea is identical with the idea of the METROPOLIS algorithm. But in contrast to the METROPOLIS algorithm, in simulated annealing the temperature $T$ is decreased slowly. During this process of decreasing the temperature $T$, the acceptance of energetically unfavorable states becomes more and more improbable until $T = 0$, where only energetically better states can be accepted. See figure 1 for the energy $E$ in dependence of the time $t$. One can observe that the height of the jumps to larger energy values decreases with the time $t$, which is due to the decrease of the temperature $T$.

The method of simulated annealing was used for the traveling salesman problem by KIRKPATRICK [55], [54], [56]. It is possible to get good results with little computing power. The monotonically decreasing function of the temperature $T$ has to be adapted to the optimization problem and the problem size. This has to be done empirically.

It can be proved that infinitely slow cooling results in a global optimal solution. Clearly, this fact does not help very much in practical applications because of the finite computing time. This and further detailed analysis of simulated annealing can be found in [87].

Figure 1: Energy E over time t of an example of simulated annealing.
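A minimal sketch of simulated annealing for the two-dimensional assignment problem may look as follows. The move (a random transposition of the current permutation), the geometric cooling schedule and all parameter values are ad-hoc choices for illustration, not the settings used in section 9.

```python
# Simulated annealing for the two-dimensional assignment problem.
# States are permutations p with energy E(p) = sum_i c[i][p[i]].
import math, random

def anneal(c, T=1.0, alpha=0.999, steps=50000, seed=0):
    random.seed(seed)
    n = len(c)
    p = list(range(n))                          # current permutation
    E = sum(c[i][p[i]] for i in range(n))       # current energy
    for _ in range(steps):
        i, j = random.sample(range(n), 2)       # candidate: swap p[i], p[j]
        dE = (c[i][p[j]] + c[j][p[i]]) - (c[i][p[i]] + c[j][p[j]])
        if dE <= 0 or random.random() < math.exp(-dE / T):
            p[i], p[j] = p[j], p[i]             # accept the move (34)
            E += dE
        T *= alpha                              # slow geometric cooling
    return p, E
```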
5.3 Evolutionary Strategies and Genetic Algorithms

Nature shows many different kinds of adaptation of several species to specific environmental conditions in biological evolution. Keeping this in mind, one can interpret evolution as an optimization method. The basic principles of biological evolution are used by so-called evolutionary strategies (ES) and genetic algorithms (GA) for optimization problems. RECHENBERG suggested evolutionary strategies to optimize technical systems like, e.g., the shape of a jet [74]. The systems which have to be optimized are represented by a set of parameters. These have to be adapted in order to optimize some function like, e.g., the thrust of the jet. The parameters are represented as real numbers which define some vector. This vector is copied and mutated with a given probability distribution, like the deoxyribonucleic acid (DNA) in biological systems. The best of these variants are selected and copied and mutated again. The selection corresponds to DARWIN's "survival of the fittest". This procedure results in a more detailed examination of the area in search space around successful variants. This method is used with different numbers of parallel existing individuals, descendants and selected individuals. These evolutionary strategies were developed further and compared with other methods by SCHWEFEL [76].

Genetic algorithms are due to HOLLAND [46]. In contrast to evolutionary strategies, they use a binary coding of the parameters which have to be optimized. The kind of coding is essential for the success of the approach of genetic algorithms. The so-called genetic operator is defined by point mutation, crossover and inversion. This genetic operator is used to mutate the individuals. The composition of the genetic operator is in some sense arbitrary and requires a lot of experience in applications of these methods. No strict rules are known to construct this operator. Apart from the parameter representation, the two approaches are quite similar. They are not distinguished precisely in the literature.

In [24], [23] aspects of information theory and entropy concerning evolution and optimization methods are considered. Applications of evolutionary strategies and genetic algorithms to the traveling salesman problem can be found for example in [13], [68]. Further applications are published, e.g., in [34], [21] and [64].

6 Neural Networks

Systems consisting of a large number of connected simple units are called distributed systems or (artificial) neural networks. The units are able to perform only simple operations. One fundamental advantage of these distributed systems is their error resistivity. An overview can be found in [65] or [41]. Many historically important papers on this topic are collected in [5] and [4]. A short discussion of the use of neural networks for combinatorial optimization is presented in [70], [71] and [91].
6.1 Hopfield Model

In [47], HOPFIELD points out the analogy between pattern recognition or correcting associative memories and the phase flow in state space of physical dynamical systems with several stable states. In particular, HOPFIELD suggests an associative memory similar to the ISING model of spin glass theory [15]. The two-state elements are called neurons, and the whole system an artificial neural network.

A similar model with continuous states is introduced in [48]. This model is the basis for the approach of HOPFIELD and TANK to combinatorial optimization in the example of the traveling salesman problem (TSP), wherein one has to find the shortest tour such that each of $n$ given cities is visited once and only once and the tour starts and ends in the same city [49], [50].

The Hopfield model proposed in [48] can be realized in hardware using analog electronic circuits. These consist of a set of electronic nonlinear amplifiers which are interconnected by resistors. Using such hardware, parallel processing is natural. The nonlinear amplifiers transform an input (voltage) signal $u_i$ into an output (voltage) signal $V_i$ by
$$V_i = g_i(u_i). \qquad (35)$$
The transfer function $g_i$, which increases strictly monotonically with $u_i$, has range $[0,1]$ for $u_i \in (-\infty, \infty)$ and a well-defined inverse
$$u_i = g_i^{-1}(V_i). \qquad (36)$$
Usually, the sigmoidal function $g_i(u_i) = (1 + \tanh(u_i))/2$ is used. For details of the electronic circuit see [48] or [65]. The equation of motion of the electronic circuit and respectively of the Hopfield model can be written as
$$\tau_i \cdot \dot{u}_i = -u_i + \sum_j w_{ij} \cdot g_j(u_j) \qquad (37)$$
with local time constants $\tau_i = C_i R_i$ and synaptic strengths $w_{ij} = R_i / R_{ij}$. The $C_i$ represent capacitors and $R_i$, $R_{ij}$ are resistors.

To be able to apply the Hopfield model to optimization problems, the stability properties of the output $V_i$ are important. This is discussed in the following theorem before applications to combinatorial optimization problems are treated in the next sections.

Theorem 6.1 For symmetric synaptic strengths $w_{ij} = w_{ji}$ the equation of motion (37) of the Hopfield network will tend to an asymptotically stable point for any initial value.

Proof. The proof is based on the construction of a so-called LYAPUNOV function $E$. In this context this function is often called energy function even if it does not represent the physical energy of the electronic circuit which is modeled by the equation of motion (37). The LYAPUNOV function
$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \, V_i \cdot V_j + \sum_{i=1}^{n} \int_0^{V_i} g_i^{-1}(V) \, dV$$
is decreased monotonically by (37) because of
$$\dot{E} = -\sum_{i=1}^{n} \tau_i \, \frac{d g_i^{-1}(V_i)}{d V_i} \, \dot{V}_i^2.$$
The function $g_i^{-1}(V_i)$ increases strictly monotonically, which implies that $d g_i^{-1}(V_i) / d V_i$ is positive. The constants $\tau_i = C_i R_i$ are positive for all $i$. Because of that, $\dot{E} \leq 0$ holds. Furthermore, $E$ is bounded from below if the sigmoidal function is used for $g_i(u_i)$. Therefore, $\dot{E} = 0$ is equivalent to $\dot{V}_i = 0$ and $\dot{u}_i = 0$. This proves the theorem. □

In the following all $g_i$ are assumed to be identical, i.e., $g_i = g$ $\forall i$. The high-gain limit of the amplifiers, i.e., an approximation of a step function for $g(u_i)$, gives the quadratic LYAPUNOV function or energy function
$$E = -\frac{1}{2} \sum_{i,j} Q_{ij} \, V_i V_j + \sum_i P_i \, V_i \qquad (38)$$
with $Q_{ij}, P_i \in \mathbb{R}$ which depend on the resistors and capacitors of the electronic circuit. If they are chosen appropriately, the Hopfield model minimizes a quadratic function which may be defined in dependence of a given BOOLEan, i.e., 0/1-combinatorial optimization problem. Because of the approximation of a step function for $g(u_i)$ in the high-gain limit, the outputs agree only approximately with the values 0 and 1.

6.1.1 The Hopfield Model for the TSP

In [49] the energy function (38) for the traveling salesman problem with distances $c_{ii'} \in \mathbb{R}_+$ between city $i$ and city $i'$ is given by
$$E = \frac{A}{2} \sum_i \sum_j \sum_{j' \neq j} V_{ij} V_{ij'} + \frac{B}{2} \sum_j \sum_i \sum_{i' \neq i} V_{ij} V_{i'j} + \frac{C}{2} \left( \sum_i \sum_j V_{ij} - \tilde{n} \right)^2 + \frac{D}{2} \sum_i \sum_{i' \neq i} \sum_j c_{ii'} V_{ij} \left( V_{i', 1+j \bmod n} + V_{i', 1+(j-2+n) \bmod n} \right). \qquad (39)$$
The modulo formulation $1 + j \bmod n$ and $1 + (j-2+n) \bmod n$ is used instead of $j+1$ and $j-1$ to obtain properly defined indices in the set $N = \{1, \ldots, n\}$. The constants $A, B, C, D \in \mathbb{R}_+$ are positive real numbers. The condition $V_{ij} = 1$ indicates the visit of the $i$-th city at position $j$ of the path in the total tour. To describe a tour which starts and ends at the same city and in which each city is visited once and only once, $V$ has to be a permutation matrix. The first term in (39) causes high energy values for two non-vanishing elements in the same row $i$, the second term for two non-vanishing elements in the same column $j$, while the third term causes high energy values for a number of non-vanishing elements larger or smaller than $n$. The last term consists of the costs or distances $c_{ii'}$. Therefore, this energy function can be regarded as a cost function with penalty terms of a penalty method.

The terms in $E$ result in so-called frustrated states. This expression is common in spin glass theory and designates several competing couplings between variables in the sense of low energetic states which cannot all be satisfied simultaneously. This results in several local minima where the system gets trapped and does not reach a global minimum.

The equation of motion of the HOPFIELD network for the TSP [48], [49] is defined by
$$\dot{u}_{ij} = -\frac{u_{ij}}{\tau} - A \sum_{j' \neq j} V_{ij'} - B \sum_{i' \neq i} V_{i'j} - C \left( \sum_{i'} \sum_{j'} V_{i'j'} - \tilde{n} \right) - D \sum_{i' \neq i} c_{ii'} \left( V_{i', 1+j \bmod n} + V_{i', 1+(j-2+n) \bmod n} \right) \qquad (40)$$
with $\tau \in \mathbb{R}_+$. The functions $V_{ij}(u_{ij})$ are sigmoid functions of the form
$$V_{ij}(u_{ij}) = \frac{1}{2} \left( 1 + \tanh \frac{u_{ij}}{u_0} \right), \qquad (41)$$
where $u_0$ is a scaling factor. The constant $\tilde{n}$ is used instead of $n$ to adjust the operating point of the amplifiers [49]. The initial values $u_{ij}(t=0)$ are randomly chosen by
$$u_{ij}(t=0) = u_{00} + u_0 \left( \eta - \frac{1}{2} \right) \qquad (42)$$
around some center value $u_{00}$, where $\eta$ is a random number in the interval $[0,1]$. The equation of motion (40) drives the system to the local minima of the energy function $E$ [48]. The spurious states of $E$ cause solutions which do not fulfill the given constraints. In other words, $\lim_{t \to \infty} V$ may not be a valid tour of the traveling salesman problem.
The problems of the above-mentioned local minima of the energy function $E$ are reported in detail in [89] and [52], among others. The stagnation of the neural network dynamics in local minima of the energy function $E$ can be avoided by using some additional stochastic terms. This was done in [85], [86] with annealing and normalization of the output. Theoretical investigations of the stability of equation (40) depending on the parameters $A$, $B$, $C$ and $D$ and supplementary remarks are given in [52] and [61].

In [83] the objective function is written in normal form and investigated. An analysis of the linearized dynamics of the HOPFIELD network is done in [30]. Further adaptations of the HOPFIELD network to special problems are made in [62]. The influence of the initial values is reported in [89] and [83].

The above-mentioned disadvantages of randomly chosen initial values, parameter sensitivity, and spurious states are avoided in other methods [18], [79] which are reported below. As already mentioned, another possible approach to avoid the stagnation of the dynamics in spurious states is the additional use of statistical methods.

6.1.2 Adaptation to the Assignment Problem

The above described approach of HOPFIELD and TANK to the TSP is extended to an A/D converter, a signal decision circuit and a linear programming circuit in [82].

The adaptation of (38) to the two-dimensional assignment problem can also be done easily. This results in the energy function
$$E = \frac{A}{2} \sum_i \sum_j \sum_{j' \neq j} V_{ij} V_{ij'} + \frac{B}{2} \sum_i \sum_j \sum_{i' \neq i} V_{ij} V_{i'j} + \frac{C}{2} \left( \sum_i \sum_j V_{ij} - \tilde{n} \right)^2 + \frac{D}{2} \sum_i \sum_j c_{ij} \cdot V_{ij}^2 \qquad (43)$$
and the following equation of motion
$$\dot{u}_{ij} = -\frac{u_{ij}}{\tau} - A \sum_{j' \neq j} V_{ij'} - B \sum_{i' \neq i} V_{i'j} - C \left( \sum_{i'} \sum_{j'} V_{i'j'} - \tilde{n} \right) - D \, c_{ij} V_{ij}. \qquad (44)$$
Again, $\tilde{n}$ is used instead of $n$. The output, which has to be a permutation matrix, is given by the $n \times n$ matrix $(V_{ij})$.

The choice of the energy function $E$ for the two-dimensional assignment problem is not unique, but the one presented here is most similar to the energy function for the TSP presented by HOPFIELD and TANK.
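For illustration, the following sketch integrates the dynamics (44) with an explicit Euler scheme, under the assumption that (44) has the gradient-like form reconstructed above. All parameter values are ad-hoc choices, and, as discussed, the resulting output $(V_{ij})$ is not guaranteed to be a permutation matrix.

```python
# Explicit Euler integration of Hopfield-type dynamics (44) for the
# two-dimensional assignment problem; A, B, C, D, tau, u0, n_tilde,
# dt and steps are ad-hoc illustration values.
import numpy as np

def hopfield_assignment(c, A=1.0, B=1.0, C=1.0, D=1.0, tau=1.0,
                        u0=0.02, n_tilde=None, dt=1e-4,
                        steps=100000, seed=0):
    n = c.shape[0]
    if n_tilde is None:
        n_tilde = n
    rng = np.random.default_rng(seed)
    u = u0 * (rng.random((n, n)) - 0.5)         # random u_ij(0), cf. (42)
    for _ in range(steps):
        V = 0.5 * (1.0 + np.tanh(u / u0))       # sigmoid output (41)
        row = V.sum(axis=1, keepdims=True) - V  # sum over j' != j
        col = V.sum(axis=0, keepdims=True) - V  # sum over i' != i
        du = (-u / tau - A*row - B*col
              - C*(V.sum() - n_tilde)           # global constraint term
              - D * c * V)                      # cost term of (44)
        u += dt * du
    return 0.5 * (1.0 + np.tanh(u / u0))        # final output (V_ij)
```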
6.2 Self-Organizing Maps and Kohonen Networks

Another neural network approach to combinatorial optimization is taken in [22] and [6] for the TSP by using KOHONEN networks, i.e., self-organizing maps [57], [58]. This approach is also called the elastic net or elastic snake method because of the fit of an initial elastic ring to the cities in the city space.

6.2.1 Self-Organizing Maps for the TSP

The elastic ring contains marked points which are iteratively moved to the cities. The initial ring is located in the center of the city positions. The marked points are finally mapped to the locations of the cities. Each marked point of the ring is mainly shifted in the direction of the nearest city. The other cities have a smaller influence which decreases with larger distance to the marked point. Eventually each city is assigned to one and only one marked point.

The description of the elastic net method in [22] is quite similar to a penalty method where the objective function
$$E = -\alpha K^2 \sum_i \ln \sum_j \phi(\|x_i - y_j\|, K) + \beta \sum_j \|y_{j+1} - y_j\|^2 \qquad (45)$$
with
$$\phi(d, K) = e^{-\frac{d^2}{2K^2}} \qquad (46)$$
has to be minimized. Using the gradient descent method
$$\dot{y}_j = -\frac{\partial E}{\partial y_j} \qquad (47)$$
to minimize $E$, one obtains for the change of the space positions $y_j$ of the marked points
$$\dot{y}_j = \alpha \sum_i \frac{\phi(\|x_i - y_j\|, K)}{\sum_{j'} \phi(\|x_i - y_{j'}\|, K)} \, (x_i - y_j) + \beta K \left( y_{j+1} - 2y_j + y_{j-1} \right). \qquad (48)$$
The second term in (48) can be interpreted as an elastic force, which explains the name "elastic net method". In [6] one can find a more algorithmic formulation which is also based on the idea of self-organizing maps. These approaches intuitively lead to a short tour of the TSP. The application of self-organizing maps to the TSP also produces good results for large problems with up to 1000 cities [6]. There are variants of this approach where the equation of motion cannot be written as a gradient descent system (see, e.g., [28]).
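A single update step of (48) can be sketched as follows; the parameter values are ad-hoc choices, and the ring is closed periodically via np.roll, so that $y_{j+1}$ and $y_{j-1}$ are taken modulo the number of marked points.

```python
# One update step of the elastic net dynamics (48): every marked point
# y_j is pulled towards the cities x_i with normalized weights phi and
# kept on an elastic ring.
import numpy as np

def elastic_net_step(x, y, K, alpha=0.2, beta=2.0):
    # x: (n, 2) city positions, y: (m, 2) marked points on the ring
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2.0 * K**2))            # phi(||x_i - y_j||, K)
    w = phi / phi.sum(axis=1, keepdims=True)    # normalization over j'
    pull = (w[:, :, None] * (x[:, None, :] - y[None, :, :])).sum(axis=0)
    ring = np.roll(y, -1, axis=0) - 2.0*y + np.roll(y, 1, axis=0)
    return y + alpha * pull + beta * K * ring   # Euler step of (48)
```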
6.3 Adaptation to the Assignment Problem

To the best of our knowledge, nothing has been published about the application of self-organizing maps or KOHONEN networks to assignment problems. In principle, it should work to use the above described strategy with vectors of a canonical basis for the $x_i$, i.e.,
$$x_i := (\delta_{i1}, \ldots, \delta_{in})^T \in \mathbb{R}^n \qquad (49)$$
with the KRONECKER symbol $\delta_{ii'}$ and $i, i' \in \{1, \ldots, n\}$. The initial values for the vectors $y_j \in \mathbb{R}^n$ can either be chosen randomly or all equal.
Using an adapted cost term $\beta K \sum_{i,j} c_{ij} y_{ij}$ in (45), the resulting equation of motion is
$$\dot{y}_j = \alpha \sum_i \frac{\phi(\|x_i - y_j\|, K)}{\sum_{j'} \phi(\|x_i - y_{j'}\|, K)} \, (x_i - y_j) - \beta K \, c_{\cdot j} \qquad (50)$$
with $c_{\cdot j} := (c_{1j}, \ldots, c_{nj})^T$. Like in the case of the TSP, where the marked points $y_j$ tend to the cities $x_i$, the marked points $y_j$ here have to tend to the canonical basis vectors $x_i$. All vectors $x_i$ together define a permutation matrix which is a feasible solution to the two-dimensional assignment problem.

7 Double Bracket Flows

BROCKETT introduces in [17] dynamical systems, so-called double bracket flows, of the form
$$\dot{X} = [X; [X; Y]]$$
to diagonalize symmetric matrices, sort lists and solve linear programming problems. Here, $X$ and $Y$ are symmetric $n \times n$ matrices and $[X; Y] := XY - YX$ denotes the commutator. Such a double bracket flow is used in [18] to solve the two-dimensional assignment problem and is applied in [90] to NP-hard combinatorial optimization problems. In these references, no numerical results are presented or compared with other methods. See [17], [18], [90] and the references therein for further details.
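As a small illustration of the idea, the following sketch integrates the double bracket flow with an explicit Euler scheme for a symmetric matrix $X$ and a fixed diagonal $Y$; the step size and the concrete matrices are ad-hoc choices. The flow drives $X$ towards a diagonal matrix whose eigenvalues are arranged according to the diagonal of $Y$, which is the sorting behaviour mentioned above.

```python
# Explicit Euler integration of the double bracket flow
# dX/dt = [X;[X;Y]] with [X;Y] := XY - YX.
import numpy as np

def double_bracket_flow(X, Y, dt=1e-3, steps=20000):
    for _ in range(steps):
        B = X @ Y - Y @ X                   # commutator [X;Y]
        X = X + dt * (X @ B - B @ X)        # Euler step of [X;[X;Y]]
    return X

X0 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 1.0]])            # symmetric X(0)
Y = np.diag([1.0, 2.0, 3.0])                # fixed diagonal Y
print(np.round(double_bracket_flow(X0, Y), 3))   # nearly diagonal
```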

8 Coupled Selection Equations

Using the method of coupled selection equations for combinatorial optimization problems, the initial conditions are calculated from the specific data of the problem, i.e., the costs. Together with the basins of attraction, they determine the final solution. In contrast to penalty methods and the approach of HOPFIELD and TANK, there are no cost terms in the equation of motion. The solution is obtained as output of the dynamical system in the limit of large times, when the state variables have reached the asymptotically stable points. Selection equations can be found in models of many biological, chemical and physical systems. Below we will first describe the selection equations of pattern formation, which can be used for pattern recognition, before going on to outline an approach to combinatorial optimization problems using coupled selection equations.

8.1 Selection Equations

The neural network approaches to associative memories which are described in [48], [53] show many spurious states, i.e., the dynamics end in patterns which are not stored. This unwanted behaviour led to investigations about the storage capacity of these networks by means of statistical analysis (see, e.g., [3], [65], [41]).

To avoid the spurious states, HAKEN introduced a dynamical system where the stored patterns are the only asymptotically stable points of the dynamics. This dynamical system is called a synergetic computer (see [37], [38], [39], and references there). The patterns are stored as vectors $\mu_i = (\mu_{i1}, \ldots, \mu_{in})^T \in \mathbb{R}^n$ of the pixels for all $i \in M = \{1, \ldots, m\}$. The vector component $\mu_{ij}$ represents the grey value of the corresponding pixel $j$ of pattern $i$.

It is assumed that the vectors of the patterns are linearly independent and that the number $m$ of patterns is at most $n$, the number of pixels. The dynamics can be described with so-called order parameters $\xi_i$, which are the coefficients of the expansion of the test pattern $\nu \in \mathbb{R}^n$ with respect to the stored patterns $\mu_i$ and a rest vector $\varrho$:
$$\nu = \sum_{i=1}^{m} \xi_i \mu_i + \varrho. \qquad (51)$$
The equation of motion of the order parameters is defined by
$$\dot{\xi}_i = \xi_i - \xi_i^3 - \beta \, \xi_i \sum_{i' \neq i} \xi_{i'}^2 \qquad (52)$$
with $\beta > 1$. To avoid stagnation at some stationary but unstable point, some noise is added to (52).

Theorem 8.1 The dynamical system (52) is a selection equation, i.e., one and only one mode survives, namely the mode with the largest corresponding initial value ($\xi_i \to 1$ for $t \to \infty$), and all others decay to zero ($\xi_{i'} \to 0$ with $i' \neq i$ for $t \to \infty$).

Proof. The proof is based on linear stability analysis [38]. To get a visual impression, see figure 3 and the explanations for the coupled selection equations. □

A corresponding dynamical system can be found in the BENARD problem of fluid dynamics. There, it describes the selection behaviour of roll patterns. The similarity between pattern formation and pattern recognition was pointed out by HAKEN [37], [14]. In [12], [43] a similar approach to an associative memory and related systems can be found which uses a linear mapping to a dynamical system with well-known asymptotic properties.

Further selection equations can be found in models of biological selection of autocatalytic macromolecules. See the work of EIGEN and SCHUSTER in [25], [26] and work based on these in [24], [45].

Concerning coupled selection equations whose set of asymptotically stable points is identical with the set of permutation matrices, a theorem similar to 8.1 will be stated below.

8.2 How the Coupled Selection Equations Work

The coupled selection equation approach to combinatorial optimization is based on the idea of a competition between several modes or variables with respect to the constraints. By regarding dynamical systems which allow not only one but several modes to survive, it is possible to get complex solutions by composing several single modes. This simultaneous competition between parts of the whole solution in the coupled selection equations leads to a significant reduction of the computational complexity. In contrast to this, a comparison of all feasible solutions is impossible for problem sizes $n > 15$ because of the exponential growth of the number of all feasible solutions.

In the following, the maximization problems (4), which are transformations of two-dimensional assignment problems, are treated, as well as the corresponding three-dimensional problems. The maximization of the total winnings, i.e., the minimization of the total costs, is done by a selection of the largest single winnings. The single modes or variables are initialized with the single winnings, i.e., the transformed costs. Therefore, the largest initial values have to win the competition. The total procedure is a heuristic method to maximize the total winnings, i.e., the sum of the single winnings.

This approach is similar to the intuitive approach of a human problem solver. Several alternatives, which are parts of the total solution, are compared during the process of decision making, i.e., the decisions for the alternatives compete against each other.

8.2.1 Problem Information as Initial Values

The detailed problem information can be found in the valuations $w$ and in the constraints. The valuations are given as initial values to the dynamical system and are not included in the equation of motion itself. The constraints are respected by specific coupling terms of the dynamical system. The solution to the combinatorial optimization problem emerges as the limit of large times $t \to \infty$. The dynamical systems considered here contain stable and unstable points only. There are no limit cycles or more complicated attractors. The positions of the asymptotically stable points guarantee the emergence of feasible solutions only. The other way round, every feasible solution corresponds to one and only one asymptotically stable point.

The dynamical process changes the single continuous winnings into BOOLEan values. This describes the selection of suitable, i.e., as large as possible, single winnings as part of the total solution. The BOOLEan value 1 marks the choice of the corresponding single decision with corresponding winnings, while the BOOLEan value 0 marks that this alternative was not chosen.

In the following sections, two different types of coupled selection equations are considered, each for the two- and three-dimensional assignment problem. These are the coupled selection equations of pattern formation and the coupled, piecewise continuous selection equations. For an adaptation of the coupled selection equations to other optimization problems, appropriate coupling terms have to be used which lead to asymptotically stable points which respect the constraints.

8.3 Coupled Selection Equations of Pattern Formation


By numbering the variables with two indices and using specific coupling terms,
the selection equation (52) can be adapted to two-dimensional assignment prob-
lems.

8.3.1 Two-Dimensional Assignment Problem

The given matrix $(w_{ij})$ of non-negative winnings of the dual two-dimensional assignment problem, i.e., the maximization problem (4), is used to initialize the coupled selection equations with variables $(\xi_{ij}) \in \mathbb{R}_+^{n \times n}$, as described before, that is
$$\xi_{ij}(t=0) := w_{ij}. \qquad (53)$$
For calculating $w_{ij}$ the transformation (5) is used, so that $w_{ij}$ lies in the interval $[0,1]$.

To summarize the procedure of the coupled selection equation approach to combinatorial optimization again: the final solution of the optimization problem emerges in the limit $\xi^*$ for $t \to \infty$ as a BOOLEan variable $x_{ij} \in \{0,1\}$ for each element,
$$x_{ij} := \xi_{ij}^* = \lim_{t \to \infty} \xi_{ij}(t), \qquad (54)$$
from the problem information $c_{ij} \in \mathbb{R}$, i.e., the given costs.

Extending the selection equations of pattern formation (52) with further coupling terms, the dynamical system
$$\dot{\xi}_{ij} = \xi_{ij} - \xi_{ij}^3 - \beta \cdot \xi_{ij} \sum_{i' \neq i} \xi_{i'j}^2 - \beta \cdot \xi_{ij} \sum_{j' \neq j} \xi_{ij'}^2 \qquad (55)$$
results (see [78], [79]). All variables $\xi_{ij}$ in the same row and the same column are coupled with each other.
Theorem 8.2 The set of asymptotically stable points of the dynamical system (55) is identical with the set of permutation matrices for non-negative initial values and $\beta > 1/2$.

Proof. The proof is based on the examination of the HESSE matrix of the corresponding potential function. See [79] for details. □

This theorem states that a feasible solution always emerges in the limit of large times. Figure 2 shows this behaviour in 4 steps of the evolution for the problem size $n^2 = 5^2$ and $\beta = 2$.

Figure 2: Four time steps of a simulation of the coupled selection equations solving a two-dimensional assignment problem. The dots are arranged as a matrix $(\xi_{ij})$. The size of the dots is proportional to the values $\xi_{ij}$. The time $t_0$ indicates the initial state. At time $t_3$ a stable point emerges which corresponds to a permutation matrix.
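A simulation like the one shown in figure 2 can be sketched in a few lines. The Euler step size and $\beta = 2$ are ad-hoc choices (with $\beta > 1/2$ as required by theorem 8.2), and the winnings are obtained from random costs via the transformation (5) with $a = b = 1$.

```python
# Explicit Euler integration of the coupled selection equations (55).
import numpy as np

def coupled_selection(w, beta=2.0, dt=1e-2, steps=5000):
    xi = w.copy()                               # xi_ij(0) = w_ij, cf. (53)
    for _ in range(steps):
        s = xi**2
        col = s.sum(axis=0, keepdims=True) - s  # sum_{i'!=i} xi_i'j^2
        row = s.sum(axis=1, keepdims=True) - s  # sum_{j'!=j} xi_ij'^2
        xi = xi + dt * (xi - xi**3 - beta*xi*col - beta*xi*row)
    return xi                                   # tends to a permutation matrix

c = np.random.default_rng(1).random((5, 5))     # random costs
print(np.round(coupled_selection(1.0 - c), 2))  # winnings w_ij = 1 - c_ij
```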

As described above, the specific problem definition is represented by the


initial values and not by the dynamical system itself. That is, in contrast to
penalty methods or the approach of HOPFIELD and TANK, for all transformed
linear assignment problems (4) the same dynamical system (55) is used.

8.3.2 Corresponding Potential Function

To get a visual impression of how the stability properties work, one may rewrite the equation of motion (55) as the gradient flow
$$\dot{\xi}_{ij} = -\frac{\partial \varphi}{\partial \xi_{ij}} \qquad (56)$$
with the corresponding potential function
$$\varphi = -\frac{1}{2} \sum_{i,j} \xi_{ij}^2 + \frac{1}{4} \sum_{i,j} \xi_{ij}^4 + \frac{\beta}{4} \sum_{i,j} \sum_{i' \neq i} \xi_{ij}^2 \cdot \xi_{i'j}^2 + \frac{\beta}{4} \sum_{i,j} \sum_{j' \neq j} \xi_{ij}^2 \cdot \xi_{ij'}^2. \qquad (57)$$

Figure 3 shows a plot of $\varphi$ for two variables $\xi_1$ and $\xi_2$.

Figure 3: Potential function $\varphi$ for two variables $\xi_1, \xi_2 \geq 0$ with $\beta = 4$.

The manner in which the coupled selection equations work can be illustrated for two variables $\xi_1$ and $\xi_2$ by a decomposition of the potential function $\varphi$ into two parts $\varphi_1$ and $\varphi_2$ with $\varphi = \varphi_1 + \varphi_2$, where

$$\varphi_1 = -\frac{1}{2} \sum_{i=1}^{2} \xi_i^2 + \frac{1}{4} \sum_{i=1}^{2} \xi_i^4 \qquad (58)$$
and
$$\varphi_2 = \frac{\beta}{2} \, \xi_1^2 \xi_2^2. \qquad (59)$$

The potential function $\varphi_1$ is shown in figure 4. As one can see in figure 5, the potential function $\varphi_2$ prevents both of the variables $\xi_1$ and $\xi_2$ from being non-vanishing in a stable state by putting a "hill" in the corresponding area. This function causes the coupling terms in the coupled selection equations. The superposition of $\varphi_1$ and $\varphi_2$ causes the minimum of $\varphi_1$ to be replaced by two minima in the total potential function $\varphi$. They are placed at the two saddle points of $\varphi_1$. See figure 3 for illustration.

Figure 4: Potential function $\varphi_1$ for two variables $\xi_1, \xi_2 \geq 0$.

Figure 5: Potential function $\varphi_2$ for two variables $\xi_1, \xi_2 \geq 0$ with $\beta = 4$.

8.3.3 Interpretation as Penalty Terms

There is an amazing correlation of the coupled selection equations of pattern formation (55) to a penalty method with specific penalty terms. By adding $\frac{n}{4}$ to the potential (57), it can be written as
$$\tilde{\varphi} = \varphi + \frac{n}{4} \qquad (60)$$

$$= p_1 \sum_j \left( \sum_i \xi_{ij}^2 - 1 \right)^2 + p_1 \sum_i \left( \sum_j \xi_{ij}^2 - 1 \right)^2 + p_2 \sum_{i,j} \sum_{i' \neq i} \xi_{ij}^2 \cdot \xi_{i'j}^2 + p_2 \sum_{i,j} \sum_{j' \neq j} \xi_{ij}^2 \cdot \xi_{ij'}^2 \qquad (61)$$
with $p_1 = \frac{1}{8}$ and $p_2 = \frac{\beta}{4} - p_1$. Therefore the potential (61) can be interpreted as penalty terms of a penalty method as in (15). By using the objective function (32) together with these penalty terms and a gradient descent method, one obtains the equation of motion (33) as described in section 4.2.
Due to the cost term in (33), the stability properties change compared to those of the dynamical system (55) reported in theorem 8.2. Because of the cost term in the equation of motion, the stability properties depend on the cost values. Also, the basins of attraction of the dynamical system (33) are deformed compared with those of the coupled selection equations (55).

A rough calculation can be done to demonstrate this fact. Assuming a stationary point has one and only one non-vanishing element in each row and each column, it follows from (33) that
$$\xi_{ij}^2 = 1 - \frac{c_{ij}}{4p_1} \qquad (62)$$
for these non-vanishing elements of the stationary points. This gives the deviation of the non-vanishing elements from 1 in dependence of $p_1$ and the costs $c_{ij}$.

Using the equation of motion (33), i.e., coupled selection equations of pattern formation with an additional cost term, and the calculated initial values (53), one obtains a hybrid method between the coupled selection equations and the penalty method. As mentioned above, the basins of attraction are deformed compared to those of the coupled selection equations of pattern formation. This implies a change in the solution behaviour.

8.3.4 Three-Dimensional Assignment Problem

To treat the three-dimensional assignment problem (6) - (9) or the corresponding dual problem, i.e., the maximization problem, a three-dimensional variable $(\xi_{ijk}) \in \mathbb{R}_+^{n \times n \times n}$ is necessary. Apart from that, the line of proceeding is identical with the introduced approach of coupled selection equations for the two-dimensional assignment problem. As in the two-dimensional case, the initial values are calculated by
$$\xi_{ijk}(t=0) := w_{ijk} \quad \forall i,j,k \qquad (63)$$
using the winnings $w_{ijk}$. The equation of motion is defined by
$$\dot{\xi}_{ijk} = \xi_{ijk} + (3\beta - 1) \, \xi_{ijk}^3 - \beta \cdot \xi_{ijk} \left( \sum_{i',j'} \xi_{i'j'k}^2 + \sum_{i',k'} \xi_{i'jk'}^2 + \sum_{j',k'} \xi_{ij'k'}^2 \right) \qquad (64)$$
such that the coupling terms ensure asymptotically stable points which respect the constraints of the three-dimensional assignment problem. These couplings lead to a competition between all elements in the same planes. The stability properties can be formulated as follows:

Theorem 8.3 The asymptotically stable points of (64) respect the constraints $\xi_{ijk} \in \{0,1\}$ $\forall i,j,k$ and (7) - (9). In addition, all points which respect these constraints are asymptotically stable points of (64).

A proof can be found in [79].

The decision variables $(x_{ijk})$ can be identified with the asymptotically stable points $(\xi_{ijk}^*)$ of the dynamical system (64) as it is done in the two-dimensional case. Coupled selection equations for higher-dimensional assignment problems can be defined similarly to the extension of the two-dimensional assignment problem to the three-dimensional one.

8.4 Coupled, Piecewise Continuous Selection Equations

Using coupled, piecewise continuous selection equations, more far-reaching analytical results can be obtained than with the smooth dynamical system (64) of third order.

A criterion is given which shows whether a solution obtained by the coupled selection equations is global optimal [80]. See also [79] for further details. Such a criterion is very useful, since it tells whether the solution can or cannot be improved by further trials with different optimization methods. Again, there are no spurious states in the coupled, piecewise continuous selection equations introduced here.

8.4.1 Two-Dimensional Assignment Problem

The dynamical system for handling the transformed two-dimensional assignment problem (4) is defined for all $i,j$ as
$$\dot{\xi}_{ij} = \begin{cases} 1 - \xi_{ij} - \beta \sum_{i' \neq i} \xi_{i'j} - \beta \sum_{j' \neq j} \xi_{ij'} & \text{for } \xi_{ij} > 0 \\ 0 & \text{for } \xi_{ij} = 0 \\ \chi & \text{for } \xi_{ij} < 0 \end{cases} \qquad (65)$$
with large $\chi > 0$.

Theorem 8.4 The set of asymptotically stable points of the coupled, piecewise continuous selection equations (65) is identical with the set of permutation matrices for non-negative initial values and $\beta > 1/2$.

The proof is based on examining the neighborhood of the stationary points. The piecewise continuous dynamical system does not allow standard stability analysis. See [80], [79] for details.

The initial values are chosen as in the case of the coupled selection equations of pattern formation by (53). The emergence of a permutation matrix by using the dynamical system (65) is shown in figure 6. Some minor differences to figure 2 can be observed in the time evolution. Due to the jump in the equation of motion (65) of the coupled, piecewise continuous selection equations, some elements decay to zero after a finite time. In the case of the coupled selection equations of pattern formation (55), the decay to zero is exponential.


Figure 6: Four time steps of a simulation of the piecewise continuous dynamical system solving a two-dimensional assignment problem. The dots are arranged as a matrix $(\xi_{ij})$. The size of the dots is proportional to the values $\xi_{ij}$. The time $t_0$ indicates the initial state. At time $t_3$ a stable point emerges which corresponds to a permutation matrix.

The system (65) can be adapted to three- or higher-dimensional assignment problems. Likewise, the following analysis of (65) can be extended to these problems, i.e., also to the NP-hard three-dimensional assignment problem.

8.4.2 Analysis of the Basins of Attraction

The ability of the piecewise continuous dynamical system to solve assignment problems is determined by the position of the stable points and by the basins of attraction, as in all other dynamical system approaches. Therefore the analysis of the basins of attraction is important.

Theorem 8.5 For the dynamical system (65) the following relation is true for two permutation matrices x^(r) and x^(s):

    \sum_{i,j} \dot{\xi}_{ij}\, x^{(r)}_{ij} - \sum_{i,j} \dot{\xi}_{ij}\, x^{(s)}_{ij} = (2\beta - 1)\Bigl(\sum_{i,j} \xi_{ij}\, x^{(r)}_{ij} - \sum_{i,j} \xi_{ij}\, x^{(s)}_{ij}\Bigr)    (66)

if β > 1/2 and ∀ i, j: (x^(r)_ij = 1 ∨ x^(s)_ij = 1) ⇒ ξ_ij ≠ 0. This means that the hyperplane

    \sum_{i,j} \xi_{ij}\, x^{(r)}_{ij} = \sum_{i,j} \xi_{ij}\, x^{(s)}_{ij}    (67)

is repelling and separates the two basins of attraction of the asymptotically stable points ξ* = x^(r) and ξ* = x^(s).
The following definition is used in the proof.
Definition. The index sets N², I², and R² are defined by

    N² := N × N,
    I²  := {(i, j) ∈ N × N : ξ_ij > 0},
    R² := {(i, j) ∈ N × N : ξ_ij = 0}.
Proof of theorem 8.5. Using the dynamical system (65) with β > 1/2 and the constraints of permutation matrices (2) and (3), it follows that

    \sum_{i,j} \dot{\xi}_{ij}\, x^{(s)}_{ij} = \sum_{(i,j) \in I^2} \Bigl(1 + (2\beta - 1)\xi_{ij} - \beta \sum_{i' \in N} \xi_{i'j} - \beta \sum_{j' \in N} \xi_{ij'}\Bigr) x^{(s)}_{ij}

    = n + (2\beta - 1) \sum_{(i,j) \in I^2} \xi_{ij}\, x^{(s)}_{ij} - \beta \sum_{(i,j) \in I^2} \sum_{i' \in N} \xi_{i'j}\, x^{(s)}_{ij} - \beta \sum_{(i,j) \in I^2} \sum_{j' \in N} \xi_{ij'}\, x^{(s)}_{ij}

    = n + (2\beta - 1) \sum_{(i,j) \in N^2} \xi_{ij}\, x^{(s)}_{ij} - \beta \sum_{i',j} \xi_{i'j} \sum_{i} x^{(s)}_{ij} - \beta \sum_{i,j'} \xi_{ij'} \sum_{j} x^{(s)}_{ij}

    = n + (2\beta - 1) \sum_{i,j} \xi_{ij}\, x^{(s)}_{ij} - 2\beta \sum_{i,j} \xi_{ij},

where the first term equals n because, by the assumption of the theorem, all n entries with x^(s)_ij = 1 lie in I², and where the row and column sums of the permutation matrix are equal to 1. The analogous computation for x^(r) and subtraction of the two expressions yields (66). □
By using this theorem, a criterion can be given to check whether a solution to the two-dimensional assignment problem obtained by the coupled, piecewise continuous selection equations is optimal or not.

8.4.3 Criterion for Global Optimal Solutions


The quality of the solutions is decisively determined by the shape of the basins of attraction. Suppose the selection described by the coupled, piecewise continuous selection equations sets variables permanently to zero one after another, i.e., variables are permanently eliminated as candidates for non-vanishing elements in the solution of the optimization problem. Then a recursive application of theorem 8.5 to the remaining non-vanishing elements guarantees an optimal solution. This fact is described in the following theorem.
Theorem 8.6 The solution, i.e., the reached asymptotically stable point of the dynamical system (65), is optimal in the sense of the two-dimensional assignment problem (4) if

    \xi_{ij}(t') = 0 \;\Rightarrow\; \xi_{ij}(t) = 0 \text{ for all } t > t'    (68)

is true for all ξ_ij, i.e., if every element that becomes zero stays at zero.

Proof. The proof follows directly from theorem 8.5 concerning basins of attraction and theorem 8.4 concerning asymptotically stable points of the coupled, piecewise continuous selection equations. □

The set of all linear, two-dimensional assignment problems for which an optimal solution can be guaranteed by using the coupled, piecewise continuous selection equations is given as

    L_opt = { ξ(t = 0) ∈ ℝ₊^{n×n} : ∀ t': ξ_ij(t') = 0 ⇒ ∀ t > t': ξ_ij(t) = 0 }.

In future work a criterion for the initial values ξ(t = 0) must be sought which allows one to determine L_opt without simulation of the coupled, piecewise continuous selection equations.
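
For illustration, membership in L_opt can be monitored along a stored numerical trajectory. The following Python sketch checks condition (68) a posteriori; the function name, the tolerance eps, and the trajectory representation are our assumptions, not part of the original formulation:

    import numpy as np

    def certified_optimal(trajectory, eps=1e-9):
        # trajectory: sequence of n x n arrays xi(t_0), xi(t_1), ...
        # Condition (68): once an element has reached zero, it must stay at zero.
        hit_zero = np.zeros_like(trajectory[0], dtype=bool)
        for xi in trajectory:
            if np.any(hit_zero & (np.abs(xi) > eps)):
                return False          # an eliminated element became non-zero again
            hit_zero |= (np.abs(xi) <= eps)
        return True                   # initial value lies in L_opt; solution optimal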

8.4.4 Three-Dimensional Assignment Problem


The extension of (65) to the three-dimensional assignment problem results in

    \dot{\xi}_{ijk} = \begin{cases} 1 + (3\beta - 1)\xi_{ijk} - \beta \Bigl(\sum_{i',j'} \xi_{i'j'k} + \sum_{i',k'} \xi_{i'jk'} + \sum_{j',k'} \xi_{ij'k'}\Bigr) & \text{for } \xi_{ijk} > 0 \\ 0 & \text{for } \xi_{ijk} = 0 \\ \chi & \text{for } \xi_{ijk} < 0 \end{cases}    (69)

with large χ > 0.

Theorem 8.7 The asymptotically stable points of (69) respect the constraints ξ*_ijk ∈ {0, 1} ∀ i, j, k and (7)–(9). In addition, all points which respect these constraints are asymptotically stable points of (69).

See [79] for a proof.


The decision variables (Xijk) correspond to the asymptotically stable points
({ijk) of the dynamical system (69).

8.4.5 Criterion for Global Optimal Solutions


The investigations of the two-dimensional assignment problem described above can be adapted to the three-dimensional assignment problem.

Theorem 8.8 For the dynamical system (69) the following relation is true for two feasible solutions x^(r) and x^(s) which respect the constraints (7)–(9):

    \sum_{i,j,k} \dot{\xi}_{ijk}\, x^{(r)}_{ijk} - \sum_{i,j,k} \dot{\xi}_{ijk}\, x^{(s)}_{ijk} = (3\beta - 1)\Bigl(\sum_{i,j,k} \xi_{ijk}\, x^{(r)}_{ijk} - \sum_{i,j,k} \xi_{ijk}\, x^{(s)}_{ijk}\Bigr)    (70)

if β > 1/3 and ∀ i, j, k: (x^(r)_ijk = 1 ∨ x^(s)_ijk = 1) ⇒ ξ_ijk ≠ 0. This means that the hyperplane

    \sum_{i,j,k} \xi_{ijk}\, x^{(r)}_{ijk} = \sum_{i,j,k} \xi_{ijk}\, x^{(s)}_{ijk}    (71)

is repelling and separates the two basins of attraction of the asymptotically stable points ξ* = x^(r) and ξ* = x^(s).

The proof is similar to the proof of theorem 8.5.


As in the case of the two-dimensional assignment problem, a criterion can be given to check whether a solution to the NP-hard three-dimensional assignment problem obtained by the coupled, piecewise continuous selection equations is optimal or not; i.e., the piecewise linear structure of (69) allows the following theorem.

Theorem 8.9 The dynamical system (69) solves the corresponding maximization problem of the linear assignment problem (6) optimally if the condition

    \xi_{ijk}(t') = 0 \;\Rightarrow\; \xi_{ijk}(t) = 0 \text{ for all } t > t'    (72)

holds for all i, j, k.



Proof. The proof is based on the examination of the basins of attraction described in theorem 8.8 and on the position of the asymptotically stable points of the coupled, piecewise continuous selection equations (69) described in theorem 8.7. □

By this theorem, one obtains a criterion for checking whether the solution of the NP-hard three-dimensional assignment problem (6) obtained by using the coupled, piecewise continuous selection equations is globally optimal.

9 Numerical Results
Numerical results are very important, in that they enable us to compare different methods for combinatorial optimization. Clearly, these results depend on the data sets, the kind of combinatorial optimization problem, and the parameters of the method. There is no way to get rid of this drawback. Furthermore, the presentation of numerical results raises problems with regard to scientific standards. To make the simulations reproducible, all data sets
used should be published as well. There are many cases in the history of sci-
ence where the disregard of this rule has caused disputes. As an example, see
the attempts in [89] to reproduce the results published in [49]. Because of the
large amount of data it is not possible to present them all in printed form.
Electronic publishing of the data sets is one way of getting around this prob-
lem, but often the further support of the provided material on the Internet is
not guaranteed for a long time. Therefore, a pseudo-random number generator
with integer arithmetic is used here in order to get reproducible data sets for
the simulations.

9.1 Generation of Data-Sets


In many algorithms running on a computer, random numbers are needed. However, the deterministic working principle of computers allows the production of only pseudo-random numbers. Algorithms producing these pseudo-random numbers can be found, for example, in [72]. For probabilities in the interval [0,1], real or floating point numbers have to be used in a computer. Because of rounding errors such computations are not exactly reproducible. In contrast, reproducibility can be achieved by using integer arithmetic. This is used in the following to obtain pseudo-random cost matrices (c_ij) and pseudo-random cost tensors (c_ijk).
The discrete time dynamic

    x_{n+1} = (a · x_n) mod m    (73)

with x_n, a, m ∈ ℕ is applied. To produce the data sets, the initial value x_0 = 111111 and the parameters a = 1233 and m = 260669 are used. The parameters

a and m are chosen so that their product a · m is smaller than the largest 4-byte signed integer (long int in the programming language C), i.e., a · m < 2^31 − 1. By using these parameters, a period of length 129822 is obtained. This means that after 129822 pseudo-random numbers the sequence repeats. The first pseudo-random number used for the data set is x_1. Starting with x_0, the time discrete dynamical system (73) produces the following sequence: 111111, 148638, 20347, 63627, 251391, 29662, 79586, 117994, 33300, 133867, 54534, 248489, 100862, 23733, 67861, ...
The elements of the cost matrix and cost tensor are chosen in the set {0, 1, ..., 99} of integer values. To obtain this set of integer values, the function

    \bar{x}_n = \Bigl\lfloor \frac{x_n}{m} \cdot 100 \Bigr\rfloor    (74)

is used, where the GAUSS bracket ⌊·⌋ cuts off the decimals. This yields the sequence 57, 7, 24, 96, 11, 30, 45, 12, 51, 20, 95, 38, 9, 26, 99, 89, 29, ... of pseudo-random numbers.
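
As an illustration, the generator (73) and the mapping (74) fit in a few lines of Python. The following is a sketch (the function and variable names are ours, not those of the original C implementation); since only integer arithmetic is used, it reproduces the sequences quoted above exactly:

    M = 260669   # modulus m
    A = 1233     # multiplier a

    def raw_sequence(x0=111111):
        # x_{n+1} = (a * x_n) mod m, eq. (73); yields x_1, x_2, ...
        x = x0
        while True:
            x = (A * x) % M
            yield x

    def cost_sequence(x0=111111):
        # Gauss-bracket mapping (74) into {0, ..., 99}, integer arithmetic only
        for x in raw_sequence(x0):
            yield (100 * x) // M

    # First raw values: 148638, 20347, 63627, ...;
    # first mapped values: 57, 7, 24, 96, ...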
For the purpose of using this pseudo-random sequence for the elements of the cost matrices c_ij and cost tensors c_ijk, the uniform distribution of these elements on every length scale of the data set is very important. To illustrate that the distributions produced by the pseudo-random numbers are uniform, the relative frequency r(x_n) is plotted in figure 7 for the first 100 pseudo-random numbers, in figure 8 for the first 1000, in figure 9 for the first 10000, and in figure 10 for the first 100000 pseudo-random numbers. In these plots the relative frequency r(x_n) is defined as the number of pseudo-random numbers falling into intervals of a given length which contain x, multiplied by the number of elements in the set {0, ..., 99}, i.e., 100, and divided by the product of the total number of pseudo-random numbers and the length of the intervals. The figures show a good uniform distribution on every length scale of the data set.
The elements of the cost matrix c^(l)_ij with i, j ∈ N = {1, ..., n} of data set l are defined by

    c^{(l)}_{ij} = \bar{x}_{(l-1)n^2 + (i-1)n + j},    (75)

using the produced sequence of pseudo-random numbers. Similarly, the elements of the cost tensor c^(l)_ijk with i, j, k ∈ N = {1, ..., n} of data set l are defined by

    c^{(l)}_{ijk} = \bar{x}_{(l-1)n^3 + (i-1)n^2 + (j-1)n + k}.    (76)

The entries of the first cost matrix or cost tensor of each data set series for a given problem size n start with the element x_1 of the pseudo-random sequence. The subsequent cost matrices or cost tensors use the elements of the pseudo-random number sequence consecutively.

[Figure 7: Distribution of the first 100 pseudo-random numbers produced by the described algorithm in the interval [0, 99]. The pseudo-random numbers falling into intervals of length 5 are counted.]

As an example, the first 10 × 10 cost matrix

    57  7 24 96 11 30 45 12 51 20
    95 38  9 26 99 89 29 68 46 67
    23 54 13 26 40 30 50 79 45 97
    53 42 28  5 89 98  1 61 72 97
    92 59 64 43 78 14 87 82  1 24    (77)
    67 90 59 16 75 71 73 86 90 81
    32 89 63 86 73 11 71 81 40  0
    67 66 79  5 67 72 35 64 36 16
    74 56 74 37 85 85 65  5 25 44
     5 40 81 95 51 91 52 37 87  9

is given.
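
Building on the sketch above, a complete data set is obtained by consuming the mapped sequence row by row as in (75); the helper below (our naming) reproduces the matrix (77) for n = 10 and l = 1:

    from itertools import islice

    def cost_matrix(n, l, x0=111111):
        # c_ij of data set l as in (75): consecutive, row-major filling
        values = list(islice(cost_sequence(x0), (l - 1) * n * n, l * n * n))
        return [values[i * n:(i + 1) * n] for i in range(n)]

    # cost_matrix(10, 1)[0] == [57, 7, 24, 96, 11, 30, 45, 12, 51, 20]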

9.2 Generation of Pseudo-Random Initial Values


For those methods which have to be initialized with random initial values, the same procedure as for the generation of the data sets is used to generate these pseudo-random numbers. The reason for this effort is again the reproducibility of the simulations. This time, the discrete dynamic (73) is used with the initial value x_0 = 72664 as well as the parameters a = 1233 and m = 260669. By using these parameters, a period of length 129822 is obtained. Again, the first pseudo-random number used for the data set is x_1. Starting with x_0, the time

[Figure 8: Distribution of the first 1000 pseudo-random numbers produced by the described algorithm in the interval [0, 99]. The pseudo-random numbers falling into intervals of length 1 are counted.]

discrete dynamical system (73) produces the following sequence: 72664, 185245, 61041, 190881, 232835, 88986, 238758, 93313, 99900, 140932, 163602, 224129, 41917, 71199, 203583, 254261, ...
The pseudo-random initial values have to lie in the interval [0, 1]. The accuracy of these values is chosen as 5 decimals. Therefore, the function

    \bar{x}_n = \Bigl\lfloor \frac{x_n}{m} \cdot 100000 \Bigr\rfloor \Big/ 100000    (78)

is used. This produces the sequence 0.71065, 0.23417, 0.73227, 0.89322, 0.34138, 0.91594, 0.35798, 0.38324, 0.54066, 0.62762, 0.85982, 0.16081, 0.27314, 0.78100, 0.97542, 0.68928, ... as pseudo-random initial values.
The elements of the initial n × n matrix and of the initial n × n × n tensor are defined analogously to (75) and (76), respectively. As in the case of cost matrices and cost tensors, the entries are chosen consecutively from the sequence of pseudo-random numbers.
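
In code, the mapping (78) is the same construction as in section 9.1 with a finer Gauss bracket; a minimal sketch, continuing the generator defined there:

    def initial_values(x0=72664):
        # eq. (78): values in [0, 1] with an accuracy of 5 decimals
        for x in raw_sequence(x0):
            yield ((100000 * x) // M) / 100000

    # First values: 0.71065, 0.23417, 0.73227, ...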

9.3 Numerical Solution of Dynamical Systems

The dynamical systems are solved numerically by using the explicit (forward) EULER method

    \xi(t_{k+1}) = \xi(t_k) + \sigma\,\dot{\xi}(t_k)

with discrete time steps t_k and step size σ. This simple numerical method is sufficient because the attractors of the dynamical systems considered are asymptotically stable points only; no limit cycles or even more complicated attractors occur.
[Figure 9: Distribution of the first 10000 pseudo-random numbers produced by the described algorithm in the interval [0, 99]. The pseudo-random numbers falling into intervals of length 1 are counted.]

To avoid stagnation at stationary but unstable points, some noise is added if the maximal component max_{i,j} |\dot{\xi}_{ij}| in the case of two-dimensional assignment problems, or max_{i,j,k} |\dot{\xi}_{ijk}| in the case of three-dimensional assignment problems, is smaller than 0.00001.
The integration was aborted once the constraints were fulfilled with the accuracy Δ given below, i.e., elements which fall into the interval [−Δ, Δ] are taken as 0 and elements which fall into the interval [1 − Δ, 1 + Δ] are taken as 1. If this condition was not fulfilled after a given number of steps, the computation was stopped as well.
For parameter values, see the detailed description in the next section.
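
To make the procedure concrete, the following Python sketch integrates the coupled, piecewise continuous selection equations (65) with the explicit Euler scheme, including the noise injection and the abort test just described. The noise amplitude, the value of χ, the random seed, and the default accuracy delta are our choices; β = 2 and σ = 0.001 follow section 9.4.6 below:

    import numpy as np

    def euler_integrate(xi0, beta=2.0, sigma=0.001, chi=1000.0,
                        delta=0.1, max_steps=200000, seed=0):
        rng = np.random.default_rng(seed)
        xi = xi0.astype(float).copy()
        for _ in range(max_steps):
            col = xi.sum(axis=0, keepdims=True)       # sum over i' of xi_{i'j}
            row = xi.sum(axis=1, keepdims=True)       # sum over j' of xi_{ij'}
            f = 1.0 - xi - beta * (col - xi) - beta * (row - xi)
            f = np.where(xi > 0, f, np.where(xi < 0, chi, 0.0))   # eq. (65)
            if np.abs(f).max() < 1e-5:                # stagnation: add some noise
                xi = xi + rng.uniform(0.0, 1e-4, xi.shape)
                continue
            xi = xi + sigma * f                       # explicit Euler step
            near1 = np.abs(xi - 1.0) <= delta
            done = np.all(near1 | (np.abs(xi) <= delta)) \
                and (near1.sum(axis=0) == 1).all() \
                and (near1.sum(axis=1) == 1).all()
            if done:
                return near1.astype(float)            # permutation matrix reached
        return xi

With initial values from (78) arranged as an n × n matrix, this routine should return an approximate permutation matrix for the transformed problem (4).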

9.4 Parameter Settings for the Numerical Simulations

In order to allow a verification of our numerical results, the parameter settings for the investigated methods are specified below. Nevertheless, even when using reproducible data sets, the obtained results may deviate slightly from those computed on other machines due to numerical rounding errors; these deviations are negligible in the overall comparison.

[Figure 10: Distribution of the first 100000 pseudo-random numbers produced by the described algorithm in the interval [0, 99]. The pseudo-random numbers falling into intervals of length 1 are counted.]

9.4.1 Dual Forest Algorithm


To evaluate the solutions of the methods used, the results are compared with global optimal solutions. In the case of the two-dimensional assignment problem the global optimal solutions are obtained by a dual forest algorithm [2], for which the C code of P. KLEINSCHMIDT, A. HEFNER and H. ACHATZ has been used. Another possibility is the well-known Hungarian method [19], [69].

9.4.2 Branch and Bound


In the case of the NP-hard three-dimensional assignment problem, a branch and bound method has been used. Because of bounding conditions, large parts of the search tree are pruned, which reduces the total number of computations compared with exhaustive search. For unfavourable data sets not enough bounding conditions can be found, which results in long computing times. See, e.g., [36].

9.4.3 Penalty Methods


For the numerical integration of the equations of motion (30) and (33), the explicit (forward) EULER scheme has been used.
For the classical penalty method (30), the step size σ = 0.00001 and the parameters p_1 = 10 and p_2 = 10 have been used, which seem to be appropriate for two-dimensional assignment problems of size 10 × 10. In order to get parameter settings which are independent of the range of the data sets, the given costs are normalized to the interval [0, 1]. The accuracy Δ = 0.2 was chosen to abort the integration.
The parameters p_1 = 1, p_2 = 0.1 and the step size σ = 0.001 have been used for the variant with squared variables (33) for the problem sizes 10 × 10, 20 × 20 and 30 × 30.

9.4.4 Simulated Annealing


For the method of simulated annealing, an adapted routine from [72] has been used. The original pseudo-random generator included in this routine has been kept for the simulations. In the two-dimensional case, the assignment, i.e., the permutation matrix, is represented by an ordered list of the integer numbers 1, ..., n. In the simulated annealing procedure this ordering is rearranged by reversing the order of a part of the list or by moving a part of the list between two arbitrary elements of the list. For the three-dimensional assignment problem two of these lists have been used.
The cooling function T(m) = 0.95^m · T(0), where m denotes a step in the cooling procedure, has been chosen with the initial temperature T(0) = 500. The temperature is decreased if either the maximum number of reconfigurations 1000 · n or the maximum number of successful reconfigurations 100 · n is reached. Further details can be found in [72].
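
The cooling schedule is easy to restate in code. In the skeleton below, the move generator propose and the cost function are assumed to be supplied (e.g. the list-reversal and list-transport moves of [72]); the Metropolis acceptance rule is the standard one, and all names are ours:

    import math, random

    def anneal(n, propose, cost, t0=500.0, factor=0.95, stages=100):
        state = list(range(1, n + 1))      # ordered list encoding the assignment
        temp = t0                          # T(0) = 500
        for _ in range(stages):            # T(m) = 0.95^m * T(0)
            successes = 0
            for _ in range(1000 * n):      # at most 1000*n reconfigurations ...
                cand = propose(state)
                d = cost(cand) - cost(state)
                if d < 0 or random.random() < math.exp(-d / temp):
                    state, successes = cand, successes + 1
                if successes >= 100 * n:   # ... or 100*n successful ones
                    break
            temp *= factor
        return state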

9.4.5 Hopfield and Tank Approach


Again, the costs of the data sets are normalized to the interval [0, 1] for the same reason as mentioned in section 9.4.3. For the equation of motion (44) of the approach of HOPFIELD and TANK adapted to the two-dimensional assignment problem, the parameter settings A = B = 10, C = 2, D = 20, u_0 = 0.02, τ = 1, and n = 15 are used. These parameters have been found by successive determination of A = B and C to obtain permutation matrices as output. After that, D is increased as long as possible so that a permutation matrix is maintained as output. Furthermore, the uniform initial value u_00 = u_0 · atanh(2/n − 1) has been used, which gives initial values where the sum over the elements of each row and column is equal to 1. These settings result in better solutions to the two-dimensional assignment problem than the settings given in [49]. For the integration of the dynamical system, the step size σ = 0.001 has been used.

9.4.6 Coupled Selection Equations


For the two-dimensional assignment problem, the initial values are calculated by using the transformation (5) with a and b chosen so that the winnings w_ij are confined to the interval [0, 1]. For both the coupled selection equations of pattern formation and the coupled, piecewise continuous selection equations, the coupling parameter β = 2 and the step size σ = 0.001 are chosen.
For the three-dimensional assignment problem, the initial values are determined similarly to those of the two-dimensional case. Again, the coupling parameter β = 2 has been chosen. For problems of size 10 × 10 × 10, the fixed step size σ = 0.002 has been used for both methods, the coupled selection equations of pattern formation and the coupled, piecewise continuous selection equations.
Due to gradients of large EUCLIDEAN norm in the case of the 20 × 20 × 20 problems using the coupled selection equations of pattern formation, the absolute values of the components of the gradient were confined to 0.01 while the step size σ = 0.01 was used. This was not necessary for the coupled, piecewise continuous selection equations because of the piecewise linear equation of motion instead of the terms of order three in the previous case. There, the fixed step size σ = 0.001 has been used.

9.4.7 Coupled Selection Equations with Cost Terms


In the case of the selection equations with cost terms, the cost values are used in the equation of motion as well as for the initial values. In the equation of motion they are used in the same manner as for the penalty methods. For the initial values the same normalization to the interval [0, 1] as for the above described selection equations has been applied.
For the numerical simulation of the evolution equation (33) the parameters p_1 = 1.0, p_2 = 0.1, and the step size σ = 0.001 have been used.

9.5 Comparison of Several Methods


The following tables show and compare the results of a dual forest algorithm, branch and bound, the classical penalty method, a variation of the penalty method, simulated annealing, the approach of HOPFIELD and TANK, coupled selection equations of pattern formation, coupled, piecewise continuous selection equations, and a hybrid method, that is, coupled selection equations with an additional cost term.
Several simulation results are presented in the following for the two- and three-dimensional assignment problem with different problem sizes and various optimization methods. The data sets used for the numerical analysis are those described in section 9.1. The corresponding pseudo-random initial values described in section 9.2 are used for the methods which need randomly chosen initial values. These are the classical penalty method (30), the penalty method (33) with modified terms, and the approach (44) of HOPFIELD and TANK. The presented results are the total costs c^(k) of each used data set k in tables 1 and 4, and the mean of the total costs c̄ of the m investigated data sets in tables 2, 3, and 5. The mean value is defined by

    \bar{c} = \frac{1}{m} \sum_{k=1}^{m} c^{(k)}.    (79)

Furthermore, the standard deviation

    s = \sqrt{\frac{1}{m} \sum_{k=1}^{m} \bigl(c^{(k)} - \bar{c}\bigr)^2}    (80)

is shown to give a measure for the spread of the obtained solutions compared to the spread of the global optimal solutions. The ratio of these standard deviations indicates whether the quality of the obtained solutions is the same for all data sets or whether it varies strongly. The mean and the standard deviation are rounded with an accuracy up to the first decimal.
We remark that we do not compare the computation times of the investigated optimization methods. There are several reasons for this. First, our main interest is not to find the fastest algorithm, provided the total amount of time is within a reasonable range. Instead, we focus on the investigation of each method's optimization principle. Second, the comparison of CPU times of computer codes is problematic: it depends strongly on the art of programming, on the compiler and the compiler options used, and on the hardware.

9.5.1 Two-Dimensional Assignment Problem


Table 1 shows numerical results for data sets 1 to 10 obtained by the dual forest algorithm, the classical penalty method, a penalty method using modified terms with squared variables, simulated annealing, the approach of HOPFIELD and TANK, coupled selection equations of pattern formation, coupled, piecewise continuous selection equations, and coupled selection equations of pattern formation with an additional cost term. A global optimal solution, which is obtained by the dual forest algorithm, gives a comparative value for the solutions obtained by the approximation methods. The symbol "-" in table 1 indicates that no feasible solution was found for this data set.
A summary of the results for data sets 1 to 100 in the case of the problem sizes n² = 10² and n² = 20², and for data sets 1 to 50 in the case n² = 30², can be found in tables 2 and 3. The mean of the solutions and the standard deviation are given for every method. Furthermore, the number of feasible solutions obtained by the used methods is indicated. In table 2 and table 3, the classical penalty method and the approach of HOPFIELD and TANK show obviously bad results. The quality of the solutions of the coupled selection equations of pattern formation and of the coupled, piecewise continuous selection equations is comparable with the quality of the solutions of simulated annealing. The hybrid method, i.e., the coupled selection equations of pattern formation with calculated initial values and with cost terms in the equation of motion, produces remarkably good results. Compared with the coupled selection equations of pattern formation this is due to the deformation of the basins of attraction caused by the cost term in the equation of motion. It is also much better than the penalty method with the same equation of motion but with

do
~ a s e
1 1n n eq
g pm mn e 1u
e e u e hm
f 0r n t e a Ye
n t 1 a ew c t
d~ i a h a 1 uo t i b t
ue t 1 0 t 1. r r i 0 r h
aSh t d e n ak on i 0
method 1t m Ys dg 1s n s dd
H 1-'
0 a
c v P c t
1 a f
f Po t. 0c
s r
0 i n
a i P r e t s
s q 1 e am c i f t
s ua t a e n 0
i ab 1T t t wu r t
c r 1 d a e i i 0 m. e
a e e n r 0 s u
variants of 1 ds &k nn e s &~
method
costs for 111 337 158 111 353 129 111 111
data set 1
costs for 95 195 179 95 263 95 95 95
data set 2
costs for 138 414 182 157 355 139 214 138
data set 3
costs for 154 - 216 167 278 155 171 154
data set 4
costs for 137 218 165 165 - 137 166 138
data set 5
costs for 154 281 214 158 272 166 162 154
data set 6
costs for 117 383 166 117 - 117 117 117
data set 7
costs for 96 396 151 100 383 96 96 96
data set 8
costs for 90 309 129 90 329 111 143 90
data set 9
costs for 104 326 214 104 340 153 104 104
data set 10
mean of 119.6 317.7 177.4 126.4 321.6 129.8 137.9 119.7
costs
standard 23.2 71.9 28.1 29.8 41.9 23.4 37.7 23.3
deviation

Table 1: Results of the simulations for 10 data sets of two-dimensional assign-


ment problems of size n 2 = 102 •

                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
    problem size n² = 10², number of data sets = 100
    feasible solutions      100      88     100     100      85     100     100     100
    mean of costs         129.5   300.5   189.2   138.0   307.8   136.7   144.7   129.7
    standard deviation     31.8    84.7    35.9    32.9    42.7    34.7    42.6    31.9

Table 2: Summary of the results of two-dimensional assignment problems for n² = 10². Columns as in table 1.

randomly chosen initial values because the calculated initial values start the
dynamics in a promising region.

9.5.2 Three-Dimensional Assignment Problem


For the NP-hard three-dimensional assignment problem only the most promising approaches of section 9.5.1 have been used. The solutions of the penalty methods and of the approach of HOPFIELD and TANK are either not feasible or result in very high total costs. The hybrid method, which led to excellent results in the case of the two-dimensional assignment problem, caused problems in obtaining feasible solutions for three-dimensional assignment problems: the results show a superposition of a few feasible solutions. Further investigations of this hybrid method and extensions to resolve this superposition might be promising.
Table 4 shows the results for the first 10 data sets for branch and bound, simulated annealing, coupled selection equations of pattern formation and coupled, piecewise continuous selection equations. A summary of the results of data sets 1 - 100 in the case of problem size n³ = 10³ and data sets 1 - 10 in the case of problem size n³ = 20³ is presented in table 5. The global optimal

                            (1)     (2)     (3)     (4)     (5)     (6)     (7)
    problem size n² = 20², number of data sets = 100
    feasible solutions      100     100     100      92     100     100     100
    mean of costs         141.0   240.3   210.9   574.6   175.8   185.8   142.0
    standard deviation     22.4    30.2    33.4    43.9    33.1    49.6    22.5
    problem size n² = 30², number of data sets = 50
    feasible solutions       50      50      50      44      50      50      50
    mean of costs         138.7   259.6   296.5   831.8   181.5   218.0   142.5
    standard deviation     21.3    25.2    40.3    58.2    35.6    56.5    20.3

Table 3: Summary of the results of two-dimensional assignment problems for n² = 20² and n² = 30². Columns as in table 1, but without the classical penalty method: (1) dual forest algorithm, (2) penalty method with squared variables, (3) simulated annealing, (4) approach of HOPFIELD and TANK, (5) coupled selection equations of pattern formation, (6) coupled, piecewise continuous selection equations, (7) hybrid method.

solution of the NP-hard three-dimensional assignment problem was obtained by branch and bound, which is still practicable for the considered problem sizes. This gives a comparative value for the costs of the other solutions. The results obtained by simulated annealing and by the coupled, piecewise continuous selection equations presented in table 5 show a bad scaling behaviour with respect to the problem size. The reason for the bad results of the coupled, piecewise continuous selection equations is the violation of condition (72), which unfortunately occurs in many numerical simulations. Further investigations under which conditions there exist elements which do not fulfill (72), and appropriate transformations to avoid these elements, could improve the results of the coupled, piecewise continuous selection equations [79]. In contrast, the coupled

                             (1)     (2)     (3)     (4)
    costs for data set 1      17      25     103      64
    costs for data set 2      18      88      21      58
    costs for data set 3      18      61      23     142
    costs for data set 4      14      52     109      80
    costs for data set 5      23     102     141      49
    costs for data set 6      21      67      52     195
    costs for data set 7      33      78      90      72
    costs for data set 8      17      66      97     168
    costs for data set 9      26      97     123      81
    costs for data set 10     24      52      74      79
    mean of costs           21.1    68.8    83.3    98.8
    standard deviation       5.3    22.1    38.4    48.0

Table 4: Results of the simulations for 10 data sets of three-dimensional assignment problems of size n³ = 10³. Columns: (1) branch and bound, (2) simulated annealing, (3) coupled selection equations of pattern formation, (4) coupled, piecewise continuous selection equations.

                             (1)     (2)     (3)     (4)
    problem size n³ = 10³, number of data sets = 100
    feasible solutions       100     100     100     100
    mean of costs           21.0    66.6    69.1    95.8
    standard deviation       5.1    15.8    30.8    45.6
    problem size n³ = 20³, number of data sets = 10
    feasible solutions        10      10      10      10
    mean of costs            5.4   164.6    61.1   162.1
    standard deviation       1.7    25.7    28.7    72.4

Table 5: Summary of the results of three-dimensional assignment problems for n³ = 10³ and n³ = 20³. Columns as in table 4.

selection equations of pattern formation show acceptable results.

10 Conclusions and Outlook


This article about dynamical system approaches to combinatorial optimization problems focuses on the two-dimensional and on the NP-hard three-dimensional assignment problem. These problems can be used as simple models for manufacturing planning. In addition, higher-dimensional assignment problems have an important practical relevance in flexible manufacturing planning and distributed robotic systems. The dynamical system approaches provide approximate solutions to the optimization problems.
The classical penalty method and the approach of HOPFIELD and TANK

cause problems in fulfilling the constraints, i.e., even for the two-dimensional assignment problem of size 10 × 10, feasible solutions are not obtained for all investigated data sets. Furthermore, these methods are very parameter sensitive. Therefore, they cannot be used in practical applications. An adaptation of the parameters to new optimization problems is very time consuming, and there is neither a straightforward way nor a guarantee to find parameter settings which work well. Because these methods use random initial values, one can get good and bad solutions for the same specific combinatorial optimization problem. In contrast to truly statistical approaches like simulated annealing, this randomness does not at all seem to be a necessary ingredient of the approach but is used only because of the lack of a formula to calculate initial values. Calculated initial values allow the dynamics to start in a promising, problem-dependent basin of attraction instead of a randomly chosen one.
Clearly, the results presented in section 9.5 show that, if there exists a polynomial time algorithm based on discrete operations, this is the best choice. The dynamical system approaches cannot compete with such algorithms, except in special applications where error resistance during the solution process is relevant. But if there is no polynomial time algorithm, the dynamical system approaches can be an alternative to other approaches such as statistical ones. Remarkably good results are obtained by the hybrid method, i.e., the coupled selection equations of pattern formation with calculated initial conditions and an additional cost term in the equation of motion. In the case of the NP-hard three-dimensional assignment problem the results of the coupled selection equations of pattern formation are better than the results obtained by simulated annealing. For the coupled, piecewise continuous selection equations a criterion for the global optimality of the obtained solutions is given. This leads to the conclusion that the coupled, piecewise continuous selection equations and the hybrid method are promising approaches and worth future investigation.
An implementation in hardware, i.e., an analog computer concept, could provide a big advantage concerning the computing time. One of the most promising hardware implementations is an optical computer. See [1], [73] for a recurrent laser system which is able to reconstruct a pattern out of a given set of patterns stored in a hologram. The nonlinearity there is based on a photo-refractive crystal which causes a selection process and favours the most similar stored pattern. In [67] an alternative optical system showing pattern formation is introduced, using liquid-crystal light valves as nonlinear elements. These might be promising concepts for the construction of an optical computer for coupled selection equations.
Furthermore, dynamical system approaches are suitable for adaptation to self-organizing structures like cellular and distributed robotic systems and flexible manufacturing systems (FMS), which are error resistant and flexible with respect to changing demands.

References
[1] Y. Abu-Mostafa and D. Psaltis. Optical neural computers. Scientific Amer-
ican, 256(3):66 - 73, 1987.

[2] H. Achatz, P. Kleinschmidt, and K. Paparrizos. A dual forest algorithm


for the assignment problem. DIMACS Series in Discrete Mathematics and
Theoretical Computer Science, 4:1 - 10, 1991. The Victor Klee Festschrift.

[3] S. Amari. Mathematical foundations of neurocomputing. In Proceedings of


the IEEE, volume 78, pages 1443 - 1463. IEEE, 1990.

[4] J. Anderson, A. Pellionisz, and E. Rosenfeld. Neurocomputing 2, Directions


for Research. The MIT Press, Cambridge, Massachusetts, 1990.

[5] J. Anderson and E. Rosenfeld. Neurocomputing, Foundations of Research.


The MIT Press, Cambridge, Massachusetts, 1988.

[6] B. Angeniol, G. De la Croix Vaubois, and J.-Y. Le Texier. Self-organizing


feature maps and the travelling salesman problem. Neural Networks, 1:289
- 293,1988.

[7] D. Anosov, I. Bronshtein, S. Aranson, and V. Grines. Smooth dynamical


systems. In Dynamical Systems I, volume 1 of Encyclopaedia of Mathemat-
ical Sciences, pages 149 - 233. Springer-Verlag, Heidelberg, Berlin, New
York, 1988.

[8] V. I. Arnol'd. Gewöhnliche Differentialgleichungen. Deutscher Verlag der Wissenschaften, Berlin, 1979, 1991.

[9] V. I. Arnol'd. Geometrische Methoden in der Theorie der gewöhnlichen Differentialgleichungen. Deutscher Verlag der Wissenschaften, Berlin, 1987.

[10] V. I. Arnol'd and Yu. S. ll'yashenko. Ordinary differential equations. In


D. Anosov and V. Arnol'd, editors, Dynamical Systems I, volume 1 of
Encyclopaedia of Mathematical Sciences, pages 1 - 148. Springer-Verlag,
Berlin, Heidelberg, New York, 1988.

[11] M. Avriel. Nonlinear Programming - Analysis and Methods. Prentice-Hall,


Englewood Cliffs, New Jersey, 1976.

[12] B. Baird. Bifurcation and category learning in network models of oscillating


cortex. Physica D, 42:365 - 384, 1990.

[13] W. Banzhaf. The molecular traveling salesman. Biological Cybernetics,


64:7 - 14, 1990.

[14] M. Bestehorn and H. Haken. Associative memory of a dynamical system:


the example of the convection instability. Zeitschrift fUr Physik B, 82:305
- 308,1991.

[15] K. Binder and A. Young. Spin glasses: Experimental facts, theoretical


concepts, and open questions. Reviews of Modern Physics, 58(4):801- 963,
1986.
[16] C. Guus E. Boender and H. Edwin Romeijn. Stochastic methods. In
Handbook of Global Optimization, pages 829 - 869. 1995.
[17] R. Brockett. Dynamical systems that sort lists, diagonalize matrices and
solve linear programming problems. In Proceedings of the 27th Conference
on Decision and Control, pages 799 - 803. IEEE, 1988.
[18] R. Brockett and W. Wong. A gradient flow for the assignment problem.
In G. Conte, A. Perdon, and B. Wyman, editors, New Trends in System
Theory, pages 170 - 177, Boston, Basel, Berlin, 1991. Birkhauser.
[19] R. Burkard. Methoden der Ganzzahligen Optimierung. Springer-Verlag,
Wien, New York, 1972.

[20] D. Cvijovic and J. Klinowski. Taboo search: An approach to the multiple


minima problem. Science, 267(3):664 - 666, 1995.

[21] L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New


York, 1991.

[22] R. Durbin and D. Willshaw. An analogue approach to the travelling sales-


man problem using an elastic net method. Nature, 326:689 - 691, 1987.

[23] W. Ebeling. Self-organization, valuation and optimization. In R. Mishra,


D. Maaß, and E. Zwierlein, editors, On Self-Organization, volume 61 of
Springer Series in Synergetics, pages 185 - 196. Springer-Verlag, Berlin,
Heidelberg, 1994.

[24] W. Ebeling, A. Engel, and R. Feistel. Physik der Evolutionsprozesse.


Akademie-Verlag, Berlin, 1990.

[25] M. Eigen and P. Schuster. The hypercycle - part a: Emergence of the


hypercycle. Die Naturwissenschaften, 64:541- 565, 1977.

[26] M. Eigen and P. Schuster. The hypercycle - part b: The abstract hypercycle. Die Naturwissenschaften, 65:7 - 41, 1978.

[27] H. Eiselt, G. Pederzoli, and C.-L. Sandblom. Continuous Optimization


Models - Operations Research. Walter de Gruyter, Berlin, New York, 1987.

[28] J. Fort. Solving a combinatorial problem via self-organizing process: An


application of the Kohonen algorithm to the traveling salesman problem.
Biological Cybernetics, 59:33 - 40, 1988.
[29] M. Garey and D. Johnson. Computers and Intractability. Freeman and Company, San Francisco, 1979.
[30] A. Gee, S. Aiyer, and R. Prager. An analytical framework for optimizing
neural networks. Neural Networks, 6:79 - 97, 1993.
[31] F. Glover. Tabu search - part i. ORSA Journal on Computing, 1:190 - 206,
1989.
[32] F. Glover. Tabu search - part ii. ORSA Journal on Computing, 2:4 - 32,
1989.
[33] F. Glover, M. Laguna, E. Taillard, and D. de Werra. Tabu search. Annals
of Operations Research, 41, 1993.
[34] D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley, Reading, Massachusetts, 1989.
[35] C. Großmann and J. Terno. Numerik der Optimierung. Teubner Studienbücher: Mathematik. Teubner, Stuttgart, 1993.
[36] M. Grötschel and L. Lovász. Combinatorial optimization. In Handbook of Combinatorics, chapter 28, pages 1541 - 1597. 1995.
[37] H. Haken. Pattern formation and pattern recognition - an attempt at a
synthesis. In H. Haken, editor, Pattern Formation by Dynamic Systems
and Pattern Recognition, volume 5 of Springer Series in Synergetics, pages
2 - 13. Springer-Verlag, Berlin, Heidelberg, 1979.
[38] H. Haken. Synergetic Computers and Cognition - A Top-Down Approach
to Neural Nets. Springer Series in Synergetics. Springer-Verlag, Heidelberg,
Berlin, New York, 1991.
[39] H. Haken. Principles of Brain Functioning - A Synergetic Approach to
Brain Activity, Behavior and Cognition. Springer Series in Synergetics.
Springer-Verlag, Berlin, Heidelberg, New York, 1996.
[40] H. Haken. Decision making and optimization in regional planning. unpub-
lished, 1997.
[41] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural
Computation. Addison-Wesley Publishing Company, Redwood City, 1991.
[42] M. Hestenes. Optimization Theory. John Wiley & Sons, New York, London,
1975.

[43] M. Hirsch and B. Baird. Computing with dynamic attractors in neural


networks. BioSystems, 34:173 - 195, 1995.

[44J M. Hirsch and S. Smale. Differential Equations, Dynamical Systems, and


Linear Algebra. Academic Press, New York, 1974.

[45] J. Hofbauer and K. Sigmund. The Theory of Evolution and Dynamical


Systems. Number 7 in London Mathematical Society Student Texts. Cam-
bridge University Press, 1988.
[46] J. Holland. Adaption in Natural and Artificial Systems. University of
Michigan Press, Ann Arbor, 1975.
[47] J. Hopfield. Neural networks and physical systems with emergent collective
computational abilities. In Proceedings of the National Academy of Sciences
[5J, pages 2554 - 2558.
[48] J. Hopfield. Neurons with graded response have collective computational
properties like those of two-state neurons. In Proceedings of the National
Academy of Sciences [5J, pages 3088 - 3092.

[49] J. Hopfield and D. Tank. Neural computation of decisions in optimization problems. Biological Cybernetics, 52:141 - 152, 1985.

[50] J. Hopfield and D. Tank. Computing with neural circuits: A model. Science, 233:625 - 633, 1986.

[51] R. Horst. Nichtlineare Optimierung. Carl Hanser Verlag, München, Wien, 1979.

[52] Behzad Kamgar-Parsi and Behrooz Kamgar-Parsi. On problem solving with Hopfield neural networks. Biological Cybernetics, 62:415 - 423, 1990.

[53] W. Kinzel. Spin glasses and memory. Physica Scripta, 35:398 - 401, 1987.

[54] S. Kirkpatrick. Optimization by simulated annealing: Quantitative studies. Journal of Statistical Physics, 34(5/6), 1984.

[55] S. Kirkpatrick, C. Gelatt, and M. Vecchi. Optimization by simulated annealing. Science, 220(4598), 1983.

[56] S. Kirkpatrick and G. Toulouse. Configuration space analysis of travelling salesman problems. Journal de Physique, 46, 1985.

[57] T. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, Berlin, Heidelberg, New York, 1984.

[58] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, Heidelberg, New York, 1995.

[59] N. Lidstrom, P. Pardalos, L. Pitsoulis, and G. Toraldo. An approximation


algorithm for the three-index assignment problem. unpublished, 1996.

[60] D. Luenberger. Introduction to Linear and Nonlinear Programming.


Addison-Wesley Publishing Company, New York, London, 1973.

[61] S. Matsuda. Stability of solutions in Hopfield neural network. Systems and


Computers in Japan, 26(5):67 - 78, 1995. Translated from Vol. J77-D-II,
No.7, July 1994, pp. 1366 - 1374.

[62] S. Matsuda. Theoretical considerations on the capabilities of crossbar


switching by Hopfield networks. In Proceedings of the 1995 IEEE Inter-
national Conference on Neural Networks, pages 1107 - 1110. IEEE, 1995.
[63] N. Metropolis, M. Rosenbluth, A. Teller, and E. Teller. Equation of state
calculations by fast computing machines. The Journal of Chemical Physics,
21(6):1087 - 1092, 1953.
[64] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Pro-
grams. Springer-Verlag, Berlin, Heidelberg, New York, 1992.
[65] B. Müller and J. Reinhardt. Neural Networks - An Introduction. Springer-
Verlag, Berlin, Heidelberg, New York, 1991.

[66] Y. Nesterov. Interior-point methods: An old and new approach to nonlinear


programming. Mathematical Programming, 79:285 - 297, 1997.

[67] R. Neubecker, G.-L. Oppo, B. Thuering, and T. Tschudi. Pattern forma-


tion in a liquid-crystal light valve with feedback, including polarization,
saturation, and internal threshold effects. Physical Review A, 52(1):791 -
808, 1995.
[68] K. Pal. Genetic algorithms for the traveling salesman problem based on a
heuristic crossover operation. Biological Cybernetics, 69:539 - 546, 1993.
[69] C. Papadimitriou and K. Steiglitz. Combinatorial Optimization - Algo-
rithms and Complexity. Prentice-Hall, Englewood Cliffs, New Jersey, 1982.
[70] P. Peretto. Neural networks and combinatorial optimization. In Proceedings
of the International Conference "Les Entretiens de Lyon", pages 127 - 134,
Paris, 1990. Springer-Verlag.

[71] C. Peterson and B. Soderberg. Neural optimization. In M. Arbib, editor,


Brain Theory and Neural Networks, pages 617 - 621. MIT Press, Cam-
bridge, London, 1995.
[72] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes
in C. Cambridge University Press, Cambridge, New York, 1992.

[73] D. Psaltis, D. Brady, X.-G. Gu, and S. Lin. Holography in artificial neural
networks. Nature, 343:325 - 330, 1990.

[74] I. Rechenberg. Evolutionsstrategie. Friedrich Frommann Verlag, Stuttgart


Bad Cannstatt, 1973.
[75] C. Robinson. Dynamical Systems - Stability, Symbolic Dynamics, and
Chaos. CRC Press, Boca Raton, Ann Arbor, London, 1995.
[76] H.-P. Schwefel. Numerische Optimierung von Computer-Modellen mittels
der Evolutionsstrategie. Birkhauser-Verlag, Basel, Stuttgart, 1977.
[77] P. Spellucci. Numerische Verfahren der nichtlinearen Optimierung.
Birkhauser Verlag, Basel, Boston, Berlin, 1993.
[78] J. Starke. Cost oriented competing processes - a new handling of assign-
ment problems. In J. Dolezal and J. Fidler, editors, System Modelling and
Optimization, pages 551- 558. Chapman & Hall, London Glasgow, 1996.
[79] J. Starke. Kombinatorische Optimierung auf der Basis gekoppelter Se-
lektionsgleichungen. PhD thesis, Universität Stuttgart, Verlag Shaker,
Aachen, 1997.
[80] J. Starke and M. Hirsch. Solving assignment problems with a piecewise
continuous dynamical system. unpublished, 1997.
[81] J. Starke, M. Schanz, and H. Haken. Self-organized behaviour of dis-
tributed autonomous mobile robotic systems by pattern formation prin-
ciples. In Proceedings of Distributed Autonomous Robotic Systems (DARS
'98). Springer Verlag, Heidelberg, Berlin, New York, 1998. to appear.
[82] D. Tank and J. Hopfield. Simple neural optimization networks: An A/D converter, signal decision circuit and a linear programming circuit. IEEE
Transactions on Circuits and Systems, CAS-33(5):533 - 541, 1986.
[83] Y. Uesaka. Mathematical aspects of neuro-dynamics for combinatorial op-
timization. IEICE Transactions, E 74(6):1368 - 1372, 1991.
[84] K. Urahama. Analog circuit for solving assignment problems. IEEE Trans-
actions on Circuits and Systems, 41(5):426 - 429, 1994.
[85] D. Van den Bout and T. Miller. A traveling salesman objective function
that works. In Proceedings of the IEEE International Conference on Neural
Networks 1988, volume II, pages II-299 - II-303. IEEE, 1988.
[86] D. Van den Bout and T. Miller III. Improving the performance of the
Hopfield-Tank neural network through normalization and annealing. Bio-
logical Cybernetics, 62:129 - 139, 1989.

[87] P. van Laarhoven and E. Aarts. Simulated Annealing: Theory and Applica-
tions. Reidel Publishing Company, Dordrecht, Boston, Lancaster, Tokyo,
1987.
[88] S. Wiggins. Introduction to Applied Nonlinear Dynamical Systems and
Chaos. Springer-Verlag, Berlin, Heidelberg, New York, 1990.
[89] G. Wilson and G. Pawley. On the stability of the travelling salesman problem algorithm of Hopfield and Tank. Biological Cybernetics, 58:63 - 70,
1988.
[90] W. Wong. Matrix representation and gradient flows for np-hard problems.
Journal of Optimization Theory and Applications, 87(1):197 - 220, 1995.
[91] A. Yuille. Constrained optimization and the elastic net. In M. Arbib,
editor, Brain Theory and Neural Networks, pages 250 - 255. MIT Press,
Cambridge, London, 1995.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 525-542
©1998 Kluwer Academic Publishers

On-line Dominating Set Problems for Graphs


Wen-Guey Tzeng¹
Department of Computer and Information Science
National Chiao Tung University, Hsinchu 30050, Taiwan
E-mail: tzeng@cis.nctu.edu.tw

Contents

1 Introduction 525

2 On-line dominating set problems for general graphs 526
  2.1 On-line algorithm Jump 527
  2.2 Performance ratio 527
  2.3 Lower bound 529

3 On-line dominating set problems for permutation graphs 529
  3.1 On-line algorithm Jump2 530
  3.2 Performance ratio 533
  3.3 Lower bound 538

4 Conclusion 540

References

1 Introduction
A dominating set of a graph G = (V, E) is a subset V' of V such that for each vertex u ∈ V − V' there is a vertex v ∈ V' so that (u, v) ∈ E. The
¹Research supported in part by grant NSC-86-2213-E-009-024, National Science Council, Taiwan.

minimum dominating set problem is to find a dominating set V' of minimum cardinality, which is denoted by φ(G). It is well known that the minimum dominating set problem is NP-complete [9]. In this paper we consider on-line dominating set problems for general and permutation (simple) graphs.
The on-line setting deals with the situation that the input data are given one piece at a time, so that an algorithm has to process the input and make its decisions with only partial information, and cannot revoke a decision [5, 14, 16, 18]. The performance ratio of an on-line algorithm is the ratio of the solution found by the algorithm over the optimal solution in the worst case. In the case of dominating sets, the performance ratio of an on-line algorithm A is

    γ(n) = max{ |A(G)| / |φ(G)| : G = (V, E), |V| = n }.
We consider various on-line settings for general and permutation graphs. Our main results for on-line dominating set problems are as follows:

1. For general graphs under some on-line setting, we present an on-line algorithm with performance ratio 1.5√n + c_1 and show that the lower bound on the performance ratio is √n − c_2 for some constants c_1 and c_2.

2. For permutation graphs under the permutation on-line setting, we present an on-line algorithm with performance ratio √(n/2) + c_3 and show that the lower bound on the performance ratio is √(n/2) − c_4 for some constants c_3 and c_4.

2 On-line dominating set problems for general graphs


We consider two natural settings for on-line dominating set problems for general graphs. Let A be an on-line algorithm for computing dominating sets. The first on-line setting is that at time i, A is given the vertices among 1 to i − 1 that are adjacent to vertex i; that is, the adjacency to future input vertices is not known at that time. The second on-line setting is that at the beginning A is given all vertices (without edges) of the input graph and at time i A is given all adjacent vertices of vertex i.
Since the information given to A is too weak in the first setting, we can easily show that the lower bound on the performance ratio is n − 1; that is, no on-line algorithm can do better than random guessing. However, for the second setting we present an on-line algorithm with performance ratio 1.5√n + c_1 and show that √n − c_2 is a lower bound for some constants c_1 and c_2.

2.1 On-line algorithm Jump


Let the input graph be G = (V, E) with V = {1, 2, ..., n}. Let D be the dominating set found by Jump, where D is empty initially. Let D_i be D at time i (after deciding whether to select vertex i). Therefore, D_0 = ∅, D_i ⊆ D_{i+1}, and D_n is the final output of Jump. For D_j, vertex u ∈ V is dominated if u ∈ D_j or there is a vertex v ∈ D_j such that (u, v) ∈ E. Those vertices that are not dominated are free. We call the vertices k, k ≤ j, visited.
Let the input at time i be i_1, i_2, ..., i_{d_i}, where vertex i_j, 1 ≤ j ≤ d_i, is adjacent to vertex i, d_i is the degree of vertex i, and i_1 < i_2 < ... < i_{d_i}. Jump puts vertex i into D by the following four rules. Jump checks the rules one by one and uses the first-fit rule, so there is no ambiguity. For further analysis, we classify the vertices in D into j-vertices (jump vertices), r-vertices, and f-vertices (forced vertices), and we define the function g.

1. Let m be the number of free vertices (for D_{i−1}) among i_1, i_2, ..., i_{d_i}. If m ≥ ⌈√n⌉, Jump puts i into D as a j-vertex.

2. If vertex i dominates all free vertices (for D_{i−1}), Jump puts vertex i into D as an r-vertex.

3. If for all s, 1 ≤ s ≤ d_i, i_s < i and i_s ∉ D_{i−1}, Jump puts vertex i into D as an f-vertex. In this case vertex i is forced into D for dominating itself and we define g(i) = i.

4. For each vertex j < i such that i = j_{d_j} and vertices j, j_1, j_2, ..., j_{d_j−1} are not in D_{i−1}, Jump puts vertex i into D as an f-vertex, since vertex i is the last one that can dominate vertex j. In this case we say that vertex j forces vertex i into D and define g(j) = i.

Correctness. For each vertex i, 1 ≤ i ≤ n, if Jump puts it into D, it dominates itself. Otherwise, there is some adjacent vertex j > i that Jump can select to dominate it by rule 4. Therefore, the final D = D_n is a dominating set for G.
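
The four rules translate directly into code. The following Python sketch reflects our reading of the second on-line setting (the full neighbor list of vertex i becomes available at time i); all names are ours and efficiency is not a concern here:

    import math

    def jump(n, neighbors):
        # neighbors[i]: increasingly sorted list of all vertices adjacent to i
        D, dominated = set(), set()
        threshold = math.ceil(math.sqrt(n))

        def select(i):
            D.add(i)
            dominated.add(i)
            dominated.update(neighbors[i])

        for i in range(1, n + 1):
            nb = set(neighbors[i])
            free_nb = [v for v in nb if v not in dominated]
            if len(free_nb) >= threshold:                        # rule 1: j-vertex
                select(i); continue
            if all(v in dominated or v == i or v in nb
                   for v in range(1, n + 1)):                    # rule 2: r-vertex
                select(i); continue
            if all(v < i and v not in D for v in neighbors[i]):  # rule 3: f-vertex
                select(i); continue
            for j in range(1, i):                                # rule 4: f-vertex
                last = neighbors[j][-1] if neighbors[j] else None
                if last == i and j not in D \
                   and all(u not in D for u in neighbors[j][:-1]):
                    select(i)                                    # i is forced by j
                    break
        return D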

2.2 Performance ratio


Let C be a minimum dominating set of G. We can easily observe the following.

1. There are at most ⌊√n⌋ j-vertices in D.

2. There is at most one r-vertex in D.

3. If |C| = 1, there are no f-vertices in D.

We define opt, which maps a vertex in V to one of its dominating vertices in C, as follows:

    opt(i) = i if i ∈ C, and opt(i) = min{ j | j ∈ C, (i, j) ∈ E } if i ∉ C.

Let w be the weight function assigning weights to the vertices of C as follows.

1. For u ∈ C ∩ D, w(u) is increased by 1.

2. For each f-vertex u ∈ D − C and each v with g(v) = u, w(opt(v)) is increased by 1/m, where m = |{ v | g(v) = u }|.

3. For each j- or r-vertex u ∈ D − C and each c ∈ C, w(c) is increased by 1/|C|.

Lemma 2.1 For c ∈ C ∩ D, w(c) is not increased on account of other f-vertices.

Proof. Suppose an f-vertex u ∈ D with u ≠ c charges to w(c); then u must be in D − C, and only rule 4 matters: some v' with opt(v') = c and g(v') = u forces u into D. However, if v' forces u into D, all of the adjacent vertices of v' have been visited before and are not in D_u. This implies that the adjacent vertex c of v' is not in D, which is a contradiction. □

Lemma 2.2 For v ∈ C − D, w(v) is increased by at most ⌈√n⌉ − 1 on account of f-vertices.

Proof. Since v ∉ D, let I_v = { u | (u, v) ∈ E, u is free for D_{v−1} }. If w(v) is increased for an f-vertex u with g(v') = u and opt(v') = v, then v' is in I_v and u is in D − C. Since v is not put into D at time v, it has at most ⌈√n⌉ − 1 free adjacent vertices for D_{v−1}; that is, |I_v| ≤ ⌈√n⌉ − 1. Therefore, w(v) is increased by at most ⌈√n⌉ − 1 for f-vertices. □

Theorem 2.3 Jump's performance ratio is at most 1.5√n + c_1.

Proof. For |C| = 1, since D contains no f-vertices, |D| ≤ ⌊√n⌋ + 1. Therefore, for each c ∈ C, w(c) is at most ⌊√n⌋ + 1.
For |C| ≥ 2, since D contains at most ⌊√n⌋ j-vertices and one r-vertex, for each c ∈ C, w(c) is increased by at most (⌊√n⌋ + 1)/2 for j- and r-vertices. By Lemmas 2.1 and 2.2, w(c) is increased by at most ⌈√n⌉ − 1 for f-vertices. Therefore, w(c) is at most

    ⌈√n⌉ − 1 + (⌊√n⌋ + 1)/2 ≤ 1.5√n + c_1

for some constant c_1. Jump's performance ratio is at most max{ w(c) | c ∈ C } ≤ 1.5√n + c_1. □

2.3 Lower bound


We provide a strong strong adversary B to show that yn - C2 is a lower
bound for some constant C2.

Theorem 2.4 Performance ratio of anyon-line algorithm for computing


dominating sets is at best yn - C2.
Proof. Let k = vn=I and assume that k is an integer for simplicity. Let
S~ = (Vi', E~), where Vi' = {i, k+1, i{k-1)+3, i{k-1)+4, ... , i{k-1)+k+ 1}
and E~ = {( i, v) I v E Vi' - {i}} for 1 ~ i ~ k.
From time 1 to k, adjacent vertices of vertices 1,2, ... , k are given. As-
sume that al,a2, ... ,ap are in Dk and bl ,b2, ... ,bq are not in Dk, where
p+q = k. At time k+1, B gives {1,2, ... ,k}UV~l UV~2U ... UV~p -{k+1}
as the adjacent vertex set of vertex k + 1. At time k + 1, all vertices in
(V;l UV;2 U ... UV;q) - {k + 1, bl, b2, ... , bq } are forced into D k +1' Therefore,
D = {al,a2, ... ,ap,k + 1} U {ulu E (V;l U V;2 U ... U V;q) - {b l ,b2, ... ,bq }}.
Since p. k +q. k +1 = n and the minimum dominating set is {b l , b2, ... , bq , k +
I}, the lower bound for performance ratio is
q. (k - 1) + p + 1 k· (I + q) - 2q + 1
--'----'----- = --'----'---"---
l+q l+q
2q -1
=In-l-
1-+- >
q -
'n-C2.
yH

3 On-line dominating set problems for permutation graphs

Let π be a permutation over {1, 2, ..., n}. G[π] is the permutation graph on vertices 1 to n such that vertex i and vertex j are adjacent if and only if (i − j)(π^{-1}(i) − π^{-1}(j)) < 0, where π^{-1}(k) is the position of k in π. A permutation graph is usually represented geometrically by a permutation diagram. For example, Figure 1 shows the permutation diagram for π = (5, 2, 1, 6, 7, 3, 4), with π(1) = 5, π(2) = 2, etc., and its corresponding permutation graph, in which vertices i and j are adjacent if and only if the corresponding lines of i and j in the permutation diagram intersect.

Figure 1: A permutation diagram and its permutation graph.
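As a quick illustration, the adjacency test can be coded directly from the definition (a small sketch; pi is given as a list with pi[k-1] = π(k)):

```python
def adjacent(pi, i, j):
    # Vertices i and j of G[pi] are adjacent iff i - j and
    # pi^{-1}(i) - pi^{-1}(j) have opposite signs.
    pos = {v: k for k, v in enumerate(pi, start=1)}  # pos[v] = pi^{-1}(v)
    return (i - j) * (pos[i] - pos[j]) < 0

pi = [5, 2, 1, 6, 7, 3, 4]       # the permutation of Figure 1
assert adjacent(pi, 1, 5)        # the lines of 1 and 5 intersect
assert not adjacent(pi, 1, 6)    # the lines of 1 and 6 do not
```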
Finding dominating sets for permutation graphs (in the off-line setting) has been studied in [1, 2, 4, 6, 7, 8, 10, 11, 15, 17, 19]. For example, Rhee et al. [17] give an O(n + e)-time algorithm for the weighted dominating set problem. In this section we study the on-line dominating set problem for permutation graphs.
For the two on-line settings in Section 2, the pre-given knowledge that the underlying input is a permutation graph seems to be of no help in improving the performance bounds. Instead, we consider another natural on-line setting for permutation graphs. In this setting, at time i, π(i) is given, which means that at time i the adjacent vertices of vertex π(i) are given. We say that the setting is natural because π is a succinct representation of G[π]. With this setting, we are able to tighten the upper and lower bounds to √(n/2) up to a constant offset. Furthermore, our on-line algorithm runs in O(n) time overall.

3.1 On-line algorithm Jump2


For G[1I"J, a vertex v is forward if 11"-1 (v) < v and backward if 11"-1 (v) ~ v.
We say that set A of vertices dominates set B of vertices if each vertex in
B is either in A or adjacent to some vertex in A.
Our on-line algorithm Jump2 is in Table 1, which computes a dominating
set D. At time i the current input 1I"(i) shows that vertex 1I"(i) are adjacent to
vertices in {1I"(j) 11I"(j) > 11"( i), 1 ~ j ~ i-I} U {1I"(j) 11I"(j) < 11"( i), i + 1 ~ j ~
On-line Dominating Set Problems for Graphs 531

Table 1: Algorithm Jump2.

Input: π(i);
1.  If π(i) < last then mark vertex π(i) as dominated
2.  else mark vertex π(i) as visited;
3.  Case π(i) ≥ next:
4.      D ← D ∪ {π(i)}; last ← π(i);
5.      Mark vertex π(i) as dominated;
6.      If last > n − ⌊√(n/2)⌋ then next ← n + 1
7.      else next ← min{π(i) + ⌈√(2n)⌉, n − ⌊√(n/2)⌋ + 1};
8.  Case π(i) < next ∧ i < π(i): do nothing;
9.  Case π(i) < next ∧ i ≥ π(i):
10.     If there is a visited, but not dominated, vertex u < next
11.         and all vertices k, 1 ≤ k ≤ u − 1, are visited or dominated
12.     then D ← D ∪ {π(i)};
13.         mark all vertices π(m), 1 ≤ m ≤ i, as dominated;

The key idea is that if the current vertex can potentially (but not necessarily) dominate a sufficient number of vertices, it is put into D. The threshold for this number is ⌊√(n/2)⌋ at the beginning and at the end, and ⌈√(2n)⌉ for the rest. On the other hand, we should also consider that the current vertex π(i) may be forced into D under the following two conditions.

1. All its adjacent vertices have been visited, but not put into D.

2. There is some vertex π(j) > π(i), j < i, whose adjacent vertices except π(i) have been visited, but not put into D.

Variables last and next indicate whether the current vertex π(i) can potentially dominate a sufficient number of vertices later. A vertex π(j), j < i, is called visited if it is not yet dominated by D. A vertex is called dominated if it has been visited and is dominated by D (possibly by itself). A vertex is unvisited if it has not been visited yet, though it may already have been dominated by D. Initially, last is set to 0, next is set to ⌊√(n/2)⌋, and all vertices are marked as unvisited.

When the current vertex π(i) is smaller than last, it is marked as dominated since it has been dominated. Otherwise, it is marked as visited since it is currently visited. We then consider the following three cases.

π(i) ≥ next. Vertex π(i) is put into D as a j-vertex and marked as dominated, since it dominates itself. In this case π(i) is either forward or π(i) = i. last is updated to π(i). If last > n − ⌊√(n/2)⌋, Jump2 will not put any more j-vertices into D and thus sets next to n + 1. If last ≤ n − ⌊√(n/2)⌋, Jump2 updates next to min{π(i) + ⌈√(2n)⌉, n − ⌊√(n/2)⌋ + 1} for selecting the next j-vertex. Note that vertices that are smaller than π(i) but not visited yet are dominated by π(i); however, they are not marked as dominated until they are read in.

π(i) < next ∧ i < π(i). Vertex π(i) is not put into D. If π(i) < last, it is dominated by the j-vertex last and marked as dominated. If π(i) > last, some unvisited vertex j, 1 ≤ j < π(i), will be selected to dominate it in the next case.

π(i) < next ∧ i ≥ π(i). Vertex π(i) is a backward vertex. If some vertex u < next has been visited but not dominated, and vertices 1 to u − 1 have all been visited, Jump2 puts π(i) into D as an f-vertex, since it is the last vertex that can dominate u. Furthermore, since the vertices π(m), 1 ≤ m ≤ i, are now dominated by D, they are marked as dominated. In this case π(i) may be forced to dominate itself. If π(i) is not dominated by D, it shall be dominated by an f-vertex v later, where v < π(i) and π^{-1}(v) > i.

Correctness. We consider the vertices that are not put into D in the second and third cases. In the second case, visited but not dominated vertices shall be dominated by vertices selected in the third case, since some unvisited vertices before them shall later be put into D to dominate them. In the third case, a vertex is not put into D only if it is dominated by j-vertices and no vertex (including itself) before next has to be dominated by it. Therefore, the final D is a dominating set for G[π].
Complexity. We can implement Jump2 to run in O(n) time overall. We use an array R so that R[i] records the condition of vertex i: visited, dominated, or unvisited. Only the steps in Case 3 have to be considered: (1) how to find a visited, but not dominated, vertex before next in step 10; (2) how to check whether all vertices k before vertex u are visited or dominated in step 11; and (3) how to mark the vertices π(m), 1 ≤ m ≤ i, as dominated in step 13. We use three pointers: two for (1) and (2), and one for (3). We search the array starting from the previous stopping point to find the smallest visited, but not dominated, vertex. Similarly, we can search the array to check whether there is an unvisited vertex k before vertex u. To mark the vertices π(m), 1 ≤ m ≤ i, as dominated, we only mark those that were not marked before. So the total runtime is linear in n.
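For concreteness, the following Python sketch is a direct, unoptimized transcription of Table 1 (the scan for the forcing vertex u is naive, so unlike the three-pointer implementation described above it does not run in linear time; the 1-indexed conventions of the paper are kept):

```python
import math

def jump2(pi):
    """Direct transcription of Table 1; pi[i-1] is the vertex pi(i)
    revealed at time i.  Returns the dominating set D of G[pi]."""
    n = len(pi)
    UNVISITED, VISITED, DOMINATED = 0, 1, 2
    state = [UNVISITED] * (n + 1)        # state[v] of vertex v = 1..n
    D, last = set(), 0
    nxt = math.floor(math.sqrt(n / 2))   # the variable `next' of Table 1
    for i in range(1, n + 1):
        v = pi[i - 1]                    # current input pi(i)
        state[v] = DOMINATED if v < last else VISITED   # steps 1-2
        if v >= nxt:                     # steps 3-7: take v as a j-vertex
            D.add(v)
            last, state[v] = v, DOMINATED
            if last > n - math.floor(math.sqrt(n / 2)):
                nxt = n + 1              # no more j-vertices
            else:
                nxt = min(v + math.ceil(math.sqrt(2 * n)),
                          n - math.floor(math.sqrt(n / 2)) + 1)
        elif i < v:                      # step 8: forward vertex, do nothing
            pass
        else:                            # steps 9-13: backward vertex
            for u in range(1, min(nxt, n + 1)):
                if state[u] == VISITED and all(
                        state[k] != UNVISITED for k in range(1, u)):
                    D.add(v)             # v is forced in as an f-vertex
                    for m in range(1, i + 1):
                        state[pi[m - 1]] = DOMINATED
                    break
    return D

# e.g. jump2([5, 2, 1, 6, 7, 3, 4]) returns {4, 5, 7}, a dominating
# set of the permutation graph of Figure 1.
```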

3.2 Performance ratio


We can easily observe the following:

1. All j-vertices u are either forward or satisfy u = π(u), and no two of them are adjacent.

2. There are at most ⌈√(n/2)⌉ + 1 j-vertices in D. Let j_1 < j_2 < ... < j_m be the j-vertices in D, where 1 ≤ m ≤ ⌈√(n/2)⌉ + 1.

3. All f-vertices are backward and no two of them are adjacent.

4. j_1 ≤ ⌈√(n/2)⌉, n − j_m ≤ ⌊√(n/2)⌋, and π^{-1}(j_{i+1}) − j_i ≤ ⌈√(2n)⌉.

5. If j_{i+1} − j_i > ⌈√(2n)⌉, then all vertices j_i + ⌈√(2n)⌉, ..., j_{i+1} − 1 are backward and dominated by the vertex j_{i+1}, where 1 ≤ i ≤ m − 1.

Let C be a minimum dominating set for G[π]. The function opt assigns to each vertex of G[π] a dominating vertex in C as follows:

$$\mathrm{opt}(v) = \begin{cases} v & \text{if } v \in C \\ \pi(\max\{\pi^{-1}(u) \mid u \in C,\ (u, v) \in E\}) & \text{if } v \notin C. \end{cases}$$

The function g records the condition that vertex v forces vertex u into D in the third case:

$$g(v) = \begin{cases} u & \text{if vertex } v \text{ forces vertex } u \text{ into } D \\ 0 & \text{otherwise.} \end{cases}$$

Note that u may be equal to v, and there may be v_1 and v_2 such that g(v_1) = g(v_2) = u.

Definition 3.1 Let j_0 = 1 and j_{m+1} = n + 1. The i-th upper segment S_i consists of the vertices j_{i−1}, j_{i−1} + 1, ..., j_i − 1, where 1 ≤ i ≤ m + 1.

Definition 3.2 Let j'_0 = 1, j'_{m+1} = n + 1 and π^{-1}(j_i) = j'_i for 1 ≤ i ≤ m. The i-th lower segment S'_i consists of the vertices π(j'_{i−1}), π(j'_{i−1} + 1), π(j'_{i−1} + 2), ..., π(j'_i − 1), where 1 ≤ i ≤ m + 1.

The following lemmas concern upper and lower segments.

Lemma 3.3 C ∩ S_h ∩ S'_k = ∅ for 1 ≤ k < h ≤ m + 1.

Proof. If there were a vertex c ∈ C ∩ S_h ∩ S'_k with 1 ≤ k < h ≤ m + 1, then c could not be a backward vertex, since every vertex i in S_h is larger than every vertex j in S'_k, that is, i > j. By the properties of D, vertex c is not a forward vertex either, since otherwise it would have been chosen as a j-vertex. Therefore, no such vertex exists. □

Lemma 3.4 There are at most ⌈√(2n)⌉ − 1 backward vertices in S_i ∩ S'_i for 2 ≤ i ≤ m.

Proof. Since π^{-1}(j_{i+1}) − j_i ≤ ⌈√(2n)⌉ for 1 ≤ i ≤ m − 1, there are at most ⌈√(2n)⌉ − 1 backward vertices in S_i ∩ S'_i for 2 ≤ i ≤ m. □

Lemma 3.5 There are at most ⌈√(n/2)⌉ − 1 backward vertices in S_i ∩ S'_i for i = 1 and m + 1.
Proof. The proof is similar to that of Lemma 3.4. □

We define a charge scheme assigning weights to the vertices in C according to the vertices in D. Let w be the weight function on the vertices in C. For each vertex u in C, its weight is initially set to 0 and updated by the following charge scheme.

Definition 3.6 (Initial charge)

1. For each vertex u ∈ C ∩ D, w(u) = w(u) + 1.

2. For each f-vertex u ∈ D − C and each vertex v with g(v) = u, w(opt(v)) = w(opt(v)) + 1/|{x | g(x) = u}|.

3. For the j-vertex j_m, w(opt(j_m)) = w(opt(j_m)) + |{j_i | 1 ≤ i ≤ m, j_i ∈ D − C}|.
The following lemmas concern w after the initial charge.

Lemma 3.7 For each c ∈ C ∩ D − {opt(j_m)}, w(c) = 1.

Proof. If there is no vertex v with opt(v) = c, the lemma holds. Otherwise, let v be a vertex that is dominated by c and has opt(v) = c. If vertex c is forward, then g(v) = 0 and the weight w(c) is not charged by f-vertices. Since c ≠ opt(j_m), it is not charged by the j-vertices in D − C either. If vertex c is backward, g(v) is either c or 0. Since g(v) ∉ D − C, the weight w(opt(v)) is not charged in this case. Therefore, w(c) is charged by c itself only, and thus w(c) = 1. □

Lemma 3.8 For each backward vertex c ∈ (C − D) ∩ S_i ∩ S'_i, 2 ≤ i ≤ m, w(c) ≤ ⌈√(n/2)⌉.

Proof. The weight w(c) is charged only by f-vertices u with g(v) = u and opt(v) = c. Such a vertex v is not adjacent to the vertices j_{i−1} and j_i since, otherwise, g(v) = 0. There are two cases for such a vertex v. The first is that j_{i−1} < v < c and π^{-1}(c) < π^{-1}(v) < π^{-1}(j_i). Since l = π^{-1}(j_i) − j_{i−1} ≤ ⌈√(2n)⌉ and such vertices are backward, there are at most l/2 such vertices; thus, w(c) is charged at most l/2 in this case. The second is that c < v < j_i and π^{-1}(j_{i−1}) < π^{-1}(v) < π^{-1}(c). Let T = {v | opt(v) = c, c < v < j_i, π^{-1}(j_{i−1}) < π^{-1}(v) < π^{-1}(c)} and v' = min(T). We can see that π^{-1}(v') > π^{-1}(c), since otherwise g(v') = c. Thus, all vertices in T are dominated by the vertex g(v'), and w(c) is charged at most one in this case. Therefore, w(c) ≤ l/2 + 1 ≤ ⌈√(n/2)⌉ over the two cases. □

Lemma 3.9 For each backward vertex c ∈ (C − D) ∩ S_i ∩ S'_i − {opt(j_m)}, i = 1, m + 1, w(c) ≤ ⌈√(n/4)⌉.

Proof. The proof is similar to that of Lemma 3.8, except that for the first and last segments both π^{-1}(j_1) and n − j_m are at most ⌈√(n/2)⌉. □

Lemma 3.10 For each vertex c ∈ (C − D) ∩ S_h ∩ S'_k, 1 ≤ h < k ≤ m + 1, w(c) is charged by at most one f-vertex, so that w(c) ≤ 1.

Proof. We know that π^{-1}(c) > π^{-1}(j_{k−1}). Let T = {x | opt(x) = c, g(x) ≠ 0} and v = min(T). All vertices y that are adjacent to j_h, j_{h+1}, ..., j_{k−1} are not in T since g(y) = 0. For the vertex u = g(v) ∈ D, it must be that u < v and π^{-1}(u) > π^{-1}(c); otherwise, vertex c would be forced into D by vertex v. Therefore, all vertices in T are dominated by vertex u. By the charge scheme, the weight w(c) is charged at most by u. □

Lemma 3.11 If there is a vertex c ∈ C ∩ S_h ∩ S'_k, 1 ≤ h < k ≤ m, then the weight w(c') = 0 for each c' ∈ (C − D) ∩ S_g ∩ S'_l, h + 1 ≤ g ≤ l ≤ k − 1.

Proof. Since any vertex v with opt(v) = c' is dominated by one of the vertices j_h, j_{h+1}, ..., j_{k−1} and c, we have g(v) = 0 and thus w(c') = 0. □

Lemma 3.12 For each j-vertex j_i, 1 ≤ i ≤ m − 1, w(opt(j_i)) ≤ 1.

Proof. If j_i ∈ C, then opt(j_i) = j_i and by Lemma 3.7 w(opt(j_i)) = 1. If j_i ∉ C, then opt(j_i) = c ∈ C ∩ S_h ∩ S'_k, 1 ≤ h < i ≤ k ≤ m + 1. By Lemma 3.10 we can see that w(c) ≤ 1. □

Lemma 3.13 The weight w(opt(j_m)) is at most ⌈√(n/2)⌉ + 2.

Proof. By the same argument as in Lemma 3.12 and the initial charge scheme, only the j-vertices in D − C are further charged to opt(j_m). Since there are at most ⌈√(n/2)⌉ + 1 j-vertices in D, w(opt(j_m)) ≤ ⌈√(n/2)⌉ + 2. □

Lemma 3.14 For each forward vertex c ∈ (C − D) ∩ (S_i ∩ S'_i), 2 ≤ i ≤ m, w(c) ≤ ⌈√(2n)⌉.

Proof. Let P = {v | opt(v) = c and g(v) ≠ 0}. Each vertex v ∈ P is not adjacent to the vertices j_{i−1} and j_i since, otherwise, g(v) = 0. There are three cases for such a vertex v. The first is that it is in V_1 = {v | v is a backward vertex with j_{i−1} < v < c and π^{-1}(c) < π^{-1}(v) < π^{-1}(j_i)}. The second is that it is in V_2 = {v | v is a forward vertex with j_{i−1} < v < c and π^{-1}(c) < π^{-1}(v) < π^{-1}(j_i)}. The third is that it is in V_3 = {v | c < v < j_i and π^{-1}(j_{i−1}) < π^{-1}(v) < π^{-1}(c)}. Let X_1 = {x | g(v) = x for some v ∈ V_1}, X_2 = {x | g(v) = x for some v ∈ V_2}, and X_3 = {x | g(v) = x for some v ∈ V_3}. The weight w(c) is |X_1 ∪ X_2 ∪ X_3|. There are three cases to be discussed.

For the case that V_1 ≠ ∅, we let X_1 ∪ X_2 = {x_1, x_2, ..., x_k}, where x_1 < x_2 < ... < x_k and k ≥ 1. It can be seen that in this case V_3 = ∅ and X_3 = ∅. Since the vertices in V_1 are backward, π^{-1}(x_1) > j_{i−1}. If π^{-1}(x_k) > π^{-1}(j_i), then π^{-1}(x_{k−1}) < π^{-1}(j_i), since otherwise the vertex that forces x_k would be dominated by x_{k−1} and x_k would not be put into D. Therefore, k − 1 < π^{-1}(j_i) − j_{i−1} ≤ ⌈√(2n)⌉. Thus, w(c) ≤ ⌈√(2n)⌉ in this case.

For the case that V_1 = ∅ and V_2 ≠ ∅, since c − j_{i−1} < ⌈√(2n)⌉, |V_2| = |X_2| < ⌈√(2n)⌉. Also, in this case V_3 = ∅ and X_3 = ∅. Therefore, w(c) < ⌈√(2n)⌉ in this case.

The last case is that V_1 = ∅, V_2 = ∅ and V_3 ≠ ∅. All vertices v ∈ V_3 satisfy v − j_{i−1} < ⌈√(2n)⌉, since otherwise vertex v would have been selected into D as a j-vertex. Since v > c > j_{i−1}, the number of vertices in V_3 is less than ⌈√(2n)⌉ and thus |X_3| < ⌈√(2n)⌉. Therefore, w(c) < ⌈√(2n)⌉ in this case. This completes the proof. □

Lemma 3.15 For each forward vertex c ∈ (C − D) ∩ (S_i ∩ S'_i), i = 1, m + 1, w(c) ≤ ⌈√(n/2)⌉.

Proof. The proof is similar to that of Lemma 3.14, except that π^{-1}(j_1) and n − j_m are at most ⌈√(n/2)⌉. □

In Lemmas 3.7-3.15 we have discussed all cases for the weight w(c), c ∈ C. Since some forward vertices c ∈ (C − D) ∩ (S_i ∩ S'_i), 2 ≤ i ≤ m, have weight greater than √(n/2) + O(1), we need to redistribute the over-charged weights to other vertices. Let w' be the weight function after redistribution. Let T_i = C ∩ S_i ∩ S'_i − {j_{i−1}}, 2 ≤ i ≤ m.

Definition 3.16 (Weight redistribution)

1. If |T_i| ≥ 2, then w'(x) = (1/|T_i|) Σ_{y∈T_i} w(y) for each x ∈ T_i, where 2 ≤ i ≤ m.

2. If |T_i| = 1 and C ∩ S_h ∩ S'_k = ∅, then w'(x) = w'(opt(j_{i−1})) = (1/2)(w(x) + w(opt(j_{i−1}))), where T_i = {x}, 2 ≤ i ≤ m and 1 ≤ h < i < k ≤ m + 1.

3. For the other vertices c ∈ C, which are not re-defined above, w'(c) = w(c).

The following lemmas concern w' after the weight redistribution.

Lemma 3.17 If |T_i| ≥ 2, then w'(x_j) ≤ ⌈√(n/2)⌉ for each x_j ∈ T_i, where 2 ≤ i ≤ m and j = 1, 2, ..., |T_i|.

Proof. Let T_i = {x_1, x_2, ..., x_a} for some a ≥ 2. Let P = {v | opt(v) = x_j and g(v) ≠ 0, 1 ≤ j ≤ a}. Each vertex v ∈ P is not adjacent to the vertices j_{i−1} and j_i since, otherwise, g(v) = 0. By Lemma 3.4, there are at most ⌈√(2n)⌉ − 1 backward vertices in S_i ∩ S'_i. As in the proofs of Lemma 3.8 and Lemma 3.14, we have Σ_{k=1}^{a} w(x_k) ≤ ⌈√(2n)⌉. Therefore, for each x_j ∈ T_i, 1 ≤ j ≤ a, we have w'(x_j) = (1/a) Σ_{k=1}^{a} w(x_k) ≤ ⌈√(n/2)⌉. □

Lemma 3.18 If |T_i| = 1, then w'(x) ≤ ⌈√(n/2)⌉ + 1/2 and w'(opt(j_{i−1})) ≤ ⌈√(n/2)⌉ + 1/2, where 2 ≤ i ≤ m and T_i = {x}.

Proof. If C ∩ S_h ∩ S'_k ≠ ∅, then by Lemma 3.11 w(x) = w'(x) = 0. Otherwise, if C ∩ S_h ∩ S'_k = ∅, then w(x) is redistributed with w(opt(j_{i−1})), where 1 ≤ h < i < k ≤ m + 1. By Lemma 3.11, w(opt(j_{i−1})) can only be redistributed with w(x). By Lemma 3.12, w(opt(j_{i−1})) ≤ 1. Since w(x) ≤ ⌈√(2n)⌉, w'(x) = w'(opt(j_{i−1})) = (1/2)(w(x) + w(opt(j_{i−1}))) ≤ ⌈√(n/2)⌉ + 1/2. □

Note that w(opt(j_m)) need not be redistributed with w(x), x ∈ C ∩ S_{m+1} ∩ S'_{m+1} − {opt(j_m)}, since by Lemma 3.9 and Lemma 3.15, w(x) ≤ ⌈√(n/2)⌉.

Lemma 3.19 For each c ∈ C, w'(c) ≤ ⌈√(n/2)⌉ + 2.

Proof. The weight of any vertex c ∈ C that is greater than √(n/2) + O(1) is redistributed to other vertices. If the weight of c ∈ C is not redistributed, then w'(c) = w(c) ≤ ⌈√(n/2)⌉. If the weight of c ∈ C is redistributed with other vertices (or another vertex), then by Lemma 3.17 and Lemma 3.18 we have w'(c) ≤ ⌈√(n/2)⌉ + 2. □

Since the final weight w'(c) for each c ∈ C is at most ⌈√(n/2)⌉ + 2, the performance ratio of Jump2 is at most ⌈√(n/2)⌉ + 2.

Theorem 3.20 The performance ratio of Jump2 is at most √(n/2) + c_3 for some positive constant c_3.

Proof. We can see that the total weight of the vertices in C is equal to the number of elements in D. Since the weight of each vertex in C is at most ⌈√(n/2)⌉ + 2, the performance ratio of Jump2 is at most √(n/2) + c_3 for some positive constant c_3 < 3. □

3.3 Lower bound


We use a strong adversary B to show that √(n/2) − c_4 is a lower bound for the performance ratio.

Theorem 3.21 √(n/2) − c_4 is a lower bound for the performance ratio of on-line dominating set algorithms for permutation graphs.

Proof. Let A be an on-line algorithm for dominating sets of permutation graphs and let D be the dominating set selected by A. Let k = ⌈√(2n)⌉.
B sets π(1) = k. If A does not put vertex π(1) = k into D at time 1, B sets π(2) = 1, π(3) = 2, ..., π(k) = k − 1. A has no choice but to put the vertices π(2) = 1 through π(k) = k − 1 into D. In this case the performance ratio for the first k input vertices is k − 1, and the remaining n − k vertices can be treated as a new input to A, since they are disconnected from the first k vertices π(i), 1 ≤ i ≤ k. If A puts vertex π(1) = k into D at time 1, B sets π(2) = 2k. As above, A has two choices. If it does not put π(2) = 2k into D at time 2, B sets π(3) = 2, π(4) = 3, ..., π(k) = k − 1, π(k+1) = 1 and π(k+2) = k + 1, π(k+3) = k + 2, ..., π(2k) = 2k − 1. A must put π(k+2) = k + 1, ..., π(2k) = 2k − 1 into D. In this case the performance ratio for the first 2k input vertices is k/2, and similarly the remaining n − 2k vertices can be treated as a new input to A. If A puts vertex π(2) = 2k into D, B sets π(3) = 3k, and the argument continues in the same way until the last vertices π(i), n − k + 1 ≤ i ≤ n. For the first hk input vertices, 2 ≤ h ≤ (n − k)/k, A has performance ratio (k + h − 2)/2. If D already consists of the vertices π(1) = k, π(2) = 2k, ..., π(n/k) = n, B sets π(n) = 1. In this case, since the minimum dominating set contains vertex 1 only, the performance ratio is

$$\frac{n}{k} \ge \sqrt{n/2} - 1. \tag{1}$$

We now again consider the case that A does not put π(1) = k into D; the performance ratio for the first k input vertices is then k − 1. The remaining n − k vertices are disconnected from the first k vertices. These vertices are considered separately, and the performance ratio for them is at least

$$\min\left\{k - 1,\ \frac{k}{2},\ \frac{k+1}{2},\ \ldots,\ \frac{k + \frac{n-2k}{k} - 2}{2},\ \frac{n-k}{k}\right\} = \min\left\{\frac{k}{2},\ \frac{n-k}{k}\right\}.$$

Therefore, the overall performance ratio for this case is at least

$$\min\left\{\frac{(k-1) + k}{3},\ \frac{(k-1) + \frac{n-k}{k}}{2}\right\} \ge \frac{4}{3}\sqrt{n/2} - \frac{1}{3}. \tag{2}$$

For the case that the performance ratio for the first hk, h ≥ 2, vertices is (k + h − 2)/2, the overall performance ratio is at least

$$\min\left\{\frac{(k+h-2) + k}{4},\ \frac{(k+h-2) + \frac{n-hk}{k}}{3}\right\} \ge \sqrt{n/2} - \frac{2}{3}. \tag{3}$$


Therefore, by Equations (1)-(3), the best performance ratio that A can achieve is at least

$$\min\left\{\sqrt{n/2} - 1,\ \frac{4}{3}\sqrt{n/2} - \frac{1}{3},\ \sqrt{n/2} - \frac{2}{3}\right\} \ge \sqrt{n/2} - c_4,$$

where c_4 is a positive constant. □

4 Conclusion
We have presented on-line algorithms for dominating sets of general graphs and permutation graphs under various on-line settings. Lower bounds on the performance ratio have also been shown.
An open problem is to close the gap between the upper and lower bounds for the on-line dominating set problem for general graphs.

References
[1] M.J. Atallah, G.K. Manacher, and J. Urrutia, Finding a minimum independent dominating set in a permutation graph, Discrete Applied Mathematics Vol. 21 (1988) pp. 177-183.

[2] K. Arvind and C.P. Rangan, Connected domination and Steiner set on weighted permutation graphs, Information Processing Letters Vol. 41 (1992) pp. 215-220.

[3] A. Bar-Noy, R. Motwani, and J. Naor, The greedy algorithm is optimal for on-line edge coloring, Information Processing Letters Vol. 44 (1992) pp. 251-253.

[4] A. Brandstadt and D. Kratsch, On domination problems for permutation and other graphs, Theoretical Computer Science Vol. 54 (1987) pp. 181-198.

[5] M. Chrobak and L.L. Larmore, On fast algorithms for two servers, Journal of Algorithms Vol. 12 (1991) pp. 607-614.

[6] C.J. Colbourn, J.K. Keil, and L.K. Stewart, Finding minimum dominating cycles in permutation graphs, Operations Research Letters Vol. 4 No. 1 (1985) pp. 13-17.

[7] C.J. Colbourn and L.K. Stewart, Permutation graphs: connected domination and Steiner trees, Discrete Mathematics Vol. 86 (1990) pp. 179-189.

[8] M. Farber and J.M. Keil, Domination in permutation graphs, Journal of Algorithms Vol. 6 (1985) pp. 309-321.

[9] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness (San Francisco, Freeman, 1979).

[10] M.C. Golumbic, Algorithmic Graph Theory and Perfect Graphs (New York, Academic Press, 1980).

[11] O.H. Ibarra and Q. Zheng, Some efficient algorithms for permutation graphs, Journal of Algorithms Vol. 16 (1994) pp. 453-469.

[12] R.M. Karp, U.V. Vazirani, and V.V. Vazirani, An optimal algorithm for on-line bipartite matching, Proceedings of the 22nd Annual ACM Symposium on Theory of Computing (1990) pp. 352-358.

[13] G.-H. King and W.-G. Tzeng, On-line algorithms for the dominating set problem, Information Processing Letters Vol. 61 (1997) pp. 11-14.

[14] E. Koutsoupias and C.H. Papadimitriou, On the k-server conjecture, Journal of the ACM Vol. 42 (1995) pp. 971-983.

[15] Y. Liang, C. Rhee, S.K. Dhall, and S. Lakshmivarahan, A new approach for the domination problem on permutation graphs, Information Processing Letters Vol. 37 (1991) pp. 219-224.

[16] M.S. Manasse, L.A. McGeoch, and D.D. Sleator, Competitive algorithms for server problems, Journal of Algorithms Vol. 11 (1990) pp. 208-230.

[17] C. Rhee, Y. Liang, S.K. Dhall, and S. Lakshmivarahan, An O(m+n)-time algorithm for finding a minimum-weight dominating set in a permutation graph, SIAM Journal on Computing Vol. 25 No. 2 (1996) pp. 404-419.

[18] D.D. Sleator and R.E. Tarjan, Amortized efficiency of list update and paging rules, Communications of the ACM Vol. 28 No. 2 (1985) pp. 202-208.

[19] A. Srinivasan and C.P. Rangan, Efficient algorithms for the minimum weighted dominating clique problem on permutation graphs, Theoretical Computer Science Vol. 91 (1991) pp. 1-21.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 543-588
©1998 Kluwer Academic Publishers

Optimization Problems in Optical Networks


Peng-Jun Wan
Department of Computer Science and Applied Mathematics
Illinois Institute of Technology, Chicago, IL 60616
E-mail: wan@csam.iit.edu

Contents

1 Introduction 543

2 Regular Interconnection Topologies 546

3 Wavelength-Routed Optical Network 549
3.1 The Load Version of Path Coloring 550
3.2 The Throughput Version of Path Coloring 556

4 Broadcast-and-Select Optical Network 557
4.1 Multi-Transceiver Optical Network 557
4.2 Cluster Interconnected Optical Network 573
4.3 Effect of Limited Tunable Range of Transceivers 578

5 Conclusion 583

References

1 Introduction
The huge potential of optical networks for satisfying the skyrocketing needs
of broadband telecommunication services while meeting rigid quality of ser-
vice requirements has long been acknowledged. However, although fiber
has become the medium of choice in telecommunication networks, its vast

resources are severely under-used, due to the much slower electronics that
are interfaced with the optical medium. For instance, transceivers operate at speeds that are several orders of magnitude below the actual usable
capacity of the fiber (several Gbps versus hundreds of Gbps). In order to
achieve higher rates, Wavelength Division Multiplexing (WDM) techniques
have been widely suggested. The concept behind WDM is to partition the
optical spectrum into multiple non-overlapping wavelength channels, each
modulated at electronic speed. WDM networks offer potential advantages, including higher aggregate bandwidth per fiber, new flexibility for automated network management and control, noise immunity, transparency to different data formats and protocols, low bit-error rates, and better network configurability and survivability, all leading to more cost-effective networks.
Depending on the required geographical coverage, optical networks can be provisioned with passive star couplers (for the local-area environment) or with wavelength-selective routers and switches (for the wide-area environment). An N × N passive star coupler [23, 43] is a passive device without any external power supply. It accepts optical signals from N input ports on N different wavelengths, and then it combines them into an optical signal that consists of all the incoming wavelengths. The combined signal is split into N parts, which are distributed onto the N output ports. Optical access nodes connected to the passive star coupler are equipped with optical transmitters and receivers. A connection between two nodes can be set up via the passive star coupler if a transmitter of the source node and a receiver of the destination node are tuned to the same wavelength. Due to the broadcast nature of the passive star coupler and the select property of the optical receivers, such networks are called broadcast-and-select optical networks. Fig. 1 shows an optical passive star network with N nodes. Passive stars present the advantage of smaller power losses as compared to linear optical busses [17]. This leads to greater network sizes. Moreover, the operation of the network is completely passive, which provides greater reliability.
Optical switches direct the input signals to one or more of the output links. Several types of optical switches exist or are in development. An elementary switch is capable of directing the signals coming along each of its input links to one or more of the output ports. The elementary switch cannot, however, differentiate between the different wavelengths coming along the same link. Rather, the entire signal is directed to the same output(s) [8, 39, 1, 50]. Generalized switches, on the other hand, are capable of switching incoming streams based on their wavelengths [18, 1, 50].
Figure 1: An N-node optical passive star network.
Using acousto-optic filters, the switch splits the incoming signals into the different streams associated with the various wavelengths. The switch may not route to the same link different messages which are using the same wavelength. In this paper we will assume that the optical switches are generalized switches; the optical networks based on such switches are called wavelength-routed optical networks. These networks take advantage of wavelength reuse and can connect a large number of nodes using only a few wavelengths.
The design of cost-effective WDM networks involves combinatorial design and combinatorial optimization. The design issues include network architecture, interconnection design, medium access control, channel assignment, wavelength routing, scalability analysis, reconfigurability analysis, cost-performance analysis, and network management, among others. Finding the solution, the optimal solution, or an approximate solution to such problems is challenging because the problems, in general, cannot be solved in polynomial time.

This paper is organized as follows. Section 2 gives a brief description of various regular interconnection topologies. Section 3 studies the routing selection and wavelength assignment problem in wavelength-routed networks. Section 4 studies various design issues in broadcast-and-select optical networks. Finally, Section 5 concludes this paper.

2 Regular Interconnection Topologies


WDM optical networks often use regular interconnection topologies either
as their physical topologies or as their virtual topologies. A regular intercon-
nection topology could be either a regular (undirected) graph or a regular
directed graph (or digraph in short). When an undirected graph is used as
an interconnection topology, each edge is treated as two links in opposite
directions. A regular graph is a graph in which every node has the same
degree. A regular digraph is a directed graph in which every node has the
same out-degree and in-degree and hence referred to as degree only. We will
denote the number of nodes by N and the degree of each node by d. In this
section, we will review some widely used regular graphs and digraphs. In
particular, for each topology we will give an order of all outgoing links and
all incoming links at each node.
The simplest regular graph is the complete graph. A complete graph with n nodes is denoted by C_n. For C_n, N = d = n. For each node 0 ≤ a ≤ n − 1, its i-th outgoing link is a → i, and its i-th incoming link is i → a, where 0 ≤ i ≤ n − 1.
In the complete graph each node has a self-loop. A modification of the complete graph is to remove all the self-loops. The resulting graph is also a regular graph, called a complete graph without self-loops. We will denote by C'_n the graph obtained from C_n by removing the self-loops. For C'_n, N = n and d = n − 1. For each node 0 ≤ a ≤ n − 1, its i-th outgoing link is a → (a + 1 + i) mod n, and its i-th incoming link is (a − 1 − i) mod n → a, where 0 ≤ i < n − 1.
The hypercube is a widely used interconnection pattern. It has some very elegant properties, such as logarithmic diameter, simple routing, and great fault-tolerance. An n-dimensional hypercube H_n, or n-cube in short, has N = 2^n nodes, which are labeled by n-bit binary numbers. Each node has degree d = n. For each node 0 ≤ a ≤ 2^n − 1, its i-th outgoing link is a → a ⊕ 2^i, and its i-th incoming link is a ⊕ 2^i → a, where 0 ≤ i ≤ n − 1 and the operator ⊕ is the parity operator (bitwise exclusive-or).
The generalized de Bruijn digraph is a generalization of the de Bruijn digraph [15] given by I. Imase and M. Itoh [31] and by S.M. Reddy, D.K. Pradhan and J.G. Kuhl [51], independently. A generalized de Bruijn digraph D(n, d) has N = n vertices and degree d. For each vertex 0 ≤ a ≤ n − 1, its i-th outgoing link is

a → (ad + i) mod n,

and its i-th incoming link is

⌊(in + a)/d⌋ → a,

where 0 ≤ i ≤ d − 1. When n = d^k for some k > 0, D(n, d) is isomorphic to the de Bruijn digraph of the same size and degree. The generalized de Bruijn digraph D(n, d) has many attractive properties. First, it is scalable, i.e., it can contain any number of nodes. Second, its diameter is at most ⌈log_d n⌉. Moreover, D(n, d) is (d − 1)-connected [33].
The generalized Kautz digraph is also called the Imase-Itoh digraph. It is a generalization of the Kautz digraph [38] by I. Imase and M. Itoh [32]. A generalized Kautz digraph K(n, d) has N = n vertices and degree d. For each vertex 0 ≤ a ≤ n − 1, its i-th outgoing link is

a → ((n − a − 1)d + i) mod n,

and its i-th incoming link is

n − 1 − ⌊(in + a)/d⌋ → a,

where 0 ≤ i ≤ d − 1. It was shown in [24] that when n = d^t + d^{t−1} for some t > 0, K(n, d) is isomorphic to the Kautz digraph of the same size and degree. The generalized Kautz digraph K(n, d) has many elegant properties. It does not contain self-loops, and it is scalable. Its diameter is at most ⌈log_d n⌉. Moreover, Imase and Itoh [32] proved that if n = d^s + d^{s+t} for some s and some odd t, then K(n, d) has diameter ⌈log_d n⌉ − 1. K(n, d) also has good connectivity: it is shown in [33] that K(n, d) is at least (d − 1)-connected.
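The index formulas above translate directly into code. The following sketch collects the neighbor maps of this section for the hypercube, the generalized de Bruijn digraph, and the generalized Kautz digraph (no bounds checking; all arguments are as in the definitions):

```python
def hypercube_neighbor(a, i):
    """i-th outgoing (and incoming) link of node a in H_n: a <-> a XOR 2^i."""
    return a ^ (1 << i)

def de_bruijn_out(a, i, n, d):
    """i-th outgoing link of vertex a in D(n, d): a -> (a*d + i) mod n."""
    return (a * d + i) % n

def de_bruijn_in(a, i, n, d):
    """i-th incoming link of vertex a in D(n, d): floor((i*n + a)/d) -> a."""
    return (i * n + a) // d

def kautz_out(a, i, n, d):
    """i-th outgoing link of vertex a in K(n, d):
    a -> ((n - a - 1)*d + i) mod n."""
    return ((n - a - 1) * d + i) % n

def kautz_in(a, i, n, d):
    """i-th incoming link of vertex a in K(n, d):
    n - 1 - floor((i*n + a)/d) -> a."""
    return n - 1 - (i * n + a) // d
```

A quick sanity check of the formulas: for every vertex a and every index 0 ≤ i < d, vertex a appears among the out-neighbors of de_bruijn_in(a, i, n, d), confirming that the outgoing and incoming formulas describe the same link set.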

A class of graphs, called Cayley graphs, uses a group-theoretic approach as a basis for defining graphs. Let G be a finite group and S a set of generators for G. The Cayley graph of G with generating set S, denoted by Cay(S : G), is defined as follows.

1. Each element of G is a vertex of Cay(S : G).

2. For x and y in G, there is a link between x and y if and only if x · s = y for some s ∈ S.

An n-dimensional star graph S_n, also referred to as the n-star, is the Cayley graph Cay(S : G) where G is the group consisting of all permutations on the n symbols {1, 2, ..., n}, and S consists of the n − 1 transpositions {s_i | s_i = (1, i), 2 ≤ i ≤ n}. The n-star S_n consists of N = n! nodes and has degree d = n − 1. At each node x, its i-th outgoing link is

x → x · s_{i+2},

and its i-th incoming link is

x · s_{i+2} → x,

where 0 ≤ i < n − 1. The star graphs have many attractive properties. All star graphs are vertex and edge symmetric [2]. The diameter of the n-star is ⌊3(n−1)/2⌋ [3].
The rotator digraph is also a member of the Cayley digraphs. An n-dimensional rotator graph R_n, also referred to as the n-rotator, is the Cayley digraph Cay(S : G) where G is the group consisting of all permutations on the n symbols {1, 2, ..., n}, and S consists of the n − 1 left rotations {σ_i | σ_i = (i, 1, 2, ..., i−1), 2 ≤ i ≤ n}. The n-rotator R_n consists of N = n! nodes and has degree d = n − 1. At each node x, the link

x → x · σ_{i+2}

is called the i-th outgoing link of x, and the link

x · σ_{i+2}^{-1} → x

is called the i-th incoming link of x, where 0 ≤ i < n − 1. The rotator digraphs have many elegant properties. All rotator digraphs are vertex and edge symmetric. In [20], it is proved that the diameter of the n-rotator is n − 1. It has a simple optimal routing algorithm.
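Representing each node as a tuple permutation of (1, ..., n), the generator actions can be sketched as follows (an illustration of the definitions above: s_k swaps the first and k-th symbols, and sigma_k rotates the first k symbols one position to the left):

```python
def star_neighbor(x, i):
    """i-th outgoing link of node x in the n-star: x -> x * s_{i+2}.
    Since each s_k is an involution, the i-th incoming link is the
    same edge reversed."""
    k = i + 2                            # generators s_2, ..., s_n
    y = list(x)
    y[0], y[k - 1] = y[k - 1], y[0]      # swap symbols in positions 1 and k
    return tuple(y)

def rotator_out(x, i):
    """i-th outgoing link of node x in the n-rotator: x -> x * sigma_{i+2},
    i.e., rotate the first i+2 symbols of x one position to the left."""
    k = i + 2                            # generators sigma_2, ..., sigma_n
    return x[1:k] + (x[0],) + x[k:]

# e.g. in the 4-star, star_neighbor((1, 2, 3, 4), 0) == (2, 1, 3, 4);
# in the 4-rotator, rotator_out((1, 2, 3, 4), 2) == (2, 3, 4, 1).
```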

3 Wavelength-Routed Optical Network


In this section we consider wavelength-routed optical networks based on generalized switches. Such an optical network consists of nodes, interconnected by point-to-point fiber-optic links. The nodes may be occupied by terminals, generalized switches, or both. Terminals send and receive signals. Each of the fiber-optic links supports a number of WDM wavelength channels. This allows the parallel transmission, on an optical fiber link, of different data streams, with speed related to the assigned wavelength. The main constraint the wavelength allocation must obey is that on a given link, on a given wavelength, only one signal can be transmitted. Two data streams transmitted on a link must be assigned different wavelengths. Conversion of data between different wavelengths is limited by the current technology. If we assume that wavelength conversion is not allowed, a data stream is then assigned to a single wavelength between the transmitter and the receiver.
Traditionally, a wavelength-routed optical network is modeled either as an undirected graph or as a digraph, whose vertices are switching routers with possibly attached terminals, and whose edges/links are optical fibers. The digraph modeling is more accurate, as optical fibers should be directed in one single direction due to the fact that optical amplifiers are directed devices. A particular digraph model is the symmetric digraph, in which any connection between a pair of nodes is a pair of two directed optical links in opposite directions. Although the undirected graph model is less realistic, many of the results presented for the undirected model can be extended, under certain restrictions, to the symmetric digraph model. Finally, the same set of wavelengths is considered potentially available on all the links of the network.
In a wavelength-routed optical network, a communication request is a pair of vertices (s, t). The two vertices in a communication are considered unordered in the undirected modeling and ordered in the directed modeling. A communication instance I is a sequence or collection of requests. A routing R for an instance I is a set of paths connecting the requests in I. The conflict graph associated with a routing R is the undirected graph with vertex set R, in which two vertices of R are adjacent if and only if they share a link of G.
An algorithm for wavelength routing in general deals with two problems: routing selection and wavelength assignment. The routing selection problem consists of choosing a path connecting node s to node t. The wavelength assignment problem consists of assigning a wavelength w to the path connecting node s to t such that each link of the path is not assigned to any other communication on wavelength w. The wavelength routing problem has often been referred to as path coloring. Given a graph G = (V, E) representing the network, we are given a sequence (s_i, t_i) of requests consisting of pairs of vertices of the graph G. The algorithm must assign to each pair (s_i, t_i) a color (wavelength) and a path connecting s_i to t_i, such that two paths sharing a link are not associated with pairs of the same color.
A load version and a throughput version have been studied.
In the load version, all the communication requests must be accepted, with the goal of minimizing the overall number of colors used. One should be aware of the severe limitations that current optical technologies impose on the number of available wavelengths per fiber. While experimental systems report large numbers of up to 100 wavelengths per fiber, current state-of-the-art manufacturing processes restrict the number of wavelengths per fiber of commercial WDM multiplexers to as low as 4 (Pirelli), 8 (Lucent Technologies), and up to 20 (IBM). Thus our aim is to minimize the number of colors used in a path coloring. Both off-line and on-line path coloring algorithms will be studied. For the off-line path coloring problem, all the requests in an instance are known before scheduling. An off-line path coloring algorithm tries to minimize the number of colors used. If the number of colors required to realize a set of requests is greater than the number of available colors, then more than one all-optical communication round must be utilized. For on-line path coloring algorithms, the future requests are unknown. An on-line path coloring algorithm tries to choose a proper color and path for the current request so that future requests use as few extra colors as possible. The competitive ratio is the measure of an on-line algorithm, and it will be given for the various on-line algorithms.
In the throughput version, the number of available colors is considered fixed, while the goal is to maximize the number of communications that can be accepted, given the network topology and the number of available wavelengths. In this paper, we will only study on-line algorithms for the throughput version.

3.1 The Load Version of Path Coloring


First of all, we consider the computational complexity of the off-line algorithms. NP-completeness results in the undirected model were known much earlier (actually, well before the advent of the WDM technology). In particular, in [25] it is proved that the path coloring problem is NP-complete for trees. This result has been extended in [25] to rings, while in [26] it has been proved that the problem is efficiently solvable for bounded-degree trees. For a general network G and an arbitrary instance I, the path coloring problem has been proved to be NP-hard in [25], and the problem remains NP-hard for directed trees and directed rings. In [26] these results have been extended to binary directed trees and meshes.
For on-line path coloring algorithms, we consider the lower bound on the competitive ratio. An Ω(n^ε) lower bound on the competitive ratio of randomized algorithms working for arbitrary network topologies has been shown by Bartal, Fiat and Leonardi [12]. This lower bound is achieved in a way similar to the lower bound for the on-line edge-disjoint paths problem. A lower bound for the on-line graph coloring problem is first established, and then it is turned into a lower bound for on-line path coloring in a network shaped as a brick wall.
One naturally related problem is the load balancing problem. The load balancing problem asks to find a routing R for a communication instance I in a graph G which minimizes the maximum load (also called congestion) on a link, i.e., the number of paths crossing the link. For any graph G and any instance I, let χ(G, I) denote the optimal number of colors required by the path coloring problem, and let π(G, I) denote the optimal load required by the load balancing problem. Then we have the following relation.

Theorem 3.1 For any graph G = (V, E) and any instance I,

$$\pi(G, I) \le \chi(G, I) \le 2\sqrt{|E|}\,\pi(G, I).$$

Furthermore, for every π and L, there exist a graph G and an instance I such that π(G, I) = π, max_{(s,t)∈I} d(s, t) = L and χ(G, I) = Θ(L)π.

The inequality π(G, I) ≤ χ(G, I) is obvious. The proof of the inequality χ(G, I) ≤ 2√|E|·π(G, I), and a mesh-like network which satisfies the three conditions in the second part of the theorem, can be found in [1]. Here, instead of repeating the proof, let us give some intuition. Let L be the maximum distance of the requests and let Δ be the maximal degree of the conflict graph associated with the optimal routing for the load balancing problem. It is clear that Δ ≤ L·π(G, I). By greedy coloring we know that Δ + 1 colors are sufficient to color the paths. Thus χ(G, I) = O(L)·π(G, I).
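Spelled out, the chain of inequalities behind this intuition is

$$\chi(G, I) \le \Delta + 1 \le L \cdot \pi(G, I) + 1 = O(L)\,\pi(G, I),$$

where the first inequality is the greedy coloring bound for a graph of maximum degree Δ (here, the conflict graph of the routing), and the second uses the fact that a path of length at most L shares its links with at most L·π(G, I) other paths.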
The above theorem applies to any graph G and any communication instance I. In the following, we study this relation for a specific type of communication instance: h-relations.

An h-relation is an instance in which each node is the source and the destination of no more than h requests. A 1-relation is also known as a permutation instance. The (n − 1)-relation is also known as the all-to-all instance.
By deriving a lower bound on the number of links used in the worst case and an upper bound on the total number of links in the network, Pankaj [45, 46] obtained the following results in his thesis.

Theorem 3.2 For every undirected graph or symmetric digraph G of maximum degree Δ and diameter D, there exists a permutation instance I such that

$$\chi(G, I) \ge \frac{\lfloor \log_\Delta n \rfloor}{2\Delta}.$$

Furthermore, if G is vertex transitive, there exists a permutation instance I such that

$$\chi(G, I) \ge \left\lceil \frac{D}{\Delta} \right\rceil.$$

In addition, Pankaj obtained the lower bound min(h, Δ)·⌊log_Δ n⌋/(2Δ) for a worst-case h-relation instance.
Raghavan and Upfal [50] have shown an existential lower bound which relates the number of colors to the edge expansion, starting from the same example as in [1].

Theorem 3.3 For every β ≤ 1 and 1 ≤ h ≤ n, there exists a graph G with edge expansion β and an h-relation I such that

$$\chi(G, I) = \Omega\left(\frac{h}{\beta^2}\right).$$

The load balancing problem can be formulated as an integer multicommodity flow problem. For a permutation instance, Leighton and Rao (see Theorem 2 in [42]) have given an efficient algorithm to obtain an integer flow with maximum load and dilation (the length of a longest path) both O(log n/β(G)). Since each vertex in the conflict graph of the resulting routing has degree upper-bounded by the load times the dilation, the greedy coloring algorithm allows one to use O(log² n/β²) colors, as was noticed in [6]. By Hall's theorem, this implies a result for h-relations.

Theorem 3.4 There is an efficient algorithm to solve the path coloring problem for every h-relation in any bounded-degree network G with edge expansion β, using at most O(h log² n/β²) colors.
Note that this result almost matches the Ω(h/β²) existential lower bound.
In the above, the network topologies were arbitrary. In the following we consider the path coloring problem in specific networks.
The path coloring problem on an undirected line network is equivalent to the interval graph coloring problem. Each vertex of an interval graph is mapped to an interval on the line, and two vertices of the interval graph are adjacent if and only if the corresponding intervals overlap. Since an interval graph is a perfect graph, the optimal solution uses exactly π(G, I) colors and can be found in polynomial time, for example with a simple divide-and-conquer technique.
by Kierstead and Trotter [40]. They present an optimal on-line determin-
istic algorithm whose competitive ratio is at most 37r(G,I) - 2 and prove
that no deterministic algorithm can achieve a better ratio. These results
immediately extend to the path coloring on a line.
Now we consider the path coloring on rings. Given a routing R for an
instance J in a ring G, Thcker [53] gave an efficient algorithm to solve the
path coloring problem using at most 27r(G, J) - 1 colors and such upper
bound is tight. The basic idea is cutting the ring at any edge and then
transforming the problem on a line network. The number of paths that
cross this edge in an optimal solution is at most equal to the optimal load.
Then, the load on any other edge in the resulting problem on a line network
is increased by at most 7r(G, I). Furthermore, it was shown in [27] that there
is a linear time algorithm to the load balancing problem in any undirected
ring. Combining these two results, an efficient approximation algorithm of
ratio two can be achieved for the path coloring problem. using the same idea
as Thcker, such approximation algorithms have been shown in the undirected
model in [50] and in the directed model in [44]. An on-line algorithm for
the line can now also be applied yielding an algorithm for the ring whose
competitive ratio is away from the competitive ratio of the algorithm for the
line by at most a multiplicative factor of 2.
If each pair in a ring can be connected by only one of the two possible paths, then the problem reduces to the circular arc graph coloring problem. In a circular arc graph, each vertex is mapped to an arc of a ring, and two vertices are connected by an edge if and only if the two arcs overlap. For this problem, Slusarek has shown that an on-line algorithm can use a number of colors bounded by 3π(G, I) − 2, matching the on-line result for interval graph coloring.
In the case of undirected trees G, Tarjan [52] proved that χ(G, I) ≤ (3/2)π(G, I), and this bound is tight. Erlebach and Jansen [25] give an efficient approximation algorithm.

Theorem 3.5 There is an efficient algorithm for the path coloring problem which uses at most ⌊1.1χ(G, I) + 0.8⌋ colors for any instance I in any tree network G.

In the case of a directed symmetric tree G, it was generally believed [44] until very recently that in a symmetric tree the ratio χ(G, I)/π(G, I) was never greater than 9/8. However, Jansen [35] constructed a symmetric tree G and an instance I such that χ(G, I)/π(G, I) = 5/4. Moreover, Kaklamanis et al. [37] formulated the path coloring problem on directed symmetric trees as a constrained bipartite edge coloring problem and obtained the following result:

Theorem 3.6 Let G be a symmetric tree. Then for all instances I we have χ(G, I) ≤ (5/4)π(G, I), and there is an efficient greedy algorithm to find a path coloring with at most (5/4)π(G, I) colors.

The problem of on-line path coloring on a tree can be reduced to the problem of coloring an O(π(G, I))-inductive graph on-line. A graph is d-inductive if its vertices can be numbered 1 through n in such a way that each vertex is connected to at most d vertices with higher numbers. Given an instance of the path coloring problem on a tree, the so-called intersection graph is built by associating a vertex with each path and connecting two vertices with a link if the two corresponding paths intersect. The intersection graph can be shown to be 2(π(G, I) − 1)-inductive [13]. An algorithm by Irani [34], which shows how to color a d-inductive graph on-line with O(d log n) colors, can then be applied to yield an O(log n)-competitive algorithm.
An almost matching deterministic Ω(log n/log log n) lower bound on the competitive ratio of on-line algorithms for path coloring on trees has also been proved by Bartal and Leonardi [13]. This lower bound is obtained on a tree of depth log n. This shows that an O(log D) competitive ratio, where D is the diameter of the tree, is not achievable, at least by deterministic algorithms.
The problem of path coloring on meshes has been approached by partitioning the mesh into submeshes of logarithmic size. This follows the solution proposed by Kleinberg and Tardos for the edge-disjoint paths problem on meshes [41]. Calls are divided into short calls, with both endpoints in the same submesh, and long calls, with endpoints in different submeshes. Disjoint sets of colors are dedicated to long calls and short calls. Short calls are routed through a shortest path connecting the two vertices. Short calls in different submeshes are then non-conflicting, and the colors can be reused in different submeshes. The routing of long calls is transformed into a path coloring problem with more bandwidth in a simulated network, where each fiber-optic connection between two vertices is formed by a link with logarithmic capacity. This reflects the fact that a logarithmic number of paths with the same color can include a link. A vertex of the simulated network is associated with each submesh, and two vertices are connected by an edge with logarithmic capacity if the two corresponding submeshes are adjacent. This models the fact that at most a logarithmic number of paths with the same color can be routed through two adjacent submeshes. Each call in the simulated network is then given to the algorithm for path coloring with more bandwidth, which assigns a color and a route in the simulated network. This route is then converted into a route in the original mesh, preserving the property that paths with the same color are edge-disjoint.
Using the above approach, [6] found the first per-instance approximation algorithm for bounded-dimension meshes; it also holds for bounded-dimension tori.
Theorem 3.7 There is an efficient algorithm to solve the path coloring problem for any instance I in any bounded-dimension mesh network G, using at most O(log log N · log |I| · χ(G, I)) colors.
Rabani [49] recently improved the previous approximation results obtained for square meshes, although the hidden constants are huge:

Theorem 3.8 There is an efficient algorithm to solve the path coloring problem for any instance I in any square mesh network G, using at most O(poly(log log N)·χ(G, I)) colors.

In the on-line path coloring algorithm on meshes given by Bartal and Leonardi [13], colors are assigned to short calls in a greedy manner according to their relative length. This first part of the algorithm yields an O(log n) competitive ratio with respect to an optimal algorithm for short calls. The part of the algorithm dealing with long calls also has a logarithmic competitive ratio. Therefore the combination of the two parts yields an overall O(log n)-competitive algorithm for the path coloring problem on meshes. The lower bound on the competitive ratio of randomized on-line algorithms for the on-line load balancing problem for virtual circuit routing [13] also implies that the algorithm for path coloring on meshes is optimal up to a constant factor.
Path coloring on hypercubes is relatively simple. Following the idea of [1], Gu and Tamaki [28, 29] have proved the following result for hypercubes.

Theorem 3.9 To realize any permutation, eight wavelengths are sufficient in undirected hypercubes, and two wavelengths are sufficient in directed symmetric hypercubes.

3.2 The Throughput Version of Path Coloring


The throughput version of path coloring consists of maximizing the number of communication requests that can be accepted for a given set of wavelengths and a given network topology. In this section, we only consider on-line algorithms. If we look at algorithms for general networks, an Ω(n^ε) lower bound on the competitive ratio immediately follows from the analogous lower bound for the on-line virtual circuit routing problem [12].
Let us assume that, for a given network topology, a competitive algorithm is available for the on-line edge-disjoint paths problem. This can also be considered an algorithm for the throughput version of the path coloring problem when one single wavelength is available. A simple technique proposed by Awerbuch, Azar, Fiat, Leonardi and Rosen [7] allows one to transform an algorithm for a single wavelength into an algorithm for the multiple-wavelength case at the expense of an additive term of 1 in the competitive ratio.
The algorithm uses a first-fit approach. Each one of the C colors is assigned a number from 1 to C. A different copy of the algorithm for on-line virtual circuit routing is then executed for each one of the colors. Every time a new call is presented, we apply a first-fit based approach. The call is given as an input to the algorithm for the first color. If this algorithm accepts the call, then the call is assigned the first color and the route chosen by the algorithm. If the algorithm for the first color rejects the call, the call is given as input to the algorithm for the second color, and so on, until the call is eventually accepted by the algorithm for some color, or rejected by all the colors. In this last case the call is rejected by the overall algorithm.
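A sketch of this wrapper (the offer interface is hypothetical; any single-wavelength on-line routing algorithm that returns a route or rejects the call would do):

```python
def first_fit(copies, call):
    """copies[c] is an independent instance of the single-color on-line
    virtual circuit routing algorithm, one per color c = 0, ..., C-1.
    copies[c].offer(call) is assumed to return a route, or None to reject."""
    for color, algo in enumerate(copies):
        route = algo.offer(call)
        if route is not None:
            return color, route   # accepted with the first willing color
    return None                   # rejected by every copy: reject overall
```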
Let c be the competitive ratio of the base algorithm (deterministic or randomized) for virtual circuit routing. The algorithm described above for the throughput version of on-line path coloring achieves a competitive ratio of at most c + 1. This implies the existence of an O(log n)-competitive algorithm for those topologies for which there exists an O(log n)-competitive algorithm for on-line virtual circuit routing.

4 Broadcast-and-Select Optical Network


In this section, we study various design problems arising from the design of broadcast-and-select optical networks. Section 4.1 studies the optimal wavelength assignment and transmission schedule for various single-star multi-transceiver optical networks. Section 4.2 presents the optimal conflict-free channel set assignment for various multi-star cluster-interconnected networks. Section 4.3 investigates the effect of the tunable range on the maximum concurrence in various optical networks.

4.1 Multi-Transceiver Optical Network


The transceivers at each node can be either fixed or tunable. With the state-of-the-art technology, tunable transceivers cost much more than fixed transceivers. The tunable range of tunable transceivers is further restricted [14] and is inversely related to the tuning speed. Furthermore, the use of tunable transceivers requires accurate synchronization. On the other hand, the use of a single fixed transmitter and a single fixed receiver at each station would cause low transmission concurrence and long end-to-end delay. One solution to this problem is to use multiple fixed transmitters and multiple fixed receivers at each node; this section considers such a transceiver configuration. We assume that each station a has T fixed transmitters {(a, t) | 0 ≤ t ≤ T − 1} and R fixed receivers {(a, r) | 0 ≤ r ≤ R − 1}. For each transmitter (a, t) (receiver (a, r)), a is called its node index and t (r) is called its local index.
One unique feature of optical networks is that various virtual topologies can be embedded into the physical topology through proper wavelength assignment. The embedding of a regular virtual topology into a WDM network in which each station has a single fixed transmitter and a single fixed receiver was introduced in [64]. The embedding of a regular virtual topology into a WDM network in which each station has multiple fixed transmitters and multiple fixed receivers was described in [63]. Let d be the nodal degree of the regular virtual topology; we assume that d is a multiple of both

T and R. At each station a, we partition the d outgoing (incoming) links into T (R) groups, and associate the t-th (r-th) group with transmitter (a, t) (receiver (a, r)). Fig. 2 illustrates this partition.

Figure 2: Partition of the d outgoing links at each station a into T groups, each group associated with one transmitter.

Then, for each link in the regular virtual topology, we tune the transmitter and the receiver associated with that link to the same wavelength channel. Fig. 3 illustrates the realization of D(8, 4) with T = R = 2. This realization scheme can be intuitively interpreted as virtually breaking each source (destination) station into T (R) small nodes, with each small node implementing d/T (d/R) outgoing (incoming) links.
The above realization can be formalized by a transmission graph. The transmission graph is a bipartite digraph. The vertex set of the transmission graph is the union of the transmitter set

{(a, t) | 0 ≤ a ≤ N − 1, 0 ≤ t ≤ T − 1}

and the receiver set

{(b, r) | 0 ≤ b ≤ N − 1, 0 ≤ r ≤ R − 1}.

Each link of the transmission graph is from a vertex (a transmitter) in the transmitter set to a vertex (a receiver) in the receiver set. Each transmitter has d/T outgoing links, and each receiver has d/R incoming links.

Figure 3: The realization of the de Bruijn digraph D(8, 4) with T = R = 2.

There is a one-to-one correspondence between the links in the regular virtual topology and the links in the transmission graph. For any link a → b in the regular virtual topology, if this link is the i-th outgoing link of a and the j-th incoming link of b, then the corresponding link in the transmission graph is

(a, ⌊i/(d/T)⌋) → (b, ⌊j/(d/R)⌋).
The above embedding only specifies which transmitters and receivers must be assigned the same wavelength channel; it does not say which transmitters and receivers may have different wavelength channels. In fact, in the extreme case, all the transmitters and receivers can be assigned the same wavelength channel. However, this trivial wavelength assignment provides no transmission concurrence. As the transmission cycle length equals the number of transmitters sharing the same wavelength, it's desirable to minimize the number of transmitters sharing any common wavelength. To achieve this objective, we first characterize the structure of all

transmitters and receivers that are required to have the same wavelength channel by the above realization scheme.
Since all the transmitters and receivers are fixed-tuned, for any transmitter (receiver) in the transmission graph, all receivers (transmitters) that it connects to are forced to have the same wavelength channel as the transmitter (receiver). Therefore, for any pair of transceivers, if there is a path between them when the links in the transmission graph are regarded as bidirectional, they must have the same wavelength channel. This key observation leads to the concept of subnetworks. In the transmission graph, a set of transmitters and receivers forms a subnetwork if there is a path between any two of them when the unidirectional nature of the links is ignored. Fig. 4 illustrates all subnetworks of the transmission graph in Fig. 3.

Figure 4: The structure of the subnetworks of the transmission graph in Fig. 3.
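Since the subnetworks are simply the connected components of the transmission graph with link directions ignored, they are easy to compute. The following Python sketch does this with a small union-find structure; the representation of the graph as a list of ((a, t), (b, r)) links is an assumption made here for illustration.

    def subnetworks(links):
        # links: iterable of ((a, t), (b, r)) pairs, one per transmission-
        # graph link. Returns the subnetworks as lists of tagged vertices.
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        for tx, rx in links:
            a, b = find(('T', tx)), find(('R', rx))
            if a != b:
                parent[a] = b                   # merge the two components

        groups = {}
        for v in list(parent):
            groups.setdefault(find(v), []).append(v)
        return list(groups.values())

Each returned group is one subnetwork and, by the lemma below, must be assigned a single wavelength.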

In summary, we have the following lemma.

Lemma 4.1 All transmitters and receivers constituting a subnetwork in the transmission graph are assigned the same wavelength.

The challenges in designing such networks are the following: given N stations, W_avail wavelengths, and T transmitters and R receivers at each station, how do we optimally assign wavelengths to each transmitter and each receiver to maximize the transmission concurrence, and how do we schedule the transmissions (note that since receivers always listen on predetermined wavelengths, there is no need to schedule receptions) to minimize the transmission cycle length? Furthermore, how should the numbers of transmitters and receivers be chosen, and what is the relation between the cost, in terms of the number of transmitters and receivers, and the performance, in terms of the minimal transmission cycle length? Finally, how can the network be scaled, using multiple star couplers, when the number of stations exceeds the dimension of the passive star couplers?
The key to addressing these questions is to identify the subnetwork structure. Let L(T, R) denote the number of transmitters in each subnetwork and W(T, R) the number of subnetworks in the transmission graph. If W_avail ≥ W(T, R), then each subnetwork can occupy one unique wavelength channel; in this case the maximal transmission concurrence is W(T, R) and the minimal transmission cycle length is L(T, R). If W_avail < W(T, R), then the W(T, R) subnetworks should share the W_avail wavelengths evenly; in this case the maximal transmission concurrence is W_avail and the minimal transmission cycle length is ⌈NT/W_avail⌉. In either case, the optimal transmission schedule can be designed directly from the subnetwork structure of the transmission graph. The subnetwork structure also provides a straightforward implementation of scalable multi-star broadcast-and-select optical networks in case the network size exceeds the fan-in/out of the star coupler. The values of d, T and R can be chosen such that the number of transmitters and receivers in each subnetwork is at most the fan-in/out of each individual star coupler. It is then possible to use one star coupler to interconnect one or more subnetworks. To reduce the number of star couplers, each coupler should interconnect as many subnetworks as possible.
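The case analysis above amounts to a couple of lines of arithmetic; the following sketch (with illustrative names, L and W standing for L(T, R) and W(T, R)) returns the maximal concurrence and minimal cycle length.

    import math

    def concurrence_and_cycle(N, T, L, W, w_avail):
        if w_avail >= W:
            return W, L                  # one private channel per subnetwork
        # otherwise the W subnetworks share the w_avail channels evenly
        return w_avail, math.ceil(N * T / w_avail)   # N*T = L*W transmitters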
When max(T, R) = d, the subnetwork structure depends only on the network size N and min(T, R), rather than on the topology. It was shown in [63] that L(T, R) = d/min(T, R) when max(T, R) = d. So in what follows we will assume that max(T, R) < d. The basic approach to analyzing the subnetwork structure involves two steps:

1. Identify the set of local indices of all transmitters in a subnetwork.

2. For each local index, identify the set of node indices of the transmitters in a subnetwork.

Let m be the least common multiple of d/T and d/R, and let T' = m/(d/T) and R' = m/(d/R). When gcd(d/T, d/R) = 1, we will use ℓ to denote min{xR', (R' − y)T'}, where x and y are the unique pair of integers satisfying

xR' − yT' = 1,
0 < x < T',
0 < y < R'.

Table 1 summarizes the values of L(T, R) for various virtual topologies: C_n and G*_n [54], the hypercube H_n [55], the generalized de Bruijn digraph D(n, d) [56], the Kautz digraph K(n, d) [57], the star graph S_n [58], and the rotator digraph R_n [59]. For the hypercube, for instance, L(T, R) = T'·2^{m−(T'+R')+1} (Theorem 4.8 below), while for D(n, d) and K(n, d) the value depends on whether or not nT mod d = 0.

Table 1: Maximum concurrencies for various virtual topologies

In this section we only explore the subnetwork structure of the hypercube in detail, due to space limitations. The proofs for the other topologies can be found in the corresponding references listed in Table 1.
In the transmission graph for the hypercube H_n (whose nodal degree is d = n, so that d/T = n/T and d/R = n/R), the set of receivers that a transmitter (a, t) connects to is

{ (a ⊕ 2^i, ⌊i/(n/R)⌋) | t·(n/T) ≤ i < (t+1)·(n/T) },

and the set of transmitters that a receiver (b, r) connects from is

{ (b ⊕ 2^i, ⌊i/(n/T)⌋) | r·(n/R) ≤ i < (r+1)·(n/R) }.

It's easy to show that for any 0 ≤ t ≤ T − 1, every receiver that transmitter (a, t) connects to has local index r with ⌊r/R'⌋ = ⌊t/T'⌋, and that for any 0 ≤ r ≤ R − 1, every transmitter that receiver (b, r) connects from has local index t with ⌊t/T'⌋ = ⌊r/R'⌋. Therefore, for any subnetwork, there exists a unique integer 0 ≤ k < n/m such that for any transmitter (a, t) and receiver (b, r) in this subnetwork,

⌊t/T'⌋ = ⌊r/R'⌋ = k.

Now we show that in such a subnetwork the set of local indices of all transmitters is actually {t | kT' ≤ t < (k+1)T'} and the set of local indices of all receivers is actually {r | kR' ≤ r < (k+1)R'}. This follows from the next lemma.

Lemma 4.2 Let a be any n-bit binary number.

(1) If t mod T' > 0, then for any ⌊t/T'⌋·T' ≤ t' < t, the two transmitters (a, t) and (a ⊕ Σ_{i=t'+1}^{t} (2^{i·n/T} ⊕ 2^{i·n/T − 1}), t') are in the same subnetwork.

(2) If r mod R' > 0, then for any ⌊r/R'⌋·R' ≤ r' < r, the two receivers (a, r) and (a ⊕ Σ_{i=r'+1}^{r} (2^{i·n/R} ⊕ 2^{i·n/R − 1}), r') are in the same subnetwork.

Proof. (1) The lemma is trivial when T' = 1, so we assume that T' > 1. First assume that t' = t − 1. Consider the following two links:

(a, t) → (a ⊕ 2^{t·n/T}, ⌊(t·n/T)/(n/R)⌋),
(a ⊕ 2^{t·n/T} ⊕ 2^{t·n/T − 1}, t − 1) → (a ⊕ 2^{t·n/T}, ⌊(t·n/T − 1)/(n/R)⌋).

If t mod T' > 0, then (t·n/T) mod (n/R) > 0, which implies

⌊(t·n/T − 1)/(n/R)⌋ = ⌊(t·n/T)/(n/R)⌋.

This means that the two transmitters (a, t) and (a ⊕ 2^{t·n/T} ⊕ 2^{t·n/T − 1}, t − 1) connect to one common receiver, and therefore are in the same subnetwork. If ⌊t/T'⌋·T' ≤ t' < t − 1, then we can apply the previous argument t − t' times. Figure 5 illustrates the idea, and we omit the details here.

(2) The proof is similar to (1). □

So we can completely determine the structure of the set of local indices


of all transmitters in the same subnetwork, and the structure of the set of
local indices of all receivers in the same subnetwork.

Figure 5: The two transmitters (a, t) and (a ⊕ Σ_{i=t'+1}^{t} (2^{i·n/T} ⊕ 2^{i·n/T − 1}), t') are in the same subnetwork if t mod T' > 0 and ⌊t/T'⌋·T' ≤ t' < t. In this example, n = 12, T = 4, R = 3, t = 3 and t' = 0.

Corollary 4.3 For any subnetwork, there exists a unique integer 0 ≤ k < n/m such that in this subnetwork
(1) the set of local indices of all transmitters is {t | kT' ≤ t < (k+1)T'};
(2) the set of local indices of all receivers is {r | kR' ≤ r < (k+1)R'}.

Next we will study the structure of the set of node indices of all transmitters which are in the same subnetwork and have the same local index, and likewise the structure of the set of node indices of all receivers which are in the same subnetwork and have the same local index. We first introduce some definitions. The weight of a binary number is the number of its nonzero bits. The Hamming distance between two binary numbers is the number of bits in which they differ. Two binary numbers are said to have the same parity if their Hamming distance is even. It's easy to verify that two binary numbers have the same parity if and only if their weights are either both even or both odd. The set of n-bit binary numbers whose weights are even is closed under the operation ⊕. It's easy to prove by induction that in any subnetwork of a transmission graph,

• the node indices of all transmitters have the same parity,

• the node indices of all receivers have the same parity,

• the node index of any transmitter and the node index of any receiver have different parity.
One immediate conclusion that can be drawn from Corollary 4.3 is that for any subnetwork there exists a unique integer 0 ≤ k < n/m such that the node indices of all transceivers in this subnetwork have the same bits at every position other than the positions km, km+1, …, (k+1)m−1. Thus we can restrict our attention to the positions km, km+1, …, (k+1)m−1. For any n-bit binary number x = x_{n−1} ⋯ x_1 x_0, we will call the m-bit binary number x_{(k+1)m−1} ⋯ x_{km+1} x_{km} the k-segment of x. Let

A_k = {km + i·(n/T) | 1 ≤ i ≤ T' − 1},
B_k = {km + i·(n/R) | 1 ≤ i ≤ R' − 1}.

Then A_k ∩ B_k = ∅, since m is the least common multiple of n/T and n/R. Let

C_k = A_k ∪ B_k ∪ {km, (k+1)m} = {c_{k,0}, c_{k,1}, …, c_{k,T'+R'−1}},

where c_{k,0} < c_{k,1} < ⋯ < c_{k,T'+R'−1}. For any 0 ≤ i ≤ T'+R'−2, we will call the binary number x_{c_{k,i+1}−1} ⋯ x_{c_{k,i}+1} x_{c_{k,i}} the (k, i)-section of x; each k-segment thus has T'+R'−1 sections. Figure 6 gives an example of segments and sections for n = 24, T = 6 and R = 8.
Figure 6: The definitions of segments and sections. In this example, n = 24, T = 6 and R = 8. There are two segments, and each segment has six sections.

For any 0 ≤ k < n/m, we define a binary relation ∼_k between n-bit binary numbers as follows. For any two n-bit binary numbers x and y, x ∼_k y if and only if

• x and y differ only in the k-segment;

• for each 0 ≤ i ≤ T'+R'−2, the (k, i)-sections of x and y have the same parity.

It's easy to see that ∼_k is an equivalence relation. The next lemma gives the size of any equivalence class.

Lemma 4.4 For any 0 ≤ k < n/m, the size of any equivalence class under the equivalence relation ∼_k is 2^{m−(T'+R')+1}.

Proof. For any 0 ≤ i ≤ T'+R'−2, let l_i = c_{k,i+1} − c_{k,i}, i.e., the length of any (k, i)-section. A member of a class may choose the bits of each section freely, subject only to the section's parity, so the size of any equivalence class under ∼_k is

∏_{i=0}^{T'+R'−2} 2^{l_i − 1} = 2^{Σ_{i=0}^{T'+R'−2} (l_i − 1)} = 2^{Σ_{i=0}^{T'+R'−2} l_i − (T'+R'−1)} = 2^{m−(T'+R')+1}. □
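The section bookkeeping of Lemma 4.4 can be checked numerically. The sketch below reproduces the example of Figure 6 (n = 24, T = 6, R = 8) for the 0-segment; the helper names are illustrative.

    from math import lcm

    n, T, R = 24, 6, 8
    m = lcm(n // T, n // R)                    # m = 12
    Tp, Rp = m // (n // T), m // (n // R)      # T' = 3, R' = 4

    # section boundaries of the 0-segment: A_0, B_0 and the two endpoints
    bounds = sorted({0, m}
                    | {i * (n // T) for i in range(1, Tp)}
                    | {i * (n // R) for i in range(1, Rp)})
    lengths = [b - a for a, b in zip(bounds, bounds[1:])]

    assert len(lengths) == Tp + Rp - 1         # six sections, as in Figure 6
    class_size = 1
    for l in lengths:
        class_size *= 2 ** (l - 1)             # free bits once parity is fixed
    assert class_size == 2 ** (m - (Tp + Rp) + 1)   # 2^6 = 64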

The next lemma gives a sufficient condition for two transmitters (receivers) with the same local index to lie in the same subnetwork.

Lemma 4.5 Suppose that a ∼_k a'.

(1) For any kT' ≤ t < (k+1)T', the two transmitters (a, t) and (a', t) are in the same subnetwork.

(2) For any kR' ≤ r < (k+1)R', the two receivers (a, r) and (a', r) are in the same subnetwork.

Proof. (1) It suffices to prove the lemma when a' = a ⊕ 2^i ⊕ 2^j, where the positions i and j lie within a single section of the k-segment of a. For simplicity of description, we write ∼ between two transmitters or two receivers to indicate that they are in the same subnetwork. Since i and j lie in one section, we may let t' = ⌊i/(n/T)⌋ = ⌊j/(n/T)⌋ and r' = ⌊i/(n/R)⌋ = ⌊j/(n/R)⌋. If t = t', the path

(a, t) → (a ⊕ 2^i, r') ← (a ⊕ 2^i ⊕ 2^j, t)

implies that the two transmitters (a, t) and (a ⊕ 2^i ⊕ 2^j, t) are in the same subnetwork.

If t' > t, then we have the following path:

(a, t) ∼ (a ⊕ Σ_{k'=t}^{t'−1} (2^{(k'+1)·n/T − 1} ⊕ 2^{(k'+1)·n/T}), t')
       ∼ (a ⊕ Σ_{k'=t}^{t'−1} (2^{(k'+1)·n/T − 1} ⊕ 2^{(k'+1)·n/T}) ⊕ 2^i ⊕ 2^j, t')
       = ((a ⊕ 2^i ⊕ 2^j) ⊕ Σ_{k'=t}^{t'−1} (2^{(k'+1)·n/T − 1} ⊕ 2^{(k'+1)·n/T}), t'),

where the first step uses Lemma 4.2 and the middle step uses the case t = t' just proved; the last transmitter is in the same subnetwork as (a ⊕ 2^i ⊕ 2^j, t), again by Lemma 4.2. So the lemma is also true when t' > t.

If t' < t, consider the analogous path

(a, t) ∼ (a ⊕ Σ_{k'=t'}^{t−1} (2^{(k'+1)·n/T − 1} ⊕ 2^{(k'+1)·n/T}), t')
       ∼ (a ⊕ Σ_{k'=t'}^{t−1} (2^{(k'+1)·n/T − 1} ⊕ 2^{(k'+1)·n/T}) ⊕ 2^i ⊕ 2^j, t')
       = ((a ⊕ 2^i ⊕ 2^j) ⊕ Σ_{k'=t'}^{t−1} (2^{(k'+1)·n/T − 1} ⊕ 2^{(k'+1)·n/T}), t').

Figure 7 illustrates this path. Therefore the lemma is true in every case.

(2) The proof is similar to (1). □

The next lemma says that the converse of the above lemma is also true.

Lemma 4.6 Suppose that 0 ≤ k < n/m.

(1) For any kT' ≤ t < (k+1)T', if the two transmitters (a, t) and (a', t) are in the same subnetwork, then a ∼_k a'.

(2) For any kR' ≤ r < (k+1)R', if the two receivers (b, r) and (b', r) are in the same subnetwork, then b ∼_k b'.

Figure 7: The two transmitters (a, t) and (a ⊕ 2^i ⊕ 2^j, t) are in the same subnetwork if i and j are within a single section of the ⌊t/T'⌋-segment of a. In this example, n = 12, T = 4, R = 3 and t = 3. The thick solid lines represent the positions of i and j; the solid lines represent the first half of the path, and the dashed lines represent the second half.

Proof. We consider three cases.

Case 1: T = R. In this case T' = R' = 1 and each segment has only one section. So for (1) what we need to prove is that a and a' differ only in the t-segment and that the t-segments of a and a' have the same parity, i.e., a ∼_t a'. This can easily be proved by induction on the length of the path between the two transmitters (a, t) and (a', t). (2) follows from a similar argument.

Case 2: T < R. For convenience of description, we first introduce some terminology. When T < R, we call the local index of a transmitter its level. If T < R, two transmitters which share a common receiver must be either at the same level or at two consecutive levels. Let (a, t) and (a', t') be two such transmitters, and let p be the path from (a, t) to (a', t') through the shared receiver. If t = t', we call p a wandering at level t; otherwise we call p a jump between level t and level t'. In particular, if t' = t + 1, we call p a jump-up from level t to level t + 1; if t' = t − 1, we call p a jump-down from level t to level t − 1. Figure 8 illustrates these concepts. The section which ends at position (t+1)·n/T − 1 and the section which begins at position (t+1)·n/T in the ⌊t/T'⌋-th segment are called the two jump sections between level t and level t + 1. Figure 9 indicates all jump sections in the example of Figure 6. If p is a wandering, then a ∼_{⌊t/T'⌋} a'. If p is a jump between level t and level t + 1, then the two jump sections between these levels in a and a' have odd Hamming distance respectively, and all other sections are equal respectively.

Figure 8: The concepts of wandering, jump-up and jump-down.

Figure 9: The jump sections, indicated by j, in the example of Figure 6.

Now we give the proof of (1). Let p be any path from (a, t) to (a', t). Within p there may be jumps, and only the jump sections of a and a' could possibly have odd Hamming distance. However, notice that whenever there is a jump-up from some level t' to level t' + 1, there must be a jump-down from level t' + 1 to level t', and vice versa. Therefore, for any t', the number of jump-ups from level t' to level t' + 1 equals the number of jump-downs from level t' + 1 to level t'. So, after an even number of jumps between level t' and level t' + 1, the jump sections between level t' and level t' + 1 also have even Hamming distance. Therefore, a ∼_{⌊t/T'⌋} a'.

Next we give the proof of (2). The following two links,

(b ⊕ 2^{r·n/R}, ⌊(r·n/R)/(n/T)⌋) → (b, r),
(b' ⊕ 2^{r·n/R}, ⌊(r·n/R)/(n/T)⌋) → (b', r),

imply that the two transmitters (b ⊕ 2^{r·n/R}, ⌊(r·n/R)/(n/T)⌋) and (b' ⊕ 2^{r·n/R}, ⌊(r·n/R)/(n/T)⌋) are in the same subnetwork. Therefore, from (1),

(b ⊕ 2^{r·n/R}) ∼_{⌊r/R'⌋} (b' ⊕ 2^{r·n/R}),

which implies b ∼_{⌊r/R'⌋} b'.

Case 3: T > R. The proof is similar to Case 2, and we omit it here.

Therefore in every case the lemma is true. □

From the above two lemmas, we can completely determine the structure of the set of node indices of all transmitters which are in the same subnetwork and have the same local index, and likewise for the receivers.

Corollary 4.7 Suppose that 0 ≤ k < n/m.

(1) For any kT' ≤ t < (k+1)T', the two transmitters (a, t) and (a', t) are in the same subnetwork if and only if a ∼_k a'.

(2) For any kR' ≤ r < (k+1)R', the two receivers (b, r) and (b', r) are in the same subnetwork if and only if b ∼_k b'.

From the above corollary and Lemma 4.4, in any subnetwork the number of transmitters with the same local index is 2^{m−(T'+R')+1}. So by Corollary 4.3, the total number of transmitters in any subnetwork is T'·2^{m−(T'+R')+1}. By a similar argument, the total number of receivers in any subnetwork is R'·2^{m−(T'+R')+1}. Therefore, we obtain the following theorem.

Theorem 4.8

L(T, R) = T'·2^{m−(T'+R')+1}

and

W(T, R) = (n/m)·2^{n−m+T'+R'−1}.

If R is a multiple of T, then T' = 1, R' = R/T and m = n/T, so L(T, R) = 2^{(n−R)/T}.
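Theorem 4.8 makes L(T, R) and W(T, R) directly computable. A small Python sketch, assuming (as in the text) that T and R divide n and max(T, R) < n:

    from math import lcm

    def hypercube_L_W(n, T, R):
        m = lcm(n // T, n // R)
        Tp, Rp = m // (n // T), m // (n // R)     # T' and R'
        L = Tp * 2 ** (m - (Tp + Rp) + 1)
        W = (n // m) * 2 ** (n - m + Tp + Rp - 1)
        return L, W

For n = 8 this gives L(1, 1) = 128, L(2, 2) = 8 and L(4, 4) = 2, the values used in the examples below.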
It's easy to check the following lemma from Theorem 4.8.

Lemma 4.9

L(T, T) = 2^{n/T − 1},
L(T, kT) = (1/2^{k−1})·L(T, T),
L(kR, R) = (k/2^{k−1})·L(R, R),
L(kT, kT) = (1/2^{(1−1/k)·(n/T)})·L(T, T).
If each station has the same number of transmitters and receivers, increasing the number of transmitters and receivers at each node improves the performance exponentially. As an example, consider a network containing 2^8 = 256 stations. If T = R = 1, then the cycle length is L(1, 1) = 2^7 = 128. If we double the number of transmitters and receivers, it drops to L(2, 2) = 2^3 = 8. If we double the transmitters and receivers once more, it becomes L(4, 4) = 2.
Now let's consider another scenario. Suppose originally each station has the same number of transmitters and receivers. Then increasing the number of receivers alone also improves the performance exponentially. Again consider a network of size 2^8 = 256 in which each station has only one transmitter. If each station has only one receiver, then the cycle length is L(1, 1) = 128. If we double the receivers at each station, it becomes L(1, 2) = 64. If we double the receivers once more, it becomes L(1, 4) = 16.
Now we consider the effect of increasing the number of transmitters while fixing the number of receivers. First we notice that L(2R, R) = L(R, R). This means that if each station starts with the same number of transmitters and receivers, then doubling the number of transmitters alone does not change the performance. However, if we continue increasing the number of transmitters as multiples of the number of receivers, then the performance does improve.
Next we consider the case in which neither of T and R is a multiple of the other.

Lemma 4.10 1. Suppose T > R and T is not a multiple of R. Then L(T, R) ≥ 3L(R, R), and the equality holds if and only if n = 2T = 3R.

2. Suppose R > T and R is not a multiple of T. Then L(T, R) ≥ 2L(T, T), and the equality holds if and only if n = 2R = kT for some odd k > 1.

Proof. (1) Suppose that T > R and T is not a multiple of R. Then T' > R' ≥ 2 and n/R > 2. Since m = R'·(n/R) and T' ≤ n/R (because m ≤ (n/T)·(n/R)),

L(T, R) = T'·2^{R'·(n/R) − (T'+R') + 1}
        ≥ T'·2^{R'·(n/R) − (n/R) − R' + 1}
        = T'·2^{R'·(n/R) − 2(n/R) − R' + 2}·2^{(n/R) − 1}
        = T'·2^{(R'−2)·((n/R) − 1)}·L(R, R)
        ≥ T'·L(R, R)
        ≥ 3L(R, R).

So L(T, R) ≥ 3L(R, R). The equality holds if and only if T' = n/R = 3 and R' = 2, which is equivalent to n = 2T = 3R.

(2) Suppose that R > T and R is not a multiple of T. Then R' > T' ≥ 2 and n/T > 2. Since m = T'·(n/T) and R' ≤ n/T,

L(T, R) = T'·2^{T'·(n/T) − (T'+R') + 1}
        ≥ T'·2^{T'·(n/T) − T' − (n/T) + 1}
        = T'·2^{T'·(n/T) − T' − 2(n/T) + 2}·2^{(n/T) − 1}
        = T'·2^{(T'−2)·((n/T) − 1)}·L(T, T)
        ≥ T'·L(T, T)
        ≥ 2L(T, T).

So L(T, R) ≥ 2L(T, T). The equality holds if and only if T' = 2, R' = n/T and n/R = 2, which is equivalent to n = 2R = kT for some odd k > 1. □
From the above lemma we can see that if neither of T and R is a multiple of the other, then the performance is poorer than in the case where each station has min(T, R) transmitters and min(T, R) receivers. In other words, adding transmitters or receivers can actually degrade the performance. As an example, consider a network of size 2^6 = 64. If T = R = 2, then the cycle length is L(2, 2) = 4. If we add one transmitter per station, it grows to L(3, 2) = 12; if we instead add one receiver, it grows to L(2, 3) = 8.
The cost-performance relation can be analyzed for various virtual topologies, and it depends on the parameters selected. As we have seen, the performance can improve linearly, exponentially, or even super-exponentially as the cost increases. However, there are also some interesting anomalous situations in which the performance decreases when the cost is increased.

4.2 Cluster Interconnected Optical Network


The previous section studied single-star optical networks with time and/or wavelength division multiplexing. Wavelength spatial reuse cannot be exploited by a single-star optical network. To overcome this constraint, a multi-star configuration which efficiently combines space with wavelength and/or time division was proposed in [5]. The network consists of m_1 clusters, where each cluster is a set of m_0 nodes, as shown in Figure 10, for a total network size of M = m_1·m_0 nodes.

Figure 10: Multi-star network with discrete broadcast-select domains: m_1 clusters, each with m_0 nodes transmitting through output couplers and receiving through input couplers, interconnected via a regular CIN topology.

Each node possesses a single fixed-wavelength transmitter and a receiver that is capable of simultaneously monitoring a subset of separable channels. A channel here can be a

reserved time slot, a dedicated wavelength, or a reserved time slot over a


given wavelength. The receiver can be realized using either a multichannel acousto-optic tunable filter or a detector array with a passive (grating-based) wavelength demultiplexer [36]. Each cluster possesses its own broadcast and select domains, realized by an output and an input star coupler, respectively. The cluster interconnection network (CIN) refers to the fiber connection pattern from output to input couplers. When m_0 > 1, each cluster is provided with a self-link to enable connectivity among nodes in the same cluster. The dimension of the output coupler is m_0 : F and that of the input coupler is F : m_0, where

F = the degree of the CIN topology

if m_0 = 1, and

F = 1 + the degree of the CIN topology

if m_0 > 1. Nodes in a cluster transmit over an ordered set of m_0 distinct channels through the output broadcast star coupler. At the input coupler side, several distinct channel sets are monitored, depending on the CIN topology. Transmit channel sets are assigned to the output couplers such that no conflicts can happen at the input couplers; that is, the assignment is such that the channel sets which can be listened to through any input coupler are disjoint, providing a collision-free environment. One important issue in the design of this network class is optimal conflict-free channel set assignment. Since an input cluster always listens to F output couplers, an immediate lower bound on the minimal number of channel sets is given by the following lemma.

Lemma 4.11 (Lower bound) For cluster-based multi-star optical networks, any conflict-free channel set assignment requires at least F channel sets.

To study tight upper bounds on the minimal number of channel sets, we formulate the conflict-free channel set assignment problems as vertex coloring problems. Depending on whether or not the CIN topology is an undirected graph, the corresponding vertex coloring problem has a slightly different definition.
Given a regular digraph G, consider the following vertex coloring problem.

2̄-VC: a vertex coloring scheme is called a 2̄-VC of G if, for any vertex v, its immediate predecessors and v itself have pairwise different colors. The minimal number of colors required by any 2̄-VC of G is denoted by χ₂̄(G).

Given an undirected graph G, consider the following two vertex coloring problems:

2-VC: a vertex coloring scheme is called a 2-VC of G if no two vertices at distance exactly two have the same color. The minimal number of colors required by any 2-VC of G is denoted by χ₂(G).

2̄-VC: a vertex coloring scheme is called a 2̄-VC of G if no two vertices at distance at most two have the same color. The minimal number of colors required by any 2̄-VC of G is denoted by χ₂̄(G).
If we regard clusters as vertices and channel sets as colors, then we can establish the equivalence between the conflict-free channel set assignment problems and the above vertex coloring problems. Table 2 summarizes the number of colors required for various topologies.

CIN        Channel sets         Remarks                          References

H_n        2^⌈log₂(n+1)⌉        with cluster self-links,         [60]
                                upper bound
           2^⌈log₂ n⌉           without cluster self-links,
                                upper bound
D(d^k, d)  d + 1                with cluster self-links,         [4, 5]
                                optimal
           d                    without cluster self-links,
                                optimal
S_n        n                    with cluster self-links,         [61]
                                optimal
R_n        n                    with cluster self-links,         [62]
                                optimal

Table 2: Number of channel sets for various CIN topologies

Again, in this section we will only explore the vertex coloring scheme for the CIN based on hypercubes. The coloring schemes for the other CIN topologies can be found in the references listed in Table 2.

Before describing the coloring schemes, we introduce a special class of matrices called binary representation matrices (BRM). The binary representation matrix of order k, denoted by BRM_k, has dimension k × ⌈log₂(k+1)⌉. The i-th row vector of BRM_k is the binary ⌈log₂(k+1)⌉-vector representing the number i, for 1 ≤ i ≤ k. For example, the matrix

            0 0 1
            0 1 0
  BRM_5 =   0 1 1
            1 0 0
            1 0 1

is the binary representation matrix of order 5.


Our colorings use binary vectors to represent colors, with all arithmetic over GF(2). For any binary n-vector v, we use v̄ to denote the binary (n−1)-vector obtained from v by removing its last entry, i.e., v̄ = (v₁, v₂, …, v_{n−1}). Consider the following two vertex coloring schemes:

2-Scheme: color each vertex v with the color v̄·BRM_{n−1}.

2̄-Scheme: color each vertex v with the color v·BRM_n.
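A short Python sketch of the two schemes, together with a brute-force check of Theorem 4.12 (below) on a small hypercube, may be helpful; the function names are illustrative.

    from itertools import product, combinations

    def brm(k):
        # rows are the binary representations of 1..k, each of width
        # ceil(log2(k + 1)) = k.bit_length(), most significant bit first
        w = k.bit_length()
        return [[(i >> (w - 1 - j)) & 1 for j in range(w)]
                for i in range(1, k + 1)]

    def times(vec, mat):
        # vec * mat over GF(2): XOR of the rows of mat selected by vec
        out = [0] * len(mat[0])
        for bit, row in zip(vec, mat):
            if bit:
                out = [a ^ b for a, b in zip(out, row)]
        return tuple(out)

    def scheme2(v):       # 2-Scheme: drop the last entry of v
        return times(v[:-1], brm(len(v) - 1))

    def scheme2bar(v):    # 2bar-Scheme: use all of v
        return times(v, brm(len(v)))

    n = 4
    for u, v in combinations(product((0, 1), repeat=n), 2):
        dist = sum(a != b for a, b in zip(u, v))
        if dist == 2:
            assert scheme2(u) != scheme2(v)
        if dist <= 2:
            assert scheme2bar(u) != scheme2bar(v)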

We will show that the 2-Scheme is a 2-VC of H_n, and the 2̄-Scheme is a 2̄-VC of H_n.

Theorem 4.12 The 2-Scheme is a 2-VC of H_n, and the 2̄-Scheme is a 2̄-VC of H_n.

Proof. We first prove that the 2-Scheme is a 2-VC of H_n. Let u and v be any two vertices with d(u, v) = 2, and suppose that u_i ≠ v_i and u_j ≠ v_j for some 1 ≤ i < j ≤ n. We consider two cases.

Case 1: j = n. In this case,

ū·BRM_{n−1} − v̄·BRM_{n−1} = (ū − v̄)·BRM_{n−1} = (BRM_{n−1})_i,

the i-th row of BRM_{n−1}, which is the binary ⌈log₂ n⌉-vector representing the number i. Therefore ū·BRM_{n−1} ≠ v̄·BRM_{n−1}, which means that u and v have different colors.

Case 2: j < n. In this case,

ū·BRM_{n−1} − v̄·BRM_{n−1} = (ū − v̄)·BRM_{n−1} = (BRM_{n−1})_i + (BRM_{n−1})_j ≠ 0,

since i ≠ j. This implies that u and v have different colors.

So in either case u and v have different colors. Therefore, the 2-Scheme is a 2-VC of H_n.

Now we prove that the 2̄-Scheme is a 2̄-VC of H_n. Let u and v be any two vertices with d(u, v) ≤ 2. We consider two cases.

Case 1: d(u, v) = 1. Suppose that u_i ≠ v_i for some 1 ≤ i ≤ n. Then

u·BRM_n − v·BRM_n = (u − v)·BRM_n = (BRM_n)_i,

which is the binary ⌈log₂(n+1)⌉-vector representing the number i. Therefore u·BRM_n ≠ v·BRM_n, which means that u and v have different colors.

Case 2: d(u, v) = 2. Suppose that u_i ≠ v_i and u_j ≠ v_j for some 1 ≤ i < j ≤ n. Then

u·BRM_n − v·BRM_n = (u − v)·BRM_n = (BRM_n)_i + (BRM_n)_j ≠ 0,

since i ≠ j. This implies that u and v have different colors.

So in either case u and v have different colors. Therefore, the 2̄-Scheme is a 2̄-VC of H_n. □
A vertex coloring scheme is said to be balanced if each color is assigned to the same number of vertices. Balance is a desirable property for vertex colorings. The next lemma shows that both schemes have this nice property.

Lemma 4.13 Both the 2-Scheme and the 2̄-Scheme are balanced.

Proof. It's easy to see that the 2-Scheme uses 2^⌈log₂ n⌉ colors, so we only need to prove that each color is assigned to 2^{n−⌈log₂ n⌉} vertices. Since the rank of the matrix BRM_{n−1} is ⌈log₂ n⌉, the dimension of the null space of BRM_{n−1} is n − 1 − ⌈log₂ n⌉. Notice that each binary (n−1)-vector v̄ corresponds to two binary n-vectors. Therefore any color is assigned to 2·2^{n−1−⌈log₂ n⌉} = 2^{n−⌈log₂ n⌉} vertices of H_n.
Similarly we can prove that the 2̄-Scheme is also balanced. □

When n = 2^k for some k, the 2-Scheme uses n colors, which matches the lower bound for χ₂(H_n). When n = 2^k − 1 for some k, the 2̄-Scheme uses n + 1 colors, which matches the lower bound for χ₂̄(H_n). Thus we obtain the following corollary.

Corollary 4.14 If n = 2^k for some k, then the 2-Scheme is an optimal 2-VC of H_n. If n = 2^k − 1 for some k, then the 2̄-Scheme is an optimal 2̄-VC of H_n.

From Lemma 4.11 and Theorem 4.12, we have

n ≤ χ₂(H_n) ≤ 2^⌈log₂ n⌉,
n + 1 ≤ χ₂̄(H_n) ≤ 2^⌈log₂(n+1)⌉.

The above corollary and these bounds indicate that the lower bound can be achieved for some special network sizes. One question is whether the lower bound given in Lemma 4.11 is always achievable. The following two examples give a negative answer.

Example 1. χ₂(H₃) = 2^⌈log₂ 3⌉ = 4. This can be proved as follows. Without loss of generality, assume that vertex 000 has color 1, vertex 011 has color 2 and vertex 101 has color 3; these three vertices are pairwise at distance two, so their colors must indeed be distinct. Now consider the color of vertex 110. Since vertex 110 is at distance two from each of the three vertices 000, 011 and 101, it must have a color different from colors 1, 2 and 3. Therefore χ₂(H₃) ≥ 4. On the other hand, 4 is an upper bound for χ₂(H₃), so χ₂(H₃) = 4.

Example 2. χ₂̄(H₂) = 2^⌈log₂ 3⌉ = 4. The proof is straightforward, since in H₂ the distance between any pair of vertices is at most two.
From the above two examples, we conjecture that for any positive integer n,

χ₂(H_n) = 2^⌈log₂ n⌉,
χ₂̄(H_n) = 2^⌈log₂(n+1)⌉.

If this conjecture is true, then the 2-Scheme is an optimal 2-VC of H_n and the 2̄-Scheme is an optimal 2̄-VC of H_n.

4.3 Effect of Limited Tunable Range of Transceivers


The transceiver configuration in which each station uses one tunable transmitter and one fixed receiver has been studied extensively. However, previous studies assume that the tunable transmitters can tune to all wavelengths available to the whole system. Unfortunately, with current technology the tunable range of tunable transmitters is very restricted [14], and even in the near future the tunable range is not likely to increase significantly, due to technical difficulties. Moreover, there is a trade-off between the speed at which a laser can be tuned and the spectrum range over which the laser can operate: the larger the spectrum range, the slower the tuning speed. If one wants lasers that can be retuned quickly, then one must accept limitations on the tunable range. Furthermore, economic considerations may dictate the use of cheaper lasers with tunability restrictions, provided the performance of the network remains within acceptable bounds. Tunability restrictions, on the other hand, increase the complexity of controlling the network and of determining a new configuration, since the limitations have to be accounted for. Again, if the control complexity can be kept low, one would want to use the cheaper lasers. In this section, we will study the effect of a limited tunable range on network design and performance.
Let's assume that a transmitter can only tune to k consecutive wavelengths. The tunable range of a transmitter can then be represented by a waveband [w, w + k − 1]. A fixed receiver can directly (i.e., in one hop) receive a message from a transmitter only if the receiver's wavelength lies in the transmitter's tunable range. Realizing a communication pattern requires a proper waveband assignment to the transmitters and wavelength assignment to the receivers: for each link in the communication pattern, the wavelength of the receiver at the destination node must lie in the waveband of the transmitter at the source node. For any virtual topology, an assignment satisfying this constraint is called a valid channel assignment. It should be noticed that any communication pattern can be realized by a valid channel assignment. In fact, in the extreme case, one trivial valid channel assignment assigns all receivers the same channel and all transmitters the same waveband containing that channel. However, this trivial assignment takes no advantage of WDM at all: it exploits only one channel, and thus only one transmission can occur at a time. Since the number of wavelengths assigned to the receivers represents the maximal transmission concurrence that the network can achieve, it's desirable to realize any communication pattern by a valid channel assignment that exploits the most receiver channels. For any communication pattern, a valid channel assignment is said to be optimal if it exploits the most receiver channels. In this section, we will investigate optimal valid channel assignments for various communication patterns.
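Checking validity and measuring the concurrence of a channel assignment is straightforward. The sketch below assumes a communication pattern given as (source, destination) pairs, with band[s] the waveband (low, high) of the transmitter at s and chan[t] the wavelength of the receiver at t; these names are illustrative.

    def is_valid(pattern, band, chan):
        # every destination's receiver wavelength must lie inside the
        # source transmitter's waveband
        return all(band[s][0] <= chan[t] <= band[s][1] for s, t in pattern)

    def concurrence(pattern, chan):
        # number of distinct receiver channels actually exploited
        return len({chan[t] for _, t in pattern})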
Any communication pattern can be represented by a bipartite digraph with the transmitters on the left-hand side and the receivers on the right-hand side. A set of transmitters and receivers in this bipartite digraph forms a component if there is a path between any two of them when the edges are regarded as bidirectional. In other words, forgetting the unidirectional nature of the link between a transmitter and a receiver, a component is a connected component in the graph-theoretic sense.

The notion of components is central to optimal valid channel assignment. The sets of wavelengths assigned to the receivers in two different components can be disjoint. Therefore, the total number of wavelength channels that can be exploited by the whole network is the sum of the numbers of channels that can be exploited by the individual components. In particular, if the communication pattern is regular, then this total is the product of the number of components and the number of channels that can be exploited by any one component. So the optimal valid channel assignment for the whole network reduces to an optimal valid channel assignment for each component.
For some communication patterns, such as the complete graph, the unidirectional ring, the generalized de Bruijn digraph and the generalized Kautz digraph, the structures of the components are relatively simple. Let UR_n denote the unidirectional ring of size n and BR_n the bidirectional ring of size n. Table 3 summarizes the results for these communication patterns.

Pattern    Channels            Remarks

C_n        min{n, k}
D(N, d)    min{N, Nk/d}        d | N
K(N, d)    min{N, Nk/d}        d | N
UR_n       n
BR_n       2⌈n/4⌉              k = 2 and n is even
           ⌈n/2⌉               k = 2 and n is odd
           n                   k > 2

Table 3: Number of channels for various communication patterns

In this section we will explore the optimal waveband assignment for the generalized de Bruijn digraph in depth. The proofs of the other results can be found in [63].
We begin with the component structure of the generalized de Bruijn digraph D(n, d). We assume that n is a multiple of d. For each 0 ≤ i ≤ n/d − 1, let

L_i = { j | j mod (n/d) = i, 0 ≤ j ≤ n − 1 },
R_i = { j | ⌊j/d⌋ = i, 0 ≤ j ≤ n − 1 }.

Lemma 4.15 The single optical passive star network based on D(n, d) contains n/d components. Furthermore, for each 0 ≤ i ≤ n/d − 1, the transmitters in L_i and the receivers in R_i constitute a component.

Proof. For any fixed 0 ≤ i ≤ n/d − 1, we first prove that the set of receivers that the transmitter of any node in L_i connects to is exactly R_i. Consider any transmitter j ∈ L_i, and suppose that j = q·(n/d) + i for some q. For any 0 ≤ ℓ ≤ d − 1, the receiver that j connects to via its ℓ-th outgoing link is

(jd + ℓ) mod n = (qn + id + ℓ) mod n = id + ℓ,

since

0 ≤ id + ℓ ≤ (n/d − 1)d + d − 1 = n − 1.

Noticing that

⌊(id + ℓ)/d⌋ = i,

the receiver id + ℓ is in R_i. Since the transmitter j connects to d receivers and R_i has size d, the set of receivers that transmitter j connects to is exactly R_i.

Now we prove the reverse direction. For any fixed 0 ≤ i ≤ n/d − 1, consider any receiver j ∈ R_i, and suppose that j = id + r for some 0 ≤ r ≤ d − 1. For any 0 ≤ ℓ ≤ d − 1, the transmitter that j connects from via its ℓ-th incoming link is

⌊(ℓn + j)/d⌋ = ⌊(ℓn + id + r)/d⌋ = ℓ·(n/d) + i.

Noticing that

(ℓ·(n/d) + i) mod (n/d) = i,

the transmitter ℓ·(n/d) + i is in L_i. Since the receiver j connects from d transmitters and L_i has size d, the set of transmitters that receiver j connects from is exactly L_i.

From the above, for each 0 ≤ i ≤ n/d − 1, the transmitters in L_i and the receivers in R_i constitute a component. Since the sets L_0, L_1, …, L_{n/d−1} form a partition of all transmitters and the sets R_0, R_1, …, R_{n/d−1} form a partition of all receivers, the transmission graph contains exactly n/d components. □
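The component structure of Lemma 4.15 is easy to verify computationally; the following sketch builds each transmitter's receiver set for D(n, d) and checks that the components are exactly the pairs (L_i, R_i).

    def de_bruijn_components(n, d):
        assert n % d == 0
        comps = {}
        for j in range(n):                     # transmitter at node j
            i = j % (n // d)                   # j belongs to L_i
            receivers = {(j * d + l) % n for l in range(d)}
            comps.setdefault(i, set()).update(receivers)
        return comps

    n, d = 12, 3
    comps = de_bruijn_components(n, d)
    assert len(comps) == n // d                           # n/d components
    assert all(c == set(range(d * i, d * (i + 1)))        # component i is R_i
               for i, c in comps.items())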

For any 0 ≤ i ≤ n/d − 1, we call the component consisting of all transmitters in L_i and all receivers in R_i the i-th component of the transmission graph, denoted by C_i. From the proof of the above lemma, it's easy to see that each component is a complete bipartite digraph. The next lemma gives the maximal number of wavelengths that can be employed by any component.

Lemma 4.16 For any 0 ≤ i ≤ n/d − 1, the maximal number of wavelengths that can be employed by the receivers in component C_i is min{d, k}.

Proof. The lemma is obvious when d ≤ k, so we assume that d > k. We first prove that at most k wavelengths can be assigned to the receivers in C_i. Suppose on the contrary that more than k wavelengths were used by the receivers. Then, since the component is a complete bipartite digraph, any transmitter would have to be able to tune to more than k wavelengths, which would exceed the tunability of the transmitters. Therefore at most k wavelengths can be employed by the receivers in any component. Now we show that the receivers in the component can actually employ k wavelengths. One such assignment distributes the k consecutive wavelengths evenly over all receivers in the component, while all transmitters are given the same tunable range covering those k consecutive wavelengths. This proves that the maximal number of wavelengths that can be employed by the receivers in component C_i is min{d, k}. □
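The assignment used in the proof is a one-line round-robin. In the sketch below (with illustrative names), the component's d receivers share min{d, k} channels and every transmitter is given the waveband covering them.

    def assign_component(d, k, first_wavelength=0):
        c = min(d, k)
        receiver_channel = [first_wavelength + (r % c) for r in range(d)]
        transmitter_band = (first_wavelength, first_wavelength + k - 1)
        return receiver_channel, transmitter_band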

From Lemma 4.15 and Lemma 4.16, we immediately obtain the following theorem.

Theorem 4.17 The maximal number of wavelengths that can be employed by the receivers of the single optical passive star network based on D(n, d) is min{n, nk/d}.

For any network topology, the maximal number of channels that can be exploited by the network is the network size. For both D(N, d) and K(N, d), if d | N, this optimum is achieved when k = d; if d is small, the required tunable range is also very small. For any bidirectional ring, the optimum is achieved when k = 3, which is achievable with current technology. One interesting property of these topologies is that the minimal value of k does not depend on the network size, which is a very desirable property for network scalability.

5 Conclusion

In the last few years, WDM networks have been fertile ground for many interesting and challenging problems in combinatorial design and combinatorial optimization, and they have received considerable interest lately. While many problems have been well studied, a lot of unsolved or open problems remain, so the area will continue to be a great arena for both systems engineers and theoretical scientists. It is one of the main purposes of this paper to introduce to theoretical scientists some of these combinatorial design and optimization problems, and hopefully to attract more talented theoretical scientists to this area.

References
[1] A. Aggarwal, A. Bar-Noy, D. Coppersmith, R. Ramaswami, B. Schieber, and M. Sudan, Efficient Routing and Scheduling Algorithms for Optical Networks, Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 412-423, 1994.

[2] S.B. Akers and B. Krishnamurthy, A Group-theoretic Model for Symmetric Interconnection Networks, IEEE Transactions on Computers, Vol. 38, pp. 555-566, 1989.

[3] S.B. Akers, D. Harel and B. Krishnamurthy, The Star Graph: An At-
tractive Alternative to The n-cube, Proc. Int. Conf. Parallel Processing,
pp. 393-400, 1987.
[4] K.A. Aly, Reconfigurable WDM Interconnection of Large-scale Multipro-
cessor Systems, PhD dissertation, Department of Electrical And Com-
puter Engineering, SUNY at Buffalo, July 1993.
[5] K.A. Aly, P.W. Dowd, A Class of Scalable Optical Interconnection Networks through Discrete Broadcast-select Multi-domain WDM, Proc. IEEE INFOCOM'94, (Toronto, Canada), June 1994.

[6] Y. Aumann and Y. Rabani, Improved bounds for all optical routing,
Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete
Algorithms, pp. 567-576, San Francisco, California, 22-24 January 1995.

[7] B. Awerbuch, Y. Azar, A. Fiat, S. Leonardi, and A. Rosen. On-line Com-


petitive Algorithms for Call Admission in Optical Networks, Proceedings
584 P.-J. Wan

of the 4th Annual European Symposium on Algorithms, Lecture Notes in


Computer Science 1136, pp. 431-444, Springer-Verlag, 1996.

[8] K. Bala and T.E. Stern, Algorithms for Routing in a Linear Lightwave
Network, Proceedings of INFOCOM, pp. 1-9, 1991.

[9] R. Ballart and Y.C. Ching, SONET: Now It's the Standard Optical
Network, IEEE Communications Magazine, pp. 8-15, March 1989.

[10] R.A. Barry and P.A. Humblet, Bounds on the Number of Wavelengths
Needed in WDM Networks, LEOS'92 Summer Tropical Mtg. Digest, pp.
114-127, 1992.

[11] R.A. Barry and P.A. Humblet, On the Number of Wavelengths and
Switches in All Optical Networks, IEEE Transactions on Communica-
tions, pp. 583-591, Vol. 42, No. 2/3/4, February/March/April, 1994.

[12] Y. Bartal, A. Fiat, and S. Leonardi, Lower Bounds for On-Line Graph
Problems with Application to On-Line Circuit and Optical Routing, Pro-
ceedings of the 28th ACM Symposium on Theory of Computing, pp. 531-
540, 1996.
[13] Y. Bartal and S. Leonardi. On-Line Routing in All-Optical Networks,
Proceedings of the 24th International Colloquium on Automata, Lan-
guages and Programming, LNCS 1256, pp. 516-526. Springer-Verlag,
1997.
[14] C.A. Brackett, On the Capacity of Multiwavelength Optical-Star Packet
Switches, IEEE Lightwave Magazine, May 1991, pp. 33-37.
[15] N.G. de Bruijn, A Combinatorial Problem, Nederl. Akad. Wetensch. Proc. 49 (1946), pp. 758-764.

[16] W.E. Burr, The FDDI Optical Data Link, IEEE Communications Mag-
azine, Vol.24, No.5, May 1986.

[17] M.S. Chen, N.R. Dono, R. Ramaswami, A Media Access-Protocol


for Packet-Switched Wavelength Division Multiaccess Metropolitan Net-
works, IEEE Journal on Selected Areas in Communications, vol. 8, no.
6, Aug 1990, pp. 1048-1057.
[18] K.W. Cheng, Acousto-optic Tunable Filters in Narrowband WDM Networks, IEEE JSAC, 8:1015-1025, 1990.

[19] I. Chlamtac and A. Ganz, Toward Alternative High-Speed Network Concepts: The SWIFT Architecture, IEEE Trans. on Communications, vol. 38, no. 4, Apr 1990, pp. 431-439.

[20] P.F. Corbett, Rotator Graphs: An Efficient Topology for Point-to-Point Multiprocessor Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 5, Sep 1992, pp. 622-626.
[21] T. Cox, F. Dix, C. Hemrick and J. McRoberts, SMDS: The Beginning
of WAN Superhighways, Data Communications, April 1991.
[22] T.Q. Dam, K.A. Williams, D.H.C. Du, A Media-Access Protocol for Time and Wavelength Division Multiplexed Passive Star Networks, Technical Report 91-63, Computer Science Dept., University of Minnesota.
[23] C. Dragone, Efficient N x N Star Coupler Based on Fourier Optics,
Electronics Letters, vol. 24, no. 15, Jul 1988, pp. 942-944.
[24] D.-Z. Du, F. Cao, D.F. Hsu, de Bruijn Digraph, Kautz Digraph, and Their Generalizations, in: Combinatorial Network Theory, D.-Z. Du and D.F. Hsu, eds., Kluwer Academic Publishers, 1995, pp. 65-105.

[25] T. Erlebach, K. Jansen, Scheduling of Virtual Connections in Fast Networks, Proc. of Parallel Systems and Algorithms (PASA) (1996), pp. 13-32.

[26] T. Erlebach, K. Jansen, Call Scheduling in Trees, Rings and Meshes, Proc. of HICSS (1997).
[27] A. Frank, T. Nishizeki, N. Saito, H. Suzuki, E. Tardos, Algorithms for
Routing around a Rectangle, Discrete Applied Mathematics 40 (1992),
pp. 363-378.
[28] Q.-P. Gu, H. Tamaki, Edge-Disjoint Routing in the Undirected Hyper-
cube. Tech. Rep. COMP 96-21, Institute of Electronics, Information, and
Communication Engineering of Japan, 1996.
[29] Q.-P. Gu, H. Tamaki, Routing a permutation in the Hypercube by Two
Sets of Edge-Disjoint Paths, Proc. of 10th International Parallel Process-
ing Symposium (IPPS) (Apr. 1996), IEEE Computer Society Press.
[30] M.G. Hluchyj, M.J. Karol, ShuffleNet: An Application of Generalized
Perfect Shuffles to Multihop Lightwave Networks, Journal of Lightwave
Technology, vol. 9, no. 10, Oct 91, pp. 1386-1396.

[31] M. Imase, M. Itoh, Design to Minimize a Diameter on Building Block Network, IEEE Trans. on Computers, C-30, 1981, pp. 439-443.

[32] M. Imase, M. Itoh, A Design for Directed Graphs with Minimal Diameter, IEEE Trans. on Computers, C-32, 1983, pp. 782-784.

[33] M. Imase, T. Soneoka, K. Okada, Connectivity of Regular Directed Graphs with Small Diameter, IEEE Trans. on Computers, C-34, 1985, pp. 267-273.

[34] S. Irani. Coloring Inductive Graphs On-Line. Algorithmica, 11:53-72,


1994.

[35] K. Jansen, Approximation Results for Wavelength Routing in Directed


Trees, Proc. of WOCS (1997).

[36] J.R. Jump, YACSIM Reference Manual, Department of Electrical And


Computer Engineering, Rice University, 1.2 ed., August 1992.

[37] C. Kaklamanis, P. Persiano, T. Erlebach, K. Jansen, Constrained Bipartite Edge Coloring with Applications to Wavelength Routing, ICALP'97.

[38] W.H. Kautz, Bounds on Directed (d, k) Graphs, Theory of Cellular Logic Networks and Machines, AFCRL-68-0668 Final Report, 1968, pp. 20-28.

[39] M. Kovacevic and M. Gerla, Rooted Routing in Linear Lightwave Networks, Proceedings of INFOCOM, pp. 39-48, 1992.

[40] H. A. Kierstead and W. T. Trotter, An Extremal Problem in Recursive


Combinatorics, Congressus Numerantium, 33:143-153, 1981.

[41] J. Kleinberg and E. Tardos, Disjoint Paths in Densely Embedded


Graphs, Proceedings of the 36th Annual IEEE Symposium on Founda-
tions of Computer Science, pp. 52-61, 1995.

[42] F.T. Leighton, S. Rao, An Approximate Max-flow Min-cut Theorem for


Uniform Multicommodity Flow Problems with Applications to Approxi-
mation Algorithms, Proc. of 29th Annual Symposium on Foundations of
Computer Science (FOCS) (Oct. 1988), IEEE, pp. 422- 431.

[43] R.A. Linke, Frequency Division Multiplexed Optical Networks Using


Heterodyne Detection, IEEE Network Magazine, vol. 3, no. 2, Mar 1989,
pp.13-20.

[44] M. Mihail, C. Kaklamanis, and S. Rao, Efficient Access to Optical


Bandwidth, In Proc. of the of 36th Annual IEEE Symposium on Foun-
dations of Computer Science, 1995.

[45] R.K. Pankaj, Architecture for Linear Lightwave Networks, PhD thesis,
MIT, 1992.

[46] R.K. Pankaj, R. G. Gallager, Wavelength Requirements of All-Optical


Networks, IEEE/ACM Trans. on Networking 3 (1995), pp. 269-280.

[47] A. Pavan, S.-R. Tong, P.-J. Wan and D.H.C. Du, A New Multihop Lightwave Network Based on a Generalized de Bruijn Graph, Proceedings of the 21st IEEE Annual Conference on Local Computer Networks, pp. 498-507, 1996.

[48] G.R. Pieris and G.H. Sasaki, A Linear Lightwave Benes Network, IEEE/ACM Trans. on Networking, 1993.

[49] Y. Rabani, Path Coloring on the Mesh, Proc. of 37th Annual Sympo-
sium on Foundations of Computer Science (FOCS) (Oct. 1996), IEEE.

[50] P. Raghavan and U. Upfal, Efficient Routing in All Optical Networks,


Proceedings of the 26th Annual ACM Symposium on Theory of Comput-
ing, pp. 133-143, 1994.

[51] S.M. Reddy, D.K. Pradhan, J.G. Kuhl, Directed Graphs with Minimal
Diameter And Maximal Connectivity, School of Engineering Oakland
Univ. Tech. Rep., 1980.

[52] R. E. Tarjan, Decomposition by Clique Separators, Discrete Mathemat-


ics 55, 2 (1985), pp. 221-232.

[53] A. Tucker, Coloring a Family of Circular Arcs, SIAM Journal of Applied


Mathematics 29, 3 (1975), pp. 493-502.

[54] P.-J. Wan and A. Pavan, TWDM Media Access Protocol for Single-
hop Lightwave Networks, 21st Annual Conference on Local Computer
Networks, pp. 486-490, 1996.

[55] P.-J. Wan, TWDM Lightwave Hypercube Networks, Theoretical Com-


puter Science, (194)1-2 (1998) pp. 123-136.

[56] P.-J. Wan and A. Pavan, TWDM Lightwave Networks Based on Generalized de Bruijn Graphs, DIMACS Workshop on Network Design, April 1997.

[57] P.-J. Wan, TWDM Lightwave Networks Based on Kautz Digraphs, Pro-
ceedings of the Fifth IEEE International Conference on Computer and
Communication Networks, pp. 196-199, 1996.

[58] P.-J. Wan, TWDM Lightwave Networks Based on Star Graphs, sub-
mitted to IEEE Transactions on Communications, 1997.

[59] P.-J. Wan, TWDM Lightwave Networks Based on Rotator Digraphs,


Proceedings of the Sixth IEEE International Conference on Computer
and Communication Networks, pp. 154-157, 1997.

[60] P.-J. Wan, Near-Optimal Conflict-free Channel Assignment for an Op-


tical Cluster Hypercube Interconnection Network, Journal of Combina-
torial Optimization 1, 179-186 (1997).

[61] P.-J. Wan, Conflict-free Channel Assignment for an Optical Cluster Interconnection Network Based on Star Graphs, accepted by INFORMATICA, 1997; also appearing in Proceedings of the Ninth IASTED International Conference on Parallel and Distributed Computing and Systems, 1997.

[62] P.-J. Wan, Conflict-free Channel Assignment for an Optical Cluster Interconnection Network Based on Rotator Digraphs, Proceedings of the Eighth IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 121-123, 1996.

[63] P.-J. Wan, Multichannel Lightwave Networks, PhD thesis, Computer


Science Department, University of Minnesota.

[64] K.A. Williams, D.H.C. Du, Time and Wavelength Division Multiplexed
Architectures for Optical Passive Stars Networks, Technical Report 92-
44, Computer Science Dept., University of Minnesota.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 589-616
©1998 Kluwer Academic Publishers

Shortest Networks on Surfaces


J. F. Weng
Department of Mathematics and Statistics
The University of Melbourne, Parkville, VIC 3052, Australia
E-mail: weng@ms.unimelb.edu.au

Contents

1 Shortest Networks on Curved Surfaces                          590

2 Steiner Minimal Trees on Spheres                              594
  2.1 Properties of Steiner Minimal Trees                       594
  2.2 Steiner Points for Spherical Triangles                    597
  2.3 Number of Steiner Trees with a Full Topology              601
  2.4 Constructing Relatively Minimum Trees on Spheres          602

3 Steiner Ratios on Spheres                                     603
  3.1 Compression Theorem                                       604
  3.2 The Steiner Ratio on Spheres                              606

4 Steiner Minimal Trees for Cocircular Points on Spheres        609
  4.1 History of the Problem                                    609
  4.2 Graham's Conjecture on Spheres                            610
  4.3 Steiner Minimal Trees for Points on Equators              612

References

Suppose A = {a₁, a₂, …, a_n} is a point set in a metric space M. The shortest network problem asks for a minimum-length network S(A) that interconnects all points of A (called terminals), possibly with some additional points to shorten the network. S(A) must be a tree, since by minimality it cannot contain any cycle. In the literature this problem is called the Steiner tree problem, and S(A) is called a Steiner minimal tree for A [9]. If no additional points are added, then the network, denoted by T(A), is called a minimal spanning tree on A. Sometimes these networks are simply denoted by S and T if no confusion can arise.
The Steiner tree problem in the Euclidean plane has been studied extensively [9]; however, there are not many results on Steiner minimal trees on curved surfaces [3], [4], [1]. One reason is that the Steiner tree problem on curved surfaces is much more complicated than the corresponding problem in the Euclidean plane. Hence, except for the first section, in which we outline the Steiner tree problem on general curved surfaces, in all other sections we only study Steiner minimal trees on spheres, the simplest curved surfaces in Euclidean space.

1 Shortest Networks on Curved Surfaces

Suppose p and q are two points in a Steiner minimal tree S on a smooth surface M in 3-dimensional Euclidean space R³. Then the edge pq is a shortest geodesic through p and q. By |·| we denote the length of an object, which might be an arc, an edge, or a tree. Suppose s is an additional point in S, and suppose p and q are points on two edges of s, one on each edge. Let s move in a direction v on M that lies in the angle formed by sp and sq, and suppose the angle between v and sp is α and the angle between v and sq is β. When sp and sq are very small, from the tangent plane at s we can see that the directional derivatives of these edge lengths are

d|sp|/d|v| = −cos α,   d|sq|/d|v| = −cos β.

It is easy to verify that if α + β ≤ 2π/3, then

min_v { d|sp|/d|v| + d|sq|/d|v| } = −2 cos((α + β)/2) ≤ −1.   (1)

Hence, if an angle at an additional point s is less than 2π/3, then S can be shortened by inserting a new additional point that lies on the bisector of the angle and is sufficiently close to s. This proves the following necessary conditions on S [16]:

Theorem 1.1 If S is a Steiner minimal tree on M, then

(1) any angle in S is no less than 2π/3; in particular, all angles at an additional point equal 2π/3,

(2) any vertex in S is of degree no more than 3; in particular, any additional point is exactly of degree 3.

Corollary 1.2 If S is a Steiner minimal tree on M, then the number of additional points in S is no more than n - 2.

Corollary 1.3 If S is a Steiner minimal tree on M, then there is only one shortest geodesic joining two vertices of S, except if both vertices are terminals.

Proof. Suppose there are two distinct shortest geodesics e₁ and e₁' that join two vertices p and q in S, and suppose at least one of p and q, say p, is an additional point. Let the other two edges of p be e₂ and e₃. Then, one of e₁ and e₁' cannot form a 2π/3 angle with e₂ and e₃. This contradicts the minimality of S. □

The condition of angles being equal to or greater than 2π/3 is referred to as the angle condition of Steiner minimal trees. An additional point satisfying the angle condition is called a Steiner point. By the variational argument [10], the 2π/3 angle condition means that a Steiner point is a stationary point. That is, if s is a Steiner point, then the sum of the first variations of the edges of s is zero.
The graph structure of a network T is called its topology, and denoted
by G(T), or simply by G. A topology is called Steiner if all vertices have
degree at most three. A network is called a Steiner tree if it has a Steiner
topology and all additional points are Steiner points. By Theorem 1.1, a
Steiner minimal tree S must be a Steiner tree.
A Steiner point in a tree is called a (locally) minimal point if a small perturbation of this point cannot shorten the tree. A tree is called locally minimal if all Steiner points in the tree are minimal points. Theorem 1.1 says that a minimal point is a Steiner point. The converse of this proposition is also true in the Euclidean plane: a Steiner point is a minimal point. Therefore, a Steiner tree in the Euclidean plane is locally minimal. However, below we will see that this is not the case for Steiner points on curved surfaces. In that case, a Steiner point might be a maximal point or a saddle point. Consequently, a Steiner tree on curved surfaces might not be a locally minimal tree. Moreover, the 2π/3 angle condition guarantees that a Steiner tree in the Euclidean plane is not self-intersecting. Again, this is not true for Steiner trees on curved surfaces. A Steiner tree on a curved surface may have self-intersecting paths.
Let L(p) = |pa| + |pb| + |pc|, where p is any degree-3 vertex and a, b, c are its adjacent vertices. If p minimizes L(p), then p is called a minimum point. If a minimum point p is not a terminal, then p must be a minimal point. A tree is called relatively minimum if it is the shortest of all trees with the same topology. A relatively minimum tree with a given Steiner topology may not exist. However, if it exists, then all its Steiner points are minimum points. In the Euclidean plane, a Steiner point is always a minimum point, and a Steiner tree is always a relatively minimum tree [9]. Again, this is not true for curved surfaces, because there may exist more than one geodesic joining two points. Summing up, the set of minimum points on a curved surface is smaller than the set of minimal points, and the set of minimal points is smaller than the set of Steiner points. These differences between the Steiner points in the Euclidean plane and the Steiner points on curved surfaces make the Steiner tree problem on curved surfaces more complicated than the corresponding one in the Euclidean plane.
A Steiner tree spanning n terminals is called full if the number of Steiner points is exactly n - 2. In a full Steiner tree, all terminals are of degree one. Full Steiner trees play a significant role in the Steiner tree problem because a Steiner tree can be decomposed into a union of full Steiner subtrees by Theorem 1.1 [9]. We can view this fact from another angle. The topology of a full Steiner tree is called a full Steiner topology. By shrinking an edge is meant the operation of deleting an edge and collapsing its two endpoints. A topology is called a degeneracy of another if it is obtained by shrinking edges. All degeneracies of a topology G compose a topology family, which is denoted by D(G). From this angle of view, the topology G(T) of any Steiner tree T is a degeneracy of a certain full Steiner topology F, i.e. G(T) ∈ D(F). Finally, as the Steiner tree problem in the Euclidean plane has been proved to be NP-hard [7], the Steiner tree problem on curved surfaces is also NP-hard. The NP-hardness reflects the fact that the number of Steiner topologies is superexponential. However, constructing a Steiner tree with a given full topology is not so hard in the Euclidean plane. For these reasons, how to construct a Steiner tree with a given full Steiner topology on n points, and in particular how to find the Steiner points for 3 points, is of key importance in the Steiner tree problem.
Suppose S is a full Steiner tree on M. Since S is a tree, the set of vertices of S can be divided into two disjoint subsets V₁ and V₂ so that any edge of S has one endpoint in V₁ and the other endpoint in V₂. Hence, S can be so oriented that the direction of an edge pq, p ∈ V₁, q ∈ V₂, is from p to q.

Figure 1: Tangent vectors and normal vectors.
Suppose e₁, e₂ and e₃ are the edges incident to a Steiner point s. Suppose the tangent vectors of these edges at s are t₁, t₂ and t₃, whose directions coincide with the orientations of e₁, e₂ and e₃, respectively (Figure 1). Let ∠(x, y) denote the angle between two vectors x and y. Then by Theorem 1.1,

\[ \angle(t_1, t_2) = \angle(t_2, t_3) = \angle(t_3, t_1) = \frac{2\pi}{3}. \tag{2} \]

Remark 1.1 Only two of the three equations are independent. The reason is that s lies on a 2-dimensional surface.

Remark 1.2 If all tᵢ are unit vectors, ‖tᵢ‖ = 1, then the system of equations (2) is equivalent to t₁ + t₂ + t₃ = 0.

Instead of the tangent vectors, we can consider the vectors that lie in the tangent plane at s and are perpendicular to the tangent vectors. These vectors are referred to as normal vectors. Let nᵢ (i = 1, 2, 3) be the normal vectors at s, and let their directions be so defined that nᵢ always lies on the left (or right) side of tᵢ. Then, the normal vectors also meet each other at 2π/3 (Figure 1):

\[ \angle(n_1, n_2) = \angle(n_2, n_3) = \angle(n_3, n_1) = \frac{2\pi}{3}. \tag{3} \]

Remark 1.3 Similarly, if all nᵢ are unit vectors, ‖nᵢ‖ = 1, then this system of equations is equivalent to n₁ + n₂ + n₃ = 0.
Note that tᵢ (or nᵢ) are functions of the endpoints of eᵢ. Hence, all Steiner points of S can be found by solving the system of equations (2) or (3). Theoretically, we can construct S in this way, though it might not be practical. Firstly, it is often very difficult to find geodesics on curved surfaces and to work out the expressions of tᵢ or nᵢ. Secondly, the expressions of tᵢ or nᵢ may be very complicated, and we have to use a numerical method to solve the non-linear system of equations (2) or (3). There are two special cases in which the geodesics are simple.

1. M is a plane. Since all Steiner points lie in a plane, using a special coordinate system, the hexagonal coordinate system, it is easy to establish a set of linear equations that determines the Steiner points [15],[8]. Therefore, the problem of constructing a Steiner tree with a given Steiner topology has a simple solution. Obviously, the hexagonal coordinate method applies to any developable surface, e.g. cylinders or cones.

2. M is a sphere. The edges of S are minor arcs of great circles. Therefore, the normal vector of an edge is the same at all points of the edge (it is a pole of the corresponding great circle) and has a simple expression. Based on this fact, a descent algorithm has been given in [1]. We will give an outline of this algorithm in Section 2.4.

2 Steiner Minimal Trees on Spheres


2.1 Properties of Steiner Minimal Trees

A round sphere is the simplest curved surface in R³. Suppose Φ is a unit sphere with the origin o as its center. First, the following theorem comes directly from Corollary 1.3.

Theorem 2.1 If S is a Steiner minimal tree on a unit sphere Φ, then any edge in S is either strictly shorter than π or equal to π. In the latter case the edge joins two antipodal points that are terminals. In particular, if S is a full Steiner minimal tree spanning more than two terminals on Φ, then any edge in S is strictly shorter than π.

As we will see, there may exist more than one Steiner point for three points on a sphere. We use the variational argument to study which Steiner points are minimal points. Let sq be an edge, and let s move in a direction v on Φ so that the angle between v and sq is θ. By the cosine rule for spherical triangles it is easy to prove [1] that the directional derivatives of |sq| are

\[ \frac{d|sq|}{d|v|} = -\cos\theta, \qquad \frac{d^2|sq|}{d|v|^2} = \frac{\sin^2\theta}{\tan|sq|}. \tag{4} \]

Remark 2.1 Compare these formulae with the corresponding ones in the Euclidean plane:

\[ \frac{d|sq|}{d|v|} = -\cos\theta, \qquad \frac{d^2|sq|}{d|v|^2} = \frac{\sin^2\theta}{|sq|}. \]

Since the second derivative is positive in the Euclidean plane, a Steiner point
in the Euclidean plane is always locally minimal. However, a Steiner point
on a sphere may not be minimal by Equation (4).

Suppose s is a Steiner point and its adjacent vertices are a, b and c. Let s move in a direction v that forms an angle θ with sc. Since s is a Steiner point, for any θ

\[ L'(s) = \frac{dL(s)}{d|v|} = \frac{d(|as| + |bs| + |cs|)}{d|v|} = 0. \]

To determine if s is a minimal point, we need to check the second variation of L(s):

\[ L''(s) = \frac{\sin^2(2\pi/3 \pm \theta)}{\tan|sa|} + \frac{\sin^2(2\pi/3 \mp \theta)}{\tan|sb|} + \frac{\sin^2\theta}{\tan|sc|}. \tag{5} \]

Theorem 2.2 Suppose s is a Steiner point in a Steiner tree S and its adjacent vertices are a, b and c.
(1) If all three edges of s are no longer than π/2 and at most one is equal to π/2, then s is locally minimal.
(2) If two edges of s are no shorter than π/2, then s is not locally minimal.

Proof. The first part is easily seen from Equation (5), and we only prove the second part. Suppose there are two edges, say as and bs, having length greater than or equal to π/2. If one of as and bs is strictly greater than π/2, then L''(s) < 0 when θ = 0. Hence, s is not minimal. Now suppose |as| = |bs| = π/2. Let s' be the point on sc such that |ss'| = 2ε, where ε is very small. Let S = as + bs + cs and S' = as' + bs' + cs'. Since |cs| - |cs'| = 2ε, to show |S| > |S'| we need only prove that |as| + ε - |as'| > 0 and
|bs| + ε - |bs'| > 0. By symmetry, we merely prove the former inequality. Since cos|as'| = sin 2ε cos(2π/3) = -sin ε cos ε, and since cos(|as| + ε) = -sin ε, we have cos(|as| + ε) < cos|as'|. It implies |as| + ε - |as'| > 0. This proves that s is not a minimal point. □

Figure 2: Local minimality.

By this theorem only one case remains undetermined: two edges of s are strictly shorter than π/2 and one edge is strictly longer than π/2. In that case, whether L''(s) is positive or negative depends on the precise lengths of the edges. A geometric interpretation is as follows. Suppose sa, sb are both strictly shorter than π/2. Let C_ab be the locus of s such that |sa| + |sb| is constant. Then C_ab is a closed curve like an ellipse. Let C_c be the locus of s such that |cs| is constant. Then C_c is a circle of latitude with c as a pole. If the regions bounded by the two curves C_ab and C_c have no common parts, then s might be locally minimal (Figure 2(1)). Otherwise, s cannot be minimal, because moving s into the shared part will shorten the tree (Figure 2(2)). In the latter case, if we assume furthermore that |as| = |bs|, then L''(s) > 0 when s moves along the bisector of ab. Hence, s is a saddle point.
At the end of this subsection, we give a simple example showing that there may exist self-intersecting paths in a Steiner tree on spheres, and that a locally minimal Steiner tree is not necessarily relatively minimum. Let abcd be a quadrilateral with all angles equal, satisfying |bc| = |da| ≪ |ab| = |cd| < π/2. Let G be the Steiner topology on abcd with a Steiner point s₁ joining a, b, and another Steiner point s₂ joining c, d (Figure 3). If abcd lies in the Euclidean plane, then, trivially, no Steiner tree exists with this topology. On the sphere Φ, there are two Steiner trees such that as₁, bs₁, cs₂ and ds₂ are minor arcs but s₁s₂ is a major arc. One Steiner tree, denoted by S₁ in Figure 3, has no intersecting paths, while the other Steiner tree, denoted by S₂, has as₁ intersecting ds₂, and bs₁ intersecting cs₂. Clearly, S₂ cannot be relatively minimum. Note that tan|as₁| > 0, tan|bs₁| > 0 in S₁. We can construct abcd so that π < |s₁s₂| < 3π/2. Therefore, tan|s₁s₂| > 0 and L''(s₁) > 0 for any θ. In that case s₁ is a minimal point. By symmetry, s₂ is also a minimal point. Hence, S₁ is a locally minimal tree. However, s₁ does not minimize L(s₁) = |as₁| + |bs₁| + |s₁s₂|, since the edge s₁s₂ is a major arc. We conclude that the minimal point s₁ in S₁ is not a minimum point. Consequently, S₁ is not a relatively minimum tree. In fact, the relatively minimum tree for abcd with such a topology does not exist on Φ either.

Figure 3: Steiner trees with intersecting edges.

2.2 Steiner Points for Spherical Triangles


Let a, b and c be three points not lying on a great circle of a unit sphere Φ. Since two points can be joined by either a minor or a major arc of a great circle, 3 points constitute 8 triangles [13]. The triangle comprised of three minor arcs is denoted by △abc. In general, each triangle may have a Steiner point, and there exist 8 Steiner points. Trivially, the antipodal point of a Steiner point is also a Steiner point. The 8 Steiner points are partitioned into four pairs. Let them be sᵢ, s̄ᵢ (i = 1, ..., 4), where s̄ᵢ denotes the antipodal point of sᵢ. Figure 4 shows 4 Steiner trees with Steiner points s₁, s̄₁, s₂ and s̄₂, respectively. The other 4 Steiner trees are similar to the trees in Figure 4(3) and Figure 4(4).

First, if △abc is so small that |s₁a|, |s₁b| and |s₁c| are all less than π/2, then s₁ is a minimal point in △abc by Theorem 2.2 (Figure 4(1)). In that case, its antipodal point s̄₁ is a maximal point (Figure 4(2)). Depending on the lengths of the edges, the other Steiner points might be minimal points, maximal points, or saddle points, as argued in the last subsection. On the other hand, if △abc is sufficiently large, then more than one Steiner point may lie in △abc.

Figure 4: Steiner trees for three points.
Take an equilateral △abc as an example. With the increase of the side length, s̄₂ approaches s₁. If |ab| becomes greater than 2 arcsin√(2/3), then s̄₂ goes into △abc and lies on s₁c [3]. (However, as we prove later, s̄₂ cannot be minimal even if it lies in △abc.) By symmetry, there are two other Steiner points lying on s₁a and s₁b separately. When the side length finally equals 2π/3, i.e. when a, b, c lie on an equator, the three Steiner points all collapse into s₁. Hence, the 8 Steiner points collapse into two distinct points.
To investigate the distribution of the Steiner points, without loss of generality, assume a, b lie on the XY-plane so that

\[ a = (-\sin\alpha, -\cos\alpha, 0), \qquad b = (\sin\alpha, -\cos\alpha, 0). \]

The locus of points p = (x, y, √(1 - x² - y²)) on Φ satisfying ∠bpa = constant is referred to as an equiangular curve of ab. Let θ = ∠bpa. Then, by spherical trigonometry,

\[ |ap| = \pi - \arccos(x\sin\alpha + y\cos\alpha), \qquad |bp| = \arccos(x\sin\alpha - y\cos\alpha), \]

and

\[ \theta = \arccos\left( \frac{\cos 2\alpha - \cos|ap|\cos|bp|}{\sin|ap|\,\sin|bp|} \right) = \arccos\left( \frac{(1-x^2)\sin^2\alpha - (1-y^2)\cos^2\alpha}{\sqrt{1-(x\sin\alpha+y\cos\alpha)^2}\,\sqrt{1-(x\sin\alpha-y\cos\alpha)^2}} \right) = \text{constant}. \tag{6} \]

Note that if (x, y) satisfies this equation, then (-x, y), (x, -y) and (-x, -y) satisfy it too. Hence, for any constant θ there are two equiangular curves on a hemisphere, with endpoints a, b and ā, b̄, where ā, b̄ denote the antipodal points of a, b. If θ > |ab|, then the two curves lie on the two sides of the XZ-plane. Conversely, if θ < |ab|, then the two curves lie on the two sides of the YZ-plane. When θ = |ab|, the two equiangular curves meet at the pole. Figure 5, plotted with MAPLE, is the orthogonal projection of the equiangular curves for |ab| = 2.08. When |ab| ≤ 2π/3, the curve of θ = 2π/3 that ends at a and b is denoted by C_{2π/3}(ab), and the other curve of θ = 2π/3, which ends at ā and b̄, is denoted by C̄_{2π/3}(ab).
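For readers who want to reproduce curves like those in Figure 5, the following sketch (our own illustration in Python; the function name is hypothetical) evaluates θ = ∠bpa directly from the spherical cosine rule, so that tracing the level set θ = 2π/3 of this function over the disk x² + y² ≤ 1 yields the equiangular curves of Equation (6); for |ab| = 2.08 as in Figure 5, take α = 1.04.

    import numpy as np

    def equiangular_theta(x, y, alpha):
        # theta = angle(bpa) at p = (x, y, sqrt(1 - x^2 - y^2)) for
        # a = (-sin(alpha), -cos(alpha), 0), b = (sin(alpha), -cos(alpha), 0);
        # note cos|ab| = a.b = cos(2*alpha), as in Equation (6)
        a = np.array([-np.sin(alpha), -np.cos(alpha), 0.0])
        b = np.array([ np.sin(alpha), -np.cos(alpha), 0.0])
        p = np.array([x, y, np.sqrt(max(0.0, 1.0 - x*x - y*y))])
        ap = np.arccos(np.clip(a @ p, -1.0, 1.0))   # |ap|
        bp = np.arccos(np.clip(b @ p, -1.0, 1.0))   # |bp|
        c = (np.cos(2*alpha) - np.cos(ap)*np.cos(bp)) / (np.sin(ap)*np.sin(bp))
        return np.arccos(np.clip(c, -1.0, 1.0))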

Theorem 2.3 Suppose s is a minimal point in △abc, and |ab| ≤ 2π/3.
(1) The distance from s to ab is no more than π/2.
(2) ∠sab ≤ π/2, ∠abs ≤ π/2.
Figure 5: Orthogonal projection of equiangular curves (|ab| = 2.08), for θ = 2.05, 2.08 and 2π/3 (≈ 2.09).

Proof. It is clearly seen from Figure 5 that if s lies on C̄_{2π/3}(ab), then ∠bsa becomes larger than 2π/3 when sc shrinks. It follows that L''(s) < 0. Hence, a Steiner point on C̄_{2π/3}(ab) cannot be locally minimal. However, if s lies on C_{2π/3}(ab), then s lies in △abp, where p is the pole, whose orthogonal projection is the center o. Clearly, the theorem holds in this case. □

Theorem 2.4 If all sides of △abc are no more than arccos(-1/3), then there is at most one Steiner point lying in △abc.

Proof. Suppose a'b'd is an equilateral triangle such that d lies on C̄_{2π/3}(a'b'). Let |a'b'| = x. Then, by the cosine rule,

\[ \cos x = \cos^2 x + \sin^2 x \cos\frac{2\pi}{3} = \cos^2 x - \frac{\sin^2 x}{2}. \]

The solution is x = arccos(-1/3). This proves that C̄_{2π/3}(ab) cannot meet △abc at its interior points if |ab| ≤ arccos(-1/3). By symmetry, C̄_{2π/3}(bc) and C̄_{2π/3}(ca) cannot meet the interior points of △abc either. Hence, there is at most one Steiner point, namely the intersection of C_{2π/3}(ab), C_{2π/3}(bc) and C_{2π/3}(ca). The theorem is proved. □
Remark 2.2 Note that arccos(-1/3) equals 2 arcsin√(2/3) (≈ 0.6082π), which has been mentioned above. Obviously, if △abc is equilateral, then s̄₂ lies on C̄_{2π/3}(ab). Hence, s̄₂ is not minimal even if it lies in △abc.

2.3 Number of Steiner Trees with a Full Topology

Suppose F is a full Steiner topology spanning a set A of n points. In the Euclidean plane there is at most one Steiner tree on A with a topology in D(F), the topology family consisting of all degeneracies of F [9]. We have seen that this is not true for spherical triangles. Suppose now that a full Steiner topology F on a unit sphere Φ spans n terminals a₁, a₂, ..., aₙ, with n - 2 Steiner points s₁, s₂, ..., sₙ₋₂. For any point p on Φ, let p also denote the vector op. As explained at the end of Section 1, to determine Steiner trees on Φ it is preferable to use normal vectors. Let the unit normal vectors of the 2n - 3 edges eᵢ be nᵢ (i = 1, 2, ..., 2n - 3). It is easily seen [1] that these normal vectors are determined by

\[ \|n_i\| = 1 \quad (i = 1, 2, \ldots, 2n-3), \tag{7} \]
\[ n_i \cdot a_i = 0 \quad (i = 1, 2, \ldots, n), \tag{8} \]

where aᵢ is the terminal incident to eᵢ, and

\[ n_i + n_j + n_k = 0 \tag{9} \]

for each Steiner point sₘ whose edges are eᵢ, eⱼ and eₖ.
Once all the nᵢ are obtained, we can find the Steiner points by solving another set of equations. The Steiner point sₘ whose edges are eᵢ, eⱼ and eₖ is determined by

\[ \|s_m\| = 1, \qquad s_m \cdot n_i = 0, \qquad s_m \cdot n_j = 0. \tag{10} \]

Remark 2.3 There is a redundant equation sₘ · nₖ = 0 because of Equation (9).

Since Equations (8) and (9) are linear and Equations (7) are quadratic, the system determining the normal vectors has at most 2^{2n-3} solutions. Note that if nᵢ is a solution for the normal vectors, then -nᵢ is also a solution. However, both nᵢ and -nᵢ give the same solution of system (10). Since system (10) is quadratic, the following theorem holds [1]:

Theorem 2.5 There are at most 2^{2n-3} Steiner trees on a sphere that span n terminals with the same topology.
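To illustrate how the system (7)-(10) can be attacked numerically in the simplest case n = 3 (a spherical triangle, where 2n - 3 = 3), the following sketch (our own; it is not the algorithm of [1], and the function name is hypothetical) hands the nine polynomial equations for the normal vectors to a generic root finder and recovers Steiner point candidates from (10) as ±(n₁ × n₂)/‖n₁ × n₂‖:

    import numpy as np
    from scipy.optimize import fsolve

    def steiner_point_candidates(a1, a2, a3, trials=50, seed=0):
        A = [np.asarray(v, dtype=float) for v in (a1, a2, a3)]

        def F(v):                      # nine equations in nine unknowns
            n = v.reshape(3, 3)
            eqs  = [n[i] @ n[i] - 1.0 for i in range(3)]   # (7): ||n_i|| = 1
            eqs += [n[i] @ A[i] for i in range(3)]         # (8): n_i . a_i = 0
            eqs += list(n[0] + n[1] + n[2])                # (9): n1 + n2 + n3 = 0
            return np.array(eqs)

        rng, found = np.random.default_rng(seed), []
        for _ in range(trials):        # random restarts pick up distinct roots
            v, _, ok, _ = fsolve(F, rng.standard_normal(9), full_output=True)
            if ok == 1 and np.linalg.norm(F(v)) < 1e-9:
                n = v.reshape(3, 3)
                s = np.cross(n[0], n[1])       # (10): s . n1 = s . n2 = 0
                if np.linalg.norm(s) > 1e-9:
                    s = s / np.linalg.norm(s)
                    for cand in (s, -s):
                        if not any(np.allclose(cand, f, atol=1e-6) for f in found):
                            found.append(cand)
        return found

Consistently with Remark 2.3 and Theorem 2.5, the solutions nᵢ and -nᵢ collapse to the same pair ±s, and for a spherical triangle at most 2^{2·3-3} = 8 distinct Steiner points are found.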
Figure 6: Approximations of the Steiner point in a triangle.

2.4 Constructing Relatively Minimum Trees on Spheres

Given a point set A on a sphere Φ and a Steiner topology G for A, a simple iterative process for constructing a relatively minimum Steiner tree for A is as follows: starting with any tree spanning A with topology G, replace an additional point with the minimum point of its three adjacent vertices. This replacement always reduces the length of the tree. Repeat this one-point-replacement procedure till all additional points become minimum points with respect to their adjacent vertices, or, in practice, till the required accuracy of the tree length is achieved; a sketch of the loop is given below. Clearly, the core of this descent algorithm is to find the minimum point of a spherical triangle, i.e. to find the Steiner points of a spherical triangle, because a minimum point is either a Steiner point or a vertex of the triangle.
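A minimal sketch of the one-point-replacement loop follows (our own illustration; names are hypothetical, and the inner minimization is delegated to a general-purpose optimizer instead of the exact spherical constructions discussed next, so this is a sketch of the descent scheme rather than of the algorithm of [1]):

    import numpy as np
    from scipy.optimize import minimize

    def geo(p, q):                     # great-circle distance of unit vectors
        return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

    def minimum_point(a, b, c, start):
        # minimize L(p) = |pa| + |pb| + |pc| over the unit sphere, the sphere
        # being parametrized by normalizing an ambient point of R^3
        f = lambda v: sum(geo(v / np.linalg.norm(v), w) for w in (a, b, c))
        v = minimize(f, start, method="Nelder-Mead").x
        return v / np.linalg.norm(v)

    def descent(terminals, steiner, adjacency, sweeps=100, tol=1e-10):
        # adjacency[k] lists the three neighbours of Steiner point k as
        # ('t', i) for terminal i or ('s', j) for Steiner point j
        point = lambda tag, i: terminals[i] if tag == 't' else steiner[i]
        for _ in range(sweeps):
            moved = 0.0
            for k, nbrs in enumerate(adjacency):
                a, b, c = (point(*nb) for nb in nbrs)
                new = minimum_point(a, b, c, steiner[k])
                moved = max(moved, np.linalg.norm(new - steiner[k]))
                steiner[k] = new
            if moved < tol:            # all replacements have converged
                break
        return steiner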
To date we do not have an algorithm that can compute the exact Steiner points of a spherical triangle. Although the Steiner points can be determined as described in the last subsection, this requires solving the set of Equations (7), (8) and (9), whose total degree is eight. The advantage of this approach is that any established numerical program for solving algebraic equations can be employed for this aim. The disadvantage is that such commonly used programs might not be efficient in this particular case. Below we sketch two approximations.

Suppose we already have a tree spanning the n terminals with the given Steiner topology. Suppose s is an additional point in the tree whose adjacent vertices are a, b and c. Usually, the larger n is, the smaller △abc is. Therefore, we assume that △abc satisfies Theorem 2.4 and there is merely one Steiner
point in △abc. Let △̄abc denote the planar triangle with a, b and c as its vertices. Let s₀ be the Steiner point of △̄abc, and let s₀* be the point on Φ that is the central projection of s₀. It is suggested in [4] to take s₀* as an approximation of s. Another approximation can be constructed as follows [18]. Let d be the point in △̄abc such that od ⊥ △̄abc. Since |oa| = |ob| = |oc|, d is the circumcenter of △̄abc. Let d' be the intersection of Φ with the extension of od. Let P be the tangent plane at d'. Then P is parallel to the plane determined by △̄abc. Let a', b' and c' be the orthogonal projections of a, b and c onto P (Figure 6). Then, △̄abc is parallel to and congruent with △a'b'c'. Note that

\[ a' = d' + d' \times (a \times d'), \quad b' = d' + d' \times (b \times d'), \quad c' = d' + d' \times (c \times d'). \]

It follows that

\[ L'(d') = |a'd'| + |b'd'| + |c'd'| = \sin|ad'| + \sin|bd'| + \sin|cd'| \]

is an approximation of

\[ L(d') = |ad'| + |bd'| + |cd'| \]

since sin x ≈ x if x is small. Let s₀' be the Steiner point of △a'b'c', i.e. the orthogonal projection of s₀. Let s̃₀ be the intersection of s₀s₀' with Φ. Since s₀' is the minimum point in △a'b'c' that minimizes L'(s₀'), s̃₀ is a good approximation of s, which should minimize L(s).
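The second approximation is easy to transcribe; the sketch below (our own, with hypothetical names; the planar Steiner point is computed here by a Weiszfeld-type iteration, a choice of ours that is adequate only when all angles of the planar triangle are below 2π/3) follows the construction step by step:

    import numpy as np

    def fermat_point(pts, iters=200):
        # Weiszfeld iteration for the point minimizing the sum of Euclidean
        # distances to the three coplanar points in pts (a 3x3 array)
        x = pts.mean(axis=0)
        for _ in range(iters):
            d = np.maximum(np.linalg.norm(pts - x, axis=1), 1e-12)
            w = 1.0 / d
            x = (pts * w[:, None]).sum(axis=0) / w.sum()
        return x

    def approximate_steiner(a, b, c):
        # d' = point of the sphere over the circumcenter d of the planar
        # triangle abc, i.e. the unit normal of the plane through a, b, c
        n = np.cross(b - a, c - a)
        d1 = n / np.linalg.norm(n)
        if d1 @ (a + b + c) < 0.0:
            d1 = -d1                   # orient d' towards the triangle
        # orthogonal projection onto the tangent plane P at d':
        # x' = d' + d' x (x x d'), as in the formula above
        proj = lambda x: d1 + np.cross(d1, np.cross(x, d1))
        s0p = fermat_point(np.array([proj(a), proj(b), proj(c)]))
        # intersect the normal line through s0' (the line s0 s0') with the sphere
        tau = -1.0 + np.sqrt(max(0.0, 2.0 - s0p @ s0p))
        return s0p + tau * d1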

3 Steiner Ratios on Spheres


Let L_{S(A)} be the length of the Steiner minimal tree and L_{T(A)} the length of the minimal spanning tree on a set A, respectively. The Steiner ratio ρ(A) for A is the ratio of L_{S(A)} to L_{T(A)}. The Steiner ratio problem in a metric space M asks for the infimum of the Steiner ratios over all sets in M:

\[ \rho_M = \inf_{A \subset M} \rho(A) = \inf_{A \subset M} \frac{L_{S(A)}}{L_{T(A)}}. \]

As we stated before, constructing S(A) is an NP-hard problem, while constructing T(A) is an easy problem, which can be solved in polynomial time. Therefore, the Steiner ratio can be used to measure how much the length of a network can be reduced by inserting additional points.
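As a small numeric illustration (our own, under the assumption that for a sufficiently small equilateral triangle inscribed in a cap the Steiner minimal tree is the star from the pole, cf. Theorem 2.2), the ratio of an equilateral spherical triangle can be written in closed form and tends to √3/2 as the cap shrinks:

    import numpy as np

    def ratio_equilateral(r):
        # three points on a circle of cap radius r, pairwise pole angle 2*pi/3;
        # side length from the spherical cosine rule
        side = np.arccos(np.cos(r)**2 + np.sin(r)**2 * np.cos(2*np.pi/3))
        pole_tree = 3.0 * r            # star from the pole, edges of length r
        mst = 2.0 * side               # two sides of the triangle
        # for large caps the pole tree exceeds the spanning tree and the
        # Steiner minimal tree degenerates to the spanning tree (cf. Section 4)
        return min(pole_tree / mst, 1.0)

    for r in (0.01, 0.5, 1.0, np.pi / 2):
        print(r, ratio_equilateral(r))   # approx. 0.866, 0.876, 0.919, 1.0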
In 1992, Du and Hwang first proved [5] that the Steiner ratio in the Euclidean plane is √3/2. Recently, the Steiner ratio on spheres was proved to be the same as in the Euclidean plane [12]. The proof of the latter result is based on a theorem called the compression theorem. In this section we give an outline of this proof [19].

3.1 Compression Theorem


Suppose a₁b₁c₁ is a (minimal geodesic) triangle on a convex surface M with curvature K_M > 0 in R³. By geometric measure theory, the area of △a₁b₁c₁ and the length of any curve in △a₁b₁c₁ are measured by a (countable) number of very small disks (or other sets, e.g. triangles) covering △a₁b₁c₁. The set of disks is referred to as a cover. When we compress the surface, or more precisely, when we embed △a₁b₁c₁ in a flatter surface H, e.g. a sphere with constant curvature K_H (≤ K_M), preserving the side lengths, the triangle tends to contract. That is, the disks covering the embedded triangle on H have more overlapping parts. As a result, the distance between any two points in the triangle becomes shorter. A precise description of this fact is the following theorem [19],[12]:

Theorem 3.1 (Compression Theorem) Let M be a convex surface with curvature K_M > 0, and let H be a simply connected surface with constant positive curvature K_H ≤ K_M. Let aᵢbᵢcᵢ (i = 1, 2) be two (minimal geodesic) triangles of equal side lengths on M and H respectively. Then, there exists a continuous map h of the triangle a₁b₁c₁ onto the triangle a₂b₂c₂ such that for any two points p and q in a₁b₁c₁, |pq| ≥ |h(p)h(q)|.

Note that the angle at a vertex is measured by the distance between the sides of the angle. Hence, the angles of the embedded triangle become smaller too. This is a known result in Riemannian geometry, called the comparison theorem. The following is a symmetric form of the original comparison theorem proved by Toponogov [2].

Theorem 3.2 (Comparison Theorem) Let M be a convex surface with curvature K_M > 0, and let H be a simply connected surface with constant positive curvature K_H ≤ K_M. Let aᵢbᵢcᵢ (i = 1, 2) be two (minimal geodesic) triangles of equal side lengths on M and H respectively. Let αᵢ, βᵢ, γᵢ (i = 1, 2) be the angles at aᵢ, bᵢ, cᵢ. Then α₂ ≤ α₁, β₂ ≤ β₁, γ₂ ≤ γ₁.

To prove Theorem 3.1 we need a lemma [12].

Lemma 3.3 Suppose pᵢ and qᵢ (i = 1, 2) are two points on aᵢbᵢ and aᵢcᵢ with |a₁p₁| = |a₂p₂| and |a₁q₁| = |a₂q₂| respectively. Then |p₁q₁| ≥ |p₂q₂|.

Proof. (sketch) By the comparison theorem we have

\[ \angle a_1 b_1 c_1 \ge \angle a_2 b_2 c_2, \qquad \angle a_1 c_1 b_1 \ge \angle a_2 c_2 b_2. \]

Hence, when b₁c₁ moves to a₁ and b₂c₂ moves to a₂ synchronically, |b₂c₂| decreases faster than |b₁c₁| by the variational argument [10]. □

Proof of Theorem 3.1 (sketch) Suppose abb'a' is a quadrilateral on M with |ab| = |a'b'|. Let a move to b, and a' move to b' synchronically, so that the geodesic joining a point on ab and the corresponding point on a'b' sweeps the whole region bounded by abb'a'. Then, any point p* in abb'a' lies on a geodesic pp' such that p is on ab, p' is on a'b', and |ap| = |a'p'|. Hence, we can define a map s of abb'a' to ab so that s(p*) = p for any p* in abb'a'. By the triangle inequality of metrics, the distance between two points p and q in abb'a' is reduced by this map s. That is, |pq| ≥ |s(p)s(q)|.

Now let pᵢ, qᵢ be the midpoints of aᵢbᵢ, aᵢcᵢ (i = 1, 2) respectively. Then by the above lemma we have |p₁q₁| ≥ |p₂q₂|. Hence, we can construct a triangle a₁p₁'q₁' in △a₁b₁c₁ so that it has the same side lengths as △a₂p₂q₂. Let rᵢ be the midpoint of bᵢcᵢ (i = 1, 2). Similarly, we have |p₁r₁| ≥ |p₂r₂|, |r₁q₁| ≥ |r₂q₂|, and we can construct △b₁r₁'p₁'' and △c₁q₁''r₁'' so that they have the same side lengths as △b₂r₂p₂ and △c₂q₂r₂ respectively (Figure 7). Moreover, since |a₁p₁'| = |a₁p₁|, |b₁p₁''| = |b₁p₁| and so on, by the triangle inequality of metrics the three small triangles in △a₁b₁c₁ do not meet. Therefore, we can construct a fourth triangle p₁*r₁*q₁* in the polygon p₁'p₁''r₁'r₁''q₁''q₁' so that the triangle p₁*r₁*q₁* has the same side lengths as the triangle p₂r₂q₂. In this way, once a₂b₂c₂ is partitioned into 4 small triangles, a₁b₁c₁ can correspondingly be partitioned into 4 small triangles with the same side lengths, plus some gaps between them and the sides of △a₁b₁c₁. These gaps are: 6 triangles like a₁p₁p₁', 3 quadrilaterals like p₁'p₁*q₁*q₁', and 3 quadrilaterals like p₁p₁'p₁*p₁''. Now we redefine and extend the distance-reducing map s as a map of the three types of gaps: a₁p₁p₁' to a₂p₂ and so on, p₁'p₁*q₁*q₁' to p₂q₂ and so on, and p₁p₁'p₁*p₁'' to p₂ and so on. Clearly, s preserves the property of distance reduction. Summing up, we have obtained a continuous map s that maps the gaps in a₁b₁c₁ to the sides of the small triangles in a₂b₂c₂. Repeating this operation, we triangulate a₂b₂c₂ and construct the corresponding triangles in a₁b₁c₁ so that all triangles become arbitrarily small. In the end, we establish a cover of a₂b₂c₂ consisting of triangles, all of whose corresponding triangles in a₁b₁c₁ are separated by gaps. Hence, the compression theorem holds, in which the required continuous map h is a natural extension of s [12]. □

3.2 The Steiner Ratio on Spheres


Suppose aᵢbᵢcᵢ (i = 1, 2) are two triangles of equal side lengths lying on spheres Φᵢ with radii r₁, r₂ (r₁ < r₂) respectively. By the compression theorem there is a map h : a₁b₁c₁ → a₂b₂c₂ such that for any two points p₁ and q₁ in a₁b₁c₁ we have |p₁q₁| ≥ |h(p₁)h(q₁)|. Moreover, if p₁, q₁ are not on the same side, then the inequality holds strictly. The following lemma, which is an application of the compression theorem, is the key to the proof of the Steiner ratio on spheres.

Lemma 3.4 Suppose S₁ is a full Steiner minimal tree in △a₁b₁c₁ such that all terminals a₁ⁱ (i = 1, 2, ...) lie on the sides of △a₁b₁c₁. Let S₂ be the Steiner minimal tree in △a₂b₂c₂ spanning a₂ⁱ = h(a₁ⁱ) (i = 1, 2, ...). Then |S₁| ≥ |S₂|.

Proof. Let s₁ⁱ (i = 1, 2, ...) be the Steiner points in S₁, and let s₂ⁱ = h(s₁ⁱ) (Figure 8). Let S₂' be the tree spanning the a₂ⁱ and s₂ⁱ with the topology of S₁. (Note that the topology of S₂' might differ from the topology of S₂.) By the compression theorem each edge of S₁ is no shorter than the corresponding edge in S₂', and consequently, |S₁| ≥ |S₂'|. On the other hand, by the definition of the Steiner minimal tree, |S₂'| ≥ |S₂|. Hence, |S₁| ≥ |S₂|. □

Figure 8: A Steiner tree with terminals on the sides of a triangle.

Now we outline the proof of the Steiner ratio on spheres [12]. Let S₁ be a full Steiner minimal tree on a set of points a₁ⁱ (i = 1, 2, ...). We choose the lengths of the edges of S₁ as the parameters. Two terminals are called adjacent if they are connected by a convex path in S₁. By perturbation we may assume that the distance between two adjacent terminals is less than π, and that not all terminals lie on a great circle. Suppose a₁ⁱ and a₁ʲ are adjacent terminals. Let Cᵢⱼ be a curve that, at the beginning, coincides with the convex path Pᵢⱼ in S₁ that connects a₁ⁱ and a₁ʲ. Then, separate Cᵢⱼ from Pᵢⱼ by shrinking Cᵢⱼ till Cᵢⱼ becomes the minor arc of the great circle through a₁ⁱ and a₁ʲ, by the assumption that |a₁ⁱa₁ʲ| < π. When Pᵢⱼ is a spiral curve, the surface of the sphere is regarded as a Riemann surface, and hence Cᵢⱼ moves on different levels in the process of shrinkage. However, Cᵢⱼ is unique in any case. All such shortest curves compose a polygon P₁, referred to as the characteristic polygon in [5]. A crucial fact is that, because S₁ lies in P₁, P₁ can be triangulated into a graph Γ̃₁ so that every triangle in Γ̃₁ has side lengths less than π [12].

Now suppose S₁ achieves the Steiner ratio ρ₁ on Φ₁. Following Du and Hwang [5], consider only the minimal spanning trees that lie in the region bounded by P₁ where S₁ lies. These minimal spanning trees are called inner minimal spanning trees. Hence, ρ₁ ≥ |S₁|/|T₁|, where T₁ is an inner minimal spanning tree. Note that the set of terminals is the set of vertices of P₁. Let Γ₁ be the union of P₁ and all inner minimal spanning trees. It has been proved that no edges of two minimal spanning trees can cross [10]. Hence, Γ₁ is a subgraph of Γ̃₁ by the definition. Corresponding to each triangle △₁ʲ (j = 1, 2, ...) of Γ̃₁, we construct a triangle △₂ʲ on Φ₂ with the same (absolute) side lengths. Since P₁ is simple and all terminals are vertices of P₁, there is no loop of triangles in Γ̃₁. This guarantees that we can combine all the △₂ʲ into a graph, denoted by Γ̃₂. Since r₁ < r₂, the circular length of each edge of Γ̃₂ is strictly less than π. The subgraph of Γ̃₂ corresponding to Γ₁ is called Γ₂.

Lemma 3.5 The Steiner ratio ρ₁ on Φ₁ is no less than the Steiner ratio ρ₂ on Φ₂ if r₂ - r₁ is sufficiently small.

Proof. Consider any full Steiner minimal tree S₁ on Φ₁. As stated above, we can map Γ₁ onto Φ₂ so that all edges in the graph Γ₂ have the same lengths as the corresponding edges in Γ₁. Consider the part S₁ʲ of S₁ lying in some triangle △₁ʲ. The length of S₁ʲ is no less than the length of the corresponding Steiner minimal tree S₂ʲ by Lemma 3.4. Hence, the union of all the S₂ʲ composes a tree S₂' spanning the terminals of Γ₂, and the length of S₂' is no longer than that of S₁. On the other hand, |S₂'| ≥ |S₂|, where S₂ is the Steiner minimal tree spanning the terminals of Γ₂. Hence, |S₁| ≥ |S₂|.

Now we compare the minimal spanning trees spanning Γ₁ and Γ₂. Let e₁ⁱ, i = 1, 2, ..., be the diagonals of the polygons in Γ₁. Consider the difference between the length of a diagonal e₁ⁱ and the lengths of the other sides of the polygon in Γ₁ in which the diagonal e₁ⁱ lies. Let ε be the smallest of these differences. Then, if r₂ - r₁ is sufficiently small so that |e₂ⁱ| > |e₁ⁱ| - ε, the corresponding edges e₂ⁱ are still diagonals of the polygons in Γ₂. It follows that the inner minimal spanning tree for Γ₂ has the same length as the inner minimal spanning tree for Γ₁. Thus, we have ρ₁ ≥ ρ₂. □

Theorem 3.6 The Steiner ratio ρ for spheres is √3/2, the same as in the Euclidean plane.

Proof. By Lemma 3.5, for a fixed full Steiner minimal tree S on a sphere Φ, its Steiner ratio is a non-increasing function of the radius r of Φ. Obviously, when r goes to infinity, the ratio of the Steiner minimal tree for equilateral triangles approaches √3/2. Hence, the theorem holds. □
4 Steiner Minimal Trees for Cocircular Points on Spheres
4.1 History of the Problem
Suppose all vertices of a polygon P = a₁a₂⋯aₙ lie on a circle C. In 1934, Jarník and Kössler studied the problem of constructing the shortest network connecting the vertices of a regular n-gon. If n ≤ 5, then the shortest network for P is not the minimal spanning tree, because it contains Steiner points. Hence, their paper is regarded as the first paper on the 'genuine' Steiner tree problem, though the Steiner tree problem can be traced back to a problem raised by Fermat [9]. The Steiner tree problem for regular polygons was completely solved half a century later [14],[6]. The conclusion is as follows:

Theorem 4.1 If n ≥ 6, then the Steiner minimal tree S for the vertices of a regular n-gon coincides with the minimal spanning tree. If n ≤ 5, then S is a full Steiner tree whose length is less than the length of the minimal spanning tree.

After Jarník and Kössler, in the late sixties, Graham raised a conjecture, which we state as a theorem since it has been proved by Rubinstein and Thomas [11].

Theorem 4.2 (Graham's Conjecture) Suppose all vertices of a polygon P = a₁a₂⋯aₙ lie on a circle C of radius r in the Euclidean plane. If at most one of the sides of the polygon P has length strictly greater than the radius r, then the Steiner minimal tree for P coincides with the minimal spanning tree that consists of all sides of P with a longest side removed.

The condition in this theorem will be referred to as the Graham condition. Let Δ be a collection of point sets A in a metric space M. Then the Steiner ratio for Δ is defined as

\[ \rho(\Delta) = \inf_{A \in \Delta} \rho(A) = \inf_{A \in \Delta} \frac{L_{S(A)}}{L_{T(A)}} \le 1. \]

For any set A ∈ Δ we have

\[ \rho(A) - \rho(\Delta) = \frac{L_{S(A)}}{L_{T(A)}} - \rho(\Delta) = \frac{1}{L_{T(A)}}\bigl(L_{S(A)} - \rho(\Delta)\, L_{T(A)}\bigr) \ge 0, \]

with equality when ρ(Δ) is achieved by A.
Note that S(A) = T(A) means ρ(A) = 1. Hence, if Δ is the collection of all sets of cocircular points satisfying the Graham condition, then, in terms of the Steiner ratio, Theorem 4.2 is equivalent to ρ(Δ) = 1. In this section we show that the Steiner minimal tree for a spherical polygon P always coincides with the minimal spanning tree of P if a generalized Graham condition is satisfied or if all points of P lie on an equator.

4.2 Graham's Conjecture on Spheres


Suppose P = a₁a₂⋯aₙ is a spherical polygon whose vertices lie on a unit sphere Φ. For convenience let aₙ₊₁ = a₁. Assume all the aᵢ of P lie on a circle C with radius r. Let p be the center of the spherical cap bounded by C. (By a spherical cap we mean the part of the surface of Φ bounded by a circle.) The point p is referred to as the pole of the cap. Note that the distance from p to any point on C is the same. This distance, which is longer than r, is denoted by r̂ and referred to as the cap radius of C. Clearly

\[ r = \sin \hat{r}. \]

The condition that at most one of the sides of P has length strictly greater than r̂ is referred to as the spherical Graham condition. Then, we can generalize Graham's conjecture to the Steiner minimal trees for cocircular points on spheres [17].
By the triangle inequality of metrics, the minimal spanning tree of P consists of all sides of P with a longest side removed. Let Δ be the collection of all point sets satisfying the spherical Graham condition. Then, we wish to show ρ(Δ) = 1. Suppose P is the polygon achieving ρ(Δ) and S is the Steiner minimal tree on P. As in the proof of the Steiner ratio, two neighbouring vertices aᵢ₋₁ and aᵢ are connected by a convex path of S. Because S is a tree, the cap is partitioned into n regions by these paths. By perturbation of the terminals, we may assume that the pole p of the cap lies strictly in one of these regions, say the region bounded by aₙa₁ and the convex path connecting a₁ and aₙ. Since the region is convex, no part of S lies in △paₙa₁.

First we claim that all sides other than aₙa₁ are less than or equal to r̂. Suppose to the contrary that |aᵢ₋₁aᵢ| > r̂ for some i (2 ≤ i ≤ n). Then by the spherical Graham condition, |aᵢ₋₁aᵢ| > |aₙa₁|. Let the parts of S lying in the polygons pa₁⋯aᵢ₋₁, paᵢ₋₁aᵢ and paᵢ⋯aₙ be S₁ⁱ⁻¹, Sᵢ and Sᵢⁿ, respectively. We construct a new polygon P' = a₁'a₂'⋯aₙ' by rotating
pa₁⋯aᵢ₋₁ together with S₁ⁱ⁻¹ around p towards paᵢ, and at the same time rotating paᵢ⋯aₙ together with Sᵢⁿ towards paᵢ₋₁, until |a'ᵢ₋₁a'ᵢ| = |a'ₙa'₁| (Figure 9).

Figure 9: The proof of the generalized Graham's conjecture.

Let Sᵢ meet paᵢ₋₁ and paᵢ at the points pⱼ (j = 1, 2, ...), and let the Steiner minimal tree for the points pⱼ in △paᵢ₋₁aᵢ be S̄ᵢ. Because at least one edge of S̄ᵢ does not perpendicularly intersect paᵢ₋₁ or paᵢ, by the variational argument S̄ᵢ is strictly shortened when paᵢ₋₁ rotates towards paᵢ and when paᵢ rotates towards paᵢ₋₁. That is, |S̄ᵢ'| < |S̄ᵢ|, and

\[ |S(P')| \le |S_1^{i-1}| + |\bar{S}_i'| + |S_i^n| < |S_1^{i-1}| + |S_i| + |S_i^n| = |S(P)|. \]

However, the minimal spanning tree for P' has the same length as the minimal spanning tree for P. It follows that ρ(P) > ρ(P'), contradicting that ρ(P) = ρ(Δ).
Now we construct a polygon P̄ = ā₁ā₂⋯āₙ, inscribed in a circle C̄ with center p̄ in the Euclidean plane, such that |p̄āᵢ| = |paᵢ| = r̂ and |āᵢ₋₁āᵢ| = |aᵢ₋₁aᵢ| (2 ≤ i ≤ n). By the comparison theorem (Theorem 3.2), ∠aᵢpaᵢ₋₁ > ∠āᵢp̄āᵢ₋₁ (2 ≤ i ≤ n). It follows that ∠ā₁p̄āₙ > ∠a₁paₙ. Consequently, |ā₁āₙ| > |a₁aₙ|, for otherwise the contradiction ∠ā₁p̄āₙ ≤ ∠a₁paₙ appears, again by the comparison theorem. Since ā₁āₙ is the only side possibly longer than |p̄ā₁|, P̄ satisfies the Graham condition in the plane. Therefore, |S(P̄)| = |T(P̄)| by Theorem 4.2. By the construction we also have |T(P̄)| = |T(P)|.

The last step is to show that |S(P)| ≥ |S(P̄)|. Let Sᵢ be the parts of S lying in △paᵢ₋₁aᵢ (2 ≤ i ≤ n). Let pⱼⁱ, j = 1, 2, ..., be the terminals of Sᵢ that lie on paᵢ₋₁ and paᵢ. Let p̄ⱼⁱ be the corresponding points lying on the corresponding sides p̄āᵢ₋₁ and p̄āᵢ. Then, by Lemma 3.4, Sᵢ is not shorter than the Steiner minimal tree S̄ᵢ on the points p̄ⱼⁱ. Hence, finally we obtain

\[ |T(P)| \ge |S(P)| = \sum_i |S_i| \ge \sum_i |\bar{S}_i| \ge |S(\bar{P})| = |T(\bar{P})| = |T(P)|. \]
Summing up all the arguments, we have the following conclusion:

Theorem 4.3 Suppose all vertices of a polygon P = a₁a₂⋯aₙ lie on a circle C on a sphere Φ with cap radius r̂. If the spherical Graham condition is satisfied, i.e. if at most one of the sides of the polygon P has length strictly greater than the cap radius r̂, then the Steiner minimal tree for P coincides with the minimal spanning tree that consists of all sides of P with a longest side removed.
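Theorem 4.3 suggests a simple computational check; the sketch below (our own, with hypothetical names; it assumes the input points are unit vectors given in order around a common circle whose cap radius is below π/2) tests the spherical Graham condition and, when it holds, returns the minimal spanning tree:

    import numpy as np

    def graham_mst(points):
        P = np.asarray(points, dtype=float)
        n = len(P)
        arc = lambda u, v: np.arccos(np.clip(u @ v, -1.0, 1.0))
        sides = [arc(P[i], P[(i + 1) % n]) for i in range(n)]
        # pole of the cap: unit normal of the plane of the circle, oriented
        # towards the points (valid when the cap radius is below pi/2)
        w = np.cross(P[0] - P[1], P[1] - P[2])
        p = w / np.linalg.norm(w)
        if p @ P[0] < 0.0:
            p = -p
        cap_radius = arc(p, P[0])
        if sum(s > cap_radius + 1e-12 for s in sides) > 1:
            return None                # spherical Graham condition violated
        longest = int(np.argmax(sides))
        # by Theorem 4.3 the SMT is the MST: all sides but a longest one
        return [(i, (i + 1) % n) for i in range(n) if i != longest]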

4.3 Steiner Minimal Trees for Points on Equators


It is worth noticing that there is a difference between the generalized version of Graham's conjecture (Theorem 4.3) and the original one (Theorem 4.2). For example, the Steiner minimal tree on a regular pentagon in the Euclidean plane is not the minimal spanning tree, by Theorem 4.1. From the viewpoint of Theorem 4.2, the reason is that the regular pentagon does not satisfy the Graham condition, though all vertices of the regular pentagon lie on a circle. However, if the circle circumscribing the regular pentagon has a sufficiently large cap radius on Φ, say near π/2, then the Steiner minimal tree for the regular pentagon on Φ can coincide with the minimal spanning tree. This fact leads us to study the sets whose points lie on an equator E [17]. Let P be such a set. Then the sides of P are arcs of E. Suppose S(P) lies on the hemisphere whose pole is p.

If two terminals a and b in S(P) join the same Steiner point s, then the subtree consisting of sa and sb is referred to as a cherry, and a, b are referred to as the terminals of the cherry.

Lemma 4.4 Suppose sab is a cherry with |ab| < 2π/3. Then ∠sab < π/2 and ∠abs < π/2. Consequently, a and b are adjacent vertices of P.

Proof. As a Steiner point in S(P), s is locally minimal. Hence, s lies on the equiangular curve C_{2π/3}(ab). By Theorem 2.3, ∠sab < π/2, ∠abs < π/2. It follows by the angle condition that a and b are adjacent vertices. □

Lemma 4.5 Suppose sab is a cherry with |ab| ≤ π/2. Then ∠sab + ∠abs ≤ π/2.

Proof. Let α = ∠sab, β = ∠abs, and let Area denote the area of △sab. Since

\[ \alpha + \beta = \pi + \mathrm{Area} - \angle bsa = \frac{\pi}{3} + \mathrm{Area}, \]

α + β achieves its maximum when Area achieves its maximum. Obviously, Area achieves its maximum when α = β and |ab| = π/2. In that case, |sa| = |sb|, and by the sine rule,

\[ \sin|sa| = \sin|ab|\,\frac{\sin\alpha}{\sin\angle bsa} = \frac{2\sin\alpha}{\sqrt{3}}. \]

By the cosine rule,

\[ 0 = \cos|ab| = \cos^2|sa| + \sin^2|sa|\cos\angle bsa = 1 - \frac{3}{2}\sin^2|sa| = 1 - 2\sin^2\alpha. \]

It follows that α = π/4 and α + β ≤ π/2. □

Theorem 4.6 If all vertices of a polygon P = a₁a₂⋯aₙ lie on an equator E of a sphere, then the Steiner minimal tree for P coincides with the minimal spanning tree that consists of all sides of P with a longest side removed.

Proof. Similarly to Theorem 4.3, we need only prove ρ(Δ) = 1, where Δ is the collection of the point sets P all of whose points lie on the equator E. Suppose P is the polygon achieving ρ(Δ). We prove the theorem by contradiction. Suppose the Steiner minimal tree S(P) does not coincide with the minimal spanning tree T(P); then there is at least one Steiner point in S(P).

First, we claim that each full component of S(P) contains at least one Steiner point. Without loss of generality, suppose to the contrary that aₙa₁ is an edge of S(P). Let S₁ be the connected part of S(P) that contains a₁ as a terminal but does not contain aₙa₁. Now we rotate S₁ around p towards aₙ. Similarly to the proof of Theorem 4.3, this results in a decrease of ρ(P), since the rate of change of L_S is -1 while that of L_T is at least -1. This contradicts ρ(P) = ρ(Δ).

Next, suppose n = 3 and the Steiner point is s. Without loss of generality suppose |a₁a₂| ≤ |a₂a₃| ≤ |a₃a₁|. If |a₁a₂| = 2π/3, then △a₁a₂a₃ is an equilateral triangle. Hence, the unique Steiner point is the pole p. However, p is not a minimal point by Theorem 2.2, which contradicts the minimality of S. If |a₁a₂| < 2π/3 and |a₂a₃| < |a₃a₁|, then we move a₂ to a₁. It follows that the rate of change of L_T is 0 but that of L_S is negative, because ∠a₁a₂s < π/2 by Lemma 4.4. Hence, ρ(P) strictly decreases, which contradicts ρ(P) = ρ(Δ). If |a₁a₂| < 2π/3 but |a₂a₃| = |a₃a₁|, then the pole p lies on sa₃. Let ∠a₂pa₁ = 2θ ≤ 2π/3. Then

\[ L_T = \pi + \theta, \qquad L_S = \pi + 2\arcsin\frac{2\sin\theta}{\sqrt{3}} - \arcsin\frac{\tan\theta}{\sqrt{3}}, \]

and in that case we can show that L_S ≥ L_T for any θ ≤ π/3 [17].
Finally, we assume that n ≥ 4 and S(P) has at least two Steiner points. Moreover, since any degree-2 terminal can be regarded as a degenerate Steiner point, S(P) has at least two cherries such that the four terminals of the two cherries are all of degree one. If the length of the longest sides of P is no more than π/2, then the spherical Graham condition is satisfied and the theorem holds by Theorem 4.3. Therefore, we assume that the length of the longest sides of P is strictly larger than π/2. Consequently, at most three longest sides exist. There are two cases.

Case 1. One of the sides adjacent to the cherries, aᵢ₋₂aᵢ₋₁, aᵢaᵢ₊₁, aⱼ₋₂aⱼ₋₁ and aⱼaⱼ₊₁, say aᵢaᵢ₊₁, is not a longest side. Then this side is in T(P). If we move aᵢ to aᵢ₋₁, then the rate of change of L_T is 0 but that of L_S is negative, because ∠aᵢ₋₁aᵢs < π/2 by Lemma 4.4. Hence, ρ(P) strictly decreases, which contradicts ρ(P) = ρ(Δ).

Case 2. All of these adjacent sides are longest sides. Because n ≥ 4, one of aᵢ₋₁aᵢ and aⱼ₋₁aⱼ, say aᵢ₋₁aᵢ, is shorter than π/2. We move aᵢ₋₁ to aᵢ, and at the same time move aᵢ to aᵢ₋₁ at the same speed. Then the rate of change of L_T is -1, because only one of aᵢ₋₂aᵢ₋₁ and aᵢaᵢ₊₁ can stay in T(P). It is easily seen that if α + β ≤ π/2, then cos α + cos β ≥ cos α + sin α ≥ 1, with equality only if one of α and β is zero. Therefore, the rate of change of L_S is -cos(∠saᵢ₋₁aᵢ) - cos(∠aᵢ₋₁aᵢs) < -1 by Lemma 4.5. Again, ρ(P) strictly decreases, which contradicts ρ(P) = ρ(Δ). □

Inspired by Theorem 4.6, one might think that if the cap radius r̂ is sufficiently large, then |S(P)| = |T(P)| always holds even though r̂ < π/2. However, this is not true [17].

Theorem 4.7 If C is a circle on a unit sphere Φ with cap radius r̂ strictly less than π/2, then there is a point set P on C such that the Steiner minimal tree for P is not the minimal spanning tree.

Proof. Let a₁a₂a₃ be an isosceles triangle inscribed in C with |a₁a₃| = |a₂a₃| = 2a, |a₁a₂| = 2b, and b < a. Let ∠a₂pa₁ = 2θ, where p is the pole of the cap. Let d be the midpoint of a₁a₂. The Steiner point s lies on pd. Then,

\[ L_S = 2|sa_2| + \hat{r} + |pd| - |sd|, \qquad L_T = 2a + 2b. \]

By the formulae of spherical triangles we have

\[ a = \arcsin(\sin\hat{r}\cos(\theta/2)), \qquad b = \arcsin(\sin\hat{r}\sin\theta), \]
\[ |sa_2| = \arcsin(2\sin\hat{r}\sin\theta/\sqrt{3}), \]
\[ |pd| = \arctan(\tan\hat{r}\cos\theta), \qquad |sd| = \arcsin(\tan b/\sqrt{3}). \]

It is not hard to verify that

\[ \frac{\partial(L_T - L_S)}{\partial\theta}\bigg|_{\theta=0} = (2 - \sqrt{3})\sin\hat{r} > 0. \]

Note that L_T - L_S = 0 when θ = 0. Hence, for any r̂ < π/2 there exists a sufficiently small θ satisfying L_T > L_S. The theorem is proved. □
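The closed-form expressions above are easy to check numerically; the following sketch (our own; the function name is hypothetical) evaluates L_T - L_S and confirms that its slope at θ = 0 is (2 - √3) sin r̂:

    import numpy as np

    def lt_minus_ls(r_hat, theta):
        # isosceles triangle inscribed in a circle of cap radius r_hat,
        # apex angle 2*theta at the pole, as in the proof of Theorem 4.7
        a   = np.arcsin(np.sin(r_hat) * np.cos(theta / 2.0))
        b   = np.arcsin(np.sin(r_hat) * np.sin(theta))
        sa2 = np.arcsin(2.0 * np.sin(r_hat) * np.sin(theta) / np.sqrt(3.0))
        pd  = np.arctan(np.tan(r_hat) * np.cos(theta))
        sd  = np.arcsin(np.tan(b) / np.sqrt(3.0))
        return (2*a + 2*b) - (2*sa2 + r_hat + pd - sd)

    r, h = 1.2, 1e-6                   # any cap radius below pi/2
    print((lt_minus_ls(r, h) - lt_minus_ls(r, 0.0)) / h)   # numeric slope
    print((2.0 - np.sqrt(3.0)) * np.sin(r))                # (2 - sqrt(3)) sin(r)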

References
[1] M. Brazil, J.H. Rubinstein, D.A. Thomas, J.F. Weng and N.C. Wormald, Shortest networks on spheres, Proceedings of the DIMACS Workshop on Network Design, to appear.
[2] J. Cheeger and D.G. Ebin, Comparison theorems in Riemannian geom-


etry, (North-Holland Publishing Co., Amsterdam, 1975).

[3] E.J. Cockayne, On Fermat's problem on the surface of a sphere, Math. Magazine, Vol. 45 (1972), pp. 216-219.

[4] J. Dolan, R. Weiss and J. MacGregor Smith, Minimal length tree


networks on the unit sphere, Annals of Oper. Res., Vol. 33 (1991), pp.
503-535.

[5] D.Z. Du and F.K. Hwang, A proof of the Gilbert-Pollak Conjecture on


the Steiner ratio, Algorithmica, Vol. 7 (1992), pp. 121-135.

[6] D.Z. Du, F.K. Hwang and J.F. Weng, Steiner minimal trees for regular
polygons, Discrete Comput. Geometry, Vol. 2 (1987), pp. 65-87.

[7] M.R. Garey, R.L. Graham and D.S. Johnson, The complexity of com-
puting Steiner minimal trees, SIAM J. Appl. Math., Vol. 32 (1977), pp.
835-859.

[8] F.K. Hwang and J.F. Weng, Hexagonal coordinate systems and Steiner
minimal trees, Discrete Math., Vol. 62 (1986), pp. 49-57.

[9] F.K. Hwang, D.S. Richards and P. Winter, The Steiner Tree Problem, (North-Holland Publishing Co., Amsterdam, 1992).

[10] J.H. Rubinstein and D.A. Thomas, A variational approach to the


Steiner network problem, Ann. Oper. Res., Vol. 33 (1991), pp. 481-499.

[11] J.H. Rubinstein and D.A. Thomas, Graham's problem on shortest net-
works for points on a circle, Algorithmica, Vol. 7 (1992), pp. 193-218.

[12] J.H. Rubinstein and J.F. Weng, Compression theorems and Steiner ratios on spheres, J. Combin. Optimization, Vol. 1 (1997), pp. 67-78.

[13] I. Todhunter and J.G. Leathem, Spherical Trigonometry, (Macmillan and Co., London, 1901).

[14] J.F. Weng, Steiner minimal trees on vertices of regular polygons (Chi-
nese. English summary), Acta Math. Appl. Sinica, Vol. 8 (1985), pp.
129-141.

[15] J.F. Weng, Generalized Steiner problem and hexagonal coordinate sys-
tem (Chinese. English summary), Acta. Math. Appl. Sinica, Vol. 8
(1985), pp. 383-397.

[16] J.F. Weng, Steiner trees on curved surfaces, preprint, 1997.

[17] J.F. Weng, Shortest networks for cocircular points on spheres, J. Combin. Optimization, to appear.

[18] J.F. Weng, Finding Steiner points on spheres, preprint, 1997.

[19] J.F. Weng and J.H. Rubinstein, A note on the compression theorem for
convex surfaces, preprint, 1996.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 617-634
©1998 Kluwer Academic Publishers

Minimum Weight Triangulations


Yin-Feng Xu
School of Management
Xi'an Jiaotong University, Xi'an, 710049, P.R. China
E-mail: yfxu@xjtu.edu.cn

Contents

1 Introduction                                618

2 Dynamic Programming                         619
  2.1 MWT for a Simple Polygon                619
  2.2 MWT for General Point Set               621
  2.3 Remarks                                 621

3 Subgraphs within MWT                        622
  3.1 Intersections of Triangulations         622
  3.2 Local Conditions                        623
  3.3 Remarks                                 624

4 Matching Properties                         624
  4.1 Matching Between Triangulations         624
  4.2 Matching Among a Triangulation          625
  4.3 Remarks                                 627

5 Heuristics                                  628

6 Conclusion                                  629

References

1 Introduction
A triangulation of a given set S of n points in the plane is a maximal set of non-crossing line segments (called edges) which have both endpoints in S. A triangulation partitions the interior of the convex hull of the given point set into triangles. It is used in many areas of engineering and scientific applications, such as finite element methods, approximation theory, numerical computation, and computer-aided geometric design.

Let CH(S) denote the set of edges bounding the convex hull of S, and let T(S) denote a triangulation of S. The Eulerian relation for planar graphs implies the following equalities:

\[ |T(S)| = 3n - 3 - |CH(S)|, \]
\[ |T_t(S)| = 2n - 2 - |CH(S)|, \]

where |T(S)| is the number of edges in T(S) and |T_t(S)| is the number of triangles in T(S) [18].

For a given set S of n points in the plane, the number of different triangulations of S is an exponential function of n; the best known lower and upper bounds are Ω(2^{3n}) [23] and O(2^{8n}) [15]. This fact makes computing triangulations that are optimal under some optimality criterion a challenging problem. Since a triangulation consists of an edge set together with a set of triangular regions, different types of triangulations have found multiple applications.
Optimization criteria for which efficient algorithms are known include maximizing the minimum angle [18], minimizing the maximum angle [20], minimizing the minimum angle [22], minimizing the maximum aspect ratio [44], and minimizing the maximum edge length [19]. The Delaunay triangulation, which is the dual of the Voronoi diagram, is a maxmin angle triangulation and can be computed in O(n log n) time and O(n) space [18,44]; some other optimality properties related to Delaunay triangulations in higher dimensions were observed by Rajan in [44].

Edelsbrunner and Tan gave an O(n² log n) time algorithm for computing the minmax length triangulation [19]. Edelsbrunner et al. [20] gave an O(n²) time algorithm to compute a triangulation which minimizes the maximum angle. The latter algorithm is based on an edge insertion paradigm, which is shown in Bern et al. [7] to lead to polynomial time algorithms for triangulations with maxmin triangle height, minmax triangle eccentricity, and minmax gradient surface, respectively.
For each edge of a triangulation, assign to the edge as weight the Euclidean distance between its two endpoints. Then each triangulation has a weight, namely the sum of the edge weights in the triangulation. Among all the optimality criteria on triangulations, the most outstanding one is to minimize the weight, which is known as the minimum weight triangulation.

The minimum weight triangulation has applications in the numerical approximation of bivariate data [52]. The complexity of computing the minimum weight triangulation of an arbitrary planar point set has been open since 1975, when the problem was mentioned by Shamos and Hoey [45]. It is one of the unsolved open problems listed in Garey and Johnson's book [24]. Since then several algorithms have been proposed to solve this problem [5, 25, 27, 33, 35, 38, 43, 45, 47]. None of these is known to produce the exact minimum weight triangulation in polynomial time.

Finding a minimum weight triangulation when additional points are admitted, which is called the Minimum Weight Steiner Triangulation, has also been considered, and some approximation algorithms were given by Eppstein [21]. In the following, we focus mainly on the Minimum Weight Triangulation (denoted by MWT) and refer to it as the optimal one.

Attempts to prove the minimum weight triangulation problem NP-hard have resulted in two related NP-hardness results. Lloyd [40] showed that, given a set S of points in the plane and a set of edges with endpoints in S, the problem of determining whether this edge set contains a triangulation is NP-complete. Lingas [39] showed that the problem of determining the minimum weight geometric triangulation of multi-connected polygons is NP-complete. Another related result was given by Jansen, who showed that the min-max degree triangulation problem is NP-complete [30].

2 Dynamic Programming
2.1 MWT for a Simple Polygon
Dynamic programming is a powerful tool for some discrete optimization problems. Gilbert [25] and Klincsek [33] gave a dynamic programming algorithm for computing a minimum weight triangulation of a convex polygon, and it also works for a simple polygon. The time complexity and the space complexity are O(n³) and O(n²) respectively. The algorithm can be described as follows; a code sketch is given after Theorem 2.1 below.

Let p₀, p₁, ..., pₙ₋₁ be the n vertices of a convex polygon in clockwise order. Let C[pᵢ, s], where i = 0, 1, ..., n-1 and s = 0, 1, ..., n, denote the sum of the weights of the interior edges of the minimum weight triangulation of the convex polygon with vertices {pᵢ, pᵢ₊₁, ..., p_{i+s-1}}, where the subscripts are taken modulo n. Then finding the minimum weight triangulation of the polygon amounts to determining C[p₀, n]. Let C[pᵢ, s] = 0 for i = 0, 1, ..., n-1 and 1 ≤ s ≤ 3, and

\[ C[p_i, s] = \min_{i+1 \le j \le i+s-2} \bigl\{ w(p_i, p_j) + w(p_j, p_{i+s-1}) + C[p_i, j - i + 1] + C[p_j, i + s - j] \bigr\}, \]

where w(pᵢ, pⱼ) denotes the Euclidean distance between pᵢ and pⱼ, and w(pᵢ, pⱼ) = 0 if the edge with endpoints pᵢ and pⱼ is on the boundary of the polygon. The above dynamic programming shows that the following theorem holds.

Theorem 2.1 A minimum weight triangulation of a simple polygon can be found in O(n³) time and O(n²) space.
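A direct transcription of this dynamic program is short. The sketch below (our own; it uses chain indexing with the hull edge p₀p_{n-1} as base, which for a convex polygon is equivalent to the cyclic formulation above) charges each interior diagonal once and boundary edges nothing, as in the text:

    import math

    def mwt_convex(pts):
        # pts: vertices of a convex polygon in order; returns the total
        # length of the interior diagonals of a minimum weight triangulation
        n = len(pts)
        w = lambda i, j: math.dist(pts[i], pts[j])
        C = [[0.0] * n for _ in range(n)]
        for gap in range(3, n):                 # sub-polygons p_i .. p_{i+gap}
            for i in range(n - gap):
                j = i + gap
                C[i][j] = min(
                    C[i][k] + C[k][j]
                    + (w(i, k) if k > i + 1 else 0.0)   # diagonal p_i p_k
                    + (w(k, j) if k < j - 1 else 0.0)   # diagonal p_k p_j
                    for k in range(i + 1, j)
                )
        return C[0][n - 1]

Recovering the triangulation itself only requires remembering, for each pair (i, j), a k attaining the minimum. Both the O(n³) time and the O(n²) space of Theorem 2.1 are visible in the two nested tables and the inner minimization.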

For some special point arrangements, the minimum weight triangulation
of a point set can also be computed in polynomial time with Gilbert
and Klincsek's algorithm. In [47], the minimum weight triangulation with
a convex layers constraint is studied (the convex layers of a point set S are the
set of nested convex polygons obtained by repeatedly computing the convex
hull of the remaining set after removing the vertices of the current convex
hull), and it is pointed out that if the convex layers are separated enough,
then the minimum weight triangulation can be computed in O(n^4) time with
the above dynamic programming.
With a suitable adaptation of the Hu-Tucker algorithm for optimal
alphabetic trees [28, 29], consider finding an MWT of points on a line, which is
a degenerate case of a convex polygon.

Let P = x_1, x_2, ..., x_n be n points on a line with x_i <= x_{i+1}, 1 <= i <= n-1.
We say two line segments x_i x_j and x_l x_k cross each other if their intervals partially
overlap. A minimum weight triangulation of P is a maximal set of non-
crossing line segments of P with minimum possible total weight, where the
weight of a segment x_i x_j is |x_i - x_j|. The minimum weight triangulation in
this case is just the optimal alphabetic tree with the n-1 nodes given by the line
segments (x_i, x_{i+1}), for i = 1, 2, ..., n-1. So we have the following theorem.

Theorem 2.2 A minimum weight triangulation of n points on a line can
be found in O(n log n) time and O(n) space.

2.2 MWT for General Point Set


Gilbert and Klincsek's algorithm does not make good use of the geometric
properties of the problem. Lingas [38] and Heath and Pemmaraju [27] extended the
algorithm to compute a minimum weight triangulation of a cell. A cell
is any interior face of a straight-line planar embedding of a graph. With
the extended algorithm, a dynamic programming algorithm can be given to
compute a minimum weight triangulation for any planar point set with a
cell constraint.

Anagnostou and Corneil considered how to use the structure of the convex
layers of a point set and gave a dynamic program to compute a
minimum weight triangulation [5]; their main result is the following:

Theorem 2.3 A minimum weight triangulation of n points in the plane
can be found in O(n^{3k+1}) time, where the n points are restricted to lie on k nested
convex polygons.

Meijer and Rappaport later improved the bound to O(n^k) when S is
restricted to lie on k non-intersecting line segments [41].

Xu [47] and Cheng et al. [12] considered the case when some subgraph of
a minimum weight triangulation is known, showing how to connect the
disconnected components into a single connected subgraph of the minimum
weight triangulation and then use the dynamic programming for a cell to compute
the optimal triangulation.

Theorem 2.4 If a subgraph with k connected components of a minimum
weight triangulation of n points in the plane is given, then the complete
minimum weight triangulation of the point set can be computed in O(n^{k+2})
time.

The above theorem shows that finding more edges of a minimum weight
triangulation makes it easier to find the complete minimum weight
triangulation.

2.3 Remarks
The problem of finding a fast algorithm to compute a minimum weight
triangulation of a convex polygon is still open. Yao [51] presents a technique
by which the time complexity of some dynamic programming algorithms is
reduced from O(n^3) to O(n^2). The technique requires the monotonicity of
certain bivariate functions, but the minimum weight triangulation problem
for a convex polygon does not satisfy this condition. Indeed, no algorithm is
known that computes a minimum weight triangulation of a convex polygon in
O(n^3) time and o(n^2) space.

3 Subgraphs within MWT


3.1 Intersections of Triangulations
In recent years there have been many results related to the subgraph scheme.
The convex hull CH(S) of a point set S is an obvious subgraph of any
minimum weight triangulation of the point set. Let E denote the set of all the
segments with endpoints in S. A line segment pq with p, q in S is called a
stable line segment of all triangulations of S if no line segment in E properly
intersects pq. The intersection of all possible triangulations of S is then the
set of all stable line segments of S, denoted by SL(S).

As a combinatorial geometry problem, some properties of the stable line
segments of a set of planar points have been investigated in [48]. It is shown
there that the maximum number of stable line segments of S is 2(n - 1).
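Since SL(S) is characterized by a simple crossing condition, it can be computed directly from the definition. The following brute-force sketch (our naming; O(n^4) time, far slower than the O(n^2 log n) algorithm of [42]; general position assumed) makes the definition concrete:

```python
from itertools import combinations

def _cross(o, a, b):
    # z-component of the cross product (a - o) x (b - o)
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def properly_intersect(p, q, r, s):
    # True iff the open segments pq and rs cross at an interior
    # point (assumes no three of the points are collinear)
    return (_cross(p, q, r) * _cross(p, q, s) < 0 and
            _cross(r, s, p) * _cross(r, s, q) < 0)

def stable_segments(S):
    """Return SL(S): all segments with endpoints in S that are
    properly intersected by no segment with endpoints in S."""
    E = list(combinations(S, 2))
    return [(p, q) for (p, q) in E
            if not any(properly_intersect(p, q, r, s)
                       for (r, s) in E
                       if len({p, q, r, s}) == 4)]
```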
A more important property is the relationship between SL(S) and so-
called k-optimal triangulations. T(S) is called a k-optimal triangulation for
4 <= k < n, denoted by LOT_k(S), if every k-sided simple polygon drawn
from T(S) is optimally triangulated by edges of T(S).

Let SL_k(S) denote the intersection of all possible LOT_k(S)'s (i.e., the
set of edges that are in every LOT_k(S)), and let MWT(S) denote a minimum
weight triangulation of S. Then we have that

$$SL(S) \subseteq SL_k(S) \subseteq MWT(S).$$
In some special cases of S, SL(S) forms a connected graph. Then an
MWT(S) can be constructed in polynomial time using the dynamic
programming algorithms proposed in [38, 47].

The structural properties of SL(S) have been studied in [47, 48], and
it is shown that SL(S) can be found in O(n^2 log n) time and O(n) space by
Mirzaian, Wang and Xu [42].
A subgraph LMT(S) of SL_4(S), proposed by Belleville et al. [6] and by
Dickerson and Montague [17], can be found in O(n^4) time and O(n^3) space
[17]. LMT(S) sometimes has many more edges than SL(S), but it is
pointed out that for uniformly distributed points the expected number of
components is Θ(n); see Bose et al. [10]. An improved algorithm using
O(n^3 log n) time and O(n^2) space was given in [11].

It is easy to show that

$$SL(S) \subseteq LMT(S) \subseteq SL_4(S).$$

3.2 Local Conditions


Subgraphs found from the intersections of the LOT_k(S)'s view the subgraph
in a global way. In the following we will see that some subgraphs of the
minimum weight triangulation can also be found from local conditions.

It is shown in [25] that the shortest edge between two points of S belongs
to every MWT. Keil proved that a much larger graph, the √2-skeleton, is always
a subgraph of an MWT [31]. The √2-skeleton is the β-skeleton defined by
Kirkpatrick and Radke in [32] for β = √2. Given two points x and y,
define xy to be the edge connecting x and y and define |xy| to be the length
of xy. For β >= 1, the forbidden neighborhood of x and y is the union of two
disks with radius β|xy|/2 that pass through both x and y. Given a point
set S and x, y in S, the edge xy belongs to the β-skeleton of S if no point of S lies in
the interior of the forbidden neighborhood of x and y. Let a_xy be the angle
that the chord xy subtends at one of the circles. Then β = 1/sin a_xy.
It seems that the β-skeleton is a subgraph of an MWT for β >= 1/sin(π/3).
Cheng and Xu proved that the β-skeleton is a subgraph of an MWT for
β > 1/sin κ ≈ 1.17682, where κ = tan^{-1}(3/√(2√3)) ≈ π/3.1 [14].

Yang, You and Xu formulated and proved a different property: if the
union of the two disks centered at x and y with radius |xy| is empty, then
xy is in an MWT [50].
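Both local conditions amount to disk-emptiness tests that are easy to state in code. The sketch below (our naming, not taken from the cited papers; general position assumed) tests a candidate edge xy against the β-skeleton condition for a given β >= 1 and against the Yang-You-Xu condition:

```python
import math

def in_beta_skeleton(x, y, S, beta):
    """True iff edge xy is in the beta-skeleton of S (beta >= 1):
    no other point of S lies strictly inside either disk of radius
    beta*|xy|/2 whose boundary passes through both x and y."""
    d = math.dist(x, y)
    r = beta * d / 2.0
    mx, my = (x[0] + y[0]) / 2.0, (x[1] + y[1]) / 2.0
    ux, uy = (y[1] - x[1]) / d, (x[0] - y[0]) / d    # unit normal of xy
    h = math.sqrt(max(r * r - (d / 2.0) ** 2, 0.0))  # center offset
    centers = [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return all(min(math.dist(p, c) for c in centers) >= r
               for p in S if p not in (x, y))

def yang_you_xu_condition(x, y, S):
    """True iff the union of the two disks centered at x and y with
    radius |xy| contains no other point of S; by [50] such an edge
    belongs to a minimum weight triangulation."""
    d = math.dist(x, y)
    return all(min(math.dist(p, x), math.dist(p, y)) >= d
               for p in S if p not in (x, y))
```

For β = √2, in_beta_skeleton is exactly the emptiness test behind Keil's result [31].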
Note that the subgraph generated by the above condition and the β-
skeleton do not contain each other for β > 1/sin(π/3), but for β <= 1/sin(π/3)
the β-skeleton contains the subgraph generated by the above condition.

The proofs that the β-skeleton is a subgraph of a minimum weight
triangulation are based on proving an improved version of the key lemma,
the Remote Length Lemma, first given by Keil [31].
Lemma 3.1 Suppose that β >= √2. Let x and y be the endpoints of an edge
in the β-skeleton of a set S of points in the plane. Let p, q, r, and s be
four other distinct points of S such that pq intersects the interior of xy, rs
intersects the interior of xy, pq and rs do not intersect the interior of each
other, and p and s lie on the same side of the line through xy. Then either
|qr| < |pq| or |qr| < |rs|.

All the subgraphs found above are based on symmetric local conditions. A
sufficient asymmetric condition is given in [46], but that condition is somewhat
complicated.

3.3 Remarks
Even though some subgraphs of a minimum weight triangulation can be found,
either in the global way or from local conditions, computing all the known
subgraphs still cannot guarantee that the output is a connected graph, or even
that it has only a constant number of connected components.

Moreover, even though some subgraphs of SL_4(S) can be computed efficiently,
it is not known how to compute SL_4(S) itself in polynomial time.

In [49], it is pointed out that finding more edges of a minimum weight
triangulation can improve the performance of some heuristics for the
minimum weight triangulation.

4 Matching Properties
4.1 Matching Between Triangulations
For a given point set S in the plane, any triangulation T(S) of S has the
same number of edges and the same number of triangles. This fact suggests
looking for matching relationships between different triangulations of S.

Using the Hall condition of the marriage theorem (see [9]), the following
theorems were found independently in [2] and [13].

Theorem 4.1 Let S be a finite set of points in the plane and consider two
triangulations R and B of S. There exists a perfect matching between the
set of edges of R and the set of edges of B, with the property that matched
edges either cross or are identical.

Theorem 4.2 Let S be a finite set of points in the plane and consider two
triangulations R and B of S. There exists a perfect matching between the
set of triangles of R and the set of triangles of B, with the property that
matched triangles either overlap or are identical.

We can impose a stronger condition requiring the matched triangles
to share a vertex, and the following theorem holds.

Theorem 4.3 Let S be a finite set of points in the plane and consider two
triangulations R and B of S. There exists a perfect matching between the
set of triangles of R and the set of triangles of B, with the property that
matched triangles
(a) have common interior points, and
(b) share at least one vertex.
With the above matching theorems, we can obtain some results related
to minimum weight triangulations. For a given triangulation T(S) of S,
how can we know whether it is a minimum weight triangulation?

In [1], a class of planar point sets is exhibited for which a minimum weight
triangulation can be computed in polynomial time and is easy to recognize.
Let us call an edge e in E light if every edge in E that crosses e is longer
than e, where E denotes the set of all possible edges with endpoints in S.
Light edges obviously do not cross each other, so the set L of light edges can form at
most a triangulation of S.

If L actually is a triangulation, then we call L the light triangulation of
S. Light edges are related to the greedy triangulation, which is obtained by
iteratively inserting the shortest edge of E that does not cross previously
inserted edges. All light edges are contained in the greedy triangulation: a
light edge e can never be blocked by previously inserted edges, since E does
not contain any shorter edge crossing e. Thus, if a light triangulation exists,
it is identical to the greedy triangulation, and its length optimality is easy
to prove with the above matching theorems.
We can also easily prove the following results. The first gives a lower
bound for the minimum weight triangulation; the second gives us a way
to identify a triangulation as a minimum weight triangulation for some
special point sets.
Theorem 4.4 w(L) <= w(MWT(S)),
where w(L) denotes the total weight of the light edges and w(MWT(S))
denotes the weight of the minimum weight triangulation.

Theorem 4.5 If a planar point set S admits a light triangulation L, then
L is the minimum weight triangulation for S.
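The light edges, and with them the lower bound of Theorem 4.4, can be extracted by brute force. A minimal sketch (illustrative names; `crosses` is a proper-crossing predicate such as the one sketched in Section 3.1):

```python
import math
from itertools import combinations

def light_edges(S, crosses):
    """Return the light edges of S: edges of E that are crossed
    only by strictly longer edges of E.  O(n^4) time."""
    E = list(combinations(S, 2))
    L = []
    for (p, q) in E:
        lpq = math.dist(p, q)
        if all(not crosses(p, q, r, s) or math.dist(r, s) > lpq
               for (r, s) in E):
            L.append((p, q))
    return L

def light_lower_bound(S, crosses):
    # w(L) <= w(MWT(S)) by Theorem 4.4
    return sum(math.dist(p, q) for (p, q) in light_edges(S, crosses))
```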

4.2 Matching Among a Triangulation


To consider the relationship between the point set S and the edge set of a
triangulation T(S), using the sufficient and necessary condition of Hall's
marriage theorem (see [9]), the following theorem was found in [4].

Theorem 4.6 Let T(S) be an arbitrary triangulation of S, and let S^3 denote
the set of triple copies of S (i.e., each p in S is taken three times). Then there is a
matching from the edges of T(S) to S^3 such that if p in S^3 is matched with
e in T(S), then p is an endpoint of e.

Another version of the above theorem is:

Theorem 4.7 Let T(S) be an arbitrary triangulation of S. Then the edges
of T(S) can be oriented such that each point p in S has in-degree at
most 3.

With the above matching property, an approximation property of the
well-known greedy triangulation GT(S) of a finite point set S is obtained.
Exploiting the concept of so-called light edges, a definition of GT(S)
is introduced that does not rely on the length ordering of the edges. Rather,
it provides a decomposition of GT(S) into levels, and the number of levels
allows one to bound the total edge length of GT(S). In particular, it is shown
in [4] that |GT(S)| <= 3·2^{k+1} |MWT(S)|, where k is the number of levels and
MWT(S) is the minimum weight triangulation of S. Various algorithms for
computing GT(S) are known, and GT(S) has been used in several
applications; see, e.g., [16] for a short history.

One use of the greedy triangulation is as a length approximation to the
minimum weight triangulation of a given point set S. Although
GT(S) tends to be short in practical applications, and is provably short for
uniformly distributed point sets [36] and for point sets in convex position
[37], its worst-case length behaviour is fairly bad: GT(S) can be a factor
of Ω(√n) longer than the MWT; see [34]. Only very recently has a matching
upper bound been proved [35].
In particular, an edge which is not crossed by any shorter edge surely
belongs to GT(S). Let us call such an edge light. Below is a catalog
of basic properties of light edges.

Lemma 4.8 Let L denote the set of all light edges defined by S.
(a) L is a non-crossing set of edges.
(b) L contains all edges bounding the convex hull of S.
(c) L is a subset of GT(S).
(d) In general, L is not a subset of MWT(S).

In conjunction with the above lemma, Theorem 4.5 immediately implies: if
L happens to form a triangulation of S, then |L| = |GT(S)| = |MWT(S)|.
In any case, we learn that at least a subset of the edges in GT(S) can be
bounded in length by the weight of MWT(S).
The edges in L are called light of level 1. Let E be the total set of edges
defined by S, and let C_1 collect all edges of E that are crossed by some edge
in L. Notice that each edge in L, and therefore no edge in C_1, appears in
GT(S). Define E_2 = E \ (L ∪ C_1). An edge e in E_2 is called light of level
2 if e is not crossed by a shorter edge in E_2. Let L_2 be the set of all edges
which are light of level 2, and let C_2 collect all edges of E_2 that are crossed
by some edge in L_2. Again, each edge in L_2, and therefore no edge in C_2,
appears in GT(S). By setting E_3 = E_2 \ (L_2 ∪ C_2), we can define, in
the obvious way, the set L_3 of edges which are light of level 3. Repeating
this process until E_{k+1} = ∅ yields a hierarchy of levels L_1, L_2, ..., L_k with
L_1 = L.
It is evident that levels are pairwise disjoint, and that no edge of level i
can cross an edge of level j, for 1 ~ i, j ~ k. More specifically, we have:

Lemma 4.9 GT(S) = L_1 ∪ L_2 ∪ ... ∪ L_k.
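The level decomposition is computable by straightforward peeling. A minimal sketch (our naming; `crosses` is a proper-crossing predicate as in the earlier sketches), whose output union is GT(S) by Lemma 4.9:

```python
import math
from itertools import combinations

def light_levels(S, crosses):
    """Peel the edge set E of S into levels L_1, ..., L_k: at each
    stage keep the edges crossed only by longer surviving edges,
    discard the edges they cross, and repeat until E is empty."""
    E = set(combinations(S, 2))
    levels = []
    while E:
        # light edges of the current residual edge set
        Lk = {(p, q) for (p, q) in E
              if all(not crosses(p, q, r, s)
                     or math.dist(r, s) > math.dist(p, q)
                     for (r, s) in E)}
        # edges crossed by some light edge of this level
        Ck = {(r, s) for (r, s) in E
              if any(crosses(p, q, r, s) for (p, q) in Lk)}
        levels.append(Lk)
        E -= Lk | Ck
    return levels
```

The loop terminates because the shortest surviving edge is always light, so each pass removes at least one edge.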

With the above lemma and the matching theorem of this section, the
following theorem is proved in [4].

Theorem 4.10 Let S be a finite set of points in the plane, and let k be the
number of levels attained by GT(S). Then |GT(S)| <= c_k · |MWT(S)|, where
c_1 = 1 and c_k = 3 · 2^{k+1} for k >= 2.

4.3 Remarks
The matching theorems between triangulations may be generalized to the
framework of independence systems. An independence system I is a non-
empty collection of subsets of a ground set E which is closed under taking
subsets: if A in I and B ⊂ A then B in I. The elements of I are called the
independent sets; the remaining subsets of E are called dependent. A circuit
of I is a minimal dependent set.

In the triangulation case, a set of non-crossing edges (or of non-overlapping
triangles) may be considered independent. The circuits of this independence
system have two elements; they are the pairs of crossing edges (or of
overlapping triangles, respectively).

Theorem 4.11 Let R in I be any independent set, and let B in I be an
independent set of maximum cardinality in I. Then there is an injective
mapping g: R -> B such that for every element e in R we have g(e) = e, or
{g(e), e} is contained in a circuit.

The matching theorem for the point set and the edge set of a triangulation
can also be extended to a more general case. For a graph G = (V, E), define
its local density as d = max |E'|/|V'|, the maximum taken over all subgraphs
G' = (V', E') of G. Then the following theorem holds [3]:

Theorem 4.12 Let G = (V, E) be a simple graph with local density d. Then
the edges of G can be oriented such that each vertex v in V has in-degree
of at most d.

5 Heuristics
Since the complexity status of finding a minimum weight triangulation
is unresolved, heuristics and approximation algorithms have been considered.
Shamos and Hoey [45] presented a divide-and-conquer algorithm to construct
the Voronoi diagram of n points in the plane in O(n log n) time. This implies
that the Delaunay triangulation, which is the planar dual of the Voronoi
diagram, can be constructed in O(n log n) time [45]. A greedy triangulation of
a point set S is obtained by inserting compatible edges in increasing length
order, where an edge is compatible if it does not cross previously inserted
ones; it can be found in O(n^2 log n) time and O(n) space [26]. Shamos
and Hoey stated that both the greedy and the Delaunay triangulations are
minimum weight triangulations, and hence that the minimum weight
triangulation could be computed more efficiently via the Delaunay triangulation
than via the greedy one [45]. Lloyd provided counterexamples showing that
neither the Delaunay triangulation nor the greedy triangulation is always a
minimum weight triangulation [40]. In fact, his counterexamples show that
neither triangulation is optimal even for a convex polygon.
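For concreteness, the edge-insertion definition of the greedy triangulation translates into the following minimal sketch (our naming; `crosses` is a proper-crossing predicate as in the earlier sketches; this naive version takes roughly O(n^4) time rather than the O(n^2 log n) of [26]):

```python
import math
from itertools import combinations

def greedy_triangulation(S, crosses):
    """Insert edges in increasing length order, keeping an edge
    iff it crosses no previously kept edge."""
    E = sorted(combinations(S, 2), key=lambda e: math.dist(*e))
    T = []
    for (p, q) in E:
        if all(not crosses(p, q, r, s) for (r, s) in T):
            T.append((p, q))
    return T
```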
Some heuristics have also been investigated. Lingas gave a heuristic using a
two-step strategy: first find a minimum spanning tree, MST, of S, and
then optimally triangulate the cell with the MST constraint [38]. Heath
and Pemmaraju gave a similar heuristic which first finds GT(S) of S and then
optimally triangulates the cell with the MST of GT(S) [27]. Both heuristics
run in O(n^3) time and show good experimental behavior.

Plaisted and Hong presented an algorithm and showed that the weight
of the triangulation it produces is within an O(log n) factor of the
weight of the optimal triangulation [43].

None of the above heuristics can guarantee to produce a triangulation
within a constant ratio of the minimum weight triangulation in the worst
case. Levcopoulos and Krznaric gave a modified greedy triangulation
algorithm and proved that its output is within a constant ratio
of the optimal triangulation [35].

6 Conclusion
There are still some open problems related to the minimum weight
triangulation problem.

The most important one is to determine the complexity status of finding
a minimum weight triangulation.

The problem of finding an algorithm that computes a minimum weight
triangulation of a convex polygon in less than O(n^3) time is of particular
interest and has many applications.

As we know, finding a locally optimal triangulation of a point set
is relatively easy, since the greedy triangulation is a locally optimal one. Up
to now, however, there is no result concerning how to compute a 5-optimal
triangulation in polynomial time.

Finding triangulations under other optimality criteria is still an
interesting topic. There are no results on the problem of finding a
triangulation with minmax area, nor on the problem of finding a triangulation with
minmax perimeter.

References
[1] O.Aichholzer, F.Aurenhammer, S.Cheng, N.Katoh, G.Rote, M.Taschwer
and Y.F.Xu, Triangulations intersect nicely, Discrete Comput Geom
16 (1996) 339-359.

[2] O.Aichholzer, F.Aurenhammer, G.Rote, and M.Taschwer, Triangulations
intersect nicely, Proc. 11th Ann. ACM Symp. on Computational
Geometry, (1995) pp.220-229.

[3] O.Aichholzer, F.Aurenhammer, and G.Rote, Optimal graph orientation


with storage applications, SFB-Report F003-51 (Optimierung und Kon-
trolle), TU Graz, Austria, (1995).

[4] O.Aichholzer, F.Aurenhammer, G.Rote, and Y.F.Xu, Constant-level
greedy triangulations approximate the MWT well, Journal of Combinatorial
Optimization, to appear.

[5] E.Anagnostou and D.Corneil, Polynomial-time instances of the minimum
weight triangulation problem, Computational Geometry: Theory and
Applications 3 (1993) 247-259.

[6] P.Belleville, M.Keil, M.McAllister, and J.Snoeyink, On computing edges
that are in all minimum-weight triangulations, Proc. 12th Ann. ACM
Symp. on Computational Geometry, (1996) pp.V7-V8.

[7] M.Bern, H.Edelsbrunner, D.Eppstein, S.Mitchell, and T.S.Tan, Edge
insertion for optimal triangulations, Discrete Comput Geom 10 (1993)
pp.47-65.

[8] M.Bern and D.Eppstein, Mesh generation and optimal triangulation, in
D.-Z.Du and F.K.Hwang (eds.), Computing in Euclidean Geometry, (Lecture
Notes Series in Computing 4, World Scientific, 1995) pp.47-123.

[9] B.Bollobas, Graph Theory. An Introductory Course, (Springer-Verlag,


Berlin, 1979).

[10] P.Bose, L.Devroye, and W.Evans, Diamonds are not a minimum weight
triangulation's best friend, Proc.8th Canadian Conf. on Computational
Geometry, (1996)pp.68-73.

[11] S.W.Cheng, N.Katoh, and M.Sugai, A study of the LMT-skeleton,
Proc. Int. Symp. on Algorithms and Computation (ISAAC), (Lecture
Notes in Computer Science 1178, Springer-Verlag, 1996) pp.256-265.

[12] S.W.Cheng, M.J.Golin, and J.C.F.Tsang, Expected-case analysis of
β-skeletons with applications to the construction of minimum-weight
triangulations, Proc. 7th Canadian Conf. on Computational Geometry,
(1995) pp.279-283.

[13] S.W.Cheng and Y.F.Xu, Constrained independence system and tri-


angulations of planar point sets, in: D.Z.Du, M.Li, (eds.), Computing
and Combinatorics, (Proc. First Ann. Int. Conf., COCOON'95, Lecture
Notes in Computer Science 959, Springer-Verlag, 1995) pp.41-50.

[14] S.W.Cheng and Y.F.Xu, Approaching the largest β-skeleton within a
minimum-weight triangulation, Proc. 12th Ann. ACM Symp. on
Computational Geometry, (1996) pp.196-203.

[15] M.Denny and C.Sohler, Encoding a triangulation as a permutation of


its point set, (Manuscript, Univ.des Saarlandes, Saarbruecken, Germany,
1997).

[16] M. Dickerson, R.L. Drysdale, S. McElfresh, and E. Welzl, Fast greedy


triangulation algorithms, Proc. 10th Ann. ACM Symp. Computational
Geometry (1994)211-220.

[17] M.T.Dickerson and M.H.Montague, A (usually?) connected subgraph


of the minimum weight triangulation, Proc. 12th Ann. ACM Symp. on
Computational Geometry,(1996) pp.204-213.

[18] H.Edelsbrunner, Algorithms in Combinatorial Geometry, EATCS


Monographs on Theoretical Computer Science 10, (Springer-Verlag,
1987).

[19] H.Edelsbrunner and T.S.Tan, A quadratic time algorithm for the min-
max length triangulation, SIAM J. Comput. 22(1993)pp.527-551.

[20] H.Edelsbrunner, T.S.Tan and R.Waupotitsch, A polynomial time
algorithm for the minmax angle triangulation, Proc. 6th ACM Symp. on
Computational Geometry, (1990) pp.44-52.

[21] D.Eppstein, Approximating the minimum weight Steiner triangulation,
Discrete Comput Geom 11 (1994) pp.163-194.

[22] D.Eppstein, The farthest point Delaunay triangulation minimizes
angles, Comput Geom Theory Appl. 1 (1992) pp.143-148.

[23] A.Garcia, M.Noy, and J.Tejel, Lower bounds for the number of crossing-
free subgraphs of K_n, Proc. 7th Canadian Conf. on Computational
Geometry, (1995) pp.97-102.

[24] M.Garey and D.Johnson, Computers and Intractability. A Guide to the
Theory of NP-completeness, (Freeman, 1979).

[25] P.D.Gilbert, New results in planar triangulation, Report R-850, Coor-


dinated Science Laboratory, (University of Illinois, 1979).

[26] S.A.Goldman, A space efficient greedy triangulation algorithm, MIT
Technical Report MIT/LCS/TM-336, (1988).
[27] L.S.Heath and S.V.Pemmaraju, New results for the minimum weight
triangulation problem, Algorithmica 12(1994), pp.533-552.
[28] T.C.Hu, Combinatorial Algorithms, (Addison Wesley, 1982).
[29] T.C.Hu and A.C.Tucker, Optimal computer search trees and variable-
length alphabetical codes, SIAM J. Appl. Math. 21(4), (1971)pp. 514-
532.
[30] K.Jansen, One strike against the min-max degree triangulation prob-
lem, Computational Geometry: Theory and Applications, 3(1993), 107-
120.
[31] M.Keil, Computing a subgraph of the minimum weight triangulation,
Computational Geometry: Theory and Applications, 4(1994), 13-26.

[32] D.G.Kirkpatrick and J.D.Radke, A framework for computational mor-


phology, in:G.T.Toussaint(ed.), Computational Geometry(Elsevier, Am-
sterdam, 1985)217-248.
[33] G.T.Klincsek, Minimal triangulations of polygonal domains, Ann.
Discrete Math. 9 (1980), 127-128.

[34] C.Levcopoulos, An Ω(√n) lower bound for the nonoptimality of the
greedy triangulation, Information Processing Letters 25 (1987) pp.247-251.
[35] C.Levcopoulos and D.Krznaric, Quasi-greedy triangulations approxi-
mating the minimum weight triangulation, Proc. 7th Ann. ACM-SIAM
Symp. on Discrete Algorithms, (1996)pp.392-401.

[36] C. Levcopoulos and A. Lingas, Greedy triangulation approximates the


minimum weight triangulation and can be computed in linear time in the
average case, Report LU-CS-TR: 92-105, (Dept. of Computer Science,
Lund University, 1992).
[37] C.Levcopoulos and A.Lingas, On approximation behavior of the greedy
triangulation for convex polygons, Algorithmica 2(1987), 175-193.
[38] A.Lingas, A new heuristic for the minimum weight triangulation, SIAM
J. Algebraic and Discrete Methods,8(1987)pp.646-658.

[39] A.Lingas, The greedy and Delaunay triangulations are not bad in the
average case and minimum weight triangulation of multi-connected polygons
is NP-complete, Foundations of Computation Theory, (Lecture Notes
in Computer Science 158, Springer-Verlag, Berlin, 1983), pp.238-250.

[40] E.L.Lloyd, On triangulations of a set of points in the plane, Proceedings
of the Eighteenth IEEE Symposium on Foundations of Computer
Science (1977), pp.228-240.

[41] H.Meijer and D.Rappaport, Computing the minimum weight triangulation
for a set of linearly ordered points, Information Processing Letters,
42 (1992) pp.35-38.
[42] A.Mirzaian, C.A.Wang and Y.F.Xu, On stable line segments in
triangulations, Proc. 8th Canadian Conf. on Computational Geometry,
(1996) pp.68-73.

[43] D.A.Plaisted and J.Hong, A heuristic triangulation algorithm, J.
Algorithms, 8 (1987) pp.405-437.
[44] V.T.Rajan, Optimality of the Delaunay triangulation in R^d, Discrete
Comput Geom, 12 (1994) pp.189-202.

[45] M.I.Shamos and D.Hoey, Closest point problems, Proc. 16th IEEE
Symp. Foundations of Computer Science, (1975) pp.151-162.

[46] C.A.Wang, F.Chin, and Y.F.Xu, A new subgraph of minimum
weight triangulation, Journal of Combinatorial Optimization, Vol.1,
No.2 (1997) pp.115-127.

[47] Y.F.Xu, Minimum weight triangulation problem of a planar point set,
Ph.D. Thesis, (Institute of Applied Mathematics, Academia Sinica,
Beijing, 1992).

[48] Y.F.Xu, On stable line segments in all triangulations, Appl. Math.-JCU,
11B, No.2 (1996) pp.235-238.

[49] Y.F.Xu and D.Zhou, Improved heuristics for the minimum
weight triangulation, Acta Mathematicae Applicatae Sinica, Vol.11,
No.4 (1995) pp.359-368.

[50] B.T.Yang, Y.F.Xu, and Z.Y.You, A chain decomposition algorithm for
the proof of a property on minimum weight triangulations, Proc. 5th Int.
Symp. on Algorithms and Computation (ISAAC '94), (Lecture Notes in
Computer Science 834, Springer-Verlag, 1994) pp.423-427.

[51] F.Yao, Speed-up in dynamic programming, SIAM J. Alg. Disc.
Meth., 3, No.4 (1982) pp.532-540.

[52] P.Yoeli, Compilation of data for computer-assisted relief cartography,
in J.C.Davis and M.J.McCullagh (eds.), Display and Analysis of Spatial
Data, (Wiley, New York, 1975) pp.352-367.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 2)

D.-Z. Du and P.M. Pardalos (Eds.) pp. 635-726
©1998 Kluwer Academic Publishers

Optimization Applications in the Airline Industry

Gang Yu
Department of Management Science and Information Systems and
Center for Management of Operations and Logistics
The University of Texas at Austin, Austin, TX 78712
E-mail: yu@uts.cc.utexas.edu

Jian Yang
Department of Management Science and Information Systems and
Center for Management of Operations and Logistics
The University of Texas at Austin, Austin, TX 78712
E-mail: jiany@uts.cc.utexas.edu

Contents
1 Introduction 636

2 Network Designs 637

3 Yield Management 652

4 Flight Planning and Fleet Assignment 664

5 Crew Scheduling 678

6 Air Traffic Flow Control 687

7 Irregular Operations Control 697

8 Concluding Remarks 711



References

1 Introduction
The quality of an airline's product is measured by its timeliness, accuracy,
functionality, quality, and price. For the air transportation customers, these
criteria translate into flexible schedules, on-time flights, safety, satisfactory
in-flight services, proper baggage handling, reasonable prices, and convenient
ticket purchases. To provide this high-quality, low-cost product, airlines rely
on optimization-based decision support systems to generate profitable and
cost-effective fare classes, flight schedules, fleet plans, aircraft routes, crew
pairings, gate assignments, maintenance schedules, food service plans, train-
ing schedules, and baggage handling procedures.

The airline industry has always been an exciting arena for the interplay
between optimization theory and practice. This attractiveness can be
attributed to the nature of the airline business, including:

• Severe competition among airlines and between air and other trans-
portation vehicles;

• Large operational scale and scope;

• Tightly coupled resources such as aircraft, crew, maintenance facilities,


airports, etc.;

• Active interactions and close dependencies among all involved compo-


nents;

• Dynamic environment;

• Sophisticated customer behavior;

• Complicated company policies, business rules, and tight control by the


Federal Aviation Administration (FAA);

• Complex operational plan, schedules, routes, task assignments, and


control mechanisms; and

• Real-time and mission-criticality of decisions.



The challenges above are faced by applied mathematicians, management


scientists, operations research analysts, and system engineers who apply
their optimization concepts and ideas to the airline industry.

In the past few decades, optimization has made an unprecedented impact


on the airline industry. This is largely due to the evolution of computing
technology, rapid advancement of optimization methodology, a better grasp
of airlines' business rules by operations researchers and practitioners, and
the driving demand from the airline industry caused by competition and
customers' high expectations.

In this paper, we survey the major optimization areas that have had a
great impact on the airline industry. The following topics are covered in
separate sections: network design, yield management, flight planning and
fleet assignment, crew scheduling, air traffic flow control, and irregular op-
erations control. In the conclusion, we comment on some of the missing
topics and point out future research and development directions based on
our knowledge and experience.

2 Network Designs
The network design problem in the airline industry can be stated as follows:
find the optimal network structure and optimal routes to carry the targeted
passenger flow at the lowest total transportation cost.

Among all the factors, including air fares, quality of service, yield man-
agement and route structures, the routing system has been proven to be a
critical element impacting an airline's market share, and thus its ability to
compete. As a result, the optimal network design problem together with an
efficient routing structure are critical issues facing airlines today.

Prior to airline deregulation in 1978, the routes of U.S. airlines were


controlled by the Civil Aeronautics Board which was established by the
Civil Aeronautics Act of 1938. In order to add new routes, airlines were
required to prove to the Civil Aeronautics Board that the proposed new
services would benefit the public and that competing airlines already serv-
ing the routes would not be adversely affected. Therefore, developing long

main routes was a primary objective, resulting in a "linear" pattern for the
airline network structure [14]. Figure 1 illustrates such a "linear" pattern
network structure which was commonly employed by all the airlines before
deregulation.

Figure 1: A point-to-point network structure

By eliminating entry restrictions, deregulation gave the airlines increased


freedom and flexibility in restructuring their networks. It allowed the air-
lines, for the first time in forty years, to establish routes and fares freely.
Tumultuous changes ensued in the industry with profound impact on the
most basic aspects of airline operations, including fares, services, quality,
and safety. In fact, eighteen months after deregulation, 106,000 city-pair
authorizations had been issued, in marked contrast to the 24,000 autho-
rizations granted during the eighteen-month period immediately preceding
deregulation.

Perhaps one of the most significant developments in the airline industry


was the hub-spoke network structure. Hub-spoke was conceived to protect
and increase an airline's market share. In the competitive environment en-
couraged by deregulation, the development of a hub-spoke routing system
was used as a cost reduction approach, but also as an essential marketing
tool. To date, all major U.S. carriers except Southwest Airlines have es-

tablished hub-spoke network systems, including Dallas-based American Air-


lines, Chicago-based United Airlines, Houston-based Continental Airlines,
St. Paul-based Northwest Airlines, and Atlanta-based Delta Airlines.

Figure 2: A typical hub-spoke network system.

Figure 2 is an illustration of a typical hub-spoke network system com-


monly seen after deregulation. A centrally-located airport serves as an air-
line's hub. The airline offers flights between its hub and the airports on
the periphery. The solid lines in Figure 2 represent these routes. Flights
from various origins (spokes) arrive at a hub as an intermediate point from
which passengers change planes to proceed to their ultimate destinations.
This strategy targets passengers traveling between origins and destinations
for which the traffic volume is not sufficient to establish frequent non-stop
flights. However, by consolidating passengers with different origins and des-
tinations, the hubbing airline is expected to be able to serve more passen-
gers on its flights, using larger and more efficient aircraft. Take American
Airlines' hub-spoke network system as an example: at its main hub in Dallas/Fort
Worth, passengers from one flight can connect to any of 30 or more
other flights.
Essentially, centralization and the broader scope of operations associated
with a hub-spoke system permit the airlines to take advantage of economy of
scale. Consider American Airlines as an example. In 1980, approximately 10

percent of its traffic consisted of connecting passengers. By the mid-1980s,


about 66 percent of the passengers on a typical flight to a hub airport were
connecting to other flights to varying destinations [130].

In addition to becoming popular in airline operation, the concept of


hub-spoke networks has also been applied to air cargo delivery, ground trans-
portation, satellite communication, telephone networks, and other logistical
systems. For example, WalMart has been successfully using the concept in
retailing since the business grew from a small regional retailer to a national
chain with more than 890 stores. Today, about 80 percent of its merchandise
is delivered directly to the individual retail outlets from eight distribution
centers in six hub cities [92].

Despite the economic significance and popularity of hub-spoke systems,


there has been little analysis of the effectiveness of this type of network.
Insufficient systematic mathematical analysis has been conducted for the
purpose of justifying and evaluating the effectiveness of hub-spoke systems
currently in use. Hence, the question of whether or not hub-spoke is a
better network design remains to be further investigated and correctly
answered.

Further, very limited systematic mathematical study has been performed
with the aim of providing an optimal network structure. Most studies
in the past focused only on the construction of the hub-spoke system in a
restrictive way. Solution procedures were presented under the assumptions
that (1) there must exist a certain number of hubs in the underlying network
and (2) each node must be connected to a hub. Obviously, with a restriction
on the number of hubs that must exist and a restriction to the fixed
structure of the hub-spoke framework, this type of analysis only provides a
sub-optimal solution.

Past research on hub-spoke systems was conducted from two different
aspects. On the one hand, many researchers analyzed and assessed the
advantages of a hub-spoke system in terms of airline economics. On the other
hand, some focused on mathematical models for identifying optimal hub
locations and designing routing policies within a given hub-spoke network
structure.

To assess the effect of hub-spoke systems on an airline's cost structure,



McShan and Windle [97] measured the change in the extent of hub-spoke
routing since deregulation. Their comments on the hub characteristics were:
(i) Hubbing is likely to result in more frequent flights and should therefore
improve service; (ii) the central location is clearly preferable, since the hub
minimizes total distance traveled on each spoke; and (iii) the hub city size is
also important in order to minimize total distance traveled by all passengers.
Overall, they suggested that hubs should be located centrally, thereby also
attaining substantial local traffic.

Brown [26] developed an economic model to examine the effectiveness of
the hub-spoke system. His study showed that the welfare effects of airlines'
hub-and-spoke systems are ambiguous. On the one hand, hubbing route
patterns represent an improvement over the linear route structures that existed
before deregulation, since they permit a more efficient use of aircraft and
lower the fixed cost on a per-passenger basis. On the other hand, however,
he concluded that hub-and-spoke structures result in the transfer of consumer
surplus to airlines.

In their analysis of airline competitive behavior, Bailey, Graham and


Kaplan [14] studied airline cost and profitability using a simultaneous re-
gression model. They concluded that the hub-spoke system allows airlines
to have more frequent flights with larger aircraft and a higher percentage of
seat occupancy, and is, therefore, cost-effective. Morrison and Winston [102]
looked at the effects of hub-spoke systems on passenger welfare, finding that,
on average, passengers benefited from the switch to hub-spoke networks by
receiving more frequent flights with lower fares and slightly shorter travel
times. Reynolds-Feighan [115] studied the efficiency of airline network routing
systems and found that the overall returns to scale for hub routes are
higher than those of nonhub routes. He concluded that the increasing returns
to scale associated with each of the hub subsystems encourage airlines to
expand the number of connections to hub airports and discourage adding
'spoke' routes which are not directly connected to one of the hubs.

Aside from the research on the economic effects of hub-spoke systems,
a few studies focused on analyzing a hub-spoke system in order to better
understand it.

Ghobrial and Kanafani [56], Hansen [61], and Hansen and Kanafani
[62][63] are among the researchers who primarily modeled the airline hub-
spoke system with an approach in which equilibration rather than optimization
is stressed. It is noted that hubbing causes unevenness in the distribution
of both the benefits and the costs associated with air transportation, while
at the same time it closely couples that distribution with the competitive
fortunes of individual airlines. As summarized by Hansen and Kanafani [62],
the net result is increased uncertainty and rivalry among airports, as well
as the communities they serve.

In terms of constructing a hub-spoke system, some studies were conducted
solely for the purpose of finding the optimal hub locations. These
studies either analyzed the factors affecting hub location decisions or
used mathematical modeling to derive the optimal hub locations.

In an effort to better understand the hubbing phenomenon, Bauer [17]
looked for the main factors that airlines consider in evaluating existing and
potential hubs and investigated the impact of the hubbing decision on
airport traffic. His results indicated that population is the most important
factor determining hub location among all the characteristics that influence
hub location and the effect on airport traffic as a result of hub activity. One
of his most interesting findings was that the creation of a hub at a city leads
to a more-than-doubling of the revenue generated by passenger enplanements
at that city.

O'Kelly [107] developed several hub-location models for a hub-spoke
system. O'Kelly concluded that if the cost of setting up each intercity route is
ignored, then there is no rational reason to construct a hub-oriented system.
If, on the other hand, there is a cost associated with each intercity route, then as
long as the incremental transportation costs are less than the savings in link
costs, a single hub will emerge and the one-hub system is preferred to
the no-hub system.

Another assumption used in O'Kelly's model for the two-hub system,
implicitly reflected in the model as pointed out by Aykin [10], is
that the destinations in O'Kelly's model all lie in a Euclidean space. The
model and the Euclidean distance are appropriate if all destinations cover a
small area of the earth's surface. When destinations are widely separated,
Euclidean space is no longer a suitable approximation due to the curvature
of the earth's surface. Aykin further generalized O'Kelly's model into
a formulation that can be used to solve multi-hub location problems in
Euclidean space and gave the condition under which a destination is optimally
assigned to a particular hub facility.

O'Kelly [108] subsequently formulated a more general hub location
problem in a discrete solution space as a quadratic integer programming
model. With the goal of minimizing the total transportation cost, the model
was formulated to yield the optimal hub locations.

$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \Big( \sum_{k} X_{ik} C_{ik} + \sum_{m} X_{jm} C_{jm} + \alpha \sum_{k} \sum_{m} X_{ik} X_{jm} C_{km} \Big)$$

subject to

$$(n - p + 1) X_{jj} - \sum_{i} X_{ij} \ge 0 \qquad j = 1, \ldots, n$$

$$\sum_{j} X_{ij} = 1 \qquad i = 1, \ldots, n$$

$$\sum_{j} X_{jj} = p$$

$$X_{ij} \in \{0, 1\}$$
where the binary variable X_{ij} indicates whether node i is linked to a hub at j, and
X_{ii} = 1 if node i is a hub. W_{ij} is the demand flow between any two cities,
C_{ij} is the transportation cost of a unit flow between nodes i and j, and α is
the discount factor applied to flows between hubs. The
total number of cities to be interconnected is n, and p is the total number of
hubs to be constructed. O'Kelly's model is strictly limited to a hub-spoke
network structure: on the assumption that there is no direct connection
between any two spoke cities, possible alternative route structures are
completely ignored in the model.
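For toy instances the model can be solved exactly by enumeration. The sketch below (our naming; a brute-force illustration, not O'Kelly's solution method) tries every hub set of size p and every assignment of spoke cities to hubs, evaluating the quadratic objective directly:

```python
from itertools import combinations, product

def okelly_bruteforce(W, C, p, alpha):
    """Exhaustive solver for a toy instance of O'Kelly's model:
    W[i][j] demand flow, C[i][j] unit transportation cost, p hubs,
    alpha the interhub discount.  Returns (cost, hubs, assignment)."""
    n = len(W)
    best = (float('inf'), None, None)
    for hubs in combinations(range(n), p):
        spokes = [i for i in range(n) if i not in hubs]
        for choice in product(hubs, repeat=len(spokes)):
            a = {h: h for h in hubs}       # hubs serve themselves
            a.update(zip(spokes, choice))  # X_{i, a[i]} = 1
            cost = sum(W[i][j] * (C[i][a[i]] + C[j][a[j]]
                                  + alpha * C[a[i]][a[j]])
                       for i in range(n) for j in range(n))
            if cost < best[0]:
                best = (cost, hubs, dict(a))
    return best
```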

To introduce a more comprehensive framework with networking policies,
Aykin [11] developed several integer programming models considering
different networking policies. Each of his models deals with a special case in
which a specified networking policy is applied. Basically, he considered the
following three networking policies: (i) nonstrict hubbing, where flows
between any pair of nodes are not required to go through a hub; even where hubs
exist, flows go through a hub only when it is cost effective; (ii) strict
but nonrestrictive hubbing, where all flows are channeled through hubs; the
"nonrestrictive" hubbing implies that flows from the same spoke node are
not required to go through the same hub, but may use different hubs if desirable;
and (iii) strict and restrictive hubbing, where not only are flows required
to go through hubs, but flows from the same spoke node must also be
served by the same hub. Aykin concluded that nonstrict hubbing is more
cost-effective than the strict but nonrestrictive hubbing policy, and among the
three, strict and restrictive hubbing is the least desirable.

Jaillet et al. [70][71] addressed the network design problem from the
perspective of a given airline, assuming that the airline serves a fixed share
of the market. To be more specific, they assumed that the demand is
given and independent of the resulting design of the network. This is
an idealized situation: demand is only captive in a monopoly situation
(e.g., Olympic Airways exclusively serving the Greek Islands). Within a
given set of service policies that an airline may provide, i.e., non-stop flights
versus hub-connecting flights, their network design problem can be stated
as follows:

Given a fixed origin-destination flow demand matrix, and the ca-


pacities and the mileage cost of different types of aircraft, design
a network which satisfies the demand and minimizes the total
transportation cost.

In the Jaillet et al. model, the following notations and definitions were
used. Models are all prefixed with NW D, standing for network design
problems.
Policy Classifications:

1) One-Stop: The airline provides two possible services for each route
it serves: a non-stop flight and a flight with one connection. NWD(l) is
used to denote this basic model from which several more complex models
are developed. The index (1) implies that only up to one stop flights are
allowed.

2) Two-Stop: Similar to the one-connection case, the airline now pro-


vides an additional two-connection flight. This is the most common type of
service in the U.S. airline industry. With an extra stop permitted, it is
expected that the solution of this model, in terms of total operational cost,
will be at least as good as that of model NWD(1).

3) All-Stop: The airline is assumed to be in a monopoly situation. It
serves the entire market exclusively, without any competition. Therefore, if
it would make the airline more profitable, flows on any single route could be
channeled through as many stops as there are cities involved in that market.
Although this policy is not practical in the airline industry, it has various
important applications in other fields such as telecommunication networks,
air cargo delivery, and other logistical systems.

This model is denoted NWD(n-2). The index (n-2) indicates that
flights with up to (n-2) stops are allowed, where n denotes the number
of cities involved. There is a close relationship between NWD(n-2) and
NWD(1): solutions to NWD(n-2) can be obtained by solving a transformed
model of NWD(1).
Parameters:

K = number of aircraft types (e.g., Boeing 747s or DC-10s)
n = number of cities the airline serves
f_{ij} = number of passengers who desire to fly from city i to city j per day
d_{ij} = air distance from city i to city j
c_k = cost per mile for aircraft type k
b_k = capacity of aircraft type k

Decision variables:

x_{ij} = fraction of the flow f_{ij} served by a direct flight from i to j
x_{ilj} = fraction of the flow f_{ij} served by an indirect flight from i to
an intermediate city l and then to j
x_{iltj} = fraction of the flow f_{ij} served by an indirect flight from i to j
going through two intermediate cities l and t
y_{ij}^k = number of aircraft of type k used on the route from city i to city j

Model NWD(1)

The basic model focuses on minimizing the total in-flight transportation
cost. Although it ignores the fixed cost for aircraft purchasing/leasing, it is
believed that this model will reveal a basic hub-spoke pattern if such a pattern
is cost-effective. The solution of this model will not only answer the question
of which network structure an airline should adopt, but also yield an optimal
routing strategy for the airline to apply.

$$\mathrm{NWD}(1) \qquad \min \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{K} d_{ij} c_k y_{ij}^k$$

subject to

$$f_{ij} x_{ij} + \sum_{l=1}^{n} f_{lj} x_{lij} + \sum_{l=1}^{n} f_{il} x_{ijl} \le \sum_{k=1}^{K} b_k y_{ij}^k \qquad i, j = 1, \ldots, n$$

$$x_{ij} + \sum_{l=1}^{n} x_{ilj} = 1 \qquad i, j = 1, \ldots, n$$

$$x_{ij},\, x_{ilj} \ge 0 \qquad i, j, l = 1, \ldots, n$$

$$y_{ij}^k \in Z_+ \qquad i, j = 1, \ldots, n;\; k = 1, \ldots, K$$

The objective function minimizes the total transportation cost with respect
to the number of aircraft used and the distances traveled by the
aircraft. The first set of constraints restricts the number of flow units carried
on each arc so that the capacity of the selected set of aircraft on that arc is
not exceeded. The second constraint set ensures that the demand between any
given pair of cities is satisfied. Together with the integrality constraints, the
model depicts the airline's operations under the one-stop policy. Note that
the following are not taken into account in this simplified model:

1. Fixed cost for purchasing/leasing aircraft.

2. Limit on the total number of available aircraft. Omission of this limit is
not an unreasonable approximation, since it is assumed that aircraft can be
leased at any city in any amount, and the objective function
also gives an incentive to reduce the number of aircraft.

3. Periodic airline operations. Airline operations should be scheduled
on a regular basis; after a certain length of time, crews must return
to their bases.

Despite the omission of these considerations, this model captures the essence
of the intended goal in the following sense: (i) if a hub-spoke system is
desirable, this model will reveal the pattern; and (ii) by revealing a basic
network structure, it provides directions both for routing passengers and for
selecting aircraft for each route.
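The formulation transcribes almost line by line into a generic MIP modeling layer. The sketch below is our illustration using the open-source PuLP library (Jaillet et al. attacked the problem with the heuristics described later, not with this transcription); parameter names mirror the formulation, and the O(n^3) variable count confines it to toy instances:

```python
import pulp

def build_nwd1(f, d, c, b):
    """Build NWD(1): f[i][j] daily demand, d[i][j] air distance,
    c[k] cost per mile and b[k] capacity of aircraft type k."""
    N, A = range(len(f)), range(len(c))
    prob = pulp.LpProblem("NWD1", pulp.LpMinimize)
    x = pulp.LpVariable.dicts(
        "x", [(i, j) for i in N for j in N], lowBound=0)
    xs = pulp.LpVariable.dicts(
        "xs", [(i, l, j) for i in N for l in N for j in N], lowBound=0)
    y = pulp.LpVariable.dicts(
        "y", [(i, j, k) for i in N for j in N for k in A],
        lowBound=0, cat=pulp.LpInteger)
    # objective: total in-flight transportation cost
    prob += pulp.lpSum(d[i][j] * c[k] * y[i, j, k]
                       for i in N for j in N for k in A)
    for i in N:
        for j in N:
            if i == j:
                continue
            # capacity on arc (i, j): direct flow plus one-stop
            # flows whose first or second leg is (i, j)
            prob += (f[i][j] * x[i, j]
                     + pulp.lpSum(f[l][j] * xs[l, i, j] for l in N)
                     + pulp.lpSum(f[i][l] * xs[i, j, l] for l in N)
                     <= pulp.lpSum(b[k] * y[i, j, k] for k in A))
            # all demand from i to j must be routed
            prob += x[i, j] + pulp.lpSum(xs[i, l, j] for l in N) == 1
    return prob
```

Calling prob.solve() with PuLP's default solver then yields the routing fractions and integer fleet counts for small instances.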

Model NWD(n-2)

Define S_{ij} as the set of paths from city i to city j; if a path P in S_{ij},
then |P| is defined as the number of cities traversed by the path. The
authors also assume that if P in S_{ij}, then |P| >= 3, so that S_{ij} does not include
the one-arc path x_{ij}. Mathematically, any path P in S_{ij} can be represented
by a sequence v_1, v_2, ..., v_{|P|} with v_1 = i and v_{|P|} = j. The expression (i,j) in P,
where P in S_{al}, is used to indicate that the arc (i,j) is in the path P from a to l,
i.e., that i and j are two consecutive cities on the path. The model NWD(1)
is then extended to describe the all-stop policy:
$$\mathrm{NWD}(n-2) \qquad \min \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{K} d_{ij} c_k y_{ij}^k$$

subject to

$$f_{ij} x_{ij} + \sum_{a,l} \sum_{P \in S_{al} : (i,j) \in P} f_{al} x_P \le \sum_{k=1}^{K} b_k y_{ij}^k \qquad i, j = 1, \ldots, n$$

$$x_{ij} + \sum_{P \in S_{ij}} x_P = 1 \qquad i, j = 1, \ldots, n$$

$$x_{ij},\, x_P \ge 0 \qquad i, j = 1, \ldots, n$$

$$y_{ij}^k \in Z_+ \qquad i, j = 1, \ldots, n;\; k = 1, \ldots, K.$$

Since a path may involve up to n cities, a large number of real variables
is expected in the model. In fact, if all possible paths are considered, the
number of real variables x will be exponential in the number
of cities n. In turn, this would make the model intractable.

Using the transformation described below, Jaillet et al. showed
the relationship between model NWD(n-2) and model NWD(1).

Transformation of NWD(n-2): Given K types of aircraft and the demand
matrix f_{ij}, denote by k* the most economical aircraft type (i.e., aircraft k* has the
lowest cost per seat c_{k*}/b_{k*}), and by m a large integer such that m b_{k*} >=
Σ_{ij} f_{ij}. Define a transformation of the demand matrix by \bar{f}_{ij} = f_{ij} + m b_{k*};
then, by adding the additional constraint

$$y_{ij}^{k^*} \ge m \qquad i, j = 1, \ldots, n,$$

a transformed model of NWD(n-2) is obtained:


$$\mathrm{TNWD}(n-2) \qquad \min \sum_{i=1}^{n} \sum_{j=1}^{n} \Big( \sum_{k=1}^{K} d_{ij} c_k y_{ij}^k - d_{ij} c_{k^*} m \Big)$$

subject to

$$\bar{f}_{ij} x_{ij} + \sum_{P_{al} : (i,j) \in P_{al}} \bar{f}_{al} x_{P_{al}} \le \sum_{k=1}^{K} b_k y_{ij}^k \qquad i, j = 1, \ldots, n$$

$$x_{ij} + \sum_{P_{ij}} x_{P_{ij}} = 1 \qquad i, j = 1, \ldots, n$$

$$y_{ij}^{k^*} \ge m \qquad i, j = 1, \ldots, n$$

$$x_{ij},\, x_{P_{ij}} \ge 0 \qquad i, j, l = 1, \ldots, n$$

$$y_{ij}^k \in Z_+ \qquad i, j = 1, \ldots, n;\; k = 1, \ldots, K$$

Lemma 1 Let the solution to TNWD(n-2) be \hat{x}_{P_{ij}}, \hat{y}_{ij}^k. Then

$$\tilde{y}_{ij}^k = \begin{cases} \hat{y}_{ij}^k & \text{if } k \ne k^* \\ \hat{y}_{ij}^{k^*} - m & \text{if } k = k^* \end{cases}$$

is an optimal solution to NWD(n-2).

This transformation is valid for the all-stop case only, since Lemma 1
may not hold in any restricted case: for an arbitrary arc, the demand m b_{k*} is
not guaranteed to be carried directly. Because the key requirement for a
valid transformation is that the added demand be carried directly between each
pair of nodes, solution equivalence of the transformation cannot be established
for either the one-stop or the two-stop case. However, by applying the
transformation to the model NWD(1), we can show that the solution space
of the transformed model TNWD(1) is the same as that of TNWD(n-2).
Hence, the model is reduced significantly in size, and an optimal solution
to NWD(n-2) can then be obtained through solving a much simpler model,
TNWD(1). Applying the transformation to the model NWD(1), we have the
following transformed model TNWD(1):

$$\mathrm{TNWD}(1) \qquad \min \sum_{i=1}^{n} \sum_{j=1}^{n} \Big( \sum_{k=1}^{K} d_{ij} c_k y_{ij}^k - d_{ij} c_{k^*} m \Big)$$

subject to

$$\bar{f}_{ij} x_{ij} + \sum_{l=1}^{n} \bar{f}_{lj} x_{lij} + \sum_{l=1}^{n} \bar{f}_{il} x_{ijl} \le \sum_{k=1}^{K} b_k y_{ij}^k \qquad i, j = 1, \ldots, n$$

$$x_{ij} + \sum_{l=1}^{n} x_{ilj} = 1 \qquad i, j = 1, \ldots, n$$

$$y_{ij}^{k^*} \ge m \qquad i, j = 1, \ldots, n$$

$$x_{ij},\, x_{ilj} \ge 0 \qquad i, j, l = 1, \ldots, n$$

$$y_{ij}^k \in Z_+ \qquad i, j = 1, \ldots, n;\; k = 1, \ldots, K$$
and the following theorem establishes the relationship between the
transformed model TNWD(1) and NWD(n-2):

Theorem 2.1 Let an optimal solution to TNWD(1) be \bar{x}_{ij}, \bar{x}_{ijl}, \bar{y}_{ij}^k. Then

$$\tilde{y}_{ij}^k = \begin{cases} \bar{y}_{ij}^k & \text{if } k \ne k^* \\ \bar{y}_{ij}^{k^*} - m & \text{if } k = k^* \end{cases}$$

is the optimal solution to NWD(n-2).

Based on Theorem 2.1, the following procedure can be used for solving the
model NWD(n-2):

Step 1: Transform NWD(1) into TNWD(1);

Step 2: Solve the transformed model TNWD(1) to obtain the solution \bar{y}_{ij}^k;

Step 3: Use the longest path in the graph determined by the nonzero \bar{y}_{ij}^k to
limit the possible path length; and

Step 4: Solve NWD(n-2) as a linear program with y_{ij}^k fixed at \bar{y}_{ij}^k and the
path length limited by the bound found in Step 3.

Model NWD(2)

In practice, almost all airlines in the U.S. have adopted this type
of policy. The rationale is obvious: using as many stops as needed would
certainly be more profitable if the market were fixed. However, in a realistic
situation, a flight with more connections would certainly make air travel less
desirable to passengers. An airline not operating in a monopoly situation
would lose its market share, and therefore this is not acceptable.

On the other hand, the model with at most two intermediate stops is
both adequate to describe realistic situations and reasonable enough for
the fixed demand assumption to remain valid.

$$\mathrm{NWD}(2) \qquad \min \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{K} d_{ij} c_k y_{ij}^k$$

subject to

$$f_{ij} x_{ij} + \sum_{l=1}^{n} \sum_{t=1}^{n} f_{lj} x_{ltij} + \sum_{l=1}^{n} \sum_{t=1}^{n} f_{il} x_{ijtl} + \sum_{l=1}^{n} \sum_{t=1}^{n} f_{lt} x_{lijt} \le \sum_{k=1}^{K} b_k y_{ij}^k \qquad i, j = 1, \ldots, n$$

$$x_{ij} + \sum_{l=1}^{n} \sum_{t=1}^{n} x_{iltj} = 1 \qquad i, j = 1, \ldots, n$$

$$x_{ij},\, x_{iltj} \ge 0 \qquad i, j, l, t = 1, \ldots, n$$

$$y_{ij}^k \in Z_+ \qquad i, j = 1, \ldots, n;\; k = 1, \ldots, K.$$

Jaillet et al. developed various heuristics to solve NWD(1), NWD(2)
and NWD(n-2). The heuristics all contain an initial feasible solution
construction phase based on LP relaxation, followed by various
solution improvement procedures. Some benchmark test problems have
been formed. One problem contains 39 cities selected from the top 100 U.S.
cities. These cities are chosen in such a way that all major geographical
areas of the U.S. are covered. The distance between any two cities is measured
as a direct line in miles. Intercity passenger travel demand is estimated
based on a gravity model. Such models are in general constructed using
two sets of variables, namely socioeconomic variables and supply variables.
Among the socioeconomic variables, the choices are the following:

• Population. The population of the total metropolitan area served by


an airport;

• Employment. The total employment in a metropolitan area is a mea-


sure of the level of economic activities that generate travel or attract
travel; and

• Disposable income. This variable is usually measured on a per capita
basis, and it is used as a measure of the potential for travel.

The supply variables commonly used in city-pair models
are:

• Airfare;

• Travel time;

• Distance;

• Frequency of service; and

• Other level of service attributes.

The detailed model and methods for setting parameters can be found in
Song [131]. The following lists the major criteria defined for analyzing the
network structure based on the solutions:

• Degree of the nodes (cities) in the solution graph, i.e., the number of
arcs connecting to a city. The higher the degree of a city in the resulting
network, the more likely the city is a hub;

• Number of aircraft flying in and out of a city. Since the degree of a
node does not account for the volume of flow, this criterion is used
to capture that factor. The larger the number of aircraft going
in and out of a city, the more likely the city serves as a hub;

• Difference between the number of aircraft going into a city and the
minimum number of aircraft required for satisfying the demand of
that city. The larger the difference is, the more likely the corresponding
city is a hub;

• Number of passengers for whom a city serves as an intermediate stop,
i.e., the total number of passengers going through a city for connecting
flights to destinations other than this city. The larger this number is,
the higher the chance that the city is a hub;

• Proportion of total passengers traveling directly to destinations without
going through intermediate stops. The larger this percentage is
for a given city, the more likely the city serves as a hub. This is
because for any spoke city, a large proportion of passengers originating
from it are likely to go through some hub cities in order to reach their
destinations, while for a hub city this proportion is smaller.

The conclusions drawn from the Jaillet et al. study can be summarized
as follows. Although a combination of hub-and-spoke and alternative
arrangements is possible, a cost-effective network design appears to be
predominantly hub-and-spoke structured. The locations of potential hubs seem
to depend more on geography than on the density of demand. In addition,
hub positions can be located differently depending on the policy adopted.
With a relatively high level of demand flows, the differences among the
policies are insignificant. Thus, the one-stop policy is recommended to
account for social factors.

3 Yield Management
Airline deregulation has remarkably raised the level of competition in the
air transportation market. In order to maintain and improve market share,
airlines make tremendous efforts in yield management (also referred to
as revenue management in the recent literature). In order to survive in such a
competitive environment, and as a result of yield management research, many
airlines offer a wide variety of fares, ranging from deeply discounted fares to
higher priced coach, business, and first class fares. The fare levels offered for
a flight are directly affected by the pressure to match competitors' fares in
the same market. Since little room is left for improving yield management
in terms of better pricing due to low profit margins, balancing the number
of discount and full-fare reservations accepted for a flight so as to maximize
total passenger revenue became the focus of airlines' yield managers [19].
The need for a balanced solution comes from the fact that lower fares attract
more passengers, thus creating greater load factors, while also taking
away seats that could have been sold at higher fares to increase revenue.
The payoff for effectively managing the seat inventory is substantial. Delta

Airlines estimated that selling one seat per flight at full fare rather than at
a discounted fare would add over $50 million to its annual revenue [85].

Research on reservations and booking control dates back to before
deregulation. Etschmaier and Rothstein [46] and Gasco [52] gave complete
surveys. Beckmann [18] and Thompson [148] studied the problem in a very
simplified manner. Taylor's work [140] can be considered pioneering
research in booking level control. Based on Taylor's work, a whole family
of models was developed for controlling overbooking levels. Several airlines
have implemented Taylor's ideas in their booking systems. Rothstein
and Stone [123] described one of the implementations by American Airlines.
Rothstein [122] also gave a survey of the application of operations research
to airline overbooking.

Most airlines and researchers deal with the seat allocation problem flight
by flight. Even for a single flight leg, the problem is very complex. On
the same flight, there are passengers with various origin-destination (O-D)
itineraries each of which generates a different amount of revenue. For a ma-
jor airline practicing hub-and-spoke operations, every flight to the hub can
have passengers destined to almost all of its spoke stations; every flight from
the hub can have passengers departing from almost all of its spoke stations.
In addition, every itinerary has several different fare levels. So, there can be
hundreds of fare class/itinerary combinations for each flight leg, each having
its own desirability to the airline. The essential factor in determining the
seat allotment is passenger demand. Passenger demand is not deterministic,
but its trend is reflected in past records and in the number of reserved seats
for the current flight. To build and solve a model optimizing seat utilization
which covers the decisions of all the combinations, fully utilizes historical
passenger demand, and dynamically adjusts its decision with the evolving
reservation data is out of the question. All the models which deal with this
problem make certain simplifying assumptions.

Belobaba [19] discusses a very simple static model to decide the seat
allocation for a flight with two fare classes. Passenger demand for each fare
class is assumed to be an independent random variable. $f_i$ is defined as
the revenue generated per passenger in fare class $i$, and $b_i(S_i)$ as the expected
number of passengers that will make reservations in fare class $i$ when $S_i$
reservations have been made available in class $i$. The constraint is due to the
pre-existing cabin capacity: $C = S_1 + S_2$. The ideal allocation that maximizes
the total expected revenue
$$R(S_1, S_2) = f_1 b_1(S_1) + f_2 b_2(S_2)$$
is the solution of the equation
$$f_1 \frac{\partial b_1(S_1)}{\partial S_1} - f_2 \frac{\partial b_2(S_2)}{\partial S_2} = 0.$$
Hence, we have the optimal allocation $S_1^*$, $S_2^*$ satisfying
$$f_1 \bar{P}_1(S_1^*) = f_2 \bar{P}_2(S_2^*),$$
because we have
$$\frac{\partial b_i(S_i)}{\partial S_i} = \bar{P}_i(S_i),$$
where $p_i(x)$ is the probability density at the ticket selling level of $x$ seats in
fare class $i$, and $\bar{P}_i(x)$ is the probability of selling $x$ or more seats in fare
class $i$. Therefore, seats are optimally allocated between the fare classes such
that the marginal expected revenue from an additional seat is the same in
both classes; equivalently, the derivative of the total expected revenue with
respect to shifting a seat between the classes is zero. In this model, the
$\bar{P}_i(S_i)$ data are generated from historical records.
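To make the marginal-revenue condition concrete, the following minimal Python sketch enumerates the possible splits of the cabin between the two classes and picks the revenue-maximizing split, which amounts to equating the marginal revenues $f_i \bar{P}_i(S_i)$. The cabin capacity, fares, and Poisson demand model here are illustrative assumptions, not data from the text.

```python
# Enumerate cabin splits and maximize f1*b1(S1) + f2*b2(S2); all numbers
# are illustrative assumptions.
from scipy.stats import poisson

C = 100                      # cabin capacity (assumed)
f = [400.0, 150.0]           # per-passenger revenue, classes 1 and 2 (assumed)
mean_demand = [40.0, 90.0]   # assumed Poisson mean demands

def p_bar(i, x):
    """Pbar_i(x) = P(demand in class i >= x)."""
    return poisson.sf(x - 1, mean_demand[i])

def expected_sales(i, s):
    """b_i(s) = E[min(demand_i, s)] = sum_{x=1}^{s} Pbar_i(x)."""
    return sum(p_bar(i, x) for x in range(1, s + 1))

def total_revenue(s1):
    return f[0] * expected_sales(0, s1) + f[1] * expected_sales(1, C - s1)

S1 = max(range(C + 1), key=total_revenue)
print(S1, C - S1, total_revenue(S1))
```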

Equating marginal revenues in the two fare classes to find the optimal
seat allocation for a flight can be extended to dynamic models. Littlewood
[84] suggests that low-fare passengers paying $f_2$ should be accepted as long
as:
$$f_2 \geq \bar{P}_1(S_1)\, f_1,$$
where $\bar{P}_1(S_1)$ is the probability of selling all remaining $S_1$ seats to high-fare
passengers paying $f_1$ per ticket. The implicit assumptions made in
the model are: low-fare passengers book first, there are no cancellations of
bookings, and a rejected request is regarded as revenue lost to the airline.
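Littlewood's rule translates directly into a one-line acceptance test. The following sketch assumes a normal distribution for full-fare demand; all numerical values are illustrative assumptions.

```python
# Littlewood's acceptance rule under the stated assumptions (low fares
# book first, no cancellations); the normal demand model is an assumption.
from scipy.stats import norm

f1, f2 = 400.0, 150.0      # full and discount fares (assumed)
mu1, sigma1 = 40.0, 12.0   # assumed full-fare demand distribution

def accept_low_fare(s1: int) -> bool:
    """Accept a discount request iff f2 >= Pbar_1(S1) * f1,
    where S1 is the number of seats still unsold."""
    return f2 >= norm.sf(s1, mu1, sigma1) * f1

# The protection level for the full-fare class is the largest S1 at which
# discount requests are still rejected.
protection = max(s for s in range(1, 200) if not accept_low_fare(s))
print(protection)
```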
Wang [154] uses the same idea to dynamically allocate seats to the different
fare class-itinerary combinations on a flight. If the current number of bookings
for combination $(j,k)$ is $b_{jk}$ for every feasible combination $(j,k)$, then
the next seat will be allocated to the combination $(j,k)$ with the largest
$f_{jk}\, P[r_{jk} > b_{jk}]$, where $P[r_{jk} > b_{jk}]$ is the probability that another request
for $(j,k)$ will be received given that $b_{jk}$ bookings have been accepted for
combination $(j,k)$.

Demands for both discount-fare class and full-fare class on a flight are
closely related to the overall passenger demand for the flight, and passen-
gers who are not able to get the discount fare often upgrade their bookings
to the full-fare class. A more accurate treatment of the allocation prob-
lem should consider the correlation between the demands for the two fare
classes. Brumelle et al. [27] propose a static model to determine the discount
booking limit $\eta$. It uses the distribution of demand for full-fare tickets
conditioned on the demand for discount-fare tickets. Furthermore, consider-
ation of the loss in revenue reflecting the loss of goodwill of passengers who
are turned away from full-fare class is incorporated into a second model.
A third model even considers the situation where some passengers are up-
graded from discount-fare class to full-fare class. All the models are built
on the following framework.

Suppose $B$ is the demand for the discount fare class; $Y(\eta)$ is the demand for
the full fare class given discount booking limit $\eta$; $C$ is the cabin capacity; and
$p_B$, $p_Y$, and $p_G$ stand for the revenue generated by carrying each discount
fare passenger, the revenue generated by carrying each full fare passenger, and the
goodwill-related loss for each passenger being denied a full-fare booking,
respectively. Then the total revenue as a random variable, $R(\eta)$, is:
$$R(\eta) = p_B (B \wedge \eta) + p_Y \big( Y(\eta) \wedge (C - B \wedge \eta) \big) - p_G \big( Y(\eta) - (C - B \wedge \eta) \big)^+,$$
where $a \wedge b = \min(a,b)$, $a \vee b = \max(a,b)$, and $a^+ = a \vee 0$. To determine
whether the $\eta$th request for the discount fare class should be accepted at the
time $\eta - 1$ bookings for the class have been made, the expected incremental
gain $G(\eta) = E[R(\eta) - R(\eta - 1) \mid B \geq \eta]$ shall be assessed as:
$$G(\eta) = p_B + p_Y\, E\big[ Y(\eta) \wedge (C - \eta) - Y(\eta - 1) \wedge (C - \eta + 1) \mid B \geq \eta \big] - p_G\, E\big[ (Y(\eta) - (C - \eta))^+ - (Y(\eta - 1) - (C - \eta + 1))^+ \mid B \geq \eta \big].$$

If G (",) is nonnegative for all ", up to some ",*, and nonpositive thereafter,
then ",* will be optimal. For all the three models the ",* can be found.

The first model neglects the goodwill factor, so its optimal discount
booking limit is:
$$\eta^* = \max\left\{0 \leq \eta \leq C : \Pr[Y > C - \eta \mid B \geq \eta] < \frac{p_B}{p_Y}\right\},$$
and $\eta^* = 0$ if $\Pr[Y > C] \geq p_B/p_Y$. The probability $\Pr[Y > C - \eta \mid B \geq \eta]$ can
be interpreted as the maximal ratio of passengers turned away in the
full-fare booking process. When the limit $\eta$ is reached in practice, this
probability is just that ratio. So, the closeness of the spill rate to the ratio of
unit revenues $p_B/p_Y$ gives a measure of how close the $\eta$ in use is to its optimal
value.
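The first rule can be evaluated numerically whenever joint samples of $(B, Y)$ are available. The following Monte Carlo sketch assumes a bivariate normal model for the correlated demands; all parameter values are illustrative assumptions.

```python
# Find the largest discount limit eta with P(Y > C - eta | B >= eta) < pB/pY.
# The bivariate normal demand model and all numbers are assumptions.
import numpy as np

rng = np.random.default_rng(0)
C, pB, pY = 120, 150.0, 400.0
mean = [80.0, 45.0]                       # mean discount and full-fare demand
cov = [[400.0, 120.0], [120.0, 225.0]]    # positive correlation between B and Y
B, Y = rng.multivariate_normal(mean, cov, size=200_000).T

def spill_prob(eta):
    """Estimate P(Y > C - eta | B >= eta) from the joint samples."""
    cond = B >= eta
    if not cond.any():
        return 0.0
    return float((Y[cond] > C - eta).mean())

eta_star = max((eta for eta in range(C + 1) if spill_prob(eta) < pB / pY),
               default=0)
print(eta_star)
```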

For the second model, the optimal value is:
$$\eta^* = \max\left\{0 \leq \eta \leq C : \Pr[Y > C - \eta \mid B \geq \eta] < \frac{p_B}{p_Y + p_G}\right\}.$$
Because the probability is a monotonically increasing function of $\eta$, the
incorporation of the goodwill consideration decreases the optimal limit.

In the third model, every customer seeking a discount fare after its
booking limit $\eta$ has been reached independently has a probability $\gamma$ of seeking
a booking in the full-fare class. That is, $Y(\eta) = Y + U(\eta)$, where
$U(\eta) = \sum_{i=\eta+1}^{B} D_i$, with the $D_i$'s being independent and identically
distributed Bernoulli random variables having $E D_i = \gamma$. The optimal
discount-fare booking limit is:
$$\eta^* = \max\left\{0 \leq \eta \leq C : \Pr[Y + U(\eta) > C - \eta \mid B \geq \eta] < \frac{p_B - \gamma p_Y}{(1 - \gamma)\, p_Y}\right\}.$$
If $p_Y$ is replaced by $p_Y + p_G$, the goodwill consideration is incorporated
into this model as well. All these models provide tighter limits than would
have been provided by Littlewood's model, and thus produce lower full-fare
passenger spill rates.

Glover et al. [57] present a time-space-network-based seat-allocation
model to find the mix of passenger itineraries flowing over the airline's
network in independent fare classes that maximizes total revenue. Nodes in
the network are associated with flight segments' departure time-station pairs
and arrival time-station pairs. One set of arcs in the time-advancing
direction represents the flight legs and has the flight legs' capacities as the
arcs' flow capacities, and another set of arcs in the time-reversing direction
represents passenger itinerary-fare class combinations (PIs) and has their
demands as the arcs' flow capacities. The last set of arcs links nodes at
the same stations in the time-advancing direction and represents passengers
transferring between flights. With the fare for each PI designated as a
negative cost, the problem of globally allocating each flight segment's seats
to various PIs is solved as a minimum cost network flow problem with side
constraints. The model accommodates up to 600 daily flights, 30,000 passenger
itineraries, and five fare classes. The model uses each PI's deterministic
demand as the upper limit for its level of allocation. This is a good
approximation if demand far exceeds supply.
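This construction can be prototyped with a generic min-cost-flow solver (omitting the side constraints). The toy sketch below uses networkx; the stations, times, capacities, demands, and fares are illustrative assumptions, not data from the model.

```python
# Toy version of the Glover et al. network: flight arcs forward in time,
# itinerary arcs backward with weight = -fare; all numbers are assumptions.
import networkx as nx

G = nx.DiGraph()
# Flight-leg arcs (time-advancing), capacity = seats, cost 0.
G.add_edge("A_dep_0800", "B_arr_0930", capacity=100, weight=0)
G.add_edge("B_dep_1000", "C_arr_1130", capacity=100, weight=0)
# Ground arc at station B (time-advancing) for connecting passengers.
G.add_edge("B_arr_0930", "B_dep_1000", capacity=100, weight=0)
# Itinerary/fare-class arcs (time-reversing), capacity = demand,
# weight = negative fare, so maximizing revenue = minimizing cost.
G.add_edge("B_arr_0930", "A_dep_0800", capacity=60, weight=-150)  # A-B
G.add_edge("C_arr_1130", "A_dep_0800", capacity=30, weight=-260)  # A-C via B
G.add_edge("C_arr_1130", "B_dep_1000", capacity=70, weight=-140)  # B-C

flow = nx.min_cost_flow(G)        # all node demands default to 0 (circulation)
revenue = -nx.cost_of_flow(G, flow)
print(revenue)
```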

Curry [35] combined the marginal seat revenue and mathematical pro-
gramming approaches to find the optimal seat allocations when fare classes
are nested on an origin-destination itinerary, and the seats are not shared
among different origin-destinations. Classes with different fares and the
same origin-destination are allocated to nests, with higher fare classes' nests
containing lower fares. A fare class can take the seats allocated to classes
with lower fares. The mathematical model maximizes the expected total
revenues generated from all the nests, subject to the constraints imposed
by the limited capacity of each flight leg. Using the marginal seat alloca-
tion approach, the expected revenue from each nest is found to be a convex
function of the number of seats allocated to the nest. The problem is then
approximated as a piecewise linear programming problem and solved by using
linear programming techniques.

Mayer [96] addressed the two-fare nested case where both classes are al-
lowed to be filled at the same time. Titze and Griesshaber [149] simulate the
nested case in which low-fare passengers book before high-fare passengers.
Belobaba [20] presented the Expected Marginal Seat Revenue (EMSR) algo-
rithm for finding an approximately optimal policy for seat allocations when
demand for each fare class is normally distributed. Wollmer [158] gave a
simpler and faster algorithm for finding the optimal seat allocation policy
that maximizes mean revenue by establishing a critical value for each fare
class. Booking requests for a particular fare class are accepted if and only
if the number of empty seats is strictly greater than its critical value.

In practice, passengers are allowed to cancel their reservations booked


in advance or not to show up at all without penalty. Even when a flight is
solidly booked, it is likely to have vacant seats at take-off time. To offset the
revenue loss incurred by this practice, airlines have adopted the overbooking
policy, which allows the numbers of bookings to exceed the available seats
on a flight. The level of allowable overbooking must be determined by the
airlines. Too tight a level often produces a low occupancy rate of the airlines'
658 G. Yu and J. Yang

perishable seats, while too loose a level decreases passenger goodwill of those
who are denied seats they have booked and creates expense for the airline
as it tries to compensate those passengers. The difficulty of determining a
good overbooking policy is due to the randomness and unpredictability of
passenger cancellations and no-show rates.

Shlifer and Vardi [129] let the upper bounds of the number of bookings
N*(t) allowed at various stages of the booking process be the decision vari-
ables. They used the fact that the ratio of show-ups at take-off time versus
the number of bookings at any given time is independent of the elapsed time
since the booking was made. They made a justifiable assumption that the
number of show-ups at take-off time, when N(t) is the number of bookings
at time t, is a normal random variable with expectation N(t)a(t) and vari-
ance N(t)b(t), where a(t) and b(t) are experimentally-observed parameters.
Three criteria were used to determine N*(t) at any time t, with it being
the largest number that simultaneously satisfies all the criteria. The three
criteria were:

• "1* = maximum allowable probability of show-ups exceeding the ca-


pacity of the plane M.
• (J* =maximum allowable ratio of expected rejections over expected
show-ups.
• e* = ratio of the loss 02 incurred by rejecting a passenger and the
profit 01 incurred by carrying one. Given this ratio, the model looks
for the policy that maximizes the expected revenue of a flight.

In the case of a single-itinerary flight carrying a single type of passengers,
the probability of show-ups in excess of capacity, $\eta(N)$, given $N$ reservations
booked at time $t$, is expressed as:
$$\eta(N) = \Phi\!\left(\frac{Na - M}{\sqrt{Nb}}\right),$$
where $\Phi(\cdot)$ is the standard normal distribution function. We can find $N_{\eta^*}$
by solving $\eta(N) = \eta^*$. The expected number of rejected passengers $D(N)$,
given $N$ reservations booked at time $t$, is written as:
$$D(N) = (Na - M)\,\Phi\!\left(\frac{Na - M}{\sqrt{Nb}}\right) + \sqrt{Nb}\,\phi\!\left(\frac{Na - M}{\sqrt{Nb}}\right).$$
With $\theta(N) = D(N)/(Na)$, $N_{\theta^*}$ is the solution of $\theta(N) = \theta^*$. The expected
gain of a flight is $R(N) = C_1(Na - D(N)) - C_2 D(N)$. Denoting $\bar{R}(N) =
R(N)/C_1$, we have:
$$\bar{R}(N) = Na - (1 + c^*)\,D(N).$$
In a certain range, both $D'(N)$ and $D''(N)$ are larger than zero. Thus $\bar{R}(N)$
has a single maximum, attained at the optimal value $N_{c^*}$. The optimal policy
is defined as
$$N^* = \min\{N_{\eta^*},\, N_{\theta^*},\, N_{c^*}\}.$$
For other more complicated cases involving more than one itinerary and
passenger type, the approach goes along the same lines. The implementation of
the decision rules substantially reduced the instances in which passengers were
affected by overbooking while at the same time maintaining a high utilization
of aircraft capacity.
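For the single-itinerary case, the three thresholds can be computed directly from the normal approximation. The sketch below searches over $N$; the parameter values $(a, b, M, \eta^*, \theta^*, c^*)$ are illustrative assumptions.

```python
# Compute N_eta*, N_theta*, N_c* and take their minimum; all parameter
# values are illustrative assumptions.
from scipy.stats import norm

M, a, b = 200, 0.85, 0.13          # capacity, show-up mean/variance factors
eta_star, theta_star, c_star = 0.05, 0.01, 4.0

def eta(N):      # probability that show-ups exceed capacity
    return norm.cdf((N * a - M) / (N * b) ** 0.5)

def D(N):        # expected rejections, E[(show-ups - M)^+]
    z = (N * a - M) / (N * b) ** 0.5
    return (N * a - M) * norm.cdf(z) + (N * b) ** 0.5 * norm.pdf(z)

def R_bar(N):    # scaled expected gain
    return N * a - (1 + c_star) * D(N)

N_eta = max(N for N in range(M, 2 * M) if eta(N) <= eta_star)
N_theta = max(N for N in range(M, 2 * M) if D(N) / (N * a) <= theta_star)
N_c = max(range(M, 2 * M), key=R_bar)
print(min(N_eta, N_theta, N_c))
```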

Rothstein [121] applies dynamic programming to solve a Markovian
sequential decision process for an airline's overbooking policy when there is
only one fare class. The policy's decision variables, the $k_n(t)$'s, denote the
number of additional reservations allowed to be made from $t = 1$ (the period
just before departure) to $t = T$ (the period starting $T$ units before departure).
At each $t$, the decision variable depends on the previous booking level $n$,
where $n$ ranges from $0$ to $n_t$ (the largest booking level prior to $t$ which has
a positive probability). The optimal policy is the one that maximizes the
expected total revenue, which takes into account the penalty incurred for
denying passengers boarding.

The following parameters, which are all constructed from experimental
data, are used in the model:

$\psi_n(T)$ = the probability that $n$ passengers are booked at the beginning
of period $T$, $n = 0, \ldots, n_T$.

$d_i(t)$ = the probability that there are $i$ reservation demands during
period $t$, $t \geq 1$, for $i = 0, \ldots, i_t$, with $i_t$ being the maximum number of
demands with positive probability in period $t$.

$C_{j|n}(t)$ = the probability of $j$ cancellations during period $t$ out of $n$
passengers already booked at the beginning of this period. No-shows are
included among the cancellations in period 1.

$U_h(t)$ = the probability that $h$ unconfirmed reservations will be recorded
during period $t$, for $h = 0, \ldots, h_t$, where $h_t$ is the maximum value of $h$ with
positive probability.

Rothstein defined $V_n(t)$ to be the maximum expected gain achievable
by any booking policy, given that $n$ passengers are already booked at the
beginning of period $t$, for $t \geq 1$; $V_n(0)$ is the maximum gain when
$n$ passengers with recorded reservations, plus any no-records and standbys,
arrive for boarding. $V_n(0)$ is fairly accurately estimated, given the airplane's
cabin size, the distribution of the number of standby passengers, the revenue
gain $c$ per passenger carried, and the revenue loss $b$ per passenger denied
boarding. $V_n(t)$ for $t \geq 1$ and the optimal policy are obtained by using the
following recursive relation over time from $t = 0$ to $t = T$ and, at each $t$,
from $n = 0$ to $n = n_t$:
$$V_n(t) = \max_{k} \Bigg\{ \sum_{i=0}^{k} d_i(t) \sum_{h=0}^{h_t} \sum_{j=0}^{n} U_h(t)\, C_{j|n}(t)\, V_{n+i+h-j}(t-1) + \sum_{i=k+1}^{\infty} d_i(t) \sum_{h=0}^{h_t} \sum_{j=0}^{n} U_h(t)\, C_{j|n}(t)\, V_{n+k+h-j}(t-1) \Bigg\}.$$
The unconditional expected revenue gained from the optimal policy is:
$$E(V) = \sum_{n=0}^{n_T} \psi_n(T)\, V_n(T).$$
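The recursion can be coded compactly once distributions for demand and cancellations are fixed. The following sketch makes simplifying assumptions (Poisson demand, binomial cancellations, and no no-record bookings, i.e., $U_0(t) = 1$ so the $h$-summation collapses); all numerical values are illustrative.

```python
# A compact sketch of Rothstein's recursion under the assumptions above.
from functools import lru_cache
from math import comb, exp, factorial

T, N_MAX, CAP = 4, 30, 25    # periods before departure, booking cap, seats
c, b = 1.0, 2.5              # gain per passenger carried, denied-boarding loss
lam, q = 6.0, 0.08           # demand rate, per-period cancellation probability

def d(i):                    # probability of i reservation demands
    return exp(-lam) * lam ** i / factorial(i)

def cancel(j, n):            # probability of j cancellations out of n booked
    return comb(n, j) * q ** j * (1 - q) ** (n - j)

@lru_cache(maxsize=None)
def V(n, t):
    if t == 0:               # boarding: revenue minus denied-boarding penalty
        return c * min(n, CAP) - b * max(n - CAP, 0)
    best = float("-inf")
    for k in range(N_MAX - n + 1):        # candidate allowance k_n(t)
        val = 0.0
        for i in range(20):               # demand realizations
            accepted = min(i, k)          # demand beyond k is rejected
            val += d(i) * sum(cancel(j, n) * V(n + accepted - j, t - 1)
                              for j in range(n + 1))
        best = max(best, val)
    return best

print(V(0, T))               # maximum expected gain with no initial bookings
```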

Rothstein also dealt with the situation where a maximum expected
denied-boarding ratio is enforced. The ratio $R_n(0)$ at departure time, given
that there are $n$ recorded reservations, can be evaluated. If $R_n(t)$ is defined
as the expected denied-boarding ratio when there are $n$ reservations
at the beginning of period $t$, for $t \geq 1$, then there is a recursive relation for
these $R_n(t)$'s based on conditional probability, given that the optimal policy
$(k_n(t),\ n = 0, \ldots, n_t,\ t = 1, \ldots, T)$ is at hand:
$$R_n(t) = \sum_{i=0}^{k_n(t)} d_i(t) \sum_{h=0}^{h_t} \sum_{j=0}^{n} C_{j|n}(t)\, U_h(t)\, R_{n+i+h-j}(t-1) + \Bigg( \sum_{i=k_n(t)+1}^{\infty} d_i(t) \Bigg) \sum_{h=0}^{h_t} \sum_{j=0}^{n} C_{j|n}(t)\, U_h(t)\, R_{n+h+k_n(t)-j}(t-1).$$
When the optimal policy is used, the unconditional expected denied-boarding
ratio is:
$$E(R) = \sum_{n=0}^{n_T} \psi_n(T)\, R_n(T).$$

Rothstein argued that the unit denied-boarding penalty $b$ and $E(R)$ move
in opposite directions. He suggested that $b$ be tuned to obtain an optimal policy
such that $E(R)$ is less than, and as close as possible to, the enforced maximum
ratio $r_0$. In so doing, he treated $b$ as an intermediate value used to reach his
optimal policy, even though $b$ has its true meaning in reality. The result achieved
by manipulating $b$ is therefore rather doubtful.

Alstrup et al. [2] extended Rothstein's model to the case of two fare
classes. They employed a two-variable stochastic dynamic programming
model developed by Ladany et al. [65][78][79] to decide the optimal
overbooking policy for flights with two types of passengers. The policy is set by
the number of reservations allowed to be made for the two classes (C and M)
in time period $t$, $UC(t)$ and $UM(t)$. The state variables are the booking
levels at the beginning of the various time periods for the two classes, $BC(t)$
and $BM(t)$. The goal is to find the optimal values for $UC$ and $UM$ so that
the expected total cost when the booking levels at the beginning of period
$T$ are given as $BC$ and $BM$, namely $V(BC, BM, T)$, is minimized.

Alstrup et al. defined $PBC(i,t)$ as the probability of $i$ reservation
demands during period $t$ ($t$ units before departure) for class C passengers,
$PBM(j,t)$ as the probability of $j$ reservation demands during period $t$ for
class M passengers, $PCC(BC,k,t)$ as the probability of $k$ cancellations during
period $t$ for class C passengers when the booking level is $BC$ at the
beginning of the period, and $PCM(BM,g,t)$ as the probability of $g$ cancellations
during period $t$ for class M passengers when the booking level is $BM$
at the beginning of the period. The recursive relation for $V(BC, BM, t)$ is:

$$V(BC, BM, t) = \min_{UC,\, UM} \Bigg\{ \sum_{i=0}^{UC} \sum_{j=0}^{UM} \sum_{k=0}^{BC} \sum_{g=0}^{BM} PBC(i,t)\,PBM(j,t)\,PCC(BC,k,t)\,PCM(BM,g,t)\,V(BC+i-k,\ BM+j-g,\ t-1)$$
$$+ \Bigg( \sum_{i=UC+1}^{\infty} PBC(i,t) \Bigg) \sum_{j=0}^{UM} \sum_{k=0}^{BC} \sum_{g=0}^{BM} PBM(j,t)\,PCC(BC,k,t)\,PCM(BM,g,t)\,V(BC+UC-k,\ BM+j-g,\ t-1)$$
$$+ \Bigg( \sum_{j=UM+1}^{\infty} PBM(j,t) \Bigg) \sum_{i=0}^{UC} \sum_{k=0}^{BC} \sum_{g=0}^{BM} PBC(i,t)\,PCC(BC,k,t)\,PCM(BM,g,t)\,V(BC+i-k,\ BM+UM-g,\ t-1)$$
$$+ \Bigg( \sum_{i=UC+1}^{\infty} PBC(i,t) \Bigg) \Bigg( \sum_{j=UM+1}^{\infty} PBM(j,t) \Bigg) \sum_{k=0}^{BC} \sum_{g=0}^{BM} PCC(BC,k,t)\,PCM(BM,g,t)\,V(BC+UC-k,\ BM+UM-g,\ t-1) \Bigg\}.$$

To calculate $V(BC, BM, T)$ for every integer combination of $(BC, BM)$
in the rectangular area determined by the lower-left point $(0,0)$ and the
upper-right point $(C_{\text{upperbound}}, M_{\text{upperbound}})$, calculations have to proceed from
$t = 0$ to $t = T$ using the recursive relation. For each $t$, $V(BC, BM, t)$
for every $(BC, BM)$ in the rectangle has to be computed, and the optimal
$(UC, UM)$ associated with this $(BC, BM, t)$ triplet needs to be stored as
that instance's solution. To reduce computation time, the authors aggregated
passengers into batches of size 5 and used the rule of thumb that optimal
$(UC, UM) = (0,0)$ for $(BC', BM')$ implies optimal $(UC, UM) = (0,0)$ for
$BC > BC'$ and $BM > BM'$. For $t = 0$, the estimation of $V(BC, BM, 0)$
takes into account the probability of passenger no-shows:
$$V(BC, BM, 0) = \sum_{i=0}^{BC} \sum_{j=0}^{BM} PNC(i, BC)\, PNM(j, BM)\, cost(BC - i,\ BM - j),$$

where $PNC(i, BC)$ is the probability of $i$ no-shows out of $BC$ booked class
C passengers, $PNM(j, BM)$ is the probability of $j$ no-shows out of
$BM$ booked class M passengers, and $cost(BC^*, BM^*)$ is the cost incurred on
the airline when the real numbers of show-up passengers for the two classes
are $BC^*$ and $BM^*$, respectively. This value includes the impact of turning
down excess passengers and downgrading classes for passengers. All
the probability functions, $PBC$, $PBM$, $PCC$, $PCM$, $PNC$, and $PNM$,
are estimated from real data provided by Scandinavian Airlines System
(SAS). Simulations and comparisons with other simpler models indicate that
the decision tables obtained from the model define an efficient booking
policy.

None of the models mentioned above lets passenger demand depend ex-
plicitly on fares. Weatherford [156] proposed a model in which fares are
decision variables along with seat allocation upper limits, while passenger
demands are random variables affected by fares. Furthermore, his model
takes into account the effect of cross-elasticity, i.e., the demand for a fare
class depends not only on the price of this class, but also on prices of other
competing fare classes. Weatherford assumed that passenger demand for
each fare class is a normal distribution whose mean is a linear function of its
own and neighboring classes' prices. The objective of his model is to maxi-
mize the expected total revenue. By making certain assumptions concerning
the situations where passenger demand exceeds the booking upper limits, the
model is solved with nonlinear optimization tools.

To be effective, the above models all rely on accurate passenger demand


and no-show forecasts. Typical forecasting methods are simple time series
models like moving averages and exponentially smoothed averages, causal
modeling that seeks quantitative relationships between the dependent vari-
ables being predicted and any cross sectional factors that might have impact
on them, and time-series cross-sectional modeling like Kalman filters. Sun et
al. [136] introduced an adaptive neural network modeling approach. The
neural network consists of three layers of nodes: one input layer, one hidden
layer, and one output layer. The input information for dynamic forecasting,
such as day of the week, origin-destination pair, and current booking level, is
transformed into values on the input nodes. The final outputs are uniquely
determined from the values on the output nodes, which are computed from the
values on the input nodes through the three layers. Each transformation from
one layer's values to the next is a simple weighted average. The weights
are stored on the arcs between different layers. The arcs' linkage and the
weights are subject to change so as to lessen the difference between the
output and the actual outcome during the adaptive training process. The
approach compared favorably to traditional approaches in a comparative
study. BANKET, the forecasting module that employs this approach,
has been implemented on a daily basis for ten airlines since 1989.
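The forward pass of such a three-layer network amounts to a few lines of code. The following numpy sketch is purely illustrative; the layer sizes, sigmoid activation, and input encoding are assumptions, not details of the BANKET system.

```python
# A minimal three-layer feed-forward pass; weights and inputs are assumed.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 8))   # input layer (5 features) -> hidden layer (8)
W2 = rng.normal(size=(8, 1))   # hidden layer -> output (demand forecast)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forecast(x):
    """Weighted averages layer by layer, squashed by the activation."""
    hidden = sigmoid(x @ W1)
    return sigmoid(hidden @ W2)

# e.g., encoded day-of-week, O-D pair, booking level, days out, fare class
x = np.array([0.3, 0.7, 0.45, 0.2, 1.0])
print(forecast(x))
```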

4 Flight Planning and Fleet Assignment


Flight planning is a critical stage of an airline's planning process. When a
flight schedule is given, a major proportion of costs and revenues are fixed.
All the subsequent planning stages have to optimize the use of resources in
the space restricted by the schedule. Therefore, optimization of the flight
schedule is of great importance to an airline. In most airlines, the flight
schedule is drafted several months before it is put into execution. When the
first draft comes out, it is studied by the various departments involved in the
work of fleet assignment, crew scheduling, aircraft maintenance, and other
resource allocation processes. After the draft's feasibility and economics are
evaluated and changes are recommended, it is sent back to the flight plan-
ning department for revision. Normally, a flight schedule goes through an
iterative procedure of this kind many times before reaching its final, ready-
to-execute form.

The factors that must be considered to efficiently and effectively draft a


flight schedule include [45]: the demand function and associated revenues for
each origin-destination market over the time-of-the-day and the day-of-the-
week of the planning cycle; features of the routes such as distances, opera-
tional restrictions, and aircraft characteristics such as capacities, speeds, fuel
costs, crew assignments, etc.; and other operational constraints. To capture
all the details of an airline's operations and produce a truly optimal flight
schedule amounts to an intractably complicated task. All currently existing
planning models take into consideration only simplified functions of passenger
demand, aircraft operation costs, route characteristics, and other
restrictions. Also, the flight
schedule essentially provides a friendly framework on which other down-
stream planning processes are based. Most planners plan the frequency of
service on each route first, then determine the departure times on the basis
of the time-of-the-day variability of demand and connectivity of flights.

Since an airline's profit depends monotonically on its market share,
maximizing market share with limited aircraft capacity is the airline's primary
goal. Simpson [128] and Teodorovic [141] showed that market share on
routes with a large number of competitive carriers is determined largely by
flight frequency. The other factor an airline has to consider during its flight
schedule planning is passengers' satisfaction levels, because of its effect on
the airline's long-term profitability. There are various measures of this level,
but only the timing of departures and the frequency of flights are related to the
flight schedule, and with some simplifying assumptions about timings, the
former can also be transformed into a frequency issue. Therefore, deciding the
frequency of flights in an airline's serviced network is a key decision-making
process crucial to its short-term and long-term success.

Teodorovic and Krcmar-Nozic [143] proposed a multi-criteria model that


incorporates the major considerations in determining a good flight schedule
in a competitive environment. The model's first objective is to maximize
a designated airline p's total net profit. The profit on each route i is the
revenue minus the cost. Revenue is proportional to the total number of
passengers captured, and cost is proportional to the frequency of flights on
i. According to Powell [113], the number of passengers captured by airline
$p$ on route $i$ is:
$$V_{ip} = \mu_i N_{ip}^{a} \Big/ \sum_{j=1}^{m} N_{ij}^{a},$$
where $\mu_i$ is the average number of passengers wishing to travel on route $i$, $a$ is an
empirical parameter between 1 and 2, $m$ is the number of competitive carriers
including $p$, and $N_{ij}$ is carrier $j$'s frequency of flights on route $i$. Therefore,
the total net profit that $p$ can make from operating on all the $k$ routes
with flight frequency $N_{ip}$ for route $i$, when the others fly the same route with
frequencies $N_{ij}$, $j \neq p$, is:
$$P_p = \sum_{i=1}^{k} \left[ c_i \mu_i N_{ip}^{a} \Big/ \sum_{j=1}^{m} N_{ij}^{a} - C_i N_{ip} \right],$$
where $c_i$ is the average ticket price on route $i$, and $C_i$ is the total cost per
flight on route $i$.
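The following short sketch evaluates Powell's S-curve market share and the resulting profit $P_p$ for a toy two-route instance; all numbers (demands, fares, costs, frequencies) are illustrative assumptions.

```python
# Powell's S-curve market share and route profits for a toy instance.
mu = [900.0, 400.0]          # average passengers on routes 1 and 2 (assumed)
alpha = 1.5                  # empirical exponent a, between 1 and 2
c = [120.0, 95.0]            # average ticket price c_i per route (assumed)
Cost = [5000.0, 3000.0]      # total cost C_i per flight per route (assumed)
N = [[6, 4, 5],              # N[i][j]: carrier j's frequency on route i;
     [3, 2, 2]]              # carrier index 0 plays the role of airline p

def profit(p=0):
    total = 0.0
    for i in range(len(mu)):
        share = N[i][p] ** alpha / sum(n ** alpha for n in N[i])
        total += c[i] * mu[i] * share - Cost[i] * N[i][p]
    return total

print(profit())
```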

The model's second and third objectives are related to the passengers'
satisfaction levels. The second objective is to minimize the average scheduled
delay of passengers due to the discrepancies between scheduled departure
times and passengers' desired departure times. Based on some stochastic
assumptions, the model takes $Sd_{ip} = T/(4 N_{ip})$ as the average scheduled
delay per passenger when served by airline $p$ on route $i$, where $T$ is the
time interval during which passengers express a desire to fly. Taking
into account the number of passengers on each route carried by $p$, the total
scheduled delay is:
$$Sd_p = \sum_{i=1}^{k} \mu_i N_{ip}^{a-1} T \Big/ \Big(4 \sum_{j=1}^{m} N_{ij}^{a}\Big).$$
The third objective is to maximize the number of passengers captured
by airline $p$. This serves the longer-range purpose of minimizing the number of
passengers turned away due to lack of seats. Carrier $p$'s total expected
number of passengers is:
$$T_p = \sum_{i=1}^{k} \mu_i N_{ip}^{a} \Big/ \sum_{j=1}^{m} N_{ij}^{a}.$$

In building the model, the constraint that must be considered is the
airline's limited capacity; that is, it has limited resources for producing
a certain number of flying seat-hours. This constraint can be expressed as:
$$\sum_{i=1}^{k} N_{ip}\, n\, t_{Bi} \leq S_p,$$
where $n$ is the number of seats in the plane used for executing the flights,
and $t_{Bi}$ is the time needed to fly route $i$. On a certain set $A$ of routes, some
flight frequencies (the maximum allowed) cannot be exceeded. So, there is
a group of constraints that reads:
$$N_{ip} \leq N_{ip}^{\max} \qquad \forall i \in A,$$
where $N_{ip}^{\max}$ is the maximum allowed flight frequency on route $i$. On some
other set $B$ of routes, net profits are required to be nonnegative. Using
the previously described expression for the net profits, we have yet another
group of constraints:
$$N_{ip}^{a-1} \Big/ \sum_{j=1}^{m} N_{ij}^{a} \geq C_i/(c_i \mu_i) \qquad \forall i \in B.$$

The complete model is shown below:
$$\max\ P_p = \sum_{i=1}^{k} \left[ \frac{c_i \mu_i N_{ip}^{a}}{\sum_{j=1}^{m} N_{ij}^{a}} - C_i N_{ip} \right]$$
$$\max\ T_p = \sum_{i=1}^{k} \frac{\mu_i N_{ip}^{a}}{\sum_{j=1}^{m} N_{ij}^{a}}$$
$$\min\ Sd_p = \sum_{i=1}^{k} \frac{\mu_i T N_{ip}^{a-1}}{4 \sum_{j=1}^{m} N_{ij}^{a}}$$
subject to
$$N_{ip} \leq N_{ip}^{\max} \qquad \forall i \in A$$
$$N_{ip}^{a-1} \Big/ \sum_{j=1}^{m} N_{ij}^{a} \geq \frac{C_i}{c_i \mu_i} \qquad \forall i \in B$$
$$\sum_{i=1}^{k} N_{ip}\, n\, t_{Bi} \leq S_p$$
$$N_{ip}\ (i = 1, 2, \ldots, k)\ \text{integer}.$$
This is a nonlinear integer multi-criteria model. The competitors' flight
frequencies are assumed to be known parameters in the model, and hence are
excluded from the decision variables.

To obtain a solution which optimizes all of the objectives simultaneously
is impossible in the general case. For this kind of multi-criteria problem, only
a Pareto-optimal solution, a solution at which no criterion can be improved
without simultaneously worsening at least one of the remaining criteria, can
be achieved. More precisely, $x^* \in X$ is a Pareto-optimal solution if there is
no other $x \in X$ such that
$$f_i(x) \geq f_i(x^*) \qquad \forall i = 1, 2, \ldots, r,$$
with at least one strict inequality being satisfied. In the expression, $f_1(x), \ldots, f_r(x)$
are the objective functions that are to be maximized. An objective function
to be minimized can simply be converted into one that is to be maximized. In
the paper, Teodorovic and Krcmar-Nozic tried to find a satisfactory solution
$x \in X$ satisfying the inequalities
$$f_i(x) \geq \bar{f}_i \qquad \forall i = 1, 2, \ldots, r,$$
where $\bar{f} = (\bar{f}_1, \ldots, \bar{f}_r)$ is the aspiration level given by the decision maker.
Their interactive solution method was based on the work of Nakayama and
Sawaragi [103]. A brief layout of the algorithm is as follows:

Step 1. Assign an ideal point $f^* = (f_1^*, \ldots, f_r^*)$, where each $f_i^*$ is sufficiently
large, for example $f_i^* = \max_{x \in X} f_i(x)$.

Step 2. In the $k$th iteration the decision maker is asked to give the
aspiration level $\bar{f}_i^k < f_i^*$ for every criterion $f_i$, $i = 1, 2, \ldots, r$.
Set $w_i^k = 1/(f_i^* - \bar{f}_i^k)$, $i = 1, 2, \ldots, r$, solve the problem
$$\min_{x \in X}\ \max_{i = 1, \ldots, r}\ w_i^k \left( f_i^* - f_i(x) \right),$$
and get the solution $x^k$.

Note that the closer the aspiration level for criterion $i$ is to its ideal value, the
bigger its weight $w_i^k$ is.

Step 3. Based on the values of $f_i(x^k)$, $i = 1, 2, \ldots, r$, the decision maker
sets the aspiration levels for the next iteration or quits the procedure with
$x^k$ as the final solution. If the decision maker wants to continue, the
options in setting the $\bar{f}_i^{k+1}$'s are:

$\bar{f}_i^{k+1} > f_i(x^k)$, if the decision maker wants to improve the $i$th criterion;
$\bar{f}_i^{k+1} < f_i(x^k)$, if the decision maker is ready to worsen the $i$th criterion; or
$\bar{f}_i^{k+1} = f_i(x^k)$, if the decision maker accepts the $i$th criterion as it is.
To continue the procedure, put $k = k + 1$ and go to Step 2.
The subproblems for obtaining the ideal point and the min-max problems in
Step 2 are hard-to-solve nonlinear integer programming problems. The
authors applied the Monte Carlo integer programming method [32] to get
approximate solutions for them. The idea is to generate many feasible
solutions and choose the best-performing one as the output solution. The
numerical experiments conducted by the authors on forty routes with two
competing airlines achieved satisfactory solutions.
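The Monte Carlo idea is easy to sketch: sample random integer frequency vectors, keep the feasible ones, and return the best under the min-max weighted criterion of Step 2. The objectives, the constraint, and every constant below are illustrative stand-ins, not the actual model's data.

```python
# Monte Carlo integer programming sketch with stand-in objectives.
import random

random.seed(0)
k, N_max = 5, 8                      # number of routes, max frequency

def feasible(x):
    return sum(x) <= 25              # stand-in capacity constraint

def objectives(x):                   # stand-in criteria, to be maximized
    return [sum(10 * n - n * n for n in x), float(sum(x))]

f_star = [250.0, 25.0]               # assumed ideal point
w = [1 / 50.0, 1 / 10.0]             # weights from the aspiration levels

def minmax_score(x):                 # smaller is better
    return max(wi * (fs - f) for wi, fs, f in zip(w, f_star, objectives(x)))

best, best_score = None, float("inf")
for _ in range(20000):
    x = [random.randint(0, N_max) for _ in range(k)]
    if feasible(x) and minmax_score(x) < best_score:
        best, best_score = x, minmax_score(x)
print(best, objectives(best))
```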

Levin [83] proposed several models to help determine the selection of


flights. The variability of the schedule comes from the selection of flights
from a larger set of potential flights. For each intended flight, a bundle of
flights with departure times in a time neighborhood is considered.
Conceivably, the models are for cases in which the flight frequency for each
origin-destination market is already determined. The models' objective is
to minimize the use of aircraft needed to make all the connections and fulfill
the intended flights.

One of Levin's models follows this line of thinking. Treating each flight
as a node $i$, and the possibility of an aircraft serving an earlier flight $i$ and
then a later flight $j$ as a directed arc $(i,j)$, the problem becomes
that of finding the least number of chains in the network $G$ thus constructed,
such that exactly one node in each bundle is included in one
of the chains. Furthermore, the number of nodes required to be covered is
a constant $|N|$, the number of intended flights, while every chain has one
fewer arc than nodes. When network $G$ is mapped to a bipartite graph
$G^* = \{S, T, A^*\}$, such that each node $i$ corresponds to a node $s_i \in S$ and
another node $t_i \in T$ and each arc $(i,j)$ corresponds to an arc $(s_i, t_j)$, the
problem of minimizing the number of aircraft used can be formulated
as maximizing the total 0-1 flow in graph $G^*$ such that no more than one
node in $S$ ($T$) in each bundle has outgoing (incoming) flow and each node
sends out (receives) flow to (from) at most one other node. A bundle
without incoming or outgoing flow can be thought of as having one aircraft
committed to it, with no preference among the alternate flights.

If we define the following sets and variables:

$K_l = \{k \mid s_k \text{ belongs to the } l\text{th bundle}\}$,

$A(i) = \{j \mid (s_i, t_j) \in A^*\}$,

$B(j) = \{i \mid (s_i, t_j) \in A^*\}$,

$x_{ij}$ = the flow from $s_i$ to $t_j$,

$u_i = 0, 1$, the indicator of whether the $i$th flight is selected,

then the model's mathematical formulation is as follows:
$$\max z = \sum_{l=1}^{|N|} \sum_{i \in K_l} \sum_{j \in A(i)} x_{ij}$$
subject to
$$\sum_{i \in K_l} u_i = 1 \qquad \forall l$$
$$u_i \geq \sum_{j \in A(i)} x_{ij} \qquad \forall i$$
$$u_j \geq \sum_{i \in B(j)} x_{ij} \qquad \forall j$$
$$u_i,\, x_{ij} \in \{0, 1\} \qquad \forall i, j.$$

Levin suggested using a Land-and-Doig type [80] branch-and-bound
algorithm to solve the model optimally. In the solution tree, if the LP
solution of the terminal node with the maximum LP objective value among
all the terminal nodes is integral, then this solution is the optimum;
otherwise, one $u_i$ that is between 0 and 1 in that LP solution is fixed to 0 and
to 1, respectively, replacing the current terminal node with two new terminal
nodes. After these new nodes' LP solutions are found, the terminal node with
the largest LP objective is selected again, and the whole process repeats until
the optimal solution is found. Because the algorithm keeps track of the
problem's upper bound, the integral LP solution found in this way is actually
the global optimal solution of the problem.
Another model Levin gave was built directly on the physical layout of
the set of possible flights being considered. It was based on a time-space
network. Each departure time and station pair constitutes a node of the
network, as does each arrival time and station pair. Each potential flight
links the two nodes which correspond to its departure and arrival. All the nodes
at the same station are linked in the time-advancing direction by so-called
ground arcs, and the latest node $U_m$ at a station $m$ is linked back to the
earliest one $A_m$ by an overnight arc to complete a cycle. A flow of 1 in a flight arc
signals the selection of this flight and a flow of 0 otherwise, and a conserved
flow is kept in the network. The problem of minimizing the use of aircraft
is treated as minimizing the total amount of flow in all of the overnight
arcs such that exactly one arc in each bundle has flow 1 while the others have
zero flow, and such that the flow is conserved. The flow conservation comes
naturally due to the requirement that the number of aircraft be balanced. If,
this time, the set $K_l$ is considered as the set of flight arcs in the network
that correspond to potential flights in bundle $l$, the layout of the model is
as follows:
$$\min z = \sum_{(U_m, A_m)} f_{U_m A_m}$$
subject to
$$\sum_{k \in B(i)} f_{ki} - \sum_{j \in A(i)} f_{ij} = 0 \qquad \forall i$$
$$\sum_{(i,j) \in K_l} f_{ij} = 1 \qquad \forall l$$
$$f_{ij} \in Z_+ \qquad \forall i, j.$$
Levin did not solve this model, but pointed out that it has fewer rows
than the previous one, thus possibly requiring less LP solving time.

Other works on planning of flight frequencies can be found in Dantzig


[36], Elce [42], Etschmaier [43][44], Kushige [76], Miller [99], Richardson
[117], and Soudarovich [133]. The works on determining departure times,
on the basis of the time-of-the-day variability of demand and of the con-
nections of flights, can be found in Gagnon [49], Labombarda and Nicoletti
[77], Loughran [86], Richter [119][120], Struve [134], and Tewinkel [147].

Fleet assignment assigns fleet types to flight segments after the flight
schedule is determined. The period for which the assignment is done is
normally one day for domestic flights. The factors that influence schedulers
when assigning fleet types to various flights are: passenger demand, seating
capacity, operational costs, and availability of maintenance at arrival
and departure stations. Actual aircraft are routed after the fleets are
assigned, to ensure the solution is operational. The operational requirements
at that stage mainly come from the necessary maintenance for each individual
aircraft. A good flight schedule should also provide sufficient flexibility
to enable efficient crew scheduling to be done. On the other hand, flight
schedules are often revised to facilitate feasible or more effective fleet
assignments.

One important requirement of fleet assignment is that the aircraft
must circulate in the network of flights. These so-called balance constraints
are enforced by using time lines to model the activities of each fleet type.
In the time line model, there is a network built on the flight schedule for
every fleet type. The components of the network for each fleet type are as
follows. Each flight's arrival corresponds to a node at the arrival station and
at the ready time, the time after which the aircraft can start to fly after the
previous flight, and each flight's departure also corresponds to a node at the
departure station and at the departure time. The ready times may differ
among fleet types due to the fleets' individual physical conditions.
Connecting each flight's departure and arrival nodes is the flight arc, and
between each node and its next adjacent node on the time line at the same
station is a ground arc. There is a ground arc for each station connecting
the last node to the first node to complete a daily operational
cycle.
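The time line network is straightforward to build as a data structure. The following sketch creates flight arcs, time-advancing ground arcs, and a wrap-around ground arc per station from a toy flight list; the flight data are illustrative assumptions.

```python
# Build a time line (time-space) network for one fleet type.
from collections import defaultdict

# (flight id, dep station, dep time, arr station, ready time) -- assumed data
flights = [("F1", "ATL", 800, "ORD", 1030),
           ("F2", "ORD", 1100, "ATL", 1330),
           ("F3", "ATL", 1400, "ORD", 1630)]

nodes = defaultdict(set)          # station -> set of event times
arcs = []                         # (kind, tail node, head node)

for fid, dep_st, dep_t, arr_st, ready_t in flights:
    nodes[dep_st].add(dep_t)
    nodes[arr_st].add(ready_t)
    arcs.append(("flight:" + fid, (dep_st, dep_t), (arr_st, ready_t)))

for st, times in nodes.items():
    ts = sorted(times)
    for a, b in zip(ts, ts[1:]):                  # time-advancing ground arcs
        arcs.append(("ground", (st, a), (st, b)))
    arcs.append(("overnight", (st, ts[-1]), (st, ts[0])))  # close the cycle

for arc in arcs:
    print(arc)
```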

For each assignment of a fleet type $f$ to a flight $i$, there is a cost $c_{fi}$
incurred. The direct operational costs are easy to estimate. These costs
include fuel costs, crew costs, landing fees, etc. But the cost due to a
not-so-perfect assignment of a fleet type to a flight segment is hard to assess.
Aircraft seats are the most perishable of goods: once the aircraft takes off, the
empty seats do not generate revenue but incur operational costs. Therefore,
assigning a bigger aircraft than is needed to a flight is a waste of airline
assets. On the other hand, by assigning a smaller aircraft than is needed
to a flight, potential passengers will be spilled because of the inadequate
capacity. Some of the passengers will be recaptured by other flights of the
same airline, but the rest will be won over by its competitors and other
transportation modes. Hence, a substantial amount of revenue that could
have been generated by these passengers will be lost by the airline. The
difficulty in deciding the $c_{fi}$'s lies in the fact that passenger demand on a flight
depends on extremely complicated factors that are hard to fully capture. It
takes experience and insight to make a good assessment of the $c_{fi}$'s.

Hane et al. [60] used a model based on time line networks. For each
fleet type, there is such a network. They denoted nodes by $fot$, where $f$ stands
for fleet type, $o$ for station, and $t$ for time (a discrete number of them
for each fleet's network). A flight arc that enters node $fot$ is denoted by
$fdot$, where $d$ stands for the departure station of the flight; a flight arc that
leaves is denoted by $fodt$, in which $d$ stands for the destination station of
the flight. A ground arc that enters the node from the preceding time $t^-$
is denoted by $fot^-t$, while another leaving it for the subsequent time $t^+$
is denoted by $fott^+$. Thus, each flight $i$ on the schedule is represented by
an $fo_do_at_d$ once and an $fo_do_at_{af}$ once for each fleet type $f$, in which $o_d$
is $i$'s departure station, $o_a$ is $i$'s arrival station, $t_d$ is $i$'s departure
time, and $t_{af}$ is $i$'s ready time for fleet type $f$. An additional market constraint
considered in Hane et al.'s model came from the required throughs: a certain
set of consecutive flight pairs $(i_1, j_1), (i_2, j_2), \ldots$ have to be serviced by the
same aircraft.

Other notation for describing the model is as follows: $L$ is the set of
flights; $F$ is the set of fleet types; $N$ is the set of nodes in all the time line
networks of all the fleets; $M$ is the set of all ground arcs; $H$ is the set of all
required through flight pairs; and $O(f)$ is the set of flight arcs that cross
midnight. $t_1(fo)$ is the first node on the time line of station $o$ in the
network of fleet $f$, while $t_n(fo)$ is the last. $C$ is the set of stations serviced by
the schedule; $D_d(fot)$ is the set of stations $d$ that make $fdot$'s flight arcs;
$D_a(fot)$ is the set of stations that make $fodt$'s flight arcs; and $S(f)$ is the
number of aircraft available for fleet type $f$.

Decision variables of the model are as follows: $X_{fi}$ (equivalently $X_{fdot}$
or $X_{fodt'}$) is the 0-1 variable that indicates the coverage of flight $i$ (also
represented by $dot$ and $odt'$) by fleet $f$. Decision variable $Y_{fott'}$ represents the
number of type-$f$ aircraft on the ground at station $o$ between adjacent time
points $t$ and $t'$.

The model is presented below:
$$\min \sum_{i \in L} \sum_{f \in F} c_{fi} X_{fi}$$
subject to
$$\sum_{f \in F} X_{fi} = 1 \qquad \forall i \in L$$
$$\sum_{d \in D_d(fot)} X_{fdot} + Y_{fot^-t} - \sum_{d \in D_a(fot)} X_{fodt} - Y_{fott^+} = 0 \qquad \forall \{fot\} \in N$$
$$X_{fi} - X_{fj} = 0 \qquad \forall (i,j) \in H$$
$$\sum_{i \in O(f)} X_{fi} + \sum_{o \in C} Y_{fot_n(fo)t_1(fo)} \leq S(f) \qquad \forall f \in F$$
$$Y_{fott'} \geq 0 \qquad \forall \{fott'\} \in M$$
$$X_{fi} \in \{0, 1\} \qquad \forall i \in L,\ f \in F.$$
The first set of constraints guarantees that each flight is covered by exactly
one fleet type; the second set preserves the balance of aircraft at each
transition point; the third set enforces the required throughs; the fourth set
limits the utilization of each fleet type to its availability level; and the last
two sets specify the decision variable ranges. The integrality of the $Y$ variables
is brought about by the integrality of the $X$ variables through the balance
constraints. It should also be noted that the ground arcs are always used in
the time-forwarding direction, so that aircraft can depart only after they are
ready.

Hane et al. devoted much effort to making real-size problems solvable in
reasonable time within the framework of this model. Their solution procedure
goes through a branch-and-bound routine. Prior to entering branch-and-bound,
the problem's LP relaxation is solved, and the $X$'s whose fractional solutions
exceed 0.99 are fixed to 1. Other work aimed at reducing the problem
size includes:

• Node consolidation - An aggregated node is used to replace a sequence
of nodes that consists of consecutive arrivals and consecutive departures.
The aggregation is legal because it does not affect the causal
restriction on the flight connections, and thus does not alter the solution.

• Island construction - In a hub station's schedule, a whole day's activity
can be decomposed into several islands, with each island having an
equal number of arrivals and departures, and at any time, the number
of arrivals in the island is always no less than the number of departures
in the island. Ground arcs that connect the islands can be forced to
be 0. If there is only one arrival and one departure in an island, both
flights must be flown by the same fleet type.

• Eliminating missed connections - If fleet type $f$ misses a connection in
another fleet type $g$'s island structure, then there are flights A and B
in the same island and a station $s$; $s$ is A's destination and B's origin.
A is ahead of B in $g$'s network, but after B in $f$'s network. If we assign
fleet type $f$ to either A or B, an extra aircraft must overnight at the
station, which is undesirable. So we forbid A and B from being flown
by $f$ by fixing the corresponding $X$'s to be 0.

A similar integer multiple commodity network flow model was proposed


by Yan and Young [161]. The model is also based on the time-space (time
line) networks for all the fleet types $n$. The major difference from the
previous model is that this one explicitly incorporates not only the costs incurred
by assigning a fleet type to flights but also other costs, such as those incurred
by ground holdings and overnight stays. After grouping the
constraints that limit the use of aircraft within each fleet type to its
available number and the constraints that guarantee that every flight is covered
by one and only one fleet type, the model is laid out as follows:

$$\min \sum_{n \in M} \sum_{(i,j) \in A^n} C_{ij}^n X_{ij}^n$$
subject to
$$\sum_{j \in N^n} X_{ij}^n - \sum_{k \in N^n} X_{ki}^n = 0 \qquad \forall i \in N^n,\ \forall n \in M$$
$$\sum_{(i,j,n) \in H^s} X_{ij}^n \leq a_s \qquad \forall s \in R$$
$$0 \leq X_{ij}^n \leq U_{ij}^n \qquad \forall (i,j) \in A^n,\ \forall n \in M$$
$$X_{ij}^n \in Z_+ \qquad \forall (i,j) \in A^n,\ \forall n \in M.$$
In the model, $M$ is the set of fleet types, and $N^n$ and $A^n$ are the sets of nodes
and arcs, respectively, in the network for fleet type $n$. $R$ is the index set of
the constraints that are grouped. For each $s \in R$, $H^s$ is the set of triplets
$(i,j,n)$ that are involved in these side constraints. The first
set of constraints is the flow conservation constraints for the networks. The
second set of constraints is the side constraints just mentioned. The third
set specifies the lower and upper bounds for each arc. The fourth is for the
integrality requirements.

Yan and Young utilized the Lagrangian relaxation method to obtain
lower bounds on the original optimal value. The Lagrange multipliers
are attached to the grouped side constraints. They are updated from iteration
to iteration based on the subgradient method [47][28]. In each iteration,
the Lagrangian relaxation problem is solved by the decomposition-based
network simplex method. Then, a Lagrangian heuristic is used to perturb the
solution of the relaxation to get a feasible solution of the original problem,
along with an upper bound. The stopping criterion is that the
difference between lower and upper bounds falls within a small gap, or that
the number of iterations surpasses a prescribed value. A drawback of this
solution technique stems from the fact that even the optimal solution of the
Lagrangian relaxation over all possible multipliers need not supply a tight
bound for the original problem.

Daskin and Panayotopoulos [37] also utilized the Lagrangian relaxation
method to approximately optimize a fleet assignment problem. Their
objective was to maximize profits in a single-hub-and-spoke network. As in the
above-mentioned work, heuristics are used to get a primal feasible solution
along with a lower bound after each subgradient multiplier updating step,
in which the upper bound is updated.

As introduced by Abara, American Airlines' fleet assignment model is


based on a network comprised of nodes of flights and stations and arcs
of turns [1]. Turns exist between stations and flights that represent flight
sequence origination and termination and between flights that can be con-
secutively served by a single aircraft. Costs or profits related to flights are
assigned to turns linking the flights. The model considers constraints of
flight coverage, continuity of equipment, schedule balance (total number of
departures equals total number of arrivals at each station in a planning
cycle), aircraft count, and other nonstructural constraints. The objective
could be to minimize total operational costs, to maximize total profits, or
to optimize the utilization of certain fleet types. Using the model helped
reduce American Airlines' operational cost by 0.5 percent, which is on the
order of tens of millions of dollars. Berge and Hopperstad [23] provided a
similar model, which is part of a dynamic fleet assignment system.

Desaulniers et al. [38] proposed two equivalent models to determine daily


fleet assignment with maximum anticipated profits. The merit of the mod-
els is that they incorporate a flexible timetable in the flight schedule. The
first model is of the set partitioning type with other constraints isolated for
each fleet type representing the balance of number of aircraft at each station
and the fleet capacity constraints. The set partitioning part specifies that
each flight is covered once and only once by a flight sequence assigned to a
certain fleet type. The second model is a time constrained multicommodity
network flow formulation based on a network similar to that of Abara [1].
The only difference is that now two flights are connected as long as the first
one's destination is the second one's origin, regardless of their scheduled
timings. The time constraints come from the requirement that each
flight be flown within a prescribed time window. To solve the first model,
a branch-and-bound method with variable fixing and premature stopping is
used. The upper bound is obtained from the LP relaxation of the model. The
LP problem is further solved by Dantzig-Wolfe decomposition or column
generation technique, with the subproblems being the longest path problem
Optimization Applications in the Airline Industry 677

with time windows in the same network as the second model is based on.
Computational experiments are run on real airline data, and solutions with
substantial profit improvements are achieved in a reasonable amount of time.

Recently Talluri [139] presented an algorithm that uses assignment swaps
to improve daily fleet assignments. The algorithm finds swap opportunities
that satisfy the requirements of flow balance, aircraft count, and flight
coverage. It requires only a small number of calls to a shortest-path
algorithm and is very efficient. The author also gave two further
applications of the daily swap algorithm in schedule development and showed
how to incorporate a number of other factors into the algorithm.

Among all the carriers, Delta Airlines was the first to solve to completion
one of the largest and most difficult problems in this industry [135]. Delta's
Coldstart project models fleet assignment based on the time line networks.
Besides constraints for flow conservations, flight coverages, and fleet size
restrictions, the model has additional features to capture other operational
requirements. For example, maintenance arcs are added, extending from
an evening arrival node to a morning departure node at a certain station,
to capture the incident of an aircraft being assigned to this arc and go-
ing through a twelve-hour maintenance procedure. In addition, many soft
constraints are introduced into the model in the form of penalties in the
objective functions, to prevent hard-to-detect infeasibility from occurring.
For instance, fleet sizes are not fixed. Instead, excessive use of one fleet type
incurs penalties in the total cost. The ten-fleet, 450-aircraft, 2500-flight-per-day
Delta Airlines operation yields a model containing some 40,000 rows and 60,000
variables.

The solution strategy for Coldstart is to use the OB1 interior point code
[89] to solve the problem as an LP, fix some or all of the binary variables
that are at the 1.0 level, and solve the smaller problem remaining after the
variable fixing with the OSL mixed integer programming code [39]. Before
solving the LP problem, node aggregation and other reduction techniques are
applied. The size of the resulting LP problem is some 10,000 rows and 30,000
variables. Also, not all the 1.0 variables in the LP solution are fixed, due to
the fear of infeasibility. A heuristic is used to select some of the variables.

At Delta, Coldstart was used for purposes even beyond its original scope
[89]. By changing the objective function to reflect variable fleet sizes and

include ownership costs, the model aided fleet planning. When the objec-
tive was changed from cost minimization to profit maximization, the model
developed routes by considering the addition of new legs and deletion of
existing legs. The traditionally time-consuming manual job of moving the
flight schedule from season to season can now be done by a slightly different
version of the model. The heavy use of the model by Delta for fleet schedul-
ing has recorded cost savings of $220,000 per day. It is also estimated that
the savings achieved by use of the model will accrue to $300 million in three
years.

5 Crew Scheduling
For most airlines, crew expense is the second largest cost component, second
only to fuel expense. The number of daily flights for the largest airlines like
American and United is in the thousands [55][59]. A small improvement in
crew scheduling can lead to savings of millions of dollars. This has driven
academia and the airlines to devote a large amount of effort into research in
this area.

Most airlines begin to devise their crew assignments for the next plan-
ning period right after the flight schedule for the current period comes out.
Typically, a planning period is about two weeks or one month. When de-
ploying pilots and flight attendants to flights, the airlines must conform
to limitations set by aviation administrations, union contracts, and their
own work rules. The maximum amount of work the airlines may assign to
the crews in a certain period, the minimum amount of rest time the crews
must have between two consecutive flights, payments and compensation the
airlines must make to the crews according to their work types, etc., are
regulated. Also, each crew has a home base. It must return to the base
after a sequence of flights. So crew scheduling's primary effort is to find a
sequence of connected flight segments for each crew that starts and ends at
its base, so that each flight segment in the planning period is served exactly
by one crew. Because flight schedules repeat every few days within one
operational cycle, the much simplified problem of finding the sequences of
flights for the several days can be solved first, then the schedule for the
whole period can be completed by repeating the sequences in the remaining
part of the period. The sequences are called crew pairings. A valid pairing
is one that conforms to all the limitations and rules mentioned above.

When assigning a crew to a crew pairing, there is a real cost to an air-
line. The principal component of the cost is pay and credit: the guaranteed
hours of pay minus the hours actually flown. Other components include
hotel costs, per diem, etc. The airlines try to avoid an unnecessary cost
resulting from an unwise crew assignment as much as possible. Therefore
crew scheduling's secondary goal is to find the least expensive way of allo-
cating crews to flights. Most airlines make the desirable crew pairings first,
then pack them into bills of work, or bidlines, each containing consecutive
pairings that cover a planning period. Individual crew members then bid
on them. Thus, individual crew members are not considered during the scheduling stage.

In its primary form, the scheduling problem is just the set partitioning
problem, partitioning the set of flight segments into disjoint pairings, each
containing a valid sequence of flights, to make the total cost the minimum:
$$\mbox{(SPP)} \qquad \min \sum_{j=1}^{n} c_j x_j$$
subject to
$$\sum_{j=1}^{n} a_{ij} x_j = 1, \qquad i = 1, \ldots, m$$
$$x_j \in \{0, 1\}, \qquad j = 1, \ldots, n.$$

In the formulation, each aij is a 0-1 constant that specifies whether or not
crew pairing j includes flight segment i, each Cj is the cost that pairing j
incurs, and each Xj is a 0-1 integer variable that indicates the selection of
crew pairing j into the solution set. A general survey of this problem was
provided by Balas and Padberg [13].
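To make the formulation concrete, here is a minimal sketch of (SPP) on an invented instance with four flight segments and five candidate pairings, using the open-source PuLP modeling library; the costs and coverage matrix are illustrative only, not drawn from any airline data.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpStatus

c = [7, 3, 5, 6, 4]                    # c[j]: cost of candidate pairing j
a = [[1, 0, 1, 0, 0],                  # a[i][j] = 1 iff pairing j covers
     [1, 1, 0, 0, 1],                  # flight segment i
     [0, 1, 0, 1, 0],
     [0, 0, 1, 1, 1]]
m, n = len(a), len(c)

prob = LpProblem("SPP", LpMinimize)
x = [LpVariable("x%d" % j, cat=LpBinary) for j in range(n)]
prob += lpSum(c[j] * x[j] for j in range(n))        # total pairing cost
for i in range(m):                                  # cover each flight once
    prob += lpSum(a[i][j] * x[j] for j in range(n)) == 1
prob.solve()
print(LpStatus[prob.status], [j for j in range(n) if x[j].value() == 1])
# expected: Optimal [1, 2], i.e., pairings 1 and 2 partition the four flights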

Sometimes, a crew need not be assigned to a connected sequence of
flights. The disconnection can be mended by deadheading the crew from
one station to another. This practice can be reflected in the model by
changing the equality constraints to inequalities of the form $\ge$. When a flight
is covered more than once, the extra crews are simply deadheading on
board. This change causes a problem: the costs for the pairings are still
calculated as though every flight segment in each of them is a duty flight for
the crew that covers it. So many models simply don't consider deadheading.

In reality, the aggregate number of hours that crews located at a certain
base spend away from the base has to fall within specified limits during each
planning period. Taking this into account, there will be one more group of
so-called domicile constraints in the model. The constraints are normally
expressed as:
$$d_b^l \le \sum_{j=1}^{n} D_{bj} x_j \le d_b^u, \qquad b = 1, \ldots, B.$$

Also, in some models, instead of strictly enforcing the legality constraints
for each pairing, penalties are imposed on pairings so that violations of le-
gality come at a price. This more realistically represents actual practice.

There have been many attempts to solve the crew scheduling problem,
and more generally, the set partitioning and set-covering problems. Theoretical
works often aimed at optimal solutions, but the sizes of the problems they
dealt with were too small for their methods to be directly applicable to
real-world problems. Nevertheless, these works generated use-
ful techniques for and provided intuitive insights into the problems. This
was very beneficial to the practitioners. The real problems airlines face are
comprised of up to thousands of flight segments (rows) and billions of crew
pairings (columns). The columns are partially or wholly generated before
or during the optimization process. Heuristics have to be used to get near
optimal solutions.

Because some of the literature we will present deals directly with the
crew scheduling problem, some tackles the general-purpose set partitioning
and set-covering problems, and most uses a linear algebraic formulation, the
terminology used is not consistent. In surveying this literature, we inherit
this inconsistency. We will use flight segments, elements, and row indices
interchangeably, and crew pairings, sets, and column indices interchangeably,
choosing whichever term is clearest in context.

Garfinkel and Nemhauser [50] proposed an enumeration algorithm to
tackle the set partitioning problem. Their algorithm groups all the sets into
blocks that are in one-to-one correspondence with the elements, the block
for element $i$ consisting of the sets that contain $i$ but no lower-indexed
elements. The algorithm goes through a depth-first search. In a solution,
at most one set from each block is chosen. So the search for a solution is
sequentially done on blocks. A set in the current block that overlaps with
any selected sets will not be chosen. These limit the effort needed in the
enumeration. Christofides and Korman [31] added a lower bound from dy-
namic programming to facilitate a potential cut at each partial selection.
They showed that their branch-and-bound algorithm compares favorably
with many other existing ones.
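The block-based search can be sketched as follows on an invented instance; the grouping of the sets by their lowest-indexed element and the skipping of overlapping sets follow the description above, while the pruning here is a simple partial-cost test rather than Christofides and Korman's dynamic programming bound.

# Sketch of block enumeration in the spirit of Garfinkel and Nemhauser:
# block i holds the sets whose lowest-indexed element is i; a depth-first
# search picks one non-overlapping set from the block of the first
# uncovered element. Data are invented.
def partition_search(columns, costs, m):
    """columns[j]: frozenset of elements covered by set j; m: #elements."""
    blocks = {i: [] for i in range(m)}
    for j, col in enumerate(columns):
        blocks[min(col)].append(j)
    best = [float("inf"), None]

    def dfs(covered, chosen, cost):
        if cost >= best[0]:
            return                              # prune: partial cost too high
        if len(covered) == m:
            best[:] = [cost, list(chosen)]
            return
        i = min(e for e in range(m) if e not in covered)   # first uncovered
        for j in blocks[i]:                     # only block i can cover i
            if not (columns[j] & covered):      # skip overlapping sets
                dfs(covered | columns[j], chosen + [j], cost + costs[j])

    dfs(frozenset(), [], 0)
    return best

cols = [frozenset(s) for s in ({0, 1}, {1, 2}, {0, 3}, {2, 3}, {1, 3})]
print(partition_search(cols, [7, 3, 5, 6, 4], 4))   # -> [8, [2, 1]]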

Rubin [124] gave a heuristic that solves massive crew scheduling prob-
lems with domicile constraints. When an initial feasible solution is given,
the heuristic keeps on improving the pairing selection locally by solving a
smaller set partitioning problem for the set of flight segments that are cov-
ered by several pairings in the solution of the previous iteration. By choosing
the flight segments this way, no improvement affects the covering of other
unselected flights. Before handing the small set partitioning problem to an
optimizer, some matrix reduction techniques are used. Due to the presence
of domicile constraints, the optimal solution for the subproblem on a subset
of flights varies when the covering of the rest of the flights varies. So, the
subproblem for the same subset may be solved more than once. A proper
storage mechanism is used so that feasible pairings for that subset of flights
can be retrieved easily once they are generated. Rubin also proposed the
following method for producing an initial solution: surround a known
sequence of flights with high-cost deadheads to a single crew base.
The artificial deadheads gradually disappear in the course of iterations owing
to their high cost, and a real feasible solution is achieved. Combined with an
efficient set partitioning optimizer, such as the ones that will be mentioned
later, Rubin's heuristic will be able to solve real-sized crew scheduling prob-
lems.

Marsten [93] proposed a branch-and-bound algorithm based on the same
block structure that Garfinkel and Nemhauser used. The branching nodes
in the algorithm are not the conventional partial setting of the decision vari-
ables. Instead, the constraint matrix's columns are grouped into blocks that
differ in their members' first non-zero row positions. The branching nodes
are taken to be the partial mapping of rows to blocks indicating for each
row concerned in the mapping the chosen block from which a column to
cover it will be chosen. The choices of branching from a branching node,
equivalently, the options for mapping a next row, are well limited by a sim-
ple examination of the incumbent setting. This is made possible by the
judicious choice of the block representation. When a partial mapping is decided, a re-
stricted region can be specified for each block so that only columns within
this region are included in a feasible solution. At each branching node, LP
relaxation of the subproblem involving only the limited set of columns is
solved, and its optimal value is taken as the lower bound for the remaining
branching tree. When all the rows are mapped to some blocks, a feasible
solution is reached; the global upper bound is updated to be the solution's
objective if it is lower than the incumbent global upper bound. This algo-
rithm worked well on some realistic crew scheduling problems, but didn't
solve a problem of this kind with 400 rows. The numbers of columns in
the problems used to test the algorithm were not big enough to be realis-
tic. In a real crew scheduling problem, the crew pairings, i.e., the columns, are
not given; normally, astronomical numbers of them are generated from the
information about the flight segments and the rules governing the choice
of pairings. Without modification, this optimal algorithm is not capable of
solving real-size crew scheduling problems.

Hoffman and Padberg [68] also introduced a branch-and-bound algo-
rithm to solve the crew scheduling problem with domicile constraints. Its
merit is the utilization of an LP-based heuristic, a preprocessor for the LP
solver, and a constraint generation mechanism. The constraint generator
draws on the mathematics of polyhedral theory.

The algorithm's major searching engine at each node is an LP-based
heuristic. The heuristic solves a series of LP subproblems repeatedly, seek-
ing a feasible integer solution. With the help of the preprocessor, the LP
problems put into the LP solver are mostly easier than the ones that are
intended to be solved, with the original ones having some variables fixed
to 0, and the constraint matrix reduced. Before entering the heuristic, the
LP relaxation of the overall problem with some fixed setting of variables
is first solved. If the solution is fractional, then a bigger partial setting
is achieved by fixing some other variables according to their values in the
LP solution, their reduced costs, and the gap between the upper and lower
bounds [34][112]. Afterward, a loop starts. Both at this time and inside the
loop, whenever the current LP is infeasible or its objective is higher than
the global upper bound, backtracking is induced; meanwhile, if a feasible
integer solution is found for the entire problem, the global upper bound is
updated and backtracking is induced. Inside the loop, a sequence of re-
ducing LP problems are solved. After each LP solution, a decomposition
routine divides the reduced LP basis into blocks, and some controlled setting
is done in each block based on the LP result, so that a smaller reduced LP
is achieved. As mentioned earlier, the preprocessor further fixes a partial
setting and reduces the problem. This reduced problem is again put into
the LP solver. The loop goes on until infeasibility is detected, some exiting
criteria are met, or a feasible integer solution is reached. Also placed after
the LP solver is the constraint generator, which produces facet cuts to the
polytope of feasible integer solutions' convex hull on the fly. It is activated
after solving the entire LP relaxation or exiting the heuristic without finding
a feasible integer solution.

The preprocessor aims at reducing the LP problems being solved by fixing
variables to 0, eliminating redundant columns, merging columns, and
eliminating rows of the constraint matrices. The reduction is achieved by
doing certain manipulations repeatedly. The manipulations stem basically
from two observations based on the strict equality to 1 in the problems'
constraints. The first observation is: if $G_A$ is the intersection graph of the
set partitioning constraint matrix $A = \{a_{ij}\}$, i.e., $G_A$'s nodes correspond to
$A$'s columns and a pair of nodes is linked by an arc if their corresponding
columns have entries of 1 in a common row, and if $K$ is the node set of a
clique in $G_A$, then at most one variable can be 1 among all the variables
corresponding to nodes in $K$. The second observation is: if $M_i$ is the set
of all the columns that have an entry of 1 in row $i$, then one and only one
variable associated with a column in $M_i$ is 1.

The constraint generator is run on the fly, utilizing the solution of the
current LP problem. It generates facet cuts that participate in defining the
convex hull of the overall problem's feasible integer solutions, or approximate
it. The set partitioning problem's facet cuts were studied
by Arabeyre et al. [7], Balas and Ho [12], Balas and Padberg [13], Gomory
[58], Padberg [111], and Sassano [126]. At the same time, these cuts are not
redundant in the sense that they are detectable by the current LP solution
to be "piercing through" the LP's feasible region. In essence, the constraint
generator tries to find inequalities like

$$\sum_{j=1}^{n} b_j x_j \le b_0$$

that are satisfied by every point of the IP convex hull, while

$$\sum_{j=1}^{n} b_j \bar{x}_j > b_0,$$

with $\bar{x}$ being the current LP solution. The cuts are built from the set par-
titioning constraints and the domicile constraints, respectively. Those from
the set partitioning constraints are mainly from two sources, both relating
to the constraint matrix $A$'s intersection graph $G_A$. One is: if $K$ is the node
set of a clique in $G_A$, then $\sum_{j \in K} x_j \le 1$. The other is: if $C$ is an odd cycle
without chords in $G_A$, that is, $C$ has an odd number of nodes forming a
cycle with no other arcs linking any two of them, then
$$\sum_{j \in C} x_j \le (|C| - 1)/2.$$
Because the number of nodes and arcs of $G_A$ is too large to be handled
efficiently, the generator first gets cuts involving only the variables in
$F = \{j \mid 0 < \bar{x}_j < 1\}$. The cuts result from doing clique and odd cycle detections
in $F$'s intersection graph $G_F$. Afterward, these cuts are lifted iteratively to
include all the variables. The lifting is based on the following logic: if

$$\sum_{j \in F} b_j x_j \le b_0,$$

then

$$\sum_{j \in F} b_j x_j + b_k x_k \le b_0,$$

with $b_k = b_0 - \max\{\sum_{j \in F} b_j x_j \mid A_F x_F \le e_m - a_k\}$, where $A_F$ and $a_k$ are
submatrices of the constraint matrix $A$ restricted to the column index sets
$F$ and $\{k\}$ respectively, and $e_m$ is the $m \times 1$ vector of 1's. The order in
which the remaining variables in the column index set are added to $F$ is randomly
chosen. Efficient heuristics are developed for clique and odd cycle detections
and for estimating an upper bound on $z_k = \max\{\sum_{j \in F} b_j x_j \mid A_F x_F \le e_m - a_k\}$.
By using $z_k$'s upper bound, the cut is loosened. If a cut is so loosened that
the current LP solution no longer violates its defining inequality, then it is
dropped.
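As an illustration of the clique-cut side of this machinery, the sketch below greedily grows a clique in the intersection graph $G_F$ of the fractional variables and reports a violated inequality $\sum_{j \in K} x_j \le 1$ if it finds one; the greedy growth rule and the data are invented, and a production generator would use much more elaborate detection heuristics together with the lifting step just described.

# Sketch of violated clique-cut detection on the intersection graph of the
# fractional variables; columns[j] is the set of rows with a 1 in column j,
# and xbar is the current LP solution. Data and greedy rule are invented.
def violated_clique_cut(columns, xbar, eps=1e-6):
    F = [j for j in range(len(xbar)) if eps < xbar[j] < 1 - eps]
    adj = {j: {k for k in F if k != j and columns[j] & columns[k]} for j in F}
    for seed in sorted(F, key=lambda j: -xbar[j]):
        K, cand = [seed], set(adj[seed])
        while cand:                          # grow a clique around the seed
            k = max(cand, key=lambda j: xbar[j])
            K.append(k)
            cand &= adj[k]
        if sum(xbar[j] for j in K) > 1 + eps:
            return K                         # sum_{j in K} x_j <= 1 is violated
    return None

cols = [{0, 1}, {1, 2}, {0, 2}, {3}]
print(violated_clique_cut(cols, [0.5, 0.5, 0.5, 1.0]))   # e.g. [0, 1, 2]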

Ball, Bodin, and Dial [15] and Ball and Roberts [16] attacked the crew
scheduling problem through the graph theoretical approach. Their proce-
dure performs set partitioning on a graph whose nodes represent all flight
legs and arcs represent all possible connections between the flight legs. The
problem of crew pairing becomes the problem of finding a minimum matching
of the graph, where a matching of a graph is a subset of its arcs with the
property that no two arcs in the subset are incident to a common node.
This is equivalent to partitioning the graph into paths that satisfy a certain
minimality condition. The algorithm to solve this graph partitioning
problem consists of two phases: pairing construction and pairing
improvement. Computational tests showed the algorithm to be fairly efficient
in dealing with the crew scheduling problem.

The algorithm's implementation, CREW_OPT, uses the steepest-edge
dual simplex algorithm of CPLEX, developed by Bixby [24], as its LP solver. In
the computational experiments conducted by the authors, astonishingly few
nodes were visited before optimal solutions were reached for most of the
problems. The most difficult problems used to test CREW _OPT have hun-
dreds of thousands of non-zero entries in the constraint matrix. They were
all solved to optimality in a matter of hours on a CONVEX model 550
machine. Using Hoffman and Padberg's solution technique, one is on the
verge of being able to solve the airlines' crew scheduling problem as a single
problem to optimality.

Lavoie, Minoux, and Odier [81] proposed a column generation method
to solve the linear relaxation of the crew pairing problem. In their paper,
a flight service is a sequence of flight segments that can be performed one
after another. The cost of a crew pairing is total absence time plus total
rest time for the crew, which reflects the real cost to the airline. The authors
introduced different states for each flight service, so that the validity of
a crew pairing can be checked from each pair of consecutive flight services
with their states, instead of from the whole sequence of flight services.
The method requires a preprocessing step to construct a graph whose nodes are
all the flight services with all the possible states and whose arcs link all
the valid flight service-state pairs. The graph contains all the possible
pairings. The problem itself was formulated as a set-covering problem, with
each column being a potential pairing. The authors used the column gen-
eration method to solve its LP relaxation. The subproblem of finding the
minimum reduced-cost column was found to be the shortest path problem
in the aforementioned graph. The method's implementation efficiently pro-
duced optimal results. Most of the resulting optimal solutions were integer
solutions. If a solution was noninteger, the authors suggested solving the
actual problem in the restricted pairing set which the linear solution's ba-
sis defines. This method was applied to crew scheduling for Air France.
Marsten and Shepardson [94] were also able to resolve the noninteger solu-
tions satisfactorily.

Since the early 1970's, American Airlines (AA) has been using the trip
reevaluation and improvement program (TRIP) to plan its crew assignments.
Based on Rubin [124]'s methodology, TRIP improves an initial manually-
made set of pairings by iteratively solving set-partitioning subproblems on a
subset of flight segments covered by several incumbent pairings. Each itera-
tion involves subproblem selection, pairing generation, and pairing optimiza-
tion. Marsten [93]'s method, based on branch-and-bound, partial mapping,
and LP relaxation, was used in optimization. Since 1986, major enhance-
ments to TRIP have been made [3]. Due to improvements in pairing gen-
eration and optimization, each iteration's speed has been increased tenfold.
The column screening technique made TRIP capable of solving subprob-
lems with 100,000 generated pairings, as opposed to 5,000 previously. Other
techniques were utilized to reduce the chance of being trapped into local
minima. These enhancements generated savings of about $20 million annu-
ally in crew assignment costs for AA.

In spite of all the improvements made to TRIP, it could only provide
locally optimal solutions. To move a step closer to the global optimum,
American Airlines Decision Technologies (AADT) joined forces with IBM
to come up with a better methodology [4]. They first generated all the
pairings with low costs from the flight segments under consideration, then
used a column generation procedure to solve the LP relaxation of the result-
ing set-partitioning problem. The procedure involves iterations of column
selection and LP solving. Using the last iteration's dual variables, all the
generated columns are priced out, and some of the columns are selected for
the next iteration's LP solving if their reduced costs are low enough. To
find the feasible integer solution near the fractional LP optimal solution,
follow-ons for each flight segment are sought after the overall problem's LP
relaxation is solved. In essence, a flight's follow-ons are other flights that
very possibly will be its next flight in the same pairing in the integer so-
lution, judging from the LP solution. Because a flight can be assigned to
different pairings (fractionally) in the LP solution, branching has to be done
by locking some of the follow-ons to try all the alternatives. The reduced
LP after locking is solved again using column generation until all the flights
have their follow-ons. At this time, a feasible integer solution is found. This
method was applied to the crew scheduling problems of AA's Super 800 and
B 727 fleets. The resulting improvements amounted to savings of $300,000
on the TRIP solutions for a three-month period.
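The pricing step at the heart of this scheme can be sketched as follows: given the dual values from the last LP, every stored pairing is priced out, and those with sufficiently negative reduced cost are passed to the next LP. The data and threshold below are invented, and a production code would also generate columns implicitly rather than only scan a stored pool.

# Sketch of pricing out generated pairings: the reduced cost of pairing j
# is c_j minus the sum of the duals pi_i of the flights it covers.
def price_columns(columns, costs, pi, threshold=-1e-6):
    selected = []
    for j, rows in enumerate(columns):
        rc = costs[j] - sum(pi[i] for i in rows)
        if rc < threshold:                    # promising for the next LP
            selected.append((j, rc))
    return sorted(selected, key=lambda t: t[1])

cols = [{0, 1}, {1, 2}, {0, 2}]
print(price_columns(cols, [5.0, 4.0, 6.0], pi=[3.0, 2.5, 2.0]))
# -> [(0, -0.5), (1, -0.5)]; column 2 prices out nonnegative and is skipped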

Graves et al. [59] used an elastic-embedded set-partitioning integer pro-
gramming model to solve United Airlines' (UA) crew scheduling problem.
This model does not allow crew deadheading, but allows legality to be vio-
lated at a price. In real life, this price comes as "credit time" for which the
crew is paid in addition to actual flight time, basically for the inconvenience
incurred by the violation. Also, it considers the domicile constraints. Their
solution method first generates all the feasible pairings and finds a disjoint or
nearly disjoint set of pairings as an initial solution. It then makes 2-OPT or
3-OPT local improvements on all the possible combinations of two-pairing
or three-pairing subsets of the solution set. The method has been in use
since October 1989. It reportedly has been able to solve UA's narrow-body
problem of about 1,716 flight segments in about 800,000 cpu seconds on an
IBM 3090 mainframe computer. It is estimated that the system saves about
$12 million annually in credit time and about $4 million in hotel costs.

6 Air Traffic Flow Control


Flight delays increase operational costs of the delayed flights and the af-
fected downstream flights, inconvenience passengers, and thus damage the
credibility of the airline and hurt passengers' goodwill toward the airline. On
the other hand, the high demand for air travel has brought airports to their
saturation points. A small reduction in the airports' capacities can affect
many flights' on-time service. Because the most common factor that im-
pacts airports' capacities is bad weather, which is currently beyond human
control, not much can be done to improve the stability of airports' capac-
ities. What can be done is to mitigate the capacity reduction's impact on
airline operations.

Delays are divided into two types, the less costly ground delays beyond
takeoff times and the more costly airborne delays beyond landing times.
When delay at the destination airport is foreseen before a flight has taken
off, ground delay can be imposed on it to avoid more costly future airborne
delay. In the United States, the Air Traffic Control System Command Cen-
ter (ATCSCC) of the FAA is doing just that. It projects the capacity and
demand at various major airports in the time horizon of several hours on a
daily basis and makes ground holding assignments to various flights. But
the decisions from the ATCSCC are primarily based on the expertise and
judgement of human controllers. Much improvement can be made with more
thorough and accurate flow management.

For the ground holding problem (GHP), work was first done on single-
airport problems or problems with few airports involved. Odoni [106] gave a
systematic description of the problem. Terrab [145] solved the static version
of the single-airport GHP and gave several heuristics for the probabilistic
version. He also presented formulations for static versions of the problem
with two and three airports. Richetta [118] tackled the single-airport dy-
namic probabilistic GHP. Vranas et al. [152][153] first considered the net-
work effect of GHP.

Andreatta and Romanin-Jacur [5] tried to find the best strategy to make
ground holding assignments in the presence of airport congestions under
some greatly simplified assumptions. The simplified problem instance being
considered is: a group of flights $V = \{v_1, \ldots, v_n\}$ are all scheduled to land at
the same destination airport at $t = 0$. The probability of the destination's
capacity being $i$, $1 \le i \le n$, is known a priori as $p(i)$ and known to be
unchanged as time approaches 0. All delayed aircraft will be able to land
at $t = 1$. For flight $i$, the ground delay cost is $g_i$, and the airborne delay
cost is $a_i$. The authors first noticed that the delay cost is affected by
the landing priorities of the flights not being held on the ground. Each
aircraft $v_i$ that reaches the destination at $t = 0$, among the $q$ chosen to form
the set of flights $U_q$, can land immediately if and only if the airport's capacity
exceeds the number of airborne aircraft that have landing priorities higher
than its own. If $P(k) = \sum_{h=0}^{k} p(h)$ is the probability that airport capacity
is not higher than $k$, and $U_q^i$ is the subset, with cardinality $B_q^i$, of flights in
$U_q$ whose priorities are higher than $v_i$'s, then the total delay cost under
the choice of $U_q$ and priority ordering $\pi$ is
$$C(U_q, \pi) = \sum_{v_i \notin U_q} g_i + \sum_{v_i \in U_q} a_i P(B_q^i).$$

The higher the priority $v_i$ has, the smaller $P(B_q^i)$ is. So to minimize the
cost, the optimal priority assignment should give the flight with larger
$a_i$ the higher landing priority.

When the landing priority is determined so that $v_{i+1}$ has higher priority
than $v_i$, $\forall i \in \{1, \ldots, n-1\}$, the number of aircraft that have landing
priority over $v_i$ is $S_i = n - i$. If $d_i$ indicates the decision on whether or not $v_i$
is to be held on the ground, then $D_i$, the number of flights with priority over $v_i$
that are held on the ground, can be expressed as $D_i = d_{i+1} + D_{i+1}$. If $d_i = 0$,
then $S_i - D_i$ is the number of aircraft with priority over $v_i$ that are not delayed
on the ground, and the expected airborne delay cost caused by $v_i$ is
$a_i P(S_i - D_i) = a_i P(n - i - D_i)$. Taking into account the possibility of $d_i = 1$,
the expected cost caused by $v_i$ is $\bar{C}_i(d_i, D_i) = d_i g_i + (1 - d_i) a_i P(n - i - D_i)$.
The authors introduce the quantity $C_i(D_i)$ as the optimal value of the expected
delay cost to the first $i$ aircraft given that $D_i$ aircraft with higher
priority than $i$ have been delayed on the ground. The following forward
recursion holds for $C_i(D_i)$:
$$C_i(D_i) = \min_{d_i \in \{0,1\}} \left[ d_i g_i + (1 - d_i) a_i P(n - i - D_i) + C_{i-1}(d_i + D_i) \right],$$
along with the initial condition
$$C_0(d_1 + D_1) = 0.$$
Using an $O(n^2)$ dynamic programming algorithm, the $C_i(D_i)$'s for all possible $i$'s
and $D_i$'s are calculated. The optimal total expected cost is $C^* = C_n(0)$, and
the optimal decisions are recovered as $d_n^* = d_n(0)$, $d_{n-1}^* = d_{n-1}(d_n^*)$, \ldots, $d_1^* = d_1(\sum_{j=2}^{n} d_j^*)$.
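A short implementation of this dynamic program, under the recursion as reconstructed above, is the following; the delay costs and the capacity distribution are invented.

# Sketch of the O(n^2) dynamic program: flights are indexed so that
# v_{i+1} has landing priority over v_i; g[i], a[i] are ground/airborne
# delay costs, and p[c] is the probability that landing capacity equals c.
def ground_holding(g, a, p):
    n = len(g)
    P = [sum(p[:c + 1]) for c in range(n + 1)]    # P(c) = Pr{capacity <= c}
    C = [[0.0] * (n + 1) for _ in range(n + 1)]   # C[0][*] = 0
    choice = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for D in range(n - i + 1):                # D ground-held ahead of v_i
            hold = g[i - 1] + C[i - 1][D + 1]               # d_i = 1
            fly = a[i - 1] * P[n - i - D] + C[i - 1][D]     # d_i = 0
            C[i][D], choice[i][D] = min((hold, 1), (fly, 0))
    d, D = [0] * n, 0                             # recover d_n, ..., d_1
    for i in range(n, 0, -1):
        d[i - 1] = choice[i][D]
        D += d[i - 1]
    return C[n][0], d

# three flights; capacity is 0, 1, 2 or 3 with the given probabilities
print(ground_holding(g=[1.0, 1.0, 1.0], a=[5.0, 5.0, 5.0],
                     p=[0.0, 0.5, 0.3, 0.2]))     # -> (2.0, [1, 1, 0])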

The authors also prove that the optimal set of ground-delayed flights
when the allowed number of ground delays is specified to be q contains the
set when the allowed number is q - 1. From this, a simple algorithm is
devised for the case when the number of allowed ground delays is specified.
Though the authors reached very satisfactory results, their assumptions are
too simple for the results to be of direct practical use. In real life, there is
more than one destination airport to consider, delay times cannot all be set
to a single unit, and the prediction of airport capacity will change during
the time before any of the aircraft being considered have taken off.

Terrab and Odoni [146] also addressed the single-airport ground holding
problem with deterministic and stochastic capacity forecasts. The authors
assume that airborne delay cost is always higher than ground delay cost and
the takeoff capacities are unlimited. When the forecast is determined before
any flight has been executed, any airborne delay at the destination can be
absorbed by the same amount of ground delay at the origin. So, only the
amount of delay must be decided for each flight in the deterministic case.
In the stochastic case, several scenarios with a given probability distribution
exist for the outcomes of airports' capacities. At the time of an aircraft tak-
ing off, the particular scenario has not unfolded. They assumed that when
an aircraft reaches an airport's airfield, the landing capacity at the airport
is determined. In this case, a common decision for each flight at takeoff time
is made anticipating any scenario, and an airborne delay decision is made
for every scenario for the flight. The objective in the stochastic case is to
minimize the expected total delay cost. Exact dynamic programming algo-
rithms were devised for both cases. Realizing the impossibility of the exact
solution for the stochastic case, the authors came up with several heuristics
for the problem.

Considering the congestion at one single airport is not sufficient. The
capacities of airports located near each other tend to be affected by similar
weather patterns. Because runways are tightly-used resources, downstream
airports' congestion will also cause congestion to upstream airports. Thus,
congestion frequently happens to several airports simultaneously. Wang
[155] decomposed congestion events into sets of events such that every event
in each set has at least k impacted flights in common with any other event
in that set and has fewer than k impacted flights in common with any event
outside the set. Within each set, the order in which the events' congestions
are to be erased is determined by running the shortest path algorithm on
a network containing the information on relations between costs and order
of congestion erasing. Strategies on how individual congestions are erased
were not provided, and the sizes of networks are of exponential orders of the
numbers of congestion events, which renders the process impractical.

Vranas et al. [152] considered the ground holding problem for a whole
network of airports. They assumed that each airport's capacity at any time
of the day is deterministic and known a priori. Another big step they took
in modeling was to discretize the time horizon into many time points. The
authors proposed three models: the first one considers both ground and
airborne delays, the second one assumes that there are no airborne delays,
and the third one adds flight cancellations to the second model. Here, we
discuss only the authors' third model. Before presenting the model, however,
we have to introduce the necessary notation: $\{1, \ldots, K\}$ is the set of airports,
$\{1, \ldots, T\}$ is the set of time points, and $\{1, \ldots, F\}$ is the set of flights. For
each flight $f$, $r_f$ is its scheduled arrival time and $k_f$ its destination airport;
the cost of delaying it one unit of time on the ground is $c_f^g$, the cost of
canceling it is $M_f$, and its maximum allowed delay is $G_f$. Therefore $T_f$,
the set of possible arrival times for flight $f$, satisfies
$$T_f = \{t \in \{1, \ldots, T\} : r_f \le t \le \min\{r_f + G_f, T\}\}.$$


For each continued flight $f'$ whose aircraft is used for a next flight $f$, $s_{f'}$ is
the slack time built into the flight schedule. It is the surplus time between
the two consecutive flights such that if $f'$ is at most $s_{f'}$ late, $f$ can still be
executed on time. For simplicity, the next flight of $f'$ is denoted by $f$ instead
of $f_{f'}$. For each airport $k$, its capacity at time $t$ is $R_k(t)$. The decision
variables are:
$v_{ft}$, which indicates whether flight $f$ arrives at time $t$;
$g_f$, which represents flight $f$'s delay time; and
$z_f$, which indicates whether flight $f$ is canceled.

To represent both the situations of having and of not having spare resources,
the authors partition the set of flights into two sets: $F_1'$, the set of
continued flights whose cancellations will not affect the next flight, and $F_2'$,
the set of continued flights whose cancellations will.

The model is as follows:


$$\min \sum_{f=1}^{F} \left( c_f^g g_f + (M_f + c_f^g r_f) z_f \right)$$
subject to
$$g_f = \sum_{t \in T_f} t\, v_{ft} - r_f, \qquad f \in \{1, \ldots, F\}$$
$$\sum_{f : k_f = k} v_{ft} \le R_k(t), \qquad k \in \{1, \ldots, K\},\ t \in \{1, \ldots, T\}$$
$$\sum_{t \in T_f} v_{ft} + z_f = 1, \qquad f \in \{1, \ldots, F\}$$
(fourth set: coupling constraints for $f' \in F_1'$, discussed below)
(fifth set: coupling constraints for $f' \in F_2'$, discussed below)
$$v_{ft}, z_f \in \{0, 1\}, \qquad f \in \{1, \ldots, F\},\ t \in \{1, \ldots, T\}.$$


The first set of constraints gives the definition of $g_f$ as the amount
of ground delay for each flight $f$ if it is not canceled. If it is canceled,
$g_f = -r_f$, because no $v_{ft}$ takes the value 1. The objective function is the
total delay and cancellation cost; when $z_f = 1$, the part contributed by
$f$ is just $M_f$. The second set of constraints specifies that the number of
arrivals at any time not exceed the airport's capacity at that moment. The
third set of constraints states that a flight is either canceled or arrives
during the allowed period of time. When $z_{f'} = 0$, the fourth and fifth sets of
constraints both reduce to $g_{f'} - s_{f'} \le g_f$, which enforces the constraints imposed
by the minimum slack times. When $z_{f'} = 1$, the fourth set of constraints
becomes $g_f + r_f \ge 0$, which is always true, and the fifth set of constraints
becomes $G_f + 1 \le g_f + (r_f + G_f + 1) z_f$, enforcing $z_f = 1$.

The authors solved the models using a heuristic based on LP relaxations;
a sketch of the rounding pass appears after the conclusions below. In the
heuristic, the set of flights $\Phi$ with fractional solutions is divided into
classes, with each class containing all the flights in $\Phi$ to be flown by one
aircraft. Each class is processed one at a time, and the flights in each class
are also treated one at a time in the order in which they are to be flown by
the aircraft. When flight $\phi$ in $\Phi$ is being treated, the times allowed for
it to land at its arrival airport $k_\phi$ are examined. The first $t$ that satisfies the
airport capacity constraint $\sum_{f : k_f = k_\phi} v_{ft} \le R_{k_\phi}(t) - 1$ is assigned to be flight
$\phi$'s arrival time; that is, $v_{\phi t}$ is set to 1. If such a time is not found,
then $\phi$ is canceled. If $\phi$ is assigned a time, the allowed time for its following
flight is restricted to that satisfying the coupling constraint. The authors
conducted computational experiments on the three models along with
modified versions in which the coupling (network) constraints are removed. From
these, they drew the following conclusions:

• In general, network effects can be large, which implies that considering
ground delaying independently at each airport is insufficient.
• Finite departure capacities have negligible impact if they are assumed
not to influence arrival capacities.
• The heuristic performs well for low cancellation costs.
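The rounding pass referred to above can be sketched as follows; the data structures (allowed-time sets, per-airport spare capacities, successor map, slacks) are invented stand-ins for the paper's notation, and the capacity test is simplified to a spare-arrival counter.

# Sketch of the LP-rounding heuristic: fractional flights are processed
# aircraft by aircraft in flying order; each gets the first allowed landing
# time with spare capacity, else it is canceled, and an assigned time
# restricts its successor through the coupling (slack) condition.
def round_fractional(classes, T_f, spare, succ, slack):
    arrival, canceled = {}, set()
    for flights in classes:                     # one class per aircraft
        earliest = {}
        for f, k in flights:                    # in the order they are flown
            for t in sorted(T_f[f]):
                if t >= earliest.get(f, 0) and spare[k][t] >= 1:
                    arrival[f] = t
                    spare[k][t] -= 1
                    if succ[f] is not None:
                        earliest[succ[f]] = t + slack[f]
                    break
            else:
                canceled.add(f)                 # no feasible arrival time
    return arrival, canceled

T = {"A": [1, 2], "B": [3, 4]}
cap = {"JFK": {1: 0, 2: 1}, "BOS": {3: 1, 4: 1}}
print(round_fractional([[("A", "JFK"), ("B", "BOS")]], T, cap,
                       succ={"A": "B", "B": None}, slack={"A": 1, "B": 0}))
# -> ({'A': 2, 'B': 3}, set())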

In another paper by the same authors [153], the dynamic ground holding
problem was addressed. When new and more reliable information regarding
airport capacities at subsequent times arrives, new decisions are made to
make the best use of it. In the paper, decisions are made at certain time
points for those flights that have not taken off or landed. For those flights
made to land based on previous decisions, the new decisions cannot affect
them. At time $\tau$ when a decision is to be made, the set of flights that have
not taken off is denoted by $F_g^\tau$, and the set of flights that are in the air and
have not landed is denoted by $F_a^\tau$. The mathematical formulation for the
decision making is similar to the static one. The only difference is that, in
the dynamic model, the decision on ground holdings of flights in set $F_a^\tau$ is
irrevocable, and the decision on ground holdings of flights in set $F_g^\tau$ can be
made to override the previous one, while in the static model there is no such
disparity. Between two consecutive decision time points $\tau$ and $\bar{\tau}$, the sets
$F_g^\tau$ and $F_a^\tau$ evolve according to the following rules:
$$F_g^{\bar{\tau}} = F_g^\tau \setminus \{f \in F_g^\tau : d_f + g_f^\tau < \bar{\tau}\},$$
$$F_a^{\bar{\tau}} = \left(F_a^\tau \setminus \{f \in F_a^\tau : r_f + G_f + a_f^\tau < \bar{\tau}\}\right) \cup \{f \in F_g^\tau : (d_f + g_f^\tau < \bar{\tau}) \wedge (r_f + g_f^\tau + a_f^\tau \ge \bar{\tau})\},$$
where $d_f$ is flight $f$'s scheduled departure time and all the decision variables
are written in the same form as in the first paper, except for the added
superscripts standing for decision-making times. Also, for $f \in F_a^\tau$, $G_f$ is its
ground holding time, which is already irrevocable. The rules include: once a
flight is planned at $\tau$ to take off before
the current time $\bar{\tau}$, then its departure can no longer be held at $\bar{\tau}$. Also, if a
flight is scheduled to land before the current time in the last decision period,
no decision can be made on it; but if a flight scheduled at $\tau$ is to take off
before $\bar{\tau}$ and to land no earlier than $\bar{\tau}$, then the decision on its landing is
subject to revision at this point.
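Under the rules as reconstructed above, the bookkeeping between two decision points can be sketched as follows; all flight data are invented, with d, r, g, G, and aa mapping each flight to its scheduled departure, scheduled arrival, revocable ground hold, irrevocable ground hold, and airborne delay.

# Sketch of the evolution of the not-yet-departed set F_g and the airborne
# set F_a between decision points tau and tau_bar.
def evolve(Fg, Fa, d, r, g, G, aa, tau_bar):
    departed = {f for f in Fg if d[f] + g[f] < tau_bar}
    landed = {f for f in Fa if r[f] + G[f] + aa[f] < tau_bar}
    new_Fg = Fg - departed
    new_Fa = (Fa - landed) | {f for f in departed
                              if r[f] + g[f] + aa[f] >= tau_bar}
    return new_Fg, new_Fa

Fg, Fa = {"f1", "f2"}, {"f3"}
d, r = {"f1": 5, "f2": 9}, {"f1": 8, "f2": 12, "f3": 6}
g, G, aa = {"f1": 1, "f2": 0}, {"f3": 1}, {"f1": 0, "f2": 0, "f3": 0}
print(evolve(Fg, Fa, d, r, g, G, aa, tau_bar=8))   # -> ({'f2'}, {'f1'})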

In a more restricted model, the authors assumed that the airports' take-
off capacities are infinite. At the time decisions are made, the only decisions
to be made are on ground holdings. The ground holdings are made to ab-
sorb airborne delays, though conditions at airports may be different from
what is expected at the times of flights' arrivals. The gist is, whenever a
takeoff will incur airborne delay according to the current capacity forecast,
it is not allowed to take off. Airborne delays only occur to those flights
that are already in the set $F_a^\tau$, and only occur when they have to. Thus,
airborne delays are not regarded as decisions to be made, and the mini-
mum costs they incur are inevitable for the current stage of planning. To
accommodate this strategy, for each airport $k$ and time point $t$, an excess
$E_{kt}$ is introduced, which represents the number of aircraft about to land in
excess of the airport's landing capacity. Note that the excess is inherited
from the last period's decision based on inadequate information. Another
set of variables $o_f$ carries the length of unavoidable airborne delays for the
flights. Whenever $E_{kt} > 0$, $E_{kt}$ of the flights about to land at airport $k$ at
time $t$ are selected to be delayed another time unit. After these preliminary
calculations, the authors arrived at the following model at planning time $\tau$:

$$\min \sum_{f \in F_g^\tau} c_f^g\, g_f^\tau$$
subject to
$$g_f^\tau = \sum_{t \in T_f \cap T^\tau} t\, v_{ft} - r_f, \qquad f \in F_g^\tau$$
$$\sum_{f \in F_g^\tau : k_f = k} v_{ft} \le \max(-E_{kt}, 0), \qquad \forall k \in \{1, \ldots, K\},\ t \in \{1, \ldots, T\}$$
$$\sum_{t \in T_f \cap T^\tau} v_{ft} = 1, \qquad f \in F_g^\tau$$
$$g_{f'}^\tau - s_{f'} \le g_f^\tau, \qquad f' \in F' \cap F_g^\tau$$
$$G_{f'} + o_{f'} - s_{f'} \le g_f^\tau, \qquad f' \in F' \cap F_a^\tau$$
$$v_{ft} \in \{0, 1\},$$
where $T^\tau$ is the set of time points at and after time $\tau$, and $F'$ denotes the
set of continued flights. The notation used here is essentially the same as in
the previous paper; only now the time $\tau$ at which the decision is made has
to be explicitly specified. $G_{f'}$ is the ground delay decided for flight $f'$ before
$\tau$, which cannot be revoked at $\tau$.

In addition, the authors also considered extensions to the models when


cancellations are incorporated and when airports' takeoff and landing ca-
pacities are interrelated. The corresponding changes to the models are in-
significant. Another extension considers the probabilistic factor in capacity
forecasting. The model assumes that a forecast is not made deterministically;
rather, a probability distribution over several scenarios is given. Only
during the short period of airborne holding will one of the scenarios be re-
alized. A unique decision is made at takeoff time, but different decisions
are made during flight corresponding to different realized scenarios. The
goal is to minimize the total ground delay costs plus expected total airborne
delay costs. The authors conducted extensive computational experiments;
following are some of their conclusions:

• If incorrect capacity forecasts made at the beginning of the day are
corrected early enough, then their influence on the total cost of the
dynamic problem can be minimized.

• A greedy dynamic FCFS heuristic simulating current practices concerning
ground holding decisions is highly inefficient compared to ground
holdings based on the optimal solutions of the dynamic decision models
presented in the paper.

Another model dealing with the multi-airport multi-period congestion
problem was provided by Helme [64]. In addition, this model considers en
route capacity restrictions. The drawback of the model is that it does not
consider any downline effects beyond the destinations. The model treats
flights as commodities differentiated by their destinations. The commodities
form conserved flows in a space-time network. The network consists
of three types of vertices: the origins are origin airport and time pairs;
the destinations are destination airport and time pairs; and the fixes are en
route space-time vertices that enable enforcement of en route capacities and
inclusion of airborne delay costs. Time is discretized into points 15
minutes away from each other. There are ground arcs in time advancing
direction linking origin vertices standing for the same origin airports, and
linking destination vertices standing for the same destination airports. Also
there are airborne arcs in time advancing direction linking fixes representing
the same locations. It is assumed that all flights between two airports take
the same amount of time without delay. The flight arcs linking origin, fix,
and destination vertices correspond to flights linking one vertex representing
one location and time pair to another vertex representing another location
and time pair. The time span between two vertices linked by a flight arc
is decided by the two locations. A unit flow on a ground arc incurs a unit
ground delay, and a unit flow on an airborne arc incurs a unit airborne delay.
Thus, delay costs can be placed on the ground and airborne arcs. Landing,
arrival, and en route capacities are all placed on corresponding arcs. The
model also assumes that all the flights being considered are executed within
the time horizon it covers. The planned departures form flow sources at var-
ious origin vertices, and the required landings form flow sinks at destination
vertices at the ending time.

The following notation is used in the model:

$V$ = set of vertices in the network
$A$ = set of arcs in the network
$K$ = set of destinations, and thus of commodities
$e_{ij}$ = minimum travel time on the link from airport $i$ to airport $j$
$c_{it}^{ju}$ = cost of one unit of flow on the arc from vertex $(i,t)$ to vertex $(j,u)$
$F(i,t) = \{(j,u) \in V \mid [(j,u) \to (i,t)] \in A\}$ = set of "from" vertices for vertex $(i,t)$
$G(i,t) = \{(j,u) \in V \mid [(i,t) \to (j,u)] \in A\}$ = set of "to" vertices for vertex $(i,t)$
$r_{it}^k$ = net inflow of commodity $k$ at vertex $(i,t)$, for $k \in K$
$u_{it}^{ju}$ = capacity of arc $[(i,t) \to (j,u)]$

The decision variables are $x_{it}^{ju}(k)$, the number of aircraft headed for
airport $k$, departing from location $i$ in period $t$ and arriving at location $j$ in
period $u$, $\forall [(i,t) \to (j,u)] \in A$, $k \in K$. The formulation is listed below:

$$\min \sum_{[(i,t) \to (j,u)] \in A} c_{it}^{ju} \sum_{k \in K} x_{it}^{ju}(k)$$

subject to

$$\sum_{(j,u) \in G(i,t)} x_{it}^{ju}(k) - \sum_{(j,u) \in F(i,t)} x_{ju}^{it}(k) = r_{it}^k \qquad \forall (i,t) \in V,\ k \in K$$

$$\sum_{k \in K} x_{it}^{ju}(k) \le u_{it}^{ju} \qquad \forall [(i,t) \to (j,u)] \in A$$

$$x_{it}^{ju}(k) \ge 0, \qquad x_{it}^{ju}(k) \in Z^+ \qquad \forall [(i,t) \to (j,u)] \in A,\ k \in K.$$
This is a multicommodity minimum cost network flow formulation. The so-
lution method proposed by the author is to find an initial feasible solution
by straightforwardly scheduling ground delays for flights, for the sole purpose
of accommodating reduced arrival capacities, and then to improve the
solution step by step, such that in each step a negative cost cycle is found
and flow is added around the cycle. In the early stage of development, only
fictional data were used for experimentation, using part of the improving
cycle method.
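The improvement step can be sketched with a standard Bellman-Ford negative-cycle search, as below; the arc list is an invented stand-in for the residual graph, which a full implementation would rebuild from the current flow after each augmentation.

# Sketch of negative-cost-cycle detection (Bellman-Ford from a virtual
# source): arcs is a list of (u, v, cost) residual arcs.
def find_negative_cycle(nodes, arcs):
    dist = {v: 0 for v in nodes}
    pred = {v: None for v in nodes}
    last = None
    for _ in range(len(nodes)):
        last = None
        for u, v, c in arcs:
            if dist[u] + c < dist[v]:
                dist[v], pred[v] = dist[u] + c, u
                last = v
    if last is None:                      # no relaxation in the n-th pass
        return None
    for _ in range(len(nodes)):           # walk back onto the cycle itself
        last = pred[last]
    cycle, v = [last], pred[last]
    while v != last:
        cycle.append(v)
        v = pred[v]
    return cycle[::-1]                    # flow can be pushed around this

arcs = [("a", "b", 2), ("b", "c", -1), ("c", "a", -2), ("a", "d", 1)]
print(find_negative_cycle(["a", "b", "c", "d"], arcs))   # e.g. ['b', 'c', 'a']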

Rue and Rosenshine [125] used semi-Markov decision process models to
study the optimal control of access to the landing queue of an airport. There
is another school of literature which concentrates on the physical aspect of
air traffic. These works directly use physical parameters of aircraft, airports, and
air space as input variables and give guidelines for inter-aircraft distances,
landing and takeoff delays, and other air traffic control elements in various
circumstances. Newell [105] gave a survey on airport capacity. He described
how the capacity of a runway configuration depends upon the strategy for
sequencing operations such as arriving and departing of heavy and light
aircraft, the runway geometry, the instrument flight rules, etc. Andreussi
et al. [6] introduced a discrete event simulation model that simulates the
aircraft sequencing operations in the near terminal area. The model follows
the landing procedure from common practice that all aircraft fly the "race-
track" shaped paths at different altitudes before leaving from several fixes
to enter the runway. Models concerning en route traffic conflict and capac-
ity were given by Blumstein [25], Dunlay [41], Geisinger [53], Hockaday and
Kanafani [67], Janic and Tosie [72], and Siddiqee [127].

7 Irregular Operations Control


Airlines operate in a very complex environment. Many factors are beyond
human control. The one factor that affects airline operations the most is
inclement weather. Severe weather situations worsen conditions at airports,
hence reducing the allowed arrival and departure rates, sometimes even forc-
ing airport closure. Every arrival by every airline at every airport in the
United States is assigned an arrival time slot by the ATCSCC. Under
normal conditions, the slots match the flights' scheduled arrival times. In
the face of bad weather or abnormal conditions, the ATCSCC allocates to
airlines slots with reduced arrival rates that make delays and cancellations
unavoidable. Each slot is an interval of time centered around a controlled
arrival time within which the arrival has to be made. The airlines have
certain freedom in making their own decisions as to which flights are to
be canceled and which slot each flight will fill under constraints such as
rescheduled arrival not being earlier than the scheduled arrival time. The
airlines' decisions are fed back to the ATCSCC for approval. An approved
slot allocation can be put into execution, while denial forces the whole pro-
cess to be repeated. Very often, an allocation is denied because an airline
takes too long in making the online slot allocation decision. Different slot
allocations have different effects on the number of passengers who have to be
transferred to other flights or compensated due to flight cancellation, on the
total amount of passenger delay time, on the airlines' dependability data,
and on the intangible factors such as passenger goodwill and the airlines'
reputation. Therefore, a system that effectively and efficiently allocates
flights to slots is highly sought after.

An arrival slot allocation system (ASAS) was implemented at American
Airlines in 1989 [150]. It consists of an algorithm that minimizes the amount
of delay, taking advantage of flight cancellations and a data-processing com-
ponent that cuts down the overall turnaround time of responses to ATCSCC
by automating the process of sending and receiving messages. The slot al-
location model is associated with the directed traveling salesman problem
[82]. Each flight segment under consideration is represented as a node in a
network; a salesman has to visit each node, deliver a slot in each visit, and
create a tour through the network. The order of nodes in the salesman's
tour corresponds to the sequence of arrival slot substitutions available for
each empty slot. The algorithm to solve it is a tour-building heuristic that
preserves aircraft and crew balance and gate connections among flights at
hub airports. The ASAS has saved American Airlines the cost of seven hu-
man dispatchers and an additional $5.2 million due to reduced amounts of
delay time.

In order to accurately measure the propagation effect of the perturbations
caused by the Ground Delay Program (GDP), Luo and Yu [87] introduced
and formalized the concepts of critical departure times and critical arrival
times. For each flight, its 1-stage critical departure time is the latest time
that the flight can depart without affecting the flight's subsequent sched-
uled activities; its n-stage critical departure time is the latest time that the
flight can depart without affecting the scheduled activities after n subse-
quent flights have been executed. The critical arrival times are similarly
defined. Aircraft and crew (even individual crew members) are treated as
generic resources. The authors point out that, in general, the pure slot allo-
cation problem without considering aircraft maintenance, crew legality, and
the splitting of resources (crew and aircraft fly the same subsequent flights at
the problem station) is an assignment problem that can be solved to optimal-
ity in polynomial time. Nevertheless, for various criteria, faster algorithms
are provided by the authors. Because the airlines' operation controllers have
to respond as quickly as possible to the ATCSCC, the algorithms will be
helpful to the controllers. When the objective is to minimize total passen-
ger delay, the authors' solution strategy is to assign slots chronologically,
and for each slot being considered, assign to it the earlier-scheduled unassigned
arrival that has the largest passenger load. The complexity of the procedure
is $O(n^2)$. Another similar algorithm was devised for minimizing the total
delay beyond critical times. To minimize the number of delayed out-flights,
i.e., the number of flights that are delayed beyond their 1-stage departure
times, an algorithm adapted from Moore's algorithm [101] for solving the
one-machine scheduling problem $n/1//n_T$ is employed. For each in-flight,
its corresponding due date is the latest slot that enables the subsequent
out-flight to be before the 1-stage critical departure time. The algorithm
goes through two steps. In the first step, all the in-flights are ordered
nondecreasingly by their due dates, yielding a sequence $J$. In the second step, the
first in-flight j in J whose associated out-flight is delayed under the current
landing sequence is identified and taken out of J. The process is repeated
until no in-flight whose associated out-flight is late remains in J. The algo-
rithm takes $O(n \log n)$ time.
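A sketch of this adaptation follows, with invented due dates and slot times; it uses the straightforward quadratic removal loop described above (a heap-based variant attains the stated $O(n \log n)$ bound), and assumes at least as many landing slots as in-flights, with removed in-flights implicitly taking the leftover later slots.

# Sketch of the Moore-style procedure: order in-flights by due date,
# repeatedly remove the first one whose slot starts after its due date.
def min_late_outflights(due, slots):
    J = sorted(range(len(due)), key=lambda i: due[i])    # EDD order
    delayed = []
    while True:
        late = next((pos for pos, i in enumerate(J)
                     if slots[pos] > due[i]), None)
        if late is None:
            return J, delayed            # all remaining land by due date
        delayed.append(J.pop(late))      # this in-flight's out-flight is late

print(min_late_outflights(due=[4, 1, 3, 2], slots=[2, 3, 4, 5]))
# -> ([3, 2, 0], [1]); only in-flight 1's out-flight is delayed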

When resources are splittable after landing (crew and aircraft follow
different routes at the problem station), each out-flight will need several
types of resources that are provided possibly by different in-flights. To
capture the lateness of out-flights, the due date $q_i$ of an in-flight $i \in I$ has to be
defined as the earliest departure time of the out-flight that uses the resources
from this in-flight minus the minimum turnaround time:

$$q_i = \min\{d_k \mid \phi_r(k) = i,\ r \in R,\ k \in K\},$$
where $\phi_r(k)$ denotes the in-flight that provides out-flight $k$ with the $r$th
resource, $R$ is the set of resources needed for flight operation, $K$ is the set
of out-flights, and $d_k$ is the scheduled departure time of out-flight $k$ minus
the minimum turnaround time. Luo and Yu [88] study the landing slot
allocation problem caused by the GDP with splittable resources using several
criteria. The first objective they consider is to minimize the maximum delay
$L_{\max}$ among all out-flights. When in-flight $i$ is assigned to landing slot $\sigma(i)$,
the definition of $L_{\max}$ is:
$$L_{\max} = \max_{k \in K}\left[\max_{r \in R}\left\{s_{\sigma(\phi_r(k))} - d_k\right\}\right]^+ = \max_{i \in I}\left[s_{\sigma(i)} - q_i\right]^+,$$
where $s_j$ is slot $j$'s starting time. The authors proved that the method of
assigning in-flights with earlier due dates to earlier landing slots guarantees
the optimal result.

To formulate the model to minimize the total number of in-flight and out-
flight delays, the following decision variables were introduced: $x_{ij}$ indicates
whether in-flight $i$ is assigned to landing slot $j$, and $y_k$ indicates whether
out-flight $k$ is on time. The formulation is as follows:
$$\mbox{(GDPLT)} \qquad \min \sum_{i} \sum_{j : s_j > a_i} x_{ij} + \sum_{k} (1 - y_k)$$
subject to
$$\sum_{j : s_j \ge a_i} x_{ij} = 1 \qquad \forall i \in I$$
$$\sum_{i} x_{ij} \le 1 \qquad \forall j \in J$$
$$\sum_{j : s_j > d_k} x_{\phi_r(k)\, j} + y_k \le 1 \qquad \forall k \in K,\ r \in R$$
$$x_{ij}, y_k \in \{0, 1\} \qquad \forall i, j, k.$$
In the formulation, $a_i$ denotes the scheduled landing time of in-flight $i \in I$,
$s_j$ denotes the starting time of slot $j \in J$, and $\delta$ denotes the interval of slot
$j$. The first two sets of constraints require that every in-flight be assigned to
exactly one landing slot not earlier than its scheduled landing time, and that
each slot receive at most one in-flight; the third set of constraints declares
the meaning of the $y_k$'s. The authors recognized that the integrality
constraints on the $x_{ij}$'s can be dropped as long as they are enforced on
the $y_k$'s. They also found the following useful valid inequalities:

$$\sum_{i' : a_{i'} \le a_i} \sum_{j : s_j > a_i} x_{i'j} \ge |\{i' : a_{i'} \le a_i\}| - |\{j : s_j \le a_i\}| \qquad \forall i \in I,$$
which specifies the minimum necessary delays among the first $i$ in-flights;

$$\sum_{k' : d_{k'} \le d_k} (1 - y_{k'}) \ge |\{k' : d_{k'} \le d_k\}| - |\{j : s_j \le d_k\}| \qquad \forall k \in K,$$
which specifies the minimum necessary delays among the first $k$ out-flights;

$$\sum_{i :\, i = \phi_r(k)\ \mbox{for some}\ r \in R} x_{ij} + y_k \le 1 \qquad \forall k \in K,\ j : s_j > d_k,$$
which states that an out-flight is late whenever an in-flight that provides
resources to it is late; and a fourth family of inequalities, one for each
$i \in I$, which makes sure that if an in-flight lands later than the $d_k$'s of all the out-flights
that need resources from it, then all the out-flights will be delayed. With
the help of these valid inequalities and the aforementioned model simplifi-
cation, optimal solutions for realistic problems were reached within seconds
on microcomputers. A way to get a good upper bound was also provided
by the authors: Find the level of optimal in-flight delays; then at this level,
optimize the number of takeoff delays.
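The following sketch builds (GDPLT) for a toy instance with the PuLP library, applying the relaxation noted above: the $x_{ij}$ are left continuous while integrality is kept on the $y_k$. The slot times, due dates, and resource map $\phi$ are invented.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

a = [1, 2]                  # scheduled landing times of in-flights
s = [2, 3, 4]               # slot start times
d = {0: 3}                  # d_k: latest on-time landing for out-flight k
phi = {0: [0, 1]}           # out-flight 0 needs both in-flights' resources
I, J, K = range(len(a)), range(len(s)), d.keys()

prob = LpProblem("GDPLT", LpMinimize)
x = {(i, j): LpVariable("x%d_%d" % (i, j), lowBound=0, upBound=1)
     for i in I for j in J if s[j] >= a[i]}          # slot not before a_i
y = {k: LpVariable("y%d" % k, cat=LpBinary) for k in K}
prob += (lpSum(x[i, j] for (i, j) in x if s[j] > a[i])   # delayed in-flights
         + lpSum(1 - y[k] for k in K))                   # late out-flights
for i in I:
    prob += lpSum(x[i, j] for j in J if (i, j) in x) == 1
for j in J:
    prob += lpSum(x[i, j] for i in I if (i, j) in x) <= 1
for k in K:                              # a late resource makes k late
    for i in phi[k]:
        prob += lpSum(x[i, j] for j in J
                      if (i, j) in x and s[j] > d[k]) + y[k] <= 1
prob.solve()
print({v: x[v].value() for v in x}, {k: y[k].value() for k in K})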

When the resources are split into at most two parts, and if between the $d_k$'s
of two out-flights requiring resources from the same in-flight there are no
landing slots, the authors provided a very simple model for minimizing the
number of takeoff delays. The two assumptions used are very realistic: Nor-
mally the only two resources that need consideration are crew and aircraft,
and takeoffs are much closer in time than landings. When out-flights are
thought of as nodes, and two out-flights that need resources from the same
in-flight are linked by an arc representing the in-flight, the relationship of
resource sharing and flowing can be represented by clusters of cycles dis-
tributed in many time intervals. Between the beginning of time and the
first interval, and between these intervals, there are landing slots. An out-
flight arc is on time if and only if it is assigned to a landing slot before it. An
out-flight node is on time if and only if the two in-flight arcs that link it are
all on time. The problem is that of maximizing the number of on-time nodes
with the restriction of limited landing slots. The authors provide an $O(n)$-time
heuristic that guarantees a solution less than one delay away from the
optimal one. The heuristic tries to assign arcs that complete cycles and, at
the end, leaves a chain, which has one fewer on-time node than on-time arcs.
This is a greedy complete-cycle algorithm. The exact solution procedure
provided by the authors tries to find the cycles that accommodate
the maximum number of on-time arcs; the slots left are assigned to arcs
that form a chain. The effort to find the cycles is proven to be of $O(n^2)$
complexity.

Other major factors that affect airlines' regular operations are mechan-
ical failures and crew absence, which cause aircraft shortages. The shortage
of an aircraft for one flight will propagate to downline operations and bring
great inconvenience to passengers and significant damage to the airline's
current operations and long-term profitability. Although reallocating slots
can reduce the effects of abnormal conditions at the current station, the
downline effects often are still enormous and require proper treatment. To
mitigate the rippling effect of flight delays and cancellations caused by a
few flights, airlines can opt to reschedule later flights and make appropriate
adjustments to aircraft, crew, and gate assignments. Many authors have
considered the aircraft routing problem that deals with unplanned opera-
tional changes. Because the flight schedule of an airline is always very tight,
locally adjusting the schedule and aircraft routing for full recovery is often
impossible. A scheme for global recovery is needed to reduce the irregu-
lar operations cost to the airline. To estimate how the recovery decisions
perform, costs of delaying and canceling flights must be specified. Also, min-
imum turnaround time for each aircraft is imposed, so that an aircraft can
take off again only after it has been on the ground for that amount of time.
For each station, there are curfew times after which no takeoff or landing
can take place. For the purpose of continuous operation and maintenance,
aircraft balance must be kept. In other words, for each station, a prescribed
number of aircraft must overnight there.

Teodorovic and Guberinic [142] proposed a graphical model that does


not consider flight cancellation and curfew constraints. In the model, all the
flights to be rescheduled are nodes $x_i$, and all the aircraft to be re-routed
are nodes $y_j$. If flight $x_i$ lands at the station that $x_j$ starts from, there is a
directed arc pointing from $x_i$ to $x_j$. Also, all the aircraft nodes are connected
to all the flight nodes with undirected arcs. A rescheduled routing for the
entire planning horizon is represented by a chain
\[
q = (x_a, x_b), (x_b, x_c), \dots, (x_r, y_1), (y_1, x_h), \dots, (x_q, y_j),
\]

such that $x_a, x_b, \dots, x_r$ is the sequence of flights scheduled to be flown by
aircraft $y_1$, and so on and so forth. The authors supposed that the shortage
of aircraft would occur at the beginning of the day, so that a flight's delay
cost, which is proportional to the total passenger number multiplied by the
amount of the delay, could be calculated by comparing its position in the
sequence of the aircraft it is assigned to and the scheduled departure time,
and could be expressed as the length of the arc between two flight nodes.
The model seeks to minimize the total cost caused by all the rescheduled
flights. The problem is transformed into finding the shortest path of the
graph that traverses all of its nodes. The authors apply branch-and-bound
methodology to solve it. At each branching node, the partial sum of total
arc length is a straightforward lower bound for the partial setting - a partial
sequence of nodes for q. If the partial sum is greater than the most recent
upper bound, i.e., the shortest total length ever obtained in the process from
a complete path, then all the paths with the first part identical to the partial
setting are abandoned and thus a cut is made. The authors gave numerical
examples to illustrate the effectiveness of the model and solution strategy.
No practical problem was tested.
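
The branching scheme can be illustrated with a short Python sketch of
depth-first branch-and-bound for the shortest node-traversing path; the
adjacency representation and the choice to try every start node are our own
assumptions, not details from [142].

```python
# A minimal branch-and-bound sketch for the shortest node-traversing path,
# using the partial path length as the lower bound, as described above.
# `dist` is a hypothetical dict mapping (node, node) -> arc length;
# missing pairs are treated as non-adjacent.

import math

def shortest_traversal(nodes, dist):
    best_len, best_path = math.inf, None

    def extend(path, length):
        nonlocal best_len, best_path
        if length >= best_len:          # partial sum already exceeds incumbent: cut
            return
        if len(path) == len(nodes):     # complete path: new incumbent
            best_len, best_path = length, list(path)
            return
        for v in nodes:
            if v not in path and (path[-1], v) in dist:
                path.append(v)
                extend(path, length + dist[(path[-2], v)])
                path.pop()

    for start in nodes:                 # try each node as the chain's start
        extend([start], 0.0)
    return best_path, best_len

d = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 4,
     ("c", "b"): 2, ("b", "a"): 1, ("c", "a"): 4}
print(shortest_traversal(["a", "b", "c"], d))   # -> (['a', 'b', 'c'], 3.0)
```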

Teodorovic and Stojkovic [144] proposed a lexicographical model meant


to minimize the number of cancellations first and, at the level of minimum
cancellations, minimize total passenger delays. The model takes into account
airport curfew times. The authors devised a heuristic that sequentially de-
cides the chain of flights to be flown by each aircraft. For each aircraft i,
the chain is allowed to grow in a multistage network, which is comprised
of one stage 0 of a starting node, one stage 1 of nodes representing flights
departing from the station where the aircraft is, and stages 2 to n of nodes
representing the n flights left unassigned to aircraft $1, \dots, i-1$, and directed
arcs from nodes in stage i to nodes in stage i + 1, each of which reflects that
the flight in stage i + 1 departs from the airport where the flight in stage
i lands. A chain is a sequence of connected nodes and arcs extending from
stage 0 to the stage where it can grow no more because the station-curfew
constraint represents a feasible assignment of the aircraft. The heuristic lets
the chain grow stage by stage. When the chain up to stage i - I is found,
the arc (and thus the node) in stage i to be added to the chain is chosen
such that the incurred delay taking into consideration the part of sequence
of flights already decided is minimized. After the chain for each aircraft is
found, the flights left unassigned are canceled. The model again assumes a
common starting time for aircraft operations. In addition, the aircraft
balance problem essential to maintenance is not addressed.
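
A minimal sketch of the chain-growing idea, under our own assumed data
layout (flights as dicts with scheduled departure, duration, origin and
destination), is:

```python
# A minimal sketch of the sequential chain-growing idea described above.
# Data structures are hypothetical, not taken from [144].

def grow_chain(aircraft_pos, aircraft_ready, flights, turnaround, curfew):
    """Greedily grow one aircraft's chain of flights, stage by stage,
    always appending the connecting flight with the least incurred delay.
    Mutates `flights`, removing each flight it assigns."""
    chain, pos, ready = [], aircraft_pos, aircraft_ready
    while True:
        candidates = [f for f in flights if f["origin"] == pos]
        best, best_delay = None, None
        for f in candidates:
            depart = max(f["sched_dep"], ready)
            if depart + f["duration"] > curfew[f["dest"]]:
                continue                       # curfew: chain cannot grow here
            delay = depart - f["sched_dep"]
            if best is None or delay < best_delay:
                best, best_delay = f, delay
        if best is None:
            return chain                       # chain can grow no more
        chain.append(best)
        flights.remove(best)                   # flight now assigned
        pos = best["dest"]
        ready = max(best["sched_dep"], ready) + best["duration"] + turnaround
```

Flights still in the list after every aircraft's chain has been grown are
the ones the heuristic cancels.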

Jarrah et al.'s [73] models use network flows to trace the effect of aircraft
shortage. In one of their models dealing only with flight delays, a network
was built for the station where there is a shortage of aircraft. There are
aircraft nodes at the times of scheduled availability and flight nodes at the
times of scheduled takeoff. Also, there are supply nodes and recovery nodes
representing spare aircraft and aircraft recovered from previous unavailability
at the times of their readiness. A directed forward arc is linked from each
aircraft node to its assigned flight node on the schedule; and a backward arc
is linked from each flight node to every aircraft node, every supply node,
and every recovery node that is early enough so that the corresponding de-
lay is still acceptable. On each backward arc, there is a delay cost incurred
by delay of the potential assignment. Sources of unit flows are provided at
the aircraft nodes where the aircraft are unavailable; supply and recovery
nodes are treated as sinks where no more than unit flow can be absorbed.
In a path of unit flow from a source node to a sink node, the backward arcs
represent the reassignment of flights to aircraft, and the sum of their costs
corresponds to the marginal operation cost. The minimum-cost network
flow algorithm helps find the optimal reassignments.
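
As an illustration of the network just described, the following sketch builds
a toy instance and solves it with networkx's min-cost-flow routine; the node
names, costs, and the use of networkx are our own assumptions, not part of
Jarrah et al.'s implementation.

```python
# A minimal sketch of the delay network described above, using networkx.
# Backward-arc costs stand in for the delay estimates of the model.

import networkx as nx

G = nx.DiGraph()
# One shortage: aircraft node "A1" has no aircraft (a unit source of flow).
G.add_node("A1", demand=-1)
# A spare aircraft acts as a sink that can absorb one unit of flow.
G.add_node("SPARE", demand=1)
# Forward arc: the scheduled assignment of aircraft A1 to flight F100.
G.add_edge("A1", "F100", weight=0, capacity=1)
# Backward arcs: reassigning F100 to another aircraft or to the spare,
# each priced by the acceptable delay it would cause.
G.add_edge("F100", "A2", weight=30, capacity=1)    # swap with aircraft A2
G.add_edge("F100", "SPARE", weight=45, capacity=1)
# A2's own flight must then be covered in turn, and so on down the network.
G.add_edge("A2", "F200", weight=0, capacity=1)
G.add_edge("F200", "SPARE", weight=0, capacity=1)

flow = nx.min_cost_flow(G)   # unit flow from the shortage to a sink
print(flow)                  # the cheaper route goes through the A2 swap
```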

In Jarrah et al.'s model dealing with flight cancellations, the aforemen-


tioned networks for all the stations involved become parts of a complete
network. These parts are linked by additional arcs from flight nodes at their
departure stations to aircraft nodes corresponding to the assigned aircraft
on the schedule at the arrival stations. Cancellation costs are put onto these
arcs. Because delays due to upstream delays are not captured in this model,
the backward arcs considered in this model do not represent real delays.
In fact, they always go from flights to aircraft that are ready before their
scheduled takeoff times. In a path of unit flow, the backward arcs still rep-
resent reassignments, or aircraft swaps, while the additional arcs represent
flight cancellations. Again, a minimum-cost network flow algorithm serves
the purpose of making the optimal decision. The models were tested on
real data from United Airlines. They generated effective solutions fast and
are therefore suitable for real-time implementation. The values and impact
generated from the system and some of its implementation issues were re-
ported in Rakshit et al. [114]. The drawback of the models is that they only
consider limited alternatives for tackling the aircraft shortage problem. Real
delays and cancellations are not considered simultaneously and the rippling
effects of delays are not studied.

Arguello et al. [8] proposed models that deal with the problem more
thoroughly and give approximate solution techniques. The authors pro-
posed two exact models for the problem. The first one assumes that all
the feasible flight paths that can be flown by one aircraft continuously are
generated, and the cost of assigning every aircraft to every feasible flight
path is calculated. Also known is each flight's cancellation cost. Feasibility
means that a flight path observes the constraints of minimum turnaround


time, station curfew requirements, and path continuity requirements. The
model finds the least cost assignment of every aircraft to a feasible path
so that each flight is either covered by exactly one flight path or canceled,
and the aircraft balance is preserved. The number of feasible flight paths
is of exponential order; thus, the model is intractable. The authors did not
attempt to solve it. The second model considers aircraft and cancellations
as two types of commodities that flow in a network. Every flight's departure
or arrival corresponds to a node in the network. Departure nodes point arcs
to their corresponding arrival nodes. For each station, there is an aircraft
source node whose supply is the number of aircraft available at the loca-
tion, a cancellation source node, and a station sink node whose demand is
the number of required overnight aircraft. All the source nodes and arrival
nodes at one station point arcs to the station's sink and departure nodes.
The model makes the approximation that delay costs are stored on arcs that
enter departure nodes and are accrued for an aircraft's assignment. The can-
cellation cost for a flight is allocated to the arc representing that flight. The
problem becomes a two-commodity minimum-cost binary flow problem with
homologous arcs. The homologous arcs are the flight arcs that must have
flows of one. This is an NP-hard problem as well.

The third model Arguello et al. proposed is an approximate one based


on the time-space network. The time horizon is discretized to have a fi-
nite number of bands. For every station there are a number of station-time
nodes, each in the time band when there can be arrival or departure events.
For every flight departing from a station, there is an arc going from each
station-time node that belongs to that station to the station-time node that
corresponds to the flight's arrival time and station, if the flight implied by
the arc satisfies the curfew requirement. If it does not, the arc goes to the station-sink node
designated for the departing station. So for each flight, there are a variety
of arcs that represent it, and each is executed at a different time and has
a different delay cost. The problem can again be thought of as a minimum
cost network flow problem with side constraints for flight coverage. We make
the following definitions before we present the formulation:

Indices:
$i, j$ = node indices
$k$ = flight index

Sets:
$F$ = set of flights
$G(i)$ = set of flights originating at station-time node $i$
$H(k, i)$ = set of destination station-time nodes for flight $k$
that originates from station-time node $i$
$I$ = set of station-time nodes
$J$ = set of station-sink nodes
$L(i)$ = set of flights terminating at node $i$
$M(k, i)$ = set of origination station-time nodes for flight $k$
terminating at node $i$
$P(k)$ = set of station-time nodes at the station from which
flight $k$ originates
$Q(i)$ = set of station-time nodes at the station that contains
station-sink node $i$

Parameters:
$a_i$ = number of aircraft that become available at
station-time node $i$
$c_k$ = cost of canceling flight $k$
$d^k_{ij}$ = delay cost of flight $k$ from station-time node
$i$ to station-time node $j$
$h_i$ = number of aircraft required to terminate at
station-sink node $i$

Variables:
$x^k_{ij}$ = amount of aircraft flow for flight $k$ from
station-time node $i$ to node $j$
$y_k$ = cancellation indicator for flight $k$
$z_i$ = amount of aircraft flow from station-time node $i$ to the
station-sink node at the same station

With the above definitions, the formulation is written as:

\[
\min \;\; \sum_{k \in F} \sum_{i \in P(k)} \sum_{j \in H(k,i)} d^k_{ij} x^k_{ij} \;+\; \sum_{k \in F} c_k y_k
\]

subject to

\[
\sum_{i \in P(k)} \sum_{j \in H(k,i)} x^k_{ij} + y_k = 1 \qquad \forall k \in F
\]

\[
\sum_{k \in G(i)} \sum_{j \in H(k,i)} x^k_{ij} + z_i \;=\; \sum_{k \in L(i)} \sum_{j \in M(k,i)} x^k_{ji} + a_i \qquad \forall i \in I
\]

\[
\sum_{k \in L(i)} \sum_{j \in M(k,i)} x^k_{ji} + \sum_{j \in Q(i)} z_j = h_i \qquad \forall i \in J
\]

\[
x^k_{ij} \in \{0, 1\} \qquad \forall k \in F,\; i \in I,\; j \in H(k,i)
\]

\[
y_k \in \{0, 1\} \qquad \forall k \in F
\]

\[
z_i \in \mathbb{Z}_+ = \{0, 1, 2, \dots\} \qquad \forall i \in I.
\]

In the model, the first set of constraints specifies that a flight is either
carried out at a certain time or canceled. The second set balances inflows and
outflows of aircraft at all station-time nodes. The third set enforces aircraft
balance at the end of day at all the stations. The rest of the constraints
define the decision variables' domains.
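
A direct rendering of this formulation on a tiny hypothetical instance, here
written with the PuLP modeling library (our choice, not the authors'), looks
as follows:

```python
# A minimal PuLP rendering of the time-space formulation above, with tiny
# hypothetical data.  P, H, G, L, M, Q stand in for the sets defined earlier;
# a, c, d, h are the parameters.

import pulp

F = ["k1"]                                   # flights
I = ["i1", "i2"]                             # station-time nodes
J = ["j1"]                                   # station-sink nodes
P = {"k1": ["i1"]}                           # P(k)
H = {("k1", "i1"): ["i2"]}                   # H(k, i)
G = {"i1": ["k1"], "i2": []}                 # G(i)
L = {"i1": [], "i2": ["k1"]}                 # L(i)
M = {("k1", "i2"): ["i1"]}                   # M(k, i)
Q = {"j1": ["i2"]}                           # Q(i)
a = {"i1": 1, "i2": 0}                       # aircraft becoming available
c = {"k1": 100}                              # cancellation costs
d = {("k1", "i1", "i2"): 10}                 # delay costs
h = {"j1": 1}                                # required overnight aircraft

prob = pulp.LpProblem("irregular_ops", pulp.LpMinimize)
x = {(k, i, j): pulp.LpVariable(f"x_{k}_{i}_{j}", cat="Binary")
     for k in F for i in P[k] for j in H[(k, i)]}
y = {k: pulp.LpVariable(f"y_{k}", cat="Binary") for k in F}
z = {i: pulp.LpVariable(f"z_{i}", lowBound=0, cat="Integer") for i in I}

prob += (pulp.lpSum(d[k, i, j] * x[k, i, j] for (k, i, j) in x)
         + pulp.lpSum(c[k] * y[k] for k in F))
for k in F:                                  # cover or cancel each flight
    prob += pulp.lpSum(x[k, i, j] for i in P[k] for j in H[(k, i)]) + y[k] == 1
for i in I:                                  # balance at station-time nodes
    prob += (pulp.lpSum(x[k, i, j] for k in G[i] for j in H[(k, i)]) + z[i]
             == pulp.lpSum(x[k, j, i] for k in L[i] for j in M[(k, i)]) + a[i])
for i in J:                                  # end-of-day aircraft balance
    prob += (pulp.lpSum(x[k, j, i] for k in L.get(i, []) for j in M.get((k, i), []))
             + pulp.lpSum(z[j] for j in Q[i]) == h[i])

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```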

Due to the discretization, an aircraft is assumed to be able to service a
flight at the starting time of the band in which it becomes available, so $d^k_{ij}$
underestimates the real delay. Therefore the solution of the model is a lower
bound of the actual optimal solution. The linear relaxation of the above
model will give a lower bound to the model itself, thus a lower bound to the
optimal value.

Arguello et al. [9] adopted a greedy randomized adaptive search proce-


dure (GRASP) to get a feasible suboptimal solution to the problem. The
scheme of GRASP is: Starting with an incumbent solution, study all its
neighboring solutions and store the ones considered good compared to the
incumbent on the restricted candidate list. In the next iteration, randomly
pick one solution from the restricted candidate list as the new incumbent.
The procedure continues until the stopping criterion is met. In the GRASP
for this problem, the initial incumbent solution is simply the one that can-
cels all the subsequent flights of the aircraft that is causing the shortages.
All the neighbors are found by doing certain operations to all the pairs
of two flight paths and flight-cancellation paths in the incumbent solution.
The operations done on a pair append a part of one path to another or
exchange portions of paths between each other, so that the aircraft balance
is preserved. Delay cost can be estimated on each path, and curfew time vi-
olations can be found. Finding all the neighboring solutions in one iteration
needs polynomial time. The procedure does not involve spare aircraft when
starting from the default initial incumbent solution.
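
Stripped of the problem-specific moves, the GRASP loop itself is short; in
the following sketch, `neighbors` and `cost` are hypothetical callbacks that
would encode the path append/exchange operations and the delay,
cancellation, and curfew costs.

```python
# A bare GRASP skeleton matching the description above.  The callbacks
# are hypothetical stand-ins for the aircraft-routing moves of [9].

import random

def grasp(initial, neighbors, cost, alpha=0.2, max_iters=100, seed=0):
    rng = random.Random(seed)
    incumbent, best = initial, initial
    for _ in range(max_iters):
        candidates = sorted(neighbors(incumbent), key=cost)
        if not candidates:
            break
        # Restricted candidate list: the best alpha-fraction of the moves.
        rcl = candidates[:max(1, int(alpha * len(candidates)))]
        incumbent = rng.choice(rcl)          # random pick from the RCL
        if cost(incumbent) < cost(best):
            best = incumbent
    return best

# Toy usage: minimize (v - 7)^2 over the integers by +/-1 moves.
print(grasp(0, lambda v: [v - 1, v + 1], lambda v: (v - 7) ** 2))  # -> 7
```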

Wei et al. [157] [132] addressed the real-time crew recovery problem. The
crew schedule disruption problem is often caused by delays, cancellations, di-
versions, crew sickness, missed connections, and/or legality conditions. The
alternatives for resolving these problems include swapping crews, using crew
reserves, deadheading crews, and any combination of these. Crew legality,
seniority, qualification, pay protection, and returning to base after service
are the main issues that need to be taken into account. A network model is
constructed for the crew pairing repair problem. The main components of
the network include:

• Crew nodes: The crew nodes represent either arriving crew or crew
originating from the problem station. They are placed at the time of
availability.
• Flight nodes: The flight nodes represent departure flights, and they
are placed at the scheduled time of departure.

• Reserve nodes: The reserve nodes represent the availability of crew


reserves, and they are placed at their available station and time.

• Return nodes: These nodes are used to force the crews to return to
their original schedule after the recovery time.
• Scheduled arcs: These arcs emanate from crew nodes to their originally
scheduled flight nodes to represent the original schedule.
• Swap arcs: These arcs emanate from crew nodes to flight nodes which
are not their originally assigned flight and whose departure time is
later than the crew availability time.

• Flight arcs: These arcs represent the flight from one airport to another.
They originate from flight nodes at departure airports and end at the
corresponding crew nodes at their destination airports.

• Reserve arcs: These arcs emanate from reserve nodes to those flight
nodes at the same airport which can be served by the reserve crew.
• Return arcs: These arcs emanate from crew nodes at the airport where
their corresponding return nodes are placed to their return nodes.

Costs are assigned to the arcs to account for crew swap, deadhead, use
of reserves, etc. Based on the network, a multi-commodity integer network
flow model and a heuristic search algorithm were presented and discussed.
Their computational example showed that the model and algorithm are ef-
fective in solving the crew recovery problem.

A quadratic 0-1 programming model simultaneously dealing with flight


delays and cancellations during irregular operations was given by Cao and
Kanafani [29][30]. A solution algorithm was provided and tested to be effec-
tive and efficient enough to carry out real-time missions. Yan and Yang [160]
formulated the schedule perturbation problem caused by aircraft shortage as
network flow problems and network flow problems with side constraints, while
Yan and Lin [159] formulated the schedule perturbation problem caused by
temporary airport closure as network flow problems and network flow prob-
lems with side constraints. In both applications, the network flow problems
are solved by network simplex methods and the network flow problems with
side constraints are solved by applying subgradient methods on the prob-
lems' Lagrangian relaxations. To fit the fleet assignment into the fluctuating
passenger demand pattern, Klincewicz and Rosenwein [75] also formulated
and solved a network flow problem. They treated a daily schedule varying
from day to day as the combination of a repetitive "skeleton" schedule and
daily changes.

Mathaisel [95] reported on the development of the Airline Schedule Con-


trol (ASC) system, which provides a systematic interaction between the sys-
tem and humans, centralizes the database across all functions of the air-
line, and distributes the decision support and optimization to the respective
groups. The system was built on a network of workstations. One of them
works as a server, and all the others work as clients. The common database
resides on the server, and all the schedule retrievals and updates are through
the server. The ASC display on each client workstation is controlled by a
local copy. Human controllers interact with the client workstations to ac-
complish the various decision making processes. The optimization engine in-
tegrated into the ASC environment for irregular operations is a space-time
network-based model and the out-of-kilter network flow algorithm. The
space-time network consists of nodes representing the arrival and depar-
ture times and the station of flights, arcs linking the origin and destination
nodes of flights, ground arcs that link all the nodes in one station, and
overnight (for a longer schedule cycle, it could be over weekend) arcs that
for each station link the last node in the time horizon to the first node.
When perturbation occurs, alternative options are evaluated, and the most
cost-effective action is chosen by adding arcs and changing flow bounds that
reflect delaying or canceling of flights, designating to them the estimated cost
of carrying them out, and solving the minimum cost network flow problem
by the out-of-kilter method. The determination of the arc costs depends on
many factors, including flight duration, amount of delaying, the number of
passengers on board, etc. The ASC system provides a framework on which
continuing independent expansions and improvements on sub-problem solv-
ing for dealing with various operational problems including irregular opera-
tions are possible without inconveniencing human controllers with disparate
and ever-changing interfaces.

In [162], Yu addressed various issues encountered in implementing real-


time, mission-critical decision support systems for aircraft and crew recovery.
The following issues were discussed:

• Solution time is important due to the real-time nature of the irregular


operations control problem. The complexity of the underlying opti-
mization problem demands simplification of the problem and effective
heuristics.

• Keeping the optimization model always up-to-the-minute is a chal-


lenging task. The real-time data needs to be received and processed
on-line. This issue was resolved by keeping the model in memory and
using a messaging system to update the model in real-time.

• Coping with crew legalities requires embedding some basic legality


checking in the optimization engine and complete checking when so-
lutions are constructed in the search engine.

• In a multiple user environment, who is responsible for which resource


usage is a practical issue. In such situations, a resource locking mech-
anism needs to be provided so that the resources engaged by one user
are not used by a different user, thus avoiding conflict.

• Multiple solutions are desirable to support the decision makers' choice.


This is particularly important since there are many soft constraints
that are difficult to incorporate into the optimization engine.

• Partial solutions are also desirable in order to resolve immediate prob-


lems and postpone unavoidable resource shortages to a later stage.
• What-if capabilities are a must for successful implementation of such
systems. These enable users to solve immediate as well as anticipated
problems.
• Since a real-time control system needs to interface with many existing
systems, the network communication and database integrity problem
should not be overlooked.
• To facilitate users' acceptance, a user-friendly graphical interface is of
great importance.

8 Concluding Remarks
In this paper, we intended to cover some major lines of optimization research
and applications in the airline industry. Due to the technology-driven na-
ture of the airline industry, optimization can have a great impact on almost
every part of the operational process ranging from planning, routing, and
scheduling to real-time control of all resources involved. Our task is by no
means complete; many topics remain. To name but a few, they in-
clude maintenance scheduling and routing, manpower planning, crew train-
ing scheduling, gate assignment, aircraft load balance, baggage routing and
tracking, airport facility management, aircraft procurement, aircraft parts
inventory management, and food service and supply management. Some of
these can be found in Yu [164] and in Yu [163].

Currently, most airlines store data, make plans, generate schedules, man-
age resources, and control operations using fragmented systems. These sys-
tems do not communicate seamlessly with each other. Complete real-time
data is lacking. Multiple data instances reside in different systems, which
leads to pitfalls in data integrity and synchronization. Graphical user inter-
faces do not have the same look and feel across systems. Many managers
still rely on computer printouts and manually-generated charts to make their
planning and daily operational decisions.

More importantly, even though some companies deploy isolated decision


support systems, these systems generate sub-optimal, localized, and uncoor-
dinated solutions. For example, the system which generates aircraft routes
does not take into account the difficulty in compiling corresponding crew
pairings; the system which generates flight schedules disregards the com-
plexity of its consequent aircraft routes; the system for manpower planning
decisions does not consider subsequent training scheduling and training re-
source usage; the systems for planning decisions do not offer robust solutions
that can be effectively recovered and remedied in the case of schedule dis-
ruptions during plan execution; etc. The lack of integration among decision
support systems leads to inferior solutions in terms of enterprise-wide cost
and responsiveness to change. We would like to see more research, develop-
ment, and implementation effort on integrated decision support systems.

Reaching and staying at the leading edge in the competitive air trans-
portation market, managing operations efficiently, and responding to cus-
tomer needs effectively are among the challenges facing top management at
every commercial airline. The key to meeting these challenges is the success-
ful deployment of sophisticated and integrated, optimization-based decision
support systems utilizing state-of-the-art computer and optimization tech-
nology. As successfully implemented decision support systems at several
major airlines start to demonstrate their tremendous value and impact, we
anticipate an overwhelming acceptance of the optimization concept and a
prosperous future for optimization applications in the airline industry.

References
[1] Abara, J., "Applying Integer Linear Programming to the Fleet As-
signment Problem," Interfaces, 19:4, 1989, pp. 20-28.

[2] Alstrup, J., Boas, S., Madsen, O.B.G., and Vidal, R.V.V., "Booking
Policy for Flights with Two Types of Passengers," European Journal
of Operational Research, 27, 1986, pp. 274-288.

[3] Anbil, R., Gelman, E., Patty, B., and Tanga, R., "Recent Advances
in Crew-Pairing Optimization at American Airlines," Interfaces, 21:1,
1991, pp. 62-74.

[4] Anbil, R., Tanga, R., and Johnson, E.L., "A Global Approach to
Crew-Pairing Optimization," IBM Systems Journal, 31:1, 1992, pp.
71-78.

[5] Andreatta, G. and Romanin-Jacur, G., "Aircraft Flow Management


under Congestion," Transportation Science, 21:4, 1987, pp. 249-253.

[6] Andreussi, A., Bianco, L., and Ricciardelli, S., "A Simulation Model
for Aircraft Sequencing in the Near Terminal Area," European Journal
of Operational Research, 8, 1981, pp. 345-354.

[7] Arabeyre, J.P., Fearnley, J., Steiger, F.C., and Teather, W., "The
Airline Crew Scheduling Problem: A Survey," Transportation Science,
3, 1969, pp. 140-163.

[8] Arguello, M.F., Bard, J.F., and Yu, G., "Models and Methods for
Managing Airline Irregular Operations Aircraft Routing," in G. Yu
eds. Operations Research in the Airline Industry, 1997, pp. 1-45.

[9] Arguello, M.F., Bard, J.F., and Yu, G., "Bounding Procedures and
a GRASP Heuristic for the Aircraft Routing Problem," Journal of
Combinatorial Optimization, 1, 3, 1997, pp. 211-228.

[10] Aykin, T., "On the Location of Hub Facilities," Transportation Sci-
ence, 22:2, 1988, pp. 155-157.

[11] Aykin, T., "Networking Policies for the Hub-and-Spoke Systems with
Application to the Air Transportation System," Transportation Sci-
ence, 29, 1995, pp. 201-221.

[12] Balas, E. and Ho, A., "Set Covering Algorithms Using Cutting Planes,
Heuristics and Subgradient Optimization: A Computational Study,"
Mathematical Program Study, 12, 1980, pp. 37-60.

[13] Balas, E. and Padberg, M.W., "Set Partitioning: A Survey," SIAM


Review, 18:4, 1976, pp. 710-760.

[14] Bailey, E.E., Graham, D.R., and Kaplan, P.D., "Deregulating the Air-
lines," The MIT Press, Cambridge, MA, 1985.

[15] Ball, M., Bodin, L., and Dial, R., "A Matching Based Heuristic for
Scheduling Mass Transit Crews and Vehicles," Transportation Science,
17, 1983, pp. 4-31.

[16] Ball, M. and Roberts, A., "A Graph Partitioning Approach to Airline
Crew Scheduling," Transportation Science, 19:2, 1985, pp. 107-126.

[17] Bauer, P.W., "Airline Hubs: A Study of Determining Factors and


Effects," Economic Review: Federal Reserve Bank of Cleveland, Forth
Quarter, 1987, pp. 13·19.
[18] Beckmann, M.J., "Decision and Team Problems in Airline Reserva-
tions," Econometrica, 26, 1958, pp. 134-145.
[19] Belobaba, P.P., "Airline Yield Management An Overview of Seat In-
ventory Control," Transportation Science, 21:2, 1987, pp.63-73.
[20] Belobaba, P.P., "Application of A Probabilistic Decision Model to
Airline Seat Management Control," Operations Research, 37, 1989,
pp. 183-197.
[21] Benchakroun, A., Ferland, J.A., and Cleeroux, R., "Distribution Sys-
tem Planning Through a Generalized Benders Decomposition Ap-
proach," European Journal of Operational Research, 62, 1991, pp. 149-
162.
[22] Benders, J.F., "Partitioning procedure for solving mixed-variables pro-
gramming problem," Rand Symp. Math. Programming, March 16-20,
1959, Santa Monica, California.
[23] Berge, M.A. and Hopperstad, C.A., "Demand Driven Dispatch: A
Method for Dynamic Aircraft Capacity Assignment, Models and Al-
gorithms," Operations Research, 41, 1993, pp. 153-168.
[24] Bixby, R.E., "Implementing the Simplex Methods, Part I, Introduc-
tion; Part II, The Initial Basis," TR 90-32 Mathematical Science, Rice
University, Houston, TX, 1990.
[25] Blumstein, A., "An Analytical Investigation of Airport Capacity,"
Cornell Aeronautical Laboratory, Report TA-1358-6-1, June 1960.
[26] Brown, J.H., "An Economic Model of Airline Hubbing-and-Spoking,"
Logistics and Transportation Review, 27:3, 1991, pp. 225-239.
[27] Brumelle, S.L., McGill, J.I., Oum, T.H., Sawaki, K., and Tretheway,
M.W., "Allocation of Airline Seats between Stochastically Dependent
Demands," Transportation Science, 24:3, 1990, pp. 183-192.
[28] Camerini, P.K., Fratta, L., and Maffioli, F., "On Improving Relaxation
Methods by Modified Gradient Techniques," Mathematical Program-
ming Studies, 3, 1975, pp. 6-25.

[29] Cao, J. and Kanafani, A., "Real-Time Decision Support for Integra-
tion of Airline Flight Cancellations and Delays Part I: Mathematical
Formulation," Transportation Planning and Technology, 20, 1997, pp.
183-199.

[30] Cao, J. and Kanafani, A., "Real-Time Decision Support for Integra-
tion of Airline Flight Cancellations and Delays Part II: Algorithm and
Computational Experiments," Transportation Planning and Technol-
ogy, 20, 1997, pp. 201-217.

[31] Christofides, N. and Korman, S., "A Computational Survey of Meth-


ods for the Set Covering Problem," Management Science, 21:5, 1975,
pp. 591-599.

[32] Conley, W., Computer Optimization Techniques, New York, Petrocelli


Books, 1980.

[33] Cornuejols, G., Fisher, M., and Nemhauser, G., "Location of Bank
Accounts to Optimize Float: An Analytic Study of Exact and Approx-
imate Algorithms," Management Science, 23:6, 1977, pp. 789-810.

[34] Crowder, H., Johnson, E.L., and Padberg, M.W., "Solving Large Scale
Zero-one Linear Programming Problems," Operations Research, 31,
1983, pp. 803-834.

[35] Curry, R.E., "Optimal Airline Seat Allocation with Fare Classes
Nested by Origins and Destinations," Transportation Science, 24:3,
1990, pp. 193-204.
[36] Dantzig, G.B., Linear Programming and Extension, Princeton Univer-
sity Press, Princeton, 1963.
[37] Daskin, M.S. and Panayotopoulos, N.D., "A Lagrangian Relaxation
Approach to Assigning Aircraft to Routes in Hub and Spoke Net-
works," Transportation Science, 23:2, 1989, pp. 91-99.

[38] Desaulniers, G., Desrosiers, J., Dumas, Y., Solomon, M.M., and
Soumis, F., "Daily Aircraft Routing and Scheduling," Management
Science, 43:6, 1997, pp. 841-855.

[39] Druckerman, J., Silverman, D., and Viaropulos, K., "IBM Optimiza-
tion Subroutine Library, Guide and Reference, Release 2," Document
Number SC230519-02, IBM, Kinston, New York, 1991.

[40] Du, D.-Z. and Hwang, F.K., Combinatorial Group Testing and Its
Applications, World Scientific Corp., Inc., 1993.

[41] Dunlay, J.W., "Analytical Models of Perceived Air Traffic Control


Conflicts," Transportation Science, 9, 1987, pp. 149-164.

[42] Elce, I., "The Development and Implementation of Air Canada's Long
Range Planning Model," AGIFORS Symposium Proceedings, 10, 1970.

[43] Etschmaier, M.M., "Schedule Construction and Evaluation for Short


and Medium Range Corporate Planning," AGIFORS Symposium Pro-
ceedings, 10, 1970.
[44] Etschmaier, M.M., "A Survey of the Scheduling Methodology Used
in Air Transportation," in R. Genser, M. Strobel, and M.M.
Etschmaier (eds.) Optimization Applied to Transportation Systems,
Vienna, IIASA, 1977.

[45] Etschmaier, M.M. and Mathaisel, O.F.X., "Airline Scheduling: An


Overview," Transportation Science, 19:2, 1985, pp. 127-138.

[46] Etschmaier, M.M. and Rothstein, M., "Operations Research in the


Management of the Airlines," OMEGA 2, 1974, pp. 157-179.

[47] Fisher, M.L., "The Lagrangian Relaxation Method for Solving Integer
Programming Problems," Management Science, 27, 1981, pp. 1-18.

[48] Florian, M., Guerin, G.G., and Bushel, G., "The Engine Scheduling
Problem on a Railway Network," INFOR, 14, 1976, pp. 121-128.

[49] Gagnon, G., "A Model for Flowing Passengers Over Airline Networks,"
AGIFORS Symposium Proceedings, 7, 1967.

[50] Garfinkel, R.S. and Nemhauser, G.L., "The Set Partitioning Problem:
Set Covering with Equality Constraints," Operations Research, 17,
1969, pp. 848-856.

[51] Fitzpatrick, G.L. and Modlin, M.J., "Direct-Line Distances, United


States Edition," The Scarecrow Press, 1986.
[52] Gasco, J.L., "Reservations and Booking Control," AGIFORS Sympo-
sium Proceedings, 17, 1977.

[53] Geisinger, K.E., "Airspace Conflict Equations," Transportation Sci-


ence, 19:2, 1985, pp. 139-153.

[54] Geoffrion, A.M. and Graves, G.W., "Multicommodity Distribution


System Design by Benders Decomposition," Management Science,
20:5, 1974, pp. 822-844.

[55] Gershkoff, I., "Optimizing Flight Crew Schedules," Interfaces, 19:4,


1989, pp. 29-43.

[56] Ghobrial, A. and Kanafani, A., "Airline Hubbing: Some Implications


for Airport Economics," Transportation Research, 19A, 1985, pp. 15-
27.

[57] Glover, F., Glover, R., Lorenzo, J., and McMillan, C., "The Passenger-
Mix Problem in the Scheduled Airlines," Interfaces, 12, 1982, pp.
73-79.

[58] Gomory, R., "An Algorithm for Integer Solution to Linear Programs,"


in G. Graves and P. Wolfe (eds.) Recent Advances in Mathematical
Programming, New York, McGraw-Hill, 1963.

[59] Graves, G., McBride, R., Gershkoff, I., Anderson, D., and Mahidhara,


D., "Flight Crew Scheduling," Management Science, 39:6, 1993, pp.
736-745.

[60] Hane, C.A., Barnhart, C., Johnson, E.L., Marsten, R.E., Nemhauser,
G.L., and Sigismondi, G., "The Fleet Assignment Problem: Solving a
Large-Scale Integer Program," Mathematical Programming, 70, 1995,
pp. 211-232.

[61] Hansen, M., "A Model of Airline Hub Competition," University of


California, Berkeley, Institute of Transportation Studies, Dissertation
Series 88-2, Berkeley CA.

[62] Hansen, M. and Kanafani, A., "International Airline Hubbing in a


Competitive Environment," Transportation Planning and Technology,
13, 1988, pp. 3-18.

[63] Hansen, M. and Kanafani, A., "Airline Hubbing and Airport Eco-
nomics in the Pacific Market," Transportation Research, 24A:3, 1990,
pp. 217-239.

[64] Helme, M.P., "A Selective Multicommodity Network Flow Algorithm


for Air Traffic Control," in G. Yu eds. Operations Research in the
Airline Industry, 1997, pp. 101-121.

[65] Hersh, M. and Ladany, S.P., "Optimal Seat Allocation for Flights with
One Intermediate Stop," Computers and Operations Research, 5, 1978,
pp. 31-37.

[66] Hoang, H.H., "Topological Optimization of Networks: A Nonlinear


Mixed Integer Model Employing Generalized Benders Decomposi-
tion," IEEE Trans. Automatic Control, AC-27, 1982, pp. 164-169.

[67] Hockaday, S.L.M. and Kanafani, A.K., "Developments in Airport Ca-


pacity Analysis," Transportation Research, 6, 1974, pp. 171-180.

[68] Hoffman, K.L. and Padberg, M.W., "Solving Airline Crew Scheduling
Problems by Branch-and-Cut," Management Science, 39:6, 1993, pp.
657-682.

[69] Holloway, C., "A Generalized Approach to Dantzig-Wolfe Decomposi-


tion for Concave Programs," Operations Research, 21, 1973, pp. 210-
220.

[70] Jaillet, P., Song, G., and Yu, G., "Airline Network Design and Hub
Location Problems," Location Science, 4, 3, 1997, pp. 195-212.

[71] Jaillet, P., Song, G., and Yu, G., "Networking Design Problems with
Applications to the Airline Industry," Proceedings for TRISTAN II,
Capri, Italy, 1994.

[72] Janic, M. and Tosie, V., "En Route Sector Capacity Model," Trans-
portation Science, 25:4, 1991, pp. 299-307.

[73] Jarrah, A.I.Z., Yu, G., Krishnamurthy, N., and Rakshit, A., "A Deci-
sion Support Framework for Airline Flight Cancellations and Delays,"
Transportation Science, 27:3, 1993, pp. 266-280.

[74] Kanafani, A., "Transportation Demand Analysis," McGraw-Hill, New


York, 1983, pp. 256-258.

[75] Klincewicz, J.G. and Rosenwein, M.B., "The Airline Exception


Scheduling Problem," Transportation Science, 29:1, 1995, pp. 4-16.

[76] Kushige, T., "A Solution of Most Profitable Aircraft Routing," AGI-
FORS Symposium Proceedings, 3, 1963.

[77] Labombarda, P. and Nicoletti, B., "Aircraft Rotations by Computer,"


AGIFORS Symposium Proceedings, 11, 1971.

[78] Ladany, S.P. and Bedi, D.N., "Dynamic Booking Rules for Flights
with an Intermediate Stop," OMEGA, 5:6, 1977, pp. 721-730.

[79] Ladany, S.P. and Hersh, M., "Non-Stop vs. One-Stop Flights," Trans-
portation Research, 11:3, 1977, pp. 155-159.

[80] Land, A.H. and Doig, A., "An Automatic Method of Solving Discrete
Programming Problems," Econometrica, 28, 1960, pp. 497-520.

[81] Lavoie, S., Minoux, M., and Odier, E., "A New Approach for Crew
Pairing Problems by Column Generation with an Application to Air
Transportation," European Journal of Operations Research, 35, 1988,
pp.45-58.

[82] Lawler, E.L., Lenstra, J.K., and Rinnooy Kan, A.H.G., "Recent
Developments in Deterministic Sequencing and Scheduling: A Sur-
vey," in M.A.H. Dempster, J.K. Lenstra, and A.H.G. Rinnooy Kan
(eds.) Deterministic and Stochastic Scheduling, The Netherlands, Rei-
del/Kluwer Dordrecht, 1982.

[83] Levin, A., "Scheduling and Fleet Routing Models for Transportation
Systems," Transportation Systems, 5, 1971, pp. 232-255.

[84] Littlewood, K., "Forecasting and Control of Passenger Bookings," AG-


IFORS Symposium Proceedings, 12, 1972, pp. 95-117.

[85] "Yield Managers Now Control Tactical Marketing," Lloyd's Aviation


Economist, 12-13, 1985.

[86] Loughran, B.P., "An Airline Schedule Construction Model," AGI-


FORS Symposium Proceedings, 12, 1972.

[87] Luo, S. and Yu, G., "Airline Schedule Perturbation Problem: Land-
ing and Takeoff With Nonsplittable Resource for the Ground Delay
Program," in G. Yu eds. Operations Research in the Airline Industry,
1997, pp. 404-431.

[88] Luo, S. and Yu, G., "On the Airline Schedule Perturbation Problem
Caused by the Ground Delay Program," Transportation Science, 31,
4, 1997, pp. 298-311.

[89] Lustig, I.J., Marsten, R.E., and Shanno, D.F., "Computational Expe-
rience with a Primal-Dual Interior Point Method for Linear Program-
ming," Linear Algebra and Its Applications, 152, 1991, pp. 191-222.

[90] Magnanti, T.L., Mireault, P., and Wong, R.T., "Tailoring Benders De-
composition for Uncapacitated Network Design," Mathematical Pro-
gramming Study, 26, 1986, pp. 112-154.

[91] Magnanti, T.L. and Wong, R.T., "Accelerating Benders Decompo-


sition: Algorithmic Enhancement and Model Selection," Operations
Research, 29, 1981, pp. 465-484.

[92] Marketing News, June 20, 1986.

[93] Marsten, RE., "An Algorithm for Large Set Partitioning Problems,"
Management Science, 20:5, 1974, pp. 774-787.

[94] Marsten, R.E. and Shepardson, F., "Exact Solution of Crew Problems
using the Set Partitioning Mode: Recent Successful Applications,"
Networks, 11, 1981, pp. 165-177.
[95] Mathaisel D.F.X., "Decision Support for Airline System Operations
Control and Irregular Operations," Computers and Operations Re-
search, 23:11, 1996, pp. 1083-1098.
[96] Mayer, M., "Seat Allocation, or a Simple Model of Seat Allocation via
Sophisticated Ones," AGIFORS Symposium Proceedings, 16, 1976.

[97] McShan, S. and Windle, R., "The Implication of Hub-and-Spoke Rout-


ing for Airline Costs and Competitiveness," Logistics and Transporta-
tion Review, 25:3, 1989, pp. 209-229.

[98] Mevert, P., "Fixed Charge Network Flow Problems: Applications and
Methods of Solution," Presented at Large Scale and Hierarchical Sys-
tems Workshop, May 1977, Brussels.
[99] Miller, R., "An Optimization Model for Transportation Planning,"
Transportation Research, 1, 1967, pp. 271-286.

[100] Mirchandani, P., "Polyhedral Structure of the Capacitated Network


Design Problem with an Application to the Telecommunication In-
dustry," Unpublished Ph.D dissertation, Massachusetts Institute of
Technology, Cambridge, MA, 1989.

[101] Moore, J.M., "An n Job, One Machine Sequencing Algorithm for Min-
imizing the Number of Late Jobs," Management Science, 15:1, 1968,
pp. 102-109.

[102] Morrison, S. and Winston, C., "The Economic Effects of Airline Dereg-
ulation," Washington, D.C., Brookings Institution, 1986.

[103] Nakayama, H. and Sawaragi, Y., "Satisfying Tradeoff Method for Mul-
tiobjective Programming," in M. Grauer and A.P. Wierzbicki (eds.)
Interactive Decision Analysis, (Proceedings, Laxenburg, Austria),
Berlin, Springer-Verlag, 1984, pp. 113-122.

[104] Nemhauser, G.L. and Widhelm, W.B., "A Modified Linear Program


for Columnar Methods in Mathematical Programming," Operations
Research, 19, 1971, pp. 1051-1060.

[105] Newell, G.F., "Airport Capacity and Delays," Transportation Science,


13:3, 1979, pp. 201-241.

[106] Odoni, A.R., "The Flow Management Problem in Air Traffic Con-
trol," in A.R. Odoni, L. Bianco, and G. Szego (eds.) Flow Control of
Congested Networks, Berlin, Springer-Verlag, 1987.
[107] O'Kelly M.E., "The Location of Interacting Hub Facilities," Trans-
portation Science, 20:2, 1986, pp. 92-106.

[108] O'Kelly M.E., "A Quadratic Integer Program for the Location of In-
teracting Hub Facilities," European Journal of Operational Research,
32, 1987, pp. 393-404.

[109] Orchard-Hays, W., "Advanced Linear Programming Computing Tech-


niques," McGraw-Hill, New York, 1986.

[110] Ostresh, L., Rushton, G. and Goodchild, M.F., "TWAIN-Exact So-


lution to the Two Source Location-Allocation Problem," Computer
Programs for Location-Allocation Problem, University of Iowa, Mono-
graph 6, 1973.

[111] Padberg, M.W., "On the Facial Structure of Set Packing Polyhedra,"
Mathematical Programming, 5, 1973, pp. 199-215.

[112] Padberg, M.W. and Rinaldi, G., "A Branch-and-Cut Algorithm for the
Solution of Large-scale Traveling Salesman Problems," SIAM Review,
33, 1991, pp. 60-100.

[113] Powell, W.B., "Analysis of Airline Operating Strategies under
Stochastic Demand," Transportation Research, 16B, 1982, pp. 31-43.

[114] Rakshit, A., Krishnamurthy, N., and Yu, G., "A Real Time Decision
Support System for Managing Airline Operations at United Airlines,"
Interfaces, 26, 2, 1996, pp. 50-58.

[115] Reynolds-Feighan, A.J., "The Effects of Deregulation on U.S. Air Net-


works," Berlin, Heidelberg, Springer-Verlag, 1992.

[116] Richardson, R., "An Optimization Approach to Routing Aircraft,"


Transportation Research, 13B, 1979, pp. 49-63.

[117] Richardson, R., "An Optimization Approach to Routing Aircraft,"


Transportation Science, 10, 1976, pp. 52-71.

[118] Richetta, O., "Ground Holding Strategies for Air Traffic Control un-
der Uncertainty," Ph.D. thesis, Massachusetts Institute of Technology,
June 1991.

[119] Richter, R.J., "Optimal Aircraft Rotations Based on Optimal Flight


Timing," AGIFORS Symposium Proceedings, 8, 1968.

[120] Richter, R.J., "Experience with the Aircraft Rotation Model," AGI-
FORS Symposium Proceedings, 10, 1970.

[121] Rothstein, M., "An Airline Overbooking Model," Transportation Sci-
ence, 5, 1971, pp. 180-192.

[122] Rothstein, M., "OR and Airline Overbooking Problem," Operations


Research, 33:2, 1985, pp. 237-248.

[123] Rothstein, M. and Stone, A.W., "Passenger Booking Levels," AGI-


FORS Symposium Proceedings, 7, 1967.

[124] Rubin, J., "A Technique for the Solution of Massive Set Covering
Problems, with Application to Airline Crew Scheduling," Transporta-
tion Science, 7, 1973, pp. 34-48.

[125] Rue, R.C. and Rosenshine, M., "The Application of Semi-Markov De-
cision Processes to Queueing of Aircraft for Landing at an Airport,"
Transportation Science, 19:2, 1985, pp. 154-172.

[126] Sassano, A., "On the Facial Structure of the Set Covering Polytope,"
Mathematical Programming, 44, 1989, pp. 181-202.

[127] Siddiqee, W., "A Mathematical Model for Predicting the Number of
Potential Conflict Situations at Intersecting Air Routes," Transporta-
tion Science, 2, 1973, pp. 158-167.

[128] Simpson, RW., "A Review of Scheduling and Routing Models for Air-
line Scheduling," paper presented at the AGIFORS Meeting, Broad-
way, England, October 1969.

[129] Shlifer, E. and Vardi, Y., "An Airline Overbooking Policy," Trans-
portation Science, 9:2, 1975, pp. 101-114.

[130] Smith, B.C., Leimkuhler, J.F., and Darrow, R.M., "Yield Management
at American Airlines," Interfaces, 22:1-2, 1992, pp. 8-31.

[131] Song, G., "Integer Programming Models for Airline Network Design
Problems," Ph.D. dissertation, Graduate School of Business, The Uni-
versity of Texas at Austin, 1995.

[132] Song, G., Wei, G., and Yu, G., "A Decision Support Framework for
Crew Management During Irregular Operations," in G. Yu eds. Oper-
ations Research in the Airline Industry, 1997, pp. 259-286.

[133] Soudarovich, J., "Routing Selection and Aircraft Allocation," AGI-


FORS Symposium Proceedings, 11, 1971.

[134] Struve, D.L., "Intercity Transportation Effectiveness Model," AGI-


FORS Symposium Proceedings, 10, 1970.

[135] Subramanian, R., Scheff, R.P. Jr., Quillinan, J.D., Wiper, D.S., and
Marsten, R.E., "Coldstart: Fleet Assignment at Delta Air Lines,"
Interfaces, 24:1-2, 1994, pp. 104-120.

[136] Sun, X., Brauner, E., and Hornby, S., "A Large-Scale Neural Network
for Airline Forecasting in Revenue Management," in G. Yu (eds.) Op-
erations Research in the Airline Industry, Boston, Kluwer Academic
Publishers, 1997, pp. 46-65.

[137] Sussner, P., Pardalos, P.M., and Ritter, G.X., "On Integer Program-
ming Approaches for Morphological Template Decomposition," Jour-
nal of Combinatorial Optimization, 1:2, 1997, pp. 177-188.

[138] Taha, H.A., Integer Programming Theory, Applications, and Compu-


tations, Academic Press, 1975, pp. 126-128.

[139] Talluri, K.T., "Swapping Applications in a Daily Airline Fleet Assign-


ment," Transportation Science, 30:3, 1996, pp. 237-248.

[140] Taylor, C.J., "The Determination of Passenger Booking Levels," AG-


IFORS Symposium Proceedings, 2, 1962, pp. 93-116.

[141] Teodorovic, D., "Matching of Transportation Capacities and Passen-


ger Demands in Air Transportation," Civil Engineering Practice, Lan-
caster, Technomic Publishing, 1988, pp. 365-392.

[142] Teodorovic, D. and Guberinic, S., "Optimal Dispatching Strategy on


an Airline Network After a Schedule Perturbation," European Journal
of Operational Research, 15, 1984, pp. 178-182.

[143] Teodorovic, D. and Krcmar-Nozic, E., "Multicriteria Model to De-


termine Flight Frequencies on an Airline Network under Competitive
Conditions," Transportation Science, 23:1, 1989, pp. 14-25.

[144] Teodorovic, D. and Stojkovic, G., "Model for Operational Daily Air-
line Scheduling," Transportation Planning and Technology, 14, 1990,
pp. 273-285.

[145] Terrab, M., "Ground Holding Strategies for Air Traffic Control,"
Ph.D. thesis, Massachusetts Institute of Technology, February 1990.

[146] Terrab, M. and Odoni, A.R., "Strategic Flow Management for Air
Traffic Control," Operations Research, 41:1, 1993, pp. 138-152.

[147] Tewinkel, D., "An Algorithm for Aircraft Scheduling in a Radial Net-
work," AGIFORS Symposium Proceedings, 9, 1969.

[148] Thompson, H.R., "Statistical Problems in Airline Reservation Con-
trol," Operational Research Quarterly, 12, 1961, pp. 167-185.

[149] Titze, B. and Griesshaber, R., "Realistic Passenger Booking Behavior


and Simple Low-Fare/High-Fare Seat Allotment Model," AGIFORS
Symposium Proceedings, 23, 1983.
[150] Vasquez-Marquez, A., "American Airlines Arrival Slot Allocation Sys-
tem (ASAS)," Interfaces, 21:1, 1991, pp. 42-61.

[151] Verleger, P.K., "Model of the Demand for Air Transportation," Bell
J. Econ. Management Science, 3:2, 1972.
[152] Vranas, P.B., Bertsimas, D.J., and Odoni, A.R., "The Multi-Airport
Ground-Holding Problem in Air Traffic Control," Operations Research,
42:2, 1994, pp. 249-261.

[153] Vranas, P.B., Bertsimas, D.J., and Odoni, A.R., "Dynamic Ground-
Holding Policies for a Network of Airports," Transportation Science,
28:4, 1994, pp. 275-291.

[154] Wang, K., "Optimum Seat Allocation for Multi-Leg Flights with Mul-
tiple Fare Types," AGIFORS Symposium Proceedings, 23, 1983, pp.
225-237.

[155] Wang, H., "A Dynamic Programming Framework for the Global Flow
Control Problem in Air Traffic Management," Transportation Science,
25:4, 1991, pp. 308-313.
[156] Weatherford, L.R., "Using Prices More Realistically as Decision Vari-
ables in Perishable-Asset Revenue Management Problems," Journal
of Combinatorial Optimization, 1, 3, 1997, pp. 277-304.
[157] Wei, G., Song, G., and Yu, G., "Model and Algorithm for Crew Man-
agement During Airline Irregular Operations," Journal of Combina-
torial Optimization, 1, 3, 1997, pp. 305-321.
[158] Wollmer, R.D., "An Airline Seat Management Model for a Single Leg
Route When Lower Fare Classes Book First," Operations Research,
40, 1992, pp. 26-37.

[159] Yan, S. and Lin, C., "Airline Scheduling for the Temporary Closure
of Airports," Transportation Science, 31:1, 1997, pp. 72-82.

[160] Yan, S. and Yang, D., "A Decision Support Framework for Handling
Scheduling Perturbation," Transportation Research, 30B:6, 1996, pp.
405-419.

[161] Yan, S. and Young, H., "A Decision Support Framework for Multi-
Fleet Routing and Multi-Stop Flight Scheduling," Transportation Re-
search 30A:5, 1996, pp. 379-398.
[162] Yu, G., "Real-Time, Mission-Critical Decision Support Systems for
Managing and Controlling Airlines' Operations," Proceedings at the
International Conference on Management Science and Economic De-
velopment, Hong Kong, 1996.
[163] Yu, G., Operations Research in the Airline Industry, Kluwer Academic
Publishers, Boston, 1997.

[164] Yu, G. ed., "Recent Advances of Optimization Applications in the Air-


line Industry," Special Issue of Journal of Combinatorial Optimization,
1, 3, 1997.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 1-19
©1998 Kluwer Academic Publishers

Semidefinite Relaxations, Multivariate Normal


Distributions, and Order Statistics
Dimitris Bertsimas
Sloan School of Management
Massachusetts Institute of Technology, Cambridge, Mass. 02139
E-mail: dbertsim@aris.mit.edu

Yinyu Ye
Department of Management Science
The University of Iowa, Iowa City, Iowa 52242
E-mail: yinyu-ye@uiowa.edu

Contents

1 Introduction 2

2 Positive Semidefinite Relaxations 4

3 A Randomized Rounding Method Based on the Multivariate
Normal Distribution 6
3.1 The s - t Max-Cut Problem 7
3.2 The s - u - v Max-Cut Problem 9
3.3 Constrained Quadratic Optimization 10
3.4 The Maximum Graph Bisection Problem 12

4 A Heuristic Based on Order Statistics and its Application to the
Graph Bisection Problem 14

5 Computational Results 16

References

1 Introduction

Given the symmetric matrix $Q = \{q_{ij}\} \in \Re^{n \times n}$ and the constraint matrix
$A = \{a_{ij}\} \in \Re^{m \times n}$, we consider the quadratic programming (QP) problem
with linear and boolean constraints

\[
\mbox{(QP)} \qquad
\begin{array}{ll}
\mbox{Maximize} & q(x) := x'Qx \\[4pt]
\mbox{subject to} & \left| \sum_{j=1}^{n} a_{ij} x_j \right| = b_i, \quad i = 1, \dots, m, \\[4pt]
& x_j^2 = 1, \quad j = 1, \dots, n.
\end{array}
\]

Note that the constraint $x_j^2 = 1$ will force $x_j = 1$ or $x_j = -1$, making it a
boolean variable.
Examples of problems that fit into our framework: The graph maximum
bisection problem on an undirected graph G = (V, E), which is the problem
of partitioning the nodes in V into two sets S and V \ S of equal cardinality
so that $\sum_{(i,j) \in E,\, i \in S,\, j \in V \setminus S} c_{ij}$ is maximized, can be formulated as follows:

\[
\mbox{(GB)} \qquad
\begin{array}{ll}
\mbox{Maximize} & \frac{1}{2} \sum_{i,j} c_{ij} (1 - x_i x_j) \\[4pt]
\mbox{subject to} & \sum_{j=1}^{n} x_j = 0, \\[4pt]
& x_j^2 = 1, \quad j = 1, \dots, n.
\end{array}
\]
Here, $c_{ij}$ is the weight of edge $(i,j)$. Another example is the s - t max-cut
problem

\[
\mbox{(ST$-$)} \qquad
\begin{array}{ll}
\mbox{Maximize} & \frac{1}{2} \sum_{i,j} c_{ij} (1 - x_i x_j) \\[4pt]
\mbox{subject to} & x_s + x_t = 0, \\[4pt]
& x_j^2 = 1, \quad j = 1, \dots, n,
\end{array}
\]

meaning that nodes s and t should be separated in the optimal cut; and

\[
\mbox{(ST$+$)} \qquad
\begin{array}{ll}
\mbox{Maximize} & \frac{1}{2} \sum_{i,j} c_{ij} (1 - x_i x_j) \\[4pt]
\mbox{subject to} & |x_s + x_t| = 2, \\[4pt]
& x_j^2 = 1, \quad j = 1, \dots, n,
\end{array}
\]

meaning that nodes s and t should be on the same side of the optimal cut.
The third example is the s - u - v max-cut problem

\[
\mbox{(SUV)} \qquad
\begin{array}{ll}
\mbox{Maximize} & \frac{1}{2} \sum_{i,j} c_{ij} (1 - x_i x_j) \\[4pt]
\mbox{subject to} & |x_s + x_u + x_v| = 1, \\[4pt]
& x_j^2 = 1, \quad j = 1, \dots, n,
\end{array}
\]

meaning that nodes s, u and v cannot all be sided together in the optimal cut.
See Bertsimas, Teo and Vohra [4, 5] for more detailed descriptions of these
problems; also see Gibbons, Hearn and Pardalos [7].
Recently, there were several results on approximating specific quadratic
problems. Goemans and Williamson [8] provided a 0.878 approximation al-
gorithm for the max-cut and s - t max-cut problems. Frieze and Jerrum [6]
generalized the Goemans and Williamson approach to the k-max-cut prob-
lem and provided a 0.65 approximation algorithm for the maximum graph
bisection problem. Nesterov [12] provided a $\frac{2}{\pi}$ approximation algorithm for
the boolean QP problem

\[
\begin{array}{ll}
\mbox{Maximize} & q(x) = x'Qx \\[4pt]
\mbox{subject to} & x_j^2 = 1, \quad j = 1, \dots, n,
\end{array}
\]

where $Q$ is positive semidefinite. Ye [17] further extended the $\frac{2}{\pi}$ result to
solving the general QP problem

\[
\begin{array}{ll}
\mbox{Maximize} & q(x) = x'Qx \\[4pt]
\mbox{subject to} & \sum_{j=1}^{n} a_{ij} x_j^2 = b_i, \quad i = 1, \dots, m, \\[4pt]
& -e \le x \le e,
\end{array}
\]

where e is the vector of all ones. (Some negative results were given by Bellare
and Rogaway [2].) The central idea for these approximation algorithms, due
to Goemans and Williamson [8], is to use a semidefinite relaxation and
then use randomized rounding to round the solution using the geometry
of the semidefinite relaxation. To the best of our knowledge, there are no
approximation guarantees that use the semidefinite relaxation for the s-u-v
max-cut problem.
Our contributions in this paper are as follows:

1. We introduce another randomization technique based on the multivari-
ate normal distribution to obtain feasible solutions from semidefinite
relaxations. We show that the method provides (a) a 0.878 approxima-
tion algorithm for s - t max-cut problems, (b) a 0.878 approximation
algorithm for the max-cut problem, which shows that the cut obtained from
our technique partitions the graph into two equal sizes in expectation,
(c) a 0.878 approximation algorithm for s - u - v max-cut problems
with probability at least 0.91, and (d) a guarantee on the degree of
infeasibility and suboptimality for quadratic optimization with linear
constraints. It is interesting that the probability formula we use is
from Sheppard [15] from 1900. The method does not use the geome-
try of the semidefinite relaxation; it is based on interpreting the
matrix $\bar{X}$ that arises from the semidefinite relaxation as the covariance
matrix of a multivariate normal distribution.

2. We introduce another randomization scheme based on order statistics.


A version of the heuristic that does not use a semidefinite relaxation
provides a 0.5 approximation algorithm for the graph maximum bi-
section problem. We conjecture that another version that uses the
semi-definite relaxation leads to a new algorithm for graph bisection,
which is within close to 0.878 of the optimal. We report computational
result on graphs with up to 3,000 nodes that show that the proposed
method provides near optimal solutions.

2 Positive Semidefinite Relaxations

The semidefinite programming (SDP) relaxation for (QP) is as follows:

\[
\mbox{(SDP)} \qquad
\begin{array}{lll}
Z_{SD} = & \mbox{Maximize} & (Q, X) \\[4pt]
& \mbox{subject to} & (A_i, X) = b_i^2, \quad i = 1, \dots, m, \\[4pt]
& & d(X) = e, \quad X \succeq 0.
\end{array}
\tag{1}
\]

Here, $A_i = a_i a_i' \in \Re^{n \times n}$, $i = 1, \dots, m$, ($a_i$ is the $i$th row of the matrix
$A$) and $X \in \Re^{n \times n}$ is a symmetric matrix. Furthermore, $(\cdot, \cdot)$ is the matrix
inner product $(Q, X) = \mbox{trace}(QX)$, $d(X)$ is a column vector containing the
diagonal components of $X$, $e \in \Re^n$ is the vector of all ones, and $X \succeq Z$
means that $X - Z$ is positive semidefinite. Obviously, (SDP) is feasible if
(QP) is feasible, since $xx' \in \Re^{n \times n}$ is feasible for (SDP) if $x \in \Re^n$ is feasible
for (QP). Thus, we assume that (SDP) is feasible throughout this paper (see
Nesterov, Todd and Ye [14] for detecting the feasibility of (SDP)).
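
For concreteness, a small instance of (SDP) can be solved with an
off-the-shelf conic solver; in the following sketch we use cvxpy (our choice,
not the authors'), with the graph-bisection constraint $(ee', X) = 0$ as the
example linear constraint.

```python
# A minimal sketch of solving the relaxation (SDP) with cvxpy for a small
# max-cut-style instance; the data and solver choice are illustrative only.

import cvxpy as cp
import numpy as np

n = 4
C = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Q = -C / 2.0          # so that <Q, X> matches (1/2) sum c_ij (1 - X_ij)
                      # up to the constant (1/2) sum c_ij

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0, cp.diag(X) == 1]
# Example linear constraint (A_i, X) = b_i^2 with a_i = e and b_i = 0
# (graph bisection): <ee', X> = 0.
constraints.append(cp.sum(X) == 0)

prob = cp.Problem(cp.Maximize(cp.trace(Q @ X)), constraints)
prob.solve()
print(prob.value + C.sum() / 2.0)   # value of the bisection relaxation (~4)
```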
The dual of the (SDP) problem is

\[
\begin{array}{lll}
Z_{SD} = & \mbox{Minimize} & \sum_{i=1}^{m} b_i^2 \lambda_i + e'y \\[4pt]
& \mbox{subject to} & \sum_{i=1}^{m} \lambda_i A_i + D(y) \succeq Q,
\end{array}
\tag{2}
\]

where $D(y)$ is the diagonal matrix such that $d(D(y)) = y \in \Re^n$. Note that
the dual is always feasible and it has an interior, so that there is no duality
gap between the primal and dual. Denote by $\bar{X}$ and $(\bar{\lambda}, \bar{y})$ an optimal
solution pair for the primal (1) and dual (2).
The positive semidefinite relaxation was first proposed by Lovász and
Schrijver [11]. This relaxation problem can be solved in polynomial time,
e.g., see Nesterov and Nemirovskii [13] and Alizadeh [1].
Let ZQP denote the optimal solution value of problem (QP), which we
assume is feasible. Let ZSD denote the optimal solution value of problem
(SDP).
Proposition 2.1 If Problem (QP) is feasible, then

\[
Z_{QP} \le Z_{SD}.
\]

Proof. Let $x$ be an optimal solution of Problem (QP). Let $X = xx' \in \Re^{n \times n}$.
Then $X \succeq 0$, $d(X) = e$,

\[
(A_i, X) = x' A_i x = (a_i' x)^2 = b_i^2, \quad i = 1, \dots, m,
\]

and $(Q, X) = x'Qx = Z_{QP}$. Thus, we have $Z_{QP} \le Z_{SD}$. $\Box$

3 A Randomized Rounding Method Based on the Multivariate Normal Distribution
Let X be an optimal solution of Problem (SDP). We first review the Goe-
mans and Williamson randomization, and then proceed with our new method.

The Goemans-Williamson Randomization


Since $X$ is positive semidefinite, there is a matrix factor $V = (v_1, \dots, v_n) \in
\mathbb{R}^{n\times n}$, i.e., $v_j$ is the $j$th column of $V$, such that $X = V'V$. The randomization
method of Goemans and Williamson [8] generates a random vector $u$
uniformly distributed on the $n$-dimensional unit ball and then assigns
$$\hat{x} = \mathrm{sign}(V'u), \qquad (3)$$
where, for any $x \in \mathbb{R}^n$, $\mathrm{sign}(x)$ is the vector whose components are $\mathrm{sign}(x_j)$,
$j = 1,\dots,n$, that is,
$$
\mathrm{sign}(x_j) = \begin{cases} 1 & \text{if } x_j \ge 0, \\ -1 & \text{if } x_j < 0. \end{cases}
$$

A New Randomized Rounding Heuristic


We interpret the positive semidefinite solution $X$ as the covariance matrix
of a random vector.

A multivariate normal rounding heuristic (H1)

1. Generate a vector $x$ from a multivariate normal distribution with mean
0 and covariance matrix $X$, that is, $x \sim N(0, X)$.

2. Assign
$$\hat{x} = \mathrm{sign}(x). \qquad (4)$$

To generate $x$, we simply generate a random vector $u$ whose components are
i.i.d. $N(0,1)$ and let $x = V'u$.
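For concreteness, the following is a minimal Python sketch of heuristic H1,
assuming numpy is available; the function name and the use of an
eigendecomposition to obtain the factor V are ours, not the paper's.

    import numpy as np

    def h1_round(X, rng=None):
        """Round an SDP solution X (symmetric PSD, unit diagonal) to a +/-1 vector."""
        rng = rng if rng is not None else np.random.default_rng()
        # Factor X = V'V; an eigendecomposition is used since X may be
        # singular, in which case a Cholesky factorization would fail.
        w, U = np.linalg.eigh(X)
        V = (U * np.sqrt(np.clip(w, 0.0, None))).T  # columns v_j, with X = V'V
        u = rng.standard_normal(X.shape[0])         # u_j i.i.d. N(0, 1)
        x = V.T @ u                                 # x ~ N(0, X)
        return np.where(x >= 0, 1, -1)              # x_hat = sign(x)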
Proposition 3.1
$$E[\hat{x}_j] = 0, \quad E[\hat{x}_j^2] = 1, \quad j = 1, 2, \dots, n,$$
$$E[\hat{x}_i\hat{x}_j] = \frac{2}{\pi}\arcsin(x_{ij}), \quad i, j = 1, 2, \dots, n.$$

Proof. The marginal distribution of $x_i$ is $N(0, 1)$, and thus $\Pr(\hat{x}_i = 1) =
\Pr(\hat{x}_i = -1) = 1/2$. Thus, $E[\hat{x}_i] = 0$ and $E[\hat{x}_i^2] = 1$. Furthermore,
$$
\begin{array}{rcl}
E[\hat{x}_i\hat{x}_j] & = & \Pr(\hat{x}_i = 1, \hat{x}_j = 1) + \Pr(\hat{x}_i = -1, \hat{x}_j = -1) \\[2pt]
& & {} - \Pr(\hat{x}_i = 1, \hat{x}_j = -1) - \Pr(\hat{x}_i = -1, \hat{x}_j = 1) \\[2pt]
& = & \Pr(x_i \ge 0, x_j \ge 0) + \Pr(x_i < 0, x_j < 0) \\[2pt]
& & {} - \Pr(x_i \ge 0, x_j < 0) - \Pr(x_i < 0, x_j \ge 0).
\end{array}
$$
The tail probabilities of a multivariate normal distribution constitute a problem that
has been studied over the last 100 years. Sheppard [15] shows (see Johnson
and Kotz [10], p. 95) that
$$\Pr(x_i \ge 0, x_j \ge 0) = \Pr(x_i < 0, x_j < 0) = \frac{1}{4} + \frac{1}{2\pi}\arcsin(x_{ij}),$$
$$\Pr(x_i \ge 0, x_j < 0) = \Pr(x_i < 0, x_j \ge 0) = \frac{1}{4} - \frac{1}{2\pi}\arcsin(x_{ij}).$$
This leads to
$$E[\hat{x}_i\hat{x}_j] = \frac{2}{\pi}\arcsin(x_{ij}). \qquad (5)$$
$\Box$
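As a quick sanity check, identity (5) is easy to verify by simulation; a
minimal sketch assuming numpy, with an arbitrary correlation value:

    import numpy as np

    rng = np.random.default_rng(0)
    rho = 0.6                                     # an arbitrary correlation value
    X = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), X, size=200_000)
    empirical = np.mean(np.sign(z[:, 0]) * np.sign(z[:, 1]))
    print(empirical, 2 / np.pi * np.arcsin(rho))  # the two values should nearly agree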
We next apply this randomization method to several problems.

3.1 The s - t Max-Cut Problem


The relaxation of Problem (ST$^-$) that we solve in this case is
$$
(\mathrm{RST}^-) \qquad
\begin{array}{ll}
\text{Maximize} & \frac{1}{2}\sum_{i,j} c_{ij}(1 - x_{ij}) \\[2pt]
\text{subject to} & x_{st} = -1, \\[2pt]
& x_{jj} = 1, \quad j = 1,\dots,n, \\[2pt]
& X \succeq 0.
\end{array}
$$

Theorem 3.2 Let $c_{ij} \ge 0$ for all $i, j$. Then, heuristic H1 provides a feasible
solution for the (ST$^-$) max-cut problem with objective value $Z_H$:
$$E[Z_H] \ge 0.878\, Z_{SD}.$$

Proof. First, we notice that $E[\hat{x}_s + \hat{x}_t] = 0$, and
$$E[(\hat{x}_s + \hat{x}_t)^2] = 2 + 2E[\hat{x}_s\hat{x}_t] = 2 + \frac{4}{\pi}\arcsin(-1) = 2 - 2 = 0,$$
i.e., $\hat{x}_s + \hat{x}_t = 0$ with probability 1, i.e., the solution is feasible.

Moreover, the value of the heuristic solution is
$$
\begin{array}{rcl}
E[Z_H] & = & \displaystyle\sum_{i,j} c_{ij}\big(\Pr(\hat{x}_i = 1, \hat{x}_j = -1) + \Pr(\hat{x}_i = -1, \hat{x}_j = 1)\big) \\[4pt]
& = & \displaystyle\sum_{i,j} c_{ij}\Big(\frac{1}{2} - \frac{1}{\pi}\arcsin(x_{ij})\Big) \\[4pt]
& \ge & 0.878 \cdot \displaystyle\frac{1}{2}\sum_{i,j} c_{ij}(1 - x_{ij}) \\[4pt]
& = & 0.878\, Z_{SD}. \qquad \Box
\end{array}
$$

The relaxation of Problem (ST$^+$) that we solve in this case is
$$
(\mathrm{RST}^+) \qquad
\begin{array}{ll}
\text{Maximize} & \frac{1}{2}\sum_{i,j} c_{ij}(1 - x_{ij}) \\[2pt]
\text{subject to} & x_{st} = 1, \\[2pt]
& x_{jj} = 1, \quad j = 1,\dots,n, \\[2pt]
& X \succeq 0.
\end{array}
$$

Theorem 3.3 Let $c_{ij} \ge 0$ for all $i, j$. Then, heuristic H1 provides a feasible
solution for the (ST$^+$) max-cut problem with objective value $Z_H$:
$$E[Z_H] \ge 0.878\, Z_{SD}.$$

Proof. First, we notice that
$$E[(\hat{x}_s + \hat{x}_t)^2] = 2 + 2E[\hat{x}_s\hat{x}_t] = 2 + \frac{4}{\pi}\arcsin(1) = 4,$$
and $(\hat{x}_s + \hat{x}_t)^2 \le 4$ always, since $\hat{x}_s, \hat{x}_t \in \{-1, 1\}$. Thus,
$(\hat{x}_s + \hat{x}_t)^2 = 4$, or $|\hat{x}_s + \hat{x}_t| = 2$, with probability 1, i.e., the solution is
feasible.
Again, the value of the heuristic solution satisfies $E[Z_H] \ge 0.878\, Z_{SD}$. $\Box$

An identical analysis shows that Heuristic H1 provides the 0.878 bound
for the max-cut problem and $E[\sum_{j=1}^n \hat{x}_j] = 0$, i.e., the approximate cut is
expected to be equally divided.

3.2 The s - u - v Max-Cut Problem


For simplicity, let $s = 1$, $u = 2$ and $v = 3$. The relaxation of Problem (SUV)
that we solve in this case is
$$
(\mathrm{RSUV}) \qquad
\begin{array}{ll}
\text{Maximize} & \frac{1}{2}\sum_{i,j} c_{ij}(1 - x_{ij}) \\[2pt]
\text{subject to} & x_{12} + x_{13} + x_{23} = -1, \\[2pt]
& x_{jj} = 1, \quad j = 1,\dots,n, \\[2pt]
& X \succeq 0.
\end{array}
$$

We now prove the following lemma.

Lemma 3.4 Let $X \succeq 0$, $d(X) = e$ and $x_{12} + x_{13} + x_{23} = -1$. Then
$$\arcsin(x_{12}) + \arcsin(x_{13}) + \arcsin(x_{23}) \le -3\arcsin(1/3).$$

Proof. From $X \succeq 0$ and $d(X) = e$, $x_{12}, x_{13}, x_{23} \in [-1, 1]$. Thus, at least one
of the three is negative. Now we have only two cases: the remaining two
have opposite signs; or the remaining two are both non-positive. In the
former case, the maximum value of $\arcsin(x_{12}) + \arcsin(x_{13}) + \arcsin(x_{23})$ is
$-2\arcsin(1/2)$, and in the latter its maximum value is $-3\arcsin(1/3)$. Since
$-3\arcsin(1/3) > -2\arcsin(1/2)$, we derive the lemma. $\Box$

Theorem 3.5 Let $c_{ij} \ge 0$ for all $i, j$. Then, heuristic H1 applied to the
s-u-v max-cut problem generates a solution $\hat{x}$ with the following properties:

(a) $\Pr((\hat{x}_1 + \hat{x}_2 + \hat{x}_3)^2 = 1) \ge \frac{3}{4} + \frac{3}{2\pi}\arcsin(1/3) > 0.912$.

(b) $E[Z_H] \ge 0.878\, Z_{SD}$.

Proof.
$$
\begin{array}{rcl}
\Pr((\hat{x}_1 + \hat{x}_2 + \hat{x}_3)^2 = 1)
& = & 1 - \Pr(x_1 \ge 0, x_2 \ge 0, x_3 \ge 0) - \Pr(x_1 < 0, x_2 < 0, x_3 < 0) \\[2pt]
& = & 1 - 2\Pr(x_1 \ge 0, x_2 \ge 0, x_3 \ge 0) \\[2pt]
& = & 1 - 2\Big(\dfrac{1}{8} + \dfrac{1}{4\pi}\big(\arcsin(x_{12}) + \arcsin(x_{13}) + \arcsin(x_{23})\big)\Big)
\end{array}
$$
(see Tong [16], p. 190)
$$\ge \frac{3}{4} + \frac{3}{2\pi}\arcsin(1/3) > 0.912 \quad \text{(from the previous lemma)}.$$
As before, $E[Z_H] \ge 0.878\, Z_{SD}$. $\Box$

Hence, we have

Corollary 3.6 Let $c_{ij} \ge 0$ for all $i, j$. Then, heuristic H1 provides a feasible
solution with probability at least 0.91 for the (SUV) max-cut problem and
delivers an objective value $Z_H$ with $E[Z_H] \ge 0.878\, Z_{SD}$.

3.3 Constrained Quadratic Optimization


In this section, we consider Problem (QP) with $m$ linear homogeneous con-
straints, i.e., $b_i = 0$ for $i = 1,\dots,m$. We assume that the problem is feasible
and the coefficient matrix $Q$ is a positive semidefinite matrix (therefore,
some $q_{ij}$ can be negative). Heuristic H1 does not necessarily generate a
feasible solution. We show, however, that every constraint is satisfied in
expectation. We also show an upper bound on the degree of violation of
each constraint.
Theorem 3.7 Heuristic H1 generates a solution $\hat{x}$ with the following properties:

(a) $E[a_i\hat{x}] = 0$, $i = 1,\dots,m$;

(b) $E[(a_i\hat{x})^2] \le \dfrac{(\pi - 2)\, n}{\pi}\, \|a_i\|^2$, $i = 1,\dots,m$;

(c) $E[Z_H] \ge \dfrac{2}{\pi}\, Z_{SD}$.

Before proving the above theorem, we prove a lemma. For any function
of one variable $f(t)$ and $X \in \mathbb{R}^{n\times n}$, let $f[X] \in \mathbb{R}^{n\times n}$ be the matrix with the
components $f(x_{ij})$.

Lemma 3.8 Let $X \succeq 0$ and $d(X) \le e$. Then
$$
X + \frac{\pi - 2}{2}\Big(\sum_{j=1}^n x_{jj}\Big) I \;\succeq\; \arcsin[X] \;\succeq\; X,
$$
where $I$ is the identity matrix in $\mathbb{R}^{n\times n}$.

Proof. The right-side inequality is proved by Nesterov [12]. We now prove
the left. Since $X \succeq 0$ and $|x_{ij}| \le 1$ for all $i, j = 1,\dots,n$, we have
$[X]^t \succeq 0$ for all $t = 1, 2, \dots$. Thus
$$
\begin{array}{rcl}
\arcsin[X] & = & X + \dfrac{1}{2\cdot 3}[X]^3 + \dfrac{1\cdot 3}{2\cdot 4\cdot 5}[X]^5 + \cdots \\[6pt]
& \preceq & X + \dfrac{1}{2\cdot 3}\Big(\displaystyle\sum_{j=1}^n x_{jj}^3\Big) I + \dfrac{1\cdot 3}{2\cdot 4\cdot 5}\Big(\displaystyle\sum_{j=1}^n x_{jj}^5\Big) I + \cdots \\[6pt]
& \preceq & X + \dfrac{1}{2\cdot 3}\Big(\displaystyle\sum_{j=1}^n x_{jj}\Big) I + \dfrac{1\cdot 3}{2\cdot 4\cdot 5}\Big(\displaystyle\sum_{j=1}^n x_{jj}\Big) I + \cdots \\[6pt]
& = & X + \Big(\displaystyle\sum_{j=1}^n x_{jj}\Big) I \Big(\dfrac{1}{2\cdot 3} + \dfrac{1\cdot 3}{2\cdot 4\cdot 5} + \cdots\Big) \\[6pt]
& = & X + \Big(\displaystyle\sum_{j=1}^n x_{jj}\Big) I \Big(\dfrac{\pi}{2} - 1\Big). \qquad \Box
\end{array}
$$

Proof of the theorem.
Since $\hat{x}$ has mean 0 (Proposition 3.1),
$$E[a_i\hat{x}] = 0,$$
proving the first part.

Recall that $X \succeq 0$ and $d(X) = e$, so that $\sum_{j=1}^n x_{jj} = n$. Since
$A_i = a_i'a_i \succeq 0$ and $\langle A_i, X\rangle = b_i^2 = 0$, Lemma 3.8 gives
$$
E[(a_i\hat{x})^2] = \frac{2}{\pi}\langle A_i, \arcsin[X]\rangle
\le \frac{2}{\pi}\Big(\langle A_i, X\rangle + \frac{\pi - 2}{2}\, n\, \mathrm{trace}(A_i)\Big)
= \frac{(\pi - 2)\, n}{\pi}\, \|a_i\|^2,
$$
proving the second part of the theorem.


The heuristic obtains a solution with expected value
$$
\begin{array}{rcl}
E[Z_H] & = & \displaystyle\sum_{i,j} q_{ij}\, E[\hat{x}_i\hat{x}_j]
\;=\; \frac{2}{\pi}\displaystyle\sum_{i,j} q_{ij}\arcsin(x_{ij})
\;=\; \frac{2}{\pi}\langle Q, \arcsin[X]\rangle \\[6pt]
& \ge & \dfrac{2}{\pi}\langle Q, X\rangle \quad (\text{since } Q \succeq 0 \text{ and } \arcsin[X] \succeq X) \\[6pt]
& = & \dfrac{2}{\pi}\, Z_{SD},
\end{array}
$$
proving the last part of the theorem. $\Box$

The previous theorem shows that the solution produced by our method
is feasible in expectation. It is not obvious that the Goemans-Williamson
technique has this property. For the case without linear constraints, we
obtain that heuristic H1 leads to a $\frac{2}{\pi}$ approximation guarantee. Nesterov [12]
has shown that the randomization considered in Goemans and Williamson
[8] also leads to a $\frac{2}{\pi}$ approximation algorithm.

3.4 The Maximum Graph Bisection Problem


In this section, we show that the degree of violation of the constraint
$\sum_{j=1}^n x_j = 0$ for the graph bisection problem (GB), after we apply heuristic
H1, is indeed smaller than for the general boolean quadratic optimization
problem. Similar results are obtained by Frieze and Jerrum [6]. Note that
in solving (GB), we have $m = 1$, $A_1 = ee'$, $d(X) = e$, and $e'Xe = \langle A_1, X\rangle = 0$.
Thus,
$$E[(e'\hat{x})^2] = \frac{2}{\pi}\langle ee', \arcsin[X]\rangle = \frac{2}{\pi}\, e'\arcsin[X]\, e. \qquad (6)$$

We now prove the following general lemma using the bound developed
by Goemans and Williamson [8].

Lemma 3.9 Let $X \succeq 0$, $d(X) = e$ and $a'Xa = 0$, where $a$ is a nonnegative
vector. Then
$$\frac{2}{\pi}\, a'\arcsin[X]\, a \le 0.122\,(e'a)^2.$$

Proof. Noting that $\sum_{i,j} a_i a_j x_{ij} = 0$ and that, by the Goemans-Williamson
bound, $\frac{2}{\pi}\arcsin(x_{ij}) \le 1 - 0.878\,(1 - x_{ij})$, we have
$$
\frac{2}{\pi}\, a'\arcsin[X]\, a
= \sum_{i,j} a_i a_j\, \frac{2}{\pi}\arcsin(x_{ij})
\le \sum_{i,j} a_i a_j - 0.878 \sum_{i,j} a_i a_j (1 - x_{ij})
= 0.122\,(e'a)^2. \qquad \Box
$$
The following result is immediate:

Theorem 3.10 Let $c_{ij} \ge 0$ for all $i, j$. Then, heuristic H1 applied to the
graph bisection problem generates a solution $\hat{x}$ with the following properties:

(a) $E[e'\hat{x}] = 0$.

(b) $\dfrac{\sigma[e'\hat{x}]}{n} = \dfrac{\sqrt{E[(e'\hat{x})^2]}}{n} \le 0.3493$.

(c) $E[Z_H] \ge 0.878\, Z_{SD}$.

From the Chebyshev inequality,
$$\Pr(|e'\hat{x}| \ge 0.36\,n) \le \frac{E[(e'\hat{x})^2]}{(0.36\,n)^2} \le 0.943.$$
Thus, by repeating the randomization process, we can generate a cut in
polynomial time that is within 0.878 of optimal and such that at least 32%
of the nodes are in one part of the cut and at least 32% of the nodes are in
the other part.

4 A Heuristic Based on Order Statistics and its Application to the Graph Bisection Problem
In order to motivate Heuristic H3 below, we propose Heuristic H2 for the
graph bisection problem, which provides a bisection that is guaranteed to
be within 0.5 of optimal.
An order statistics randomized heuristic (H2)
1. For each i, i = 1, ... , n, generate a uniform random variable Ui in
[0,1].
2. Let $n = 2k$. Let $U_{(r)}$ denote the $r$th order statistic of the numbers $U_i$.
Let $U_{MED} = U_{(k)}$ be the median of the numbers $U_i$.

3. Nodes $i$ such that $U_i \ge U_{MED}$ belong to one part of the cut, and
nodes with $U_i < U_{MED}$ belong to the other part of the cut.
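A minimal Python sketch of heuristic H2 follows, assuming numpy; the
helper that evaluates the objective uses our own naming, not the paper's.

    import numpy as np

    def h2_bisect(n, rng=None):
        """Split nodes 0..n-1 (n = 2k) into two equal halves at random."""
        rng = rng if rng is not None else np.random.default_rng()
        u = rng.uniform(size=n)              # step 1: U_i uniform on [0,1]
        u_med = np.median(u)                 # step 2: median of the U_i
        # step 3: split at the median; with continuous U_i the two parts
        # have exactly k nodes each (np.median averages the middle values).
        return np.where(u >= u_med, 1, -1)

    def objective(c, x):
        """The objective (1/2) sum_{i,j} c_ij (1 - x_i x_j) for a +/-1 vector x."""
        return 0.5 * (c.sum() - x @ c @ x)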
Clearly, the above heuristic provides a feasible solution for the graph bisec-
tion problem. Let ZGB be the optimal solution value of the graph bisection
problem.
Theorem 4.1 The order statistics randomized heuristic H2 provides a cut
such that
$$E[Z_H] = \frac{k-1}{2k-1}\sum_{i,j} c_{ij} \ge \frac{k-1}{2k-1}\, Z_{GB}.$$

Proof. The expected value of the heuristic is:
$$E[Z_H] = \sum_{i,j} c_{ij}\big(\Pr(U_i \le U_{MED} < U_j) + \Pr(U_j \le U_{MED} < U_i)\big).$$
The event $U_j < U_{MED} < U_i$ is equivalent to the event that $k - 1$ of the $U_l$
($l \ne i, j$) are less than or equal to $z$, $U_j$ is less than or equal to $z$, one of the
$U_l$ ($l \ne i, j$) is within $(z, z + dz)$, $k - 2$ of the $U_l$ are greater than $z$, and $U_i$
is greater than $z$, for some $z \in [0, 1]$. Thus,
$$\Pr(U_j < U_{MED} < U_i) = \frac{(2k-2)!}{(k-1)!\,1!\,(k-2)!}\int_0^1 z^k (1-z)^{k-1}\, dz.$$
The integral in the right-hand side is a beta integral $B(k+1, k)$, where
$$B(r, s) = \int_0^1 z^{r-1}(1-z)^{s-1}\, dz.$$
Since
$$B(k+1, k) = \frac{k!\,(k-1)!}{(2k)!},$$
we obtain that
$$\Pr(U_j < U_{MED} < U_i) = \frac{(2k-2)!}{(k-1)!\,1!\,(k-2)!}\cdot\frac{k!\,(k-1)!}{(2k)!} = \frac{k-1}{2(2k-1)}.$$
Symmetrically,
$$\Pr(U_i < U_{MED} < U_j) = \frac{k-1}{2(2k-1)}.$$
Thus, we obtain
$$E[Z_H] = \frac{k-1}{2k-1}\sum_{i,j} c_{ij} \ge \frac{k-1}{2k-1}\, Z_{GB}. \qquad \Box$$
Notice that the above order statistics randomized heuristic is within
$\frac{1}{2}\big(1 - \frac{1}{n-1}\big)$ of the optimal graph bisection.
The following algorithm is a natural generalization of Heuristics H1 and
H2.
Multivariate normal order statistic heuristic (H3)
1. Solve the semidefinite relaxation of the graph bisection problem. Let
X be an optimal solution.
2. Generate a random vector x distributed according to the multivariate
normal with mean 0, and covariance matrix X.

3. Compute the median $x_{MED}$ of the numbers $x_i$.

4. Nodes $i$ with $x_i \ge x_{MED}$ are in one part of the cut, and nodes $i$ with
$x_i < x_{MED}$ are in the other part of the cut.
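Heuristic H3 differs from H1 only in splitting at the median rather than at
zero; a sketch under the same assumptions as the H1 code above:

    import numpy as np

    def h3_bisect(X, rng=None):
        """Round an SDP solution of the bisection relaxation to a balanced cut."""
        rng = rng if rng is not None else np.random.default_rng()
        w, U = np.linalg.eigh(X)                   # steps 1-2: factor and sample
        V = (U * np.sqrt(np.clip(w, 0.0, None))).T
        x = V.T @ rng.standard_normal(X.shape[0])  # x ~ N(0, X)
        x_med = np.median(x)                       # step 3: median of the x_i
        return np.where(x >= x_med, 1, -1)         # step 4: split at the median

In the experiments of Section 5, such draws are repeated (2,000 per graph)
and the best cut is retained.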

Clearly, the above heuristic provides a feasible solution to the graph
bisection problem. However, its analysis appears to be difficult as it involves
the characterization of the median of correlated random variables. Note
that $\mathrm{Var}[\sum_i x_i] = E[e'xx'e] = e'Xe = 0$; therefore, the sample mean
$\sum_i x_i/n$ is equal to 0 deterministically. We then give a heuristic argument
that leads us to believe that our randomized method produces a feasible cut
which is within close to 0.878 of optimum. Heuristically speaking, one
would expect that the median and the sample mean are close to each
other, i.e., we expect that $x_{MED} \approx 0$. Thus,

$$
\begin{array}{rcl}
E[Z_H] & = & \displaystyle\sum_{i,j} c_{ij}\big(\Pr(x_i \le x_{MED} < x_j) + \Pr(x_j \le x_{MED} < x_i)\big) \\[4pt]
& \approx & \displaystyle\sum_{i,j} c_{ij}\big(\Pr(x_i \le 0 < x_j) + \Pr(x_j \le 0 < x_i)\big) \\[4pt]
& \ge & 0.878\, Z_{SD},
\end{array}
$$

from our previous analysis. In the next section we present computational
results showing that the above heuristic provides near optimal solutions
empirically.

5 Computational Results
We have applied the randomization methods H1 and H3 to approximating
the maximum graph bisection of the well-known G-set graph problems of
Helmberg and Rendl [9]. These problems are generated by rudy, a machine-
independent graph generator written by G. Rinaldi, and have become standard
test problems for graph optimization. The positive semidefinite relaxation
solutions $X$ are computed by the program of Benson, Ye and Zhang [3].
We have solved 54 problems. Table 1 includes some representative prob-
lems. The second column of the table gives the number of nodes in each
graph. The third column is the positive semidefinite relaxation objective
value $Z_{SD}$ of the maximum graph bisection (GB). The fourth column is the
average $|e'\hat{x}|$ over 2,000 vectors $\hat{x}$ generated by randomization method H1
for each graph. We see that this value is rather small relative to
the number of nodes, indicating that the graph is close to being equally
divided by $\hat{x}$. The fifth column is the largest objective value of the 2,000
vectors $\hat{x}$ generated by randomization method H3. Note that the vectors $\hat{x}$
generated by H3 are always feasible (i.e., $e'\hat{x} = 0$) for the graph bisection
problem. The last column depicts the ratio $Z_H/Z_{SD}$. If this ratio equals 1,
then the approximate solution is actually 100% optimal. Most of the problems
have ratios above 0.9, indicating that their solutions are at least 90% optimal.

Graph Dim. ZSD H1: |e'x| H3: ZH ZH/ZSD


Gl 800 48331.86 11.70 45760 0.9467
G2 800 48357.05 11.93 45596 0.9429
G3 800 48334.91 11.48 45788 0.9473
G11 800 2516.62 12.58 2136 0.8487
G12 800 2495.33 13.87 2112 0.8463
G13 800 2588.35 12.30 2224 0.8592
G14 800 12759.36 12.13 11908 0.9332
G15 800 12684.49 12.64 11872 0.9359
G16 800 12698.21 11.99 11864 0.9343
G17 800 12683.56 12.35 11880 0.9366
G18 800 4662.46 13.52 3696 0.7927
G19 800 4322.78 13.39 3380 0.7819
G20 800 4442.65 13.56 3532 0.7950
G22 2000 56543.19 18.03 51848 0.9169
G24 2000 56561.79 17.76 51796 0.9157
G32 2000 6270.51 20.76 5208 0.8305
G34 2000 6186.67 21.15 5184 0.8379
G43 1000 28128.19 12.71 26004 0.9244
G44 1000 28110.77 12.48 25952 0.9232
G45 1000 28098.26 12.44 25928 0.9227
G50 3000 23952.69 3.38 23520 0.9819
G51 1000 16022.41 13.64 14932 0.9319
G52 1000 16036.39 14.24 14964 0.9331
G53 1000 16036.85 13.96 14944 0.9318
G54 1000 16021.85 12.94 14956 0.9334
Table 1: Quality of new randomization algorithms for the maximum graph
bisection problem.
There are a few below 0.9 (G11-G13, G18-G20, G32-G34), because all
of these problems have negative edge weights $c_{ij}$.
Acknowledgement We would like to thank Steve Benson and Xiong Zhang
for helping us to conduct the computational test.

References
[1] F. Alizadeh, Combinatorial optimization with interior point methods and
semi-definite matrices, PhD thesis, University of Minnesota, Minneapo-
lis, MN, 1991.

[2] M. Bellare and P. Rogaway, "The complexity of approximating a non-


linear program," Mathematical Programming 69 (1995) 429-442.

[3] S. Benson, Y. Ye and X. Zhang, "Solving large-scale sparse semidefinite
programs for combinatorial optimization," Working Paper, Computational
and Applied Mathematics, The University of Iowa, Iowa City, IA
52242, 1997.

[4] D. Bertsimas, C. Teo and R. Vohra, "On dependent randomized rounding
algorithms," Proc. 5th IPCO Conference (1996) 330-344.

[5] D. Bertsimas, C. Teo and R. Vohra, "Nonlinear relaxations and improved
randomized approximation algorithms for multicut problems," Proc. 4th
IPCO Conference (1995) 29-39.

[6] A. Frieze and M. Jerrum, "Improved approximation algorithms for max
k-cut and max bisection," Proc. 4th IPCO Conference (1995) 1-13.

[7] L. E. Gibbons, D. W. Hearn and P. M. Pardalos, "A continuous based


heuristic for the maximum clique problem," DIMACS Series in Discrete
Mathematics and Theoretical Computer Science 26 (1996) 103-124.
[8] M. X. Goemans and D. P. Williamson, "Improved approximation algo-
rithms for Maximum Cut and Satisfiability problems using semidefinite
programming," Journal of ACM 42 (1995) 1115-1145.

[9] C. Helmberg and F. Rendl, "A spectral bundle method for semidefinite
programming," ZIB Preprint SC 97-37, Konrad-Zuse-Zentrum fuer Infor-
mationstechnik Berlin, Takustrasse 7, D-14195 Berlin, Germany, August
1997.

[10] N. Johnson and S. Kotz, Distributions in Statistics: Continuous Mul-


tivariate Distributions, John Wiley & Sons, 1972.

[11] L. Lovász and A. Schrijver, "Cones of matrices and set-functions, and
0-1 optimization," SIAM Journal on Optimization 1 (1990) 166-190.
[12] Yu. E. Nesterov, "Quality of semidefinite relaxation for nonconvex
quadratic optimization," CORE Discussion Paper, #9719, Belgium,
March 1997.

[13] Yu. E. Nesterov and A. S. Nemirovskii, Interior Point Polynomial Meth-


ods in Convex Programming : Theory and Algorithms, SIAM Publica-
tions, SIAM, Philadelphia, 1993.

[14] Yu. E. Nesterov, M. J. Todd, and Y. Ye, "Infeasible-start primal-dual
methods and infeasibility detectors for nonlinear programming problems,"
Technical Report No. 1156, School of Operations Research and
Industrial Engineering, Cornell University, Ithaca, NY 14853-3801,
1996, to appear in Mathematical Programming.

[15] W. F. Sheppard, "On the calculation of the double integral expressing


normal correlation," Transactions of the Cambridge Philosophical Soci-
ety 19 (1900) 23-66.

[16] Y. L. Tong, The Multivariate Normal Distribution, Springer-Verlag,


New York, 1990.
[17] Y. Ye, "Approximating quadratic programming with quadratic con-
straints," Working Paper, Department of Management Science, The Uni-
versity of Iowa, Iowa City, IA 52242, 1997, to appear in Mathematical
Programming.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 21-169
©1998 Kluwer Academic Publishers

A Review of Machine Scheduling:


Complexity, Algorithms and Approximability
Bo Chen
Warwick Business School
University of Warwick
Coventry CV4 7AL, U.K.
E-mail: B.Chen@warwick.ac.uk

Chris N. Potts
Faculty of Mathematical Studies
University of Southampton
Southampton SO17 1BJ, U.K.
E-mail: cnp@maths.soton.ac.uk

Gerhard J. Woeginger
Institut für Mathematik
Graz University of Technology
Steyrergasse 30
A-8010 Graz, Austria
E-mail: gwoegi@opt.math.tu-graz.ac.at

Contents

1 Introduction 24

2 Scheduling Models 26
  2.1 Machine Environment 27
  2.2 Job Characteristics 27
  2.3 Optimality Criteria 29
  2.4 Three-Field Representation 29

3 Methodology 32
  3.1 Computational Complexity 32
  3.2 Enumerative Algorithms 35
    3.2.1 Branch and Bound 35
    3.2.2 Dynamic Programming 36
  3.3 Local Search 36
    3.3.1 Neighborhood Search 36
    3.3.2 Genetic Algorithms 38
  3.4 Approximation Algorithms 38

4 Single Machine Problems 40
  4.1 Maximum Lateness 40
    4.1.1 Complexity 40
    4.1.2 Enumerative Algorithms 41
    4.1.3 Approximation 43
  4.2 Total Weighted Completion Time 44
    4.2.1 Complexity 44
    4.2.2 Enumerative Algorithms 45
    4.2.3 Approximation 47
  4.3 Total Weighted Tardiness 49
    4.3.1 Complexity 49
    4.3.2 Enumerative Algorithms 49
    4.3.3 Local Search 52
    4.3.4 Approximation 53
  4.4 Weighted Number of Late Jobs 54
    4.4.1 Complexity 54
    4.4.2 Enumerative Algorithms 56
    4.4.3 Approximation 57
  4.5 Total Weighted Earliness and Tardiness 57
    4.5.1 Complexity 57
    4.5.2 Enumerative Algorithms 60
    4.5.3 Local Search 61
    4.5.4 Approximation 62
  4.6 Other Criteria 62
    4.6.1 Single Criteria 62
    4.6.2 Multiple Criteria 63

5 Parallel Machine Problems: Unit-Length Jobs and Preemption 65
  5.1 Minmax Criteria 66
  5.2 Minsum Criteria 67
  5.3 Precedence Constraints 68
    5.3.1 Unit-Length Jobs 68
    5.3.2 General-Length Jobs 69
  5.4 On-Line Algorithms 70
    5.4.1 Clairvoyant Scheduling 70
    5.4.2 Non-Clairvoyant Scheduling 71

6 Parallel Machine Problems: No Preemption 72
  6.1 Minmax Criteria 72
    6.1.1 Complexity 72
    6.1.2 Enumerative Algorithms 73
    6.1.3 Local Search 73
    6.1.4 Approximation using List Scheduling 74
    6.1.5 Bin-Packing Based Approximation 75
    6.1.6 Approximation using Linear Programming 76
    6.1.7 Other Approaches for Approximation 76
  6.2 Minsum Criteria 77
    6.2.1 Complexity 77
    6.2.2 Enumerative Algorithms 78
    6.2.3 Approximation 79
  6.3 Precedence Constraints 80
  6.4 On-Line Algorithms 81
    6.4.1 Scheduling Over a List 81
    6.4.2 Scheduling Over Time 83
    6.4.3 Non-Clairvoyant Scheduling 85

7 Multi-Stage Problems 86
  7.1 The Open Shop 87
    7.1.1 Complexity 87
    7.1.2 Enumerative Algorithms 89
    7.1.3 Local Search 91
    7.1.4 Approximation: Ratio Guarantees 91
  7.2 The Flow Shop 92
    7.2.1 Complexity 92
    7.2.2 Enumerative Algorithms 94
    7.2.3 Local Search 97
    7.2.4 Approximation: Absolute Guarantees 98
    7.2.5 Approximation: Ratio Guarantees 99
  7.3 The Job Shop 101
    7.3.1 Complexity 101
    7.3.2 Enumerative Algorithms 103
    7.3.3 Local Search 105
    7.3.4 Approximation: Absolute Guarantees 107
    7.3.5 Approximation: Ratio Guarantees 108
  7.4 Other Multi-Stage Problems 108

8 Further Scheduling Models 110
  8.1 Family Scheduling 110
    8.1.1 Complexity 111
    8.1.2 Enumerative Algorithms 113
    8.1.3 Local Search 113
    8.1.4 Approximation 115
  8.2 Scheduling Multiprocessor Jobs 115
    8.2.1 Parallel Machines 116
    8.2.2 Dedicated Machines 117
  8.3 Scheduling with Communication Delays 118
    8.3.1 Complexity 119
    8.3.2 Approximation 120
  8.4 Resource Constrained Scheduling 121
    8.4.1 No Precedence Constraints 122
    8.4.2 Precedence Constraints 123
  8.5 Scheduling with Controllable Processing Times 124
    8.5.1 Continuously Controllable Processing Times 125
    8.5.2 Discretely Controllable Processing Times 127

9 Concluding Remarks 127

References 129

1 Introduction
The scheduling of computer and manufacturing systems has been the sub-
ject of extensive research for over forty years. In addition to computers
and manufacturing, scheduling theory can be applied to many areas includ-
ing agriculture, hospitals and transport. The main focus is on the efficient
allocation of one or more resources to activities over time. Adopting manu-
facturing terminology, a job consists of one or more activities, and a machine
is a resource that can perform at most one activity at a time. We concen-
trate on deterministic machine scheduling for which it is assumed that all
data that define a problem instance are known with certainty.
Much of the early work on scheduling was concerned with deriving rules
that find optimal schedules for some simple models. Examples include prob-
lems of scheduling a single machine to minimize the maximum lateness of the
jobs, for which an optimal solution is obtained by sequencing the jobs accord-
ing to the earliest due date (EDD) rule of Jackson [281], and of scheduling a
single machine to minimize the sum of weighted completion times of the jobs,
for which an optimal solution is obtained by sequencing the jobs according
to the shortest weighted processing time (SWPT) rule of Smith [495]. These
simple problems arise infrequently in practice. Nevertheless, the analysis of
simple models provides valuable insights that enable more complex systems
to be scheduled. Moreover, the performance of many production systems
is often dictated by the quality of the schedules for a bottleneck machine.
Again, simple models can be useful for scheduling such a machine.
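Both rules just mentioned reduce to a single sort; as a minimal Python
illustration (the array names are ours, for exposition only):

    def edd_sequence(jobs, d):
        """Jackson's earliest due date rule, minimizing maximum lateness."""
        return sorted(jobs, key=lambda j: d[j])

    def swpt_sequence(jobs, p, w):
        """Smith's shortest weighted processing time rule for sum w_j C_j."""
        return sorted(jobs, key=lambda j: p[j] / w[j])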
A significant research topic in scheduling is the use of complexity theory
to classify scheduling problems as polynomially solvable or NP-hard. For
details of the theory, we refer the reader to the books by Garey & John-
son [189] and Papadimitriou [413]. Many fundamental results in the area
of scheduling are derived by Lenstra, Rinnooy Kan & Brucker [349]. The
NP-hardness of a problem suggests that there are instances for which the
computation time required to find an optimal solution increases exponen-
tially with problem size. If large computation times for such problems are
unacceptable, then a heuristic (method) or an approximation algorithm is
used to give an approximate solution. There is clearly a trade-off between
the investment in computation time for obtaining a solution and the quality
of that solution.
To obtain exact solutions of NP-hard scheduling problems, an enumera-
tive algorithm is usually applied. The main types of enumerative algorithms
are branch and bound and dynamic programming, and both may benefit from
dominance rules which help to restrict the search. Moreover, the ability of
a branch and bound algorithm to generate an optimal solution at reason-
able computational expense usually depends on the quality of the bounding
scheme that is used to eliminate partial solutions. Dominance rules and
bounding schemes are usually based on problem-specific features.
The performance of heuristics is often evaluated empirically. However, it
is sometimes possible to carry out a theoretical analysis of heuristic perfor-
mance. Following the pioneering work of Graham [225, 226] on list schedul-
ing heuristics for parallel machines, research on obtaining performance guar-
antees through worst-case analysis has developed significantly. Generally,
heuristics with stronger guarantees are preferred.
For problems where enumerative algorithms are unable to solve problems
with more than a handful of jobs, and the solutions generated by simple
heuristic methods may be far from the optimum, local search methods are
extremely useful. These methods are described by Aarts & Lenstra [1], and
their application to scheduling problems is reviewed by Anderson, Glass &

Potts [23]. For many scheduling problems, local search methods including
simulated annealing, tabu search and genetic algorithms are very successful
in generating high-quality solutions.
This chapter surveys research on deterministic machine scheduling. Our
survey includes statements of complexity results, descriptions of enumera-
tive algorithms together with an indication of the size of instances that they
might reasonably be expected to solve, the main features of local search
methods including an indication of their ability to generate near-optimal
solutions, and details of performance guarantees for approximation algo-
rithms. As a complement to the present chapter, we refer the reader to an
excellent survey on machine scheduling by Lawler, Lenstra, Rinnooy Kan &
Shmoys [335], with additional material in the books by Tanaev, Gordon &
Shafransky [524] and by Tanaev, Sotskov & Strusevich [525].
The rest of this chapter is organized as follows. Section 2 provides a de-
scription of scheduling problems, and then presents a classical representation
scheme that is based on the physical environment and the performance crite-
rion for the problem. The various methodologies that are used in scheduling
research are described in Section 3. Sections 4, 5, 6 and 7 contain our
survey of results for classical problems involving a single machine, paral-
lel machines with preemption, parallels machine without preemption, and
multi-stage systems, respectively. In Section 8, we give results for a selection
of non-classical models. Finally, some concluding remarks are contained in
Section 9.

2 Scheduling Models
The machine scheduling problems that we consider can be described as fol-
lows. There are m machines that are used to process n jobs. A schedule
specifies, for each machine i (i = 1, ... , m) and each job j (j = 1, ... ,n), one
or more time intervals throughout which processing is performed on j by i.
A schedule is feasible if there is no overlapping of time intervals correspond-
ing to the same job (so that a job cannot be processed by two machines
at once), or of time intervals corresponding to the same machine (so that a
machine cannot process two jobs at the same time), and also if it satisfies
various requirements relating to the specific problem type. The problem
type is specified by the machine environment, the job characteristics and an
optimality criterion.

2.1 Machine Environment


Different configurations of machines are possible. An operation refers to a
specified period of processing by some machine type. We assume that all
machines become available to process jobs at time zero.
In a single-stage production system, each job requires one operation,
whereas in multi-stage systems the jobs require operations at different stages.
Single-stage systems involve either a single machine, or m machines operat-
ing in parallel. In the case of parallel machines, each machine has the same
function. We consider three cases: identical parallel machines in which each
processing time is independent of the machine performing the job; uniform
parallel machines in which the machines operate at different speeds but are
otherwise identical; and unrelated parallel machines in which the processing
time of a job depends on the machine assignment.
There are three main types of multi-stage systems. All such systems
that we consider comprise s stages, each having a different function. In a
flow shop with s stages, the processing of each job goes through the stages
1, ... ,s in that order. In an open shop, the processing of each job also goes
once through each stage, but the routing (that specifies the sequence of
stages through which a job must pass) can differ between jobs and forms
part of the decision process. In a job shop, each job has a prescribed routing
through the stages, and the routing may differ from job to job. There are also
multiprocessor variants of multi-stage systems, where each stage comprises
several (usually identical) parallel machines.

2.2 Job Characteristics


The processing requirements of each job j are given: for the case of a single
machine and identical parallel machines, Pj is the processing time; for uni-
form parallel machines, the processing time on machine i may be expressed
as Pj/Si, where Si is the speed of machine i; for the case of unrelated par-
allel machines, a flow shop and an open shop, Pij is the processing time on
machine/stage i; and for a job shop, Pij denotes the processing time of the
ith operation (which is not necessarily performed at stage i). We assume
that all Pj and Pij are non-negative integers. Given an instance, denote by
Pmax the maximum value of all Pj or all Pij.
In addition to its processing requirements, a job is characterized by its
availability for processing, any dependence on other jobs, and whether in-
terruptions in the processing of its operations are allowed. The availability

of each job j may be restricted by its integer release date $r_j$ that defines
when it becomes available for processing, and/or by its integer deadline $\bar{d}_j$
that specifies the time by which it must be completed.
Job dependence arises when there are precedence constraints on the jobs.
If job j has precedence over job k, then k cannot start its processing until
j is completed. Precedence constraints are usually specified by a directed
acyclic precedence graph G with vertices 1, ... , n. There is a directed path
from vertex j to vertex k if and only if job j has precedence over job k.
Some scheduling models allow preemption: the processing of any oper-
ation may be interrupted and resumed at a later time on the same or on a
different machine.
In classical scheduling models, it is assumed that the scheduler has full
information of the problem instance, such as total number of jobs to be
scheduled, their release dates and their processing times, before the process
of scheduling actually starts. Such models assume that scheduling is off-line.
By contrast, if information about the problem instance is made available to
the scheduler job by job during the course of scheduling, then the model
requires scheduling to be on-line. In these models, we assume that, unless
stated otherwise, the scheduler's decision to assign and schedule a job or
operation is irrevocable. According to the way information on job charac-
teristics is released to the scheduler, we classify on-line scheduling problems
into the following three basic categories.

• In the model of scheduling over list, the scheduler is confronted with


the jobs one-by-one as they appear in a list. The existence of a job
is not known until all its predecessors in the list have already been
scheduled.

• In the model of scheduling over time, all jobs arrive at their release
dates. The jobs are scheduled with the passage of time and, at any
point of time, the scheduler only has knowledge of those jobs that have
already arrived.

In the above two on-line paradigms, we assume that once a job is known to
the scheduler, its processing requirement is also known. Therefore, we call
these two basic paradigms clairvoyant.

• The third basic on-line paradigm is the non-clairvoyant scheduling


model. In this model, the processing requirement of a job is unknown
until its processing is completed.

In scheduling over time, restarts are sometimes allowed in which the pro-
cessing of a job is interrupted to allow another job to start, and at some
later time the interrupted job is started again from scratch. In nearly on-
line scheduling over time, the release date of next job is always known to
the scheduler. This small amount of additional information makes certain
on-line scheduling problems much easier to tackle. A good source of infor-
mation on all kinds of on-line scheduling models is the survey article by
Sgall [481].

2.3 Optimality Criteria


For each job $j$, an integer due date $d_j$ and a positive integer weight $w_j$
may be specified. Given a schedule $\sigma$, we can compute for job $j$: the
completion time $C_j(\sigma)$; the flow time $F_j(\sigma) = C_j(\sigma) - r_j$; the lateness
$L_j(\sigma) = C_j(\sigma) - d_j$; the earliness $E_j(\sigma) = \max\{d_j - C_j(\sigma), 0\}$; the tardiness
$T_j(\sigma) = \max\{C_j(\sigma) - d_j, 0\}$; and the unit penalty $U_j(\sigma) = 1$ if $C_j(\sigma) > d_j$,
and $U_j(\sigma) = 0$ otherwise. Moreover, if $f_j$ is a regular objective function, i.e.,
a non-decreasing cost function, then the cost of job $j$ is $f_j(\sigma) = f_j(C_j(\sigma))$.
If there is no ambiguity about the schedule under consideration, we write
$C_j$, $F_j$, $L_j$, $E_j$, $T_j$, $U_j$, and $f_j$, respectively.
Some commonly used optimality criteria involve the minimization of:
the maximum completion time, or makespan, $C_{\max} = \max_j C_j$; the maximum
lateness $L_{\max} = \max_j L_j$; the maximum cost $f_{\max} = \max_j f_j$; the
maximum earliness $E_{\max} = \max_j E_j$; the total (weighted) completion time
$\sum_j (w_j) C_j$; the total (weighted) flow time $\sum_j (w_j) F_j$; the total (weighted)
earliness $\sum_j (w_j) E_j$; the total (weighted) tardiness $\sum_j (w_j) T_j$; the (weighted)
number of late jobs $\sum_j (w_j) U_j$; or the total cost $\sum f_j$, where each maximization
and each summation is taken over all jobs $j$. Also, some situations
require more than one of these criteria to be considered.
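As a small illustration, the per-job quantities above can be computed from
a single machine sequence as follows (a Python sketch with illustrative
names, starting each job as early as possible):

    def job_statistics(sigma, p, r, d):
        """Per-job quantities for a single machine sequence sigma (a job list)."""
        stats, t = {}, 0
        for j in sigma:
            t = max(t, r[j]) + p[j]                 # completion time C_j
            stats[j] = {"C": t,
                        "F": t - r[j],              # flow time
                        "L": t - d[j],              # lateness
                        "E": max(d[j] - t, 0),      # earliness
                        "T": max(t - d[j], 0),      # tardiness
                        "U": 1 if t > d[j] else 0}  # unit penalty
        return stats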

2.4 Three-Field Representation

It is convenient to adopt the representation scheme of Graham, Lawler,
Lenstra & Rinnooy Kan [228]. This is a three-field descriptor $\alpha|\beta|\gamma$, which
indicates problem type: $\alpha$ represents the machine environment, $\beta$ defines
the job characteristics, and $\gamma$ is the optimality criterion.
Let $\circ$ denote the empty symbol. The first field takes the form $\alpha =
\alpha_1\alpha_2\alpha_3$, where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are interpreted as follows.

• $\alpha_1 \in \{\circ, P, Q, R, F, O, J\}$:

  - $\alpha_1 = \circ$: a single machine;
  - $\alpha_1 = P$: identical parallel machines;
  - $\alpha_1 = Q$: uniform parallel machines;
  - $\alpha_1 = R$: unrelated parallel machines;
  - $\alpha_1 = O$: an open shop;
  - $\alpha_1 = F$: a flow shop;
  - $\alpha_1 = J$: a job shop.

• $\alpha_2 \in \{\circ, m, s\}$:

  - $\alpha_2 = \circ$: the number of machines/stages is arbitrary;
  - $\alpha_2 = m$: there is a fixed number $m$ of machines;
  - $\alpha_2 = s$: there is a fixed number $s$ of stages.

• $\alpha_3 \in \{\circ, (Pm), (Pm_1, \dots, Pm_s), (P)\}$:

  - $\alpha_3 = \circ$: a single stage, or several stages each with a single machine;
  - $\alpha_3 = (Pm)$: multi-stage with $m$ identical parallel machines at each
    stage;
  - $\alpha_3 = (Pm_1, \dots, Pm_s)$: multi-stage with $m_k$ identical parallel machines
    at stage $k$;
  - $\alpha_3 = (P)$: multi-stage with an arbitrary number of identical parallel
    machines at each stage.

We note that for a single machine problem $\alpha_1 = \circ$, $\alpha_2 = 1$ and $\alpha_3 = \circ$,
whereas $\alpha_1 \ne \circ$ and $\alpha_2 \ne 1$ for other problem types. Further, $\alpha_3 = \circ$ if
$\alpha_1 \in \{\circ, P, Q, R\}$.
The second field $\beta \subseteq \{\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6\}$ indicates job characteristics
as follows.

• $\beta_1 \in \{\circ$, on-line-list, on-line, on-line-list-nclv, on-line-nclv$\}$:

  - $\beta_1 = \circ$: off-line scheduling;
  - $\beta_1$ = on-line-list: on-line scheduling over list;
  - $\beta_1$ = on-line: together with $\beta_2 = r_j$, it indicates on-line scheduling
    over time;
  - $\beta_1$ = on-line-list-nclv: on-line non-clairvoyant scheduling over list;
  - $\beta_1$ = on-line-nclv: together with $\beta_2 = r_j$, it indicates on-line non-clairvoyant
    scheduling over time.

• $\beta_2 \in \{\circ, r_j\}$:

  - $\beta_2 = \circ$: no release dates are specified;
  - $\beta_2 = r_j$: jobs have release dates.

• $\beta_3 \in \{\circ, \bar{d}_j\}$:

  - $\beta_3 = \circ$: no deadlines are specified;
  - $\beta_3 = \bar{d}_j$: jobs have deadlines.

• $\beta_4 \in \{\circ, pmtn\}$:

  - $\beta_4 = \circ$: no preemption is allowed;
  - $\beta_4 = pmtn$: operations of jobs may be preempted.

• $\beta_5 \in \{\circ$, intree, outtree, tree, chain, prec$\}$:

  - $\beta_5 = \circ$: no precedence constraints are specified;
  - $\beta_5$ = chain: precedence constraints on jobs are defined where each
    vertex has outdegree and indegree at most one;
  - $\beta_5$ = intree: precedence constraints on jobs are defined by a rooted
    tree which has indegree at most one for each vertex;
  - $\beta_5$ = outtree: precedence constraints on jobs are defined by a rooted
    tree which has outdegree at most one for each vertex;
  - $\beta_5$ = tree: precedence constraints on jobs are defined by a rooted
    intree or outtree;
  - $\beta_5$ = prec: jobs have arbitrary precedence constraints.

• $\beta_6 \in \{\circ, p_j = 1, p_{ij} = 1\}$:

  - $\beta_6 = \circ$: processing times are arbitrary;
  - $\beta_6 = p_j = 1$: all jobs in a single-stage system have unit processing
    times;
  - $\beta_6 = p_{ij} = 1$: all operations in a multi-stage system have unit
    processing times.

Lastly, the third field defines the optimality criterion, which involves the
minimization of
$$
\gamma \in \{C_{\max}, L_{\max}, E_{\max}, T_{\max}, f_{\max}, \textstyle\sum(w_j)C_j,
\sum(w_j)F_j, \sum(w_j)E_j, \sum(w_j)T_j, \sum(w_j)U_j, \sum f_j\}.
$$

Furthermore, as indicated in Section 2.3, it is sometimes appropriate to
consider several of these criteria.
To illustrate the three-field descriptor, we present four examples.
$1|r_j, prec|\sum w_jC_j$ is the problem of scheduling jobs with release dates
and precedence constraints on a single machine to minimize the total
weighted completion time.
$R|pmtn|L_{\max}$ is the problem of preemptively scheduling jobs on an arbitrary
number of unrelated parallel machines to minimize the maximum
lateness.
$O3|p_{ij} = 1|\sum U_j$ is the problem of scheduling jobs in a three-machine
open shop to minimize the number of late jobs, where the processing time
of each operation is one unit.
$F2(P4, P3)||C_{\max}$ is the problem of scheduling jobs in a two-stage flow
shop to minimize the makespan, where the two stages comprise four and
three identical parallel machines, respectively.

3 Methodology
In this section, we outline the methods and techniques that are used to ana-
lyze and solve scheduling problems. A scheduling problem is a special type
of combinatorial optimization problem. Thus, we can use the methodology
that is used for combinatorial optimization. For example, the main tools
for providing negative results (non-existence of fast algorithms for a specific
problem) come from computational complexity theory. Also, the main tools
for providing positive results are enumerative algorithms for finding exact
solutions, local search methods for finding approximate solutions, and poly-
nomial time approximation algorithms with guarantees on their worst-case
performance.
The NP-hardness of an optimization problem suggests that it is not
always possible to find an optimal solution quickly. Therefore, instead of
searching for an optimal solution with enormous computational effort, we
may instead use a local search method or an approximation algorithm to
generate approximate solutions that are close to the optimum with consid-
erably less investment in computational resources.

3.1 Computational Complexity


Practical experience has shown that some scheduling problems are easier to
solve than others. For example, computers of today can solve instances of

problem $1||\sum w_jC_j$ with several thousands of jobs within seconds, whereas
it takes at least several hours to solve some even moderately sized instances
of problem $J||C_{\max}$ with, for example, 30 jobs and 30 machines. Computa-
tional complexity theory provides a mathematical framework that is able to
explain these 'observations from practice' and that yields a classification of
problems into easy and hard ones. In this section, we briefly sketch some of
the main points of this theory. For more information, the reader is referred
to the books by Garey & Johnson [189] and Papadimitriou [413].
A computational problem can be viewed as a function f that maps every
input x to an output f(x). In complexity theory, we usually deal with
so-called decision problems, where the output f(x) can only take the two
values of YES and NO. Since by means of binary search one can represent
every optimization problem as a 'short' sequence of decision problems, this
restriction is not really essential. We distinguish between two basic schemes
of encoding the numbers in an input. One scheme is the unary encoding,
where any integer k is encoded by k bits (e.g., 6 == 111111). Another scheme
is the standard binary encoding that is used in every standard computer
(e.g., 6 == 110). Any other encoding scheme uses roughly (i.e., up to a
constant factor) the same bits as the binary encoding scheme. The size $|x|$
of an input $x$ is the overall number of bits used for encoding $x$ under some
given encoding scheme.
An algorithm is a step-by-step procedure for solving a computational
problem. For a given input x, it generates the correct output f(x) after a
finite number of steps. The time complexity of an algorithm expresses its
worst-case time requirements, i.e., the total number of elementary opera-
tions, such as additions, multiplications and comparisons, for each possible
problem instance as a function of the size of the instance. An algorithm is
said to be polynomial or a polynomial (time) algorithm if its time complexity
is bounded by a polynomial in input size.
In complexity theory, an algorithm is considered to be 'good' if it is
polynomial; otherwise, it is considered to be 'bad'. Similarly, a computa-
tional problem is considered easy if it possesses a polynomial algorithm, in
which case, it is said to be polynomially solvable. Problem $1||\sum w_jC_j$ that
is mentioned in the first paragraph of this section is polynomially solvable,
and thus easy. The class P is the set of all polynomially solvable problems.
When an optimization problem is formulated as a decision problem, one
can usually certify those instances x for which f(x)=YES by a short cer-
tificate. For example, any YES-instance of "is there a feasible schedule for
which all jobs are completed by their due dates" has as a short certificate a

corresponding schedule in which all jobs are on time. Given the certificate,
a YES-answer can be verified in polynomial time. The class NP contains
all problems that fulfill the following two conditions.

(i) For every YES-instance x there exists a certificate y such that |y| is
polynomially bounded in |x|.
(ii) There exists a polynomial algorithm that verifies whether a given y is
a valid certificate for a given instance x.

Even though it is not known whether P=NP, there is a general belief


that this is not the case. In fact, answering this question is one of the hardest
and most important open problems in computer science.
The most difficult problems in NP are the NP-complete problems. Un-
less P=NP, which is considered unlikely, an NP-complete problem does
not possess a polynomial algorithm. Intuitively speaking, a problem X in
NP is NP-complete if any other problem in NP can be solved in polyno-
mial time by an algorithm that makes a polynomial number of calls to a
subroutine that solves problem X. Note that this implies that, if an NP-
complete problem allowed a polynomial algorithm, then all problems in NP
would allow polynomial algorithms. Researchers on complexity theory have
identified a huge body of NP-complete problems. The problem $J||C_{\max}$
that is mentioned in the first paragraph of this section is NP-complete, and
thus is difficult.
Note that the concepts of polynomial solvability and NP-completeness
crucially depend on the encoding scheme used. If one changes the encod-
ing scheme from binary to unary, the problem may become easier, as the
input becomes longer and hence the restrictions on the running time of a
polynomial algorithm are less stringent. A problem that is NP-complete
under the unary encoding scheme is called strongly NP-complete. An opti-
mization problem is (strongly) NP-hard if its decision version is (strongly)
NP-complete. When we state that a problem is NP-hard, it should be as-
sumed that the problem is not known to be strongly NP-hard. A problem
that can be solved in polynomial time under the unary encoding scheme
is said to be pseudo-polynomially solvable. The corresponding algorithm is
called a pseudo-polynomial (time) algorithm.

3.2 Enumerative Algorithms

3.2.1 Branch and Bound


The process of solving a problem using a branch and bound algorithm can
be conveniently represented by a search tree. Each node of the search tree
corresponds to a subset of feasible solutions to a problem. A branching rule
specifies how the feasible solutions at a node are partitioned into subsets,
each corresponding to a descendant node of the search tree. For many sin-
gle machine scheduling problems, a solution is represented by a sequence of
jobs. For such problems, commonly used branching rules involve forward
sequencing and backward sequencing. In a forward sequencing branching
rule, each node corresponds to an initial partial sequence in which the jobs
in the first positions are fixed. The first branching creates nodes correspond-
ing to every job that can appear in the first position in the sequence, the
second branching creates nodes that fix the job in the second position of the
sequence, and so on. A backward sequencing branching rule is similar except
that the sequence is built from the end: each node corresponds to a final
partial sequence in which the jobs in the last positions are fixed.
The scheduling problems that we consider require an objective function
to be minimized. A lower bounding scheme associates a lower bound with
each node of the search tree. The idea is to eliminate any node for which
the lower bound is greater than or equal to the value of the best known
feasible solution, since any further exploration of such a node cannot lead
to an improved solution. Lower bounds are usually obtained by solving a
relaxed problem. For example, lower bounds for non-preemptive scheduling
problems can be obtained by solving the corresponding preemptive version of
the problem. One widely used general-purpose technique for obtaining lower
bounds is Lagrangean relaxation. Having obtained a suitable formulation of
the problem, Lagrangean relaxation removes some of the constraints and
incorporates them into the objective function using Lagrange multipliers.
An iterative method known as subgradient optimization can be used to find
values of the multipliers that yield the best bounds, although a construc-
tive method for finding multiplier values that is based on problem-specific
features sometimes provides good quality bounds at low computational ef-
fort. Solving the linear programming relaxation of an integer programming
formulation offers an alternative method of computing lower bounds. The
tightness of the resulting lower bounds is often improved through a polyhe-
dral approach that provides additional valid inequalities.

Another noteworthy feature of branch and bound algorithms is the use of


dominance rules that attempt to eliminate nodes prior to the computation
of lower bounds. Also, a heuristic method is often used at certain nodes
(especially the root node) to generate feasible solutions, the values of which
provide upper bounds. A search strategy defines the order in which the nodes
of the tree are selected for branching. One commonly used strategy is to
branch from a node with the smallest lower bound. However, it is easier
to implement a depth-first search plus backtracking strategy that branches
from a node in the most recently created subset.
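As an illustration of how these ingredients fit together, the following Python
skeleton combines the forward sequencing branching rule, lower-bound
elimination, and depth-first search plus backtracking; cost_of_sequence and
lower_bound are placeholders for problem-specific components, not taken
from the survey.

    def branch_and_bound(n, cost_of_sequence, lower_bound):
        """Minimize cost over all sequences of jobs 0..n-1."""
        best = {"seq": None, "val": float("inf")}

        def explore(prefix, remaining):
            if not remaining:                     # leaf: a complete sequence
                val = cost_of_sequence(prefix)
                if val < best["val"]:
                    best["seq"], best["val"] = prefix[:], val
                return
            for j in sorted(remaining):           # branch: fix the next position
                prefix.append(j)
                remaining.remove(j)
                # eliminate the node unless its bound beats the incumbent
                if lower_bound(prefix, remaining) < best["val"]:
                    explore(prefix, remaining)
                remaining.add(j)
                prefix.pop()

        explore([], set(range(n)))
        return best["seq"], best["val"]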

3.2.2 Dynamic Programming


A dynamic programming algorithm has the property that each partial solu-
tion is represented by a state to which a value (or cost) is assigned. When
two or more partial solutions achieve the same state, one with the lowest
value is retained, while the others are eliminated. It is necessary to define
the states in such a way that this type of elimination is valid. For example,
consider the single machine sequencing problem 111 E f; in which partial
solutions are initial partial sequences. A state can be represented by the
jobs that are sequenced and the time that the machine becomes available to
process the unsequenced jobs.
A decision transforms a partial solution from one state to another. All
possible decisions are considered. The elimination of partial solutions is
usually implemented by means of a recursion, which compares the value of
all possible ways of entering a state and chooses the one which has minimum
value. Applying the recursion determines the optimal solution value, and
backtracking is then used to find the corresponding decisions that define the
solution.
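A Python sketch of this dynamic program for $1||\sum f_j$, with states given by
the set of sequenced jobs (function and argument names are illustrative):

    from itertools import combinations

    def dp_single_machine(p, cost_fns):
        """p[j]: processing times; cost_fns[j](C): regular cost of finishing j at C."""
        n = len(p)
        best = {frozenset(): 0.0}
        for size in range(1, n + 1):
            for subset in combinations(range(n), size):
                S = frozenset(subset)
                t = sum(p[j] for j in S)   # time the machine becomes available
                # try every job j that could be sequenced last within S
                best[S] = min(best[S - {j}] + cost_fns[j](t) for j in S)
        return best[frozenset(range(n))]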

3.3 Local Search


3.3.1 Neighborhood Search
In our discussion of local search heuristics, we sometimes mention some
constructive heuristics which are useful for obtaining a starting solution in
neighborhood search. Such a heuristic typically generates a single solution,
and its running time is typically a low-order polynomial.
In neighborhood search, a current solution is transformed into a new
solution according to some neighborhood structure. An acceptance rule

decides whether the move from the current solution to the transformed so-
lution should be accepted, although the decision is sometimes delayed until
the complete neighborhood (or a subset of it) is explored. If a move is
accepted, then the transformed solution replaces the previous solution and
becomes the current solution; otherwise, the move is rejected and the cur-
rent solution is retained. This process is repeated until some termination
criterion is satisfied. The acceptance rule is usually based on the objec-
tive function values of the current solution and its neighbor. Examples of
neighborhoods in single machine sequencing are the transpose neighborhood
in which two jobs occupying adjacent positions in the sequence are inter-
changed, the swap neighborhood in which any two jobs in the sequence are
interchanged, and the insert neighborhood in which one job is removed from
its current position in the sequence and inserted elsewhere.
The simplest type of neighborhood search method is descent, which is
sometimes known as iterative local improvement. In this method, only
moves that result in an improvement in the objective function value are
accepted. Under a first improve search, the first move that improves the ob-
jective function value is accepted. On the other hand, best improve selects
a move that yields the best objective function value among all neighbors.
When no further improvement can be achieved, a descent method terminates
with a solution that is a local optimum. The local optimum is not necessarily
the true global optimum. A widely used remedy for this drawback is to use
multi-start descent in which multiple runs of descent from different starting
solutions are performed, and the best overall solution is selected. Simulated
annealing and tabu search are other neighborhood search methods which
allow the search to escape from local optima.
In simulated annealing, a probabilistic acceptance rule is used. More
precisely, any move that results in an improvement in the objective function
value, or leaves the value unchanged, is accepted. On the other hand, a move
that increases the objective function value by $\delta$ is accepted with probability
$\exp(-\delta/t)$, where $t$ is a parameter known as the temperature. The value of
$t$ changes during the course of the search; typically $t$ starts at a relatively
high value and then gradually decreases. Simulated annealing has a variant
called threshold accepting in which a move is accepted if and only if $\delta \le T$,
where $T$ is a parameter that plays the same role as temperature in simulated
annealing.
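A toy Python sketch of this acceptance rule for single machine sequencing
under the swap neighborhood; the geometric cooling schedule is one common
choice, not prescribed by the survey.

    import math, random

    def simulated_annealing(seq, objective, t0=10.0, cooling=0.995, iters=10000):
        current, cur_val = list(seq), objective(seq)
        t = t0
        for _ in range(iters):
            i, j = random.sample(range(len(current)), 2)
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]  # swap move
            delta = objective(neighbor) - cur_val
            # accept improving or equal moves; accept a worsening move
            # with probability exp(-delta / t)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current, cur_val = neighbor, cur_val + delta
            t *= cooling   # the temperature gradually decreases
        return current, cur_val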
In contrast to simulated annealing, tabu search is a deterministic neigh-
borhood search method. A move is made to the best solution in the neigh-
borhood of the current solution (even if this increases the objective function

value), subject to certain forbidden moves that are defined by a tabu list.
The tabu list stores attributes of the previous few moves, and moves that
reverse these attributes are not allowed (they are tabu). The tabu list is pri-
marily aimed at preventing the method from cycling, although it also serves
the function of driving the search into unexplored regions of the solution
space. To prevent the occasional loss of a high quality solution, a tabu move
is sometimes allowed if it satisfies an aspiration condition.

3.3.2 Genetic Algorithms


In contrast to the single current solution in neighborhood search, a genetic
algorithm works with a population of solutions, where each solution is rep-
resented as a string. A mating pool of solutions is formed from the current
population, where some solutions may not be selected, and others may ap-
pear several times. Solutions are chosen for the mating pool according to
fitness values, where better quality solutions are assigned a higher fitness. A
new population is formed by applying two genetic operators. First, pairs of
solutions in the mating pool undergo crossover: sections of the strings for
the two solutions are interchanged, thereby giving two new solutions. The
aim is that one of the new solutions will inherit the desirable features of both
parents. Second, mutation changes some elements of the string with the aim
of maintaining diversity in the new population. The process is repeated until
some termination condition is satisfied.
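As a small illustration for sequencing problems, one standard choice of
genetic operators on permutations is order crossover plus a swap mutation;
the survey does not prescribe particular operators, so this is only a sketch.

    import random

    def order_crossover(parent1, parent2):
        """Copy a slice from parent1, fill the rest in the order of parent2."""
        n = len(parent1)
        i, j = sorted(random.sample(range(n), 2))
        child = [None] * n
        child[i:j] = parent1[i:j]
        rest = [g for g in parent2 if g not in child[i:j]]
        child[:i], child[j:] = rest[:i], rest[i:]
        return child

    def mutate(seq, prob=0.1):
        """Swap two random positions with small probability, to keep diversity."""
        seq = seq[:]
        if random.random() < prob:
            a, b = random.sample(range(len(seq)), 2)
            seq[a], seq[b] = seq[b], seq[a]
        return seq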

3.4 Approximation Algorithms


Approximation algorithms are used to generate approximate solutions with
modest computational effort. For a scheduling problem with the objec-
tive of minimizing a cost function F(·) ≥ 0, an algorithm H is called a
ρ-approximation algorithm (ρ ≥ 1) if F(H(I)) ≤ ρF(S*(I)) for all problem
instances I, where H(I) and S*(I) denote the schedule found by algorithm
H and an optimal schedule, respectively. We refer to ρ as a ratio guarantee
for algorithm H since it provides a bound on the performance under
any circumstances. If ρ is the smallest such value, then it is called
the worst-case (performance) ratio of algorithm H. An approximation algorithm
is often called a heuristic. A family of algorithms {H_ε} for a problem
is called a polynomial (time) approximation scheme (PTAS, for short) if, for
every ε > 0, H_ε is a (1 + ε)-approximation algorithm whose running time
is polynomial in the input size. Furthermore, if the running time of every
H_ε is bounded by a polynomial in the input size and 1/ε, then the family
is called a fully polynomial (time) approximation scheme (FPTAS). If a
scheduling problem is strongly NP-hard, then it allows neither an FPTAS
nor a pseudo-polynomial algorithm unless P=NP (see, for example, Garey
& Johnson [189]).
There are other methods than the aforementioned worst-case analysis
for evaluating the performance of an approximation algorithm, which in-
clude empirical analysis and probabilistic analysis. Empirical analysis of an
algorithm basically involves running the algorithm on a large number of test
problem instances, preferably with known or estimated optimal solutions.
The instances should have a variety of different characteristics including
those that are likely to occur in practice. The performance of the algo-
rithm is then evaluated statistically by comparing the solution values that
are generated with the optimal or estimated values. Since such an evalua-
tion depends on the characteristics of the generated sample instances, this
approach is mainly used for comparing the empirical performance of alter-
native algorithms. Probabilistic analysis of an algorithm can be regarded
as the analytical counterpart of empirical analysis. The main goal of the
analysis is to provide a probabilistic characterization for the average-case
performance of the heuristics, under some assumptions on the probability
distribution of the problem parameters.
In this chapter, we are mainly concerned with worst-case analysis of al-
gorithms. We distinguish between on-line and off-line performance measures
by using different terminology. If in the above definition for the worst-case
performance, H is an on-line ρ-approximation algorithm, then we say that
it is a ρ-competitive algorithm or H is ρ-competitive. The worst-case ratio
of an on-line algorithm is referred to as its competitive ratio. Note that the
competitiveness of an on-line algorithm is evaluated with respect to an off-line
optimal algorithm, and hence it indicates the loss associated with not
having complete information of the problem instances. In our review, both
ρ-approximation algorithms and on-line algorithms are assumed to be polynomial,
unless otherwise stated. When giving statements about the running
time of a PTAS or an FPTAS, we refer to the time requirement to obtain
a ratio guarantee of 1 + ε, for any ε > 0. If not explicitly stated otherwise,
the running times of approximation algorithms are low-order polynomials.

4 Single Machine Problems


4.1 Maximum Lateness
4.1.1 Complexity
For problem 1||L_max, Jackson [281] shows that an optimal solution is obtained
in O(n log n) time by sequencing jobs in non-decreasing order of their
due dates. This method is known as the earliest due date or EDD rule. To
justify the use of the EDD rule for this problem, we use the following job
reinsertion argument. Consider any optimal sequence σ. Suppose that some
job k is sequenced before another job j, where d_k > d_j. Consider the transformed
sequence σ′ in which job k is removed from its original position and
inserted immediately after job j. All jobs sequenced before k and after j in
σ have the same completion time in both sequences, and job j together with
all jobs sequenced between k and j in σ are completed p_k units earlier in σ′.
Moreover, L_k(σ′) = C_k(σ′) − d_k = C_j(σ) − d_k < C_j(σ) − d_j = L_j(σ). Thus,
L_max(σ′) ≤ L_max(σ), which implies that σ′ is also an optimal sequence. Repetition
of this job reinsertion argument yields an optimal sequence in which
jobs appear in EDD order. Note that there may be optimal sequences in
which jobs are not sequenced in non-decreasing order of their due dates.
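A minimal Python sketch of the EDD rule follows (our illustration; the representation of a job as a (p_j, d_j) pair is an assumption):

    def edd_max_lateness(jobs):
        # jobs: list of (p_j, d_j) pairs; sequence in non-decreasing
        # due-date order and return the maximum lateness L_max.
        t, l_max = 0, float('-inf')
        for p, d in sorted(jobs, key=lambda job: job[1]):
            t += p                      # completion time C_j
            l_max = max(l_max, t - d)   # lateness L_j = C_j - d_j
        return l_max

    # Example: jobs with (p_j, d_j) = (2, 3), (1, 2), (3, 6) give L_max = 0.
    print(edd_max_lateness([(2, 3), (1, 2), (3, 6)]))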
In the presence of deadlines, the problem remains polynomially solvable.
For this problem, denoted 1|d̄_j|L_max, a bisection search over all possible values of L_max
is used. If L is a trial value of L_max, then each job j is assigned a deadline
min{d̄_j, d_j + L}. To test whether this value of L can be achieved, the jobs
are sequenced in EDD order with respect to the assigned deadlines.
Polynomial solvability is also achievable when there are precedence constraints,
since Lawler [321] proposes an O(n²) algorithm for the more general
problem 1|prec|f_max. The key observation, which is proved using a job reinsertion
argument of the type described above, is that some job j, which
has no successors and is chosen so that f_j(∑_{k=1}^{n} p_k) is as small as possible,
appears in the last position of some optimal sequence. Repeated application
of this result provides an optimal sequence. Baker, Lawler, Lenstra
& Rinnooy Kan [32] and Gordon & Tanaev [218] independently generalize
Lawler's approach to problem 1|r_j, pmtn, prec|f_max. Both algorithms also
require O(n²) time.
Release dates add a substantial degree of difficulty, since problem
1|r_j|L_max is shown by Lenstra, Rinnooy Kan & Brucker [349] to be strongly
NP-hard. However, an optimal solution for problem 1|r_j, pmtn|L_max is constructed
in O(n log n) time using a generalized EDD algorithm of Horn [273].
Starting at the smallest release date, this algorithm always schedules an
available job with the smallest due date. A preemption occurs at a release
date if the newly released job has a smaller due date than that of the job
currently being processed. The resulting schedule has at most n − 1 preemptions.
Also, problem 1|r_j, p_j=1|L_max is solvable in O(n) time by an algorithm
of Frederickson [175].
Simons [493] develops an O(n³ log n) algorithm for the feasibility problem
1|r_j, d̄_j, p_j=p|−, where p is an arbitrary positive integer. Using a bisection
search over all possible values of L_max, she uses this feasibility algorithm
to develop a polynomial time algorithm for problem 1|r_j, p_j=p, d̄_j|L_max.
Using an improved implementation, Garey, Johnson, Simons & Tarjan [191]
reduce the time complexity of the feasibility algorithm to O(n log n).
An observation of Lageweg, Lenstra & Rinnooy Kan [316] sometimes
allows precedence constraints to be handled by simply adjusting release and
due dates and then applying the appropriate algorithm for the problem
without precedence constraints. If job j is required to precede job k, then the
release date of job k can be reset to max{r_k, r_j + p_j}, and the due date of job j
can be reset to min{d_j, d_k − p_k}. Such a resetting requires O(h) time, where h
is the number of precedences. Thus, an O(n log n + h) algorithm is obtained
for problem 1|r_j, pmtn, prec|L_max. Further, Monma [381] uses due date
resetting to obtain an O(n + h) algorithm for problem 1|prec, p_j=1|L_max.
Problem 1|r_j|L_max is of special importance since it is a subproblem of
some multi-stage problems which have makespan or maximum lateness as
the optimality criterion. An important structural feature, as observed by
Lageweg, Lenstra & Rinnooy Kan [316], is that of reversibility. If the due
date of each job j is replaced by a delivery time q_j, where q_j = −d_j, and
it is required to minimize max_j{C_j + q_j}, then an equivalent problem results.
Release dates and delivery times play a symmetric role, and can be
interchanged for each job to yield an equivalent problem.

4.1.2 Enumerative Algorithms

A key feature of branch and bound, and approximation algorithms for problem
1|r_j|L_max is the O(n log n) heuristic of Schrage [466] which schedules jobs
using the generalized EDD rule. Specifically, this heuristic selects, from the
available jobs, one with the smallest due date to schedule next. Suppose that
the jobs are reindexed so that the heuristic yields the sequence (1, ..., n).
Then the maximum lateness is given by

    L_max = r_i + ∑_{j=i}^{l} p_j − d_l,

where l is the critical end job, jobs (i, ..., l) form a block, and i ≤ l. The
heuristic ensures that r_i ≤ r_j for j = i, ..., n. If job l has the largest due
date among jobs i, ..., l, then the heuristic provides an optimal solution.
Otherwise, we can choose an interference job k, which is the last job before
l in the schedule such that d_k > d_l. None of the jobs k+1, ..., l is available
when job k starts processing. Also,

    LB(S) = min_{j∈A} r_j + ∑_{j∈S} p_j − max_{j∈B} d_j,

where A ⊆ S and B ⊆ S are the sets of jobs that are candidates to be
sequenced before and after the other jobs in S, respectively, is a lower bound
for any subset S of jobs. It is easily verified by considering S = {k, ..., l}
that if some jobs in {k+1, ..., l} are sequenced before and after job k, then
LB(S) exceeds the maximum lateness of the heuristic. Thus, job k is
either sequenced before or after all jobs in {k+1, ..., l} in each optimal
schedule.
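A Python sketch of the generalized EDD (Schrage) heuristic follows; the (r_j, p_j, d_j) triple representation is an assumption, and a heap is used for clarity rather than the most efficient possible implementation.

    import heapq

    def schrage(jobs):
        # jobs: list of (r_j, p_j, d_j) triples.  Whenever the machine is
        # free, start the available job with the smallest due date; if no
        # job is available, idle until the next release date.
        pending = sorted(jobs)           # non-decreasing release dates
        available = []                   # heap keyed on due date
        t, l_max, i = 0, float('-inf'), 0
        sequence = []
        while i < len(pending) or available:
            while i < len(pending) and pending[i][0] <= t:
                r, p, d = pending[i]
                heapq.heappush(available, (d, p, r))
                i += 1
            if not available:
                t = pending[i][0]        # machine idles to the next release
                continue
            d, p, r = heapq.heappop(available)
            t += p
            l_max = max(l_max, t - d)    # lateness of the job just finished
            sequence.append((r, p, d))
        return sequence, l_max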
For problem 1|r_j|L_max, various branch and bound algorithms are available
which extend to problem 1|r_j, prec|L_max by using the release date and
due date adjustments that are described in Section 4.1.1. The algorithm
of Baker & Su [34] employs a forward sequencing branching rule and uses
a lower bound that is obtained by allowing preemption. More sophisticated
approaches of McMahon & Florian [377], Lageweg, Lenstra & Rinnooy
Kan [316], Carlier [79] and Larson, Dessouky & Devor [320] use special
branching rules that are obtained from the generalized EDD heuristic. Carlier's
algorithm appears to be the most efficient. It uses a binary branching
rule that fixes the interference job k to be sequenced either before or after
all jobs in {k+1, ..., l} by resetting the due date and release date of job
k, as described above, and it employs the lower bounds LB({k+1, ..., l})
and LB({k, ..., l}), which require less computation time than the preemptive
lower bound. Computational results for instances with up to 10 000
jobs are reported. Carlier [79], Nowicki & Zdrzalka [403], and Nowicki &
Smutnicki [398] prove that the preemptive lower bound dominates many
of its various competitors. Grabowski, Nowicki & Zdrzalka [223] make the
observation that moving a job from one position to another within a block
does not result in an improved schedule if these positions are not the first
or last in the block. They use this observation to develop a branching rule,
although the resulting branch and bound algorithm is less efficient than Carlier's
algorithm. Zdrzalka & Grabowski [574] develop a branch and bound
algorithm for problem 1|r_j, prec|f_max using some of the ideas in algorithms
for problem 1|r_j, prec|L_max.
Some work on the job shop problem J||C_max provides results relevant
to problem 1|r_j|L_max. For example, Balas, Lenstra & Vazacopoulos [37]
generalize Carlier's algorithm to a version of problem 1|r_j, prec|L_max with
delayed precedence constraints, where each precedence between a pair of jobs
imposes a delay between the completion time of the first and the start of the
second. Branch and bound algorithms that rely heavily on analysis of single
machine subproblems of the form 1|r_j|L_max are discussed in Section 7.3.2.

4.1.3 Approximation
Worst-case analysis of approximation algorithms for problems 1|r_j|L_max and
1|r_j, prec|L_max assumes that all due dates are non-positive. Equivalently, the
delivery time model is assumed with non-negative delivery times. Without
this restriction, results are likely to be elusive, since the problem of determining
whether L_max ≤ 0 is NP-complete. Kise, Ibaraki & Mine [299] analyze
six approximation algorithms, including the generalized EDD heuristic of
Schrage [466], and show that each has a worst-case ratio of 2. Potts [424]
proposes an O(n² log n) algorithm in which the generalized EDD heuristic is
applied up to n times, and the best solution is selected. In each application,
a constraint is added that the interference job is scheduled after the critical
end job. This algorithm has a worst-case ratio of 3/2. Nowicki & Smutnicki
[400] show that the computational effort to achieve a worst-case ratio
of 3/2 can be reduced to O(n log n). Their algorithm applies the generalized
EDD heuristic and then constructs one other sequence, finally selecting the
better of the two schedules. An improvement to the worst-case ratio is obtained
by Hall & Shmoys [235] in an algorithm that relies on applying the
generalized EDD heuristic to both the original and reverse problems. Their
O(n² log n) algorithm has a worst-case ratio of 4/3. Hall & Shmoys also
propose two PTAS's for problem 1|r_j|L_max, the running times of which are
O(24t(n£")4t+3) and O(n logn + n(4£")8t2 +8H2), where £" = l/e.
The on-line problem 1|on-line, r_j|L_max is analyzed by Hoogeveen & Vestjens
[270, 547]. They show that no on-line algorithm can be better than
(1 + √5)/2-competitive. Note that if we use the EDD rule to schedule a
job whenever the machine becomes idle, then the maximum lateness of the
resulting schedule can be as large as twice the optimal value. Taking this
into account, Hoogeveen & Vestjens introduce a clever waiting strategy in
applying the EDD rule. They prove that their modified EDD algorithm is
(1 + √5)/2-competitive, and therefore is the best possible.

4.2 Total Weighted Completion Time


4.2.1 Complexity
For problem 1||∑w_jC_j, Smith [495] shows that an optimal solution is obtained
in O(n log n) time by sequencing jobs in non-decreasing order of
p_j/w_j. This method is known as the shortest weighted processing time
or SWPT rule; it becomes the SPT rule in the case of unit weights. To
show that any optimal sequence is obtained from the SWPT rule, we use
an adjacent job interchange argument. Consider any sequence σ in which
the jobs are not sequenced in SWPT order. Then σ contains a job k that
is sequenced immediately before another job j, where p_k/w_k > p_j/w_j. Consider
the transformed sequence σ′ in which jobs j and k are transposed. All
jobs apart from j and k have the same completion time in both sequences.
Therefore,

    ∑ w_h C_h(σ′) − ∑ w_h C_h(σ)
        = w_k(C_k(σ′) − C_k(σ)) − w_j(C_j(σ) − C_j(σ′))
        = w_k p_j − w_j p_k
        = w_j w_k (p_j/w_j − p_k/w_k) < 0,

which shows that σ cannot be an optimal sequence.
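A minimal Python sketch of the SWPT rule (the (p_j, w_j) pair representation and positive weights are assumptions):

    def swpt_total_weighted_completion_time(jobs):
        # jobs: list of (p_j, w_j) pairs with w_j > 0.  Sequence in
        # non-decreasing order of p_j / w_j.
        t, total = 0, 0
        for p, w in sorted(jobs, key=lambda job: job[0] / job[1]):
            t += p            # completion time C_j
            total += w * t    # contribution w_j * C_j
        return total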
In the presence of precedence constraints, the polynomial solvability of
problem 1|prec|∑w_jC_j depends on the structure of the precedence constraints.
Horn [272], Adolphson & Hu [10] and Sidney [491] each develop
O(n log n) algorithms for precedence constraints in the form of a tree,
while Lawler [324] considers series-parallel precedence constraints. A series-
parallel graph is one that is built by performing a sequence of series and
parallel operations to subgraphs, where the initial subgraph comprises n sin-
gle vertices that correspond to jobs. A series operation adds a precedence
arc from each vertex of the first subgraph to each vertex of the second,
whereas a parallel operation combines two subgraphs without adding prece-
dence arcs. The composition process can be represented by a tree, which
can be found in O(n²) time from the precedence graph using an algorithm
of Muller & Spinrad [388] that improves upon a previous O(n³) algorithm
of Buer & Möhring [75]. After obtaining this tree, Lawler's algorithm requires
O(n log n) time. For general precedence constraints, Lawler [324]
and Lenstra & Rinnooy Kan [346] show that problems 1|prec|∑C_j and
1|prec, p_j=1|∑w_jC_j are strongly NP-hard.
Release dates also add to the difficulty, since problem 1|r_j|∑C_j is shown
by Lenstra, Rinnooy Kan & Brucker [349] to be strongly NP-hard. However,
problem 1|r_j, pmtn|∑C_j is solvable by the shortest remaining processing
time (SRPT) rule of Schrage [464]: each time that a job is completed, or
at the next release date, the job to be processed next has the smallest
remaining processing time among the available jobs. This algorithm does
not extend to minimizing total weighted completion time, since Labetoulle,
Lawler, Lenstra & Rinnooy Kan [315] show that problem 1|r_j, pmtn|∑w_jC_j
is strongly NP-hard.
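The SRPT rule can be sketched in Python as follows (our illustration; jobs are assumed to be (r_j, p_j) pairs, and decisions are revised only at release dates or completions, which suffices for this rule):

    import heapq

    def srpt_total_completion_time(jobs):
        # At each release date or completion, run the available job with
        # the smallest remaining processing time.
        pending = sorted(jobs)     # by release date
        remaining = []             # heap of remaining processing times
        t, total, i = 0, 0, 0
        while i < len(pending) or remaining:
            if not remaining:
                t = max(t, pending[i][0])
            while i < len(pending) and pending[i][0] <= t:
                heapq.heappush(remaining, pending[i][1])
                i += 1
            next_release = pending[i][0] if i < len(pending) else float('inf')
            p = heapq.heappop(remaining)
            if t + p <= next_release:
                t += p             # the job runs to completion
                total += t
            else:                  # preempt when the next job is released
                heapq.heappush(remaining, p - (next_release - t))
                t = next_release
        return total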
In the case of deadlines, problem 1|d̄_j|∑C_j is solvable in O(n log n) time
by a backward scheduling algorithm of Smith [495] that repeatedly selects
the job with the largest processing time among feasible candidates to be
sequenced in the last unfilled position. However, Lenstra, Rinnooy Kan &
Brucker prove that 1|d̄_j|∑w_jC_j is strongly NP-hard. Du & Leung [149]
show NP-hardness for problem 1|r_j, pmtn, d̄_j|∑C_j.

4.2.2 Enumerative Algorithms


Potts [422, 425] develops a 'linear-ordering' formulation for 1|prec|∑w_jC_j
which uses zero-one variables x_jk to indicate whether or not job j is sequenced
before job k. In [425], he performs a Lagrangean relaxation of the
constraints x_jk + x_kj = 1 and x_jk + x_kl + x_lj ≥ 1, selecting each multiplier
as large as possible subject to retaining the non-negativity of the objective
function coefficients. The resulting branch and bound algorithm is effective
in solving problems with up to 100 jobs. Queyranne & Wang [441] develop
an alternative 'completion time' formulation that uses variables C_j and valid
inequalities of the form

    ∑_{j∈S} p_j C_j ≥ (1/2) [ ( ∑_{j∈S} p_j )² + ∑_{j∈S} p_j² ]

for any S ⊆ {1, ..., n}, which represent the machine capacity constraint.
They also derive valid inequalities that use the precedence constraints. To-
gether, these inequalities give a complete polyhedral characterization when
the precedence constraints are series-parallel. Queyranne & Wang [442]
evaluate the quality of the lower bounds provided by this polyhedral approach.
Van de Velde [538] proposes lower bounds that are derived through
a Lagrangean relaxation of the precedence constraints in a completion time
formulation. He develops a dual ascent method for finding the
multipliers. Also, these multipliers are used in a heuristic that is based on
decomposition. Hoogeveen & Van de Velde [266] improve on the Lagrangean
lower bound of Van de Velde by a new technique that introduces slack variables in
the Lagrangean problem. Although the resulting bound can never be tighter
than the polyhedral bound of Queyranne & Wang, its computational requirements
are significantly less.
Branch and bound algorithms for problems 1|r_j|∑C_j and 1|r_j|∑w_jC_j
rely extensively on the dominance rules derived by Dessouky & Deogun [138],
Bianco & Ricciardelli [51], Deogun [137], Hariri & Potts [240], Belouadah,
Posner & Potts [47] and Chu [109]. For problem 1|r_j|∑C_j, Ahmadi &
Bagchi [11] show that the lower bound obtained by solving the preemptive
problem 1|r_j, pmtn|∑C_j using the SRPT rule dominates all of the other
known bounds. Chu uses this lower bound in a branch and bound algorithm
that is effective in solving problems with up to 100 jobs.
For problem 1|r_j|∑w_jC_j, allowing preemption does not lead to a useful
lower bounding scheme since the corresponding preemptive problem is
NP-hard (see Section 4.2.1). Hariri & Potts [240] derive a lower bound by
performing a Lagrangean relaxation of the constraints C_j ≥ r_j + p_j. They
use a 'multiplier adjustment' method to construct values of the multipliers.
In this method, multiplier values are chosen so that the solution of the Lagrangean
problem corresponds to a schedule that is previously obtained by
a heuristic method. An improved lower bounding scheme is developed by
Belouadah, Posner & Potts [47], which is based on the principle of job splitting.
If job j is split into two pieces j₁ and j₂, then the pieces are regarded
as jobs with processing times and weights chosen so that p_{j₁} + p_{j₂} = p_j
and w_{j₁} + w_{j₂} = w_j. If the two pieces are scheduled contiguously, then this
splitting reduces the total weighted completion time by the 'breaking cost'
w_{j₁}p_{j₂}. A forward procedure constructs a schedule and splits jobs so that, at
any time, the machine processes an available job or piece j for which p_j/w_j is
as small as possible. The lower bound is the sum of the total weighted completion
time for this schedule and the breaking cost. Computational results
show that the branch and bound algorithm is effective in solving instances
with up to 50 jobs. Dyer & Wolsey [154], Sousa & Wolsey [499] and Van
den Akker [533] study integer programming formulations and derive valid
inequalities.
For problem 1|d̄_j|∑w_jC_j, branch and bound algorithms are similar in
spirit to those for 1|r_j|∑w_jC_j. In particular, Potts & Van Wassenhove [431]
develop a lower bound based on the Lagrangean relaxation of constraints
C_j ≤ d̄_j, where multipliers are computed using a multiplier adjustment
method, and Posner [420] proposes an improved algorithm which uses a
lower bounding procedure that is based on job splitting.

4.2.3 Approximation
There are various studies on approximation algorithms for both the off-line
and on-line versions of problems 1|r_j|∑C_j and 1|r_j|∑w_jC_j. We first
discuss off-line results.
For the unweighted problem 1|r_j|∑C_j, Phillips, Stein & Wein [417]
propose a 2-approximation algorithm in which the jobs are sequenced in the
order σ that they complete in the solution of the corresponding preemptive
problem 1|r_j, pmtn|∑C_j. If the jobs are renumbered so that σ = (1, ..., n),
then the completion time of each job k in the schedule defined by σ is

    C_k(σ) = r_i + ∑_{j=i}^{k} p_j,

for some job i, where i ≤ k. Since r_i ≤ C_k(pmtn) and ∑_{j=i}^{k} p_j ≤ C_k(pmtn),
where C_k(pmtn) is the completion time of job k in the preemptive schedule,
we obtain ∑C_j(σ) ≤ 2∑C_j(pmtn). The observation that ∑C_j(pmtn) is
a lower bound on the optimal value of the total completion time for the
non-preemptive problem provides the desired ratio guarantee of 2.
Chekuri, Motwani, Natarajan & Stein [89] give an improved polynomial
time approximation algorithm for problem 1|r_j|∑C_j that has a ratio guarantee
of e/(e−1) ≈ 1.58 (where e denotes the base of the natural logarithm).
Their algorithm is based on a clever rounding technique of the preemptive
relaxation of this problem. Kellerer, Tautenhahn & Woeginger [297] provide
an approximation algorithm for the corresponding problem 1|r_j|∑F_j
of minimizing total flow time, where the flow time of job j is defined as
F_j = C_j − r_j, with a ratio guarantee of O(√n). They also prove that, unless
P=NP, up to a constant factor no better guarantee can be achieved.
Hall, Schulz, Shmoys & Wein [233] propose a general methodology for
developing approximation algorithms based on sequencing jobs according
to their completion times in the solution of a linear program. This technique
yields a 3-approximation algorithm for problem 1|r_j, prec|∑w_jC_j.
Using a similar approach, Goemans [206] develops a 2-approximation algorithm
for problem 1|r_j|∑w_jC_j that runs in O(n²) time. He simultaneously
considers two equivalent linear programming relaxations for the problem:
one involves completion time variables and the other one involves preemptive
time-indexed variables. While the analysis is based on the first relaxation,
the approximation algorithm is essentially based on the second.
Hall, Schulz, Shmoys & Wein [233] also use their general method to establish
2-approximation algorithms for the three problems 1|prec|∑w_jC_j,
1|r_j, prec, p_j=1|∑w_jC_j and 1|r_j, pmtn, prec|∑w_jC_j.
We now review results for on-line scheduling. Fiat & Woeginger [169]
discuss problem 1|on-line-list|∑C_j. They give a (log n)^{1+ε}-competitive on-line
algorithm, and they show that no on-line algorithm can be better than
log n-competitive, even if preemption is allowed. The closely related problem
1|on-line-list|∑F_j does not possess an on-line algorithm whose competitive
ratio is bounded by a function that solely depends on n; the competitive
ratio of any on-line algorithm for this problem must depend on the job
processing times.
For problem 1|on-line, r_j|∑C_j, Hoogeveen & Vestjens [270, 547] show
that no on-line algorithm can be better than 2-competitive. Three different
2-competitive algorithms are reported independently by Hoogeveen & Vestjens
[270, 547], by Phillips, Stein & Wein [417], and by Stougie [506]. Recall
that in the case of zero release dates, it is optimal to schedule jobs according
to the SPT rule. Each of the three on-line algorithms is a variant of the SPT
algorithm. Hoogeveen & Vestjens apply SPT with modified job release dates,
and Phillips, Stein & Wein use an optimal preemptive schedule as a guide.
Note that SPT in its original form is only n-competitive for minimizing either
∑C_j or ∑F_j (Mao, Kincaid & Rifkin [365]); up to a constant factor, this
rather disappointing bound is best possible for problem 1|on-line, r_j|∑F_j.
The results of Chekuri, Motwani, Natarajan & Stein [89] yield a randomized
on-line approximation algorithm for problem 1|on-line, r_j|∑C_j with a ratio
guarantee of e/(e−1) ≈ 1.58. Vestjens [547] shows that this is the best
possible ratio that can be guaranteed by a randomized algorithm in this
on-line model.
If job weights are imposed, a competitive on-line algorithm has to be
more involved. We will see in Section 6.4.2 that by using the on-line framework
of Hall, Shmoys & Wein [236] to assign jobs into sub-intervals of time,
a (4 + ε)-competitive algorithm for problem 1|on-line, r_j|∑w_jC_j can be
derived. However, it is possible to do better here. Recall that, for the
off-line problem 1||∑w_jC_j, the SWPT rule generates an optimal schedule.
By applying the SWPT rule to sequence the jobs assigned to each
sub-interval of time by the on-line framework, Hall, Schulz, Shmoys & Wein
[233] show that a (3 + ε)-competitive algorithm results. Through the use
of randomization, Chakrabarti et al. [84] improve the on-line framework
to obtain an algorithm for problem 1|on-line, r_j|∑w_jC_j that is (2.89 + ε)-competitive.
Goemans [206] uses his analysis for problem 1|r_j|∑w_jC_j to
develop a (1 + √2)-competitive algorithm for problem 1|on-line, r_j|∑w_jC_j
that has a time requirement of O(n log n).
For non-clairvoyant scheduling, all results that are presented in Sections
5.4.2 and 6.4.3 for parallel machines naturally also apply to
the special case of a single machine.

4.3 Total Weighted Tardiness


4.3.1 Complexity
For problem 1||∑T_j, Lawler [323] presents a pseudo-polynomial dynamic
programming algorithm, which relies on the following decomposition of the
problem. Suppose that the jobs are indexed so that d_1 ≤ ... ≤ d_n, where
p_j ≤ p_{j+1} if d_j = d_{j+1}, and let job j have the largest processing time. Then
there is some index k, where j ≤ k ≤ n, such that jobs 1, ..., j−1, j+1, ..., k
are sequenced before job j and jobs k+1, ..., n are sequenced after
job j. Lawler observes that each subproblem is represented by the time at
which processing starts, and by three job indices which identify the jobs in
the subproblem. The resulting dynamic programming algorithm requires
O(n⁴P) time, where P = ∑_{j=1}^{n} p_j. The complexity status of 1||∑T_j is
resolved by Du & Leung [148], who show that the problem is NP-hard.
Koulamas [302] provides a review of algorithms for problem 1||∑T_j and
some of its generalizations.
Problem 1||∑w_jT_j is shown by Lawler [323] and Lenstra, Rinnooy Kan
& Brucker [349] to be strongly NP-hard. Since 1|r_j, p_j=1|∑T_j can be
formulated as a linear assignment problem, it is solvable in O(n³) time.
Lenstra & Rinnooy Kan [346] show that problem 1|prec, p_j=1|∑T_j is
strongly NP-hard, while strong NP-hardness of problem 1|r_j|∑T_j follows
from the corresponding result for problem 1|r_j|L_max.

4.3.2 Enumerative Algorithms


Most enumerative algorithms for problems 1||∑T_j and 1||∑w_jT_j rely
heavily on dominance rules to restrict the search. A typical example is
of the form: there exists an optimal sequence in which job j precedes job k
if p_j ≤ p_k, w_j ≥ w_k and d_j ≤ d_k. Although the main rules are due to Emmons
[160], others are derived by Elmaghraby [158] and Shwimer [489], while
further development is provided by Rinnooy Kan, Lageweg & Lenstra [447].
A framework to prove the validity of these results is provided by Yu [568].
Tansel & Sabuncuoglu [526] provide some insights into problem 'hardness'
using geometric observations on the structure of Emmons' dominance rules.
A wide variety of dynamic programming, branch and bound, and hybrid
algorithms for problems 1||∑T_j and 1||∑w_jT_j appear in the literature.
Schrage & Baker [467] and Lawler [326] propose alternative implementa-
tions of a dynamic programming algorithm in which the precedence rela-
tions obtained from Emmons' dominance rules are regarded as precedence
constraints.
The most effective enumerative algorithms for problem 1||∑T_j employ
Lawler's decomposition ideas. Potts & Van Wassenhove [430], and later
Szwarc [514], develop conditions, in addition to those originally given by
Lawler [323], under which some positions need not be considered in the
search for the optimal position of the job with the largest processing time.
Potts & Van Wassenhove [430] propose an algorithm that successively de-
composes subproblems until they are small enough to be solved by the dy-
namic programming algorithm of Schrage & Baker [467]. Using this ap-
proach, they are able to solve problems with up to 100 jobs. In another
study, Potts & Van Wassenhove [433] provide a computational review of
different implementations of dynamic programming and decomposition al-
gorithms. A more effective branch and bound algorithm is developed by
Szwarc & Mukhopadhyay [517]. They obtain a partial decomposition of the
problem from Emmons' dominance rules and from analysis of adjacent job
orderings between each pair of jobs. Their algorithm uses a decomposition-
based branching rule that includes the conditions developed by Szwarc [514],
and computes a lower bound from SPT and EDD orderings based on a re-
laxation that disassociates due dates from processing times. They provide
computational results for instances with up to 150 jobs. Della Croce, Tadei,
Baracco & Grosso [132] develop an alternative type of decomposition and an
improvement to the lower bound of Szwarc & Mukhopadhyay. Their branch
and bound algorithm exhibits similar computational behavior to that of
Szwarc & Mukhopadhyay. An alternative decomposition approach with the
potential to provide improved algorithms is developed by Chang, Lu, Tang
& Yu [86].
For problem 1||∑w_jT_j, various relaxations provide lower bounding schemes
for inclusion in branch and bound algorithms. The transportation re-
laxation of Gelders & Kleindorfer [194, 195] uses a zero-one integer program-
ming formulation with variables that indicate whether a job j is processed
in a unit-time interval [t - 1, t]. The linear assignment relaxation of Rin-
nooy Kan, Lageweg & Lenstra [447] computes an under-estimate of the
total weighted tardiness of assigning job j to position k in the sequence.
Fisher [172] proposes a Lagrangean relaxation of the machine capacity con-
straints, which produces fairly tight lower bounds but at considerable com-
putational expense. The Lagrange multipliers can be regarded as a price
for using the machine during the relevant unit-time interval. Subgradient
optimization is used to find values of the multipliers. Potts & Van Wassen-
hove [432] adopt the approach of using quickly computed but weaker lower
bounds. They use a Lagrangean relaxation of the constraints T_j ≥ C_j − d_j,
but use a 'multiplier adjustment' method to find values of the multipliers,
as described in Section 4.2.2. Hoogeveen & Van de Velde [266] suggest a
slight strengthening of this lower bound using their slack variable technique
(see Section 4.2.2). Abdul-Razaq, Potts & Van Wassenhove [4] review var-
ious dynamic programming and branch and bound algorithms. Included in
their review is a lower bound based on dynamic programming state-space
relaxation. This relaxation maps the state-space of a dynamic programming
formulation onto a smaller state space, which has the effect of allowing a 'se-
quence' in which some jobs appear more than once and others do not appear.
Penalties, which assign a cost each time a job is sequenced, are used to im-
prove the lower bound. This state-space relaxation bound requires pseudo-
polynomial time. In their computational experiments, Abdul-Razaq, Potts
& Van Wassenhove find that the branch and bound algorithm of Potts &
Van Wassenhove [432] is the most effective, and it can solve instances with
up to 50 jobs.
For problem 1|r_j|∑T_j, Chu [108] proposes a branch and bound algorithm.
His lower bound is computed from a relaxation that disassociates
due dates from release dates and processing times: the preemptive problem
1|r_j, pmtn|∑C_j is solved first using the shortest remaining processing time
rule, and then due dates are assigned to the respective completion times in
EDD order. Various dominance rules are employed, and the branching rule
first uses forward sequencing but switches to backward sequencing when the
completion time of the partial sequence exceeds the largest due date. Com-
putational results show that the algorithm is effective in solving instances
with up to 30 jobs.

4.3.3 Local Search

Numerous heuristic methods that have modest computational requirements
are available for problem 1||∑T_j. Wilkerson & Irwin [558] propose a heuris-
tic that resembles a descent algorithm that applies the transpose neighbor-
hood to the EDD sequence. Fry, Vicens, MacLeod & Fernandez [182] also
use a descent method with the transpose neighborhood, but incorporating
three alternative initial sequences and three alternative search strategies.
Panwalkar, Smith & Koulamas [411] propose an O(n²) heuristic which adds
jobs to an initial partial sequence according to tests based on pairs of jobs.
Cheng [103] and Alidaee & Gopalan [19] point out that the heuristics of
Panwalkar, Smith & Koulamas, and Wilkerson & Irwin are closely related
to a modified due date heuristic of Baker & Bertrand [31], which successively
schedules a job j for which max{t + p_j, d_j} is as small as possible when the
machine becomes available at time t. Holsenback & Russell [253] and Russell
& Holsenback [456] develop an O(n²) heuristic in which the EDD sequence
is repeatedly improved by moving a job to a later position in the sequence.
Based on results in [253, 456], the heuristic of Russell & Holsenback appears
to be the most effective, and generates solutions to 100-job instances with
total tardiness that exceeds the optimal value by just over 1%, on average.
Yu & Yu [570] extend an earlier decomposition heuristic approach of
Potts & Van Wassenhove [435]. Their O(n²) heuristic starts with an EDD
sequence, and then searches for the best position of a job with the largest
processing time. This job is then fixed in the selected position, thereby
generating two subproblems. Each subproblem is decomposed in a similar
way until a complete sequence of jobs is obtained. Initial computational
results with this decomposition approach are encouraging.
There have been several studies on local search algorithms for problems
1||∑T_j and 1||∑w_jT_j. Initial work of Matsuo, Suh & Sullivan [372] studies
simulated annealing based on the transpose neighborhood. Potts & Van
Wassenhove [435] use the swap neighborhood in descent and simulated annealing
algorithms, which they compare with special-purpose heuristics. For
problem 1||∑T_j, they find that a special-purpose decomposition heuristic
is slightly superior to the local search methods. On the other hand, a finely
tuned simulated annealing algorithm that employs a descent routine gives
the best results for 1||∑w_jT_j.
Crauwels, Potts & Van Wassenhove [118] compare multi-start descent,
simulated annealing, threshold accepting, tabu search and genetic algorithms
for problem 1||∑w_jT_j. For the natural sequence representation of
solutions, the swap neighborhood is used in the multi-start descent, simulated
annealing, threshold accepting and tabu search methods. They also propose
a novel binary string representation, where each element indicates whether
the corresponding job is on time or late; a decoding heuristic constructs a
sequence of jobs from the binary string. A neighborhood move reverses one
element of the binary string. Computational results for instances with up to
100 jobs indicate that a genetic algorithm that uses the binary representation
of solutions and incorporates a descent routine based on the transpose
neighborhood, and the two tabu search algorithms that use the sequence
and binary representations, provide the best quality solutions.
Congram, Potts and Van de Velde [113] propose a new neighborhood
technique, which they call dynasearch, for problem 1||∑w_jT_j. This technique
uses dynamic programming to search an exponential sized neighborhood
comprising combinations of swaps in O(n²) time. They propose a
multi-start descent algorithm, where the new starting solution is obtained
from the previous local optimum by performing a small number of random
swap moves. Computational results for the same test instances show the
superiority of the dynasearch method over the local search algorithms of
Crauwels, Potts & Van Wassenhove.

4.3.4 Approximation
Developing approximation algorithms with good performance guarantees for
problem 1||∑T_j is difficult. Chang, Matsuo & Tang [87] analyze several
neighborhood search heuristics, Yu & Liu [569] analyze the heuristic of Wilkerson
& Irwin [558], Yu [567] analyzes a backward version of Wilkerson &
Irwin's heuristic, and Yu & Yu [570] analyze their decomposition heuristic.
However, the best ratio guarantee for any of these heuristics is n/2.
Lawler [327] presents an FPTAS for problem 1||∑T_j which is based
on the rounding of state variables in his decomposition-based pseudo-polynomial
dynamic programming algorithm. The approximation algorithm
has a running time of O(n⁷/ε). Based on the ideas of Gens & Levner [197],
Kovalyov [303] develops a procedure to find an upper bound on the optimal
objective function value that is within a factor of three of a lower bound;
this reduces the running time to O(n⁶ log n + n⁶/ε).

4.4 Weighted Number of Late Jobs


4.4.1 Complexity
For problem 1||∑w_jU_j, a solution is specified by a partition of the jobs into
two subsets: those which are on time and those which are late. A schedule
is constructed from the partition by sequencing first the on-time jobs in
EDD order, while the late jobs are sequenced arbitrarily after all on-time
jobs. For problem 1||∑U_j, an algorithm of Moore [385], which is sometimes
known as the Moore-Hodgson algorithm, solves the problem in O(n log n)
time. This algorithm repeatedly adds jobs in EDD order to the end of a
partial schedule of on-time jobs. If the addition of job j results in this job
being completed after time d_j, then a job in the partial schedule with the
largest processing time is removed and declared late. Discarding a job with
the largest processing time increases the opportunity for subsequent jobs to
be completed by their due dates. Sidney [490] generalizes this algorithm to
handle the case that a specified subset of jobs must be on time. He observes
that jobs of the specified subset are not considered when discarding jobs,
and that it may be necessary to discard more than one job to ensure that
the last job in the current partial schedule is on time. An adaptation of
Moore's algorithm to problem 1||∑w_jU_j for the case of reverse agreeability
of processing times with weights (where p_j < p_k implies that w_j ≥ w_k)
is proposed by Lawler [322]. Thus, problem 1|p_j=1|∑w_jU_j is solvable
in O(n log n) time. For problem 1|p_j=1|∑U_j, Monma [381] observes
that, by creating sets S_k = {j | k ≤ d_j < k+1} for k = 1, ..., n−1 and
S_n = {j | d_j ≥ n}, and considering jobs in the order S_1, ..., S_n, discarding a
job when it is late, an optimal solution is obtained in O(n) time.
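A compact Python sketch of the Moore-Hodgson algorithm (the (p_j, d_j) pair representation is an assumption):

    import heapq

    def moore_hodgson(jobs):
        # jobs: list of (p_j, d_j) pairs.  Add jobs in EDD order; whenever
        # the newest job finishes late, discard the currently scheduled
        # job with the largest processing time.
        on_time, t, late = [], 0, 0          # max-heap via negated times
        for p, d in sorted(jobs, key=lambda job: job[1]):
            heapq.heappush(on_time, -p)
            t += p
            if t > d:                        # the job just added is late
                t += heapq.heappop(on_time)  # remove the longest job
                late += 1
        return late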
An instance of the decision version of problem 1||∑w_jU_j with w_j = p_j
and d_j = ∑_{h=1}^{n} p_h / 2 for each job j is equivalent to the problem Partition,
which is NP-complete, as observed by Karp [294], without being strongly
NP-complete. Thus, problem 1||∑w_jU_j is NP-hard. However, the problem
is pseudo-polynomially solvable by the following dynamic programming
algorithm of Lawler & Moore [338]. Suppose that the jobs are indexed so
that d_1 ≤ ... ≤ d_n, and let F_j(t) be the minimum weighted number of late
jobs for the subproblem involving jobs 1, ..., j, where the last on-time job
completes at time t. Using the initialization

    F_0(t) = 0 for t = 0, and F_0(t) = ∞ for t ≠ 0,

we have the following recursion: for j = 1, ..., n,
    F_j(t) = min{F_{j−1}(t − p_j), F_{j−1}(t) + w_j}   for t = 0, ..., d_j,
    F_j(t) = F_{j−1}(t) + w_j                          for t = d_j + 1, ..., T,

where T = min{d_n, ∑_{j=1}^{n} p_j} is an upper bound on the completion time of
the last on-time job. The minimum weighted number of late jobs is

    min_{t=0,...,T} F_n(t),

and an optimal schedule is found by backtracking. The algorithm requires
O(nT) time. Note that an alternative recursion can be defined in which the
state variable t is interchanged with the value of F_j(t), so that the weighted
number of late jobs becomes a state variable, and the minimum completion
time of the last on-time job is a function value. This alternative dynamic
programming algorithm requires O(nW) time, where W = ∑_{j=1}^{n} w_j.
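The forward recursion above translates directly into the following Python sketch of the O(nT) algorithm (our illustration, assuming integer processing times and due dates):

    def lawler_moore(jobs):
        # jobs: list of (p_j, w_j, d_j) triples.  F[t] is the minimum
        # weighted number of late jobs over the jobs considered so far,
        # with the last on-time job completing exactly at time t.
        jobs = sorted(jobs, key=lambda job: job[2])  # index by due date
        T = min(jobs[-1][2], sum(p for p, _, _ in jobs))
        INF = float('inf')
        F = [0] + [INF] * T              # F_0(0) = 0, infinity elsewhere
        for p, w, d in jobs:
            G = [INF] * (T + 1)
            for t in range(T + 1):
                G[t] = F[t] + w          # job j is late
                if t <= d and t >= p:    # job j on time, completing at t
                    G[t] = min(F[t - p], G[t])
            F = G
        return min(F)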
Polynomial solvability of minimizing the number of late jobs is not maintained
in the presence of deadlines or precedence constraints. Lawler [330]
shows that 1|d̄_j|∑U_j is NP-hard, although this problem is open with respect
to pseudo-polynomial solvability. Strong NP-hardness of 1|prec, p_j=1|∑U_j
is established by Garey & Johnson [186], a result which remains
valid for the case of precedence constraints in the form of a set of chains
(Lenstra & Rinnooy Kan [348]), and when each due date is equal to the earliest
job completion time as defined by the precedence constraints, plus some
constant k (Steiner [502]). Polynomial algorithms for some special cases of
the latter model are developed by Sharary & Zaguia [485] and Steiner.
Release dates also introduce a substantial degree of difficulty, since
Lenstra, Rinnooy Kan & Brucker [349] show that problem 1|r_j|∑U_j is
strongly NP-hard. However, if preemption is allowed, the problem is solvable
by a dynamic programming algorithm of Lawler [331]: he presents an
O(n³W²) algorithm for problem 1|r_j, pmtn|∑w_jU_j, the time complexity
of which is O(n⁵) for problem 1|r_j, pmtn|∑U_j. Lawler [332] generalizes
Moore's algorithm to solve the following special cases with release dates in
O(n log n) time: problem 1|r_j|∑U_j (and 1|r_j, pmtn|∑U_j, where preemption
is not necessary) for the case of agreeability of release dates with processing
times; and problem 1|r_j, pmtn|∑w_jU_j for the case of nested [r_j, d_j] intervals
for the jobs, and for the case of agreeability of release dates with processing
times and reverse agreeability of release dates and processing times with
weights.

4.4.2 Enumerative Algorithms


Branch and bound algorithms for problem 1||∑w_jU_j are proposed by Villarreal
& Bulfin [550] and Potts & Van Wassenhove [434], each of which
uses a dominance rule to restrict the size of the search tree, and a binary
branching rule that fixes a job to be on time or late at each branching.
Villarreal & Bulfin use two fairly weak lower bounds that are based on solving
a relaxation of the original problem using Moore's algorithm. Potts &
Van Wassenhove develop an improved Moore-based bound that dominates
Villarreal & Bulfin's bound. They also propose a lower bound that solves
a scaled version of the problem using Lawler & Moore's dynamic programming
[338]. Further, they propose a lower bounding procedure that solves
the following linear programming relaxation in O(n log n) time:
    minimize    ∑_{j=1}^{n} w_j U_j
    subject to  ∑_{h=1}^{j} p_h (1 − U_h) ≤ d_j   for j = 1, ..., n,
                0 ≤ U_j ≤ 1                       for j = 1, ..., n,
where U_j = 1 if job j is late, and U_j = 0 if job j is on time. Computational
results show that the linear programming based bound is preferred
to the other lower bounds. Another significant contribution of Potts & Van
Wassenhove is a reduction procedure which uses bounds in an attempt to
establish that certain jobs are on time and certain jobs are late in an optimal
schedule. If the jobs are indexed so that d_1 ≤ ... ≤ d_n, and job j is on time,
then by resetting due dates so that

    d_h = min{d_h, d_j − p_j}   for h = 1, ..., j−1,
    d_h = d_h − p_j             for h = j+1, ..., n,

job j can be eliminated from the problem. Computational results obtained
by first applying this reduction procedure, and then solving the resulting
reduced problem, either with the Lawler-Moore dynamic programming algorithm
or with the branch and bound algorithm that uses the linear programming
based bound, show that both methods are successful in solving
instances with up to 1000 jobs. Preference between dynamic programming
and branch and bound depends on the characteristics of the data for the particular
problem instance that is to be solved.
For problem 1|r_j|∑U_j, Dauzere-Peres [122] develops procedures for obtaining
lower and upper bounds. The lower bound is obtained by solving
a sequence of linear programming problems. An O(n²) heuristic that uses
the key ideas in Moore's algorithm is proposed to generate an upper bound.
Computational results for instances with up to 50 jobs show that differ-
ences between the lower and upper bounds are small (at most two) for most
problem classes.
A branch and bound algorithm for problem 1|d̄_j|∑U_j is proposed by
Hariri & Potts [244]. They use a pseudo-polynomial lower bounding proce-
dure that is developed using dynamic programming state-space relaxation,
as described in Section 4.3.2, and a binary branching rule that fixes a job to
be on time or late at each branching. Computational results for instances
with up to 300 jobs show that the lower bound is sufficiently tight to restrict
the size of the search tree.

4.4.3 Approximation
Sahni [457] and Gens & Levner [196, 197] propose FPTAS's for problem
1||∑w_jU_j. Sahni's scheme applies to the problem of maximizing the
weighted number of on-time jobs, and Gens & Levner's scheme applies to
the problem of minimizing the weighted number of late jobs. The scheme
of Gens & Levner [196] has a running time of O(n³/ε). By developing a
procedure to find an upper bound on the optimal objective function value
that is within a factor of two of a lower bound, Gens & Levner [197] reduce
the running time to O(n² log n + n²/ε). Ibarra & Kim [279] describe a PTAS
for the maximization version of problem 1|tree|∑U_j.

4.5 Total Weighted Earliness and Tardiness


4.5.1 Complexity
Models with total (weighted) earliness plus total (weighted) tardiness as the
optimality criterion are widely studied. Baker & Scudder [33] survey the
main results in this area of scheduling. One variant that has received much
attention is problem 1|d_j=d|∑(w_j′E_j + w_j″T_j) in which jobs have a common
due date d. In some studies the common due date is unrestricted, which is
the case for d ≥ ∑_{j=1}^{n} p_j; otherwise, the common due date is restricted.
For problem 1|d_j=d|∑(E_j + T_j) with a restricted common due date,
Hall, Kubiak & Sethi [237] and Hoogeveen & Van de Velde [264] establish
NP-hardness. Hall, Kubiak & Sethi also derive a pseudo-polynomial
dynamic programming algorithm that requires O(nP) time, where P =
∑_{j=1}^{n} p_j. Kanet [292] derives the following properties of an optimal solution

for the unrestricted common due date version of this problem, from which he
develops an O(nlogn) algorithm. First, there is no idle time between jobs.
Second, one of the jobs is completed exactly at time d. Third, the schedule
is a V-shaped, which means that jobs completed after time d are sequenced
in SPT order, while jobs completed at or before time d are sequenced in
LPT order (non-increasing order of processing times). Let a and b denote
the numbers of jobs that complete after time d, and before or at time d,
respectively, where a + b = n. If jobs a(l), ... , a(a), respectively, complete
after time d, then their objective function contribution is

Similarly, the objective function contribution is

OPP(l) + 1pp(2) + ... + (b - 2)PP(b-l) + (b - l)PP(b)

if jobs ,8(1), ... ,,8(b - 1), respectively, complete before time d and job ,8(b)
completes exactly at time d. Thus, a, a - 1, ... ,2,1 and 0,1, ... ,b - 2, b - 1
are regarded as positional weights for the processing times. The positional
weights are minimized by choosing b = rn/21 and a = Ln/2J. Further, the
objective function is minimized by assigning a longest job to a position with
the smallest weight, a second longest job to a position with a second smallest
weight, and so on.
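These positional-weight observations yield the following Python sketch of Kanet's algorithm for the unrestricted case (our illustration): assigning jobs in LPT order alternately to the early and tardy sets matches longer jobs with smaller positional weights.

    def kanet(processing_times):
        # Returns (early, tardy): the early jobs in LPT order, scheduled so
        # that the last of them completes exactly at the common due date d,
        # followed by the tardy jobs in SPT order.
        lpt = sorted(processing_times, reverse=True)
        early = lpt[0::2]   # b = ceil(n/2) jobs, positional weights 0, ..., b-1
        tardy = lpt[1::2]   # a = floor(n/2) jobs, positional weights a, ..., 1
        tardy.reverse()     # SPT order after the due date
        return early, tardy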
Hall & Posner [238] prove that problem 1|d_j=d|∑w_j(E_j + T_j) is NP-hard.
For this problem, pseudo-polynomial dynamic programming algorithms
are available. In particular, Hoogeveen & Van de Velde [264] and
Hall & Posner [238] present O(n²d) and O(nP) algorithms for the case
of a restricted and unrestricted common due date, respectively. Problem
1|d_j=d|∑(w_j′E_j + w_j″T_j) is open with respect to pseudo-polynomial solvability.
The main properties that are used in the derivation of Kanet's algorithm
for problem 1|d_j=d|∑(E_j + T_j) with an unrestricted common due date also
hold for various generalizations of this model, thereby providing O(n log n)
algorithms. Specifically, Panwalkar, Smith & Seidmann [412] consider the
minimization of ∑(wE_j + w′T_j + w″d), where the common due date d is
a decision variable. Cheng [102] argues that there should be no penalty for
completion times close to the due date, and considers the minimization of
∑(wE′_j + w′T′_j), where E′_j = 0 for E_j ≤ τ and E′_j = E_j otherwise, and T′_j = 0
for T_j ≤ τ and T′_j = T_j otherwise, where τ < min_j p_j/2 is a given tolerance.
More natural models involving the minimization of ∑(w max{E_j − τ_j, 0} +
w′ max{T_j − τ_j, 0}), where p_j > 2τ_j, and ∑(w max{E_j − τ, 0} + w′ max{T_j −
τ, 0}), for any non-negative τ, are analyzed by Baker & Scudder [33], and
Weng & Ventura [555], respectively.
Hoogeveen & Van de Velde [268] consider the almost common due date
problem in which the due date of each job j lies in the interval [D, D+p_j], for
some constant D. For the unrestricted problem 1|d_j ∈ [D, D+p_j]|∑(wE_j +
w′T_j) with large D, they derive an O(n²) dynamic programming algorithm.
For unrestricted due dates, Verma & Dessouky [546] consider problem
1|p_j=1|∑w_j(E_j + T_j) for the case of unit processing times and due dates
that are not assumed to be integers. They show that this problem is polynomially
solvable by presenting an integer programming formulation, and then
showing that there exists an optimal integer solution of the linear programming
relaxation. Since problem 1||∑(E_j + T_j) is NP-hard when there is no
restriction on processing times and due dates, the problem of determining a
schedule by inserting idle time between jobs, for a given job sequence, is of
interest. For problems 1||∑w_j(E_j + T_j) and 1||∑(w_j′E_j + w_j″T_j), Garey,
Tarjan & Wilfong [193] and Davis & Kanet [127] present O(n log n) and
O(n²) algorithms, respectively, for idle time insertion. An alternative algorithm
of Szwarc & Mukhopadhyay [516] requires O(qn) time, where q is the
number of clusters of jobs. In computational results for randomly generated
instances, Szwarc & Mukhopadhyay show that their algorithm requires
substantially less computation time than that of Davis & Kanet.
Closely related to these earliness-tardiness models is the problem of minimizing
the completion time variance, which we denote by 1||∑(C_j − C̄)²,
where C̄ is the average completion time. Eilon & Chowdhury [157] prove
that the V-shaped property holds for this problem. Cai [76] discusses the
weighted problem 1||∑w_j(C_j − C̄)²: in the case of reverse agreeability of
processing times with weights (where p_j < p_k implies that w_j ≥ w_k), the V-shaped
property still holds; however, this result does not extend to the general
case with arbitrary weights. Kubiak [311] shows that all of these problems
are NP-hard. Kahlbacher [290] and De, Ghosh & Wells [128] develop
dynamic programming algorithms for computing the best V-shaped schedule
in pseudo-polynomial time. Hence, problem 1||∑(C_j − C̄)² and problem
1||∑w_j(C_j − C̄)² with agreeable weights are both pseudo-polynomially
solvable. Further, problem 1||∑w_j(C_j − C̄)² with arbitrary weights is open
with respect to pseudo-polynomial solvability.

4.5.2 Enumerative Algorithms

For problem 1|d_j=d|∑(E_j + T_j) with a restricted due date, Szwarc [512]
develops a branch and bound algorithm for the case that the start time of
the schedule is given. Computational results for instances with up to 25
jobs are given. Hoogeveen, Oosterhout & Van de Velde [262] propose an
improved branch and bound algorithm. Their lower bounds are obtained by
performing a Lagrangean relaxation of the constraint that the total process-
ing of early jobs cannot exceed d. This produces a Lagrangean problem in
which the positional weight of each early job increases by the value of the
multiplier from its natural value. An analytical method for determining an
optimal value of the Lagrange multiplier allows the lower bound to be com-
puted in O(n log n) time. The branching rule fixes jobs in non-increasing
order of processing times to be early or late. Computational results for in-
stances with up to 1000 jobs show that search trees are small, and all of the
instances with more than 30 jobs are solved at the root node of the search
tree. For problem 1|d_j=d|∑(w_j′E_j + w_j″T_j), Van den Akker, Hoogeveen &
Van de Velde [535] propose a column generation algorithm (details of the col-
umn generation approach are given in Section 6.2.2). Computational results
for the case of an unrestricted common due date show that this algorithm
is effective in solving instances with up to 60 jobs.
For problem 1||∑w_j(C_j − C̄)², Bagchi, Sullivan & Chang [29] derive
various dominance rules. By using a tree search approach (without lower
bounds), they solve instances with up to 20 jobs. Ventura & Weng [545]
present a quadratic integer programming formulation from which they derive
a lower bound using Lagrangean relaxation. The lower bound is not tested in
a branch and bound algorithm. However, computational results for instances
with up to 500 jobs show that the lower bound is fairly tight.
Without a common due date, problems with total earliness and tardi-
ness criteria present a much greater challenge. Using similar analysis to that
of Szwarc & Mukhopadhyay [517] for problem 1||∑T_j (see Section 4.3.2),
Szwarc [515] considers adjacent job orderings between each pair of jobs for
problem 1||∑(wE_j + w′T_j) to obtain a partial decomposition. By incorporating
a branching rule, but no lower bounds, he gives computational results
for instances with 10 jobs. Abdul-Razaq & Potts [3] develop a branch and
bound algorithm for a constrained version of problem 1||∑(w_j′E_j + w_j″T_j)
which forces processing to start at time zero, and forbids machine idle time
between jobs. Their lower bounding scheme, which is based on dynamic programming
state-space relaxation, is similar to that of Abdul-Razaq, Potts
& Van Wassenhove [4] for problem 1||∑w_jT_j (see Section 4.3.2). Although
the lower bounds are fairly tight, large search trees are generated for instances
with 25 jobs. Hoogeveen & Van de Velde [267] propose a branch
and bound algorithm for the general version of problem 1||∑(wE_j + w′T_j).
They use a backward branching rule for sequencing, which is combined with
an O(n²) algorithm for inserting idle time between jobs. Their algorithm
includes a variety of dominance rules, and uses five different lower bounds
that are developed from relaxations of the objective function, the machine
capacity, the due dates, and the processing times, and a Lagrangean relaxation
of the non-negativity of earliness. Computational results show that
search trees become large when there are more than about 15 jobs.

4.5.3 Local Search

Any problem for which some V-shaped property can be established essentially requires the jobs to be partitioned into two subsets. In this case, the problem can be treated as one of partitioning, since sequencing is straightforward once the partition of jobs is known. Mittenthal, Raghavachari & Rana [379] propose a simulated annealing algorithm which is applicable when a V-shaped property holds. They use a neighborhood that either reassigns a job from one subset to the other, or swaps two jobs in different subsets provided that these jobs are adjacent pairs in an SPT ordering. Their algorithm first applies two descent procedures, the first of which uses the reassign neighborhood and the second uses the swap neighborhood, and then applies simulated annealing with the swap neighborhood. Computational results for problem 1 | | Σ(Cj - C̄)² with instances containing up to 20 jobs show that the algorithm generates an optimal solution for each test problem. Genetic algorithms for problems 1 | dj = d | Σ(wj Ej + wj′ Tj) and 1 | | Σ wj(Cj - C̄)² are proposed by Lee & Kim [341] and Gupta, Gupta & Kumar [229], respectively.
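To make the partition view concrete, the following Python sketch (an illustration only, with hypothetical helper names, not the algorithm of any of the papers cited above) evaluates a partition for problem 1 | | Σ(Cj - C̄)² by building the corresponding V-shaped sequence, and runs a simple descent with the reassign neighborhood.

    def ctv(sequence):
        # Completion time variance of a sequence processed from time 0
        # without idle time.
        t, completions = 0, []
        for p in sequence:
            t += p
            completions.append(t)
        mean = sum(completions) / len(completions)
        return sum((c - mean) ** 2 for c in completions)

    def v_shaped(descending, ascending):
        # V-shaped sequence: one subset in non-increasing order of processing
        # times, followed by the other subset in non-decreasing order.
        return sorted(descending, reverse=True) + sorted(ascending)

    def reassign_descent(p):
        # Descent over partitions using the 'reassign' neighborhood: move one
        # job between the descending and the ascending subset.
        desc, asc = list(p), []
        best = ctv(v_shaped(desc, asc))
        improved = True
        while improved:
            improved = False
            for subset_from, subset_to in ((desc, asc), (asc, desc)):
                for i in range(len(subset_from)):
                    job = subset_from.pop(i)
                    subset_to.append(job)
                    value = ctv(v_shaped(desc, asc))
                    if value < best:
                        best, improved = value, True
                        break    # restart the scan after an improving move
                    subset_to.pop()
                    subset_from.insert(i, job)
                if improved:
                    break
        return best, v_shaped(desc, asc)

    print(reassign_descent([4, 9, 2, 7, 1]))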
For problem 1 | | Σ(wj Ej + wj′ Tj), there are efficient algorithms, as indicated in Section 4.5.1, for inserting idle time optimally, provided that the sequence of jobs is given. With this approach, local search can be used for finding a sequence of jobs. Yano & Kim [566] propose a descent method which uses the transpose neighborhood. Computational results for instances with up to 20 jobs in which the weights are proportional to the processing times of the respective jobs show that their descent method consistently generates an optimal solution.

4.5.4 Approximation
For problem 1 | dj = d | Σ(Ej + Tj) with a restricted due date, Hoogeveen, Oosterhout & Van de Velde [262] present a method that transforms the solution of the Lagrangean problem in their lower bounding scheme into a feasible schedule. They show that this approximation algorithm, which requires O(n log n) time, has a worst-case ratio of 4/3.
For problem 1 | dj = d | Σ wj(Ej + Tj), Hall & Posner [238] use their dynamic programming algorithm to derive an FPTAS for the special case that each weight is bounded from above by a polynomial function of n. Jurisch, Kubiak & Józefowska [289] establish that this special case in fact is solvable in polynomial time. Kovalyov & Kubiak [304] and Woeginger [562] each present an FPTAS for the general case without imposing any restriction on the total weight of the jobs. Woeginger [561] gives an FPTAS for problem 1 | | Σ wj(Cj - C̄)² with agreeable weights. For problem 1 | | Σ(Cj - C̄)², De, Ghosh & Wells [128] and Cai [76] independently develop an FPTAS, each of which has a running time of O(n³/ε).

4.6 Other Criteria


4.6.1 Single Criteria
Problem 1 | | Σ wj e^{α(Cj - dj)}, where α is a positive constant, is solvable in O(n log n) time by a rule of Rothkopf [452] in which jobs are sequenced in non-decreasing order of e^{αdj}(1 - e^{-αpj})/wj. Rothkopf & Smith [453] show that, among a certain class of objective functions that are closed under scalar multiplication, the only problems of minimizing total cost that can be solved by a priority rule are 1 | | Σ wj Cj and 1 | | Σ wj e^{α(Cj - dj)}.
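As a concrete reading of this rule, the sketch below (a minimal Python illustration; the (p, w, d) job format is an assumption) sorts the jobs by the priority index e^{αdj}(1 - e^{-αpj})/wj and evaluates the objective.

    import math

    def rothkopf_sequence(jobs, alpha):
        # jobs: list of (p, w, d) triples; sort by the priority index
        # e^(alpha*d) * (1 - e^(-alpha*p)) / w, non-decreasing.
        return sorted(jobs, key=lambda j: math.exp(alpha * j[2])
                      * (1.0 - math.exp(-alpha * j[0])) / j[1])

    def total_cost(sequence, alpha):
        # Objective: sum of w_j * e^(alpha*(C_j - d_j)), no idle time.
        t, cost = 0.0, 0.0
        for p, w, d in sequence:
            t += p
            cost += w * math.exp(alpha * (t - d))
        return cost

    jobs = [(3, 2.0, 5), (1, 1.0, 2), (4, 3.0, 9)]
    print(total_cost(rothkopf_sequence(jobs, 0.1), 0.1))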
Problem 1 | | Σ wj Cj² is studied by Townsend [528]. His analysis shows that the SPT rule solves problem 1 | | Σ Cj² in O(n log n) time. However, the complexity status of problem 1 | | Σ wj Cj² is open. Townsend proposes a branch and bound algorithm, but does not perform computational tests to assess its effectiveness. Szwarc, Posner & Liu [518] analyze adjacent job orderings to decompose the problem, as in Section 4.3.2. They report that 191 of 200 instances with up to 100 jobs are solved with this approach. Szwarc [513] and Della Croce et al. [131] develop this decomposition approach further by using a branch and bound algorithm to solve any subproblems that cannot be decomposed, and by developing stronger decomposition rules. In their computational tests, Della Croce et al. solve instances with up to 400 jobs. However, a greater challenge results if a strong positive correlation between processing times and weights is introduced; in this case, their results are limited to instances with up to 100 jobs. Szwarc & Mukhopadhyay [516] develop a similar type of algorithm for problem 1 | | Σ wj(Cj - pj)².
The late work of job j is the amount of processing performed after its due date, and is denoted by Vj. Problems 1 | pmtn | Σ Vj and 1 | | Σ Vj are analyzed by Potts & Van Wassenhove [437]. For problem 1 | pmtn | Σ Vj, they show that the minimum total late work is equal to the maximum tardiness Tmax for the EDD sequence. Further, if Tmax > 0, an optimal solution is obtained by removing the first Tmax units of processing from the EDD schedule, and rescheduling it after the last job. By contrast, problem 1 | | Σ Vj is shown to be NP-hard, but is pseudo-polynomially solvable in O(n(Tmax + pmax)) time by dynamic programming, where pmax denotes the largest processing time. The algorithm relies on the property that on-time jobs (those which are started before their due dates) are sequenced in EDD order. Computational results for instances with up to 10 000 jobs show the effectiveness of the dynamic programming algorithm. Potts & Van Wassenhove [438] derive two FPTASs for problem 1 | | Σ Vj, based on the rounding of state variables in two alternative dynamic programming formulations. The more efficient scheme has a running time of O(n²/ε).
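The preemptive result translates into a few lines of code: under the EDD sequence, the minimum total late work equals max{0, Tmax}. A minimal Python sketch, assuming jobs are given as (pj, dj) pairs:

    def min_total_late_work(jobs):
        # 1 | pmtn | sum V_j: process in EDD order and compute Tmax; the
        # minimum total late work equals max(0, Tmax).
        t, tmax = 0, 0
        for p, d in sorted(jobs, key=lambda j: j[1]):   # EDD order
            t += p
            tmax = max(tmax, t - d)
        return max(0, tmax)

    print(min_total_late_work([(2, 3), (4, 5), (3, 11)]))   # prints 1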
Problems of minimizing total weighted late work are studied by Hariri, Potts & Van Wassenhove [246]. They present an O(n log n) algorithm for problem 1 | pmtn | Σ wj Vj. For problem 1 | | Σ wj Vj, although the result for the unweighted problem that on-time jobs are sequenced in EDD order no longer holds, it is shown that at most one on-time job violates the EDD order at any time. This property allows the development of a pseudo-polynomial algorithm with a time requirement of O(n²P), where P = Σj pj. A branch and bound algorithm is proposed which uses the dynamic programming state-space relaxation method to obtain lower bounds. Computational results show that the algorithm is effective in solving instances with up to 700 jobs. By rounding state variables in a dynamic programming formulation, Kovalyov, Potts & Van Wassenhove [305] derive an FPTAS, which has a running time of O(n³ log n + n³/ε).

4.6.2 Multiple Criteria


When there are multiple criteria, either a hierarchical or a simultaneous
approach can be adopted. Under a hierarchical approach, the criteria are
ranked in order of importance; the first criterion is optimized first, the second
criterion is then optimized, subject to achieving the optimum with respect
to the first criterion, and so on. For simultaneous optimization, there are
two approaches. First, all 'efficient' (or Pareto optimal) schedules can be
generated, where an efficient schedule is one in which any improvement to the
performance for one of the criteria causes a deterioration with respect to one
of the other criteria. Second, a single objective function can be constructed,
for example by forming a linear combination of the various criteria, which
is then optimized.
There is an extensive literature dealing with many aspects of multi-criteria scheduling. Surveys of algorithms and complexity results in this area are given by Dileepan & Sen [140], Fry, Armstrong & Lewis [181], Hoogeveen [256], Lee & Vairaktarakis [342] and Chen & Bulfin [99]. We only provide the most significant complexity results below.
Van Wassenhove & Gelders [541] propose a pseudo-polynomial algorithm for finding all efficient schedules with respect to Σ Cj and Lmax. Their algorithm searches all possible Lmax values. Since a given Lmax value imposes job deadlines d̄j, the algorithm of Smith [495] is used to solve the corresponding 1 | d̄j | Σ Cj problem. By showing that the maximum number of efficient points is O(n²), Hoogeveen & Van de Velde [265] provide an O(n³ log n) algorithm for finding all efficient schedules. They also provide a generalization to the case that the two criteria are Σ Cj and fmax, which increases the time complexity to O(n³ min{n, log P}), where P = Σj pj. Hoogeveen [256] proves strong NP-hardness of bicriteria problems involving Σ wj Cj and Lmax (unless the primary criterion is Σ wj Cj in a hierarchical approach).
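For reference, the subroutine of Smith used here, the backward rule for 1 | d̄j | Σ Cj, can be sketched as follows (a minimal Python illustration; it returns None when the deadlines admit no feasible schedule):

    def smith_backward(jobs):
        # jobs: list of (p, deadline) pairs. Build the sequence from the back:
        # among jobs whose deadline is at least the current total processing
        # time t, place a longest job in the last free position.
        remaining = list(jobs)
        t = sum(p for p, _ in remaining)
        sequence = []
        while remaining:
            eligible = [j for j in remaining if j[1] >= t]
            if not eligible:
                return None              # deadlines are infeasible
            job = max(eligible, key=lambda j: j[0])
            remaining.remove(job)
            sequence.append(job)
            t -= job[0]
        sequence.reverse()
        return sequence

    print(smith_backward([(2, 9), (4, 6), (3, 9)]))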
The complexity status of several problems in which one of the criteria is Σ Uj remains open. For example, if the criteria are Σ Uj and Lmax, the hierarchical problem is open, irrespective of the selection of the primary criterion. Also, open problems occur for the criteria Σ Uj and Σ Cj (unless the primary criterion is Σ Cj in a hierarchical approach).
There are a number of significant results for the case that one of the criteria is maximum earliness. Garey, Tarjan & Wilfong [193] develop an O(n(log n + log pmax)) algorithm for the problem of scheduling a single machine to minimize the maximum absolute deviation of job completion times from due dates, or equivalently problem 1 | | max{Emax, Tmax}. Li, Chen & Cheng [354] show that an instance of problem 1 | | max{w Emax, w′ Tmax}, for arbitrary non-negative weights w and w′, can be polynomially transformed into an instance of problem 1 | | max{Emax, Tmax}, thereby allowing algorithms for the unweighted problem to be used. Hoogeveen [258] shows that all efficient schedules for minimizing a function of two or three maximum cost criteria can be found in O(n⁴) and O(n⁸) time, respectively. The promptness of a job is defined as its given target start time minus its actual start time. For the case that the target start time of each job j lies in the interval [dj - pj - A, dj - A], for some constant A, Hoogeveen [257] presents an O(n² log n) algorithm for finding all efficient schedules.

5 Parallel Machine Problems: Unit-Length Jobs and Preemption
Intuitively, scheduling jobs with preemption to some extent can be regarded
as similar to scheduling unit jobs without preemption. Therefore, we discuss
these two models together. Allowing preemption greatly mitigates the diffi-
culty of solving scheduling problems, although each preemption may incur
a cost in practice. If we consider a scheduling problem as a zero-one integer
program with decision variables indicating the assignment of jobs (or their
operations) to machines and time slots, then preemption allows a variable
to take a value between zero and one, thereby indicating the proportion of
the job or its operation to be assigned. From this perspective, polynomial
solvability of preemptive scheduling problems is normally expected. Indeed,
for many scheduling problems, preemption is a vital element for polynomial
solvability.
It is quite simple to schedule jobs of equal lengths without any complicating constraints, such as release dates or precedence constraints. Just beyond the trivial case of scheduling identical machines, consider the problems Q | pj = 1 | Σ fj and Q | pj = 1 | fmax. It is easily observed that there always exists an optimal schedule which processes the jobs in the n time periods that have the earliest completion times. Once these n time slots have been identified, the next step is to allocate the jobs to the time slots according to the objective function. In the case of minimizing Σ fj, this allocation amounts to solving an n × n weighted bipartite matching problem. In the case of minimizing fmax, this involves the following iterated greedy allocation: consider the last time slot t among the time slots that have not yet been considered, and allocate to it a job j that minimizes fj(t). When release dates and deadlines are introduced, similar approaches can be used. Further details are given by Dessouky, Lageweg, Lenstra & Van de Velde [139] and Lawler, Lenstra, Rinnooy Kan & Shmoys [335].
A survey of results for parallel machine scheduling, including both pre-
emptive and non-preemptive models, is given by Cheng & Sin [104].

5.1 Minmax Criteria


It is not surprising that the first paper in parallel machine scheduling deals with preemption. In his paper, McNaughton [378] uses a simple wrap-around procedure, which runs in O(n) time and generates at most m - 1 preemptions, to solve problem P | pmtn | Cmax to optimality. The basic idea behind it is quite simple: first calculate a lower bound on the value of an optimal schedule and then construct a schedule that matches the bound. Note that in this approach, the lower bound is used as a deadline for constructing a feasible schedule. Once a feasible schedule is found, it is also optimal. The lower bound is the maximum of the average machine workload and the processing time of the longest job.
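A minimal Python sketch of the wrap-around procedure (job and machine indexing are assumptions):

    def mcnaughton(p, m):
        # P | pmtn | Cmax: the optimal makespan d is the maximum of the
        # average machine workload and the longest processing time. Fill the
        # machines one by one up to level d, splitting a job whenever the
        # level is reached; a split job never overlaps itself in time,
        # because pj <= d.
        d = max(sum(p) / m, max(p))
        schedule = [[] for _ in range(m)]   # (job, start, end) per machine
        machine, t = 0, 0.0
        for job, pj in enumerate(p):
            left = pj
            while left > 1e-12:
                piece = min(left, d - t)
                schedule[machine].append((job, t, t + piece))
                t += piece
                left -= piece
                if d - t <= 1e-12:          # machine full: wrap to the next
                    machine, t = machine + 1, 0.0
        return d, schedule

    makespan, pieces = mcnaughton([5, 3, 3, 3, 2], 2)
    print(makespan, pieces)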
McNaughton's approach has been successfully applied to other preemptive scheduling problems. Following a generalization of McNaughton's lower bound to the case of uniform machines by Liu & Yang [361], Horvath, Lam & Sethi [276] suggest the longest remaining processing time on fastest machines (LRPT-FM) rule: at each time point, the job with the most remaining processing time is scheduled on the fastest available machine. Thereby they solve problem Q | pmtn | Cmax in O(mn²) time. Note that the LRPT-FM rule is an adaptation of the critical path method of Muntz & Coffman [393], which we discuss later in Section 5.3. A less intuitive and more case-based algorithm is developed by Gonzalez & Sahni [217], which significantly reduces both the running time to O(n + m log m) and the number of preemptions.
At the cost of solving a linear program, in which a decision variable represents the total time spent by each job on each machine, Lawler & Labetoulle [333] provide a general lower bound on the optimal makespan for problem R | pmtn | Cmax, and then solve it to optimality as an O | pmtn | Cmax problem (see Section 7). For the case of two machines, Gonzalez, Lawler & Sahni [214] develop a linear time algorithm for problem R2 | pmtn | Cmax.
Horn [274] generalizes McNaughton's approach to the case where jobs have deadlines. By considering the amount of processing that has to be performed on each job by its deadline, Horn provides a necessary and sufficient analytic condition for a feasible schedule to exist. This condition leads directly to an O(n²) algorithm for problem P | pmtn | Lmax, and for problem P | rj, pmtn | Cmax, which is its symmetric counterpart.
More generally, if both release dates and deadlines are present, Horn [274] formulates the problem of testing the existence of a feasible preemptive schedule as a network flow model, which can be solved in O(n³) time. Note that, if all jobs are released at the same time, such a test can be performed
in O(n(log m + log n)) time by an algorithm of Sahni [458]. Combining Horn's network flow formulation with a binary search, Labetoulle, Lawler, Lenstra & Rinnooy Kan [315] propose an O(n³ min{n², log n + log maxj pj}) algorithm for problem P | rj, pmtn | Lmax.
In the case of uniform machines, Sahni & Cho [461] and Labetoulle, Lawler, Lenstra & Rinnooy Kan [315] show that problem Q | rj, pmtn | Cmax is solvable in O(n log n + mn) time; the resulting schedule has at most O(mn) preemptions.
Martel [366] provides an algorithm for problem Q | rj, pmtn | Lmax. By computing a maximal network flow, Federgruen & Groenevelt [164] develop an improved algorithm that runs in O(kn³) time, where k is the total number of distinct speeds. The linear programming technique of Lawler & Labetoulle [333] can be used to provide a polynomial time algorithm for problem R | rj, pmtn | Lmax.

5.2 Minsum Criteria


McNaughton [378] shows that for problem P | | Σ wj Cj, the total weighted completion time cannot be reduced by allowing preemption. This property is extended by Du, Leung & Young [153] to the case that jobs have precedence constraints in the form of a set of chains. Therefore, we discuss these models in Section 6, which considers non-preemptive parallel machine scheduling. However, for other models, preemption may help. Actually, simple examples show that, if the machines have different speeds, or the jobs are subject to precedence constraints in the form of intrees or outtrees, preemption can reduce the total weighted completion time.
Gonzalez [209] solves Q | pmtn | Σ Cj in O(n log n + mn) time. His algorithm is based on the shortest remaining processing time on fastest machines (SRPT-FM) rule: at each time point, schedule the job with the shortest remaining processing time on the fastest available machine. More generally, by combining the SRPT-FM rule with the LRPT-FM rule mentioned in the previous section, McCormick & Pinedo [374] compute in O(m³n) time the entire tradeoff curve of schedules that simultaneously minimize both the total completion time and the makespan.
If release dates are considered, problems become NP-hard, as is the case for problem P2 | rj, pmtn | Σ Cj (Du, Leung & Young [152]). Until very recently, little was known about approximation algorithms for problems involving minsum criteria. Phillips, Stein & Wein [417] introduce an integer program to formulate the non-preemptive scheduling of jobs that are preempted into unit-length pieces. Then they round the solution of the corresponding linear programming relaxation to a feasible preemptive schedule. Their techniques result in (16 + ε)-, (24 + ε)- and (32 + ε)-approximation algorithms for problems P | rj, pmtn | Σ wj Cj, R | rj, pmtn | Σ Cj and R | rj, pmtn | Σ wj Cj, respectively. The constant-factor approximation algorithm for R | pmtn | Σ Cj is particularly interesting, since the complexity status of the problem is still open, and is a vexing issue in the area of preemptive scheduling, given that its non-preemptive counterpart is polynomially solvable (see Section 6.2). The ratio guarantee for problem P | rj, pmtn | Σ wj Cj has been significantly improved, even with precedence constraints.
We now discuss other minsum criteria. Lawler [329] shows that problem P | pmtn | Σ Uj is NP-hard. However, when the number of machines is fixed, Lawler [325] shows that problems Qm | pmtn | Σ wj Uj and Qm | pmtn | Σ Uj are solvable in pseudo-polynomial and polynomial time, respectively. Moreover, Lawler & Martel [337] provide an FPTAS for problem Q2 | pmtn | Σ wj Uj.
All problems of minimizing total tardiness are hard, since Section 4.3.1 points out that problems 1 | | Σ Tj and 1 | | Σ wj Tj are NP-hard and strongly NP-hard, respectively, and there is no advantage in allowing preemption for these single machine problems.

5.3 Precedence Constraints


5.3.1 Unit-Length Jobs
To appreciate the effect of precedence constraints, it is helpful to start with the scheduling of jobs with equal lengths. There has been intensive research in this area.
The presence of general precedence constraints makes all problems NP-hard if the number of machines is part of the input, since this is the case for problems P | prec, pj = 1 | Cmax and P | prec, pj = 1 | Σ Cj, as shown by Ullman [530] and Lenstra & Rinnooy Kan [346], respectively. Actually, Lenstra & Rinnooy Kan show that it is NP-complete even to approximate the solution of problem P | prec, pj = 1 | Cmax within a factor better than 4/3. However, this stagnant situation changes immediately if the precedence relations are relaxed into the form of a tree, or if the number of machines is fixed.
Hu [277] gives an algorithm for problems P | tree, pj = 1 | Cmax and P | outtree, pj = 1 | Σ Cj using a non-preemptive critical path scheduling algorithm, which always schedules the job that heads the longest current chain of unscheduled jobs. Various algorithms have been designed for a number of special cases of problem Pm | prec, pj = 1 | Cmax, which include the case where the precedence graph is an opposing forest, that is, the disjoint union of an in-forest where each job has at most one immediate successor and an out-forest where each job has at most one immediate predecessor (Garey, Johnson, Tarjan & Yannakakis [192]).
Similarly, the formulation by Fujii, Kasami & Ninomiya [183] of problem P2 | prec, pj = 1 | Cmax as a maximum cardinality matching problem, which is solvable in O(n³) time, provides the groundwork for the development of a series of improved algorithms. Moreover, polynomial solvability has been extended to minimizing Σ Cj and to including job release dates and due dates (Garey & Johnson [186, 187]).
Some of the simple algorithms for problem P2 | prec, pj = 1 | Cmax are adapted to the more general problem P | prec, pj = 1 | Cmax, and are shown to have some good worst-case guarantees. On the other hand, various minimal NP-hard problems have been identified (see Lawler, Lenstra, Rinnooy Kan & Shmoys [335] for more details).
Polyhedral analysis for minimizing total weighted completion times yields strong lower bounds on optimal solution values; Queyranne & Schulz [440] give a comprehensive survey of the main results in this area. Motivated by the results of Van den Akker [533], Hall, Schulz, Shmoys & Wein [233] show that if list scheduling (which always schedules the first in a prespecified list of unscheduled jobs whenever a machine becomes available) is guided by an optimal solution to a linear programming relaxation, then it is guaranteed to produce high quality approximation algorithms, both off-line and on-line, for minimizing total weighted completion time. As a result, significant improvements to the previous approximation results are obtained. In particular, a 3-approximation algorithm is given for problem P | rj, prec, pj = 1 | Σ wj Cj, where this bound improves to 3 - 1/m if there are no (non-trivial) release dates.

5.3.2 General-Length Jobs


The polynomial solvability picture for scheduling jobs with arbitrary processing times under preemption is very similar to that for the non-preemptive scheduling of unit-length jobs. This similarity suggests a close relationship between these two models, as Lawler [328] observes as a result of deriving polynomial algorithms for a series of counterparts in the former model of well-solvable problems in the latter model.
In parallel to the complexity results for scheduling jobs of unit length, problem P | pmtn, prec | Cmax is shown to be NP-hard by Ullman [531], whereas problems P | pmtn, tree | Cmax, P2 | pmtn, prec | Cmax and even Q2 | pmtn, prec | Cmax are all polynomially solvable by the algorithms of Muntz & Coffman [392, 393] and Horvath, Lam & Sethi [276], respectively.
Interestingly, the algorithm of Muntz & Coffman [392, 393] also focuses on the critical path of the precedence graph, as is the case in Hu's algorithm for scheduling unit-length jobs. However, Gonzalez & Johnson [213] use a totally different list scheduling approach for solving problem P | pmtn, tree | Cmax. Their algorithm segregates the jobs into two classes. In one class there is what can be termed the 'backbone' of the problem, a superset of those jobs whose start and finish times are fixed in any optimal schedule. The other jobs can in general be scheduled with some freedom. Their algorithm runs in O(n log m) time. In the same spirit as their LRPT-FM algorithm for problem Q | pmtn | Cmax (see Section 5.1), Horvath, Lam & Sethi [276] solve problem Q2 | pmtn, prec | Cmax in O(mn²) time.
Lam & Sethi [319] adapt the Muntz-Coffman algorithm for problem P | pmtn, prec | Cmax and show that it has a worst-case ratio of 2 - 2/m. Similarly, in the case of uniform machines, Horvath, Lam & Sethi [276] prove that this algorithm has a ratio guarantee of √(3m/2), which is tight up to a constant factor. To improve the Muntz-Coffman algorithm, Jaffe [284] suggests scheduling jobs without unforced idleness by always using the fastest available machine. He proves that this improves the ratio guarantee to √m + 1/2.
In parallel to their approximation results for problem P | rj, prec, pj = 1 | Σ wj Cj, Hall, Schulz, Shmoys & Wein [233] show that for problem P | rj, prec, pmtn | Σ wj Cj a linear programming guided list scheduling algorithm has a ratio guarantee of 3; this bound is improved to 3 - 1/m if there are no (non-trivial) release dates.

5.4 On-Line Algorithms


5.4.1 Clairvoyant Scheduling
Preemption makes on-line scheduling quite successful. Observe that, for problem P | on-line-list, pmtn | Cmax, the main additional difficulty is to reserve appropriate spaces in anticipation of future jobs. In their algorithm for scheduling over a list, Chen, Van Vliet & Woeginger [95] maintain a certain pattern of machine loads during the whole process of scheduling, and they prove that if this pattern is a geometric sequence of the ratio m : (m - 1), then the algorithm has a competitive ratio of 1/(1 - (1 - 1/m)^m). This competitive ratio approaches e/(e - 1) ≈ 1.582 as m becomes large. They also show that such a competitive ratio is the best possible for every m ≥ 2.
On-line scheduling becomes much easier if jobs arrive over time: For the nearly on-line variant of problem P | on-line, rj, pmtn | Cmax in which the release date of the next job is always known, Gonzalez & Johnson [213] give a 1-competitive (i.e., optimal) algorithm. Sahni & Cho [459] and Labetoulle, Lawler, Lenstra & Rinnooy Kan [315] extend this result to the nearly on-line variant of problem Q | on-line, rj, pmtn | Cmax on uniform machines. Hong & Leung [255] show that an optimal, 1-competitive (fully) on-line algorithm for P | on-line, rj, pmtn | Cmax is actually quite easy to derive: whenever a new job arrives, all the active jobs which have arrived but have not yet been finished are rescheduled according to a slightly modified McNaughton's wrap-around rule. For the (fully) on-line version of Q | on-line, rj, pmtn | Cmax, Vestjens [547] shows that there exists a 1-competitive on-line algorithm if and only if the speeds satisfy s_{i-1}/s_i ≤ s_i/s_{i+1} for i = 2, ..., m - 1, where s_i is the speed of the ith fastest machine.
In contrast to the single machine case (see Section 4.1), it is still unknown whether preemption is beneficial for the parallel machine problem. Vestjens [547] shows that the competitive ratio of any on-line algorithm for P | on-line, rj, pmtn | Lmax is at least 12/11 ≈ 1.090.
In Section 4.2, we have seen that the shortest remaining processing time rule is 1-competitive for problem 1 | on-line, rj, pmtn | Σ Cj. This result does not extend to the case of more than one machine. In fact, Vestjens [547] shows that the competitive ratio of any on-line algorithm for problem P | on-line, rj, pmtn | Σ Cj is at least 22/21 ≈ 1.047. Phillips, Stein & Wein [417] describe a variation of SRPT and prove that it is 2-competitive. For problem P | on-line, rj, pmtn | Σ Fj, Leonardi & Raz [352] prove that the SRPT algorithm has a competitive ratio of O(log min{n/m, π}), where π = maxj pj / minj pj; this ratio is best possible up to a constant factor.

5.4.2 Non-Clairvoyant Scheduling


Due to lack of information, it is natural to adopt a strategy which at all times ensures that every uncompleted job has received an equal amount of processing. Not surprisingly, this round-robin strategy is actually very competitive for problem P | on-line-nclv, rj, pmtn | Σ Fj. Motwani, Phillips & Torng [387] prove that if all jobs are released at the same time, then the round-robin strategy has a competitive ratio of 2 - 2m/(n + m), for n ≥ m. Moreover, no on-line algorithm can have a better competitive ratio. Under a dynamic environment where jobs arrive over time, they show that round-robin is O(n/log n)-competitive, and no non-clairvoyant on-line algorithm is better than Ω(n^{1/3})-competitive. As might be expected, some knowledge of job sizes does help. Motwani, Phillips & Torng prove that if the ratio π of the largest to smallest job processing time is a constant, then a trivial algorithm, run-to-completion, which runs each job to completion for any given job order, is π-competitive; this is the best possible competitive ratio for non-clairvoyant on-line algorithms for this variant.
For preemptive scheduling where the processing of a job can be stopped and restarted later, but any previous processing is lost after a preemption, Shmoys, Wein & Williamson [488] give a (4 log m + 6)-competitive algorithm, which is the best possible up to a constant factor, for problem Qm | on-line-nclv, rj, pmtn | Cmax, and a (4 log n + 5)-competitive algorithm for problem Rm | on-line-nclv, rj, pmtn | Cmax.

6 Parallel Machine Problems: No Preemption


6.1 Minmax Criteria
6.1.1 Complexity
In contrast to the wide availability of efficient algorithms for preemptive scheduling and for scheduling unit-length jobs, polynomial solvability is rare in cases where preemption is forbidden. This can easily be seen from the fact that problem P2 | | Cmax is NP-hard (Garey & Johnson [189]) and problem P | | Cmax is strongly NP-hard (Garey & Johnson [188]). However, pseudo-polynomial algorithms can be derived for several problems with parallel machines if the number of machines is fixed. Using the approach of Rothkopf [452] and Lawler & Moore [338], a state variable is associated with each machine to represent its total workload (although one of these state variables can be eliminated for the case of identical and uniform machines). This allows pseudo-polynomial algorithms to be developed for problems Rm | | Cmax and Rm | | Lmax, which are also applicable to the case of identical and uniform machines.

6.1.2 Enumerative Algorithms

For problem P | | Cmax, Dell'Amico & Martello [133] propose a branch and bound algorithm. Assuming that the jobs are indexed so that p1 ≥ ... ≥ pn, a trivial lower bound is given by max{Σj pj/m, p1, pm + pm+1}. Using arguments from bin packing, a procedure is developed to improve upon this lower bound. The branching rule assigns a longest unscheduled job to one of the machines. Computational results show that the algorithm can solve instances with up to 10000 jobs, and is far more efficient than dynamic programming.
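The trivial bound is immediate to compute (a minimal Python sketch):

    def trivial_lower_bound(p, m):
        # max of: the average workload, the longest job, and the sum of the
        # m-th and (m+1)-st longest jobs (some machine must receive two of
        # the m+1 longest jobs).
        p = sorted(p, reverse=True)
        bound = max(sum(p) / m, p[0])
        if len(p) > m:
            bound = max(bound, p[m - 1] + p[m])
        return bound

    print(trivial_lower_bound([7, 6, 5, 4, 4, 3], 3))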
Carlier [80] develops a branch and bound algorithm for problem P | rj | Lmax. He adapts the generalized EDD heuristic for problem 1 | rj | Lmax to the corresponding parallel machine problem, and uses the heuristic solution to identify a critical set of jobs which then is used to compute a lower bound. Each job has an interval of availability, and a binary branching rule fixes a job to be either in the first half of the interval by assigning it a smaller deadline, or in the second half of the interval by assigning it a larger release date. Computational results for instances with 2, 3, 5 and 8 machines show that the algorithm is reasonably effective in solving instances with up to 100 jobs.
For problem R | | Cmax, Van de Velde [537] and Martello, Soumis & Toth [367] propose branch and bound algorithms and heuristics. Van de Velde computes lower bounds using a surrogate duality relaxation, where multipliers are obtained using an ascent procedure. On the other hand, Martello, Soumis & Toth derive various bounds using Lagrangean relaxation, and compute multipliers using subgradient optimization. Computational results with instances having up to 200 jobs and 20 machines show that problem difficulty increases as the number of machines increases. The results also demonstrate the superiority of the algorithm of Martello, Soumis & Toth over Van de Velde's algorithm.

6.1.3 Local Search

Brucker, Hurink & Werner [63, 64] propose the use of a primary and secondary neighborhood for problem P | | Cmax. For the primary neighborhood that reassigns a job on a most heavily loaded machine to a most lightly loaded machine, they show that a local optimum is obtained in O(n²) time. Their secondary neighborhood first makes an arbitrary reassignment of a job to another machine, and then applies the primary neighborhood to find a local optimum. Their computational results for instances with up to 5000 jobs show that a simulated annealing method that uses the secondary neighborhood yields better quality solutions than simulated annealing based solely on the primary neighborhood.
Hariri & Potts [243] propose a descent algorithm for problem R | | Cmax which uses a neighborhood that either reassigns a job on a most heavily loaded machine to another machine, or interchanges a job on a most heavily loaded machine with a job on another machine. Computational results indicate that this descent algorithm generates better quality solutions than various two-phase heuristics (Potts [426], Lenstra, Shmoys & Tardos [350]) which use linear programming in their first phase to schedule most of the jobs. Hariri & Potts also report on initial experiments which indicate that a more complicated neighborhood structure has little effect on solution quality.
Glass, Potts & Shade [204] use the same reassign and swap neighborhood as Hariri & Potts in simulated annealing and tabu search algorithms for problem R | | Cmax. They also describe a genetic algorithm in which a solution is represented by a list of machines to which the respective jobs are assigned, and a corresponding genetic descent algorithm in which the descent algorithm of Hariri & Potts is applied to each solution in every population. Computational results show that the simulated annealing, tabu search and genetic descent algorithms are roughly comparable, but the performance of the standard genetic algorithm is poor.

6.1.4 Approximation using List Scheduling


In a list scheduling algorithm, jobs are placed into a list, often in an arbitrary order. The algorithm always schedules the first available job on the list of unscheduled jobs whenever a machine becomes idle. More precisely, the availability of a job means that the job has been released and/or all its predecessors in the precedence graph have already been scheduled. Owing to its simplicity and the fact that any optimal schedule can be constructed by a list scheduling algorithm, list scheduling is by far the most popular approach for scheduling parallel machines. If a list scheduling algorithm operates with an arbitrary list of jobs, then we denote the algorithm by LS. Since LS requires no knowledge of active and future jobs, its power will be mainly discussed in the section dealing with on-line scheduling.
In the first paper on the worst-case analysis of scheduling heuristics, Graham [225] shows that algorithm LS for problem P | | Cmax has a worst-case ratio of 2 - 1/m. If the job list is sorted in order of non-increasing processing times, then this algorithm is known as LPT, and is shown by Graham [226] to have an improved worst-case ratio of 4/3 - 1/(3m). Another variant of LS, as suggested by Graham, is to schedule the k longest jobs optimally, and then apply LS to the list of remaining jobs: this gives a ratio guarantee of 1 + (1 - 1/m)/(1 + ⌊k/m⌋). Therefore, for fixed m, a family of these algorithms for different k's provides a PTAS, although the running time of O(n^{km}) is huge. Ibarra & Kim [278] show that LPT is no more than 1 + 2(m - 1)/n away from the optimum makespan, if n ≥ 2(m - 1)π, where (as in Section 5.4) π = maxj pj / minj pj.
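A minimal Python sketch of LS for P | | Cmax, with LPT obtained by pre-sorting the list (with no release dates, scheduling on the first available machine amounts to assigning each job to a currently least loaded machine):

    import heapq

    def list_schedule(p, m):
        # LS: assign each job, in list order, to a least loaded machine.
        loads = [(0.0, i) for i in range(m)]        # (load, machine) heap
        heapq.heapify(loads)
        assignment = [None] * len(p)
        for job, pj in enumerate(p):
            load, i = heapq.heappop(loads)
            assignment[job] = i
            heapq.heappush(loads, (load + pj, i))
        return max(load for load, _ in loads), assignment

    def lpt(p, m):
        # LPT: run LS on the list sorted by non-increasing processing time.
        order = sorted(range(len(p)), key=lambda j: -p[j])
        makespan, a = list_schedule([p[j] for j in order], m)
        return makespan, {order[k]: mach for k, mach in enumerate(a)}

    print(lpt([3, 5, 4, 2, 8], 2))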
If machines have different speeds, then Morrison [386] shows that LPT for problem Q | | Cmax has a worst-case ratio of max{α/2, 2}, where α = maxi si / mini si. A modified LPT algorithm, which assigns the current job to the machine on which it will finish first, is shown by Gonzalez, Ibarra & Sahni [212] to improve the ratio guarantee to 2 - 2/(m + 1). Subsequently, this guarantee is further improved to 19/12 by Dobson [141] and Friesen [178].

6.1.5 Bin-Packing Based Approximation


A second main approximation approach to problem P | | Cmax is to consider the dual problem, which is one of bin-packing. In a bin-packing problem, a number of items of various sizes are to be packed into a minimum number of bins of a certain given capacity. Note that problems of scheduling to minimize the makespan and of bin-packing share the same decision version. Naturally there is some common territory to explore.
Based on this principle of duality, Coffman, Garey & Johnson [111] propose a heuristic for problem P | | Cmax, called Multifit (MF), to find by binary search the minimum capacity of the m bins into which the n items can be packed by a packing heuristic, called first-fit decreasing (FFD). In the FFD heuristic, each iteration packs the largest remaining item into the first bin into which it fits. They prove that MF has a ratio guarantee of ρ + 2^{-k}, where ρ ≤ 1.22 and k denotes the number of binary search iterations. Later, Friesen [177] improves the bound to 1.2, and the minimum value of ρ is proved by Yue [571] to be 13/11. At the expense of a larger running time, the MF algorithm is refined by Friesen & Langston [180] to achieve a slightly better worst-case ratio of 72/61 + 2^{-k}.
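A minimal Python sketch of MF: binary search over the bin capacity, with FFD as the packing routine and the standard initial interval.

    def ffd_fits(p, m, capacity):
        # First-fit decreasing: place each item (largest first) into the
        # first bin with enough residual capacity; succeed if m bins suffice.
        bins = []
        for item in sorted(p, reverse=True):
            for i in range(len(bins)):
                if bins[i] + item <= capacity:
                    bins[i] += item
                    break
            else:
                bins.append(item)
                if len(bins) > m:
                    return False
        return True

    def multifit(p, m, iterations=20):
        low = max(sum(p) / m, max(p))        # no schedule can beat this
        high = max(2 * sum(p) / m, max(p))   # FFD always succeeds here
        for _ in range(iterations):
            mid = (low + high) / 2
            if ffd_fits(p, m, mid):
                high = mid
            else:
                low = mid
        return high

    print(multifit([7, 6, 5, 4, 4, 3], 3))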
A similar principle of duality between scheduling and bin-packing is considered by Hochbaum & Shmoys [251]. Given the machine 'capacity' d, a ρ-dual approximation algorithm (ρ > 1) produces a job 'packing' that uses at most the minimum number of machines of capacity d, at the expense of possible capacity violation by no more than (ρ - 1)d. Using a family of dual approximation algorithms, Hochbaum & Shmoys provide a PTAS for problem P | | Cmax.
The multifit approach and dual approximation approach are extended to uniform machines. Friesen & Langston [179] show that the ρ in the worst-case ratio of MF is between 1.341 and 1.40, which is later improved to 1.38 by Chen [90]. Extension of the dual approximation approach by Hochbaum & Shmoys has finally led to a PTAS for Q | | Cmax [252].

6.1.6 Approximation using Linear Programming


Extensive research has appeared in the literature on computing near-optimal
solutions for scheduling models by rounding optimal solutions to linear pro-
gramming relaxations. Many such applications are discussed throughout
this review. There are two general approaches for exploiting the solution to
a linear programming relaxation. The linear program is used either to guide
the assignment of jobs to machines, or to derive job priorities that are used
in constructing the schedule.
Extending the optimal solution to a linear programming relaxation by an enumerative process, Potts [426] obtains a 2-approximation algorithm for problem R | | Cmax when m is fixed. Lenstra, Shmoys & Tardos [350] extend Potts' approach by first establishing that the fractional solution to the linear programming relaxation can be rounded to a good integral approximation in polynomial time, thereby obviating the need for enumeration and removing the exponential dependence on m, and then deriving a 2-approximation algorithm even when m forms part of the input.
This approach is further extended to accommodate a more general ob-
jective criterion. Shmoys & Tardos [487] introduce a stronger rounding
technique than the one of Lenstra, Shmoys & Tardos to develop a polyno-
mial algorithm that can find a schedule with mean job completion time M
and makespan at most 2T, if a schedule with mean job completion time at
most M and makespan at most T exists.

6.1.7 Other Approaches for Approximation


Based on a dynamic programming algorithm for problem Pm | | Cmax in which the state variables at each stage i form a set S(i) of (m - 1)-tuples, representing the total workloads of the machines, Sahni [457] uses an interval partitioning method to restrict |S(i)|, and hence provides an O(n(n²/ε)^{m-1}) time FPTAS. In a similar vein, Horowitz & Sahni [275] derive an FPTAS for problems Qm | | Cmax and Rm | | Cmax, each with a similar running time. It is interesting to note that when the number of machines is part of the input these problems are strongly NP-hard, as mentioned previously in this section. In fact, Lenstra, Shmoys & Tardos [350] prove that, unless P=NP, it is impossible to derive a polynomial time approximation algorithm for problem R | | Cmax with a worst-case ratio that is strictly better than 3/2.
Using a completely different approach, Hall & Shmoys [234] introduce the notion of an outline, which is a partial characterization of a schedule. From the outline, it is possible to compute relatively simply and quickly an optimal or near-optimal solution. Based on this notion they construct a PTAS for problem P | rj | Lmax (assuming that all due dates are non-positive).

6.2 Minsum Criteria


6.2.1 Complexity
The best known non-trivial problem that is polynomially solvable is R | | Σ Cj. Horn [273] and Bruno, Coffman & Sethi [71] formulate this problem as a zero-one integer program that has a bipartite matching structure, and hence is solvable in O(n³) time. The formulation is based on the observation that the sum of completion times of jobs on the same machine is simply a weighted sum of their processing requirements on that machine, where the weight of any particular job processing requirement is the number of times it contributes to the individual completion times. This observation leads to simplifications for the case of identical or uniform machines.
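The matching structure is easy to exhibit in code: a job placed k-th from the end of a machine contributes k times its processing time on that machine to Σ Cj, so one column per (machine, position) slot yields a rectangular assignment problem. The sketch below solves it with scipy purely as an illustration; an optimal assignment automatically uses, on each machine, the positions closest to the end.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def min_total_completion_time(p):
        # p[i][j]: processing time of job j on machine i (R | | sum C_j).
        m, n = p.shape
        cost = np.empty((n, m * n))
        for i in range(m):
            for k in range(1, n + 1):  # position k from the end of machine i
                cost[:, i * n + (k - 1)] = k * p[i, :]
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].sum()

    p = np.array([[3.0, 2.0, 4.0],
                  [2.0, 5.0, 1.0]])
    print(min_total_completion_time(p))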
It seems that this is the best we can do in terms of polynomial solvability, since any extension to a more general criterion (even if there are only two identical machines) or to including additional job characteristics results in NP-hardness. In particular, problem P2 | | Σ wj Cj is NP-hard, as shown by Bruno, Coffman & Sethi [71], and Lenstra, Rinnooy Kan & Brucker [349], and strong NP-hardness of problem P2 | rj | Σ Cj can be derived from the corresponding result for problem 1 | rj | Σ Cj (see Section 4.2.1). For complexity results on minsum problems with precedence constraints, see Section 6.3.

6.2.2 Enumerative Algorithms

Various branch and bound algorithms for problem P | | Σ wj Cj appear in the literature. The earlier algorithms of Barnes & Brennan [42], Elmaghraby & Park [159], and Sarin, Ahn & Bishop [462] are based on a lower bound of Eastman, Even & Isaacs [155] that exploits the relationship with problem 1 | | Σ wj Cj. Elmaghraby & Park also derive a useful dominance rule: if pj ≤ pk and wj ≥ wk, then there exists an optimal schedule in which job j is started before or at the same time as job k. Webster [551, 552] derives lower bounds based on job splitting (see Section 4.2). Computational results show that his bounds are tighter than the lower bound of Eastman, Even & Isaacs, although no branch and bound algorithm is developed.
Using a time-indexed formulation with variables that indicate whether job j is processed in a unit-time interval [t - 1, t], Belouadah & Potts [48] derive a lower bound for problem P | | Σ wj Cj by performing a Lagrangean relaxation of the machine capacity constraints. They compute values of the multipliers by a simple constructive procedure, which allows the lower bound to be computed in polynomial time, even though the number of multipliers is pseudo-polynomial. Computational results show that the algorithm is effective in solving instances with up to 8 machines and 40 jobs, although problem difficulty increases as the number of machines increases.
Van den Akker, Hoogeveen & Van de Velde [534] propose an algorithm for problem P | | Σ wj Cj that employs column generation. This approach uses a set partitioning formulation in which a zero-one variable for each single machine schedule indicates whether it is included in the solution of the parallel machine problem. Since the number of single machine schedules is large, only a limited number are included in the formulation at the outset. Using the dual variables in the solution of the linear programming relaxation of the set partitioning problem, new single machine schedules are generated by an O(nP) dynamic programming pricing algorithm, where P = Σj pj. Computational results for instances with up to 10 machines and 100 jobs show that the column generation approach is more effective than the branch and bound algorithm of Belouadah & Potts, especially when the number of machines increases. An alternative column generation algorithm is developed independently by Chen & Powell [101], although it appears less effective than that of Van den Akker, Hoogeveen & Van de Velde. In another study, Chan, Kaminsky, Muriel & Simchi-Levi [85] show that the optimal value of the total weighted completion time is at most (√2 + 1)/2 times the value of the linear programming relaxation.

6.2.3 Approximation

As observed in Section 4.2, the SWPT rule guarantees optimality for problem 1 | | Σ wj Cj. If we apply list scheduling for problem P | | Σ wj Cj, where the list is constructed according to the SWPT rule, then the resulting heuristic has a worst-case ratio of (1 + √2)/2, which is proved by Kawaguchi & Kyan [296].
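A sketch of this heuristic (illustrative Python: SWPT order, then greedy assignment to a machine that becomes available first):

    import heapq

    def swpt_heuristic(jobs, m):
        # jobs: list of (p, w). Sort by p/w non-decreasing (SWPT), then
        # always place the next job on a machine that frees up first;
        # return the total weighted completion time.
        loads = [0.0] * m
        heapq.heapify(loads)
        total = 0.0
        for p, w in sorted(jobs, key=lambda j: j[0] / j[1]):
            start = heapq.heappop(loads)
            total += w * (start + p)
            heapq.heappush(loads, start + p)
        return total

    print(swpt_heuristic([(2, 3.0), (1, 1.0), (4, 2.0)], 2))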
In the same spirit as his approach that combines dynamic programming and interval partitioning for problem Pm | | Cmax, Sahni [457] constructs an O(n(n²/ε)^{m-1}) time FPTAS for problem Pm | | Σ wj Cj. Sahni also develops a similar scheme for the following problem, which is shown by Bruno, Coffman & Sethi [72] to be NP-hard: among all possible (m!)^{⌈n/m⌉} optimal schedules for Pm | | Σ Cj, one with smallest makespan is required.
With respect to approximation algorithms for problem P | rj | Σ wj Cj, significant progress has been made recently. Phillips, Stein & Wein [417] derive a (24 + ε)-approximation algorithm, which is based on some algorithmic and structural relationships between preemptive and non-preemptive schedules and linear programming relaxations of both. More recently, the ratio guarantee is improved to 4 - 1/m by Hall, Schulz, Shmoys & Wein [233] and by Queyranne [439].
For a much more general model R | rij | Σ wj Cj, where the release date of each job j depends on the machine to which it is assigned, Hall, Schulz, Shmoys & Wein [233] present a 16/3-approximation algorithm. Inspired by time-indexed integer programming formulations, they introduce the notion of an interval-indexed formulation, in which the decision variables merely indicate in which time interval a given job completes. Their algorithm combines techniques of rescaling and rounding the data, and builds on earlier research of Lenstra, Shmoys & Tardos [350] and Shmoys & Tardos [487] for constructing near-optimal solutions.
Further exploitation of the relationship between preemptive and non-preemptive schedules by Chakrabarti et al. [84] leads to an algorithm that converts a preemptive schedule with total completion time equal to C into a non-preemptive schedule with total completion time at most (7/3)C. When applied to the solution of the linear programming relaxation considered by Hall, Schulz, Shmoys & Wein [233] that is mentioned above, or to a dual ρ-approximation algorithm, this technique yields 3.5- and (2.89 + ε)-approximation algorithms, respectively, for problem P | rj | Σ Cj. These approximation results have been further improved, even for the on-line environment (see Section 6.4.2).

Alon, Azar, Woeginger & Yadid [20] derive a PTAS for the problem of minimizing the sum of the machine completion times taken to the power α, where α > 0. Alon, Azar, Woeginger & Yadid [21] discuss the more general problem P | | Σ f(Ci); here the Ci are the machine completion times and f is a function that maps the non-negative reals into the non-negative reals. Assume that f is non-decreasing, convex, and fulfills

    ∀ε > 0 ∃δ > 0 ∀x, y ≥ 0: |x - y| ≤ δy ⇒ |f(x) - f(y)| ≤ εf(y).

This condition essentially states that f grows in a reasonably moderate way. They show that under these assumptions, problem P | | Σ f(Ci) possesses a PTAS.
Hoogeveen, Schuurman & Woeginger [263] establish the MAX SNP-hardness of problem R | rj | Σ Cj; hence, unless P=NP, this problem cannot possess a PTAS. Bartal et al. [45] discuss a non-standard scheduling problem on m identical machines: the scheduler may reject jobs from processing; in this case, a job-dependent penalty has to be paid. The goal is to minimize the total penalty incurred plus the makespan of the processed jobs. They give an FPTAS for the case where m is not part of the input, and a PTAS for the case where m is part of the input.

6.3 Precedence Constraints


In the presence of precedence constraints, all problems of interest become NP-hard. In particular, Sethi [472] proves that problem P2 | tree | Σ Cj is NP-hard, and Du, Leung & Young [153] prove that problem P2 | tree | Cmax is strongly NP-hard. These complexity results remain true even for precedence constraints in the form of chains [153].
Approximation algorithms mainly involve list scheduling. For problem P | prec | Cmax, the worst-case ratio of LS is not affected by the presence of arbitrary precedence constraints, and remains as 2 - 1/m. Graham [227] shows that it is even the case that

Cmax(LS)/C*max(pmtn) ≤ 2 - 1/m,

where C*max(pmtn) denotes the optimal schedule length if preemption is allowed, which is a lower bound on C*max. Surprisingly, Graham [227] observes that this ratio does not improve if the priority list is based on the critical path (CP) rule, i.e., if one always schedules the job with the longest outgoing chain of successors. However, if the precedence constraints take the form of a tree or of chains, then Kunde [314] shows that the performance of CP improves slightly since the worst-case ratios for these cases are 2 - 1/(m - 1) and 5/3, respectively.
If processing times are taken into account, then Kaufman [295] shows that for tree-type constraints,

    Cmax(CP) ≤ C*max(pmtn) + (1 - 1/m)π,

where, as before, π = maxj pj / minj pj. By improving the way a list scheduling rule is guided by an optimal solution to a linear programming relaxation (cf. Section 5.3), Hall, Schulz, Shmoys & Wein [233] provide a 7-approximation algorithm for problem P | rj, prec | Σ wj Cj. This performance guarantee is improved to 5.33 + ε by Chakrabarti et al. [84] after they have incorporated the randomization technique of Goemans & Kleinberg [207]. Building on the work of Lenstra & Rinnooy Kan [346], Hoogeveen, Schuurman & Woeginger [263] show that, unless P=NP, problem P | prec | Σ Cj does not possess a polynomial time approximation algorithm with a better ratio guarantee than 8/7.

6.4 On-Line Algorithms


6.4.1 Scheduling Over a List
The performance of on-line algorithms without preemption is less understood than that of on-line algorithms with preemption. We start with Graham's LS algorithm [225] for problem P | on-line-list | Cmax, which is (2 - 1/m)-competitive. Since LS can be regarded as a non-clairvoyant on-line algorithm, we discuss it in more detail in a later subsection. Faigle, Kern & Turán [163] observe that for m = 2 or m = 3, the competitiveness of LS is the best possible. Their work triggered progress on P | on-line-list | Cmax for m ≥ 4. From the worst-case examples for LS, the worst scenario is the arrival of a very long job at the time when the machine loads are evenly balanced (ironically, this is precisely what a good schedule should look like). This observation suggests that a pure 'greedy' approach to the allocation of jobs is not ideal, and that one machine should be lightly loaded to accommodate any possible long jobs. Using this philosophy, Galambos & Woeginger [184] and Chen, Van Vliet & Woeginger [94] provide algorithms with better competitive ratios than LS for all m ≥ 4. For small numbers of machines (m ≤ 20), the competitive ratios of the algorithms of Chen, Van Vliet & Woeginger are currently the best available.

For large m, the competitive ratios of the algorithms of Galambos & Woeginger, and Chen, Van Vliet & Woeginger for problem P | on-line-list | Cmax still approach 2. The first successful attack against the barrier of 2 is due to Bartal, Fiat, Karloff & Vohra [44]. Instead of leaving only a constant number of machines lightly loaded, they leave a constant fraction of machines lightly loaded. The competitive ratio of the resulting algorithm is 1.986. Later, this is improved to 1.945 by Karger, Phillips & Torng [293] and to 1.923 by Albers [16]. Albers also proves a lower bound of 1.852 on the competitive ratio for large numbers of machines (m ≥ 80).
Note that in the case of identical machines, assigning a job to the first available machine is equivalent to assigning the job to the machine on which it finishes first. Extending LS in this way to an on-line algorithm LS' for problem Q | on-line-list | Cmax, Cho & Sahni [105] prove that the competitive ratio of LS' is (1 + √5)/2 for m = 2 and 1 + √((m - 1)/2) for 3 ≤ m ≤ 6. It is easy to check that for m = 2 and m = 3, these ratios are the best possible. For other values of m, Aspnes et al. [25] prove that LS' is O(log m)-competitive. Results of Cho & Sahni [105] show that this bound is tight up to a constant factor. Aspnes et al. [25] improve the LS' rule by a guessing and doubling strategy that leads to an 8-competitive algorithm. By a refinement of the doubling strategy, Berman, Charikar & Karpinski [50] improve the competitive ratio to 3 + √8 ≈ 5.828, and they derive a lower bound of 2.4380.
Both algorithms LS and LS' are analyzed in the case of only two speeds 1 and s for problem Q | on-line-list | Cmax. For s < 1, Liu & Liu [358] show that LS is (s + (m - 1)/(s + m - 1))-competitive. For s > 1, Cho & Sahni [105] prove that the competitive ratio of LS' is 3 - 4/(m + 1) for all m ≥ 3. LS' is improved recently by Li & Shi [355] with a reduction of its competitive ratio by a constant independent of m. Their method is similar to those used by Galambos & Woeginger [184] and Chen, Van Vliet & Woeginger [94] for improving LS in the case of identical machines.
Epstein et al. [161] elaborate on problem Q2 | on-line-list | Cmax. Suppose that the speeds of the two machines are 1 and s ≥ 1. It is easy to see that list scheduling is the best deterministic on-line algorithm for any choice of s: denote by φ = (√5 + 1)/2 ≈ 1.6180 the golden ratio. Then for s ≤ φ, the best possible competitive ratio is 1 + s/(s + 1), increasing from 3/2 to φ. For s ≥ φ, the best possible competitive ratio is 1 + 1/s, decreasing from φ to 1; this is the same ratio as for the trivial algorithm which puts all jobs on the faster machine. It turns out that list scheduling is also the best possible randomized algorithm for all speeds s ≥ 2. On the other hand, for any speed s < 2, randomized algorithms are provably better than deterministic ones.

For problem R | on-line-list | Cmax, Aspnes et al. [25] first show that natural greedy approaches are far from optimally competitive by proving that the competitive ratio of LS' is n. Then they describe an O(log m)-competitive algorithm. Results of Azar, Naor & Rom [28] show that this algorithm is the best possible result up to a constant factor.
Awerbuch et al. [27] consider the on-line scheduling of unrelated machines so as to minimize the Lp norm (p ≥ 1) of the machine completion times. They show that a simple algorithm, which always assigns the current job to the machine with minimum increase to the machine finish time, is (1 + √2)-competitive for the Euclidean norm and is O(p)-competitive in other Lp norms, and it is the best possible up to a constant factor. In the special case of identical machines with the Euclidean norm, Avidor, Azar & Sgall [26] prove that assigning the current job to the least loaded machine is the best strategy with a competitive ratio of √(4/3). For the case that m is known to be sufficiently large, they provide an improved algorithm that is (√(4/3) - ε)-competitive for some constant ε > 0. Moreover, for any Lp norm and for any sufficiently large m, they give an algorithm that beats list scheduling.
Bartal et al. consider on-line scheduling over a list on parallel machines with job penalties (cf. Section 6.2.3). For m = 2, they derive a best possible competitive ratio of (√5 + 1)/2 ≈ 1.6180. For an unbounded number of machines, they derive a best possible competitive ratio of (√5 + 3)/2 ≈ 2.6180. The best possible competitive ratio for fixed m ≥ 3 is unknown.

6.4.2 Scheduling Over Time


For the objective of minimizing the makespan, Shmoys, Wein & Wil-
liamson [488] describe a general technique to convert a non-clairvoyant
scheduling algorithm for a problem with all jobs released at the same time to
a non-clairvoyant on-line algorithm that can handle unknown release dates.
They show that the quality of the schedule thus constructed is within a
factor of 2 of the quality of schedules constructed by the off-line algorithm
in the simpler environment. This technique applies not only to parallel
machine scheduling, but also to the entire class of shop scheduling problems.
However, since any on-line algorithm that is designed by application of this
technique cannot be guaranteed to be better than 2-competitive, we may
expect to find better on-line algorithms for specific problems.
For problem P|on-line, rj|Cmax, Chen & Vestjens [96] prove that an
adapted LPT algorithm, which always schedules an available job with
the largest processing time once a machine becomes available, is (3/2)-
competitive. They also show that no on-line algorithm can be better
than 1.347-competitive for arbitrary m, or better than 1.382-competitive
for m = 2.
We next consider the more general problem P|on-line, rj|Lmax, where
all due dates are assumed to be non-positive when discussing competitive
ratios. Gusfield [230] proves that the difference between the maximum late-
ness produced by an on-line heuristic and the minimum possible maximum
lateness is at most (2 − 1/m)pmax. This directly implies a weak bound of
3 − 1/m on the competitive ratio of the algorithm. Hall & Shmoys [234]
observe that LS is 2-competitive even in the presence of precedence con-
straints. Vestjens [547] shows that no on-line algorithm can be better than
(3/2)-competitive.
For problem P|on-line, rj|ΣCj, Phillips, Stein & Wein [417] give a
3-competitive algorithm, which repeatedly converts some partial pseudo-
schedules for problem P|on-line, rj, pmtn|ΣCj into non-preemptive partial
schedules by list scheduling jobs non-preemptively in the order of their com-
pletion times. Vestjens [547] shows that 1.309 is a lower bound on the
competitive ratio of any on-line algorithm. For specific values of m, he gives
improved lower bounds.
Using a dual ρ-approximation algorithm (see Section 6.1.5) as a sub-
routine, Hall, Shmoys & Wein [236] construct a so-called greedy-interval
framework, which yields a 4ρ-competitive algorithm for minimizing the
total weighted completion time. In particular, this results in (4 +
ε)- and 8-competitive algorithms for problems P|on-line, rj|ΣwjCj and
R|on-line, rj|ΣwjCj, respectively. Improving and extending this on-line
framework, Chakrabarti et al. [84] give a randomized on-line algorithm for
bicriteria scheduling of identical machines. They show that, given a dual
ρ-approximation algorithm for problem P||ΣwjCj, their algorithm yields
a schedule that is simultaneously within a factor of 4ρ and 2.89ρ of the mini-
mum makespan and total weighted completion time, respectively. For a very
general class of scheduling models, Stein & Wein [500] provide a structural
proof of the existence of schedules that are simultaneously within a factor
of 2 of the minimum total weighted completion time and of the minimum
makespan.
By using a preemptive one-machine relaxation, Chekuri, Motwani,
Natarajan & Stein [89] provide a (3 − 1/m)-competitive algorithm for prob-
lem P|on-line, rj|ΣCj. Further, they show that the algorithm can be im-
proved to be 2.85-competitive by incorporating a modified greedy rule.
For uniform machines, Jaffe [283] presents an O(√m)-competitive al-
gorithm for problem Q|on-line, rj, prec|Cmax. Since this algorithm is a
variant of LS, it is non-clairvoyant and is discussed further in Section 6.4.3.
Based on a linear programming technique for estimating the speed at
which each job should be run, and on another variant of the LS algorithm
that can exploit this additional information, Chudak & Shmoys [110] give
an O(log m)-competitive algorithm. They also extend this result to an
O(log m)-competitive algorithm for problem Q|on-line, rj, prec|ΣwjCj.

6.4.3 Non-Clairvoyant Scheduling


Observe that LS is a non-clairvoyant on-line scheduling algorithm. A clas-
sical result of Graham [225] yields that LS is (2 − 1/m)-competitive for
Pm|on-line-list-nclv|Cmax; LS is the best possible non-clairvoyant algorithm
for this problem. Shmoys, Wein & Williamson [488] prove that even for the
preemptive problem Pm|on-line-list-nclv, pmtn|Cmax, no better competitive
ratio can be reached. LS is also (2 − 1/m)-competitive for all kinds of other
variants with release dates, precedence constraints and preemption. As an
interesting consequence, the usual fundamental difference between the pre-
emptive and the non-preemptive models disappears when jobs are scheduled
non-clairvoyantly on-line.
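For concreteness, the following Python sketch (ours) simulates LS on
identical machines: each job in the list is placed on a machine that becomes
free first, so only the current machine loads are needed, which is exactly
why LS works non-clairvoyantly.

    import heapq

    def list_schedule_makespan(processing_times, m):
        loads = [0.0] * m          # one entry per machine
        heapq.heapify(loads)
        for p in processing_times:
            least = heapq.heappop(loads)   # machine that becomes free first
            heapq.heappush(loads, least + p)
        return max(loads)

    # A Graham-style bad instance for m = 3: LS gives 5, the optimum is 3,
    # and 5/3 = 2 - 1/m.  (The simulation uses the processing times to
    # advance the loads, although LS itself never looks ahead at them.)
    print(list_schedule_makespan([1, 1, 1, 1, 1, 1, 3], 3))  # 5.0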
In terms of π = maxj pj / minj pj, the competitive ratio of algorithm
LS for problem Pm|on-line-list-nclv|Cmax, which we denote by Rm(LS), is
derived by Achugbue & Chin [5]. If π ≤ 3, then

    Rm(LS) = 5/3                 if m = 3, 4,
             17/10               if m = 5,
             2 − 1/(3⌊m/3⌋)      if m ≥ 6.

If π ≤ 2, then

    Rm(LS) = 3/2                 if m = 2, 3,
             5/3 − 1/(3⌊m/2⌋)    if m ≥ 4.

The performance of LS can be significantly worse if the machines have differ-
ent speeds. For problem Q|on-line-list-nclv|Cmax, Liu & Liu [358, 359, 360]
show that Rm(LS) = 1 + σ − maxi si/Σi si, where σ = maxi si/mini si.
This bound also holds in the presence of precedence constraints. Note that
when all the speeds are equal, this competitive ratio reduces precisely to
Graham's result [225].
For problem Q|on-line-list-nclv, prec|Cmax, Jaffe [283] demonstrates that
it is beneficial to selectively disregard the slow machines and to apply algo-
rithm LS only to those machines whose speeds are within a factor of √m of
the fastest machine. Jaffe proves that this variant of LS is (√m + O(m^{1/4}))-
competitive. This algorithm is shown to be asymptotically optimal by Davis
& Jaffe [126], since a non-clairvoyant algorithm can never be better than
√m-competitive. In a similar vein, Davis & Jaffe develop a non-clairvoyant
variant of LS for R|on-line-list-nclv|Cmax and prove that it has a competitive
ratio of 2.5√m + 1 + 1/(2√m).
Note that all the above non-clairvoyant scheduling results are applicable
in the case that all jobs are released at the same time. Hence, we may invoke
the general technique of Shmoys, Wein & Williamson [488] to convert a non-
clairvoyant algorithm for scheduling jobs that arrive at the same time into
a non-clairvoyant algorithm for scheduling jobs that arrive over time (see
Section 6.4.2). This implies that all of these results can be carried over to
the situation where jobs arrive over time, while losing a factor of at most 2
in the competitive ratios.
Shmoys, Wein & Williamson [488] discuss a variant of non-clairvoyant
scheduling where job restarts are allowed. For problem Q|on-line-nclv, rj|
Cmax, they describe a (4 log m + 6)-competitive on-line algorithm, which is
the best possible up to a constant factor. For problem R|on-line-nclv, rj|
Cmax (where for each job the relative speeds of the machines are known),
they obtain an algorithm whose competitive ratio 8 log n + 17 depends on
the number of jobs.

7 Multi-Stage Problems
In a multi-stage scheduling problem, the processing of each job is split into
several operations, and each operation is assigned to one of s stages. For
every stage, there is a corresponding machine type; operations in this stage
can only be processed by machines of this type.
The three basic models of multi-stage scheduling (open shop, flow shop,
and job shop) are introduced in Section 2.1. In the basic model, there is
exactly one machine available for each stage. In an open shop and in a flow
shop, each job j comprises s operations O1j, ..., Osj, where operation Oij
can only be processed at stage i by machine type i. The processing time of
operation Oij is denoted by pij. In a job shop, each job j consists of a chain
of operations. Every operation is assigned to a stage; the same stage may
occur several times in the chain. In the multiprocessor variants of a shop


problem, for every stage i there are mi identical machines that operate in
parallel. In these variants, an operation at stage i may be executed by any
of these mi machines.
With respect to job precedence constraints, recall that the requirement
that job j precedes job k demands that job k cannot start before job j has
finished. In the context of multi-stage problems, this job dependence means
that none of the operations of job k can start before all operations of job j have finished.
We note that there are different interpretations of precedence constraints
for multi-stage problems, the main one of which is as follows: that job j
has precedence over job k only requires that, on any machine, operations of
job j have to finish before those of job k can start. In this chapter, we are
only concerned with precedence constraints of the former type. For other
types of precedence constraints, we refer the reader to Strusevich [508] as a
starting point.
By an entry "op = k" or "op ≤ k" in the second field of the scheduling
notation, we denote the situation where every job consists of precisely k
operations or at most k operations, respectively. Moreover, let Pj = Σi pij
be the total processing time or length of job j, and let Pmax = maxj Pj
be the length of a longest job. Further, for the open shop and flow shop,
Πi denotes the total processing time at stage i, where Πi = Σj pij, and
Πmax = maxi Πi.

7.1 The Open Shop


7.1.1 Complexity
Gonzalez & Sahni [215] show that O2||Cmax allows a very simple polynomial
time solution. An obvious lower bound on the makespan is

    max{ Π1, Π2, maxj (p1j + p2j) },

i.e., the larger of the two machine loads and the length of a longest job.
They prove that there always exists a schedule with makespan that is equal
to this lower bound, and that this schedule can be found in linear time.
Since this lower bound is also a lower bound for the corresponding preemp-
tive problem, the same algorithm also solves problem O2|pmtn|Cmax in O(n)
time. Based on similar arguments and on extensive case distinctions, Shak-
levich & Strusevich [484] develop linear time algorithms for the preemptive
and non-preemptive scheduling of a two-machine open shop to minimize an
arbitrary non-decreasing objective function of the two machine completion
times. In contrast, Sahni & Cho [460] prove strong NP-hardness of the
no-wait problem O2|no-wait|Cmax, in which the second operation of each job
must start immediately after the completion of its first operation.
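The lower bound itself is trivial to compute; the non-trivial part of the
result of Gonzalez & Sahni is that it is always attained. A minimal sketch
(our own code; the input is two parallel lists of operation times):

    def o2_lower_bound(p1, p2):
        # Maximum of the two machine loads and the longest job length.
        return max(sum(p1), sum(p2), max(a + b for a, b in zip(p1, p2)))

    print(o2_lower_bound([4, 2, 3], [1, 5, 2]))  # 9 (machine 1 load dominates)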
Most remaining non-preemptive open shop problems are NP-hard. For
example, Gonzalez & Sahni [215] prove that problem O3||Cmax is NP-hard.
However, deciding whether O3||Cmax is strongly NP-hard is an outstand-
ing open problem. Open shop problems that are known to be strongly
NP-hard include the decision problem O||(Cmax ≤ 4) and hence O||Cmax (Williamson et
al. [559]), O2|rj|Cmax (Lawler, Lenstra & Rinnooy Kan [334]), O2|tree|Cmax
(Lenstra [345]), O2||Lmax (Lawler, Lenstra & Rinnooy Kan [334]), and
O2||ΣCj (Achugbue & Chin [6]). Adiri & Aizikowitz [9] show that if
some machine h dominates another machine i in the sense that minj phj ≥
maxj pij, then O3||Cmax is solvable in polynomial time.
Many non-preemptive open shop problems are NP-hard, even if ev-
ery job comprises only two operations (with the other operations missing).
Gonzalez & Sahni [215] prove that O4|op = 2|Cmax is NP-hard. The com-
putational complexity of O3|op = 2|Cmax is unknown. Note that the jobs
in this problem are of three types (jobs that require stages 1 and 2, stages 1
and 3, and stages 2 and 3, respectively). Drobouchevitch & Strusevich [144]
prove that if there are only jobs of two such types, then O3|op ≤ 2|Cmax is
polynomially solvable.
Allowing preemption makes some open shop problems easier. For ex-
ample, problem O|pmtn|Cmax can be solved in polynomial time by apply-
ing techniques from matching theory (Gonzalez & Sahni [215]). Lawler
& Labetoulle [333] and Gonzalez [210] develop other network formulations
and speed up the running time for solving this problem. Similar tech-
niques are used by Lawler, Lenstra & Rinnooy Kan [334] to obtain lin-
ear time algorithms for problems O2|pmtn|Lmax and O2|rj, pmtn|Cmax. By
combining a linear programming formulation with binary search, Cho &
Sahni [106] derive polynomial time algorithms for problems O|pmtn|Lmax
and O|rj, pmtn|Cmax. The algorithm of Cho & Sahni yields preemptive
schedules that often mix the operations of jobs, i.e., one operation is pre-
empted, and before this operation is resumed and completed, another op-
eration that belongs to the same job is started and preempted, and so on.
Interestingly, Cho & Sahni [106] prove that problem O3|rj, pmtn|Cmax be-
comes NP-hard if the mixing of operations is forbidden. Du & Leung [150]
show that problem O2|pmtn|ΣCj is NP-hard, and Liu & Bulfin [356] es-
tablish that problem O3|pmtn|ΣCj is strongly NP-hard.
A number of polynomially solvable cases of open shop problems have
been identified. For problem Om||Cmax, it turns out that if the ratio
Πmax/pmax (i.e., the ratio between the largest machine load and the longest
operation) is sufficiently large, then the problem becomes easy. More pre-
cisely, Fiala [168] applies deep results from graph theory together with so-
called integer making techniques to prove that if Πmax ≥ (16m′ log m′ +
5m′)pmax, where m′ is the smallest power of 2 larger than m, then the
optimal makespan is equal to Πmax. Moreover, Fiala's result also yields
a polynomial time algorithm for constructing an optimal schedule for this
special case. By applying geometric methods (so-called compact vector sum-
mation techniques) to problem O3||Cmax, Sevastianov [478] proves that if
Πmax ≥ 7pmax, then the optimal makespan is equal to Πmax. His results can
also be translated into a polynomial time algorithm for this special case.
We refer to Sevastianov [473, 476, 477] for several other results in the same
spirit and for the corresponding references in the Soviet literature. We note
that these techniques have been successfully applied to the approximation
of flow shop problems to obtain absolute performance guarantees (see Sec-
tion 7.2.4).
Open shops in which all operations have unit processing times tend to
be much easier to solve than their unrestricted counterparts. Unit processing
time problems are closely related to Latin squares and to Latin rectangles;
various techniques, such as those from edge coloring in bipartite graphs and
from matching theory, are useful. For example, Liu & Bulfin [357] show that
problems O|rj, pij = 1|Cmax, O|pij = 1|Lmax, O|pij = 1|ΣTj and O|pij =
1|ΣUj are solvable in polynomial time. For more information and numerous
references, we refer the reader to the survey of Kubiak, Sriskandarajah &
Zaras [312], and to the articles of Gonzalez [211], Liu & Bulfin [357], and
Brucker, Jurisch & Jurisch [66].
We now consider multiprocessor variants. Problem O2(P2, P1)||Cmax
(and by symmetry O2(P1, P2)||Cmax) is easily seen to be NP-hard, since it
contains P2||Cmax as a subproblem. It is unknown whether these two open
shop problems are strongly NP-hard. Lawler, Luby & Vazirani [336] prove
that O(P)|pmtn|Cmax is polynomially solvable. They also derive similar
polynomial time algorithms for generalizations with uniform machines and
with unrelated machines at each stage.

7.1.2 Enumerative Algorithms


In the design of enumerative and heuristic methods for problem O||Cmax,
the following disjunctive graph formulation is useful. For each operation Oij,
there is a vertex with weight pij. There is an undirected edge correspond-
ing to each pair of operations that belong to the same job, and to each pair
of operations that require the same machine. Choosing the order in which
each job's operations are performed and the processing order on every ma-
chine corresponds to orienting the edges to produce a directed acyclic graph.
Clearly, it is required to find an orientation that minimizes the length of a
longest or critical path, where the length is defined as the sum of weights of
vertices which lie on the path.
During the course of an algorithm, some of the disjunctive edges are
oriented. For any operation Oij, its head represents an earliest start time.
For example, the length of a longest path in the graph of oriented
edges that ends at a predecessor of the vertex corresponding to Oij is a possible value
for the head. Similarly, the tail for Oij is the minimum time that must
elapse between the completion of Oij and the completion of the last oper-
ation. A possible value for the tail is the length of a longest path from a
successor of the vertex corresponding to Oij. Heads and tails are useful in
the computation of lower bounds.
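A head, for instance, can be computed by a longest-path calculation over
the already-oriented edges, which form an acyclic graph. The sketch below
is our own illustration (it uses the standard-library graphlib module,
available from Python 3.9); it processes vertices in topological order, and
tails are obtained symmetrically by running the same computation on the
reversed graph.

    from graphlib import TopologicalSorter

    def compute_heads(weight, preds):
        # weight: vertex -> processing time p_ij of the operation;
        # preds: vertex -> set of predecessors along oriented edges.
        head = {v: 0 for v in weight}
        for v in TopologicalSorter(preds).static_order():
            for u in preds.get(v, ()):
                head[v] = max(head[v], head[u] + weight[u])
        return head

    w = {"a": 3, "b": 2, "c": 4}
    print(compute_heads(w, {"b": {"a"}, "c": {"a", "b"}}))
    # {'a': 0, 'b': 3, 'c': 5}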
Brucker, Hurink, Jurisch & Wostmann [62] propose a branch and bound
algorithm that relies heavily on the disjunctive graph formulation. They use
a heuristic method to construct a feasible solution at each node of the search
tree, and their branching rule is based on the following observations as to
how the heuristic schedule can be improved. A critical path for the heuristic
schedule contains blocks, where a block is a maximal set of vertices that
each correspond to operations of the same job, or to operations that each
require the same machine. To obtain an improved schedule, the operations in some
block must be reordered so that a different operation is sequenced either
before all the other operations in the block or after all the other operations
in the block. Thus, each branch of the search tree introduces precedence
constraints to enforce such a reordering within some block. Lower bounds
are computed by considering a subproblem for each of the machines. For
a given machine, only the operations on that machine, together with their
heads and tails, are considered. The solution of problem 1|rj, pmtn|Lmax,
where the release date is the head of the operation and the due date is minus
the tail, provides a lower bound (a sketch of the preemptive EDD rule that
solves this problem is given below). Additional lower bounds are computed in
a similar way for each job by considering all of its operations. Together with
their heads and tails, scheduling these operations is equivalent to solving
problem 1|rj, pmtn|Lmax. Computational results show that the algorithm is
effective in solving instances with up to 10 jobs and 10 machines.
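The preemptive relaxation just mentioned is solved by the classical
preemptive EDD (earliest due date) rule: whenever a job is released, the
machine switches to the available job with the most urgent due date. A
minimal sketch, under our own input format:

    import heapq

    def preemptive_edd_lmax(jobs):
        # jobs: list of (release, processing, due); returns the optimal Lmax.
        jobs = sorted(jobs)                      # by release date
        ready, t, lmax, i = [], 0, float("-inf"), 0
        while i < len(jobs) or ready:
            if not ready:                        # machine idle: jump ahead
                t = max(t, jobs[i][0])
            while i < len(jobs) and jobs[i][0] <= t:
                _, p, d = jobs[i]
                heapq.heappush(ready, (d, p))    # most urgent due date on top
                i += 1
            d, p = heapq.heappop(ready)
            run = p if i == len(jobs) else min(p, jobs[i][0] - t)
            t += run
            if run < p:                          # preempted by the next release
                heapq.heappush(ready, (d, p - run))
            else:
                lmax = max(lmax, t - d)
        return lmax

    print(preemptive_edd_lmax([(0, 4, 8), (1, 2, 3)]))  # 0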

7.1.3 Local Search


There are relatively few studies on local search and other heuristics for
problem O||Cmax. Taillard [521] presents detailed computational results for
a tabu search algorithm, but does not provide details of his method. Brasel,
Tautenhahn & Werner [57] develop two types of heuristics. The first heuris-
tic solves a sequence of bipartite matching problems, each of which is used
to schedule min{m, n} operations, each belonging to a different job and re-
quiring a different machine. Even though different objective functions for
the matching problem are tested, the solution quality is poor. The second
heuristic uses insertion. Specifically, all possible positions to insert the next
operation are considered, and a position that yields the smallest makespan
among the resulting partial schedules is selected. In some variants of the
algorithm, several of the best positions are retained for further considera-
tion. The operations are inserted in non-increasing order of their processing
times, but with the added constraint that the first min{m, n} operations
must be for different jobs and require different machines. Computational
results for instances with m = n show that the insertion heuristic often gen-
erates superior solutions to Taillard's tabu search method, and is therefore
preferred.

7.1.4 Approximation: Ratio Guarantees


A feasible schedule for the open shop problem is called dense when any ma-
chine is idle if and only if there is no job that currently could be processed
on that machine. This concept is introduced by Anna Racsmany (see Barany
& Fiala [40]), who observes for problem O||Cmax that the makespan of any
dense schedule is at most twice the optimal makespan. This result can also
be derived as a corollary of a more general result of Aksjonov [15]. It
is conjectured that, for every m ≥ 2, the makespan of a dense schedule for
problem Om||Cmax is at most 2 − 1/m times the optimal makespan. Chen
& Strusevich [93] prove this conjecture for m ≤ 3.
Sevastianov & Woeginger [480] present a PTAS for problem Om||Cmax.
It is unknown whether there exists an FPTAS for Om||Cmax. For the gen-
eral problem O||Cmax, where the number of machines is part of the input,
Williamson et al. [559] prove that the existence of a polynomial time approx-
imation algorithm with worst-case ratio strictly less than 5/4 would imply
P=NP. This is a consequence of the fact that, unless P=NP, it is impos-
sible to decide in polynomial time whether there exists a schedule of length 4
for an input where all processing times are integer. Williamson et al. also
show that there is a polynomial algorithm for determining whether there exists a
schedule of length 3. Hoogeveen, Schuurman & Woeginger [263] prove that,
unless P=NP, problem O||ΣCj does not possess a PTAS.
For problem O(P)||Cmax, Schuurman & Woeginger [471] give a simple
2-approximation algorithm that carries over the concept of dense schedules
of Racsmany to the multiprocessor case. For problem O2(P)||Cmax with
two stages, they give an improved approximation algorithm with a ratio
guarantee of 3/2 + ε. The PTAS of Sevastianov & Woeginger [480] can be
generalized to problem Os(Pm)||Cmax, in which there are a constant number
of stages and a constant number of machines per stage.
Chen, Vestjens & Woeginger [97] consider on-line open shop problems.
They provide approximation algorithms for problems O2|on-line, rj|Cmax
and O2|on-line, rj, pmtn|Cmax with competitive ratios of 3/2 and 5/4, re-
spectively; both results are the best possible. They also show that, for prob-
lems O2|on-line-nclv, rj|Cmax and O2|on-line-nclv, rj, pmtn|Cmax, a greedy
algorithm provides the best possible competitive ratio of 3/2 in each case.
Chen and Woeginger [98] give a 1.875-competitive algorithm for problem
O2|on-line-list|Cmax, and they show a lower bound of (√5 + 1)/2 ≈ 1.618
on the competitive ratio of any algorithm for this problem. For problem
O2|on-line-list, pmtn|Cmax, they design a best possible (4/3)-competitive al-
gorithm.

7.2 The Flow Shop


7.2.1 Complexity
A permutation schedule for a flow shop instance is a schedule in which
each machine processes the jobs in the same order. Conway, Maxwell &
Miller [114] show that, for any instance of problem F||Cmax, there always
exists an optimal schedule with the same processing order on the first two
machines and with the same processing order on the last two machines. Con-
sequently, if there are only two or three machines, then problem F||Cmax
has an optimal solution that is a permutation schedule. An analogous state-
ment does not hold for four machines: for two jobs with processing times
(4, 1, 1, 4) and (1, 4, 4, 1), the optimal schedule has a makespan of 12, whereas
the best permutation schedule has a makespan of 14. More generally, let
φ(m) denote the least upper bound on the ratio between the makespan of
the best permutation schedule and the makespan of the best unrestricted
schedule, where the ratio is taken over all instances of problem Fm||Cmax.
Rock & Schmidt [450] provide an algorithm that yields φ(m) ≤ ⌈m/2⌉.
Potts, Shmoys & Williamson [429] construct a family of instances of prob-
lem Fm||Cmax for which φ(m) ≥ ⌈√m + 1/2⌉/2. The exact growth rate of
φ(m) is unknown.
In one of the first papers in the theory of scheduling, Johnson [286]
demonstrates that problem F2||Cmax can be solved in O(n log n) time by
the following sequencing rule: first schedule the jobs with p1j ≤ p2j in or-
der of nondecreasing p1j, and then schedule the remaining jobs in order
of nonincreasing p2j. Note that this rule produces a permutation sched-
ule; a sketch is given below. Gonzalez & Sahni [215] observe that in the preemptive flow shop,
preemptions on machine 1 and on machine m can be removed without in-
creasing the makespan. Hence, Johnson's algorithm also solves problem
F2|pmtn|Cmax in O(n log n) time. On the other hand, Garey, Johnson &
Sethi [190] show that problem F3||Cmax is strongly NP-hard. As an im-
mediate consequence, finding the best permutation schedule for problem
Fm||Cmax, for m ≥ 3, is also strongly NP-hard. There are several polyno-
mially solvable special cases of problem Fm||Cmax that result from imposing
certain inequalities on the processing times. For example, Johnson [286] ob-
serves that if maxj p2j ≤ max{minj p1j, minj p3j} holds in an instance of
problem F3||Cmax, then the second machine is non-bottleneck, and the op-
timal algorithm for problem F2||Cmax can be suitably adapted. Monma &
Rinnooy Kan [384] provide a survey of these types of results.
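A minimal sketch of Johnson's rule and of the two-machine makespan
evaluation (our own code and input format):

    def johnson_sequence(jobs):
        # jobs: list of (p1, p2); returns an optimal processing order.
        first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
        last = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: -j[1])
        return first + last

    def f2_makespan(sequence):
        # Makespan of a two-machine permutation schedule.
        c1 = c2 = 0
        for p1, p2 in sequence:
            c1 += p1                 # completion on machine 1
            c2 = max(c2, c1) + p2    # machine 2 waits for machine 1
        return c2

    jobs = [(3, 2), (1, 4), (2, 2)]
    print(johnson_sequence(jobs))               # [(1, 4), (2, 2), (3, 2)]
    print(f2_makespan(johnson_sequence(jobs)))  # 9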
The following non-preemptive flow shop problems are all strongly NP-
hard: F2|rj|Cmax, F2||Lmax, and F2|tree|Cmax (Lenstra, Rinnooy Kan &
Brucker [349]); and F2||ΣCj (Garey, Johnson & Sethi [190]). In fact,
F2||ΣCj is strongly NP-hard even if all operations on the first machine
have the same processing time (Hoogeveen & Kawaguchi [259]). Also,
the following preemptive flow shop problems are all strongly NP-hard:
F2|rj, pmtn|Cmax and F2|pmtn|Lmax (Cho & Sahni [106]); F3|pmtn|Cmax
(Gonzalez & Sahni [215]); and F2|pmtn|ΣCj (Du & Leung [150]).
Shaklevich, Hoogeveen & Pinedo [482] consider proportionate flow shops,
where for every job j all of its operations Oij have the same processing
time pj. For regular objective functions in proportionate flow shops, they
show that an optimal solution can always be found among permutation
schedules. This property yields, for many objective functions, polynomial
time algorithms that are based on sorting. For example, problem F||Cmax
can be solved in O(n log n) time in a proportionate flow shop. As a not-so-
straightforward result, they show that, in a proportionate flow shop, problem
F||ΣwjCj is solvable in O(n²) time by a greedy approach.


A prominent variant of the flow shop imposes a no-wait constraint. In
this variant, once a job has started, it has to be processed without interrup-
tion, operation by operation, until it is completed. This variant has many
applications, for example in systems without intermediate storage between
the machines, and in the chemical industry where jobs have to be processed
at a continuously high temperature. Hall & Sriskandarajah [239] provide a
thorough survey of complexity and algorithms for no-wait scheduling. For
problem F|no-wait|Cmax, minimizing makespan can be modeled as a special
case of the traveling salesman problem (see, for example, Wismer [560]).
Since the distance matrix of the resulting traveling salesman problem has a
special combinatorial structure, the famous subtour patching technique of
Gilmore & Gomory [201] yields an O(n log n) time algorithm for problem
F2|no-wait|Cmax. Rote & Woeginger [451] show that the time complexity
of O(n log n) is best possible. With a trade-off between running time and
optimality, they also develop an O(n log(1/ε)) time FPTAS. The formula-
tion of F2|no-wait|Cmax as a traveling salesman problem heavily exploits
the property that every job consists of two operations. If some of the jobs
only have to be processed on the first machine, this formulation breaks
down. In fact, Sahni & Cho [460] show that this variant with missing oper-
ations is strongly NP-hard. Papadimitriou & Kannelakis [414] prove that
problem F4|no-wait|Cmax is strongly NP-hard. Rock establishes that prob-
lem F3|no-wait|Cmax is also strongly NP-hard [448], and so are problems
F2|no-wait|Lmax and F2|no-wait|ΣCj (Rock [449]).
We now consider multiprocessor variants. Problems F2(P2, P1)||Cmax
and F2(P1, P2)||Cmax are easily seen to be NP-hard, since they contain
P2||Cmax as a subproblem. Hoogeveen, Lenstra & Veltman [261] show that
these problems are even strongly NP-hard, and so are the preemptive prob-
lems F2(P2, P1)|pmtn|Cmax and F2(P1, P2)|pmtn|Cmax.

7.2.2 Enumerative Algorithms


Most of the literature on branch and bound for flow shops concentrates on
finding the best permutation schedule. The traditional approach is to em-
ploy a forward sequencing branching rule, and to obtain lower bounds by
solving subproblems that are often obtained through a relaxation of capac-
ity constraints on selected machines. Typically, a single- or two-machine
subproblem is selected, since larger subproblems are NP-hard in general.
For the permutation flow shop F||Cmax, Ignall & Schrage [280] propose a
machine-based bound. Let Ti denote the time that machine i (i = 1, ..., m)
completes processing an initial partial sequence of jobs that are fixed. If S
is the set of unsequenced jobs, then the lower bound for machine i is equal
to the earliest time that machine i can complete all of its processing plus
the minimum time to process the last job on machines i + 1, ..., m:

    Ti + Σ_{j ∈ S} pij + min_{j ∈ S} Σ_{h=i+1,...,m} phj.

McMahon & Burton [376] develop a job-based bound that can be used in
combination with the machine-based bound. However, each of these bounds
is dominated by the two-machine bounds that are developed independently
by Lageweg, Lenstra & Rinnooy Kan [318] and Potts [421]. For machines h
and i, where h < i, the two-machine bound is given by

    Th + Mh,i(S) + min_{j ∈ S} Σ_{k=i+1,...,m} pkj,

where Mh,i(S) is the makespan for the problem in which jobs of the set S
are processed on machines h, ..., i, and the capacity constraints on machines
h + 1, ..., i − 1 are relaxed so that they become non-bottleneck machines.
Recall that Mh,i(S) is computed by applying an adapted version of the
algorithm of Johnson [286] to sequence the jobs.
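A direct transcription of the machine-based bound (our own code; T holds
the completion times of the fixed initial partial sequence, one per machine,
and S contains the processing-time vectors of the unsequenced jobs):

    def machine_based_bound(T, S):
        # max over machines i of T_i + work left on i + min trailing work.
        m = len(T)
        best = 0
        for i in range(m):
            work = sum(job[i] for job in S)
            trailer = min(sum(job[h] for h in range(i + 1, m)) for job in S)
            best = max(best, T[i] + work + trailer)
        return best

    # Two machines, one job already fixed, two jobs left:
    print(machine_based_bound([3, 5], [(2, 4), (3, 1)]))  # 10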
Branch and bound algorithms for the permutation flow shop F||Cmax
with a forward sequencing branching rule benefit from dominance rules to
prune the search tree. These rules typically eliminate some job j from the
list of candidates to be appended to the current initial partial sequence if
certain conditions are satisfied. The most useful of these rules are developed
by Gupta & Reddi [205], McMahon [375] and Szwarc [509, 510, 511].
Problem F||Cmax is reversible, which means that an equivalent problem
results if the jobs pass through the machines in the order m, ..., 1. This
property leads Potts [422] to propose a branching rule for the permuta-
tion flow shop F||Cmax that allows both forward and backward sequencing,
so that each node of the search tree defines an initial and a final partial
sequence. Grabowski [221] adopts a branching rule that uses the block ap-
proach (see Section 7.1.2).
In spite of the various enhancements to the basic branch and bound
algorithms for the permutation flow shop F||Cmax, the performance remains
unsatisfactory. The algorithm of Potts [422] appears to be the most effective,
although it experiences difficulty in solving moderately sized instances with
15 jobs and 4 machines.
Hariri & Potts [241] develop a branch and bound algorithm for problem
F2|prec|Cmax, where job j having precedence over job k is interpreted as
constraining j to be sequenced before k on both machines. They use La-
grangean relaxation to derive a generalized job-based bound, and test three
branching rules. Computational results indicate that their best algorithm is
reasonably effective in solving instances with up to 80 jobs.
For minimizing maximum lateness in a permutation flow shop, branch
and bound algorithms are proposed for problem F2||Lmax by Townsend
[527], for problem F2|rj|Lmax by Grabowski [220], and for problem F|
rj|Lmax by Grabowski, Skubalska & Smutnicki [224], where the two latter
algorithms are based on the block approach. Tadei, Gupta, Della Croce &
Cortesi [519] develop branch and bound algorithms for problem F2|rj|Cmax,
which are also applicable to the reverse problem F2||Lmax. Their algorithms
include improved lower bounds and several dominance rules. In a computa-
tional comparison between a forward branching rule and a block based rule,
they find that the forward branching rule is preferred and allows instances
with up to 80 jobs to be solved effectively.
For problem F2||ΣCj, branch and bound algorithms are developed by
Ignall & Schrage [280], Ahmadi & Bagchi [12], Van de Velde [536] and Della
Croce, Narayan & Tadei [129]. Ignall & Schrage use machine-based bounds,
each of which is computed by sequencing the jobs in SPT order of their
processing times on the relevant machine. Ahmadi & Bagchi develop a
stronger lower bound for the second machine by considering a subproblem
of the type 1|rj|ΣCj, where the release dates are equal to the processing
times on the first machine. They obtain their bound by solving the relaxed
problem 1|rj, pmtn|ΣCj using the SRPT rule. Van de Velde's bound is ob-
tained by performing a Lagrangean relaxation of the constraints that specify
that a job cannot start on the second machine until its processing is com-
pleted on the first machine. He also derives an improvement to this lower
bound. Hoogeveen & Van de Velde [266] show how Van de Velde's origi-
nal lower bound can be strengthened using their slack variable technique
(see Section 4.2.2). Della Croce, Narayan & Tadei derive improvements to
Ignall & Schrage's lower bound for the first machine, and to Ahmadi &
Bagchi's bound for the second machine. They also perform computational
tests to assess the performance of the various lower bounds in a branch and
bound algorithm. The algorithm uses a forward sequencing branching rule,
includes dominance rules and uses a descent heuristic to obtain an initial
upper bound. They find that Van de Velde's bound is the most effective,
and the algorithm is successful in solving instances with up to 25 jobs.
Hariri & Potts [241] develop a branch and bound algorithm for the per-
mutation flow shop problem F||ΣUj. They first observe that single ma-
chine subproblems of the type 1||ΣUj can be formed and then solved using
the algorithm of Moore [385] (see Section 4.4.1). Two procedures for improv-
ing these initial lower bounds are then proposed. The branch and bound
algorithm, which uses a forward sequencing branching rule, is reasonably
effective in solving instances with up to 15 jobs, or up to 20 jobs if there are
only two or three machines.

7.2.3 Local Search


Research on heuristics and local search methods for the flow shop has focused
mainly on finding permutation schedules. For the permutation flow shop
F||Cmax, Campbell, Dudek & Smith [78] suggest aggregating machines to
produce an instance of problem F2||Cmax, which is solved by the algorithm
of Johnson [286]. However, among the heuristics for the permutation flow
shop F||Cmax that do not employ local search, the O(mn²) insertion method
of Nawaz, Enscore & Ham [395] is the best (see Taillard [520] for a derivation
of its time complexity); a sketch is given below. This heuristic builds a sequence by repeatedly
inserting an unscheduled job with the largest total processing time into the
best position of the current partial sequence. Rather surprisingly, it provides
better quality solutions than those given by a descent method that uses the
transpose neighborhood, as proposed by Dannenbring [120].
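A minimal sketch of the insertion heuristic (our own code; the
straightforward makespan evaluation below makes the sketch O(mn³)
overall, whereas Taillard's speed-up achieves the O(mn²) cited above):

    def makespan(seq, p):
        # p[j][i]: processing time of job j on machine i.
        m = len(p[0]) if p else 0
        c = [0] * m
        for j in seq:
            for i in range(m):
                c[i] = max(c[i], c[i - 1] if i else 0) + p[j][i]
        return c[-1] if seq else 0

    def neh(p):
        order = sorted(range(len(p)), key=lambda j: -sum(p[j]))
        seq = []
        for j in order:
            # Try every insertion position and keep the best one.
            candidates = [seq[:k] + [j] + seq[k:] for k in range(len(seq) + 1)]
            seq = min(candidates, key=lambda s: makespan(s, p))
        return seq

    p = [[3, 4, 2], [2, 1, 5], [4, 2, 3]]
    s = neh(p)
    print(s, makespan(s, p))  # [1, 0, 2] 14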
Various neighborhood search methods are available for the permutation
flow shop F||Cmax. In independent studies on simulated annealing, Osman
& Potts [409] and Ogbu & Smith [407] both find that the insert neighborhood
is preferred to the swap neighborhood. The same conclusion is also reached
for tabu search by Taillard [520], who uses the insert neighborhood to improve
upon a previous result of Widmer & Hertz [557] that employs the swap
neighborhood.
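The two neighborhoods compared in these studies are easy to state
precisely; the following generators (our own code) enumerate them for a
permutation given as a list of job indices.

    def swap_neighbors(seq):
        # Exchange the jobs in two positions.
        for a in range(len(seq)):
            for b in range(a + 1, len(seq)):
                s = list(seq)
                s[a], s[b] = s[b], s[a]
                yield s

    def insert_neighbors(seq):
        # Remove the job in position a and reinsert it in position b.
        for a in range(len(seq)):
            rest = seq[:a] + seq[a + 1:]
            for b in range(len(rest) + 1):
                if b != a:  # b == a would reproduce seq itself
                    yield rest[:b] + [seq[a]] + rest[b:]

    print(len(list(swap_neighbors([0, 1, 2, 3]))))    # 6
    print(len(list(insert_neighbors([0, 1, 2, 3]))))  # 12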
The more recently developed neighborhood search algorithms for the
permutation flow shop F||Cmax include special devices to improve perfor-
mance. For example, Reeves [445] develops a tabu search algorithm that
uses randomly generated subsets of the insert neighborhood, rather than the
complete neighborhood. Also, Nowicki & Smutnicki [401] use a restricted
version of the insert neighborhood. More precisely, the block structure of
some critical path (as defined in Section 7.1.2) is used to remove from con-
sideration any neighbors that cannot improve upon the makespan of the cur-
rent sequence. Another special feature of their algorithm is a backtracking
procedure which allows the search to return to a previously generated best
solution, but then proceeds with a different neighborhood move to that made pre-
viously. Further, Werner [556] proposes a class of 'path' algorithms for the
permutation flow shop F||Cmax. Each algorithm can be viewed as descent
with the insert neighborhood, combined with exploration and backtracking.
Reeves [446] proposes a genetic algorithm which uses the reorder
crossover, in which jobs in the section of the string between the crossover
points are reordered according to their relative positions in the other solu-
tion.
Some comparative results between local search algorithms for the per-
mutation flow shop F||Cmax are contained in the respective papers. Fur-
ther evaluations can be made since several of these algorithms are tested
on the instances generated by Taillard [521]. The tabu search algorithms of
Reeves [445] and Nowicki & Smutnicki [402] generate better quality solutions
than the other methods. However, the current champion is the algorithm of
Nowicki & Smutnicki.
We now discuss some studies with different optimality criteria. Kohler
& Steiglitz [301] compare different neighborhoods in descent algorithms for
problem F2||ΣCj, while Krone & Steiglitz [309] propose a descent algo-
rithm for problem F||ΣCj in which permutation schedules are not as-
sumed. For the permutation flow shop F||ΣwjCj, Glass & Potts [203]
perform a computational comparison of multi-start descent, simulated an-
nealing, threshold accepting, tabu search, and two genetic algorithms, one
of which applies descent to each solution in every population. The neigh-
borhood search algorithms each use the swap neighborhood, which performs
marginally better than insert in initial experiments. Simulated annealing
and the genetic algorithm that incorporates descent generate the best qual-
ity solutions, and the latter method is slightly superior.
Kim [298] and Adenso-Dias [8] propose tabu search algorithms for
F||ΣTj and F||ΣwjTj, respectively. Kim uses the swap neighborhood,
whereas Adenso-Dias uses a combination of a restricted swap and a restricted
insert neighborhood.

7.2.4 Approximation: Absolute Guarantees


In this section, we only deal with makespan minimization in a flow shop. One
approach to problem Fm||Cmax is based on a result of 1913 by Steinitz [503],
which was rediscovered independently by at least three groups of researchers in
the 1970s: by Belov & Stolin in Kharkov [49], by Sevastianov in Novosibirsk
[473], and by Barany & Fiala in Budapest [39, 40]. Of these, only the work
of Barany & Fiala [39] is accessible to Western researchers. We refer the
reader to a comprehensive survey article of Sevastianov [476]. All of these
results show, with different precision, that the optimum makespan cannot
be too far away from the lower bound Πmax.
Typical of the absolute performance guarantees is the following result of
Sevastianov [473] and Barany & Fiala [40]: "For any instance of Fm||Cmax,
there exists a permutation schedule with makespan at most Πmax + m(m −
1)pmax. Moreover, such a permutation schedule can be found in polynomial
time." The critical parameter in this statement is the term m(m − 1). Belov
& Stolin [49] proved a weaker term of order Θ(m^{5/2}). Sevastianov [477]
provides a small improvement to (m − 1)(m − 2 + 1/(m − 2)). For the
special case F3||Cmax, Sevastianov [475] shows that there always exists a
permutation schedule with makespan bounded by Πmax + 3pmax, and that
the factor of 3 is best possible. Similar results are also known for the job
shop problem. We do not discuss these results, but instead refer the reader
to the survey of Sevastianov [476].

7.2.5 Approximation: Ratio Guarantees


A feasible schedule is called active when any machine is idle if and only if there is no
job which currently could be processed on that machine without delaying an-
other operation. Gonzalez & Sahni [215] show that for the permutation flow
shop, the makespan of any active schedule is at most a factor of m away from
the optimal makespan. Gonzalez & Sahni also present an O(mn log n) time
approximation algorithm that solves ⌈m/2⌉ two-machine flow shop subprob-
lems optimally. An alternative machine aggregation approach is described
by Nowicki & Smutnicki [399] and by Rock & Schmidt [450] that reduces the
original problem to an artificial two-machine flow shop problem for which an
optimal permutation defines an approximate permutation schedule. Such a
schedule can be found in O(mn + n log n) time. These approximation algo-
rithms have a worst-case ratio of ⌈m/2⌉.
Potts [427] investigates the performance of five polynomial time approx-
imation algorithms for problem F2|rj|Cmax. The best one of these involves
the repeated application of a dynamic variant of Johnson's algorithm [286]
to modified versions of the problem, and has a worst-case ratio of 5/3.
Hall [231] derives a PTAS for problem F2|rj|Cmax. The strongest known
result for makespan minimization in flow shops is a PTAS for Fm|rj|Cmax
(Hall [232]). Hall's PTAS can be modified to handle the case of finding the
best permutation schedule for problem Fm|rj|Cmax.
For the general problem F||Cmax (where the number of machines is part
of the input), Williamson et al. [559] prove that the existence of a polyno-
mial time approximation algorithm with worst-case ratio strictly less than
5/4 would imply P=NP. Shmoys, Stein & Wein [486] construct randomized
approximation algorithms with a worst-case ratio of O(log² m / log log m) for
problem F||Cmax. Schmidt, Siegel & Srinivasan [463] show how the same ap-
proximation ratio can be achieved with a deterministic algorithm. Deciding
whether there exists an approximation algorithm with constant worst-case
ratio for problem F||Cmax is an outstanding open problem. The approx-
imability behaviour of finding the best permutation schedule for problem
F||Cmax is completely unclear. We do not even know how to exclude the
existence of a PTAS.
Neumytov & Sevastianov [397] investigate the special case F|op =
2|Cmax, where every job goes through only two stages, one of which is
stage 1. They prove that this restricted problem is NP-hard for s ≥ 3
stages. Drobouchevitch & Strusevich [143] derive a polynomial time ap-
proximation algorithm with a worst-case guarantee of 3/2 for this special
case.
For the multiprocessor problem F2(P)||Cmax, Schuurman & Woegin-
ger [470] design a PTAS. The PTAS of Hall [232] can also be generalized
to problem Fs(Pm)||Cmax, i.e., to the variant with a constant number of
stages and a constant number of machines per stage. The approximability
status of F3(P)||Cmax is unknown. The known approximability results for
makespan minimization in a multiprocessor flow shop are summarized in
Table 1.
Goyal & Sriskandarajah [219] survey approximation results for the no-
wait flow shop. Glass, Gupta & Potts [202] investigate the NP-hard ver-
sion of problem F2|no-wait|Cmax with missing operations on the second
machine, and derive an approximation algorithm with a worst-case ratio
of 4/3. Schulz [468] describes a polynomial time approximation algorithm
with a worst-case ratio of 2m + 1 for problem Fm|rj, prec|ΣwjCj. This
result is based on an LP formulation and can also be generalized to prob-
lem F|rj, prec, no-wait|ΣwjCj. Hoogeveen, Schuurman & Woeginger [263]
prove that, unless P=NP, problem F||ΣCj does not possess a PTAS.

                          Number of machines per stage
                          = 1          constant     arbitrary
    Number    = 2         poly-time    PTAS         PTAS
    of        const ≥ 3   PTAS         PTAS         open
    stages    arbitrary   no PTAS      no PTAS      no PTAS

Table 1: The approximability of makespan minimization in a multiprocessor
flow shop. The entry "poly-time" stands for polynomially solvable, an entry
"PTAS" stands for the existence of a PTAS, an entry "no PTAS" means that
no PTAS can exist unless P=NP, and an entry "open" means that the
approximability is unknown.

7.3 The Job Shop


A job shop in which every job is processed at most once on any machine
is called acyclic. In the literature on job shop scheduling, machine repetition is
usually not allowed, i.e., consecutive operations of the same job must always
be assigned to different machines. We follow this convention unless stated
otherwise.
In this subsection we use the following additional job characteristics in
the second field.

• {◦, n = k, n ≤ k, op ≤ i, rep, acyc}:

  - ◦: no special multi-stage restrictions;
  - n = k: there are exactly k jobs;
  - n ≤ k: there are at most k jobs;
  - op ≤ i: there are at most i operations per job;
  - rep: machine repetition is allowed;
  - acyc: every job is processed at most once on any machine.

7.3.1 Complexity
Apparently the job shop problem was formulated and investigated for the
first time by Akers & Friedman [14]. In fact, in the Soviet literature, the job
shop problem is usually called the Akers-Friedman-problem or AF-problem
for short. Since the job shop is a generalization of the flow shop, all negative
complexity results that are stated in Section 7.2.1 for the flow shop also ap-
ply to the job shop. In fact, we only know of two polynomially solvable cases
of the job shop problem with an unrestricted number of jobs. First, problem
J2|op ≤ 2|Cmax is solvable in O(n log n) time by a simple extension by Jack-
son [282] of Johnson's algorithm for problem F2||Cmax. Second, problem
J2|pij = 1|Cmax can be solved in O(n) time by an algorithm of Hefetz &
Adiri [248]. However, if we deviate slightly from these two special cases,
then we immediately encounter NP-hard problems: J2|op ≤ 3|Cmax and
J3|op ≤ 2|Cmax are NP-hard (Lenstra, Rinnooy Kan & Brucker [349]); and
problems J2|pij ∈ {1, 2}|Cmax, J3|pij = 1|Cmax and J2|pij = 1, rep|Cmax
are all strongly NP-hard (Lenstra & Rinnooy Kan [347]). Finally, Sahni
& Cho [460] show that problem J2|op ≤ 2, no-wait|Cmax is strongly NP-
hard, i.e., the variant where immediately after the completion of the first
operation of a job the processing of the second operation of this job must
start.
Another set of polynomially solvable cases of the job shop problem arises
from restricting the number of jobs in the instance. The classical result in
this area is due to Akers [13]: he shows that problem J|n = 2|Cmax can
be formulated as a shortest path problem among rectilinear obstacles in
the plane, and hence derives a polynomial time algorithm. The approach
of Akers also applies to problem J|n = 2, rep|Cmax. Based on ideas of
Kravchenko & Sotskov [307], Brucker [61] derives a polynomial time algo-
rithm for J2|n = k|Cmax. All other variants of the makespan minimization
problem are NP-hard due to a breakthrough result of Sotskov & Shak-
levich [497], which states that even J3|n = 3|Cmax is NP-hard. These
complexity results are summarized in Table 2.

                        Number m of machines
                        = 2        = 3        constant    arbitrary
    Number    = 2       P          P          P           P
    n of      = 3       P          NP-hard    NP-hard     NP-hard
    jobs      constant  P          NP-hard    NP-hard     NP-hard
              arbitrary NP-hard    NP-hard    NP-hard     NP-hard

Table 2: The computational complexity of makespan minimization in a non-
preemptive job shop with m machines and n jobs, where an entry "P" stands
for polynomially solvable.
Somewhat surprisingly, the preemptive version of the job shop prob-
lem is even harder than its non-preemptive version. Essentially, the only
polynomial solvability result known in this area comes from carrying over
the shortest path formulation of Akers [13] to J|n = 2, pmtn|Cmax and to
J|n = 2, pmtn, rep|Cmax. For more jobs, problem J2|n = 3, pmtn|Cmax is
already NP-hard (Brucker, Kravchenko & Sotskov [70]).
For objective functions other than the makespan, the only known poly-
nomial solvability results are for problems J2|n ≤ k|Σ f(Cj) and J2|n ≤
k|max f(Cj), where f is a regular function of the job completion times
(Brucker, Kravchenko & Sotskov [69]).

7.3.2 Enumerative Algorithms


Most of the research on branch and bound algorithms for the job shop
is for minimizing the makespan. As in Section 7.1.2 for the open shop,
the disjunctive graph model is again useful for enumerative and heuristic
methods for the job shop problem J||Cmax. In this case, edges between
operations of the same job are directed, thereby reflecting the order of these
operations. It is required to orient the undirected edges for each pair of
operations that require the same machine.
In early approaches by Nemeti [396], Charlton & Death [88] and
Schrage [465] for solving problem J||Cmax by branch and bound, a lower
bound is obtained by disregarding edges that are not oriented in the dis-
junctive graph formulation, and then finding the length of a longest path.
However, a much stronger lower bound is obtained by solving single machine
subproblems, as observed by Bratley, Florian & Robillard [58] and McMahon
& Florian [377]. For a given machine, the corresponding operations, together
with their heads and tails, define an instance of problem 1|rj|Lmax, where
the release date is the head of the operation and the due date is minus the
tail. The solution value of this problem provides a lower bound. Thus, the
results of Section 4.1 are useful. In particular, the single machine problem
can be solved by the efficient branch and bound algorithm of Carlier [79], or
alternatively the preemptive relaxation, namely problem 1|rj, pmtn|Lmax,
can be solved using the generalized EDD algorithm. Lageweg, Lenstra &
Rinnooy Kan [317] observe that a slight strengthening of the lower bound is
possible by considering problem 1|rj, prec|Lmax, where the precedence con-
straints between operations on the selected machine are obtained from the
oriented edges in the disjunctive graph. Brucker & Jurisch [65] develop an
alternative lower bound using a two-job relaxation. It appears to be useful
for problems in which the number of machines exceeds the number of jobs.
Other approaches for obtaining lower bounds have not led to improved
branch and bound algorithms, generally because the quality of the lower
bound is not sufficient to justify a very high investment in computation
time. Examples include the surrogate duality relaxations of Fisher, Lageweg,
Lenstra & Rinnooy Kan [173], and the linear programming relaxations of
Balas [36], Applegate & Cook [24] and Martin & Shmoys [368].
Three main types of branching rules are used in branch and bound algo-
rithms for problem J||Cmax. First, a forward branching rule can be used to
generate active schedules, as suggested by Giffler & Thompson [200]. Sec-
ond, a binary branching rule can be used to orient an edge one way or the
other in the disjunctive graph formulation, as proposed by Lageweg, Lenstra
& Rinnooy Kan [317]. There are various methods for selecting the edge that
is used for branching. Third, using a block approach (see Sections 7.1.2 and
7.2.2), Barker & McMahon [41] suggest a branching rule that introduces
precedence constraints that either force one operation of a block to be a
predecessor or a successor of all other operations in this block.
Dominance rules are frequently applied to orient some of the disjunctive
arcs. Any feasible solution provides an upper bound on the makespan, from
which deadlines for individual operations can be computed. If it can be
shown that orienting a disjunctive edge in a particular direction leads to a
deadline violation, then for subsequent computations this arc can be oriented
in the reverse direction. Such an orientation is known as immediate selection.
Orienting disjunctive arcs may lead to increased values for heads and tails,
which may in turn provide better lower bounds.
We now describe the features of some of the more successful branch
and bound algorithms for problem J||Cmax. McMahon & Florian [377]
use a forward active schedule branching rule, and solve problems 1|rj|Lmax
(with their own algorithm) to obtain lower bounds. Using the same lower
bound, Barker & McMahon [41] design an algorithm that uses a block
based branching rule. Lageweg, Lenstra & Rinnooy Kan [317] use a bi-
nary edge orientation branching rule, and obtain lower bounds by solving
problems 1|rj, prec|Lmax. Carlier & Pinson [81] also use a binary edge ori-
entation branching rule (but with a different method for selecting edges),
and compute lower bounds by solving problems 1|rj, pmtn, prec|Lmax, after
first applying immediate selection rules. In follow-up studies, Carlier & Pin-
son [82, 83], Applegate & Cook [24], and Brucker, Jurisch & Kramer [67] de-
velop more effective procedures for immediate selection, and for adjusting heads
and tails, thereby yielding improved branch and bound algorithms. Brucker,
Jurisch & Sievers [68] develop a branch and bound algorithm that uses a
block based branching rule, applies immediate selection, and computes lower
bounds by solving problems 1|rj, pmtn, prec|Lmax. Martin & Shmoys [368]
introduce more sophisticated techniques for adjusting heads and tails that
lead to tighter lower bounds, and they propose two new branching rules.
For problem J||Cmax, there are several test instances in the literature,
including those of Fisher & Thompson [171], Lawrence [339], Adams, Balas
& Zawack [7] and Taillard [521]. The most famous has 10 jobs and 10
machines, and remained unsolved for many years until Carlier & Pinson [81]
obtained an optimal solution. Even though the most recent algorithms now
solve this problem without much difficulty, there are instances with 15 jobs
and 15 machines that cannot be solved by currently available algorithms
using reasonable amounts of computation time.

7.3.3 Local Search


Many heuristics are based on the use of priority rules, which are surveyed by
Haupt [247]. These approaches use a priority rule to select an operation from
a set of candidates to be sequenced next. The candidates may be chosen to
create a non-delay schedule (no machine idle time is allowed if operations are
available to be processed), an active schedule, or a limited delay schedule.
Although priority rule heuristics are undemanding in their computational
requirements, the quality of schedules that are generated tends to be erratic.
An effective heuristic approach is the shifting bottleneck procedure of
Adams, Balas & Zawack [7]. This procedure constructs a schedule by select-
ing each machine in turn and orienting all of the corresponding edges in the
disjunctive graph formulation. To orient these edges, a problem 1|rj|Lmax
for a selected (bottleneck) machine is solved by the branch and bound al-
gorithm of Carlier [79], where previously oriented edges are used in the
computation of release dates and due dates. Applegate & Cook [24] propose
some improvements to the original shifting bottleneck procedure. Dauzere-
Peres & Lasserre [121] observe that the current orientation of some edges in
the shifting bottleneck procedure may create a path between two operations
which require the same machine. In this case, the minimum time delay
between the start times of these operations can be computed. Thus, the
1|rj|Lmax problems that are considered within the shifting bottleneck pro-
cedure should ideally incorporate delayed precedence constraints to account
for these delays. Dauzere-Peres & Lasserre use a heuristic approach for
these single machine problems with delayed precedence constraints, whereas
Balas, Lenstra & Vazacopoulos [37] obtain an exact solution by designing a
generalized version of Carlier's algorithm.
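A high-level skeleton of the shifting bottleneck control flow (our own
sketch; the single-machine solver is left abstract and stands for Carlier's
exact algorithm or any heuristic for 1|rj|Lmax, and the re-optimization of
previously fixed machines performed by the full procedure is omitted):

    def shifting_bottleneck(machines, solve_one_machine, one_machine_instance):
        fixed = set()       # machines whose disjunctive edges are oriented
        sequences = {}
        while len(fixed) < len(machines):
            best, best_lmax, best_seq = None, float("-inf"), None
            for k in machines:
                if k in fixed:
                    continue
                # Heads and tails reflect the orientations fixed so far.
                seq, lmax = solve_one_machine(one_machine_instance(k, sequences))
                if lmax > best_lmax:     # the bottleneck has the largest Lmax
                    best, best_lmax, best_seq = k, lmax, seq
            fixed.add(best)
            sequences[best] = best_seq   # orient all edges on machine `best`
        return sequences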


Various local search algorithms for J||Cmax have been proposed, and
they are reviewed by Vaessens, Aarts & Lenstra [532]. For neighborhood
search, the block structure corresponding to some critical path again plays
a key role. An improved schedule can only be obtained if, for at least one
block, a different operation is sequenced either before all the other operations
in the block or after all the other operations in the block. For the critical
transpose neighborhood, two adjacent operations in a block are transposed;
however, critical end transpose restricts the transpositions to either the first
pair or the last pair of operations in a block. Similarly, for the critical end
insert neighborhood, an operation in a block is inserted in the first or in the
last position of the block.
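
A minimal sketch of how these moves can be generated from a block (here simply a list of operation identifiers; one possible reading of the definitions above, not code from the cited papers):

    def critical_transpose(block):
        """All swaps of adjacent operations in the block."""
        return [block[:i] + [block[i + 1], block[i]] + block[i + 2:]
                for i in range(len(block) - 1)]

    def critical_end_transpose(block):
        """Only the first and the last adjacent pair are swapped."""
        moves = critical_transpose(block)
        return [moves[0], moves[-1]] if len(block) > 2 else moves

    def critical_end_insert(block):
        """An interior operation is moved to the front or to the back."""
        moves = []
        for i in range(1, len(block) - 1):
            rest = block[:i] + block[i + 1:]
            moves.append([block[i]] + rest)   # insert in first position
            moves.append(rest + [block[i]])   # insert in last position
        return moves

    print(critical_end_transpose([1, 2, 3, 4]))
    print(critical_end_insert([1, 2, 3, 4]))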
Simulated annealing algorithms for problem J||C_max are proposed by Matsuo, Suh & Sullivan [373], Van Laarhoven, Aarts & Lenstra [539] and Yamada, Rosen & Nakano [565]. Matsuo, Suh & Sullivan use an extension of the critical end transpose neighborhood in which transpositions of the predecessors and successors of one of the originally transposed operations are also explored. The larger critical transpose neighborhood is adopted by Van Laarhoven, Aarts & Lenstra. Yamada, Rosen & Nakano [565] use the critical end insert neighborhood, and allow the possibility of backtracking to the best schedule generated so far.
Tabu search provides an attractive alternative to simulated annealing for problem J||C_max. Taillard [522] uses the critical transpose neighborhood. Special features of his method are the replacement of exact evaluations of the makespan of each neighbor by quickly computed lower bound estimates, and a randomly changing length of the tabu list. Barnes & Chambers [43] build on Taillard's method by including a backtracking device. Dell'Amico & Trubian [134] use a composite neighborhood consisting of generalized critical end transpose, which allows the reordering of three critical operations (one of which must start or end a block), and of critical end insert. Their algorithm adopts Taillard's ideas of selecting moves according to quickly computed lower bounds for the makespan, and of using a variable length tabu list. Nowicki & Smutnicki [400] use the critical end transpose neighborhood, and allow backtracking, as in their algorithm for the permutation flow shop F||C_max.
Balas & Vazacopoulos [38] propose a guided local search procedure, which
resembles tabu search based on the critical end insert neighborhood with a
backtracking procedure. They also suggest several hybrids in which guided
local search is embedded in the shifting bottleneck procedure.
Genetic algorithms for problem J||C_max often have special representations of solutions which require some type of heuristic to convert the repre-
sentation into a schedule. Storer, Wu & Vaccari [505] propose a data per-
turbation representation, and a schedule is constructed from the perturbed
data by using SPT as a priority rule to construct a limited delay schedule.
They also suggest a heuristic set representation in which each string entry
defines a priority rule that is used within a particular time period when lim-
ited delay scheduling is applied. Similar heuristic set approaches are used by
Dorndorf & Pesch [142] and Smith [494]. Della Croce, Tadei & Volta [130]
propose a priority representation in which each machine has a sequence that
defines a priority order that is used in limited delay scheduling. A similar
approach is used by Yamada & Nakano [564] who use completion times to
define priorities. Pesch [416] uses a representation that defines the order in
which two-job subproblems are solved when building a schedule. He also
considers representations that give upper bounds on the solution of two-
job or single-machine subproblems that allow some edges in the disjunctive
graph to be oriented prior to applying a heuristic method. Nakano & Ya-
mada [394] use an ordered pair representation that defines the relative order
of each pair of jobs on each machine. Using a similar approach, Aarts, Van
Laarhoven, Lenstra & Ulder [2] use a representation based on the relative
order of adjacent jobs in a block.
Several of the algorithms for the job shop problem J||C_max are tested on the problems generated by Adams, Balas & Zawack [7], Fisher & Thompson [171], Lawrence [339] and Taillard [521]. Vaessens, Aarts & Lenstra [532] collate the objective function values generated by the various algorithms and provide standardized computation times. These results are extended by Balas & Vazacopoulos [38]. The tabu search algorithm of Nowicki & Smutnicki and the hybrid shifting bottleneck/guided local search algorithms of Balas & Vazacopoulos are generally preferred to the other approaches.

7.3.4 Approximation: Absolute Guarantees


Recall that two trivial lower bounds on the optimum makespan in an instance of problem J||C_max are the maximum machine load Π_max and the length of the longest job P_max. In this subsection, let Z = max{Π_max, P_max}. In a celebrated paper in theoretical computer science, Leighton, Maggs & Rao [343] show that, for the acyclic job shop problem J|p_ij = 1, acyc|C_max, the optimal makespan is at most a constant factor away from the lower bound Z. We emphasize that this constant factor does not depend
on the number of machines, the number of jobs, or any other parameter.


The proof given by Leighton, Maggs & Rao is nonconstructive and makes repeated use of the Lovász local lemma [162], a famous result in combinatorial theory. An efficient polynomial time algorithm for finding a schedule with makespan O(Z) is provided by Leighton, Maggs & Richa [344].
For the general job shop problem J||C_max, the best upper bound known for the optimal makespan is O(Z log² Z / (log log Z)²) by Goldberg, Paterson, Srinivasan & Sweedyk [208], which improves upon earlier work of Shmoys, Stein & Wein [486]. Feige & Scheideler [165] show that the optimal makespan for problem J|acyc|C_max is O(Z log Z log log Z), and they construct instances of J|acyc|C_max for which the optimal makespan is Ω(Z log Z / log log Z). Hence, the O(Z) upper bound of Leighton, Maggs & Rao [343] cannot be carried over to arbitrary processing times. Moreover, Feige & Scheideler show that preemption may help in job shop scheduling: the optimal makespan for problem J|acyc, pmtn|C_max is O(Z log log Z). However, the question of whether the upper bound can be improved to O(Z) for problem J|acyc, pmtn|C_max remains open.
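
For concreteness, a minimal sketch of the trivial lower bound Z used throughout this subsection, under the assumption that jobs are given as lists of (machine, processing time) operations:

    def trivial_lower_bound(jobs, num_machines):
        load = [0] * num_machines
        longest = 0
        for ops in jobs:
            longest = max(longest, sum(p for _, p in ops))   # P_max
            for m, p in ops:
                load[m] += p                                 # machine loads
        return max(max(load), longest)                       # Z = max(Pi_max, P_max)

    print(trivial_lower_bound([[(0, 3), (1, 2)], [(1, 4), (0, 1)]], 2))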

7.3.5 Approximation: Ratio Guarantees


The best positive approximation result for J||C_max is due to Shmoys, Stein & Wein [486]. They construct randomized approximation algorithms with worst-case ratios of O(log²(mμ) / log log(mμ)), where μ is the maximum number of operations per job. Schmidt, Siegel & Srinivasan [463] achieve the same approximation bound with a deterministic algorithm.
For the job shop problem Jm||C_max with a constant number m of machines, Shmoys, Stein & Wein [486] provide approximation algorithms with worst-case ratios of (2 + ε), where ε > 0 can be made arbitrarily close to 0. It is easily verified that their analysis also applies to the preemptive problem Jm|pmtn|C_max, for which it yields the same worst-case guarantee of (2 + ε). Improving these approximation results to a bound smaller than 2 and deciding the existence of a PTAS are open problems. Sevastianov & Woeginger [479] derive an approximation algorithm for problem J2|pmtn|C_max with a worst-case ratio of 3/2. For the multiprocessor problem J2(P)|op ≤ 2|C_max, Schuurman & Woeginger [470] give a PTAS.

7.4 Other Multi-Stage Problems


Assembly lines
The assembly line problem is a multi-stage scheduling problem with s stages and a single machine per stage. The last operation O_sj of job j (the assembly operation) can only be started when its first s − 1 operations O_1j, ..., O_{s−1,j} have all been completed. The first s − 1 operations, however, may be run in parallel and may overlap in time. The assembly line problem is denoted by A; for example, makespan minimization in an assembly line with three stages is denoted by A3||C_max.
The assembly line problem is introduced by Lee, Cheng & Lin [340], and is then studied by Potts et al. [428]. Problem A2||C_max is equivalent to problem P2||C_max, and hence solvable in polynomial time. Problem A3||C_max is shown by Potts et al. to be strongly NP-hard. They also provide a polynomial time approximation algorithm for problem A||C_max with a worst-case ratio of 2 − 1/(s − 1). It is not known whether there is a better polynomial time approximation algorithm for this general case where the number of stages is part of the input. However, for problem As||C_max with a fixed number of stages, the technique of Hall [232] can be adapted to yield a PTAS. Finally, for A3||C_max the optimal makespan is shown by Potts et al. to be at most Π_max + 1.25 P_max.
The mixed shop
The mixed shop (denoted by JO) is a combination of the job shop and the open shop. Thus, we have job shop jobs and open shop jobs. Due to the complexity of the job shop (see Section 7.3.1), only problems with two machines can be expected to be solvable in polynomial time. Strusevich [507] shows that problems JO2|op ≤ 2|C_max and JO2|pmtn, op ≤ 2|C_max can be solved in O(n log n) time. Shakhlevich & Sotskov [483] prove that problems JO|n = 2|C_max and JO|n = 2|Σ C_j are NP-hard. They also derive a polynomial time algorithm for problem JO|n = 2, pmtn|f(C_1, C_2) with one open shop job and one job shop job, where f is an arbitrary regular objective function of the two job completion times.
The job shop with two counter routes
We now discuss a special case of the job shop (and a generalization of the flow shop) where the jobs can only take two routes through the m machines: the route 1 − 2 − ... − m (as in the flow shop), and the reverse route m − ... − 2 − 1. Hence, every job consists of a chain of exactly m operations. The job shop with two counter routes is denoted by F±.
Since problem F±2||C_max is equivalent to problem J2|op ≤ 2|C_max, it is solvable in polynomial time (see Section 7.3.1). On the other hand, problem F±3||C_max is strongly NP-hard, and all other NP-hardness results also
carry over from the flow shop to the job shop with two counter routes. It is not known whether problem F±m||C_max allows a PTAS (in this case, the techniques of Hall [232] cannot be generalized). Sevastianov [474] proves that the optimal makespan for problem F±m||C_max is at most Π_max + 2m² P_max, a result which is just slightly weaker than that of Sevastianov [473] and Bárány & Fiala [40] for problem Fm||C_max. For problem F±3||C_max, Neumytov & Sevastianov [397] improve the upper bound on the optimal makespan to Π_max + 3 P_max, and they show that the factor 3 in this bound is the best possible.
Other variants
Lev & Adiri [353] analyze the so-called V-shop problem, a special case
of the job shop with machine repetition where each job takes the V-shaped
route 1 - 2 - ... - (m - 1) - m - (m - 1) - ... - 2 - 1 through the m
machines. Matsuo [371] and Kamoun & Sriskandarajah [291] study cyclic
flow shop problems in which each job has to go repeatedly through the
machines.

8 Further Scheduling Models


8.1 Family Scheduling
In the family scheduling model, jobs are partitioned into F families according to their similarity, so that no setup is required for a job if it belongs to the same family as the previously processed job. For example, jobs may be assigned to the same family if they require the same machine tool, or if they are produced with the same material and the machine needs to be cleaned each time a different material is used. Reviews on models of this type are given by Potts & Van Wassenhove [436] and Webster & Baker [554].
A setup time is required at the start of the schedule and on each occasion when the machine switches from processing jobs in one family to jobs in another family. If the setup times are sequence independent, then the family setup time on machine i for family f is s_if. On the other hand, for sequence dependent setup times, the setup time on machine i at the start of the schedule is s_i0g if a job of family g is processed first, and is s_ifg if a job of family g is processed immediately after a job of family f. Further, we make the reasonable assumption that the triangle inequality holds for each machine i, which means that s_ifh ≤ s_ifg + s_igh for all distinct families f, g and h, including the case f = 0. All setups are anticipatory, which means
A Review of Machine Scheduling 111

that a setup on a machine does not require the presence of the job. In the
case of a single machine, we omit the subscript i.
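
To illustrate the setup model, here is a minimal sketch of evaluating the completion times of a fixed single-machine sequence under sequence-dependent family setup times; the encoding (family 0 reserved for the start of the schedule) is our own assumption:

    def completion_times(jobs, setup):
        """jobs: list of (family, proc_time) in processing order;
           setup[(f, g)]: s_fg, with f = 0 for the initial setup."""
        t, last, C = 0, 0, []
        for fam, p in jobs:
            if fam != last:             # anticipatory setup when family changes
                t += setup[(last, fam)]
            t += p
            C.append(t)
            last = fam
        return C

    setup = {(0, 1): 2, (0, 2): 3, (1, 2): 4, (2, 1): 1}
    print(completion_times([(1, 5), (1, 2), (2, 6)], setup))   # [7, 9, 19]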
In this section, we use the following additional job characteristics in the
second field:

- ◦: there are no setup times;
- s_f: there are sequence independent family setup times;
- s_fg: there are sequence dependent family setup times.

A batch is a maximal set of jobs that are scheduled contiguously on a machine and share a setup. Large batches have the advantage of high machine utilization because the number of setups is small. On the other hand, processing a large batch may delay the processing of an important job belonging to a different family.
For many problems with family setup times, NP-hardness is deduced from a result without setups. Thus, in the discussion below, we mainly restrict our attention to problems that are polynomially solvable when there are no setup times.

8.1.1 Complexity
For several of the basic family scheduling models, results on the ordering of jobs within a family are available. For example, Monma & Potts [382] show that there exists an optimal schedule for problem 1|s_fg|L_max, and for the on-time jobs for problem 1|s_fg|Σ w_j U_j, in which the jobs within each family are sequenced in EDD order. Also, for problem 1|s_fg|Σ w_j C_j, jobs within each family are sequenced in SWPT order. Thus, to solve these problems, the ordered jobs within the different families are merged to produce a schedule (and for problem 1|s_fg|Σ w_j U_j the jobs that are to be on time are selected). These merging problems can be solved by dynamic programming.
The dynamic programming algorithms differ according to whether a forward or backward approach is used. In a forward dynamic programming algorithm, initial partial schedules are built by appending a job to the previous partial schedule. On the other hand, a backward dynamic programming algorithm inserts a job at the start of a previous final partial schedule, thereby causing a delay to each job in the partial schedule.
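
As a hedged illustration of the forward merging idea, the following sketch solves the sequence-independent special case 1|s_f|L_max: families are pre-sorted in EDD order, a state records the job counts per family and the family processed last, and, because the completion time of a partial schedule also depends on how often the machine switched families, each state keeps a Pareto set of (completion time, maximum lateness) pairs. This is our own simplification, not the exact state space of Monma & Potts described below:

    from collections import defaultdict

    def pareto(pairs):
        front = []
        for C, L in sorted(pairs):        # increasing C; keep strictly better L
            if not front or L < front[-1][1]:
                front.append((C, L))
        return front

    def family_lmax(families, setup):
        """families[f]: list of (p_j, d_j) in EDD order; setup[f] = s_f."""
        F = len(families)
        states = {(tuple([0] * F), -1): [(0, float('-inf'))]}
        for _ in range(sum(len(f) for f in families)):
            nxt = defaultdict(list)
            for (cnt, last), pairs in states.items():
                for f in range(F):
                    if cnt[f] < len(families[f]):
                        p, d = families[f][cnt[f]]
                        s = 0 if f == last else setup[f]
                        cnt2 = cnt[:f] + (cnt[f] + 1,) + cnt[f + 1:]
                        nxt[(cnt2, f)] += [(C + s + p, max(L, C + s + p - d))
                                           for C, L in pairs]
            states = {k: pareto(v) for k, v in nxt.items()}
        return min(L for pairs in states.values() for _, L in pairs)

    print(family_lmax([[(2, 5), (3, 11)], [(4, 9)]], [1, 2]))   # prints 2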
Monma & Potts [382] suggest forward dynamic programming algorithms
with state variables to indicate the number of sequenced jobs in each family,
112 Bo Chen, C.N. Potts, and G.J. Woeginger

the number of setups of each type in the partial schedule (this enables the completion time of the partial schedule to be evaluated), and the family to which the last job in the partial schedule belongs. This yields algorithms for problems 1|s_fg|L_max and 1|s_fg|Σ w_j C_j, each with a time complexity of O(F² n^{F²+2F}). However, Ghosh & Gupta [199] and Ghosh [198] develop backward dynamic programming algorithms that avoid having numbers of setups as state variables. This reduces the time complexities for solving each of the problems 1|s_fg|L_max and 1|s_fg|Σ w_j C_j to O(F² n^F). For problem 1|s_fg|Σ w_j U_j, Monma & Potts develop a forward dynamic programming algorithm in which the weighted number of late jobs is a state variable, and the completion time of the partial schedule of on-time jobs is a function value, thereby avoiding state variables that define numbers of setups. This algorithm requires O(F² n^F W) time, where W = Σ_{j=1}^n w_j.
Problems 1|s_fg|L_max and 1|s_fg|Σ U_j are polynomially solvable by dynamic programming for fixed F, as indicated above. For arbitrary F (the number of families is part of the input), Bruno & Downey [73] prove that problems 1|s_f|L_max and 1|s_f|Σ U_j are NP-hard, although they are open with respect to pseudo-polynomial solvability. Similarly, problem 1|s_fg|Σ w_j C_j is polynomially solvable by dynamic programming for fixed F, and Ghosh [198] shows that problem 1|s_fg|Σ C_j is strongly NP-hard for arbitrary F. However, the complexity status of problems 1|s_f|Σ C_j and 1|s_f|Σ w_j C_j is open for arbitrary F.
For parallel machines, Monma & Potts [382] show that problem P2|s_f, pmtn|C_max is NP-hard, and Webster [553] shows that problem P|s_f|Σ C_j is strongly NP-hard.
For multi-stage problems, Kleinau [300] shows that problem O2|s_f|C_max is NP-hard, even for the case of three families. He also shows that problem F2|s_f|C_max is NP-hard for an arbitrary number of families, and that there exists an optimal solution for this problem that is a permutation schedule. Further, Monma & Potts [382] show that jobs within each family can be sequenced using the algorithm of Johnson [286]. Thus, as observed by Potts & Van Wassenhove [436], for fixed F, there is a polynomial time dynamic programming algorithm of the type described above for problem F2|s_f|C_max.
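
For reference, Johnson's algorithm for F2||C_max, which provides the within-family job order mentioned above, is easily stated (a standard sketch; the (p1, p2) job encoding is our own):

    def johnson(jobs):
        """jobs: list of (p1, p2) pairs; returns a makespan-optimal order."""
        first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
        second = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: -j[1])
        return first + second

    def makespan_f2(seq):
        t1 = t2 = 0
        for p1, p2 in seq:
            t1 += p1                 # completion on machine 1
            t2 = max(t2, t1) + p2    # machine 2 waits for machine 1
        return t2

    seq = johnson([(3, 6), (5, 2), (1, 2), (6, 6)])
    print(seq, makespan_f2(seq))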
8.1.2 Enumerative Algorithms

Hariri & Potts [245] propose a branch and bound algorithm for problem 1|s_f|L_max. They obtain an initial lower bound by ignoring setups, except for those associated with the first job in each family, and solve the resulting problem with the EDD rule. This lower bound is improved by a procedure that considers whether or not certain families are split into two or more batches. A binary branching rule fixes adjacent jobs (with respect to the EDD ordering) in the same family to be in the same or in different batches. Computational results show that the algorithm is successful in solving instances with up to about 60 jobs. Schutten, Van de Velde & Zijm [469] develop a branch and bound algorithm for problem 1|s_f, r_j|L_max. In the presence of release dates, no results are known about the order of jobs within a family. A key component of their algorithm is the use of dummy jobs to represent setups. Quickly computed lower bounds are obtained by relaxing setups and solving the corresponding preemptive problem, and a forward branching rule is used. Computational results show that the algorithm is effective in solving instances with up to about 40 jobs.
Mason & Anderson [370] and Crauwels, Hariri, Potts & Van Wassenhove [115] propose branch and bound algorithms for problem 1|s_f|Σ w_j C_j. Mason & Anderson use a forward branching rule, and make extensive use of dominance rules to restrict the size of the branch and bound search tree. Their lower bound is derived using objective splitting: the total weighted completion time can be partitioned into contributions from the processing times and from the setup times, which are optimized separately. Crauwels, Hariri, Potts & Van Wassenhove compare three algorithms. The most efficient uses a forward sequencing branching rule. Lower bounds are obtained by performing a Lagrangean relaxation of the machine capacity constraints in a time-indexed formulation of the problem, and a constructive method is used to compute values of the multipliers. Their algorithm is successful in solving instances with up to 70 jobs, and is superior to Mason & Anderson's algorithm.

8.1.3 Local Search

There are two studies that develop local search heuristics for problem 1|s_f|Σ w_j C_j. Mason [369] designs a genetic algorithm from the observation that knowledge of the first job in each batch enables a solution to be constructed by ordering the batches using a generalization of the SWPT
rule. Thus, he uses a binary representation of solutions to which standard
genetic operators are applied. Crauwels, Potts & Van Wassenhove [117] de-
velop several neighborhood search heuristics (descent, simulated annealing,
threshold accepting and tabu search). They use a neighborhood that se-
lects a sub-batch of jobs at the beginning (end) of a batch and moves this
sub-batch to an earlier (later) position in the sequence. The temperature
in simulated annealing follows a periodic pattern, and a descent algorithm
is applied before each temperature change. Threshold accepting is applied
in an analogous way to simulated annealing. In their tabu search method,
sub-batches are restricted to contain a single job, and a limited reordering
of batches according to an SWPT rule applied to batches is used. Compu-
tational tests for instances with up to 100 jobs and up to 20 families show
that all local search methods generate solutions which are close in value to
the optimum. The best results are obtained with a multi-start version of
tabu search when the number of families is small, and with Mason's genetic
algorithm when the number of families is large.
Crauwels, Potts & Van Wassenhove [116] propose multi-start versions of descent, simulated annealing and tabu search, and a genetic algorithm, for problem 1|s_f|Σ U_j. The neighborhood search algorithms use either a
job or batch neighborhood. In the job neighborhood, a job is removed from
the sequence of on-time jobs, and an attempt is made to insert one or more
late jobs into the resulting sequence so that all of these jobs are on-time.
The batch neighborhood is similar except that a complete batch is removed
and then insertions of late jobs are attempted. The genetic algorithm uses
a representation in which two binary elements are associated with each job,
one indicating whether the job is on time or late, and the other indicating
whether the job ends a batch. To obtain the corresponding schedule of on-
time jobs, a due date is associated with each batch of on-time jobs, and the
batches are sequenced in EDD order. If the resulting solution is infeasible,
the algorithm of Moore [385] is applied to remove the smallest number of
batches. In computational tests for instances with up to 50 jobs and up to 10
families, all heuristics perform well. The best quality solutions are obtained
with a version of the genetic algorithm, which includes a procedure that
attempts to improve each solution of the final population.
Sotskov, Tautenhahn & Werner [498] develop constructive and neighborhood search heuristics (simulated annealing, threshold accepting and tabu search) for problems F|s_f|C_max and F|s_f|Σ C_j in which the search is restricted to permutation schedules. They use the same type of neighborhood as Crauwels, Potts & Van Wassenhove [117]. In computational tests for
instances with up to 80 jobs, and with 5 and 10 machines, the best results
are obtained with simulated annealing and tabu search, with the former
providing slightly better quality solutions.

8.1.4 Approximation
Monma & Potts [383] propose a heuristic for problem Pm|s_f, pmtn|C_max that resembles McNaughton's algorithm [378] for the classical scheduling problem P|pmtn|C_max. It has a ratio guarantee of 2 − 1/(⌊m/2⌋ + 1).
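
McNaughton's wrap-around rule itself is simple enough to state in a few lines; in the following sketch (our own, with made-up data) the optimal preemptive makespan is max(max_j p_j, Σ_j p_j / m) and jobs are poured into the machines in a single pass:

    def mcnaughton(p, m):
        cmax = max(max(p), sum(p) / m)
        pieces, mach, t = [], 0, 0.0
        for j, pj in enumerate(p):
            left = pj
            while left > 1e-12:
                run = min(left, cmax - t)
                pieces.append((j, mach, t, t + run))  # (job, machine, start, end)
                left -= run
                t += run
                if t >= cmax - 1e-12:                 # wrap to the next machine
                    mach, t = mach + 1, 0.0
        return cmax, pieces

    print(mcnaughton([4, 3, 3, 2], 2))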
For a special class of instances in which the setup plus total processing time for each single family does not exceed the optimal makespan, Monma & Potts [383] and Chen [91] show that this performance can be improved through the use of a heuristic that first uses list scheduling for complete families, and then splits families between selected pairs of machines. In particular, Chen's heuristic has a worst-case ratio of max{3m/(2m + 1), (3m − 4)/(2m − 2)}.
Chen, Potts & Strusevich [92] propose two heuristics for problem F2|s_f|C_max, each of which requires O(n log n) time. The first heuristic, which assigns all jobs of a family to a single batch and then schedules the batches, has a worst-case ratio of 3/2. The second heuristic uses properties of the schedule created by the first heuristic to generate another schedule by splitting each family into at most two batches. The heuristic selects the better of the two schedules, and has a worst-case ratio of 4/3.

8.2 Scheduling Multiprocessor Jobs


A multiprocessor job requires more than one machine at a time. This notion is in contrast to the classical scheduling assumption that a job can be executed on at most one machine at a time. However, it seems to be the right way of modeling modern parallel computer systems with shared memory.
In this subsection, we only deal with two basic variants of scheduling multiprocessor jobs. In the first variant, processing of job j simultaneously needs precisely size_j machines, and it does not matter which machines are used for processing the job. In the second variant, processing of job j simultaneously needs a prespecified subset fix_j of the dedicated machines. In these variants, the symbol Δ is used to denote max_j size_j or max_j |fix_j|.
We note that many other variants of scheduling multiprocessor jobs ap-
pear in the literature. For example, sometimes every job j has an associated
upper bound δ_j that indicates the maximum number of machines that can simultaneously process j. The actual processing time of a job then depends on the number of machines that are used for running the job. This dependence may be linear (in the linear speedup model) or more involved. For more information on these variants, we refer the reader to the survey article by Drozdowski [146]. Also, a discussion of various speedup functions in an on-line setting is given by Edmonds, Chinn, Brecht & Deng [156].

8.2.1 Parallel Machines


The first papers on multiprocessor job scheduling mainly investigate variants with unit execution times. Lloyd [362] proves that problem P2|size_j, prec, p_j = 1|C_max is solvable in linear time, whereas problem P3|size_j ∈ {1,2}, prec, p_j = 1|C_max and problem P|size_j, p_j = 1|C_max are both strongly NP-hard. He also presents an approximation algorithm for problem P|size_j, p_j = 1|C_max that is based on list scheduling and has a ratio guarantee of (2m − Δ)/(m − Δ + 1). Błażewicz, Drabowski & Węglarz [53] show that problem P|size_j, p_j = 1|C_max is polynomially solvable if Δ is fixed and not part of the input.
Du & Leung [147] show that problems P2|size_j|C_max and P3|size_j|C_max are NP-hard, but are pseudo-polynomially solvable. On the other hand, problem P5|size_j|C_max is strongly NP-hard. It is still open whether problem P4|size_j|C_max is strongly NP-hard. For problem P2|size_j, chain|C_max, Du & Leung prove strong NP-hardness. Steinberg [501] presents a 2-approximation algorithm for problem P|size_j|C_max.
Drozdowski [145] shows that problem P|size_j, pmtn|C_max is NP-hard. Błażewicz, Drabowski & Węglarz [53] show that problem P|size_j ∈ {1, Δ}, pmtn|C_max is solvable in linear time. Moreover, they show that if Δ is fixed, then problem Pm|size_j, pmtn|C_max can be formulated as an integer program of fixed dimension. Consequently, problem Pm|size_j, pmtn|C_max is polynomially solvable. Błażewicz, Drozdowski, Schmidt & de Werra [54] present an O(n log n + nm) algorithm for problem Q|size_j ∈ {1,2}, pmtn|C_max.
Finally, we mention some on-line scheduling results. Feldmann, Sgall & Teng [167] study problem P|on-line-list, size_j|C_max. They show that list scheduling, which always schedules the current job as early as possible, is (2 − 1/m)-competitive. Feldmann, Kao, Sgall & Teng [166] show that for problem P|on-line-list, size_j, chain|C_max, the best possible competitive ratio is m. For the variant of P|on-line-list, size_j, prec|C_max with linear speedup, they present an algorithm with a competitive ratio of (√5 + 3)/2 ≈ 2.6180.
This result is the best possible for on-line scheduling, and it also constitutes
the best currently known off-line approximation algorithm for this problem.

8.2.2 Dedicated Machines


In one of the first papers on the dedicated machine model, Krawczyk & Kubale [308] show that problem P|fix_j|C_max is NP-hard even if |fix_j| = 2 for all jobs. Hoogeveen, Van de Velde & Veltman [269] and Drozdowski [145] show that problems P3|fix_j|C_max and P2|fix_j|L_max are strongly NP-hard. Hoogeveen, Van de Velde & Veltman also establish NP-hardness for the following problems with unit processing times: P|fix_j, p_j = 1|C_max ≤ 3, P2|fix_j, p_j = 1, chain|C_max, and P2|fix_j, p_j = 1, r_j|C_max. Problem Pm|fix_j, p_j = 1|C_max may be formulated as an integer program of fixed dimension, and hence is polynomially solvable.
Hoogeveen, Van de Velde & Veltman [269] argue that, unless P=NP, problem P|fix_j|C_max does not possess a polynomial time approximation algorithm with worst-case ratio better than 4/3, which is a consequence of the NP-completeness of determining whether there is a schedule with C_max ≤ 3 for the case of unit processing times. Amoura, Bampis, Kenyon & Manoussakis [22] present a PTAS for problem Pm|fix_j|C_max. This PTAS is based on a linear programming formulation for the corresponding preemptive problem, and it dominates a flood of previous results on the approximability of this problem.
Kubale [313] shows that the preemptive problem P|fix_j, pmtn|C_max is strongly NP-hard, even if |fix_j| = 2 for all jobs. On the other hand, he shows that problem Pm|fix_j, pmtn|C_max with |fix_j| = 2 may be formulated as a linear program, and therefore can be solved in polynomial time. Kramer [306] shows that problem Pm|fix_j, r_j, pmtn|L_max can be solved by solving O(log n) linear programs, each of which has O(n^{m+1}) variables and O(n) constraints. Amoura, Bampis, Kenyon & Manoussakis [22] propose an O(n) solution for problem Pm|fix_j, pmtn|C_max.
Cai, Lee & Li [77] show that problem P2|fix_j|Σ C_j is strongly NP-hard. Also, Hoogeveen, Van de Velde & Veltman [269] show that the problems P3|fix_j|Σ C_j, P2|fix_j|Σ w_j C_j, P2|fix_j, p_j = 1, chain|Σ w_j C_j, and P|fix_j, p_j = 1|Σ C_j are all strongly NP-hard.
8.3 Scheduling with Communication Delays


Scheduling with communication delays models computer systems with data transmissions between the jobs, where the data transmission times are significant. There are m parallel machines and a set of precedence constrained jobs. With every arc (j, k) between two jobs in the precedence constraints, there is an associated communication delay c_jk. If jobs j and k are processed on different machines, then the processing of k cannot start before c_jk time units have elapsed after the completion of j. However, if jobs j and k are processed on the same machine, then k cannot start before j has been completed, but no further delay is imposed. The communication does not interfere with the availability of the machines, i.e., during a communication delay all machines may process other jobs. The occurrence of communication delays is indicated by one of the following entries in the job characteristics field.

• {◦, c_jk, c_j*, c_*k, c, c = 1, dup}:

- ◦: there are no communication delays;
- c_jk: there are general communication delays;
- c_j*: the communication delays depend on the sending job only;
- c_*k: the communication delays depend on the receiving job only;
- c: all communication delays are equal (uniform communication delays);
- c = 1: all communication delays take one time unit (unit communication delays);
- dup: job duplication is allowed.

If job duplication is allowed, then the scheduler may create several copies of a job. Note that the creation of job copies may be favorable in circumventing high communication times. In the machine environment field, an entry "P∞" means that the number of machines is unrestricted and may be chosen by the scheduler. For example, makespan minimization with job duplication, uniform communication delays, and an unrestricted number of machines is denoted by P∞|prec, c, dup|C_max. For more information on scheduling with communication delays (than that presented in this short section), we refer the reader to the survey articles by Chrétienne & Picouleau [107] and by Veltman, Lageweg & Lenstra [544], and to the Ph.D. theses by Picouleau [418] and by Veltman [543].
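
As a concrete reading of the model, a minimal feasibility check for a given schedule under communication delays (all names and the data layout are our own assumptions):

    def feasible(machine, start, p, arcs):
        """machine[j], start[j], p[j]: assignment, start time, processing time;
           arcs: list of (j, k, c_jk) precedence arcs with delays."""
        n = len(p)
        for j, k, cjk in arcs:                     # precedence constraints
            delay = 0 if machine[j] == machine[k] else cjk
            if start[k] < start[j] + p[j] + delay:
                return False
        for j in range(n):                         # no machine runs two jobs at once
            for k in range(j + 1, n):
                if machine[j] == machine[k] and \
                   start[j] < start[k] + p[k] and start[k] < start[j] + p[j]:
                    return False
        return True

    # Job 0 feeds job 1 with delay 2: on different machines job 1 must wait.
    print(feasible([0, 1], [0, 5], [3, 2], [(0, 1, 2)]))  # True (5 >= 3 + 2)
    print(feasible([0, 0], [0, 3], [3, 2], [(0, 1, 2)]))  # True (same machine)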
8.3.1 Complexity

In a seminal paper, Papadimitriou & Yannakakis [415] show by a transformation from the clique problem in graphs that problems P∞|prec, p_j = 1, c|C_max and P∞|prec, p_j = 1, c, dup|C_max are both NP-hard. Jung, Kirousis & Spirakis [287] design an O(n^{c+2}) algorithm that is based on dynamic programming for the problem with job duplication. Hence, if the uniform communication delay c is not part of the input, then problem P∞|prec, p_j = 1, c, dup|C_max is solvable in polynomial time. This demonstrates that problem P∞|prec, p_j = 1, c = 1, dup|C_max is polynomially solvable. In contrast, for the corresponding problem P∞|prec, p_j = 1, c = 1|C_max without job duplication, deciding whether there exists a schedule with makespan that does not exceed six is shown by Hoogeveen, Lenstra & Veltman [260], based on results of Picouleau [419], to be NP-complete. Notably, it is shown by Hoogeveen, Lenstra & Veltman that deciding whether there exists a schedule with makespan that does not exceed five is solvable in polynomial time.
Colin & Chrétienne [112] show that problem P∞|prec, c_jk, dup|C_max is polynomially solvable if the communication delays are small, i.e., if for all jobs k, min{p_j | j → k} ≥ max{c_jk | j → k} holds, where j → k means that job j is a direct predecessor of job k. Jakoby & Reischuk [285] investigate problems with tree-like precedence constraints. They prove that problem P∞|p_j = 1, intree, c, dup|C_max is NP-hard if the precedence constraints are given by a binary intree, but is polynomially solvable if the precedence constraints form a complete binary intree, in which all leaves are at the same distance from the root and every non-leaf vertex has precisely two predecessors. Problem P∞|p_j = 1, intree, c_jk, dup|C_max is NP-hard, even for complete binary intrees.
Most of the cases where the number of machines is given as part of the input are NP-hard, since even the two simplest variants, deciding whether there exists a schedule with makespan that does not exceed four for problems P|prec, p_j = 1, c = 1|C_max and P|prec, p_j = 1, c = 1, dup|C_max, are shown by Hoogeveen, Lenstra & Veltman [260], based on results of Rayward-Smith [444], to be NP-complete. It is also shown by Hoogeveen, Lenstra & Veltman that deciding whether there exists a schedule with makespan at most three for problem P|prec, p_j = 1, c = 1|C_max is solvable in polynomial time. The complexity of problem Pm|prec, p_j = 1, c = 1|C_max with a fixed number of machines is open, even for m = 2.
On the other hand, although problem P|intree, p_j = 1, c = 1|C_max is strongly NP-hard (Lenstra, Veldhorst & Veltman [351]), Varvarigou, Roy-
chowdhury, Kailath & Lawler [542] show that the corresponding problem Pm|intree, p_j = 1, c = 1|C_max with a fixed number of machines is solvable in O(n^{2m}) time. Lenstra, Veldhorst & Veltman show that problem P2|p_j = 1, intree, c = 1|C_max is solvable even in linear time. Rayward-Smith [443] studies problem P|chain, c|C_max, whose special case P|chain, c = 1|C_max is equivalent to problem P|pmtn|C_max and hence is solvable in polynomial time by McNaughton's wrap-around rule [378]. Surprisingly, for any fixed c ≥ 2, problem P|chain, c|C_max is NP-hard. Ali & El-Rewini [17] give a polynomial time algorithm for the special case of problem P|prec, p_j = 1, c = 1|C_max where the precedence constraints arise from an interval order.

8.3.2 Approximation
We first discuss approximation results for the variant with an unrestricted number of machines. Papadimitriou & Yannakakis [415] construct a polynomial time approximation algorithm with a ratio guarantee of 2 for problem P∞|prec, c_j*, dup|C_max. Two outstanding open problems are to decide whether there exists a polynomial time approximation algorithm for problem P∞|prec, p_j = 1, c, dup|C_max with ratio guarantee better than 2, and to find an approximation algorithm for problem P∞|prec, p_j = 1, c|C_max with bounded ratio guarantee.
Munier & König [391] present a (4/3)-approximation algorithm for problem P∞|prec, p_j = 1, c = 1|C_max, which is based on the solution of the linear programming relaxation of their integer linear programming formulation. Hoogeveen, Lenstra & Veltman [260] show that, unless P=NP, problem P∞|prec, p_j = 1, c = 1|C_max does not possess a polynomial time approximation algorithm with ratio guarantee better than 7/6. Möhring, Schäffter & Schulz [380] generalize the results of Munier & König [391] to problem P∞|prec, p_j = 1, c = 1|Σ w_j C_j, and thus derive a (4/3)-approximation algorithm for this problem. Based on the results of Hoogeveen, Lenstra & Veltman [260], Hoogeveen, Schuurman & Woeginger [263] observe that, unless P=NP, problem P∞|prec, p_j = 1, c = 1|Σ C_j does not possess a polynomial time approximation algorithm with ratio guarantee better than 9/8.
When the number of machines is given as part of the input, Rayward-Smith [444] investigates greedy schedules for problem P|prec, p_j = 1, c = 1|C_max, where a greedy schedule is one in which no machine is idle unless there is no job available for processing. He proves that any greedy scheduling algorithm has a ratio guarantee of 3 − 2/m. Munier & Hanen [390]
provide an improvement by presenting a (7/3 − 4/(3m))-approximation algorithm for this problem. Hoogeveen, Lenstra & Veltman [260] show that, unless P=NP, problem P|prec, p_j = 1, c = 1|C_max does not possess a polynomial time approximation algorithm with ratio guarantee better than 5/4. Möhring, Schäffter & Schulz [380] extend the results of Munier & Hanen [390] to problem P|prec, p_j = 1, c = 1|Σ w_j C_j by deriving a (10/3 − 4/(3m))-approximation algorithm. Based on the results of Hoogeveen, Lenstra & Veltman [260], Hoogeveen, Schuurman & Woeginger [263] observe that, unless P=NP, problem P|prec, p_j = 1, c = 1|Σ C_j does not possess a polynomial time approximation algorithm with ratio guarantee better than 11/10. Munier & Hanen [389] present a (2 − 1/m)-approximation algorithm for problem P|prec, p_j = 1, c = 1, dup|C_max.

8.4 Resource Constrained Scheduling


In this section, we consider a natural extension of the classical deterministic machine scheduling problems, namely problems that involve the presence of additional resources, where each resource is limited in quantity and each job requires the use of a given quantity of each resource during its execution. This topic itself is so broad that we are mainly concerned here with a very limited subclass of off-line scheduling problems with discrete renewable resources, i.e., only their total usage over each time period is constrained. Nevertheless, this covers a tremendous variety of problem types, through which the domain of deterministic scheduling theory is considerably extended. For more general models of resource constrained scheduling and a more comprehensive treatment of the topic, we refer the reader to the monograph by Błażewicz, Cellary, Słowiński & Węglarz [52].
Let us start with the problem classification scheme of Błażewicz, Lenstra & Rinnooy Kan [56], which accommodates the additional resource constraints into the existing classification scheme. Suppose that there are l resources R_1, ..., R_l. For each resource R_h, there is a positive integer size s_h, which is the total amount of R_h that is available at any given time. In single-stage models, for each resource R_h and job j, there is a non-negative integer requirement r_hj, which is the amount of R_h required by j throughout its execution. A schedule is feasible with respect to the resource constraints if and only if the total requirement for resource R_h by all jobs being executed at any time does not exceed s_h, for h = 1, ..., l. In multi-stage models, there is for each resource R_h and operation O_ij a non-negative integer requirement r_hij, with a similar condition for the feasibility of a schedule.
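
A minimal sketch of this feasibility condition for integer data (the (start, processing time, requirements) job encoding is our own):

    def resource_feasible(jobs, sizes):
        """jobs: list of (start, proc_time, requirements) with integer data;
           sizes[h]: size s_h of resource R_h."""
        horizon = max(s + p for s, p, _ in jobs)
        for t in range(horizon):                  # check each unit time period
            for h, sh in enumerate(sizes):
                used = sum(r[h] for s, p, r in jobs if s <= t < s + p)
                if used > sh:
                    return False
        return True

    # Two jobs overlapping in [1, 3) together need 3 units of the single resource.
    print(resource_feasible([(0, 3, [2]), (1, 2, [1])], [3]))  # True
    print(resource_feasible([(0, 3, [2]), (1, 2, [2])], [3]))  # False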
The presence of resource constraints is indicated in the second field of our existing scheme by an entry res λσρ, where λ, σ, and ρ specify the number of resources, their sizes, and the required amounts, respectively. More precisely, they are characterized as follows.

• λ ∈ {·, δ}:
- λ = ·: the number of resources is arbitrary;
- λ = δ: there is a fixed upper bound δ on the number of resources.

• σ ∈ {·, σ}:
- σ = ·: the resource sizes are arbitrary;
- σ = σ: all resource sizes s_h are constant and equal to σ.

• ρ ∈ {·, ρ}:
- ρ = ·: the resource requirements r_hj (r_hij) are arbitrary;
- ρ = ρ: all resource requirements r_hj (r_hij) have a constant upper bound equal to ρ.

8.4.1 No Precedence Constraints


Błażewicz, Lenstra & Rinnooy Kan [56] investigate problem Q|res λσρ, p_j = 1|C_max for λ, σ, ρ ∈ {1, ·}. They provide an exhaustive complexity classification by identifying all maximal polynomially solvable problems and minimal NP-hard problems.
There are three basic problems with resource constraints that are solvable in polynomial time. First, problem P2|res···, p_j = 1|C_max can be solved in O(n²l + n^{2.5}) time by computing a maximum matching (Garey & Johnson [185]); recall that l denotes the number of resources. Second, problem Q2|res1··, p_j = 1|C_max can be solved in O(n log n) time by the following simple algorithm of Błażewicz, Lenstra & Rinnooy Kan [56]. Start by scheduling all jobs on the faster machine in order of non-increasing resource requirements. Next, successively remove the last job from this machine and schedule it as early as possible on the slower machine, as long as this reduces the makespan. Third, Błażewicz, Lenstra & Rinnooy Kan [56] show that problem Q|res1·1, p_j = 1|C_max can be solved in O(n³) time by solving a bottleneck transportation problem. In contrast, problems P3|res·11, p_j = 1|C_max, Q2|res·11, p_j = 1|C_max and P3|res1··, p_j = 1|C_max are all strongly NP-hard (Błażewicz, Lenstra & Rinnooy Kan [56] and Garey & Johnson [185]).
Most of the above results can be easily extended to other optimality criteria. Błażewicz, Lenstra & Rinnooy Kan [56] also observe that, if preemption is allowed, then the very general problem Rm|pmtn, res···|C_max can be solved by linear programming.
Now consider multi-stage models. Research has revealed that virtually all except the simplest problems are NP-hard. Similar to problem P2|res···, p_j = 1|C_max, problem O2|res···, p_ij = 1|C_max reduces to finding a maximum matching, while Błażewicz, Cellary, Słowiński & Węglarz [52] show that problems O3|res·11, p_ij = 1|C_max and O3|res1··, p_ij = 1|C_max are strongly NP-hard. Kubiak [310] and Lushchakova & Strusevich [364] solve problem O2|res111|C_max in linear time. Recently, Jurisch & Kubiak [288] formulate problem O2|res1··|C_max as a maximum flow problem in a network, which can be solved in O(n³) time, and then convert the optimal preemptive schedule into a non-preemptive one in O(n²) time without increasing the makespan. Problems O2|res211|C_max and O2|res·11|C_max are shown by Lushchakova & Strusevich [363] and Jurisch & Kubiak, respectively, to be strongly NP-hard. These results show, in terms of computational complexity, that it is much easier to minimize makespan in resource-constrained open shops with a single resource than it is with many specialized resources.
Flow shop and job shop problems are more difficult, and results for these models are limited. Błażewicz, Lenstra & Rinnooy Kan [56] and Błażewicz, Cellary, Słowiński & Węglarz [52] observe that, while problem F2|res111, p_ij = 1|C_max is solvable in linear time by appropriately grouping jobs together according to their overall resource requirements, each of the problems F2|res·11, p_ij = 1|C_max, F2|res111|C_max and J2|res111, p_ij = 1|C_max is strongly NP-hard.

8.4.2 Precedence Constraints


In the presence of precedence constraints, essentially all scheduling problems with resource constraints are strongly NP-hard, since Błażewicz, Lenstra & Rinnooy Kan [56] establish strong NP-hardness for the simplest such problem, P2|res111, chain, p_j = 1|C_max. Research is mainly focused on enumerative algorithms.
A detailed literature review on research in this area can be found in Davis [123, 124], Herroelen [249], and Herroelen & Demeulemeester [250]. A basic conceptual formulation for the classical resource-constrained project scheduling problem is to minimize C_n subject to the precedence constraints C_j + p_j ≤ C_k whenever j → k, and the resource constraints Σ_{j ∈ I_t} r_hj ≤ s_h
for all R_h and all t, where job n succeeds all others, and I_t is the index set of jobs executed in time period t. Various integer programming formulations and algorithmic procedures have been developed. Typically, these include the bounded enumeration procedure of Davis & Heidorn [125], the branch and bound algorithm of Stinson, Davis & Khumawala [504] and the implicit enumeration procedure of Talbot & Patterson [523]. Later, Demeulemeester & Herroelen [135] develop a branch and bound algorithm, which we call the DH-procedure, that is based on a depth-first solution strategy. In their search tree, the nodes represent partial feasible schedules. Branches emanating from a parent node correspond to exhaustive and minimal combinations of jobs whose delay resolves the resource conflicts at that parent node. Computational evidence indicates that the DH-procedure is more efficient than other algorithms.
Following earlier work by Balas [35], Bartusch, Möhring & Radermacher [46] introduce into their solution procedure the notion of forbidden sets, which are subsets of jobs that cannot be executed simultaneously due to their collective resource requirements, and temporal constraints between pairs of jobs. This enables their algorithm to allow for precedence constraints with minimal and maximal time lags between jobs, for resources whose availability may change in discrete jumps over time, for time-dependent (in discrete jumps) resource consumption per job, and for job release dates and due dates. Unfortunately, when the problem has a large degree of parallelism, i.e., if there are many forbidden sets or if there are few temporal constraints, then their algorithm is computationally too time consuming.
Recently, Demeulemeester & Herroelen [136] extend their earlier branch and bound algorithm (the DH-procedure) to deal with generalized resource-constrained project scheduling problems. In such a problem, not only does the availability of each resource vary over time and jobs have release dates and due dates, but precedence diagramming is also introduced to accommodate the specification of all four possible minimum time lags between the start and finish times of a predecessor and those of a successor. Results from computational experience are promising, and the extended DH-procedure is able to solve typical test instances from the literature to optimality reasonably effectively.

8.5 Scheduling with Controllable Processing Times


In most classical scheduling models, we assume fixed job processing times.
However, in real-life applications, the processing of a job often requires ad-
ditional resources such as facilities, manpower, funds, etc., and hence the processing time can change with the allocation of these additional resources. These changes can be continuous (see Nowicki & Zdrzałka [405]) or discrete (see Chen, Lu & Tang [100]). These situations are usually modeled as follows. In the case of continuously controllable processing times, the processing requirement of job j is specified by three positive parameters a_j, u_j and c_j with u_j ≤ a_j. By assigning additional resources to the processing of job j, the actual processing time of job j may be compressed down to a_j − x_j, where 0 ≤ x_j ≤ u_j. The cost of performing this compression is equal to c_j x_j. In the case of discretely controllable processing times, the processing requirement of job j is specified by positive values a_ij and c_ij, for i = 1, ..., k_j. The compression parameter x_j for job j may take any integer value between 1 and k_j. The actual processing time of job j is then equal to a_{x_j,j}, and the cost for this compression is equal to c_{x_j,j}.
Solving a scheduling problem with controllable processing times amounts to specifying a schedule σ for the jobs together with a compression vector x that encodes the compressions x_j of the jobs. For an optimality criterion f such as C_max or L_max, we denote by F_1(f, σ, x) the cost of schedule σ under criterion f with processing times compressed by x. Moreover, we denote by F_2(x) the total compression cost of x, i.e., F_2(x) = Σ_j c_j x_j in the continuous case and F_2(x) = Σ_j c_{x_j,j} in the discrete case. Scheduling with controllable job processing times essentially is a bicriteria problem, and the following four basic optimization problems arise.

P1. Minimization of F_1(f, σ, x) + F_2(x).

P2. Minimization of F_1(f, σ, x) subject to F_2(x) ≤ K.

P3. Minimization of F_2(x) subject to F_1(f, σ, x) ≤ T.

P4. Identification of the set of Pareto-optimal points (x, σ) for (F_1, F_2).

A pair (x, σ) is called Pareto-optimal if there does not exist another pair (x′, σ′) that improves on (x, σ) with respect to one of F_1 and F_2, and stays equal or improves with respect to the other one. Note that a solution to problem P4 also solves problems P1–P3 as a by-product.

8.5.1 Continuously Controllable Processing Times


For problems 1||L_max, 1||T_max, and 1|r_j|C_max, an optimal schedule can be computed without knowledge of the job processing times by the EDD rule or variants of this rule (see Section 4.1.1). As a consequence, the three
problems P1, P2 and P3 become linear programs, and hence are solvable in polynomial time. Vickson [548] observes that for problem 1||T_max, the linear program for P3 reduces to a production-inventory problem, which yields an O(n²) algorithm. Van Wassenhove & Baker [540] describe a simple greedy algorithm that solves P4 for these three scheduling problems in O(n²) time. The boundary of the set of Pareto-optimal points is a piecewise linear curve with up to n + 1 breakpoints. The greedy algorithm identifies the breakpoints one by one. As a consequence, this also yields algorithms of the same time complexity for problems P1–P3.
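
As an illustration of why P1 becomes a linear program once the sequence is fixed, here is a hedged sketch for the L_max criterion using scipy's linprog; the data values are made up, and the formulation (minimize Σ c_j x_j + L subject to Σ_{k≤j}(a_k − x_k) − d_j ≤ L and 0 ≤ x_j ≤ u_j) follows directly from the definitions above:

    from scipy.optimize import linprog

    def p1_lmax(a, u, c, d):           # jobs already indexed in EDD order
        n = len(a)
        obj = list(c) + [1.0]          # variables x_1..x_n and L
        A, b = [], []
        for j in range(n):             # C_j - d_j <= L with C_j = sum (a_k - x_k)
            row = [-1.0 if k <= j else 0.0 for k in range(n)] + [-1.0]
            A.append(row)
            b.append(d[j] - sum(a[: j + 1]))
        bounds = [(0, u[j]) for j in range(n)] + [(None, None)]  # L is free
        res = linprog(obj, A_ub=A, b_ub=b, bounds=bounds)
        return res.x[:-1], res.x[-1]   # compressions x and the value of L_max

    x, L = p1_lmax(a=[5, 4], u=[2, 3], c=[1.0, 3.0], d=[4, 8])
    print(x, L)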
The delivery time version of problem 1|r_j|L_max is strongly NP-hard even for fixed processing times (see Section 4.1.1). For the corresponding problem P1, Zdrzałka [572] gives a polynomial time approximation algorithm with a worst-case ratio of 3/2 + ε, where ε > 0 can be made arbitrarily small. This result is based on the PTAS of Hall & Shmoys [235] for the delivery time version of problem 1|r_j|L_max.
We now consider problem 1||f_max (cf. Section 4.1.1). Van Wassenhove & Baker [540] investigate the special case where the functions f_j are totally ordered, i.e., f_j(t) ≥ f_{j+1}(t) holds for all t and all 1 ≤ j ≤ n − 1. For the case of linear functions f_j, they give an O(n³) algorithm for problems P1–P4. Tuzikov [529] presents a polynomial time algorithm for problem P3. Hoogeveen & Woeginger [271] prove that P1–P4 are polynomially solvable for regular functions f_j that are piecewise linear.
Vickson [548, 549] studies P1 for problems 1||Σ C_j and 1||Σ w_j C_j. For 1||Σ C_j, problem P1 can be formulated as an assignment problem and hence can be solved in O(n^{2.5}) time. For problem 1||Σ C_j, Ruiz Díaz & French [454] develop an enumerative algorithm for P4. They also note that the set of Pareto-optimal points is in general not convex. Hoogeveen & Woeginger [271] prove that P1 is NP-hard for problem 1||Σ w_j C_j. Vickson [549] describes an enumerative algorithm for P4 for 1||Σ w_j C_j. Inspired by Vickson's work, Panwalkar & Rajagopalan [410] consider P1 for problem 1|d_j = d|Σ(w E_j + w′ T_j), where d is an unrestrictively large due date, and formulate it as an assignment problem. Alidaee & Ahmadian [18] observe that the results of Vickson [548] and Panwalkar & Rajagopalan [410] can be extended to parallel machines, which yields polynomial time algorithms for P1 for problems P||Σ C_j and P|d_j = d|Σ(α E_j + β T_j).
Nowicki & Zdrzałka [406] give an O(n²) greedy algorithm to solve P4 for problem P|pmtn|C_max. The boundary of the set of Pareto-optimal points is a piecewise linear curve with up to 2n + 1 breakpoints. Their approach also works for P4 for problem Q2|pmtn|C_max. The computational complexity of
P4 for problem Q3|pmtn|C_max is unknown.


For O2||C_max, problem P1 can be formulated as a linear program, and thus is solvable in polynomial time (see Section 7.1.1). This observation also yields polynomial time algorithms for problems P2–P4. Nowicki & Zdrzałka [404] show that P1 for problem F2||C_max is NP-hard. They also develop a polynomial time approximation algorithm with a ratio guarantee of 3/2 for this problem. Grabowski & Janiak [222] propose a branch and bound algorithm for P3 for problem J||C_max.

8.5.2 Discretely Controllable Processing Times


Chen, Lu & Tang [100] formulate P1 for problems 1||Σ C_j and 1|d_j = d|Σ(w E_j + w′ T_j) as assignment problems, thus giving polynomial time algorithms for these problems. These are the only known polynomially solvable problems for scheduling with discretely controllable processing times.
Vickson [548] shows that P1 for problem 1||T_max is NP-hard. Chen, Lu & Tang [100] prove that P1 for each of the problems 1|r_j|C_max, 1|d_j = d|T_max and 1|d_j = d|Σ w_j U_j is NP-hard. They also give pseudo-polynomial algorithms for these problems. The complexity of P1 for problem 1||Σ w_j C_j is unknown.

9 Concluding Remarks
In this review, we have displayed results for all of the classical machine scheduling problems, including single machine, parallel machine, open shop, flow shop and job shop models. Some of the well-known variants of these classical models have also been covered.
Our review discusses the complexity of the various models, stating whether they are polynomially solvable, NP-hard, strongly NP-hard, or open. Much progress has been made in the area of complexity classification over the last twenty years, and the number of open problems is fairly small. Examples of problems that have resisted attempts to classify them are R|pmtn|Σ C_j (for which the non-preemptive counterpart is polynomially solvable), and O3||C_max, which is NP-hard but open with respect to pseudo-polynomial solvability.
For the NP-hard problems, we have described a variety of enumera-
tive methods, most of which use branch and bound. Except for the most
structured of these problems, the currently available branch and bound al-
gorithms cannot solve problems of practical size. The main difficulty is
deriving a lower bounding scheme that is powerful enough to restrict the search. Even the polyhedral-based approaches that use linear programming to compute lower bounds are not very effective, although they are very successful in other areas of combinatorial optimization.
Research on local search methods has only gained momentum relatively recently. Nevertheless, there is evidence from flow shop and job shop scheduling, where comparative studies of multi-start descent, simulated annealing, threshold accepting, tabu search and genetic algorithms are reported in the literature, that these methods are capable of generating near-optimal solutions at reasonable computational expense. Used as a 'black box' technique, local search methods perform only satisfactorily. However, by incorporating some problem-specific features, and allowing variations in the basic local search technique, the performance often improves dramatically. Our current understanding of why these methods work well in some situations, but not in others, is still very superficial. Further research is needed to produce meaningful guidelines as to what type of local search method should be used for a problem with particular characteristics, and what special features should be incorporated into the method to improve its performance.
Until the last few years, research on approximation algorithms mainly focused on deriving ratio guarantees for problems of minimizing the makespan or maximum lateness. Recent techniques use the solution of a linear program to guide the construction of the schedule. This approach allows ratio guarantees to be derived for a variety of models involving the total weighted completion time criterion. There are some non-approximability results, which state that a particular ratio guarantee cannot be achieved in polynomial time unless P=NP. However, there is often a large difference between the ratio indicated in the non-approximability result and the best available ratio guarantee. An important research topic is the further development of techniques to help close the gap between the ratios for non-approximability and approximability.

Acknowledgements
We gratefully acknowledge the comments and suggestions of Edwin Cheng, Han Hoogeveen, Chung-Lun Li, Petra Schuurman, Jiří Sgall, Vitaly Strusevich, Guochun Tang, Steef van de Velde and Wenci Yu. Partial
support for the research by the first author was provided by the Management
Research Fellowship of the ESRC (Economic & Social Research Council) of
Britain and the Research Initiatives Fund of the Warwick Business School,
by the second author was provided by INTAS (Project INTAS-93-257 and
INTAS-93-257-Ext), and by the third author was provided by the START
program Y43-MAT of the Austrian Ministry of Science.

References
[1] E.H.L. Aarts and J.K. Lenstra (eds.), Local Search in Combinatorial Opti-
mization, Wiley, Chichester, 1997.

[2] E.H.L. Aarts, P.J.M. van Laarhoven, J.K. Lenstra and N.L.J. Ulder, A com-
putational study of local search algorithms for job shop scheduling, ORSA
Journal on Computing 6 (1994), 118-125.
[3] T.S. Abdul-Razaq and C.N. Potts, Dynamic programming state-space re-
laxation for single-machine scheduling, Journal of the Operational Research
Society 39 (1988), 141-152.
[4] T.S. Abdul-Razaq, C.N. Potts and L.N. Van Wassenhove, A survey of algo-
rithms for the single machine total weighted tardiness scheduling problem,
Discrete Applied Mathematics 26 (1990), 235-253.

[5] J.O. Achugbue and F.Y. Chin, Bounds on schedules for independent tasks
with similar execution times. Journal of the Association for Computing Ma-
chinery 28 (1981), 81-99.
[6] J.O. Achugbue and F.Y. Chin, Scheduling the open shop to minimize mean
flow time, SIAM Journal on Computing 11 (1982), 709-720.
[7] J. Adams, E. Balas and D. Zawack, The shifting bottleneck procedure for job
shop scheduling, Management Science 34 (1988), 391-401.
[8] B. Adenso-Díaz, Restricted neighborhood in the tabu search for the flowshop
problem, European Journal of Operational Research 62 (1992), 27-37.
[9] I. Adiri and N. Aizikowitz, Open shop scheduling problems with dominated
machines, Operations Research, Statistics and Economics Mimeograph Series
383, Technion, Haifa, Israel, 1986.
[10] D. Adolphson and T.C. Hu, Optimal linear ordering, SIAM Journal on Ap-
plied Mathematics 25 (1973), 403-423.

[11] R.H. Ahmadi and U. Bagchi, Lower bounds for single-machine scheduling
problems, Naval Research Logistics Quarterly 37 (1990), 967-979.
[12] R.H. Ahmadi and U. Bagchi, Improved lower bounds for minimizing the sum
of completion times of n jobs over m machines in a flow shop, European
Journal of Operational Research 44 (1990), 331-336.
[13] S.B. Akers, A graphical approach to production scheduling problems, Oper-
ations Research 4 (1956), 244-245.
[14] S.B. Akers and J. Friedman, A non-numerical approach to production
scheduling problems, Operations Research 3 (1955), 429-442.
[15] V.A. Aksjonov, A polynomial-time algorithm for an approximate solution of
a scheduling problem (in Russian), Upravlyaemye Sistemy 28 (1988), 8-11.
[16] S. Albers, Better bounds for online scheduling, Proceedings of the 29th Annual
ACM Symposium on Theory of Computing (1997), 130-139.
[17] H.H. Ali and H. El-Rewini, An optimal algorithm for scheduling interval
ordered tasks with communication on N processors, Journal of Computing
and System Sciences 51 (1995), 301-306.
[18] B. Alidaee and A. Ahmadian, Two parallel machine sequencing problems
involving controllable job processing times, European Journal of Operational
Research 70 (1993), 335-341.
[19] B. Alidaee and S. Gopalan, A note on the equivalence of two heuristics to min-
imize total tardiness, European Journal of Operational Research 96 (1997),
514-517.
[20] N. Alon, Y. Azar, G.J. Woeginger and T. Yadid, Approximation schemes
for scheduling, Proceedings of the 8th Annual ACM-SIAM Symposium on
Discrete Algorithms (1997), 493-500.
[21] N. Alon, Y. Azar, G.J. Woeginger and T. Yadid, Approximation schemes
for scheduling on parallel machines, Technical Report Woe-18, Department
of Mathematics, TU Graz, Graz, Austria, 1997. To appear in Journal of
Scheduling.
[22] A.K. Amoura, E. Bampis, C. Kenyon and Y. Manoussakis, Scheduling inde-
pendent multiprocessor tasks, Proceedings of the 5th Annual European Sym-
posium on Algorithms (1997), 1-12.
[23] E.J. Anderson, C.A. Glass and C.N. Potts, Machine scheduling, in E.H.L.
Aarts and J.K. Lenstra (eds.) Local Search in Combinatorial Optimization,
Wiley, Chichester, 1997, 361-414.
[24] D. Applegate and W. Cook, A computational study of the job-shop scheduling
problem, ORSA Journal on Computing 3 (1991), 149-156.
[25] J. Aspnes, Y. Azar, A. Fiat, S. Plotkin and O. Waarts, On-line load balancing
with applications to machine scheduling and virtual circuit routing, Journal
of the Association for Computing Machinery 44 (1997), 486-504.
[26] A. Avidor, Y. Azar and J. Sgall, Ancient and new algorithms for load balanc-
ing in the Lp norm, Proceedings of the 9th Annual ACM-SIAM Symposium
on Discrete Algorithms (1998),426-435.
[27] B. Awerbuch, Y. Azar, E. Grove, M. Kao, P. Krishnan and J. Vitter, Load
balancing in the Lp norm, Proceedings of the 36th IEEE Symposium on Foun-
dations of Computer Science (1995), 383-391.
[28] Y. Azar, J. Naor and R. Rom, The competitiveness of on-line assignments,
Proceedings of the 3rd Annual ACM-SIAM Symposium on Discrete Algo-
rithms (1992), 203-210.
[29] U. Bagchi, R.S. Sullivan and Y.-L. Chang, Minimizing mean squared devi-
ations of completion times about a common due date, Management Science
33 (1987), 894-906.
[30] K.R. Baker, Introduction to Sequencing and Scheduling, Wiley, New York,
1974.
[31] K.R. Baker and J.W. Bertrand, A dynamic priority rule for scheduling against
due dates, Journal of Operations Management 3 (1982), 37-42.
[32] K.R. Baker, E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Preemptive
scheduling of a single machine to minimize maximum cost subject to release
dates and precedence constraints, Operations Research 31 (1983), 381-386.
[33] K.R. Baker and G.D. Scudder, Sequencing with earliness and tardiness penal-
ties: a review, Operations Research 38 (1990), 22-36.
[34] K.R. Baker and Z.-S. Su, Sequencing with due-dates and early start times to
minimize maximum tardiness, Naval Research Logistics Quarterly 21 (1974),
171-176.
[35] E. Balas, Project scheduling with resource constraints, in E.M.L. Beale (ed.)
Applications of Mathematical Programming Techniques, English University
Press, London, 1970, 187-200.
[36] E. Balas, On the facial structure of scheduling polyhedra, Mathematical Pro-
gramming Study 24 (1985), 179-218.
[37] E. Balas, J.K. Lenstra and A. Vazacopoulos, The one-machine problem with
delayed precedence constraints and its use in job shop scheduling, Manage-
ment Science 41 (1995), 94-109.
[38] E. Balas and A. Vazacopoulos, Guided local search with shifting bottleneck
for job shop scheduling, Management Science Research Report MSSR-609,
Carnegie Mellon University, Pittsburgh, USA, 1994.
[39] I. Bárány, A vector-sum theorem and its application to improving flow shop
guarantees, Mathematics of Operations Research 6 (1981),445-452.
[40] I. Bárány and T. Fiala, Nearly optimum solution of multimachine scheduling
problems (in Hungarian), Szigma Matematikai Közgazdasági Folyóirat 15
(1982), 177-191.
[41] J.R. Barker and G.B. McMahon, Scheduling the general job-shop, Manage-
ment Science 31 (1985), 594-598.
[42] J.W. Barnes and J.J. Brennan, An improved algorithm for scheduling jobs
on identical machines, AIIE Transactions 9 (1977), 25-31.
[43] J.W. Barnes and J.B. Chambers, Solving the job shop scheduling problem
using tabu search, IIE Transactions 27 (1995), 257-263.
[44] Y. Bartal, A. Fiat, H. Karloff and R. Vohra, New algorithms for an ancient
scheduling problem, Journal of Computing and System Sciences 51 (1995),
359-366.
[45] Y. Bartal, S. Leonardi, A. Marchetti-Spaccamela, J. Sgall and L. Stougie,
Multiprocessor scheduling with rejection, Proceedings of the 7th Annual
ACM-SIAM Symposium on Discrete Algorithms (1996), 95-103. To appear
in SIAM Journal on Discrete Mathematics.
[46] M. Bartusch, R.H. Mohring and F.J. Radermacher, Scheduling project net-
works with resource constraints and time windows, Annals of Operations
Research 16 (1988), 201-240.
[47] H. Belouadah, M.E. Posner and C.N. Potts, Scheduling with release dates
on a single machine to minimize total weighted completion time, Discrete
Applied Mathematics 36 (1992), 213-231.
[48] H. Belouadah and C.N. Potts, Scheduling identical parallel machines to
minimize total weighted completion time, Discrete Applied Mathematics 48
(1994), 201-218.
[49] I.S. Belov and J.I. Stolin, An algorithm for the flow shop problem (in Rus-
sian), in Mathematical Economics and Functional Analysis, Nauka, Moscow,
1974, 248-257.
[50] P. Berman, M. Charikar and M. Karpinski, On-line load balancing for re-
lated machines, Proceedings of the 5th Workshop on Algorithms and Data
Structures (1997), 116-125.
[51] L. Bianco and S. Ricciardelli, Scheduling a single machine to minimize total
weighted completion time subject to release dates, Naval Research Logistics
Quarterly 29 (1982), 151-167.
[52] J. Błażewicz, W. Cellary, R. Słowiński and J. Węglarz, Scheduling under
Resource Constraints-Deterministic Models, (Annals of Operations Research,
Volume 7), J.C. Baltzer AG, Basel, Switzerland, 1986.
[53] J. Błażewicz, M. Drabowski and J. Węglarz, Scheduling multiprocessor tasks
to minimize schedule length, IEEE Transactions on Computing 35 (1986),
389-393.
[54] J. Błażewicz, M. Drozdowski, G. Schmidt and D. de Werra, Scheduling in-
dependent two processor tasks on a uniform k-processor system, Discrete
Applied Mathematics 28 (1990), 11-20.
[55] J. Błażewicz, K.H. Ecker, G. Schmidt and J. Węglarz, Scheduling in Computer
and Manufacturing Systems, Springer-Verlag, Berlin, 1994.
[56] J. Błażewicz, J.K. Lenstra and A.H.G. Rinnooy Kan, Scheduling subject to
resource constraints: Classification and complexity, Discrete Applied Mathe-
matics 5 (1983), 11-24.
[57] H. Bräsel, T. Tautenhahn and F. Werner, Constructive heuristic algorithms
for the open shop problem, Computing 51 (1993), 95-110.
[58] P. Bratley, M. Florian and P. Robillard, On sequencing with earliest starts
and due dates with application to computing bounds for the (n/m/G/Fmax)
problem, Naval Research Logistics Quarterly 20 (1973), 57-67.
[59] P. Bratley, M. Florian and P. Robillard, Scheduling with earliest start and due
date constraints on multiple machines, Naval Research Logistics Quarterly 22
(1975),165-173.
[60] P. Brucker, Minimizing maximum lateness in a two-machine unit-time job
shop, Computing 27 (1981), 367-370.
[61] P. Brucker, A polynomial time algorithm for the two machine job-shop
scheduling problem with a fixed number of jobs, OR Spektrum 16 (1994),
5-7.
[62] P. Brucker, J. Hurink, B. Jurisch and B. Wöstmann, A branch & bound al-
gorithm for the open-shop problem, Discrete Applied Mathematics 76 (1997),
43-59.
[63] P. Brucker, J. Hurink and F. Werner, Improved local search heuristics for
some scheduling problems. Part I, Discrete Applied Mathematics 65 (1996),
97-122.
[64] P. Brucker, J. Hurink and F. Werner, Improved local search heuristics for
some scheduling problems. Part II, Discrete Applied Mathematics 72 (1997),
47-69.
[65] P. Brucker and B. Jurisch, A new lower bound for the job-shop scheduling
problem, European Journal of Operational Research 64 (1993), 156-167.
[66] P. Brucker, B. Jurisch and M. Jurisch, Open shop problems with unit time
operations, ZOR - Mathematical Methods of Operations Research 37 (1993),
59-73.
[67] P. Brucker, B. Jurisch and A. Kramer, The job-shop problem and immediate
selection, Annals of Operations Research 50 (1994), 73-114.
[68] P. Brucker, B. Jurisch and B. Sievers, A branch & bound algorithm for the
job-shop scheduling problem, Discrete Applied Mathematics 49 (1994), 107-
127.
[69] P. Brucker, S.A. Kravchenko and Y.N. Sotskov, On the complexity of two
machine job-shop scheduling with regular objective functions, OR Spektrum
19 (1997), 5-10.
[70] P. Brucker, S.A. Kravchenko and Y.N. Sotskov, Preemptive job-shop schedul-
ing problems with a fixed number of jobs, Osnabrücker Schriften zur Mathe-
matik, Heft 184, Universität Osnabrück, Germany, 1997.
[71] J.L. Bruno, E.G. Coffman, Jr., and R. Sethi, Scheduling independent tasks to
reduce mean finishing time, Communications of the ACM 17 (1974), 382-387.
[72] J.L. Bruno, E.G. Coffman, Jr., and R. Sethi, Algorithms for minimizing mean
flow time, Proceedings of the IFIP Congress, North-Holland, Amsterdam,
1974,504-510.
[73] J.L. Bruno and P.J. Downey, Complexity of task sequencing with deadlines,
set-up times and changeover costs, SIAM Journal on Computing 7 (1978),
393-404.
[74] J.L. Bruno and T. Gonzalez, Scheduling independent tasks with release dates
and due dates on parallel machines, Technical Report 213, Computer Science
Department, Pennsylvania State University, USA, 1976.
[75] H. Buer and R.H. Möhring, A fast algorithm for the decomposition of graphs
and posets, Mathematics of Operations Research 8 (1983), 170-184.
[76] X. Cai, Minimization of agreeably weighted variance in single-machine sys-
tems, European Journal of Operational Research 85 (1995), 576-592.
[77] X. Cai, C.-Y. Lee and C.-L. Li, Minimizing total completion time in two-
processor task systems with prespecified processor allocations, Naval Research
Logistics 45 (1998), 231-242.
[78] H.G. Campbell, R.A. Dudek and M.L. Smith, A heuristic algorithm for the n
job, m machine sequencing problem, Management Science 16B (1970),630-
637.
[79] J. Carlier, The one-machine sequencing problem, European Journal of Oper-
ational Research 11 (1982), 42-47.
[80] J. Carlier, Scheduling jobs with release dates and tails on identical machines
to minimize the makespan, European Journal of Operational Research 29
(1987), 298-306.
[81] J. Carlier and E. Pinson, An algorithm for solving the job-shop problem,
Management Science 35 (1989), 164-176.
[82] J. Carlier and E. Pinson, A practical use of Jackson's preemptive schedule
for solving the job shop problem, Annals of Operations Research 26 (1990),
269-287.
[83] J. Carlier and E. Pinson, Adjustment of heads and tails for the job-shop
problem, European Journal of Operational Research 78 (1994), 146-161.
[84] S. Chakrabarti, C.A. Phillips, A.S. Schulz, D.B. Shmoys, C. Stein, and J.
Wein, Improved scheduling algorithms for minsum criteria, Proceedings of the
23rd International Colloquium on Automata, Languages and Programming
(1996), 646-657.
[85] L.M.A. Chan, P. Kaminsky, A. Muriel and D. Simchi-Levi, Machine schedul-
ing, linear programming and list scheduling heuristics, Manuscript, Depart-
ment of Industrial Engineering, Northwestern University, Evanston, USA,
1995.
[86] S. Chang, Q. Lu, G. Tang and W. Yu, On decomposition of the total tardiness
problem, Operations Research Letters 17 (1995), 221-229.
[87] S. Chang, H. Matsuo and G. Tang, Worst-case analysis of local search heuris-
tics for the one-machine total tardiness problem, Naval Research Logistics
Quarterly 37 (1990), 111-121.
[88] J.M. Charlton and C.C. Death, A generalized machine scheduling algorithm,
Operational Research Quarterly 21 (1970), 127-134.
[89] C. Chekuri, R. Motwani, B. Natarajan and C. Stein, Approximation tech-
niques for average completion time scheduling, Proceedings of the 8th Annual
ACM-SIAM Symposium on Discrete Algorithms (1997),609-618.
[90] B. Chen, Tighter bounds for MULTIFIT scheduling on uniform processors,
Discrete Applied Mathematics 31 (1991), 227-260.
[91] B. Chen, A better heuristic for preemptive parallel machine scheduling with
batch setup times, SIAM Journal on Computing 22 (1993), 1303-1318.
[92] B. Chen, C.N. Potts and V.A. Strusevich, Approximation algorithms for two-
machine flow shop scheduling with batch setup times, Mathematical Program-
ming (1998), to appear.
[93] B. Chen and V.A. Strusevich, Approximation algorithms for three-machine
open shop scheduling, ORSA Journal on Computing 5 (1993), 321-326.
[94] B. Chen, A. van Vliet and G.J. Woeginger, New lower and upper bounds for
on-line scheduling, Operations Research Letters 16 (1994), 221-230.
[95] B. Chen, A. van Vliet and G.J. Woeginger, An optimal algorithm for pre-
emptive on-line scheduling, Operations Research Letters 18 (1995), 127-131.
[96] B. Chen and A. Vestjens, Scheduling on identical machines: How good is
LPT in an on-line setting? Operations Research Letters 21 (1998), 165-169.
[97] B. Chen, A.P.A. Vestjens and G.J. Woeginger, On-line scheduling of two-
machine open shops where jobs arrive over time, Journal of Combinatorial
Optimization 1 (1997), 355-365.
[98] B. Chen and G.J. Woeginger, A study of on-line scheduling two-stage shops,
in D.-Z. Du and P.M. Pardalos (eds.) Minimax and Applications, Kluwer
Academic Publishers, 1995, 97-107.
[99] C.-L. Chen and R.L. Bulfin, Complexity of single machine, multi-criteria
scheduling problems, European Journal of Operational Research 70 (1993),
115-125.
[100] Z.L. Chen, Q. Lu and G. Tang, Single machine scheduling with discretely
controllable processing times, Operations Research Letters 21 (1997),69-76.
[101] Z.L. Chen and W.B. Powell, Solving parallel machine total weighted comple-
tion time problems by column generation, Manuscript, Department of Civil
Engineering and Operations Research, Princeton University, Princeton, USA,
1995.
[102] T.C.E. Cheng, Optimal common due date with limited completion time de-
viation, Computers and Operations Research 15 (1988), 91-96.
[103] T.C.E. Cheng, A note on the equivalence of the Wilkerson-Irwin and Modified
Due-Date rules for the mean tardiness sequencing problem, Computers and
Industrial Engineering 22 (1992),63-66.
[104] T.C.E. Cheng and C.C.S. Sin, A state-of-the-art review of parallel-machine
scheduling research, European Journal of Operational Research 47 (1990),
271-292.
[105] Y. Cho and S. Sahni, Bounds for list schedules on uniform processors, SIAM
Journal on Computing 9 (1980), 91-103.
[106] Y. Cho and S. Sahni, Preemptive scheduling of independent jobs with release
and due times on open, flow and job shops, Operations Research 29 (1981),
511-522.
[107] P. Chretienne and C. Picouleau, Scheduling with communication delays:
a survey, in P. Chretienne, E.G. Coffman, J.K. Lenstra and Z. Liu (eds.)
Scheduling Theory and its Applications, John Wiley & Sons, Chichester, 1995,
65-90.
[108] C. Chu, A branch-and-bound algorithm to minimize total tardiness with dif-
ferent release dates, Naval Research Logistics 39 (1992), 265-283.
[109] C. Chu, A branch-and-bound algorithm to minimize total flow time with
unequal release dates, Naval Research Logistics 39 (1992), 859-875.
[110] F. Chudak and D.B. Shmoys, Approximation algorithms for precedence-
constrained scheduling problems on parallel machines that run at different
speeds, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete
Algorithms (1997), 581-590.
[111] E.G. Coffman, Jr., M.R. Garey and D.S. Johnson, An application of bin-
packing to multiprocessor scheduling, SIAM Journal on Computing 7 (1978),
1-17.
[112] J.-Y. Colin and P. Chretienne, CPM scheduling with small communication
delays. Operations Research 39 (1991), 680-684.
[113] R.K. Congram, C.N. Potts and S.L. van de Velde, Dynasearch-iterative local
improvement by dynamic programming: The total weighted tardiness prob-
lem, Preprint, Faculty of Mathematical Studies, University of Southampton,
Southampton, UK, 1998.
[114] R.W. Conway, W.L. Maxwell and L.W. Miller, Theory of Scheduling,
Addison-Wesley, Reading, 1967.
[115] H.A.J. Crauwels, A.M.A. Hariri, C.N. Potts and L.N. Van Wassenhove,
Branch and bound algorithms for single machine scheduling with batch set-
up times to minimize total weighted completion time, Annals of Operations
Research, to appear.
[116] H.A.J. Crauwels, C.N. Potts and L.N. Van Wassenhove, Local search heuris-
tics for single-machine scheduling with batching to minimize the number of
late jobs, European Journal of Operational Research 90 (1996), 200-213.
[117] H.A.J. Crauwels, C.N. Potts and L.N. Van Wassenhove, Local search heuris-
tics for single machine scheduling with batch set-up times to minimize total
weighted completion time, Annals of Operations Research 70 (1997), 261-279.
[118] H.A.J. Crauwels, C.N. Potts and L.N. Van Wassenhove, Local search heuris-
tics for the single machine total weighted tardiness scheduling problem, IN-
FORMS Journal on Computing, to appear.
[119] H.A.J. Crauwels, C.N. Potts and L.N. Van Wassenhove, Local search heuris-
tics for single machine scheduling with batching to minimize total weighted
completion time, Unpublished manuscript, 1995.
[120] D.G. Dannenbring, An evaluation of flow-shop sequencing heuristics, Man-
agement Science 23 (1977), 1174-1182.
[121] S. Dauzere-Peres and J.-B. Lasserre, A modified shifting bottleneck proce-
dure for job-shop scheduling, International Journal of Production Research
31 (1993), 923-932.
[122] S. Dauzere-Peres, Minimizing late jobs in the general one machine scheduling
problem, European Journal of Operational Research 81 (1995), 134-142.
[123] E.W. Davis, Resource allocation in project network models-A survey, Journal
of Industrial Engineering 17 (1966), 177-188.
[124] E.W. Davis, Project scheduling under resource constraints-Historical review
and categorization of procedures, AIIE Transactions 5 (1973), 297-313.
[125] E.W. Davis and G.E. Heidorn, An algorithm for optimal project scheduling
under multiple resource constraints. Management Science 17 (1971), 803-816.
[126] E. Davis and J.M. Jaffe, Algorithms for scheduling tasks on unrelated proces-
sors, Journal of the Association for Computing Machinery 28 (1981), 721-736.
[127] J.S. Davis and J.J. Kanet, Single-machine scheduling with early and tardy
completion costs, Naval Research Logistics 40 (1993), 85-101.
[128] P. De, J.B. Ghosh and C.E. Wells, On the minimization of completion time
variance with a bicriteria extension, Operations Research 40 (1992), 1148-
1155.
[129] F. Della Croce, V. Narayan and R. Tadei, The two-machine total comple-
tion time flow shop problem, European Journal of Operational Research, to
appear.
[130] F. Della Croce, R. Tadei and G. Volta, A genetic algorithm for the job shop
problem, Computers and Operations Research 22 (1995), 15-24.
[131] F. Della Croce, W. Szwarc, R. Tadei, P. Baracco and R. Di Tullio, Minimizing
the weighted sum of quadratic completion times on a single machine, Naval
Research Logistics 42 (1995), 1263-1270.
[132] F. Della Croce, R. Tadei, P. Baracco and A. Grosso, A new decomposition
approach for the single machine total tardiness problem, Journal of the Op-
erational Research Society, to appear.
[133] M. Dell'Amico and S. Martello, Optimal scheduling of tasks on identical
parallel processors, ORSA Journal on Computing 7 (1995), 191-200.
[134] M. Dell'Amico and M. Trubian, Applying tabu-search to the job-shop schedul-
ing problem, Annals of Operations Research 41 (1993), 231-252.
[135] E. Demeulemeester and W. Herroelen, A branch-and-bound procedure for
the multiple resource-constrained project scheduling problem, Management
Science 38 (1992), 1803-1818.
[136] E. Demeulemeester and W. Herroelen, A branch-and-bound procedure for
the generalized resource-constrained project scheduling problem, Operations
Research 45 (1997), 201-212.
[137] J.S. Deogun, On scheduling with ready times to minimize mean flow time,
The Computer Journal 26 (1983), 320-328.
[138] M.I. Dessouky and J.S. Deogun, Sequencing jobs with unequal ready times to
minimize mean flow time, SIAM Journal on Computing 10 (1981), 192-202.
[139] M.I. Dessouky, B.J. Lageweg, J.K. Lenstra and S.L. van de Velde, Scheduling
identical jobs on uniform parallel machines, Statistica Neerlandica 44 (1990),
115-123.
[140] P. Dileepan and T. Sen, Bicriterion static scheduling research for a single
machine, OMEGA 16 (1988), 53-59.
[141] G. Dobson, Scheduling independent tasks on uniform processors. SIAM Jour-
nal on Computing 13 (1984), 705-716.
[142] U. Dorndorf and E. Pesch, Evolution based learning in a job shop scheduling
environment, Computers and Operations Research 22 (1993), 25-40.
[143] I.G. Drobouchevitch and V.A. Strusevich, Heuristics for the two-stage job
shop scheduling problem with a bottleneck machine, CASSM R&D Paper 11,
Centre for Applied Statistics and Systems Modelling, University of Green-
wich, London, UK, 1997.
[144] I.G. Drobouchevitch and V.A. Strusevich, A polynomial algorithm for the
three machine open shop with a bottleneck machine, CASSM R&D Paper
13, Centre for Applied Statistics and Systems Modelling, University of Green-
wich, London, UK, 1997.
[145] M. Drozdowski, On the complexity of multiprocessor task scheduling, Bulletin
of the Polish Academy of Sciences. Technical Sciences 43 (1995), 381-392.
[146] M. Drozdowski, Scheduling multiprocessor tasks - An overview, European
Journal of Operational Research 94 (1996), 215-230.
[147] J. Du and J.Y.-T. Leung, Complexity of scheduling parallel task systems,
SIAM Journal on Discrete Mathematics 2 (1989), 473-487.
[148] J. Du and J.Y.-T. Leung, Minimizing total tardiness on one processor is NP-
hard, Mathematics of Operations Research 15 (1990), 483-495.
[149] J. Du and J.Y.-T. Leung, Minimizing mean flow time with release time and
deadline constraints, Journal of Algorithms 14 (1993), 45-68.
[150] J. Du and J.Y.-T. Leung, Minimizing mean flow time in two-machine open
shops and flow shops, Journal of Algorithms 14 (1993), 341-364.
[151] J. Du, J.Y.-T. Leung and C.S. Wong, Minimizing the number of late jobs
with release time constraints, Technical Report, Computer Science Program,
University of Texas, Dallas, USA, 1989.
[152] J. Du, J.Y.-T. Leung and G.H. Young, Minimizing mean flow time with
release time constraints, Technical Report, Computer Science Program, Uni-
versity of Texas, Dallas, USA, 1988.
[153] J. Du, J.Y.-T. Leung and G.H. Young, Scheduling chain-structured tasks to
minimize makespan and mean flow time, Information and Computation 92
(1991), 219-236.
[154] M.E. Dyer and L.A. Wolsey, Formulating the single machine sequencing prob-
lem with release dates as a mixed integer program, Discrete Applied Mathe-
matics 26 (1990), 255-270.
[155] W.L. Eastman, S. Even and I.M. Isaacs, Bounds for the optimal scheduling
of n jobs on m processors, Management Science 11 (1964), 268-279.
[156] J. Edmonds, D.D. Chinn, T. Brecht and X. Deng, Non-clairvoyant multipro-
cessor scheduling of jobs with changing execution characteristics, Proceedings
of the 29th Annual ACM Symposium on Theory of Computing (1997), 120-
129.
[157] S. Eilon and I.G. Chowdhury, Minimizing waiting time variance in the single
machine problem, Management Science 23 (1977), 567-575.
[158] S.E. Elmaghraby, The one-machine sequencing problem with delay costs,
Journal of Industrial Engineering 19 (1968), 105-108.
[159] S.E. Elmaghraby and S.H. Park, Scheduling jobs on a number of identical
machines, AIIE Transactions 6 (1974), 1-12.
[160] H. Emmons, One-machine sequencing to minimize certain functions of job
tardiness, Operations Research 17 (1969), 701-715.
[161] L. Epstein, J. Noga, S.S. Seiden, J. Sgall and G.J. Woeginger, On-line schedul-
ing for two related machines, unpublished manuscript, 1997.
[162] P. Erdős and L. Lovász, Problems and results on 3-chromatic hypergraphs
and some related questions, in A. Hajnal, R. Rado and V.T. Sós (eds.) Infinite
and Finite Sets, North-Holland, Amsterdam, 1975, 609-628.
[163] U. Faigle, W. Kern and G. Turán, On the performance of on-line algorithms
for partition problems, Acta Cybernetica 9 (1989), 107-119.
[164] A. Federgruen and H. Groenevelt, Preemptive scheduling of uniform machines
by ordinary network flow techniques, Management Science 32 (1986), 341-
349.
[165] U. Feige and C. Scheideler, Improved bounds for acyclic job shop scheduling,
manuscript, Weizmann Institute, Rehovot, Israel, 1997.
[166] A. Feldmann, M.-Y. Kao, J. Sgall and S.-H. Teng, Optimal online schedul-
ing of parallel jobs with dependencies, Proceedings of the 25th Annual ACM
Symposium on Theory of Computing (1993),642-651.
[167] A. Feldmann, J. Sgall and S.-H. Teng, Dynamic scheduling on parallel ma-
chines, Theoretical Computer Science 130 (1994),49-72.
[168] T. Fiala, An algorithm for the open-shop problem, Mathematics of Operations
Research 8 (1983), 100-109.
[169] A. Fiat and G.J. Woeginger, On-line scheduling on a single machine: Mini-
mizing the total completion time, Technical Report Woe-04, Department of
Mathematics, TU Graz, Graz, Austria, 1997.
[170] G. Finn and E. Horowitz, A linear time approximation algorithm for multi-
processor scheduling, BIT 19 (1979), 312-320.
[171] H. Fisher and G.L. Thompson, Probabilistic learning combinations of local
job-shop scheduling rules, in J.F. Muth and G.L. Thompson (eds.) Industrial
Scheduling, Prentice Hall, Englewood Cliffs, 1963, 225-251.
[172] M.L. Fisher, A dual algorithm for the one-machine scheduling problem, Math-
ematical Programming 11 (1976), 229-251.
[173] M.L. Fisher, B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan, Surrogate
duality relaxation for job shop scheduling, Discrete Applied Mathematics 5
(1983),65-75.
[174] P.M. França, M. Gendreau, G. Laporte and F.M. Müller, A composite heuris-
tic for the identical parallel machine scheduling problem with minimum
makespan objective, Computers and Operations Research 21 (1994), 205-210.
[175] G.M. Frederickson, Scheduling unit-time tasks with integer release times and
deadlines, Information Processing Letters 16 (1983), 171-173.
[176] S. French, Sequencing and Scheduling: An Introduction to the Mathematics
of the Job-Shop, Horwood, Chichester, 1982.
[177] D.K. Friesen, Tighter bounds for the multifit processor scheduling algorithm,
SIAM Journal on Computing 13 (1984), 170-181.
[178] D.K. Friesen, Tighter bounds for LPT scheduling on uniform processors,
SIAM Journal on Computing 16 (1987), 554-560.
[179] D.K. Friesen and M.A. Langston, Bounds for multifit scheduling on uniform
processors, SIAM Journal on Computing 12 (1983), 60-70.
[180] D.K. Friesen and M.A. Langston, Evaluation of a MULTIFIT-based schedul-
ing algorithm, Journal of Algorithms 7 (1986), 35-59.
[181] T.D. Fry, R.D. Armstrong and H. Lewis, A framework for single machine
multiple objective sequencing research, OMEGA 17 (1989), 595-607.
[182] T.D. Fry, L. Vicens, K. MacLeod and S. Fernandez, A heuristic solution
procedure to minimize T̄ on a single machine, Journal of the Operational
Research Society 40 (1989), 293-297.
[183] M. Fujii, T. Kasami and K. Ninomiya, Optimal sequencing of two equiva-
lent processors, SIAM Journal on Applied Mathematics 17 (1969), 784-789.
Erratum: SIAM Journal on Applied Mathematics 20 (1971), 141.
[184] G. Galambos and G.J. Woeginger, An on-line scheduling heuristic with better
worst case ratio than Graham's List Scheduling, SIAM Journal on Comput-
ing 22 (1993),349-355. Erratum: R. Chandrasekaran, B. Chen, G. Galambos,
P.R. Narayanan, A. van Vliet and G.J. Woeginger, A note on "An on-line
scheduling heuristic with better worst case ratio than Graham's List Schedul-
ing", SIAM Journal on Computing 26 (1997), 870-872.
[185] M.R. Garey and D.S. Johnson, Complexity results for multiprocessor schedul-
ing under resource constraints, SIAM Journal on Computing 4 (1975), 397-
411.
[186] M.R. Garey and D.S. Johnson, Scheduling tasks with nonuniform deadlines
on two processors, Journal of the Association for Computing Machinery 23
(1976), 461-467.
[187] M.R. Garey and D.S. Johnson, Two-processor scheduling with start-times
and deadlines, SIAM Journal on Computing 6 (1977),416-426.
[188] M.R. Garey and D.S. Johnson, Strong NP-completeness results: motivation,
examples and implications, Journal of the Association for Computing Ma-
chinery 25 (1978), 499-508.
[189] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the
Theory of NP-Completeness, Freeman, San Francisco, 1979.
[190] M.R. Garey, D.S. Johnson and R. Sethi, The complexity of flowshop and
jobshop scheduling, Mathematics of Operations Research 1 (1976), 117-129.
[191] M.R. Garey, D.S. Johnson, B.B. Simons and R.E. Tarjan, Scheduling unit-
time tasks with arbitrary release dates and deadlines, SIAM Journal on Com-
puting 10 (1981), 256-269.
[192] M.R. Garey, D.S. Johnson, R.E. Tarjan and M. Yannakakis, Scheduling op-
posing forests, SIAM Journal on Algebraic Discrete Mathematics 4 (1983),
72-93.
[193] M.R. Garey, R.E. Tarjan and G.T. Wilfong, One-processor scheduling with
symmetric earliness and tardiness penalties, Mathematics of Operations Re-
search 13 (1988), 330-348.
[194] L. Gelders and P.R. Kleindorfer, Coordinating aggregate and detailed
scheduling decisions in the one-machine job shop: Part I. Theory, Operations
Research 22 (1974), 46-60.
[195] L. Gelders and P.R. Kleindorfer, Coordinating aggregate and detailed
scheduling decisions in the one-machine job shop: II-Computation and struc-
ture, Operations Research 23 (1975), 312-324.
[196] G.V. Gens and E.V. Levner, Approximation algorithms for certain universal
problems in scheduling theory, Engineering Cybernetics 16 (1978), 31-36.
[197] G.V. Gens and E.V. Levner, Fast approximation algorithm for job sequencing
with deadlines. Discrete Applied Mathematics 3 (1981), 313-318.
[198] J.B. Ghosh, Batch scheduling to minimize total completion time, Operations
Research Letters 16 (1994), 271-275.
[199] J.B. Ghosh and J.N.D. Gupta, Batch scheduling to minimize maximum late-
ness, Operations Research Letters 21 (1997), 77-80.
[200] B. Giffler and G.L. Thompson, Algorithms for solving production-scheduling
problems, Operations Research 8 (1960), 487-503.
[201] P.C. Gilmore and R.E. Gomory, Sequencing a one state-variable machine:
a solvable case of the traveling salesman problem, Operations Research 12
(1964), 655-679.
[202] C.A. Glass, J.N.D. Gupta and C.N. Potts, Two-machine no-wait flow shop
scheduling with missing operations, Preprint OR75, Faculty of Mathematical
Studies, University of Southampton, Southampton, UK, 1995.
[203] C.A. Glass and C.N. Potts, A comparison of local search methods for per-
mutation flow shop scheduling, Annals of Operations Research 63 (1996),
489-509.
[204] C.A. Glass, C.N. Potts and P. Shade, Unrelated parallel machine scheduling
using local search, Mathematical and Computer Modelling 20 (1994), 41-52.
[205] J.N.D. Gupta and S.S. Reddi, Improved dominance conditions for the three-
machine flowshop scheduling problem, Operations Research 26 (1978), 200-
203.
[206] M.X. Goemans, Improved approximation algorithms for scheduling with re-
lease dates, Proceedings of the 8th Annual ACM-SIAM Symposium on Dis-
crete Algorithms (1997), 591-598.
[207] M.X. Goemans and J. Kleinberg, An improved approximation ratio for the
minimum latency problem, Proceedings of the 7th Annual ACM-SIAM Sym-
posium on Discrete Algorithms (1996), 152-157.
[208] L.A. Goldberg, M. Paterson, A. Srinivasan and E. Sweedyk, Better approx-
imation guarantees for job shop scheduling, Proceedings of the 8th Annual
ACM-SIAM Symposium on Discrete Algorithms (1997), 599-608.
[209] T. Gonzalez, Optimal mean finish time preemptive schedules, Technical Re-
port 220, Computer Science Department, Pennsylvania State University,
USA, 1977.
[210] T. Gonzalez, A note on open shop preemptive schedules, IEEE Transactions
on Computing 28 (1979), 782-786.
[211] T. Gonzalez, Unit execution time shop problems, Mathematics of Operations
Research 7 (1982), 57-66.
[212] T. Gonzalez, O.H. Ibarra and S. Sahni, Bounds for LPT schedules on uniform
processors, SIAM Journal on Computing 6 (1977), 155-166.
[213] T. Gonzalez and D.B. Johnson, A new algorithm for preemptive scheduling
of trees, Journal of the Association for Computing Machinery 27 (1980), 287-
312.
[214] T. Gonzalez, E.L. Lawler and S. Sahni, Optimal preemptive scheduling of
two unrelated processors, ORSA Journal on Computing 2 (1990), 219-224.
[215] T. Gonzalez and S. Sahni, Open shop scheduling to minimize finish time,
Journal of the Association for Computing Machinery 23 (1976), 665-679.
[216] T. Gonzalez and S. Sahni, Flow shop and job shop schedules: complexity and
approximation, Operations Research 26 (1978), 36-52.
[217] T. Gonzalez and S. Sahni, Preemptive scheduling of uniform processor sys-
tems, Journal of the Association for Computing Machinery 25 (1978), 92-101.
[218] V.S. Gordon and V.S. Tanaev, On minimax problems of scheduling theory
for a single machine (in Russian), Vetsi Akademii Navuk BSSR, Ser. fizika-
matematychnykh navuk (1983), 3-9.
[219] S.K. Goyal and C. Sriskandarajah, No-wait scheduling: computational com-
plexity and approximation algorithms, Opsearch 25 (1988), 220-244.
[220] J. Grabowski, On two-machine scheduling with release and due dates to min-
imize maximum lateness, Opsearch 17 (1980), 133-154.
[221] J. Grabowski, A new algorithm of solving the flow-shop problem, in G. Fe-
ichtinger and P. Kall (eds.), Operations Research in Progress, Reidel, Dor-
drecht, 1982, 57-75.
[222] J. Grabowski and A. Janiak, Job-shop scheduling with resource-time models
of operations, European Journal of Operational Research 28 (1986), 58-73.
[223] J. Grabowski, E. Nowicki and S. Zdrzalka, A block approach for single-
machine scheduling with release dates and due dates, European Journal of
Operational Research 26 (1986), 278-285.
[224] J. Grabowski, E. Skubalska and C. Smutnicki, On flow shop scheduling with
release and due dates to minimize maximum lateness, Journal of the Opera-
tional Research Society 34 (1983), 615-620.
[225] R.L. Graham, Bounds for certain multiprocessing anomalies, Bell System
Technical Journal 45 (1966), 1563-1581.
[226] R.L. Graham, Bounds on multiprocessing timing anomalies, SIAM Journal
on Applied Mathematics 17 (1969), 416-429.
[227] R.L. Graham, unpublished manuscript, see [335].
[228] R.L. Graham, E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Opti-
mization and approximation in deterministic sequencing and scheduling: a
survey, Annals of Discrete Mathematics 5 (1979), 287-326.
[229] M.C. Gupta, Y.P. Gupta and A. Kumar, Minimizing flow time variance in a
single machine system using genetic algorithms, European Journal of Opera-
tional Research 70 (1993), 289-303.
[230] D. Gusfield, Bounds for naive multiple machine scheduling with release times
and deadlines, Journal of Algorithms 5 (1984), 1-6.
[231] L.A. Hall, A polynomial time approximation scheme for a constrained flow-
shop scheduling problem, Mathematics of Operations Research 19 (1994),
68-85.
[232] L.A. Hall, Approximability of flow shop scheduling, Proceedings of the 36th
IEEE Symposium on Foundations of Computer Science (1995),82-91.
[233] L.A. Hall, A.S. Schulz, D.B. Shmoys and J. Wein, Scheduling to minimize av-
erage completion time: Off-line and on-line approximation algorithms, Math-
ematics of Operations Research 22 (1997),513-544.
[234] L.A. Hall and D.B. Shmoys, Approximation algorithms for constrained
scheduling problems, Proceedings of the 30th IEEE Symposium on Founda-
tions of Computer Science (1989), 134-139.
[235] L.A. Hall and D.B. Shmoys, Jackson's rule for one-machine scheduling: mak-
ing a good heuristic better, Mathematics of Operations Research 17 (1992),
22-35.
[236] L.A. Hall, D.B. Shmoys and J. Wein, Scheduling to minimize average com-
pletion time: Off-line and on-line algorithms, Proceedings of the 7th Annual
ACM-SIAM Symposium on Discrete Algorithms (1996), 142-151.
[237] N.G. Hall, W. Kubiak and S.P. Sethi, Earliness-tardiness scheduling prob-
lems, II: deviation of completion times about a restrictive common due date,
Operations Research 39 (1991), 847-856.
[238] N.G. Hall and M.E. Posner, Earliness-tardiness scheduling problems, I:
weighted deviation of completion times about a common due date, Opera-
tions Research 39 (1991),836-846.
[239] N.G. Hall and C. Sriskandarajah, A survey of machine scheduling with block-
ing and no-wait in process, Operations Research 44 (1996), 510-525.
[240] A.M.A. Hariri and C.N. Potts, An algorithm for single machine sequencing
with release dates to minimize total weighted completion time, Discrete Ap-
plied Mathematics 5 (1983),99-109.
[241] A.M.A. Hariri and C.N. Potts, Algorithms for two-machine flow-shop se-
quencing with precedence constraints, European Journal of Operational Re-
search 17 (1984), 238-248.
[242] A.M.A. Hariri and C.N. Potts, A branch and bound algorithm to minimize
the number of late jobs in a permutation flow-shop, European Journal of
Operational Research 38 (1989), 228-237.
[243] A.M.A. Hariri and C.N. Potts, Heuristics for scheduling unrelated parallel
machines. Computers and Operations Research 18 (1991), 323-331.
[244] A.M.A. Hariri and C.N. Potts, Single machine scheduling with deadlines to
minimize the weighted number of tardy jobs. Management Science 40 (1994),
1712-1719.
[245] A.M.A. Hariri and C.N. Potts, Single machine scheduling with batch set-
up times to minimize maximum lateness, Annals of Operations Research 70
(1997), 75-92.
[246] A.M.A. Hariri, C.N. Potts and L.N. Van Wassenhove, Single machine schedul-
ing to minimize total weighted late work, ORSA Journal on Computing 7
(1995), 232-242.
[247] R. Haupt, A survey of priority rule-based scheduling, OR Spektrum 11 (1989),
3-16.
[248] N. Hefetz and I. Adiri, An efficient optimal algorithm for the two-machines,
unit-time, job shop, schedule-length problem, Mathematics of Operations Re-
search 7 (1982), 354-360.
[249] W.S. Herroelen, Resource-constrained project scheduling-the state of the art,
Operational Research Quarterly 23 (1972), 261-275.
[250] W.S. Herroelen and E.L. Demeulemeester, Recent advances in branch-and-
bound procedures for resource-constrained project scheduling problems, in
P. Chretienne et al. (eds.) Proceedings of the Summer School on Scheduling
Theory and Its Applications, John Wiley & Sons, Chichester, 1995,259-276.
[251] D.S. Hochbaum and D.B. Shmoys, Using dual approximation algorithms for
scheduling problems: Theoretical and practical results, Journal of the Asso-
ciation for Computing Machinery 34 (1987), 144-162.
[252] D.S. Hochbaum and D.B. Shmoys, A polynomial approximation scheme for
machine scheduling on uniform processors: Using the dual approximating
approach, SIAM Journal on Computing 17 (1988), 539-551.
[253] J.E. Holsenback and R.M. Russell, A heuristic algorithm for sequencing on
one machine to minimize total tardiness, Journal of the Operational Research
Society 43 (1992), 53-62.
[254] K.S. Hong and J.Y.-T. Leung, Preemptive scheduling with release times and
deadlines, Journal of Real-Time Systems 1 (1989), 265-281.
[255] K.S. Hong and J.Y.-T. Leung, On-line scheduling of real-time tasks, IEEE
Transactions on Computing 41 (1992), 1326-1331.
[256] J.A. Hoogeveen, Single-machine bicriteria scheduling, Ph.D. thesis, Center for
Mathematics and Computer Science, Amsterdam, The Netherlands, 1992.
[257] J.A. Hoogeveen, Minimizing maximum promptness and maximum lateness
on a single machine, Mathematics of Operations Research 21 (1996), 100-114.
[258] J.A. Hoogeveen, Single-machine scheduling to minimize a function of two or
three maximum cost criteria, Journal of Algorithms 21 (1996), 415-433.
[259] J.A. Hoogeveen and T. Kawaguchi, Minimizing total completion time in a
two-machine flowshop: Analysis of special cases, Proceedings of the 5th IPCO
Conference (1996), 374-388.
[260] J.A. Hoogeveen, J.K. Lenstra and B. Veltman, Three, four, five, six, or the
complexity of scheduling with communication delays, Operations Research
Letters 16 (1994), 129-137.
[261] J.A. Hoogeveen, J.K. Lenstra and B. Veltman, Preemptive scheduling in a
two-stage multiprocessor flow shop is NP-hard, European Journal of Opera-
tional Research 89 (1996), 172-175.

[262] J.A. Hoogeveen, H. Oosterhout and S.L. van de Velde, New lower and upper
bounds for scheduling around a small common due date, Operations Research
42 (1994), 102-110.
[263] J.A. Hoogeveen, P. Schuurman and G.J. Woeginger, Non-approximability
results for scheduling problems with minsum criteria, Technical Report Woe-
15, Department of Mathematics, TU Graz, Graz, Austria, 1997.
[264] J.A. Hoogeveen and S.L. van de Velde, Scheduling around a small common
due date, European Journal of Operational Research 55 (1991), 237-242.
[265] J.A. Hoogeveen and S.L. van de Velde, Minimizing total completion time
and maximum cost simultaneously is solvable in polynomial time, Operations
Research Letters 17 (1995), 205-208.
[266] J.A. Hoogeveen and S.L. van de Velde, Stronger Lagrangian bounds by use of
slack variables: applications to machine scheduling problems. Mathematical
Programming 70 (1995), 173-190.
[267] J.A. Hoogeveen and S.L. van de Velde, A branch-and-bound algorithm for
single-machine earliness-tardiness scheduling with idle time, INFORMS Jour-
nal on Computing 8 (1996), 402-412.
[268] J.A. Hoogeveen and S.L. van de Velde, Earliness-tardiness scheduling around
almost equal due dates, INFORMS Journal on Computing 9 (1997), 92-99.
[269] J.A. Hoogeveen, S.L. van de Velde and B. Veltman, Complexity of scheduling
multiprocessor tasks with prespecified processor allocations, Discrete Applied
Mathematics 55 (1994), 259-272.
[270] J.A. Hoogeveen and A.P.A. Vestjens, Optimal on-line algorithms for single-
machine scheduling, Proceedings of the 5th IPCO Conference (1996), 404-414.
[271] J.A. Hoogeveen and G.J. Woeginger, Scheduling with controllable processing
times, Manuscript, Department of Mathematics, TU Graz, Graz, Austria,
1998.
[272] W.A. Horn, Single-machine job sequencing with treelike precedence ordering
and linear delay penalties, SIAM Journal on Applied Mathematics 23 (1972),
189-202.
[273] W.A. Horn, Minimizing average flow time with parallel machines, Operations
Research 21 (1973), 846-847.
[274] W.A. Horn, Some simple scheduling algorithms, Naval Research Logistics
Quarterly 21 (1974), 177-185.
[275] E. Horowitz and S. Sahni, Exact and approximate algorithms for scheduling
nonidentical processors, Journal of the Association for Computing Machinery
23 (1976),317-327.
[276] E.C. Horvath, S. Lam and R. Sethi, A level algorithm for preemptive schedul-
ing, Journal of the Association for Computing Machinery 24 (1977), 32-43.
[277] T.C. Hu, Parallel sequencing and assembly line problems, Operations Re-
search 9 (1961), 841-848.
[278] O.H. Ibarra and C.E. Kim, Heuristic algorithms for scheduling independent
tasks on nonidentical processors, Journal of the Association for Computing
Machinery 24 (1977), 280-289.
[279] O.H. Ibarra and C.E. Kim, Approximation algorithms for certain scheduling
problems, Mathematics of Operations Research 3 (1978), 197-204.
[280] E. Ignall and L. Schrage, Applications of the branch-and-bound technique to
some flow-shop scheduling problems, Operations Research 13 (1965),400-412.
[281] J.R. Jackson, Scheduling a production line to minimize maximum tardiness,
Research Report 43, Management Science Research Project, University of
California, Los Angeles, USA, 1955.
[282] J.R. Jackson, An extension of Johnson's result on job lot scheduling, Naval
Research Logistics Quarterly 3 (1956),201-203.
[283] J.M. Jaffe, Efficient scheduling of tasks without full use of processor resources,
Theoretical Computer Science 12 (1980), 1-17.
[284] J.M. Jaffe, An analysis of preemptive multiprocessor job scheduling, Mathe-
matics of Operations Research 5 (1980), 415-521.
[285] A. Jakoby and R. Reischuk, The complexity of scheduling problems with com-
munication delays for trees, Proceedings of the 3rd Scandinavian Workshop
on Algorithm Theory (1992), 165-177.
[286] S.M. Johnson, Optimal two- and three-stage production schedules with setup
times included, Naval Research Logistics Quarterly 1 (1954), 61-68.
[287] H. Jung, L.M. Kirousis and P. Spirakis, Lower bounds and efficient algorithms
for multiprocessor scheduling of directed acyclic graphs with communication
delays, Information and Computation 105 (1993), 94-104.
[288] B. Jurisch and W. Kubiak, Two-machine open shops with renewable re-
sources, Operations Research 45 (1997), 544-552.
[289] B. Jurisch and W. Kubiak, Algorithms for minclique scheduling problems,
Discrete Applied Mathematics 72 (1997), 115-139.
[290] H.G. Kahlbacher, SWEAT-A program for a scheduling problem with earliness
and tardiness penalties, European Journal of Operational Research 43 (1989),
111-112.
[291] H. Kamoun and C. Sriskandarajah, The complexity of scheduling jobs in
repetitive manufacturing systems, European Journal of Operational Research
70 (1993), 350-364.
[292] J.J. Kanet, Minimizing the average deviation of jobs completion times about
a common due date, Naval Research Logistics Quarterly 28 (1981), 643-651.
[293] D.R. Karger, S.J. Phillips and E. Torng, A better algorithm for an ancient
scheduling problem, Journal of Algorithms 20 (1996), 400-430.
[294] R.M. Karp, Reducibility among combinatorial problems, in R.E. Miller and
J.W. Thatcher (eds.) Complexity of Computer Computations, Plenum Press,
New York, 1972, 85-103.
[295] M.T. Kaufman, An almost-optimal algorithm for the assembly line scheduling
problem, IEEE Transactions on Computing C-23 (1974),1169-1174.
[296] T. Kawaguchi and S. Kyan, Worst case bound of an LRF schedule for the
mean weighted flow-time problem, SIAM Journal on Computing 15 (1986),
1119-1129.
[297] H. Kellerer, T. Tautenhahn and G.J. Woeginger, Approximability and non-
approximability results for minimizing total flow time on a single machine,
Proceedings of the 28th Annual ACM Symposium on Theory of Computing
(1996), 418-426. To appear in SIAM Journal on Computing.
[298] Y.-D. Kim, Heuristics for flowshop scheduling problems minimizing mean
tardiness, Journal of the Operational Research Society 44 (1993), 19-28.
[299] H. Kise, T. Ibaraki and H. Mine, Performance analysis of six approximation
algorithms for the one-machine maximum lateness scheduling problem with
ready times, Journal of the Operational Research Society of Japan 22 (1979),
205-224.
[300] U. Kleinau, Two-machine shop scheduling problems with batch processing,
Mathematical and Computer Modelling 17 (1993), 55-66.
[301] W.H. Kohler and K. Steiglitz, Exact, approximate and guaranteed accuracy
algorithms for the flow-shop problem n/2/F/F̄, Journal of the Association
for Computing Machinery 22 (1975), 106-114.
[302] C. Koulamas, The total tardiness problem: Review and extensions, Opera-
tions Research 42 (1994), 1025-1041.
[303] M.Y. Kovalyov, On one machine scheduling to minimize the number of late
items and the total tardiness, Preprint N4, Institute of Engineering Cyber-
netics, Academy of Sciences of Byelorussian SSR, Minsk, Byelorussia, 1991.
[304] M.Y. Kovalyov and W. Kubiak, A fully polynomial time approximation
scheme for the weighted earliness-tardiness problem, Operations Research,
to appear.
[305] M.Y. Kovalyov, C.N. Potts and L.N. Van Wassenhove, A fully polynomial
approximation scheme for scheduling a single machine to minimize total
weighted late work, Mathematics of Operations Research 19 (1994),86-93.
[306] A. Kramer, Scheduling multiprocessor tasks on dedicated processors, Ph.D.
thesis, Fachbereich Mathematik/Informatik, Universität Osnabrück, Os-
nabrück, Germany, 1995.
[307] S.A. Kravchenko and Y.N. Sotskov, Optimal makespan schedule for three
jobs on two machines, ZOR - Mathematical Methods of Operations Research
43 (1996), 233-238.
[308] H. Krawczyk and M. Kubale, An approximation algorithm for diagnostic test
scheduling in multicomputer systems, IEEE Transactions on Computing 34
(1985), 869-872.
[309] M.J. Krone and K. Steiglitz, Heuristic-programming solution of a flowshop-
scheduling problem, Operations Research 22 (1974), 629-638.
[310] W. Kubiak, Złożoność Obliczeniowa Algorytmów i Problemów Szeregowania
Zadań przy Ograniczeniach Zasobowych (in Polish), Ph.D. thesis, ICS, Polish
Academy of Sciences, Warsaw, Poland, 1987.
[311] W. Kubiak, Completion time variance minimization on a single machine is
difficult, Operations Research Letters 14 (1993), 49-59.
[312] W. Kubiak, C. Sriskandarajah and K. Zaras, A note on the complexity of
open shop scheduling problems, INFOR 29 (1991), 284-294.
[313] M. Kubale, Preemptive scheduling of two-processor tasks on dedicated pro-
cessors (in Polish), Zeszyty Naukowe Politechniki Śląskiej, Seria Automatyka
100/1082 (1990), 145-153.
[314] M. Kunde, Nonpreemptive LP-scheduling on homogeneous multiprocessor
systems, SIAM Journal on Computing 10 (1981), 151-173.
[315] J. Labetoulle, E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Preemp-
tive scheduling of uniform machines subject to release dates, in W.R. Pulley-
blank (ed.) Progress in Combinatorial Optimization, Academic Press, New
York, 1984, 245-261.
[316] B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan, Minimizing maximum
lateness on one machine: computational experience and some applications,
Statistica Neerlandica 30 (1976), 25-41.
[317] B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan, Job-shop scheduling
by implicit enumeration, Management Science 24 (1977),441-450.
[318] B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan, A general bound-
ing scheme for the permutation flow-shop problem, Operations Research 26
(1978), 53-67.
[319] S. Lam and R. Sethi, Worst case analysis of two scheduling algorithms, SIAM
Journal on Computing 6 (1977), 518-536.
[320] R.E. Larson, M.I. Dessouky and R.E. Devor, A forward-backward procedure
for the single machine problem to minimize maximum lateness, IIE Transac-
tions 17 (1985), 252-260.
[321] E.L. Lawler, Optimal sequencing of a single machine subject to precedence
constraints, Management Science 19 (1973),544-546.
[322] E.L. Lawler, Sequencing to minimize the weighted number of tardy jobs,
RAIRO Recherche Opérationnelle S10(5) (1976), 27-33.
[323] E.L. Lawler, A 'pseudopolynomial' algorithm for sequencing jobs to minimize
total tardiness, Annals of Operations Research 1 (1977), 331-342.
[324] E.L. Lawler, Sequencing jobs to minimize total weighted completion time
subject to precedence constraints, Annals of Operations Research 2 (1978),
75-90.
[325] E.L. Lawler, Preemptive scheduling of uniform parallel machines to minimize
the weighted number of late jobs, Report BW105, Centre for Mathematics
and Computer Science, Amsterdam, The Netherlands, 1979.
[326] E.L. Lawler, Efficient implementation of dynamic programming algorithms
for sequencing problems, Report BW106, Centre for Mathematics and Com-
puter Science, Amsterdam, The Netherlands, 1979.
[327] E.L. Lawler, A fully polynomial approximation scheme for the total tardiness
problem, Operations Research Letters 1 (1982), 207-208.
[328] E.L. Lawler, Preemptive scheduling of precedence-constrained jobs on parallel
machines, in M.A.H. Dempster, J.K. Lenstra and A.H.G. Rinnooy Kan (eds.)
Deterministic and Stochastic Scheduling, Reidel, Dordrecht, 1982, 101-123.
[329] E.L. Lawler, Recent results in the theory of machine scheduling, in A.
Bachem, M. Grötschel and B. Korte (eds.) Mathematical Programming: The
State of the Art - Bonn 1982, Springer, Berlin, 1983, 202-234.
[330] E.L. Lawler, Scheduling a single machine to minimize the number of late
jobs, Report No. UCB/CSD 83/139, Computer Science Division, University
of California, Berkeley, USA, 1983.
[331] E.L. Lawler, A dynamic programming algorithm for preemptive scheduling of
a single machine to minimize the number of late jobs, Annals of Operations
Research 26 (1990), 125-133.
[332] E.L. Lawler, Knapsack-like scheduling problems, the Moore-Hodgson algo-
rithm and the 'tower of sets' property, Mathematical and Computer Modelling
20 (1994), 91-106.
[333] E.L. Lawler and J. Labetoulle, On preemptive scheduling of unrelated parallel
processors by linear programming, Journal of the Association for Computing
Machinery 25 (1978), 612-619.
[334] E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Minimizing maximum
lateness in a two-machine open shop, Mathematics of Operations Research
6 (1981), 153-158. Erratum: Mathematics of Operations Research 7 (1982),
635.
[335] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan and D.B. Shmoys, Sequenc-
ing and scheduling: algorithms and complexity, in S. Graves, A.H.G. Rinnooy
Kan and P. Zipkin (eds.) Handbooks in Operations Research and Management
Science, Volume 4, Logistics of Production and Inventory, North Holland,
Amsterdam, 1993,445-522.
[336] E.L. Lawler, M.G. Luby and V.V. Vazirani, Scheduling open shops with par-
allel machines, Operations Research Letters 1 (1982), 161-164.
[337] E.L. Lawler and C.U. Martel, Preemptive scheduling of two uniform machines
to minimize the number of late jobs, Operations Research 37 (1989), 314-318.
[338] E.L. Lawler and J.M. Moore, A functional equation and its application to
resource allocation and sequencing problems, Management Science 16 (1969),
77-84.
[339] S. Lawrence, Resource constrained scheduling: an experimental investigation
of heuristic scheduling techniques, Graduate School of Industrial Administra-
tion, Carnegie Mellon University, Pittsburgh, USA, 1984.
[340] C.-Y. Lee, T.C.E. Cheng and B.M.T. Lin, Minimizing the makespan in the
3-machine assembly-type flowshop scheduling problem, Management Science 39 (1993), 616-625.
[341] C.-Y. Lee and S.J. Kim, Parallel genetic algorithms for the earliness-tardiness
job scheduling problem with general penalty weights, Computers and Industrial Engineering 28 (1995), 231-243.
[342] C.-Y. Lee and G.L. Vairaktarakis, Complexity of single machine hierarchical
scheduling: A survey, in P.M. Pardalos (ed.) Complexity in Numerical Optimization, World Scientific Publishing Company, 1993, 269-298.
[343] T. Leighton, B. Maggs and S. Rao, Packet routing and job shop scheduling
in O(Congestion+Dilation) steps, Combinatorica 14 (1994), 167-186.
[344] T. Leighton, B. Maggs and A. Richa, Fast algorithms for finding O(Congestion+Dilation)
packet routing schedules, Technical Report CMU-CS-96-152, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1996.
[345] J.K. Lenstra, unpublished manuscript.
[346] J.K. Lenstra and A.H.G. Rinnooy Kan, Complexity of scheduling under prece-
dence constraints, Operations Research 26 (1978), 22-35.
[347] J.K. Lenstra and A.H.G. Rinnooy Kan, Computational complexity of discrete
optimization problems, Annals of Discrete Mathematics 4 (1979), 121-140.
[348] J.K. Lenstra and A.H.G. Rinnooy Kan, Complexity results for scheduling
chains on a single machine, European Journal of Operational Research 4
(1980), 270-275.
[349] J.K. Lenstra, A.H.G. Rinnooy Kan and P. Brucker, Complexity of machine
scheduling problems, Annals of Discrete Mathematics 1 (1977), 343-362.
[350] J.K. Lenstra, D.B. Shmoys and E. Tardos, Approximation algorithms
for scheduling unrelated parallel machines, Mathematical Programming 46
(1990), 259-271.
[351] J.K. Lenstra, M. Veldhorst and B. Veltman, The complexity of scheduling
trees with communication delays, Journal of Algorithms 20 (1996), 157-173.
[352] S. Leonardi and D. Raz, Approximating total flow time on parallel machines,
Proceedings of the 29th Annual ACM Symposium on Theory of Computing
(1997), 110-119.
[353] V. Lev and I. Adiri, V-shop scheduling, European Journal of Operational
Research 18 (1984), 51-56.

[354] C.-L. Li, Z.L. Chen and T.C.E. Cheng, A note on one-processor schedul-
ing with asymmetric earliness and tardiness penalties, Operations Research
Letters 13 (1993), 45-48.

[355] R. Li and L. Shi, An on-line algorithm for some uniform processor scheduling,
SIAM Journal on Computing (1998), to appear.

[356] C.Y. Liu and R.L. Bulfin, On the complexity of preemptive open-shop
scheduling problems, Operations Research Letters 4 (1985), 71-74.

[357] C.Y. Liu and R.L. Bulfin, Scheduling open shops with unit execution times
to minimize functions of due dates, Operations Research 36 (1988), 553-559.

[358] J.W.S. Liu and C.L. Liu, Bounds on scheduling algorithms for heterogeneous
computing systems, in J.L. Rosenfeld (ed.) Information Processing, North-
Holland, Amsterdam, 1974,349-353.
[359] J.W.S. Liu and C.L. Liu, Bounds on scheduling algorithms for heterogeneous
computing systems, Technical Report UIUCDCS-R-74-632, Department of
Computer Science, University of Illinois at Urbana-Champaign, USA, 1974.
[360] J.W.S. Liu and C.L. Liu, Performance analysis of heterogeneous multi-
processor computing systems, in E. Gelenbe and R. Mahl (eds.) Computer
Architectures and Networks, North-Holland, Amsterdam, 1974,331-343.

[361] J.W.S. Liu and A. Yang, Optimal scheduling of independent tasks on het-
erogeneous computing systems, Proceedings of the ACM Annual Conference
(1974), 38-45.

[362] E.L. Lloyd, Concurrent task systems, Operations Research 29 (1981), 85-92.

[363] I.N. Lushchakova and V.A. Strusevich, The complexity of open shop schedul-
ing under resource constraints (in Russian), in Solution Methods for Extremal
Problems, Minsk, 1989, 57-65.

[364] I.N. Lushchakova and V.A. Strusevich, Two-stage open shop systems with re-
source constraints (in Russian), Zhurnal Vychistlitel'noj Matematiki i Matem-
aticheskoj Fiziki 29 (1989), 1393-1407.
[365] W. Mao, R.K. Kincaid and A. Rifkin, On-line algorithms for a single machine
scheduling problem, in S.G. Nash and A. Sofer (eds.), The Impact of Emerg-
ing Technologies on Computer Science and Operations Research, Kluwer Aca-
demic Publishers, Boston, 1995,157-173.
[366] C.U. Martel, Preemptive scheduling with release times, deadlines and due
times, Journal of the Association for Computing Machinery 29 (1982), 812-
829.
[367] S. Martello, F. Soumis and P. Toth, Exact and approximation algorithms
for makespan minimization on unrelated parallel machines, Discrete Applied
Mathematics 75 (1997), 169-188.
[368] P. Martin and D.B. Shmoys, A new approach to computing schedules for the job
shop scheduling problem, Proceedings of the 5th IPCO Conference (1996),
389-403.
[369] A.J. Mason, Genetic algorithms and scheduling problems, Ph.D. thesis, De-
partment of Engineering, University of Cambridge, U.K., 1992.
[370] A.J. Mason and E.J. Anderson, Minimizing flow time on a single machine with
job classes and setup times, Naval Research Logistics 38 (1991), 333-350.
[371] H. Matsuo, Cyclic sequencing in the two-machine permutation flow shop:
complexity, worst-case and average-case analysis, Naval Research Logistics
Quarterly 37 (1990), 679-694.
[372] H. Matsuo, C.J. Suh and R.S. Sullivan, A controlled search simulated anneal-
ing method for the single machine weighted tardiness problem, Working paper
87-12-2, Department of Management, The University of Texas at Austin, TX,
USA, 1987.
[373] H. Matsuo, C.J. Suh and R.S. Sullivan, A controlled search simulated an-
nealing method for the general jobshop scheduling problem, Working paper
03-44-88, Department of Management, The University of Texas at Austin,
TX, USA, 1988.
[374] S.T. McCormick and M.L. Pinedo, Scheduling n independent jobs on m uni-
form machines with both flow time and makespan objectives: A parametric
analysis, ORSA Journal on Computing 7 (1995), 63-77.
[375] G.B. McMahon, Optimal production schedules for flow shops, Canadian Op-
erational Research Society Journal 7 (1969), 141-151.
[376] G.B. McMahon and P.G. Burton, Flow-shop scheduling with the branch-and-
bound method, Operations Research 15 (1967), 473-481.
[377] G.B. McMahon and M. Florian, On scheduling with ready times and due
dates to minimize maximum lateness, Operations Research 23 (1975), 475-
482.
[378] R. McNaughton, Scheduling with deadlines and loss functions, Management
Science 6 (1959), 1-12.
[379] J. Mittenthal, M. Raghavachari and A.I. Rana, A hybrid simulated annealing
approach for single machine scheduling problems with non-regular penalty
functions, Computers and Operations Research 20 (1993), 103-111.
[380] R.H. Möhring, M.W. Schäffter and A.S. Schulz, Scheduling jobs with com-
munication delays: using infeasible solutions for approximation, Proceedings
of the 4th Annual European Symposium on Algorithms (1996), 76-90.
[381] C.L. Monma, Linear-time algorithms for scheduling on parallel processors,
Operations Research 30 (1982), 116-124.
[382] C.L. Monma and C.N. Potts, On the complexity of scheduling with batch
setup times, Operations Research 37 (1989), 798-804.
[383] C.L. Monma and C.N. Potts, Analysis of heuristics for preemptive parallel
machine scheduling with batch setup times, Operations Research 41 (1993), 981-993.
[384] C.L. Monma and A.H.G. Rinnooy Kan, A concise survey of efficiently solv-
able special cases of the permutation flow-shop problem, RAIRO Recherche
Opérationnelle 17 (1983), 105-119.
[385] J.M. Moore, An n job, one machine sequencing algorithm for minimizing the
number of late jobs, Management Science 15 (1968), 102-109.
[386] J.F. Morrison, A note on LPT scheduling, Operations Research Letters 7
(1988), 77-79.
[387] R. Motwani, S. Phillips and E. Torng, Non-clairvoyant scheduling, Theoretical
Computer Science 130 (1994), 17-47.
[388] J.H. Muller and J. Spinrad, Incremental modular decomposition. Journal of
the Association for Computing Machinery 36 (1989), 1-19.
[389] A. Munier and C. Hanen, Using duplication for scheduling unitary tasks on
m processors with unit communication delays, Theoretical Computer Science
178 (1997), 119-127.
[390] A. Munier and C. Hanen, An approximation algorithm for scheduling unitary
tasks on m processors with communication delays, Internal Report LITP 12,
Universite P. et M. Curie, Paris, France, 1995.
[391] A. Munier and J.-C. Konig, A heuristic for a scheduling problem with com-
munication delays, Operations Research 45 (1997), 145-147.
[392] R.R. Muntz and E.G. Coffman, Jr., Optimal scheduling on two-processor
systems, IEEE Transactions on Computing C-18 (1969), 1014-1020.
[393] R.R. Muntz and E.G. Coffman, Jr., Preemptive scheduling of real tasks on
multiprocessor systems, Journal of the Association for Computing Machinery
17 (1970), 324-338.
[394] R. Nakano and T. Yamada, Conventional genetic algorithm for job shop prob-
lems, in R.K. Belew and L.B. Booker (eds.) Proceedings of the Fourth Inter-
national Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo,
1991, 474-479.
[395] M. Nawaz, E.E. Enscore, Jr., and I. Ham, A heuristic algorithm for the m-
machine, n-job flow-shop sequencing problem, OMEGA 11 (1983), 91-95.
[396] E. Nemeti, Das Reihenfolgeproblem in der Fertigungsprogrammierung und
Linearplanung mit logischen Bedingungen (in German), Mathematica (Cluj) 6 (1964), 87-99.
[397] Ju.D. Neumytov and S.V. Sevastianov, An approximation algorithm with
best possible bound for the counter routes problem with three machines (in
Russian), Upravlyaemye Sistemy 31 (1993), 53-65.
[398] E. Nowicki and C. Smutnicki, On lower bounds on the minimum maximum
lateness on one machine subject to release dates, Opsearch 24 (1987), 106-110.
[399] E. Nowicki and C. Smutnicki, Worst-case analysis of an approximation algo-
rithm for flow-shop scheduling, Operations Research Letters 8 (1989), 171-177.
[400] E. Nowicki and C. Smutnicki, An approximation algorithm for single-machine
scheduling problem with release times and delivery times, Discrete Applied
Mathematics 48 (1993),69-79.
[401] E. Nowicki and C. Smutnicki, A fast tabu search algorithm for the permuta-
tion flow shop problem, European Journal of Operational Research 91 (1996),
160-175.
[402] E. Nowicki and C. Smutnicki, A fast taboo search algorithm for the job shop
problem, Management Science 42 (1996), 797-813.
[403] E. Nowicki and S. Zdrzalka, A note on minimizing maximum lateness in
a one-machine sequencing problem with release dates, European Journal of
Operational Research 23 (1986), 266-267.
[404] E. Nowicki and S. Zdrzalka, A two-machine flow shop scheduling problem with
controllable job processing times, European Journal of Operational Research
34 (1988), 208-220.
[405] E. Nowicki and S. Zdrzalka, A survey of results for sequencing problems with
controllable processing times, Discrete Applied Mathematics 26 (1990), 271-
287.
[406] E. Nowicki and S. Zdrzalka, A bicriterion approach to preemptive scheduling
of parallel machines with controllable job processing times, Discrete Applied
Mathematics 63 (1995), 237-256.
[407] F.A. Ogbu and D.K. Smith, The application of the simulated annealing al-
gorithm to the solution of the n/m/Cmax flowshop problem, Computers and
Operations Research 17 (1990), 243-253.
[408] F.A. Ogbu and D.K. Smith, Simulated annealing for the permutation flow-
shop problem, OMEGA 19 (1991), 64-67.
[409] I.H. Osman and C.N. Potts, Simulated annealing for permutation flow-shop
scheduling, OMEGA 17 (1989), 551-557.
[410] S.S. Panwalkar and R. Rajagopalan, Single-machine sequencing with control-
lable processing times, European Journal of Operational Research 59 (1992),
298-302.
[411] S.S. Panwalkar, M.L. Smith and C.P. Koulamas, A heuristic for the single
machine tardiness problem, European Journal of Operational Research 70
(1993), 304-310.
[412] S.S. Panwalkar, M.L. Smith and A. Seidmann, Common due date assignment
to minimize total penalty for the one machine sequencing problem, Operations
Research 30 (1982), 391-399.
[413] C.H. Papadimitriou, Computational Complexity, Addison-Wesley, 1994.
[414] C.H. Papadimitriou and P.C. Kanellakis, Flow shop scheduling with limited
temporary storage, Journal of the Association for Computing Machinery 27
(1980), 533-549.
[415] C.H. Papadimitriou and M. Yannakakis, Towards an architecture-indepen-
dent analysis of parallel algorithms, SIAM Journal on Computing 19 (1990),
322-328.
[416] E. Pesch, Machine learning by schedule decomposition, Working Paper, Fac-
ulty of Economics and Business Administration, University of Limburg, Maas-
tricht, The Netherlands, 1993.
[417] C. Phillips, C. Stein and J. Wein, Scheduling jobs that arrive over time,
Proceedings of the 4th Workshop on Algorithms and Data Structures (1995),
86-97. To appear in Mathematical Programming.
[418] C. Picouleau, Etude de problèmes dans les systèmes distribués, Ph.D. thesis,
Univ. Pierre et Marie Curie, Paris, France, 1992.
[419] C. Picouleau, New complexity results on scheduling with small communica-
tion delays, Discrete Applied Mathematics 60 (1995),331-342.
[420] M.E. Posner, Minimizing weighted completion times with deadlines, Opera-
tions Research 33 (1985), 562-574.
[421] C.N. Potts, The job-machine scheduling problem, Ph.D. thesis, University of
Birmingham, U.K., 1974.
[422] C.N. Potts, An algorithm for the single machine sequencing problem with
precedence constraints, Mathematical Programming Study 13 (1980), 78-87.
[423] C.N. Potts, An adaptive branching rule for the permutation flow-shop prob-
lem, European Journal of Operational Research 5 (1980), 19-25.
[424] C.N. Potts, Analysis of a heuristic for one machine sequencing with release
dates and delivery times, Operations Research 28 (1980), 1436-1441.
[425] C.N. Potts, A Lagrangean based branch and bound algorithm for single ma-
chine sequencing with precedence constraints to minimize total weighted com-
pletion time, Management Science 31 (1985), 1300-1311.
[426] C.N. Potts, Analysis of a linear programming heuristic for scheduling unre-
lated parallel machines, Discrete Applied Mathematics 10 (1985), 155-164.
[427] C.N. Potts, Analysis of heuristics for two-machine flow-shop sequencing sub-
ject to release dates, Mathematics of Operations Research 10 (1985), 576-584.
[428] C.N. Potts, S.V. Sevastianov, V.A. Strusevich, L.N. Van Wassenhove and
C.M. Zwaneveld, The two-stage assembly scheduling problem: complexity
and approximation, Operations Research 43 (1995), 346-355.
[429] C.N. Potts, D.B. Shmoys and D.P. Williamson, Permutation vs. non-
permutation flow shop schedules, Operations Research Letters 10 (1991), 281-
284.
[430] C.N. Potts and L.N. Van Wassenhove, A decomposition algorithm for the
single machine total tardiness problem, Operations Research Letters 1 (1982),
177-181.
[431] C.N. Potts and L.N. Van Wassenhove, An algorithm for single machine se-
quencing with deadlines to minimize total weighted completion time, Euro-
pean Journal of Operational Research 12 (1983), 379-387.
[432] C.N. Potts and L.N. Van Wassenhove, A branch and bound algorithm for the
total weighted tardiness problem, Operations Research 33 (1985), 363-377.
[433] C.N. Potts and L.N. Van Wassenhove, Dynamic programming and decompo-
sition approaches for the single machine total tardiness problem, European
Journal of Operational Research 32 (1987), 404-414.
[434] C.N. Potts and L.N. Van Wassenhove, Algorithms for scheduling a single
machine to minimize the weighted number of late jobs, Management Science
34 (1988), 843-858.
[435] C.N. Potts and L.N. Van Wassenhove, Single machine tardiness sequencing
heuristics, IIE Transactions 23 (1991), 346-354.
[436] C.N. Potts and L.N. Van Wassenhove, Integrating scheduling with batching
and lot-sizing: a review of algorithms and complexity, Journal of the Opera-
tional Research Society 43 (1992), 395-406.
[437] C.N. Potts and L.N. Van Wassenhove, Single machine scheduling to minimize
total late work, Operations Research 40 (1992), 586-595.
[438] C.N. Potts and L.N. Van Wassenhove, Approximation algorithms for schedul-
ing a single machine to minimize total late work, Operations Research Letters
11 (1992), 261-266.
[439] M. Queyranne, Personal communication cited in [233].
[440] M. Queyranne and A.S. Schulz, Polyhedral approaches to machine scheduling,
Preprint No. 408/1994, Department of Mathematics, Technical University of
Berlin, Berlin, Germany, 1994. To appear in Mathematical Programming.
[441] M. Queyranne and Y. Wang, Single-machine scheduling polyhedra with prece-
dence constraints, Mathematics of Operations Research 16 (1991), 1-20. Er-
ratum: Mathematics of Operations Research 20 (1995), 768.
[442] M. Queyranne and Y. Wang, A cutting-plane procedure for precedence-
constrained single-machine scheduling. Working paper, University of British
Columbia, Vancouver, Canada, 1991.
[443] V.J. Rayward-Smith, The complexity of preemptive scheduling given inter-
processor communication delays, Information Processing Letters 25 (1987),
123-125.
[444] V.J. Rayward-Smith, UET Scheduling with unit interprocessor communica-
tion delays, Discrete Applied Mathematics 18 (1987), 55-71.
[445] C.R. Reeves, Improving the efficiency of tabu search for machine sequencing
problems, Journal of the Operational Research Society 44 (1993), 375-382.
[446] C.R. Reeves, A genetic algorithm for flowshop sequencing, Computers and
Operations Research 22 (1995), 5-13.
[447] A.H.G. Rinnooy Kan, B.J. Lageweg and J.K. Lenstra, Minimizing total costs
in one-machine scheduling, Operations Research 23 (1975), 908-927.
[448] H. Röck, The three-machine no-wait flow shop is NP-complete, Journal of
the Association for Computing Machinery 31 (1984), 336-345.
[449] H. Röck, Some new results in flow shop scheduling, ZOR - Mathematical
Methods of Operations Research 28 (1984), 1-16.
[450] H. Röck and G. Schmidt, Machine aggregation heuristics in shop-scheduling,
Methods of Operations Research 45 (1983), 303-314.
[451] G. Rote and G.J. Woeginger, Time complexity and linear-time approximation
of the ancient two machine flow shop, Technical Report Woe-14, Department
of Mathematics, TU Graz, Graz, Austria, 1997.
[452] M.H. Rothkopf, Scheduling independent tasks on parallel processors, Man-
agement Science 12 (1966), 437-447.
[453] M.H. Rothkopf and S.A. Smith, There are no undiscovered priority index rules
for minimizing total delay costs, Operations Research 32 (1984),451-456.
[454] F.M. Ruiz Diaz and S. French, A note on SPT scheduling of a single machine
with controllable processing times, Note 154, Department of Decision Theory,
University of Manchester, Manchester, UK, 1984.
[455] R.M. Russell and J.E. Holsenback, Evaluation of leading heuristics for the
single machine tardiness problem, European Journal of Operational Research
96 (1997), 538-545.
[456] R.M. Russell and J.E. Holsenback, Evaluation of greedy, myopic and less-
greedy heuristics for the single machine total tardiness problem, Journal of
the Operational Research Society 48 (1997),640-646.
[457] S. Sahni, Algorithms for scheduling independent tasks, Journal of the Asso-
ciation for Computing Machinery 23 (1976), 116-127.
[458] S. Sahni, Preemptive scheduling with due dates, Operations Research 27
(1979), 925-934.
[459] S. Sahni and Y. Cho, Nearly on line scheduling of a uniform processor system
with release times, SIAM Journal on Computing 8 (1979), 275-285.
[460] S. Sahni and Y. Cho, Complexity of scheduling shops with no wait in process,
Mathematics of Operations Research 4 (1979), 448-457.
[461] S. Sahni and Y. Cho, Scheduling independent tasks with due times on a uni-
form processor system, Journal of the Association for Computing Machinery
27 (1980), 550-563.
[462] S.C. Sarin, S. Ahn and A.B. Bishop, An improved branching scheme for the
branch and bound procedure of scheduling n jobs on m parallel machines to
minimize total weighted flowtime, International Journal of Production Re-
search 26 (1988), 1183-1191.
[463] J.P. Schmidt, A. Siegel and A. Srinivasan, Chernoff-Hoeffding bounds for ap-
plications with limited independence, SIAM Journal on Discrete Mathematics
8 (1995), 223-250.
[464] L. Schrage, A proof of the shortest remaining processing time processing
discipline, Operations Research 16 (1968), 687-690.
[465] L. Schrage, Solving resource-constrained network problems by implicit
enumeration-nonpreemptive case, Operations Research 18 (1970), 263-278.
[466] L. Schrage, Obtaining optimal solutions to resource constrained network
scheduling problems, Unpublished manuscript, 1971.
[467] L. Schrage and K.R. Baker, Dynamic programming solution of sequencing
problems with precedence constraints, Operations Research 26 (1978), 444-
449.
[468] A.S. Schulz, Scheduling to minimize total weighted completion time: Perfor-
mance guarantees of LP-based heuristics and lower bounds, Proceedings of
the 5th IPCO Conference (1996), 301-315.
[469] J.M.J. Schutten, S.L. van de Velde and W.H.M. Zijm, Single-machine schedul-
ing with release dates, due dates and family setup times, Management Science
42 (1996), 1165-1174.
[470] P. Schuurman and G.J. Woeginger, A polynomial time approximation scheme
for the two-stage multiprocessor flow shop problem, Technical Report Woe-
01, Department of Mathematics, TU Graz, Graz, Austria, 1997. To appear
in Theoretical Computer Science.
[471] P. Schuurman and G.J. Woeginger, Approximation algorithms for the mul-
tiprocessor open shop problem, Technical Report Woe-13, Department of
Mathematics, TU Graz, Graz, Austria, 1997.
[472] R. Sethi, On the complexity of mean flow time scheduling, Mathematics of
Operations Research 2 (1977),320-330.
[473] S.V. Sevastianov, Approximation algorithms for Johnson's and vector sum-
mation problems (in Russian), Upravlyaemye Sistemy 20 (1980), 64-73.
[474] S.V. Sevastianov, Some generalizations of the Johnson problem (in Russian),
Upravlyaemye Sistemy 21 (1981), 45-61.
[475] S.V. Sevastianov, Algorithms with estimates for the Johnson's and Akers-
Friedman problems in the case of three machines (in Russian), Upravlyaemye
Sistemy 22 (1982), 51-57.
[476] S.V. Sevastianov, On some geometric methods in scheduling theory: a survey,
Discrete Applied Mathematics 55 (1994), 59-82.
[477] S.V. Sevastianov, Vector summation in Banach space and polynomial time al-
gorithms for flow shops and open shops, Mathematics of Operations Research
20 (1995), 90-103.
[478] S.V. Sevastianov, Nonstrict vector summation in multi-operation scheduling,
Technical Report CaSaR 95-37, Department of Mathematics and Comput-
ing, TU Eindhoven, Eindhoven, The Netherlands, 1995.
[479] S.V. Sevastianov and G.J. Woeginger, Makespan minimization in preemptive
two machine job shops, Computing 60 (1998), 73-79.
[480] S.V. Sevastianov and G.J. Woeginger, Makespan minimization in open
shops: a polynomial time approximation scheme, Mathematical Program-
ming (1998), to appear.
[481] J. Sgall, On-line scheduling, in A. Fiat and G.J. Woeginger (eds.) On-line
Algorithms: The State of the Art, Springer, 1998.
[482] N.V. Shaklevich, J.A. Hoogeveen and M. Pinedo, Minimizing total weighted
completion time in a proportionate flow shop, Technical Report COSOR 96-
03, Department of Mathematics and Computing, TU Eindhoven, Eindhoven,
The Netherlands, 1996.
[483] N.V. Shaklevich and Y.N. Sotskov, Scheduling two jobs with fixed and non-
fixed routes, Computing 52 (1994), 17-30.
[484] N.V. Shaklevich and V.A. Strusevich, Two machine open shop scheduling
problem to minimize an arbitrary machine usage regular penalty function,
European Journal of Operational Research 70 (1993), 391-404.
[485] A.H. Sharary and N. Zaguia, Minimizing the number of tardy jobs in single
machine sequencing, Discrete Applied Mathematics 117 (1993), 215-223.
[486] D.B. Shmoys, C. Stein and J. Wein, Improved approximation algorithms for
shop scheduling problems, SIAM Journal on Computing 23 (1994), 617-632.
[487] D.B. Shmoys and E. Tardos, An approximation algorithm for the generalized
assignment problem, Mathematical Programming 62 (1993),461-474.
[488] D.B. Shmoys, J. Wein and D.P. Williamson, Scheduling parallel machines
on-line, SIAM Journal on Computing 24 (1995), 1313-1331.
[489] J. Shwimer, On the N-jobs, one machine, sequence-independent scheduling
problem with penalties: A branch-and-bound solution, Management Science
18B (1972), 301-313.
[490] J.B. Sidney, An extension of Moore's due date algorithm, in S.E. Elmaghraby
(ed.) Symposium on the Theory of Scheduling and its Applications, Lecture
Notes in Economics and Mathematical Systems, Volume 86, Springer, Berlin,
1973, 393-398.
[491] J.B. Sidney, Decomposition algorithms for single-machine sequencing with
precedence relations and deferral costs, Operations Research 23 (1975), 283-
298.
[492] J.B. Sidney, Optimal single-machine scheduling with earliness and tardiness
penalties, Operations Research 25 (1977),62-69.
[493] B.B. Simons, A fast algorithm for single processor scheduling, Proceedings
of the 19th IEEE Symposium on Foundations of Computer Science (1978),
246-252.
[494] S.P. Smith, An experiment on using genetic algorithms to learn scheduling
heuristics, Proceedings of the SPIE-Conference on Applications of Artificial
Intelligence X: Knowledge-Based Systems, SPIE-The International Society
for Optical Engineering, Bellingham, WA, Volume 1707, 1992, 378-386
[495] W.E. Smith, Various optimizers for single-stage production, Naval Research
Logistics Quarterly 3 (1956), 59-66.
[496] Y.N. Sotskov, The complexity of shop-scheduling problems with two or three
jobs, European Journal of Operational Research 53 (1991),326-336.
[497] Y.N. Sotskov and N.V. Shaklevich, NP-hardness of shop-scheduling problems
with three jobs, Discrete Applied Mathematics 59 (1995), 237-266.
[498] Y.N. Sotskov, T. Tautenhahn and F. Werner, Heuristics for permutation flow
shop scheduling with batch setup times, OR Spektrum 18 (1996), 67-80.
[499] J.P. Sousa and L.A. Wolsey, A time indexed formulation of non-preemptive
single machine scheduling problems, Mathematical Programming 54 (1992), 353-367.
[500] C. Stein and J. Wein, On the existence of schedules that are near-optimal
for both makespan and total weighted completion time, Operations Research
Letters 21 (1997), 115-122.
[501] A. Steinberg, A strip-packing algorithm with absolute performance bound
two, SIAM Journal on Computing 26 (1997),401-409.
[502] G. Steiner, Minimizing the number of tardy jobs with precedence constraints
and agreeable due dates. Discrete Applied Mathematics 72 (1997), 167-177.
[503] E. Steinitz, Bedingt konvergente Reihen und konvexe Systeme, Journal für
die Reine und Angewandte Mathematik 143 (1913), 128-175.
[504] J.P. Stinson, E.W. Davis and B.M. Khumawala, Multiple resource-con-
strained scheduling using branch-and-bound, AIIE Transactions 10 (1978), 252-259.
[505] R.H. Storer, S.D. Wu and R. Vaccari, New search spaces for sequencing prob-
lems with application to job shop scheduling, Management Science 38 (1992),
1495-1509.
[506] L. Stougie, Personal communication (1995), cited in [547].
[507] V.A. Strusevich, The two-machine super-shop scheduling problem, Journal
of the Operational Research Society 42 (1991),479-492.
[508] V.A. Strusevich, Shop scheduling problems under precedence constraints, An-
nals of Operations Research 69 (1997), 351-377.
[509] W. Szwarc, Elimination methods in the m X n sequencing problem, Naval
Research Logistics Quarterly 18 (1971), 295-305.
[510] W. Szwarc, Optimal elimination methods in the m x n sequencing problem,
Operations Research 21 (1973), 1250-1259.
[511] W. Szwarc, Dominance conditions for the three-machine flow-shop problem
Operations Research 26 (1978), 203-206.
[512] W. Szwarc, Single-machine scheduling to minimize absolute deviation of com-
pletion times from a common due date, Naval Research Logistics 36 (1989),
663-673.
[513] W. Szwarc, Parametric precedence relations in single machine scheduling,
Operations Research Letters 9 (1990), 133-140.
[514] W. Szwarc, Single machine total tardiness problem revisited, in Y. Ijiri (ed.)
Creative and Innovative Approaches to the Science of Management, Quorum
Books, 1993,407-417.
[515] W. Szwarc, Adjacent orderings in single-machine scheduling with earliness
and tardiness penalties, Naval Research Logistics 40 (1993), 229-243.
[516] W. Szwarc and S.K. Mukhopadhyay, Minimizing a quadratic cost function
of waiting times in single-machine scheduling, Journal of the Operational
Research Society 46 (1995), 753-761.
[517] W. Szwarc and S.K. Mukhopadhyay, Decomposition of the single-machine
total tardiness problem, Operations Research Letters 19 (1996), 243-250.
[518] W. Szwarc, M.E. Posner and J.J. Liu, The single machine problem with a
quadratic cost function of completion times, Management Science 34 (1988), 1480-1488.
[519] R. Tadei, J.N.D. Gupta, F. Della Croce and M. Cortesi, Minimizing makespan
in the two-machine flow-shop with release times, Journal of the Operational
Research Society 49 (1998), 77-85.
[520] E. Taillard, Some efficient heuristic methods for the flow shop sequencing
problem, European Journal of Operational Research 47 (1990),65-74.
[521] E. Taillard, Benchmarks for basic scheduling problems. European Journal of
Operational Research 64 (1993), 278-285.
[522] E. Taillard, Parallel taboo search techniques for the job shop scheduling prob-
lem, ORSA Journal on Computing 6 (1994), 108-117.
[523] F.B. Talbot and J.H. Patterson, An efficient integer programming algorithm
with network cuts for solving resource-constrained scheduling problems. Man-
agement Science 24 (1978), 1163-1174.
[524] V.S. Tanaev, V.S. Gordon and Y.M. Shafransky, Scheduling Theory: Single-
Stage System, Kluwer Academic Publishers, Dordrecht, 1994.
[525] V.S. Tanaev, Y.N. Sotskov and V.A. Strusevich, Scheduling Theory: Multi-
Stage Systems, Kluwer Academic Publishers, Dordrecht, 1994.
[526] B.C. Tansel and I. Sabuncuoglu, New insights on the single machine total
tardiness problem, Journal of the Operational Research Society 48 (1997),
82-89.
[527] W. Townsend, Minimizing the maximum penalty in the two-machine flow
shop, Management Science 24 (1977), 230-234.
[528] W. Townsend, The single machine problem with quadratic penalty function
of completion times: A branch-and-bound solution, Management Science 24
(1978), 530-534.
[529] A.V. Tuzikov, A bi-criteria scheduling problem subject to variation of pro-
cessing times, Zhurnal Vychistlitel'noj Matematiki i Matematicheskoj Fiziki 24 (1984), 1585-1590.
[530] J.D. Ullman, NP-Complete scheduling problems, Journal of Computing and
System Sciences 10 (1975), 384-393.
[531] J.D. Ullman, Complexity of sequencing problems, in E.G. Coffman, Jr., (ed.)
Computer and Job-Shop Scheduling Theory, Wiley, New York, 1976, 139-164.
[532] R.J.M. Vaessens, E.H.L. Aarts and J.K. Lenstra, Job shop scheduling by local
search, INFORMS Journal on Computing 8 (1996),302-317.
[533] J.M. van den Akker, LP-Based Solution Methods for Single-Machine Schedul-
ing Problems, Ph.D. thesis, Department of Mathematics and Computing
Science, Eindhoven University of Technology, Eindhoven, The Netherlands,
1994.
[534] J.M. van den Akker, J.A. Hoogeveen and S.L. van de Velde, Parallel ma-
chine scheduling by column generation, Report COSOR 95-35, Department
of Mathematics and Computer Science, Eindhoven University of Technology,
Eindhoven, The Netherlands, 1995.
[535] J.M. van den Akker, J.A. Hoogeveen and S.L. van de Velde, A column gen-
eration algorithm for common due date scheduling, Report, Department of
Mathematics and Computer Science, Eindhoven University of Technology,
Eindhoven, The Netherlands, 1998.
[536] S.L. van de Velde, Minimizing the sum of job completion times in the two-
machine flow shop by Lagrangian relaxation, Annals of Operations Research
26 (1990), 257-268.
[537] S.L. van de Velde, Duality-based algorithms for scheduling unrelated parallel
machines, ORSA Journal on Computing 5 (1993), 192-205.
[538] S.L. van de Velde, Duality decomposition of a single-machine scheduling prob-
lem. Mathematical Programming 69 (1995),413-428.
[539] P.J.M. van Laarhoven, E.H.L. Aarts and J.K. Lenstra, Job shop scheduling
by simulated annealing, Operations Research 40 (1992), 113-125.
[540] L.N. Van Wassenhove and K.R. Baker, A bicriterion approach to time/cost
tradeoffs in sequencing, European Journal of Operational Research 11 (1982),
48-54.
[541] L.N. Van Wassenhove and L.F. Gelders, Solving a bicriterion scheduling prob-
lem, European Journal of Operational Research 4 (1980), 42-48.
[542] T.A. Varvarigou, V.P. Roychowdhury, T. Kailath and E.L. Lawler, Scheduling
in and out forests in the presence of communication delays, IEEE Transactions
on Parallel and Distributed Systems 7 (1996), 1065-1074.
[543] B. Veltman, Multiprocessor scheduling with communication delays, Ph.D.
thesis, CWI, Amsterdam, The Netherlands, 1993.
[544] B. Veltman, B.J. Lageweg and J.K. Lenstra, Multiprocessor scheduling with
communication delays, Parallel Computing 16 (1990), 173-182.
[545] J.A. Ventura and M.X. Weng, Minimizing single-machine completion time
variance, Management Science 41 (1995), 1448-1455.
[546] S. Verma and M. Dessouky, Single-machine scheduling of unit-time jobs with
earliness and tardiness penalties, Mathematics of Operations Research, to
appear.
[547] A.P.A. Vestjens, On-Line Machine Scheduling, Ph.D. thesis, Department of
Mathematics and Computing Science, Eindhoven University of Technology,
Eindhoven, The Netherlands, 1997.
[548] R.G. Vickson, Two single machine sequencing problems involving controllable
job processing times, AIIE Transactions 12 (1980), 258-262.
[549] R.G. Vickson, Choosing the job sequence and processing times to minimize
total processing plus flow cost on a single machine, Operations Research 28
(1980), 1155-1167.
[550] F.J. Villarreal and R.L. Bulfin, Scheduling a single machine to minimize the
weighted number of tardy jobs, IIE Transactions 15 (1983), 337-343.
[551] S.T. Webster, New bounds for the identical parallel processor weighted flow
time problem, Management Science 38 (1992), 124-136.
[552] S.T. Webster, Weighted flow time bounds for scheduling identical processors,
European Journal of Operational Research 80 (1995), 103-111.
[553] S.T. Webster, The complexity of scheduling job families about a common due
date, Operations Research Letters 20 (1997), 65-74.
[554] S.T. Webster and K.R. Baker, Scheduling groups of jobs on a single machine,
Operations Research 43 (1995),692-703.
[555] M.X. Weng and J.A. Ventura, Scheduling about a large common due date
with tolerance to minimize absolute deviation in completion times, Naval
Research Logistics 41 (1994), 843-851.
[556] F. Werner, On the heuristic solution of the permutation flow shop problem
by path algorithms, Computers and Operations Research 20 (1993), 707-722.
[557] M. Widmer and A. Hertz, A new heuristic method for the flow shop sequenc-
ing problem, European Journal of Operational Research 41 (1989), 186-193.
[558] L.J. Wilkerson and J.D. Irwin, An improved method for scheduling indepen-
dent tasks, AIIE Transactions 3 (1971), 239-245.
[559] D.P. Williamson, L.A. Hall, J.A. Hoogeveen, C.A.J. Hurkens, J.K. Lenstra,
S.V. Sevastianov and D.B. Shmoys, Short shop schedules, Operations Re-
search 45 (1997), 288-294.
[560] D.A. Wismer, Solution of the flow shop scheduling problem with no interme-
diate queues, Operations Research 20 (1972), 689-697.
[561] G.J. Woeginger, An approximation scheme for minimizing agreeably weighted
variance on a single machine, Technical Report Woe-21, Department of Math-
ematics, TU Graz, Graz, Austria, 1998.
[562] G.J. Woeginger, When does a dynamic programming formulation guaran-
tee the existence of an FPTAS?, Technical Report Woe-27, Department of
Mathematics, TU Graz, Graz, Austria, 1998.
[563] D.L. Woodruff and M.L. Spearman, Sequencing and batching for two classes
of jobs with deadlines and setup times, Production and Operations Manage-
ment 1 (1992), 87-102.
[564] T. Yamada and R. Nakano, A genetic algorithm applicable to large-scale
job-shop problems, in R. Manner and B. Manderick (eds.) Parallel Problem
Solving from Nature, Vol.2, Elsevier, Amsterdam, 1992, 281-290.
[565] T. Yamada, B.E. Rosen and R. Nakano, A simulated annealing approach
to job shop scheduling using critical block transition operators, Presented
at IEEE World Congress of Computational Intelligence, Orlando, FL, USA,
1994.
[566] C.A. Yano and Y.-D. Kim, Algorithms for a class of single-machine weighted
tardiness and earliness problems, European Journal of Operational Research
52 (1991), 167-178.
[567] W. Yu, An approximation algorithm for the total tardiness problem (in Chi-
nese), Journal of East China University of Science & Technology 18 (1992),
671-677.
[568] W. Yu, Augmentations of consistent partial orders for the one-machine total
tardiness problem, Discrete Applied Mathematics 68 (1996), 189-202.
[569] W. Yu and Z. Liu, The performance ratio of the time forward algorithm for
the total tardiness problem (in Chinese), Chinese Transaction of Operations
Research 1 (1997), 89-96.
[570] W. Yu and M. Yu, Key position method for total tardiness problem (in Chi-
nese), Chinese Journal of Operations Research 14 (1995), 11-17.
[571] M. Yue, On the exact upper bound for the multifit processor scheduling
algorithms, Annals of Operations Research 24 (1990), 233-259.
[572] S. Zdrzalka, Scheduling jobs on a single machine with release dates, deliv-
ery times, and controllable processing times: worst-case analysis, Operations
Research Letters 10 (1991), 519-532.
[573] S. Zdrzalka, Analysis of approximation algorithms for single-machine schedul-
ing with delivery times and sequence independent batch setup times, Euro-
pean Journal of Operational Research 80 (1995), 371-380.
[574] S. Zdrzalka and J. Grabowski, An algorithm for single machine sequencing
with release dates to minimize maximum cost, Discrete Applied Mathematics
23 (1989), 73-89.
HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 171-239
©1998 Kluwer Academic Publishers

Routing and Topology Embedding in Lightwave Networks

Feng Cao
4-192 EE/CS Building, 200 Union Street SE
Department of Computer Science
University of Minnesota, Minneapolis, MN 55455
E-mail: cao@cs.umn.edu

Contents

1 Introduction 172
  1.1 Lightwave Networks 173
  1.2 Transmission Schedule and Virtual Topology Embedding 175
  1.3 Reliable Routing Analysis 177
  1.4 Limited Tuning Range For Tunable Transceivers 178

2 Transmission Schedule and Topology Embedding 180
  2.1 Introduction 181
  2.2 General Results 185
  2.3 Directed de Bruijn Graphs 189
  2.4 Undirected de Bruijn Graphs 196
  2.5 Summary 204

3 Reliable Routing Analysis 205
  3.1 Introduction 205
  3.2 Concepts for POPS Networks 206
  3.3 Connectivity of POPS 208
  3.4 Detection Approaches 214
  3.5 Summary 215

4 Limited Tuning Range For Tunable Transceivers 215
  4.1 Introduction 216
  4.2 Complete Graph 218
  4.3 Rings 224
  4.4 Meshes 226
  4.5 Hypercubes 229
  4.6 Conclusion 233

References

1 Introduction
The need for high-speed networks, for applications incorporating high-performance distributed computing, multimedia communication and real-time network services, has provided the impetus for the study of optical networks. Wavelength Division Multiplexing (WDM) has been used widely for studying the throughput performance of optical networks. We studied WDM lightwave networks with tunable transceivers, including designs for lightwave networks with limited tuning ranges for transceivers.
Many research issues can be transformed into combinatorial optimization problems. With the help of techniques from combinatorial optimization, such as graph theory, matrix theory and integer programming, we can design routing algorithms, construct optimal transmission schedules, and embed virtual topologies into physical infrastructures to support high-performance distributed computing.
Routing and virtual topology embeddings are needed to support high-performance distributed computing. How to design the transmission schedule depends on how the virtual topology is embedded in the physical lightwave network. We developed general graph-theoretic results and algorithms and, using these, built optimal embeddings and optimal transmission schedules for de Bruijn graphs and undirected de Bruijn graphs, assuming certain conditions on the network parameters. We proved that our transmission schedules are optimal over all possible embeddings.
Reliable routing is important for supporting real-time applications on
high speed networks. We introduce some concepts proposed for reliable
routing, and demonstrate how to study this problem in lightwave networks.
Partitioned Optical Passive Stars (POPS) topology is a physical architecture to scale up local optical passive star networks. The POPS data channel can be efficiently utilized for random permutation-based communication patterns. Reliability is important for such a scaled-up network. We analyzed the fault-tolerant routing properties of POPS networks. We demonstrated some worst cases due to link errors, and a lower bound on the connectivity is obtained [12]. Some sufficient approaches were proposed to detect and maintain connectivity of the whole system.
The current technology only allows the transceivers to be tunable in a
small range, a fact ignored in previous studies. We focused on the design
of WDM optical passive star networks with tunable transmitters of limited
tuning range and fixed wavelength receivers. The limited tuning range has significant effects on the maximum delay, the total number of wavelengths that can be used, and the topological embedding. Different network topologies
were analyzed in our study. The relationship between the total number of
wavelengths which can be utilized and the embedded topology is established.
The optimal embedding algorithms are given for the systems embedded with
different virtual topologies.

1.1 Lightwave Networks


The need for high-speed networks, for applications incorporating high-performance parallel computing, multimedia communication and real-time network services, has provided the impetus for the study of optical networks. Optical fibers are attractive for their high bandwidth, immunity to electromagnetic radiation, and security. The available bandwidth of optical fibers is up to 30 THz. Optical fibers are not only the medium of choice for wide area networks, but are also being deployed extensively in local area environments. However, while optical networks present new possibilities for high speed networks, they also present new network design problems that must be surmounted before they become practical.
Wavelength Division Multiplexing (WDM) has been used widely for studying the throughput performance of optical networks [4, 10, 22, 23, 43, 48, 49, 54, 57, 58], especially those employing optical passive star couplers [7, 11, 31, 40, 41, 59]. The reason is that electronically operated interface devices cannot match the speed of the optical transmission media. WDM is used to divide the total available bandwidth into different wavelengths, each running at a speed compatible with electronic devices. Different wavelengths are used for transmission in the optical fiber simultaneously, with enough spacing between adjacent wavelengths. This makes WDM attractive for optical networks since it alleviates the optical-to-electronic bottleneck. In this study, we use the terms wavelength and channel interchangeably.
Time Division Multiplexing (TDM) is another approach, which uses time slots to assign the bandwidth to different applications in optical networks. Time Wavelength Division Multiplexing (TWDM) is the mix of TDM and WDM, and it is receiving more and more attention [4, 43].

Figure 1: An N-station Optical Passive Star Network (stations s0, ..., sN-1 connected through a central passive star coupler).

An optical passive star coupler is one of the most common ways to interconnect an optical network via optical fibers [17, 19, 21, 25, 26, 31, 32, 35, 36, 37, 55, 56]. Many optical networks are based on star couplers [27]. Figure 1 shows an optical passive star network with N stations. Each station sends its signal to the passive star coupler through its transmitter on a specific channel. In the passive star coupler, all the signals are combined and broadcast to all the stations. The signals in a certain channel can be received by the stations whose receivers occupy the corresponding channel. There are a number of advantages to using an optical passive star network. First, it is very simple and completely passive. Second, its scalability is very good up to a certain number of stations. Lastly, optical passive stars offer flexibility of topological embeddings. Different topological embeddings are required for high-performance distributed computing and application-topology-dependent communications.
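To make the broadcast-and-select behavior described above concrete, the following minimal Python sketch models one time slot of a passive star network; the station names and channel numbers are illustrative assumptions, not from the text. Every signal placed on a channel is broadcast by the coupler, exactly the stations whose fixed receivers sit on that channel pick it up, and two simultaneous transmissions on one channel are treated as a collision.

# Minimal sketch of one slot of a broadcast-and-select passive star.
# Station names and channel numbers are illustrative assumptions.

def passive_star_deliver(transmissions, rx_channel):
    """transmissions: list of (sender, channel, packet) for this slot.
    rx_channel: dict mapping station -> fixed receiver channel.
    Returns {station: packets received}.  Two packets on the same
    channel in the same slot collide, so that channel delivers nothing."""
    on_channel = {}
    for sender, ch, packet in transmissions:
        on_channel.setdefault(ch, []).append((sender, packet))

    received = {station: [] for station in rx_channel}
    for ch, pkts in on_channel.items():
        if len(pkts) != 1:            # collision: channel unusable this slot
            continue
        sender, packet = pkts[0]
        for station, r in rx_channel.items():
            if r == ch and station != sender:
                received[station].append(packet)
    return received

# Stations s0..s3 with receivers fixed on channels 0, 0, 1, 2:
rx = {"s0": 0, "s1": 0, "s2": 1, "s3": 2}
print(passive_star_deliver([("s2", 0, "hello")], rx))
# s0 and s1 both receive "hello"; s2 and s3 receive nothing.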
An issue in an optical passive star network is whether the network is
single-hop or multi-hop. In a single-hop architecture [51], fast tunable
transceivers are employed and the wavelength assignments are performed
on a per-packet basis. When the sending station knows the wavelength oc-
cupied by the receiver of the receiving station, the sending station need only
tune its transmitter to the receiver's wavelength and send the message. The
transmitting station thus needs one hop to communicate with any receiving
station. In a multi-hop architecture [38, 39, 48, 49], fixed or slowly-tunable
transceivers are employed. Because the sending station may not be able
to tune its transmitters to the wavelength occupied by the receiver of the
receiving station directly, the sending station may need several hops to com-
municate with one receiving station. There are tradeoffs between single-hop
and multi-hop architectures. For example, single-hop architectures intro-
duce more overhead for each packet due to transmitter-receiver coordination
and the tuning time, while multi-hop architectures introduce higher average
end-to-end delay.

1.2 Transmission Schedule and Virtual Topology Embedding


To support high performance distributed computing, transmission schedules
and virtual topology embeddings must be provided to efficiently utilize the
high bandwidth of optical fibers. The embedding of a virtual topology and the transmission schedule on the resulting embedded network are closely related. How to design the transmission schedule
depends on how the virtual topology is embedded in the physical lightwave
network. Different embeddings of the virtual topology will induce different
transmission schedules.
We want to embed a virtual network topology, or graph, in a physical
network. This entails mapping the virtual vertices of the topology to the
physical nodes of the network. Given this embedding, we then want to sched-
ule transmissions in a repeating cycle, so that during each cycle a packet is
transmitted along each edge of the virtual topology. This transmission cycle is called all-to-neighbor transmission, because each node/station in the system has a dedicated slot in which to talk to each of its neighbors.
All-to-neighbor transmission is one of the most common communication patterns in distributed computing, parallel computing, network protocols, and network operation and maintenance (OAM). In distributed computing, if one-to-all broadcasting is done from one node to all other nodes along a spanning tree, it is actually an all-to-neighbor transmission for a directed tree (this tree is often called top-down, since the transmission starts from the root and passes down). Similarly, if all-to-one reduction is done along a spanning tree for distributed computing, it is an all-to-neighbor transmission for a reverse directed tree (this tree is often called bottom-up, since the transmission starts from the leaf nodes and passes up). In many communication protocols, flow specifications and information about the availability of resources are needed for better integrated network services. An efficient all-to-neighbor transmission can improve the performance of communication protocols; examples include teleconferencing and Available-Bit-Rate Asynchronous Transfer Mode (ABR ATM). In network operation and maintenance, each station or router wants to know the status of each of its neighbors and to send out its own status and requirements to its neighbors. All-to-neighbor transmission can be used to exchange this information and reduce the bandwidth required for such operations.

Figure 2: An example of all-to-neighbor transmission (one-to-all broadcasting along a top-down tree, and all-to-one reduction along a bottom-up tree).

How to design an all-to-neighbor transmission is especially important for high speed networks, and for fiber optical networks in particular. If we want to take advantage of the huge bandwidth of fiber channels, the transmission cycle for any kind of communication pattern should be finished as soon as possible in a TWDM lightwave network. That means the number of time slots in an all-to-neighbor transmission cycle should be as small as possible. This makes bandwidth available for other applications or another round of all-to-neighbor transmission.
The first objective of our study is to find the optimal transmission sched-
ule, i.e. the minimum number of time slots for each cycle. The second
objective is to minimize the number of tuning times of tunable transmitters
during each cycle. The minimum number of time slots for each cycle means the best utilization of the lightwave network's bandwidth. A smaller number of tuning times may reduce the complexity of the transmission schedule. The
number of time slots and the tuning times are integers. Therefore, both
objectives are combinatorial optimization problems.
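As a rough illustration of why these objectives are combinatorial, the following Python sketch greedily packs the edges of a virtual topology into time slots under two collision constraints: a station transmits at most once per slot, and a wavelength carries at most one packet per slot. This is only a greedy baseline under an assumed assignment of fixed receiver wavelengths; it is not the optimal schedules constructed in Section 2.

# Greedy sketch of an all-to-neighbor TDM schedule (a baseline only,
# not the optimal schedules of Section 2).  Assumption: one tunable
# transmitter and one fixed receiver per station; rx_channel gives
# each station's receiver wavelength.

def greedy_schedule(edges, rx_channel):
    """edges: directed virtual-topology arcs (u, v); one packet must
    cross each arc per cycle.  Returns {slot: [(u, v), ...]} such that
    in every slot no station transmits twice and no wavelength carries
    two packets (which would collide at the star coupler)."""
    slots = []
    for u, v in edges:
        ch = rx_channel[v]
        for slot in slots:
            if all(u != a and ch != rx_channel[b] for a, b in slot):
                slot.append((u, v))
                break
        else:
            slots.append([(u, v)])    # no compatible slot: open a new one
    return dict(enumerate(slots))

# Directed 4-cycle s0->s1->s2->s3->s0 with receivers on channels 0,1,0,1:
print(greedy_schedule([(0, 1), (1, 2), (2, 3), (3, 0)],
                      rx_channel={0: 0, 1: 1, 2: 0, 3: 1}))
# {0: [(0, 1), (1, 2)], 1: [(2, 3), (3, 0)]} -- a two-slot cycle.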
There is previous work related to lightwave networks embedded with de Bruijn graphs or other topologies. Sivarajan and Ramaswami [54] studied the throughput and delay performance of de Bruijn graphs as logical topologies in lightwave networks under different routing schemes. The problem of finding a transmission schedule whose cycle has the minimum number of time slots has already been considered for the complete graph [41] and the hypercube [40] virtual topologies. Many other papers consider similar problems [4, 7, 22, 43, 58, 59]. We study this problem for lightwave networks embedded with several virtual topologies, especially de Bruijn graphs and undirected de Bruijn graphs.


In Section 2, we develop general graph-theoretic results and algorithms and, using these, build optimal embeddings and optimal transmission schedules for de Bruijn graphs and undirected de Bruijn graphs, assuming certain conditions on the network parameters. We prove our transmission schedules are optimal over all possible embeddings. This is the first work to study the number of tuning times of tunable devices in a transmission schedule.
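Since the rest of the chapter works extensively with de Bruijn graphs, a short sketch of the standard shift-register construction of the directed de Bruijn graph B(d, n) may be useful; the code below is the textbook definition, not an algorithm from Section 2.

# Sketch: arcs of the directed de Bruijn graph B(d, n).  Vertices are
# length-n strings over {0, ..., d-1}; there is an arc from x1 x2 ... xn
# to x2 ... xn y for every symbol y (the standard shift definition).
from itertools import product

def de_bruijn_arcs(d, n):
    vertices = ["".join(map(str, t)) for t in product(range(d), repeat=n)]
    return [(v, v[1:] + str(y)) for v in vertices for y in range(d)]

arcs = de_bruijn_arcs(2, 3)               # B(2,3): 8 vertices, 16 arcs
print(len(arcs), arcs[:4])
# 16 [('000', '000'), ('000', '001'), ('001', '010'), ('001', '011')]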

1.3 Reliable Routing Analysis


In the design of interconnected networks, one of the most important top-
ics is reliability. Fault-tolerant routing is an inevitable issue for supporting
network services. People are trying to reduce communication delay even
in faulty environments. One example is to provide real-time services for
control, command and communication. Fault-tolerant routing becomes critical for ensuring that time constraints are satisfied across computer networks. In this study, we will focus on fault-tolerant routing in local computer networks. The same ideas can also be applied to wide area networks, such as providing efficient services over the Internet.
The typical approach to study routing in computer networks is to try to
find the shortest path between the sending station and the receiving station.
Whenever some stations are faulty on the path between the sending station
and the receiving station, the management protocol has to find a way to
bypass those faulty stations and set up a new path between them. Similarly, if this new path is disconnected again, a third path needs to be set up if possible (that is, if the network is still connected and there still exists a path between the sending station and the receiving station).
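The rerouting step just described can be made concrete with a small sketch: a breadth-first search for a shortest path that bypasses the stations currently known to be faulty. The ring topology and fault set below are illustrative assumptions, not from the text.

# Sketch of the rerouting step: shortest path avoiding known faults.
from collections import deque

def shortest_path_avoiding(adj, src, dst, faulty):
    """adj: {station: set of neighbouring stations}.  Returns a shortest
    src -> dst path that avoids `faulty`, or None if none exists."""
    if src in faulty or dst in faulty:
        return None
    parent, frontier = {src: None}, deque([src])
    while frontier:
        u = frontier.popleft()
        if u == dst:                  # walk back to recover the path
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and v not in faulty:
                parent[v] = u
                frontier.append(v)
    return None                       # src and dst are disconnected

# Illustrative 4-station ring with station 1 faulty:
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(shortest_path_avoiding(ring, 0, 2, faulty={1}))    # [0, 3, 2]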
We study reliability in the all-optical Partitioned Optical Passive Stars (POPS) topology. POPS is a physical architecture that scales up traditional optical passive star couplers while exploiting their advantages of high noise immunity and single-hop communication. POPS is a non-hierarchical structure that connects several optical passive star couplers together with the help of some intermediate optical passive star couplers. It retains the properties of high noise immunity, single-hop communication, and no intermediate electronic/optical conversions. POPS is a design that sidesteps power budget problems and provides a flexible way to extend optical passive star networks while keeping the system simple.
It was shown in [33] that the POPS data channel can be efficiently utilized for random permutation-based communication patterns. In most applications in network computing and parallel computing, some common communication patterns are widely used. For example, all-to-all personalized communication is the common way to globally exchange information. Global reduction or global broadcasting is the pattern used to collect data from all slaves to the master in the master-slave model. In [33], four common communication patterns in TDM POPS networks were studied, which shows that POPS networks support distributed and network computing well.
Since POPS is a scaled-up topology, fault-tolerant routing is very important for it. In what situations is the whole system partitioned into several disconnected components? Is there an optimal way to determine and fix network partitions? There were no previous results about the connectivity of POPS. Both station errors and link errors are discussed with respect to network partitions in Section 3. Some worst cases of disconnected POPS networks due to link errors are analyzed to obtain lower bounds on the connectivity. On the other hand, several efficient approaches are proposed to optimize the detection of network connectivity; these approaches can also be used to repair a disconnected system.
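As a generic illustration of partition detection (the POPS-specific procedures appear in Section 3.4), the following sketch counts the connected components of the surviving links with union-find; a count above one flags a partitioned network. The node count and link list are made-up examples.

# Generic sketch of partition detection via union-find; the
# POPS-specific detection approaches are discussed in Section 3.4.

def component_count(n, live_links):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in live_links:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

live = [(0, 1), (1, 2), (3, 4)]             # suppose link (2, 3) failed
if component_count(5, live) > 1:
    print("network partitioned")            # -> network partitioned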

1.4 Limited Tuning Range For Tunable Transceivers


The current technology only allows the transceivers to be tunable in a small
range, a fact ignored in previous studies. We analyze the effect of the limited
tunable range of transceivers. Each channel requires 1-2 nm for wide band-
width and current technology can only support 3-7 nm for large bandwidth
devices. This means the reasonable tunable range can only be 3 to 7 wave-
lengths with current technology. Even in the near future, the tunable range
is not likely to increase significantly because of technical difficulties. We
study the effects of the limited tunable range of transceivers on the optical
passive star network with an embedded topology.
To make our study more general, we assume that the tunable range is
no more than k channels. Current technology allows k to be in the single digits; in the future, k may be larger. Our results remain applicable because k is kept as a parameter in this study. In an optical passive star network embedded
with a virtual topology which is not a complete graph, it is assumed that
any two stations connected by a link in the virtual topology should be able
to communicate with each other in one hop. In an optical passive star
network embedded with a complete graph, it is impossible to have a station
communicating with all other stations in one hop unless the total number
of wavelengths used is no more than k.
We assume that there are a total of p wavelengths available for the whole
system. Each station has a tunable transmitter and a fixed receiver. Let A

be a station in the system whose transmitter is tunable in the range [a, a+k-1]. The wavelength of A's receiver must also lie in [a, a+k-1]. Only those nodes whose receiver wavelengths lie in [a, a+k-1] can receive a message from A directly, i.e., in one hop. The other stations require multi-hop transmission to receive a message from A. An example showing the effect of assigning wavelengths when embedding a mesh into an optical network is given in Figure 3.
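To make the one-hop constraint concrete, the following small Python sketch (our illustration; the station layout and names are assumptions, not part of the protocols discussed below) lists the stations a given station can reach in one hop:

```python
# Sketch: one-hop reachability under a limited tuning range k.
# Assumed layout (ours): station s can transmit on any wavelength in
# [base[s], base[s] + k - 1] and receives on the fixed wavelength recv[s].

def one_hop_neighbors(s, base, recv, k):
    """Stations whose receiver wavelength falls inside s's tuning range."""
    lo, hi = base[s], base[s] + k - 1
    return [t for t in range(len(recv))
            if t != s and lo <= recv[t] <= hi]

# Example: 6 stations, tuning range of k = 3 wavelengths.
base = [0, 0, 2, 2, 4, 4]       # start of each station's tuning range
recv = [0, 1, 2, 3, 4, 5]       # fixed receiver wavelengths
print(one_hop_neighbors(0, base, recv, 3))   # -> [1, 2]
```

Stations outside the printed list can only be reached by multi-hop forwarding through intermediate stations.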
There are many communication protocols for optical networks [26, 19, 21, 36, 44, 52, 55], but there is no specific communication protocol for lightwave networks with limited tuning ranges for transceivers. We propose communication protocols based on different assumptions about transmitters and receivers.
Figure 3: An example of wavelength assignment with k = 5. The underlined number for each station is the wavelength of its receiver.

In an optical passive star network embedded with a complete graph, we study the relationship between the total number of wavelengths used
and the maximum delay without congestion in the system. That means
we only consider hop-counts and ignore the potential re-transmissions in
our evaluation of delay. The optimal embedding algorithm that minimizes
the maximum delay is given. One effect of the limited tunable range is
the throughput bottleneck for some wavelengths in the system no matter
how the wavelengths are assigned. This bottleneck indicates the complete
graph is not suitable as the embedding topology for a system with uniform
communication among all stations. But we show how one-to-all broadcasting
in such a system can be efficient.
In an optical passive star network embedded with a topology other than
the complete graph, the following questions need to be answered:
• How to create connections between the sending station and the receiv-
ing station? How to resolve transmission collisions in each channel?

• How to embed a virtual topology on an optical passive star network satisfying the constraint that neighboring nodes in the virtual topology
are one hop away?

• What is the relationship between the topology and the total number
of wavelengths which can be used? Are there any tight upper bounds
on the total wavelengths which can be used?

• How to embed the topology in the optical passive star network to use
as many wavelengths as possible?

The last three questions are combinatorial optimization problems. Optimal solutions to these questions can reduce the cost of building lightwave networks and improve the utilization of wavelength bandwidth.
In Section 4, we propose communication protocols that are efficient for
limited tuning ranges. We study the systems embedded with rings, meshes
and hypercubes. All of them are important structures which are widely used
in communication and distributed computing [26, 43, 45, 4, 22, 58]. The
methods we use can also be applied to the study of optical networks embedded with other virtual topologies.

2 Transmission Schedule and Topology Embedding


We consider the problem of embedding a virtual de Bruijn topology, both
directed and undirected, in a physical optical passive star time and wave-
length division multiplexed (TWDM) network and constructing a schedule
to transmit packets along all edges of the virtual topology in the shortest
possible time. We develop general graph theoretic results and algorithms
and using these build optimal embeddings and optimal transmission sched-
ules, assuming certain conditions on the network parameters. We prove our
transmission schedules are optimal over all possible embeddings.
As a general framework we use a model of the passive star network with
fixed tuned receivers and tunable transmitters. Our transmission schedules
are optimal regardless of the tuning time. Our results are also applicable to
models with one or more fixed tuned transmitters per node. We give results
that minimize the number of tunings needed. For the directed de Bruijn
topology a single fixed tuning of the transmitter suffices. For the undirected
de Bruijn topology two tunings per cycle (or two fixed tuned transmitters
per node) suffice and we prove this is the minimum possible.

2.1 Introduction
Optical networks present new possibilities for high speed networks, but they
also present new network design problems. In this section we study trans-
mission schedule problems of embedding a virtual de Bruijn topology in a
time and wavelength division multiplexed (TWDM) optical passive star net-
work. The embedding of a virtual topology and the transmission schedule
in the lightwave network embedded with the virtual topology are closely re-
lated. How to design the transmission schedule depends on how the virtual
topology is embedded in the physical lightwave network. Different embed-
dings of the virtual topology will induce different transmission schedules. In
our study, we first show how to embed a de Bruijn graph into an optical passive star network. Then, based on that embedding, we design the transmission schedule. Finally, we show that our transmission schedule is optimal among all possible transmission schedules over all possible embeddings of a de Bruijn graph into an optical passive star network. This means that our embedding of a de Bruijn graph as the virtual topology of a lightwave network is indeed the desired one. The tuning times of the tunable transmitters are also considered, which may help to reduce the requirements on the physical devices and simplify the optimal transmission schedules.
The first part of this section deals with the design of transmission schedules for a lightwave network embedded with a general topology. Some general results are shown about lower bounds on transmission schedules. The second and third parts of this section concern the optimal transmission schedule in an optical passive star network embedded with a de Bruijn graph (directed or undirected).
De Bruijn graphs are used as a structure for networking computing and
parallel computing. There are some well-known network topologies that
belong to de Bruijn graphs and some variations of de Bruijn graphs. For
example, shuffle exchange graphs are a special case of de Bruijn graphs, and ring topologies are a special case of generalized de Bruijn graphs. The study of de Bruijn graphs can help us understand how to design transmission schedules and topology embeddings for a large class of network topologies. The results can
be applied to different special cases of de Bruijn graphs and undirected de
Bruijn graphs.
Lightwave networks embedded with de Bruijn graphs make high performance computing possible for some large problems. Transmission
schedules are important for networking computing and parallel computing
for solving different communication patterns. De Bruijn graphs are also used

Figure 4: An example of a de Bruijn graph.

in switching network design. For example, the shuffle exchange networks are based on de Bruijn graphs. An optical passive star embedded with a
de Bruijn graph may be used as a high speed switch or a stage for com-
munication connections. Sivarajan and Ramaswami [54] proposed de Bruijn
graphs as logical topologies for multihop lightwave networks. They also pro-
posed de Bruijn graphs as good physical topologies for wavelength routing
lightwave networks consisting of all-optical routing nodes interconnected by
point-to-point fiber links.
The TWDM optical passive star [10, 31, 27], as used in this chapter, is
a physical network architecture with N nodes, each having a single tunable
transmitter and a single fixed tuned receiver connected through an optical
passive star. See Figure 1. The transmissions from each transmitter are
routed to all receivers, but in order to communicate both transmitter and
receiver must be tuned to the same wavelength. Transmissions of fixed sized
packets are scheduled in synchronous time slots; only one transmission can
be done on each wavelength in a single time slot. The network supports k
different wavelengths, so k transmissions can be done simultaneously. The
transmitters take a fixed amount of time, δ, expressed in time slots, to change their tuning from one wavelength to another. If the packet size is small, as it would be in an ATM model, δ will be large.
We want to embed a virtual network topology, or graph, in this phys-
ical network. This entails mapping the virtual vertices of the topology to

Figure 5: Shuffle exchange networks: a special case of de Bruijn graphs.

the physical nodes of the network. Given this embedding, we then want
to schedule transmissions in a repeating cycle, so that during each cycle
a packet is transmitted along each edge of the virtual topology. We will
consider both directed and undirected graphs for the virtual topology; for
undirected graphs we need to schedule packets in each direction along an edge.
All-to-neighbor transmission is considered to be the communication pattern
for this section.
The design of all-to-neighbor transmission is particularly important for high speed networks, especially fiber optic networks. If we want to
take advantage of the huge bandwidth of fiber channels, the transmission
cycle for any kind of communication patterns should be finished as soon as
possible. That means the time slots for all-to-neighbor transmission cycle
should be as small as possible. This makes bandwidth available for other
applications or another round of all-to-neighbor transmission.
The first objective of our study is to find the optimal transmission sched-
ule, i.e. the minimum number of time slots for each cycle. The second ob-
jective is to minimize the number of tuning times of tunable transmitters
during each cycle. The minimum number of time slots for each cycle means
the best utilization of lightwave network bandwidth. Fewer tunings may reduce the complexity of the transmission schedule, and if the tuning time is large, fewer tunings give a shorter overall schedule. We
can ask for an optimal transmission schedule given an embedding, or even

better, we would like an optimal transmission schedule over all possible em-
beddings. The transmission schedules we give are optimal over all possible
embeddings.

Not only do we determine optimal embeddings and schedules, but we build our proofs of optimality on several general graph-theoretic results and
algorithms that will be of use in the further study of similar problems.

The model we use in this section assumes that the transmitter is tunable to all possible receiver wavelengths. The transmitters take a fixed amount of time, δ, expressed in time slots, to change their tuning from one wavelength to another. Keeping δ as a parameter gives a more general framework for our results. As optical device technology advances, the tuning time δ will become smaller and smaller, but our results remain applicable since δ is a parameter in our study. Another goal of our study is to reduce the number of tunings in an all-to-neighbor transmission cycle. In fact, for the directed
de Bruijn graph we tune the transmitters only once before the transmission
schedule begins, so we are actually using a fixed tuned transmitter model.
For the undirected de Bruijn graph we tune each transmitter to only two
wavelengths; this could be implemented as an optical network where each
node has two fixed tuned transmitters and one receiver. Many of our results
can be interpreted in a model with multiple fixed tuned transmitters, and
this model has been successfully implemented experimentally.

There are additional questions about a multiple fixed tuned transmitter model that we do not directly consider. It is possible that slightly shorter
transmission schedules could be devised by having a single node transmit
on two wavelengths simultaneously; but our schedules use all possible wavelengths in almost all time slots, so the improvement would only be
slight. Another problem for fixed tuned models is determining whether an
embedding is feasible given an assignment of wavelengths to the transmitters
and receivers, or determining what assignments allow the most flexibility in
embedding virtual topologies. Our results simply give an assignment that
is optimal for embedding the de Bruijn topologies.

This section is organized as follows. In the next subsection, we give results about optimal embeddings and transmission schedules for arbitrary
virtual topologies. Then we design optimal embeddings and scheduling for
directed de Bruijn topologies, and finally for undirected de Bruijn topologies.
We summarize our work in subsection 2.5.

2.2 General Results


An (N, k, δ) optical passive star network is a network as shown in Figure 1 with N nodes, 0, 1, ..., N - 1. Each node has a tunable transmitter that can be tuned to any of k wavelengths, $w_0, w_1, \ldots, w_{k-1}$. Each node has a fixed tuned receiver; node i is tuned to wavelength $w_{i \bmod k}$. The network transmits packets in synchronous time slots, with only one packet of each wavelength permitted in each time slot. Thus at most k packets can be transmitted in a single time slot. The time needed for a node to tune its transmitter to a new wavelength is δ time slots.
A network topology, G, is a directed graph that represents the virtual
interconnections among the vertices. If G is an undirected graph, think of
each edge as two directed edges, one in each direction.
If all nodes in G have r out-going links and r in-coming links, G is called regular with degree r, or an r-regular graph. Let V and E be the sets of vertices and edges, respectively, of G, and let v = |V| and e = |E| be their sizes. If G is regular, let r be the in-degree of the vertices; then vr = e. Regular graphs are one of the most important classes of graphs, and many common network topologies, such as rings and hypercubes, are regular. We will focus on regular graphs in this subsection. Our general results can be applied to some regular graphs, such as rings and hypercubes. We also propose a couple of general approaches to studying the transmission schedule for regular graphs and general graphs.
An embedding of a network topology G into an optical passive star is a map, f, from the vertices of G to the nodes of the network. This naturally gives a coloring of the vertices of G by the k wavelengths: vertex v is colored by the receiving wavelength of node f(v), that is, by wavelength $w_{f(v) \bmod k}$. We denote the coloring function by Receiver-Wavelength(v), and we give only the wavelength index; so here Receiver-Wavelength(v) = f(v) mod k. This in turn gives a coloring of the directed edges of G, where an edge has the color of the vertex to which it points (the vertex at its head). Each directed edge represents one packet that must be transmitted.
Note that the nodes of a passive star network are distinguished only by the wavelength of their receivers. Hence specifying the receiver wavelength coloring of the vertices of G implicitly defines the embedding: the first vertex of G of color i is assigned to node i in the optical passive star network, the second vertex of color i is assigned to node i + k, the third to node i + 2k, and so on. As long as we do not use more than N/k of any one wavelength color, the embedding is straightforward. So in what follows we will only specify the function Receiver-Wavelength(v) that defines the coloring by receiver

wavelengths, verifying that we do not use more than N/k of any one color. We will not explicitly specify the embedding function, f.
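As a concrete illustration of this convention, the following Python sketch (ours) recovers an explicit embedding f from a receiver-wavelength coloring:

```python
# Sketch: derive the embedding f from a receiver-wavelength coloring.
# The j-th vertex of color i (j = 0, 1, ...) is mapped to node i + j*k,
# which is valid as long as no color is used more than N/k times.

def embedding_from_coloring(colors, k, N):
    """colors[v] = receiver wavelength of vertex v; returns f with f[v] = node."""
    seen = {}                       # how many vertices of each color so far
    f = []
    for c in colors:
        j = seen.get(c, 0)
        node = c + j * k
        assert node < N, "color used more than N/k times"
        f.append(node)
        seen[c] = j + 1
    return f

# Example: 6 vertices, k = 3 wavelengths, N = 6 nodes.
print(embedding_from_coloring([0, 1, 2, 0, 1, 2], k=3, N=6))
# -> [0, 1, 2, 3, 4, 5]
```

By construction f(v) mod k equals the color of v, so the receiver wavelengths come out as specified.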
An optimal transmission schedule must schedule each packet transmission so that only one transmission occurs on each wavelength in each time slot, and yet so that the total time needed for all transmissions is minimum. The minimum time transmission schedule might well depend on the embedding we choose. We want to construct the embedding so that it allows the minimum transmission schedule over all possible embeddings, and so that we can efficiently determine the embedding and the optimal transmission schedule. An optimal transmission schedule also depends on the value of δ. For theoretical purposes we will sometimes consider optimal transmission schedules when δ = 0, though in practice δ is never 0. δ has a big effect on the design of an optimal transmission schedule.
Before we design any transmission schedule for a virtual topology embedded in an optical passive star network, we want to know a lower bound on the length of optimal transmission schedules. This lower bound tells us the least number of slots needed by any transmission schedule, and it can be determined theoretically. Of course, it may not be an exact bound. After the analysis, a transmission schedule is designed to come as near to the lower bound as possible. If the transmission schedule reaches the lower bound, an optimal transmission schedule has been found and the work is complete. Otherwise, either the lower bound is not exact for this virtual topology, or the transmission schedule is not optimal, and more analysis and design are needed to achieve better performance for the transmission cycle.
A trivial lower bound on the length of an optimal transmission schedule is $\lceil e/k \rceil$, where e is the number of edges of G, since there are e transmissions and each time slot can accommodate at most k simultaneous transmissions. Clearly, to achieve such a lower bound, we would need to have at most $\lceil e/k \rceil$ edges of each wavelength color. If G is regular of degree r, then this lower bound becomes $r \lceil v/k \rceil$, since some color must be used for at least $\lceil v/k \rceil$ vertices, giving at least $r \lceil v/k \rceil$ transmissions on a single wavelength. To achieve this lower bound we would need at most $\lceil v/k \rceil$ vertices of each color. We summarize this in Theorem 2.2.

Definition 2.1 A k uniform vertex (edge) coloring of a graph is a coloring such that there are at most $\lceil v/k \rceil$ vertices ($\lceil e/k \rceil$ edges) of each color.

Theorem 2.2 (Edge Lower Bound) There is a $\lceil e/k \rceil$ lower bound on an optimal transmission schedule, which can be achieved only if G is uniformly edge colored by the receiver wavelengths. If G is regular of degree r, then there is an $r \lceil v/k \rceil$ lower bound, which can be achieved only if G is uniformly vertex colored.

Note also that a uniform vertex coloring ensures that as long as $v \le N$, that is, as long as the network topology has no more vertices than the optical passive star we are embedding it into, a coloring by receiver wavelengths induces a valid embedding.
So when G is regular, a necessary condition for an $r \lceil v/k \rceil$ optimal transmission schedule is that G be k uniformly colored by the receiver wavelengths. Surprisingly, this is also a sufficient condition when δ = 0.

Theorem 2.3 If G is an r-regular graph, each receiver wavelength colors at most m vertices, and δ = 0, then G has an rm transmission schedule that can be determined in time $O(r^2 v^{3/2})$.

Proof: Look at the adjacency matrix of G; each row and column sums to r. By a well-known theorem of Birkhoff [8, 42], the matrix can be written as the sum of r permutation matrices, each of which has only a single entry of value 1 in each row and column. This can be done by solving r - 1 maximum matching problems, which can be done in time $O(r^2 v^{3/2})$ [29]. Each entry with value 1 in these permutation matrices represents one directed edge of the graph, and so one transmission that must be scheduled.
Look at a single permutation matrix. Since each row has only one entry of value 1, each node will only be transmitting to one receiver. Since each column has only one entry of value 1, each receiver will only be receiving from one transmitting node. The wavelength of the transmission is determined by the column in which the entry of value 1 appears. Write this permutation matrix in turn as a sum of at most m {0,1}-matrices where in each matrix at most one column of each wavelength has an entry of value 1. Since at most m vertices of G have the same receiver wavelength color, this is possible and can be done efficiently. Each such matrix represents the transmissions that occur in a single time slot.
Since we are assuming that δ = 0, the transmissions along all of the edges of G can be scheduled in only rm time slots.
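The decomposition used in this proof can be reproduced with any bipartite matching routine. The sketch below (our illustration; it uses a simple augmenting-path matching rather than the faster method cited above) peels an r-regular 0/1 matrix into r permutation matrices:

```python
# Sketch: Birkhoff-style decomposition of an r-regular 0/1 matrix into
# r permutation matrices, by repeatedly extracting a perfect matching.

def perfect_matching(adj):
    """adj[u] = set of columns row u may use; returns {row: column}."""
    match_col = {}                        # column -> matched row

    def augment(u, visited):
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                if v not in match_col or augment(match_col[v], visited):
                    match_col[v] = u
                    return True
        return False

    for u in range(len(adj)):
        if not augment(u, set()):
            raise ValueError("no perfect matching")
    return {u: v for v, u in match_col.items()}

def birkhoff(matrix, r):
    """Return r permutations (row -> column maps) that sum to the matrix."""
    adj = [{j for j, x in enumerate(row) if x} for row in matrix]
    perms = []
    for _ in range(r):
        m = perfect_matching(adj)
        perms.append(m)
        for u, v in m.items():            # remove the matched edges
            adj[u].discard(v)
    return perms

# Example: a 2-regular matrix (a directed 4-cycle plus its reverse).
M = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
for p in birkhoff(M, 2):
    print(p)
```

Each extracted permutation is then split into time slots by grouping entries so that no wavelength repeats within a slot, exactly as in the proof.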

Corollary 2.4 If G is an r-regular graph that is k uniformly vertex colored by the receiver wavelengths, and if δ = 0, then it has an $r \lceil v/k \rceil$ optimal transmission schedule that can be determined in time $O(r^2 v^{3/2})$.

Proof: The transmission schedule is optimal by Theorem 2.2 for regular graphs.
We can apply this corollary to show that any embedding of the d-ary n-dimensional hypercube that assigns the k receiver wavelengths uniformly has an optimal transmission schedule of length $(d-1) n \lceil d^n/k \rceil$ time slots, when δ = 0.
Another lower bound on the length of an optimal transmission schedule
is given by considering the number of transmissions on each wavelength that
each node must perform.

Theorem 2.5 (Vertex Lower Bound) If G has a vertex of degree at least r and the neighbors of that vertex are colored by at least ℓ > 1 receiver wavelengths, then there is an r + ℓδ lower bound on an optimal transmission schedule using this wavelength coloring.

Proof: The vertex must perform r transmissions and change its transmission wavelength ℓ times in each transmission cycle.
It is important to emphasize that this lower bound depends on the wavelength coloring. Note also that if each vertex has neighbors of only one wavelength, ℓ = 1, then the transmitter can set its wavelength once before any transmissions begin and this lower bound becomes just r.
If δ is large enough then it is easy to find a transmission schedule that meets this lower bound and hence is optimal. If δ > e, for example, then we can schedule only one transmission per time slot and still achieve an optimal transmission schedule by this lower bound. So we will be interested in schedules that are optimal for all values of δ, both large and small.
To reduce this lower bound as much as possible, we would like the neighbors of a vertex in G to be colored by as few different wavelengths as possible. This motivates the next definition.

Definition 2.6 An ℓ-neighborhood bounded vertex coloring of a graph is a coloring where each vertex has neighbors colored by at most ℓ colors.

There are limits on how small ℓ can be.

Theorem 2.7 If G is undirected and connected, then G has a 1-neighborhood bounded coloring with at least two colors if and only if G is bipartite.

Proof: If G is bipartite, then clearly it has a 1-neighborhood bounded coloring with two colors.
Assume G has a 1-neighborhood bounded coloring with at least two colors, and that it is not bipartite. Since G is connected, it must have an edge uv with u and v of different colors, say u is blue and v is red. Since G is not bipartite, it must have an odd length cycle, and there must be a path from v to the odd length cycle. But any path from v must alternate colors, since all of the neighbors of v must be blue like u, all vertices at distance two must be red like v, and so on. This means the odd length cycle must be colored alternately by red and blue, which is impossible.
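Since Theorem 2.7 reduces the question to bipartiteness, a standard breadth-first 2-coloring settles it; the sketch below (ours) returns a two-color 1-neighborhood bounded coloring when one exists:

```python
# Sketch: test bipartiteness by BFS 2-coloring. For a connected
# bipartite graph, the two color classes give a 1-neighborhood bounded
# coloring: every vertex sees neighbors of a single color.
from collections import deque

def two_coloring(adj):
    """adj: dict vertex -> iterable of neighbors. Returns colors or None."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None            # odd cycle: not bipartite
    return color

ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # even ring
print(two_coloring(ring4))     # a valid 2-coloring
ring3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}              # odd ring
print(two_coloring(ring3))     # None
```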
Theorem 2.7 tells us that only bipartite graphs have a 1-neighborhood bounded coloring with at least two colors. A 1-neighborhood bounded coloring is a big plus for reducing the number of tunings for each node and simplifying the transmission schedule: since a station needs to tune its transmitter only once to talk to all of its neighboring stations, the transmitter can stay on that channel until the all-to-neighbor transmission cycle is over. The following facts follow from Theorem 2.7:

• Rings (of even length) have 1-neighborhood bounded colorings.

• Meshes have 1-neighborhood bounded colorings.

• Hypercubes have 1-neighborhood bounded colorings.

We would like to mention one last general technique. When some node
needs more than one transmitter wavelength, it is possible to replace two
cycles of the transmission schedule by a cycle followed by a reversed cycle. In
this way, the tuning of the transmitters at the end of the cycle is unnecessary,
since the reversed cycle begins in the same tuning that ends the first cycle.
This can reduce the transmission time by as much as δ.

2.3 Directed de Bruijn Graphs


The directed de Bruijn graph, B(d, n), is defined by

$$V = \{(x_1, \ldots, x_n) \mid x_i \in [0, d-1]\}$$
$$E = \{(x_1, \ldots, x_n) \to (x_2, \ldots, x_n, a) \mid a \in [0, d-1]\}.$$

Note this graph is regular of degree d; however, it has d loops $(x, \ldots, x) \to (x, \ldots, x)$. Since no packets need to be transmitted along these loop edges (a node does not need to transmit to itself), we drop these edges.
Since there are $d^n$ vertices, each of degree d, and since we exclude the d loop edges, we have a total of $d^{n+1} - d$ edges. By the Edge Lower Bound,

Theorem 2.2, we know an optimal transmission schedule must use at least $\lceil (d^{n+1} - d)/k \rceil$ time slots.
In fact this lower bound can be achieved for some values of k and d. This holds for any δ, since it is possible to embed the directed de Bruijn graph in the optical passive star so that the neighbors of a given node all have the same receiver wavelength, a 1-neighborhood bounded coloring. In such an embedding, the transmitters never need to tune to a different wavelength, so there is no dependence on the tuning time δ.
We will define two different embeddings. The first gives an optimal transmission schedule for values of k and d that satisfy certain simple conditions. The second embedding is more complicated and gives a good but not necessarily optimal transmission schedule, without any assumptions on k and d. Actually we must make some assumptions about N, k, and d for the embedding to be feasible; we assume $d^n < N$ so that some embedding is possible, and $d < N/k$ so that a 1-neighborhood bounded coloring is possible.
Here is the first vertex coloring, which by our comments above implicitly
defines the embedding.

Embedding A:

$$\text{Receiver-Wavelength}(x_1, \ldots, x_n) = \left( \sum_{j=1}^{n-1} x_j d^{j-1} \right) \bmod k.$$

This coloring can be thought of as taking the number represented in base d by $(x_1 x_2 \cdots x_{n-1})$ and reducing it modulo k to get the color of vertex $(x_1, \ldots, x_n)$. This coloring is not necessarily vertex uniform, but it does use each wavelength color at most $d \lceil d^{n-1}/k \rceil$ times. (If $k \mid d^{n-1}$ then it is vertex uniform.) This coloring is 1-neighborhood bounded, because the neighbors of $(x_1, \ldots, x_n)$ are all of the form $(x_2, \ldots, x_n, a)$ for some a, and these all have the same receiving wavelength color.
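A direct transcription of Embedding A (a sketch of ours) makes the 1-neighborhood boundedness easy to verify mechanically:

```python
# Sketch: Embedding A for the directed de Bruijn graph B(d, n).
# The color depends only on the first n-1 coordinates, so all
# neighbors (x_2, ..., x_n, a) of a fixed vertex share one color.
from itertools import product

def receiver_wavelength_A(x, d, k):
    # enumerate's j starts at 0, so d**j here is d^{j-1} of the text,
    # whose j runs from 1 to n-1.
    return sum(xj * d**j for j, xj in enumerate(x[:-1])) % k

def check_one_neighborhood(d, n, k):
    for x in product(range(d), repeat=n):
        colors = {receiver_wavelength_A(x[1:] + (a,), d, k)
                  for a in range(d)}
        assert len(colors) == 1          # 1-neighborhood bounded
    return True

print(check_one_neighborhood(d=3, n=3, k=4))   # -> True
```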
This gives a $d^2 \lceil d^{n-1}/k \rceil$ transmission schedule for the entire directed de Bruijn graph, by Theorem 2.3. We would like to improve this to get an optimal $\lceil (d^{n+1} - d)/k \rceil$ transmission schedule for the graph with the loops removed. If $k \mid d^{n-1}$ and d < k (both reasonable assumptions) then in fact $\lceil (d^{n+1} - d)/k \rceil$ and $d^2 \lceil d^{n-1}/k \rceil$ are equal. Hence in this case, we have an optimal transmission schedule.
Next consider the case where $k \mid d$. First of all, note that the loop edges are uniformly distributed among the k different wavelengths. This follows because the loop $(x, \ldots, x) \to (x, \ldots, x)$ is assigned to wavelength

$$\sum_{j=1}^{n-1} x d^{j-1} \equiv x \pmod{k},$$

since $k \mid d$. Consequently, the d vertices with loops together have $(d-1)d$ edges uniformly distributed among the k different wavelengths.
We can consider the directed de Bruijn graph as the union of all edges of the form $(x_1, x_2, \ldots, x_n) \to (x_2, \ldots, x_n, x_1)$ and all other edges. Each subgraph is regular, of degree 1 and degree d - 1 respectively. We can apply Corollary 2.4 to each subgraph to get a transmission schedule. Note all loop edges are in the first subgraph, and since they are uniformly colored by the receiving wavelengths, we can remove d/k time slots from this schedule by removing all loop edges. This gives us a schedule for all transmissions in

$$\frac{d^n}{k} - \frac{d}{k} + (d-1)\frac{d^n}{k} = \left\lceil \frac{d^{n+1} - d}{k} \right\rceil$$

time slots, and so it is optimal. We summarize this in a theorem.

Theorem 2.8 If $k \mid d^{n-1}$ and either k > d or $k \mid d$, then Embedding A has an optimal transmission schedule, as given above.

An example of this embedding and an optimal transmission schedule for B(4,2) with 4 wavelengths is given in Tables 1 and 2. In this example, d = 4, n = 2 and k = 4. It is a case of $k \mid d$, so the number of time slots for an optimal all-to-neighbor transmission cycle is

$$\left\lceil \frac{d^{n+1} - d}{k} \right\rceil = \left\lceil \frac{4^3 - 4}{4} \right\rceil = 15.$$

Table 1 shows the adjacency matrix of B(4,2) and Table 2 shows how to make an optimal transmission schedule in 15 time slots.
Another example of this embedding and an optimal transmission schedule, for B(2,4) with 4 wavelengths, is given in Figures 6 and 7. In this example, d = 2, n = 4 and k = 4. It is a case of $k = d^2$, so the number of time slots for an optimal all-to-neighbor transmission cycle is

$$\left\lceil \frac{d^{n+1} - d}{k} \right\rceil = \left\lceil \frac{2^5 - 2}{4} \right\rceil = 8.$$

Figure 6 shows the adjacency matrix of B(2,4) and Figure 7 shows how to make an optimal transmission schedule in 8 time slots.
Table 1: Adjacency matrix of B(4,2). L = loop edges, C = cycle edges where $(x_1, x_2) \to (x_2, x_1)$, and R = rest of the edges.

Table 2: Schedule for B(4,2) with 4 wavelengths using Embedding A. The schedule shows the destination vertex at each time slot; the wavelength is the first coordinate of the destination vertex.

Before we give our second embedding, we will first prove a better lower
bound on an optimal transmission schedule. Because the directed de Bruijn
graph with loops removed is almost regular, we can get an improved lower
bound that falls above the general Edge Lower Bound, but below the Edge
Lower Bound for regular graphs.

Figure 6: The adjacency matrix of B(2,4). L = loop edges, R = rest of the edges.

Theorem 2.9 An optimal transmission schedule on B(d, n) with loops removed must use at least

$$d \left\lceil \frac{d^n}{k} \right\rceil - \left\lfloor \frac{d}{r} \right\rfloor$$

time slots, where $d^n = kq + r$ with $0 < r \le k$.

Proof: First assume that some receiving wavelength is used more than $\lceil d^n/k \rceil$ times. Then even if all the loop edges share this same receiving wavelength, we would need at least $d(\lceil d^n/k \rceil + 1) - d$ time slots just to transmit along all the edges of this wavelength, but this is already at least as large as the lower bound of the theorem.
So r different wavelengths must each be used for exactly $\lceil d^n/k \rceil$ vertices. From these r wavelengths, pick the one that colors the fewest loop edges; it colors at most $\lfloor d/r \rfloor$ loop edges. Therefore, to transmit along all the edges

Figure 7: Transmission schedule for B(2,4) with 4 wavelengths using Embedding A.

of this wavelength, we must use at least $d \lceil d^n/k \rceil - \lfloor d/r \rfloor$ time slots.
Now here is our second embedding. We give it in the form of an algo-
rithm. We need to be sure that the loop edges are colored uniformly; this
fact is clear from the algorithm.

Embedding B:

c ← 0
for x = 0 to d-1
    for a = 0 to d-1
        Receiver-Wavelength(x, x, ..., x, a) ← c mod k
    c ← c + 1
for (x_1, ..., x_{n-1}) = (0, ..., 0) to (d-1, ..., d-1)
    if x_1 ≠ x_2 OR x_2 ≠ x_3 OR ... OR x_{n-2} ≠ x_{n-1} then
        for a = 0 to d-1
            Receiver-Wavelength(x_1, ..., x_{n-1}, a) ← c mod k
        c ← c + 1
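The following Python sketch transcribes Embedding B under our reading of the pseudocode above (in particular, the counter c advances once per block of d vertices sharing a prefix), and checks that the loop vertices are colored uniformly:

```python
# Sketch: Embedding B for B(d, n). Each block of d vertices sharing a
# prefix receives one color; the constant prefixes (x, ..., x) are
# handled first so that loop edges are spread over the k colors.
from itertools import product

def embedding_B(d, n, k):
    color = {}
    c = 0
    for x in range(d):                     # prefixes (x, x, ..., x)
        for a in range(d):
            color[(x,) * (n - 1) + (a,)] = c % k
        c += 1
    for prefix in product(range(d), repeat=n - 1):
        if any(prefix[i] != prefix[i + 1] for i in range(n - 2)):
            for a in range(d):
                color[prefix + (a,)] = c % k
            c += 1
    return color

col = embedding_B(d=3, n=3, k=3)
# Loop vertices are colored 0, 1, 2 in turn: uniform over the k colors.
print([col[(x, x, x)] for x in range(3)])      # -> [0, 1, 2]
```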

In this embedding the colors of the loop edges are uniformly distributed over the k wavelength colors. For the transmission schedule, we can schedule transmissions along all edges, including the loop edges, in $d^2 \lceil d^{n-1}/k \rceil$ time steps. We can again arrange to have all the loop edges appear together in this schedule, so $\lfloor d/k \rfloor$ time slots will be completely taken up by loop edges and they can be removed from the schedule. This gives a schedule requiring only

$$d^2 \left\lceil \frac{d^{n-1}}{k} \right\rceil - \left\lfloor \frac{d}{k} \right\rfloor$$

time slots. This may not be optimal, but it is close to the lower bound established in Theorem 2.9. It is easy to see that the difference between the length of this schedule and the lower bound in the theorem is at most $d^2$ time slots. If $k \mid d^{n-1}$ then $d^2 \lceil d^{n-1}/k \rceil = d \lceil d^n/k \rceil$ and r = k, and this schedule meets the lower bound, so it is optimal. We summarize this in the next theorem.

Theorem 2.10 Embedding B has a transmission schedule, as given above, of $d^2 \lceil d^{n-1}/k \rceil - \lfloor d/k \rfloor$ time slots. If $k \mid d^{n-1}$ then it is an optimal transmission schedule. If $k \nmid d^{n-1}$, then the transmission schedule uses at most $d^2$ more time slots than optimal.

With Theorem 2.8 and Theorem 2.10 we get an embedding and an optimal transmission schedule for the directed de Bruijn graph B(d, n) whenever $k \mid d^{n-1}$, and a near optimal schedule (within $d^2$ time slots of optimal) in all other cases. This transmission schedule is optimal over all possible embeddings and for all possible values of δ, since it meets the Edge Lower Bound, Theorem 2.2, which holds for all embeddings and all δ.
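As a quick numeric sanity check (our own addition, not from the original), one can compare the schedule length of Embedding B with the Edge Lower Bound for a few parameter choices:

```python
# Sketch: compare the Embedding B schedule length
# d^2 * ceil(d^{n-1}/k) - floor(d/k) with the edge lower bound
# ceil((d^{n+1} - d)/k) for the directed de Bruijn graph B(d, n).
from math import ceil, floor

def edge_lower_bound(d, n, k):
    return ceil((d**(n + 1) - d) / k)

def embedding_B_length(d, n, k):
    return d * d * ceil(d**(n - 1) / k) - floor(d / k)

for (d, n, k) in [(4, 2, 4), (2, 4, 4), (3, 3, 9)]:
    lb = edge_lower_bound(d, n, k)
    lenB = embedding_B_length(d, n, k)
    print(d, n, k, lb, lenB, lenB - lb)    # gap is 0 when k | d^{n-1}
```

All three cases satisfy $k \mid d^{n-1}$, so the gap printed in the last column is 0, matching Theorem 2.10.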

2.4 Undirected de Bruijn Graphs


The undirected de Bruijn graph, UB(d, n), is simply the underlying undirected graph of B(d, n). If d = 1 the graph has only one vertex, so we assume d > 1. If n = 1 the graph is a complete graph on d vertices; the case of the complete graph has been handled in [41], so we will also assume n > 1. We sometimes need to handle the case n = 2 specially; we will only mention this case in passing.
We will call the edges $(x_1, x_2, \ldots, x_n) \to (x_2, \ldots, x_n, a)$ left-shift edges, and the edges $(x_1, \ldots, x_{n-1}, x_n) \to (a, x_1, \ldots, x_{n-1})$ right-shift edges. As in the directed case, we drop the d loop edges. Here there are also multiple edges between $(d^2 - d)/2$ pairs of vertices of the form $(a, b, a, b, \ldots)$ and

$(b, a, b, a, \ldots)$ where $a \ne b$; there is a left-shift and a right-shift edge between these two vertices. We only count one of these two edges. Since we must send messages in both directions along each undirected edge, the total number of transmissions is $2d^{n+1} - d^2 - d$.
The first result about undirected de Bruijn graphs is an immediate con-
sequence of Theorem 2.7.
Theorem 2.11 If d > 1 and n > 1, then UB(d, n) does not have a 1-neighborhood bounded coloring with at least two colors.

Proof: When d > 1 and n > 1 we have the length three cycle
$(0, 0, \ldots, 0) \to (0, \ldots, 0, 1) \to (1, 0, \ldots, 0) \to (0, 0, \ldots, 0)$.
Hence, UB(d, n) is not a bipartite graph, and by Theorem 2.7 it does not have a 1-neighborhood bounded coloring with at least two colors.
However, a 2-neighborhood bounded coloring is possible, as shown by
the following embedding.

Embedding C:

$$\text{Receiver-Wavelength}(x_1, \ldots, x_n) = \left( \sum_{j=2}^{n-1} x_j d^{j-2} \right) \bmod k.$$

Here the left-shift edges of $(x_1, x_2, \ldots, x_n)$ are all colored by $(x_3 \cdots x_n)_d$ (that is, read as a base d number) reduced modulo k. The right-shift edges are all colored $(x_1 \cdots x_{n-2})_d \bmod k$. If we are to use all wavelengths we must assume that $d^{n-2} \ge k$.
By the previous theorem, this embedding is optimal for minimizing the total number of tunings.
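A transcription of Embedding C (our sketch) verifies the 2-neighborhood bounded property claimed above:

```python
# Sketch: Embedding C for the undirected de Bruijn graph UB(d, n).
# Every vertex sees all its left-shift neighbors on one wavelength and
# all its right-shift neighbors on another.
from itertools import product

def receiver_wavelength_C(x, d, k):
    # x[1:-1] are the coordinates x_2 .. x_{n-1}; enumerate's j starts
    # at 0, so d**j here is d^{j-2} of the text, whose j runs from 2.
    return sum(xj * d**j for j, xj in enumerate(x[1:-1])) % k

def check_two_neighborhood(d, n, k):
    for x in product(range(d), repeat=n):
        left = {receiver_wavelength_C(x[1:] + (a,), d, k) for a in range(d)}
        right = {receiver_wavelength_C((a,) + x[:-1], d, k) for a in range(d)}
        assert len(left) == 1 and len(right) == 1
    return True

print(check_two_neighborhood(d=2, n=5, k=4))   # -> True
```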
We will consider two special cases: $k \mid d$ and $k = d^p$ for $2 \le p \le n-2$.
Transmission Schedule When $k \mid d$

When $k \mid d$, Embedding C takes the form

$$\text{Receiver-Wavelength}(x_1, \ldots, x_n) = x_2 \bmod k.$$

Since each node $x = (x_1, \ldots, x_n)$ needs to transmit a message to $(x_2, \ldots, x_n, \alpha)$, $0 \le \alpha \le d-1$, and to $(\beta, x_1, \ldots, x_{n-1})$, $0 \le \beta \le d-1$, the transmitter of x need occupy only wavelengths $x_3 \bmod k$ and $x_1 \bmod k$. The wavelength of the receiver of $x = (x_1, x_2, \ldots, x_n)$ is fixed at $x_2 \bmod k$.

Algorithm UB1:

1. Initialization: Define

$$V[i,j] = \{(x_1, \ldots, x_n) \in V \mid x_3 \equiv i \pmod{k}, \; x_1 \equiv i + j \pmod{k}\},$$

where $0 \le i, j \le k-1$. For $x \in V[i,j]$, the wavelength of x's transmitter is switched to i.
We choose an arbitrary order for the vertices in V[i,j] with $j \ne 0$. For the vertices in V[i,0], we divide them into three parts:

$$A(i) = \{x \in V[i,0] \mid x_t = x_{t+1}, \; 1 \le t \le n-1\}$$
$$B(i) = \{x \in V[i,0] - A(i) \mid x_t = x_{t+2}, \; 1 \le t \le n-2\}$$
$$C(i) = V[i,0] - A(i) - B(i).$$

Let $B(i) = B(i,1) \cup B(i,2)$ with $B(i,1) \cap B(i,2) = \emptyset$ and $0 \le |B(i,1)| - |B(i,2)| \le 1$.
We order the vertices of V[i,0] so that all vertices of C(i) come first, then the vertices of B(i), and finally those of A(i). We pick an arbitrary order within A(i), B(i) and C(i). It is a simple fact that each node in B(i) has a neighbor which is both a left-shift neighbor and a right-shift neighbor; we call this neighbor a common neighbor.

2. Phase I: All nodes transmit messages to their left-shift neighbors, except the common neighbors of the nodes in B(i,2) with $0 \le i \le k-1$.

for j = 1 to k
    Each node in V[i, j mod k] transmits to its left-shift
    neighbors in parallel over 0 ≤ i < k, according
    to its order in V[i, j mod k], except the common
    neighbors of the nodes in B(i,2) with 0 ≤ i < k.
    After completing its transmission, each node switches
    its transmitter wavelength to (i + j) mod k.

3. Phase II: All nodes transmit messages to their right-shift neighbors, except the common neighbors of the nodes in B(i,1) with $0 \le i \le k-1$. This is similar to Phase I. After completing their transmissions, nodes switch their transmitters back to wavelength i mod k.

This algorithm gives a schedule for the time-slot cycle. Note that the schedule returns to the same setup at the end of Phase II, so we can continue another cycle by repeating Phase I and Phase II.
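The initialization step can be made concrete. The sketch below (our reconstruction, based on the set definitions above and the example in Figure 8) partitions V[i,0] into A(i), B(i) and C(i) for the case k | d:

```python
# Sketch: initialization of Algorithm UB1 for UB(d, n) with k | d,
# using V[i,j] = {x : x_3 = i (mod k), x_1 = i + j (mod k)}.
from itertools import product

def ub1_partition(d, n, k, i):
    Vi0 = [x for x in product(range(d), repeat=n)
           if x[2] % k == i and x[0] % k == i]            # j = 0
    A = [x for x in Vi0 if all(x[t] == x[t + 1] for t in range(n - 1))]
    B = [x for x in Vi0 if x not in A
         and all(x[t] == x[t + 2] for t in range(n - 2))]
    C = [x for x in Vi0 if x not in A and x not in B]
    return A, B, C

A, B, C = ub1_partition(d=4, n=3, k=2, i=0)
print(A)               # loop vertices: [(0, 0, 0), (2, 2, 2)]
print(len(B), len(C))  # -> 6 8
```

This matches the sets A_1, B_1, C_1 of Figure 8.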

Theorem 2.12 Algorithm UB1 gives an optimal schedule for the time-slot cycle for all δ. If $\delta \le d^{n+1}/k^2 - d - d/k - \lceil d(d-1)/2k \rceil$, the cycle is $(2d^{n+1} - d^2 - d)/k$ time slots long. If $\delta > d^{n+1}/k - d - d/k - \lceil d(d-1)/2k \rceil$, the cycle is $2\delta + 2d$ time slots long.

Proof: First, $|V[i,j]| = d^n/k^2$. Since we keep the same order for all the nodes in V[i,j] in Phase I and Phase II, and since $x = (x_1, x_2, \ldots, x_n)$ with $x_t = x_{t+1}$, $1 \le t \le n-1$, has only d - 1 left-shift neighbors, there are no wasted time slots if the tuning time from Phase I to Phase II for each node is no more than $dk|V[i,j]| - d - d/k - \lfloor d(d-1)/2k \rfloor$.
Since each node in C(i) of V[i,0] has only d - 1 right-shift neighbors, and each node in B(i) of V[i,0] has d - 1 right-shift neighbors yet to transmit to (each node in B(i) of V[i,0] has d right-shift neighbors, but one of them is also a left-shift neighbor), there are no wasted time slots if the tuning time from Phase II to Phase I for each node is no more than $dk|V[i,j]| - d - d/k - \lceil d(d-1)/2k \rceil$.
Note that

$$dk|V[i,j]| - d - d/k - \lceil d(d-1)/2k \rceil \le dk|V[i,j]| - d - d/k - \lfloor d(d-1)/2k \rfloor \le dk|V[i,j]| - d - d/k - \lceil d(d-1)/2k \rceil + 1.$$

Therefore if $\delta \le d^{n+1}/k^2 - d - d/k - \lceil d(d-1)/2k \rceil$, Algorithm UB1 is optimal; the cycle is $(2d^{n+1} - d^2 - d)/k$ time slots long.
If $\delta > d^{n+1}/k - d - d/k - \lceil d(d-1)/2k \rceil$, the time-slot cycle is

$$(2d^{n+1} - d^2 - d)/k + 2\delta - 2(d^{n+1}/k - d - d/k) - \lfloor d(d-1)/2k \rfloor - \lceil d(d-1)/2k \rceil = 2\delta + 2d.$$

For UB(d,2) with $k \mid d$, it is easy to design a pipeline schedule similar to that given in [41] for the case n = 1, the complete graph. An example of Algorithm UB1 for UB(4,3) with 2 wavelengths is given in Figures 8 and 9. For UB(4,3), d = 4, n = 3 and k = 2. If $\delta \le d^{n+1}/k^2 - d - d/k - \lceil d(d-1)/2k \rceil$, the number of time slots for an optimal transmission schedule is no less than

$$(2d^{n+1} - d^2 - d)/k = (2 \cdot 4^4 - 4^2 - 4)/2 = 246.$$

Figures 8 and 9 show how to make an optimal schedule in 246 time slots.
An example of Algorithm UB1 for UB(4,3) with 4 wavelengths is given in Figure 10; for this example, d = 4, n = 3 and k = 4.

V[0,1] = {(1, x_2, 0), (3, x_2, 0), (1, x_2, 2), (3, x_2, 2)}
V[0,0] = {(0, x_2, 0), (2, x_2, 0), (0, x_2, 2), (2, x_2, 2)}
A_1 = {(0,0,0), (2,2,2)}
B_1 = {(0, x_2, 0), (2, x_2, 2)} - A_1
C_1 = V[0,0] - A_1 - B_1

V[1,1] = {(0, x_2, 1), (2, x_2, 1), (0, x_2, 3), (2, x_2, 3)}
V[1,0] = {(1, x_2, 1), (3, x_2, 1), (1, x_2, 3), (3, x_2, 3)}
A_2 = {(1,1,1), (3,3,3)}
B_2 = {(1, x_2, 1), (3, x_2, 3)} - A_2
C_2 = V[1,0] - A_2 - B_2

Figure 8: Vertex sets for UB(4,3) with 2 wavelengths using Algorithm UB1.

If $\delta \le d^{n+1}/k^2 - d - d/k - \lceil d(d-1)/2k \rceil$, the number of time slots for an optimal transmission schedule is no less than

$$(2d^{n+1} - d^2 - d)/k = (2 \cdot 4^4 - 4^2 - 4)/4 = 123.$$

Figure 10 shows how to make an optimal schedule in 123 time slots.
Transmission Schedule When $k = d^p$

Now we turn to the case where $k = d^p$ for $2 \le p \le n-2$. In this case, we have $n \ge 4$. Here we will schedule transmissions along all edges, even along the loop and multiple edges. This will still give us an optimal transmission schedule as long as p > 2. If p = 2 the transmission schedule is at most 1 time slot longer than optimal.
Here is the algorithm.

Algorithm UB2:
1. Initialization: For $0 \le j, k < d$ and each vector y of $n - p - 2$ coordinates, define

$$V[j,k,y] = \{(a, b, x_1, \ldots, x_{p-2}, a+j, b+k, y) \mid 0 \le a, b, x_i < d\}$$

$$L[i,j,k,y] = \{(a, b, x_1, \ldots, x_{p-2}, a+j, b+k, y) \to (b, x_1, \ldots, x_{p-2}, a+j, b+k, y, i) \mid 0 \le a, b, x_i < d\}$$

$$R[i,j,k,y] = \{(a, b, x_1, \ldots, x_{p-2}, a+j, b+k, y) \to (i, a, b, x_1, \ldots, x_{p-2}, a+j, b+k, \bar{y}) \mid 0 \le a, b, x_i < d\}$$

Figure 9: Schedule for UB(4,3) with 2 wavelengths using Algorithm UB1.

Figure 10: Schedule for UB(4,3) with 4 wavelengths using Algorithm UB1.

Note here that a + j and b + k are reduced modulo d. By $\bar{y}$ in the definition of R we mean the vector of coordinates y with the last coordinate removed.

2. Phase I: All vertices transmit to their left-shift neighbors.

for y = (0, ..., 0) to (d-1, ..., d-1)
    for k = 0 to d-1
        for j = 0 to d-1
            for i = 0 to d-1
                Transmit along the edges of L[i,j,k,y]
                in a single time slot
            Each node in V[j,k,y] tunes its transmitter
            to the wavelength for its right-shift neighbors

3. Phase II: All vertices transmit to their right-shift neighbors.

This is the same as Phase I, except we transmit along the edges of R[i,j,k,y] and then tune the transmitters to the wavelength for the left-shift neighbors.
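To illustrate the set definitions, the following sketch (ours) builds the edge sets L[i,j,k,y] for UB(2,5) and checks that each can be scheduled in a single time slot, as the proof of Theorem 2.13 below argues:

```python
# Sketch: the L-sets of Algorithm UB2 for UB(d, n) with k = d^p, and a
# check that the left-shift edges in one L[i, j, kk, y] carry distinct
# wavelengths (kk is the set index; the wavelength count is k = d^p).
from itertools import product

def wavelength(x, d, k):                  # Embedding C
    return sum(xj * d**t for t, xj in enumerate(x[1:-1])) % k

def L_set(d, p, i, j, kk, y):
    edges = []
    for a, b, *xs in product(range(d), repeat=p):
        src = (a, b, *xs, (a + j) % d, (b + kk) % d, *y)
        dst = src[1:] + (i,)              # left-shift, append i
        edges.append((src, dst))
    return edges

d, n, p = 2, 5, 3                         # k = d^p = 8 wavelengths, y empty
for i, j, kk in product(range(d), repeat=3):
    colors = [wavelength(dst, d, d**p)
              for _, dst in L_set(d, p, i, j, kk, ())]
    assert len(colors) == len(set(colors))
print("each L-set fits in one time slot")
```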

An example of Algorithm UB2 for UB(2,5) with 8 wavelengths is given in Table 3.

Theorem 2.13 Algorithm UB2 gives a valid transmission schedule when $k = d^p$ and $2 \le p \le n-2$.

Proof: Let us look at a single set of edges $L = L[i,j,k,y]$ and show that transmissions along all these edges can be scheduled in one time slot. It is clear that the edges in L all begin at a vertex in V[j,k,y]. Each such vertex has only one transmission scheduled in L. The receiving vertices in L each receive only one transmission. To see this, note that if $(b, x_1, \ldots, x_{p-2}, a+j, b+k, y, i)$ is the same as $(b', x'_1, \ldots, x'_{p-2}, a'+j, b'+k, y, i)$, then in fact $a = a'$, $b = b'$, and $x_\ell = x'_\ell$ for $1 \le \ell \le p-2$; so the source vertices are the same and this is the same transmission. And finally, each wavelength color is used only once, since the color is determined by a, b, and the $x_\ell$, which are different for each vertex. Hence, these transmissions can be scheduled in one time slot. The same is true for the R sets of edges.
Clearly this schedules transmissions along all edges of UB(d,n), including loop and multiple edges. Therefore Algorithm UB2 gives a valid transmission schedule.

$V[0,0] = \{(0,0,x,0,0), (0,1,x,0,1), (1,0,x,1,0), (1,1,x,1,1) \mid x = 0,1\}$
$V[0,1] = \{(0,0,x,0,1), (0,1,x,0,0), (1,0,x,1,1), (1,1,x,1,0) \mid x = 0,1\}$
$V[1,0] = \{(0,0,x,1,0), (0,1,x,1,1), (1,0,x,0,0), (1,1,x,0,1) \mid x = 0,1\}$
$V[1,1] = \{(0,0,x,1,1), (0,1,x,1,0), (1,0,x,0,1), (1,1,x,0,0) \mid x = 0,1\}$

Table 3: Schedule for UB(2,5) with 8 wavelengths using Algorithm UB2.



Phase I, excluding tuning time, takes $d^{n-p+1}$ time slots; so the length of this transmission schedule when δ = 0 is $2d^{n-p+1}$. Each set of vertices V[j,k,y] transmits to all d of its left-shift neighbors in d consecutive time slots and then tunes its transmitter. This gives the maximal amount of free tuning time in the schedule. If $\delta \le d^{n-p+1} - d$, then the transmission schedule is still only $2d^{n-p+1}$ time slots long. For larger values of δ, the schedule is

$$2d^{n-p+1} + 2(\delta - (d^{n-p+1} - d)) = 2d + 2\delta$$

time slots long.

Theorem 2.14 Assume that $k = d^p$ for $2 \le p \le n-2$. Algorithm UB2 gives a transmission schedule of length $2d^{n-p+1}$ time slots when $\delta \le d^{n-p+1} - d$, and of length $2d + 2\delta$ otherwise. If p > 2 this schedule is optimal for all δ. If p = 2 this schedule is within one time slot of optimal for $\delta \le d^{n-p+1} - d$ and optimal for larger δ.
Proof: The calculations for the length of the schedule are given above.
Since UB(d,n) has $2d^{n+1} - d^2 - d$ messages to transmit, by the Edge Lower Bound, Theorem 2.2, any transmission schedule must have at least $\lceil (2d^{n+1} - d^2 - d)/d^p \rceil$ time slots. When p > 2 it is easy to see this is equal to $2d^{n-p+1}$, and when p = 2 it is equal to $2d^{n-p+1} - 1$. Therefore our schedule is optimal when $\delta \le d^{n-p+1} - d$ and p > 2, and only one time slot longer than optimal when p = 2.
Since $n \ge 4$ we know there must be some vertex with neither loops nor multiple edges. This vertex has 2d neighbors to transmit to, and it must change its transmitter wavelength at least twice, by Theorem 2.11, no matter what wavelength coloring we use. By the Vertex Lower Bound, Theorem 2.5, we have a lower bound of $2d + 2\delta$. Hence again the schedule is optimal.
Hence Algorithm UB2 is an optimal schedule for the undirected de Bruijn graph, for all possible embeddings and all possible values of δ, when $k = d^p$ and $2 < p \le n-2$.
Using the technique of alternating forward and reverse cycles mentioned in subsection 2.2, we can reduce the transmission time for large δ to $d^{n-p+1} + d + \delta$.

2.5 Summary
We have given embeddings of the virtual topologies of the directed and
undirected de Bruijn graphs in a TWDM optical passive star network. Along
with these embeddings we have given transmission schedules that transmit

along all edges of the virtual topology (excluding loop and multiple edges)
in a minimal number of time slots. We proved these embeddings minimize
the number of tunings needed in each cycle of the schedule and that the
schedules are optimal over all possible embeddings.
We have made certain assumptions about the parameters of the virtual
and physical networks to get these optimal results, but we also have given
near optimal schedules for the directed de Bruijn graph without unnecessary
assumptions on the parameters.
Many open questions remain. Can we relax the requirements on the network parameters further and still prove optimality? For what parameters can we find optimal embeddings and schedules for a generalized de Bruijn network? (In a generalized de Bruijn network the edges are defined by $x_i \to x_{(di+a) \bmod n}$ for $0 \le a < d$.) Can we get lower bounds on the number of tunings needed for other topologies? Most importantly, are there further applications of the general graph theoretic results that we began in this section, and further results along these lines that will be useful in solving these embedding and scheduling problems? We think that considering the scheduling problem as a matrix decomposition problem, trying to write the adjacency matrix of the virtual topology as a sum of matrices with certain combinatorial properties, is a particularly useful way to formulate the problem. We hope these ideas have further application.

3 Reliable Routing Analysis


We study reliability in the all-optical Partitioned Optical Passive Stars (POPS) topology. POPS is a physical architecture that scales the traditional optical passive star couplers while keeping the advantages of high noise immunity and single-hop communication. The reliability and fault tolerance of POPS rely on its connectivity. In this study, we analyze the worst case for network partitions due to either node failures or link errors. For node connectivity, we show that the whole system remains connected no matter how many nodes fail. For link connectivity, we analyze some worst cases due to link errors and demonstrate a lower bound on the connectivity. Several efficient approaches are proposed to detect and maintain connectivity of the whole system.

3.1 Introduction
The Partitioned Optical Passive Stars (POPS) topology [33, 17, 32] is a physical architecture that scales the traditional optical passive star couplers and exploits their advantages. POPS is a non-hierarchical structure that connects several optical passive star couplers together with the help of some intermediate optical passive star couplers. It retains the properties of high noise immunity, single-hop communication, and no intermediate electronic/optical conversions. POPS is a design that avoids power budget problems, is flexible for extending optical passive star networks, and keeps the system simple.
It was shown in [33] that POPS data channels can be efficiently utilized for random permutation-based communication patterns. In most applications in network computing and parallel computing, some common communication patterns are widely used. For example, all-to-all personalized communication is the common way to globally exchange information. Global reduction or global broadcasting is the pattern used to collect data from all slaves at the master in the master-slave model. The authors of [33] studied four common communication patterns in TDM POPS networks and showed that POPS networks support distributed and networking computing.
Reliability is one of the important topics for distributed systems and local area networks [14, 15]. We study the connectivity of POPS networks. Network reliability and fault tolerance rely on the connectivity of the network topology. No previous study had addressed the connectivity of POPS. Both station errors and link errors are discussed with respect to network partitions. Some worst cases of disconnected POPS networks due to station or link errors are analyzed to obtain lower bounds on the connectivity. In addition, several efficient approaches are proposed to detect whether the network is still connected; these approaches can also be used to repair a disconnected system.
The following subsection describes the concept of POPS networks and
shows a couple of examples. In subsection 3.3, connectivity of POPS net-
works is analyzed for station errors and link errors. Some efficient ap-
proaches are proposed to make the whole system connected in subsection 3.4.
In subsection 3.5, we conclude our study and list some topics for future study.

3.2 Concepts for POPS Networks


POPS networks are based on optical passive star networks. An optical
passive star coupler is one of the most common ways to interconnect an
optical network via optical fibers. Figure 1 shows an optical passive star
network with d stations. Each station sends its signal to the passive star
coupler through its transmitter on a specific channel. In the passive star
coupler, all the signals are combined and broadcast to all the stations. The

signals in a certain channel can be received by the stations whose receivers occupy the corresponding channel.
Figure 11: A POPS network with n = 16, d = 8, and g = 2.

A POPS network, POPS[n,d], is determined by two parameters: the number of nodes in the system, denoted by n, and the degree of each coupler, denoted by d. Each coupler is a d-station optical passive star. The n nodes are partitioned into g = n/d groups. An example of a POPS network, POPS[16,8], is shown in Figure 11.
Each node has g transmitters and g receivers. There are $c = g^2$ intermediate couplers, denoted by $c_{i,j}$, $0 \le i, j \le g-1$. The transmitters of the nodes in Group j are connected to $c_{i,j}$ for $0 \le i \le g-1$, and the receivers of the nodes in Group i are connected to $c_{i,j}$ for $0 \le j \le g-1$.
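In code, the wiring rule is a one-liner; the sketch below (ours, with hypothetical names) returns the coupler that a one-hop transmission uses:

```python
# Sketch: POPS[n, d] wiring. Node u belongs to group u // d; the
# transmitters of group i reach couplers c[*, i] and the receivers of
# group j listen to couplers c[j, *], so one hop from u to v goes
# through coupler c[g_dst, g_src].
def pops_route(u, v, n, d):
    g_src, g_dst = u // d, v // d
    return ("coupler", g_dst, g_src)      # c_{j,i} with i = g_src, j = g_dst

# Example: POPS[16, 8] has g = 2 groups and g^2 = 4 couplers.
print(pops_route(3, 12, n=16, d=8))       # node 3 (group 0) -> node 12 (group 1)
# -> ('coupler', 1, 0)
```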
POPS networks are different from the usual electronic switching networks. A POPS network uses d×d passive star couplers as the intermediate connections, and a d×d passive star coupler raises further research issues such as WDM, multiple channel access protocols, and transmission collision recovery.
Given POPS[n,d], the number of intermediate couplers is $c = n^2/d^2$. To reduce the cost of such a POPS network and fully utilize the resources, it is reasonable to assume that the number of intermediate couplers is no more than the number of nodes in the system. This is equivalent to $d \ge n^{1/2}$. In a WDM POPS[n,d], some new constraints on n and d will be needed to reduce the number of intermediate couplers. In order to reduce

the cost of constructing a POPS network, one has to consider the tradeoff between the cost of larger degree passive star couplers and the number of intermediate passive star couplers. A large degree passive star coupler is more expensive, but the POPS network will use fewer optical passive star couplers. On the other hand, a small degree passive star coupler is cheaper, but the total number of couplers will increase to support the same number of nodes in the system.

3.3 Connectivity of POPS


Connectivity of a network is one of the fundamental parameters for assessing
the reliability of the network. Based on connectivity, one can tell whether
the network is still connected when node failures or link errors happen.
In this subsection, we study some worst cases for POPS networks and
show lower bounds for the connectivity. Some efficient approaches are also
proposed in the next subsection to make sure that the whole POPS network
is connected.
There are two kinds of errors in POPS networks: node errors and link
errors. Some nodes in a POPS network may not work, which corresponds to
the traditional node connectivity; or some output links or input links may
go down, which corresponds to the link connectivity. Note that one
transmitter failure in a node induces an output link error for that node,
and one receiver failure in a node induces an input link error for the
same node. Therefore, the link error problem in a POPS network is related
to the transmitter or receiver failure problem in some sense.

First, we analyze the node connectivity of a POPS network.


Consider a POPS network POPS[n,d]. Let A and B be two working
nodes, with A in Group i and B in Group j. Since A can send out a
message through its transmitter to Cj,i, and B can receive the message
through its receiver from Cj,i, the connection from A to B can be set up.
Similarly, the connection from B to A can be set up. This shows that the
connection between any two working nodes can always be set up. We put
this result in the following theorem.

Theorem 3.1 The POPS network is connected no matter how many nodes
fail.
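The one-hop argument above is easy to check mechanically. The following sketch is ours, not from the text; the helper name pops_route and the encoding of a coupler as a (destination group, source group) pair are illustrative conventions.

```python
# Sketch: one-hop routing in POPS[n,d]. A node in Group i transmits into
# coupler C(j,i), and every node in Group j receives from C(j,i), so any
# two working nodes are joined through exactly one coupler.

def pops_route(n, d, src, dst):
    """Return the coupler (dest_group, src_group) joining src to dst."""
    i, j = src // d, dst // d          # groups of the two nodes
    return (j, i)                      # coupler C(j,i)

# POPS[16,8] (Figure 11): node 3 is in Group 0, node 12 in Group 1.
assert pops_route(16, 8, src=3, dst=12) == (1, 0)
```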

Theorem 3.1 tells us that POPS networks are very reliable with respect
to node failures. The working nodes are always connected with each other.

The output links (1st through s-th) of each group, with all indices mod g:

Group 0:    (0,1)      (0,2)      ...  (0,s)
Group 1:    (1,2)      (1,3)      ...  (1,s+1)
Group 2:    (2,3)      (2,4)      ...  (2,s+2)
...
Group g-2:  (g-2,g-1)  (g-2,0)    ...  (g-2,s-2)
Group g-1:  (g-1,0)    (g-1,1)    ...  (g-1,s-1)

The input links (1st through s-th) of each group, with all indices mod g:

Group 0:    (0,0)      (1,0)      ...  (s-1,0)
Group 1:    (1,1)      (2,1)      ...  (s,1)
Group 2:    (2,2)      (3,2)      ...  (s+1,2)
...
Group g-2:  (g-2,g-2)  (g-1,g-2)  ...  (s-3,g-2)
Group g-1:  (g-1,g-1)  (0,g-1)    ...  (s-2,g-1)

Figure 12: An example with s input links and s output links



There is no need to worry about node fault tolerance issues in the remaining
working system.
In the rest of this subsection, we discuss the link connectivity of a POPS
network. It turns out that the link connectivity is much more complicated than
the node connectivity. In the next subsection, we propose some efficient
approaches to guarantee that the whole system stays connected.
Consider the example in Figure 12 of a POPS network POPS[n,d].
Each node has up to s output links working and up to s input links working,
with s ≤ g/2. Let (i,j) denote a link to coupler Ci,j.
We claim that the whole system is totally disconnected for g > 2, i.e.,
no two nodes can communicate with each other. No transmitted message can
be received by any node in this situation. This is the worst case of
network partition that can happen to a system.
Let g > 2. Let A and B be two nodes, A in Group i and B in Group
j. Suppose that there is a hop such that A's message can be received by B.
Then there must exist p and q such that

(i, i+p) ≡ (j+q, j) (mod g),  1 ≤ p ≤ s and 0 ≤ q ≤ s−1.

That is equivalent to

i ≡ j + q and i + p ≡ j (mod g).

Substituting the first congruence into the second yields j + q + p ≡ j (mod g),
i.e., p + q ≡ 0 (mod g). Since 1 ≤ p + q ≤ 2s − 1 and 2s ≤ g, this is impossible.
Thus the message can't be received by B. Because A
and B can be any two stations in the POPS network, the whole
network is totally disconnected in this situation.
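The congruence argument can also be verified by brute force. The sketch below is an illustration with hypothetical parameters, not code from the chapter; it enumerates the output couplers (i, i+p) and input couplers (j+q, j) of Figure 12 and confirms that no coupler is shared when 2s ≤ g.

```python
# Sketch: verify that the link pattern of Figure 12 leaves no shared coupler
# between any output link (i, i+p), 1 <= p <= s, and any input link
# (j+q, j), 0 <= q <= s-1, whenever 2s <= g.

def totally_disconnected(g, s):
    for i in range(g):
        outs = {(i, (i + p) % g) for p in range(1, s + 1)}
        for j in range(g):
            ins = {((j + q) % g, j) for q in range(s)}
            if outs & ins:             # a shared coupler would allow one hop
                return False
    return True

assert totally_disconnected(g=8, s=4)        # 2s = g: fully disconnected
assert not totally_disconnected(g=8, s=5)    # 2s > g: some hop exists
```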
Based on this example, we generalize the result to obtain

Theorem 3.2 In POPS[n,d], suppose each node has up to x input links working
and up to y output links working. If x + y ≤ g and g > 2, the whole network may
be totally disconnected: no transmitted message can be received by any node
in the network.

Property 3.3 In POPS[n,d], if each node has up to s input links and s output
links working, and s ≤ g/2 + 1, then the whole network may be partitioned
into g/2 connected components.

Based on the previous example, we add one more input link and one
more output link for each node.

If g = 2s and g > 2, the (s+1)th links are:

              (s+1)th IN     (s+1)th OUT
Group 0:      (0,0)          (s,0)
Group 1:      (1,1)          (s+1,1)
Group 2:      (2,2)          (s+2,2)
Group 3:      (3,3)          (s+3,3)
...
Group s-1:    (s-1,s-1)      (2s-1,s-1)
Group s:      (s,s)          (0,s)
Group s+1:    (s+1,s+1)      (1,s+1)
...
Group 2s-2:   (2s-2,2s-2)    (s-2,2s-2)
Group 2s-1:   (2s-1,2s-1)    (s-1,2s-1)
Figure 13: Connectivity for the (s+1)th input/output links (Group i and Group s+i form a totally connected component)

Since we already know which connections exist among the old input
links and old output links, we only need to consider the effects of the new
input links and new output links. Clearly, all the new input links and
all the new output links can be used to set up paths from one node to another.
From Figure 13, all the nodes in Group i and Group i + g/2
are fully connected with one another. There are g/2 connected components
in this POPS network:
Component 1: Group 0 and Group g/2

Component 2: Group 1 and Group g/2 + 1
...
Component g/2: Group g/2 − 1 and Group g − 1

Property 3.4 In POPS[n,d], if each node has up to s input links and s output
links working, and s ≤ g/2 + 2, then the whole network may be partitioned
into g/4 connected components.

We continue by adding one more input link and one more output link for
each node. Similarly to the above analysis, we only need to consider the
effects of the new input links and new output links, which can be used to
set up paths from one node to another.
Let p = g/2. From Figure 14, there are g/4 connected components in this
POPS network:
Component 1: Group 0, 1, p, p+1
Component 2: Group 2, 3, p+2, p+3
...
Component g/4: Group p−2, p−1, g−2, g−1

Figure 14: A demonstration for the (s+2)th input/output links.



Property 3.5 In POPS[n,d], if each node has up to s input links and s output
links working, and s ≤ g/2 + 4, then the whole network may be partitioned
into g/8 connected components.

This time we add two more input links and two more output links for
each node. Similarly to the above analysis, we only need to consider the
effects of the new input links and new output links, which can be used to
set up paths from one node to another.

Figure 15: A demonstration for the (s+3)th and (s+4)th input/output links (Groups 0, 1, 2, 3 and Groups s, s+1, s+2, s+3 are each already totally connected).

Let p = g/2. From Figure 15, there are g/8 connected components in this
POPS network:
Component 1: Group 0, 1, 2, 3, p, p+1, p+2, p+3
...
Component g/8: Group p−4, p−3, p−2, p−1, g−4, g−3, g−2, g−1
This process can be continued until we get a lower bound for the link
connectivity of a POPS network.

Theorem 3.6 In POPS[n,d], even if each node has up to 75 percent of its
input links working and up to 75 percent of its output links working, the
whole network may still be disconnected.

More specifically, if each node has up to s input links and s output links
working, and s ≤ (3/4)g, there may be no less than two connected components
in the system.
The total number of new input links (or new output links) added for each
node during the process is

1 + 1 + 2 + 4 + ... + g/8 = g/4.

The total number of input links (or output links) per node after this process is

g/4 + g/2 = (3/4)g.

3.4 Detection Approaches


Theorem 3.6 tells us that POPS networks are not reliable with respect to
link errors or transceiver errors. Even if 75 percent of the links in the
system are working, we can't guarantee that the network is connected. So
some efficient methods are needed to protect the network as a whole system
against partition.

Approach 3.7 Keep each group connected locally, with each node having no
less than g/2 + 1 input links working and no less than g/2 + 1 output links
working. Maintain an output link to Ci,j from Group j if and only if there
exists a working input link from Ci,j to Group i.

This approach guarantees that the network is connected. For any two nodes A
and B, with A in Group i and B in Group j (i and j may be equal), let (i, p_t)
be the g/2 + 1 inputs to A, 0 ≤ t ≤ g/2, and let (q_t, j) be the g/2 + 1
outputs from B.
By the pigeonhole principle, there exist a and b such that

p_a = q_b,  0 ≤ a ≤ g/2 and 0 ≤ b ≤ g/2.

A has an input link from Ci,pa. By the approach, there must be a working
output link from a node in Group pa to Ci,pa. B has an output link to Cqb,j.
By the approach, there must be a working input link from Cqb,j to a node in
Group qb = pa. Since all the nodes in Group pa are connected, we find a path
from B to A through Group pa.
Similarly, there is a path from A to B through a certain group. Therefore,
the whole network is connected.
This approach is efficient with respect to each intermediate optical passive
star coupler. By checking and fixing the links to each intermediate star
coupler so that this condition is satisfied, the whole system stays connected.

Approach 3.8 (1) For each node in Group i, check to make sure its input
link and output link to Ci,i always work.
(2) For each Group i, 0 ≤ i ≤ g − 1, check to make sure there is at least one
output link to Ci+1,i and one input link from Ci,i−1 (indices mod g).

This approach guarantees that the POPS network is connected.

Each group is locally connected, since each node can send to and receive
messages from Ci,i by (1).
The links in (2) form a ring among all the groups, so any two groups
can talk to each other via this ring.
This approach requires special treatment for some links. The advantage
is that the total number of links to be checked in order to guarantee the
connectivity of the whole network is greatly reduced.
The total number of links checked is 2n + 2g, while the total number of links
in the network is 2ng. So we only check a (1/g + 1/n) fraction of the links
to guarantee that there is no network partition.
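A hedged sketch of the check behind Approach 3.8 is given below; the link-status maps out_ok and in_ok are hypothetical inputs (per node, keyed by coupler), not an interface from the text.

```python
# Sketch: the 2n + 2g link checks of Approach 3.8. out_ok[v] / in_ok[v] map
# a coupler (i,j) to True when node v's output/input link to C(i,j) works.

def approach_38_holds(g, d, out_ok, in_ok):
    n = g * d
    # (1) every node keeps its links to the "diagonal" coupler C(grp,grp)
    for v in range(n):
        grp = v // d
        if not (out_ok[v].get((grp, grp)) and in_ok[v].get((grp, grp))):
            return False
    # (2) each group keeps one output link to C(i+1,i) and one input link
    #     from C(i,i-1), closing a ring over the groups
    for i in range(g):
        members = range(i * d, (i + 1) * d)
        if not any(out_ok[v].get(((i + 1) % g, i)) for v in members):
            return False
        if not any(in_ok[v].get((i, (i - 1) % g)) for v in members):
            return False
    return True
```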

3.5 Summary
We consider reliability in the all-optical Partitioned Optical Passive Stars
(POPS) topology. The reliability and fault tolerance of POPS rely on its
connectivity. In this study, we analyze the worst cases for network partitions
due to either node failures or link errors. For node connectivity, we show
that the whole system remains connected no matter how many nodes
fail. For link connectivity, we analyze some worst cases due to
link errors and demonstrate a lower bound for the connectivity. Some
efficient approaches are proposed to detect and maintain the connectivity
of the whole system.

4 Limit Tuning Range For Tunable Transceivers


Wavelength Division Multiplexing (WDM) has been widely used for studying
the performance of optical networks, especially those employing optical
passive star couplers. Many models have been proposed for WDM on an
optical passive star coupler, such as each station being equipped with a
single tunable transmitter and a single fixed wavelength receiver, or each
station having multiple tunable transmitters and multiple tunable receivers.
Current technology only allows the transceivers to be tunable in a small
range, a fact ignored in previous studies. In this section, we focus on the
design of WDM optical passive star networks with tunable transmitters of
limited tuning range and fixed wavelength receivers. The limited tuning range
affects the maximum delay, the total number of wavelengths which can
be used, and the topology embedding. Complete graphs, rings, meshes
and hypercubes are the four topologies studied in this section. The
relationship between the total number of wavelengths which can be utilized
and the embedded topology is established. Bounds for the maximum delay
are analyzed. Optimal embedding algorithms are given for the systems
embedded with one of the four topologies.

4.1 Introduction
The transceivers can be either tunable or fixed at a channel. In previous
studies, some combinations of transmitters and receivers have been
considered. When the transceivers are tunable, the schedule of wavelengths for
all transceivers becomes very important for achieving high performance of
the system. Lee et al. [40, 41] studied the transmission scheduling for the
systems embedded with hypercubes and complete graphs, and Cao and Borchers
[11] gave optimal schedules for the systems embedded with de Bruijn
graphs.
One issue missing in previous research is the limited tuning range of
transceivers. Each channel requires 1-2 nm of bandwidth, and current
technology can only support tuning over 3-7 nm for large bandwidth devices.
This means the reasonable tuning range can only be 3 to 7 wavelengths with
current technology. Even in the near future, the tuning range is not likely to
increase significantly because of technical difficulties. In this section, we
study the effects of the limited tuning range of transceivers on the optical
passive star network with an embedded topology.
We assume that each station has one tunable transmitter and one fixed
wavelength receiver. The range of the tunable transmitter is limited to k
channels (k is a small number). There are two reasons for picking a fixed
wavelength receiver over a tunable one. First, the cost of the system can
be kept lower: tunable devices (especially receivers) are considerably more
expensive than fixed wavelength devices. Second, with a fixed wavelength
receiver, a transmitter wishing to send a message only needs to tune its
wavelength to the receiver's fixed wavelength. No pre-transmission
coordination is required between the two to determine which wavelength to
employ for communication, as would be the case if a tunable receiver were
used. We assume the receiver at each station occupies a wavelength within
the range of its tunable transmitter. This permits easy design of transceivers
and wavelength switching in the optical domain. Other requirements will
become clearer when we discuss the details of the network.


Consider other models for the stations. In a system where each station
has one fixed wavelength transmitter and one fixed wavelength receiver,
the model reduces to time division multiplexing, since the whole system can
only use one wavelength. In a system where each station has one tunable
transmitter and several fixed wavelength receivers, the wavelengths of those
receivers should be in the tuning range of its transmitter; otherwise,
signals could not be sent to some of the station's own receivers. The system
cost increases with the number of fixed-wavelength receivers and the
synchronization devices among them, and transmission schedules need to be
considered even within one station. In a system where each station has
several tunable transmitters and one fixed wavelength receiver, let A be a
station with channel a occupied by its receiver. Then channel a must be
within the range of all its transmitters. Since the tuning range is limited
to k channels, the union of the tuning ranges of all of A's transmitters is
at most the range [a−k+1, a+k−1]. This is akin to a system where each station
has a fixed wavelength receiver and a tunable transmitter with range 2k−1
instead of k. For a system with multiple tunable transmitters and multiple
fixed wavelength receivers at each station, we can decompose the system into
several subsystems, with each subsystem having only one tunable transmitter
and one fixed receiver. Thus our assumption is reasonable and provides the
basis for other types of systems as well.
Topology embedding is very important for communication and high performance
parallel computing. With a limited tuning range for each transmitter, one
station can only communicate in one hop with those stations whose receivers
are in its tuning range. So the topology embedding determines the nature of
communication between any two stations. We can determine a path between any
two stations and the maximum network delay only when the topology embedding
is given. For scientific computing, some special topologies may need to be
embedded into the system to more accurately reflect the computing model used.
We assume that there are a total of p wavelengths available for the whole
system, that the tuning range is no more than k channels, and that each
station has one tunable transmitter and one fixed receiver. Let A be a
station in the system whose transmitter is tunable in the range [a, a+k−1].
The wavelength of A's receiver must be in [a, a+k−1]. Only those nodes with
the wavelengths of their receivers in [a, a+k−1] can receive a message from
A directly, i.e., it takes one hop to reach those nodes. For other stations,
it takes multi-hop transmission to receive the message.
This section is organized as follows. In subsection 4.2, we consider the
optical passive star network embedded with the complete graph, i.e., each
station is connected to all stations in the system. Since it is a multi-hop
system, the bounds for the maximum delay in the virtual topology embedding
without congestion are studied, and embedding algorithms designed to
be optimal in terms of the maximum delay are proposed. The transmission
schedule is discussed for common parallel communications. Subsection 4.3
treats rings. In subsection 4.4, meshes, an important topology for computing,
are considered as the embedded virtual topology. The relationship between the
structure of the mesh and the maximum number of wavelengths which can be used
is analyzed. The algorithm for the topology embedding is designed to maximize
the use of the available channels. In subsection 4.5, hypercubes, another
important topology for parallel architectures, are studied. The upper bound
for the maximum number of wavelengths used is shown. Dynamic programming
algorithms are designed for the topology embedding. In subsection 4.6,
we conclude this section and discuss the possibility of extending our work
to other models and other topology embeddings.

4.2 Complete Graph


There is no restriction on the total number of channels which can be used in
a system embedded with a complete graph. Intuitively, the performance of
the system improves with more available channels when there is no
limit on the tuning range. With a limited tuning range on each transmitter,
one big effect is that the maximum delay for the whole system becomes
a performance bottleneck even if more channels are used. The maximum
delay is an important parameter for measuring the performance of network
communication and parallel computing. If it is too big, the performance of
the whole system will be poor, since messages are routed through
the maximum delay path and bandwidth is occupied by such
communications. We first estimate the relation between the total number of
used channels and the maximum delay, to help us understand the effect of a
bounded tuning range for the transmitters.

Theorem 4.1 In a WDM optical passive star network embedded with the
complete graph, suppose p channels are used. The maximum delay is no less
than ⌊(p−1)/(k−1)⌋ + 1 hops if k < p, and is one hop if k = p.

Proof: It is obvious that it takes one hop for the communication between
any two stations if k = p. If k < p, suppose that W = {w_0, w_1, ..., w_{p−1}}
is the set of p channels used. Without loss of generality, assume that
w_{i+1} = w_i + 1 for 0 ≤ i ≤ p − 2.

k               3              4              p^{1/3}       p^{1/2}       p
maximum delay   ⌊(p−1)/2⌋+1    ⌊(p−1)/3⌋+1    O(p^{2/3})    O(p^{1/2})    1

Table 4: Lower bounds for the maximum delay

Case 1: (k−1) | (p−1). Write t = (p−1)/(k−1). Consider the communication
between a station A whose receiver occupies w_0 and a station B whose
receiver occupies w_{p−1}. Suppose that the maximum delay is no more than t
hops. Then there exist t − 1 stations C_i, 1 ≤ i ≤ t − 1, such that C_i's
receiver occupies w_{i(k−1)} and C_i's transmitter is tunable in
[i(k−1), (i+1)(k−1)]. The communication from A to B passes through all the
C_i. Now consider the communication from C_{t−1} to A. Since the tuning range
of C_{t−1} is [p−k, p−1], C_{t−1} has to send its message to a station D whose
receiver occupies a channel in [p−k, p−1]. Then D forwards C_{t−1}'s message
to A.
Figure 16: The path from A to B

If there exists i such that C_i is the only station whose receiver occupies
w_{i(k−1)}, it will take at least t hops from D to A; thus the maximum delay
is no less than t + 1. Otherwise, the maximum delay is no less than t.
Case 2: k−1 is not a factor of p−1. Consider the communication between a
station A whose receiver occupies w_0 and a station B whose receiver occupies
w_{p−1}. It is easy to see that it needs at least ⌊(p−1)/(k−1)⌋ + 1 hops to
send a message from A to B.
Table 4 demonstrates the lower bound of the maximum delay from the above
theorem for several values of k.
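The bound itself is a one-line function; the snippet below (ours, illustrative only) reproduces entries of Table 4.

```python
# The lower bound of Theorem 4.1 on the maximum delay.
def delay_lower_bound(p, k):
    return 1 if k == p else (p - 1) // (k - 1) + 1

assert delay_lower_bound(p=7, k=3) == 4   # floor(6/2) + 1, cf. Table 4
assert delay_lower_bound(p=7, k=7) == 1   # k = p: one hop suffices
```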
In the next step, we design an algorithm, Complete-Graph-Embedding,
for the topology embedding. It achieves the optimal maximum delay, which
means it is best possible in such a multihop system. Without loss of
generality, we can assume that N ≥ p. If N ≥ 2p, it is possible for each
channel to be occupied by two fixed wavelength receivers. So we consider
N ≥ 2p and N < 2p separately.

Algorithm Complete-Graph-Embedding for N ≥ 2p

1. Receivers' wavelength assignment
Divide the N stations into p disjoint sets N_i with |N_i| ≥ 2 (0 ≤ i ≤ p−1).
Assign wavelength w_i to the receivers of the stations in N_i. Divide N_i
into two disjoint sets L[i,1] and L[i,2] such that ||L[i,1]| − |L[i,2]|| ≤ 1.

2. Transmitters' wavelength assignment
For each station in L[i,1], assign the tuning range of its transmitter to
be [i−k+1, i] for i ≥ k−1 and [0, k−1] for i < k−1. For each station
in L[i,2], assign the tuning range of its transmitter to be [i, i+k−1]
for i ≤ p−k and [p−k, p−1] for i > p−k.

Algorithm Complete-Graph-Embedding for N < 2p

1. Receivers' wavelength assignment
Divide the N stations into p disjoint sets N_i with |N_i| ≥ 1 (0 ≤ i ≤ p−1).
Assign wavelength w_i to the receivers of the stations in N_i.

2. Transmitters' wavelength assignment
If i − ⌊i/(k−1)⌋·(k−1) is even, assign the tuning range of the transmitters
in N_i to be [i, i+k−1] for p − ⌊i/(k−1)⌋·(k−1) > k−1, and [p−k, p−1] for
p − ⌊i/(k−1)⌋·(k−1) ≤ k−1.
If i − ⌊i/(k−1)⌋·(k−1) is odd, assign the tuning range of the transmitters
in N_i to be [i−k+1, i] for i ≥ k−1, and [0, k−1] for 0 ≤ i < k−1.
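For concreteness, here is a sketch of the N ≥ 2p case (our transcription of the steps above; the round-robin grouping into the N_i is one admissible choice, since the algorithm only requires |N_i| ≥ 2).

```python
# Sketch of Complete-Graph-Embedding for N >= 2p: returns, per station,
# its receiver wavelength i and the tuning range [lo, lo+k-1] of its
# transmitter, clamped to the available channels [0, p-1].

def complete_graph_embedding(N, p, k):
    assert N >= 2 * p
    plan = {}
    for i in range(p):
        Ni = list(range(i, N, p))          # round-robin grouping, |Ni| >= 2
        half = (len(Ni) + 1) // 2
        for v in Ni[:half]:                # L[i,1]: transmitter looks "down"
            lo = i - k + 1 if i >= k - 1 else 0
            plan[v] = (i, (lo, lo + k - 1))
        for v in Ni[half:]:                # L[i,2]: transmitter looks "up"
            lo = i if i <= p - k else p - k
            plan[v] = (i, (lo, lo + k - 1))
    return plan

plan = complete_graph_embedding(N=22, p=7, k=3)   # parameters of Figure 17
```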

In the stage of receivers' wavelength assignment, the grouping of the N_i
can be quite arbitrary; the only constraint is that the size of each
set should be no less than two. In the stage of transmitters' wavelength
assignment, the wavelengths of the transmitters are uniformly distributed.
An observation about the system is the following:

Theorem 4.2 The maximum delay in the system generated by Complete-
Graph-Embedding is no more than ⌊(p−1)/(k−1)⌋ + 1. This shows it has the
optimal maximum delay.

Proof: We first consider N ≥ 2p.

Since |N_i| ≥ 2 for 0 ≤ i ≤ p−1, there is at least one station in each of
L[i,1] and L[i,2]. Let A and B be two stations in the system with
their receivers occupying w_i and w_j respectively, i ≤ j. Consider the
communication between A and B.
If A ∈ L[i,1], A can send its message to another station C which is in
L[i,2] by tuning its transmitter to w_i. C sends A's message toward B by
passing through stations in L[l,2] for l = i + t(k−1), 1 ≤ t ≤ ⌊(j−i)/(k−1)⌋.
So the communication from A to B takes no more than ⌊(p−1)/(k−1)⌋ + 1 hops.
Similarly, we can route the message from B to A with no more than
⌊(p−1)/(k−1)⌋ + 1 hops.
For N < 2p, we can route the message between A and B as in the proof of
Theorem 4.1. By Theorem 4.1, we know that Complete-Graph-Embedding
provides the optimal maximum delay.

Figure 17: Wavelength assignment for the system with N=22, p=7 and k=3 (X: the wavelength for receivers, Y: the wavelengths for tunable transmitters).

By Complete-Graph-Embedding, the whole system has the optimal maximum
delay. For example, consider the system with N=22, p=7 and k=3. The
assignment generated by Algorithm Complete-Graph-Embedding is:
N_i = {3i, 3i+1, 3i+2}, L[i,1] = {3i}, and L[i,2] = {3i+1, 3i+2} for
0 ≤ i ≤ 5; N_6 = {18, 19, 20, 21}, L[6,1] = {18, 19} and L[6,2] = {20, 21}.
Another example is for N=12, p=7 and k=3, where each N_i is nonempty for
0 ≤ i ≤ 6.
The optimal maximum delay can be achieved by our algorithm for the
channel assignment.

Figure 18: Wavelength assignment for the system with N=12, p=7 and k=3 (X: the wavelength for receivers, Y: the wavelengths for tunable transmitters).

One important issue for a system embedded with the complete graph is the
heavy load on some channels under uniform communication among the stations.
Those channels can become the bottlenecks of the whole system under heavy or
bursty loads. But this phenomenon is one of the key characteristics of a
system with limited tuning range for each transmitter embedded with the
complete graph. No matter how the channels are assigned, the bottlenecks are
always there. The main reason is that the tuning range is bounded by k, and
the communications between the stations with low numbered channels and those
with high numbered channels have to traverse the system along the
intermediate channels.

Theorem 4.3 In a system with limited tuning range for each transmitter
embedded with the complete graph, there always exist k consecutive chan-
nels which are the bottleneck for communication if the communication is
uniformly distributed.

Proof: Suppose that W = {w_0, w_1, ..., w_{p−1}} is the set of used channels
with w_{i+1} = w_i + 1. Let [w_i, w_{i+k−1}] be a range which divides the
used channels into three parts: X is the set of stations whose receivers
occupy a channel in [w_0, w_{i−1}], Y the set of stations whose receivers
occupy a channel in [w_i, w_{i+k−1}], and Z the set of stations whose
receivers occupy a channel in [w_{i+k}, w_{p−1}], with ||X| − |Z|| as small
as possible. Define x = |X|, y = |Y|, z = |Z|, and N = x + y + z.
Since the communication is uniformly distributed, we consider the receivers'
workload in [w_i, w_{i+k−1}] for all-to-all communication in such an
environment.

Figure 19: The partition of the stations

workload = x(y+z) + z(x+y) + y(y−1)
         = y(x+z) + 2xz + y(y−1)
         = y(N−y) + 2xz + y(y−1)
         = O(y(N−y) − y + 2((N−y)/2)^2)
         = O((1/2)(y^2 − 2y + N^2))
         = O((1/2)(y−1)^2 + (1/2)N^2 − 1/2)
         = O(N^2)

The workload on [w_i, w_{i+k−1}] is on the order of N^2. It will take
O(N^2/k) time to transmit this workload. We have specified that k is much
smaller than p, which means that the wavelength range from w_i to w_{i+k−1}
will be quite congested and overloaded. No matter how the complete graph is
embedded, such a tuning range can always be found by the above construction.
Therefore there always exist k consecutive channels which are the bottleneck
of communication if the communication is uniformly distributed.
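The workload count in the proof can be checked numerically; the snippet below (illustrative only) sweeps all partitions x + y + z = N for a small N.

```python
# Numeric check of the workload bound used in the proof of Theorem 4.3.
def workload(x, y, z):
    return x * (y + z) + z * (x + y) + y * (y - 1)

N = 100
worst = max(workload(x, y, N - x - y)
            for x in range(N + 1) for y in range(N + 1 - x))
assert worst <= N * N                     # the workload is O(N^2)
```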

We now discuss how to implement one-to-all broadcasting for parallel
computing by using Algorithm Complete-Graph-Embedding for N ≥ 2p.
One-to-all broadcasting is one of the most important operations in parallel
computing. Let A be a station in N_i with A ∈ L[i,1]. A sends a message to
all the stations whose receivers occupy wavelengths less than w_i. At the
same time, A sends the message to a station B in L[i,2]. B then sends the
message to all the stations whose receivers occupy wavelengths greater than
w_i.
This shows that the algorithm is flexible and useful for future Parallel
Virtual Machines (PVM) on such a system embedded with the complete graph.

4.3 Rings
Rings are among the simplest network topologies. Rings are widely used
in local area computer networks for their simple structure and ease of
maintenance [1, 44]. For example, token rings and FDDI are based on rings
for communication, operation, and maintenance.
In this subsection, we study how to embed rings in an optical passive star
coupler and how to maximize the number of wavelengths for such a system.
This will give us some hints for the following subsections about meshes and
hypercubes.
node
 0---0---0--- ... ---0---0---0
 0   1   2       n-3  n-2  n-1

Figure 20: An example of L[n]

Before exploring the embedding of a ring into an optical passive star, we
demonstrate how to embed a line segment.
A line L[n] has n nodes, {0, 1, ..., n−1}, and node i is connected to i−1
and i+1 (1 ≤ i ≤ n−2). An example is shown in Figure 20.

Lemma 4.4 If n > 2 is even, the total number of wavelengths which can be
used for L[n] is no more than (n/2)(k−1) + max(k−2, 0) + 1.
If n > 2 is odd, the total number of wavelengths which can be used for
L[n] is no more than ((n+1)/2)(k−1) + 1.

We show how to prove the above lemma for n = 4 and n = 5 in
Figure 21; for n > 5 the proof is similar. If n = 4, we consider the
receiver's wavelength of each node. Let r(i) denote the receiver's wavelength
of node i. Assume r(0) = a. We must have |r(2) − r(0)| ≤ k − 1, since nodes 2
and 0 are both neighbors of node 1, and the transmitter of node 1 should
cover both r(0) and r(2). Without loss of generality, assume r(2) = a + k − 1.
Similarly, |r(3) − r(1)| ≤ k − 1. The total number of wavelengths for the
transmitters is no more than

[r(2) + (k−1)] − [r(1) − (k−1)] = 2(k−1) + [r(2) − r(1)] ≤ (n/2)(k−1) + max(k−2, 0).

If n = 5, we consider the receiver's wavelength of each node. Let r(i)
denote the receiver's wavelength of node i, and assume r(0) = a.

Figure 21: Proof for L[4] and L[5]

We must have |r(2) − r(0)| ≤ k − 1, since nodes 2 and 0 are both neighbors
of node 1, and the transmitter of node 1 should cover both r(0) and r(2).
Without loss of generality, assume r(2) = a + k − 1. Similarly,
|r(4) − r(2)| ≤ k − 1; let r(4) = a + 2(k−1). The total number of wavelengths
for the transmitters is no more than

[r(3) + (k−1)] − [r(1) − (k−1)] = 2(k−1) + [r(3) − r(1)] ≤ ((n+1)/2)(k−1).

Now we discuss the total range of the receivers' wavelengths of all nodes in
the system for a ring with n nodes. Figure 22 shows the cases n = 5, 6, 7,
and 8. They represent the four cases for assigning the maximum number of
wavelengths to an optical passive star network embedded with a ring of the
corresponding number of nodes.

Lemma 4.5 If 4m ≤ n ≤ 4m+2 with m > 0, the total number of receivers'
wavelengths of all nodes is no more than m(k−1) + 1.
If n = 4m+3 with m > 0, the total number of receivers' wavelengths of
all nodes is no more than (m+1)(k−1) + 1.

The receivers' wavelength assignments are displayed in Figure 22. In
each case, we only consider the odd nodes. Because the increase or
decrease of wavelengths from node i to node i+2 is k−1, the receiver's
wavelength of node i+1 must fall in the interval spanned by the wavelengths
of nodes i and i+2. That means the range of the receivers' wavelengths of
the odd nodes is enough for our study.

Figure 22: Four cases for rings

4.4 Meshes
A mesh M[n,m] is a set of nodes V(M[n,m]) = {(x, y) | 0 ≤ x ≤ n−1,
0 ≤ y ≤ m−1}, where two nodes (x_1, y_1) and (x_2, y_2) are connected by an
edge iff |x_1 − x_2| + |y_1 − y_2| = 1. It is easy to see that the diameter
of M[n,m] is n + m − 2. So we have

Lemma 4.6 The maximum delay in a system embedded with M[n,m] is
n + m − 2.
One effect of the bounded range of the tunable transmitters is on the total
number of channels which can be used in the whole system. Because of
the presence of tunable transmitters of limited range, the total number of
channels is restricted to a certain range. Thus the channel assignment
becomes more difficult than in the earlier setting where tunable transmitters
of unlimited range were considered; with an unlimited tuning range, a station
can tune its transmitter to any channel. The following theorem gives the
upper bound for the number of channels which can be used in the system.
Theorem 4.7 The total number of wavelengths which can be used in a system
with the embedded mesh M[n,m] is no more than (2k−2)·⌈(n+m)/4⌉ + k.

Figure 23: An example of meshes: M[6,5]

Proof: Write q = ⌈(n+m)/4⌉. Let A = (⌊n/2⌋, ⌊m/2⌋) = (a, b) be the center
point of M[n,m]. Define L_i = {(x, y) : |x−a| + |y−b| = 2i}, 1 ≤ i ≤ q.
Let w be the channel occupied by A's receiver.
For any station B in L_1, there exists a station C such that A and B
are both neighbors of C. That means the set of channels occupied by the
receivers in L_1 must be of the form [s_1, t_1] with w ∈ [s_1, t_1] and
|t_1 − s_1| ≤ 2k − 1.
Similarly, we can determine the set of channels occupied by the receivers
in L_2. It must be of the form [s_2, t_2] with [s_1, t_1] ⊆ [s_2, t_2] and
t_2 − s_2 ≤ 4k − 3.
Continuing, the set of channels occupied by the receivers in L_q must be of
the form [s_q, t_q] with [s_{q−1}, t_{q−1}] ⊆ [s_q, t_q] and

t_q − s_q ≤ 1 + (2k−2)·q.

Let A' = (⌊n/2⌋, ⌊m/2⌋ + 1) = (a', b'). Define L'_i = {(x, y) :
|x−a'| + |y−b'| = 2i}, 1 ≤ i ≤ q. Let w' be the channel occupied by A''s
receiver. Since A' is adjacent to A, |w' − w| ≤ k − 1. The same argument
applies to the L'_i. Because of the relation between w' and w, the
wavelengths used by the L'_i differ from those of the L_i by at most k−1
additional channels. Note that the union of the L_i and the L'_i is the
vertex set of M[n,m].
So the total number of wavelengths is no more than

1 + (2k−2)·q + (k−1) = (2k−2)·⌈(n+m)/4⌉ + k.

Figure 24: A, B, L_i and L'_i


k             maximum channel
O(1)          O(n+m)
O(n^{1/2})    O(n^{1/2}(n+m))
O(m^{1/2})    O(m^{1/2}(n+m))
O(n)          O(n(n+m))
O(m)          O(m(n+m))

Table 5: Upper bound for wavelengths on meshes

Table 5 demonstrates the upper bound on the number of channels from the
above theorem.

The channel assignment algorithm is important for this system, since the
system tries to use as many channels as possible while any two adjacent
stations can still communicate with each other directly.
We design an algorithm for channel allocation for general k, where p is the
number of channels available for the system. This algorithm attains the
upper bound for the maximum number of channels used in the system.
Define a (rmod p) = a if 0 ≤ a ≤ p−1; a − ⌊a/p⌋·p if ⌊a/p⌋ is even; and
p − (a − ⌊a/p⌋·p) if ⌊a/p⌋ is odd.
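The (rmod p) operation is a "zigzag" wrap that sweeps up through the channels and folds back instead of jumping to 0. A direct transcription of the definition (ours, keeping the definition's fold value at the boundary) reads:

```python
# The (rmod p) wrap defined above: forward on even multiples of p,
# reflected on odd multiples.
def rmod(a, p):
    q, r = divmod(a, p)
    return r if q % 2 == 0 else p - r

# channels sweep 0..p-1, fold back, and sweep forward again
assert [rmod(a, 5) for a in range(12)] == [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1]
```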

Algorithm Mesh-Embedding

1. Initialization: For each station in the system, pick its coordinates
in the mesh. Divide the mesh into two parts: X = {(i,j) ∈ M[n,m] :
i+j is even} and Y = {(i,j) ∈ M[n,m] : i+j is odd}. Divide X into
disjoint subsets X_i = {(a,b) ∈ X : a+b = i}, i even and 0 ≤ i ≤ n+m.
Divide Y into disjoint subsets Y_i = {(a,b) ∈ Y : a+b = i}, i odd and
0 ≤ i ≤ n+m.

2. Channel Assignment: For each station in X_i with i/2 even, assign the
wavelengths in the range [(i/2 − 1)(k−1) (rmod p), (i/2 − 1)(k−1) +
⌊k/2⌋ (rmod p)] to its receiver, and [(i/2 − 1)(k−1) − ⌊(k−1)/2⌋ (rmod p),
(i/2 − 1)(k−1) + ⌊(k−1)/2⌋ (rmod p)] as the tuning range of its transmitter.
For each station in X_i with i/2 odd, assign the wavelengths in the range
[(i/2 − 1)(k−1) + ⌊(k−1)/2⌋ + 1 (rmod p), (i/2)(k−1) − 1 (rmod p)] to its
receiver, and [(i/2 − 1)(k−1) (rmod p), (i/2)(k−1) − 1 (rmod p)] as the
tuning range of its transmitter.
For each station in Y_i, assign the wavelengths to its transmitter and
receiver in the same manner as described above for the stations in X_{i−1}.

Note that the maximum number of channels which can be used in the system
embedded with M[n,m] by Mesh-Embedding is ⌈(n+m)/4⌉·(k−1), which is of the
order O(k(n+m)). Therefore this algorithm provides an optimal way to use as
many wavelengths as possible.
Figures 25 and 26 show how to assign wavelengths to a system with k=3 and
k=4 in M[7,6]; these values are the ones most employed in current research.

4.5 Hypercubes
Hypercubes are widely used in parallel computing. The same questions
raised in the previous subsection also need to be answered for systems
embedded with hypercubes. A hypercube H[2,n] is a set of nodes V =
{X = x_0 x_1 ... x_{n−1} : x_i = 0 or 1}, where two nodes X = x_0 x_1 ... x_{n−1}
and Y = y_0 y_1 ... y_{n−1} are connected by an edge iff Σ_{i=0}^{n−1} |x_i − y_i| = 1.
It is easy to see that the diameter of H[2,n] is n. So we have the following:

Lemma 4.8 The maximum delay in a system embedded with H[2,n] is n.



Figure 25: An example of Mesh-Embedding for M[7,6] with k=3

Figure 26: An example of Mesh-Embedding for M[7,6] with k=4



k                 O(1)    O(n^{1/2})    O(n)      O(n^2)    O(2^n)
maximum channel   O(n)    O(n^{3/2})    O(n^2)    O(n^3)    O(2^n)

Table 6: Upper bound for wavelengths on hypercubes

We want to determine the maximum number of wavelengths which can
be used in a system with tunable transmitters of limited tuning range
and an embedded hypercube topology.

Theorem 4.9 The total number of wavelengths which can be used in a system
with the embedded hypercube H[2,n] is no more than 1 + n(k−1).

Proof: Let X = x_0 x_1 ... x_{n−1} with x_i = 0 for i = 0, 1, ..., n−1, and
let w be the wavelength of X's receiver. X has n adjacent stations Y_i;
denote U_1 = {Y_i : 0 ≤ i ≤ n−1}. The union of the channels of the Y_i's
receivers is of the form [a_1, b_1] with w ∈ [a_1, b_1] and b_1 − a_1 ≤ k − 1.
Let U_2 denote the union of all the stations adjacent to any node in U_1.
Since X is also in U_2, the union of the channels of the receivers of the
stations in U_2 is of the form [a_2, b_2] with [a_1, b_1] ⊆ [a_2, b_2] and
b_2 − a_2 ≤ 2k − 1.
Similarly, we can continue this process. The final set is U_n. The union
of the channels occupied by the stations in U_n is of the form [a_n, b_n]
with [a_{n−1}, b_{n−1}] ⊆ [a_n, b_n]; i.e., [a_n, b_n] contains all the
channels which can possibly be used in the system. Therefore

b_n − a_n ≤ 1 + (k−1) + (k−1) + ... + (k−1) = 1 + n(k−1)

Table 6 lists the maximum number of channels for the system embedded with
hypercubes, for different values of k.

We propose two dynamic programming algorithms to allocate channels
for the system with an embedded hypercube.

Algorithm Hypercube-Embedding-One:

Figure 27: Algorithm Hypercube-Embedding-One for (3,2,16)

1. Initialization: We have a system embedded with H[2,n], p available
channels, and a tuning range of k channels for each transmitter, denoted
by (p, k, 2^n).
2. Top-Down: We only need to find the wavelength assignment for (p, k,
2^{n−1}). After that, the corresponding nodes in the two copies of H[2,n−1]
with the same wavelength assignment are connected by an edge.
3. Basis: This procedure continues until we find some c such that a
wavelength assignment for (p, k, 2^c) is already known or is easily
determined.

We show an example with p=3, k=2 and n=4.
Algorithm Hypercube-Embedding-Two:
1. Initialization: We have a system embedded with H[2,n], p available
channels, and a tuning range of k channels for each transmitter, denoted
by (p, k, 2^n).
2. Top-Down: We only need to find the wavelength assignment for (p/2, k/2,
2^{n−1}). After that, the wavelength of each receiver is doubled in one copy
of H[2,n−1], and in the other copy each receiver gets one more than its
doubled old wavelength. Connect the corresponding nodes in the two copies of
H[2,n−1] by an edge.
3. Basis: This procedure continues until we find some c such that a
wavelength assignment for (p/2^{n−c}, k/2^{n−c}, 2^c) is already known or is
easily determined.
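A recursive sketch of Hypercube-Embedding-Two follows (our transcription; the integer halving of k and the base-case interface are assumptions, since the basis step simply requires any known assignment for the small instance).

```python
# Sketch of Hypercube-Embedding-Two: solve (p/2, ~k/2, 2^(n-1)) recursively,
# double every wavelength in one subcube copy, and use double-plus-one in
# the other; base_assignment is a known solution for (., ., 2^base_n).

def hypercube_embedding_two(p, k, n, base_assignment, base_n):
    """Return a list of 2**n receiver wavelengths, indexed by node label;
    the leading address bit selects the subcube copy."""
    if n == base_n:
        return list(base_assignment)
    sub = hypercube_embedding_two(p // 2, (k + 1) // 2, n - 1,
                                  base_assignment, base_n)
    return [2 * w for w in sub] + [2 * w + 1 for w in sub]
```

For the example of Figure 28 one would call hypercube_embedding_two(12, 9, 4, base, 2) with a known assignment base for (3, 3, 4).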

We show an example with p=12, k=9, and n=4.

Figure 28: Algorithm Hypercube-Embedding-Two for (12,9,16), built from (3,3,4) and (6,5,8)

There exists a third dynamic programming algorithm which combines the above
two into a new algorithm for the hypercube embedding.

4.6 Conclusion
We have specified the problems introduced by the limited tuning range of
transmitters for topology embedding, which was ignored in previous studies.
After formulating the problems, we studied the optical passive star network
embedded with the complete graph, meshes and hypercubes. These three
topologies are among the most widely used structures in both network
communication and parallel computing. For the system embedded with the
complete graph, bounds for the maximum delay were studied and an embedding
algorithm was designed which is optimal in terms of the maximum delay.
The bottleneck phenomenon was analyzed in such an environment. One-to-all
broadcasting can be implemented for common parallel communications by using
our embedding algorithm. For both meshes and hypercubes, the relation
between the structure of the topologies and the maximum number of wavelengths
which can be used was analyzed. The algorithms for the topology embedding
are designed to show how to maximize the use of available channels. Examples
are given to show the efficiency of our algorithms for the topology
embedding.
The problems we proposed also need to be answered for systems embedded with
other topologies. The methods we used for these three structures can also be
helpful in designing new embedding algorithms and analyzing the relations
among different system parameters. Our future work will be based on other
topologies, and on the impact of embedding multiple topologies in the same
system.

References
[1] ANSI/IEEE Standard 802.5-1985, Token Ring Access Method and
Physical Layer Specifications, 1985

[2] Anthony S. Acampora, An Introduction to Broadband Networks,


Plenum Press, New York, 1994.

[3] Nima Ahmadvand and Terence Todd, Dual-Hop LANs Using Station
Wavelength Routing, IEEE ICCCN'95, September, 1995.

[4] J. D. Attaway and J. Tan, HyperFast: Hypercube Time Slot Allocation


in a TWDM Network, preprint, 1994.

[5] Subrata Banerjee and Biswanath Mukherjee, Fairnet: A WDM-based


Multiple Channel Lightwave Network with Adaptive and Fair Schedul-
ing Policy, Journal of Lightwave Technology, pp. 1104-1112, 5/6, 1993.

[6] J. A. Bannister and L. Fratta and M. Gerla, Topological design of the


wavelength-division optical network, IEEE Infocom'90, pp. 1005-1013,
1990.

[7] S. Bhattacharya, D. H. Du, and A. Pavan, A Network Architecture for
Distributed High Performance Heterogeneous Computing, Proceedings
of the Heterogeneous Computing Workshop, pp. 110-115, 1994.

[8] G. Birkhoff, Tres observaciones sobre el algebra lineal, Univ. Nac.
Tucumán, Rev., Ser. A, 5, pp. 147-151, 1946.

[9] Michael S. Borella and Biswanath Mukherjee, Efficient Scheduling of
Nonuniform Packet Traffic in a WDM/TDM Local Lightwave Network
with Arbitrary Transceiver Tuning Latencies, IEEE Infocom '95, pp.
129-137, 1995.

[10] C. A. Brackett, Dense Wavelength Division Multiplexing Networks:


Principles and Applications, IEEE Journal on Selected Areas in Com-
munications, vol. 8, no. 6, pp. 948-964, 1990.

[11] F. Cao and A. Borchers, Optimal Transmission Schedules for Embeddings
of the De Bruijn Graphs in an Optical Passive Star Network,
Proceedings of IEEE 5th International Conference on Computer Communication
and Networks (ICCCN'96), October 1996, Maryland, USA.

[12] F. Cao, Reliable Routing in Circulant Networks, Proceedings of IEEE
22nd Local Computer Networks (LCN'97), November 1997, Minneapolis,
USA.

[13] F. Cao, D. H. C. Du and A. Pavan, Design of WDM Optical Passive
Star Networks with Tunable Transceivers of Limited Tuning Range, to
appear in IEEE Transactions on Computers.

[14] F. Cao, D.Z. Du, D.F. Hsu and P. Wan, Fault-tolerant Routing in
Butterfly Networks, DIMACS Workshop on Network Design, Princeton,
April 1997. Technical Report TR 95-073, Department of Computer
Science, University of Minnesota, 1995.

[15] F. Cao, D.Z. Du and D.F. Hsu, On the fault-tolerant diameters and
containers of large bipartite digraphs, International Symposium on
Combinatorial Algorithms, Tianjin, China, June 1996.

[16] Ming Chen and Nicolas D. Georganas, Multiconnection Over Multi-


channels, IEEE Infocom '95, pp. 1037-1043, 1995

[17] D. Chiarulli, S. Levitan, R. Melhem, J. Teza and G. Gravenstreter, Mul-


tiprocessor interconnection networks using partitioned optical passive
star topologies and distributed control, Proceedings of the 1st Interna-
tional Workshop on Massively Parallel Processing Using Interconnec-
tions, 70-80, IEEE, April 1994

[18] K. W. Cheung, Scalable, Fault Tolerant I-Hop Wavelength Routing,


Globe Com, pp. 1240-1244, 1991.

[19] R. Chipalkatti, Z. Zhang and A. S. Acampora, High Speed Communication
Protocols for Optical Star Couplers using WDM, IEEE Infocom,
pp. 2124-2133, 1992.

[20] S.D. Cusworth and J.M. Senior and A. Ryley, Wavelength Division Mul-
tiple Access on a High-speed Optical Fibre LAN, Computer Networks
and ISDN Systems, pp. 323-333, 1989/90.

[21] Patrick W. Dowd, Random Access Protocols for High-Speed Interpro-


cessor Communication Based On an Optical Passive Star Topology,
Journal Of Lightwave Technology, pp. 799-808, vol. 9, 1991.

[22] P. W. Dowd, Wavelength Division Multiple Access Channel Hypercube


Processor Interconnection, IEEE Transactions on Computers" vol. 41,
no. 10, pp. 1223-1241, 1992.

[23] D. H. C. Du and R. J. Vetter, Advanced Computer Networks, preprint.

[24] C. Ersoy and S. P. Pamwar, Topological design of interconnected LAN-


MAN networks, IEEE Infocom, pp. 2260-2269, 1992.

[25] A. Ganz, B. Li and L. Zenou, Reconfigurability of multi-star based


lightwave LANs, GlobeCom'91, volume 3, 1906-1910, 1992

[26] Mario Gerla and B. Kannan and P. Palnati, Protocols for an Optical
Star Interconnect for High Speed Mesh Networks, IEEE Infocom'95,
pp. 146-153, 1995

[27] G. R. Green, Fiber Optic Networks, Englewood Cliffs, New Jersey, Pren-
tice Hall, 1993.

[28] Optical Communications Systems, Prentice-Hall, 2nd Edition, 1993.

[29] J. E. Hopcroft and R. M. Karp, "An n^{5/2} algorithm for maximum
matchings in bipartite graphs," SIAM Journal on Computing, 2 (4),
pp. 225-231, 1973.

[30] Thomas Inukai, An Efficient SS/TDMA Time Slot Assignment Algo-


rithm, IEEE Transactions On Communications, pp. 1449-1455, vol.
COM-27, 1979

[31] A. Ganz and Y. Gao, "Time-Wavelength Assignment Algorithms for
High Performance WDM Star Based Systems," IEEE Transactions on
Communications, vol. 42, no. 2/3/4, pp. 1827-1836, 1994.

[32] G. Gravenstreter, R. Melhem, D. Chiarulli, S. Levitan and J. Teza,
The Partitioned Optical Passive Stars (POPS) Topology, Proceedings of
the Ninth International Parallel Processing Symposium, April 1995.

[33] G. Gravenstreter and R.G. Melhem, Realizing Common Communication
Patterns in Partitioned Optical Passive Stars (POPS) Networks,
Proceedings of the 2nd Int. Conf. on Massively Parallel Processing
Using Optical Interconnections, San Antonio, TX, 1995.

[34] IEEE Std 802.3-1985, Carrier Sense Multiple Access with Collision
Detection (CSMA/CD) Access Method and Physical Layer Specifications,
IEEE, ISBN 0-471-892749-5, 1985.

[35] M. Janoska and T.D. Todd, A Simplified Optical Star Network Using
Distributed Channel Controllers, Submitted for publication, 1993
[36] B. Kannan and Shivi Fotedar and Maria Gerla, A Protocol for WDM
Star Coupler Networks, IEEE Infocom'94, pp. 1536-1543, June, 1994.
[37] Mohsen Kavehrad and Ganti Sudhakar and Nicolas Georganas, Slotted
Aloha and Reservation Aloha Protocols For Very High-Speed Optical
Fiber Local Area Networks Using Passive Star Topology, Journal Of
Lightwave Technology, pp. 1411-1422, vol. 8, 1991.

[38] Milan Kovacevic and Mario Gerla, On the performance of shared-


channel multihop lightwave networks, IEEE Infocom '94, pp. 5440-5510,
1994
[39] J. P. Labourdette and A. S. Acampora, Wavelength agility in multihop
lightwave networks, IEEE Infocom'90, pp. 1022-1029, 1990.
[40] Sang-Kyu Lee, A. Duksu Oh, and Hyeong-Ah Choi, Transmission
Schedules for Hypercube Interconnection in TWDM Optical Passive
Star Networks, Department of Electrical Engineering and Computer
Science, George Washington University Technical Report GWU-IIST-
95-07, 1995.
[41] Sang-Kyu Lee, A. Duksu Oh, Hongsik Choi, and Hyeong-Ah Choi,
Optimal Transmission Schedules in TWDM Optical Passive Star Net-
works, Department of Electrical Engineering and Computer Science,
George Washington University Technical Report GWU-IIST-95-03.
[42] J. H. van Lint, and R. M. Wilson, A Course in Combinatorics, Cam-
bridge University Press, 1992.

[43] G. Liu, K. Y. Lee, and H. Jordan, TDM Hypercube and TWDM Mesh
Optical Interconnections, Proceedings of IEEE GLOBECOM, pp. 1953-
1957, 1994.

[44] M. Marsan, A. Bianco, E. Leonardi, M. Meo and F. Neri,
On the Capacity of MAC Protocols for All-Optical WDM Multi-Rings
with Tunable Transmitters and Fixed Receivers, submitted to IEEE
Infocom '96, 1996.

[45] N. F. Maxemchuk, Regular Mesh Topologies in Local and Metropolitan


Area Networks, AT&T Technical Journal, pp. 1659-1683, vol. 64, 1985.

[46] R. Melhem and G. Hwang, Embedding rectangular grids into square


grids with dilation two, IEEE Transactions on Computers, 1446-1455,
December 1990

[47] B. Mukherjee, WDM-Based local lightwave networks, Part I: Single-


Hop systems, IEEE Networks, 12-27, May 1992

[48] A. Pavan, S. Bhattacharya and D. H. C. Du, Reverse Channel Augmented
Multihop Lightwave Networks, Proceedings of the IEEE Infocom,
San Francisco, March 1993.

[49] A. Pavan, P. J. Wan, S. R. Tong and D. H. C. Du, A New Multi-


hop Lightwave Network Based on the Generalized De-Bruijn Graph,
the Proceeding of the IEEE Local Networks Conference, Minneapolis,
October 1996.

[50] R. Ramaswami, Multiwavelength lightwave networks for computer
communication, IEEE Communications Magazine, pp. 78-88, vol. 31, 1993.

[51] G. N. Rouskas and M. H. Ammar, Analysis and Optimization of trans-


mission schedules for single-hop WDM networks, IEEE Infocom '93, pp.
1342-1349, 1993.

[52] A. Ryley, S.D. Cusworth and J.M. Senior, Piggybacked Token Passing
Access Protocol for Multichannel Optical Fibre LANs, Comput.
Comm., pp. 213-222, vol. 12, 1989.

[53] J. Sharony, T. E. Stern and K. W. Cheung, Extension of Classical
Rearrangeable and Non-blocking Networks to the Wavelength Dimension,
IEEE GlobeCom, pp. 1901-1905, 1992.

[54] K. N. Sivarajan and R. Ramaswami, "Lightwave Networks Based on de


Bruijn Graphs," IEEE Transactions on Networking, vol. 2, no. 1, pp.
70-79, 1994.

[55] G. N. M. Sudhakar, N. D. Georganas and M. Kavehrad, Slotted
Aloha and Reservation Aloha Protocols for Very High Speed Optical
Fiber Local Area Networks Using Passive Star Topology, Journal of
Lightwave Technology, vol. 9, 1991.

[56] Terence D. Todd, Adrian M. Grah and Oliver Barkovic, Optical Local
Area Networks (LANs) Using Wavelength-Selective Couplers, IEEE
Infocom '95, pp. 916-923, 1995.

[57] S. R. Tong, "Efficient Designs for High Speed Network Architectures",


Ph.D. Thesis, University of Minnesota, February, 1994

[58] K. A. Williams and D. H. C. Du, Efficient Embedding of a Hyper-


cube in an Irregular WDM Network, Technical Report, Department of
Computer Science, University of Minnesota, 1991.

[59] K. A. Williams and D. H. C. Du, Time and Wavelength Division Multi-


plexed Architectures for Optical Passive Star Networks, Technical Re-
port, Department of Computer Science, University of Minnesota, 1991.
HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 241-337
©1998 Kluwer Academic Publishers

The Quadratic Assignment Problem

Rainer E. Burkard* and Eranda Çela*
Institute of Mathematics,
Technical University Graz, Steyrergasse 30, 8010 Graz, Austria
E-mail: {burkard,cela}@opt.math.tu-graz.ac.at

Panos M. Pardalos and Leonidas S. Pitsoulis
Center for Applied Optimization,
Industrial and Systems Engineering Department,
University of Florida, Gainesville, FL 32611
E-mail: leonidas@deming.ise.ufl.edu, pardalos@ufl.edu

Contents

1 Introduction 243

2 Formulations 245
2.1 Quadratic Integer Program Formulation 245
2.2 Concave Quadratic Formulation 246
2.3 Trace Formulation 248
2.4 Kronecker Product 249

3 Computational complexity 249

4 Linearizations 253
4.1 Lawler's Linearization 254
4.2 Kaufmann and Broeckx Linearization 254
4.3 Frieze and Yadegar Linearization 255
4.4 Adams and Johnson Linearization 257

* These authors have been supported by the Spezialforschungsbereich F 003
"Optimierung und Kontrolle", Projektbereich Diskrete Optimierung.

5 QAP Polytopes 258

6 Lower Bounds 262
6.1 Gilmore-Lawler Type Lower Bounds 262
6.2 Bounds Based on Linear Programming Relaxations 270
6.3 Variance Reduction Lower Bounds 273
6.4 Eigenvalue Based Lower Bounds 275
6.5 Bounds Based on Semidefinite Relaxations 281
6.6 Improving Bounds by Means of Decompositions 285

7 Exact Solution Methods 287
7.1 Branch and Bound 287
7.2 Traditional Cutting Plane Methods 288
7.3 Polyhedral Cutting Planes 289

8 Heuristics 291
8.1 Construction Methods 291
8.2 Limited Enumeration Methods 292
8.3 Improvement Methods 292
8.4 Tabu Search 294
8.5 Simulated Annealing 295
8.6 Genetic Algorithms 296
8.7 Greedy Randomized Adaptive Search Procedure 297
8.8 Ant Systems 301

9 Available Computer Codes for the QAP 302

10 Polynomially Solvable Cases 304

11 QAP Instances with Known Optimal Solution 307

12 Asymptotic Behavior 309

13 Related Problems 314
13.1 The Bottleneck QAP 315
13.2 The Biquadratic Assignment Problem 316
13.3 The Quadratic Semi-Assignment Problem 317
13.4 Other Problems Which Can Be Formulated as QAPs 318

Bibliography

1 Introduction
The quadratic assignment problem (QAP) was introduced by Koopmans and Beckmann in 1957 as a mathematical model for the location of a set of indivisible economical activities [113]. Consider the problem of allocating a set of facilities to a set of locations, with the cost being a function of the distance and flow between the facilities, plus costs associated with a facility being placed at a certain location. The objective is to assign each facility to a location such that the total cost is minimized. Specifically, we are given three $n \times n$ input matrices with real elements $F = (f_{ij})$, $D = (d_{kl})$ and $B = (b_{ik})$, where $f_{ij}$ is the flow between facility $i$ and facility $j$, $d_{kl}$ is the distance between location $k$ and location $l$, and $b_{ik}$ is the cost of placing facility $i$ at location $k$. The Koopmans-Beckmann version of the QAP can be formulated as follows: Let $n$ be the number of facilities and locations and denote by $N$ the set $N = \{1, 2, \ldots, n\}$.

$$\min_{\phi \in S_n} \ \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} d_{\phi(i)\phi(j)} + \sum_{i=1}^{n} b_{i\phi(i)} \qquad (1)$$

where $S_n$ is the set of all permutations $\phi\colon N \to N$. Each individual product $f_{ij} d_{\phi(i)\phi(j)}$ is the cost of assigning facility $i$ to location $\phi(i)$ and facility $j$ to location $\phi(j)$. In the context of facility location the matrices $F$ and $D$ are symmetric with zeros in the diagonal, and all the matrices are nonnegative. An instance of a QAP with input matrices $F$, $D$ and $B$ will be denoted by QAP$(F, D, B)$, while we will denote an instance by QAP$(F, D)$ if there is no linear term (i.e., $B = 0$).
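Since the objective function in (1) is a plain double sum, it is straightforward to evaluate and, for very small $n$, to minimize by complete enumeration. The following Python sketch (an illustration of the definition, not a practical solver; all function names are ours) does both:

    import itertools
    import numpy as np

    def qap_value(F, D, B, phi):
        # phi[i] = location assigned to facility i (0-based indices)
        n = len(phi)
        quad = sum(F[i, j] * D[phi[i], phi[j]] for i in range(n) for j in range(n))
        lin = sum(B[i, phi[i]] for i in range(n))
        return quad + lin

    def solve_by_enumeration(F, D, B):
        # Exhaustive search over S_n; feasible only for tiny n (n! permutations).
        n = F.shape[0]
        return min(itertools.permutations(range(n)),
                   key=lambda phi: qap_value(F, D, B, phi))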
A more general version of the QAP was introduced by Lawler [118]. In this version we are given a four-dimensional array $C = (c_{ijkl})$ of coefficients instead of the two matrices $F$ and $D$, and the problem can be stated as

$$\min_{\phi \in S_n} \ \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij\phi(i)\phi(j)} + \sum_{i=1}^{n} b_{i\phi(i)} \qquad (2)$$

Clearly, a Koopmans-Beckmann problem QAP$(F, D, B)$ can be formulated as a Lawler QAP by setting $c_{ijkl} := f_{ij} d_{kl}$ for all $i, j, k, l$ with $i \neq j$ or $k \neq l$, and $c_{iikk} := f_{ii} d_{kk} + b_{ik}$ otherwise.
Although extensive research has been done for more than three decades,
the QAP, in contrast with its linear counterpart the linear assignment prob-
lem (LAP), remains one of the hardest optimization problems and no exact
algorithm can solve problems of size n > 20 in reasonable computational
time. In fact, Sahni and Gonzalez [164] have shown that the QAP is NP-hard and that even finding an approximate solution within some constant factor from the optimal solution cannot be done in polynomial time unless P = NP. These results hold even for the Koopmans-Beckmann QAP with coefficient matrices fulfilling the triangle inequality (see Queyranne [152]). So far, only for a very special case of the Koopmans-Beckmann QAP, the dense linear arrangement problem, has a polynomial time approximation scheme been found, due to Arora, Frieze, and Kaplan [7]. Complexity aspects of the QAP will be discussed in more detail in Section 3.
Let us conclude this section with a brief review of some of the many
applications of the QAP. In addition to facility layout problems, the QAP
appears in applications such as backboard wiring, computer manufacturing,
scheduling, process communications, turbine balancing, and many others.
One of the earlier applications goes back to Steinberg [168] and concerns
backboard wiring. Different devices such as controls and displays have to
be placed on a panel, where they have to be connected to each other by
wires. The problem is to find a positioning of the devices so as to minimize
the total wire length. Let $n$ be the number of devices to be placed and let $d_{kl}$ denote the wire length from position $k$ to position $l$. The flow matrix $F = (f_{ij})$ is given by

$$f_{ij} = \begin{cases} 1 & \text{if device } i \text{ is connected to device } j, \\ 0 & \text{otherwise.} \end{cases}$$
Then the solution to the corresponding QAP will minimize the total wire length. Another application in the context of location theory is a campus planning problem due to Dickey and Hopkins [58]. The problem consists of planning the sites of $n$ buildings in a campus, where $d_{kl}$ is the distance from site $k$ to site $l$, and $f_{ij}$ is the traffic intensity between building $i$ and building $j$. The objective is to minimize the total walking distance between the buildings.
In the field of ergonomics Burkard and Offermann [36] showed that QAPs can be applied to typewriter keyboard design. The problem is to arrange the keys in a keyboard so as to minimize the time needed to write some text. Let the set of integers $N = \{1, 2, \ldots, n\}$ denote the set of symbols to be arranged. Then $f_{ij}$ denotes the frequency of the appearance of the pair of symbols $i$ and $j$. The entries of the distance matrix $D = (d_{kl})$ are the times needed to press the key in position $l$ after pressing the key in position $k$, for all the keys to be assigned. Then a permutation $\phi \in S_n$ describes an
assignment of symbols to keys. An optimal solution $\phi^*$ for the QAP minimizes the average time for writing a text. A similar application related to ergonomic design is the development of control boards in order to minimize eye fatigue by McCormick [126]. There are also numerous other applications of the QAP in different fields, e.g., hospital layout (Elshafei [63]), ranking of archeological data (Krarup and Pruzan [114]), ranking of a team in a relay race (Heffley [93]), scheduling parallel production lines (Geoffrion and Graves [76]), and analyzing chemical reactions for organic compounds (Ugi, Bauer, Friedrich, Gasteiger, Jochum, and Schubert [173]).

2 Formulations

For many combinatorial optimization problems there exist different, but equivalent, mathematical formulations which stress different structural characteristics of the problem and may lead to different solution approaches. Let us start with the observation that every permutation $\phi$ of the set $N = \{1, 2, \ldots, n\}$ can be represented by an $n \times n$ matrix $X = (x_{ij})$ such that

$$x_{ij} = \begin{cases} 1 & \text{if } \phi(i) = j, \\ 0 & \text{otherwise.} \end{cases}$$

Matrix $X$ is called a permutation matrix and is characterized by the following assignment constraints:

$$\sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \ldots, n,$$
$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, \ldots, n,$$
$$x_{ij} \in \{0, 1\}, \quad i, j = 1, 2, \ldots, n.$$

We denote the set of all permutation matrices by $X_n$. Due to a famous theorem of Birkhoff the permutation matrices correspond in a unique way to the vertices of the assignment polytope (also called the Birkhoff polytope, or the perfect matching polytope of $K_{n,n}$). This leads to the following description of the QAP as a quadratic integer program.
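The correspondence between permutations and permutation matrices is easy to make concrete; a small sketch (helper names are ours):

    import numpy as np

    def perm_to_matrix(phi):
        # X[i, j] = 1 iff phi(i) = j; rows and columns then sum to 1.
        n = len(phi)
        X = np.zeros((n, n), dtype=int)
        X[np.arange(n), phi] = 1
        return X

    def matrix_to_perm(X):
        # Inverse map: the column index of the single 1 in each row.
        return tuple(int(np.argmax(row)) for row in X)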

2.1 Quadratic Integer Program Formulation

Using permutation matrices instead of permutations, the QAP (2) can be formulated as the following integer program with quadratic objective function (hence the name quadratic assignment problem, by Koopmans and Beckmann [113]):

$$\min \ \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} c_{ijkl} x_{ik} x_{jl} + \sum_{i,j=1}^{n} b_{ij} x_{ij} \qquad (3)$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \ldots, n, \qquad (4)$$
$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, \ldots, n, \qquad (5)$$
$$x_{ij} \in \{0, 1\}, \quad i, j = 1, 2, \ldots, n. \qquad (6)$$

From now on, whenever we write $(x_{ij}) \in X_n$, it will be implied that the $x_{ij}$ satisfy the assignment constraints (4), (5) and (6).

Many authors have proposed methods for linearizing the quadratic form of the objective function (3) by introducing additional variables; some of these linearizations will be discussed in Section 4.
A QAP in Koopmans-Beckmann form can be formulated in a more compact way if we define an inner product between matrices. Let the inner product of two real $n \times n$ matrices $A$, $B$ be defined by

$$\langle A, B \rangle := \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}.$$

Given some $n \times n$ matrix $A$, a permutation $\phi \in S_n$ and the associated permutation matrix $X \in X_n$, the products $AX^T$ and $XA$ permute the columns and rows of $A$, respectively, according to the permutation $\phi$, and therefore

$$\langle F, XDX^T \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} d_{\phi(i)\phi(j)}.$$

Thus we can formulate a Koopmans-Beckmann QAP alternatively as

$$\min \ \langle F, XDX^T \rangle + \langle B, X \rangle \qquad (7)$$
$$\text{s.t.} \quad X \in X_n.$$
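The identity behind (7) is easy to verify numerically, since $(XDX^T)_{ij} = d_{\phi(i)\phi(j)}$. A short check (random data of our own choosing):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    F, D, B = rng.random((3, n, n))
    phi = rng.permutation(n)
    X = np.eye(n)[phi]                       # permutation matrix: X[i, phi[i]] = 1
    lhs = np.sum(F * (X @ D @ X.T)) + np.sum(B * X)   # <F, X D X^T> + <B, X>
    rhs = sum(F[i, j] * D[phi[i], phi[j]] for i in range(n) for j in range(n)) \
          + sum(B[i, phi[i]] for i in range(n))
    assert np.isclose(lhs, rhs)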

2.2 Concave Quadratic Formulation

In the objective function of (3), let the coefficients $c_{ijkl}$ be the entries of an $n^2 \times n^2$ matrix $S$, such that $c_{ijkl}$ is on row $(i-1)n + k$ and column $(j-1)n + l$. Now let $Q := S - \alpha I$, where $I$ is the $n^2 \times n^2$ unit matrix and $\alpha$ is greater than the row norm $\|S\|_\infty$ of matrix $S$. The subtraction of a constant from the entries on the main diagonal of $S$ does not change the optimal solutions of the corresponding QAP; it simply adds a constant to the objective function. Hence we can consider a QAP with coefficient array $Q$ instead of $S$. Let $x = (x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{nn})^T = (x_1, \ldots, x_{n^2})^T$. Then we can rewrite the objective function of the QAP with array of coefficients $Q$ as a quadratic form $x^T Q x$, where

$$x^T Q x = \sum_{i=1}^{n^2} q_{ii} x_i^2 + 2 \sum_{i=1}^{n^2-1} \sum_{j=i+1}^{n^2} q_{ij} x_i x_j$$
$$= \sum_{i=1}^{n^2} \Big( q_{ii} + \sum_{j \neq i} q_{ij} \Big) x_i^2 - \sum_{i=1}^{n^2-1} \sum_{j=i+1}^{n^2} q_{ij} (x_i - x_j)^2$$
$$= \sum_{i=1}^{n^2} \Big( -\alpha + \sum_{j=1}^{n^2} s_{ij} \Big) x_i^2 - \sum_{i=1}^{n^2-1} \sum_{j=i+1}^{n^2} s_{ij} (x_i - x_j)^2$$
$$\leq \sum_{i=1}^{n^2} \Big( -\alpha + \sum_{j=1}^{n^2} s_{ij} \Big) x_i^2.$$

Since $x^T [\frac{1}{2}(Q + Q^T)] x = x^T Q x$, we can assume that $Q$ is symmetric and negative definite. Therefore we have a quadratic concave minimization problem and can formulate the QAP as

$$\min \ x^T Q x$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \ldots, n, \qquad (8)$$
$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, \ldots, n,$$
$$x_{ij} \geq 0, \quad i, j = 1, 2, \ldots, n.$$

Bazaraa and Sherali [16] introduced the above formulation and used it to derive cutting plane procedures. Although their exact methods were computationally not efficient, heuristics derived from these procedures produced suboptimal solutions of good quality.

By adding the term $\alpha I$ to the matrix $Q$ instead of subtracting it, we could always assume that the objective function of the QAP is convex. This leads to the formulation of the QAP as a quadratic convex minimization problem.
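Both constructions are easy to check numerically; for the concave variant, after symmetrizing, any $\alpha$ exceeding the row norm pushes all eigenvalues of $Q$ below zero. A sketch under these assumptions, with random stand-in data:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    S = rng.random((n**2, n**2))             # stands in for the coefficient matrix S
    S = 0.5 * (S + S.T)                      # work with the symmetric part, as in the text
    alpha = np.abs(S).sum(axis=1).max() + 1.0    # alpha > ||S||_inf
    Q = S - alpha * np.eye(n**2)
    assert np.linalg.eigvalsh(Q).max() < 0   # Q is negative definite, so x^T Q x is concave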

2.3 Trace Formulation

The trace of an $n \times n$ matrix $B$ is defined to be the sum of its diagonal elements, i.e.,

$$\mathrm{tr}\, B := \sum_{i=1}^{n} b_{ii}.$$

Consider a Koopmans-Beckmann QAP instance with input matrices $F$, $D$ and $B$. Letting $\bar{D} = X D^T X^T$, we get

$$\mathrm{tr}(F \bar{D}) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} \bar{d}_{ji} = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} d_{\phi(i)\phi(j)},$$

since $\bar{d}_{ji} = d_{\phi(i)\phi(j)}$, $i, j = 1, \ldots, n$, where $\phi \in S_n$ is the permutation associated with $X$ (see 2.1). Since $\mathrm{tr}(BX^T) = \sum_{i=1}^{n} b_{i\phi(i)}$, the QAP in (7) can be formulated as

$$\min \ \mathrm{tr}(FXD^T + B)X^T \qquad (9)$$
$$\text{s.t.} \quad X \in X_n.$$

The trace formulation of the QAP first appeared in Edwards [61, 62], and was used by Finke, Burkard, and Rendl [67] to introduce the eigenvalue lower bounding techniques for symmetric QAPs (see Section 6.4). Given any two real $n \times n$ matrices $A$, $B$, recall the well known properties $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, $(AB)^T = B^T A^T$ and $\mathrm{tr}\, A = \mathrm{tr}\, A^T$. For $F = F^T$ we can then write the quadratic term in (9) as

$$\mathrm{tr}(FXD^TX^T) = \mathrm{tr}(XDX^TF) = \mathrm{tr}(FXDX^T),$$

where $D$ is not necessarily symmetric. Therefore, given a QAP instance where only one of the matrices is symmetric (say $F$), we can transform it into a QAP instance where both matrices are symmetric. This is done by introducing a new symmetric matrix $E = \frac{1}{2}(D + D^T)$.

2.4 Kronecker Product

Let $A$ be a real $m \times n$ matrix and let $B$ be a real $p \times q$ matrix. Then the Kronecker product of matrices $A$ and $B$ is defined as

$$A \otimes B := \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}.$$

That is, $A \otimes B$ is the $mp \times nq$ matrix formed from all possible pairwise element products of $A$ and $B$. If we let $\mathrm{vec}(X) \in \mathbb{R}^{n^2}$ be the vector formed by the columns of a permutation matrix $X$, the QAP can be formulated as

$$\min \ \mathrm{vec}(X)^T (F \otimes D)\,\mathrm{vec}(X) + \mathrm{vec}(B)^T \mathrm{vec}(X) \qquad (10)$$
$$\text{s.t.} \quad X \in X_n.$$
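Formulation (10) can also be checked in a few lines of numpy. Note that the sketch below uses the row-by-row flattening of $X$ (numpy's default) rather than the column-wise $\mathrm{vec}$ of the text, which amounts to a transposed indexing convention for the coefficient matrix; under that convention the quadratic form reproduces the objective value directly:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    F, D, B = rng.random((3, n, n))
    phi = rng.permutation(n)
    X = np.eye(n)[phi]
    v = X.reshape(-1)                        # rows of X stacked into a vector
    val = v @ np.kron(F, D) @ v + B.reshape(-1) @ v
    ref = sum(F[i, k] * D[phi[i], phi[k]] for i in range(n) for k in range(n)) \
          + sum(B[i, phi[i]] for i in range(n))
    assert np.isclose(val, ref)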

Operations using the Kronecker product and its properties have been studied in detail by Graham [84]. However, the above formulation is rarely used in investigations of the QAP. Based on that formulation Lawler [118] gave an alternative formulation of the QAP as a linear assignment problem (LAP) of size $n^2$ with the additional constraint that only $n^2 \times n^2$ permutation matrices which are Kronecker products of $n \times n$ permutation matrices are feasible. If, as before, the $n^2 \times n^2$ cost matrix $C$ contains the $n^4$ costs $c_{ijkl}$, such that the $(ijkl)$-th element corresponds to the element in the $((i-1)n + k)$-th row and $((j-1)n + l)$-th column of $C$, the QAP can be written as

$$\min \ \langle C, Y \rangle$$
$$\text{s.t.} \quad Y = X \otimes X, \qquad (11)$$
$$X \in X_n.$$

Because of the additional constraint to be fulfilled by the feasible solutions, the resulting LAP cannot be solved efficiently.

3 Computational complexity

The results described in this section give evidence that the QAP is a "very hard" problem from the theoretical point of view. Not only can the QAP not be solved efficiently, it cannot even be approximated efficiently within some constant approximation ratio. Furthermore, finding
local optima is not a trivial task even for simply structured neighborhoods
like the 2-opt neighborhood.
Two early results obtained by Sahni and Gonzalez [164] in 1976 settled the complexity of solving and approximating the QAP. It was shown that the QAP is NP-hard and that even finding an $\varepsilon$-approximate solution for the QAP is a hard problem, in the sense that the existence of a polynomial $\varepsilon$-approximation algorithm implies P = NP. In the following, let $Z(F, D, \phi)$ denote the objective function value of a solution $\phi$ for a QAP with flow matrix $F$ and distance matrix $D$.

Definition 3.1 Given a real number $\varepsilon > 0$, an algorithm $T$ for the QAP is said to be an $\varepsilon$-approximation algorithm if

$$\frac{Z(F, D, \phi_T) - Z(F, D, \phi_{\mathrm{opt}})}{Z(F, D, \phi_{\mathrm{opt}})} \leq \varepsilon \qquad (12)$$

holds for every instance QAP$(F, D)$, where $\phi_T$ is the solution of QAP$(F, D)$ computed by algorithm $T$ and $\phi_{\mathrm{opt}}$ is an optimal solution of QAP$(F, D)$. The solution of QAP$(F, D)$ produced by an $\varepsilon$-approximation algorithm is called an $\varepsilon$-approximate solution.

Theorem 3.2 (Sahni and Gonzalez [164], 1976)
The quadratic assignment problem is strongly NP-hard. For an arbitrary $\varepsilon > 0$, the existence of a polynomial time $\varepsilon$-approximation algorithm for the QAP implies P = NP.

The proof is done by a reduction from the Hamiltonian cycle problem: Given a graph $G$, does $G$ contain a cycle which visits each vertex exactly once (see [73])?
Queyranne [152] derives an even stronger result which further confirms the widespread belief in the inherent difficulty of the QAP in comparison with other difficult combinatorial optimization problems. It is well known and very easy to see that the traveling salesman problem (TSP) is a special case of the QAP: the TSP on $n$ cities can be formulated as a QAP$(F, D)$ where $F$ is the distance matrix of the TSP instance and $D$ is the adjacency matrix of a Hamiltonian cycle on $n$ vertices (see the sketch after this paragraph). In the case that the distance matrix is symmetric and satisfies the triangle inequality, the TSP is approximable in polynomial time within 3/2, as shown by Christofides [46]. Queyranne [152] showed that, unless P = NP, QAP$(A, B)$ is not approximable in polynomial time within some finite approximation ratio, even if $A$ is the distance matrix of some set of points on a line and $B$ is a symmetric block diagonal matrix.
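The TSP reduction mentioned above is easy to set up. In the sketch below (our own construction), $D$ is the adjacency matrix of a fixed Hamiltonian cycle; since a symmetric distance matrix counts each tour edge twice, the QAP value equals exactly twice the tour length:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 6
    pts = rng.random((n, 2))
    F = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # symmetric TSP distances
    D = np.zeros((n, n))
    for k in range(n):                       # adjacency matrix of a Hamiltonian cycle
        D[k, (k + 1) % n] = D[(k + 1) % n, k] = 1
    phi = rng.permutation(n)                 # city i is visited at position phi[i]
    qap = sum(F[i, j] * D[phi[i], phi[j]] for i in range(n) for j in range(n))
    tour = np.argsort(phi)                   # tour[k] = city visited at position k
    tour_len = sum(F[tour[k], tour[(k + 1) % n]] for k in range(n))
    assert np.isclose(qap, 2 * tour_len)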
A more recent result of Arora, Frieze and Kaplan [7] partially answers one of the open questions stated by Queyranne in [152]: What happens if matrix $A$ is the distance matrix of $n$ points which are regularly spaced on a line, i.e., points with abscissae given by $x_p = p$, $p = 1, \ldots, n$? This special case of the QAP is termed the linear arrangement problem and is a well studied NP-hard problem. In the linear arrangement problem the matrix $B$ is not restricted to have the block diagonal structure mentioned above, but is simply a symmetric 0-1 matrix. Arora et al. give a polynomial time approximation scheme (PTAS) for the linear arrangement problem in the case that the 0-1 matrix $B$ is dense, i.e., the number of 1 entries in $B$ is in $O(n^2)$, where $n$ is the size of the problem. They show that for each $\varepsilon > 0$ there exists an $\varepsilon$-approximation algorithm for the dense linear arrangement problem with time complexity depending polynomially on $n$ and exponentially on $1/\varepsilon$, hence polynomial for each fixed $\varepsilon > 0$.

Recently it has been shown that even finding a locally optimal solution of the QAP can be prohibitively hard, i.e., even local search is hard in the case of the QAP. Below we formalize this idea to some extent.
Assume that an optimization problem $P$ is given by specifying a ground set $E$, a set $\mathcal{F} \subseteq 2^E$ of feasible solutions and a cost function $c\colon E \to \mathbb{R}$. This cost function implies an objective function $f\colon \mathcal{F} \to \mathbb{R}$ defined by $f(S) = \sum_{x \in S} c(x)$ for all $S \in \mathcal{F}$. The goal is to find a feasible solution which minimizes the objective function. For every feasible solution $S \in \mathcal{F}$ let a neighborhood $N(S) \subseteq \mathcal{F}$ of $S$ be given. This neighborhood consists of feasible solutions which are somehow "close" to $S$. Now, instead of looking for a globally optimal solution $S^* \in \mathcal{F}$ of the problem $P$, that is

$$f(S^*) = \min_{S \in \mathcal{F}} f(S),$$

we look for a locally optimal solution or a local minimum of $P$, that is an $\bar{S} \in \mathcal{F}$ such that

$$f(\bar{S}) = \min_{S \in N(\bar{S})} f(S).$$

An algorithm which produces a locally optimal solution is frequently called a local search algorithm. Some local search algorithms for the QAP are described in Section 8.
Let us consider the intriguing question "Is it easy to find a locally optimal solution for the QAP?". Clearly the answer depends on the involved neighborhood structure. If the neighborhoods $N(S)$ are replaced by new neighborhoods $N'(S)$, one would generally expect changes in the local optimality status of a solution. The theoretical basis for facing this kind of problem was introduced by Johnson, Papadimitriou and Yannakakis in [97]. They define the so-called polynomial-time local search problems, PLS problems for short. A pair $(P, N)$, where $P$ is a (combinatorial) optimization problem and $N$ is an associated neighborhood structure, defines a local search problem which consists of finding a locally optimal solution of $P$ with respect to the neighborhood structure $N$. Without going into technical details, a PLS problem is a local search problem for which local optimality can be checked in polynomial time. In analogy with decision problems, there exist complete problems in the class of PLS problems. The PLS-complete problems are, in the usual complexity sense, the most difficult among the PLS problems.

Murthy, Pardalos and Li [138] introduce a neighborhood structure for the QAP which is similar to the neighborhood structure proposed by Kernighan and Lin [109] for the graph partitioning problem. For this reason we will call it a K-L type neighborhood structure for the QAP. Murthy et al. show that the corresponding local search problem is PLS-complete.

A K-L type neighborhood structure for the QAP. Consider a permutation $\phi_0 \in S_n$. A swap of $\phi_0$ is a permutation $\phi \in S_n$ obtained from $\phi_0$ by applying a transposition $(i, j)$ to it, $\phi = \phi_0 \circ (i, j)$. A transposition $(i, j)$ is defined as a permutation which maps $i$ to $j$, $j$ to $i$, and $k$ to $k$ for all $k \notin \{i, j\}$. In the facility location context a swap is obtained by interchanging the facilities assigned to two locations $i$ and $j$. A greedy swap of permutation $\phi_0$ is a swap $\phi_1$ which minimizes the difference $Z(F, D, \phi) - Z(F, D, \phi_0)$ over all swaps $\phi$ of $\phi_0$. Let $\phi_0, \phi_1, \ldots, \phi_t$ be a sequence of permutations in $S_n$, each of them being a greedy swap of the preceding one. Such a sequence is called monotone if for each pair of permutations $\phi_k$, $\phi_l$ in the sequence, $\{i_k, j_k\} \cap \{i_l, j_l\} = \emptyset$, where $\phi_k$ ($\phi_l$) is obtained by applying transposition $(i_k, j_k)$ ($(i_l, j_l)$) to the preceding permutation in the sequence. The neighborhood of $\phi_0$ consists of all permutations which occur in the (unique) maximal monotone sequence of greedy swaps starting with permutation $\phi_0$. Let us denote this neighborhood structure for the QAP by $N_{K\text{-}L}$. It is not difficult to see that, given a QAP$(F, D)$ of size $n$ and a permutation $\phi \in S_n$, the cardinality of $N_{K\text{-}L}(\phi)$ does not exceed $\lfloor n/2 \rfloor + 1$.
It is easily seen that the local search problem (QAP, $N_{K\text{-}L}$) is a PLS problem. Pardalos, Rendl, and Wolkowicz [147] have shown that a PLS-complete problem, namely the graph partitioning problem with the neighborhood structure defined by Kernighan and Lin [109], is PLS-reducible to (QAP, $N_{K\text{-}L}$). This implies the following result.

Theorem 3.3 (Pardalos, Rendl and Wolkowicz [147], 1994)
The local search problem (QAP, $N_{K\text{-}L}$), where $N_{K\text{-}L}$ is the Kernighan-Lin type neighborhood structure for the QAP, is PLS-complete.

The PLS-completeness of (QAP, $N_{K\text{-}L}$) implies that, in the worst case, a general local search algorithm as described above involving the Kernighan-Lin type neighborhood finds a local minimum only after a time which is exponential in the problem size. Numerical results, however, show that such local search algorithms perform quite well when applied to QAP test instances, as reported in [138].

Another simple and frequently used neighborhood structure in $S_n$ is the so-called pair-exchange (or 2-opt) neighborhood $N_2$. The pair-exchange neighborhood of a permutation $\phi_0 \in S_n$ consists of all permutations $\phi \in S_n$ obtained from $\phi_0$ by applying some transposition $(i, j)$ to it. Thus, $N_2(\phi_0) = \{\phi_0 \circ (i, j)\colon 1 \leq i, j \leq n,\ i \neq j\}$.
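A plain first-improvement local search over $N_2$ can be sketched as follows (our own illustration; practical implementations evaluate the change caused by a swap in $O(n)$ time instead of recomputing the whole objective):

    import itertools

    def qap_value(F, D, phi):
        n = len(phi)
        return sum(F[i, j] * D[phi[i], phi[j]] for i in range(n) for j in range(n))

    def local_search_2opt(F, D, phi):
        # Move to a better neighbor in N_2 until none exists (a local minimum).
        phi, best = list(phi), qap_value(F, D, phi)
        improved = True
        while improved:
            improved = False
            for i, j in itertools.combinations(range(len(phi)), 2):
                phi[i], phi[j] = phi[j], phi[i]       # tentatively apply transposition (i, j)
                val = qap_value(F, D, phi)
                if val < best:
                    best, improved = val, True        # keep the improving swap
                else:
                    phi[i], phi[j] = phi[j], phi[i]   # undo it
        return phi, best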
It can also be shown that (QAP, $N_2$) is PLS-complete. Schaffer and Yannakakis [165] have proven that the graph partitioning problem with a neighborhood structure analogous to $N_2$ is PLS-complete. A similar PLS-reduction as in [147] implies that the local search problem (QAP, $N_2$), where $N_2$ is the pair-exchange neighborhood, is PLS-complete. This implies that the time complexity of a general local search algorithm for the QAP involving the pair-exchange neighborhood is also exponential in the worst case.

Finally, let us mention that no local criteria are known for deciding how good a locally optimal solution is as compared to a global one. From the complexity point of view, deciding whether a given local optimum is a globally optimal solution to a given instance of the QAP is a hard problem, see Papadimitriou and Wolfe [145].

4 Linearizations

The first attempts to solve the QAP eliminated the quadratic term in the objective function of (2) in order to transform the problem into a (mixed) 0-1 linear program. The linearization of the objective function is usually achieved by introducing new variables and new linear (and binary) constraints. Then existing methods for (mixed) linear integer programming (MILP) can be applied. The very large number of new variables and constraints, however, usually poses an obstacle for efficiently solving the resulting linear integer programs.

MILP formulations moreover provide LP relaxations of the problem which can be used to compute lower bounds. In this context the "tightness" of the continuous relaxation of the resulting linear integer program is a desirable property.

In this section we present four linearizations of the QAP: Lawler's linearization [118], which was the first; the Kaufmann and Broeckx linearization [108], which has the smallest number of variables and constraints; the Frieze and Yadegar linearization [70]; and the linearization of Adams and Johnson [3]. The last linearization, which is a slight but relevant modification of the linearization proposed by Frieze and Yadegar [70], unifies most of the previous linearizations and is important for getting lower bounds.

4.1 Lawler's Linearization

Lawler [118] replaces the quadratic terms $x_{ij} x_{kl}$ in the objective function of (2) by $n^4$ variables

$$y_{ijkl} := x_{ij} x_{kl}, \quad i, j, k, l = 1, 2, \ldots, n, \qquad (13)$$

and obtains in this way a 0-1 linear program with $n^4 + n^2$ binary variables and $n^4 + 2n^2 + 1$ constraints. Thus the QAP can be written as the following 0-1 linear program (see [118, 23]):

$$\min \ \sum_{i,j=1}^{n} \sum_{k,l=1}^{n} c_{ijkl} y_{ijkl}$$
$$\text{s.t.} \quad (x_{ij}) \in X_n,$$
$$\sum_{i,j=1}^{n} \sum_{k,l=1}^{n} y_{ijkl} = n^2, \qquad (14)$$
$$x_{ij} + x_{kl} - 2 y_{ijkl} \geq 0, \quad i, j, k, l = 1, 2, \ldots, n,$$
$$y_{ijkl} \in \{0, 1\}, \quad i, j, k, l = 1, 2, \ldots, n.$$
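For any feasible $x$, the intended solution $y_{ijkl} = x_{ij} x_{kl}$ indeed satisfies all constraints of (14); a quick sanity check (our own):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 4
    X = np.eye(n, dtype=int)[rng.permutation(n)]
    Y = np.einsum('ij,kl->ijkl', X, X)       # y_ijkl = x_ij * x_kl
    assert Y.sum() == n**2                   # the counting constraint in (14)
    assert (X[:, :, None, None] + X[None, None, :, :] - 2 * Y >= 0).all()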

4.2 Kaufmann and Broeckx Linearization

By adding a large enough constant to the cost coefficients, which does not change the optimal solution, we may assume that all cost coefficients $c_{ijkl}$ are nonnegative. By rearranging terms, the objective function (2) can be written as

$$\sum_{i,j=1}^{n} x_{ij} \sum_{k,l=1}^{n} c_{ijkl} x_{kl}. \qquad (15)$$

Kaufmann and Broeckx [108] define $n^2$ new real variables

$$w_{ij} := x_{ij} \sum_{k,l=1}^{n} c_{ijkl} x_{kl}, \quad i, j = 1, \ldots, n, \qquad (16)$$

and plug them into the objective function (15) to obtain a linear objective function of the form

$$\sum_{i,j=1}^{n} w_{ij}. \qquad (17)$$

Then they introduce $n^2$ constants $a_{ij} := \sum_{k,l=1}^{n} c_{ijkl}$ for $i, j = 1, \ldots, n$, and show that the QAP (2) is equivalent to the following mixed 0-1 linear program:

$$\min \ \sum_{i,j=1}^{n} w_{ij}$$
$$\text{s.t.} \quad (x_{ij}) \in X_n,$$
$$a_{ij} x_{ij} + \sum_{k,l=1}^{n} c_{ijkl} x_{kl} - w_{ij} \leq a_{ij}, \quad i, j = 1, \ldots, n, \qquad (18)$$
$$w_{ij} \geq 0, \quad i, j = 1, 2, \ldots, n.$$

This formulation employs $n^2$ real variables, $n^2$ binary variables and $n^2 + 2n$ constraints. The proof of equivalence of the QAP to the mixed integer linear program (18) can be found in [23, 108]. The above linearization, as well as others that appeared in the literature (see e.g. [24, 29]), are obtained by applying the general linearization strategy proposed by Glover [78].
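For a feasible $x$, setting $w_{ij}$ as in (16) satisfies the constraints of (18), with equality exactly where $x_{ij} = 1$; a short check (our own):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    C = rng.random((n, n, n, n))             # nonnegative coefficients c_ijkl
    X = np.eye(n, dtype=int)[rng.permutation(n)]
    inner = np.einsum('ijkl,kl->ij', C, X)   # sum_kl c_ijkl x_kl
    W = X * inner                            # w_ij as in (16)
    A = C.sum(axis=(2, 3))                   # a_ij = sum_kl c_ijkl
    assert (A * X + inner - W <= A + 1e-9).all()                   # constraints of (18)
    assert np.isclose(W.sum(), np.einsum('ijkl,ij,kl->', C, X, X)) # (17) equals (15)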

4.3 Frieze and Yadegar Linearization

Frieze and Yadegar [70] replace the products $x_{ij} x_{kl}$ of the binary variables by continuous variables $y_{ijkl}$ ($y_{ijkl} := x_{ij} x_{kl}$) and get the following mixed integer linear programming formulation for the QAP (2):

$$\min \ \sum_{i,j=1}^{n} \sum_{k,l=1}^{n} c_{ijkl} y_{ijkl} \qquad (19)$$
$$\text{s.t.} \quad (x_{ij}) \in X_n, \qquad (20)$$
$$\sum_{i=1}^{n} y_{ijkl} = x_{kl}, \quad j, k, l = 1, \ldots, n, \qquad (21)$$
$$\sum_{j=1}^{n} y_{ijkl} = x_{kl}, \quad i, k, l = 1, 2, \ldots, n, \qquad (22)$$
$$\sum_{k=1}^{n} y_{ijkl} = x_{ij}, \quad i, j, l = 1, \ldots, n, \qquad (23)$$
$$\sum_{l=1}^{n} y_{ijkl} = x_{ij}, \quad i, j, k = 1, 2, \ldots, n, \qquad (24)$$
$$y_{ijij} = x_{ij}, \quad i, j = 1, 2, \ldots, n, \qquad (25)$$
$$0 \leq y_{ijkl} \leq 1, \quad i, j, k, l = 1, 2, \ldots, n. \qquad (26)$$

This mixed integer program has $n^4$ real variables, $n^2$ binary variables and $n^4 + 4n^3 + n^2 + 2n$ constraints. For obtaining a lower bound, Frieze and Yadegar considered a Lagrangean relaxation of this mixed integer program obtained by relaxing the constraints (21) and (22), and solved it approximately by applying subgradient optimization techniques. They showed that the solution of the Lagrangean relaxation is larger than all lower bounds derived from reduction techniques applied to the Gilmore-Lawler bound for the QAP (see Section 6.1). From a result of Geoffrion [75] it follows that the solution of the Lagrangean relaxation equals the solution of the continuous relaxation of the mixed integer program (19)-(26).

It is interesting to notice here that the gap between the optimal value of this continuous relaxation and the optimal value of the QAP can be enormous. Dyer, Frieze, and McDiarmid [60] showed for QAPs whose coefficients $c_{ijkl}$ are independent random variables uniformly distributed on [0, 1] that the expected optimal value of the above mentioned linearization has a size of $O(n)$. On the other hand, the expected optimal value of such QAPs increases with high probability as $O(n^2)$, as shown by Burkard and Fincke [32]. Consequences of this asymptotic behavior will be discussed in some detail in Section 12. No similar asymptotic result is known for the continuous relaxation of the linearization due to Adams and Johnson [3], which is presented in the following section.

4.4 Adams and Johnson Linearization

Adams and Johnson presented in [3] a new 0-1 linear integer programming formulation for the QAP, which resembles to a certain extent the linearization of Frieze and Yadegar. It is based on the linearization technique for general 0-1 polynomial programs introduced by Adams and Sherali in [4, 5]. The QAP with array of coefficients $C = (c_{ijkl})$ is proved to be equivalent to the following mixed 0-1 linear program:

$$\min \ \sum_{i,j=1}^{n} \sum_{k,l=1}^{n} c_{ijkl} y_{ijkl} \qquad (27)$$
$$\text{s.t.} \quad (x_{ij}) \in X_n,$$
$$\sum_{i=1}^{n} y_{ijkl} = x_{kl}, \quad j, k, l = 1, \ldots, n,$$
$$\sum_{j=1}^{n} y_{ijkl} = x_{kl}, \quad i, k, l = 1, 2, \ldots, n,$$
$$y_{ijkl} = y_{klij}, \quad i, j, k, l = 1, \ldots, n, \qquad (28)$$
$$y_{ijkl} \geq 0, \quad i, j, k, l = 1, 2, \ldots, n,$$

where each $y_{ijkl}$ represents the product $x_{ij} x_{kl}$. The above formulation contains $n^2$ binary variables $x_{ij}$, $n^4$ continuous variables $y_{ijkl}$, and $n^4 + 2n^3 + 2n$ constraints, excluding the nonnegativity constraints on the continuous variables. Although, as noted by Adams and Johnson [3], a significantly smaller formulation in terms of both variables and constraints could be obtained, the structure of the continuous relaxation of the above formulation is favorable for solving it approximately by means of the Lagrangean dual. (See Section 6.2 for more information.)

The theoretical strength of the linearization (27) lies in the fact that the constraints of the continuous relaxations of previous linearizations can be expressed as linear combinations of the constraints of the continuous relaxation of (27), see [3, 98]. Moreover, many of the previously published lower-bounding techniques can be explained based on the Lagrangean dual of this relaxation. For more details on this topic we refer to Section 6.2.

As noted by Adams and Johnson [3], the constraint set of (27) describes a solution matrix $Y$ which is the Kronecker product of two permutation matrices (i.e., $Y = X \otimes X$ where $X \in X_n$), and hence this formulation of the QAP is equivalent to (11).

5 QAP Polytopes

A polyhedral description of the QAP and of some of its relatives has recently been investigated by Barvinok [12], Jünger and Kaibel [100, 101], Kaibel [102], and Padberg and Rijal [142, 161]. Although still in an early stage, the existing polyhedral theory around the QAP already counts a number of results concerning basic features like dimensions, affine hulls, and valid and facet defining inequalities for the general QAP polytope and the symmetric QAP polytope.

The linearization of Frieze and Yadegar introduced in the previous section can be used as a starting point for the definition of the QAP polytope. The QAP polytope is defined as the convex hull of all 0-1 vectors $(x_{ij}, y_{ijkl})$, $1 \leq i, j, k, l \leq n$, which are feasible solutions of the MILP formulation of Frieze and Yadegar [70].

Another possibility to introduce the QAP polytope is the formulation of the QAP as a graph problem, as proposed by Jünger and Kaibel [100]. This formulation provides some additional insight into the problem and allows an easier use of some technical tools, e.g. projections and affine transformations. The latter lead to a better understanding of the relationship between the general QAP polytope and related polytopes, e.g. the symmetric QAP polytope, or well studied polytopes of other combinatorial optimization problems like the traveling salesman polytope or the cut polytope (see [102]).

For each $n \in \mathbb{N}$ consider a graph $G_n = (V_n, E_n)$ with vertex set $V_n = \{(i,j)\colon 1 \leq i, j \leq n\}$ and edge set $E_n = \{((i,j),(k,l))\colon i \neq k,\ j \neq l\}$. Clearly, the maximal cliques in $G_n$ have cardinality $n$ and correspond to the permutation matrices. Given an instance of the Lawler QAP with coefficients $c_{ijkl}$ and linear term coefficients $b_{ij}$, we introduce $b_{ij}$ as vertex weights and $c_{ijkl}$ as weight of the edge $((i,j),(k,l))$. Solving the above QAP instance is equivalent to finding a maximal clique with minimum total vertex and edge weight. For each clique $C$ in $G_n$ with $n$ vertices we denote its incidence vector by $(x^C, y^C)$, where $x^C \in \mathbb{R}^{n^2}$, $y^C \in \mathbb{R}^{n^2(n-1)^2/2}$, and

$$x^C_{ij} = \begin{cases} 1 & \text{if } (i,j) \in C, \\ 0 & \text{otherwise,} \end{cases} \qquad y^C_{ijkl} = \begin{cases} 1 & \text{if } (i,j), (k,l) \in C, \\ 0 & \text{otherwise.} \end{cases}$$

The QAP polytope, denoted by $\mathrm{QAP}_n$, is then given by

$$\mathrm{QAP}_n := \mathrm{conv}\{(x^C, y^C)\colon C \text{ is a clique with } n \text{ vertices in } G_n\}.$$



It turns out that the traveling salesman polytope and the linear ordering polytope are projections of $\mathrm{QAP}_n$, and that $\mathrm{QAP}_n$ is a face of the Boolean quadric polytope, see [102].

Barvinok [12], Padberg and Rijal [142], and Jünger and Kaibel [100] have independently computed the dimension of $\mathrm{QAP}_n$, and have shown that the inequalities $y_{ijkl} \geq 0$, $i \neq k$, $j \neq l$, are facet defining. (These are usually called trivial facets of $\mathrm{QAP}_n$.) Moreover, Padberg and Rijal [142], and Jünger and Kaibel [100] have independently shown that the affine hull of $\mathrm{QAP}_n$ is described by the following equations, which are linearly independent:

$$\sum_{i=1}^{n} x_{ij} = 1, \quad 1 \leq j \leq n-1, \qquad (29)$$
$$\sum_{j=1}^{n} x_{ij} = 1, \quad 1 \leq i \leq n, \qquad (30)$$
$$-x_{kl} + \sum_{i=1}^{k-1} y_{ijkl} + \sum_{i=k+1}^{n} y_{klij} = 0, \quad \text{for } 1 \leq j \neq l \leq n,\ 1 \leq k \leq n-1, \text{ or } 1 \leq l < j \leq n,\ k = n, \qquad (31)$$
$$-x_{ij} + \sum_{l=1}^{j-1} y_{ijkl} + \sum_{l=j+1}^{n} y_{ijkl} = 0, \quad \text{for } 1 \leq j \leq n,\ 1 \leq i \leq n-3,\ i < k \leq n-1, \text{ or } 1 \leq j \leq n-1,\ i = n-2,\ k = n-1, \qquad (32)$$
$$-x_{kj} + \sum_{l=1}^{j-1} y_{ilkj} + \sum_{l=j+1}^{n} y_{ilkj} = 0, \quad \text{for } 1 \leq j \leq n-1,\ 1 \leq i \leq n-3,\ i < k \leq n-1. \qquad (33)$$
Summarizing, we get the following theorem:

Theorem 5.1 (Barvinok [12], 1992, Jünger and Kaibel [100], 1996, Padberg and Rijal [142], 1996)

(i) The affine hull of the QAP polytope $\mathrm{QAP}_n$ is given by the linear equations (29)-(33). These equations are linearly independent and the rank of the system is $2n(n-1)^2 - (n-1)(n-2)$, for $n \geq 3$.

(ii) The dimension of $\mathrm{QAP}_n$ is equal to $1 + (n-1)^2 + n(n-1)(n-2)(n-3)/2$, for $n \geq 3$.

(iii) The inequalities $y_{ijkl} \geq 0$, $i < k$, $j \neq l$, define facets of $\mathrm{QAP}_n$.



Padberg and Rijal [142] additionally identified two classes of valid inequalities for $\mathrm{QAP}_n$, the clique inequalities and the cut inequalities, where the terminology is related to the graph $G_n$. The authors identify some conditions under which the cut inequalities are not facet defining. It is an open problem, however, to identify facet defining inequalities within these classes. A larger class of valid inequalities, the so-called box inequalities, has been described by Kaibel [102]. Those inequalities are obtained by exploiting the relationship between the Boolean quadric polytope and the QAP polytope. A nice feature of the box inequalities is that it can be decided efficiently whether they are facet defining or not, and in the latter case some facet defining inequality which dominates the corresponding box inequality can be derived.
Similar results have been obtained for the symmetric QAP polytope, $\mathrm{SQAP}_n$, arising in the case that at least one of the coefficient matrices of the given QAP (matrices $F$, $D$ in (1)) is symmetric. The definition of $\mathrm{SQAP}_n$ is given by means of a hypergraph $H_n = (V_n, F_n)$, where $V_n$ is the same set of vertices as in graph $G_n$ and $F_n$ is the set of hyperedges $\{(i,j), (k,l), (i,l), (k,j)\}$ for all $i \neq k$, $j \neq l$. A set $C \subseteq V_n$ is called a clique in $H_n$ if it is a clique in $G_n$. Again, the incidence vector $(x^C, y^C)$ of a clique $C$ is introduced by

$$x^C_{ij} = \begin{cases} 1 & \text{if } (i,j) \in C, \\ 0 & \text{otherwise,} \end{cases} \qquad y^C_{ijkl} = \begin{cases} 1 & \text{if } (i,j), (k,l) \in C \text{ or } (i,l), (k,j) \in C, \\ 0 & \text{otherwise,} \end{cases}$$

for $i < k$, $j < l$. Here, $x^C \in \mathbb{R}^{n^2}$ and $y^C \in \mathbb{R}^{n^2(n-1)^2/4}$. The polytope $\mathrm{SQAP}_n$ is then defined as

$$\mathrm{SQAP}_n := \mathrm{conv}\{(x^C, y^C)\colon C \text{ is a clique with } n \text{ vertices in } G_n\}.$$

Padberg and Rijal [142] and Jünger and Kaibel [101] showed that the following system of equations (34)-(37) offers a minimal linear description of the affine hull of $\mathrm{SQAP}_n$:

$$\sum_{j=1}^{n} x_{ij} = 1, \quad 1 \leq i \leq n, \qquad (34)$$
$$\sum_{i=1}^{n} x_{ij} = 1, \quad 1 \leq j \leq n-1, \qquad (35)$$
$$-x_{ij} - x_{kj} + \sum_{l=1}^{j-1} y_{ilkj} + \sum_{l=j+1}^{n} y_{ijkl} = 0, \quad 1 \leq i < k \leq n,\ 1 \leq j \leq n, \qquad (36)$$
$$-x_{kj} - x_{kl} + \sum_{i=1}^{k-1} y_{ijkl} + \sum_{i=k+1}^{n} y_{kjil} = 0, \quad 1 \leq k \leq n,\ 1 \leq j < l \leq n-1. \qquad (37)$$

Jünger and Kaibel [101] proved a conjecture of Padberg and Rijal concerning the dimension of $\mathrm{SQAP}_n$. They also introduced a class of facet defining inequalities, the so-called curtain inequalities. The separation problem for these inequalities has been shown to be NP-hard.
By summarizing these results we get the following theorem:

Theorem 5.2 (Jünger and Kaibel [101], 1996, Padberg and Rijal [142], 1996)

(i) The affine hull of the symmetric QAP polytope $\mathrm{SQAP}_n$ is described by the linear equations (34)-(37). These equations are linearly independent and the rank of the system is $n^2(n-2) + 2n - 1$.

(ii) The dimension of $\mathrm{SQAP}_n$ is equal to $(n-1)^2 + n^2(n-3)^2/4$.

(iii) The inequalities $y_{ijkl} \geq 0$ for $i < k$, $j < l$, and $x_{ij} \geq 0$ for $1 \leq i, j \leq n$, define facets of $\mathrm{SQAP}_n$.

(iv) For each $i < k$ and for all $J \subseteq \{1, 2, \ldots, n\}$ the row curtain inequalities

$$- \sum_{j \in J} x_{ij} + \sum_{j, l \in J,\ j < l} y_{ijkl} \leq 0$$

are valid for $\mathrm{SQAP}_n$. For each $j < l$ and for all $I \subseteq \{1, 2, \ldots, n\}$ the column curtain inequalities

$$- \sum_{i \in I} x_{ij} + \sum_{i, k \in I,\ i < k} y_{ijkl} \leq 0$$

are valid for $\mathrm{SQAP}_n$.

All curtain inequalities with $3 \leq |J| \leq n-3$ (respectively $3 \leq |I| \leq n-3$) define facets of $\mathrm{SQAP}_n$. The other curtain inequalities define faces which are contained in trivial facets of $\mathrm{SQAP}_n$.

Finally, there are some additional results concerning the affine descrip-
tion and the facial structure of polytopes of special versions of sparse QAPs,
e.g. sparse Koopmans-Beckmann QAPs, see Kaibel [102]. The idea is to
take advantage of the sparsity for a better analysis and description of the
related polytopes. These investigations, however, are still in their infancy.

6 Lower Bounds
Lower bounding techniques are used within implicit enumeration algorithms,
such as branch and bound, to perform a limited search of the feasible region
of a minimization problem, until an optimal solution is found. A more lim-
ited use of lower bounding techniques concerns the evaluation of the perfor-
mance of heuristic algorithms by providing a relative measure of proximity
of the suboptimal solution to the optimum. In comparing lower bounding
techniques, the following criteria should be taken into consideration:

• Complexity of computing the lower bound.

• Tightness of the lower bound (i.e., "small" gap between the bound and
the optimum solution).

• Efficiency in computing lower bounds for subsets of the original feasible set.

Since there is no clear ranking of the performance of the lower bounds that
will be discussed below, all of the above criteria should be kept in mind while
reading the following paragraphs. Considering the asymptotic behavior of
the QAP (see Section 12) it should be fair to assume that the tightness
of the lower bound probably dominates all of the above criteria. In other
words, if there is a large number of feasible solutions close to the optimum,
then a lower bound which is not tight enough, will fail to eliminate a large
number of subproblems in the branching process.

6.1 Gilmore-Lawler Type Lower Bounds

Based on the formulation of the general QAP as an LAP of dimension $n^2$ stated in (11), Gilmore [77] and Lawler [118] derived lower bounds for the QAP by constructing a solution matrix $Y$ in the process of solving a series of LAPs. If the resulting matrix $Y$ is a permutation matrix, then the objective function value yielded by $Y$ is optimal; otherwise it is bounded from below by $\langle C, Y \rangle$. In this section we briefly describe a number of bounding procedures which exploit this basic idea.

The Gilmore-Lawler bound

Consider an instance of the Lawler QAP (2) with coefficients $C = (c_{ijkl})$, and partition the array $C$ into $n^2$ matrices of dimension $n \times n$, $C^{(i,j)} = (c_{ijkl})$, one for each fixed pair $(i,j)$, $i, j = 1, 2, \ldots, n$. Each matrix $C^{(i,j)}$ essentially contains the costs associated with the assignment $x_{ij} = 1$. Partition the solution array $Y = (y_{ijkl})$ also into $n^2$ matrices, $Y^{(i,j)} = (y_{ijkl})$, for fixed $i, j = 1, 2, \ldots, n$.

For each pair $(i,j)$, $1 \leq i, j \leq n$, solve the LAP with cost matrix $C^{(i,j)}$ and denote its optimal value by $l_{ij}$:

$$l_{ij} = \min \ \sum_{k=1}^{n} \sum_{l=1}^{n} c_{ijkl} y_{ijkl} \qquad (38)$$
$$\text{s.t.} \quad \sum_{k=1}^{n} y_{ijkl} = 1, \quad l = 1, 2, \ldots, n,$$
$$\sum_{l=1}^{n} y_{ijkl} = 1, \quad k = 1, 2, \ldots, n,$$
$$y_{ijij} = 1, \qquad (39)$$
$$y_{ijkl} \in \{0, 1\}, \quad k, l = 1, 2, \ldots, n. \qquad (40)$$

Observe that constraint (39) essentially reduces the problem to an LAP of dimension $n-1$ with cost matrix obtained from $C^{(i,j)}$ by deleting its $i$-th row and $j$-th column. For each $i, j$, denote by $\hat{Y}^{(i,j)}$ the optimal solution matrix of the above LAP.

The Gilmore-Lawler lower bound GLB$(C)$ for the Lawler QAP with coefficient array $C$ is given by the optimal value of the LAP of size $n$ with cost matrix $(l_{ij})$:

$$\mathrm{GLB}(C) = \min \ \sum_{i=1}^{n} \sum_{j=1}^{n} l_{ij} x_{ij} \qquad (41)$$
$$\text{s.t.} \quad (x_{ij}) \in X_n.$$

Denote by $X^* = (x^*_{ij})$ the optimal solution matrix of this last LAP. If $\frac{1}{n} \sum_{i,j} x^*_{ij} \hat{Y}^{(i,j)} \in X_n$, then the array $Y^* = (y^*_{ijkl})$ with matrices $Y^{(i,j)*} = x^*_{ij} \hat{Y}^{(i,j)}$ for all $i, j$, $1 \leq i, j \leq n$, is a Kronecker product of two permutation
matrices of dimension $n$, and hence an optimal solution of the considered QAP. Since each LAP can be solved in $O(n^3)$ time, the above lower bound for the Lawler QAP (2) of dimension $n$ can be computed in $O(n^5)$ time. For the more special Koopmans-Beckmann QAP (1), where the quadratic costs $c_{ijkl}$ are given as entry-wise products of two matrices $F = (f_{ij})$ and $D = (d_{ij})$, $c_{ijkl} = f_{ij} d_{kl}$ for all $i, j, k, l$, the computational effort can be reduced to $O(n^3)$. This is due to the following well known result of Hardy, Littlewood, and Pólya [92]:
Proposition 6.1 (Hardy, Littlewood and Pólya [92], 1952)
Given two $n$-dimensional real vectors $a = (a_i)$, $b = (b_i)$ such that $0 \leq a_1 \leq a_2 \leq \cdots \leq a_n$ and $b_1 \geq b_2 \geq \cdots \geq b_n \geq 0$, the following inequalities hold for any permutation $\phi$ of $1, 2, \ldots, n$:

$$\sum_{i=1}^{n} a_i b_i \leq \sum_{i=1}^{n} a_i b_{\phi(i)} \leq \sum_{i=1}^{n} a_i b_{n-i+1}.$$

Given two arbitrary nonnegative vectors $a, b \in \mathbb{R}^n$, let $\phi$ be a permutation which sorts $a$ non-decreasingly and $\psi$ a permutation which sorts $a$ non-increasingly. Moreover, let $\pi$ be a permutation which sorts $b$ non-increasingly. We denote

$$\langle a, b \rangle_- := \sum_{i=1}^{n} a_{\phi(i)} b_{\pi(i)}, \qquad \langle a, b \rangle_+ := \sum_{i=1}^{n} a_{\psi(i)} b_{\pi(i)}. \qquad (42)$$
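Both quantities in (42) are obtained by sorting, which is what makes the Koopmans-Beckmann case cheap; a short sketch (function name is ours):

    import numpy as np

    def minmax_scalar_products(a, b):
        # <a,b>_-: a ascending against b descending; <a,b>_+: both descending.
        a, b = np.sort(a), np.sort(b)[::-1]
        return float(a @ b), float(a[::-1] @ b)

    lo, hi = minmax_scalar_products(np.array([3., 1., 2.]), np.array([0.5, 2., 1.]))
    # lo = 1*2 + 2*1 + 3*0.5 = 5.5 and hi = 3*2 + 2*1 + 1*0.5 = 8.5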

Consider now an instance (1) of the Koopmans-Beckmann QAP. This can be written as a Lawler QAP of the form (2) by setting

$$c_{ijkl} := \begin{cases} f_{ik} d_{jl} & \text{for } i \neq k,\ j \neq l, \\ f_{ii} d_{jj} + b_{ij} & \text{for } i = k,\ j = l. \end{cases}$$

Each matrix $C^{(i,j)}$ of the array $C$ is then given by $C^{(i,j)} = (f_{ik} d_{jl})$. Therefore, instead of solving $n^2$ LAPs we can easily compute the values $l_{ij}$ by applying Proposition 6.1, as

$$l_{ij} = f_{ii} d_{jj} + b_{ij} + \langle \hat{f}_{(i,\cdot)}, \hat{d}_{(j,\cdot)} \rangle_-, \qquad (43)$$

where $\hat{f}_{(i,\cdot)}, \hat{d}_{(j,\cdot)} \in \mathbb{R}^{n-1}$ are the $(n-1)$-dimensional vectors obtained from the $i$-th row of $F$ and the $j$-th row of $D$ by deleting the $i$-th and the $j$-th element, respectively. Finally, by solving the LAP with cost matrix $(l_{ij})$ as in (41), we obtain the Gilmore-Lawler lower bound for the Koopmans-Beckmann QAP. The appropriate sorting of the rows and columns of $F$ and $D$ can be done in $O(n^2 \log n)$ time. Then the computation of all $l_{ij}$ takes $O(n^3)$ time, and the same amount of time is needed to solve the last LAP.
Similar bounds have been proposed by Christofides and Gerrard [48]. The basic idea relies again on decomposing the given QAP into a number of subproblems which can be solved efficiently. First solve each subproblem, then build a matrix with the optimal values of the subproblems, and solve an LAP with that matrix as cost matrix to obtain a lower bound for the given QAP. Christofides and Gerrard decompose the Koopmans-Beckmann QAP$(F, D)$ based on isomorphic subgraphs of graphs whose weighted adjacency matrices are $F$ and $D$. The GLB is obtained as a special case if these subgraphs are stars, and it generally outperforms the bounds obtained by employing other subgraphs, like single edges or double stars (see also [74]).

The Gilmore-Lawler bound is simple to compute, but it deteriorates fast as $n$ increases. The quality of this lower bound can be improved if the given problem is transformed such that the contribution of the quadratic term in the objective function is decreased by moving costs to the linear term. This is the aim of the so-called reduction methods.

Reduction methods

Consider a Lawler QAP as in (2), and assume that $b_{ij} = 0$ for all $i, j$. By the above discussion, the GLB is given as the solution of the following LAP:

$$\min \ \sum_{i=1}^{n} \sum_{j=1}^{n} (l_{ij} + c_{ijij}) x_{ij}$$
$$\text{s.t.} \quad (x_{ij}) \in X_n. \qquad (44)$$

We want to decompose the cost coefficients in the quadratic term of (2) and transfer as much of their value as possible into the linear term, i.e., into the coefficients $c_{ijij}$. This would yield a tighter lower bound because the LAP can be solved exactly. This procedure is known as reduction and was introduced by Conrad [54]. Reductions have been investigated by many researchers (see [21, 162, 62, 70]). The general idea is to decompose each quadratic cost coefficient into several terms so as to guarantee that some of them end up being linear cost coefficients and can be moved into the linear term of the objective function. Consider the following general decomposition scheme:

$$\text{D-1:} \quad c_{ijkl} = \bar{c}_{ijkl} + e_{ijk} + g_{ijl} + h_{ikl} + t_{jkl}, \quad i \neq k,\ j \neq l,$$

where $e, g, h, t \in \mathbb{R}^{n^3}$. Substituting the above in the objective function of (2) we obtain a new QAP which is equivalent to the given one and whose objective function has a quadratic and a linear part. (Formulas for the coefficients of this new QAP can be found in the literature, e.g. [70].) For the quadratic term we can compute the Gilmore-Lawler bound. Then we add it to the optimal value of the linear part in order to obtain a lower bound for the QAP.

In the case of the Koopmans-Beckmann QAP the general decomposition scheme is

$$\text{D-2:} \quad f_{ij} = \bar{f}_{ij} + \lambda_i + \mu_j, \quad i \neq j,$$
$$d_{kl} = \bar{d}_{kl} + \eta_k + \phi_l, \quad k \neq l,$$

where $\lambda, \mu, \eta, \phi \in \mathbb{R}^n$.

Frieze and Yadegar [70] have shown that the inclusion of vectors $h$ and $t$ in D-1, or similarly the inclusion of vectors $\mu$ and $\phi$ in D-2, does not affect the value of the lower bound. Therefore these vectors are redundant.
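As a concrete illustration of D-2, one simple (and by no means optimal) reduction subtracts from each row of $F$ a constant $\lambda_i$ and credits it to the linear term. The sketch below (the choice of $\lambda$ and all names are ours) verifies that the reduced instance is equivalent to the original one:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 5
    F, D = rng.random((2, n, n))
    lam = np.array([np.delete(F[i], i).min() for i in range(n)])  # one simple choice of lambda
    F_red = F - lam[:, None]                 # reduced flows (only off-diagonal entries matter)
    # facility i placed at location k contributes lam_i * sum_{l != k} d_kl to the linear term
    B_new = np.outer(lam, D.sum(axis=1) - np.diag(D))

    phi = rng.permutation(n)
    orig = sum(F[i, j] * D[phi[i], phi[j]] for i in range(n) for j in range(n) if i != j)
    red = sum(F_red[i, j] * D[phi[i], phi[j]] for i in range(n) for j in range(n) if i != j) \
          + sum(B_new[i, phi[i]] for i in range(n))
    assert np.isclose(orig, red)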
As mentioned also in Section 4.3, Frieze and Yadegar derived lower bounds for the QAP based on a Lagrangean relaxation of the mixed integer linear programming formulation (19)-(26). By including the constraints (21) and (22) in the objective function (19), using vectors $e$ and $g$ as Lagrangean multipliers, we get the following Lagrangean problem:

$$L(e, g) = \min \ \Big\{ \sum_{ijkl} c_{ijkl} y_{ijkl} + \sum_{jkl} e_{jkl} \big( x_{kl} - \sum_{i} y_{ijkl} \big) + \sum_{ikl} g_{ikl} \big( x_{kl} - \sum_{j} y_{ijkl} \big) \Big\}$$
$$= \min \ \sum_{ijkl} (c_{ijkl} - e_{jkl} - g_{ikl}) y_{ijkl} + \sum_{ij} \Big( \sum_{k} e_{kij} + \sum_{l} g_{lij} \Big) x_{ij}$$
$$\text{s.t. constraints (20), (23), \ldots, (26).}$$

As proved in [70], for any choice of $e$ and $g$, the optimal value of the above Lagrangean problem equals the value of the GLB obtained after decomposing the coefficients $c_{ijkl}$ by using only the vectors $e$ and $g$ in D-1. Therefore, $\max_{e,g} L(e, g)$ constitutes a lower bound for the QAP which is larger (i.e., better) than all GLBs obtained after applying reduction methods according to D-1 (D-2). Frieze and Yadegar propose two subgradient algorithms to approximately solve $\max_{e,g} L(e, g)$, and obtain two lower bounds, denoted by FY1 and FY2. These bounds seem to be sharper than the previously reported Gilmore-Lawler bounds obtained after applying reductions.

Bounding techniques based on reformulations

Consider the Lawler QAP with a linear term in the objective function:

$$\min \ \sum_{i,k=1}^{n} \sum_{j,l=1}^{n} c_{ijkl} x_{ik} x_{jl} + \sum_{i,k=1}^{n} b_{ik} x_{ik}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_{ik} = 1, \quad 1 \leq k \leq n,$$
$$\sum_{k=1}^{n} x_{ik} = 1, \quad 1 \leq i \leq n,$$
$$x_{ik} \in \{0, 1\}, \quad 1 \leq i, k \leq n.$$

As already mentioned in Section 1, we assume without loss of generality that the coefficients $c_{ijkl}$, $1 \leq i, j, k, l \leq n$, are nonnegative.

A reformulation of this QAP is another QAP of the same form with new coefficients $c'_{ijkl}$, $1 \leq i, j, k, l \leq n$, and $b'_{ik}$, $1 \leq i, k \leq n$, such that for all permutation matrices $(x_{ij})$

$$\sum_{i,k=1}^{n} \sum_{j,l=1}^{n} c_{ijkl} x_{ik} x_{jl} + \sum_{i,k=1}^{n} b_{ik} x_{ik} = \sum_{i,k=1}^{n} \sum_{j,l=1}^{n} c'_{ijkl} x_{ik} x_{jl} + \sum_{i,k=1}^{n} b'_{ik} x_{ik}$$

holds. The basic idea is to derive a sequence of reformulations of the given problem by applying some "appropriate" reformulation rule. When we compute the GLB for each reformulation in the sequence, the best among these bounds is a valid bound for the original QAP. The reformulation rule is "appropriate" if the sequence of GLBs computed for the reformulations is monotonically nondecreasing. Usually, the construction of a new reformulation exploits the previous reformulations and the bounds obtained for them. Carraresi and Malucelli [40] have proposed the following scheme to derive the coefficients of the reformulation:

$$c'_{ijkl} = c_{ijkl} + \tau_{ijkl} - \alpha_{ijl} - \beta_{jkl} + \theta_{ik}, \quad 1 \leq i, j, k, l \leq n,$$
$$b'_{ik} = b_{ik} + \sum_{j=1}^{n} \alpha_{ijk} + \sum_{l=1}^{n} \beta_{ikl} - (n-1)\theta_{ik}, \quad 1 \leq i, k \leq n.$$

This type of bounding strategy has been proposed by Carraresi and Malucelli [39] and Assad and Xu [8]. The parameters $\alpha$, $\beta$, $\tau$ and $\theta$ are updated in each reformulation step. Their values are determined by making use of the lower bound obtained for the last reformulation and the optimal values and the dual variables of the linear assignment problems solved during the last GLB computation. Clearly, not all choices of the parameters $\tau$, $\alpha$, $\beta$ and $\theta$ in the above formulas produce a reformulation, but there are settings of those parameters which do so, as shown in [8, 39].
To illustrate the idea, consider the reformulation formulas proposed by Carraresi and Malucelli in [40]:

$$\tau^{(t+1)}_{ijkl} = c^{(t)}_{ijkl} - c^{(t)}_{jilk}, \qquad (45)$$
$$\alpha^{(t+1)}_{ijl} = u^{(t)}_{ijl}, \qquad (46)$$
$$\beta^{(t+1)}_{jkl} = v^{(t)}_{jkl}, \qquad (47)$$
$$\theta^{(t+1)}_{ik} = \frac{1}{n-1} \big( l^{(t)}_{ik} + u^{(t)}_i + v^{(t)}_k \big), \qquad (48)$$

for all $1 \leq i, j, k, l \leq n$. Here $t$ is an index which counts the reformulations, and $u^{(t)}_{ijl}$ and $v^{(t)}_{jkl}$ are the optimal values of the dual variables of the LAPs with cost matrices $(c^{(t)}_{ijkl})_{j,l}$, solved for each fixed pair $(i,k)$ during the GLB computation; let $l^{(t)}_{ik}$, $1 \leq i, k \leq n$, be the optimal values of these LAPs. Then $u^{(t)}_i$, $1 \leq i \leq n$, and $v^{(t)}_k$, $1 \leq k \leq n$, are the optimal values of the dual variables for the LAP with cost matrix $(l^{(t)}_{ik} + b^{(t)}_{ik})$ (i.e., the last LAP solved to compute the GLB of the $t$-th reformulation). The bound produced with these settings is often denoted by CMB in the literature. Clearly, the computation of CMB (as well as the computation of the bounds obtained by applying the reformulation schemes proposed in [8, 39]) involves $O(n^5)$ elementary operations per iteration.

The reformulation schemes generally produce bounds of good quality. However, these bounding techniques are quite time-consuming, as $n^2 + 1$ linear assignment problems per iteration have to be solved. Finally, it has been shown in [39] that in the case that $c_{ijkl} = c_{jilk}$ for all $1 \leq i, j, k, l \leq n$, the general reformulation scheme cannot produce lower bounds which are better than the optimal value of the continuous relaxation of the mixed integer programming formulation of Frieze and Yadegar.

Lower bounds for the QAP based on a dual formulation

More recently, another bounding procedure which shares the basic idea of the GLB has been proposed by Hahn and Grant [90, 91]. This procedure combines GLB ideas with reduction steps in a general framework which also works for the Lawler QAP (2). The resulting bound is denoted by HGB. Recall that we assume w.l.o.g. that all $c_{ijkl}$ in (2) are nonnegative. As described in 2.4, the four-dimensional array $C = (c_{ijkl})$ is thought of as an $n^2 \times n^2$ matrix composed of $n^2$ submatrices $C^{(i,j)}$, $1 \leq i, j \leq n$, where each $C^{(i,j)}$ is an $n \times n$ matrix given by $C^{(i,j)} = (c_{ijkl})$. This structure of $C$ complies with the structure of the Kronecker product $X \otimes X$, where $X$ is an $n \times n$ permutation matrix. The entries $c_{ijij}$ are called leaders. Clearly, there is only one leader in each matrix $C^{(i,j)}$. The objective function value corresponding to permutation $\phi$ consists of the sum of those entries $c_{ijkl}$ which correspond to 1-entries in the Kronecker product $X_\phi \otimes X_\phi$, where $X_\phi$ is the permutation matrix corresponding to permutation $\phi$. Hence, entries of the form $c_{ijil}$, $j \neq l$, or $c_{ijkj}$, $i \neq k$, do not contribute to the value of the objective function. Such entries are called disallowed entries. Entries which are not disallowed are said to be allowed.

The bounding procedure uses the following classes of operations acting on the matrix $(c_{ijkl})$:

(R1) Add a constant to all allowed entries of some row (column) of some submatrix $C^{(i,j)}$ and either subtract the same constant from the allowed entries of another row (column) of the same submatrix, or subtract it from the leader in that submatrix.

(R2) Add a constant to all allowed entries of some row (column) of the $n^2 \times n^2$ matrix $(c_{ijkl})$.

Clearly, operations of class R1 do not change the objective function; they just redistribute the entries of the submatrices $C^{(i,j)}$. Operations of class R2 add a constant to the objective function, and hence they maintain the order of permutations with respect to the corresponding values of the objective function. The main idea is then to transform $C$ by applying operations of the classes R1 and R2 so as to decrease the objective function by some amount, say $R$, and to preserve the nonnegativity of the entries of the transformed array $C'$. Then, clearly, $R$ is a lower bound for the optimal solution of the given QAP. If, moreover, the 0-entries in the transformed matrix $C'$ comply with the pattern of zeros in the Kronecker product $X_\phi \otimes X_\phi$ for some permutation matrix $X_\phi$, then $R$ is the optimal value of the original QAP and permutation $\phi$ is an optimal solution.
The procedure developed to find such a lower bound $R$, or possibly to solve the problem to optimality, is essentially similar to the Hungarian method for the linear assignment problem. It uses operations of classes R1 and R2 to redistribute the entries of $C$ so as to obtain a pattern of zeros which complies with the pattern of zeros of the Kronecker product $X \otimes X$ for some permutation matrix $X$. The whole process is a repeated computation of Gilmore-Lawler bounds on iteratively transformed problem data, where the transformations generalize the ideas of reduction methods. The time complexity of each iteration is basically that of the GLB computation for a Lawler QAP, i.e., $O(n^5)$.
A deeper investigation of this bounding procedure reveals that it is an iterative approach in which the dual of some LP relaxation of the original problem is solved and reformulated iteratively (see Karisch, Çela, Clausen and Espersen [104]). The reformulation step makes use of the information furnished by the preceding solution step. Some more details of this interpretation are given in Section 6.2.

As reported in [90], this bounding procedure has been tested on small and middle sized QAP instances from QAPLIB [34]. The computational results show an improved trade-off between quality of bounds and computation time when compared to other bounding techniques. Further computational results of Hahn et al. [91] show that it is promising to involve the HGB in branch and bound approaches.

6.2 Bounds Based on Linear Programming Relaxations

As we saw in Section 4, several mixed integer linear programming (MILP) formulations have been proposed for the QAP. Clearly, the optimal solution of the continuous relaxation of an MILP formulation is a lower bound for the optimal value of the corresponding QAP. Moreover, each feasible solution of the dual of this relaxation is also a lower bound. The identification of appropriate continuous relaxations of MILP formulations, and the development of solution methods to solve these relaxations or their duals, have been important aspects of research on the QAP.

In the context of lower bound computation two MILP formulations of the QAP play a special role: the formulation of Frieze and Yadegar [70] described in Section 4.3 and that of Adams and Johnson [3] described in Section 4.4. As we have already mentioned, Frieze and Yadegar consider a Lagrangean relaxation of their MILP formulation and develop two subgradient optimization based algorithms to approximately solve the latter. The resulting bounds, denoted by FY1 and FY2, respectively, perform better than the Gilmore-Lawler bound.
The Quadratic Assignment Problem 271

Adams and Johnson build upon the MILP formulation of Frieze and Yadegar and propose a slightly different MILP formulation. As shown in [3], the continuous relaxation of this formulation is tighter than the continuous relaxation of the formulation of Frieze and Yadegar, in the sense that the optimal value of the former may be strictly larger than that of the latter. Moreover, the constraints of the continuous relaxation of the formulation of Frieze and Yadegar can be obtained as linear combinations of the constraints of the continuous relaxation of the formulation of Adams and Johnson.

Adams and Johnson consider a Lagrangean relaxation of (27) obtained by adding the so-called complementary constraints (28) to the objective function with Lagrangean multipliers $\alpha_{ijkl}$. This Lagrangean relaxation, denoted by AJ$(\alpha)$, is given below:

$$\min \ \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j>i}}^{n} \sum_{k=1}^{n} \sum_{\substack{l=1 \\ l \neq k}}^{n} (c_{ijkl} - \alpha_{ikjl}) y_{ikjl} + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j<i}}^{n} \sum_{k=1}^{n} \sum_{\substack{l=1 \\ l \neq k}}^{n} (c_{ijkl} + \alpha_{jlik}) y_{ikjl} + \sum_{i=1}^{n} \sum_{k=1}^{n} b_{ik} x_{ik}$$

$$\text{(AJ}(\alpha)\text{)} \quad \text{s.t.} \quad \sum_{i=1}^{n} x_{ik} = 1, \quad 1 \leq k \leq n,$$
$$\sum_{k=1}^{n} x_{ik} = 1, \quad 1 \leq i \leq n,$$
$$\sum_{j=1}^{n} y_{ikjl} = x_{ik}, \quad 1 \leq i, k, l \leq n,$$
$$\sum_{l=1}^{n} y_{ikjl} = x_{ik}, \quad 1 \leq i, j, k \leq n,$$
$$x_{ik} \in \{0, 1\}, \quad 1 \leq i, k \leq n,$$
$$0 \leq y_{ikjl} \leq 1, \quad 1 \leq i, j, k, l \leq n.$$

Let $\theta(\alpha)$ denote the optimal value of AJ$(\alpha)$. Then $\max_\alpha \theta(\alpha)$ equals the optimal value of the continuous relaxation of (27). Adams and Johnson [3] show that for each fixed set of multipliers $\alpha$ the problem AJ$(\alpha)$ can be solved efficiently by solving $n^2 + 1$ LAPs, where $n$ is the size of the considered QAP. Moreover, they develop an iterative dual ascent procedure to approximately solve the above maximization problem. In each iteration, problem AJ$(\alpha)$ is solved to optimality and the optimal value $\theta(\alpha)$ is computed. Clearly, $\theta(\alpha)$ is a lower bound for the considered QAP. Then the multipliers $\alpha_{ijkl}$ are updated by using the information contained in the dual variables of the LAPs solved during the previous iteration. The algorithm

stops after having performed a prespecified number of iterations, and then,
clearly, the solution it outputs gives a lower bound for the original QAP. These
bounds are denoted by AJB. Adams and Johnson propose two updating
rules for the multipliers, one of them leading to a non-decreasing sequence
of lower bounds θ(α). In both cases the time complexity of this bounding
procedure is dominated by the solution of n² + 1 LAPs in each iteration and
amounts to O(n⁵) per iteration.
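To make this step concrete, the following sketch evaluates θ(α) for one fixed multiplier setting as n² + 1 LAPs, which is exactly the O(n⁵) computation performed in every dual ascent iteration. The array layout (cbar for the adjusted quadratic costs, lin for the linear costs a_ik b_ik) and the use of SciPy's linear_sum_assignment are our illustrative assumptions, not the implementation of [3]; the multiplier update itself is omitted.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def lap_value(cost):
        # optimal value of a linear assignment problem
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].sum()

    def theta(cbar, lin):
        # cbar[i, k, j, l]: adjusted quadratic cost of locating i at k and j at l
        # lin[i, k]: linear cost a_ik * b_ik
        n = lin.shape[0]
        z = np.zeros((n, n))
        for i in range(n):
            for k in range(n):
                # inner LAP over the pairs (j, l) with j != i and l != k
                sub = np.delete(np.delete(cbar[i, k], i, axis=0), k, axis=1)
                z[i, k] = lap_value(sub)
        # the (n^2 + 1)-st LAP couples the inner values with the linear costs
        return lap_value(lin + z)

    rng = np.random.default_rng(1)
    n = 4
    cbar = rng.random((n, n, n, n))
    lin = rng.random((n, n))
    print(theta(cbar, lin))

Each inner LAP costs O(n³), so one evaluation of θ(α) indeed takes O(n⁵) time.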
The strength of AJB relies on the fact that it generalizes and unifies
all Gilmore-Lawler-like bounds (see Section 6.1) but the HGB. Adams and
Johnson have shown that θ(0) equals the Gilmore-Lawler bound, whereas the
GLBs obtained after applying reductions, as well as the bounds of Carraresi
and Malucelli [39] and Assad and Xu [8], equal θ(α) for special settings of
the Lagrangean multipliers α_ikjl. From a practical point of view, numerical
experiments with instances from QAPLIB show that AJB generally outper-
forms the above mentioned bounds. However, according to the numerical
results reported in [3, 90], HGB outperforms AJB in terms of quality, while
having higher computation time requirements.
The theoretical relationship between AJB and HGB has been investi-
gated recently by Karisch, Çela, Clausen and Espersen [104]. It turns out
that unlike other Gilmore-Lawler-like bounds, HGB cannot be obtained by
applying the algorithm of Adams and Johnson to solve the Lagrangean re-
laxation. However, both AJB and HGB can be obtained as feasible solu-
tions of the dual of the continuous relaxation of the MILP formulation (27)
proposed by Adams and Johnson. Karisch et al. propose an iterative al-
gorithm to approximately solve this dual, and show that AJB, HGB, and
all other Gilmore-Lawler-like bounds can be obtained by applying this algo-
rithm with specific settings for the control parameters. Moreover, the same
authors identify a setting of parameters which seems to produce a bound
which is competitive with HGB in terms of quality and provides a better
time/quality trade-off. This bound denoted by KCCEB seems to be espe-
cially suitable for use within branch and bound algorithms (see [104] for
more details).
Concerning the solution to optimality of the continuous relaxation of
(27), Adams and Johnson point out that the resulting linear program (LP) is
highly degenerate, and degeneracy poses a problem for primal approaches.
An effort to solve this LP relaxation has been made by Resende, Ramakr-
ishnan and Drezner [158]. These authors use an interior point approach to
solve the LP relaxation for QAP instances of size smaller than or equal to
30 taken from QAPLIB [34]. For larger instances the memory requirements

become prohibitive. The bounds of Resende et al., frequently denoted by


IPLP, turn out to be the best existing bounds for a large number of test
instances from QAPLIB. However, the computation of the IPLP bounds
requires very high computation times (see [158]) and therefore, the IPLP
bounds cannot be used within branch and bound algorithms, despite their
good quality.
The HGB bound of Hahn et al. [90] and the KCCEB bound of Karisch et
al. [104] seem to be the only linearization bounds comparable with IPLP, in
terms of tightness. Moreover, generally, HGB can be computed much faster
than IPLP, whereas KCCEB seems to be computable at least one order of
magnitude faster than IPLP (see [104]).

6.3 Variance Reduction Lower Bounds


The variance reduction lower bounds were introduced by Li, Pardalos, Ra-
makrishnan and Resende in [123]. Consider an instance of the Koopmans-
Beckmann QAP of size n, with flow and distance matrices F = (fij) and
D = (d_ij). Partition both matrices as F = F₁ + F₂ and D = D₁ + D₂, where
F₁ = (f_ij^(1)), F₂ = (f_ij^(2)) and D₁ = (d_ij^(1)), D₂ = (d_ij^(2)), and define a new
n × n matrix L = (l_ij) by solving the following n² LAPs

(49)

It has been shown in [123] that the solution of the LAP with cost matrix L
constitutes a lower bound for the considered QAP. The problem of concern
now is to choose F₁, F₂ and D₁, D₂ such that the resulting lower bound is
maximized. Notice that by setting F₁ = F and D₁ = D we obtain the GLB.
Given an m × n matrix M, denote its rows by m_(i·), i = 1, ..., m, and its
columns by m_(·j), j = 1, ..., n. Think of M as a data set of mn elements
m_ij, and define an average γ(M) and a variance V(M) as

γ(M) := (1/(mn)) Σ_{i=1}^m Σ_{j=1}^n m_ij,    V(M) := Σ_{i=1}^m Σ_{j=1}^n (γ(M) - m_ij)².

Also define the total variance

T(M, λ) := λ Σ_{i=1}^m V(m_(i·)) + (1 - λ) V(M),    λ ∈ [0, 1].

The term V(m_(i·)) stands for the variance of m_(i·), treated as a 1 × n matrix.
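As a small numerical companion to these definitions, the following sketch evaluates γ(M), V(M) and T(M, λ) for a toy matrix; the function names are ours.

    import numpy as np

    def gamma(M):
        return M.mean()

    def variance(M):
        # V(M): sum of squared deviations from the average gamma(M)
        return ((gamma(M) - M) ** 2).sum()

    def total_variance(M, lam):
        # T(M, lambda): row variances weighted against the overall variance
        row_part = sum(variance(M[i:i + 1, :]) for i in range(M.shape[0]))
        return lam * row_part + (1.0 - lam) * variance(M)

    M = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(gamma(M), variance(M), total_variance(M, 0.5))   # 2.5 5.0 3.0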
Li et al. observed that as the variances of the matrices F and D decrease,
the GLB increases. Moreover, the GLB becomes maximum if the variances of
the rows of the matrices equal zero. The partition scheme considered is of
the form F₁ = F + ΔF, F₂ = -ΔF, and D₁ = D + ΔD, D₂ = -ΔD. We
will only describe how ΔF is obtained; ΔD is then obtained in an analogous
way. Thus, the problem is to find a matrix ΔF such that the variances of
F₁ and F₂ and the sums of the variances of the rows of F₁ and F₂ are
minimized. This problem can be stated mathematically as

min_ΔF  θ T(F + ΔF, λ) + (1 - θ) T(-ΔF, λ),   (50)

where ΔF = (δ_ij) is an n × n matrix and θ ∈ [0, 1] is a parameter. Two
approximate solutions,

R-1:  δ_ij = θ(f_nn - f_ij) + δ_nn,  i, j = 1, ..., n,

R-2:  δ_ij = θ(γ(f_(·n)) - γ(f_(·j))),  i, j = 1, ..., n,

where δ_nn is arbitrary, were proposed in [123]. The matrix ΔD is constructed
in the same way. After the partitioning of the matrices F and D according
to R-1 or R-2, the solution of the LAP with cost matrix L = (l_ij) (where
the l_ij are defined in (49)) yields the bounds LB1(θ) or LB2(θ), respectively.
Notice that R-2 is obtained under the assumption that the columns of the
matrix ΔF (ΔD) are constant. This fact can be used to speed up the compu-
tation of LB2(θ) by applying Proposition 6.1.
In the case of computing LB1(θ), the direct approach would be to solve the n²
LAPs defined in (49), and this would require O(n⁵) elementary operations.
A different approach is to calculate lower bounds l̃_ij for the values l_ij, i, j =
1, ..., n, and then to solve the LAP with cost matrix (l̃_ij).
It takes O(n³) time to compute all l̃_ij and the same time to solve the final
LAP. Thus, the variance reduction lower bound can be computed in O(n³)
time.
time. These lower bounds perform well on QAPs with input matrices that
have high variances, but their performance reduces to that of the GLB when
the variance of the matrices is small.
It is worth noting that there is also a closed form solution to problem
(50), given by Jansen [96]. However, as reported in [123], using that closed
form to compute the lower bounds poses implementation obstacles.

6.4 Eigenvalue Based Lower Bounds


These bounds were introduced by Finke, Burkard, and Rendl [67], and can
be applied to the Koopmans-Beckmann QAP in (1). They are based on the
relationship between the objective function value of the QAP in the trace
formulation (9) and the eigenvalues of its coefficient matrices. When de-
signed and implemented carefully, these techniques produce bounds of good
quality in comparison with Gilmore-Lawler-like bounds or, more generally,
with bounds based on linear relaxations. However, these bounds are quite
expensive in terms of computation time requirements and therefore are not
appropriate for use within branch and bound algorithms. Moreover, these
bounds deteriorate quickly when lower levels of the branch and bound tree
are searched, as shown by Karisch, Clausen, Perregaard, and Rendl [49].
Since the introduction of the method in [67], many improvements and
generalizations have appeared [86, 87, 88, 89, 154, 155]. There is a resem-
blance with the Gilmore-Lawler based lower bounds in the sense that, based
upon a general eigenvalue bound, reduction techniques are applied to the
quadratic terms of the objective function in order to improve its quality. In
this case the reduction techniques yield a significant improvement, which is
not really the case with the GLB.

Bound EVB
Consider the trace formulation of the QAP in (9), with F and D being real
symmetric matrices (see Section 2.3), and hence having only real eigenvalues.
The following theorem describes the relations between the eigenvalues of
matrices F and D and the objective function of QAP(F, D):

Theorem 6.1 (Finke, Burkard, and Rendl [67], 1987)


Let D, F be symmetric n × n matrices with real entries. Denote by λ =
(λ_1, ..., λ_n)ᵀ and x_1, ..., x_n the eigenvalues and eigenvectors of F, and by
μ = (μ_1, ..., μ_n)ᵀ and y_1, ..., y_n the eigenvalues and eigenvectors of D,
respectively. Then the following two relations are true for all X ∈ X_n:

(i)  tr(FXDXᵀ) = λᵀ S(X) μ, where S(X) = ((x_i, Xy_j)²) is a doubly stochastic matrix;

(ii) (λ, μ)⁻ ≤ tr(FXDXᵀ) ≤ (λ, μ)⁺,

where (λ, μ)⁻ and (λ, μ)⁺ denote the minimal and the maximal scalar product
of the vectors λ and μ, taken over all permutations of their entries.

By using part (ii) of Theorem 6.1 we obtain a lower bound (EVB) for the
considered QAP:

EVB := (λ, μ)⁻ + min_{X∈X_n} tr(BXᵀ).

The second term is the optimal value of an LAP and can be computed
efficiently. EVB is not a strong bound; it often takes a negative value for QAP
instances with nonnegative coefficients. According to Theorem 6.1, the smaller
the interval [(λ, μ)⁻, (λ, μ)⁺] is, the closer (λ, μ)⁻ is to tr(FXDXᵀ). Thus,
trying to equivalently transform the given QAP so as to decrease the length
of that interval is one possibility to improve EVB.
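The ingredients of EVB are inexpensive: the eigenvalues of F and D, their minimal scalar product (obtained, by the rearrangement inequality, by pairing the ascending λ with the descending μ), and one LAP for the linear term. The sketch below assumes symmetric F and D and uses SciPy's linear_sum_assignment for the LAP; it is an illustration, not the code of [67].

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def evb(F, D, B):
        lam = np.linalg.eigvalsh(F)                  # eigenvalues, ascending
        mu = np.linalg.eigvalsh(D)
        min_scalar = np.dot(lam, np.sort(mu)[::-1])  # (lambda, mu)^-
        rows, cols = linear_sum_assignment(B)        # min over X of tr(B X^T)
        return min_scalar + B[rows, cols].sum()

    rng = np.random.default_rng(7)
    A = rng.random((5, 5)); F = A + A.T
    C = rng.random((5, 5)); D = C + C.T
    B = rng.random((5, 5))
    print(evb(F, D, B))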

Reduction methods and bound EVB1


One possibility to make the interval [(λ, μ)⁻, (λ, μ)⁺] smaller, and hence to
improve EVB, is to decompose the matrices F and D such that some amount
is transferred to the linear term, and the eigenvalues of the matrices
remaining in the quadratic term are as uniform in value as possible. Define
the spread of the matrix F as

spread(F) := max{ |λ_i - λ_j| : i, j = 1, ..., n }.

Our goal is to minimize the spreads of the matrices that compose the
quadratic term. There is no simple closed form expressing spread(F)
in terms of the f_ij; however, there is a closed formula for an upper bound m(F)
due to Mirsky [136]:

spread(F) ≤ m(F) = [ 2 Σ_{i=1}^n Σ_{j=1}^n f_ij² - (2/n)(tr F)² ]^{1/2}.   (51)
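The estimate (51) is easy to check numerically; the sketch below compares the true eigenvalue spread with m(F) for a random symmetric matrix, using the 2/n constant of (51) as reconstructed here.

    import numpy as np

    def spread(F):
        lam = np.linalg.eigvalsh(F)
        return lam.max() - lam.min()

    def mirsky(F):
        n = F.shape[0]
        return np.sqrt(2.0 * (F ** 2).sum() - (2.0 / n) * np.trace(F) ** 2)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    F = A + A.T                      # random symmetric test matrix
    print(spread(F) <= mirsky(F))    # True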

Finke, Burkard, and Rendl [67] have proposed the following decomposition
scheme:

f_ij = f̄_ij + e_i + e_j + r_ij,   (52)

d_kl = d̄_kl + g_k + g_l + s_kl,   (53)

where r_ij = s_ij = 0 for i ≠ j. Denote F̄ = (f̄_ij) and D̄ = (d̄_ij). The
values of e_i and r_ii (g_j and s_jj) which minimize the function f(e, r) = m(F̄)
(h(g, s) = m(D̄)) obtained by substituting the values of f̄_ij (d̄_ij) in (51) are
given by closed formulas, see [67].
By replacing F and D in (9) we obtain

tr(FXD + B)Xᵀ = tr(F̄XD̄ + B̄)Xᵀ,

where b̄_ij = b_ij + f_ii d_jj + 2e_i Σ_{k=1}^n d_jk. Let λ̄ = (λ̄_1, ..., λ̄_n) and μ̄ =
(μ̄_1, ..., μ̄_n) be the eigenvalues of the matrices F̄ and D̄, respectively. By ap-
plying EVB to the QAP with the transformed coefficient matrices we obtain a
new eigenvalue bound EVB1:

EVB1 := (λ̄, μ̄)⁻ + min_{X∈X_n} tr(B̄Xᵀ).

Bound EVB2
If we restrict ourselves to purely quadratic, symmetric QAPs (f_ii =
d_ii = 0 for all i, B = 0), the matrix B̄ in the above decomposition be-
comes B̄ = cwᵀ, where c = 2(e_1, ..., e_n)ᵀ and w = (Σ_j d_1j, ..., Σ_j d_nj)ᵀ.
Therefore min_{X∈X_n} tr(B̄Xᵀ) = (c, w)⁻, and EVB1 = (λ̄, μ̄)⁻ + (c, w)⁻.

One can, however, obtain a further improvement, as suggested by Rendl [154],
as follows. Let S_k := {X_1, ..., X_k} ⊆ X_n, and

L(X_i) := (c, X_i w),  i = 1, ..., k,

where the X_i are chosen such that L(X) ≥ L(X_k) for all X ∈ X_n \ S_k.
Thus, for any integer k ≥ 1 we have L(X_1) ≤ L(X_2) ≤ ... ≤ L(X_k). In
other words, the set S_k contains the k best solutions (permutation matrices)
of the problem min_{X∈X_n}(c, Xw).
Z(F̄, D̄, X_i) is the value of the objective function of QAP(F̄, D̄) yielded
by solution X_i, i.e.,

Z(F̄, D̄, X_i) = tr(F̄X_iD̄ + B̄)X_iᵀ.

Further define Z(k) := min{Z(F̄, D̄, X_i) : i = 1, ..., k}. Then the following
inequalities hold (see [154]):

min{ Z(i), (λ̄, μ̄)⁻ + L(X_i) } ≤ min_{X∈X_n} Z(F̄, D̄, X) ≤ Z(i),  i = 1, ..., k,

where the equality Z(i) = (λ̄, μ̄)⁻ + L(X_i) for some i implies that X_i is
an optimal solution of QAP(F̄, D̄). Thus, essentially, we try to reduce the
gap between the optimal value of the QAP and the lower bound EVB1
by increasing the value of the linear term (c, w)⁻ in the bound in k steps,
where k is specified as a parameter. The generation of the set S_k is a
special case of the problem of finding the k best solutions of an assignment
problem. Murty [139] has given an O(kn³) algorithm to solve this problem.
Rendl [154] presents an O(n log n + (n + log k)k) algorithm for the special
case where the cost matrix of the assignment problem is given as a product
matrix (c_i w_j).
Rendl [154] addresses two issues regarding the effectiveness of the above
ranking procedure in improving the lower bound. First, if the vectors c
and w have m ≤ n equal elements, then there are at least m! permutation
matrices X_i such that the values (c, X_i w) are equal. This implies in turn
that there will be no or only a small improvement in the lower bound while
generating S_k for quite a number of iterations. It can be shown that c
and w have equal elements if the row sums of F and D are equal (see
[67]). Hence, the ranking procedure could give good results in the case that
most of the row sums of F and D are not equal. Secondly, Rendl defines a
ratio Λ, called the degree of linearity, based on the ranges of the quadratic
and linear terms that compose the lower bound.
The influence of the linear term on the lower bound is inversely propor-
tional to the value of Λ. A small value of Λ suggests that the ranking
procedure would be beneficial for the improvement of EVB1 for symmet-
ric, purely quadratic QAPs. For large values of Λ, we can expect that the
quadratic term dominates the linear term in the objective function. In this
case Finke et al. [67] suggest the following improvement of EVB1. Consider
part (i) of Theorem 6.1 applied to the reduced matrices F̄ and D̄, and de-
note the elements of the matrix S(X) by s_ij, s_ij = (x_i, Xy_j)². It is easy to
see that l_ij ≤ s_ij ≤ u_ij for suitable bounds l_ij and u_ij given in [67].

Recalling the fact that the s_ij are the elements of a doubly stochastic
matrix, we can then form the capacitated transportation problem

CTP* = min  Σ_{i=1}^n Σ_{j=1}^n λ̄_i μ̄_j s_ij
       s.t. Σ_{i=1}^n s_ij = 1,  j = 1, ..., n,
            Σ_{j=1}^n s_ij = 1,  i = 1, ..., n,
            l_ij ≤ s_ij ≤ u_ij.

Then, a new lower bound would be

EVB2 = CTP* + (c, w)⁻.
Other eigenvalue related bounds
Rendl and Wolkowicz [155] derive a new lower bound similar to EVB2.
Notice that the decomposition scheme in (52) and (53) is uniquely deter-
mined by the 4n-dimensional vector d := (eᵀ, gᵀ, rᵀ, sᵀ) ∈ R^{4n}, where
r = (r_11, ..., r_nn)ᵀ and s = (s_11, ..., s_nn)ᵀ. EVB1 is then a function of
d. Maximizing this function with respect to d will result in a lower bound
with the best possible decomposition with respect to both the linear and the
quadratic term. Maximizing EVB1 as a function of d leads to a nonlinear,
nonsmooth, nonconcave maximization problem which is hard to solve to op-
timality. Rendl and Wolkowicz propose a steepest ascent algorithm to approximately
solve this problem (see [155]). The new bound, denoted EVB3, produces
the best lower bounds for a number of QAP instances from QAPLIB, at
the expense, however, of high computation time requirements.
A more general approach to eigenvalue based lower bounding techniques
was employed by Hadley, Rendl and Wolkowicz [87]. Consider the following
sets of n × n matrices, where I is the n × n identity matrix and u := (1, ..., 1)ᵀ
is the n-dimensional vector of all ones:

O := {X : XᵀX = I},  the set of orthogonal matrices,
E := {X : Xu = Xᵀu = u},  the set of matrices with row
     and column sums equal to one,   (54)
N := {X : X ≥ 0},  the set of nonnegative matrices.

It is a well known result that X_n = O ∩ E ∩ N, while the set Ω of doubly
stochastic matrices is given as Ω = E ∩ N. Moreover, by Birkhoff's theorem
[17] we know that Ω is a convex polytope with vertex set X_n, i.e., Ω =
conv{X : X ∈ X_n}. The above characterization of X_n implies that we get a
relaxation of the QAP if we delete one or two of the matrix sets O, E and N
from the intersection X_n = O ∩ E ∩ N. Obviously, the relaxation, and therefore
the lower bound, will be tighter if only one of the matrix sets is excluded.
In relation to Theorem 6.1, Rendl and Wolkowicz [155] have shown that

min_{X∈O} tr(FXDXᵀ) = tr(F A_F A_Dᵀ D A_D A_Fᵀ) = (λ, μ)⁻,

max_{X∈O} tr(FXDXᵀ) = tr(F A_F A_Dᵀ D A_D A_Fᵀ) = (λ, μ)⁺,

where A_F and A_D are matrices whose columns consist of the eigenvectors of F
and D, respectively, in the order specified by the minimal (maximal) scalar
product of the eigenvalue vectors. In other words, the lower bound on the
quadratic part of the QAP as obtained in EVB is derived by relaxing the
feasible set to the set of orthogonal matrices.
All eigenvalue bounds discussed above relax the set of permutation ma-
trices to O. A tighter relaxation was proposed in [86, 88], where the set
of permutation matrices was relaxed to O ∩ E. The authors incorporate E
in the objective function by exploiting the fact that the vector of all ones u is
both a left and a right eigenvector with eigenvalue 1 for any X ∈ X_n. More
specifically, define

P := [u/‖u‖ : V],  where Vᵀu = 0, VᵀV = I_{n-1}.

Then V is an orthonormal basis of {u}⊥, while Q := VVᵀ is the orthogo-
nal projection onto {u}⊥. The following characterization of the permutation
matrices is given in [88].
Lemma 6.1 (Hadley [86], 1989, Hadley, Rendl, Wolkowicz [88], 1992)
Let X be a real n × n matrix and Y a real (n-1) × (n-1) matrix. If

X = P [ 1  0
        0  Y ] Pᵀ,   (55)

then

X ∈ E;  X ∈ N ⟺ VYVᵀ ≥ -uuᵀ/‖u‖²;  and X ∈ O ⟺ Y ∈ O_{n-1}.

Conversely, if X ∈ E, there exists a Y such that (55) holds.

Note that the above characterization of permutation matrices preserves
the orthogonality and the trace structure of the problem. By substitut-
ing X = uuᵀ/‖u‖² + VYVᵀ in the trace formulation of the QAP (9), as
suggested by (55), we obtain an equivalent projected problem (PQAP) of
dimension n-1 with variable matrix Y. The new lower bound, often called
the elimination bound and denoted by ELI, is obtained by dropping the require-
ment VYVᵀ ≥ -uuᵀ/‖u‖² and simply requiring Y ∈ O_{n-1}. In this way we
derive a lower bound for the quadratic part of the PQAP. The linear part
can be solved exactly as an LAP.
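The projection behind ELI is easy to reproduce numerically. In the sketch below V is obtained from a QR factorization (any orthonormal basis of {u}⊥ will do; this particular construction is our choice), and we check that X = uuᵀ/‖u‖² + VYVᵀ has all row and column sums equal to 1, i.e., X ∈ E, for an arbitrary Y.

    import numpy as np

    n = 5
    u = np.ones(n)
    # QR of [u, e_1, ..., e_{n-1}]: columns 2..n of Q span the complement of u
    Q, _ = np.linalg.qr(np.column_stack([u, np.eye(n)[:, : n - 1]]))
    V = Q[:, 1:]                                  # orthonormal basis of {u}^perp
    Y = np.random.default_rng(1).standard_normal((n - 1, n - 1))
    X = np.outer(u, u) / n + V @ Y @ V.T          # note ||u||^2 = n
    print(np.allclose(X.sum(axis=0), 1.0), np.allclose(X.sum(axis=1), 1.0))  # True True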
Concluding this section, notice that there is a possibility to apply eigen-
value bounds to non-symmetric QAPs, i.e., QAPs with both coefficient ma-
trices non-symmetric. Hadley [86] and Rendl and Wolkowicz [89] show
that eigenvalue bounds analogous to those for QAPs with at least one sym-
metric coefficient matrix can be derived for QAPs with Hermitian coefficient
matrices. Moreover, these authors show that each QAP can be equivalently
transformed into a QAP with Hermitian coefficient matrices.

6.5 Bounds Based on Semidefinite Relaxations


Semidefinite programming (SDP) is a generalization of linear programming
where the variables are taken from the Euclidean space of matrices with the
trace operator acting as an inner product. The non-negativity constraints
are replaced by semidefiniteness constraints and the linear constraints are
formulated in terms of linear operators on the above mentioned Euclidean
space of matrices. Successful applications of semidefinite programming in
discrete optimization are presented in Goemans and Williamson [82], and
Lovász and Schrijver [125].
Recently, semidefinite programming relaxations for the QAP were consid-
ered by Karisch [103], Zhao [176], and Zhao, Karisch, Rendl and Wolkow-
icz [177]. The SDP relaxations considered in these papers are solved by
interior point methods or cutting plane methods, and the obtained solu-
tions are valid lower bounds for the QAP.
In terms of quality the bounds obtained in this way are competitive with
the best existing lower bounds for the QAP. For many test instances from
QAPLIB, such as some instances of Hadley, Roucairol, Nugent et al. and
Taillard, they are the best existing bounds. However, due to prohibitively
high computation time requirements, the use of such approaches as basic
bounding procedures within branch and bound algorithms is up to now not
feasible.

We refer to [103, 177] for a detailed description of SDP approaches to the


QAP and illustrate the idea by describing just one semidefinite programming
relaxation for the QAP.
The set of n × n permutation matrices X_n is the intersection of the set
of n × n 0-1 matrices, denoted by Z_n, and the set E_n of n × n matrices with
row and column sums equal to 1. Moreover, X_n is also the intersection of
Z_n with the set of n × n orthogonal matrices, denoted by O_n. Hence

X_n = Z_n ∩ E_n = Z_n ∩ O_n.

Recall that

O_n = {X ∈ R^{n×n} : XXᵀ = XᵀX = I}  and
E_n = {X ∈ R^{n×n} : Xu = Xᵀu = u},

where I is the n × n identity matrix and u is the n-dimensional vector of
all ones. Then the trace formulation of the QAP (see Section 2.3), with the
additional linear term

-2 Σ_{i=1}^n Σ_{j=1}^n b_ij x_ij,

can be represented equivalently as follows:

(QAP_E)  min  tr(FXD - 2B)Xᵀ
         s.t. XXᵀ = XᵀX = I,
              Xu = Xᵀu = u,
              x_ij² - x_ij = 0,  1 ≤ i, j ≤ n.

In order to obtain a semidefinite relaxation for the QAP from the for-
mulation QAP_E above, we first introduce an n²-dimensional vector vec(X),
obtained as a column-wise ordering of the entries of the matrix X.
Then the vector vec(X) is lifted into the space of (n² + 1) × (n² + 1) matrices
by introducing a matrix Y_X,

Y_X = [ x_0       vec(X)ᵀ
        vec(X)    vec(X)vec(X)ᵀ ].

Thus, Y_X has some entry x_0 in the upper-left corner, followed by the vector
vec(X)ᵀ in its first row and by vec(X) in its first column. The remaining entries
are those of the matrix vec(X)vec(X)ᵀ, sitting on the lower-right n² × n²
block of Y_X.


Secondly, the coefficients of the problem are collected in an (n² + 1) × (n² + 1)
matrix K given as

K = [  0         -vec(B)ᵀ
      -vec(B)     D ⊗ F   ],

where the operator vec is defined as above and D ⊗ F is the Kronecker
product of D and F.
It is easy to see that with these notations the objective function of QAP_E
equals tr(KY_X). By setting y_00 := x_0 = 1, as done in Zhao et al. [177],
one obtains two additional constraints to be fulfilled by the matrix Y_X: Y_X
is positive semidefinite and Y_X is a rank-one matrix. Whereas the
semidefiniteness and the equality y_00 = 1 can be immediately included in an
SDP relaxation, the rank-one condition is hard to handle and is discarded
in an SDP relaxation. In order to assure that the rank-one positive semidef-
inite matrix Y_X is obtained from an n × n permutation matrix as described
above, other constraints should be imposed on Y_X. Such conditions can
be formulated as valid constraints of an SDP formulation for the QAP by
means of some new operators, acting on matrices or vectors, as introduced
below.
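The claimed identity can be verified numerically. The sketch below builds Y_X and K exactly as described (x_0 = 1, column-wise vec, symmetric test matrices) and checks that tr(KY_X) coincides with tr(FXD - 2B)Xᵀ for a random permutation matrix; the test data are of course only an illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    A = rng.random((n, n)); F = A + A.T          # symmetric flow matrix
    C = rng.random((n, n)); D = C + C.T          # symmetric distance matrix
    B = rng.random((n, n))
    X = np.eye(n)[rng.permutation(n)]            # a random permutation matrix

    x = X.flatten(order="F")                     # vec(X), column-wise
    b = B.flatten(order="F")                     # vec(B)
    Y = np.block([[np.ones((1, 1)), x[None, :]],
                  [x[:, None], np.outer(x, x)]])
    K = np.block([[np.zeros((1, 1)), -b[None, :]],
                  [-b[:, None], np.kron(D, F)]])

    lhs = np.trace(K @ Y)
    rhs = np.trace(F @ X @ D @ X.T) - 2.0 * np.trace(B @ X.T)
    print(np.isclose(lhs, rhs))                  # True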
diag(A) produces a vector containing the diagonal entries of matrix A in
their natural order, i.e., from top-left to bottom-right. The adjoint operator
Diag acts on a vector v and produces a square matrix Diag(v) with off-
diagonal entries equal to 0 and the components of v on the main diagonal.
Clearly, for an n-dimensional vector v, Diag(v) is an n × n matrix.
arrow acts on an (n² + 1) × (n² + 1) matrix Y and produces an (n² + 1)-
dimensional vector arrow(Y) = diag(Y) - (0, Y_(0,1:n²)), where (0, Y_(0,1:n²)) is
an (n² + 1)-dimensional vector with first entry equal to 0 and the other entries co-
inciding with the entries of Y lying in the 0-th row and in columns between
1 and n², in their natural order¹. The adjoint operator Arrow acts on an
(n² + 1)-dimensional vector w and produces an (n² + 1) × (n² + 1) matrix
Arrow(w),

Arrow(w) = [ w_0              (1/2) w_(1:n²)ᵀ
             (1/2) w_(1:n²)   Diag(w_(1:n²)) ],

where w_(1:n²) is the n²-dimensional vector obtained from w by removing
its first entry w_0.

¹Note here that the rows and columns of an (n² + 1) × (n² + 1) matrix are indexed by
0, 1, ..., n².
Further, we are going to consider an (n² + 1) × (n² + 1) matrix Y as composed
of its first row Y_(0,·), its first column Y_(·,0), and n² submatrices of size
n × n each, which are arranged in an n × n array of n × n matrices and
form its remaining n² × n² block. (This is similar to the structure of a
Kronecker product of two n × n matrices, see Sections 2.4 and 6.1.) The entry
Y_αβ, 1 ≤ α, β ≤ n², will also be denoted by Y_(ij)(kl), with 1 ≤ i, j, k, l ≤ n,
where α = (i - 1)n + j and β = (k - 1)n + l. Hence, Y_(ij)(kl) is the element
with coordinates (j, l) within the n × n block with coordinates (i, k).
With these formal conventions, let us define the so-called block-0-diagonal
and off-0-diagonal operators, acting on an (n² + 1) × (n² + 1) matrix Y and
denoted by b⁰diag and o⁰diag, respectively. b⁰diag(Y) and o⁰diag(Y) are
n × n matrices given as follows:

b⁰diag(Y) = Σ_{k=1}^n Y_(k·)(k·),    o⁰diag(Y) = Σ_{k=1}^n Y_(·k)(·k),

where, for 1 ≤ k ≤ n, Y_(k·)(k·) is the k-th n × n matrix on the diagonal of the
n × n array of matrices defined as described above. Analogously, Y_(·k)(·k)
is the n × n matrix consisting of the diagonal elements sitting at position
(k, k) of the n × n matrices (n² matrices altogether) which form the n² × n²
lower-right block of the matrix Y. The corresponding adjoint operators B⁰Diag
and O⁰Diag act on an n × n matrix S and produce (n² + 1) × (n² + 1)
matrices whose only nonzero part is the lower-right n² × n² block:

B⁰Diag(S) = [ 0   0
              0   I ⊗ S ],    O⁰Diag(S) = [ 0   0
                                            0   S ⊗ I ].

Finally, let us denote by e_0 the (n² + 1)-dimensional unit vector with first
component equal to 1 and all other components equal to 0, and let R be the
(n² + 1) × (n² + 1) matrix given by

R = [ 2n           -2 vec(E)ᵀ
      -2 vec(E)     I ⊗ E + E ⊗ I ],

where E is the n × n matrix of all ones; with this choice, tr(RY_X) =
‖Xu - u‖² + ‖Xᵀu - u‖² whenever x_0 = 1, so that the condition tr(RY_X) = 0
encodes the row and column sum constraints.


With these notations, a semidefinite relaxation for QAPe is given as
follows
The Quadratic Assignment Problem 285

min tr(KY)
s.t.
bOdiag(Y) = I,
oOdiag(Y) = I,
arrow(Y) = eo,
tr(RY) = 0,
YtO.

where ~ is the so-called Lowner partial order, i.e., A ~ B if and only if


B - A t 0, that is B - A is positive semidefinite.
Zhao et al. [177] have shown that an equivalent formulation of the con-
sidered QAP is obtained from QAP_R0 by imposing one additional condition
on the matrix Y, namely the rank-one condition.

6.6 Improving Bounds by Means of Decompositions


The idea of applying so-called decompositions to improve lower bounds for
specially structured QAPs was initially proposed by Chakrapani and Skorin-
Kapov [44], and then further elaborated by Karisch and Rendl [105]. The
applicability of this approach seems to be restricted to QAPs with a very
special structure, the so-called grid QAPs (or rectilinear QAPs) to be intro-
duced below. This procedure yields the best existing bounds for many grid
QAP instances from QAPLIB and a good trade-off between computation
time and bound quality.
A grid QAP is a Koopmans-Beckmann QAP with flow matrix F and
distance matrix D = (d_ij) being the distance matrix of a uniform rectangular
grid. If d_ij = d_ik + d_kj, we say that k is on the shortest path connecting
i and j. The triple u = (i, j, k) is then called a shortest path triple. The
shortest path triple v = (i, j, k) for which d_ik = d_kj = 1 is called a shortest
triangle.
We associate a matrix R_u = (r_ij^(u)) with each shortest path triple u =
(k, m, l), and a matrix T_v = (t_ij^(v)) with each shortest triangle v = (k', m', l'),
where R_u and T_v are defined by

r_kl^(u) = r_lk^(u) = r_ml^(u) = r_lm^(u) = 1,    r_km^(u) = r_mk^(u) = -1,

t_k'm'^(v) = t_m'k'^(v) = t_m'l'^(v) = t_l'm'^(v) = t_k'l'^(v) = t_l'k'^(v) = 1,

r_ij^(u) = 0 and t_ij^(v) = 0 if {i, j} ⊄ {k, l, m} (respectively {k', l', m'}).

The set of all shortest path triples is denoted by R, and the set of all shortest
triangles is denoted by T.
The key observation is that, for each R_u, u ∈ R, and for each T_v, v ∈ T, the
identity permutation is an optimal solution of QAP(R_u, D) and QAP(T_v, D).
The optimal values of these QAPs are 0 and 8, respectively, and these
simple QAPs can be used to improve the quality of lower bounds for an
arbitrary grid QAP. Let us decompose the flow matrix F as

F = Σ_{u∈R} α_u R_u + Σ_{v∈T} β_v T_v + F_r,   (56)

where F_r is the residual matrix given as

F_r := F - Σ_{u∈R} α_u R_u - Σ_{v∈T} β_v T_v.
For every choice of the parameters α_u ≥ 0, u ∈ R, and β_v ≥ 0, v ∈ T, and
for any permutation φ we have

Z(F, D, φ) = Σ_{u∈R} α_u Z(R_u, D, φ) + Σ_{v∈T} β_v Z(T_v, D, φ) + Z(F_r, D, φ).   (57)

Equality (57) implies

min_φ Z(F, D, φ) ≥ 8 Σ_{v∈T} β_v + min_φ Z(F_r, D, φ) ≥ 8 Σ_{v∈T} β_v + LB(F_r, D),

where LB(F_r, D) is any lower bound for the QAP with flow matrix F_r and
distance matrix D. Clearly, the expression on the right-hand side of the last
inequality is a lower bound for the original QAP. This lower bound, which
depends on the vectors α = (α_u), β = (β_v), is denoted by h(α, β). Then
h(0, 0) equals LB(F, D), and therefore

max_{α≥0, β≥0} h(α, β) ≥ LB(F, D),

where a vector is said to be nonnegative if all its components are non-
negative. Hence, max_{α≥0, β≥0} h(α, β) is an improvement upon the bound
LB(F, D).
Chakrapani and Skorin-Kapov [44] improve the Gilmore-Lawler bound (GLB)
and the elimination bound (ELI) by using only the matrices R_u, u ∈ R, for the
decomposition. Karisch and Rendl [105] use the decomposition scheme (56) to
improve the elimination bound ELI (introduced in [88]).
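The key property of the shortest path triple matrices is easy to verify on a toy instance. The sketch below builds the Manhattan distance matrix of a 2 × 2 grid, the matrix R_u for one shortest path triple, and checks by complete enumeration that the identity permutation attains the optimal value 0 of QAP(R_u, D); the grid size and the chosen triple are our illustrative assumptions.

    import numpy as np
    from itertools import permutations

    pts = [(0, 0), (0, 1), (1, 0), (1, 1)]       # uniform 2 x 2 grid, n = 4
    n = len(pts)
    D = np.array([[abs(a - c) + abs(b - d) for (c, d) in pts] for (a, b) in pts])

    k, l, m = 0, 1, 3                            # location 1 lies on a shortest 0-3 path
    assert D[k, m] == D[k, l] + D[l, m]
    R = np.zeros((n, n))
    R[k, l] = R[l, k] = R[m, l] = R[l, m] = 1.0
    R[k, m] = R[m, k] = -1.0

    vals = [sum(R[i, j] * D[p[i], p[j]] for i in range(n) for j in range(n))
            for p in permutations(range(n))]
    print(vals[0], min(vals))                    # 0.0 0.0: the identity is optimal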

7 Exact Solution Methods


An exact algorithm for a combinatorial optimization problem provides the
global optimal solution to the problem. In this section we will briefly discuss
several exact algorithms that have been used for solving the QAP, such as
branch and bound, cutting plane, and branch and cut algorithms.

7.1 Branch and Bound


Branch and bound algorithms have been applied successfully to many hard
combinatorial optimization problems, and they appear to be the most effi-
cient exact algorithms for solving the QAP.
The basic ingredients of branch and bound algorithms are bounding, branch-
ing, and the selection rule. Although many bounding techniques have been
developed for the QAP, the most efficient branch and bound algorithms for
this problem employ the Gilmore-Lawler bound (GLB). The reason is that
other bounds which outperform the GLB in terms of bound quality are simply
too expensive in terms of computation time. However, more recently some ef-
forts have been made to employ other Gilmore-Lawler-like bounds in branch
and bound algorithms. The bound of Hahn and Grant (HGB) [90] has been
used in a branch and bound algorithm by Hahn, Grant, and Hall [91], and the
results are promising. Pardalos, Ramakrishnan, Resende and Li [150] solve
some previously unsolved instances from QAPLIB by applying a branch and
bound algorithm which employs the variance reduction lower bound.
Three types of branching strategies are mostly used for the QAP: single as-
signment branching (see Gilmore [77] and Lawler [118]), pair assignment branch-
ing (see Gavett and Plyter [74], Land [116], and Nugent et al. [141]), and branch-
ing based on relative positioning (see Mirchandani and Obata [135]). Single
assignment branching, which is the most efficient, assigns a facility to a
location in each branching step, i.e., each problem is divided into subprob-
lems by fixing the location of one of the facilities which are not assigned
yet. Several rules for the choice of the facility-location pair to determine the
subproblems of a new level of the search tree have been proposed by different
authors. The appropriate rule usually depends on the bounding technique.
If the GLB is employed the above mentioned rule is frequently formulated in
terms of the reduced costs of the last assignment problem solved to bound
the subproblem which is currently being branched [14, 23, 131].
The pair assignment algorithms assign a pair of facilities to a pair of lo-
cations at a branching step, whereas in relative positioning algorithms the

levels of the search tree do not correspond to the number of facilities already
assigned to locations. Here the fixed assignments within each subproblem
are determined in terms of distances between facilities, i.e., their relative po-
sitions. Numerical results show that pair assignment or relative positioning
algorithms are outperformed by single-assignment algorithms.
Roucairol [163] developed another branching rule which does not belong to
any of the above groups, the so-called polytomic or k-partite branching rule.
The search tree produced by this algorithm is not binary, as in most of the
other approaches. In this case the GLB is employed, and the branching rule
is based on the solution φ of the last linear assignment problem solved to
compute the lower bound at the current node of the search tree. Let X_n^(i)
be the subset of X_n (the set of permutations of {1, 2, ..., n}) consisting
of those permutations π such that π(i) = φ(i). Analogously, X̄_n^(i) is the
set of permutations π ∈ X_n such that π(i) ≠ φ(i). The current node
is branched into n + 1 new nodes with sets of feasible solutions given by
X_n^(1), X̄_n^(1) ∩ X_n^(2), ..., X̄_n^(1) ∩ X̄_n^(2) ∩ ... ∩ X̄_n^(n-1) ∩ X_n^(n), and X̄_n^(1) ∩ X̄_n^(2) ∩ ... ∩ X̄_n^(n).
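That these n + 1 sets indeed partition the feasible set can be checked by brute force for a small n; the sketch below counts, for each node, the permutations that fall into it (the choice of phi is arbitrary, and the check is ours, not part of [163]).

    from itertools import permutations

    n, phi = 4, (2, 0, 3, 1)
    buckets = [0] * (n + 1)
    for pi in permutations(range(n)):
        for i in range(n):
            # node i: pi disagrees with phi at positions before i and agrees at i
            if pi[i] == phi[i]:
                buckets[i] += 1
                break
        else:
            buckets[n] += 1                      # pi(i) != phi(i) for all i
    print(buckets, sum(buckets))                 # the bucket sizes sum to 4! = 24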

Another issue in the implementation of branch and bound algorithms
concerns the so-called selection rule, which determines the choice of the sub-
problem to be branched next, i.e., the vertex of the search tree to be processed.
Several strategies, ranging from problem-independent depth or breadth first
search to instance dependent criteria related to the maximization of lower
bounds or reduced costs, have been tested by different authors. There seems
to be no clear winner among the tested strategies.
Better results on solving large size problems have been achieved lately
by parallel implementations, see Pardalos and Crouse [146], Bruengger,
Clausen, Marzetta, and Perregaard [19], and Clausen and Perregaard [50].
The Nugent et al. test instances [141] are widely considered to be "stubborn"
QAP instances, and they have become an obvious challenge for every new algo-
rithm designed for solving the QAP to optimality. The largest Nugent et
al. test instance which has ever been solved to optimality has size 25; it
was solved by a parallel branch and bound algorithm which employs a special
implementation of the GLB, see Marzetta [130].

7.2 Traditional Cutting Plane Methods


Traditional cutting plane algorithms for the QAP have been developed by
different authors: Bazaraa and Sherali [15, 16], Balas and Mazzola [9, 10, 11],

and Kaufmann and Broeckx [108]. These algorithms make use of mixed inte-
ger linear programming (MILP) formulations for the QAP which are suitable
for Benders' decomposition. In the vein of Benders, the MILP formulation
is decomposed into a master problem and a subproblem, also called the slave
problem, where the master problem contains the original assignment vari-
ables and constraints. For a fixed assignment the slave problem is usually a
linear program and hence solvable in polynomial time. The master problem
is a linear program formulated in terms of the original assignment variables
and of the dual variables of the slave problem, and is solvable in polynomial
time for fixed values of those dual variables. The algorithms work typically
as follows. First, a heuristic is applied to generate a starting assignment.
Then the slave problem is solved for fixed values of the assignment vari-
ables implied by that assignment, and optimal values of the primal and dual
variables are computed. If the dual solution of the slave problem satisfies
all constraints of the master problem, we have an optimal solution for the
original MILP formulation of the QAP. Otherwise, at least one of the con-
straints of the master problem is violated. In this case, the master problem
is solved with fixed values for the dual variables of the slave problem, and
the obtained solution is given as input to the slave problem. The procedure
is then repeated until the solution of the slave problem fulfills all constraints
of the master problem.
Clearly, any solution of the master problem obtained by fixing the dual
variables of the slave problem to some feasible values is a lower bound for the
considered QAP. On the other side, the objective function value of the QAP
corresponding to any feasible setting of the assignment variables is an upper
bound. The algorithm terminates when the lower and the upper bounds
coincide. Generally, the time needed for the upper and the lower bounds
to converge to a common value is too large, and hence these methods may
solve to optimality only very small QAPs. However, heuristics derived from
cutting plane approaches produce good suboptimal solutions in early stages
of the search, see e.g. Burkard and Bönniger [24] and Bazaraa and Sherali [16].

7.3 Polyhedral Cutting Planes


Similarly to traditional cutting plane methods, polyhedral cutting plane
or branch and cut algorithms² make use of an LP or MILP relaxation of the
combinatorial optimization problem to be solved, in our case the QAP.

²This term was originally used by Padberg and Rinaldi [143].



Additionally, polyhedral cutting plane methods make use of a class of (nontrivial)
valid or facet defining inequalities known to be fulfilled by all feasible solu-
tions of the original problem. If the solution of the relaxation is feasible for
the original problem, we are done. Otherwise, some of the above mentioned
valid inequalities may be violated. In this case a "cut" is performed,
that is, one or more of the violated inequalities are added to the LP or MILP
relaxation of our problem. The latter is resolved and the whole process is
repeated. In the case that none of the valid inequalities is violated, but
some integrality constraint is violated, the algorithm performs a branching
step by fixing (feasible) integer values for the corresponding variable. The
branching steps produce the search tree like in branch and bound algorithms.
Each node of this tree is processed as described above by performing cuts
and then by branching it, if necessary. Clearly, related elements of branch
and bound algorithms like upper bounds, selection and branching rules play
a role in branch and cut algorithms. Hence, such an approach combines
elements of cutting plane and branch and bound methods.

The main advantage of polyhedral cutting plane algorithms over
traditional cutting planes relies on the use of cuts which are valid for the
whole polytope of feasible solutions, and possibly facet defining. Tradi-
tional cutting planes, instead, frequently rely on cuts which are not valid for
the whole polytope of feasible solutions. In this case the whole computa-
tion has to be done from scratch for different variable fixings, which requires
additional running time and additional amounts of memory. Another, and
not less important, drawback of traditional cutting plane algorithms is due to
the "weakness" of the cuts they involve. In contrast to cuts produced by
facet defining inequalities, weak cuts cannot avoid slow convergence.

As we saw in Section 5, some properties and a few facet defining inequal-
ities of the QAP polytope are already known, but polyhedral cutting
plane methods for the QAP are not yet backed by a strong theory. How-
ever, some efforts to design branch and cut algorithms for the QAP have
been made by Padberg and Rijal [142] and Kaibel [102]. Padberg and Ri-
jal [142] have tested their algorithm on sparse QAP instances from QAPLIB.
The numerical results are encouraging, although the developed software is
of a preliminary nature, as claimed by the authors. Kaibel [102] has used
branch and cut to compute lower bounds for QAP instances from QAPLIB.
His results are promising, especially in the case where box inequalities are
involved.

8 Heuristics
Although substantial improvements have been made in the development of
exact algorithms for the QAP, problems of dimension n > 20 are still not
practical to solve because of very high computation time requirements. This
makes the development of heuristics indispensable, i.e., of algorithms which pro-
vide good quality solutions in reasonable time. Much research has been
devoted to the development of such approaches. We distinguish the following
types of heuristic algorithms:
• Construction methods (CM)
• Limited enumeration methods (LEM)
• Improvement methods (1M)

• Tabu search (TS)


• Simulated annealing (SA)
• Genetic algorithms (GA)
• Greedy randomized adaptive search procedures (GRASP)
• Ant systems (AS)

8.1 Construction Methods


Construction methods were introduced by Gilmore [77]. They are iterative
approaches which usually start with an empty permutation and iteratively
complete a partial permutation into a solution of the QAP by assigning some
facility which has not been assigned yet to some free location. The algorithm
is presented in pseudocode in Figure 1. Here φ_0, φ_1, ..., φ_{n-1} are partial
permutations, heur(i) is some heuristic procedure that assigns facility
i to some location j and returns j, and Γ is the set of already assigned pairs
of facilities and locations. The procedure update constructs a permutation
φ_i by adding the assignment (i, j) to φ_{i-1}. The heuristic heur(i) employed
by update could be any heuristic which chooses a location j for facility i,
(i, j) ∉ Γ, in a greedy fashion or by applying local search.
One of the oldest heuristics used in practice, the CRAFT heuristic de-
veloped by Buffa, Armour and Vollmann [20], is a construction method.
Another construction method which yields good results has been proposed
by Müller-Merbach [140].

procedure construction(φ_0, Γ)
    φ = φ_0;
    do i = 1, ..., n - 1 →
        if (i, j) ∉ Γ for every location j →
            j = heur(i);
            update(φ_i, (i, j));
            Γ = Γ ∪ {(i, j)}
        fi
    od;
    return(φ)
end construction;

Figure 1: Pseudo-code for a construction method
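A runnable instance of the scheme of Figure 1 is sketched below, with one particular greedy choice for heur(i): facility i is placed at the free location that minimizes the interaction cost with the facilities placed so far. Both this choice and the random test data are illustrative assumptions.

    import numpy as np

    def construct(F, D):
        n = F.shape[0]
        phi = {}                                  # partial permutation: facility -> location
        free = set(range(n))
        for i in range(n):
            # greedy heur(i): cheapest free location w.r.t. the fixed assignments
            j = min(free, key=lambda loc: sum(F[i, k] * D[loc, phi[k]] +
                                              F[k, i] * D[phi[k], loc]
                                              for k in phi))
            phi[i] = j                            # update(phi, (i, j))
            free.remove(j)                        # Gamma implicitly grows by (i, j)
        return [phi[i] for i in range(n)]

    rng = np.random.default_rng(3)
    F = rng.integers(0, 10, (5, 5))
    D = rng.integers(0, 10, (5, 5))
    print(construct(F, D))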

8.2 Limited Enumeration Methods


Limited enumeration methods rely on the observation that enumeration
methods (e.g. branch and bound algorithms) often find good solutions in
early stages of the search, and then employ a lot of time to marginally im-
prove that solution or prove its optimality. This behavior of enumeration
methods suggests a way to save time in the case that we are interested in
a good but not necessarily optimal solution: impose some limit on the enu-
meration process. This limit could be a time limit or a limit on the number
of iterations the algorithm may perform.
Another strategy which serves the same goal is to manipulate the lower
bound. This can be done by increasing the lower bound if no improvement
in the solution is achieved during a large number of iterations; this yields
deeper cuts in the search tree and speeds up the process. Clearly, such an
approach may cut off the optimal solution and hence should be used care-
fully, possibly in conjunction with certain heuristics that perform elaborate
searches in the feasible space.

8.3 Improvement Methods


These methods belong to the larger class of local search algorithms. A local
search procedure starts with an initial feasible solution and iteratively tries

to improve the current solution. This is done by substituting the latter


with a (better) feasible solution from its neighborhood. This iterative step is
repeated until no further improvement can be found. Improvement methods
are local search algorithms which allow only improvements of the current
solution in each iteration. For a comprehensive discussion of theoretical and
practical aspects of local search in combinatorial optimization the reader is
referred to the book edited by Aarts and Lenstra [2].
Basic ingredients of improvement methods (and of local search in gen-
eral) are the neighborhood and the order in which the neighborhood is
scanned. Frequently used neighborhoods for QAPs are the pair-exchange
neighborhood and the cyclic triple-exchange neighborhood. In the case of
pair-exchanges the neighborhood of a given solution (permutation) consists
of all permutations which can be obtained from the given one by applying
a transposition to it. In this case, scanning the whole neighborhood, i.e.,
computing the objective function values for all neighbors of a given per-
mutation, takes O(n³) time. (The size of the neighborhood is n(n-1)/2, and it
takes O(n) steps to compute the difference of the objective function values
of a permutation π and a permutation π′ in the neighborhood of π.) If the
neighborhood of π has already been scanned and π′ is a neighbor of π, then the
neighborhood of π′ can be scanned in O(n²) time, see Frieze et al. [71].
In the case of cyclic triple-exchanges, the neighborhood of a solution (permu-
tation) π consists of all permutations obtained from π by a cyclic exchange
of some triple of indices. The size of this neighborhood is O(n³). Cyclic
triple-exchanges do not really lead to better results when compared with
pair-exchanges.
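The O(n) evaluation of a single pair-exchange is what makes these scans affordable. The sketch below implements a first-improvement search using the standard difference formula for symmetric F and D with zero diagonals; it is a plain illustration, not the procedure of [71].

    import numpy as np

    def qap_cost(F, D, p):
        return (F * D[np.ix_(p, p)]).sum()

    def delta(F, D, p, r, s):
        # objective change when the locations of facilities r and s are swapped
        return 2 * sum((F[r, k] - F[s, k]) * (D[p[s], p[k]] - D[p[r], p[k]])
                       for k in range(len(p)) if k not in (r, s))

    def first_improvement(F, D, p):
        improved = True
        while improved:
            improved = False
            for r in range(len(p) - 1):
                for s in range(r + 1, len(p)):
                    if delta(F, D, p, r, s) < 0:
                        p[r], p[s] = p[s], p[r]
                        improved = True
        return p

    rng = np.random.default_rng(4)
    A = rng.random((6, 6)); F = np.triu(A, 1) + np.triu(A, 1).T
    C = rng.random((6, 6)); D = np.triu(C, 1) + np.triu(C, 1).T
    p = list(rng.permutation(6))
    print(qap_cost(F, D, p), qap_cost(F, D, first_improvement(F, D, p)))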
Another important ingredient of improvement methods is the order in
which the neighborhood is scanned. This order can be either fixed previously
or chosen at random. Given a neighborhood structure and a scanning order,
a rule for the update of the current solution (from the current iteration to the
subsequent one) should be chosen. The following update rules are frequently
used:

• First improvement

• Best improvement

• Heider's rule [94]

In the case of first improvement the current solution is updated as soon as


the first improving neighbor solution is found. Best improvement scans the

whole neighborhood and chooses the best improving neighbor solution (if
such a solution exists at all). Heider's rule starts by scanning the neigh-
borhood of the initial solution in a prespecified cyclic order. The current
solution is updated as soon as an improving neighbor solution is found.
The scanning of the neighborhood of the new solution starts where
the scanning of the previous one was interrupted (in the prespecified cyclic
order).
In order to get better results, improvement methods and local search algo-
rithms in general are performed several times starting with different initial
solutions.

8.4 Tabu Search


Tabu search is a local search method introduced by Glover [79, 80] as a tech-
nique to overcome local optimality. One way to overcome local optimality
would be to allow also the deterioration of the current solution when mov-
ing from one iteration to the subsequent one, in contrast to improvement
methods. In the case of tabu search the basic idea is to "remember" which
solutions have been visited in the course of the algorithm, in order to derive
the promising directions for further search. Thus, the memory and not only
the local investigation of the neighborhood of the current solution drives
the search. The reader is referred to the book edited by Glover, Laguna,
Taillard, and De Werra [81] for a comprehensive introduction to tabu search
algorithms.
The main ingredients of tabu search are the neighborhood structure, the
moves, the tabu list and the aspiration criterion. A move is an operation
which, when applied to a certain solution 11", generates a neighbor 11"' of it.
In the case of QAPs the neighborhood is the pair-exchange neighborhood
and the moves are usually transpositions. A tabu list is a list of forbidden or
tabu moves, i.e., moves which are not allowed to be applied to the current
solution. The tabu status of the moves changes along with the search and
the tabu list is updated during the search. An aspiration criterion is a
condition which, when fulfilled by a tabu move, cancels its tabu status.
A generic tabu search procedure starts with an initial feasible solution S
and selects a best-quality solution among (a part of) the neighbors of S
obtained by non-tabu moves. Note that this neighboring solution does not
necessarily improve the value of the objective function. Then the current
solution is updated, i.e., it is substituted by the selected solution. Obviously,
this procedure can cycle, i.e., visit some solution more than once. In an

effort to avoid this phenomenon a tabu criterion is introduced in order to


identify moves which are expected to lead to cycles. Such moves are then
declared tabu and are added to the tabu list. As, however, forbidding certain
moves could prohibit visiting "interesting" solutions, an aspiration criterion
distinguishes the potentially interesting moves among the forbidden ones.
The search stops when a stop criterion (running time limit, limited number
of iterations) is fulfilled.
There is a lot of freedom in the implementation of the different elements of a tabu
search algorithm, e.g. the tabu list (length and maintenance), the aspiration
criterion, and the tabu criterion. The performance of tabu search algorithms
depends very much on the implementation chosen for these basic ingredients,
and there is no general agreement about the best implementation of any of
them.
Different implementations of tabu search have been proposed for the
QAP, e.g. a tabu search with fixed tabu list (Skorin-Kapov [166]); the robust
tabu search (Taillard [171]), where the size of the tabu list is randomly chosen
between a maximum and a minimum value; and the reactive tabu search
(Battiti and Tecchiolli [13]), which involves a mechanism for adapting the
size of the tabu list. Reactive tabu search aims at improving the robustness
of the algorithm. The algorithm notices when a cycle occurs, i.e., when a
certain solution is revisited, and increases the tabu list size according to the
length of the detected cycle. The numerical results show that generally the
reactive tabu search outperforms other tabu search algorithms for the QAP
(see [13]).
More recently, also parallel implementations of tabu search have been pro-
posed, see e.g. Chakrapani and Skorin-Kapov [43]. Tabu search algorithms
allow a natural parallel implementation by dividing the burden of the search
in the neighborhood among several processors.
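As an illustration of the basic ingredients only, the following deliberately small sketch uses pair-exchange moves, a fixed tabu tenure and a best-so-far aspiration criterion; all parameter choices are ours, and none of the published implementations [166, 171, 13] is reproduced here.

    import numpy as np

    def qap_cost(F, D, p):
        return (F * D[np.ix_(p, p)]).sum()

    def tabu_search(F, D, p, iters=100, tenure=7):
        n = len(p)
        best, best_cost = list(p), qap_cost(F, D, p)
        tabu = {}                        # move (r, s) -> iteration until which it is forbidden
        for it in range(iters):
            moves = []
            for r in range(n - 1):
                for s in range(r + 1, n):
                    q = list(p); q[r], q[s] = q[s], q[r]
                    c = qap_cost(F, D, q)
                    # a tabu move is admissible only if it aspires (beats the best)
                    if tabu.get((r, s), -1) < it or c < best_cost:
                        moves.append((c, r, s))
            if not moves:
                break
            c, r, s = min(moves)         # best admissible move, possibly worsening
            p[r], p[s] = p[s], p[r]
            tabu[(r, s)] = it + tenure
            if c < best_cost:
                best, best_cost = list(p), c
        return best, best_cost

    rng = np.random.default_rng(5)
    A = rng.integers(0, 10, (7, 7)); F = np.triu(A, 1) + np.triu(A, 1).T
    C = rng.integers(0, 10, (7, 7)); D = np.triu(C, 1) + np.triu(C, 1).T
    print(tabu_search(F, D, list(range(7)))[1])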

8.5 Simulated Annealing


Simulated annealing is a local search approach which exploits the analogy
between combinatorial optimization problems and problems from statistical
mechanics. Kirkpatrick, Gelatt and Vecchi [110] and Černý [42] were among
the first authors who recognized this analogy, and showed how the Metropo-
lis algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller [133])
used to simulate the behavior of a physical many-particle system can be
applied as a heuristic for the traveling salesman problem.
The analogy between a combinatorial optimization problem and a many-

particle physical system basically relies on two facts:


• Feasible solutions of the combinatorial optimization problem corre-
spond to states of the physical system.

• The objective function values correspond to the energies of the states
of the physical system.
In condensed matter physics annealing is known as a cooling process which
produces low energy thermal equilibrium states of a solid in a heat bath.
The aim is to reach the so-called ground state which is characterized by a
minimum of energy.
Burkard and Rendl [37] showed that a simulated cooling process yields a
general heuristic which can be applied to any combinatorial optimization
problem, as soon as a neighborhood structure has been introduced in the
set of its feasible solutions. In particular Burkard et al. applied simulated
annealing to the QAP. Other simulated annealing (SA) algorithms for the
QAP have been proposed by different authors, e.g. Wilhelm and Ward [175]
and Connolly [53]. All these algorithms employ the pair-exchange neighbor-
hood. They differ on the way the cooling process or the thermal equilibrium
is implemented. The numerical experiments show that the performance of
SA algorithms strongly depends on the values of the control parameters,
and especially on the choice of the cooling schedule.
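A minimal simulated annealing sketch with pair-exchange moves and a geometric cooling schedule is given below; the schedule, the acceptance rule and all constants are illustrative choices and do not reproduce the cooling schemes of [37], [175] or [53].

    import numpy as np

    def anneal(F, D, seed=0, t0=10.0, alpha=0.95, sweeps=50):
        rng = np.random.default_rng(seed)
        n = F.shape[0]
        cost = lambda q: (F * D[np.ix_(q, q)]).sum()
        p = list(rng.permutation(n))
        cur, t = cost(p), t0
        for _ in range(sweeps):
            for _ in range(n * n):                 # fixed number of trials per temperature
                r, s = rng.choice(n, size=2, replace=False)
                q = list(p); q[r], q[s] = q[s], q[r]
                d = cost(q) - cur
                # accept improvements always, deteriorations with prob. exp(-d/t)
                if d <= 0 or rng.random() < np.exp(-d / t):
                    p, cur = q, cur + d
            t *= alpha                             # geometric cooling
        return p, cur

    rng = np.random.default_rng(8)
    A = rng.random((6, 6)); F = A + A.T
    C = rng.random((6, 6)); D = C + C.T
    print(anneal(F, D)[1])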
Simulated annealing (SA) can be modeled mathematically by an inho-
mogeneous ergodic Markov chain, and this model has been used for the
probabilistic analysis of the convergence of simulated annealing algorithms.
Under natural conditions on the involved neighborhood structure and not
very restrictive conditions on the slowness of the cooling process, it can be
shown that SA asymptotically converges to an optimal solution of the con-
sidered problem. The investigation of the speed of this convergence remains
an (apparently difficult) open problem. For a detailed discussion of the con-
vergence and other theoretical aspects of simulated annealing the reader is
referred to the books by Aarts and Korst [1] and van Laarhoven and Aarts [115].

8.6 Genetic Algorithms


The so-called genetic algorithms (GA) are a nature inspired approach for
combinatorial optimization problems. The basic idea is to adapt the evo-
lutionary mechanisms acting in the selection process in nature to combina-
torial optimization problems. The first genetic algorithm for optimization
problems was proposed by Holland [95] in 1975.

A genetic algorithm starts with a set of initial feasible solutions (gener-
ated randomly or by using some heuristic) called the initial population. The
elements of a population are usually termed "individuals". The algorithm
selects a number of pairs of individuals (parents) from the current popula-
tion and uses so-called cross-over rules to produce a feasible solution (child)
out of each pair of individuals. Further, a number of "bad" solutions,
i.e., solutions yielding high values of the objective function, are thrown out
of the current population. This process is repeated until a stop criterion, e.g.
a time limit, a limit on the number of iterations, or a measure of convergence,
is fulfilled. In the course of the algorithm, mutations or immigrations are
applied periodically to the current population to improve its overall quality,
by modifying some of the individuals or replacing them by better ones, re-
spectively. Often local optimization tools are used periodically within GAs,
resulting in so-called hybrid algorithms. The search is diversified by means
of so-called tournaments. A tournament consists of applying several runs
of a GA starting from different initial populations and stopping them be-
fore they converge. A "better" population is derived as the union of the final
populations of these different runs, and then a new run of the GA is started
over this population. For a good coverage of theoretical and practical issues
on genetic algorithms the reader is referred to Davis [56] and Goldberg [83].
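A bare-bones sketch of this loop is given below, with a cut-and-repair crossover, a swap mutation and truncation selection standing in for the operators described above; all operators and parameters are illustrative choices, not those of the algorithms cited in the following paragraph.

    import numpy as np

    def ga(F, D, pop_size=30, gens=100, seed=0):
        rng = np.random.default_rng(seed)
        n = F.shape[0]
        cost = lambda p: (F * D[np.ix_(p, p)]).sum()
        pop = [list(rng.permutation(n)) for _ in range(pop_size)]
        for _ in range(gens):
            children = []
            for _ in range(pop_size // 2):
                a = pop[rng.integers(len(pop))]           # first parent
                b = pop[rng.integers(len(pop))]           # second parent
                cut = int(rng.integers(1, n))
                head = a[:cut]
                child = head + [x for x in b if x not in head]   # repair to a permutation
                if rng.random() < 0.2:                    # swap mutation
                    r, s = rng.choice(n, size=2, replace=False)
                    child[r], child[s] = child[s], child[r]
                children.append(child)
            pop = sorted(pop + children, key=cost)[:pop_size]    # drop the worst
        return pop[0], cost(pop[0])

    rng = np.random.default_rng(9)
    A = rng.random((6, 6)); F = A + A.T
    C = rng.random((6, 6)); D = C + C.T
    print(ga(F, D)[1])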
A number of authors have proposed genetic algorithms for the QAP.
Standard algorithms, e.g. the one developed by Tate and Smith [172], have
difficulties generating the best known solutions even for QAPs of small
or moderate size. Hybrid approaches, e.g. combinations of GA techniques
with tabu search as developed by Fleurent and Ferland [68], seem to
be more promising. More recently another hybrid algorithm, the so-called
greedy genetic algorithm proposed by Ahuja, Orlin, and Tiwari [6], produced
very good results on large scale QAPs from QAPLIB.

8.7 Greedy Randomized Adaptive Search Procedure


The greedy randomized adaptive search procedure (GRASP) was introduced
by Feo and Resende [66] and has been applied successfully to different hard
combinatorial optimization problems [65, 111, 112, 157], among them
the QAP [124, 148] and the BiQAP [132]. The reader is referred to [66] for
a survey and tutorial on GRASP.
GRASP is a combination of greedy elements with random search elements
in a two phase heuristic. It consists of a construction phase and a local im-
provement phase. In the construction phase good solutions from the avail-

able feasible space are constructed, whereas in the local improvement phase
the neighborhood of the solution constructed in the first phase is searched
for possible improvements. A pseudocode of GRASP is shown in Figure 2.
The input parameters are the size RCLsize of the restricted candidate list
(RCL), a maximum number of iterations, and a random seed. RCL con-
tains the candidates upon which the sampling related to the construction of
a solution in the first phase will be performed.

procedure GRASP(RCLsize, MaxIter, RandomSeed)
1 InputInstance();
2 do k = 1, ..., MaxIter →
3     ConstructGreedyRandomizedSolution(RCLsize, RandomSeed);
4     LocalSearch(BestSolutionFound);
5     UpdateSolution(BestSolutionFound);
6 od;
7 return BestSolutionFound
end GRASP;

Figure 2: Pseudo-code for a generic GRASP

For the QAP the construction phase consists of two stages. The RCL con-
tains tuples of partial permutations and values associated with them. Each of
these partial permutations fixes the location of facilities 1 and 2. Such partial
permutations are called 2-permutations. In the first stage a 2-permutation
is chosen randomly from the restricted candidate list (RCL).
Given a QAP instance of size n with flow matrix F = (f_ij) and distance
matrix D = (d_ij), the value C_{φ,ψ} associated with a pair (φ, ψ) of
2-permutations is given as

C_{φ,ψ} = Σ_{i=1}^{2} Σ_{j=1}^{2} d_{φ(i)φ(j)} f_{ψ(i)ψ(j)}.

Clearly, the 2-permutations φ, ψ can be seen as elements of the set K =
{(i, j) : i, j = 1, 2, ..., n, i ≠ j}, and since |K| = n(n − 1), there are
n²(n − 1)² pairs (φ, ψ) of 2-permutations. If we have a symmetric QAP
instance with zeros on the diagonal, the above cost simplifies to

C_{φ,ψ} = 2 d_{φ(1)φ(2)} f_{ψ(1)ψ(2)}.



The RCL contains a number of pairs (φ, ψ) having the smallest associated
costs; this number equals the RCL size and is denoted by RCLsize. In
the case of an asymmetric QAP, we compute the costs C_{φ,ψ} for all (φ, ψ)
and keep the RCLsize smallest among them. In the symmetric case, we
sort the m = n² − n off-diagonal entries of matrix D in ascending order, and
the off-diagonal entries of F in descending order, i.e.,

d_{k_1 l_1} ≤ d_{k_2 l_2} ≤ ... ≤ d_{k_m l_m}  and  f_{i_1 j_1} ≥ f_{i_2 j_2} ≥ ... ≥ f_{i_m j_m}.

Then, the products d_{k_s l_s} f_{i_s j_s} are the costs associated with the pairs
of 2-permutations (k_s, l_s), (i_s, j_s), 1 ≤ s ≤ m, respectively. These costs
are sorted in ascending order and the RCLsize smallest among them are put
in the RCL. Finally, one pair of 2-permutations from the RCL is chosen at
random, and this determines the locations of two facilities which are kept
fixed in the second stage of the construction phase. Notice that the RCL is
constructed only once, and hence, in constant time with regard to the number
of iterations.
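For the symmetric case, the stage-1 construction of the RCL just described can be sketched in Python as follows (a hypothetical helper for illustration, not the code of the implementations cited in Section 9; rcl_size plays the role of RCLsize):

    def stage1_rcl(F, D, rcl_size):
        # symmetric QAP with zero diagonal: the cost of the pair of
        # 2-permutations (k,l), (i,j) is 2 * D[k][l] * F[i][j]
        n = len(F)
        off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
        d_sorted = sorted(off_diag, key=lambda p: D[p[0]][p[1]])
        f_sorted = sorted(off_diag, key=lambda p: F[p[0]][p[1]], reverse=True)
        costs = [(2 * D[k][l] * F[i][j], (k, l), (i, j))
                 for (k, l), (i, j) in zip(d_sorted, f_sorted)]
        costs.sort(key=lambda t: t[0])
        return costs[:rcl_size]     # the RCLsize cheapest candidate pairs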
In the second stage the remaining n − 2 facilities are assigned to locations.
Let Γ_r be the set of assignments made prior to the r-th assignment.
Note that at the start of stage 2, |Γ_3| = 2, since two assignments are made
in the first stage, and r = |Γ_r| + 1 throughout the second stage. In stage 2
we also construct an RCL which contains the single assignments m → s,
(m, s) ∉ Γ_r, and their associated costs C_{m,s} defined as

C_{m,s} := Σ_{(i,j)∈T_r} d_{φ(i)φ(j)} f_{ψ(i)ψ(j)},

where

T_r := {(i, j) : i, j = 1, 2, ..., r, {i, j} ∩ {r} ≠ ∅},

and φ, ψ are the partial permutations resulting from the r − 1 assignments
which are already fixed and the assignment (m, s). In the case of a symmetric
QAP the cost C_{m,s} is given by the simpler formula

C_{m,s} = 2 Σ_{i=1}^{r−1} d_{mφ(i)} f_{sψ(i)}.    (58)
Among the U = (n − r + 1)² possible assignments (m, s), those with the
RCLsize smallest associated costs are included in the RCL. One assignment
is then selected at random from the RCL and the set Γ_r is updated:

Γ_r = Γ_r ∪ {(m, s)}.
This process is repeated until a permutation of {1, 2, ..., n}, i.e., a fea-
sible solution of the considered QAP, results. Stage 2 of the construc-
tion phase of GRASP in pseudocode is shown in Figure 3. Procedure

procedure ConstructionStage2(α, (j_1, p_1), (j_2, p_2))
1 Γ = {(j_1, p_1), (j_2, p_2)};
2 do r = 2, ..., n − 1 →
3     U = 0;
4     do m = 1, ..., n →
5         do s = 1, ..., n →
6             if (m, s) ∉ Γ →
7                 C_{ms} = Σ_{(i,j)∈T_r} a_{p(i)p(j)} b_{q(i)q(j)};
8                 inheap(C_{ms});
9                 U = U + 1;
10            fi;
11        od;
12    od;
13    t = random[1, ⌊αU⌋];
14    do i = 1, ..., t →
15        C_{ms} = outheap();
16    od;
17    Γ = Γ ∪ {(m, s)};
18 od;
19 return Γ
end ConstructStage2;

Figure 3: Stage 2 of Construction Phase of GRASP

ConstructionStage2 returns the set Γ with n − 1 assignments, the last
assignment being then trivial. The inheap() and outheap() procedures
are used for sorting and choosing the smallest among the computed C_{m,s}
costs, respectively. The procedure random generates a random number in
the interval [1, RCLsize].
Finally, the second phase of the algorithm completes a GRASP iteration
by applying an improvement method starting from the solution constructed
in the first phase and employing the 2-exchange neighborhood (see also
Section 8.3).
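A minimal Python sketch of such a 2-exchange improvement phase, reusing the function qap_cost from the sketch in Section 8.6 (the full objective is recomputed after every swap for clarity; efficient implementations evaluate the change caused by a swap in O(n) time instead):

    def local_search_2exchange(F, D, perm):
        n = len(perm)
        best = qap_cost(F, D, perm)
        improved = True
        while improved:
            improved = False
            for i in range(n - 1):
                for j in range(i + 1, n):
                    perm[i], perm[j] = perm[j], perm[i]      # try the swap
                    value = qap_cost(F, D, perm)
                    if value < best:
                        best, improved = value, True          # keep it
                    else:
                        perm[i], perm[j] = perm[j], perm[i]   # undo it
        return perm, best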

8.8 Ant Systems


Ant systems (AS) are recently developed heuristic techniques for combina-
torial optimization which try to imitate the behavior of an ant colony in
search for food. Initially the ants search for food in the vicinity of their
nest in a random way. As soon as an ant finds a source of food, it takes
some food from the source and carries it back to the nest. During this trip
back the ant leaves a trail of a substance called pheromone on the ground.
The pheromone trail serves to guide the future search of ants towards the
already found source of food. The intensity of the pheromone on the trail
is proportional to the amount of food found in the source. Thus, the ways
to rich sources of food which are visited frequently (by a large number of
ants) will be indicated by stronger pheromone trails. In an attempt to imitate the
behavior of ants to derive algorithms for combinatorial optimization prob-
lems, the following analogies can be exploited: a) the area searched by the
ants resembles the set of feasible solutions, b) the amount of food at food
sources resembles the value of the objective function, and c) the pheromone
trail resembles a component of adaptive memory.
AS were originally introduced by Dorigo [59] and Colorni, Dorigo, and
Maniezzo [51] and have already produced good results for well known prob-
lems like the traveling salesman problem (TSP) and the QAP [52, 72].
In the case of the QAP the pheromone trail, which is also the key element
of an AS, is implemented by a matrix T = (τ_ij). The entry τ_ij is a measure
of the desirability of locating facility i at location j in the solutions generated
by the ants (the algorithm). To illustrate the idea we briefly describe the
algorithm of Gambardella, Taillard and Dorigo [72].
The algorithm is iterative and constructs a fixed number, say m, of solutions
in each iteration. (This number is a control parameter and is also thought
of as the number of ants.) In the first iteration these solutions are generated
randomly, whereas in the subsequent iterations they are updated by exploit-
ing the information contained in the pheromone trail matrix T. Initially
the pheromone trail matrix is a constant matrix; the constant is inversely
proportional to the best value of the objective function found so far. This
is in compliance with the behavior of ants whose search directions are ini-
tially chosen at random. Let us denote the best solution found so far by
φ* and its corresponding value of the objective function by f(φ*). In the
further iterations the entries τ_{iφ*(i)} of T are increased by the same value,
which is proportional to f(φ*). The update of the m solutions in each it-
eration is done first by means of the pheromone trail matrix, and then by
applying some improvement method. In both cases the update consists of
swapping the locations for a sequence of facility pairs. First, the current
solution is updated by swapping the locations of pairs of facilities chosen so
as to maximize the (normalized) sum of the corresponding pheromone trail
entries. Then, the solution obtained after this update is improved by ap-
plying some improvement methods, e.g. first or best improvement (see also
Section 8.3). As soon as an improvement of the best known solution is de-
tected an intensification component "forces the ants" to further explore the
part of the solution space where the improvement was found. If after a large
number of iterations there is no improvement of the best known solution, a
diversification - which is basically a new random start - is performed.
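The trail-driven part of this update can be illustrated schematically; the following Python fragment (a simplification for illustration, not the exact rule of [72]) chooses the swap of two facilities which maximizes the sum of the pheromone trail entries gained by the swap:

    def pheromone_guided_swap(perm, T):
        # T[i][j] measures the desirability of placing facility i at
        # location j; pick the pair (r, s) whose swap gains the most trail
        n = len(perm)
        best_gain, best_pair = float('-inf'), None
        for r in range(n - 1):
            for s in range(r + 1, n):
                gain = T[r][perm[s]] + T[s][perm[r]]
                if gain > best_gain:
                    best_gain, best_pair = gain, (r, s)
        r, s = best_pair
        perm[r], perm[s] = perm[s], perm[r]
        return perm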
Numerical results presented in [52, 72] show that ant systems are competitive
heuristics especially for real life instances of the QAP with a few very good
solutions clustered together. For randomly generated instances which have
many good solutions distributed more or less uniformly in the search space, AS
are outperformed by other heuristics, e.g. genetic algorithms or tabu search
approaches.

9 Available Computer Codes for the QAP


Burkard, Karisch, and Rendl [34] have compiled a library of QAP instances
(QAPLIB) which is widely used to test bounds, exact algorithms, and heuris-
tics for the QAP. The instances collected in QAPLIB are due to different
authors and range from instances arising in real life applications to instances
generated randomly only for test purposes. Many of these instances have not
been solved to optimality yet, the most celebrated among them being the in-
stances of Nugent, Vollmann, and Ruml [141] of size larger than 25. QAPLIB
can be found at http://www.opt.math.tu-graz.ac.at/~karisch/qaplib.

A number of codes to compute lower bounds are available. A FORTRAN
code which computes the GLB is due to Burkard and Derigs [29], and is able
to compute the bound for instances of size up to 256. The source code can be
downloaded from the QAPLIB web page. Another FORTRAN code which
can be downloaded from the QAPLIB web page computes the elimination
bound (ELI) for symmetric QAP instances of size up to 256.
Recently, Espersen, Karisch, Çela, and Clausen [64] have developed QAPpack,
a JAVA package containing a branch and bound algorithm to solve
the QAP. In QAPpack a number of bounds based on linearization are imple-
mented: the Gilmore-Lawler bound [77, 118], the bound of Carraresi and
Malucelli [40], the bound of Adams and Johnson [3], the bound of Hahn and
Grant [90], and the bound of Karisch, Çela, Clausen, and Espersen [104].
The implementation is based on the dual framework provided by Karisch et
al. [104]. QAPpack can be found at http://www.imm.dtu.dk/~te/QAPpack.
Besides QAPpack, a FORTRAN code of the branch and bound algorithm
developed by Burkard and Derigs [29] can be downloaded from the QAPLIB
web page.

There are also some codes of heuristics available. The (compressed)
FORTRAN source file, 608.Z, of a heuristic due to West [174] can be
downloaded from ftp://netlib.att.com in /netlib/toms.
The source files (compressed tar-files) of two FORTRAN implementations of
GRASP for dense QAPs by Resende, Pardalos and Li [156] and sparse QAPs
by Pardalos, Pitsoulis and Resende [149] can be downloaded from Resende's
web page at http://www.research.att.com/~mgcr/src/index.html.
The source file of a FORTRAN implementation of the simulated annealing
algorithm of Burkard and Rendl [37] can be downloaded from the QAPLIB
web page.
The source file of a C++ implementation of the simulated annealing algo-
rithm of Connolly [53], due to Taillard, can be downloaded from Taillard's
web page at http://www.idsia.ch/~eric/codes.dir/sa_qap.c.
Also a source file of a PASCAL implementation of Taillard's robust tabu
search [171] can be found at Taillard's web page.

Finally, the source file of a FORTRAN implementation of Li and Pardalos'
generator for QAP instances with known optimal solution [122] can be
obtained by sending an email to coap@math.ufl.edu with subject line send
92006.

10 Polynomially Solvable Cases


Since the QAP is NP-hard, restricted versions which can be solved in poly-
nomial time are an interesting aspect of the problem. A basic question
arising with respect to polynomially solvable versions is the identification
of those versions and the investigation of the border line between hard and
easy versions of the problem. There are two ways to approach this topic:
first, find structural conditions to be imposed on the coefficient matrices of
the QAP so as to obtain polynomially solvable versions, and secondly, inves-
tigate other combinatorial optimization or graph-theoretical problems which
can be formulated as QAPs, and embed the polynomially solvable versions
of the former into special cases of the latter. These two approaches yield two
groups of restricted QAPs which are briefly reviewed in this section. For
detailed information on this topic the reader is referred to [41].
Most of the restricted versions of the QAP with specially structured
matrices involve Monge matrices or other matrices having analogous prop-
erties. A matrix A = (a_ij) is a Monge matrix iff the following inequalities
are fulfilled for each 4-tuple of indices i, j, k, l, i < k, j < l:

a_ij + a_kl ≤ a_il + a_kj    (Monge inequalities).

A matrix A = (a_ij) is an Anti-Monge matrix iff the following inequalities
are fulfilled for each 4-tuple of indices i, j, k, l, i < k, j < l:

a_ij + a_kl ≥ a_il + a_kj    (Anti-Monge inequalities).

A simple example of Monge and Anti-Monge matrices are the sum matrices;
the entries of a sum matrix A = (a_ij) are given as a_ij = α_i + β_j,
where (α_i) and (β_j) are the generating row and column vectors, respectively.
A product matrix A is defined in an analogous way: its entries are given
as a_ij = α_i β_j, where (α_i), (β_j) are the generating vectors. If the row
generating vector (α_i) and the column generating vector (β_j) are sorted
non-decreasingly, then the product matrix (α_i β_j) is an Anti-Monge matrix.
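Both properties can be tested directly from the definition; a small Python check (written in the quadruple form of the definition, O(n²m²) for an n × m matrix):

    def is_monge(A):
        n, m = len(A), len(A[0])
        return all(A[i][j] + A[k][l] <= A[i][l] + A[k][j]
                   for i in range(n) for k in range(i + 1, n)
                   for j in range(m) for l in range(j + 1, m))

    def is_anti_monge(A):
        # same quadruples, with the inequality reversed
        n, m = len(A), len(A[0])
        return all(A[i][j] + A[k][l] >= A[i][l] + A[k][j]
                   for i in range(n) for k in range(i + 1, n)
                   for j in range(m) for l in range(j + 1, m))

For instance, is_anti_monge([[a * b for b in (1, 2, 3)] for a in (1, 2, 3)]) returns True, in accordance with the statement about product matrices above.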
In contrast with the traveling salesman problem, it turns out that the
QAP with both coefficient matrices being Monge or Anti-Monge is NP-hard,
whereas the complexity of a QAP with one coefficient matrix being Monge
and the other one being Anti-Monge is still open, see Burkard, Çela, Demidenko,
Metelski, and Woeginger [26] and Çela [41]. However, some polyno-
mially solvable special cases can be obtained by imposing additional con-
ditions on the coefficient matrices. These special cases involve very simple
Figure 4: The sum of the depicted entries taken with the cor-
responding signs must be nonnegative: a) Monge inequality, b)
Anti-Monge inequality.

matrices like product matrices or so-called chess-board matrices. A matrix
A = (a_ij) is a chess-board matrix if its entries are given by a_ij = (−1)^{i+j}.
These QAPs can either be formulated as equivalent LAPs, or they are con-
stant permutation QAPs (see [26, 41]), i.e., their optimal solution can be
given beforehand, without knowing the entries of their coefficient matrices.
A few other versions of the QAP involving Monge and Anti-Monge matrices
with additional structural properties can be solved by dynamic program-
ming.
Other restricted versions of the QAP involve matrices with a specific
diagonal structure, e.g. circulant and Toeplitz matrices. An n × n matrix
A = (a_ij) is called a Toeplitz matrix if there exist numbers c_{−n+1}, ..., c_{−1},
c_0, c_1, ..., c_{n−1} such that a_ij = c_{j−i}, for all i, j.
A matrix A is called a circulant matrix if it is a Toeplitz matrix and the
generating numbers c_i fulfill the conditions c_{−i} = c_{n−i}, for 1 ≤ i ≤ n − 1. In
other words, a Toeplitz matrix has constant entries along lines parallel to
the diagonal, whereas a circulant is given by its first row, and the entries of
the i-th row resemble the first row shifted by i − 1 places to the right.
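Both definitions translate directly into constructions; in Python (here c is a mapping from the offset j − i to the generating number):

    def toeplitz(n, c):
        # c: dict mapping the offsets -(n-1), ..., n-1 to entries
        return [[c[j - i] for j in range(n)] for i in range(n)]

    def circulant(first_row):
        # row i is the first row cyclically shifted i places to the right
        n = len(first_row)
        return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]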
QAPs with one Anti-Monge (Monge) matrix and one Toeplitz
(circulant) matrix. In general these versions of the QAP remain NP-
hard unless additional conditions, e.g. monotonicity, are imposed on the
coefficient matrices. A well studied problem is the so-called Anti-Monge-
Toeplitz QAP, where the rows and columns of the Anti-Monge matrix are
non-decreasing, investigated by Burkard, Çela, Rote and Woeginger [28]. It
has been shown that this problem is NP-hard and contains as a special case
the so-called turbine problem introduced by Mosewich [137] and formulated
as a QAP by Laporte and Mercure [117]. In the turbine problem we are given
a number of blades to be welded in regular spacing around the cylinder of
the turbine. Due to inaccuracies in the manufacturing process the weights of
the blades differ slightly and consequently the gravity center of the system
does not lie on the rotation axis of the cylinder, leading to instabilities. In
an effort to make the system as stable as possible, it is desirable to locate
the blades so as to minimize the distance between the center of gravity and
the rotation axis. The mathematical formulation of this problem leads to
an NP-hard Anti-Monge-Toeplitz QAP. (For more details and for a proof
of NP-hardness see Burkard et al. [28].) It is perhaps interesting that
the maximization version of this problem is polynomially solvable. Further
polynomially solvable special cases of the Anti-Monge-Toeplitz QAP arise
if additional constraints e.g. benevolence or k-benevolence are imposed on
the Toeplitz matrix. These conditions are expressed in terms of properties
of the generating function of these matrices, see Burkard et al. [28].
The polynomially solvable QAPs with one Anti-Monge (Monge) matrix and
the other one Toeplitz (circulant) matrix described above, are all constant
permutation QAPs. The technique used to prove this fact and to identify
the optimal permutation is called reduction to extremal rays. This tech-
nique exploits two facts: first, the involved matrix classes form cones, and
secondly, the objective function of the QAP is linear with respect to each
of the coefficient matrices. These two facts allow us to restrict the inves-
tigations to instances of the QAP with 0-1 coefficient matrices which are
extremal rays of the above mentioned cones. Such instances can then be
handled by elementary means (exchange arguments, bounding techniques)
more easily than the general QAP. The identification of polynomially
solvable special cases of the QAP which are not constant permutation QAPs
and can be solved algorithmically remains a challenging open question.
Another class of matrices similar to the Monge matrices are the Kalmanson
matrices. A matrix A = (a_ij) is a Kalmanson matrix if it is symmetric and
its elements satisfy the following inequalities for all indices i, j, k, l, i < j <
k < l:

a_ij + a_kl ≤ a_ik + a_jl  and  a_il + a_jk ≤ a_ik + a_jl.
For more information on Monge, Anti-Monge and Kalmanson matrices, and



their properties the reader is referred to the survey article of Burkard, Klinz
and Rudolf [35]. The Koopmans-Beckmann QAP with one coefficient matrix
being a Kalmanson matrix and the other one a Toeplitz matrix has been
investigated by Deineko and Woeginger [57]. The computational complexity
of this problem is an open question, but analogously as in the case of the
Anti-Monge-Toeplitz QAP, polynomially solvable versions of the problem
are obtained by imposing additional constraints on the Toeplitz matrix.
Further polynomially solvable cases arise as QAP formulations of other
problems, like the linear arrangement problem, minimum feedback arc set
problem, packing problems in graphs and subgraph isomorphism, see [26,
41]. Polynomially solvable versions of these problems lead to polynomially
solvable cases of the QAP. The coefficient matrices of these QAPs are the
(weighted) adjacency matrices of the underlying graphs, and the special
structure of these matrices is imposed by properties of these graphs. The
methods used to solve these QAPs range from graph theoretical algorithms
(in the case of the linear arrangement problem and the feedback arc set
problem), to dynamic programming (in the case of subgraph isomorphism).

11 QAP Instances with Known Optimal Solution


Since the QAP is a very hard problem from a practical point of view,
heuristics are often the only reasonable approach to solve it, and so far there
exist no performance guarantees for any of the algorithms developed for the
QAP. One possibility to evaluate the performance
rithms developed for the QAP. One possibility to evaluate the performance
of heuristics and to compare different heuristics is given by QAP instances
with known optimal solution. Heuristics are applied to these instances and
the heuristic solution is compared to the optimal one known beforehand.
The instances with known optimal solution should ideally have two prop-
erties: first, they should be representative in terms of their hardness, and
secondly, they should not be especially easy for any of the heuristics.
Two generators of QAP instances with known optimal solution have been
proposed so far: Palubeckis' generator [144] and the generator proposed by
Li and Pardalos [122].
The first method for generating QAP instances with a known opti-
mal solution was proposed by Palubeckis [144] in 1988. The input of the
Palubeckis' algorithm consists of the size n of the instance to be generated,
the optimal solution (permutation) π of the output instance, two control
parameters w and z, where z < w, and the distance matrix A of an r × s
grid with rs = n. A contains rectilinear distances, also called Manhattan
distances, i.e., the distance a_ij between two grid nodes i, j lying in rows r_i,
r_j and in columns c_i, c_j, respectively, is given by a_ij = |r_i − r_j| + |c_i − c_j|.
The output of the algorithm is a second matrix B such that π is an op-
timal solution of QAP(A, B). The idea is to start with a matrix B such
that QAP(A, B) is a trivial instance with optimal solution π. Then B is
transformed such that QAP(A, B) is no longer trivial, but π continues
to be its optimal solution.
Palubeckis starts with a constant matrix B = (b_ij) with b_ij = w. QAP(A, B)
is a trivial problem because all permutations yield the same value of the
objective function and thus are optimal solutions. Hence, also the identity
permutation id is an optimal solution of QAP(A, B). Then matrix B is
iteratively transformed so that it is not a constant matrix any more and
the identity permutation remains an optimal solution of QAP(A, B). In the
last iteration the algorithm constructs an instance QAP(A′, B) with optimal
solution π with the help of QAP(A, B) with optimal solution id, by setting
A′ = (a_{π(i)π(j)}). The optimal value of QAP(A′, B) equals w Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij.
Cyganski, Vaz and Virball [55] have observed that the QAP instances
generated by Palubeckis' generator are "easy" in the sense that their optimal
value can be computed in polynomial time by solving a linear program.
(For an accessible proof of this result the reader is referred to [41].) Notice,
however, that nothing is known about the computational complexity of QAP
instances generated by Palubeckis' generator. We believe that finding an
optimal solution for these QAPs is NP-hard, although the corresponding
decision problem is polynomially solvable.

Another generator of QAP instances with known solution has been pro-
posed by Li and Pardalos [122]. Like Palubeckis' generator, the generator of
Li and Pardalos starts with a trivial instance QAP(A, B) with the identity
permutation id as optimal solution and iteratively transforms A and B so
that the resulting QAP instance still has the optimal solution id but is not
trivial any more. The transformations are such that, at the end of each
iteration, for all i, j, i′, j′, a_ij ≤ a_{i′j′} is equivalent to b_ij ≥ b_{i′j′}.
If the coefficient matrices are considered as weighted adjacency matrices
of graphs, each iteration transforms entries corresponding to some specific
subgraph equipped with signs on the edges and hence called sign-subgraphs.
The application of Li and Pardalos' algorithm with different sign-subgraphs
yields different QAP generators. A number of generators involving different
sign-subgraphs, e.g. subgraphs consisting of a single edge, signed triangles

and signed spanning trees have been tested. It is perhaps interesting and
surprising that QAP instances generated by involving more complex sign-
subgraphs are generally "easier" than those generated by involving sub-
graphs consisting of single edges. Here a QAP instance is considered to be
"easy", if most heuristics applied to it find a solution near to the optimal
one in a relatively short time. Nothing is known about the complexity of
QAP instances generated by the generator of Li and Pardalos, since the
arguments used to analyze Palubeckis' generator do not apply in this case.

12 Asymptotic Behavior
The QAP shows an interesting asymptotic behavior: under certain prob-
abilistic conditions on the coefficient matrices of the QAP, the ratio between
its "best" and "worst" values of the objective function approaches 1, as the
size of the problem approaches infinity. This asymptotic behavior suggests
that the relative error of every heuristic method vanishes as the size of the
problem tends to infinity, i.e., every heuristic finds almost always an almost
optimal solution when applied to QAP instances which are large enough. In
other words the QAP becomes in some sense trivial as the size of the problem
tends to infinity. Burkard and Fincke [32] identify a common combinatorial
property of a number of problems which, under natural probabilistic condi-
tions on the problem data, behave as described above. This property seems
to be also the key for the specific asymptotic behavior of the QAP.
In an early work Burkard and Fincke [31] investigate the relative dif-
ference between the worst and the best value of the objective function for
Koopmans-Beckmann QAPs. They first consider the case where the co-
efficient matrix D is the matrix of pairwise distances of points chosen in-
dependently and uniformly from the unit square in the plane. Then the
general case where entries of the flow and distance matrices F and D are
independent random variables taken from a uniform distribution on [0, 1] is
considered. In both cases it is shown that the relative difference mentioned
above approaches 0 with probability tending to 1 as the size of the problem
tends to infinity.
Later Burkard and Fincke [32] consider the ratio between the objective func-
tion values corresponding to an optimal (or best) and a worst solution of a
generic combinatorial optimization problem described below.
Consider a sequence P_n, n ∈ ℕ, of combinatorial optimization (minimiza-
tion) problems with sum objective function as described in Section 3. Let
E_n and F_n be the ground set and the set of feasible solutions of problem P_n,
respectively. Moreover, let c_n: E_n → ℝ_+ and f: F_n → ℝ_+ be the nonnegative
cost function and the objective function of problem P_n, respectively. For
n ∈ ℕ, an optimal solution X_opt ∈ F_n minimizes the objective function, whereas
a worst solution X_wor ∈ F_n maximizes the objective function and is defined
as follows:

f(X_wor) = Σ_{x∈X_wor} c_n(x) = max_{X∈F_n} f(X) = max_{X∈F_n} Σ_{x∈X} c_n(x).
It is shown in [32] that the behavior of the ratio f(X_opt)/f(X_wor) is strongly
related to the ratio ln|F_n| / |X_n| between the logarithm of the cardinality of
the set of feasible solutions F_n and the cardinality of an arbitrary feasible
solution X_n, under the assumption that all feasible solutions have the same
cardinality.
Theorem 12.1 (Burkard and Fincke [32], 1985)
Let P_n be a sequence of combinatorial minimization problems with sum
objective function as described above. Assume that the following conditions
are fulfilled:

(BF1) For all X ∈ F_n, |X| = |X^(n)|, where X^(n) is some feasible solution
in F_n.

(BF2) The costs c_n(x), x ∈ X, X ∈ F_n, n ∈ ℕ, are random variables
identically distributed on [0, 1]. The expected value E = E(c_n(x)) and
the variance σ² = σ²(c_n(x)) > 0 of the common distribution are finite.
Moreover, for all X ∈ F_n, n ∈ ℕ, the variables c_n(x), x ∈ X, are
independently distributed.

(BF3) |F_n| and |X^(n)| tend to infinity as n tends to infinity and moreover,

lim_{n→∞} λ_0 |X^(n)| − ln|F_n| = +∞,

where λ_0 is defined by λ_0 := (ε_0 σ / (ε_0 + 2σ²))² and ε_0 fulfills

0 < ε_0 < σ²  and  0 < (E + ε_0)/(E − ε_0) ≤ 1 + ε,    (59)

for a given ε > 0.

Then, as n → ∞, the ratio f(X_opt)/f(X_wor) lies in (1 − ε, 1 + ε) with
probability tending to 1.

The combinatorial condition represented by the limit in (BF3) says that
the cardinality of the feasible solutions is large enough with respect to the
cardinality of the set of feasible solutions. Namely, the result of the theorem
is true if the following equality holds:

lim_{n→∞} ln|F_n| / |X^(n)| = 0.
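For the QAP, for instance, the feasible solutions are the n! permutations of {1, 2, ..., n}, and every feasible solution contains |X^(n)| = n² elements of the ground set (one cost coefficient for each pair (i, j)). Hence

ln|F_n| / |X^(n)| = ln(n!) / n² ≤ (n ln n) / n² = (ln n) / n → 0 as n → ∞,

so the condition is fulfilled; this is the computation behind the application of Theorems 12.1 and 12.2 to the QAP further below.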
The other conditions of Theorem 12.1 are natural probabilistic require-
ments on the coefficients of the problem. Theorem 12.1 states that for each
ε > 0, the ratio between the best and the worst value of the objective func-
tion lies in (1 − ε, 1 + ε), with probability tending to 1, as the "size" of the
problem approaches infinity. Thus, we have convergence in probability.
Under one additional natural (combinatorial) assumption (condition (S3) of
the theorem below), Szpankowski strengthens this result and improves the
mode of convergence to almost sure convergence. In the case of almost sure
convergence the probability that the above mentioned ratio tends to 1 is equal
to 1. (Detailed explanations of the probabilistic notions used here can be
found in every textbook on probability theory.)

Theorem 12.2 (Szpankowski [170], 1995)
Let P_n be a sequence of combinatorial minimization problems with sum ob-
jective function as above. Assume that the following conditions are fulfilled:

(S1) For all X ∈ F_n, |X| = |X^(n)|, where X^(n) is some feasible solution in
F_n.

(S2) The costs c_n(x), x ∈ X, X ∈ F_n, n ∈ ℕ, are random variables
identically and independently distributed on [0, 1]. The expected value
E = E(c_n(x)), the variance, and the third moment of the common
distribution are finite.

(S3) The worst values of the objective function, max_{X∈F_n} Σ_{x∈X} c_n(x), form a
nondecreasing sequence for increasing n.

(S4) |F_n| and |X^(n)| tend to infinity as n tends to infinity and moreover,
ln|F_n| = o(|X^(n)|).

Then, the following equality holds almost surely:

lim_{n→∞} f(X_opt)/f(X_wor) = 1.

Theorems 12.1 and 12.2 can be applied to the QAP. The reason is that
the QAP fulfills the combinatorial condition (S4) in Theorem 12.2 (and
therefore, also condition (BF3) in Theorem 12.1). Thus, we immediately
get the following corollary:
Corollary 12.3 Consider a sequence of problems QAP(A^(n), B^(n)) for n ∈
ℕ, with n × n coefficient matrices A^(n) = (a_ij^(n)) and B^(n) = (b_ij^(n)). Assume
that a_ij^(n) and b_ij^(n), n ∈ ℕ, 1 ≤ i, j ≤ n, are independently distributed random
variables on [0, M], where M is a positive constant. Moreover, assume that
the entries a_ij^(n), n ∈ ℕ, 1 ≤ i, j ≤ n, have the same distribution, and the entries
b_ij^(n), n ∈ ℕ, 1 ≤ i, j ≤ n, have also the same distribution (which does not
necessarily coincide with that of the a_ij^(n)). Furthermore, assume that these
variables have finite expected values, variances and third moments.
Let π_opt^(n) and π_wor^(n) denote an optimal and a worst solution of QAP(A^(n), B^(n)),
respectively, i.e.,

Z(A^(n), B^(n), π_opt^(n)) = min_{π∈S_n} Z(A^(n), B^(n), π)

and

Z(A^(n), B^(n), π_wor^(n)) = max_{π∈S_n} Z(A^(n), B^(n), π).

Then the following equality holds almost surely:

lim_{n→∞} Z(A^(n), B^(n), π_opt^(n)) / Z(A^(n), B^(n), π_wor^(n)) = 1.

The above result suggests that the value of the objective function of the
problem QAP(A^(n), B^(n)) (corresponding to an arbitrary feasible solution)
gets somehow close to its expected value n²E(A)E(B) as the size of the
problem increases, where E(A) and E(B) are the expected values of a_ij^(n) and
b_ij^(n), n ∈ ℕ, 1 ≤ i, j ≤ n, respectively. Frenk, van Houweninge, and Rinnooy
Kan [69] and Rhee [159, 160] provide different analytical evaluations for this
"getting close", by imposing different probabilistic conditions on the data.
The following theorem states two important results proved in [69] and [160].

Theorem 12.4 (Frenk et al. [69], 1986, Rhee [160], 1991)
Consider the sequence of QAP(A^(n), B^(n)), n ∈ ℕ, as in Corollary 12.3.
Assume that the following conditions are fulfilled:

(C1) a_ij^(n), b_ij^(n), n ∈ ℕ, 1 ≤ i, j ≤ n, are random variables independently
distributed on [0, M].

(C2) a_ij^(n), n ∈ ℕ, 1 ≤ i, j ≤ n, have the same distribution on [0, M]. b_ij^(n),
n ∈ ℕ, 1 ≤ i, j ≤ n, have also the same distribution on [0, M].

Let E(A), E(B) be the expected values of the variables a_ij^(n) and b_ij^(n), respec-
tively. Then, there exists a constant K_1 (which does not depend on n), such
that the following inequality holds almost surely, for π ∈ S_n, n ∈ ℕ:

limsup_{n→∞} (√n / √(log n)) | Z(A^(n), B^(n), π) / (n² E(A)E(B)) − 1 | ≤ K_1.

Moreover, let Y be a random variable defined by

Y = Z(A^(n), B^(n), π_opt^(n)) − n² E(A)E(B),

where π_opt^(n) is an optimal solution of QAP(A^(n), B^(n)). Then there exists
another constant K_2, also independent of the size of the problem, such that

(1/K_2) n^{3/2} (log n)^{1/2} ≤ |E(Y)| ≤ K_2 n^{3/2} (log n)^{1/2}

and

P{ |Y − E(Y)| ≥ t } ≤ 2 exp( −t² / (4 n² ‖A‖²_∞ ‖B‖²_∞) )

for each t ≥ 0, where E(Y) denotes the expected value of the variable Y and
‖A‖_∞ (‖B‖_∞) is the so-called row sum norm of matrix A (B), defined by
‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|.
These results on the asymptotic behavior of the QAP have been ex-
ploited by Dyer, Frieze, and McDiarmid [60] to analyze the performance
of branch and bound algorithms for QAPs with coefficients generated ran-
domly as described above. Dyer et al. have shown that for such QAPs the
optimal value of the continuous relaxation of Frieze and Yadegar's lineariza-
tion (19)-(26) is in O(n) with probability tending to 1 as the size n of the
QAP tends to infinity. Hence the gap between the optimal value of this
continuous relaxation and the optimal value of the QAP grows like O(n)
with probability tending to 1 as n tends to infinity. This result leads to the
following theorem.

Theorem 12.5 (Dyer, Frieze, and McDiarmid [60], 1986)
Consider any branch and bound algorithm for solving a QAP with ran-
domly generated coefficients as in Corollary 12.3, that uses single assignment
branching and employs a bound obtained by solving the continuous relaxation
of the linearization (19)-(26). The number of branched nodes explored is at
least n^{(1−o(1))n/4} with probability tending to 1 as the size n of the QAP tends
to infinity.

13 Related Problems
One possibility to obtain generalizations of the QAP is to consider objec-
tive functions of higher degree, obtaining in this way cubic, biquadratic
and generally N-adic assignment problems (see e.g. [118]). For the cubic
assignment problem, for example, we have n⁶ cost coefficients c_ijklmp, where
i, j, k, l, m, p = 1, ..., n, and the problem is given as follows:

min Σ_{i,j=1}^{n} Σ_{k,l=1}^{n} Σ_{m,p=1}^{n} c_ijklmp x_ij x_kl x_mp
s.t. (x_ij) ∈ X_n.

As noted in [118], we can construct an n³ × n³ matrix S containing
the cost coefficients, such that the cubic assignment problem is equivalent
to the LAP

min ⟨S, Π⟩
s.t. Π = X ⊗ X ⊗ X,
     X ∈ X_n.
In an analogous way this LAP formulation can be extended to any N-adic
assignment problem, by considering the solution matrix Π to be the N-th
Kronecker power of a permutation matrix in X_n.
Another modification of the objective function which yields a problem
related to the QAP, the bottleneck quadratic assignment problem (BQAP), is
the substitution of the sum by a max operation. The first occurrence of the
BQAP is due to Steinberg [168] and arises as an application in backboard
wiring, while trying to minimize the maximum length of the involved wires
(see also Section 1).
In this section several generalizations of and problems related to the QAP
are presented, for which real applications have been found that initiated an
interest in analyzing them and in proposing solution techniques.

13.1 The Bottleneck QAP


In the bottleneck quadratic assignment problem (BQAP) of size n we are
given an n × n flow matrix F and an n × n distance matrix D, and wish to
find a permutation φ ∈ S_n which minimizes the objective function

max{ f_ij d_{φ(i)φ(j)} : 1 ≤ i, j ≤ n }.

A more general BQAP, analogous to the QAP in (2), is obtained if the coef-
ficients of the problem are of the form c_ijkl, 1 ≤ i, j, k, l ≤ n:

min_{φ∈S_n} max_{1≤i,j≤n} c_{ijφ(i)φ(j)}.

Besides the application in backboard wiring mentioned above, the BQAP
has many other applications. Basically, all QAP applications give rise to
applications of the BQAP, because it often makes sense to minimize the
largest cost instead of the overall cost incurred by some decision. A well
studied problem in graph theory which can be modeled as a BQAP is the
bandwidth problem. In the bandwidth problem we are given an undirected
graph G = (V, E) with vertex set V and edge set E, and seek a labeling
of the vertices of G by the numbers 1, 2, ..., n, where |V| = n, such that
the maximum absolute value of the differences of labels of vertices which are
connected by an edge is minimized. In other words, we seek a labeling
of the vertices such that the maximum distance of 1-entries of the resulting
adjacency matrix from the diagonal, i.e., the bandwidth of the adjacency
matrix, is minimized. It is easy to see that this problem can be modeled as
a special BQAP with flow matrix equal to the adjacency matrix of G for
some arbitrary labeling of the vertices, and distance matrix D = (|i − j|).
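A sketch of this reduction in Python (the function names are illustrative):

    def bandwidth_bqap(n, edges):
        # flow matrix: adjacency matrix of G under an arbitrary labeling
        F = [[0] * n for _ in range(n)]
        for u, v in edges:
            F[u][v] = F[v][u] = 1
        # distance matrix: D[i][j] = |i - j|
        D = [[abs(i - j) for j in range(n)] for i in range(n)]
        return F, D

    def bqap_value(F, D, perm):
        # bottleneck objective: max over i, j of F[i][j] * D[perm[i]][perm[j]]
        n = len(perm)
        return max(F[i][j] * D[perm[i]][perm[j]]
                   for i in range(n) for j in range(n))

With these matrices, bqap_value(F, D, perm) is exactly the bandwidth of the labeling which assigns label perm[i] + 1 to vertex i.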

The BQAP is NP-hard since it contains the bottleneck TSP as a special case.
(This is analogous to the fact that the QAP contains the TSP as a special
case, as shown in Section 13.4.) Some enumeration algorithms to solve the
BQAP to optimality have been proposed by Burkard [22]. These algorithms
employ a Gilmore-Lawler-like bound for the BQAP which involves in turn
the solution of bottleneck linear assignment problems. The algorithm for
the general BQAP also involves a threshold procedure useful for reducing to 0
as many coefficients as possible.
Burkard and Fincke [30] investigated the asymptotic behavior of the BQAP
and proved results analogous to those obtained for the QAP: if the coeffi-
cients are independent random variables taken from a uniform distribution
on [0, 1], then the relative difference between the worst and the best value
of the objective function approaches 0 with probability tending to 1 as the
size of the problem approaches infinity.
The BQAP and the QAP are special cases of a more general quadratic
assignment problem which can be called the algebraic QAP (in analogy
to the algebraic linear assignment problem (LAP) introduced by Burkard,
Hahn, and Zimmermann [33]). If (H, *, ≤) is a totally ordered commutative
semigroup with composition * and order relation ≤, the algebraic QAP with
cost coefficients c_ijkl ∈ H is formulated as

min_{φ∈S_n} c_{11φ(1)φ(1)} * c_{12φ(1)φ(2)} * ... * c_{1nφ(1)φ(n)} * ... * c_{nnφ(n)φ(n)}.
The study of the bottleneck QAP, and more generally the algebraic QAP,
was the starting point for the investigation of a number of algebraic combi-
natorial optimization problems with coefficients taken from linearly ordered
semimodules, e.g. linear assignment and transportation problems, flow prob-
lems etc. The reader is referred to Burkard and Zimmermann [38] for a
detailed discussion on this topic.

13.2 The BiQuadratic Assignment Problem


A generalization of the QAP is the BiQuadratic Assignment Problem, de-
noted BiQAP, which is essentially a quartic assignment problem with cost
coefficients formed by the products of two four-dimensional arrays. More
specifically, consider two four-dimensional n × n × n × n arrays F = (f_ijkl)
and D = (d_mpst). The BiQAP can then be stated as:

min Σ_{i,j=1}^{n} Σ_{k,l=1}^{n} Σ_{m,p=1}^{n} Σ_{s,t=1}^{n} f_ijkl d_mpst x_im x_jp x_ks x_lt

s.t. Σ_{i=1}^{n} x_ij = 1,  j = 1, 2, ..., n,

     Σ_{j=1}^{n} x_ij = 1,  i = 1, 2, ..., n,

     x_ij ∈ {0, 1},  i, j = 1, 2, ..., n.

The major application of the BiQAP arises in Very Large Scale Inte-
grated (VLSI) circuit design. The majority of VLSI circuits are sequential
circuits, and their design process consists of two steps: first, translate the
circuit specifications into a state transition table by modeling the system
using finite state machines, and secondly, try to find an encoding of the
states such that the actual implementation is of minimum size. A detailed
description of the mathematical modeling of the VLSI problem as a BiQAP
is given by Burkard, Çela and Klinz [27]. Equivalently, the BiQAP can be
stated as:

min_{φ∈S_n} Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{n} Σ_{l=1}^{n} f_ijkl d_{φ(i)φ(j)φ(k)φ(l)},

where S_n denotes the set of all permutations of N = {1, 2, ..., n}. All the
different formulations for the QAP can be extended to the BiQAP, as well
as most of the linearizations that have appeared for the QAP.
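In this permutation form the objective is straightforward to evaluate, albeit at O(n⁴) cost per solution; a Python sketch with f and d stored as four-fold nested lists:

    def biqap_cost(f, d, phi):
        # sum over i, j, k, l of f[i][j][k][l] * d[phi(i)][phi(j)][phi(k)][phi(l)]
        n = len(phi)
        return sum(f[i][j][k][l] * d[phi[i]][phi[j]][phi[k]][phi[l]]
                   for i in range(n) for j in range(n)
                   for k in range(n) for l in range(n))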
Burkard et al. [27] compute lower bounds for the BiQAP derived from
lower bounds for the QAP. The computational results showed that these
bounds are weak and deteriorate as the dimension of the problem increases.
This observation suggests that branch and bound methods will only be ef-
fective on very small instances. For larger instances, efficient heuristics that
find good-quality approximate solutions are needed. Several heuristics for
the BiQAP have been developed by Burkard and Çela [25], in particular de-
terministic improvement methods and variants of simulated annealing and
tabu search algorithms. Computational experiments on test problems of size
up to n = 32 with known optimal solutions (a test problem generator is
presented in [27]) suggest that one version of simulated annealing is best
among those tested. The GRASP heuristic has also been applied to the
BiQAP by Mavridou, Pardalos, Pitsoulis and Resende [132], and produced
the optimal solution for all the test problems generated in [27].

13.3 The Quadratic Semi-Assignment Problem


In the quadratic semi-assignment problem (QSAP) we are given again two
coefficient matrices, a flow matrix F = (f_ij) and a distance matrix D = (d_ij),
but in this case we have n "objects" and m "locations", n > m. We want to
assign all objects to locations, and at least one object to each location, so as
to minimize the overall distance covered by the flow of materials (or people)
moving between different objects. Thus the objective function is the same
as that of the QAP, and the only difference concerns the feasible solutions,
which are not one-to-one mappings (or bijections) between the set of objects
and the set of locations but arbitrary functions mapping the set of objects
to the set of locations. Thus the QSAP can be formulated as follows:


min Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{m} Σ_{l=1}^{m} f_ij d_kl x_ik x_jl + Σ_{i=1}^{n} Σ_{j=1}^{m} b_ij x_ij    (60)

s.t. Σ_{j=1}^{m} x_ij = 1,  i = 1, 2, ..., n,    (61)

     x_ij ∈ {0, 1},  i = 1, 2, ..., n, j = 1, 2, ..., m.    (62)

The QSAP unifies some interesting combinatorial optimization problems like
clustering and m-coloring. In a clustering problem we are given n objects and
a dissimilarity matrix F = (f_ij). The goal is to find a partition of these
objects into m classes so as to minimize the sum of dissimilarities of ob-
jects belonging to the same class. Obviously this problem is a QSAP with
coefficient matrices F and D, where D is an m × m identity matrix. In
the m-coloring problem we are given a graph with n vertices and want to
check whether its vertices can be colored by m different colors such that
each two vertices which are joined by an edge receive different colors. This
problem can be modeled as a QSAP with F equal to the adjacency matrix
of the given graph and D the m × m identity matrix. The m-coloring problem
has answer "yes" if and only if the above QSAP has optimal value equal to
0. Practical applications of the QSAP include distributed computing [169]
and scheduling [45].
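The m-coloring reduction, for example, can be sketched as follows (a coloring is simply a mapping from the n vertices to the m colors; the QSAP value with D the identity matrix counts the monochromatic edges):

    def qsap_value(F, D, assign):
        # assign[i] is the location (class, color) of object i;
        # the linear term of (60) is omitted here for simplicity
        n = len(assign)
        return sum(F[i][j] * D[assign[i]][assign[j]]
                   for i in range(n) for j in range(n))

    def is_proper_coloring(adjacency, m, coloring):
        # D = m x m identity matrix: value 0 <=> no edge is monochromatic
        D = [[1 if k == l else 0 for l in range(m)] for k in range(m)]
        return qsap_value(adjacency, D, coloring) == 0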
The QSAP was originally introduced by Greenberg [85]. As pointed out by
Malucelli [128] this problem is NP-hard. Milis and Magirou [134] propose a
Lagrangean relaxation algorithm for this problem, and show that, similarly
as for the QAP, it is very hard to provide optimal solutions even for
QSAPs of small size. Lower bounds for the QSAP have been provided by
Malucelli and Pretolani [129], and polynomially solvable special cases have
been discussed by Malucelli [128].

13.4 Other Problems Which Can Be Formulated as QAPs


There are a number of other well known combinatorial optimization prob-
lems which can be formulated as QAPs with specific coefficient matrices. Of
course, since the QAP is not a well tractable problem, it does not make sense
to use algorithms developed for the QAP to solve these other problems. All
known solution methods for the QAP are far inferior compared to any of
the specialized algorithms developed for solving these problems. However,

the relationship between the QAP and these problems might be of benefit
for a better understanding of the QAP and its inherent complexity.

Graph Partitioning and Maximum Clique


Two well studied NP-hard combinatorial optimization problems which are
special cases of the QAP are the graph partitioning problem (GPP) and the
maximum clique problem (MCP). In the GPP we are given an (edge-)weighted
graph G = (V, E) with n vertices and a number k which divides n. We
want to partition the set V into k sets of equal cardinality such that the
total weight of the edges cut by the partition is minimized. This problem
can be formulated as a QAP with distance matrix D equal to the weighted
adjacency matrix of G, and flow matrix F obtained by multiplying by −1
the adjacency matrix of the union of k disjoint complete subgraphs with
n/k vertices each. For more information on graph partitioning problems
the reader is referred to Lengauer [120].
In the maximum clique problem we are again given a graph G = (V, E)
with n vertices and wish to find the maximum k ≤ n such that there exists
a subset V_1 ⊆ V which induces a clique in G, i.e., all vertices of V_1 are
connected by edges of G. In this case consider a QAP with distance matrix
D equal to the adjacency matrix of G and flow matrix F given as the adjacency
matrix of a graph consisting of a clique of size k and n − k isolated vertices,
multiplied by −1. A clique of size k in G exists only if the optimal value
of the corresponding QAP is −k². For a review on the maximum clique
problem the reader is referred to [151].

The Traveling Salesman Problem


In the traveling salesman problem (TSP) we are given a set of cities and
the pairwise distances between them, and our task is to find a tour which
visits each city exactly once and has minimal length. Let the set of integers
N = {1, 2, ..., n} represent the n cities and let the symmetric n × n matrix
D = (d_ij) represent the distances between the cities, where d_ij is the distance
between city i and city j (d_ii = 0 for all i = 1, 2, ..., n). The TSP can be
formulated as

min Σ_{i=1}^{n−1} d_{φ(i)φ(i+1)} + d_{φ(n)φ(1)}    (63)

s.t. φ ∈ S_n.
The TSP can be formulated as a QAP with the given distance matrix and a
flow matrix F equal to the adjacency matrix of a cycle on n vertices.
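This construction amounts to a few lines; a Python sketch (the directed cycle is used, which for a symmetric D yields the tour length of (63)):

    def tsp_as_qap(D):
        # flow matrix: adjacency matrix of a cycle on n vertices, so that
        # sum over i, j of F[i][j] * D[perm[i]][perm[j]] is the length of
        # the tour perm[0], perm[1], ..., perm[n-1]
        n = len(D)
        F = [[0] * n for _ in range(n)]
        for i in range(n):
            F[i][(i + 1) % n] = 1
        return F, D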
The traveling salesman problem (TSP) is a notorious NP-hard combina-
torial optimization problem. Among the abounding literature on the TSP we
select the book edited by Lawler, Lenstra, Rinnooy Kan and Shmoys [119]
as a comprehensive reference.

The linear arrangement problem


In the linear arrangement problem we are given a graph G = (V, E) and wish
to place its vertices at the points 1, 2, ..., n on the line so as to minimize the
sum of pairwise distances between vertices of G which are joined by some
edge. If we consider the more general version with weighted graphs, then we
obtain the backboard wiring problem (see Section 1). This is an NP-hard
problem, as mentioned by Garey and Johnson [73]. It can be formulated as
a QAP with distance matrix the (weighted) adjacency matrix of the given
graph, and flow matrix F = (f_ij) given by f_ij = |i − j|, for all i, j.

The minimum weight feedback arc set problem


In the minimum weight feedback arc set problem (FASP) a weighted digraph
G = (V, E) with vertex set V and arc set E is given. The goal is to remove
a set of arcs from E with minimum overall weight, such that all directed
cycles, so-called dicycles, in G are destroyed and an acyclic directed sub-
graph remains. Clearly, the minimum weight feedback arc set problem is
equivalent to the problem of finding an acyclic subgraph of G with maxi-
mum weight. The unweighted version of the FASP, that is, a FASP where
the arc weights of the underlying digraph equal 0 or 1, is called the acyclic
subdigraph problem and is treated extensively by Jünger [99].
An interesting application of the FASP is the so-called triangulation of input-
output tables, which arises along with input-output analysis in economics
used to forecast the development of industries, see Leontief [121]. For de-
tails and a concrete description of the application of triangulation results in
economics the reader is referred to Conrad [54] and Reinelt [153].
Since the vertices of an acyclic subdigraph can be labeled topologically,
i.e. such that in each arc the label of its head is larger than that of its tail,
the FASP can be formulated as a QAP. The distance matrix of the QAP is
the weighted adjacency matrix of G and the flow matrix F = (f_ij) is an upper
triangular matrix, with f_ij = −1 if i ≤ j and f_ij = 0, otherwise.

The FASP is well known to be NP-hard (see Karp [107], Garey and
Johnson [73]).

Packing problems in graphs


Another well known NP-hard problem which can be formulated as a QAP
is the graph packing problem, cf. Bollobás [18]. In a graph packing problem
we are given graphs G_1 = (V_1, E_1), G_2 = (V_2, E_2) with n vertices each and
edge sets E_1 and E_2. A permutation π of {1, 2, ..., n} is called a packing of
G_2 into G_1 if (i, j) ∈ E_1 implies (π(i), π(j)) ∉ E_2, for 1 ≤ i, j ≤ n. In other
words, a packing of G_2 into G_1 is an embedding of the vertices of G_2 into
the vertices of G_1 such that no pair of edges coincide. The graph packing
problem consists of finding a packing of G_2 into G_1, if one exists, or proving
that no packing exists.
The graph packing problem can be formulated as a QAP with distance
matrix equal to the adjacency matrix of G_2 and flow matrix equal to the
adjacency matrix of G_1. A packing of G_2 into G_1 exists if and only if the
optimal value of this QAP is equal to 0. In the positive case the optimal
solution of the QAP determines a packing.

References
[1] E. H. L. Aarts and J. Korst, Simulated Annealing and Boltzmann
Machines: A Stochastic Approach to Combinatorial Optimization and
Neural Computing, Wiley, Chichester, 1989.

[2] E. Aarts and J. K. Lenstra, eds., Local Search in Combinatorial Optimization, Wiley, Chichester, 1997.

[3] W. P. Adams and T. A. Johnson, Improved linear programming-based lower bounds for the quadratic assignment problem, in Quadratic Assignment and Related Problems, P. M. Pardalos and H. Wolkowicz, eds., DIMACS Series on Discrete Mathematics and Theoretical Computer Science 16, 1994, 43-75, AMS, Providence, RI.

[4] W. P. Adams and H. D. Sherali, A tight linearization and an algorithm for zero-one quadratic programming problems, Management Science 32, 1986, 1274-1290.
[5] W. P. Adams and H. D. Sherali, Linearization strategies for a class of zero-one mixed integer programming problems, Operations Research 38, 1990, 217-226.

[6] R. K. Ahuja, J. B. Orlin, and A. Tiwari, A greedy genetic algorithm for the quadratic assignment problem, Working paper 3826-95, Sloan School of Management, MIT, 1995.

[7] S. Arora, A. Frieze, and H. Kaplan, A new rounding procedure for the assignment problem with applications to dense graph arrangement problems, Proceedings of the 37-th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 1996, 21-30.

[8] A. A. Assad and W. Xu, On lower bounds for a class of quadratic 0-1 programs, Operations Research Letters 4, 1985, 175-180.

[9] E. Balas and J. B. Mazzola, Quadratic 0-1 programming by a new linearization, presented at the Joint ORSA/TIMS National Meeting, 1980, Washington D.C.

[10] E. Balas and J. B. Mazzola, Nonlinear programming: I. Linearization techniques, Mathematical Programming 30, 1984, 1-21.

[11] E. Balas and J. B. Mazzola, Nonlinear programming: II. Dominance relations and algorithms, Mathematical Programming 30, 1984, 22-45.

[12] A. I. Barvinok, Computational complexity of orbits in representations of symmetric groups, Advances in Soviet Mathematics 9, 1992, 161-182.

[13] R. Battiti and G. Tecchiolli, The reactive tabu search, ORSA Journal
on Computing 6, 1994, 126-140.

[14] M. S. Bazaraa and O. Kirca, Branch and bound based heuristics for
solving the quadratic assignment problem, Naval Research Logistics
Quarterly 30, 1983, 287-304.

[15] M. S. Bazaraa and H. D. Sherali, Benders' partitioning scheme applied to a new formulation of the quadratic assignment problem, Naval Research Logistics Quarterly 27, 1980, 29-41.
[16] M. S. Bazaraa and H. D. Sherali, On the use of exact and heuristic cutting plane methods for the quadratic assignment problem, Journal of the Operational Research Society 33, 1982, 991-1003.

[17] G. Birkhoff, Tres observaciones sobre el algebra lineal, Univ. Nac. Tucuman Rev., Ser. A, 1946, 147-151.

[18] B. Bollobas, Extremal Graph Theory, Academic Press, London, 1978.

[19] A. Bruengger, J. Clausen, A. Marzetta, and M. Perregaard, Joining forces in solving large-scale quadratic assignment problems in parallel, in Proceedings of the 11-th IEEE International Parallel Processing Symposium (IPPS), 1997, 418-427.

[20] E. S. Buffa, G. C. Armour, and T. E. Vollmann, Allocating facilities with CRAFT, Harvard Business Review 42, 1962, 136-158.

[21] R. E. Burkard, Die Störungsmethode zur Lösung quadratischer Zuordnungsprobleme, Operations Research Verfahren 16, 1973, 84-108.

[22] R. E. Burkard, Quadratische Bottleneckprobleme, Operations Research Verfahren 18, 1974, 26-41.

[23] R. E. Burkard, Locations with spatial interactions: the quadratic assignment problem, in Discrete Location Theory, P. B. Mirchandani and R. L. Francis, eds., Wiley, 1991.

[24] R. E. Burkard and T. Bönniger, A heuristic for quadratic boolean programs with applications to quadratic assignment problems, European Journal of Operational Research 13, 1983, 374-386.

[25] R. E. Burkard and E. Çela, Heuristics for biquadratic assignment problems and their computational comparison, European Journal of Operational Research 83, 1995, 283-300.

[26] R. E. Burkard, E. Çela, V. M. Demidenko, N. N. Metelski, and G. J. Woeginger, Perspectives of Easy and Hard Cases of the Quadratic Assignment Problems, SFB Report 104, Institute of Mathematics, Technical University Graz, Austria, 1997.

[27] R. E. Burkard, E. Çela, and B. Klinz, On the biquadratic assignment problem, in Quadratic Assignment and Related Problems, P. M. Pardalos and H. Wolkowicz, eds., DIMACS Series on Discrete Mathematics and Theoretical Computer Science 16, 1994, 117-146, AMS, Providence, RI.

[28] R. E. Burkard, E. Çela, G. Rote, and G. J. Woeginger, The quadratic assignment problem with an Anti-Monge and a Toeplitz matrix: Easy and hard cases, SFB Report 34, Institute of Mathematics, Technical University Graz, Austria, 1995. To appear in Mathematical Programming.

[29] R. E. Burkard and U. Derigs, Assignment and matching problems: Solution methods with Fortran programs, Lecture Notes in Economics and Mathematical Systems 184, Springer-Verlag, Berlin, 1980.

[30] R. E. Burkard and U. Fincke, On random quadratic bottleneck assignment problems, Mathematical Programming 23, 1982, 227-232.

[31] R. E. Burkard and U. Fincke, The asymptotic probabilistic behavior of the quadratic sum assignment problem, Zeitschrift für Operations Research 27, 1983, 73-81.

[32] R. E. Burkard and U. Fincke, Probabilistic asymptotic properties of some combinatorial optimization problems, Discrete Applied Mathematics 12, 1985, 21-29.

[33] R. E. Burkard, W. Hahn and U. Zimmermann, An algebraic approach to assignment problems, Mathematical Programming 12, 1977, 318-327.

[34] R. E. Burkard, S. E. Karisch, and F. Rendl, QAPLIB - a quadratic assignment problem library, Journal of Global Optimization 10, 1997, 391-403. An on-line version is available via World Wide Web at the following URL: http://www.opt.math.tu-graz.ac.at/~karisch/qaplib/

[35] R. E. Burkard, B. Klinz, and R. Rudolf, Perspectives of Monge properties in optimization, Discrete Applied Mathematics 70, 1996, 95-161.

[36] R. E. Burkard and J. Offermann, Entwurf von Schreibmaschinentastaturen mittels quadratischer Zuordnungsprobleme, Zeitschrift für Operations Research 21, 1977, B121-B132 (in German).
The Quadratic Assignment Problem 325

[37] R. E. Burkard and F. Rendl, A thermodynamically motivated simu-


lation procedure for combinatorial optimization problems, European
Journal Operational Research 17,1984,169-174.
[38] R. E. Burkard and U. Zimmermann, Combinatorial optimization in
linearly ordered semimodules: a survey, in Modern Applied Mathe-
matics, B. Korte, ed., North Holland, Amsterdam, 1982, 392-436.
[39] P. Carraresi and F. Malucelli, A new lower bound for the quadratic
assignment problem, Operations Research 40, 1992, Suppl. No.1,
S22-S27.
[40] P. Carraresi and F. Malucelli, A reformulation scheme and new lower
bounds for the QAP, in Quadratic Assignment and Related Problems,
P. Pardalos and H. Wolkowicz, eds., DIMACS Series in Discrete Math-
ematics and Theoretical Computer Science 16, 1994, 147-160, AMS,
Providence, Rl.
[41] E. Qela, The Quadratic Assignment Problem: Theory and Algorithms,
Kluwer Academic Publishers, DOl'drecht, The Netherlands, 1998.
[42] V. Cerny, Thermodynamical approach to the traveling salesman prob-
lem: An efficient simulation algorithm, Journal of Optimization The-
ory and Applications 45, 1985, 41-51.
[43] J. Chakrapani and J. Skorin-Kapov, Massively parallel tabu search for
the quadratic assignment problem, Annals of Operations Research 41,
1993, 327-342.
[44] J. Chakrapani and J. Skorin-Kapov, A constructive method to improve
lower bounds for the quadratic assignment problem, in Quadratic As-
signment and Related Problems, P. Pardalos and H. Wolkowicz, eds.,
DIMACS Series in Discrete Mathematics and Theoretical Computer
Science 16, 1994, 161-171, AMS, Providence, Rl.
[45] P. Chretienne, A polynomial algorithm to optimally schedule tasks
on a virtual distributed system under tree-like precedence constraints,
European Journal of Operational Research 43, 1989, 225-230.
[46] N. Christofides, Worst case analysis of a new heuristic for the travel-
ing salesman problem, Technical Report 338, Graduate School of In-
dustrial Administration, Carnegie-Mellon University, Pittsburgh, PA,
1976.
326 R.E. Burkard, E. Qela, P.M. Pardalos, and L.S. Pitsoulis

[47] N. Christofides and E. Benavent, An exact algorithm for the quadratic


assignment problem, Operations Research 37, 1989, 760-768.

[48] N. Christofides and M. Gerrard, A graph theoretic analysis of bounds


for the quadratic assignment problem, in Studies on Graphs and Dis-
crete Programming, P.Hansen, ed., North Holland, 1981, pp. 61-68.

[49] J. Clausen, S. E. Karisch, M. Perregaard, and F. Rendl, On the ap-


plicability of lower bounds for solving rectilinear quadratic assign-
ment problems in parallel, Computational Optimization and Appli-
cations 10, 1998, 127-147.

[50] J. Clausen and M. Perregaard, Solving large quadratic assignment


problems in parallel, Computational Optimization and Applications 8,
1997, 111-127.

[51] A. Colorni, M. Dorigo, and V. Maniezzo, The ant system: optimization


by a colony of cooperating agents, IEEE Transactions on Systems,
Man, and Cybernetics -Part B 26, 1996, 29-41.

[52] A. Colorni and V. Maniezzo, The ant system applied to the quadratic
assignment problem, to appear in IEEE Transactions on Knowledge
and Data Engineering, 1998.
[53] D. T. Connolly, An improved annealing scheme for the QAP, European
Journal of Operational Research 46, 1990, 93-100.
[54] K. Conrad, Das Quadratische Zuweisungsproblem und zwei seiner
Spezia1£aJle, Mohr-Siebeck, Tiibingen, 1971.
[55] D. Cyganski, R. F. Vaz, and V. G. Virball, Quadratic assignment
problems with the Palubeckis' algorithm are degenerate, IEEE Trans-
actions on Circuits and Systems-I 41, 1994, 481-484.

[56] L. Davis, Genetic Algorithms and Simulated Annealing, Pitman, Lon-


don, 1987.

[57] V. G. Delneko and G. J. Woeginger, A solvable case of the quadratic


assignment problem, SFB Report 88, Institute of Mathematics, Tech-
nical University Graz, Austria, 1996.
[58] J. W. Dickey and J. W. Hopkins, Campus building arrangement using
TOPAZ, Transportation Research 6,1972,59-68.
The Quadratic Assignment Problem 327

[59] M. Dorigo, Optimization, Learning, and Natural algorithms, Ph.D.


Thesis, Dipartimento die Elettronica e Informazione, Politecnico di
Milano, Milano, Italy, 1992, (in Italian).

[60] M. E. Dyer, A. M. Frieze, and C. J. H. McDiarmid, On linear programs


with random costs, Mathematical Programming 35, 1986, 3-16.

[61] C. S. Edwards, The derivation of a greedy approximator for the


Koopmans-Beckmann quadratic assignment problem, Proceedings of
the 77-th Combinatorial Programming Conference (CP77), 1977, 55-
86.

[62] C. S. Edwards, A branch and bound algorithm for the Koopmans-


Beckmann quadratic assignment problem, Mathematical Program-
ming Study 13, 1980, 35-52.

[63] A. N. Elshafei, Hospital layout as a quadratic assignment problem,


Operations Research Quarterly 28, 1977, 167-179.

[64] T. Espersen, S. E. Karisch, E. Qela, and J. Clausen, QAPPACK- a JAVA


package for solving quadratic assignment problems, working paper,
Department of Mathematical Modelling, Technical University of Den-
mark, Denmark, and Institute of Mathematics, Technical University
Graz, Austria.

[65] T. A. Feo, M. G. C. Resende, and S. H. Smith, A greedy randomized


adaptive search procedure for the maximum independent set, Techni-
cal report, AT&T Bell Laboratories, Murray Hill, NJ, 1989. To appear
in Operations Research.

[66] T. A. Feo and M. G. C. Resende, Greedy randomized adaptive search


procedures, Journal of Global Optimization 6, 1995, 109-133.

[67] G. Finke, R. E. Burkard, and F. Rendl, Quadratic assignment prob-


lems, Annals of Discrete Mathematics 31, 1987, 61-82.

[68] C. Fleurent and J. Ferland, Genetic hybrids for the quadratic assign-
ment problem, in Quadratic Assignment and Related Problems, P.
Pardalos and H. Wolkowicz, eds., DIMACS Series in Discrete Math-
ematics and Theoretical Computer Science 16, 1994, 173-187, AMS,
Providence, RI.
328 R.E. Burkard, E. vela, P.M. Pardalos, and L.S. Pitsoulis

[69] J. B. G. Frenk, M. van Houweninge, and A. H. G. llinnooy Kan,


Asymptotic properties of the quadratic assignment problem, Mathe-
matics of Operations Research 10, 1985, 100-116.

[70] A. M. Frieze and J. Yadegar, On the quadratic assignment problem,


Discrete Applied Mathematics 5, 1983, 89-98.

[71] A. M. Frieze, J. Yadegar, S. EI-Horbaty, and D. Parkinson, Algo-


rithms for assignment problems on an array processor, Parallel Com-
puting 11, 1989, 151-162.

[72] L. M. Gambardella, E. D. Taillard, and M. Dorigo, Ant colonies for


the QAP, Technical Report IDSIA-4-97, 1997, Istituto dalle Molle Di
Studi sull' Intelligenza Artificiale, Lugano, Switzerland.

[73] M. R. Garey and D. S. Johnson, Computers and intractability: A guide


to the theory of NP-completeness, W. H. Freeman and Company, New
York, 1979.

[74] J. W. Gavett and N. V. Plyter, The optimal assignment offacilities to


locations by branch and bound, Operations Research 14, 1966, 210-
232.

[75] A. M. Geoffrion, Lagrangean relaxation and its uses in integer pro-


gramming, Mathematical Programming Study 2,1974,82-114.

[76] A. M. Geoffrion and G. W. Graves, Scheduling parallel production


lines with changeover costs: Practical applications of a quadratic as-
signment/LP approach. Operations Research 24, 1976, 595-610.

[77] P. C. Gilmore, Optimal and suboptimal algorithms for the quadratic


assignment problem, SIAM Journal on Applied Mathematics 10, 1962,
305-313.

[78] F. Glover, Improved linear integer programming formulations of non-


linear integer problems, Management Science 22, 1975, 455-460.

[79] F. Glover, Tabu search - Part I, ORSA Journal on Computing 1, 1989,


190-206.

[80] F. Glover, Tabu search - Part II, ORSA Journal on Computing 2,


1989,4-32.
Tbe Quadratic Assignment Problem 329

[81] F. Glover, M. Laguna, E. Taillard, and D. de Werra, eds., Tabu search,


Annals of Operations Research 41, 1993.

[82] M. X. Goemans and D. P. Williamson, Improved approximation algo-


rithms for maximum cut and satisfiability problems using semidefinite
programming, Journal oftbe ACM 42,1995, 1115-1145.

[83] D. E. Goldberg, Genetic Algoritbms in Search, Optimization and Ma-


chine Learning, Addison-Wesley, Wokingham, England, 1989.

[84] A. Graham, Kronecker Products and Matrix Calculus witb Applica-


tions, Halsted Press, Toronto, 1981.

[85] H. Greenberg, A quadratic assignment problem without column con-


straints, Naval Research Logistic Quarterly 16, 1969, 417-422.

[86] S. W. Hadley, Continuous Optimization Approaches for tbe Quadratic


Assignment Problem, PhD thesis, University of Waterloo, Ontario,
Canada, 1989.

[87] S. W. Hadley, F. Rendl, and H. Wolkowicz, Bounds for the quadratic


assignment problem using continuous optimization techniques, Pro-
ceedings of tbe I-st Integer Programming and Combinatorial Opti-
mization Conference (IPCO), University of Waterloo Press, 1990, 237-
248.

[88] S. W. Hadley, F. Rendl, and H. Wolkowicz, A new lower bound via


projection for the quadratic assignment problem, Mathematics of Op-
erations Research 17, 1992, 727-739.

[89] S. W. Hadley, F. Rendl, and H. Wolkowicz, Nonsymmetric quadratic


assignment problems and the Hoffman-Wielandt inequality, Linear Al-
gebra and its Applications 58, 1992, 109-124.

[90] P. Hahn and T. Grant, Lower bounds for the quadratic assignment
problem based upon a dual formulation, to appear in Operations Re-
search.

[91] P. Hahn, T. Grant, and N. Hall, Solution of the quadratic assignment


problem using the Hungarian method, to appear in European Journal
of Operational Research.
330 R.E. Burkard, E. Qela, P.M. Pardalos, and L.S. Pitsoulis

[92] G. G. Hardy, J. E. Littlewood, and G. P6lya, Inequalities, Cambridge


University Press, London and New York, 1952.

[93] D. R. Heffley, Assigning runners to a relay team, in Optimal Strate-


gies in Sports, S. P. Ladany and R. E. Machol, eds., North-Holland,
Amsterdam, 1977, 169-171.

[94] C. H. Heider, A computationally simplified pair exchange algorithm for


the quadratic assignment problem, Paper No. 101, Center for Naval
Analysis, Arlington, Virginia, 1972.

[95] J. H. Holland, Adaptation in Natural and Artificial Systems, Univer-


sity of Michigan Press, Ann Arbor, 1975.

[96] B. Jansen. A note on lower bounds for the QAP, Technical report,
Mathematics and Computer Science, Delft University of Technology,
The Netherlands, December 1993.

[97] D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis, How easy


is local search, Journal of Computer and System Sciences 37, 1988,
79-100.

[98] T. A. Johnson, New linear programming-based solution procedures for


the quadratic assignment problem, Ph.D. Thesis, Clemson University,
SC,1992.

[99] M. Junger, Polyhedral Combinatorics and the Acyclic Subdigraph


Problem, Heldermann Verlag, Berlin, Germany, 1985.

[100] M. Junger and V. Kaibel, A basic study of the QAP polytope, Techni-
cal Report 96.215, Institut fiir Informatik, Universitat zu Kaln, Ger-
many, 1996.

[101] M. Junger and V. Kaibel, On the SQAP polytope, Technical Report


96.241, Institut fiir Informatik, Universitat zu Kaln, Germany, 1996.

[102] V. Kaibel, Polyhedral Combinatorics of the Quadratic Assignment


Problem, Ph.D. Thesis, Universitat zu Kaln, Germany, 1997.

[103] S. E. Karisch, Nonlinear Approaches for Quadratic Assignment and


Graph Partition Problems, Ph.D. Thesis, Technical University Graz,
Austria, 1995.
The Quadratic Assignment Problem 331

[104] S. E. Karisch, E. Qela, J. Clausen, and T. Espersen, A dual frame-


work for lower bounds of the quadratic assignment problem based on
linearization, SFB Report 120, Institute of Mathematics, Technical
University Graz, Austria, 1997.

[105] S. E. Karisch and F. Rendl, Lower bounds for the quadratic assignment
problem via triangle decompositions, Mathematical Programming 71,
1995, 137-151.

[106] S. E. Karisch, F. Rendl, and H. Wolkowicz, Trust regions and relax-


ations for the quadratic assignment problem, in Quadratic Assignment
and Related Problems, P. Pardalos and H. Wolkowicz, eds., DIMACS
Series in Discrete Mathematics and Theoretical Computer Science 16,
1994, 199-220, AMS, Providence, RI.

[107] R. M. Karp, Reducibility among combinatorial problems, in Complex-


ity of Computer Computations, R. E. Miller and J. W. Thatcher, eds.,
Plenum, New York, 1972, 85-103.

[108] L. Kaufmann and F.. Broeckx, An algorithm for the quadratic as-
signment problem using Benders' decomposition, European Journal
of Operational Research 2, 1978, 204-211.

[109] B. Kernighan and S. Lin, An efficient heuristic procedure for parti-


tioning graphs, Bell Systems Journal 49, 1972, 291-307.

[UO] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by sim-


ulated annealing, Science 220, 1983, 671-680.

[111] J. G. Klincewicz, Avoiding local optima in the p-hub location problem


using tabu search and GRASP, Annals of Operations 40,1992,283-302.

[112] J. G. Klincewicz and A. Raj an, Using GRASP to solve the compo-
nent grouping problem, Technical report, AT&T Bell Laboratories,
Holmdel, NJ, 1992.

[113] T. C. Koopmans and M. J. Beckmann, Assignment problems and the


location of economic activities, Econometrica 25, 1957, 53-76.

[114] J. Krarup and P. M. Pruzan, Computer-aided layout design, Mathe-


matical Programming Study 9, 1978, 75-94.
332 R.E. Burkard, E. Qela, P.M. Pardalos, and L.S. Pitsoulis

[115] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing:


Theory and Applications, D. Reidel Publishing Company, Dordrecht,
1988.

[116] A. M. Land, A problem of assignment with interrelated costs, Opera-


tions Research Quarterly 14, 1963, 185-198.

[117] G. Laporte and H. Mercure, Balancing hydraulic turbine runners: A


quadratic assignment problem, European Journal of Operational Re-
search 35, 1988, 378-382.

[118] E. L. Lawler, The quadratic assignment problem, Management Sci-


ence 9, 1963, 586-599.

[119] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys,


eds., The Traveling Salesman Problem, Wiley, Chichester, 1985.

[120] T. Lengauer, Combinatorial Algorithms for Intergrated Circuit Lay-


out, Wiley, Chichester, 1990.

[121] W. Leontief, Input-Output Economics, Oxford University Press, New


York, 1966.

[122] Y. Li and P. M. Pardalos, Generating quadratic assignment test prob-


lems with known optimal permutations, Computational Optimization
and Applications 1, 1992, 163-184.

[123] Y. Li, P. M. Pardalos, K. G. Ramakrishnan, and M. G. C. Resende,


Lower bounds for the quadratic assignment problem, Annals of Oper-
ations Research 50, 1994, 387-410.

[124] Y. Li, P. M. Pardalos, and M. G. C. Resende, A greedy random-


ized adaptive search procedure for the quadratic assignment problem,
in Quadratic Assignment and Related Problems, P. Pardalos and H.
Wolkowicz, eds., DIMACS Series in Discrete Mathematics and Theo-
retical Computer Science 16, 1994, 237-261, AMS, Providence, RI.

[125] L. Lovasz and A. Schrijver, Cones of matrices and set functions and
0-1 optimization, SIAM Journal on Optimization 1, 1991, 166-190.

[126] E. J. McCormick, Human Factors Engineering, McGraw-Hill, New


York, 1970.
The Quadratic Assignment Problem 333

[127] T. Magnanti, R. Ahuja, and J. Orlin. Network flows: theory, algo-


rithms, and applications, Prentice Hall, Englewood-Cliffs, New Jersey,
1993.

[128] F. Malucelli, Quadratic Assignment Problems: Solution Methods and


Applications, Ph.D. Thesis, Dipartimento di Informatica, Universita
di Pisa, Italy, 1993.

[129] F. Malucelli and D. Pretolani, Lower bounds for the quadratic semi-
assignment problem, Technical Report 955, Centre des Recherches sur
les Transports, Universite de Montreal, Canada, 1993.

[130] A. Marzetta, Dynamic programming for the quadratic assignment


problem, presented at the 2-nd Aussois Workshop on Combinatorial
Optimization, 1998, Aussois, France.

[131] T. Mautor and C. Roucairol, A new exact algorithm for the solution
of quadratic assignment problems, Discrete Applied Mathematics 55,
1992, 281-293.

[132] T. Mavridou, P. M. Pardalos, L. S. Pitsoulis, and M. G. C. Resende, A


GRASP for the biquadratic assignment problem, European Journal
of Operations Research 105, 1998, 613-62l.

[133] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller,


Equations of state calculations by fast computing machines, Journal
of Chemical Physics 21, 1953, 1087-1092.
[134] I. Z. Milis and V. F. Magirou, A Lagrangean relaxation algorithm
for sparse quadratic assignment problems, Operations Research Let-
ters 17, 1995, 69-76.

[135] P. B. Mirchandani and T. Obata, Locational decisions with interac-


tions between facilities: the quadratic assignment problem a review,
Working Paper Ps-79-1, Rensselaer Polytechnic Institute, Troy, New
York, May 1979.

[136] L. Mirsky, The spread of a matrix, Mathematika 3, 1956, 127-130.

[137] J. Mosevich, Balancing hydraulic turbine runners - a discrete combi-


natorial optimization problem, European Journal of Operational Re-
search 26, 1986, 202-204.
334 R.E. Burkard, E. Qela, P.M. Pardalos, and L.S. Pitsoulis

[138] K. A. Murthy, P. Pardalos, and Y. Li, A local search algorithm for


the quadratic assignment problem, Informatica 3, 1992, 524-538.

[139] K. G. Murty, An algorithm for ranking all the assignments in order of


increasing cost, Operations Research 16, 1968, 682-287.

[140] H. Miiller-Merbach, Optimale Reihenfolgen, Springer-Verlag, Berlin,


Heidelberg, New York, 1970, pp. 158-171.
[141] C. E. Nugent, T. E. Vollmann, and J. Ruml, An experimental compar-
ison of techniques for the assignment of facilities to locations, Journal
of Operations Research 16, 1969, 150-173.

[142] M. W. Padberg and M. P. Rijal, Location, Scheduling, Design and


Integer Programming, Kluwer Academic Publishers, Boston, 1996.
[143] M. W. Padberg and G. Rinaldi, Optimization of a 532-city symmetric
traveling salesman problem by a branch and cut algorithm, Operations
Research Letters 6, 1987, 1-7.
[144] G. S. Palubeckis, Generation of quadratic assignment test problems
with known optimal solutions, U.S.S.R. Comput. Maths. Math. Phys.
28, 1988, 97-98, (in Russian).
[145] C. H. Papadimitriou and D. Wolfe, The complexity of facets resolved,
Proceedings of the 25-th Annual IEEE Symposium on the Foundations
of Computer Science (FOCS), 1985, 74-78.

[146] P. Pardalos and J. Crouse, A parallel algorithm for the quadratic


assignment problem, Proceedings of the Supercomputing Conference
1989, ACM Press, 1989, 351-360.
[147] P. Pardalos, F. Rendl, and H. Wolkowicz, The quadratic assignment
problem: A survey and recent developments, in Quadratic Assignment
and Related Problems, P. Pardalos and H. Wolkowicz, eds., DIMACS
Series in Discrete Mathematics and Theoretical Computer Science 16,
1994, 1-42, AMS, Providence, RI.

[148] P. M. Pardalos, L. S. Pitsoulis, and M. G. C. Resende, A parallel


G RASP implementation for solving the quadratic assignment prob-
lem, in Parallel Algorithms for Irregular Problems: State of the Art,
A. Ferreira and Jose D.P. Rolim, eds., Kluwer Academic Publishers,
1995, 115-133.
The Quadratic Assignment Problem 335

[149] P. M. Pardalos, L. S. Pitsoulis, and M. G. C. Resende, Fortran sub-


routines for approximate solution of sparse quadratic assignment prob-
lems using GRASP, ACM '.franscations on Mathematical Software 23,
1997, 196-208.

[150] P. M. Pardalos, K. G. Ramakrishnan, M. G. C. Resende, and Y. Li,


Implementation of a variable reduction based lower bound in a branch
and bound algorithm for the quadratic assignment problem, SIAM
Journal on Optimization 7, 1997, 280-294.

[151] P. M. Pardalos and J. Xue, The maximum clique problem, Research


Report 93-1, Department of Industrial and System Engineering, Uni-
versity of Florida, FI, 1993.

[152] M. Queyranne, Performance ratio of heuristics for triangle inequality


quadratic assignment problems, Operations Research Letters 4, 1986,
231-234.

[153] G. Reinelt, The Linear Ordering Problem: Algorithms and Applica-


tions, Heldermann Verlag, Berlin, Germany, 1985.

[154] F. Rendl, Ranking scalar products to improve bounds for the quadratic
assignment problem, European Journal of Operations Research 20,
1985, 363-372.

[155] F. Rendl and H. Wolkowicz, Applications of parametric programming


and eigenvalue maximization to the quadratic assignment problem,
Mathematical Programming 53,1992,63-78.

[156] M. G. C. Resende, P. M. Pardalos, and Y. Li, Fortran subroutines for


approximate solution of dense quadratic assignment problems using
GRASP, ACM '.franscations on Mathematical Software 22, 1996, 104-
118.

[157] M. G. C. Resende, L. S. Pitsoulis, and P. M. Pardalos, Approximate


solution of weighted max-sat problems using GRASP, in The Satisn-
ability Problem, P. M. Pardalos, M. G. C. Resende and D. Z. Du, eds.,
DIMACS Series in Discrete Mathematics and Theoretical Computer
Science 35, 1997, 393-405, AMS, Providence, RI.

[158] M. G. C. Resende, K. G. Ramakrishnan, and Z. Drezner, Comput-


ing lower bounds for the quadratic assignment problem with an inte-
336 R.E. Burkard, E. Qela, P.M. Pardalos, and L.S. Pitsoulis

rior point algorithm for linear programming, Operations Research 43,


1995, 781-791.

[159] W. T. Rhee, A note on asymptotic properties of the quadratic assign-


ment problem, Operations Research Letters 7, 1988, 197-200.

[160] W. T. Rhee, Stochastic analysis of the quadratic assignment problem,


Mathematics of Operations Research 16, 1991, 223-239.

[161] M. P. Rijal, Scheduling, Design and Assignment Problems with


Quadratic Costs, Ph.D. Thesis, New York University, NY, 1995.

[162] C. Roucairol, A reduction method for quadratic assignment problems,


Operations Research Verfahren 32, 1979, 183-187.

[163] C. Roucairol, A parallel branch and bound algorithm for the quadratic
assignment problem, Discrete Applied Mathematics 18, 1987, 221-
225.

[164] S. Sahni and T. Gonzalez, P-complete approximation problems, Jour-


nal of the Association of Computing Machinery 23, 1976, 555-565.

[165] A. Schaffer and M. Yannakakis, Simple local search problems that are
hard to solve, SIAM Journal on Computing 20, 1991, 56-87.

[166] J. Skorin-Kapov, Tabu search applied to the quadratic assignment


problem, ORSA Journal on Computing 2, 1990, 33-45.

[167] J. Skorin-Kapov, Extensions of tabu search adaptation to the


quadratic assignment problem, to appear in Computers and Opera-
tions Research.

[168] L. Steinberg, The backboard wiring problem: A placement algorithm,


SIAM Review 3, 1961, 37-50.

[169] H. S. Stone, Multiprocessor scheduling with the aid of network How


algorithms, IEEE Transactions on Software Engineering 3, 1977, 85-
93.

[170] W. Szpankowski, Combinatorial optimization problems for which al-


most every algorithm is asymptotically optimal!, Optimization 33,
1995, 359-367.
The Quadratic Assignment Problem 337

[171] E. Taillard, Robust tabu search for the quadratic assignment problem,
Parallel Computing 17, 1991, 443-455.
[172] D. M. Tate and A. E. Smith, A genetic approach to the quadratic
assignment problem, Computers and Operations Research 22, 1995,
73-83.

[173] I. Ugi, J. Bauer, J. Friedrich, J. Gasteiger, C. Jochum, and W. Schu-


bert, Neue Anwendungsgebiete fiir Computer in der Chemie, Ange-
wandte Chemie 91, 1979, 99-111.
[174] D. H. West, Algorithm 608: Approximate solution of the quadratic
assignment problem, ACM Transactions on Mathematical Software 9,
1983, 461-466.

[175] M. R. Wilhelm and T. L. Ward, Solving quadratic assignment prob-


lems by simulated annealing, IEEE Transactions 19, 1987, 107-119.

[176] Q. Zhao, Semidefinite Programming for Assignment and Partitioning


Problems, Ph.D. Thesis, University of Waterloo, Ontario, Canada,
1996.

[177] Q. Zhao, S. E. Karisch, F. Rendl, and H. Wolkowicz, Semidefinite


relaxations for the quadratic assignment problem, Technical Report
DIKU TR-96-32, Department of Computer Science, University of
Copenhagen, Denmark, 1996. To appear in Journal of Combinatorial
Optimization.
339

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 339-405
©1998 Kluwer Academic Publishers

Algorithmic Aspects of Domination in Graphs


Gerard J. Chang
Department of Applied Mathematics
National Chiao Tung University
Hsinchu 30050, Taiwan

Contents

1 Introduction
2 Definitions and notation
  2.1 Graph terminology
  2.2 Variation of domination
  2.3 Special classes of graphs
3 Trees
  3.1 Basic properties of trees
  3.2 Labeling algorithm for trees
  3.3 Dynamic programming for trees
  3.4 Primal-dual approach for trees
  3.5 Tree related classes
4 Interval graphs
  4.1 Interval orderings of interval graphs
  4.2 Domatic numbers of interval graphs
  4.3 Weighted independent domination in interval graphs
  4.4 Weighted domination in interval graphs
  4.5 Weighted total domination in interval graphs
  4.6 Weighted connected domination in interval graphs
5 Chordal graphs and NP-complete results
  5.1 Perfect elimination orderings of chordal graphs
  5.2 NP-completeness for domination
  5.3 Independent domination in chordal graphs
6 Strongly chordal graphs
  6.1 Strong elimination orderings of strongly chordal graphs
  6.2 Weighted domination in strongly chordal graphs
  6.3 Domatic partition in strongly chordal graphs
7 Permutation graphs
8 Cocomparability graphs
9 Distance-hereditary graphs
  9.1 Hangings of distance-hereditary graphs
  9.2 Weighted connected domination
References

Abstract

Graph theory was founded by Euler [78] in 1736 with his solution of
the famous problem of the Königsberg bridges. From 1736 to 1936,
the same concept as a graph, but under different names, was used in
various scientific fields as a model of real world problems; see the
historical book by Biggs, Lloyd and Wilson [19]. This chapter intends
to survey the domination problem in graph theory, which is a natural
model for many location problems in operations research, from an
algorithmic point of view.

1 Introduction
Domination in graph theory is a natural model for many location problems
in operations research. As an example, consider the following fire station
problem. Suppose a county has decided to build some fire stations, which
must serve all of the towns in the county. The fire stations are to be located
in some towns so that every town either has a fire station or is a neighbor of
a town which has a fire station. To save money, the county wants to build
the minimum number of fire stations satisfying the above requirements.
Domination has many other applications in the real world. The recent
book by Haynes, Hedetniemi and Slater [100] illustrates many interesting
examples, including dominating queens, sets of representatives, school bus
routing, computer communication networks, (r, d)-configurations, radio sta-
tions, social network theory, land surveying, kernels of games, etc.
Among them, the classical problems of covering chessboards by the min-
imum number of chess pieces were important in stimulating the study of
domination, which commenced in the early 1970's. These problems certainly
date back to De Jaenisch [69] and have been mentioned in the literature fre-
quently since that time.
A simple example is to determine the minimum number of kings domi-
nating the entire chessboard. The answer for an m x n chessboard is
⌈m/3⌉⌈n/3⌉; for instance, the 3 x 5 board of Figure 1 needs ⌈3/3⌉⌈5/3⌉ = 2
kings. In the Chinese chess game, a king only dominates the four neighbor
cells which have a common side with the cell where the king lies. In this
case, the Chinese king domination problem for an m x n chessboard is
harder. Figure 1 shows optimal solutions to both cases for a 3 x 5 board.

(a) chessboard (b) Chinese chessboard
Figure 1: King domination on a 3 x 5 chessboard.

We can abstract the above problems into the concept of domination in
terms of graphs as follows. A dominating set of a graph G = (V, E) is a
subset D of V such that every vertex not in D is adjacent to at least one
vertex in D. The domination number γ(G) of a graph G is the minimum
size of a dominating set of G.
For the fire station problem, we consider the graph G having all towns of
the county as its vertices, with a town adjacent to its neighbor towns. The
fire station problem is just the domination problem, as γ(G) is the minimum
number of fire stations needed.
For the king domination problem on an m x n chessboard, we consider
the king's graph G, whose vertices correspond to the mn squares in the
chessboard and two vertices are adjacent if and only if their corresponding
squares have a common point or side. For the Chinese king domination
problem, the vertex set is the same but two vertices are adjacent if and
only if their corresponding squares have a common side. Figure 2 shows
the corresponding graphs for the king and the Chinese king domination
problems on a 3 x 5 chessboard. γ(G) is then the minimum number of kings
needed. Black vertices in the graph form a minimum dominating set.

(a) for chess (b) for Chinese chess


Figure 2: King's graphs for the chess and the Chinese chess.

Although many theoretic results for the domination problem had been
established for a long time, the first algorithmic result on this topic was
given by Cockayne, Goodman and Hedetniemi [47] in 1975. They gave a
linear-time algorithm for the domination problem in trees by using a labeling
method. On the other hand, at about the same time Garey and Johnson (see
[86]) constructed the first (unpublished) proof that the domination problem
is NP-complete for general graphs. Since then, many algorithmic results
have been obtained for variants of the domination problem in different
classes of graphs. The purpose of this chapter is to survey these results.
This chapter is organized as follows. Section 2 gives basic definitions and
notation. In particular, the classes of graphs surveyed in this chapter are
introduced. Among them, trees and interval graphs are two basic classes in
the study of domination. While a tree can be viewed as many paths starting
from a center with different branches, an interval graph is a 'thick path' in
the sense that a vertex of a path is replaced by a group of vertices tied
together. Section 3 investigates different approaches for domination in trees,
including the labeling method, the dynamic programming method, the
primal-dual approach and others. These techniques are used not only for
trees, but also for many other classes of graphs in the study of domination
as well as many other optimization problems. Section 4 is for domination
in interval graphs. It is in general not clear how far we can extend the
results in trees and interval graphs to other classes of graphs. For some
cases the domination problem becomes NP-complete, while some remain
solvable. Section 5 surveys NP-completeness results, for which chordal
graphs play an important role. The remaining sections are for classes of
graphs in which the domination problem is solvable, including strongly
chordal graphs, permutation graphs, cocomparability graphs and
distance-hereditary graphs.

2 Definitions and notation


2.1 Graph terminology
A graph is an ordered pair G = (V, E) consisting of a finite nonempty set V
of vertices and a set E of 2-subsets of V, whose elements are called edges.
A graph is trivial if it contains only one vertex. For any edge e = {u, v}, we
say that vertices u and v are adjacent, and that vertex u (respectively, v)
and edge e are incident. Two distinct edges are adjacent if they contain a
common vertex. It is convenient to henceforth denote an edge by uv rather
than {u, v}. Note that uv and vu represent the same edge in a graph.
Two graphs G = (V, E) and H = (U, F) are isomorphic if there exists a
bijection f from V to U such that uv ∈ E if and only if f(u)f(v) ∈ F. Two
isomorphic graphs are essentially the same, as one can be obtained from the
other by renaming vertices.
It is often useful to express a graph G diagrammatically. To do this,
each vertex is represented by a point (or a small circle) in the plane and
each edge by a curve joining the points (or small circles) corresponding to
the two vertices incident to the edge. It is convenient to refer to such a
diagram of G as G itself. In Figure 3, a graph G with vertex set V = {u, v,
w, x, y, z} and edge set E = {uw, ux, vw, wx, xy, wz, xz} is shown.

Figure 3: A graph G of 6 vertices and 7 edges.

Suppose A and B are two sets of vertices. The neighborhood NA(B) of
B in A is the set of vertices in A that are adjacent to some vertex in B, i.e.,

NA(B) = {u ∈ A : uv ∈ E for some v ∈ B}.

The closed neighborhood NA[B] of B in A is NA(B) ∪ B. For simplicity, NA(v)
stands for NA({v}), NA[v] for NA[{v}], N(B) for NV(B), N[B] for NV[B],
N(v) for NV({v}) and N[v] for NV[{v}]. We write u ∼ v for u ∈ N[v].

The degree deg(v) of a vertex v is the size of N(v), or equivalently, the
number of edges incident to v. An isolated vertex is a vertex of degree zero.
A leaf (or end vertex) is a vertex of degree one. The minimum degree of a
graph G is denoted by δ(G) and the maximum degree by Δ(G). A graph G
is r-regular if δ(G) = Δ(G) = r. A 3-regular graph is called a cubic graph.
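
To make these definitions concrete, the following minimal Python sketch
(the dictionary-of-sets encoding and all identifiers are our own illustrative
choices, not from this chapter) builds the graph of Figure 3 and evaluates
neighborhoods and degrees.

    # The graph of Figure 3, stored as a dictionary of neighbor sets.
    G = {
        "u": {"w", "x"}, "v": {"w"}, "w": {"u", "v", "x", "z"},
        "x": {"u", "w", "y", "z"}, "y": {"x"}, "z": {"w", "x"},
    }

    def N(G, B):
        """Open neighborhood N(B): vertices adjacent to some vertex of B."""
        return {u for v in B for u in G[v]}

    def closed_N(G, B):
        """Closed neighborhood N[B] = N(B) together with B itself."""
        return N(G, B) | set(B)

    deg = {v: len(G[v]) for v in G}    # deg(v) = |N(v)|

    print(N(G, {"v"}))          # {'w'}
    print(closed_N(G, {"v"}))   # {'v', 'w'}
    print(deg["x"], deg["v"])   # 4 1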
A graph G' = (V', E') is a subgraph of another graph G = (V, E) if
V' ⊆ V and E' ⊆ E. In the case of V' = V, G' is called a spanning subgraph
of G. For any nonempty subset S of V, the (vertex) induced subgraph G[S]
is the graph with vertex set S and edge set

E[S] = {uv ∈ E : u ∈ S and v ∈ S}.

A graph is H-free if it does not contain H as an induced subgraph. The
deletion of S from G = (V, E), denoted by G - S, is the graph G[V \ S].
We use G - v as a short notation for G - {v} when v is a vertex in G. The
deletion of a subset F of E from G = (V, E) is the graph G - F = (V, E \ F).
We use G - e as a short notation for G - {e} if e is an edge of G. The
complement of a graph G = (V, E) is the graph Ḡ = (V, Ē), where

Ē = {uv ∉ E : u, v ∈ V and u ≠ v}.


Suppose G1 = (V1, E1) and G2 = (V2, E2) are two graphs with V1 ∩ V2 = ∅.
The union of G1 and G2 is the graph G1 ∪ G2 = (V1 ∪ V2, E1 ∪ E2). The
join of G1 and G2 is the graph G1 + G2 = (V1 ∪ V2, E+), where

E+ = E1 ∪ E2 ∪ {v1v2 : v1 ∈ V1 and v2 ∈ V2}.

The Cartesian product of G1 and G2 is the graph G1 × G2 = (V1 × V2, E×),
where

V1 × V2 = {(v1, v2) : v1 ∈ V1 and v2 ∈ V2},
E× = {(u1, u2)(v1, v2) : (u1 = v1 and u2v2 ∈ E2) or (u1v1 ∈ E1 and u2 = v2)}.
In a graph G = (V, E), a clique is a set of pairwise adjacent vertices
in V. An i-clique is a clique of size i. A 2-clique is just an edge. A 3-
clique is called a triangle. A stable (or independent) set is a set of pairwise
nonadjacent vertices in V.
For two vertices x and y of a graph, an x-y walk is a sequence x = x0, x1,
..., xn = y such that xi-1xi ∈ E for 1 ≤ i ≤ n, where n is called the length
of the walk. In a walk x0, x1, ..., xn, a chord is an edge xixj with |i - j| ≥ 2.
A trail (path) is a walk in which all edges (vertices) are distinct. A cycle is
an x-x walk in which all vertices are distinct except that the first vertex is
equal to the last. A graph is acyclic if it does not contain any cycle.
A graph is connected if for any two vertices x and y, there exists an
x-y walk. A graph is disconnected if it is not connected. A (connected)
component of a graph is a maximal subgraph which is connected. A cut-
vertex is a vertex whose deletion from the graph results in a disconnected
graph. A block of a graph is a maximal connected subgraph which has no
cut-vertices.
The distance d(x, y) from a vertex x to another vertex y is the minimum
length of an x-y path; and d(x, y) = ∞ when there is no x-y path.
Digraphs or directed graphs can be defined similarly to graphs except that
an edge (u, v) is now an ordered pair rather than a 2-subset. All terms
for graphs can be defined with suitable modifications by taking the
directions into account.
An orientation of a graph G = (V, E) is a digraph (V, E') such that for
each edge {u, v} of E exactly one of (u, v) and (v, u) is in E', and all edges
of E' arise in this way.

2.2 Variation of domination


Due to different requirements in a location problem or a chessboard prob-
lem, people have studied many variations of the domination problem. For
instance, in the queen's domination problem, one may additionally require
that no two queens dominate each other, or that any two queens dominate
each other. The following are the most commonly studied variants of
domination.
Recall that a dominating set of a graph G = (V, E) is a subset D of V
such that every vertex not in D is adjacent to at least one vertex in D. This
is equivalent to N[x] ∩ D ≠ ∅ for all x ∈ V, or ∪y∈D N[y] = V.
A dominating set D is independent, connected, total or perfect (efficient) if
G[D] has no edge, G[D] is connected, G[D] has no isolated vertices or
|N[v] ∩ D| = 1 for any v ∈ V \ D, respectively. An independent
(respectively, connected or total) perfect dominating set is a perfect
dominating set which is also independent (respectively, connected or total).
A dominating clique (respectively, cycle) is a dominating set which is also
a clique (respectively, cycle). For a fixed positive integer k, a k-dominating
set of G is a subset D of V such that for every vertex v in V there exists
some vertex u in D with d(u, v) ≤ k. An edge dominating set of G = (V, E)
is a subset F of E such that every edge in E \ F is adjacent to some edge
in F.
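
The variants above differ only in the extra condition imposed on G[D]. The
following Python sketch makes a few of them concrete by brute force on a
small hypothetical graph (not the graph of Figure 4, whose edge set is not
reproduced here); all identifiers are our own, and the exponential search is
meant only to illustrate the definitions.

    from itertools import combinations

    # A tiny hypothetical example graph: edges 12, 13, 23, 24, 45.
    G = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}

    def dominating(G, D):
        # every vertex is in D or adjacent to a vertex of D
        return all(v in D or G[v] & D for v in G)

    def independent(G, D):
        # G[D] has no edge
        return all(not (G[u] & D) for u in D)

    def total(G, D):
        # G[D] has no isolated vertex
        return all(G[v] & D for v in D)

    def gamma(G, extra=lambda G, D: True):
        # smallest dominating set satisfying the extra condition
        for k in range(1, len(G) + 1):
            for D in map(set, combinations(G, k)):
                if dominating(G, D) and extra(G, D):
                    return k, D

    print(gamma(G))               # (2, {1, 4}) : gamma = 2
    print(gamma(G, independent))  # gamma_i = 2
    print(gamma(G, total))        # gamma_t = 2, e.g. {2, 4}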
For the above variants of domination, the corresponding independent,
connected, total, perfect, independent perfect, connected perfect, total per-
For the above variants of domination, the corresponding independent,
connected, total, perfect, independent perfect, connected perfect, total per-
fect, clique, cycle, k-, edge domination numbers are denoted by γi(G), γc(G),
γt(G), γp(G), γip(G), γcp(G), γtp(G), γcl(G), γcy(G), γk(G), γe(G), respec-
tively. Figure 4 shows a graph G of 13 vertices and 19 edges, whose values
of all γπ(G) are given as follows.
γ(G) = 3, D* = {v2, v5, v9};
γi(G) = 3, D* = {v2, v5, v9};
γc(G) = 6, D* = {v2, v4, v5, v7, v10, v12};
γt(G) = 5, D* = {v2, v4, v7, v10, v12};
γp(G) = 4, D* = {v2, v3, v5, v9};
γip(G) = ∞, infeasible;
γcp(G) = 10, D* = {v1, v2, v4, v5, v6, v7, v9, v10, v11, v12};
γtp(G) = 6, D* = {v1, v2, v4, v5, v12, v13};
γcl(G) = ∞, infeasible;
γcy(G) = 8, D* = {v2, v4, v6, v9, v12, v10, v7, v5};
γ2(G) = 2, D* = {v6, v7};
γk(G) = 1, D* = {v7} for k ≥ 3;
γe(G) = 3, D* = {v2v4, v9v12, v7v5}.

Figure 4: A graph G of 13 vertices and 19 edges.

For all of the above variants of domination, except the edge domination,
we can consider their (vertex-)weighted cases. Now, every vertex v has a
real weight w(v). The problem is to find a dominating set D of a suitable
variant such that

w(D) = Σ_{v∈D} w(v)

is as small as possible. We use γπ(G, w) to denote this minimum value,
where π stands for a variant of the domination problem. When w(v) = 1
for all vertices v, the weighted cases become the cardinality cases.
For some variants of domination, we may assume the vertex weights are
non-negative, as the following lemma shows.
Lemma 2.1 Suppose G = (V, E) is a graph in which every vertex v is
associated with a real weight w(v). If w'(v) = max{w(v), 0} for all
vertices v ∈ V, then

γπ(G, w) = γπ(G, w') + Σ_{w(v)<0} w(v)

for any π ∈ {∅, c, t, k}.

Proof. Denote by A the set of all vertices v with w(v) < 0. Suppose D is
a π-dominating set of G with Σ_{v∈D} w(v) = γπ(G, w). Then

γπ(G, w') ≤ Σ_{v∈D} w'(v) = Σ_{v∈D} w(v) - Σ_{v∈D∩A} w(v) ≤ γπ(G, w) - Σ_{w(v)<0} w(v).

On the other hand, for any π-dominating set D of G with Σ_{v∈D} w'(v) =
γπ(G, w'), D ∪ A is also a π-dominating set of G and so

γπ(G, w) ≤ Σ_{v∈D∪A} w(v) = Σ_{v∈D} w'(v) + Σ_{v∈A} w(v) = γπ(G, w') + Σ_{w(v)<0} w(v).

Both inequalities imply the lemma. □


More generally, we may consider vertex-edge-weighted cases of the dom-
ination problems as follows. Now, besides the weights of vertices we also
associate a weight w(e) with each edge e. The objective is then to find a
dominating set D of a suitable variant such that

w(D) = Σ_{v∈D} w(v) + Σ_{u∈V\D} w(uu')

is as small as possible, where u' is a vertex in D that is adjacent to u. Note
that there may be many choices of u', except for the perfect domination
and its three variants. We use γπ(G, w, w) to denote this minimum value,
where π stands for a variant of domination. When w(e) = 0 for all edges e,
the vertex-edge-weighted cases become the vertex-weighted cases.
Another parameter related to domination is as follows. The domatic
number d(G) of a graph G is the maximum number r such that G has r
pairwise disjoint dominating sets D1, D2, ..., Dr. We can also define in-
dependent, connected, total, perfect, independent perfect, connected perfect,
total perfect, clique, cycle, k-, edge domatic numbers di(G), dc(G), dt(G),
dp(G), dip(G), dcp(G), dtp(G), dcl(G), dcy(G), dk(G), de(G), respectively,
according to the above variations of domination in similar ways.

2.3 Special classes of graphs


In this subsection, we introduce special classes of graphs. They are not only
important in the study of domination, but also fundamental in graph theory.
A complete graph is a graph whose vertex set is a clique. The complete
graph on n vertices is denoted by Kn. The complement K̄n of the complete
graph Kn is then a graph with no edges.
The n-path, denoted by Pn, is the graph on n vertices that consists of a
chordless path of length n - 1. The n-cycle, denoted by Cn, is the graph on
n vertices that consists of a chordless cycle of length n.
An r-partite graph is a graph whose vertex set can be partitioned into
r stable sets, which are called its partite sets. A 2-partite graph is usually
called a bipartite graph. A complete r-partite graph is an r-partite graph in
which vertices in different partite sets are adjacent. The complete r-partite
graph whose partite sets have n1, n2, ..., nr vertices, respectively, is
denoted by K(n1, n2, ..., nr).
A tree is a connected graph without any cycle. A directed tree is an
orientation of a tree. A rooted tree is a directed tree in which there is a
special vertex r, called the root, such that for every vertex v there is
a directed r-v path. Trees are probably the simplest structures in graph
theory. Problems that look hard in general graphs are often investigated in
trees first as a warm-up. We study domination in trees as the first class in
Section 3. Many ideas for domination in trees are then generalized to other
classes of graphs.
Suppose F is a family of sets. The intersection graph of F is the graph
obtained by representing each set in F as a vertex and joining two vertices
by an edge if their corresponding sets intersect. It is well-known that
any graph is the intersection graph of some family F. The problem of
characterizing the intersection graphs of families of sets having some specific
topological or other pattern is often very interesting and frequently has
applications in the real world. A typical example is the class of interval
graphs. An interval graph is the intersection graph of intervals in the real
line. They play important roles in many applications. We study domination
in interval graphs in Section 4.
A graph is chordal (or triangulated) if every cycle of length greater than
three has a chord. The class of chordal graphs is one of the classical
classes in perfect graph theory; see the book by Golumbic [88]. It turns out
to be also very important in domination theory. It is well-known that a
graph is chordal if and only if it is the intersection graph of some subtrees
of a certain tree. If these subtrees are paths, the chordal graph is called an
undirected path graph. If these subtrees are directed paths of a rooted tree,
we get a directed path graph. If these subtrees are paths in some n-path, we
get an interval graph.
Most variations of the domination problem are NP-complete even for
chordal graphs; see Section 5. As an important subclass of chordal graphs,
the class of strongly chordal graphs is a star of domination theory. We
study domination in this class in Section 6.
A permutation diagram consists of n points on each of two parallel lines
and n straight line segments matching the points. The intersection graph
of the line segments is called a permutation graph. We study domination in
permutation graphs in Section 7.
A comparability graph is a graph G = (V, E) that has a transitive orien-
tation G' = (V, E'), i.e., uv ∈ E' and vw ∈ E' imply uw ∈ E'. A cocompa-
rability graph is the complement of a comparability graph. Cocomparability
graphs are generalizations of permutation graphs and interval graphs. We
study domination in cocomparability graphs in Section 8.
A graph is distance-hereditary if the distance of any two vertices is
the same in any connected induced subgraph containing them. We study
distance-hereditary graphs in Section 9.
For more detailed discussions of these classes of graphs, see the remaining
sections of this chapter.

3 Trees
3.1 Basic properties of trees
Recall that a tree is an acyclic connected graph. The following characteri-
zations are well-known.

Theorem 3.1 The following statements are equivalent for any graph G =
(V, E).
(1) G is a tree.
(2) G is connected and |V| = |E| + 1.
(3) G is acyclic and |V| = |E| + 1.
(4) For any two vertices u and v, there is a unique u-v path.
(5) The vertices of G have an ordering [v1, v2, ..., vn] such that vi is a
leaf of Gi = G[{vi, vi+1, ..., vn}] for 1 ≤ i ≤ n - 1, or equivalently,
for each 1 ≤ i ≤ n - 1, vi is adjacent to exactly one vj with j > i. (TO)
The ordering in Theorem 3.1 (5) is called a tree ordering of the tree. It
plays an important role in many algorithms dealing with trees. In a tree
ordering [v1, v2, ..., vn], the only neighbor vj of vi with j > i is called the
father of vi. From an algorithmic point of view, testing whether a graph is
a tree and finding a tree ordering can be done in linear time. Figure 5 shows
a tree of 11 vertices and a tree ordering.

Figure 5: An example of the tree ordering.
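
As an illustration of Theorem 3.1 (5), the following Python sketch computes
a tree ordering by repeatedly peeling current leaves, which runs in linear
time up to the queue operations; the encoding (a dictionary of neighbor
sets) and the function name are our own assumptions.

    from collections import deque

    def tree_ordering(adj):
        # adj: dictionary mapping each vertex of a tree to its neighbor set.
        # Returns [v1, ..., vn] with each vi a leaf of the subtree induced
        # by {vi, ..., vn}.
        deg = {v: len(adj[v]) for v in adj}
        leaves = deque(v for v in adj if deg[v] <= 1)
        order, removed = [], set()
        while leaves:
            v = leaves.popleft()
            removed.add(v)
            order.append(v)
            for u in adj[v]:
                if u not in removed:
                    deg[u] -= 1
                    if deg[u] == 1:    # u just became a leaf
                        leaves.append(u)
        return order

    # A hypothetical 5-vertex tree with edges 14, 24, 35, 45.
    T = {1: {4}, 2: {4}, 3: {5}, 4: {1, 2, 5}, 5: {3, 4}}
    print(tree_ordering(T))   # [1, 2, 3, 4, 5]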

3.2 Labeling algorithm for trees


Cockayne, Goodman and Hedetniemi [47] gave the first linear-time algorithm
for the domination problem in trees by a labeling method, which is a naive
but useful approach.
The algorithm starts by processing a leaf v of a tree T, which is adjacent
to a unique vertex u. To dominate v, a minimum dominating set D of T
must contain u or v. However, since N[v] ⊆ N[u], it is better to have u in
D rather than v in D. So we can keep the information 'required' at u and
delete v from T. At some iteration, if we process a 'required' leaf v adjacent
to a vertex u which is not labeled 'required', we need to put v into D and
delete it from T. But now there is a vertex in D that dominates u, so we
label u 'free'. For convenience, we label all vertices 'bound' initially.
More precisely, suppose the vertex set of a graph G = (V, E) is parti-
tioned into three sets F, B and R, where F consists of free vertices, B consists
of bound vertices and R consists of required vertices. A mixed dominating
set of G (with respect to F, B, R) is a subset D ⊆ V such that

R ⊆ D and every vertex in B \ D is adjacent to some vertex in D.

Free vertices need not be dominated by D but may be included in D in order
to dominate bound vertices. The mixed domination number γm(G) is the
minimum size of a mixed dominating set in G; such a set is called an md-set
of G. Note that mixed domination is the usual domination when B = V
and F = R = ∅.
The construction and correctness of the algorithm are based on the fol-
lowing theorem.
The construction and correctness of the algorithm is based on the fol-
lowing theorem.

Theorem 3.2 Suppose T is a tree having free, bound and required vertices
F, B and R, respectively. Let v be a leaf of T, which is adjacent to u. Then
the following statements hold.
(1) If v ∈ F, then γm(T) = γm(T - v).
(2) If v ∈ B and T' is the tree which results from T by deleting v and
relabeling u as 'required', then γm(T) = γm(T').
(3) If v ∈ R and u ∈ R, then γm(T) = 1 + γm(T - v).
(4) If v ∈ R, u ∉ R and T' is the tree which results from T by deleting
v and relabeling u as 'free', then γm(T) = 1 + γm(T').

Proof. (1) Since v is free, any mixed dominating set of T - v is also a mixed
dominating set of T. Thus, γm(T) ≤ γm(T - v). On the other hand, suppose
D is an md-set of T. If v ∉ D, then D is also a mixed dominating set of
T - v. If v ∈ D, then (D \ {v}) ∪ {u} is a mixed dominating set of T - v,
whose size is at most |D|. Thus, in either case, γm(T - v) ≤ |D| = γm(T).
(2) Since u is required in T', a mixed dominating set of T' always contains
u and hence is also a mixed dominating set of T. Thus, γm(T) ≤ γm(T').
On the other hand, suppose D is an md-set of T. Since v is bound in
T, either u or v is in D. In any case, D' = (D \ {v}) ∪ {u} is also an
md-set of T which contains u but not v. Note that D' is then a mixed
dominating set of T', in which u is considered as a required vertex. So,
γm(T') ≤ |D'| ≤ |D| = γm(T).
(3) If D' is an md-set of T - v, then D' ∪ {v} is a mixed dominating set
of T. Thus, γm(T) ≤ 1 + γm(T - v). On the other hand, any md-set D of T
contains both u and v. Then D \ {v} is a mixed dominating set of T - v. So,
γm(T - v) ≤ γm(T) - 1.
(4) If D' is an md-set of T', then D' ∪ {v} is a mixed dominating set
of T. Thus, γm(T) ≤ 1 + γm(T'). On the other hand, any md-set D of T
contains v. Since u is free in T', D \ {v} is a mixed dominating set of T'.
So, γm(T') ≤ γm(T) - 1. □

Based on the above theorem, we have the following algorithm for the
mixed domination problem in trees.

Algorithm DomTreeL. Find a minimum mixed dominating set of a tree.
Input. A tree T whose vertices are labeled free, bound or required. A
tree ordering [v1, v2, ..., vn] of T.
Output. A minimum mixed dominating set D of T.
Method.

D ← ∅;
for i = 1 to n - 1 do
    let vj be the father of vi;
    if (vi is bound) then
        relabel vj as required;
    if (vi is required) then
        { D ← D ∪ {vi};
          if vj is bound then relabel vj as free; }
end for;
if vn is not free then D ← D ∪ {vn}.
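
A direct transcription of Algorithm DomTreeL into Python might look as
follows; the representation of the tree (a tree ordering plus a father map)
and all identifiers are our own assumptions, not part of the original
pseudocode.

    def dom_tree_l(order, father, label):
        # order: a tree ordering [v1, ..., vn]; father[v]: the father of v
        # (defined for all but the last vertex); label[v] in {'free',
        # 'bound', 'required'} -- all 'bound' for plain domination.
        label = dict(label)              # work on a copy
        D = set()
        for v in order[:-1]:
            f = father[v]
            if label[v] == 'bound':      # dominate v by its father instead
                label[f] = 'required'
            elif label[v] == 'required': # v must enter D ...
                D.add(v)
                if label[f] == 'bound':  # ... and it already dominates f
                    label[f] = 'free'
        if label[order[-1]] != 'free':
            D.add(order[-1])
        return D

    order = [1, 2, 3, 4, 5]              # the 5-vertex tree used above
    father = {1: 4, 2: 4, 3: 5, 4: 5}
    print(dom_tree_l(order, father, {v: 'bound' for v in order}))   # {4, 5}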

Slater [161] generalized the above idea to solve the k-domination prob-
lem in trees. Now, each vertex vi is associated with an ordered pair (ai, bi)
of a nonnegative integer ai and a positive integer bi. The dominating set D
is chosen so that each vertex vi is within distance ai from some vertex in
D. The integer bi indicates that there is a vertex in the current D that is
at distance bi from vi. For the case of k = 1, a label (0, bi) is the same as
'required', (1,1) the same as 'free' and (1,2) the same as 'bound'. Mitchell
and Hedetniemi [141] and Yannakakis and Gavril [176] gave labeling algo-
rithms for the edge domination problem in trees. Laskar, Pfaff, Hedetniemi
and Hedetniemi [128] gave a labeling algorithm for the total domination
problem in trees.
As these algorithms suggest, the labeling method may only work for
problems whose solutions have a 'local property'. As an example of a
problem without the local property, consider the independent domination
problem in the tree T of Figure 6. The only minimum independent
dominating set of T is {v1, v3}. If we are given the tree ordering [v1, v2,
v4, v3, v5], our algorithm must be clever enough to put the leaf v1 into the
solution at the first iteration. If we are given another tree ordering [v5, v4,
v3, v2, v1], our algorithm must be clever enough not to put in the leaf v5
at the first iteration. So, the algorithm must be one that not only looks at
a leaf and its only neighbor, but also has some idea about the whole
structure of the tree. This is what we mean by saying the solution has no
'local property'.

Figure 6: A tree T with a unique minimum independent dominating set.

3.3 Dynamic programming for trees


Dynamic programming is a powerful method for solving many discrete opti-
mization problems; see the books by Bellman and Dreyfus [11], Dreyfus and
Law [74] and Nemhauser [147]. The main idea of the dynamic programming
approach for domination is to turn the 'bottom-up' labeling method into
a 'top-down' one. Now a specific vertex u is chosen from G. A minimum
dominating set D of G either contains or does not contain u. So we may
consider the following two domination problems, which are the original
problem with boundary conditions.

γ1(G, u) = min{|D| : D is a dominating set of G and u ∈ D}.

γ0(G, u) = min{|D| : D is a dominating set of G and u ∉ D}.


Then we have

Lemma 3.3 γ(G) = min{γ1(G, u), γ0(G, u)} for any graph G with a spe-
cific vertex u.

Suppose H is another graph with a specific vertex v. Let I be the graph
with the specific vertex u, which is obtained from the disjoint union of G
and H by joining a new edge uv; see Figure 7.
The aim is to use γ1(G, u), γ0(G, u), γ1(H, v) and γ0(H, v) to find γ1(I, u)
and γ0(I, u). Suppose D is a dominating set of I with u ∈ D. Then D =
D' ∪ D'', where D' is a dominating set of G with u ∈ D' and D'' is a subset
of V(H) which dominates V(H) - {v}. There are two cases. In the case of
v ∈ D'', D'' is a dominating set of H. On the other hand, if v ∉ D'' then
D'' is a dominating set of H - v. For the latter case, we need the following
new problem:

γ00(G, u) = min{|D| : D is a dominating set of G - u}.



Figure 7: The composition of two trees G and H.

Note that γ00(G, u) ≤ γ0(G, u), since a dominating set D of G with
u ∉ D is also a dominating set of G - u.

Theorem 3.4 Suppose G and H are graphs with specific vertices u and v,
respectively. Let I be the graph with the specific vertex u, which is obtained
from the disjoint union of G and H by joining a new edge uv. Then the
following statements hold.
(1) γ1(I, u) = γ1(G, u) + min{γ1(H, v), γ00(H, v)}.
(2) γ0(I, u) = min{γ0(G, u) + γ0(H, v), γ00(G, u) + γ1(H, v)}.
(3) γ00(I, u) = γ00(G, u) + γ(H) = γ00(G, u) + min{γ1(H, v), γ0(H, v)}.

Proof. (1) follows from the fact that D is a dominating set of I with u ∈ D
if and only if D = D' ∪ D'', where D' is a dominating set of G with u ∈ D'
and D'' is a dominating set of H with v ∈ D'' or a dominating set of H - v.
(2) follows from the fact that D is a dominating set of I with u ∉ D if
and only if D = D' ∪ D'', where either D' is a dominating set of G with
u ∉ D' and D'' is a dominating set of H with v ∉ D'', or D' is a dominating
set of G - u and D'' is a dominating set of H with v ∈ D''.
(3) follows from the fact that D is a dominating set of I - u if and only if
D = D' ∪ D'', where D' is a dominating set of G - u and D'' is a dominating
set of H. □
Based on the lemma and the theorem above, we have the following dy-
namic programming algorithm for the domination problem in trees.

Algorithm DomTreeD. Determine the domination number of a tree.
Input. A tree T with a tree ordering [v1, v2, ..., vn].
Output. The domination number γ(T) of T.
Method.

for i = 1 to n do
    γ1(vi) ← 1;
    γ0(vi) ← ∞;
    γ00(vi) ← 0;
end do;
for i = 1 to n - 1 do
    let vj be the father of vi;
    γ1(vj) ← γ1(vj) + min{γ1(vi), γ00(vi)};
    γ0(vj) ← min{γ0(vj) + γ0(vi), γ00(vj) + γ1(vi)};
    γ00(vj) ← γ00(vj) + min{γ1(vi), γ0(vi)};
end do;
γ(T) ← min{γ1(vn), γ0(vn)}.
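
For concreteness, here is one possible Python rendering of Algorithm
DomTreeD, with g1, g0 and g00 standing for γ1, γ0 and γ00; the encoding of
the tree is the same hypothetical one used for DomTreeL above.

    INF = float('inf')

    def dom_tree_d(order, father):
        g1 = {v: 1 for v in order}     # gamma^1: v belongs to D
        g0 = {v: INF for v in order}   # gamma^0: v not in D but dominated
        g00 = {v: 0 for v in order}    # gamma^00: v may stay undominated
        for v in order[:-1]:
            f = father[v]              # merge the subtree of v into f
            g1[f] = g1[f] + min(g1[v], g00[v])
            g0[f] = min(g0[f] + g0[v], g00[f] + g1[v])
            g00[f] = g00[f] + min(g1[v], g0[v])
        r = order[-1]
        return min(g1[r], g0[r])

    order = [1, 2, 3, 4, 5]              # the 5-vertex tree used above
    father = {1: 4, 2: 4, 3: 5, 4: 5}
    print(dom_tree_d(order, father))     # 2, realized e.g. by {3, 4}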

The advantage of the dynamic programming method is that it also works
for problems whose solutions have no local property. As an example, Beyer,
Proskurowski, Hedetniemi and Mitchell [18] solved the independent domi-
nation problem by this method. Moreover, the method can be used to solve
the vertex-edge-weighted cases. The following derivation for the vertex-
edge-weighted domination in trees is slightly different from that given by
Natarajan and White [146].
Define γ1(G, u, w, w), γ0(G, u, w, w) and γ00(G, u, w, w) in the same way
as γ1(G, u), γ0(G, u) and γ00(G, u), except that |D| is replaced by w(D).
We then have:

Lemma 3.5 ,(G,w,w) = minb1 (G,u,w,w), 'Y°(G,u,w,w)} for any graph


G with a specific vertex u.

Theorem 3.6 Suppose G and H are graphs with specific vertices u and V,
respectively. Let I be the graph with the specific vertex u, which is obtained
from the disjoint union of G and H by joining a new edge uv. The following
statements hold.
(1) ,1(I,u,w,w) ,1(G,u,w,w) + minb(H,v,w,w),
w(uv) + ,OO(H, V, w, w)}.
(2) ,0(I,u,w,w) = minbO(G,u,w,w) + ,(H,w,w)"OO(G,u,w,w) +
w(uv) + ,1(H, V, w, w)}.
(3) ,°°(1,u, w, w) = ,OO(G, u, w, w) + ,(H, w, w).

Base on the lemma and the theorem above, we have the following algo-
rithm for the vertex-edge-weighted domination problem in trees.
356 G.J. Chang

Algorithm VEW-DomTreeD. Determine the vertex-edge-weighted dom-


ination number of a tree.
Input. A tree T with a tree ordering [Vb V2, ... ,Vn ], and each vertex V has
a weight w(v) and each edge e has a weight w(e).
Output. The vertex-edge-weighted domination number 'Y(T,w,w) ofT.
Method.

for i = 1 to n do
'Y(Vi) +- 'Yl(Vi) +- W(Vi);
'Y°(Vi) +- 00;
'Y00(Vi) +- 0;
end do;
for i = 1 to n - 1 do
let Vj be the father of Vi;
'Yl(Vj) +- 'Yl(Vj)+ min{'Y(vi),w(uv) + 'Y00(Vi)}i
'Y°(Vj) +- min{'Y°(vj) + 'Y(Vi) , 'Y00(Vj) + w(UV) + 'Yl (Vi)}i
'Y00(Vj) +- 'Y00(Vj) + 'Y(Vi)i
'Y(Vj) +- min{'Y1(vj),'Y°(Vj)}i
end do;
'Y(T, w, w) +- 'Y(vn ).

The dynamic programming method was also used in several papers for
solving variations of the domination problems in trees, see [9, 84, 102, 107,
162, 180].

3.4 Primal-dual approach for trees


The most beautiful method used in domination may be the primal-dual
approach. In this method, besides the original domination problem we also
consider the following dual problem. In a graph G = (V, E), a 2-stable set
is a subset S ~ V in which every two distinct vertices u and V have distance
d(u,v) > 2. The 2-stability number 02(G) of G is the maximum site of a
2-stable set in G. It is easy to see the

Weak Duality Inequality: 02(G) ~ 'Y(G) for any graph G.


Note that the above inequality can be strict, as shown by the n-cycle Cn
that 02(Cn ) = ljJ but 'Y(Cn ) = rjl
Algorithmic Aspects of Domination in Graphs 357

For a tree T, we shall design an algorithm which outputs a dominating


set D* and a 2-stable set S* with ID*I ~ IS*I. Then

which imply that all inequalities are equalities. Consequently, D* is a min-


imum dominating set, S* is a maximum 2-stable and the strong duality
equality a2(T) = 'Y(T) holds.
The algorithm starts from a leaf v adjacent to u. It also uses the idea as
in the labeling algorithm that u is more powerful than v since N[v) ~ N[u).
Instead of choosing v, we put u into D*. Besides, we also put v into S*.
More precisely, we have the following algorithm.

Algorithm DomTreePD. Find a minimum dominating set and a maxi-


mum 2-stable set of a tree.
Input. A tree T with a tree ordering [Vb V2, ... ,vn ).
Output. A minimum dominating set D* and a maximum 2-stable set S*
ofT.
Method.

D* +-</>;
S* +-</>;
for i = 1 to n do
let Vj be the father of Vi; (assume Vj = Vn for Vi = v n )
if (N[Vi) n D* = </» then
{ D* +- D* U {Vj}; S* +- S* U {Vi} };
end do.

To verify the algorithm, we have to prove that D* is a dominating set,


S* is a 2-stable set and ID*I ~ IS*I.
D* is clearly a dominating set as the if-then statement does.
Suppose S* is not a 2-stable set, i.e., there exist Vi and Vi' in S* such
that i < i' but dT(Vi,Vi') ~ 2. Let Ti = T[{Vi,Vi+l, ... ,Vn }). Then Ti
contains Vi, Vj and Vi', Since d( Vi, Vi,) ~ 2, the unique Vi-Vi' path in T (and
also in T') is either Vi,Vi' or Vi,Vj,Vi', In any case, Vj E N[W]. Thus, at
the end of iteration i, D* contains Vj. When the algorithm processes Vi',
N[Vi'] n D* "I- </> which causes that S* does not contain Vi', a contradiction.
ID*I ~ IS*I follows from that when we add Vj into D*, which mayor
may not be already in D*, we always add a new vertex Vi into S*.
358 G.J. Chang

Theorem 3.7 Algorithm DomTreePD gives a minimum dominating set D*


and a maximum 2-stable set S* 01 a tree T with ID*I = IS*I in linear time.

Theorem 3.8 (Strong Duality) (}2(T) = ')'(T) lor any tree T.


The primal-dual approach was in fact used by Farber [79] and Kolen
[123] for the weighted domination problem in trees. It was also used by
Cheston, Fricke, Hedetniemi and Jocobs [46] for upper fraction domination
in trees.

3.5 Tree related classes


Besides the three methods demonstrated in the previous subsections, 'trans-
formation method' sometimes is also used in the study of domination. Roughly
speaking, the method transform the domination problem in certain graphs
to another well-known problem, which is solvable. As this method depends
on the variation of domination and the type of graphs we are working on,
we will only mention it when it is used in the problem we survey in this
chapter.
There are some classes of graphs which have tree-like structures, includ-
ing block graphs, (partial) k-trees and cacti.
A block graph is a graph whose blocks are complete graphs. For results
on variations of domination in block graphs, see [34, 36, 110, 114, 181].
A cactus is a graph whose blocks are cycles. Hedetniemi [105] gave a
linear-time algorithm for the domination problem in cacti.
For a positive integer k, k-trees are defined recursively as follows: (i) a
complete graph of k vertices is a k-treej (ii) the graph obtained from a k-tree
by adding a new vertex adjacent to a clique of k vertices is a k-tree. Partial
k-trees are subgraphs of k-trees. For results on variations of domination in
k-trees and partial k-trees, see [1, 56, 83, 153, 159, 169, 170].

4 Interval graphs
4.1 Interval orderings of interval graphs
Recall that an interval graph is the intersection graph of a family of intervals
in the real line.
In many papers, people design algorithms or prove theorems for interval
graphs by using the interval models directly. This very often accomplished
Algorithmic Aspects of Domination in Graphs 359

by ordering the intervals according to some nondecreasing order of their


right (or left) endpoints.
For instance, suppose G = (V, E) is an interval graph with an interval
model
{Ii = [ai,bi] : 1 ~ i ~ n},
where bl ~ ~ ~ ... ~ bn • One can solve the domination problem for G by
using exactly the same primal-dual algorithm for trees except replacing the
4th line of Algorithm DomTreePD by
let j be the largest index such that v; E N[Vi]i
In order to prove that the revised algorithm works for interval graphs, we
only need to show that S* is a 2-stable set. Suppose to the contrary that S*
contains two vertices Vi and Vi' with i < i' and da(Vi, ViI) ~ 2, say there is a
vertex Vk E N(Vi) nN[Vil]. Consider the largest vertex v; of N[Vi] as chosen
in iteration i of the algorithm. Since Vk E N(Vi), we have k ~ j. Consider
two cases.
Case 1. j ~ i'. In this case, k ~ j ~ i'. Then bk ~ b; ~ bi'. Since
Vk E N[Vil], we have that Ik intersects IiI and so ai' ~ bk. Therefore,
ai' ~ b; ~ bil which imply that I; intersects IiI.
Case 2. i' < j. In this case, i < i' < j. Then bi ~ bil ~ b;. Since v; E
N(Vi), we have that Ii intersects I; and so a; ~ bi. Therefore, a; $ bil $ bj
which imply that Ij intersects IiI.
In any case, v; E N[W]. As Vj is put into D* in iteration i, when the
algorithm processes Vi', N[Vil] n D* =F 0 which causes that S* does not
contain Vi', a contradiction. Therefore, S* is a 2-stable set.
As one case see, the arguments in cases 1 and 2 are quit similar. This
is also true in many other proofs for interval graphs. One may expect that
there is a unified property can be said. This is in fact the so called interval
ordering in the following theorem (see [156]). Notice that once we have
property (10) below, the conclusion Vj E N[Vil] above follows immediately.

Theorem 4.1 G = (V, E) is an interval graph if and only if G has an


interval ordering which is an ordering [Vb V2, ••• ,vn ] of V satisfying

i < j < k and ViVk E E imply VjVk E E. (10)

Proof. (:::}) Suppose G is the intersection graph of

{Ii = [ai,bi] : 1 $ i ~ n}.


360 G.J. Chang

We may assume that bl :5 ~ :5 ... :5 bn • Suppose i < j < k and ViVk E E.


Then bi :5 bj :5 bk. Since ViVk E E, we have Ii n Ik :/: 0 which implies that
ak :5 bi· Thus, ak :5 bj :5 bk and so Ij n Ik :/: 0, i.e., VjVk E E.
({::) On the other hand, suppose (10) holds. For any Vi E V, let i* be
the minimum index such that Vi* E N[Vi] and let interval Ii = [i*, i]. If
ViVk E E with i < k, then k* :5 i < k and so Ii n Ik :/: 0. If Ii n Ik :/: 0
with i < k, say j E Ii n Ik, then k* :5 j :5 i < k. Since Vk*Vk E E, by (10),
ViVk E E. Therefore, G is an interval graph with

{Ii = [i*, i] : 1 :5 i :5 n}
as an interval model. •
4.2 Domatic numbers of interval graphs
Another interesting usage of the interval ordering is the following rewriting
for the result on the domatic numbers of interval graphs obtained by Bertossi
[16]. He transformed the domatic number problem on interval graphs to a
network flow problem as follows. This is a typical example of the transfor-
mation method.
We first describe Bertossi's method. Suppose G = (V, E) is an interval
graph, whose vertex set V = {1,2, ... ,n}, with an interval model {[ai,bi] :
1 :5 i :5 n}. Without loss of generality, we may assume that
no two intervals share a common endpoint and < a2 < ... < an.
al

We first add two 'dummy' vertices 0 and n + 1 to the graph with bo < al
and bn < an+!. We then construct an acyclic directed network H as follows.
The vertex set of H is {O, 1,2, ... ,n, n + I}, and there is a directed edge
(i,j) in H if and only if j E P(i) U Q(i), where
P( i) = {k : ai < ak < bi < bk}
and
Q(i) = {k : ak > bi and there is no h with bi < ah < bh < ak}.
Figure 8 shows an example of H.
Bertossi then proved that any path from vertex 0 to vertex n + 1 in H
corresponds to a proper dominating set of G and vice versa. His arguments
have a flaw. In fact this statement is not true as 0,2,3,5,6 is a path in
the directed network H in Figure 8, but its corresponding dominating set
{2, 3, 5} has a proper subset {2,5} that is also a dominating set. To be
precise, he only showed that
Algorithmic Aspects of Domination in Graphs 361

~ t-I _.=..5_-i
1 I 3
2

H
Figure 8: The construction of H for an interval graph of domatic number 2.

(PI) a O-(n + 1) path in H corresponds to a dominating set of G,


and
(P2) a proper dominating set of G corresponds to a O-(n + 1) path in
H.
Besides, the entire arguments can be treated in terms of the interval or-
dering as follows. We now assume [0, 1, 2, ... ,n, n + 1] is an interval ordering
of the graph G = (V, E) with two isolated vertices 0 and n + 1 added. We
then construct a directed network H' with
vertex set {O, 1,2, ... ,n, n + I}
and
edge set {ij : i <j and (i <h <j imply ih E E or hj E E)}.
Notice that H' is not the same as H. However, statements (PI) and (P2)
remain true if we replace H by H'. Also, there is a simpler proof as we can
use property (10) without getting involved the endpoints of the intervals.
The argument is as follows.
First, a O-(n + 1) path P : 0 = io, ib ... ,ir , ir+! = n + 1 in H cer-
tainly corresponds to a dominating set D = {io, ib ... , ir , i r +!} of G by
the definition of the edge set of H'. This proves (PI). Conversely, for a
proper dominating set D of G, consider the corresponding path P. For any
o $ s $ r, suppose is < h < is+!. Since D is a dominating set of G, there
exists some ij E D such that hij E E.
Case 1. j $ s. Then ish E E by (10).
Case 2. j = s + 1. Then his+! E E.
362 G.J. Chang

Case 3. j > s+ 1. If N[i s+!] ~ N[ij], then D\ {vs+t} is a dominating set


of G, violating that D is a proper dominating set. So, there is some vertex
k E N[is+!] \N[ij]. The case of is+! < ij < k or h < k < ij implies k E N[ij]
by (10), a contraction. The case of ij < h < is+! implies his+! E E.
So, in any case, isis+! is an arc in H. This proves (P2).

Primal-dual approaches were also used in [135, 166] to solve the domatic
number problem in interval graphs. Manacher and Mankus [137] made it
possible to get an O(n) algorithm for the problem. In fact, the same method
was also obtained by Peng and Chang [149], whose idea was generalized to a
net result for the problem in a larger class of graphs, strongly chordal graphs
(see Section 6.3).

4.3 Weighted independent domination in interval graphs


There are many algorithms for variants of domination in interval graphs,
see [15, 17, 40, 42, 44, 155, 156]. Among them, Ramalingam and Pandu
Rangan [156] gave a unified approach to the weighted independent dom-
ination, the weighted domination, the weighted total domination and the
weighted connected domination problems in interval graphs by using the in-
terval orderings. We demonstrate their algorithms in this and the following
subsections.
Now, suppose G = (V, E) is an interval graph with vertex set V =
{1, 2, ... , n}, where [1,2, ... , n] is an interval ordering of G. We also assume
that each vertex is associated with a real number as its weight. Notice
that except for independent domination, according to Lemma 2.1, we may
assume that the weights are nonnegative. Consider following notation.
Vi = {1, 2, ... , i} and Gi denotes the subgraph G[Vi] induced by Vi.
Vo is the empty set.
low(i) = minimum element in N[i].
maxlow(i) = max{low(j) : low(i) :5 j :5 i}.
Li = {maxlow(i), maxlow(i) + 1, ... , i}.
Mi = {j : j > iand j is adjacent to i}.
For any family X of sets of vertices, we use min {X} to denote a minimum-
weighted set in X. If X is the empty set, then min{ X} denotes a set of
infinite weight.
Algorithmic Aspects of Domination in Graphs 363

Notice that the vertices in the set {I, 2, ... ,low{i)-I} are not adjacent to
i and the vertices in {low{i), low{i) +1, ... , i} are adjacent to i. The vertices
in Li form a maximal clique in the graph Gi' Let j be the vertex such that
low{i) ::; j ::; i and maxlow{i) = low{j). Then, low{i) ::; low(j) ::; j ::; i. It
can easily be seen that N[j] is a subset of Li U Mi. Furthermore, in Gi, j is
adjacent only to the vertices in Li.
Having all of these, we are now ready to establish the solutions to
weighted independent domination problem in interval graphs.

Let I Di denote an independent dominating set of the graph Gi and let


MIDi denote the minimum weighted I Di.
We know that, in Gi, the set Li is a maximal clique and that there is
a vertex in Li which is not adjacent to any vertex in Vi \ Li. Hence, any
IDi contains exactly one vertex j in Li. Furthermore, it is necessary and
sufficient that IDi \ {j} dominates ViOW(j)-l and contains no vertex adjacent
to j. Hence, IDi \ {j} is an independt dominating set of G1ow(j)-1' In other
words, a set is an IDi if and only if it is of the form ID1ow(j)-1 U {j} for
some j in Li.
Therefore, we have the following lemma.

Lemma 4.2 (a) MIDo = 0. (b) For 1 ::; i ::; n,


MIDi = min{MID1ow(j)_1 U {j} : j ELi}.

A linear-time algorithm for the weighted independent domination prob-


lem in interval graphs then follows. We omit the detail description as it is
easy.

4.4 Weighted domination in interval graphs


Let Di denote a subset of V that dominates Vi. Unlike to independent
domination, we do not restrict Di to be a subset of Vi. Let MDi denote a
minimum weighted Di.
We know that there is a vertex in Li whose neighbors are all in Li U Mi.
Thus Di contains some vertex j in LiUMi. It is necessary and sufficient that
the remaining set Di \ {j} dominates Viow(j)-l since j dominates all vertices
in Vi - 'Viow(;)-l and no vertex in Viow(j)-l' (Note that if j E Li U Mi , then
low(j) :5 i.) In other words, a set is a Di if and only if it is of the form
D1ow(j)-1 U {j} for some j in Li U Mi.
Hence, we have the following lemma.
364 G.J. Chang

Lemma 4.3 (a) MDo = 0. (b) For 1 ~ i ~ n,

MDi = min{MD1ow(j)_1 U {j} : j ELi U Mi}.

4.5 Weighted total domination in interval graphs


Let T Di denote a subset of V that totally dominates Vi and let MT Di
be a minimum weighted T Di. Let P Di denote a subset of V that totally
dominates {i} U Viow(i)-l and let MPDi be a minimum weighted PDi.
As in domination, T Di also includes some vertex j in Li U Mi. If j ELi,
then it is necessary and sufficient that the set T Di \ {j} totally dominates
Viow(j)-l U {j}. If j E Mi, then it is necessary and sufficient that the set
TDi \ {j} totally dominates Viow(j)-l'
Similarly, any P Di includes some vertex j adjacent to i. By the defi-
nition, j ~ low{i). Hence, it is necessary and sufficient that the PDi \ {j}
totally dominates Vmin{low(i)-l,low(j)-l}'
Therefore, we have the following lemma.

Lemma 4.4 (a) MTDo = 0. (b) For 1 ~ i ~ n,

MTDi = min ({MPD;U{j} : j E LdU{MTD1ow(;)-1 U{j} : j E Mi}).

MPDi = min{MTDmin{low(j)-l,low(i)-l} U {j} : j E N(i)}.

Note that in the original paper by Ramalingam and Pandu Rangan [156],
there is a typo that using j E N[i] rather than j E N(i) in the formula for
MPDi.

4.6 Weighted connected domination in interval graphs


Let CDi denote a connected dominating set of Gi that contains the vertex
i and let MCDi denote a minimum weighted CDi.
If low{i) = 1, then MCDi is {i} since all vertices have nonnegative
weights. If low(i) > 1, then any CDi contains vertices other than i, and
hence some vertex adjacent to i in Gi. Let j be the maximum vertex in
CDi \ {i}. We may assume that low(j) < low{i). Otherwise we may remove
j to get a CDi of the same or lower weight. If low(j) < low(i), then any
other vertex of Gj adjacent to i is also adjacent to j. Thus, it is necessary
and sufficient that CDi \ {i} is a CDj.
Hence, we have the following lemma.
Algorithmic Aspects of Domination in Graphs 365

Lemma 4.5 (a) If low(i) = 1, then MODi = {i}. (b) For low(i) > 1,
MODi = min{MIDj U {i} : j E N[i] and j <i and low(j) < low(i)}.
(c) min{MODn : i E Ln} is a minimum weighted connected dominating
set of the graph G.

5 Chordal graphs and NP-complete results


5.1 Perfect elimination orderings of chordal graphs
In previous sections, we have seen two classes of graphs in which the domina-
tion problem is well solved. We then try to generalize the results for general
graphs. However, the NP-completeness theory raised by Cook suggests that
this is in general quit impossible. Garey and Johnson, in a unpublished
paper (see [86]), pointed out that the dominating problem is NP-complete
by transforming the vertex cover problem to it.

VERTEX COVER
INSTANCE: A graph G = (V, E) and a positive integer k ~ IVI.
QUESTION: Is there a subset 0 S; V of size at most k such that each edge
xy of G has either x E 0 or yEO?

Their idea can in fact be modified to proved many NP-complete re-


sults for the domination problem and its variants in many classes of graphs.
Among these classes, the classes of chordal graphs is of most interesting in
the study of many graph optimization problem. Chordal graphs are raised
in the theory of perfect graphs, see [88]. It contains trees, interval graphs,
directed path graphs, split graphs, undirected path graphs ... as subclasses.
Recall that a graph is chordal if every cycle of length at least four has
a chord. The following property is an important characterization of chordal
graphs. The proof presented here is from Theorem 4.3 in [89], except we do
not introduce the Maximum Cardinality Search explicitly.

Theorem 5.1 A graph G = (V, E) is chordal if and only if it has a perfect


elimination ordering which is an ordering [VI, V2, ••• , v n ] of V such that

i < j < k and ViVj, ViVk E E imply VjVk E E. (PEO)

Proof. (::}) For any ordering u = [Vl' V2, •.• ,vn ], define the vector

d(u) = (d n , dn - I , ... , dd,


366 G.J. Chang

where each dB = I{t : t > 8 and Vt is adjacent to VB}' Choose an ordering (1


such that d( (1) is lexicographically largest.
Suppose p < q < r and Vr E N(vp) \ N(vq). Consider the ordering (11
obtained from (1 by interchanging vp and vq. Then d~ = dB for all 8 > q and

I{t: t> q and Vt E N(vp)}1 = d~ :$ dq = I{t : t > q and Vt E N(vq)}l.


However, r is a vertex such that r > q and Vr E N(vp) \ N(vq). Then, there
exists some 8 > q such that VB E N(vq) \ N(vp). This gives

p < q < r,Vr E N(vp)\N(vq) imply VB E N(vq)\N(vp) for some 8 > q. (*)
Next, we use (*) to prove the following claim.
Claim. There does not exist any chordless path P : Vi1' Vi2' ... , Vi., with
x ~ 3 and iy < ix < il for 1 < y < x.
(Notice that the claim implies (PEO) as Vk, Vi, Vj is a chordless path with
i < j < k whenever VjVk ¢ E.) Suppose to the contrary that such a
path P exists. Choose one with a largest i x' Since i2 < ix < il and
Vi1 E N(Vi2) \ N(Vi.,), by (*), there exists some iX+1 > ix such that Vi.,+1 E
N (Vi.,) \ N (Vi2)' Let z be the minimum index such that z ~ 2 and Vi.,+1 V% E
E. Note that z exists and z ~ 3. For the case when Vi1Vi.,+1 ¢ E, pi :
Vi1 , Vi2' ... , Vi~_1 , Vi~, Vi.,+1 (or its inverse) is a chordless path of length at
least three with iy < iX+1 < il (or iy < il < i x+1) for 1 < y :::; z. In this
case, ix < iX+1 is a contradiction to the choice of P. For the case when
Vi1 Vi.,+1 E E, pi together with the edge Vi1 Vi., +1 form a chordless cycle of
length at least four, a contradiction to that G is chordal.
(-<=) On the other hand, suppose (PEO) holds. For any cycle of length
at least four, choose the vertex of the cycle with the least index. By (PEO),
the two neighbors of this vertex in the cycle are adjacent. •

5.2 NP-completeness for domination


We now give the reduction for the NP-completeness of domination given by
Garey and Johnson. We in fact adapt their results for split graphs, which
are special chordal graphs.
A split graph is a graph whose vertex set is the disjoint union of a clique
C and a stable set S. Notice that a split graph is chordal as the ordering with
the vertices in S first and the vertices in C next gives a perfect elimination
ordering.
Algorithmic Aspects of Domination in Graphs 367

Theorem 5.2 The domination problem is NP-complete for split graphs.

Proof. We transform the vertex cover problem to the domination problem


in split graphs. Given a graph G = (V, E), construct the graph G' = (V', E')
with
vertex set V' = VUE
and
edge set E' = {VIV2 : VI =j:. V2 in V} U {ve: vEe}.
Notice that G' is a split graph whose vertex set V' is the disjoint union of
the clique V and the stable set E.

a
e
b

c
g
d
G G'
Figure 9: A transformation to a split graph.

If G has a vertex cover C of size at most k, then C is a dominating set


of G' of size at most k, by the definition of G'. On the other hand, suppose
G' has a dominating set D of size at most k. If D contains any e E E, say
e = UV, we can replace e with u to get a new dominating set of size at most
k. In this way, we may assume that D is a subset of V. It is then clear that
D is a vertex cover of G of size at most k.
Since the vertex cover problem is NP-complete, the domination problem
is also NP-complete for split graphs. •
Note that the dominating set of G' in the proof above in fact induces a
connected subgraph. Hence, we also have

Corollary 5.3 The total and the connected domination problems are NP-
complete for split graphs.

We can modify the above proof to get

Theorem 5.4 The domination problem is NP-complete for bipartite graphs.


368 G.J. Chang

Proof. We transform the vertex cover problem to the domination problem


in bipartite graphs. Given a graph G = (V, E), construct the graph G' =
(V', E') with
vertex set V' = {x, y} U VUE
and
edge set E' = {xy} U {yv : v E V} U {ve : vEe}.
Notice that G' is a bipartite graph whose vertex set V' is the disjoint union
of two stable sets {x} U V and {y} U E.

a
e
b
x
c

d
G G'
Figure 10: A transformation to a bipartite graph.

If G has a vertex cover C of size at most k, then {y} U C is a dominating


set of G' of size at most k + 1. On the other hand, suppose G' has a
dominating set D of size at most k+ 1. Sine Nc' [x] = {x, y}, D must contain
x or y. We may assume that D contains y but not x, as (D \ {x}) U {y} is
also a dominating set of size at most k + 1. Since y ED, if D contains any
e E E, say e = UV, we can replace e with U to get a new dominating set of
size at most k + 1. In this way, we may assume that D \ {y} is a subset of
V. It is then clear that D \ {y} is a vertex cover of G of size at most k.
Since the vertex cover problem is NP-complete, the domination problem
is NP-complete for bipartite graphs. •

Corollary 5.5 The total and the connected domination problems are NP-
complete for bipartite graphs.

There are many other NP-complete results for variations of domination,


see [9, 14,21,56,63,66,84,102,110, 145, 180, 181] and [86, 115, 116]. Most
of the proofs are more or less similar to the above two. There are only very
few are proved by different ways. As an example, Booth and Johnson [21]
Algorithmic Aspects of Domination in Graphs 369

proved that the domination problem is NP-complete for undirected path


graphs, which is another subclass of chordal graphs, by reducing the 3-
dimensional matching problem to it.

Theorem 5.6 The domination problem is NP-complete for undirected path


graphs.

Proof. Consider an instance of the 3-dimensional matching problem, in


which there are three disjoint sets W, X and Y each of size q and a subset

of W x X x Y having size p. The problem is to find a subset M' of M having


size exactly q such that each Wr E W, X, E X and Yt E Y occurs in precisely
one triple of M'.
Given an instance of the 3-dimensional matching problem, we construct
a tree i having 6p + 3q + 1 vertices from which we obtain an undirected
path graph G. The vertices of the tree, which are represented by sets, are
explained below.
For each triple mi E M there are six vertices depend only upon the triple
itself and not upon the elements within the triple:
{Ai, Bi, Oi, Di}
{Ai, Bi, Di, Pi}
{Ci,Di,Gi}
{Ai, Bi, Ei}
{Ai,Ei,Hi}
{Bi' E i , Ii} for 1 ~ i ~ p.
These six vertices form the subtree of i corresponding to mi, which is
illustrated in Figure 11. Next, there is a vertex for each element of W, X
and Y that depends upon the triples of M to which each respective element
belongs:
{14 } U {Ai : wr E mil for Wr E W,
{Ss} U {Bi : X, E mil for X, EX,
{Tt} U {Oi : Yt E mil for Yt E Y.
Finally, {Ai, Bi, Cj : 1 ~ i ~ p} is the last vertex of i. The arrangement
of these vertices in the tree i is shown in Figure 11. We then have an
undirected path graph G with vertex set
370 G.J. Ghang

forYt EY

tree T
formiEM
Figure 11: A transformation to an undirected path graph.

of size 9p + 3q, where the undirected path in 7 corresponding to a vertex v


of G consists of those vertices (sets) containing v in the tree 7.
We next claim that G has a dominating set of size 2p + q if and only if
the 3-dimensional matching problem has a solution.
Suppose D is a dominating set of G of size 2p+q. Observe that for any i,
the only way to dominate the vertex set {Ai, B i , Gi, Gi , D i , E i , Fi , Gi, Hi, Ii}
corresponding to mi with two vertices is to choose Di and Ei, and that
any larger dominating set might just as well consist of Ai, Bi and Gi, since
none of the other possible vertices dominate any vertex outside of the set.
Consequently, D consists of Ai, Bi and Gi for t mi's, and Di and Ei for p-t
other mi's, and at least max{3(q - t), O} Rr, 8 8 , Tt • Then,

2p + q = IDI ~ 3t + 2(P - t) + 3(q - t) = 2p + 3q - 2t

and so t ~ q. Picking q triples mi for which Ai, Bi and Gi are in D form a


matching M' of size q.
Conversely, suppose the 3-dimensional matching problem has a solutior
M' of size q. Let

D = {Ai,Bi,Gi: mi EM'} U {Di,Ei: mi E M \ M/}.


It is straightforward to check that D is a dominating set set of G of size
3q + 2(P - q) = 2p + q. •
Algorithmic Aspects of Domination in Graphs 371

5.3 Independent domination in chordal graphs


Farber [80] showed a surprising result that the independent domination prob-
lem is solvable by using a linear programming method. On the other hand,
an unpublished proof by the author is that the weighted independent dom-
ination problem is NP-complete. We adapt the proof of Theorem 5.2 for
such a proof, which is different from but has the same spirit as the original
proof.

Theorem 5.7 The weighted independent domination problem is NP-complete


for chordal graphs.

Proof. We transform the vertex cover problem to the weighted independent


domination problem in chordal graphs. Given a graph G = (V, E), construct
the following chordal graph G' = (V', E') with
vertex set V' = {v",v',v : v E V} U E
and
edge set E' = {v"v', v'v : v E V} U {ve : vEe} U {ele2 : el =1= e2 in
E}.
The weight of each e E E is 21V1 + 1 and the weight of each vertex in V' is
one.

a" a' a
a

d
G G'
Figure 12: A transformation to a weighted chordal graph.

If G has a vertex cover C of size at most k, then {v", v : v E C} U {v' :


v E V \ C} is an independent dominating set of G' with weight at most
IVI + k. On the other hand, suppose G' has an independent dominating set
D of weight at most IVI + k. As k :5 lVI, D contains no elements in E. Let
C = D n V. It is clear that C is a vertex cover of G. Also, for each v E V
372 G.J. Cbang

the set D contains exactly one vertex in {v", Vi}. Thus C is of size at most
k.
Since the vertex cover problem is NP-complete, the weighted indepen-
dent domination problem is NP-complete for chordal graphs. •
We are now on the way to present Farber's algorithm for the independent
domination problem in chordal graphs. Suppose now G = (V, E) is a chordal
graph with a perfect elimination ordering [Vb V2, ••• , vn ] and vertex weights
WI, W2, ••• ,Wn of real numbers. We write

i ..... j for Vi E N[Vj],


i<j for i ..... j and i :5 j,
i5-j for i ..... j and i ~ j.

It follows from the definition of a perfect elimination ordering that each

is a clique. Thus a set S of vertices is independent if and only if each OJ


contains at most one vertex of S. Also, S is a dominating set if and only
if for each j, S contains at least one vertex Vi with i ..... j. Consider the
following linear problem:
n
Pl(G,W): Minimize E WiXi,
i=1

subject to E Xi ~ 1 for each j,


i"'j

E Xi :5 1 for each j,
i>j
for each i.

It follows from the above comments that there is a one-to-one correspon-


dence between independent dominating sets of G and feasible 0-1 solutions
to PI (G, w). Moreover, an optimal 0-1 solution to PI (G, w) corresponds
to a minimum weighted independent dominating set in G. Notice that a
set of vertices of G is an independent dominating set in G if and only if it
is a maximal independent set in G. Consequently, there exist independent
dominating sets, and PI (G, w) is a feasible linear program. It will follow
from the algorithm presented below that if every vertex of G has a weight in
Algorithmic Aspects of Domination in Graphs 373

{1,0,-1,-2, ... } then Pl(G,W} has an optimal 0-1 solution, and that the
following dual program has an integer optimal solution:
n
Dl(G, w): Maximize L (Yj - Zj),
j=l

subject to L Yj - L Zj :::; Wi for each i,


j'Vi {<i
for each j.

If, however, not all vertex weights are in {I, 0, -1, - 2, ... } (even though
they are all integers), then it may be the case that neither P l (G, w) nor
Dl (G, w) has an integer solution. For the reminder of this section, we assume
that each vertex has a weight in {I, 0, -1, -2, ... }.
We first define two functions which simplify the presentation of the al-
gorithm. For each i, let

f(i) = Wi + LZj - LYj


j<i j'Vi

and
g(i} = Wi + LZj - LYj.
{<i P.i
Note that f(i) is the slack in the dual constraint associated with vertex Vi.
The algorithm to locate a minimum weighted dominating set in G has
two stages. Stage one finds a feasible solution to Dl (G, w) by scanning the
vertices in the order Vl, V2, ••• ,Vn ; and stage two uses this solution to find

V n , Vn-l, ..• ,Vl. Initially, each Yj = 0, each Zj = °


a feasible 0-1 solution to P l (G, w) by scanning the vertices in the order
and each Xi = 2. (The
interpretation of Xi = 2 is that Xi has not yet been assigned a value.) If,
at the time Vj is scanned in stage one, the dual constraint associated with
Vj is violated, i.e., if f(j) < 0, then we add just enough to Zj to bring
that constraint into feasible. Otherwise, we add as much as possible to
Yj without violating the dual constraint associated with Vj or with any
previously scanned vertex. In stage two, if Xi = 2 and g(i) = when Vi is °
°
scanned, then we let Xi = 1 and let Xj = for each Vj adjacent to Vi.
A formal description of the algorithm is given below.

Algorithm IndDomChordal. Determine 'Yi (G, w) for a chordal graph G


with weights Wl, W2, •.. , Wn in {I, 0, -1, -2, ... }.
374 G.J. Chang

Input. A chordal graph G with a perfect elimination ordering [Vl, V2,'" ,vn ]
and vertex weights WbW2, ... ,Wn in {1,0, -1, -2, ... }.
Output. Optimal solutions to Pl(G,W) and Dl(G,W).
Method.

°
each Yj t- 0, each Zj t- and each Xi t- 2;
Stage one: for j = 1 to n do
°
if f(j) < then Zj t- - f(j)
else Yj t- min{f(k) : k<j};

°
Stage two: for i = n to 1 by -1 do
if Xi = 2 and g(i) = then
{ Xi t- 1; for each Vj adjacent to Vi do Xj t- 0. }

We now establish the validity of the algorithm.

Theorem 5.8 Algorithm IndDomChordal finds a minimum weighted in-


dependent dominating set of a chordal graph with vertex weights are in
{1,0,-1,-2, ... }.

We first need several lemmas. Note that since the weights are integral,
all Xi, Yj, Zj, f(i), g(i) are integral at any time.

Lemma 5.9 For each j, f(j) ~


one.
° at all times after scanning Vj in stage

° °
Proof. The fact that f(j) ~ immediately after scanning Vj in stage one
follows from the choice of Zj and Yj. The fact that f(j) ~ after scanning
each Vk where k > j follows from the choice of Yk. •

Lemma 5.10 At the end of stage one, Yj ;::: °


for any j.

Proof. This is an immediate consequence of Lemma 5.9. •


Lemma 5.11 At the end of stage one, °:$ f(j) :$ g(j) for any j.
Proof. This follows from Lemmas 5.9 and 5.10. •
Lemma 5.12 At the end of stage one, for each i, there is a j<i such that
g(j) = O.
Algorithmic Aspects of Domination in Graphs 375

Proof. According to Lemma 5.11, g(i) ~ O. If g(i) = 0, then the lemma


is true as we may choose j = i. Suppose g(i) > O. Then f(i) was positive
immediately after scanning Vi, and so, by the choice of Zi and Vi, Zi = 0 and
Yi was chosen to force f(j) to be 0 for some j<:'i. Thus there is some j<:'i
such that

If Yk = 0 for each k such that k>j and k ~ i, then g(j) = O. Otherwise,


choose k such that Yk > 0, k>j and k ~ i. Then i '" k since i>j, k>j and
[Vb V2, •.. , vn ] is a perfect elimination ordering. Hence k<:'i since i ~ k and
i '" k. Since all vertex weights are integral, Yk is an integer. Thus Yk ~ Wi,
since Wi E {I, 0, -1, -2, ... }. Now, g(i) > 0, Yk ~ Wi and k<:'i, and hence
there is some i<:'i such that Zl > O. By the choice of Zl, we have g(i) = O. •
Note that the proof of Lemma 5.12 relies upon the fact that G has vertex
weights in {I, 0, -1, -2, ... } to show that if Yk > 0 then Yk ~ Wi.

Lemma 5.13 At the end of stage two, for each j, if g(j) = 0 then Xk = 1
for some k>j.

Proof. It is clear from the instructions in stage two that, for each i, if Xi = 1
at any time during the algorithm then Xi = 1 at the end of the algorithm.
Suppose g(j) = O. If Xj was 2 just prior to scanning Vj in stage two then Xj
was assigned the value of 1 when Vj was scanned, and hence Xj = 1 at the
end of the algorithm. In this case, we choose k = j as desired. Otherwise,
Xj was 0 prior to scanning Vj. In that case, Xj = 0 by virtue of the fact that
Xk = 1 for some previously scanned neighbor Vk of Vj, i.e., Xk = 1 for some
k>j. •
Proof of Theorem 5.S. It is easy to see that Algorithm IndDomChordal
halts after O(IVI + lEI) operations. We now show that the final values of
X1,X2,··., Xn and YI, Y2,··· ,Yn,Zb Z2,··· Zn are feasible solutions to P1(G, w)
and DI (G, w) respectively, and then verify that these solutions satisfy the
conditions of complementary slackness. It then follows that they are opti-
mal.
(i) Feasibility of dual solution: Clearly Zj ~ 0 for each j. By Lemmas
5.9 and 5.10, Yj ~ 0 and f(j) ~ 0 for each j.
(ii) Feasibility of primal solution: The instructions in stage two
guarantee that if Xi = Xj = 1 then ViVj ¢ E, and if Xi = 0 then Xj = 1
for some j '" i. Since a 0-1 primal solution is feasible if and only if the set
376 G.J. Chang

{Vj : Xj = I} is an independent dominating set in G, it suffices to show that


each Xi is either 0 or 1. According to Lemma 5.12, for each i, there is a
j<:'i such that g(j) = O. According to Lemma 5.13, there is a k>j such that
Xk = 1. Since i>j, k>j and [Vb V2,"" v n ] is a perfect elimination ordering,
it follows that i '" k. If i = k then Xi = 1; otherwise Xi = O.
(iii) Complementary slackness: If Xi > 0 then g(i) = 0 by the choice
of Xi, and hence f(i) = 0, by Lemma 5.11. Thus E Yj - E Zj = Wi.
j,..,i j<i
If Zj > 0 then g(j) = 0 by the choice of Zj, and so Xk = 1 for some k>j,
by Lemma 5.13. Hence E Xi ~ 1. Equality follows from the fact that the
i>j
primal solution is feasible.
Suppose Yj > 0 but Xi = Xk = 1 for i '" j and k '" j. Since Xi = Xk = 1,
g(i) = g(k) = 0, and so j ~ min{i,k} by Lemma 5.9, Lemma 5.11 and the
choice of Yj. Thus i>j and k>j, which imply i '" k since [Vb V2, ••• , vn ] is a
perfect elimination ordering. On the other hand, ViVk ¢ E since Xi = Xk = 1.
Consequently i = k. Hence, if Yj > 0 then E Xi :5 1. Equality follows from
i,..,j
the fact that the primal solution is feasible. •

6 Strongly chordal graphs


6.1 Strong elimination orderings of strongly chordal graphs
Strongly chordal graphs were introduced by several people [37, 81, 106] in
the study of domination. In particular, most variants of the domination
problem are solvable in this class of graphs. There are many equivalent
ways to define them. This section adapts the notation from Farber's paper
[81].
A graph G = (V, E) is strongly chordal if it admits a strong elimination
ordering which is an ordering [VI, V2, ... ,vn ] of V such that the following
two conditions hold.
(a) If i < j < k and ViVj, ViVk E E, then VjVk E E.
(b) If i < j < k < l and ViVk,ViVt,VjVk E E, then VjVt E E.
Notice that an ordering satisfying condition (a) is a perfect elimination
ordering. Hence, every strong elimination ordering is a perfect elimination
ordering, and every strongly chordal graph is chordal. We then also use the
notation i '" j for Vi E N[Vj].
Algorithmic Aspects of Domination in Graphs 377

The following equivalent definition of the strong elimination ordering is


more convenient in many arguments for strongly chordal graph.

Lemma 6.1 An ordering [VI,V2,'" ,vn ] of the vertices of G is a strong


elimination ordering of G if and only if

(SEO)

Proof. Suppose [VI, V2, ... ,vn ] is a strong elimination ordering of G. Sup-
pose that i :s; j, k :s; t, i '" k, i '" t and j '" k. If i = j or k = t or j = t,
then clearly j '" t. Suppose, on the other hand, that i < j, k < t and j =I- t.
By symmetry, we may assume that i :s; k. We consider three cases.
Case 1. Suppose j = k. Then i < j < t and ViVj, ViVl E E, whence
VjW E E, by (a) in the definition of a strong elimination ordering.
Case 2. Suppose j < k. Then i < j < k < t and ViVk, ViVl, VjVk E E,
whence VjV/. E E, by (b) in the definition of a strong elimination ordering.
Case 3. Suppose k < j. If i = k, then VkVl E E. Otherwise, i < k < t
and ViVk, ViW E E, whence VkW E E. Consequently, VkVl, VkVj E E and
either k < t < j or k < j < t. In either case, VjVi E E.
This complete the proof of necessity. The proof of sufficiency is trivial
and thus omitted. •

6.2 Weighted domination in strongly chordal graphs


There are quit a few algorithms have been designed for variants of the dom-
ination problem, see [31, 33, 37, 38, 81, 106, 124, 173]. This section presents
the linear algorithm, given by Fraber [81], for locating a minimum weighted
dominating set in a strongly chordal graph.
Let G = (V, E) be a strongly chordal graph with a strong elimination
ordering [VI, V2, ... ,vn ] and vertex weights WI, W2, ... , W n . According to
Lemma 2.1, we may assume that these vertex weights are nonnegative. Con-
sider the following linear problem:
n
P2(G, w): Minimize E WiXi,
i=I

subject to E Xi ~ 1 for each j,


i"'j

for each i.
378 G.J. Chang

By the definition, a set S of vertices of G is a dominating set if and only


if, for each j, S contains some vertex Vi such that i '" j. Consequently, there
is a one-to-one correspondence between feasible 0-1 solutions to P2(G,W)
and dominating sets in G. Moreover, an optimal 0-1 solution to P2(G,w)
corresponds to a minimum weighted dominating set in G. It follows from
the algorithm below that P2 (G, w) has an optimal 0-1 solution and that the
following dual program has an optimal solutions:
n
D2(G,W): Maximize E Yj,
j=1

subject to E Yj ~ Wi for each i,


jrvi

· >0
Y3 - for each j.
The algorithm presented below solves the linear programs P2 (G, w) and
D2 (G, w). To simplify the presentation of the algorithm, we define a function
h and a family of sets. For each i, let
h(i) = Wi - LYj and Ti = {j : i '" j and Yj > a}.
jrvi

Note that h(i) is the slack in the dual constraint associated with vertex Vi,
and Ti is the set of constraints in P2(G,W) containing Xi which must be at
equality to satisfy the conditions of complementary slackness.
The algorithm has two stages. Stage one finds a feasible solution to
D2 (G, w) by scanning the vertices in the order V1, V2, ... , Vn; and stage two
uses this solution to find a feasible 0-1 solution to P2 (G, w) by scanning
the vertices in the order Vn, Vn-l, ... ,V1. In this algorithm we utilize a set
T to assure that the conditions of complementary slackness are satisfied.
Initially, T = {1, 2, ... ,n}, each Yj = 0 and each Xi = O. When Vj is
scanned in stage one, we add as much as possible to to Yj without violating
the dual constraint. In stage two, if h(i) = 0 and Ti ~ T when Vi is scanned
then we let Xi = 1 and replace T by T \ 11. Otherwise Xi remains O. A more
formal description of the algorithm follows.

Algorithm WDomSC. Determine ,(G,w) for a strongly chordal graph G


with nonnegative vertex weights W1, W2, ... , wn •
Input. A strongly chordal graph G with a strong elimination ordering [V1'
V2, ... , vn] and nonnegative vertex weights W1, W2,"" wn.
Output. Optimal solutions to P2(G, w) and D2 (G, w).
Method.
Algorithmic Aspects of Domination in Graphs 379

T f-{1,2, ... ,n}, each Yj f- 0, and each Xi f- 0;


Stage one: for j = 1 to n do
Yj f- min{h(k) : k '" j};
Stage two: for i = n to 1 by -1 do
if h(i) = 0 and 11 ~ T then
{ Xi f- 1; T f- T \ Ti. }

We now verify the algorithm.

Theorem 6.2 Algorithm WDomSC finds a minimum weighted dominating


set of a strongly chordal graph with nonnegative vertex weights in linear time
providing a strong elimination ordering is given.

Proof. It is easy to see that the algorithm halts after O(IVI + lEI) opera-
tions. In order to show that the final values of Xb X2, ... ,Xn and Yb Y2, •.• ,Yn,
Zl, Z2, •.. Zn are optimal solutions to P2(G, w) and D2(G, w), respectively, it
suffices to show that these solutions are feasible and that they satisfy the
conditions of complementary slackness.
(i) Feasibility of dual solution: The instructions in stage one guar-
antee that h(i) ~ 0 for each i and Yj ~ 0 for each j.
(ii) Feasibility of primal solution: Clearly, each Xi is either 0 or 1.
Thus it suffices to show that for each j, Xi = 1 for some i '" j. By the choice
of Yj, there is a k '" j such that h(k) = 0 and maxTk :5 j. If Xk = 1 we are
done. Otherwise, by the algorithm, Tk was not contained in T when Vk was
scanned in stage two. Since, in stage two, the vertices are scanned in the
order Vn, Vn-b ... ,Vb there is some l > k such that Xl = 1 and Tl n Tk =1= 0.
Let i E TlnTk. Then i :5 j since maxTk :5 j. Thus i :5 j, k < l, i '" k, i '" l
and j '" k, whence l '" j by Lemma 6.1, since [Vb V2, ... , Vn] is a strong
elimination ordering. Hence l '" j and Xl = 1. Consequently, the primal
solution is feasible.
(iii) Complementary slackness: If Xi > 0, then Xl = 1 and so h(i) =
0, i.e., E Yj = Wi·
j",i
Suppose Yj > O. It is clear from the instructions that if Xi = Xk = 1,
then Ti n Tk = 0. Thus, E Xi :5 1. Equality follows from the feasibility of
i"'j
the primal solution. •
Farber in fact also gave an algorithm for the weighted independent domi-
nation problem with arbitrary real weights for strongly chordal graphs. The
380 G.J. Chang

approach is similar to that for chordal graphs (with restricted weights) in


Section 5.2. The only difference is that the function g(i) is replaced by a set
Si = {j : i '" j and Yj > O} which has a similar usage as 11 in this section.
The development is similar to that in Section 5.2 and thus omitted.

6.3 Domatic partition in strongly chordal graphs


Peng and Chang [150] gave an elegant algorithm for the domatic partition
problem in strongly chordal graphs.
Their algorithm uses a primal-dual approach. Suppose [Vb v2, ... ,vn ] is
a strong elimination ordering of G = (V, E) with the minimum degree 8(G).
Choose a vertex x of degree 8(G). As any dominating set Di in a domatic
partition of G contains at least one vertex Vi in N[x] and two distinct Di
have different corresponding Vi, we have the
Weak Duality Inequality: d(G) ~ 8(G).
Their algorithm maintains 8(G) + 1 disjoint sets. Initially, these sets are
empty. The algorithm scans the vertices in the reverse order of the strong
elimination ordering. A vertex is included in a set when it is scanned. When
the algorithm terminates, these 8(G) + 1 sets are dominating sets.
We say that a vertex V is completely dominated if V is dominated by all
of these 8(G) + 1 dominating sets.

Algorithm DomaticSC. Determine a domatic partition of a strongly chordal


graph G of size 8(G) + 1.
Input. A strongly chordal graph G = (V, E) with a strong elimination
ordering [Vb V2, ... , Vn ].
Output. A partition of V into 8(G) + 1 disjoint dominating sets of G.
Method.

Si +- 0 for 1 ~ i ~ 8( G) + 1;
for i = n to 1 step -1 do
find the largest k '" i such that Vk is not completely dominated;
let St. be a set that does not dominate Vk;
St. +- {Vi} U St.;
if no such set exists then include Vi to an arbitrary St.;
end do.

Before proving the correctness of the algorithm, we need two lemmas.


Algorithmic Aspects of Domination in Graphs 381

Lemma 6.3 Assume Si ~ {Vi+l, Vi+2, ... , vn } and k "" i, where 1 :5 i :5 n.


If Si does not dominate Vk, then Si does not dominate Vj for all j :5 k with
J ""~.

Proof. Suppose to the contrary that Si has a vertex vp dominates Vj, i.e.,
i < p and p j. Then i < p, j :5 k, i j, i "" k and p j imply p k by
f'V I"V f'V f'V

(SEO), which contradicts that Si does not dominate Vk. _


Let r(v) = I{x E N[v] : x is not in any of the 6(G) + 1 sets} I and
ndom( v) be the number of sets that do not dominate v during the execution
of Algorithm DomaticSC. Then we have

Lemma 6.4 Algorithm DomaticSG maintains the following invariant:

r(Vj) ~ ndom(vj) for allj E {1,2, ... ,n}.

Proof. We prove the lemma by induction. Initially,

for all Vj E V. During iteration i, only values of r(vj) and ndom(vj)' where
j "" i, may be altered when Vi is included in a set Si. Notice that the
algorithm determines the largest index k "" i such that Vk is not completely
dominated. It then finds a set Si that does not dominate Vk (Si is chosen
arbitrarily when Vk does not exist).
For any j i with j :5 k, by Lemma 6.3, Vj was not dominated by Si'
f'V

Therefore, r(Vj) and ndom(Vj) are decremented by one after Vi is included


in Si.
On the other hand, for any j "" i with j > k (or non-existence of such
Vk), by the choice of the vertex Vk in the algorithm, vertex Vj is completely
dominated, i.e., ndom(vj) = O. Thus the invariant is maintained. _

Theorem 6.5 Algorithm DomaticSG partitions the vertex set of a strongly


chordal graph G = (V, E) in to d( G) = 6(G) + 1 disjoint dominating sets in
linear time provided that a strong elimination ordering is given.

Proof. Upon termination ofthe algorithm, r(vj) = 0 for allj E {1, 2, ... ,n}.
According to Lemma 6.4, we have that ndom(vj) = 0 for all Vj in V. That
is, these 6(G) + 1 sets are dominating sets of G. The strong duality equality

d(G) = 6(G) +1
382 G.J. Chang

then follows from the weak duality inequality.


To implement that algorithm efficiently, each vertex Vi is associated with
a variable ndom(i) and an array L(i) of size 5(G) + 1. Initially, ndom(i) =
5(G) + 1 and the values of entries in Li are all zero. Thus, for each vertex
it takes O(di) time to test ndom(i) to determine Vk, where di is the degree
of Vi. It then takes 0(5(G) + 1) time to decide which set Vi should go.
Finally, for each Vj E N[Vi], it takes 0(1) time to update ndom(j) and Lj.
Therefore, the algorithm takes

o (t.(<1; H(G) +1) + <1;)) = O(IVI + lEI) time.


7 Permutation graphs
Given a permutation 1r = (1r(1), 1r(2), . .. ,1r(n)) on the set In = {I, 2, ... ,n},
the permutation graph of 1r is the graph G(1r) = (In, E(1r)) with

E(1r) = {jk : (j - k)(1r- 1 (j) - 1r- 1 (k)) < O}.


Note that 1r-l(j) is the position of j in the permutation 1r. Figure 13
illustrates a permutation and its corresponding permutation graph. If we
draw a line between each integer i and its position in 1r, we create n lines,
each with an associated integer. In this way, two vertices j and k are adjacent
in G(1r) if and only if their corresponding lines cross. That is, G(1r) is
the intersection graph of these n lines. Notice that an independent set in
G(1r) corresponds to an increasing subsequence of 1r, and a clique in G(1r)
corresponds to a decreasing subsequence of 1r.

1 2 345 6 7 8

1r = (2 3 6 4 1 8 7 5)

Figure 13: A permutation and its corresponding permutation graph.

Permutation graphs were first introduced by Pnueli, Lempel and Even


in 1971 [152, 77]. Since that time quite a few polynomial time algorithms
Algorithmic Aspects of Domination in Graphs 383

have been constructed on permutation graphs. For example, At allah , Man-


acher and Urrutia [6], Brandstadt and Kratsch [28], Farber and Keil [82]
and Tsai and Hsu [171] have constructed polynomial domination and inde-
pendent domination algorithms, Brandstadt and Kratsch [28] and Colbourn
and Stewart [54] have constructed polynomial connected domination algo-
rithms, and Brandstadt and Kratsch [28] and Corneil and Stewart [65] have
constructed polynomial total domination algorithms.
In this section we present a simple O{n 2 ) algorithm, due to Brandstadt
and Kratsch [29] for finding a minimum weighted independent dominating
set in a permutation graph. We assume that the defining permutation 7r
of the permutation graph is given as part of the input. Spinrad [165] has
shown that 7r can be constructed in O{n 2 ) time, given the graph G.
This algorithm takes the advantage of the observation that a set is an
independent dominating set if and only if it is a maximal independent set.
Since, as we have observed, maximal independent sets correspond in permu-
tation graphs to maximal increasing subsequences in 7r, all that is necessary
is to search for such a sequence in 7r of minimum weight. In particular,
we determine, for every j, 1 ~ j < n, the minimum weight 'Yi{j, w) of an
independent dominating set in the subsequence 7r(1), 7r(2), ... ,7r(j), which
contains 7r(j) as the rightmost element. We let w{j) denote the weight of
vertex j.

Algorithm WIndDomPer. Solve the weighted independent domination


problem for permutation graphs.
Input. A permutation graph G with its corresponding permutation 7r on
the set {I, 2, ... ,n} and vertex weights w{l), w(2), . .. ,w{n) ofreal numbers.
Output. The weighted independent domination number 'Yi{G, w) of G.
Method.

for j = 1 to n do
p{j) +- OJ
'Yi(j) = w{7r(j))j
for k = j - 1 to 1 step -1 do
if 7r(j) > 7r{k) and p{j) = 0 then
{ 'Yi(j) +- w{7r{j)) + 'Yi{k)j p(j) +- 7r{k)j }
else if 7r(j) > 7r{k) > p(j) > 0 and
'Yi{j) > w{7r{j)) + 'Yi{k) then
{ 'Yi(j) +- w{7r{j)) + 'Yi{k)j p(j) +- 7r{k)j }
384 G.J. Chang

end do;
end do;
m +- p(n);
'Yi(G, w) +- li(n);
for j = n - 1 to 1 step -1 do
if 7r(j) > m and li(G, w) > li(j) then
{ li(G, w) +- li(j); m +- 7r(j) };
end do.

We illustrate this algorithm with the permutation graph G(7r) in Figure


13, where 7r = (2 3 6 4 1 8 7 5) and all weights are equal to 1:
li(j) = (1 2 3 3 1 2 2 2), and thus the minimum size of an independent
dominating set is 2, for example the set {1,5}.

8 Co comparability graphs
A graph G = (V, E) is a comparability graph if G has an orientation H =
(V, F) such that xy, yz E F imply xz E F. In other words, G has a compa-
rability ordering which is an ordering [VI, V2,"" vn ] of V satisfying

A graph G = (V, E) is a cocomparability graph if its complement G is a


comparability graph, or equivalently, G has a cocomparability ordering which
is an ordering [VI,V2,'" ,vn ] of V satisfying

i < j < k and ViVk E E imply ViVj E E or VjVk E E. (CCO)

There is an O(n2 .376 )_time recognition algorithm for comparability graphs


and thus for co comparability graphs [165]. This has been improved by Mc-
Connell and Spinrad who gave an O( n + m )-time algorithm constructing an
orientation of any given graph G such that the orientation is a transitive
orientation of G if and only if G has a transitive orientation [139]. Unfor-
tunately, the best algorithm for testing whether the orientation is indeed
transitive has running time O(n 2.376 ).
The class of cocomparability graphs is a well studied superclass of the
classes of interval graphs, permutation graphs and trapezoid graphs. Dom-
ination problems on co comparability graphs were considered for the first
time by Kratsch and Stewart [127]. They obtained polynomial time al-
gorithms for the domination/total domination/connected domination and
Algorithmic Aspects of Domination in Graphs 385

the weighted independent domination problems in cocomparability graphs.


These algorithms are designed by dynamic programming using cocompa-
rability orderings. Breu and Kirkpatrick [30] (see [2]) improved this by
giving O(nm2 )-time algorithms for the domination and the total domina-
tion problems and an O(n2.376 )_time algorithm for the weighted independent
domination problem in co comparability graphs.
On the other hand, the weighted domination, total domination and con-
nected domination problems are NP-complete in co comparability graphs
[41]. Also, the problem 'Given a co comparability graph G, does G have a
dominating clique?' is NP-complete [127].
An O(n3 )-time algorithm computing a minimum cardinality connected
dominating set of a connected co comparability graph has been given in [127].
We illustrate the algorithm for this problem given by Breu and Kirkpatrick
[30] (see also [2]).
Let [VI, V2, ... ,vn ] be a co comparability ordering of a co comparability
graph G = (V, E). For vertices u, w E V we write u < w, if u appears
before w in the ordering, i.e., u = Vi and w = Vj implies i < j. For
i ~ j the set {Vk : i ~ k ~ j} is denoted by [Vi, Vj]. Then ViVj E E
implies that every vertex Vk with i < k < j is adjacent to Vi or to Vj, thus
{Vi, Vj} dominates [Vi, Vj]. This can be generalized as follows: Let S ~ V
where G[S] is connected. Then S dominates [min(S) , max(S)] where min(S)
(respectively, max(S)) is the vertex of S with the smallest (respectively,
largest) index in the ordering.
The following theorem and lemma are given in [127].

Theorem 8.1 Any connected cocomparability graph G has a minimum con-


nected dominating set S such that the induced subgraph G[S] is a chordless
path PbP2,'" ,Pk·

Lemma 8.2 Suppose S ~ V is a minimum connected dominating set of a


cocomparability graph G = (V, E) with a cocomparability ordering
[Vb V2,···, v n ]. If G[S] is a chordless path Pl,P2,'" ,Pk, then every ver-
tex x < min(S) is dominated by {Pl,P2} and every vertex y > max(S) is
dominated by {Pk-I,Pk}.

The following approach enables an elegant way of locating a chordless path of minimum size that dominates the cocomparability graph. A source vertex is a vertex v_i such that v_kv_i ∈ E for all k < i, and a sink vertex is a vertex v_j such that v_jv_k ∈ E for all k > j. Then [v_1, v_2, ..., v_n] is a canonical cocomparability ordering if v_1, v_2, ..., v_r, 1 ≤ r < n, are the source vertices and v_s, v_{s+1}, ..., v_n, 1 < s ≤ n, are the sink vertices. Note that every cocomparability graph G has a canonical cocomparability ordering. Furthermore, given any cocomparability ordering, a canonical one can be computed in time O(n + m).
From now on, we assume that [v_1, v_2, ..., v_n] is a canonical cocomparability ordering. Since the source vertices of G form a clique, any source vertex v_i dominates [v_1, v_i]. Analogously, since the sink vertices of G form a clique, any sink vertex v_j dominates [v_j, v_n]. Therefore the vertex set of every path between a source vertex and a sink vertex is dominating.
The following theorem, given in [30], highlights the key property.
Theorem 8.3 Every connected cocomparability graph G = (V, E) satisfying γ_c(G) ≥ 3 has a minimum connected dominating set which is the vertex set of a shortest path between a source and a sink vertex of G.
Proof. Let [v_1, v_2, ..., v_n] be a canonical cocomparability ordering of G. According to Theorem 8.1, there is a minimum connected dominating set S of G such that G[S] is a chordless path P : p_1, p_2, ..., p_k, k ≥ 3. We construct below a chordless path P'' between a source vertex and a sink vertex of G that has the same number of vertices as the path P.
Let p_1 = v_i and p_2 = v_j. First observe that p_2 = v_j cannot be a source vertex, since otherwise N[p_2] ⊇ [v_1, v_j], implying that {p_2, p_3, ..., p_k} is also a connected dominating set of G, a contradiction. If p_1 is a source vertex, then P starts at a source vertex. In this case we proceed with the path P' = P, (possibly) rearranging p_{k-1}, p_k.
Suppose p_1 = v_i is not a source vertex. Then there is a source vertex u of G with up_1 ∉ E. Since [v_1, v_2, ..., v_n] is a canonical cocomparability ordering and since p_1 and p_2 are not source vertices, we get u < p_1 and u < p_2. Since v_iv_j ∈ E and by Lemma 8.2, {v_i, v_j} dominates [v_1, max{v_i, v_j}]. Consequently up_2 ∈ E.
Consider the set S' = {u, p_2, ..., p_k}. Since {u, p_2} dominates [v_1, v_j], S' is a dominating set. Since P : p_1, p_2, ..., p_k is a chordless path, t ≥ 3 implies p_tu ∉ E. Thus S' induces the chordless path P' : u, p_2, ..., p_k.
Similarly, starting from P', the vertex p_k can be replaced, if necessary. Vertex p_{k-1} is not a sink vertex. If p_k is a sink vertex, then S'' = S' and P'' = P'. Otherwise we replace p_k by a sink vertex v satisfying vp_k ∉ E to obtain S'' and P''.
G[S''] is a chordless path between a source and a sink vertex. The vertex set of any path between a source and a sink vertex is a dominating set. By construction S'' is a minimum connected dominating set. Consequently S'' is the vertex set of a shortest path between a source vertex and a sink vertex of G. •
According to Theorem 8.3, when γ_c(G) ≥ 3, computing a minimum connected dominating set of a connected cocomparability graph G reduces to computing a shortest path between a source and a sink vertex of G.

Algorithm ConDomCC. Solve the connected domination problem in cocomparability graphs.
Input: A connected cocomparability graph G = (V, E) and a canonical cocomparability ordering [v_1, v_2, ..., v_n] of G.
Output: A minimum connected dominating set of G.
Method.

Step 1. Check whether G has a minimum connected dominating set D of size at most 2. If so, output D and stop.
Step 2. Construct a new graph G' by adding two new vertices s and t to G such that s is adjacent exactly to the source vertices of G and t is adjacent exactly to the sink vertices of G.
Step 3. Compute a shortest path P : s, p_1, p_2, ..., p_k, t between s and t in G' by breadth-first search.
Step 4. Output {p_1, p_2, ..., p_k}.
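
To make the steps concrete, the following is a minimal Python sketch of Algorithm ConDomCC (our own illustration, not code from [30] or [2]). It assumes the graph is given as a dictionary mapping each vertex to the set of its neighbors and the canonical cocomparability ordering as a list; all helper names are ours. Step 1 is done by brute force, which is exactly the part that dominates the O(nm) running time discussed below, and the auxiliary vertices s and t of Steps 2 and 3 are simulated by a multi-source breadth-first search.

from collections import deque

def con_dom_cc(adj, order):
    # Algorithm ConDomCC. adj: vertex -> set of neighbors;
    # order: canonical cocomparability ordering [v_1, ..., v_n].
    V = set(adj)
    # Step 1: brute-force test for a connected dominating set of size <= 2.
    for u in order:
        if adj[u] | {u} >= V:
            return {u}
    for u in order:
        for w in adj[u]:            # u, w adjacent, so G[{u, w}] is connected
            if adj[u] | adj[w] | {u, w} >= V:
                return {u, w}
    pos = {v: i for i, v in enumerate(order)}
    sources = [v for v in order
               if all(order[k] in adj[v] for k in range(pos[v]))]
    sinks = {v for v in order
             if all(order[k] in adj[v] for k in range(pos[v] + 1, len(order)))}
    # Steps 2 and 3: BFS from an implicit super-source s to a super-sink t.
    parent = {v: None for v in sources}    # None marks "reached from s"
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        if v in sinks:                     # first sink dequeued: shortest path
            path = set()                   # Step 4: interior vertices only
            while v is not None:
                path.add(v)
                v = parent[v]
            return path
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    raise ValueError("the input graph is not connected")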

The correctness of Algorithm ConDomCC follows immediately from Theorem 8.3. The 'almost linear' running time of the algorithm follows from the well-known fact that breadth-first search is a linear-time procedure.

Theorem 8.4 For any connected cocomparability graph G = (V, E) with a canonical cocomparability ordering [v_1, v_2, ..., v_n], Algorithm ConDomCC outputs a minimum connected dominating set of G in O(nm) time. In fact, all parts of the algorithm except checking for a connected dominating set of size two can be done in time O(n + m).

It is clearly unsatisfactory that the straightforward test for a connected dominating set of size two dominates the overall running time. The crux is that there are even permutation graphs for which each minimum connected dominating set of size two contains neither a source nor a sink vertex (see [112, 122]). It seems that minimum dominating sets of this type cannot be found by a shortest path approach. It is an open question whether Step 1 of Algorithm ConDomCC can be implemented in a more efficient way.
We mention that the O(n) algorithms computing a minimum connected dominating set for permutation graphs [112] and trapezoid graphs [122] both rely on Theorem 8.3.
Corneil, Olariu and Stewart have done a lot of research on asteroidal triple-free graphs, usually called AT-free graphs [58, 59]. They are defined as those graphs not containing an asteroidal triple, i.e., a set of three vertices such that between any two of the vertices there is a path avoiding the neighborhood of the third.
AT-free graphs form a superclass of the cocomparability graphs. They are a 'large class of graphs' with nice structural properties, some of which are related to domination. One of the major theorems on the structure of AT-free graphs states that every connected AT-free graph has a dominating pair, i.e., a pair of vertices u, v such that the vertex set of each path between u and v is a dominating set.
An O(n + m) algorithm computing a dominating pair for a given connected AT-free graph has been presented in [59]. This can be used to obtain an O(n + m) algorithm computing a dominating path for connected AT-free graphs (see also [62]). An O(n^3) algorithm computing a minimum connected dominating set for connected AT-free graphs is given in [7]. An O(n + m) algorithm computing a minimum connected dominating set in connected AT-free graphs with diameter greater than three is given in [59].

9 Distance-hereditary graphs
9.1 Hangings of distance-hereditary graphs
A graph is distance-hereditary if every two vertices have the same distance in every connected induced subgraph. Distance-hereditary graphs were introduced by Howorka [93]. The characterization and recognition of distance-hereditary graphs have been studied in [8, 67, 68, 92, 93]. Distance-hereditary graphs are parity graphs [32] and contain all cographs [57, 64].
The hanging h_u of a connected graph G = (V, E) at a vertex u ∈ V is the collection of sets L_0(u), L_1(u), ..., L_t(u) (or L_0, L_1, ..., L_t if there is no ambiguity), where t = max_{v∈V} d_G(u, v) and

L_i(u) = {v ∈ V : d_G(u, v) = i}

for 0 ≤ i ≤ t. For any 1 ≤ i ≤ t and any vertex v ∈ L_i, let N'(v) = N(v) ∩ L_{i-1}. A vertex v ∈ L_i with 1 ≤ i ≤ t is said to have a minimal neighborhood in L_{i-1} if N'(x) is not a proper subset of N'(v) for any x ∈ L_i. Such a vertex v certainly exists.
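
Computing a hanging amounts to a breadth-first search from u. The following minimal Python sketch is our own illustration (the function names are ours), assuming the graph is given as a dictionary mapping each vertex to the set of its neighbors.

from collections import deque

def hanging(adj, u):
    # Hanging h_u = (L_0, L_1, ..., L_t): L_i is the set of vertices
    # at distance i from u, collected level by level during the BFS.
    dist = {u: 0}
    levels = [{u}]
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                if dist[w] == len(levels):
                    levels.append(set())
                levels[dist[w]].add(w)
                queue.append(w)
    return levels

def lower_neighborhood(adj, levels, v, i):
    # N'(v) = N(v) ∩ L_{i-1} for a vertex v in level L_i, i >= 1.
    return adj[v] & levels[i - 1]

With the levels at hand, a vertex v ∈ L_i has a minimal neighborhood in L_{i-1} exactly when no x ∈ L_i satisfies N'(x) ⊊ N'(v).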

Theorem 9.1 [8, 67, 68] A connected graph G = (V, E) is distance-hereditary if and only if, for every hanging h_u = (L_0, L_1, ..., L_t) of G and every pair of vertices x, y ∈ L_i (1 ≤ i ≤ t) that are in the same component of G - L_{i-1}, we have N'(x) = N'(y).

Theorem 9.2 [8] Suppose h_u = (L_0, L_1, ..., L_t) is a hanging of a connected distance-hereditary graph at u. For any two vertices x, y ∈ L_i with i ≥ 1, N'(x) ∩ N'(y) = ∅ or N'(x) ⊆ N'(y) or N'(x) ⊇ N'(y).

Theorem 9.3 (Fact 3.4 in [92]) Suppose h_u = (L_0, L_1, ..., L_t) is a hanging of a connected distance-hereditary graph at u. If vertex v ∈ L_i with 1 ≤ i ≤ t has a minimal neighborhood in L_{i-1}, then N_{V\N'(v)}(x) = N_{V\N'(v)}(y) for every pair of vertices x and y in N'(v).

9.2 Weighted connected domination

D'Atri and Moscarini [67] gave O(|V||E|) algorithms for the connected domination and Steiner tree problems in distance-hereditary graphs. Brandstadt and Dragan [26] presented a linear-time algorithm for the connected r-domination and Steiner tree problems in distance-hereditary graphs.
In this section, we present a linear-time algorithm given by Yeh and Chang [179] for finding a minimum weighted connected dominating set of a connected distance-hereditary graph G = (V, E) in which each vertex v has a weight w(v) that is a real number. According to Lemma 2.1, we may assume that the vertex weights are nonnegative.

Lemma 9.4 Suppose h_u = (L_0, L_1, ..., L_t) is a hanging of a connected distance-hereditary graph at u. For any connected dominating set D and v ∈ L_i with 2 ≤ i ≤ t, D ∩ N'(v) ≠ ∅.

Proof. Choose a vertex y in D that dominates v. Then y ∈ L_{i-1} ∪ L_i ∪ L_{i+1}. If y ∈ L_{i-1}, then y ∈ D ∩ N'(v). So we may assume that y ∈ L_i ∪ L_{i+1}. Choose a vertex x ∈ D ∩ (L_0 ∪ L_1) and an x-y path

x = v_0, v_1, ..., v_m = y

using vertices only in D. Let j be the smallest index such that {v_j, v_{j+1}, ..., v_m} ⊆ L_i ∪ L_{i+1} ∪ ... ∪ L_t. Then v_j ∈ L_i, v_{j-1} ∈ N'(v_j), and v and v_j are in the same component of G - L_{i-1}. By Theorem 9.1, N'(v) = N'(v_j) and so v_{j-1} ∈ D ∩ N'(v). In any case, D ∩ N'(v) ≠ ∅. •

Theorem 9.5 Suppose G = (V, E) is a connected distance-hereditary graph with a non-negative weight function w on its vertices. Let h_u = (L_0, L_1, ..., L_t) be a hanging at a vertex u of minimum weight. Consider the set A = {N'(v) : v ∈ L_i with 2 ≤ i ≤ t and v has a minimal neighborhood in L_{i-1}}. For each N'(v) in A, choose one vertex v* in N'(v) of minimum weight, and let D be the set of all such v*. Then D or D ∪ {u} or some {v} with v ∈ V is a minimum weighted connected dominating set of G.

Proof. For any x ∈ L_i with 2 ≤ i ≤ t, by Theorem 9.2, N'(x) includes some N'(v) in A. Thus we have Claim 1.
Claim 1. For any x ∈ L_i with 2 ≤ i ≤ t, x ∈ N[L_{i-1} ∩ D].
Claim 2. D ∪ {u} is a connected dominating set of G.
Proof of Claim 2. By Claim 1 and N[u] = L_1 ∪ {u}, D ∪ {u} is a dominating set of G. Also, by Claim 1, for any vertex x in D ∪ {u} there exists an x-u path using vertices only in D ∪ {u}, i.e., G[D ∪ {u}] is connected. •
Suppose M is a minimum weighted connected dominating set of G. According to Lemma 9.4, M ∩ N'(v) ≠ ∅ for each N'(v) ∈ A, say v** ∈ M ∩ N'(v). Since any two sets in A are disjoint, we have |M| ≥ |A| = |D|.
Case 1. |M| = 1. The theorem is obvious in this case.
Case 2. |M| > |D|. In this case, there is at least one vertex x in M that is not a v**. Then

w(M) ≥ Σ_{v**} w(v**) + w(x) ≥ Σ_{v*} w(v*) + w(u) = w(D ∪ {u}).

This together with Claim 2 gives that D ∪ {u} is a minimum weighted connected dominating set of G.
Case 3. |M| = |D| ≥ 2. Since A contains pairwise disjoint sets, we have M = {v** : N'(v) ∈ A}. Then w(M) = Σ_{v**} w(v**) ≥ Σ_{v*} w(v*) = w(D).
For any two vertices x* and y* in D, x** and y** are in M. Since G[M] is connected, there is an x**-y** path in G[M]:

x** = v_0**, v_1**, ..., v_n** = y**.

For any 1 ≤ i ≤ n, since v_i* and v_i** are both in N'(v_i) ∈ A, by Theorem 9.3, N_{V\N'(v_i)}(v_i*) = N_{V\N'(v_i)}(v_i**). But v_{i-1}** ∈ N_{V\N'(v_i)}(v_i**). Therefore v_{i-1}** ∈ N_{V\N'(v_i)}(v_i*) and v_i* ∈ N_{V\N'(v_{i-1})}(v_{i-1}**). Also, that v_{i-1}* and v_{i-1}** are both in N'(v_{i-1}) ∈ A implies that N_{V\N'(v_{i-1})}(v_{i-1}*) = N_{V\N'(v_{i-1})}(v_{i-1}**). Then v_i* ∈ N_{V\N'(v_{i-1})}(v_{i-1}*). This proves that v_{i-1}* is adjacent to v_i* for 1 ≤ i ≤ n, and then

x* = v_0*, v_1*, ..., v_n* = y*

is an x*-y* path in G[D], i.e., G[D] is connected.
For any x in V, since M is a dominating set, x ∈ N[v**] for some N'(v) ∈ A. Note that v** and v* are both in N'(v). According to Theorem 9.3, N_{V\N'(v)}(v**) = N_{V\N'(v)}(v*). In the case of x ∉ N'(v), x ∈ N[v**] implies x ∈ N[v*], i.e., D dominates x. In the case of x ∈ N'(v), N_{V\N'(v)}(v*) = N_{V\N'(v)}(x). Since G[D] is connected and |D| ≥ 2, v* is adjacent to some y* ∈ D \ N'(v). Then x is also adjacent to y*, i.e., D dominates x. In any case, D is a dominating set. Therefore D is a minimum weighted connected dominating set of G. •
By Lemma 2.1 and Theorem 9.5, we can design an efficient algorithm for the weighted connected domination problem in distance-hereditary graphs. To implement the algorithm efficiently, we do not actually find the set A. Instead, we perform the following step for each 2 ≤ i ≤ t. Sort the vertices in L_i = {x_1, x_2, ..., x_j} such that

|N'(x_1)| ≤ |N'(x_2)| ≤ ... ≤ |N'(x_j)|.

We then process N'(x_k) for k from 1 to j. At iteration k, if N'(x_k) ∩ D = ∅, then N'(x_k) is in A and we choose a vertex of minimum weight in N'(x_k) to put into D; otherwise N'(x_k) ∉ A and we do nothing.

Algorithm WConDomDH. Find a minimum weighted connected dominating set of a connected distance-hereditary graph.
Input: A connected distance-hereditary graph G = (V, E) and a real weight w(v) for each v ∈ V.
Output: A minimum weighted connected dominating set D of the graph G.
Method.

D ← ∅;
let V' = {v ∈ V : w(v) < 0};
w(v) ← 0 for each v ∈ V';
let u be a vertex of minimum weight in V;
determine the hanging h_u = (L_0, L_1, ..., L_t) of G at u;
for i = 2 to t do
    let L_i = {x_1, x_2, ..., x_j};
    sort L_i such that |N'(x_1)| ≤ |N'(x_2)| ≤ ... ≤ |N'(x_j)|;
    for k = 1 to j do
        if N'(x_k) ∩ D = ∅ then D ← D ∪ {y}, where y is a vertex
            of minimum weight in N'(x_k);
end do;
if not (L_1 ⊆ N[D] and G[D] is connected) then D ← D ∪ {u};
for each v ∈ V that dominates V do: if w(v) < w(D) then D ← {v};
D ← D ∪ V'.
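
The following minimal Python sketch of Algorithm WConDomDH is our own illustration, not the authors' implementation. It assumes the same adjacency-dictionary representation as above; it uses a comparison sort instead of the bucket sort mentioned in Theorem 9.6 below and checks the two corrective steps by brute force, so it does not achieve the linear running time.

from collections import deque

def wcondom_dh(adj, weight):
    # Algorithm WConDomDH on a connected distance-hereditary graph.
    # adj: vertex -> set of neighbors; weight: vertex -> real weight.
    V = set(adj)
    neg = {v for v in V if weight[v] < 0}          # the set V'
    w = {v: max(weight[v], 0.0) for v in V}        # clamp negative weights
    u = min(V, key=lambda v: w[v])                 # vertex of minimum weight
    dist, levels = {u: 0}, [{u}]                   # hanging h_u via BFS
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                if dist[y] == len(levels):
                    levels.append(set())
                levels[dist[y]].add(y)
                queue.append(y)
    D = set()
    for i in range(2, len(levels)):
        lower = {x: adj[x] & levels[i - 1] for x in levels[i]}   # N'(x)
        for x in sorted(levels[i], key=lambda x: len(lower[x])):
            if not lower[x] & D:
                D.add(min(lower[x], key=lambda y: w[y]))
    L1 = levels[1] if len(levels) > 1 else set()
    closed_D = D.union(*(adj[d] for d in D))       # N[D]
    if not (L1 <= closed_D and connected(adj, D)):
        D.add(u)
    for v in V:                                    # single-vertex candidates
        if adj[v] | {v} >= V and w[v] < sum(w[d] for d in D):
            D = {v}
    return D | neg

def connected(adj, S):
    # Is the induced subgraph G[S] connected? (Trivially true if |S| <= 1.)
    if len(S) <= 1:
        return True
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for y in adj[v] & S:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == S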

Theorem 9.6 Algorithm WConDomDH gives a minimum weighted connected dominating set of a connected distance-hereditary graph in linear time.

Proof. The correctness of the algorithm follows from Lemma 2.1 and Theorem 9.5. For each i, we can sort L_i by using a bucket sort. Then the algorithm runs in O(|V| + |E|) time. •

References
[1] S. Arnborg and A. Proskurowski. Linear time algorithms for NP-hard
problems restricted to partial K-trees. Discrete Appl. Math., 23:11-24,
1989.
[2] K. Arvind, H. Breu, M. S. Chang, D. G. Kirkpatrick, F. Y. Lee, Y. D.
Liang, K. Madhukar, C. Pandu Rangan and A. Srinivasan. Efficient
algorithms in cocomparability and trapezoid graphs. Submitted, 1996.
[3] K. Arvind and C. Pandu Rangan. Connected domination and Steiner
set on weighted permutation graphs. Inform. Process. Lett., 41:215-
220, 1992.
[4] T. Asano. Dynamic programming on intervals. Internat. J. Comput.
Geom. Appl., 3:323-330, 1993.
[5] M. J. Atallah and S. R. Kosaraju. An efficient algorithm for maxdom-
inance, with applications. Algorithmica, 4:221-236, 1989.
[6] M. J. Atallah, G. K. Manacher and J. Urrutia. Finding a minimum
independent dominating set in a permutation graph. Discrete Appl.
Math., 21:177-183, 1988.
[7] H. Balakrishnan, A. Rajaraman and C. Pandu Rangan. Connected
domination and Steiner set on asteroidal triple-free graphs. In F.
Dehne, J. R. Sack, N. Santoro and S. Whitesides, editors, Proc. Work-
shop on Algorithms and Data Structures (WADS'93), volume 709,
pages 131-141, Montreal, Canada, 1993. Springer-Verlag, Berlin.
[8] H. J. Bandelt and H. M. Mulder. Distance-hereditary graphs. J. Comb.
Theory, Series B, 41:182-208, 1986.
[9] D. W. Bange, A. E. Barkauskas and P. J. Slater. Efficient dominating
sets in graphs. In R. D. Ringeisen and F. S. Roberts, editors, Appli-
cations of Discrete Mathematics, pages 189-199. SIAM, Philadelphia,
PA, 1988.
[10] A. E. Barkauskas and L. H. Host. Finding efficient dominating sets in
oriented graphs. Congr. Numer., 98:27-32, 1993.
[11] R. E. Bellman and S. E. Dreyfus. Applied Dynamic Programming.
Princeton University Press, 1962.
[12] P. J. Bernhard, S. T. Hedetniemi and D. P. Jacobs. Efficient sets in
graphs. Discrete Appl. Math., 44:99-108, 1993.
[13] P. Bertolazzi and A. Sassano. A class of polynomially solvable set
covering problems. SIAM J. Discrete Math., 1:306-316, 1988.
[14] A. A. Bertossi. Dominating sets for split and bipartite graphs. Inform.
Process. Lett., 19:37-40, 1984.
[15] A. A. Bertossi. Total domination in interval graphs. Inform. Process.
Lett., 23:131-134, 1986.
[16] A. A. Bertossi. On the domatic number of interval graphs. Inform.
Process. Lett., 28:275-280, 1988.
[17] A. A. Bertossi and A. Gori. Total domination and irredundance in
weighted interval graphs. SIAM J. Discrete Math., 1:317-327, 1988.
[18] T. A. Beyer, A. Proskurowski, S. T. Hedetniemi and S. Mitchell. In-
dependent domination in trees. Congr. Numer., 19:321-328, 1977.
[19] N. L. Biggs, E. K. Lloyd and R. J. Wilson. Graph Theory 1736-1936.
Clarendon Press, Oxford, 1986.
[20] M. A. Bonuccelli. Dominating sets and domatic number of circular
arc graphs. Discrete Appl. Math., 12:203-213, 1985.
[21] K. S. Booth and J. H. Johnson. Dominating sets in chordal graphs.
SIAM J. Comput., 11:191-199, 1982.
[22] A. Brandstadt. The computational complexity of feedback vertex set,
Hamiltonian circuit, dominating set, Steiner tree and bandwidth on
special perfect graphs. J. Inform. Process. Cybernet., 23:471-477,
1987.
[23] A. Brandstadt and H. Behrendt. Domination and the use of maximum
neighbourhoods. Technical Report SM-DU-204, Univ. Duisburg, 1992.
[24] A. Brandstadt, V. D. Chepoi and F. F. Dragan. Clique r-domination
and clique r-packing problems on dually chordal graphs. Technical
Report SM-DU-251, Univ. Duisburg, 1994.
[25] A. Brandstadt, V. D. Chepoi and F. F. Dragan. The algorithmic use
of hypertree structure and maximum neighbourhood orderings. In
E. W. Mayr, G. Schmidt and G. Tinhofer, editors, Lecture Notes in
Comput. Sci., 20th Internat. Workshop Graph-Theoretic Concepts in
Computer Science (WG '94), volume 903, pages 65-80, Berlin, 1995.
Springer-Verlag.
[26] A. Brandstadt and F. F. Dragan. A linear-time algorithm for con-
nected r-domination and Steiner tree on distance-hereditary graphs.
Technical Report SM-DU-261, Univ. Duisburg, 1994.
[27] A. Brandstadt, F. F. Dragan, V. D. Chepoi and V. I. Voloshin. Du-
ally chordal graphs. In Lecture Notes in Comput. Sci., 19th Internat.
Workshop Graph-Theoretic Concepts in Computer Science (WG'93),
volume 790, pages 237-251, Berlin, 1993. Springer-Verlag.
[28] A. Brandstadt and D. Kratsch. On the restriction of some NP-
complete graph problems to permutation graphs. In L. Budach, editor,
Lecture Notes in Comput. Sci., Proc. FCT'85, volume 199, pages 53-
62, Berlin, 1985. Springer-Verlag.
[29] A. Brandstadt and D. Kratsch. On domination problems on permu-
tation and other graphs. Theoret. Comput. Sci., 54:181-198, 1987.
[30] H. Breu and D. G. Kirkpatrick. Algorithms for dominating and Steiner
set problems in cocomparability graphs. Manuscript, 1996.
[31] M. W. Broin and T. J. Lowe. A dynamic programming algorithm for
covering problems with (greedy) totally balanced constraint matrices.
SIAM J. Algebraic and Discrete Methods, 7:348-357, 1986.
[32] M. Burlet and J. P. Uhry. Parity graphs. Annals of Discrete Math.,
21:253-277, 1984.
[33] G. J. Chang. Labeling algorithms for domination problems in sun-free
chordal graphs. Discrete Appl. Math., 22:21-34, 1988/89.
[34] G. J. Chang. Total domination in block graphs. Oper. Res. Lett.,
8:53-57, 1989.
[35] G. J. Chang, M. Farber and Z. Tuza. Algorithmic aspects of neigh-
borhood numbers. SIAM J. Discrete Math., 6:24-29, 1993.
[36] G. J. Chang and G. L. Nemhauser. R-domination of block graphs.
Oper. Res. Lett., 1:214-218, 1982.
[37] G. J. Chang and G. L. Nemhauser. The k-domination and k-stability
on sun-free chordal graphs. SIAM J. Algebraic Discrete Methods,
5:332-345, 1984.
[38] G. J. Chang and G. L. Nemhauser. Covering, packing and generalized
perfection. SIAM J. Algebraic Discrete Methods, 6:109-132, 1985.
[39] G. J. Chang, C. Pandu Rangan and S. R. Coorg. Weighted indepen-
dent perfect domination on cocomparability graphs. Discrete Appl.
Math., 63:215-222, 1995.
[40] M. S. Chang. Efficient algorithms for the domination problems on
interval graphs and circular-arc graphs. In IFIP Transactions A-12,
Proc. IFIP 12th World Congress, volume 1, pages 402-408, 1992.
[41] M. S. Chang. Weighted domination on cocomparability graphs. In
Lecture Notes in Comput. Sci., Proc. ISAAC'95, volume 1004, pages
121-131, Berlin, 1995. Springer-Verlag.
[42] M. S. Chang, F. H. Hsing and S. L. Peng. Irredundance in weighted
interval graphs. In Proc. National Computer Symp., pages 128-137,
Taipei, Taiwan, 1993.
[43] M. S. Chang and Y. C. Liu. Polynomial algorithms for the weighted
perfect domination problems on chordal and split graphs. Inform.
Process. Lett., 48:205-210, 1993.
[44] M. S. Chang and Y. C. Liu. Polynomial algorithms for weighted perfect
domination problems on interval and circular-arc graphs. J. Inform.
Sci. Engineering, 10:549-568, 1994.
[45] M. S. Chang, S. Wu, G. J. Chang and H. G. Yeh. Domination in
distance-hereditary graphs. Submitted, 1996.
[46] G. A. Cheston, G. H. Fricke, S. T. Hedetniemi and D. P. Jacobs. On
the computational complexity of upper fractional domination. Discrete
Appl. Math., 27:195-207, 1990.
[47] E. J. Cockayne, S. E. Goodman and S. T. Hedetniemi. A linear al-
gorithm for the domination number of a tree. Inform. Process. Lett.,
4:41-44, 1975.
[48] E. J. Cockayne, B. L. Hartnell, S. T. Hedetniemi and R. Laskar. Perfect
domination in graphs. J. Combin. Inform. System Sci., 18:136-148,
1993.
[49] E. J. Cockayne and S. T. Hedetniemi. Optimal domination in graphs.
IEEE Trans. Circuits and Systems, 22:855-857, 1975.
[50] E. J. Cockayne and S. T. Hedetniemi. A linear algorithm for the
maximum weight of an independent set in a tree. In Proc. Seventh
Southeastern Conf. on Combinatorics, Graph Theory and Computing,
pages 217-228, Winnipeg, 1976. Utilitas Math.
[51] E. J. Cockayne, G. MacGillivray and C. M. Mynhardt. A linear al-
gorithm for 0-1 universal minimal dominating functions of trees. J.
Combin. Math. Combin. Comput., 10:23-31, 1991.
[52] E. J. Cockayne and F. D. K. Roberts. Computation of dominating
partitions. INFOR., 15:94-106, 1977.
[53] C. J. Colbourn, J. M. Keil and L. K. Stewart. Finding minimum
dominating cycles in permutation graphs. Oper. Res. Lett., 4:13-17,
1985.
[54] C. J. Colbourn and L. K. Stewart. Permutation graphs: connected
domination and Steiner trees. Discrete Math., 86:179-189, 1990.
[55] D. G. Corneil. The complexity of generalized clique packing. Discrete
Appl. Math., 12:233-239, 1985.
[56] D. G. Corneil and J. M. Keil. A dynamic programming approach to
the dominating set problem on k-trees. SIAM J. Algebraic Discrete
Methods, 8:535-543, 1987.
[57] D. G. Corneil, H. Lerchs and L. Stewart. Complement reducible
graphs. Discrete Appl. Math., 3:163-174, 1981.
[58] D. G. Corneil, S. Olariu and L. Stewart. Asteroidal triple-free graphs.
SIAM J. Discrete Math. To appear.
[59] D. G. Corneil, S. Olariu and L. Stewart. Linear time algorithms for
dominating pairs in asteroidal triple-free graphs. SIAM J. Comput.
To appear.
[60] D. G. Corneil, S. Olariu and L. Stewart. Computing a dominating
pair in an asteroidal triple-free graph in linear time. In Proc. 4th
Algorithms and Data Structures Workshop, LNCS 955, volume 955,
pages 358-368. Springer, 1995.
[61] D. G. Corneil, S. Olariu and L. Stewart. A linear time algorithm to
compute dominating pairs in asteroidal triple-free graphs. In Lecture
Notes in Comput. Sci., Proc. 22nd Intemat. Colloq. on Automata,
Languages and Programming (ICALP'95), volume 994, pages 292-302,
Berlin, 1995. Springer-Verlag.
[62] D. G. Corneil, S. Olariu and L. Stewart. A linear time algorithm to
compute a dominating path in an AT-free graph. Inform. Process.
Lett., 54:253-258, 1995.
[63] D. G. Corneil and Y. Perl. Clustering and domination in perfect
graphs. Discrete Appl. Math., 9:27-39, 1984.
[64] D. G. Corneil, Y. Perl and L. Stewart Burlingham. A linear recognition
algorithm for cographs. SIAM J. Comput., 14:926-934, 1985.
[65] D. G. Corneil and L. K. Stewart. Dominating sets in perfect graphs.
Discrete Math., 86:145-164, 1990.
[66] P. Damaschke, H. Müller and D. Kratsch. Domination in convex and
chordal bipartite graphs. Inform. Process. Lett., 36:231-236, 1990.
[67] A. D'Atri and M. Moscarini. Distance-hereditary graphs, Steiner trees,
and connected domination. SIAM J. Comput., 17:521-538, 1988.
[68] D. P. Day, O. R. Oellermann and H. C. Swart. Steiner distance-
hereditary graphs. SIAM J. Discrete Math., 7:437-442, 1994.
[69] C. F. De Jaenisch. Applications de l'Analyse mathématique au Jeu
des Échecs. Petrograd, 1862.
[70] F. F. Dragan. HT-graphs: centers, connected r-domination and Steiner
trees. Comput. Sci. J. Moldova (Kishinev), 1:64-83, 1993.
[71] F. F. Dragan. Dominating cliques in distance-hereditary graphs. In
Lecture Notes in Comput. Sci., Algorithm Theory - SWAT/94: 4th
Scandinavian Workshop on Algorithm Theory, volume 824, pages 370-
381, Berlin, 1994. Springer-Verlag.
[72] F. F. Dragan and A. Brandstadt. Dominating cliques in graphs with
hypertree structure. In E. M. Schmidt and S. Skyum, editors, Lec-
ture Notes in Comput. Sci., Internat. Symp. on Theoretical Aspects
of Computer Science (STACS'94), volume 775, pages 735-746, Berlin,
1994. Springer-Verlag.
[73] F. F. Dragan and A. Brandstadt. r-Dominating cliques in graphs with
hypertree structure. Discrete Math., 162:93-108, 1996.
[74] S. E. Dreyfus and A. M. Law. The Art and Theory of Dynamic Pro-
gramming. Academic Press, New York, 1977.
[75] J. E. Dunbar, W. Goddard, S. T. Hedetniemi, M. A. Henning and
A. A. McRae. The algorithmic complexity of minus domination in
graphs. Discrete Appl. Math., 68:73-84, 1996.
[76] J. E. Dunbar, S. T. Hedetniemi, M. A. Henning and A. A. McRae.
Minus domination in graphs. Comput. Math. Appl. To appear.
[77] S. Even, A. Pnueli and A. Lempel. Permutation graphs and transitive
graphs. J. Assoc. Comput. Mach., 19(3):400-410, 1972.
[78] L. Euler. Solutio problematis ad geometriam situs pertinentis. Acad.
Sci. Imp. Petropol., 8:128-140, 1736.
[79] M. Farber. Domination and duality in weighted trees. Congr. Numer.,
33:3-13, 1981.
[80] M. Farber. Independent domination in chordal graphs. Oper. Res.
Lett., 1:134-138, 1982.
[81] M. Farber. Domination, independent domination and duality in
strongly chordal graphs. Discrete Appl. Math., 7:115-130, 1984.
[82] M. Farber and J. M. Keil. Domination in permutation graphs. J.
Algorithms, 6:309-321, 1985.
[83] A. M. Farley, S. T. Hedetniemi and A. Proskurowski. Partitioning
trees: matching, domination and maximum diameter. Internat. J.
Comput. Inform. Sci., 10:55-61, 1981.
[84] M. R. Fellows and M. N. Hoover. Perfect domination. Australas. J.
Combin., 3:141-150, 1991.
[85] G. H. Fricke, M. A. Henning, O. R. Oellermann and H. C. Swart. An
efficient algorithm to compute the sum of two distance domination
parameters. Discrete Appl. Math., 68:85-91, 1996.
[86] M. R. Garey and D. S. Johnson. Computers and Intractability: A
Guide to the Theory of NP-Completeness. Freeman, New York, 1979.
[87] F. Gavril. Algorithms for minimum colorings, maximum clique, mini-
mum coverings by cliques and maximum independent set of a chordal
graph. SIAM J. Comput., 1:180-187, 1972.
[88] M. C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Aca-
demic Press, New York, 1980.
[89] M. C. Golumbic. Algorithmic aspects of perfect graphs. Annals of
Discrete Math., 21:301-323, 1984.
[90] D. L. Grinstead and P. J. Slater. A recurrence template for several
parameters in series-parallel graphs. Discrete Appl. Math., 54:151-168,
1994.
[91] D. L. Grinstead, P. J. Slater, N. A. Sherwani and N. D. Holmes. Ef-
ficient edge domination problems in graphs. Inform. Process. Lett.,
48:221-228, 1993.
[92] P. L. Hammer and F. Maffray. Completely separable graphs. Discrete
Appl. Math., 27:85-99, 1990.
[93] E. Howorka. A characterization of distance-hereditary graphs. Quart.
J. Math. Oxford Ser. 2, 28:417-420, 1977.
[94] E. O. Hare, W. R. Hare and S. T. Hedetniemi. Algorithms for com-
puting the domination number of k x n complete grid graphs. Congr.
Numer., 55:81-92, 1986.
[95] E. O. Hare, S. Hedetniemi, R. C. Laskar, K. Peters and T. Wimer.
Linear-time computability of combinatorial problems on generalized-
series-parallel graphs. In D. S. Johnson, T. Nishizeki, A. Nozaki and H.
S. Wilf, editors, Discrete Algorithms and Complexity, Proc. Japan-US
Joint Seminar, pages 437-457, Kyoto, Japan, 1987. Academic Press,
New York.
[96] E. O. Hare and S. T. Hedetniemi. A linear algorithm for computing the
knight's domination number of a K x N chessboard. Congr. Numer.,
59:115-130, 1987.
[97] J. H. Hattingh, M. A. Henning and P. J. Slater. On the algorithmic
complexity of signed domination in graphs. Australas. J. Combin.,
12:101-112, 1995.
[98] J. H. Hattingh, M. A. Henning and J. L. Walters. On the computa-
tional complexity of upper distance fractional domination. Australas.
J. Combin., 7:133-144, 1993.
[99] T. W. Haynes, S. T. Hedetniemi and P. J. Slater, editors. Domination
in Graphs: Advanced Topics. Marcel Dekker, Inc., New York, 1997.
[100] T. W. Haynes, S. T. Hedetniemi and P. J. Slater. Fundamentals of
Domination in Graphs. Marcel Dekker, Inc., New York, 1997.
[101] S. M. Hedetniemi, S. T. Hedetniemi and M. A. Henning. The algorith-
mic complexity of perfect neighborhoods in graphs. J. Combin. Math.
Combin. Comput. To appear.
[102] S. M. Hedetniemi, S. T. Hedetniemi and D. P. Jacobs. Private domi-
nation: theory and algorithms. Congr. Numer., 79:147-157, 1990.
[103] S. M. Hedetniemi, S. T. Hedetniemi and R. C. Laskar. Domination in
trees: models and algorithms. In Y. Alavi, G. Chartrand, L. Lesniak,
D. R. Lick and C. E. Wall, editors, Graph Theory with Applications to
Algorithms and Computer Science, pages 423-442. Wiley, New York,
1985.
[104] S. T. Hedetniemi and R. C. Laskar, editors. Topics on Domination,
volume 48. North Holland, New York, 1990.
[105] S. T. Hedetniemi, R. C. Laskar and J. Pfaff. A linear algorithm for
finding a minimum dominating set in a cactus. Discrete Appl. Math.,
13:287-292, 1986.
[106] A. J. Hoffman, A. W. J. Kolen and M. Sakarovitch. Totally-balanced
and greedy matrices. SIAM J. Algebraic and Discrete Methods, 6:721-
730, 1985.
[107] W. Hsu. The distance-domination numbers of trees. Oper. Res. Lett.,
1:96-100, 1982.
[108] W. Hsu and K. Tsai. Linear time algorithms on circular-arc graphs.
Inform. Process. Lett., 40:123-129, 1991.
[109] S. F. Hwang and G. J. Chang. k-Neighbor covering and independence
problem. SIAM J. Discrete Math. To appear.
[110] S. F. Hwang and G. J. Chang. The k-neighbor domination problem
in block graphs. European J. Oper. Res., 52:373-377, 1991.
[111] S. F. Hwang and G. J. Chang. The edge domination problem. Discuss.
Math.-Graph Theory, 15:51-57, 1995.
[112] O. H. Ibarra and Q. Zheng. Some efficient algorithms for permutation
graphs. J. Algorithms, 16:453-469, 1994.
[113] M. S. Jacobson and K. Peters. Complexity questions for n-domination
and related parameters. Congr. Numer., 68:7-22, 1989.
[114] T. S. Jayaram, G. Sri Karishna and C. Pandu Rangan. A unified
approach to solving domination problems on block graphs. Report
TR-TCS-90-09, Dept. of Computer Science and Eng., Indian Inst. of
Technology, 1990.
[115] D. S. Johnson. The NP-completeness column: an ongoing guide. J.
Algorithms, 5:147-160, 1984.
[116] D. S. Johnson. The NP-completeness column: an ongoing guide. J.
Algorithms, 6:291-305,434-451, 1985.
[117] J. M. Keil. Total domination in interval graphs. Inform. Process. Lett.,
22:171-174, 1986.
[118] J. M. Keil. The complexity of domination problems in circle graphs.
Discrete Appl. Math., 42:51-63, 1993.
[119] J. M. Keil and D. Schaefer. An optimal algorithm for finding domi-
nating cycles in circular-arc graphs. Discrete Appl. Math., 36:25-34,
1992.
[120] T. Kikuno, N. Yoshida and Y. Kakuda. The NP-completeness of the
dominating set problem in cubic planar graphs. Trans. IEEE, pages
443-444, 1980.
[121] T. Kikuno, N. Yoshida and Y. Kakuda. A linear algorithm for the
domination number of a series-parallel graph. Discrete Appl. Math.,
5:299-311, 1983.
[122] E. Köhler. Connected domination on trapezoid graphs in O(n) time.
Manuscript, 1996.
[123] A. Kolen. Solving covering problems and the uncapacitated plant
location problem on trees. European J. Oper. Res., 12:266-278, 1983.
[124] D. Kratsch. Finding dominating cliques efficiently, in strongly chordal
graphs and undirected path graphs. Discrete Math., 86:225-238, 1990.
[125] D. Kratsch. Domination and total domination in asteroidal triple-
free graphs. Technical Report Math/Inf/96/25, F.-Schiller-Univ. Jena,
1996.
[126] D. Kratsch, P. Damaschke and A. Lubiw. Dominating cliques in
chordal graphs. Discrete Math., 128:269-275, 1994.
[127] D. Kratsch and L. Stewart. Domination on cocomparability graphs.
SIAM J. Discrete Math., 6(3):400-417, 1993.
[128] R. C. Laskar, J. Pfaff, S. M. Hedetniemi and S. T. Hedetniemi. On
the algorithmic complexity of total domination. SIAM J. Algebraic
Discrete Methods, 5:420-425, 1984.
[129] E. L. Lawler and P. J. Slater. A linear time algorithm for finding an
optimal dominating subforest of a tree. In Graph Theory with Appli-
cations to Algorithms and Computer Science, pages 501-506. Wiley,
New York, 1985.
[130] Y. D. Liang. Domination in trapezoid graphs. Inform. Process. Lett.,
52:309-315, 1994.
[131] Y. D. Liang. Steiner set and connected domination in trapezoid
graphs. Inform. Process. Lett., 56:101-108, 1995.
[132] Y. D. Liang, C. Rhee, S. K. Dhall and S. Lakshmivarahan. A new
approach for the domination problem on permutation graphs. Inform.
Process. Lett., 37:219-224, 1991.
[133] M. Livingston and Q. F. Stout. Constant time computation of mini-
mum dominating sets. Congr. Numer., 105:116-128, 1994.
[134] E. Loukakis. Two algorithms for determining a minimum independent
dominating set. Internat. J. Comput. Math., 15:213-229, 1984.
[135] T. L. Lu, P. H. Ho and G. J. Chang. The domatic number problem in
interval graphs. SIAM J. Discrete Math., 3:531-536, 1990.
[136] K. L. Ma and C. W. H. Lam. Partition algorithm for the dominating
set problem. Congr. Numer., 81:69-80, 1991.
[137] G. K. Manacher and T. A. Mankus. Finding a domatic partition of
an interval graph in time O(n). SIAM J. Discrete Math., 9:167-172,
1996.
[138] M. V. Marathe, H. B. Hunt III and S. S. Ravi. Efficient approximation
algorithms for domatic partition and on-line coloring of circular arc
graphs. Discrete Appl. Math., 64:135-149, 1996.
[139] R. M. McConnell and J. P. Spinrad. Modular decomposition and
transitive orientation. Manuscript, 1995.
[140] S. L. Mitchell, E. J. Cockayne and S. T. Hedetniemi. Linear algo-
rithms on recursive representations of trees. J. Comput. System Sci.,
18(1):76-85, 1979.
[141] S. L. Mitchell and S. T. Hedetniemi. Edge domination in trees. Congr.
Numer., 19:489-509, 1977.
[142] S. L. Mitchell and S. T. Hedetniemi. Independent domination in trees.
Congr. Numer., 29:639-656, 1979.
[143] S. L. Mitchell, S. T. Hedetniemi and S. Goodman. Some linear algo-
rithms on trees. Congr. Numer., 14:467-483, 1975.
[144] M. Moscarini. Doubly chordal graphs, Steiner trees and connected
domination. Networks, 23:59-69, 1993.
[145] H. Müller and A. Brandstadt. The NP-completeness of STEINER
TREE and DOMINATING SET for chordal bipartite graphs. Theoret.
Comput. Sci., 53:257-265, 1987.
[146] K. S. Natarajan and L. J. White. Optimum domination in weighted
trees. Inform. Process. Lett., 7:261-265, 1978.
[147] G. L. Nemhauser. Introduction to Dynamic Programming. John Wiley
& Sons, 1966.
[148] A. K. Parekh. Analysis of a greedy heuristic for finding small domi-
nating sets in graphs. Inform. Process. Lett., 39:237-240, 1991.
[149] S. L. Peng and M. S. Chang. A new approach for domatic number
problem on interval graphs. Proc. National Computer Symp. R.O.C.,
pages 236-241, 1991.
[150] S. L. Peng and M. S. Chang. A simple linear time algorithm for
the domatic partition problem on strongly chordal graphs. Inform.
Process. Lett., 43:297-300, 1992.
[151] J. Pfaff, R. Laskar and S. T. Hedetniemi. Linear algorithms for in-
dependent domination and total domination in series-parallel graphs.
Congr. Numer., 45:71-82, 1984.
[152] A. Pnueli, A. Lempel and S. Even. Transitive orientation of graphs
and identification of permutation graphs. Canad. J. Math., 23:160-
175, 1971.
[153] A. Proskurowski. Minimum dominating cycles in 2-trees. Internat. J.
Comput. Inform. Sci., 8:405-417, 1979.
[154] A. Proskurowski and J. A. Telle. Algorithms for vertex partitioning
problems on partial k-trees. SIAM J. Discrete Math. To appear.
[155] G. Ramalingam and C. Pandu Rangan. Total domination in interval
graphs revisited. Inform. Process. Lett., 27:17-21, 1988.
[156] G. Ramalingam and C. Pandu Rangan. A unified approach to domi-
nation problems in interval graphs. Inform. Process. Lett., 27:271-274,
1988.
[157] C. Rhee, Y. D. Liang, S. K. Dhall and S. Lakshmivarahan. An
O(n + m) algorithm for finding a minimum-weight dominating set
in a permutation graph. SIAM J. Comput., 25:401-419, 1996.
[158] J. S. Rohl. A faster lexicographic N queens algorithm. Inform. Pro-
cess. Lett., 17:231-233, 1983.
[159] P. Scheffler. Linear-time algorithms for NP-complete problems re-
stricted to partial k-trees. Technical Report 03/87, IMATH, Berlin,
1987.
[160] D. Seese. Tree-partite graphs and the complexity of algorithms. In
Lecture Notes in Computer Science, FCT 85, volume 199, pages 412-
421. Springer, Berlin, 1985.
[161] P. J. Slater. R-domination in graphs. J. Assoc. Comput. Mach.,
23:446-450, 1976.
[162] P. J. Slater. Domination and location in acyclic graphs. Networks,
17:55-64, 1987.
[163] C. B. Smart and P. J. Slater. Complexity results for closed neighbor-
hood order parameters. Congr. Numer., 112:83-96, 1995.
[164] R. Sosic and J. Gu. A polynomial time algorithm for the N-queens
problem. SIGART Bull., 2(2):7-11, 1990.
[165] J. Spinrad. On comparability and permutation graphs. SIAM J. Com-
put., 14:658-670, 1985.
[166] A. Srinivasa Rao and C. Pandu Rangan. Linear algorithm for domatic
number problem on interval graphs. Inform. Process. Lett., 33:29-33,
1989/90.
[167] A. Srinivasa Rao and C. Pandu Rangan. Efficient algorithms for the
minimum weighted dominating clique problem on permutation graphs.
Theoret. Comput. Sci., 91:1-21, 1991.
[168] J. A. Telle. Complexity of domination-type problems in graphs. Nordic
J. Comput., 1:157-171, 1994.
[169] J. A. Telle and A. Proskurowski. Efficient sets in partial k-trees. Dis-
crete Appl. Math., 44:109-117, 1993.
[170] J. A. Telle and A. Proskurowski. Practical algorithms on partial k-
trees with an application to domination-type problems. In F. Dehne, J.
R. Sack, N. Santoro and S. Whitesides, editors, Lecture Notes in Com-
put. Sci., Proc. Third Workshop on Algorithms and Data Structures
(WADS'93), volume 709, pages 610-621, Montreal, 1993. Springer-
Verlag.
[171] K. H. Tsai and W. L. Hsu. Fast algorithms for the dominating set
problem on permutation graphs. Algorithmica, 9:109-117, 1993.
[172] C. Tsouros and M. Satratzemi. Tree search algorithms for the domi-
nating vertex set problem. Internat. J. Computer Math., 47:127-133,
1993.
[173] K. White, M. Farber and W. Pulleyblank. Steiner trees, connected
domination and strongly chordal graphs. Networks, 15:109-124, 1985.
[174] T. V. Wimer. Linear algorithms for the dominating cycle problems
in series-parallel graphs, partial K-trees and Halin graphs. Congr.
Numer., 56:289-298, 1986.
[175] T. V. Wimer. An O(n) algorithm for domination in k-chordal graphs.
Manuscript, 1986.
[176] M. Yannakakis and F. Gavril. Edge dominating sets in graphs. SIAM
J. Appl. Math., 38:264-272, 1980.
[177] H. G. Yeh and G. J. Chang. Algorithmic aspect of majority domina-
tion. Taiwanese J. Math., 1:343-350, 1997.
[178] H. G. Yeh and G. J. Chang. Linear-time algorithms for bipartite
distance-hereditary graphs. Submitted.
[179] H. G. Yeh and G. J. Chang. Weighted connected domination and
Steiner trees in distance-hereditary graphs. Discrete Appl. Math. To
appear.
[180] C. Yen and R. C. T. Lee. The weighted perfect domination problem.
Inform. Process. Lett., 35(6):295-299, 1990.
[181] C. Yen and R. C. T. Lee. The weighted perfect domination problem
and its variants. Discrete Appl. Math., 66:147-160, 1996.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 407-456
©1998 Kluwer Academic Publishers

Selected Algorithmic Techniques for Parallel
Optimization

Ricardo C. Corrêa¹
Núcleo de Computação Eletrônica
Universidade Federal do Rio de Janeiro
Caixa Postal 2324, 20001-970, Rio de Janeiro, RJ, Brazil
E-mail: correa@nce.ufrj.br

Afonso Ferreira
CNRS-I3S-INRIA, SLOOP Project
BP93, 06902 Sophia-Antipolis, France
E-mail: Afonso.Ferreira@sophia.inria.fr

Stella C. S. Porto²
Computação Aplicada e Automação
Universidade Federal Fluminense
Rua Passo da Pátria, 156, 24210-240 Niterói, RJ, Brazil
E-mail: stella@caa.uff.br

¹Partially supported by the Brazilian agency FAPERJ.
²Partially supported by the Brazilian agency CNPq.

Contents

1 Introduction 408

I - Exact Methods 411

2 Branch-and-Bound 411
  2.1 Preliminaries on Sequential B&B 412
  2.2 Implementation Models for Parallel B&B 415
    2.2.1 Shared Data Model 416
    2.2.2 Relaxed SDM 420
    2.2.3 Implementation Trends and Applications 421
    2.2.4 Distributed Data Model 423
    2.2.5 Implementation Trends and Applications 426
  2.3 Some Remarks on Efficiency 429

II - Metaheuristics 430

3 Tabu Search 431
  3.1 Preliminaries on Sequential Tabu Search 431
  3.2 Parallel Tabu Search 433
  3.3 Parallelization Trends and Applications 435

4 Annealing-Based Methods 438
  4.1 Sequential Simulated Annealing 438
  4.2 Parallelization Trends and Applications 439
  4.3 Microcanonical Optimization 440

5 Genetic Algorithms 444
  5.1 Parallel Trends 445
    5.1.1 Centralized Genetic Algorithms 446
    5.1.2 Distributed Genetic Algorithms 446

6 Conclusion 448

References

1 Introduction
The use of parallel algorithms for solving computationally hard problems becomes attractive as parallel systems, consisting of a collection of powerful processors, offer large computing power and memory storage capacity. Even though parallelism will not be able to overcome the assumed worst-case exponential time or memory complexity of those problems (unless an exponential number of processors is used) [11], the average execution time of heuristic search algorithms which find good suboptimal solutions for many hard problems is polynomial. Consequently, parallel systems, possibly with hundreds or thousands of processors, give us the perspective of efficiently solving relatively large instances of hard problems.
Another important motivation for using parallel processing is that, in several industrial, research or other real-world environments, mathematical programmers must face moderate instances of hard problems for which an exact optimal solution is highly desirable (e.g., in VLSI floor-plan optimization [4]). In such circumstances, parallelism can bring the processing time from some days or months, typical when one workstation is used, down to some minutes or hours. This is crucial in some applications that require quick solutions (e.g., robot motion planning and speech understanding). These two motivations imply that it is of major importance to study ways to parallelize existing sequential methods used for solving hard problems.
For these reasons, parallel processing has been widely studied as an additional source of improvement in search efficiency in discrete optimization [3, 22, 23, 42]. In more concrete terms, parallelism is treated as a way to solve discrete optimization problems (DOP, for short), which are stated as follows. Let n be a positive integer, W be the solution space, S be a domain defined as the discrete set of all vectors x in the solution space that satisfy a set of constraints, and f : W → R be the objective function from W onto a completely ordered set R. We call x ∈ S a feasible solution and f(x) the value of x. A feasible solution x is better than x' ∈ S if f(x) < f(x'). We search for an optimal solution x* ∈ S*, where S* is the set of all best feasible solutions (we use the notation f* = f(x*)). In the remainder of this chapter, S stands indistinguishably for the domain of the DOP being solved and for the DOP itself. Maximization problems can be stated and dealt with similarly. We assume S finite and non-empty.
We review in this chapter the literature pertinent to the design and implementation of parallel algorithms for discrete optimization problems. The focus is on distributed memory parallel systems (denoted by DMPS). These systems are composed of a set P = {p_0, p_1, ..., p_{n-1}} of processors, each one with its own local memory and with no physically shared memory. A pair of processors is connected if there exists a physical link between them, and the processors in a DMPS are connected by an interconnection network. The communication between two processors is implemented through the exchange of messages over a sequence of links of the network connecting these two processors. It is assumed throughout this chapter that each processor is powerful enough to do some computation and to communicate over one or more of the links incident to it simultaneously. Intel Paragon, Cray T3E, IBM SP-2, CM-5 and networks of workstations are examples of distributed memory parallel systems.
In the description of the algorithms in this chapter, we assume the model of computation defined in the sequel. This model is intended to bring under the same framework the apparently different DMPSs used to support the parallel implementations covered in the following sections. An algorithm is a set of communicating tasks, where each task is a sequential-code entity. Pairs of tasks of an algorithm are connected by, and communicate with each other through, channels. A channel is an abstract entity which is supposed to be supported by the operating system running on the DMPS under consideration. In order to execute an algorithm, its tasks must be mapped onto the processors of the DMPS. For the sake of convenience, it is assumed that each processor is able to concurrently execute one or more tasks. We also assume that the communication interface with the operating system provides the tasks ways of sending and receiving messages through their incident channels. Finally, it is assumed that the tasks are identified, and furthermore that each task knows its identity. We refer the interested reader to [5] for further details on these abstractions.
The remainder of the chapter reviews the literature in parallel algorith-
mic techniques for solving discrete optimization problems. The focus is on
the state-of-the-art of the strategies used to parallelize well-known sequential
exact methods and metaheuristics. In all cases, we first present theoretical
aspects in order to describe their intrinsic parallelism. Then, the reader will
find a summary of implementations from the literature, which is intended to
give an overview of the trends in exploiting the intrinsic parallelism in prac-
tical implementations. We shall describe the algorithms at the task-channel
level, indicating how the tasks are mapped to processors.
These topics are organized in two parts. The first part is devoted to
the branch-and-bound exact method, which is the subject of Section 2. The
presentation in this part is more detailed, and most of the techniques are
carefully analyzed. In the second part, we turn to metaheuristics. This
part is organized in three sections. In Section 3, we review the principles
of tabu search and present a framework to describe its parallel implemen-
tations. Annealing-based methods are then discussed in Section 4. Finally,
Section 5 contains a large material about parallel genetic implementations.
In all sections of the second part, we note that, from the parallel standpoint,
the techniques are very similar to those used in parallel branch-and-bound.
For this reason, we focus on how to use these techniques in the context of
the metaheuristics. A large number of references, covering all the topics of
this chapter, is also provided.
Syntax adopted. Throughout the chapter, parallel tasks are described using guarded commands. In this notation, a guard is a boolean condition or a specification of a type of message to be received by the task. A guard is said to be valid when the corresponding boolean condition is satisfied or when a message of the specified type is available. A guarded command is a guard followed by a sequential code. A task is then a loop including a set of guarded commands. At each step of the loop, a valid guard is selected, and the corresponding sequential code is executed. In case several guards are valid, one of them is selected arbitrarily. Although we adopt this semantics, one may alternatively think of the guarded commands as threads, in which case they can execute concurrently in a single processor.
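
As an illustration only (this code does not come from the chapter), such a task can be sketched in Python as follows, with guard() standing for a boolean test or a message-availability test, and command() for the associated sequential code.

import random

def task_loop(guarded_commands):
    # guarded_commands: list of (guard, command) pairs. At each step of the
    # loop, one valid guard is selected arbitrarily and its command runs.
    while True:
        valid = [command for guard, command in guarded_commands if guard()]
        if valid:
            random.choice(valid)()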

I - Exact Methods

As mentioned in the Introduction, optimal solutions are highly desirable in several real-world environments. For this reason, the theoretical interest in the structural characterization of the solution space of a great number of applications has increased in the last decades. As a consequence, several efficient exact methods for DOPs that profit from the advances provided by the theory have emerged. Most of these exact methods use structural properties of the solution space to guide the search for an optimal solution in a branch-and-bound framework [63]. In spite of the progress obtained in this direction, a lot of hard DOPs persist. Thus, parallelism appears as an additional way to bring the computation time required to obtain an optimal solution of relatively large instances of hard problems within some reasonable limits.
In this part of the chapter, we concentrate on a more in-depth presentation of parallel branch-and-bound, which illustrates the major issues and trends in parallelizing exact methods for hard problems. In particular, we emphasize how to efficiently guide the parallel search provided that the sequential counterpart is efficient.

2 Branch-and-Bound
The method called branch-and-bound (denoted as B&B) is a tree search algorithm often used as an implicit enumeration method of the solution space of discrete optimization problems. Its principle lies in successive decompositions of the original problem into smaller disjoint subproblems until a feasible solution found during the search is demonstrated to be optimal. This enumeration is implicit because some subproblems which are known not to contain an optimal solution are eliminated without being decomposed. Backtracking, dynamic programming, A* and AND-OR tree search can be viewed as variations of B&B algorithms [43, 62].
From a parallel computing point of view, the challenge is how to use a
set of processors to improve the search efficiency of B&B algorithms, given
that an attractive feature is that disjoint subproblems can be decomposed
simultaneously and independently. Other sources of parallelism exist, but
they have not been widely explored [9, 14, 78]. Parallel B&B is traditionally
considered as an irregular parallel algorithm due to the fact that the struc-
ture of the search tree is not known beforehand [23]. Its efficiency depends
on heuristic choices of paths to be explored in the search tree, because they
may result in unnecessary work, in case a subproblem that does not contain
an optimal solution is chosen to be decomposed. In this section, we present
some analytic models of parallel B&B algorithms, which are used to address
some issues related to their implementation and efficiency.

2.1 Preliminaries on Sequential B&B

A sequential B&B algorithm consists of a sequence of iterations. In the first iteration, if S can be efficiently solved, an optimal solution is then obtained and the algorithm stops. Otherwise, S is decomposed into a number of subproblems. A subproblem corresponds to the original problem defined over a subset s ⊆ S. Once again, s stands for the domain of the subproblem and for the subproblem itself. In each subsequent iteration, a subproblem previously generated is solved to optimality or decomposed, which eventually generates new smaller subproblems.
At any point of the execution, a priority list L keeps open subproblems, i.e. subproblems that were generated previously but neither decomposed nor eliminated nor solved [11, 54]. In order to describe more precisely the enumeration process introduced above, let h be a heuristic priority function that, applied to an open subproblem, determines its priority from a completely ordered priority set H. The only constraint imposed over h depends on the decomposition and will be stated later. The subproblem decomposed in an iteration is selected from L, where the newly generated subproblems are also inserted. The aim of the priority list is to provide open subproblems in a nondecreasing order of h. Thus, the priority function encapsulates the
structural information used to guide the search.
The priority list L is composed of a set of open subproblems, denoted by L.openset, and the following two functions. One more function, used to eliminate subproblems, will be defined later in this section.

Selection: this function is denoted by L.SEL(), and returns an open subproblem s such that there exists a nondecreasing order of h applied to the subproblems in the openset in which s is the first element. If the openset is empty, then L.SEL() = ∅. Otherwise, L.SEL() ≠ ∅ and the subproblem returned, called E-node, is extracted from the openset.

Insertion: given a set V of open subproblems, an insertion L.INS(V) extends the openset with the subproblems in V.

The enumeration process is completely specified by a decomposition function δ which returns a set of subproblems when applied to a given E-node s. As previously introduced, this function is responsible for the generation of the subproblems from S. The subproblems in δ(s) are restricted to be disjoint (in the sense that a feasible solution of every subproblem in δ(s) is a feasible solution of s). Furthermore, other additional properties are generally required of the subproblems in δ(s), often as a way to prevent a premature elimination of all optimal solutions [36, 63].
Feasible solutions are found during the search in two situations. First, E-nodes simple enough to be solved directly may be chosen for decomposition. In this case, such an E-node, say s, is solved to optimality, determining an optimal solution to s (which is a feasible solution for S but not necessarily an optimal one). Also, one can find a feasible solution in s (not necessarily an optimal solution of s) during its decomposition. At any point of the search, one of the best feasible solutions found so far is called the incumbent and is kept in a variable denoted x_U, as well as its value (variable U = f(x_U)).
Having defined the decomposition, we are able to state the constraint imposed over h, which is the following: if s_i ∈ δ(s_j), then h(s_i) ≥ h(s_j).
Using only the enumeration process incurred by the successive decompositions, all the subproblems that are not simple enough to be solved directly must be decomposed in order to eventually demonstrate that the incumbent is an optimal solution. However, some of these subproblems can be eliminated from the enumeration using a lower bound function l, which associates to each open subproblem s a value l(s) ≤ min_{x∈s}{f(x)} such that l(s_1) ≥ l(s_2) if s_1 is generated by the decomposition of s_2. Notice that if
an open subproblem s is such that l(s) ≥ U, then s cannot lead to a feasible solution which is better than x_U. This is the motivation for the third function associated to the priority list L, defined below³.

Pruning: given a feasible solution x, a pruning L.PRUN(x) eliminates from the openset every subproblem s such that l(s) ≥ f(x).
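
The three operations of the priority list map naturally onto a binary heap. The following Python sketch is our own illustration (all names are ours): it keeps the openset ordered by h and implements pruning lazily, discarding a subproblem only at selection time, once its lower bound is no better than the best value passed to PRUN. This lazy design choice avoids scanning the whole openset at each pruning.

import heapq

class PriorityList:
    # Priority list L over open subproblems, ordered by the priority
    # function h; subproblems are pruned against the lower bound l.
    def __init__(self, h, l):
        self.h, self.l = h, l
        self.heap = []             # entries (h(s), counter, s)
        self.counter = 0           # tie-breaker keeping entries comparable
        self.bound = float('inf')  # best f(x) passed to prun so far
    def ins(self, subproblems):    # L.INS(V)
        for s in subproblems:
            self.counter += 1
            heapq.heappush(self.heap, (self.h(s), self.counter, s))
    def prun(self, value):         # L.PRUN(x), called with value = f(x)
        self.bound = min(self.bound, value)
    def sel(self):                 # L.SEL(): extracts an E-node, or None
        while self.heap:
            _, _, s = heapq.heappop(self.heap)
            if self.l(s) < self.bound:   # lazily skip pruned subproblems
                return s
        return None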

A sequential B&B algorithm is summarized in Figure 1. The execution of
the algorithm starts with the initial state (openset = {S}, U = V, x_U = x),
where V ≥ f* is a constant corresponding to the initial incumbent value
(possibly infinite) and x is a solution having value V (if V = ∞, then the
contents of x_U is undefined), and ends at the final state (∅, f*, x*), with x*
being an optimal solution with value f*. The correctness of this algorithm
is guaranteed provided that a decomposition eliminates an optimal solution
only if the incumbent is as good as this optimal solution, and a pruning
eliminates a subproblem s only if the incumbent is at least as good as the
best solution in s. Moreover, if a feasible solution is not eliminated, then it
is explicitly enumerated exactly once [59, 64].

Algorithm: sequential branch-and-bound

Input: initial subproblem S, initial incumbent x and value V;
L.openset ← {S}; U ← V; x_U ← x;
while (L.openset ≠ ∅) do
1.   s ← L.SEL();
     V ← δ(s);
     if (decomposition identified a feasible solution x ∈ s) then
         if (f(x) < U) then
             x_U ← x; U ← f(x);
             L.PRUN(x);
     L.INS(V);
return x_U;

Figure 1: Pseudo-code for sequential branch-and-bound.
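
To make the roles of L, h, δ, l and f concrete, we give below a minimal
Python sketch of Figure 1. The sketch is ours, not taken from the references;
the toy objective f(x) = Σ_i cost[i][x_i] over binary vectors, and the names
lower_bound, decompose and branch_and_bound, are illustrative assump-
tions, with h simply taken equal to l.

import heapq

# A minimal sketch of the sequential B&B of Figure 1; the toy objective
# f(x) = sum(cost[i][x_i]) is chosen only so that l and delta are easy
# to state.  Here the priority function h coincides with the bound l.
cost = [[3, 1], [2, 5], [4, 2]]            # cost[i][b]: cost of fixing x_i = b
n = len(cost)

def lower_bound(fixed):                    # l(s): fixed costs + cheapest completion
    return (sum(cost[i][b] for i, b in enumerate(fixed))
            + sum(min(c) for c in cost[len(fixed):]))

def decompose(fixed):                      # delta(s): fix the next free variable
    return [fixed + (0,), fixed + (1,)]

def branch_and_bound():
    U, x_U = float('inf'), None            # incumbent value and solution
    L = [(lower_bound(()), ())]            # openset ordered nondecreasingly by h
    while L:
        h, s = heapq.heappop(L)            # selection: L.SEL()
        if h >= U:                         # lazy pruning: l(s) >= U
            continue
        if len(s) == n:                    # s simple enough: solve it directly
            U, x_U = h, s                  # new incumbent
            continue
        for child in decompose(s):         # insertion: L.INS(delta(s))
            heapq.heappush(L, (lower_bound(child), child))
    return x_U, U

print(branch_and_bound())                  # ((1, 0, 1), 5)

For brevity, the sketch prunes lazily, discarding a subproblem at selection
time when l(s) ≥ U, instead of purging the openset with L.PRUN at each
incumbent update; both policies eliminate exactly the subproblems with
l(s) ≥ U.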

It is sometimes convenient to think of the decomposition process as a search
tree, with root S. The vertices of such a tree represent the subproblems, and
an arc (s_i, s_j) indicates that s_j ∈ δ(s_i).⁴ A B&B algorithm consists of a
heuristic iterative search in this search tree. Consequently, it becomes evident
that two subproblems s_i and s_j can be decomposed independently provided
that there is no k such that s_i ∈ δ^k(s_j) or s_j ∈ δ^k(s_i), where δ^1(s) = δ(s)
and δ^k(s) = δ(some subproblem in δ^{k-1}(s)). This intrinsic parallelism
will be explored in the following sections.

⁴Notice that the definition of the decomposition function δ guarantees that the graph
defined in this way is indeed a tree. Although not in the scope of this chapter, there are
other search algorithms, known as graph search algorithms (such as A* and its derivations),
for which intersections between subproblems are allowed [67].

2.2 Implementation Models for Parallel B&B


In this section, we describe parallel B&B algorithms whose principle lies
in simultaneous subproblem decompositions, with each decomposition per-
formed sequentially by a distinct task. The execution of a parallel B&B
algorithm on a DMPS incurs some overhead, due to communication man-
agement, which can degrade the performance. As a consequence, several
special techniques have been developed to address the problems related to
the searching process, taking into account two (apparently conflicting) major
issues:
1. minimize time overhead, including communication overhead and idle
time due to contention on common data structures; and
2. minimize search overhead, which appears in the parallel setting be-
cause a subproblem s which should not be decomposed (i.e., with
l(s) ≥ f*) may be selected [45, 48].
In this context, the questions one may still raise are where and when to
decompose each subproblem, and how to take these decisions distributedly.
We shall see in the sequel how to use the priority function h to answer
these questions under two high-level approaches used to model intertask
interactions: shared data model (SDM) and distributed data model (DDM).
The SDM forces an order of subproblem decompositions which is consistent
with h. On the other hand, in the DDM, the tasks independently execute
sequential B&B algorithms on disjoint search subspaces, and h is only used
to (dynamically) partition the search space among the tasks.
The description of the two models mentioned above is based on the task-
channel level defined in the Introduction, where each task is a sequence of
guarded commands of two types. The first type corresponds to the actions
performing the search itself, which, with slight modifications, are those in-
dicated in the algorithm of Figure 1. The guarded commands of the second
type are related to the underlying distributed algorithms required to bring
the distributed nature of the tasks arrangement to the higher abstract level
adopted to classify parallel B&B algorithms. We assume that to each task
corresponds a different processor of the DMPS. The tasks and the channels
between pairs of tasks are represented by a task-channel graph G, which
is assumed to be a complete graph for the sake of simplicity. However, we
assume that the structure of G is arbitrary and known beforehand when
presenting some implementations from the literature in order to illustrate
how the models can be used in practice.

2.2.1 Shared Data Model


The SDM is the high-level approach shown in Figure 2. We denote by GDS
(for global data structure) the structure containing a (virtual) global priority
list and a global incumbent. The former, denoted by L by analogy with its
sequential counterpart, is an abstract entity, since tasks do not share any
piece of memory when they are mapped to distinct processors. The GDS is
consistently shared by all tasks, i.e., whenever any task invokes an operation
on the GDS, any other task that subsequently operates on the GDS will
retrieve the result of the operation just performed. The operations of insertion,
selection and pruning in the priority list, as well as incumbent updates, are
executed indivisibly, i.e., the model guarantees serializability of operation
invocations; if two operations are applied simultaneously, then the result
is as if one of them were executed before the other; the order of invocation,
however, is non-deterministic. The aim of SDM is to reduce search overhead,
which is done based on global selections that respect the priority function
h [15]. The priority function is not sufficient to completely eliminate search
overhead in most cases, but it is the only heuristic information available
to guide the search [48].
Now, we describe the attributes of G's level under SDM. Each of the m
tasks works at its own pace through a sequence of steps, which is shown
in Figure 3 for a task t_i. In this sequence of steps, decompositions are
independently performed by the tasks, which allows some tasks to operate
over L more frequently than others. As indicated in Figure 3, each task
is required to deal with four types of messages. Each of these types of
messages is in correlation with one of the underlying algorithms used to
distributedly implement the higher abstract level of SDM. Before describing
more precisely the algorithm in Figure 3, let us define some variables in
order to see in more detail how the GDS is organized at G's level.

[Figure 2 depicts the GDS (openset, x_U, U) as a single virtual priority
list accessed by the tasks q_0, q_1, …, q_{p-1}; each selection/insertion incurs
a latency time, and each processor runs a sequential B&B.]

Figure 2: Overall description of the SDM with p tasks.
A subproblem selected from L is put in BestSbp_i. If BestSbp_i = Nil,
then there is no subproblem ready to be decomposed by t_i. Due to the lack of
shared memory when L is to be implemented over G, the global priority list L
is divided into as many portions as tasks, with a one-to-one correspondence
between tasks and portions of L. A portion of L assigned to a task t_i is also
a priority list, and is referred to as the local priority list L_i. On the other hand,
the global incumbent x_U and its value U have to be replicated, such that to
each task t_i is assigned a version x_{U_i} and U_i of x_U and U, respectively.
each task ti, the boolean variable TermCond i is used to indicate whether
£i.openset is empty or not. Another boolean variable, TermCond, indicates
whether the stable property

TermCond i = T RUE, simultaneously for all 0 ::; i < m,


(1)
and there is no message in transit from one task to another

is verified. If TermC ond becomes T RU E in ti at time k, then (1) was


verified somewhere before k (since (1) is a stable property, it remains valid
at k). In these terms, TermC ond = T RU E is equivalent to the sequential
termination condition (Figure 1).
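
Although the chapter does not spell out a particular detector, property (1)
can be tested, for illustration, with a counting argument: every task is idle
and the global numbers of sent and received messages agree. The sketch
below is ours and shows only this final test; a real detector (see [5]) must
also gather the counters consistently, e.g., with a circulating token.

# Sketch of a counting-based test for property (1): all tasks idle and
# no message in transit.  In a real DMPS these counters would have to be
# collected consistently, e.g. by a circulating token (see [5]).
def property_1(term_cond, sent, received):
    # term_cond[i] is TermCond_i; sent[i]/received[i] count t_i's messages
    all_idle = all(term_cond)
    none_in_transit = sum(sent) == sum(received)
    return all_idle and none_in_transit

print(property_1([True, False, True], sent=[4, 2, 3], received=[3, 3, 3]))  # False
print(property_1([True, True, True],  sent=[4, 2, 3], received=[4, 3, 2]))  # True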
Two different states for a task are identified in SDM, namely ACTIVE
and INACTIVE. These states aim to indicate whether the task has been
assigned a subproblem to decompose or not. A task in the ACTIVE state
has not completed the decomposition of a subproblem it has requested.
On the other hand, the situation where a task is free to start a new
decomposition is signaled by the INACTIVE state. An INACTIVE to
ACTIVE state transition occurs for a task when it requests a selection,
and the transition in the inverse direction occurs when the task finishes
a decomposition. Turning to the use of these variables in the SDM
algorithm in Figure 3, lines 1 and 2 are devoted to the implementation of the
distributed termination detection algorithm (see [5] and references therein).
The main purpose of this algorithm is to set the variable TermCond to
TRUE only after (1) is verified.
Let us examine the remaining guarded commands in the algorithm in
Figure 3. The crucial point in the implementation of SDM is the manage-
ment of the global priority list. As stated above, the operations over L have
to occur one after the other. Then, the first problem to be tackled is this
sequentialization, which can be done by defining some token-passing strategy
such that a task has the right to operate over L if and only if it holds the
token. This token-passing strategy must be judiciously determined in order
to avoid situations where some task is prevented from operating over L for
excessively long times. Let us now suppose that a task t_i possesses the token
for a selection from L. In such a situation, t_i is only able to determine a best
subproblem in L locally when it knows the priority of the best subproblems
in each local priority list. Therefore, the globally best subproblem s may
reside in some L_j, j ≠ i, which implies that s must be transported from
task t_j to t_i [12].
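
The selection performed by the token holder may be pictured as follows;
this sketch is ours and ignores message latencies and the token-passing
machinery itself.

# Sketch of a global selection under SDM: the token holder compares the
# heads of all local priority lists and extracts the globally best
# subproblem, which may have to travel from another task's list.
import heapq

def global_select(local_lists):
    # local_lists[j] is t_j's heap of (h(s), s) pairs
    heads = [(lst[0][0], j) for j, lst in enumerate(local_lists) if lst]
    if not heads:
        return None                        # every openset is empty
    _, j = min(heads)                      # list holding the best subproblem
    h, s = heapq.heappop(local_lists[j])   # if j != i, s is transported to t_i
    return s

lists = [[(7, 'a'), (9, 'b')], [(5, 'c')], []]
for lst in lists:
    heapq.heapify(lst)
print(global_select(lists))                # 'c', shipped from L_1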
Efficient solutions for all of these situations are not at all trivial in the
context of distributed memory. Regardless of the solution adopted, we can
state that each task deals with three types of messages, namely, messages
used by the token-passing strategy, by the best subproblem selection, and by
subproblem transportation. All these types of messages are treated when
the actions related to the guard of line 4 are executed. This may involve
setting new values to BestSbp_i and TermCond_i, in case t_i receives the
best subproblem from another task. Eventually, TermCond_i is also set to
FALSE when receiving the best subproblem, or to TRUE when receiving
the information that there was no subproblem available in the local priority
lists to be selected. Analogously, there is the case where a subproblem housed
in t_i is transported to another task as the best subproblem, which eventually
switches TermCond_i from TRUE to FALSE or vice-versa.
Each task decomposes the subproblem it selected in the guarded command 4
independently from the other tasks (possibly generating feasible solutions).
This is done in the guarded command 5. If some task generates a feasible
solution x whose value is smaller than U_i, it broadcasts x to each task t_j,
j ≠ i, where it eventually also becomes the new incumbent (guarded com-
mand 6). Thus, t_i and all such t_j's update the local incumbent, possibly
pruning L. Finally, t_i requests the insertion of all new subproblems, which
is performed under the policy established in the guarded command 4.
Algorithm: parallel branch-and-bound under SDM, task t_i, 0 ≤ i ≤ m - 1

Input: initial subproblem S, initial incumbent x_0 and value U_0;
if (i = 0) then L_i.openset ← {S}; TermCond_i ← FALSE;
else TermCond_i ← TRUE;
U_i ← U_0; x_{U_i} ← x_0;
TermCond ← FALSE; BestSbp_i ← Nil; state ← INACTIVE;
while (TermCond = FALSE) do
1.   guard: (TermCond_i = TRUE)
         Spontaneous termination actions;
2.   guard: termination message
         Termination actions, eventually setting TermCond to TRUE;
3.   guard: (state = INACTIVE)
         Spontaneously request global selection from L;
         state ← ACTIVE;
4.   guard: L management message
         L management actions, eventually setting TermCond_i to
         FALSE or TRUE, or BestSbp_i to some open subproblem
         and state to INACTIVE;
5.   guard: (BestSbp_i ≠ Nil)
         s ← BestSbp_i; BestSbp_i ← Nil;
         V ← δ(s);
         if (decomposition identified a feasible solution x ∈ s) then
             if (f(x) < U_i) then
                 Broadcast x;
                 U_i ← f(x); x_{U_i} ← x;
                 L_i.PRUN(x);
         Spontaneously request global insertion of V in L;
         state ← INACTIVE;
6.   guard: incumbent candidate x'
         if (f(x') < U_i) then
             U_i ← f(x'); x_{U_i} ← x';
             L_i.PRUN(x');
return x_{U_i};

Figure 3: Pseudo-code for parallel branch-and-bound under SDM.



2.2.2 Relaxed SDM


For the sake of flexibility, SDM could be relaxed in some of its constraints
in order to reduce the synchronization and communication overheads as-
sociated to the management of L. However, the guarantee that the exe-
cution follows an order of decompositions consistent with h would be lost.
Thus, one may seek with such relaxations a tradeoff between minimizing
time and search overheads. This relaxation consists of reducing the depen-
dency between partial lists, which is equivalent to assuming that G is no
longer a complete graph and that its structure defines the reach of the
selections. More precisely, let us number the selections performed by each
task as 0, 1, 2, ···. Then, reach_i(k) refers to the k-th selection of task t_i,
and indicates the set of local priority lists whose best subproblems are taken
into account in the comparisons defining the result of the selection. This
parameter should be included in the guarded commands 3 and 4 in Figure 3.
The reach of selections can be static or dynamic. In the static
reach case, reach_i(k) is defined to be N_i for all selections, where N_i stands
for the set of tasks adjacent to t_i in G. This implies that all selections take
into account all partial lists when G is a complete graph (the previous case,
with no relaxation), but only a subset of local priority lists otherwise. The
actions in the guarded commands 3 and 4 are not altered in this case, pro-
vided that they act only on the neighborhood. The effect on time overhead
of this relaxation is evidenced by the fact that a selection which involves a
smaller number of local priority lists incurs less synchronization and com-
munication. On the other hand, there is a price to pay in terms of search
overhead in case the local priority list containing the best subproblem does
not participate in a selection process. In such a situation, a low quality
subproblem may be selected, yielding a search overhead.
The dynamic reach case introduces a dynamic evolution in the depen-
dency relations between local priority lists. We keep the structure of G as
above, but the reach changes dynamically as a function of the selection. Thus,
reach_i(k) is defined to be a subset of N_i, and reach_i(k_1) may be different
from reach_i(k_2) if k_1 ≠ k_2. In this dynamic case, the guarded command 3
should be altered to request selections from L with reach_i(k), for each se-
lection k. Additionally, the actions in the guarded command 4 should act
only on the subset of N_i corresponding to reach_i(k). The time and search
overheads are affected by this second relaxation as in the previous case.
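
The two flavors of reach can be stated compactly as follows; this sketch is
ours, and the rotating rule used in the dynamic case is only one arbitrary
possibility among many.

# Sketch of static versus dynamic reach.  N[i] is the neighborhood N_i of
# t_i in G; the rotating subset below is only one possible dynamic rule.
def static_reach(N, i, k):
    return N[i]                            # reach_i(k) = N_i for every selection k

def dynamic_reach(N, i, k, size=2):
    ordered = sorted(N[i])                 # rotate through subsets of N_i
    start = k % len(ordered)
    return {ordered[(start + t) % len(ordered)]
            for t in range(min(size, len(ordered)))}

N = {0: {1, 2, 3}}                          # t_0's neighborhood in G
print(static_reach(N, 0, k=5))              # {1, 2, 3}
print(dynamic_reach(N, 0, k=0))             # {1, 2}
print(dynamic_reach(N, 0, k=1))             # {2, 3}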

2.2.3 Implementation Trends and Applications

The features discussed so far make DMPS implementations of high-level
(relaxed) SDM parallel B&B challenging, and such implementations have
been investigated from several points of view. The difficulty lies in determin-
ing ways to appropriately set the actions and conditions in Figure 3, mainly
when relaxations are concerned. A number of parallel implementations have
been reported in the literature. The vast majority of published implementa-
tions have focused on specialized algorithms for "pure" combinatorial prob-
lems. These are taken as examples because they can be implemented and
tested in order to capture the nature of parallel B&B algorithms without
too many distracting details specific to the problems.
Two SDM implementations were tested by Kumar, Ramesh and Nagesh-
wara Rao [44], one being synchronous and the other asynchronous. Both
implementations were done on a BBN Butterfly composed of up to 256 tasks.
The global data structure is physically distributed over the local memories of
the tasks, and the tasks use the facility, provided by the BBN Butterfly com-
puter, to access the local memory of another task through a packet-switched
network. The asynchronous version is called blackboard, and its principle
is similar to that of L. The problems tested are the traveling salesperson problem
and the vertex cover problem, whose instances were randomly generated. In
the synchronous version, a concurrent heap is used in order to improve the
performance of the global data structure operations. For a sufficiently large
instance of the traveling salesperson problem, this synchronous algorithm
attains good speedups. For the vertex cover problem, which is fine-grained,
the algorithm turned out to be inadequate. The asynchronous
implementations exhibited good speedups for both application problems.
Miller and Pekny also used a BBN Butterfly computer to implement
their algorithm (14 tasks) [58, 68]. It is an asynchronous parallel B&B un-
der SDM. The application used is the traveling salesperson problem. The
sequential B&B algorithm used is sophisticated, and involves the solution of
an assignment problem relaxation to compute the lower bound of subprob-
lems and subtour elimination heuristics to decompose a subproblem. The
decomposition operation is divided into two steps: the division of the subprob-
lem into a number of new subproblems, and the lower bound computation. The
GDS, in this case, keeps two priority lists of subproblems: a list of subprob-
lems to be divided (open subproblems) and a list of subproblems whose lower
bound remains to be computed (solved subproblems). During a selection, the GDS
provides an open subproblem if there is at least one; otherwise, it provides
a solved subproblem. Moreover, a patching algorithm is used from time to
time to find a feasible solution in a subproblem, which can update the upper
bound. The instances were generated randomly. The speedup values tended
to increase with the number of subproblems decomposed. For reduced num-
bers of decomposed subproblems, the speedup tended to approach one,
and deceleration speedup anomalies were detected. For instances where the
number of subproblems was sufficiently large, the speedups tended to ap-
proach p, and acceleration anomalies were verified.
A CM-5-oriented asynchronous implementation for the mixed integer
programming problem was proposed by Eckstein to assess the impact of
the resulting bottlenecks when G is a star [20]. This case is also referred
to as the centralized strategy, and the task at the center of the star is called
the central task. This strategy limits performance, since the latency time at each
iteration grows linearly with the number of tasks. Moreover, at least two
communications are necessary between each task and the central task per
decomposition (corresponding to the selection and insertions). During the
execution of the algorithm, a number of additional pieces of information are
associated to each subproblem in order to improve the performance of the
decomposition operation. However, Eckstein used the active message facility
of the CM-5 to prevent useless transportation of subproblems and excessive
amount of memory consumed at the central task. Problems from MIPLIB, a
public collection of problems derived from real industrial applications, were
used to check the efficiency of the algorithm in an "industrial strength"
environment. Up to 128 tasks have been used. The benefits of parallelism
tend to persist at least until there are about 100 subproblems for every task,
which is related to the coarse-grained feature of the problem. Impressive
speedup values were obtained for some instances, the most impressive one
for a problem whose estimated sequential time was 6.6 days and that was
solved in only 34.9 min with 128 tasks. The results were extended in [21]
with more efficient decompositions and lower bound calculations. With this
improvement in the sequential algorithm and a better scalar performance of
the version of the CM-5 used, most of the superlinear speedups disappeared.
The synchronized SDM and asynchronous relaxed SDM implementations
described by Correa and Ferreira in [13] approach the question of the rela-
tion between the order of selections and the possibility of overlap between
communication and computation. A distributed algorithm is used in both
the implementations to select the m (the number of tasks) best priority
subproblems in L. In the synchronized case, the latency time associated to selection
and insertion degrades the execution time. However, it yields a good order
of subproblem decompositions, where the total number of subproblems de-
composed decreases almost linearly with the number of tasks. This order
is then taken as a reference in the sense that it is desirable to obtain an
asynchronous execution (without synchronization points) that follows this
order. With this purpose, an asynchronous algorithm is proposed that can
be explained in terms of two waves propagating in the search tree. One wave
corresponds to the subproblems inserted with complete reach (containing all
local priority lists) and the other wave corresponds to the subproblems in-
serted with local reach (a task inserts only in its local priority list). The
decompositions of subproblems selected with local reach tend to move the
corresponding wave away from the other wave. The distance between the two
waves is controlled such that if it becomes greater than a predefined thresh-
old, then decompositions of subproblems selected with complete reach are
performed, bringing the two waves closer to each other. The application
used is the knapsack problem, with randomly generated hard instances. In
spite of the fine-grained nature of the problem, good speedups were obtained
up to 32 tasks.

2.2.4 Distributed Data Model


Contrary to SDM, which defines an abstract GDS, the DDM is a fully dis-
tributed high-level approach, briefly described in Figure 4. We can see in this
figure that each task t_i, i = 0, …, m-1, works by operating over its local pri-
ority list and by exchanging subproblems with other tasks (termed subproblem
migration), meaning that a subproblem is authorized to migrate from one
local priority list to another. Its simple nature might lead the reader to be-
lieve that parallel B&B is trivial in this case: each task sequentially searches
different and disjoint regions of the search space, without worrying about
"global" selections. Recall, however, that the structure of the search tree
is not known beforehand, and since subproblems are generated and selected
in an unpredictable way in parallel B&B algorithms, irregularities appear
during the parallel search. It is likely that a distributed algorithm will pro-
duce search overhead compared to the sequential algorithm, because a task
can be assigned to a region of the search tree which would not be considered
in the sequential case. In addition, a task can eventually run out of work
if the subproblems in the region assigned to it are solved to optimality or
eliminated due to a pruning operation.

[Figure 4 depicts p local priority lists L_0, L_1, …, L_{p-1}, one per task
q_0, q_1, …, q_{p-1}; each task performs selections/insertions (decompositions)
on its own list, and subproblems migrate between tasks.]

Figure 4: Overall description of the DDM with p tasks.

Special techniques must be used to address the problems related to the
irregularity of the parallel tree searching process. These problems are es-
sentially related to dynamic workload balancing, where the amount of work
must be evenly distributed over the tasks. The amount of work can be de-
fined in several different forms. The most obvious is to count the number of
subproblems in the local priority list (this is used for quantitative workload
balancing). However, several experiments have shown that, in many situa-
tions of interest, this dynamic workload balancing must take into account
the heuristic function h (qualitative workload balancing) to minimize search
overhead [3, 20, 21, 39, 42, 50]. A dynamic workload balancing strategy de-
fines a way to perform continuous monitoring of the amount of workload of
each task. Additionally, it defines a way to decide about migrations of sub-
problems in order to recover from some workload imbalance previously detected.
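
The two notions of workload, together with a simple recovery-style migra-
tion decision, can be sketched as follows; the sketch is ours, and the thresh-
old ratio is an illustrative assumption.

# Sketch of quantitative versus qualitative workload measures for DDM
# workload balancing; the migration threshold is illustrative only.
def quantitative_load(openset):
    return len(openset)                    # amount of work: number of subproblems

def qualitative_load(openset):
    # quality of work: best (smallest) priority h among the open subproblems
    return min(h for h, _ in openset) if openset else float('inf')

def should_migrate(my_set, peer_set, ratio=2.0):
    # recovery strategy: send work to a peer that is (almost) out of work
    return quantitative_load(my_set) > ratio * max(1, quantitative_load(peer_set))

mine, peer = [(5, 's1'), (7, 's2'), (9, 's3')], []
print(should_migrate(mine, peer))          # True: the peer has run out of work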
The algorithm in Figure 5 shows the main steps in a DDM implementa-
tion of parallel B&B. The main differences between this algorithm and that
of Figure 3 lie in the global data management. In Figure 5, the guarded
command 1 is activated by a message containing some information used
in the monitoring of the workload of the tasks. Such a message induces
some actions, which may include requesting subproblem migration from t_i
to some other task t_j, or from t_j to t_i, in order to restore workload balance.
The same type of actions may be induced by some local condition, which
corresponds to the guarded command 2.
Algorithm: parallel branch-and-bound under DDM, task t_i, 0 ≤ i ≤ m - 1
Input: initial subproblem S, initial incumbent x_0 and value U_0;
if (i = 0) then L_i.openset ← {S}; TermCond_i ← FALSE;
else TermCond_i ← TRUE;
U_i ← U_0; x_{U_i} ← x_0; TermCond ← FALSE;
while (TermCond = FALSE) do
     Guarded commands 1 and 2 of Figure 3;
1.   guard: workload monitoring message
         Workload monitoring actions, eventually requesting subproblem
         migration;
2.   guard: workload monitoring condition
         Spontaneous workload monitoring actions, eventually requesting
         subproblem migration;
3.   guard: set V of migrated subproblems
         L_i.INS(V) and update local workload measures;
4.   guard: subproblem migration request message or condition
         Get subproblems to migrate from L_i and send them;
         Update local workload measures;
5.   guard: (TRUE)
         s ← L_i.SEL();
         V ← δ(s);
         if (decomposition identified a feasible solution x ∈ s) then
             if (f(x) < U_i) then
                 Broadcast x;
                 U_i ← f(x); x_{U_i} ← x;
                 L_i.PRUN(x);
         L_i.INS(V);
         Update local workload measures;
     Guarded command 6 of Figure 3;
return x_{U_i};

Figure 5: Pseudo-code for parallel branch-and-bound under DDM.


The migration of subproblems is dealt with in guarded commands 3
and 4. In some situations, t_i may receive some migrated subproblems from
another task t_j, which may have been requested by t_i or spontaneously sent
by t_j. Such a situation activates the guarded command 3, and the consequences
in t_i are the insertion of the received subproblems in its local priority list
and the updating of local workload measures. Another situation of interest
that involves subproblem migrations occurs when guarded command 4 is
activated by a subproblem migration request. This request can be local
(some corresponding condition is detected) or originated by another task
(some corresponding message is received). In both cases, subproblems from
the local priority list are migrated, and the local workload measures are
updated.
Finally, guarded command 5 corresponds to the sequential B&B using
the local priority list. A task t_i selects open subproblems from, and inserts
the generated subproblems into, L_i. Similarly to the SDM case, feasible
solutions eventually encountered may update the incumbent in all tasks.

2.2.5 Implementation Trends and Applications


The most important question to be addressed in the context of DDM is
dynamic workload balancing. Three choices must be made. The first choice
consists of deciding between quantitative or qualitative workload balancing.
In the first case, the amount of work defines the workload of the tasks (e.g.,
the number of open subproblems in a local priority list). In the second case,
it is the quality of the work (generally given by the priority of the open
subproblems) that defines the workload. The second choice is whether to
prevent or to recover from workload imbalance. Finally, the workload balancing
actions can be of local or global reach. In what follows, we survey some of
the DDM implementations from the literature and their applications.
Besides the SDM implementations, Kumar, Ramesh and Nageshwara
Rao also experimented with two DDM implementations in [44]. In both
implementations, specialized workload balancing schemes were used, namely
the randomized strategy of Karp and Zhang [39] and a ring strategy. In
the ring strategy, the tasks are assumed to be connected in a virtual ring.
Each task periodically sends the newly generated subproblems to one of its
neighbors in the ring. In order to interpret the results obtained, we note
that the main difference between the two problems used (TSP and VCP)
is that, for the VCP, nearly 75 percent of the subproblems have the same
lower bound, while these values are more widely distributed in the TSP. Thus,
qualitative workload balancing is more important for the TSP than for the VCP.
The results confirm this claim. For the TSP, both DDM strategies are
worse than the SDM strategies. For the VCP, the SDM implementation also
yields better speedups, but the randomized scheme also gives good results.
Ma, Tsung and Ma studied two implementations of an asynchronous
parallel B&B under DDM [52]. In the first implementation, there is no spe-
cialized workload balancing scheme. In this case, subproblems are allocated
to the tasks, and the tasks work independently. On the other hand, in the
second implementation, when a task runs out of work, it requests work from
another busy task. The busy task estimates its remaining work in order to
decide whether or not to transfer some work to the idle task, which char-
acterizes a quantitative workload balancing scheme. Good speedup values
were attained with dynamic workload balancing and a resource scheduling
problem as target application (no details about the instance chosen are pro-
vided). The workload imbalance caused bad speedup values in the first
implementation.
Pardalos and Crouse used a quite sophisticated sequential B&B algo-
rithm to find all optimal solutions of a quadratic assignment problem [65].
This fact prevents the possibility of superlinear speedups in its parallel im-
plementation, since the asynchronous parallel B&B algorithm is based on
the sequential one, under DDM. A sufficiently large number of subproblems
is initially generated and evenly distributed over the tasks. The speedup is
measured for two classes of test problems used to evaluate the algorithm.
Good speedups were obtained only for sufficiently large instances.
The VCP was also used by Luling and Monien as an application in order
to investigate DDM implementations [51]. The emphasis of their proposal
is a specialized qualitative workload balancing scheme. They proposed some
weight functions to express the workload of a task. The workload is locally
controlled in such a way that a task and its neighbors have nearly the same
workload. A feedback scheme is used to dynamically update the admissible
difference of workload between tasks, to avoid thrashing effects and to control
the amount of communication. Using a variation of the VCP, known as
weighted vep, the instances were randomly generated with weights in a
large range. Two communication topologies were used, namely de Bruijn
and ring. For local workload balancing strategy, low diameter networks are
well adapted, thus the implementations in the de Bruijn network presented
428 R. C. Correa, A. Ferreira, and S. C. S. Porto

better results. Impressive good speedups were obtained with sufficiently


large instances, even up to 256 tasks.
Laursen used the QAP to compare asynchronous parallel B&B
implementations under DDM, with and without specialized dynamic work-
load balancing [47]. The emphasis in his work was on simplicity. He mea-
sures the average task utilization rate to determine efficiency. The QAP
being a very hard optimization problem, the sequential algorithm tends to
decompose a large number of subproblems. In the first approach, the original
problem is initially decomposed into an appropriate number of subproblems.
Then, the amount of work associated to each subproblem is estimated, and
the subproblems are distributed according to these estimations such that the
total amount of work is evenly distributed over the tasks. In the second ap-
proach, subproblems are exchanged between neighboring tasks at fixed periods
of time for dynamic workload balancing purposes. Good results were ob-
tained in both cases. Even the version without dynamic workload balancing
exhibited average task utilization rates between 80 and 90 percent, indicat-
ing that the tree search is quite "regular" in this case. The quantitative
workload balancing used is simple, and brings the average task utilization
to 98 percent.
In order to use a DDM to extend his previous SDM implementation with
the centralized strategy, Eckstein [20, 21] used a randomized workload balanc-
ing scheme proposed by Karp and Zhang [38, 39]. In this scheme, each task
randomly chooses, from time to time, a task to which to send a subproblem
just generated. Comparing the results obtained with this scheme with the
results under the SDM, an increase in the parallel execution time of about
30 percent on average was noticed. Investigating the total amount of work
accomplished by the tasks in both implementations, he surprisingly noticed
that this was typically smaller under the DDM, indicating that a certain
amount of randomness can be beneficial. An in-depth investigation of this
fact showed that the Karp-Zhang scheme does a fairly good job of balancing
workload quality, but is less effective at evening out its quantity, since more
task idleness was detected. Using the improved sequential implementation,
this scheme also exhibited a degradation in the parallel execution time (36
percent on average). The amount of work increased less than the parallel
execution time (25 percent on average). However, using in addition a special-
ized workload balancing scheme that organizes some of the tasks in a binary
tree, the speedup values became competitive with the corresponding values
using the centralized strategy. An interesting point to remark is that this
scheme used alone is worse than the randomized one in terms of speedup
(on average). Apparently, randomization removes the "easy" parts of the
task of workload balancing, and the specialized workload balancing scheme
improves the performance by taking care of the difficult aspects.

2.3 Some Remarks on Efficiency


In the light of the large number of empirical investigations on parallel B&B
algorithms found in the literature, it is clear that parallelism is useful to im-
prove search performance. However, a universal parallelization strategy
does not seem to exist. The algorithmic solutions to the major problems
posed by parallelism when one tries to parallelize a sequential B&B al-
gorithm for a given application seem to be problem-dependent and, not
rarely, instance-dependent. The SDM favors the qualitative distribution of
the subproblems over the processors, but tends to incur more overhead in
data structure operations. It is not indicated when qualitative workload
sharing is not crucial. However, even when the DDM is more indicated, spe-
cialized workload sharing is often necessary, and its nature depends on the
shape of the searched tree. When lower bounds vary over a large range, it
is more efficient to keep an order of selections relatively close to the order
of selections of the synchronized execution. Then, qualitative workload sharing
is more important, and some randomness appears to be useful for it. Other-
wise, quantitative workload sharing appears to be sufficient. Therefore, in the
context of "industrial strength" problems, the instances tend to present sev-
eral different B&B trees, and qualitative workload sharing seems to be necessary
as a general strategy.
Another crucial question that emerged concerns the impact of the se-
quential algorithm on the performance of its parallel counterpart. Appro-
priately ordering the search and restricting the region searched are the key
factors determining the efficiency of B&B algorithms. Every time line 1 in
Figure 1 is executed, the B&B algorithm must decide which path, among
several different paths (represented by the open subproblems), has the highest
probability of being a minimum path towards an optimal solution. The prior-
ity function h determines most of these choices in a selection, but ties must
be broken heuristically when two or more open subproblems have the same
priority. In particular, the efficiency of a sequential execution depends on
these tie breaks when some of the subproblems involved have a lower bound
greater than or equal to f*. This is because every subproblem s such that
l(s) < f* must be selected during a B&B execution, since it cannot be
eliminated by a lower bound test.
It appears that increasing the performance of the sequential algorithm
tends to decrease the performance of its parallel counterpart. However, the
effect on scalability is not clear (for details about scalability theory, see
[42]). In order to obtain good speedups, the number of decomposed sub-
problems in the sequential case must be large enough. Low speedups tend
to be more frequent with small problems. With large problems, on the other
hand, acceleration anomalies tend to be more frequent. Finally, the rela-
tion between parallel performance and machine characteristics, such as
communication and computation power, is not yet established in a clear and
systematic manner. From the asynchronous experiments, it seems important
to obtain overlap between communication and computation. The question of
completely modeling parallel B&B, with performance predictions, remains
open.

II - Metaheuristics

For NP-hard combinatorial problems, exact search algorithms can de-
generate into complete enumeration, with exponential increase in execu-
tion time in the worst case, as problem size increases. Assuming that
P ≠ NP, a way to partially overcome this difficulty is to adopt a relaxed
meaning for the expression "solving a DOP". While in Part I of this chapter
this expression was equivalent to "finding an optimal solution to a DOP", in
this part we review techniques used to obtain a sub-optimal solution when
we solve a DOP. A sub-optimal solution is a solution which is not demon-
strated to be non-optimal. In more concrete terms, a heuristic that solves a
DOP is a search algorithm that returns "the best solution it can find" at a
reasonable computational cost. This solution is not necessarily an optimal
one, and it is not even indicated how close to optimality it is, but better solu-
tions are unknown. Many heuristics are problem-specific, so that they work
for a particular DOP but cannot be used to solve other, different DOPs.
In this context, we employ the term metaheuristic to identify heuristics that
are not problem-specific, being applicable more generally to all DOPs.
Obtaining good solutions with a metaheuristic is often hampered by high
computational times, due to the great number of iterations required or to
the computationally intensive character of the search. Therefore, efficient
parallel implementations of the search algorithm can significantly increase
the size of the problems that can be tackled in plausible processing times.
In the following sections, we present parallel tabu search, simulated annealing,
canonical optimization and genetic algorithms.

3 Tabu Search
Tabu search [26, 27, 28, 29] is an adaptive higher-level heuristic for solving
DOPs, designed to guide other local search approaches to continue explo-
ration without becoming confounded by the absence of improving moves,
and without falling back into local optima from which they previously emerged.
This is accomplished using a certain number of memories. A great num-
ber of tabu search procedures may be derived using different strategies to
implement these memories. This tendency is emphasized when parallel im-
plementations are contemplated, which allow a more efficient exploration of
the solution space. Before discussing these parallel alternatives, we briefly
describe the main ideas behind sequential tabu search.

3.1 Preliminaries on Sequential Tabu Search


Local search approaches for solving DOPs are based on iterative search
procedures in the solution space S, starting from an initial solution x(0) ∈ S.
At each iteration k, a new solution x(k+1) in the neighborhood N(x(k)) of
the current solution x(k) is obtained. A neighborhood of a solution x ∈ S is
defined as a set of solutions that can be obtained through slight changes in x.
A move is an atomic change which transforms the current solution, x(k), into
one of its neighbors, say x'(k). Thus, the current solution moves smoothly
towards better neighboring solutions, according to the difference between the
value of the objective function after the move, f(x'(k)), and the value of the
objective function before the move, f(x(k)), enhancing the best obtained
solution x(k+1). This local search approach corresponds to the so-called hill-
descending algorithms, in which a monotone sequence of improving solutions
is examined, until a local optimum is found.
Broadly speaking, tabu search is a local search approach in which two
mechanisms are used to direct the search trajectory and may be viewed as
learning capabilities that gradually build up images of good or promising
solutions. These two mechanisms, which are described in the sequel, are
based on the concepts of solution or move attributes and search history.
A solution attribute is any component of a solution, such as an element
of a vector or a set of variables. Given a move from x(k) to x'(k), the
corresponding move attribute denotes the action performed to generate x' (k)
from x(k), and encompasses any aspect that changes as a result of the
move. At any point of the search, the search history is the set of move or solution
attributes that have participated in the moves performed so far. Exploiting the search
history to control the search process is at the basis of the two mechanisms
mentioned above and described next.

The first is intended to avoid cycling, through the use of tabu lists, which
work as short term memories that keep track of the recent search history,
classifying a subset of moves as prohibited (tabu). This classification de-
pends on the recency or the frequency with which certain move or solution
attributes have participated in already performed moves. There are situa-
tions in which the tabu classification of certain moves may be too restric-
tive, resulting in the prohibition of promising non-visited solutions. In these
cases, it may be interesting to de-activate tabu restrictions. A mechanism
called aspiration criteria is introduced in tabu search in order to identify
restrictions which may be temporarily relaxed [28].

The second mechanism to direct the search trajectory makes use of one
or several memories, to direct the search either into the exploration of a
promising neighborhood (intensification), or towards previously unexplored
regions of the solution space (diversification). The roles of intensification
and diversification in tabu search become especially relevant in longer term
search processes. Intensification strategies build new solutions by aggressively
encouraging the incorporation of "good attributes". Diversification strate-
gies instead seek to generate solutions which consist of attributes signifi-
cantly different from those encountered previously during the search. These
two types of strategies counterbalance and reinforce each other in several
ways [28].

The sequential tabu search algorithm is summarized in Figure 6. In
this pseudo-code, we make use of a generic variable H, which records the
various memories employed in tabu search as a representation of the search
history. Given a current solution x(k), the notation N(H, x(k)) stands for the
subset of solutions in the neighborhood of x(k) satisfying the restrictions
of H. These currently accessible solutions are evaluated with the function
f(H, x(k)), which is a modification of the objective function that has the
purpose of evaluating the relative quality of the solutions in N(H, x(k)). Fi-
nally, Candidate_N(x(k)) denotes a subset of N(H, x(k)), and is used to
isolate some currently accessible solutions in case N(H, x(k)) is too expensive
to examine entirely.
Algorithm: sequential tabu search

x ← some solution of S;
x_U ← x; U ← f(x);
while termination criterion is not satisfied do
    Determine Candidate_N(x) from N(H, x);
    x' ← some solution in Candidate_N(x) that minimizes
        f(H, x') over this set;
    Update H;
    if f(x') < U then
        x_U ← x'; U ← f(x');
    x ← x';
return x_U;

Figure 6: Pseudo-code for sequential tabu search.
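
A compact Python rendering of Figure 6 is given below; the sketch is ours,
and it instantiates H as a recency-based tabu list on move attributes (the
pair of swapped positions) for a toy permutation problem, with the classical
aspiration criterion that a tabu move is allowed when it improves on the
incumbent.

import itertools

# Sketch of Figure 6 for a toy problem: minimize the number of misplaced
# elements of a permutation under a swap-move neighborhood.  H is a
# recency-based tabu list on move attributes (the swapped positions).
def f(x):
    return sum(1 for i, v in enumerate(x) if v != i)

def tabu_search(x, tenure=5, max_iters=100):
    x_U, U = list(x), f(x)
    tabu = {}                              # move attribute -> iteration it is freed
    for k in range(max_iters):             # termination criterion: iteration cap
        best_move, best_val = None, float('inf')
        for i, j in itertools.combinations(range(len(x)), 2):
            y = list(x); y[i], y[j] = y[j], y[i]
            val = f(y)
            if tabu.get((i, j), -1) >= k and val >= U:
                continue                   # tabu, and aspiration does not apply
            if val < best_val:
                best_move, best_val = (i, j), val
        if best_move is None:
            break                          # every candidate move is tabu
        i, j = best_move
        x[i], x[j] = x[j], x[i]            # perform the move
        tabu[(i, j)] = k + tenure          # update the short term memory H
        if best_val < U:
            x_U, U = list(x), best_val     # new incumbent
    return x_U, U

print(tabu_search([3, 1, 2, 0]))           # ([0, 1, 2, 3], 0)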

3.2 Parallel Tabu Search

The design of parallel implementations of tabu search may use some basic
ideas derived from the work of Crainic et al. [16], whose main interest is
to take into account the differences in control and communication strate-
gies, which are so important when designing parallel algorithms. They have
proposed a taxonomy with a twofold basis: first, how the search
space is partitioned; and second, the control and communication strate-
gies used by parallel tabu search procedures. Therefore, how the knowledge
is gathered during the parallel exploration of the domain, and the way it
is exchanged and combined among tasks, are issues as important as how
the domain is divided among the various tasks. A framework for parallel
tabu search can be built according to three dimensions meant to capture all
these factors. The first two dimensions represent the parallelization schemes
relative to the control of the search trajectory and the communication ap-
proach, while the third accounts for the strategies used to partition the
domain and to specify the parameters for each search.
In the framework we describe in this section, each task t_i is assigned a
portion S_i of the search space S, and can carry out a search from an initial
solution x_i picked in S_i. We denote by Searcher the set of processors that
carry out a search, whose cardinality determines the number of trajectories
considered. If t_i ∈ Searcher, the candidates taken into account in each
move of the search in t_i include those in S_i and those in the portion of the
solution space of some of t_i's neighbors in G. This framework is summarized
in Figure 7. In what follows, we discuss how the guards in this pseudo-code
implement the three dimensions mentioned above.

Algorithm: parallel tabu search, task t_i, 0 ≤ i < m

if t_i ∈ Searcher then
    x_i ← some solution of S_i;
    x_{U_i} ← x_i; U_i ← f(x_i);
while (TermCond = FALSE) do
    guard: (TermCond_i = TRUE)
        Spontaneous termination actions;
    guard: termination message
        Termination actions, eventually setting TermCond to TRUE;
1.  guard: (state = INACTIVE) AND (t_i ∈ Searcher)
        Compute reach_i based on x_i, M_i and H_i;
        Spontaneously request the move attributes that minimize
            f(H, x_i) over reach_i;
        Determine Candidate_N_i(x_i) from N_i(H_i, x_i);
        x'_i ← some solution in Candidate_N_i(x_i) that minimizes
            f(H_i, x'_i) over this set;
        Update H_i;
        state ← WAITING;
2.  guard: H management message
        H management actions, eventually setting TermCond_i to
        FALSE or TRUE, or x'_i to a neighbor of x_i, state to
        ACTIVE, and updating H_i;
    guard: (state = ACTIVE)
        if f(x'_i) < U_i then
            x_{U_i} ← x'_i; U_i ← f(x'_i);
        x_i ← x'_i;
        state ← INACTIVE;
return x_{U_i};

Figure 7: Pseudo-code for parallel tabu search.

The control of the search trajectory involves two aspects. The first one
is the partition of the search history recorded in H. In the parallel case,
each task is responsible for recording the attributes corresponding to the
moves it performs. This "local history", for a task t_i, is represented by the
variable H_i. This variable is used and updated by t_i each time it computes
the candidates for a move from some solution (lines 1 and 2). The second
aspect is the definition of G, which determines the portions of the search
space a task may consider in a move.
We represent the partition of the domain by a family of sets of move
attributes. Let M denote the set of move attributes to be used in the
moves, and M_i denote the subset of move attributes assigned to task t_i.
This implies that Candidate_N_i(x) only contains solutions obtained from x
using the move attributes in M_i. Thus, in guard 1, t_i asks some of its
neighbors for their best moves from x_i, and searches for the best move in its
own Candidate_N_i(x_i). The new solution x'_i is finally determined in guard 2,
when t_i receives the best moves from x_i that it requested, and corresponds to the
best of all these moves.
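
The partition of the domain by move attributes can be illustrated as follows;
the sketch is ours, it reuses swap attributes (i, j) as move attributes, and
the round-robin split is arbitrary.

import itertools

# Sketch of partitioning the move attributes M among m tasks, each task
# evaluating only the slice M_i of the neighborhood induced by its attributes.
def partition_moves(n, m):
    moves = list(itertools.combinations(range(n), 2))   # all swap attributes (i, j)
    return [moves[i::m] for i in range(m)]              # M_0, ..., M_{m-1}

def best_local_move(x, f, M_i):
    # t_i's contribution to a global move: its best move restricted to M_i
    def value(move):
        i, j = move
        y = list(x); y[i], y[j] = y[j], y[i]
        return f(y)
    return min(M_i, key=value)

M = partition_moves(n=4, m=2)
x = [3, 1, 2, 0]
misplaced = lambda z: sum(1 for i, v in enumerate(z) if v != i)
print(best_local_move(x, misplaced, M[0]))   # t_0's best move: (0, 3)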
The communication approach, for each task t_i, also involves two aspects.
The first one, which we call the management of H in guard 2, is related to the
order in which t_i serves move requests. Since recency and frequency are parameters
defining the tabu list, this order has some influence on the trajectories.
The second aspect is the computation of reach_i, which is the variable that
indicates the set of neighbors t_j of t_i whose solutions Candidate_N_j(x_i)
are to be taken into account in a move. In guard 1, t_i requests the best
moves from the tasks in reach_i.

3.3 Parallelization Trends and Applications


Several implementations have been proposed for the parallelization of tabu
search. In this section, we list some of these implementations as illustrations
of how the framework of the previous section can be employed.
A tabu search scheme for the Quadratic Assignment Problem (QAP) is
presented by Taillard [74]. In this approach, the neighborhood is divided
into parts of about the same size, which are evaluated on different proces-
sors. Each processor computes the values of the moves assigned to it and
communicates to the others the best move it found, receiving the best moves
found by the other processors. Each processor chooses and performs the best
move proposed to (or by) it and updates the cost function and move value
accordingly. Computational results on a ring of transputers are presented.
Efficiency values of up to 85% are reported for 10 processors. Another par-
allelization strategy proposed in the same work consists of performing many
independent searches from different initial solutions.
Similarly in [70], Porto and Ribeiro study different synchronous strate-
gies for the parallel implementation of tabu search, taking the task schedul-
ing problem on heterogeneous processors under precedence constraints as
the framework for the development, implementation, validation, and perfor-
mance evaluation of different parallel strategies. Several strategies are pro-
posed, discussed and compared: the master-slave model, with two different
schemes for improved load balancing, and the single-program-multiple-data
model, with single-token and multiple-token schemes for message passing.
All strategies are synchronous and based on the decomposition of the neigh-
borhood of the current solution at each iteration. Their differences lie
exclusively in distinct information communication patterns between par-
allel tasks during execution. The computational results confirm the great
adaptability of this kind of algorithm to parallelization, showing that com-
munication (accomplished by message exchange) is not a burden to the
achievement of almost linear efficiency in the majority of the test problems.
The task scheduling problem considered in this study is characterized by
neighborhood structures which are very large and costly to explore. How-
ever, the speedups achieved through simple parallelization techniques made
possible the use of a less restricted neighborhood search, which in some cases
provided better solutions.
Garcia and Toulouse [25] proposed a tabu search approach for the vehicle
routing problem with time windows. An asynchronous algorithm was elabo-
rated based on the partition of the neighborhood among several processors.
The processors are organized in a master-slave scheme, although the com-
munication is performed using a tree based interconnection structure. The
master is responsible for choosing the best move at the end of each iteration.
The acceleration due to parallelization makes possible the exploration of a
wider neighborhood. In the asynchronous version interprocessor communi-
cation occurs exclusively in two situations: (i) when a processor finds a new
optimum and wants to make this public to other processors, and (ii) if some
processor has not improved its solution after a certain number of iterations,
then it will look for help from its peers, demanding their current optimal
solutions. The best solution found is never worse than that found by the
sequential version, as far as all processors execute the sequential version in-
dependently, exchanging optimal results from time to time. The difference
relies in the way each processor does its own sequential search. Features
such as initial solution and neighborhood size are varied to differentiate the
searches among processors.
Fiechter [24] proposed an efficient parallel tabu search algorithm for large
traveling salesperson problems. Two different approaches were considered,
both based on the general idea that the local search can be sliced into sev-
eral independent searches performed in parallel, without much loss of quality,
while tabu search assures a high global solution quality. First, the search
for the next move to be performed can be parallelized, requiring the parti-
tion of the set of feasible moves. The overall best move is then determined
and applied to the current solution. This technique requires extensive com-
munication, since synchronization is required at each step. It is therefore
only worth applying to problems in which the search for the best move is
relatively complex and time consuming. The second type of parallelization
consists of performing several moves concurrently by partitioning the prob-
lem itself into several independent subproblems. The global solution is then
obtained by combining the subsolutions. This method needs no communi-
cation or synchronization, except for its initialization and for grouping the
subsolutions at the end. The difficulty with this kind of parallelism is that
it strongly limits the move possibilities and thus generally induces a loss of
quality. High-level tabu search seems particularly well suited to overcome
this difficulty, as this type of parallelism can be used in the intensification
phase, ensuring the global quality of the final solution by a diversification
procedure. The intensification strategy consists in dividing the current tour
into several open subtours (slices), each of them having a vertex in common
with the two adjacent slices. The path between these two fixed vertices is
then optimized independently on each slice. Two passes of this scheme with
shifted slices are applied to allow other combinations of edges, particularly
around the boundaries between slices. The entire procedure is clearly suited
for parallel computation, as the optimization is done completely indepen-
dently on each slice. The algorithm has been implemented in OCCAM on
a network of transputers. The parallel algorithm has been tested for 500,
3000, and 10000 vertices, and the speedups have been observed to be close
to optimal.
In [75], Taillard presents a special method based on tabu search for the
job shop scheduling problem, considering both serial and parallel imple-
mentations. The goal is to find a schedule for the operations in the ma-
chines, considering the precedence restrictions, which minimizes the overall
makespan, i.e. the finish time of the last operation to be completed in the
schedule. The sequential case shows that tabu search is more efficient than
other methods previously proposed in the literature, such as simulated an-
nealing and the shifting bottleneck procedure. The addition of long-term
memory has proven easy to implement and very effective in enhanc-
ing solution quality for larger problems. For small problems, tabu search
is slower than the best known branch-and-bound methods. However, when
problem size increases, tabu search becomes more effective than any other
exact or heuristic method yet published. Two different approaches were
proposed and evaluated in diverse parallel environments, a transputer net-
work and a Cray-2, producing poor performance results due to large message
transmission delays and frequent synchronization, respectively. A third ap-
proach, based on a generic method of parallelizing probabilistic algorithms,
was considered and theoretically evaluated through several consecutive ex-
ecutions of the sequential tabu search algorithm. It was shown that paral-
lelization based on the execution of multiple independent search trajectories
is capable of producing almost linear speedup. The tabu search approach
finds new and better solutions for every problem in two sets of benchmark
problems.

4 Annealing-Based Methods
The ideas that form the basis of simulated annealing and other annealing-
based methods are based on an algorithm to simulate the cooling of material
in a heat bath - a physical process known as annealing. This type of simu-
lation is used to search the feasible solutions of a DOP, with the objective
of converging to an optimal solution [19].

4.1 Sequential Simulated Annealing


This approach can be regarded as a variant of the well-known heuristic tech-
nique of local search, in which a subset of the feasible solutions is explored
by repeatedly moving from the current solution to a neighboring solution.
In the simulated annealing heuristic, as in tabu search, uphill moves are al-
lowed in order to avoid a premature convergence to a local optimum. How-
ever, their frequency is governed by a probability function that changes as
the algorithm progresses [19]. The inspiration for this form of control was
Metropolis' work in statistical thermodynamics [56], based on laws which
state that at temperature t, the probability of an increase in energy of mag-
nitude ΔE is given by

    p(ΔE) = exp(-ΔE / kt),                                   (2)
where k is a physical constant known as Boltzmann's constant.
Metropolis' simulation generates a perturbation and calculates the re-
sulting energy change. If energy has decreased, the system moves to this
new state. If energy has increased, the new state is accepted according to
the probability given in the previous equation. The process is repeated for
a predetermined number of iterations at each temperature, after which the
temperature is decreased until the system freezes into a steady state.
Kirkpatrick et al. [40] and Cerny [7] independently showed that the
Metropolis algorithm could be applied to optimization problems by map-
ping the elements of the physical cooling process onto the elements of a
DOP as follows: system states correspond to feasible solutions; the energy
represents the solution cost; a change of state corresponds to a move to a
neighboring solution; the temperature is the control parameter; and the
frozen state is the heuristic's final solution. Thus any local optimization
algorithm can be converted into an annealing algorithm by sampling the
neighborhoods randomly and allowing the acceptance of an inferior solution
according to the probability given in equation (2). The level of acceptance
of uphill moves then depends on the magnitude of the increase in the
objective function, on the temperature, and on the elapsed search time.
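This conversion can be sketched in a few lines of Python. Here cost and random_neighbor are hypothetical problem-specific helpers, and the geometric cooling schedule and parameter values are illustrative assumptions.

    import math
    import random

    def simulated_annealing(x0, cost, random_neighbor,
                            t0=100.0, alpha=0.95, iters_per_temp=100,
                            t_min=1e-3):
        # Sketch of turning a local search into an annealing algorithm;
        # cost and random_neighbor are supplied by the problem at hand.
        current = best = x0
        t = t0
        while t > t_min:
            for _ in range(iters_per_temp):
                candidate = random_neighbor(current)
                delta = cost(candidate) - cost(current)
                # Metropolis criterion of equation (2): downhill moves are
                # always accepted; uphill moves are accepted with probability
                # exp(-delta/t) (Boltzmann's constant absorbed into t).
                if delta <= 0 or random.random() < math.exp(-delta / t):
                    current = candidate
                    if cost(current) < cost(best):
                        best = current
            t *= alpha  # geometric cooling: an assumption for this sketch
        return best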

4.2 Parallelization Trends and Applications


The suitability of the annealing process for parallelization has been the sub-
ject of much recent research, especially in applications such
as image processing and VLSI design [19].
Aarts and Korst [1] identify three ways in which parallelism may be in-
troduced into the annealing process. The simplest is to allow different pro-
cessors to proceed with annealing using different streams of random numbers
until the temperature is about to be reduced. At this point, the best result
from all the processors is chosen and all processors restart the search at the
new temperature from this common solution. This will result in significantly
different chains of solutions when the temperature is high, while when the
temperature is low and there is less flexibility, it is likely that the processors
will end up with solutions which are similar in terms of the neighborhood
structure and their cost [19].
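A sketch of this first strategy in Python follows; the toy objective, the cooling parameters, and the use of a process pool are assumptions introduced only for concreteness, since Aarts and Korst describe the scheme abstractly.

    import itertools
    import math
    import random
    from concurrent.futures import ProcessPoolExecutor

    # Toy objective, included only to make the sketch self-contained:
    # minimize a rugged function of an integer in 0..999.
    def cost(x):
        return (x - 400) ** 2 % 7919 + abs(x - 400)

    def neighbor(x, rng):
        return max(0, min(999, x + rng.choice([-3, -2, -1, 1, 2, 3])))

    def anneal_chain(args):
        # One processor's chain at a fixed temperature, using its own
        # stream of random numbers (a distinct seed per worker).
        x, t, iters, seed = args
        rng = random.Random(seed)
        for _ in range(iters):
            y = neighbor(x, rng)
            delta = cost(y) - cost(x)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x = y
        return x

    def parallel_annealing(x0, n_workers=4, t0=50.0, alpha=0.9,
                           t_min=0.5, iters_per_temp=200):
        x, t = x0, t0
        seeds = itertools.count()
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            while t > t_min:
                args = [(x, t, iters_per_temp, next(seeds))
                        for _ in range(n_workers)]
                # When the temperature is about to be reduced, the best
                # result over all processors is chosen and becomes the
                # common restart solution, as in the first strategy above.
                x = min(pool.map(anneal_chain, args), key=cost)
                t *= alpha
        return x

    if __name__ == "__main__":
        best = parallel_annealing(0)
        print(best, cost(best))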
A second strategy is to use the processors to generate random neighbors
and test for acceptance independently. Once one processor finds a neighbor
to accept then this is communicated to all other processors and the search
moves to the neighborhood of the new current solution. This strategy will
be wasteful at high temperatures when almost all solutions are accepted,
as many processors are likely to find an acceptable solution but only one
will be used. On the other hand, at low temperatures when the majority of
solutions are rejected this will speed up the search considerably. Thus, the
best approach would appear to be to start with the first strategy until the
ratio of rejections to acceptances exceeds a certain level and then to switch
to the second strategy [19].
Finally, the third strategy consists of allowing all processors to act on
a common current solution independently, each one generating a neighbor
and updating the solution if the neighbor is accepted. However, it is possible
that this will result in two moves being made which both give an improve-
ment when considered independently but result in an uphill move if they
are both carried out. This approach has been used successfully for a school
timetabling problem by Abramson [2], where a neighboring solution is de-
fined as one obtained by moving a class from one time slot to another. The
cost function is made up of the weighted sum of penalties representing differ-
ent forms of clashes and feasibility violations. When a processor considers
a move, the two time-slots involved are locked from the other processors.
Thus all the clashes which are involved in the resulting change in the cost
function are also locked. The other processors will be working on unrelated
time-slots, so that the changes in cost they calculate will be independent
of those calculated by any other and the problems described above will not
arise. This locking process does have an overhead and Abramson reports
that for a given number of processors there is a threshold value on the num-
ber of time-slots below which it is better to use a sequential approach [19].
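The essence of the locking scheme can be sketched as follows; the slot count, the helper try_move, and the use of Python threads are illustrative assumptions rather than a reconstruction of Abramson's implementation, and a move is assumed to involve two distinct time-slots.

    import threading

    # Each worker locks the two (distinct) time-slots involved in a
    # proposed move, so that concurrent workers operate on unrelated
    # slots and the cost changes they compute remain independent.
    N_SLOTS = 40
    slot_locks = [threading.Lock() for _ in range(N_SLOTS)]

    def locked_move(slot_a, slot_b, try_move):
        first, second = sorted((slot_a, slot_b))  # fixed order avoids deadlock
        with slot_locks[first]:
            with slot_locks[second]:
                return try_move(slot_a, slot_b)   # evaluate and maybe apply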
In [41], Knopman and Aude analyse alternatives for the parallelization of
the simulated annealing algorithm when applied to the placement of modules
in a VLSI circuit. It is shown that different parallelization approaches have
to be used for high and low temperature values of the annealing process. The
algorithm used for low temperatures is an adaptive version of the speculative
algorithm proposed in the literature. Within this adaptation, the number
of moves evaluated per processor between synchronization points changes
with the temperature. At high temperatures, an algorithm based on the
parallel evaluation of independent chains of moves has been adopted. It
is shown that results of the same quality as those produced by a serial
version can be obtained when shorter chains are used in the parallel
implementation.

4.3 Microcanonical Optimization


A variation of the simulated annealing method is the so-called microcanonical
optimization (referred to hereafter as µO), which was proposed by Torreao
and Roe [77] for image processing applications, and later refined and em-
ployed by Linhares [49] in the solution of the TSP, yielding signif-
icant results. The basic algorithm is based on principles of statistical me-
chanics and consists of two iterative procedures, initialization and sampling,
which are alternately applied. The initialization procedure implements
an iterative improvement search, in order to approach a local minimum
solution, while the sampling procedure tries to free itself from that local
minimum, at the same time keeping close to it, in terms of the value of the
objective function.
The initialization phase searches randomly through the solution space
for a lower-value solution. It may be seen as a "hill-descending" procedure,
since, at each iteration, a move is randomly proposed which is accepted only
if it imposes a value decrease on the current solution. The goal here is to
quickly approach a local-minimum solution. Optionally, an aggressive imple-
mentation of this phase can be chosen, meaning that the algorithm, at each
iteration, will pick the best candidate in a subset of possible moves. Dur-
ing the initialization, a list of the moves rejected for leading to higher-value
solutions is compiled, to be used in the subsequent sampling phase. The ini-
tialization ends when a certain number of consecutive moves, maxmove_init,
have been rejected, meaning that the algorithm is close to a local minimum.
In the sampling phase, µO aims at freeing itself from the local mini-
mum reached in the initialization, at the same time trying not to stray too
much, in terms of value, from that solution. Considering a three-dimensional
space as a metaphor of the search space, one may envision the µO heuris-
tic as trying to get "around the hill", instead of "hill descending", in order
to break free from the local minimum. This is achieved by implementing
the so-called Creutz algorithm of statistical physics [17], where an extra
degree of freedom - called the demon - generates controlled disturbances
(moves) on the current solution. At each sampling iteration, the randomly
proposed move will only be accepted if the demon can supply or receive
the value variation implied by that move. The demon thus restricts the
maximum value variation which is allowed if a move is to be effected. It
is defined by two parameters: its capacity, d_max, and its initial value d_i.
The sampling phase generates a sequence of solutions of fixed value, except
for small fluctuations which are modeled by the demon. Calling x_i the so-
lution obtained in the initialization, and d and x, respectively, the value
of the demon and the solution at a given instant in the sampling, we will
have f(x) + d = f(x_i) + d_i = constant. Thus, the sampling phase gen-
erates solutions in the value interval [f(x_i) - d_max + d_i, f(x_i) + d_i], with
d_i, d_max << f(x_i).
Therefore, d_i and d_max are the main parameters of the sampling phase.
They are determined as follows. The list of rejected moves, compiled in the
initialization, is sorted in increasing order of the value jumps, and two of its
lower entries are chosen as the demon capacity and initial value. The idea
is that such values will be representative of the hills found in the landscape
of the solution space, in the region being searched, thus being adequate
for defining the magnitude of the perturbations required for the evolution
of the current solution in the sampling phase. This phase stops when a
given number of iterations, maxiter_samp, has been reached, after which a
new initialization procedure is performed. The algorithm thus proceeds,
alternating the two phases, until a stopping condition (such as a certain
number of iterations without global improvement, maxiter_alg) is satisfied.
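The two alternating phases can be summarized in a short sequential Python sketch. Here cost and random_move are hypothetical problem-specific helpers, the choice of the two lowest rejected value jumps for d_i and d_max follows the description above, and treating the global stopping condition as a count of phases without improvement is a simplifying assumption.

    import random

    def microcanonical_optimization(x0, cost, random_move,
                                    maxmove_init=50, maxiter_samp=500,
                                    maxiter_alg=20):
        x = best = x0
        phases_without_improvement = 0
        while phases_without_improvement < maxiter_alg:
            # Initialization: random "hill-descending" until maxmove_init
            # consecutive rejections (we are then near a local minimum).
            rejected_jumps = []
            consecutive = 0
            while consecutive < maxmove_init:
                y = random_move(x)
                delta = cost(y) - cost(x)
                if delta < 0:
                    x, consecutive = y, 0
                else:
                    rejected_jumps.append(delta)
                    consecutive += 1
            # Demon parameters: two lower entries of the sorted list of
            # rejected value jumps give the initial value d_i and capacity.
            rejected_jumps.sort()
            d = rejected_jumps[0]
            d_max = rejected_jumps[min(1, len(rejected_jumps) - 1)]
            # Sampling (Creutz dynamics): f(x) + d stays constant; a move
            # is accepted only if the demon can supply or absorb the
            # implied value variation.
            for _ in range(maxiter_samp):
                y = random_move(x)
                new_d = d - (cost(y) - cost(x))
                if 0 <= new_d <= d_max:
                    x, d = y, new_d
            if cost(x) < cost(best):
                best, phases_without_improvement = x, 0
            else:
                phases_without_improvement += 1
        return best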
The µO heuristic shows good adaptability towards parallelization, due
to its alternating two-phase structure and to the randomness of its move se-
lection procedure, this latter aspect also being responsible for its controlled
diversification characteristic, as shown by Porto, Torreao and Barroso [71].
One may envision different possibilities for parallel strategies based on two
distinct parallelization approaches [71], namely: Neighborhood-Partitioning
(NP) and Multi-Threading (MT). Neighborhood-Partitioning represents the
class of strategies where p processes start the search from a single common so-
lution, a neighborhood partition (a subset of the neighbor solutions) is given
to each process at the beginning of each iteration and each process performs
the local search strictly over this previously determined partition during
this iteration. Variations of this approach, for example, are based on (i) the
partitioning scheme which defines the subset of neighbor solutions that each
process will work on and (ii) the heuristic parameter settings employed by
each process. Taking into account this latter feature, similarly to tabu
search in [16], one may define two different trends: Single-Parameter-Setting
(SPS) and Multiple-Parameter-Setting (MPS). In a single parameter setting,
all processes use the same parameter values, while in the multiple parameter
setting processes have different values for the heuristic parameters.
The Multi-Threading strategy, on the contrary, is initiated from different
starting solutions, at most one for each of the p parallel processes. Each
process performs the iterative search over the entire neighborhood of its
own starting solution point. Again in this case, it is possible to establish
variations based on the heuristic parameter settings determined for each
process, namely the SPS and MPS approaches previously mentioned.
Moreover, as µO is composed of initialization and sampling phases, the
parallel µO algorithm is composed of a parallel initialization phase and a
parallel sampling phase. Between phases, one may implement what we
call an interphase point, during which a certain communi-
cation pattern may take place between processes before starting the next
phase. The intensity of the communication depends directly on the parallel
strategies implemented for both phases. As both phases are consecutive to
each other, two distinct interphase points are present. An interphase point
in some cases plays the role of a synchronization point, but in other cases
may impose no waiting barrier on the parallel processes. In
this sense, when building a parallelization strategy for the µO heuristic, one
must specify: (i) a parallel strategy for the initialization phase, (ii) a parallel
strategy for the sampling phase, and (iii) the process communication pattern
at the interphase points.
Following these potential µO parallelization trends, Porto et al. [71] de-
veloped two distinct parallel implementations for the task scheduling prob-
lem with the µO heuristic, both based on a master-slave scheme where p
processes execute alternate parallel versions of the initialization and sam-
pling phases, coupled at a synchronization point coordinated by the master
process. In the first implementation, the parallelization strategy of the
initialization phase is based on a MT-SPS approach, while the strategy
used for the sampling phase is based on a NP-SPS approach. The first
interphase point, between initialization and sampling, determines a synchro-
nization point, where the master gathers results from all the processes and
determines the starting solution of the following sampling phase. The second
interphase point, between sampling and initialization phases, has no commu-
nication between processes. Thus processes continue from sampling to the
next initialization phase without any delay.
During initialization, each process executes the random "hill-descending"
search procedure, previously described, over the entire solution space. At
the synchronization point, one of the processes, called the master process,
receives the results of the initialization phase from all processes and selects
the best solution. If the master concludes that the search should proceed
(stopping conditions have not been attained), it broadcasts the selected
solution to all processes. Thus, every process will start the sampling phase
from the same solution. However, differently from the initialization phase,
the solution space is divided into p disjoint and equally sized regions, and
each process is made responsible for the search over one particular region
during the sampling phase. A region of the solution space is defined by a
given subset of the tasks (subject to the scheduling) which are allowed to
move (from one processor to another) during the heuristic search. In order
to assign similar workloads to each process, the subset of tasks is randomly
selected by the master in the beginning of each sampling phase. There is no
synchronization between the sampling phase and the following initialization
phase. Each process starts a new initialization phase from the solution
obtained during its previous sampling phase.
The second implementation distinguishes itself from the previous strat-
egy due exclusively to the parallelization approach employed during the
sampling phase. In this case, the processes still start from the best solution
found during the initialization phase, but the neighborhood is not partitioned
among them. All processes execute their sampling phase with freedom
to perform any feasible trial move, without the restrictions imposed in the
MT-NP parallel version. Initialization phase and interphase points remain
unchanged.

5 Genetic Algorithms
A genetic algorithm is a guided random search method [30, 33, 57, 18] where
elements (called individuals) in a given set of solutions (called population)
are randomly combined and modified (these operations are called crossover
and mutation, respectively) until some termination condition is achieved.
The population evolves iteratively (in the genetic algorithm terminology,
through generations) in order to improve a given cost function (the fitness
in the genetic algorithm terminology) of its individuals. The fitness of an
individual s_1 is said to be better than the fitness of another individual s_2 if
the solution corresponding to s_1 is closer to an optimal solution than that of
s_2. In each iteration, the crossovers generate a new population in which the
individuals are supposed to keep the good characteristics of the individuals
of the previous generation.
Thus, a genetic algorithm starts with a randomly generated initial pop-
ulation that evolves through generations; the ability of an individual to
span different generations and to reproduce depends on its fitness. In what follows, we
review the operators that compose a genetic algorithm.
The selection operator allows the algorithm to take biased decisions fa-
voring good individuals when changing generations. For this, some of the
good individuals are replicated, while some of the bad individuals are re-
moved. As a consequence, after the selection, the population is likely to
be "dominated" by good individuals. Starting from a population PI of size
Selected Algorithmic Techniques for Parallel Optimization 445

S, this transformation is implemented iteratively by generating a new pop-


ulation Pi+ 1 of the same size as Pi, following the roulette wheel method:
Initially, the best individual of PI is replicated, with only one copy kept in
Pi and the other inserted in P2 . Then, a probability, Pi, of being chosen is
assigned to each individual, based on its relative fitness, iI = '2:.f:i11;' such
that Pi = k At each iteration, we randomly select an individual 81 E PI,
according to Pi. Finally, 81 is duplicated into a new individual 8~, and Sl
is kept in Pi while s~ is inserted into Pi +1. This process is repeated until
PHI reaches the size of Pi. This method is based on stochastic sampling
with replacement and each individual can be selected more than once or not
at all, the more fit being more likely to be chosen. Thus, some individuals
(hopefully the less fit) are eliminated from generation to generation.
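A direct rendering of this selection operator in Python, assuming non-negative fitness values that are not all zero:

    import random

    def roulette_wheel_selection(population, fitness):
        # Build P_{i+1} from P_i by stochastic sampling with replacement;
        # one copy of the best individual is kept, as described above.
        scores = [fitness(ind) for ind in population]
        next_pop = [max(population, key=fitness)]
        while len(next_pop) < len(population):
            # random.choices samples according to p_j = f_j / (f_1 + ... + f_S)
            next_pop.append(random.choices(population, weights=scores, k=1)[0])
        return next_pop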
Genetic algorithms are based on the principles that crossing two indi-
viduals can result in offspring that are better than both parents, and that
a slight mutation of an individual can also generate a better individual.
The crossover takes two individuals of a population as input and generates
two new individuals by crossing the parents' characteristics. Hence, the
offspring keep some of the characteristics of the parents. The mutation
randomly transforms an individual that was also randomly chosen. It is im-
portant to notice that the population size stays the same across generations.
Therefore, it is desirable that "bad" individuals generated by the crossover
and mutation operators tend to be eliminated, while "good" individuals tend
to survive and to reproduce. Thus, the selection operator eliminates some
individuals with poor fitness from generation to generation.
The structure of the algorithm (see Figure 8) is a loop composed of a
selection followed by a sequence of crossovers and a sequence of mutations.
Let the population be randomly divided into pairs of individuals. The sequence
of crossovers corresponds to the crossover of each such pair. After the
crossovers, each individual of the new population is mutated with some (low)
probability. This probability is fixed at the beginning of the execution and
remains constant. Moreover, the termination condition may be the number
of iterations, execution time, results stability, etc.

5.1 Parallel Trends

Different approaches exist for designing parallel genetic algorithms [46, 34,
8, 37]. They can be divided into two classes, as follows.
Algorithm: Sequential genetic algorithm

t ← 0;   % the generation counter
Choose initial population P_t;
Evaluate the fitness for all elements of P_t;
while (not finished) do
    t ← t + 1;
    Select P_t;
    Crossover of the elements in P_t;
    Mutation of the elements in P_t;
    Evaluate the fitness for all elements of P_t;
return the best individual;

Figure 8: A classic genetic algorithm.
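For concreteness, the loop of Figure 8 can be made runnable as follows; the bit-string encoding, max-ones fitness, one-point crossover, and bit-flip mutation are illustrative assumptions made only to complete the sketch.

    import random

    def classic_ga(pop_size=30, length=20, p_mut=0.01, generations=100):
        fitness = lambda ind: sum(ind)   # toy fitness: count the 1 bits
        pop = [[random.randint(0, 1) for _ in range(length)]
               for _ in range(pop_size)]
        for _ in range(generations):
            # Selection: roulette wheel (sampling with replacement).
            weights = [fitness(ind) for ind in pop]
            if sum(weights) == 0:
                weights = None           # fall back to uniform selection
            pop = [ind[:] for ind in
                   random.choices(pop, weights=weights, k=pop_size)]
            # Crossover: pair up individuals and swap tails at a random cut.
            random.shuffle(pop)
            for a, b in zip(pop[0::2], pop[1::2]):
                cut = random.randrange(1, length)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            # Mutation: flip each bit with some (low) probability.
            for ind in pop:
                for j in range(length):
                    if random.random() < p_mut:
                        ind[j] ^= 1
        return max(pop, key=fitness)

    print(classic_ga())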

5.1.1 Centralized Genetic Algorithms


These strategies usually consider the population as a whole and execute
the cost and fitness function evaluations concurrently (the two fitness
evaluation steps in Figure 8). Other steps, such as selection and crossover,
could also be parallelized, depending on their sequential time consumption
[32]. The best paradigm to
implement this approach is master-slave, where the master holds the entire
population and gives the slaves work when needed [72]. Several modes for
the exchange of data have been proposed. They can be synchronous, semi-
asynchronous or asynchronous [30]. One of the advantages of this method is
that the best individual is always available at the master. Evidently, if
all functions are cheap to compute, there is no advantage to this approach.
Thus, we can say that it is very problem-dependent.
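In the master-slave paradigm the parallel step is essentially a map over the population. A minimal sketch, assuming the fitness function is expensive enough to justify farming it out and is defined at module level (so it can be pickled):

    from concurrent.futures import ProcessPoolExecutor

    def evaluate_population(population, fitness, n_slaves=4):
        # The master holds the population and farms the fitness
        # evaluations out to slave processes; results are returned
        # in population order.
        with ProcessPoolExecutor(max_workers=n_slaves) as pool:
            return list(pool.map(fitness, population))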

5.1.2 Distributed Genetic Algorithms


The distributed approach considers the whole population as the union of
breeding sub-populations of the same size, called islands (whence the name
island model), in which the classic genetic algorithm is run [31, 69, 10, 76, 6].
In this case, the sub-populations evolve independently and occasionally in-
teract with other sub-populations for mating, according to specific policies.
A notion of neighborhood among the islands is defined,
and individuals are exchanged between neighbor islands, through a migra-
tion process, according to some predefined schedule and to a mode in which
sub-populations are updated, synchronously or asynchronously.


Figure 9 shows a generic distributed genetic algorithm, where the use
of these rules for interaction, described below, becomes clear. Notice that
different choices define myriads of "new" genetic algorithms.

Grain policy: The size of a sub-population (or island) may vary from a
single individual (fine-grained case) [61, 31, 72, 55] to sizes comparable
to those of sequential genetic algorithms (coarse-grained case) [76, 69].
This rule is used in Step 1 of the algorithm.

Neighborhood policy: Each island should exchange individuals for breed-
ing at time steps defined by the schedule policy. These exchanges take
place with neighbor islands. Such a neighborhood is defined according
to an underlying graph, where the islands are mapped to the ver-
tices. Two islands are then neighbors if they are connected by an
edge. Graphs typically used are the complete graph [66], where all
islands are neighbors of each other, and the grid [53, 73] and the
hypercube [76] graphs, because they reflect the architecture of many
parallel computers. This rule is used in Steps 2, 3, 4 and 5 of the
algorithm.

Schedule policy: This policy defines when individuals should migrate
and/or when individuals coming from neighbor islands should be taken
into account in the selection process [79]. This rule is used in Steps 4
and 5 of the algorithm.

Migration policy: Once the schedule policy is defined, the type of indi-
viduals that should migrate to neighbor islands must be determined.
They can be, for instance, the best [79, 69] or randomly chosen [10]
individuals from each island. This rule is used in Steps 4 and 5 of the
algorithm.

Mode policy: This rule is used in Steps 4 and 5 of the algorithm, and,
typically, there are two main possible implementations:

Synchronous: Each sub-population evolves according to the classic
genetic algorithm for a number of iterations, defined by the sched-
ule policy. Then, all islands synchronize and exchange some in-
dividuals, as defined by the migration policy [79, 10].
Asynchronous: Once again, all sub-populations run the classic ge-
netic algorithm. However, the exchanged individuals arrive in a
mailbox and are taken into account for breeding according to the
schedule policy [31, 60].

Algorithm: Parallel genetic algorithm, task t_i, 0 ≤ i ≤ m - 1

t ← 0;   % the generation counter
1. Choose initial sub-population P_t^i (according to the grain policy);
Evaluate the fitness for all elements of P_t^i;
TermCond ← FALSE; TermCond_i ← FALSE;
while (TermCond = FALSE) do
2.  guard: (TermCond_i = TRUE)
        Spontaneous termination actions;
3.  guard: termination message
        Termination actions, eventually setting TermCond to TRUE;
4.  guard: some condition depending on the schedule and mode policies
        Choose and send individuals according to the migration and
        mode policies;
5.  guard: receive individuals depending on the schedule and
        mode policies
        Incorporate the received individuals according to the migration
        and mode policies;
    guard: TRUE
        t ← t + 1;
        Select P_t^i;
        Crossover of the elements in P_t^i;
        Mutation of the elements in P_t^i;
        Evaluate the fitness for all elements of P_t^i;
return the best individual;

Figure 9: A generic distributed genetic algorithm.
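A synchronous, coarse-grained rendering of the island model may look as follows; the ring neighborhood, binary tournament selection, toy fitness, and migration rule are illustrative choices for the policies above, not a reconstruction of any particular published algorithm.

    import random

    def island_model_ga(n_islands=4, pop_size=20, length=20,
                        generations=60, migration_interval=10):
        fitness = lambda ind: sum(ind)   # toy max-ones fitness
        islands = [[[random.randint(0, 1) for _ in range(length)]
                    for _ in range(pop_size)] for _ in range(n_islands)]

        def one_generation(pop):
            # Binary tournament selection plus a low-rate bit-flip
            # mutation, kept short for the sake of the sketch.
            pop = [max(random.sample(pop, 2), key=fitness)[:] for _ in pop]
            for ind in pop:
                if random.random() < 0.2:
                    ind[random.randrange(length)] ^= 1
            return pop

        for g in range(1, generations + 1):
            islands = [one_generation(pop) for pop in islands]
            if g % migration_interval == 0:
                # Synchronous mode: all islands exchange at the same time;
                # each island's best replaces a random individual on the
                # next island of the ring.
                bests = [max(pop, key=fitness) for pop in islands]
                for i, pop in enumerate(islands):
                    pop[random.randrange(pop_size)] = bests[i - 1][:]
        return max((max(pop, key=fitness) for pop in islands), key=fitness)

    print(island_model_ga())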

6 Conclusion
We reviewed in this chapter the literature pertinent to modeling, perfor-
mance characterization and implementation of parallel algorithms for solving
discrete optimization problems in distributed memory systems. In partic-
ular, we described issues arising in the parallelization of one exact method
(the branch-and-bound) and four metaheuristics, namely tabu search, sim-
ulated annealing, microcanonical optimization, and genetic algorithms.
With the advent of parallel systems composed of networks of PCs, in-
terconnected by an ultra-fast LAN (Myrinet, Fast-Ethernet, etc.), larger
optimization problems will be within reach of a good solution in a reason-
able computing time. Therefore, it is of tremendous importance to design
and experimentally analyze algorithmic techniques to run in such high per-
formance systems. We hope that this chapter will help the interested reader
to find inspiration in this challenging field.

Acknowledgments
The authors are grateful to Panos Pardalos for his constant motivation.

References
[1] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, (Wiley, Chichester, 1989).
[2] D. Abramson, Constructing school timetables using simulated annealing: Sequential and parallel algorithms, Management Science Vol.37 (1991) pp. 98-113.
[3] G. Ananth, V. Kumar, and P. Pardalos, Parallel processing of discrete optimization problems, Encyclopedia of Microcomputers Vol.13 (1993) pp. 129-147.
[4] S. Arvindam, V. Kumar, and V. N. Rao, Efficient parallel algorithms for search problems: Applications in VLSI CAD, in Proceedings of the Frontiers 90 Conference on Massively Parallel Computation (1990).
[5] V. Barbosa, Introduction to Distributed Algorithms, (The MIT Press, 1996).
[6] R. Battiti and G. Tecchioli, Parallel biased search for combinatorial optimization: Genetic algorithms and tabu, Microprocessors and Microsystems Vol.16 (1992) pp. 351-367.
[7] V. Cerny, A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm, Journal of Optimization Theory and Applications Vol.45 (1985) pp. 41-55.
[8] A. Chipperfield and P. Fleming, Parallel genetic algorithms, in A. Zomaya (ed.) Parallel and Distributed Computing Handbook, (McGraw-Hill, 1996) Chapter 39, pp. 1118-1143.
[9] J. Clausen, Do inherently sequential branch-and-bound algorithms exist? Parallel Processing Letters (1994).
[10] J. Cohoon, S. Hedge, W. Martin, and D. Richards, Punctuated equilibria: A parallel genetic algorithm, in Proceedings of the Second Int. Conf. on Genetic Algorithms, (MIT, Cambridge, 1987) pp. 148-154.
[11] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms, (The MIT Press, McGraw-Hill, 1990).
[12] R. Correa, Recherche Arborescente Parallele: de la Formulation Algorithmique aux Applications, PhD thesis (Institut National Polytechnique de Grenoble, France, 1997).
[13] R. Correa and A. Ferreira, A distributed implementation of asynchronous parallel branch-and-bound, in A. Ferreira and J. Rolim (eds.) Parallel Algorithms for Irregular Problems: State of the Art, (Boston, Kluwer Academic Publisher, 1995) Chapter 8, pp. 157-176.
[14] R. Correa and A. Ferreira, On the effectiveness of parallel branch and bound, Parallel Processing Letters Vol.5 No.3 (1995) pp. 375-386.
[15] R. Correa and A. Ferreira, Parallel best-first branch-and-bound in discrete optimization: A framework, in A. Ferreira and P. Pardalos (eds.) Solving Combinatorial Optimization Problems in Parallel, volume 1054 of LNCS State-of-the-Art Surveys, (Springer-Verlag, 1996) pp. 171-200.
[16] T. Crainic, M. Toulouse, and M. Gendreau, Towards a taxonomy of parallel tabu search algorithms, Technical Report CRT-933, (Centre de Recherche sur les Transports, Universite de Montreal, 1993).
[17] M. Creutz, Microcanonical Monte Carlo simulation, Physics Review Letters Vol.50 (1983) p. 1411.
[18] L. Davis (ed.), Handbook of genetic algorithms, (New York, Van Nostrand Reinhold, 1991).
[19] K. Dowsland, Simulated annealing, in C. R. Reeves (ed.) Modern Heuristic Techniques for Combinatorial Problems, Advanced Topics in Computer Science, (Blackwell Scientific Publications, 1993) pp. 20-69.
[20] J. Eckstein, Control strategies for parallel mixed integer branch and bound, in Proceedings of Supercomputing (1994).
[21] J. Eckstein, Parallel branch-and-bound algorithms for general mixed integer programming on the CM-5, SIAM Journal on Optimization Vol.4 No.4 (1994) pp. 794-814.
[22] A. Ferreira and P. Pardalos (eds.), Solving Combinatorial Optimization Problems in Parallel: Methods and Techniques, volume 1054 of LNCS State-of-the-Art Surveys, (Springer-Verlag, 1996).
[23] A. Ferreira and J. Rolim (eds.), Solving Irregular Problems in Parallel: State of the Art, (Boston, Kluwer Academic Publisher, 1995).
[24] C.-N. Fiechter, A parallel tabu search algorithm for large traveling salesman problems, Discrete Applied Mathematics Vol.51 (1994) pp. 243-267.
[25] B. Garcia and M. Toulouse, A parallel tabu search for the vehicle routing problem with time windows, Computers and Operations Research Vol.21 (1994) pp. 1025-1033.
[26] F. Glover, Tabu search - part I, ORSA Journal on Computing Vol.1 (1989) pp. 190-206.
[27] F. Glover, Tabu search - part II, ORSA Journal on Computing Vol.2 (1990) pp. 4-32.
[28] F. Glover and M. Laguna, Tabu search, in C. R. Reeves (ed.) Modern Heuristic Techniques for Combinatorial Problems, Advanced Topics in Computer Science, (Blackwell Scientific Publications, 1993) pp. 70-150.
[29] F. Glover, E. Taillard, and D. de Werra, A user's guide to tabu search, Annals of Operations Research Vol.41 (1993) pp. 3-28.
[30] D. Goldberg, Genetic algorithms in search, optimization, and machine learning, (Addison-Wesley, 1989).
[31] M. Gorges-Schleuter, Explicit parallelism of genetic algorithms through population structures, in Proceedings of the First Conference on Parallel Problem Solving from Nature - PPSN I, volume 496 of Lecture Notes in Computer Science, (Springer-Verlag, 1990) pp. 150-159.
[32] R. Hauser and R. Manner, Implementation of standard genetic algorithm on MIMD machines, in Proceedings of the Third Conference on Parallel Problem Solving from Nature - PPSN III, volume 866 of Lecture Notes in Computer Science, (Springer-Verlag, 1994) pp. 504-513.
[33] J. Holland, Adaptation in natural and artificial systems, (Ann Arbor, University of Michigan Press, 1975).
[34] K. Homqvist, A. Migdalas, and P. Pardalos, Parallelized heuristics for combinatorial search, in A. Migdalas, P. Pardalos, and S. Storoy (eds.) Parallel computing in optimization, (Kluwer Academic Publishers, 1997) pp. 269-294.
[35] T. Ibaraki, The power of dominance relations in branch-and-bound algorithms, Journal of the ACM Vol.24 No.2 (1977) pp. 264-279.
[36] T. Ibaraki, Enumerative approaches to combinatorial optimisation, Annals of Operations Research Vol.11 No.1-4 (1988).
[37] P. Jog, J. Suh, and D. van Gucht, Parallel genetic algorithms applied to the traveling salesman problem, SIAM Journal of Optimization Vol.1 No.4 (1991) pp. 515-529.
[38] R. Karp and Y. Zhang, A randomized parallel branch-and-bound procedure, in Symposium on Theory of Computing (1988) pp. 290-300.
[39] R. Karp and Y. Zhang, Randomized parallel algorithms for backtrack search and branch-and-bound computations, Journal of the ACM Vol.40 No.3 (1993) pp. 765-789.
[40] S. Kirkpatrick, C. Gelatt, and M. Vecchi, Optimization by simulated annealing, Science Vol.220 (1983) pp. 671-680.
[41] J. Knopman and J. Aude, Parallel simulated annealing: An adaptive approach, in International Parallel Processing Symposium, (Geneva, 1997) pp. 522-526.
[42] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, (The Benjamin/Cummings Publishing Company, Inc., 1994).
[43] V. Kumar and L. Kanal, The CDP: a unifying formulation for heuristic search, dynamic programming, and branch-and-bound, in National Conf. on A.I. (1983).
[44] V. Kumar, K. Ramesh, and V. N. Rao, Parallel best-first search of state-space graphs: A summary of results, in Proceedings of the 1988 National Conf. on Artificial Intelligence (1988) pp. 122-127.
[45] T. Lai and S. Sahni, Anomalies in parallel branch-and-bound algorithms, Communications of the ACM Vol.27 (1984) pp. 594-602.
[46] P. Laursen, Parallel heuristic search - introduction and a new approach, in A. Ferreira and P. Pardalos (eds.) Solving Combinatorial Optimization Problems in Parallel, volume 1054 of Lecture Notes in Computer Science, (Springer-Verlag, 1996) pp. 248-274.
[47] P. S. Laursen, Simple approaches to parallel branch and bound, Parallel Computing Vol.19 (1993) pp. 143-152.
[48] G. Li and B. Wah, Coping with anomalies in parallel branch-and-bound algorithms, IEEE Transactions on Computers Vol.C-35 No.6 (1986) pp. 568-573.
[49] A. Linhares, Microcanonical optimization applied to the travelling salesperson problem (in Portuguese), M.Sc. dissertation (Applied Computing & Automation, Universidade Federal Fluminense, 1996).
[50] R. Luling and B. Monien, Load balancing for distributed branch & bound algorithms, in International Parallel Processing Symposium (Beverly Hills, 1992) pp. 543-549.
[51] R. Luling and B. Monien, Load balancing for distributed branch & bound algorithms, Technical Report Nr. 114, (Universitat Gesamthochschule Paderborn, 1993).
[52] R. Ma, F. Tsung, and M. Ma, A dynamic load balancer for a parallel branch and bound algorithm, in 1988 ACM Conf. on Lisp and Funct. Prog. (1988) pp. 1505-1513.
[53] B. Manderick and P. Spiessens, Fine-grained parallel genetic algorithms, in Proceedings of ICGA 3 (1989) pp. 428-433.
[54] B. Mans, Contribution a l'Algorithmique Non Numerique Parallele: Parallelisation de Methodes de Recherche Arborescente, PhD thesis (Universite Paris VI, 1992).
[55] T. Maruyama, T. Hirose, and A. Konagaya, A fine-grained parallel genetic algorithm for distributed parallel systems, in Proceedings of ICGA 5 (1993) pp. 184-190.
[56] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equation of state calculations by fast computing machines, Journal Chem. Phys. Vol.21 (1953) pp. 1087-1092.
[57] Z. Michalewicz, Genetic algorithms + Data structures = Evolution programs, (Springer-Verlag, 1994).
[58] D. Miller and J. Pekny, Results from a parallel branch and bound algorithm for the asymmetric traveling salesman problem, Operations Research Letters Vol.8 (1989) pp. 129-135.
[59] L. Mitten, Branch-and-bound methods: General formulation and properties, Operations Research Vol.18 (1970) pp. 24-34. Errata in Operations Research Vol.19 (1971) p. 550.
[60] H. Muhlenbein, Parallel genetic algorithms, population genetics and combinatorial optimization, in J. Becker, I. Eisele, and F. Mundemann (eds.) Parallelism, Learning, Evolution, number 565 of Lecture Notes in Artificial Intelligence, (Springer-Verlag, 1989) pp. 398-406.
[61] H. Muhlenbein and J. Kindermann, The dynamics of evolution and learning: Towards genetic neural networks, in R. P. et al. (eds.) Connectionism in Perspective, (North-Holland, 1989) pp. 1753-197.
[62] D. Nau, V. Kumar, and L. Kanal, General branch and bound, and its relation to A* and AO*, Artificial Intelligence Vol.23 (1984) pp. 29-58.
[63] G. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, (John Wiley and Sons Interscience, 1988).
[64] C. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, (Englewood Cliffs, Prentice Hall Inc., 1982).
[65] P. Pardalos and J. Crouse, A parallel algorithm for the quadratic assignment problem, in Proceedings Supercomputing 1989 (1989) pp. 351-360.
[66] P. Pardalos, L. Pitsoulis, T. Mavridou, and M. Resende, Parallel search for combinatorial optimization: Genetic algorithms, simulated annealing, tabu search and GRASP, in A. Ferreira and J. Rolim (eds.) Parallel Algorithms for Irregularly Structured Problems, volume 980 of Lecture Notes in Computer Science, (Springer-Verlag, 1996) pp. 317-331.
[67] J. Pearl, Heuristics - Intelligent Search Strategies for Computer Problem Solving, (Reading, Addison-Wesley, 1984).
[68] J. Pekny and D. Miller, A parallel branch and bound algorithm for solving large asymmetric traveling salesman problems, Mathematical Programming Vol.55 (1992) pp. 17-33.
[69] C. Petty, M. Leuze, and J. Grefenstette, A parallel genetic algorithm, in Proceedings of the Second Int. Conf. on Genetic Algorithms, (Cambridge, MIT, 1987) pp. 155-161.
[70] S. Porto and C. Ribeiro, Parallel tabu search message-passing synchronous strategies for task scheduling under precedence constraints, Journal of Heuristics Vol.1 (1996) pp. 207-233.
[71] S. Porto, J. Torreao, and A. Barroso, A parallel microcanonical optimization algorithm for the task scheduling problem, in Metaheuristics International Conference (1997).
[72] G. Robertson, Parallel implementation of genetic algorithms in a classifier system, in L. Davis (ed.) Genetic Algorithms and Simulated Annealing, (London, Pitman, 1987) pp. 129-140.
[73] P. Spiessens and B. Manderick, A massively parallel genetic algorithm: implementation and first analysis, in Proceedings of ICGA 4 (1989) pp. 279-286.
[74] E. Taillard, Robust tabu search for the quadratic assignment problem, Parallel Computing Vol.17 (1991) pp. 443-455.
[75] E. Taillard, Parallel taboo search techniques for the job shop scheduling problem, ORSA Journal on Computing Vol.6 (1994) pp. 108-117.
[76] R. Tanse, Parallel genetic algorithms for a hypercube, in Proceedings of the Second Int. Conf. on Genetic Algorithms, (Cambridge, MIT, 1987) pp. 177-183.
[77] J. Torreao and E. Roe, Microcanonical optimization applied to visual processing, Physics Letters A Vol.205 (1995) pp. 377-382.
[78] H. Trienekens, Parallel Branch and Bound Algorithms, PhD thesis (Erasmus University, 1990).
[79] D. Whitley, T. Starkweather, and K. Mathias, Optimization using distributed genetic algorithms, in Proceedings of the First Conference on Parallel Problem Solving from Nature - PPSN I, volume 496 of Lecture Notes in Computer Science, (Springer-Verlag, 1990) pp. 134-144.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 457-541
©1998 Kluwer Academic Publishers

Multispace Search for Combinatorial Optimization


Jun Gu
Dept. of Electrical and Computer Engineering,
University of Calgary,
Calgary, Alberta T2N 1N4, Canada
E-mail: j.gu@computer.org

Contents

1 Introduction 458

2 Natural Basis 460
2.1 Evolution with Dissipative Structures 461
2.2 Structural Qualities are Fundamental 461
2.3 Phase Transition and Special Conditions 463
2.4 Symmetrical Interactions and Homogeneity 463

3 Multispace Search 464
3.1 Is Value Search Sufficient? 464
3.2 Multispace Search 465

4 Structural Multispace Operations 471
4.1 Multispace Scrambling 471
4.2 Simulated Evolution 471
4.3 Extradimension Transition 472
4.4 Search Space Smoothing 474
4.5 Multiphase Search 477
4.6 Passage from Local to Global Optimization 478
4.7 Preprocessing 480
4.8 Tabu Search 482
4.9 Perturbations 482

5 Multispace Search for Graph Partitioning 486
5.1 Previous Work 487
5.2 A Multispace Search Algorithm for Graph Partitioning 488
5.3 Experiment Results 494

6 Search Space Smoothing for Traveling Salesman Problem 501
6.1 Traditional Approaches 501
6.2 A Multispace Search Algorithm for Traveling Salesman Problem 504
6.3 Experimental Results 508

7 Local Search for Scheduling and Task Assignment 512
7.1 DAG Scheduling 513
7.2 Local Search for Scheduling and Task Assignment 515
7.3 Random Local Search Algorithm 519
7.4 Local Search with Topological Ordering 520
7.5 Performance Study 525

8 Summary 529

References

1 Introduction
Search problems are ubiquitous. The search process is an adaptive process
of cumulative performance selection. The structure of a given problem and
the environment impose constraints. With the given constraints, a search
process transforms a given problem from an initial state to a solution state.
The search process seems to be a combination of art and ad hoc tech-
niques. Interestingly enough, regardless of its mathematical formulation,
the search process can be viewed as a natural process. Natural sciences
provide a general understanding of search processes. Physical laws and life
evolution shed light on alternative methods for solving a search problem.
Many search and optimization methods have been developed in combi-
natorial optimization, operations research, artificial intelligence, and neural
networks. A traditional search algorithm seeks a value assignment to vari-
ables such that the given constraints are satisfied and the performance crite-
ria are optimized. The algorithm proceeds by changing the values assigned
to the variables in the value space. A value search algorithm does not make
full use of the problem structure and the search space structure, making it
difficult to handle the pathological phenomena that occur in many combina-
torial optimization problems.
In this chapter, we present a new optimization approach, multispace
search, for combinatorial optimization [27, 33, 39]. A multispace search al-
gorithm not only alters values in the value space but also scrambles across
other active search spaces, dynamically changing the problem structure re-
lated to variables, parameters, and other components, and incrementally
building up the final solution to the search problem. The idea of multispace
search was derived from some principles in nonequilibrium thermodynamic
evolution that structural changes are more fundamental than the quantita-
tive changes and evolution depends on the growth of new structure rather
than just information transmission.
Structural multispace operations empower a search process with an infor-
mation flux which is sustained by a sequence of stepwise structural transfor-
mations. A number of multispace operations, such as multispace scrambling,
simulated evolution, extradimension transition, search space smoothing, multiphase
search, passage from local to global optimization, and preprocessing, have
been proposed. They offer alternative ways of handling difficult optimiza-
tion problems. We have tested multispace search with nontrivial case studies
in combinatorial optimization and practical applications. The experimen-
tal results indicate that multispace search outperforms the traditional value
search methods for various NP-hard combinatorial optimization problems.
The rest of this chapter is organized as follows. In the next section, we
give a brief discussion of the natural basis of search process. This discussion
suggests that significant structural changes to the problem configuration in-
troduce important information into solving a search problem. In Section 3,
we describe the ideas, major components, and basic procedure of multispace
search. Section 4 describes a number of multispace operations including mul-
tispace scrambling, simulated evolution, extradimension transition, search
space smoothing, multiphase search, passage from local to global optimiza-
tion, preprocessing, and perturbation.
Multispace search has been used widely in solving various practical ap-
plication problems in computer-aided design, computer-aided manufactur-
ing, database engineering, robotics, VLSI circuit design, image and object
recognition, computer architecture design, computer network design, and
real-time systems [27, 33, 39, 50]. From Section 5 to Section 7, we will
give three representative case studies of using multispace search for solving
combinatorial optimization problems.
• Section 5 discusses the graph partitioning problem, which is a basis of
logic partitioning in VLSI circuit design. A multispace search algorithm
was developed; it dynamically reconstructs the graph structure, destroys
the environment that forms local minima, and finds better-quality
solutions.

• Section 6 investigates multispace search for the traveling salesman prob-
lem (TSP). In the objective function space, we alter the shape of the
objective function to smooth the rugged terrain surface structure in the
search space. This has improved solution quality significantly.

• Section 7 gives a fast local search algorithm for scheduling and task
assignment problems. In each search step, pseudo edges are added to a
task graph dynamically to enforce topological ordering constraints. We
applied this algorithm to schedule computing tasks onto multiprocessor
computers. It can effectively reduce the scheduling lengths produced
by the best-known scheduling algorithms.

2 Natural Basis
Natural processes evolve with a certain aspect of performance optimization.
Many laws of science can be rewritten as the outcome of an optimization
process.
In physics, a mechanical system moves along the path of least resis-
tance (Maupertuis - Lagrange's least action principle). In chemistry, an
isolated system of randomly interacting molecules evolves irreversibly to-
ward its state of maximum entropy. In economics, utility maximization
is evaluated to achieve improved cost-effectiveness. In biology, an individ-
ual organism able to maximize its fitness to the environment would survive
(Darwin's natural selection). Further, it is interesting to note that the nat-
ural selection operates in such a way that the path followed as a population
changes from one state to another is the one that minimizes the total genetic
variance over the path (Svirezhev's least variance principle).
In [39], we briefly overview some central constructs and background of
nonequilibrium thermodynamic evolution which are closely related to ther-
modynamical and biological evolution. The study of natural basis of com-
putation is an interesting and difficult subject [25, 57]. In this section, in
conjunction with a search process, we briefly and informally discuss some
general emergent properties of thermodynamic and life evolution processes.
2.1 Evolution with Dissipative Structures


Biology and ecology are areas with evidence of compliance with the prin-
ciples of Prigogine's nonequilibrium thermodynamics [81]. An individual
biological organism, which is far from equilibrium, functions as a dissipative
structure. A living organism shows lower entropy and higher order than
the nonliving components in nature. From organisms to ecosystems we find
dissipative structures: they evolve from chemical elements and, through
energy flux transformations, develop useful genetic materials that reproduce
and metabolize into highly organized systems via stepwise energy trans-
formations. They use energy from outside systems to subsidize their highly
ordered states of life [91, 76]. It is the flux of energy from the sun or from
chemical reactions that sets the process of life in motion and maintains it.
Evolution is a cumulative, self-organizing process that deals with the dis-
sipation of order, energy fluxes, and entropy of organization. The dissipative
structures self-organize at the expense of dissipating some of the energy and
matter that flows through them. Influxes of energy and matter are required
to sustain the building of internal structure, and the structures thus built
lead to more efficient interactions between the system and the environment.
A search algorithm is a symbolic expression of a search process. Consid-
ering a final global optimum solution as an equilibrium state of a search
process, the search process starting from the initial state may be
viewed as a dissipative structure arising under nonequilibrium conditions.
There is much "disorder" between the initial state and the final solution
state. Nonequilibrium thermodynamic evolution suggests an alternative
approach to solving a search problem, i.e., we may derive an information flux
from a sequence of stepwise structural transformations and interplay it with
a search process. To maintain the vital structural information from one in-
termediate structure to another, it is desirable to gradually reduce the amount
of disorder between successive intermediate steps, facilitating the incremental
building of a highly "organized" solution. Eventually this will lead the search
process from its initial problem state to the final solution state. It is this
gradual interaction between information flow and cumulative performance
selection that would produce a highly organized final solution structure.

2.2 Structural Qualities are Fundamental


Darwin's evolution theory was dominant: evolutionary progress depends on
maximum structural adaptation of a biological organism to the environment
- survival of the fittest. Later, with the rise of the neo-Darwinian synthe-
sis, and especially with the successes of molecular biology, the genetic code
came to be seen as the key to understanding the nature and evolution of
biological order.
Biological organisms and the material substance of organisms do not evolve.
What evolves is a historical sequence of forms, each causally related to its
predecessor. The continuity between forms is provided by the information
transmitted to successors through replication. The medium of transfor-
mation is basically genetic code, the physical embodiment of information
responsible for biological organization.

Compared to neo-Darwinism, nonequilibrium thermodynamic evolution


is a notable advance since it addresses time direction and inherent structure.
Instantaneous natural selection is not so important other than to produce
grist. Evolution is not simply a duplication process, it progresses by adding
new grains to past structure over time. Functional structures are the mean-
ingful characteristics of organisms. An increase in functional order involves
the incorporation of new information. Evolution depends on the growth of
structure in biological systems rather than just changes in numerical frequen-
cies, and it recognizes that structural qualities follow an entirely different
path than quantitative change. Old elements in the system are rearranged or
substituted with some new element so as to preserve more information or
structure at the same or less energetic cost [73]. The structures thus built
lead to more efficient interactions between the system and the environment.

Evolution involves not only information transmission, but also changes
and increases in that information due to structural changes in the organisms.
This is captured by Dollo's law: successor forms do not repeat any of their
predecessors. As is highly confirmed by observation, the same species does
not evolve, go extinct, and then reappear at some later date
when environmental conditions are right for it. Species are not analogous
to alleles, and speciation is more than natural selection. Darwin's natural
selection does not imply either directionality in time or inherent ordering
forces for biology. Dollo's Law, which is highly confirmed by observation,
does embody both irreversibility and a general increase in complexity during
evolution.

Darwin's natural evolution suggests cumulative performance selection
(environmental ordering). Dollo's law emphasizes the information increase
due to structural changes (irreversibility and inherent ordering). They form
two complementary aspects of designing a natural search algorithm.
2.3 Phase Transition and Special Conditions


A phase transition, i.e., phase transformation, is a transition of a substance
from one phase to another. The distinction is made between the first-order
and the second-order phase transitions.
A first-order phase transition is one in which the changes in internal en-
ergy and density suffer discontinuous jumps. First-order phase transitions
are always associated with the evolution or absorption of heat called the
heat of phase transition. The thermodynamic potential of the system does
not change in a first-order phase transition. Examples of such phase tran-
sitions are vaporization, fusion, sublimation and many transitions of solid
bodies from one crystalline modification to another. A second-order phase
transition is one in which there are no discontinuous jumps in the changes
of internal energy and density. The heat of a second-order phase transition
equals zero. Such a transition is accompanied, however, by discontinuous
jumps in the heat capacity, coefficient of thermal expansion and isothermal
compressibility. Examples of second-order phase transitions are the transi-
tion of liquid helium to the superfluid state and the transition of a ferromag-
netic substance into a paramagnetic one at the Curie temperature.
It has been noted that certain critical conditions in the transitions pro-
duce significant changes to the physical system. At the Curie temperature,
which is the critical transition point for a ferromagnetic substance, the ferro-
magnetic properties of the crystals are lost and the structure of the crystal
lattice is changed, as are the heat capacity, electric conductivity, and other
physical characteristics. The above situations also hold for the antiferro-
magnetic properties.
Phase transition phenomena suggest that we should design an algorithm
able to adapt to different phases of a search process. Furthermore, we should
consider special conditions/moments to incorporate structural changes into
a search process.

2.4 Symmetrical Interactions and Homogeneity


Symmetrical interaction in nonequilibrium systems may be regarded as giv-
ing rise to a state of homogeneity, whereas asymmetrical interaction results
in heterogeneity. On the other hand, homogeneity results in symmetrical
interaction and least dissipation or least entropy production; heterogeneity
results in increased entropy production.
It has frequently been observed that there are two antagonistic processes

operating simultaneously over the course of succession. Individual species


proceed toward a state of least dissipation but interaction among species
tends to stimulate an increase in the rate of dissipation. Initially, these two
processes are isolated. Over the time of evolution, the dominant species
that are least affected by interactions with others will take a path of least
dissipation. Eventually, the trend to reduced dissipation exceeds that toward
increasing energy flow.
Succession does not show any permanent development of homogeneity.
Evolution results in a long-term increase in diversity and an increase in
the number of asymmetrical interactions. Despite occasional fluctuations,
evolution ultimately indicates an ascendancy of the trend toward increasing
diversity and acceleration of the energy flow, counteracted and retarded
by the individual species attempting to proceed in the direction of greater
homogeneity and deceleration of the energy flow. The increase of diversity
emerges as the increase of information (not "disorder").
Interactions in the evolving process suggest potential ways of de-
signing multispace search algorithms.

3 Multispace Search
Physical laws and the process of life evolution offer a natural basis for the develop-
ment of multispace search. By using an information flux to empower a search
process, we hope that a highly organized "structure" - a better solution
point - can be found effectively.

3.1 Is Value Search Sufficient?


An optimization problem has four components: variables, values, constraints,
and performance criteria. The goal is to find an assignment of values to vari-
ables such that all constraints are satisfied and the performance criteria are
optimized. Present optimization algorithms search by changing the value
assignment to variables iteratively. Typically, in seeking a vector that solves
the optimization problem, an initial vector x_0 is selected and the algorithm
generates an improved vector x_1. The process is repeated and an even
better solution x_2 is found. Continuing in this fashion, a sequence of ever-
improving points x_0, x_1, ..., x_k, ..., is found that approaches a solution point
x*.
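
As a minimal sketch of this iterative value search loop (in Python; the objective f and the neighborhood function neighbors are illustrative placeholders, not components of any specific algorithm in this chapter):

def value_search(x0, f, neighbors, max_iters=1000):
    # Iterative value search: repeatedly move to an improving value
    # assignment until no neighbor improves the objective.
    x = x0
    for _ in range(max_iters):
        best = min(neighbors(x), key=f, default=None)
        if best is None or f(best) >= f(x):
            return x        # no improving neighbor: a (local) minimum
        x = best            # x_{k+1} improves on x_k
    return x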
A search process resembles an evolution process. A value assignment may
be viewed as a form of the "genetic code," and value changes provide the quantitative
numerical variations to the solution sequence. Important structural changes


which can provide significant information for search problem solving have
not been fully used. Traditional value search methods are not sufficient
to handle difficult optimization problems. Consider the following several
situations:

• The solution quality of a search algorithm is closely related to the


initial solution. It is difficult, however, for a value search method to
decide which initial solution point to start with.

• Various value search methods have been proposed to find the descent
direction. In many cases the best descending direction (that yields the
greatest descent) often leads to the search quickly becoming stuck at a local minimum.

• When it gets stuck at local minima, a value search has difficulty
proceeding further. Although there are effective structures that could
help, a value search cannot explore or make use of these structures.

3.2 Multispace Search


Multispace search is a new optimization approach developed to handle dif-
ficult situations in combinatorial optimization. It can improve the perfor-
mance of the traditional value search methods [27, 33, 50, 49].
In multispace search, any active component related to the given problem
structure can be manipulated and thus can be formulated as an independent
search space. For a given optimization problem's variables, values,
constraints, objective functions, and key parameters (that affect the problem
structure), we define a variable space, a value space (i.e., the traditional
search space), a constraint space, an objective function space, a parameter
space, and other search spaces, respectively. The totality of all the search
spaces constitutes a multispace (see Figure 1).
The basic idea of multispace search is simple. Instead of being restricted
in one value search space, the multispace is taken as the search space. In
the multispace, components other than values can be manipulated and op-
timized as well. During the search, a multispace search algorithm not only
alters values in the value space, it also walks across the variable space and
other active spaces, dynamically changing the problem structure related to
variables, constraints, parameters, and other components, and systemati-
cally constructing a sequence of intermediate problem instances. Each in-
termediate problem instance is solved by an optimization algorithm and the

(Figure 1 sketch: the multispace comprises the value space, the variable
space, the objective space, the constraint space, and the parameter space,
with search paths crossing between them.)

Figure 1: In the value space, a traditional search process (dashed line)


cannot pass a "wall" of high cost search states (hatched region). It fails to
reach the final solution state, F. A multispace search process (solid lines)
scrambles across different search spaces. It may bypass this "wall" through
other search spaces.

solution found is used as the initial solution to the next intermediate prob-
lem instance to be constructed in the next iteration. By interplaying value
search and structured operations, multispace search solves the sequence of
intermediate problem instances incrementally and constructs the final solu-
tion based on a sequence of intermediate solutions. Only at the last moment
of the search is the last reconstructed problem structure switched back to the
original problem structure; the final value assignment then represents
the solution to the given search problem.
Because structural operations (related to variables, constraints, objective
functions, and parameters) introduce quantitative as well as qualitative infor-
mation to value search, a multispace search method is much more
effective at handling difficult optimization problems (see the case studies in
Sections 5 to 7).

A multispace search algorithm combines traditional optimization algo-
rithms with structural multispace operations. A typical multispace search
algorithm consists of three stages: initialization, search, and final state [27,
33, 39, 49]:

Initialization. The initial solution point is crucial to the performance of a


search algorithm. How can we select a reasonably good initial solution point?
In nonequilibrium thermodynamic evolution, symmetrical and asymmetrical
interactions are two antagonistic processes operating simultaneously over the
course of succession. But at the beginning of evolution, homogeneity among
species takes a dominant position. That is, an initial solution should have
a relatively homogeneous state with almost symmetrical configurations. A
typical homogeneous state is a uniform state which has a completely sym-
metrical and uniform structure. There are no local minima in the search
space. Any solution point in the search space gives the same performance
figure. This is one of the initial solution points used in multispace search.
A uniform state with symmetrical problem structures is unbiased in the
sense that it does not favor any particular initial search path.
During initialization a scrambling schedule or scrambling scheme must
be defined. A scrambling schedule specifies a sequence of events: when
to make which structural changes to the intermediate problem instances, and in
which active search space. A scrambling schedule may be deterministic,
stochastic, or both. Although structural changes have a significant effect, the
design of a good scrambling schedule itself poses an interesting optimization
problem. In multispace search a simple and effective scrambling schedule
with fewer structural operations is needed.
The last event in the schedule returns the last intermediate problem
instance to the original problem instance.
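
As an illustrative sketch (the event names and helper functions below are hypothetical, not the chapter's formal definitions), a deterministic scrambling schedule can be represented as an ordered list of structural events whose last event restores the original problem instance:

# A scrambling schedule as an ordered list of (search space, operation)
# events; all names below are illustrative placeholders.
schedule = [
    ("constraint space", "relax a constraint subset"),
    ("parameter space",  "perturb a key parameter"),
    ("variable space",   "add bridge variables"),
    ("variable space",   "remove bridge variables"),
    ("problem space",    "restore the original instance"),  # last event
]

def run_schedule(instance, schedule, reconfigure, solve):
    # Apply each structural event, then value-search the resulting
    # intermediate instance, reusing the previous solution as a start.
    solution = solve(instance, start=None)
    for space, event in schedule:
        instance = reconfigure(instance, space, event)
        solution = solve(instance, start=solution)
    return solution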

Search. This involves two fundamental operations during each search


step: a structural reconfiguration of an intermediate problem instance and
a value search that solves the intermediate problem instance.
In biological evolution, organisms and material substance of organisms
do not evolve. What evolves is a historical sequence of forms. The continu-
ity between forms is provided by the information transmitted to successors
through genetic code. Major steps in multispace search are iterative in
nature. The solution point, x_i, resembles the "genetic code" in biological
evolution. It carries structure and performance information along with the

search process. The inheritance of "genetic code" is fundamental in multi-


space search. Traditional value search is an example of this inheritance in
value search space.
Evolution depends on the growth of structure in biological systems rather
than just change in numerical frequencies. Old elements in the system are
rearranged or substituted with some new element so as to preserve more in-
formation or structure at the same or less energetic cost [73]. The structures
thus built lead to more efficient interactions between the system and the en-
vironment. Following these ideas, during each step of multispace search, in
addition to value search, some structural operations are performed that add
structured changes to the problem instance. These structural changes to the
problem instance contribute new information and thus qualitative changes
to the solution sequence.
A multispace search process proceeds according to the scrambling sched-
ule. It scrambles across different search spaces, dynamically reconstructs the
structure of each intermediate problem instance, and incrementally builds
up the final solution to the given search problem.
The major structural operations in multispace search include multispace
scrambling [27, 33, 50, 49], simulated evolution [27, 41, 44], extradimension
transition (e.g., bridge, real dimension, and extra dimension) [32, 34, 35],
search space smoothing [44], multiphase search [101, 102, 32, 83, 106, 38,
36], local to global passage [28, 46, 38], preprocessors (e.g., compression,
decomposition, reorganization, and semi-processing) [37], tabu search [20],
and perturbations (e.g., jumping, tunneling, climbing, and annealing) [31,
32, 35, 62].
Compared to the traditional value search, the scope of multispace search
has been extended widely. In addition to value search, multispace search can
change variable configurations of the search problem in variable space,
modify the performance criteria representing the search target in objective space,
and alter constraint structures among interacting objects in constraint
space, for example. Compared to value search alone, structural multispace
operations provide more alternative ways to optimize the search problem.

Final State. At the last moment of search, any dynamically recon-


structed, intermediate problem structure must be replaced by the original
problem structure. This results in the final state of the search. At this
state, the value assignment to the variables represents the final solution to
the given search problem.

Evolution results in a long-term increase in diversity and an increase


in the number of asymmetrical interactions. The increased diversity in the
final solution state creates a complicated solution structure. It possesses
much useful information (the solution) about the given search problem.

Summarizing the above discussions, a basic multispace search algorithm


is given in Figure 2 [27, 33, 39, 49]. Since there are different structural
changes, different forms of multispace search algorithms exist.
In many large-size system engineering projects, a one-time, high-quality
solution is required. Since the computation is not required to run in real time,
multispace search can be an ideal candidate for this type of work.
As observed in many experiments, structural multispace operations can
significantly affect the convergence property (e.g., global convergence and
local convergence rate) of the existing value search algorithms. In many
cases, with some structural multispace operations, value search turns out to be
faster and produces high-quality solutions.
Structural operations do incur some additional cost. The amount of
structural operations is, fortunately, determined during the algorithm de-
sign. One can subjectively control the number of operations added to the
algorithm. To be efficient, it is advantageous to incorporate a few simple op-
erations. In cases where the structural operations incur excessive overhead,
tradeoffs must be made between the amount of structural operations and
the solution quality.

procedure Multispace_Search()
begin
    /* initialization */
    input given problem instance;
    /* preprocessing */
    design an initial problem instance;
    design a scrambling schedule;
    solve the initial problem instance;

    /* scrambling between the active search spaces */
    while there are events in the schedule do
    begin
        enter an active search space;
        construct an intermediate problem instance;
        /* value search in the value space */
        begin
            enter the value search space;
            solve the intermediate problem instance;
        end;
        if the intermediate solution is not the final solution then
            use it as initial solution in the next iteration;
        if the last intermediate problem instance acts then
            switch it back to the original problem instance;
        return the final solution;
    end;
end;

Figure 2: MS: A typical multispace search algorithm interplays structural


multispace operations with the traditional value search. Different forms of
multispace search algorithms exist.

4 Structural Multispace Operations


In this section, we briefly discuss a number of structural operations devel-
oped in recent years [27, 33, 39, 49].

4.1 Multispace Scrambling


Multispace scrambling extends search paths from value search space to other
search spaces. This may substantially increase the passage bandwidth for a
search process to travel from an initial state to the final solution state.
In practice we often encounter a difficult situation as shown in Figure 1.
A search process starts from an initial state, I, and attempts to reach the
final solution state, F, through search path 1 (dashed line). In the value
space there is a "wall" (hatched region) consisting of search states with
unacceptable high costs. The search process starting from I would not be
able to pass the "wall" and reach the final state F through path 1.
The problem may be resolved by using multispace scrambling, i.e., finding
alternative paths and approaching the final state through other spaces. Fol-
lowing path 2, we can reconfigure the structure of the search problem and
scramble through the constraint space; the search process may then avoid
being blocked by the "wall." It may reach the final solution state through
a path in the constraint space. Similarly, we may scramble through other
active search spaces through path 3.

4.2 Simulated Evolution


Simulated evolution is a multispace search technique in which the initial
state of search is taken as a uniform state [27]. From nonequilibrium ther-
modynamic evolution (Section 2.4), it is evident that symmetrical and asym-
metrical interactions are two antagonistic processes operating simultane-
ously over the course of succession. At the beginning, homogeneity among
species dominates the evolution. This suggests that a "natural" search pro-
cess should initially start from a relatively homogeneous state with almost
symmetrical configurations.
A typical homogeneous state is a uniform state where the initial problem
structure is completely symmetrical. In a uniform search state, there are no
local minima in the search space. Any solution point in the search space
gives the same performance figure. For graph-related optimization problems,
two typical uniform initial states are the null graph and the complete graph

[41] (see Section 5). Three typical uniform initial states for path-related
optimization problems are the average distance, the shortest distance, and
the longest distance [44] (see Section 6).
Multispace search starting from an initial uniform state has proven to be
an effective technique [90].

4.3 Extradimension Transition


When a search process is trapped in a deep local minimum or is blocked
by a surrounding barrier, it is difficult to proceed further. We may
need to use extra dimensions to break through the barrier. Three general
techniques have been proposed.

Bridge. A bridge is a simple but useful technique for handling a difficult


local minimum situation. We look at an example given in Figure 3. In the

Figure 3: Each circle represents a search state. Eight solid circles form a
"well" of high-cost, unacceptable search states. An extra (variable) dimen-
sion builds a local bridge which can help the search process reach the final
solution state, F.

figure, each circle represents a search state. A solid circle represents a high-
cost unacceptable search state. Eight high-cost unacceptable search states
form a surrounding barrier that blocks the search process from reaching the
final search state. A traditional search process would have to stop at state
A. Random jumps might not help unless an excessively long computing time
is allowed.
To resolve the problem we may introduce a "bridge" (a few extra vari-
ables) as additional dimensions in the search space. The value assignment

to these extra variables is designed so as to make the bridge a low-cost


local path, leading the search process to the final solution state. The bridge
will be removed once the search process passes the barrier. Depending on the situation, one
or more bridges may be built in order to enhance the transition capability.

Real Dimension. Real dimension increases the passage bandwidth


in a real search space. The idea is based on the simple fact that there are
infinitely many search paths between any two discrete search states. We use
a two-dimensional search space as an example (Figure 4). In a discrete space
all the search paths are blocked by unacceptable search states (eight solid
circles); it is difficult for a search process to pass through the barrier. There


Figure 4: Each circle represents a discrete search state. Eight solid circles
form a "well" of high-cost unacceptable search states.. Although all the
search paths are blocked by discrete unacceptable search states, there are
infinitely many unblocked paths in the real space. A search process can pass
the barrier through a real dimension.

is, however, an infinite number of floating point search states between any
two search states. Some of the floating point states can be low-cost states
which give feasible search paths from A to F.
To use a real dimension, we simply make a context switch and convert
the variables and the objective function into the real space. This idea has been
applied to a variety of discrete search problems [26, 45, 31, 34, 35].
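
As a hedged illustration of this context switch (a generic continuous relaxation of a satisfiability-type objective, not the exact models of [26, 45, 31, 34, 35]), boolean variables can be mapped to real values in [0, 1] so that continuous moves become possible:

# Each boolean variable becomes a real value in [0, 1]; a clause
# contributes zero penalty exactly when one of its literals equals 1.
def clause_penalty(clause, v):
    # clause: list of (variable index, is_positive) literals
    p = 1.0
    for i, positive in clause:
        p *= (1.0 - v[i]) if positive else v[i]
    return p

def objective(cnf, v):
    # zero iff every clause has at least one literal set to 1
    return sum(clause_penalty(c, v) for c in cnf)

Between any two discrete assignments there are infinitely many real-valued points, some of which may form a low-cost path around a discrete barrier.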

Extra Dimension. Extra dimension is a technique of incorporating


extra variables to resolve inevitable conflicting situations in some search
problems. Once the requirement for the extra variables is justified, they
will be used systematically and/or permanently. This technique is crucial in

solving the state minimization problem [84] and the SAT-Circuit problem
in asynchronous circuit synthesis [37, 39].
The bridge may be viewed as a special case of the extra dimension tech-
niques. The difference is that a bridge is constructed adaptively and on a
temporary basis. The overhead for constructing a bridge may be ignored.

4.4 Search Space Smoothing


Local minima make search problems hard. Search space smoothing is a tech-
nique to "smooth out" local minima through a sequence of simplified inter-
mediate problem instances [44, 90].
A major weakness of local search is that the algorithm has a tendency
to become stuck at a local optimum (Figure 5). In local search, different
neighborhood structures result in different terrain surface structures in the
search space, producing different numbers of local minimum points. Local
minimum points make a search problem hard. The fewer the local minimum
points, the more effective a local search algorithm is.

Figure 5: An example of a simplified, 1-dimensional search space, showing
an initial point, a local minimum point, and a global minimum point.
Search space smoothing can "limit" the number of local minimum points
in the search space [44J. Assume there are many local minima in a search
space (see Figure 6), we can use a smoothed search space to approximate the
original search space. After the search space smoothing, some local minima
are temporarily "filled" and the number of local minima in a smoothed
search space is "reduced." Thus, the probability of a search process being

Figure 6: An illustration of smoothing a search space (dashed line: the
original search space; solid line: a smoothed search space). Many local minimum
points are "filled" after a smoothing preprocessing, resulting in a simplified
problem instance.

trapped into local minima is minimized. In ideal cases, only one
or a few local minima are left in the smoothed search space, and a local search may
find the global minimum point without difficulty.
A smoothing operation only changes the metric characteristics of a search
space. It leaves its topological structure untouched. Since a smoothed search
space contains topological structure information of the original search space,
space smoothing can facilitate the search of the global minimum point in the
original search space. Using an appropriate smoothing scheme, the global
minimum point in the smoothed search space may be set sufficiently close to
the global minimum point in the original search space. If we use the global
minimum point in the smoothed search space as an initial solution point
in the original search space (see Figure 7), the chance of finding the global
minimum point in the original search space can be increased substantially.
A single search space smoothing operation is generally not sufficient. If
one applies a weak smoothing operation, the surface structure of the
smoothed search space remains similar to the original one: the heuristic
guidance information from the original search space is strong, but the number
of local minima is reduced only slightly. To increase the chance of finding
global minimum points in the smoothed search space, one wants a stronger
smoothing operation that produces a flatter search space. This contradictory situation

can be resolved by a series of smoothed search spaces with their structures


varying from the original search space to a flatter search space, as illustrated
in Figure 7. Each (upper) search space is a further smoothing of the lower
search space. The solutions of a smoothed, flatter search space are used to
guide the search in the more rugged search spaces.

(Figure 7 sketch: a stack of search spaces, from the original search space
(α = 1) up through smoothed search spaces 1 to n (α ≫ 1); the initial
search point lies in the flattest space, and the solution of each smoothed
search space guides the search in the space below, down to the solution of
the original search space.)

Figure 7: A series of smoothed search spaces is generated. The solutions
of the smoothed search spaces are used to guide the search of those of the
rugged search spaces.
In [90], Schneider et al. made an extensive study of various smooth-
ing formulas including linear smoothing, power-law smoothing, exponential
smoothing, hyperbolic smoothing, sigmoidal smoothing, and logarithmic
smoothing for a number of combinatorial optimization algorithms. They
found that the search space smoothing method outperformed the corresponding
unsmoothed algorithms in almost all cases.
The search space smoothing method has been applied to the traveling sales-
man problem [44], the survivable network design problem [80], and quorumcast
routing (a general form of multicast routing) problems [43].
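
As a hedged sketch in the spirit of the smoothing scheme of [44] (the exact formula and the parameter schedule below are illustrative assumptions), a normalized TSP distance matrix can be flattened by pulling every distance toward the mean, with the smoothing strength α decreasing toward 1:

import numpy as np

def smooth_distances(d, alpha):
    # Power-law smoothing of a distance matrix d, normalized so that
    # |d - mean(d)| <= 1; alpha >> 1 flattens the search space and
    # alpha = 1 recovers the original instance.
    dbar = d.mean()
    hi = d >= dbar
    out = np.empty_like(d)
    out[hi] = dbar + (d[hi] - dbar) ** alpha
    out[~hi] = dbar - (dbar - d[~hi]) ** alpha
    return out

def smoothed_search(d, local_search, alphas=(6, 4, 3, 2, 1.5, 1)):
    # Solve a series of smoothed instances, feeding each solution to the
    # next, more rugged instance (cf. Figure 7).
    tour = None
    for a in alphas:
        tour = local_search(smooth_distances(d, a), start=tour)
    return tour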

4.5 Multiphase Search


Multiphase is a natural phenomenon (Section 2.3). In optimization problem
solving, multi-phase phenomena have been observed by many researchers.
The search space model behind many optimization algorithms is an abstract
model with a three-level structure. An informal example of the model is
given in Figure 8.

Figure 8: An informal illustration of the three-level search space model.

In the model, a search space is viewed in three levels: top level, middle
level, and bottom level. The top level is the upper open portion of the search
space with smooth edges. In this level most optimization algorithms can
reduce the objective function quickly and perform quite well (although the
initial optimization process may take different descent paths in the search
space, as affected by the optimization algorithm and the terrain surface
structure of the search space). The middle level is the middle portion of the
search space where there are relatively big mountain peaks. During descent,
the search process may encounter problems and it would use some tunneling
and random heuristics to proceed.
The difficult phase of an optimization process appears at the later stage
of locating the global optimum. There are many local minima in the bottom
level of the search space (the bottom level is the bottom portion of the
valleys, particularly the lowest valley, in the search space). Due to the
complication of local minima, the
difficulty of finding the global optimum is increased considerably. The most


troublesome case is a trap (i.e., a well of local minima). When a search
process falls into a trap it will soon be locked into a loop of local minima.
For hard optimization problems, many optimization algorithms that exhibit
satisfactory performance in the earlier search stage may perform poorly at
the later stage of optimization, showing inferior convergent behavior.
This suggests that different phases of an optimization process should be
treated separately. An optimization algorithm should be designed to adapt
to different phases of the optimization process. Furthermore, if the search
phases are distinct, we may use different algorithms in different phases of
optimization (i.e., algorithm switching); otherwise, we may handle the phase
transition and the corresponding algorithmic transition in a gradual manner
(i.e., algorithmic transition).
Multiphase heuristics form a part of multispace search heuristics [27, 39,
49, 50]. They have been developed to adapt to the different phases of a search
process: perform a poor initial search and then a fine local search for conflict
minimization; perform a good initial search and then a fine local search
for conflict minimization; perform a good initial search, then a rough local
search, and a fine local search for conflict minimization; perform an initial
search, and then a rough local search and a fine local search alternatively
for conflict minimization; perform a rough initial search, then a coarse local
search, and finally, a fine local search for conflict minimization.
Multiphase heuristics are a basic idea behind the computer microword
length minimization algorithm [83], the n-queen local search algorithms [101,
102], the SAT1.5 algorithm [30, 32, 36], the channel assignment algorithm
[106], the topological local search algorithm [29, 48, 112] (see Section 7), and
the αβ relaxation algorithm [28, 46, 38, 39].

4.6 Passage from Local to Global Optimization


In practice, a great deal of local information is expressed in the local con-
straints and local minima. Solving a constrained optimization problem
amounts to the derivation of a global solution from the local structures. A
proper use of local information would make the optimization process more
informed. In this respect, the following three techniques are particularly use-
ful [28, 46, 38, 39].




Dual Step Local to Global Optimization. Local structures (con-


straints, local variables, and local terrain structure of the search space) are
the background from which we derive a solution. Normally we use a single optimization
algorithm to derive the solution, which is not efficient in some cases. In αβ
relaxation [28, 38, 46], we incorporate a local optimization inside each step
of a global optimization procedure. This results in a dual step optimization
procedure. A local optimization is first performed that produces a locally
optimal solution. In the second step, based on the first local solution, a second
optimization is performed over the entire (global) problem range, resulting in
a "dually optimized" solution. The dual step optimization gives a min-min
form optimization procedure.
Through numerous experiments we have observed that the solution pro-
duced from the dual step optimization procedure is much better than the
solutions from a single local optimization or a single global optimization
[28, 38, 46].
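
A minimal sketch of this min-min structure follows; local_opt and global_opt stand for whatever local and global procedures the problem provides and do not reproduce the specific αβ relaxation of [28, 38, 46]:

def dual_step_optimize(x, local_opt, global_opt, steps=100):
    # Each iteration first optimizes within the local structure, then
    # performs a second optimization over the entire (global) problem
    # range, seeded by the local result: a min-min procedure.
    for _ in range(steps):
        x_local = local_opt(x)
        x = global_opt(x_local)
    return x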

Iterative Local to Global Propagation. Local minima are unpre-


dictable and intractable. Local and global information exchange can make
an optimization algorithm much more informed. In the αβ relaxation, an
iterative procedure for local information propagation is developed. The pro-
cedure spreads the local information of each local variable to other variables in
the network. Depending on the problem and the algorithms, the distribution
of local information may proceed moderately or quickly. This is controlled
by the local information propagation rate, α [28, 38, 46].
In general, at the beginning of optimization it is advantageous
to distribute the local information faster, allowing it to inform other
variables at an early stage of optimization. So the initial value of α is set
to a large number. A large value of α, however, may cause a nonuniform
distribution of local information. As the search process progresses, α is
gradually decreased so that the local information can be spread more
uniformly and thoroughly in the system.

Adaptive Local to Global Algorithmic Transition. This is an algo-


rithmic transition technique from multiphase heuristics. Most optimization
algorithms start to process low-level local information and gradually derive
a high-level global solution. In αβ relaxation [28, 46, 38], an adaptive local-
to-global algorithmic transition was used. It changes the composition of
local and global information and performs a dynamic algorithmic transition
under the control of the transition rate, β. The β value is initially set to con-
figure the algorithm as a local optimization algorithm, making full use of
local information effectively. As the optimization process evolves, a global
solution is gradually derived from the local optimization. The local informa-
tion becomes less important. Accordingly, the β value is gradually changed
to reduce the contribution of local information and to increase the effect of
global information. Finally, the αβ relaxation approaches an optimization
algorithm utilizing global information.

4.7 Preprocessing
An important way to improve computing efficiency is to preprocess the given
problem structure before the search. An optimization problem may have
more than one representation, and different representations lead to different
computing efficiencies.
A preprocessor is a set of structural operations applied to process a given
problem structure at the beginning of search. Surprisingly, structural pre-
processing can significantly reduce the computing time for many optimiza-
tion problems. We describe several frequently used preprocessing methods
below.

Compression. A compressor does structural compression. The idea


of a "compressor" stems naturally from practical engineering applications
in image compression [55], state compression [84], and code compression
[83]. Compression techniques are critical in image processing and modern
communication systems where the limited resources are of a major concern.
In multispace search we compress a large-size problem into a smaller one
so the original problem can be processed much more efficiently. We must
maintain the functional equivalence between the problem instances before
and after the operations.

Decomposition. Decomposition is one of the most powerful techniques


in multispace search. Since most search problems are NP-hard in nature,
partitioning a large search space into smaller ones can significantly reduce
the computing time. A decomposer is a set of operations that partitions a
large problem into smaller subproblems [27, 33, 49]. These smaller localized
subproblems can be manipulated independently. An integration method
able to integrate these local subproblems into a global system is required.
The solutions derived from the subproblems can then be integrated into the
solution of the original problem. In a recent work for asynchronous cir-

cuit design, an effective decomposer was developed to solve the SAT-Circuit


problem [37, 39].
Techniques for partitioning value domain can be very powerful. A vari-
able domain contains values to be assigned to variables. The size of a variable
domain affects the computational complexity of an optimization algorithm.
Even a small reduction in the variable domain would result in significant
improvements in computing efficiency. It is, however, difficult to make use
of variable-domain reduction techniques in solving optimization problems.
Recently, Wang and Rushforth have studied mobile cellular network struc-
tures and developed a novel variable-domain reduction technique for channel
assignment in these networks [106, 107].
An important aspect of the decomposition approach is constraint de-
composition [27, 33, 49]. Decomposing constraints refers to the ability of an
optimization approach to enable complex information to be represented in
terms of subspaces of local information. Each subspace of constraint rela-
tions may be handled separately. The subspace representations differ from
each other in how a subspace of local information can communicate with
another. In our subspacing models, the assignment information in each local
subspace is communicated to other local subspaces as well as with the global
search space. It is this interaction that allows the complete solution to be
built from the solutions of the individual local subspaces.

Reorganization. Reorganization determines a processing sequence in


terms of a certain priority measure for individual components. This is useful
since in many combinatorial optimization problems the processing order has
a strong effect on the computing efficiency [53]. A number of optimization
algorithms, such as BDDs (Binary Decision Diagrams) [5, 6, 70], identify the
processing priority for the variables.

Semi-Processing. Most optimization algorithms have an initialization


procedure where an initial solution is assigned to the given problem. The
initial solution determines the initial search path that affects algorithm's
performance. Semi-processing refers to the partial processing techniques
introduced in the initialization procedure.
One of the semi-processing methods is the initial search (during the initial
solution generation). This method is cost-effective. In the n-queens prob-
lem, which is a benchmark constraint satisfaction problem, instead of
using a complete random initial solution, we conduct an initial search dur-

ing the generation of an initial placement of queens. This initial search was
performed for a limited number of queens so it does not impose high cost
other than initial queen placement. As a result, a search algorithm with
the initial search procedure can handle problems involving over one million
queens [101, 100, 102]. Similar algorithmic techniques were used for the satis-
fiability problem [26, 32, 33]. In practice, these techniques often increase
search efficiency by a few orders of magnitude.
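
As a hedged sketch of the initial search just described for n-queens (illustrative only; not the exact procedure of [101, 100, 102]), queens can be placed row by row, examining only a bounded number of candidate columns per row:

import random

def initial_placement(n, tries=50):
    # Greedy initial search: col[i] is the column of the queen in row i.
    # Only a bounded number of candidate columns is tested per row, so
    # the overhead beyond a purely random placement stays small.
    col = [-1] * n
    for row in range(n):
        for _ in range(tries):
            c = random.randrange(n)
            if all(col[r] != c and abs(col[r] - c) != row - r
                   for r in range(row)):
                col[row] = c
                break
        else:
            col[row] = random.randrange(n)  # accept a conflict for now
    return col

The few remaining conflicts are then removed by the subsequent local search.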
Other recent work with partial processing includes backtracking and
probing [82].

4.8 Tabu Search


In a search space with a rugged terrain structure there are typically many local
minimum points. During the search process, some local minimum points may
be visited more than once. A natural heuristic is to record some previously
visited places so as to prevent repeated, redundant search efforts (see Figure
9). This is known as tabu search [20]. It is fairly effective for handling some
difficult combinatorial optimization problems.

Figure 9: In a region with many local minimum points, recording some
previously visited places prevents repeated, redundant search efforts.
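
A minimal sketch of this bookkeeping (a generic short-term memory, not the full tabu search framework of [20]) keeps a fixed-length queue of recently visited states:

from collections import deque

def tabu_search(x0, f, neighbors, tenure=20, max_iters=1000):
    # Local search that forbids recently visited states for `tenure`
    # iterations, preventing repeated, redundant search effort.
    x, best = x0, x0
    tabu = deque([x0], maxlen=tenure)
    for _ in range(max_iters):
        candidates = [y for y in neighbors(x) if y not in tabu]
        if not candidates:
            break
        x = min(candidates, key=f)  # best non-tabu neighbor (may go uphill)
        tabu.append(x)
        if f(x) < f(best):
            best = x
    return best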

4.9 Perturbations
Perturbations are irregular value changes to the assignments. Perturbations
in general do not introduce structural changes to the problem structure.
Since most perturbations are performed in value space - the most important

space in multispace - in a broad sense we include value perturbations as a


set of operations in the multispace search. The following are several effective
value perturbation methods.

Jumping. A jump is often needed if the search process becomes stuck at


a local minimum point. A random jump has a similar effect to reassigning
the initial feasible starting point.

Figure 10: A jump is a perturbation technique for tackling local minima.


As observed by many researchers, local optima produced by local search
have a certain average quality; they are related to the quality of the initial
solution points. The use of a random initial starting point is an unbiased
decision that enables one to sample the set of local optima widely, producing
an improved starting point. The use of random jumps is a common technique
applied to tackle the pathological effect of the worst-case instances. This
strategy has been used to solve the traveling salesman problem (TSP) [71],
the satisfiability problem [26, 31, 45, 32, 35], and the n-queen problem [98,
99, 100, 101, 102] with great success.

Tunneling. A tunnel is a short-cut passing through a mountain re-


gion with a rugged terrain structure. A tunneling operation can be used when
the search process travels around a region with many local minimum points.
Whenever a local minimum is encountered, make a direct tunnel passing
through its neighboring mountain region, as long as this does not increase
the objective value monotonically. A tunneling operation has constant cost.
The simplest tunneling operation, i.e., negating the value of a variable if
this does not monotonically increase the objective value, has proven to be an
effective operation [26, 32, 96].

Figure 11: Tunneling operations are efficient at passing through rugged
terrain surfaces with many local minimum points, as long as this does not
increase the objective function monotonically.
It was evident from our previous experiments that the tunneling oper-
ation can be made more powerful if it is used even when the search process is
not stuck at a local minimum [26, 31, 32, 35].
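
For a problem over boolean variables, the flip-based tunneling just described can be sketched as follows (the objective f is a placeholder, and the sweep bound keeps the cost of one call constant):

def tunnel(x, f, sweeps=20):
    # Negate any variable whose flip does not increase the objective;
    # equal-cost flips let the process tunnel through flat regions.
    for _ in range(sweeps):
        for i in range(len(x)):
            y = x[:i] + [1 - x[i]] + x[i + 1:]
            if f(y) <= f(x):    # accept any non-increasing move
                x = y
    return x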

Climbing. Sometimes when the search process becomes stuck at a local


minimum, one can gradually increase the objective function and "climb" out
of the local minimum valley. Compared to jumping and tunneling opera-
tions, a climbing operation is more locally directed by the objective function.
In most cases, it is more effective to use climbing in conjunction with jump-
ing and tunneling operations.

Genetic Operations. Mutation and crossover operators are two oper-


ations in genetic algorithms [52]. They are applied to fixed-length binary
strings (chromosomes). Mutation alters one or more genes (positions
in a chromosome) with a probability equal to the mutation rate. Then the
crossover operator swaps substrings in two chromosomes, producing two new
offspring. The two operations applied to fixed-length binary strings may
be considered a structural value perturbation to the intermediate solution.
Genetic algorithms can solve certain optimization problems efficiently.

Figure 12: When the search process becomes stuck at a local minimum, one
can gradually increase the objective function and "climb" out of the local
minimum.

Annealing. A search process walking in a search space may follow dif-


ferent paths, as determined by the search algorithm and the terrain struc-
ture of the search space. An important stochastic perturbation method is
simulated annealing [62]. The basic procedure in simulated annealing is to
accept all walks that result in a cost reduction. Walks that result in a cost
increase are accepted with a probability. In the beginning, most walks
are accepted. Gradually, the cost-increasing walks have less chance of being
accepted. Ultimately, only walks causing a cost reduction are accepted.
An annealing procedure may produce a good quality solution if an exces-
sively long computing time is given.
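
A minimal sketch of this acceptance rule (with a hypothetical geometric cooling schedule; neighbor and f are placeholders):

import math, random

def anneal(x0, f, neighbor, T0=1.0, cooling=0.995, max_iters=10000):
    # Accept every cost-reducing walk; accept a cost-increasing walk
    # with probability exp(-delta/T), which shrinks as T cools.
    x, T = x0, T0
    for _ in range(max_iters):
        y = neighbor(x)
        delta = f(y) - f(x)
        if delta <= 0 or random.random() < math.exp(-delta / T):
            x = y
        T *= cooling  # uphill walks become ever less likely
    return x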
In the next several sections, we will give three representative case studies
of using multispace search for solving combinatorial optimization problems.

5 Multispace Search for Graph Partitioning

The graph partitioning problem is a fundamental combinatorial optimization


problem. In graph partitioning, a set of nodes is connected by a set of edges,
and each edge connects exactly two nodes. The objective of two-way graph
partitioning is to divide a graph into two subsets such that the number of
edges connecting nodes in different subsets is minimized. The number
of these connections is referred to as the cut cost of the partition. Two-way
graph partitioning is the basis of the multiway graph partitioning problem [60].
The graph partitioning problem with specified bounds on the sizes of the
resulting subsets belongs to the class of NP-complete problems [18, 19]. In
practice the size of partitioning problems makes it impossible to perform
the exhaustive search required to find an optimal partition. Work on graph
partitioning has concentrated on finding heuristics that give good results in a
reasonable amount of time.
Graph partitioning has practical applications in VLSI circuit design.
A VLSI circuit is represented by a graph with nodes representing circuit
modules and edges representing signal nets. A VLSI architecture may contain
hundreds of thousands of gates. During the logic design, a large circuit design problem
is first divided into smaller subproblems so the complexity of circuit design
can be reduced dramatically. Depending on chip area available, a large
circuit design may be decomposed into smaller circuit modules which can
be implemented on separate chips. Graph partitioning is the basis of circuit
partitioning. A good partitioning can reduce layout costs, minimize off-chip
communication overheads, and significantly improve circuits' performance.
Many VLSI circuit partitioning problems are essentially multiway par-
titioning problems. Two-way partitioning methods are fundamental to the
solutions of multiway partitioning.
As a basis for graph partitioning, we choose standard two-way graph par-
titioning in this case study. Since the Kernighan-Lin (K-L) algorithm is central
to graph partitioning, we choose it as a performance comparison standard.
Our multispace search algorithm interleaves the Kernighan-Lin algorithm
with incremental structural changes to the graph [41, 33]. It scrambles
between the value space and graph space, reconstructing the graph struc-
ture while improving the partition solution. Experimental results indicate
that this method improves on the Kernighan-Lin algorithm in terms of the solution
quality and its sensitivity to the initial random partition.

5.1 Previous Work


Many approaches for graph and circuit partitioning have been proposed.
These include clustering [7, 92], eigenvector decomposition [17], network flow
[16], group migration method [15, 60, 63, 88, 94, 95], simulated annealing
[62], and ratio cut method [108, 109, 110]. Here we briefly overview some
major partitioning methods.
Ford and Fulkerson [16] transformed the minimum cut problem into the
maximum flow problem and proposed a maximum flow minimum cut algo-
rithm, an exact algorithm for finding the optimal cut between
two subsets of unspecified sizes. In the algorithm, the minimum number
of crossing edges is given by the maximum flow from one node to the
other. The algorithm has a time complexity of O(n³), where n is the num-
ber of nodes in the graph. Since there is no constraint on the sizes of the
resultant subsets, the algorithm may generate two unevenly sized subsets.
Such solutions are of little use in practice.
The group migration method, i.e., the Kernighan-Lin algorithm [60], is
an effective method for graph partitioning. The algorithm starts with two
randomly generated subsets and then iteratively swaps all pairs of nodes to
reduce the cut cost. Researchers have applied the Kernighan-Lin algorithm or its
variants to VLSI circuit design problems. Schweikert and Kernighan used a
net cut model to handle the multipin net cases [94]. Fiduccia and Matthey-
ses showed a linear-time heuristic that improves the previous results while
maintaining a desirable balance between the subsets [15]. Krishnamurthy and
Sanchis applied a lookahead technique in their algorithm [63, 88]. Sechen and
Chen proposed an improved objective function for mincut circuit partition-
ing [95].
A Kernighan-Lin type algorithm is in general fairly efficient but it needs
a predefined subset size to start with. Since there is no way of knowing
cluster size in circuits before partitioning, predefined partition size may not
be well suited for practical applications. Recently, following Fiduccia and
Mattheyses's work, Wei and Cheng proposed a ratio cut partitioning method
[108, 109, 110]. The algorithm relaxes the constraint on subset size and
dynamically establishes its own subsets, which are close to clusters in the
circuit. It shows desirable results for hierarchical designs and multiway
partitions.
The simulated annealing method is a nonconvex optimization algorithm [62].
It casts the graph partitioning problem in two parts: a cost function, which
evaluates any feasible solution, and a set of moves, which allow movement
from solution to solution. Simulated annealing may produce good results
for graph partitioning at the expense of extremely long running times.

5.2 A Multispace Search Algorithm for Graph Partitioning


We developed a multispace search algorithm for graph partitioning [41, 33].
The algorithm can dynamically scramble between the value space and graph
space, reconstruct the graph structure, and improve the partition solution.
Following Kernighan and Lin [60], for a graph to be partitioned into two
subsets A and B, let:

- G(V, E) be a graph,
- V be the set of nodes in G, |V| = n,
- E be the set of edges in G,
- |A| and |B| be the sizes of the two partitioned subsets,
- c_ab be the weight of the edge connecting nodes a and b,
- E_a be the external cost of node a ∈ A, i.e., E_a = Σ_{y∈B} c_ay,
- I_a be the internal cost of node a ∈ A, i.e., I_a = Σ_{x∈A} c_ax,
- D be the density of the graph, i.e., D = |E| / (n(n−1)/2), the ratio of the
  actual number of edges to the total possible number of edges, and
- gain(a, b) be the reduction in cut cost obtained by swapping a and b, i.e.,
  gain(a, b) = (E_a − I_a) + (E_b − I_b) − 2c_ab.

A graph is partitioned into two subsets of (nearly) equal sizes, i.e., |A| − |B| ≤ 1.
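
In code, the external cost, internal cost, and swap gain follow directly from these definitions (a sketch; the symmetric edge-weight dictionary c is an illustrative representation):

def external_cost(a, other, c):
    # E_a: total weight of edges from a to the other subset
    return sum(c.get((a, y), 0) for y in other)

def internal_cost(a, own, c):
    # I_a: total weight of edges from a within its own subset
    return sum(c.get((a, x), 0) for x in own if x != a)

def gain(a, b, A, B, c):
    # Reduction in cut cost obtained by swapping a in A with b in B.
    Da = external_cost(a, B, c) - internal_cost(a, A, c)
    Db = external_cost(b, A, c) - internal_cost(b, B, c)
    return Da + Db - 2 * c.get((a, b), 0)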
In the following, we first discuss some ideas and algorithms for multispace
graph partitioning. Then we describe a typical algorithm in detail.

1. Initial State, Final State, and Structural Changes

Using multispace search for graph partitioning, we conduct structural


changes to the graph in the graph space. Each time the graph structure is
changed, we perform a graph partitioning algorithm to partition the new
graph. In this study, the Kernighan-Lin algorithm is chosen as the graph parti-
tioning algorithm.
In the graph space we may choose to incorporate structural changes to
the graph in an incremental (evolution) manner. To begin, we start from
a uniform search state. Two typical uniform search states for a graph
are a complete graph (with all nodes and edges connected) and an empty
graph (without nodes and edges). The final solution state must be the given
graph structure. That is, we start from a complete graph or an empty
graph and gradually change it to the given graph structure. Since the initial
search state is uniform, there is no local minimum point. A sequence of
incremental structural changes would have similar effect to search space
smoothing (Section 4.4). It can provide much heuristic information to the
Kernighan-Lin algorithm.
Depending on the initial state and structural changes, we have designed
several multispace search algorithms for graph partitioning. In this case
study, we experimented with five Multispace Graph Partitioning (MGP) algo-
rithms:

• MGPl Algorithm: Create a complete graph with the same number of


nodes as in the given graph problem. Gradually remove some redun-
dant edges from the graph according to the scrambling schedule, and per-
form the K-L graph partitioning for each intermediate graph instance.
The process is repeated until the graph is reduced to the original given
graph.

• MGP2 Algorithm: Create a graph with the same number of nodes as in


the given graph problem but without edges. Gradually add some edges
to the graph according to the scrambling schedule, and perform the K-L
graph partitioning for each intermediate graph instance. The process
is repeated until the original given graph is built.

• MGP3 Algorithm: Create a complete graph with the same number of


nodes as in the given graph problem. Gradually remove some edges
from the graph and also add some edges to the graph according to
the scrambling schedule, and perform the K-L graph partitioning for each
intermediate graph instance. The process is repeated until the graph
is replaced by the original given graph.

• MGP4 Algorithm: Create a graph with the same number of nodes as


in the given graph problem but without edges. Gradually remove some
nodes from the graph and also add some nodes to the graph according
to the scrambling schedule, and perform the K-L graph partitioning for
each intermediate graph instance. The process is repeated until the
graph is replaced by the original given graph.

• MGP5 Algorithm: Create a graph with the same number of nodes as in


the given graph problem. Gradually remove some edges and/or nodes
from the graph and also add some edges and/or nodes to the graph
according to the scrambling schedule, and perform the K-L graph parti-
tioning for each intermediate graph instance. The process is repeated
until the graph is replaced by the original given graph.

As is clear, an important component in multispace search is the scram-


bling schedule, which determines when and how to change the graph struc-
ture and when and how to return to the original given graph structure. In
the following, we will discuss the various scrambling schedules used in our exper-
iments after describing a typical multispace graph partitioning algorithm.

2. A Multispace Search Algorithm

Based on value search and graph structure reconstruction, in Figure 13,


we give a multispace graph partitioning algorithm, MGP1, for the graph
partitioning problem [41, 33]. The algorithm works in the following several
stages.

Initialization. Initially, procedure get_a_problem_graph(G) makes a
problem_graph to be partitioned. Procedure generate_a_complete_graph()
generates a complete graph, graph, as an initial starting solution. This com-
plete graph has the same number of nodes as in the given problem graph.
Procedure random_partitioning() randomly partitions graph into two subsets
A and B with |A| − |B| ≤ 1. This produces an initial partition. Based on
the initial partition, the initial cut cost, i.e., cost, is evaluated by function
compute_cost().
A scrambling schedule is designed (or specified) by procedure design_a_
scrambling_schedule(). In MGP1, we remove some edges from the initial
complete graph according to a variety of scrambling schedules discussed
below.

Search. During each iteration of the while loop, as long as there are
some actions in the schedule queue, the multispace search process scrambles
between the value space and the graph space. To begin, in the graph space,
procedure change_the_graph() removes some redundant edges from the com-
plete graph. This produces a slightly more complicated graph instance.

procedure MGP1(G)
begin
    /* initialization */
    problem_graph := get_a_problem_graph(G);
    graph := generate_a_complete_graph();
    partition := random_partitioning();
    cost := compute_cost(partition);
    schedule := design_a_scrambling_schedule();

    /* scrambling between value space and graph space */
    while (schedule is not empty) do
    begin
        /* construct a new graph in graph space */
        graph := change_the_graph();
        /* partition this new graph in value space */
        begin
            for each node ai in subset A do
                for each node bj in subset B do
                    /* if swap(ai, bj) does not increase cost */
                    if test_swap(ai, bj) then
                    begin
                        partition := perform_swap(ai, bj);
                        cost := compute_cost(partition);
                    end;
            if local then local_handlers();
        end;
    end;
    /* when graph returns to the original problem_graph */
    return partition and cost;
end;

Figure 13: MGP1: A multispace search algorithm for graph partitioning.


The algorithm starts from a complete graph with the same number of nodes
as in the given graph problem. It gradually removes some redundant edges
from the graph and performs the Kernighan-Lin algorithm for each inter-
mediate graph instance. The process is repeated until the graph is reduced
to the original given graph.

In value space, a K-L algorithm is applied to partition the new intermediate graph
instance. The test to see whether the swap of node a and node b would reduce
the cut cost is performed by test_swap(ai, bj). If this is true, the swap is
performed by procedure perform_swap(ai, bj). This results in an improved parti-
tion. Subsequently, the new cost of the partition is evaluated by procedure
compute_cost().
During the search, the search process might get stuck at a local min-
imum point. Several local_handlers() with random heuristics are used to
improve the algorithm's convergence performance. Structural operations,
on the other hand, can effectively destroy the environment of forming local
minima. A local minimum in the current iteration may be removed when a
new graph structure is introduced.

Termination. During each iteration of the search, some edges of the graph are removed and a more complicated graph instance is generated. The algorithm takes the solution produced for the previous, simpler graph structure as the initial partition and searches for a solution to a more difficult graph structure. This solution is in turn taken as the initial partition of an increasingly complicated graph structure (closer to the given graph), which will be solved during the next iteration of partitioning.
The above process is repeated. Each time, some edges are removed from the complete graph. The complete graph is incrementally reduced, getting closer to the actual problem_graph. At the last moment, all the redundant edges in the complete graph have been removed and the graph is reduced to the original given problem_graph. At this moment, the final partition represents the solution to the original given problem.

Running Time. The run time of the MGP1 algorithm can be estimated as follows. In the initialization part, it takes O(n^2) time to generate a complete graph. Procedures random_partitioning() and compute_cost() together perform the Kernighan-Lin algorithm, which has an O(n^2 log n) time complexity. So the initialization portion requires O(n^2) + O(n^2 log n) time.
At the beginning of each search iteration, procedure change_the_graph() takes O(n^2) time to reconstruct the graph. In most cases, a fixed number of operations is performed, which takes constant time. The remaining two for loops perform the Kernighan-Lin algorithm, requiring O(n^2 log n) time. The local_handlers, in most cases, perform a prespecified number of operations which take constant time. The while loop follows the schedule and takes a constant number, say α, of iterations.
Summarizing the above, the time complexity of the MGP1 algorithm is

    O(n^2 log n) + α (O(n^2) + O(n^2 log n)) = O(α n^2 log n).

3. Scrambling Schemes

There are many ways of changing graph structures depending on a scrambling schedule. The design of a good scrambling schedule is itself a difficult optimization problem. In this section, we discuss some scrambling strategies used in our early experimentation with the MGP algorithms. For more details about recent work see [50].
In the MGP1 algorithm, which starts from a complete graph, the following ten scrambling schemes are used to make structural changes to the graph (a sketch of two representative schedules follows the list):

• Scheme 1 (s1): linear-edges. The number of redundant edges removed from the graph varies as a linear function.

• Scheme 2 (s2): exponential-edges. The number of redundant edges removed from the graph varies as an exponential function.

• Scheme 3 (s3): linear-nodes. The number of nodes whose redundant edges are removed from the graph varies as a linear function.

• Scheme 4 (s4): exponential-nodes. The number of nodes whose redundant edges are removed from the graph varies as an exponential function.

• Scheme 5 (s5): linear-nodes-least-edge-first. The number of nodes whose redundant edges are removed from the graph varies as a linear function, and the redundant edges connected to the nodes with the least number of edges are removed first.

• Scheme 6 (s6): exponential-nodes-least-edge-first. The number of nodes whose redundant edges are removed from the graph varies as an exponential function, and the redundant edges connected to the nodes with the least number of edges are removed first.

• Scheme 7 (s7): inverse-exponential-nodes-least-edge-first. The number of nodes whose redundant edges are removed from the graph varies as an inverse exponential function, and the redundant edges connected to the nodes with the least number of edges are removed first.

• Scheme 8 (s8): linear-nodes-most-edge-first. The number of nodes whose redundant edges are removed from the graph varies as a linear function, and the redundant edges connected to the nodes with the most number of edges are removed first.

• Scheme 9 (s9): exponential-nodes-most-edge-first. The number of nodes whose redundant edges are removed from the graph varies as an exponential function, and the redundant edges connected to the nodes with the most number of edges are removed first.

• Scheme 10 (s10): inverse-exponential-nodes-most-edge-first. The number of nodes whose redundant edges are removed from the graph varies as an inverse exponential function, and the redundant edges connected to the nodes with the most number of edges are removed first.
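The chapter does not give the exact functional forms of these schedules, so the following Python sketch is only illustrative: it shows how a linear (s1-like) and an exponential (s2-like) schedule might determine the cumulative number of redundant edges removed by iteration t, out of E redundant edges over T iterations.

    import math

    def linear_schedule(E, T, t):
        """Cumulative redundant edges removed by iteration t (linear)."""
        return round(E * t / T)

    def exponential_schedule(E, T, t, rate=3.0):
        """Cumulative removal growing exponentially toward E; `rate`
        controls how strongly removals are deferred to late iterations."""
        return round(E * math.expm1(rate * t / T) / math.expm1(rate))

    E, T = 1000, 10
    print([linear_schedule(E, T, t) for t in range(1, T + 1)])
    print([exponential_schedule(E, T, t) for t in range(1, T + 1)])

Both schedules remove all E redundant edges by the final iteration; they differ only in how the removals are distributed over the iterations.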

In a similar manner, the above scrambling schemes are used in the MGP2 algorithm. Instead of removing edges, in the MGP2 algorithm we start from an empty graph and incrementally add edges to the graph. Similar structural operations are used in the MGP3, MGP4, and MGP5 algorithms. Scrambling schedules with random and partially random operations have also been developed; we will not discuss them here.

5.3 Experimental Results


The Kernighan-Lin based algorithms share a common weakness: they all need to start with a random initial partition. The algorithms are observed to be highly sensitive to the choice of the initial partition, and the final results vary significantly with different starting points [63].
We have tested the multispace graph partitioning algorithms on many random graph instances. All the algorithms were coded in C and our experiments were performed on a SUN SPARC 2 workstation. In this section, we show experimental results of the multispace search algorithms and compare them with those of the Kernighan-Lin algorithm for graph partitioning.
Performance of the MGP1 Algorithm. Figures 14 to 17 show the performance of the MGP1 algorithm for random graphs with 200 nodes and 500 nodes, respectively. The ten scrambling schemes discussed above were tested. In these figures, the leftmost vertical axis indicates the performance of the Kernighan-Lin (K-L) algorithm.
Performance Comparisons of the MGP1 to MGP3 Algorithms. For a random graph with 1000 nodes, Figures 18 to 21 show the performance of the MGP1 to MGP3 algorithms. The ten scrambling schemes were tested. In these figures, the leftmost vertical axis indicates the performance of the Kernighan-Lin algorithm.
Performance Comparisons of the MGP1 to MGP5 Algorithms. For a random graph with 1000 nodes, Figures 22 and 23 show the cut costs and standard deviations of the MGP1 to MGP5 algorithms. In these figures, the leftmost vertical axis indicates the performance of the Kernighan-Lin (K-L) algorithm.
Experimental results indicate that the multispace graph partitioning algorithms significantly improve the performance of the Kernighan-Lin algorithm in terms of both solution quality and sensitivity to the initial random partition.

[Plot: cut cost (min/mean/max) versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 14: Performance comparison between the MGP1 algorithm and the Kernighan-Lin algorithm for a random graph with 200 nodes.

[Plot: standard deviation versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 15: Standard deviations of the MGP1 algorithm and the Kernighan-Lin algorithm for a random graph with 200 nodes.

[Plot: cut cost (min/mean/max) versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 16: Performance comparison between the MGP1 algorithm and the Kernighan-Lin algorithm for a random graph with 500 nodes.

[Plot: standard deviation versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 17: Standard deviations of the MGP1 algorithm and the Kernighan-Lin algorithm for a random graph with 500 nodes.

[Plot: cut cost (min/mean/max) versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 18: Performance comparison between the MGP1 algorithm and the Kernighan-Lin algorithm for a random graph with 1000 nodes.

[Plot: cut cost (min/mean/max) versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 19: Performance comparison between the MGP2 algorithm and the Kernighan-Lin algorithm for a random graph with 1000 nodes.

[Plot: cut cost (min/mean/max) versus scrambling schemes s0–s10; the leftmost point shows the K-L result.]

Figure 20: Performance comparison between the MGP3 algorithm and the Kernighan-Lin algorithm for a random graph with 1000 nodes.

[Plot: standard deviation versus scrambling schemes s0–s10 for three curves: remove edges, add edges, and add nodes.]

Figure 21: Standard deviations of the MGP1 to MGP3 algorithms and the Kernighan-Lin algorithm for a random graph with 1000 nodes.

[Plot: cut cost (min/mean/max) versus method: K-L, MGP0, MGP1, MGP2, MGP3, MGP4, MGP5.]

Figure 22: Performance comparison between the MGP1 to MGP5 algorithms and the Kernighan-Lin algorithm for a random graph with 1000 nodes.

[Plot: standard deviation versus method: K-L, MGP0, MGP1, MGP2, MGP3, MGP4, MGP5.]

Figure 23: Standard deviations of the MGP1 to MGP5 algorithms and the Kernighan-Lin algorithm for a random graph with 1000 nodes.

6 Search Space Smoothing for Traveling Salesman Problem
The Traveling Salesman Problem (TSP) is a well-known NP-hard combinatorial optimization problem [68, 19]: a salesman tries to find the shortest closed route that visits a set of cities under the condition that each city is visited exactly once. The distances between all pairs of cities are assumed to be known to the salesman.
To date, no algorithm is known that finds an optimal solution to the TSP without suffering from exponentially growing complexity. As a result, researchers in the area have developed many heuristic algorithms to solve the problem. Generally, the performance of a heuristic algorithm is measured by the "closeness" between its solution and an optimal solution under some conditions.
Many local search algorithms have been developed to solve the TSP. Local search is very efficient but, due to the rugged terrain surface of the search space, a local search often becomes stuck at local minima. We use search space smoothing methods to reduce the effect of local minimum points. The method works as follows. In an objective function space, we alter the structure of the objective function to produce a series of TSP problem instances, each having a different terrain surface structure in the search space. Initially, a simplified TSP instance with a smoother terrain surface is solved, which produces a route solution. Then, a more complicated TSP instance which has a rougher terrain surface is generated. It takes the solution of the previously solved TSP as an initial route and further improves the route. Eventually the original TSP instance, with the most complicated search space, is solved.
Any existing heuristic local search algorithm can be used in conjunction with this approach. We tested the performance of this method with numerous randomly generated TSP instances and found that it significantly improved the performance of the existing local search algorithms for the traveling salesman problem.

6.1 Traditional Approaches


Many heuristic algorithms have been developed to solve the TSP. Golden and Stewart gave an early review of some TSP heuristic algorithms [21]. Johnson and McGeoch have recently written a comprehensive survey of the TSP [56]. The traditional heuristic algorithms for the TSP fall into the following three categories: tour construction, tour improvement, and a mix of both.


In the tour construction approach, we start from an initial subtour and try to extend this subtour into a complete tour which is approximately optimal. The initial subtour is usually a randomly chosen city or a self-loop. Typical algorithms in this category include arbitrary insertion [86], convex hull insertion [104], greatest angle insertion [77, 78], and ratio times difference insertion [79].
In tour improvement, we start from an initial complete route and try to reduce the length of the route while keeping the route complete. A frequently used heuristic for tour improvement is edge exchange [23, 71, 72]. A recent development in this direction is the use of stochastic methods to solve the TSP [10, 62, 93].
The third approach combines tour construction and tour improvement. A tour construction procedure provides an initial, complete route, and a tour improvement procedure further improves the initial route.
In this case study, we will focus on the edge exchange (tour improvement) techniques since they are the more effective methods. A straightforward method in tour improvement is the city-swap heuristic, i.e., swapping two cities on a route. If the swap reduces the total length, keep it; otherwise, try another swap. Figure 24 illustrates the tour configuration changes using the city-swap heuristic.

[Diagram: a route before and after swapping cities x2 and y2.]

Figure 24: Illustration of the route configuration changes using the city-swap heuristic. (a) Before the swap. (b) After the swap: d(x2, y1) + d(x2, y3) + d(x1, y2) + d(x3, y2) < d(x1, x2) + d(x2, x3) + d(y1, y2) + d(y2, y3).

Although the city-swap heuristic is simple and straightforward, empirically it is much less effective than other tour improvement heuristics. The best known and more effective heuristics are the r-opt (r = 2, 3, ...) and Or-opt procedures [72, 79]. The number r in r-opt is the number of edges to be exchanged. Figure 25 illustrates an operation example of the 2-opt heuristic. The r-opt procedure works as follows. First, delete r edges from a route

[Diagram: a route before and after a 2-opt edge exchange.]

Figure 25: Illustration of the route configuration changes using the 2-opt heuristic. (a) Before the exchange. (b) After the exchange: d(x1, y2) + d(x2, y1) < d(x1, x2) + d(y1, y2).

and add r new edges from the other parts of the route as long as the result remains a complete route. If the exchange results in a shorter route, keep the exchange; otherwise, try other exchanges. Second, repeat the first step as long as some exchanges yield improvements. If no more improvement can be made by any of the possible exchanges, i.e., a local minimum point has been reached, then exit and output the final route as the result.

A large r leads to more powerful heuristics which may provide solutions closer to the optimal ones. On the other hand, a large r requires testing a larger number of possible exchanges, which increases the computing time.
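A 2-opt pass is short enough to state in full. The sketch below is a first-improvement variant on a symmetric distance matrix; it illustrates the edge-exchange test of Figure 25 but is not claimed to match the exact 2-opt implementation used in the experiments.

    def tour_length(d, tour):
        n = len(tour)
        return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

    def two_opt(d, tour):
        """Repeatedly reverse a segment while doing so shortens the
        closed route; stops at a 2-opt local minimum."""
        n = len(tour)
        improved = True
        while improved:
            improved = False
            for i in range(n - 1):
                for j in range(i + 2, n):
                    if i == 0 and j == n - 1:
                        continue  # these two edges share city tour[0]
                    a, b = tour[i], tour[i + 1]
                    c, e = tour[j], tour[(j + 1) % n]
                    # replace edges (a,b) and (c,e) by (a,c) and (b,e)
                    delta = d[a][c] + d[b][e] - d[a][b] - d[c][e]
                    if delta < -1e-12:
                        tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                        improved = True
        return tour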

The Or-opt procedure is a modified 3-opt procedure. It improves the r-opt heuristic by reducing the number of exchanges to be tested. It considers only those exchanges which are obtained by cutting a piece of the route and inserting it between two other cities, as illustrated in Figure 26.

[Diagram: a route before and after an Or-opt move, in which the segment x1, ..., xm is cut out and reinserted between y1 and y2.]

Figure 26: Illustration of the route configuration changes using the Or-opt heuristic. (a) Before the change. (b) After the change: d(x0, xm+1) + d(x1, y1) + d(xm, y2) < d(x0, x1) + d(xm, xm+1) + d(y1, y2).

6.2 A Multispace Search Algorithm for Traveling Salesman Problem

In this section, we first describe the ideas of smoothing the objective function in the objective function space. Then we give a multispace search algorithm for the traveling salesman problem.

1. Smoothing the Search Space for TSP

There are many ways to smooth a search space (Section 4.4). The smoothing method employed here is to approximate the search space of a hard TSP problem instance by a series of simpler ones, each a gradually smoothed approximation of the original search space (Figure 7). The number of local minimum points in a simplified search space can be reduced by such a simplification.
A smoothing operation has different levels of strength, resulting in a search space with a varying degree of smoothness. We use a smoothing factor, α, to characterize the strength of a smoothing operation and the smoothness of the resulting search space. If α = 1, no smoothing operation is applied and the search space is the same as the original search space. If α > 1, a smoothing operation is applied and the smoothed search space is flatter than the original search space. If α >> 1, the smoothing operation has a stronger effect and the resulting smoothed search space is nearly flat.

A uniform instance of the TSP is an instance where all the distances among the cities are equal. In such a case, the objective function is flat since there is only one route length. Any route is optimal for this simple uniform instance, and there is no local minimum point in the search space. For a given TSP, let n be the number of cities, d_ij be the distance between cities i and j³ (i, j = 1, 2, ..., n), and d̄ be the average distance among all the cities; then

    d̄ = (1 / (n(n−1))) Σ_{i≠j} d_ij.    (2)

So a uniform instance is a TSP with d_ij = d̄, for i, j = 1, 2, ..., n.


A simplified TSP instance can be defined by the smoothing factor, α. For a TSP instance, the distances among all the cities, i.e., d_ij(α), are determined by α as:

    d_ij(α) = d̄ + (d_ij − d̄)^α    if d_ij ≥ d̄,
    d_ij(α) = d̄ − (d̄ − d_ij)^α    if d_ij < d̄,    (3)

where α ≥ 1. When α is decreased step by step from a large number, say 10, to 1, a series of simplified TSP instances is generated (Figure 7). A search space generated from a larger α exhibits a smoother terrain surface, and a search space generated from a smaller α exhibits a more rugged terrain surface. Two extreme cases of the series of TSP instances are:

1. If α >> 1, then d_ij(α) → d̄; this is the uniform instance;

2. If α = 1, then d_ij(α) = d_ij, which is the original problem instance.
Once we have a series of smoothed search spaces, we can use any existing local search algorithm to solve them. The idea of our simplification method is simple. We start from the uniform TSP instance, i.e., the case with a fairly "flat" search space, generate a simplified problem instance whose smoothed search space is close to the uniform one, and find the solution to this simplified problem instance using an available TSP local search algorithm. The solution of the uniform problem instance is then taken as the initial route for the next problem instance, which has a slightly more rugged search space. That problem is again solved using the same local search algorithm. The above procedure is repeated until the final TSP problem instance, which has the original complicated search space, is solved.
³Note that, without loss of generality, all distances d_ij are normalized so that 0 ≤ d_ij ≤ 1.
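Before turning to the full algorithm (Figure 27), a complete smoothing-search driver can be sketched in a few lines of Python. This is a minimal sketch under the assumptions above (distances normalized to [0, 1], Equations (2) and (3), and a Scheme 1 descent of α); local_search can be any tour-improvement procedure, e.g., the two_opt sketch of Section 6.1.

    def smooth(d, alpha):
        """Distance matrix of the simplified instance, per Equation (3)."""
        n = len(d)
        pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
        dbar = sum(d[i][j] for i, j in pairs) / (n * (n - 1))  # Eq. (2)
        s = [row[:] for row in d]
        for i, j in pairs:
            if d[i][j] >= dbar:
                s[i][j] = dbar + (d[i][j] - dbar) ** alpha
            else:
                s[i][j] = dbar - (dbar - d[i][j]) ** alpha
        return s

    def smoothing_search(d, tour, local_search, alpha0=5):
        """Solve a chain of smoothed instances, reusing each route as
        the starting point of the next; alpha decreases from alpha0
        to 1 (Scheme 1), ending on the original instance."""
        for alpha in range(alpha0, 0, -1):
            tour = local_search(smooth(d, alpha), tour)
        return tour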

The ideas described here are expressed in algorithmic form below.

2. A Multispace Search Algorithm for TSP

A multispace search algorithm for the TSP, the TSP algorithm, is given in Figure 27 [44]. The algorithm works as follows.

procedure TSP (route, length)
begin
  /* initialization */
  problem_route := get_a_TSP_instance();
  route := generate_an_uniform_instance(problem_route);
  length := compute_route_length(route);
  α := α0;

  /* scrambling between objective function space and value space */
  while (α ≥ 1) do
  begin
    /* construct a new instance in objective function space */
    for i := 1 to n do
      for j := 1 to n do
        d[i][j] := compute_dij(α);
    /* search in value space */
    route := TSP_local_search(d, route);
    length := compute_route_length(route);
    α := α − 1;
  end;
  /* when route returns to original problem_route */
  return route and length;
end;

Figure 27: TSP: a multispace search algorithm for the traveling salesman problem.

Initialization. Initially, procedure geLa_TSP_instanceO obtains a given


TSP problem_route (the given problem_route may be a randomly constructed
J1ultispace Search for Combinatorial Optimization 507

route). Procedure generate_an_uniform_instanceO makes the given prob-


lem_route an uniform instance. Function compute_routeJengthO returns
route length, length, as the initial value of the objective function. A proper
initial value of the smoothing factor 0, 00, is chosen (in this case it was
chosen to be less than 10).

Search. During each iteration of the search, a simplified TSP instance with a smoother search space is generated. This is done by two for loops that compute the distances among all the cities in the TSP instance. A local search algorithm, i.e., TSP_local_search(), is then used to minimize the objective function, length. In this case study, we used three existing TSP local search algorithms: the 2-opt, Or-opt, and city-swap procedures. Procedure TSP_local_search() finds a route as the solution to the current TSP instance. Subsequently, compute_route_length() computes the length of the new route.
At the end of each iteration of the loop, α is reduced by one. The same while loop is repeated and a more complicated TSP instance is generated. The local search algorithm takes the solution obtained from the previous, simpler TSP problem instance as the initial route and searches for a solution to this more difficult TSP instance. This solution again is used as the initial route of an increasingly complicated TSP problem instance (with a smaller α) which will be generated during the next iteration of the while loop.

Termination. In the TSP algorithm, α0 iterations are required to reduce α to 1. When α is reduced to 1, route becomes the original given problem_route and we have the original TSP instance. The solution yielded by the algorithm at this time is thus a solution to the original problem, so the algorithm returns the solution route and its length.

Running Time. The run time of the TSP algorithm can be estimated as follows. Procedure get_a_TSP_instance() takes O(n^2) time to generate an initial TSP instance since, in the case of a complete graph, there are n(n−1)/2 arcs to be connected. It takes the same cost for procedure generate_an_uniform_instance() to build an initial uniform instance. At the beginning of the search, the two for loops take O(n^2) time to produce a simplified TSP instance. Let the run time of procedure TSP_local_search() be O(·); then the run time of the while loop is O(α0 n^2) + α0 O(·). Summarizing the above, the time complexity of the TSP algorithm is

    O(n^2) + O(α0 n^2) + α0 O(·).

Since most TSP_local_search procedures, such as the 2-opt, 3-opt, and Kernighan-Lin algorithms, have a run time higher than O(n^2),⁴ the run time of the TSP algorithm can be written as O(·), which is determined by the local search algorithm used in the TSP algorithm and is not affected by the O(n^2) run time of the search space smoothing procedure.

6.3 Experimental Results

A goal of testing a TSP algorithm is to see how close the route length produced by the algorithm is to an optimal route length. For a given TSP instance, the optimal route length is in most cases unknown. Thus, researchers in the area often compare, for the same TSP instance, the route lengths produced by some existing TSP local search algorithms [68].
Since we used the 2-opt, Or-opt, and city-swap procedures in the TSP algorithm, we make relative performance comparisons directly between the local search (2-opt, Or-opt, and city-swap) algorithms and the multispace search (TSP) algorithm.
Since the TSP algorithm runs the local search algorithm α0 times, we should run the same local search algorithm α0 times, and compare the route length produced by the TSP algorithm with the minimum route length produced by α0 executions of the same local search algorithm. That is, if we run a local search algorithm #local times, we will run the TSP algorithm #TSP (= #local/α0) times and then compare their results. In the following experiments, we use "running times" to denote #local. The initial value of α, i.e., α0, was set to 5.
All the comparisons in the following experiments are based on the percentage improvement of the route length. The improvement percentage of algorithm A over algorithm B is defined as

    (length(B) − length(A)) / length(A) × 100%,

where length(A) represents the best solution (i.e., the shortest route length) given by algorithm A, and similarly for length(B).
In our experiments, the initial problem structure of a TSP instance was generated randomly. The distances among the n distinct cities were set randomly within the range [0, 1] with a uniform distribution.
⁴The worst-case running time bounds for the 2-opt, 3-opt, and Kernighan-Lin search procedures are O(n^2), O(n^3), and O(n^5), respectively.
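The random instance generation described above is a couple of lines of Python; the helper name is ours, and a symmetric instance is assumed.

    import random

    def random_tsp(n):
        """Random symmetric TSP instance: inter-city distances drawn
        uniformly from [0, 1], as in the experiments described above."""
        d = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d[i][j] = d[j][i] = random.random()
        return d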

Improvement for a TSP Instance. For a TSP instance with 50 cities, Figure 28 shows, as the number of local search executions (#local) increases, the route lengths yielded by the 2-opt TSP local search algorithm (dashed line) and the TSP algorithm (solid line). The experiment showed that, for

[Plot: route length versus local search execution times (0–100); the upper curve is the plain local search, the lower curve the multispace search.]

Figure 28: Route length as running times increase for a 2-opt local search algorithm. The instance had 50 cities.

the same 2-opt local search algorithm, adding simple smoothing operations could reduce the route length. As the number of local search executions increased, both algorithms improved the route length, but the TSP algorithm always performed better than the plain 2-opt local search algorithm.

Improvements for Multiple TSP Instances. Ten randomly generated TSP instances, each with 50 cities, were used to study the performance improvement of the 2-opt local search with search space smoothing. As one can see from Table 1, significant performance improvements, with percentages varying from 7.712% to 31.89%, were achieved for all ten TSP problem instances.

Table 1: Performance comparisons for a 2-opt local search algorithm and the same algorithm with search space smoothing operations. 10 instances with 50 cities were tested. Each had 25 executions.

    Instance   Local Search   Multispace Search   Improvement (%)
    1          2.377          2.113               12.49 %
    2          2.381          2.112               12.75 %
    3          2.775          2.393               15.97 %
    4          2.438          2.084               16.97 %
    5          2.524          1.914               31.89 %
    6          2.433          2.226               9.284 %
    7          2.652          2.346               13.04 %
    8          3.035          2.518               20.51 %
    9          2.873          2.485               15.65 %
    10         2.423          2.250               7.712 %

Improvement with Increasing Problem Size. Table 2 gives some statistics on the improvement percentage as the problem size increases. The results in the table were obtained based on the average of 100 TSP problem executions. For the same problem instances, the three local search algorithms showed different performance improvements. The improvement for city-swap was the largest, that for 2-opt was the second largest, and the improvement for Or-opt was the smallest. This was due to the original performance of the three local search algorithms. The Or-opt algorithm was the best among the three; solutions yielded by the Or-opt algorithm were closer to the optimal one, compared to those given by the other two algorithms, so less room was left for further improvement. The city-swap heuristic was the worst among the three, leaving much room for further improvement.

Improvement as Running Times Increase. As illustrated in Table 3, the performance improvements for the different heuristic algorithms are relatively stable as the number of executions increases. The results shown in

Table 2: Percentage of performance improvement as the problem size increases. 100 problem instances were tested. Each had 25 executions.

    n           50        60        70        80        90        100
    2-opt       15.91 %   19.43 %   22.43 %   22.39 %   25.43 %   26.68 %
    Or-opt      3.781 %   4.669 %   6.205 %   6.202 %   6.208 %   5.993 %
    city-swap   19.68 %   22.90 %   25.53 %   27.95 %   29.83 %   33.53 %

Table 3: Percentage improvement of the route length as running times increase. 100 problem instances were tested. Each had 50 cities.

    Running Times   25        50        75        100
    2-opt           15.91 %   13.62 %   15.49 %   14.22 %
    Or-opt          3.781 %   3.216 %   3.398 %   3.272 %
    city-swap       19.68 %   20.55 %   18.95 %   21.03 %

the table were obtained based on the average over 100 problem executions. Each instance had 50 cities.

Improvement with Different Smoothing Schemes. There are different ways to control the strength of a search space smoothing operation. The use of different decreasing schemes for α results in a different series of simplified TSP instances. The decreasing scheme for α mentioned before is called Scheme 1. To test the performance under different simplification schemes, we used another decreasing scheme, Scheme 2. In Scheme 2, α is set to M/x, where M is a positive integer and x is incremented from 1 to M. This produces a decreasing sequence of α: M, M/2, M/3, ..., M/(M−1), 1. In the experiment, M was set to 5. Thus five runs were required in Scheme 2 to yield a solution to the original problem.
Scheme 1 yields a smoother search space than Scheme 2. Table 4 demonstrates that there is not much difference in their performance variations. That is, the multispace search method based on the distance

Table 4: Performance comparisons of the different simplification schemes. 100 instances with 50 cities were tested. 25 executions for each instance.

    Schemes    2-opt     Or-opt    city-swap
    Scheme 1   15.91 %   3.781 %   19.68 %
    Scheme 2   15.92 %   3.586 %   19.19 %

computation (Equation 3) is less sensitive to the implementation details.


Schneider et al. have recently made an extensive study of various smoothing formulas, including linear smoothing, power-law smoothing, exponential smoothing, hyperbolic smoothing, sigmoidal smoothing, and logarithmic smoothing, for a number of combinatorial optimization algorithms [90]. They found that the search space smoothing method outperformed the unsmoothed algorithms in almost all cases.

7 Local Search for Scheduling and Task Assignment
Scheduling computational tasks onto multiprocessors is one of the key issues in high-performance computing today. Scheduling can be performed at compile time or at runtime. Scheduling performed at compile time is called static scheduling; scheduling performed at runtime is called dynamic scheduling. The flexibility inherent in dynamic scheduling allows adaptation to unforeseen application requirements at runtime. However, dynamic load balancing suffers from run-time overhead due to load information transfer among processors, the load-balancing decision-making process, and communication delay due to task relocation. Furthermore, most runtime scheduling algorithms utilize neither information about the characteristics of the application problem nor global load information in their load-balancing decisions. The major advantage of static scheduling is that the overhead of the scheduling process is incurred at compile time, resulting in a more efficient execution-time environment compared to dynamic scheduling. Static scheduling can utilize knowledge of the problem characteristics to reach a well-balanced load.
We consider static scheduling algorithms that schedule an edge-weighted directed acyclic graph (DAG), also called a task graph, onto a set of homogeneous processors to minimize the completion time. Since the static scheduling problem is NP-hard in its general forms [19], and optimal solutions are known only in restricted cases [9, 13, 22], there has been considerable research effort in this area, resulting in many heuristic algorithms [8, 12, 61, 89, 97, 111, 113].
In this section, we present a fast local search algorithm, namely TASK (Topological Assignment and Scheduling Kernel), that can effectively schedule DAGs onto multiprocessors [29, 48, 112]. Randomized local search is efficient but is unable to produce the best scheduling results. Based on the dynamic reconstruction of the directed acyclic graph in each move, TASK can quickly determine the optimal search direction and systematically minimize a given schedule in a topological order. Tested on real multiprocessor machines, TASK is faster than existing scheduling algorithms. It can effectively reduce the schedule length generated by other existing scheduling algorithms by 20% to 30%.

7.1 DAG Scheduling

A directed acyclic graph (DAG) consists of a set of nodes {n_1, n_2, ..., n_n} connected by a set of edges, each of which is denoted by e_ij. Each node represents a task, and the weight of node n_i, w(n_i), is the execution time of the task. Each edge represents a message transferred from one node to another, and the weight of edge e_ij, w(e_ij), is equal to the transmission time of the message. The communication-to-computation ratio (CCR) of a parallel program is defined as its average communication cost divided by its average computation cost on a given system. In a DAG, a node which does not have any parent is called an entry node, whereas a node which does not have any child is called an exit node. A node cannot start execution before it has gathered all of the messages from its parent nodes. In static scheduling, the number of nodes, the number of edges, the node weights, and the edge weights are assumed to be known before program execution. The communication weight between two nodes assigned to the same processing element (PE) is taken to be zero.
The objective in static scheduling is to assign the nodes of a DAG to PEs such that the schedule length, or makespan, is minimized without violating the precedence constraints. There are many approaches that can be employed in static scheduling. In the classical approach [54], also called list scheduling, the basic idea is to make a priority list of nodes and then assign these nodes one by one to PEs. In the scheduling process, the node with the

highest priority is chosen for scheduling. The PE that allows the earliest start time is selected to accommodate this node. Most of the reported scheduling algorithms are based on this concept, employing variations in the priority assignment methods, such as HLF (Highest Level First), LP (Longest Path), LPT (Longest Processing Time), and CP (Critical Path) [2, 111, 64]. In the following we review some contemporary static scheduling algorithms, including the MCP, DSC, DLS, and CPN methods.
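Before turning to the individual algorithms, the list-scheduling skeleton they share is compact enough to sketch. The Python skeleton below is illustrative: it takes any priority function and greedily places each ready node on the PE giving the earliest start time; the data-structure choices are our own assumptions, not a specific published implementation.

    def list_schedule(succ, pred, w, c, priority, p):
        """Generic list scheduling.  succ/pred: adjacency dicts of the
        DAG; w[v]: node weight; c[(u, v)]: edge weight (taken as 0 when
        u and v end up on the same PE); p: number of PEs."""
        pe_free = [0.0] * p              # time at which each PE is free
        place, finish = {}, {}
        indeg = {v: len(pred[v]) for v in w}
        ready = [v for v in w if indeg[v] == 0]
        while ready:
            v = max(ready, key=priority)        # highest-priority ready node
            ready.remove(v)
            best_start, best_pe = None, None
            for k in range(p):
                # messages from parents on other PEs must arrive first
                arrive = max((finish[u] + (0 if place[u] == k else c[(u, v)])
                              for u in pred[v]), default=0.0)
                start = max(arrive, pe_free[k])
                if best_start is None or start < best_start:
                    best_start, best_pe = start, k
            place[v] = best_pe
            finish[v] = best_start + w[v]
            pe_free[best_pe] = finish[v]
            for s in succ[v]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
        return place, max(finish.values())      # placement and makespan

The algorithms reviewed next differ mainly in the priority function plugged into this skeleton and in how priorities are updated as scheduling proceeds.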
The Modified Critical Path (MCP) algorithm is based on the as-late-as-possible (ALAP) time of a node [111]. The ALAP time is defined as T_L(n_i) = T_critical − level(n_i), where T_critical is the length of the critical path, and level(n_i) is the length of the longest path from node n_i to an exit node, including node n_i [13]. The MCP algorithm was designed to schedule a DAG on a bounded number of PEs. It sorts the node list in increasing ALAP order. The first node in the list is scheduled to the PE that allows the earliest start time, considering idle time slots. Then the node is deleted from the list, and this operation repeats until the list is empty.
The Dominant Sequence Clustering (DSC) algorithm is designed based on an attribute of a task graph called the dominant sequence (DS) [113]. A DS is defined, for a partially scheduled task graph, as the path with the maximum sum of communication costs and computation costs in the graph. Nodes on the DS are considered to be relatively more important than others. The ready node with the highest priority in terms of blevel + tlevel is scheduled first. Then the priorities of the child nodes of the scheduled node are updated, and this operation repeats until all nodes are scheduled. The dynamic cost (i.e., tlevel) is used to quickly determine the critical path length. This idea has been incorporated into our TASK algorithm to reduce its complexity.
The Dynamic Level Scheduling (DLS) algorithm determines node priorities by assigning an attribute called the dynamic level (DL) to each node at every scheduling step [97]. The DL is the difference between the static level and the message ready time. DLS computes the DL for each ready node on all available processors. Suppose DL(n_i, J) is the largest among all pairs of ready nodes and available processors; then n_i is scheduled to processor J. This process repeats until all nodes are scheduled.
Recently, a new algorithm has been proposed using the Critical Path Node (CPN) [67]. This algorithm is based on the CPN-dominate priority. If the next CPN is a ready node, it is put in the CPN-dominate list. For a non-ready CPN, its parent node n_y with the largest blevel is put in the list if all the parents of n_y are already in the list; otherwise, all the ancestor nodes of n_y are recursively included in the list before the CPN node is put in the list. The first node in the list is scheduled to the PE that allows the earliest start time. Then the scheduled node is removed from the list, and this operation repeats until the list is empty. The CPN algorithm utilizes two important properties of a DAG: the critical path and the topological order. It potentially generates a good schedule.
Although these algorithms produce relatively good schedules, they are usually not optimal. Sometimes the generated schedule is far from optimal. In this section, we propose a fast, deterministic local search algorithm, TASK, to improve the quality of the schedules generated by an initial scheduling algorithm.

7.2 Local Search for Scheduling and Task Assignment

There have been several significant local search solutions to the scheduling and task assignment problems [40]. Early local search methods were able to solve small unconstrained path-finding problems such as the TSP [71]. During the middle and late eighties, more powerful techniques for local search were developed. These include conflict minimization, random variable selection, and pre-, partial, and stochastic variable selection [24, 30, 32, 98, 99, 101, 100, 102]. These randomized local search algorithms can handle large-size satisfiability, constraint satisfaction (CSP), and scheduling problems efficiently.
The scheduling and task assignment problems are well known as CSP/SAT problems. In fact, SAT, n-queen, and scheduling are three typical problems in constraint satisfaction. The SAT model formulates scheduling and task assignment problems precisely in algebraic expressions. The CSP model expressively characterizes scheduling and task assignment operations in simple geometric patterns, e.g., the n-queen and hyper-queen models below. Following this, several local search solutions to SAT and the n-queen problem were applied to solve large-scale industrial scheduling and task assignment problems. A detailed discussion of the early work on SAT, n-queen, and scheduling was given in [40].
In the late eighties, it was found that, by a remarkable coincidence, the n-queen model represents a significant model for scheduling and task assignment problems [24, 49]. The underlying structure of the n-queen problem, represented by a complete constraint graph, gives a relational model with fully specified constraints among the multiple objects [24]. Variations on the dimension, the objects' relative positions, and the weights on the constraints

led to a hyper-queen problem model which has several simple and basic variants:

• n-queen problem: the base model. N queens are indistinguishable and the constraints among queens are specified by binary values (i.e., 1 or 0).

• w-queen problem: the weighted n-queen model. N queens are distinguishable; each is associated with a cost. The constraints among queens are specified by weights.

• 3d-queen problem: queens are to be placed in a 3-dimensional (l × m × n) rectangular cuboid. A special case, nm-queen, is to place queens on an n by m rectangle.

• q+-queen problem: more than one queen is allowed to be placed on the same row or the same column.
The hyper-queen problem can model the objects/tasks, the performance criteria, and the timing, spatial, capacity, and resource constraints of a wide range of scheduling and task assignment problems. This made the n-queen problem a general model for many industrial scheduling and task assignment problems. By a remarkable coincidence, the models of several difficult scheduling projects at that time were either n-queen or hyper-queen problems [59, 105]. All of them required efficient solutions to the n-queen or hyper-queen problems.
Let us consider a static scheduling problem where we are to schedule an edge-weighted directed acyclic graph (DAG), also called a task graph, onto a set of homogeneous processors to minimize the completion time. This problem is known to be NP-hard in its general forms [19]. Using the hyper-queen model, a set of weighted queens is to be placed on a T by P rectangle (see Figure 29). Here T denotes the execution time, P the number of processors, J_i the ith queen (i.e., the ith job and its execution time), and C_ij the communication time from the ith job to the jth job. The goal is to place the job queens onto the T by P board and minimize the longest execution path, subject to the given topological constraints. Instead of minimizing conflicts among queens on diagonal lines as in the n-queen problem, here we reduce conflicts among jobs and minimize the longest execution path following the given topological constraints.
We search different geometric patterns in the n-queen and the scheduling problems. These patterns produce different constraints in CSP and do

[Diagram, three panels: queens Q_i and jobs J_i placed on a T-by-P board, with the topological constraints among the jobs shown as execution paths.]

Figure 29: The n-queen scheduling model. Instead of minimizing conflicts among queens on diagonal lines, here we reduce conflicts among jobs and minimize the longest execution path following the given topological constraints.

not affect the formulation and use of the min-conflict heuristic. In fact, once the constraints are extracted, an algorithm developed for the n-queen problem can be used to solve the scheduling problem with or without minor modifications. The hyper-queen models have freed the n-queen problem from its puzzle-game background and made many practical scheduling and task assignment applications feasible. These include [34, 102] task scheduling, real-time systems, task assignment, computer resource management, VLSI circuit design, air traffic control, communication system design, and so on.
Polynomial-time, analytical solutions for the n-queen problem exist, but they cannot solve general search problems and have no use in practice [1, 3, 4, 14, 51, 85]. Following local conflict minimization [24, 47], the QS1 algorithm was developed during late 1987 and implemented during early 1988. It was the first local search algorithm developed for solving the large-size n-queen problem [24, 103, 98, 99]. Three improved local search algorithms for the n-queen problem were developed during 1988 to 1990 [65, 66, 58, 87]. QS2 is a near linear-time local search algorithm with an efficient random variable selection strategy [100]. QS3 is a near linear-time local search algorithm with efficient pre- and random variable selection and assignment [100]. QS4 is a linear-time local search algorithm with efficient partial and random variable selection and assignment techniques [101, 102]. Compared to the first local search algorithm [24], partial and random variable selection/assignment heuristics have improved search efficiency by orders of magnitude. QS4, for example, was able to solve 3,000,000 queens in a few seconds.

Three years after the release of the QS1 algorithm, Minton et al. independently reported a similar local search algorithm for the n-queen problem [74, 75]. A major difference between Minton's algorithm and Sosic and Gu's algorithms is that Minton's algorithm was a one-dimensional local search without random heuristics.

Early local search solutions for scheduling applications were developed during the late eighties. Since then, more than one hundred industrial companies worldwide have developed similar scheduling software systems for various industrial scheduling applications.

There are two basic strategies in local search. The first is random search, in which the local search direction is randomly selected. If the initial solution point is improved, the search moves to the refined solution point; otherwise, another search direction is randomly selected. The random strategy is simple and effective for some problems, such as the n-queen problem [98]. However, it may not be efficient for other problems, such as microword length minimization [83] and the DAG scheduling problem.
The second strategy utilizes certain criteria to find a search direction that most likely leads to a better solution point. In computer microword length minimization [83], a compatibility class is considered only when moving some nodes from the class may reduce the cost function. This strategy effectively reduces the search space by guiding the search towards a more promising direction. With carefully selected criteria, local search for DAG scheduling becomes very efficient, and the scheduling quality can be improved significantly.

7.3 Random Local Search Algorithm

A number of local search algorithms for scheduling have been developed based on the n-queen and hyper-queen scheduling models [29, 34, 102, 48, 112, 42, 67]. A typical random local search algorithm, RAND, was designed and implemented [42, 67]. The algorithm makes direct use of the random variable selection heuristics developed for the QS algorithms [101].
In the RAND algorithm, a node is randomly picked from the blocking node list, where a blocking node is defined as a node that has the potential to block critical path nodes. Then the node is moved to a randomly selected PE. If

procedure RAND (G, DAG_Schedule)
begin
  construct_a_scheduled_DAG(G);
  construct_the_blocking_node_list();
  k := 0;
  while (k++ < Max_iteration) do
  begin
    pick a node n_i randomly from the blocking node list;
    pick a PE P randomly;
    assign n_i to PE P;
    if schedule length increases then
      move n_i back to its original PE;
    if local then random_local_handler();
  end;
end;

Figure 30: A random local search algorithm for DAG scheduling.

the schedule length is reduced, the move is accepted. Otherwise, the node is moved back to its original PE. Each move, successful or not, takes O(e) time to compute the schedule length, where e is the number of edges in the graph. To reduce the complexity, a constant Max_iteration is defined to limit the number of steps so that only Max_iteration nodes are inspected. The time taken by the algorithm is proportional to e × Max_iteration. Moreover, randomly selected nodes and PEs may not be able to significantly reduce the length of a given schedule. Even if Max_iteration is equal to the number of nodes, leading to a complexity of O(en), the random search algorithm still cannot provide satisfactory performance.
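The accept/reject core of RAND can be sketched in a few lines of Python. Everything here is illustrative: schedule_length stands for the O(e) makespan recomputation, and the blocking-node bookkeeping and local handlers of Figure 30 are omitted.

    import random

    def rand_search(blocking_nodes, p, place, schedule_length,
                    max_iteration=64):
        """Random local search skeleton in the style of Figure 30."""
        best = schedule_length(place)
        for _ in range(max_iteration):
            ni = random.choice(blocking_nodes)   # random blocking node
            old_pe = place[ni]
            place[ni] = random.randrange(p)      # try a random PE
            new = schedule_length(place)
            if new > best:
                place[ni] = old_pe               # reject: move it back
            else:
                best = new                       # accept the move
        return place, best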

7.4 Local Search with Topological Ordering

We propose a fast local search algorithm utilizing topological ordering for effective DAG scheduling. The algorithm is called TASK (Topological Assignment and Scheduling Kernel) [29, 48, 112]. In this algorithm, the nodes of the DAG are inspected in a topological order. In this order, it is not necessary to visit every edge to determine whether the schedule length is reduced. The time spent on each move can thus be drastically reduced, so that inspecting every node in a large graph becomes feasible. Also, in this order, we can compact the given schedule systematically.
For a given graph, in order to describe the TASK algorithm succinctly, several terms are defined as follows (a sketch computing the first two quantities appears after the definitions):

• tlevel(n_i), the largest sum of communication and computation costs at the top level of node n_i, i.e., from an entry node to n_i, excluding its own weight w(n_i).

• blevel(n_i), the largest sum of communication and computation costs at the bottom level of node n_i, i.e., from n_i to an exit node.

• The critical path, CP, is the longest path in a DAG. The length of the critical path of a DAG is

      L_CP = max_{n_i ∈ V} L(n_i),

  where L(n_i) = tlevel(n_i) + blevel(n_i) and V is the node set of the graph.

If the given graph has been previously scheduled, more terms are defined:

• Node n_i has been scheduled on PE pe(n_i).

• Let p(n_i) be the predecessor node that has been scheduled immediately before node n_i on PE pe(n_i). If node n_i is the first node scheduled on the PE, p(n_i) is null.

• Let s(n_i) be the successor node that has been scheduled immediately after node n_i on PE pe(n_i). If node n_i is the last node scheduled on the PE, s(n_i) is null.
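Both tlevel and blevel can be computed in O(n + e) time with two sweeps over a topological order. The sketch below is a straightforward Python rendering of the definitions above; the dictionary-based graph representation is an assumption of ours.

    def levels(succ, w, c, topo):
        """tlevel and blevel of every node; topo is a topological order.
        succ[v]: children of v; w[v]: node weight; c[(v, s)]: edge weight."""
        tlevel = {v: 0.0 for v in topo}
        for v in topo:                        # forward sweep (entry -> exit)
            for s in succ[v]:
                tlevel[s] = max(tlevel[s], tlevel[v] + w[v] + c[(v, s)])
        blevel = {}
        for v in reversed(topo):              # backward sweep (exit -> entry)
            blevel[v] = w[v] + max((c[(v, s)] + blevel[s] for s in succ[v]),
                                   default=0.0)
        return tlevel, blevel

    # critical-path length: L_CP = max over all nodes of tlevel + blevel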

One characteristic of the TASK algorithm is its independence from the algorithm used to generate the initial schedule. As long as the initial schedule is correct and every node n_i has available pe(n_i), p(n_i), and s(n_i), application of the local compaction algorithm guarantees that the new schedule of the graph is better than or equal to the initial one. The TASK algorithm is shown in Figure 31.
The input of the algorithm is a given DAG schedule generated by any heuristic DAG scheduling algorithm. In the first initialization step, a scheduled DAG is constructed, which contains the scheduling and execution order information. To enforce the execution order on each PE, some pseudo edges (with zero weights) are inserted to incorporate the initial schedule into the graph. The for loop in the initialization computes the blevel of each node in the scheduled DAG and initializes tlevel for the entry nodes. Every edge is marked unvisited. The variable next_k points to the next node that has not been inspected on PE k. Initially, no node has been inspected, so next_k points to the first node on PE k.
In the while loop, a ready node n_i with the maximum value L(n_i) = tlevel(n_i) + blevel(n_i) is selected for inspection. Ties are broken by tlevel(n_i). A node is ready when all its parents have been inspected. In this way, the nodes are inspected in a topological order. Although other topological orders, such as blevel, tlevel, or CPN-dominate, could be used, L has been shown to be a good indicator for determining the order of inspection [113].
To inspect node n_i, in the for loop, the value L(n_i) = tlevel(n_i) + blevel(n_i) is recalculated for each PE. To conduct the recalculation at PE k, node n_i is tentatively inserted right in front of next_k. Here, tlevel(n_i) can change if any of its parent nodes was scheduled to either PE k or PE pe(n_i). Similarly, blevel(n_i) can change if any of its child nodes was initially scheduled to either PE k or PE pe(n_i). Because the tlevels of its parent nodes are available and the blevels of its child nodes are unchanged, the value of L(n_i) on every PE can be easily computed. These values indicate the degree of improvement obtainable by the local search. With the new L(n_i)'s recalculated for every PE, node n_i is then moved to the PE that allows the minimum value of L(n_i). If node n_i has been moved to PE t, the corresponding pseudo edges are modified by modify_pseudo_edges_in_DAG(). The tlevel of n_i is propagated to its children so that when a node becomes ready, its tlevel can be computed. This process continues until every node is inspected.
The TASK algorithm satisfies the following properties [48, 112]:
1. The critical path length L_CP will not increase after any step of the TASK algorithm.

procedure TASK (DAG_Schedule)
begin
  /* initialization */
  construct a scheduled DAG;
  for node i := 0 to n − 1 do
    L(n_i) := tlevel(n_i) + blevel(n_i);
  L_CP := max_{0 ≤ i < n} L(n_i), the longest path in the DAG;

  /* search */
  while there are nodes in the DAG to be scheduled do
  begin
    i := pick_a_node_with_max_L(n_i);
    for each PE k := 0 to p − 1 do
      L_k(n_i) := move_n_i_to_PE_k;
    t := pick_a_PE_with_min_L_k, where k = 0, ..., p − 1;
    /* if no improvement */
    if t == pe(n_i) then
      let node n_i stay at PE pe(n_i);
    /* if there are improvements */
    else begin
      move node n_i from PE pe(n_i) to PE t;
      modify_pseudo_edges_in_DAG;
      propagate_tlevel_of_n_i_to_its_children;
    end;
    mark n_i as being scheduled;
  end;
end;

Figure 31: TASK: Topological Assignment and Scheduling Kernel, a fast, deterministic local search algorithm based on topological ordering for fast scheduling.

2. If the nodes of a DAG are inspected in a topological order and each ready node is appended to the previous node list on its PE, the blevel of a node is invariant before it is inspected and the tlevel of a node is invariant after it is inspected.
3. The time complexity of the TASK algorithm is O(e + n log n + np), where e is the number of edges, n is the number of nodes, and p is the number of PEs.
In the following, we use an example to illustrate the operation of the TASK algorithm.

Example: Assume the DAG shown in Figure 32 has been scheduled onto three PEs by a DAG scheduling algorithm. The schedule is shown in

Figure 32: A DAG example for scheduling.

Figure 33(a), in which three pseudo (dashed) edges have been added to construct a scheduled DAG: one from node n_6 to node n_8, one from node n_3 to node n_9, and one from node n_4 to node n_5 (not shown in Figure 33(a)). The schedule length is 14. The blevel of each node is computed as shown in Table 5. Tables 6 and 7 trace the tlevel + blevel = L values at each step.

[Diagram, three panels showing the schedule on PE0, PE1, and PE2 along a time axis: (a) the initial schedule of length 14; (b) after node n_1 is moved to PE 2; (c) after node n_6 is moved to PE 1.]

Figure 33: An example of TASK's operations.

In Table 6, ".;" indicates the node with the largest L value and is to be
inspected in the current step. In Table 7, "*,, indicates the original PE and
".;" the PE where the node is moved to.

First, there is only one ready node, nl, which is a CP node. Its L value
on PE 0 is LO(nl) = 0 + 14 = 14, Then the L values on other PEs are
computed: LI(nl) = 0 + 14 = 14, L2(nd = 0 + 12 = 12, as shown in Table
7. Thus, node nl is moved from PE 0 to PE 2, as shown in Figure 33(b).
The Lcp of the DAG is reduced to 12. In iterations 2, 3, and 4, moving
nodes n3, n4, and n2 do not reduce any L value. In iteration 5, node n6 is
moved from PE 0 to PE 1 as the L value is reduced from 12 to 11, as shown
in Figure 33(c). In the rest five iterations, nodes n5,n7,nS,ng and nlO do
not move.

Table 5: The initial blevel value of each node in the example.

    Node     n_1   n_2   n_3   n_4   n_5   n_6   n_7   n_8   n_9   n_10
    blevel   14    9     9     10    7     6     5     4     2     1

Table 6: The L values of ready nodes for selecting a node to be inspected.

    Iteration   Ready nodes (tlevel + blevel = L)
    1           n_1 (0+14=14) √
    2           n_2 (2+9=11), n_3 (3+9=12) √, n_4 (1+10=11)
    3           n_2 (2+9=11), n_4 (1+10=11) √
    4           n_2 (2+9=11) √, n_5 (4+7=11)
    5           n_5 (4+7=11), n_6 (6+6=12) √
    6           n_5 (4+7=11) √, n_8 (6+4=10), n_9 (8+2=10)
    7           n_7 (6+5=11) √, n_8 (6+4=10), n_9 (8+2=10)
    8           n_8 (6+4=10) √, n_9 (8+2=10)
    9           n_9 (8+2=10) √
    10          n_10 (10+1=11) √

7.5 Performance Study

The TASK algorithm is faster than presently existing scheduling algorithms for DAG scheduling. It can effectively reduce the schedule length generated by other existing scheduling algorithms by 15% to 20%.
Table 8 compares the complexities of several important scheduling algorithms (where n is the number of nodes, e the number of edges, and p the number of PEs). The TASK algorithm has the lowest complexity among the five.
In the following, we present the performance results of the TASK algorithm and compare them with those of the random local search algorithm for scheduling, which was one of the best scheduling algorithms. We performed experiments using synthetic DAGs as well as real workloads generated from a Gaussian elimination program. We use the same random graph generator for performance comparison.
Table 7: The L values of node ni on each PE to select a PE.

Iteration  Node   PE 0        PE 1        PE 2
1          n1     0+14=14*    0+14=14     0+12=12 √
2          n3     3+11=14     3+9=12*     1+12=13
3          n4     4+12=16     5+9=14      1+10=11*
4          n2     2+9=11*     5+10=15     4+10=14
5          n6     6+6=12*     6+4=10 √    6+9=15
6          n5     5+10=15     8+10=18     4+7=11*
7          n7     9+6=15      9+4=13      6+5=11*
8          n8     6+4=10*     8+4=12      8+4=12
9          n9     10+3=13     8+2=10*     8+4=12
10         n10    10+1=11     10+1=11*    10+1=11

Table 8: Complexities of different scheduling algorithms.

MCP [111]     n² log n
DSC [113]     (e + n) log n
DLS [97]      pn³
RAND [67]     MAXSTEP × e
TASK          e + n log n + pn

The synthetic DAGs are randomly generated graphs consisting of thousands of nodes. These large DAGs are used to test the scalability and robustness of the local search algorithms. The DAGs were synthetically generated in the following manner. Given N, the number of nodes in the DAG, we first randomly generated the height of the DAG from a uniform distribution with mean roughly equal to √N. For each level, we generated a random number of nodes, also selected from a uniform distribution with mean roughly equal to √N. Then, we randomly connected the nodes from the higher levels to the lower levels. The edge weights were also randomly generated. The sizes of the random DAGs were varied from 1000 to 4000 nodes with an increment of 1000. Three values of the communication-to-computation ratio (CCR) were
selected: 0.1, 1, and 10. The weights on the nodes and edges were generated randomly so that the average value of CCR corresponded to 0.1, 1, or 10.
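
The following is a minimal sketch, in Python, of a generator in the spirit of this description; the exact distributions, the restriction of edges to adjacent levels, and the weight ranges are our own assumptions, since the text does not specify them.

    import random, math

    def random_layered_dag(n_target, ccr=1.0):
        # Height and per-level widths are drawn uniformly with mean about
        # sqrt(N); edges go from each level to the next level down; edge
        # weights are scaled so the average CCR is roughly as requested.
        mean = int(math.sqrt(n_target))
        height = random.randint(1, 2 * mean)       # mean ~ sqrt(N)
        levels, nid = [], 0
        for _ in range(height):
            width = random.randint(1, 2 * mean)    # mean ~ sqrt(N)
            levels.append([nid + i for i in range(width)])
            nid += width
        node_w = {v: random.randint(1, 10) for lvl in levels for v in lvl}
        edges = {}
        for i in range(len(levels) - 1):
            for v in levels[i]:
                targets = random.sample(levels[i + 1],
                                        k=random.randint(1, len(levels[i + 1])))
                for u in targets:
                    edges[(v, u)] = ccr * random.randint(1, 10)
        return node_w, edges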
We evaluate the performance of these algorithms in two respects: the schedule length generated by the algorithm and the running time of the algorithm.

Table 9: Comparison for synthetic DAGs (4 PEs).

# of    CCR   Schedule length              Running time
nodes         Initial  +RAND   +TASK       RAND    TASK    ratio
1000    0.1   2536     2536    2531        16.4    0.76    21.6
        1     2818     2813    2669        5.43    0.25    21.7
        10    5095     5083    4462        9.13    0.43    21.2
2000    0.1   5011     5011    4994        41.8    1.91    21.9
        1     5506     5498    5226        18.4    0.82    22.4
        10    11002    10980   8873        21.7    1.02    21.3
3000    0.1   7728     7728    7582        28.1    1.27    22.1
        1     7707     7698    7475        81.2    3.57    22.7
        10    15619    15586   11953       38.7    1.76    22.0
4000    0.1   10088    10087   9937        57.3    2.54    22.6
        1     10665    10648   10145       88.2    3.90    22.6
        10    21436    21419   17390       55.5    2.58    21.5

Tables 9 and 10 show the comparison of the random local search algorithm and the TASK algorithm on 4 PEs and 16 PEs, respectively, where "RAND" is the random local search algorithm and "TASK" is the TASK algorithm. MAXSTEP is set to 64 according to [67]. The comparison is conducted on 12 DAGs of different sizes and different CCRs. The CPN-Dominate algorithm [67] generates the initial schedules. The value in the column "Initial" is the length of the initial schedule; that in the column "+RAND" is the length after the random local search algorithm is run on the initial schedule; and that in the column "+TASK" is the length after the TASK algorithm is run on the initial schedule. The percentage improvement over the initial schedule can be computed from these columns. The running times of the random local search algorithm and the TASK algorithm are also shown in the tables, as well as their ratio. From the tables, it can be seen that TASK is much more effective and faster than
Table 10: Comparison for synthetic DAGs (16 PEs).

# of    CCR   Schedule length              Running time
nodes         Initial  +RAND   +TASK       RAND    TASK    ratio
1000    0.1   663      660     653         16.1    0.82    19.6
        1     960      960     910         5.18    0.29    17.9
        10    3201     3187    3092        7.86    0.50    15.7
2000    0.1   1348     1348    1317        40.1    2.09    19.2
        1     1831     1830    1738        17.6    0.95    18.5
        10    6784     6784    6606        19.8    1.13    17.5
3000    0.1   2233     2233    2123        26.6    1.43    18.6
        1     2038     2038    1976        74.1    3.81    19.4
        10    8770     8770    8471        34.4    1.90    18.1
4000    0.1   2931     2931    2775        54.6    2.80    19.5
        1     2992     2992    2865        79.9    4.14    19.3
        10    12924    12920   12757       51.6    2.76    18.7

random local search for scheduling and task assignment problems. Increasing the value of MAXSTEP in the random local search algorithm can improve its performance. However, even if MAXSTEP is equal to the number of nodes, the random search algorithm does not perform as well as the TASK algorithm. The search order is important: the order based on the L value is superior to the random search order. Furthermore, if MAXSTEP is equal to the number of nodes, the random search algorithm is extremely slow because its running time is proportional to MAXSTEP.
We also tested the local search algorithms with DAGs generated from a real application, Gaussian elimination with partial pivoting. The Gaussian elimination program operates on matrices. The matrix is partitioned by columns. The finest grain size of this column partitioning scheme is a single column. However, this fine-grain partition generates too many nodes in the graph. For example, the fine-grain partition of a 1k×1k matrix generates a DAG of 525,822 nodes. To reduce the number of nodes, a medium-grain partition is used. Table 11 lists the number of nodes for different matrix sizes and grain sizes (numbers of columns). These graphs are generated by the Hypertool from an annotated sequential Gaussian elimination program [111]. Table 12 shows the comparison of the random local search algorithm and the
TASK algorithm on different DAGs and different numbers of PEs. Random local search does not improve the quality of these schedules except for the first one, whereas TASK can always reduce the schedule length. Again, TASK is much faster than random local search.

Table 11: The number of nodes for different matrix sizes and grain sizes for Gaussian elimination.

Matrix size   1k×1k                      2k×2k
Grain size    64    32    16    8        64    32    16    8
# of nodes    138   530   2082  8258     530   2082  8258  32898

Table 12: Comparison for Gaussian elimination.

Matrix  Grain  # of   Schedule length            Running time
size    size   PEs    Initial  +RAND  +TASK      RAND   TASK   ratio
1k×1k   64     4      209      189    191        0.45   0.02   22.5
        32     8      110      110    97.0       0.51   0.03   17.0
        16     16     56.6     56.6   49.8       1.80   0.09   20.0
        8      32     28.9     28.9   25.4       7.60   0.35   21.7
2k×2k   64     8      868      868    786        0.45   0.03   15.0
        32     16     449      449    394        1.84   0.11   16.7
        16     32     386      386    353        7.43   0.45   16.5
        8      64     116      116    102        32.0   1.88   17.0

In the real algorithm executions, TASK can improve the schedule lengths produced by the RAND algorithm by a factor of between 26 and 238. In addition, it can speed up the execution of RAND by a factor of between 15 and 22.

8 Summary
Multispace search is a new approach for combinatorial optimization [27, 33, 49]. In this paper, we have discussed three case studies of using multispace search for solving combinatorial optimization problems. The experimental results indicate that multispace search outperforms the traditional
value search methods in many cases. The performance of this approach derives from the exploitation of problem structure and an aggressive interplay of structural multispace operations with the traditional value search. Structural multispace operations empower a search process with a sequence of dynamic, stepwise structural transformations. This makes multispace search a natural approach for handling the pathological phenomena in combinatorial optimization.
Multispace search is a new and promising approach. Although it has demonstrated effectiveness for certain problems, many interesting techniques remain to be discovered. Combining structural operations with existing optimization methods poses a number of interesting research topics:

Optimization in multispace. When there are many local minima in the search space and we are searching for a few optimal solutions, structural multispace operations can provide alternatives to the existing value search algorithms. An important area of multispace search is to find small sets of simple multispace operations that can improve an algorithm's performance. We have so far developed a number of systematic methods to optimize the variable structures, the performance criteria, and the constraint relations in multispace search [27, 33, 39, 49].

Theoretical study. Multispace search poses some interesting and challenging theoretical issues. During each iteration the problem structure is changed, and the complexity measure and the local convergence rate of the problem vary accordingly. Special tools are required when we try to derive the complexity or the convergence measure for a dynamically changing problem structure. A worst-case complexity study is feasible in some cases (see Section 6), but, in general, such a study can be difficult [11, 69].

Practical applications. Multispace search has been applied to several dozen practical applications with success. The operations and heuristics used in multispace search vary dramatically from problem to problem. It remains interesting to know what its most effective formulations are and for what range of problems they are best suited. Past and current research encourages future application of multispace search to a wide range of practical applications.


The power of computers has increased significantly, and many new optimization methods have been developed. Still, difficult challenges confront current optimization technology. The natural sciences are a rich field of intriguing facts and truths; they may shed light on designing natural and efficient algorithms for combinatorial optimization. The possibility of multispace search gives a new dimension of freedom. Much can be explored in the future; however, an effective multispace search algorithm must rely on simple and nontraditional ideas.

Acknowledgments

I am grateful to many people who have contributed to this work. Bob Johnson pioneered the early study of the physical background of computation. Panos Pardalos, Ding-Zhu Du, Christos Papadimitriou, Ben Wah, Paul Purdom, Xiaotie Deng, Andy Sage, Qianping Gu, Wei Wang, George Baciu, and Bin Du have provided valuable discussions on multispace search. David Johnson provided his recent survey and TSP benchmarks for our experiments. Johannes Schneider sent us their recent paper.

This research was supported in part by NSERC Research Grant OGP0046423, NSERC Strategic Grants MEF0045793 and STR0167029, the 1996 DIMACS Workshop on Satisfiability (SAT) Problems, and a 1997 DIMACS visitor program.
References

[1] B. Abramson and M. Yung. Divide and conquer under global constraints: A solution to the n-queens problem. Journal of Parallel and Distributed Computing, 6:649-662, 1989.

[2] T.L. Adam, K.M. Chandy, and J.R. Dickson. A comparison of list schedules for parallel processing systems. Communications of the ACM, 17(12):685-690, Dec. 1974.

[3] W. Ahrens. Mathematische Unterhaltungen und Spiele (in German). B.G. Teubner (Publishing Company), Leipzig, 1918-1921.

[4] B. Bernhardsson. Explicit solutions to the n-queens problem for all n. ACM SIGART Bulletin, 2(2):7, Apr. 1991, ACM Press.

[5] R.E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Computers, C-35(8):677-691, Aug. 1986.

[6] R.E. Bryant. Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys, 24(3):293-318, Sept. 1992.

[7] H.R. Charney and D.L. Plato. Efficient partitioning of components. Proc. of the 5th Annual Design Automation Workshop, pages 16.0-16.21, Jul. 1968.

[8] Y.C. Chung and S. Ranka. Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors. Supercomputing '92, Nov. 1992.

[9] E. Coffman, editor. Computer and Job-Shop Scheduling Theory. John Wiley & Sons, 1976.

[10] V. Cerny. A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm. Technical report, Institute of Physics and Biophysics, Comenius University, Bratislava, 1982.

[11] Ding-Zhu Du. Private communications, 1993.

[12] H. El-Rewini and T.G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing, Jun. 1990.
[13] H. El-Rewini, T.G. Lewis, and H.H. Ali. Task Scheduling in Parallel and Distributed Systems. Prentice Hall, 1994.

[14] B.-J. Falkowski and L. Schmitz. A note on the queens' problem. Information Processing Letters, 23:39-46, July 1986.

[15] C.M. Fiduccia and R.M. Mattheyses. A linear-time heuristic for improving network partitions. In Proc. 19th ACM/IEEE Design Automation Conference, pages 175-181. IEEE Computer Society Press, 1982.

[16] L.R. Ford and D.R. Fulkerson. Flows in Networks. Princeton University Press, New Jersey, 1962.

[17] J. Frankle and R.M. Karp. Circuit placement and cost bounds by eigenvector decomposition. In Proc. International Conference on Computer-Aided Design, pages 414-417, 1986.

[18] M.R. Garey and D.S. Johnson. The complexity of near-optimal graph coloring. Journal of the ACM, 23:43-49, Jan. 1976.

[19] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, San Francisco, 1979.

[20] F. Glover. Tabu search - Part I. ORSA Journal on Computing, 1(3):190-206, Summer 1989.

[21] B.L. Golden and W.R. Stewart. Empirical analysis of heuristics. In The Traveling Salesman Problem, E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys, editors, pages 207-249. John Wiley & Sons, New York, 1985.

[22] R.L. Graham, E.L. Lawler, J.K. Lenstra, and A.H.G. Rinnooy Kan. Optimization and approximation in deterministic sequencing and scheduling: A survey. Annals of Discrete Mathematics, 5:287-326, 1979.

[23] G.A. Croes. A method for solving traveling-salesman problems. Operations Research, 6:791-812, 1958.

[24] J. Gu. Parallel algorithms and architectures for very fast search (PhD thesis). Technical Report UUCS-TR-88-005, Jul. 1988.
[25] J. Gu. Mapping a Computing Structure to the Problem Structure. In Parallel Algorithms and Architectures for Very Fast Search, chapter 5, pages 105-115. Technical Report UUCS-TR-88-005, Jul. 1988.

[26] J. Gu. How to solve very large-scale satisfiability problems. Technical Reports UUCS-TR-88-032, 1988, and UCECE-TR-90-002, 1990.

[27] J. Gu. Optimization by multispace search. Technical Report UCECE-TR-90-001, Jan. 1990.

[28] J. Gu. An αβ-relaxation for global optimization. Technical Report UCECE-TR-91-003, Apr. 1991.

[29] J. Gu. Local search for large-scale scheduling and task assignment problems. Lecture Notes in Algorithm and Optimization, 1991-1996.

[30] J. Gu. Efficient local search for very large-scale satisfiability problems. SIGART Bulletin, 3(1):8-12, Jan. 1992, ACM Press.

[31] J. Gu. On Optimizing a Search Problem. In Advanced Series on Artificial Intelligence, Vol. 1, chapter 2, pages 63-105. World Scientific, New Jersey, Jan. 1992.

[32] J. Gu. Local search for satisfiability (SAT) problem. IEEE Trans. on Systems, Man, and Cybernetics, 23(4):1108-1129, Jul. 1993, and 24(4):709, Apr. 1994.

[33] J. Gu. Multispace Search: A New Optimization Approach (Summary). In Lecture Notes in Computer Science, Vol. 834, pages 252-260, 1994.

[34] J. Gu. Optimization Algorithms for the Satisfiability (SAT) Problem. In Advances in Optimization and Approximation, pages 72-154. Kluwer Academic Publishers, 1994.

[35] J. Gu. Global optimization for satisfiability (SAT) problem. IEEE Trans. on Knowledge and Data Engineering, 6(3):361-381, Jun. 1994, and 7(1):192, Feb. 1995.

[36] J. Gu, H. Li, Z. Zhou, and B. Du. An efficient implementation of the SAT1.5 algorithm. Technical report, USTC, Sept. 1995.

[37] J. Gu and R. Puri. Asynchronous circuit synthesis by Boolean satisfiability. IEEE Transactions on CAD of Integrated Circuits and Systems, 14(8):961-973, Aug. 1995.
[38] J. Gu. An αβ Relaxation for Global Optimization. In Minimax and Applications, pages 251-268. Kluwer Academic Publishers, 1995.

[39] J. Gu. Multispace search for satisfiability and NP-hard problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 35, pages 407-517, American Mathematical Society, 1997.

[40] J. Gu. Randomized and deterministic local search for SAT and scheduling problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1998.

[41] J. Gu and B. Du. Graph partitioning by simulated evolution. Technical Report UCECE-TR-92-001, Dept. of Electrical and Computer Engineering, Univ. of Calgary, Apr. 1992.

[42] J. Gu, B. Du, and Y.K. Kwok. Design of an efficient local search algorithm for DAG scheduling. COMP680 Applied Optimization Course Project Meeting, HKUST, Fall 1995.

[43] J. Gu, B. Du, and D. Tsang. Quorumcast routing by multispace search. IEEE Transactions on Computers, to appear.

[44] J. Gu and X. Huang. Local search with search space smoothing: A case study of the traveling salesman problem (TSP). Technical Report UCECE-TR-91-006, Aug. 1991. In IEEE Trans. on Systems, Man, and Cybernetics, 24(5):728-735, May 1994.

[45] J. Gu and W. Wang. A novel discrete relaxation architecture. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(8):857-865, Aug. 1992.

[46] J. Gu and X. Huang. A constraint network based approach to a shape from shading analysis of a polyhedron. In Proceedings of IJCNN'92, pages 441-446, Beijing, Nov. 1992.

[47] J. Gu, W. Wang, and T.C. Henderson. A parallel architecture for discrete relaxation algorithm. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-9(6):816-831, Nov. 1987.

[48] J. Gu, M.Y. Wu, and W. Shu. Fast local search algorithms for DAG scheduling. Research collaborations, Summer 1995.
[49] J. Gu. Constraint-Based Search. Cambridge University Press, New York, to appear.

[50] J. Gu. Optimization by Multispace Search. Kluwer Academic Publishers, to appear.

[51] E.J. Hoffman, J.C. Loessi, and R.C. Moore. Constructions for the solution of the m queens problem. Mathematics Magazine, pages 66-72, 1969.

[52] J. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975.

[53] E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, 1978.

[54] T.C. Hu. Parallel sequencing and assembly line problems. Operations Research, 9(6):841-848, 1961.

[55] X. Huang, J. Gu, and Y. Wu. A constrained approach to multifont character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(8):838-843, Aug. 1993.

[56] D.S. Johnson and L.A. McGeoch. The Traveling Salesman Problem (TSP): A Case Study in Local Optimization. In Local Search in Combinatorial Optimization, E.H.L. Aarts and J.K. Lenstra, editors. John Wiley and Sons, New York.

[57] R.R. Johnson. Elements of a theory for computer performance. Unpublished manuscript, Aug. 1988.

[58] W. Lewis Johnson. Letter from the editor. SIGART Bulletin, 2(2):1, April 1991, ACM Press.

[59] M.D. Johnston. Scheduling with neural networks - the case of the Hubble Space Telescope. NASA Memo, 1989.

[60] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, pages 291-307, Feb. 1970.

[61] A.A. Khan, C.L. McCreary, and M.S. Jones. A comparison of multiprocessor scheduling heuristics. Int'l Conf. on Parallel Processing, II:243-250, Aug. 1994.
[62] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.

[63] B. Krishnamurthy. An improved min-cut algorithm for partitioning VLSI networks. IEEE Trans. on Computers, C-33:438-446, May 1984.

[64] B. Kruatrachue and T.G. Lewis. Grain size determination for parallel processing. IEEE Software, pages 23-32, Jan. 1988.

[65] V. Kumar. Algorithms for constraint satisfaction problems: A survey. Technical Report TR-91-28, Dept. of Computer Science, Univ. of Minnesota, 1991.

[66] V. Kumar. Algorithms for constraint satisfaction problems: A survey. The AI Magazine, 13(1):32-44, 1992.

[67] Y.K. Kwok, I. Ahmad, and J. Gu. FAST: A low-complexity algorithm for efficient scheduling of DAGs on parallel processors. In Proc. of Int'l Conference on Parallel Processing, pages II:150-157, Aug. 1996.

[68] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys, editors. The Traveling Salesman Problem. John Wiley & Sons, New York, 1985.

[69] M. Li. Private communications, 1993.

[70] H.-T. Liaw and C.-S. Lin. On the OBDD-representation of general Boolean functions. IEEE Trans. on Computers, C-41(6):661-664, Jun. 1992.

[71] S. Lin. Computer solutions of the traveling salesman problem. Bell Sys. Tech. Journal, 44(10):2245-2269, Dec. 1965.

[72] S. Lin and B.W. Kernighan. An effective heuristic algorithm for the traveling salesman problem. Operations Research, 21:498-516, 1973.

[73] R. Margalef. Perspectives in Ecological Theory. University of Chicago Press, Chicago, 1968.

[74] S. Minton, M.D. Johnston, A.B. Philips, and P. Laird. Solving large-scale constraint satisfaction and scheduling problems using a heuristic repair method. In Proceedings of AAAI'90, pages 17-24, Aug. 1990.
[75] S. Minton, M.D. Johnston, A.B. Philips, and P. Laird. A heuristic repair method for constraint satisfaction and scheduling problems. Artificial Intelligence, 58:161-205, 1992.

[76] H. Morowitz. Energy Flow in Biology. Academic Press, New York, 1968.

[77] J.P. Norback and R.F. Love. Geometric approaches to solving the traveling salesman problem. Management Science, 23:1208-1223, 1977.

[78] J.P. Norback and R.F. Love. Heuristic for the Hamiltonian path problem in Euclidean two space. J. Oper. Res. Soc., 30:363-368, 1979.

[79] I. Or. Traveling Salesman-Type Combinatorial Problems and their Relation to the Logistics of Regional Blood Banking. PhD thesis, Northwestern University, Evanston, IL, 1976.

[80] J.C. Park and C.G. Han. Solving the survivable network design problem with search space smoothing. In Proc. of Conference on Network Optimization, University of Florida, Feb. 1996.

[81] I. Prigogine. From Being to Becoming: Time and Complexity in the Physical Sciences. W.H. Freeman and Company, New York, 1980.

[82] P.W. Purdom and G.N. Haven. Backtracking and probing. Technical Report No. 387, Dept. of Computer Science, Indiana University, Aug. 1993.

[83] R. Puri and J. Gu. An efficient algorithm for computer microword length minimization. IEEE Transactions on CAD, 12(10):1449-1457, Oct. 1993.

[84] R. Puri and J. Gu. An efficient algorithm to search for minimal closed covers in sequential machines. IEEE Transactions on CAD, 12(6):737-745, Jun. 1993.

[85] M. Reichling. A simplified solution of the n queens' problem. Information Processing Letters, 25:253-255, June 1987.

[86] D.J. Rosenkrantz, R.E. Stearns, and P.M. Lewis. An analysis of several heuristics for the traveling salesman problem. SIAM J. on Computing, 6:563-581, 1977.
[87] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, 1995.

[88] L.A. Sanchis. Multi-way network partitioning. IEEE Trans. on Computers, 38:62-81, Jan. 1989.

[89] V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. The MIT Press, 1989.

[90] J. Schneider, M. Dankesreiter, W. Fettes, I. Morgenstern, M. Schmid, and J.M. Singer. Search-space smoothing for combinatorial optimization problems. Physica A, 243:77-112, 1997.

[91] E. Schroedinger. What Is Life? Cambridge University Press, Cambridge, 1944.

[92] D.M. Schuler and E.G. Ulrich. Clustering and linear placement. In Proc. of the 9th Annual Design Automation Workshop, pages 50-56, 1972.

[93] B.M. Schwarzschild. Statistical mechanics algorithm for Monte Carlo optimization. Physics Today, 35:17-19, 1982.

[94] D.G. Schweikert and B.W. Kernighan. A proper model for the partitioning of electrical circuits. In Proc. 9th Design Automation Workshop, pages 57-62, 1972.

[95] C. Sechen and D. Chen. An improved objective function for mincut circuit partitioning. In Proc. International Conference on Computer-Aided Design, pages 502-505, 1988.

[96] B. Selman, H. Levesque, and D. Mitchell. A new method for solving hard satisfiability problems. In Proceedings of AAAI'92, pages 440-446, Jul. 1992.

[97] G.C. Sih and E.A. Lee. A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Transactions on Parallel and Distributed Systems, 4(2):175-187, Feb. 1993.

[98] R. Sosic and J. Gu. How to search for million queens. Technical Report UUCS-TR-88-008, Dept. of Computer Science, Univ. of Utah, Feb. 1988.
[99] R. Sosic and J. Gu. A polynomial time algorithm for the n-queens problem. SIGART Bulletin, 1(3):7-11, Oct. 1990, ACM Press.

[100] R. Sosic and J. Gu. Fast search algorithms for the n-queens problem. IEEE Trans. on Systems, Man, and Cybernetics, SMC-21(6):1572-1576, Nov./Dec. 1991.

[101] R. Sosic and J. Gu. 3,000,000 queens in less than one minute. SIGART Bulletin, 2(2):22-24, Apr. 1991, and IEEE Trans. on Knowledge and Data Engineering, 6(5):661-668, Oct. 1994.

[102] R. Sosic and J. Gu. Efficient local search with conflict minimization. IEEE Trans. on Knowledge and Data Engineering, 6(5):661-668, Oct. 1994.

[103] R. Sosic and J. Gu. Quick n-queen search on VAX and Bobcat machines. CS 547 AI Class Project, Winter Quarter, Feb. 1988.

[104] W.R. Stewart. A computationally efficient heuristic for the traveling salesman problem. In Proc. 13th Annual Meeting of S.E. TIMS, pages 75-85, 1977.

[105] H.S. Stone and J.M. Stone. Efficient search techniques - an empirical study of the n-queens problem. IBM J. Res. Develop., 31(4):464-474, July 1987.

[106] W. Wang and C.K. Rushforth. An adaptive local search algorithm for the channel assignment problem. IEEE Trans. on Vehicular Technology, 45(3):459-466, Aug. 1996.

[107] W. Wang and C.K. Rushforth. Structured partitioning for the channel assignment problem. IEEE Trans. on Vehicular Technology, 1996.

[108] Y.-C. Wei and C.-K. Cheng. Towards efficient hierarchical designs by ratio cut partitioning. In Proc. International Conference on Computer-Aided Design, pages 298-301. IEEE Computer Society Press, 1989.

[109] Y.-C. Wei and C.-K. Cheng. A two-level two-way partitioning algorithm. In Proc. International Conference on Computer-Aided Design, pages 516-519. IEEE Computer Society Press, 1990.

[110] Y.-C. Wei and C.-K. Cheng. Ratio cut partitioning for hierarchical designs. IEEE Transactions on CAD, 10(7):438-446, Jul. 1991.

[111] M.Y. Wu and D.D. Gajski. Hypertool: A programming aid for message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 1(3):330-343, July 1990.

[112] M.Y. Wu, W. Shu, and J. Gu. Efficient local search for DAG scheduling. Submitted for publication, 1996.

[113] T. Yang and A. Gerasoulis. DSC: Scheduling parallel tasks on an unbounded number of processors. IEEE Transactions on Parallel and Distributed Systems, 5(9):951-967, Sept. 1994.
HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)
D.-Z. Du and P.M. Pardalos (Eds.) pp. 543-566
©1998 Kluwer Academic Publishers

The Equitable Coloring of Graphs

Ko-Wei Lih
Institute of Mathematics
Academia Sinica, Taipei, Taiwan 115
E-mail: makwlih@sinica.edu.tw

Contents

1 Introduction 544

2 Bipartite graphs 547

3 Trees 552

4 The Equitable Δ-Coloring Conjecture 555

5 Special families supporting EΔCC 559

6 Miscellaneous results 560

7 Related notions of coloring 561

8 Further research 563

References

Abstract

Let the vertices of a graph G be colored with k colors such that no adjacent vertices receive the same color and the sizes of the color classes differ by at most one. Then G is said to be equitably k-colorable. The equitable chromatic number χ=(G) is the smallest integer k such that G is equitably k-colorable. In this article, we survey recent progress on the equitable coloring of graphs. We pay more attention to work done on the Equitable Δ-Coloring Conjecture. We also discuss related graph coloring notions and their problems. The survey ends with suggestions for further research topics.

keywords: Equitable coloring, Equitable edge coloring, m-bounded coloring, Equalized total coloring

1 Introduction
In the days when graph theory was not christened as such, coloring problems provided main motivations for researchers. Now, as we come toward the end of a millennium, coloring has prospered into a major and active area of graph-theoretic research. This is attested by the publication of the monograph by Jensen and Toft [16]. In this survey, we intend to discuss a restricted version of coloring problems. Roughly speaking, if by coloring nodes of a network model we are assigning resources to the agents, then what concerns us here is how to distribute the resources as evenly as possible.

Network models are naturally formulated in terms of graphs. A graph G = (V, E) consists of a vertex set V(G) and an edge set E(G). All graphs considered in this survey are finite, loopless, and without multiple edges. If the vertices of a graph G can be partitioned into k classes V1, V2, ..., Vk such that each Vi is an independent set (none of its vertices are adjacent), then G is said to be k-colorable and the k classes are called its color classes. Equivalently, we can view a coloring as a function π : V(G) → {1, 2, ..., k} such that adjacent vertices are mapped to distinct numbers; π is called a k-coloring. All pre-images of a fixed number form a color class. The smallest number k such that G is k-colorable is called the chromatic number of G, denoted by χ(G). The graph G is said to be equitably k-colorable if there is a k-coloring whose color classes satisfy the condition ||Vi| − |Vj|| ≤ 1 for all i ≠ j. The smallest integer n for which G is equitably n-colorable is defined to be the equitable chromatic number of G, denoted by χ=(G). This notion of equitable colorability was first introduced in a paper by Walter Meyer [20]. His motivation came from Tucker [22], where vertices represented garbage collection routes and two such vertices were joined when the corresponding routes should not be run on the same day. Meyer thought that it would be desirable to have an approximately equal number of routes run on each of the six days.
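
Stated procedurally, equitable k-colorability is easy to verify for a given coloring; the following small routine (ours, not from the survey) checks both defining conditions.

    from collections import Counter

    def is_equitable(coloring, adj):
        # coloring: dict vertex -> color; adj: dict vertex -> set of neighbors.
        # First check that the coloring is proper.
        if any(coloring[u] == coloring[v] for u in adj for v in adj[u]):
            return False
        # Then check that any two color classes differ in size by at most one.
        sizes = Counter(coloring.values()).values()
        return max(sizes) - min(sizes) <= 1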
Let degG(x), or deg(x) for short, denote the degree of the vertex x in the graph G. Let Δ(G) = max{deg(x) : x ∈ V(G)}. Let ⌈x⌉ and ⌊x⌋ denote, respectively, the smallest integer not less than x and the largest integer not greater than x. The main result obtained by Meyer was that a tree T can be equitably colored with ⌈Δ(T)/2⌉ + 1 colors. Unfortunately, his proof was marred with gaps. According to Guy's report [12], Eggleton could extend Meyer's result to show that a tree T can be equitably colored with k colors, provided k ≥ ⌈Δ(T)/2⌉ + 1. A finer result about trees is the following theorem by Bollobás and Guy [1]. However, a complete determination of when a tree is equitably colorable was left unsolved in their paper.

Theorem 1.1 A tree T on t vertices is equitably 3-colorable if t ≥ 3Δ(T) − 8 or t = 3Δ(T) − 10.

By far the most interesting contribution made in Meyer's paper is to propose the following conjecture.

The Equitable Coloring Conjecture (ECC) Let G be a connected graph. If G is neither a complete graph nor an odd cycle, then χ=(G) ≤ Δ(G).

Probably due to the lack of powerful computers at the time, Meyer was successful in verifying the ECC only for graphs with six or fewer vertices. Apparently the motivation of the ECC came from the following fundamental theorem of Brooks [2].

Theorem 1.2 Let G be a connected graph. If G is neither a complete graph nor an odd cycle, then χ(G) ≤ Δ(G).

One well-known result of Hajnal and Szemerédi [13], when rephrased in terms of the equitable colorability, has already shown the following.

Theorem 1.3 A graph G (not necessarily connected) is equitably k-colorable if k ≥ Δ(G) + 1.

The original proof was long and hard to devour. So far no significant simplification has been obtained. If we let χ=*(G) denote the smallest integer n such that G is equitably k-colorable for all k ≥ n, then an equivalent formulation of Theorem 1.3 is that χ=*(G) ≤ Δ(G) + 1 holds for any graph G. At this point we should alert the reader that there is a strong contrast between the equitable colorability and the ordinary colorability: χ=*(G) may in fact be greater than χ=(G). This will be demonstrated later. It therefore makes sense to introduce the notion of χ=*(G), which will be called the equitable chromatic threshold of G.
In an entirely different context, de Werra [25] treated color sequences and the majorization ordering among them. However, his results have consequences for the equitable colorability. A sequence of nonnegative integers h = (h1, h2, ..., hk) is called a color sequence for a given graph G if the following conditions hold.

(a) h1 ≥ h2 ≥ ... ≥ hk ≥ 0;

(b) there is a k-coloring of G such that the color classes V1, V2, ..., Vk satisfy |Vi| = hi for i = 1, 2, ..., k.

The majorization ordering, also known as the dominance ordering, is a widely used notion in measuring the evenness of distributions. Marshall and Olkin [19] offer a comprehensive treatment of majorization. Let α : a1 ≥ a2 ≥ ... ≥ ak ≥ 0 and β : b1 ≥ b2 ≥ ... ≥ bk ≥ 0 be two sequences of nonnegative integers. We say that α is majorized by β if the following two conditions hold (a computational sketch follows below).

(a) a1 + a2 + ... + aj ≤ b1 + b2 + ... + bj, for j = 1, 2, ..., k − 1;

(b) a1 + a2 + ... + ak = b1 + b2 + ... + bk.
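
Computationally, majorization is a comparison of prefix sums; the sketch below (ours) implements the two conditions verbatim for sequences of equal length already sorted in nonincreasing order.

    from itertools import accumulate

    def majorized_by(a, b):
        # True iff the nonincreasing sequence a is majorized by b:
        # every prefix sum of a is at most that of b, and totals agree.
        pa, pb = list(accumulate(a)), list(accumulate(b))
        return pa[-1] == pb[-1] and all(x <= y for x, y in zip(pa, pb))

For instance, the balanced sequence (2, 2, 2) is majorized by (3, 2, 1); this is the sense in which more even color sequences sit below less even ones.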

Let Km,n denote the complete bipartite graph whose parts are of sizes m and n, respectively. A claw-free graph is a graph containing no K1,3 as an induced subgraph. One of de Werra's results is the following.

Theorem 1.4 Let G be a claw-free graph and h = (h1, h2, ..., hk) be a color sequence of G. Then any sequence of nonnegative integers h' = (h'1, h'2, ..., h'k) is also a color sequence if h' is majorized by h.

Combining Theorem 1.2 and Theorem 1.4, it follows that, for a claw-free graph G, G is equitably k-colorable for all k ≥ χ(G), or equivalently χ=*(G) = χ=(G) = χ(G). Although this fact can be shown directly, it was first implicitly implied in de Werra's paper. We see immediately that the ECC holds for claw-free graphs. Since every line graph is claw-free, the ECC holds for line graphs in particular.
This ends the history of pre-1990 activities on the equitable coloring. In Toft's survey paper [21], the ECC was included as Problem 36.

This survey is organized as follows. Section 1 is largely about background and basic definitions. Section 2 deals with bipartite graphs; the publication of the main results was delayed, and they actually signaled a new wave of activities on the equitable coloring problems. Section 3 completely determines when a tree is equitably colorable. Section 4 introduces the important Equitable Δ-Coloring Conjecture and provides some positive evidence, especially graphs with high degrees and cubic graphs. Section 5 supplies further evidence for the conjecture, such as split graphs, outerplanar graphs, interval graphs, and planar graphs except a few undetermined cases. Section 6 collects miscellaneous results about equitable chromatic numbers and equitable chromatic thresholds, with examples including null graphs, complete partite graphs, square and cross products of complete graphs, and cycles. In Section 7, we discuss three notions of colorability that are related or parallel to the equitable colorability: the equitable edge coloring, the equalized total coloring, and the m-bounded coloring. In addition to the problems and conjectures included in the earlier sections, the final Section 8 summarizes some general suggestions for further research and presents more conjectures and problems.

2 Bipartite graphs

A breakthrough on the equitable coloring was achieved when Lih and Wu [17] settled the ECC for bipartite graphs.

Theorem 2.1 Let G = G(X, Y) be a connected bipartite graph. If G is different from any complete bipartite graph Kn,n, then G can be equitably colored with Δ(G) colors.

Proof. Let |X| = m ≥ n = |Y|, where |X| and |Y| denote the sizes of the parts. Write Δ for Δ(G). If m > n, then X must contain a vertex of degree less than Δ. Since G is connected, it follows that n ≤ (Δ − 2) + (m − 1)(Δ − 1) + 1. Hence ⌈(m + n)/Δ⌉ ≤ m. On the other hand, we have m ≤ n(Δ − 1) + 1. It follows that ⌈(m + n − Δ + 1)/Δ⌉ = ⌊(m + n)/Δ⌋ ≤ ⌊(n(Δ − 1) + 1 + n)/Δ⌋ = n. Since Δ ≥ 2, both ⌈(m + n)/Δ⌉ ≤ m and ⌊(m + n)/Δ⌋ ≤ n still hold when m = n.
It is easy to see the identity m + n = ⌈(m + n)/Δ⌉ + ⌈(m + n − 1)/Δ⌉ + ... + ⌈(m + n − Δ + 1)/Δ⌉. If we can partition X into classes of sizes ⌈(m + n)/Δ⌉, ⌈(m + n − 1)/Δ⌉, ..., ⌈(m + n − k)/Δ⌉ for a certain k, 0 ≤ k < Δ − 1, then we can partition Y into classes of sizes ⌈(m + n − k − 1)/Δ⌉, ..., ⌈(m + n − Δ + 1)/Δ⌉. Thus we have an equitable coloring with Δ colors. Otherwise, we partition X into classes of sizes ⌈(m + n)/Δ⌉, ⌈(m + n − 1)/Δ⌉, ..., ⌈(m + n − k + 1)/Δ⌉, and s; and we partition Y into classes of sizes ⌈(m + n − k)/Δ⌉, ..., ⌈(m + n − Δ + 2)/Δ⌉, and t. The numbers s, t, and k satisfy the conditions 0 < k < Δ − 1, 0 < s < ⌈(m + n − k)/Δ⌉, and s + t = ⌈(m + n − Δ + 1)/Δ⌉.

If the following claim is true, then S ∪ T, regarded as a class, together with the other classes constitutes an equitable coloring of G with Δ colors and the theorem is established.

Claim There exist S ⊆ X and T ⊆ Y such that |S| = s, |T| = t, and S ∪ T is an independent set of vertices.

We first show that the two inequalities m − (t(Δ − 1) + 1) ≤ s − 1 and n − s(Δ − 1) ≤ t − 1 cannot be true simultaneously. Suppose that they could. Then, by adding them together, we have m + n − (s + t)(Δ − 1) − 1 ≤ s + t − 2. It follows that (m + n + 1)/Δ ≤ s + t = ⌈(m + n − Δ + 1)/Δ⌉ = ⌊(m + n)/Δ⌋ ≤ (m + n)/Δ, a contradiction.
Case 1: Let m − (t(Δ − 1) + 1) ≥ s. We use N(V) to denote the set of vertices which are adjacent to at least one vertex in V. Since G is connected, we can successively choose distinct vertices y1, y2, ..., yt in Y such that |N(y_{i+1}) ∩ N(y1, y2, ..., yi)| ≥ 1 for all 1 ≤ i < t. Let T = {y1, y2, ..., yt}. Then |N(T)| ≤ t(Δ − 1) + 1. Since m − (t(Δ − 1) + 1) ≥ s, the set S can be selected and the claim is true.

Case 2: Let m − (t(Δ − 1) + 1) ≤ s − 1 and n − s(Δ − 1) ≥ t. Our claim is true if we can find a subset S of X such that |S| = s and |N(S)| ≤ s(Δ − 1). Since G is connected, we can successively choose distinct vertices x1, x2, ..., xs in X such that |N(x_{i+1}) ∩ N(x1, x2, ..., xi)| ≥ 1 for all 1 ≤ i < s. Unlike Case 1, we now choose x1 to be a vertex of degree less than Δ if there is at least one such vertex. Otherwise we choose an arbitrary vertex as x1. Let S = {x1, x2, ..., xs}.

If we start with deg(x1) < Δ, then |N(S)| ≤ (Δ − 2) + (s − 1)(Δ − 1) + 1 = s(Δ − 1) and we are done. Otherwise the graph must be a regular graph of degree Δ and m = n. The trivial case is when Δ = 2. Then G is an even cycle and χ=(G) = 2. So we let Δ ≥ 3. If in the process of selecting the members of S we actually had |N(x_{i+1}) ∩ N(x1, x2, ..., xi)| ≥ 2 for some i, 1 ≤ i < s, then it is easy to see |N(S)| ≤ s(Δ − 1).
Now suppose |N(x_{i+1}) ∩ N(x1, x2, ..., xi)| = 1 for all 1 ≤ i < s and |N(x) ∩ N(x1, x2, ..., x_{s−1})| ≥ 1 for all x in X − {x1, x2, ..., x_{s−1}}. In this case, |N(S)| = s(Δ − 1) + 1. The claim is true if the stronger inequality n − (s(Δ − 1) + 1) ≥ t holds.

Suppose, on the contrary, n − (s(Δ − 1) + 1) ≤ t − 1. We have already assumed that n − (t(Δ − 1) + 1) = m − (t(Δ − 1) + 1) ≤ s − 1. By adding these two inequalities together, we get 2n/Δ ≤ s + t = ⌊2n/Δ⌋. This implies that Δ divides 2n. From the way of defining s and t, it follows that s = t = n/Δ. Now |X − {x1, x2, ..., x_{s−1}}| = n − s + 1 and |Y − N(x1, x2, ..., x_{s−1})| = n − (s − 1)(Δ − 1) − 1. When we count the number of edges between X − {x1, x2, ..., x_{s−1}} and Y − N(x1, x2, ..., x_{s−1}) in two different ways, it comes out both ≥ (n − s + 1)(Δ − 1) and ≤ (n − (s − 1)(Δ − 1) − 1)Δ. Since s = n/Δ, we have

(n − (s − 1)(Δ − 1) − 1)Δ ≥ (n − s + 1)(Δ − 1),
Δ² − nΔ − 3Δ + 3n − s + 1 ≥ 0,
Δ³ − nΔ² − 3Δ² + 3nΔ − n + Δ ≥ 0,
(Δ − n)(Δ² − 3Δ + 1) ≥ 0.

However, the regular bipartite graph G is not a complete bipartite graph, hence Δ < n. At the same time, Δ ≥ 3 implies that Δ² − 3Δ + 1 is positive. It is impossible to have (Δ − n)(Δ² − 3Δ + 1) ≥ 0. Therefore n − (s(Δ − 1) + 1) ≥ t and the claim is true. □

Theorem 2.2 The complete bipartite graph Kn,n can be equitably colored with k colors if and only if ⌈n/⌊k/2⌋⌉ − ⌊n/⌈k/2⌉⌋ ≤ 1.

The proof for the complete bipartite case is straightforward by considering the appropriate sizes of the color classes. One interesting point to note is that, for k = Δ(Kn,n), the difference involved in the theorem is 0 when n is even and 2 when n is odd. In the proof of Theorem 2.1, we in fact used Δ(G) colors to color G equitably. We can conclude that, except for the complete bipartite graphs K2m+1,2m+1, every connected bipartite graph G can be equitably colored with Δ(G) colors. Now we see that χ(K2m+1,2m+1) = χ=(K2m+1,2m+1) = 2, yet χ=*(K2m+1,2m+1) = 2m + 2. There is a gap between the equitable chromatic number and the equitable chromatic threshold.
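
The criterion of Theorem 2.2 is mechanical to evaluate; the sketch below (the function name is ours) uses integer arithmetic and reproduces the parity dichotomy just noted for k = Δ(Kn,n) = n.

    def knn_equitably_k_colorable(n, k):
        # Theorem 2.2's test: ceil(n/floor(k/2)) - floor(n/ceil(k/2)) <= 1.
        half_down, half_up = k // 2, (k + 1) // 2      # floor(k/2), ceil(k/2)
        return -(-n // half_down) - (n // half_up) <= 1  # integer ceiling trick

    # K_{4,4} with k = 4: difference 2 - 2 = 0, so equitably colorable;
    # K_{5,5} with k = 5: difference 3 - 1 = 2, matching the exceptional
    # role of K_{2m+1,2m+1} noted above.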
In many cases, the equitable chromatic number is far below the maximum degree. If we impose additional constraints on the graph, we can arrive at a better bound for the equitable chromatic number. The following is a result of this type.

Theorem 2.3 Let G = G(X, Y) be a connected bipartite graph with ε edges. Suppose |X| = m ≥ n = |Y| and ε < ⌊m/(n + 1)⌋(m − n) + 2m. Then χ=(G) ≤ ⌈m/(n + 1)⌉ + 1.

Proof. Let q = ⌊m/(n + 1)⌋. Hence m = q(n + 1) + r for some r such that 0 ≤ r < n + 1. Then ⌈m/(n + 1)⌉ + 1 = q + ⌈r/(n + 1)⌉ + 1, which is q + 1 if r = 0 and is q + 2 if 0 < r < n + 1.

If m = n, then χ=(G) = 2 = ⌈m/(n + 1)⌉ + 1. If m > n, then it is easy to see that χ=(G) ≤ q + 1 when r = 0 and χ=(G) ≤ q + 2 when r = n. So the theorem holds.

It remains to prove the case when m > n > r > 0. Under these circumstances, we have q ≥ 1 and we want to color G with q + 2 colors.

We first search for numbers k and t such that 0 ≤ k < n, 0 ≤ t ≤ q + 1, and m + k elements can be partitioned into t classes of size n − k and q + 1 − t classes of size n − k + 1. Let k = ⌊(n − r + 1)/(q + 2)⌋, hence n − r + 1 = k(q + 2) + t for some t such that 0 ≤ t ≤ q + 1. Since 0 ≤ k ≤ (n − r + 1)/(q + 2) ≤ n/(q + 2) ≤ n/3 < n, the number k falls into the desired range. Furthermore,

(m + k) − t(n − k) − (q + 1 − t)(n − k + 1)
= q(n + 1) + r + k − t(n − k) − (q + 1 − t)(n − k + 1)
= k(q + 2) + t − (n − r + 1) = 0.

Therefore k and t satisfy our requirements.

Case 1: Suppose that the number k is equal to 0. We may partition X into q + 1 − t classes of size n + 1 and t classes of size n. Regard Y as a single class. We then color each class with a distinct color. Thus G is equitably colored with q + 2 colors.

Case 2: Suppose that the number k is different from 0. We claim that there exist A ⊆ X and B ⊆ Y such that |A| = n − 2k + 1, |B| = k, and A ∪ B is an independent set of vertices of size n − k + 1. When t = 0, we color the vertices in A ∪ B with the same color. According to the choices of k and t, we may then partition X − A into q classes of size n − k + 1. When t > 0, we delete one element from A to get A' and color the vertices in A' ∪ B with the same color. We then partition X − A' into q + 1 − t classes of size n − k + 1 and t − 1 classes of size n − k. In both cases, we always let Y − B be a single class of size n − k. Then G is equitably colored with q + 2 colors.

To prove the claim, let u = ⌊n/k⌋ and hence n = uk + v for some v such that 0 ≤ v < k.
Let Y = {y1, y2, ..., yn} be such that deg(y1) ≥ deg(y2) ≥ ... ≥ deg(yn). If v > 0, then we define V = {y1, y2, ..., yv}. Let w(S) denote the number of edges which have at least one end vertex in the set S. Then w(V) ≥ 2v if V contains no vertex of degree 1. On the other hand, Y − V contains only vertices of degree 1 if V contains at least one such vertex. In this case, w(V) = ε − (n − v) ≥ (m + n − 1) − (n − v) ≥ m − 1 + v ≥ n + v > 2v. Moreover, we just let V be the empty set if v = 0. Thus in all cases we have w(V) ≥ 2v. Next we partition Y − V into Y1, Y2, ..., Yu such that |Yi| = k for i = 1, 2, ..., u.

Subcase 1: If w(Yi) < m − n + 2k for a certain Yi, then X contains at least m − (m − n + 2k − 1) = n − 2k + 1 vertices which are independent of Yi. So we are able to choose the required sets A and B.

Subcase 2: Suppose w(Yi) ≥ m − n + 2k for i = 1, 2, ..., u. Then

ε = w(V) + w(Y1) + ... + w(Yu) ≥ 2v + u(m − n + 2k)
= 2v + u(m − n) + 2n − 2v = u(m − n) + 2n,

i.e., ε ≥ ⌊n/k⌋(m − n) + 2n. However, ⌊n/k⌋ ≥ ⌊n(q + 2)/(n − r + 1)⌋ ≥ q + 2. It follows that ε ≥ (q + 2)(m − n) + 2n = q(m − n) + 2m = ⌊m/(n + 1)⌋(m − n) + 2m. This contradicts the assumption of the theorem. □
The bound for the equitable chromatic number in the above theorem is indeed better than Δ(G) when there are at least two edges. B.-L. Chen, in a personal communication, made the following conjecture and proved its validity when the maximum degree is at least 53. It is also trivial to see that the conjecture holds for complete bipartite graphs. The Meyer-Eggleton result about trees gives further evidence for the conjecture.

Conjecture 2.4 Let G be a connected bipartite graph. Then χ=(G) ≤ ⌊(Δ(G) + 3)/2⌋.

There are some miscellaneous results on the equitable coloring of general bipartite graphs. For instance, Lin in his Master's thesis [18] proved the following.

Theorem 2.5 Let G(A, B) be a bipartite graph. Let |G| abbreviate |V(G)| and ΔG(B) denote the maximum degree of a vertex in part B. If |B| < ⌊|G|/k⌋ and ΔG(B) ≤ r(k − r), where k − ⌊|G|/k⌋ ≤ r ≤ k, then χ=*(G) ≤ k.

Theorem 2.6 Let G(A, B) be a bipartite graph. Then χ=*(G) ≤ 2k − 2 if one of the following two conditions is true.

(1) |G| ≥ k² and ΔG(B) ≤ ⌈k/2⌉⌊k/2⌋, or

(2) |G| < k² and ΔG(B) ≤ ⌈|G|/2k⌉⌊k − |G|/2k⌋.

B.-L. Chen, in a personal communication, established the following.

Theorem 2.7 A bipartite graph G with Δ(G) ≥ 2 is equitably Δ(G)-colorable if and only if G is different from all K2m+1,2m+1.

3 Trees

A graph is non-trivial if it contains at least one edge. There is a natural way to regard a non-trivial tree T as a bipartite graph T(X, Y). The technique we used to prove the ECC for connected bipartite graphs can be applied to find the equitable chromatic number of a non-trivial tree when the two parts differ in size by at most one. We are trying to cut the parts into classes of nearly equal size. If there are vertices remaining, then we manage to find nonadjacent vertices in the opposite part to form a class of the right size. In Chen and Lih [5], the following was established.

Theorem 3.1 Let T = T(X, Y) be a non-trivial tree satisfying ||X| − |Y|| ≤ 1. Then χ=(T) = χ=*(T) = 2.
It is harder to determine the equitable chromatic number of a tree if the sizes of the parts have a larger difference. Let the graph G be equitably colored with χ=(G) colors. Let the arbitrarily chosen vertex v be colored with, say, color 1. The total number of vertices colored with color 1 is ≤ α(G − N(v)). The notation α(S) denotes the independence number of the set S. Since this is an equitable coloring, the number of vertices colored with any fixed color i ≠ 1 is ≤ α(G − N(v)) + 1. Hence χ=(G) ≥ ⌈(|G| + 1)/(α(G − N(v)) + 1)⌉. On the other hand, it is obvious that, if G = G(X, Y) is a connected bipartite graph, then a necessary and sufficient condition for χ=(G) = 2 is ||X| − |Y|| ≤ 1. Putting these facts together, we have obtained the necessity of the following theorem. A major vertex means a vertex whose degree is equal to Δ(G).

Theorem 3.2 Let T = T(X, Y) be a tree satisfying ||X| − |Y|| > 1. Then χ=(T) = χ=*(T) = max{3, ⌈(|T| + 1)/(α(T − N(v)) + 1)⌉}, where v is an arbitrary major vertex.
Proof. To prove the sufficiency, we write Δ for Δ(T) and t for |T|.

Case 1: Suppose Δ > (t + 3)/3. Fix a major vertex v. Let k be any integer ≥ max{3, ⌈(t + 1)/(α(T − N(v)) + 1)⌉}. It follows that α(T − N(v)) ≥ (t + 1 − k)/k. Since α(T − N(v)) is an integer, we have α(T − N(v)) ≥ ⌈(t + 1 − k)/k⌉ = ⌊t/k⌋. Thus we may choose an independent set S0 such that |S0| = ⌊t/k⌋ and v is included in S0. Since v is a major vertex, the number of components in T − S0 is > (t + 3)/3. It is straightforward to check that (t + 3)/3 ≥ (t + 3)/k ≥ (t − ⌊t/k⌋)/(k − 1) = |T − S0|/(k − 1). Here we need the following fact, whose proof is omitted.

Let G = G(X, Y) be a bipartite graph with g vertices. If G has r components and r ≥ g/f for some positive integer f, then G is equitably f-colorable.

Using the above fact, T − S0 can be equitably partitioned into k − 1 independent sets S1, S2, ..., S_{k−1}. The size of each Si, 1 ≤ i ≤ k − 1, is either ⌊(t − ⌊t/k⌋)/(k − 1)⌋ or ⌈(t − ⌊t/k⌋)/(k − 1)⌉. Now let t = qk + r for some q and r such that 0 ≤ r < k. We see that (t − ⌊t/k⌋)/(k − 1) = (t − q)/(k − 1) and q ≤ (t − q)/(k − 1) ≤ q + 1. These facts imply that the size of each Si, 0 ≤ i ≤ k − 1, is either ⌊t/k⌋ or ⌈t/k⌉. Therefore T is equitably colored with k colors.
Case 2: Suppose Δ ≤ (t + 3)/3. In this case, we are going to prove the sufficiency by induction on k. The basis of our induction is Theorem 1.1, which asserts that any tree T is equitably 3-colorable if Δ(T) ≤ (t + 3)/3. Now suppose that T is equitably (k − 1)-colorable for any k ≥ 4 whenever the tree T(X, Y) satisfies the conditions Δ(T) ≤ (t + 3)/3 and ||X| − |Y|| > 1. We want to show that T is equitably k-colorable. Note that max{3, ⌈(t + 1)/(α(T − N(v)) + 1)⌉} = 3 for such a tree and an arbitrary major vertex v.

If ⌊t/k⌋ = 0, then it is trivial that T is equitably k-colorable. If ⌊t/k⌋ = 1, we let t = k + r for some 0 ≤ r < k. By the induction hypothesis, T is equitably (k − 1)-colorable. If r < k − 1, then each color class is of size 1 or 2 and at least one color class is of size 2. T is equitably colored with k colors if we recolor one vertex in that class of size 2 with a new color. If r = k − 1, then all color classes are of size 2 except one which is of size 3. Again T is equitably colored with k colors if we recolor one vertex in that class of size 3 with a new color.

From now on, we assume that ⌊t/k⌋ ≥ 2. Our first step is to find an independent set S0 such that |S0| = ⌊t/k⌋ and Δ(T − S0) ≤ (t − ⌊t/k⌋ + 3)/3 = (|T − S0| + 3)/3. Now consider the set M = {v ∈ T : degT(v) > (t − ⌊t/k⌋ + 3)/3}. Every vertex of M is adjacent to more than ⌊t/k⌋ vertices.

We note that there are at most three vertices in M. If there were at least four vertices in M, then, by counting edges incident to these vertices, we see that the number is ≥ (4/3)(t − ⌊t/k⌋ + 3) − 3 = (t + 1) + (t − 4⌊t/k⌋)/3 ≥ t + 1 since k ≥ 4. But the lower bound would already exceed the total number of edges of T.
(i) M is empty. Since k > ⌈(t + 1)/(α(T − N(v)) + 1)⌉ for any chosen major vertex v, we may repeat the argument used at the beginning of Case 1 to obtain an independent set S0 of size ⌊t/k⌋.

(ii) M = {v1}. Let S0 consist of ⌊t/k⌋ vertices adjacent to v1.

(iii) M = {v1, v2}. Let S0 consist of the vertex v2 and ⌊t/k⌋ − 1 other vertices which are adjacent to v1 and independent of v2.

(iv) M = {v1, v2, v3}. We may suppose that the pair v2 and v3 are not adjacent. Let S0 consist of the vertices v2, v3 and ⌊t/k⌋ − 2 other vertices which are adjacent to v1 and independent of v2 and v3.

In all cases, the degree of any vertex ≠ v1 in T − S0 is ≤ (t − ⌊t/k⌋ + 3)/3. In cases (ii) and (iii), the degree of v1 in T − S0 is ≤ Δ(T) − (⌊t/k⌋ − 1) ≤ (t + 3)/3 − ⌊t/k⌋ + 1 = (t − ⌊t/k⌋ + 3)/3 + (3 − 2⌊t/k⌋)/3 ≤ (t − ⌊t/k⌋ + 3)/3 since ⌊t/k⌋ ≥ 2. In case (iv), if ⌊t/k⌋ = 2, then the number of edges incident to the set M is > (3/3)(t − ⌊t/k⌋ + 3) − 2 = t − 1 since v2 and v3 were assumed to be nonadjacent. However, T cannot contain more than t − 1 edges. Therefore ⌊t/k⌋ ≥ 3. Then the degree of v1 in T − S0 is ≤ Δ(T) − (⌊t/k⌋ − 2) ≤ (t + 3)/3 − ⌊t/k⌋ + 2 = (t − ⌊t/k⌋ + 3)/3 + (6 − 2⌊t/k⌋)/3 ≤ (t − ⌊t/k⌋ + 3)/3.

In summary, Δ(T − S0) ≤ (t − ⌊t/k⌋ + 3)/3 = (|T − S0| + 3)/3.
In the second step, we observe that T − S0 may be a forest. Let the component trees of T − S0 be ordered into the sequence T1, T2, ..., Th, h ≥ 1. Each Ti has at least two distinct leaves wi and zi if it is not just a single vertex. In the latter case, we let wi = zi be the single vertex of Ti. If every Ti is either a single vertex or an edge, then T − S0 can be equitably partitioned into k − 1 independent sets S1, S2, ..., S_{k−1} easily. Otherwise, we add new edges to connect z1 with w2, z2 with w3, ..., z_{h−1} with wh such that a new tree T* is constructed and Δ(T*) = Δ(T − S0). Thus the tree T* satisfies the condition Δ(T*) ≤ (|T*| + 3)/3. Regard T* as a bipartite graph T*(X*, Y*). If ||X*| − |Y*|| ≤ 1, T* is equitably (k − 1)-colorable by Theorem 3.1. If ||X*| − |Y*|| > 1, T* is equitably (k − 1)-colorable by our induction hypothesis. In both cases, it follows at once that T − S0 can be equitably partitioned into k − 1 independent sets S1, S2, ..., S_{k−1}. The same argument used in Case 1 will show that S0, S1, S2, ..., S_{k−1} form an equitable partition of T into k independent sets. □
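
As a quick sanity check of the formula in Theorem 3.2 (our example, not from the original paper), consider the star K1,s with s ≥ 3 leaves, so that t = |T| = s + 1, the parts differ in size by s − 1 > 1, and the center v is the unique major vertex. Here T − N(v) = {v}, so α(T − N(v)) = 1 and the formula gives χ=(T) = max{3, ⌈(s + 2)/2⌉}. This agrees with a direct count: the center is adjacent to every leaf and must form a color class by itself, so every other class contains at most two leaves, and ⌈s/2⌉ + 1 = ⌈(s + 2)/2⌉ classes are both necessary and sufficient (for s ≥ 5 this exceeds 3, so the maximum is attained by the ceiling term).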

4 The Equitable Δ-Coloring Conjecture

In contrast to the ordinary proper coloring of a graph, the equitable coloring does not possess monotonicity; namely, a graph can be equitably k-colorable without being equitably (k + 1)-colorable. Therefore the ECC does not fully reveal the nature of the equitable coloring. It seems that the maximum degree plays a crucial role here. After examining relevant evidence, Chen, Lih and Wu [6] proposed the following.

The Equitable Δ-Coloring Conjecture (EΔCC) Let G be a connected graph. If G is not a complete graph, or an odd cycle, or a complete bipartite graph K2m+1,2m+1, then G is equitably Δ(G)-colorable.

The conclusion of the EΔCC can be equivalently stated as χ=*(G) ≤ Δ(G). It is also immediate to see that the EΔCC implies the ECC. In the reverse direction, if the ECC is true, so is the EΔCC for non-regular graphs. In Chen, Lih and Wu [6], the EΔCC was settled for graphs whose maximum degree is at least one-half of the order. The following two lemmas supplied the basic tools for the solution. We use Gᶜ to denote the complement graph of G; δ(G) and α'(G) denote the minimum degree and the edge-independence number of G, respectively.

Theorem 4.1 Let G be a disconnected graph. If G is different from (Km)ᶜ and (K2m+1,2m+1)ᶜ for all m ≥ 1, then α'(G) > δ(G).

Theorem 4.2 Let G be a connected graph such that |G| > 2δ(G) + 1. Suppose the vertex set of G cannot be partitioned into a set H of size δ(G) and an independent set I of size |G| − δ(G) such that each vertex of I is adjacent to all vertices of H. Then α'(G) > δ(G).

Theorem 4.3 Let G be a connected graph with Δ(G) ≥ |G|/2. If G is different from Km and K2m+1,2m+1 for all m ≥ 1, then G is equitably Δ(G)-colorable.

As pointed out by Yap, a close examination of the proof of Theorem 4.3 in Chen, Lih and Wu [6] reveals that a stronger result was obtained, namely, χ=*(G) ≤ |G| − α'(Gᶜ) ≤ Δ(G). In a similar vein, Yap and Zhang
[26] made an analysis of the complement graph and succeeded in extending the above result. Let r be the maximum number of vertex-disjoint triangles in Gᶜ, let s be the maximum, over all families of r vertex-disjoint triangles {a1, b1, c1}, ..., {ar, br, cr} in Gᶜ, of the number of independent edges in Gᶜ − ({a1, b1, c1} ∪ ... ∪ {ar, br, cr}), and let t = |G| − 3r − 2s. Yap and Zhang proved the following.
Theorem 4.4 Let G be a connected graph such that (IGI /3) + 1 ::; ~(G) <
IGI/2. Then X~(G) ::; r + 8+ t::; ~(G).
Theorems 4.3 and 4.4 together establish the EΔCC for graphs with large maximum degree.

Theorem 4.5 The EΔCC holds for all connected graphs G such that Δ(G) ≥ (|G|/3) + 1.
Along a different direction, we may try to tackle the EΔCC for special classes of graphs. By attaching appropriately chosen auxiliary graphs to a non-regular graph, we can restrict our attention to regular graphs. An equitable coloring of the extended regular graph will induce an equitable coloring of the original non-regular graph. Chen, Lih and Wu [6] followed this strategy to show the next reduction result.

Theorem 4.6 The EΔCC holds if it does so for all regular graphs.

The first non-trivial case to investigate is cubic graphs, i.e., regular graphs of degree 3. If the chromatic number of a connected cubic graph G is 2, then the EΔCC has already been established. We are going to show the following.

Theorem 4.7 Let G be a connected cubic graph whose chromatic number is 3. Then χ=(G) = 3.

We need to introduce some notation. For a bipartite graph G(X, Y), its components are usually expressed as G(X', Y'). Let A and B be subsets of vertices of a graph G. For a nonnegative integer k, we use A_kB to denote the set {x ∈ A : x is adjacent to exactly k vertices of B}. When all vertices in A (and in B) are colored with the same color, say red (respectively, blue), A ⇔ B means that we change the color of the vertices in A into blue and the color of the vertices in B into red. A one-way arrow A ⇐ B means that we change the color of B into the color of A. We write A ⇐ x when B = {x}. We also need the following easy technical lemma.
Lemma 4.8 Let G(X, Y) be a connected bipartite graph such that |X| = m ≥ n = |Y| and Δ(G(X, Y)) ≤ 3. If |Y_3X| = t, then m − n ≤ t + 1.

Proof. (Theorem 4.7) Let G be properly colored with 3 colors. The difference between the size of the largest and the size of the smallest color classes is called the width of the coloring. The coloring is equitable if the width is 0 or 1. We are going to show by constructive steps that, whenever the width of a coloring is at least 2, we can always recolor some of the vertices to decrease the width.
Let us start with the three color classes A, B and C of sizes a, b and c, respectively, such that a ≥ b ≥ c and a − c ≥ 2.
(1) If there is a vertex x ∈ A_3B, then C ⇐ x will do. So from now on we assume that A_3B is empty.
(2) Suppose that C_3A is empty. Consider the bipartite subgraph G(A, C) induced by A and C. Since a ≥ c + 2 and A_3B is empty, there must exist a component G(A', C') of G(A, C) such that |A'| > |C'|. We have 0 < |A'| − |C'| ≤ 1 by Lemma 4.8. Hence |A'| = |C'| + 1 and A' ⇔ C' will decrease the width.
(3) Suppose that B_3A is empty. Let |C_3B| = t. Similar to step (2), there must exist a component G(A', C') of G(A, C) such that |A'| > |C'|. If |C'_3A'| ≤ t, then we have |A'| − |C'| equal to some t' ≤ t + 1 by Lemma 4.8. Choose a subset S ⊆ C_3B such that |S| = t' − 1. Then A' ⇔ C' ∪ S will decrease the width since C' and S are disjoint.
If |C'_3A'| ≥ t + 1, then we consider the bipartite subgraph G(B, C) induced by B and C. We want to do some recoloring so that B_3A is nonempty afterwards. If b = c, then B ⇔ C suffices. Otherwise, since |B| > |C| and B_3A is empty, there must exist a component G(B'', C'') of G(B, C) such that |B''| > |C''|. Since |C''_3B''| ≤ |C_3B| = t, we have |B''| − |C''| equal to some t'' ≤ t + 1. Choose a subset S'' ⊆ C_3A such that |S''| = t''. Then B'' ⇔ C'' ∪ S'' will do since C'' and S'' are disjoint.
(4) Now C_3A and B_3A are both assumed to be nonempty, while A_3B remains empty.
If a = b, then C ⇐ x will decrease the width for any vertex x ∈ B_3A. Let a > b. If x ∈ B_3A and y ∈ A_3C, then B ⇐ y and C ⇐ x will decrease the width. So we may assume that A_3C is empty. Then there must exist a component G(A', B') of G(A, B) such that |A'| > |B'|.
Let B'_3A' be empty and x ∈ B_3A. It follows from Lemma 4.8 that |A'| = |B'| + 1. Then C ⇐ x and A' ⇔ B' will decrease the width.
We may henceforth suppose that B'_3A' is nonempty. If there is some x ∈ B'_3A' such that one of its neighbors y satisfies y ∈ A'_1B', then B ⇐ y and C ⇐ x will decrease the width. Therefore, we may further assume that x ∈ B'_3A' and y ∈ N(x) imply y ∈ A'_2B' for any x and y.
Suppose that there are distinct x, y ∈ B'_3A' having intersecting neighborhoods. If N(x) = N(y), we first do an exchange B ⇐ z and C ⇐ y for any z ∈ C_3A. Then N(x) ≠ N(z), since otherwise N(x) = N(y) = N(z) would force G = K_{3,3}. This is impossible since the chromatic number of G was assumed to be 3. So we may suppose that N(x) ≠ N(y). Let us choose w ∈ C_3A and keep it fixed.
Now let u ∈ N(x) ∩ N(y). If u is not adjacent to w, then B ⇐ u, B ⇐ w and C ⇐ {x, y} will decrease the width. If u ∈ N(w) but there is a third z ∈ B'_3A', then u is not adjacent to z and B ⇐ w, C ⇐ u and C ⇐ z will decrease the width.
It remains to handle two more cases: (i) every vertex in N(x) ∩ N(y) is adjacent to w and B'_3A' = {x, y}; (ii) no two distinct x, y ∈ B'_3A' have intersecting neighborhoods. In the former case, |N(x) ∪ N(y)| ≥ 4 and 2 ≥ |N(x) ∩ N(y)| ≥ 1. From G(A', B') delete B'_3A' and all those vertices which are adjacent to two vertices of B'_3A'. Then each remaining vertex will have degree 1 or 2. Hence the resulting graph G'(A', B') decomposes into maximal paths. If all paths have initial and terminal vertices in different parts, then |B'| > |A'|, which is a contradiction. Without loss of generality, let u_0, v_1, u_2, v_3, ..., u_{2m} be a maximal path of G'(A', B') such that u_0 is adjacent to x ∈ B'_3A' and u_{2m} ∈ A'. If it is not the case that u_{2m} ∈ N(y) for some y ∈ B'_3A' and y ≠ x, then C ⇐ x together with A'' ⇔ B'' will decrease the width, where A'' = {u_0, u_2, ..., u_{2m}} ⊆ A' and B'' = {v_1, v_3, ..., v_{2m-1}} ⊆ B'. Suppose, on the other hand, that u_0 ∈ N(x) − N(y) and u_{2m} ∈ N(y) − N(x) for distinct x and y in B'_3A'. If w is adjacent to some v ∈ A'', then v is not adjacent to at least one of x and y, say y. Hence B ⇐ w and C ⇐ {v, y} will decrease the width. Otherwise, w is independent of A''. Then B ⇐ w, C ⇐ {x, y} and A'' ⇔ B'' will decrease the width.
We finally conclude that the width has been decreased in all cases and the proof is complete. □

Corollary 4.9 The EΔCC holds for all connected graphs G such that Δ(G) ≤ 3.
Our method for handling cubic graphs does not generalize to regular graphs of higher degrees.
5 Special families supporting the EΔCC

There are interesting results dealing with special families of graphs that provide positive evidence for the EΔCC. We begin with the family of split graphs. A connected graph G is called a split graph if its vertex set can be partitioned into two nonempty subsets U = {u_1, u_2, ..., u_n} and V = {v_1, v_2, ..., v_r} such that U induces a complete graph and V induces an independent set. We denote the split graph G as G[U; V] and always assume that no vertex in V is adjacent to all vertices in U. We assign a family of bipartite graphs B_G(k), k ≥ 1, to the given split graph G[U; V] in the following way. The vertex set of B_G(k) is {u_ij : 1 ≤ i ≤ n and 1 ≤ j ≤ k} ∪ V, and {u_ij, v_t} is defined to be an edge of B_G(k) if and only if u_i and v_t are non-adjacent in G. Note that B_G(k) is a subgraph of B_G(k + 1). The coloring of a split graph G[U; V] is closely related to independent edges of the graphs B_G(k). For instance, any given set of independent edges in B_G(k) induces a partial coloring of G in the following standard way. We use the ith color to color u_i and all those vertices in V that are matched by the edges to some u_ij, 1 ≤ j ≤ k. Chen, Ko and Lih [3] proved the following.

Theorem 5.1 Let G[U; V] be a split graph such that |U| = n and |V| = r. Let m = max{k : α'(B_G(k)) = kn} if the set in question is nonempty; otherwise let m be zero. Then χ=*(G[U; V]) = n + ⌈(r − α'(B_G(m + 1)))/(m + 2)⌉.
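To see the quantities of Theorem 5.1 in action, here is a small computational sketch (ours, not from [3]; all function names are invented). It builds B_G(k) from the non-edges between U and V, computes α' with the standard augmenting-path algorithm for bipartite matching, and evaluates the formula on a toy split graph.

# Sketch only: evaluating the formula of Theorem 5.1 on a small split graph.
# G[U;V] is encoded by n = |U|, r = |V|, and the set of non-adjacent pairs
# (i, t), meaning u_i and v_t are not joined by an edge of G.
from math import ceil

def max_matching(num_left, num_right, adj):
    """Size of a maximum matching of a bipartite graph (Kuhn's augmenting
    paths); adj[l] lists the right-side vertices adjacent to left vertex l."""
    match_right = [-1] * num_right
    def augment(l, seen):
        for rt in adj[l]:
            if rt not in seen:
                seen.add(rt)
                if match_right[rt] == -1 or augment(match_right[rt], seen):
                    match_right[rt] = l
                    return True
        return False
    return sum(augment(l, set()) for l in range(num_left))

def alpha_prime(k, n, r, non_adj):
    """alpha'(B_G(k)): the left side consists of k copies u_i1, ..., u_ik of
    each u_i; a copy is joined to v_t exactly when u_i v_t is a non-edge."""
    adj = [[t for t in range(r) if (i, t) in non_adj]
           for i in range(n) for _ in range(k)]
    return max_matching(n * k, r, adj)

def equitable_threshold(n, r, non_adj):
    m = 0                       # m = max{k : alpha'(B_G(k)) = kn}, else 0;
    for k in range(1, r + 1):   # k <= r suffices since alpha' never exceeds r
        if alpha_prime(k, n, r, non_adj) == k * n:
            m = k
    return n + ceil((r - alpha_prime(m + 1, n, r, non_adj)) / (m + 2))

# Example: U = {u_0, u_1}, V = {v_0, ..., v_3}, with u_0 ~ v_0 and u_1 ~ v_1
# the only U-V edges (so no vertex of V sees all of U, as required).
non_adj = {(0, 1), (0, 2), (0, 3), (1, 0), (1, 2), (1, 3)}
print(equitable_threshold(2, 4, non_adj))   # prints 2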

Once χ=*(G[U; V]) is known, it is straightforward to verify the EΔCC for split graphs.
A graph is planar if it can be drawn on the Euclidean plane so that edges only meet each other at points representing the vertices of the graph. An outerplanar graph is a planar graph that has a drawing on the plane such that every vertex lies on the unbounded face. Zhang and Yap [28] proved the following.

Theorem 5.2 A planar graph G is equitably Δ(G)-colorable if Δ(G) ≥ 13.

In an earlier paper, Yap and Zhang [27] gave an elegant proof to settle
the case for outerplanar graphs.

Theorem 5.3 Let G be an outerplanar graph having Δ(G) ≥ 3. Then G is equitably Δ(G)-colorable.

They also posed a question which is similar to Conjecture 2.4.


Question 5.4 Is it true that χ=(G) ≤ ⌊(Δ(G) + 3)/2⌋ for an outerplanar graph G having Δ(G) ≥ 3?

A graph G is called an interval graph if finite closed intervals on the real line can be assigned to the vertices such that two vertices are adjacent if and only if their corresponding intervals intersect. By exploiting an appropriate ordering of the vertices of an interval graph, Chen, Lih and Yan in an unpublished note [8] can show that the EΔCC holds for interval graphs and that the equitable chromatic number of an interval graph coincides with its equitable chromatic threshold.

6 Miscellaneous results
We start with the simplest graph O_n, a set of n independent vertices. Then O_n is equitably k-colorable with color classes of size ⌊x⌋ or ⌈x⌉ if and only if ⌊x⌋ ≤ ⌊n/k⌋ ≤ ⌈n/k⌉ ≤ ⌈x⌉; or equivalently, if and only if ⌈n/⌈x⌉⌉ ≤ k ≤ ⌊n/⌊x⌋⌋.
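As a sanity check of the condition just stated, the following brute-force sketch (ours, not from the chapter) verifies the second form of the equivalence for O_n on all small instances, reading "equitably k-colorable with classes of size ⌊x⌋ or ⌈x⌉" as "n is a sum of k terms, each ⌊x⌋ or ⌈x⌉".

# Sketch only: checking the stated equivalence for O_n by brute force.
from math import floor, ceil

def splits(n, k, x):
    """Can n be written as a sum of k terms, each floor(x) or ceil(x)?"""
    lo, hi = floor(x), ceil(x)
    return any(a * lo + (k - a) * hi == n for a in range(k + 1))

for n in range(1, 25):
    for k in range(1, n + 1):
        for x in (t / 4 for t in range(4, 4 * n + 1)):   # x runs over [1, n]
            assert splits(n, k, x) == (ceil(n / ceil(x)) <= k <= floor(n / floor(x)))
print("equivalence verified on all small cases")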
Let K_{n_1,n_2,...,n_t} denote the complete t-partite graph whose parts have sizes n_1, n_2, ..., n_t. Chen and Wu [9] proved the following two theorems.

Theorem 6.1 χ=(K_{n_1,n_2,...,n_t}) = Σ_{i=1}^{t} ⌈n_i/h⌉, where h = max{k : n_i ≥ ⌈n_i/k⌉(k − 1) for all i}.

Theorem 6.2 χ=*(K_{n_1,n_2,...,n_t}) = Σ_{i=1}^{t} ⌈n_i/h⌉, where h = min{k : there is i such that n_i < ⌈n_i/(k + 1)⌉ k, or there are n_i and n_j, i ≠ j, such that k divides neither n_i nor n_j}.

Take two graphs G_1 and G_2. We can use the Cartesian product of the vertex sets V(G_1) × V(G_2) as the vertex set of a new graph. There are several ways to define the edge set of a product graph. We introduce two products, the square product G_1□G_2 and the cross product G_1 × G_2. They are also known as the Cartesian and the direct products, respectively. The edge sets are defined as follows.
E(G_1□G_2) = {(u, x)(v, y) : (u = v and xy ∈ E(G_2)) or (x = y and uv ∈ E(G_1))},
E(G_1 × G_2) = {(u, x)(v, y) : uv ∈ E(G_1) and xy ∈ E(G_2)}.
The following results concerning products are included in a note by Chen, Lih and Yan [7].

Theorem 6.3 If G_1 and G_2 are equitably k-colorable, so is G_1□G_2.

Since χ=*(G_1) ≤ Δ(G_1) + 1 and χ=*(G_2) ≤ Δ(G_2) + 1, we have χ=*(G_1□G_2) ≤ max{Δ(G_1) + 1, Δ(G_2) + 1}. When both G_1 and G_2 are non-trivial graphs, Δ(G_1□G_2) ≥ max{Δ(G_1) + 1, Δ(G_2) + 1}. Therefore G_1□G_2 is equitably Δ(G_1□G_2)-colorable.

Theorem 6.5 (1) χ=(K_n × K_m) = min{n, m}.
(2) χ=(C_n × C_m) = χ=*(C_n × C_m) = 2 if nm is even, and 3 otherwise.

7 Related notions of coloring


It is natural to ponder whether there is an edge version of equitable colorability. A fundamental work was done by Hilton and de Werra [15]. An edge coloring of a graph G is simply an assignment of colors to the edges of G. Given an edge coloring of G with k colors c_1, c_2, ..., c_k, for a vertex v, let c_i(v) denote the set of edges incident with v colored c_i. This edge coloring is said to be equitable if, for each vertex v, ||c_i(v)| − |c_j(v)|| ≤ 1 when 1 ≤ i < j ≤ k.
The following theorem was proved in Hilton and de Werra [15].

Theorem 7.1 If k ≥ 2 does not divide the degree of any vertex, then the graph has an equitable edge coloring with k colors.

This theorem reduces to Vizing's well-known theorem [23] on proper edge coloring when k = Δ(G) + 1. It implies Gupta's theorem [11] when k = δ(G) + 1. Hilton and de Werra made the observation at the end of their paper that the colorings can be equalized in the sense that all color classes have almost the same number of edges.
The notion of an equalized coloring can be further extended to involve the total coloring of a graph. A total k-coloring π of a graph G is a mapping π : V(G) ∪ E(G) → {1, 2, ..., k} such that any two adjacent or incident elements have distinct images. The total chromatic number χ_t(G) is the smallest integer k such that G has a total k-coloring. A total k-coloring is said to be equalized if ||π^{-1}(i)| − |π^{-1}(j)|| ≤ 1 when 1 ≤ i < j ≤ k. Fu
[10] gave the first systematic treatment of the notion of an equalized total coloring. Given a graph G, it would be nice to have an equalized total k-coloring whenever k ≥ χ_t(G). Even though this is true for complete graphs, complete bipartite graphs and trees, it does not hold in general. Fu gave a family of counterexamples. In the following statement, the join operation ∨ connects each vertex in one graph to all vertices of the other graph.

Theorem 7.2 Let G = nK_2 ∨ O_{2n-1} for each n ≥ 3. Then χ_t(G) = 2n + 1, yet G has no equalized total (2n + 1)-coloring.

Fu also asked the following questions and made one interesting conjec-
ture.

Question 7.3 For an arbitrary bipartite graph G, does there exist an equalized total k-coloring for each k ≥ χ_t(G)?

Question 7.4 Does the existence of an equalized total k-coloring for G imply the existence of an equalized total (k + 1)-coloring?

Conjecture 7.5 For any given graph G, G has an equalized total k-coloring for each k ≥ max{χ_t(G), Δ(G) + 2}.

Partial results for this conjecture include graphs G with Δ(G) = |G| − 2 and complete t-partite graphs of odd order.
The last related notion we are going to discuss is the m-bounded coloring. The main concern here is not the evenness of the distribution of resources, but rather the limitations on the amount of resources. An m-bounded coloring of a graph G is a proper coloring in which each color is used at most m times. The m-bounded chromatic number χ_m(G) of G is the smallest number of colors required for an m-bounded coloring of G. This notion has applications in scheduling problems and was first discussed in Hansen, Hertz and Kuplinsky [14]. The problem of determining the m-bounded chromatic number of a tree was left open there. The solution of this problem depends on the equitable coloring of trees and the following fact.

Lemma 7.6 If a graph G can be equitably colored with ⌈|G|/m⌉ colors, then χ_m(G) = ⌈|G|/m⌉.
Theorem 7.7 Let T = T(X, Y) be a tree and v be an arbitrary major vertex. Then
(1) χ_m(T) = 2 when m ≥ max{|X|, |Y|}.
(2) χ_m(T) = max{3, ⌈|T|/m⌉, ⌈(|T| − α(T − N(v)))/m⌉ + 1} when m < max{|X|, |Y|}.

It is easy to see (1). Now assume m < max{|X|, |Y|}. Color T with an m-bounded coloring using χ_m(T) colors. Let the chosen major vertex v be colored, say, with color 1. The number of vertices colored with color 1 is at most α(T − N(v)). The remaining χ_m(T) − 1 colors are enough to give an m-bounded coloring of all other vertices, of which there are at least |T| − α(T − N(v)); hence (χ_m(T) − 1)m ≥ |T| − α(T − N(v)). It follows that χ_m(T) ≥ ⌈(|T| − α(T − N(v)))/m⌉ + 1. Hence the right-hand side of (2) gives a lower bound. For a proof of the upper bound, we refer to Chen and Lih [4]. We also note that in Chen, Ko and Lih [3] the m-bounded chromatic number for a split graph was obtained in addition to its equitable chromatic number.
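The only nontrivial ingredient in formula (2) is α(T − N(v)), the independence number of a forest. A minimal sketch (ours; how the major vertex v is chosen is governed by the definition given earlier in the chapter and is left to the caller) computes it exactly with the classical leaf-greedy rule.

# Sketch only: independence number of a forest.  In a forest it is always
# safe to put a vertex of degree <= 1 into the independent set and delete
# it together with its neighbor (if any); iterating this rule is exact.
def forest_independence_number(vertices, edges):
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    alive, alpha = set(vertices), 0
    while alive:
        v = min(alive, key=lambda u: len(adj[u] & alive))  # leaf or isolated vertex
        alpha += 1
        neighbors = adj[v] & alive
        if neighbors:                  # v is a leaf: delete its unique neighbor
            alive.discard(neighbors.pop())
        alive.discard(v)
    return alpha

# Example: a path on five vertices has independence number 3.
print(forest_independence_number(range(5), [(i, i + 1) for i in range(4)]))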

Theorem 7.8 Let G = G[U; V] be a split graph such that |U| = n and |V| = r. Let m ≥ 1 be a given integer. Then χ_m(G[U; V]) = n + ⌈(r − α'(B_G(m − 1)))/m⌉.

8 Further research
The area of equitable coloring is wide open and we have barely scratched the surface in this survey. Besides the conjectures and questions included in the foregoing sections, there are other directions which could lead to fruitful research. However, we want to stress the significance of the Equitable Δ-Coloring Conjecture again. The settlement of this conjecture presumably will reveal the profound nature of the equitable colorability.
Although some of the equitable coloring results surveyed here can be extended to disconnected graphs, a comprehensive study of the disconnected case has not been achieved. B.-L. Chen and K.-C. Huang have obtained some preliminary results.
We should mention the issue of efficient algorithms. For special classes of graphs, how to design efficient algorithms to color them as evenly as possible is a largely unexplored area.
In his Ph.D. dissertation [24], W.-F. Wang introduced the notion of the chromatic difference p(G) = χ=(G) − χ(G). Because we are interested only in non-trivial graphs, 0 ≤ p(G) ≤ Δ(G) − 1.

Conjecture 8.1 For any graph G, we have 0 ≤ p(G) ≤ ⌊Δ(G)/2⌋.
The upper bound can be attained, for instance, by a star K_{1,n}, n ≥ 3. This conjecture implies the ECC for graphs G having χ(G) ≤ Δ(G)/2, since χ=(G) = χ(G) + p(G) ≤ Δ(G)/2 + ⌊Δ(G)/2⌋ ≤ Δ(G). Conjecture 2.4 implies Conjecture 8.1 for connected bipartite graphs. Another intriguing problem in the present context is the following.

Problem 8.2 Characterize those graphs G that satisfy p(G) = 0.

We note that being regular is not sufficient to have p(G) = 0. An example is the graph 3K_2 ∨ O_5, which is 6-regular. Its chromatic number is 3. To make the chromatic difference zero, the sizes of the color classes must be 4, 4, 3. Yet there is no way to have two independent sets of size 4: since every vertex of 3K_2 is joined to every vertex of O_5, an independent set of size 4 must lie entirely inside O_5, and two disjoint such sets would require 8 of its 5 vertices.
In a similar vein, we are interested in the difference p'(G) = χ=*(G) − χ=(G), and a parallel problem can be proposed.

Problem 8.3 Characterize those graphs G that satisfy p'(G) = 0.

Some of our previous results provide graphs G that satisfy p'(G) = 0. Even if the difference is non-zero, we are curious about what happens between the two numbers.
We are going to conclude this survey with some thoughts on equalization. It may place an unreasonable restriction on applications if the distribution of resources is required to be as even as possible. It would be enlightening to have a chromatic theory in which the difference between the sizes of any two color classes is within some prescribed bound. When that bound is 1, it reduces to the equitable coloring of graphs. However, the bound may be significantly extended or may even depend on some structural parameters of the graph involved.

References
[1] B. Bollobás and R. K. Guy, Equitable and proportional coloring of trees, J. Combin. Theory Ser. B, 34 (1983) 177-186.

[2] R. L. Brooks, On colouring the nodes of a network, Proc. Cambridge Philos. Soc., 37 (1941) 194-197.

[3] B.-L. Chen, M.-T. Ko, and K.-W. Lih, Equitable and m-bounded coloring of split graphs, in M. Deza, R. Euler, and I. Manoussakis, eds., Combinatorics and Computer Science, Lecture Notes in Computer Science 1120, (Springer, Berlin, 1996) 1-5.

[4] B.-L. Chen and K.-W. Lih, A note on the m-bounded chromatic number of a tree, Europ. J. Combinatorics, 14 (1993) 311-312.

[5] B.-L. Chen and K.-W. Lih, Equitable coloring of trees, J. Combin. Theory Ser. B, 61 (1994) 83-87.

[6] B.-L. Chen, K.-W. Lih, and P.-L. Wu, Equitable coloring and the maximum degree, Europ. J. Combinatorics, 15 (1994) 443-447.

[7] B.-L. Chen, K.-W. Lih, and J.-H. Yan, Equitable coloring of graph products, manuscript, 1998.

[8] B.-L. Chen, K.-W. Lih, and J.-H. Yan, A note on equitable coloring of interval graphs, manuscript, 1998.

[9] B.-L. Chen and C.-H. Wu, The equitable coloring of complete partite graphs, manuscript, 1994.

[10] H.-L. Fu, Some results on equalized total coloring, Congressus Numerantium, 102 (1994) 111-119.

[11] R. P. Gupta, On decompositions of a multigraph into spanning subgraphs, Bull. Amer. Math. Soc., 80 (1974) 500-502.

[12] R. K. Guy, Monthly research problems, 1969-1975, Amer. Math. Monthly, 82 (1975) 995-1004.

[13] A. Hajnal and E. Szemerédi, Proof of a conjecture of Erdős, in: A. Rényi and V. T. Sós, eds., Combinatorial Theory and Its Applications, Vol. II, Colloq. Math. Soc. János Bolyai 4 (North-Holland, Amsterdam, 1970) 601-623.

[14] P. Hansen, A. Hertz, and J. Kuplinsky, Bounded vertex colorings of graphs, Discrete Math., 111 (1993) 305-312.

[15] A. J. W. Hilton and D. de Werra, A sufficient condition for equitable edge-colourings of simple graphs, Discrete Math., 128 (1994) 179-201.

[16] T. R. Jensen and B. Toft, Graph Coloring Problems, (John Wiley & Sons, New York, 1995).

[17] K.-W. Lih and P.-L. Wu, On equitable coloring of bipartite graphs, Discrete Math., 151 (1996) 155-160.

[18] C.-Y. Lin, The Equitable Coloring of Bipartite Graphs, Master's thesis, Tunghai University, Taiwan, 1995.

[19] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, (Academic Press, New York, 1979).

[20] W. Meyer, Equitable coloring, Amer. Math. Monthly, 80 (1973) 920-922.

[21] B. Toft, 75 graph-colouring problems, in: R. Nelson and R. J. Wilson, eds., Graph Colourings, (Longman, Essex, 1990) 9-36.

[22] A. C. Tucker, Perfect graphs and an application to optimizing municipal services, SIAM Rev., 15 (1973) 585-590.

[23] V. G. Vizing, On an estimate of the chromatic class of a p-graph (Russian), Diskret. Analiz., 3 (1964) 25-30.

[24] W.-F. Wang, Equitable Colorings and Total Colorings of Graphs, Ph.D. dissertation, Nanjing University, China, 1997.

[25] D. de Werra, Some uses of hypergraphs in timetabling, Asia-Pacific J. Oper. Res., 2 (1985) 2-12.

[26] H. P. Yap and Y. Zhang, On equitable coloring of graphs, manuscript, 1996.

[27] H. P. Yap and Y. Zhang, The equitable Δ-colouring conjecture holds for outerplanar graphs, Bull. Inst. Math. Acad. Sinica, 25 (1997) 143-149.

[28] Y. Zhang and H. P. Yap, Equitable colourings of planar graphs, to appear in J. Combin. Math. Combin. Comput.
HANDBOOK OF COMBINATORIAL OPTIMIZATION


D.-Z. Du and P.M. Pardalos (Eds.) pp. 567-620
©1998 Kluwer Academic Publishers

Randomized Parallel Algorithms for


Combinatorial Optimization
Sanguthevar Rajasekaran
Department of Computer and Information Science and Engineering
University of Florida, Gainesville, FL 32611
E-mail: raj@cise.ufl.edu

Jose D. P. Rolim
Centre Universitaire d'Informatique, University of Geneva, CH
E-mail: rolim@cui.unige.ch

Contents

1 Introduction 568

2 Preliminaries 570
2.1 Parallel Models of Computing 570
2.2 Definition of Some Interconnection Networks 571
2.3 Randomized Algorithms 572
2.4 Chernoff Bounds 573
2.5 The Complexity Classes NC and RNC 574

3 Parallel Sorting Algorithms 574
3.1 A Generic Algorithm 575
3.2 The PRAM 575
3.3 The Mesh 577
3.4 The Hypercube 578

4 Packet Routing 579
4.1 The Mesh Connected Computer 580
4.2 The Hypercube 584
5 Parallel Shortest Paths Computations and Breadth First Search on Undirected Graphs 586
5.1 Ullman and Yannakakis's algorithm for Breadth First Search 588
5.2 Shortest paths computations in weighted graphs 589

6 Parallel Randomized Techniques for Matching Problems and Their Applications 591
6.1 Deciding the Existence of a Perfect Matching 591
6.2 Constructing a Perfect Matching 594
6.3 Minimum Weight Perfect Matching and Maximum Matching 597
6.4 Parallel Depth First Search 598

7 Minimum Cost Spanning Trees 603

8 Luby's Method and the Maximal Independent Set problem 604

9 Randomization and approximation 608

10 Conclusions 613

References

Abstract. In this paper we show some important randomiza-


tion techniques for the parallel processing of discrete problems.
In particular, we present several parallel randomized algorithms
frequently used for sorting, packet routing, shortest paths prob-
lems, matching problems, depth first search, minimum cost span-
ning trees, and maximal independent set problems. We also dis-
cuss the connection between randomization and approximation,
showing how randomization yields approximate solutions and we
illustrate this connection by means of network flow problems.

1 Introduction
This paper presents examples to illustrate some important randomization techniques that have been fruitfully applied in the design of parallel algorithms. Thus, the aim is to give simple algorithms that represent well the key ideas of such techniques, rather than detailed descriptions of the most efficient algorithms in the literature (for the latter we provide suitable pointers). We also illustrate some interesting applications
which underline the importance of such techniques. Of course we do not and cannot claim any completeness, since research in this area is extremely active and a suitable selection has to be made.

The paper is organized as follows. Section 2 provides the basic notations and definitions adopted in the rest of the paper. In Section 3, we present algorithms for sorting. Sorting is an important comparison problem that has been studied extensively. In parallel computing, interprocessor communication plays a vital role. A single step of communication can be thought of as a packet routing task. In Section 4 we discuss a two phase strategy for packet routing that has been employed over a variety of interconnection networks.

In Section 5, we first present a randomized parallel technique for computing shortest paths in unweighted graphs and then we discuss recent efficient algorithms for weighted graphs which make use of the same technique. Section 6 is devoted to the matching problem in graphs. In particular, an elegant parallel randomized technique is introduced. Such a technique applies to several versions of the matching problem. It also serves as a key tool for studying other fundamental topics in the theory of computational complexity, such as the connections between search and decision problems from a parallel complexity point of view and the theory of randomized reductions. We then turn to practical applications of this technique. In Section 6.4, we describe a parallel algorithm for constructing the Depth First Search tree of a graph, which is strongly based on matching computations on weighted graphs.

Section 7 is devoted to a discussion on the problem of finding a minimum


cost spanning tree. Another important technique for solving optimization
problems efficiently in parallel is based on a general probabilistic result, com-
monly called the pairwise independence lemma, which has several important
applications. In Section 8, we apply this method to develop a simple and
efficient algorithm for the MAX IND SET problem.

In some cases, approximation techniques and the use of randomization can be fruitfully combined in order to obtain efficient parallel algorithms for important optimization problems. Section 9 is devoted to this topic. We illustrate an RNC-approximation scheme for the MAX FLOW problem. Section 10 concludes the paper.
2 Preliminaries
In this section we review some basics and definitions that we will adopt
throughout this paper.

2.1 Parallel Models of Computing


If P processors are used to solve a given problem, then there is a potential of obtaining a speedup of up to P. If S is the best known sequential run time for solving a given problem and T is the parallel run time using P processors, then PT ≥ S. Otherwise one could simulate the parallel algorithm using a single processor and get a run time better than S, leading to a contradiction. PT is referred to as the work done by the parallel algorithm. Any parallel algorithm for which PT = O(S) will be referred to as a work-optimal algorithm.
There exist numerous parallel models of computing, partly due to the
fact that differing parallel architectures can be conceived of and have been
employed in practice. Each processor in a parallel machine can be thought
of as a random access machine (RAM). Depending on how interprocessor
communications are implemented, variations among different architectures
arise. For the purpose of this paper we categorize parallel models into shared
memory models and fixed connection machines.
A shared memory model (also known as a Parallel Random Access Machine (PRAM)) is a collection of RAMs working synchronously. In a PRAM, communication takes place with the help of a common block of shared memory. If processor i wants to communicate with processor j (for any i and j), it can do so by writing a message in memory cell j which can then be read by processor j.
Conflicts can arise when processors try to access the common memory for either writing or reading. Depending on how these conflicts are handled, a PRAM can further be classified into three types. In an Exclusive
Read and Exclusive Write (EREW) PRAM concurrent reads or concurrent
writes are not allowed. A Concurrent Read and Exclusive Write (CREW)
PRAM permits concurrent reads but not concurrent writes. A Concurrent
Read and Concurrent Write (CRCW) PRAM allows both concurrent reads
and concurrent writes. For a CRCW PRAM, it is necessary to have a
protocol for handling write conflicts, since the processors trying to write at
the same time in the same cell can possibly have different data to write and
we should determine which data gets written. This is not a problem in the

case of concurrent reads since the data read by different processors will be
the same. In a Common-CRCW PRAM, concurrent writes are permissible
only if the processors trying to access the same cell have the same data to
write. In an Arbitrary-CRCW PRAM, if more than one processor tries to
write in the same cell at the same time, an arbitrary one of them succeeds.
In a Priority-CRCW PRAM, write conflicts are resolved using priorities
assigned to the processors.
A fixed connection machine (or a fixed connection network) is generally
represented as a directed graph whose nodes correspond to processors and
whose edges correspond to communication links. If there is an edge con-
necting two processors, they can communicate in a unit step. Processors
not connected by edges can communicate by sending messages along paths
connecting them. This way of communicating is also known as packet rout-
ing. Each processor in a fixed connection machine is a RAM. Examples of
fixed connection machines include the mesh, the hypercube, the star graph,
etc.

2.2 Definition of Some Interconnection Networks


Numerous interconnection networks have been proposed and used by re-
searchers in the area of parallel computing. The most popular networks
are the mesh, the hypercube, and their variants. Machines have been built
based on these models. In this section we define some of these models.

The Mesh. A mesh connected computer (referred to as the Mesh from hereon) is an n × n square grid with a processor at each grid point. A processor is connected to its (at most four) neighbors through bidirectional links. In one unit of time, a processor can perform a local computation and/or communicate with all its neighbors. The Mesh has been studied extensively and algorithms have been devised for a large number of fundamental problems. Several variants of the Mesh have also been proposed and investigated.
In a Mesh with fixed buses (denoted as M_f) one assumes that each row and each column has been augmented with a broadcast bus. A single message can be broadcast along any bus at any time. This message can be read by all the processors connected to this bus in the same time unit. In a Mesh with reconfigurable buses (denoted as M_r), processors are connected to a reconfigurable broadcast bus. At any given time, the broadcast bus can be partitioned (i.e., reconfigured) dynamically into subbuses with the help
of locally controllable switches. Details on these models can be found e.g.


in [58].
Variants of the Mesh with buses wherein the buses are implemented
using optical technology have also been investigated. See e.g. [56]. Two
such models are the arrays with reconfigurable optical buses (AROBs) and
the optical transpose interconnection system Mesh (OTIS-Mesh).

The Hypercube. A hypercube of dimension d consists of p = 2^d nodes and d2^{d-1} edges. Each node in the hypercube can be labeled with a d-bit binary number. We use the same symbol to denote a node and its label. A node x ∈ V is bidirectionally connected to another node y ∈ V if and only if x and y differ in exactly one bit position (i.e., the Hamming distance between x and y is 1). Therefore, there are exactly d edges going out of (and coming into) any node.
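With this labeling, the d neighbors of a node are obtained by flipping each of its d bits in turn, as the following two-line sketch (ours) shows.

# Neighbors of node x in a d-dimensional hypercube: flip each bit of x.
def hypercube_neighbors(x, d):
    return [x ^ (1 << i) for i in range(d)]

print(hypercube_neighbors(0b101, 3))   # node 5 in an 8-node hypercube: [4, 7, 1]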
The butterfly and the CCC networks are closely related to the hypercube (see e.g. [26]). The star graph is somewhat related to the hypercube and has sublogarithmic diameter in the network size (see e.g. [58]).

2.3 Randomized Algorithms


At least three different time complexities can be used to judge the performance of any algorithm: the best case, the worst case, and the average case. Many algorithms have better average case run times than their worst case run times. For example, the worst case run time of Hoare's quicksort is O(n²), whereas its average case run time is only O(n log n). The validity of using the average case run time as a performance measure of an algorithm very much depends on the assumption made on the input space, namely that each input is equally likely.
Randomized algorithms exploit randomness without making any assumptions on the input space. Informally, a randomized algorithm is one wherein certain decisions are made based on the outcomes of coin flips made in the algorithm. A randomized algorithm with one possible sequence of outcomes for the coin flips can be thought of as being different from the same algorithm with a different sequence of outcomes for the coin flips. As a result, a randomized algorithm can be thought of as a family of algorithms. For a given input, some of the algorithms in this family might have a 'poor performance'. We should ensure that the number of such 'bad algorithms' in the family is only a small fraction of the total number of algorithms. If for any possible input we can find at least a (1 − ε) portion of the algorithms in the family (ε being very close to 0) that will have a 'good performance' on that input, then clearly, a random algorithm in the family will have a 'good performance' on any input with probability ≥ (1 − ε). In this case we say that this family of algorithms (or this randomized algorithm) has a 'good performance' with probability at least (1 − ε). Here ε is called the error probability. Note that this probability is independent of the input distribution.
The term 'good performance' can be interpreted in many ways. It could
mean that the algorithm outputs the correct answer or that its run time is
small, and so on. Accordingly we can think of different types of randomized
algorithms. A randomized algorithm that always outputs the correct answer
but whose run time is a random variable (possibly with a small mean) is
referred to as a Las Vegas algorithm. On the other hand, a Monte Carlo
algorithm is a randomized algorithm that runs for a predetermined amount
of time but whose output may be incorrect occasionally.
Asymptotic functions such as O(·) and o(·) can be modified for randomized algorithms as follows. The amount of resource (like time, space, etc.) used by a randomized algorithm is defined to be Õ(f(n)) if there exists a constant c such that the amount of resource used is no more than cαf(n) with probability ≥ 1 − n^{−α} on any input of size n and for any positive α ≥ 1. Similar definitions apply to õ(·) and Θ̃(·) as well. By high probability we mean a probability of ≥ 1 − n^{−α} for any fixed α ≥ 1 (n being the input size of the problem at hand).

2.4 Chernoff Bounds


One of the frequently used results from probability theory in the analysis of randomized algorithms is Chernoff bounds. They give tight upper bounds on the tail ends of a binomial distribution.
A Bernoulli trial is an experiment with two possible outcomes, namely, success and failure. The probability of success is p. Let X be the number of successes in a sequence of n independent Bernoulli trials. The random variable X is said to have a binomial distribution with parameters (n, p). We use B(n, p) to denote a binomial distribution with parameters (n, p). The mean of X is np and the distribution of X is given by

Prob.[X = i] = (n choose i) p^i (1 − p)^{n−i}.

Chernoff bounds can be stated as follows.


Lemma 2.1 If X is binomially distributed with parameters (n, p), and m > np is an integer, then

Prob.(X ≥ m) ≤ (np/m)^m e^{m−np}.    (1)

Also, Prob.(X ≤ ⌊(1 − ε)np⌋) ≤ e^{−ε²np/2}    (2)

and Prob.(X ≥ ⌈(1 + ε)np⌉) ≤ e^{−ε²np/3}    (3)

for any fixed ε, 0 < ε < 1. □
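To get a feel for how much slack these estimates leave, the following small numerical check (ours, not from the chapter) compares the exact upper tail of B(1000, 1/2) against bound (3) for a few values of ε.

# Sketch only: exact binomial upper tail versus Chernoff bound (3).
from math import comb, exp, ceil

def upper_tail(n, p, m):
    """Exact Prob(X >= m) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m, n + 1))

n, p = 1000, 0.5
for eps in (0.05, 0.1, 0.2):
    m = ceil((1 + eps) * n * p)
    print(f"eps={eps}: exact {upper_tail(n, p, m):.3e}"
          f" <= bound {exp(-eps * eps * n * p / 3):.3e}")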

2.5 The Complexity Classes NC and RNC


Informally, a parallel algorithm is fast and efficient if it runs in polylogarithmic time and uses a polynomial number of processors (in the input size). The complexity class NC consists of all those problems that can be solved by an algorithm with this kind of performance. Furthermore, by the term NC-reduction we will mean an algorithmic transformation from one problem to another that can be performed in parallel polylogarithmic time, using a polynomial number of processors (with respect to the input size of the first problem).
The complexity class NC remains the same over a variety of parallel models, in particular over the PRAMs and some standard fixed connection networks such as the hypercube, the star graph, etc.
In a way analogous to NC, we can define the class RNC. RNC consists of all the problems that can be solved in polylogarithmic time using a polynomial number of processors and randomized algorithms.
Concerning the relation among complexity classes, it is not hard to show that RNC ⊆ BPP, where BPP is the class of problems solvable in polynomial time by a randomized algorithm with bounded error probability, and it is also conjectured that P is not contained in RNC. However, it is unknown whether or not RNC is contained in P and, even more, whether or not RNC is contained in NC.

3 Parallel Sorting Algorithms


The problem of sorting a given sequence of n keys is to rearrange this sequence in nondecreasing order. This important comparison problem has been studied extensively by computer scientists. Sorting has asymptotically optimal sequential algorithms; there are several that run in time O(n log n) in the worst case (see e.g. [26]). Optimal sorting algorithms exist on certain parallel models like the EREW PRAM, the comparison tree model, the mesh, etc. In this section we provide a summary of these results.
Random sampling has been widely used in the design of sorting algorithms over a variety of models of computing. The following sampling scheme was proposed by Frazer and McKellar [19]: 1) Randomly sample o(n) keys from the input and sort them using any nonoptimal algorithm; 2) Partition the input into independent subsequences using the sample keys as splitter elements; and 3) Sort each subsequence independently. In contrast, in Hoare's quicksort algorithm [25] a single key is employed to partition the input. This elegant approach of Frazer and McKellar has been adapted to design sorting algorithms on various interconnection networks.
Recently, various sorting algorithms (both deterministic and randomized) have been implemented on different parallel machines. These experimental results indicate that randomized algorithms perform better in practice than their deterministic counterparts (see e.g. [8], [24], [70]).

3.1 A Generic Algorithm


In this section we describe the algorithm of Frazer and McKellar.

Algorithm AO

1. Randomly sample n^ε keys, for some constant ε < 1.

2. Sort this sample.

3. Partition the input using the sorted sample as splitter keys.

4. Sort each part independently in parallel.
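A sequential rendering of Algorithm AO may help fix the ideas; the sketch below (ours, with an arbitrarily chosen sample-size exponent) executes steps 1-4 one after another, whereas on a parallel machine the partitioned subsequences of step 4 are sorted concurrently.

# Sketch only: a sequential simulation of Algorithm AO (sample sort).
import random
from bisect import bisect_left

def sample_sort(keys, eps=0.5):
    n = len(keys)
    if n < 4:
        return sorted(keys)
    splitters = sorted(random.sample(keys, round(n ** eps)))   # steps 1 and 2
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in keys:                                              # step 3
        buckets[bisect_left(splitters, x)].append(x)
    out = []
    for b in buckets:                                           # step 4
        out.extend(sorted(b))
    return out

data = [random.randrange(10**6) for _ in range(10**4)]
assert sample_sort(data) == sorted(data)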

3.2 The PRAM


A classical result in parallel sorting is Batcher's algorithm [6]. This algorithm is based on the idea of bitonic sorting and was proposed for the hypercube, and hence can be run on any of the PRAMs as well. Batcher's algorithm runs in O(log² n) time using n processors. Preparata's algorithm [53] is very nearly optimal; it uses n log n CREW PRAM processors to achieve a run time of O(log n).

AUTHOR(S)                 YEAR   MODEL   P            T
Preparata [53]            1978   CREW    n log n      O(log n)
Reischuk [65]             1981   CREW    n            O(log n)
Cole [12]                 1987   EREW    n            O(log n)
Cole [12]                 1987   CRCW    n(log n)^ε   O(log n / log log log n)
Rajasekaran & Reif [59]   1988   CRCW    n(log n)^ε   O(log n / log log n)

Table 1: Sorting on the PRAM

Finding a logarithmic time optimal parallel algorithm for sorting remained an open problem for a long time in spite of numerous attempts. In 1981, Reischuk was able to design a randomized logarithmic time optimal algorithm for the CREW PRAM [65] using the basic technique of Algorithm AO. At around the same time Ajtai, Komlós, and Szemerédi announced their sorting network of depth O(log n) [1]. However, the size of the circuit was O(n log n) and the underlying constant in the time bound was enormous. Leighton subsequently was able to reduce the circuit size to O(n) using the technique of columnsort [41]. Though several attempts have been made to improve the constant in the time bound, the algorithm of [1] remains mostly a result of theoretical interest.

An optimal logarithmic time EREW PRAM algorithm for sorting with a reasonably small underlying constant was given by Cole in 1987 [12]. In the same paper, a sub-logarithmic time algorithm for sorting on the CRCW PRAM is also given. This algorithm uses n(log n)^ε processors, for any constant ε > 0, the run time being O(log n / log log log n). The lower bound theorem of Beame and Håstad implies that any CRCW PRAM algorithm for sorting needs Ω(log n / log log n) time in the worst case, given only a polynomial number of processors. Rajasekaran and Reif [59] were able to obtain a randomized algorithm for sorting on the CRCW PRAM that runs in time O(log n / log log n), the processor bound being n(log n)^ε, for any fixed ε > 0. This algorithm is also processor-optimal, i.e., to achieve the same time bound the processor bound cannot be decreased any further.

Table 1 summarizes these results.



3.3 The Mesh

Consider an n × n Mesh where there is a key at each node. Since the diameter of an n × n Mesh is 2n − 2, any sorting algorithm will have to spend Ω(n) time. Designing an O(n) time sorting algorithm for the Mesh was a long-standing open problem. In 1977, Thompson and Kung [71] presented the first O(n) time algorithm. This algorithm uses a variant of the odd-even merge sort. Since a Mesh has a large diameter, it is essential to have not only asymptotic optimality but also small underlying constants in the time bounds. The challenge in designing Mesh algorithms lies in reducing the constants in the time bounds.
Subsequent to Thompson and Kung's algorithm, Schnorr and Shamir gave a 3n + o(n) time algorithm [66]. A lower bound of 3n − o(n) was proven for sorting in the same paper. However, both the upper bound and the lower bound were derived under the assumption of no queueing. Ma, Sen, and Scherson [44] gave a near optimal algorithm for a related model. Using the technique of Algorithm AO, Kaklamanis, Krizanc, Narayanan, and Tsantilas presented a very interesting algorithm for sorting with a run time of 2.5n + o(n). This algorithm used queues of size O(1). The same authors later improved this time bound to 2n + o(n) [28].
A number of deterministic sorting algorithms that use queues have been devised as well. Kunde's algorithm [39] has a run time of 2.5n + o(n); Nigam and Sahni have given a (2 + ε)n + o(n) time algorithm (for any fixed ε > 0) [48]; Kaufmann, Sibeyn, and Torsten have offered a 2n + o(n) time algorithm [34]. The third algorithm closely resembles the one given by [28] and Algorithm AO.
The problem of k-k sorting is to sort a Mesh where there are k elements at each node. The bisection lower bound for this problem is kn/2. For example, if we have to interchange data from one half of the mesh with data from the other half, kn/2 routing steps will be needed, since the kn²/2 elements of one half have to cross a bisection of width n. A very nearly optimal randomized algorithm for k-k sorting is given in [57]. Recently, Kunde [39] has matched this result with a deterministic algorithm.
Table 2 summarizes sorting algorithms for the Mesh.
Sorting on variants of the mesh has been explored extensively as well. For a summary of sorting algorithms on the mesh with fixed and reconfigurable buses, the reader is referred to [58]. A survey of sorting algorithms on the optical variants of the mesh (including the AROB and OTIS-Mesh) can be found in [56].
AUTHOR(S)                  YEAR   PROBLEM    TIME
Thompson & Kung [71]       1977   1-1 Sort   7n + o(n)
Schnorr & Shamir [66]      1986   1-1 Sort   3n + o(n)
Ma, Sen, & Scherson [44]   1986   1-1 Sort   4n + o(n)
Kunde [39]                 1991   1-1 Sort   2.5n + o(n)
KKNT [28]                  1992   1-1 Sort   2n + o(n)
Nigam & Sahni [48]         1993   1-1 Sort   (2 + ε)n + o(n)
KST [34]                   1993   1-1 Sort   2n + o(n)
Kunde [39]                 1991   k-k Sort   kn + o(kn)
Rajasekaran [57]           1991   k-k Sort   kn/2 + 2n + o(kn)

Table 2: Mesh Algorithms for Sorting

3.4 The Hypercube

Batcher's algorithm runs in O(log² n) time on an n-node hypercube [6]. This algorithm uses the technique of bitonic sorting. Odd-even merge sorting can also be employed on the hypercube to obtain the same time bound. A logarithmic time sorting algorithm was proposed by Nassimi and Sahni [47]. But their algorithm uses n^(1+ε) processors (for any fixed ε > 0). This algorithm, known as sparse enumeration sort, has found numerous applications in the design of other sorting algorithms on various interconnection networks. A slightly simpler algorithm that has the same time and processor bounds has been recently proposed in [55]. Reif and Valiant employed a variant of Algorithm AO to obtain an optimal randomized algorithm for sorting on the CCC [64]. The best known deterministic algorithm for sorting on the hypercube (or any variant) is due to Cypher and Plaxton [16]. This algorithm takes O(log n log log n) time with a large underlying constant. It makes use of a deterministic sampling technique. Table 3 carries a summary of these results.
Numerous algorithms have been developed for sorting on the star graph and related networks. A summary can be found e.g. in [58].
AUTHOR(S)               YEAR   MODEL       P          T
Batcher [6]             1968   Network     n          (1/2) log² n
Nassimi & Sahni [47]    1981   Hypercube   n^(1+ε)    O(log n)
Reif & Valiant [64]     1983   CCC         n          O(log n)
Cypher & Plaxton [16]   1990   Hypercube   n          O(log n log log n)

Table 3: Sorting on the Hypercube

4 Packet Routing
Factors that determine the speed of a parallel computer are 1) the computing power of the individual processors, and 2) the speed of inter-processor communication. Present day technology makes it possible to increase the computing powers of processors arbitrarily. Thus the speed of a parallel machine critically depends on the speed of inter-processor communications. In a fixed connection machine, a single step of inter-processor communication can be thought of as the following task (also called packet routing): Each processor has a packet of information that has to be sent to some other processor. Send all the packets to their correct destinations as quickly as possible. At most one packet should pass through any wire at any time.
Partial permutation routing is a special case of the routing problem, where each node is the origin of at most one packet and each node is the destination of at most one packet. Two criteria are normally used to judge the performance of a packet routing algorithm: 1) its run time, i.e., the time taken by the last packet to reach its destination, and 2) its queue length, which is defined as the maximum number of packets any node will have to store during routing. Priority schemes are used to resolve edge contentions. Furthest destination first, furthest origin first, etc. are examples of priority schemes. A packet not only contains the message (from one processor to another) but also the origin and destination information of this packet. A packet routing algorithm is specified by 1) the paths to be taken by the packets, and 2) priority schemes for resolving contentions.
Another important instance of the packet routing problem is k-k routing (also known as routing k-relations). In k-k routing at most k packets originate from any node and at most k packets are destined for any node.
Valiant proposed a two phase strategy for packet routing in [75]. A
packet is routed to a random destination in the first phase. In the second


phase the packet traverses to its actual destination using the shortest path
from its intermediate location. This scheme has been proven useful over a
variety of networks including the mesh, the hypercube, and their variants.
In this section we describe some of the results that have been obtained.

4.1 The Mesh Connected Computer


Mesh connected computers are very popular not only as interesting theoretical models but also as useful practical models. Some of the special features of a Mesh are: 1) it has a simple interconnection pattern, 2) many problems have data which map naturally onto the Mesh, and 3) the Mesh is linearly scalable.
Consider the problem of routing on an n × n Mesh. A 3n + o(n) time algorithm with a queue size of O(log n) was given by Valiant and Brebner [76] for permutation routing. Later, Rajasekaran and Tsantilas [60] presented a 2n + O(log n) time algorithm with a queue size of O(1). In this section we provide summaries of these algorithms. We first describe the algorithm of Valiant and Brebner, followed by a description of Rajasekaran and Tsantilas' algorithm. Before we do so, we provide some preliminary ideas that will be useful in understanding the later algorithms.

The Queue Line Lemma. The time taken by any packet to reach its
destination is dictated by two factors: 1) the distance between the packet's
origin and destination, and 2) the number of steps (also called the delay) the
packet waits in queues. The Queue Line Lemma enables one to compute an
upper bound on the delay of any packet.
Let P be the set of paths taken by the packets. Two packets are said to
overlap if they share at least one edge in their paths. We say the set of paths
is nonrepeating if for any two paths in P, the following statement holds: If
these two paths meet, share some successive edges, and diverge, then they
will never meet again.

Lemma 4.1 The amount of delay any packet q suffers waiting in queues is
no more than the number of distinct packets that overlap with q, provided
the set of paths taken by packets is nonrepeating.

Proof. Let π be an arbitrary packet. If π is delayed no more than once by each of the packets that overlap with π, the lemma is proven. Else, if a packet (call it q') overlapping with π delays π twice (say), then it means that q' has been delayed by another packet which also overlaps with π and which will never get to delay π. □

Routing on a Linear Array. Now we study two different routing prob-


lems on a linear array (which is a one-dimensional version of the Mesh).
This study will help us analyze routing algorithms on the Mesh. Routing on
a Mesh can typically be broken into a constant number of phases, where in
each phase routing is performed either along the rows or along the columns.

Problem 1. On an n-node linear array, node i has k_i packets initially (for i = 1, 2, ..., n), where 0 ≤ k_i ≤ n and Σ_{i=1}^{n} k_i = n. Each node is the destination for exactly one packet. Route the packets.

Lemma 4.2 Under the application of the furthest destination first priority scheme, the time needed for a packet starting at node i to reach its destination is no more than the distance between i and the boundary in the direction the packet is moving. In other words, if the packet is moving from left to right then this time is no more than n − i, and if the packet is moving from right to left the time is no more than i − 1.
Proof. Let q be a packet whose origin and destination are i and j, respectively. Without loss of generality assume that it is moving from left to right. The packet q can only be delayed by the packets with destinations > j which are to the left of their destinations. Let k_1, k_2, ..., k_n be the number of such packets (at the beginning) at nodes 1, 2, ..., n, respectively. (Realize that Σ_{l=1}^{n} k_l ≤ n − j.)
Let m be such that k_{m-1} > 1 and k_{m'} ≤ 1 for m ≤ m' ≤ n. Call the sequence k_m, k_{m+1}, ..., k_n the free sequence. Note that a packet in the free sequence will not be delayed by any other packet in the future. In addition, at every time step at least one new packet joins the free sequence. As a result, after n − j steps, all the packets that can possibly delay q would have joined the free sequence. The packet q then needs no more than an additional j − i steps to reach its destination. The case where the packet moves from right to left is similar.
The Queue Line Lemma can also be used to prove the same result. □
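The following simulation sketch (ours; the convention that an edge carries one packet per direction per step is our simplifying assumption) replays Problem 1 under the furthest destination first scheme and can be used to check the bound of Lemma 4.2 on small instances.

# Sketch only: simulating Problem 1 on a linear array with the
# furthest destination first priority scheme (nodes are 0-indexed here).
def route(packets):
    """packets: list of (origin, destination); each node is the destination
    of exactly one packet.  Returns the number of parallel steps used."""
    pos = [o for o, _ in packets]
    dest = [d for _, d in packets]
    steps = 0
    while any(p != d for p, d in zip(pos, dest)):
        moves, used = {}, set()
        for direction in (1, -1):
            active = [i for i in range(len(pos))
                      if (dest[i] - pos[i]) * direction > 0]
            # the packet with the furthest destination wins each edge
            active.sort(key=lambda i: -dest[i] * direction)
            for i in active:
                if (pos[i], direction) not in used:
                    used.add((pos[i], direction))
                    moves[i] = pos[i] + direction
        for i, q in moves.items():
            pos[i] = q
        steps += 1
    return steps

# Nodes 0..6; node 0 holds three packets, destinations cover every node once.
print(route([(0, 3), (0, 2), (0, 1), (2, 0), (3, 4), (3, 5), (5, 6)]))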

Problem 2. In an n-node linear array more than one packet can originate from any node and more than one packet can be destined for any node. Also, the number of packets originating from the nodes 1, 2, ..., j is ≤ j + f(n) (for any j and some function f). Route the packets.

Lemma 4.3 If the furthest origin first priority scheme is employed, Problem 2 can be solved within n + f(n) steps.

Proof. Let i and j be the origin and destination, respectively, of the packet q, where j is to the right of i. The packet q can potentially be delayed by at most i + f(n) packets. This is because only these many packets can originate from the nodes 1, 2, ..., i and hence have a higher priority than q. The packet q needs only an additional j − i steps to reach its destination. Therefore the total time needed for q is ≤ j + f(n). The maximum of this time over all the packets is n + f(n). □

Now we are ready to describe the routing algorithms for the Mesh.

A 3n + o(n) Step Algorithm. Let there be a packet initially at each node of an n × n Mesh. Name the processors with tuples (i, j), i = 1, 2, ..., n, j = 1, 2, ..., n, with (1, 1) at the top left corner. Consider a packet q whose origin is (i, j) and whose destination is (r, s). There are three phases in the algorithm.
In phase I, q travels along the column j to a random node (k, j). In phase II, q traverses to (k, s) along the row k. And finally, in phase III, q goes to (r, s) along the column s. The furthest origin first priority scheme is used in phase II. The furthest destination first priority scheme (with ties broken arbitrarily) is used in phase III. Clearly, each phase of routing corresponds to routing on a linear array.
Phase I can be completed in ≤ n steps, since there is only one packet at each node at the beginning. There is no delay for any packet. The number of packets that can start phase II from an arbitrary node (k, l) in row k (1 ≤ k ≤ n) is B(n, 1/n), since in phase I each of the n packets in column l would have chosen (k, l) with probability 1/n. Thus, the total number of packets starting their phase II from any one of the nodes (k, 1), (k, 2), ..., (k, j) is B(nj, 1/n). Using Chernoff bounds (equation 1), this number is no more than j + o(n) with high probability. Applying Lemma 4.3, we see that phase II can be completed in n + o(n) steps. Phase III is nothing but Problem 1 and hence can be performed within n steps. In summary, the algorithm has a run time of 3n + o(n).
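The probabilistic heart of this analysis, namely that only j + o(n) packets begin phase II in the first j nodes of a row, is easy to observe empirically. The sketch below (ours, a counting experiment rather than a full routing simulation) samples the phase I choices and reports the worst excess over j.

# Sketch only: the B(nj, 1/n) load bound behind phase II of the algorithm.
import random
from collections import Counter

n = 512
# Phase I sends the packet of node (i, j) to a uniformly random row of column j.
row_of = {(i, j): random.randrange(n) for i in range(n) for j in range(n)}
load = Counter(j for (i, j), row in row_of.items() if row == 0)  # inspect row 0
excess, prefix = 0, 0
for j in range(n):
    prefix += load[j]                  # packets starting phase II in the first j+1 nodes
    excess = max(excess, prefix - (j + 1))
print("worst excess over j:", excess)  # o(n) with high probability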
A (2 + ε)n Routing Algorithm. The run time of the above algorithm can be improved to (2 + ε)n, for any fixed ε ≥ 1/log n. The modification is as follows. In phase I, each packet is sent to a random node in its column as before, but only within a distance of m = εn from its origin.
To be more precise, divide each column of the Mesh into slices of length m. Assume without loss of generality that m divides n. In phase I a processor (i, j) chooses a random node in column j within the slice it is in and sends its packet to that node along column j. Phases II and III remain the same.

Lemma 4.4 The above modified algorithm runs in time (2 + ε)n + o(n).

Proof. Phase I can now be implemented in time m without any additional
delay, since there is only one packet at each node to begin with.
Let q be a packet that starts its phase II at (k, j). Without loss of
generality, assume that it is moving to the right. The number of packets
starting their phase II in (k, 1), (k, 2), …, (k, j) is B(jm, 1/m). The mean of
this binomial random variable is j. Using Chernoff bounds (equation 1),
this number is j + o(n). Employing Lemma 4.3, we see that phase II can be
completed in n + o(n) steps.
There are n packets starting from any column in phase III and each node
is the destination of exactly one packet. As a result, using Lemma 4.2, phase
III terminates within n steps. □

It is possible to choose as small an ε as desired in order to decrease the
run time. But, the smaller the value of ε, the larger will be the queue size.
In phase I, packets randomize over slices of length m. So, for the worst case
input, the queue size at the end of phase II will be Ω(1/ε). For example, all
the packets destined for a particular column might appear in the same row
in the input. The queue size of this algorithm can be shown to be
O(1/ε + log n). Thus ε has to be at least 1/log n if only O(log n) queue size
is allowed.

An Optimal Algorithm. The run time of the above algorithm can be
improved to 2n + O(log n) using some additional techniques. In the above
algorithm, most of the packets will reach their destinations within 2n steps.
The troublesome packets are those that originate from the four corners and
some small neighborhoods of these corners. Call these packets the superior
packets and the rest the inferior packets. Superior packets are routed using
a slightly different algorithm so as to make sure that the distance they have
to travel is no more than 2n. The inferior packets are routed using the
previous algorithm. Highest priority will be given to the superior packets.
It can be shown that the delay any superior packet suffers is O(log n). Thus
the stated run time is obtained.
The queue sizes of all these algorithms are O(log n). They can be reduced
to O(1) using the idea of spreading. The key observation is that the total
queue size of any fixed log n successive nodes is only O(log n). Partition the
Mesh so that each part consists of log n successive nodes. At any time in
routing, packets within a part are distributed among the processors in this
part so as to use only queues of size O(1). More details of this technique
can be found in [60].

4.2 The Hypercube

In this section we present a randomized algorithm for partial permutation
routing on the hypercube which was given by Valiant [75]. The algorithm
to be presented is for a butterfly network, which is very closely related to the
hypercube. Algorithms designed for the butterfly can routinely be used for
the hypercube and vice versa. Often, it is easier to develop algorithms for
the butterfly and then adapt them to the hypercube. Valiant's algorithm
uses queues of size O(log n). Several attempts were made to decrease this
queue size to O(1). All these efforts culminated in the paper of Ranade
[61], who not only presented a logarithmic time constant queue algorithm
for routing on hypercubic networks but also presented an optimal PRAM
simulation algorithm on these networks. For pointers to related works, the
reader is referred to [40]. In this section we describe the original algorithm
of Valiant [75]. We use H_d to denote a hypercube of dimension d.
A butterfly network of dimension d (denoted as B_d) has p = (d + 1)2^d
nodes and d·2^{d+1} edges. Any node in B_d can be labeled with a tuple (r, i),
where 0 ≤ r ≤ 2^d − 1 and 0 ≤ i ≤ d. Here r is called the row of the node
and i is called the level of the node. Node u = (r, i) in B_d is connected to
two nodes v = (r, i + 1) and w = (r^{(i+1)}, i + 1) in level i + 1 (for 0 ≤ i < d).
The nodes v and u have the same row number. The row number of w differs
from r only in the (i + 1)th bit. Both v and w are in level i + 1. The edge
(u, v) is referred to as the direct edge and the edge (u, w) is called the cross
edge. Both of these edges are known as level (i + 1) edges.
Let u be any processor in level 0 and v be any processor in level d. There
is a unique path between u and v of length d. If u = (r, 0) and v = (r', d),
then this path is (r, 0), (r_1, 1), (r_2, 2), …, (r', d). Here r_1 has the same first
bit as r', r_2 has the same first and second bits as r', and so on. Paths of
this kind are called greedy paths.
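
For concreteness, the greedy path can be computed bit by bit. The sketch below is our own illustration; it takes the "first bit" to be the least significant bit of the row number (the most-significant-bit convention is symmetric).

def greedy_path(r, r_prime, d):
    # Crossing a level-(i+1) edge fixes bit i+1 of the current row so that
    # it agrees with the destination row r'; after d levels the row is r'.
    path = [(r, 0)]
    row = r
    for i in range(d):
        bit = 1 << i                     # bit i+1 in the text's numbering
        row = (row & ~bit) | (r_prime & bit)
        path.append((row, i + 1))
    return path

print(greedy_path(0b000, 0b101, 3))      # [(0, 0), (1, 1), (1, 2), (5, 3)]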


It follows that the distance between any two processors in B_d is ≤ 2d.
In other words, the diameter of B_d is 2d. There exists a close relationship
between H_d and B_d. If each row of a B_d is collapsed into a single
processor preserving all the edges, then the result is an H_d.
Consider a B_d where there is a packet at each processor of level zero and
the packets have destinations in level d. Also, the destination rows form
a permutation of the origin rows. The routing algorithm for this input
consists of three phases. In phase I, a packet chooses a random intermediate
destination in level d and goes there using the greedy path. In phase II,
the packet goes to its actual destination row but in level zero. In phase III,
the packet goes to its actual destination in level d. The third phase takes
d steps since each packet uses the direct edge at each level. Phase II is the
reverse of phase I. Therefore, it suffices to compute the run time of phase I
in order to compute the run time of the whole algorithm.

Analysis of phase I

Consider any packet π and let e_i be the edge that π traverses in level i, for
1 ≤ i ≤ d. To get an upper bound on the delay suffered by π, it is enough to
compute the number of distinct packets that overlap with π (cf. the Queue
Line Lemma). Let n_i be the number of packets that have the edge e_i in their
paths. Clearly, D = Σ_{i=1}^d n_i is an upper bound on the number of packets
that overlap with π.
Let e_i be any level i edge. The number of packets that can go through
this edge is 2^{i−1}. This is because there are only 2^{i−1} processors at level
zero for which there are greedy paths through the edge e_i. The probability
that any such packet goes through e_i is 1/2^i. The reason is that a packet
starting at level zero can take either the direct edge or the cross edge, each
with a probability of 1/2. When the packet reaches level one, it can again
take either a cross edge or a direct edge with probability 1/2, and so on.
The packet can go through e_i only if it picks the right edge at each level
and there are i such edges.
As a result, the number n_i of packets that can go through e_i is distributed
as B(2^{i−1}, 1/2^i). The mean of this random variable is 1/2. Since the
expectation of a sum is the sum of expectations, the expected value of
Σ_{i=1}^d n_i is d/2. An upper bound on the variable D is the binomial
B(d, 1/2). Applying Chernoff
bounds (equation 1) we get:

Prob[D > eαd] ≤ ((d/2)/(eαd))^{eαd} e^{eαd − d/2} < (1/(2eα))^{eαd} e^{eαd} = (1/(2α))^{eαd} ≤ 2^{−eαd} ≤ p^{−α−1}.

We have made use of the facts that d = Θ(log p) and α ≥ 1. The
probability that at least one of the packets has a delay of more than 2eαd is
< p^{−α−1} · p = p^{−α}, since there are < p packets. We get the following theorem.

Theorem 4.5 The randomized algorithm for routing on B_d runs in time
Õ(d). □

The above algorithm is asymptotically optimal since the diameter of any
network is a lower bound on the worst-case time for permutation routing.

Queue length analysis

The queue length of the preceding algorithm is Õ(d). Let v_i be any processor
in level i (for 1 ≤ i ≤ d). The number of packets that can potentially go
through this processor is 2^i. Any such packet has a probability of 1/2^i of
going through v_i. Therefore, the expected number of packets that go through
v_i is 2^i · (1/2^i) = 1. Applying Chernoff bounds (equation 1), we can show
that the number of packets going through v_i is Õ(d).

Routing on other networks. The problem of packet routing has been
studied extensively on numerous other networks, since packet routing plays a
vital role in parallel computing. For example, routing algorithms on meshes
with fixed and reconfigurable buses are summarized in [58]. A discussion
of routing algorithms for optical versions of the mesh (such as the AROB
and the OTIS-Mesh) can be found in [56]. Packet routing algorithms for
variants of the hypercube can be found in [40].

5 Parallel Shortest Paths Computations and Breadth First Search on Undirected Graphs

Several fundamental problems in combinatorial optimization, especially those
related to network applications, require performing shortest paths computations
(or Breadth First Search, BFS) in graphs. This section is devoted
to an overview of some parallel solutions presented so far in the literature
which make use of randomization.

Given an undirected graph G(V, E) and a vertex r ∈ V, one version of
this problem is to find the shortest path (and thus the distance) between r
and every vertex of G. Let n = |V| and m = |E|. This variant is commonly
called the Single-Source Shortest Path problem (SSSP for short). The All-Pairs
version (denoted APSP) is to compute the shortest paths between
all node pairs of G.

The inability to perform shortest paths computations efficiently in parallel
is commonly called the Transitive Closure Bottleneck [32]. In particular,
satisfactory sequential algorithms exist for SSSP (see for example [77] or
[26]). But, on the other hand, the known parallel deterministic algorithms
are not in general efficient. The best known polylog-time deterministic
algorithm has a work complexity w(n) = O(n^3 polylog n) (see [32]), which is
significantly greater than that of the best sequential algorithms.

Although efficient parallel algorithms have been developed for some special
classes of graphs (see for example the near-optimal algorithm in [35] for
sparse graphs, namely when m = O(n)), the above facts led researchers to
consider less ambitious goals than solving shortest paths problems exactly
and/or within an efficient deterministic worst-case complexity. An interesting
approach to cope efficiently with these problems in parallel is then to devise
randomized algorithms having an efficient expected complexity. In particular,
we refer to the celebrated randomized parallel SSSP algorithm for
unweighted graphs due to Ullman and Yannakakis [73] and to its improvement
due to Klein and Sairam [36] for weighted graphs. These algorithms
run in O(√n polylog n) time and perform O(m√n polylog n) expected
work. More recently, Cohen [11] has given a relevant improvement of the
previous performances by introducing the first randomized parallel approximation
scheme for shortest paths on undirected weighted graphs running in
polylogarithmic time and performing O(mn^k + s(m + n^{1+k})) work, where k
is an arbitrary positive constant and s is the number of source nodes. These
algorithms overcome the Transitive Closure Bottleneck in various ways.

Next, we present details of the Ullman and Yannakakis algorithm, since
the successive algorithms can be seen as improvements and generalizations
of the techniques adopted in this algorithm.
5.1 Ullman and Yannakakis's algorithm for Breadth First Search

An easy parallel technique for solving the SSSP problem is to perform a
parallel breadth first search, in which the nodes are visited level by level
as the search progresses. Clearly, level 0 consists of the source; in order to
compute level h + 1 (for h ≥ 0), one processor P_i is assigned to each edge
i having an endpoint belonging to level h. If the other endpoint of i has
not yet been visited, P_i puts it in level h + 1. The procedure terminates
when all nodes connected to the source have been visited. Let m_h be the
number of edges with one endpoint in level h; then the time required for
computing level h + 1 is O(m_h/p + log* n) by using p processors (notice that
the term log* n is due to the time required for load-balancing in assigning the
processors, using the technique in [23]). Thus, if the corresponding BFS-tree
has h levels, the required time is O(m/p + h log* n).
The last bound tells us that the running time, using this approach, is
linear in the number of traversed levels. The key idea of the Ullman and
Yannakakis algorithm is in performing a limited search, in which the number
of traversed levels is limited. In other words, in a k-limited search, only
paths consisting of at most k nodes are considered. Thus, for example,
by performing a log n-limited search, we obtain log n-shortest paths which
are not in general the actual shortest paths; clearly, the latter are correctly
computed with an n-limited search. The basic version of the Ullman and
Yannakakis algorithm is based on the √n-limited search.
The algorithm has the following structure.

Algorithm A1

1. Input: an undirected graph G = (V, E); a source s ∈ V.

2. Choose uniformly at random a subset S of V and add s to it. The size
of S must be Θ(√n log n).

3. For every x ∈ S perform, in parallel, a √n-limited search generating
the shortest path P'_{x,v} from x to every node v ∈ V.

4. An auxiliary weighted graph H is computed on the vertex set S, where
the weight of an edge is defined as the length computed by the previous
√n-limited search.

5. Compute the All-Pairs shortest paths P_{x,y} in H (here the search is
not limited).

6. The shortest path P_v from s to a node v ∈ V is computed in the
following way:

P_v = P_{s,min} ∪ P'_{min,v}

where min is a vertex in H for which:

|P_{s,min}| + |P'_{min,v}| = min_{x ∈ S} { |P_{s,x}| + |P'_{x,v}| }.

Theorem 5.1 With high probability, Algorithm A1 computes correctly the
shortest paths from the source s to all the other nodes in V. The parallel
global time is O(√n) using m log n processors.

Proof. Given any v ∈ V, let P_v be an arbitrary shortest path from s to v. It
is not hard to prove that, with high probability, each subpath of P_v of size
√n contains at least one node x ∈ S. Hence, P_v can be seen as a sequence
of subpaths of size not larger than √n, whose extremal nodes belong to S
(except for the last node v). Such subpaths are computed in the √n-limited
search of Step 3. The shortest path from s to the last S-vertex x in P_v is
thus correctly computed in Step 5 and the shortest path from the latter to
node v is correctly computed in Step 3. The √n-limited search, in Step 3,
can be performed in O(√n log* n) time using m log n processors. Finally,
the total work of Step 5 is O((√n log n)^3 polylog n) and can be done in
O(√n) time using m log n PRAM processors. From here on, PRAM can be
taken to mean CRCW PRAM, unless otherwise mentioned. □
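
A sequential Python rendering of Algorithm A1 for unweighted graphs may clarify the structure. It is only a sketch of ours: limited_bfs stands in for the parallel √n-limited searches of Step 3, a Floyd-Warshall loop stands in for the parallel APSP of Step 5, and the returned distances are correct with high probability over the random choice of S.

import math, random
from collections import deque

def limited_bfs(adj, src, limit):
    # BFS from src along paths of at most `limit` edges; adj is a list of
    # adjacency lists over vertices 0, ..., n-1.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if dist[u] == limit:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def algorithm_A1(adj, s):
    n = len(adj)
    limit = max(1, math.isqrt(n))
    # Step 2: |S| = Theta(sqrt(n) log n), plus the source.
    size = min(n, limit * max(1, round(math.log(n + 1))))
    S = sorted(set(random.sample(range(n), size)) | {s})
    # Step 3: limit-bounded searches from every x in S.
    P = {x: limited_bfs(adj, x, limit) for x in S}
    # Steps 4-5: APSP on the auxiliary graph H over S (Floyd-Warshall).
    INF = float('inf')
    D = {x: {y: P[x].get(y, INF) for y in S} for x in S}
    for z in S:
        for x in S:
            for y in S:
                if D[x][z] + D[z][y] < D[x][y]:
                    D[x][y] = D[x][z] + D[z][y]
    # Step 6: combine the two kinds of subpaths.
    return [min(D[s][x] + P[x].get(v, INF) for x in S) for v in range(n)]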

5.2 Shortest paths computations in weighted graphs

In order to compute shortest paths from a single source in weighted graphs,
it is possible to adopt the same approach as the one in Section 5.1. That is,
we can perform a parallel breadth first search starting from the source. During
stage i, for each active edge, if the edge weight is 1 then its head is assigned
to level i + 1; otherwise its weight is decreased by one. Consequently, level
i consists of all nodes at distance i from the source. Moreover, by assuming
that the graph is undirected, the above procedure performs a preprocessing
step (during every stage) which contracts all edges having zero weight.
As in the case of unweighted graphs, the running time of the above
procedure is dominated by the traversed distance. However, the Ullman and
Yannakakis approach cannot be directly applied to weighted graphs. Indeed
there is no apparent way to perform the √n-limited search efficiently,
especially when the weights are large. On the other hand, it is easy to verify
that the remaining steps of Algorithm A1 work also for weighted graphs.
The crucial problem is thus to find a "weighted" version of the √n-limited
search. A useful method for solving optimization problems which involve
numerical inputs is to uniformly shrink all weights; but this, in itself, is
not sufficient since the search is strongly based on the fact that weights are
integers. Klein and Sairam [36] proposed a √n-limited search for weighted
graphs which uses the integer shrinking together with the well-established
technique, due to Raghavan and Thompson [54], for rounding weights without
changing their sums "too much". The key idea is that a nonintegral
value is rounded up or down according to a probability function which reflects
how close the value is to the next higher integer and the next lower one. By
applying this approach to the basic techniques of Ullman and Yannakakis,
Klein and Sairam provided a randomized parallel approximation scheme for
SSSP in weighted graphs which uses (m log n)ε^{−2} PRAM processors and
runs in O(√n log n log* n ε^{−2}) time, with an approximation error bounded by
ε, for any fixed ε > 0.

Recently, Cohen [11] introduced the first parallel randomized approximation
scheme for shortest paths computations in weighted undirected
graphs which runs in polylogarithmic time and performs a near-linear work
with respect to the number m of edges. The algorithm is rather complicated;
however, we just observe that one of the main ideas in this algorithm is the
construction of an auxiliary weighted subgraph (such subgraphs are commonly
called hop sets) which plays a role equivalent to that of the subgraph H in
Algorithm A1. Cohen showed that it is possible to construct the auxiliary
subgraph in polylogarithmic time and with almost-linear total work by a
randomized parallel procedure.


6 Parallel Randomized Techniques for Matching Problems and Their Applications

Matching problems in graphs can be considered one of the most relevant
examples of natural problems whose known efficient parallel solutions
require the use of randomness, thus witnessing the importance of probabilistic
methods in designing parallel algorithms. Indeed, to date, there is
no known NC-algorithm for either deciding or constructing a perfect matching
in a graph. Known sequential deterministic algorithms for matching
problems are based on the augmenting path technique; the best one was
proposed by Micali and Vazirani [45] and has O(√|V| |E|) computational
complexity. On the other hand, the parallel randomized algorithms are
based on an algebraic approach. A crucial ingredient for this approach is
certainly the famous Tutte's theorem. Tutte [72] discovered that a graph
has a perfect matching if and only if the determinant of a certain matrix
(called the Tutte matrix) is different from zero. Thus the problem of testing
the existence of a perfect matching in a graph is reduced to that of deciding
whether or not a certain integer matrix is invertible. The first parallel randomized
algorithm based on Tutte's theorem was proposed by Lovász [42],
which uses also the fact that polynomial identities can be tested randomly
in an efficient way.
The problem of efficiently searching in parallel for a perfect matching in a
graph was open for a long time. In 1985, Karp, Upfal and Wigderson [31]
provided the first efficient parallel algorithm based on Tutte's theorem. This
algorithm finds (if any) a perfect matching in O(log^3 n) parallel time and
uses a polynomial number of processors (that is, the problem turns out to be
in the class RNC^3). Subsequently, an RNC^2 algorithm has been proposed
by Mulmuley, Vazirani and Vazirani in [46]. Their technique is based on
the so-called probabilistic isolating lemma. This lemma is of independent
importance. Moreover, the authors showed how to apply the algorithms
also to some generalized versions of the perfect matching problem. We will
describe these important techniques and results in Section 6.1.

6.1 Deciding the Existence of a Perfect Matching

A matching M of a graph G(V, E) is a subset of E such that no pair of
M-edges incident to the same vertex exists. We say that a matching M of
G is maximum if there is no other matching M' of G such that |M'| > |M|.
A perfect matching of G is a matching M which covers each vertex of G. It
is easy to see that a perfect matching is always a maximum matching with
|V|/2 edges.
Let us first concentrate on the question of whether or not a given graph
has a perfect matching. We introduce here a result, due to Schwartz [67],
on random testing of polynomial identities. This result can certainly be
considered one of the major key ingredients in the design of randomized
algorithms of recent years: for instance, the rather surprising results
obtained in the theory of probabilistically checkable proofs (see for example [5])
strongly use Schwartz's lemma.
Let p(x_1, x_2, …, x_n) be a polynomial of degree d, where x_1, x_2, …, x_n
take values in a field F. We consider the problem of checking whether a
multivariate polynomial is identically zero. If the input polynomial is not given
in the standard simplified form but as an arbitrary arithmetic expression,
then no polynomial time algorithms are known for solving the above problem.
However, it is possible to derive a simple polynomial time randomized
algorithm which is based on the abundance of witnesses principle. Indeed,
Schwartz's lemma determines the probability that a polynomial which is not
identically zero takes the value zero at a random point.
Lemma 6.1 Let p(x_1, x_2, …, x_n) be a polynomial of degree d with n
indeterminates x_1, x_2, …, x_n, and let p be not identically zero.
Let S be any finite subset of the domain of p. Then the probability that
p(γ_1, γ_2, …, γ_n) is zero is no greater than d/|S|, where (γ_1, γ_2, …, γ_n) is a
random element of S^n.
In particular, given a polynomial p which is not identically zero on the
real domain, if we select an n-dimensional vector (γ_1, γ_2, …, γ_n) in such a way
that each entry of this vector is chosen uniformly and independently from the
range {−d, …, 0, …, d}, then the probability that the value p(γ_1, γ_2, …, γ_n)
is equal to zero is not greater than 1/2. Thus we have an efficient randomized
(Monte Carlo) algorithm for deciding whether a polynomial is identically
zero. We will refer to this problem as the zero-polynomial one. We just
sample the vector (γ_1, γ_2, …, γ_n) from the above range (notice that it is easy
to compute the degree of a polynomial even if it is given in a non-simplified
form) and then test whether p(γ_1, γ_2, …, γ_n) = 0. If yes, we then
state that p(x_1, x_2, …, x_n) is identically zero, otherwise p(x_1, x_2, …, x_n) is
not identically zero. By performing this test k independent times, on the
same polynomial, we can reduce the error probability to at most 1/2^k.
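
As a sketch (ours; the polynomial is given as a black-box callable and the function name is hypothetical), the randomized test reads as follows.

import random

def is_identically_zero(p, nvars, degree, trials=20):
    # Evaluate the black-box polynomial p of degree <= `degree` at random
    # points of {-degree, ..., degree}^nvars. A nonzero value is a witness
    # that p is not identically zero; if all trials return 0, the answer is
    # wrong with probability at most 1/2**trials.
    for _ in range(trials):
        point = [random.randint(-degree, degree) for _ in range(nvars)]
        if p(*point) != 0:
            return False
    return True

# (x + y)^2 - x^2 - 2xy - y^2 is identically zero.
print(is_identically_zero(lambda x, y: (x + y)**2 - x**2 - 2*x*y - y**2, 2, 2))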
The key idea in designing a randomized parallel algorithm to determine
whether or not a graph G has a perfect matching is to reduce this problem
to the zero-polynomial one. The reduction and the consequent randomized
algorithm, first proposed by Lovász [42], are based on the construction
of the Tutte matrix. Let A be the n × n adjacency matrix of graph
G. The Tutte matrix T is an n × n skew-symmetric matrix which is
obtained from the adjacency matrix A as follows. If a_{i,j} = 1 and i < j then
this entry is replaced by the indeterminate x_{i,j}. If a_{i,j} = 1 and i > j then this
entry is replaced by −x_{j,i}. All the other entries are replaced with zeros. In
other words, each (i, j)th non-zero entry above the diagonal is replaced by x_{i,j}
and each (i, j)th non-zero entry below the diagonal is replaced by −x_{j,i}. The
following important result was found by Tutte [72] in 1947.

Theorem 6.2 Let G = (V, E) be a graph and T be the Tutte matrix of G.
Then, G has a perfect matching if and only if det(T) ≠ 0.

We can now describe Lovász's algorithm for testing whether or not
a graph G has a perfect matching.

Algorithm A2

1. Input: The adjacency matrix A of a graph G.

2. Compute the Tutte matrix T of G.

3. Replace the indeterminates x_{i,j} in T by random integers from the range
{−n, …, 0, …, n}, chosen uniformly and independently.

4. Compute the determinant of the new matrix T.

5. If the determinant is not equal to zero then output YES, otherwise
output NO.

Since the determinant of T is a polynomial, by applying the test for the
zero-polynomial problem, we have that the probability of an incorrect answer
is less than 1/2. Moreover, we can make the probability of error arbitrarily
small by performing the computation several times. The complexity of
Algorithm A2 is dominated by the numerical evaluation of the determinant in
Step 4. There is a parallel algorithm running in O(log^2 n) time and using
O(n^{3.5}) PRAM processors for computing the determinant of an n × n
matrix [15]; thus, the overall computational complexity of Algorithm A2 has
the same asymptotic bound.
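
Here is a sequential Python sketch of Algorithm A2 (ours; the exact integer determinant is computed with Bareiss's fraction-free elimination, standing in for the parallel determinant algorithm of [15]). A YES answer is always correct; a NO answer is wrong with probability below 1/2 per round, hence the repetition.

import random

def int_det(a):
    # Bareiss fraction-free elimination: exact integer determinant.
    a = [row[:] for row in a]
    n, sign, prev = len(a), 1, 1
    for k in range(n - 1):
        if a[k][k] == 0:
            for r in range(k + 1, n):
                if a[r][k] != 0:
                    a[k], a[r] = a[r], a[k]
                    sign = -sign
                    break
            else:
                return 0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]

def has_perfect_matching(n, edges, rounds=20):
    # Substitute random integers from {-n, ..., n} into the Tutte matrix;
    # a nonzero determinant certifies a perfect matching.
    E = {(min(i, j), max(i, j)) for i, j in edges}
    for _ in range(rounds):
        T = [[0] * n for _ in range(n)]
        for i, j in E:
            x = random.randint(-n, n)
            T[i][j], T[j][i] = x, -x
        if int_det(T) != 0:
            return True
    return False

print(has_perfect_matching(4, [(0, 1), (1, 2), (2, 3)]))  # True: {(0,1),(2,3)}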
6.2 Constructing a Perfect Matching

Let us now turn our attention to the problem of searching for a perfect matching.
In the following, we will thus assume that the input graph always has a
perfect matching. All the following results were first introduced in [46].
From a parallel point of view, the most difficult step is to coordinate
the processors in such a way that they search for the same matching. The
significance of this coordination arises from the fact that a graph can contain
exponentially many maximum or perfect matchings. The following isolating
lemma is used for solving this problem.

Lemma 6.3 Let S = {x_1, x_2, …, x_n} be a finite set of objects and let
U = {S_1, S_2, …, S_k} be a collection of pairwise distinct subsets of S. Let
w_1, w_2, …, w_n be random weights, chosen uniformly and independently from
the range {−n, −n+1, …, 0, 1, …, n}, assigned, respectively, to the objects
x_1, x_2, …, x_n of S. The weight of a set S_j is Σ_{x_i ∈ S_j} w_i. Then the
probability that U contains a unique set with minimum (maximum) weight
is greater than 1/2.

Proof. Assume that (at least) two sets S_a and S_b have the minimum (maximum)
weight. Since S_a and S_b are not identical, they must differ in at least
one object. Let us denote this object as x_0. Let us split U into two disjoint
families F_1 and F_2 such that F_1 consists of all subsets S_i ∈ U which contain
x_0, and F_2 consists of all subsets S_j ∈ U which do not contain x_0. Now one
can see that the minimum (maximum) weight of a set of F_1 is equal to the
minimum (maximum) weight of a set of F_2. Assume that we fix the weights of
all objects x_i ∈ S, i = 1, 2, …, n, except for the object x_0. In order to equate
the minimum (maximum) weight of F_1 and the minimum (maximum)
weight of F_2 we have only one possible choice for the weight w_0 of object
x_0 out of 2n + 1 integers. Thus, the probability of this event is 1/(2n + 1).
Since |S| = n, we have n possibilities for choosing the object x_0. Finally,
the probability that U contains two or more sets with minimum (maximum)
weight is at most n/(2n + 1) < 1/2. □
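
The bound can be checked empirically; the following snippet (a toy experiment of ours, with U taken to be all 3-element subsets of S) estimates the isolation probability, which comes out well above the guaranteed 1/2.

import random
from itertools import combinations

def isolation_frequency(n, trials=2000):
    # S = {0, ..., n-1}, U = all 3-element subsets, weights drawn uniformly
    # and independently from {-n, ..., n}; count how often the minimum
    # weight set is unique.
    U = list(combinations(range(n), 3))
    unique = 0
    for _ in range(trials):
        w = [random.randint(-n, n) for _ in range(n)]
        weights = sorted(sum(w[i] for i in s) for s in U)
        if len(weights) == 1 or weights[0] < weights[1]:
            unique += 1
    return unique / trials

print(isolation_frequency(8))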
This result can be used for finding a perfect matching in a graph G =
(V, E). According to the notation of the isolating lemma, we consider the
edges of G as the objects of S and for the sets of U we choose those sets of
edges which form a perfect matching in G. Let us assign random integer weights to
the edges of G; such weights are chosen uniformly and independently from the
range {−m, …, 0, …, m}. By applying the isolating lemma, we can claim
that there is, with probability > 1/2, a unique perfect matching having
minimum weight.
Let us consider the integer matrix B obtained from the Tutte matrix T
by replacing every entry x_{i,j} by the integer 2^{w_{i,j}}, where w_{i,j} is the
weight assigned to the edge (i, j) ∈ E.

Lemma 6.4 Suppose that M is the unique minimum weight perfect matching
in G and let us denote its weight by W. Then det(B) ≠ 0 and the highest
power of 2 that divides det(B) is 2^{2W}.

Proof. Consider the vertex set V = {1, 2, …, n} and one of its permutations
σ. Define

val(σ) = Π_{i=1}^n b_{i,σ(i)}

where b_{i,j} = 2^{w_{i,j}} if the edge (i, j) ∈ E and zero otherwise. One can see
that σ corresponds to a perfect matching in G if and only if val(σ) ≠ 0. From
the definition, we have that

det(B) = Σ_σ sgn(σ) val(σ)

where sgn(σ) = 1 when σ is even; otherwise sgn(σ) = −1. Let σ_M correspond
to the unique minimum weight perfect matching M; thus val(σ_M) =
2^{2W}. The other permutations have value either zero, or, if they correspond
to some perfect matching in G, then they must have value equal to some
power of 2 greater than 2^{2W}, since M is unique. Finally, each term of the
sum defining det(B) is divisible by 2^{2W} and there is only one permutation
σ_M with val(σ_M) = 2^{2W}; thus 2^{2W} is the highest power of 2 that divides
det(B). □
The fact that 2^{2W} is the highest power of 2 that divides det(B) is then
used for determining the weight W of the unique minimum weight perfect
matching M in graph G. Indeed, the following lemma shows how to
distinguish the edges which belong to M from those not belonging
to M. We use the notation B_{i,j} for the (i, j)-minor of the matrix B, which
is obtained by removing the i-th row and the j-th column from B. The
adjoint adj(B) of an n × n matrix B is an n × n matrix whose (i, j)-th entry is
equal to (−1)^{i+j} det(B_{i,j}).
Lemma 6.5 Let M be the unique minimum weight perfect matching in
graph G(V, E) and let W be its weight. The edge (i, j) ∈ E belongs to
M if and only if the value

det(B_{i,j}) 2^{w_{i,j}} / 2^{2W}

is odd.

Proof. Let us consider an edge (i, j) ∈ E and the permutations with σ(i) = j.
If we compute the value

Σ_{σ: σ(i)=j} sgn(σ) val(σ)

then we actually obtain det(B_{i,j}) multiplied by 2^{w_{i,j}}. Suppose now that
the edge (i, j) belongs to the matching M and σ_M is the permutation
corresponding to M. Clearly, we have that σ_M(i) = j and val(σ_M) = 2^{2W}.
As mentioned, the remaining permutations have either value zero or value
2^γ, where γ > 2W. Since

det(B_{i,j}) 2^{w_{i,j}} = ±2^{2W} + Σ_{l=1}^k ±2^{γ_l}

for some k ∈ ℕ and γ_l > 2W, l = 1, 2, …, k, the above sum is a multiple
of 2^{2W} and the result of the division by 2^{2W} is odd. On the contrary, if
(i, j) ∉ M, then all the permutations with σ(i) = j have value either zero
or a power of 2 higher than 2^{2W}. The obtained sum is a multiple of 2^{2W}
but in this case the result of the division is even. □
On the basis of the previous lemmas, it is possible to derive the following
algorithm which constructs, with probability > 1/2, a perfect matching M
in a graph.

Algorithm A3

1. Input: Graph G = (V, E).

2. Assign random weights w_{i,j} from the range {−m, …, 0, …, m} to the
edges (i, j) ∈ E (weights are chosen uniformly and independently).

3. From the Tutte matrix of G compute the matrix B.

4. Evaluate det(B) and compute W by using Lemma 6.4.

5. Compute adj(B).

6. For all edges (i, j) ∈ E compute (det(B_{i,j}) 2^{w_{i,j}}) / 2^{2W}.

7. If the result is odd then include the edge (i, j) in M (property of
Lemma 6.5).

The running time of the above algorithm is dominated by the computation
of det(B) and adj(B). Since there is an O(log^2 n) time algorithm
which requires O(n^{3.5} m) processors for computing the determinant and the
adjoint of an n × n matrix (whose entries are m-bit integers) in order to find
B^{−1} [50, 15], the running time of Algorithm A3 is O(log^2 n) using O(n^{3.5} m)
PRAM processors.
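
A sequential Python sketch of Algorithm A3 follows (ours, with two deliberate deviations flagged in the comments: the weights are drawn from the shifted range {0, …, 2m}, which leaves the minimum weight matching unchanged while keeping the entries 2^{w_{i,j}} integral, and the minors are computed directly instead of through the parallel inversion of [50, 15]; int_det is the Bareiss routine from the sketch after Algorithm A2).

import random

def int_det(a):
    # Bareiss fraction-free elimination (exact integer determinant).
    a = [row[:] for row in a]
    n, sign, prev = len(a), 1, 1
    for k in range(n - 1):
        if a[k][k] == 0:
            for r in range(k + 1, n):
                if a[r][k] != 0:
                    a[k], a[r] = a[r], a[k]
                    sign = -sign
                    break
            else:
                return 0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]

def minor(B, i, j):
    return [[B[r][c] for c in range(len(B)) if c != j]
            for r in range(len(B)) if r != i]

def perfect_matching(n, edges, max_tries=50):
    m = len(edges)
    for _ in range(max_tries):
        # Step 2, shifted to {0, ..., 2m} so that 2**w is an integer.
        w = {e: random.randint(0, 2 * m) for e in edges}
        B = [[0] * n for _ in range(n)]
        for (i, j) in edges:
            B[i][j] = 2 ** w[(i, j)]
            B[j][i] = -(2 ** w[(i, j)])
        d = int_det(B)
        if d == 0:
            continue
        twoW = (d & -d).bit_length() - 1   # exponent 2W of Lemma 6.4
        M = []
        for (i, j) in edges:               # Steps 6-7 via Lemma 6.5
            val = int_det(minor(B, i, j)) * 2 ** w[(i, j)]
            if val % (1 << twoW) == 0 and (abs(val) >> twoW) % 2 == 1:
                M.append((i, j))
        # Isolation succeeds with probability > 1/2; verify and retry if not.
        if len(M) == n // 2 and len({v for e in M for v in e}) == n:
            return M
    return None

print(perfect_matching(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))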

6.3 Minimum Weight Perfect Matching and Maximum Matching

Algorithm A3 can be extended to a generalized version of the perfect matching
problem, that is, when a weighted graph G = (V, E) (with weights w(e),
e ∈ E) is given and a minimum weight perfect matching is sought.

Corollary 6.6 The minimum weight perfect matching problem in graphs,
with edge weights given in unary, is in RNC^2.

Proof. Let us scale up each edge weight of E by a factor of mn; the minimum
weight perfect matching will then be lighter than the rest by at least mn.
Moreover, if we add to each edge weight a random integer r chosen from
{−m, …, 0, …, m} uniformly and independently, we can then apply the
isolating lemma, thus obtaining, with high probability, one minimum weight
perfect matching in G. The running time of this algorithm is O(log^2 n) and
it requires O(n^{3.5} m w) PRAM processors, where w is the weight of the
heaviest edge in G. □
Observe that if the edge weights of G are given in binary, then it is still
unknown whether the minimum weight perfect matching problem belongs to
RNC. Another generalization of Algorithm A3 can be obtained by considering
graphs that, in general, could have no perfect matching; the goal is then
to construct a maximum matching.

Corollary 6.7 The maximum matching problem is in RNC^2.

Proof. We add new edges to G in such a way that the obtained graph is
a complete graph on n vertices, where n = |V|. We then assign weight
0 to all original edges of G and weight 1 to all new edges. It is easy to see
that in order to obtain a maximum matching in G we can apply Algorithm
A3 to find a minimum weight perfect matching M in the extended graph
and then simply remove from M the added edges of weight 1. □

A result equivalent to that in Corollary 6.7 holds for the vertex-weighted
matching problem, where the goal is to find a matching of a graph whose
vertices are weighted such that the sum of the weights of the vertices covered
by the matching is maximum.

6.4 Parallel Depth First Search

The problem of performing Depth First Search (DFS) in parallel has been
studied by several authors [62, 63, 4] in the past and it was suspected to
be inherently sequential. This conjecture was also based on the fact that
computing the lexicographically first DFS is P-complete [63]. Although for
restricted classes of graphs there are NC algorithms (in particular, for planar
graphs [69] and for directed acyclic graphs [21]), it remains an open question
whether the (general) DFS is in NC. In this section, we present an important
algorithm due to Aggarwal and Anderson [3]. They proved that a DFS for
general graphs can be performed by a fast and efficient randomized parallel
algorithm. In fact they showed that the DFS problem is in RNC. As we
will show in what follows, the randomness in the algorithm is due
to the fact that some matching computations are required.
Overall scheme of the Aggarwal and Anderson algorithm
Given a graph G(V, E), we sometimes use a subset of vertices V' to
denote the corresponding subgraph induced by V'. Let p = v_1, …, v_k be
a path. A lower segment of p is a subpath v_1, …, v_j and an upper segment
is a subpath v_j, …, v_k, where j < k. With the term V − p, we denote the
induced subgraph after all vertices in p have been removed.
The Aggarwal and Anderson algorithm is based on the divide and conquer
technique. At each stage a portion T' of the DFS tree is constructed
and the remaining graph V − T' consists of connected components having
size less than n/2, so that the DFS can be performed independently in each
of these components. This implies that the number of recursive stages is
bounded by O(log n). Let us describe the portion T'. The algorithm
generates a rooted subtree T', called an initial segment, which can be extended
to a DFS tree since it has the following property. Let C be a connected
component in V − T'; then there is a unique vertex x ∈ T' of greatest depth
that is adjacent to a vertex y in C. Thus, this edge can be correctly used
for connecting the DFS tree of component C to T'. The running time of
the algorithm is thus O(log n) multiplied by the time required to compute
an initial segment.
The construction of an initial segment requires two steps. In the first
step, a small (i.e., constant size) set Q of disjoint paths is determined
in such a way that the size of the connected components in V − Q is at
most n/2. A disjoint-path set Q with such a property is commonly called
a separator. The second step is devoted to the construction of an initial
segment from the separator Q previously computed. The second step is
performed in NC using some graph techniques and does not require the use
of randomness; thus the interested reader can find its detailed description in
[3]. Our interest in this algorithm is instead in the construction of the small
separator, since it is based on a reduction to the minimum weight perfect
matching problem. Thus, the use of randomness in the global algorithm for
DFS is required only in solving the matching problem.

Theorem 6.8 If the minimum weight perfect matching problem is in NC
then the DFS problem is also in NC.

The algorithm for constructing a separator consists of the recursive
application of a routine Reduce(Q) which reduces (still preserving the separator
property) the number of paths in Q by a constant factor of 1/12 until the
size of the resulting Q is less than 12. Thus, the depth of the recursion is
logarithmic in the initial size of Q. Initially, Q is equal to V, where the
V-elements are here considered as paths of length 0 (clearly V is a separator).
Let us describe the structure of the procedure Reduce(Q) and, in particular,
its connection with the matching problem. Given a separator Q, the routine
divides it into two sets of disjoint paths, L and S. A set of vertex disjoint
paths P = {p_1, …, p_k} is considered between the paths of L and those of
S. Each path of P has one of its endpoints on some path of L, the other
endpoint on some path of S, and its interior vertices in V − Q. Each
path of Q contains the endpoint of at most one path of P. Suppose that
p ∈ P joins l = l_1 x l_2 to s = s_1 y s_2, where the endpoints of p are x and y.
Here l_1, l_2, s_1, and s_2 are subpaths. Without loss of generality assume that
|s_1| ≤ |s_2|. Then, l is replaced by the path l_1 p s_1, s_2 replaces s, and l_2 is
discarded. Hence, the path s is reduced in length by half. This is done for
each path p ∈ P and the pair of paths joined by p. Note that the size of L
remains the same while the size of S can decrease: this happens, for
example, when there is a path p ∈ P that joins a path s ∈ S at one of its
endpoints. Thus, in order to guarantee a correct and efficient process, we
require the following conditions, which must hold for every application of the
routine Reduce(Q).
1. The joining operations in Q should not result in the components of
V − Q being merged (in this case the separator property could not be
preserved).
2. The number of paths joined in Q, using P, is at least |Q|/12.
It is not hard to verify that if these two conditions are satisfied, then the
time required for computing a separator, having size not greater than 11,
is bounded by O(log n). The idea, in achieving the above conditions, is in
the construction of a particular set of disjoint paths. Suppose the maximum
number of disjoint paths between L and S is α and consider a maximum
set of disjoint paths P_α. For each p ∈ P_α from l to s we assign to it the
value |l_2|. That is, the weight of p is equal to the length of the corresponding
discarded subpath. The set of disjoint paths considered by Reduce(Q) is
the one that minimizes the total cost.

Lemma 6.9 If P is a minimum cost maximum size set of disjoint paths
between L and S then the routine Reduce(Q) satisfies conditions (1) and
(2).
Let us now show how to efficiently reduce the problem of finding a min-cost
max-size set of disjoint paths to a matching problem and consequently
solve it using an RNC-algorithm.
The maximum set of disjoint paths problem can be stated as follows.
Given a graph G' = (V, E) and two disjoint sets of vertices X and Y, find a
maximum size set of vertex disjoint paths between X and Y. In its weighted
version, edges are assigned weights and a minimum total weight maximum
set of disjoint paths is sought. The matching problem we consider is finding
a minimum weight perfect matching in a weighted graph whose edges have
nonnegative integer weights bounded by n. The above matching problem
is the one that we have to solve for computing the separator in the routine
Reduce(Q). Indeed, given the partition (L, S) of Q, each path in L is
contracted to one vertex in X and each path in S is contracted to one vertex
in Y. The edges starting from nodes in X have weights which correspond
to those of the edges leaving paths of L (that is, the lengths of the discarded
subpaths).
Lemma 6.10 Given a weighted graph G(V, E) and two disjoint vertex subsets
X and Y, the problem of finding a maximum set of disjoint paths between
X and Y can be reduced to the problem of finding a minimum weight perfect
matching in a graph G' in which the edge weights have value zero or one
only.

Proof. By adding some dummy vertices, we can always consider the case
|X| = |Y| = α. The new graph G'(V', E') has vertices v^{in} and v^{out} for each
v ∈ V − (X ∪ Y), with an edge between them. For any vertex x ∈ X, there
is a vertex x ∈ V'; the same holds for the set Y. Moreover, we use X and
Y to denote these sets even in the set V'. The set V' − (X ∪ Y) is denoted
as W and for any edge (v, w) ∈ E, where v and w are in V − (X ∪ Y), we
define the edges (v^{in}, w^{out}) and (w^{in}, v^{out}) in E'. For each edge (x, v) ∈ E,
there is an edge (x, v^{in}) ∈ E' and, similarly, for each edge (v, y) ∈ E there is
an edge (v^{out}, y) ∈ E'. For each edge (x, y) ∈ E, there is an edge (x, y) ∈ E'.
All the edges defined so far have weight zero. To this basic construction
we add a complete bipartite graph, with edge weights equal to 1, between
X and Y. The fact that a multiedge could exist between the sets X and Y
has no relevance. Notice that, by construction, there is always a perfect
matching in the graph G'.
Let us now assume that k vertex disjoint paths between X and Y exist in
G; then we can easily match the vertices of G' appearing in these paths by
using the corresponding edges of type (x, v^{in}), (v^{in}, w^{out}) and (w^{in}, v^{out}).
The unmatched vertices in W can be matched by the edges (v^{in}, v^{out}). The
remaining unmatched vertices consist of α − k vertices in X and α − k
vertices in Y. These can be matched by the edges (x, y) of weight equal to
one and thus we have a perfect matching of weight α − k. Conversely, suppose
that a perfect matching M of weight α − k exists in G' and let W' be the set
of all edges of type (v^{in}, v^{out}). Consider the subgraph induced by the set
of edges M ⊕ W', where ⊕ denotes the symmetric difference. In this subgraph,
all the vertices in X and Y have degree one and the vertices in W have
degree zero or two. Hence, the subgraph consists of paths and cycles. The
interior vertices of such paths alternate between type v^{in} and v^{out}, so the
paths go from X to Y. There are α paths in the subgraph and, since the
matching has weight α − k, k of these paths must have weight zero. These
paths correspond directly to paths in G. Thus, by minimizing the weight of
a perfect matching in the graph G' we maximize the number of paths, since
a perfect matching of weight α − k in G' corresponds to a set of k vertex
disjoint paths in G. □
Our next goal will be to show that the minimum cost set of paths required
by the routine Reduce(Q) can be obtained by solving a matching problem
similar to that considered in the previous lemma.
We first construct a new weighted graph G'' from Q and the input graph
G. As previously defined, the cost function for a path which starts from the
vertex x of a path l ∈ L is equal to the distance between x and the end of
the path l. Contract each path s ∈ S to a single vertex y in G'' and each
path l ∈ L to a single vertex x in G''. If an edge e is incident to the new
vertex x, then we assign to it a weight j if e was incident to a vertex i of the
path l ∈ L corresponding to x and i is at distance j from the topmost vertex
of l. This construction could generate multiple edges starting from nodes
of type x; in this case only the one with the minimum weight is considered.
All the edges not incident to a vertex of type x will have weight zero.

Lemma 6.11 The problem of finding a minimum cost set of disjoint paths
of a given size in a graph with n nodes can be reduced to the problem of
finding a minimum weight perfect matching in a graph with at most 2n
vertices and edge weights bounded by n.

Proof. We apply the above construction and then we repeat an argument
similar to that used in Lemma 6.10. □

Theorem 6.12 Let P_M(n) and T_M(n) be, respectively, the number of
PRAM processors and the parallel time required to compute a minimum
weight perfect matching in a graph with n nodes and edge weights
bounded by n. Then, the problem of finding the minimum cost maximum
size set of disjoint paths can be solved in O(T_M(n)) parallel time using
P_M(n) PRAM processors.

Proof. We first apply the construction used for proving Lemma 6.10 to find
the maximum number of vertex disjoint paths and then apply the second
construction (the one used for proving Lemma 6.11) to determine the
minimum cost set of vertex disjoint paths of that size. All these constructions
are based on minimum weight perfect matching computations. □

The above results globally provide an RNC-algorithm for computing the
DFS in general graphs. Moreover, they also prove Theorem 6.8.
7 Minimum Cost Spanning Trees

Given a weighted graph, the problem is to find a spanning tree whose weight
is minimum. A sequential lower bound of Ω(|V| + |E|) is trivial. Very
nearly optimal deterministic algorithms have been devised for this problem.
In 1994 Klein and Tarjan [29] presented a Las Vegas algorithm with a run
time of O(|V| + |E|). The main steps in the algorithm are: (1) For some
appropriate m, select a random sample E' of m edges from G. (2) Denote
the induced subgraph as G'(V, E'). The subgraph G' may not be connected.
Recursively compute a minimum-cost spanning tree for every component of
G'. Let F be the resultant minimum-cost spanning forest of G'. (3) Making
use of F, eliminate some edges (called the F-heavy edges) of G that cannot
belong to a minimum-cost spanning tree. Let the graph that results from
G after elimination of the F-heavy edges be G''. (4) Recursively compute a
minimum-cost spanning tree for G''. The same will also be a minimum-cost
spanning tree for G.

Steps 1 to 3 effectively reduce the number of edges in G. The
algorithm can be speeded up further by also reducing the number of nodes.
Borůvka steps can be used to reduce the number of nodes. In a Borůvka
step, an incident edge of minimum weight is selected for each node.
Compute the connected components of the induced graph. Now replace each
component with a single node. Throw away the edges within the individual
components. In this graph keep only an edge of minimum weight between
any two nodes. Delete any isolated nodes.
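
A Borůvka step is easy to state in code; the following sketch (ours) assumes, as is customary, that all edge weights are distinct, so that the selected edges can create no cycle.

def boruvka_step(n, edges):
    # edges: list of (weight, u, v) with distinct weights. Returns the
    # selected spanning-tree edges, the number of components, and the
    # contracted multigraph keeping one lightest edge per component pair.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    # Each node picks a minimum weight incident edge.
    best = [None] * n
    for e in edges:
        w, u, v = e
        for x in (u, v):
            if best[x] is None or w < best[x][0]:
                best[x] = e
    chosen = {e for e in best if e is not None}
    for w, u, v in chosen:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    # Contract: relabel components, drop internal edges, keep one minimum
    # weight edge between any two components.
    roots = sorted({find(x) for x in range(n)})
    label = {r: i for i, r in enumerate(roots)}
    lightest = {}
    for w, u, v in edges:
        a, b = label[find(u)], label[find(v)]
        if a == b:
            continue
        key = (min(a, b), max(a, b))
        if key not in lightest or w < lightest[key][0]:
            lightest[key] = (w, a, b)
    return chosen, len(roots), list(lightest.values())

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
print(boruvka_step(4, edges))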

Each Borůvka step reduces the number of nodes by a factor of at least
two, since an edge is picked for every node. A minimum-cost spanning tree
for the reduced graph can be extended to get a minimum-cost spanning tree
for the original graph: if E' is the set of edges in the minimum-cost spanning
tree of the reduced graph, include in E' the edges chosen in the Borůvka
steps to obtain the minimum-cost spanning tree edges for the original graph.
Klein and Tarjan show that this algorithm runs in O(|V| + |E|) time. This
algorithm has been parallelized to run in O(log |V|) time, the total work
done being O(|V| + |E|) [13].
8 Luby's Method and the Maximal Independent Set Problem

In this section, we will describe an important randomized technique, due
to Luby, for efficiently solving optimization problems in parallel. Luby's
method is based on a general probabilistic result, commonly called the
pairwise independence lemma, which has several applications. We will present
this method by describing a simple and efficient algorithm for the MAX IND
SET problem [43].
Given a graph G(V, E), a Maximal Independent Set (MIS) is a maximal
subset of vertices such that no two of them are connected by an edge. The
MAX IND SET problem is to determine a MIS of a given graph. Karp and
Wigderson [30] provided a randomized parallel algorithm running in O(log^4 n)
expected time and using n^2 processors. The same authors showed a
deterministic NC^4 algorithm for this problem. More recently, Luby provided
a parallel randomized algorithm, running in O(log^2 n) expected time and
using m PRAM processors.
In what follows, we denote the degree of a node v ∈ V as d(v), the MIS
generated by the algorithm as IND, and the set of neighbors of a vertex v
(of a subset X) as N(v) (as N(X)).

Algorithm A4
1. Input: an undirected graph G(V, E).
2. IND := ∅;
3. G'(V', E') := G(V, E);
4. while V' ≠ ∅ do
5. begin
(a) Construct randomly in parallel the subset X as follows: for any v ∈ V'
add v to X with probability p_v = 1/(2d(v)) (if d(v) = 0 always add v to X);
(b) IND' := X;
(c) For every adjacent vertex pair v, w in X remove from IND' in parallel
the vertex which has minimum degree. If d(v) = d(w) choose an
arbitrary one of the two to be removed.
(d) Add IND' to IND;
(e) Y := IND' ∪ N(IND');
(f) Assign the subgraph induced by the new vertex subset V' := V' − Y to
G'(V', E');
6. end
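
A sequential simulation of Algorithm A4 (ours; the tie between equal-degree endpoints is broken by vertex identity rather than arbitrarily) may make the bookkeeping of steps (a)-(f) explicit.

import random

def luby_mis(adj):
    # adj maps each vertex to the set of its neighbours; returns a MIS.
    live = {v: set(nb) for v, nb in adj.items()}
    ind = set()
    while live:
        # Step (a): select v with probability 1/(2 d(v)); isolated vertices
        # are always selected.
        X = {v for v, nb in live.items()
             if not nb or random.random() < 1.0 / (2 * len(nb))}
        # Step (c): for every edge inside X drop the endpoint of smaller
        # degree (ties broken by vertex id).
        ind_p = set(X)
        for v in X:
            for w in live[v] & X:
                loser = v if (len(live[v]), v) < (len(live[w]), w) else w
                ind_p.discard(loser)
        ind |= ind_p
        # Steps (d)-(f): remove IND' and its neighbourhood from the graph.
        Y = set(ind_p)
        for v in ind_p:
            Y |= live[v]
        for v in Y:
            for w in live[v] - Y:
                live[w].discard(v)
        for v in Y:
            del live[v]
    return ind

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
print(luby_mis(adj))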

It is easy to verify that A4 generates a MIS for any input graph G. The
harder task is to prove that the expected number of iterations of the while
loop is 'small'. To do this, we will show the following property: assume that
m is the size of E' before one execution of the while loop; then the expected
number r of edges removed during that execution of the while loop is not
smaller than (1/8)m. Clearly, this property immediately implies that the
expected number of iterations of the while loop is bounded by O(log n); in
fact we can show that the number of iterations of the while loop is O(log n)
with high probability. Let m' be the number of remaining edges (i.e. m' =
m − r); then, by the above property, we have that E(m') ≤ (7/8)m, where
E(m') denotes the expected value of m'. Hence, by applying the Markov
inequality (see for example [2]), we have that:

Prob(m' ≥ (7.5/8)m) ≤ E(m') / ((7.5/8)m) ≤ 7/7.5.

Thus, let success be the event m' ≤ (7.5/8)m after one execution of the
while loop (so each round is a success with probability at least 1 − 7/7.5 =
1/15); then, after k log n independent rounds, using Chernoff bounds, we
have:

Prob(#successes ≥ (1/2)(1/15) k log n) ≥ 1 − exp(−(1/60) k log n).

The above inequality proves that, after O(log n) iterations, the algorithm
terminates with high probability.
In order to prove the inequality E(m') ≤ (7/8)m, Luby [43] introduced
the following result (also called the pairwise independence lemma).

Lemma 8.1 Let p_1, …, p_n be the probabilities of the pairwise independent
events E_1, …, E_n (that is, Prob(E_i ∩ E_j) = p_i p_j); then

Prob(∪_{i=1}^n E_i) ≥ (1/2) min(P, 1),

where P = Σ_{i=1}^n p_i.


Proof. Assume that p_1, …, p_n are in nondecreasing order. Define also
E^k = ∪_{i=1}^k E_i and Pr_k = Σ_{i=1}^k p_i. Hence we have, for any fixed k,
that Prob(∪_{i=1}^n E_i) ≥ Prob(E^k) and, by inclusion-exclusion (see for
example [2]),

Prob(E^k) ≥ Pr_k − Σ_{1≤i<j≤k} p_i p_j.

The subtracted term reaches its maximum value when p_i = Pr_k / k, thus
obtaining

Prob(E^k) ≥ Pr_k (1 − Pr_k (k − 1)/(2k)).

Two cases may arise: Pr_n ≤ 1 and Pr_n > 1. In the first case, we have
that Prob(E^n) ≥ Pr_n − Pr_n^2/2 ≥ Pr_n/2. In the second case, let i_min be
the minimum index such that Pr_{i_min} ≥ 1; if i_min = 1 the lemma is trivial,
thus suppose that i_min > 1. Then we have that Pr_{i_min}(i_min − 1)/i_min < 1,
since the sequence p_1, …, p_n is nondecreasing. Finally, we obtain that

Prob(E^{i_min}) ≥ Pr_{i_min} (1 − Pr_{i_min}(i_min − 1)/(2 i_min)) ≥ 1/2. □
As stated above, this lemma can be used to prove that E(m') ≤ (7/8)m.
Indeed, for any fixed vertex v ∈ V', consider the event

E_v : 'v is selected to be included in X'.

By definition, if v is not isolated, we have that p_v = Prob(E_v) = 1/(2d(v)).
Let us define Pr_v = Σ_{w ∈ N(v)} p_w; then the following lemma holds.

Lemma 8.2 For any v ∈ V', the probability that v belongs to the set N(IND'),
where IND' is the set constructed by A4, satisfies the following inequality:

Prob(v ∈ N(IND')) ≥ (1/4) min(Pr_v, 1).


Proof. By definition of N(IND'), we have that

Prob(v ∈ N(IND')) ≥ Prob(∪_{i=1}^{d(v)} E_i).

Let us denote the neighbors of v as 1, …, d(v) and, moreover, define, for
any i = 1, …, d(v), the following events:

E'_1 = E_1,   E'_k = ( ∩_{i=1}^{k−1} ¬E_i ) ∩ E_k,

A_i = ∩ { ¬E_j : j ∈ N(i) and d(j) ≥ d(i) }.

Then, by definition, we have:

Prob(v ∈ N(IND')) ≥ Σ_{i=1}^{d(v)} Prob(E'_i) Prob(A_i | E'_i),

and since

Prob(A_i | E'_i) ≥ Prob(A_i) ≥ 1 − Σ_{(z,i)∈E: d(z)≥d(i)} p_z ≥ 1/2

(the last step holds because Σ_{(z,i)∈E: d(z)≥d(i)} p_z ≤ Σ_{z∈N(i)} 1/(2d(i)) = 1/2),
it follows that

Prob(v ∈ N(IND')) ≥ (1/2) Σ_{i=1}^{d(v)} Prob(E'_i).

Furthermore, we have that

Σ_{i=1}^{d(v)} Prob(E'_i) = Prob(∪_{i=1}^{d(v)} E_i).

Finally, since the events E_v (v ∈ V') are pairwise independent, Lemma
8.1 applies, thus proving that

Prob(v ∈ N(IND')) ≥ (1/4) min(Pr_v, 1). □
Given any subset S ⊆ V, we denote the subset of edges incident to S as
HIT(S). The set of edges which are removed during one execution of the
while loop is HIT(IND' ∪ N(IND')). Thus, the expected size of this set
satisfies the following inequality:

E(r) = E(|HIT(IND' ∪ N(IND'))|) ≥ (1/2) Σ_{v∈V'} d(v) Prob(v ∈ N(IND')).   (4)

From Lemma 8.2, it follows that

Prob(v ∈ N(IND')) ≥ (1/4) min(Pr_v, 1).

By replacing this value in Eq. (4) we thus obtain the final fact:

E(m') ≤ (7/8) m.

We complete the complexity analysis of Algorithm A4 by observing that
one execution of the while loop is dominated by the construction of the sets
IND' and IND, which can be performed in parallel, on the CREW-PRAM
model, in O(log n) time using m processors.

9 Randomization and approximation

In this section, we focus on the MAX FLOW problem for networks. A network
can be formally defined as N = (G, s, t, c) where G = (V, E) is a directed
graph (|V| = n and |E| = m), s, t are two distinct vertices (i.e. the source
and the sink) of G and c : E → Z^+ is the capacity function. An assignment
fp : E → Z^+ of a non-negative number to each edge of G (the flow in the
edge) is called a flow pattern if the following conditions hold:

1. the flow in each edge of G does not exceed the capacity of the edge;
2. the sum of the flows of the incoming edges is equal to the sum of the flows
of the outgoing edges for every vertex of G, except for the nodes s and t.

With v(fp) we denote the total flow that fp generates into the sink
t. A natural question is determining the maximum flow that we can push
through the network into the sink vertex.
If we denote by F(N) the value of the maximum flow into t, then the MAX
FLOW problem can be stated as follows: let N = (G, s, t, c) be a network;
compute a flow pattern fp* such that v(fp*) = F(N).
The MAX FLOW problem and the Maximum Matching problem are related.
Indeed, we now show that there is an NC-reduction from MAX FLOW with
polynomial capacities to Maximum Matching in bipartite graphs. This
interesting connection has been observed by Karp, Upfal and Wigderson [31].

Theorem 9.1 The MAX FLOW problem in networks restricted to capacities
bounded by a polynomial in the number of vertices is NC-reducible to the
Maximum Matching problem for bipartite graphs. It thus follows that the
former belongs to RNC^2.

Proof. Let us first consider a network N = (G, s, t, c) where c(e) is one for
any edge e ∈ E. Then, we can construct a bipartite graph H = (V1, V2, A)
such that each vertex partition of H contains a copy of the edge set of G.
Thus, V1 = {(e, 1), e ∈ E} and V2 = {(e, 2), e ∈ E}. The edges of H are
defined as follows. There is an edge ((e, 1), (f, 2)) ∈ A between vertices (e, 1)
and (f, 2) of H if the head of edge e = (i, j) is also the tail of edge f = (j, k)
for some i, j, k. Moreover, if an edge e ∈ E is incident with neither s nor t,
then ((e, 1), (e, 2)) ∈ A.
Our next goal is proving that a maximum matching in H yields a max-
imum flow in N. Given a matching in the graph H, consider the following
rule. An edge e of the network N carries a flow of 1 if and only if (e, 1)
is matched with some vertex (f, 2), where e ≠ f, or (e, 2) is matched with
some vertex (f, 1), where e ≠ f. It is not hard to verify that this method
gives a maximum flow of N.
If we consider networks with capacities greater than one, the above de-
scribed reduction can be easily extended. Indeed, we can simply replace each
edge e = (i, j) of N with capacity c > 1 by c parallel edges from vertex i to
j, each of unit capacity. All capacities are now equal to 1 and, consequently,
the reduction to bipartite maximum matching applies as well.
Observe that the above reduction from the MAX FLOW problem to the
Maximum Matching problem can be performed in constant parallel time
using a polynomial number of PRAM processors. The theorem is completely
proved by observing that the Maximum Matching problem is in RNC² (see
Corollary 6.7). □
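
The construction in the proof is mechanical enough to state directly. The
following Python sketch builds the bipartite graph H = (V1, V2, A) from a
unit-capacity network, assuming the edges are given as ordered pairs (i, j);
the function name is ours.

    def flow_to_matching_instance(edges, s, t):
        # Vertices of H: two copies of the edge set of G.
        V1 = [(e, 1) for e in edges]
        V2 = [(e, 2) for e in edges]
        A = []
        for e in edges:                      # e = (i, j)
            for f in edges:                  # f = (j, k): head of e is tail of f
                if e != f and e[1] == f[0]:
                    A.append(((e, 1), (f, 2)))
            if s not in e and t not in e:    # e incident with neither s nor t
                A.append(((e, 1), (e, 2)))
        return V1, V2, A

A maximum matching of the resulting graph is then translated back into a
maximum flow by the rule stated in the proof.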

Approximating maximum flow in general networks


The above reduction is thus efficient only when the edge capacities are
bounded by some polynomial in the graph size. Thus, a natural further
study is to develop efficient parallel algorithms which apply also when the

capacities are not bounded. Serna and Spirakis [68] showed how to achieve
this goal by an approximation RNC-scheme. In what follows, we describe
this important result.
A first, rough approximation of the maximum flow F(N) is given by the
following facts.

Lemma 9.2 Let N = (G, s, t, c) be a network and let k be a positive integer.
We can decide in NC whether F(N) ≥ k or F(N) < km.

Proof. For any positive integer k, consider the network N' = (G', s, t, c') ob-
tained from the input network N by removing all edges whose capacity
is less than k and leaving the same capacity function c for the remaining
edges in E. Two cases may thus arise. If s is connected to t by a path P of
G' then the value F(N) is certainly not smaller than k, since each edge of P
has capacity at least k. If instead s is not connected to t in
G', it follows that no flow pattern in N can generate a value greater than
k|E| = km.
Thus, in order to check the inequality for the maximum flow F(N) ex-
pressed in the lemma, it is sufficient to construct the network N' from N
and then perform the connectivity test for the pair s, t. There are several
NC-algorithms for this test (see for example [27], Ch. 5).
□
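
A sequential Python sketch of this test follows; the parallel NC connectivity
algorithms cited above are replaced by a simple depth-first search, purely
for illustration.

    def flow_at_least(edges, cap, s, t, k):
        # Lemma 9.2 test: True certifies F(N) >= k, False certifies F(N) < k*m.
        kept = [(u, v) for (u, v) in edges if cap[(u, v)] >= k]  # the network N'
        adj = {}
        for (u, v) in kept:
            adj.setdefault(u, []).append(v)
        stack, seen = [s], {s}       # s-t connectivity test on N'
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False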

Lemma 9.3 Let N = (G, s, t, c) be a network and let k be a positive integer.
We can construct in NC a network M = (G, s, t, c_M) such that

    k·F(M) ≤ F(N) ≤ k·F(M) + km.
Proof. The network M is identical to N except for the capacity function:

    c_M(e) = ⌊c(e)/k⌋,  e ∈ E.

If C is an (s, t)-cut then we have that c(C) ≤ k·c_M(C) + k|C| and also
k·c_M(C) ≤ c(C). These inequalities imply that any min cut A of N and any
min cut B of M satisfy the following inequalities:

    k·c_M(B) ≤ c(A) ≤ k·c_M(B) + km,

thus proving that

    k·F(M) ≤ F(N) ≤ k·F(M) + km.



□
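
The capacity scaling of Lemma 9.3 amounts to a single rounding step; a
sketch, assuming the capacities are stored in a dictionary:

    def scale_capacities(cap, k):
        # c_M(e) = floor(c(e)/k) for every edge e, so that the resulting
        # network M satisfies k*F(M) <= F(N) <= k*F(M) + k*m.
        return {e: c // k for e, c in cap.items()}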
Since Lemma 9.2 can be applied even for numbers that are exponential
in the size of the instance N, it is not hard to prove the following fact.

Lemma 9.4 Let N = (G, s, t, c) be a network. We can compute in NC an
integer value k such that 2^k ≤ F(N) < m·2^(k+1).
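
One way to realize Lemma 9.4 is to run the Lemma 9.2 test on successive
powers of 2. A sequential sketch, reusing flow_at_least() from above and
assuming F(N) ≥ 1 (i.e., s reaches t at all):

    def bracket_max_flow(edges, cap, s, t):
        # Find k with 2^k <= F(N) < m * 2^(k+1): the test succeeds for 2^k
        # but fails for 2^(k+1). In NC, all the tests can be run in parallel.
        k = 0
        while flow_at_least(edges, cap, s, t, 2 ** (k + 1)):
            k += 1
        return k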

We can now state the existence of an NC-reduction from the MAX FLOW
version with polynomial maximum flow to that with polynomial capacities.

Lemma 9.5 Let N = (G, s, t, c) be a network; we can construct in NC a
new network N₁ = (G, s, t, c₁) such that

    log(Max(N₁)) ≤ log(F(N₁)) + O(log n)

and F(N) = F(N₁), where Max(N₁) = max_{e∈E} {c₁(e)}.

Proof. Let k be the value obtained in Lemma 9.4. In order to construct the
network N₁ from N, it is sufficient to define the edge capacities for G as
follows:

    c₁(e) = m·2^(k+1)  if c(e) ≥ m·2^(k+1),
    c₁(e) = c(e)       otherwise.

According to the well-known Max-flow Min-cut Theorem [18] and the fact
that the directed graph G is the same for both networks N and N₁, it
follows that N and N₁ have the same min cuts. Since there is no edge e
in any min cut of the network N such that c(e) > m·2^(k+1), it follows that
F(N) = F(N₁). Consequently, by Lemma 9.4 we have 2^k ≤ F(N₁) < m·2^(k+1)
and Max(N₁) ≤ m·2^(k+1). In other words, Max(N₁) ≤ 2m·F(N₁). □
From the above lemma and Theorem 9.1, we have the following fact.

Theorem 9.6 Solving the MAX FLOW problem in networks where F(N) ≤
p(n), for some polynomial p, is NC-reducible to the problem of constructing
a Maximum Matching in bipartite graphs, and thus the former belongs to
RNC².
Let us now give the randomized approximation scheme, due to Serna
and Spirakis [68], for the MAX FLOW problem in general networks.

Algorithm A5

1. Input: a network N = (G, s, t, c); a positive number ε.

2. Compute the integer k such that 2^k ≤ F(N) < m·2^(k+1).

3. Construct the network N₁ such that

       log(Max(N₁)) ≤ log(F(N₁)) + O(log n),

   as described in Lemma 9.5.

4. If 2^k ≤ (1+ε)m then Theorem 9.6 applies and we can determine F(N)
   by an RNC-algorithm.
   Else (i.e., if 2^k > (1+ε)m):

   a) h := ⌊2^k/((1+ε)m)⌋.

   b) Construct the network M from N₁ according to Lemma 9.3, using
      h as the parameter.

   c) Solve the MAX FLOW problem in the network M by using Theorem
      9.6.

   d) Let fp_M and F(M) be the flow pattern and the maximum flow,
      respectively, computed in the previous step. As the approximate
      solution for N, set fp^apx(e) = h·fp_M(e) (e ∈ E) and output
      F^apx(N) = h·F(M).
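
The following Python sketch composes the preceding lemmas into Algorithm
A5. Here rnc_solver is a hypothetical stand-in for the matching-based
procedure of Theorem 9.6 (it must return a pair (flow pattern, value) for
networks whose maximum flow is polynomial), and bracket_max_flow() is
the Lemma 9.4 sketch given earlier.

    def approx_max_flow(edges, cap, s, t, eps, rnc_solver):
        m = len(edges)
        k = bracket_max_flow(edges, cap, s, t)             # Step 2 (Lemma 9.4)
        bound = m * 2 ** (k + 1)
        cap1 = {e: min(cap[e], bound) for e in edges}      # Step 3 (Lemma 9.5)
        if 2 ** k <= (1 + eps) * m:                        # Step 4: F(N) is small
            return rnc_solver(edges, cap1, s, t)
        h = int(2 ** k / ((1 + eps) * m))                  # Step 4a
        capM = {e: cap1[e] // h for e in edges}            # Step 4b (Lemma 9.3)
        fpM, FM = rnc_solver(edges, capM, s, t)            # Step 4c
        fp_apx = {e: h * fpM[e] for e in edges}            # Step 4d
        return fp_apx, h * FM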

The computation of the integer k and the construction of the networks N₁
and M can be performed in NC as stated, respectively, in Lemmas 9.4, 9.5
and 9.3. Finally, concerning Step c, we observe that Lemma 9.3 implies that

    F(M) ≤ F(N)/h ≤ 2^(k+1)·m/h.

It thus follows that, for constant ε, F(M) is polynomially bounded in m
and Theorem 9.6 applies correctly. Moreover, the approximation ratio is,
with high probability, either equal to 1 (this happens when the condition in
Step 4 is satisfied) or equal to r(ε) = F(N)/F^apx(N). From Lemma 9.3, we
have that F(N) ≤ h·F(M) + hm, thus obtaining:

    r(ε) ≤ 1 + hm/(h·F(M)).

Applying Lemma 9.3 again, we have that h·F(M) ≥ F(N) − hm. From Step
2 we have that F(N) ≥ 2^k; this implies the inequality:

    r(ε) ≤ 1 + hm/(2^k − hm).

Since hm ≤ 2^k/(1+ε), we have that

    2^k − hm ≥ ε·2^k/(1+ε).

The above inequality is equivalent to the following:

    hm/(2^k − hm) ≤ 1/ε,

and consequently we achieve the bound r(ε) ≤ 1 + 1/ε.

The above facts show that Algorithm A5 is an RNC-approximation
scheme for computing the MAX FLOW problem on general networks.

10 Conclusions
In this paper we have presented some basic randomization techniques that
have been used to develop efficient parallel algorithms. For several computational
problems, the best known parallel algorithms employ randomization.
Practical implementations often suggest that randomized algorithms per-
form better than their deterministic counterparts. Thus one would expect
that randomized algorithms are here to stay.

References
[1] M. Ajtai, J. Komlós and E. Szemerédi, An O(n log n) sorting network,
Proc. 15th Annual ACM STOC, 1983, pp. 1-9.

[2] N. Alon and J. H. Spencer, The Probabilistic Method, (Wiley-
Interscience, 1992).

[3] A. Aggarwal, R. J. Anderson, A random NC algorithm for depth first
search, Proc. 19th Annual ACM STOC, 1987, pp. 325-334.

[4] R. J. Anderson, A parallel algorithm for the maximal path problem,
Combinatorica 7(3), 1987, pp. 400-415.

[5] L. Babai, F. L. Levin, and M. Szegedy, Checking computation in poly-
logarithmic time, Proc. 23rd Annual ACM STOC, 1991, pp. 21-28.

[6] K. E. Batcher, Sorting networks and their applications, Proc. Spring
Joint Computer Conference 32, (AFIPS Press, 1968), pp. 307-314.

[7] P. Beame and J. Håstad, Optimal bounds for decision problems on the
CRCW PRAM, Proc. 19th Annual ACM STOC, 1987, pp. 83-93.

[8] G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J.
Smith, and M. Zagha, A comparison of sorting algorithms for the con-
nection machine CM-2, Proc. 3rd Annual ACM Symposium on Parallel
Algorithms and Architectures, 1991.

[9] B. Bollobás, Random Graphs, (Academic Press, 1985).

[10] A. Clementi, L. Kučera, and J. Rolim, A note on parallel randomized
algorithms for searching problems, to appear in DIMACS Series in
Discrete Mathematics and Theoretical Computer Science, (American
Mathematical Society, 1994).

[11] E. Cohen, Polylog-time and near-linear work approximation scheme for
undirected shortest paths, Proc. 26th Annual ACM STOC, 1994, pp.
16-26.

[12] R. Cole, Parallel merge sort, SIAM J. Comp. 17(4), 1988, pp. 770-785.

[13] R. Cole, P. Klein, and R. Tarjan, Finding minimum spanning forests in
logarithmic time and linear work using random sampling, Proc. Eighth
Annual Symposium on Parallel Algorithms and Architectures, 1996, pp.
243-250.

[14] D. Coppersmith, P. Raghavan, and M. Tompa, Parallel graph algo-
rithms that are efficient on average, Proc. 28th Annual IEEE FOCS,
1987, pp. 260-269.

[15] L. Csanky, Fast parallel matrix inversion algorithms, SIAM J. Comp.
5, 1976, pp. 618-623.

[16] R. E. Cypher and C. G. Plaxton, Deterministic sorting in nearly log-
arithmic time on the hypercube and related computers, Proc. 22nd
Annual ACM STOC, 1990, pp. 193-203.

[17] P. Erdős and A. Rényi, On random graphs I, Publ. Math. Debrecen 6,
1959, pp. 290-297.

[18] L. R. Ford and D. R. Fulkerson, Flows in Networks, (Princeton Univer-
sity Press, 1962).

[19] W. D. Frazer and A. C. McKellar, Samplesort: a sampling approach
to minimal storage tree sorting, Journal of the ACM 17(3), 1970, pp.
496-507.

[20] Z. Galil and V. Pan, Improved processor bounds for algebraic and
combinatorial problems in RNC, Proc. 26th Annual IEEE FOCS, 1985,
pp. 490-495.

[21] R. K. Ghosh and G. P. Bhattacharjee, A parallel search algorithm for
directed acyclic graphs, BIT 24, 1984, pp. 134-150.

[22] R. Greenlaw, Polynomial completeness and parallel computation, in
J. H. Reif (ed.) Synthesis of Parallel Algorithms, (Morgan-Kaufmann
Publishers, 1993).

[23] J. Gil, Y. Matias, and U. Vishkin, Towards a theory of nearly constant
time parallel algorithms, Proc. 32nd Annual IEEE FOCS, 1991, pp.
698-710.

[24] W. L. Hightower, J. F. Prins, and J. H. Reif, Implementation of ran-
domized sorting on large parallel machines, Proc. 4th Annual ACM
Symposium on Parallel Algorithms and Architectures, 1992, pp. 158-
167.

[25] C. A. R. Hoare, Quicksort, Computer Journal 5, 1962, pp. 10-15.

[26] E. Horowitz, S. Sahni, and S. Rajasekaran, Computer Algorithms, (W.
H. Freeman Press, 1998).

[27] J. JáJá, An Introduction to Parallel Algorithms, (Addison-Wesley Pub-
lishers, 1992).

[28] C. Kaklamanis and D. Krizanc, Optimal sorting on mesh-connected
processor arrays, Proc. 4th Annual ACM Symposium on Parallel Algo-
rithms and Architectures, 1992, pp. 50-59.

[29] D. R. Karger, P. N. Klein, and R. E. Tarjan, A randomized linear-time
algorithm to find minimum spanning trees, Journal of the ACM 42(2),
1995, pp. 321-328.

[30] R. M. Karp and A. Wigderson, A fast parallel algorithm for the maximal
independent set problem, Journal of the ACM 32, 1985, pp. 762-773.

[31] R. M. Karp, E. Upfal, and A. Wigderson, Constructing a maximum
matching is in random NC, Combinatorica 6(1), 1986, pp. 35-48. A
preliminary version also appeared in Proc. 17th Annual ACM STOC,
1985.

[32] R. M. Karp and V. Ramachandran, Parallel algorithms for shared-
memory machines, in J. van Leeuwen (ed.) Handbook of Theoretical
Computer Science, (Elsevier Science, 1990), Vol. A, Chapter 17.

[33] R. Karp, An introduction to randomized algorithms, Discr. Appl. Math.
34, 1991, pp. 165-201.

[34] M. Kaufmann, S. Torsten, and J. Sibeyn, Derandomizing algorithms
for routing and sorting on meshes, Proc. 5th Annual ACM-SIAM Sym-
posium on Discrete Algorithms, 1994, pp. 669-679.

[35] D. Kavvadias, G. E. Pantziou, P. G. Spirakis, and C. D. Zaroliagis,
Hammock-on-ears decomposition: a technique for the efficient parallel
solution of shortest paths and other problems, Proc. 19th MFCS, LNCS
841, 1994, pp. 462-472.

[36] P. N. Klein and S. Sairam, A parallel randomized approximation scheme
for shortest paths, Proc. 24th Annual ACM STOC, 1992, pp. 750-758.

[37] P. N. Klein and S. Sairam, A linear-processor polylog-time algorithm
for shortest paths in planar graphs, Proc. 34th Annual IEEE FOCS,
1994, pp. 259-270.

[38] L. Kučera, Expected behavior of graph coloring algorithms, Proc. Fun-
damentals in Computation Theory, LNCS 56, 1984, pp. 447-451.

[39] M. Kunde, Block gossiping on grids and tori: sorting and routing match
the bisection bound deterministically, Proc. European Symposium on
Algorithms, 1993, pp. 272-283.

[40] T. Leighton, Introduction to Parallel Algorithms and Architectures:
Arrays-Trees-Hypercubes, (Morgan-Kaufmann Publishers, 1992).

[41] T. Leighton, Tight bounds on the complexity of parallel sorting, IEEE
Transactions on Computers C34(4), 1985, pp. 344-354.

[42] L. Lovász, On determinants, matchings and random algorithms, in L.
Budach (ed.) Fundamentals of Computing Theory, (Berlin, Akademie-
Verlag, 1979).

[43] M. Luby, A simple parallel algorithm for the maximal independent set
problem, SIAM J. Comp. 15, 1986, pp. 1036-1053. (Also in Proc. 17th
Annual ACM STOC.)

[44] Y. Ma, S. Sen, and D. Scherson, The distance bound for sorting on mesh-
connected processor arrays is tight, Proc. 26th Annual IEEE FOCS,
1986, pp. 255-263.

[45] S. Micali and V. V. Vazirani, An O(√|V|·|E|) algorithm for finding
maximum matching in general graphs, Proc. 21st Annual IEEE FOCS,
1980, pp. 17-27.

[46] K. Mulmuley, U. V. Vazirani, and V. V. Vazirani, Matching is as easy
as matrix inversion, Combinatorica 7, 1987, pp. 105-113. (Also in Proc.
19th Annual ACM STOC, 1987, pp. 345-354.)

[47] D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms
and a new generalized connection network, Journal of the ACM 29(3),
1982, pp. 642-667.

[48] M. Nigam and S. Sahni, Sorting n² numbers on n × n meshes, Proc.
International Parallel Processing Symposium, 1993, pp. 73-78.

[49] S. Nikoletseas, K. Palem, P. Spirakis, and M. Yung, Short vertex dis-
joint paths and multiconnectivity in random graphs: reliable networks
for computing, Proc. 21st ICALP, LNCS, 1994, pp. 508-519.

[50] V. Pan, Fast and efficient algorithms for the exact inversion of integer
matrices, Proc. Fifth Annual Symposium on Foundations of Software
Technology and Theoretical Computer Science, 1985, pp. 504-521.

[51] G. Pantziou, P. Spirakis, and C. Zaroliagis, Coloring random graphs
efficiently in parallel through adaptive techniques, CTI TR-90.10.25,

Comp. Techn. Institute, Patras. Also presented in the ALCOM Work-
shop on Graphs Algorithms, Data Structures and Computational Ge-
ometry, Berlin, October 1990.

[52] P. M. Pardalos and S. Rajasekaran, editors, Advances in Randomized
Parallel Computing, (Kluwer Academic Press, 1998).

[53] F. Preparata, New parallel sorting schemes, IEEE Transactions on
Computers C27(7), 1978, pp. 669-673.

[54] P. Raghavan and C. D. Thompson, Provably good routing in graphs:
regular arrays, Proc. 17th Annual ACM STOC, 1985, pp. 79-87.

[55] S. Rajasekaran, A simple parallel sorting algorithm, Technical Report
26, Dept. of CISE, University of Florida, 1997.

[56] S. Rajasekaran, Basic algorithms on parallel optical models of comput-
ing, in P. M. Pardalos (ed.) Parallel Processing of Discrete Problems,
(Springer-Verlag, 1998).

[57] S. Rajasekaran, k-k routing, k-k sorting, and cut through routing
on the mesh, Journal of Algorithms 19, 1995, pp. 361-382.

[58] S. Rajasekaran, Sorting and selection on interconnection networks, DI-
MACS Series in Discrete Mathematics and Theoretical Computer Sci-
ence 21, 1995, pp. 275-296.

[59] S. Rajasekaran and J. H. Reif, Optimal and sub-logarithmic time ran-
domized parallel sorting algorithms, SIAM J. Comp. 18(3), 1989, pp.
594-607.

[60] S. Rajasekaran and Th. Tsantilas, Optimal routing algorithms for mesh
connected processor arrays, Algorithmica 8, 1992, pp. 21-38.

[61] A. G. Ranade, How to emulate shared memory, Proc. 28th Annual IEEE
FOCS, 1987, pp. 185-192.

[62] E. Reghbati and D. Corniel, Parallel computations in graph theory,
SIAM J. Comp. 7, 1978, pp. 230-237.

[63] J. H. Reif, Depth first search is inherently sequential, Information Pro-
cessing Letters 20, 1985, pp. 229-234.

[64] J. H. Reif and L. G. Valiant, A logarithmic time sort for linear size
networks, Journal of the ACM 34(1), 1987, pp. 60-76.

[65] R. Reischuk, Probabilistic parallel algorithms for sorting and selection,
SIAM J. Comp. 14(2), 1985, pp. 396-409.

[66] C. Schnorr and A. Shamir, An optimal sorting algorithm for mesh-
connected computers, Proc. 18th Annual ACM STOC, 1986, pp. 255-
263.

[67] J. T. Schwartz, Fast probabilistic algorithms for verification of polyno-
mial identities, Journal of the ACM 27(4), 1980, pp. 701-717.

[68] M. Serna and P. G. Spirakis, Tight RNC approximations to max flow,
Proc. 8th Annual STACS, LNCS 480, 1991, pp. 118-126.

[69] J. R. Smith, Parallel algorithms for depth first searches I: planar graphs,
SIAM J. Comp. 15(3), 1986, pp. 814-830.

[70] T. M. Stricker, Supporting the hypercube programming model on mesh
architectures (A fast sorter for iWarp tori), Proc. 4th Annual ACM
Symposium on Parallel Algorithms and Architectures, 1992, pp. 148-
157.

[71] C. D. Thompson and H. T. Kung, Sorting on a mesh connected parallel
computer, Communications of the ACM 20(4), 1977, pp. 263-271.

[72] W. T. Tutte, The factorization of linear graphs, J. London Math. Soc.
22, 1947, pp. 107-111.

[73] J. Ullmann and M. Yannakakis, High probability parallel transitive
closure algorithms, SIAM J. Comp. 20, 1991, pp. 100-125.

[74] E. Urland, Experimental tests of efficient shortest paths heuristics for
random graphs on the CM-2, Technical Report 71, University of Geneva,
August 1994.

[75] L. G. Valiant, A scheme for fast parallel communication, SIAM J.
Comp. 11, 1982, pp. 350-361.

[76] L. G. Valiant and G. J. Brebner, Universal schemes for parallel com-
munication, Proc. 13th Annual ACM STOC, 1981, pp. 263-277.

[77] J. van Leeuwen, Graph Algorithms, in J. van Leeuwen (ed.) Handbook
of Theoretical Computer Science, (Elsevier Science, 1990), Vol. A, Chap-
ter 10.

HANDBOOK OF COMBINATORIAL OPTIMIZATION (VOL. 3)


D.-Z. Du and P.M. Pardalos (Eds.) pp. 621-757
©1998 Kluwer Academic Publishers

Tabu Search¹
Fred Glover
Manuel Laguna
University of Colorado, Boulder, CO 80309-0419, U.S.A.

¹ The material of this chapter is principally adapted from the book Tabu Search, by
Fred Glover and Manuel Laguna, Kluwer Academic Publishers, 1997.

Contents

1 Tabu Search Background and Relevance 623
  1.1 General Tenets 624
  1.2 Use of Memory 626
  1.3 Intensification and diversification 628

2 Tabu Search Foundations and Short Term Memory 629
  2.1 Memory and Tabu Classifications 631
  2.2 Recency-Based Memory 633
  2.3 A First Level Tabu Search Approach 639
    2.3.1 Critical Event Memory 643
  2.4 Recency-Based Memory for Add/Drop Moves 645
    2.4.1 Some Useful Notation 647
    2.4.2 Streamlining 650
  2.5 Tabu Tenure 651
    2.5.1 Random Dynamic Tenure 653
    2.5.2 Systematic Dynamic Tenure 654
  2.6 Aspiration Criteria and Regional Dependencies 655
  2.7 Concluding Observations for the Min k-Tree Example 657

3 Additional Aspects of Short Term Memory 660
  3.1 Tabu Search and Candidate List Strategies 660
  3.2 Some General Classes of Candidate List Strategies 661
    3.2.1 Aspiration Plus 661
    3.2.2 Elite Candidate List 664
    3.2.3 Successive Filter Strategy 666

    3.2.4 Sequential Fan Candidate List 667
    3.2.5 Bounded Change Candidate List 669
  3.3 Connections Between Candidate Lists, Tabu Status and Aspiration
      Criteria 669
  3.4 Logical Restructuring 670
    3.4.1 Restructuring by Changing Evaluations and Neighborhoods 673
    3.4.2 Threshold Based Restructuring and Induced Decomposition 675

4 Longer Term Memory 677
  4.1 Frequency-Based Approach 678
  4.2 Intensification Strategies 681
  4.3 Diversification Strategies 683
    4.3.1 Modifying Choice Rules 684
    4.3.2 Restarting 685
  4.4 Strategic Oscillation 687
  4.5 Path Relinking 691
    4.5.1 Roles in Intensification and Diversification 695
    4.5.2 Incorporating Alternative Neighborhoods 695
  4.6 The Intensification / Diversification Distinction 697
  4.7 Some Basic Memory Structures for Longer Term Strategies 699
    4.7.1 Conventions 699
    4.7.2 Frequency-Based Memory 700
    4.7.3 Critical Event Memory 702

5 Connections, Hybrid Approaches and Learning 703
  5.1 Simulated Annealing 705
  5.2 Genetic Algorithms 706
    5.2.1 Models of Nature-Beyond "Genetic Metaphors" 708
  5.3 Scatter Search 710
    5.3.1 Modern Forms and Applications of Scatter Search 715
    5.3.2 Scatter Search and Path Relinking Interconnections 716
  5.4 Greedy Randomized Adaptive Search Procedures (GRASP) 718
  5.5 Neural Networks 721
  5.6 Target Analysis 722
    5.6.1 Target Analysis Features 724
    5.6.2 Illustrative Application and Implications 728
    5.6.3 Conditional Dependencies Among Attributes 732
    5.6.4 Differentiating Among Targets 733
    5.6.5 Generating Rules by Optimization Models 733

6 Neglected Tabu Search Strategies 735
  6.1 Candidate List Strategies 735
  6.2 Intensification Approaches 736
    6.2.1 Restarting with Elite Solutions 736
    6.2.2 Frequency of Elite Solutions 737

    6.2.3 Memory and Intensification 738
    6.2.4 Relevance of Clustering for Intensification 739
  6.3 Diversification Approaches 739
    6.3.1 Diversification and Intensification Links 740
    6.3.2 Implicit Conflict and the Importance of Interactions 741
    6.3.3 Reactive Tabu Search 741
  6.4 Strategic Oscillation 743
  6.5 Clustering and Conditional Analysis 744
    6.5.1 Conditional Relationships 745
  6.6 Referent-Domain Optimization 747
  6.7 Final Considerations 749

References

1 Tabu Search Background and Relevance


Faced with the challenge of solving hard optimization problems that abound
in the real world, classical methods often encounter great difficulty. Vitally
important applications in business, engineering, economics and science can-
not be tackled with any reasonable hope of success, within practical time
horizons, by solution methods that have been the predominant focus of aca-
demic research throughout the past three decades (and which are still the
focus of many textbooks).
The meta-heuristic approach called tabu search (TS) is dramatically
changing our ability to solve problems of practical significance. Current
applications of TS span the realms of resource planning, telecommunica-
tions, VLSI design, financial analysis, scheduling, space planning, energy
distribution, molecular engineering, logistics, pattern classification, flexible
manufacturing, waste management, mineral exploration, biomedical anal-
ysis, environmental conservation and scores of others. In recent years,
journals in a wide variety of fields have published tutorial articles and
computational studies documenting successes by tabu search in extending
the frontier of problems that can be handled effectively-yielding solutions
whose quality often significantly surpasses that obtained by methods previ-
ously applied. Table 1 gives a partial catalog of example applications. A
more comprehensive list, including summary descriptions of gains achieved
from practical implementations, can be found in Glover and Laguna, 1997.
Reports of recent TS implementations can also be found on the web site
http://www.colorado.edu/Business/TabuSearch.

A distinguishing feature of tabu search is embodied in its exploitation of


adaptive forms of memory, which equips it to penetrate complexities that
often confound alternative approaches. Yet we are only beginning to tap
the rich potential of adaptive memory strategies, and the discoveries that
lie ahead promise to be as important and exciting as those made to date.
The knowledge and principles that have emerged from the TS framework
give a foundation to create practical systems whose capabilities markedly
exceed those available earlier. At the same time, there are many untried
variations that may lead to further advances. A conspicuous feature of tabu
search is that it is dynamically growing and evolving, drawing on important
contributions by many researchers.

1.1 General Tenets


The word tabu (or taboo) comes from Tongan, a language of Polynesia, where
it was used by the aborigines of Tonga island to indicate things that cannot
be touched because they are sacred. According to Webster's Dictionary,
the word now also means "a prohibition imposed by social custom as a
protective measure" or of something "banned as constituting a risk." These
current more pragmatic senses of the word accord well with the theme of
tabu search. The risk to be avoided in this case is that of following a counter-
productive course, including one which may lead to entrapment without
hope of escape. On the other hand, as in the broader social context where
" protective prohibitions" are capable of being superseded when the occasion
demands, the "tabus" of tabu search are to be overruled when evidence of
a preferred alternative becomes compelling.
The most important association with traditional usage, however, stems
from the fact that tabus as normally conceived are transmitted by means of
a social memory which is subject to modification over time. This creates the
fundamental link to the meaning of "tabu" in tabu search. The forbidden
elements of tabu search receive their status by reliance on an evolving mem-
ory, which allows this status to shift according to time and circumstance.
More particularly, tabu search is based on the premise that problem
solving, in order to qualify as intelligent, must incorporate adaptive mem-
ory and responsive exploration. The adaptive memory feature of TS allows
the implementation of procedures that are capable of searching the solu-
tion space economically and effectively. Since local choices are guided by
information collected during the search, TS contrasts with memoryless de-
signs that heavily rely on semirandom processes that implement a form
of sampling. Examples of memoryless methods include semigreedy heuris-

Scheduling: Flow-Time Cell Manufacturing, Heterogeneous Processor
Scheduling, Workforce Planning, Classroom Scheduling, Machine Scheduling,
Flow Shop Scheduling, Job Shop Scheduling, Sequencing and Batching.

Telecommunications: Call Routing, Bandwidth Packing, Hub Facility
Location, Path Assignment, Network Design for Services, Customer Discount
Planning, Failure Immune Architecture, Synchronous Optical Networks.

Design: Computer-Aided Design, Fault Tolerant Networks, Transport
Network Design, Architectural Space Planning, Diagram Coherency, Fixed
Charge Network Design, Irregular Cutting Problems.

Production, Inventory and Investment: Flexible Manufacturing,
Just-in-Time Production, Capacitated MRP, Part Selection, Multi-item
Inventory Planning, Volume Discount Acquisition, Fixed Mix Investment.

Location and Allocation: Multicommodity Location/Allocation, Quadratic
Assignment, Quadratic Semi-Assignment, Multilevel Generalized Assignment,
Lay-Out Planning, Off-Shore Oil Exploration.

Routing: Vehicle Routing, Capacitated Routing, Time Window Routing,
Multi-Mode Routing, Mixed Fleet Routing, Traveling Salesman, Traveling
Purchaser.

Logic and Artificial Intelligence: Maximum Satisfiability, Probabilistic
Logic, Clustering, Pattern Recognition/Classification, Data Integrity,
Neural Network Training and Design.

Graph Optimization: Graph Partitioning, Graph Coloring, Clique
Partitioning, Maximum Clique Problems, Maximum Planar Graphs, P-Median
Problems.

Technology: Seismic Inversion, Electrical Power Distribution, Engineering
Structural Design, Minimum Volume Ellipsoids, Space Station Construction,
Circuit Cell Placement.

General Combinatorial Optimization: Zero-One Programming, Fixed Charge
Optimization, Nonconvex Nonlinear Programming, All-or-None Networks,
Bilevel Programming, General Mixed Integer Optimization.

Table 1: Illustrative tabu search applications.



tics and the prominent "genetic" and "annealing" approaches inspired by


metaphors of physics and biology. Adaptive memory also contrasts with
rigid memory designs typical of branch and bound strategies. (It can be ar-
gued that some types of evolutionary procedures that operate by combining
solutions, such as genetic algorithms, embody a form of implicit memory.
Special links with evolutionary methods, and implications for establishing
more effective variants of them, are discussed in Section 5.)
The emphasis on responsive exploration in tabu search, whether in a
deterministic or probabilistic implementation, derives from the supposition
that a bad strategic choice can yield more information than a good random
choice. In a system that uses memory, a bad choice based on strategy can
provide useful clues about how the strategy may profitably be changed.
(Even in a space with significant randomness a purposeful design can be
more adept at uncovering the imprint of structure.)
Responsive exploration integrates the basic principles of intelligent search,
i.e., exploiting good solution features while exploring new promising regions.
Tabu search is concerned with finding new and more effective ways of taking
advantage of the mechanisms associated with both adaptive memory and re-
sponsive exploration. The development of new designs and strategic mixes
makes TS a fertile area for research and empirical study.

1.2 Use of Memory


The memory structures in tabu search operate by reference to four principal
dimensions, consisting of recency, frequency, quality, and influence (Fig-
ure 1). Recency-based and frequency-based memory complement each
other, and have important characteristics we amplify in later sections. The
quality dimension refers to the ability to differentiate the merit of solutions
visited during the search. In this context, memory can be used to identify
elements that are common to good solutions or to paths that lead to such
solutions. Operationally, quality becomes a foundation for incentive-based
learning, where inducements are provided to reinforce actions that lead to
good solutions and penalties are provided to discourage actions that lead to
poor solutions. The flexibility of these memory structures allows the search
to be guided in a multi-objective environment, where the goodness of a par-
ticular search direction may be determined by more than one function. The
tabu search concept of quality is broader than the one implicitly used by
standard optimization methods.
The fourth dimension, influence, considers the impact of the choices
made during the search, not only on quality but also on structure. (In a

Figure 1: Four TS dimensions (recency, frequency, quality, and influence).

sense, quality may be regarded as a special form of influence.) Recording


information about the influence of choices on particular solution elements
incorporates an additional level of learning. By contrast, in branch and
bound, for example, the separation rules are prespecified and the branching
directions remain fixed, once selected, at a given node of a decision tree. It
is clear however that certain decisions have more influence than others as
a function of the neighborhood of moves employed and the way that this
neighborhood is negotiated (e.g., choices near the root of a branch and bound
tree are quite influential when using a depth-first strategy). The assessment
and exploitation of influence by a memory more flexible than embodied in
such tree searches is an important feature of the TS framework.
The memory used in tabu search is both explicit and attributive. Ex-
plicit memory records complete solutions, typically consisting of elite solu-
tions visited during the search. An extension of this memory records highly
attractive but unexplored neighbors of elite solutions. The memorized elite
solutions (or their attractive neighbors) are used to expand the local search,
as indicated in Section 3. In some cases explicit memory has been used to
guide the search and avoid visiting solutions more than once. This appli-

cation is limited, because clever data structures must be designed to avoid


excessive memory requirements.
Alternatively, TS uses attributive memory for guiding purposes. This
type of memory records information about solution attributes that change
in moving from one solution to another. For example, in a graph or network
setting, attributes can consist of nodes or arcs that are added, dropped
or repositioned by the moving mechanism. In production scheduling, the
index of jobs may be used as attributes to inhibit or encourage the method
to follow certain search directions.

1.3 Intensification and diversification


Two highly important components of tabu search are intensification and
diversification strategies. Intensification strategies are based on modifying
choice rules to encourage move combinations and solution features histori-
cally found good. They may also initiate a return to attractive regions to
search them more thoroughly. Since elite solutions must be recorded in or-
der to examine their immediate neighborhoods, explicit memory is closely
related to the implementation of intensification strategies. As Figure 2 illus-
trates, the main difference between intensification and diversification is that
during an intensification stage the search focuses on examining neighbors of
elite solutions.
Here the term "neighbors" has a broader mea~ing than in the usual
context of "neighborhood search." That is, in addition to considering so-
lutions that are adjacent or close to elite solutions by means of standard
move mechanisms, intensification strategies generate "neighbors" by either
grafting together components of good solutions or by using modified evalua-
tion strategies that favor the introduction of such components into a current
(evolving) solution. The diversification stage on the other hand encourages
the search process to examine unvisited regions and to generate solutions
that differ in various significant ways from those seen before. Again, such an
approach can be based on generating subassemblies of solution components
that are then "fleshed out" to produce full solutions, or can rely on modi-
fied evaluations as embodied, for example, in the use of penalty / incentive
functions.
Intensification strategies require a means for identifying a set of elite solu-
tions as basis for incorporating good attributes into newly created solutions.
Membership in the elite set is often determined by setting a threshold which
is connected to the objective function value of the best solution found during
the search. However, considerations of clustering and "anti-clustering" are


•••••
Unvisi1e d solutions Me iglbors of
eli1e solutions

Figure 2: Intensification and diversification.

also relevant for generating such a set, and more particularly for generating
subsets of solutions that may be used for specific phases of intensification and
diversification. In the following sections, we show how the treatment of such
concerns can be enhanced by making use of special memory structures. The
TS notions of intensification and diversification are beginning to find their
way into other meta-heuristics, and it is important to keep in mind (as we
subsequently demonstrate) that these ideas are somewhat different than the
old control theory concepts of "exploitation" and "exploration," especially
in their implications for developing effective problem solving strategies.

2 Tabu Search Foundations and Short Term Memory
Tabu search can be applied directly to verbal or symbolic statements of many
kinds of decision problems, without the need to transform them into math-
ematical formulations. Nevertheless, it is useful to introduce mathematical
notation to express a broad class of these problems, as a basis for describing

certain features of tabu search. We characterize this class of problems as
that of optimizing (minimizing or maximizing) a function f(x) subject to x
∈ X, where f(x) may be linear or nonlinear, and the set X summarizes con-
straints on the vector of decision variables x. The constraints may include
linear or nonlinear inequalities, and may compel all or some components of x
to receive discrete values. While this representation is useful for discussing a
number of problem solving considerations, we emphasize again that in many
applications of combinatorial optimization, the problem of interest may not
be easily formulated as an objective function subject to a set of constraints.
The requirement x ∈ X, for example, may specify logical conditions or in-
terconnections that would be cumbersome to formulate mathematically, but
may be better left as verbal stipulations that can then be coded as rules.
Tabu search begins in the same way as ordinary local or neighborhood
search, proceeding iteratively from one point (solution) to another until a
chosen termination criterion is satisfied. Each x ∈ X has an associated
neighborhood N(x) ⊂ X, and each solution x' ∈ N(x) is reached from x by
an operation called a move.
As an initial point of departure, we may contrast TS with a simple
descent method where the goal is to minimize f(x) (or a corresponding ascent
method where the goal is to maximize f(x)). Such a method only permits
moves to neighbor solutions that improve the current objective function
value and ends when no improving solutions can be found. A pseudo-code
of a generic descent method is presented in Figure 3. The final x obtained
by a descent method is called a local optimum, since it is at least as good
or better than all solutions in its neighborhood. The evident shortcoming
of a descent method is that such a local optimum in most cases will not be
a global optimum, i.e., it usually will not minimize f(x) over all x ∈ X.
The version of a descent method called steepest descent scans the entire
neighborhood of x in search of a neighbor solution x' that gives a smallest
f(x') value over x' ∈ N(x). Steepest descent implementations of some types
of solution approaches (such as certain path augmentation algorithms in
networks and matroids) are guaranteed to yield globally optimal solutions
for the problems they are designed to handle, while other forms of descent
may terminate with local optima that are not global optima. In spite of this
attractive feature, in certain settings steepest descent is sometimes imprac-
tical because it is computationally too expensive, as where N(x) contains
many elements or each element is costly to retrieve or evaluate. Still, it is
often valuable to choose an x' at each iteration that yields a "good" if not
smallest f(x') value.
The relevance of choosing good solutions from current neighborhoods is

1) Choose x ∈ X to start the process.
2) Find x' ∈ N(x) such that f(x') < f(x).
3) If no such x' can be found, x is the local
   optimum and the method stops.
4) Otherwise, designate x' to be the new x and
   go to 2).

Figure 3: Descent method.
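
A direct Python rendering of Figure 3 may be useful; it assumes that
neighbors(x) yields the elements of N(x) and that f evaluates a solution.

    def descent(x0, f, neighbors):
        # Repeat steps 2-4 of Figure 3 until no improving neighbor exists.
        x = x0
        while True:
            try:
                x = next(xp for xp in neighbors(x) if f(xp) < f(x))
            except StopIteration:
                return x             # step 3: x is a local optimum

Choosing the minimum of N(x) under f instead of the first improving
neighbor gives the steepest descent variant discussed above.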

magnified when the guidance mechanisms of tabu search are introduced to


go beyond the locally optimal termination point of a descent method. Thus,
an important first level consideration for tabu search is to determine an ap-
propriate candidate list strategy for narrowing the examination of elements of
N(x), in order to achieve an effective tradeoff between the quality of x' and
the effort expended to find it. Here quality may involve considerations be-
yond those narrowly reflected by the value of f(x'). If a neighborhood space
is totally random, then of course nothing will work better than a totally
random choice. (In such a case there is no merit in trying to devise an ef-
fective solution procedure.) Assuming that neighborhoods can be identified
that are reasonably meaningful for a given class of problems, the challenge
is to define solution quality appropriately so that evaluations likewise will
have meaning. By the TS orientation, the ability to use history in creating
such evaluations then becomes important for devising effective methods.
To give a foundation for understanding the basic issues involved, we turn
our attention to the following illustrative example, which will also be used
as a basis for illustrating various aspects of tabu search in later sections.

2.1 Memory and Tabu Classifications


An important distinction in TS arises by differentiating between short term
memory and longer term memory. Each type of memory is accompanied
by its own special strategies. However, the effect of both types of memory
may be viewed as modifying the neighborhood N(x) of the current solution
x. The modified neighborhood, which we denote by N*(x), is the result of

maintaining a selective history of the states encountered during the search.


In the TS strategies based on short term considerations, N*(x) charac-
teristically is a subset of N(x), and the tabu classification serves to identify
elements of N(x) excluded from N*(x). In TS strategies that include longer
term considerations, N*(x) may also be expanded to include solutions not
ordinarily found in N(x). Characterized in this way, TS may be viewed as
a dynamic neighborhood method. This means that the neighborhood of x
is not a static set, but rather a set that can change according to the history
of the search. This feature of a dynamically changing neighborhood also
applies to the consideration of selecting different component neighborhoods
from a compound neighborhood that encompasses multiple types or levels
of moves, and provides an important basis for parallel processing. Charac-
teristically, a TS process based strictly on short term strategies may allow a
solution x to be visited more than once, but it is likely that the corresponding
reduced neighborhood N*(x) will be different each time. With the inclusion
of longer term considerations, the likelihood of duplicating a previous neigh-
borhood upon revisiting a solution, and more generally of making choices
that repeatedly visit only a limited subset of X, is all but nonexistent. From
a practical standpoint, the method will characteristically identify an optimal
or near optimal solution long before a substantial portion of X is examined.
A crucial aspect of TS involves the choice of an appropriate definition
of N*(x). Due to the exploitation of memory, N*(x) depends upon the
trajectory followed in moving from one solution to the next (or upon a
collection of such trajectories in a parallel processing environment).

The approach of storing complete solutions (explicit memory) generally


consumes an enormous amount of space and time when applied to each solu-
tion generated. A scheme that emulates this approach with limited memory
requirements is given by the use of hash functions. (Also, as will be seen,
explicit memory has a valuable role when selectively applied in strategies
that record and analyze certain "special" solutions.) Regardless of the
implementation details, short term memory functions provide one of the im-
portant cornerstones of the TS methodology. These functions give the search
the opportunity to continue beyond local optima, by allowing the execution
of nonimproving moves coupled with the modification of the neighborhood
structure of subsequent solutions. However, instead of recording full solu-
tions, these memory structures are generally based on recording attributes
(attributive memory). In addition, short term memory is often based on the
most recent history of the search trajectory.

2.2 Recency-Based Memory


The most commonly used short term memory keeps track of solution at-
tributes that have changed during the recent past, and is called recency-based
memory. This is the kind of memory that is included in most short descrip-
tions of tabu search in the literature (although a number of its aspects are
often left out by popular summaries).
To exploit this memory, selected attributes that occur in solutions re-
cently visited are labeled tabu-active, and solutions that contain tabu-active
elements, or particular combinations of these attributes, are those that be-
come tabu. This prevents certain solutions from the recent past from be-
longing to N*(x) and hence from being revisited. Other solutions that share
such tabu-active attributes are also similarly prevented from being visited.
Note that while the tabu classification strictly refers to solutions that are
forbidden to be visited, by virtue of containing tabu-active attributes (or
more generally by violating certain restriction based on these attributes),
we also often refer to moves that lead to such solutions as being tabu. We
illustrate these points with the following example.

Minimum k-Tree Problem Example

The Minimum k-Tree problem seeks a tree consisting of k edges in a graph
so that the sum of the weights of these edges is minimum (Lokketangen et
al., 1994). An instance of this problem is given in Figure 4, where nodes are
shown as numbered circles, and edges are shown as lines that join pairs of
nodes (the two "endpoint" nodes that determine the edge). Edge weights
are shown as the numbers attached to these lines. A tree is a set of edges
that contains no cycles, i.e., that contains no paths that start and end at
the same node (without retracing any edges).
Assume that the move mechanism is defined by edge-swapping, as sub-
sequently described, and that a greedy procedure is used to find an initial
solution. The greedy construction starts by choosing the edge (i, j) with
the smallest weight in the graph, where i and j are the indexes of the nodes
that are the endpoints of the edge. The remaining k-1 edges are chosen
successively to minimize the increase in total weight at each step, where the
edges considered meet exactly one node from those that are endpoints of
edges previously chosen. For k = 4, the greedy construction performs the
steps in Table 2.
The construction starts by choosing edge (1,2) with a weight of 1 (the
smallest weight of any edge in the graph). After this selection, the candidate

Figure 4: Weighted undirected graph.

edges are those that connect the nodes in the current partial tree with those
nodes not in the tree (i.e., edges (1,4) and (2,3)). Since edge (1,4) minimizes
the weight increase, it is chosen to be part of the partial solution. The rest
of the selections follow the same logic, and the construction ends when the
tree consists of 4 edges (i.e., the value of k). The initial solution in this
particular case has a total weight of 40.
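
The greedy construction can be sketched in a few lines of Python (the
edge-list and weight-dictionary representation is our own assumption);
applied to the graph of Figure 4 with k = 4, it reproduces the steps recorded
in Table 2 below.

    def greedy_k_tree(edges, weight, k):
        # Start with the lightest edge, then repeatedly add the lightest
        # edge that meets exactly one endpoint of the tree built so far
        # (which guarantees that no cycle is created).
        first = min(edges, key=lambda e: weight[e])
        tree, nodes = [first], set(first)
        while len(tree) < k:
            candidates = [e for e in edges
                          if (e[0] in nodes) != (e[1] in nodes)]
            e = min(candidates, key=lambda e: weight[e])
            tree.append(e)
            nodes.update(e)
        return tree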
The swap move mechanism, which is used from this point onward, re-
places a selected edge in the tree by another selected edge outside the tree,
subject to requiring that the resulting subgraph is also a tree. There are

Step Candidates Selection Total Weight


1 (1,2) (1,2) 1
2 (1,4), (2,3) (1,4) 26
3 (2,3), (3,4), (4,6), (4,7) (4,7) 34
4 (2,3), (3,4), (4,6), (6,7), (7,8) (6,7) 40

Table 2: Greedy construction.



Figure 5: Swap move types. The greedy solution has total weight 40; the
best static swap yields total weight 47, and the best dynamic swap yields
total weight 51.

actually two types of such edge swaps, one that maintains the current nodes
of the tree unchanged (static) and one that results in replacing a node of
the tree by a new node (dynamic). Figure 5 illustrates the best swap of each
type that can be made starting from the greedy solution. The added edge
in each case is shown by a heavy line and the dropped edge is shown by a
dotted line.
The best move of both types is the static swap of Figure 5, where for
our present illustration we are defining best solely in terms of the change in
the objective function value. Since this best move results in an increase of
the total weight of the current solution, the execution of such a move aban-
dons the rules of a descent approach and sets the stage for a tabu search
process. (The feasibility restriction that requires a tree to be produced at
each step is particular to this illustration, since in general the TS methodol-
ogy may include search trajectories that violate various types of feasibility
conditions.)
Given a move mechanism, such as the swap mechanism we have selected
for our example, the next step is to choose the key attributes that will be

used for the tabu classification. Tabu search is very flexible at this stage of
the design. Problem-specific knowledge can be used as guidance to settle on
a particular design. In problems where the moves are defined by adding and
deleting elements, the labels of these elements can be used as the attributes
for enforcing tabu status. Here, in the present example, we can simply refer
to the edges as attributes of the move, since the condition of being in or
out of the tree (which is a distinguishing property of the current solution)
may be assumed to always be automatically known by a reasonable solution
representation.

Choosing Tabu Classifications

Tabu classifications do not have to be symmetric, that is, the tabu struc-
ture can be designed to treat added and dropped elements differently. Sup-
pose for example that after choosing the static swap of Figure 5, which adds
edge (4,6) and drops edge (4,7), a tabu status is assigned to both of these
edges. Then one possibility is to classify both of these edges tabu-active for
the same number of iterations. The tabu-active status has different mean-
ings depending on whether the edge is added or dropped. For an added
edge, tabu-active means that this edge is not allowed to be dropped from
the current tree for the number of iterations that defines its tabu tenure.
For a dropped edge, on the other hand, tabu-active means the edge is not
allowed to be included in the current solution during its tabu tenure. Since
there are many more edges outside the tree than in the tree, it seems rea-
sonable to implement a tabu structure that keeps a recently dropped edge
tabu-active for a longer period of time than a recently added edge. Notice
also that for this problem the tabu-active period for added edges is bounded
by k, since if no added edge is allowed to be dropped for k iterations, then
within k steps all available moves will be classified tabu.
The concept of creating asymmetric tabu classifications can be readily
applied to settings where add/drop moves are not used.
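
A minimal sketch of this asymmetric recency-based memory in Python, with
the tenures of 1 (added) and 2 (dropped) used in the illustration that
follows; the helper names are ours.

    def make_tabu(tabu_until, it, added, dropped,
                  add_tenure=1, drop_tenure=2):
        # Record the last iteration through which each edge of the swap
        # executed at iteration `it` remains tabu-active.
        tabu_until[added] = it + add_tenure      # cannot be dropped again soon
        tabu_until[dropped] = it + drop_tenure   # cannot be re-added soon

    def is_tabu(tabu_until, it, edge):
        # An edge is tabu-active while the current iteration has not yet
        # passed its recorded tenure.
        return tabu_until.get(edge, 0) >= it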

Illustrative Tabu Classifications for the Min k-Tree Problem

As previously remarked, the tabu-active classification may in fact pre-


vent the search from visiting solutions that have not been examined yet.
We illustrate this phenomenon as follows. Suppose that in the Min k-Tree
problem instance of Figure 4, dropped edges are kept tabu-active for 2 iter-
ations, while added edges are kept tabu-active for only one iteration. (The
number of iterations an edge is kept tabu-active is called the tabu tenure of
the edge.) Also assume that we define a swap move to be tabu if either its

Iteration   Tabu-active (net tenure 1)   Tabu-active (net tenure 2)   Add     Drop    Weight
    1                                                                 (4,6)   (4,7)    47
    2       (4,6)                        (4,7)                        (6,8)   (6,7)    57
    3       (6,8), (4,7)                 (6,7)                        (8,9)   (1,2)    63

Table 3: TS iterations.

added or dropped edge is tabu-active. If we examine the full neighborhood
of available edge swaps at each iteration, and always choose the best that is
not tabu, then the first three moves are as shown in Table 3 (starting
from the initial solution found by the greedy construction heuristic). The
move of iteration 1 is the static swap move previously identified in Figure 5.
Diagrams showing the successive trees generated by these moves, starting
with the initial greedy solution, are given in Figure 6.
The net tenure values of 1 and 2 in Table 3 for the currently tabu-active
edges indicate the number of iterations that these edges will remain tabu-
active (including the current iteration).
At iteration 2, the reversal of the move of iteration 1 (that is, the move
that now adds (4,7) and drops (4,6)) is clearly tabu, since both of its edges
are tabu-active at iteration 2. In addition, the move that adds (4,7) and
drops (6,7) is also classified tabu, because it contains the tabu-active edge
(4,7) (with a net tenure of 2). This move leads to a solution with a total
weight of 49, a solution that clearly has not been visited before (see Figure 6).
The tabu-active classification of (4,7) has modified the original neighborhood
of the solution at iteration 2, and has forced the search to choose a move
with an inferior objective function value (i.e., the one with a total weight
of 57). In this case, excluding the solution with a total weight of 49 has
little effect on the quality of the best solution found (since we have already
obtained one with a weight of 40).
In other situations, however, additional precautions must be taken to
avoid missing good solutions. These strategies are known as aspiration cri-
teria and are the subject of Section 2.6. For the moment we observe simply
that if the tabu solution encountered at the current step instead had a weight
of 39, which is better than the best weight of 40 so far seen, then we would
allow the tabu classification of this solution to be overridden and consider
the solution admissible to be visited. The aspiration criterion that applies in
this case is called the improved-best aspiration criterion. (It is important to
keep in mind that aspiration criteria do not compel particular moves to be

Figure 6: Effects of attributive short term memory. The diagrams show the
trees obtained at iterations 1 through 3 (weights 47, 57, and 63), starting
from the greedy solution of weight 40.



selected, but simply make them available, or alternately rescind evaluation


penalties attached to certain tabu classifications.)
One other comment about tabu classification deserves to be made at this
point. In our preceding discussion of the Min k-Tree problem we consider a
swap move tabu if either its added edge or its dropped edge is tabu-active.
However, we could instead stipulate that a swap move is tabu only if both
its added and dropped edges are tabu-active. In general, the tabu status of
a move is a function of the tabu-active attributes of the move (i.e., of the
new solution produced by the move).

2.3 A First Level Tabu Search Approach


We now have on hand enough ingredients for a first level tabu search pro-
cedure. Such a procedure is sometimes implemented in an initial phase of a
TS development to obtain a preliminary idea of performance and calibration
features, or simply to provide a convenient staged approach for the purpose
of debugging solution software. While this naive form of a TS method omits
a number of important short term memory considerations, and does not
yet incorporate longer term concerns, it nevertheless gives a useful starting
point for demonstrating several basic aspects of tabu search.
We start from the solution with a weight of 63 as shown previously in
Figure 6 which was obtained at iteration 3. At each step we select the
least weight non-tabu move from those available, and use the improved-best
aspiration criterion to allow a move to be considered admissible in spite of
leading to a tabu solution. The reader may verify that the outcome leads to
the series of solutions shown in Table 4, which continues from iteration 3,
just executed. For simplicity, we select an arbitrary stopping rule that ends
the search at iteration 10.
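
Putting these ingredients together, a sketch of the first level procedure in
Python, reusing make_tabu() and is_tabu() from the earlier sketch; the
interfaces moves(x) and apply_move(x, mv), and the assumption that an
admissible move always exists, are illustrative simplifications of ours.

    def first_level_ts(x0, f, moves, apply_move, max_iter=10):
        x, best, best_f = x0, x0, f(x0)
        tabu_until = {}
        for it in range(1, max_iter + 1):
            def admissible(mv):
                added, dropped = mv
                free = not (is_tabu(tabu_until, it, added)
                            or is_tabu(tabu_until, it, dropped))
                aspired = f(apply_move(x, mv)) < best_f  # improved-best criterion
                return free or aspired
            # Select the least weight admissible move at each step.
            mv = min((m for m in moves(x) if admissible(m)),
                     key=lambda m: f(apply_move(x, m)))
            make_tabu(tabu_until, it, mv[0], mv[1])
            x = apply_move(x, mv)
            if f(x) < best_f:
                best, best_f = x, f(x)
        return best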
The successive solutions identified in Table 4 are shown graphically in
Figure 7 below. In addition to identifying the dropped edge at each step
as a dotted line, we also identify the dropped edge from the immediately
preceding step as a dotted line which is labeled 2*, to indicate its current net
tabu tenure of 2. Similarly, we identify the dropped edge from one further
step back by a dotted line which is labeled 1*, to indicate its current net tabu
tenure of 1. Finally, the edge that was added on the immediately preceding
step is also labeled 1* to indicate that it likewise has a current net tabu
tenure of 1. Thus the edges that are labeled with tabu tenures are those
which are currently tabu-active, and which are excluded from being chosen
by a move of the current iteration (unless permitted to be chosen by the
aspiration criterion).

Iteration   Tabu-active (net tenure 1)   Tabu-active (net tenure 2)   Add      Drop    Move Value   Weight
    3       (6,8), (4,7)                 (6,7)                        (8,9)    (1,2)        6         63
    4       (6,7), (8,9)                 (1,2)                        (4,7)    (1,4)      -17         46
    5       (1,2), (4,7)                 (1,4)                        (6,7)    (4,6)       -9         37*
    6       (1,4), (6,7)                 (4,6)                        (6,9)    (6,8)        0         37
    7       (4,6), (6,9)                 (6,8)                        (8,10)   (4,7)        1         38
    8       (6,8), (8,10)                (4,7)                        (9,12)   (6,7)        3         41
    9       (4,7), (9,12)                (6,7)                        (10,11)  (6,9)       -7         34*
   10       (6,7), (10,11)               (6,9)                        (5,9)    (9,12)       7         41

Table 4: Iterations of a first level TS procedure.

As illustrated in Table 4 and Figure 7 the method continues to generate
different solutions, and over time the best known solution (denoted by an
asterisk) progressively improves. In fact, it can be verified for this simple
example that the solution obtained at iteration 9 is optimal. (In general,
of course, there is no known way to verify optimality in polynomial time
for difficult discrete optimization problems, i.e., those that fall in the class
called NP-hard. The Min k-Tree problem is one of these.)
It may be noted that at iteration 6 the method selected a move with a
move value of zero. Nevertheless, the configuration of the current solution
changes after the execution of this move, as illustrated in Figure 7.
The selection of moves with certain move values, such as zero move val-
ues, may be strategically controlled, to limit their selection as added insur-
ance against cycling in special settings. We will soon see how considerations
beyond this first level implementation can lead to an improved search trajec-
tory, but the non-monotonic, gradually improving, behavior is characteristic
of TS in general. Figure 8 provides a graphic illustration of this behavior
for the current example.
We have purposely chosen the stopping iteration to be small to illustrate
an additional relevant feature, and to give a foundation for considering cer-
tain types of longer term considerations. One natural way to apply TS is to
periodically discontinue its progress, particularly if its rate of finding new
best solutions falls below a preferred level, and to restart the method by a
process designed to generate a new sequence of solutions.
Classical restarting procedures based on randomization evidently can
be used for this purpose, but TS often derives an advantage by employing
more strategic forms of restarting. We illustrate a simple instance of such
a restarting procedure, which also serves to introduce a useful memory
concept.

Figure 7: Graphical representation of TS iterations (solutions at iterations
3 through 9; currently tabu-active edges are labeled with their net tabu
tenures).



Figure 8: TS search trajectory (current weight and best weight plotted
against iterations 0 through 10).



2.3.1 Critical Event Memory


Critical Event memory in tabu search, as its name implies, monitors the
occurrence of certain critical events during the search, and establishes a
memory that constitutes an aggregate summary of these events. For our
current example, where we seek to generate a new starting solution, a crit-
ical event that is clearly relevant is the generation of the previous starting
solution. Correspondingly, if we apply a restarting procedure multiple times,
the steps of generating all preceding starting solutions naturally qualify as
critical events. That is, we would prefer to depart from these solutions in
some significant manner as we generate other starting solutions.
Different degrees of departure, representing different levels of diversifica-
tion, can be achieved by defining solutions that correspond to critical events
in different ways (and by activating critical event memory by different rules).
In the present setting we consider it important that new starting solutions
not only differ from preceding starting solutions, but that they also differ
from other solutions generated during previous passes. One possibility is
to use a blanket approach that considers each complete solution previously
generated to represent a critical event. The aggregation of such events by
means of critical event memory makes this entirely practicable, but often
it is quite sufficient (and sometimes preferable) to isolate a smaller set of
solutions.
For the current example, therefore, we will specify that the critical events
of interest consist of generating not only the starting solution of the previ-
ous pass(es), but also each subsequent solution that represents a "local TS
optimum," i.e., whose objective function value is better (or no worse) than
that of the solution immediately before and after it. Using this simple def-
inition we see that four solutions qualify as critical (i.e., are generated by
the indicated critical events) in the first solution pass of our example: the
initial solution and the solutions found at iterations 5, 6 and 9 (with weights
of 40, 37, 37 and 34, respectively).
Since the solution at iteration 9 happens to be optimal, we are interested
in the effect of restarting before this solution is found. Assume we had chosen
to restart after iteration 7, without yet reaching an optimal solution. Then
the solutions that correspond to critical events are the initial solution and the
solutions of iterations 5 and 6. We treat these three solutions in aggregate
by combining their edges, to create a subgraph that consists of the edges
(1,2), (1,4), (4,7), (6,7), (6,8), (8,9) and (6,9). (Frequency-based memory,
as discussed in Section 4, refines this representation by accounting for the
number of times each edge appears in the critical solutions, and allows the
inclusion of additional weighting factors.)

Step  Candidates                                                  Selection  Total Weight
1     (3,5)                                                       (3,5)       6
2     (2,3), (3,4), (3,6), (5,6), (5,9), (5,12)                   (5,9)      22
3     (2,3), (3,4), (3,6), (5,6), (5,12), (6,9), (8,9), (9,12)    (8,9)      29
4     (2,3), (3,4), (3,6), (5,6), (5,12), (6,8), (6,9), (7,8),    (8,10)     38
      (8,10), (9,12)

Table 5: Restart procedure.

To execute a restarting procedure, we penalize the inclusion of the edges
of this subgraph at various steps of constructing the new solution. It is
usually preferable to apply this penalty process at early steps, implicitly
allowing the penalty function to decay rapidly as the number of steps in-
creases. It is also sometimes useful to allow one or more intervening steps
after applying such penalties before applying them again.
For our illustration, we will use the memory embodied in the subgraph
of penalized edges by introducing a large penalty that effectively excludes
all these edges from consideration on the first two steps of constructing the
new solution. Then, because the construction involves four steps in total,
we will not activate the critical event memory on subsequent construction
steps, but will allow the method to proceed in its initial form.
Applying this approach, we restart the method by first choosing edge
(3,5), which is the minimum weight edge not in the penalized subgraph. This
choice and the remaining choices that generate the new starting solution are
shown in Table 5.
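
The construction just outlined can be sketched as follows. The helpers
candidate_edges() and weight() are hypothetical placeholders for the Min
k-Tree constructive neighborhood, and BIG is simply a weight large enough
to exclude penalized edges; this is a sketch under those assumptions, not
an implementation from the text.

    # Sketch of the penalized restart construction: critical event memory
    # is activated only on the first two of the four construction steps.

    BIG = 10 ** 6  # large penalty that effectively excludes an edge

    def restart_solution(candidate_edges, weight, critical_edges, steps=4):
        partial = []
        for step in range(1, steps + 1):
            def penalized(edge):
                if step <= 2 and edge in critical_edges:
                    return weight(edge) + BIG
                return weight(edge)
            # choose the cheapest (penalized) edge extending the partial tree
            partial.append(min(candidate_edges(partial), key=penalized))
        return partial
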
Beginning from the solution constructed in Table 5, and applying the
first level TS procedure exactly as it was applied on the first pass, generates
the sequence of solutions shown in Table 6 and depicted in Figure 9. (Again,
we have arbitrarily limited the total number of iterations, in this case to 5.)

Iteration  Tabu-active, net tenure 1   net tenure 2   Add      Drop     Move Value   Weight
 1                                                    (9,12)   (3,5)      3           41
 2         (9,12)                      (3,5)          (10,11)  (5,9)     -7           34*
 3         (3,5), (10,11)              (5,9)          (6,8)    (9,12)     7           41
 4         (5,9), (6,8)                (9,12)         (6,7)    (10,11)   -3           38
 5         (9,12), (6,7)               (10,11)        (4,7)    (8,10)    -1           37

Table 6: Iterations of a first level TS procedure.

It is interesting to note that the restarting procedure generates a better
solution (with a total weight of 38) than the initial solution generated during
the first construction (with a total weight of 40). Also, the restarting
solution contains 2 "optimal edges" (i.e., edges that appear in the optimal
tree). This starting solution allows the search trajectory to find the optimal
solution in only two iterations, illustrating the benefits of applying a critical
event memory within a restarting strategy. As will be seen in Section 4,
related memory structures can also be valuable for strategies that drive the
search into new regions by "partial restarting" or by directly continuing a
current trajectory (with modified decision rules).
Now we return from our example to examine elements of TS that take
us beyond these first level concerns, and open up possibilities for creating
more powerful solution approaches. We continue to focus primarily on short
term aspects, and begin by discussing how to generalize the use of recency-
based memory when neighborhood exploration is based on add/drop moves.
From these foundations we then discuss issues of logical restructuring, tabu
activation rules and ways of determining tabu tenure. We then examine the
important area of aspiration criteria, together with the role of influence.

2.4 Recency-Based Memory for Add/Drop Moves


To understand procedurally how various forms of recency-based memory
work, and to see their interconnections, it is useful to examine a convenient
design for implementing the ideas illustrated so far. Such a design for the
Min k-Tree problem creates a natural basis for handling a variety of other
problems for which add/drop moves are relevant. In addition, the ideas can
be adapted to settings that are quite different from those where add/drop
moves are used.
As a step toward fuller generality, we will refer to items added and
dropped as elements, though we will continue to make explicit reference
to edges (as particular types of elements) within the context of the Min
k-Tree problem example.

Figure 9: Graphical representation of TS iterations after restarting
(restarting point of weight 38, followed by iterations 1 through 5).



(Elements are related to, but not quite the same as,
solution attributes. The difference will be made apparent shortly.) There
are many settings where operations of adding and dropping paired elements
are the cornerstone of useful neighborhood definitions. For example, many
types of exchange or swap moves can be characterized by such operations.
Add/drop moves also apply to the omnipresent class of multiple choice prob-
lems, which require that exactly one element must be chosen from each
member set from a specified disjoint collection. Add/drop moves are quite
natural in this setting, since whenever a new element is chosen from a given
set (and hence is "added" to the current solution), the element previously
chosen from that set must be replaced (and hence "dropped"). Such prob-
lems are represented by discrete generalized upper bound (GUB) formulations
in mathematical optimization, where various disjoint sets of 0-1 variables
must sum to 1 (hence exactly one variable from each set must equal 1, and
the others must equal 0). An add/drop move in this formulation consists
of choosing a new variable to equal 1 (the "add move") and setting the
associated (previously selected) variable equal to 0 (the "drop move").
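
In a 0-1 representation, such a move is just a paired reassignment within
one GUB set. The following fragment is a schematic illustration only; the
dictionary representation is ours, not from the text.

    # Add/drop move in a multiple choice (GUB) setting: each disjoint set
    # must have exactly one chosen element, so choosing a new element
    # automatically drops the previous one.

    def gub_add_drop(solution, set_id, new_choice):
        """solution maps each GUB set to its currently chosen element."""
        dropped = solution[set_id]     # variable reset to 0 (the drop move)
        solution[set_id] = new_choice  # variable set to 1 (the add move)
        return new_choice, dropped     # (Added, Dropped) pair for the memory
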
Add/drop moves further apply to many types of problems that are not
strictly discrete, that is, which contain variables whose values can vary con-
tinuously across specified ranges. Such applications arise by taking advan-
tage of basis exchange (pivoting) procedures, such as the simplex method
of linear programming. In this case, an add/drop move consists of selecting
a new variable to enter (add to) the basis, and identifying an associated
variable to leave (drop from) the basis. A variety of procedures for nonlin-
ear and mixed integer optimization rely on such moves, and have provided
a useful foundation for a number of tabu search applications. Additional
related examples will be encountered throughout the course of this book.

2.4.1 Some Useful Notation


The approach used in the Min k-Tree problem can be conveniently described
by means of the following notation. For a pair of elements that is selected to
perform an add/drop move, let Added denote the element that is added, and
Dropped the element that is dropped. Also denote the current iteration at
which this pair is selected by Iter. We maintain a record of Iter to identify
when Added and Dropped start to be tabu-active. Specifically, at this step
we set:

TabuDropStart(Added) = Iter

TabuAddStart(Dropped) = Iter.

Thus, TabuDropStart records the iteration where Added becomes tabu-
active (to prevent this element from later being dropped), and TabuAddStart
records the iteration where Dropped becomes tabu-active (to prevent this
element from later being added).
For example, in the Min k-Tree problem illustration of Table 4, where the
edge (4,6) was added and the edge (4,7) was dropped on the first iteration,
we would establish the record (for Iter = 1)

TabuDropStart(4,6) = 1
TabuAddStart(4,7) = 1.

To identify whether or not an element is currently tabu-active, let Tabu-
DropTenure denote the tabu tenure (number of iterations) to forbid an ele-
ment to be dropped (once added), and let TabuAddTenure denote the tabu
tenure to forbid an element from being added (once dropped). (In our Min
k-Tree problem example of Section 2.2, we selected TabuAddTenure = 2 and
TabuDropTenure = 1.)
As a point of clarification, when we speak of an element as being tabu-
active, our terminology implicitly treats elements and attributes as if they
are the same. However, to be precise, each element is associated with two dif-
ferent attributes, one where the element belongs to the current solution and
one where the element does not. Elements may be viewed as corresponding
to variables and attributes as corresponding to specific value assignments
for such variables. There is no danger of confusion in the add/drop setting,
because we always know when an element belongs or does not belong to the
current solution, and hence we know which of the two associated attributes
is currently being considered.
We can now identify precisely the set of iterations during which an el-
ement (i.e., its associated attribute) will be tabu-active. Let TestAdd and
TestDrop denote a candidate pair of elements, whose members are respec-
tively under consideration to be added and dropped from the current so-
lution. If TestAdd previously corresponded to an element Dropped that
was dropped from the solution and TestDrop previously corresponded to
an element Added that was added to the solution (not necessarily on the
same step), then it is possible that one or both may be tabu-active and
we can check their status as follows. By means of the records established
on earlier iterations, where TestAdd began to be tabu-active at iteration
TabuAddStart(TestAdd) and TestDrop began to be tabu-active at iteration
TabuDropStart(TestDrop), we conclude that as Iter grows the status of these
elements will be given by:

TestAdd is tabu-active when:

Iter ≤ TabuAddStart(TestAdd) + TabuAddTenure

TestDrop is tabu-active when:

Iter ≤ TabuDropStart(TestDrop) + TabuDropTenure

Consider again the Min k-Tree problem illustration of Table 4. As previ-
ously noted, the move of Iteration 1 that added edge (4,6) and dropped edge
(4,7) was accompanied by setting TabuDropStart(4,6) = 1 and TabuAddStart(4,7)
= 1, to record the iteration where these two edges start to be tabu-active (to
prevent (4,6) from being dropped and (4,7) from being added). The edge
(4,6) will then remain tabu-active on subsequent iterations, in the role of
TestDrop (as a candidate to be dropped), as long as

Iter ≤ TabuDropStart(4,6) + TabuDropTenure.


Hence, since we selected TabuDropTenure = 1 (to prevent an added edge
from being dropped for 1 iteration), it follows that (4,6) remains tabu-active
as long as

Iter ≤ 2.

Similarly, having selected TabuAddTenure = 2, we see that the edge (4,7)
remains tabu-active, to forbid it from being added back, as long as

Iter ≤ 3.

An initialization step is needed to be sure that elements that have never
been previously added or dropped from the solutions successively generated
will not be considered tabu-active. This can be done by initially setting
TabuAddStart and TabuDropStart equal to a large negative number for all
elements. Then, as Iter begins at 1 and successively increases, the inequal-
ities that determine the tabu-active status will not be satisfied, and hence
will correctly disclose that an element is not tabu-active, until it becomes
one of the elements Added or Dropped. (Alternatively, TabuAddStart and
TabuDropStart can be initialized at 0, and the test of whether an element is
tabu-active can be skipped when it has a 0 value in the associated array.)
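
These records translate almost line for line into code. The sketch below
uses Python dictionaries in place of arrays, so the "large negative number"
initialization becomes a default value; the element names are edges from
the Min k-Tree example, and the dictionary representation is our own.

    # Recency-based memory with separate TabuAddStart/TabuDropStart records.

    NEVER = -10 ** 9   # "large negative number" initialization

    tabu_add_start = {}    # iteration at which a dropped element became tabu-active
    tabu_drop_start = {}   # iteration at which an added element became tabu-active

    TABU_ADD_TENURE, TABU_DROP_TENURE = 2, 1

    def record_move(added, dropped, it):
        tabu_drop_start[added] = it    # forbid 'added' from being dropped
        tabu_add_start[dropped] = it   # forbid 'dropped' from being added

    def add_is_tabu(test_add, it):
        return it <= tabu_add_start.get(test_add, NEVER) + TABU_ADD_TENURE

    def drop_is_tabu(test_drop, it):
        return it <= tabu_drop_start.get(test_drop, NEVER) + TABU_DROP_TENURE

    # Iteration 1 of the example: edge (4,6) added, edge (4,7) dropped.
    record_move(added=(4, 6), dropped=(4, 7), it=1)
    assert drop_is_tabu((4, 6), 2) and not drop_is_tabu((4, 6), 3)
    assert add_is_tabu((4, 7), 3) and not add_is_tabu((4, 7), 4)
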

2.4.2 Streamlining
The preceding ideas can be streamlined to allow a more convenient im-
plementation. First, we observe that the two arrays, TabuAddStart and
TabuDropStart, which we have maintained separately from each other in
order to emphasize their different functions, can be combined into a single
array TabuStart. The reason is simply that we can interpret TabuStart(E) to
be the same as TabuDropStart(E) when the element E is in the current solu-
tion, and to be the same as TabuAddStart(E) when E is not in the current
solution. (There is no possible overlap between these two states of E, and
hence no danger of using the TabuStart array incorrectly.) Consequently,
from now on, we will let the single array TabuStart take the role of both
TabuAddStart and TabuDropStart. For example, when the move is executed
that (respectively) adds and drops the elements Added and Dropped, the
appropriate record consists of setting:
TabuStart(Added) = Iter
TabuStart(Dropped) = Iter.
The TabuStart array has an additional function beyond that of monitor-
ing the status of tabu-active elements. (As shown in Section 4, this array is
also useful for determining a type of frequency measure called a residence
frequency.) However, sometimes it is convenient to use a different array,
TabuEnd, to keep track of tabu-active status for recency-based memory,
as we are treating here. Instead of recording when the tabu-active status
starts, TabuEnd records when it ends. Thus, in place of the two assignments
to TabuStart shown above, the record would consist of setting:
TabuEnd(Added) = Iter + TabuDropTenure
TabuEnd(Dropped) = Iter + TabuAddTenure.
(The element Added is now available to be dropped, and the element Dropped
is now available to be added.) In conjunction with this, the step that checks
for whether a candidate pair of elements TestAdd and TestDrop are currently
tabu-active becomes:
TestAdd is tabu-active when:

Iter ≤ TabuEnd(TestAdd)

TestDrop is tabu-active when:

Iter ≤ TabuEnd(TestDrop).

This is a simpler representation than the one using TabuStart, and so
it is appealing when TabuStart is not also used for additional purposes.
(Also, TabuEnd can simply be initialized at 0 rather than at a large negative
number.)
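
A sketch of this streamlined form, under the same assumptions as the
previous sketch:

    # Streamlined memory with a single TabuEnd record: it stores the last
    # iteration at which an element remains tabu-active, so the test needs
    # no reference to the tenures and a 0 default suffices.

    tabu_end = {}   # element -> last tabu-active iteration

    def record_move_end(added, dropped, it, add_tenure=2, drop_tenure=1):
        tabu_end[added] = it + drop_tenure    # 'added' may not yet be dropped
        tabu_end[dropped] = it + add_tenure   # 'dropped' may not yet be added

    def is_tabu_active(element, it):
        return it <= tabu_end.get(element, 0)
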
As will be discussed more fully in the next section, the values of
TabuAddTenure and TabuDropTenure (which are explicitly referenced in testing
tabu-active status with TabuStart, and implicitly referenced in testing this
status with TabuEnd), are often preferably made variable rather than fixed.
The fact that we use different tenures for added and dropped elements dis-
closes that it can be useful to differentiate the tenures applied to elements
of different classes. This type of differentiation can also be based on his-
torical performance, as tracked by frequency-based measures. Consequently,
tenures may be individually adjusted for different elements (as well as modi-
fied over time). Such adjustment can be quite effective in some settings (e.g.,
see Laguna, et al. 1995). These basic considerations can be refined to cre-
ate effective implementations and also can be extended to handle additional
move structures, as shown in Glover and Laguna (1997).

2.5 Tabu Tenure


In general, recency-based memory is managed by creating one or several tabu
lists, which record the tabu-active attributes and implicitly or explicitly
identify their current status. Tabu tenure can vary for different types or
combinations of attributes, and can also vary over different intervals of time
or stages of the search. This varying tenure makes it possible to create
different kinds of tradeoffs between short term and longer term strategies.
It also provides a dynamic and robust form of search.
The choice of appropriate types of tabu lists depends on the context.
Although no single type of list is uniformly best for all applications, some
guidelines can be formulated. If memory space is sufficient (as it often is)
to store one piece of information (e.g., a single integer) for each solution
attribute used to define the tabu activation rule, it is usually advantageous
to record the iteration number that identifies when the tabu-active status
of an attribute starts or ends as illustrated by the add/drop data structure
described in Sections 2.3 and 2.4. This typically makes it possible to test
the tabu status of a move in constant time. The necessary memory space
depends on the attributes and neighborhood size, but it does not depend on
the tabu tenure.
Depending on the size of the problem, it may not be feasible to imple-
ment the preceding memory structure in combination with certain types of

attributes. In general, storing one piece of information for each attribute be-
comes unattractive when the problem size increases or attribute definition is
complex. Sequential and circular tabu lists are used in this case, which store
the identities of each tabu-active attribute, and explicitly (or implicitly, by
list position) record associated tabu tenures.
Effective tabu tenures have been empirically shown to depend on the
size of the problem instance. However, no single rule has been designed to
yield an effective tenure for all classes of problems. This is partly because an
appropriate tabu tenure depends on the strength of the tabu activation rule
employed (where more restrictive rules are generally coupled with shorter
tenures). Effective tabu tenures and tabu activation rules can usually be
determined quite easily for a given class of problems by a little experimen-
tation. Tabu tenures that are too small can be recognized by periodically
repeated objective function values or other function indicators, including
those generated by hashing, that suggest the occurrence of cycling. Tenures
that are too large can be recognized by a resulting deterioration in the qual-
ity of the solutions found (within reasonable time periods). Somewhere in
between typically exists a robust range of tenures that provides good per-
formance.
Once a good range of tenure values is located, first level improvements
generally result by selecting different values from this range on different
iterations. (A smaller subrange, or even more than one subrange, may be
chosen for this purpose.) Problem structures are sometimes encountered
where performance for some individual fixed tenure values within a range can
be unpredictably worse than for other values in the range, and the identity of
the isolated poorer values can change from problem to problem. However, if
the range is selected to be good overall then a strategy that selects different
tenure values from the range on different iterations typically performs at a
level comparable to selecting one of the best values in the range, regardless
of the problem instance.
Short term memory refinements subsequently discussed, and longer term
considerations introduced in later sections, transform the method based on
these constructions into one with considerable power. Still, it occasionally
happens that even the initial short term approach by itself leads to excep-
tionally high quality solutions. Consequently, some of the TS literature has
restricted itself only to this initial part of the method.
In general, short tabu tenures allow the exploration of solutions "close"
to a local optimum, while long tenures can help to break free from the
vicinity of a local optimum. These functions illustrate a special instance
of the notions of intensification and diversification that will be explored in

more detail later. Varying the tabu tenure during the search provides one
way to induce a balance between closely examining one region and moving
to different parts of the solution space.
In situations where a neighborhood may (periodically) become fairly
small, or where a tabu tenure is chosen to be fairly large, it is entirely pos-
sible that iterations can occur when all available moves are classified tabu.
In this case an aspiration-by-default is used to allow a move with a "least
tabu" status to be considered admissible. Such situations rarely occur for
most problems, and even random selection is often an acceptable form of
aspiration-by-default. When tabu status is translated into a modified evalu-
ation criterion, by penalties and inducements, then of course aspiration-by-
default is handled automatically, with no need to monitor the possibility
that all moves are tabu.
There are several ways in which a dynamic tabu tenure can be imple-
mented. These implementations may be classified into random and system-
atic dynamic tabu tenures.

2.5.1 Random Dynamic Tenure


Random dynamic tabu tenures are often given one of two forms. Both of
these forms use a tenure range defined by parameters tmin and tmax. The
tabu tenure t is randomly selected within this range, usually following a
uniform distribution. In the first case, the chosen tenure is maintained
constant for tmax iterations, and then a new tenure is selected by the same
process. The second form draws a new t for every attribute that becomes
tabu at a given iteration. The first form requires more bookkeeping than the
second one, because one must remember the last time that the tabu tenure
was modified.
Either of the two arrays TabuStart or TabuEnd discussed in Section 2.4
can be used to implement these forms of dynamic tabu tenure. For example,
a 2-dimensional array TabuEnd can be created to control a dynamic recency-
based memory for the sequencing problem introduced at the beginning of
this section. As in the case of the Min k-Tree problem, such an array can be
used to record the time (iteration number) at which a particular attribute
will be released from its tabu status. Suppose, for example, that tmin = 5
and tmax = 10 and that swaps of jobs are used to move from one solution to
another in the sequencing problem. Also, assume that TabuEnd(j,p) refers to
the iteration that job j will be released from a tabu restriction that prevents
it from being assigned to position p. Then, if at iteration 30, job 8 in position
2 is swapped with job 12 in position 25, we will want to make the attributes

(8,2) and (12,25) tabu-active for some number of iterations, to prevent a
move that returns one or both of jobs 8 and 12 to their preceding positions.
If t is assigned a value of 7 from the range defined by tmin = 5 and
tmax = 10, then upon making the swap at iteration 30 we may set
TabuEnd(8,2) = 37 and TabuEnd(12,25) = 37.
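
In code, this random dynamic scheme amounts to drawing the tenure when
the attribute is recorded. The sketch below implements the second form (a
fresh t per attribute); the dictionary stands in for the 2-dimensional
TabuEnd array and is our own representation.

    import random

    # Random dynamic tenure for the sequencing example: tabu_end[(job, pos)]
    # is the iteration at which the job/position attribute is released.

    T_MIN, T_MAX = 5, 10
    tabu_end = {}

    def record_swap(job_i, pos_i, job_j, pos_j, it):
        # Second form: draw a fresh tenure for each attribute made tabu.
        for attr in ((job_i, pos_i), (job_j, pos_j)):
            tabu_end[attr] = it + random.randint(T_MIN, T_MAX)

    # Iteration 30: job 8 in position 2 swaps with job 12 in position 25.
    record_swap(8, 2, 12, 25, it=30)
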
This is not the only kind of TabuEnd array that can be used for the se-
quencing problem, and we examine other alternatives and their implications
in Section 3. Nevertheless, we warn against a potential danger. An array
TabuEnd(i,j) that seeks to prevent jobs i and j from exchanging positions,
without specifying what these positions are, does not truly refer to attributes
of a sequencing solution, and hence entails a risk if used to determine tabu
status. (The pair (i,j) here constitutes an attribute of a move in a loose
sense, but does not serve to distinguish one solution from another.) Thus, if
at iteration 30 we were to set TabuEnd(8,12) = 37, in order to prevent jobs
8 and 12 from exchanging positions until after iteration 37, this still might
not prevent job 8 from returning to position 2 and job 12 from returning
to position 25. In fact, a sequence of swaps could be executed that could
return to precisely the same solution visited before swapping jobs 8 and 12.
Evidently, the TabuEnd array can be used by selecting a different t from
the interval (tmin, tmax) at every iteration. As remarked in the case of the
Min k-Tree problem, it is also possible to select t differently for different
solution attributes.

2.5.2 Systematic Dynamic Tenure

Dynamic tabu tenures based on a random scheme are attractive for their
ease of implementation. However, relying on randomization may not be
the best strategy when specific information about the context is available.
In addition, certain diversity-inducing patterns can be achieved more effec-
tively by not restricting consideration to random designs. A simple form
of systematic dynamic tabu tenure consists of creating a sequence of tabu
tenure values in the range defined by tmin and tmax. This sequence
is then used, instead of the uniform distribution, to assign the current tabu
tenure value. Suppose it is desired to vary t so that its value alternately
increases and decreases. (Such a pattern induces a form of diversity that
will rarely be achieved randomly.) Then the following sequence can be used
for the range defined above:

{5, 8, 6, 9, 7, 10}.

The sequence may be repeated as many times as necessary until the end of
the search, where additional variation is introduced by progressively shift-
ing and/or reversing the sequence before repeating it. (In a combined ran-
dom/systematic approach, the decision of the shift value and the forward
or backward direction can itself be made random.) Another variation is to
retain a selected tenure value from the sequence for a variable number of
iterations before selecting the next value. Different sequences can be created
and identified as effective for particular classes of problems.
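
As an illustration, a generator along the following lines produces the
alternating sequence above, with a progressive shift each time the sequence
is exhausted; the particular shift rule is our own choice of variation, not
one prescribed by the text.

    # Systematic dynamic tenure: cycle through the alternating sequence,
    # shifting it by one position before each repetition.

    BASE = [5, 8, 6, 9, 7, 10]

    def tenure_stream(base=BASE):
        seq = list(base)
        while True:
            for t in seq:
                yield t
            seq = seq[1:] + seq[:1]   # progressively shift before repeating

    stream = tenure_stream()
    first_ten = [next(stream) for _ in range(10)]
    # first_ten == [5, 8, 6, 9, 7, 10, 8, 6, 9, 7]
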
The foregoing range of values (from 5 to 10) may seem relatively small.
However, some applications use even smaller ranges, but adaptively increase
and decrease the midpoint of the range for diversification and intensifica-
tion purposes. Well designed adaptive systems can significantly reduce or
even eliminate the need to discover a best range of tenures by preliminary
calibration. This is an important area of study.
These basic alternatives typically provide good starting tabu search im-
plementations. In fact, most initial implementations apply only the simplest
versions of these ideas.

2.6 Aspiration Criteria and Regional Dependencies


Aspiration criteria are introduced in tabu search to determine when tabu
activation rules can be overridden, thus removing a tabu classification oth-
erwise applied to a move. (The improved-best and aspiration-by-default
criteria, as previously mentioned, are obvious simple instances.) The appro-
priate use of such criteria can be very important for enabling a TS method
to achieve its best performance levels. Early applications employed only a
simple type of aspiration criterion, consisting of removing a tabu classifi-
cation from a trial move when the move yields a solution better than the
best obtained so far. This criterion remains widely used. However, other
aspiration criteria can prove effective for improving the search.
A basis for one of these criteria arises by introducing the concept of
influence, which measures the degree of change induced in solution structure
or feasibility. This notion can be illustrated for the Min k-Tree problem as
follows. Suppose that the current solution includes edges (1,2), (1,4), (4,7)
and (6,7), as illustrated in Figure 10, following. A high influence move, that
significantly changes the structure of the current solution, is exemplified by
dropping edge (1,2) and replacing it by edge (6,9). A low influence move,
on the other hand, is exemplified by dropping edge (6,7) and adding edge
(4,6). The weight difference of the edges in the high influence move is 15,
while the difference is 9 for the low influence move. However, it is important
to point out that differences in weight or cost are not the only, or even the
primary, basis for distinguishing between moves of high and low influence.
In the present example, the move we identify as a low influence move creates
a solution that consists of the same set of nodes included in the current
solution, while the move we identify as a high influence move includes a
new node (number 9) from which new edges can be examined. (These moves
correspond to those labeled static and dynamic in Figure 5.)

Figure 10: Influence level of two moves.
As illustrated here, high influence moves may or may not improve the
current solution, though they are less likely to yield an improvement when
the current solution is relatively good. But high influence moves are im-
portant, especially during intervals of breaking away from local optimality,
because a series of moves that is confined to making only small structural

change is unlikely to uncover a chance for significant improvement. Execut-
ing the high influence move in Figure 10, for example, allows the search to
reach the optimal edges (8,9) and (9,12) in subsequent iterations. Of course,
moves of much greater influence than those shown can be constructed by
considering compound moves. Such considerations are treated in later sec-
tions.
Influence often is associated with the idea of move distance. Although
important, move influence is only one of several elements that commonly
underlie the determination of aspiration criteria. We illustrate a few of
these elements in Table 7.
Aspirations such as those shown in Table 7 can be applied according
to two implementation categories: aspirations by move and aspirations by
attribute. A move aspiration, when satisfied, revokes the move's tabu clas-
sification. An attribute aspiration, when satisfied, revokes the attribute's
tabu-active status. In the latter case the move may or may not change its
tabu classification, depending on whether the tabu activation rule is trig-
gered by more than one attribute. For example in our sequencing problem,
if the swap of jobs 3 and 6 is forbidden because a tabu activation rule pre-
vents job 3 from moving at all, then an attribute aspiration that revokes job
3's tabu-active status also revokes the move's tabu classification. However,
if the swap (3,6) is classified tabu because both job 3 and job 6 are not
allowed to move, then revoking job 3's tabu-active status does not result in
overriding the tabu status of the entire move.
Different variants of the aspiration criteria presented in Table 7 are pos-
sible. For example, the regional aspiration by objective can be defined in
terms of bounds on the objective function value. These bounds determine
the region being explored, and they are modified to reflect the discovery of
better (or worse) regions. Another possibility is to define regions with re-
spect to time. For example, one may record the best solution found during
the recent past (defined as a number of iterations) and use this value as the
aspiration level.

2.7 Concluding Observations for the Min k-Tree Example


Influence of tabu tenures.
The tabu tenures used to illustrate the first level TS approach for the
Min k-Tree problem of course are very small. The risk of using such tenures
can be demonstrated in this example from the fact that changing the weight
of edge (3,6) in Figure 4 from 20 to 17 will cause the illustrated TS approach
with TabuAddTenure = 2 and TabuDropTenure = 1 to go into a cycle that
will prevent the optimal solution from being found. The intuition that
TabuDropTenure has a stronger influence than the TabuAddTenure for this
problem is supported by the fact that the use of tenures of TabuAddTenure
= 1 and TabuDropTenure = 2 in this case will avoid the cycling problem
and allow an optimal solution to be found.

Aspiration by Default
  Description: If all available moves are classified tabu, and are not
  rendered admissible by some other aspiration criteria, then a "least tabu"
  move is selected.
  Example: Revoke the tabu status of all moves with minimum TabuEnd value.

Aspiration by Objective (Global)
  Description: A move aspiration is satisfied if the move yields a solution
  better than the best obtained so far.
  Example: The best total tardiness found so far is 29. The current sequence
  is (4, 1, 5, 3, 6, 2) with T = 39. The move value of the tabu swap (5,2)
  is -20. Then, the tabu status of the swap is revoked and the search moves
  to the new best sequence (4, 1, 2, 3, 6, 5) with T = 19.

Aspiration by Objective (Regional)
  Description: A move aspiration is satisfied if the move yields a solution
  better than the best found in the region where the solution lies.
  Example: The best sequence found in the region defined by all sequences
  (1, 2, 3, *, *, *) is (1, 2, 3, 6, 4, 5) with T = 31. The current solution
  is (1, 4, 3, 2, 6, 5) with T = 23. The swap (4,2) with move value of 6 is
  tabu. The tabu status is revoked because a new regional best
  (1, 2, 3, 4, 6, 5) with T = 29 can be found.

Aspiration by Search Direction
  Description: An attribute can be added and dropped from a solution
  (regardless of its tabu status), if the direction of the search (improving
  or nonimproving) has not changed.
  Example: For the Min k-Tree problem, the edge (11,12) has been recently
  dropped in the current improving phase, making its addition a tabu-active
  attribute. The improving phase can continue if edge (11,12) is now added,
  therefore its tabu status may be revoked.

Aspiration by Influence
  Description: The tabu status of a low influence move may be revoked if a
  high influence move has been performed since establishing the tabu status
  for the low influence move.
  Example: If the low influence swap (1,4) described in Table 8 is classified
  tabu, its tabu status can be revoked after the high influence swap (4,5)
  is performed.

Table 7: Illustrative aspiration criteria.



Characteristics        Swap (1,4)    Swap (4,5)
Move value             0             36
Move distance          1             4
Due date difference    1             12
Influence              Low           High

Table 8: Comparison of two swap moves.


Alternative Neighborhoods.

The relevance of considering alternative neighborhoods can be illustrated by
reference to the following observation. For any given set of k+1 nodes, an
optimal (min weight) k-tree over these nodes can always be found by using
the greedy constructive procedure illustrated in Table 2 to generate a start-
ing solution (restricted to these nodes) or by beginning with an arbitrary
tree on these nodes and performing a succession of static improving moves
(which do not change the node set). The absence of a static improving move
signals that no better solution can be found on this set.
This suggests that tabu search might advantageously be used to guide
the search over a "node-swap" neighborhood instead of an "edge-swap"
neighborhood, where each move consists of adding a non-tree node i and
dropping a tree node j, followed by finding a min weight solution on the
resulting node set. (Since the tree node j may not be a leaf node, and
the reconnections may also not make node i a leaf node in the new tree,
the possibilities are somewhat different than making a dynamic move in the
edge-swap neighborhood.) The tabu tenures may reasonably be defined over
nodes added and dropped, rather than over edges added and dropped.

Critical event memory.

The type of critical event memory used in the illustration of restarting the
TS approach in Section 2.3 may not be best. Generally it is reasonable to
expect that the type of critical event memory used for restarting should be
different from that used to continue the search from the current solution

(when both are applied to drive the search into new regions). Nevertheless,
a form that is popularly used in both situations consists of remembering all
elements contained in solutions previously examined. One reason is that it
is actually easier to maintain such memory than to keep track of elements
that only occur in selected solutions. Also, instead of keeping track only
of which elements occur in past solutions, critical event memory is more
often designed to monitor the frequency with which elements have appeared in
past solutions. Such considerations are amplified in Section 4.

3 Additional Aspects of Short Term Memory


We began the discussion of short term memory for tabu search by contrasting
the TS designs with those of memoryless strategies such as simple or iterated
descent, and by pointing out how candidate list strategies are especially
important for applying TS in the most effective ways. We now describe
types of candidate list strategies that often prove valuable in tabu search
implementations. Then we examine the issues of logical restructuring, which
provide important bridges to longer term considerations.

3.1 Tabu Search and Candidate List Strategies


The aggressive aspect of TS is manifest in choice rules that seek the best
available move that can be determined with an appropriate amount of ef-
fort. As addressed in Section 2, the meaning of best in TS applications is
customarily not limited to an objective function evaluation. Even where the
objective function evaluation may appear on the surface to be the only rea-
sonable criterion to determine the best move, the non-tabu move that yields
a maximum improvement or least deterioration is not always the one that
should be chosen. Rather, as we have noted, the definition of best should
consider factors such as move influence, determined by the search history
and the problem context.
For situations where N*(x) is large or its elements are expensive to
evaluate, candidate list strategies are essential to restrict the number of
solutions examined on a given iteration. In many practical settings, TS is
used to control a search process that may involve the solution of relatively
complex subproblems by way of linear programming or simulation. Because
of the importance TS attaches to selecting elements judiciously, efficient
rules for generating and evaluating good candidates are critical to the search
process. The purpose of these rules is to isolate regions of the neighborhood

containing moves with desirable features and to put these moves on a list of
candidates for current examination.
Before describing the kinds of candidate list strategies that are partic-
ularly useful in tabu search implementations, we note that the efficiency
of implementing such strategies often can be enhanced by using relatively
straightforward memory structures to give efficient updates of move evalu-
ations from one iteration to another. Appropriately coordinated, such up-
dates can appreciably reduce the effort of finding best or near best moves.
In sequencing, for example, the move values often can be calculated
without a full evaluation of the objective function. Intelligent updating can
be useful even where candidate list strategies are not used. However, the
inclusion of explicit candidate list strategies, for problems that are large,
can significantly magnify the resulting benefits. Not only search speed but
also solution quality can be influenced by the use of appropriate candidate
list strategies. Perhaps surprisingly, the importance of such approaches is
often overlooked.

3.2 Some General Classes of Candidate List Strategies


Candidate lists can be constructed from context related rules and from gen-
eral strategies. In this section we focus on rules for constructing candidate
lists that are context-independent. We emphasize that the effectiveness of a
candidate list strategy should not be measured in terms of the reduction of
the computational effort in a single iteration. Instead, a preferable measure
of performance for a given candidate list is the quality of the best solution
found given a specified amount of computer time. For example, a candidate
list strategy intended to replace an exhaustive neighborhood examination
may result in more iterations per unit of time, but may require many more
iterations to match the solution quality of the original method. If the quality
of the best solution found within a desirable time limit (or across a gradu-
ated series of such limits) does not improve, we conclude that the candidate
list strategy is not effective.

3.2.1 Aspiration Plus


The Aspiration Plus strategy establishes a threshold for the quality of a
move, based on the history of the search pattern. The procedure operates
by examining moves until finding one that satisfies this threshold. Upon
reaching this point, additional moves are examined, equal in number to the
selected value Plus, and the best move overall is selected.

To assure that neither too few nor too many moves are considered, this
rule is qualified to require that at least Min moves and at most Max moves
are examined, for chosen values of Min and Max. The interpretation of Min
and Max is as follows. Let First denote the number of moves examined
when the aspiration threshold is first satisfied. Then if Min and Max were
not specified, the total number of moves examined would be First + Plus.
However, if First + Plus < Min, then Min moves are examined, while if
First + Plus > Max, then Max moves are examined. (These conditions may
be viewed as imposing limits on the move that is "effectively" treated as
the First move. For example, if as many as Max - Plus moves are exam-
ined without finding one that satisfies the aspiration threshold, then First
effectively becomes the same as Max - Plus.)
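
The rule can be sketched as follows, assuming higher evaluations denote
better moves; evaluate() and the threshold are placeholders for the
context-dependent quality measure discussed below, and the default
parameter values are those of the illustration that follows.

    # Sketch of the Aspiration Plus candidate list rule.

    def aspiration_plus(moves, evaluate, threshold,
                        plus=5, min_moves=7, max_moves=11):
        examined, first = [], None
        for i, move in enumerate(moves, start=1):
            value = evaluate(move)
            examined.append((value, move))
            if first is None and value >= threshold:
                first = i              # aspiration threshold first satisfied
            if first is None:
                stop = max_moves       # First effectively becomes Max - Plus
            else:
                stop = min(max(first + plus, min_moves), max_moves)
            if i >= stop:
                break
        return max(examined, key=lambda e: e[0])   # best move in the interval
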
This strategy is graphically represented in Figure 11. In this illustration,
the fourth move examined satisfies the aspiration threshold and qualifies as
First. The value of Plus has been selected to be 5, and so 9 moves are
examined in total, selecting the best over this interval. The value of Min,
set at 7, indicates that at least 7 moves will be examined even if First is
so small that First + Plus < 7. (In this case, Min is not very restrictive,
because it only applies if First < 2.) Similarly, the value of Max, set at 11,
indicates that at most 11 moves will be examined even if First is so large
that First + Plus > 11. (Here, Max is strongly restrictive.) The sixth move
examined is the best found in this illustration.
The "Aspiration" line in this approach is an established threshold that
can be dynamically adjusted during the search. For example, during a se-
quence of improving moves, the aspiration may specify that the next move
chosen should likewise be improving, at a level based on other recent moves
and the current objective function value. Similarly, the values of Min and
Max can be modified as a function of the number of moves required to meet
the threshold.
During a nonimproving sequence the aspiration of the Aspiration Plus
rule will typically be lower than during an improving phase, but rise toward
the improving level as the sequence lengthens. The quality of currently ex-
amined moves can shift the threshold, as by encountering moves that signif-
icantly surpass or that uniformly fall below the threshold. As an elementary
option, the threshold can simply be a function of the quality of the initial
Min moves examined on the current iteration.
The Aspiration Plus strategy includes several other strategies as special
cases. For example, a first improving strategy results by setting Plus = 0 and
directing the aspiration threshold to accept moves that qualify as improving,
while ignoring the values of Min and Max. Then First corresponds to the
first move that improves the current value of the objective, if such a move
can be found.

Figure 11: Aspiration Plus Strategy (move quality plotted against the
number of moves examined, marking the aspiration threshold and the First,
Min and Max points).



A slightly more advanced strategy can allow Plus to be increased
or decreased according to the variance in the quality of moves encountered
from among some initial number examined. In general, in applying the
Aspiration Plus strategy, it is important to assure on each iteration that
new moves are examined which differ from those just reviewed. One way of
achieving this is to create a circular list and start each new iteration where
the previous examination left off.

3.2.2 Elite Candidate List

The Elite Candidate List approach first builds a Master List by examining
all (or a relatively large number of) moves, selecting the k best moves en-
countered, where k is a parameter of the process. Then at each subsequent
iteration, the current best move from the Master List is chosen to be exe-
cuted, continuing until such a move falls below a given quality threshold, or
until a given number of iterations have elapsed. Then a new Master List is
constructed and the process repeats. This strategy is depicted in Figure 12,
below.
This technique is motivated by the assumption that a good move, if
not performed at the present iteration, will still be a good move for some
number of iterations. More precisely, after an iteration is performed, the
nature of a recorded move implicitly may be transformed. The assumption
is that a useful proportion of these transformed moves will inherit attractive
properties from their antecedents.
The evaluation and precise identity of a given move on the list must be
appropriately monitored, since one or both may change as a result of execut-
ing other moves from the list. For example, in the Min k-Tree problem the
evaluations of many moves can remain unchanged from one iteration to the
next. However, the identity and evaluation of specific moves will change as
a result of deleting and adding particular edges, and these changes should
be accounted for by appropriate updating (applied periodically if not at
each iteration). An Elite Candidate List strategy can be advantageously
extended by a variant of the Aspiration Plus strategy, allowing some ad-
ditional number of moves outside the Master List to be examined at each
iteration, where those of sufficiently high quality may replace elements of
the Master List.
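
A compact sketch of this strategy, with hypothetical helpers and assuming
higher evaluations are better:

    # Sketch of the Elite Candidate List strategy.

    def build_master_list(all_moves, evaluate, k):
        """Examine all moves and keep the k best as the Master List."""
        return sorted(all_moves, key=evaluate, reverse=True)[:k]

    def run_master_list(master, evaluate, execute, threshold, max_uses):
        for _ in range(max_uses):
            if not master:
                break
            # Re-evaluate: identities and values may change between iterations.
            master.sort(key=evaluate, reverse=True)
            best = master.pop(0)
            if evaluate(best) < threshold:
                break        # quality fell below threshold: rebuild the list
            execute(best)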

Figure 12: Elite candidate list strategy.



3.2.3 Successive Filter Strategy

Moves can often be broken into component operations, and the set of moves
examined can be reduced by restricting consideration to those that yield high
quality outcomes for each operation separately. For example, the choice of
an exchange move that includes an "add component" and a "drop com-
ponent" may restrict attention only to exchanges created from a relatively
small subset of "best add" and "best drop" components. The gain in effi-
ciency can be considerable. If there are 100 add possibilities and 100 drop
possibilities, the number of add/drop combinations is 10,000. However, by
restricting attention to the 8 best add and drop moves, considered indepen-
dently, the number of combinations to examine is only 64. (Values of 8 and
even smaller have been found effective in some practical applications.)
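
A sketch of this filtering for add/drop exchanges, with hypothetical
component and combined evaluators (smaller values better, as in the Min
k-Tree weights):

    from heapq import nsmallest

    # Successive filter strategy: evaluate the add and drop components
    # separately, keep the 8 best of each, and examine only the 64
    # resulting combinations instead of all 10,000.

    def filtered_best_exchange(adds, drops, add_value, drop_value,
                               combined_value, keep=8):
        best_adds = nsmallest(keep, adds, key=add_value)
        best_drops = nsmallest(keep, drops, key=drop_value)
        pairs = [(a, d) for a in best_adds for d in best_drops]
        return min(pairs, key=lambda p: combined_value(*p))
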
The evaluations of the separate components often will give only approx-
imate information about their combined evaluation. Nevertheless, if this in-
formation is good enough to insure a significant number of the best complete
moves will result by combining these apparently best components, then the
approach can yield quite good outcomes. Improved information may be ob-
tained by sequential evaluations, as where the evaluation of one component
is conditional upon the prior (restricted) choices of another. Such strate-
gies of subdividing compound moves into components, and then restricting
consideration of complete compound moves only to those assembled from
components that pass selected thresholds of quality, have proved quite ef-
fective in TS methods for partitioning problems and for telecommunication
channel balancing problems.
Conditional uses of component evaluations are also relevant for sequenc-
ing problems, where a measure can be defined to identify preferred attributes
using information such as due dates, processing times, and delay penalties.
If swap moves are being used, then some jobs are generally better candi-
dates than others to move earlier or later in the sequence. The candidate
list considers those swaps whose composition includes at least one of these
preferred attributes.
In the context of the traveling salesman problem, good solutions are
often primarily composed of edges that are among the 20 to 40 shortest
edges meeting one of their endpoints (depending on various factors). Some
studies have attempted to limit consideration entirely to tours constructed
from such a collection of edges. The successive filter strategy, by contrast,
offers greater flexibility by organizing moves that do not have to be entirely
composed of such special elements, provided one or more of these elements is
incorporated as part of the move. This approach can be frequently controlled

to require little more time than the more restricted standard approach, while
affording a more desirable set of alternatives to consider.

3.2.4 Sequential Fan Candidate List


A type of candidate list that is highly exploitable by parallel processing
is the sequential fan candidate list. The basic idea is to generate some p
best alternative moves at a given step, and then to create a fan of solution
streams, one for each alternative. The several best available moves for each
stream are again examined, and only the p best moves overall (where many
or no moves may be contributed by a given stream) provide the p new
streams at the next step.
In the setting of tree search methods such a sequential fanning process
is sometimes called beam search. For use in the tabu search framework,
TS memory and activation rules can be carried forward with each stream
and hence inherited in the selected continuations. Since a chosen solution
can be assigned to more than one stream, different streams can embody
different missions in TS. Alternatively, when two streams merge into the
same solution, other streams may be started by selecting a neighbor adjacent
to one of the current streams.
The process is graphically represented in Figure 13. Iteration 0 con-
structs an initial solution or alternatively may be viewed as the starting
point for constructing a solution. That is, the sequential fan approach can
be applied using one type of move to create a set of initial solutions, and
then can continue using another type of move to generate additional solu-
tions. (We thus allow a "solution" to be a partial solution as well as a
complete solution.) The best moves from this solution are used to generate
p streams. Then at every subsequent iteration, the overall best moves are
selected to lead the search to p different solutions. Note that since more
than one move may lead the search to the same solution, more than p moves
may be necessary to continue the exploration of p distinct streams.
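
One step of this fanning process can be sketched as below; successors()
and evaluate() are hypothetical helpers, and solutions are assumed hashable
so that merging streams can be detected.

    # One step of the sequential fan: from the current p streams, collect
    # all successor solutions, merge duplicates, and keep the p best overall
    # (a given stream may contribute many continuations or none).

    def fan_step(streams, successors, evaluate, p):
        pool = {}
        for sol in streams:
            for nxt in successors(sol):
                pool[nxt] = evaluate(nxt)  # dict keys merge duplicate solutions
        return sorted(pool, key=pool.get)[:p]  # smaller evaluations are better
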
A more intensive form of the sequential fan candidate list approach,
which is potentially more powerful but requires more work, is to use the
process illustrated in Figure 13 as a "look ahead" strategy. In this case a
limit is placed on the number of iterations that the streams are generated
beyond iteration O. Then the best outcome at this limiting iteration is
used to identify a "best current move" (a single first branch) from iteration
O. Upon executing this move, the step shown as iteration 1 in Figure 13
becomes the new iteration 0, that is, iteration 0 always corresponds to the
current iteration. Then this solution becomes the source of p new streams,
and the process repeats.

Figure 13: Sequential fan candidate list (p solution streams over iterations
0, 1, 2, 3).




There are a number of possible variants of this sequential fan strategy.
For example, instead of selecting a single best branch at the limiting itera-
tion, the method can select a small number of best branches, and thus give
the method a handful of candidates from which to generate p streams at the
new iteration O.
The iteration limit that determines depth of the look ahead can be vari-
able, and the value of p can change at various depths. Also the number of
successors of a given solution that are examined to determine candidates
for the p best continuations can be varied as by progressively reducing this
number at greater depths.
The type of staging involved in successive solution runs of each stream
may be viewed as a means of defining levels in the context of the Proxi-
mate Optimality Principle commonly associated with the strategic oscilla-
tion component of tabu search. Although we will study this principle in
more detail later, we remark that the sequential fan candidate list has a
form that is conveniently suited to exploit it.

3.2.5 Bounded Change Candidate List


A bounded change candidate list strategy is relevant in situations where
an improved solution can be found by restricting the domain of choices so
that no solution component changes by more than a limited degree on any
step. A bound on this degree, expressed by a distance metric appropriate to
the context, is selected large enough to encompass possibilities considered
strategically relevant. The metric may allow large changes along one dimen-
sion, but limit the changes along another so that choices can be reduced and
evaluated more quickly. Such an approach offers particular benefits as part
of an intensification strategy based on decomposition, where the decompo-
sition itself suggests the limits for bounding the changes considered.
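As a minimal illustration (assuming numeric solution vectors and a
caller-chosen bound for each dimension; all names here are ours), the
screening step might be sketched as:

    def bounded_change_candidates(current, proposals, bounds):
        # Keep only those proposed solutions whose change in every dimension
        # stays within the bound chosen for that dimension, e.g., a loose
        # bound along one dimension and a tight bound along another.
        return [prop for prop in proposals
                if all(abs(p - c) <= b
                       for p, c, b in zip(prop, current, bounds))]

For instance, bounds = (100.0, 0.5) tolerates large changes in the first
dimension while sharply limiting those in the second, so the surviving
candidates can be evaluated more quickly.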

3.3 Connections Between Candidate Lists, Tabu Status and Aspiration Criteria

It is useful to summarize the short term memory considerations embodied in
the interaction between candidate lists, tabu status and aspiration criteria.
The operations of these TS short term elements are shown in Figure 14.
The representation of penalties in Figure 14 either as "large" or "very
small" expresses a thresholding effect: either the tabu status yields a greatly
deteriorated evaluation or else it chiefly serves to break ties among solutions
with highest evaluations. Such an effect of course can be modulated to shift
evaluations across levels other than these extremes. If all moves currently
available lead to solutions that are tabu (with evaluations that normally
would exclude them from being selected), the penalties result in choosing a
"least tabu" solution.
The sequence of the tabu test and the aspiration test in Figure 14 evi-
dently can be reversed (that is, by employing the tabu test only if the aspi-
ration threshold is not satisfied). Also, the tabu evaluation can be modified
by creating inducements based on the aspiration level, just as it is modified
by creating penalties based on tabu status. In this sense, aspiration condi-
tions and tabu conditions can be conceived roughly as "mirror images" of
each other.
For convenience Figure 14 expresses tabu restrictions solely in terms of
penalized evaluations, although we have seen that tabu status is often per-
mitted to serve as an all-or-none threshold, without explicit reference to
penalties and inducements (by directly excluding tabu options from being
selected, subject to the outcome of aspiration tests). Whether or not mod-
ified evaluations are explicitly used, the selected move may not be the one
with the best objective function value, and consequently the solution with
the best objective function value encountered throughout the search history
is recorded separately.
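The thresholding logic summarized in Figure 14 can be expressed in a few
lines; the sketch below assumes minimization, and evaluate, is_tabu and
aspires are hypothetical caller-supplied functions rather than part of
the original text:

    LARGE_PENALTY = 10 ** 9    # "large": effectively excludes a tabu move
    SMALL_PENALTY = 1e-6       # "very small": merely breaks ties

    def select_move(candidate_moves, evaluate, is_tabu, aspires,
                    tie_break_only=False):
        penalty = SMALL_PENALTY if tie_break_only else LARGE_PENALTY
        def penalized(move):
            value = evaluate(move)
            # A tabu move keeps its normal evaluation when it satisfies the
            # aspiration criterion; otherwise it is penalized. If every move
            # is tabu, the minimum still returns a "least tabu" choice.
            if is_tabu(move) and not aspires(move, value):
                value += penalty
            return value
        return min(candidate_moves, key=penalized)

Reversing the order of the tabu and aspiration tests, as noted above,
amounts to calling aspires first and consulting is_tabu only when the
aspiration threshold is not met.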

3.4 Logical Restructuring


Logical restructuring is an important element of adaptive memory solution
approaches, which gives a connection between short and long term strategies.
Logical restructuring is implicit in strategic oscillation and path relinking,
which we examine in subsequent sections, but its role and significance in
these strategies is often overlooked. By extension, the general usefulness of
logical restructuring is also often not clearly understood. We examine some
of its principal features before delving into longer term considerations, and
show how it can also be relevant for improving the designs of short term
strategies.
Logical restructuring emerges as a way to meet the combined concerns
of quality and influence. Its goal is to exploit the ways in which influence
(structural, local and global) can uncover improved routes to high quality
solutions. For this purpose, a critical step is to re-design standard strategies
to endow them with the power to ferret out opportunities otherwise missed.
This step particularly relies on integrating two elements: (1) the identifica-
tion of changes that satisfy properties that are essential (and limiting) in
order to achieve improvement, in contrast to changes that simply depart
from what has previously been seen; (2) the use of anticipatory
("means-ends") analysis to bring about such essential changes. Within the
context of anticipatory analysis, logical restructuring seeks to answer
the following questions: "What conditions assure the existence of a
trajectory that will lead to an improved solution?" and "What intermediate
moves can create such conditions?" The "intermediate moves" of the second
question may be generated either by modifying the evaluations used to
select transitions between solutions or by modifying the neighborhood
structure that determines these transitions.

Figure 14: Short term memory operation.

Figure 15: Illustrative Min k-Tree problem.
To illustrate the relevant considerations, we return again to the example
of the Min k-Tree problem discussed in previous sections. We replace the
previous graph by the one shown in Figure 15, but continue to consider the
case of k = 4.
The same rules to execute a first-level tabu search approach as in our
earlier illustrations (including the rules for generating a starting solution)
produce a sequence of steps that quickly reaches the vicinity of the optimal
solution, but requires some effort to actually find this solution. In fact, it is
readily verified that applying these rules will cause all edges of the optimal
solution except one, edge (10,11), to be contained in the union of the two
solutions obtained on iterations 4 and 5. Yet an optimal solution will not
be found until iteration 11.
This delayed process of finding a route to an optimal solution (which
can be greatly magnified for larger or more complex problems) can be sub-
stantially accelerated by means of logical restructuring. More generally,
such restructuring can make it possible to uncover fertile options that can
otherwise be missed entirely.

3.4.1 Restructuring by Changing Evaluations and Neighborhoods


The first type of logical restructuring we illustrate makes use both of mod-
ified evaluations and an amended neighborhood structure. As pointed out
in Section 2.2 earlier, the swap moves we have employed for the Min k-Tree
problem may be subdivided into two types: static swaps, which leave the
nodes of the current tree unchanged, and dynamic swaps, which replace one
of the nodes currently in the tree with another that is not in the tree. This
terminology was chosen to reflect the effect that each swap type has on the
nodes of the tree. Since dynamic swaps in a sense are more influential, we
give them special consideration. We observe that a dynamic swap can select
an edge to be dropped only if it is a terminal edge, i.e., one that meets a
leaf node of the tree, which is a node that is met by only a single tree edge
(the terminal edge).
Although it is usually advantageous to drop an edge with a relatively
large weight, this may not be possible. Thus, we are prompted to consider
an "anticipatory goal" of making moves that cause more heavily weighted
edges to become terminal edges, and hence eligible to be dropped. By this
means, static swaps can be used to set up desirable conditions for dynamic
swaps.
The solution obtained at iteration 4 of the process for solving the example
problem of Figure 15 gives a basis for showing what is involved. We clarify
the situation by showing the current solution at this iteration in Figure 16
(without bothering to identify the solutions obtained at other iterations),
where edges contained in the current tree are shown as heavy edges and the
candidate edges to add to the tree are shown as light edges.
The move that changes the tree at iteration 4 to that of iteration 5-if
the rules illustrated in Section 2 are used-is a dynamic swap that adds
edge (8,11) with a weight of 9 and drops edge (9,10) with a weight of 8.

Figure 16: Solution and candidate edges to add to iteration 4 tree.

We make use of information contained in this choice to construct a more
powerful move using logical restructuring, as follows.
Having identified (8,11) as a candidate to be added, the associated antic-
ipatory goal is to identify a static swap that will change a larger weight edge
into a terminal edge. Specifically, the static swap that adds edge (10,11) and
drops edge (6,10), with a move value of 3, produces a terminal edge from
the relatively high weight edge (6,11) (which has a weight of 13). Since the
candidate edge (8,11) to be added has a weight of 9, the result of joining the
indicated static swap with the subsequent dynamic swap (that respectively
adds and drops (8,11) and (6,11)) will be a net gain. (The static move value
of 3 is joined with the dynamic move value of -4, yielding a result of -1.)
Effectively, such anticipatory analysis leads to a way to extract a fruitful
outcome from a relatively complex set of options by focusing on a simple set
of features. It would be possible to find the same outcome by a more pon-
derous approach that checks all sequences in which a dynamic move follows
a static move. This requires a great deal of computational effort-in fact,
considerably more than involved in the approach without logical restructur-
ing that succeeded in finding an optimal solution at iteration 11 (considering
the trade-off between number of iterations and work per iteration).
By contrast, the use of logical restructuring allows the anticipatory anal-
ysis to achieve the benefits of a more massive exploration of alternatives, but
without incurring the burden of undue computational effort. In this exam-
ple, the restructuring is accomplished directly as follows. First, it is only
necessary to identify the two best edges to add for a dynamic swap (inde-
pendent of matching them with an edge to drop), subject to requiring that
these edges meet different nodes of the tree. (In the tree of iteration 4,
seen in Figure 16, these two edges are (8,11) and (8,12).) Then at the next
step, during the process of looking at candidate static swaps, a modified
"anticipatory move value" is created for each swap that creates a terminal
edge, by subtracting the weight of this edge from the standard move value.
This gives all that is needed to find (and evaluate) a best "combined
move sequence" of the type we are looking for. In particular, every static
move that generates a terminal edge can be combined with a dynamic move
that drops this edge and then adds one of the two "best edges" identified
in the first of the two preceding steps. Hence, the restructuring is completed
by adding the anticipatory move value to the weight of one of these two
edges (appropriately identified) thereby determining a best combined move.
The illustrated process therefore achieves restructuring in two ways-by
modifying customary move values and by fusing certain sequences of moves
into a single compound move.
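The arithmetic of this compound evaluation is compact enough to state
directly; the function names below are ours, and the final check
reproduces the numbers of the example:

    def anticipatory_move_value(static_move_value, terminal_edge_weight):
        # Modified evaluation for a static swap that turns an edge of the
        # given weight into a terminal (hence droppable) edge.
        return static_move_value - terminal_edge_weight

    def combined_move_value(static_move_value, terminal_edge_weight,
                            add_edge_weight):
        # Fuse the static swap with the dynamic swap that drops the newly
        # terminal edge and adds the best eligible edge.
        return (anticipatory_move_value(static_move_value, terminal_edge_weight)
                + add_edge_weight)

    # Figure 16 example: static move value 3, terminal edge (6,11) of weight
    # 13, added edge (8,11) of weight 9, giving 3 - 13 + 9 = -1 (a net gain).
    assert combined_move_value(3, 13, 9) == -1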
Although this example appears on the surface to be highly problem spe-
cific, its basic features are shared by applications that arise in a variety of
problem settings. Later the reader will see how variants of logical restruc-
turing embodied in this illustration are natural components of the strategies
of path relinking and ejection chain constructions.

3.4.2 Threshold Based Restructuring and Induced Decomposition

The second mode of logical restructuring that we illustrate by reference to
the Min k-Tree problem example is more complex (in the sense of inducing a
more radical restructuring), but relatively easy to sketch and also potentially
more powerful.
Consider again the solution produced at iteration 4. This is a local
optimum and also the best solution found up to the current stage of search.
We seek to identify a property that will be satisfied by at least one solution
that has a smaller weight than the weight of this solution (41), and which
will impose useful limits on the composition of such a solution. A property
that in fact must be shared by all "better" solutions can be expressed as a
threshold involving the average weight of the tree edges. This average weight
must be less than the threshold value of 41/4 (i.e., 10 1/4). Since some of the
edges in any improved solution must have weights less than this threshold,
we are motivated to identify such "preferred" edges as a foundation for a
restructured form of the solution approach. In this type of restructuring,
we no longer confine attention to swap moves, but look for ways to link the
preferred edges to produce an improved solution. (Such a restructuring can
be based on threshold values derived from multiple criteria.)

Figure 17: Threshold generated components.
When the indicated strategy is applied to the present example, a large
part of the graph is eliminated, leaving only 3 separate connected compo-
nents: (a) the edge (2,3), (b) the edge (9,10), and (c) the three edges (8,11),
(8,12) and (11,12). The graph that highlights these components is shown in
Figure 17. At this point a natural approach is to link such components by
shortest paths, and then shave off terminal edges if the trees are too large,
before returning to the swapping process. Such an approach will immedi-
ately find the optimal solution that previously was not found until iteration
11.
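A sketch of the decomposition step, assuming edges are given as
(u, v, weight) triples and using a minimal union-find to collect the
components induced by the preferred edges:

    def threshold_components(edges, threshold):
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        # Keep only the "preferred" edges whose weight lies below the threshold.
        preferred = [(u, v, w) for (u, v, w) in edges if w < threshold]
        for u, v, _ in preferred:
            parent[find(u)] = find(v)
        # Group the preferred edges by the component they belong to.
        components = {}
        for u, v, w in preferred:
            components.setdefault(find(u), []).append((u, v, w))
        return list(components.values())

With the iteration-4 tree weight of 41 and k = 4, calling
threshold_components(edges, 41 / 4) on the Figure 15 graph yields the
three components listed above; linking them by shortest paths then
completes the restructured step.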
This second illustrated form of restructuring is a fundamental compo-
nent of the strategic oscillation approach which we describe in more detail
in the next section. A salient feature of this type of restructuring is its
ability to create an induced decomposition of either the solution space or the
problem space. This outcome, coupled with the goal of effectively joining
the decomposed components to generate additional solution alternatives, is
also a basic characteristic of path relinking, which is also examined in the
next section. More particularly, the special instance of path relinking known
as vocabulary building, which focuses on assembling fragments of solutions
into larger units, offers a direct model for generalizing the "threshold de-
composition" strategy illustrated here.
In some applications, specific theorems can be developed about the na-
ture of optimal solutions and can be used to provide relevant designs for
restructuring. The Min k-Tree problem is one for which such a theorem
is available (Glover and Laguna, 1997a). Interestingly, the second form of
restructuring we have illustrated, which is quite basic, exploits several as-
pects of this theorem-although without "knowing" what the theorem is.
In general, logical restructuring and the TS strategies such as path relinking
and strategic oscillation which embody it, appear to behave as if they simi-
larly have a capacity to exploit underlying properties of optimal solutions in
broader contexts, that is, contexts whose features are not sufficiently uniform or
easily characterized to permit the nature of optimal solutions to be expressed
in the form of a theorem.

4 Longer Term Memory


In some applications, the short term TS memory components are sufficient
to produce very high quality solutions. However, in general, TS becomes
significantly stronger by including longer term memory and its associated
strategies. In the longer term TS strategies, the modified neighborhood pro-
duced by tabu search may contain solutions not in the original one, generally
consisting of selected elite solutions (high quality local optima) encountered
at various points in the solution process. Such elite solutions typically are
identified as elements of a regional cluster in intensification strategies, and as
elements of different clusters in diversification strategies. In addition, elite
solution components, in contrast to the solutions themselves, are included
among the elements that can be retained and integrated to provide inputs
to the search process.
Perhaps surprisingly, the use of longer term memory does not require
long solution runs before its benefits become visible. Often its improvements
begin to be manifest in a relatively modest length of time, and can allow
solution efforts to be terminated somewhat earlier than otherwise possible,
due to finding very high quality solutions within an economical time span.
The fastest methods for some types of routing and scheduling problems, for
example, are based on including longer term TS memory. On the other hand,
it is also true that the chance of finding still better solutions as time grows-
in the case where an optimal solution is not already found-is enhanced by
using longer term TS memory in addition to short term memory.

4.1 Frequency-Based Approach


Frequency-based memory provides a type of information that complements
the information provided by recency-based memory, broadening the founda-
tion for selecting preferred moves. Like recency, frequency often is weighted
or decomposed into subclasses by taking account of the dimensions of so-
lution quality and move influence. Also, frequency can be integrated with
recency to provide a composite structure for creating penalties and induce-
ments that modify move evaluations. (Although recency-based memory is
often used in the context of short term memory, it can also be a foundation
of longer term forms of memory.)
For our present purposes, we conceive frequencies to consist of ratios,
whose numerators represent counts expressed in two different measures: a
transition measure-the number of iterations where an attribute changes
(enters or leaves) the solutions visited on a particular trajectory, and a
residence measure-the number of iterations where an attribute belongs to
solutions visited on a particular trajectory, or the number of instances where
an attribute belongs to solutions from a particular subset. The denominators
generally represent one of three types of quantities: (1) the total number of
occurrences of all events represented by the numerators (such as the total
number of associated iterations), (2) the sum (or average) of the numerators,
and (3) the maximum numerator value. In cases where the numerators
represent weighted counts, some of which may be negative, denominator
(3) is expressed as an absolute value and denominator (2) is expressed as a
sum of absolute values (possibly shifted by a small constant to avoid a zero
denominator). The ratios produce transition frequencies that keep track of
how often attributes change, and residence frequencies that keep track of how
often attributes are members of solutions generated. In addition to referring
to such frequencies, thresholds based on the numerators alone can be useful
for indicating when phases of greater diversification are appropriate. (The
thresholds for particular attributes can shift after a diversification phase is
executed.)
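These counts and ratios can be organized as in the following sketch
(ours; the attribute encoding, e.g., an edge as a tuple (i, j), is left
to the caller):

    from collections import Counter

    class FrequencyMemory:
        def __init__(self):
            self.transition = Counter()   # times an attribute entered or left
            self.residence = Counter()    # iterations an attribute was present
            self.iterations = 0

        def record(self, solution_attributes, changed_attributes):
            self.iterations += 1
            self.residence.update(solution_attributes)
            self.transition.update(changed_attributes)

        def ratio(self, counts, attribute, denominator="total"):
            if denominator == "total":    # (1) total associated iterations
                d = self.iterations
            elif denominator == "sum":    # (2) sum of the numerators
                d = sum(counts.values())
            else:                         # (3) maximum numerator value
                d = max(counts.values(), default=0)
            return counts[attribute] / d if d else 0.0

Here memory.ratio(memory.residence, (i, j)) gives a residence frequency
for edge (i, j), and the same call with memory.transition gives the
corresponding transition frequency.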
Residence frequencies and transition frequencies sometimes convey re-
lated information, but in general carry different implications. They are
sometimes confused (or treated identically) in the literature. A noteworthy
distinction is that residence measures, by contrast to transition measures,
are not concerned with the characteristics of a particular solution attribute
or whether it is an attribute that changes in moving from one solution to
another. For example in the Min k-Tree problem, a residence measure may
count the number of times edge (i,j) was part of the solution, while a tran-
sition measure may count the number of times edge (i,j) was added to the
solution. (More complex joint measures, such as the number of times edge
(i,j) was accompanied in the solution by edge (k,l), or was deleted from
the solution in favor of edge (k,l), can also selectively be generated. Such
frequencies relate to the issues of creating more complex attributes out of
simpler ones, and to the strategies of vocabulary building.)
A high residence frequency may indicate that an attribute is highly at-
tractive if the domain consists of high quality solutions, or may indicate
the opposite, if the domain consists of low quality solutions. On the other
hand, a residence frequency that is high (or low) when the domain is chosen
to include both high and low quality solutions may point to an entrenched
(or excluded) attribute that causes the search space to be restricted, and
that needs to be jettisoned (or incorporated) to allow increased diversity.
For example, an entrenched attribute may be a job that is scheduled in the
same position during a sequence of iterations that include both low and high
quality objective function evaluations.
As a further useful distinction, a high transition frequency, in contrast to
a high residence frequency, may indicate an associated attribute is a "crack
filler," that shifts in and out of solutions to perform a fine tuning function.
In this context, a transition frequency may be interpreted as a measure of
volatility. For example, the Min k-Tree problem instance in Figure 4 of
Section 2 contains a number of edges whose weight may give them the role
of crack fillers. Specifically, edges (3,5) and (6,7) both have a weight of
6, which makes them attractive relative to other edges in the graph. Since
these edges are not contained in an optimal solution, there is some likelihood
that they may repeatedly enter and leave the current solution in a manner
to lure the search away from the optimal region. In general, crack fillers
are determined not simply by cost or quality but by structure, as in certain
forms of connectivity. (Hence, for example, the edge (3,5) of Figure 4 does
not repeatedly enter and leave solutions in spite of its cost.) Some subset of
such elements is also likely to be a part of an optimal solution. This subset
can typically be identified with much less difficulty once other elements
are in place. On the other hand, a solution (full or partial) may contain
the "right" crack fillers but offer little clue as to the identity of the other
attributes that will transform the solution into one that is optimal.
Problem       Residence Measure                 Transition Measure

Sequencing    Number of times job j has         Number of times job i has
              occupied position π(j).           exchanged positions with job j.

              Sum of tardiness of job j         Number of times job j has been
              when this job occupies            moved to an earlier position in
              position π(j).                    the sequence.

Min k-Tree    Number of times edge (i,j)        Number of times edge (i,j) has
Problem       has been part of the              been deleted from the current
              current solution.                 solution when edge (k,l) has
                                                been added.

              Sum of total solution weight      Number of times edge (i,j) has
              when edge (i,j) is part of        been added during improving
              the solution.                     moves.

Table 9: Example of frequency measures.

We use a sequencing problem and the Min k-Tree problem as contexts
to further illustrate both residence and transition frequencies. Only numer-
ators are indicated, understanding that denominators are provided by the
conditions (1) to (3) previously defined. The measures are given in Table 9.
Attributes that have greater frequency measures, just as those that have
greater recency measures (i.e., that occur in solutions or moves closer to the
present), can trigger a tabu activation rule if they are based on consecutive
solutions that end with the current solution. However, frequency-based
memory often finds its most productive use as part of a longer term strategy,
which employs incentives as well as restrictions to determine which moves
are selected. In such a strategy, tabu activation rules are translated into
evaluation penalties, and incentives become evaluation enhancements, to
alter the basis for qualifying moves as attractive or unattractive.
To illustrate, in a scheduling setting where a swap neighborhood is used,
an attribute such as a job j with a high residence frequency in position π(j)
may be assigned a strong incentive ("profit") to serve as a swap attribute,
thus resulting in the choice of a move that yields a new sequence π' with
π'(j) ≠ π(j). Such an incentive is particularly relevant in the case where the
TabuEnd value of job j is small compared to the current iteration, since this
value (minus the corresponding tabu tenure) identifies the latest iteration
that job j was a swap attribute, and hence discloses that job j has occupied
position π(j) in every solution since.
Frequency-based memory therefore is usually applied by introducing
graduated tabu states, as a foundation for defining penalty and incentive
values to modify the evaluation of moves. A natural connection exists be-
tween this approach and the recency-based memory approach that creates
tabu status as an all-or-none condition. If the tenure of an attribute in
recency-based memory is conceived as a conditional threshold for applying
a very large penalty, then the tabu classifications produced by such mem-
ory can be interpreted as the result of an evaluation that becomes strongly
inferior when the penalties are activated. Conditional thresholds are also
relevant to determining the values of penalties and incentives in longer term
strategies. Most applications at present, however, use a simple linear mul-
tiple of a frequency measure to create a penalty or incentive term. The
multiplier is adjusted to create the right balance between the incentive or
penalty and the cost (or profit) coefficients of the objective function.

4.2 Intensification Strategies


Intensification strategies are based on modifying choice rules to encourage
move combinations and solution features historically found good. They may
also initiate a return to attractive regions to search them more thoroughly.
A simple instance of this second type of intensification strategy is shown in
Figure 18. The strategy for selecting elite solutions is italicized in Figure 18
due to its importance. Two variants have proved quite successful. One intro-
duces a diversification measure to assure the solutions recorded differ from
each other by a desired degree, and then erases all short term memory before
resuming from the best of the recorded solutions. A diversification measure
may be related to the number of moves that are necessary to transform one
solution into another. Or the measure may be defined independently from
the move mechanism. For example, in sequencing, two solutions may be
considered diverse if the number of swaps needed to move from one to the
other is "large." On the other hand, the diversification measure may be the
number of jobs that occupy a different position in the two sequences being
compared. (This shows that intensification and diversification often work
together, as elaborated in the next section.)
The second variant, which has also proved successful, keeps a bounded-
length sequential list that adds a new solution at the end only if it is better
than any previously seen. The current last member of the list is always
the one chosen (and removed) as a basis for resuming search. However, TS
short term memory that accompanied this solution is also saved, and the
move previously taken from this solution is forbidden as the first move, so
that a new solution path will be launched.

Apply TS short term memory.
Apply an elite selection strategy.
do {
    Choose one of the elite solutions.
    Resume short term memory TS from chosen solution.
    Add new solutions to elite list when applicable.
} while (iterations < limit and list not empty)

Figure 18: Simple TS intensification approach.
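A sketch of this second variant follows (minimization assumed; what the
saved memory contains, such as the tabu list and the forbidden first
move, is application-specific and only indicated here):

    class EliteList:
        def __init__(self, max_len):
            self.entries = []              # (solution, value, saved_memory)
            self.best_value = float("inf")
            self.max_len = max_len

        def offer(self, solution, value, memory):
            # Append only a solution better than any previously seen.
            if value < self.best_value and len(self.entries) < self.max_len:
                self.entries.append((solution, value, memory))
                self.best_value = value

        def next_start(self):
            # The current last member is always the one chosen and removed.
            return self.entries.pop() if self.entries else None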
A third variant of the approach of Figure 18 is related to a strategy
that resumes the search from unvisited neighbors of solutions previously
generated. Such a strategy keeps track of the quality of these neighbors
to select an elite set, and restricts attention to specific types of solutions,
such as neighbors of local optima or neighbors of solutions visited on steps
immediately before reaching such local optima. This type of "unvisited
neighbor" strategy has been little examined. It is noteworthy, however, that
the two variants previously indicated have provided solutions of remarkably
high quality.
Another type of intensification approach is intensification by decomposi-
tion, where restrictions may be imposed on parts of the problem or solution
structure in order to generate a form of decomposition that allows a more
concentrated focus on other parts of the structure. A classical example is
provided by the traveling salesman problem, where edges that belong to
the intersection of elite tours may be "locked into" the solution, in order
to focus on manipulating other parts of the tour. The use of intersections
is an extreme instance of a more general strategy for exploiting frequency
information, by a process that seeks to identify and constrain the values of
strongly determined and consistent variables. We discuss the identification
and use of such variables in Section 4.4.
Intensification by decomposition also encompasses other types of strate-
gic considerations, basing the decomposition not only on indicators of strength
and consistency, but also on opportunities for particular elements to interact
productively. Within the context of a permutation problem as in scheduling
or routing, for example, where solutions may be depicted as selecting one
or more sequences of edges in a graph, a decomposition may be based on
identifying subchains of elite solutions, where two or more subchains may
be assigned to a common set if they contain nodes that are "strongly at-
tracted" to be linked with nodes of other subchains in the set. An edge
disjoint collection of subchains can be treated by an intensification process
that operates in parallel on each set, subject to the restriction that the
identity of the endpoints of the subchains will not be altered. As a result
of the decomposition, the best new sets of subchains can be reassembled to
create new solutions. Such a process can be applied to multiple alternative
decompositions in broader forms of intensification by decomposition.
These ideas are lately finding favor in other procedures, and may provide
a bridge for integrating components of tabu search with components of other
methodologies. We address the connections with these methodologies in
Section 5.

4.3 Diversification Strategies


Search methods based on local optimization often rely on diversification
strategies to increase their effectiveness in exploring the solution space de-
fined by a combinatorial optimization problem. Some of these strategies
are designed with the chief purpose of preventing searching processes from
cycling, i.e., from endlessly executing the same sequence of moves (or more
generally, from endlessly and exclusively revisiting the same set of solutions).
Others are introduced to impart additional robustness or vigor to the search.
Genetic algorithms use randomization in component processes such as com-
bining population elements and applying crossover (as well as occasional
mutation), thus providing an approximate diversifying effect. Simulated an-
nealing likewise incorporates randomization to make diversification a func-
tion of temperature, whose gradual reduction correspondingly diminishes the
directional variation in the objective function trajectory of solutions gener-
ated. Diversification in GRASP (Greedy Randomized Adaptive Search Pro-
cedures) is achieved in a certain sense within repeated construction phases
by means of a random sampling over elements that pass a threshold of at-
tractiveness by a greedy criterion.
In tabu search, diversification is created to some extent by short term
memory functions, but is particularly reinforced by certain forms of longer
term memory. TS diversification strategies, as their name suggests, are
designed to drive the search into new regions. Often they are based on mod-
ifying choice rules to bring attributes into the solution that are infrequently
used. Alternatively, they may introduce such attributes by periodically ap-
plying methods that assemble subsets of these attributes into candidate solu-
tions for continuing the search, or by partially or fully restarting the solution
process. Diversification strategies are particularly helpful when better solu-
tions can be reached only by crossing barriers or "humps" in the solution
space topology.

4.3.1 Modifying Choice Rules


Consider a TS method designed to solve a graph partitioning problem which
uses full and partial swap moves to explore the local neighborhood. The goal
of this problem is to partition the nodes of the graph into two equal subsets
so that the sum of the weights of the edges that join nodes in one subset to
nodes in the other subset is minimized. Full swaps exchange two nodes that
lie in two different sets of the partition. Partial swaps transfer a single node
from one set to the other set. Since full swaps do not modify the number
of nodes in the two sets of the partition, they maintain feasibility, while
partial swaps do not. Therefore, under appropriate guidance, one approach
to generate diversity is to periodically disallow the use of non-improving full
swaps for a chosen duration (after an initial period where the search "settles
down"). The partial swaps must of course be coordinated to allow feasi-
bility to be recovered after achieving various degrees of infeasibility. (This
relates to the approach of strategic oscillation, described in Section 4.4.)
Implemented appropriately, this strategy has the effect of intelligently per-
turbing the current solution, while escaping from a local optimum, to an
extent that the search is directed to a region that is different than the one
being currently explored. The implementation of this strategy as applied to
experimental problems has resulted in significant improvements in problem-
solving efficacy.
The incorporation of partial swaps in place of full swaps in the previous
example can be moderated by using the following penalty function:

MoveValue' = MoveValue + d * Penalty.

This type of penalty approach is commonly used in TS, where the Penalty
value is often a function of frequency measures such as those indicated in
Table 9, and d is an adjustable diversification parameter. Larger d values
correspond to a desire for more diversification. (E.g., nodes that change
sets more frequently are penalized more heavily to encourage the choice of
moves that incorporate other nodes. Negative penalties, or "inducements,"
may also be used to encourage low frequency elements.) The penalty can be
applied to classes of moves as well as to attributes of moves. Thus, during
a phase where full swap moves are excluded, all such moves receive a large
penalty (with a value of d that is effectively infinite).
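In code, the penalized evaluation and the class-level exclusion might be
sketched as follows (the penalty argument would typically be one of the
frequency measures of Table 9):

    EFFECTIVELY_INFINITE = 10 ** 9

    def diversified_move_value(move_value, penalty, d):
        # MoveValue' = MoveValue + d * Penalty; larger d presses harder for
        # diversification, and a negative penalty acts as an inducement.
        return move_value + d * penalty

    def class_excluded_value(move_value, in_excluded_class):
        # During a phase that excludes a move class (e.g., full swaps),
        # every move of that class receives an effectively infinite penalty.
        return move_value + (EFFECTIVELY_INFINITE if in_excluded_class else 0)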
In some applications where d is used to inhibit the selection of "feasibility
preserving" moves, the parameter can be viewed as the reciprocal of a La-
grangean multiplier in that "low" values result in nearly infinite costs for
constraint violation, while "high" values allow searching through infeasi-
ble regions. The adjustment of such a parameter can be done in a way to
provide a strategic oscillation around the feasibility boundary, again as dis-
cussed in Section 4.4. The parameter can also be used to control the amount
of randomization in probabilistic versions of tabu search.
In TS methods that incorporate the simplex method of linear program-
ming, as in "adjacent extreme point approaches" for solving certain nonlin-
ear and mixed-integer programming problems, a diversification phase can be
designed based on the number of times variables become basic. For exam-
ple, a diversification step can give preference to bringing a nonbasic variable
into the basis that has remained out of the basis for a relatively long period
(cumulatively, or since its most recent inclusion, or a combination of the
two). The number of successive iterations such steps are performed, and
the frequency with which they are initiated, are design considerations of the
type that can be addressed, for example, by the approach of target analysis
(see Section 5).

4.3.2 Restarting
Frequency information can be used in different ways to design restarting
mechanisms within tabu search. In a sequencing problem, for example, the
overall frequency of jobs occupying certain positions can be used to bias a
construction procedure and generate new restarting points.
In a TS method for a location/allocation problem, a diversification phase
can be developed using frequency counts on the number of times a depot has
changed its status (from open to closed or vice versa). The diversification
phase can be started from the best solution found during the search. Based
on the frequency information, d depots with the lowest counts are selected
and their status is changed. The search starts from the new solution which
differs from the best by exactly d components. To prevent a quick return to
the best solution, the status of the d depots is also recorded in short term
memory. (This is another case where residence frequency measures may
provide useful alternatives or supplements to transition frequency measures.)
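A sketch of this diversification step with illustrative names, where
status maps each depot to open (True) or closed (False) in the best
solution found, and change_counts records how often each depot has
changed status:

    def diversify_depots(status, change_counts, d):
        # Flip the d depots whose status has changed least often.
        least_changed = sorted(change_counts, key=change_counts.get)[:d]
        new_status = dict(status)
        for depot in least_changed:
            new_status[depot] = not new_status[depot]
        # The flipped depots are returned so they can be recorded in short
        # term memory, preventing a quick return to the best solution.
        return new_status, set(least_changed)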

Additional forms of memory functions are possible when a restarting
mechanism is implemented. For example, in the location/allocation prob-
lem, it is possible to keep track of recent sets of depots that were selected
for diversification and avoid the same selection in the next diversification
phase. Similarly, in a sequencing problem, the positions occupied by jobs in
recent starting points can be recorded to avoid future repetition. This may
be viewed as a very simple form of the critical event memory discussed in
Section 2, and more elaborate forms will often yield greater benefits. The
exploitation of such memory is very important in TS designs that are com-
pletely deterministic, since in these cases a given starting point will always
produce the same search path. Experience also shows, however, that uses of
TS memory to guide probabilistic forms of restarting can likewise yield ben-
efits (Rochat and Taillard, 1995; Fleurent and Glover, 1996; Lokketangen
and Glover, 1996).

Before concluding this section, it is appropriate to provide a word of
background about the orientation underlying diversification strategies within
the tabu search framework. Often there appears to be a hidden assumption
that diversification is somehow tantamount to randomization. Certainly
the introduction of a random element to achieve a diversifying effect is a
widespread theme among search procedures, and is fundamental to the op-
eration of simulated annealing and genetic algorithms. From an abstract
standpoint, there is clearly nothing wrong with equating randomization and
diversification, but to the extent that diversity connotes differences among
elements of a set, and to the extent that establishing such differences is
relevant to an effective search strategy, then the popular use of randomiza-
tion is at best a convenient proxy (and at worst a haphazard substitute) for
something quite different.

When randomization is used as part of a restarting mechanism, for ex-
ample, frequency information can be employed to approximate probability
distributions that bias the construction process. In this way, randomization
is not a "blind" mechanism, but instead it is guided by search history. We
examine inappropriate roles of randomization in Section 4.6, where we also
explore the intensification / diversification distinction more thoroughly.
(Level of functional values on the vertical axis, oscillating about the
oscillation boundary; iterations 0 to 3 on the horizontal axis)
Figure 19: Strategic oscillation.

4.4 Strategic Oscillation


Strategic oscillation is closely linked to the origins of tabu search, and pro-
vides a means to achieve an effective interplay between intensification and
diversification over the intermediate to long term. The recurring usefulness
of this approach documented in a variety of studies warrants a more detailed
examination of its characteristics.
Strategic oscillation operates by orienting moves in relation to a critical
level, as identified by a stage of construction or a chosen interval of functional
values. Such a critical level or oscillation boundary often represents a point
where the method would normally stop. Instead of stopping when this
boundary is reached, however, the rules for selecting moves are modified, to
permit the region defined by the critical level to be crossed. The approach
then proceeds for a specified depth beyond the oscillation boundary, and
turns around. The oscillation boundary again is approached and crossed,
this time from the opposite direction, and the method proceeds to a new
turning point (see Figure 19).
The process of repeatedly approaching and crossing the critical level from
different directions creates an oscillatory behavior, which gives the method
its name. Control over this behavior is established by generating modi-
fied evaluations and rules of movement, depending on the region navigated
and the direction of search. The possibility of retracing a prior trajectory
is avoided by standard tabu search mechanisms, like those established by
recency-based and frequency-based memory functions.
A simple example of this approach occurs for the multidimensional knap-
sack problem, where values of zero-one variables are changed from 0 to 1
until reaching the boundary of feasibility. The method then continues into
the infeasible region using the same type of changes, but with a modified
evaluator. After a selected number of steps, the direction is reversed by
choosing moves that change variables from 1 to O. Evaluation criteria to
drive toward improvement vary according to whether the movement occurs
inside or outside the feasible region (and whether it is directed toward or
away from the boundary), accompanied by associated restrictions on admis-
sible changes to values of variables. The turnaround towards feasibility can
also be triggered by a maximum infeasibility value, which defines the depth
of the oscillation beyond the critical level (i.e., the feasibility boundary).
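A single oscillation cycle for this example might be sketched as follows
(one-dimensional for brevity, with ratio-based evaluators standing in
for the modified evaluators described above, and without the TS memory
that would prevent retracing the path on repeated cycles):

    def one_oscillation(values, weights, capacity, depth):
        def load(x):
            return sum(w for w, xi in zip(weights, x) if xi)
        n = len(values)
        x = [0] * n
        by_ratio = sorted(range(n), key=lambda i: values[i] / weights[i])
        # Constructive sweep: set variables to 1, best value/weight ratio
        # first, continuing depth additions beyond the feasibility boundary.
        beyond = 0
        for i in reversed(by_ratio):
            x[i] = 1
            if load(x) > capacity:
                beyond += 1
                if beyond >= depth:
                    break
        # Turnaround: the reversed direction uses a modified rule (worst
        # ratio first), setting variables back to 0 until feasibility returns.
        for i in by_ratio:
            if load(x) <= capacity:
                break
            x[i] = 0
        return x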
A somewhat different type of application occurs for graph theory prob-
lems where the critical level represents a desired form of graph structure,
capable of being generated by progressive additions (or insertions) of basic
elements such as nodes, edges, or subgraphs. One type of strategic oscilla-
tion approach for this problem results by a constructive process of introduc-
ing elements until the critical level is reached, and then introducing further
elements to cross the boundary defined by the critical level. The current
solution may change its structure once this boundary is crossed (as where a
forest becomes transformed into a graph that contains loops), and hence a
different neighborhood may be required, yielding modified rules for selecting
moves. The rules again change in order to proceed in the opposite direc-
tion, removing elements until again recovering the structure that defines the
critical level.
In the Min k-Tree problem, for example, edges can be added beyond the
critical level defined by k. Then a rule to delete edges must be applied. The
rule to delete edges will typically be different in character from the one used
for adding (i.e., will not simply be its "inverse"). In this case, all feasible
solutions lie on the oscillation boundary, since any deviation from this level
results in solutions with more or fewer than k edges.
Such rule changes are typical features of strategic oscillation, and provide
an enhanced heuristic vitality. The application of different rules may be
accompanied by crossing a boundary to different depths on different sides.
An option is to approach and retreat from the boundary while remaining on
a single side, without crossing (i.e., electing a crossing of "zero depth").
These examples constitute a constructive/destructive type of strategic
oscillation, where constructive steps "add" elements (or set variables to
1) and destructive steps "drop" elements (or set variables to 0). (Types
of TS memory structures for add/drop moves discussed in Section 2 are
relevant for such procedures.) One-sided oscillations (that remain on a sin-
gle side of a critical boundary) are appropriate in a variety of scheduling
and graph-related applications, where constructive processes are tradition-
ally applied. The alternation with destructive processes that strategically
dismantle and then re-build successive trial solutions affords a potent en-
hancement of more traditional procedures. In both one-sided and two-sided
oscillation approaches it is frequently important to spend additional search
time in regions close to the critical level, and especially to spend time at
the critical level itself. This may be done by inducing a sequence of tight
oscillations about the critical level, as a prelude to each larger oscillation
that proceeds to a greater depth. Alternately, if greater effort is permitted
for evaluating and executing each move, the method may use "exchange
moves" (broadly interpreted) to stay at the critical level for longer periods.
In the case of the Min k-Tree problem, for example, once the oscillation
boundary has been reached, the search can stay on it by performing swap
moves (either of nodes or edges). An option is to use such exchange moves
to proceed to a local optimum each time the critical level is reached.
When the level or functional values in Figure 19 refer to degrees of fea-
sibility and infeasibility, a vector-valued function associated with a set of
problem constraints can be used to control the oscillation. In this case, con-
trolling the search by bounding this function can be viewed as manipulating
a parameterization of the selected constraint set. A preferred alternative
is often to make the function a Lagrangean or surrogate constraint penalty
function, avoiding vector-valued functions and allowing tradeoffs between
degrees of violation of different component constraints.
Intensification processes can readily be embedded in strategic oscilla-
tion by altering choice rules to encourage the incorporation of particular
attributes, or at the extreme, by locking such attributes into the solution
for a period. Such processes can be viewed as designs for exploiting strongly
determined and consistent variables. A strongly determined variable is one
that cannot change its value in a given high quality solution without se-
riously degrading quality or feasibility, while a consistent variable is one
that frequently takes on a specific value (or a highly restricted range of val-
ues) in good solutions. The development of useful measures of "strength"
and "consistency" is critical to exploiting these notions, particularly by
accounting for tradeoffs determined by context. However, straightforward
uses of frequency-based memory for keeping track of consistency, sometimes
weighted by elements of quality and influence, have produced methods with
very good performance outcomes.
An example of where these kinds of approaches are also beginning to
find favor in other settings occurs in recently developed variants of genetic
algorithms for sequencing problems. The more venturesome of these ap-
proaches are coming to use special forms of "crossover" to assure offspring
will receive attributes shared by good parents, thus incorporating a type
of intensification based on consistency. Extensions of such procedures us-
ing TS ideas of identifying elements that qualify as consistent and strongly
determined according to broader criteria, and making direct use of mem-
ory functions to establish this identification, provide an interesting area for
investigation. (Additional links to GA methods, and ways to go beyond
current explorations of such methods, are discussed in Section 5.)
Longer term processes, following the type of progression customarily
found beneficial in tabu search, may explicitly introduce supplemental di-
versification strategies into the oscillation pattern. When oscillation is based
on constructive and destructive processes, the repeated application of con-
structive phases (rather than moving to intermediate levels using destructive
moves) embodies an extreme type of oscillation that is analogous to a restart
method. In this instance the restart point is always the same (i.e., a null
state) instead of consisting of different initial solutions, and hence it is im-
portant to use choice rule variations to assure appropriate diversification.
A connection can also be observed between an extreme version of strate-
gic oscillation-in this case a relaxed version-and the class of procedures
known as perturbation approaches. An example is the subclass known as
"large-step simulated annealing" or "large-step Markov chain" methods
(Martin, Otto and Felten, 1991 and 1992; Johnson, 1990; Lourenco and
Zwijnenburg, 1996; Hong, Kahng and Moon, 1997). Such methods try to
drive an SA procedure (or an iterated descent procedure) out of local op-
timality by propelling the solution a greater distance than usual from its
current location.
Perturbation methods may be viewed as loosely structured procedures
for inducing oscillation, without reference to intensification and diversifica-
tion and their associated implementation strategies. Similarly, perturbation
methods are not designed to exploit tradeoffs created by parametric varia-
tions in elements such as different types of infeasibility, measures of displace-
ment from different sides of boundaries, etc. Nevertheless, at a first level
of approximation, perturbation methods seek goals similar to those pursued
by strategic oscillation.

4.5 Path Relinking


A useful integration of intensification and diversification strategies occurs in
the approach called path relinking. This approach generates new solutions
by exploring trajectories that connect elite solutions-by starting from one
of these solutions, called an initiating solution, and generating a path in the
neighborhood space that leads toward the other solutions, called guiding
solutions. This is accomplished by selecting moves that introduce attributes
contained in the guiding solutions.
The approach may be viewed as an extreme (highly focused) instance of
a strategy that seeks to incorporate attributes of high quality solutions, by
creating inducements to favor these attributes in the moves selected. How-
ever, instead of using an inducement that merely encourages the inclusion
of such attributes, the path relinking approach subordinates all other con-
siderations to the goal of choosing moves that introduce the attributes of
the guiding solutions, in order to create a "good attribute composition" in
the current solution. The composition at each step is determined by choos-
ing the best move, using customary choice criteria, from the restricted set
of moves that incorporate a maximum number (or a maximum weighted
value) of the attributes of the guiding solutions. As in other applications of
TS, aspiration criteria can override this restriction to allow other moves of
particularly high quality to be considered.
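A sketch of one relinking pass over 0-1 attribute vectors, assuming
minimization and a caller-supplied evaluate function; at each step the
choice is restricted to flips that introduce an attribute of the guiding
solution (aspiration overrides are omitted for brevity):

    def relink(initiating, guiding, evaluate):
        current = list(initiating)
        best, best_value = list(current), evaluate(current)
        differing = [i for i in range(len(current)) if current[i] != guiding[i]]
        while differing:
            def value_after_flip(i):
                current[i] = guiding[i]
                v = evaluate(current)
                current[i] = 1 - guiding[i]   # restore (binary attributes)
                return v
            # Best customary evaluation among the attribute-introducing moves.
            i = min(differing, key=value_after_flip)
            current[i] = guiding[i]
            differing.remove(i)
            v = evaluate(current)
            if v < best_value:
                best, best_value = list(current), v
        return best, best_value

The best intermediate solution returned here is a natural candidate from
which to launch a new search.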
Specifically, upon identifying a collection of one or more elite solutions
to guide the path of a given solution, the attributes of these guiding solu-
tions are assigned preemptive weights as inducements to be selected. Larger
weights are assigned to attributes that occur in greater numbers of the guid-
ing solutions, allowing bias to give increased emphasis to solutions with
higher quality or with special features (e.g., complementing those of the
solution that initiated the new trajectory).
More generally, it is not necessary for an attribute to occur in a guiding
solution in order to have a favored status. In some settings attributes can
share degrees of similarity, and in this case it can be useful to view a solution
vector as providing "votes" to favor or discourage particular attributes.
Usually the strongest forms of aspiration criteria are relied upon to overcome
this type of choice rule.
In a given collection of elite solutions, the role of initiating solution and
guiding solutions can be alternated. The distinction between initiating solu-
tions and guiding solutions effectively vanishes in such cases. For example,
a set of current solutions may be generated simultaneously, extending dif-
ferent paths, and allowing an initiating solution to be replaced (as a guiding
solution for others) whenever its associated current solution satisfies a suf-
ficiently strong aspiration criterion.
Because their roles are interchangeable, the initiating and guiding so-
lutions are collectively called reference solutions. These reference solutions
can have different interpretations depending on the solution framework un-
der consideration. Reference points can be created by any of a number of
different heuristics that result in high quality solutions.
An idealized form of such a process is shown in Figure 20. The chosen
collection of reference solutions consists of the three members, A, B, and C.
Paths are generated by allowing each to serve as initiating solution, and by
allowing either one or both of the other two solutions to operate as guid-
ing solutions. Intermediate solutions encountered along the paths are not
shown. The representation of the paths as straight lines of course is oversim-
plified, since choosing among available moves in a current neighborhood will
generally produce a considerably more complex trajectory. Intensification
can be achieved by generating paths from similar solutions, while diversi-
fication is obtained by creating paths from dissimilar solutions. Appropriate
aspiration criteria allow deviation from the paths at attractive neighbors.
As Figure 20 indicates, at least one path continuation is allowed beyond
each initiating/guiding solution. Such a continuation can be accomplished
by penalizing the inclusion of attributes dropped during a trajectory, includ-
ing attributes of guiding solutions that may be compelled to be dropped in
order to continue the path. (An initiating solution may also be repelled from
the guiding solutions by penalizing the inclusion of their attributes from the
outset.) Probabilistic TS variants operate in the path relinking setting, as
they do in others, by translating evaluations for deterministic rules into
probabilities of selection, strongly biased to favor higher evaluations.
Promising regions are searched more thoroughly in path relinking by
modifying the weights attached to attributes of the guiding solutions, and
by altering the bias associated with solution quality and selected solution
features. Figure 21 depicts the type of variation that can result, where
the point X represents an initiating solution, the points A, B and C repre-
sent guiding solutions, and the dashed, dotted and solid lines are different
searching paths. For appropriate choices of the reference points (and neigh-
borhoods for generating paths from them), the notion called the Principle of
Proximate Optimality (Glover and Laguna, 1997) suggests that additional
elite points are likely to be found in the regions traversed by the paths, upon
launching new searches from high quality points on these paths.
(Reference solutions A, B and C, each serving in turn as initiating and
guiding solution, joined by generated paths)
Figure 20: Path relinking in neighborhood space.


(Initiating solution X, guiding solutions A, B and C, and alternative
search paths shown as dashed, dotted and solid lines)
Figure 21: Path relinking by attributive bias.


4.5.1 Roles in Intensification and Diversification


Path relinking, in common with strategic oscillation, gives a natural foun-
dation for developing intensification and diversification strategies. Intensi-
fication strategies in this setting typically choose reference solutions to be
elite solutions that lie in a common region or that share common features.
Similarly, diversification strategies based on path relinking characteristically
select reference solutions that come from different regions or that exhibit
contrasting features. Diversification strategies may also place more empha-
sis on paths that go beyond the reference points. Collections of reference
points that embody such conditions can be usefully determined by clustering
and conditional analysis methods.
These alternative forms of path relinking also offer a convenient basis
for parallel processing, contributing to the approaches for incorporating in-
tensification and diversification tradeoffs into the design of parallel solution
processes generally.

4.5.2 Incorporating Alternative Neighborhoods


Path relinking strategies in tabu search can occasionally profit by employ-
ing different neighborhoods and attribute definitions than those used by the
heuristics for generating the reference solutions. For example, it is some-
times convenient to use a constructive neighborhood for path relinking, i.e.,
one that permits a solution to be built in a sequence of constructive steps
(as in generating a sequence of jobs to be processed on specified machines
using dispatching rules). In this case the initiating solution can be used
to give a beginning partial construction, by specifying particular attributes
(such as jobs in particular relative or absolute sequence positions) as a basis
for remaining constructive steps. Similarly, path relinking can make use of
destructive neighborhoods, where an initial solution is "overloaded" with
attributes donated by the guiding solutions, and such attributes are progres-
sively stripped away or modified until reaching a set with an appropriate
composition.
When path relinking is based on constructive neighborhoods, the guiding
solution(s) provide the attribute relationships that give options for subse-
quent stages of construction. At an extreme, a full construction can be
produced, by making the initiating solution a null solution. The destructive
extreme starts from a "complete set" of solution elements. Constructive
and destructive approaches differ from transition approaches by typically
producing only a single new solution, rather than a sequence of solutions,
on each path that leads from the initiating solution toward the others. In
this case the path will never reach the additional solutions unless a transition
neighborhood is used to extend the constructive neighborhood.
Constructive neighborhoods can often be viewed as a special case of feasi-
bility restoring neighborhoods, since a null or partially constructed solution
does not satisfy all conditions to qualify as feasible. Similarly, destructive
neighborhoods can also represent an instance of a feasibility restoring func-
tion, as where an excess of elements may violate explicit problem constraints.
A variety of methods have been devised to restore infeasible solutions to fea-
sibility, as exemplified by flow augmentation methods in network problems,
subtour elimination methods in traveling salesman and vehicle routing prob-
lems, alternating chain processes in degree-constrained subgraph problems,
and value incrementing and decrementing methods in covering and multidi-
mensional knapsack problems. Using neighborhoods that permit restricted
forms of infeasibilities to be generated, and then using associated neighbor-
hoods to remove these infeasibilities, provides a form of path relinking with
useful diversification features. Upon further introducing transition neigh-
borhoods, with the ability to generate successive solutions with changed
attribute mixes, the mechanism of path relinking also gives a way to tunnel
through infeasible regions. The following is a summary of the components
of path relinking:

Step 1. Identify the neighborhood structure and associated solution at-
tributes for path relinking (possibly different from those of other TS strate-
gies applied to the problem).

Step 2. Select a collection of two or more reference solutions, and iden-
tify which members will serve as the initiating solution and the guiding
solution(s). (Reference solutions can be infeasible, such as "incomplete"
or "overloaded" solution components treated by constructive or destructive
neighborhoods.)

Step 3. Move from the initiating solution toward (or beyond) the guiding
solution(s), generating one or more intermediate solutions as candidates to
initiate subsequent problem solving efforts. (If the first phase of this step
creates an infeasible solution, apply an associated second phase with a fea-
sibility restoring neighborhood.)
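
A minimal sketch of these three steps, assuming hypothetical problem-specific
functions neighbors, distance (an attribute-based measure of how far apart
two solutions are), quality, and an optional restore_feasibility, might read
as follows; it is one possible reading of the summary above, not a definitive
implementation:

    def path_relink(initiating, guiding, neighbors, distance, quality,
                    restore_feasibility=None):
        current = initiating
        candidates = []
        while distance(current, guiding) > 0:
            # Step 3: among neighboring solutions, favor those that reduce
            # the attribute distance to the guiding solution, breaking
            # ties by solution quality.
            closer = [s for s in neighbors(current)
                      if distance(s, guiding) < distance(current, guiding)]
            if not closer:
                break
            current = max(closer, key=quality)
            if restore_feasibility is not None:
                current = restore_feasibility(current)  # second phase of Step 3
            candidates.append(current)
        return candidates  # intermediate solutions for later search efforts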

In Section 5 we will see how the path relinking strategy relates to a
strategy called scatter search, which provides additional insights into the
nature of both approaches.

4.6 The Intensification / Diversification Distinction


The relevance of the intensification/diversification distinction is supported
by the usefulness of TS strategies that embody these notions. Although
both operate in the short term as well as the long term, we have seen that
longer term strategies are generally the ones where these notions find their
greatest application.
In some instances we may conceive of intensification as having the func-
tion of an intermediate term strategy, while diversification applies to consid-
erations that emerge in the longer run. This view comes from the observation
that in human problem solving, once a short term strategy has exhausted its
efficacy, the first (intermediate term) response is often to focus on the events
where the short term approach produced the best outcomes, and to try to
capitalize on elements that may be common to those events. When this
intensified focus on such events likewise begins to lose its power to uncover
further improvement, more dramatic departures from a short term strategy
are undertaken. (Psychologists do not usually differentiate between interme-
diate and longer term memory, but the fact that memory for intensification
and diversification can benefit from such differentiation suggests that there
may be analogous physical or functional differences in human memory struc-
tures.) Over the truly long term, however, intensification and diversification
repeatedly come into play in ways where each depends on the other, not
merely sequentially, but also simultaneously.
There has been some confusion between the terms intensification and
diversification, as applied in tabu search, and the terms exploitation and
exploration, as popularized in the literature of genetic algorithms. The dif-
ferences between these two sets of notions deserve to be clarified, because
they have substantially different consequences for problem solving.
The exploitation/exploration distinction comes from control theory, where
exploitation refers to following a particular recipe (traditionally memoryless)
until it fails to be effective, and exploration then refers to instituting a se-
ries of random changes-typically via multi-armed bandit schemes-before
reverting to the tactical recipe. (The issue of exploitation versus explo-
ration concerns how often and under what circumstances the randomized
departures are launched.)
By contrast, intensification and diversification in tabu search are both
processes that take place when simpler exploitation designs play out and lose
their effectiveness-although as we have noted, the incorporation of mem-
ory into search causes intensification and diversification also to be manifest
in varying degrees even in the short range. (Similarly, as we have noted,
intensification and diversification are not opposed notions, for the best form
of each contains aspects of the other, along a spectrum of alternatives.)
Intensification and diversification are likewise different from the control
theory notion of exploration. Diversification, which is sometimes confused
with exploration, is not a recourse to a Game of Chance for shaking up
the options invoked, but is a collection of strategies-again taking advan-
tage of memory-designed to move purposefully rather than randomly into
uncharted territory.
The source of these differences is not hard to understand. Researchers
and practitioners in the area of search methods have had an enduring love af-
fair with randomization, perhaps influenced by the much publicized Heisen-
berg Uncertainty Principle in Quantum Mechanics. Einstein's belief that
God does not roll dice is out of favor, and many find a special enchantment
in miraculous events where blind purposelessness creates useful order. (We
are less often disposed to notice that this way of producing order requires an
extravagant use of time, and that order, once created, is considerably more
effective than randomization in creating still higher order.)
Our "scientific" reports of experiments with nature reflect our fasci-
nation with the role of chance. When apparently chaotic fluctuations are
brought under control by random perturbations, we seize upon the random
element as the key, while downplaying the importance of attendant restric-
tions on the setting in which randomization operates. The diligently con-
cealed message is that under appropriate controls, perturbation is effective
for creating desired patterned outcomes-and in fact, if the system and at-
tendant controls are sufficiently constrained, perturbation works even when
random. (Instead of accentuating differences between workable and unwork-
able kinds of perturbation, in our quest to mold the universe to match our
mystique we portray the central consideration to be randomization versus
nonrandomization.)
The tabu search orientation evidently contrasts with this perspective.
As manifest in the probabilistic TS variant, elements subjected to random
influence are preferably to be strongly confined, and uses of randomization
are preferably to be modulated through well differentiated probabilities.
In short, the situations where randomization finds a place are very highly
structured. From this point of view God may play with dice, but beyond
any question the dice are loaded.

4.7 Some Basic Memory Structures for Longer Term Strategies
To give a foundation for describing fundamental types of memory structures
for longer term strategies, we first briefly review the form of the recency-
based memory structure introduced in Section 2 for handling add/drop
moves. However, we slightly change the notation, to provide a convenient
way to refer to a variety of other types of moves.

4.7.1 Conventions
Let S = {1, 2, ..., s} denote an index set for a collection of solution at-
tributes. For example, the indexes i E S may correspond to indexes of
zero-one variables Xi, or they may be indexes of edges that may be added
to or deleted from a graph, or the job indexes in a production scheduling
problem. More precisely, by the attribute/element distinction discussed in
Section 2, the attributes referenced by S in these cases consist of the spe-
cific values assigned to the variables, the specific add/drop states adopted
by the edges, or positions occupied by the jobs. In general, to give a corre-
spondence with developments of Section 3, an index i E S can summarize
more detailed information; e.g., by referring to an ordered pair (j,k) that
summarizes a value assignment Xj = k or the assignment of job j to position
k, etc. Hence, broadly speaking, the index i may be viewed as a notational
convenience for representing a pair or a vector.
To keep our description at the simplest level, suppose that each i E S
corresponds to a 0-1 variable Xi. As before, we let Iter denote the counter
that identifies the current iteration, which starts at 0 and increases by 1
each time a move is made.
For recency-based memory, following the approach indicated in Section 2,
when a move is executed that causes a variable Xi to change its value, we
record TabuStart(i) = Iter immediately after updating the iteration counter.
(This means that if the move has resulted in Xi = 1, then the attribute
Xi = 0 becomes tabu-active at the iteration TabuStart(i).) Further, we let
TabuTenure(i) denote the number of iterations this attribute will remain
tabu-active. Thus, by our previous design, the recency-based tabu criterion
says that the previous value of Xi is tabu-active throughout all iterations
such that

    TabuStart(i) + TabuTenure(i) ≥ Iter.

Similarly, in correspondence with earlier remarks, the value TabuStart(i)
can be set to 0 before initiating the method, as a convention to indicate that
no prior history exists. Then we automatically avoid assigning a tabu-active
status to any variable with TabuStart(i) = 0 (since the starting value for
variable Xi has not yet been changed).
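
In code, the tenure test reduces to a single comparison. The following
sketch (Python, with array names taken from the text) assumes TabuStart and
TabuTenure are arrays indexed by i:

    def is_tabu_active(i, Iter, TabuStart, TabuTenure):
        # TabuStart[i] == 0 is the convention for "no prior history",
        # so the previous value of the variable cannot yet be tabu-active.
        if TabuStart[i] == 0:
            return False
        # Tabu-active while the tenure has not expired.
        return TabuStart[i] + TabuTenure[i] >= Iter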

4.7.2 Frequency-Based Memory

By our foregoing conventions, allowing the set S = {1, 2, ..., s} for illus-
tration purposes to refer to indexes of 0-1 variables Xi, we may indicate
structures to handle frequency-based memory as follows.
Transition frequency-based memory is by far the simplest to handle. A
transition memory, Transition(i), to record the number of times Xi changes
its value, can be maintained simply in the form of a counter for Xi that is
incremented at each move where such a change occurs. Since Xi is a zero-
one variable, Transition(i) also discloses the number of times Xi changes to
and from each of its possible assigned values. In more complex situations,
by the conventions already noted, a matrix memory Transition(j, k) can
be used to determine numbers of transitions involving assignments such
as Xj = k. Similarly, a matrix memory may be used in the case of the
sequencing problem where both the index of job j and position k may be of
interest. In the context of the Min k-Tree problem, an array dimensioned
by the number of edges can maintain a transition memory to keep track
of the number of times that specific edges have been brought in and out
of the solution. A matrix based on the edges can also identify conditional
frequencies. For example, the matrix Transition(j, k) can be used to count
the number of times edge j replaced edge k. It should be kept in mind in using
transition frequency memory that penalties and inducements are often based
on relative numbers (rather than absolute numbers) of transitions, hence
requiring that recorded transition values be divided by the total number
of iterations (or the total number of transitions). As noted earlier, other
options include dividing by the current maximum transition value. Raising
transition values to a power, as by squaring, is often useful to accentuate
the differences in relative frequencies.
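
A minimal sketch of such a transition memory for 0-1 variables, including a
relative frequency raised to a power for use in penalties and inducements,
might be (illustrative names and values):

    s = 100                  # number of attributes (illustrative value)
    Transition = [0] * s     # Transition[i] counts the value changes of x_i

    def record_transition(i):
        Transition[i] += 1   # called at each move that changes x_i

    def relative_frequency(i, total_iterations, power=2):
        # Penalties and inducements use relative rather than absolute
        # counts; raising to a power (e.g., squaring) accentuates the
        # differences in relative frequencies.
        return (Transition[i] / max(1, total_iterations)) ** power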
Residence memory requires only slightly more effort to maintain than
transition memory, by taking advantage of the recency-based memory stored
in TabuStart(i). The following approach can be used to track the number
of solutions in which Xi = 1, thereby allowing the number of solutions in
which Xi = 0 to be inferred from this. Start with Residence(i) = 0 for all i.
Then, whenever Xi changes from 1 to 0, after updating Iter but before
updating TabuStart(i), set

    Residence(i) = Residence(i) + Iter - TabuStart(i).

    Iter    Assignment    Residence
     0      x = 0         0
    10      x = 1         0
    22      x = 0         22 - 10 = 12
    50      x = 1         12
    73      x = 0         12 + 73 - 50 = 35

Table 10: Illustrative residence memory.

Then, during iterations when Xi = 0, Residence(i) correctly stores the
number of earlier solutions in which Xi = 1. During iterations when Xi =
1, the true value of Residence(i) is the right hand side of the preceding as-
signment; however, the update only has to be made at the indicated points
when Xi changes from 1 to 0. Table 10 illustrates how this memory structure
works when used to track the assignments of a variable x during 100 itera-
tions. The variable is originally assigned a value of zero by a construction
procedure that generates an initial solution. In iteration 10 a move is made
that changes the assignment of x from zero to one; however, the Residence
value remains at zero. Residence is updated at iterations 22 and 73, when
moves are made that change the assignment of x from 1 to 0. At iteration
65, for example, x has received a value of 1 for 27 iterations (i.e., Residence +
Iter - TabuStart = 12 + 65 - 50 = 27), while at iteration 90 the count is 35
(i.e., the value of Residence).
As with transition memory, residence memory should be translated into
a relative measure as a basis for creating penalties and inducements.
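
A sketch of this update scheme, consistent with the trace in Table 10
(Python; x is assumed to hold the current variable values), might be:

    def record_drop_to_zero(i, Iter, TabuStart, Residence):
        # Called when x_i changes from 1 to 0, after updating Iter but
        # before updating TabuStart[i].
        Residence[i] += Iter - TabuStart[i]

    def residence_count(i, x, Iter, TabuStart, Residence):
        # Number of earlier solutions in which x_i = 1; while x_i = 1 the
        # pending term Iter - TabuStart[i] is added on the fly.
        if x[i] == 1:
            return Residence[i] + Iter - TabuStart[i]
        return Residence[i]

For instance, at iteration 65 in Table 10, residence_count returns
12 + 65 - 50 = 27, matching the count given above.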
The indicated memory structures can readily be applied to multivalued
variables (or multistate attributes) by the extended designs illustrated in
Section 3. In addition, the 0-1 format can be adapted to reference the num-
ber of times (and last time) a more general variable changed its value, which
leads to more restrictive tabu conditions and more limiting ("stronger")
uses of frequency-based memory than by referring separately to each value
the variable receives. As in the case of recency-based memory, the ability to
affect larger numbers of alternative moves by these more aggregated forms of
memory can be useful for larger problems, not only for conserving memory
space but also for providing additional control over solutions generated.

4.7.3 Critical Event Memory


Strategic oscillation offers an opportunity to make particular use of both
short term and long term frequency-based memory. To illustrate, let A(Iter)
denote a zero-one vector whose jth component has the value 1 if attribute j
is present in the current solution and has the value 0 otherwise. The vector
A can be treated "as if" it is the same as the solution vector for zero-one
problems, though implicitly it is twice as large, since Xj = 0 is a different
attribute from Xj = 1. This means that rules for operating on the full A
must be reinterpreted for operating on the condensed form of A. The sum of
the A vectors over the most recent t critical events provides a simple memory
that combines recency and frequency considerations. To maintain the sum
requires remembering A(k), for k ranging over the last t iterations. Then the
sum vector A* can be updated quite easily by the incremental calculation

    A* = A* + A(Iter) - A(Iter - t),

which adds the newest vector and drops the vector falling outside the window
of the last t critical events.


Associated frequency measures, as noted earlier, may be normalized, in
this case for example by dividing A* by the value of t. A long term form
of A* does not require storing the A(k) vectors, but simply keeps a running
sum. A* can also be maintained by exponential smoothing.
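
A sketch of this memory, maintaining the short term sliding-window sum A*,
a long term running sum, and the normalized frequency measure (all names
hypothetical), might be:

    from collections import deque

    class CriticalEventMemory:
        def __init__(self, num_attributes, t):
            self.t = t
            self.recent = deque()                # the last t vectors A(k)
            self.A_star = [0] * num_attributes   # short term sliding sum
            self.A_long = [0] * num_attributes   # long term running sum

        def record(self, A):
            # Called once per critical event with the 0-1 attribute vector A.
            self.recent.append(A)
            for j, a in enumerate(A):
                self.A_star[j] += a
                self.A_long[j] += a
            if len(self.recent) > self.t:
                oldest = self.recent.popleft()   # vector leaving the window
                for j, a in enumerate(oldest):
                    self.A_star[j] -= a

        def normalized(self):
            # Frequency measure: divide by the number of events recorded.
            n = max(1, len(self.recent))
            return [a / n for a in self.A_star]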
Such frequency-based memory is useful in strategic oscillation where crit-
ical events are chosen to be those of generating a complete (feasible) con-
struction, or in general of reaching the targeted boundary (or a best point
within a boundary region). Instead of using a customary recency-based TS
memory at each step of an oscillating pattern, greater flexibility results by
disregarding tabu restrictions until reaching the turning point, where the
oscillation process alters its course to follow a path toward the boundary.
At this point, assume a choice rule is applied to introduce an attribute that
was not contained in any recent solution at the critical level. If this attribute
is maintained in the solution by making it tabu to be dropped, then upon
eventually reaching the critical level the solution will be different from any
seen over the horizon of the last t critical events. Thus, instead of updating
A* at each step, the updating is done only for critical level solutions, while
simultaneously enhancing the flexibility of making choices.
In general, the possibility occurs that no attribute exists that allows
this process to be implemented in the form stated. That is, every attribute
may already have a positive associated entry in A*. Thus, at the turn
around point, the rule instead is to choose a move that introduces attributes
which are least frequently used. (Note: "infrequently used" can mean ei-
ther "infrequently present" or "infrequently absent," depending upon the

current direction of oscillation.) This again can be managed conveniently by
using penalties and inducements. Such an approach has been found very ef-
fective for multidimensional knapsack problems and 0-1 quadratic optimiza-
tion problems in Glover and Kochenberger (1996) and Glover, Kochenberger
and Alidaee (1997).
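
Managed as a penalty, this turn around rule can be sketched in a few lines,
assuming a frequency vector A_star and a hypothetical function
attributes_of mapping each candidate move to the attribute indexes it
introduces:

    def turn_around_move(candidate_moves, attributes_of, A_star):
        # Prefer the move whose introduced attributes are least frequently
        # used; the sum acts as a frequency penalty on each candidate.
        return min(candidate_moves,
                   key=lambda m: sum(A_star[j] for j in attributes_of(m)))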
For greater diversification, this rule can be applied for r steps after reach-
ing the turn around point. Normally r should be a small number, e.g., with
a baseline value of 1 or 2, which is periodically increased in a standard
diversification pattern. Shifting from a short term A* to a long term A*
creates a global diversification effect. A template for this approach is given
in Figure 22.
The approach of Figure 22 is not symmetric. An alternative form of
control is to seek immediately to introduce a low frequency attribute upon
leaving the critical level, to increase the likelihood that the solution at the
next turn around will not duplicate a solution previously visited at that
point. Such a control enhances diversity, though duplication at the turn
around will already be inhibited by starting from different solutions at the
critical level.

5 Connections, Hybrid Approaches and Learning


Relationships between tabu search and other procedures like simulated an-
nealing and genetic algorithms provide a basis for understanding similarities
and contrasts in their philosophies, and for creating potentially useful hybrid
combinations of these approaches. We offer some speculation on preferable
directions in this regard, and also suggest how elements of tabu search can
add a useful dimension to neural network approaches.
From the standpoint of evolutionary strategies, we trace connections
between population based models for combining solutions, as in genetic al-
gorithms, and ideas that emerged from surrogate constraint approaches for
exploiting optimization problems by combining constraints. We show how
this provides the foundation for methods that give additional alternatives to
genetic-based frameworks, specifically as embodied in the scatter search ap-
proach, which is the "primal complement" to the dual strategy of surrogate
constraint approaches. Recent successes by integrating scatter search (and
its path relinking extensions) with tabu search disclose potential advantages
for evolutionary strategies that incorporate adaptive memory.
Finally, we describe the learning approach called target analysis, which
provides a way to determine decision parameters for deterministic and prob-
abilistic strategies-and thus affords an opportunity to create enhanced
solution methods.

Figure 22 is a schematic of this memory design for strategic oscillation:
at the critical level, critical attribute frequencies (short and long term)
are updated and the level is maintained for r iterations, for a selected
part of critical level iterations (e.g., for the first and best solutions
of the current block); during an advance or retreat in the chosen direction
(constructive, destructive, etc.), low frequency attributes added during
the first "small r" steps are tabu to drop; and at the turn around point,
the inclusion of low frequency critical attributes is favored for the first
"small r" steps of the following advance.

Figure 22: Strategic oscillation illustrative memory.

5.1 Simulated Annealing


The contrasts between simulated annealing and tabu search are fairly con-
spicuous, though undoubtedly the most prominent is the focus on exploiting
memory in tabu search that is absent from simulated annealing. The intro-
duction of this focus entails associated differences in search mechanisms,
and in the elements on which they operate. Accompanying the differences
directly attributable to the focus on memory, and also magnifying them,
several additional elements are fundamental for understanding the relation-
ship between the methods. We consider three such elements in order of
increasing importance.
First, tabu search emphasizes scouting successive neighborhoods to iden-
tify moves of high quality, as by candidate list approaches of the form de-
scribed in Section 3. This contrasts with the simulated annealing approach
of randomly sampling among these moves to apply an acceptance criterion
that disregards the quality of other moves available. (Such an acceptance
criterion provides the sole basis for sorting the moves selected in the SA
method.) The relevance of this difference in orientation is accentuated for
tabu search, since its neighborhoods include linkages based on history, and
therefore yield access to information for selecting moves that is not available
in neighborhoods of the type used in simulated annealing.
Next, tabu search evaluates the relative attractiveness of moves not only
in relation to objective function change, but in relation to additional fac-
tors that represent quality, which are balanced over time with factors that
represent influence. Both types of measures are affected by the differen-
tiation among move attributes, as embodied in tabu activation rules and
aspiration criteria, and in turn by relationships manifested in recency, fre-
quency, and sequential interdependence (hence, again, involving recourse to
memory). Other aspects of the state of search also affect these measures, as
reflected in the altered evaluations of strategic oscillation, which depend on
the direction of the current trajectory and the region visited.
Finally TS emphasizes guiding the search by reference to multiple thresh-
olds, reflected in the tenures for tabu-active attributes and in the conditional
stipulations of aspiration criteria. This may be contrasted to the simulated
annealing reliance on guiding the search by reference to the single threshold
implicit in the temperature parameter. The treatment of thresholds by the
two methods compounds this difference between them. Tabu search varies
its thresholds nonmonotonically, reflecting the conception that multidirec-
tional parameter changes are essential to adapt to different conditions, and
to provide a basis for locating alternatives that might otherwise be missed.
This contrasts with the simulated annealing philosophy of adhering to a
temperature parameter that only changes monotonically.
Hybrids are now emerging that are taking preliminary steps to bridge
some of these differences, particularly in the realm of transcending the sim-
ulated annealing reliance on a monotonic temperature parameter. A hybrid
method that allows temperature to be strategically manipulated, rather than
progressively diminished, has been shown to yield improved performance
over standard SA approaches. A hybrid method that expands the SA basis
for move evaluations also has been found to perform better than standard
simulated annealing. Consideration of these findings invites the question
of whether removing the memory scaffolding of tabu search and retaining
its other features may yield a viable method in its own right. For exam-
ple, experience cited in some of the studies reported in Glover and Laguna
(1997) suggests that, while a memoryless version of tabu search called tabu
thresholding can outperform a variety of alternative heuristics, it generally
does not match the performance of TS methods that appropriately exploit
memory.

5.2 Genetic Algorithms


Genetic algorithms offer a somewhat different set of comparisons and con-
trasts with tabu search. GAs are based on selecting subsets (traditionally
pairs) of solutions from a population, called parents, and combining them to
produce new solutions called children. Rules of combination to yield children
are based on the genetic notion of crossover, which in the classical form con-
sists of interchanging solution values of particular variables, together with
occasional operations such as random value changes. Children that pass a
survivability test, probabilistically biased to favor those of superior qual-
ity, are then available to be chosen as parents of the next generation. The
choice of parents to be matched in each generation is based on random or
biased random sampling from the population (in some parallel versions ex-
ecuted over separate subpopulations whose best members are periodically
exchanged or shared). Genetic terminology customarily refers to solutions
as chromosomes, variables as genes, and values of variables as alleles.
By means of coding conventions, the genes of genetic algorithms may be
compared to attributes in tabu search. Introducing memory in GAs to track
the history of genes and their alleles over subpopulations would provide an
immediate and natural way to create a hybrid with TS.


Some important differences between genes and attributes are worth not-
ing, however. The implicit differentiation of attributes into from and to
components, each having different memory functions, does not have a coun-
terpart in genetic algorithms. A from attribute is one that is part of the
current solution but is not included in the next solution once a move is made.
A to attribute is one that is not part of the current solution but becomes
part of the next solution once a move is made. The lack of this type of
differentiation in GAs results because these approaches are organized to op-
erate without reference to moves (although, strictly speaking, combination
by crossover can be viewed as a special type of move).
A contrast to be noted between genetic algorithms and tabu search arises
in the treatment of context, i.e., in the consideration given to structure in-
herent in different problem classes. For tabu search, context is fundamental,
embodied in the interplay of attribute definitions and the determination of
move neighborhoods, and in the choice of conditions to define tabu restric-
tions. Context is also implicit in the identification of amended evaluations
created in association with longer term memory, and in the regionally de-
pendent neighborhoods and evaluations of strategic oscillation.
At the opposite end of the spectrum, GA literature has traditionally
stressed the freedom of its rules from the influence of context. Crossover,
in particular, is supposedly a context neutral operation, which assumes no
reliance on conditions that solutions must obey in a particular problem set-
ting, just as genes make no reference to the environment as they follow
their instructions for recombination (except, perhaps, in the case of muta-
tion). Practical application, however, generally renders this an inconvenient
assumption, making solutions of interest difficult to find. Consequently, a
good deal of effort in GA implementation is devoted to developing "special
crossover" operations that compensate for the difficulties created by con-
text, effectively reintroducing it on a case by case basis.
The chief method by which modern genetic algorithms handle structure
is by relegating its treatment to some other method. For example, genetic
algorithms combine solutions by their parent-children processes at one level,
and then a descent method may be introduced to operate on the resulting so-
lutions to produce new solutions. These new solutions in turn are submitted
to be recombined by the GA processes. In these versions, genetic algorithms
already take the form of hybrid methods. Hence there is a natural basis for
marrying GA and TS procedures in such approaches. But genetic algorithms
and tabu search also can be joined in a more fundamental way.
Specifically, tabu search strategies for intensification and diversification
are based on the following question: how can information be extracted from
a set of good solutions to help uncover additional (and better) solutions?
From one point of view, GAs provide an approach for answering this ques-
tion, consisting of putting solutions together and interchanging components
(in some loosely defined sense, if traditional crossover is not strictly en-
forced). Tabu search, by contrast, seeks an answer by utilizing processes
that specifically incorporate neighborhood structures into their design.
Augmented by historical information, neighborhood structures are used
as a basis for applying penalties and incentives to induce attributes of good
solutions to become incorporated into current solutions. Consequently, al-
though it may be meaningless to interchange or otherwise incorporate a set
of attributes from one solution into another in a wholesale fashion, as at-
tempted in traditional GA recombination operations, a stepwise approach
to this goal through the use of neighborhood structures is entirely practi-
cable. This observation provides a motive for creating structured combina-
tions of solutions that embody desired characteristics such as feasibility-as
is automatically achieved by the TS approach of path relinking discussed in
Section 4. Instead of being compelled to create new types of crossover to re-
move deficiencies of standard operators upon being confronted by changing
contexts, this approach addresses context directly and makes it an essential
part of the design for generating combinations.
The current trend of genetic algorithms seems to be increasingly com-
patible with this perspective, and could provide a basis for a useful hybrid
combination of genetic algorithm and tabu search ideas. However, a funda-
mental question emerges, as posed in the development of the next sections,
about whether there is any advantage to introducing genetic crossover-based
ideas over introducing the apparently more flexible and exploitable path re-
linking ideas.

5.2.1 Models of Nature-Beyond "Genetic Metaphors"


An aspect of tabu search that is often misunderstood concerns the relation
between a subset of its strategies and certain approaches embodied in genetic
algorithms. TS researchers have tended sometimes to overlook the part of
the adaptive memory focus that is associated with strategies for combining
sets of elite solutions. Complementing this, GA researchers have been largely
unaware that such a collection of strategies outside their domain exists. This
has quite possibly been due to the influence of the genetic metaphor, which
on the one hand has helped to launch a number of useful problem solving
ideas, and on the other hand has also sometimes obscured fertile connections
to ideas that come from different foundations.

1) Begin with a population of binary vectors.

2) Operate repeatedly on the current generation of vectors, for a selected
number of steps, choosing two "parent vectors" at random. Then mate the
parents by exchanging certain of their components to produce offspring.
(The exchange, called "crossover," was originally designed to reflect the
process by which chromosomes exchange components in genetic mating and,
in common with the step of selecting parents themselves, was organized to
rely heavily on randomization. In addition, a "mutation" operation is
occasionally allowed to flip bits at random.)

3) Apply a measure of fitness to decide which offspring survive to become
parents for the next generation. When the selected number of matings has
been performed for the current generation, return to the start of Step 2
to initiate the mating of the resulting new set of parents.

4) Carry out the mating-and-survival operation of Steps 2 and 3 until the
population becomes stable or until a chosen number of iterations has
elapsed.

Figure 23: Genetic algorithm template.


To understand the relevant ties, it is useful to go back in time to examine
the origins of the GA framework and of an associated set of notions that
became embodied in TS strategies. We will first sketch the original genetic
algorithm design (see Figure 23), as characterized in Holland (1975). Our
description is purposely somewhat loose, to be able to include approaches
more general than the specific proposals that accompanied the introduction
of GAs. Many variations and changes have come about over the years, as
we subsequently observe.
A somewhat different model for combining elements of a population
comes from a class of relaxation strategies in mathematical optimization
known as surrogate constraint methods (Glover, 1965). The goal of these
approaches is to generate new constraints that capture information not con-
tained in the original problem constraints taken independently, but which
is implied by their union. We will see that some unexpected connections
emerge between this development and that of genetic algorithms.
The information-capturing focus of the surrogate constraint framework
has the aim of developing improved methods for solving difficult optimiza-
tion problems by means of (a) providing better criteria for choice rules to
guide a search for improved solutions, (b) inferring new bounds (constraints
with special structures) to limit the space of solutions examined. (The basic
framework and strategies for exploiting it are given in Glover (1965, 1968,
1975b), Greenberg and Pierskalla (1970, 1973), Karwan and Rardin (1976,
1979), and Freville and Plateau (1986, 1993).) Based on these objectives,
the generation of new constraints proceeds as indicated in Figure 24.
A natural first impression is that the surrogate constraint design is quite
unrelated to the GA design, stemming from the fact that the concept of com-
bining constraints seems inherently different from the concept of combining
vectors. However in many types of problem formulations, including those
where surrogate constraints were first introduced, constraints are summa-
rized by vectors. More particularly, over time, as the surrogate constraint
approach became embedded in both exact and heuristic methods, variations
led to the creation of a "primal counterpart" called scatter search. The
scatter search approach combines solution vectors by rules patterned after
those that govern the generation of new constraints, and specifically inherits
the strategy of exploiting linear combinations and inference (Glover, 1977).

5.3 Scatter Search


The scatter search process, building on the principles that underlie the surro-
gate constraint design, is organized to (1) capture information not contained
separately in the original vectors, (2) take advantage of auxiliary heuristic
solution methods to evaluate the combinations produced and to generate
new vectors.
The original form of scatter search may be sketched as in Figure 25.
Three particular features of scatter search deserve mention. First, the
linear combinations are structured according to the goal of generating weighted
centers of selected subregions, allowing for nonconvex combinations that
project these centers into regions external to the original reference solutions.
The dispersion pattern created by such centers and their external projections
is particularly useful for mixed integer optimization. Second, the strategies
1) Begin with an initial set of problem constraints (chosen to characterize
all or a special part of the feasible region for the problem considered).

2) Create a measure of the relative influence of the constraints as a basis
for combining subsets to generate new constraints. The new (surrogate)
constraints are created from nonnegative linear combinations of other
constraints, together with cutting planes inferred from such combinations.
(The goal is to determine surrogate constraints that are most effective for
guiding the solution process.)

3) Change the way the constraints are combined, based on the problem
constraints that are not satisfied by trial solutions generated relative to
the surrogate constraints, accounting for the degree to which different
source constraints are violated. Then process the resulting new surrogate
constraints to introduce additional inferred constraints obtained from
bounds and cutting planes. (Weaker surrogate constraints and source
constraints that are determined to be redundant are discarded.)

Figure 24: Surrogate constraint template.


1) Generate a starting set of solution vectors by heuristic processes
designed for the problem considered, and designate a subset of the best
vectors to be reference solutions. (Subsequent iterations of this step,
transferring from Step 3 below, incorporate advanced starting solutions
and best solutions from previous history as candidates for the reference
solutions.)

2) Create new points consisting of linear combinations of subsets of the
current reference solutions. The linear combinations are:

(a) chosen to produce points both inside and outside the convex regions
spanned by the reference solutions,

(b) modified by generalized rounding processes to yield integer values
for integer-constrained vector components.

3) Extract a collection of the best solutions generated in Step 2 to be
used as starting points for a new application of the heuristic processes
of Step 1. Repeat these steps until reaching a specified iteration limit.

Figure 25: Scatter search procedure.


for selecting particular subsets of solutions to combine in Step 2 are designed
to make use of clustering, which allows different types of strategic variation
by generating new solutions "within clusters" and "across clusters". Third,
the method is organized to use supporting heuristics that are able to start
from infeasible solutions, and hence which remove the restriction that solu-
tions selected as starting points for re-applying the heuristic processes must
be feasible. In sum, scatter search is founded on the following premises.

(P1) Useful information about the form (or location) of optimal solutions
is typically contained in a suitably diverse collection of elite solutions.

(P2) When solutions are combined as a strategy for exploiting such in-
formation, it is important to provide for combinations that can extrapolate
beyond the regions spanned by the solutions considered, and further to in-
corporate heuristic processes to map combined solutions into new points.
(This serves to provide both diversity and quality.)

(P3) Taking account of multiple solutions simultaneously, as a foundation
for creating combinations, enhances the opportunity to exploit information
contained in the union of elite solutions.
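
To illustrate Step 2 of Figure 25 and premise (P2), the following sketch
forms linear combinations of pairs of reference solutions, choosing weights
outside [0, 1] as well as inside so that some points extrapolate beyond the
spanned region, and using plain rounding as a simplified stand-in for the
generalized rounding processes:

    import itertools
    import random

    def combine_references(reference_solutions, points_per_pair=3):
        trial_points = []
        for a, b in itertools.combinations(reference_solutions, 2):
            for _ in range(points_per_pair):
                # Weights outside [0, 1] project points beyond the region
                # spanned by the two reference solutions (nonconvex case).
                w = random.uniform(-0.5, 1.5)
                point = [w * ai + (1 - w) * bi for ai, bi in zip(a, b)]
                # Plain rounding stands in for the generalized rounding
                # of integer-constrained components.
                trial_points.append([round(v) for v in point])
        return trial_points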

The fact that the heuristic processes of scatter search are not restricted
to a single uniform design, but represent a varied collection of procedures, af-
fords additional strategic possibilities. This theme also shares a link with the
original surrogate constraint proposal, where heuristics for surrogate relax-
ations are introduced to improve the application of exact solution methods.
In combination, the heuristics are used to generate strengthened surrogate
constraints and, iteratively applied, to generate trial solutions for integer
programming problems.
The catalog in Figure 26 traces the links between the conceptions under-
lying scatter search and conceptions that have been introduced over time as
amendments to the GA framework.
These innovations in the GA domain, which have subsequently been in-
corporated in a wide range of studies, are variously considered to be advances
or heresies according to whether they are viewed from liberal or traditional
perspectives. Significantly, their origins are somewhat diffuse, rather than
integrated within a single framework.
It is clear that a number of the elements of the scatter search approach
remain outside of the changes brought about by these proposals. A sim-
ple example is the approach of introducing adaptive rounding processes for
mapping fractional components into integers.

• Introduction of "flexible crossover operations." (Scatter search
combinations include all possibilities generated by the early GA crossover
operations, and also include all possibilities embedded in the more
advanced "uniform" and "Bernoulli" crossovers (Ackley (1987), Spears and
DeJong (1991)). Path relinking descendants of scatter search provide
further possibilities, noted subsequently.)

• Use of heuristic methods to improve solutions generated from processes
for combining vectors (Muhlenbein et al. (1988), Ulder et al. (1991),
Whitley, Gordon and Mathias (1994)).

• Exploitation of vector representations that are not restricted to binary
representations (Davis (1989), Eshelman and Schaffer (1992)).

• Introduction of special cases of linear combinations for operating on
continuous vectors (Davis (1989), Wright (1990), Back et al. (1991),
Michalewicz and Janikow (1991)).

• Use of combinations of more than two parents simultaneously to produce
offspring (Eiben et al. (1994), Muhlenbein and Voigt (1996)).

• Introduction of strategies that subdivide the population into different
groupings (Muhlenbein and Schlierkamp-Voosen (1994)).

Figure 26: Scatter search features (1977) incorporated into non-traditional
GA approaches.

There has also conspicuously
been no GA counterpart to the use of clustering to create strategic group-
ings of points, nor (as a result) to the notion of combining points according
to distinctions between membership in different clusters. (The closest ap-
proximation to this has been the use of "island populations" that evolve
separately, but without concern for analyzing or subdividing populations
based on inference and clustering.)
The most important distinction, however, is the link between scatter
search and the theme of exploiting history. The prescriptions for combining
solutions within scatter search are part of a larger design for taking advan-
tage of information about characteristics of previously generated solutions
to guide current search. In retrospect, it is perhaps not surprising that such
a design should share an intimate association with the surrogate constraint
framework, with its emphasis on extracting and coordinating information
across different solution phases. This orientation, which takes account of
elements such as the recency, frequency and quality of particular value as-
signments, clearly shares a common foundation with notions incorporated
within tabu search. (The same reference on surrogate constraint strategies
that is the starting point for scatter search is also often cited as a source
of early TS conceptions.) By this means, the link between tabu search and
so-called "evolutionary" approaches also becomes apparent. The term evo-
lutionary has undergone an interesting evolution of its own. By a novel turn,
the term "mutation" in the GA terminology has become reinterpreted to
refer to any form of change, including the purposeful change produced by a
heuristic process. As a result, all methods that apply heuristics to multiple
solutions, whether or not they incorporate strategies for combining solu-
tions, are now considered kindred to genetic algorithms, and the enlarged
collection is labeled "evolutionary methods." (This terminology accord-
ingly has acquired the distinction of embracing nearly every kind of method
conceivable.)

5.3.1 Modern Forms and Applications of Scatter Search


Recent implementations of scatter search (cited below) have taken advantage
of the implicit learning capabilities provided by the tabu search framework,
leading to refined methods for determining reference points and for generat-
ing new points. Current scatter search versions have also introduced more
sophisticated mechanisms to map fractional values into integer values. This
work is reinforced by new theorems about searches over spaces of zero-one
integer variables. Special models have also been developed to allow both
heuristic and exact methods to transform infeasible trial points into
feasible points. Finally, scatter search is the source of the broader class
of path relinking methods, as described in Section 4, which offer a wide
range of mechanisms for creating productive combinations of reference
solutions. A brief summary of some of these developments appears in
Figure 27.

• Tabu search memory is used to select current reference points from a
historical pool (Glover, 1989, 1994a).

• Tabu search intensification and diversification strategies guide the
generation of new points (Mulvey, 1995; Zenios, 1996; Fleurent et al.,
1996).

• Solutions generated as "vector combinations" are further improved by
explicit tabu search guidance (Trafalis and Al-Harkan, 1995; Glover, Kelly
and Laguna, 1996; Fleurent et al., 1996; Cung et al., 1997).

• Directional rounding processes focus the search for feasible zero-one
solutions, allowing them to be mapped into convex subregions of hyperplanes
produced by valid cutting plane inequalities (Glover, 1995a).

• Neural network learning is applied to filter out promising and
unpromising points for further examination, and pattern analysis is used
to predict the location of promising new solutions (Glover, Kelly and
Laguna, 1996).

• Mixed integer programming models generate sets of diversified points,
and yield refined procedures for mapping infeasible points into feasible
points (Glover, Kelly and Laguna, 1996).

Figure 27: Scatter Search Extensions.

Implementation of various components of these extensions has provided
advances for solving general nonlinear mixed discrete optimization problems
with both linear and nonlinear constraints, as noted in the references
cited.

5.3.2 Scatter Search and Path Relinking Interconnections


The relation between scatter search and path relinking sheds additional light
on the character of these approaches. As already remarked, path relinking
is a direct extension of scatter search. The way this extension comes about
is as follows.
From a spatial orientation, the process of generating linear combinations
of a set of reference points may be characterized as generating paths between
and beyond these points (where points on such paths also serve as sources
for generating additional points). This leads to a broader conception of the
meaning of combinations of points. That is, by natural extension, we may
conceive such combinations to arise by generating paths between and beyond
selected points in neighborhood space, rather than in Euclidean space.
The form of these paths in neighborhood space is easily specified by refer-
ence to attribute-based memory, as used in tabu search. The path relinking
strategy thus emerges as a direct consequence. Just as scatter search encom-
passes the possibility to generate new solutions by weighting and combining
more than two reference solutions at a time, path relinking includes the pos-
sibility to generate new solutions by multi-parent path constructions that
incorporate attributes from a set of guiding solutions, where these attributes
are weighted to determine which moves are given higher priority, as we have
seen in Section 4. The name path relinking comes from the fact that the
generation of such paths in neighborhood space characteristically "relinks"
previous points in ways not achieved in the previous search history.
The relevance of these concepts as a foundation for evolutionary proce-
dures is illustrated by recent applications of scatter search and path relinking
which have disclosed the promise of these approaches for solving a variety
of optimization problems. A sampling of such applications includes:

• Vehicle Routing-Rochat and Taillard (1995); Taillard (1996)

• Quadratic Assignment-Cung et al. (1996)


• Financial Product Design-Consiglio and Zenios (1996)
• Neural Network Training-Kelly, Rangaswamy and Xu (1996)
• Job Shop Scheduling-Yamada and Nakano (1996)
• Flow Shop Scheduling-Yamada and Reeves (1997)

• Graph Drawing-Laguna and Marti (1997)

• Linear Ordering-Laguna, Marti and Campos (1997)


• Unconstrained Continuous Optimization-Fleurent et al. (1996)

• Bit Representation-Rana and Whitley (1997)


• Optimizing Simulation-Glover, Kelly and Laguna (1996)

• Complex System Optimization-Laguna (1997)

It is additionally useful to note that re-expressing scatter search rela-
tive to neighborhood space-as done in path relinking-also leads to more
general forms of scatter search in Euclidean space. The form of path relink-
ing manifested in vocabulary building (which results by using constructive
and destructive neighborhoods to create and reassemble components of solu-
tions), also suggests the relevance of combining solutions in Euclidean space
by allowing different linear combinations to be created for different solution
components. The design considerations that underlie vocabulary building
generally carryover to this particular instance (see Glover and Laguna,
1997).
The broader conception of solution combinations provided by path re-
linking has useful implications for evolutionary procedures. The exploitation
of neighborhood space and attribute-based memory gives specific, versatile
mechanisms for achieving such combinations, and provides a further in-
teresting connection between tabu search proposals and genetic algorithm
proposals. In particular, many recently developed "crossover operators,"
which have no apparent relation between each other in the GA setting, can
be shown to arise as special instances of path relinking, by restricting atten-
tion to two reference points (taken as parents in GAs), and by replacing the
strategic neighborhood guidance of path relinking with a reliance on ran-
domization. In short, the options afforded by path relinking for combining
solutions are more unified, more systematic and more encompassing than
those provided by the "crossover" concept, which changes from instance
to instance and offers no guidance for how to take advantage of any given
context.

5.4 Greedy Randomized Adaptive Search Procedures (GRASP)


The GRASP methodology was developed in the late 1980s, and the acronym
was coined by Tom Feo (Feo and Resende, 1995). It was first used to solve
computationally difficult set covering problems (Feo and Resende, 1989).
Each GRASP iteration consists of constructing a trial solution and then ap-
plying an exchange procedure to find a local optimum (i.e., the final solution
for that iteration). The construction phase is iterative, greedy, and adaptive.
It is iterative because the initial solution is built considering one element at
a time. It is greedy because the addition of each element is guided by a
greedy function. It is adaptive because the element chosen at any iteration
in a construction is a function of those previously chosen. (That is, the
method is adaptive in the sense of updating relevant information from iter-
ation to iteration, as in most constructive procedures.) The improvement
phase typically consists of a local search procedure.
For illustration purposes, consider the design of a GRASP for the 2-
partition problem (see, e.g., Laguna et al., 1994). This problem consists of
clustering the nodes of a weighted graph into two equal sized sets such that
the weight of the edges between the two sets is minimized. In this context,
the iterative, greedy, and adaptive elements of the GRASP construction
phase may be interpreted as follows. The initial solution is built considering
one node at a time. The addition of each node is guided by a greedy function
that minimizes the augmented weight of the partition. The node chosen at
any iteration in the construction is a function of the adjacencies of previously
chosen nodes. There is also a probabilistic component in GRASP that is
applied to the selection of elements during the construction phase. After
choosing the first node for one set, all non-adjacent nodes are of equal quality
with respect to the given greedy function. If one of those nodes is chosen
by some deterministic rule, then every GRASP iteration will repeat this
selection. In such stages within a construction where there are multiple
greedy choices, choosing any one of them will not compromise the greedy
approach, yet each will often lead to a very different solution.
To generalize this strategy, consider forming a candidate list (at each
stage of the construction) consisting of high quality elements according to
an adaptive greedy function. Then, the next element to be included in the
initial solution is randomly selected from this list. A similar strategy has
been categorized as a cardinality-based semi-greedy heuristic.
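To make this concrete, the following sketch (ours, in Python; the names
greedy_value and rcl_size are illustrative assumptions rather than anything
from the GRASP literature) shows a cardinality-based greedy randomized
construction of the kind just described:

    import random

    def grasp_construction(elements, greedy_value, rcl_size=5):
        """Build a trial solution one element at a time. The construction is
        greedy (elements are ranked by greedy_value), adaptive (the ranking is
        recomputed against the partial solution), and randomized (the next
        element is drawn uniformly from the restricted candidate list)."""
        solution, remaining = [], set(elements)
        while remaining:
            ranked = sorted(remaining, key=lambda e: greedy_value(e, solution))
            rcl = ranked[:rcl_size]        # restricted candidate list
            chosen = random.choice(rcl)    # uniform choice among top candidates
            solution.append(chosen)
            remaining.remove(chosen)
        return solution

For the 2-partition example, greedy_value(e, solution) would return the
increase in the weight of the cut caused by assigning node e to the current
set, and a local search based on exchanges would then be applied to the
constructed partition.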
The solution generated by a greedy randomized adaptive construction
can generally be improved by the application of an improvement phase fol-
lowing selected construction phases, as by using a descent method based on
an exchange mechanism, since usually the result of the construction phase is
not a local minimum with respect to simple exchange neighborhoods. There
is an obvious computational tradeoff between the construction and improv-
ing phases. An intelligent construction requires fewer improving exchanges
to reach a local optimum, and therefore, it results in a reduction of the total
CPU time required per GRASP iteration. The exchange mechanism can
also be used as a basis for a hybrid method, as by incorporating elements of
other methodologies such as simulated annealing or tabu search. In partic-
ular, given that the GRASP constructions inject a degree of diversification
to the search process, the improvement phase may consist of a short term
memory tabu search that is fine tuned for intensification purposes. Other
connections may be established with methods such as scatter search or the
path relinking strategy of tabu search, by using the GRASP constructions
(or their associated local optima) as reference points.
Performing multiple GRASP iterations may be interpreted as a means of
strategically sampling the solution space. Based on empirical observations,
it has been found that the sampling distribution generally has a mean value
that is inferior to the one obtained by a deterministic construction, but the
best over all trials dominates the deterministic solution with a high probabil-
ity. The intuitive justification of this phenomenon is based on the order
statistics of sampling. GRASP implementations are generally robust in the
sense that it is difficult to find or devise pathological instances for which
the method will perform arbitrarily badly. The robustness of this method
has been well documented in applications to production, flight scheduling,
equipment and tool selection, location, and maximum independent sets.
An interesting connection exists between GRASP and probabilistic tabu
search (PTS). If PTS is implemented in a memoryless form, and restricted to
operate only in the constructive phase of a multistart procedure (stripping
away memory, and even probabilistic choice, from the improving phase),
then a procedure resembling GRASP results. The chief difference is that
the probabilities used in PTS are rarely chosen to be uniform over mem-
bers of the candidate list, but generally seek to capture variations in the
evaluations, whenever these variations reflect anticipated differences in the
effective quality of the moves considered.
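The distinction can be seen in a few lines. In the sketch below (our
illustration; the names and the particular weighting scheme are assumptions,
not a published PTS rule), selection probabilities follow the move
evaluations rather than being uniform over the candidate list:

    import random

    def pts_select(candidates, evaluation):
        """Choose a candidate with probability that grows with its evaluation,
        so that anticipated differences in move quality are reflected in the
        selection probabilities (contrast with a uniform choice over an RCL)."""
        scores = [evaluation(c) for c in candidates]
        floor = min(scores)
        weights = [s - floor + 1e-6 for s in scores]  # shift to positive weights
        return random.choices(candidates, weights=weights, k=1)[0]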
This connection raises the question of whether a multistart variant of
probabilistic tabu search may offer a useful alternative to memoryless mul-
tistart approaches like GRASP. A study of this issue for the quadratic as-
signment problem, where GRASP has been reported to perform well, was
conducted by Fleurent and Glover (1996). To provide a basis for compari-
son, the improving phases of the PTS multistart method excluded the use of
TS memory and guidance strategies, and were restricted to employ a stan-
dard descent procedure. Probabilistic tabu search mechanisms were used
in the constructive phases, incorporating frequency-based intensification to
improve the effectiveness of successive constructions. The resulting multi-
start method proved significantly superior to other multistart approaches
previously reported for the quadratic assignment problem. However, it also
turned out to be not as effective as the leading tabu search methods that
use memory in the improving phases as well as (or instead of) in the con-
structive phases. Nevertheless, it seems reasonable to conjecture that classes
of problems exist where increased reliance on re-starting will prove advan-
tageous, and where the best results may be obtained from appropriately
designed multistart strategies, such as those based on greedy randomized search
and multistart variants of PTS.

5.5 Neural Networks


Neural networks have a somewhat different set of goals than tabu search,
although some overlaps exist. We indicate how tabu search can be used to
extend certain neural net conceptions, yielding a hybrid that may have both
hardware and software implications. The basic transferable insight from
tabu search is that memory components with dimensions such as recency
and frequency can increase the efficacy of a system designed to evolve toward
a desired state. We suggest the merit of fusing neural network memory with
tabu search memory as follows. (A rudimentary acquaintance with neural
network ideas is assumed.)
Recency based considerations can be introduced from tabu search into
neural networks by a time delay feedback loop from a given neuron back to
itself (or from a given synapse back to itself, by the device of interposing
additional neurons). This permits firing rules and synapse weights to be
changed only after a certain time threshold, determined by the length of the
feedback loop. Aspiration thresholds of the form conceived in tabu search
can be embodied in inputs transmitted on a secondary level, giving the
ability to override the time delay for altering firing thresholds and synaptic
weights. Frequency based effects employed in tabu search similarly may be
incorporated by introducing a form of cumulative averaged feedback.
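As a purely illustrative software analog (not a circuit design, and with
names of our own choosing), a recency effect of this kind can be mimicked by
attaching a tenure counter to each weight, with an aspiration override for
sufficiently strong signals:

    def gated_update(weights, signals, last_change, t, tenure=5,
                     aspiration=1.0, rate=0.1):
        """Recency-gated weight update: weight j may change only if at least
        `tenure` steps have elapsed since its last change, unless the update
        signal is strong enough to satisfy the aspiration threshold."""
        for j, g in enumerate(signals):
            if t - last_change[j] < tenure and abs(g) < aspiration:
                continue                   # held 'tabu' by the time delay loop
            weights[j] += rate * g
            last_change[j] = t
        return weights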
Time delay feedback mechanisms for creating recency and frequency ef-
fects also can have other functions. In a problem solving context, for ex-
ample, it may be convenient to disregard one set of options to concentrate
on another, while retaining the ability to recover the suppressed options
after an interval. This familiar type of human activity is not a customary
part of neural network design, but can be introduced by the time dependent
functions previously indicated. In addition, a threshold can be created to
allow a suppressed option to "go unnoticed" if current activity levels fall in
a certain range, effectively altering the interval before the option reemerges
for consideration. Neural network designs to incorporate those features may
directly make use of the TS ideas that have made these elements effective
in the problem solving domain.
Tabu search strategies that introduce longer term intensification and di-
versification concerns are also relevant to neural network processes. As a
foundation for blending these approaches, it is useful to adopt an orienta-
tion where a collection of neurons linked by synapses with various activation
weights is treated as a set of attribute variables which can be assigned al-
ternative values. Then the condition that synapse j (from a specified origin
neuron to a specified destination neuron) is assigned an activation weight in
interval p can be coded by the assignment y_j = p, where y_j is a component
of an attribute vector y as identified in the discussion of attribute creation
processes in connection with vocabulary building. A similar coding identi-
fies the condition under which a neuron fires (or does not fire) to activate
its associated synapses. As a neural network process evolves, a sequence
of these attribute vectors is produced over time. The association between
successive vectors may be imagined to operate by reference to a neighbor-
hood structure implicit in the neural architecture and associated connection
weights. There also may be an implicit association with some (unknown)
optimization problem, or a more explicit association with a known problem
and set of constraints. In the latter case, attribute assignments (neuron fir-
ings and synapse activation) can be evaluated for efficacy by transformation
into a vector x, to be checked for feasibility by x ∈ X. (We maintain a
distinction between y and x since there may not be a one-one association
between them.)
Time records identifying the quality of outcomes produced by recent fir-
ings, and identifying the frequency with which particular attribute assignments
produce the highest quality firing outcomes, yield a basis for delaying changes in cer-
tain weight assignments and for encouraging changes in others. The concept
of influence, in the form introduced in tabu search, should be considered in
parallel with quality of outcomes.
Early designs to incorporate tabu search into neural networks are pro-
vided in the work of de Werra and Hertz (1989) and Beyer and Ogier (1991).
These applications, which respectively treat visual pattern identification and
nonconvex optimization, are reported to significantly reduce training times
and increase the reliability of outcomes generated. More recent uses of tabu
search to enhance the function of neural networks are provided by the studies
reported in Glover and Laguna (1997).

5.6 Target Analysis


Target analysis (Glover and Greenberg, 1989) links artificial intelligence and
operations research perspectives to give heuristic or exact solution procedures
the ability to learn what rules are best to solve a particular class of prob-
lems. Many existing solution methods have evolved by adopting, a priori, a
somewhat limited characterization of appropriate rules for evaluating deci-
sions. An illustration is provided by restricting the definition of a "best"
move to be one that produces the most attractive objective function change.
However, this strategy does not guarantee that the selected move will lead
the search in the direction of the optimal solution. In fact, in some settings
it has been shown that the merit of such a decision rule diminishes as the
number of iterations increases during a solution attempt.
As seen earlier, the tabu search philosophy is to select a best admissi-
ble move (from a strategically controlled candidate list) at each iteration,
interpreting best in a broad sense that goes beyond the use of objective func-
tion measures, and relies upon historical parameters to aid in composing an
appropriate evaluation. Target analysis provides a means to exploit this
broader view. For example, target analysis can be used to create a dynamic
evaluation function that incorporates a systematic process for diversifying
the search over the longer term.
A few examples of the types of questions that target analysis can be used
to answer are:

(1) Which decision rule from a collection of proposed alternatives should


be selected to guide the search? (In an expanded setting, how should the
rules from the collection be combined? By interpreting "decision rule"
broadly, this encompasses the issue of selecting a neighborhood, or a combi-
nation of neighborhoods, as the source of a move at a given stage.) Similarly,
which parameter values should be chosen to provide effective instances of
the decision rules?

(2) What attributes are most relevant for determining tabu status, and
what associated tabu restrictions, tabu tenures and aspiration criteria should
be used?

(3) What weights should be assigned to create penalties or inducements


(e.g., as a function of frequency-based memory), and what thresholds should
govern their application?

(4) Which measures of quality and influence are most appropriate, and
which combinations of these lead to the best results in different search
phases?

(5) What features of the search trajectory disclose when to focus more
strongly on intensification and when to focus more strongly on diversifica-
tion? (In general, what is the best relative emphasis between intensifica-
tion and diversification, and under what conditions should this emphasis
change?)

Motivation for using target analysis to answer such questions is pro-
vided by contrasting target analysis with the way answers are normally de-
termined. Typically, an experimenter begins with a set of alternative rules
and decision criteria which are intended to capture the principal elements
of a given method, often accompanied by ranges of associated parameters
for implementing the rules. Then various combinations of options are tried,
to see how each one works for a preliminary set of test problems. However,
even a modest number of rules and parameters may create a large number of
possibilities in combination, and there is usually little hope of testing these
with any degree of thoroughness. As a result, such testing for preferred al-
ternatives generally amounts to a process of blind groping. Where methods
boast the lack of optional parameters and rules, typically it is because the
experimenter has already done the advance work to settle upon a particu-
lar combination that has been hard-wired for the user, at best with some
degree of adaptiveness built in, but the process that led to this hard-wiring
still raises the prospect that another set of options may be preferable.
More importantly, in an adaptive memory approach, where information
from the history of the search is included among the inputs that determine
current choices, a trial and error testing of parameters may overlook key
elements of timing and yield no insights about relationships to be exploited.
Such a process affords no way to uncover or characterize the circumstances
encountered during the search that may cause a given rule to perform well
or badly, and consequently gives no way to anticipate the nature of rules
that may perform better than those originally envisioned. Target analysis
replaces this by a systematic approach to create hindsight before the fact,
and then undertakes to "reverse engineer" the types of rules that will lead
to good solutions.

5.6.1 Target Analysis Features


The main features of target analysis may briefly be sketched by viewing
the approach as a five phase procedure (see Figure 28). Phase 1 of target
analysis is devoted to applying existing methods to determine optimal or
exceptionally high quality solutions to representative problems of a given
class. In order to allow subsequent analysis to be carried out more conve-
niently, the problems are often selected to be relatively small, provided this
can be done in a way to assure these problems will exhibit features expected
to be encountered in hard problems from the class examined.

[Figure 28: Overview of the target analysis methodology. The figure depicts
a five-phase cycle: Phase 1 applies existing solution methods to a selected
class of representative problems to obtain high quality solutions; Phase 2
applies a scoring procedure; Phase 3 builds new evaluation functions (a
master decision rule); Phase 4 uses math or statistical models to determine
effective parameter values; and Phase 5 applies the improved method.]

Although this phase is straightforward, the effort allotted to obtaining
solutions of the specified quality will generally be somewhat greater than


would be allotted during the normal operation of the existing solution pro-
cedures, in order to assure that the solutions have the quality sought. (Such
an effort may be circumvented in cases where optimal solutions to a partic-
ular testbed of problems are known in advance.)
Phase 2 uses the solutions produced by Phase 1 as targets, which become
the focus of a new set of solution passes. During these passes, each problem is
solved again, this time scoring all available moves (or a high-ranking subset)
on the basis of their ability to progress effectively toward the target solution.
The scoring can be a simple classification, such as "good" or "bad," or it
may capture more refined gradations. In the case where multiple best or
near best solutions may reasonably qualify as targets, the scores may be
based on the target that is "closest to" the current solution.
In some implementations, choices during Phase 2 are biased to select
moves that have high scores, thereby leading to a target solution more
quickly than the customary choice rules. In other implementations, the
method is simply allowed to make its regular moves. In either case, the goal
is to generate information during this solution effort which may be useful in
inferring the solution scores. That is, the scores provide a basis for creating
modified evaluations-and more generally, for creating new rules to gener-
ate such evaluations in order to more closely match them with the measures
that represent "true goodness" (for reaching the targets).
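As a minimal illustration of such scoring (assuming 0-1 solution vectors and
a single known target; the function names are ours), a move can be scored by
whether it reduces the Hamming distance to the target:

    def score_move(current, make_move, target):
        """Classify a move as 'good' if it brings the current solution closer
        to the target (in Hamming distance) and 'bad' otherwise; graded
        scores could instead return the distance change itself."""
        dist = lambda s: sum(a != b for a, b in zip(s, target))
        return "good" if dist(make_move(current)) < dist(current) else "bad"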
In the case of tabu search intensification strategies such as elite solution
recovery approaches, scores can be assigned to parameterized rules for deter-
mining the types of solutions to be saved. For example, such rules may take
account of characteristics of clustering and dispersion among elite solutions.
In environments where data bases can be maintained of solutions to related
problems previously encountered, the scores may be assigned to rules for
recovering and exploiting particular instances of these past solutions, and
for determining which new solutions will be added to the data bases as ad-
ditional problems are solved. (The latter step, which is part of the target
analysis and not part of the solution effort, can be performed "off line.") An
integration of target analysis with a generalized form of sensitivity analysis
for these types of applications has been developed and implemented in finan-
cial planning and industrial engineering by Glover, Mulvey and Bai (1996)
and Glover, et al. (1997). Such designs are also relevant, for example, in
applications of linear and nonlinear optimization based on simplex method
subroutines, to identify sets of variables to provide crash-basis starting solu-
tions. (A very useful framework for recovering and exploiting past solutions
is provided by Kraay and Harker (1996, 1997).)
In path relinking strategies, scores can be applied to rules for matching
initiating solutions with guiding solutions. As with other types of decision
rules produced by target analysis, these will preferably include reference to
parameters that distinguish different problem instances. The parameter-
based rules similarly can be used to select initiating and guiding solutions
from pre-existing solution pools. Tunneling applications of path relinking,
which allow traversal of infeasible regions, and strategic oscillation designs
that purposely drive the search into and out of such regions, are natural
accompaniments for handling recovered solutions that may be infeasible.
Phase 3 constructs parameterized functions of the information generated
in Phase 2, with the goal of finding values of the parameters to create a
master decision rule. This rule is designed to choose moves that score highly,
in order to achieve the goal that underlies Phase 2. It should be noted that
the parameters available for constructing a master decision rule depend on
the search method employed. Thus, for example, tabu search may include
parameters that embody various elements of recency-based and frequency-
based memory, together with measures of influence linked to different classes
of attributes or to different regions from which elite solutions have been
derived.
Phase 4 transforms the general design of the master decision rule into
a specific design by applying a model to determine effective values for its
parameters. This model can be a simple set of relationships based on in-
tuition, or can be a more rigorous formulation based on mathematics or
statistics (such as a goal programming or discriminant analysis model, or
even a "connectionist" model based on neural networks).
The components of phases 2, 3 and 4 are not entirely distinct, and may be
iterative. On the basis of the outcomes of these phases, the master decision
rule becomes the rule that drives the solution method. In the case of tabu
search, this rule may use feedback of outcomes obtained during the solution
process to modify its parameters for the problem being solved.
Phase 5 concludes the process by applying the master decision rule to
the original representative problems and to other problems from the chosen
solution class to confirm its merit. The process can be repeated and nested
to achieve further refinement.
Target analysis has an additional important function. On the basis of
the information generated during its application, and particularly during its
final confirmation phase, the method produces empirical frequency measures
for the probabilities that choices with high evaluations will lead to an optimal
(or near-optimal) solution within a certain number of steps. These decisions
are not only at tactical levels but also at strategic levels, such as when
to initiate alternative solution phases, and which sources of information
to use for guiding these phases (e.g., whether from processes for tracking
solution trajectories or for recovering and analyzing solutions). By this
means, target analysis can provide inferences concerning expected solution
behavior, as a supplement to classical "worst case" complexity analysis.
These inferences can aid the practitioner by indicating how long to run a
solution method to achieve a solution of desired quality, according to a
specified empirical probability.
One of the useful features of target analysis is its capacity for taking
advantage of human interaction. The determination of key parameters, and
the rules for connecting them, can draw directly on the insight of the ob-
server as well as on supplementary analytical techniques. The ability to
derive inferences from pre-established knowledge of optimal or near optimal
solutions, instead of manipulating parameters blindly (without information
about the relation of decisions to targeted outcomes), can save significant in-
vestment in time and energy. The key, of course, is to coordinate the phases
of solution and guided re-solution to obtain knowledge that has the great-
est utility. Many potential applications of target analysis exist, and recent
applications suggest the approach holds considerable promise for developing
improved decision rules for difficult optimization problems.

5.6.2 Illustrative Application and Implications


An application of target analysis to a production scheduling problem (La-
guna and Glover, 1993) provides a basis for illustrating some of the relevant
considerations of the approach. In this study, the moves consisted of a com-
bination of swap and insert moves, and scores were generated to identify
the degree to which a move brought a solution closer to the target solution
(which consisted of the best known solution before improving the method
by means of target analysis). In the case of a swap move, for example, a
move might improve, worsen (or, by the measure used, leave unchanged) the
"positional value" of each component of the swap, and by the simplification
of assigning scores of 1, 0 or -1 to each component, a move could accordingly
receive a score ranging from 2 to -2. The application of target analysis then
proceeded by tracking the scores of the 10 highest evaluation moves at each
iteration, to determine the circumstances under which the highest evalua-
tions tended to correspond to the highest scores. Both tabu and non-tabu
moves were included in the analysis, to see whether tabu status was also
appropriately defined.
At an early stage of the analysis a surprising relationship emerged. Al-
though the scores of the highest evaluation non-tabu moves ranged across
both positive and negative values, the positive values were largely associ-
ated with moves that improved the schedule while the negative values were
largely associated with moves that worsened the schedule. In short, the
highest evaluations were significantly more "accurate" (corresponded more
closely to high scores) during phases where the objective function value of
the schedule improved than during phases when it deteriorated.
A simple diversification strategy was devised to exploit this discovery.
Instead of relying on the original evaluations during "disimproving phases,"
the strategy supplemented the evaluations over these intervals by assigning
penalties to moves whose component jobs had been moved frequently in the
past. The approach was initiated at a local optimum after the progress of
the search began to slow (as measured by how often a new best solution was
found), and was de-activated as soon as a move was executed that also was
an improving move (to become re-activated the next time that all available
moves were disimproving moves). The outcome was highly effective, pro-
ducing new solutions that were often superior to the best previously found,
especially for larger problems, and also finding the highest quality solutions
more quickly.
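A simplified reconstruction of this penalty mechanism follows (the names,
the penalty weight, and the activation test are our assumptions; the
published study defines its own measures of slowdown and frequency):

    def penalized_value(move_value, jobs_moved, freq, active, weight=1.0):
        """During a disimproving phase (active=True), add a frequency-based
        penalty for jobs that have been moved often in the past; otherwise
        return the original evaluation unchanged (minimization assumed)."""
        if not active:
            return move_value
        return move_value + weight * sum(freq[j] for j in jobs_moved)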
The success of this application, in view of its clearly limited scope, pro-
vides an incentive for more thorough applications. For example, a more
complete analysis would reasonably proceed by first seeking to isolate the
high scoring moves during the disimproving phases and to determine how
frequency-based memory and other factors could be used to identify these
moves more effectively. Comparisons between evaluations proposed in this
manner and their associated move scores would then offer a foundation for
identifying more intelligent choices. Classifications to segregate the moves
based on criteria other than "improving" and "disimproving" could also
be investigated. Additional relevant factors that may profitably be taken
into account are examined in the illustration of the next subsection.

A Hypothetical Illustration.

The following hypothetical example embodies a pattern related to the one


uncovered in the scheduling application cited above. However, the pattern
in this case is slightly more ambiguous, and points less clearly to the
options by which it may be exploited.
For simplicity in this illustration, suppose that moves are scored to be
either "good" or "bad." (If each move changes the value of a single 0-1
variable, for instance, a move may be judged good or bad depending on
whether the assigned value is the same as in the target solution. More
generally, a threshold can be used to differentiate the two classifications.)

Move Rank                   1     2     3     4     5
Percent of moves with      22    14    10    20    16
"good" scores

Table 11: Moves throughout the search history.

Move Rank                   1     2     3     4     5
Percent of moves with      34    21     9    14     7
"good" scores

Table 12: Moves during improving phases.

Table 11 indicates the percent of time each of the five highest evaluation
moves, restricting attention in this case to those that are non-tabu, receives
a good score during the search history. (At a first stage of conducting the
target analysis, this history could be for a single hard problem, or for a small
collection of such problems.) The Move Rank in the table ranges from 1 to
5, corresponding to the highest evaluation move, the 2nd highest evaluation
move, and so on to the 5th highest evaluation move.
The indicated percent values do not total 100 because good scores may
also be assigned to moves with lower evaluations, whose ranks are not in-
cluded among those shown. Also, it may be expected that some tabu
moves will also receive good scores. (A fuller analysis would similarly show
ranks and scores for these moves.)
At first glance, the table appears to suggest that the fourth and fifth
ranked moves are almost as good as the first ranked move, although the
percent of moves that receive good scores is not particularly impressive for
any of the ranks. Without further information, a strategy might be contem-
plated that allocates choices probabilistically among the first, fourth and
fifth ranked moves (though such an approach would not be assured to do
better than choosing the first ranked move at each step). Tables 12 and 13
below provide more useful information about choices that are potentially fa-
vorable, by dividing the iterations into improving and disimproving phases
as in the scheduling study previously discussed.
These tables are based on a hypothetical situation where improving and
disimproving moves are roughly equal in number, so that the percent values
shown in Table 11 are the average of the corresponding values in Tables 12
and 13. (For definiteness, moves that do not change the problem objective
function may be assumed to be included in the improving phase, though a
better analysis might treat them separately.)

Move Rank                   1     2     3     4     5
Percent of moves with       8     7    11    26    25
"good" scores

Table 13: Moves during disimproving phases.

The foregoing outcomes to an extent resemble those found in the schedul-
ing study, though with a lower success rate for the highest evaluation im-
proving moves. Clearly Tables 12 and 13 give information that is more
exploitable than the information in Table 11. According to these latter
tables, it would be preferable to focus more strongly on choosing one of
the two highest evaluation moves during an improving phase, and one of
the fourth or fifth highest evaluation moves during a disimproving phase.
This conclusion is still weak in several respects, however, and we examine
considerations that may lead to doing better.
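Operationally, the latter tables suggest a rule of roughly the following
form (a sketch only, with hypothetical names; the weaknesses discussed next
argue against applying it rigidly):

    import random

    def choose_by_phase(ranked_moves, improving_phase):
        """Select among non-tabu moves ordered by evaluation: favor ranks 1-2
        during improving phases and ranks 4-5 during disimproving phases,
        following the pattern of Tables 12 and 13."""
        preferred = (0, 1) if improving_phase else (3, 4)
        pool = [ranked_moves[i] for i in preferred if i < len(ranked_moves)]
        return random.choice(pool or ranked_moves[:1])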
Refining the Analysis.
The approach of assigning scores to moves, as illustrated in Tables 11, 12
and 13, disregards the fact that some solution attributes (such as assign-
ments of values to particular 0-1 variables) may be fairly easy to choose
"correctly," while others may be somewhat harder. Separate tables of the
type illustrated should therefore be created for easy and hard attributes (as
determined by how readily their evaluations lead to choices that would gen-
erate the target solution), since the preferred rules for evaluating moves may
well differ depending on the types of attributes the moves contain. Likewise,
an effective strategy may require that easy and hard attributes become the
focus of different search phases. The question therefore arises as to how to
identify such attributes.
As a first approximation, we may consider an easy attribute to be one
that often generates an evaluation that keeps it out of the solution if it be-
longs out, or that brings it into the solution if it belongs in. A hard attribute
behaves oppositely. Thus, a comparison between frequency-based memory
and move scores gives a straightforward way to differentiate these types of
attributes. Both residence and transition frequencies are relevant, though
residence measures are usually the more appropriate. For example, an
attribute that belongs to the current solution a high percentage of the time,
and that also belongs to the target solution, would evidently qualify as easy.
On the other hand, the number of times the attribute is accepted or rejected
from the current solution may sometimes be less meaningful than how long
it stays in or out. The fact that residence and transition frequencies are
characteristically used in tabu search makes them conveniently available to
assist in differentiations that can improve the effectiveness of target analysis.
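A first-cut classifier along these lines might compare residence frequencies
against target membership, as in the following sketch (the threshold and
names are our assumptions):

    def classify_attribute(attr, residence_freq, in_target, threshold=0.7):
        """Label an attribute 'easy' when its residence frequency agrees with
        its target status (often in solution and in the target, or rarely in
        solution and out of the target), and 'hard' otherwise."""
        f = residence_freq[attr]          # fraction of iterations in solution
        agrees = (f >= threshold and in_target[attr]) or \
                 (f <= 1.0 - threshold and not in_target[attr])
        return "easy" if agrees else "hard"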

5.6.3 Conditional Dependencies Among Attributes


Tables 11, 12 and 13 suggest that the search process that produced them
is relatively unlikely to find the target solution. Even during improving
phases, the highest evaluation move is almost twice as likely to be bad as
good. However, this analysis is limited, and discloses a limitation of the
tables themselves. In spite of first appearances, it is entirely possible that
these tables could be produced by a search process that successfully obtains
the target solution (by a rule that chooses a highest evaluation move at each
step). The reason is that the relation between scores and evaluations may
change over time. While there may be fairly long intervals where choices
are made poorly, there may be other shorter intervals where the choices
are made more effectively-until eventually one of these shorter intervals
succeeds in bringing all of the proper attributes into the solution.
Such behavior is likely to occur in situations where correctly choosing
some attributes may pave the way for correctly choosing others. The in-
terdependence of easy and hard attributes previously discussed is carried a
step farther by these conditional relationships, because an attribute that at
one point deserves to be classified hard may later deserve to be classified
easy, once the appropriate foundations are laid.
Instead of simply generating tables that summarize results over long
periods of the search history, therefore, it can be important to look for
blocks of iterations where the success rate of choosing good moves may differ
appreciably from the success rate overall. These blocks provide clues about
intermediate solution compositions that may transform hard attributes into
easy ones, and thus about preferred sequences for introducing attributes
that may exploit conditional dependencies. The natural step then is to see
which additional types of evaluation information may independently lead to
identifying such sequences.
A simple instance of this type of effect occurs where the likelihood that
a given attribute will correctly be selected (to enter or leave the solution)
depends roughly on the number of attributes that already correctly belong
to the solution. In such situations, the appropriate way to determine a
"best choice" is therefore also likely to depend on this number of attributes
correctly in solution. Even though such information will not generally be
known during the search, it may be possible to estimate it and adjust the
move evaluations accordingly. Such relationships, as well as the more gen-
eral ones previously indicated, are therefore worth ferreting out by target
analysis.

5.6.4 Differentiating Among Targets


In describing the steps of target analysis, it has already been noted that
scores should not always be rigidly determined by only one specific target,
but may account for alternative targets, and in general may be determined
by the target that is closest to the current solution (by a metric that de-
pends on the context). Acknowledging that there may be more than one
good solution that is worth finding, such a differentiation among targets can
prove useful. Yet even in the case where a particular solution is uniquely the
one to be sought (as where its quality may be significantly better than that
of all others known), alternative targets may still be valuable to consider
in the role of intermediate solutions, and may provide a springboard to find-
ing additional solutions that are better. Making reference to intermediate
targets is another way of accounting for the fact that conditional depen-
dencies may exist among the attributes, as previously discussed. However,
such dependencies in some cases may be more easily exploited by explicitly
seeking constructions at particular stages that may progressively lead to a
final destination.
Some elite solutions may provide better targets than others because they
are easier to obtain-completely apart from developing strategies to reach
ultimate targets by means of intermediate ones. However, some care is
needed in making the decision to focus on such easier targets as a basis for
developing choice rules. As in the study of Lokketangen and Glover (1997b),
it may be that focusing instead on the harder targets will yield rules that
likewise cause the easier targets to be found more readily, and these rules
may apply to a wider spectrum of problems than those derived by focusing
on easier targets.

5.6.5 Generating Rules by Optimization Models


Target analysis can use optimization models to generate decision rules by
finding weights for various decision criteria to create a composite (master)
rule. To illustrate, let G and B respectively denote index sets for good moves
and bad moves, as determined from move scores, as in the classification em-
bodied in Tables 11, 12 and 13. Incorporate the values of the different
decision criteria in a vector A_i for i ∈ G and i ∈ B; i.e., the jth component
a_ij of A_i is the value assigned to move i by the decision criterion j. These
components need not be the result of rules, but can simply correspond to
data considered relevant to constructing rules. In the tabu search setting,
such data can include elements of recency-based and frequency-based mem-
ory. Then we may consider a master rule which is created by applying a
weight vector w to each vector A_i to produce a composite decision value
A_i w = Σ_j a_ij w_j. An ambitious objective is to find a vector w that yields

    A_i w > 0 for i ∈ G

    A_i w ≤ 0 for i ∈ B

If such a weight vector w could be found, then all good moves would have
higher evaluations by the composite criterion than all bad moves, which of
course is normally too much to ask. A step toward formulating a more
reasonable goal is as follows. Let G(iter) and B(iter) identify the sets G and
B for a given iteration iter. Then an alternative objective is to find a w so
that, at each such iteration, at least one i ∈ G(iter) would yield

    A_i w > A_k w for all k ∈ B(iter)

or equivalently

    Max{A_i w : i ∈ G(iter)} > Max{A_k w : k ∈ B(iter)}


This outcome would ensure that a highest evaluation move by the com-
posite criterion will always be a good move. Naturally, this latter goal is
still too optimistic. Nevertheless, it is possible to devise goal programming
models (related to LP discriminant analysis models) that can be used to
approximate this goal. A model of this type has proved to be effective for
devising branching rules to solve a problem of refueling nuclear reactors
(Glover, Klingman and Phillips, 1990).
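To indicate the flavor of such a model (this is our schematic rendering, not
the cited formulation), the strict separation goal can be relaxed with
nonnegative slacks whose sum is minimized, giving an LP of the discriminant
analysis type:

    import numpy as np
    from scipy.optimize import linprog

    def master_rule_weights(A_good, A_bad):
        """Goal-programming LP sketch: find w minimizing total slack so that
        A_i w >= 1 - s_i for good moves and A_k w <= -1 + s_k for bad moves
        (strict inequalities relaxed with a unit margin and slacks s >= 0).
        Decision variables are [w (n), s_good (g), s_bad (b)]."""
        g, n = A_good.shape
        b = A_bad.shape[0]
        A_ub = np.zeros((g + b, n + g + b))
        A_ub[:g, :n] = -A_good            # -A_i w - s_i <= -1
        A_ub[:g, n:n + g] = -np.eye(g)
        A_ub[g:, :n] = A_bad              #  A_k w - s_k <= -1
        A_ub[g:, n + g:] = -np.eye(b)
        b_ub = -np.ones(g + b)
        c = np.concatenate([np.zeros(n), np.ones(g + b)])  # minimize total slack
        bounds = [(None, None)] * n + [(0, None)] * (g + b)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return res.x[:n]                  # the weight vector w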
A variety of opportunities exist for going farther in such strategies. For
example, issues of creating nonlinear and discontinuous functions to achieve
better master rules can be addressed by using trial functions to transform
components of Ai vectors into new components, guided by LP sensitivity
and postoptimality analysis. Target analysis ideas previously indicated can
also be useful in this quest.
The range of possibilities for taking advantage of target analysis is con-
siderable, and for the most part only the most rudimentary applications of
this learning approach have been initiated. The successes of these applica-
tions make further exploration of this approach attractive.

6 Neglected Tabu Search Strategies


We briefly review several key strategies in tabu search that are often ne-
glected (especially in beginning studies), but which are important for pro-
ducing the best results.
Our purpose is to call attention to the relevance of particular elements
that are mutually reinforcing, but which are not always discussed "side by
side" in the literature, and which deserve special emphasis. In addition,
observations about useful directions for future research are included.
A comment regarding implementation: first steps do not have to include
the most sophisticated variants of the ideas discussed in the following sec-
tions, but the difference between "some inclusion" and "no inclusion" can
be significant. Implementations that incorporate simple instances of these
ideas will often disclose the manner in which refined implementations can
lead to improved performance.
The material that follows brings together ideas described in preceding
sections to provide a perspective on how they interrelate. In the process, a
number of additional observations are introduced.

6.1 Candidate List Strategies


Efficiency and quality can be greatly affected by using intelligent procedures
for isolating effective candidate moves, rather than trying to evaluate every
possible move in a current neighborhood of alternatives. This is particularly
true when such a neighborhood is large or expensive to examine. The gains
to be achieved by using candidate lists have been widely documented, yet
many TS studies overlook their relevance.
Careful organization in applying candidate lists, as by saving evaluations
from previous iterations and updating them efficiently, can also be valuable
for reducing overall effort. Time saved in these ways allows a chance to
devote more time to higher level features of the search.
While the basic theme of candidate lists is straightforward, there are
some subtleties in the ways candidate list strategies may be used. Con-
siderable benefit can result by being aware of fundamental candidate list
approaches, such as the Subdivision Strategy, the Aspiration Plus Strategy,
the Elite Candidate List Strategy, the Bounded Change Strategy and the
Sequential Fan Strategy (as discussed in Section 3).
An effective integration of a candidate list strategy with the rest of a
tabu search method will typically benefit by using TS memory designs to
facilitate functions to be performed by the candidate lists. This applies
especially to the use of frequency based memory. A major mistake of some
TS implementations, whether or not they make use of candidate lists, is to
consider only the use of recency based memory. Frequency based memory-
which itself takes different forms in intensification phases and diversification
phases-can not only have a dramatic impact on the performance of the
search in general but also can often yield gains in the design of candidate
list procedures. A useful way to meld different candidate list procedures is
described in Glover (1997).

6.2 Intensification Approaches


Intensification strategies, which are based on recording and exploiting elite
solutions or, characteristically, specific features of these solutions, have proved
very useful in a variety of applications. Some of the relevant forms of such
strategies and considerations for implementing them are as follows.

6.2.1 Restarting with Elite Solutions


The simplest intensification approach is the strategy of recovering elite so-
lutions in some order, each time the search progress slows, and then using
these solutions as a basis for re-initiating the search. The list of solutions
that are candidates to be recovered is generally limited in size, often in the
range of 20 to 40 (although in parallel processing applications the num-
ber is characteristically somewhat larger). The size chosen for the list in
serial TS applications also corresponds roughly to the number of solution
recoveries anticipated to be done during the search, and so may be less or
more depending on the setting. When an elite solution is recovered from
the list, it is removed, and new elite solutions are allowed to replace less
attractive previous solutions- usually dropping the worst of the current
list members. However, if a new elite solution is highly similar to a solu-
tion presently recorded, instead of replacing the current worst solution, the
new solution will compete directly with its similar counterpart to determine
which solution is saved.
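The list management just described might be sketched as follows (our
rendering; the similarity test and the list size are application-dependent
choices):

    def record_elite(pool, value, solution, similar, max_size=30):
        """Keep a bounded pool of (value, solution) pairs, best first
        (minimization). A new solution that resembles a stored one competes
        only with that counterpart; otherwise the worst member is dropped
        when the pool is full."""
        for i, (v, s) in enumerate(pool):
            if similar(s, solution):
                if value < v:             # keep the better of the two twins
                    pool[i] = (value, solution)
                return
        pool.append((value, solution))
        pool.sort(key=lambda p: p[0])
        del pool[max_size:]               # drop the worst when over capacity

Recovery then proceeds by removing entries in the chosen order, best to
worst or the reverse, as discussed below.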
This approach has been applied very effectively in job shop and flow shop
scheduling, in vehicle routing, and in telecommunication design problems.
One of the best approaches for scheduling applications keeps the old TS
memory associated with the solution, but makes sure the first new move
away from this solution goes to a different neighbor than the one visited
after encountering this solution the first time. Another effective variant
does not bother to save the old TS memory, but uses a probabilistic TS
choice design.
The most common strategy is to go through the list from best to worst,
but in some cases it has worked even better to go through the list in the
other direction. In this approach, it appears effective to allow two passes
of the list. On the first pass, when a new elite solution is found that falls
below the quality of the solution currently recovered, but which is still better
than the worst already examined on the list, the method still adds the new
solution to the list and displaces the worst solution. Then a second pass,
after reaching the top of the list, recovers any added solutions not previously
recovered.

6.2.2 Frequency of Elite Solutions


Another primary intensification strategy is to examine elite solutions to
determine the frequency in which particular solution attributes occur (where
the frequency is typically weighted by the quality of the solutions in which
the attributes are found).
This strategy was originally formulated in the context of identifying
" consistent" and "strongly determined" variables-where, loosely speak-
ing, consistent variables are those more frequently found in elite solutions,
while strongly determined variables are those that would cause the greatest
disruption by changing their values (as sometimes approximately measured
by weighting the frequencies based on solution quality). The idea is to iso-
late the variables that qualify as more consistent and strongly determined
(according to varying thresholds), and then to generate new solutions that
give these variables their "preferred values." This can be done either by
rebuilding new solutions in a multistart approach or by modifying the choice
rules of an ongoing solution effort to favor the inclusion of these value as-
signments.
Keeping track of the frequency that elite solutions include particular at-
tributes (such as edges of tours, assignments of elements to positions, narrow
ranges of values taken on by variables, etc.) and then favoring the inclusion
of the highest frequency elements, effectively allows the search to concen-
trate on finding the best supporting uses and values of other elements. A
simple variant is to "lock in" a small subset of the most attractive attributes
(value assignments)-allowing this subset to change over time or on different
passes.
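A compact version of the frequency computation (our sketch; solutions are
taken to be sets of attributes, and the particular quality weighting is an
assumption):

    from collections import defaultdict

    def consistent_attributes(elite_solutions, quality, top_k=10):
        """Score each attribute by its frequency across elite solutions,
        weighted by solution quality, and return the top_k attributes as
        candidates to 'lock in' or to favor in subsequent constructions."""
        score = defaultdict(float)
        for sol in elite_solutions:
            w = quality(sol)              # larger for better solutions
            for attr in sol:
                score[attr] += w
        return sorted(score, key=score.get, reverse=True)[:top_k]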

A Relevant Concern:
In the approach that starts from a current (good) solution, and tries to bring
in favored elements, it is important to introduce an element that yields a
best outcome from among the current contenders (where, as always, best is
defined to encompass considerations that are not solely restricted to objec-
tive function changes). If an attractive alternative move shows up during
this process, which does not involve bringing in one of these elements, aspi-
ration criteria may determine whether such a move should be taken instead.
Under circumstances where the outcome of such a move appears sufficiently
promising, the approach may be discontinued and allowed to enter an im-
proving phase (reflecting a decision that enough intensification has been
applied, and it is time to return to searching by customary means).
Intensification of this form makes it possible to determine what percent
of " good attributes" from prior solutions should be included in the solution
currently generated. It also gives information about which subsets of these
attributes should go together, since it is preferable not to choose attributes
during this process that cause the solution to deteriorate compared to other
choices. This type of intensification strategy has proved highly effective in
the settings of vehicle routing and zero-one mixed integer optimization.

6.2.3 Memory and Intensification


It is clearly somewhat more dangerous to hold elements "in" solution than
to hold them "out" (considering that a solution normally is composed of a
small fraction of available elements-as where a tree contains only a frac-
tion of the edges of a graph). However, there is an important exception,
previously intimated. As part of a longer term intensification strategy, el-
ements may be selected very judiciously to be "locked in" on the basis of
having occurred with high frequency in the best solutions found. In that
case, choosing different mutually compatible (and mutually reinforcing) sets
to lock in can be quite helpful. This creates a combinatorial implosion effect
(opposite to a combinatorial explosion effect) that shrinks the solution space
to a point where best solutions over the reduced space are likely to be found
more readily.
The key to this type of intensification strategy naturally is to select an
appropriate set of elements to lock in, but the chances appear empirically
to be quite high that some subset of those with high frequencies in earlier
best solutions will be correct. Varying the subsets selected gives a signifi-
cant likelihood of picking a good one. (More than one subset can be correct,
because different subsets can still be part of the same complete set.) Aspi-
ration criteria make it possible to drop elements that are supposedly locked
in, to give this approach more flexibility.

6.2.4 Relevance of Clustering for Intensification


A search process over a complex space is likely to produce clusters of elite
solutions, where one group of solutions gives high frequencies for one set of
attributes and another group gives high frequencies for a different set. It
is important to recognize this situation when it arises. Otherwise there is
a danger that an intensification strategy may try to compel a solution to
include attributes that work against each other. This is particularly true
in a strategy that seeks to generate a solution by incorporating a collec-
tion of attributes "all at once," rather than using a step by step evaluation
process that is reapplied at each move through a neighborhood space. (Step-
ping through a neighborhood has the disadvantage of being slower, but may
compensate by being more selective. Experimentation to determine the cir-
cumstances under which each of these alternative intensification approaches
may be preferable would be quite valuable.)
A strategy that incorporates a block of attributes together may yield
benefits by varying both the size and composition of the subsets of high
frequency "attractive" attributes, even if these attributes are derived from
solutions that lie in a common cluster, since the truly best solutions may
not include them all. Threshold based forms of logical restructuring, as
discussed in Section 3, may additionally lead to identifying elements to inte-
grate into solutions that may not necessarily belong to solutions previously
encountered. The vocabulary building theme becomes important in this
connection. The relevance of clustering analysis for logical restructuring
and vocabulary building is reinforced by the use of a related conditional
analysis, which is examined subsequently in Section 6.5.

6.3 Diversification Approaches


Diversification processes in tabu search are sometimes applied in ways that
limit their effectiveness, due to overlooking the fact that diversification is
not just "random" or "impulsive," but depends on a purposeful blend of
memory and strategy. As noted in Section 3, recency and frequency based
memory are both relevant for diversification. Historically, these ideas stem
in part from proposals for exploiting surrogate constraint methods. In this
setting, the impetus is not simply to achieve diversification, but to derive
appropriate weights in order to assure that evaluations will lead to solutions
that satisfy required conditions (see Section 5). Accordingly, it is important
to account for elements such as how often, to what extent, and how recently,
particular constraints have been violated, in order to determine weights that
produce more effective valuations.
The implicit learning effects that underlie such uses of recency, frequency
and influence are analogous to those that motivate the procedures used for
diversification (and intensification) in tabu search. Early strategic oscillation
approaches exploited this principle by driving the search to various depths
outside (and inside) feasibility boundaries, and then employing evaluations
and directional search to move toward preferred regions.
In the same way that these early strategies bring diversification and
intensification together as part of a continuously modulated process, it is
important to stress that these two elements should be interwoven in gen-
eral. A common mistake in many TS implementations is to apply diversifi-
cation without regard for intensification. "Pure" diversification strategies
are appropriate for truly long term strategies, but over the intermediate
term, diversification is generally more effective if it is applied by heeding
information that is also incorporated in intensification strategies. In fact,
intensification by itself can sometimes cause a form of diversification, be-
cause intensifying over part of the space allows a broader search of the rest
of the space. A few relevant concerns are as follows.

6.3.1 Diversification and Intensification Links


A simple and natural diversification approach is to keep track of the fre-
quency that attributes occur in non-elite solutions, as opposed to solutions
encountered in general, and then to periodically discourage the incorpo-
ration of attributes that have modest to high frequencies (giving greater
penalties to larger frequencies). The reference to non-elite solutions tends
to avoid penalizing attributes that would be encouraged by an intensification
strategy.
More generally, for a "first level" balance, an Intermediate Term Mem-
ory matrix may be used, where the high frequency items in elite solutions
are not penalized by the long term values, but may even be encouraged.
The tradeoffs involved in establishing the degree of encouragement, or the
degree of reducing the penalties, represents an area where a small amount of
preliminary testing can be valuable. This applies as well to picking thresh-
olds to identify high frequency items. (Simple guesses about appropriate
parameter values can often yield benefits, and tests of such initial guesses
can build an understanding that leads to increasingly effective strategies.)
By extension, if an element has never or rarely been in a solution gener-
ated, then it should be given a higher evaluation for being incorporated in
a diversification approach if it was "almost chosen" in the past but didn't
make the grade. This observation has not been widely heeded, but is not
difficult to implement, and is relevant to intensification strategies as well.
The relevant concerns are illustrated in the discussion of "Persistent At-
tractiveness" and" Persistent Voting" in Chapter 7 of Glover and Laguna
(1997).

6.3.2 Implicit Conflict and the Importance of Interactions


Current evaluations also should not be disregarded while diversification in-
fluences are activated. Otherwise, a diversification process may bring ele-
ments together that conflict with each other, making it harder rather than
easier to find improved solutions.
For example, a design that gives high penalties to a wide range of el-
ements, without considering interactions, may drive the solution to avoid
good combinations of elements. Consequently, diversification-especially
in intermediate term phases-should be carried out for a limited number
of steps, accompanied by watching for and sidestepping situations where
indiscriminately applying penalties would create incompatibilities or severe
deterioration of quality. To repeat the theme: even in diversification, at-
tention to quality is important. And as in "medical remedies," sometimes
small doses are better than large ones. Larger doses (i.e., more radical de-
partures from previous solutions) which are normally applied less frequently,
can still benefit by coordinating the elements of quality and change.

6.3.3 Reactive Tabu Search


An approach called Reactive Tabu Search (RTS) developed by Battiti and
Tecchiolli (1992, 1994a) deserves additional consideration as a way to achieve
a useful blend of intensification and diversification. RTS incorporates hash-
ing in a highly effective manner to generate attributes that are very nearly
able to differentiate among distinct solutions. That is, very few solutions
contain the same hashed attribute, applying standard hash function tech-
niques. Accompanying this, Battiti and Tecchiolli use an automated tabu
tenure, which begins with the value of 1 (preventing a hashed attribute from
being reinstated if this attribute gives the" signature" of the solution visited
on the immediately preceding step). This tenure is then increased if exam-
ination shows the method is possibly cycling, as indicated by periodically
generating solutions that produce the same hashed attribute.
The tabu tenure, which is the same for all attributes, is increased expo-
nentially when repetitions are encountered, and decreased gradually when
repetitions disappear. Under circumstances where the search nevertheless
encounters an excessive number of repetitions within a given span (i.e., where
a moving frequency measure exceeds a certain threshold), a diversification
step is activated, which consists of making a number of random moves pro-
portional to a moving average of the cycle length.
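The reactive core of this scheme is easy to express. The sketch below (a
minimal Python illustration under our own assumptions about constants and
data structures; it is not Battiti and Tecchiolli's implementation) grows the
tenure exponentially on repetitions, decays it gradually otherwise, and sizes
the escape step by a moving average of the observed cycle lengths:

    # Minimal sketch of the reactive mechanism: hashed solution
    # signatures, a single adaptive tenure, and an escape trigger.
    # The constants are illustrative assumptions.
    INCREASE, DECREASE = 1.2, 0.95

    class ReactiveMemory:
        def __init__(self):
            self.tenure = 1
            self.last_seen = {}      # signature -> iteration last visited
            self.cycle_lengths = []  # lengths of detected repetitions

        def record(self, solution, iteration):
            sig = hash(tuple(solution))  # stand-in for a standard hash function
            if sig in self.last_seen:    # repetition: possible cycling
                self.cycle_lengths.append(iteration - self.last_seen[sig])
                self.tenure = int(self.tenure * INCREASE) + 1      # grow fast
            else:
                self.tenure = max(1, int(self.tenure * DECREASE))  # decay slowly
            self.last_seen[sig] = iteration

        def escape_steps(self, window=50, repetition_limit=3):
            """Number of random moves to apply when repetitions within
            the window become excessive (half the moving average of the
            cycle length); zero otherwise."""
            recent = self.cycle_lengths[-window:]
            if len(recent) >= repetition_limit:
                return max(1, int(sum(recent) / (2 * len(recent))))
            return 0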
The reported successes of this approach invite further investigations of
its underlying ideas and related variants. As a potential basis for gener-
ating such variants, attributes created by hashing may be viewed as fine
grain attributes, which give them the ability to distinguish among different
solutions. By contrast, "standard" solution attributes, which are the raw
material for hashing, may be viewed as coarse grain attributes, since each
may be contained in (and hence provide a signature for) many different so-
lutions. Experience has shown that tabu restrictions based on coarse grain
attributes are often advantageous for giving increased vigor to the search.
(There can exist a variety of ways of defining and exploiting attributes,
particularly at coarser levels, which complicates the issue somewhat.) This
raises the question of when particular degrees of granularity are more effec-
tive than others.
It seems reasonable to suspect that fine grain attributes may yield greater
benefits if they are activated in the vicinity of elite solutions, thereby allow-
ing the search to scour "high quality terrain" more minutely. This effect
may also be achieved by reducing tabu tenures for coarse grain attributes-
or basing tabu restrictions on attribute conjunctions-and using more spe-
cialized aspiration criteria. Closer scouring of critical regions can also be
brought about by using strongly focused candidate list strategies, such as
a sequential fan candidate list strategy. (Empirical comparisons of such al-
ternatives to hashing clearly would be of interest.) On the other hand, as
documented by Nonobe and Ibaraki (1997), the use of "extra coarse grain"
attributes (those that prohibit larger numbers of moves when embodied in
tabu restrictions) can prove advantageous for solving large problems over a
broadly defined problem domain.
Another type of alternative to hashing also exists, which is to create
new attributes by processes that are not so uniform as hashing. A potential
drawback of hashing is its inability to distinguish the relative importance
(and appropriate influence) of the attributes that it seeks to map into others
that are fine grained. A potential way to overcome this drawback is to make
use of vocabulary building (Glover and Laguna, 1997) and of conditional
analysis (Section 6.5).

6.4 Strategic Oscillation


A considerable amount has been written on strategic oscillation and its ad-
vantages. However, one of the uses of this approach that is frequently over-
looked involves the idea of oscillating among alternative choice rules and
neighborhoods. As stressed in Section 4, an important aspect of strategic
oscillation is the fact that there naturally arise different types of moves and
choice rules that are appropriate for negotiating different regions and dif-
ferent directions of search. Thus, for example, there are many constructive
methods in graph and scheduling problems, but strategic oscillation fur-
ther leads to the creation of complementary "destructive methods" which
can operate together with their constructive counterparts. Different crite-
ria emerge as relevant for selecting a move to take on a constructive step
versus one to take on a destructive step. Similarly, different criteria apply
according to whether moves are chosen within a feasible region or outside
a feasible region (and whether the search is moving toward or away from a
feasibility boundary).
The variation among moves and evaluations introduces an inherent vi-
tality into the search that provides one of the sources underlying the success
of strategic oscillation approaches. This reinforces the motivation to apply
strategic oscillation to the choice of moves and evaluation criteria themselves,
selecting moves from a pool of possibilities according to rules for transition-
ing from one choice to another. In general, instead of picking a single rule, a
process of invoking multiple rules provides a range of alternatives that run
all the way from "strong diversification" to "strong intensification."
This form of oscillation has much greater scope than may at first be
apparent, because it invokes the possibility of simultaneously integrating
decision rules and neighborhoods, rather than only visiting them in a strate-
gically determined sequence.
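One purely illustrative reading of such parametric integration is sketched
below (a minimal Python fragment under our own assumptions): each
candidate move is scored by a weighted blend of several choice rules, and the
weights oscillate between a diversifying profile and an intensifying profile:

    # Minimal sketch: integrating several choice rules by blending
    # their evaluations. The profiles and the triangle-wave schedule
    # are illustrative assumptions.

    def blended_evaluation(move, rules, weights):
        """Score a move as a weighted combination of rule evaluations."""
        return sum(w * rule(move) for rule, w in zip(rules, weights))

    def oscillating_weights(diversify_profile, intensify_profile,
                            period, iteration):
        """Interpolate between two weight profiles, sweeping from
        "strong diversification" to "strong intensification" and back."""
        t = (iteration % period) / period
        alpha = 2 * t if t <= 0.5 else 2 * (1 - t)  # triangle wave in [0, 1]
        return [(1 - alpha) * d + alpha * s
                for d, s in zip(diversify_profile, intensify_profile)]

Visiting the rules in a strategically determined sequence is the special case
in which each profile concentrates all of its weight on a single rule.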
Such concepts are beginning to find counterparts in investigations being
launched by the computer science community. The "agent" terminology is
being invoked in such applications to characterize different choice mecha-
nisms and neighborhoods as representing different agents. Relying on this
representation, different agents then are assigned to work on (or "attack")
the problem serially or in parallel. The CS community has begun to look
upon this as a significant innovation, unaware of the literature where such
ideas were introduced a decade or more ago, and the potential richness and
variation of these ideas still seems not to be fully recognized. For example,
there have not yet been any studies that consider the idea of "strategically
sequencing" rules and neighborhoods, let alone those that envision the no-
tion of parametric integration. The further incorporation of adaptive mem-
ory structures to enhance the application of such concepts also lies somewhat
outside the purview of most current CS proposals. At the same time, how-
ever, TS research has also neglected to conduct empirical investigations of
the broader possibilities. This is clearly an area that deserves fuller study.

6.5 Clustering and Conditional Analysis


To reinforce the theme of identifying opportunities for future research, we
provide an illustration to clarify the relevance of clustering and conditional
analysis, particularly as a basis for intensification and diversification
strategies in tabu search.
An Example: Suppose 40 elite solutions have been saved during the
search, and each solution is characterized as a vector x of zero-one variables
xj for j ∈ N = {1, ..., n}. Assume the variables that receive positive values
in at least one of the elite solutions are indexed x1 to x30. (Commonly
in such circumstances, n may be expected to be somewhat larger than the
number of positive valued variables; e.g., in this case, reasonable values may
be n = 100 or 1000.)
For simplicity, we restrict attention to a simple weighted measure of
consistency, given by the frequency with which the variables x1 to x30 receive
the value 1 in these elite solutions. (We temporarily disregard weightings
based on solution quality and other aspects of "strongly determined"
assignments.) Specifically, assume the frequency measures are as shown in
Table 14.

     Variables xj = 1     Number of Solutions
     x1 to x15                    24
     x16 to x20                   21
     x21 to x25                   17
     x26 to x30                   12

             Table 14: Frequency measures.

Since each of x1 to x15 receives a value of 1 in 24 of the 40 solutions, these
variables tie for giving "most frequent" assignments. An intensification
strategy that favors the inclusion of some number of such assignments would
give equal bias to introducing each of x1 to x15 at the value 1. (Such a bias
would typically be administered either by creating modified evaluations or
by incorporating probabilities based on such evaluations.)
To illustrate the relevance of clustering, suppose the collection of 40 elite
solutions can be partitioned into two subsets of 20 solutions each, whose
characteristics are summarized in Table 15.

     Subset 1 (20 solutions)                Subset 2 (20 solutions)
     Variables xj = 1   Solutions           Variables xj = 1   Solutions
     x11 to x15            20               x16 to x20            20
     x21 to x25            16               x6 to x10             16
     x1 to x5              12               x1 to x5              12
     x6 to x10              8               x26 to x30             8
     x26 to x30             4               x11 to x15             4
     x16 to x20             1               x21 to x25             1

          Table 15: Frequency measures for two subsets.

A very different picture now emerges. The variables x1 to x15 no longer
appear to deserve equal status as "most favored" variables. Treating them
with equal status may be a useful source of diversification, as opposed to
intensification, but the clustered data provide more useful information for
diversification concerns as well. In short, clustering gives a relevant
contextual basis for determining the variables (and combinations of variables)
that should be given special treatment.
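The bookkeeping behind Tables 14 and 15 is simple to automate. The
following sketch (a minimal Python fragment; the data layout and names are
illustrative assumptions) computes the overall and per-cluster frequency
measures from elite 0-1 solution vectors:

    # Minimal sketch: frequency measures for elite 0-1 solutions,
    # overall (Table 14) and per cluster (Table 15).
    from collections import Counter

    def value_one_frequencies(solutions):
        """Count, for each index j, how many solutions set xj = 1."""
        freq = Counter()
        for x in solutions:
            freq.update(j for j, v in enumerate(x) if v == 1)
        return freq

    def clustered_frequencies(solutions, labels):
        """Group solutions by cluster label and compute the frequency
        measures within each cluster."""
        clusters = {}
        for x, label in zip(solutions, labels):
            clusters.setdefault(label, []).append(x)
        return {label: value_one_frequencies(members)
                for label, members in clusters.items()}

An intensification strategy biased only by the overall counts treats x1 to
x15 alike; the per-cluster counts expose which of these assignments actually
occur together.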

6.5.1 Conditional Relationships

To go a step beyond the level of differentiation provided by cluster analysis,
it is useful to sharpen the focus by referring explicitly to interactions among
variables. Such interactions can often be identified in a very straightforward
way, and can form a basis for more effective clustering. In many types of
problems, the number of value assignments (or the number of "critical at-
tributes") needed to specify a solution is relatively small compared to the
total number of problem variables. (For example, in routing, distribution
and telecommunication applications, the number of links contained in fea-
sible constructions is typically a small fraction of those contained in the
underlying graph.) Using a 0-1 variable representation of possibilities, it
is not unreasonable in such cases to create a cross reference matrix, which
identifies variables (or coded attributes) that simultaneously receive a value
of 1 in a specific collection of elite solutions.
To illustrate, suppose the index set P = {1, ..., p} identifies the variables
xj that receive a value of 1 in at least r solutions from the collection of elite
solutions under consideration. (Apart from other strategic considerations,
the parameter r can also be used to control the size of p, since larger values
of r result in smaller values of p.)
Then create a p x p symmetric matrix M whose entries mij identify
the number of solutions in which xi and xj are both 1. (Thus, row Mi
of M represents the sum of the solution vectors in which xi = 1, restricted
to components xj for j ∈ P.) The value mii identifies the total number
of elite solutions in which xi = 1, and the value mij/mii represents the
"conditional probability" that xj = 1 in this subset of solutions. Because p
can be controlled to be of modest size, as by the choice of r and the number
of solutions admitted to the elite set, the matrix M is not generally highly
expensive to create or maintain.
By means of the conditional probability interpretation, the entries of
M give a basis for a variety of analyses and choice rules for incorporating
preferred attributes into new solutions. Once an assignment xj = 1 is made
in a solution currently under consideration (which may be either partly or
completely constructed), an updated conditional matrix M can be created
by restricting attention to elite solution vectors for which xj = 1. (Restricted
updates of this form can also be used for look-ahead purposes.) Weighted
versions of M, whose entries additionally reflect the quality of solutions in
which specific assignments occur, likewise can be used.
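A minimal sketch of these constructions (in Python, with illustrative names,
an unweighted M, and a restricted update that simply rebuilds the matrix on
the reduced solution set) may clarify the mechanics:

    # Minimal sketch: the cross reference matrix M and its conditional
    # interpretation, for elite solutions stored as 0-1 vectors.

    def build_P(solutions, r):
        """Indices of variables set to 1 in at least r elite solutions."""
        n = len(solutions[0])
        counts = [sum(x[j] for x in solutions) for j in range(n)]
        return [j for j in range(n) if counts[j] >= r]

    def build_M(solutions, P):
        """M[a][b] = number of solutions in which the a-th and b-th
        variables of P are both 1; M[a][a] counts the a-th alone."""
        return [[sum(x[i] * x[j] for x in solutions) for j in P] for i in P]

    def conditional_probability(M, a, b):
        """m_ab / m_aa: the fraction of solutions carrying the first
        assignment that also carry the second."""
        return M[a][b] / M[a][a] if M[a][a] else 0.0

    def restricted_solutions(solutions, j):
        """Restricted update: keep only elite solutions with xj = 1;
        an updated M is then rebuilt from this subset (also usable
        for look-ahead)."""
        return [x for x in solutions if x[j] == 1]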
Critical event memory provides a convenient mechanism to maintain
appropriate variation when conditional influences are taken into account.
The "critical solutions" associated with such memory in the present case
are simply those constituting a selected subset of elite solutions. Frequency
measures for value assignments can be obtained by summing these solution
vectors for problems with 0-1 representations, and the critical event control
mechanisms can then assure that assignments are chosen to generate
solutions that differ from those of previous elite solutions.
Conditional analysis, independent of such memory structures, can also
be a useful foundation for generating solution fragments to be exploited by
vocabulary building processes.

6.6 Referent-Domain Optimization


Referent-domain optimization is based on introducing one or more opti-
mization models to strategically restructure the problem or neighborhood,
accompanied by auxiliary heuristic or algorithmic processes to map the so-
lutions back to the original problem space. The optimization models are
characteristically devised to embody selected heuristic goals (e.g., of inten-
sification, diversification or both), within the context of particular classes of
problems.
There are several ways to control the problem environment as a basis
for applying referent-domain optimization. A natural control method is
to limit the structure and range of parameters that define a neighborhood
(or the rules used to navigate through a neighborhood), and to create an
optimization model that operates under these restricted conditions.
The examples that follow assume the approach starts from a current
trial solution, which may or may not be feasible. The steps described yield
a new solution, and then the step is repeated, using tabu search as a master
guiding strategy to avoid cycling, and to incorporate intensification and
diversification.
Example 1. A heuristic selects k variables to change values, holding other
variables constant. An exact method determines the (conditionally) optimal
new values of the k selected variables.
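For 0-1 problems this pattern admits a compact sketch (a minimal Python
fragment; brute-force enumeration of the 2^k completions stands in for a
more capable exact method, and the objective and variable-selection
heuristic are assumed to be supplied by the caller):

    # Minimal sketch of Example 1: hold all variables fixed except
    # those in free_indices and solve the residual subproblem exactly
    # (by enumeration, assuming k is small and we minimize).
    from itertools import product

    def referent_move(x, free_indices, objective):
        """Return the best solution reachable by changing only the
        selected variables, together with its objective value."""
        best, best_value = list(x), objective(x)
        for values in product((0, 1), repeat=len(free_indices)):
            trial = list(x)
            for j, v in zip(free_indices, values):
                trial[j] = v
            value = objective(trial)
            if value < best_value:
                best, best_value = trial, value
        return best, best_value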
Example 2. A heuristic identifies a set of restrictive bounds that bracket
the values of the variables in the current trial solution (where the bounds
may compel some variables to take on a single value). An exact method
determines an optimal solution to the problem as modified to include these
bounds.
Example 3. A heuristic selects a restructured and exploitable region
around the current solution to search for an alternative solution. An exact
method finds the best solution in this region.
Example 4. For add/drop neighborhoods, a heuristic chooses k elements
to add (or to drop). For example, the heuristic may operate by both adding
and dropping k specific elements, as in k-opt moves for the TSP or k-swap
moves for graph bipartitioning that add and drop k nodes. Then, attention
is restricted to consider only the subset of elements added or the subset
of elements dropped (and further restricted in the case of a bipartitioning
problem to just one of the two sets). Then an exact method identifies
the remaining k elements to drop (or to add), that will complete the move
optimally.
Example 5. A heuristic chooses a modified problem formulation that
also admits the current trial solution as a trial solution. (For example, the
heuristic may relax some part of the formulation and/or restrict another
part.) An exact method then finds an optimal solution to the modified
formulation. An illustration occurs where a two phase exact algorithm first
finds an optimal solution to a relaxed portion of the problem, and then
finds an optimal solution to a restricted portion. Finally, a small part of the
feasible region of the original problem close to or encompassing this latter
solution is identified, and an exact solution method finds an optimal solution
in this region.
Example 6. The use of specially constructed neighborhoods (and aggre-
gations or partitions of integer variables) permits the application of mixed
integer programming (MIP) models to identify the best options from all
moves of depth at most k (or from associated collections of at most k variables).
When k is sufficiently small, such MIP models can be quite tractable,
and produce moves considerably more powerful than those provided by lower
level heuristics.
Example 7. In problems with graph-related structures, the imposition of
directionality or non-looping conditions gives a basis for devising generalized
shortest path (or dynamic programming) models to generate moves that are
optimal over a significant subclass of possibilities. This type of approach
gives rise to a combinatorial leverage phenomenon, where a low order effort
(e.g., linear or quadratic) can yield solutions that dominate exponential
numbers of alternatives. (See, e.g., Glover (1992), Rego (1996a), Punnen
and Glover (1997).)
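A minimal sketch of this idea (in Python, with an illustrative encoding in
which candidate move components are arcs of an acyclic graph, each carrying
a gain) shows how a linear-time dynamic program selects the best of
exponentially many compound moves:

    # Minimal sketch: with non-looping (acyclic) structure imposed,
    # a dynamic program picks the max-gain source-to-sink path; each
    # path encodes one compound move. adj[v] = list of (w, gain) arcs;
    # order is a topological ordering of the nodes.

    def best_compound_move(adj, source, sink, order):
        gain = {v: float("-inf") for v in order}
        pred = {v: None for v in order}
        gain[source] = 0.0
        for v in order:                  # nodes in topological order
            if gain[v] == float("-inf"):
                continue                 # unreachable so far
            for w, g in adj.get(v, ()):
                if gain[v] + g > gain[w]:
                    gain[w], pred[w] = gain[v] + g, v
        path, v = [], sink               # recover the chosen move
        while v is not None:
            path.append(v)
            v = pred[v]
        return list(reversed(path)), gain[sink]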
Example 8. A broadly applicable control strategy, similar to that of
a relaxation procedure but more flexible, is to create a proxy model that
resembles the original problem of interest, and which is easier to solve. Such
an approach must be accompanied with a method to transform the solution
to the proxy model into a trial solution for the original problem. A version
of such an approach, which also induces special structure into the proxy
model, can be patterned after layered surrogate/Lagrangean decomposition
strategies for mixed integer optimization.
Referent-domain optimization can also be applied in conjunction with
target analysis to create more effective solution strategies. In this case, a
first stage learning model, based on controlled solution attempts, identifies a
set of desired properties of good solutions, together with target solutions (or
target regions) that embody these properties. Then a second stage model is
devised to generate neighborhoods and choice rules to take advantage of the
outcomes of the learning model. Useful strategic possibilities are created by
basing these two models on a proxy model for referent-domain optimization,
to structure the outcomes so that they may be treated by one of the control
methods indicated in the foregoing examples.

6.7 Final Considerations


It is natural to be tempted to implement the most rudimentary forms of
a method. More than a few papers on tabu search examine only a small
portion of the elements of short term memory, and examine little or nothing
at all of longer term memory. Unfortunately, in some cases these papers
also present themselves as embodying the essence of tabu search.
A factor that has reinforced the tendency to examine a limited part of
tabu search (aside from convenience, which can be sensible in early stages
of an investigation), is that such a focus has sometimes produced very ap-
pealing results. When reasonably decent outcomes can be found without
great effort, the motive to look further is diminished. The danger, of course,
lies in failing to discover significant gains that are likely to be achieved by a
more complete approach.
It is appropriate to acknowledge that attention may be given to a limited
subset of ideas from an overall search framework for the following reasons:

(1) such a focus may help to uncover a better form for the strategies
associated with this subset;

(2) weaknesses of this subset, when studied in isolation from other ideas,
may stand out more clearly, thus yielding insights into the features of a more
complete approach that are required to produce a better method;

(3) for methods which are susceptible to highly "modular" implementations,
as typically occurs for tabu search, simpler designs can readily be
made a part of more complex designs.

There remains a consideration that is often overlooked. As is true of
meta-heuristics in general, tabu search offers a framework for problem solving,
as opposed to a rigidly detailed collection of prescriptions about how
ing, as opposed to a rigidly detailed collection of prescriptions about how
this framework is best applied. Current research is disclosing instances of
this framework that yield remarkably effective outcomes, and is laying the
foundation to identify which components of tabu search will ultimately op-
erate to greatest advantage under various conditions. The reference to a
"more complete" form of tabu search is therefore partly ambiguous, be-
cause the framework includes more options than are appropriate to examine
in any single design, at least given the present stage of our understanding.
Nevertheless, there remains a substantial difference between the compre-
hensiveness of various tabu search procedures that appear in the literature.
Notably, the more comprehensive ones almost always perform appreciably
better than the others.
Finally, in the quest to develop advanced tabu search approaches, it is
worth noting there is a difference between "advanced" and "complex". An
advanced method may be relatively simple in its structure, and be easy for
others to use (as by a self-calibration system for setting its parameters).
The difference between rudimentary and advanced methods is analogous to
the difference between rudimentary and advanced search paths. A more
advanced path may represent a collection of links that are fabricated from
a richer set of components. But the path itself need not be complex. In
fact, the use of a richer set of components may be the key to allowing the
path to have a simple form. An advanced tabu search method may require
more effort to identify, yet may be easier to apply than methods which rest
on simpler foundations. An important challenge for research is to identify
the particular elements that combine to make an advanced method effective
and, at the same time, to determine the simplifications that become possible
by self-calibrating designs for interrelating these elements.
The fundamental message is that a great deal remains to be learned
about tabu search. Evidently, we also still know very little about how we
ourselves use memory in our problem solving. It is not inconceivable that
discoveries about effective uses of memory within our search methods will
provide clues about strategies that humans are adept at employing-or may
advantageously be taught to employ. The potential links between the areas
of heuristic search and psychology have scarcely been examined. Unques-
tionably, in the realm of optimization, we have not yet investigated the
strategic possibilities at a level that comes close to disclosing their full po-
tential. The numerous successes of tabu search implementations provide
encouragement that such issues are profitable to probe more fully.

References
[1] Ackley, D. (1987) "A Connectionist Model for Genetic Hillclimbing,"
Kluwer Academic Publishers, Dordrecht.

[2] Back, T., F. Hoffmeister and H. Schwefel (1991) "A Survey of Evolu-
tion Strategies," Proceedings of the Fourth International Conference on
Genetic Algorithms, R. Belew and L. Booker (eds.), pp. 2-9.

[3] Battiti, R. and G. Tecchiolli (1992) "Parallel Biased Search for Combinatorial
Optimization: Genetic Algorithms and Tabu Search," Microprocessors
and Microsystems, Vol. 16, pp. 351-367.

[4] Battiti, R. and G. Tecchiolli (1994a) "The Reactive Tabu Search,"


ORSA Journal on Computing, Vol. 6, No.2, pp. 126-140.

[5] Beyer, D. and R. Ogier (1991) "Tabu Learning: A Neural Network


Search Method for Solving Nonconvex Optimization Problems," Pro-
ceedings of the International Conference in Neural Networks, IEEE and
INNS, Singapore.

[6] Consiglio, A. and S.A. Zenios (1996) "Designing Portfolios of Financial


Products via Integrated Simulation and Optimization Models," Report
96-05, Department of Public and Business Administration, University of
Cyprus, Nicosia, CYPRUS, to appear in Operations Research.

[7] Cung, V-D., T. Mautor, P. Michelon, A. Tavares (1996) "Scatter Search


for the Quadratic Assignment Problem," Laboratoire PRiSM-CNRS
URA 1525.

[8] Davis, L. (1989) "Adapting Operator Probabilities in Genetic Algo-


rithms," Proceedings of the Third International Conference on Genetic
Algorithms, Morgan Kaufmann, San Mateo, CA, pp. 61-69.

[9] Eiben, A. E., P-E Raue and Zs. Ruttkay (1994) "Genetic Algorithms
with Multi-Parent Recombination," Proceedings of the Third Interna-
tional Conference on Parallel Problem Solving from Nature (PPSN),
Y. Davidor, H-P Schwefel and R. Manner (eds.), New York: Springer-
Verlag, pp. 78-87.

[10] Eshelman, L. J. and J. D. Schaffer (1992) "Real-Coded Genetic Algorithms
and Interval-Schemata," Technical Report, Philips Laboratories.

[11] Feo, T. and M. G. C. Resende (1989) "A Probabilistic Heuristic for a
Computationally Difficult Set Covering Problem," Operations Research
Letters, Vol. 8, pp. 67-71.

[12] Feo, T. and M. G. C. Resende (1995) "Greedy Randomized Adaptive


Search Procedures," Journal of Global Optimization, Vol. 2, pp. 1-27.

[13] Fleurent, C., F. Glover, P. Michelon and Z. Valli (1996) "A Scatter
Search Approach for Unconstrained Continuous Optimization," Proceedings
of the 1996 IEEE International Conference on Evolutionary
Computation, pp. 643-648.

[14] Freville, A. and G. Plateau (1986) "Heuristics and Reduction Methods


for Multiple Constraint 0-1 Linear Programming Problems," European
Journal of Operational Research, 24, 206-215.

[15] Freville, A. and G. Plateau (1993) "An Exact Search for the Solution
of the Surrogate Dual of the 0-1 Bidimensional Knapsack Problem,"
European Journal of Operational Research, 68, 413-421.

[16] Glover, F. (1963) "Parametric Combinations of Local Job Shop Rules,"


Chapter IV, ONR Research Memorandum no. 117, GSIA, Carnegie Mel-
lon University, Pittsburgh, PA.
[17] Glover, F. (1968) "Surrogate Constraints," Operations Research, 16,
741-749.
[18] Glover, F. (1975) "Surrogate Constraint Duality in Mathematical Pro-
gramming," Operations Research, 23, 434-451.
[19] Glover, F. (1977) "Heuristics for Integer Programming Using Surrogate
Constraints," Decision Sciences, Vol 8, No 1, 156-166.
[20] Glover, F. (1989) "Tabu Search - Part I," ORSA Journal on Comput-
ing, Vol. 1, pp. 190-206.

[21] Glover, F. (1992) "Ejection Chains, Reference Structures and Alter-


nating Path Methods for Traveling Salesman Problems," University of
Colorado. Shortened version published in Discrete Applied Mathematics,
1996, 65, 223-253.
[22] Glover, F. (1994a) "Genetic Algorithms and Scatter Search: Unsus-
pected Potentials," Statistics and Computing, 4, 131-140.

[23] Glover, F. (1995a) "Scatter Search and Star-Paths: Beyond the Genetic
Metaphor," OR Spektrum, Vol. 17, pp. 125-137.
[24] Glover, F. (1997) "A Template for Scatter Search and Path Relinking,"
to appear in Lecture Notes in Computer Science, J.K. Hao, E. Lutton,
E. Ronald, M. Schoenauer, D. Snyers (Eds.).
[25] Glover, F. and H. Greenberg (1989) "New Approaches for Heuristic
Search: A Bilateral Linkage with Artificial Intelligence," European Jour-
nal of Operational Research, Vol. 39, No.2, pp. 119-130.

[26] Glover, F., J. P. Kelly and M. Laguna (1996) "New Advances and
Applications of Combining Simulation and Optimization," Proceedings
of the 1996 Winter Simulation Conference, J. M. Charnes, D. J. Morrice,
D. T. Brunner, and J. J. Swain (Eds.), 144-152.

[27] Glover, F. and G. Kochenberger (1996) "Critical Event Tabu Search for
Multidimensional Knapsack Problems," Meta-Heuristics: Theory and
Applications, I. H. Osman and J. P. Kelly (eds.), Kluwer Academic Pub-
lishers, pp. 407-427.

[28] Glover, F., G. Kochenberger and B. Alidaee (1997) "Adaptive Memory


Tabu Search for Binary Quadratic Programs," to appear in Management
Science.

[29] Glover, F. and M. Laguna (1997a) Tabu Search, Kluwer Academic Pub-
lishers.

[30] Glover, F. and M. Laguna (1997b) "Properties of Optimal Solutions to


the Min k-Tree Problem," Graduate School of Business, University of
Colorado at Boulder.

[31] Glover, F., J. M. Mulvey and D. Bai (1996) "Improved Approaches to


Optimization Via Integrative Population Analysis," Graduate School of
Business, University of Colorado at Boulder.

[32] Greenberg, H. J. and W. P. Pierskalla (1970) "Surrogate Mathematical
Programs," Operations Research, 18, 924-939.

[33] Greenberg, H. J. and W. P. Pierskalla (1973) "Quasi-conjugate Functions
and Surrogate Duality," Cahiers du Centre d'Etudes de Recherche
Operationnelle, Vol. 15, pp. 437-448.

[34] Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. Uni-
versity of Michigan Press, Ann Arbor, MI.

[35] Hong, I., A. B. Kahng and B-R Moon (1997) "Improved Large-Step
Markov Chain Variants for Symmetric TSP," to appear in Journal of
Heuristics.

[36] Johnson, D. S. (1990) "Local Optimization and the Traveling Salesman


Problem," Proc. 17th Intl. Colloquium on Automata, Languages and
Programming, pp. 446-460.

[37] Karwan, M.H. and R.L. Rardin (1976) "Surrogate Dual Multiplier
Search Procedures in Integer Programming," School of Industrial Systems
Engineering, Report Series No. J-77-13, Georgia Institute of Technology.

[38] Karwan, M.H. and R.L. Rardin (1979) "Some Relationships Between
Lagrangean and Surrogate Duality in Integer Programming," Mathematical
Programming, 17, 230-334.

[39] Kelly, J., B. Rangaswamy and J. Xu (1996) "A Scatter Search-Based


Learning Algorithm for Neural Network Training," Journal of Heuris-
tics, Vol. 2, pp. 129-146.

[40] Kraay, D. and P. Harker (1996) "Case-Based Reasoning for Repetitive


Combinatorial Optimization Problems, Part I: Framework," Journal of
Heuristics, Vol. 2, No.1, pp. 55-86.

[41] Kraay, D. and P. Harker (1997) "Case-Based Reasoning for Repetitive


Combinatorial Optimization Problems, Part II: Numerical Results," to
appear in Journal of Heuristics.

[42] Laguna, M. (1997) "Optimizing Complex Systems with OptQuest,"
Research Report, University of Colorado.

[43] Laguna, M., T. Feo and H. Elrod (1994) "A Greedy Randomized
Adaptive Search Procedure for the 2-Partition Problem," Operations
Research, Vol. 42, No.4, pp. 677-687.

[44] Laguna, M., J. P. Kelly, J. L. Gonzalez-Velarde and F. Glover (1995)
"Tabu Search for the Multilevel Generalized Assignment Problem," European
Journal of Operational Research, Vol. 82, pp. 176-189.

[45] Laguna, M. and R. Marti (1997) "GRASP and Path Relinking for
2-Layer Straight Line Crossing Minimization," Research Report, University
of Colorado.

[46] Laguna, M., R. Marti and V. Campos (1997) "Tabu Search with Path
Relinking for the Linear Ordering Problem," Research Report, University
of Colorado.

[47] Laguna M., R. Marti and V. Valls (1995) "Arc Crossing Minimization
in Hierarchical Digraphs with Tabu Search," to appear in Computers
and Operations Research.

[48] Lokketangen, A., K. Jornsten and S. Storoy (1994) "Tabu Search within
a Pivot and Complement Framework," International Transactions in
Operational Research, Vol. 1, No. 3, pp. 305-316.

[49] Lokketangen, A. and F. Glover (1996) "Probabilistic Move Selection


in Tabu Search for 0/1 Mixed Integer Programming Problems," Meta-
Heuristics: Theory and Applications, I. H. Osman and J. P. Kelly (eds.),
Kluwer Academic Publishers, pp. 467-488.
[50] Lokketangen, A. and F. Glover (1997b) "Surrogate Constraint Analysis
- New Heuristics and Learning Schemes for Satisfiability Problems,"
Proceedings of the DIMACS Workshop on Satisfiability Problems: Theory
and Applications, D-Z. Du, J. Gu and P. Pardalos (eds.).
[51] Lourenco, H. R. and M. Zwijnenburg (1996) "Combining the Large-Step
Optimization with Tabu Search: Application to the Job Shop Scheduling
Problem," Meta-Heuristics: Theory and Applications, I. H. Osman and
J. P. Kelly (eds.), Kluwer Academic Publishers, pp. 219-236.
[52] Martin, O., S. W. Otto and E. W. Felten (1991) "Large-Step Markov
Chains for the Traveling Salesman Problem," Complex Systems, Vol. 5,
No. 3, pp. 299-326.

[53] Martin, O., S. W. Otto and E. W. Felten (1992) "Large-Step Markov
Chains for TSP Incorporating Local Search Heuristics," Operations Research
Letters, Vol. 11, No. 4, pp. 219-224.

[54] Michalewicz, Z. and C. Janikow (1991) "Genetic Algorithms for Nu-


merical Optimization," Statistics and Computing, Vol. 1, pp. 75-91.
[55] Mühlenbein, H., M. Gorges-Schleuter and O. Kramer (1988)
"Evolution Algorithms in Combinatorial Optimization," Parallel Computing,
Vol. 7, pp. 65-88.

[56] Mühlenbein, H. and D. Schlierkamp-Voosen (1994) "The Science of
Breeding and its Application to the Breeder Genetic Algorithm," Evolutionary
Computation, Vol. 1, pp. 335-360.

[57] Mühlenbein, H. and H-M Voigt (1996) "Gene Pool Recombination in
Genetic Algorithms," Meta-Heuristics: Theory and Applications, I. H.
Osman and J. P. Kelly (eds.), Kluwer Academic Publishers, pp. 53-62.
[58] Mulvey, J. (1995) "Generating Scenarios for the Towers Perrin Invest-
ment Systems," to appear in Interfaces.

[59] Nonobe, K. and T. Ibaraki (1997) "A Tabu Search Approach to the
CSP (Constraint Satisfaction Problem) as a General Problem Solver,"
to appear in European Journal of Operational Research.

[60] Punnen, A. P. and F. Glover (1997) "Ejection Chains with Combina-


torial Leverage for the Traveling Salesman Problem," Graduate School
of Business, University of Colorado at Boulder.

[61] Rana, S. and D. Whitley (1997) "Bit Representations with a Twist,"
Proc. 7th International Conference on Genetic Algorithms, T. Baeck
(ed.), pp. 188-196, Morgan Kaufmann.

[62] Rego, C. (1996a) "Relaxed Tours and Path Ejections for the Traveling
Salesman Problem," to appear in European Journal of Operational
Research.

[63] Rochat, Y. and E. D. Taillard (1995) "Probabilistic diversification and


intensification in local search for vehicle routing". Journal of Heuristics,
1, 147-167.

[64] Spears, W.M. and K.A. De Jong (1991) "On the Virtues of Uniform
Crossover," 4th International Conference on Genetic Algorithms, La
Jolla, CA.

[65] Taillard, E. D. (1996) "A heuristic column generation method for the
heterogeneous VRP", Publication CRT-96-03, Centre de recherche sur
les transports, Universite de Montreal. To appear in RAIRO-OR.

[66] Trafalis, T. and I. Al-Harkan (1995) "A Continuous Scatter Search Ap-
proach for Global Optimization," Extended Abstract in: Conference in
Applied Mathematical Programming and Modeling (APMOD'95), Lon-
don, UK, 1995.

[67] Ulder, N. L. J., E. Pesch, P. J. M. van Laarhoven, H. J. Bandelt and E.


H. L. Aarts (1991) "Genetic Local Search Algorithm for the Traveling
Salesman Problem," Parallel Problem Solving from Nature, R. Maenner
and H. P. Schwefel (eds.), Springer-Verlag, Berlin, pp. 109-116.

[68] De Werra, D. and A. Hertz (1989) "Tabu Search Techniques: A Tutorial
and Applications to Neural Networks," OR Spectrum, Vol. 11, pp.
131-141.

[69] Whitley, D., V. S. Gordon and K. Mathias (1994) "Lamarckian Evo-


lution, the Baldwin Effect and Function Optimization," Proceedings of
the Parallel Problem Solving from Nature, Vol. 3, New York: Springer-
Verlag, pp. 6-15.

[70] Wright, A. H. (1990) "Genetic Algorithms for Real Parameter Opti-


mization," Foundations of Genetic Algorithms, G. Rawlins (ed.), Mor-
gan Kaufmann, Los Altos, CA, pp. 205-218.

[71] Yamada, T. and C. Reeves (1997) "Permutation Flowshop Scheduling


by Genetic Local Search," 2nd lEE/IEEE Int. Conf. on Genetic Algo-
rithms in Engineering Systems (GALESIA '97), pp. 232-238, Glasgow,
UK.

[72] Yamada, T. and R. Nakano (1996) "Scheduling by Genetic Local Search


with Multi-Step Crossover," 4th International Conference on Parallel
Problem Solving from Nature, 960-969.

[73] Zenios, S. (1996) "Dynamic Financial Modeling and Optimizing the


Design of Financial Products," Presented at the National INFORMS
Meeting, Washington, D.C.
Author Index
Abernathy, W., 643, 668 Amar, G., 644, 677
Abramson, D., 234-235, 284 Amos, D., 646, 690
Achlioptas, D., 138, 140 Anbil, R., 644-645, 677-678
Action, F., 646, 689 Andersen, E.D., 212, 282
Adams, W.P., 179, 187-188, 236, Andros, P., 647, 694
282, 479-484, 498-499, 517, Androulakis, 1.P., 46-48, 50, 67,
524-527 530-532 71-72
Adams, W.W., 542, 569 Aneja, Y.P., 642, 646, 652, 665,
Adjiman, C.S., 1, 46-48, 50, 67, 687,714
71-72 Angel, R., 649, 691
Adler, 1., 209, 212, 217, 282, 292 Anily, S., 641, 647, 649, 659, 691
Agarwal, Y., 647, 691 Anthony Brooke, 51, 73
Aggarwal, A., 8, 56, 60, 72-73 Apkarian, P., 166, 184
Agin, N., 647, 691 Appelgren, K., 649, 692
Agrawal, V., 86, 142 Appelgren, L., 649, 691
Ahrens, J.H., 355, 418 Arabeyre, J.P., 644, 677
Ahuja, H., 643, 668
Arani, T., 647, 687
Ahuja, N.K., 215, 217, 221, 233,
Arisawa, S., 649, 692
282
Armour, G.C., 654, 714
Aittoniemi, L, 388, 418
Arora, S., 116, 140-141
Akinc, V., 652, 714
Arpin, D., 649, 652, 704, 729
AI-Khayyal, F.A., 41, 72, 524-525,
532 Asano, T., 116, 141
Alao, N., 654, 714 Asirelli, P., 79, 141
Albers, S., 641, 659 Assad, A., 647-648, 649, 692, 700,
Albracht, J.J., 645, 686 705
Ali, D., 641, 663 Assad, A.A., 647-649, 694, 701
Alimonti, P., 122, 140 Atkins, R.J., 652, 714
Alizadeh, F., 151, 179, 182, 276- Atkinson, D.S., 258, 283
277,282 Aubin, J., 646, 687
Allen, L.A., 654-655, 722 Aurenhammer, F., 174, 183
Almond, M., 646, 687 Ausiello, G., 79, 90, 117, 141
Altinkemer, K., 647, 691 Aust, RJ., 646, 687
Altman, S., 644, 668 Averabakh, 1., 647, 692
Alvernathy, W.J., 644, 673 Avis, D., 174, 183, 310, 418, 641,
Amado, L., 645, 682 659

747
748 Author Index

Bahn, 0., 258,283 Baumol, W.J., 652,715


Bailey, J., 643, 668 Baybars, I., 645, 684
Bailey, J.E., 643, 675 Bayer D., 558, 570
Baker, B.M., 647, 692 Bayer, D.A., 212, 283
Baker, E. 641, 644, 647-649, 659, Bazaraa, M.S., 217, 283
677, 692, 701 Beale, E.M.L., 8, 30, 72
Baker, E.K., 644, 678 Beasley, J., 647-648, 696, 698, 712
Baker, K.R, 643, 652, 668-669, 714 Beasley, J.E., 641-642, 642, 647,
Balakrishnan, A., 649, 695 660, 665, 693
Balakrishnan, P.V., 653, 714 Beaumont, N., 39, 72
Balanski, M., 742 Bechtold, S., 643, 669-670
Balanski, M.L., 642, 665 Beckmann, M., 652, 654, 715, 728
Balas, E., 39, 72, 246, 283, 311- Beguin, H., 652-653, 735
313, 321, 324-325, 338-340, Bell, T., 654, 715
342, 418-419, 480, 482-483, Bell, W., 647, 650, 693
492,494,517-518,525,527, Belletti, R, 644, 678
641,653,659-660,714,742 Bellman, R, 647, 654, 693, 715
Balinksi, M., 647, 649, 692 Bellman, RE., 336, 388,419,460,
Ball, M., 647, 649, 692, 698 474
Ball, M.O., 644, 647-648, 653, 678, Bellmore, M., 641-642, 647, 660,
694, 714 665,693
Ballou, R, 652, 714 Beltrami, E., 647, 693
Baloff, N., 643-644, 668, 673 Beltrami, E.J., 644, 668
Barber, G.M., 654, 730 Benders, J.F., 9, 72
Barcelo, J., 652, 714 Benichou, M., 33, 72
Barcia, P., 366, 419 Bennett, B., 649, 693
Barham, A.M., 646, 687 Bennett, B.T., 643, 670
Barkhi, R, 653, 737 Bennett, H.S., 648, 650, 695
Barnes, E.R, 217, 283 Bennett, V.L., 653, 655, 715
Barnhart, C., 644, 678 Benson, H.P., 156, 176, 182, 183
Barr, RS., 742 Benson, S.J., 280, 283
Bart, P., 653, 739 Benvensite, R., 641, 660
Bartholdi, J., 647, 650, 692 Berge, C., 641-642, 660, 665
Bartholdi, J.J., 643, 669 Bergman, L., 653, 658, 740
Bartlett, T., 649, 693 Berlin, G.N., 653, 715
Barton, R., 649, 693 Berman, L., 649-650, 694
Bastos, F., 220-223, 293 Berman, 0., 647, 692
Batta, R, 653, 654, 658, 715 Berry, W.L., 643, 672
Battiti, R, 77, 133-136, 141 Bertsimas, D., 654, 714
A uthor Index 749

Bertsimas, D.J., 647, 694 Brown, G.B., 647, 649-650, 695


Bhattacharya, B.K., 174, 183 Brown, J.R., 642, 665
Bielli, M., 645, 683 Brown, L.A., 653, 656, 726
Bilde, 0., 652, 715 Brown, P.A., 652, 716
Billera, L.J., 536-537, 562, 567, 570 Browne, J., 643, 672, 676
Billionnet, A., 527 Browne, J.J., 643, 670
Bindschedler, A.E., 654, 715 Brownell, W.S., 643, 670
Blair, C.E., 84, 141, 550, 570 Brusco, M., 643, 669
Bland, R.G., 742 Buchberger, B., 542, 570
Blattner, W.O., 430, 458, 475 Buckley, P., 654, 733
Blum, M., 452, 474 Buffa, E.S., 644, 654, 670, 714
Board, J., 648, 712 Bulfin, R.L, 388, 419
Bodin, D.L., 644, 678-679 Bulloch, B., 653, 655, 721
Bodin, L., 647-650, 692-694, 701, Burbeck, S., 644, 681
710 Burdet, C.A., 742
Bodin, L.D., 643-644, 670, 678 Burkard, R.E., 309, 392, 419
Bodner, R., 647, 694 Burns, R.N., 643, 668, 670-671
Boehm, M., 83, 142 Buro, M., 83, 142
Bomberault, A., 649, 713 Burstall, R.M., 652, 716
Booler, J.M., 644, 679 Bushel, G., 649, 699
Borchers, B., 8, 30, 33, 50, 73, 234- Bushnell, M., 86, 142
235,247,257,279-280,283- Butt, S., 647, 695
284, 291-292 Byrn, J.E., 644, 673
Boros, E., 482, 527 Byrne, J .L., 643, 671
Borret, J.M.J., 645, 679
Boufkhad, Y., 140, 143 CPLEX, 526, 528
Bouliane, J., 654, 715 Cabot, A.V., 652, 654, 716, 723
Bowman, E.H., 645, 684 Cabot, V., 183
Boyd, S., 277, 296 Cain, T., 648, 712
Bramel, J., 647, 649, 695 Camerini, P., 743
Branco, M.I., 645, 682 Camerini, P.M., 525, 527
Brandeau, M., 654, 716 Caprara, A., 309, 419
Brelaz, D., 642, 665 Caprera, D., 648, 707
Breu, R., 742 Captivo, M.E., 645, 682
Brill, E.D., 653, 731 Carraresi, P., 180, 183, 644, 679
Broder, A., 140, 142 Carstensen, P.J., 473, 474
Broder, S., 646, 687 Carter, M., 643, 668
Bronemann, D.R., 644, 679 Carter, M.W., 643, 646-647, 670,
Brown, E.L., 499, 524, 530 687
750 Author Index

Casanovas, J., 652, 714 Chhajad, D., 652, 716


Case, K.E., 653, 742 Chien, T.W., 649,695
Cassell, E., 647, 694 Chilali, M., 166, 184
Cassidy, P.J., 648, 650, 695 Chiu, S., 654, 716
Cattrysse, D., 646, 685 Cho, D.C., 652, 717
Cattrysse, J., 646, 685 Choi, G., 517, 525-526, 530
Caudle, W., 649, 691 Choi, Y., 648, 710
Cavalier, T.M., 647, 695 Chopra, S., 642, 665
Ceder, A., 644-645, 649-650, 679, Chrissis, J.W., 653, 717
683-684, 695 Christof, T., 246, 284
Ceria, S., 246, 283, 483, 492, 494, Christofides, N., 403, 419, 641, 652,
527 647-649,660,696,698,717,
Cerveny, R.P., 652, 655, 716 743
Chaiken, J., 643, 671 Church, J.G., 643, 671
Chaiken, J.M., 654, 716 Church,R., 654, 715, 737
Chakradar, S., 86, 142 Church, R.L., 652-655, 658, 715,
Chan, A.W., 654, 716 717-718,741
Chan, P.W., 646, 690 Chvatal, V., 310, 362, 419
Chan, T.J., 641, 660 Chv:atal, V., 138-139, 142, 310,
Chandra Sekaran, R., 652, 714 362,419,641-642,660,666
Chandra, A.K., 373, 408, 419 Ciftan, E., 646, 690
Chandrasekaran, R., 430, 474, 652, Ciric, A.R., 8, 73
654, 716, 732 Clarke, G., 647, 696
Chang, G.J., 642, 665 Clarke, M.R.B., 308, 421
Chao, M.-T., 140, 142 Cochran, H., 648, 712
Chard, R., 647, 695 Cockayne, E.J., 642, 666
Charnes, A., 649, 653, 693, 695, Coffman, E.G., 641, 647, 660, 695
716 Cohen, B., 129, 147
Chaudhry, S.S., 652-653, 732 Cohen, E., 472, 475
Chaudry, S.S., 641, 654, 660, 716 Cohen, J., 653-654, 737
Chelst, K., 643, 671 Cohon, J.L., 652, 654-655,718-719
Chen, C.K., 652-653, 727 Cole, R., 454, 475
Chen, D., 643, 671 Collins, R., 647, 650, 692
Chen, J., 92, 94, 96-97, 142 Conforti, M., 179, 183, 641, 661
Chen, J.M., 654, 656, 724 Conti, P., 542, 545-546, 570
Cherici, A., 645, 683 Conway, R.W., 654, 718
Cheriyan, J., 85, 142 Cook, S.A., 79, 138, 142
Cheshire, I.M., 647, 695 Cook, W., 178, 183, 550, 570
Chevalley, C., 204, 284 Cooper, L., 652, 654, 718
Author Index 751

Cormen, T.H., 419, 434, 465, 475 Dantazig, G.W., 644, 671
Corneil, D.G., 641, 661 Dantzig, G., 647, 699
Corneil, D.G., 642, 666 Dantzig, G.B., 208, 216, 284., 311,
Cornuejols, G., 179, 183,246,283, 320,419,430,458,475
483, 492, 494, 527, 569- Darby-Dowman, K., 644-645, 679,
570, 641, 652, 658, 661, 682
718-719 Daskin, M., 653-655, , 658, 719-
Cosagrove, M.J., 644, 670 721
Cox, D., 542, 570 Daskin, M.S., 647-648, 652, 705,
Crabil, T.B., 643, 669 733
Craig, C.S., 654, 724 Daughety, A., 652, 716
Crama, Y., 482, 527, 641, 661 Davani, A., 644, 678
Crandall, H., 647, 700 Davis, M., 81, 143, 200, 284
Crawford, J.L., 647, 650, 697 Davis, P.S., 652, 720
Crescenzi, P., 79, 90, 117-118, 122, Davis, R.P., 653, 717
141, 143 Day, R.H., 646, 688
Criss, E., 654, 656, 724 de Loera, J., 568, 570
Crowder, H., 302, 419, 480, 515, de Santis, M., 79, 141
525,528 de Silva, A., 234-235, 284
Crowder, H.P., 744 de Simone, C., 202, 254, 284
Csima, J., 646, 688 DeWerra, D., 646, 688
Cullen, D., 647, 691 Deane, R.H., 643, 674
Cullen, F.H., 647, 649, 697 Dearing, P.M., 652-654, 658, 720
Culver, W.D., 643, 674 Dee, N., 653, 720
Cunningham, W.H., 85,142 Deighton, D., 653, 720
Cunto, E., 647, 650, 697 Dembo, R.S., 331, 387, 408, 419
Current, J., 652, 653, 655, 717-719 Demers, A., 744
Dempster, M.A.H., 646, 688
d'Atri, A., 117, 141 Deo, N., 318, 427
d'Atri, G., 352, 418 Derochers, M., 648, 704
Daganzo, C., 647, 649, 697 Desrochers, M., 520, 522, 528, 648,
Dahl, R., , 647, 649, 701 697
Dakin, M., 654, 732 Desroches, S., 646, 689
Dalberto, L., 647, 650, 693 Desrosiers, J., 133, 145, 647-649,
Dallwig, S., 46-47, 50, 72 697-698, 711
Dannenbring, D., 403, 425 Deza, M., 245, 284
Dannenbring, D.G., 652, 727 Di Biase, F., 568, 571
Danninger, G., 180, 183 Diaconis, P., 549, 571
Dantazig, G.G., 647, 649, 697 Dial, R., 644, 678-679
752 A utbor Index

Diehl, M., 202, 254, 284 Eberhart, R., 652, 655, 718
Dietrich, R., 654, 656, 724 Edmonds, J., 167, 184, 245, 285,
Dietrich, B.L., 303, 352, 420 642,666
Difi"e, W., 305, 420 Edwards, G.R., 644, 679
Dikin, 1.1., 208, 217, 285 Efroymson, M., 652, 721
Dinkelbach, W., 430, 437, 473, 475 Egan, J.F., 649, 703
Dolan, J.M., 653, 715 Eilon, S., 643, 647, 671, 696, 698
Doll, L.L., 647, 698 Eiselt, H.A., 654, 721
Dormont, P., 643, 671 Eisemann, K., 652, 721
Dreyfus, S.E., 642, 666 EI-Azm, A., 648, 698
Drezner, Z., 196, 294, 652, 720 EI-Bakry, A.S., 226, 235, 285
Driscoll, P.J., 517, 531 EI-Darzi, E., 641, 661
Dror, M., 647-648, 649, 651, 698, EI-Halwagi, M., 42-44, 75
711 EI-Shaieb, A.M., 652, 721
Drysdale, J.K., 652, 655, 720 Elkihel, M., 359, 427
Du Merle, 0., 258, 283 Ellwein, L.B., 652, 721
Du, D., 137, 143 Elmaghraby, S.E., 649, 692
Du, D.-Z., 86, 144, 170-171, 183- Elson, D.G., 652, 721
184, 198, 213, 217, 285, Elzinga, D.J., 653-654, 735, 737
642,667 Elzinga, J., 654, 721
DuPuis, D., 643, 645, 673, 681 Emmons, H., 643, 671-673
Dubois, 0., 140, 143 Erkut, E., 173, 184, 653, 654, 721-
Dudzinski, 309,318,324,373,374, 722
376, 399, 420 Erlenkotter, D., 652-654, 722, 740
Duguid, R., 644, 678 Ervolina, T.R., 464, 475
Dulac, G., 649, 698 Escalante, 0.,648, 706
Dumas, Y., 648-649, 697-698 Escudero, L.F., 303, 352, 418, 420
Dunford, N., 156, 184 Etcheberry, J., 641, 661
Duran, M.A., 9,44,73 Etezadi, T., 647, 698
Durant, P.A., 645, 684 Eurkip, N., 654, 656, 728
Dutton, R., 652, 655, 720 Eusebio, R., 645, 682
Dyer, M.E., 372-373, 420, 652, 720, Evans, S.R., 647-648, 650, 698, 707
743 Even, S., 646, 688
Dzielinski, B.P., 646, 685 Evers, G.H.E., 644, 679
Eyster, J.W., 654, 722
Eagles, T., 652, 655, 718 Ezell, C., 641, 663
Easton, F.F., 643, 671
Eaton, D., 653, 655, 721 Falk, J.E., 72
Eaton, D.J., 653, 655, 715, 717 Falkner, J.C., 641, 644, 664, 679
Author Index 753

Falkson, L., 653, 735 Floudas, C., 281, 285


Falkson, L.M., 653, 740 Floudas, C.A., 1, 8-9, 46-48, 50-
Farinaccio, F., 180, 183 52, 60, 67, 71-75
Farvolden, J.M., 649, 699 Floyd, R.W., 452, 474
Favati, P., 178, 184 Flynn, J., 653, 655, 722
Fayard, D., 309, 321, 323, 342-343, Ford, L.R., 217, 285, 460, 475
420-421 Forgues, P., 649, 698
Fearnley, J., 644, 677 Foster, B.A., 645, 647, 683, 700
Federgruen, A., 641, 647, 649, 659, Foster, D.P., 653, 722
691, 699 Foulds, L., 648, 712
Feige, U., 116, 143 Foulds, L.R., 648-649, 700, 713
Feldman, I., 397, 422 Fowler, R.J., 641, 661
Feldmann, E., 652, 722 Fox, B., 430, 475
Felts, W., 648, 703 Fox, G.W., 641, 662
Feo, T.A., 130, 147 Frances, M.A., 643, 672
Ferebee, J., 647, 699 Francis, R.L., 175-176, 187, 652-
Ferguson, A., 647, 699 654, 716, 720-723, 739
Ferland, J., 649, 698, 711 Franco, J., 83, 137, 139-140, 142-
Ferland, J.A., 646, 687 144
Ferland, P.C., 646, 688 Frank, A., 184
Ferreira, C.E., 305, 372, 407, 421 Frank, H., 654, 723
Fiacco, A.V., 209, 285 Frank, R.S., 646, 688
Field, J., 643, 668 Fratta, L., 525, 527, 743
Finke, G., 355, 418 Frederickson, G., 647, 700
Finnegan, W.F., 644, 678 Fredman, M.L., 453, 475
Fischetti, M., 400, 409, 421 Freedman, B.A., 217, 295
Fisher, M., 644, 647, 650, 677, 693 Freeman, D.R., 645, 685
Fisher, M.L., 366, 421, 525, 528, Freville, A., 308, 421
641,646-647,649-650,652, Frey, K., 644, 678
655, 661, 685, 699, 718- Friesen, D., 92, 94, 96-97, 142
719,743 Frieze, A., 140, 142-143
Fisk, J.C., 403, 405, 410, 423 Frieze, A.M., 308, 421, 652, 720
Fitzpatrick, D.W., 743 Fujisawa, K., 279, 285
Fitzsimmons, J.A., 654-656, 722, Fujishige, S., 167, 184, 474, 476
728 Fujiwara, 0., 653, 655, 723
Fletcher, A., 647, 699 Fukushima, M., 474,476
Fletcher, R., 9, 23, 73 Fulkerson, D.R., 217, 285, 460, 475,
Fleuren, H.A., 647-649, 699 641, 646, 662, 688
Florian, M., 649, 699 Fullerton, H.V., 649, 707
754 Author Index

Fulton, W., 542, 571 Ghare, P., 648, 712


Ghare, P.M., 654, 734
Gaballa, A., 643, 672 Gheysens, F., 647, 700-701
Gafinkel, RS., 654, 717 Ghosh, A., 654, 724
Gahinet, P., 166, 184 Giannessi, F., 149, 152, 176, 184
Gallaire, H., 79, 143 Gibson, D.F., 652, 716
Gallo, G., 308, 421, 644, 679 Gilbert, E.N., 642, 666
Gans, O.B., de., 646, 688 Gillett, B.E., 647, 701
Garey, M.R, 408, 421, 744 Gilmore, P.C., 305, 388, 397,422
Garfinkel, R, 641, 662 Gilmour, P., 648, 702
Garfinkel, RS., 528, 642, 646, 652- Girlich, E., 167, 185
653, 658, 689, 723, 727, Girodet, P., 33, 72
743 Glassey, C.R, 646, 689
Garvin, W., 647, 700 Gleason, J., 654, 724
Gaskell, T., 647, 700 Gleiberman, L., 649, 703
Gaudioso, M., 647, 700 Glover, F., 46, 73, 132, 134, 143,
Gauthier, J.M., 33, 72 319, 382, 422, 643, 672,
Gavett, J.W., 654, 724 742
Gavish, B., 308, 422, 647,691, 700, Glover, G., 480, 528
744 Glover, R, 643, 672
Gazis, D., 649, 693 Glover, f., 744
Gehrlein, W.V., 645, 686 Goemans, M., 442, 475
Gel'fand, I.M., 537, 562-563, 570- Goemans, M.X., 85, 116, 143-144
571 Goemans, Michel X., 276-277, 286
Gelatt Jr., C.D., 132, 145 Goerdt, A., 115, 144
Gelders, L.F., 652, 656, 724, 732- Goffin, J.L., 258, 283, 286, 744
733 Goldberg, A., 464, 477
Gelman, E., 645, 677 Goldberg, A.V., 342, 422, 460, 464,
Gendreau, M., 647, 700 475-476
Gens, G., 352, 422 Goldberg, J., 654, 656, 724
Gent, I.P., 129, 143 Golden, B., 647-649, 692, 698, 700,
Gentzler, G.L., 643, 672 705, 711-712
Geoffrion, A.M., 8-9, 73, 526, 528, Golden, B.L., 647-649, 650, 651,
652, 724, 744 701-702
George, A., 223, 286 Goldern, L.B., 647-649, 694
Gerards, A.M., 178, 183 Goldestein, J.M., 654, 723
Gerards, A.M.H., 550, 570 Goldman, A.J., 652-653, 724-725
Gerbract, R, 644, 680 Goldmann, M" 473, 476
Gershkoff, I., 644, 648, 680, 700 Goldstein, A.S., 221, 223, 288
Author Index 755

Golub, G.H., 221, 286 Gurevich, Y., 648, 700


Gomory, RE., 245, 286, 305, 388, Gusfield, D., 472-473, 476
397, 422, 646, 685 Gutjahr, A.L., 645, 685
Gondzio, J., 212, 258, 282, 286
Goodchild, M., 653, 725 Ha'cijan, L.G., 186
Goodwin, T., 647, 701 Hackman, S.T., 645,685
Gordon, L., 648, 702 Hagberg, B., 643, 672
Gosselin, K, 646, 689 Hager, W.W., 164, 185
Gotleib, G.C., 646, 688 Haghani, A.E., 653, 719
Graham, B., 642, 666 Haimovich, M., 648, 702
Graham, RL., 744 Hakimi, L., 653, 731
Granot, F., 178, 185 Hakimi, S., 652-653, 658, 725
Graver, J.E., 550, 553, 571 Hakimi, S.L., 176, 185., 642, 652-
Graves, G.W., 647, 649-650, 695 654, 666, 727
Gray, P., 652, 721 Halfin, S., 652, 725
Grayson, D., 540, 571 Hall, A., 646, 689
Green, L., 643, 672 Hall, RW., 649, 702
Greenberg, H., 397, 422 Hallman, A.B., 645, 684
Greenberg, J., 644, 678 Halpern, J., 652-653, 725
Greenfield, A., 647, 650, 693 Hamburger, N., 652, 729
Greenfield, A.J., 647, 650, 699 Hammer, P., 641, 661
Griffin, J.H., 648, 706 Hammer, P.L., 85, 144, 182, 186,
Grimes, J., 646, 689 308, 324, 331, 387, 408,
Grossmann, I.E., 6, 9, 18, 20-21, 419, 421, 422, 482, 527,
30, 32-34, 37-39, 44, 51, 641-642, 652, 662, 726
60, 70, 73, 75-76 Hanan, M., 642,666
Gr:otschel, M., 167, 185, 245-248, Handler, G.Y., 175, 185, 652, 654,
286, 305, 421 726
Gruta, KK, 653, 655, 723 Hansen, P., 84-85, 120, 129, 132-
Gu, J., 79, 83, 86, 119, 137, 143- 133, 144, 166, 185, 652-
144, 147, 198, 285 654,726
Gu, Q.-P., 86, 144 Hanssman, F., 644, 682
Guerin, G., 649, 699 Hanthikumar, J.G., 179, 186
Guha, D., 643, 672 Hara, S., 277, 290
Guignard, M., 652, 725 Harjunkoski, I., 44, 75
Guisewite, G.M., 213, 287 Harlety, H.O., 649, 651, 706
Gumaer, R, 649, 693 Harrison, T.P., 744
Gunawardane, G., 653, 725 Hartley, T., 644, 680
Gupta, O.K, 8, 30, 32-33, 50, 73 Hasegawa, T., 654, 731
756 A uthor Index

Hashizume, S., 474, 476 Hochbaum, D.S., 168, 179, 186, 309,
Hastad, J., 116, 145 422
Hatay, L., 644, 678 Hochst:attler, W., 568, 571
Hauer, E., 648, 702 Hodgson, M.J., 654, 726
Haurie, A., 258, 286 Hofer, J.P., 648, 651, 703
Hausman, W., 648, 702 Hoffman, A.J., 162, 186, 641, 662
Hayer, M., 568, 571 Hoffman, K.L., 246, 287, 305, 422,
Hearn, D., 654, 721 480, 525, 528
Hearn, D.W., 642, 666 Hoffmann, T.R., 645, 685
Heath, L.S., 646, 689 Hoffstadt, J., 645, 680
Heath, M., 223, 286 Hogan, K., 653-654, 656,658, 719,
Hecht, M., 647, 700 726, 735
Held, M., 647, 702, 744 Hogg, J., 654, 656, 726
Helgason, R.V., 217, 290 Holland, 0., 245, 247, 286
Hellman, M.E., 305, 420 Holliday, A., 648, 713
Holloran, T.J., 644, 673
Helmberg, C., 277-280, 287
Holmberg, K., 9, 26, 73
Helmer, M.C., 654, 656, 737
Holmes, J., 653, 656, 726
Henderson, W., 644, 680
Holmes, R.A., 648, 702
Henderson, W.B., 643, 672
Hong, S., 647, 693
Hendrick, T.E., 654, 656, 734
Hooker, J.N., 79, 84-85, 144, 652-
Hentges, G., 33, 72
653, 727
Hering, R., 648, 712
Hoover, E.M., 652, 727
Hersh, M., 644, 676 Hopmans, A.C.M., 654, 656, 727
Hershey, J., 643, 668 Hormozi, A.M., 652, 727
Hershy, J.C., 644, 673 Horowitz, E., 313, 333, 337, 376,
Hertz, A., 646-647, 689, 700 423
Hervillard, R., 644, 680 Horst, R., 156, 176, 182, 186
Heurgon, E., 644, 680 Hosten, S., 545, 558, 568-569, 571
Hinman, G., 652, 655, 720 Houck, D.J., 642, 666
Hirata, T., 116, 141 Hougland, E., 648, 712
Hiriart-Urruty, J.-B., 180, 185 Housos, E., 196, 288
Hirsch, W.M., 162, 186 Howard, S.M., 645, 680
Hirschberg, D.S., 373, 394, 408, 419, Howell, J.P., 643, 673
422 Hsu, W.L., 654, 727
Hitchcock, F.L., 216, 287 Huang, C., 196, 288
Hitchings, G.F., 654, 726 Hung, M.S., 403, 405, 410, 423
Ho, A., 641, 659, 662 Hung, R., 643, 673
Hochbaum, D., 641, 662 Hurter, A.P., 654, 727
Author Index 757

Hurter, Jr., A.P., 652, 741 Jeroslow, R.G., 84, 141, 362, 423,
Hurtur, A.P., 654, 736 480, 528, 550, 570, 744
Huxlery, S.J., 644, 676 John,D.G., 641, 662
Hwang, F.K., 171, 183, 186, 641- John, J., 647, 700
642, 662, 666 Johnson, D., 641, 662
Hyman, W., 648, 702 Johnson, D.G., 648, 706
Johnson, D.S., 91-92, 94, 136, 145,
Ibaraki, S., 654, 731 198, 288,408, 421, 744
Ibaraki, T., 168, 186, 318, 336, 346, Johnson, E., 644, 678
423,430-431,473-474,476, Johnson, E.L., 302, 324, 365, 372,
478, 641, 661 419,422-423,480,515,525,
Ibarra, O.H., 346-347, 397, 423 528-529,641,652,662,717
Ignall, E., 643, 673 Johnson, F.L., 641, 662
Ignall, E.J., 645, 685 Johnson, J., 647, 701
Ignizio, J.P., 652, 737 Johnson, J.L., 86, 145
Igo, W., 648, 709 Johnson, R.V., 645, 685-686
Ingargiola, G.P., 331-332, 388, 408, Johnson, T.A., 236, 282, 498-499,
423 524,526
Ishii, H., 430, 476 Johnston, N.D., 119, 146
Itai, A., 646, 688 Jones, R.D., 644, 680
Iwano, K., 474, 476 Joshi, A., 221, 223, 288
Joy, S., 279, 283
J:ornsten, 366, 419 Jucker, J.V., 645, 685
Jacobs, F.R., 644, 674 Judice, J., 220-223, 293
Jacobs, L.W., 643, 669 Junger, M., 202, 245, 248, 254, 284-
Jacobsen, S.K., 649, 651, 702 286,288
Jaikumar, R., 647, 650, 693, 699
Jameson, S., 648, 706 Kabbani, N.M., 645,681
Janaro, R.E., 643, 670 Kalantari, B., 186
Jannssen, C., 652, 658, 729 Kaliski, J.A., 221, 288
Jansen, B., 270, 272, 296 Kalstorin, T.D., 653, 727
Jansma, G., 653, 655, 721 Kamath, A., 140, 145
Jarvinen, P., 653, 727 Kamath, A.P., 79, 85, 145, 198,
Jarvis, J.J., 217, 283, 647, 649, 697 269, 272, 276, 288-289
Jaumard, B., 84, 120, 129, 132- Kamiyama, A., 643, 677
133, 144-145, 166, 185 Kannan, R., 394, 423
Jaw, J., 648-649, 651, 702 Kapoor, A., 179, 183
Jayaraman, V., 653, 737 Karisch, S.E., 277, 279, 280, 297
Jena, S., 654, 657, 739 Kariv, 0., 652-654, 727
758 Author Index

Karloff, H., 116, 145 Kirkpatrick, S., 132, 138, 145-146


Karmarkar, N., 212, 217, 282 Kirousis, L.M., 138-140, 146
Karmarkar, N.K, 79, 85, 145, 196, Klasskin, P.M., 644, 673
198, 201, 208, 212, 225, Klee, V., 174, 186, 745
237,259,261-262,269,272, Kleine Buening, H., 83, 142
276,281,288-289,292,294 Klingman, D., 742
Karp, RM., 431, 460, 476, 647, Knauer, B.A., 646, 689
702, 744-745 Knight, KW., 648, 651, 703
Karwan, M.H., 745 Knuth, D.E., 355, 423
Katoh, N., 168, 186, 474, 476 Kochenberger, G.A., 641, 662
Kautz, H., 126, 129, 133, 147 Kocis, G.R, 6, 9, 18, 37, 73
Kayal, N., 372-372, 420 Kojima, M., 277, 279, 285, 290
Keaveny, I.T., 644, 681 Kolen, A., 648, 652-653, 703, 728
Kedia, P., 641, 661 Kolesar, P., 643, 653, 656, 672-
Keedia, P., 647, 650, 693 673,728
Keeney, RL., 654, 727 Kolesar, P.J., 332, 424
Keith, E.G., 644, 673 Kolner, T.K, 644, 681
Kelley, J.E., 34, 73 Koncal, R.D., 641, 664
Kendrick, D., 51, 73 Koop, G.J., 643, 670, 673
Kennington, J.L., 217,290 Koopmans, T.e., 652, 728
Kernighan, B., 647, 705 Korman, S., 641, 660
Kervahut, T., 648, 708 Korsh, J.F., 331-332,388,408,423
Khachiyan, L.G., 203, 208, 290 Koskosidis, Y.A., 648, 703
Khalil, T.M., 643, 672 Kostreva, M.M., 480, 529
Khanal, M., 653, 719 Koutsopoulos, H.N., 645, 681
Khanna, S., 122, 145 Koutsoupias, E., 125, 146
Khoury, B.N., 642, 666-667 Kovalev, M., 167, 185
Khumawala, B.M., 652-653, 714, Kowalik, J.S., 318, 427
727-728 Kozlov, M.K., 186
Kieft, S., 305, 421 Kraemer, S.A., 654, 730
Kilbridge, M.D., 645, 686 Krajewski, L.J., 643-644, 673, 675-
Killion, D.L., 645, 681 676
Kim, C., 647, 700 Kramer, R.L., 652, 728
Kim, C.E., 346-347, 397, 423 Kranakis, E., 138-140, 146
Kimes, S.E., 654, 656, 728 Krarup, J., 652, 715, 728
Kindervater, G.A.P., 314, 423 Krass, D., 310, 424
King, V., 464, 476 Kraus, A., 652, 658, 729
Kirby, RF., 648, 703 Kravanja, Z., 44, 75
Kirca, O., 654, 656, 728 Krizanc, D., 138, 140
Krishnamurthy, R.S., 524-525, 532 Lawler, Eugene, 217, 290
Krishnamurty, N.N., 653, 715 Lawrence, R.M., 654, 729
Krispenz, C., 305, 421 LePrince, M., 645, 681
Krizanc, D., 138-139, 146 Leamer, E.E., 654, 729
Krolak, P., 648, 703 Lee, E.K., 290
Kruskal, J.B., 216, 290 Lee, J., 653, 725
Kuby, M., 654, 733 Lee, K., 654, 733
Kuehn, A., 652, 729 Lee, Y., 493, 498, 506, 517, 525,
Kuenne, R.E., 652, 729 531-532
Kuhn, H.W., 652, 729 Leeaver, R.A., 652, 716
Kuik, R., 646, 685 Lehrer, F., 652, 722
Kullmann, O., 83, 146 Leigh, W., 641, 663
Kunnathur, A.S., 646, 689 Leighton, F., 642, 667
Kursh, S., 647, 650, 694 Leiserson, C.E., 419, 434, 465, 475
Kwak, N.K., 654, 656, 737 Lemarechal, C., 180, 185
Kydes, A., 644,679 Lemke, C.E., 641-642, 663
Lenstra, J.K., 314, 423, 647-648,
LaBella, A., 645, 683 697, 704-705, 745
LaPorte, G., 646-649, 652, 689, 699, Lessard, R., 643, 645, 673, 681,
700, 703-704, 729 683
Labbe, M., 175-176, 186 Lester, J.T., 647, 650, 699
Labbe, M., 648, 652-653, 703, 726 Leudtke, L.K, 645, 681
Laderman, J.L., 649, 703 Levary, R., 648, 705
Laffey, T.J., 79, 146 Levenberg, K., 290
Lagarias, 212, 289 Levesque, H., 119, 129, 138, 146-
Lagarias, J.C., 212, 282, 310, 424 147
Laird, P., 119, 146 Levin, A., 649, 705
Lam, T., 648, 703 Levner, E., 352, 422
Laporte, G., 305, 424, 520, 522, Levy, J., 652, 729
528 Levy, L., 647-649, 698, 701, 705
Larsen, C., 182, 186 Leyffer, S., 9, 23, 73
Larson, R.C., 654, 729 Li, C., 643, 674
Lasdon, L.S., 646, 686 Li, Chung-Lun, 648, 705
Lasky, J.S., 641, 664 Li, Y., 237, 239, 290-294
Lassiter, J.B., 522, 527 Liebman, J., 648, 705
Laurent, M., 245, 284 Liebman, J.C., 652-653, 715, 720,
Lavoie, S., 644, 681 731, 735
Lawler, E.L., 32, 74, 430, 458, 476, Liepins, G.E., 646, 689
641, 662, 745 Lin, C.C., 652, 729
Lin, F.L., 653, 714 Maffioli, F., 525, 527, 743
Lin, S., 647, 705 Magazine, M.J., 643, 645, 668-669,
Linder, R.W., 644, 674 685
Lions, J., 647, 689 Magnanti, T.L., 481, 529, 647-648,
Little, J., 542, 570 650, 702, 705
Liu, L., 196, 288 Magnanti, Thomas L., 215, 217,
Logemann, G., 81, 143 221, 233, 282
Lotti, V., 646, 687 Maheshwari, S.N., 652, 725
Loucks, J.S., 644, 674 Mahjoub, A.R., 641, 661
Loustaunau, P., 542, 569 Maier, S.F., 652, 730
Louveaux, F.V., 652, 729 Maier-Rothe, C., 643, 674
Lovasz, L., 167, 178, 185, 187,246, Makjamroen, T., 653, 655, 723
277, 286, 290, 483, 526, Malandradi, C., 647-648, 653, 719,
529, 641, 663 705
Love, R., 654, 729-730 Maier, S.F., 652, 730
Love, R.F., 654, 741 Maier-Rothe, C., 643, 674
Loveland, D., 81, 143 Malleson, A.M., 647, 695
Loveland, D.W., 85, 146 Malucelli, F., 180, 183
Lowe, J.K., 84, 141, 480, 528 Mangelsdorf, K.R., 643, 652, 657,
Lowe, T.J., 652-654, 716, 721-723, 676,738
739 Manne, A.S., 646, 652, 686, 730
Lowerre, J.M., 643, 670, 674 Mannila, H., 117, 146
Luce, B.J., 644, 670 Mannur, N.R, 654, 658, 715
Lucena Filho, A.P., 648, 705 Maranas, C.D., 46-47, 50, 72
Luckhardt, H., 83, 146 Maranzana, F., 652, 730
Lueker, G.S., 305, 394, 424, 641, Marchetti-Spaccamela, A., 342,422
660 Marianov, V., 654, 731
Luna, J.C., 643, 676 Markham, I.S., 648, 712
Luna, J.S., 652, 657, 738 Markland, R.E., 647, 652, 658, 689,
Lund, C., 116, 140 733
Luo, Z.Q., 258, 286 Marks, D., 652-653, 731, 735
Lustig, I.J., 208, 210, 253, 290 Marquardt, D., 291
Marquez Diez-Canedo, J., 648, 706
Mabert, V.A., 643, 674, 676 Marsten, R.E., 208, 210, 253, 309,
MacKinnon, 654, 730 425, 641, 643, 645, 663,
Mack, R., 647, 650, 693 676, 681-682, 744, 746
Maculan, N., 642, 667 Martelli, A., 79, 141
Madsen, 0., 649, 651, 702 Martello, S., 309, 314, 318, 323,
Maes, J., 646, 685
343,345,347-348,352,359, 477
361-362,382,384,387,391, Mehrez, A., 653, 731
394, 396-400, 403-404, 406, Mehrotra, S., 208, 212, 221, 225,
408-410, 415, 418, 420-421, 291
424-425 Mehta, N.K, 642, 647, 667, 690
Martin, A., 305, 372, 407, 421 Meketon, M.S., 217, 296
Martin, KR., 480, 529 Melzak, Z.A., 642, 666
Martin-Lof, A., 649, 706 Mercure, H., 648, 703-704
Masuyama, S., 654, 731 Mertens, W., 645, 681
Mathews, G.B., 305,425 Meszaros, C., 212, 282
Mathon, V., 166, 185 Meyer, P.D., 653, 731
Mathur, K, 647, 691 Meyer, R.R., 480, 529
Mavrides, L.P., 652-653, 731 Michaud, P., 641, 663
Mawengkang, H., 9, 36, 74 Mikhailow, G.W., 8, 30, 32, 50, 74
Maxwell, W.L., 654, 718 Miller, A.J., 645, 684
Mazzola, J.B., 480, 527 Miller, D.M., 653, 717
McAdams, A.K., 652, 658, 729, 731 Miller, H.E., 643, 675
McBryde, R., 481, 528, 652, 724 Miller, L.R., 647, 701
McCloskey, J.F., 644, 682 Miller, M.H., 649, 695
McCormick, G.P., 41,74,209,285 Millham, C.B., 652, 655, 720
McCormick, S., 654, 716 Minas, J.G., 648, 706
McCormick, S.T., 464,474-475,477, Mine, H., 430, 476
641,660 Mingozzi, A., 403, 419, 647, 696
McDonald, J.J., 648-649, 651,703, Minieka, E., 648, 652-653, 658, 706,
706 732
McGinnis, L.F., 643, 654, 674, 722 Minker, J., 79, 143
McGrath, D., 644, 675 Minoux, M., 644, 681-682
McHose, A.H., 654, 731 Minton, S., 119, 146
McKay, M.D., 649, 651, 706 Mirchandani, P.B., 175-176, 185,
McKeown, P.G., 746 Mirchandani, P.B., 175-176, 185,
McKenzie, J.P., 643, 674 Misono, S., 474, 476
McKenzie, P., 644, 673 Mitchell, D., 119, 129, 138, 146-
McKeown, P.G., 646, 689, 745 147
McKnew, M., 653, 736 Mitchell, D.G., 138, 142
McMillan, C., 643, 672 Mitchell, E., 189
Meadows, M.E., 653, 717 Mitchell, J.E., 8, 30, 33, 50, 73,
Megiddo, N., 653, 731 202,234-235,247,250,253,
Megeath, J.D., 644, 675 256-258,279,283-284,290-
Megiddo, N., 431, 448, 472, 475, 292,294
Mitchell, M., 644, 678 Nair, R.P.K., 652, 654, 714, 732
Mitchell, R., 645, 682 Nakata, K., 279, 285
Mitra, G., 641, 644-645, 661, 679, Nambiar, J.M., 652, 656, 732-733
682,745 Narula, S.C., 653, 658, 733
Mitten, L.G., 648, 706 Natraj, N., 246, 283
Mitwasi, M.G., 654, 656, 724 Natraj, N.R., 549, 568, 572
Mizrach, M., 646, 689 Nauss, R.M., 305, 365, 373, 425,
Mole, R., 648, 706 647, 652, 658, 689, 733
Mole, R.H., 654, 732 Nau, D.S., 646, 690
Monroe, G., 643, 675 Nawijn, W.M., 646, 690
Monteiro, R.D.C., 209, 212, 292 Neebe, A., 403, 425, 652, 727
Moon, D., 654, 716 Neebe, A.W., 653, 733
Moon, I.D., 641, 652-653, 660, 732 Nelson, J., 648, 703
Moondra, S.L., 643, 675 Nelson, M.D., 648, 706
Moore, G., 654, 732 Nemhauser, G., 649, 706
Moore, J.M., 654, 715 Nemhauser, G.L., 179, 187, 245,
Moore, R.E., 42, 74 292, 318, 388, 426, 480,
Mora, T., 558, 571 507, 528-529, 641-642, 645-
More, J.J., 265, 292 646, 652, 654, 658, 662-
Morin, T.L., 309, 425, 746 663,665-667,685,688-689,
Morris, J.G., 643, 654, 675, 729- 718-719, 727, 743, 745
730 Nemirovsky, A.S., 277, 292
Morrish, A.R., 644, 675 Nesterov, Y.E., 277, 292
Morrison, I., 558, 570 Nemhauser, G.L., 642, 743
Moser, P.L., 645, 680 Neumaier, A., 42, 46-47, 50, 72,
Motwani, R., 116, 122, 127, 140, 74, 180, 187
145-146 Neumann, S., 173, 184
Mukundan, S., 654, 732 Newton, R., 649, 707
Muller, M.R., 645, 681 Ng, S.M., 641, 659
Müller-Merbach, H., 324, 425 Nguyen, H.Q., 647-648, 650, 702
Mulvey, J.M., 646, 690 Nguyen, T.A., 79, 146
Munro, J.L, 338, 425 Niccolucci, F., 152, 184
Murtagh, B.A., 9, 36, 74 Nicolas, J.M., 79, 143
Murty, K., 641, 663 Niederer, M., 644, 682
Mutzel, P., 202, 254, 284 Nobert, Y., 648-649, 652, 704, 729
Mycielski, J., 654, 732 Nobili, P., 85, 146
Myers, D.C., 525, 531 Noebe, A.W., 652, 658, 723
Noemi, P., 641, 663
Naccache, P.F., 647, 695 Noonan, R, 649, 691
Norback, J.P., 647-648, 650, 698, Padberg, M.W., 245, 247, 253, 292,
707 302, 313, 365, 372, 419,
Norman, R.Z., 642, 667 423, 426, , 480, 515, 525,
Northup, W.D., 743 529, 641-642, 659-660
Norton, C.Haibt, 472, 477 Paessens, H., 648, 707
Nygard, KE., 648, 706 Pai, R., 201, 292
Paixao, J., 220-223, 293, 643, 675
O'Connor, A.R., 644, 675 Paixao, J.P., 645, 682
O'Kelly, M., 653, 655, 719 Paixao, J., 641, 660
O'Neil, K.K., 645, 684 Palem, K., 140, 145
O'Shea, D., 542, 570 Palermo, F.P., 654, 733
Oakford, R.V., 642, 667 Paletta, G., 647, 700
Odier, E., 644, 681 Panconesi, A., 118, 143
Odlyzko, A.M., 310, 424 Pantelides, C.C., 45, 75
Papadimitriou, C.H., 84, 112, 114,
Odoni, A.R., 645, 654, 681, 732
118, 125-127, 130, 146-147,
Odoni, H., 648-649, 651, 702
318, 426, 465, 477
Ogbu, U.I., 653, 658, 733
Pappalardo, M., 180, 183
Ogryczak, W., 652, 730
Pappas, LA., 643, 675
Oley, L.A., 480, 529
Pardalos, P., 281, 285
Olson, C.A., 648, 707
Pardalos, P.M., 130-131, 137, 143,
Ono, T., 116, 141
147, 151, 156, 159-160, 164,
Oppenheim, R., 641, 662
170-171,176,179,182,184,
Orlin, J.B., 643, 669 187, 189, 196, 198, 213,
Orlin, James B., 215, 217, 221, 233, 217, 235, 237, 239, 272,
282 277, 285, 287, 290, 293-
Orloff, C., 648, 705, 707 294,431,473-474, 477, 498,
Orloff, C.S., 653, 733 530, 642, 666-667
Orponen, P., 117, 146 Parker, M.E., 645, 682
Osleeb, J.P., 654, 733 Parker, R.G., 234, 293, 388, 419,
Ostrovsky, G.M., 8, 30, 32, 50, 74 648,702
Ostrovsky, M.G., 8, 30, 32, 50, 74 Patel, N., 653, 656, 733
Overton, M., 526, 529 Paterson, M.S., 641, 661
Owen, J., 745 Pato, M., 643, 675
Ozkarahan, I., 643, 675 Pato, M.V., 645, 682
Patterson, J.H., 645, 686
Padberg, M., 246, 287, 305, 422, Patty, B., 645, 677
480,525,528,641-642,652, Patty, B.W., 645, 681
663, 717, 745 Paules, G.E., 8, 51, 74
Paull, M., 139, 143 Plotkin, S.A., 308, 427, 472, 477
Paz, L., 654, 724 Plyter, N.V., 654, 724
Pearce, W., 643, 672 Poljak, S., 151, 179, 187, 277-279,
Pecora, D., 79, 146 287
Pederzoli, G., 654, 721 Pollack, M., 649, 707
Peeters, D., 175-176, 186, 652-653, Pollak, H.O., 642, 666
735 Polopolus, L., 652, 734
Pekny, J.F., 498, 530 Ponder, R., 644, 678
Peled, U.N., 324, 422, 641, 662- Pörn, R., 44, 75
663 Portugal, L., 220-223, 293
Pengilly, P.J., 654, 729 Potts, R.B., 643, 648, 670-671, 703
Perkins, W.A., 79, 146 Potvin, J-Y., 648, 708
Perl, J., 652, 733 Powell, W.B., 648, 703
Peters, D., 652-653, 726, 729 Pratt, V., 452, 474
Peterson, F.R., 649, 707 Prawda, J., 643, 677
Pettersson, F., 34, 44, 51, 76 Price, E., 644, 675
Pferschy, U., 309, 392, 419, 426 Price, W.L., 654, 656, 734
Philips, A.B., 119, 146 Prim, R.C., 216, 293
Phillippe, D., 643, 676 Pritsker, A.A., 654, 734
Phillips, A.T., 431, 473-474, 477 Protasi, M., 77, 79, 90, 117, 122,
Picard, J.C., 654, 734 135-136, 141
Piccione, C., 645, 683 Prutzman, P., 647, 650, 693
Pierce, J., 652, 725 Pruzan, P.M., 652, 728
Pierce, J.F., 641, 646, 649, 664, Psaraftis, H., 648-649, 651, 702
686,707 Psaraftis, H.N., 648-649, 708
Pierskalla, W.P., 643, 675 Puech, C., 352, 418
Pintelow, L.M., 652, 724 Pullen, H.G.M., 648, 708
Pirkul, H., 308, 422, 653-654, 734 Purdom, P.W., 83, 137, 144
Pisinger, D., 299, 309, 314, 328, Puri, R., 79, 144, 147
330,334-337,339-340,343- Putnam, H., 81, 143, 200, 284
345, 352, 355, 370, 372,
375-377, 384,386-387, 389- Quandt, R., 647, 649, 692
391, 398-399, 403, 406-408, Quesada, I., 30, 32-33, 75
412, 419, 426-427
Pitsoulis, L.S., 130-131, 147 Rabin, M.O., 642, 667
Plane, D.R., 654, 656, 734 Radzik, T., 429, 430-431, 464-465,
Plateau, G., 308-309, 321, 323, 342- 473,477
343, 359, 427, 420-421 Raedels, A., 643, 674
Platzman, L., 647, 650, 692 Raft, O.M., 648, 709
Raghavachari, M., 187 Reed, B., 138, 142
Raghavan, P., 127, 146 Reggia, J.A., 646, 690
Ragsdale, C.T., 746 Reid, R.A., 643, 652, 657, 676, 738
Raimond, J.F., 746 Reinelt, G., 202, 245-246, 248, 254,
Rajala, J., 653, 727 284, 286, 288
Ramachandran, B., 498,530 Rendl, F., 151, 179, 187, 277-280,
Ramakrishnan, K.G., 79, 85, 145, 287, 297
196, 198, 225, 235, 237, Resende, M.G.C., 130-131, 147, 189,
259, 262, 269, 272, 288- 196, 198, 212, 217, 221-
289, 293-294, 498, 530 222, 225, 235, 237, 239,
Raman, R., 39, 75 259, 262, 269, 272, 277,
Ramana, M., 277, 294 282, 288-289, 293-295, 498,
Ramaswamy, S., 258, 294 530
Ramirez, R.J., 338, 425 726,732
Ramser, J.H., 647, 649, 697 Revelle, C.S., 652-654, 731
Rand, G.K., 654, 734 Ribiere, G., 33, 72
Randolph, W.D., 654, 721 Ribiere, G., 33, 72
Rannou, B., 644, 683 Richard, D., 652-653, 735
Rao, A., 653, 734 Richards, D.S., 642, 666
Rao, M.R., 430, 458, 475, 649, 652, Rinaldi, G., 202, 245, 247, 253-
658, 709, 717, 723 Rinaldi, G., 202, 245, 247, 253-
Rao, S., 464, 476 254, 284, 480, 529
Rao, S.P., 201, 292 Rinnooy Kan, A., 648-649,703,711
Rappoport, S.S., 644, 668 Rinnooy Kan, A.H.G., 647-648, 704-
Rardin, R., 745 705,745
Rardin, R.L., 234, 293 Rinnooy Kan, H.G., 648, 702
Rath, G.J., 643, 675 Ritzman, L.P., 643-644, 673, 675-
Ratick, S., 653, 655, 722 676
Ratick, S.J., 653-654, 733-734 Rivest, R.L., 419, 434, 452, 465,
Ratliff, H.D., 641-643, 647, 649, 474
653-654,660,665,669,697, Robbiano, L., 558, 571
723, 734 Roberts, A., 644, 678
Ratschek, H., 42, 75 Roberts, K.L., 653, 717
Ravindran, R., 30, 32-33, 50, 73 Robertson, W.C., 648, 709
Ray, T., 652, 721-722 Robinson, D.F., 649, 700
Ray, T.L., 652, 720 Robinson, E.P., 643, 674
ReVelle, C., 652-654, 658, 734-735, Robinson, J.A., 80, 147
737,740 Rockafellar, R.T., 161, 187
Read, E.G., 649, 700 Roes, A.W., 645, 679
Rogers, J.D., 652, 740 Saatcioglu, 653, 736
Rojeski, P., 653, 735 Saedt, A.H.P., 652, 656, 736
Rokne, J., 42, 75 Safra, S., 116, 141
Romig, W., 649, 694 Saha, J.L., 649, 709
Ronen, D., 648-650, 695 Sahinidis, N.V., 41, 75
Roodman, G.M., 654, 736 Sahinoglou, H.D., 164, 185
Roos, C., 208, 270, 272, 295-296 Sahni, S., 313, 333, 337, 376, 423
Rosen, J.B., 156, 176, 182, 185, Salazar, A., 642, 667
187 Salkin, H.M., 641, 647, 663-664,
Rosenfield, 644, 679 691
Rosenwein, M.B., 649-650, 699 Salomon, M., 646, 685
Rosing, K.E., 654, 736 Savelsbergh, M.W.P., 648, 710
Ross, G., 654, 730 Salveson, M.E., 645, 686
Ross, G.T., 654, 736 Salzborn, F., 648-649, 651, 709-
Rossin, D.F., 643, 671 710
Roth, R., 641, 664 Samuelsson, H.M., 653, 658, 733
Rothblum, U.G., 186 Sandiford, P.J., 652, 655, 720
Rothstein, M., 643, 676 Sarkissian, R., 258, 286
Rousseau, J.-M., 645, 681 Sassano, A., 85, 146, 641, 661
Rousseau, J-M., 648, 708 Saunders, M.A., 36, 74
Rousseau, J., 649, 711 Sauve, M., 648, 697
Rousseau, J.-M., 644-645, 683 Savelsbergh, M.M.P, 648, 697
Rousseau, J.M., 643, 673 Saydam, C., 653, 736
Roussos, I.M., 164, 185 Scarf, H.E., 550, 571
Roy, B., 641, 664 Schaefer, M.R., 654, 736
Roy, S., 646, 688 Schaeffer, M.K, 654, 727
Roy, T.J.van, 305, 427 Schaffer, J., 648, 692
Rubin, J., 644, 683 Schaible, S., 430-431, 473-474, 477-
Rudeanu, S., 182, 186 478
Ruhe, G., 217, 295 Scheffi, Y., 645, 684
Rushton, G., 654, 736 Schilling, D., 653-654, 734
Russell, R., 647, 649, 709 Schilling, D.A., 653-654, 719, 737
Ryan, D.M., 641, 645, 647, 664, Schmeichel, E., 652, 725
679, 683, 700 Schneider, J.B., 654, 737
Rydell, P.C., 653, 736 Schneiderjans, M.J., 654, 656, 737
Ryoo, H.S., 41, 75 Schoepfle, G.K., 644, 668
Ryzhkov, A.P., 641, 664 Schrage, L., 648, 710
Schreuder, J.A.M., 653, 656, 737
Sa, G., 652, 736 Schrijver, A., 167, 178-179, 183,
185, 187, 246, 286, 483, Shmoys, D.S., 745
526, 529, 535, 537, 550, Shor, P.W., 647, 695
553, 566,570, 571 Showalter, M., 643, 669-670
Schultz, H., 648, 710 Showalter, M.J., 643-644, 675-676
Schuster, A.D., 649, 694 Shreve, W.E., 648, 706
Schwartz, J.T., 156, 184 Shriver, R.H., 652, 714
Schwarz, L.B., 654, 736 Simchi-Levi, D., 647-649, 695, 699,
Schweiger, C.A., 1, 52, 71, 75 705
Schweitzer, P., 647, 651, 700 Simeone, B., 85, 144, 308, 421, 641,
Scott, A.J., 652, 737 662, 663
Scott, D., 645, 683 Simmons, D., 653, 655, 721
Scott, K.L., 646, 686-687 Simmons, D.M., 654, 737-738
Scudder, G.D., 641, 646, 662, 686 Simpson, R., 649, 710
Segal, M., 643, 676 Sinclair, G.B., 647, 650, 697
Selman, B., 119, 126, 129, 133, 138, Sinerro, J., 653, 727
146-147 Singer, I., 178, 188
Serra, D., 654, 734 Singhal, J., 746
Sethi, S.P., 310, 424 Sinha, A., 305, 365-366, 373, 380,
Sexton, T., 648-649, 694, 710 427
Shamir, A., 646, 688 Sinha, L.P., 217, 295
Shanker, R.J., 652, 737 Sivazlian, B.B., 643, 672
Shanno, D.F., 208, 210, 253 Sjouquist, R.J., 480, 529
Shannon, R.D., 652, 737 Skrifvars, H., 44, 75
Shapiro, J., 746 Slater, P.J., 653, 738
Shapiro, J.F., 743 Skorin-Kapov, J., 178, 185
Shepardson, F., 643, 645, 676, 682- Slutsman, L., 212, 289
683 Slyke, R. Van, 258, 295
Sheppard, R., 643, 668 Smith, B., 649, 651, 711
Sherali, H.D., 179, 187-188, 217, Smith, B.M., 645, 682, 684
283,479-484,493,498-499, Smith, E.M.B., 45, 75
506,517,522,524-527,530- Smith, H.L., 643, 652, 657, 676,
532 738
Shetty, C.M., 388, 419 Smith, L., 643, 676
Shi, C.J., 263, 272, 295 Snyder, R.D., 653-654, 738
Shi, J., 175, 188 Soland, R.M., 654, 736
Shiloach, Y., 464, 478 Solomon, M., 647-649, 697, 711
Shindoh, S., 277, 290 Solomon, M.M., 648, 703
Shlifer, E., 647, 651, 700 Sorensen, D.C., 265, 292
Shmoys, D.B., 308, 427 Sorenson, E.E., 648, 707
Sorger, G., 310, 424 Suhl, V.H., 480, 529
Soumis, F., 648-649, 697-698, 711 Sullivan, W.J., 648, 707
Souza, C.de, 305, 421 Sum, J., 641, 662
Spaccamela, M.A., 649, 711 Sumichrast, RT., 648, 712
Spears, W.M., 128, 148 Sumners, D.L., 643, 669-670
Speckenmeyer, E., 83, 142 Sussmas, J.E., 652, 716
Spellman, R., 647, 700 Sutcliffe, C., 648, 712
Spielberg, K., 641, 652, 663, 725, Sutter, A., 527
738,742 Swain, R., 653, 658, 735, 740
Spirakis, P., 140, 145 Swain, RW., 652, 739
Spitzer, M., 644, 683-684 Syslo, M.M., 318, 427
Stan, M., 133, 145 Szegedy, M., 116, 140
Stancill, J.M., 652, 738 Szemeredi, E., 139, 142
Stanfel, L.E., 646, 686 Szpigel, V., 649, 712
Stary, M.A., 652, 716
Steiger, F., 645, 684 Taillefer, S., 648-649, 704
Steiger, F.C., 644, 677 Talbot, F.B., 645, 686
Steiglitz, K., 84, 112, 114, 118, 147, Tamir, A., 654, 721
318, 426, 465, 477 Tan, C., 648, 712
Stein, D., 649, 711 Tanga, R., 645, 677
Stern, E., 653, 655, 658, 720 Tanimoto, S.L., 641, 661
Stern, H., 644, 648-651, 684, 695, Tansel, B.C., 652-654, 739
711 Tapia, R.A., 226, 235, 285
Stern, H.I., 644-645, 676, 683 Tapiero, C.S., 652, 739
Stevenson, K.A., 654, 729 Tarasov, S.P., 186
Stewart, W.R., 648-649, 711-712 Tardella, F., 149, 163, 169, 171,
Stillman, M., 540, 571 173, 178, 184, 188
Storbeck, J., 653,716-717 Tardos, E., 308, 427
Storbeck, J.E., 653-654, 714, 719, Tardos, E., 178, 183-184,472,477,
738 550,570
Stougie, L., 648, 702 Tarjan, R., 464, 476
Stougie, L., 649, 711 Tarjan, R.E., 217, 221, 295, 453,
Stricker, R., 648, 712 464, 474-476
Stulman, A., 653, 731 Taylor, P.E., 644, 676
Sturmfels, B., 536-537, 542, 545, Taylor, P.J., 654, 739
548-549,553,555,558,562, Tayur, S.R, 549, 568, 572
566, 568-569, 570-572 Teather, W., 644, 677
Sudan, M., 116, 122, 140, 145 Tecchiolli, G., 134, 141
Suen, S., 140, 143 Teitz, M., 653-654, 739
Author Index 769

Terjung, R.C., 646, 686 Trouchon, M., 646, 689


Terlaky, T., 208, 220-223, 270, 272, Trubin, V.A., 641, 664
293, 295-296 Trudeau, P., 647, 698
Tewari, V.K., 654, 657, 739 Trzechiakowske, W., 654, 732
Tezuka, S., 474, 476 Tsuchiya, T., 225, 294
Thienel, S., 245, 288 Tuncbilek, C.H., 480, 493, 517, 525,
Thisse, J.F., 175-176, 186,652-654, 531
726 Tuncel, T., 85, 142
Thomas, R.R., 533, 549-550, 553, Turcotte, M., 654, 656, 734
555, 568-569, 570, 572 Türkay, M., 37-39, 75
Thomas, W., 649, 707 Turner, W., 648, 712
Thuve, H., 646, 690 Tuy, H., 156, 162, 176, 182, 186,
Tibrewala, R., 643, 676 188
Tibrewala, R.K., 643, 670 Tyagi, M., 648, 713
Tideman, M., 654, 739 Tykulsker, R.J., 645, 684
Tien, J.M., 643, 677
Tillman, F., 648, 712 Ullman, J.D., 744
Tind, J., 182, 186
Ullmann, Z., 388, 426
Tjalling, C., 652, 728
Ulular, 0., 525-526, 531
Todd, M.J., 247, 277, 292
Unwin, W., 648, 713
Toledo, S., 472, 478
Upfal, E., 140, 142
Topkis, D.M., 167, 177, 188
Urbaniak, R., 569-570, 572
Toregas, C., 653, 658, 735, 740
Urbanke, R., 568, 571
Toth, P., 299, 309, 314, 318, 323,
325-329, 332-334, 336, 340,
342-343,345,347-348,352, Vaidya, P.M., 221, 223, 258, 283,
355,359,361-362,382,384, 288,296
387,391,394,396-400,403- Vaidyanathan, R., 42-44, 75
404,406,408-410,415,418- Valenta, J.R., 646, 690
421,424-425,427,647,696 Valenzuela, T., 654, 656, 724
Tovey, C.A., 646, 687 Valiant, L.G., 454, 478
Traverso, C., 542, 545-546, 570 Valinsky, D., 653, 740
Trevisan, L., 116, 148 Van Leeuwen P., 648, 713
Trick, M., 136, 145 Van Loan, C.F., 221, 286
Trick, M.A., 198, 288 Van Roy, T.J., 480, 532, 652, 740
Trienekens, H., 648, 703 Van Ryzin, G., 647, 694
Tripathy, A., 646, 690 Van Slyke, R., 646, 690
Trotter, I.E., 646, 688 Van Wassenhove, L.N., 646, 652,
Trotter, L.E., 642, 667 656, 685, 724, 732-733
Vandenberghe, L., 151, 179, 188, Walukiewicz, S., 309, 318, 324, 373-
277,296 374, 376, 420
Vanderbei, R.J., 217, 277, 279, 287, Wang, C.C., 642, 667
296 Wang, J., 221, 225, 291
Vanderweide, J.H., 652, 730 Wang, P., 212, 289
Vannelli, A., 263, 272, 295 Wang, P.Y., 646, 690
Vasko, F.J., 641, 646, 664-665, 686, Ward, R.E., 645, 684
687 Ward, RE., 645, 684
Vazirani, U., 122, 145 Warden, W., 647, 650, 692
Vecchi, M.P., 132, 145 Warner, D.M., 643, 677
Veiga, G., 196, 212, 217, 221-222, Warners, J.P., 270, 272, 296
227, 225, 282, 293-294 Wasil, E., 648, 651, 701
Vemuganti, R.R., 573, 642, 646, Waters, N.M., 654, 657, 742
665-666, 687 Watson-Gandy, C.D.T., 648, 654,
Vergin, R.C., 652, 740 713, 740-741
Vial, J.P., 258, 283, 286 Watson-Gandy, G., 647, 698
Vial, J.Ph., 208, 295 Watts, C.A., 643, 674
Vijay, J., 652, 740 Weaver, J.R., 653-654, 741
Vincent, O., 33, 72 Weaver, J.R., 652-654, 717
Viola, P., 652, 717 Webb, M.H.J., 648,708,713
Vishkin, U., 464, 478 Webster, L., 645, 686
Viswanathan, J., 9, 20-21, 51, 76 Wee, T.S., 645, 685
Vlach, J., 263, 272, 295 Weinberger, D.B., 642, 667
Vohra, R.K., 653, 722 Weismantel, R., 277-279, 287, 305,
Vohra, R.V., 643, 677 324, 372, 407, 421, 427,
Vohra, V., 653-654, 738 569-570, 572
Volgenant, A., 648, 713 Wells, R., 648, 706
Volz, R.A., 654, 657, 740 Welsh, A., 645, 682
Vuskovic, K., 179, 183 Wendell, R.E., 652, 654, 726-727,
741
Wagner, J.L., 653, 740 Wesolowsky, G.O., 654, 729-730
Wagner, R.A., 642, 666 Wesolowsky, G.O., 654, 741
Wah, B.W., 83, 137, 144 Westerlund, T., 34, 44, 51, 75-76
Walker, J., 372-373, 420 Weston, Jr., F.C., 654, 657, 741
Walker, W., 643, 653, 657, 673, Westwood, J.B., 646, 687
740 Wets, R., 258, 295
Walker, W.E., 653, 656, 728 Whinston, A., 649, 691
Wallacher, C., 464, 478 White, A.L., 653, 734
Walsh, T., 129, 143 White, G.M., 646,690
White, J.A., 653-654, 722-723, 742 667, 695
White, W., 649, 713 Wood, D.C., 642, 646, 668, 690
White, W.W., 645, 687 Wood, D.E., 32,74
Whybark, W.E., 652, 728 Woodbury, M., 646, 690
Widmayer, P., 642, 668 Wren, A., 645, 648-649, 651, 684,
Wierwille, W.W., 654, 722 711,713
Wiggins, A., 643, 676 Wright, J.W., 647, 696
Williams, B., 648, 713 Wright, S., 208, 296
Williams, F.B., 653, 656, 726 Wu, Y.F., 642, 668
Williams, H.P., 480, 532
Williamson, D.P., 85, 144,276-277, Xu, J., 649, 699
286 Xu, X., 212, 282
Wilson, G.R., 641, 664-665 Yannakakis, M., 85, 110, 112, 115,
Wilson, N., 648-649, 651, 702 148
Wilson, N.H.M., 645, 681 Yano, C.A., 641, 660
Winter, P., 642, 667 Yao, A.C., 453, 478
Wirasinghe, S.C., 654, 657, 742 Yao, E.Y., 641, 662
Witzgall, C., 366, 427 Ye, Y., 208, 212, 221, 258, 260,
Witzgall, C.J., 653, 724 280, 283, 286, 288, 296
Woeginger, G.J., 309, 426 Yee, T.F., 60, 76
Wolf, F.E., 646, 686-687 Yeh, Quey-Jen, 221, 296
Wolf, H.B., 643, 674 Yellow, P., 648, 713
Wolfe, F.E., 641, 664 Yesilkokcen, G., 654, 739
Wolfe, H., 644, 677 Yoshitsugu, Y., 175, 188
Wolfe, P., 652, 715, 744 Young, D., 649, 713
Wolkowicz, H., 151, 179, 187, 276- Young, H.A., 654, 742
277, 279-280, 287, 293, 296- Young, H.A., 654, 742
297, 526, 529
Wolsey, L., 305, 421, 569-570, 641, Zamora, J.M., 70, 76
661, 743 Zangwill, W.I., 170, 188
Wolsey, L.A., 179, 187, 245, 292, Zemel, E., 311-313, 321, 324, 338-
305, 313, 318, 324, 426- 340, 342, 368, 419, 428,
428, 480, 507, 529, 532, 517-518,525,527,653,731
652, 718, 745 Zhang, X., 280, 283
Wolters, J., 649, 713 Zhang, Y., 210, 226, 235, 285, 297
Wong, C.K., 373, 394, 408, 419, Zhao, Q., 210, 276-277, 279-280,
422, 642, 668 296-297
Wong, R., 647, 701 Zheng, H., 92, 94, 96-97, 142
Wong, R.T., 481, 529, 642, 649, Ziegler, G., 535, 569, 572
Zionts, S., 649, 709
Zipkin, P., 649, 699
Zoltners, A.A., 305, 342, 365-366,
373,380,427-428,652,737
Zwick, D., 116, 145
Subject Index
0-1 Knapsack Problem, 302, 306, backtracks, 83
318-351 backward greedy solution, 330
3/4-approximate algorithm for MAX balanced filling, 335
W-SAT, 110 balanced insert, 335
3/4-Approximate SAT algorithm, balanced remove, 335
108 balancing, 314
balsub algorithm, 357-358
active literal, 97 bal-zem algorithm, 321
adjacency list, 214 base polyhedron, 167
adjacent arcs, 586 basic moves, 119
adjacent edges, 589 basic sets, 37
adjacent nodes, 586 basic variables, 37
ADP, 237 BB, Branch and Bound, 3
affine variety, 542 beginning node, 586
airline crew assignment problem, Bellman recursion, 355
193 bi-partite planar graph, 5
ak-approximation algorithm, 106 Bidimensional Knapsack Problem,
algebraic theory of Gröbner bases, 308
538 bilinear programming problem, 176
allowed, 132 Bin-packing Problem, 308
almost strongly correlated instances, binary search method, 436
316, 348 bipartite graph, 586
annealing schedule, 129 blood analysis model, 620-621
approximation algorithm, 86, 88 Boolean Quadric Polytope, 493
optimization problem, 87 bouknap algorithm, 391
APX, 89 bound,303
APX-complete, 91 bound-and-bound algorithm, 410
arc list, 214 bound-factor products of degree (or
arcs, 586 order) d, 485
assembly line balancing problem, bound-factors, 485
609-610 Bounded Knapsack Problem (BKP),
asymmetric traveling salesman prob- 302, 306, 382-394
lem, 520 Bounded Multiple-choice Knapsack
augmented network, 232 Problem, 309
average k-SAT model, 137 bounded polyhedron, 534
avis problems, 361 bounding step, 30

bounding, 34 clause, 198
branch and bound methods, 233 clause weighting, 133
branch and cut methods, 242-259 closed convex sets, 162
branch-and-bound algorithm, 30- CNF, 80
34,332-335, 373-374, 387- coefficient matrix, 442
388 coefficient matrix, 534
branch-and-bound method, 84 coherent, 537
branch-and-bound search tree, 237 Collapsing Knapsack Problem, 309
branch-and-bound tree, 30, 32 combinatorial approach, 202
branch-and-bound, 303, 397-398 combinatorial extremum problems,
branch-and-reduce algorithm, 41 157
branching priorities, 32 combinatorial optimization prob-
branching step, 30 lem, 150, 192
branching variable, 43, 47 Combinatorial Optimization, 303
branching, 34, 84, 239 compact convex sets, 162
breadth-first approach, 32 completeness in an approximation
break item, 311, 321 class, 90
break solution, 321 ComputeMaxima algorithm, 457
Buchberger's algorithm, 542 concave functions, 162
concave quadratic minimization, 180
capacitated facility location prob- conceptual interior point cutting
lems, 234 plane algorithm, 248
capacitated transshipment problem, conditional logic, 516
568 conditonal probabilities, 99
capacity, 463 configurations, 133
capital budgeting problem, 626 conjugate gradient method, 221-
cardinality bounds, 325 225
cells, 535 conjunctive normal form, 78
cellular manufacturing problem, 617 constrained extremum problem, 152
central trajectory, 209 constrained network scheduling prob-
chain, 586 lem, 615-617
Change-making Problem, 307 constraint underestimators, 41
check clearing, 625-626 constraint-factors, 501
Cholesky factors, 211-212 construction phase, 130
chromatic index, 592, 593 Conti-Traverso algorithm, 543-546
chromatic number, 593 continuous approach, 203
circuit fiber, 551 continuous branching variable, 43
circuit, 587 continuous embedding, 204
class reduction, 373 continuous mathematics, 85
Continuous Multiple-choice Knap- derandomization, 99
sack Problem (MCKP), 365 descent direction, 265-269
continuous relaxation, 311, 328, 404 diagnostic expert system, 621
continuous trajectories, 206-207 diagonal preconditioner, 223
continuously dominated, 366 Dinkelbach method, 437
convex hull extreme point, 161 diophantine equation, 351
convex hull representation, 482 directed graph, 586
convex hull, 161, 244 disconnecting set, 594
convex minimization problems, 207 discrete branching variable, 43
Convex MINLPs, 8 discrete lot sizing and scheduling
convex underestimators, 41, 47 problem, 610-611
convex-concave problems, 176-178 discrete optimization problem, 150,
core problem, 303, 312, 338-339, 193
361, 377 discrete problems, 191
core, 339, 344, 361, 377 disjunctive linear program, 38
corrector step, 210 disjunctive programming, 38
cost vector, 534 distance, 175
cost, 431 distillation sequencing problem, 56-
cover, 576 60
coverage, 80 distrust-region method, 43
Cramer's rule, 442 division algorithm, 542
crew base, 604 dominance relations, 366, 398
crew scheduling, 603-608 dominance rule, 357
critical item, 311 dominated items, 398
CTD test, 29 DP algorithm, 81
CTDU test, 29 DPLL algorithm, 81
cutting plane algorithm, 245 dual affine scaling algorithm, 217-
cutting planes, 242, 244 218, 226
Cutting problem, 352 dual affine scaling method, 208
cutting stock problem, 613-615 dual constraints, 230
cycle, 587 dual feasible solution, 230
cyclical scheduling, 599-600 dual representation, 12
dye-zem algorithm, 369
Dantzig bound, 320 dynamic programming algorithms,
days off scheduling, 599-600 354-359, 374-376
dead clauses, 95 dynamic programming, 303, 409
deductive reasoning, 129
degree, 558 e-approximable, 89
depth-first approach, 32 ECP algorithm, 34-36, 44
ECP, Extended Cutting Plane, 3 findcore algorithm, 340
edge, 175, 586 finite set, 175
edge-weight function, 463 finite subset, 540
elementary chain, 587 fixed charge problem, 627
elementary path, 587 fixed length clause model, 137
eliminate the unit clauses, 110 fixed-core algorithm, 342, 379
elimination term order, 545 FIXED-TS algorithm, 134
elimination, 239 flight leg, 604
ellipsoid method, 208 flight segment, 604
ending node, 586 flow capacity contraints, 214
enumerative bounds, 372 flow conservation constraints, 214
equality constraint representations, flow limitations, 213
495 flow, 465
equivalent vectors, 558 formulae equivalent, 112
escape, 122 forward greedy solution, 330
essential, 447, 466 forward star representation, 214
estimation of the nodes, 32 fractional combinatorial optimiza-
Euclidean distance, 634 tion problem, 430
evenodd problems, 361 fractional optimization, 430
existential fashion, 482 frequency plan, 618
expanding-core algorithm, 343-345, frequency planning problem, 618-
379 619
expected performance, 100 fully polynomial approximation, 346,
expected value, 100 386
expected weight, 100, 102 fully polynomial-time approxima-
exponential-time algorithms, 203 tion scheme, 303
Extended Cutting Plane (ECP), 34-
36 gambler's ruin chain, 127
extreme point, 161 gangster operator, 279
gas pipeline network, 37
FA, 3, 9, 36-37 GBD algorithm, 13-14
FA, Feasibility Approach, 3 GBD master problem, 11-13
faces, 535 GBD primal problem, 10-11
facets, 245 GBD, Generalized Benders Decom-
feasibility based range reduction tests, position, 3, 9-14
41 GCD algorithm, 27-29
feasible region, 195, 208 GCD, Generalized Cross Decom-
feasible solution, 88, 309, 534 position, 3
fiber, 534 Genapprox algorithm, 104
general cutting plane algorithm, 84 Grasp algorithm, 131
general fractional optimization, 430 Graver arrangement, 567
general set covering (GSC), 578 Graver basis, 550, 553
general set packing (GSP), 578 greedy algorithm, 330, 367
general set partitioning (GSPT), Greedy Johnson 1 algorithm, 92
578 Greedy Johnson 2 algorithm, 95
Generalized Assignment Problem, greedy principle, 320
309 greedy, 91
generalized set covering problem, Gröbner bases to integer program-
84 ming, 534
generalized simplex method, 541 Gröbner basis, 542
generalized upper bounding (GUB) Gröbner core, 560
constrained knapsack poly- Gröbner fans, 558, 559
topes, 493 Gröbner fiber, 553
generalized upper bounding (GUB) GSAT,129-130
constraints, 506-512 GSAT-WITH-WALK algorithm, 130
generic cost vector, 556, 558 GTP test, 29
generic, 534 GUC algorithm, 140
GENRANDOM algorithm, 102
geometric inequality, 105 H-RTS algorithm, 135-136
geometric interpretation, 195 half-spaces, 195
geometric sequence, 441 heat exchanger network synthesis
global approximation, 203, 205 problem, 60-70
global minimum point, 161, 163- Hessian matrix, 86
164 Hilbert basis, 553
global minimum, 161, 208 history, 128, 132
global optimality conditions, 180 history-sensitive heuristics, 132-136
global optimum, 122 HSAT, 133
GMIN-αBB algorithm, 50-51 Hilbert basis, 553
GMIN-αBB, General structure Mixed history, 128, 132
Integer Nonlinear αBB, 3 history-sensitive heuristics, 132-136
GOA algorithm, 23-26 hs_branch algorithm, 333
GOA master problem, 23-25 ILP feasibility problem, 85
GOA primal problem, 23 ILP, 83
Gomory cutting planes, 245 minimal test set, 547
good entries, 471 incomplete QR decomposition, 223
good models, 480 indicator, 226
graded polynomial ideals, 558 inductive inference, 85, 192-193, 198
gradients, 377 infeasibility test, 42
infeasible primal problem, 10 killed clause, 97
infinitesimal version, 205 Kirchhoff's Law, 213
information retrieval and editing, Knapsack Problem with GUB, 365
623-625 Knapsack Problems, 246, 301, 302
ingot size, 611
initial best known upper bound, Lagrange Interpolation Polynomi-
239 als (LIP), 523
initial branch-and-bound search tree, Lagrangian relaxations, 322, 404,
237 406
initial form, 546 lattice arithmetic, 534
initial ideal, 542 Lawrence lifting, 553
initial monomial, 542 Lawrence type, 567
initial term, 542 length, 175
initial upper bound, 237 level 1 factors, 505
inner normal cone, 535 level 2 factors, 506
inner normal fan, 535 level d factors, 506
Integer Linear Programming, 302 Lie group embedding, 204
integer programming problem, 242, lift-and-project cutting plane algo-
246, 270 rithm, 494
Integer Programming Problem, 308 linear algorithm, 450
integer programming, 193 linear complementarity problem, 159
interior point algorithm, 85, 275 linear fractional combinatorial op-
interior point cutting plane algo- timization problem, 433
rithm, 255 linear integer programming (LIP),
interior point cutting plane meth- 576
ods, 246-249 linear mixed integer 0-1 program-
interior point methods, 191, 247 ming problem, 484
interval analysis based algorithm, linear ordering problem, 193, 247
42-44 linear parallel algorithm, 453
inverse image, 205 linear programming (LP) relaxation,
inverse strongly correlated instances, 234
316, 348 linear programming relaxation, 84
Inverse-parametric Knapsack Prob- linear programming, 208-212
lem, 309 linear relaxations, 482
item r dominates item s, 366 list size, 134
literal, 198
local minimum point, 181
k-SAT model, 137 local minimum, 208, 261
Karmarkar Method, 205 local optimum point, 120
local search phase, 130 MAX-VAR BOUNDED SAT prob-
local search, 118-127, 202 lem, 118
lock box location problem, 636 MAX-VAR SAT problem, 117-118
Logic-Based Generalized Benders MaxCost algorithm, 435
Decomposition, 40 MaxCost2 algorithm, 456
Logic-Based Outer Approximation MaxCostRecursive algorithm 456-
algorithm, 39-40 457
lower bound test, 43 maxima-computing phase, 455
lower bound, 272 maximum cardinality constraint, 325
lower face, 538 maximum cut problem, 249
lower semicontinuous, 169 maximum flow problem, 215, 462
LP relaxation, 84, 103, 242 maximum flow, 465
LP,33 maximum matching problem, 590
LP-dominated, 366 maximum profit-to-time ratio cy-
LPnew, 251 cle problem, 458
LS-NOB, 135 maximum profit-to-time ratio cy-
LS-OB, oblivious local search, 122- cles, 458-460
123 maximum-mean cycle problem, 460
maximum-mean cycles, 460-462
m-concave, 165 maximum-ratio spanning tree, 453
m-quasi-concave, 165 maximum-ratio spanning-tree prob-
Markov chain, 127 lem,453
Markov process, 126-128 maximum-surplus cut, 463
MarkovSearch randomized algorithm MaxMeanCut problem, 464
for 2-SAT, 127 MaxRatioCut problem,462
Martello and Toth upper bound, MaxRatioCycle problem, 458
323, 384, 395 MaxRatioPath problem, 434
mass, 94 mean-weight cost, 431
master problem formulation, 11 mean-weight surplus, 463
master problem, 9 Megiddo's parametric search method,
matched, 80 430, 448-457
matching, 590 memory-less, 128
mathematical modeling, 3, 5-6 minimal non-face, 540
mathematical problems, 628 minimal test set, 540
MAX 2-SAT, 122 minimax problem, 170
MAX W-k-SAT, 79 minimum cardinality constraints,
MAX W-SAT, 79 325
MAX-k-SAT, 79 minimum cost network flow prob-
MAX-SAT problem, 78 lem, 213, 215
minimum covering problem, 592 problems, 494
minimum-cost flow problem, 464 multiple choice constraints, 507
minimum-cost spanning-tree prob- multiple depots and extensions, 633
lem, 453 Multiple Knapsack Problem (MKP),
minknap algorithm, 344 302, 307, 402-417
Minkowski integrals, 536 multiple runs, 128
Minkowski sum of line segments, Multiple-choice Knapsack Problem
567 (MCKP), 302, 306, 364-
Minkowski sum, 535 381
Minkowski summands, 535 Multiple-choice Nested Knapsack
MINLP, mixed-integer nonlinear pro- Problem, 309
gramming, 3, 6 Multiple-choice Subset-sum Prob-
MINOPT, Mixed-integer Nonlin- lem, 309
ear OPTimization, 3-4, 51-
56 negative clause, 97
mislead, 126 negative literal, 96
mixed discrete-continuous optimiza- neighbors of the origin, 550
tion, 6 Nested Knapsack Problem, 309
mixed-integer 0-1 constraint region, network flow problems, 212
487 network programming, 212-217
mixed-integer zero-one polynomial network, 175, 463, 586
programming problems, 482 neural network approaches, 86
model-finding procedure, 129 Newton method, 430, 437-448
monomial ideal, 546 NLP solver, 33
monotonicity test, 42 NLP, 33
most fractional variable rule, 32 NOB & OB, two-phase local search
mt2 algorithm, 342 algorithm, 124
mtm algorithm, 411 node with the lowest lower bound,
mts algorithm, 360 32
mtu2 algorithm, 400 node-arc incidence matrix, 214
mulknap algorithm, 412-413 node-node adjacency matrix, 214
multi-commodity minimum discon- nodes, 586
necting problem, 594 non-fractional version, 432
Multi-constrained Knapsack Prob- non-negative real line, 538
lem, 302, 308 non-oblivious functions, 122
multi-commodity network flow prob- non-standard set of disjunctions,
lem, 216 482
multi-linear mixed-integer zero-one nonbasic sets, 37
polynomial programming nonbasic variables, 36
nonconvex MINLPs, 40 outer approximation methods, 14
nonconvex optimization problem,
259 P-center problem, 637
nonconvex potential function min- P-median problem, 638
imization algorithm, 264 packing problems, 304
nonconvex potential function min- packing, 576
imization, 259-272 parallel algorithm, 453
nonconvexity test, 43 parameter GAP, 433
nondeterministic machine, 87 parameterized ellipsoid, 273
Nonlinear Knapsack Problem, 309 parametric problem, 431
nonlinear program, 150 partial enumeration, 323
nonlinear pump configuration prob- partitioning, 577
lem,44 path, 175, 587
normal fan, 535 PCP, probabilistic checkable proof,
normal form, 542 116
normally equivalent, 535
PDPCM,208
NP-complete problem, 79
performance ratio, 88
NPO problem, 87
persistency issues, 522
NPO-complete,91
personnel scheduling problem, 598
OA algorithm, 16-17 piecewise concave, 168
OA master problem, 15-16 piecewise convexity, 168
OA primal problem, 15 piecewise quasi-concave, 168
OA, Outer Approximation, 3, 14- piecewise quasi-convexity, 168
18 pLbranch algorithm, 334
OA/ER algorithm, 18-19 plant location problem, 634-636
OAIERI AP algorithm, 20-23 point configuration, 537
off-line, 132 pointed Gröbner fan, 560
on-line, 132 pointed secondary fan, 564
optimal face, 226 political districting problem, 622-
optimal process flowsheets, 5 political districting problem, 622-
optimal slope, 368 polyhedral complex, 535, 537
optimal solution, 195, 368 polyhedral cone, 534
optimality based range reduction polyhedral fan, 535, 537
tests, 41 polyhedral geometry, 534
Original Conti-Traverso algorithm, polyhedral properties, 313
545 polyhedron, 165
original input matrix, 206 polymatroid, 167
original parameters, 206 polynomial-time algorithms, 203
polynomial-time approximation scheme, quadratic assignment problem (QAP),
89, 303 236-242, 497
Polynomial-time, 386 Quadratic Assignment Problem, 309
polytope, 534 quadratic assignment, 37
positive literal, 96 quadratic integer programming prob-
potential function, 86 lem, 277
preconditioned conjugate gradient Quadratic Knapsack Problem, 308
algorithm, 218-220 quadratic programming problem,
preconditioner, 221-225 159
predictor step, 210 quasi-concave functions, 162
predictor-corrector method, 212 quicksort algorithm, 339
primal feasible solution, 232
primal formulation, 10 RANDOM algorithm, 100
primal problem, 9 random clause model, 137
primal-dual algorithm, 334 randomization, 99
primal-dual predictor-corrector method, randomized 1/2-approximate algo-
208 rithm for MAX W-SAT,
primitive edge of the circuit fiber, 99-102
552 randomized algorithms for MAX
primitive edge of the Gravner fiber, W-SAT, 99-110
555 randomized greedy construction, 131
primitive non-zero vector, 551 Randomized Rounding, 105
primitive, 551 REACTIVE-TS, 134
problem entry, 53 reasoning, 129
problem solution, 53-54 rectilinear distance, 634
process flowsheets, 3 reduced Gröbner basis, 542
process superstructure, 5 reduction algorithms, 398-402, 407-
process synthesis problem, 3-4 408
prohibited, 132 reduction procedure, 303
prohibition parameter, 134 reduction, 90, 331, 386
projective transformation, 205 refinement of two fans, 536
property, 108 Reformulation Linearization Tech-
pseudo-costs, 33 nique (RLT), 479, 484-485
pseudo-polynomial time, 310 reformulation, 45
pseudo-polynomially solvable, 310 regular triangulation, 537
PTAS,89 regular, 537
PTAS-reduction,90 relative boundary, 163
PURE LITERAL algorithm, 140 relative interior, 163
pure literal, 83 relaxation-penalization, 152
replacement technique, 355 set partitioning (SPT) problem, 575,
representation of process alterna- 577
tives, 5 set partitioning model, 577
residual capacity, 465 set partitioning polytope, 493
residual demand, 465 shift scheduling, 601
residual network, 113 Simplex Method, 196, 208
restricted network, 232 simulated annealing algorithm for
reverse star representation, 214 SAT, 128
right hand side vector, 442, 534 Ising spin glass problem, 248
rounding, 85 single depot vehicle routing, 631-
row generation scheme, 595 632
sink nodes, 213
sink, 463
S-factors, 500
slacks, 86
SAMD algorithm, 132
smallest clause, 83
Satisfiability (SAT) Problem, 198
SMIN-αBB, 46-50
satisfiable, 80, 198
SMIN-αBB, Special structure Mixed
satisfied clauses, 97
Integer Nonlinear αBB, 3
saturated lattice, 534
source nodes, 213
scheduling personnel, 598
source, 463
search trajectory, 119, 133 spanning tree preconditioner, 221
secondary cone, 563 spare parts allocation problem, 613
secondary fan, 563 sparse constraints, 515
secondary polytopes, 537, 563 spatial branch-and-bound algorithm,
selection, 83, 239 45
semidefinite programming, 116 Special Structures Reformulation
semidefinite programming relaxations, Linearization Technique (SS-
276-280 RLT),502
semidefinite programs, 179 splitting rule, 81
separable, 313 stability number of a graph, 588
separating resolvents, 85 stage, 451
separation problem, 246 standard assignment problem, 591
separation routine, 246, 253 standard monomials, 542
service facility location problem, 639- state polytopes, 537, 558, 559, 560
640 states, 310
set covering (SC) problem, 575, 577 Steiner points, 596
set covering model, 577 Steiner problem in graphs, 596
set packing (SP) problem, 575, 577 stochastic search trajectory, 128
set packing model, 577 straightforward merging process, 455
strictly concave function, 162 timetable scheduling problem, 619-
strictly concave, 156 620
strictly piecewise concave, 168 todd problems, 361
strictly quasi-concave function, 162 topology, 207
strongest possible cutting planes, toric ideal, 542
245 toric variety, 542
strongly correlated data instances, tour scheduling, 602-603
379 trace minimization problem, 276
strongly correlated instances, 316, tramp-steamer problem, 458
348 transshipment nodes, 213
structured problem models, 137 traveling salesman problem, 629-
structures, 430 631
subdivision, 537 tree, 587
subedge, 175 triangulation, 537
sublattice, 166 trust region approach, 86, 260
submodular, 166 TS, Tabu Search, 132
subset of continuous space, 207 two-constraint knapsack problem,
subset-sum data instances, 380 326
subset-sum instances, 316 two-phase local search algorithm,
Subset-sum Problem (SSP), 351, 124
307
superstructure, 5 Unbounded Knapsack Problem (UKP),
support, 535 306, 394-402
surplus, 463 unbounded, 162
surrogate relaxation, 328 uncapacitated minimum cost net-
surrogate relaxed problem, 404 work flow problem, 216
symmetric flow, 112 uncapacitated network flow prob-
symmetric network, 112 lem, 214
unconstrained, quadratic pseudo-
temperature, 128-129 Boolean programming prob-
term order, 542 lem, 482
termination test, 239 uncorrelated data instances, 315,
test sets, 537, 547 348,379
testing and diagnosis problem, 620- undirected graph, 586
621 uniform fractional combinatorial op-
threshold effects, 138-140 timization problem, 433
threshold value, 138 unimodular, 566
tighter bound, 323 UNISAT models, 86
tighter reduction, 332 UNIT CLAUSE algorithm, 140
unit clause rule, 83
unit flow cost, 213
universal Gröbner basis, 553
universal test sets, 537, 550
unrestricted location, 633
unsatisfiable, 198
upper bound, 84
upper-bound test, 42

Valiant's parallel algorithm, 454


Value-independent Knapsack Prob-
lem, 351
variable upper bounding (VUB) con-
straints, 506, 512-515
variable bounds, 48-49
vertex packing problem, 510, 588
vertex packing, 588
vertices, 175, 586

Weak Minimum Principle (WMP),


163
weaker bounds, 376
weakly correlated data instances,
379
weakly correlated instances, 315,
348
weight, 78, 94, 431, 463
weighted MAX-SAT problem, 78
weighted median problem, 321
weighted Vertex Packing Problem,
589
WeightedMaxCost algorithm, 449
wire routing problem, 200
wounds, 95

zig-zag instances, 380


zonotope, 536, 567
Author Index

Aarts, E., 376, 524 Appleby, J.S., 354
Abara, J., 712 Arabeyre, J.P., 713
Abbott, H.L., 353 Arabie, P., 322, 323, 325, 328
Abe, T., 387 Aragon, C.R., 373
Abu-Mostafa, Y., 518 Aranson, S., 518
Achatz, H., 518 Arcelli, C., 323
Adam Beguelin, 154 Archdeacon, D., 354
Aldous, D., 73 Arguello, M.F., 713
Agarwal, A., 28, 153, 322, 583 Arikati, S., 28
Agarwala, R., 322 Arjomadi, E., 354
Agarwal, P.K., 27, 28 Arkin, E.M., 28
Aggarwal, C.C., 252 Armstrong, M.A., 73
Aggralwal, R., 384 Arnold, V.I., 518
Ahuja, R.K., 252 Arora, S., 354
Aichholzer, O., 28, 629, 630 Arvind, K., 540
Akers, Jr., 353 Arya, S., 28
Akers, S.B., 583, Aspvall, B., 354
Akiyama, J., 353 Atallah, M.J., 28, 29, 153, 540
Alon, N., 353 Aumann, R.J., 1, 98
Aistrup, J., 712 Aumann, Y., 583
Aly, A.A., 461 Aurenhammer, F., 28, 461, 629,
Aly, K.A., 583 630
Amari, S., 518 Auslander, M.A., 360
Anbil, R., 712 Avra, L., 354
Anderson, D., 717 Avriel, M., 518
Anderson, J., 518 Awerbuch, B., 583
Ando, K., 252 Aykin, T., 10, 713
Andrasfai, B., 353 Azar, Y., 583
Andreatta, G., 713
Andreussi, A., 713 Baase, S., 252
Andrews, J.A., 354 Babel, L., 355
Annaratone, M., 384 Bafna, V., 322
Angeniol, B., 518 Bagchi, A., 377
Anosov, D., 518 Bailey, E.E., 713
Apostolico, A., 28 Baird, B., 518, 521,
Appel, K., 354 Bala, K., 584

Balas, E., 355, 713 Benders, J.F., 714
Baldi, P., 355 Bern, M., 461
Baldick, R, 252 Bennett, C.H., 73
Ball, M., 713 Benzecri, J.P., 323
Ballart, R., 584 Berge, C., 356
Bandelt, H.-J., 323 Berge, M.A., 714
Bandyopadhyay, B.K, 387 Berger, B., 356
Bang-Jensen, J., 355 Berman, KA., 356
Banzhaf, J.F., 98 Bern, M., 618, 630
Banzhaf, W., 518 Bern, M.W., 153
Bar-Noy, A., 583 Bernadi, C., 356
Bard, J.F., 713 Bernstein, C., 356
Barnabas, J., 75 Bertsimas, D.J., 725
Barnhart, C., 717 Bestehorn, M., 519
Barry, D., 73 Betts, L.M., 253
Barry, R.A., 584 Bhattacharya, B., 32
Bar-Noy, A., 540 Bhattacharya, P.P., 253
Bartal, Y., 584 Bianco, L., 713
Bauer, P.W., 17, 714 Biggs, N., 356
Bauernoppel, R., 355 Bird, C.G., 98
Baumert, L.D., 369 Bitan, S., 365
Bauslaugh, B., 355 Bitner, J.R, 356
Baybars, I., 355 Bitran, G.R., 253
Beall, C.L., 395 Binder, K., 519
Beasley, J.E., 461 Bixby, R.E., 714
Beck, L.L., 379 Blake, D.V., 354
Beckmann, M.J., 714 Blum, A., 356, 357
Bedi, D.N., 719 Blumstein, A., 714
Behrooz Kamgar-Parsi, 521 Boas, J., 712
Behzad Kamgar-Parsi, 521 Bodin, L., 356, 357, 713
Behzad, M., 355 Boender, E., 519
Beichel, I., 461 Bollobas, B., 357, 630
Bellare, M., 356 Bolland, R.P., 73
Belleville, P., 622, 630 Bondy, J.A., 357
Belobaba, P.P., 19, 714 Boorman, S.A., 322
Beltrami, E., 356 Boorstyn, R.B., 383
Ben-Or, M., 29 Boots, A.B., 464
Benchakroun, A., 714 Boots, B.N., 461
Bender, E.A., 356 Borodin, O.V., 357
Borovikov, A.A., 357 Callahan, P.B., 323
Bose, P., 623, 630 Camerini, P.K., 714
Boyar, J.F., 357 Cameron, J., 359
Boyce, W.M., 153 Cameron, S.H., 359
Boyd, S., 329 Campers, G., 359
Brackett, C.A., 584 Canny, J., 20, 29
Brady, D., 523 Cao, F., 585
Brazil, M., 462, 590, 594, 601, 615 Cao, F., 585
Brandon, C., 461 Carlisle, M.C., 359
Brauner, E., 724 Carpinelli, J.D., 359
Brelaz, D., 358 Carroll, J.D., 323
Brewster, R., 358 Caspi, Y., 359
Briggs, R., 358 Catlin, P.A., 359, 360
Brigham, R.C., 364 Cavalli-Sforza, L.L., 73
Brockett, R., 519 Ceria, S., 355
Broder, S., 358 Chaitin, G.J., 360
Broersma, H.J., 358 Chams, M., 360
Bronshtein, I., 518 Chandra, A.K., 360
Brooks, R.L., 358 Chang, S.K., 462
Brown, E.K., 73 Chartrand, G., 360
Brown, E.M., 462 Chataurvedi, A., 323
Brown, J.H., 714 Chazelle, B., 29, 153
Brown, J.I., 358 Cheban, Y.I., 360
Brown, J.R., 358 Cheeseman, P., 360
Bretthauer, K.M., 253 Chen, B.L., 360, 394
Brown, J.R., 253 Chen, M.S., 584
Brualdi, R.A., 358 Chen, D.Z., 28, 29, 30
Brucker, P., 253, 323 Chen, G., 360
Brumelle, S.L., 27, 714 Chen, G.X., 153
Bushel, G., 716 Chen, J.M., 462
Buneman, P., 323 Chen, T.H., 32
Burattini, E., 358 Chen, Mon-S., 260
Burkard, R., 519 Cheng, K.W., 584
Burr, W.E., 584 Cheng, S.W., 621, 629, 630, 631
Buzacott, J.A., 253 Cheriton, D., 462
Chetwynd, A., 360
Caccetta, L., 358 Chew, L.P., 28, 30, 31, 33
Cai, L.Z., 358, 359 Cheeger, J., 615
Callahan, D., 359 Chhajed, D., 360
Chiang, Y.-J., 30 Courant, R., 154
Chin, F., 633 Cowen, L.J., 362
Ching, Y.C., 584 Cowen, R.H., 362
Chlamtac, I., 585 Cox, T., 585
Choi, J., 30 Coxeter, H.S.M., 462
Chrobak, M., 360, 540 Cowen, R.H., 362
Choudhury, A.K, 388 Crescenzi, P., 323
Chow, A., 153 Crowder, H., 715
Chow, F., 361 Culberson, J.C., 362
Christofides, N., 361, 715 Culik II, K., 6, 73
Chung, F.R.K., 153, 154, 462 Cunningham, W.H., 99
Churchill, G.A., 76 Curiel, I., 102
Chvatal, V., 361 Curiel, I.J., 99
Clarkson, KL., 30 Curtis, A.R., 362
Claus, A., 4, 99 Curry, R.E., 35, 715
Cleeroux, R, 714 Cvijovic, D., 519
Cockayne, E.J., 154,462,590,597,
615 Dabrowski, J., 376
Cohen, M.A., 253 Dailey, D.P., 363
Colbourn, C.J., 540, 541 Dam, T.Q., 585
Cole, A.J., 361 Daniel Granot, 101
Cole, J., 462 Dantzig, G.B., 254, 715
Coleman, T.F., 361 Darrow, R.M., 723
Condon, A., 377 Das, G., 28, 29
Conley, W., 715 DasGupta, B., 73, 74, 76
Connelly, R., 28 Davis, L., 519
Cook, R.J., 361, 362 Davis, M., 99
Cook, W., 253 Davis, M., 99
Cooke, J., 360 Day, W.H.E., 73, 323, 324
Cooper, K.D., 358 Daescu, O., 29
Coppersmith, D., 583 De Berg, M., 31
Corbett, P.F., 585, De Bruijn Digraph, 585
Cormen, T.H., 31 De Bruijn, N.G., 584
Corneil, D., 630 De la Croix Vaubois, 518
Corneil, D.G., 362 De Soete, G., 323
Cornuejols, G., 715 De Werra, D., 360, 361, 363, 366,
Cosares, S., 254 371, 520
Costa, D., 362 Dearagao, M.P., 371
Courant, D.R., 462 Dekel, E., 359
Delattre, M., 324, 370 Du, D.Z., 154, 156, 463, 585, 604,
Demmel, J., 324 607, 609, 615, 716
Deng, X., 99 Dubes, R.C., 325
Dencker, P., 363 Dubey, P., 99
Denny, M., 631 Dubuis, 0., 362
Desai, N., 383 Duchet, P., 356
Desaulniers, G., 715 Dumas, J., 715
Descartes, B., 363 Dunstan, F.D.J., 364
Desrosiers, G., 715 Dunlay, J.W., 716
Deo, N., 157, 389 Durbin, R., 519
Devroye, L., 630 Durre, K, 363, 364
Dewdney, A.K., 73 Dutton, R.D., 364
Dhall, S.K., 541 Dyer, M.E., 254, 364
Dial, R., 713
Dias, D.M., 259 Early, S., 364
Dickerson, M., 622, 631 Ebeling, W., 519
Dickerson, R.E., 462 Ebin, D.G., 615
Diday, E., 324 Edelsbrunner, H., 31, 324, 618, 630,
Diks, K, 363 631
Dirac, G.A., 364 Edmonds, J., 15, 99, 100, 254
Divatia, S., 155 Edwards, A.W.F., 73
Dix, F., 585 Edwards, K., 364
Dobosiewicz, W., 31 Edwin Romeijn, H., 519
Doig, A., 719 Eglese, R. W., 364
Dolan, J., 590, 615 Eigen, M., 519
Dolan, J.R., 462 Eiselt, H., 519
Donnay, J.D.H., 462 Eiselt, H.A., 254
Dono, N.R., 584 Elce, I., 716
Dorofeyuk, A.A., 324 ElGindy, H., 31
Dove, C., 154 Ellis, J.A., 358, 359, 364
Dowd, P.W., 583 Engel, A., 519
Dowsland, K., 364 Eppstein, D., 461, 619, 630, 631
Dragone, C., 585 Erdos, P., 353, 365
Dress, A.W.M., 323, 324 Erlbach, T., 585, 586
Drezner, Z., 463 Erne, M., 365
Dror, M., 99 Etschmaier, M.M., 716
Druckerman, J., 715 Etzion, T., 365
Drysdale, R.L., 30, 631 Eu, J.E., 254
Du, D.H.C., 585, 587, 588 Evans, W., 630
Even, S., 365 Franklin, M., 366
Fratta, P.K., 714
Faigle, U., 100 Fredman, M.L., 31
Farach, M., 322 Frederic Maffray, 370
Farber, M., 541 Frederickson, G.N., 31, 254
Feairheller, S.E., 462 Freuder, E.C., 366
Fearnley, J.P., 713 Frick, M., 366
Federgruen, A., 254 Friden, C., 366
Feige, U., 365 Friedman, A.J., 357
Feistel, R., 519 Frieze, A.M., 254, 364, 366, 367
Fekete, S., 100 Fu, H.L., 394
Felsenstein, J., 73, 75 Fujishige, S., 252, 254
Feo, T.A., 365 Fulkerson, n.R., 254
Ferland, J.A., 366, 714 Fulkerson, D.R., 254
Fermat, 609 Fuller, R.B., 463
Fernandez de la Vega, W., 365, 366 Furtado, A.L., 386
Fiat, A., 583, 584
Fichtner, W., 384 Gabow, H.N., 255, 367
Fiedler, M., 324 Gacs, P., 73
Fiorini, S., 365 Gaddum, J., 382
Fisher, D.W., 324 Gagnon, G., 716
Fisher, M., 715 Gallager, R.G., 587
Fisher, M.L., 716 Gallager, RG., 587
Fitch, W.M., 74 Gallo, G., 255, 324
Fitzpatrick, G.L., 716 Galperin, A., 255
Flannery, B., 522 Gamst, A., 367
Fleischer, L., 259 Ganz, A., 585
Fleurent, C., 366 Garcia, A., 631
Florek, K, 324 Garey, M., 520
Florian, M., 716 Garey, M.R., 74, 100, 154, 324,
Foster, E., 155 361, 367, 463, 541, 592,
Ford, L.R., 254 Gasco, J.L., 52, 716
Formby, J.A., 366 Gavril, F., 368
Fort, J., 520 Gay, Y., 363
Fortune, S.J., 463 Gavril, F., 368
Foster, C.E., 366 Gay, Y., 363
Fossey, S.A., 463 Gee, A., 520
Fox, B.L., 254 Geisinger, KE., 717
Frank, A., 585 Geist AI, 154
Geist, R., 154 Goodchild, M.F., 721
Gelatt, C., 156, 521 Gorbatov, V.A., 357
Gelatt Jr., C.D., 375 Gower, J.C., 325
Geller, D.P., 360 Goodrich, M.T., 31, 153
Gelman, E., 712 Goodman, M., 75
Geoffrion, A.M., 717 Grable, D.A., 369
George, J.A., 368 Graham, B., 362
Georgiadis, L., 253 Graham, D.R., 713
Georgakopoulos, G., 155 Graham, R., 461
Gerards, A.M.H., 253 Graham, R.L., 153, 154, 155, 462,
Gerla, M., 586 463, 592, 609, 615
Gershkoff, I., 717 Grama, A., 156
Gessel, I.M., 368 Granot, D., 4, 23, 99, 100
Ghobrial, A., 717 Graves, G., 717
Gibbons, A., 368 Graves, G.W., 717
Gibbons, L.E., 368 Greening, M.G., 464
Gibson, K.D., 463 Griesshaber, R., 725
Gilbert, E.N., 155, 368, 463 Griggs, J.R., 367, 369
Gilbert, J.R., 354 Grigoriadis, M.D., 324
Gilbert, P.D., 619, 631 Grigoriadis, M.E., 255
Giles, R., 254 Grimmett, G.R., 369
Gillies, D., 21, 100 Grines, V., 518
Gionfriddo, M., 368 Grobmann, C., 520
Gjertsen, R.K., 368 Groenevelt, H., 254, 255
Glasgow, J., 259 Gross, O., 255
Glover, F., 368, 520, 717 Grossberg, S., 155
Glover, R., 717 Grossman, H.C., 366
Gobel, F., 358 Grotschel, M., 100, 254, 369, 520
Gofinet, F., 461 Grunbaum, B., 369
Goldberg, D., 520 Gu, Q.-H., 585
Goldin, D.Q., 356 Gu, X.G., 523
Goldman, S.A., 632 Guenoche, A., 325
Goldwasser, S., 365 Guerin, G.G., 716
Golin, M.J., 630 Guha, S., 31
Golomb, S.W., 369 Guibas, L.J., 29, 31
Golumbic, M.C., 356, 369, 541 Guibas, L., 74, 153
Gomory, R., 717 Guichard, D.R., 369
Gondran, M., 325 Gupta, A., 156
Gonzalez, T., 369 Gupta, O.K., 255
Gupta, R., 369 Hendrickson, B.A., 464
Gusfield, D., 75, 325 Henkes, O., 359
Gutjahr, W., 369 Hennessy, J., 361
Guus, C., 519 Henning, M.A., 366
Gyarfas, A., 370 Herbert Hamers, 101
Hersh, M., 718, 719
Hacene Ait Haddadene, 370 Hertz, A., 360, 362, 363, 366, 370,
Hackman, S.T., 255 Hertz, A., 360, 362, 363, 366, 370,
Hagen, L., 325 371
Haggkvist, R., 370 Hertz, J., 520
Haimes, Y., 257 Hestenes, M., 520
Haken, H., 519, 520, 523 Heuft, J., 363, 364
Haken, W., 354 Hewgill, D.W., 154
Hale, W.K., 370 Hilton, A.J.W., 371, 372
Halldorsson, M.M., 370 Hind, H.R., 371, 372
Hallefjord, A., 100 Hirsch, M., 521, 523
Han, C.G., 258 Hluchyj, M.G., 585
Hane, C.A., 717 Ho, A., 713
Hansen, M., 61, 62, 717 Hoang, C.T., 372
Hansen, P., 324, 325, 370 Hoang, H.H., 718
Harary, F., 370, 391 Hoang, T., 361
Harel, D., 31, 583 Hochstattler, W., 100
Harris, A.J., 357 Hockaday, S.L.M., 718
Harris, F., 155 Hoey, D., 619, 628, 633
Harris, F.C., 155 Hofbauer, J., 521
Harrold, M.J., 375 Hoffman, A.J., 256, 372
Hartigan, J.A., 73, 325 Hoffman, A.J., 256, 372
Hassin, R., 371 Hoffman, K.L., 718
Hattori, Y., 387 Hogg, T., 393
Hax, A.C., 253 Holgate, P., 372
He, X., 73, 74, 76, 371 Holland, J., 521
Hearn, D.W., 368 Holloway, C., 718
Heath, L.S., 621, 628, 632 Hornby, S.,
Hedetniemi, S., 156, 360 Hong, J., 628, 633
Hein, J., 74 Hong, S., 255
Hell, P., 355, 370, 371 Hopfield, J., 521, 523
Helme, M.P., 718 Hopfield, J.J., 156
Helming, R., 100 Hopkins, G., 372
Hemrick, C., 585 Hopkins, M.E., 360
Hopperstad, C.A., 714 Janic, M., 718
Horst, R., 521 Janich, K., 325
Hochbaum, D.S., 254, 255, 256 Jansen, K., 373, 585, 586, 619, 632
Hsu, F., 585 Jarrah, A.I.Z., 718
Hsu, W.-L., 325 Jarnik, V., 156
Hu, T.C., 256, 632 Jarvis, J.P., 74
Hu, Q., 377 Jaumard, B., 325, 371
Hu, X., 30 Jensen, T.R., 373
Huberman, G., 100 Jeroen Kuipers, 101
Hubert, L., 322, 323, 325 Jeurissen, R., 373
Hubert, L.J., 325 Jiang, N., 259
Humblet, P.A., 584 Jiang, T., 73, 74, 76
Hung, K., 372 Johns, G., 373
Hunt, H.B., 379 Johnson, C., 717
Hurtado, F., 74 Johnson, D., 520
Hutchinson, J.P., 372 Johnson, D.S., 74, 100, 154, 254,
Hwang, F.H., 154 Johnson, S.C., 326
Hwang, F.K., 156, 157, 462, 463, Johnson, E.L., 712, 715
590, 592, 594, 604, 607, 631
609, 615, 716 Johnson, S.C., 326
Hwang, F.W., 464 Johnson, E.L., 712, 715
Johri, A., 373
Ibaraki, T., 10, 99, 102, 256, 258 Jones, M.T., 368, 374
Ibarra, O.H., 541 Jornsten, K., 100
Ichimori, T., 256 Jr., 155
Imase, M., 586 Jr., J.F., 102
Irani, S., 372, 373, 586 Jump, J.R., 586
Isaacson, J.D., 380 Jung, H., 355
Itoh, M., 586 Justin R. Smith, 157
Irving, R.W., 373
Itai, A., 365 Kabsch, W., 462
Iwai, M., 32 Kabutoya, N., 102
Kahng, A.B., 325
Jacobson, M.S., 354 Kahale, N., 353
Jack Dongarra, 154 Kahn, J., 374
Jackowski, B., 376 Kaklamani, C., 586, 587
Jaillet, P., 718 Kalai, E., 101
Jain, A.K., 325 Kalnins, A.A., 374
Jaja, J., 156 Kajitani, Y., 387
Kamensky, V., 327 Kim, J.H., 375
Kanafani, A., 56, 62, 63, 715, 717, Kim, S., 375
718 Kim, S.H., 375
Kanafani, A.K., 718 Kim, W., 256
Kanefsky, 360 Kim, W.H., 393
Kann, V., 323, 374 King, G.H., 541
Kaplan, P.D., 14, 713 King, J.H., 256
Kapoor, S., 30 King, V., 75, 257
Karabati, S., 256 Kinzel, W., 521
Karger, D., 374 Kirkpatrick, D.G., 156, 371, 375,
Kariv, O., 367 Kirkpatrick, D.G., 156, 371, 375,
Karloff, H.J., 357 Kirkpatrick, S., 521
Karol, M.J., 585 Klawe, M.M., 28
Karp, R.M., 374, 541 Klein, C.N., 259
Karypis, G., 156 Klein, P., 322
Karzanov, A.V., 256 Klein, P.N., 32
Katoh, N., 254, 256, 629, 630 Klein, R.S., 257
Kautz Digraph, 585 Kleinberg, J., 586
Kautz, W.H., 586 Kleindorfer, P.R., 253
Kawaguchi, T., 374 Kleinschmidt, P., 518
Kay, D.C., 461 Kleitman, D.J., 99, 365, 367
Kececioglu, J., 75 Klenk, K.S., 29, 30
Keil, J.K., 540 Klimowicz, D., 375
Keil, J.M., 541 Klinowski, J., 519
Keil, M., 623, 630, 632 Klncewicz, J.G., 718
Kelly, D., 358 Klincsek, G.T., 619, 620, 632
Kelly, J.B., 374 Koblenz, B., 359
Kelly, L.M., 374 Koch, H.S., 257
Kempner, Y., 326 Koch, J., 354
Keppler, K.J., 328 Kodialam, M.S., 257
Keren, G., 326 Kohonen, T., 521
Kern, W., 100 Kolte, P., 155, 375
Kernighan, B., 326 Konig, D., 375
Kovacevic, M., 586 Konno, H., 257
Khanna, S., 374 Koopman, B.O., 257
Kheifets, E.M., 374 Kooshesh, A.A., 375
Khuller, S., 75, 374, 375 Korfhage, R.R., 375
Kierstead, H.A., 375, 586 Korman, S., 715
Kilakos, K., 375 Korman, S.M., 375
Korshunov, A.D., 376 Laguna, M., 520
Korst, J., 376 Lahav, S., 371
Kostochka, A.V., 357 Lakshman, T.V., 377
Koutsoupias, E., 541 Lakshmivarahan, S., 541
Kratsch, D., 540 Lance, G.N., 326
Krawczyk, H., 356 Land, A.H., 719
Kriegel, K., 372 Lanfear, T.A., 377
Krishnamurthy, B., 326, 583 Larmore, L., 28
Krishnamurthy, N., 718, 722 Larmore, L.L., 540
Korzhik, V.P., 376 Lavoie, S., 719
Kosaraju, S.R., 323 Lawler, E.L., 377, 719
Kossler, O., 156 Leathem, J.G., 597, 616
Kostochka, A.V., 376 Leathem, J.G., 597, 616
Kouvelis, P., 256, 260 Lebart, L., 326
Kovoor, N., 258 Lechenault, G., 390
Kowalik, J.S., 389 Leclerc, B., 326
Kratochvil, J., 376 Lecrecq, J.P., 359
Krivanek, M., 75 Le Texier, J.Y., 518
Krogh, A., 520 Lee, D.T., 28, 32, 33, 465
Kruskal, J., 76 Lee, H.E., 253, 257
Krznaric, D., 629, 632 Lee, J., 377
Kubale, M., 375, 376 Lee, K.C., 389
Kubat, P., 257 Lehel, J., 370
Kucera, L., 376, 377, 380 Leighton, F.T., 377
Kuhl, J.G., 587 Leimkuhler, J.F., 723
Kuhn, H.W., 464 Leiserson, C.E., 31
Kuhner, M., 75 Lenstra, J.K., 376, 719
Kumar, K.R., 377 Leonardi, S., 583, 584
Kumar, V., 156 Lepolesa, P.M., 364
Kung, C.E., 462 Lerme, C.S., 259
Kuno, T., 257 Leuker, G.S., 386
Kupershtoh, V., 326 Leung, D.S.P., 377
Kuplinsky, J., 370 Leung, J., 377
Kushige, T., 719 Leung Yiu-W, 260
Kusiak, A., 377 Levcopoulos, C., 629, 632
Kusz, E., 376 Levin, A., 719
Levit, V., 326
Labombarda, P., 719 Levitt, P.R., 322
Ladany, S.P., 718, 719 Levin, L.A., 391
Lewandowski, G., 377 Lukaszewicz, J., 324
Li, D., 257 Lund, C., 354, 379
Li, M., 73, 75, 76 Luo, F., 362
Liang, W.F., 377 Luo, S., 719, 720
Liang, Y., 541 Luo, X.H., 380
Libkin, L., 327 Luss, H., 252, 257, 258
Lick, D.R., 377 Lustig, I.J., 720
Lidstrom, N., 522
Liebman, J.S., 465 MacDonald, V.H., 379
Lih, KW., 360 MacGregor, J., 465, 466
Lin, C., 725 Madsen, S., 712
Lin, S., 326, 523 Maffoli, F., 714
Linek, V., 377 Magnanti, T.L., 252, 720
Lingas, A., 619, 621, 628, 632, 633 Mahadev, N.V.R., 361, 379
Linial, N., 374, 378 MahaIel, D., 384
Linke, R.A., 586 Mahidhara, D., 717
Linus, 464 Manacher, G.K, 540
Littlechild, S.C., 101 Manasse, M.S., 541
Litwhiler, D.W., 461, 464 Mandal, R., 387
Littlewood, K., 719 Mansour, Y., 356
Lipton, R.J., 32, 327 Manvel, B., 379
Liou, K.-P., 328 Marathe, M.V., 379
Liu, D.D.F., 378 Marble, G., 380
Liu, R., 381 Marcotte, O., 375
Lloyd, E.L., 359, 619, 628, 633 Margoliash, E., 74
Lofaro, G., 368 Markstein, P., 360
Lofti, V., 378 Marsten, E.L., 717
Lorenzo, R., 717 Marsten, R.E., 720, 723
Loughran, B.P., 719 Maschler, M., 1, 98, 99, 101
Loukakis, E., 378 Massey, J.J.Q., 358
Lovasz, L., 100, 255, 257, 365, 369, Massarotti, A., 358
378, 520 Mas-Colell, A., 101
Love, 464 Matsuda, S., 522
Lu, Z.K., 378 Mathaisel, D.F.X., 716, 720
Lucas, W.F., 101 Matousek, J., 27
Luczak, T., 378, 379 Matula, D.W., 373, 374, 375, 379,
Ludwig, W., 259 380
Luedeman, K.J., 74 Maurer, H.A., 380
Luenberger, D., 522 Mayer, M., 720
McAllister, M., 630 Mirkin, B., 326, 327
McBride, R., 717 Mirzaian, A., 622, 633
McColm, I.J., 464 Mitchell, J.S.B., 28, 32, 103
McCormick, S.T., 256, 380 Mitchell, S., 630
McDiarmid, C., 364, 380 Mitchem, J., 381
McDiarmid, C.J.H., 369, 380 Mitra, P., 31, 32
McElfresh, S., 631 Mladenovic, N., 325
McFaddin, H.S., 28 Modlin, M.J., 716
McGeoch, L.A., 541 Montague, M.H., 622, 631
McGill, S.L., 714 Moran, S., 28
McGuinness, S., 327 More, J.J., 258, 361
McMillan, C., 717 Moret, B.M.E., 375
McMorris, F., 327 Morgenstern, C., 381
McRoberts, J., 585 Morgenstern, 0., 103
McShan, S., 720 Morineau, A., 326
Megiddo, N., 101, 255, 258 Morris, R.F.J.G., 464
Mehlhorn, K., 32 Morrison, S., 102, 720
Mehrotra, A., 381 Motwani, R., 354, 374, 540
Meidanis, J., 328 Moore, J.M., 721
Meijer, H., 621, 633 Moore, G.W., 75
Melzak, Z.A., 156, 462, 464 Mount, D.M., 28
Metropolis, N., 381, 522 Mukhopadhyay, A., 28
Metzger, B.H., 381 Mullat, J., 328
Mevert, P., 720 Muller, H., 364
Michael J. Quinn, 157 Muller, B., 522
Michael K. Molloy, 156 Mulmuley, K., 33
Michalewicz, Z., 522 Muchnik, I., 326, 327, 328
Mine, H., 256 Murota, K, 258
Mihail, M., 587 Myers, B.R., 381
Miller, D.J., 370
Miller, D.M., 381 Nagamochi, H., 10, 99, 102
Miller, G.L., 327 Nahshon, I., 356
Miller, M.H., 464 Naito, T., 252
Miller, R., 720 Nakanishi, Y., 374
Miller, T., 523 Nakano, S., 395
Milligan, G.W., 327 Nakashima, K., 387
Minoux, M., 324, 719 Nakano, H., 374
Mirchandani, P., 721 Nakayama, H., 721
Mireault, P., 720 Namikawa, 258
Naor, J., 540 Opsut, R.J., 382
Narayanan, B., 322 Oppo, G.L., 522
Nei, M., 75 Orchards-Hays, W., 721
Nelson, R., 381 Overmars, M., 31
Nemethy, G., 463, 464 Owen, G., 101, 102
Nemhauser, G.L., 716 Orlin, J.B., 252
Nemhauser, G., 715 Oruc, A.Y., 359
Nemhauser, G.L., 325, 721 Ostergard, P.R.P., 382
Nemhauser, R.E., 717 Ostresh, L., 721
Nesetril, J., 371 Oum, J.I., 714
Nesterov, Y., 522
Netanyahu, N.S., 28 Padberg, M.W., 713, 715, 718, 722
Neubecker, R., 522 Pal, K., 522
Neufeld, G.A., 381, 382 Pang, J.-S., 258
Neumann-Lara, 370 Pankaj, R.K., 587
Newell, G.F., 721 Pallo, J., 75
Newman, E.A., 354 Palmer, R., 520
Newman-Wolfe, R., 393 Panconesi, A., 369, 382, 383
Newsam, G.N., 382 Papadimitriou, C.H., 32, 99, 102,
Neyman, A., 102 155, 258, 328, 373, 383,
Nguyen, Q.C., 258 522,541
Nicoletti, B., 719 Papadopoulou, E., 28
Nielsen, S.S., 258 Paparrizos, K., 518
Nieminen, U.J., 382 Pardalos, P.M., 258, 328, 368, 383,
Nickerson, B.R., 382 518,724
Nishizeki, T., 32, 395, 585 Park, J., 28
Nordhaus, E., 382 Parker, M., 368
Noy, M., 74, 631 Patel, M., 465
Paterson, M., 322
Oakford, R.V., 387 Pattipati, K.R., 259
Odier, E., 719 Patty, B., 712
Odoni, A.R., 721, 724 Paul, J.L., 356
O'Dunlaing, C., 153 Pauling, 464
Ohtera, H., 258 Pavan, A., 587, 588
Okabe, 464 Pawley, G., 524
Okada, K., 586 Payan, C., 383
O'Kelly, M.E., 107, 108, 721 Pearce, Peter, 464
Olariu, S., 382 Pederzoli, G., 519
Ombres, D., 369 Peemoller, J., 383
Peleg, B., 101 Psaltis, D., 518, 523
Pellionisz, A., 518 Pullman, N.J., 358
Peterson, C., 522
Peterson, J.L., 157 Quesne Le, W.J., 75
Petford, A.D., 384 Quillinan, J.D., 723
Pemmaraju, S.V., 621, 628, 632
Pennoti, R.J., 383, 384 Rabani, Y., 583, 587
Penrice, S.G., 375 Radke, J.D., 623, 632
Peretto, P., 522 Rado, R., 371
Perkal, H., 324 Rafaeli, D., 384
Persiano, P., 586 Raghavan, P., 587
Philips, A., 383 Rajcani, P., 384
Pieris, G.R., 587 Rajan, V.T., 618, 633
Pierskalla, W.P., 257 Rakshit, A., 718, 722
Pinter, R.Y., 356 Ralf, K., 367
Pinter, S.S., 384 Ramana, M., 368
Piron, M., 326 Ramaswami, R., 583, 584
Pitel, B., 384 Ramsdell, J.D., 382
Pitsoulis, L., 522 Rand, G.K., 364
Plaisted, D.A., 628, 633 Randall, J., 382
Plassmann, P.E., 368, 374 Rangan, C.P., 540, 542
Platzman, L.K., 255 Rao, S., 32, 586, 587
Pocchiola, M., 33 Rappaport, D., 621, 633
Pollak, H.O., 155, 463 Rastani, K., 377
Pollard, J.K., 390 Rauch, M., 32
Pommerell, C., 384 Ravi, S.S., 379
Posner, E.C., 355 Ravikumar, C.P., 384
Potters, J., 102 Ravinndran, A., 255
Pothen, A., 328 Raychaudhuri, A., 384
Powell, M.J.D., 362 Rauch, M., 32
Powell, W.B., 722 Rechenberg, I., 523
Pradhan, D.K., 587 Reddy, S.M., 587
Prager, R., 520 Reed, B., 380
Prashker, J., 384 Reeves, C.R., 385
Preater, J., 384 Reid, J.K., 362
Preparata, F., 465 Reif, J.H., 20, 29
Preparata, F.P., 30, 33, 157 Reingold, E., 356
Press, W., 522 Reinhardt, J., 522
Prim, R.C., 464 Rendl, F., 328
Renegar, J., 258 Rote, G., 32, 629, 630
Resende, M.G.C., 365 Roth, R.L., 386
Reynolds-Feighan, A.J., 722 Rothblum, U.G., 257
Reynolds, R., 154, 155 Rothstein, M., 46, 716, 722
Reynolds, W.R., 157 Rubin, A., 365
Rhee, C., 541 Rubin, J., 723
Ricciardelli, S., 713 Rubinstein, J.H., 590, 591, 594, 601,
Richards, D., 464 604, 605, 606, 607, 615,
Richards, D.S., 156, 590, 592, 615 616
Richardson, R., 722 Rubinstein, M., 462
Richetta, 0., 722 Rue, RC., 723
Richter, R.J., 722 Rus, T., 386
Rinaldi, G., 722 Rushton, G., 721
Rinnooy Kan, A.G.H., 719 Rutenberg, V., 386
Ritter, G.X., 724 Ryan, J., 368, 386
Rivest, R.L., 31 Rytter, W., 368
Robins, H., 154, 462
Robert Manchek, 154
Roberts, A., 713 Saba, F., 373
Roberts, F., 327 Safra, S., 354, 365, 374
Roberts, F.S., 362, 379, 382, 385 Saito, N., 585
Robertson, N., 385, 386 Saitou, N., 75
Robinson, A.G., 259 Sakai, D., 386
Robinson, C., 523 Sakaki, T., 387
Robinson, D.F., 75 Saks, M., 378
Robson, J.M., 386 Salazar, A., 387
Rodgers, G.P., 383 Salomaa, A., 380
Romanin-Jacur, G., 713 Salowe, J.S, 28
Rompel, J., 356 Saluja, K.K., 366
Rosa, A., 386 Salvi, N.Z., 387
Roschke, S.I., 386 Sanchez, F., 102
Rose, D.J., 386 Sanchezarroyo, A., 380
Rosen, A., 583 Sandblom, C.L., 519
Rosenbluth, A.W., 381 Sanders, D., 386
Rosenbluth, M., 522 Sankoff, D., 76
Rosenfeld, E., 518 Sanniti di Baja, G., 323
Rosenfeld, M., 386 Santaniello, A., 358
Rosenwein, M.B., 257, 718 Sarin, S., 378
Ross, G.J.S., 325 Sasaki, G.H., 587
Sassano, A., 723 Shanno, D.F., 720
Sattath, S., 328 Shanthikumar, J.G., 253, 256, 259
Sawaki, T.H., 714 Shapiro, H., 381
Sawaragi, Y., 721 Sharpley, L.S., 99, 101, 102
Schanz, M., 523 Sharir, M., 28
Scheff, R., 723 Shasha, D., 76
Scheffler, P., 373 Sheldon, B., 353
Schelp, R.H., 360 Shen, X.J., 377
Scheraga, H.A., 463, 464 Shepard, R.N., 328
Schieber, B., 33, 583 Shepardson, F., 720
Schiff, L., 387 Shetty, B., 253
Schiller, D.G., 154 Shier, D.R., 74
Schmeidler, D., 102 Shih, W., 259
Schmidt, G., 387 Shinoda, S., 387
Schonheim, J., 358 Shioura, A., 259
Schorn, Peter, 465 Shlifer, E., 723
Schrijver, A., 100, 253, 255, 369 Shor, P., 28
Schuierer, S., 33 Shreve, W.E., 360
Schuster, P., 519 Shubik, M., 102, 103
Schwarzkopf, O., 31 Shuller, S., 388
Schwarzer, L.V., 328 Shvarzer, L., 327
Schwefel, H.P., 523 Siddiqee, W., 723
Schwiegelsohn, U., 259 Sigismondi, G., 717
Scott, S.H., 371 Sigmund, K., 521
Scott, T.B., 387 Silverman, D., 715
Seery, J.R., 153 Silverman, R., 28
Sellen, J., 30 Simmons, G.J., 388
Sengoku, M., 387 Simon, H., 103, 388
Sensarma, S., 387 Simon H.D., 328
Seth, A., 387 Simpson, R.W., 723
Setubal, J., 328 Sinden, F. W., 388
Sewell, E.C., 388 Sinha, B.P., 388
Seymour, P., 385, 386 Sleator, D., 42, 76
Shachnai, H., 260 Sleator, D.D., 541
Shamir, A., 365 Slusarek, M., 360
Shamir, E., 388 Smale, S., 521
Shamir, R., 256 Smid, M., 28, 29
Shamos, M.I., 33, 157, 465, 619, Smith, 465, 466
628, 633 Smith, B.C., 723
Smith, D.H., 388 Stone, A.W., 722
Smith, D.R., 257, 258 Stone, H.S., 259
Smith, J. MacGregor, 462, 590, 615, Stone, R.E., 258
Smith, S.H., 365 Stoyan, G., 389
Smith, T.F., 76 Stoyan, R., 389
Smith, W., 463 Strohleim, T., 387
Snoeyink, J., 630 Struve, D.L., 723
Soderberg, B., 522 Studier, J.A., 328
Soffa, M.L., 369 Su, S.Y.W., 393
Sohler, C., 631 Subramanian, C.R., 367
Soifer, A., 388 Subramanian, R., 723
Solomon, Y., 715 Subramanian, S., 32
Soneoka, T., 586 Sudan, M., 354, 356, 374, 583
Song, G., 718, 723, 725 Sugai, M., 630
Song, G.D., 156 Suggs, D., 154, 155
Sos, V.T., 353 Sugihara, K., 464
Soudarovich, J., 723 Sullivan, F., 461
Soumis, F., 715 Summer, D.P., 389
Spellucci, P., 523 Sun, L., 389
Spencer, J., 388 Sun, X., 724
Spencer, J.H., 357 Sur, S., 389
Spinrad, J.P., 388 Suri, S., 28, 31, 32, 74
Springer, D., 388 Sussner, P., 724
Srimani, P.K., 388, 389 Suzuki, H., 32, 395, 585
Srinivasan, A., 383, 542 Suzuki, I., 31
Starke, J., 523 Syam, S., 253
Staton, W., 372 Syslo, M.M., 389
Stahl, S., 389 Szegedy, M., 354, 365, 370
Stecke, K., 389 Szekeres, G., 389
Stef Tijs, 101
Steiger, J., 713 Taha, H.A., 724
Steiglitz, K., 258, 328, 522 Tai, K.C., 76
Steinhaus, H., 324 Taillard, E., 520
Stern, T.E., 584 Takefuji, Y., 389
Stewart, L.K., 540, 541 Talluri, K.T., 724
Stiebitz, M., 389 Tamaki, H., 585
Stockmeyer, I., 367 Tamasia, R., 30, 31
Stockmeyer, L., 389 Tamir, A., 59, 103, 258, 259
Stoffers, K.E., 389 Tamura, H., 387
Tamura, S., 390 Thomassen, C., 390
Tan, T.K., 390 Thomason, A., 357
Tan, T.S., 618, 630 Thompson, H.R., 148, 725
Tank, D., 521, 523 Thorup, M., 322
Tang, C.S., 259 Thuering, B., 522
Tanga, R., 712 Thurston, W., 42, 76, 327
Tantawi, A.N., 259 Tijs Bounds, S., 103
Tardos, E., 253, 255, 585, 586 Tijs, S., 102
Tarjan, R., 42, 76 Ting, G.Y., 156
Tarjan, R.E., 31, 32, 255, 327, 386, Tinhofer, G., 355
390, 462, 541, 587 Tirupati, D., 253
Tarsi, M., 353 Titze, B., 725
Tartar, J., 381, 382 Tiwari, P., 259
Taschwer, N., 629 Todhunter, I., 597, 616
Taylor, C.J., 724 Toft, B., 373, 390
Taylor, H., 365 Tomescu, 1., 390
Taylor, W., 360 Tong, S.R., 587
Teather, W., 713 Topp, J., 390
Tejel, J., 631 Toppur, B., 465
Teller, A., 522 Toponogov, 604
Teller, A.H., 381 Toraldo, G., 522
Teller, E., 522 Torczon, L., 358
Teller, E., 381 Tosic, V., 718
Teng, S.-H., 327 Tooze, J., 461
Teodorovic, D., 724 Toth, Fejes, 465
Terab, M., 724 Toulouse, G., 521
Terhalle, W., 324 Tretheway, M.W., 714
Terno, J., 520 Trietsch, D., 157
Tesman, B.A., 390 Trick, M.A., 326, 381
Teukolsky, S., 522 Trofimov, V., 326
Tewinkel, D., 724 Trojanowski, A.E., 390
Thepot, J., 390 Tromp, J., 73,74,75, 76
Thiebaut, D., 259 Trotter, W.T., 375, 391, 586
Thomas, D.A., 590, 591, 594, 601, Troxell, D.S., 391
605, 607, 615, 616 Truszczynski, M., 391
Thomas, D.E., 388 Tsang, J.C., 630
Thomas, G., 359 Tschudi, T., 522
Thomas, N., 462 Tsoucas, P., 253
Thomas, R., 385, 386 Tsouros, C., 378
Tu, H.-Y.T., 30 Viaropulos, K., 715
Tucker, A., 391, 587 Vidal, R.V.V., 712
Tucker, A.C., 391, 632 Vijayaditya, N., 392
Turan, P., 391 Vijayan, G., 388
Turek, J., 259 Viniotis, I., 253
Turner, J.S., 391 Vishkin, U., 33
Tuza, Z., 376, 391 Vishwanathan, S., 392
Tversky, A., 328 Vitanyi, P., 73
Towsley, D., 259 Vizing, V.G., 392
Tzeng, W.G., 541 Voet, D., 466
Voet, J., 466
Uesaka, Y., 523 Voigt, M., 392
Upfal, E., 388 Volkmann, L., 391
Upfal, U., 587 Von Haseler, A., 76
Urahama, K., 523 Von Neumann, J., 103
Urrutia, J., 74, 353, 540 Vranas, P.B., 725
Vaidy Sunderam, 154
Vaidya, P.M., 30, 466 Wagner, K., 76
Van Cutsem, B., 329 Wagon, S., 392
Van den Bout, 523 Waksman, Z., 255
Van Kreveld, M., 31 Walker, J., 254
Van Laarhoven, P., 524 Waller, A., 392
Vandenberghe, L., 329 Wallis, W.D., 392
Vanelli, A., 377 Walther, H., 392
Vardi, Y., 723 Wan, P.J., 587, 588
Vasquez-Marquez, A., 725 Wang, C., 386
Vavasis, S.A., 258 Wang, C.A., 622, 633
Vavasis, S.A., 327 Wang, C.C., 392
Vazirani, U., 378 Wang, D.I., 362, 392
Vazirani, U.V., 541 Wang, H., 725
Vazirani, V.V., 388, 541 Wang, J., 76
Vecchi, M., 156, 521 Wang, K., 725
Vecchi, M.P., 375 Wang, J.F., 395
Vegter, G., 33 Wang, L., 74, 76
Venkatesan, R., 391 Wang, L.X., 395
Ventura, J.A., 259 Wang, W.F., 395
Verleger, P.K., 725 Ward, J.H., 329
Vetterling, W., 522 Warnow, T., 75
Warren, D., 465, 466 Winter, P., 156, 157, 465, 466, 590,
Waterman, M.S., 76 592, 615
Weatherford, L.R., 725 Wiper, D.S., 723
Wei, G., 723, 725 Woeginger, G., 32, 369
Weicheng Jiang, 154 Wolf, J.L., 252, 259, 260
Weiss, J.R., 462 Wolkowicz, H., 328, 383
Weiss, R., 465, 590, 615 Wollmer, R.D., 725
Weishaar, R.S., 384 Wong, R.T., 720
Welsh, D.J.A., 329, 392 Wong, W., 519, 524
Welzl, E., 369 Woo, T.K, 393
Weng, J.F., 156, 463, 589, 590, 594, Woo, T.K., 393
601, 604, 606, 607, 609, Wood, D.C., 393
610, 612, 614, 615, 616 Woodall, D.R., 362, 394
Wessels, J., 376 Woodrow, R.E., 358
Wesolowsky, G.O., 463, 464, 466 Woods, A.R., 394
White, 392 Wormald, J.H., 462
White, A.T., 377 Wormald, N.C., 590, 594, 601, 615
White, S., 253 Wong, C.K, 32, 33
Welzl, E., 631 Wu, A., 28
Weng, D., 462 Wu, F.F., 252
Widmayer, P., 33
Wu, Y.F., 33
Wiggins, S., 524
Xu, Y.F., 621, 623, 630, 631, 633
Wilber, R., 28
Xue, J., 355, 383, 394
Widhelm, W.B., 721
Wilf, H.S., 356, 389, 393 Yamada, S., 258
Wilkov, R.S., 393 Yan, S., 725, 726
Wigderson, A., 393 Yannakakis, M., 102, 373, 379, 383
Williams, C.P., 393 Yang, B.T., 623, 633
Williams, KA., 585, 587 Yang, C.D., 32, 33
Williams, M.R., 393 Yang, D., 726
Williams, W.T., 326 Yao, A.C., 329
Wills, J.M., 465 Yao, D.D., 259, 260
Willshaw, D., 519 Yao, F.F., 621, 633
Willson, G., 524 Yap, C., 153
Wilson, R.J., 365, 366, 372, 381, Yap, C.-K., 30
394 Yap, H.P., 394
Windle, R., 97, 720 Ye, Y., 258
Winston, C., 102, 721 Yeh, S.S.W., 394
Yoeli, P., 633 Zoellner, J.A., 395
You, Z.Y., 623, 633 Zurek, W., 73
Young, A., 519
Young, H., 726
Youngs, J.W.T., 394
Yu, C.-S., 258
Yu, G., 256, 260, 713, 718, 719,
720, 722, 723, 725, 726
Yu, P.L., 260
Yu, P.S., 252, 259
Yu, S., 518
Yuceer, U., 260
Yuille, A., 524
Yum, T., 372
Yum, Tak-S, 260
Zahn, C.T., 329
Zang, W., 103
Zaretsky, K.A., 329
Zaroliagis, C.D., 28
Zaslavsky, T., 394
Zemel, E., 101, 257
Zeng, D., 102
Zenios, S.A., 258
Zeng, X., 103
Zerovnik, J., 394, 395
Zhang, G.H., 392
Zhang, K., 74, 76
Zhang, L., 73, 74, 75, 76
Zhang, L.X., 75
Zhang, Z.F., 395
Zheng, Q., 541
Zhou, B., 353
Zhou, D., 633
Zhou, X., 395
Zhu, A.X., 395
Zipkin, P.H., 260
Zobel, A., 395
Zubrzycki, S., 324
Subject Index
β-skeleton, 623 characteristic polygon, 607
oo-junction, 430 centroid spanning tree, 429
o2-junction, 430 circumscribing triangle, 403
o3-junction, 430 clump cluster, 277
o4-junction, 430 cluster, 261, 274
π-cluster, 277 clustering, 275
clustering approaches, 276
A/D converter, 488 clustering criteria, 275
additive cluster, 292 coalition game,
additive clustering, 293 cohesive clustering, 295
additive partitioning, 305 cohesive clustering criteria, 296
additive partition clustering, 305 coloring, 331
aggregable data, 265 column-conditional data, 265
agglomerative clustering, 301 column-conditional table, 267
amino acids, 444 combinatorial optimization game,
amide plane, 446 86
airline industry, 635 comparable data, 265
air traffic flow control, 687 compression theorem, 604
algebraic tree model, 215 computational geometry, 409
alternating square-error clustering, conceptual clustering, 302
301 constraints, 168
approximation cluster, 289 convex, 79
approximation criteria, 298 core, 81
artificial neural networks, 485 correlate partition, 296
assignment, 91, 480 coupled selection equations, 490,
autocatalytic macromolecules, 492 509
average linkage, 278 covering, 91
crew scheduling, 678
basins of attraction, 499
Benard problem, 491 data-sets, 503
between-item distances, 267 decision variable, 502
bicriteria, 9 decomposition algorithm, 187
bipartite clustering structure, 275 definition-based cluster, 276
Boolean data, 265 Delaunay tetrahedization, 428
branch and bound, 508 Delaunay triangulation, 618
box clustering, 316 diagonalize symmetric matrices, 490
diagonal flip distance, 69 frequency assignment, 341
digital image processing, 285 full topology, 601
dissimilarity, 267
dissimilarity matrix, 268 gateway, 5
distance, 36 gateway paradigm, 5
distance data, 283 gateway characterization, 10
dominating set, 525 gateway region, 11
double bracket flows, 490 general linear constraints, 170
double chain silicate, 431 genetic algorithm, 484
double wedge theorem, 117 geodesic minimum spanning tree,
dual forest algorithm, 475, 508 406
dual maximization problem, 476 geodesic Steiner minimal tree, 407
dual construction, 397 geometric shortest path query, 3
dual formulation, 412 global optimal solution, 502
dual problem, 414 graph partitioning, 308
dual Simpson line construction, 415 great circle, 407
dynamic programming, 621 greedy algorithm, 282
dynamical system, 477 greedy heuristic, 439
elastic net method, 489 Hamming distance, 36
heuristic, 403
energy configuration, 442
heuristic search, 409
energy function, 486
entity-to-center scalar product, 279 holistic linkage, 279
Hopfield model, 486
equilibrium, 442
Hopfield networks, 485
equilateral tetrahedra, 421
Hungarian method, 475
Euclidean minimum spanning tree,.
402 incremental algorithm, 172, 184
Euclidean Steiner minimal tree, 402 irregular operations control, 697
evolutionary strategies, 484 irregular tetrahedra, 417
evolutionary tree, 37 intersection of triangulations, 622
exact shortest path query, 6 Ising model, 485
extended greedy procedure, 284
Kohonen networks, 489
fleet assignment, 664
flight planning, 664 Lagrange multipliers, 418
flow index, 272 lattice solution, 403
flow table, 319 least deviation method, 276
fluid dynamics, 491 least-maximum-deviation criterion,
four way 'R.-sausage networks, 440 266
least-moduli, 276 minimum weight triangulation, 617,
least-squares, 276 619, 621
least-squares approximation, 266 minimum variance resource alloca-
least-squares standardization, 266 tion, 249
length query, 4 modeling skeletons, 285
linear assignment, 475 modulo formulation, 487
linear-cost subtree-transfer distance, monotone linkage, 283
57 monotone linkage clustering, 283
linear programming, 490 Monte-Carlo method, 482
linkage-based convex criteria, 287 multiple resource allocation, 240
list coloring, 351 multiperiod resource allocation, 245
local optimal solution, 433 multidimensional assignment, 477
local search, 295, 300 NE-core, 97
network design, 637
many way n-sausage networks, 440 nested constraints, 168
marriage theorem, 624 network constraints, 168
matching, 624 neural networks, 484
matching among triangulations, 625 nonlinear equations, 417
matroid, 282 nonlinear programming, 402
maximin problem, 194 normal vector, 593
max-min angle triangulation, 618 NP-hard, 401
maximum density subgraph prob-
lem, 281 on-line algorithms, 527
Maxwell's theorem, 445 optimal cluster, 280
M-convex, 221 overlapping clusters, 309
Melzak circle, 411 packing, 91
metropolis algorithm, 482 parameter setting, 507
metric space, 589 partition, 91
metropolis algorithm, 482 partitioning, 295
microbiology, 444 pattern formation, 493
min-max angle triangulation, 618 pattern recognition, 485
min-max aspect ratio triangulation, PC-core, 96
618 penalty terms, 495
min-max length triangulation, 618 penalty methods, 478, 508
minimal distance rule, 299 penalty terms, 481, 495
minimax problem, 194 permutation graph, 529
minimum spanning tree, 281 permutation matrix, 494
minimum energy configuration, 442 performance ratio, 527, 533
phylogenies, 37 separable convex, 167
physical dynamical systems, 485 sequential quadratic programming,
planar spanner, 17 420
planar subdivision, 19 set coloring, 352
polygon, 3 similarity matrices, 267
potential function, 494 similarity measure, 269
primal problem, 413 simple polygon, 3
primal feasibility, 417 Simpson line, 411
primal objective function, 417 simulated annealing, 346, 483, 509
principal cluster, 289 single chain silicate, 431
printed circuit board testing, 342 single cluster, 276
product planning, 234 single cluster clustering, 276
protein modeling, 444 single linkage, 278
protein objective function, 449 single linkage clustering, 281
proximity theorem, 198, 205 single-source path query, 3
pseudo-random value, 505 spanner, 16
spanning trees, 590
quadratic optimization, 416 spatial data, 265
query, 3 spin glass theory, 485
rectilinear, 8 stability, 494
reduced visibility graph approach, stable points, 502
8 staircase separator, 20
reduced visibility graph, 9 statistical approaches, 482
register allocation, 340 statistically independent, 272
reliability, 232 Steiner construction, 405
resource allocation, 159, 166 Steiner hull, 118
ribbon, 430 Steiner minimal tree, 401
ribbon decomposition problem, 429 Steiner points, 10, 409, 597
ribbon junction, 430 Steiner properties, 402
roll pattern, 491 Steiner ratio, 402, 606
rotation distance, 65 Steiner topology, 592
n-sausage, 423 Steiner tree, 401, 549, 589, 612
strong cluster, 277
sausage, 423 structured partitioning, 306
scheduling, 340 structured partition, 275
screw symmetry, 426 submodular, 165
selection equations, 491 submodular constraints, 169
self-organizing maps, 489 submodular system, 165
separator, 20 submodular polyhedron, 166
subtree-transfer distance, 38 weighted obstacles, 9
summary linkage, 278 window linkage, 279
Winter's Algorithm, 122
tabu search, 346
tangent vector, 593 x-ray crystallography, 445
T-coloring, 351
tetrahedron, 414 yield management, 652
three-dimensional assignment, 497,
501, 513
three way 'R.-sausage networks, 440
threshold linkage, 278
timetabling, 340
totally balanced game, 88
traveling salesman problem, 485
tree constraints, 168
tree metric, 314
triangle, 403
triangulation, 618
t-spanner, 16
two-dimensional assignment, 493,
498,511
two way 'R.-sausage networks, 440
uniform partitioning, 303
unit sphere, 406
ultrametric, 315
vector, 593
Voronoi diagram, 403, 618
Voronoi edges, 404
Voronoi points, 404
Voronoi polyhedra, 428
visibility, 6
visibility complex, 7
visibility graph, 7
visibility polygon, 6
visibility-sensitive, 6
weak cluster, 277
weighted, 9
Author Index
Aarts, E., 321, 449 Amoura, A.K., 130
Aarts, E.H.L., 129, 166, 321, 332, Ananth, G., 449
756 Anderson, E.J., 130, 155
Abdul-Razaq, T.S., 3, 129 Anderson, R.J., 613
Abramson, B., 532 Applegate, D., 130
Abramson, D., 449 Armour, G.C., 323
Acampora S. Anthony, 234 Armstrong, RD., 141, 181
Acampora, A.S., 236, 237 Arnborg, S., 392
Achugbue, J.O., 129 Arora, S., 322
Ackley, D., 750 Arvind, K., 392
Adam, T.L., 532 Arvindam, S., 449
Adams, J., 129 Asano, T., 392
Adams, W.P., 321, 322 Aspnes, J., 130
Adenso-Dias, B., 129 Assad, A.A., 322
Adiri, I., 129, 146, 154 Atallah, M.J., 392
Adolphson, D., 129 Attaway, J.D., 234
Aggarwal, A., 613 Aude, J., 452
Ahmad, I., 537 Avidor, A., 131
Ahmadi, RH., 129, 130 Awerbuch, B., 131
Ahmadian, A., 130 Azar, Y., 130, 131
Ahmadvand Nima, 234
Ahn, S., 161 Babai, L., 614
Ahrens, W., 532 Back, T., 750
Ahuja, R., 333 Bagchi, D., 129, 130, 131
Ahuja, R.K., 322 Bai, D., 753
Aizikowitz, N., 129 Baker, J.R, 31, 34, 132
Ajtai, M., 613 Baker, KR., 131, 161, 166
Akers, S.B., 130 Baker, K.R., 131, 161, 166
Aksjonov, V.A., 130 Balas, E., 129, 131, 332
Al-Harkan, I., 756 Bampis, E., 130
Albers, S., 130 Bandelt, H.J., 393, 756
Ali, H.H., 130, 533 Banerjee Subrata, 234
Alidaee, B., 19, 130, 753 Bange, D.W., 393
Alizadeh, F., 1, 18 Bannister, J.A., 234
Alon, N., 130, 613 Baracco, P., 138
Ammar, M.H., 238 Barany, I., 131, 132

759
760 A uthor Index

Barbosa, V., 449 Bollobas, B., 323, 564, 614


Barkauskas, A.E., 393 Bonniger, T., 323
Barkovic Oliver, 239 Bonuccelli, M.A., 393
Barnes, J.W., 132 Booth, K.S., 393
Barroso, A., 455 Borchers, A., 235
Bartal, Y., 132 Borella S. Michael, 235
Bartusch, M., 132 Brackett, C.A., 235
Barvinok, A.I., 322 Brandstadt, A., 394, 398, 403
Batcher, K.E., 614 Brasel, H., 133
Battiti, R., 322, 449, 751 Bratley, P., 133
Bauer, J., 337 Brebner, G.J., 619
Bazaraa, M.S., 322, 323 Brecht, T., 140
Beame, P., 614 Brennan, J.J., 132
Beckmann, M.J., 331 Breu, H., 392, 394
Behrendt, H., Broeckx, F., 331
Bellare, M., 2, 18 Broin, M.W., 394
Bellman, R.E., 393 Brooks, R.L., 564
Belouadah, H., 132 Brucker, P., 133, 134, 153, 349
Belov, I.S., 132 Bruengger, A., 323
Benavent, E., 326 Bruno, J.L., 134
Benson, S., 18 Bryant, RE., 532
Berman, P., 132 Buer, H., 134
Bernhard, P.J., 393 Buffa, E.S., 323
Bernhardsson, B., 532 Bulfin, R.L., 99, 136, 154, 167, 550
Bertolazzi, P., 393 Burkard, RE., 323, 324, 325, 327
Bertossi, A.A., 393 Burlet, M., 394
Bertrand, J.W., 31, 131 Burton, P.G., 155
Bertsimas, D., 18
Beyer, D., 751 Cai, X., 76, 134
Beyer, T.A., 393 Campbell, H.G., 134
Bhattacharjee, G.P., 615 Campos, V., 754
Bhattacharya, S., 234, 238 Cao, F., 11, 235
Bianco, A., 238 Carlier, J., 79, 134, 135
Bianco, L., 132 Carraresi, P., 325
Biggs, N.L., 393 Cela, E., 323, 324, 325, 327, 331
Birkhoff, G., 234, 323 Cellary, W., 132
Bishop, A.B., 161 Cerny, V., 325, 449, 532
Blazewicz, J., 132, 133 Chakrabarti, S., 135
Blelloch, G.E., 614 Chakrapani, J., 325
Chambers, J.B., 132 Cockayne, E.J., 396, 403
Chan, L.M.A., 135 Coffman Jr., E.G., 134, 137, 156,
Chandy, KM., 532 157
Chang, G.J., 395, 400, 402, 405 Coffman, E., 532
Chang, M.S., 392, 395, 403 Cohen, E., 614
Chang, S., 135 Cohoon, J., 450
Chang, Y.-L., 131 Colbourn, C.J., 396
Charikar, M., 132 Cole, R., 614
Charlton, J.M., 135 Colin, J.-Y., 137
Charney, H.R, 532 Colorni, A., 326
Chekuri, C., 135 Congram, R.K., 137
Chen Ming, 235 Connolly, D. T., 326
Chen, B., 135 Conrad, K., 326
Chen, B.-L., 565 Consiglio, A., 751
Chen, C.L., 136 Conway, R.W., 137
Chen, D., 539 Cook, W., 130
Chen, T.C.E., 136 Coorg, S.R, 395
Chen, Z.L., 136 Coppersmith, D., 614
Cheng, C.K., 103, 540, 541 Cormen, T., 450
Cheng, T.C.E., 153, 154 Corneil, D.G., 396, 397
Chepoi, V.D., 394 Corniel, D., 618
Cheston, G.A., 396 Correa, R., 450
Cheung, K.W., 235, 238 Crainic, T., 450
Chiarulli, D., 235, 237 Crainic, T., 450
Chin, F.Y., 129 Crauwels, H.A.J., 118, 137
Chinn, D.D., 140 Creutz, M., 450
Chipalkatti, R., 236 Crouse, J., 334, 454
Chipperfield, A., 450 Csanky, L., 614
Cho, Y., 136, 161, 461 Cung, V.-D., 751
Chowdhury, I.G., 140, 157 Cusworth, S.D., 236, 238
Chretienne, P., 136, 137 Cyganski, D., 326
Chretienne, P., 325 Cypher, RE., 614
Christofides, N., 325, 326
Chu, C., 136, 137 D'atri, A., 397
Chudak, F., 137 Dall, S.K., 402
Chung, Y.C., 532 Damaschke, P., 397, 402
Clausen, J., 323, 326, 327, 331, Dankesreiter, M., 539
450 Dannenbring, D.G., 137
Clementi, A., 614 Dauzere-Peres, S., 122, 137, 138
Davis, E.W., 138 Du, B., 534, 535
Davis, J.S., 138 Du, D.H., 234
Davis, L., 326, 450, 715 Du, D.H.C., 235, 236, 238, 239
De Jaenisch, C.F., 397 Du, J., 139, 140
De Werra, D., 133, 329, 565, 566, Dudek, R.A., 134
756 Duksu Oh, A., 237
De, P., 138 Dunbar, J.E., 398
Death, C.C., 135 Dyer, M.E., 140, 327
Deineko, V.G., 326
Dejong, K.A., 756 Eastman, W.L., 140
Dell'Amico, M., 138 Ecker, K.H., 133
Della Croce, F., 131, 138 Eckstein, J., 451
Demeulemeester, E., 138 Edmonds, J., 140
Demeulemeester, E.L., 146 Edwards, C.S., 327
Demidenko, V.M., 323 Eiben, A.E., 751
Deng, X., 140 Eilon, S., 140, 157
Deogun, J.S., 139 El-Horbaty, S., 328
Derigs, U., 324 El-Rewini, H., 130, 532, 533
Dessouky, M.I., 139, 151, 167 Elmaghraby, S.E., 140
Devor, R.E., 151 Elrod, H., 754
Dhall, S.K., 404 Elshafei, A.N., 327
Di Tullio, R., 138 Emmons, H., 140
Dickey, J.W., 326 Enscore Jr., E.E., 157
Dickson, J.R., 532 Epstein, L., 140
Dileepan, P., 139 Erdos, P., 140, 615
Ding-Zhu Du, 532 Ersoy, C., 236
Dileepan, P., 140 Eschelman, L.J., 751
Dobson, G., 139 Esperflen, T., 327, 331
Dorigo, M., 326, 327, 328 Euler, L., 398
Dorndorf, U., 139 Even, S., 140, 398, 403
Dowd W. Patrick, 236
Downey, P.J., 134 Faber, M., 405
Dowsland, K., 450 Faigle, U., 140
Drabowski, M., 1343 Falkowski, B.-J., 533
Dragan, F.F., 394, 397, 398 Farber, M., 395, 398
Dreyfus, S.E., 393, 398 Farley, A.M., 398
Drezner, Z., 335 Federgruen, A., 140, 164
Drobouchevitch, I.G., 139 Feige, U., 140
Drozdowski, M., 139 Feldmann, A., 140, 141
Fellows, M.R., 398 Fujii, M., 142
Felten, E.W., 755 Fulkerson, D.R, 533, 615
Feo, T., 751, 754
Feo, T.A., 327 Gajski, D.D., 541
Ferland, J., 327 Galambos, G., 142
Fernandez, S., 141 Galil, Z., 615
Ferreira, A., 450, 451 Gambardella, L.M., 328
Fettes, W., 539 Ganz, A., 236
Fiala, T., 131, 132 Gao, Y., 236
Fiat, A., 130, 132, 141 Garcia, B., 451
Fiduccia, C.M., 533 Garey, M.R, 137, 142, 186, 189,
Fiechter, C.-N., 451 Garey, M.R., 137, 142, 186, 189,
Fincke, D., 324 Gasteiger, J., 337
Finke, G., 327 Gavett, J.W., 328
Finn, G., 141 Gavril, F., 399, 405
Fisher, H., 141 Gelat, C.D., 537
Fisher, M.L., 141 Gelatt, C.D., 331
Fleming, P., 450 Gelders, L., 142, 541
Fleurent, C., 327 Gelders, L.F., 167
Fleurent, C., 751 Gellat, C., 452
Florian, M., 133, 155, 377 Gendreau, M., 141, 450
Ford, L.R., 533, 615 Gens, G.V., 143, 196, 197
Fotedar Shivi, 237 Geoffrion, A.M., 328
Franca, P.M., 141 Georganas D. Nicolas, 235, 237,
Frankle, J., 533 Geoffrion, A.M., 328
Fratta, L., 234 Gerasoulis, A., 541
Frazer, W.D., 615 Gerla Mario, 236, 237
Frederickson, G.M., 141, 175 Gerla, M., 234
French, S., 141, 161 Gerrard, M., 326
Frenk, J.B.G., 328 Ghosh, R.K., 615
Freville, A., 752 Ghosh, RK., 615
Fricke, G.H., 396 Gibbons, L.E., 18
Friedman, J., 130 Giffier, B., 142
Friedrich, J., 337 Gil, J., 615
Friesen, D.K., 141 Gilmore, P., 143
Frieze, A., 18, 322 Gilmore, P.C., 328
Frieze, A.M., 327, 328 Glass, C.A., 130 , 143
Fry, T.D., 141, 181 Glass, C.A., 130, 143
Fu, H.L., 565 752, 753, 754, 755, 756
Goddard, W., 398 Gupta, M.C., 145
Goemans, M.X., 8, 18, 143, 206, Gupta, RP., 565
329 Gupta, Y.P., 145
Goldberg, D., 451 Gusfield, D., 145
Goldberg, D.E., 329 Guy, RK., 564
Goldberg, L.A., 143
Golumbic, M.C., 399 Hadley, S.W., 329
Gomory, RE., 143 Hahn, P., 329
Gonzalez, T., 134, 143, 144, 336 Hahn, W., 324
Gonzalez-Velarde, J.L., 754 Hajnal, A., 565
Goodman, S., 403 Hall, L.A., 145, 168
Goodman, S.E., 396 Hall, N., 329
Gopalan, S., 19, 130 Hall, N.G., 145
Gordon, V.S., 144, 165, 757 Ham, I., 157
Gorges-Schleuter, M., 451, 755 Hammer, P.H., 399
Gori, A., 393 Han, C.G., 538
Goyal, S.K., 144 Hanen, C., 156
Grabowski, J., 144, 169, 574 Hansen, P., 565
Grah M. Adrian, 239 Hardy, G.G., 330
Graham, A., 329 Hare, E.O., 399
Graham, R.L., 144, 145 Hariri, A.M.A., 137, 146, 244
Graham, R.L., 533 Harker, P., 754
Grama, A., 452 Hartnell, B.L., 396
Grant, T., 329 Hastad, J., 614
Gravenstreter, G., 235, 237 Hattingh, J.H., 399, 400
Graves, G.W., 328 Haupt, R., 146
Green, G.R., 236 Hauser, R., 452
Greenberg, H., 329, 752, 753 Haven, G.N., 538
Greenlaw, R, 615 Haynes, T. W., 400
Grefenstette, J., 455 Hearn, D.W., 7, 18
Grinstead, D.L., 399 Hedetniemi, S.M., 400, 402
Groenevelt, H., 140, 164 Hedetniemi, S.T., 393, 396, 398,
Groes, G.A., 533 399, 400, 402, 403
Grosso, A., 138 Hedge, S., 450
Grove, E., 131 Hefetz, N., 146
Gu, J., 404, 533, 534, 535, 536, Heffiey, D.R, 330
537, 538, 539, 540, 541 Heider, C.H., 330
Gupta, A., 452 Heidorn, G.E., 138
Gupta, J.N.D., 143, 165 Henderson, T.C., 535
Henning, M.A., 398, 399, 400 Hurkens, C.A.J., 168
Herroelen, W., 138 Hwang, G., 238
Herroelen, W.S., 146 Hwang, S.F., 400
Hertz, A., 168, 565, 756 Hyeong-Ah Choi, 237
Hightower, W.L., 615
Hilton, A.J.W., 565 Ibaraki, T., 150, 452, 756
Hirose, T., 453, 454 Ibarra, O.H., 144, 148, 279, 400
Ho, P.H., 402 Ignall, I., 148
Hoare, C.A.R., 615 Inukai Thomas, 236
Hochbaum, D.S., 146 Irwin, J.D., 168, 558
Hoffman, A.J., 400 Isaacs, I.M., 140
Hoffman, E.J., 536
Hoffmeister, F., 750 Ja Ja, J., 615
Holland, J., 452, 536 Jackson, J.R., 148, 149, 281
Holland, J.H., 330, 753 Jacobs, D.P., 393, 396, 400
Holmes, N.D., 399 Jacobson, M.S., 401
Holsenback, J.E., 147, 161, 253 Jaffe, J.M., 138, 149
Homqvist, K., 452 Jacobson, M.S., 401
Hong, I., 753 Janiak, A., 144
Hong, K.S., 147 Janikow, C., 755
Hongsik Choi, 237 Janoska, M., 237
Hoogeveen, J.A., 147, 148, 163, 166, Jansen, B., 330
168, 256, 262, 270, 547 Jayaram, T.S., 401
Hoover, M.N., 398 Jensen, T.R, 566
Hopcroft, J.E., 236 Jerrum, M., 18
Hopkins, J.W., 326 Jog, P., 452
Horn, W.A., 148, 273, 274 Johnson Lewis, W., 536
Horowitz, E., 141, 148, 536, 615 Johnson, D.B., 144
Horvath, E.C., 148, 276 Johnson, D.S., 137, 142, 186, 189,
Host, L.H., 393 191, 328, 330, 399, 401,
Howorka, E., 399 533, 536, 753
Hsing, F.H., 395 Johnson, J.H., 393
Hsu, D.F., 235 Johnson, N., 19
Hsu, W., 400 Johnson, RR., 536
Hsu, W.L., 405 Johnson, S.M., 149
Hu, T.C., 129, 148, 277, 536 Johnson, T.A., 321, 330
Huang, X., 535, 536 Johnston, M.D., 536, 537, 538
Hunt III, H.B., 402 Jordan, H., 238
Hurink, J., 133 Jornsten, A.K., 755
Ju.D. Neumytov, 157 Kellerer, H., 150
Jung, H., 149 Kelly, J., 754
Junger, M., 330 Kelly, J.P., 753, 754
Jurisch, B., 133, 134, 149 Kenyon, C., 130
Jurisch, M., 133 Kern, W., 140
Kernighan, B., 331
Kahlbacher, H.G., 149, 290 Kernighan, B.W., 536, 537, 539
Kahng, A.B., 753 Khan, A.A., 536
Kaibel, V., 330 Kikuno, T., 401
Kailath, T., 167 Kim, C.E., 148
Kaklamanis Kim, S.J., 153
Kakuda, Y., 401 Kim, Y.-D., 150, 168
Kaminsky, P., 135 Kinacaid, RK., 155
Kamoun, H., 149 Kindermann, J., 454
Kanal, L., 452, 454 Kirca, 0., 322
Kanet, J.J., 127, 138, 149 Kirkpatrick, D.G., 392, 394
Kannan, B., 236, 237 Kirkpatrick, S., 331, 452, 537
Kannelakis, P.C., 158 Kirousis, L.M., 149
Kao, M., 131 Kise, H., 150
Kao, M.-Y., 140 Klarger, D.R, 616
Kaplan, H., 322 Klein, P., 614
Karger, D.R, 149 Klein, P.N., 616
Karisch, S.E., 324, 326, 327, 330, Kleinau, V., 150
331, 337 Kleinberg, J., 143
Karloff, H., 132 Kleindorfer, P.R., 142
Karp, R, 452 Klincewicz, J. G., 331
Karp, R.M., 149, 236, 294, 331, Klinz, B., 323, 324
533, 616 Knopman, J., 452
Karpinski, M., 132 Ko, M.T., 565
Karwan, M.H., 753 Kochenberger, G., 753
Karypis, G., 452 Kohler, E., 401
Kasami, T., 142 Kohler, W.H., 150
Kaufman, M.T., 149 Kolen, A., 401
Kaufmann, L., 331 Kolen, W.J., 400
Kavehrad, M., 239, 616 Komlos, J., 613
Kavvadias, D., 616 Konagaya, A., 454
Kawaguchi, T., 147, 149 Konig, J.-C., 156
Keil, J.M., 396, 401 Koopmans, T.C., 331
Keil, M., 398 Korst, J., 321
Korst, J., 449 Lakshmivarahan, S., 402, 404
Kosaraju, S.R, 392 Lam, C.W.H., 402
Kotz, S., 19 Lam, S., 148, 151, 276
Koulamas, C., 150, 411 Land, A.M., 332
Koulamas, C.P., 158 Langston, M.A., 141
Kovacevic Milan, 237 Laporte, G., 141, 322
Kovalyov, M.Y., 150, 305 Larson, RE., 151
Kraay, D., 754 Laskar, R., 396, 403
Kramer, A., 134, 150 Laskar, R.C., 399, 400, 402
Kramer, O., 755 Lasserre, J.B., 137
Krarup, J., 331 Laursen, P., 453
Kratsch, D., 394, 396, 401, 402 Laursen, P.S., 453
Kravchenko, S.A., 134, 150 Law, A.M., 398
Krawczyk, H., 150 Lawler, E.L., 144, 145, 151, 152,
Krishnamurthy, B., 537 153, 167, 315, 331, 322,
Krishnan, P., 131 327, 330, 332, 335, 338,
Krizanc, D., 615 402, 533, 537
Krone, M.J., 150 Lawler, E.R., 131
Kruatrachue, B., 537 Lawrence, S., 153
Kubale, M., 150, 151 Lee, C.-Y., 134, 153
Kubiak, W., 145, 149, 150, 151, Lee, E.A., 539
311 Lee, F.Y., 392
Kucera, L., 614 Lee, KY., 238
Kucera, L., 616 Lee, RC.T., 405
Kumar, A., 145 Leighton, T., 153
Kumar, V., 449, 452, 453, 454, 537 Leighton, T., 617
Kunde, M., 151, 616 Leiserson, C., 450
Kung, H.T., 619 Leiserson, C.E., 614
Kuplinsky, J., 565 Lempel, A., 398, 403
Kwok, Y.K., 535 Lenstra, J.K., 129, 131, 133, 139,
Lenstra, J.K, 129, 131, 133, 139,
Labetoulle, J., 151, 152, 315, 333 141, 145, 147, 151, 152,
Labourdette, J.P., 237 153, 154, 160, 166, 167,
Lageweg, B.J., 139, 141, 151, 160, 315, 321, 332, 335, 348,
167 533,537
Lagrangean, A., 333 Lenver, 196, 197
Laguna, M., 329, 451, 753, 754 Leonardi, E., 238
Lai, T., 453 Leonardi, S., 132
Laird, P., 537, 538 Leontief, W., 332
Lerchs, H., 396 Lokketangen, 755
Leung, J.Y.-T., 139, 140, 147, 152, Loukakis, E., 402
153 Lourenco, H.R., 755
Leuze, M., 455 Lovasz, L., 19, 140, 332, 617
Lev, V., 154 Love, R.F., 538
Levesque, H., 539 Lowe, T.J., 394
Levin, F.L., 614 Lu, Q., 136
Levitan, S., 235 Lu, T.L., 402
Levner, E.V., 142 Lubiw, A., 402
Lewis, H., 141 Luby, M., 617
Lewis, P.M., 538 Luby, M.G., 152
Lewis, T.G., 532, 533 Luling, R., 453
Lewis, T.G., 537 Lushchakova, LN., 154
Li, B., 236
Li, C.-L., 134, 154 Ma, K.L., 402
Li, G., 453 Ma, M., 453
Li, H., 534 Ma, R., 453
Li, M., 537 Ma, Y., 617
Li, R., 154 MacGillivray, G., 396
Li, Y., 332, 334, 335 MacLeod, K., 141
Liang, Y.D., 392, 402, 404 Madhukar, K., 392
Liaw, H.-T., 537 Maffray, F., 399
Lih, K.-W., 565, 566 Maggs, B., 153
Lin, B.M.T., 153 Maggs, B.M., 614
Lin, C.-S., 537 Magirou, V.F., 333
Lin, C.-Y., 566 Magnanti, T., 333
Lin, S., 331, 536, 537 Malucelli, F., 325, 333
Linhares, A., 453 Manacher, G.K., 392, 402
Littlewood, J.E., 330 Manderick, B., 453, 455
Liu, C.L., 154 Maniezzo, V., 326
Liu, G., 238 Mankus, T.A., 402
Liu, J.J., 165 Manner, R., 452
Liu, J.S.W., 154 Manoussakis, Y., 130
Liu, Y.C., 395 Mans, B., 453
Liu, Z., 168 Mao, W., 155, 365
Livingston, M., 402 Marathe, M.V., 402
Lloyd, E.K., 393 Marchetti-Spaccamela, A., 132
Lloyd, E.L., 154 Margalef, R., 537
Loessi, J.C., 536 Marsan, M, 238
Marshall, A.W., 566 Migdalas, A., 452
Martel, C.D., 153, 155, 366 Milis, I.Z., 333
Martello, S., 138, 155 Miller, D., 454, 455
Marti, R., 754 Miller, L.W., 137
Martin, O., 755 Mine, H., 150
Martin, P., 155 Minton, S., 537, 538
Martin, W., 450 Mirchandani, P.B., 333
Maruyama, T., 453 Mirsky, L., 333
Marzetta, A., 323, 333 Mitchell, D., 539
Mason, A.J., 155 Mitchell, S.L., 403
Mathias, K., 456, 757 Minton, S., 537, 538
Matias, Y., 615 Mittenthal, J., 156
Matsuo, H., 87, 135, 155 Mohring, RH., 132, 134, 156
Mattheyses, R.M., 533 Mohsen Kavehrad, 237
Mautor, T., 333, 751 Monien, B., 453
Mavridou, T., 333, 454 Mohring, R.H., 132, 134, 156
Maxemchuk, N.F., 238 Moon, B.-R, 753
Maxwell, W.L., 137 Moore, J.M., 153, 156
Mazzola, J.B., 322 Moore, RC., 536
McConnell, R.M., 402 Morgenstern, I., 539
McCormick, E.J., 332 Morowitz, H., 538
McCormick, S.T., 155 Morrison, J.F., 156
McDiarmid, C.J.H., 327 Moscarini, M., 397, 403
McGeoch, L.A., 536 Mosevich, J., 333
McKellar, A.C., 615 Motwani, R, 135, 156
McMahon, G.B., 132, 377 Muhlenbein, H., 454
McMahon, G.B., 155 Muhlenbein, H., 755
McNaughton, R., 156, 378 Mukherjee Biswanath, 234, 235
McRae, A.A., 398 Mukherjee, B., 238
Melhem, R., 235, 237, 238 Mukhopadhyay, S.K., 165, 517
Meo, M., 238 Mulder, H.M., 393
Mercure, H., 332 Muller, F.M., 141
Metelski, N.N., 323 Muller, H., 397, 403
Metropolis, N., 333 Muller, J.H., 156
Metropolis, N., 454 Muller-Merbach, H., 334
Meyer, W., 566 Mulmuley, K, 617
Micali, S., 617 Mulvey, J., 755
Michalewicz, Z., 755 Mulvey, J.M., 753
Michelon, P., 751 Munier, A., 156
Muntz, R.R., 156, 157, 392, 393 Osman, I.H., 158
Muriel, A., 135 Otto, S.W., 755
Murthy, K.A., 334
Murty, K.G., 334 Padberg, M.W., 334
Mynhardt, C.M., 396 Palem, K., 617
Palnati, P., 236
Nakano, R., 157, 168, 757 Palubeckis, G.S., 334
Naor, J., 131 Pamwar, S.P., 236
Nassimi, D., 617 Pan, V., 615, 617
Natarajan, B., 135 Pandu Rangan, C., 392, 393, 395,
Natarajan, K.S., 403 401, 404
Nau, D., 454 Pantziou, G., 617
Nawaz, M., 157 Pantziou, G.E., 616
Nemeti, E., 157 Panwalkar, S.S., 158, 411
Nemhauser, G., 454 Papadimitriou, C., 454
Nemhauser, G.L., 395, 403 Papadimitriou, C.H., 158, 330, 334,
Nemirovskii, S., 13, 18 413 .
Neri, F., 238 Pardalos, P., 7, 334, 449, 451, 452,
Nigam, M., 617 Pandu Rangan, C., 392, 393, 395,
Nikoletseas, S., 617 Pardalos, P.M., 18, 332, 333, 334,
Ninomiya, K., 142 335, 618
Noga, J., 140 Parekh, A.K., 403
Nonobe, K., 756 Park, J.C., 538
Norback, J.P., 538 Park, S.H., 140
Norvig, P., 539 Parkinson, D., 328
Nowicki, E., 144, 157, 223, 403 Paterson, M., 143
Nugent, C.E., 334 Patterson, J.H., 165
Pavan, A., 234, 235, 238
Obata, T., 333 Pearl, J., 455
Oellermann, R., 397, 398 Pekny, J., 454, 455
Offermann, J., 324 Peng, S.L., 395, 403
Ogbu, F.A., 158 Perl, Y., 397
Ogier, R., 751 Perregaard, M., 323, 326
Olariu, S., 396, 397 Pesch, E., 139, 158, 756
Olkin, I., 566 Peters, K., 399, 401
Oosterhout, H., 147, 262 Petty, C., 455
Or, I., 538 Pfaff, J., 400, 402, 403
Orlin, J., 333 Phillips, A.B., 537, 538
Orlin, J.B., 322 Phillips, C., 158
Phillips, C.A., 135 Radermacher, F.J., 132
Phillips, S., 156 Raghavachari, M., 156
Phillips, S.J., 149 Raghavan, P., 614, 618
Picouleau, C., 136, 158 Rajagopalan, R., 158
Pierskalla, W.P., 753 Rajan, A., 331
Pinedo, M., 163 Rajaraman, A., 393
Pinedo, M.L., 155 Rajasekaran, S., 615, 618
Pinson, E., 135 Ramachandran, V., 616
Pitsoulis, L., 454 Ramakrishnan, K.G., 332, 335
Pitsoulis, L.S., 333, 335 Ramalingam, G., 404
Plateau, G., 752 Ramaswami, R., 238, 239
Plato, D.L., 532 Ramesh, K., 453
Plaxton, C.G., 614 Rana, A.I., 156
Plotkin, S., 130 Rana, S., 756
Plyter, N.V., 328 Ranade, A.G., 618
Pnueli, A., 398, 403 Rangaswamy, B., 754
Ranka, S., 532
Polya, G., 330
Rao, S., 153
Porto, S., 455
Rao, V.N., 449, 453
Posner, M.E., 132, 145, 158, 165,
Rardin, R.L., 753
238, 518
Raue, P.E., 751
Potts, C.N., 3, 113, 118, 129, 130,
Ravi, S.S., 402
132, 135, 137, 143, 146,
Rayward-Smith, V.J., 160
150, 156, 158, 159, 160,
Reddi, s.S., 143
244, 246, 305, 434, 435
Reeves, C., 757
Powell, W.B., 136
Reeves, C.R., 160
Preparata, F., 618
Reghbati, E., 618
Pretolani, D., 333
Rego, C., 756
Prigogine, I., 538 Reichling, M., 538
Prins, J.F., 615 Reif, J.R., 615, 618, 619
Proskurowski, A., 392, 398, 403, Reinelt, G., 335
404, 405 Reischuk, R., 148, 619
Pruzan, P.M., 331 Rendl, F., 9, 18, 324, 325, 326,
Pulleyblank, W., 405 327, 329, 331, 334, 335,
Punnen, A.P., 756 337
Purdom, P.W., 538 Renyi, A., 615
Puri, R., 534, 538 Resende, M., 454
Resende, M.G.C., 327, 332, 333,
Queyranne, M., 160, 355 334, 335, 751
Rhee, C., 402, 404 Russell, S., 539
Rhee, W.T., 336 Ruttkay, Zs., 751
Ribeiro, C., 455 Ryley, A., 236, 238
Ricciardelli, S., 132
Richa, A., 153 Sabuncuoglu, I., 165
Richards, D., 450 Sahni, S., 136, 144, 148, 161, 217,
Rifkin, A., 155 336, 453, 457, 461, 536,
Rijal, M.P., 334, 336 615, 617
Rinaldi, G., 334 Sairam, S., 616
Rinnooy Kan, A.H.G., 131, 133, Sakarovitch, M., 400
141, 145, 151, 152, 153, Sanchis, L.A., 539
156, 160, 228, 315, 328, Sang-Kyu Lee, 237
332, 335, 348, 349, 533, Sarin, S.C., 161
537 Sarkar, V., 539
Rivest, R., 450 Sassono, A., 393
Roberts, F.D.K., 396 Satratzemi, M., 405
Robertson, G., 455 Schaffer, A., 336
Robillard, P., 133 Schaffer, J.D., 751
Rochat, Y., 756 Schaffter, M.W., 156
Rock, H., 160 Scheffler, P., 404
Roe, E., 455 Scheideler, C., 140
Rogaway, P., 2, 18 Scherson, D., 617
Rohl, J.S., 404 Schlierkamp-Voosen, D., 755
Rolim, J., 451, 614 Schmid, M., 539
Rom, R., 131 Schmidt, G., 160
Rosen, B.E., 168 Schmidt, J.P., 161
Rosenbluth, A., 333, 454 Schmitz, L., 533
Rosenbluth, M., 333, 454 Schneider, J., 539
Rosenkrantz, D.J., 538 Schnorr, C., 619
Rote, G., 160, 324 Schrage, L., 148, 161
Rothkopf, M.H., 160, 161, 452, 453 Schrage, L., 161
Roucairol, C., 333, 336 Schrijver, A., 332
Rouskas, G.N., 238 Schroedinger, E., 539
Roychowdhury, V.P., 167 Schuler, D.M., 539
Rudolf, R., 324 Schulz, A.S., 135, 145, 156, 160,
Ruiz Diaz, F.M., 161 162, 233, 440
Ruml, J., 334 Schutten, J.M.J., 161
Rushforth, C.K., 540 Schuurman, P., 147, 162
Russell, R.M., 147, 161, 253 Schwartz, J.T., 619
Schwarzschild, B.M., 539 Sievers, B., 134
Schwefel, H., 750 Sih, G.C., 539
Schweikert, D.G., 539 Simchi-Levi, D., 135
Scudder, G.D., 33, 131 Simmons, B.B., 163, 191, 493
Sechen, C., 539 Sin, C.C.S., 136
Seese, D., 404 Singer, J.M., 539
Seiden, S.S., 140 Sivarajan, K.N., 239
Selman, B., 539 Skorin-Kapov, J., 325, 336
Sen, S., 617 Skubalska, E., 144
Sen, T., 139 Slater, P.J., 393, 399, 400, 402,
Senior, J.M., 236, 238 404
Serna, M., 619 Slowinski, R., 132
Sethi, R., 134, 142, 148, 151, 162, Smart, C.B., 404
276 Smith, A.E., 337
Sethi, S.P., 145 Smith, D.K., 158
Sevastianov, S.V., 157, 159, 162 Smith, J.R, 619
Sgall, J., 131, 132, 140, 141, 162 Smith, M.L., 134, 158
Shade, P., 143 Smith, S.A., 161
Shaefer, D., 401 Smith, S.H., 327
Shafransky, Y.M., 165, 524 Smith, S.J., 614
Shaklevich, N.V., 163, 164 Smith, S.P., 163
Shamir, A., 619 Smith, W.E., 163
Sharary, A.H., 163, 485 Smutnicki, C., 144, 157
Sharony, J., 238 Sosic, R., 404, 539, 540
Sheppard, W.F., 15, 19 Sotskov, Y.N., 134, 150, 163, 164,
Sherali, H.D., 321, 322, 323 165, 525
Sherwani, N.A., 399 Sotskov, Y.N., 134, 150, 163, 164,
Shi, L., 154 Sousa, J.P., 164
Shmidt, G., 133 Spearman, M.L., 168
Shmoys, D.B., 135, 145, 146, 152, Spears, W.M., 756
153, 155, 159, 163, 335, Spencer, J.H., 613
537 Spiessens, P., 453, 455
Schrijver, A., 11, 19 Spinrad, J., 156, 404
Shu, W., 535, 541 Spinrad, J.P., 402
Shwimer, J., 163 Spirakis, P., 149, 617
Sibeyn, J., 616 Spirakis, P.G., 616, 619
Sidney, J.B., 163, 490 Sri Karishna, G., 401
Seidmann, A., 158 Srinivasan Rao, A., 404
Siegel, A., 161 Srinivasan, A., 143, 161
Sriskandarajah, C., 144, 145, 147 Taillard, E., 165, 337
Starkweather, T., 456 Taillard, E., 455
Stearns, R.E., 538 Taillard, E.D., 328, 329, 756
Steiglitz, K., 150, 454 Talbot, F.B., 165
Stein, C., 135, 158, 163, 164, 417 Tan, J., 234
Steinberg, A., 164 Tanaev, V.S., 144, 165
Steinberg, L., 336 Tang, G., 87, 135, 136
Steiner, G., 164, 502 Tanse, R., 455
Steinitz, E., 164 Tansel, B.C., 165
Stern, T.E., 238 Tardos, E., 153, 163
Stewart, L., 396, 397, 402 Tarjan, R., 614
Stewart, L.K., 396, 397 Tarjan, R.E., 142, 191, 616
Stewart, W.R., 533, 540 Tate, D.M., 337
Sticker, T.M., 619 Tautenhahn, T., 133, 150, 164
Stinson, J.P., 164 Tavares, A., 751
Stolin, J.I., 132 Tecchioli, G., 449, 751
Stone, H.S., 336, 540 Telle, J.A., 404, 405
Stone, J.M., 540 Teller, A., 333, 454
Storer, R.H., 164 Teller, E., 333, 454
Storoy, S., 755 Teng, S.H., 140, 141
Stougie, L., 132, 164 Teo, C., 4, 5, 18
Stout, Q.F., 402 Teza, J., 235
Strusevich, V.A., 135, 136, 139, 154, Thompson, C.D., 619
159, 163, 164, 165, 525 Thompson, G.L., 141
Su, Z.-S., 131 Thompson, G.L., 142
Sudhakar Ganti, 237 Tivari, A., 322
Sudhakar, G.N.M., 239 Todd D. Terence, 239
Suh, C.J., 155 Todd Terence, 234
Suh, J., 452 Todd, M.J., 18
Sullivan, R.S., 29, 131, 155 Todd, T.D., 237
Swart, H.C., 397, 398 Toft, B., 566
Sweedyk, E., 143 Tompa, M., 614
Szegedy, M., 614 Tong, S.R., 238, 239
Szemeredi, E., 565, 613 Tong, Y.L., 19
Szpankowski, W., 336 Torng, E., 156
Szwarc, W., 138, 164, 165, 512, Torreao, J., 455
513, 516, 517, 518 Torsten, S., 616
Toth, P., 155
Tadei, R., 138, 165 Toulouse, M., 450, 451
Townsend, W., 166, 528 Van Wassenhove, L.N., 118, 129,
Trafalis, T., 756 137, 146, 150, 159, 160,
Trienekens, H., 456 166, 167, 246, 305, 434,
Trubian, M., 138 435
Tsai, K., 400 Van de Velde, S.L., 113, 137, 139,
Tsai, K.H., 405 147, 148, 162, 166, 268
Tsang, D., 535 Van den Akker, J.M., 166, 533
Tsantilas, Th., 618 Varvarigou, T.A., 167
Tsouros, C., 405 Vassens, R.J.M., 166
Tsung, F., 453 Vavarigou, T.A., 167
Tucker, A.C., 566 Vaz, R.F., 326
Turan, G., 141 Vazacopoulos, A., 37, 131
Tutte, W.T., 619 Vazirani, U.V., 617
Tuza, Z., 395 Vazirani, V.V., 152, 617
Tuzikov, A.V., 166 Vecchi, M., 452
Vecchi, M.P., 331, 537
Veldhorst, M., 154
Ugi, I., 337
Veltman, B., 147, 148, 154, 167
Uhry, J.P., 394
Ventura, J.A., 33, 167, 555
Ulder, N.L.J., 129, 756
Verma, S., 167
Ullman, J.D., 166
Vestjens, A., 136, 270, 547
Ullmann, J., 619
Vestjens, A.P.A., 148, 167
Ulrich, E.G., 539
Vetter, R.J., 236
Upfal, E., 616
Vicens, L., 141
Urland, E., 619
Vickson, R.G., 167
Urrutia, J., 392
Villareal, F.J., 167, 550
Virball, V.G., 326
Vaccari, R., 164 Vishkin, U., 615
Vairaktarakis, G.L., 153 Vitter, J., 131
Valiant, L.G., 619 Vizing, V.G., 566
Valli, Z., 751 Vohra, R., 4, 5, 18, 132
Valls, V., 754 Voigt, H.-M., 755
Van Gucht, D., 452 Voigt, H.-M., 755
Van Houweninge, M., 328 Voloshin, V.I., 394
Van Laarhoven, P.J.M., 129, 332, Volta, G., 138
756
Van Leeuwen, J., 619 Waarts, 0., 130
Van Lint, J.H., 237 Wah, B., 453
Van Vliet, A., 95, 135, 136 Walters, J.L., 400
Wan, P., 235 Woodruff, D.L., 168
Wan, P.J., 238 Wostmann, B., 133
Wang, W., 535, 540, 566 Wright, A.H., 757
Wang, Y., 160 Wu, M.Y., 535, 541
Ward, T.L., 337 Wu, S., 395
Webster, S.T., 167 Wu, S.D., 164
Weglarz, J., 132, 133 Wu, P.L., 565, 566
Wei, Y.-C., 540, 541 Wu, S., 395
Wein, J., 135, 145, 158, 163, 165,
233, 236, 417 Xu, J., 754
Wells, C.E., 138 Xu, W., 322
Weng, M.X., 167, 555 Xue, J., 335
Werner, F., 133, 164, 167 Yadegar, J., 328
West, D.H., 337 Yadid, T., 130
White, K., 405 Yamada, T., 157, 168, 757
White, L.J., 403 Yan, J.-H., 565
Whitley, D., 456, 756, 757 Yang, A., 154, 361
Widmer, M., 168 Yang, T., 541
Wigderson, A., 616 Yannakakis, M., 142, 158, 330, 336,
Wilfong, G.T., 142, 193 405, 619
Wilhelm, M.R, 337 Yano, C.A., 168
Wilkerson, 1.J., 168, 558 Yap, H.P., 566
Williams, K.A., 239 Ye, Y., 3, 18, 19
Williamson, D.P., 8, 18, 159, 163, Yeh, H.G., 395, 405
168 Yen, C., 405
Wilson, R.J., 393 Yoshida, N., 401
Wilson, RM., 237 Young, G.H., 140, 152, 153
Wimer, T., 399 Yu.E. Nesterov, 19
Wimer, T.V., 405 Yu, W., 135, 168
Wismer, D.A., 168 Yu.M., 168
Woeginger, G.J., 95, 130, 135, 136, Zagha, M., 614
140, 141, 142, 147, 150, Yung, M., 532, 617
160, 162, 323, 324, 326
Wolfe, D., 334 Zagha, M, 614
Wolkowicz, H., 329, 331, 334, 335, Zaguia, N., 163, 485
337 Zaroliagis, C., 616, 617
Wolsey, L., 454 Zawak, D., 129
Wolsey, L.A., 140, 164 Zdrzalka, S., 144, 157, 169, 223,
Wong, C.S., 139 403, 574
Zenios, S., 757
Zenios, S.A., 751
Zenou, I., 236
Zhang, X., 3, 18
Zhang, Y., 452, 566
Zhang, Z., 236
Zhao, Q., 337
Zheng, Q., 401
Zhou, Z., 534
Zijm, W.H.M., 162, 168
Zimmermann, D., 324, 325
Zwijnenburg, M., 755
Subject Index
Active, 99 Critical path, 66, 90
Akers-Friedman problem, 101 Critical transpose, 106
Algorithmic transition, 479 Cross-over rules, 297
All-to-neighbor transmission, 175, Cyclic triple exchange, 293
176, 183
Aspiration criterion, 294 Decomposer, 480
Dedicated machines, 115
Backboard wiring, 244 Degree of linearity, 278
Batch, 111 Delayed precedence constraints, 105
Batcher's algorithm, 575 Delivery time, 41
Beam search, 667 Demon, 441
Bernoulli, 573 Deterministic machine scheduling,
Binomial distribution, 573 24
Bottleneck quadratic assignment prob- Digraphs, 345
lem, 314 Directed Acyclic Graph, 513, 516
Bottom level, 477 Disallowed entries, 269
Box inequalities Disjunctive graph, 89
Branch-and-bound, 411 Disjunctive graph, 89
Branching rule, 35 Distributed data model, 415
Bridge, 472 Distributed memory, 409
Dollo's law, 462
Cactus, 358 Domatic number, 347
Canonical compatibility ordering, Dominating number, 341
386 Dominating set, 341
Central strategy, 422 Dual step optimization, 479
Chess board matrices, 305 Duality equality, 357
Chromatic number, 544 Dynamic Scheduling, 512
Combinations, 717 Dynamic workload balancing, 424
Combinatorial implosion, 738 Dynasearch, 53
Combinatorial optimization, 172,
176, 180 Edge coloring, 561
Communication delay, 118 Elimination bound, 281
Compact vector summation, 89 Embedding, 185
Complete graph embedding, 219, Exponential-edges, 493
221, 223 Exponential-nodes, 493
Critical end job, 42 Exponential-nodes-least-edge-first,
Critical events, 643 493

779
780 Subject Index

Exponential-nodes-most-edge-first, 494
Family setup time, 110
Fastest machines, 67
Feasible, 26
Feasible solution, 409
First-order phase, 463
Fixed connection machines, 570
Generalized upper bound, 647
Genetic algorithm, 296, 484
Graph, 343
Greedy, 120
Greedy swap, 252
Guided logical search, 106
Guiding solutions, 691
Heider's rule, 294
Hierarchical, 63
Hill-ascending algorithms, 431
Homogeneous state, 467
Hybrid algorithms, 297
Immediate selection, 104
Inequalities, 260
Inferior packets, 583
Initial search, 481
Inner product, 246
Interphase point, 443
Interval, 560
Interval indexed, 79
Inverse-exponential-nodes-least-edge-first, 493
Inverse-exponential-nodes-most-edge-first, 494
Isomorphic, 343
Job duplication, 119
K-partite branching rule, 288
Kalmanson matrices, 306
Linear arrangement problem, 244, 251, 320
Linear nodes, 493
Linear-edges, 493
Linear-nodes-least-edge-first, 493
Linear-nodes-most-edge-first, 494
List scheduling, 74
Local search algorithms, 292
Local search problem, 252
Lower bound function, 413
MGP1 algorithm, 489
MGP2 algorithm, 489
MGP3 algorithm, 489
MGP4 algorithm, 489
MGP5 algorithm, 490
Machine based bound, 95
Manhattan distances, 308
Master-slave, 446
Matching, 591
Microcanonical optimization, 440
Middle level, 477
Migration policy, 447
Minimum k-tree problem, 633
Minimum weight feedback arc set problem
Mixed dominating set, 350
Move attribute, 431
Multi-stage, 86
Multi-threading, 442
Multifit, 75
Multiple-Parameter-Setting, 442
Multiprocessor, 100, 101
Multiprocessor job, 115
Multispace search, 459, 465
Mutations, 38, 297, 444
Neighborhood, 252, 293, 431, 446
Network topology, 185
Non-preemptive, 102
Null solution, 695
Optimal solutions, 409
Optimal transmission schedule, 186
Optimization, 493
Ordering polytope, 259
Oscillation boundary, 687
Outerplanar, 559
Pair assignment branching, 287
Pair-exchange, 253
Pairwise Independent Lemma, 569
Partial permutation, 579
Path relinking, 691, 717
Permutation schedule, 92
Perturbation, 691, 698
Planar, 559
Polynomial time approximation, 251
Polynomial-time local search problems, 252
Positive, 32
Positional weights, 58
Precedence diagramming, 124
Preemption, 28
Preprocessor, 480
Priority list, 417
Priority schemes, 579
Priority set, 412
Proportionate, 93
Proximate Optimality, 692
Pseudoschedules, 84
Quantitative change, 462
Queue Line lemma, 580
Recency-based memory, 633
Recursion, 36
Reduction methods, 265
Referent domain optimization, 747
Reformulation, 267
Reorder crossover, 98
Residence frequencies, 678
Restricted candidate list, 298
Round robin, 71
Schedule policy, 447
Scrambling schedule, 467
Second-order phase, 463
Selection rule, 287
Sequence independent, 110
Series-parallel, 44
Shared data model, 415
Shared memory models, 570
Shifting bottleneck, 105
Shortest path triple, 285
Simulated annealing, 485
Single assignment branching, 287
Single stage, 27
Single-Parameter-Setting, 442
Smoothing factor, 504
Solution attribute, 431
Spanning subgraph, 344
Sparse enumeration sort, 578
Split graph, 366
Stable set, 356
Static scheduling, 512
Strongly chordal, 376
Structural qualities, 462
Sub-optimal solution, 430
Superior packets, 583
Symmetrical configurations, 471
Tabu search, 482, 638, 639
Tabu tenure, 651
TabuDrop start, 649
Thermal equilibrium, 296
Time complexity, 33
Time delay, 721
Toeplitz matrices, 305
Top level, 477
Transition frequencies, 678
Transitive Closure Bottleneck, 587
Transpositions, 294
Traveling salesman problem, 250, 319, 501
Traveling salesman polytope, 259
Tree, 348
Triangle inequality, 110
Tutte matrix, 593
Two-way graph, 486
Typewriter keyboard design, 244
Uniform state, 471


Author Index of Volumes 1-3
Aarts, E., (2) 376, 524; (3) 321, 449
Aarts, E.H.L., (3) 129, 166, 321, 332, 756
Abara, J., (2) 712
Abbott, H.L., (2) 353
Abdul-Razaq, T.S., (3) 3, 129
Abe, T., (2) 387
Abernathy, W., (1) 643, 668
Abramson, B., (3) 532
Abramson, D., (1) 234-235, 284; (3) 449
Abu-Mostafa, Y., (2) 518
Acampora S. Anthony, 234
Acampora, A.S., (3) 236, 237
Achatz, H., (2) 518
Achlioptas, D., (1) 138, 140
Achugbue, J.O., (3) 129
Ackley, D., (3) 750
Action, F., (1) 646, 689
Adam Beguelin, 154
Adam, T.L., (3) 532
Adams, J., (3) 129
Adams, W.P., (1) 179, 187-188, 236, 282, 479-484, 498-499, 517, 524-527, 530-532; (3) 321, 322
Adams, W.W., (1) 542, 569
Adenso-Diaz, B., (3) 129
Adiri, I., (3) 129, 146, 154
Adjiman, C.S., (1) 1, 46-48, 50, 67, 71-72
Adler, I., (1) 209, 212, 217, 282, 292
Adolphson, D., (3) 129
Agarwal, A., (2) 28, 153, 322, 583
Agarwal, P.K., (2) 27, 28
Agarwal, Y., (1) 647, 691
Agarwala, R., (2) 322
Aggarwal, A., (1) 8, 56, 60, 72-73; (3) 613
Aggarwal, C.C., (2) 252
Aggarwal, R., (2) 384
Agin, N., (1) 647, 691
Agrawal, V., (1) 86, 142
Ahmad, I., (3) 537
Ahmadi, R.H., (3) 129, 130
Ahmadian, A., (3) 130
Ahmadvand Nima, 234
Ahn, S., (3) 161
Ahrens, J.H., (1) 355, 418
Ahrens, W., (3) 532
Ahuja, H., (1) 643, 668
Ahuja, N.K., (1) 215, 217, 221, 233, 282
Ahuja, R., (3) 333
Ahuja, R.K., (2) 252; (3) 322
Aichholzer, O., (2) 28, 629, 630
Aittoniemi, L., 388, 418
Aizikowitz, N., (3) 129
Ajtai, M., (3) 613
Akers, Jr., (2) 353
Akers, S.B., (2) 583; (3) 130
Akinc, V., (1) 652, 714
Akiyama, J., (2) 353
Aksjonov, V.A., (3) 130
Al-Harkan, I., (3) 756
Al-Khayyal, F.A., (1) 41, 72, 524-525, 532
Alao, N., (1) 654, 714
Albers, S., (1) 641, 659; (3) 130
Albracht, J.J., (1) 645, 686
Aldous, D., (2) 73
Ali, D., (1) 641, 663
Ali, H.H., (3) 130, 533
Alidaee, B., (3) 19, 130, 753
Alimonti, P., (1) 122, 140
Alizadeh, F., (1) 151, 179, 182, 276-277, 282; (3) 1, 18
Allen, L.A., (1) 654-655, 722
Almond, M., (1) 646, 687
Alon, N., (2) 353; (3) 130, 613
Alstrup, J., (2) 712
Altinkemer, K., (1) 647, 691
Altman, S., (1) 644, 668
Alvernathy, W.J., (1) 644, 673
Aly, A.A., (2) 461
Aly, K.A., (2) 583
Amado, L., (1) 645, 682
Amar, G., (1) 644, 677
Amari, S., (2) 518
Ammar, M.H., (3) 238
Amos, D., (1) 646, 690
Amoura, A.K., (3) 130
Ananth, G., (3) 449
Anbil, R., (1) 644-645, 677-678; (2) 712
Andersen, E.D., (1) 212, 282
Anderson, D., (2) 717
Anderson, E.J., (3) 130, 155
Anderson, J., (2) 518
Anderson, R.J., (3) 613
Ando, K., (2) 252
Andrasfai, B., (2) 353
Andreatta, G., (2) 713
Andreussi, A., (2) 713
Andrews, J.A., (2) 354
Andros, P., (1) 647, 694
Androulakis, I.P., (1) 46-48, 50, 67, 71-72
Aneja, Y.P., (1) 642, 646, 652, 665, 687, 714
Angel, R., (1) 649, 691
Angeniol, B., (2) 518
Anily, S., (1) 641, 647, 649, 659, 691
Annaratone, M., (2) 384
Anosov, D., (2) 518
Anthony Brooke, 51, 73
Apkarian, P., (1) 166, 184
Apostolico, A., (2) 28
Appel, K., (2) 354
Appelgren, K., (1) 649, 692
Appelgren, L., (1) 649, 691
Appleby, J.S., (2) 354
Applegate, D., (3) 130
Arabeyre, J.P., (1) 644, 677; (2) 713
Arabie, P., (2) 322, 323, 325, 328
Aragon, C.R., (2) 373
Arani, T., (1) 647, 687
Aranson, S., (2) 518
Arcelli, C., (2) 323
Archdeacon, D., (2) 354
Arguello, M.F., (2) 713
Arikati, S., (2) 28
Arisawa, S., (1) 649, 692
Arjomadi, E., (2) 354
Arkin, E.M., (2) 28
Armour, G.C., (1) 654, 714; (3) 323
Armstrong, M.A., (2) 73
Armstrong, R.D., (3) 141, 181
Arnborg, S., (3) 392
Arnold, V.I., (2) 518
Arora, S., (1) 116, 140-141; (2) 354; (3) 322
Arpin, D., (1) 649, 652, 704, 729
Arvind, K., (2) 540; (3) 392
Arvindam, S., (3) 449
Arya, S., (2) 28
Asano, T., (1) 116, 141; (3) 392
Asirelli, P., (1) 79, 141
Aspnes, J., (3) 130
Aspvall, B., (2) 354
Assad, A., (1) 647-648, 649, 692, 700, 705
Assad, A.A., (1) 647-649, 694, 701; (3) 322
Atallah, M.J., (2) 28, 29, 153, 540; (3) 392
Atkins, R.J., (1) 652, 714
Atkinson, D.S., (1) 258, 283
Attaway, J.D., (3) 234
Aubin, J., (1) 646, 687
Aude, J., (3) 452
Aumann, R.J., (2) 1, 98
Aumann, Y., (2) 583
Aurenhammer, F., (1) 174, 183; (2) 28, 461, 629, 630
Ausiello, G., (1) 79, 90, 117, 141
Auslander, M.A., (2) 360
Aust, R.J., (1) 646, 687
Averbakh, I., (1) 647, 692
Avidor, A., (3) 131
Avis, D., (1) 174, 183, 310, 418, 641, 659
Avra, L., (2) 354
Avriel, M., (2) 518
Awerbuch, B., (2) 583; (3) 131
Aykin, T., (2) 10, 713
Azar, Y., (2) 583; (3) 130, 131
Baase, S., (2) 252
Babai, L., (3) 614
Babel, L., (2) 355
Back, T., (3) 750
Bafna, V., (2) 322
Bagchi, A., (2) 377
Bagchi, U., (3) 129, 130, 131
Bahn, O., (1) 258, 283
Bai, D., (3) 753
Bailey, E.E., (2) 713
Bailey, J., (1) 643, 668
Bailey, J.E., (1) 643, 675
Baird, B., (2) 518, 521
Baker, B.M., (1) 647, 692
Baker, E., 641, 644, 647-649, 659, 677, 692, 701
Baker, E.K., (1) 644, 678
Baker, J.R., (3) 31, 34, 132
Baker, K.R., (1) 643, 652, 668-669, 714; (3) 131, 161, 166
Bala, K., (2) 584
Balakrishnan, A., (1) 649, 695
Balakrishnan, H., (3) 393
Balakrishnan, P.V., (1) 653, 714
Balas, E., (1) 39, 72, 246, 283, 311-313, 321, 324-325, 338-340, 342, 418-419, 480, 482-483, 492, 494, 517-518, 525, 527, 641, 653, 659-660, 714, 742; (2) 355, 713; (3) 129, 131, 332
Baldi, P., (2) 355
Baldick, R., (2) 252
Balinski, M., (1) 647, 649, 692, 742
Balinski, M.L., (1) 642, 665
Ball, M., (1) 647, 649, 692, 698; (2) 713
Ball, M.O., (1) 644, 647-648, 653, 678, 694, 714
Ballart, R., (2) 584
Ballou, R., (1) 652, 714
Baloff, N., (1) 643-644, 668, 673
Bampis, E., (3) 130
Bandelt, H.J., (2) 323; (3) 393, 756
Bandyopadhyay, B.K., (2) 387
Banerjee Subrata, 234
Bang-Jensen, J., (2) 355
Bange, D.W., (3) 393
Bannister, J.A., (3) 234
Banzhaf, J.F., (2) 98
Banzhaf, W., (2) 518
Bar-Noy, A., (2) 540, 583
Baracco, P., (3) 138
Barany, I., (3) 131, 132
Barber, A.M., (1) 654, 730
Barbosa, V., (3) 449
Barcelo, J., (1) 652, 714
Barcia, P., (1) 366, 419
Bard, J.F., (2) 713
Barham, A.M., (1) 646, 687
Barkauskas, A.E., (3) 393
Barkhi, R., (1) 653, 737
Barkovic Oliver, 239
Barnabas, J., (2) 75
Barnes, E.R., (1) 217, 283
Barnes, J.W., (3) 132
Barnhart, C., (1) 644, 678; (2) 717
Barr, R.S., (1) 742
Barroso, A., (3) 455
Barry, D., (2) 73
Barry, R.A., (2) 584
Bart, P., (1) 653, 739
Bartal, Y., (2) 584; (3) 132
Bartholdi, J., (1) 647, 650, 692
Bartholdi, J.J., (1) 643, 669
Bartlett, T., (1) 649, 693
Barton, R., (1) 649, 693
Bartusch, M., (3) 132
Barvinok, A.I., (3) 322
Bastos, F., (1) 220-223, 293
Batcher, K.E., (3) 614
Batta, R., (1) 653, 654, 658, 715
Battiti, R., (1) 77, 133-136, 141; (3) 322, 449, 751
Bauer, J., (3) 337
Bauer, P.W., (2) 17, 714
Bauernoppel, R., (2) 355
Baumert, L.D., (2) 369
Baumol, W.J., (1) 652, 715
Bauslaugh, B., (2) 355
Baybars, I., (1) 645, 684; (2) 355
Bayer, D., (1) 558, 570
Bayer, D.A., (1) 212, 283
Bazaraa, M.S., (1) 217, 283; (3) 322, 323
Beale, E.M.L., (1) 8, 30, 72
Beall, C.L., (2) 395
Beame, P., (3) 614
Beasley, J., (1) 647-648, 696, 698, 712
Beasley, J.E., (1) 641-642, 647, 660, 665, 693; (2) 461
Beaumont, N., (1) 39, 72
Bechtold, S., (1) 643, 669-670
Beck, L.L., (2) 379
Beckmann, M., (1) 652, 654, 715, 728
Beckmann, M.J., (2) 714; (3) 331
Bedi, D.N., (2) 719
Beguin, H., (1) 652-653, 735
Behrendt, H., (3)
Behrooz Kamgar-Parsi, (2) 521
Behzad Kamgar-Parsi, (2) 521
Behzad, M., (2) 355
Beichel, I., (2) 461
Bell, T., (1) 654, 715
Bell, W., (1) 647, 650, 693
Bellare, M., (2) 356; (3) 2, 18
Belletti, R., (1) 644, 678
Belleville, P., (2) 622, 630
Bellman, R., (1) 647, 654, 693, 715
Bellman, R.E., (1) 336, 388, 419, 460, 474; (3) 393
Bellmore, M., (1) 641-642, 647, 660, 665, 693
Belobaba, P.P., (2) 19, 714
Belouadah, H., (3) 132
Belov, I.S., (3) 132
Beltrami, E., (1) 647, 693; (2) 356
Beltrami, E.J., (1) 644, 668
Ben-Or, M., (2) 29
Benavent, E., (3) 326
Benchakroun, A., (2) 714
Bender, E.A., (2) 356
Benders, J.F., (1) 9, 72; (2) 714
Benichou, M., (1) 33, 72
Bennett, B., (1) 649, 693
Bennett, B.T., (1) 643, 670
Bennett, C.H., (2) 73
Bennett, H.S., (1) 648, 650, 695
Bennett, V.L., (1) 653, 655, 715
Benson, H.P., (1) 156, 176, 182, 183
Benson, S., (3) 18
Benson, S.J., (1) 280, 283
Benveniste, R., (1) 641, 660
Benzecri, J.P., (2) 323
Berge, C., (1) 641-642, 660, 665; (2) 356
Berge, M.A., (2) 714
Berger, B., (2) 356
Bergman, L., (1) 653, 658, 740
Berlin, G.N., (1) 653, 715
Berman, K.A., (2) 356
Berman, L., (1) 649-650, 694
Berman, O., (1) 647, 692
Berman, P., (3) 132
Bern, M., (2) 461, 618, 630
Bern, M.W., (2) 153
Bernadi, C., (2) 356
Bernhard, P.J., (3) 393
Bernhardsson, B., (3) 532
Bernstein, C., (2) 356
Berry, W.L., (1) 643, 672
Bertolazzi, P., (3) 393
Bertossi, A.A., (3) 393
Bertrand, J.W., (3) 31, 131
Bertsimas, D., (1) 654, 714; (3) 18
Bertsimas, D.J., (1) 647, 694; (2) 725
Bestehorn, M., (2) 519
Betts, L.M., (2) 253
Beyer, D., (3) 751
Beyer, T.A., (3) 393
Bhattacharjee, G.P., (3) 615
Bhattacharya, B., (2) 32
Bhattacharya, B.K., (1) 174, 183
Bhattacharya, P.P., (2) 253
Bhattacharya, S., (3) 234, 238
Bianco, A., (3) 238
Bianco, L., (2) 713; (3) 132
Bielli, M., (1) 645, 683
Biggs, N., (2) 356
Biggs, N.L., (3) 393
Bilde, O., (1) 652, 715
Billera, L.J., (1) 536-537, 562, 567, 570
Billionnet, A., (1) 527
Binder, K., (2) 519
Bindschedler, A.E., (1) 654, 715
Bird, C.G., (2) 98
Birkhoff, G., (3) 234, 323
Bishop, A.B., (3) 161
Bitan, S., (2) 365
Bitner, J.R., (2) 356
Bitran, G.R., (2) 253
Bixby, R.E., (2) 714
Blair, C.E., (1) 84, 141, 550, 570
Blake, D.V., (2) 354
Bland, R.G., (1) 742
Blattner, W.O., (1) 430, 458, 475
Blazewicz, J., (3) 132, 133
Blelloch, G.E., (3) 614
Blum, A., (2) 356, 357
Blum, M., (1) 452, 474
Blumstein, A., (2) 714
Board, J., (1) 648, 712
Boas, J., (2) 712
Bodin, D.L., (1) 644, 678-679
Bodin, L., (1) 647-650, 692-694, 701, 710; (2) 356, 357, 713
Bodin, L.D., (1) 643-644, 670, 678
Bodner, R., (1) 647, 694
Boehm, M., (1) 83, 142
Boender, E., (2) 519
Bolland, R.P., (2) 73
Bollobas, B., (2) 357, 630; (3) 323, 564, 614
Bomberault, A., (1) 649, 713
Bondy, J.A., (2) 357
Bonniger, T., (3) 323
Bonuccelli, M.A., (3) 393
Booler, J.M., (1) 644, 679
Boorman, S.A., (2) 322
Boorstyn, R.B., (2) 383
Booth, K.S., (3) 393
Boots, A.B., (2) 464
Boots, B.N., (2) 461
Borchers, A., (3) 235
Borchers, B., (1) 8, 30, 33, 50, 73, 234-235, 247, 257, 279-280, 283-284, 291-292
Borella S. Michael, (3) 235
Borodin, O.V., (2) 357
Boros, E., (1) 482, 527
Borovikov, A.A., (2) 357
Borret, J.M.J., (1) 645, 679
Bose, P., (2) 623, 630
Boufkhad, Y., (1) 140, 143
Bouliane, J., (1) 654, 715
Bowman, E.H., (1) 645, 684
Boyar, J.F., (2) 357
Boyce, W.M., (2) 153
Boyd, S., (1) 277, 296; (2) 329
Brackett, C.A., (2) 584; (3) 235
Brady, D., (2) 523
Bramel, J., (1) 647, 649, 695
Branco, M.I., (1) 645, 682
Brandeau, M., (1) 654, 716
Brandon, C., (2) 461
Brandstadt, A., (3) 394, 398, 403
Brasel, H., (3) 133
Bratley, P., (3) 133
Brauner, E., (2) 724
Brazil, M., (2) 462, 590, 594, 601, 615
Brebner, G.J., (3) 619
Brecht, T., (3) 140
Brelaz, D., (1) 642, 665; (2) 358
Brennan, J.J., (3) 132
Bretthauer, K.M., (2) 253
Breu, H., (3) 392, 394
Breu, R., (1) 742
Brewster, R., (2) 358
Briggs, R., (2) 358
Brigham, R.C., (2) 364
Brill, E.D., (1) 653, 731
Brockett, R., (2) 519
Broder, A., (1) 140, 142
Broder, S., (1) 646, 687; (2) 358
Broeckx, F., (3) 331
Broersma, H.J., (2) 358
Broin, M.W., (3) 394
Bronemann, D.R., (1) 644, 679
Bronshtein, I., (2) 518
Brooks, R.L., (2) 358; (3) 564
Brown, E.K., (2) 73
Brown, E.L., (1) 499, 524, 530
Brown, E.M., (2) 462
Brown, G.B., (1) 647, 649-650, 695
Brown, J.H., (2) 714
Brown, J.I., (2) 358
Brown, J.R., (1) 642, 665; (2) 253, 358
Brown, L.A., (1) 653, 656, 726
Brown, P.A., (1) 652, 716
Browne, J., (1) 643, 672, 676
Browne, J.J., (1) 643, 670
Brownell, W.S., (1) 643, 670
Brualdi, R.A., (2) 358
Brucker, P., (2) 253, 323; (3) 133, 134, 153, 349
Bruengger, A., (3) 323
Brumelle, S.L., (2) 27, 714
Bruno, J.L., (3) 134
Brusco, M., (1) 643, 669
Bryant, R.E., (3) 532
Buchberger, B., (1) 542, 570
Buckley, P., (1) 654, 733
Buer, H., (3) 134
Buffa, E.S., (1) 644, 654, 670, 714; (3) 323
Bulfin, R.L., (1) 388, 419; (3) 99, 136, 154, 167, 550
Bulloch, B., (1) 653, 655, 721
Buneman, P., (2) 323
Burattini, E., (2) 358
Burbeck, S., (1) 644, 681
Burdet, C.A., (1) 742
Burkard, R., (2) 519
Burkard, R.E., (1) 309, 392, 419; (3) 323, 324, 325, 327
Burlet, M., (3) 394
Burns, R.N., (1) 643, 668, 670-671
Buro, M., (1) 83, 142
Burr, W.E., (2) 584
Burstall, R.M., (1) 652, 716
Burton, P.G., (3) 155
Bushel, G., (1) 649, 699; (2) 716
Bushnell, M., (1) 86, 142
Butt, S., (1) 647, 695
Buzacott, J.A., (2) 253
Byrn, J.E., (1) 644, 673
Byrne, J.L., (1) 643, 671
Cabot, A.V., (1) 652, 654, 716, 723
Cabot, V., (1) 183
Caccetta, L., (2) 358
Cai, L.Z., (2) 358, 359
Cai, X., (3) 76, 134
Cain, T., (1) 648, 712
Callahan, D., (2) 359
Callahan, P.B., (2) 323
Camerini, P., (1) 743
Camerini, P.K., (2) 714
Camerini, P.M., (1) 525, 527
Cameron, J., (2) 359
Cameron, S.H., (2) 359
Campbell, H.G., (3) 134
Campers, G., (2) 359
Campos, V., (3) 754
Canny, J., (2) 20, 29
Cao, F., (2) 585; (3) 11, 235
Cao, J., (2) 715
Caprara, A., (1) 309, 419
Caprera, D., (1) 648, 707
Captiva, M.E., (1) 645, 682
Carlier, J., (3) 79, 134, 135
Carlisle, M.C., (2) 359
Carpinelli, J.D., (2) 359
Carraresi, P., (1) 180, 183, 644, 679; (3) 325
Carroll, J.D., (2) 323
Carstensen, P.J., (1) 473, 474
Carter, M., (1) 643, 668
Carter, M.W., (1) 643, 646-647, 670, 687
Casanovas, J., (1) 652, 714
Case, K.E., (1) 653, 742
Caspi, Y., (2) 359
Cassell, E., (1) 647, 694
Cassidy, P.J., (1) 648, 650, 695
Catlin, P.A., (2) 359, 360
Cattrysse, D., (1) 646, 685
Cattrysse, J., (1) 646, 685
Caudle, W., (1) 649, 691
Cavalier, T.M., (1) 647, 695
Cavalli-Sforza, L.L., (2) 73
Ceder, A., (1) 644-645, 649-650, 679, 683-684, 695
Cela, E., (3) 323, 324, 325, 327, 331
Cellary, W., (3) 132
Ceria, S., (1) 246, 283, 483, 492, 494, 527; (2) 355
Cerny, V., (3) 325, 449, 532
Cerveny, R.P., (1) 652, 655, 716
Chaiken, J., (1) 643, 671
Chaiken, J.M., (1) 654, 716
Chaitin, G.J., (2) 360
Chakrabarti, S., (3) 135
Chakradar, S., (1) 86, 142
Chakrapani, J., (3) 325
Chambers, J.B., (3) 132
Chams, M., (2) 360
Chan, A.W., (1) 654, 716
Chan, L.M.A., (3) 135
Chan, P.W., (1) 646, 690
Chan, T.J., (1) 641, 660
Chandra, A.K., (1) 373, 408, 419; (2) 360
Chandra Sekaran, R., (1) 652, 714
Chandrasekaran, R., (1) 430, 474, 652, 654, 716, 732
Chandy, K.M., (3) 532
Chang, G.J., (1) 642, 665; (3) 395, 400, 402, 405
Chang, M.S., (3) 392, 395, 403
Chang, S., (3) 135
Chang, S.K., (2) 462
Chang, Y.-L., (3) 131
Chao, M.-T., (1) 140, 142
Chard, R., (1) 647, 695
Charikar, M., (3) 132
Charlton, J.M., (3) 135
Charnes, A., (1) 649, 653, 693, 695, 716
Charney, R.R., (3) 532
Chartrand, G., (2) 360
Chaturvedi, A., (2) 323
Chaudhry, S.S., (1) 652-653, 732
Chaudry, S.S., (1) 641, 654, 660, 716
Chazelle, B., (2) 29, 153
Cheban, Y.I., (2) 360
Cheeger, J., (2) 615
Cheeseman, P., (2) 360
Chekuri, C., (3) 135
Chelst, K., (1) 643, 671
Chen Ming, 235
Chen T.R., (2) 32
Chen, B., (3) 135
Chen, B.-L., (3) 565
Chen, B.L., (2) 360, 394
Chen, C.K., (1) 652-653, 727
Chen, C.L., (3) 136
Chen, D., (1) 643, 671; (3) 539
Chen, D.Z., (2) 28, 29, 30
Chen, G., (2) 360
Chen, G.X., (2) 153
Chen, J., (1) 92, 94, 96-97, 142
Chen, J.M., (1) 654, 656, 724; (2) 462
Chen, M.S., (2) 584
Chen, Mon-S., (2) 260
Chen, T.C.E., (3) 136
Chen, Z.L., (3) 136
Cheng, C.K., (3) 103, 540, 541
Cheng, K.W., (2) 584
Cheng, S.W., (2) 621, 629, 630, 631
Cheng, T.C.E., (3) 153, 154
Chepoi, V.D., (3) 394
Cherici, A., (1) 645, 683
Cheriton, D., (2) 462
Cheriyan, J., (1) 85, 142
Cheshire, I.M., (1) 647, 695
Cheston, G.A., (3) 396
Chetwynd, A., (2) 360
Cheung, K.W., (3) 235, 238
Chevalley, C., (1) 204, 284
Chew, L.P., (2) 28, 30, 31, 33
Chhajed, D., (1) 652, 716; (2) 360
Chiang, Y.-J., (2) 30
Chiarulli, D., (3) 235, 237
Chien, T.W., (1) 649, 695
Chilali, M., (1) 166, 184
Chin, F., (2) 633
Chin, F.Y., (3) 129
Ching, Y.C., (2) 584
Chinn, D.D., (3) 140
Chipalkatti, R., (3) 236
Chipperfield, A., (3) 450
Chiu, S., (1) 654, 716
Chlamtac, I., (2) 585
Cho, D.C., (1) 652, 717
Cho, Y., (3) 136, 161, 461
Choi, G., (1) 517, 525-526, 530
Choi, J., (2) 30
Choi, Y., (1) 648, 710
Chopra, S., (1) 642, 665
Choudhury, A.K., (2) 388
Chow, A., (2) 153
Chow, F., (2) 361
Chowdhury, I.G., (3) 140, 157
Chretienne, P., (3) 136, 137, 325
Chrissis, J.W., (1) 653, 717
Christof, T., (1) 246, 284
Christofides, N., (1) 403, 419, 641, 652, 647-649, 660, 696, 698, 717, 743; (2) 361, 715; (3) 325, 326
Chrobak, M., (2) 360, 540
Chu, C., (3) 136, 137
Chudak, F., (3) 137
Chung, F.R.K., (2) 153, 154, 462
Chung, Y.C., (3) 532
Church, J.G., (1) 643, 671
Church, R., (1) 654, 715, 737
Church, R.L., (1) 652-655, 658, 715, 717-718, 741
Churchill, G.A., (2) 76
Chvatal, V., (1) 138-139, 142, 310, 362, 419, 641-642, 660, 666; (2) 361
Ciftan, E., (1) 646, 690
Ciric, A.R., (1) 8, 73
Clarke, G., (1) 647, 696
Clarke, M.R.B., (1) 308, 421
Clarkson, K.L., (2) 30
Claus, A., (2) 4, 99
Clausen, J., (3) 323, 326, 327, 331, 450
Cleeroux, R., (2) 714
Clementi, A., (3) 614
Cochran, H., (1) 648, 712
Cockayne, E.J., (1) 642, 666; (2) 154, 462, 590, 597, 615; (3) 396, 403
Coffman, Jr., E.G., (3) 134, 137, 156, 157
Coffman, E., (3) 532
Coffman, E.G., (1) 641, 647, 660, 695
Cohen, B., (1) 129, 147
Cohen, E., (1) 472, 475; (3) 614
Cohen, J., (1) 653-654, 737
Cohen, M.A., (2) 253
Cohon, J.L., (1) 652, 654-655, 718-719
Cohoon, J., (3) 450
Colbourn, C.J., (2) 540, 541; (3) 396
Cole, A.J., (2) 361
Cole, J., (2) 462
Cole, R., (1) 454, 475; (3) 614
Coleman, T.F., (2) 361
Colin, J.-Y., (3) 137
Collins, R., (1) 647, 650, 692
Colorni, A., (3) 326
Condon, A., (2) 377
Conforti, M., (1) 179, 183, 641, 661
Congram, R.K., (3) 137
Conley, W., (2) 715
Connelly, R., (2) 28
Connolly, D.T., (3) 326
Conrad, K., (3) 326
Consiglio, A., (3) 751
Conti, P., (1) 542, 545-546, 570
Conway, R.W., (1) 654, 718; (3) 137
Cook, R.J., (2) 361, 362
Cook, S.A., (1) 79, 138, 142
Cook, W., (1) 178, 183, 550, 570; (2) 253; (3) 130
Cooke, J., (2) 360
Cooper, K.D., (2) 358
Cooper, L., (1) 652, 654, 718
Coorg, S.R., (3) 395
Coppersmith, D., (2) 583; (3) 614
Corbett, P.F., (2) 585
Cormen, T., (3) 450
Cormen, T.H., (1) 419, 434, 465, 475; (2) 31
Corneil, D., (2) 630; (3) 618
Corneil, D.G., (1) 641-642, 661, 666; (2) 362; (3) 396, 397
Cornuejols, G., (1) 179, 183, 246, 283, 483, 492, 494, 527, 569-570, 641, 652, 658, 661, 718-719; (2) 715
Correa, R., (3) 450
Cortesi, M., (3) 165
Cosares, S., (2) 254
Cosgrove, M.J., (1) 644, 670
Costa, D., (2) 362
Courant, D.R., (2) 462
Courant, R., (2) 154
Cowen, L.J., (2) 362
Cowen, R.H., (2) 362
Cox, D., (1) 542, 570
Cox, T., (2) 585
Coxeter, H.S.M., (2) 462
Cozzens, M.B., (2) 362
CPLEX, (1) 526, 528
Crabill, T.B., (1) 643, 669
Craig, C.S., (1) 654, 724
Crainic, T., (3) 450
Crama, Y., (1) 482, 527, 641, 661
Crandall, H., (1) 647, 700
Crauwels, H.A.J., (3) 118, 137
Crawford, J.L., (1) 647, 650, 697
Crescenzi, P., (1) 79, 90, 117-118, 122, 141, 143; (2) 323
Creutz, M., (3) 450
Criss, E., (1) 654, 656, 724
Crouse, J., (3) 334, 454
Crowder, H., (1) 302, 419, 480, 515, 525, 528; (2) 715
Crowder, H.P., (1) 744
Csanky, L., (3) 614
Csima, J., (1) 646, 688
Culberson, J.C., (2) 362
Culik II, K., (2) 6, 73
Cullen, D., (1) 647, 691
Cullen, F.H., (1) 647, 649, 697
Culver, W.D., (1) 643, 674
Cung, V.-D., (3) 751
Cunningham, W.H., (1) 85, 142; (2) 99
Cunto, E., (1) 647, 650, 697
Curiel, I.J., (2) 99
Curiel, I., (2) 102
Current, J., (1) 652, 653, 655, 717-719
Curry, R.E., (2) 35, 715
Curtis, A.R., (2) 362
Cusworth, S.D., (3) 236, 238
Cvijovic, D., (2) 519; (3) 326
Cypher, R.E., (3) 614
d'Atri, A., (1) 117, 141; (3) 397
d'Atri, G., (1) 352, 418
Dabrowski, J., (2) 376
Daescu, O., (2) 29
Daganzo, C., (1) 647, 649, 697
Dahl, R., (1) 647, 649, 701
Dailey, D.P., (2) 363
Dakin, M., (1) 654, 732
Dalberto, L., (1) 647, 650, 693
Dall, S.K., (3) 402
Dallwig, S., (1) 46-47, 50, 72
Dam, T.Q., (2) 585
Damaschke, P., (3) 397, 402
Daniel Granot, (2) 101
Dankesreiter, M., (3) 539
Dannenbring, D., (1) 403, 425
Dannenbring, D.G., (1) 652, 727; (3) 137
Danninger, G., (1) 180, 183
Dantzig, G., (1) 647, 699
Dantzig, G.B., (1) 208, 216, 284, 311, 320, 419, 430, 458, 475; (2) 254, 715
Dantzig, G.G., (1) 647, 649, 697
Dantzig, G.W., (1) 644, 671
Darby-Dowman, K., (1) 644-645, 679, 682
Darrow, R.M., (2) 723
Das, G., (2) 28, 29
DasGupta, B., (2) 73, 74, 76
Daskin, M., (1) 653-655, 658, 719-721
Daskin, M.S., (1) 647-648, 652, 705, 733; (2) 715
Daughety, A., (1) 652, 716
Dauzere-Peres, S., (3) 122, 137, 138
Davani, A., (1) 644, 678
Davis, E.W., (3) 138
Davis, J.S., (3) 138
Davis, L., (2) 519; (3) 326, 450, 715
Davis, M., (1) 81, 143, 200, 284; (2) 99
Davis, P.S., (1) 652, 720
Davis, R.P., (1) 653, 717
Day, R.H., (1) 646, 688
Day, W.H.E., (2) 73, 323, 324
De Bruijn, N.G., (2) 584
De Bruijn Digraph, (2) 585
De Jaenisch, C.F., (3) 397
De la Croix Vaubois, (2) 518
de Loera, J., (1) 568, 570
de
Derigs, U., (3) 324
Desai, N., (2) 383
Desaulniers, G., (2) 715
Descartes, B., (2) 363
Desrochers, M., (1) 648, 704