Global Optimization Algorithms
– Theory and Application –
3rd Edition

Author: Thomas Weise
tweise@gmx.de
Edition: third
Version: 2011-12-07 15:31
Newest Version: http://www.itweise.de/projects/book.pdf
Sources/CVS Repository: goataa.cvs.sourceforge.net
Don't print me!
Also think about the trees that would have to die for the paper!
This book is much more useful as an electronic resource: you can search terms and click
links. You can't do that in a printed book.
The book is frequently updated and improved as well. Printed versions will quickly become
outdated.
Don't print me!
Preface
This ebook is devoted to Global Optimization algorithms, which are methods for finding
solutions of high quality for an incredibly wide range of problems. We introduce the basic
concepts of optimization and discuss features which make optimization problems difficult
and which should thus be considered when trying to solve them. In this book, we focus on
metaheuristic approaches like Evolutionary Computation, Simulated Annealing, Extremal
Optimization, Tabu Search, and Random Optimization. Especially the Evolutionary
Computation methods, which subsume Evolutionary Algorithms, Genetic Algorithms, Genetic
Programming, Learning Classifier Systems, Evolution Strategy, Differential Evolution,
Particle Swarm Optimization, and Ant Colony Optimization, are discussed in detail.
In this third edition, we try to make the transition from a pure material collection and
compendium to a more structured book. We address two major audience groups:
1. Our book may help students, since we try to describe the algorithms in an
understandable, consistent way. Therefore, we also provide the fundamentals and much
background knowledge. You can find (short and simplified) summaries on stochastic
theory and theoretical computer science in Part VI on page 638. Additionally,
application examples are provided which give an idea of how problems can be tackled
with the different techniques and what results can be expected.
2. Fellow researchers and PhD students may find the application examples helpful too.
For them, in-depth discussions of the individual approaches are included, supported
by a large set of useful literature references.
The contents of this book are divided into three parts. In the first part, different
optimization technologies are introduced and their features described. Often, small
examples are given in order to ease understanding. In the second part, starting at page
530, we elaborate on different application examples in detail. Finally, in the last part,
following at page 638, the aforementioned background knowledge is provided.
In order to maximize the utility of this electronic book, it contains automatic, clickable
links. They are shaded with dark gray so the book is still b/w printable. You can click on
1. entries in the table of contents,
2. citation references like “Heitkötter and Beasley [1220]”,
3. page references like “253”,
4. references such as “see Figure 28.1 on page 254” to sections, ﬁgures, tables, and listings,
and
5. URLs and links like “http://www.lania.mx/~ccoello/EMOO/ [accessed 2007-10-25]”.[1]

[1] URLs are usually annotated with the date we have accessed them, like
http://www.lania.mx/~ccoello/EMOO/ [accessed 2007-10-25]. We can neither guarantee that
their content remains unchanged, nor that these sites stay available. We also assume no
responsibility for anything we linked to.
The following scenario is an example for using the book: A student reads the text and
finds a passage that she wants to investigate in-depth. She clicks on a citation which
seems interesting and the corresponding reference is shown. For some of the references
which are available online, links are provided in the reference text. By clicking on such
a link, the Adobe Reader®[2] will open another window and load the corresponding document
(or a browser window of a site that links to the document). After reading it, the student
may use the “backwards” button in the Acrobat Reader’s navigation utility to go back to
the text initially read in the ebook.
If this book contains something you want to cite or reference in your work, please use
the citation suggestion provided in Chapter A on page 945. Also, I would be very happy
if you provide feedback, report errors or missing things that you have (or have not) found,
criticize something, or have any additional ideas or suggestions. Do not hesitate to contact
me via my email address tweise@gmx.de. As a matter of fact, a large number of people helped
me to improve this book over time. I have enumerated the most important contributors in
Chapter D – Thank you guys, I really appreciate your help! At many places in this book
we refer to Wikipedia – The Free Encyclopedia [2938] which is a great source of knowledge.
Wikipedia – The Free Encyclopedia contains articles and deﬁnitions for many of the aspects
discussed in this book. Like this book, it is updated and improved frequently. Therefore,
including the links adds greatly to the book’s utility, in my opinion.
The updates and improvements will result in new versions of the book, which will
regularly appear at http://www.itweise.de/projects/book.pdf. The complete
LaTeX source code of this book, including all graphics and the bibliography, is hosted
at SourceForge under http://sourceforge.net/projects/goataa/ in the CVS
repository goataa.cvs.sourceforge.net. You can browse the repository under
http://goataa.cvs.sourceforge.net/goataa/ and anonymously download the
complete sources by using the CVS command given in Listing 1.[3]
cvs -z3 -d:pserver:anonymous@goataa.cvs.sourceforge.net:/cvsroot/goataa checkout -P Book

Listing 1: The CVS command for anonymously checking out the complete book sources.
Compiling the sources requires multiple runs of LaTeX, BibTeX, and makeindex because of
the nifty way the references are incorporated. In the repository, an Ant script is provided
for doing this. Also, the complete bibliography of this book is stored in a MySQL[4]
database, and the scripts for creating this database as well as tools for querying it
(Java) and a Microsoft Access[5] frontend are part of the sources you can download. These
resources, as well as all contents of this book (unless explicitly stated otherwise), are
licensed under the GNU Free Documentation License (FDL, see Chapter B on page 947). Some
of the algorithms provided are made available under the LGPL license (see Chapter C on
page 955).
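The multi-pass compilation described above can be sketched as a plain shell sequence. This is only an illustration of the general LaTeX/BibTeX/makeindex cycle: the main file name book.tex and the use of pdflatex are assumptions made here, and the Ant script in the repository remains the authoritative build procedure.

```shell
#!/bin/sh
# Hypothetical manual build sequence; the repository's Ant script automates
# the equivalent passes.
build_book() {
    pdflatex -interaction=nonstopmode book.tex  # pass 1: write citation/index data to .aux/.idx
    bibtex book                                 # resolve bibliography entries from book.aux
    makeindex book.idx                          # compile the index into book.ind
    pdflatex -interaction=nonstopmode book.tex  # pass 2: pull in bibliography and index
    pdflatex -interaction=nonstopmode book.tex  # pass 3: settle cross-reference page numbers
}

# Only attempt the build where a LaTeX toolchain and the main file exist.
if command -v pdflatex >/dev/null 2>&1 && [ -f book.tex ]; then
    build_book
else
    echo "skipping: LaTeX toolchain or book.tex not available"
fi
```

The repeated final passes are what the text means by "multiple runs": the first run only records which labels and citations exist, and later runs can then typeset the resolved references and page numbers.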
Copyright © 2011 Thomas Weise.
Permission is granted to copy, distribute and/or modify this document under the terms of
the GNU Free Documentation License, Version 1.3 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in Chapter B on page 947, entitled “GNU Free
Documentation License (FDL)”.
[2] The Adobe Reader® is available for download at http://www.adobe.com/products/reader/
[accessed 2007-08-13].
[3] More information can be found at http://sourceforge.net/scm/?type=cvs&group_id=264352
[4] http://en.wikipedia.org/wiki/MySQL [accessed 2010-07-29]
[5] http://en.wikipedia.org/wiki/Microsoft_Access [accessed 2010-07-29]
Contents
Title Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Part I Foundations
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.1 What is Global Optimization? . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.1.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.1.2 Algorithms and Programs . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.1.3 Optimization versus Dedicated Algorithms . . . . . . . . . . . . . . . 24
1.1.4 Structure of this Book . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.2 Types of Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.2.1 Combinatorial Optimization Problems . . . . . . . . . . . . . . . . . . 24
1.2.2 Numerical Optimization Problems . . . . . . . . . . . . . . . . . . . . 27
1.2.3 Discrete and Mixed Problems . . . . . . . . . . . . . . . . . . . . . . 29
1.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.3 Classes of Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.1 Algorithm Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.2 Monte Carlo Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.3.3 Heuristics and Metaheuristics . . . . . . . . . . . . . . . . . . . . . . 34
1.3.4 Evolutionary Computation . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3.5 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4 Classiﬁcation According to Optimization Time . . . . . . . . . . . . . . . . . 37
1.5 Number of Optimization Criteria . . . . . . . . . . . . . . . . . . . . . . . . 39
1.6 Introductory Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2 Problem Space and Objective Functions . . . . . . . . . . . . . . . . . . . . 43
2.1 Problem Space: What does it look like? . . . . . . . . . . . . . . . . . . . . 43
2.2 Objective Functions: Is it good? . . . . . . . . . . . . . . . . . . . . . . . . . 44
3 Optima: What does good mean? . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1 Single Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.1 Extrema: Minima and Maxima of Diﬀerentiable Functions . . . . . . 52
3.1.2 Global Extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Multiple Optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3 Multiple Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.2 Lexicographic Optimization . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.3 Weighted Sums (Linear Aggregation) . . . . . . . . . . . . . . . . . . 62
3.3.4 Weighted Min-Max . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.5 Pareto Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Constraint Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.1 Genome- and Phenome-based Approaches . . . . . . . . . . . . . . . . 71
3.4.2 Death Penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4.3 Penalty Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4.4 Constraints as Additional Objectives . . . . . . . . . . . . . . . . . . 72
3.4.5 The Method Of Inequalities . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.6 Constrained-Domination . . . . . . . . . . . . . . . . . . . . . . . 75
3.4.7 Limitations and Other Methods . . . . . . . . . . . . . . . . . . . . . 75
3.5 Unifying Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5.1 External Decision Maker . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5.2 Prevalence Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 Search Space and Operators: How can we ﬁnd it? . . . . . . . . . . . . . . 81
4.1 The Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2 The Search Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3 The Connection between Search and Problem Space . . . . . . . . . . . . . . 85
4.4 Local Optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.4.1 Local Optima of Single-Objective Problems . . . . . . . . . . . . . . 89
4.4.2 Local Optima of Multi-Objective Problems . . . . . . . . . . . . . . 89
4.5 Further Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5 Fitness and Problem Landscape: How does the Optimizer see it? . . . . 93
5.1 Fitness Landscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 Fitness as a Relative Measure of Utility . . . . . . . . . . . . . . . . . . . . . 94
5.3 Problem Landscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6 The Structure of Optimization: Putting it together. . . . . . . . . . . . . 101
6.1 Involved Spaces and Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3 Other General Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3.1 Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3.2 Iterations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3.3 Termination Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3.4 Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.3.5 Modeling and Simulating . . . . . . . . . . . . . . . . . . . . . . . . . 106
7 Solving an Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . 109
8 Baseline Search Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.1 Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.2 Random Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.2.1 Adaptive Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.3 Exhaustive Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
9 Forma Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10 General Information on Optimization . . . . . . . . . . . . . . . . . . . . . . 123
10.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
10.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
10.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Part II Diﬃculties in Optimization
11 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
12 Problem Hardness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
12.1 Algorithmic Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
12.2 Complexity Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
12.2.1 Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
12.2.2 P, NP, Hardness, and Completeness . . . . . . . . . . . . . . . . 146
12.3 The Problem: Many Real-World Tasks are NP-hard . . . . . . . . . . . . . 147
12.4 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
13 Unsatisfying Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
13.2 The Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
13.2.1 Premature Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 152
13.2.2 Non-Uniform Convergence . . . . . . . . . . . . . . . . . . . . . . 152
13.2.3 Domino Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
13.3 One Cause: Loss of Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
13.3.1 Exploration vs. Exploitation . . . . . . . . . . . . . . . . . . . . . . . 155
13.4 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
13.4.1 Search Operator Design . . . . . . . . . . . . . . . . . . . . . . . . . . 157
13.4.2 Restarting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
13.4.3 Low Selection Pressure and Population Size . . . . . . . . . . . . . . 157
13.4.4 Sharing, Niching, and Clearing . . . . . . . . . . . . . . . . . . . . . . 157
13.4.5 Self-Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . 158
13.4.6 Multi-Objectivization . . . . . . . . . . . . . . . . . . . . . . . 158
14 Ruggedness and Weak Causality . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.1 The Problem: Ruggedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.2 One Cause: Weak Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.3 Fitness Landscape Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
14.3.1 Autocorrelation and Correlation Length . . . . . . . . . . . . . . . . . 163
14.4 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
15 Deceptiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
15.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
15.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
16 Neutrality and Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
16.2 Evolvability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
16.3 Neutrality: Problematic and Beneﬁcial . . . . . . . . . . . . . . . . . . . . . 172
16.4 Neutral Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
16.5 Redundancy: Problematic and Beneﬁcial . . . . . . . . . . . . . . . . . . . . 174
16.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
16.7 Needle-In-A-Haystack . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
17 Epistasis, Pleiotropy, and Separability . . . . . . . . . . . . . . . . . . . . . 177
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.1.1 Epistasis and Pleiotropy . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.1.2 Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
17.1.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
17.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
17.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
17.3.1 Choice of the Representation . . . . . . . . . . . . . . . . . . . . . . . 182
17.3.2 Parameter Tweaking . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
17.3.3 Linkage Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
17.3.4 Cooperative Coevolution . . . . . . . . . . . . . . . . . . . . . . . . . 184
18 Noise and Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
18.1 Introduction – Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
18.2 The Problem: Need for Robustness . . . . . . . . . . . . . . . . . . . . . . . 188
18.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
19 Overﬁtting and Oversimpliﬁcation . . . . . . . . . . . . . . . . . . . . . . . . 191
19.1 Overﬁtting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
19.1.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
19.1.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
19.2 Oversimpliﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
19.2.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
19.2.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
20 Dimensionality (Objective Functions) . . . . . . . . . . . . . . . . . . . . . . 197
20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
20.2 The Problem: Many-Objective Optimization . . . . . . . . . . . . . . . . . 198
20.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
20.3.1 Increasing Population Size . . . . . . . . . . . . . . . . . . . . . . . . 201
20.3.2 Increasing Selection Pressure . . . . . . . . . . . . . . . . . . . . . . . 201
20.3.3 Indicator Functionbased Approaches . . . . . . . . . . . . . . . . . . 201
20.3.4 Scalarizing Approaches . . . . . . . . . . . . . . . . . . . . . . 202
20.3.5 Limiting the Search Area in the Objective Space . . . . . . . . . . . . 202
20.3.6 Visualization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 202
21 Scale (Decision Variables) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
21.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
21.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
21.2.1 Parallelization and Distribution . . . . . . . . . . . . . . . . . . . . . 204
21.2.2 Genetic Representation and Development . . . . . . . . . . . . . . . . 204
21.2.3 Exploiting Separability . . . . . . . . . . . . . . . . . . . . . . . . . . 204
21.2.4 Combination of Techniques . . . . . . . . . . . . . . . . . . . . . . . . 205
22 Dynamically Changing Fitness Landscape . . . . . . . . . . . . . . . . . . . 207
23 The No Free Lunch Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
23.1 Initial Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
23.2 The Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
23.3 Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
23.4 Inﬁnite and Continuous Domains . . . . . . . . . . . . . . . . . . . . . . . . 213
23.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
24 Lessons Learned: Designing Good Encodings . . . . . . . . . . . . . . . . . 215
24.1 Compact Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
24.2 Unbiased Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
24.3 Surjective GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
24.4 Injective GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
24.5 Consistent GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
24.6 Formae Inheritance and Preservation . . . . . . . . . . . . . . . . . . . . . . 217
24.7 Formae in Genotypic Space Aligned with Phenotypic Formae . . . . . . . . . 217
24.8 Compatibility of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
24.9 Representation with Causality . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.10 Combinations of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.11 Reachability of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.12 Influence of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.13 Scalability of Search Operations . . . . . . . . . . . . . . . . . . . . . 219
24.14 Appropriate Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 219
24.15 Indirect Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
24.15.1 Extradimensional Bypass: Example for Good Complexity . . . . . . . 219
Part III Metaheuristic Optimization Algorithms
25 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
25.1 General Information on Metaheuristics . . . . . . . . . . . . . . . . . . . . . 226
25.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 226
25.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
25.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 227
25.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
26 Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.3 Multi-Objective Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . 230
26.4 Problems in Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
26.5 Hill Climbing With Random Restarts . . . . . . . . . . . . . . . . . . . . . . 232
26.6 General Information on Hill Climbing . . . . . . . . . . . . . . . . . . . . . . 234
26.6.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 234
26.6.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
26.6.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 234
26.6.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
26.7 Raindrop Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
26.7.1 General Information on Raindrop Method . . . . . . . . . . . . . . . 236
27 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
27.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
27.1.1 Metropolis Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
27.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
27.2.1 No Boltzmann’s Constant . . . . . . . . . . . . . . . . . . . . . . . . . 246
27.2.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
27.3 Temperature Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
27.3.1 Logarithmic Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 248
27.3.2 Exponential Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 249
27.3.3 Polynomial and Linear Scheduling . . . . . . . . . . . . . . . . . . . . 249
27.3.4 Adaptive Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
27.3.5 Larger Step Widths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
27.4 General Information on Simulated Annealing . . . . . . . . . . . . . . . . . . 250
27.4.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 250
27.4.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
27.4.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 250
27.4.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
28 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
28.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
28.1.1 The Basic Cycle of EAs . . . . . . . . . . . . . . . . . . . . . . . . . . 254
28.1.2 Biological and Artiﬁcial Evolution . . . . . . . . . . . . . . . . . . . . 255
28.1.3 Historical Classiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . 263
28.1.4 Populations in Evolutionary Algorithms . . . . . . . . . . . . . . . . . 265
28.1.5 Conﬁguration Parameters of Evolutionary Algorithms . . . . . . . . . 269
28.2 Genotype-Phenotype Mappings . . . . . . . . . . . . . . . . . . . . . . . . 270
28.2.1 Direct Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
28.2.2 Indirect Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
28.3 Fitness Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
28.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
28.3.2 Weighted Sum Fitness Assignment . . . . . . . . . . . . . . . . . . . . 275
28.3.3 Pareto Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
28.3.4 Sharing Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
28.3.5 Variety Preserving Fitness Assignment . . . . . . . . . . . . . . . . . 279
28.3.6 Tournament Fitness Assignment . . . . . . . . . . . . . . . . . . . . . 284
28.4 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
28.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
28.4.2 Truncation Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
28.4.3 Fitness Proportionate Selection . . . . . . . . . . . . . . . . . . . . . 290
28.4.4 Tournament Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
28.4.5 Linear Ranking Selection . . . . . . . . . . . . . . . . . . . . . . . . . 300
28.4.6 Random Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
28.4.7 Clearing and Simple Convergence Prevention (SCP) . . . . . . . . . . 302
28.5 Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
28.6 Maintaining a Set of Non-Dominated/Non-Prevailed Individuals . . . . . . . 308
28.6.1 Updating the Optimal Set . . . . . . . . . . . . . . . . . . . . . . 308
28.6.2 Obtaining Non-Prevailed Elements . . . . . . . . . . . . . . . . . . 309
28.6.3 Pruning the Optimal Set . . . . . . . . . . . . . . . . . . . . . . . . . 311
28.7 General Information on Evolutionary Algorithms . . . . . . . . . . . . . . . 314
28.7.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 314
28.7.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
28.7.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 317
28.7.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
29 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
29.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
29.1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
29.2 Genomes in Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 326
29.2.1 Classiﬁcation According to Length . . . . . . . . . . . . . . . . . . . . 326
29.2.2 Classiﬁcation According to Element Type . . . . . . . . . . . . . . . . 327
29.2.3 Classiﬁcation According to Meaning of Loci . . . . . . . . . . . . . . 327
29.2.4 Introns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
29.3 Fixed-Length String Chromosomes . . . . . . . . . . . . . . . . . . . . . . 328
29.3.1 Creation: Nullary Reproduction . . . . . . . . . . . . . . . . . . . . . 328
29.3.2 Mutation: Unary Reproduction . . . . . . . . . . . . . . . . . . . . . 330
29.3.3 Permutation: Unary Reproduction . . . . . . . . . . . . . . . . . . . . 333
29.3.4 Crossover: Binary Reproduction . . . . . . . . . . . . . . . . . . . . . 335
29.4 Variable-Length String Chromosomes . . . . . . . . . . . . . . . . . . . . 339
29.4.1 Creation: Nullary Reproduction . . . . . . . . . . . . . . . . . . . . . 340
29.4.2 Insertion and Deletion: Unary Reproduction . . . . . . . . . . . . . . 340
29.4.3 Crossover: Binary Reproduction . . . . . . . . . . . . . . . . . . . . . 340
29.5 Schema Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
29.5.1 Schemata and Masks . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
29.5.2 Wildcards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
29.5.3 Holland’s Schema Theorem . . . . . . . . . . . . . . . . . . . . . . . . 342
29.5.4 Criticism of the Schema Theorem . . . . . . . . . . . . . . . . . . . . 345
29.5.5 The Building Block Hypothesis . . . . . . . . . . . . . . . . . . . . . 346
29.5.6 Genetic Repair and Similarity Extraction . . . . . . . . . . . . . . . . 346
29.6 The Messy Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 347
29.6.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
29.6.2 Reproduction Operations . . . . . . . . . . . . . . . . . . . . . . . . . 347
29.6.3 Overspeciﬁcation and Underspeciﬁcation . . . . . . . . . . . . . . . . 348
29.6.4 The Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
29.7 Random Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
29.8 General Information on Genetic Algorithms . . . . . . . . . . . . . . . . . . 351
29.8.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 351
29.8.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
29.8.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 353
29.8.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
30 Evolution Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
30.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
30.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
30.3 Recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
30.3.1 Dominant Recombination . . . . . . . . . . . . . . . . . . . . . . . . . 362
30.3.2 Intermediate Recombination . . . . . . . . . . . . . . . . . . . . . . . 362
30.4 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
30.4.1 Using Normal Distributions . . . . . . . . . . . . . . . . . . . . . . . . 364
30.5 Parameter Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
30.5.1 The 1/5th Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
30.5.2 Endogenous Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 370
30.6 General Information on Evolution Strategies . . . . . . . . . . . . . . . . . . 373
30.6.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 373
30.6.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
30.6.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 373
30.6.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
31 Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
31.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
31.1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
31.2 General Information on Genetic Programming . . . . . . . . . . . . . . . . . 382
31.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 382
31.2.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
31.2.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 383
31.2.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
31.3 (Standard) Tree Genomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
31.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
31.3.2 Creation: Nullary Reproduction . . . . . . . . . . . . . . . . . . . . . 389
31.3.3 Node Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
31.3.4 Unary Reproduction Operations . . . . . . . . . . . . . . . . . . . . . 393
31.3.5 Recombination: Binary Reproduction . . . . . . . . . . . . . . . . . . 398
31.3.6 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
31.4 Linear Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
31.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
31.4.2 Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . 404
31.4.3 Realizations and Implementations . . . . . . . . . . . . . . . . . . . . 405
31.4.4 Recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
31.4.5 General Information on Linear Genetic Programming . . . . . . . . . 408
31.5 Grammars in Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . 409
31.5.1 General Information on Grammar-Guided Genetic Programming . . . 409
31.6 Graph-based Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
31.6.1 General Information on Graph-based Genetic Programming . . . . . 409
32 Evolutionary Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
32.1 General Information on Evolutionary Programming . . . . . . . . . . . . . . 414
32.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 414
32.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
32.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 414
32.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
33 Diﬀerential Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
33.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
33.2 Ternary Recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
33.2.1 Advanced Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
33.3 General Information on Diﬀerential Evolution . . . . . . . . . . . . . . . . . 421
33.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 421
33.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
33.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 421
33.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
34 Estimation Of Distribution Algorithms . . . . . . . . . . . . . . . . . . . . . 427
34.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
34.2 General Information on Estimation Of Distribution Algorithms . . . . . . . 428
34.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 428
34.2.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
34.2.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 429
34.2.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
34.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
34.4 EDAs Searching Bit Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
34.4.1 UMDA: Univariate Marginal Distribution Algorithm . . . . . . . . . 434
34.4.2 PBIL: Population-Based Incremental Learning . . . . . . . . . . . . . 438
34.4.3 cGA: Compact Genetic Algorithm . . . . . . . . . . . . . . . . . . . . 439
34.5 EDAs Searching Real Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 442
34.5.1 SHCLVND: Stochastic Hill Climbing with Learning by Vectors of
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
34.5.2 PBILc: Continuous PBIL . . . . . . . . . . . . . . . . . . . . . . . . 444
34.5.3 Real-Coded PBIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
34.6 EDAs Searching Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
34.6.1 PIPE: Probabilistic Incremental Program Evolution . . . . . . . . . . 447
34.7 Diﬃculties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
34.7.1 Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
34.7.2 Epistasis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
35 Learning Classiﬁer Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
35.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
35.2 Families of Learning Classiﬁer Systems . . . . . . . . . . . . . . . . . . . . . 457
35.3 General Information on Learning Classiﬁer Systems . . . . . . . . . . . . . . 459
35.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 459
35.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
35.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 459
35.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
36 Memetic and Hybrid Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 463
36.1 Memetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
36.2 Lamarckian Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
36.3 Baldwin Eﬀect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
36.4 General Information on Memetic Algorithms . . . . . . . . . . . . . . . . . . 467
36.4.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 467
36.4.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
36.4.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 467
36.4.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
37 Ant Colony Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
37.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
37.2 General Information on Ant Colony Optimization . . . . . . . . . . . . . . . 473
37.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 473
37.2.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
37.2.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 473
37.2.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
38 River Formation Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
38.1 General Information on River Formation Dynamics . . . . . . . . . . . . . . 478
38.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 478
38.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
38.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 478
38.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
39 Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
39.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
39.2 The Particle Swarm Optimization Algorithm . . . . . . . . . . . . . . . . . . 481
39.2.1 Communication with Neighbors – Social Interaction . . . . . . . . . . 482
39.2.2 Particle Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
39.2.3 Basic Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
39.3 General Information on Particle Swarm Optimization . . . . . . . . . . . . . 483
39.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 483
39.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
39.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 483
39.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
40 Tabu Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
40.1 General Information on Tabu Search . . . . . . . . . . . . . . . . . . . . . . 490
40.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 490
40.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
40.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 490
40.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
41 Extremal Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
41.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
41.1.1 Self-Organized Criticality . . . . . . . . . . . . . . . . . . . . . . . . 493
41.1.2 The Bak-Sneppen Model of Evolution . . . . . . . . . . . . . . . . . 493
41.2 Extremal Optimization and Generalized Extremal Optimization . . . . . . . 494
41.3 General Information on Extremal Optimization . . . . . . . . . . . . . . . . 496
41.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 496
41.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
41.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 496
41.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
42 GRASPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
42.1 General Information on GRASPs . . . . . . . . . . . . . . . . . . . . . . 500
42.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 500
42.1.2 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 500
42.1.3 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
43 Downhill Simplex (Nelder and Mead) . . . . . . . . . . . . . . . . . . . . . . 501
43.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
43.2 General Information on Downhill Simplex . . . . . . . . . . . . . . . . . . . 502
43.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 502
43.2.2 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 502
43.2.3 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
44 Random Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
44.1 General Information on Random Optimization . . . . . . . . . . . . . . . . . 506
44.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 506
44.1.2 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 506
44.1.3 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
Part IV Non-Metaheuristic Optimization Algorithms
45 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
46 State Space Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
46.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
46.1.1 The Baseline: A Binary Goal Criterion . . . . . . . . . . . . . . . . . 511
46.1.2 State Space and Neighborhood Search . . . . . . . . . . . . . . . . . . 511
46.1.3 The Search Space as Graph . . . . . . . . . . . . . . . . . . . . . . . . 512
46.1.4 Key Eﬃciency Features . . . . . . . . . . . . . . . . . . . . . . . . . . 514
46.2 Uninformed Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
46.2.1 Breadth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . . 515
46.2.2 Depth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
46.2.3 Depth-Limited Search . . . . . . . . . . . . . . . . . . . . . . . . . . 516
46.2.4 Iteratively Deepening Depth-First Search . . . . . . . . . . . . . . . . 516
46.2.5 General Information on Uninformed Search . . . . . . . . . . . . . . . 517
46.3 Informed Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
46.3.1 Greedy Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
46.3.2 A⋆ Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
46.3.3 General Information on Informed Search . . . . . . . . . . . . . . . . 520
47 Branch And Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
47.1 General Information on Branch And Bound . . . . . . . . . . . . . . . . . . 524
47.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 524
47.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
47.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 524
48 Cutting-Plane Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
48.1 General Information on the Cutting-Plane Method . . . . . . . . . . . . . . 528
48.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 528
48.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
48.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 528
Part V Applications
49 RealWorld Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
49.1 Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
49.1.1 Genetic Programming: Genome for Symbolic Regression . . . . . . . 531
49.1.2 Sample Data, Quality, and Estimation Theory . . . . . . . . . . . . . 532
49.1.3 Limits of Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . 535
49.2 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
49.2.1 Classiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
49.3 Freight Transportation Planning . . . . . . . . . . . . . . . . . . . . . . . . . 536
49.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
49.3.2 Vehicle Routing in Theory and Practice . . . . . . . . . . . . . . . . . 538
49.3.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
49.3.4 Evolutionary Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 542
49.3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
49.3.6 Holistic Approach to Logistics . . . . . . . . . . . . . . . . . . . . . . 549
49.3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
50 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
50.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
50.2 Bit-String based Problem Spaces . . . . . . . . . . . . . . . . . . . . . . . 553
50.2.1 Kauffman's NK Fitness Landscapes . . . . . . . . . . . . . . . . . . . 553
50.2.2 The p-Spin Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
50.2.3 The ND Family of Fitness Landscapes . . . . . . . . . . . . . . . . . . 557
50.2.4 The Royal Road . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
50.2.5 OneMax and BinInt . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
50.2.6 Long Path Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
50.2.7 Tunable Model for Problematic Phenomena . . . . . . . . . . . . . . 566
50.3 Real Problem Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
50.3.1 SingleObjective Optimization . . . . . . . . . . . . . . . . . . . . . . 580
50.4 Close-to-Real Vehicle Routing Problem Benchmark . . . . . . . . . . . . . 621
50.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
50.4.2 Involved Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
50.4.3 Problem Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
50.4.4 Objectives and Constraints . . . . . . . . . . . . . . . . . . . . . . . . 634
50.4.5 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
Part VI Background
51 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
51.1 Set Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
51.2 Special Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
51.3 Relations between Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
51.4 Operations on Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
51.5 Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
51.6 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
51.7 Binary Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
51.8 Order Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
51.9 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
51.10 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
51.10.1 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
51.11 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
51.11.1 Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
51.11.2 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
51.11.3 Transformations between Sets and Lists . . . . . . . . . . . . . . . . 650
52 Graph Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
53 Stochastic Theory and Statistics . . . . . . . . . . . . . . . . . . . . . . . . . 653
53.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
53.1.1 Probability as defined by Bernoulli (1713) . . . . . . . . . . . . . . . 654
53.1.2 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
53.1.3 The Limiting Frequency Theory of von Mises . . . . . . . . . . . . . . 655
53.1.4 The Axioms of Kolmogorov . . . . . . . . . . . . . . . . . . . . . . . . 656
53.1.5 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . 657
53.1.6 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
53.1.7 Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . 658
53.1.8 Probability Mass Function . . . . . . . . . . . . . . . . . . . . . . . . 659
53.1.9 Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . 659
53.2 Parameters of Distributions and their Estimates . . . . . . . . . . . . . . . . 659
53.2.1 Count, Min, Max and Range . . . . . . . . . . . . . . . . . . . . . . . 660
53.2.2 Expected Value and Arithmetic Mean . . . . . . . . . . . . . . . . . . 661
53.2.3 Variance and Standard Deviation . . . . . . . . . . . . . . . . . . . . 662
53.2.4 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
53.2.5 Skewness and Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . 665
53.2.6 Median, Quantiles, and Mode . . . . . . . . . . . . . . . . . . . . . . 665
53.2.7 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
53.2.8 The Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . 668
53.3 Some Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
53.3.1 Discrete Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . 668
53.3.2 Poisson Distribution π_λ . . . . . . . . . . . . . . . . . . . . . . . . 671
53.3.3 Binomial Distribution B(n, p) . . . . . . . . . . . . . . . . . . . . . . 674
53.4 Some Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 676
53.4.1 Continuous Uniform Distribution . . . . . . . . . . . . . . . . . . . . 676
53.4.2 Normal Distribution N(µ, σ²) . . . . . . . . . . . . . . . . . . . . . . 678
. . . . . . . . . . . . . . . . . . . . . . 678
53.4.3 Exponential Distribution exp(λ) . . . . . . . . . . . . . . . . . . . . . 682
53.4.4 Chi-square Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 684
53.4.5 Student's t-Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 687
53.5 Example – Throwing a Die . . . . . . . . . . . . . . . . . . . . . . . . . . 689
53.6 Estimation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
53.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
53.6.2 Likelihood and Maximum Likelihood Estimators . . . . . . . . . . . . 693
53.6.3 Conﬁdence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
53.7 Statistical Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
53.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
Part VII Implementation
54 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
55 The Speciﬁcation Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
55.1 General Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
55.1.1 Search Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
55.1.2 Genotype-Phenotype Mapping . . . . . . . . . . . . . . . . . . . . . . 711
55.1.3 Objective Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
55.1.4 Termination Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
55.1.5 Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 713
55.2 Algorithm Speciﬁc Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
55.2.1 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 717
55.2.2 Estimation Of Distribution Algorithms . . . . . . . . . . . . . . . . . 719
56 The Implementation Package . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
56.1 The Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 724
56.1.1 Algorithm Base Classes . . . . . . . . . . . . . . . . . . . . . . . . . . 724
56.1.2 Baseline Search Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 729
56.1.3 Local Search Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 735
56.1.4 Population-based Metaheuristics . . . . . . . . . . . . . . . . . . . . . 746
56.2 Search Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
56.2.1 Operations for String-based Search Spaces . . . . . . . . . . . . . . . 784
56.2.2 Operations for Tree-based Search Spaces . . . . . . . . . . . . . . . . 809
56.2.3 Multiplexing Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 818
56.3 Data Structures for Special Search Spaces . . . . . . . . . . . . . . . . . . . 824
56.3.1 Tree-based Search Spaces as used in Genetic Programming . . . . . . 824
56.4 Genotype-Phenotype Mappings . . . . . . . . . . . . . . . . . . . . . . . . 833
56.5 Termination Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835
56.6 Comparator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
56.7 Utility Classes, Constants, and Routines . . . . . . . . . . . . . . . . . . . . 839
57 Demos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
57.1 Demos for String-based Problem Spaces . . . . . . . . . . . . . . . . . . . 859
57.1.1 Real-Vector based Problem Spaces X ⊆ Rⁿ . . . . . . . . . . . . . . . 859
57.1.2 Permutation-based Problem Spaces . . . . . . . . . . . . . . . . . . . 887
57.2 Demos for Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . 905
57.2.1 Demos for Genetic Programming Applied to Mathematical Problems 905
57.3 The VRP Benchmark Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 914
57.3.1 The Involved Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 914
57.3.2 The Optimization Utilities . . . . . . . . . . . . . . . . . . . . . . . . 925
Appendices
A Citation Suggestion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945
B GNU Free Documentation License (FDL) . . . . . . . . . . . . . . . . . . . 947
B.1 Applicability and Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 947
B.2 Verbatim Copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
B.3 Copying in Quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
B.4 Modiﬁcations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
B.5 Combining Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951
B.6 Collections of Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951
B.7 Aggregation with Independent Works . . . . . . . . . . . . . . . . . . . . . . 951
B.8 Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952
B.9 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952
B.10 Future Revisions of this License . . . . . . . . . . . . . . . . . . . . . . . . . 952
B.11 Relicensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953
C GNU Lesser General Public License (LGPL) . . . . . . . . . . . . . . . . . 955
C.1 Additional Deﬁnitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
C.2 Exception to Section 3 of the GNU GPL . . . . . . . . . . . . . . . . . . . . 955
C.3 Conveying Modiﬁed Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
C.4 Object Code Incorporating Material from Library Header Files . . . . . . . . 956
C.5 Combined Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
C.6 Combined Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957
C.7 Revised Versions of the GNU Lesser General Public License . . . . . . . . . 957
D Credits and Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1190
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213
List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1217
List of Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1221
Part I
Foundations
Chapter 1
Introduction
1.1 What is Global Optimization?
One of the most fundamental principles in our world is the search for optimal states. It begins
in the microcosm of physics, where atomic bonds¹ minimize the energy of electrons [2137].
When molecules combine into solid bodies during the process of freezing, they assume crystal
structures which come close to energy-optimal configurations. Such a natural optimization
process, of course, is not driven by any higher intention but results from the laws of physics.
The same goes for the biological principle of survival of the fittest [2571] which, together
with biological evolution [684], leads to a better adaptation of species to their environment.
Here, a local optimum is a well-adapted species that can maintain a stable population in
its niche.
Figure 1.1: Terms that hint at an "Optimization Problem": biggest, smallest, fastest, cheapest, minimal, maximal, most precise, most robust, most similar to ..., ... with least energy, ... with least aerodynamic drag, ... on the smallest possible area, ... that fulfills all timing constraints, best trade-offs between ... (inspired by [428])
¹ http://en.wikipedia.org/wiki/Chemical_bond [accessed 2007-07-12]
For as long as humankind has existed, we have striven for perfection in many areas. As
sketched in Figure 1.1, there exists an incredible variety of terms which indicate that a
situation is actually an optimization problem. For example, we want to reach a maximum
degree of happiness with the least amount of effort. In our economy, profit and sales must be
maximized and costs should be as low as possible. Optimization is therefore one of the oldest
sciences and even extends into daily life [2023]. We can also observe the phenomenon of
trade-offs between different criteria, which is common in many other problem domains:
One person may choose to work hard with the goal of securing stability and pursuing wealth
for her future life, accepting very little spare time in the current stage. Another person may
deliberately choose a lax life, full of joy in the current phase but rather insecure in the
long term. Similarly, most optimization problems in real-world applications involve multiple,
usually conflicting, objectives.
Furthermore, many optimization problems involve certain constraints. The guy with the
relaxed lifestyle, for instance, regardless of how lazy he might become, cannot avoid dragging
himself to the supermarket from time to time in order to acquire food. The eager beaver, on
the other hand, will not be able to keep up 18 hours of work for seven days per week in
the long run without damaging her health considerably and thus rendering the goal of an
enjoyable future unattainable. Likewise, the parameters of the solutions of an optimization
problem may be constrained as well.
1.1.1 Optimization
Since optimization covers a wide variety of subject areas, there exist different points of view
on what it actually is. We approach this question by first clarifying the goal of optimization:
solving optimization problems.
Definition D1.1 (Optimization Problem (Economical View)). An optimization problem is a situation which requires deciding for one choice from a set of possible alternatives in order to reach a predefined/required benefit at minimal costs.
Deﬁnition D1.2 (Optimization Problem (Simpliﬁed Mathematical View)). Solving
an optimization problem requires ﬁnding an input value x
⋆
for which a mathematical func
tion f takes on the smallest possible value (while usually obeying to some restrictions on the
possible values of x
⋆
).
Every task which has the goal of approaching certain configurations considered optimal in
the context of predefined criteria can be viewed as an optimization problem. After giving
these rough definitions, which we will later refine in Section 6.2 on page 103, we can also
specify the topic of this book:
Definition D1.3 (Optimization). Optimization is the process of solving an optimization problem, i.e., finding suitable solutions for it.
Definition D1.4 (Global Optimization). Global optimization is optimization with the goal of finding solutions x⋆ for a given optimization problem which have the property that no other, better solutions exist.
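Definitions D1.2 and D1.4 can be illustrated with a tiny sketch: over a finite set of alternatives, global optimization degenerates to exhaustively picking the element for which f is smallest. The objective function and candidate set below are made up purely for illustration.

```python
# Illustration of Definitions D1.2/D1.4: exhaustively find the globally
# optimal input x* of an objective function f over a small finite set of
# alternatives.  Both f and the candidate set are toy choices.

def argmin(f, candidates):
    """Return (x*, f(x*)) where f(x*) is the smallest value over candidates."""
    best = None
    best_value = float("inf")
    for x in candidates:
        value = f(x)
        if value < best_value:  # strictly better -> new incumbent
            best, best_value = x, value
    return best, best_value

f = lambda x: (x - 3) ** 2 + 1          # toy objective, global minimum at x = 3
candidates = range(-10, 11)             # finite set of alternatives
x_star, f_star = argmin(f, candidates)  # x_star = 3, f_star = 1
```

For a finite candidate set this exhaustive scan is guaranteed to return a global optimum; the rest of the book deals with the far more common case where enumeration is infeasible.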
If something is important, general, and abstract enough, there is always a mathematical
discipline dealing with it. Global Optimization^2 [609, 2126, 2181, 2714] is also the
branch of applied mathematics and numerical analysis that focuses on, well, optimization.
Closely related and overlapping areas are mathematical economics^3 [559] and operations
research^4 [580, 1232].
2 http://en.wikipedia.org/wiki/Global_optimization [accessed 2007-07-03]
1.1. WHAT IS GLOBAL OPTIMIZATION? 23
1.1.2 Algorithms and Programs
The title of this book is Global Optimization Algorithms – Theory and Application and
so far, we have clarified what optimization is, so the reader has an idea about the general
direction in which we are going to drift. Before giving an overview of the contents to
follow, let us take a quick look at the second component of the book title: algorithms.
The term algorithm comprises essentially all forms of “directives on what to do to reach a
certain goal”. A culinary recipe is an algorithm, for example, since it tells how much of
which ingredient is to be added to a meal in which sequence and how everything should
be heated. The commands inside an algorithm can be very concise or very imprecise,
depending on the area of application. How accurately can we, for instance, carry out the
instruction “Add a tablespoon of sugar.”? Hence, algorithms are such a wide field that
numerous different, rather fuzzy definitions for the word algorithm exist [46, 141, 157,
635, 1616]. We base ours on the one given by Wikipedia:
Definition D1.5 (Algorithm (Wikipedia)). An algorithm^5 is a finite set of well-defined instructions for accomplishing some task. It starts in some initial state and usually terminates in a final state.
The term algorithm is derived from the name of the mathematician Mohammed ibn Musa
al-Khwarizmi, who was part of the royal court in Baghdad and who lived from about 780 to
850. Al-Khwarizmi’s work is the likely source for the word algebra as well. Definition D1.6
provides a basic understanding of what an optimization algorithm is, a “working hypothesis”
for the time being. It will later be refined in Definition D6.3 on page 103.
Definition D1.6 (Optimization Algorithm). An optimization algorithm is an algorithm suitable for solving optimization problems.
An algorithm is a set of directions in a representation which is usually understandable for
human beings and thus independent of specific hardware and computers. A program,
on the other hand, is basically an algorithm realized for a given computer or execution
environment.
Definition D1.7 (Program). A program^6 is a set of instructions that describe a task or an algorithm to be carried out on a computer.
Therefore, the primitive instructions of the algorithm must be expressed either in a form that
the machine can process directly (machine code^7 [2135]), in a form that can be translated
(1:1) into such code (assembly language [2378], Java byte code [1109], etc.), or in a high-level
programming language^8 [1817] which can be translated (n:m) into the latter using
special software (a compiler) [1556].
In this book, we will provide several algorithms for solving optimization problems. In
our algorithms, we will try to strike a balance between abstraction, ease of reading, and closeness
to the syntax of high-level programming languages such as Java. In the algorithms, we will use
self-explanatory primitives which are still based on clear definitions (such as the list operations
discussed in Section 51.11 on page 647).
3 http://en.wikipedia.org/wiki/Mathematical_economics [accessed 2009-06-16]
4 http://en.wikipedia.org/wiki/Operations_research [accessed 2009-06-19]
5 http://en.wikipedia.org/wiki/Algorithm [accessed 2007-07-03]
6 http://en.wikipedia.org/wiki/Computer_program [accessed 2007-07-03]
7 http://en.wikipedia.org/wiki/Machine_code [accessed 2007-07-04]
8 http://en.wikipedia.org/wiki/High-level_programming_language [accessed 2007-07-03]
1.1.3 Optimization versus Dedicated Algorithms
For at least fifty-five years, research in computer science has led to the development of many
algorithms which solve given problem classes to optimality. Basically, we can distinguish
two kinds of problem-solving algorithms: dedicated algorithms and optimization algorithms.
Dedicated algorithms are specialized to exactly solve a given class of problems in the
shortest possible time. They are most often deterministic and always utilize exact theoretical
knowledge about the problem at hand. Kruskal’s algorithm [1625], for example, is the best
way of finding a minimum spanning tree in a graph with weighted edges. The shortest
paths from a source node to the other nodes in a network can best be found with Dijkstra’s
algorithm [796], and the maximum flow in a flow network can be computed quickly with the
algorithm given by Dinitz [800]. There are many problems for which such fast and efficient
dedicated algorithms exist. For each optimization problem, we should always check first whether
it can be solved with such a specialized, exact algorithm (see also Chapter 7 on page 109).
However, for many optimization tasks no efficient algorithm exists. The reason can be
that (1) the problem is too specific – after all, researchers usually develop algorithms only
for problems which are of general interest. (2) Furthermore, for some problems, (the current
scientific opinion is that) no efficient, dedicated algorithm can be constructed at all (see Section 12.2.2
on page 146).
For such problems, the more general optimization algorithms are employed. Most often,
these algorithms only need a definition of the structure of possible solutions and a function
f which measures the quality of a candidate solution. Based on this information, they
try to find solutions for which f takes on the best values. Since optimization methods utilize much
less information than exact algorithms, they are usually slower and cannot provide the same
solution quality. In cases 1 and 2, however, we have no choice but to use them. And
indeed, there exists a vast number of situations where either Point 1, Point 2, or both conditions
hold, as you can see in, for example, Table 25.1 on page 226 or Table 28.4 on page 314. . .
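To illustrate how little information such general methods require, here is a minimal, hypothetical black-box optimizer: it touches the problem only through a routine that samples candidate solutions (the “structure”) and through the objective function f. The function names and the toy objective are made up for illustration.

```python
import random

def random_search(f, sample, steps=1000, seed=1):
    """Minimal black-box optimizer: it knows nothing about the problem
    except how to sample candidate solutions and how to rate them with
    the objective function f (subject to minimization)."""
    rng = random.Random(seed)
    best = sample(rng)
    best_value = f(best)
    for _ in range(steps):
        x = sample(rng)          # draw a new candidate solution
        value = f(x)             # the only problem knowledge used: f
        if value < best_value:   # keep the best candidate seen so far
            best, best_value = x, value
    return best, best_value

# Toy usage: minimize f(x) = x^2 over real values sampled from [-5, 5].
f = lambda x: x * x
sample = lambda rng: rng.uniform(-5.0, 5.0)
best, best_value = random_search(f, sample)
```

Such a sampler-plus-objective interface is exactly why these methods apply so broadly, and also why they are usually slower than a dedicated, exact algorithm.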
1.1.4 Structure of this Book
In the first part of this book, we will discuss the things which constitute an optimization
problem and which are the basic components of each optimization process. In Part II,
we take a look at all the possible aspects which can make an optimization task difficult.
Understanding these issues will allow you to anticipate possible problems when solving an
optimization task and, hopefully, to prevent their occurrence from the start. We then discuss
metaheuristic optimization algorithms (and especially Evolutionary Computation methods)
in Part III. Whereas these optimization approaches are clearly the center of this book, we
also introduce traditional, non-metaheuristic, numerical, and/or exact methods in Part IV.
In Part V, we then list some examples of applications for which optimization algorithms can
be used. Finally, Part VI is a collection of definitions and knowledge which can be useful to
understand the terms and formulas used in the rest of the book. It is provided in an effort
to make the book standalone and to allow students to understand it even if they have little
mathematical or theoretical background.
1.2 Types of Optimization Problems
Basically, there are two general types of optimization tasks: combinatorial and numerical
problems.
1.2.1 Combinatorial Optimization Problems
Definition D1.8 (Combinatorial Optimization Problem). Combinatorial optimization problems^9 [578, 628, 2124, 2430] are problems which are defined over a finite (or countably infinite) discrete problem space X and whose candidate solutions can be expressed as
1. elements from finite sets,
2. finite sequences or permutations of elements x_i chosen from finite sets, i.e., x ∈ X ⇒ x = (x_1, x_2, . . . ),
3. sets of elements, i.e., x ∈ X ⇒ x = {x_1, x_2, . . . },
4. tree or graph structures with node or edge properties stemming from any of the above types, or
5. any form of nesting, combination, partition, or subset of the above.
Combinatorial tasks are, for example, the Traveling Salesman Problem (see Example E2.2 on
page 45 and [1429]), Vehicle Routing Problems (see Section 49.3 on page 536 and [2162, 2912]),
graph coloring [1455], graph partitioning [344], scheduling [564], packing [564], and satisfiability
problems [2335]. Algorithms suitable for such problems are, amongst others, Genetic
Algorithms (Chapter 29), Simulated Annealing (Chapter 27), Tabu Search (Chapter 40),
and Extremal Optimization (Chapter 41).
Example E1.1 (Bin Packing Problem).
The bin packing problem^10 [1025] is a combinatorial optimization problem which is defined
as follows: Given are a number k ∈ N_1 of bins, each of which has size b ∈ N_1, and a
number n ∈ N_1 of objects with the weights a_1, a_2, . . . , a_n. The question to answer is: Can the n objects
be distributed over the k bins in a way that none of the bins flows over? A configuration
which meets this requirement is sketched in Figure 1.2. If another object of size a_11 = 4 was
added, the answer to the question would become No.

Figure 1.2: One example instance of a bin packing problem: a distribution of n = 10 objects of sizes a_1 to a_10 into k = 5 bins of size b = 5.

The bin packing problem can be formalized as asking whether a mapping x from the
numbers 1..n to 1..k exists so that Equation 1.1 holds. In Equation 1.1, the mapping is
represented as a list x of k sets where the i-th set contains the indices of the objects to be
packed into the i-th bin.
∃x : ∀i ∈ 0..k−1 : (∑_{∀j∈x[i]} a_j) ≤ b    (1.1)
Finding out whether this equation holds or not is known to be NP-complete. The question
could be transformed into an optimization problem with the goal of finding the smallest
9 http://en.wikipedia.org/wiki/Combinatorial_optimization [accessed 2010-08-04]
10 http://en.wikipedia.org/wiki/Bin_packing_problem [accessed 2010-08-27]
number k of bins of size b which can contain all the objects listed in a. Alternatively, we
could try to find the mapping x for which the function f_bp given in Equation 1.2 takes on
the lowest value, for example.

f_bp(x) = ∑_{i=0}^{k−1} max(0, (∑_{∀j∈x[i]} a_j) − b)    (1.2)

This problem is known to be NP-hard. See Section 12.2 for a discussion of
NP-completeness and NP-hardness and Task 67 on page 238 for a homework task concerned
with solving the bin packing problem.
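Equation 1.2 can be sketched directly in code. The instance below mirrors Figure 1.2 (object sizes a_1 to a_10, k = 5 bins of size b = 5); the concrete feasible mapping x is just one of several possibilities.

```python
# Sketch of Equation 1.2 for the bin packing instance from Figure 1.2.
# A mapping x is represented, as in the text, as a list of k sets of
# (here: 0-based) object indices.

def f_bp(x, a, b):
    """Sum of the overflows of all bins (Equation 1.2); 0 means feasible."""
    return sum(max(0, sum(a[j] for j in bin_) - b) for bin_ in x)

a = [4, 4, 1, 1, 1, 2, 2, 2, 2, 3]   # the sizes a_1..a_10 from Figure 1.2
b, k = 5, 5

# One feasible packing (f_bp = 0), e.g. bins {a_1,a_3}, {a_2,a_4}, ...
x = [{0, 2}, {1, 3}, {4, 5, 6}, {7, 9}, {8}]
assert f_bp(x, a, b) == 0

# Putting everything into one bin overflows it by sum(a) - b = 22 - 5 = 17.
assert f_bp([set(range(10)), set(), set(), set(), set()], a, b) == 17
```

With an eleventh object of size 4, the total size 26 would exceed k·b = 25, so every possible mapping x would yield f_bp(x) > 0, matching the “No” answer in the text.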
Example E1.2 (Circuit Layout).
Given are a card with a finite set S of slots, a set C : |C| ≤ |S| of circuits to be placed on
that card, and a connection list R ⊆ C × C where (c_1, c_2) ∈ R means that the two circuits
c_1, c_2 ∈ C have to be connected. The goal is to find a layout x which assigns each circuit
c ∈ C to a location s ∈ S so that the overall length of wiring required to establish all the
connections r ∈ R becomes minimal, under the constraint that no wire must cross another
one. An example for such a scenario is given in Figure 1.3.

Figure 1.3: An example for circuit layouts: |S| = 20 × 11 slots, circuits C = {A, B, C, D, E, F, G, H, I, J, K, L, M}, and connections R = {(A,B), (B,C), (C,F), (C,G), (C,M), (D,E), (D,J), (E,F), (F,K), (G,J), (G,M), (H,J), (H,I), (H,L), (I,K), (I,L), (K,L), (L,M)}.
Similar circuit layout problems have existed in a variety of forms for quite a while now [1541]
and are known to be quite hard. They may be extended with additional objectives, such as
also minimizing the heat emission of the circuit, minimizing cross-wire induction, and so on.
They may also be extended to evolve not only the circuit locations but also the circuit types and
numbers [1479, 2792] and can thus become arbitrarily complex.
Example E1.3 (Job Shop Scheduling).
Job Shop Scheduling^11 (JSS) is considered to be one of the hardest classes of scheduling
problems and is also NP-complete [1026]. The goal is to distribute a set of tasks to machines
in a way that all deadlines are met.
In a JSS, a set T of jobs t_i ∈ T is given, where each job t_i consists of a number of
sub-tasks t_i,j. A partial order is defined on the sub-tasks t_i,j of each job t_i which describes
the precedence of the sub-tasks, i.e., constraints on the order in which they must be processed.
In the case of a beverage factory, for example, 10 000 bottles of cola (t_1 . . . t_10 000) and 5000 bottles of
lemonade (t_10 001 . . . t_15 000) may have to be produced.
For the cola bottles, the jobs would consist of three sub-jobs t_i,1 to t_i,3 for i ∈ 1..10 000,
since the bottles first need to be labeled, then they are filled with cola, and finally closed. It is
clear that the bottles cannot be closed before they are filled (t_i,2 ≺ t_i,3) whereas the labeling
11 http://en.wikipedia.org/wiki/Job_Shop_Scheduling [accessed 2010-08-27]
can take place at any time. For the lemonade, all production steps are the same as the
ones for the cola bottles, except that step t_i,2 ∀i ∈ 10 001..15 000 is replaced with filling the
bottles with lemonade.
It is furthermore obvious that the different sub-jobs have to be performed by different
machines m_i ∈ M. Each of the machines m_i can only process a single job at a time. Finally,
each of the jobs has a deadline by which it must be finished.
The optimization algorithm has to find a schedule S which defines for each job when it has
to be executed and on which machine, while adhering to the order and deadline constraints.
Usually, the optimization process would start with random configurations which violate
some of the constraints (typically the deadlines) and step by step reduce the violations
until results are found that fulfill all requirements.
The JSS can be considered a combinatorial problem since the optimizer creates a
sequence and assignment of jobs and the set of possible (sensible) starting times
is also finite.
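An optimizer for the JSS needs, at minimum, a way to count how many order, machine, and deadline constraints a candidate schedule violates. The Python sketch below does this for a toy instance; the schedule representation (a dict mapping sub-task ids to (machine, start, duration) triples) and the machine names are made up for illustration and are not the book's formal notation.

```python
def violations(schedule, precedences, deadlines):
    """Count how many constraints a candidate schedule violates."""
    count = 0
    # Precedence constraints: sub-task u must finish before v starts.
    for u, v in precedences:
        if schedule[u][1] + schedule[u][2] > schedule[v][1]:
            count += 1
    # Machine constraints: tasks on the same machine must not overlap.
    tasks = list(schedule.values())
    for i in range(len(tasks)):
        for j in range(i + 1, len(tasks)):
            m1, s1, d1 = tasks[i]
            m2, s2, d2 = tasks[j]
            if m1 == m2 and s1 < s2 + d2 and s2 < s1 + d1:
                count += 1
    # Deadline constraints: each job must finish by its deadline.
    for job, deadline in deadlines.items():
        finish = max(s + d for (jb, _), (m, s, d) in schedule.items()
                     if jb == job)
        if finish > deadline:
            count += 1
    return count

# Toy cola-bottle job 1: label, fill, close on machines "L", "F", "C".
schedule = {(1, 1): ("L", 0, 1),   # (job, sub-task): (machine, start, duration)
            (1, 2): ("F", 1, 1),
            (1, 3): ("C", 2, 1)}
precedences = [((1, 2), (1, 3))]   # filling must precede closing
assert violations(schedule, precedences, {1: 3}) == 0
```

An optimizer as described above would use such a count as (part of) its objective and drive it down to zero.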
Example E1.4 (Routing).
In a network N composed of a finite number of nodes, find the shortest path from one node n_1
to another node n_2. This optimization problem consists of composing a sequence of edges
which form a path of the shortest length from n_1 to n_2. This combinatorial problem is
rather simple and was already solved by Dijkstra [796] a long time ago. Nevertheless, it is an
optimization problem.
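As a sketch of how Example E1.4 is solved, here is a compact version of Dijkstra's algorithm over an adjacency-dict graph; the small example network and its edge weights are made up for illustration.

```python
import heapq

def dijkstra(graph, n1, n2):
    """Shortest path from n1 to n2 in a graph given as {node: {neighbor: weight}}."""
    dist = {n1: 0}
    prev = {}
    queue = [(0, n1)]                  # priority queue of (distance, node)
    while queue:
        d, u = heapq.heappop(queue)
        if u == n2:
            break                      # shortest distance to n2 is final
        if d > dist.get(u, float("inf")):
            continue                   # stale queue entry, skip it
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w        # found a shorter way to v
                prev[v] = u
                heapq.heappush(queue, (d + w, v))
    # reconstruct the path n1 -> n2 by walking the predecessor links
    path, u = [n2], n2
    while u != n1:
        u = prev[u]
        path.append(u)
    return list(reversed(path)), dist[n2]

graph = {"a": {"b": 1, "c": 4}, "b": {"c": 1, "d": 5},
         "c": {"d": 1}, "d": {}}
path, length = dijkstra(graph, "a", "d")
```

Here the dedicated algorithm exploits exact problem knowledge (edge weights and the triangle of relaxations), which is precisely why it beats any general optimization method on this task.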
1.2.2 Numerical Optimization Problems
Definition D1.9 (Numerical Optimization Problem). Numerical optimization problems^12 [1281, 2040] are problems that are defined over problem spaces which are subspaces of (uncountably infinite) numerical spaces, such as the real or complex vectors (X ⊆ R^n or X ⊆ C^n).

Numerical problems defined over R^n (or any kind of space containing it) are called continuous
optimization problems. Function optimization [194], engineering design optimization
tasks [2059], and classification and data mining tasks [633] are common continuous optimization
problems. They can often be solved efficiently with Evolution Strategies (Chapter 30),
Differential Evolution (Chapter 33), Evolutionary Programming (Chapter 32), Estimation
of Distribution Algorithms (Chapter 34), or Particle Swarm Optimization (Chapter 39), for
example.
Example E1.5 (Function Roots and Minimization).
The task “Find the roots of the function g(x) defined in Equation 1.3 and sketched in Figure 1.4”
can – at least with the author’s limited mathematical capabilities – hardly be solved manually.

g(x) = 5 + ln |sin(x^2 + e^x tan x) + cos x| − arctan |x − 3|    (1.3)

However, we can transform it into the optimization problem “Minimize the objective function
f_1(x) = |0 − g(x)| = |g(x)|” or “Minimize the objective function f_2(x) = (0 − g(x))^2 = (g(x))^2”.
Different from the original formulation, the objective functions f_1 and f_2 provide us with a
12 http://en.wikipedia.org/wiki/Optimization_(mathematics) [accessed 2010-08-04]
Figure 1.4: The graph of g(x) as given in Equation 1.3 with the four roots x⋆_1 to x⋆_4.
measure of how close an element x is to being a valid solution. For the global optima x⋆ of
the problem, f_1(x⋆) = f_2(x⋆) = 0 holds. Based on this formulation, we can hand any of the
two objective functions to a method capable of solving numerical optimization problems. If
these indeed discover elements x⋆ with f_1(x⋆) = f_2(x⋆) = 0, we have answered the initial
question, since the elements x⋆ are the roots of g(x).
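This transformation is easy to sketch in code. Since g(x) from Equation 1.3 is unwieldy, the sketch below substitutes a simpler stand-in h(x) = x² − 2 (an assumption for illustration only); minimizing f_2(x) = h(x)² with a tiny derivative-free search drives f_2 toward 0 and thereby locates the root √2.

```python
def h(x):
    """Stand-in for g(x): a simpler function whose roots we seek (here sqrt(2))."""
    return x * x - 2.0

def f2(x):
    """Objective from the text: squared deviation of h(x) from 0."""
    return h(x) ** 2

def pattern_search(f, x=0.0, step=1.0, tol=1e-9):
    """Tiny derivative-free minimizer: try x +/- step, halve the step on failure."""
    fx = f(x)
    while step > tol:
        moved = False
        for cand in (x + step, x - step):
            if f(cand) < fx:
                x, fx, moved = cand, f(cand), True
                break
        if not moved:
            step /= 2.0   # no improvement in either direction: refine
    return x

root = pattern_search(f2)     # converges near sqrt(2) ~ 1.41421
assert abs(h(root)) < 1e-3    # f2(root) ~ 0, so root is (nearly) a root of h
```

Any numerical optimizer from the chapters named above could take the place of `pattern_search` here; the point is only the reformulation of root-finding as minimization.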
Example E1.6 (Truss Optimization).
In a truss design optimization problem [18] as sketched in Figure 1.5, given are the locations of n
possible beams (light gray), a few fixed points (left-hand side of the truss sketches), and
a point where a weight will be attached (right-hand side of the truss sketches, F). The
thickness d_i of each of the i = 1..n beams which leads to the most stable truss whilst not
exceeding a maximum volume of material V would be subject to optimization. A structure
like the one sketched on the right-hand side of the figure could result from a numerical
optimization process: Some beams receive different non-zero thicknesses (illustrated with
thicker lines). Some other beams have been erased from the design (they receive thickness d_i = 0),
sketched as thin, light gray lines.

Figure 1.5: A truss optimization problem like the one given in [18]: the truss structure with its fixed points (left), the possible beam locations (gray), and the applied force (right, F) is turned by optimization into a possible result with chosen beam thicknesses (dark gray) and unused beams (thin light gray).
Notice that, depending on the problem formulation, truss optimization can also become
a discrete or even combinatorial problem: Not in all real-world design tasks can we
choose the thicknesses of the beams in the truss arbitrarily. This may well be possible
when synthesizing the support structures for airplane wings or motor blocks. However,
when building a truss from predefined building blocks, such as the beams available
for house construction, the set of available beam thicknesses is finite and, for example,
d_i ∈ {10mm, 20mm, 30mm, 50mm, 100mm, 300mm} ∀i ∈ 1..n. In such a case, a problem
such as the one sketched in Figure 1.5 effectively becomes combinatorial or, at least, a
discrete integer programming task.
1.2.3 Discrete and Mixed Problems
Of course, numerical problems may also be defined over discrete spaces, such as the N_0^n.
Problems of this type are called integer programming problems:

Definition D1.10 (Integer Programming). An integer programming problem^13 is an optimization problem which is defined over a finite or countably infinite numerical problem space (such as N_0^n or Z^n) [1495, 2965].
Basically, combinatorial optimization tasks can be considered as a class of such discrete
optimization problems. Because of the wide range of possible solution structures in combinatorial
tasks, however, techniques well capable of solving integer programming problems
13 http://en.wikipedia.org/wiki/Integer_programming [accessed 2010-08-04]
are not necessarily suitable for a given combinatorial task. The same goes for techniques
for real-valued vector optimization problems: the class of integer programming
problems is, of course, a member of the general numerical optimization tasks. However,
methods for real-valued problems are not necessarily good for integer-valued ones (and
hence, also not for combinatorial tasks). Furthermore, there exist optimization problems
which are combinations of the above classes. For instance, there may be a problem where
a graph must be built which has edges with real-valued properties, i.e., a crossover of a
combinatorial and a continuous optimization problem.

Definition D1.11 (Mixed Integer Programming). A mixed integer programming problem (MIP) is a numerical optimization problem where some variables stem from finite (or countably infinite) (numerical) spaces while others stem from uncountably infinite (numerical) spaces.
1.2.4 Summary
Figure 1.6 is not intended to provide a precise hierarchy of problem types. Instead, it is only
a rough sketch of the relations between them. Basically, there are smooth transitions between
many problem classes and the type of a given problem is often subject to interpretation.

Figure 1.6: A rough sketch of the relations between the problem types: continuous optimization problems, mixed-integer programming problems, and discrete optimization problems (with the integer programming problems and the overlapping combinatorial optimization problems) within the numerical optimization problems.
The candidate solutions of combinatorial problems stem from finite sets and are discrete.
They can thus always be expressed as discrete numerical vectors and could therefore theoretically
be solved with methods for numerical optimization, such as integer programming techniques.
Of course, every discrete numerical problem can also be expressed as a continuous numerical
problem, since any vector from the R^n can easily be transformed to a vector in the Z^n
by simply rounding each of its elements. Hence, the techniques for continuous optimization
can theoretically be used for discrete numerical problems (integer programming) as well as
for MIPs and for combinatorial problems.
However, algorithms for numerical optimization problems often apply numerical methods,
such as adding vectors, multiplying them with some number, or rotating vectors. Combinatorial
problems, on the other hand, are usually solved by applying combinatorial modifications
to solution structures, such as changing the order of elements in a sequence. These
two approaches are fundamentally different and thus, although algorithms for continuous
optimization can theoretically be used for combinatorial optimization, it is quite clear that
this is rarely the case in practice and would generally yield bad results.
Furthermore, the precision with which numbers can be represented on a computer is finite,
since the memory available in any possible physical device is limited. From this perspective,
any continuous optimization problem, if solved on a physical machine, is a discrete problem
as well. In theory, it could thus be solved with approaches for combinatorial optimization.
Again, it is quite easy to see that adjusting the values and sequences of bits in a floating-point
representation of a real number will probably lead to inferior results or, at least, to
slower optimization speed than using advanced mathematical operations.
Finally, there may be optimization tasks with both combinatorial and continuous characteristics.
Example E1.6 provides a problem which can exist in a continuous and in a
discrete flavor, both of which can be tackled with numerical methods, whereas the latter
can also be solved efficiently with approaches for discrete problems.
1.3 Classes of Optimization Algorithms
In this book, we will only be able to discuss a small fraction of the wide variety of Global
Optimization techniques [1068, 2126]. Before digging any deeper into the matter, we will
attempt to classify some of these algorithms in order to give a rough overview of what will
come in Part III and Part IV. Figure 1.7 sketches a rough taxonomy of Global Optimization
methods. Please be aware that the idea of creating a precise taxonomy of optimization
methods makes no sense. Not only are there many hybrid approaches which combine several
different techniques, there also exist various definitions for what things like metaheuristics,
artificial intelligence, Evolutionary Computation, Soft Computing, and so on exactly are
and which specific algorithms belong to them and which do not. Furthermore, many of
these fuzzy collective terms also widely overlap. The area of Global Optimization is a living
and breathing research discipline. Many contributions come from scientists from various
research areas, ranging from physics to economics and from biology to the fine arts. Just
as important are the numerous ideas developed by practitioners in engineering,
industry, and business. As stated before, Figure 1.7 should thus be considered as just a rough
sketch, as one possible way to classify optimizers according to their algorithm structure. In
the remainder of this section, we will try to outline some of the terms used in Figure 1.7.
1.3.1 Algorithm Classes
Generally, optimization algorithms can be divided into two basic classes: deterministic and
probabilistic algorithms.
1.3.1.1 Deterministic Algorithms
Definition D1.12 (Deterministic Algorithm). In each execution step of a deterministic algorithm, there exists at most one way to proceed. If no way to proceed exists, the algorithm has terminated.
For the same input data, deterministic algorithms will always produce the same results.
This means that whenever they have to make a decision on how to proceed, they will come
to the same decision each time the available data is the same.
Figure 1.7: Overview of optimization algorithms.

- Optimization [609, 2126, 2181, 2714]
  - Probabilistic Approaches
    - Metaheuristics (Part III) [1068, 1887]
      - Hill Climbing (Chapter 26) [2356]
      - Simulated Annealing (Chapter 27) [1541, 2043]
      - Tabu Search (Chapter 40) [1069, 1228]
      - Extremal Optimization (Chapter 41) [345, 725]
      - GRASP (Chapter 42) [902, 906]
      - Downhill Simplex (Chapter 43) [1655, 2017]
      - Random Optimization (Chapter 44) [526, 2926]
      - Harmony Search [1033]
      - Evolutionary Computation [715, 847, 863, 1220, 3025, 3043]
        - Evolutionary Algorithms (Chapter 28) [167, 171, 599]
          - Genetic Algorithms (Chapter 29) [1075, 1912]
          - Evolution Strategies (Chapter 30) [300, 2279]
          - Genetic Programming (Chapter 31) [1602, 2199]: SGP (Section 31.3) [1602, 1615], LGP (Section 31.4) [209, 394], GGGP (Section 31.5) [1853, 2358], graph-based GP (Section 31.6) [1901, 2191]
          - Differential Evolution (Chapter 33) [2220, 2621]
          - EDAs (Chapter 34) [1683, 2153]
          - Evolutionary Programming (Chapter 32) [940, 945]
          - LCS (Chapter 35) [430, 1256]
        - Swarm Intelligence [353, 881, 1524]: ACO (Chapter 37) [809, 810], RFD (Chapter 38) [229, 230], PSO (Chapter 39) [589, 853]
        - Memetic Algorithms (Chapter 36) [1141, 1961]
  - Deterministic Approaches [1103, 1271, 2040, 2126]
    - Search Algorithms (Chapter 46) [635, 2356]
      - Uninformed Search (Section 46.2) [1557, 2140]: BFS (Section 46.2.1), DFS (Section 46.2.2), IDDFS (Section 46.2.4) [2356]
      - Informed Search (Section 46.3) [1557, 2140]: Greedy Search (Section 46.3.1) [635], A⋆ Search (Section 46.3.2) [758]
    - Branch & Bound (Chapter 47) [673, 1663]
    - Cutting-Plane Method (Chapter 48) [152, 643]
Example E1.7 (Decision based on Objective Value).
Let us assume, for example, that the optimization process has to select one of the two
candidate solutions x_1 and x_2 for further investigation. The algorithm could base its decision
on the value that the objective function f subject to minimization takes on. If f(x_1) < f(x_2), then
x_1 is chosen and x_2 otherwise. For any two points x_i, x_j ∈ X, the decision procedure is the
same and deterministically chooses the candidate solution with the better objective value.

Deterministic algorithms are thus used when decisions can efficiently be made on the data
available to the optimization process. This is the case if a clear relation between the characteristics
of the possible solutions and their utility for a given problem exists. Then, the
search space can efficiently be explored with, for example, a divide and conquer scheme^14.
1.3.1.2 Probabilistic Algorithms
There may be situations, however, where deterministic decision making may prevent the
optimizer from finding the best results. If the relation between a candidate solution and its
“fitness” is not obvious, is dynamically changing, or is too complicated, or if the scale of the search
space is very high, applying many of the deterministic approaches becomes less efficient and
often infeasible. Then, probabilistic optimization algorithms come into play. Here lies the
focus of this book.
Definition D1.13 (Probabilistic Algorithm). A randomized^15 (or probabilistic or stochastic) algorithm includes at least one instruction that acts on the basis of random numbers. In other words, a probabilistic algorithm violates the constraint of determinism [1281, 1282, 1919, 1966, 1967].
Generally, exact algorithms can be much more efficient than probabilistic ones in many
domains. Probabilistic algorithms have the further drawback of not being deterministic,
i.e., they may produce different results in each run even for the same input. Yet, in situations like
the one described in Example E1.8, randomized decision making can be very advantageous.
Example E1.8 (Probabilistic Decision Making (Example E1.7 Cont.)).
In Example E1.7, we described a possible decision step in a deterministic optimization
algorithm, similar to a statement such as if f(x_1) < f(x_2) then investigate(x_1);
else investigate(x_2);. Although this decision may be the right one in most cases, it
may be possible that sometimes, it makes sense to investigate a less fit candidate solution.
Hence, it could be a good idea to rephrase the decision to “if f(x_1) < f(x_2), then choose
x_1 with 90% probability, but in 10% of the situations, continue with x_2”. In the case that
f(x_1) < f(x_2) actually holds, we could draw a random number uniformly distributed in [0, 1)
and if this number is less than 0.9, we would pick x_1 and otherwise investigate x_2. As a matter of
fact, exactly this kind of randomized decision making distinguishes the more powerful Simulated Annealing
algorithm (Chapter 27), which has been proven to be able to always find the global optimum
(possibly in infinite time), from simple Hill Climbing (Chapter 26), which does not have this
feature.
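The 90%/10% decision rule from this example can be sketched directly in code; the function name and the 10 000-trial frequency check are illustrative only.

```python
import random

def choose(f_x1, f_x2, rng, p=0.9):
    """Return 1 or 2: the index of the candidate to investigate next.
    The candidate with the better (smaller) objective value is chosen
    with probability p, the worse one with probability 1 - p."""
    better, worse = (1, 2) if f_x1 < f_x2 else (2, 1)
    return better if rng.random() < p else worse

rng = random.Random(42)                      # fixed seed: reproducible runs
picks = [choose(1.0, 2.0, rng) for _ in range(10_000)]
fraction_x1 = picks.count(1) / len(picks)    # close to 0.9
```

Note that the same call can return different results in different runs unless the random number generator is seeded, which is exactly the non-determinism discussed above.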
Early work in this area, which has now become one of the most important research fields in
optimization, was started at least sixty years ago. The foundations of many of the efficient
methods currently in use were laid by researchers such as Bledsoe and Browning [327],
Friedberg [995], Robbins and Monro [2313], and Bremermann [407].
14 http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm [accessed 2007-07-09]
15 http://en.wikipedia.org/wiki/Randomized_algorithm [accessed 2007-07-03]
34 1 INTRODUCTION
1.3.2 Monte Carlo Algorithms
Most probabilistic algorithms used for optimization are Monte Carlo-based approaches (see Section 12.4). They trade in the guaranteed global optimality of the solutions for a shorter runtime. This does not mean that the results obtained by using them are incorrect – they may just not be the global optima. Still, a result a little bit inferior to the best possible solution x⋆ is still better than waiting 10¹⁰⁰ years until x⋆ is found¹⁶ . . .
An umbrella term closely related to Monte Carlo algorithms is Soft Computing¹⁷ (SC), which summarizes all methods that find solutions for computationally hard tasks (such as NP-complete problems, see Section 12.2.2) in relatively short time, again by trading in optimality for higher computation speed [760, 1242, 1430, 2685].
1.3.3 Heuristics and Metaheuristics
Most efficient optimization algorithms which search for good individuals in some targeted way, i. e., all methods better than uninformed searches, utilize information from objective functions (see Section 2.2) or heuristic functions. The term heuristic stems from the Greek heuriskein, which translates to “to find” or “to discover”.
1.3.3.1 Heuristics
Heuristics used in optimization are functions that rate the utility of an element and provide support for the decision regarding which element out of a set of possible solutions is to be examined next. Deterministic algorithms usually employ heuristics to define the processing order of the candidate solutions. Approaches using such functions are called informed search methods (which will be discussed in Section 46.3 on page 518). Many probabilistic methods, on the other hand, only consider those elements of the search space in further computations that have been selected by the heuristic, while discarding the rest.
Definition D1.14 (Heuristic). A heuristic¹⁸ [1887, 2140, 2276] function φ is a problem-dependent approximate measure for the quality of one candidate solution.
A heuristic may, for instance, estimate the number of search steps required from a given solution to the optimal result. In this case, the lower the heuristic value, the better the solution would be. It is quite clear that heuristics are problem-class dependent. They can be considered as the equivalent of fitness or objective functions in (deterministic) state space search methods [1467] (see Section 46.3 and the more specific Definition D46.5 on page 518).
An objective function f(x) usually has a clear meaning denoting an absolute utility of a
candidate solution x. Diﬀerent from that, heuristic values φ(p) often represent more indirect
criteria such as the aforementioned search costs required to reach the global optimum [1467]
from a given individual.
Example E1.9 (Heuristic for the TSP from Example E2.2).
A heuristic function in the Traveling Salesman Problem involving ﬁve cities in China as
deﬁned in Example E2.2 on page 45 could, for instance, state “The total distance of this
candidate solution is 1024km longer than the optimal tour.” or “You need to swap no more
than three cities in this tour in order to ﬁnd the optimal solution”. Such a statement has
a diﬀerent quality from an objective function just stating the total tour length and being
subject to minimization.
¹⁶ Kindly refer to Chapter 12 for a more in-depth discussion of complexity.
¹⁷ http://en.wikipedia.org/wiki/Soft_computing [accessed 2007-09-17]
¹⁸ http://en.wikipedia.org/wiki/Heuristic_%28computer_science%29 [accessed 2007-07-03]
We will discuss heuristics in more depth in Section 46.3. It should be mentioned that optimization and artificial intelligence¹⁹ (AI) merge seamlessly with each other. This becomes especially clear when considering that one of the most important research areas of AI is the design of good heuristics for search, especially for the state space search algorithms which, here, are listed as deterministic optimization methods.
Anyway, knowing the proximity of the meaning of the terms heuristic and objective function, it may as well be possible to consider many of the following, randomized approaches to be informed searches (which brings us back to the fuzziness of terms). There is, however, another difference between heuristics and objective functions: Heuristics usually involve more knowledge about the relation between the structure of the candidate solutions and their utility, as may have become clear from Example E1.9.
1.3.3.2 Metaheuristics
This knowledge is a luxury often not available. In many cases, the optimization algorithms
have a “horizon” limited to the search operations and the objective functions with unknown
characteristics. They can produce new candidate solutions by applying search operators to
the currently available genotypes. Yet, the exact result of the operators is (especially in
probabilistic approaches) not clear in advance and has to be measured with the objective
functions as sketched in Figure 1.8. Then, the only choices an optimization algorithm has
are:
1. Which candidate solutions should be investigated further?
2. Which search operations should be applied to them? (If more than one operator is
available, that is.)
[Figure 1.8: Black box treatment of objective functions in metaheuristics. The figure sketches a population Pop(t) of candidate solutions being evaluated through a black-box objective function f: fitness or heuristic values are computed, the best solutions are selected, and new ones are created via search operations to form the next population Pop(t+1).]
¹⁹ http://en.wikipedia.org/wiki/Artificial_intelligence [accessed 2007-09-17]
Definition D1.15 (Metaheuristic). A metaheuristic²⁰ is a method for solving general classes of problems. It combines utility measures such as objective functions or heuristics in an abstract and hopefully efficient way, usually without utilizing deeper insight into their structure, i. e., by treating them as black-box procedures [342, 1068, 1103].
Metaheuristics, which you can ﬁnd discussed in Part III, obtain knowledge on the struc
ture of the problem space and the objective functions by utilizing statistics obtained from
stochastically sampling the search space. Many metaheuristics are inspired by a model of
some natural phenomenon or physical process. Simulated Annealing, for example, decides
which candidate solution is used during the next search step according to the Boltzmann
probability factor of atom conﬁgurations of solidifying metal melts. The Raindrop Method
emulates the waves on a sheet of water after a raindrop has fallen into it. There also are techniques without a direct real-world role model, like Tabu Search and Random Optimization.
Uniﬁed models of metaheuristic optimization procedures have been proposed by Osman
[2103], RaywardSmith [2275], Vaessens et al. [2761, 2762], and Taillard et al. [2650].
Obviously, metaheuristics are not necessarily probabilistic methods. The A⋆ search, for instance, a deterministic informed search approach, is considered to be a metaheuristic, too (although it incorporates more knowledge about the structure of the heuristic functions than, for instance, Evolutionary Algorithms do). Still, in this book, we are mostly interested in probabilistic metaheuristics.
1.3.3.2.1 Summary: Diﬀerence between Heuristics and Metaheuristics
The terms heuristic, objective function, fitness function (which will be introduced in detail in Section 28.1.2.5 and Section 28.3 on page 274), and metaheuristic may easily be mixed up, and it is maybe not entirely obvious which is which and where the differences lie. Therefore, we here want to give a short outline of the main differences:

1. Heuristic. Problem-specific measure of the potential of a solution as a step towards the global optimum. May be absolute/direct (describing the utility of a solution) or indirect (e.g., approximating the distance of the solution to the optimum in terms of search steps).

2. Objective Function. Problem-specific absolute/direct quality measure which rates one aspect of a candidate solution.

3. Fitness Function. Secondary utility measure which may combine information from a set of primary utility measures (such as objective functions or heuristics) into a single rating for an individual. This process may also take place relative to a set of given individuals, i. e., rate the utility of a solution in comparison to another set of solutions (such as a population in Evolutionary Algorithms).

4. Metaheuristic. General, problem-independent (tertiary) approach to select solutions for further investigation in order to solve an optimization problem. It therefore uses a (primary or secondary) utility measure such as a heuristic, objective function, or fitness function.
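The distinction may become clearer when expressed as type signatures. The following Java sketch is our own illustration (the interface names and generics are assumptions, not taken from the book's source code):

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical type signatures distinguishing the four utility concepts. */
public class UtilityMeasures {

    /** Objective function: problem-specific, absolute rating of one solution aspect. */
    interface Objective<X> extends ToDoubleFunction<X> { }

    /** Heuristic: problem-specific, possibly indirect estimate (e.g., steps to the optimum). */
    interface Heuristic<X> extends ToDoubleFunction<X> { }

    /** Fitness: secondary measure, may rate a solution relative to a set of others. */
    interface Fitness<X> {
        double rate(X solution, List<X> population);
    }

    /** Metaheuristic: problem-independent, uses a utility measure as a black box. */
    interface Metaheuristic<X> {
        X solve(Objective<X> f, List<X> initial);
    }

    public static void main(String[] args) {
        Objective<Double> f = x -> Math.abs(x - 3.0); // minimize the distance to 3
        // a trivial "metaheuristic": return the best of the initial candidates
        Metaheuristic<Double> pickBest = (obj, pop) -> pop.stream()
                .min((u, v) -> Double.compare(obj.applyAsDouble(u), obj.applyAsDouble(v)))
                .get();
        System.out.println(pickBest.solve(f, List.of(1.0, 2.5, 5.0))); // prints 2.5
    }
}
```

Note how only the metaheuristic is generic over the problem: it never looks inside f, it merely calls it.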
1.3.4 Evolutionary Computation
An important class of probabilistic Monte Carlo algorithms is Evolutionary Computation²¹ (EC). It encompasses all randomized metaheuristic approaches which iteratively refine a set (population) of multiple candidate solutions.
²⁰ http://en.wikipedia.org/wiki/Metaheuristic [accessed 2007-07-03]
²¹ http://en.wikipedia.org/wiki/Evolutionary_computation [accessed 2007-09-17]
Some of the most important EC approaches are Evolutionary Algorithms and Swarm Intelligence, which will be discussed in depth in this book. Swarm Intelligence is inspired by the fact that natural systems composed of many independent, simple agents – ants in the case of Ant Colony Optimization (ACO) and birds or fish in the case of Particle Swarm Optimization (PSO) – are often able to find pieces of food or shortest-distance routes very efficiently.
Evolutionary Algorithms, on the other hand, copy the process of natural evolution and treat candidate solutions as individuals which compete and reproduce in a virtual environment. Generation after generation, these individuals adapt better and better to the environment (defined by the user-specified objective function(s)) and thus become more and more suitable solutions to the problem at hand.
1.3.5 Evolutionary Algorithms
Evolutionary Algorithms can again be divided into very diverse approaches, spanning from Genetic Algorithms (GAs), which try to find bit- or integer-strings of high utility (see, for instance, Example E4.2 on page 83), and Evolution Strategies (ESs), which do the same with vectors of real numbers, to Genetic Programming (GP), where optimal tree data structures or programs are sought. The variety of Evolutionary Algorithms is the focus of this book, also including Memetic Algorithms (MAs), which are EAs hybridized with other search algorithms.
1.4 Classiﬁcation According to Optimization Time
The taxonomy just introduced classifies the optimization methods according to their algorithmic structure and underlying principles, in other words, from the viewpoint of theory. A
software engineer or a user who wants to solve a problem with such an approach is however
more interested in its “interfacing features” such as speed and precision.
Speed and precision are often conflicting objectives, at least in terms of probabilistic algorithms. A general rule of thumb is that improvements in the accuracy of optimization can only be gained by investing more time. Scientists in the area of global optimization try to push this Pareto frontier²² further by inventing new approaches and enhancing or tweaking existing ones. When it comes to time constraints and, hence, the required speed of the optimization algorithm, we can distinguish two types of optimization use cases.
Definition D1.16 (Online Optimization). Online optimization problems are tasks that need to be solved quickly, in a time span usually ranging from ten milliseconds to a few minutes.
In order to find a solution in this short time, optimality is often significantly traded in for speed gains. Examples for online optimization are robot localization, load balancing, service composition for business processes, or updating a factory’s machine job schedule after new orders have come in.
Example E1.10 (Robotic Soccer).
Figure 1.9 shows some of the robots of the robotic soccer team²³ [179] as they play in the RoboCup German Open 2009. In the games of the RoboCup competition, two teams, each consisting of a fixed number of robotic players, oppose each other on a soccer field. The rules are similar to human soccer and the team which scores the most goals wins.
When a soccer playing robot approaches the goal of the opposing team with the ball, it has only a very limited time frame to decide what to do [2518]. Should it shoot directly at

²² Pareto frontiers have been discussed in Section 3.3.5 on page 65.
²³ http://carpenoctem.daslab.net
Figure 1.9: The robotic soccer team CarpeNoctem (on the right side) on the fourth day of
the RoboCup German Open 2009.
the goal? Should it try to pass the ball to a teammate? Should it continue to drive towards the goal in order to get into a better shooting position?
To find a sequence of actions which maximizes the chance of scoring a goal in a continuously changing world is an online optimization problem. If a robot takes too long to make
a decision, opposing and cooperating robots may already have moved to diﬀerent locations
and the decision may have become outdated.
Not only must the problem be solved in a few milliseconds, this process often involves
ﬁnding an agreement between diﬀerent, independent robots each having a possibly diﬀerent
view on what is currently going on in the world. Additionally, the degrees of freedom in the
real world, the points where the robot can move or shoot the ball to, are basically inﬁnite.
It is clear that in such situations, no complex optimization algorithm can be applied.
Instead, only a smaller set of possible alternatives can be tested or, alternatively, predeﬁned
programs may be processed, thus ignoring the optimization character of the decision making.
Online optimization tasks are often carried out repetitively – new orders will, for instance,
continuously arrive in a production facility and need to be scheduled to machines in a way
that minimizes the waiting time of all jobs.
Definition D1.17 (Offline Optimization). In offline optimization problems, time is not so important and a user is willing to wait maybe up to days or weeks if she can get an optimal or close-to-optimal result.
Such problems concern, for example, design optimization, some data mining tasks, or creating long-term schedules for transportation crews. These optimization processes will usually be carried out only once in a long time.
Before doing anything else, one must be sure to which of these two classes the problem to be solved belongs. There are two measures which characterize whether an optimization algorithm is suited to the timing constraints of a problem:
1. The time consumption of the algorithm itself.
2. The number of evaluations τ until the optimum (or a suﬃciently good candidate
solution) is discovered, i. e., the ﬁrst hitting time FHT (or its expected value).
An algorithm with high internal complexity and a low FHT may be suitable for a problem
where evaluating the objective functions takes long. On the other hand, it may be less
feasible than a simple approach if the quality of a candidate solution can be determined
quickly.
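This trade-off can be made concrete with a back-of-the-envelope model (our own sketch, with invented timing numbers): the total runtime is roughly FHT · (t_algo + t_eval), where t_algo is the per-step overhead of the optimizer and t_eval is the cost of one evaluation of the objective function.

```java
/** Back-of-the-envelope comparison of two hypothetical optimizers. */
public class RuntimeModel {

    /** Total runtime in ms: expected first hitting time times cost per step. */
    static double totalMs(double fht, double tAlgoMs, double tEvalMs) {
        return fht * (tAlgoMs + tEvalMs);
    }

    public static void main(String[] args) {
        // complex optimizer: 50 ms overhead per step, but needs only 1 000 evaluations
        // simple optimizer:  0.01 ms overhead per step, but needs 100 000 evaluations
        for (double tEval : new double[] {0.1, 100.0}) {
            double complex = totalMs(1_000, 50.0, tEval);
            double simple  = totalMs(100_000, 0.01, tEval);
            System.out.printf("tEval=%6.1f ms: complex=%12.1f ms, simple=%12.1f ms%n",
                              tEval, complex, simple);
        }
    }
}
```

With cheap evaluations (0.1 ms) the simple optimizer wins despite needing 100 times more steps; with expensive evaluations (100 ms) the complex optimizer wins by a wide margin.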
1.5 Number of Optimization Criteria
Optimization algorithms can be divided into

1. those which try to find the best values of single objective functions f (see Section 3.1 on page 51),

2. those which optimize sets f of target functions (see Section 3.3 on page 55), and

3. those which optimize one or multiple objective functions while adhering to additional constraints (see Section 3.4 on page 70).

This distinction between single-objective, multi-objective, and constrained optimization will be discussed in depth in Section 3.3. In this chapter, we will also introduce unifying approaches which allow us to transform multi-objective problems into single-objective ones and to gain a joint perspective on constrained and unconstrained optimization.
1.6 Introductory Example
With Algorithm 1.1, we want to give a short introductory example of how metaheuristic optimization algorithms may look and of which modules they can be composed. Algorithm 1.1 is based on the bin packing Example E1.1 given on page 25. It should be noted that we do not claim that Algorithm 1.1 is any good at solving the problem; as a matter of fact, the opposite is the case. However, it possesses typical components which can be found in most metaheuristics and can relatively easily be understood.
As stated in Example E1.1, the goal of bin packing is to decide whether an assignment x of n objects (having the sizes a₁, a₂, .., aₙ) to (at most) k bins, each able to hold a volume of b, exists where the capacity of no bin is exceeded. This decision problem can be translated into an optimization problem where the goal is to discover the x⋆ where the bin capacities are exceeded the least, i.e., to minimize the objective function fbp given in Equation 1.2 on page 26. Objective functions are discussed in Section 2.2; in Algorithm 1.1, fbp is evaluated in Line 23.
Obviously, if fbp(x⋆) = 0, the answer to the decision problem is “Yes, an assignment which does not violate the constraints exists.” and otherwise, the answer is “No, it is not possible to pack the n specified objects into the k bins of size b.”
For fbp in Example E1.1, we represented a possible assignment of the objects to bins as a tuple x consisting of k sets. Each set can hold numbers from 1..n. If, for one specific x, 5 ∈ x[0], for example, then the fifth object should be placed into the first bin, and 3 ∈ x[3] means that the third object is to be put into the fourth bin. The set X of all such possible solutions x is called the problem space and is discussed in Section 2.1.
Instead of exploring this space directly, Example E1.1 uses a different search space G (see Section 4.1). Each of these encoded candidate solutions, each so-called genotype, is a permutation of the first n natural numbers. The sequence of these numbers stands for the assignment order of objects to bins.
This representation is translated between Lines 13 and 22 to the tuple-of-sets representation which can be evaluated by the objective function fbp. Basically, this genotype-phenotype mapping gpm : G → X (see Section 4.3) starts with a tuple of k empty sets and sequentially processes the genotype g from the beginning to the end. In each processing step, it takes the next number from the genotype and checks whether the object identified by this number still fits into the current bin bin. If so, the number is placed into the set in the tuple x at index bin. Otherwise, if there are empty bins left, bin is increased. This way, the first k − 1 bins are always filled to at most their capacity and only the last bin may overflow. The translation between the genotypes g ∈ G and the candidate solutions x ∈ X is deterministic.
It is known that bin packing is NP-hard, which means that there currently exists no algorithm which is always better than enumerating all possible solutions and testing each
initialization: nullary search operation searchOp : ∅ → G (see Section 4.2)
search step: unary search operation searchOp : G → G (see Section 4.2)
genotype-phenotype mapping gpm : G → X (see Section 4.3)
objective function f : X → R (see Section 2.2)
What is good? (see Chapter 3 and Section 5.2)
termination criterion (see Section 6.3.3)
Algorithm 1.1: x̄ ←− optimizeBinPacking(k, b, a) . . . . . . . . . . . . . . . (see Example E1.1)
Input: k ∈ N₁: the number of bins
Input: b ∈ N₁: the size of each single bin
Input: a₁, a₂, .., aₙ ∈ N₁: the sizes of the n objects
Data: g, g⋆ ∈ G: the current and best genotype so far . . . . . . . . . . (see Definition D4.2)
Data: x, x̄ ∈ X: the current and best phenotype so far . . . . . . . . . . (see Definition D2.2)
Data: y, y⋆ ∈ R⁺: the current and best objective value . . . . . . . . . (see Definition D2.3)
Output: x̄ ∈ X: the best phenotype found . . . . . . . . . . . . . . . . . . . . . (see Definition D3.5)

 1 begin
 2   g⋆ ←− (1, 2, .., n)
 3   y⋆ ←− +∞
 4   repeat
       // perform a random search step
 5     g ←− g⋆
 6     i ←− ⌊randomUni[1, n)⌋
 7     repeat
 8       j ←− ⌊randomUni[1, n)⌋
 9     until i ≠ j
10     t ←− g[i]
11     g[i] ←− g[j]
12     g[j] ←− t
       // compute the bins
13     x ←− (∅, ∅, . . . , ∅)  (k times)
14     bin ←− 0
15     sum ←− 0
16     for i ←− 0 up to n − 1 do
17       j ←− g[i]
18       if ((sum + aⱼ) > b) ∧ (bin < (k − 1)) then
19         bin ←− bin + 1
20         sum ←− 0
21       x[bin] ←− x[bin] ∪ {j}
22       sum ←− sum + aⱼ
       // compute the objective value
23     y ←− fbp(x)
       // the search logic: always choose the better point
24     if y < y⋆ then
25       g⋆ ←− g
26       x̄ ←− x
27       y⋆ ←− y
28   until y⋆ = 0
29   return x̄
of them. Our Algorithm 1.1 tries to explore the search space by starting with putting the objects into the bins in their original order (Line 2). In each optimization step, it randomly exchanges two objects (Lines 5 to 12). This operation basically is a unary search operation which maps the elements of the search space G to new elements in G and creates new genotypes g. The initialization step can be viewed as a nullary search operation. The basic definitions for search operations are given in Section 4.2 on page 83.
In Line 24, the optimizer compares the objective value of the best assignment found so far with the objective value of the candidate solution x decoded from the newly created genotype g with the genotype-phenotype mapping “gpm”. If the new phenotype x is better than the old one, the search will from now on use its corresponding genotype (g⋆ ←− g) for creating new candidate solutions with the search operation and remembers x as well as its objective value y. A discussion of what the term better actually means in optimization is given in Chapter 3.
The search process in Algorithm 1.1 continues until a perfect packing, i. e., one with no overflowing bin, is discovered. This termination criterion (see Section 6.3.3), checked in Line 28, is not chosen too wisely, since if no solution can be found, the algorithm will run forever. Effectively, this Hill Climbing approach (see Chapter 26 on page 229) is thus a Las Vegas algorithm (see Definition D12.2 on page 148).
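For illustration (and as a possible starting point for Task 4), Algorithm 1.1 can be sketched in Java as follows. Two deviations are our own assumptions: since Equation 1.2 is not reproduced in this section, we use the overflow of the last bin, max(0, load − b), as a stand-in for fbp, and we add a step limit so that the program terminates even on infeasible instances:

```java
import java.util.Random;

/** Hill Climbing sketch of Algorithm 1.1 for the bin packing problem. */
public class BinPackingHC {

    /**
     * Genotype-phenotype mapping plus objective function in one step: decode
     * the permutation g greedily into k bins of size b and return the overflow
     * of the last bin (a stand-in for f_bp; by construction, only the last bin
     * can overflow).
     */
    static int evaluate(int[] g, int k, int b, int[] a) {
        int bin = 0, sum = 0;
        for (int obj : g) {
            int size = a[obj - 1];               // objects are numbered 1..n
            if (sum + size > b && bin < k - 1) { // open the next bin if one is left
                bin++;
                sum = 0;
            }
            sum += size;
        }
        return Math.max(0, sum - b);
    }

    /** Randomly swap two positions of the best genotype; keep improvements only. */
    static int optimize(int k, int b, int[] a, int maxSteps, Random rnd) {
        int n = a.length;
        int[] best = new int[n];
        for (int i = 0; i < n; i++) best[i] = i + 1; // initial order (1, 2, .., n)
        int yBest = evaluate(best, k, b, a);
        for (int step = 0; step < maxSteps && yBest > 0; step++) {
            int[] g = best.clone();
            int i = rnd.nextInt(n), j;
            do { j = rnd.nextInt(n); } while (j == i);
            int t = g[i]; g[i] = g[j]; g[j] = t;     // unary search operation: swap
            int y = evaluate(g, k, b, a);
            if (y < yBest) { best = g; yBest = y; }  // always choose the better point
        }
        return yBest;                                // 0 means a perfect packing was found
    }

    public static void main(String[] args) {
        int[] a = {6, 5, 4, 3, 3, 3, 2, 2, 1};       // the instance from Task 4
        System.out.println(optimize(5, 6, a, 100_000, new Random(1)));
    }
}
```

For the Task 4 instance, the genotype (1, 2, 9, 3, 7, 4, 5, 6, 8) decodes to the perfect packing {6}, {5, 1}, {4, 2}, {3, 3}, {3, 2} with objective value 0, while the initial order yields an overflow of 2; whether the random swaps find an improving path depends on the run.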
In the following chapters, we will put a simple and yet comprehensive theory around the structure of metaheuristics. We will give clear definitions for the typical modules of an optimization algorithm and outline how they interact. Algorithm 1.1 here served as an example to show the general direction of where we will be going.
Tasks T1
1. Name five optimization problems which can be encountered in daily life. Describe the criteria which are subject to optimization, whether they should be maximized or minimized, and what the investigated space of possible solutions looks like.
[10 points]
2. Name ﬁve optimization problems which can be encountered in engineering, business,
or science. Describe them in the same way as requested in Task 1.
[10 points]
3. Classify the problems from Tasks 1 and 2 into numerical, combinatorial, discrete, and mixed integer programming, the classes introduced in Section 1.2 and sketched in Figure 1.6 on page 30. Check whether different classifications are possible for the problems and, if so, outline how the problems should be formulated in each case. The truss optimization problem (see Example E1.6 on page 29) is, for example, usually considered to be a numerical problem since the thickness of each beam is a real value. If there is, however, a finite set of beam thicknesses we can choose from, it becomes a discrete problem and could even be considered a combinatorial one.
[10 points]
4. Implement Algorithm 1.1 on page 40 in a programming language of your choice. Apply your implementation to the bin packing instance given in Figure 1.2 on page 25 and to the problem k = 5, b = 6, a = (6, 5, 4, 3, 3, 3, 2, 2, 1) and (hence) n = 9. Does the algorithm work? Describe your observations; use a step-by-step debugger if necessary.
[20 points]
5. Find at least three features which make Algorithm 1.1 on page 40 rather unsuitable for
practical applications. Describe these drawbacks, their impact, and point out possible
remedies. You may use an implementation of the algorithm as requested in Task 4 for
gaining experience with what is problematic for the algorithm.
[6 points]
6. What are the advantages and disadvantages of randomized/probabilistic optimization
algorithms?
[4 points]
7. In which ways are the requirements for online and offline optimization different? Does a need for online optimization influence, in your opinion, the choice of the optimization algorithm and the general algorithm structure which can be used?
[5 points]
8. What are the diﬀerences between solving general optimization problems and speciﬁc
ones? Discuss especially the diﬀerence between black box optimization and solving
functions which are completely known and maybe even can be diﬀerentiated.
[5 points]
Chapter 2
Problem Space and Objective Functions
2.1 Problem Space: What does it look like?
The first step which has to be taken when tackling any optimization problem is to define the structure of the possible solutions. For minimizing a simple mathematical function, the solution will be a real number, and if we wish to determine the optimal shape of an airplane’s wing, we are looking for a blueprint describing it.
Deﬁnition D2.1 (Problem Space). The problem space (phenome) X of an optimization
problem is the set containing all elements x which could be its solution.
Usually, more than one problem space can be defined for a given optimization problem. A few lines before, we said that as problem space for minimizing a mathematical function, the real numbers R would be fine. On the other hand, we could as well restrict ourselves to the natural numbers N₀ or widen the search to the whole complex plane C. This choice has major impact: On one hand, it determines which solutions we can possibly find. On the other hand, it also has subtle influence on the way in which an optimization algorithm can perform its search. Between each two different points in R, for instance, there are infinitely many other numbers, while in N₀, there are not. Then again, unlike N₀ and R, there is no total order defined on C.
Deﬁnition D2.2 (Candidate Solution). A candidate solution (phenotype) x is an ele
ment of the problem space X of a certain optimization problem.
In this book, we will especially focus on evolutionary optimization algorithms. One of the oldest such methods, the Genetic Algorithm, is inspired by natural evolution and reuses many terms from biology. Following the convention of Genetic Algorithms, we synonymously refer to the problem space as phenome and to candidate solutions as phenotypes. The problem space X is often restricted by practical constraints. It is, for instance, impossible to take all real numbers into consideration in the minimization process of a real function. On our off-the-shelf CPUs or with the Java programming language, we can only use 64 bit floating point numbers, with which numbers can only be expressed up to a certain precision. If the computer can only handle numbers up to 15 decimals, an element x = 1 + 10⁻¹⁶ cannot be discovered.
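This limitation is easy to demonstrate with a two-line Java check:

```java
/** 64-bit floating point numbers cannot represent 1 + 10^-16 distinctly from 1. */
public class Precision {
    public static void main(String[] args) {
        System.out.println(1.0 + 1e-16 == 1.0); // prints true: the 10^-16 is lost
        System.out.println(1.0 + 1e-15 == 1.0); // prints false: still representable
    }
}
```

The threshold is the machine epsilon of the double type (about 2.2 · 10⁻¹⁶ around 1.0): increments below half of it are rounded away.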
2.2 Objective Functions: Is it good?
The next basic step when deﬁning an optimization problem is to deﬁne a measure for what
is good. Such measures are called objective functions. As parameter, they take a candidate
solution x from the problem space X and return a value which rates its quality.
Definition D2.3 (Objective Function). An objective function f : X → R is a mathematical function which is subject to optimization.
In Listing 55.8 on page 712, you can ﬁnd a Java interface resembling Deﬁnition D2.3.
Usually, objective functions are subject to minimization. In other words, the smaller f(x),
the better is the candidate solution x ∈ X. Notice that although we state that an objective
function is a mathematical function, this is not always fully true. In Section 51.10 on
page 646 we introduce the concept of functions as functional binary relations (see ?? on
page ??) which assign to each element x from the domain X one and only one element from
the codomain (in case of the objective functions, the codomain is R). However, the evaluation
of the objective functions may incorporate randomized simulations (Example E2.3) or even
human interaction (Example E2.6). In such cases, the functionality constraint may be
violated. For the purpose of simplicity, let us, however, stick with the concept given in
Deﬁnition D2.3.
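A minimal Java interface resembling Definition D2.3 might look as follows (our own sketch; the actual Listing 55.8 in the book may differ in names and details):

```java
/** An objective function f : X -> R subject to minimization (sketch of Definition D2.3). */
public interface ObjectiveFunction<X> {

    /** Compute the objective value of the candidate solution x; smaller is better. */
    double compute(X x);
}
```

For instance, the stone's throw of the upcoming Example E2.1 could be written as `ObjectiveFunction<Double> f = x -> Math.abs(100.0 - d(x));` for a suitable distance function d.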
In the following, we will use several examples to build an understanding of what the three concepts problem space X, candidate solution x, and objective function f mean. These examples should give an impression of the variety of possible ways in which they can be specified and structured.
Example E2.1 (A Stone’s Throw).
A very simple optimization problem is the question: Which is the best velocity with which I should throw¹ a stone (at an α = 15° angle) so that it lands exactly 100m away (assuming uniform gravity and the absence of drag and wind)? Every one of us has solved countless such tasks in school and it probably never came to our mind to consider them as optimization problems.
[Figure 2.1: A sketch of the stone’s throw in Example E2.1, showing the launch angle α, the target distance 100m, the actual throw distance d(x), and the miss distance f(x).]
The problem space X of this problem equals the real numbers R and we can define an objective function f : R → R which denotes the absolute difference between the actual throw distance d(x) and the target distance 100m when a stone is thrown with x m/s, as sketched in Figure 2.1:

f(x) = |100m − d(x)|   (2.1)

d(x) = (x²/g) sin(2α)   (2.2)
     ≈ 0.051 s²/m ∗ x²   with g ≈ 9.81 m/s², α = 15°   (2.3)
¹ http://en.wikipedia.org/wiki/Trajectory [accessed 2009-06-13]
This problem can easily be solved and we can obtain the (globally) optimal solution x⋆, since we know that the best value which f can take on is zero:

f(x⋆) = 0 ≈ |100m − 0.051 s²/m ∗ (x⋆)²|   (2.4)
0.051 s²/m ∗ (x⋆)² ≈ 100m   (2.5)
x⋆ ≈ √(100m / 0.051 s²/m) ≈ √(1960.78 m²/s²) ≈ 44.28 m/s   (2.6)
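The derivation can be checked numerically. The following Java sketch uses the exact relation d(x) = (x²/g) sin(2α) rather than the rounded factor 0.051 s²/m:

```java
/** Numeric check of Example E2.1: find the speed that lands the stone at 100 m. */
public class StonesThrow {

    static final double G = 9.81;                       // gravity in m/s^2
    static final double ALPHA = Math.toRadians(15.0);   // launch angle

    /** Throw distance in meters for launch speed x in m/s (no drag, uniform gravity). */
    static double d(double x) {
        return x * x * Math.sin(2.0 * ALPHA) / G;
    }

    public static void main(String[] args) {
        // invert d(x) = 100 m directly: x* = sqrt(100 * g / sin(2 alpha))
        double xStar = Math.sqrt(100.0 * G / Math.sin(2.0 * ALPHA));
        System.out.printf("x* = %.2f m/s, d(x*) = %.2f m%n", xStar, d(xStar));
        // x* = 44.29 m/s: close to the 44.28 m/s obtained with the rounded Equation 2.6
    }
}
```

The tiny difference to Equation 2.6 stems solely from rounding sin(2α)/g to 0.051 s²/m.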
Example E2.1 was very easy. We had a very clear objective function f for which even the precise mathematical definition was available and, if not known beforehand, could easily have been obtained from the literature. Furthermore, the space of possible solutions, the problem space, was very simple since it equaled the (positive) real numbers.
Example E2.2 (Traveling Salesman Problem).
The Traveling Salesman Problem [118, 1124, 1694] was first mentioned in 1831 by Voigt [2816] and a similar (historic) problem was stated by the Irish mathematician Hamilton [1024]. Its basic idea is that a salesman wants to visit n cities in the shortest possible time under the assumption that no city is visited twice and that he will again arrive at the origin by the end of the tour. The pairwise distances of the cities are known. This problem can be formalized with the aid of graph theory (see Chapter 52) as follows:

Definition D2.4 (Traveling Salesman Problem). The goal of the Traveling Salesman Problem (TSP) is to find a cyclic path of minimum total weight² which visits all vertices of a weighted graph.
[Figure 2.2: An example for Traveling Salesman Problems, based on data from Google Maps. The figure shows the five cities Beijing, Chengdu, Guangzhou, Hefei, and Shanghai on a map together with the symmetric distance matrix w(a, b):

w(a, b)      Beijing   Chengdu   Guangzhou   Hefei
Chengdu      1854km
Guangzhou    2174km    1954km
Hefei        1044km    1615km    1257km
Shanghai     1244km    2095km    1529km      472km ]
Assume that we have the task to solve the Traveling Salesman Problem given in Figure 2.2, i. e., to find the shortest cyclic tour through Beijing, Chengdu, Guangzhou, Hefei, and Shanghai. The (symmetric) distance matrix³ for computing a tour’s length is given in the figure, too.
Here, the set of possible solutions X is not as trivial as in Example E2.1. Instead of
being a mapping from the real numbers to the real numbers, the objective function f takes
² see Equation 52.3 on page 651
³ Based on the routing utility GoogleMaps (http://maps.google.com/ [accessed 2009-06-16]).
46 2 PROBLEM SPACE AND OBJECTIVE FUNCTIONS
one possible permutation of the ﬁve cities as parameter. Since the tours which we want
to ﬁnd are cyclic, we only need to take the permutations of four cities into consideration.
If we, for instance, assume that the tour starts in Hefei, it will also end in Hefei and each
possible solution is completely characterized by the sequence in which Beijing, Chengdu,
Guangzhou, and Shanghai are visited. Hence, X is the set of all such permutations.
X = Π({Beijing, Chengdu, Guangzhou, Shanghai}) (2.7)
We can furthermore halve the search space by recognizing that it plays no role for the
total distance in which direction we take our tour x ∈ X. In other words, a tour
(Hefei →Shanghai → Beijing → Chengdu →Guangzhou →Hefei) has the same length and
is equivalent to (Hefei →Guangzhou →Chengdu → Beijing → Shanghai →Hefei). Hence,
for a TSP with n cities, there are (1/2) · (n − 1)! possible solutions. In our case, this amounts to 12 unique permutations.
f(x) = w(Hefei, x[0]) + Σ_{i=0}^{2} w(x[i], x[i + 1]) + w(x[3], Hefei) (2.8)
The objective function f : X → R – the criterion subject to optimization – is the total distance of a tour. It equals the sum of the distances between the consecutive cities x[i] in the list x representing a permutation, plus the distances to and from the start and end point Hefei. We again want to minimize it. Computing it is again rather simple but, diﬀerent from Example E2.1, we cannot simply solve for the optimal result x⋆ analytically. As a matter of fact, the Traveling Salesman Problem is one of the most complicated optimization problems.
Here, we will determine x⋆ with a so-called brute force⁴ approach: We enumerate all possible solutions and their corresponding objective values and then pick the best one.
      the tour xᵢ                                                  f(xᵢ)
x₁    Hefei → Beijing → Chengdu → Guangzhou → Shanghai → Hefei     6853 km
x₂    Hefei → Beijing → Chengdu → Shanghai → Guangzhou → Hefei     7779 km
x₃    Hefei → Beijing → Guangzhou → Chengdu → Shanghai → Hefei     7739 km
x₄    Hefei → Beijing → Guangzhou → Shanghai → Chengdu → Hefei     8457 km
x₅    Hefei → Beijing → Shanghai → Chengdu → Guangzhou → Hefei     7594 km
x₆    Hefei → Beijing → Shanghai → Guangzhou → Chengdu → Hefei     7386 km
x₇    Hefei → Chengdu → Beijing → Guangzhou → Shanghai → Hefei     7644 km
x₈    Hefei → Chengdu → Beijing → Shanghai → Guangzhou → Hefei     7499 km
x₉    Hefei → Chengdu → Guangzhou → Beijing → Shanghai → Hefei     7459 km
x₁₀   Hefei → Chengdu → Shanghai → Beijing → Guangzhou → Hefei     8385 km
x₁₁   Hefei → Guangzhou → Beijing → Chengdu → Shanghai → Hefei     7852 km
x₁₂   Hefei → Guangzhou → Chengdu → Beijing → Shanghai → Hefei     6781 km

Table 2.1: The unique tours for Example E2.2.
From Table 2.1 we can easily spot the best solution x⋆ = x₁₂, the route over Guangzhou, Chengdu, Beijing, and Shanghai, both starting and ending in Hefei. With a total distance of f(x₁₂) = 6781 km, it is more than 1500 km shorter than the worst solution x₄.
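This brute-force enumeration is small enough to reproduce directly; a Python sketch using the distances from Figure 2.2 (the helper names are our own):

```python
from itertools import permutations

# Symmetric city distances in km (Figure 2.2).
D = {("Beijing", "Chengdu"): 1854, ("Beijing", "Guangzhou"): 2174,
     ("Beijing", "Hefei"): 1044, ("Beijing", "Shanghai"): 1244,
     ("Chengdu", "Guangzhou"): 1954, ("Chengdu", "Hefei"): 1615,
     ("Chengdu", "Shanghai"): 2095, ("Guangzhou", "Hefei"): 1257,
     ("Guangzhou", "Shanghai"): 1529, ("Hefei", "Shanghai"): 472}

def w(a, b):
    """Distance between cities a and b (the matrix is symmetric)."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def f(x):
    """Objective of Equation 2.8: length of the cyclic tour Hefei -> x -> Hefei."""
    tour = ("Hefei",) + x + ("Hefei",)
    return sum(w(tour[i], tour[i + 1]) for i in range(len(tour) - 1))

# Brute force: enumerate all permutations of the four remaining cities.
best = min(permutations(("Beijing", "Chengdu", "Guangzhou", "Shanghai")), key=f)
print(best, f(best))  # the optimal tour x12 has length 6781 km
```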
Enumerating all possible solutions is, of course, the most time-consuming optimization approach possible. For TSPs with more cities, it quickly becomes infeasible since the number of solutions grows faster than exponentially ((n/3)ⁿ ≤ n!). Furthermore, in the simple Example E2.1, enumerating all candidate solutions would even have been impossible since X = R.
From this example we can learn three things:
⁴ http://en.wikipedia.org/wiki/Brute-force_search [accessed 2009-06-16]
2.2. OBJECTIVE FUNCTIONS: IS IT GOOD? 47
1. The problem space containing the parameters of the objective functions is not restricted to the real numbers but can, indeed, have a very diﬀerent structure.
2. There are objective functions which cannot be resolved mathematically. As a matter of fact, most “interesting” optimization problems have this feature.
3. Enumerating all solutions is not a good optimization approach and is only applicable to “small” or trivial problem instances.
Although the objective function of Example E2.2 could not be solved for x, its results can
still be easily determined for any given candidate solution. Yet, there are many optimization
problems where this is not the case.
Example E2.3 (Antenna Design).
Designing antennas is an optimization problem which has been approached by many researchers such as Chattoraj and Roy [532], Choo et al. [575], John and Ammann [1450, 1451], Koza et al. [1615], Lohn et al. [1769, 1770], and Villegas et al. [2814], amongst others. Rahmat-Samii and Michielssen dedicate no less than four chapters to it in their book on the application of (evolutionary) optimization methods to electromagnetism-related tasks [2250]. Here, the problem spaces X are the sets of feasible antenna designs with many degrees of freedom in up to three dimensions, often also involving wire thickness and geometry.
The objective functions usually involve maximizing the antenna gain⁵ and bandwidth together with constraints concerning, for instance, the weight and shape of the proposed constructions. However, computing the gain (depending on the signal frequency and radiation angle) of an arbitrarily structured antenna is not trivial. In most cases, a simulation environment and considerable computational power are required. Besides the fact that there are many possible antenna designs, diﬀerent from Example E2.2, the evaluation of even small sets of candidate solutions may take a substantial amount of time.
Example E2.4 (Analog Electronic Circuit Design).
A similar problem is automated analog circuit design, where a circuit diagram involving a combination of diﬀerent electronic components (resistors, capacitors, inductors, diodes, and/or transistors) is sought which fulﬁlls certain requirements. Here, the tasks span from the synthesis of ampliﬁers [1607, 1766] and ﬁlters [1766], such as Butterworth [1608, 1763, 1764] and Chebycheﬀ ﬁlters [1608], to analog robot controllers [1613].
Computing the features of analog circuits can be done by hand with some eﬀort, but requires deeper knowledge of electrical engineering. If an analysis of the frequency and timing behavior is required, the diﬃculty of the equations quickly gets out of hand. Therefore, in
order to evaluate the ﬁtness of circuits, simulation environments [2249] like SPICE [2815]
are normally used. Typically, a set of server computers is employed to execute these external
programs so the utility of multiple candidate solutions can be determined in parallel.
Example E2.5 (Aircraft Design).
Aircraft design is another domain where many optimization problems can be spotted.
Obayashi [2059] and Oyama [2111], for instance, tried to ﬁnd wing designs with minimal
drag and weight but maximum tank volume, whereas Chung et al. [579] searched for a geometry with minimum drag and shock pressure/sonic boom for transonic aircraft, and Lee and Hajela [1704, 1705] optimized the design of rotor blades for helicopters. As in the electrical engineering example, the computational costs of computing the objective values rating the aforementioned features of the candidate solutions are high and involve simulating the air flows around the wings and blades.
⁵ http://en.wikipedia.org/wiki/Antenna_gain [accessed 2009-06-19]
Example E2.6 (Interactive Optimization).
Especially in the area of Evolutionary Computation, some interactive optimization methods have been developed. The Interactive Genetic Algorithms⁶ (IGAs) [471, 2497, 2529, 2651, 2652] outsource the objective function evaluation to human beings who, for instance, judge whether an evolved graphic⁷ or movie [2497, 2746] is attractive or whether music⁸ [1453] sounds good. By incorporating this information into the search process, esthetic works of art are created. In this domain, it is not really possible to devise any automated method for assigning objective values to the points in the problem space. When it comes to judging art, human beings will not be replaced by machines for a long time.
In Kosorukoﬀ’s Human Based Genetic Algorithm⁹ (HBGA, [1583]) and Unemi’s graphic and movie tool [2746], this approach is driven even further by also letting human “agents” select and combine the most interesting candidate solutions step by step.
All interactive optimization algorithms have to deal with the peculiarities of their human “modules”, such as indeterminism, inaccuracy, ﬁckleness, changing attention, and loss of focus which, obviously, also rub oﬀ on the objective functions. Nevertheless, similar features may also occur if determining the utility of a candidate solution involves randomized simulations or measuring data from real-world processes.
Let us now summarize the information gained from these examples. The ﬁrst step when approaching an optimization problem is to choose the problem space X, as we will later discuss in detail in Chapter 7 on page 109. Objective functions are not necessarily mere mathematical expressions, but can be complex algorithms that, for example, involve multiple simulations. Global Optimization comprises all techniques that can be used to ﬁnd the best elements x⋆ in X with respect to such criteria. The next question to answer is: What constitutes these “best” elements?
⁶ http://en.wikipedia.org/wiki/Interactive_genetic_algorithm [accessed 2007-07-03]
⁷ http://en.wikipedia.org/wiki/Evolutionary_art [accessed 2009-06-18]
⁸ http://en.wikipedia.org/wiki/Evolutionary_music [accessed 2009-06-18]
⁹ http://en.wikipedia.org/wiki/HBGA [accessed 2007-07-03]
Tasks T2
9. In Example E1.2 on page 26, we stated the layout of circuits as a combinatorial optimization problem. Deﬁne a possible problem space X for this task that allows us to assign each circuit c ∈ C to one location of an m × n-sized card. Deﬁne an objective function f : X → R⁺ which denotes the total length of wiring required for a given element x ∈ X. For this purpose, you can ignore wire crossings and simply assume that a wire going from location (i, j) to location (k, l) has length |i − k| + |j − l|.
[10 points]
10. Name some problems which can occur in circuit layout optimization as outlined in
Task 9 if the mentioned simpliﬁed assumption is indeed applied.
[5 points]
11. Deﬁne a problem space for general job shop scheduling problems as outlined in Exam
ple E1.3 on page 26. Do not make the problem speciﬁc to the cola/lemonade example.
[10 points]
12. Assume a network N = {n₁, n₂, . . . , n_m} and that an m × m distance matrix ∆ is given that contains the distance between node nᵢ and nⱼ at position ∆ᵢⱼ. A value of +∞ at ∆ᵢⱼ means that there is no direct connection nᵢ → nⱼ between node nᵢ and nⱼ. These nodes may, however, be connected by a path which leads over other, intermediate nodes, such as nᵢ → n_A → n_B → nⱼ. In this case, nⱼ can be reached from nᵢ over a path leading ﬁrst to n_A and then to n_B before ﬁnally arriving at nⱼ. Hence, there would be a connection of ﬁnite length.
We consider the speciﬁc optimization task Find the shortest connection between node n₁ and n₂. Develop a problem space X which is able to represent all paths between the ﬁrst and second node in an arbitrary network with m ≥ 2. Note that there is not necessarily a direct connection between n₁ and n₂.
Deﬁne a suitable objective function for this problem.
[15 points]
Chapter 3
Optima: What does good mean?
We already said that Global Optimization is about ﬁnding the best possible solutions for given problems. Let us therefore now discuss what it is that makes a solution optimal¹.
3.1 Single Objective Functions
In the case of optimizing a single criterion f, an optimum is either its maximum or minimum,
depending on what we are looking for. In the case of antenna design (Example E2.3), the
antenna gain was maximized and in Example E2.2, the goal was to ﬁnd a cyclic route of
minimal distance through ﬁve Chinese cities. If we own a manufacturing plant and have to assign incoming orders to machines, we will do this in a way that minimizes the production time. On the other hand, we will arrange the purchase of raw material, the employment of staﬀ, and the placement of commercials in a way that maximizes the proﬁt.
In Global Optimization, it is a convention that optimization problems are most often
deﬁned as minimization tasks. If a criterion f is subject to maximization, we usually simply
minimize its negation (−f) instead. In this section, we will give some basic deﬁnitions on
what the optima of a single (objective) function are.
Example E3.1 (Two-Dimensional Objective Function).
Figure 3.1 illustrates such a function f deﬁned over a two-dimensional space X = (X₁, X₂) ⊂ R². As outlined in this graphic, we distinguish between local and global optima. As illustrated, a global optimum is an optimum of the whole domain X while a local optimum is an optimum of only a subset of X.
¹ http://en.wikipedia.org/wiki/Maxima_and_minima [accessed 2007-07-03]
Figure 3.1: Global and local optima of a two-dimensional function f over (X₁, X₂); the plot marks two local maxima, a local minimum, the global minimum, and the global maximum.
3.1.1 Extrema: Minima and Maxima of Diﬀerentiable Functions
A local minimum x̌_l is a point which has the smallest objective value amongst its neighbors in the problem space. Similarly, a local maximum x̂_l has the highest objective value when compared to its neighbors. Such points are called (local) extrema². In optimization, a local optimum is always an extremum, i. e., either a local minimum or maximum.
Back in high school, we learned: “If f : X → R (X ⊆ R) has a (local) extremum at x⋆_l and is diﬀerentiable at that point, then the ﬁrst derivative f′ is zero at x⋆_l.”
x⋆_l is local extremum ⇒ f′(x⋆_l) = 0 (3.1)
Furthermore, if f can be diﬀerentiated twice at x⋆_l, the following holds:
(f′(x⋆_l) = 0) ∧ (f′′(x⋆_l) > 0) ⇒ x⋆_l is a local minimum (3.2)

(f′(x⋆_l) = 0) ∧ (f′′(x⋆_l) < 0) ⇒ x⋆_l is a local maximum (3.3)
Alternatively, we can also detect a local minimum or maximum when the ﬁrst derivative undergoes a sign change. If, for a small h, sign(f′(x⋆_l − h)) ≠ sign(f′(x⋆_l + h)), then we can state that:
(f′(x⋆_l) = 0) ∧ (lim_{h→0} f′(x⋆_l − h) < 0) ∧ (lim_{h→0} f′(x⋆_l + h) > 0) ⇒ x⋆_l is a local minimum (3.4)

(f′(x⋆_l) = 0) ∧ (lim_{h→0} f′(x⋆_l − h) > 0) ∧ (lim_{h→0} f′(x⋆_l + h) < 0) ⇒ x⋆_l is a local maximum (3.5)
² http://en.wikipedia.org/wiki/Maxima_and_minima [accessed 2010-09-07]
Example E3.2 (Extrema and Derivatives).
The function f(x) = x² + 5 has the ﬁrst derivative f′(x) = 2x and the second derivative f′′(x) = 2. To solve f′(x⋆_l) = 0 we set 0 = 2x⋆_l and get, obviously, x⋆_l = 0. For x⋆_l = 0, f′′(x⋆_l) = 2 > 0 and hence, x⋆_l is a local minimum according to Equation 3.2.
If we take the function g(x) = x⁴ − 2, we get g′(x) = 4x³ and g′′(x) = 12x². At x⋆_l = 0, both g′(x⋆_l) and g′′(x⋆_l) become 0, so we cannot use Equation 3.2. However, if we take a small step to the left, i. e., to x < 0, then x < 0 ⇒ 4x³ < 0 ⇒ g′(x) < 0. For positive x, however, x > 0 ⇒ 4x³ > 0 ⇒ g′(x) > 0. We have a sign change from − to + and, according to Equation 3.4, x⋆_l = 0 is a minimum of g(x).
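The sign-change test can also be mimicked numerically; the following Python sketch replaces the limit h → 0 by a small fixed step (an assumption of this sketch; h must be small enough for the derivative not to change sign twice near x):

```python
# A numerical sketch of the sign-change tests in Equations 3.4 and 3.5.
def g_prime(x):
    return 4 * x ** 3  # first derivative of g(x) = x**4 - 2

def classify(df, x, h=1e-6):
    """Classify a stationary point x of a 1-d function via its derivative df."""
    left, right = df(x - h), df(x + h)
    if left < 0 < right:
        return "local minimum"   # sign change from - to + (Equation 3.4)
    if left > 0 > right:
        return "local maximum"   # sign change from + to - (Equation 3.5)
    return "no sign change (e.g. saddle/inflection)"

print(classify(g_prime, 0.0))  # -> local minimum
```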
From many of our previous examples such as Example E1.1 on page 25, Example E2.2 on page 45, or Example E2.4 on page 47, we know that objective functions in many optimization problems cannot be expressed in a mathematically closed form, let alone in one that can be diﬀerentiated. Furthermore, Equation 3.1 to Equation 3.5 can only be directly applied if the problem space is a subset of the real numbers X ⊆ R. They work neither for discrete spaces nor for vector spaces. For the two-dimensional objective function given in Example E3.1, for example, we would need to apply Equation 3.6, the generalized form of Equation 3.1:

x⋆_l is local extremum ⇒ grad(f)(x⋆_l) = 0 (3.6)
For distinguishing maxima, minima, and saddle points, the Hessian matrix³ would be needed. Even if the objective function exists in a diﬀerentiable form, for problem spaces X ⊆ Rⁿ of larger scale n > 5, computing the gradient, the Hessian, and their properties becomes cumbersome. Derivative-based optimization is hence only feasible for a subset of optimization problems.
In this book, we focus on derivative-free optimization. The precise deﬁnition of local
optima without using derivatives requires some sort of measure of what neighbor means
in the problem space. Since this requires some further deﬁnitions, we will postpone it to
Section 4.4.
3.1.2 Global Extrema
The deﬁnitions of globally optimal points are much more straightforward and can be given
based on the terms we already have discussed:
Deﬁnition D3.1 (Global Maximum). A global maximum x̂ ∈ X of one (objective) function f : X → R is an input element with f(x̂) ≥ f(x) ∀x ∈ X.

Deﬁnition D3.2 (Global Minimum). A global minimum x̌ ∈ X of one (objective) function f : X → R is an input element with f(x̌) ≤ f(x) ∀x ∈ X.

Deﬁnition D3.3 (Global Optimum). A global optimum x⋆ ∈ X of one (objective) function f : X → R subject to minimization is a global minimum. A global optimum of a function subject to maximization is a global maximum.
Of course, each global optimum, minimum, or maximum is, at the same time, also a local optimum, minimum, or maximum. As stated before, the deﬁnition of local optima is a bit more complicated and will be discussed in detail in Section 4.4.
3
http://en.wikipedia.org/wiki/Hessian_matrix [accessed 20100907]
3.2 Multiple Optima
Many optimization problems have more than one globally optimal solution. Even a one-dimensional function f : R → R can have more than one global maximum, multiple global minima, or even both in its domain X.
Example E3.3 (OneDimensional Functions with Multiple Optima).
If the function f(x) = |x³| − 8x² (illustrated in Fig. 3.2.a) was subject to minimization, we would ﬁnd that it has two global optima x⋆₁ = x̌₁ = −16/3 and x⋆₂ = x̌₂ = 16/3. Another interesting example is the cosine function sketched in Fig. 3.2.b. It has global maxima x̂ᵢ at x̂ᵢ = 2iπ and global minima x̌ᵢ at x̌ᵢ = (2i + 1)π for all i ∈ Z. The correct solution of such an optimization problem would then be a set X⋆ of all optimal inputs in X rather than a single maximum or minimum.
In Fig. 3.2.c, we sketch a function f(x) which evaluates to x² outside the interval [−5, 5] and has the value 25 within [−5, 5]. Hence, all the points in the real interval [−5, 5] are minima. If the function is subject to minimization, it has uncountably inﬁnitely many global optima.

Figure 3.2: Examples for functions with multiple optima. Fig. 3.2.a: |x³| − 8x²: a function with two global minima. Fig. 3.2.b: The cosine function: inﬁnitely many optima. Fig. 3.2.c: A function with uncountably many minima.
As mentioned before, the exact meaning of optimal is problem dependent. In single-objective optimization, it means to ﬁnd the candidate solutions which lead to either minimum or maximum objective values. In multi-objective optimization, there exists a variety of approaches to deﬁne optima which we will discuss in-depth in Section 3.3.
Deﬁnition D3.4 (Optimal Set). The optimal set X⋆ ⊆ X of an optimization problem is the set that contains all its globally optimal solutions.
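On a finite, sampled domain such an optimal set can be collected directly; a Python sketch using the function of Fig. 3.2.c (the sampling grid is an arbitrary choice of this sketch):

```python
# A sketch of Definition D3.4 on a sampled domain: collect every sampled point
# whose objective value equals the best value found.
def f(x):
    return x * x if abs(x) > 5 else 25.0  # the function of Fig. 3.2.c

xs = [i / 2 for i in range(-20, 21)]        # sample [-10, 10] in steps of 0.5
y_best = min(f(x) for x in xs)              # best (smallest) objective value
X_star = [x for x in xs if f(x) == y_best]  # sampled optimal set X*
print(y_best, len(X_star))  # -> 25.0 21
```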
There are often multiple, sometimes even inﬁnitely many, optimal solutions in many optimization problems. Since the storage space in computers is limited, it is only possible to ﬁnd a ﬁnite (sub)set of solutions. In the further text, we will denote the result of optimization algorithms which return single candidate solutions (in sans-serif font) as x and the results of algorithms which are capable of producing sets of possible solutions as X = {x₁, x₂, . . . }.
Deﬁnition D3.5 (Optimization Result). The set X ⊆ X contains output elements x ∈ X
of an optimization process.
Obviously, the best outcome of an optimization process would be to ﬁnd X = X⋆. From Example E3.3, we already know that this might actually be impossible if there are too many global optima. Then, the goal should be to ﬁnd an X ⊂ X⋆ in a way that the candidate solutions are evenly distributed amongst the global optima and do not all cling together in a small area of X⋆. We will discuss the issue of diversity amongst the optima in Chapter 13 in more depth.
Besides the fact that we may not be able to ﬁnd all global optima for a given optimization problem, it may also be the case that we cannot ﬁnd any global optimum at all or that we cannot determine whether the results x ∈ X which we have obtained are optimal or not. There exists, for instance, no algorithm which can solve the Traveling Salesman Problem (see Example E2.2) in polynomial time, as described in detail in Section 12.3. Therefore, if such a task involving a high enough number of locations is subject to optimization, an eﬃcient approximation algorithm might produce reasonably good x in a short time. In order to know whether these are global optima, we would, however, need to wait for thousands of years. Hence, in many problems, the goal of optimization is not necessarily to ﬁnd X = X⋆ or X ⊂ X⋆ but a good approximation X ≈ X⋆ in a reasonable time.
3.3 Multiple Objective Functions
3.3.1 Introduction
Global Optimization techniques are not just used for ﬁnding the maxima or minima of single functions f. In many real-world design or decision making problems, there are multiple criteria to be optimized at the same time. Such tasks can be deﬁned as in Deﬁnition D3.6, which will later be reﬁned in Deﬁnition D6.2 on page 103.
Deﬁnition D3.6 (Multi-Objective Optimization Problem (MOP)). In a multi-objective optimization problem (MOP), a set f : X → Rⁿ consisting of n objective functions fᵢ : X → R, each representing one criterion, is to be optimized over a problem space X [596, 740, 954].

f = {fᵢ : X → R : i ∈ 1..n} (3.7)

f(x) = (f₁(x), f₂(x), . . . )ᵀ (3.8)
In the following, we will treat these sets as vector functions f : X → Rⁿ which return a vector of real numbers of dimension n when applied to a candidate solution x ∈ X. We will refer to this Rⁿ as the objective space Y. Algorithms designed to optimize sets of objective functions are usually named with the preﬁx “multi-objective”, like multi-objective Evolutionary Algorithms (see Deﬁnition D28.2 on page 253).
Purshouse [2228, 2230] provides a nice classiﬁcation of the possible relations between the objective functions which we illustrate here as Figure 3.3.

Figure 3.3: The possible relations between objective functions as given in [2228, 2230]: a relation is either dependent (conﬂict or harmony) or independent.

In the best case from the optimization perspective, two objective functions f₁ and f₂ are in harmony, written as f₁ ∼ f₂, meaning that improving one criterion will also lead to an improvement in the other. Then, it would suﬃce to optimize only one of them and the MOP can be reduced to a single-objective one.
Deﬁnition D3.7 (Harmonizing Objective Functions). In a multi-objective optimization problem, two objective functions fᵢ and fⱼ are harmonizing (fᵢ ∼_X fⱼ) in a subset X ⊆ X of the problem space X if Equation 3.9 holds.

fᵢ ∼_X fⱼ ⇒ [fᵢ(x₁) < fᵢ(x₂) ⇒ fⱼ(x₁) < fⱼ(x₂) ∀x₁, x₂ ∈ X ⊆ X] (3.9)
The opposite are conﬂicting objectives where an improvement in one criterion leads to a degeneration of the utility of the candidate solutions according to another criterion. If two objectives f₃ and f₄ conﬂict, we denote this as f₃ ≁ f₄. Obviously, objective functions do not need to be conﬂicting or harmonic over their complete domain X, but instead may exhibit these properties on certain intervals only.
Deﬁnition D3.8 (Conﬂicting Objective Functions). In a multi-objective optimization problem, two objective functions fᵢ and fⱼ are conﬂicting (fᵢ ≁_X fⱼ) in a subset X ⊆ X of the problem space X if Equation 3.10 holds.

fᵢ ≁_X fⱼ ⇒ [fᵢ(x₁) < fᵢ(x₂) ⇒ fⱼ(x₁) > fⱼ(x₂) ∀x₁, x₂ ∈ X ⊆ X] (3.10)
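On a finite sample of points, Equations 3.9 and 3.10 can be checked empirically; a Python sketch (the sample set G and the example objectives are illustrative assumptions, not from the book):

```python
from itertools import combinations

# An empirical sketch of Definitions D3.7/D3.8: on a finite sample G of the
# subset in question, test whether fi and fj harmonize or conflict there.
def relation(fi, fj, G):
    harmony = conflict = True
    for a, b in combinations(G, 2):
        if fi(a) == fi(b):
            continue                                  # the equations say nothing here
        x1, x2 = (a, b) if fi(a) < fi(b) else (b, a)  # ensure fi(x1) < fi(x2)
        if not fj(x1) < fj(x2):
            harmony = False                           # Equation 3.9 violated
        if not fj(x1) > fj(x2):
            conflict = False                          # Equation 3.10 violated
    if harmony:
        return "harmonizing"
    if conflict:
        return "conflicting"
    return "neither (on G)"

G = [i / 10 for i in range(1, 21)]
print(relation(lambda x: x, lambda x: 2 * x, G))  # -> harmonizing
print(relation(lambda x: x, lambda x: -x, G))     # -> conflicting
```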
Besides harmonizing and conﬂicting optimization criteria, there may be criteria which are independent of each other. In a harmonious relationship, the optimization criteria can be optimized simultaneously, i. e., an improvement of one leads to an improvement of the other. If the criteria conﬂict, the exact opposite is the case. Independent criteria, however, do not inﬂuence each other: It is possible to modify candidate solutions in a way that leads to a change in one objective value while the other one remains constant (see Example E3.8).
Deﬁnition D3.9 (Independent Objective Functions). In a multi-objective optimization problem, two objective functions fᵢ and fⱼ are independent from each other in a subset X ⊆ X of the problem space X if the MOP can be decomposed into two problems which can be optimized separately and whose solutions can then be composed to solutions of the original problem.
If X = X, these relations are called total and the subscript X is omitted in the notation. In the following, we give some examples of typical situations involving harmonizing, conﬂicting, and independent optimization criteria. We also list some typical multi-objective optimization applications in Table 3.1, before discussing basic techniques suitable for deﬁning what a “good solution” is in problems with multiple, conﬂicting criteria.
Example E3.4 (Factory Example).
Multi-objective optimization often means compromising between conﬂicting goals. If we go back to our factory example, we can specify the following objectives that are all subject to optimization:
1. Minimize the time between an incoming order and the shipment of the corresponding
product.
2. Maximize the proﬁt.
3. Minimize the costs for advertising, personnel, raw materials, etc.
4. Maximize the quality of the products.
5. Minimize the negative impact of waste and pollution caused by the production process on the environment.
The last two objectives seem to clearly conﬂict with the goal of cost minimization. Between the personnel costs, the time needed for production, and the product quality there should also be some kind of contradictory relation, since producing many quality goods in a short time naturally requires many hands. The exact mutual inﬂuences between optimization criteria can apparently become complicated and are not always obvious.
Example E3.5 (Artiﬁcial Ant Example).
Another example for such a situation is the Artiﬁcial Ant problem⁴ where the goal is to ﬁnd the most eﬃcient controller for a simulated ant. The eﬃciency of an ant is not only
ﬁnd the most eﬃcient controller for a simulated ant. The eﬃciency of an ant is not only
measured by the amount of food it is able to pile. For every food item, the ant needs to
walk to some point. The more food it aggregates, the longer the distance it needs to walk.
If its behavior is driven by a clever program, it may take the ant a shorter walk to pile the
same amount of food. This better path, however, would maybe not be discovered by an ant
with a clumsy controller. Thus, the distance the ant has to cover to ﬁnd the food (or the
time it needs to do so) has to be considered in the optimization process, too.
If two control programs produce the same results and one is smaller (i. e., contains
fewer instructions) than the other, the smaller one should be preferred. Like in the factory
example, these optimization goals conﬂict with each other: A program which contains only
one instruction will likely not guide the ant to a food item and back to the nest.
From both of these examples, we can gain another insight: To ﬁnd the global optimum could mean to maximize one function fᵢ ∈ f and to minimize another one fⱼ ∈ f (i ≠ j). Hence, it makes no sense to talk about a global maximum or a global minimum in terms of multi-objective optimization. We will thus retreat to the notion of the set of optimal elements x⋆ ∈ X⋆ ⊆ X in the following discussions.
Since compromises for conﬂicting criteria can be deﬁned in many ways, there exist multiple approaches to deﬁne what an optimum is. These diﬀerent deﬁnitions, in turn, lead to diﬀerent sets X⋆. We will discuss some of these approaches in the following by using two graphical examples for illustration purposes.
Example E3.6 (Graphical Example 1).
In the ﬁrst graphical example pictured in Figure 3.4, we want to maximize the two objective functions f = {f₁, f₂}. Both functions have the real numbers R as problem space X₁. The maximum (and thus, the optimum) of f₁ is x̂₁ and the largest value of f₂ is at x̂₂. In Figure 3.4, we can easily see that f₁ and f₂ are partly conﬂicting: Their maxima are at diﬀerent locations and there even exist areas where f₁ rises while f₂ falls and vice versa.
⁴ See ?? on page ?? for more details.
Figure 3.4: Two functions f₁ and f₂ with diﬀerent maxima x̂₁ and x̂₂.
Example E3.7 (Graphical Example 2).
The objective functions f₁ and f₂ in the ﬁrst example are mappings of a one-dimensional problem space X₁ to the real numbers that are to be maximized. In the second example sketched in Figure 3.5, we instead minimize two functions f₃ and f₄ that map a two-dimensional problem space X₂ ⊂ R² to the real numbers R. Both functions have two global minima; the lowest values of f₃ are at x̌₁ and x̌₂ whereas f₄ gets minimal at x̌₃ and x̌₄. It should be noted that x̌₁ ≠ x̌₂ ≠ x̌₃ ≠ x̌₄.

Figure 3.5: Two functions f₃ and f₄ over X₂ = (x₁, x₂) with diﬀerent minima x̌₁, x̌₂, x̌₃, and x̌₄.
Example E3.8 (Independent Objectives).
Assume that the problem space is the set of three-dimensional real vectors R³ and the two objectives f₁ and f₂ are subject to minimization:

f₁(x) = √((x₁ − 5)² + (x₂ + 1)²) (3.11)

f₂(x) = x₃² − sin x₃ − 3 (3.12)

This problem can be decomposed into two separate problems, since f₁ can be optimized on the ﬁrst two vector elements x₁, x₂ alone and f₂ can be optimized by considering only x₃ while the values of x₁ and x₂ do not matter. After solving the two separate optimization problems, their solutions can be combined to a full vector x⋆ = (x⋆₁, x⋆₂, x⋆₃)ᵀ by taking the solutions x⋆₁ and x⋆₂ from the optimization of f₁ and x⋆₃ from the optimization process solving f₂.
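This decomposition can be sketched as follows (Python; the grid search merely stands in for an arbitrary single-objective optimizer, and the analytic minimum of f₁ is used directly):

```python
import math

# A sketch of the decomposition in Example E3.8: f1 depends only on (x1, x2),
# f2 only on x3, so the two sub-problems are solved separately and composed.
def f1(x1, x2):
    return math.sqrt((x1 - 5) ** 2 + (x2 + 1) ** 2)

def f2(x3):
    return x3 ** 2 - math.sin(x3) - 3

x1_star, x2_star = 5.0, -1.0          # analytic minimum of f1 (value 0)
grid = [i / 1000 for i in range(-2000, 2001)]
x3_star = min(grid, key=f2)           # crude numeric minimum of f2
x_star = (x1_star, x2_star, x3_star)  # composed solution of the whole MOP
print(x_star)
```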
Example E3.9 (Robotic Soccer Cont.).
In Example E1.10 on page 37, we mentioned that in robotic soccer, the robots must try to decide which action to take in order to maximize the chance of scoring a goal. If decision making in robotic soccer was only based on this criterion, it is likely that most robots of a team would jointly attack the goal of the opposing team, leaving their own goal unprotected. Hence, at least one second objective is important too: to minimize the chance that the opposing team scores a goal.
Table 3.1: Applications and Examples of MultiObjective Optimization.
Area References
Business-To-Business [1559]
Chemistry [306, 668]
Combinatorial Problems [366, 491, 527, 654, 690, 1034, 1429, 1440, 1442, 1553, 1696,
1756–1758, 1859, 2019, 2025, 2158, 2472, 2473, 2862, 2895];
see also Table 20.1
Computer Graphics [1661]; see also Table 20.1
Data Mining [2069, 2398, 2644, 2645]; see also Table 20.1
Databases [1696]
Decision Making [198, 320, 861, 1299, 1559, 1571, 1801]; see also Table 20.1
Digital Technology [789, 869, 1479]
Distributed Algorithms and Systems [320, 605, 662, 663, 669, 861, 1299, 1488, 1559, 1571, 1574, 1801, 1840, 2071, 2644, 2645, 2862, 2897]
Economics and Finances [320, 662, 857, 1559]
Engineering [93, 519, 547, 548, 579, 605, 669, 737, 789, 869, 953, 1000,
1479, 1488, 1508, 1574, 1622, 1717, 1840, 1873, 2059, 2111,
2158, 2644, 2645, 2862, 2897]
Games [560, 997, 2102]
Graph Theory [669, 1488, 1574, 1840, 2025, 2897]
Logistics [527, 579, 861, 1034, 1429, 1442, 1553, 1756, 1859, 2059,
2111]
Mathematics [560, 662, 857, 869, 997, 1661, 2102]
Military and Defense [1661]
Multi-Agent Systems [1661]
Physics [1000]
Security [1508, 1661]
Software [1508, 1559, 2863, 2897]; see also Table 20.1
Testing [2863]
Water Supply and Management [1268]
Wireless Communication [1717, 1796, 2158]
3.3.2 Lexicographic Optimization
Perhaps the simplest way to optimize a set f of objective functions is to solve them according to priorities [93, 860, 2291, 2820]. Without loss of generality, we could, for instance, assume that f_1 is strictly more important than f_2, which, in turn, is more important than f_3, and so on. We would then first determine the optimal set X⋆_(1) according to f_1. If this set contains more than one solution, we retain only those elements of X⋆_(1) which have the best objective values according to f_2 compared to the other members of X⋆_(1), and obtain X⋆_(1,2) ⊆ X⋆_(1). This process is repeated until either |X⋆_(1,2,...)| = 1 or all criteria have been processed. By doing so, the multi-objective optimization problem has been reduced to n = |f| single-objective ones to be solved iteratively. The problem space X_i of the i-th problem equals X only for i = 1; for all i > 1, it is X_i = X⋆_(1,2,..,i−1) [860].
Definition D3.10 (Lexicographically Better). A candidate solution x_1 ∈ X is lexicographically better than another element x_2 ∈ X (denoted by x_1 <_lex x_2) if

    x_1 <_lex x_2 ⇔ (ω_i f_i(x_1) < ω_i f_i(x_2)) ∧ (i = min {k : f_k(x_1) ≠ f_k(x_2)})    (3.13)

    ω_j = { 1 if f_j should be minimized; −1 if f_j should be maximized }    (3.14)

where the ω_j in Equation 3.13 only denote whether f_j should be maximized or minimized, as specified in Equation 3.14.
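A lexicographic comparison as in Equation 3.13 can be sketched as follows. This hypothetical example (not one of the book's listings) assumes the objective values have already been multiplied by their signs ω_j, so that all of them are to be minimized:

```java
// Hypothetical sketch of Equation 3.13: compare two objective-value vectors
// lexicographically. All values are assumed to be pre-multiplied by ω_j,
// i.e., everything is minimized.
public class LexComparator {
  // negative if a <_lex b, positive if b <_lex a, 0 if the vectors are equal
  static int compareLex(double[] a, double[] b) {
    // i = min{ k : f_k(x1) != f_k(x2) }: the first index where values differ
    for (int i = 0; i < a.length; i++)
      if (a[i] != b[i]) return Double.compare(a[i], b[i]);
    return 0; // identical objective vectors
  }

  public static void main(String[] args) {
    double[] x1 = {1.0, 7.0, 3.0}, x2 = {1.0, 5.0, 9.0};
    // x2 is lexicographically better: equal on f_1, smaller on f_2
    System.out.println(compareLex(x1, x2) > 0); // true
  }
}
```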
Since we generally consider minimization problems, the ω_j will be assumed to be 1 if not explicitly stated otherwise. If n objective functions are to be optimized, there are n! = |Π(1..n)| possible permutations of them, where Π(1..n) is the set of all permutations of the natural numbers from 1 to n. For each such permutation, the described approach may lead to different results. We can thus refine the previous definition of “lexicographically better” by putting it into the context of a permutation of the objective functions:
Definition D3.11 (Lexicographically Better (Refined)). A candidate solution x_1 ∈ X is lexicographically better than another element x_2 ∈ X (denoted by x_1 <_{lex,π} x_2) according to a permutation π ∈ Π(1..n) applied to the objective functions if

    x_1 <_{lex,π} x_2 ⇔ (ω_{π[i]} f_{π[i]}(x_1) < ω_{π[i]} f_{π[i]}(x_2)) ∧ (i = min {k : f_{π[k]}(x_1) ≠ f_{π[k]}(x_2)})    (3.15)

where n = len(f) is the number of objective functions, π[i] is the i-th element of the permutation π, and the ω_i retain their meaning from Equation 3.14.
Based on the <_lex relation, we can now define optimality.

Definition D3.12 (Lexicographic Optimality). A candidate solution x⋆ is lexicographically optimal in terms of a permutation π ∈ Π(1..n) applied to the objective functions (and therefore, member of the optimal set X⋆_π) if there is no lexicographically better element in X, i.e.,

    x⋆ ∈ X⋆_π ⇔ ¬∃x ∈ X : x <_{lex,π} x⋆    (3.16)
As the lexicographically global optimal set X⋆, we consider the union of the optimal sets X⋆_π for all possible permutations π of the first n natural numbers:

    X⋆ = ∪_{π ∈ Π(1..n)} X⋆_π    (3.17)
X⋆ can be computed by solving all the |Π(1..n)| = n! different multi-objective problems arising, i.e., the n ∗ n! single-objective problems, separately. This would lead to a more than exponential complexity in n. In case that X is a finite set, however, this can be achieved in polynomial time in n [860]. But which solutions would be contained in the X⋆_π or X⋆ if we use the lexicographical definition of optimality?
Example E3.10 (Example E3.4 Cont. Lexicographically).
In the factory example Example E3.4, we wished to minimize the production time (f_1, ω_1 = 1), maximize the profit (f_2, ω_2 = −1), minimize the running costs (f_3, ω_3 = 1), maximize a product quality measure (f_4, ω_4 = −1), and minimize environmental damage (f_5, ω_5 = 1). Determining the lexicographically optimal set X⋆_π for the identity permutation π = (1, 2, 3, 4, 5) means to first determine all possible configurations of the factory that minimize the production time f_1. Only if there is more than one such configuration (|X⋆_(1)| > 1) do we consider f_2. Amongst all elements in X⋆_(1), we then copy those to X⋆_(1,2) which have the highest f_2 value, and so on. What we get in X⋆_π are configurations which perform exceptionally well in terms of the production time but sacrifice profit, running costs, product quality, and will largely disregard any environmental issues. No trade-off between the objectives will happen at all. Therefore, the set of all lexicographically optimal solutions X⋆ will contain the elements x⋆ ∈ X which have the best objective values regarding one single objective function while largely neglecting the others.
Example E3.11 (Example E3.5 Cont. Lexicographically).
The same would be the case in the Artificial Ant problem. X⋆, if determined lexicographically, contains
1. the smallest possible program (length 0),
2. the program which leads to the highest possible food collection, and
3. the program which guides the ant along the shortest possible path (which is again the empty program).
Example E3.12 (Example E3.6 Cont. Lexicographically).
There are exactly two lexicographically optimal points on the two functions illustrated in Figure 3.4. The first one, x⋆_1, is the single global maximum x̂_1 of f_1, and the other one, x⋆_2, is the single global maximum x̂_2 of f_2. In other words, X⋆_(1,2) = {x⋆_1} and X⋆_(2,1) = {x⋆_2}. Since Π(1..2) = {(1, 2), (2, 1)}, it follows that X⋆ = X⋆_(1,2) ∪ X⋆_(2,1) = {x⋆_1, x⋆_2}.
Example E3.13 (Example E3.7 Cont. Lexicographically).
The two-dimensional objective functions f_3 and f_4 from Figure 3.5 each have two global minima: x̌_1, x̌_2 and x̌_3, x̌_4, respectively. Although it is a bit hard to see in the figure itself, f_4(x̌_1) = f_4(x̌_2) and f_3(x̌_3) = f_3(x̌_4) because of the symmetry in the two objective functions. This means that X⋆_(3,4) = {x̌_1, x̌_2}, that X⋆_(4,3) = {x̌_3, x̌_4}, and that X⋆ = X⋆_(3,4) ∪ X⋆_(4,3) = {x̌_1, x̌_2, x̌_3, x̌_4}.
3.3.2.1 Problems with the Lexicographic Method
From these examples, it should become clear that the lexicographic approach to defining what is optimal and what is not has some advantages and some disadvantages. On the positive side, we find that it is very easy to realize and that it allows us to treat a multi-objective optimization problem by iteratively solving single-objective ones. On the downside, it does not perform any trade-off between the different objective functions at all, and we only find the extreme values of the optimization criteria. In the factory example, for instance, we would surely be interested in solutions which lead to a short production time but do not entirely ignore any budget issues. Such differentiations, however, are not possible with the lexicographic method.
3.3.3 Weighted Sums (Linear Aggregation)
Another very simple method to define what an optimal solution is, is to compute a weighted sum ws(x) of all objective functions f_i(x) ∈ f. Since a weighted sum is a linear aggregation of the objective values, this approach is also often referred to as linear aggregating. Each objective f_i is multiplied with a weight w_i representing its importance. Using signed weights allows us to minimize one objective while maximizing another one. We can, for instance, apply a weight w_a = 1 to an objective function f_a and the weight w_b = −1 to the criterion f_b. By minimizing ws(x), we then actually minimize the first and maximize the second objective function. If we instead maximize ws(x), the effect would be the inverse and f_b would be minimized while f_a is maximized. Either way, multi-objective problems are reduced to single-objective ones with this method. As illustrated in Equation 3.19, a global optimum then is a candidate solution x⋆ ∈ X for which the weighted sum of all objective values is less than or equal to those of all other candidate solutions (if minimization of the weighted sum is assumed).
    ws(x) = Σ_{i=1}^{n} w_i f_i(x) = Σ_{∀ f_i ∈ f} w_i f_i(x)    (3.18)

    x⋆ ∈ X⋆ ⇔ ws(x⋆) ≤ ws(x) ∀x ∈ X    (3.19)
With this method, we can also emulate lexicographic optimization in many cases by simply setting the weights to values of different magnitude. If the objective functions f_i ∈ f all had the range f_i : X → 0..10, for instance, setting the weights to w_i = 10^i would effectively achieve this.
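Equation 3.18 is straightforward to implement. The following hypothetical Java sketch (function names and the example objectives are ours) computes ws(x) with weights w_a = 1 and w_b = −1, thereby minimizing f_a while maximizing f_b when ws is minimized:

```java
// Hypothetical sketch of Equation 3.18: a weighted sum over a fixed set of
// objective functions, with signed weights to mix minimization/maximization.
import java.util.function.DoubleUnaryOperator;

public class WeightedSum {
  // ws(x) = sum_i w_i * f_i(x)
  static double ws(double x, double[] w, DoubleUnaryOperator[] f) {
    double sum = 0;
    for (int i = 0; i < f.length; i++) sum += w[i] * f[i].applyAsDouble(x);
    return sum;
  }

  public static void main(String[] args) {
    // minimize f_a while maximizing f_b: w_a = 1, w_b = -1
    DoubleUnaryOperator fa = x -> (x - 2) * (x - 2); // to be minimized
    DoubleUnaryOperator fb = x -> -x * x;            // to be maximized
    double[] w = {1, -1};
    System.out.println(ws(2.0, w, new DoubleUnaryOperator[]{fa, fb})); // 4.0
  }
}
```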
Example E3.14 (Example E3.6 Cont.: Weighted Sums).
Figure 3.6 demonstrates optimization with the weighted sum approach for the example given in Example E3.6. The weights are both set to 1: w_1 = w_2 = 1. If we maximize ws_1(x), we will thus also maximize the functions f_1 and f_2. This leads to a single optimum x⋆ = x̂.
[Figure: plot of y = f_1(x), y = f_2(x), and y = ws_1(x) = f_1(x) + f_2(x) over x ∈ X, with the single maximum x̂ marked.]
Figure 3.6: Optimization using the weighted sum approach (first example).
Example E3.15 (Example E3.7 Cont.: Weighted Sums).
The sum of the two-dimensional functions f_3 and f_4 from the second graphical Example E3.7 is sketched in Figure 3.7. Again, we set the weights w_3 and w_4 to 1. The sum ws_2, however, is subject to minimization. The graph of ws_2 has two especially deep valleys. At the bottoms of these valleys, the two global minima x̌_5 and x̌_6 can be found.
[Figure: surface plot of y = ws_2(x) = f_3(x) + f_4(x) over (x_1, x_2), with the two global minima x̌_5 and x̌_6 marked.]
Figure 3.7: Optimization using the weighted sum approach (second example).
3.3.3.1 Problems with Weighted Sums
This approach has a variety of drawbacks. One is that it cannot properly handle objective functions that rise or fall with different speed.^5 Whitley [2929] pointed out that the results of an objective function should not be considered as a precise measure of utility. Adding up multiple such values thus does not necessarily make sense.
Example E3.16 (Problematic Objectives for Weighted Sums).
In Figure 3.8, we have sketched the sum ws(x) of the two objective functions f_1(x) = −x^2 and f_2(x) = e^(x−2). When minimizing or maximizing this sum, we will always disregard one of the two functions, depending on the interval chosen. For small x, f_2 is negligible compared to f_1. For x > 5, it begins to outpace f_1 which, in turn, will now become negligible. Such functions cannot be added up properly using constant weights. Even if we set w_1 to the really large number 10^10, f_1 will become insignificant for all x > 40, because

    | −(40^2) ∗ 10^10 / e^(40−2) | ≈ 0.0005.
^5 See Definition D12.1 on page 144 or http://en.wikipedia.org/wiki/Asymptotic_notation [accessed 2007-07-03] for related information.
[Figure: plot of y = f_1(x), y = f_2(x), and y = g(x) = f_1(x) + f_2(x), for x roughly in [−5, 5] and y roughly in [−25, 45].]
Figure 3.8: A problematic constellation for the weighted sum approach.
Weighted sums are only suitable to optimize functions that at least share the same big-O notation (see Definition D12.1 on page 144). Often, it is not obvious how the objective functions will fall or rise. How can we, for instance, determine whether the objective maximizing the food piled by an Artificial Ant rises in comparison with the objective minimizing the distance walked by the simulated insect?
And even if the shape of the objective functions and their complexity class were clear, the question of how to set the weights w properly still remains open in most cases [689]. In the same paper, Das and Dennis [689] also show that with weighted sum approaches, not necessarily all elements considered optimal in terms of Pareto domination (see next section) will be found. Instead, candidate solutions hidden in concavities of the trade-off curve will not be discovered [609, 1621, 1622] (see Example E3.17 on the next page for different shapes of trade-off curves). Also, the candidate solutions which will finally be contained in the result set X will not necessarily be evenly spaced [1000]. Amongst further authors who discuss the drawbacks of the linear aggregation method are Koski [1582] and Messac and Mattson [1873].
3.3.4 Weighted Min-Max
The weighted min-max approach [1303, 1305, 1554, 1741] for multi-objective optimization compares two candidate solutions x_1 and x_2 on the basis of the minimum (or maximum) of their weighted objective values. A candidate solution x_1 ∈ X is preferred in comparison with x_2 ∈ X if Equation 3.20 holds:

    min {w_i f_i(x_1) : i ∈ 1..n} < min {w_i f_i(x_2) : i ∈ 1..n}    (3.20)

where the w_i are the weights of the objective values. For objective functions subject to minimization, w_i > 0 is used, and w_i is set to a value below zero for functions to be maximized. This approach, which is also called the Weighted Tschebychev Norm, is able to find solutions on both convex and concave trade-off surfaces (see Definition D3.15 on the facing page). The weight vector describes a beam in direction (1/w_1, 1/w_2, .., 1/w_n)^T from the origin of the objective space, toward and along which the optimizer should converge.
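The comparison of Equation 3.20 can be sketched as follows; this hypothetical example (not from the book) assumes both objectives are subject to minimization, i.e., positive weights:

```java
// Hypothetical sketch of Equation 3.20: prefer the solution whose minimum
// weighted objective value is smaller (all objectives minimized, w_i > 0).
public class WeightedMinMax {
  static double weightedMin(double[] fx, double[] w) {
    double m = Double.POSITIVE_INFINITY;
    for (int i = 0; i < fx.length; i++) m = Math.min(m, w[i] * fx[i]);
    return m;
  }
  // true if the solution with objective values fx1 is preferred to fx2
  static boolean preferred(double[] fx1, double[] fx2, double[] w) {
    return weightedMin(fx1, w) < weightedMin(fx2, w);
  }

  public static void main(String[] args) {
    double[] w = {1.0, 2.0};                 // both objectives minimized
    double[] a = {1.0, 4.0}, b = {3.0, 1.0};
    // min{1*1, 2*4} = 1 < min{1*3, 2*1} = 2, so a is preferred
    System.out.println(preferred(a, b, w)); // true
  }
}
```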
3.3.5 Pareto Optimization
The mathematical foundations for multi-objective optimization which considers conflicting criteria in a fair way have been laid by Edgeworth [857] and Pareto [2127] around one hundred fifty years ago [1642]. Pareto optimality^6 became an important notion in economics, game theory, engineering, and the social sciences [560, 997, 2102, 2938]. It defines the frontier of solutions that can be reached by trading off conflicting objectives in an optimal manner. From this front, a decision maker (be it a human or an algorithm) can finally choose the configurations that, in her opinion, suit best [264, 528, 952, 954, 1010, 1166, 2610]. The notion of optimality in the Pareto sense is strongly based on the definition of domination:
Definition D3.13 (Domination). An element x_1 dominates (is preferred to) an element x_2 (x_1 ⊣ x_2) if x_1 is better than x_2 in at least one objective function and not worse with respect to all other objectives. Based on the set f of objective functions f, we can write:

    x_1 ⊣ x_2 ⇔ (∀i ∈ 1..n : ω_i f_i(x_1) ≤ ω_i f_i(x_2)) ∧ (∃j ∈ 1..n : ω_j f_j(x_1) < ω_j f_j(x_2))    (3.21)

    ω_i = { 1 if f_i should be minimized; −1 if f_i should be maximized }    (3.22)
We provide a Java version of a comparison operator implementing the domination relation in Listing 56.55 on page 838. Like in Equation 3.13 in the definition of the lexicographical ordering of candidate solutions, the ω_i again denote whether f_i should be maximized or minimized. Thus, Equation 3.22 and Equation 3.14 are the same. Since we generally assume minimization, the ω_i will always be 1 if not explicitly stated otherwise in the rest of this book. Then, if a candidate solution x_1 dominates another one (x_2), f(x_1) is partially less than f(x_2).
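The book's own Java comparator is given in Listing 56.55, which is not reproduced here; the following is our own minimal sketch of Equation 3.21 under the usual assumption ω_i = 1 for all i (all objectives minimized):

```java
// Minimal sketch of the domination relation of Equation 3.21, assuming
// all objectives are to be minimized (ω_i = 1).
public class Domination {
  // true if objective vector a dominates objective vector b
  static boolean dominates(double[] a, double[] b) {
    boolean strictlyBetter = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) return false;  // worse in one objective: no domination
      if (a[i] < b[i]) strictlyBetter = true;
    }
    return strictlyBetter;            // not worse anywhere, better somewhere
  }

  public static void main(String[] args) {
    System.out.println(dominates(new double[]{1, 2}, new double[]{2, 2})); // true
    System.out.println(dominates(new double[]{1, 3}, new double[]{2, 2})); // false
  }
}
```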
The Pareto domination relation defines a strict partial order (an irreflexive, asymmetric, and transitive binary relation, see Definition D51.16 on page 645) on the space of possible objective values Y. In contrast, the weighted sum approach imposes a total order by projecting the objective space Y into the real numbers R. The use of Pareto domination in multi-objective Evolutionary Algorithms was first proposed in [1075] back in 1989 [2228].
Definition D3.14 (Pareto Optimal). An element x⋆ ∈ X is Pareto optimal (and hence, part of the optimal set X⋆) if it is not dominated by any other element in the problem space X. X⋆ is called the Pareto-optimal set or Pareto set.

    x⋆ ∈ X⋆ ⇔ ¬∃x ∈ X : x ⊣ x⋆    (3.23)
The Pareto optimal set will also comprise all lexicographically optimal solutions since, per definition, it includes all the extreme points of the objective functions. Additionally, it provides the trade-off solutions which we longed for in Section 3.3.2.1.
Definition D3.15 (Pareto Frontier). For a given optimization problem, the Pareto front(ier) F⋆ ⊂ R^n is defined as the set of results that the objective function vector f creates when it is applied to all the elements of the Pareto-optimal set X⋆:

    F⋆ = {f(x⋆) : x⋆ ∈ X⋆}    (3.24)
Example E3.17 (Shapes of Optimal Sets / Pareto Frontiers).
The Pareto front in Example E13.2 has a convex geometry, but there are other different shapes as well. In Figure 3.9 [2915], we show some examples, including non-convex (concave), disconnected, linear, and non-uniformly distributed Pareto fronts. Different structures of the optimal sets pose different challenges to achieve good convergence and spread. Optimization processes driven by linear aggregating functions will, for instance, have problems to find all portions of Pareto fronts with non-convex geometry, as shown by Das and Dennis [689].

^6 http://en.wikipedia.org/wiki/Pareto_efficiency [accessed 2007-07-03]

[Figure: four sketched Pareto fronts in the (f_1, f_2) plane – Fig. 3.9.a: Non-Convex (Concave); Fig. 3.9.b: Disconnected; Fig. 3.9.c: Linear; Fig. 3.9.d: Non-Uniformly Distributed.]
Figure 3.9: Examples of Pareto fronts [2915].
In this book, we will refer to candidate solutions which are not dominated by any element in a reference set as non-dominated. This reference set may be the whole problem space X itself (in which case the non-dominated elements are global optima) or any subset of X. Many optimization algorithms, for instance, work on populations of candidate solutions, and in such a context, a non-dominated candidate solution would be one not dominated by any other element in the population.
Definition D3.16 (Domination Rank). We define the domination rank #dom(x, X) of a candidate solution x relative to a reference set X as

    #dom(x, X) = |{x′ : (x′ ∈ X) ∧ (x′ ⊣ x)}|    (3.25)

Hence, an element x with #dom(x, X) = 0 is non-dominated in X, and a candidate solution x⋆ with #dom(x⋆, X) = 0 is a global optimum. The higher the domination rank, the worse is the associated element.
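Equation 3.25 can be sketched directly; this hypothetical Java example (ours, not the book's, assuming minimization of all objectives) counts the dominating elements of a reference set:

```java
// Hypothetical sketch of Equation 3.25: the domination rank of x relative
// to a reference set X (all objectives assumed to be minimized).
public class DominationRank {
  static boolean dominates(double[] a, double[] b) {
    boolean strictlyBetter = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) return false;
      if (a[i] < b[i]) strictlyBetter = true;
    }
    return strictlyBetter;
  }
  // #dom(x, X) = |{ x' in X : x' dominates x }|
  static int domRank(double[] x, double[][] X) {
    int rank = 0;
    for (double[] other : X) if (dominates(other, x)) rank++;
    return rank;
  }

  public static void main(String[] args) {
    double[][] pop = {{1, 1}, {2, 2}, {3, 3}};
    // {1,1} and {2,2} both dominate {3,3}; nothing dominates {1,1}
    System.out.println(domRank(pop[2], pop)); // 2
    System.out.println(domRank(pop[0], pop)); // 0 -> non-dominated
  }
}
```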
Example E3.18 (Example E3.6 Cont.: Pareto Optimization).
In Figure 3.10, we illustrate the impact of the definition of Pareto optimality on our first example (outlined in Example E3.6). We again assume that f_1 and f_2 should both be maximized and hence, ω_1 = ω_2 = −1. The areas shaded with dark gray are Pareto optimal and thus represent the optimal set X⋆ = [x_2, x_3] ∪ [x_5, x_6], which contains infinitely many elements since it is composed of intervals on the real axis. In practice, of course, our computers can only handle finitely many elements, and we would not be able to find all optimal elements unless, of course, we could apply a method able to mathematically resolve the objective functions and to return the intervals in a form similar to the notation we used here. All candidate solutions not in the optimal set are dominated by at least one element of X⋆.
[Figure: plot of y = f_1(x) and y = f_2(x) over x ∈ X with the points x_1 .. x_6 marked; the intervals [x_2, x_3] and [x_5, x_6] are shaded dark gray as Pareto optimal, the area between x_1 and x_2 light gray.]
Figure 3.10: Optimization using the Pareto Frontier approach.
The points in the area between x_1 and x_2 (shaded in light gray) are dominated by other points in the same region or in [x_2, x_3], since both functions f_1 and f_2 can be improved by increasing x. In other words, f_1 and f_2 harmonize in [x_1, x_2): f_1 ∼_{[x_1,x_2)} f_2. If we start at the leftmost point in X (which is position x_1), for instance, we can go one small step ∆ to the right and will find a point x_1 + ∆ dominating x_1, because f_1(x_1 + ∆) > f_1(x_1) and f_2(x_1 + ∆) > f_2(x_1). We can repeat this procedure and will always find a new dominating point until we reach x_2. x_2 demarks the global maximum of f_2 – the point with the highest possible f_2 value – which cannot be dominated by any other element of X by definition (see Equation 3.21).
From here on, f_2 will decrease for some time, but f_1 keeps rising. If we now go a small step ∆ to the right, we will find a point x_2 + ∆ with f_2(x_2 + ∆) < f_2(x_2) but also f_1(x_2 + ∆) > f_1(x_2). One objective can only get better if another one degenerates, i.e., f_1 and f_2 conflict: f_1 ≁_{[x_2,x_3]} f_2. In order to increase f_1, f_2 would have to be decreased and vice versa, and hence, the new point is not dominated by x_2. Although some of the f_2(x) values of the other points x ∈ [x_1, x_2) may be larger than f_2(x_2 + ∆), f_1(x_2 + ∆) > f_1(x) holds for all of them. This means that no point in [x_1, x_2) can dominate any point in [x_2, x_4] because f_1 keeps rising until x_4 is reached.
At x_3, however, f_2 steeply falls to a very low level, a level lower than f_2(x_5). Since the f_1 values of the points in [x_5, x_6] are also higher than those of the points in (x_3, x_4], all points in the set [x_5, x_6] (which also contains the global maximum of f_1) dominate those in (x_3, x_4]. For all the points in the white area between x_4 and x_5 and after x_6, we can derive similar relations. All of them are also dominated by the non-dominated regions that we have just discussed.
Example E3.19 (Example E3.7 Cont.: Pareto Optimization).
Another method to visualize the Pareto relationship is outlined in Figure 3.11 for our second graphical example. For a certain grid-based resolution X′_2 ⊂ X_2 of the problem space X_2, we counted the number #dom(x, X′_2) of elements that dominate each element x ∈ X′_2. Again, the higher this number, the worse is the element x in terms of Pareto optimization. Hence, those candidate solutions residing in the valleys of Figure 3.11 are better than those which are part of the hills. This Pareto ranking approach is also used in many optimization algorithms as part of the fitness assignment scheme (see Section 28.3.3 on page 275, for instance). A non-dominated element is, as the name says, not dominated by any other candidate solution. These elements are Pareto optimal and have a domination rank of zero. In Figure 3.11, there are four such areas: X⋆_1, X⋆_2, X⋆_3, and X⋆_4.
[Figure: surface plot of the domination rank #dom over (x_1, x_2), with the four non-dominated areas X⋆_1, X⋆_2, X⋆_3, and X⋆_4 marked.]
Figure 3.11: Optimization using the Pareto Frontier approach (second example).
If we compare Figure 3.11 with the plots of the two functions f_3 and f_4 in Figure 3.5, we can see that hills in the domination space occur at positions where both f_3 and f_4 have high values. Conversely, regions of the problem space where both functions have small values are dominated by very few elements.
Besides these examples, another illustration of the domination relation which may help in understanding Pareto optimization can be found in Section 28.3.3 on page 275 (Figure 28.4 and Table 28.1).
3.3.5.1 Problems of Pure Pareto Optimization
Besides the fact that we will usually not be able to list all Pareto optimal elements, the complete Pareto optimal set is often also not the desired result of the optimization process. Usually, we are rather interested only in some special areas of the Pareto front.
Example E3.20 (Example E3.5 – Artificial Ant – Cont.).
We can again take the Artificial Ant example to visualize this problem. In Example E3.5 on page 57, we introduced multiple conflicting criteria in this problem:
1. Maximize the amount of food piled.
2. Minimize the distance covered or the time needed to find the food.
3. Minimize the size of the program driving the ant.
If we applied pure Pareto optimization to these objectives, the result may, for example, contain:
1. A program x_1 consisting of 100 instructions, allowing the ant to gather 50 food items when walking a distance of 500 length units.
2. A program x_2 consisting of 100 instructions, allowing the ant to gather 60 food items when walking a distance of 5000 length units.
3. A program x_3 consisting of 50 instructions, allowing the ant to gather 61 food items when walking a distance of 1 000 000 length units.
4. A program x_4 consisting of 10 instructions, allowing the ant to gather 1 food item when walking a distance of 5 length units.
5. A program x_5 consisting of 0 instructions, allowing the ant to gather 0 food items when walking a distance of 0 length units.
The result of the optimization process obviously contains useless but non-dominated individuals which occupy space in the non-dominated set. A program x_3 which forces the ant to scan each and every location on the map is not feasible even if it leads to a maximum in food gathering. Also, an empty program x_5 is useless because it will lead to no food collection at all. Between x_1 and x_2, the difference in the amount of piled food is only 20%, but the ant driven by x_2 has to cover ten times the distance of the one driven by x_1. Therefore, the question of the actual utility arises in this case, too.
If such candidate solutions are created and archived, processing time is invested in evaluating them (and we have already seen that evaluation may be a costly step of the optimization process). Ikeda et al. [1400] show that various elements, which are far from the true Pareto frontier, may survive as hardly-dominated solutions (called dominance resistant, see Section 20.2). Also, such solutions may dominate others which are not optimal but fall into the space behind the interesting part of the Pareto front. Furthermore, memory restrictions usually force us to limit the size of the list of non-dominated solutions X found during the search. When this size limit is reached, some optimization algorithms use a clustering technique to prune the optimal set while maintaining diversity. On the one hand, this is good since it will preserve a broad scan of the Pareto frontier. On the other hand, in this case, a short but dumb program is of course very different from a longer, intelligent one. Therefore, it will be kept in the list, and other solutions which differ less from each other but are more interesting for us will be discarded.
Worse still, non-dominated elements usually have a higher probability of being explored further. This inevitably leads to the creation of a great proportion of useless offspring.
In the next iteration of the optimization process, these useless offspring will need a good share of the processing time to be evaluated.
Thus, there are several reasons to force the optimization process into a desired direction. In ?? on page ?? you can find an illustrative discussion of the drawbacks of strict Pareto optimization in a practical example (evolving web service compositions).
3.3.5.2 Goals of Pareto Optimization
Purshouse [2228] identifies three goals of Pareto optimization which also mirror the aforementioned problems:
1. Proximity: The optimizer should obtain result sets X which come as close to the Pareto frontier as possible.
2. Diversity: If the real set of optimal solutions is too large, the set X of solutions actually discovered should have a good distribution in both uniformity and extent.
3. Pertinency: The set X of solutions discovered should match the interests of the decision maker using the optimizer. There is no use in returning candidate solutions with no actual merits for the human operator.
3.4 Constraint Handling
Often, there are feasibility limits for the parameters of the candidate solutions or the objective function values. Such regions of interest are one of the reasons for a further extension of multi-objective optimization problems:
Definition D3.17 (Feasibility). In an optimization problem, p ≥ 0 inequality constraints g and q ≥ 0 equality constraints h may be imposed in addition to the objective functions. Then, a candidate solution x is feasible if and only if it fulfills all constraints:

    isFeasible(x) ⇔ ( g_i(x) ≥ 0 ∀i ∈ 1..p ) ∧ ( h_j(x) = 0 ∀j ∈ 1..q )    (3.26)
A candidate solution x for which isFeasible(x) does not hold is called infeasible. Obviously, only a feasible individual can be a solution, i.e., an optimum, for a given optimization problem. Constraint handling and the notion of feasibility can be combined with all the methods for single-objective and multi-objective optimization previously discussed. Comprehensive reviews on techniques for such problems have been provided by Michalewicz [1885], Michalewicz and Schoenauer [1890], and Coello Coello et al. [594, 599] in the context of Evolutionary Computation. In Table 3.2, we give some examples of constraints in optimization problems.
Table 3.2: Applications and Examples of Constraint Optimization.
Area References
Combinatorial Problems [237, 1161, 1735, 1822, 1871, 1921, 2036, 2072, 2658, 3034]
Computer Graphics [2487, 2488]
Databases [1735, 1822, 2658]
Decision Making [1801]
Distributed Algorithms and Systems [1801, 3034]
Economics and Finances [530, 531, 3034]
Engineering [500, 519, 530, 547, 1881, 3070]
Function Optimization [1732]
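The feasibility predicate of Equation 3.26 can be sketched as follows; this hypothetical example (ours, with one-dimensional candidate solutions and made-up constraints for illustration) simply checks all constraints in turn:

```java
// Hypothetical sketch of Equation 3.26: x is feasible iff all inequality
// constraints g_i(x) >= 0 and all equality constraints h_j(x) = 0 hold.
import java.util.function.DoubleUnaryOperator;

public class Feasibility {
  static boolean isFeasible(double x, DoubleUnaryOperator[] g, DoubleUnaryOperator[] h) {
    for (DoubleUnaryOperator gi : g) if (gi.applyAsDouble(x) < 0) return false;
    for (DoubleUnaryOperator hj : h) if (hj.applyAsDouble(x) != 0) return false;
    return true;
  }

  public static void main(String[] args) {
    DoubleUnaryOperator[] g = { x -> x - 1 }; // made-up constraint: x >= 1
    DoubleUnaryOperator[] h = { x -> x % 2 }; // made-up constraint: x even
    System.out.println(isFeasible(4, g, h));   // true
    System.out.println(isFeasible(0.5, g, h)); // false (violates g)
  }
}
```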
3.4.1 Genome- and Phenome-based Approaches
One approach to ensure that constraints are always obeyed is to make sure that only feasible candidate solutions can occur during the optimization process.
1. Coding: This can be done as part of the genotype-phenotype mapping by using so-called decoders [2228]. An example for a decoder is given in [2472, 2473].
2. Repairing: Infeasible candidate solutions could be repaired, i.e., turned into feasible ones [2228, 3041], as part of the genotype-phenotype mapping as well.
3.4.2 Death Penalty
Probably the easiest way of dealing with constraints in objective and fitness space is to simply reject all infeasible candidate solutions right away and not to consider them any further in the optimization process. This death penalty [1885, 1888] can only work in problems where the feasible regions are very large. If the problem space contains only few or non-continuously distributed feasible elements, such an approach will cause the optimization process to stagnate, because most effort is spent on untargeted sampling in the hope of finding feasible candidate solutions. Once this has happened, the transition to other feasible elements may be too complicated and the search only finds local optima. Additionally, if infeasible elements are directly discarded, the information which could be gained from them is disposed of with them.
3.4.3 Penalty Functions
Maybe one of the most popular approaches for dealing with constraints, especially in the area of single-objective optimization, goes back to Courant [647], who introduced the idea of penalty functions in 1943. Here, the constraints are combined with the objective function f, resulting in a new function f′ which is then actually optimized. The basic idea is that this combination is done in a way which ensures that an infeasible candidate solution always has a worse f′ value than a feasible one with the same objective values. In [647], this is achieved by defining f′ as f′(x) = f(x) + v [h(x)]^2. Various similar approaches exist. Carroll [500, 501], for instance, chose a penalty function of the form f′(x) = f(x) + v Σ_{i=1}^{p} [g_i(x)]^{−1} in order to ensure that the functions g always stay larger than zero.
There are practically no limits to the ways in which a penalty for infeasibility can be integrated into the objective functions. Several researchers suggest dynamic penalties, which incorporate the index of the current iteration of the optimizer [1458, 2072], or adaptive penalties, which additionally utilize population statistics [237, 1161, 2487]. Thorough discussions on penalty functions have been contributed by Fiacco and McCormick [908] and Smith and Coit [2526].
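Courant's combination f′(x) = f(x) + v [h(x)]^2 can be sketched as follows; the concrete objective, constraint, and penalty coefficient v in this hypothetical example are our own choices:

```java
// Hypothetical sketch of Courant's penalty idea: f'(x) = f(x) + v * h(x)^2.
// The objective f, constraint h, and coefficient v are made up for illustration.
public class PenaltySketch {
  static double f(double x) { return x * x; }   // objective, to be minimized
  static double h(double x) { return x - 3; }   // equality constraint: h(x) = 0
  static double fPenalized(double x, double v) {
    double hx = h(x);
    return f(x) + v * hx * hx;                  // infeasible points pay a penalty
  }

  public static void main(String[] args) {
    // at the feasible point x = 3, no penalty is added
    System.out.println(fPenalized(3, 100) == f(3));                  // true
    // an infeasible point with a better raw objective is now worse
    System.out.println(fPenalized(0, 100) > fPenalized(3, 100));     // true
  }
}
```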
3.4.4 Constraints as Additional Objectives
Another idea for handling constraints would be to consider them as new objective functions. If g(x) ≥ 0 must hold, for instance, we can transform this to a new objective function which is subject to minimization.

f≥(x) = −min {g(x), 0} (3.27)

As long as g is larger than 0, the constraint is met. f≥ will be zero for all candidate solutions which fulfill this requirement, since there is no use in maximizing g further than 0. However, if g gets negative, the minimum term in f≥ will become negative as well. The minus in front of it will turn it positive. The higher the absolute value of the negative g, the higher will the value of f≥ become. Since it is subject to minimization, we thus apply a force into the direction of meeting the constraint. No optimization pressure is applied as long as the constraint is met. An equality constraint h can similarly be transformed to a function f=(x) (again subject to minimization).

f=(x) = max {|h(x)|, ε} (3.28)

f=(x) minimizes the absolute value of h. If h is continuous and real, it may arbitrarily closely approach 0 but never actually reach it. Then, defining a small cutoff value ε (for instance ε = 10⁻⁶) may help the optimizer to satisfy the constraint. An approach similar to the idea of interpreting constraints as objectives is Deb's Goal Programming method [736, 739]. Constraint violations and objective values are ranked together in a part of the MOEA by Ray et al. [2273] [749].
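Equation 3.27 and Equation 3.28 translate directly into code. The following Java sketch is our own illustration (the class and method names do not come from the book's listings); it evaluates both transformed objectives for given constraint values:

```java
/** Sketch: turning constraints into minimization objectives
 *  (Equations 3.27 and 3.28). The inputs are illustrative. */
public class ConstraintObjectives {
    static final double EPSILON = 1e-6; // cutoff for the equality case

    /** f>=(x) = -min{ g(x), 0 }: zero while g(x) >= 0, positive otherwise. */
    static double fGeq(double g) { return -Math.min(g, 0.0); }

    /** f=(x) = max{ |h(x)|, epsilon }: drives |h(x)| down to the cutoff. */
    static double fEq(double h) { return Math.max(Math.abs(h), EPSILON); }

    public static void main(String[] args) {
        System.out.println(fGeq(3.0));  // 0.0: constraint met, no pressure
        System.out.println(fGeq(-2.0)); // 2.0: violation creates pressure
        System.out.println(fEq(0.0));   // 1.0E-6: cannot drop below epsilon
    }
}
```

Note how f≥ exerts no optimization pressure while the inequality constraint holds, exactly as described above.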
3.4.5 The Method Of Inequalities
General inequality constraints can also be processed according to the Method of Inequalities (MOI) introduced by Zakian [3050, 3052, 3054] in his seminal work on computer-aided control systems design (CACSD) [2404, 2924, 3070]. In the MOI, an area of interest is specified in form of a goal range [ř_i, r̂_i] for each objective function f_i.
Pohlheim [2190] outlines how this approach can be applied to the set f of objective functions and combined with Pareto optimization: Based on the inequalities, three categories of candidate solutions can be defined and each element x ∈ X belongs to one of them:

1. It fulfills all of the goals, i. e.,

ř_i ≤ f_i(x) ≤ r̂_i ∀i ∈ 1..|f| (3.29)

2. It fulfills some (but not all) of the goals, i. e.,

(∃i ∈ 1..|f| : ř_i ≤ f_i(x) ≤ r̂_i) ∧ (∃j ∈ 1..|f| : (f_j(x) < ř_j) ∨ (f_j(x) > r̂_j)) (3.30)

3. It fulfills none of the goals, i. e.,

(f_i(x) < ř_i) ∨ (f_i(x) > r̂_i) ∀i ∈ 1..|f| (3.31)

Using these groups, a new comparison mechanism is created:

1. The candidate solutions which fulfill all goals are preferred over all other individuals that fulfill only some or none of the goals.

2. The candidate solutions which are not able to fulfill any of the goals succumb to those which fulfill at least some goals.

3. Only the solutions that are in the same group are compared on the basis of the Pareto domination relation.
By doing so, the optimization process will be driven into the direction of the interesting
part of the Pareto frontier. Less eﬀort will be spent on creating and evaluating individuals
in parts of the problem space that most probably do not contain any valid solution.
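The three-category division above can be sketched as a short Java routine. The goal ranges and objective vectors used in this demonstration are hypothetical, and the class name is our own:

```java
/** Sketch of the MOI class division: class 1 = all goals met,
 *  class 2 = some met, class 3 = none met. */
public class MoiClasses {
    /** lower[i]..upper[i] is the goal range for objective value obj[i]. */
    static int moiClass(double[] obj, double[] lower, double[] upper) {
        int met = 0;
        for (int i = 0; i < obj.length; i++) {
            if (obj[i] >= lower[i] && obj[i] <= upper[i]) met++;
        }
        if (met == obj.length) return 1; // fulfills all goals
        if (met > 0) return 2;           // fulfills some goals
        return 3;                        // fulfills none of the goals
    }

    public static void main(String[] args) {
        double[] lo = {0.0, 0.0}, hi = {1.0, 1.0}; // assumed goal ranges
        System.out.println(moiClass(new double[] {0.5, 0.5}, lo, hi)); // 1
        System.out.println(moiClass(new double[] {0.5, 2.0}, lo, hi)); // 2
        System.out.println(moiClass(new double[] {2.0, 2.0}, lo, hi)); // 3
    }
}
```

Within each class, the Pareto domination relation would then be applied as described above.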
Example E3.21 (Example E3.6 Cont.: MOI).
In Figure 3.12, we apply the Pareto-based Method of Inequalities to the graphical Example E3.6. We impose the same goal ranges on both objectives, ř₁ = ř₂ and r̂₁ = r̂₂. By doing so, the second non-dominated region from the Pareto example Figure 3.10 suddenly becomes infeasible, since f₁ rises over r̂₁. Also, the greater part of the first optimal area from this example is now infeasible because f₂ drops under ř₂. In the whole domain X of the optimization problem, only the regions [x₁, x₂] and [x₃, x₄] fulfill all the target criteria. To these elements, Pareto comparisons are applied. It turns out that the elements in [x₃, x₄] dominate all the elements in [x₁, x₂] since they provide higher values in f₁ for the same values in f₂. If we scan through [x₃, x₄] from left to right, we see that f₁ rises while f₂ degenerates, which is why the elements in this area cannot dominate each other and hence, are all optimal.
Figure 3.12: Optimization using the Pareto-based Method of Inequalities approach (first example).
Example E3.22 (Example E3.7 Cont.: MOI).
In Figure 3.13 we apply the Pareto-based Method of Inequalities to our second graphical example (Example E3.7). For illustration purposes, we define two different ranges of interest [ř₃, r̂₃] and [ř₄, r̂₄] on f₃ and f₄, as sketched in Fig. 3.13.a.

Like we did in the second example for Pareto optimization, we want to plot the quality of the elements in the problem space. Therefore, we first assign a number c ∈ {1, 2, 3} to each of its elements in Fig. 3.13.b. This number corresponds to the class to which the element belongs, i. e., 1 means that a candidate solution fulfills all inequalities, for an element of class 2, at least some of the constraints hold, and the elements in class 3 fail all requirements. Based on this class division, we can then perform a modified Pareto counting where each candidate solution dominates all the elements in higher classes (Fig. 3.13.c). The result is that multiple single optima x⋆₁, x⋆₂, x⋆₃, etc., and even a set of adjacent, non-dominated elements X⋆₉ occur. These elements are, again, situated at the bottom of the illustrated landscape whereas the worst candidate solutions reside on hill tops.
Fig. 3.13.a: The ranges applied to f₃ and f₄.

Fig. 3.13.b: The Pareto-based Method of Inequalities class division.

Fig. 3.13.c: The Pareto-based Method of Inequalities ranking.

Figure 3.13: Optimization using the Pareto-based Method of Inequalities approach (second example).
A good overview on techniques for the Method of Inequalities is given by Whidborne et al.
[2924].
3.4.6 ConstrainedDomination
The idea of constrained-domination is very similar to the Method of Inequalities but natively applies to optimization problems as specified in Definition D3.17. Deb et al. [749] define this principle which can easily be incorporated into the dominance comparison between any two candidate solutions. Here, a candidate solution x₁ ∈ X is preferred in comparison to an element x₂ ∈ X [547, 749, 1297] if

1. x₁ is feasible while x₂ is not,

2. x₁ and x₂ both are infeasible but x₁ has a smaller overall constraint violation, or

3. x₁ and x₂ are both feasible but x₁ dominates x₂.
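These three rules can be condensed into a small comparison routine. The following Java sketch is our own illustration; it assumes that all objectives are minimized and that infeasibility is represented by a non-negative total constraint violation:

```java
/** Sketch of Deb's constrained-domination preference (three rules).
 *  Objective vectors are minimized; violation 0 means feasible. */
public class ConstrainedDomination {
    /** True iff a Pareto-dominates b (all objectives minimized). */
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    /** True iff x1 (objectives o1, violation v1) is preferred over x2. */
    static boolean preferred(double[] o1, double v1, double[] o2, double v2) {
        boolean f1 = v1 <= 0.0, f2 = v2 <= 0.0;
        if (f1 && !f2) return true;             // rule 1: feasibility wins
        if (!f1 && !f2) return v1 < v2;         // rule 2: smaller violation
        if (f1 && f2) return dominates(o1, o2); // rule 3: Pareto domination
        return false;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 1.0}, b = {2.0, 2.0};
        System.out.println(preferred(b, 0.0, a, 0.5)); // feasible beats infeasible
        System.out.println(preferred(a, 0.0, b, 0.0)); // a dominates b
    }
}
```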
3.4.7 Limitations and Other Methods
Other approaches for incorporating constraints into optimization are Goal Attainment [951, 2951] and Goal Programming⁷ [530, 531]. Especially interesting in our context are methods which have been integrated into Evolutionary Algorithms [736, 739, 2190, 2393, 2662], such as the popular Goal Attainment approach by Fonseca and Fleming [951] which is similar to the Pareto-MOI we have adopted from Pohlheim [2190]. Again, an overview on this subject is given by Coello Coello et al. in [599].
3.5 Unifying Approaches
3.5.1 External Decision Maker
All the approaches for defining what optima are and how constraints should be considered which we discussed until now were rather specific and bound to certain mathematical constructs. The more general concept of an External Decision Maker which (or who) decides which candidate solutions prevail has been introduced by Fonseca and Fleming [952, 954].

One of the ideas behind "externalizing" the assessment process on what is good and what is bad is that Pareto optimization as well as, for instance, the Method of Inequalities, impose rigid, immutable partial orders⁸ on the candidate solutions. The first drawback of such a partial order is that elements may exist which neither succeed nor precede each other. As we have seen in the examples discussed in Section 3.3.5, there can, for instance, be two candidate solutions x₁, x₂ ∈ X with neither x₁ ⊣ x₂ nor x₂ ⊣ x₁. Since there can be arbitrarily many such elements, the optimization process may run into a situation where its set of currently discovered "best" candidate solutions X grows quickly, which leads to high memory consumption and declining search speed.
The second drawback of only using rigid partial orders is that they make no statement about the quality relations within X. There is no "best of the best". We cannot decide which of two candidate solutions not dominating each other is the most interesting one for further investigation. Also, the plain Pareto relation does not allow us to dispose of the "weakest" element from X in order to save memory, since all elements in X are alike.

⁷ http://en.wikipedia.org/wiki/Goal_programming [accessed 2007-07-03]
⁸ A definition of partial order relations is specified in Definition D51.16 on page 645.
One example for introducing a quality measure in addition to the Pareto relation is the Pareto ranking which we already used in Example E3.19 and which we will discuss later in Section 28.3.3 on page 275. Here, the utility of a candidate solution is defined by the number of other candidate solutions it is dominated by. Into this number, additional density information is often incorporated which deems areas of X that have only sparsely been sampled as more interesting than those which have already been extensively investigated.

While such ordering methods are good default approaches able to direct the search into the direction of the Pareto frontier and to deliver a broad scan of it, they neglect the fact that the user of the optimization most often is not interested in the whole optimal set but has preferences for certain regions of interest [955]. This region would exclude, for example, the infeasible (but Pareto optimal) programs for the Artificial Ant as discussed in Section 3.3.5.1. What the user wants is a detailed scan of these areas, which often cannot be delivered by pure Pareto optimization.
Figure 3.14: An external decision maker providing an Evolutionary Algorithm with utility values.
Here the External Decision Maker as an expression of the user's preferences [949] comes into play, as illustrated in Figure 3.14. The task of this decision maker is to provide a cost function u : Rⁿ → R (or utility function, if the underlying optimizer is maximizing) which maps the space of objective values (Rⁿ) to the real numbers R. Since there is a total order defined on the real numbers, this process is another way of resolving the "incomparability situation". The structure of the decision making process u can freely be defined and may incorporate any of the previously mentioned methods. u could, for example, be reduced to compute a weighted sum of the objective values, to perform an implicit Pareto ranking, or to compare individuals based on pre-specified goal vectors. Furthermore, it may even incorporate forms of artificial intelligence, other forms of multi-criterion Decision Making, and even interaction with the user (similar to Example E2.6 on page 48, for instance). This technique allows focusing the search onto solutions which are not only good in the Pareto sense, but also feasible and interesting from the viewpoint of the user.
Fonseca and Fleming make a clear distinction between ﬁtness and cost values. Cost values
have some meaning outside the optimization process and are based on user preferences.
Fitness values on the other hand are an internal construct of the search with no meaning
outside the optimizer (see Deﬁnition D5.1 on page 94 for more details). If External Decision
Makers are applied in Evolutionary Algorithms or other search paradigms based on ﬁtness
measures, these will be computed using the values of the cost function instead of the objective
functions [949, 950, 956].
3.5.2 Prevalence Optimization
We have now discussed various ways to define what optima in terms of multi-objective optimization are and to steer the search process into their direction. Let us subsume all of them in a general approach. From the concept of Pareto optimization to the Method of Inequalities, the need to compare elements of the problem space in terms of their quality as solutions for a given problem winds like a red thread through this matter. Even the weighted sum approach and the External Decision Maker do nothing else than mapping multi-dimensional vectors to the real numbers in order to make them comparable.

If we compare two candidate solutions x₁ and x₂, either x₁ is better than x₂, vice versa, or the result is undecidable. In the latter case, we assume that both are of equal quality. Hence, there are three possible relations between two elements of the problem space. These results can be expressed with a comparator function cmp_f.
Definition D3.18 (Comparator Function). A comparator function cmp : A² → R maps all pairs of elements (a₁, a₂) ∈ A² to the real numbers R according to a (strict) partial order⁹ R:

R(a₁, a₂) ⇔ (cmp(a₁, a₂) < 0) ∧ (cmp(a₂, a₁) > 0) ∀a₁, a₂ ∈ A (3.32)
¬R(a₁, a₂) ∧ ¬R(a₂, a₁) ⇔ cmp(a₁, a₂) = cmp(a₂, a₁) = 0 ∀a₁, a₂ ∈ A (3.33)
R(a₁, a₂) ⇒ ¬R(a₂, a₁) ∀a₁, a₂ ∈ A (3.34)
cmp(a, a) = 0 ∀a ∈ A (3.35)
We provide a Java interface for the functionality of a comparator function in Listing 55.15 on page 719. From the defining equations, many features of "cmp" can be deduced. It is, for instance, transitive, i. e., cmp(a₁, a₂) < 0 ∧ cmp(a₂, a₃) < 0 ⇒ cmp(a₁, a₃) < 0. Provided with the knowledge of the objective functions f ∈ f, such a comparator function cmp_f can be imposed on the problem spaces of our optimization problems:
Definition D3.19 (Prevalence Comparator Function). A prevalence comparator function cmp_f : X² → R maps all pairs (x₁, x₂) ∈ X² of candidate solutions to the real numbers R according to Definition D3.18.
The subscript f in cmp_f illustrates that the comparator has access to all the values of the objective functions in addition to the problem space elements which are its parameters. As a shortcut for this comparator function, we introduce the prevalence notation as follows:

Definition D3.20 (Prevalence). An element x₁ prevails over an element x₂ (x₁ ≺ x₂) if the application-dependent prevalence comparator function cmp_f(x₁, x₂) ∈ R returns a value less than 0.

x₁ ≺ x₂ ⇔ cmp_f(x₁, x₂) < 0 ∀x₁, x₂ ∈ X (3.36)
(x₁ ≺ x₂) ∧ (x₂ ≺ x₃) ⇒ x₁ ≺ x₃ ∀x₁, x₂, x₃ ∈ X (3.37)
It is easy to see that we can fit lexicographic orderings, Pareto domination relations, the Method of Inequalities-based comparisons, as well as the weighted sum combination of objective values easily into this notation. Together with the fitness assignment strategies for Evolutionary Algorithms which will be introduced later in this book (see Section 28.3 on page 274), it covers many of the most sophisticated multi-objective techniques that are proposed, for instance, in [952, 1531, 2662]. By replacing the Pareto approach with prevalence comparisons, all the optimization algorithms (especially many of the evolutionary techniques) relying on domination relations can be used in their original form while offering the additional ability of scanning special regions of interest of the optimal frontier.

⁹ Partial orders are introduced in Definition D51.15 on page 645.
Since the comparator function cmp_f and the prevalence relation impose a partial order on the problem space X like the domination relation does, we can construct the optimal set X⋆ containing globally optimal elements x⋆ in a way very similar to Equation 3.23:

x⋆ ∈ X⋆ ⇔ ∄x ∈ X : x ≠ x⋆ ∧ x ≺ x⋆ (3.38)
For illustration purposes, we will exercise the prevalence approach on the examples of lexicographic optimality (see Section 3.3.2) and define the comparator function cmp_{f,lex}. For the weighted sum method with the weights w_i (see Section 3.3.3) we similarly provide cmp_{f,ws}, and the domination-based Pareto optimization with the objective directions ω_i as defined in Section 3.3.5 can be realized with cmp_{f,⊣}.
cmp_{f,lex}(x₁, x₂) =
  −1 if ∃π ∈ Π(1..|f|) : (x₁ <_{lex,π} x₂) ∧ ∄π ∈ Π(1..|f|) : (x₂ <_{lex,π} x₁)
   1 if ∃π ∈ Π(1..|f|) : (x₂ <_{lex,π} x₁) ∧ ∄π ∈ Π(1..|f|) : (x₁ <_{lex,π} x₂)
   0 otherwise
(3.39)

cmp_{f,ws}(x₁, x₂) = [ ∑_{i=1}^{|f|} (w_i·f_i(x₁) − w_i·f_i(x₂)) ] ≡ ws(x₁) − ws(x₂) (3.40)

cmp_{f,⊣}(x₁, x₂) =
  −1 if x₁ ⊣ x₂
   1 if x₂ ⊣ x₁
   0 otherwise
(3.41)
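As an illustration, the domination-based comparator cmp_{f,⊣} of Equation 3.41 can be sketched in Java as follows. The class and method names are our own (not those of the book's listings), and all objectives are assumed to be subject to minimization:

```java
/** Sketch of the domination-based prevalence comparator cmp_{f,dom}
 *  (Equation 3.41), with all objectives subject to minimization. */
public class PrevalenceComparator {
    /** True iff objective vector a dominates objective vector b. */
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    /** -1 if x1 dominates x2, 1 if x2 dominates x1, 0 otherwise. */
    static int cmp(double[] f1, double[] f2) {
        if (dominates(f1, f2)) return -1;
        if (dominates(f2, f1)) return 1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(cmp(new double[] {1, 1}, new double[] {2, 2})); // -1
        System.out.println(cmp(new double[] {1, 2}, new double[] {2, 1})); //  0
    }
}
```

Swapping in a different cmp body (weighted sum, lexicographic, MOI-based) changes the search behavior without touching the optimization algorithm itself, which is exactly the point of the prevalence abstraction.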
Example E3.23 (Example E3.5 – Artiﬁcial Ant – Cont.).
With the prevalence comparator, we can also easily solve the problem stated in Section 3.3.5.1 by no longer encouraging the evolution of useless programs for Artificial Ants while retaining the benefits of Pareto optimization. The comparator function can simply be defined in a way that infeasible candidate solutions will always be prevailed by useful programs. It therefore may incorporate the knowledge on the importance of the objective functions. Let f₁ be the objective function with an output proportional to the food piled, f₂ denote the distance covered in order to find the food, and f₃ be the program length. Equation 3.42 demonstrates one possible comparator function for the Artificial Ant problem.
cmp_{f,ant}(x₁, x₂) =
  −1 if (f₁(x₁) > 0 ∧ f₁(x₂) = 0) ∨ (f₂(x₁) > 0 ∧ f₂(x₂) = 0) ∨ (f₃(x₁) > 0 ∧ f₃(x₂) = 0)
   1 if (f₁(x₂) > 0 ∧ f₁(x₁) = 0) ∨ (f₂(x₂) > 0 ∧ f₂(x₁) = 0) ∨ (f₃(x₂) > 0 ∧ f₃(x₁) = 0)
   cmp_{f,⊣}(x₁, x₂) otherwise
(3.42)
Later in this book, we will discuss some of the most popular optimization strategies. Al
though they are usually implemented based on Pareto optimization, we will always introduce
them using prevalence.
Tasks T3
13. In Example E1.2 on page 26, we stated the layout of circuits as a combinatorial opti
mization problem. Name at least three criteria which could be subject to optimization
or constraints in this scenario.
[6 points]
14. The job shop scheduling problem was outlined in Example E1.3 on page 26. The most basic goal when solving job shop problems is to meet all production deadlines. Would you formulate this goal as a constraint or rather as an objective function? Why?
[4 points]
15. Let us consider the university life of a student as an optimization problem with the goal of obtaining good grades in all the subjects she attends. How could we formulate
(a) a single objective function for that purpose, or
(b) a set of objectives which can be optimized with, for example, the Pareto approach?
Additionally, phrase at least two constraints in daily life which, for example, prevent the student from learning 24 hours per day.
[10 points]
16. Consider a bi-objective optimization problem with X = R as problem space and the two objective functions f₁(x) = e^{x−3} and f₂(x) = |x − 100|, both subject to minimization. What is your opinion about the viability of solving this problem with the weighted sum approach with both weights set to 1?
[5 points]
17. If we perform Pareto optimization of n = |f| objective functions f_i : i ∈ 1..n where each of the objectives can take on m different values, there can be at most m^{n−1} non-dominated candidate solutions at a time. Sketch configurations of m^{n−1} non-dominated candidate solutions in diagrams similar to Figure 28.4 on page 276 for n ∈ {1, 2} and m ∈ {1, 2, 3, 4}. Outline why the sentence also holds for n > 2.
[10 points]

18. In your opinion, is it good if most of the candidate solutions under investigation by a Pareto-based optimization algorithm are non-dominated? For answering this question, consider an optimization process that decides which individual should be used for further investigation solely on the basis of its objective values and, hence, the Pareto dominance relation. Consider the extreme case that all points known to such an algorithm are non-dominated to answer the question.
Based on your answer, make a statement about the expected impact of the fact discussed in Task 17 on the quality of solutions that a Pareto-based optimizer can deliver for rising dimensionality of the objective space, i. e., for increasing values of n = |f|.
[10 points]
19. Use Equation 3.1 to Equation 3.5 from page 52 to determine all extrema (optima) for
(a) f(x) = 3x⁵ + 2x² − x,
(b) g(x) = 2x² + x, and
(c) h(x) = sin x.
[15 points]
Chapter 4
Search Space and Operators: How can we ﬁnd it?
4.1 The Search Space
In Section 2.2, we gave various examples for optimization problems, ranging from simple real-valued problems (Example E2.1) over antenna design (Example E2.3) to tasks in aircraft development (Example E2.5). Obviously, the problem spaces X of these problems are quite different. A solution of the stone's throw task, for instance, was a real number of meters per second. An antenna can be described by its blueprint, which is structured very differently from the blueprints of aircraft wings or rotors.

Optimization algorithms usually start out with a set of either predefined or automatically generated elements from X which are far from being optimal (otherwise, optimization would not make much sense). From these initial solutions, new elements from X are explored in the hope of improving the solution quality. In order to find these new candidate solutions, search operators are used which modify or combine the elements already known.

Since the problem spaces of the three aforementioned examples are different, this would mean that different sets of search operators are required, too. It would, however, be cumbersome to develop search operations time and again for each new problem space we encounter. Such an approach would not only be error-prone, it would also make it very hard to formulate general laws and to consolidate findings.

Instead, we often reuse well-known search spaces for many different problems. After the problem space is defined, we will always look for ways to map it to such a search space for which search operators are already available. This is not always possible, but in most of the problems we will encounter, it will save a lot of effort.
Example E4.1 (Search Spaces for Example E2.1, E2.3, and E2.5).
The stone's throw as well as many antenna and wing blueprints, for instance, can all be represented as vectors of real numbers. For the stone's throw, these vectors would have length one (Fig. 4.1.a). An antenna consisting of n segments could be encoded as a vector of length 3n, where the element at index i specifies a length and i + 1 and i + 2 are angles denoting into which direction the segment is bent, as sketched in Fig. 4.1.b. The aircraft wing could be represented as a vector of length 3n, too, where the element at index i is the segment width and i + 1 and i + 2 denote its disposition and height (Fig. 4.1.c).
Fig. 4.1.a: Stone's throw.

Fig. 4.1.b: Antenna design.

Fig. 4.1.c: Wing design.

Figure 4.1: Examples for real-coding candidate solutions.
Definition D4.1 (Search Space). The search space G (genome) of an optimization problem is the set of all elements g which can be processed by the search operations.

In dependence on Genetic Algorithms, we often refer to the search space synonymously as genome¹. Genetic Algorithms (see Chapter 29) are optimization algorithms inspired by biological evolution. The term genome was coined by the German biologist Winkler [2958] as a portmanteau of the words gene and chromosome [1701].² In biology, the genome is the whole hereditary information of organisms and the genotype represents the hereditary information of an individual. Since in the context of optimization we refer to the search space as genome, we call its elements genotypes.

Definition D4.2 (Genotype). The elements g ∈ G of the search space G of a given optimization problem are called the genotypes.
As said, a biological genotype encodes the complete hereditary information of an individual. Likewise, a genotype in optimization encodes a complete candidate solution. If G ≠ X, we distinguish genotypes g ∈ G and phenotypes x ∈ X. In Example E4.1, the genomes all are vectors of real numbers but the problem spaces are either a velocity, an antenna design, or a blueprint of a wing.
The elements of the search space rarely are unstructured aggregations. Instead, they often consist of distinguishable parts, hierarchical units, or well-typed data structures. The same goes for the chromosomes in biology. They consist of genes³, segments of nucleic acid, which contain the information necessary to produce RNA strings in a controlled manner and encode phenotypical or metabolical properties. A fish, for instance, may have a gene for the color of its scales. This gene could have two possible "values" called alleles⁴, determining whether the scales will be brown or gray. The Genetic Algorithm community has adopted this notation long ago and we use it for specifying the detailed structure of the genotypes.

Definition D4.3 (Gene). The distinguishable units of information in genotypes g ∈ G which encode properties of the candidate solutions x ∈ X are called genes.
¹ http://en.wikipedia.org/wiki/Genome [accessed 2007-07-15]
² The words gene, genotype, and phenotype have, in turn, been introduced [2957] by the Danish biologist Johannsen [1448].
³ http://en.wikipedia.org/wiki/Gene [accessed 2007-07-03]
⁴ http://en.wikipedia.org/wiki/Allele [accessed 2007-07-03]
Definition D4.4 (Allele). An allele is a value of a specific gene.

Definition D4.5 (Locus). The locus⁵ is the position where a specific gene can be found in a genotype [634].
Example E4.2 (Small Binary Genome).
Genetic Algorithms are a widely-researched optimization approach which, in its original form, utilizes bit strings as genomes. Assume that we wished to apply such a plain GA to the minimization of a function f : (0..3 × 0..3) → R such as the trivial f(x = (x[0], x[1])) = x[0] − x[1] : x[0] ∈ X₀, x[1] ∈ X₁.
Figure 4.2: The relation of genome, genes, and the problem space.
Figure 4.2 illustrates how this can be achieved by using two bits for each of x[0] and x[1]. The first gene resides at locus 0 and encodes x[0] whereas the two bits at locus 1 stand for x[1]. Four bits suffice for encoding the two natural numbers in X₀ = X₁ = 0..3 and thus, for the whole candidate solution (phenotype) x. With this trivial encoding, the Genetic Algorithm can be applied to the problem and will quickly find the solution x = (0, 3).
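The genotype-phenotype mapping of this example can be sketched in a few lines of Java. The representation below (booleans as bits, most significant bit first) is an assumption for illustration only:

```java
/** Sketch of the genotype-phenotype mapping from Example E4.2:
 *  a 4-bit genotype encodes x[0] (bits 0-1) and x[1] (bits 2-3). */
public class BinaryGpm {
    /** Decodes two bits (most significant first) into a number in 0..3. */
    static int decode(boolean hi, boolean lo) {
        return (hi ? 2 : 0) + (lo ? 1 : 0);
    }

    /** gpm: genotype g (4 bits) -> phenotype x = (x[0], x[1]). */
    static int[] gpm(boolean[] g) {
        return new int[] { decode(g[0], g[1]), decode(g[2], g[3]) };
    }

    public static void main(String[] args) {
        // f(x) = x[0] - x[1] is minimal for x = (0, 3), genotype 0011.
        int[] x = gpm(new boolean[] {false, false, true, true});
        System.out.println(x[0] + " " + x[1]); // 0 3
    }
}
```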
4.2 The Search Operations
Definition D4.6 (Search Operation). A search operation "searchOp" takes a fixed number n ∈ N₀ of elements g₁..g_n from the search space G as parameters and returns another element from the search space.

searchOp : Gⁿ → G (4.1)
Search operations are used by an optimization algorithm in order to explore the search space. n is an operator-specific value, its arity⁶. An operation with n = 0, i. e., a nullary operator, receives no parameters and may, for instance, create randomly configured genotypes (see, for instance, Listing 55.2 on page 708). An operator with n = 1 could randomly modify its parameter – the mutation operator in Evolutionary Algorithms is an example of this behavior.

⁵ http://en.wikipedia.org/wiki/Locus_%28genetics%29 [accessed 2007-07-03]
⁶ http://en.wikipedia.org/wiki/Arity [accessed 2008-02-15]
The key idea of all search operations which take at least one parameter (n > 0) (see Listing 55.3) is that

1. it is possible to take a genotype g of a candidate solution x = gpm(g) with good objective values f(x),

2. to modify this genotype g to get a new one g′ = searchOp(g, . . .) which

3. can be translated to a phenotype x′ = gpm(g′) with similar or maybe even better objective values f(x′).

This assumption is one of the most basic concepts of iterative optimization algorithms. It is known as causality or locality and discussed in depth in Section 14.2 on page 161 and Section 24.9 on page 218. If it does not hold, then applying a small change to a genotype leads to large changes in its utility. Instead of modifying a genotype, we then could as well create a new, random one directly.
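A typical unary search operation can be sketched as follows. The Java code below is our own illustration of a single-bit-flip mutation operator on bit-string genotypes; it is not one of the book's listings:

```java
import java.util.Random;

/** Sketch of a unary search operation (n = 1): single-bit-flip mutation
 *  on a bit-string search space G = {0,1}^k. */
public class BitFlipMutation {
    static final Random RANDOM = new Random();

    /** searchOp : G -> G, flips exactly one randomly chosen bit. */
    static boolean[] mutate(boolean[] g) {
        boolean[] child = g.clone();          // never modify the parent
        int locus = RANDOM.nextInt(g.length); // randomly chosen locus
        child[locus] = !child[locus];
        return child;
    }

    public static void main(String[] args) {
        boolean[] parent = new boolean[] {false, false, false, false};
        boolean[] child = mutate(parent);
        int flipped = 0;
        for (int i = 0; i < parent.length; i++)
            if (parent[i] != child[i]) flipped++;
        System.out.println(flipped); // always exactly 1
    }
}
```

Because the operator copies its argument before flipping a bit, repeated applications explore neighbors of a genotype without destroying it, which matches the causality assumption above: the offspring differs from its parent in exactly one locus.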
Many operations which have n > 1 try to combine the genotypes they receive as input. This is done in the hope that positive characteristics of them can be joined into an even better offspring genotype. An example of such operations is the crossover operator in Genetic Algorithms. Differential Evolution utilizes an operator with n = 3, i. e., one which takes three arguments. Here, the idea often is that good characteristics can step by step be accumulated and united by an optimization process. This Building Block Hypothesis is discussed in detail in Section 29.5.5 on page 346, where also criticism of this point of view is provided.
Search operations often are randomized, i. e., they are not functions in the mathematical sense. Instead, the return values of two successive invocations of one search operation with exactly the same parameters may differ. Since optimization processes quite often use more than a single search operation at a time, we define the set Op.

Definition D4.7 (Search Operation Set). Op is the set of search operations searchOp ∈ Op which are available to an optimization process. Op(g₁, g₂, .., g_n) denotes the application of any n-ary search operation in Op to the genotypes g₁, g₂, .., g_n ∈ G.
∀g₁, g₂, .., g_n ∈ G ⇒ Op(g₁, g₂, .., g_n) ∈ G if ∃ n-ary operation in Op
                        Op(g₁, g₂, .., g_n) = ∅ otherwise
(4.2)
It should be noted that, if the n-ary search operation from Op which was applied is a randomized operator, obviously the behavior of Op(. . .) is randomized as well. Furthermore, if more than one n-ary operation is contained in Op, the optimization algorithm using Op usually performs some form of decision on which operation to apply. This decision may either be random or follow a specific schedule, and the decision making process may even change during the course of the optimization process. Writing Op(g) in order to denote the application of a unary search operation from Op to the genotype g ∈ G is thus a simplified expression which encompasses both possible randomness in the search operations and in the decision process on which operation to apply.
With Op^k(g₁, g₂, . . .) we denote k successive applications of (possibly different) search operators (with respect to their arities). The k-th application receives all possible results of the (k−1)-th application of the search operators as parameters, i. e., is defined recursively as

Op^k(g₁, g₂, . . .) = ⋃_{G ∈ Π(P(Op^{k−1}(g₁, g₂, . . .)))} Op(G) (4.3)

where Op⁰(g₁, g₂, . . .) = {g₁, g₂, . . .}, P(G) is the power set of a set G, and Π(A) is the set of all possible permutations of the elements of A.
If the parameter g is left out, i. e., just Op^k is written, this chain has to start with a search operation of zero arity. In the style of Badea and Stanciu [177] and Skubch [2516, 2517], we now can define:
Definition D4.8 (Completeness). A set Op of search operations "searchOp" is complete if and only if every point g₁ in the search space G can be reached from every other point g₂ ∈ G by applying only operations searchOp ∈ Op.

∀g₁, g₂ ∈ G ⇒ ∃k ∈ N₁ : g₁ ∈ Op^k(g₂) (4.4)
Definition D4.9 (Weak Completeness). A set Op of search operations "searchOp" is weakly complete if every point g in the search space G can be reached by applying only operations searchOp ∈ Op.

∀g ∈ G ⇒ ∃k ∈ N₁ : g ∈ Op^k (4.5)
A weakly complete set of search operations usually contains at least one operation without parameters, i. e., of zero arity. If at least one such operation exists and this operation can potentially return all possible genotypes, then the set of search operations is already weakly complete. However, nullary operations are often only used at the beginning of the search process (see, for instance, the Hill Climbing Algorithm 26.1 on page 230). If Op achieved its weak completeness only because of a nullary search operation, the initialization of the algorithm would determine which solutions can be found and which cannot. If the set of search operations is strongly complete (i. e., complete in the sense of Definition D4.8), the search process, in theory, can find any solution regardless of the initialization.
From the opposite point of view, this means that if the set of search operations is not complete, there are elements in the search space which cannot be reached from a random starting point. If it is not even weakly complete, parts of the search space can never be discovered. Then, the optimization process is probably not able to explore the problem space adequately and may not find optimal or satisfyingly good solutions.
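For finite search spaces, completeness can be checked exhaustively. A small sketch (illustrative, not from the book): a breadth-first search shows that the single-bit-flip operator is complete on G = {0, 1}^3, since every genotype is reachable from every other one:

```python
from itertools import product

def flip_neighbors(g):
    """All genotypes reachable by one single-bit flip (a unary operation)."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]

def reachable(start):
    """Breadth-first search over repeated applications of the operator."""
    seen, frontier = {start}, [start]
    while frontier:
        frontier = [n for g in frontier for n in flip_neighbors(g) if n not in seen]
        seen.update(frontier)
    return seen

G = set(product((0, 1), repeat=3))
# Complete in the sense of Definition D4.8: every g1 reachable from every g2.
print(all(reachable(g2) == G for g2 in G))  # True
```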
Definition D4.10 (Adjacency (Search Space)). A point g2 is ε-adjacent to a point g1 in the search space G if it can be reached by applying a single search operation "searchOp" to g1 with a probability of more than ε ∈ [0, 1). Notice that the adjacency relation is not necessarily symmetric.

    adjacentε(g2, g1) ⇔ P(Op(g1) = g2) > ε   (4.6)
The probability parameter ε is defined here in order to allow us to differentiate between probable and improbable transitions in the search space. Normally, we will use the adjacency relation with ε = 0 and abbreviate the notation as adjacent(g2, g1) ≡ adjacent0(g2, g1).
In the case of Genetic Algorithms with, for instance, single-bit flip mutation operators (see Section 29.3.2.1 on page 330), adjacent(g2, g1) would be true for all bit strings g2 which have a Hamming distance of no more than one from g1, i. e., which differ by at most one bit from g1. However, if we assume a real-valued search space G = R and a mutation operator which adds a normally distributed random value to a genotype, then adjacent(g2, g1) would always be true, regardless of their distance. The adjacency relation with ε = 0 thus has no meaning in this context. If we set ε to a very small number such as 10^−4, we can get a meaningful impression of how likely it is to discover a point g2 when starting from g1. Furthermore, this definition allows us to reason about limits such as lim_{ε→0} adjacentε(g2, g1).
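To make the ε-threshold concrete, the following sketch (illustrative values, not from the book) estimates the probability of a Gaussian mutation on G = R landing within a small interval around a target point; under ε = 10^−4, nearby points are ε-adjacent while distant points are not:

```python
import random

def p_reach(g1, g2, delta=0.05, samples=200_000, seed=7):
    """Monte-Carlo estimate of P(|Op(g1) - g2| < delta) for the mutation
    Op(g) = g + N(0, 1).  With a continuous operator every point is reachable,
    so only an epsilon-threshold separates probable from improbable moves."""
    rng = random.Random(seed)
    hits = sum(abs(g1 + rng.gauss(0.0, 1.0) - g2) < delta for _ in range(samples))
    return hits / samples

eps = 1e-4
near = p_reach(0.0, 0.5)   # roughly 0.035: well above eps, hence eps-adjacent
far = p_reach(0.0, 6.0)    # essentially zero: not eps-adjacent
print(near > eps, far > eps)  # True False
```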
4.3 The Connection between Search and Problem Space
If the search space differs from the problem space, an additional translation between them is required. In Example E4.2, we would need to transform the binary strings processed by the Genetic Algorithm to tuples of natural numbers which can be processed by the objective functions.
Definition D4.11 (Genotype-Phenotype Mapping). The genotype-phenotype mapping (GPM, or ontogenic mapping [2134]) gpm : G → X is a left-total^7 binary relation which maps the elements of the search space G to elements in the problem space X. In Section 28.2, you can find a thorough discussion of different types of genotype-phenotype mappings used in Evolutionary Algorithms.

    ∀g ∈ G ∃x ∈ X : gpm(g) = x   (4.7)
In Listing 55.7 on page 711, you can find a Java interface which resembles Equation 4.7. The only hard criterion we impose on genotype-phenotype mappings in this book is left-totality, i. e., that they map each element of the search space to at least one candidate solution. The GPM may be a functional relation if it is deterministic. Nevertheless, it is possible to create mappings which involve random numbers and hence, cannot be considered to be functions in the mathematical sense of Section 51.10 on page 646. In this case, we would rewrite Equation 4.7 as Equation 4.8:

    ∀g ∈ G ∃x ∈ X : P(gpm(g) = x) > 0   (4.8)
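A minimal sketch of such a deterministic, left-total genotype-phenotype mapping (the decoding hinted at in Example E4.2; the 5-bits-per-number split is an assumption for illustration): each genotype, a bit string, is decoded into a tuple of natural numbers:

```python
def gpm(g, bits_per_value=5):
    """Left-total GPM: decode a bit string g in G = {0,1}^(k*5)
    into a tuple of k natural numbers (the phenotype)."""
    assert len(g) % bits_per_value == 0
    chunks = [g[i:i + bits_per_value] for i in range(0, len(g), bits_per_value)]
    return tuple(int("".join(map(str, c)), 2) for c in chunks)

g = (0, 0, 1, 0, 1,   1, 0, 0, 0, 0)  # genotype in {0,1}^10
print(gpm(g))  # (5, 16)
```

Since the decoding is deterministic and fixed-width, this mapping is a function, left-total by construction, and injective, i. e., free from redundancy.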
Genotype-phenotype mappings should further be surjective [2247], i. e., relate at least one genotype to each element of the problem space. Otherwise, some candidate solutions can never be found and evaluated by the optimization algorithm even if the search operations are complete. Then, there is no guarantee whether the solution of a given problem can be discovered or not. If a genotype-phenotype mapping is injective, which means that it assigns distinct phenotypes to distinct elements of the search space, we say that it is free from redundancy. There are different forms of redundancy; some are considered to be harmful for the optimization process, others have positive influence^8. Most often, GPMs are not bijective (since they are neither necessarily injective nor surjective). Nevertheless, if a genotype-phenotype mapping is bijective, we can construct an inverse mapping gpm^−1 : X → G.

    gpm^−1(x) = g ⇔ gpm(g) = x ∀x ∈ X, g ∈ G   (4.9)
Based on the genotype-phenotype mapping and the definition of adjacency in the search space given earlier in this section, we can also define a neighboring relation for the problem space, which, of course, is also not necessarily symmetric.
Definition D4.12 (Neighboring (Problem Space)). Under the condition that the candidate solution x1 ∈ X was obtained from the element g1 ∈ G by applying the genotype-phenotype mapping, i. e., x1 = gpm(g1), a point x2 is ε-neighboring to x1 if it can be reached by applying a single search operation and a subsequent genotype-phenotype mapping to g1 with at least a probability of ε ∈ [0, 1). Equation 4.10 provides a general definition of the neighboring relation. If the genotype-phenotype mapping is deterministic (which usually is the case), this definition boils down to Equation 4.11.
    neighboringε(x2, x1) ⇔ [ Σ_{∀g2 : adjacent0(g2, g1)} P(Op(g1) = g2) · P(x2 = gpm(g2) | Op(g1) = g2) > ε ]_{x1 = gpm(g1)}   (4.10)
    neighboringε(x2, x1) ⇔ [ Σ_{∀g2 : adjacent0(g2, g1) ∧ x2 = gpm(g2)} P(Op(g1) = g2) > ε ]_{x1 = gpm(g1)}   (4.11)
^7 See Equation 51.29 on page 644 to Point 5 for an outline of the properties of binary relations.
^8 See Section 16.5 on page 174 for more information.
The condition x1 = gpm(g1) is relevant if the genotype-phenotype mapping "gpm" is not injective (see Equation 51.31 on page 644), i. e., if more than one genotype can map to the same phenotype. Then, the set of ε-neighbors of a candidate solution x1 depends on how it was discovered, i. e., on the genotype g1 it was obtained from, as can be seen in Example E4.3. Example E5.1 on page 96 shows how the search operations and, thus, the definition of neighborhood, influence the performance of an optimization algorithm.
Example E4.3 (Neighborhoods under Non-Injective GPMs).

[Figure 4.3: Neighborhoods under non-injective genotype-phenotype mappings. The sketch shows the problem space X = {0, 1, 2}, the search space G = B^3 = {0, 1}^3, the mapping gpm(g) = (g1 + g2) * g3, and the two genotypes gA = 010 and gB = 110.]

As an example for neighborhoods under non-injective genotype-phenotype mappings, let
us imagine that we have a very simple optimization problem where the problem space X only consists of the three numbers 0, 1, and 2, as sketched in Figure 4.3. Imagine further that we use the bit strings of length three as search space, i. e., G = B^3 = {0, 1}^3. As genotype-phenotype mapping gpm : G → X, we apply the function x = gpm(g) = (g1 + g2) * g3. The two genotypes gA = 010 and gB = 110 both map to 0 since (0 + 1) * 0 = (1 + 1) * 0 = 0.
If the only search operation available is a single-bit flip mutation, then all genotypes which have a Hamming distance of 1 are adjacent in the search space. This means that gA = 010 is adjacent to 110, 000, and 011 while gB = 110 is adjacent to 010, 100, and 111.
The candidate solution x = 0 is neighboring to 0 and 1 if it was obtained by applying the genotype-phenotype mapping to gA = 010: with one search step, gpm(110) = gpm(000) = 0 and gpm(011) = 1 can be reached. However, if x = 0 was the result of applying the genotype-phenotype mapping to gB = 110, its neighborhood is {0, 2}, since gpm(010) = gpm(100) = 0 and gpm(111) = 2.
Hence, the neighborhood relation under non-injective genotype-phenotype mappings depends on the genotypes from which the phenotypes under investigation originated. For injective genotype-phenotype mappings, this is not the case.
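The computations of Example E4.3 can be replayed in a few lines (a sketch in Python; the book's own listings are in Java):

```python
# gpm(g) = (g1 + g2) * g3, the non-injective mapping from Example E4.3
def gpm(g):
    return (g[0] + g[1]) * g[2]

def flip_neighbors(g):
    """All genotypes adjacent under single-bit flip mutation."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(3)]

def neighborhood(g1):
    """Phenotypes 0-neighboring to gpm(g1), given that it originated from g1."""
    return {gpm(g2) for g2 in flip_neighbors(g1)}

gA, gB = (0, 1, 0), (1, 1, 0)
print(gpm(gA), gpm(gB))   # 0 0 : both genotypes map to the same phenotype
print(neighborhood(gA))   # {0, 1}
print(neighborhood(gB))   # {0, 2}
```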
If the search space G and the problem space X are the same (G = X) and the genotype-phenotype mapping is the identity mapping, it follows that g2 = x2 and the sum in Equation 4.11 breaks down to P(Op(g1) = g2). Then the neighboring relation (defined on the problem space) equals the adjacency relation (defined on the search space), as can be seen when taking a look at Equation 4.6. Since the identity mapping is injective, the condition x1 = gpm(g1) is always true, too.

    (G = X) ∧ (gpm(g) = g ∀g ∈ G) ⇔ neighboring ≡ adjacent   (4.12)
In the following, we will use neighboringε to denote an ε-neighborhood relation, implicitly assuming the dependencies on the search space or that the GPM is injective. Furthermore, with neighboring(x1, x2), we denote that x1 and x2 are ε = 0-neighbors.
4.4 Local Optima
We now have the means to define the term local optimum more precisely. From Example E4.4, it becomes clear that such a definition is not as trivial as the definitions of global optima given in Chapter 3.
Example E4.4 (Types of Local Optima).

[Figure 4.4: Minima of a single objective function, marking the global optimum x⋆, a locally optimal plateau X⋆l, and a region XN.]

If we assume minimization, the global optimum x⋆ of the given objective function can easily be found. Obviously, every global optimum is also a local optimum. In the case of single-objective minimization, it is also a local minimum and a global minimum.
In order to define what a local minimum is, as a first step, we could say: "A (local) minimum x̌l ∈ X of one (objective) function f : X → R is an input element with f(x̌l) < f(x) for all x neighboring to x̌l." However, such a definition would disregard locally optimal plateaus, such as the set X⋆l of candidate solutions sketched in Figure 4.4.
If we simply exchange the "<" in f(x̌l) < f(x) with a "≤" and obtain f(x̌l) ≤ f(x), however, we would classify the elements in XN as local optima as well. This would contradict the intuition, since they are clearly worse than the elements on the right side of XN. Hence, a more fine-grained definition is necessary.
Similar to Handl et al. [1178], let us start with creating a definition of areas in the search space based on the specification of ε-neighboring (Definition D4.12).

Definition D4.13 (Connection Set). A set X ⊆ X is ε-connected if all of its elements can be reached from each other with consecutive search steps, each of which having at least probability ε ∈ [0, 1), i. e., if Equation 4.13 holds.

    x1 ≠ x2 ⇒ neighboringε(x1, x2) ∀x1, x2 ∈ X   (4.13)
4.4.1 Local Optima of Single-Objective Problems

Definition D4.14 (ε-Local Minimal Set). An ε-local minimal set X̌l ⊆ X under objective function f and problem space X is a connected set for which Equation 4.14 holds.

    neighboringε(x, x̌l) ⇒ [(x ∈ X̌l) ∧ (f(x) = f(x̌l))] ∨ [(x ∉ X̌l) ∧ (f(x̌l) < f(x))] ∀x̌l ∈ X̌l, x ∈ X   (4.14)

An ε-local minimum x̌l is an element of an ε-local minimal set. We will use the terms local minimum set and local minimum to denote scenarios where ε = 0.
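To make Definition D4.14 concrete, here is a small sketch (illustrative, one-dimensional, ε = 0) that finds local minimal sets on a discrete landscape where the neighbors of index i are i−1 and i+1; note that a plateau bordering a better point (like XN in Figure 4.4) is correctly rejected:

```python
def local_minimal_sets(f):
    """Find all 0-local minimal sets of a discrete 1-D landscape f.
    A plateau of equal values is a local minimal set iff every neighbor
    outside it has a strictly larger value (Equation 4.14 with epsilon = 0)."""
    sets, i, n = [], 0, len(f)
    while i < n:
        j = i
        while j + 1 < n and f[j + 1] == f[i]:    # extend the plateau
            j += 1
        outside = [f[k] for k in (i - 1, j + 1) if 0 <= k < n]
        if all(v > f[i] for v in outside):        # all outside neighbors worse
            sets.append(list(range(i, j + 1)))
        i = j + 1
    return sets

# strict minimum at index 1, plateau minimum {3, 4}, global minimum {7}
f = [5, 3, 4, 2, 2, 4, 3, 1, 6]
print(local_minimal_sets(f))  # [[1], [3, 4], [7]]
```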
Definition D4.15 (ε-Local Maximal Set). An ε-local maximal set X̂l ⊆ X under objective function f and problem space X is a connected set for which Equation 4.15 holds.

    neighboringε(x, x̂l) ⇒ [(x ∈ X̂l) ∧ (f(x) = f(x̂l))] ∨ [(x ∉ X̂l) ∧ (f(x̂l) > f(x))] ∀x̂l ∈ X̂l, x ∈ X   (4.15)

Again, an ε-local maximum x̂l is an element of an ε-local maximal set and we will use the terms local maximum set and local maximum to indicate scenarios where ε = 0.
Definition D4.16 (ε-Local Optimal Set (Single-Objective Optimization)). An ε-local optimal set X⋆l ⊆ X is either an ε-local minimum set or an ε-local maximum set, depending on the definition of the optimization problem.
Similar to the case of local maxima and minima, we will again use the terms ε-local optimum, local optimum set, and local optimum.
4.4.2 Local Optima of Multi-Objective Problems

As basis for the definition of local optima in multi-objective problems, we use the prevalence relations and the concepts provided by Handl et al. [1178]. In Section 3.5.2, we showed that these encompass the other multi-objective concepts as well. Therefore, in the following definition, "prevails" can as well be read as "dominates", which is probably the most common use.

Definition D4.17 (ε-Local Optimal Set (Multi-Objective Optimization)). An ε-local optimal set X⋆l ⊆ X under the prevalence operator "≺" and problem space X is a connected set for which Equation 4.16 holds.

    neighboringε(x, x⋆l) ⇒ [(x ∈ X⋆l) ∧ (x ≺ x⋆l)] ∨ [(x ∉ X⋆l) ∧ (x⋆l ≺ x)] ∀x⋆l ∈ X⋆l, x ∈ X   (4.16)

Again, we use the term ε-local optimum to denote candidate solutions which are part of ε-local optimal sets. If ε is omitted in the name, ε = 0 is assumed. See Example E13.4 on page 158 for an example of local optima in multi-objective problems.
4.5 Further Definitions

In order to ease the discussions of different Global Optimization algorithms, we furthermore define the data structure individual, which is the union of the genotype and phenotype of an individual.

Definition D4.18 (Individual). An individual p is a tuple p = (p.g, p.x) ∈ G × X of an element p.g in the search space G and the corresponding element p.x = gpm(p.g) in the problem space X.

Besides this basic individual structure, many practical realizations of optimization algorithms use extended variants of such a data structure to store additional information like the objective and fitness values. Then, we will consider individuals as tuples in G × X × W, where W is the space of the additional information – W = R^n × R, for instance. Such endogenous information [302] often represents local optimization strategy parameters. It then usually coexists with global (exogenous) configuration settings. In the algorithm definitions later in this book, we will often access the phenotypes p.x without explicitly referring to the genotype-phenotype mapping "gpm", since "gpm" is already implicitly included in the relation of p.x and p.g according to Definition D4.18. Again relating to the notation used in Genetic Algorithms, we will refer to lists of such individual records as populations.

Definition D4.19 (Population). A population pop is a list consisting of ps individuals which are jointly under investigation during an optimization process.

    pop ∈ (G × X)^ps ⇒ (∀p = (p.g, p.x) ∈ pop ⇒ p.x = gpm(p.g))   (4.17)
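A minimal sketch of these two data structures (illustrative Python; the book's reference implementations are the Java interfaces of Chapter 55), with a made-up bit-string-to-integer GPM:

```python
import random
from dataclasses import dataclass

# Definition D4.18: an individual couples a genotype with the phenotype
# obtained through the genotype-phenotype mapping.
@dataclass
class Individual:
    g: tuple   # genotype, element of G = {0,1}^n
    x: int     # phenotype, element of X (here: a decoded natural number)

def gpm(g):
    """Illustrative GPM: decode a bit string to a natural number."""
    return sum(bit << i for i, bit in enumerate(reversed(g)))

def create_individual(n, rng):
    g = tuple(rng.randint(0, 1) for _ in range(n))
    return Individual(g, gpm(g))   # p.x = gpm(p.g) holds by construction

rng = random.Random(42)
pop = [create_individual(4, rng) for _ in range(3)]   # a population with ps = 3
assert all(p.x == gpm(p.g) for p in pop)              # the invariant of Eq. 4.17
```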
Tasks T4

20. Assume that we are car salesmen and search for an optimal configuration for a car. We have a problem space X = X1 × X2 × X3 × X4 where
• X1 = {with climate control, without climate control},
• X2 = {with sun roof, without sun roof},
• X3 = {5 gears, 6 gears, automatic 5 gears, automatic 6 gears}, and
• X4 = {red, blue, green, silver, white, black, yellow}.
Define a proper search space for this problem along with a unary search operation and a genotype-phenotype mapping.
[10 points]
21. Define a nullary search operation which creates new random points in the n-dimensional search space G = {0, 1, 2}^n. A nullary search operation has no parameters (see Section 4.2). {0, 1, 2}^n denotes a search space which consists of strings of length n where each element of the string is from the set {0, 1, 2}. An example of such a string for n = 5 is (1, 2, 1, 1, 0). A nullary search operation here would thus just create a new, random string of this type.
[5 points]
22. Define a unary search operation which randomly modifies a point in the n-dimensional search space G = {0, 1, 2}^n. See Task 21 for a description of the search space. A unary search operation for G = {0, 1, 2}^n has one parameter: a string of length n where each element is from the set {0, 1, 2}. A unary search operation takes such a string g ∈ G and modifies it. It returns a new string g′ ∈ G which is slightly different from g. This modification has to be done in a random way, for example by adding to, subtracting from, multiplying, or permutating g randomly.
[5 points]
23. Define a possible unary search operation which modifies a point g ∈ R^2 and which could be used for minimizing a black box objective function f : R^2 → R.
[10 points]
24. When trying to find the minimum of a black box objective function f : R^3 → R, we could use the same search space and problem space G = X = R^3. Instead, we could use G = R^2 and apply a genotype-phenotype mapping gpm : R^2 → R^3. For example, we could set gpm(g = (g[0], g[1])^T) = (g[0], g[1], g[0] + g[1])^T. By this approach, our optimization algorithm would only face a two-dimensional problem instead of a three-dimensional one. What do you think about this idea? Is it good?
[5 points]
25. Discuss some basic assumptions usually underlying the design of unary and binary
search operations.
[5 points]
26. Deﬁne a search space and search operations for the Traveling Salesman Problem as
discussed in Example E2.2 on page 45.
[10 points]
27. We can define a search space G = (1..n)^n for a Traveling Salesman Problem with n cities. This search space would clearly be numerical since G ⊆ N1^n ⊆ R^n. Discuss whether it makes sense to use a pure numerical optimization algorithm (with operations similar to the one you defined in Task 23, for example) to find solutions to the Traveling Salesman Problem in this search space.
[5 points]
28. Assume that you want to solve an optimization problem defined over an n-dimensional vector of natural numbers where each element of the vector stems from the natural interval^9 a..b. In other words, the problem space X is a subset of the natural numbers to the power of n (i. e., X ⊆ N0^n); more specifically: X = (a..b)^n. You have an optimization algorithm available which is suitable for continuous optimization problems defined over m-dimensional real vectors where each vector element stems from the real interval^10 [c, d], i. e., which deals with search spaces G ⊆ R^m; more specifically G = [c, d]^m. m, c, and d can be chosen freely. n, a, and b are fixed constants that are specific to the optimization problem you have.
How can you solve your optimization problem with the available algorithm? Provide an implementation of the necessary module in a programming language of your choice. The implementation should obey the interface definitions in Chapter 55. If you choose a programming language different from Java, you should give the methods names similar to those used in that section and write comments relating your code to the interfaces given there.
[10 points]

^9 The natural interval a..b contains only the natural numbers from a to b. For example, 1..5 = {1, 2, 3, 4, 5} and π ∉ 1..4.
^10 The real interval [c, d] contains all the real numbers between c and d. For example, π ∈ [1, 4] and c..d ⊆ [c, d].
Chapter 5
Fitness and Problem Landscape: How does the
Optimizer see it?
5.1 Fitness Landscapes
A very powerful metaphor in Global Optimization is the fitness landscape^1. Like many other abstractions in optimization, fitness landscapes have been developed and extensively researched by evolutionary biologists [701, 1030, 1500, 2985]. Basically, they are visualizations of the relationship between the genotypes or phenotypes in a given population and their corresponding reproduction probability. The idea of such visualizations goes back to Wright [2985], who used level contour diagrams in order to outline the effects of selection, mutation, and crossover on the capabilities of populations to escape locally optimal configurations. Similar abstractions arise in many other areas [2598], such as the physics of disordered systems like spin glasses [311, 1879].
In Chapter 28, we will discuss Evolutionary Algorithms, which are optimization methods inspired by natural evolution. The Evolutionary Algorithm research community has widely adopted the fitness landscape as the relation between individuals and their objective values [865, 1912]. Langdon and Poli [1673]^2 explain that fitness landscapes can be imagined as a view on a countryside from far above. Here, the height of each point is analogous to its objective value. An optimizer can then be considered as a short-sighted hiker who tries to find the lowest valley or the highest hilltop. Starting from a random point on the map, she wants to reach this goal by walking the minimum distance.
Evolutionary Algorithms were initially developed as single-objective optimization methods. Then, the objective values were directly used as fitness, and the "reproduction probability", i. e., the chance of a candidate solution of being subject to further investigation, was proportional to them. In multi-objective optimization applications with more sophisticated fitness assignment and selection processes, this simple approach does not reflect the biological metaphor correctly anymore.
In the context of this book, we therefore deviate from this view. Since it would possibly be confusing for the reader if we used a different definition for fitness landscapes than the rest of the world, we introduce the new term problem landscape and keep using the term fitness landscape in the traditional manner. In Figure 11.1 on page 140, you can find some examples of fitness landscapes.
^1 http://en.wikipedia.org/wiki/Fitness_landscape [accessed 2007-07-03]
^2 This part of [1673] is also available online at http://www.cs.ucl.ac.uk/staff/W.Langdon/FOGP/intro_pic/landscape.html [accessed 2008-02-15].
5.2 Fitness as a Relative Measure of Utility

When performing a multi-objective optimization, i. e., n = |f| > 1, the utility of a candidate solution is characterized by a vector in R^n. In Section 3.3 on page 55, we have seen how such vectors can be compared directly in a consistent way. Such comparisons, however, only state whether one candidate solution is better than another one or not, but give no information on how much better or worse it is and how interesting it is for further investigation. Often, such a scalar, relative measure is needed.
In many optimization techniques, especially in Evolutionary Algorithms, the partial orders created by Pareto or prevalence comparators are therefore mapped to the positive real numbers R+. For each candidate solution, this single number represents its fitness as a solution for the given optimization problem. The process of computing such a fitness value often depends not solely on the absolute objective values of the candidate solutions but also on those of the other individuals known. It could, for instance, be the position of a candidate solution in the list of investigated elements sorted according to the Pareto relation. Hence, fitness values often only have a meaning inside the optimization process [949] and may change in each iteration of the optimizer, even if the objective values stay constant. This concept is similar to the heuristic functions in many deterministic search algorithms which approximate how many modifications to an element are likely necessary in order to reach a feasible solution.
Definition D5.1 (Fitness). The fitness^3 value v : G × X → R+ maps an individual p to a positive real value v(p) denoting its utility as a solution or its priority in the subsequent steps of the optimization process.
The fitness mapping v is usually built by incorporating the information of a set of individuals, i. e., a population pop. This allows the search process to compute the fitness as a relative utility rating. In multi-objective optimization, a fitness assignment procedure v = assignFitness(pop, cmpf) as defined in Definition D28.6 on page 274 is typically used to construct v based on the set pop of all individuals which are currently under investigation and a comparison criterion cmpf (see Definition D3.20 on page 77). This procedure is usually repeated in each iteration cycle t of the optimizer, and different fitness assignments v_{t=1}, v_{t=2}, . . . result.
We can detect a similarity between a fitness function v and a heuristic function φ as defined in Definition D1.14 on page 34 and the more precise Definition D46.5 on page 518. Both of them give indicators about the utility of p which, most often, are only valid and make sense within the optimization process. Both measures reflect some ranking criterion used to pick the point(s) in the search space which should be investigated further in the next iteration. The difference between fitness functions and heuristics is that heuristics usually are based on some insights about the concrete structure of the optimization problem whereas a fitness assignment process obtains its information primarily from the objective values of the individuals. Fitness functions are thus more abstract and general. Also, as mentioned before, the fitness can change during the optimization process and is often a relative measure, whereas heuristic values for a fixed point in the search space are usually constant and independent of the structure of the other candidate solutions under investigation.
Although it is a bit counterintuitive that a fitness function is built repeatedly and is not a constant mapping, this is indeed the case in most multi-objective optimization situations. In practical implementations, the individual records are often extended to also contain the fitness values. If we refer to v(p) multiple times in an algorithm for the same p and pop, this should be understood as access to such a cached value which is actually computed only once. This is indeed a good implementation decision, since some fitness assignment processes have a complexity of O((len(pop))^2) or higher, because they may involve comparing each individual with every other one in pop as well as computing density information or clustering.

^3 http://en.wikipedia.org/wiki/Fitness_(genetic_algorithm) [accessed 2008-08-10]
The term fitness has been borrowed from biology^4 [2136, 2542] by the Evolutionary Algorithms community. As already mentioned, when the first applications of Genetic Algorithms were developed, the focus was mainly on single-objective optimization. Back then, this single function was called the fitness function and thus, objective value ≡ fitness value. This point of view is obsolete in principle, yet you will find many contemporary publications that use this notion. This is partly due to the fact that in simple problems with only one objective function, the old approach of using the objective values directly as fitness, i. e., v(p) = f(p.x), can sometimes actually be applied. In multi-objective optimization processes, this is not possible, and fitness assignment processes like those which we are going to elaborate on in Section 28.3 on page 274 are used instead.
In the context of this book, fitness is subject to minimization, i. e., elements with smaller fitness are "better" than those with higher fitness. Although this definition differs from the biological perception of fitness, it complies with the idea that optimization algorithms are to find the minima of mathematical functions (if nothing else has been stated). It makes fitness and objective values compatible in that it allows us to consider the fitness of a multi-objective problem to be the objective value of a single-objective one.
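As an illustrative sketch of such a relative, population-dependent fitness (a simple dominance-count scheme, not one of the book's Section 28.3 procedures): each individual's fitness is the number of individuals in pop that Pareto-dominate it, so non-dominated individuals receive the best (smallest) fitness 0:

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): no worse in all objectives,
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def assign_fitness(objectives):
    """Fitness of each individual = number of individuals dominating it.
    Smaller fitness is better; it changes whenever the population changes."""
    return [sum(dominates(o2, o1) for o2 in objectives) for o1 in objectives]

# objective vectors of a population; (3, 3) is dominated only by (2, 2)
pop_objectives = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(assign_fitness(pop_objectives))  # [0, 0, 1, 0]
```

Note that the fitness of (3.0, 3.0) would drop to 0 if (2.0, 2.0) left the population, illustrating that these values only have a meaning relative to pop.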
5.3 Problem Landscapes
The ﬁtness steers the search into areas of the search space considered promising to contain
optima. The quality of an optimization process is largely deﬁned by how fast it ﬁnds these
points. Since most of the algorithms discussed in this book are randomized, meaning that
two consecutive runs of the same conﬁguration may lead to totally diﬀerent results, we can
deﬁne a quality measure based on probabilities.
Definition D5.2 (Problem Landscape). The problem landscape Φ : X × N1 → [0, 1] maps all the points x in a finite problem space X to the cumulative probability of reaching them until (inclusively) the τ-th evaluation of a candidate solution (for a certain configuration of an optimization algorithm).

    Φ(x, τ) = P(x has been visited until the τ-th evaluation) ∀x ∈ X, τ ∈ N1   (5.1)
This deﬁnition of problem landscape is very similar to the performance measure deﬁnition
used by Wolpert and Macready [2963, 2964] in their No Free Lunch Theorem (see Chapter 23
on page 209). In our understanding, problem landscapes are not only closer to the original
meaning of ﬁtness landscapes in biology, they also have another advantage. According to
this deﬁnition, all entities involved in an optimization process over a ﬁnite problem space
directly inﬂuence the problem landscape. The choice of the search operations in the search
space G, the way the initial elements are picked, the genotypephenotype mapping, the
objective functions, the ﬁtness assignment process, and the way individuals are selected for
further exploration all have impact on the probability to discover certain candidate solutions
– the problem landscape Φ.
Problem landscapes are essentially a form of cumulative distribution function (see Definition D53.18 on page 658). As such, they can be considered as the counterparts of the Markov chain models for the search state used in convergence proofs such as those in [2043], which resemble probability mass/density functions. On the basis of its cumulative character, we can furthermore make the following assumptions about Φ(x, τ):
    Φ(x, τ1) ≤ Φ(x, τ2) ∀τ1 < τ2 ∧ x ∈ X, τ1, τ2 ∈ N1   (5.2)
    0 ≤ Φ(x, τ) ≤ 1 ∀x ∈ X, τ ∈ N1   (5.3)
^4 http://en.wikipedia.org/wiki/Fitness_%28biology%29 [accessed 2008-02-22]
Example E5.1 (Influence of Search Operations on Φ).

[Figure 5.1: An example optimization problem, marking a local optimum x⋆l and the global optimum x⋆.]

Let us now give a simple example for problem landscapes and how they are influenced by the optimization algorithm applied to them. Figure 5.1 illustrates one objective function, defined over a finite subset X of the two-dimensional real plane, which we are going to optimize. In this example, we use the problem space X also as search space G, i. e., the
genotype-phenotype mapping is the identity mapping. For optimization, we will use a very simple Hill Climbing algorithm^5, which initially draws one candidate solution with uniform probability from X. In each iteration, it creates a new candidate solution from the current one by using a unary search operation. The old and the new candidate are compared, and the better one is kept. Hence, we do not need to differentiate between fitness and objective values. In the example, better means lower fitness.
In Figure 5.1, we can spot one local optimum x⋆l and one global optimum x⋆. Between them, there is a hill, an area of very bad fitness. The rest of the problem space exhibits a small gradient towards its center. The optimization algorithm will likely follow this gradient and sooner or later discover x⋆l or x⋆. The chances of discovering x⋆l first are higher, since it is closer to the center of X and thus has a bigger basin of attraction.
With this setting, we have recorded the traces of two experiments with 1.3 million runs of
the optimizer (8000 iterations each). From these records, we can approximate the problem
landscapes very well.
In the first experiment, depicted in Figure 5.2, the Hill Climbing optimizer used a search
operation searchOp1 : X → X which creates a new candidate solution according to a
(discretized) normal distribution around the old one (Equation 5.4). We divided X into a
regular lattice and, in the second experiment series, used an operator searchOp2 : X → X
which produces candidate solutions that are direct neighbors of the old ones in this lattice
(Equation 5.5). The problem landscape Φ produced by this operator is shown in Figure 5.3.
Both operators are complete, since each point in the search space can be reached from each
other point by applying them.

    searchOp1(x) ≡ (x1 + randomNorm(0, 1), x2 + randomNorm(0, 1))        (5.4)

    searchOp2(x) ≡ (x1 + randomUni[−1, 1), x2 + randomUni[−1, 1))        (5.5)
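As a rough sketch, the Hill Climber of this example with both operators might look as follows in Java. The fixed random seed, the clamping to the lattice, and the objective function f are illustrative assumptions; the book's actual two-optima landscape is not reproduced here.

```java
import java.util.Random;

/** A minimal sketch of the Hill Climber from Example E5.1 on a 36x36
 *  lattice. The objective function f below is a simple convex placeholder,
 *  not the book's two-optima landscape. */
public class HillClimberSketch {
  static final Random RAND = new Random(42); // fixed seed for reproducibility
  static final int SIZE = 36;                // the lattice is 36x36

  /** Placeholder objective (assumed): minimized at (5, 30). */
  static double f(int x1, int x2) {
    return (x1 - 5) * (x1 - 5) + (x2 - 30) * (x2 - 30);
  }

  /** searchOp1: a discretized normal distribution around the old point. */
  static int[] searchOp1(int[] x) {
    return clamp((int) Math.round(x[0] + RAND.nextGaussian()),
                 (int) Math.round(x[1] + RAND.nextGaussian()));
  }

  /** searchOp2: a uniform step to a direct lattice neighbor. */
  static int[] searchOp2(int[] x) {
    return clamp(x[0] + RAND.nextInt(3) - 1, x[1] + RAND.nextInt(3) - 1);
  }

  static int[] clamp(int x1, int x2) { // keep the point inside the finite X
    return new int[] { Math.max(0, Math.min(SIZE - 1, x1)),
                       Math.max(0, Math.min(SIZE - 1, x2)) };
  }

  /** The Hill Climber: keep the better of the old and the new candidate. */
  static int[] climb(int iterations, boolean useNormalOp) {
    int[] x = { RAND.nextInt(SIZE), RAND.nextInt(SIZE) }; // uniform init
    for (int t = 0; t < iterations; t++) {
      int[] xNew = useNormalOp ? searchOp1(x) : searchOp2(x);
      if (f(xNew[0], xNew[1]) < f(x[0], x[1])) // "better" = lower fitness
        x = xNew;
    }
    return x;
  }

  public static void main(String[] args) {
    int[] best = climb(8000, true);
    System.out.println("best: (" + best[0] + ", " + best[1] + ")");
  }
}
```

With the normal-distribution operator every lattice point remains reachable from every other one, which is why Φ(x⋆, τ) keeps growing in Figure 5.2; with the neighbor operator the climber cannot leave a basin once it has entered it.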
In both experiments, the probabilities of the elements in the problem space of being discovered
are very low, near zero, and rather uniformly distributed in the first few iterations.
To be precise, since our problem space is a 36 × 36 lattice, this probability is exactly 1/36²
in the first iteration. Starting with the tenth or so evaluation, small peaks begin to form
around the places where the optima are located. These peaks then begin to grow.
5 Hill Climbing algorithms are discussed thoroughly in Chapter 26.
5.3. PROBLEM LANDSCAPES
[Figure 5.2: The problem landscape of the example problem derived with searchOp1. Twelve panels plot Φ(x, τ) over X for τ = 1, 2, 5, 10, 50, 100, 500, 1000, 2000, 4000, 6000, and 8000.]
[Figure 5.3: The problem landscape of the example problem derived with searchOp2. Twelve panels plot Φ(x, τ) over X for τ = 1, 2, 5, 10, 50, 100, 500, 1000, 2000, 4000, 6000, and 8000.]
Since the local optimum x⋆l resides at the center of a large basin and the gradients point
straight in its direction, it has a higher probability of being found than the global optimum
x⋆. The difference between the two search operators tested becomes obvious starting with
approximately the 2000th iteration. In the process with the operator utilizing the normal
distribution, the Φ value of the global optimum rises farther and farther, finally surpassing
that of the local optimum. Even if the optimizer gets trapped in the local optimum, it will
still eventually discover the global optimum, and if we ran this experiment longer, the
corresponding probability would converge to 1. The reason for this is that with the normal
distribution, all points in the search space have a non-zero probability of being found from
all other points in the search space. In other words, all elements of the search space are
adjacent.
The operator based on the uniform distribution is only able to create candidate solutions
in the direct neighborhood of the known points. Hence, if the optimizer gets trapped in the
local optimum, it can never escape. If it arrives at the global optimum, it will never discover
the local one. In Fig. 5.3.l, we can see that Φ(x⋆l, 8000) ≈ 0.7 and Φ(x⋆, 8000) ≈ 0.3. In
other words, only one of the two points will be the result of the optimization process, whereas
the other optimum remains undiscovered.
From Example E5.1 we can draw four conclusions:
1. Optimization algorithms discover good elements with higher probability than elements
with bad characteristics, given the problem is deﬁned in a way which allows this to
happen.
2. The success of optimization depends very much on the way the search is conducted,
i. e., the deﬁnition of neighborhoods of the candidate solutions (see Deﬁnition D4.12
on page 86).
3. It also depends on the time (or the number of iterations) the optimizer is allowed to use.
4. Hill Climbing algorithms are not Global Optimization algorithms, since they have no
general means of preventing getting stuck at local optima.
Tasks T5
29. What are the similarities and differences between fitness values and traditional
heuristic values?
[10 points]
30. For many (single-objective) applications, a fitness assignment process which assigns
the rank of an individual p in a population pop according to the objective value f(p.x)
of its phenotype p.x as its ﬁtness v(p) is highly useful. Specify an algorithm for such
an assignment process which sorts the individuals in a list (population pop) according
to their objective value and assigns the index of an individual in this sorted list as its
ﬁtness.
Should the individuals be sorted in ascending or descending order if f is subject to
maximization and the ﬁtness v is subject to minimization?
[10 points]
Chapter 6
The Structure of Optimization: Putting it together.
After discussing the single entities involved in optimization, it is time to put them together
into the general structure common to all optimization processes, into an overall picture. This
structure consists of the previously introduced well-defined spaces and sets as well as the
mappings between them. One example of this structure of optimization processes is given in
Figure 6.1, using a Genetic Algorithm which encodes the coordinates of points in a plane
into bit strings as an illustration. The same structure was already used in Example E4.2.
6.1 Involved Spaces and Sets
At the lowest level, there is the search space G inside which the search operations navigate.
Optimization algorithms can usually only process a subset of this space at a time. The
elements of G are mapped to the problem space X by the genotype-phenotype mapping
“gpm”. In many cases, “gpm” is the identity mapping (if G = X) or a rather trivial, injective
transformation. Together with the elements under investigation in the search space, the
corresponding candidate solutions form the population pop of the optimization algorithm.
A population may contain the same individual record multiple times if the algorithm
permits this. Some optimization methods only process one candidate solution at a time.
In this case, the size of pop is one and a single variable instead of a list data structure is
used to hold the element’s data. Having determined the corresponding phenotypes of the
genotypes produced by the search operations, the objective function(s) can be evaluated.
In the singleobjective case, a single value f(p.x) determines the utility of p.x. In case
of a multiobjective optimization, on the other hand, a vector f (p.x) of objective values
is computed for each candidate solution p.x in the population. Based on these objective
values, the optimizer may compute a real ﬁtness v(p) denoting how promising an individual
p is for further investigation, i. e., as parameter for the search operations, relative to the
other known candidate solutions in the population pop. Either based on such a ﬁtness value,
directly on the objective values, or on other criteria, the optimizer will then decide how to
proceed, how to generate new points in G.
[Figure 6.1: Spaces, Sets, and Elements involved in an optimization process. The figure contrasts the involved spaces (the search space G of genotypes, here four-bit strings; the problem space X of phenotypes, here points (0,0)…(3,3) in a plane; the objective space Y ⊆ Rⁿ; and the fitness space R⁺) with the involved sets and elements (the population Pop ∈ (G × X)^ps of genotypes and phenotypes, their objective values, and their fitness values), connected by the genotype-phenotype mapping, the objective function(s) f : X → Y, and the fitness assignment process v : (G × X) → R⁺. A margin note remarks that fitness and heuristic values normally have a meaning only in the context of a population or a set of solution candidates.]
6.2 Optimization Problems
An optimization problem is the task we wish to solve with an optimization algorithm. In
Deﬁnition D1.1 and D1.2 on page 22, we gave two diﬀerent views on how optimization
problems can be deﬁned. Here, we will give a more precise discussion which will be the basis
of our considerations later in this book. An optimization problem, in the context of this
book, includes the deﬁnition of the possible solution structures (X), the means to determine
their utilities (f), and a way to compare two candidate solutions (cmpf).
Definition D6.1 (Optimization Problem (Mathematical View)). An optimization
problem “OptProb” is a triplet OptProb = (X, f, cmpf) consisting of a problem space X, a
set f = {f1, f2, . . . } of n = |f| objective functions fi : X → R ∀i ∈ 1..n, and a comparator
function cmpf : X × X → R which allows us to compare two candidate solutions x1, x2 ∈ X
based on their objective values f(x1) and f(x2) (and/or constraint satisfaction properties or
other possible features).
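For the unconstrained, single-objective minimization case, the comparator cmpf can be sketched as follows; the class name and the negative/zero/positive return convention (as in Java's Comparator) are assumptions for illustration:

```java
import java.util.function.ToDoubleFunction;

/** A sketch of the comparator cmp_f for the unconstrained,
 *  single-objective minimization case: a negative return value means
 *  x1 is better, a positive one means x2 is better, 0 means equal. */
public class ObjectiveComparator<X> {
  private final ToDoubleFunction<X> f; // the single objective function

  public ObjectiveComparator(ToDoubleFunction<X> f) { this.f = f; }

  public double cmpF(X x1, X x2) {
    return f.applyAsDouble(x1) - f.applyAsDouble(x2);
  }

  public static void main(String[] args) {
    // example objective: f(x) = x^2, so 1.0 is better than -2.0
    ObjectiveComparator<Double> cmp =
        new ObjectiveComparator<>(x -> x * x);
    System.out.println(cmp.cmpF(1.0, -2.0)); // negative: 1.0 prevails
  }
}
```

In the multi-objective or constrained case, cmpF would additionally inspect the other objective values and the constraint satisfaction properties instead of a single difference.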
Many optimization algorithms are general enough to work on diﬀerent search spaces and
utilize various search operations. Others can only be applied to a distinct class of search
spaces or have ﬁxed operators. Each optimization algorithm is deﬁned by its speciﬁcation
that states which operations have to be carried out in which order. Deﬁnition D6.2 provides
us with a mathematical expression for one speciﬁc application of an optimization algorithm
to a speciﬁc optimization problem instance.
Definition D6.2 (Optimization Setup). The setup “OptSetup” of an application of an
optimization algorithm is the five-tuple OptSetup = (OptProb, G, Op, gpm, Θ) consisting of
the optimization problem OptProb to be solved, the search space G in which the search
operations searchOp ∈ Op navigate and which is translated to the problem space X with the
genotype-phenotype mapping gpm, and the algorithm-specific parameter configuration Θ
(such as size limits for populations, time limits for the optimization process, or starting
seeds for random number generators).
This setup, together with the speciﬁcation of the algorithm, entirely deﬁnes the behavior of
the optimizer “OptAlgo”.
Definition D6.3 (Optimization Algorithm). An optimization algorithm is a mapping
OptAlgo : OptSetup → Φ × X∗ of an optimization setup “OptSetup” to a problem landscape
Φ as well as a mapping to the optimization result x ∈ X∗, an arbitrarily sized subset of the
problem space.
Apart from this very mathematical deﬁnition, we provide a Java interface which unites the
functionality and features of optimization algorithms in Section 55.1.5 on page 713. Here,
the functionality (finding a list of good individuals) and the properties (the objective
function(s), the search operations, the genotype-phenotype mapping, the termination criterion)
characterize an optimizer. This perspective comes closer to the practitioner’s point of view
whereas Deﬁnition D6.3 gives a good summary of the theoretical idea of optimization.
If an optimization algorithm can effectively be applied to an optimization problem, it
will find at least one local optimum x⋆l if granted infinite processing time and if such an
optimum exists (Equation 6.1). Usual prerequisites for effective problem solving are a weakly
complete set of search operations Op, a surjective genotype-phenotype mapping “gpm”, and
objective functions f which provide more information than needle-in-a-haystack configurations
(see Section 16.7 on page 175).
    ∃x⋆l ∈ X : lim_{τ→∞} Φ(x⋆l, τ) = 1        (6.1)
From an abstract perspective, an optimization algorithm is characterized by
104 6 THE STRUCTURE OF OPTIMIZATION: PUTTING IT TOGETHER.
1. the way it assigns ﬁtness to the individuals,
2. the ways it selects them for further investigation,
3. the way it applies the search operations, and
4. the way it builds and treats its state information.
The best optimization algorithm for a given setup OptSetup is the one with the highest values
of Φ(x⋆, τ) for the optimal elements x⋆ in the problem space with the lowest corresponding
values of τ. Therefore, finding the best optimization algorithm for a given optimization
problem is, itself, a multi-objective optimization problem.
For a perfect optimization algorithm (given an optimization setup with weakly complete
search operations, a surjective genotype-phenotype mapping, and reasonable objective
function(s)), Equation 6.2 would hold. However, it is more than questionable whether such an
algorithm can actually be built (see Chapter 23).
    ∀x1, x2 ∈ X : x1 ≺ x2 ⇒ lim_{τ→∞} Φ(x1, τ) > lim_{τ→∞} Φ(x2, τ)        (6.2)
6.3 Other General Features
There are some further common semantics and operations that are shared by most optimiza
tion algorithms. Many of them, for instance, start out by randomly creating some initial
individuals which are then reﬁned iteratively. Optimization processes which are not allowed
to run inﬁnitely have to ﬁnd out when to terminate. In this section we deﬁne and discuss
general abstractions for such similarities.
6.3.1 Gradient Descent
Definition D6.4 (Gradient). A gradient¹ of a scalar mathematical function (scalar field)
f : Rⁿ → R is a vector function (vector field) which points into the direction of the greatest
increase of the scalar field. It is denoted by ∇f or grad(f).

    lim_{h→0} ‖f(x + h) − f(x) − grad(f)(x) · h‖ / ‖h‖ = 0        (6.3)
In other words, if we have a differentiable smooth function with low total variation² [2224],
the gradient would be the perfect hint for the optimizer, giving the best direction into which
the search should be steered. What many optimization algorithms which treat the objective
functions as black boxes do is to use rough estimations of the gradient for this purpose. The
Downhill Simplex (see Chapter 43 on page 501), for example, uses a polytope consisting of
n + 1 points in G = Rⁿ for approximating the direction with the steepest descent.

In most cases, we cannot directly differentiate the objective functions with the Nabla
operator³ ∇f because their exact mathematical characteristics are not known or too complicated.
Also, the search space G is often not a vector space over the real numbers R.
Thus, samples of the search space are used to approximate the gradient. If we compare
two elements g1 and g2 via their corresponding phenotypes x1 = gpm(g1) and x2 = gpm(g2)
and find x1 ≺ x2, we can assume that there is some sort of gradient facing upwards from
x1 to x2 and hence, from g1 to g2. When descending this gradient, we can hope to find an
x3 with x3 ≺ x1 and finally the global minimum.
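One common black-box stand-in for ∇f on G = Rⁿ is a central finite-difference estimate from samples; the step size h and the example function below are assumptions, since the text does not prescribe a particular estimator.

```java
import java.util.function.ToDoubleFunction;

/** A sketch of estimating grad(f) from samples via central finite
 *  differences, one common black-box substitute for the Nabla operator. */
public class GradientEstimate {
  static double[] grad(ToDoubleFunction<double[]> f, double[] x, double h) {
    double[] g = new double[x.length];
    for (int i = 0; i < x.length; i++) {
      double[] plus = x.clone(), minus = x.clone();
      plus[i] += h;   // sample the scalar field slightly above x[i] ...
      minus[i] -= h;  // ... and slightly below it
      g[i] = (f.applyAsDouble(plus) - f.applyAsDouble(minus)) / (2 * h);
    }
    return g;
  }

  public static void main(String[] args) {
    // f(x) = x1^2 + 3*x2; the true gradient at (2, 0) is (4, 3)
    ToDoubleFunction<double[]> f = v -> v[0] * v[0] + 3 * v[1];
    double[] g = grad(f, new double[] { 2, 0 }, 1e-6);
    System.out.printf("(%.3f, %.3f)%n", g[0], g[1]);
  }
}
```

Metaheuristics working on non-vector search spaces cannot even do this; they rely solely on the pairwise comparisons x1 ≺ x2 described above.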
1 http://en.wikipedia.org/wiki/Gradient [accessed 2007-11-06]
2 http://en.wikipedia.org/wiki/Total_variation [accessed 2009-07-01]
3 http://en.wikipedia.org/wiki/Del [accessed 2008-02-15]
6.3.2 Iterations
Global optimization algorithms often proceed iteratively and evaluate candidate solutions
step by step in order to approach the optima. We distinguish between evaluations τ and
iterations t.
Definition D6.5 (Evaluation). The value τ ∈ N0 denotes the number of candidate
solutions for which the set of objective functions f has been evaluated.
Definition D6.6 (Iteration). An iteration⁴ refers to one round in a loop of an algorithm.
It is one repetition of a specific sequence of instructions inside the algorithm.
Algorithms are referred to as iterative if most of their work is done by cyclic repetition of
one main loop. In the context of this book, an iterative optimization algorithm starts with
the first step t = 0. The value t ∈ N0 is the index of the iteration currently performed by
the algorithm, and t + 1 refers to the following step. One example of an iterative algorithm
is Algorithm 6.1. In some optimization algorithms, like Genetic Algorithms, for instance,
iterations are referred to as generations.

There often exists a well-defined relation between the number of performed evaluations
τ of candidate solutions and the index of the current iteration t in an optimization process:
many Global Optimization algorithms create and evaluate exactly a certain number of
individuals per generation.
6.3.3 Termination Criterion
The termination criterion terminationCriterion() is a function with access to all the
information accumulated by an optimization process, including the number of performed steps
t, the objective values of the best individuals, and the time elapsed since the start of the
process. With terminationCriterion(), the optimizers determine when they have to halt.
Definition D6.7 (Termination Criterion). When the termination criterion function
terminationCriterion() ∈ {true, false} evaluates to true, the optimization process will
stop and return its results.
In Listing 55.9 on page 712, you can ﬁnd a Java interface resembling Deﬁnition D6.7. Some
possible criteria that can be used to decide whether an optimizer should terminate or not
are [2160, 2622, 3088, 3089]:
1. The user may grant the optimization algorithm a maximum computation time. If this
time has been exceeded, the optimizer should stop. Here we should note that the time
needed for single individuals may vary, and so will the times needed for iterations.
Hence, this time threshold can sometimes not be abided by exactly and may lead to
different total numbers of iterations during multiple runs of the same algorithm.
2. Instead of specifying a time limit, an upper limit for the number of iterations t̆ or
individual evaluations τ̆ may be specified. Such criteria are most interesting for the
researcher, since she often wants to know whether a qualitatively interesting solution
can be found for a given problem using at most a predefined number of objective
function evaluations. In Listing 56.53 on page 836, we give a Java implementation of
the terminationCriterion() operation which can be used for that purpose.
4 http://en.wikipedia.org/wiki/Iteration [accessed 2007-07-03]
3. An optimization process may be stopped when no signiﬁcant improvement in the
solution quality was detected for a speciﬁed number of iterations. Then, the process
most probably has converged to a (hopefully good) solution and will most likely not
be able to make further progress.
4. If we optimize something like a decision maker or classiﬁer based on a sample data
set, we will normally divide this data into a training and a test set. The training set is
used to guide the optimization process whereas the test set is used to verify its results.
We can compare the performance of our solution when fed with the training set to
its properties if fed with the test set. This comparison helps us detect when most
probably no further generalization can be achieved by the optimizer and we should
terminate the process.
5. Obviously, we can terminate an optimization process if it has already yielded a suﬃ
ciently good solution.
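As a sketch, the first two of these criteria can be combined into one terminationCriterion() implementation. The class and method names are illustrative assumptions and not the book's actual Java API:

```java
/** A sketch of a termination criterion combining a wall-clock time limit
 *  with a maximum number of objective function evaluations. */
public class MaxTimeOrEvals {
  private final long endTime;   // absolute deadline in milliseconds
  private final long maxEvals;  // upper limit for the evaluations tau
  private long evals = 0;       // evaluations counted so far

  public MaxTimeOrEvals(long maxMillis, long maxEvals) {
    this.endTime = System.currentTimeMillis() + maxMillis;
    this.maxEvals = maxEvals;
  }

  /** Call once per evaluated candidate solution. */
  public void countEvaluation() { evals++; }

  /** true as soon as either limit has been reached. */
  public boolean terminationCriterion() {
    return evals >= maxEvals || System.currentTimeMillis() >= endTime;
  }

  public static void main(String[] args) {
    MaxTimeOrEvals tc = new MaxTimeOrEvals(60_000, 3);
    while (!tc.terminationCriterion()) {
      tc.countEvaluation(); // here a candidate solution would be evaluated
    }
    System.out.println("stopped");
  }
}
```

Criteria 3 to 5 can be layered on top in the same way, each one contributing one more disjunct to terminationCriterion().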
In practical applications, we can apply any combination of the criteria above in order to
determine when to halt. Algorithm 1.1 on page 40 is an example of a badly designed
termination criterion: the algorithm will not halt until a globally optimal solution is found.
From its structure, however, it is clear that it is very prone to premature convergence (see
Chapter 13 on page 151) and thus may get stuck at a local optimum. Then, it will run
forever even if an optimal solution exists. This example also shows another thing: the
structure of the termination criterion decides whether a probabilistic optimization technique
is a Las Vegas or Monte Carlo algorithm (see Definition D12.2 and D12.3). How the
termination criterion may be tested in an iterative algorithm is illustrated in Algorithm 6.1.
Algorithm 6.1: Example Iterative Algorithm
  Input: [implicit] terminationCriterion(): the termination criterion
  Data: t: the iteration counter
1 begin
2   t ←− 0
      // initialize the data of the algorithm
3   while ¬terminationCriterion() do
      // perform one iteration: here happens the magic
4     t ←− t + 1
6.3.4 Minimization
The original form of many optimization algorithms was developed for single-objective
optimization. Such algorithms may be used for both minimization and maximization. As
previously mentioned, we will present them as minimization processes, since this is the most
commonly used notation, without loss of generality. An algorithm that maximizes the
function f may be transformed into a minimization process by using −f instead.

Note that, using the prevalence comparisons as introduced in Section 3.5.2 on page 76,
multi-objective optimization processes can be transformed into single-objective minimization
processes. Then x1 ≺ x2 ⇔ cmpf(x1, x2) < 0.
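The maximization-to-minimization transformation can be sketched in a single negation; the functional-interface style below is an assumption for illustration:

```java
import java.util.function.ToDoubleFunction;

/** Sketch: turn a maximization objective f into a minimization
 *  objective by negating it, so that any minimizer can be applied. */
public class Negate {
  static <X> ToDoubleFunction<X> asMinimization(ToDoubleFunction<X> f) {
    return x -> -f.applyAsDouble(x); // minimizing -f maximizes f
  }

  public static void main(String[] args) {
    ToDoubleFunction<Double> profit = x -> 10 - (x - 3) * (x - 3);
    ToDoubleFunction<Double> loss = asMinimization(profit);
    // the maximum of profit at x = 3 is the minimum of loss
    System.out.println(loss.applyAsDouble(3.0)); // prints -10.0
  }
}
```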
6.3.5 Modeling and Simulating
While there are a lot of problems where the objective functions are mathematical expressions
that can directly be computed, there exist problem classes far away from such simple function
optimization that require complex models and simulations.
Definition D6.8 (Model). A model⁵ is an abstraction or approximation of a system that
allows us to reason about and to deduce properties of the system.

Models are often simplifications or idealizations of real-world issues. They are defined by
leaving out facts that probably have only a minor impact on the conclusions drawn from
them. In the area of Global Optimization, we often need two types of abstractions:
1. The models of the potential solutions shape the problem space X. Examples are
(a) programs in Genetic Programming, for example for the Artiﬁcial Ant problem,
(b) construction plans of a skyscraper,
(c) distributed algorithms represented as programs for Genetic Programming,
(d) construction plans of a turbine,
(e) circuit diagrams for logical circuits, and so on.
2. Models of the environment in which we can test and explore the properties of the
potential solutions, like
(a) a map on which the Artiﬁcial Ant will move driven by an optimized program,
(b) an abstraction from the environment in which the skyscraper will be built, with
wind blowing from several directions,
(c) a model of the network in which the evolved distributed algorithms can run,
(d) a physical model of air which blows through the turbine,
(e) the model of an energy source and the other pins which will be attached to the circuit,
together with the possible voltages on these pins.
Models themselves are rather static structures of descriptions and formulas. Deriving
concrete results (objective values) from them is often complicated. It often makes more sense
to bring, e.g., the construction plan of a skyscraper to life in a simulation. Then we can test
the influence of various wind forces and directions on the building structure and approximate
the properties which define the objective values.
Definition D6.9 (Simulation). A simulation⁶ is the computational realization of a model.
Whereas a model describes abstract connections between the properties of a system, a
simulation realizes these connections.

Simulations are executable, live representations of models that can be as meaningful as real
experiments. They allow us to reason about whether a model makes sense or not and how
certain objects behave in the context of a model.
5 http://en.wikipedia.org/wiki/Model_%28abstract%29 [accessed 2007-07-03]
6 http://en.wikipedia.org/wiki/Simulation [accessed 2007-07-03]
Chapter 7
Solving an Optimization Problem
Before considering the following steps, always
ﬁrst check if a dedicated algorithm exists for
the problem (see Section 1.1.3).
Figure 7.1: Use dedicated algorithms!
There are many highly eﬃcient algorithms for solving special problems. For ﬁnding
the shortest paths from one source node to the other nodes in a network, for example, one
would never use a metaheuristic but Dijkstra’s algorithm [796]. For computing the maximum
ﬂow in a ﬂow network, the algorithm by Dinitz [800] beats any probabilistic optimization
method. The key to winning with metaheuristics is to know when to apply them and when
not. The first step before using them is always an internet or literature search in order to
check for alternatives.
If this search did not lead to the discovery of a suitable method to solve the problem,
a randomized or traditional optimization technique can be used. Therefore, the following
steps should be performed in the order they are listed below and sketched in Figure 7.2.
0. Check for possible alternatives to using optimization algorithms!
1. Deﬁne the data structures for the possible solutions, the problem space X (see Sec
tion 2.1).
2. Deﬁne the objective functions f ∈ f which are subject to optimization (see Section 2.2).
3. Deﬁne the equality constraints h and inequality constraints g, if needed (see Sec
tion 3.4).
4. Define a comparator function cmpf(x1, x2) able to decide which one of two candidate
solutions x1 and x2 is better (based on f and the constraints hi and gi, see Section
3.5.2). In the unconstrained, single-objective case, which is most common, this
function will just select the individual with the lower objective value.
5. Decide upon the optimization algorithm to be used for the optimization problem (see
Section 1.3).
6. Choose an appropriate search space G (see Section 4.1). This decision may be directly
implied by the choice in Point 5.
[Figure 7.2: The process of solving an optimization problem. If dedicated algorithms exist, use them. Otherwise: determine the data structure for solutions; define the optimization criteria and constraints; choose the search space, search operations, and mapping to the problem space; select the optimization algorithm (EA? CMA-ES? DE?); configure it (population size, crossover rate, . . . ); run small-scale test experiments (a few runs per configuration) and loop back if the results are not satisfying; then perform the real experiment with enough runs for statistically significant results (e.g., 30 runs per configuration and problem instance); evaluate statistically (e.g., "Our approach beats approach xyz with 2% probability to err in a two-tailed Mann-Whitney test."); and finally integrate into the real application (production stage).]
7. Define a genotype-phenotype mapping “gpm” between the search space G and the
problem space X (see Section 4.3).
8. Deﬁne the set searchOperations of search operations searchOp ∈ Op navigating in
the search space (see Section 4.2). This decision may again be directly implied by the
choice in Point 5.
9. Conﬁgure the optimization algorithm, i. e., set all its parameters to values which look
reasonable.
10. Run a small-scale test with the algorithm and the settings and check whether they
approximately meet the expectations. If not, go back to Point 9, Point 5, or even Point 1,
depending on how bad the results are. Otherwise, continue at Point 11.
11. Conduct experiments with some diﬀerent conﬁgurations and perform many runs for
each conﬁguration. Obtain enough results to make statistical comparisons between
the configurations and with other algorithms (see, e.g., Section 53.7). If the results are
bad, again consider going back to a previous step.
12. Now either the results of the optimizer can be used and published or the developed
optimization system can become part of an application (see Part V and [2912] as an
example), depending on the goal of the work.
Tasks T7
31. Why is it important to always look for traditional, dedicated, and deterministic
alternatives before trying to solve an optimization problem with a metaheuristic
optimization algorithm?
[5 points]
32. Why is a single experiment with a randomized optimization algorithm not enough to
show that this algorithm is good (or bad) for solving a given optimization task? Why
do we need comprehensive experiments instead, as pointed out in Point 11 on the
previous page?
[5 points]
Chapter 8
Baseline Search Patterns
Metaheuristic optimization algorithms trade in the guaranteed optimality of their results for
shorter runtime. This means that their results can be (but not necessarily are) worse than
the global optima of a given optimization problem. The results are acceptable as long as the
user is satisfied with them. An interesting question, however, is: “When is an optimization
algorithm suitable for a given problem?” The answer to this question is strongly related to
how well it can utilize the information it gains during the optimization process in order to
find promising regions in the search space.
The primary information source for any optimizer is the (set of) objective functions.
Hence, it makes sense to declare an optimization algorithm as eﬃcient if it can outperform
primitive approaches which do not utilize the information gained from evaluating candidate
solutions with objective functions, such as the random sampling and random walk methods
which will be introduced in this chapter. The comparison criterion should be the solution
quality in terms of objective values.
Deﬁnition D8.1 (Eﬀectiveness). An optimization algorithm is eﬀective if it can (statis
tically signiﬁcantly) outperform random sampling and random walks in terms of solution
quality according to the objective functions.
A third baseline algorithm is exhaustive enumeration, the testing of all possible solutions. It
does not utilize the information gained from the objective functions either. It can be used
as a yardstick in terms of optimization speed.
Definition D8.2 (Efficiency). An optimization algorithm is efficient if it can meet the
criterion of effectiveness (Definition D8.1) with a lower-than-exponential algorithmic
complexity (see Section 12.1 on page 143), i. e., within fewer steps than required by exhaustive
enumeration.
8.1 Random Sampling
Random sampling is the most basic search approach. It is uninformed, i. e., does not make
any use of the information it can obtain from evaluating candidate solutions with the objec
tive functions. As sketched in Algorithm 8.1 and implemented in Listing 56.4 on page 729,
it keeps creating genotypes of random conﬁguration until the termination criterion is met.
In every iteration, it therefore utilizes the nullary search operation “create” which builds
a genotype p.g ∈ G with a random configuration and maps this genotype to a phenotype
p.x ∈ X with the genotype-phenotype mapping “gpm”. It checks whether the new phenotype
is better than the best candidate solution discovered so far (x). If so, it is stored in x.
Algorithm 8.1: x ←− randomSampling(f)
  Input: f: the objective/fitness function
  Data: p: the current individual
  Output: x: the result, i. e., the best candidate solution discovered
1 begin
2   x ←− ∅
3   repeat
4     p.g ←− create()
5     p.x ←− gpm(p.g)
6     if (x = ∅) ∨ (f(p.x) < f(x)) then
7       x ←− p.x
8   until terminationCriterion()
9   return x
Notice that x is the only variable used for information aggregation and that there is no form of feedback. The search neither makes use of knowledge about good structures of candidate solutions nor does it further investigate a genotype which has produced a good phenotype. Hence, if "create" samples the search space G in a uniform, unbiased way, random sampling will be completely unbiased as well.
This unbiasedness and ignorance of available information turn random sampling into one of the baseline algorithms used for checking whether an optimization method is suitable for a given problem. If an optimization algorithm does not perform better than random sampling on a given optimization problem, we can deduce that (a) it is not able to utilize the information gained from the objective function(s) efficiently (see Chapters 14 and 16), (b) its search operations cannot modify existing genotypes in a useful manner (see Section 14.2), or (c) the information given by the objective functions is largely misleading (see Chapter 15). It is well known from the No Free Lunch Theorem (see Chapter 23) that, averaged over the set of all possible optimization problems, no optimization algorithm can beat random sampling. However, for specific families of problems and for most practically relevant problems in general, outperforming random sampling is not too complicated. Still, a comparison to this primitive algorithm is necessary whenever a new, not yet analyzed optimization task is to be solved.
8.2 Random Walks
Like random sampling, random walks¹ [899, 1302, 2143] (sometimes also called drunkard's walks) can be considered uninformed, undirected search methods. They are the second of the two baseline algorithms to which optimization methods must be compared for specific problems in order to test their utility. We specify the random walk algorithm in Algorithm 8.2 and implement it in Listing 56.5 on page 732.
While random sampling makes use of no information gathered so far, random walks base their next "search move" on the most recently investigated genotype p.g. Instead of sampling the search space uniformly as random sampling does, a random walk starts at one (possibly random) point. In each iteration, it uses a unary search operation (in Algorithm 8.2 simply denoted as searchOp(p.g)) to move to the next point.
Random walks are thus quite similar in structure to basic optimization algorithms such as Hill Climbing (see Chapter 26). The difference is that optimization algorithms make use of the information they gain by evaluating the candidate
¹ http://en.wikipedia.org/wiki/Random_walk [accessed 2010-08-26]
Algorithm 8.2: x ←− randomWalk(f)
  Input: f: the objective/fitness function
  Data: p: the current individual
  Output: x: the result, i. e., the best candidate solution discovered
1 begin
2   x ←− ∅
3   p.g ←− create()
4   repeat
5     p.x ←− gpm(p.g)
6     if (x = ∅) ∨ (f(p.x) < f(x)) then
7       x ←− p.x
8     p.g ←− searchOp(p.g)
9   until terminationCriterion()
10  return x
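In the same hypothetical Python style as before (a sketch only; the bit-flip operator, step budget, and helper names are assumptions of this example), Algorithm 8.2 differs from random sampling in exactly one respect: each new genotype is derived from the previous one.

```python
import random

def random_walk(f, create, search_op, gpm, max_steps=1000):
    """Algorithm 8.2 sketch: start from one random genotype and repeatedly
    apply a unary search operation. The objective value never influences
    where the walk goes, only which phenotype is remembered."""
    g = create()
    best_x, best_f = None, float("inf")
    for _ in range(max_steps):
        x = gpm(g)
        if f(x) < best_f:          # remember the best phenotype visited
            best_x, best_f = x, f(x)
        g = search_op(g)           # move on unconditionally
    return best_x

# toy problem: walk over 8-bit strings by flipping one random bit per step
def flip_one_bit(g):
    i = random.randrange(len(g))
    return g[:i] + (1 - g[i],) + g[i + 1:]

create = lambda: tuple(random.randint(0, 1) for _ in range(8))
gpm = lambda g: int("".join(map(str, g)), 2) - 128
print(random_walk(lambda x: x * x, create, flip_one_bit, gpm))
```

Replacing the unconditional move in line "g = search_op(g)" with an acceptance test would already turn this walk into a simple Hill Climber, which is precisely the relationship discussed in the text.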
solutions with the objective functions. In any useful configuration for optimization problems which are not too ill-defined, they should thus be able to outperform random walks and random sampling.
In cases where they cannot, applying one of these two uninformed, primitive strategies is advised. Indeed, since optimization methods always introduce some bias into the search process (they always try to steer toward regions of the search space where they expect a fitness improvement), random walks or random sampling may even outperform them if the problems are deceptive enough (see Chapter 15). Circumstances where random walks can be the search algorithms of choice are, for example:
1. if a search algorithm encounters a state explosion because there are too many states and transcending to them would consume too much memory;
2. in certain cases of online search where it is not possible to apply systematic approaches like breadth-first search or depth-first search. If the environment, for instance, is only partially observable and each state transition represents an immediate interaction with this environment, we may not be able to navigate to past states again. One example for such a case is discussed in the work of Skubch [2516, 2517] about reasoning agents.
Random walks are often used in optimization theory for determining features of a fitness landscape. Measures that can be derived mathematically from a walk model include estimates for the number of steps needed until a certain configuration is found, the chance to find a certain configuration at all, and the average difference of the objective values of two consecutive populations. From practically running random walks, some information about the search space can be extracted. Skubch [2516, 2517], for instance, uses the number of encounters of certain state transitions during the walk in order to successively build a heuristic function.
Figure 8.1 illustrates some examples for random walks. The subfigures 8.1.a to 8.1.c each illustrate three random walks in which axis-parallel steps of length 1 are taken. The dimensions range from 1 to 3 from left to right. The same goes for Fig. 8.1.d to Fig. 8.1.f, each of which displays three random walks with Gaussian (standard-normally) distributed step widths and arbitrary step directions.
[Figure 8.1 (plots omitted): Some examples for random walks. Fig. 8.1.a to Fig. 8.1.c: axis-parallel steps of length 1 in one, two, and three dimensions. Fig. 8.1.d to Fig. 8.1.f: arbitrary step directions with Gaussian step lengths in one, two, and three dimensions.]
8.2.1 Adaptive Walks
An adaptive walk is a theoretical optimization method which, like a random walk, usually works on a population of size 1. It starts at a random location in the search space and proceeds by changing (or mutating) its single individual. For this modification, three methods are available:
1. One-mutant change: The optimization process chooses a single new individual from the set of "one-mutant change" neighbors. These neighbors differ from the current candidate solution in only one property, i. e., are in its ε = 0 neighborhood. If the new individual is better, it replaces its ancestor; otherwise it is discarded.
2. Greedy dynamics: The optimization process chooses a single new individual from the set of "one-mutant change" neighbors. If it is not better than the current candidate solution, the search continues until a better one has been found or all neighbors have been enumerated. The major difference to the previous form is the number of steps that are needed per improvement.
3. Fitter dynamics: The optimization process enumerates all one-mutant neighbors of the current candidate solution and transcends to the best one.
From these elaborations, it becomes clear that adaptive walks are very similar to Hill Climbing and Random Optimization. The major difference is that an adaptive walk is a theoretical construct that, very much like random walks, helps us to determine properties of fitness landscapes, whereas the other two are practical realizations of optimization algorithms.
Adaptive walks are a very common construct in evolutionary biology. Biological populations evolve for a very long time and their genetic compositions are therefore assumed to be relatively converged [79, 1057]. The dynamics of such populations in near-equilibrium states with low mutation rates can be approximated with one-mutant adaptive walks [79, 1057, 2528].
8.3 Exhaustive Enumeration
Exhaustive enumeration is an optimization method which is just as primitive as random walks. It can be applied to any finite search space and basically consists of testing every single possible solution for its utility, returning the best candidate solution discovered after all have been tested. If the minimum possible objective value y̆ is known for a given minimization problem, the algorithm can also terminate sooner.
Algorithm 8.3: x ←− exhaustiveSearch(f, y̆)
  Input: f: the objective/fitness function
  Input: y̆: the lowest possible objective value, −∞ if unknown
  Data: x′: the current candidate solution
  Data: t: the iteration counter
  Output: x: the result, i. e., the best candidate solution discovered
1 begin
2   x ←− gpm(G[0])
3   t ←− 1
4   while (t < |G|) ∧ (f(x) > y̆) do
5     x′ ←− gpm(G[t])
6     if f(x′) < f(x) then x ←− x′
7     t ←− t + 1
8   return x
Algorithm 8.3 contains a definition of the exhaustive search algorithm. It is based on the assumption that the search space G can be traversed in a linear order, where the t-th element in the search space can be accessed as G[t]. The second parameter of the algorithm is the lower limit y̆ of the objective value. If this limit is known, the algorithm may terminate before traversing the complete search space if it discovers a global optimum with f(x) = y̆ earlier. If no lower limit for f is known, y̆ = −∞ can be provided.
It is interesting to see that for many hard problems such as NP-complete ones (Section 12.2.2), no algorithms are known which can generally be faster than exhaustive enumeration. For an input size of n, algorithms for such problems need O(2^n) iterations. Exhaustive search requires exactly the same complexity: if G = B^n, i. e., the bit strings of length n, the algorithm will need exactly 2^n steps if no lower limit y̆ for the value of f is known.
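A Python sketch of Algorithm 8.3 may be useful here (the linear traversal via a list, the identity genotype-phenotype mapping, and the toy objective are assumptions made for this illustration):

```python
import itertools

def exhaustive_search(f, gpm, space, y_lower=float("-inf")):
    """Algorithm 8.3 sketch: test every element of the finite search
    space G in linear order, terminating early once a solution with
    the known lower limit y_lower of the objective f is found."""
    best_x = gpm(space[0])
    for g in space[1:]:
        if f(best_x) <= y_lower:       # global optimum found: stop early
            break
        x = gpm(g)
        if f(x) < f(best_x):
            best_x = x
    return best_x

# toy problem: G = B^4, minimize |sum(g) - 3| with no known lower limit;
# passing y_lower=0 instead would allow the search to stop before
# visiting all 2^4 elements
space = list(itertools.product([0, 1], repeat=4))
print(exhaustive_search(lambda x: abs(sum(x) - 3), lambda g: g, space))
```

Since every element of G is visited in the worst case, the run time is Θ(|G|), i. e., exactly the exponential cost discussed above when G = B^n.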
Tasks T8
33. What are the similarities and differences between random walks, random sampling, and adaptive walks?
[10 points]
34. Why should we consider an optimization algorithm as ineﬀective for a given problem
if it does not perform better than random sampling or a random walk on the problem?
[5 points]
35. Is it possible that an optimization algorithm such as, for example, Algorithm 1.1 on
page 40 may give us even worse results than a random walk or random sampling? Give
reasons. If yes, try to ﬁnd a problem (problem space, objective function) where this
could be the case.
[5 points]
36. Implement random sampling for the bin packing problem as a comparison algorithm
for Algorithm 1.1 on page 40 and test it on the same problem instances used to
verify Task 4 on page 42. Discuss your results.
[10 points]
37. Compare your answers to Task 35 and 36. Is the experiment in Task 36 suitable
to validate your hypothesis? If not, can you ﬁnd some instances of the bin packing
problem where it can be used to verify your idea? If so, apply both Algorithm 1.1
on page 40 and your random sampling implementation to these instances. Did the
experiments support your hypothesis?
[15 points]
38. Can we implement an exhaustive search pattern to find the minimum of a mathematical function f defined over the real interval (−1, 1)? Answer this question both from
(a) a theoretical-mathematical perspective in an idealized execution environment (i. e., with infinite precision) and
(b) from a practical point of view based on real-world computers and requirements (i. e., under the assumption that the available precision of floating point numbers and of any given production process is limited).
[10 points]
39. Implement an exhaustive-search-based optimization procedure for the bin packing problem as a comparison algorithm for Algorithm 1.1 on page 40 and test it on the same problem instances used to verify Task 4 on page 42 and Task 36. Discuss your results.
[10 points]
40. Implement an exhaustive search pattern to solve the instance of the Traveling Salesman
Problem given in Example E2.2 on page 45.
[10 points]
Chapter 9
Forma Analysis
The Schema Theorem was stated for Genetic Algorithms by Holland [1250] in his seminal work [711, 1250, 1253]. In this section, we are going to discuss it in the more general version from Weicker [2879], as introduced by Radcliffe and Surry [2241–2243, 2246, 2634].
Definition D9.1 (Property). A property φ is a function which maps individuals, genotypes, or phenotypes to a set of possible property values.
Each individual p in the population pop of an optimization algorithm is characterized by its properties φ. Optimizers focus mainly on the phenotypical properties, since these are evaluated by the objective functions. In forma analysis, the properties of the genotypes are considered as well, since they influence the features of the phenotypes via the genotype-phenotype mapping. With Example E9.1, we want to clarify what possible properties could look like.
Example E9.1 (Examples for Properties).
A rather structural property φ1 of formulas z : R → R in symbolic regression¹ would be whether they contain the mathematical expression x + 1 or not: φ1 : X → {true, false}. We can also declare a behavioral property φ2 which is true if |z(0) − 1| ≤ 0.1 holds, i. e., if the result of z is close to the value 1 for the input 0, and false otherwise: φ2(z) = (|z(0) − 1| ≤ 0.1) ∀z ∈ X.
Assume that the optimizer uses a binary search space G = B^n and that the formulas z are decoded from there to trees that represent mathematical expressions by a genotype-phenotype mapping. A genotypical property φ3 would then be whether a certain sequence of bits occurs in the genotype p.g, and another phenotypical property φ4 is the number of nodes in the phenotype p.x, for instance.
If we try to solve a graph-coloring problem, for example, a property φ5 : X → {black, white, gray} could denote the color of a specific vertex q, as illustrated in Figure 9.1.
In general, we can consider properties φi to be some sort of functions that map the individuals to property values. φ1 and φ2 both map the space of mathematical functions to the set B = {true, false}, whereas φ5 maps the space of all possible colorings for the given graph to the set {white, gray, black} and property φ4 maps trees to the natural numbers.
¹ More information on symbolic regression can be found in Section 49.1 on page 531.
[Figure 9.1 (graphic omitted): A graph coloring-based example for properties and formae. The candidate solutions, colorings of the vertices q1, . . . , q8 of a graph (here G = X), are partitioned into the three formae Aφ5=white, Aφ5=gray, and Aφ5=black according to the color they assign to the vertex q.]
On the basis of the properties φi we can define equivalence relations² ∼φi:

p1 ∼φi p2 ⇔ φi(p1) = φi(p2) ∀p1, p2 ∈ G × X (9.1)

Obviously, for each two candidate solutions x1 and x2, either x1 ∼φi x2 or x1 ≁φi x2 holds. These relations divide the search space into equivalence classes Aφi=v.
Definition D9.2 (Forma). An equivalence class Aφi=v that contains all the individuals sharing the same characteristic v in terms of the property φi is called a forma [2241] or predicate [2826].

Aφi=v = {∀p ∈ G × X : φi(p) = v} (9.2)
∀p1, p2 ∈ Aφi=v ⇒ p1 ∼φi p2 (9.3)
The number of formae induced by a property, i. e., the number of its different characteristics, is called its precision [2241]. Two formae Aφi=v and Aφj=w are said to be compatible, written as Aφi=v ⊲⊳ Aφj=w, if there can exist at least one individual which is an instance of both.

Aφi=v ⊲⊳ Aφj=w ⇔ Aφi=v ∩ Aφj=w ≠ ∅ (9.4)
Aφi=v ⊲⊳ Aφj=w ⇔ ∃ p ∈ G × X : p ∈ Aφi=v ∧ p ∈ Aφj=w (9.5)
Aφi=v ⊲⊳ Aφi=w ⇒ w = v (9.6)

Of course, two different formae of the same property φi, i. e., two different characteristics of φi, are always incompatible.
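These notions translate directly into code. The following Python sketch (the 3-bit toy problem space and the property definitions are assumptions of this example, not from the book) partitions a finite space into formae and tests the non-empty-intersection criterion of Equation 9.4:

```python
import itertools
from collections import defaultdict

def formae(space, prop):
    """Partition `space` into its equivalence classes (formae), one
    class per observed value of the property `prop`."""
    classes = defaultdict(set)
    for p in space:
        classes[prop(p)].add(p)
    return dict(classes)

def compatible(forma_a, forma_b):
    """Two formae are compatible iff some individual is an instance of
    both, i. e., their intersection is non-empty (Equation 9.4)."""
    return bool(forma_a & forma_b)

# toy problem space: all 3-bit strings
space = set(itertools.product([0, 1], repeat=3))
phi1 = lambda p: p[0]      # property: value of the first bit
phi2 = lambda p: sum(p)    # property: number of ones
F1, F2 = formae(space, phi1), formae(space, phi2)
print(len(F1), len(F2))             # precisions of phi1 and phi2: 2 and 4
print(compatible(F1[1], F2[0]))     # only (0,0,0) has zero ones: False
print(compatible(F1[1], F2[2]))     # (1,1,0) is an instance of both: True
```

The number of keys in each returned dictionary is exactly the precision of the corresponding property, and two distinct formae of the same property always come out incompatible, matching Equation 9.6.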
Example E9.2 (Examples for Formae (Example E9.1 Cont.)).
The precision of φ1 and φ2 in Example E9.1 is 2, for φ5 it is 3. We can define another property φ6 ≡ z(0) denoting the value a mathematical function takes on for the input 0. This property would have an uncountably infinitely large precision.
² See the definition of equivalence classes in Section 51.9 on page 646.
[Figure 9.2 (graphic omitted): Example for formae in symbolic regression. Eight formulas z1(x) = x+1, z2(x) = x²+1.1, z3(x) = x+2, z4(x) = 2(x+1), z5(x) = tan x, z6(x) = (sin x)(x+1), z7(x) = (cos x)(x+1), and z8(x) = (tan x)+1 are partitioned into formae by the equivalence relations ∼φ1 (Aφ1=true = {z1, z4, z6, z7}), ∼φ2 (Aφ2=true = {z1, z2, z7, z8}), and ∼φ6 (Aφ6=0 = {z5, z6}, Aφ6=1 = {z1, z7, z8}, Aφ6=1.1 = {z2}, Aφ6=2 = {z3, z4}).]
In our initial symbolic regression example, Aφ1=true ⊲⊳ Aφ1=false does not hold, since it is not possible that a function z contains a term x + 1 and at the same time does not contain it. All formae of the properties φ1 and φ2, on the other hand, are compatible: Aφ1=false ⊲⊳ Aφ2=false, Aφ1=false ⊲⊳ Aφ2=true, Aφ1=true ⊲⊳ Aφ2=false, and Aφ1=true ⊲⊳ Aφ2=true all hold. If we take φ6 into consideration, we will find that there exist some formae compatible with some of φ2 and some that are not: Aφ2=true ⊲⊳ Aφ6=1 and Aφ2=false ⊲⊳ Aφ6=2 hold, for example, but Aφ2=true ⊲⊳ Aφ6=0 and Aφ2=false ⊲⊳ Aφ6=0.95 do not.
The discussion of formae and their dependencies stems from the Evolutionary Algorithm community, and there especially from the supporters of the Building Block Hypothesis. The idea is that an optimization algorithm should first discover formae which have a good influence on the overall fitness of the candidate solutions. The hope is that there are many compatible ones among these formae that can gradually be combined in the search process.
In this text, we have defined formae and the corresponding terms on the basis of individuals p, which are records that assign an element of the problem space p.x ∈ X to an element of the search space p.g ∈ G. Generally, we will relax this notation and also discuss formae directly in the context of the search space G or problem space X, when appropriate.
Tasks T9
41. Deﬁne at least three reasonable properties on the individuals, genotypes, or phenotypes
of the Traveling Salesman Problem as introduced in Example E2.2 on page 45 and
further discussed in Task 26 on page 91.
[6 points]
42. List the relevant formae for each of the properties you have defined in Task 41 and in Task 26 on page 91.
[12 points]
43. List all the members of your formae given in Task 42 for the Traveling Salesman Problem instance provided in Table 2.1 on page 46.
[10 points]
Chapter 10
General Information on Optimization
10.1 Applications and Examples
Table 10.1: Applications and Examples of Optimization.
Area References
Art see Tables 28.4, 29.1, and 31.1
Astronomy see Table 28.4
Business-to-Business see Table 3.1
Chemistry [623]; see also Tables 3.1, 22.1, 28.4, 29.1, 30.1, 31.1, 32.1,
40.1, and 43.1
Cloud Computing [2301, 2790, 3022, 3047]
Combinatorial Problems [5, 17, 115, 118, 130, 131, 178, 213, 241, 250, 251, 265, 315,
431, 432, 463, 475, 497, 533, 568, 571, 576, 578, 626–628,
671, 680, 682, 783, 806, 822–825, 832, 868, 1008, 1026, 1027,
1042, 1089, 1091, 1096, 1124, 1239, 1264, 1280, 1308, 1313,
1472, 1497, 1501, 1502, 1533, 1585, 1625, 1694, 1723, 1724,
1834, 1850, 1851, 1879, 1998, 2000, 2066, 2124, 2131, 2132,
2172, 2219, 2223, 2248, 2256, 2257, 2261, 2262, 2288–2290,
2330, 2422, 2449, 2508, 2511, 2512, 2667, 2673, 2692, 2717,
2812, 2813, 2817, 2885, 2900, 2990, 3016, 3017, 3047]; see
also Tables 3.1, 3.2, 18.1, 20.1, 22.1, 25.1, 26.1, 27.1, 28.4,
29.1, 30.1, 31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1, 40.1,
41.1, 42.1, 46.1, 46.3, 47.1, and 48.1
Computer Graphics [45, 327, 450, 844, 1224, 2506]; see also Tables 3.1, 3.2, 20.1,
27.1, 28.4, 29.1, 30.1, 31.1, 33.1, and 36.1
Control [1922]; see also Tables 28.4, 29.1, 31.1, 32.1, and 35.1
Cooperation and Teamwork [2238]; see also Tables 22.1, 28.4, 29.1, 31.1, and 37.1
Data Compression see Table 31.1
Data Mining [37, 40, 92, 115, 126, 189, 203, 217, 277, 281, 305, 310, 406,
450, 538, 656, 766, 805, 849, 889, 930, 961, 968, 975, 983,
991, 1035, 1036, 1100, 1168, 1176, 1194, 1259, 1425, 1503,
1959, 2070, 2088, 2202, 2234, 2272, 2312, 2319, 2357, 2459,
2506, 2602, 2671, 2788, 2790, 2824, 2839, 2849, 2939, 2961,
2999, 3047, 3066, 3077, 3078, 3103]; see also Tables 3.1, 20.1,
21.1, 22.1, 27.1, 28.4, 29.1, 31.1, 32.1, 34.1, 35.1, 36.1, 37.1,
39.1, and 46.3
Databases [115, 130, 131, 213, 533, 568, 571, 783, 991, 1096, 1308, 1585,
2692, 2824, 2971, 3016, 3017]; see also Tables 3.1, 3.2, 21.1,
26.1, 27.1, 28.4, 29.1, 31.1, 35.1, 36.1, 37.1, 39.1, 40.1, 46.3,
and 47.1
Decision Making [1194]; see also Tables 3.1, 3.2, 20.1, and 31.1
Digital Technology see Tables 3.1, 26.1, 28.4, 29.1, 30.1, 31.1, 34.1, and 35.1
Distributed Algorithms and Systems
[127, 204, 226, 330–333, 492, 495, 506, 533, 565, 576, 617,
628, 650, 671, 822, 823, 1101, 1168, 1280, 1342, 1425, 1472,
1497, 1561, 1645, 1647, 1650, 1690, 1777, 1850, 1851, 1998,
2000, 2066, 2070, 2172, 2202, 2238, 2256, 2257, 2301, 2319,
2330, 2375, 2511, 2512, 2602, 2638, 2716, 2790, 2839, 2878,
2990, 3017, 3022, 3058]; see also Tables 3.1, 3.2, 18.1, 21.1,
22.1, 25.1, 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1,
35.1, 36.1, 37.1, 39.1, 40.1, 41.1, 46.1, 46.3, and 47.1
E-Learning [506, 565, 2319]; see also Tables 28.4, 29.1, 37.1, and 39.1
Economics and Finances [204, 281, 310, 559, 1194, 1998, 2257, 2301, 2330, 2878, 3022];
see also Tables 3.1, 3.2, 27.1, 28.4, 29.1, 31.1, 33.1, 35.1, 36.1,
37.1, 39.1, 40.1, 46.1, and 46.3
Engineering [189, 199, 312, 1101, 1194, 1219, 1227, 1431, 1690, 1777, 2088,
2375, 2459, 2716, 2839, 2878, 3004, 3047, 3077, 3078]; see also
Tables 3.1, 3.2, 18.1, 21.1, 22.1, 26.1, 27.1, 28.4, 29.1, 30.1,
31.1, 32.1, 33.1, 34.1, 35.1, 36.1, 37.1, 39.1, 40.1, 41.1, 46.3,
and 47.1
Face Recognition [471]
Function Optimization [72, 613, 1136, 1137, 1188, 1942, 2333, 2708]; see also Tables
3.2, 21.1, 26.1, 27.1, 28.4, 29.1, 30.1, 32.1, 33.1, 34.1, 35.1,
36.1, 39.1, 40.1, 41.1, and 43.1
Games [1954, 2383–2385]; see also Tables 3.1, 28.4, 29.1, 31.1, and
32.1
Graph Theory [241, 576, 628, 650, 1219, 1263, 1472, 1625, 1690, 2172, 2375,
2716, 2839]; see also Tables 3.1, 18.1, 21.1, 22.1, 25.1, 26.1,
27.1, 28.4, 29.1, 30.1, 31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 38.1,
39.1, 40.1, 41.1, 46.1, 46.3, and 47.1
Healthcare [45, 3103]; see also Tables 28.4, 29.1, 31.1, 35.1, 36.1, and
37.1
Image Synthesis [1637]; see also Tables 18.1 and 28.4
Logistics [118, 178, 241, 250, 251, 265, 495, 497, 578, 627, 680, 806,
824, 825, 832, 868, 1008, 1027, 1089, 1091, 1124, 1313, 1501,
1625, 1694, 1723, 1724, 1777, 2131, 2132, 2248, 2261, 2262,
2288–2290, 2508, 2673, 2717, 2812, 2813, 2817, 3047]; see
also Tables 3.1, 18.1, 22.1, 25.1, 26.1, 27.1, 28.4, 29.1, 30.1,
33.1, 34.1, 36.1, 37.1, 38.1, 39.1, 40.1, 41.1, 46.3, 47.1, and
48.1
Mathematics [405, 766, 930, 1020, 2200, 2349, 2357, 2418, 2572, 2885, 2939,
3078]; see also Tables 3.1, 18.1, 26.1, 27.1, 28.4, 29.1, 30.1,
31.1, 32.1, 33.1, 34.1, and 35.1
Military and Defense see Tables 3.1, 26.1, 27.1, 28.4, and 34.1
Motor Control see Tables 18.1, 22.1, 28.4, 33.1, 34.1, 36.1, and 39.1
Multi-Agent Systems [1818, 2081, 2667, 2919]; see also Tables 3.1, 18.1, 22.1, 28.4,
29.1, 31.1, 35.1, and 37.1
Multiplayer Games [127]; see also Table 31.1
News Industry [1425]
Nutrition [2971]
Operating Systems [2900]
Physics [1879]; see also Tables 3.1, 27.1, 28.4, 29.1, 31.1, 32.1, and
34.1
Police and Justice [471]
Prediction [405]; see also Tables 28.4, 29.1, 30.1, 31.1, and 35.1
Security [2459, 2878, 3022]; see also Tables 3.1, 22.1, 26.1, 27.1, 28.4,
29.1, 31.1, 35.1, 36.1, 37.1, 39.1, 40.1, and 46.3
Shape Synthesis see Table 26.1
Software [822, 823, 827, 1008, 1027, 1168, 1651, 1850, 1851, 2000, 2066,
2131, 2132, 2330, 2961, 2971, 3058]; see also Tables 3.1, 18.1,
20.1, 21.1, 22.1, 25.1, 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 33.1,
34.1, 36.1, 37.1, 40.1, 46.1, and 46.3
Sorting see Tables 28.4 and 31.1
Telecommunication see Tables 28.4, 37.1, and 40.1
Testing [1168, 1651]; see also Tables 3.1, 28.4, and 31.3
Theorem Proving and Automatic Verification
see Tables 29.1, 40.1, and 46.3
Water Supply and Management see Tables 3.1 and 28.4
Wireless Communication [1219]; see also Tables 3.1, 18.1, 22.1, 26.1, 27.1, 28.4, 29.1,
31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 39.1, 40.1, and 46.3
10.2 Books
1. Handbook of Evolutionary Computation [171]
2. New Ideas in Optimization [638]
3. Scalable Optimization via Probabilistic Modeling – From Algorithms to Applications [2158]
4. Handbook of Metaheuristics [1068]
5. New Optimization Techniques in Engineering [2083]
6. Metaheuristics for Multiobjective Optimisation [1014]
7. Selected Papers from the Workshop on Pattern Directed Inference Systems [2869]
8. Applications of Multi-Objective Evolutionary Algorithms [597]
9. Handbook of Applied Optimization [2125]
10. The Traveling Salesman Problem: A Computational Study [118]
11. Intelligent Systems for Automated Learning and Adaptation: Emerging Trends and Applications [561]
12. Optimization and Operations Research [773]
13. Readings in Artiﬁcial Intelligence: A Collection of Articles [2872]
14. Machine Learning: Principles and Techniques [974]
15. Experimental Methods for the Analysis of Optimization Algorithms [228]
16. Evolutionary Multiobjective Optimization – Theoretical Advances and Applications [8]
17. Evolutionary Algorithms for Solving Multi-Objective Problems [599]
18. Pareto Optimality, Game Theory and Equilibria [560]
19. Pattern Classiﬁcation [844]
20. The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization [1694]
21. Parallel Evolutionary Computations [2015]
22. Introduction to Global Optimization [2126]
23. Global Optimization Algorithms – Theory and Application [2892]
24. Computational Intelligence in Control [1922]
25. Introduction to Operations Research [580]
26. Management Models and Industrial Applications of Linear Programming [530]
27. Eﬃcient and Accurate Parallel Genetic Algorithms [477]
28. Swarm Intelligence – Focus on Ant and Particle Swarm Optimization [525]
29. Nonlinear Programming: Sequential Unconstrained Minimization Techniques [908]
30. Nonlinear Programming: Sequential Unconstrained Minimization Techniques [909]
31. Global Optimization: Deterministic Approaches [1271]
32. Soft Computing: Methodologies and Applications [1242]
33. Handbook of Global Optimization [1272]
34. New Learning Paradigms in Soft Computing [1430]
35. How to Solve It: Modern Heuristics [1887]
36. Numerical Optimization [2040]
37. Artiﬁcial Intelligence: A Modern Approach [2356]
38. Electromagnetic Optimization by Genetic Algorithms [2250]
39. Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems [2685]
40. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
[2961]
41. Multi-Objective Swarm Intelligent Systems: Theory & Experiences [2016]
42. Evolutionary Optimization [2394]
43. Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design [460]
44. Multi-Objective Optimization in Computational Intelligence: Theory and Practice [438]
45. An Operations Research Approach to the Economic Optimization of a Kraft Pulping Process
[500]
46. Fundamental Methods of Mathematical Economics [559]
47. Einführung in die Künstliche Intelligenz [797]
48. Stochastic and Global Optimization [850]
49. Machine Learning Applications in Expert Systems and Information Retrieval [975]
50. Nonlinear and Dynamic Programming [1162]
51. Introduction to Operations Research [1232]
52. Introduction to Stochastic Search and Optimization [2561]
53. Nonlinear Multiobjective Optimization [1893]
54. Heuristics: Intelligent Search Strategies for Computer Problem Solving [2140]
55. Global Optimization with NonConvex Constraints [2623]
56. The Nature of Statistical Learning Theory [2788]
57. Advances in Greedy Algorithms [247]
58. Multiobjective Decision Making Theory and Methodology [528]
59. Multiobjective Optimization: Principles and Case Studies [609]
60. Multi-Objective Optimization Using Evolutionary Algorithms [740]
61. Multicriteria Optimization [860]
62. Multiple Criteria Optimization: Theory, Computation and Application [2610]
63. Global Optimization [2714]
64. Soft Computing [760]
10.3 Conferences and Workshops
Table 10.2: Conferences and Workshops on Optimization.
IJCAI: International Joint Conference on Artiﬁcial Intelligence
History: 2009/07: Pasadena, CA, USA, see [373]
2007/01: Hyderabad, India, see [2797]
2005/07: Edinburgh, Scotland, UK, see [1474]
2003/08: Acapulco, México, see [1118, 1485]
2001/08: Seattle, WA, USA, see [2012]
1999/07: Stockholm, Sweden, see [730, 731, 2296]
1997/08: Nagoya, Japan, see [1946, 1947, 2260]
1995/08: Montréal, QC, Canada, see [1836, 1861, 2919]
1993/08: Chambéry, France, see [182, 183, 927, 2259]
1991/08: Sydney, NSW, Australia, see [831, 1987, 1988, 2122]
1989/08: Detroit, MI, USA, see [2593, 2594]
1987/08: Milano, Italy, see [1846, 1847]
1985/08: Los Angeles, CA, USA, see [1469, 1470]
1983/08: Karlsruhe, Germany, see [448, 449]
1981/08: Vancouver, BC, Canada, see [1212, 1213]
1979/08: Tokyo, Japan, see [2709, 2710]
1977/08: Cambridge, MA, USA, see [2281, 2282]
1975/09: Tbilisi, Georgia, USSR, see [2678]
1973/08: Stanford, CA, USA, see [2039]
1971/09: London, UK, see [630]
1969/05: Washington, DC, USA, see [2841]
AAAI: Annual National Conference on Artiﬁcial Intelligence
History: 2011/08: San Francisco, CA, USA, see [3]
2010/07: Atlanta, GA, USA, see [2]
2008/06: Chicago, IL, USA, see [979]
2007/07: Vancouver, BC, Canada, see [1262]
2006/07: Boston, MA, USA, see [1055]
2005/07: Pittsburgh, PA, USA, see [1484]
2004/07: San José, CA, USA, see [1849]
2002/07: Edmonton, Alberta, Canada, see [759]
2000/07: Austin, TX, USA, see [1504]
1999/07: Orlando, FL, USA, see [1222]
1998/07: Madison, WI, USA, see [1965]
1997/07: Providence, RI, USA, see [1634, 1954]
1996/08: Portland, OR, USA, see [586, 2207]
1994/07: Seattle, WA, USA, see [1214]
1993/07: Washington, DC, USA, see [913]
1992/07: San José, CA, USA, see [2639]
1991/07: Anaheim, CA, USA, see [732]
1990/07: Boston, MA, USA, see [793]
1988/08: St. Paul, MN, USA, see [1916]
1987/07: Seattle, WA, USA, see [960]
1986/08: Philadelphia, PA, USA, see [1512, 1513]
1984/08: Austin, TX, USA, see [381]
1983/08: Washington, DC, USA, see [1041]
1982/08: Pittsburgh, PA, USA, see [2845]
1980/08: Stanford, CA, USA, see [197]
HIS: International Conference on Hybrid Intelligent Systems
History: 2011/12: Melaka, Malaysia, see [14]
2010/08: Atlanta, GA, USA, see [1377]
2009/08: Shěnyáng, Liáoníng, China, see [1375]
2008/10: Barcelona, Catalonia, Spain, see [2993]
2007/09: Kaiserslautern, Germany, see [1572]
2006/12: Auckland, New Zealand, see [1340]
2005/11: Rio de Janeiro, RJ, Brazil, see [2014]
2004/12: Kitakyushu, Japan, see [1421]
2003/12: Melbourne, VIC, Australia, see [10]
2002/12: Santiago, Chile, see [9]
2001/12: Adelaide, SA, Australia, see [7]
IAAI: Conference on Innovative Applications of Artiﬁcial Intelligence
History: 2011/08: San Francisco, CA, USA, see [4]
2010/07: Atlanta, GA, USA, see [2361]
2009/07: Pasadena, CA, USA, see [1164]
2006/07: Boston, MA, USA, see [1055]
2005/07: Pittsburgh, PA, USA, see [1484]
2004/07: San José, CA, USA, see [1849]
2003/08: Acapulco, México, see [2304]
2001/04: Seattle, WA, USA, see [1238]
2000/07: Austin, TX, USA, see [1504]
1999/07: Orlando, FL, USA, see [1222]
1998/07: Madison, WI, USA, see [1965]
1997/07: Providence, RI, USA, see [1634]
1996/08: Portland, OR, USA, see [586]
1995/08: Montréal, QC, Canada, see [51]
1994/08: Seattle, WA, USA, see [462]
1993/07: Washington, DC, USA, see [1]
1992/07: San José, CA, USA, see [2439]
1991/06: Anaheim, CA, USA, see [2533]
1990/05: Washington, DC, USA, see [2269]
1989/03: Stanford, CA, USA, see [2428]
ICCI: IEEE International Conference on Cognitive Informatics
History: 2011/08: Banff, AB, Canada, see [200]
2010/07: Běijīng, China, see [2630]
2009/06: Kowloon, Hong Kong / Xiānggǎng, China, see [164]
2008/08: Stanford, CA, USA, see [2857]
2007/08: Lake Tahoe, CA, USA, see [3065]
2006/07: Běijīng, China, see [3029]
2005/08: Irvine, CA, USA, see [1339]
2004/08: Victoria, Canada, see [524]
2003/08: London, UK, see [2133]
2002/08: Calgary, AB, Canada, see [1337]
ICML: International Conference on Machine Learning
History: 2011/06: Seattle, WA, USA, see [2442]
2010/06: Haifa, Israel, see [1163]
2009/06: Montréal, QC, Canada, see [681]
2008/07: Helsinki, Finland, see [603]
2007/06: Corvallis, OR, USA, see [1047]
2006/06: Pittsburgh, PA, USA, see [602]
2005/08: Bonn, North Rhine-Westphalia, Germany, see [724]
2004/07: Banﬀ, AB, Canada, see [426]
2003/08: Washington, DC, USA, see [895]
2002/07: Sydney, NSW, Australia, see [2382]
2001/06: Williamstown, MA, USA, see [427]
2000/06: Stanford, CA, USA, see [1676]
1999/06: Bled, Slovenia, see [401]
1998/07: Madison, WI, USA, see [2471]
1997/07: Nashville, TN, USA, see [926]
1996/07: Bari, Italy, see [2367]
1995/07: Tahoe City, CA, USA, see [2221]
1994/07: New Brunswick, NJ, USA, see [601]
1993/06: Amherst, MA, USA, see [1945]
1992/07: Aberdeen, Scotland, UK, see [2520]
1991/06: Evanston, IL, USA, see [313]
1990/06: Austin, TX, USA, see [2206]
1989/06: Ithaca, NY, USA, see [2447]
1988/06: Ann Arbor, MI, USA, see [1659, 1660]
1987: Irvine, CA, USA, see [1415]
1985: Skytop, PA, USA, see [2519]
1983: Monticello, IL, USA, see [1936]
1980: Pittsburgh, PA, USA, see [2183]
MCDM: Multiple Criteria Decision Making
History: 2011/06: Jyväskylä, Finland, see [1895]
2009/06: Chéngdū, Sìchuān, China, see [2750]
2008/06: Auckland, New Zealand, see [861]
2006/06: Chania, Crete, Greece, see [3102]
2004/08: Whistler, BC, Canada, see [2875]
2002/02: Semmering, Austria, see [1798]
2000/06: Ankara, Turkey, see [1562]
1998/06: Charlottesville, VA, USA, see [1166]
1997/01: Cape Town, South Africa, see [2612]
1995/06: Hagen, North Rhine-Westphalia, Germany, see [891]
1994/08: Coimbra, Portugal, see [592]
1992/07: Taipei, Taiwan, see [2744]
1990/08: Fairfax, VA, USA, see [2551]
130 10 GENERAL INFORMATION ON OPTIMIZATION
1988/08: Manchester, UK, see [1759]
1986/08: Kyoto, Japan, see [2410]
1984/06: Cleveland, OH, USA, see [1165]
1982/08: Mons, Brussels, Belgium, see [1190]
1980/08: Newark, DE, USA, see [1960]
1979/08: Hagen/Königswinter, West Germany, see [890]
1977/08: Buﬀalo, NY, USA, see [3094]
1975/05: Jouy-en-Josas, France, see [2701]
SEAL: International Conference on Simulated Evolution and Learning
History: 2012/12: Hanoi, Vietnam, see [1179]
2010/12: Kanpur, Uttar Pradesh, India, see [2585]
2008/12: Melbourne, VIC, Australia, see [1729]
2006/10: Héféi, Ānhuī, China, see [2856]
2004/10: Busan, South Korea, see [2123]
2002/11: Singapore, see [2661]
2000/10: Nagoya, Japan, see [1991]
1998/11: Canberra, Australia, see [1852]
1996/11: Taejon, South Korea, see [2577]
AI: Australian Joint Conference on Artiﬁcial Intelligence
History: 2010/12: Adelaide, SA, Australia, see [33]
2006/12: Hobart, Australia, see [2406]
2005/12: Sydney, NSW, Australia, see [3072]
2004/12: Cairns, Australia, see [2871]
AISB: Artiﬁcial Intelligence and Simulation of Behaviour
History: 2008/04: Aberdeen, Scotland, UK, see [1147]
2007/04: Newcastle upon Tyne, UK, see [2549]
2005/04: Bristol, UK and Hatﬁeld, England, UK, see [2547, 2548]
2003/03: Aberystwyth, Wales, UK and Leeds, UK, see [2545, 2546]
2002/04: London, UK, see [2544]
2001/03: Heslington, York, UK, see [2759]
2000/04: Birmingham, UK, see [2748]
1997/04: Manchester, UK, see [637]
1996/04: Brighton, UK, see [938]
1995/04: Sheﬃeld, UK, see [937]
1994/04: Leeds, UK, see [936]
BIONETICS: International Conference on Bio-Inspired Models of Network, Information, and Computing Systems
History: 2010/12: Boston, MA, USA, see [367]
2009/12: Avignon, France, see [1275]
2008/11: Awaji City, Hyogo, Japan, see [153]
2007/12: Budapest, Hungary, see [1342]
2006/12: Cavalese, Italy, see [509]
ICARIS: International Conference on Artiﬁcial Immune Systems
History: 2011/07: Cambridge, UK, see [1203]
2010/07: Edinburgh, Scotland, UK, see [858]
2009/08: York, UK, see [2363]
2008/08: Phuket, Thailand, see [274]
2007/08: Santos, Brazil, see [707]
2006/09: Oeiras, Portugal, see [282]
2005/08: Banﬀ, AB, Canada, see [1424]
2004/09: Catania, Sicily, Italy, see [2035]
2003/09: Edinburgh, Scotland, UK, see [2706]
: Canterbury, Kent, UK, see [2705]
MICAI: Mexican International Conference on Artiﬁcial Intelligence
History: 2010/11: Pachuca, Mexico, see [2583]
2009/11: Guanajuato, México, see [38]
2007/11: Aguascalientes, México, see [1037]
2005/11: Monterrey, México, see [1039]
2004/04: Mexico City, México, see [1929]
2002/04: Mérida, Yucatán, México, see [598]
2000/04: Acapulco, México, see [470]
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
History: 2011/10: Cluj-Napoca, Romania, see [2576]
2010/05: Granada, Spain, see [1102]
2008/11: Puerto de la Cruz, Tenerife, Spain, see [1620]
2007/11: Catania, Sicily, Italy, see [1619]
2006/06: Granada, Spain, see [2159]
SMC: International Conference on Systems, Man, and Cybernetics
History: 2014/10: San Diego, CA, USA, see [2386]
2013/10: Edinburgh, Scotland, UK, see [859]
2012/10: Seoul, South Korea, see [2454]
2011/10: Anchorage, AK, USA, see [91]
2009/10: San Antonio, TX, USA, see [1352]
2001/10: Tucson, AZ, USA, see [1366]
1999/10: Tokyo, Japan, see [1331]
1998/10: La Jolla, CA, USA, see [1364]
WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications)
History: 2010/11: see [1029]
2009/11: see [1015]
2008/11: see [1857]
2007/10: see [151]
2006/09: see [2980]
2005: see [2979]
2004: see [2978]
2003: see [2977]
2002: see [2976]
2001: see [2975]
2000: see [2974]
1999/09: Nagoya, Japan, see [1993]
1998/08: Nagoya, Japan, see [1992]
1997/06: see [535]
1996/08: Nagoya, Japan, see [1001]
AIMS: Artiﬁcial Intelligence in Mobile System
History: 2003/10: Seattle, WA, USA, see [1624]
CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology
History: 2008/10: Cergy-Pontoise, France, see [3049]
2005/05: Muroran, Japan, see [11]
ECML: European Conference on Machine Learning
History: 2007/09: Warsaw, Poland, see [2866]
2006/09: Berlin, Germany, see [278]
2005/10: Porto, Portugal, see [2209]
2004/09: Pisa, Italy, see [2182]
2003/09: Cavtat, Dubrovnik, Croatia, see [512]
2002/08: Helsinki, Finland, see [1221]
2001/09: Freiburg, Germany, see [992]
2000/05: Barcelona, Catalonia, Spain, see [215]
1998/04: Chemnitz, Sachsen, Germany, see [537]
1997/04: Prague, Czech Republic, see [2781]
1995/04: Heraclion, Crete, Greece, see [1223]
1994/04: Catania, Sicily, Italy, see [507]
1993/04: Vienna, Austria, see [2809]
EPIA: Portuguese Conference on Artiﬁcial Intelligence
History: 2011/10: Lisbon, Portugal, see [1744]
2009/10: Aveiro, Portugal, see [1774]
2007/12: Guimarães, Portugal, see [2026]
2005/12: Covilhã, Portugal, see [275]
EUFIT: European Congress on Intelligent Techniques and Soft Computing
History: 1999/09: Aachen, North Rhine-Westphalia, Germany, see [2807]
1998/09: Aachen, North Rhine-Westphalia, Germany, see [2806]
1997/09: Aachen, North Rhine-Westphalia, Germany, see [3091]
1996/09: Aachen, North Rhine-Westphalia, Germany, see [877]
1995/08: Aachen, North Rhine-Westphalia, Germany, see [2805]
1994/09: Aachen, North Rhine-Westphalia, Germany, see [2803]
1993/09: Aachen, North Rhine-Westphalia, Germany, see [2804]
EngOpt: International Conference on Engineering Optimization
History: 2010/09: Lisbon, Portugal, see [1405]
2008/06: Rio de Janeiro, RJ, Brazil, see [1227]
GEWS: Grammatical Evolution Workshop
See Table 31.2 on page 385.
HAIS: International Workshop on Hybrid Artiﬁcial Intelligence Systems
History: 2011/05: Warsaw, Poland, see [2867]
2009/06: Salamanca, Spain, see [2749]
2008/09: Burgos, Spain, see [632]
2007/11: Salamanca, Spain, see [631]
2006/10: Ribeirão Preto, SP, Brazil, see [184]
ICAART: International Conference on Agents and Artiﬁcial Intelligence
History: 2011/01: Rome, Italy, see [1404]
2010/01: València, Spain, see [917]
2009/01: Porto, Portugal, see [916]
ICTAI: International Conference on Tools with Artiﬁcial Intelligence
History: 2003/11: Sacramento, CA, USA, see [1367]
IEA/AIE: International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems
History: 2004/05: Ottawa, ON, Canada, see [2088]
ISA: International Workshop on Intelligent Systems and Applications
History: 2011/05: Wǔhàn, Húběi, China, see [1380]
2010/05: Wǔhàn, Húběi, China, see [2846]
2009/05: Wǔhàn, Húběi, China, see [2847]
ISICA: International Symposium on Advances in Computation and Intelligence
History: 2007/09: Wǔhàn, Húběi, China, see [1490]
LION: Learning and Intelligent OptimizatioN
History: 2011/01: Rome, Italy, see [2318]
2010/01: Venice, Italy, see [2581]
2009/01: Trento, Italy, see [2625]
2007/02: Andalo, Trento, Italy, see [1273, 1826]
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
History: 2010/09: St Andrews, Scotland, UK, see [780]
2009/09: Lisbon, Portugal, see [779]
2008/09: Sydney, NSW, Australia, see [2011]
2007/09: Providence, RI, USA, see [2225]
2006/09: Nantes, France, see [583]
2005/10: Barcelona, Catalonia, Spain, see [1860]
2004/09: Toronto, ON, Canada, see [2715]
SoCPaR: International Conference on SOft Computing and PAttern Recognition
History: 2011/10: Dàlián, Liáoníng, China, see [1381]
2010/12: Cergy-Pontoise, France, see [515]
2009/12: Malacca, Malaysia, see [1224]
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
History: 1992/09: Tutzing, Germany, see [2742]
1989: Germany, see [245]
1986/07: Neubiberg, Bavaria, Germany, see [244]
1983/06: Neubiberg, Bavaria, Germany, see [243]
AINS: Annual Symposium on Autonomous Intelligent Networks and Systems
History: 2003/06: Menlo Park, CA, USA, see [513]
ALIO/EURO: Workshop on Applied Combinatorial Optimization
History: 2011/05: Porto, Portugal, see [138]
2008/12: Buenos Aires, Argentina, see [137]
2005/10: Paris, France, see [136]
2002/11: Pucón, Chile, see [135]
1999/11: Erice, Italy, see [134]
1996/11: Valparaíso, Chile, see [133]
1989/08: Rio de Janeiro, RJ, Brazil, see [132]
ANZIIS: Australia and New Zealand Conference on Intelligent Information Systems
History: 1994/11: Brisbane, QLD, Australia, see [1359]
ASC: IASTED International Conference on Artiﬁcial Intelligence and Soft Computing
History: 2002/07: Banﬀ, AB, Canada, see [1714]
EDM: International Conference on Educational Data Mining
History: 2009/07: Córdoba, Spain, see [217]
EWSL: European Working Session on Learning
Continued as ECML in 1993.
History: 1991/03: Porto, Portugal, see [1560]
1988/10: Glasgow, Scotland, UK, see [1061]
1987/05: Bled, Yugoslavia (now Slovenia), see [322]
1986/02: Orsay, France, see [2097]
FLAIRS: Florida Artiﬁcial Intelligence Research Symposium
History: 1994/05: Pensacola Beach, FL, USA, see [1360]
FWGA: Finnish Workshop on Genetic Algorithms and Their Applications
See Table 29.2 on page 355.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
History: 2006/12: Vienna, Austria, see [2024]
ICMLC: International Conference on Machine Learning and Cybernetics
History: 2005/08: Guǎngzhōu, Guǎngdōng, China, see [2263]
IICAI: Indian International Conference on Artiﬁcial Intelligence
History: 2011/12: Tumkur, India, see [2736]
2009/12: Tumkur, India, see [2217]
2007/12: Pune, India, see [2216]
2005/12: Pune, India, see [2215]
2003/12: Hyderabad, India, see [2214]
ISIS: International Symposium on Advanced Intelligent Systems
History: 2007/09: Sokcho, Korea, see [1579]
MCDA: International Summer School on Multicriteria Decision Aid
History: 1998/07: Monte Estoril, Lisbon, Portugal, see [198]
PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
History: 2011/05: Shēnzhèn, Guǎngdōng, China, see [1294]
STeP: Finnish Artiﬁcial Intelligence Conference
History: 2008/08: Espoo, Finland, see [2255]
TAAI: Conference on Technologies and Applications of Artiﬁcial Intelligence
History: 2010/11: Hsinchu, Taiwan, see [1283]
2009/10: Wufeng Township, Taichung County, Taiwan, see [529]
2008/11: Chiaohsi Shiang, Ilan County, Taiwan, see [2659]
2007/11: Douliou, Yunlin, Taiwan, see [2006]
2006/12: Kaohsiung, Taiwan, see [1493]
2005/12: Kaohsiung, Taiwan, see [2004]
2004/11: Wenshan District, Taipei City, Taiwan, see [2003]
TAI: IEEE Conference on Tools for Artiﬁcial Intelligence
History: 1990/11: Herndon, VA, USA, see [1358]
KAEiOG: Krajowa Konferencja Algorytmy Ewolucyjne i Optymalizacja Globalna (Polish National Conference on Evolutionary Algorithms and Global Optimization)
History: 1997/09: Rytro, Poland, see [2362]
Machine Intelligence Workshop
History: 1977: Santa Cruz, CA, USA, see [874]
TAINN: Turkish Symposium on Artificial Intelligence and Neural Networks (Türk Yapay Zeka ve Yapay Sinir Ağları Sempozyumu)
History: 1993/06: Istanbul, Turkey, see [1643]
AGI: Conference on Artiﬁcial General Intelligence
History: 2011/08: Mountain View, CA, USA, see [2587]
2010/03: Lugano, Switzerland, see [1786]
2009/03: Arlington, VA, USA, see [661]
2008/03: Memphis, TN, USA, see [1414]
AICS: Irish Conference on Artiﬁcial Intelligence and Cognitive Science
History: 2007/08: Dublin, Ireland, see [764]
AIIA: Congress of the Italian Association for Artiﬁcial Intelligence (AI*IA) on Trends in Artiﬁcial
Intelligence
History: 1991/10: Palermo, Italy, see [124]
ECML PKDD: European Conference on Machine Learning and European Conference on Principles
and Practice of Knowledge Discovery in Databases
History: 2009/09: Bled, Slovenia, see [321]
2008/09: Antwerp, Belgium, see [114]
SEA: International Symposium on Experimental Algorithms
History: 2011/05: Chania, Crete, Greece, see [2098]
2010/05: Ischa Island, Naples, Italy, see [2584]
10.4 Journals
1. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics published by
IEEE Systems, Man, and Cybernetics Society
2. Information Sciences – Informatics and Computer Science Intelligent Systems Applications:
An International Journal published by Elsevier Science Publishers B.V.
3. European Journal of Operational Research (EJOR) published by Elsevier Science Publishers B.V. and North-Holland Scientific Publishers Ltd.
4. Machine Learning published by Kluwer Academic Publishers and Springer Netherlands
5. Applied Soft Computing published by Elsevier Science Publishers B.V.
6. Soft Computing – A Fusion of Foundations, Methodologies and Applications published by Springer-Verlag
7. Computers & Operations Research published by Elsevier Science Publishers B.V. and Pergamon Press
8. Artiﬁcial Intelligence published by Elsevier Science Publishers B.V.
9. Operations Research published by HighWire Press (Stanford University) and Institute for
Operations Research and the Management Sciences (INFORMS)
10. Annals of Operations Research published by J. C. Baltzer AG, Science Publishers and Springer
Netherlands
11. International Journal of Computational Intelligence and Applications (IJCIA) published by
Imperial College Press Co. and World Scientiﬁc Publishing Co.
12. ORSA Journal on Computing published by Operations Research Society of America (ORSA)
13. Management Science published by HighWire Press (Stanford University) and Institute for
Operations Research and the Management Sciences (INFORMS)
14. Journal of Artiﬁcial Intelligence Research (JAIR) published by AAAI Press and AI Access
Foundation, Inc.
15. Computational Optimization and Applications published by Kluwer Academic Publishers and
Springer Netherlands
16. The Journal of the Operational Research Society (JORS) published by Operations Research
Society and Palgrave Macmillan Ltd.
17. Data Mining and Knowledge Discovery published by Springer Netherlands
18. SIAM Journal on Optimization (SIOPT) published by Society for Industrial and Applied
Mathematics (SIAM)
19. Mathematics of Operations Research (MOR) published by Institute for Operations Research
and the Management Sciences (INFORMS)
20. Journal of Global Optimization published by Springer Netherlands
21. AIAA Journal published by American Institute of Aeronautics and Astronautics
22. Journal of Artiﬁcial Evolution and Applications published by Hindawi Publishing Corporation
23. Applied Intelligence – The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies published by Springer Netherlands
24. Artiﬁcial Intelligence Review – An International Science and Engineering Journal published
by Kluwer Academic Publishers and Springer Netherlands
25. Annals of Mathematics and Artiﬁcial Intelligence published by Springer Netherlands
26. IEEE Transactions on Knowledge and Data Engineering published by IEEE Computer Society Press
27. IEEE Intelligent Systems Magazine published by IEEE Computer Society Press
28. ACM SIGART Bulletin published by ACM Press
29. Journal of Machine Learning Research (JMLR) published by MIT Press
30. Discrete Optimization published by Elsevier Science Publishers B.V.
31. Journal of Optimization Theory and Applications published by Plenum Press and Springer
Netherlands
32. Artiﬁcial Intelligence in Engineering published by Elsevier Science Publishers B.V.
33. Engineering Optimization published by Informa plc and Taylor and Francis LLC
34. Optimization and Engineering published by Kluwer Academic Publishers and Springer
Netherlands
35. International Journal of Organizational and Collective Intelligence (IJOCI) published by Idea
Group Publishing (Idea Group Inc., IGI Global)
36. Journal of Intelligent Information Systems published by Springer-Verlag GmbH
37. International Journal of Knowledge-Based and Intelligent Engineering Systems published by IOS Press
38. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) published by
IEEE Computer Society
39. Optimization Methods and Software published by Taylor and Francis LLC
40. Journal of Mathematical Modelling and Algorithms published by Springer Netherlands
41. Structural and Multidisciplinary Optimization published by Springer-Verlag GmbH
42. OR/MS Today published by Institute for Operations Research and the Management Sciences
(INFORMS) and Lionheart Publishing, Inc.
43. AI Magazine published by AAAI Press
44. Computers in Industry – An International, Application Oriented Research Journal published
by Elsevier Science Publishers B.V.
45. International Journal of Approximate Reasoning published by Elsevier Science Publishers B.V.
46. International Journal of Intelligent Systems published by Wiley Interscience
47. International Journal of Production Research published by Taylor and Francis LLC
48. International Transactions in Operational Research published by Blackwell Publishing Ltd
49. INFORMS Journal on Computing (JOC) published by HighWire Press (Stanford University)
and Institute for Operations Research and the Management Sciences (INFORMS)
50. OR Spectrum – Quantitative Approaches in Management published by Springer-Verlag
51. SIAG/OPT Views and News: A Forum for the SIAM Activity Group on Optimization pub
lished by SIAM Activity Group on Optimization – SIAG/OPT
52. Künstliche Intelligenz (KI) published by Böttcher IT Verlag
Part II
Diﬃculties in Optimization
Chapter 11
Introduction
The classification of optimization algorithms in Section 1.3 and the table of contents of this
book enumerate a wide variety of optimization algorithms. Yet, the approaches introduced
here represent only a fraction of the actual number of available methods. It is a justified
question to ask why there are so many different approaches: why is this variety needed? One
possible answer is simply that there are so many different kinds of optimization tasks.
Each of them puts different obstacles into the way of the optimizers and comes with its own
characteristic difficulties.
In this part we want to discuss the most important of these complications, the major
problems that may be encountered during optimization [2915]. You may wish to read this
part after reading through some of the later parts on speciﬁc optimization algorithms or
after solving your ﬁrst optimization task. You will then realize that many of the problems
you encountered are discussed here in depth.
Some of the subjects in the following text concern Global Optimization in general (multimodality
and overfitting, for instance), others apply especially to nature-inspired approaches
like Genetic Algorithms (epistasis and neutrality, for example). Sometimes, neglecting a
single one of these aspects during the development of the optimization process can render
all invested effort useless, even if highly efficient optimization techniques are applied.
By giving clear deﬁnitions and comprehensive introductions to these topics, we want to raise
the awareness of scientists and practitioners in the industry and hope to help them to use
optimization algorithms more eﬃciently.
In Figure 11.1, we have sketched a set of different types of fitness landscapes (see Section 5.1)
which we are going to discuss. The objective values in the figure are subject to
minimization and the small bubbles represent candidate solutions under investigation. An
arrow from one bubble to another means that the second individual is found by applying
one search operation to the first one.
Before we go into more detail about what makes these landscapes difficult, we should
establish what difficulty means in the context of optimization. The degree of difficulty of solving a certain
problem with a dedicated algorithm is closely related to its computational complexity. We
will outline this in the next section and show that for many problems, there is no algorithm
which can solve them exactly in reasonable time.
As already stated, optimization algorithms are guided by objective functions. A function
is diﬃcult from a mathematical perspective in this context if it is not continuous, not
diﬀerentiable, or if it has multiple maxima and minima. This understanding of diﬃculty
comes very close to the intuitive sketches in Figure 11.1.
In many real-world applications of metaheuristic optimization, the characteristics of the
objective functions are not known in advance. The problems are usually very complex and it
is only rarely possible to derive boundaries for the performance or the runtime of optimizers
in advance, let alone exact estimates with mathematical precision.
In such cases, experience and general rules of thumb are often the best guides available
(see also Chapter 24).
Figure 11.1: Different possible properties of fitness landscapes (minimization). [Sketches omitted: eight panels plotting objective values f(x), subject to minimization, over x. Fig. 11.1.a: Best Case; Fig. 11.1.b: Low Variation; Fig. 11.1.c: Multi-Modal (multiple (local) optima); Fig. 11.1.d: Rugged (no useful gradient information); Fig. 11.1.e: Deceptive (region with misleading gradient information); Fig. 11.1.f: Neutral (neutral area); Fig. 11.1.g: Needle-In-A-Haystack (an isolated optimum, the needle, in a neutral area without much information); Fig. 11.1.h: Nightmare.]
Empirical results based on models obtained from related research
areas such as biology are sometimes helpful as well. In this part, we discuss many such models
and rules, providing a better understanding of when the application of a metaheuristic is
feasible and when not, as well as on how to avoid deﬁning problems in a way that makes
them diﬃcult.
Chapter 12
Problem Hardness
Most of the algorithms we will discuss in this book, in particular the metaheuristics, usually
deliver approximate results and cannot guarantee to always find the globally optimal
solution. In principle, however, we are able to find the global optima for all given problems (limited,
of course, to the precision of the available computers). Indeed, the easiest way to do
so would be exhaustive enumeration (see Section 8.3): simply test all possible values from
the problem space and return the best one.
This method, obviously, takes an extremely long time, so our goal is to find an algorithm
which is better. More precisely, we want to find the best algorithm for solving a given class
of problems. This means that we need some sort of metric with which we can compare the
efficiency of algorithms [2877, 2940].
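The exhaustive-enumeration baseline is straightforward to write down. The sketch below uses a toy discrete problem space and objective function that are purely illustrative assumptions:

```python
def exhaustive_search(problem_space, f):
    """Test every candidate in problem_space and return the one minimizing f."""
    best_x, best_y = None, float("inf")
    for x in problem_space:
        y = f(x)
        if y < best_y:  # remember the best candidate seen so far
            best_x, best_y = x, y
    return best_x

# Toy problem: minimize f(x) = (x - 3)^2 over a small discrete problem space.
print(exhaustive_search(range(-10, 11), lambda x: (x - 3) ** 2))  # prints 3
```

The guarantee of finding the global optimum comes at the price of visiting the entire problem space, which is exactly why this approach becomes infeasible for realistically sized problems.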
12.1 Algorithmic Complexity
The most important measures obtained by analyzing an algorithm¹ are the time that it takes
to produce the wanted outcome and the storage space needed for internal data [635]. We
call these the time complexity and the space complexity dimensions. The time complexity
denotes how many steps an algorithm needs until it returns its result. The space complexity
determines how much memory an algorithm consumes at most in one run. Of course,
these measures depend on the input values passed to the algorithm.
If we have an algorithm that should decide whether a given number is prime or not, the
number of steps needed to find this out will differ for inputs of different sizes.
For example, it is easy to see that telling whether 7 is prime can be done in much
shorter time than verifying the same property for 2^(32 582 657) − 1. The nature of the input
may influence the runtime and space requirements of an algorithm as well. We immediately
know that 1 000 000 000 000 000 is not prime, though it is a big number, whereas the answer
in the case of the smaller number 100 000 007 is less clear.² Therefore, often best-, average-, and
worst-case complexities exist for given input sizes.
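This input dependency can be made tangible with a naive trial-division test that counts its loop iterations. The function below is only an illustrative sketch, not an efficient primality test:

```python
def is_prime(n):
    """Trial division; returns (verdict, number of divisions attempted)."""
    if n < 2:
        return False, 0
    steps = 0
    d = 2
    while d * d <= n:        # only divisors up to sqrt(n) need checking
        steps += 1
        if n % d == 0:
            return False, steps
        d += 1
    return True, steps

print(is_prime(7))                       # (True, 1): tiny input, almost no work
print(is_prime(1_000_000_000_000_000))   # (False, 1): big but even, rejected at once
print(is_prime(100_000_007))             # (True, 9999): prime, ~10^4 divisions
```

The huge even number is settled in a single step while the much smaller prime costs thousands, which is why complexity statements usually distinguish best, average, and worst cases over all inputs of a given size.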
In order to compare the efficiency of algorithms, some approximative notations have been
introduced [1555, 1556, 2510]. As we have just seen, the basic criterion influencing the time
and space requirements of an algorithm applied to a problem is the size of the input, i.e.,
the number of bits needed to represent the number of which we want to find out whether
it is prime or not, the number of elements in a list to be sorted, the number of nodes or
edges in a graph to be 3-colored, and so on. We can describe this dependency as a function
of this size.
In real systems, however, knowledge of the exact dependency is not needed. If we,
for example, know that sorting n data elements with the Quicksort algorithm³ [1240, 1557]
takes on average roughly n log n steps, this is sufficient, even if the correct
number is 2n ln n ≈ 1.39 n log₂ n.
¹ http://en.wikipedia.org/wiki/Analysis_of_algorithms [accessed 2007-07-03]
² 100 000 007 is indeed a prime, in case you are wondering, and so is 2^(32 582 657) − 1.
³ http://en.wikipedia.org/wiki/Quicksort [accessed 2007-07-03]
The Big-O family of notations introduced by Bachmann [163] and made popular by Landau
[1664] allows us to group together functions that rise at approximately the same speed.
Definition D12.1 (Big-O notation). The big-O notation⁴ is a mathematical notation
used to describe the asymptotic upper bound of functions.

f(x) ∈ O(g(x)) ⇔ ∃x₀, m ∈ ℝ : m > 0 ∧ |f(x)| ≤ m |g(x)| ∀x > x₀    (12.1)

In other words, a function f(x) is in O of another function g(x) if and only if there exist a
real number x₀ and a constant, positive factor m such that the absolute value of f(x) is smaller
than (or equal to) m times the absolute value of g(x) for all x greater than x₀. f(x)
thus cannot grow (considerably) faster than g(x). Therefore, x³ + x² + x + 1 = f(x) ∈ O(x³),
since for m = 5 and x₀ = 2 it holds that 5x³ > x³ + x² + x + 1 for all x ≥ 2.
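The witness constants m = 5 and x₀ = 2 from Definition D12.1 can be spot-checked numerically. The sketch below merely samples the inequality at many points; it illustrates, but of course does not prove, the bound:

```python
def dominates(f, g, m, x0, samples):
    """Check |f(x)| <= m * |g(x)| for every sampled x > x0 (a numeric spot check)."""
    return all(abs(f(x)) <= m * abs(g(x)) for x in samples if x > x0)

f = lambda x: x**3 + x**2 + x + 1   # f(x) is in O(x^3) ...
g = lambda x: x**3                  # ... with witnesses m = 5 and x0 = 2

print(dominates(f, g, m=5, x0=2, samples=range(3, 1000)))  # prints True
print(dominates(f, g, m=1, x0=2, samples=range(3, 1000)))  # prints False: m = 1 is too small
```

Note that the choice of witnesses is not unique: any larger m (or larger x₀) also works, which is exactly why big-O only captures the growth rate and not the constant factors.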
In terms of algorithmic complexity, we specify the number of steps or memory units an
algorithm needs, depending on the size of its inputs, in the big-O notation. More generally,
this specification is based on key features of the input data. For a graph-processing
algorithm, these may be the number of edges |E| and the number of vertices |V| of the graph
to be processed, and the algorithmic complexity may be O(|V|² + |E| |V|). Some examples
of single-parametric functions can be found in Example E12.1.
Example E12.1 (Some examples of the big-O notation).

O(1): f₁(x) = 2²²², f₂(x) = sin x. Algorithms that have constant runtime for all inputs are O(1).

O(log n): f₃(x) = log x; f₄(x) = f₄(x/2) + 1 with f₄(x) = 0 for x < 1. Logarithmic complexity is often a feature of algorithms that run on binary trees or of search algorithms in ordered sets. Notice that O(log n) implies that only part of the input of the algorithm is read/regarded, since the input has length n and we only perform m log n steps.

O(n): f₅(x) = 23x + 4, f₆(x) = x/2. Algorithms in O(n) need to access and process their input a constant number of times. This is, for example, the case when searching in a linked list.

O(n log n): f₇(x) = 23x + x log 7x. Many sorting algorithms like Quicksort and Mergesort are in O(n log n).

O(n²): f₈(x) = 34x², f₉(x) = Σ_{i=0}^{x+3} (x − 2). Some sorting algorithms like selection sort have this complexity. For many problems, O(n²) solutions are acceptably good.

O(nⁱ) with i > 1, i ∈ ℝ: f₁₀(x) = x⁵ − x². The general polynomial complexity. In this group we find many algorithms that work on graphs.

O(2ⁿ): f₁₁(x) = 23 · 2ˣ. Algorithms with exponential complexity perform slowly and become infeasible with increasing input size. For many hard problems, only algorithms of this class exist; their solutions can otherwise only be approximated by means of randomized global optimization techniques.
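The practical gap between the polynomial and exponential classes is easy to see numerically. This small sketch tabulates a few of the growth functions from the example for increasing n:

```python
import math

for n in (10, 20, 40, 80):
    print(f"n={n:3d}  n log n={n * math.log2(n):8.0f}  "
          f"n^2={n**2:6d}  2^n={2**n:25d}")
# Already at n = 80, 2^n exceeds 10^24, while n^2 is still just 6400: an
# O(2^n) algorithm is hopeless where an O(n^2) one finishes instantly.
```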
[Plot omitted: growth curves of f(x) = x, x², x⁴, x⁸, x¹⁰, 1.1ˣ, 2ˣ, eˣ, xˣ, and x! for x from 1 to 2048, with f(x) ranging from 1 up to about 10⁴⁰; reference lines mark the number of milliseconds per day and the number of picoseconds since the big bang.]
Figure 12.1: Illustration of some functions and their rising speed, gleaned from [2365].
In Example E12.1, we give some examples for the big-O notation and name kinds of algorithms
which have such complexities. The rising behavior of some functions is illustrated in
Figure 12.1. From this figure it becomes clear that problems for which only algorithms with
high complexities such as O(2ⁿ) exist quickly become intractable even for small input sizes n.
12.2 Complexity Classes
While the big-O notation is used to state how long a specific algorithm needs (at least/on
average/at most) for input with certain features, the computational complexity⁵ of a problem
is bounded by the best algorithm known for it. In other words, it makes a statement about
how many resources are necessary for solving a given problem or, from the opposite point
of view, tells us whether we can solve a problem with the given available resources.
12.2.1 Turing Machines
The amount of resources required at least to solve a problem depends on the problem
itself and on the machine used for solving it. One of the most basic computer models is the
deterministic Turing machine⁶ (DTM) [2737], as sketched in Fig. 12.2.a. It consists of
a tape of infinite length which is divided into distinct cells, each capable of holding exactly
one symbol from a finite alphabet. A Turing machine has a head which is always located
at one of these cells and can read and write the symbols. The machine has a finite set of
possible states and is driven by rules. These rules decide which symbol should be written
and/or move the head one unit to the left or right, based on the machine's current state and
the symbol on the tape under its head. With this simple machine, any of our computers
and, hence, also the programs executed by them, can be simulated.⁷
⁴ http://en.wikipedia.org/wiki/Big_O_notation [accessed 2007-07-03]
⁵ http://en.wikipedia.org/wiki/Computational_complexity_theory [accessed 2010-06-24]
⁶ http://en.wikipedia.org/wiki/Turing_machine [accessed 2010-06-24]
⁷ Another computational model, the RAM (Random Access Machine), is quite close to today's computer architectures and can be simulated with Turing machines efficiently [1804], and vice versa.
146 12 PROBLEM HARDNESS
Fig. 12.2.a: A deterministic Turing machine. Fig. 12.2.b: A nondeterministic Turing machine.
Figure 12.2: Turing machines. (Each panel shows a tape of cells holding zeros and ones together with a rule table with the columns TapeSymbol, State, Print, Motion, and NextState; in Fig. 12.2.b, some state/symbol pairs have more than one applicable rule.)
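A rule table like the one sketched in Fig. 12.2.a can be executed directly. The following sketch (the state names and rules are our own illustration, not taken from the figure) simulates a DTM whose rules invert a sequence of zeros and ones and halt on a blank cell, i.e., the machine asked for in Task 47:

```python
def run_dtm(rules, tape, state, halt, blank=" "):
    """Simulate a deterministic Turing machine.
    'rules' maps (state, symbol) -> (symbol_to_print, motion, next_state),
    with motion +1 (move right) or -1 (move left)."""
    tape = dict(enumerate(tape))  # sparse tape: position -> symbol
    pos = 0
    while state != halt:
        symbol = tape.get(pos, blank)
        print_sym, motion, state = rules[(state, symbol)]
        tape[pos] = print_sym
        pos += motion
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

# An inverting machine: flip 0 <-> 1 while moving right, halt on a blank.
INVERT = {("A", "0"): ("1", +1, "A"),
          ("A", "1"): ("0", +1, "A"),
          ("A", " "): (" ", +1, "HALT")}

print(run_dtm(INVERT, "01101", "A", "HALT"))  # -> 10010
```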
Like DTMs, nondeterministic Turing machines⁸ (NTMs, see Fig. 12.2.b) are thought experiments rather than real computers. They differ from DTMs in that they may have more than one rule for a state/input pair. Whenever there is an ambiguous situation where n > 1 rules could be applied, the NTM can be thought of as "forking" into n independent NTMs running in parallel. The computation of the overall system stops when any of these reaches a final, accepting state. NTMs can be simulated with DTMs by performing a breadth-first search on the possible computation steps of the NTM until an accepting state is found. The number of steps required for this simulation, however, grows exponentially with the length of the shortest accepting path of the NTM [1483].
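This breadth-first simulation of an NTM on a deterministic machine can be sketched in a few lines. The machine below is an illustrative example of ours: it scans right and, on reading a 1, nondeterministically either keeps scanning or "guesses" that it should accept, so it accepts iff the tape contains a 1:

```python
from collections import deque

def ntm_accepts(rules, tape, start, accept, max_steps=1000, blank=" "):
    """Simulate an NTM by breadth-first search over its configurations.
    'rules' maps (state, symbol) to a LIST of (print, motion, next_state)
    choices; more than one choice per pair is what makes it nondeterministic."""
    queue = deque([(start, tuple(tape), 0, 0)])  # (state, tape, head, depth)
    while queue:
        state, tp, pos, depth = queue.popleft()
        if state == accept:
            return True
        if depth >= max_steps:
            continue
        symbol = tp[pos] if 0 <= pos < len(tp) else blank
        for print_sym, motion, nxt in rules.get((state, symbol), []):
            new = list(tp)
            if 0 <= pos < len(new):
                new[pos] = print_sym
            queue.append((nxt, tuple(new), pos + motion, depth + 1))
    return False  # every branch died without reaching the accepting state

SCAN = {("A", "0"): [("0", +1, "A")],
        ("A", "1"): [("1", +1, "A"), ("1", +1, "ACCEPT")]}
print(ntm_accepts(SCAN, "0010", "A", "ACCEPT"))  # True
print(ntm_accepts(SCAN, "0000", "A", "ACCEPT"))  # False
```

Note how the queue holds one entry per "forked" machine; in the worst case, the number of entries grows exponentially with the search depth, mirroring the exponential simulation cost stated above.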
12.2.2 P, NP, Hardness, and Completeness
The set of all problems which can be solved by a deterministic Turing machine within polynomial time is called P [1025]. This means that for the problems within this class, there exists at least one algorithm (for a DTM) with an algorithmic complexity in O(n^p) (where n is the input size and p ∈ N1 is some natural number).
Algorithms designed for Turing machines can be translated to algorithms for normal PCs. Since it is quite easy to simulate a DTM (with finite tape) with an off-the-shelf computer and vice versa, these are also the problems that can be solved within polynomial time by existing, real computers. These are the problems which are said to be exactly solvable in a feasible way [359, 1092].
NP, on the other hand, includes all problems which can be solved by a nondeterministic Turing machine in a runtime rising polynomially with the input size. Equivalently, it is the set of all problems for which solutions can be checked in polynomial time [1549].
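This defining property of NP, that solutions can be checked in polynomial time, can be made concrete with the NP-complete Subset Sum problem: no polynomial-time solver is known, yet a proposed certificate is verified in linear time. A minimal sketch (the numbers and certificates are arbitrary examples):

```python
def verify_subset_sum(numbers, target, certificate):
    """Polynomial-time verifier: 'certificate' is a set of indices into
    'numbers'. Checking it takes O(len(numbers)) steps, even though no
    polynomial-time *solver* for Subset Sum is known."""
    return sum(numbers[i] for i in certificate) == target

nums = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(nums, 9, {2, 4}))  # 4 + 5 = 9  -> True
print(verify_subset_sum(nums, 9, {0, 1}))  # 3 + 34     -> False
```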
Clearly, these include all the problems from P, since DTMs can be considered a subset of NTMs, and if a deterministic Turing machine can solve something in polynomial time, a nondeterministic one can as well. The other way around, i.e., the question whether NP ⊆ P and, hence, P = NP, is much more complicated and so far, there exists neither a proof for nor against it [1549]. However, it is strongly assumed that the hardest problems in NP need superpolynomial, i.e., exponential, time to be solved by deterministic Turing machines and therefore also by computers as we know and use them.
⁸ http://en.wikipedia.org/wiki/Nondeterministic_Turing_machine [accessed 2010-06-24]
If problem A can be transformed in a way so that we can use an algorithm for problem B to solve A, we say A reduces to B. This means that A is no more difficult than B. A problem is hard for a complexity class if every other problem in this class can be reduced to it. For classes larger than P, reductions which take no more than polynomial time in the input size are usually used. The problems which are hard for NP are called NP-hard. A problem which is in a complexity class C and hard for C is called complete for C, hence the name NP-complete.
12.3 The Problem: Many Real-World Tasks are NP-hard
From a practical perspective, problems in P are usually friendly. Everything with an algorithmic complexity in O(n^5) and less can usually be solved more or less efficiently if n does not get too large. Searching and sorting algorithms, with complexities between O(log n) and O(n log n), are good examples of such problems which, in most cases, are not bothersome.
However, many real-world problems are NP-hard or, without formal proof, can be assumed to be at least as bad as that. So far, no polynomial-time algorithm for DTMs has been discovered for this class of problems, and the runtime of the algorithms we have for them increases exponentially with the input size.
In 1962, Bremermann [407] pointed out that "No data processing system whether artificial or living can process more than 2 × 10^47 bits per second per gram of its mass". This is an absolute limit to computational power which we are very unlikely to ever be able to unleash. Yet, even with this power, if the number of possible states of a system grows exponentially with its size, any solution attempt becomes infeasible quickly. Minsky [1907], for instance, estimates the number of all possible move sequences in chess to be around 10^120.
The Traveling Salesman Problem (TSP), which we initially introduced in Example E2.2 on page 45, is a classical example of an NP-complete combinatorial optimization problem. Since many Vehicle Routing Problems (VRPs) [2912, 2914] are basically iterated variants of the TSP, they fall at least into the same complexity class. This means that until now, there exists no algorithm with a less-than-exponential complexity which can be used to find optimal transportation plans for a logistics company. In other words, as soon as a logistics company has to satisfy transportation orders of more than a handful of customers, it will likely not be able to serve them in an optimal way.
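To illustrate why exact TSP solving becomes infeasible, here is a sketch of the brute-force approach, which enumerates all (n − 1)! tours starting from a fixed city; the distance matrix is an arbitrary example:

```python
from itertools import permutations

def brute_force_tsp(dist):
    """Exact TSP by enumerating all (n-1)! round trips starting at city 0.
    'dist' is a symmetric matrix of pairwise distances."""
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(brute_force_tsp(D))  # -> ((0, 1, 3, 2, 0), 18)
```

For 4 cities this loop runs 6 times; for 20 cities it would already need 19! ≈ 1.2 × 10^17 iterations.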
The Knapsack problem [463], i.e., the question of how to fit items (of different weight and value) into a weight-limited container so that the total value of the items in the container is maximized, is NP-complete as well. Like the TSP, it occurs in a variety of practical applications.
The first known NP-complete problem is the satisfiability problem (SAT) [626]. Here, the problem is to decide whether or not there is an assignment of values to a set of Boolean variables x1, x2, . . . , xn which makes a given Boolean expression based on these variables, parentheses, ∧, ∨, and ¬ evaluate to true.
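A brute-force SAT decision procedure makes the exponential cost tangible: it enumerates all 2^n assignments. This sketch uses our own clause encoding (a clause is a list of literals; a positive integer i means xi, a negative one means ¬xi) for formulas in conjunctive normal form:

```python
from itertools import product

def sat_brute_force(n_vars, clauses):
    """Try all 2^n assignments for variables 1..n_vars; a clause is
    satisfied if any of its literals evaluates to true."""
    for bits in product((False, True), repeat=n_vars):
        assign = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign  # satisfying assignment found
    return None  # formula is unsatisfiable

# (x1 or not x2) and (not x1 or x2) and (x1 or x2)
print(sat_brute_force(2, [[1, -2], [-1, 2], [1, 2]]))  # {1: True, 2: True}
```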
These few examples and the wide range of practical application problems which can be reduced to them clearly show one thing: Unless it turns out that P = NP, there will be no way to solve a variety of problems exactly within feasible time.
12.4 Countermeasures
Solving a problem exactly to optimality is not always possible, as we have learned in the previous section. If we have to deal with NP-complete problems, we will therefore have to give up on some of the features which we expect an algorithm or a solution to have. A randomized algorithm includes at least one instruction that acts on the basis of a random decision, as stated in Definition D1.13 on page 33. In other words, as opposed to deterministic algorithms⁹, which always produce the same results when given the same inputs, a randomized algorithm violates the constraint of determinism. Randomized algorithms are also often called probabilistic algorithms [1281, 1282, 1919, 1966, 1967]. There are two general classes of randomized algorithms: Las Vegas and Monte Carlo algorithms.
Definition D12.2 (Las Vegas Algorithm). A Las Vegas algorithm¹⁰ is a randomized algorithm that never returns a wrong result [141, 1281, 1282, 1966]. It either returns the correct result, reports a failure, or does not return at all.
If a Las Vegas algorithm returns, its outcome is deterministic (but not the algorithm itself). Its termination, however, cannot be guaranteed. There usually exists an expected runtime bound for such algorithms – their actual execution, however, may take arbitrarily long. In summary, a Las Vegas algorithm terminates with a positive probability and is (partially) correct.
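A classical illustration of Definition D12.2 (our own sketch, not an algorithm from this text) is randomly probing an array for a marked element. Any index it returns is guaranteed correct; only the runtime, or whether it reports failure under a trial budget, is random:

```python
import random

def las_vegas_find(items, is_target, max_tries=None):
    """Las Vegas search: sample random positions until a target is hit.
    The answer, when given, is always correct; only the runtime is random.
    With max_tries set, it may report failure (None), never a wrong index."""
    tries = 0
    while max_tries is None or tries < max_tries:
        i = random.randrange(len(items))
        if is_target(items[i]):
            return i  # guaranteed correct
        tries += 1
    return None  # explicit failure report

random.seed(42)
data = [0] * 50 + [1] * 50  # half of the cells contain a 1
idx = las_vegas_find(data, lambda v: v == 1)
print(idx, data[idx])  # the value at the returned index is always 1
```

With half the cells marked, the expected number of probes is 2, but any single run can take arbitrarily many.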
Definition D12.3 (Monte Carlo Algorithm). A Monte Carlo algorithm¹¹ always terminates. Its result, however, can be either correct or incorrect [1281, 1282, 1966].
In contrast to Las Vegas algorithms, Monte Carlo algorithms always terminate but are
(partially) correct only with a positive probability. As pointed out in Section 1.3, they are
usually the way to go to tackle complex problems.
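A classical illustration of Definition D12.3 is the Monte Carlo estimation of π (a standard textbook example, not taken from this book): the algorithm always terminates after a fixed number of samples, but its result is only approximately, i.e., probably, correct:

```python
import random

def monte_carlo_pi(samples, seed=0):
    """Monte Carlo estimate of pi: count random points of the unit square
    that fall inside the quarter circle. Always terminates after 'samples'
    steps, but the result is only probably close to the truth."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples

print(monte_carlo_pi(100_000))  # approximately 3.14
```

More samples shrink the expected error (here roughly proportional to 1/√samples), but no finite sample count can guarantee an exact answer.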
⁹ http://en.wikipedia.org/wiki/Deterministic_computation [accessed 2007-07-03]
¹⁰ http://en.wikipedia.org/wiki/Las_Vegas_algorithm [accessed 2007-07-03]
¹¹ http://en.wikipedia.org/wiki/Monte_carlo_algorithm [accessed 2007-07-03]
Tasks T12
44. Find all big-O relationships between the following functions fi: N1 → R:
• f1(x) = 9√x
• f2(x) = log x
• f3(x) = 7
• f4(x) = (1.5)^x
• f5(x) = 99/x
• f6(x) = e^x
• f7(x) = 10x^2
• f8(x) = (2 + (−1)^x)x
• f9(x) = 2 ln x
• f10(x) = (1 + (−1)^x)x^2 + x
[10 points]
45. Show why the statement "The runtime of the algorithm is at least in O(n^2)" makes no sense.
[5 points]
46. What is the difference between a deterministic Turing machine (DTM) and a nondeterministic one (NTM)?
[10 points]
47. Write down the states and rules of a deterministic Turing machine which inverts a sequence of zeros and ones on the tape while moving its head to the right and stops when encountering a blank symbol.
[5 points]
48. Use literature or the internet to find at least three more NP-complete problems. Define them as optimization problems by specifying proper problem spaces X and objective functions f which are subject to minimization.
[15 points]
Chapter 13
Unsatisfying Convergence
13.1 Introduction
Definition D13.1 (Convergence). An optimization algorithm has converged (a) if it cannot reach new candidate solutions anymore or (b) if it keeps on producing candidate solutions from a "small"¹ subset of the problem space.
Optimization processes will usually converge at some point in time. In nature, a similar phenomenon can be observed according to Koza [1602]: The niche preemption principle states that a niche in a natural environment tends to become dominated by a single species [1812].
Linear convergence to the global optimum x⋆ with respect to an objective function f is defined in Equation 13.1, where t is the iteration index of the optimization algorithm and x(t) is the best candidate solution discovered until iteration t [1976]. This is basically the best behavior of an optimization process we could wish for.
|f(x(t + 1)) − f(x⋆)| ≤ c · |f(x(t)) − f(x⋆)| ,  c ∈ [0, 1)    (13.1)
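Equation 13.1 can be checked numerically on a logged run: if the error |f(x(t)) − f(x⋆)| shrinks by a factor bounded below 1 in every iteration, convergence is linear. The two error sequences below are invented for illustration:

```python
def convergence_ratios(errors):
    """Ratios |f(x(t+1)) - f(x*)| / |f(x(t)) - f(x*)| along a run; linear
    convergence in the sense of Equation 13.1 means all ratios stay below
    some constant c < 1."""
    return [b / a for a, b in zip(errors, errors[1:]) if a > 0]

# A hypothetical run whose error halves each iteration (c = 0.5) ...
linear = [1.0, 0.5, 0.25, 0.125]
# ... and one that stagnates, its ratios creeping toward 1.
stagnating = [1.0, 0.9, 0.85, 0.84]

print(convergence_ratios(linear))                        # [0.5, 0.5, 0.5]
print(round(max(convergence_ratios(stagnating)), 3))     # -> 0.988
```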
One of the problems in Global Optimization (and basically, also in nature) is that it is often not possible to determine whether the best solution currently known is situated on a local or a global optimum and thus, whether convergence is acceptable. In other words, it is usually not clear whether the optimization process can be stopped, whether it should concentrate on refining the current optimum, or whether it should examine other parts of the search space instead. This can, of course, only become cumbersome if there are multiple (local) optima, i.e., the problem is multimodal [1266, 2267] as depicted in Fig. 11.1.c on page 140.
Definition D13.2 (Multi-Modality). A mathematical function is multimodal if it has
multiple (local or global) maxima or minima [711, 2474, 3090]. A set of objective functions
(or a vector function) f is multimodal if it has multiple (local or global) optima – depending
on the deﬁnition of “optimum” in the context of the corresponding optimization problem.
The existence of multiple global optima itself is not problematic and the discovery of only
a subset of them can still be considered as successful in many cases. The occurrence of
numerous local optima, however, is more complicated.
¹ according to a suitable metric like the number of modifications or mutations which need to be applied to a given solution in order to leave this subset
13.2 The Problems
There are three basic problems which are related to the convergence of an optimization algo
rithm: premature, nonuniform, and domino convergence. The ﬁrst one is most important
in optimization, but the latter ones may also cause large inconvenience.
13.2.1 Premature Convergence
Definition D13.3 (Premature Convergence). An optimization process has prematurely converged to a local optimum if it is no longer able to explore other parts of the search space than the area currently being examined and there exists another region that contains a superior solution [2417, 2760].
The goal of optimization is, of course, to ﬁnd the global optima. Premature convergence,
basically, is the convergence of an optimization process to a local optimum. A prematurely
converged optimization process hence provides solutions which are below the possible or
anticipated quality.
Example E13.1 (Premature Convergence).
Figure 13.1 illustrates two examples for premature convergence. Although a higher peak
exists in the maximization problem pictured in Fig. 13.1.a, the optimization algorithm solely
focuses on the local optimum to the right. The same situation can be observed in the
minimization task shown in Fig. 13.1.b where the optimization process gets trapped in the
local minimum on the left hand side.
Fig. 13.1.a: Example 1: Maximization (the process climbs the local optimum although a global optimum exists). Fig. 13.1.b: Example 2: Minimization (vertical axis: objective values f(x), horizontal axis: x).
Figure 13.1: Premature convergence in the objective space.
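The behavior of Fig. 13.1.a can be reproduced in a few lines: a greedy hill climber on an illustrative bimodal function (the function and step size below are our own invention) only ever climbs the peak it starts on:

```python
import math

def hill_climb(f, x, step=0.01, iterations=1000):
    """Greedy maximizer: move to a neighboring point only if it improves f.
    It can only climb the peak whose basin it starts in."""
    for _ in range(iterations):
        for nxt in (x - step, x + step):
            if f(nxt) > f(x):
                x = nxt
                break
    return x

# Bimodal example: a local peak near x = -1, a higher global peak near x = 2.
f = lambda x: math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)

print(round(hill_climb(f, x=-1.5), 2))  # -> -1.0 (trapped on the local peak)
print(round(hill_climb(f, x=1.0), 2))   # -> 2.0  (reaches the global peak)
```

Starting left of the valley, the process converges prematurely to the lower peak; only a start inside the global basin reaches the global optimum.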
13.2.2 Non-Uniform Convergence
Definition D13.4 (Non-Uniform Convergence). In optimization problems with infinitely many possible (globally) optimal solutions, an optimization process has non-uniformly converged if the solutions that it provides only represent a subset of the optimal characteristics.
This definition is rather fuzzy, but it becomes much clearer when taking a look at Example E13.2. The goal of optimization in problems with infinitely many optima is often not to discover only one or two optima. Instead, we usually want to find a set X of solutions which has a good spread over all possible optimal features.
Example E13.2 (Convergence in a Bi-Objective Problem).
Assume that we wish to solve a bi-objective problem with f = (f1, f2). Then, the goal is to find a set X of candidate solutions which approximates the set X⋆ of globally (Pareto) optimal solutions as well as possible. Two issues arise: First, the optimization process should converge to the true Pareto front and return solutions as close to it as possible. Second, these solutions should be uniformly spread along this front.
Fig. 13.2.a: Bad Convergence and Good Spread. Fig. 13.2.b: Good Convergence and Bad Spread. Fig. 13.2.c: Good Convergence and Spread. (Axes: f1 and f2; each panel shows the true Pareto front and the front returned by the optimizer.)
Figure 13.2: Pareto front approximation sets [2915].
Let us examine the three fronts included in Figure 13.2 [2915]. The first picture (Fig. 13.2.a) shows a result X having a very good spread (or diversity) of solutions, but the points are far away from the true Pareto front. Such results are not attractive because they do not provide Pareto-optimal solutions, and we would consider the convergence to be premature in this case. The second example (Fig. 13.2.b) contains a set of solutions which are very close to the true Pareto front but cover it only partially, so the decision maker could lose important trade-off options. Finally, the front depicted in Fig. 13.2.c has the two desirable properties of good convergence and spread.
13.2.3 Domino Convergence
The phenomenon of domino convergence has been brought to attention by Rudnick [2347]
who studied it in the context of his BinInt problem [2347, 2698] which is discussed in
Section 50.2.5.2. In principle, domino convergence occurs when the candidate solutions
have features which contribute to signiﬁcantly diﬀerent degrees to the total ﬁtness. If these
features are encoded in separate genes (or building blocks) in the genotypes, they are likely to
be treated with diﬀerent priorities, at least in randomized or heuristic optimization methods.
Building blocks with a very strong positive inﬂuence on the objective values, for instance,
will quickly be adopted by the optimization process (i. e., “converge”). During this time, the
alleles of genes with a smaller contribution are ignored. They do not come into play until
the optimal alleles of the more “important” blocks have been accumulated. Rudnick [2347]
called this sequential convergence phenomenon domino convergence due to its resemblance
to a row of falling domino stones [2698].
Let us consider the application of a Genetic Algorithm, an optimization method working
on bit strings, in such a scenario. Mutation operators from time to time destroy building
blocks with strong positive inﬂuence which are then reconstructed by the search. If this
happens with a high enough frequency, the optimization process will never get to optimize
the less salient blocks because repairing and rediscovering those with higher importance
takes precedence. Thus, the mutation rate of the EA limits the probability of ﬁnding the
global optima in such a situation.
In the worst case, the contributions of the less salient genes may almost look like noise and they are not optimized at all. Such a situation is also an instance of premature convergence, since the global optimum, which would involve optimal configurations of all building blocks, will not be discovered. In this situation, restarting the optimization process will not help because it will turn out the same way with very high probability. Example problems which are likely to exhibit domino convergence are the Royal Road and the aforementioned BinInt problem, which are discussed in Section 50.2.4 and Section 50.2.5.2, respectively.
There are some relations between domino convergence and the Siren Pitfall in feature
selection for classiﬁcation in data mining [961]. Here, if one feature is highly useful in a hard
subproblem of the classiﬁcation task, its utility may still be overshadowed by mediocre
features contributing to easy subtasks.
13.3 One Cause: Loss of Diversity
In biology, diversity is the variety and abundance of organisms at a given place and
time [1813, 2112]. Much of the beauty and eﬃciency of natural ecosystems is based on
a dazzling array of species interacting in manifold ways. Diversiﬁcation is also a good strat
egy utilized by investors in the economy in order to increase their wealth.
In population-based Global Optimization algorithms, maintaining a set of diverse candidate solutions is very important as well. Losing diversity means approaching a state where all the candidate solutions under investigation are similar to each other. Another term for this state is convergence. Discussions of how diversity can be measured have been provided by Cousins [648], Magurran [1813], Morrison [1956], Morrison and De Jong [1958], Paenke et al. [2112], Routledge [2344], and Burke et al. [454, 458].
Example E13.3 (Loss of Diversity → Premature Convergence).
Figure 13.3 illustrates two possible ways in which a population-based optimization method may progress. Both processes start with a set of random candidate solutions, denoted as bubbles in the fitness landscape already used in Fig. 13.1.a, in their search for the global maximum. The upper algorithm preserves a reasonable diversity in the set of elements under investigation, although initially the best candidate solutions are located in the proximity of the local optimum. By doing so, it also manages to explore the area near the global optimum and eventually discovers both peaks. The process illustrated in the lower part of the picture instead solely concentrates on the area where the best candidate solutions were sampled in the beginning and quickly climbs up the local optimum. Reaching a state of premature convergence, it cannot escape from there anymore.
Figure 13.3: Loss of diversity leads to premature convergence. (Both runs begin from the same initial situation; preservation of diversity leads to convergence to the global optimum, loss of diversity to premature convergence.)
Preserving diversity is directly linked with maintaining a good balance between exploitation
and exploration [2112] and has been studied by researchers from many domains, such as
1. Genetic Algorithms [238, 2063, 2321, 2322],
2. Evolutionary Algorithms [365, 366, 1692, 1964, 2503, 2573],
3. Genetic Programming [392, 456, 458, 709, 1156, 1157],
4. Tabu Search [1067, 1069], and
5. Particle Swarm Optimization [2943].
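One simple, illustrative instance of such a diversity measure (many alternatives are discussed in the surveys cited above; this particular formula is just one common choice) is the mean pairwise distance within the population:

```python
from itertools import combinations
import math

def mean_pairwise_distance(population):
    """A simple diversity measure: the average Euclidean distance between
    all pairs of candidate solutions (real-valued vectors)."""
    pairs = list(combinations(population, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

diverse = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
converged = [(0.5, 0.5), (0.51, 0.5), (0.5, 0.51), (0.51, 0.51)]
print(mean_pairwise_distance(diverse))    # large: spread-out population
print(mean_pairwise_distance(converged))  # near zero: diversity is lost
```

Tracking such a value over the iterations makes the loss of diversity sketched in Figure 13.3 directly observable.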
13.3.1 Exploration vs. Exploitation
The operations which create new solutions from existing ones have a very large impact on the speed of convergence and the diversity of the populations [884, 2535]. The step size in Evolution Strategies is a good example of this issue: setting it properly is very important and leads over to the "exploration versus exploitation" problem [1250]. The dilemma of deciding which of the two principles to apply to which degree at a certain stage of optimization is sketched in Figure 13.4 and can be observed in many areas of Global Optimization.²
Definition D13.5 (Exploration). In the context of optimization, exploration means finding new points in areas of the search space which have not been investigated before.
² More or less synonymously to exploitation and exploration, the terms intensification and diversification have been introduced by Glover [1067, 1069] in the context of Tabu Search.
Figure 13.4: Exploration versus exploitation. (Initial situation: a good candidate solution has been found. Exploitation: search the close proximity of the good candidate. Exploration: search also the distant areas.)
Since computers have only limited memory, already evaluated candidate solutions usually
have to be discarded in order to accommodate the new ones. Exploration is a metaphor
for the procedure which allows search operations to ﬁnd novel and maybe better solution
structures. Such operators (like mutation in Evolutionary Algorithms) have a high chance
of creating inferior solutions by destroying good building blocks but also a small chance of
ﬁnding totally new, superior traits (which, however, is not guaranteed at all).
Definition D13.6 (Exploitation). Exploitation is the process of improving and combining the traits of the currently known solutions.
The crossover operator in Evolutionary Algorithms, for instance, is considered to be an operator for exploitation. Exploitation operations often incorporate small changes into already tested individuals, leading to new, very similar candidate solutions, or try to merge building blocks of different, promising individuals. They usually have the disadvantage that other, possibly better, solutions located in distant areas of the problem space will not be discovered.
Almost all components of optimization strategies can either be used for increasing exploitation or in favor of exploration. Unary search operations that improve an existing solution in small steps can often be built, hence being exploitation operators. They can also be implemented in a way that introduces much randomness into the individuals, effectively making them exploration operators. Selection operations³ in Evolutionary Computation choose a set of the most promising candidate solutions which will be investigated in the next iteration of the optimizer. They can either return a small group of best individuals (exploitation) or a wide range of existing candidate solutions (exploration).
Optimization algorithms that favor exploitation over exploration have higher convergence speed but run the risk of not finding the optimal solution and may get stuck at a local optimum. Then again, algorithms which perform excessive exploration may never improve their candidate solutions well enough to find the global optimum, or it may take them very long to discover it "by accident". A good example of this dilemma is the Simulated Annealing algorithm discussed in Chapter 27 on page 243. It is often modified to a form called simulated quenching which focuses on exploitation but loses the guaranteed convergence to the optimum. Generally, optimization algorithms should employ at least one search operation of explorative character and at least one which is able to exploit good solutions further. There exists a vast body of research on the trade-off between exploration and exploitation that optimization algorithms have to face [87, 741, 864, 885, 1253, 1986].
³ Selection will be discussed in Section 28.4 on page 285.
13.4 Countermeasures
There is no general approach which can prevent premature convergence. The probability
that an optimization process gets caught in a local optimum depends on the characteristics
of the problem to be solved and the parameter settings and features of the optimization
algorithms applied [2350, 2722].
13.4.1 Search Operator Design
The most basic measure to decrease the probability of premature convergence is to make sure that the search operations are complete (see Definition D4.8 on page 85), i.e., to make sure that they can at least theoretically reach every point in the search space from every other point. Then, it is (at least theoretically) possible to escape arbitrary local optima. A prime example of bad operator design is Algorithm 1.1 on page 40, which cannot escape any local optima. The two different operators given in Equation 5.4 and Equation 5.5 on page 96 are another example of the influence of the search operation design on the vulnerability of the optimization process to local convergence.
13.4.2 Restarting
A very crude and yet sometimes effective measure is restarting the optimization process at randomly chosen points in time. One example of this method is GRASP, the Greedy Randomized Adaptive Search Procedure [902, 906] (see Chapter 42 on page 499), which continuously restarts the process of creating an initial solution and refining it with local search. Still, such approaches are likely to fail in domino convergence situations. Increasing the proportion of exploration operations may also reduce the chance of premature convergence.
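The restarting idea can be sketched by wrapping a greedy hill climber in a loop over random starting points; the bimodal test function below is an invented example, not a problem from this text:

```python
import math
import random

def restarting_hill_climb(f, lower, upper, restarts=20, step=0.01,
                          iterations=500, seed=0):
    """Hill climbing with random restarts: each individual run can get
    stuck on a local optimum, but the best result over many random
    starting points is likely to lie on the global one."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        x = rng.uniform(lower, upper)
        for _ in range(iterations):
            for nxt in (x - step, x + step):
                if lower <= nxt <= upper and f(nxt) > f(x):
                    x = nxt
                    break
        if best is None or f(x) > f(best):
            best = x
    return best

# Local peak near x = -1, higher global peak near x = 2.
f = lambda x: math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)
print(round(restarting_hill_climb(f, -4.0, 4.0), 2))  # close to the global peak at x = 2
```

Each restart is cheap, and only all of them landing in the wrong basin (exponentially unlikely in the number of restarts) leaves the result premature; as noted above, this does not help under domino convergence.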
13.4.3 Low Selection Pressure and Population Size
Generally, the higher the chance that candidate solutions with bad ﬁtness are investigated
instead of being discarded in favor of the seemingly better ones, the lower the chance of
getting stuck at a local optimum. This is the exact idea which distinguishes Simulated
Annealing from Hill Climbing (see Chapter 27). It is known that Simulated Annealing can
ﬁnd the global optimum, whereas simple Hill Climbers are likely to prematurely converge.
In an EA, too, using low selection pressure decreases the chance of premature convergence.
However, such an approach also decreases the speed with which good solutions are exploited.
Simulated Annealing, for instance, takes extremely long if a guaranteed convergence to the
global optimum is required.
Increasing the population size of a population-based optimizer may be useful as well, since larger populations can maintain more individuals (and hence, also be more diverse). However, the idea that larger populations will always lead to better optimization results does not always hold, as shown by van Nimwegen and Crutchfield [2778].
13.4.4 Sharing, Niching, and Clearing
Instead of increasing the population size, it thus makes sense to "gain more" from smaller populations. In order to extend the duration of the evolution in Evolutionary Algorithms, many methods have been devised for steering the search away from areas which have already been frequently sampled. In steady-state EAs (see Paragraph 28.1.4.2.1), it is common to remove duplicate genotypes from the population [2324].
More generally, the exploration capabilities of an optimizer can be improved by integrating density metrics into the fitness assignment process. The most popular such approaches are sharing and niching (see Section 28.3.4, [735, 742, 1080, 1085, 1250, 1899, 2391]). The Strength Pareto Algorithms, which are widely accepted to be highly efficient, use another idea: they adopt the number of individuals that one candidate solution dominates as a density measure [3097, 3100].
Pétrowski [2167, 2168] introduced the related clearing approach, discussed in Section 28.4.7, where individuals are grouped according to their distance in the phenotypic or genotypic space and all but a certain number of individuals from each group simply receive the worst possible fitness. A similar method aiming for convergence prevention is introduced in Section 28.4.7, where individuals close to each other in objective space are deleted.
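Fitness sharing in the spirit of these approaches can be sketched as follows; the triangular sharing function and the niche radius sigma are illustrative choices, not the exact formulation of the cited works:

```python
def shared_fitness(fitness, population, distance, sigma):
    """Fitness sharing: each raw fitness value is divided by a niche count,
    so that solutions in crowded regions look less attractive. Uses the
    triangular sharing function sh(d) = 1 - d/sigma for d < sigma, else 0."""
    shared = []
    for i, x in enumerate(population):
        niche = sum(max(0.0, 1.0 - distance(x, y) / sigma) for y in population)
        shared.append(fitness[i] / niche)
    return shared

pop = [0.0, 0.05, 0.1, 3.0]   # three crowded solutions and one loner
fit = [1.0, 1.0, 1.0, 1.0]    # identical raw fitness
dist = lambda a, b: abs(a - b)
print(shared_fitness(fit, pop, dist, sigma=1.0))
```

Although all four raw fitness values are equal, the loner at 3.0 keeps its full fitness while the three crowded solutions share theirs, which is exactly the pressure toward unexplored regions described above.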
13.4.5 Self-Adaptation
Another approach against premature convergence is to introduce the capability of self-adaptation, allowing the optimization algorithm to change its strategies or to modify its parameters depending on its current state. Such behaviors, however, are often implemented not in order to prevent premature convergence but to speed up the optimization process (which may lead to premature convergence to local optima) [2351–2353].
13.4.6 Multi-Objectivization
Recently, the idea of using helper objectives has emerged. Here, usually a single-objective problem is transformed into a multi-objective one by adding new objective functions [1429, 1440, 1442, 1553, 1757, 2025]. Brockhoff et al. [425] proved that in some cases, such changes can speed up the optimization process. The new objectives are often derived from the main objective by decomposition [1178] or from certain characteristics of the problem [1429]. They are then optimized together with the original objective function with some multi-objective technique, say an MOEA (see Definition D28.2). Problems where the main objective is some sort of sum of different components contributing to the fitness are especially suitable for that purpose, since such a component naturally lends itself as a helper objective. For such sum-of-parts multi-objectivizations, it has been shown that the original objective function has at least as many local optima as the total number of local optima present in the decomposed objectives [1178].
Example E13.4 (Sum-of-Parts Multi-Objectivization).
In Figure 13.5, we provide an example of the multi-objectivization of a sum-of-parts objective function f defined on the real interval X = G = [0, 1]. The original (single) objective f (Fig. 13.5.a) can be divided into the two objective functions f1 and f2 with f(x) = f1(x) + f2(x), illustrated in Fig. 13.5.b.
If we assume minimization, f has five locally optimal sets according to Definition D4.14 on page 89: The sets X⋆1 and X⋆5 are globally optimal since they contain the elements with the lowest function value. x⋆l,2, x⋆l,3, and x⋆l,4 are three sets each containing exactly one locally minimal point.
We now split f into f1 and f2 and optimize either f ≡ (f1, f2) or f ≡ (f, f1, f2). If we apply Pareto-based optimization (see Section 3.3.5 on page 65) and use Definition D4.17 on page 89 of local optima (which is also used in [1178]), we find that only three locally optimal regions remain: A⋆1, A⋆2, and A⋆3.
A⋆1, for example, is a continuous region which borders on elements it dominates. If we go a small step to the left from the left border of A⋆1, we will find an element with the same value of f1 but with worse values in both f2 and f. The same is true for its right border. A⋆3 shows exactly the same behavior.
Any step to the left or right within A⋆2 leads to an improvement of either f1 or f2 and a degeneration of the other function. Therefore, no element within A⋆2 dominates any other element in A⋆2. However, at the borders of A⋆2, f1 becomes constant while f2 keeps growing,
Fig. 13.5.a: The original objective and its local optima (main objective: f(x) = f1(x) + f2(x), plotted over x ∈ X with the locally optimal sets X⋆1, x⋆l,2, x⋆l,3, x⋆l,4, and X⋆5 marked).
Fig. 13.5.b: The helper objectives f1(x) and f2(x) with the locally optimal regions A⋆1, A⋆2, and A⋆3 in the objective space.
Figure 13.5: Multi-objectivization à la sum-of-parts.
i. e., getting worse. The neighboring elements outside of A⋆_2 are thus dominated by the
elements on its borders.
Multi-objectivization may thus help here to overcome the local optima, since there are
fewer locally optimal regions. However, a look at A⋆_2 shows us that it contains x⋆_l,2, x⋆_l,3, and
x⋆_l,4 and, since it is continuous, also uncountably many other points. In this
case, it would thus not be clear a priori whether multi-objectivization would be beneficial or not.
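As a small sketch of how such a decomposition can be used, the following Python snippet splits a single objective into two helper objectives and tests Pareto dominance under minimization of (f_1, f_2). The concrete functions are our own illustrative choices, not the ones plotted in Figure 13.5.

```python
import math

def f1(x: float) -> float:
    """First illustrative helper objective on X = [0, 1]."""
    return (1.0 - x) * (1.0 + 0.5 * math.sin(10.0 * x))

def f2(x: float) -> float:
    """Second illustrative helper objective on X = [0, 1]."""
    return x * (1.0 + 0.5 * math.sin(10.0 * x + 1.0))

def f(x: float) -> float:
    """The original sum-of-parts objective f(x) = f1(x) + f2(x)."""
    return f1(x) + f2(x)

def dominates(a: float, b: float) -> bool:
    """True if a Pareto-dominates b under minimization of (f1, f2):
    a is no worse in every objective and strictly better in at least one."""
    va, vb = (f1(a), f2(a)), (f1(b), f2(b))
    return all(p <= q for p, q in zip(va, vb)) and va != vb
```

Points inside a region like A⋆_2 are mutually non-dominated under (f_1, f_2) even though they differ in the original f.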
Tasks T13
49. Sketch one possible objective function for a single-objective optimization problem
which has infinitely many solutions. Copy this graph three times and paint six points
into each copy which represent:
(a) the possible results of a possible prematurely converged optimization process,
(b) the possible results of a possible correctly converged optimization process with
non-uniform spread (convergence),
(c) the possible results of a possible optimization process which has converged nicely.
[10 points]
50. Discuss the dilemma of exploration versus exploitation. What are the advantages and
disadvantages of these two general search principles?
[5 points]
51. Discuss why the Hill Climbing optimization algorithm (Chapter 26) is particularly
prone to premature convergence and how Simulated Annealing (Chapter 27) mitigates
the cause of this vulnerability.
[5 points]
Chapter 14
Ruggedness and Weak Causality
[Figure: unimodal → multimodal → somewhat rugged → very rugged; landscape difficulty increases]
Figure 14.1: The landscape difficulty increases with increasing ruggedness.
14.1 The Problem: Ruggedness
Optimization algorithms generally depend on some form of gradient in the objective or fitness
space. The objective functions should be continuous and exhibit low total variation¹, so the
optimizer can descend the gradient easily. If the objective functions become more unsteady
or fluctuating, i. e., go up and down often, it becomes more complicated for the optimization
process to find the right directions to proceed to, as sketched in Figure 14.1. The more
rugged a function gets, the harder it gets to optimize it. In short, one could say ruggedness
is multimodality plus steep ascents and descents in the fitness landscape. Examples of
rugged landscapes are Kauffman’s NK fitness landscape (see Section 50.2.1), the p-Spin
model discussed in Section 50.2.2, Bergman and Feldman’s jagged fitness landscape [276],
and the sketch in Fig. 11.1.d on page 140.
14.2 One Cause: Weak Causality
During an optimization process, new points in the search space are created by the search
operations. Generally we can assume that the genotypes which are the input of the search
operations correspond to phenotypes which have previously been selected. Usually, the
better or the more promising an individual is, the higher are its chances of being selected for
further investigation. Reversing this statement suggests that individuals which are passed
to the search operations are likely to have a good ﬁtness. Since the ﬁtness of a candidate
solution depends on its properties, it can be assumed that the features of these individuals
¹ http://en.wikipedia.org/wiki/Total_variation [accessed 2008-04-23]
are not so bad either. It should thus be possible for the optimizer to introduce slight changes
to their properties in order to find out whether they can be improved any further². Normally,
such exploitive modifications should also lead to small changes in the objective values and
hence, in the fitness of the candidate solution.
Deﬁnition D14.1 (Strong Causality). Strong causality (locality) means that small
changes in the properties of an object also lead to small changes in its behavior [2278,
2279, 2329].
This principle (proposed by Rechenberg [2278, 2279]) should not only hold for the search
spaces and operations designed for optimization (see Equation 24.5 on page 218 in Sec
tion 24.9), but applies to natural genomes as well. The oﬀspring resulting from sexual
reproduction of two ﬁsh, for instance, has a diﬀerent genotype than its parents. Yet, it is
far more probable that these variations manifest in a unique color pattern of the scales, for
example, instead of leading to a totally diﬀerent creature.
Apart from this straightforward, informal explanation here, causality has been investigated thoroughly in different fields of optimization, such as Evolution Strategy [835, 2278],
structure evolution [1760, 1761], Genetic Programming [835, 1396, 2327, 2329], genotype-phenotype mappings [2451], search operators [835], and Evolutionary Algorithms in general [835, 2338, 2599].
Sendhoff et al. [2451, 2452] introduce a very nice definition for causality which combines
the search space G, the (unary) search operations searchOp_1 ∈ Op, the genotype-phenotype
mapping “gpm”, and the problem space X with the objective functions f. First, a distance
measure “dist_caus” based on the stochastic nature of the unary search operation searchOp_1
is defined³:

dist_caus(g_1, g_2) = −log [ (1/P_id) · P(g_2 = searchOp_1(g_1)) ]    (14.1)

P_id = P(g = searchOp_1(g))    (14.2)
In Equation 14.1, g_1 and g_2 are two elements of the search space G, P(g_2 = searchOp_1(g_1))
is the probability that g_2 is the result of one application of the search operation searchOp_1 to
g_1, and P_id is the probability that searchOp_1 maps an element to itself. dist_caus(g_1, g_2)
falls as the probability that g_1 is mapped to g_2 rises. Sendhoff et al. [2451, 2452] then define
strong causality as the state where Equation 14.3 holds:
∀g_1, g_2, g_3 ∈ G : ||f(gpm(g_1)) − f(gpm(g_2))||_2 < ||f(gpm(g_1)) − f(gpm(g_3))||_2
  ⇒ dist_caus(g_1, g_2) < dist_caus(g_1, g_3)
  ⇒ P(g_2 = searchOp_1(g_1)) > P(g_3 = searchOp_1(g_1))    (14.3)
In other words, strong causality means that the search operations have a higher chance to
produce points g_2 which (map to candidate solutions which) are closer in objective space
to their inputs g_1 than to create elements g_3 which (map to candidate solutions which) are
farther away.
In ﬁtness landscapes with weak (low) causality, small changes in the candidate solutions
often lead to large changes in the objective values, i. e., ruggedness. It then becomes harder
to decide which region of the problem space to explore and the optimizer cannot ﬁnd reliable
gradient information to follow. A small modiﬁcation of a very bad candidate solution may
then lead to a new local optimum and the best candidate solution currently known may be
surrounded by points that are inferior to all other tested individuals.
The lower the causality of an optimization problem, the more rugged its ﬁtness landscape
is, which leads to a degeneration of the performance of the optimizer [1563]. This does not
necessarily mean that it is impossible to ﬁnd good solutions, but it may take very long to
do so.
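Equation 14.3 can also be checked mechanically on a small discrete search space. The sketch below is our own toy construction: bit strings of length 3, the identity genotype-phenotype mapping, a one-max style objective, and bitwise mutation. Interestingly, even this simple setting violates the strict condition, because the Hamming distance that governs the mutation probability does not always track the objective distance.

```python
import itertools

L = 3
G = ["".join(bits) for bits in itertools.product("01", repeat=L)]

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def p_searchop(g2: str, g1: str) -> float:
    """P(g2 = searchOp1(g1)): each bit flips independently with rate 0.1."""
    d = hamming(g1, g2)
    return (0.1 ** d) * (0.9 ** (L - d))

def f(x: str) -> int:
    return x.count("1")   # identity gpm, single objective

def strongly_causal() -> bool:
    """Check Equation 14.3 over all triples (g1, g2, g3)."""
    for g1, g2, g3 in itertools.product(G, repeat=3):
        closer = abs(f(g1) - f(g2)) < abs(f(g1) - f(g3))
        if closer and not p_searchop(g2, g1) > p_searchop(g3, g1):
            return False
    return True

# One counterexample: from g1 = "010", genotype "111" (objective distance
# 2, Hamming distance 2) is more probable than "101" (objective distance
# 1, Hamming distance 3), so strongly_causal() returns False.
```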
² We have already mentioned this under the subject of exploitation.
³ This measure has the slight drawback that it is undefined for P(g_2 = searchOp_1(g_1)) = 0 or P_id = 0.
14.3 Fitness Landscape Measures
As measures for the ruggedness of a fitness landscape (or its general difficulty), many
different metrics have been proposed. Wedge and Kell [2874] and Altenberg [80] provide
nice lists of them in their work⁴, which we summarize here:
1. Weinberger [2881] introduced the autocorrelation function and the correlation length
of random walks.
2. The correlation of the search operators was used by Manderick et al. [1821] in conjunction with the autocorrelation [2267].
3. Jones and Forrest [1467, 1468] proposed the ﬁtness distance correlation (FDC), the
correlation of the ﬁtness of an individual and its distance to the global optimum. This
measure has been extended by researchers such as Clergue et al. [590, 2784].
4. The probability that search operations create oﬀspring ﬁtter than their parents, as
deﬁned by Rechenberg [2278] and Beyer [295] (and called evolvability by Altenberg
[77]), will be discussed in Section 16.2 on page 172 in depth.
5. Simulation dynamics have been researched by Altenberg [77] and Grefenstette [1130].
6. Another interesting metric is the ﬁtness variance of formae (Radcliﬀe and Surry [2246])
and schemas (Reeves and Wright [2284]).
7. The error threshold method from theoretical biology [867, 2057] has been adopted by
Ochoa et al. [2062] for Evolutionary Algorithms. It is the “critical mutation rate beyond which structures obtained by the evolutionary process are destroyed by mutation
more frequently than selection can reproduce them” [2062].
8. The negative slope coeﬃcient (NSC) by Vanneschi et al. [2785, 2786] may be considered
as an extension of Altenberg’s evolvability measure.
9. Davidor [691] uses the epistatic variance as a measure of utility of a certain represen
tation in Genetic Algorithms. We discuss the issue of epistasis in Chapter 17.
10. The genotype-fitness correlation (GFC) of Wedge and Kell [2874] is a new measure for
ruggedness in fitness landscapes and has been shown to be a good guide for determining
optimal population sizes in Genetic Programming.
11. Rana [2267] defined the Static-φ and the Basin-δ metrics based on the notion of
schemas in binary-coded search spaces.
14.3.1 Autocorrelation and Correlation Length
As an example, let us take a look at the autocorrelation function as well as the correlation
length of random walks [2881]. Here we borrow its definition from Vérel et al. [2802]:

Definition D14.2 (Autocorrelation Function). Given a random walk (x_i, x_{i+1}, . . . ),
the autocorrelation function ρ of an objective function f is the autocorrelation function
of the time series (f(x_i), f(x_{i+1}), . . . ):

ρ(k, f) = ( E[f(x_i) · f(x_{i+k})] − E[f(x_i)] · E[f(x_{i+k})] ) / D²[f(x_i)]    (14.4)

where E[f(x_i)] and D²[f(x_i)] are the expected value and the variance of f(x_i).
⁴ Especially the one of Wedge and Kell [2874] is beautiful and far more detailed than this summary here.
The correlation length τ = −1/log ρ(1, f) measures how quickly the autocorrelation function decreases
and summarizes the ruggedness of the fitness landscape: the larger the correlation length,
the lower the total variation of the landscape. From the works of Kinnear, Jr [1540] and
Lipsitch [1743], however, we also know that correlation measures do not always fully
represent the hardness of a problem landscape.
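Both quantities can be estimated empirically from a sampled random walk. The sketch below is our own illustration (objective, step size, and walk length are arbitrary choices): a smooth objective yields a lag-1 autocorrelation close to 1 and thus a long correlation length, while an uncorrelated, maximally rugged series yields a value near 0.

```python
import math
import random

def autocorrelation(samples, k):
    """Empirical estimate of rho(k, f) (Equation 14.4) for a time series
    of objective values recorded along a random walk."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    cov = sum((samples[i] - mean) * (samples[i + k] - mean)
              for i in range(n - k)) / n
    return cov / var

def correlation_length(samples):
    """tau = -1 / log(rho(1, f)); assumes 0 < rho(1, f) < 1."""
    return -1.0 / math.log(autocorrelation(samples, 1))

random.seed(1)
x, smooth = 0.0, []
for _ in range(10000):
    x += random.uniform(-0.01, 0.01)     # small random-walk steps
    smooth.append(x * x)                 # smooth objective f(x) = x^2

rugged = [random.random() for _ in range(10000)]  # uncorrelated series
```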
14.4 Countermeasures
To the knowledge of the author, no viable method which can directly mitigate the effects
of rugged fitness landscapes exists. In population-based approaches, using large population
sizes and applying methods to increase the diversity can reduce the influence of ruggedness,
but only up to a certain degree. Utilizing Lamarckian evolution [722, 2933] or the Baldwin
effect [191, 1235, 1236, 2933], i. e., incorporating a local search into the optimization process, may further help to smooth out the fitness landscape [1144] (see Section 36.2 and
Section 36.3, respectively).
Weak causality is often a home-made problem because it results to some extent from
the choice of the solution representation and search operations. We pointed out that exploration operations are important for lowering the risk of premature convergence. Exploitation
operators are just as important for refining solutions to a certain degree. In order to
apply optimization algorithms in an efficient manner, it is necessary to find representations
which allow for iterative modifications with bounded influence on the objective values, i. e.,
exploitation. In Chapter 24, we present some further rules of thumb for search space and
operation design.
Tasks T14
52. Investigate the Weierstraß Function⁵. Is it unimodal, smooth, multimodal, or even
rugged?
[5 points]
53. Why is causality important in metaheuristic optimization? Why are problems with
low causality normally harder than problems with strong causality?
[5 points]
54. Define three objective functions by yourself: a unimodal one (f_1 : R → R⁺), a simple
multimodal one (f_2 : R → R⁺), and a rugged one (f_3 : R → R⁺). All three functions
should have the same global minimum (optimum) 0 with objective value 0. Try to solve
all three functions with a trivial implementation of Hill Climbing (see Algorithm 26.1 on
page 230) by applying it 30 times to each function. Note your observations regarding
the problem hardness.
[15 points]
⁵ http://en.wikipedia.org/wiki/Weierstrass_function [accessed 2010-09-14]
Chapter 15
Deceptiveness
15.1 Introduction
Especially annoying ﬁtness landscapes show deceptiveness (or deceptivity). The gradient
of deceptive objective functions leads the optimizer away from the optima, as illustrated in
Figure 15.1 and Fig. 11.1.e on page 140.
[Figure: unimodal → unimodal with neutrality → multimodal (center and limits) deceptive → very deceptive; landscape difficulty increases]
Figure 15.1: Increasing landscape difficulty caused by deceptivity.
The term deceptiveness [288] is mainly used in the Genetic Algorithm¹ community in the
context of the Schema Theorem [1076, 1077]. Schemas describe certain areas (hyperplanes)
in the search space. If an optimization algorithm has discovered an area with a better
average ﬁtness compared to other regions, it will focus on exploring this region based on
the assumption that highly ﬁt areas are likely to contain the true optimum. Objective
functions where this is not the case are called deceptive [288, 1075, 1739]. Examples for
deceptiveness are the ND ﬁtness landscapes outlined in Section 50.2.3, trap functions (see
Section 50.2.3.1), and the fully deceptive problems given by Goldberg et al. [743, 1076, 1083]
and [288].
15.2 The Problem
Deﬁnition D15.1 (Deceptiveness). An objective function f : X →R is deceptive if a Hill
Climbing algorithm (see Chapter 26 on page 229) would be steered in a direction leading
away from all global optima in large parts of the search space.
¹ We are going to discuss Genetic Algorithms in Chapter 29 on page 325 and the Schema Theorem in Section 29.5 on page 341.
For Definition D15.1, it is usually also assumed that (a) the search space is also the problem
space (G = X), (b) the genotype-phenotype mapping is the identity mapping, and (c) the
search operations do not violate the causality (see Definition D14.1 on page 162). Still,
Definition D15.1 remains a bit blurry since the term “large parts of the search space” is
intuitive but not precise. However, it reveals the basic problem caused by deceptiveness.
If the information accumulated by an optimizer actually guides it away from the optimum,
search algorithms will perform worse than a random walk or an exhaustive enumeration
method. This issue has been known for a long time [1914, 1915, 2696, 2868] and has been
subsumed under the No Free Lunch Theorem which we will discuss in Chapter 23.
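Definition D15.1 can be made concrete with the trap functions mentioned below in Section 15.1. The following sketch is our own minimal variant on bit strings (minimization, identity mapping assumed): everywhere except at the global optimum, the gradient rewards removing ones and steers a greedy hill climber toward the deceptive all-zeros attractor.

```python
def trap(x: str) -> int:
    """Deceptive trap objective (minimization): the all-ones string is
    the global optimum, but elsewhere fewer ones always look better."""
    u = x.count("1")
    if u == len(x):
        return 0          # isolated global optimum
    return u + 1          # gradient points toward the all-zeros string

def hill_climb(x: str) -> str:
    """Greedy single-bit-flip descent until no neighbor improves."""
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:i] + ("0" if x[i] == "1" else "1") + x[i + 1:]
            if trap(y) < trap(x):
                x, improved = y, True
                break
    return x
```

Started anywhere but at the all-ones string itself, the descent ends in the deceptive local optimum.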
15.3 Countermeasures
Solving deceptive optimization tasks perfectly involves sampling many individuals with very
bad features and low ﬁtness. This contradicts the basic ideas of metaheuristics and thus,
there are no eﬃcient countermeasures against deceptivity. Using large population sizes,
maintaining a very high diversity, and utilizing linkage learning (see Section 17.3.3) are,
maybe, the only approaches which can provide at least a small chance of ﬁnding good
solutions.
A completely deceptive fitness landscape would be even worse than the needle-in-a-haystack problems outlined in Section 16.7 [? ]. Exhaustive enumeration of the search
space or searching by using random walks may be the only possible ways to go in extremely
deceptive cases. However, the question arises what kind of practical problem would have such a
feature. The solution of a completely deceptive problem would be the one that, when
looking at all possible solutions, seems to be the most improbable one. However, when
actually testing it, this worst-looking point in the problem space turns out to be the best
possible solution. Such kinds of problems are, as far as the author of this book is concerned,
rather unlikely and rare.
Tasks T15
55. Why is deceptiveness fatal for metaheuristic optimization? Why are nondeceptive
problems usually easier than deceptive ones?
[5 points]
56. Construct two different deceptive objective functions f_1, f_2 : R → R⁺ by yourself. Like
the three functions from Task 54 on page 165, they should have the global minimum
(optimum) 0 with objective value 0. Try to solve these functions with a trivial implementation of Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times
to both of them. Note your observations regarding the problem hardness and compare
them to your results from Task 54.
[7 points]
Chapter 16
Neutrality and Redundancy
16.1 Introduction
[Figure: unimodal → not much gradient, distant from optimum → with some neutrality → very neutral; landscape difficulty increases]
Figure 16.1: Landscape difficulty caused by neutrality.
Deﬁnition D16.1 (Neutrality). We consider the outcome of the application of a search
operation to an element of the search space as neutral if it yields no change in the objective
values [219, 2287].
It is challenging for optimization algorithms if the best candidate solution currently known
is situated on a plane of the ﬁtness landscape, i. e., all adjacent candidate solutions have
the same objective values. As illustrated in Figure 16.1 and Fig. 11.1.f on page 140, an
optimizer then cannot ﬁnd any gradient information and thus, no direction in which to
proceed in a systematic manner. From its point of view, each search operation will yield
identical individuals. Furthermore, optimization algorithms usually maintain a list of the
best individuals found, which will then overﬂow eventually or require pruning.
The degree of neutrality ν is deﬁned as the fraction of neutral results among all possible
products of the search operations applied to a speciﬁc genotype [219]. We can generalize
this measure to areas G in the search space G by averaging over all their elements. Regions
where ν is close to one are considered as neutral.
∀g_1 ∈ G ⇒ ν(g_1) = |{g_2 : P(g_2 = Op(g_1)) > 0 ∧ f(gpm(g_2)) = f(gpm(g_1))}| / |{g_2 : P(g_2 = Op(g_1)) > 0}|    (16.1)

∀G ⊆ G ⇒ ν(G) = (1/|G|) · Σ_{g∈G} ν(g)    (16.2)
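For a finite neighborhood, Equations 16.1 and 16.2 can be computed directly. The sketch below assumes single-bit-flip moves, the identity genotype-phenotype mapping, and an illustrative objective of our own with a built-in plateau.

```python
def neighbors(g: str):
    """All genotypes reachable with non-zero probability by the unary
    operator Op (here: deterministic single-bit flips)."""
    for i in range(len(g)):
        yield g[:i] + ("0" if g[i] == "1" else "1") + g[i + 1:]

def f(x: str) -> int:
    # Illustrative objective with a plateau: only the first half of the
    # bits matters, the remaining bits are neutral.
    return x[: len(x) // 2].count("1")

def degree_of_neutrality(g: str) -> float:
    """Equation 16.1: fraction of neutral results among all possible
    products of the search operation applied to g."""
    nbrs = list(neighbors(g))
    return sum(1 for h in nbrs if f(h) == f(g)) / len(nbrs)

def degree_of_neutrality_of_region(region) -> float:
    """Equation 16.2: average of nu over an area of the search space."""
    return sum(degree_of_neutrality(g) for g in region) / len(region)
```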
16.2 Evolvability
Another metaphor in Global Optimization which has been borrowed from biological systems [1289] is evolvability¹ [699]. Wagner [2833, 2834] points out that this word has two
uses in biology: According to Kirschner and Gerhart [1543], a biological system is evolvable if it is able to generate heritable, selectable phenotypic variations. Such properties
can then be spread by natural selection and changed during the course of evolution. In its
second sense, a system is evolvable if it can acquire new characteristics via genetic change
that help the organism(s) to survive and to reproduce. Theories about how the ability
of generating adaptive variants has evolved have been proposed by Riedl [2305], Wagner
and Altenberg [78, 2835], Bonner [357], and Conrad [624], amongst others. The idea of
evolvability can be adopted for Global Optimization as follows:
Definition D16.2 (Evolvability). The evolvability of an optimization process in its current state defines how likely it is that the search operations will lead to candidate solutions with new
(and eventually, better) objective values.
The direct probability of success [295, 2278], i. e., the chance that search operators produce
oﬀspring ﬁtter than their parents, is also sometimes referred to as evolvability in the context
of Evolutionary Algorithms [77, 80].
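This direct probability of success can be estimated by sampling. The sketch below is our own toy setup (Gaussian mutation on a quadratic objective, minimization): far from the optimum roughly half of all small mutations help, while at the optimum essentially none do, so the evolvability of a state changes over the course of a run.

```python
import random

def estimate_evolvability(x, mutate, f, trials, rng):
    """Monte Carlo estimate of the probability that a single application
    of the mutation operator yields a strictly better (smaller) value."""
    better = sum(1 for _ in range(trials) if f(mutate(x, rng)) < f(x))
    return better / trials

def mutate(x: float, rng: random.Random) -> float:
    return x + rng.gauss(0.0, 0.1)   # small Gaussian perturbation

def f(x: float) -> float:
    return x * x                     # minimize; optimum at x = 0
```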
16.3 Neutrality: Problematic and Beneﬁcial
The link between evolvability and neutrality has been discussed by many researchers [2834,
3039]. The evolvability of neutral parts of a ﬁtness landscape depends on the optimization
algorithm used. It is especially low for Hill Climbing and similar approaches, since the search
operations cannot directly provide improvements or even changes. The optimization process
then degenerates to a random walk, as illustrated in Fig. 11.1.f on page 140. The work of
Beaudoin et al. [242] on the ND fitness landscapes² shows that neutrality may “destroy”
useful information such as correlation.
Researchers in molecular evolution, on the other hand, found indications that the majority of mutations in biology have no selective influence [971, 1315] and that the transformation
from genotypes to phenotypes is a many-to-one mapping. Wagner [2834] states that neutrality in natural genomes is beneficial if it concerns only a subset of the properties peculiar to
the oﬀspring of a candidate solution while allowing meaningful modiﬁcations of the others.
Toussaint and Igel [2719] even go as far as declaring it a necessity for self-adaptation.
The theory of punctuated equilibria³, introduced in biology by Eldredge and Gould
[875, 876], states that species experience long periods of evolutionary inactivity which are
interrupted by sudden, localized, and rapid phenotypic evolutions [185].⁴ It is assumed that
the populations explore neutral layers⁵ during the time of stasis until, suddenly, a relevant
change in a genotype leads to a better adapted phenotype [2780] which then reproduces
quickly. Similar phenomena can be observed in, and are utilized by, EAs [604, 1838].
“Uh?”, you may think, “How does this ﬁt together?” The key to diﬀerentiating between
“good” and “bad” neutrality is its degree ν in relation to the number of possible solutions
maintained by the optimization algorithms. Smith et al. [2538] have used illustrative examples similar to Figure 16.2 showing that a certain amount of neutral reproductions can
foster the progress of optimization. In Fig. 16.2.a, basically the same scenario of premature
1
http://en.wikipedia.org/wiki/Evolvability [accessed 20070703]
2
See Section 50.2.3 on page 557 for a detailed elaboration on the ND ﬁtness landscape.
3
http://en.wikipedia.org/wiki/Punctuated_equilibrium [accessed 20080701]
4
A very similar idea is utilized in the Extremal Optimization method discussed in Chapter 41.
5
Or neutral networks, as discussed in Section 16.4.
16.4. NEUTRAL NETWORKS 173
convergence as in Fig. 13.1.a on page 152 is depicted. The optimizer is drawn to a local optimum from which it cannot escape anymore. Fig. 16.2.b shows that a little shot of neutrality
could form a bridge to the global optimum. The optimizer now has a chance to escape the
smaller peak if it is able to ﬁnd and follow that bridge, i. e., the evolvability of the system
has increased. If this bridge gets wider, as sketched in Fig. 16.2.c, the chance of ﬁnding the
global optimum increases as well. Of course, if the bridge gets too wide, the optimization
process may end up in a scenario like in Fig. 11.1.f on page 140 where it cannot ﬁnd any
direction. Furthermore, in this scenario we expect the neutral bridge to lead to somewhere
useful, which is not necessarily the case in reality.
[Figure: global optimum vs. local optimum]
Fig. 16.2.a: Premature Convergence. Fig. 16.2.b: Small Neutral Bridge. Fig. 16.2.c: Wide Neutral Bridge.
Figure 16.2: Possible positive influence of neutrality.
Recently, the idea of utilizing the processes of molecular⁶ and evolutionary⁷ biology as a
complement to Darwinian evolution for optimization has gained interest [212]. Scientists like
Hu and Banzhaf [1286, 1287, 1289] began to study the application of metrics such as the
evolution rate of gene sequences [2973, 3021] to Evolutionary Algorithms. Here, the degree
of neutrality (synonymous vs. nonsynonymous changes) seems to play an important role.
Examples for neutrality in fitness landscapes are the ND family (see Section 50.2.3),
the NKp and NKq models (discussed in Section 50.2.1.5), and the Royal Road (see Section 50.2.4). Another common instance of neutrality is bloat in Genetic Programming,
which is outlined in ?? on page ??.
16.4 Neutral Networks
From the idea of neutral bridges between diﬀerent parts of the search space as sketched by
Smith et al. [2538], we can derive the concept of neutral networks.
Definition D16.3 (Neutral Network). Neutral networks are equivalence classes K of
elements of the search space G which map to elements of the problem space X with the
same objective values and are connected by chains of applications of the search operators
Op [219].

∀g_1, g_2 ∈ G : g_1 ∈ K(g_2) ⊆ G ⇔ ∃k ∈ N_0 : P(g_2 = Op^k(g_1)) > 0 ∧ f(gpm(g_1)) = f(gpm(g_2))    (16.3)
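Definition D16.3 can be turned into a small program: the neutral network of a genotype can be enumerated by a breadth-first search that only follows operator applications preserving the objective value. The search space, operator, and objective below are our own toy choices (identity genotype-phenotype mapping, single-bit flips).

```python
from collections import deque

def neighbors(g: str):
    """Single-bit flips: the results of Op with non-zero probability."""
    for i in range(len(g)):
        yield g[:i] + ("0" if g[i] == "1" else "1") + g[i + 1:]

def f(x: str) -> int:
    return x[:2].count("1")   # the last bits of the genotype are neutral

def neutral_network(g: str) -> set:
    """All genotypes reachable from g via chains of search steps whose
    intermediate results all share the objective value of g."""
    network, queue = {g}, deque([g])
    while queue:
        for h in neighbors(queue.popleft()):
            if h not in network and f(h) == f(g):
                network.add(h)
                queue.append(h)
    return network
```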
⁶ http://en.wikipedia.org/wiki/Molecular_biology [accessed 2008-07-20]
⁷ http://en.wikipedia.org/wiki/Evolutionary_biology [accessed 2008-07-20]
Barnett [219] states that a neutral network has the constant innovation property if
1. the rate of discovery of innovations keeps constant for a reasonably large amount of
applications of the search operations [1314], and
2. if this rate is comparable with that of an unconstrained random walk.
Networks with this property may prove very helpful if they connect the optima in the ﬁtness
landscape. Stewart [2611] utilizes neutral networks and the idea of punctuated equilibria
in his extrema selection, a Genetic Algorithm variant that focuses on exploring individuals
which are far away from the centroid of the set of currently investigated candidate solutions
(but still have good objective values). Then again, Barnett [218] showed that populations
in Genetic Algorithms tend to dwell in neutral networks of high dimensions of neutrality
regardless of their objective values, which (obviously) cannot be considered advantageous.
The convergence on neutral networks has furthermore been studied by Bornberg-Bauer
and Chan [361], van Nimwegen et al. [2778, 2779], and Wilke [2942]. Their results show that
the topology of neutral networks strongly determines the distribution of genotypes on them.
Generally, the genotypes are “drawn” to the solutions with the highest degree of neutrality
ν on the neutral network (Beaudoin et al. [242]).
16.5 Redundancy: Problematic and Beneﬁcial
Definition D16.4 (Redundancy). Redundancy in the context of Global Optimization is
a feature of the genotype-phenotype mapping and means that multiple genotypes map to
the same phenotype, i. e., the genotype-phenotype mapping is not injective.

∃g_1, g_2 : g_1 ≠ g_2 ∧ gpm(g_1) = gpm(g_2)    (16.4)
The role of redundancy in the genome is as controversial as that of neutrality [2880].
There exist many accounts of its positive influence on the optimization process. Shipman et al. [2458, 2480], for instance, tried to mimic desirable evolutionary properties of
RNA folding [1315]. They developed redundant genotype-phenotype mappings using voting
(both via uniform redundancy and via a non-trivial approach), Turing machine-like binary
instructions, cellular automata, and random Boolean networks [1500]. Except for the trivial
voting mechanism based on uniform redundancy, the mappings induced neutral networks
which proved beneficial for exploring the problem space. Especially the last approach provided particularly good results [2458, 2480]. Possibly converse effects like epistasis (see
Chapter 17) arising from the new genotype-phenotype mappings have not been considered
in this study.
Redundancy can have a strong impact on the explorability of the problem space. When
utilizing a one-to-one mapping, the translation of a slightly modified genotype will always
result in a different phenotype. If there exists a many-to-one mapping between genotypes
and phenotypes, the search operations can create offspring genotypes different from the
parent which still translate to the same phenotype. The optimizer may now walk along a
path through this neutral network. If many genotypes along this path can be modified to
different offspring, many new candidate solutions can be reached [2480]. One example of
beneficial redundancy is the extra-dimensional bypass idea discussed in Section 24.15.1.
The experiments of Shipman et al. [2479, 2481] additionally indicate that neutrality
in the genotypephenotype mapping can have positive eﬀects. In the Cartesian Genetic
Programming method, neutrality is explicitly introduced in order to increase the evolvability
(see ?? on page ??) [2792, 3042].
Yet, Rothlauf [2338] and Shackleton et al. [2458] show that simple uniform redundancy
is not necessarily beneﬁcial for the optimization process and may even slow it down. Ronald
et al. [2324] point out that the optimization may be slowed down if the population of the
individuals under investigation contains many isomorphic genotypes, i. e., genotypes which
encode the same phenotype. If this isomorphism can be identified and removed, a significant
speedup may be gained [2324]. We can thus conclude that there is no use in introducing
encodings which, for instance, represent each phenotypic bit with two bits in the genotype
where 00 and 01 map to 0 and 10 and 11 map to 1. Another example for this issue is given
in Fig. 24.1.b on page 220.
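The criticized two-bits-per-phenotypic-bit encoding can be written down directly; under it, every phenotype has exactly as many genotypic pre-images, none of which adds anything useful. The function name below is our own.

```python
import itertools

def gpm(g: str) -> str:
    """Uniformly redundant mapping: genotype bits come in pairs and only
    the first bit of each pair carries information (00/01 -> 0,
    10/11 -> 1)."""
    return "".join(g[i] for i in range(0, len(g), 2))

# Group all 4-bit genotypes by the 2-bit phenotype they encode: every
# phenotype has exactly four isomorphic genotypes.
preimages = {}
for g in ("".join(b) for b in itertools.product("01", repeat=4)):
    preimages.setdefault(gpm(g), []).append(g)
```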
16.6 Summary
Different from ruggedness, which is always bad for optimization algorithms, neutrality has
aspects that may further as well as hinder the process of finding good solutions. Generally,
we can state that degrees of neutrality ν very close to 1 degenerate optimization processes
to random walks. Some forms of neutral networks accompanied by low (non-zero) values of
ν can improve the evolvability and hence, increase the chance of finding good solutions.
Adverse forms of neutrality are often caused by bad design of the search space or
genotype-phenotype mapping. Uniform redundancy in the genome should be avoided where
possible and the amount of neutrality in the search space should generally be limited.
16.7 Needle-In-A-Haystack
Besides fully deceptive problems, one of the worst cases of fitness landscapes is the needle-in-a-haystack (NIAH) problem [? ] sketched in Fig. 11.1.g on page 140, where the optimum
occurs as an isolated [1078] spike in a plane. In other words, small instances of extreme ruggedness combine with a general lack of information in the fitness landscape. Such problems
are extremely hard to solve and the optimization processes often will converge prematurely
or take very long to find the global optimum. An example of such fitness landscapes is
the all-or-nothing property often inherent to Genetic Programming of algorithms [2729], as
discussed in ?? on page ??.
Tasks T16
57. Describe the possible positive and negative aspects of neutrality in the fitness landscape.
[10 points]
58. Construct two different objective functions f_1, f_2 : R → R⁺ which have neutral parts
in the ﬁtness landscape. Like the three functions from Task 54 on page 165 and the
two from Task 56 on page 169, they should have the global minimum (optimum) 0
with objective value 0. Try to solve these functions with a trivial implementation of
Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times to both of
them. Note your observations regarding the problem hardness and compare them to
your results from Task 54.
[7 points]
Chapter 17
Epistasis, Pleiotropy, and Separability
17.1 Introduction
17.1.1 Epistasis and Pleiotropy
In biology, epistasis¹ is defined as a form of interaction between different genes [2170].
The term was coined by Bateson [231] and originally meant that one gene suppresses the
phenotypical expression of another gene. In the context of statistical genetics, epistasis was
initially called “epistacy” by Fisher [928]. According to Lush [1799], the interaction between
genes is epistatic if the effect on the fitness of altering one gene depends on the allelic state
of other genes.
Definition D17.1 (Epistasis). In optimization, epistasis is the dependency of the contribution of one gene to the value of the objective functions on the allelic state of other
genes [79, 692, 2007].
We speak of minimal epistasis when every gene is independent of every other gene. Then, the optimization process equals finding the best value for each gene and can most efficiently be carried out by a simple greedy search (see Section 46.3.1) [692]. A problem is maximally epistatic when no proper subset of genes is independent of any other gene [2007, 2562]. Our understanding of epistasis and its effects comes close to another biological phenomenon:
Definition D17.2 (Pleiotropy). Pleiotropy² denotes that a single gene is responsible for multiple phenotypical traits [1288, 2944].
Like epistasis, pleiotropy can sometimes lead to unexpected improvements but often is harmful for the evolutionary system [77]. Both phenomena may easily intertwine. If one gene epistatically influences, for instance, two others which are responsible for distinct phenotypical traits, it has both epistatic and pleiotropic effects. We will therefore consider pleiotropy and epistasis together and, when discussing the effects of the latter, also implicitly refer to the former.
Example E17.1 (Pleiotropy, Epistasis, and a Dinosaur).
In Figure 17.1 we illustrate a fictional dinosaur along with a snippet of its fictional genome consisting of four genes.

¹ http://en.wikipedia.org/wiki/Epistasis [accessed 2008-05-31]
² http://en.wikipedia.org/wiki/Pleiotropy [accessed 2008-03-02]

Figure 17.1: Pleiotropy and epistasis in a dinosaur’s genome.

Gene 1 influences the color of the creature and is neither pleiotropic
nor has any epistatic relations. Gene 2, however, exhibits pleiotropy since it determines the
length of the hind legs and forelegs. At the same time, it is epistatically connected with
gene 3 which too, inﬂuences the length of the forelegs – maybe preventing them from looking
exactly like the hind legs. The fourth gene is again pleiotropic by determining the shape of
the bone armor on the top of the dinosaur’s skull.
From the point of view of Design of Experiments (DOE), epistasis can be considered as the higher-order or non-main effects in a model predicting fitness values based on interactions of the genes of the genotypes [2285].
17.1.2 Separability
In the area of optimization over continuous problem spaces (such as X ⊆ R^n), epistasis and pleiotropy are closely related to the term separability. Separability is a feature of the objective functions of an optimization problem [2668]. A function of n variables is separable if it can be rewritten as a sum of n functions of just one variable [1162, 2099]. Hence, the decision variables involved in the problem can be optimized independently of each other, i. e., are minimally epistatic, and the problem is said to be separable according to the formal definition given by Auger et al. [147, 1181] as follows:
Definition D17.3 (Separability). A function f : X^n → R (where usually X ⊆ R) is separable if and only if

\[ \arg\min_{(x_1,..,x_n)} f(x_1,..,x_n) \;=\; \left( \arg\min_{x_1} f(x_1,..)\,,\; ..\,,\; \arg\min_{x_n} f(..,x_n) \right) \tag{17.1} \]

Otherwise, f(x) is called a non-separable function.
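Definition D17.3 can be checked numerically for a toy function. The following sketch is our own illustration (the function, the grid, and all names are made-up), not code from the book's sources: for a separable function, optimizing each variable on its own recovers the joint optimum.

```python
# Illustration of separability (Definition D17.3): for a separable
# function, per-variable minimization equals joint minimization.
# Grid search keeps the sketch simple; all names are hypothetical.

def f_sep(x1, x2):
    # separable: f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2
    return (x1 - 1) ** 2 + (x2 + 2) ** 2

grid = [i / 10.0 for i in range(-50, 51)]  # -5.0 .. 5.0 in steps of 0.1

# joint minimization over the full grid
joint = min(((x1, x2) for x1 in grid for x2 in grid),
            key=lambda p: f_sep(p[0], p[1]))

# independent per-variable minimization (the other variable fixed arbitrarily)
best_x1 = min(grid, key=lambda x1: f_sep(x1, 0.0))
best_x2 = min(grid, key=lambda x2: f_sep(0.0, x2))

print(joint, (best_x1, best_x2))  # both yield (1.0, -2.0)
```

For a non-separable function such as f(x1, x2) = (x1 − x2)² + x1², the per-variable result would depend on the value at which the other variable is fixed.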
If a function f(x) is separable, the parameters x1, .., xn forming the candidate solution x ∈ X are called independent. A separable problem is decomposable, as already discussed under the topic of epistasis.
Definition D17.4 (Non-Separability). A function f : X^n → R (where usually X ⊆ R) is m-non-separable if at most m of its parameters xi are not independent. A non-separable function f(x) is called fully non-separable if any two of its parameters xi are not independent.
The higher the degree of non-separability, the more difficult a function usually becomes for optimization [147, 1181, 2635]. Often, the term non-separable is used in the sense of fully non-separable. In between separable and fully non-separable problems, however, a variety of partially separable problems exists. Definition D17.5 is, hence, an alternative to Definition D17.4.
Definition D17.5 (Partial Separability). An objective function f : X^n → R+ is partially separable if it can be expressed as

\[ f(x) \;=\; \sum_{i=1}^{M} f_i\!\left( x_{I_{i,1}},\, ..,\, x_{I_{i,|I_i|}} \right) \tag{17.2} \]

where each of the M component functions f_i : X^{|I_i|} → R depends only on a subset I_i ⊆ {1, .., n} of the n decision variables [612, 613, 1136, 1137].
17.1.3 Examples
Examples of problems with a high degree of epistasis are Kauffman’s NK fitness landscape [1499, 1501] (Section 50.2.1), the p-Spin model [86] (Section 50.2.2), and the tunable model by Weise et al. [2910] (Section 50.2.7). Non-separable objective functions often used for benchmarking are, for instance, the Griewank function [1106, 2934, 3026] (see Section 50.3.1.11) and Rosenbrock’s function [3026] (see Section 50.3.1.5). Partially separable test problems are given in, for example, [2670, 2708].
17.2 The Problem
As sketched in Figure 17.2, epistasis has a strong inﬂuence on many of the previously
discussed problematic features. If one gene can “turn oﬀ” or aﬀect the expression of other
genes, a modiﬁcation of this gene will lead to a large change in the features of the phenotype.
Hence, the causality will be weakened and ruggedness ensues in the ﬁtness landscape. It also
becomes harder to deﬁne search operations with exploitive character. Moreover, subsequent
changes to the “deactivated” genes may have no inﬂuence on the phenotype at all, which
would then increase the degree of neutrality in the search space. Bowers [375] found that
representations and genotypes with low pleiotropy lead to better and more robust solutions.
Epistasis and pleiotropy are mainly aspects of the way in which the objective functions f ∈ f, the genome G, and the genotype-phenotype mapping are defined and interact. They should be avoided where possible.
Figure 17.2: The influence of epistasis on the fitness landscape (high epistasis causes ruggedness, multi-modality, weak causality, neutrality, and needle-in-a-haystack problems).
Example E17.2 (Epistasis in Binary Genomes).
Consider the following three objective functions f1 to f3, all subject to maximization. All three are based on a problem space consisting of all bit strings of length five (X_B = B^5 = {0, 1}^5).

\[ f_1(x) = \sum_{i=0}^{4} x[i] \tag{17.3} \]

\[ f_2(x) = (x[0] \cdot x[1]) + (x[2] \cdot x[3]) + x[4] \tag{17.4} \]

\[ f_3(x) = \prod_{i=0}^{4} x[i] \tag{17.5} \]
Furthermore, let the search space be identical to the problem space, i. e., G = X. In the optimization problem defined by X_B and f1, the elements of the candidate solutions are totally independent and can be optimized separately. Thus, f1 is minimally epistatic. In f2, only the first two (x[0], x[1]) and second two (x[2], x[3]) genes of each genotype influence each other. Both groups are independent from each other and from x[4]. Hence, the problem has some epistasis. f3 is a needle-in-a-haystack problem where all five genes directly influence each other. A single zero in a candidate solution deactivates all other genes, and the optimum – the string containing all ones – is the only point in the search space with a non-zero objective value.
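A small sketch (ours, not taken from the book's companion sources) makes the difference tangible: f1 yields to a greedy per-gene search, while under f3 almost the entire landscape is flat.

```python
# Sketch (not from the book's sources): the three objective functions of
# Example E17.2 on bit strings of length 5, all subject to maximization.

def f1(x):  # minimally epistatic: plain sum of bits
    return sum(x)

def f2(x):  # some epistasis: two interacting gene pairs plus x[4]
    return x[0] * x[1] + x[2] * x[3] + x[4]

def f3(x):  # needle in a haystack: product, a single 0 "deactivates" the rest
    p = 1
    for bit in x:
        p *= bit
    return p

# f1 can be maximized gene by gene (greedy, a single pass suffices):
x = [0, 0, 0, 0, 0]
for i in range(5):
    x[i] = max((0, 1), key=lambda b: f1(x[:i] + [b] + x[i + 1:]))
print(x, f1(x))  # [1, 1, 1, 1, 1] 5

# Under f3, every string containing at least one 0 evaluates to 0, so the
# landscape gives no hint toward the needle (1, 1, 1, 1, 1):
print(f3([1, 1, 0, 1, 1]), f3([0, 0, 0, 0, 0]), f3([1, 1, 1, 1, 1]))
```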
Example E17.3 (Epistasis in Genetic Programming).
Epistasis can be observed especially often in Genetic Programming. In this area of Evolutionary Computation, the problem spaces are sets of all programs in (usually) domain-specific languages, and we try to find candidate solutions which solve a given problem “algorithmically”.
1 int gcd(int a, int b) { // skeleton code
2   int t;                // skeleton code
3   while(b != 0) {       //
4     t = b;              //
5     b = a % b;          //   Evolved Code
6     a = t;              //
7   }                     //
8   return a;             // skeleton code
9 }                       // skeleton code
Listing 17.1: A program computing the greatest common divisor.
Let us assume that we wish to evolve a program computing the greatest common divisor of two natural numbers a, b ∈ N1 in a C-style representation (X = C). Listing 17.1 would be a program which fulfills this task. As search space G we could, for instance, use all trees where the inner nodes are chosen from while, =, !=, ==, %, +, and -, each accepting two child nodes, and the leaf nodes are either a, b, t, 1, or 0. The evolved trees could be pasted by the genotype-phenotype mapping into some skeleton code as indicated in Listing 17.1.
The genes of this representation are the nodes of the trees. A unary search operator
(mutation) could exchange a node inside a genotype with a randomly created one and a
binary search operator (crossover) could swap subtrees between two genotypes. Such a
mutation operator could, for instance, replace the “t = b” in Line 4 of Listing 17.1 with
a “t = a”. Although only a single gene of the program has been modiﬁed, its behavior
has now changed completely. Without stating any speciﬁc objective functions, it should be
clear that the new program will have very diﬀerent (functional) objective values. Changing
the instruction has epistatic impact on the behavior of the rest of the program – the loop
declared in Line 3, for instance, may now become an endless loop for many inputs a and b.
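This epistatic effect can be reproduced directly. The Python translations below of Listing 17.1 and of its mutated variant are ours, given purely for illustration:

```python
# Python translation (ours) of Listing 17.1 and of the variant in which
# the single instruction "t = b" (Line 4) has been mutated into "t = a".

def gcd_original(a, b):
    while b != 0:
        t = b          # Line 4 of Listing 17.1
        b = a % b
        a = t
    return a

def gcd_mutated(a, b):
    while b != 0:
        t = a          # the mutated gene: "t = a" instead of "t = b"
        b = a % b
        a = t          # a is overwritten with its own old value
    return a

print(gcd_original(6, 4))  # 2, the greatest common divisor
print(gcd_mutated(6, 4))   # 6: a never changes, the loop merely shrinks b
```

A single exchanged node thus changes the semantics of every following iteration: the mutant always returns its first argument instead of the greatest common divisor.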
Example E17.4 (Epistasis in Combinatorial Optimization: Bin Packing).
In Example E1.1 on page 25 we introduced bin packing as a typical combinatorial optimization problem. Later, in Task 67 on page 238, we propose a solution approach to a variant of this problem. As discussed in this task and sketched in Figure 17.3, we suggest representing a solution to bin packing as a permutation of the n objects to be packed into the bin and to set G = X = Π(n). In other words, a candidate solution describes the order in which the objects have to be placed into the bin and is processed from left to right. Whenever the current object does not fit into the current bin anymore, a new bin is “opened”. An implementation of this idea is given in Section 57.1.2.1 and an example output of a test program applying Hill Climbing, Simulated Annealing, a random walk, and the random sampling algorithm is listed in Listing 57.16 on page 903.
As can be seen in that listing of results, neither Hill Climbing nor Simulated Annealing
can come very close to the known optimal solutions. One reason for this performance is
that the permutation representation for bin packing exhibits some epistasis. In Figure 17.3,
we sketch how the exchange of two genes near the beginning of a candidate solution x1 can lead to a new solution x2 with a very different behavior: Due to the sequential packing of objects into bins according to the permutation described by the candidate solutions, object groupings far after the actually swapped objects may also be broken up. In other words, the genes at the beginning of the representation have a strong influence on the behavior at its end. Interestingly (and obviously), this influence is not mutual: the genes at the end of the permutation have no influence on the genes preceding them.
Suggestions for a better encoding, an extension of the permutation encoding with so-called separators marking the beginning and end of a bin, have been made by Jones and Beltramo [1460] and Stawowy [2604]. Khuri et al. [1534] suggest a method which works completely without permutations: Each gene of a string of length n stands for one of the n objects and its (numerical) allele denotes the bin into which the object belongs. Though this encoding allows for infeasible solutions, it does not suffer from the epistasis phenomenon discussed in this example.
Figure 17.3: Epistasis in the permutation representation for bin packing. (12 objects with identifiers 0 to 11 and sizes a = (2, 3, 3, 3, 4, 5, 5, 7, 8, 9, 11, 12) should be packed into bins of size b = 14. The first candidate solution, g1 = x1 = (10, 11, 6, 0, 2, 9, 4, 8, 5, 7, 1, 3), requires 6 bins. Swapping the first and fourth element of the permutation yields g2 = x2 = (0, 11, 6, 10, 2, 9, 4, 8, 5, 7, 1, 3), which requires 7 bins and has a very different “behavior”.)
Generally, epistasis and conflicting objectives in multi-objective optimization should be distinguished from each other. Epistasis as well as pleiotropy is a property of the influence of the editable elements (the genes) of the genotypes on the phenotypes. Objective functions can conflict without the involvement of any of these phenomena. We can, for example, define two objective functions f1(x) = x and f2(x) = −x which are clearly contradicting, regardless of whether they both are subject to maximization or minimization. Nevertheless, if the candidate solutions x and the genotypes are simple real numbers and the genotype-phenotype mapping is an identity mapping, neither epistatic nor pleiotropic effects can occur.
Naudts and Verschoren [2008] have shown for the special case of length-two binary string genomes that deceptiveness does not occur in situations with low epistasis and also that objective functions with high epistasis are not necessarily deceptive. The authors of [239] state that deceptiveness is a special case of epistatic interaction (while we would rather say that it is a phenomenon caused by epistasis). Another discussion about different shapes of fitness landscapes under the influence of epistasis is given by Beerenwinkel et al. [248].
17.3 Countermeasures
We have shown that epistasis is a root cause for multiple problematic features of optimization
tasks. General countermeasures against epistasis can be divided into two groups. The
symptoms of epistasis can be mitigated with the same methods which increase the chance
of ﬁnding good solutions in the presence of ruggedness or neutrality.
17.3.1 Choice of the Representation
Epistasis itself is a feature which results from the choice of the search space structure, the search operations, the genotype-phenotype mapping, and the structure of the problem space. Avoiding epistatic effects should be a major concern during their design.
Example E17.5 (Bad Representation Design: Algorithm 1.1).
Algorithm 1.1 on page 40 is a good example of bad search space design. There, the assignment of n objects to k bins in order to solve the bin packing problem is represented as a permutation of the first n natural numbers. The genotype-phenotype mapping starts to take numbers from the start of the genotype sequentially and places the corresponding objects into the first bin. If the object identified by the current gene does not fit into the bin anymore, the GPM continues with the next bin.
Representing an assignment of objects to bins like this has one drawback: if the genes at
lower loci, i. e., close to the beginning of the genotype, are modiﬁed, this may change many
subsequent assignments. If a large object is swapped from the back to an early bin, the
other objects may be shifted backwards and many of them will land in diﬀerent bins than
before. The meaning of the gene at locus i in the genotype depends on the allelic states of
all genes at loci smaller than i. Hence, this representation is quite epistatic.
Interestingly, using the representation suggested in Example E1.1 on page 25 and in
Equation 1.2 directly as search space would not only save us the work of implementing a
genotype-phenotype mapping, it would also exhibit significantly lower epistasis.
Choosing the problem space and the genotype-phenotype mapping efficiently can lead to a great improvement in the quality of the solutions produced by the optimization process [2888, 2905]. Introducing specialized search operations can achieve the same [2478].
We outline some general rules for representation design in Chapter 24. Considering these
suggestions may further improve the performance of an optimizer.
17.3.2 Parameter Tweaking
Using larger populations and favoring explorative search operations seems to be helpful in
epistatic problems since this should increase the diversity. Shinkai et al. [2478] showed that
applying (a) higher selection pressure, i. e., increasing the chance to pick the best candidate
solutions for further investigation instead of the weaker ones, and (b) extinctive selection,
i. e., only working with the newest produced set of candidate solutions while discarding
the ones from which they were derived, can also increase the reliability of an optimizer to
ﬁnd good solutions. These two concepts are slightly contradicting, so careful adjustment of
the algorithm settings appears to be vital in epistatic environments. Shinkai et al. [2478] observed that higher selection pressure also leads to earlier convergence, a fact that we pointed out in Chapter 13.
17.3.3 Linkage Learning
According to Winter et al. [2960], linkage is “the tendency for alleles of diﬀerent genes to
be passed together from one generation to the next” in genetics. This usually indicates that
these genes are closely located in the same chromosome. In the context of Evolutionary Algorithms, this notion is not useful, since identifying spatially close elements inside the genotypes g ∈ G is trivial. Instead, we are interested in alleles of different genes which have a joint effect on the fitness [1980, 1981].
Identifying these linked genes, i. e., learning their epistatic interaction, is very helpful for the optimization process. Such knowledge can be used, for instance, to protect building blocks³ from being destroyed by the search operations (such as crossover in Genetic Algorithms). Finding approaches for linkage learning has become an especially popular discipline in the area of Evolutionary Algorithms with binary [552, 1196, 1981] and real [754] genomes.

³ See Section 29.5.5 for information on the Building Block Hypothesis.
Two important methods from this area are the messy GA (mGA, see Section 29.6) by
Goldberg et al. [1083] and the Bayesian Optimization Algorithm (BOA) [483, 2151]. Module
acquisition [108] may be considered as such an eﬀort.
17.3.4 Cooperative Coevolution
Variable Interaction Learning (VIL) techniques can be used to discover the relations be
tween decision variables, as introduced by Chen et al. [551] and discussed in the context of
cooperative coevolution in Section 21.2.3 on page 204.
Tasks T17
59. What is the diﬀerence between epistasis and pleiotropy?
[5 points]
60. Why does epistasis cause ruggedness in the ﬁtness landscape?
[5 points]
61. Why can epistasis cause neutrality in the ﬁtness landscape?
[5 points]
62. Why does pleiotropy cause ruggedness in the ﬁtness landscape?
[5 points]
63. How are epistasis and separability related to each other?
[5 points]
Chapter 18
Noise and Robustness
18.1 Introduction – Noise
In the context of optimization, three types of noise can be distinguished. The first form is (i) noise in the training data used as the basis for learning or for computing the objective functions [3019]. In many applications of machine learning or optimization where a model of a given system is to be learned, data samples including the input of the system and its measured response are used for training.
Example E18.1 (Noise in the Training Data).
Some typical examples of situations where training data is the basis for the objective function
evaluation are
1. the usage of Global Optimization for building classifiers (for example for predicting buying behavior using data gathered in a customer survey for training),

2. the usage of simulations for determining the objective values in Genetic Programming (here, the simulated scenarios correspond to training cases), and

3. the fitting of mathematical functions to (x, y) data samples (with artificial neural networks or symbolic regression, for instance).
Since no measurement device is 100% accurate and there are always random errors, noise is
present in such optimization problems.
Besides inexactnesses and fluctuations in the input data of the optimization process, perturbations are also likely to occur during the application of its results. This category subsumes the other two types of noise: perturbations that may arise from (ii) inaccuracies in the process of realizing the solutions and (iii) environmentally induced perturbations during the applications of the products.
Example E18.2 (Creating the perfect Tire).
This issue can be illustrated by using the process of developing the perfect tire for a car
as an example. As input for the optimizer, all sorts of material coeﬃcients and geometric
constants measured from all known types of wheels and rubber could be available. Since
these constants have been measured or calculated from measurements, they include a certain
degree of noise and imprecision (i).
The result of the optimization process will be the best tire construction plan discovered
during its course and it will likely incorporate diﬀerent materials and structures. We would
hope that the tires created according to the plan will not fall apart if, accidentally, an extra 0.0001% of a specific rubber component is used (ii). During the optimization process,
the behavior of many construction plans will be simulated in order to ﬁnd out about their
utility. When actually manufactured, the tires should not behave unexpectedly when used in
scenarios diﬀerent from those simulated (iii) and should instead be applicable in all driving
situations likely to occur.
The eﬀects of noise in optimization have been studied by various researchers; Miller and
Goldberg [1897, 1898], Lee and Wong [1703], and Gurin and Rastrigin [1155] are some of
them. Many Global Optimization algorithms and theoretical results have been proposed
which can deal with noise. Some of them are, for instance, specialized
1. Genetic Algorithms [932, 1544, 2389, 2390, 2733, 2734],
2. Evolution Strategies [168, 294, 1172], and
3. Particle Swarm Optimization [1175, 2119] approaches.
18.2 The Problem: Need for Robustness
The goal of Global Optimization is to ﬁnd the global optima of the objective functions. While
this is fully true from a theoretical point of view, it may not suﬃce in practice. Optimization
problems are normally used to ﬁnd good parameters or designs for components or plans to
be put into action by human beings or machines. As we have already pointed out, there will
always be noise and perturbations in practical realizations of the results of optimization.
There is no process in the world that is 100% accurate and the optimized parameters,
designs, and plans have to tolerate a certain degree of imprecision.
Definition D18.1 (Robustness). A system in engineering or biology is robust if it is able to function properly in the face of genetic or environmental perturbations [1288, 2833].
Therefore, a local optimum (or even a non-optimal element) for which slight disturbances only lead to gentle performance degradation is usually favored over a global optimum located in a highly rugged area of the fitness landscape [395]. In other words, local optima in regions of the fitness landscape with strong causality are sometimes better than global optima with weak causality. Of course, the level of this acceptability is application-dependent.
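This trade-off can be made concrete with a small numerical sketch (a toy landscape of our own, subject to maximization, not an example from the book): averaged over perturbations, the broad local optimum outperforms the sharp global one.

```python
# Toy landscape (ours), subject to maximization: a sharp global optimum
# at x = 0 and a broad, robust local optimum around x = 3.

def f(x):
    sharp = 5.0 if abs(x) < 0.05 else 0.0    # narrow spike of height 5
    broad = 3.0 - min(3.0, abs(x - 3.0))     # wide hill of height 3
    return max(sharp, broad)

def expected_f(x, deltas):
    # average objective value under the disturbances delta
    return sum(f(x + d) for d in deltas) / len(deltas)

deltas = [i / 10.0 for i in range(-5, 6)]    # perturbations -0.5 .. 0.5

print(f(0.0), f(3.0))  # 5.0 3.0: x = 0 is the global optimum
print(expected_f(0.0, deltas), expected_f(3.0, deltas))
# under perturbation, the robust local optimum at x = 3 scores much higher
```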
Figure 18.1 illustrates the issue of local optima which are robust vs. global optima which
are not. Table 18.1 gives some examples on problems which require robust optimization.
Example E18.3 (Critical Robustness).
When optimizing the control parameters of an airplane or a nuclear power plant, the global
optimum is certainly not used if a slight perturbation can have hazardous eﬀects on the
system [2734].
Example E18.4 (Robustness in Manufacturing).
Wiesmann et al. [2936, 2937] bring up the topic of manufacturing tolerances in multilayer
optical coatings. It is no use to ﬁnd optimal conﬁgurations if they only perform optimal
when manufactured to a precision which is either impossible or too hard to achieve on a
constant basis.
Example E18.5 (Dynamic Robustness in RoadSalting).
The optimization of the decision process on which roads should be precautionary salted for
areas with marginal winter climate is an example of the need for dynamic robustness. The
global optimum of this problem is likely to depend on the daily (or even current) weather
forecast and may therefore be constantly changing. Handa et al. [1177] point out that it is
practically infeasible to let road workers follow a constantly changing plan and circumvent
this problem by incorporating multiple road temperature settings in the objective function
evaluation.
Example E18.6 (Robustness in Nature).
Tsutsui et al. [2733, 2734] found a nice analogy in nature: The phenotypic characteristics
of an individual are described by its genetic code. During the interpretation of this code,
perturbations like abnormal temperature, nutritional imbalances, injuries, illnesses and so
on may occur. If the phenotypic features emerging under these inﬂuences have low ﬁtness,
the organism cannot survive and procreate. Thus, even a species with good genetic material
will die out if its phenotypic features become too sensitive to perturbations. Species robust
against them, on the other hand, will survive and evolve.
Figure 18.1: A robust local optimum vs. an “unstable” global optimum.
Table 18.1: Applications and Examples of robust methods.

Area                                 References
Combinatorial Problems               [644, 1177]
Distributed Algorithms and Systems   [52, 1401, 2898]
Engineering                          [52, 644, 1134, 1135, 2936, 2937]
Graph Theory                         [52, 63, 67, 644, 1401]
Image Synthesis                      [375]
Logistics                            [1177]
Mathematics                          [2898]
Motor Control                        [489]
Multi-Agent Systems                  [2925]
Software                             [1401]
Wireless Communication               [63, 67, 644]
18.3 Countermeasures
For the special case where the phenome is a real vector space (X ⊆ R^n), several approaches for dealing with the need for robustness have been developed. Inspired by Taguchi methods¹ [2647], possible disturbances are represented by a vector δ = (δ1, δ2, .., δn)ᵀ, δi ∈ R, in the method suggested by Greiner [1134, 1135]. If the distributions and influences of the δi are known, the objective function f(x) : x ∈ X can be rewritten as f̃(x, δ) [2937]. In the special case where δ is normally distributed, this can be simplified to f̃((x1 + δ1, x2 + δ2, .., xn + δn)ᵀ). It would then make sense to sample the probability distribution of δ a number of t times and to use the mean values of f̃(x, δ) for each objective function evaluation during the optimization process. In cases where the optimal value y⋆ of the objective function f is known, Equation 18.1 can be minimized. This approach is also used in the work of Wiesmann et al. [2936, 2937] and basically turns the optimization algorithm into something like a maximum likelihood estimator (see Section 53.6.2 and Equation 53.257 on page 695).

\[ f'(x) \;=\; \frac{1}{t} \sum_{i=1}^{t} \left( y^{\star} - \tilde{f}\!\left(x, \delta_i\right) \right)^2 \tag{18.1} \]

This method corresponds to using multiple, different training scenarios during the objective function evaluation in situations where X ⊄ R^n. By adding random noise and artificial perturbations to the training cases, the chance of obtaining robust solutions which are stable when applied or realized under noisy conditions can be increased.

¹ http://en.wikipedia.org/wiki/Taguchi_methods [accessed 2008-07-19]
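Equation 18.1 can be approximated by plain Monte Carlo sampling. The following sketch is our own illustration (toy objective f(x) = x², hypothetical parameter names), not the implementation used by Greiner [1134, 1135] or Wiesmann et al. [2936, 2937]:

```python
import random

# Sketch of Equation 18.1: estimate the robust objective f'(x) by sampling
# t disturbance values delta and averaging the squared deviation of
# f~(x + delta) from the known optimal value y_star. Toy function: ours.

def f_tilde(x):
    # evaluation of f(x) = x^2 at an already perturbed point
    return x * x

def f_prime(x, y_star, t, sigma, rng):
    total = 0.0
    for _ in range(t):
        delta = rng.gauss(0.0, sigma)       # delta ~ N(0, sigma^2)
        total += (y_star - f_tilde(x + delta)) ** 2
    return total / t

rng = random.Random(42)                     # fixed seed for repeatability
y_star = 0.0                                # known optimum of f(x) = x^2
print(f_prime(0.0, y_star, 1000, 0.1, rng)) # close to 0: x = 0 stays good
print(f_prime(2.0, y_star, 1000, 0.1, rng)) # large: far from the optimum
```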
Chapter 19
Overﬁtting and Oversimpliﬁcation
In all scenarios where optimizers evaluate some of the objective values of the candidate solutions by using training data, two additional phenomena with negative influence can be observed: overfitting and (what we call) oversimplification. This is especially true if the solutions represent functional dependencies inside the training data St.
Example E19.1 (Function Approximation).
Let us assume that St ⊆ S is the training data sample from a space S of possible data. Let us assume in the following that this space is the cross product S = A × B, where B contains the values of the variables whose dependencies on the input variables from A should be learned. The structure of each single element s[i] of the training data sample St = (s[0], s[1], .., s[n−1]) is s[i] = (a[i], b[i]) : a[i] ∈ A, b[i] ∈ B. The candidate solutions would actually be functions x : A → B and the goal is to find the optimal function x⋆ which exactly represents the (previously unknown) dependencies between A and B as they occur in St, i. e., a function with x⋆(a[i]) ≈ b[i] ∀i ∈ 0..n−1.

Artificial neural networks, (normal) regression, and symbolic regression are the most common techniques to solve such problems, but other techniques such as Particle Swarm Optimization, Evolution Strategies, and Differential Evolution can be used too. Then, the candidate solutions may, for instance, be polynomials x(a) = x0 + x1·a + x2·a² + . . . or otherwise structured functions, and only the coefficients x0, x1, x2, . . . would be optimized. When trying to find good candidate solutions x : A → B, possible objective functions (subject to minimization) are

\[ f_E(x) = \sum_{i=0}^{n-1} \left| x(a[i]) - b[i] \right| \qquad \text{or} \qquad f_{SE}(x) = \sum_{i=0}^{n-1} \left( x(a[i]) - b[i] \right)^2 . \]
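The two objective functions can be written down directly. The following sketch is our own (made-up training data, hypothetical names) and uses the squared-error variant fSE on polynomial candidate solutions:

```python
# Sketch of the objective f_SE from Example E19.1: candidate solutions are
# the coefficients (x0, x1, x2) of a polynomial; the training data is made up.

def evaluate(coeffs, a):
    # x(a) = x0 + x1*a + x2*a^2
    return coeffs[0] + coeffs[1] * a + coeffs[2] * a * a

def f_se(coeffs, samples):
    # sum of squared errors over the training sample S_t
    return sum((evaluate(coeffs, a) - b) ** 2 for a, b in samples)

# hypothetical training data drawn from b = 2*a + 1 (no noise)
samples = [(a, 2 * a + 1) for a in range(-3, 4)]

print(f_se((1.0, 2.0, 0.0), samples))  # 0.0: the true dependency
print(f_se((0.0, 0.0, 1.0), samples))  # > 0: a wrong candidate solution
```

An optimizer would search the coefficient space for the tuple minimizing f_se.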
19.1 Overﬁtting
19.1.1 The Problem
Overfitting¹ is the emergence of an overly complicated candidate solution in an optimization process resulting from the effort to provide the best results for as much of the available training data as possible [792, 1040, 2397, 2531].

Definition D19.1 (Overfitting). A candidate solution x̀ ∈ X optimized based on a set of training data St is considered to be overfitted if a less complicated, alternative solution x⋆ ∈ X exists which has a smaller (or equal) error for the set of all possible (maybe even infinitely many), available, or (theoretically) producible data samples S ∈ S.

¹ http://en.wikipedia.org/wiki/Overfitting [accessed 2007-07-03]
Hence, overfitting also stands for the creation of solutions which are not robust. Notice that x⋆ may, however, have a larger error in the training data St than x̀. The phenomenon of overfitting is best known and can often be encountered in the field of artificial neural networks or in curve fitting² [1697, 1742, 2334, 2396, 2684], as described in Example E19.1.
Example E19.2 (Example E19.1 Cont.).
There exists exactly one polynomial³ of degree n − 1 that fits to each such training data and goes through all its points.⁴ Hence, when only polynomial regression is performed, there is exactly one perfectly fitting function of minimal degree. Nevertheless, there will also be an infinite number of polynomials with a higher degree than n − 1 that also match the sample data perfectly. Such results would be considered as overfitted.

In Figure 19.1, we have sketched this problem. The function x⋆(a) = b for A = B = R shown in Fig. 19.1.b has been sampled three times, as sketched in Fig. 19.1.a. There exists no other polynomial of a degree of two or less that fits to these samples than x⋆. Optimizers, however, could also find overfitted polynomials of a higher degree such as x̀ in Fig. 19.1.c which also match the data. Here, x̀ plays the role of the overly complicated solution which will perform as well as the simpler model x⋆ when tested with the training sets only, but will fail to deliver good results for all other input data.
Fig. 19.1.a: Three sample points of a linear function. Fig. 19.1.b: The optimal solution x⋆. Fig. 19.1.c: The overly complex variant x̀.

Figure 19.1: Overfitting due to complexity.
A very common cause of overfitting is noise in the sample data. As we have already pointed
out, no measurement device for physical processes delivers perfect results without error.
Surveys that represent the opinions of people on a certain topic and randomized simulations
will also exhibit deviations from the true interdependencies of the observed entities.
Hence, data samples based on measurements will always contain some noise.
Example E19.3 (Overfitting caused by Noise).
In Figure 19.2 we have sketched how such noise may lead to overfitted results. Fig. 19.2.a
illustrates a simple physical process obeying some quadratic equation. This process has been
measured using some technical equipment, and the n = 100 = |S| noisy samples depicted in
Fig. 19.2.b have been obtained. Fig. 19.2.c shows a function x̀ resulting from optimization
that fits the data perfectly. It could, for instance, be a polynomial of degree 99 which goes
right through all the points and thus has an error of zero. Hence, both f_E(x̀) and f_SQ(x̀)
from Example E19.1 would be zero. Although being a perfect match to the measurements,
this complicated solution does not accurately represent the physical law that produced the
sample data and will not deliver precise results for new, different inputs.
^2 We will discuss overfitting in conjunction with Genetic Programming-based symbolic regression in Section 49.1 on page 531.
^3 http://en.wikipedia.org/wiki/Polynomial [accessed 2007-07-03]
^4 http://en.wikipedia.org/wiki/Polynomial_interpolation [accessed 2008-03-01]
[Figure 19.2: Fitting noise. Fig. 19.2.a: The original physical process x⋆. Fig. 19.2.b: The measured training data. Fig. 19.2.c: The overfitted result x̀.]
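The effect described in Example E19.3 can be reproduced in a few lines. The following sketch is illustrative only: the quadratic process, the noise level, and the sample counts are arbitrary choices, not the values from the book's figure. It interpolates noisy samples of a quadratic process with a polynomial of maximal degree; the training error is exactly zero, yet the model misses the underlying law between the sample points.

```python
import random

def lagrange(xs, ys):
    """Interpolating polynomial of degree n-1 through all n points --
    the 'perfectly fitting' model x-grave from the example."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

rng = random.Random(1)
true = lambda a: a * a                        # the underlying quadratic process
train_x = [i * 0.5 for i in range(8)]
train_y = [true(a) + rng.gauss(0, 0.3) for a in train_x]  # noisy measurements

model = lagrange(train_x, train_y)            # degree-7 fit: "learns" the noise

mse = lambda m, xs, ys: sum((m(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse(model, train_x, train_y))           # 0.0: perfect on the training set
test_x = [i * 0.5 + 0.25 for i in range(8)]   # unseen inputs between the samples
print(mse(model, test_x, [true(a) for a in test_x]))
# nonzero: the interpolant misses the true law between the samples
```

Restricting the model to degree two instead, as suggested in Section 19.1.2.1 below, would avoid fitting the noise in the first place.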
From the examples we can see that the major problem resulting from overfitted solutions
is the loss of generality.
Definition D19.2 (Generality). A solution of an optimization process is general if it is
valid not only for the sample inputs s[0], s[1], . . . , s[n − 1] ∈ S_t which were used for training
during the optimization process, but also for different inputs s ∉ S_t, if such inputs exist.
19.1.2 Countermeasures
There exist multiple techniques which can be utilized in order to prevent overﬁtting to a
certain degree. It is most eﬃcient to apply multiple such techniques together in order to
achieve best results.
19.1.2.1 Restricting the Problem Space
A very simple approach is to restrict the problem space X in a way that only solutions up
to a given maximum complexity can be found. In terms of polynomial function ﬁtting, this
could mean limiting the maximum degree of the polynomials to be tested, for example.
19.1.2.2 Complementing Optimization Criteria
Furthermore, the functional objective functions (such as those defined in Example E19.1),
which solely concentrate on the error of the candidate solutions, should be augmented with
penalty terms and non-functional criteria which favor small and simple models [792, 1510].
19.1.2.3 Using much Training Data
Large sets of sample data, although slowing down the optimization process, may improve the
generalization capabilities of the derived solutions. If arbitrarily many training datasets or
training scenarios can be generated, there are two approaches which work against overﬁtting:
1. The first method is to use a new set of (randomized) scenarios for each evaluation
of each candidate solution. The resulting objective values may then differ considerably
even if the same individual is evaluated twice in a row, introducing incoherence and
ruggedness into the fitness landscape.
2. At the beginning of each iteration of the optimizer, a new set of (randomized) scenarios
is generated which is used for all individual evaluations during that iteration. This
method leads to objective values which can be compared without bias. They can be
made even more comparable if the objective functions are always normalized into some
ﬁxed interval, say [0, 1].
In both cases it is helpful to use more than one training sample or scenario per evaluation
and to set the resulting objective value to the average (or, better, the median) of the outcomes.
Otherwise, the fluctuations of the objective values between the iterations will be very large,
making it hard for the optimizers to follow a stable gradient for multiple steps.
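The second approach above can be sketched as follows. This is a minimal illustration: the simulate function, the encoding of a scenario as a single random number, and all constants are hypothetical stand-ins, not part of any particular algorithm from the book.

```python
import random

def evaluate_population(population, simulate, rng, scenarios_per_iter=20):
    """Per-iteration scenario sampling: draw one fresh batch of random
    scenarios at the start of the iteration, evaluate every individual on
    that same batch, and use the mean outcome as its objective value, so
    all individuals of the iteration can be compared without bias."""
    scenarios = [rng.random() for _ in range(scenarios_per_iter)]
    return [sum(simulate(ind, s) for s in scenarios) / len(scenarios)
            for ind in population]

# toy setting: an individual is one number, a scenario slightly moves the
# target it should hit; the simulated outcome is the error (to be minimized)
simulate = lambda ind, s: abs(ind - (0.5 + 0.1 * (s - 0.5)))
rng = random.Random(0)
pop = [0.1, 0.5, 0.9]
print(evaluate_population(pop, simulate, rng))
# the individual at 0.5 receives the lowest (best) mean error
```

Averaging over the batch damps the per-scenario fluctuations, which is exactly the stabilizing effect described in the paragraph above.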
19.1.2.4 Limiting Runtime
Another simple method to prevent overﬁtting is to limit the runtime of the optimizers [2397].
It is commonly assumed that learning processes normally ﬁrst ﬁnd relatively general solutions
which subsequently begin to overﬁt because the noise “is learned”, too.
19.1.2.5 Decay of Learning Rate
For the same reason, some algorithms allow the rate at which the candidate solutions are
modified to be decreased over time. Such a decay of the learning rate makes overfitting less
likely.
19.1.2.6 Dividing Data into Training and Test Sets
If only one finite set of data samples is available for training/optimization, it is common
practice to separate it into a set of training data S_t and a set of test cases S_c. During the
optimization process, only the training data is used. The resulting solutions x are tested
with the test cases afterwards. If their behavior is significantly worse when applied to S_c
than when applied to S_t, they are probably overfitted.
The same approach can be used to detect when the optimization process should be
stopped. The best known candidate solutions can be checked with the test cases in each
iteration without influencing their objective values, which depend solely on the training data.
If their performance on the test cases begins to decrease, there is no benefit in letting the
optimization process continue any further.
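This early-stopping rule can be sketched as follows. It is a schematic illustration, not an algorithm from the book: the candidate stream, the error function, and the patience threshold are all hypothetical, and the sketch additionally returns the best-on-test snapshot rather than the last accepted candidate.

```python
def optimize_with_early_stopping(candidates, error, S_t, S_c, patience=3):
    """Accept candidates that improve the error on the training data S_t,
    monitor the error on the test cases S_c, and stop once S_c has not
    improved for `patience` accepted candidates (a symptom of overfitting)."""
    best_train = float("inf")
    best_model, best_test = None, float("inf")
    stale = 0
    for x in candidates:
        if error(x, S_t) >= best_train:
            continue                       # selection is driven by S_t only
        best_train = error(x, S_t)
        test_err = error(x, S_c)           # S_c never influences selection
        if test_err < best_test:
            best_model, best_test, stale = x, test_err, 0
        else:
            stale += 1
            if stale >= patience:
                break                      # test error stagnates: stop early
    return best_model

# synthetic illustration: the training error falls for every candidate, while
# the test error improves only up to candidate 4 and then degrades
error = lambda x, S: (10 - x) if S == "train" else abs(x - 4)
print(optimize_with_early_stopping(range(10), error, "train", "test"))  # 4
```

Here the synthetic test error stops improving after candidate 4, so the run is cut short even though the training error would keep falling.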
19.2 Oversimpliﬁcation
19.2.1 The Problem
With the term oversimplification (or overgeneralization) we refer to the opposite of overfitting.
Whereas overfitting denotes the emergence of overly complicated candidate solutions,
oversimplified solutions are not complicated enough. Although they seemingly represent the
training samples used during the optimization process well, they are rough overgeneralizations
which fail to provide good results for cases not part of the training.
A common cause of oversimplification is sketched in Figure 19.3: the training sets only
represent a fraction of the possible inputs. As this is normally the case, one should always
be aware that such an incomplete coverage may fail to represent some of the dependencies
and characteristics of the data, which may then lead to oversimplified solutions. Another
possible reason for oversimplification is that ruggedness, deceptiveness, too much neutrality,
or high epistasis in the fitness landscape may lead to premature convergence and prevent
the optimizer from surpassing a certain quality of the candidate solutions. It then cannot
adapt them completely even if the training data perfectly represents the sampled process. A
third possible cause is that a problem space could have been chosen which does not include
the correct solution.
Example E19.4 (Oversimplified Polynomial).
Fig. 19.3.a shows a cubic function. Since it is a polynomial of degree three, four sample
points are needed for its unique identification. Maybe not knowing this, only three samples
have been provided in Fig. 19.3.b. By doing so, some vital characteristics of the function
are lost. Fig. 19.3.c depicts a quadratic function – the polynomial of the lowest degree that fits
exactly to these samples. Although it is a perfect match, this function does not touch any
other point on the original cubic curve and behaves totally differently in the lower parameter
range.
However, even if we had included point P in our training data, it is still possible that the
optimization process could yield Fig. 19.3.c as a result. Having training data that correctly
represents the sampled system does not mean that the optimizer is able to find a correct
solution with perfect fitness – the other, previously discussed problematic phenomena can
prevent it from doing so. Furthermore, if it was not known that the system to be
modeled by the optimization process can best be represented by a polynomial of the third
degree, one could have limited the problem space X to polynomials of degree two and less.
Then, the result would likely again be something like Fig. 19.3.c, regardless of how many
training samples are used.
[Figure 19.3: Oversimplification. Fig. 19.3.a: The "real system" and the points describing it (including point P). Fig. 19.3.b: The sampled training data. Fig. 19.3.c: The oversimplified result x̀.]
19.2.2 Countermeasures
In order to counter oversimpliﬁcation, its causes have to be mitigated.
19.2.2.1 Using many Training Cases
It is not always possible to have training scenarios which cover the complete input space
A. If a program which takes a 32-bit integer number as input is synthesized with Genetic
Programming, for instance, we would need 2^32 = 4 294 967 296 samples. By using multiple
scenarios for each individual evaluation, the chance of missing important aspects is decreased.
These scenarios can be replaced with new, randomly created ones in each generation in order
to decrease this chance even more.
19.2.2.2 Using a larger Problem Space
The problem space, i.e., the representation of the candidate solutions, should further be
chosen in a way which allows constructing a correct solution to the problem defined. Then
again, relaxing too many constraints on the solution structure increases the risk of overfitting,
and thus careful proceeding is recommended.
Chapter 20
Dimensionality (Objective Functions)
20.1 Introduction
The most common way to define optima in multi-objective problems (MOPs, see Definition D3.6 on page 55) is to use the Pareto domination relation discussed in Section 3.3.5
on page 65. A candidate solution x_1 ∈ X is said to dominate another candidate solution
x_2 ∈ X (x_1 ⊣ x_2) if and only if its corresponding vector of objective values f(x_1) is (partially)
less than the one of x_2, i.e., ∀i ∈ 1..n : f_i(x_1) ≤ f_i(x_2) and ∃i ∈ 1..n : f_i(x_1) < f_i(x_2)
(in minimization problems, see Definition D3.13). The domination rank #dom(x, X) of a
candidate solution x ∈ X is the number of elements in a reference set X ⊆ X it is dominated
by (Definition D3.16); a non-dominated element (relative to X) has a domination rank of
#dom(x, X) = 0, and the solutions considered to be global optima are non-dominated for
the whole problem space X (#dom(x, X) = 0, Definition D3.14). These are the elements we
would like to find with optimization or, in the case of probabilistic metaheuristics, at least
wish to approximate as closely as possible.
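For minimization, the domination relation and the domination rank translate directly into code. The following is a straightforward sketch over objective-value tuples; the example population is made up for illustration:

```python
def dominates(f1, f2):
    """Pareto domination for minimization: f1 dominates f2 iff f1 is no worse
    in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def domination_rank(x, ref):
    """Number of elements of the reference set that dominate x."""
    return sum(1 for r in ref if dominates(r, x))

pop = [(1, 5), (2, 2), (3, 1), (4, 4)]
print([domination_rank(p, pop) for p in pop])
# [0, 0, 0, 2] -- only (4, 4) is dominated, by (2, 2) and by (3, 1)
```

Note that a solution never dominates itself, so elements need not be excluded from their own reference set.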
Definition D20.1 (Dimension of MOP). We refer to the number n = |f| of objective
functions f_1, f_2, . . . , f_n ∈ f of an optimization problem as its dimension (or dimensionality).
Many studies in the literature consider mainly bi-objective MOPs [1237, 1530]. As a consequence, many algorithms are designed to deal with that kind of problems. However, MOPs
having a higher number of objective functions are common in practice – sometimes the
number of objectives reaches double figures [599] – leading to so-called many-objective
optimization [2228, 2915]. This term has been coined by the Operations Research community
to denote problems with more than two or three objective functions [894, 3099]. It is
currently a hot research topic, with the first conference on evolutionary multi-objective
optimization held in 2001 [3099]. Table 20.1 gives some examples of many-objective situations.
Table 20.1: Applications and Examples of Many-Objective Optimization.
Area                     References
Combinatorial Problems   [1417, 1437]
Computer Graphics        [1002, 1577, 1893, 2060]
Data Mining              [2213]
Decision Making          [1893]
Software                 [2213]
20.2 The Problem: Many-Objective Optimization
When the dimension of the MOPs increases, the majority of candidate solutions become
non-dominated. As a consequence, traditional nature-inspired algorithms have to be redesigned:
According to Hughes [1303] and on the basis of the results provided by Purshouse and Fleming
[2231], it can be assumed that purely Pareto-based optimization approaches (maybe with
added diversity preservation methods) will not perform well on problems with four or more
objectives. Results from the application of such algorithms (NSGA-II [749], SPEA 2 [3100],
PESA [639]) to two or three objectives cannot simply be extrapolated to higher numbers
of optimization criteria [1530]. Kang et al. [1491] consider Pareto optimality to be unfair
and imperfect in many-objective problems. Hughes [1303] indicates that the following two
hypotheses may hold:
1. An optimizer that produces an entire Pareto set in one run is better than generating the
Pareto set through many single-objective optimizations using an aggregation approach
such as weighted min-max (see Section 3.3.4) if the number of objective function
evaluations is fixed.
2. Optimizers that use Pareto ranking based methods to sort the population will be very
effective for small numbers of objectives, but will not perform as effectively for many-objective
optimization in comparison with methods based on other approaches.
The results of Jaszkiewicz [1437] and Ishibuchi et al. [1418] further demonstrate the degeneration
of the performance of traditional multi-objective metaheuristics on many-objective
problems in comparison with single-objective approaches. Ikeda et al. [1400] show that various
elements which are far from the true Pareto frontier may survive as hardly-dominated
solutions, which leads to a decrease in the probability of producing new candidate solutions
dominating existing ones. This phenomenon is called dominance resistance. Deb et al. [750]
recognize this problem of redundant solutions and incorporate an example function provoking
it into their test function suite for real-valued, multi-objective optimization.
In addition to these algorithm-side limitations, Bouyssou [374] suggests that, based on,
for instance, the limited capabilities of the human mind [1900], an operator will not be able
to make efficient decisions if more than a dozen objectives are involved. Visualizing the
solutions in a human-understandable way becomes more complex with the rising number of
dimensions, too [1420].
Example E20.1 (Non-Dominated Elements in Random Samples).
Deb [740] showed that the number of non-dominated elements in random samples increases
quickly with the dimension. The hypersurface of the Pareto frontier may increase exponentially
with the number of objective functions [1420]. Like Ishibuchi et al. [1420], we want to
illustrate this issue with an experiment.
Imagine that a population-based optimization approach was used to solve a many-objective
problem with n = |f| objective functions. At startup, the algorithm will fill
the initial population pop with ps randomly created individuals p. If we assume the problem
space X to be continuous and the creation process sufficiently random, no two individuals
in pop will be alike. If the objective functions are locally invertible and sufficiently independent,
we can assume that there will be no two candidate solutions in pop for which any of
the objectives takes on the same value.
[Figure 20.1: Approximated distribution of the domination rank in random samples for several population sizes. Panels Fig. 20.1.a–g show P(#dom = k | ps, n) over the number of objectives n for ps = 5, 10, 50, 100, 500, 1000, and 3000.]
The distribution P(#dom = k | ps, n) of the probability that the domination rank #dom
of a randomly selected individual from the initial, randomly created population is exactly
k ∈ 0..ps − 1 depends on the population size ps and the number of objective functions n. We
have approximated this probability distribution with experiments where we determined the
domination rank of ps n-dimensional vectors, each element drawn from a uniform
distribution, for several values of n and ps spanning from n = 2 to n = 50 and ps = 3 to
ps = 3600. (For n = 1, no experiments were performed since there, each domination rank
from 0 to ps − 1 occurs exactly once in the population.)
Figure 20.1 and Figure 20.2 illustrate the results of these experiments, the arithmetic
means of 100 000 runs for each configuration. In Figure 20.1, the approximated distribution
of P(#dom = k | ps, n) is illustrated for various n. The value of P(#dom = 0 | ps, n)
quickly approaches one for rising n. We limited the z-axis to 0.3 so that the behavior of the
other ranks k > 0, which quickly becomes negligible, can still be seen. For fixed n and
ps, P(#dom = k | ps, n) drops rapidly with k, roughly resembling an exponential distribution
(see Section 53.4.3). For fixed k and ps, P(#dom = k | ps, n) takes on roughly
the shape of a Poisson distribution (see Section 53.3.2) along the n-axis, i.e., it rises for
some time and then falls again. With rising ps, the maximum of this distribution shifts
to the right, meaning that an algorithm with a higher population size may be able to deal
with more objective functions. Still, even for ps = 3000, around 64% of the population is
already non-dominated in this experiment.
[Figure 20.2: The proportion of non-dominated candidate solutions P(#dom = 0 | ps, n) for several population sizes ps and dimensionalities n.]
The fraction of non-dominated elements in the random populations is illustrated in Figure 20.2.
We find that it again seems to rise exponentially with n. The population size ps in
Figure 20.2 seems to have only an approximately logarithmic positive influence: if we list
the population sizes required to keep the fraction of non-dominated candidate solutions at the
same level as P(#dom = 0 | ps = 5, n = 2) ≈ 0.457, we find P(#dom = 0 | ps = 12, n = 3) ≈
0.466, P(#dom = 0 | ps = 35, n = 4) ≈ 0.447, P(#dom = 0 | ps = 90, n = 5) ≈ 0.455,
P(#dom = 0 | ps = 250, n = 6) ≈ 0.452, P(#dom = 0 | ps = 650, n = 7) ≈ 0.460, and
P(#dom = 0 | ps = 1800, n = 8) ≈ 0.459. An extremely coarse rule of thumb for this experiment
would hence be that around 0.6e^n individuals are required in the population to
hold the proportion of non-dominated solutions at around 46%.
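A small Monte Carlo sketch reproduces the trend of this experiment. It uses far fewer runs and a single fixed population size compared to the book's setup; all constants here are arbitrary choices:

```python
import random

def dominates(a, b):
    # Pareto domination for minimization over objective-value vectors
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated_fraction(ps, n, runs=50, seed=42):
    """Estimate P(#dom = 0 | ps, n): the expected fraction of a population
    of ps uniformly random n-dimensional objective vectors that no other
    member dominates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        pop = [[rng.random() for _ in range(n)] for _ in range(ps)]
        nd = sum(1 for p in pop
                 if not any(dominates(q, p) for q in pop if q is not p))
        total += nd / ps
    return total / runs

for n in (2, 3, 5, 8):
    print(n, round(nondominated_fraction(ps=100, n=n), 2))
# the fraction of non-dominated individuals rises quickly towards 1 with n
```

Re-running with larger ps shifts the curve to the right, mirroring the logarithmic influence of the population size observed above.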
To sum things up, the increasing dimensionality of the objective space leads to three main
problems [1420]:
1. The performance of traditional approaches based solely on Pareto comparisons deteriorates.
2. The utility of the solutions cannot be understood by the human operator anymore.
3. The number of possible Pareto-optimal solutions may increase exponentially.
Finally, it should be noted that there also exists interesting research where objectives are
added to a single-objective problem to make it easier [1553, 2025], which you can find summarized
in Section 13.4.6. Indeed, Brockhoff et al. [425] showed that in some cases, adding
objective functions may be beneficial (while it is not in others).
20.3 Countermeasures
Various countermeasures have been proposed against the problem of dimensionality. Nice
surveys on contemporary approaches to manyobjective optimization with evolutionary ap
proaches, to the diﬃculties arising in manyobjective, and on benchmark problems have
been provided by Hiroyasu et al. [1237], Ishibuchi et al. [1420], and L´opez Jaimes and Coello
Coello [1775]. In the following we list some of approaches for manyobjective optimization,
mostly utilizing the information provided in [1237, 1420].
20.3.1 Increasing Population Size
The most trivial measure has not been mentioned in these studies: increasing
the population size. From Example E20.1, however, we know that this will work only for a
few objective functions, since we have to throw in exponentially more individuals in order
to fight the influence of many objectives. For five or six objective functions, we may still
be able to obtain good results this way – for the price of many additional objective
function evaluations. If we follow the rough approximation equation from the example, it
would already take a population size of around ps = 100 000 to keep the fraction of
non-dominated candidate solutions below 0.5 (in the scenario used there, that is). Hence,
increasing the population size can only get us so far.
20.3.2 Increasing Selection Pressure
A very simple idea for improving the way multi-objective approaches scale with increasing
dimensionality is to increase the exploration pressure in the direction of the Pareto frontier.
Ishibuchi et al. [1420] distinguish approaches which modify the definition of domination in
order to reduce the number of non-dominated candidate solutions in the population [2403] and
methods which assign different ranks to non-dominated solutions [636, 829, 830, 1576, 1635,
2629]. Farina and Amato [894] propose relying on fuzzy Pareto methods instead of pure Pareto
comparisons.
20.3.3 Indicator Function-based Approaches
Ishibuchi et al. [1420] further point out that fitness assignment methods could be used which
are not based on Pareto dominance. According to them, one approach is using indicator
functions such as those involving hypervolume metrics [1419, 2838].
20.3.4 Scalarizing Approaches
Another fitness assignment approach is the use of scalarizing functions [1420], which treat
many-objective problems with single-objective-style methods. Advanced single-objective
optimization methods like those developed by Hughes [1303] could be used instead of Pareto
comparisons. Wagner et al. [2837, 2838] support the results and assumptions of Hughes [1303]
in terms of single-objective methods being better in many-objective problems than traditional
multi-objective Evolutionary Algorithms such as NSGA-II [749] or SPEA 2 [3100], but also
refute the general idea that all Pareto-based methods cannot match their performance. They
also show that more recent approaches [880, 2010, 3096] which favor more than just distribution
aspects can perform very well. Other scalarizing methods can be found in [1304, 1416, 1417, 1437].
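A weighted min-max (Chebyshev) scalarization of the kind referenced in Section 3.3.4 can be sketched as follows. This is a toy bi-objective problem with a crude grid search standing in for the single-objective optimizer; all functions and constants are made up for illustration:

```python
def weighted_minmax(objectives, weights, ideal):
    """Chebyshev (weighted min-max) scalarization: collapses an objective
    vector into a single value to be minimized; different weight vectors
    steer the search towards different parts of the Pareto frontier."""
    def scalarized(x):
        return max(w * (f(x) - z)
                   for w, f, z in zip(weights, objectives, ideal))
    return scalarized

# toy bi-objective problem over a single real decision variable
f1 = lambda x: x * x             # minimal at x = 0
f2 = lambda x: (x - 2) ** 2      # minimal at x = 2
g = weighted_minmax([f1, f2], weights=[1.0, 1.0], ideal=[0.0, 0.0])

# a simple grid search plays the role of the single-objective optimizer
best = min((i * 0.001 for i in range(2001)), key=g)
print(round(best, 3))  # 1.0 -- the balanced trade-off between both objectives
```

Varying the weight vector and re-running the single-objective optimizer traces out different points of the Pareto frontier, which is how one run per weight setting can approximate the whole front.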
20.3.5 Limiting the Search Area in the Objective Space
Hiroyasu et al. [1237] point out that the search area in the objective space can be limited.
This leads to a decrease in the number of non-dominated points [1420] and can be
achieved either by incorporating preference information [746, 933, 2695] or by reducing the
dimensionality [422–424, 745, 2411].
20.3.6 Visualization Methods
Approaches for visualizing solutions of many-objective problems in order to make them
more comprehensible have been provided by Furuhashi and Yoshikawa [1002], Köppen and
Yoshida [1577], Obayashi and Sasaki [2060], and in [1893].
Chapter 21
Scale (Decision Variables)
In Chapter 20, we discussed how an increasing number of objective functions can threaten
the performance of optimization algorithms. We referred to this problem as dimensionality,
i.e., the dimension of the objective space. However, there is another space in optimization
whose dimension can make an optimization problem difficult: the search space G. The
"curse of dimensionality"^1 of the search space stands for the exponential increase of its
volume with the number of decision variables [260, 261]. In order to better distinguish
between the dimensionality of the search space and that of the objective space, we will refer
to the former as scale.
In Example E21.1, we give an example of the growth of the search space with the scale
of a problem, and in Table 21.1, some large-scale optimization problems are listed.
Example E21.1 (Small-scale versus large-scale).
The difference between small-scale and large-scale problems can best be illustrated using
discrete or continuous vector-based search spaces as an example. If we search, for instance, in
the natural interval G = (1..10), there are ten points which could be the optimal solution.
When G = (1..10)^2, there exist one hundred possible results, and for G = (1..10)^3, it is
already one thousand. In other words, if the number of points in G is n, then the number
of points in G^m is n^m.
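The exponential growth from Example E21.1 is easy to verify numerically, here with n = 10 points per decision variable:

```python
# number of candidate points n**m in G^m, for n = 10 as in G = (1..10)
n = 10
for m in (1, 2, 3, 10, 20):
    print(m, n ** m)
# 1 10
# 2 100
# 3 1000
# 10 10000000000
# 20 100000000000000000000
```

Already at m = 20 decision variables, exhaustively enumerating the search space is hopeless, which is exactly the "curse of dimensionality" described above.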
Table 21.1: Applications and Examples of Large-Scale Optimization.
Area                                 References
Data Mining                          [162, 994, 1748]
Databases                            [162]
Distributed Algorithms and Systems   [1748, 3031]
Engineering                          [2860]
Function Optimization                [551, 1927, 2130, 2486, 3020]
Graph Theory                         [3031]
Software                             [3031]
21.1 The Problem
Especially in combinatorial optimization, the issue of scale has been considered for a long
time. Here, the computational complexity (see Section 12.1) denotes how many algorithm
steps are needed to solve a problem which consists of n parameters. As sketched in Figure 12.1
on page 145, the time requirements to solve problems whose required number of steps
grows exponentially with n quickly exceed any feasible bound. Here, especially NP-hard
problems should be mentioned, since they always exhibit this feature.
Besides combinatorial problems, numerical problems suffer from the curse of dimensionality
as well.
^1 http://en.wikipedia.org/wiki/Curse_of_dimensionality [accessed 2009-10-20]
21.2 Countermeasures
21.2.1 Parallelization and Distribution
When facing large-scale problems, the problematic symptom is the high runtime requirement.
Thus, any measure to use as much computational power as is available can be a remedy.
Obviously, there (currently) exists no way to solve large-scale NP-hard problems exactly
within feasible time. With more computers, cores, or hardware, only a linear improvement
of runtime can be achieved at most. However, for problems residing in the gray area between
feasible and infeasible, distributed computing [1747] may be the method of choice.
21.2.2 Genetic Representation and Development
The best way to solve a large-scale problem is to solve it as a small-scale problem. For some
optimization tasks, it is possible to choose a search space which has a smaller size than
the problem space (|G| < |X|). Indirect genotype-phenotype mappings can link the spaces
together, as we will discuss later in Section 28.2.2.
Here, one option is generative mappings (see Section 28.2.2.1), which step-by-step construct
a complex phenotype by translating a genotype according to some rules. Grammatical
Evolution, for instance, unfolds a symbol according to a BNF grammar with rules identified
in a genotype. This recursive process can basically lead to arbitrarily complex phenotypes.
Another approach is to apply a developmental, ontogenic mapping (see Section 28.2.2.2)
that uses feedback from simulations or objective functions in this process.
Example E21.2 (Dimension Reduction via Developmental Mapping).
Instead of optimizing (maybe up to a million) points on the surface of an airplane's wing
(see Examples E2.5 and E4.1) in order to find an optimal shape, the weights of an artificial
neural network used to move the surface points to a beneficial configuration could be optimized.
The second task involves significantly fewer variables, since an artificial neural network
with usually less than 1000 or even 100 weights could be applied. Of course, the number
of possible solutions which can be discovered by this approach decreases (it is at most |G|).
However, the problem becomes tangible, and good results may be obtained in reasonable
time.
21.2.3 Exploiting Separability
Sometimes, parts of candidate solutions are independent of each other and can be optimized
more or less separately. In such a case of low epistasis (see Chapter 17), a large-scale
problem can be divided into several problems of smaller scale which can be optimized separately.
If solving a problem of scale n takes 2^n algorithm steps, solving two problems of scale
0.5n will clearly lead to a great runtime reduction. Such a reduction may even be worth
sacrificing some solution quality. If the optimization problem at hand exhibits low epistasis
or is separable, such a sacrifice may even be unnecessary.
Coevolution has been shown to be an efficient approach in combinatorial optimization [1311].
If extended with a cooperative component, i.e., to Cooperative Coevolution [2210, 2211],
it can efficiently exploit separability in numerical problems and lead to better results [551,
3020].
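The benefit of decomposing a separable problem can be sketched with a deliberately crude random-search optimizer. This is an illustration only: the sphere function, the budgets, and the bounds are arbitrary, and real Cooperative Coevolution is considerably more sophisticated than optimizing each variable in isolation:

```python
import random

def sphere(x):
    # separable benchmark function: each variable contributes independently
    return sum(v * v for v in x)

def random_search(cost, dim, evals, rng):
    """Deliberately crude optimizer: best of `evals` uniform random points."""
    best_x, best_c = None, float("inf")
    for _ in range(evals):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        c = cost(x)
        if c < best_c:
            best_x, best_c = x, c
    return best_x, best_c

rng = random.Random(7)
n, budget = 10, 2000

# joint optimization: all n variables at once with the full budget
_, joint = random_search(sphere, n, budget, rng)

# decomposition: each variable optimized on its own, same total budget
parts = [random_search(sphere, 1, budget // n, rng) for _ in range(n)]
combined = sum(c for _, c in parts)

print(joint, combined)  # the decomposed run reaches a much smaller cost
```

The decomposition works here precisely because the sphere function has zero epistasis between its variables; on a non-separable problem, optimizing the blocks independently could miss the optimum entirely.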
21.2.4 Combination of Techniques
Generally, it can be a good idea to concurrently use sets of algorithms [1689] or portfolios
[2161] working on the same or different populations. This way, the different strengths
of the optimization methods can be combined. In the beginning, for instance, an algorithm
with good convergence speed may be granted more runtime. Later, the focus can shift
towards methods which can retain diversity and are not prone to premature convergence.
Alternatively, a sequential approach can be applied, which starts with one algorithm and
switches to another one when no further improvements are found [? ]. This way, first an
interesting area in the search space can be found, which can then be investigated in more
detail.
Chapter 22
Dynamically Changing Fitness Landscape
It should also be mentioned that there exist problems with dynamically changing fitness
landscapes [396, 397, 400, 1957, 2302]. The task of an optimization algorithm is then to
provide candidate solutions with momentarily optimal objective values for each point in time.
Here we face the problem that an optimum in iteration t will possibly no longer be an optimum
in iteration t + 1. Problems with dynamic characteristics can, for example, be tackled
with special forms [3019] of
1. Evolutionary Algorithms [123, 398, 1955, 1956, 2723, 2941],
2. Genetic Algorithms [1070, 1544, 1948–1950],
3. Particle Swarm Optimization [316, 498, 499, 1728, 2118],
4. Diﬀerential Evolution [1866, 2988], and
5. Ant Colony Optimization [1148, 1149, 1730].
The moving peaks benchmarks by Branke [396, 397] and Morrison and De Jong [1957] are
good examples of dynamically changing fitness landscapes. You can find them discussed
in ?? on page ??. In Table 22.1, we give some examples of applications of dynamic
optimization.
Table 22.1: Applications and Examples of Dynamic Optimization.
Area                                 References
Chemistry                            [1909, 2731, 3007]
Combinatorial Problems               [400, 644, 1148, 1446, 1486, 1730, 2120, 2144, 2499]
Cooperation and Teamwork             [1996, 1997, 2424, 2502]
Data Mining                          [3018]
Distributed Algorithms and Systems   [353, 786, 962, 964–966, 1401, 1733, 1840, 1909, 1935, 1982, 1995, 2387, 2425, 2694, 2731, 2791, 3007, 3032, 3106]
Engineering                          [556, 644, 786, 1486, 1733, 1840, 1909, 1982, 2144, 2425, 2499, 2694, 2731, 3007]
Graph Theory                         [353, 556, 644, 786, 962, 964–966, 1401, 1486, 1733, 1840, 1909, 1935, 1982, 1995, 2144, 2387, 2425, 2502, 2694, 2731, 2791, 3032, 3106]
Logistics                            [1148, 1446, 1730, 2120]
Motor Control                        [489]
Multi-Agent Systems                  [2144, 2925]
Security                             [1486]
Software                             [1401, 1486, 1995, 3032]
Wireless Communication               [556, 644, 1486, 2144, 2502]
Chapter 23
The No Free Lunch Theorem
By now, we know the most important difficulties that can be encountered when applying an
optimization algorithm to a given problem. Furthermore, we have seen that it is arguable
what an optimum actually is if multiple criteria are optimized at once. The fact that there
is most likely no optimization method that can outperform all others on all problems can
thus easily be accepted. Instead, there exists a variety of optimization methods specialized
in solving different types of problems. There are also algorithms which deliver good results
for many different problem classes, but may be outperformed by highly specialized methods
in each of them. These facts have been formalized by Wolpert and Macready [2963, 2964]
in their No Free Lunch Theorems^1 (NFL) for search and optimization algorithms.
23.1 Initial Deﬁnitions
Wolpert and Macready [2964] consider single-objective optimization over a finite domain and define an optimization problem φ(g) ≡ f(gpm(g)) as a mapping of a search space G to the objective space R.² Since this definition subsumes the problem space and the genotype-phenotype mapping, only skipping the possible search operations, it is very similar to our Definition D6.1 on page 103. They further call a time-ordered set d_m of m distinct visited points in G × R a “sample” of size m and write d_m ≡ {(d_m^g(1), d_m^y(1)), (d_m^g(2), d_m^y(2)), . . . , (d_m^g(m), d_m^y(m))}. Here, d_m^g(i) is the genotype and d_m^y(i) the corresponding objective value visited at time step i. Then, the set D_m = (G × R)^m is the space of all possible samples of length m and D = ∪_{m≥0} D_m is the set of all samples of arbitrary size.
An optimization algorithm a can now be considered to be a mapping of the previously
visited points in the search space (i. e., a sample) to the next point to be visited. Formally,
this means a : D →G. Without loss of generality, Wolpert and Macready [2964] only regard
unique visits and thus define a : d ∈ D → g : g ∉ d.
Performance measures Ψ can be defined independently from the optimization algorithms, based only on the values of the objective function visited in the samples d_m. If the objective function is subject to minimization, Ψ(d_m^y) = min{d_m^y(i) : i = 1..m} would be the appropriate measure.
Often, only parts of the optimization problem φ are known. If the minima of the objective
function f were already identiﬁed beforehand, for instance, its optimization would be useless.
Since the behavior in wide areas of φ is not obvious, it makes sense to deﬁne a probability
P (φ) that we are actually dealing with φ and no other problem. Wolpert and Macready
[2964] use the handy example of the Traveling Salesman Problem in order to illustrate this
¹ http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization [accessed 2008-03-28]
² Notice that we have partly utilized our own notations here in order to be consistent throughout the book.
issue. Each distinct TSP produces a different structure of φ. Yet, we would use the same optimization algorithm a for all problems of this class without knowing the exact shape of φ. This corresponds to the assumption that there is a set of very similar optimization problems which we may encounter here although their exact structure is not known. We act as if there was a probability distribution over all possible problems which is non-zero for the TSP-alike ones and zero for all others.
23.2 The Theorem
The performance of an algorithm a iterated m times on an optimization problem φ can then be defined as P(d_m^y | φ, m, a), i. e., the conditional probability of finding a particular sample d_m^y. Notice that this measure is very similar to the value of the problem landscape Φ(x, τ) introduced in Definition D5.2 on page 95, which is the cumulative probability that the optimizer has visited the element x ∈ X until (inclusively) the τth evaluation of the objective function(s).
Wolpert and Macready [2964] prove that the sum of such probabilities over all possible optimization problems φ on finite domains is always identical for all optimization algorithms. For two optimizers a_1 and a_2, this means that

∑_{∀φ} P(d_m^y | φ, m, a_1) = ∑_{∀φ} P(d_m^y | φ, m, a_2) (23.1)

Hence, the average over all φ of P(d_m^y | φ, m, a) is independent of a.
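For a finite search space and a finite objective range, Equation 23.1 can be verified directly by brute force: enumerate every objective function over a tiny domain, run two deterministic black-box optimizers, and compare the distributions of the best value contained in the samples d_m. The following Python sketch is purely illustrative; the toy spaces and the two scan-order “algorithms” are our own choices, not taken from Wolpert and Macready:

```python
from itertools import product

G = range(4)            # tiny finite search space
Y = range(3)            # tiny finite objective range

def run(algorithm, f, m):
    """Run a black-box optimizer for m unique visits; return best value found."""
    sample = []         # time-ordered sample d_m of (genotype, objective) pairs
    for _ in range(m):
        g = algorithm(sample)
        sample.append((g, f[g]))
    return min(y for _, y in sample)   # performance measure for minimization

def left_to_right(sample):
    """Deterministic optimizer: visits 0, 1, 2, ... in order."""
    visited = {g for g, _ in sample}
    return next(g for g in G if g not in visited)

def right_to_left(sample):
    """Deterministic optimizer: visits 3, 2, 1, ... in order."""
    visited = {g for g, _ in sample}
    return next(g for g in reversed(G) if g not in visited)

def histogram(algorithm, m):
    """Over all |Y|^|G| problems, count how often each best value occurs."""
    counts = {}
    for values in product(Y, repeat=len(G)):   # every objective function f: G -> Y
        f = dict(zip(G, values))
        best = run(algorithm, f, m)
        counts[best] = counts.get(best, 0) + 1
    return counts

# Averaged over ALL problems, both algorithms perform identically:
for m in (1, 2, 3):
    assert histogram(left_to_right, m) == histogram(right_to_left, m)
```

Any other pair of non-repeating deterministic optimizers passes the same check; the histograms only start to differ once the set of problems is restricted to a subclass.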
23.3 Implications
From this theorem, we can immediately conclude that, in order to outperform a_1 in one optimization problem, a_2 will necessarily perform worse in another. Figure 23.1 visualizes this issue. It shows that general optimization approaches like Evolutionary Algorithms can solve a variety of problem classes with reasonable performance. In this figure, we have chosen a performance measure Φ subject to maximization, i. e., the higher its values, the faster will the problem be solved. Hill Climbing approaches, for instance, will be much faster than Evolutionary Algorithms if the objective functions are steady and monotonous, that is, in a smaller set of optimization tasks. Greedy search methods will perform fast on all problems with matroid³ structure. Evolutionary Algorithms will most often still be able to solve these problems, it just takes them longer to do so. The performance of Hill Climbing and greedy approaches degenerates in other classes of optimization tasks as a trade-off for their high utility in their “area of expertise”.
One interpretation of the No Free Lunch Theorem is that it is impossible for any opti
mization algorithm to outperform random walks or exhaustive enumerations on all possible
problems. For every problem where a given method leads to good results, we can construct
a problem where the same method has exactly the opposite eﬀect (see Chapter 15). As
a matter of fact, doing so is even a common practice to ﬁnd weaknesses of optimization
algorithms and to compare them with each other, see Section 50.2.6, for example.
³ http://en.wikipedia.org/wiki/Matroid [accessed 2008-03-28]
[Figure: a very crude sketch plotting performance over all possible optimization problems for: a random walk or exhaustive enumeration; a general optimization algorithm (an EA, for instance); specialized optimization algorithm 1 (a hill climber, for instance); and specialized optimization algorithm 2 (a depth-first search, for instance).]
Figure 23.1: A visualization of the No Free Lunch Theorem.
Example E23.1 (Fitness Landscapes for Minimization and the NFL).
[Fig. 23.2.a: A landscape good for Hill Climbing, divided into sections I, II, and III. Fig. 23.2.b: A landscape bad for Hill Climbing, divided into sections IV and V. Both plot the objective values f(x) over x and mark where the downward gradient leads in the right or the wrong direction.]
Figure 23.2: Fitness Landscapes and the NFL.
Let us assume that the Hill Climbing algorithm is applied to the objective functions given
in Figure 23.2, which are both subject to minimization. We divided both fitness landscapes into sections labeled with roman numerals. A Hill Climbing algorithm will generate points in the proximity of the currently best solution and move to these points if they are better. Hence, in both problems it will always follow the downward gradient in all sections. In Fig. 23.2.a, this is the right decision in sections II and III because the global optimum is located at the border between them, but wrong in the deceptive section I. In Fig. 23.2.b, exactly the opposite is the case: Here, a gradient descent is only a good idea in section IV (which corresponds to I in Fig. 23.2.a) whereas it leads away from the global optimum in section V, which has the same extent as sections II and III in Fig. 23.2.a.
Hence, our Hill Climbing algorithm would exhibit very different performance when applied to Fig. 23.2.a and Fig. 23.2.b. Depending on the search operations available, it may on average even be outpaced by a random walk on Fig. 23.2.b. Since we may not know the nature of the objective functions before the optimization starts, this may become problematic. However, Fig. 23.2.b is deceptive in the whole section V, i. e., most of its problem space, which is not the case to such an extent in many practical problems.
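The behavior described in this example can be reproduced with a minimal hill climber. The two discrete landscapes below are hypothetical stand-ins for Fig. 23.2.a and Fig. 23.2.b, chosen by us for illustration:

```python
import random

def hill_climb(f, x0, steps, rng):
    """Minimize f over the integers by accepting strictly better neighbors."""
    x = x0
    for _ in range(steps):
        candidate = x + rng.choice((-1, 1))   # unary search step: move by 1
        if f(candidate) < f(x):               # accept only improvements
            x = candidate
    return x

# Two hypothetical landscapes over the integers:
def good(x):       # single broad basin: the gradient leads to the optimum at 10
    return abs(x - 10)

def deceptive(x):  # basin at 3 is deceptive; the global optimum is a spike at 17
    return 0 if x == 17 else abs(x - 3) + 1

rng = random.Random(42)
# On the unimodal landscape the climber reliably finds the optimum ...
assert hill_climb(good, 0, 200, rng) == 10
# ... but on the deceptive one it settles in the false basin at 3:
assert hill_climb(deceptive, 0, 200, rng) == 3
```

The climber is elitist and memoryless; once it sits at x = 3 on the deceptive landscape, no single step is an improvement and it is trapped, exactly the situation sketched for section V.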
Another interpretation is that every useful optimization algorithm utilizes some form of problem-specific knowledge. Radcliffe [2244] basically states that without such knowledge, search algorithms cannot exceed the performance of simple enumerations. Incorporating knowledge starts with relying on simple assumptions like “if x is a good candidate solution, then we can expect other good candidate solutions in its vicinity”, i. e., strong causality. The more (correct) problem-specific knowledge is integrated (correctly) into the algorithm structure, the better the algorithm will perform. On the other hand, knowledge correct for one class of problems is, quite possibly, misleading for another class. In reality, we use optimizers to solve a given set of problems and are not interested in their performance when (wrongly) applied to other classes.
Example E23.2 (Arbitrary Landscapes and Exhaustive Enumeration).
An even simpler thought experiment on optimization than Example E23.1 thus goes as follows: If we consider an optimization problem of truly arbitrary structure, then any element in X could be the global optimum. In an arbitrary (or random) fitness landscape where the objective value of the global optimum is unknown, it is impossible to be sure that the best candidate solution found so far is the global optimum until all candidate solutions have been evaluated. In this case, a complete enumeration or uninformed search is the optimal approach. In most randomized search procedures, we cannot fully prevent that a certain candidate solution is discovered and evaluated more than once, at least not without keeping all previously discovered elements in memory, which is infeasible in most cases. Hence, especially when approaching full coverage of the problem space, these methods will waste objective function evaluations and thus become inferior.
The number of arbitrary fitness landscapes should be at least as high as the number of “optimization-favorable” ones: Assuming a favorable landscape with a single global optimum, there should be at least one position to which the optimum could be moved in order to create a fitness landscape where any informed search is outperformed by exhaustive enumeration or a random walk.
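The waste incurred by memoryless sampling can be made concrete with a coupon-collector style simulation; the problem size and trial counts below are illustrative choices of ours:

```python
import random

def evaluations_until_full_coverage(n, rng):
    """Uniform random sampling WITH replacement: count objective function
    evaluations until every one of the n candidate solutions was seen once."""
    seen, evals = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))   # may re-evaluate an already seen candidate
        evals += 1
    return evals

rng = random.Random(7)
n = 100
trials = [evaluations_until_full_coverage(n, rng) for _ in range(200)]
avg = sum(trials) / len(trials)

# Exhaustive enumeration needs exactly n evaluations; memoryless random
# sampling needs about n * ln(n) on average (coupon collector), i.e. a
# factor of roughly five more for n = 100:
assert avg > 2 * n
```

Keeping a tabu list of visited candidates removes this overhead, but, as noted above, storing the entire history is infeasible for realistically large problem spaces.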
The rough meaning of the NFL is that all black-box optimization methods perform equally well over the complete set of all optimization problems [2074]. In practice, we do not want to apply an optimizer to all possible problems but only to some restricted classes. In terms of these classes, we can make statements about which optimizer performs better.
Today, there exists a wide range of work on No Free Lunch Theorems for many different aspects of machine learning. The website http://www.nofreelunch.org/⁴ gives a good overview about them. Further summaries, extensions, and criticisms have been provided by Köppen et al. [1578], Droste et al. [837, 838, 839, 840], Oltean [2074], and Igel and Toussaint [1397, 1398]. Radcliffe and Surry [2247] discuss the NFL in the context of Evolutionary Algorithms and the representations used as search spaces. The No Free Lunch Theorem is furthermore closely related to the Ugly Duckling Theorem⁵ proposed by Watanabe [2868] for classification and pattern recognition.
⁴ accessed: 2008-03-28
⁵ http://en.wikipedia.org/wiki/Ugly_duckling_theorem [accessed 2008-08-22]
23.4 Inﬁnite and Continuous Domains
Recently, Auger and Teytaud [145, 146] showed that the No Free Lunch Theorem holds only in a weaker form for countably infinite domains. For continuous domains (i. e., uncountably infinite ones), it does not hold. This means that, for a given performance measure and a limit on the maximally allowed objective function evaluations, it can be possible to construct an optimal algorithm for a certain family of objective functions [145, 146].
A factor that should be considered when thinking about these issues is that real, existing computers can only work with finite search spaces, since they have finite memory. In the end, even floating point numbers are finite in value and precision. Hence, there is always a difference between the behavior of the algorithmic, mathematical definition of an optimization method and its practical implementation for a given computing environment. Unfortunately, the author of this book finds himself unable to state with certainty whether this means that the NFL does hold for realistic continuous optimization or whether the proofs by Auger and Teytaud [145, 146] carry over to actual programs.
23.5 Conclusions
So far, the subject of this chapter was the question of what makes optimization problems hard, especially for metaheuristic approaches. We have discussed numerous different phenomena which can affect the optimization process and lead to disappointing results.
If an optimization process has converged prematurely, it has been trapped in a non-optimal region of the search space from which it cannot “escape” anymore (Chapter 13). Ruggedness (Chapter 14) and deceptiveness (Chapter 15) in the fitness landscape, often caused by epistatic effects (Chapter 17), can misguide the search into such a region. Neutrality and redundancy (Chapter 16) can either slow down optimization, because the application of the search operations does not lead to a gain in information, or may contribute positively by creating neutral networks from which the search space can be explored and local optima can be escaped from. Noise is present in virtually all practical optimization problems. The solutions that are derived for them should be robust (Chapter 18). Also, they should neither be too general (oversimplification, Section 19.2) nor too specifically aligned only to the training data (overfitting, Section 19.1). Furthermore, many practical problems are multi-objective, i. e., involve the optimization of more than one criterion at once (partially discussed in Section 3.3), or concern objectives which may change over time (Chapter 22).
The No Free Lunch Theorem argues that it is not possible to develop the one optimization algorithm, the problem-solving machine which can provide us with near-optimal solutions in short time for every possible optimization task (at least in finite domains). Even though this proof does not directly hold for purely continuous or infinite search spaces, this must sound very depressing for everybody new to this subject.
Actually, quite the opposite is the case, at least from the point of view of a researcher.
The No Free Lunch Theorem means that there will always be new ideas, new approaches
which will lead to better optimization algorithms to solve a given problem. Instead of being
doomed to obsolescence, it is far more likely that most of the currently known optimization
methods have at least one niche, one area where they are excellent. It also means that it
is very likely that the “puzzle of optimization algorithms” will never be completed. There
will always be a chance that an inspiring moment, an observation in nature, for instance,
may lead to the invention of a new optimization algorithm which performs better in some
problem areas than all currently known ones.
[Figure: jigsaw-puzzle pieces labeled with optimization methods: Evolutionary Algorithms (GA, GP, ES, DE, EP, ...), LCS, Memetic Algorithms, ACO, PSO, RFD, Extremal Optimization, Simulated Annealing, EDA, Tabu Search, Random Optimization, Branch & Bound, Dynamic Programming, A* Search, IDDFS, Hill Climbing, Downhill Simplex.]
Figure 23.3: The puzzle of optimization algorithms.
Chapter 24
Lessons Learned: Designing Good Encodings
Most Global Optimization algorithms share the premise that solutions to problems are either (a) elements of a somewhat continuous space that can be approximated stepwise or (b) that they can be composed of smaller modules which already have good attributes even when occurring separately. Especially in the second case, the design of the search space (or genome) G and the genotype-phenotype mapping “gpm” is vital for the success of the optimization process. It determines to what degree these expected features can be exploited by defining how the properties and the behavior of the candidate solutions are encoded and how the search operations influence them. In this chapter, we will first discuss a general theory on how properties of individuals can be defined, classified, and how they are related. We will then outline some general rules for the design of the search space G, the search operations searchOp ∈ Op, and the genotype-phenotype mapping “gpm”. These rules represent the lessons learned from our discussion of the possible problematic aspects of optimization tasks.
In software engineering, there exists a variety of design patterns¹ that describe good practice and experience values. Utilizing these patterns will help the software engineer to create well-organized, extensible, and maintainable applications.
Whenever we want to solve a problem with Global Optimization algorithms, we need to
deﬁne the structure of a genome. The individual representation along with the genotype
phenotype mapping is a vital part of Genetic Algorithms and has major impact on the
chance of ﬁnding good solutions. Like in software engineering, there exist basic patterns
and rules of thumb on how to design them properly.
Based on our previous discussions, we can now provide some general best practices for the
design of encodings from diﬀerent perspectives. These principles can lead to ﬁnding better
solutions or higher optimization speed if considered in the design phase [1075, 2032, 2338].
We will outline suggestions given by diﬀerent researchers transposed into the more general
vocabulary of formae (introduced in Chapter 9 on page 119).
24.1 Compact Representation
[1075] and Ronald [2323] state that the representations of the formae in the search space should be as short as possible. The number of different alleles and the lengths of the different genes should be as small as possible. In other words, the redundancy in the genomes should be minimal. In Section 16.5 on page 174, we discuss that uniform redundancy slows down the optimization process.
24.2 Unbiased Representation
The search space G should be unbiased in the sense that all phenotypes are
¹ http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29 [accessed 2007-08-12]
represented by the same number of genotypes [2032, 2113, 2115]. This property makes it possible to efficiently select an unbiased starting population which gives the optimizer the chance of reaching all parts of the problem space.

∀x_1, x_2 ∈ X ⇒ |{g ∈ G : x_1 = gpm(g)}| ≈ |{g ∈ G : x_2 = gpm(g)}| (24.1)
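For finite spaces, Equation 24.1 can be checked mechanically by counting how many genotypes map to each phenotype. The two toy mappings in the following sketch are our own illustrations, not encodings taken from the cited works:

```python
from itertools import product
from collections import Counter

def representation_counts(gpm, genotypes):
    """For each phenotype x, count |{g in G : x = gpm(g)}|."""
    return Counter(gpm(g) for g in genotypes)

G = list(product((0, 1), repeat=4))           # all 16 four-bit genotypes

# Unbiased GPM: phenotype = integer value of the bits modulo 4;
# every phenotype is represented by exactly 4 genotypes.
unbiased = representation_counts(lambda g: int("".join(map(str, g)), 2) % 4, G)
assert len(set(unbiased.values())) == 1

# Biased GPM: phenotype = number of ones; the counts follow the binomial
# coefficients 1, 4, 6, 4, 1, so phenotypes are unequally represented.
biased = representation_counts(sum, G)
assert len(set(biased.values())) > 1
```

A uniformly random genotype then yields a uniformly random phenotype only under the first mapping; under the second, the initial population would be biased toward phenotypes with two ones.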
Beyer and Schwefel [300, 302] furthermore also request unbiased variation operators. The unary mutation operations in Evolution Strategy, for example, should not give preference to any specific change of the genotypes. The reason for this is that the selection operations, i. e., the methods which choose the candidate solutions, already exploit fitness information and guide the search into one direction. The variation operators thus should not do this job as well but instead try to increase the diversity. They should perform exploration as opposed to the exploitation performed by selection.
Beyer and Schwefel [302] therefore again suggest to use normal distributions to mutate real-valued vectors if G ⊆ R^n and Rudolph [2348] proposes to use geometrically distributed random numbers to offset genotype vectors in G ⊆ Z^n.
24.3 Surjective GPM
Palmer [2113], Palmer and Kershenbaum [2115] and Della Croce et al. [765] state that a good search space G and genotype-phenotype mapping “gpm” should be able to represent all possible phenotypes and solution structures, i. e., be surjective (see Section 51.7 on page 644) [2032].

∀x ∈ X ⇒ ∃g ∈ G : x = gpm(g) (24.2)
24.4 Injective GPM
In the spirit of Section 24.1 and with the goal to lower redundancy, the number of genotypes which map to the same phenotype should be low [2322–2324]. In Section 16.5 we provided a thorough discussion of redundancy in the genotypes and its demerits (but also its possible advantages). Combining the idea of achieving low uniform redundancy with the principle stated in Section 24.3, Palmer and Kershenbaum [2113, 2115] conclude that the GPM should ideally be both simple and bijective.
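For finite G and X, the surjectivity requirement of Equation 24.2 and the injectivity goal of this section reduce to simple set comparisons. A sketch with a hypothetical 3-bit genome:

```python
from itertools import product

def is_surjective(gpm, genotypes, phenotypes):
    """Every phenotype is reached by at least one genotype (Equation 24.2)."""
    return set(map(gpm, genotypes)) == set(phenotypes)

def is_injective(gpm, genotypes):
    """No two genotypes share a phenotype, i.e. zero redundancy."""
    images = list(map(gpm, genotypes))
    return len(images) == len(set(images))

G = list(product((0, 1), repeat=3))     # 8 genotypes
X = range(8)                            # 8 phenotypes

gpm = lambda g: int("".join(map(str, g)), 2)   # standard binary decoding
assert is_surjective(gpm, G, X) and is_injective(gpm, G)   # bijective

lossy = lambda g: sum(g)                # number of ones: only 4 phenotypes
assert not is_surjective(lossy, G, X) and not is_injective(lossy, G)
```

The binary decoding is both surjective and injective, hence bijective; the “number of ones” mapping can never produce four of the eight phenotypes and wastes genotypes on the rest.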
24.5 Consistent GPM
The genotype-phenotype mapping should always yield valid phenotypes [2032, 2113, 2115]. The meaning of valid in this context is that, if the problem space X is the set of all possible trees, only trees should be encoded in the genome. If we use R³ as problem space, no vectors with fewer or more elements than three should be produced by the genotype-phenotype mapping. Anything else would not make sense anyway, so this is a pretty basic rule. This form of validity does not imply that the individuals are also correct solutions in terms of the objective functions.
Constraints on what a good solution is can, on one hand, be realized by defining them explicitly and making them available to the optimization process (see Section 3.4 on page 70). On the other hand, they can also be implemented by defining the encoding, search operations, and genotype-phenotype mapping in a way which ensures that all phenotypes created are feasible by default [1888, 2323, 2912, 2914].
24.6 Formae Inheritance and Preservation
The search operations should be able to preserve the (good) formae and (good) configurations of multiple formae [978, 2242, 2323, 2879]. Good formae should not easily get lost during the exploration of the search space.
From a binary search operation like recombination (e.g., crossover in GAs), we would expect that, if its two parameters g_1 and g_2 are members of a forma A, the resulting element will also be an instance of A.

∀g_1, g_2 ∈ A ⊆ G ⇒ searchOp(g_1, g_2) ∈ A (24.3)
If we furthermore can assume that all instances of all formae A with minimal precision (A ∈ mini) of an individual are inherited by at least one parent, the binary reproduction operation is considered as pure [2242, 2879].

∀g_3 = searchOp(g_1, g_2) ∈ G, ∀A ∈ mini : g_3 ∈ A ⇒ g_1 ∈ A ∨ g_2 ∈ A (24.4)
If this is the case, all properties of a genotype g_3 which is a combination of two others (g_1, g_2) can be traced back to at least one of its parents. Otherwise, “searchOp” also performs an implicit unary search step, a mutation in Genetic Algorithms, for instance. Such properties, although discussed here for binary search operations only, can be extended to arbitrary n-ary operators.
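For fixed-length bit-string genomes where the minimal formae are “position i holds value v”, uniform crossover satisfies both Equation 24.3 and the purity condition of Equation 24.4, because every offspring gene is copied from one of the parents. A randomized check of both properties (our own illustrative test, not taken from the cited works):

```python
import random

def uniform_crossover(g1, g2, rng):
    """Each offspring gene is copied from either parent with equal probability."""
    return tuple(rng.choice((a, b)) for a, b in zip(g1, g2))

rng = random.Random(0)
n = 10
for _ in range(500):
    g1 = tuple(rng.randint(0, 1) for _ in range(n))
    g2 = tuple(rng.randint(0, 1) for _ in range(n))
    g3 = uniform_crossover(g1, g2, rng)
    for i in range(n):
        # Purity (Equation 24.4): every minimal forma of the offspring,
        # i.e. the value at position i, is inherited from a parent.
        assert g3[i] == g1[i] or g3[i] == g2[i]
        # Preservation (Equation 24.3): where both parents instantiate the
        # same forma, the offspring instantiates it too.
        if g1[i] == g2[i]:
            assert g3[i] == g1[i]
```

An operator that, say, flipped a random bit after recombining would fail the first assertion and would thus perform an implicit mutation in the sense described above.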
24.7 Formae in Genotypic Space Aligned with Phenotypic Formae
The optimization algorithms ﬁnd new elements in the search space G by applying the search
operations searchOp ∈ Op. These operations can only create, modify, or combine genotypical
formae since they usually have no information about the problem space. Most mathematical models dealing with the propagation of formae, like the Building Block Hypothesis and the Schema Theorem², thus focus on the search space and show that highly fit genotypical formae are more likely to be investigated further than those of low utility. Our goal, however, is to find highly fit formae in the problem space X. Such properties can only be created, modified, and combined by the search operations if they correspond to genotypical formae. According to Radcliffe [2242] and Weicker [2879], a good genotype-phenotype mapping should provide this feature.
It furthermore becomes clear that useful separate properties in phenotypic space can only
be combined by the search operations properly if they are represented by separate formae
in genotypic space too.
24.8 Compatibility of Formae
Formae of different properties should be compatible [2879]. Compatible formae in phenotypic space should also be compatible in genotypic space. This leads to a low level of epistasis [240, 2323] (see Chapter 17 on page 177) and hence will increase the chance of success of the reproduction operations. The representations of different, compatible phenotypic formae should not influence each other [1075]. This will lead to a representation with minimal epistasis.
² See Section 29.5 for more information on the Schema Theorem.
24.9 Representation with Causality
Besides low epistasis, the search space, GPM, and search operations together should possess strong causality (see Section 14.2). Generally, the search operations should produce small changes in the objective values. This would, simplified, mean that:

∀x_1, x_2 ∈ X, g ∈ G : x_1 = gpm(g) ∧ x_2 = gpm(searchOp(g)) ⇒ f(x_2) ≈ f(x_1) (24.5)
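One way to assess causality for a concrete encoding is to ask whether small phenotypic steps correspond to small genotypic steps under the available operators. For integers encoded as bit strings and mutated by single-bit flips, the reflected Gray code guarantees that adjacent phenotypes are exactly one flip apart, whereas the standard binary encoding exhibits “Hamming cliffs”. A small check; the encodings and the flip operator are our example choices, not prescriptions from the cited works:

```python
def binary(x, n):
    """Standard binary encoding of x as an n-bit tuple, most significant first."""
    return tuple((x >> i) & 1 for i in reversed(range(n)))

def gray(x, n):
    """Reflected Gray code: encode x ^ (x >> 1) in standard binary."""
    return binary(x ^ (x >> 1), n)

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

n = 8
# Gray code: every +1 phenotypic step is a single genotypic bit flip, so a
# unary flip operator can always realize the smallest phenotypic change.
assert all(hamming(gray(x, n), gray(x + 1, n)) == 1 for x in range(2**n - 1))
# Standard binary has Hamming cliffs: moving from 127 to 128 flips all 8 bits.
assert hamming(binary(127, n), binary(128, n)) == n
```

Whether Gray coding actually helps a given optimizer depends on the problem; the sketch only shows that the genotypic neighborhood structure induced by bit flips aligns better with phenotypic adjacency.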
24.10 Combinations of Formae
If genotypes g_1, g_2, . . . which are instances of different but compatible formae A_1 ⊲⊳ A_2 ⊲⊳ . . . are combined by a binary (or n-ary) search operation, the resulting genotype g should be an instance of both properties, i. e., the combination of compatible formae should be a forma itself [2879].

∀g_1 ∈ A_1, g_2 ∈ A_2, . . . ⇒ searchOp(g_1, g_2, . . .) ∈ A_1 ∩ A_2 ∩ . . . (≠ ∅) (24.6)
If this principle holds for many individuals and formae, useful properties can be combined by the optimization process step by step, narrowing down the precision of the arising, most interesting formae more and more. This should lead the search to the most promising regions of the search space.
24.11 Reachability of Formae
Beyer and Schwefel [300, 302] suggest that a good representation has the feature of reachability: any other (finite) individual state can be reached from any parent state within a finite number of (mutation) steps. In other words, the set of search operations should be complete (see Definition D4.8 on page 85). Beyer and Schwefel state that this property is necessary for proving global convergence.
The set of available search operations Op should thus include at least one (unary) search operation which is theoretically able to reach all possible formae within finitely many steps. If the binary search operations in Op are all pure, this unary operator is the only one (apart from creation operations) able to introduce new formae which are not yet present in the population. Hence, it should be able to find any given forma [2879].
Generally, we just need to be able to generate all formae, not necessarily all genotypes, with such an operator, if formae can be arbitrarily combined by the available n-ary search or recombination operations.
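For small finite genomes, completeness of a unary operator can be verified mechanically by a breadth-first search over the mutation graph. The sketch below uses a hypothetical single-bit-flip operator on 5-bit genomes:

```python
from collections import deque
from itertools import product

def reachable(start, neighbors):
    """All genotypes reachable from `start` in finitely many operator steps."""
    seen, queue = {start}, deque([start])
    while queue:
        g = queue.popleft()
        for h in neighbors(g):
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen

def bit_flips(g):
    """Unary mutation: yield every genotype differing in exactly one position."""
    for i in range(len(g)):
        yield g[:i] + (1 - g[i],) + g[i + 1:]

G = set(product((0, 1), repeat=5))
start = (0, 0, 0, 0, 0)
assert reachable(start, bit_flips) == G   # the operator set is complete
```

If the operator could, say, only flip bits from 0 to 1, the search would be confined to one “cone” of the hypercube and the same check would fail, signalling an incomplete operator set.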
24.12 Inﬂuence of Formae
One rule which, in my opinion, was missing in the lists given by Radcliffe [2242] and Weicker [2879] is that the absolute contributions of the single formae to the overall objective values of a candidate solution should not be too different. Let us divide the phenotypic formae into those with positive and those with negative or neutral contribution and let us, for simplification purposes, assume that those with positive contribution can be arbitrarily combined. If one of the positive formae has a contribution with an absolute value much lower than those of the other positive formae, we will run into the problem of domino convergence discussed in Section 13.2.3 on page 153.
Then, the search will ﬁrst discover the building blocks of higher value. This, itself, is
not a problem. However, as we have already pointed out in Section 13.2.3, if the search is
stochastic and performs exploration steps, chances are that alleles of higher importance get destroyed during this process and have to be rediscovered. The values of the less salient formae would then play no role. Thus, the chance of finding them strongly depends on how frequently the destruction of important formae takes place.
Ideally, we would therefore design the genome and phenome in a way that the diﬀerent
characteristics of the candidate solution all inﬂuence the objective values to a similar degree.
Then, the chance of ﬁnding good formae increases.
24.13 Scalability of Search Operations
The scalability requirement outlined by Beyer and Schwefel [300, 302] states that the strength of the variation operator (usually a unary search operation such as mutation) should be tunable in order to adapt to the optimization process. In the initial phase of the search, usually larger search steps are performed in order to obtain a coarse idea of the general direction in which the optimization process should proceed. The closer the focus gets to a certain local or global optimum, the smaller the search steps should become in order to balance the genotypic features in a fine-grained manner. Ideally, the optimization algorithm should achieve such behavior in a self-adaptive way in reaction to the structure of the problem it is solving.
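A classic self-adaptive realization of this requirement is Rechenberg's 1/5th success rule from Evolution Strategies: enlarge the mutation strength while more than one fifth of the mutations succeed, shrink it otherwise. The minimal (1+1)-ES sketch below minimizes the sphere function; the window size of 20 mutations and the adaptation factor of 1.5 are illustrative parameter choices of ours:

```python
import random

def one_plus_one_es(f, x, sigma, generations, rng):
    """(1+1)-ES with the 1/5th success rule, minimizing f over a real vector."""
    successes, window = 0, 20
    for t in range(1, generations + 1):
        y = [xi + rng.gauss(0.0, sigma) for xi in x]   # Gaussian mutation
        if f(y) < f(x):                # elitist: offspring replaces parent
            x, successes = y, successes + 1
        if t % window == 0:            # adapt step size every `window` steps
            if successes > window / 5:
                sigma *= 1.5           # many successes: steps are too cautious
            else:
                sigma /= 1.5           # few successes: steps are too coarse
            successes = 0
    return x, sigma

sphere = lambda v: sum(c * c for c in v)
rng = random.Random(1)
x, sigma = one_plus_one_es(sphere, [10.0, -10.0], 1.0, 2000, rng)
assert sphere(x) < 1e-3    # the shrinking step size enabled fine tuning
```

Early in the run the rule inflates sigma to cross the search space quickly; near the optimum the success rate drops below one fifth and sigma shrinks, exactly the coarse-to-fine behavior described above.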
24.14 Appropriate Abstraction
The problem should be represented at an appropriate level of abstraction [240, 2323]. The complexity of the optimization problem should be distributed evenly between the search space G, the search operations searchOp ∈ Op, the genotype-phenotype mapping “gpm”, and the problem space X. Sometimes, using a non-trivial encoding which shifts complexity to the genotype-phenotype mapping and the search operations can improve the performance of an optimizer and increase the causality [240, 2912, 2914].
24.15 Indirect Mapping
If a direct mapping between genotypes and phenotypes is not possible, a suitable developmental approach should be applied [2323]. Such indirect genotype-phenotype mappings, as we will discuss in Section 28.2.2 on page 272, are especially useful if large-scale and robust structure designs are to be constructed by the optimization process.
24.15.1 Extradimensional Bypass: Example for Good Complexity
Minimal-sized genomes are not always the best approach. An interesting aspect of genome design supporting this claim is inspired by the works of the theoretical biologist Conrad [621, 622, 623, 625]. According to his extradimensional bypass principle, it is possible to transform a rugged fitness landscape with isolated peaks into one with connected saddle points by increasing the dimensionality of the search space [496, 544]. In [625] he states that the chance of isolated peaks in randomly created fitness landscapes decreases when their dimensionality grows.
This partly contradicts Section 24.1, which states that genomes should be as compact as possible. Conrad [625] does not suggest that nature includes useless sequences in the genome but rather genes which allow for
1. new phenotypical characteristics or
2. redundancy providing new degrees of freedom for the evolution of a species.
In some cases, such an increase in freedom makes more than up for the additional “costs”
arising from the enlargement of the search space. The extradimensional bypass can be
considered as an example of positive neutrality (see Chapter 16).
[Fig. 24.1.a: Useful increase of dimensionality. Fig. 24.1.b: Useless increase of dimensionality. Both plot f(x) over a one-dimensional search space G extended to a two-dimensional space G′, with a local and a global optimum.]
Figure 24.1: Examples for an increase of the dimensionality of a search space G (1d) to G′ (2d).
In Fig. 24.1.a, an example for the extradimensional bypass (similar to Fig. 6 in [355]) is sketched. The original problem had a one-dimensional search space G corresponding to the horizontal axis up front. As can be seen in the plane in the foreground, the objective function had two peaks: a local optimum on the left and a global optimum on the right, separated by a larger valley. When the optimization process began climbing up the local optimum, it was very unlikely that it ever could escape this hill and reach the global one. Increasing the search space to two dimensions (G′), however, opened up a pathway between them. The two isolated peaks became saddle points on a longer ridge. The global optimum is now reachable from all points on the local optimum.
Generally, increasing the dimension of the search space only makes sense if the added dimension has a non-trivial influence on the objective functions. Simply adding a useless new dimension (as done in Fig. 24.1.b) would be an example for some sort of uniform redundancy, of which we already know (see Section 16.5) that it is not beneficial. Then again, adding useful new dimensions may be hard or impossible to achieve in most practical applications.
A good example for this issue is given by Bongard and Paul [355] who used an EA to evolve a neural network for the motion control of a bipedal robot. They performed runs where the evolution had control over some morphological aspects and runs where it had not. The ability to change the leg width of the robots, for instance, comes at the expense of an increase of the dimensions of the search space. Hence, one would expect that the optimization would perform worse. Instead, in one series of experiments, the results were much better with the extended search space. The runs did not converge to one particular leg shape but to a wide range of different structures. This led to the assumption that the morphology itself was not so much the target of the optimization, but that the ability to change it transformed the fitness landscape into a structure more navigable by the evolution. In some other experimental runs performed by Bongard and Paul [355], this phenomenon could not
be observed, most likely because
1. the robot conﬁguration led to a problem of too high complexity, i. e., ruggedness in
the ﬁtness landscape and/or
2. the increase in dimensionality this time was too large to be compensated by the gain
of evolvability.
Further examples for possible beneﬁts of “gradually complexifying” the search space are
given by Malkin in his doctoral thesis [1818].
Part III
Metaheuristic Optimization Algorithms
Chapter 25
Introduction
In Definition D1.15 on page 36 we stated that a metaheuristic is a black-box optimization
technique which combines information from the objective functions, gained by sampling the
search space, in a way which (hopefully) leads to the discovery of candidate solutions with
high utility. In the taxonomy given in Section 1.3, these algorithms form the majority, since
they are the focus of this book.
25.1 General Information on Metaheuristics
25.1.1 Applications and Examples
Table 25.1: Applications and Examples of Metaheuristics.
Area References
Art see Tables 28.4, 29.1, and 31.1
Astronomy see Table 28.4
Chemistry see Tables 28.4, 29.1, 30.1, 31.1, 32.1, 40.1, and 43.1
Combinatorial Problems [342, 651, 1033, 1034, 2283, 2992]; see also Tables 26.1, 27.1,
28.4, 29.1, 30.1, 31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1,
40.1, 41.1, and 42.1
Computer Graphics see Tables 27.1, 28.4, 29.1, 30.1, 31.1, 33.1, and 36.1
Control see Tables 28.4, 29.1, 31.1, 32.1, and 35.1
Cooperation and Teamwork see Tables 28.4, 29.1, 31.1, and 37.1
Data Compression see Table 31.1
Data Mining see Tables 27.1, 28.4, 29.1, 31.1, 32.1, 34.1, 35.1, 36.1, 37.1,
and 39.1
Databases see Tables 26.1, 27.1, 28.4, 29.1, 31.1, 35.1, 36.1, 37.1, 39.1,
and 40.1
Decision Making see Table 31.1
Digital Technology see Tables 26.1, 28.4, 29.1, 30.1, 31.1, 34.1, and 35.1
Distributed Algorithms and Systems [3001, 3031]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1,
32.1, 33.1, 34.1, 35.1, 36.1, 37.1, 39.1, 40.1, and 41.1
E-Learning see Tables 28.4, 29.1, 37.1, and 39.1
Economics and Finances see Tables 27.1, 28.4, 29.1, 31.1, 33.1, 35.1, 36.1, 37.1, 39.1,
and 40.1
Engineering see Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1,
35.1, 36.1, 37.1, 39.1, 40.1, and 41.1
Function Optimization see Tables 26.1, 27.1, 28.4, 29.1, 30.1, 32.1, 33.1, 34.1, 35.1,
36.1, 39.1, 40.1, 41.1, and 43.1
Games see Tables 28.4, 29.1, 31.1, and 32.1
Graph Theory [3001, 3031]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1,
33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1, 40.1, and 41.1
Healthcare see Tables 28.4, 29.1, 31.1, 35.1, 36.1, and 37.1
Image Synthesis see Table 28.4
Logistics [651, 1033, 1034]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1,
33.1, 34.1, 36.1, 37.1, 38.1, 39.1, 40.1, and 41.1
Mathematics see Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1,
and 35.1
Military and Defense see Tables 26.1, 27.1, 28.4, and 34.1
Motor Control see Tables 28.4, 33.1, 34.1, 36.1, and 39.1
MultiAgent Systems see Tables 28.4, 29.1, 31.1, 35.1, and 37.1
Multiplayer Games see Table 31.1
Physics see Tables 27.1, 28.4, 29.1, 31.1, 32.1, and 34.1
Prediction see Tables 28.4, 29.1, 30.1, 31.1, and 35.1
Security see Tables 26.1, 27.1, 28.4, 29.1, 31.1, 35.1, 36.1, 37.1, 39.1,
and 40.1
Shape Synthesis see Table 26.1
Software [467, 3031]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1,
33.1, 34.1, 36.1, 37.1, and 40.1
Sorting see Tables 28.4 and 31.1
Telecommunication see Tables 28.4, 37.1, and 40.1
Testing see Tables 28.4 and 31.3
Theorem Proving and Automatic Verification see Tables 29.1 and 40.1
Water Supply and Management see Table 28.4
Wireless Communication see Tables 26.1, 27.1, 28.4, 29.1, 31.1, 33.1, 34.1, 35.1, 36.1,
37.1, 39.1, and 40.1
25.1.2 Books
1. Metaheuristics for Scheduling in Industrial and Manufacturing Applications [2992]
2. Handbook of Metaheuristics [1068]
3. Fleet Management and Logistics [651]
4. Metaheuristics for Multiobjective Optimisation [1014]
5. Large-Scale Network Parameter Configuration using Online Simulation Framework [3031]
6. Modern Heuristic Techniques for Combinatorial Problems [2283]
7. How to Solve It: Modern Heuristics [1887]
8. Search Methodologies – Introductory Tutorials in Optimization and Decision Support Techniques [453]
9. Handbook of Approximation Algorithms and Metaheuristics [1103]
25.1.3 Conferences and Workshops
Table 25.2: Conferences and Workshops on Metaheuristics.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
ICARIS: International Conference on Artiﬁcial Immune Systems
See Table 10.2 on page 130.
MIC: Metaheuristics International Conference
History: 2009/07: Hamburg, Germany, see [1968]
2008/10: Atizapán de Zaragoza, México, see [1038]
2007/06: Montréal, QC, Canada, see [1937]
2005/08: Vienna, Austria, see [2810]
2003/08: Kyoto, Japan, see [1321]
2001/07: Porto, Portugal, see [2294]
1999/07: Angra dos Reis, Rio de Janeiro, Brazil, see [2298]
1997/07: Sophia-Antipolis, France, see [2828]
1995/07: Breckenridge, CO, USA, see [2104]
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
GEWS: Grammatical Evolution Workshop
See Table 31.2 on page 385.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna (Polish National Conference on Evolutionary Algorithms and Global Optimization)
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
History: 2012/10: Port El Kantaoui, Sousse, Tunisia, see [2205]
2010/10: Djerba Island, Tunisia, see [873]
2008/10: Hammamet, Tunisia, see [1171]
2006/11: Hammamet, Tunisia, see [1170]
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
25.1.4 Journals
1. European Journal of Operational Research (EJOR) published by Elsevier Science Publishers B.V. and North-Holland Scientific Publishers Ltd.
2. Computational Optimization and Applications published by Kluwer Academic Publishers and
Springer Netherlands
3. The Journal of the Operational Research Society (JORS) published by Operations Research
Society and Palgrave Macmillan Ltd.
4. Journal of Heuristics published by Springer Netherlands
5. SIAM Journal on Optimization (SIOPT) published by Society for Industrial and Applied
Mathematics (SIAM)
6. Journal of Global Optimization published by Springer Netherlands
7. Discrete Optimization published by Elsevier Science Publishers B.V.
8. Journal of Optimization Theory and Applications published by Plenum Press and Springer
Netherlands
9. Engineering Optimization published by Informa plc and Taylor and Francis LLC
10. Optimization and Engineering published by Kluwer Academic Publishers and Springer
Netherlands
11. Optimization Methods and Software published by Taylor and Francis LLC
12. Journal of Mathematical Modelling and Algorithms published by Springer Netherlands
13. Structural and Multidisciplinary Optimization published by Springer-Verlag GmbH
14. SIAG/OPT Views and News: A Forum for the SIAM Activity Group on Optimization published by SIAM Activity Group on Optimization – SIAG/OPT
Chapter 26
Hill Climbing
26.1 Introduction
Definition D26.1 (Local Search). A local search^1 algorithm solves an optimization
problem by iteratively moving from one candidate solution to another in the problem space
until a termination criterion is met [2356].
In other words, a local search algorithm in each iteration either keeps the current candidate
solution or moves to one element from its neighborhood as deﬁned in Deﬁnition D4.12 on
page 86. Local search algorithms have two major advantages: They use very little memory
(normally only a constant amount) and are often able to ﬁnd solutions in large or inﬁnite
search spaces. These advantages come, of course, with large tradeoﬀs in processing time
and the risk of premature convergence.
Hill Climbing^2 (HC) [2356] is one of the oldest and simplest local search and optimization
algorithms for single objective functions f. In Algorithm 1.1 on page 40, we gave an
example of a simple and unstructured Hill Climbing algorithm. Here we will generalize
from Algorithm 1.1 to a more basic Monte Carlo Hill Climbing algorithm structure.
26.2 Algorithm
The Hill Climbing procedure usually starts either from a random point in the search space
or from a point chosen according to some problem-specific criterion (visualized with the
operator create() in Algorithm 26.1). Then, in a loop, the currently known best individual p⋆
is modified (with the mutate() operator) in order to produce one offspring p_new. If this new
individual is better than its parent, it replaces it. Otherwise, it is discarded. This procedure
is repeated until the termination criterion terminationCriterion() is met. In Listing 56.6 on
page 735, we provide a Java implementation which resembles Algorithm 26.1.
In this sense, Hill Climbing is similar to an Evolutionary Algorithm (see Chapter 28)
with a population size of 1 and truncation selection. As a matter of fact, it resembles a (1 + 1)
Evolution Strategy. In Algorithm 30.7 on page 369, we provide such an optimization
algorithm for G ⊆ R^n which additionally adapts the search step size.
1 http://en.wikipedia.org/wiki/Local_search_(optimization) [accessed 2010-07-24]
2 http://en.wikipedia.org/wiki/Hill_climbing [accessed 2007-07-03]
Algorithm 26.1: x ←− hillClimbing(f)
  Input: f: the objective function subject to minimization
  Input: [implicit] terminationCriterion: the termination criterion
  Data: p_new: the new element created
  Data: p⋆: the (currently) best individual
  Output: x: the best element found
 1 begin
 2   p⋆.g ←− create()
 3   p⋆.x ←− gpm(p⋆.g)
 4   while ¬terminationCriterion() do
 5     p_new.g ←− mutate(p⋆.g)
 6     p_new.x ←− gpm(p_new.g)
 7     if f(p_new.x) ≤ f(p⋆.x) then p⋆ ←− p_new
 8   return p⋆.x
Although the search space G and the problem space X are most often the same in Hill
Climbing, we distinguish them in Algorithm 26.1 for the sake of generality. It should further
be noted that Hill Climbing can be implemented in a deterministic manner if, for each
genotype g1 ∈ G, the set of adjacent points is finite (|{g2 ∈ G : adjacent(g2, g1)}| ∈ N0).
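As a concrete illustration, the loop of Algorithm 26.1 can be sketched in Java for the common special case G = X = R^n. This is a minimal sketch under our own assumptions (the sphere function as objective, Gaussian mutation of a single coordinate, and a plain step budget as termination criterion); it deliberately does not follow the interface-based framework of Listing 56.6.

```java
import java.util.Random;

/** Minimal sketch of Algorithm 26.1 for G = X = R^n; all names are ours, not the book's API. */
public class HillClimbSketch {

  static final Random RND = new Random(42);

  /** Objective: the sphere function f(x) = sum of x_i^2, subject to minimization. */
  static double f(double[] x) {
    double s = 0;
    for (double v : x) s += v * v;
    return s;
  }

  /** mutate(): perturb one coordinate of a copy of the genotype by a small Gaussian step. */
  static double[] mutate(double[] g) {
    double[] child = g.clone();
    child[RND.nextInt(child.length)] += 0.1 * RND.nextGaussian();
    return child;
  }

  /** The basic loop: keep p* and replace it whenever the offspring is no worse. */
  static double[] hillClimbing(int n, int maxSteps) {
    double[] best = new double[n];                 // create(): a random start in [-10, 10]^n
    for (int i = 0; i < n; i++) best[i] = -10 + 20 * RND.nextDouble();
    for (int step = 0; step < maxSteps; step++) {  // terminationCriterion(): step budget
      double[] cand = mutate(best);
      if (f(cand) <= f(best)) best = cand;         // line 7 of Algorithm 26.1
    }
    return best;
  }

  public static void main(String[] args) {
    double[] x = hillClimbing(5, 20000);
    System.out.println("f(x*) = " + f(x));
  }
}
```

Since the acceptance rule only ever replaces p⋆ by a candidate that is at least as good, the objective value of p⋆ is monotonously non-increasing over the run.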
26.3 MultiObjective Hill Climbing
As illustrated in Algorithm 26.2, we can easily extend Hill Climbing algorithms with a
support for multiobjective optimization by using some of the methods from Section 3.5.2
on page 76 and some which we later will discuss in the context of Evolutionary Algorithms.
This extended approach will then return a set X of the best candidate solutions found instead
of a single point x from the problem space as done in Algorithm 26.1.
The set of currently known best individuals archive may contain more than one element.
In each round, we pick one individual from the archive at random. We then use this individ
ual to create the next point in the problem space with the “mutate” operator. The method
pruneOptimalSet(archive, as) checks whether the new candidate solution is nonprevailed
from any element in archive. If so, it will enter the archive and all individuals from archive
which now are dominated are removed. Since an optimization problem may potentially have
inﬁnitely many global optima, we subsequently have to ensure that the archive does not grow
too big: pruneOptimalSet(archive, as) makes sure that it contains at most as individuals and,
if necessary, deletes some individuals. In case of Algorithm 26.2, the archive can grow to
at most as + 1 individuals before the pruning, so at most one individual will be deleted.
When the loop ends because the termination criterion terminationCriterion
__
was met, we
extract all the phenotypes from the archived individuals (extractPhenotypes(archive)) and
return them as the set X. An example implementation of this method in Java can be found
in Listing 56.7 on page 737.
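The two archive operations can be sketched in Java for two objectives (both subject to minimization). The method names follow the pseudocode, but the bodies are our own illustrative assumptions; in particular, the pruner here simply drops the oldest entry, whereas a real pruner would use diversity or density information.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative Pareto-archive operations for two minimized objectives (names are ours). */
public class ParetoArchiveSketch {

  /** true iff a dominates b: no worse in every objective and better in at least one. */
  static boolean dominates(double[] a, double[] b) {
    boolean better = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) return false;
      if (a[i] < b[i]) better = true;
    }
    return better;
  }

  /** updateOptimalSet: insert cand if non-dominated, drop archive members it dominates. */
  static List<double[]> updateOptimalSet(List<double[]> archive, double[] cand) {
    for (double[] a : archive) if (dominates(a, cand)) return archive; // cand is prevailed
    List<double[]> out = new ArrayList<>();
    for (double[] a : archive) if (!dominates(cand, a)) out.add(a);
    out.add(cand);
    return out;
  }

  /** pruneOptimalSet: if the archive exceeds as elements, delete some (here: the oldest). */
  static List<double[]> pruneOptimalSet(List<double[]> archive, int as) {
    while (archive.size() > as) archive.remove(0);
    return archive;
  }

  public static void main(String[] args) {
    List<double[]> archive = new ArrayList<>();
    archive = updateOptimalSet(archive, new double[] {1, 5});
    archive = updateOptimalSet(archive, new double[] {5, 1});
    archive = updateOptimalSet(archive, new double[] {3, 3});
    archive = updateOptimalSet(archive, new double[] {2, 2}); // dominates {3, 3}
    archive = pruneOptimalSet(archive, 4);
    System.out.println(archive.size()); // the three mutually non-dominated points remain
  }
}
```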
Algorithm 26.2: X ←− hillClimbingMO(OptProb, as)
  Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
  Input: as: the maximum archive size
  Input: [implicit] terminationCriterion: the termination criterion
  Data: p_old: the individual used as parent for the next point
  Data: p_new: the new individual generated
  Data: archive: the set of best individuals known
  Output: X: the set of the best elements found
 1 begin
 2   archive ←− ()
 3   p_new.g ←− create()
 4   p_new.x ←− gpm(p_new.g)
 5   while ¬terminationCriterion() do
       // check whether the new individual belongs into the archive
 6     archive ←− updateOptimalSet(archive, p_new, cmp_f)
       // if the archive is too big, delete one individual
 7     archive ←− pruneOptimalSet(archive, as)
       // pick a random element from the archive
 8     p_old ←− archive[⌊randomUni[0, len(archive))⌋]
 9     p_new.g ←− mutate(p_old.g)
10     p_new.x ←− gpm(p_new.g)
11   return extractPhenotypes(archive)
26.4 Problems in Hill Climbing
The major problem of Hill Climbing is premature convergence. We introduced this phe
nomenon in Chapter 13 on page 151: The optimization gets stuck at a local optimum and
cannot discover the global optimum anymore.
Both previously speciﬁed versions of the algorithm are very likely to fall prey to this
problem, since the always move towards improvements in the objective values and thus,
cannot reach any candidate solution which is situated behind an inferior point. They will
only follow a path of candidate solutions if it is monotonously
3
improving the objective func
tion(s). The function used in Example E5.1 on page 96 is, for example, already problematic
for Hill Climbing.
As previously mentioned, Hill Climbing in this form is a local search rather than a Global
Optimization algorithm. By making a few slight modifications to the algorithm, however, it
can become a valuable technique capable of performing Global Optimization:

1. A tabu list which stores the elements recently evaluated can be added. By preventing
the algorithm from visiting them again, a better exploration of the problem space X can
be enforced. This technique is used in Tabu Search, which is discussed in Chapter 40
on page 489.

2. A way of preventing premature convergence is to not always move to the better
candidate solution in each step. Simulated Annealing introduces a heuristic based
on the physical model of cooling down molten metal to decide whether an inferior
offspring should replace its parent or not. This approach is described in Chapter 27
on page 243.
3 http://en.wikipedia.org/wiki/Monotonicity [accessed 2007-07-03]
3. The Dynamic Hill Climbing approach by Yuret and de la Maza [3048] uses the last
two visited points to compute unit vectors. With this technique, the directions are
adjusted according to the structure of the problem space and a new coordinate frame
is created which more likely points into the right direction.

4. Randomly restarting the search after so-and-so many steps is a crude but efficient
method to explore wide ranges of the problem space with Hill Climbing. You can find
it outlined in Section 26.5.

5. Using a reproduction scheme that does not necessarily generate candidate solutions directly
neighboring p⋆.x, as done in Random Optimization, an optimization approach defined
in Chapter 44 on page 505, may prove even more efficient.
26.5 Hill Climbing With Random Restarts
Hill Climbing with random restarts is also called Stochastic Hill Climbing (SH) or Stochastic
gradient descent
4
[844, 2561]. In the following, we will outline how random restarts can be
implemented into Hill Climbing and serve as a measure against premature convergence.
We will further combine this approach directly with the multiobjective Hill Climbing ap
proach deﬁned in Algorithm 26.2. The new algorithm incorporates two archives for optimal
solutions: archive
1
, the overall optimal set, and archive
2
the set of the best individuals in
the current run. We additionally deﬁne the restart criterion shouldRestart
__
which is eval
uated in every iteration and determines whether or not the algorithm should be restarted.
shouldRestart
__
therefore could, for example, count the iterations performed or check if
any improvement was produced in the last ten iterations. After each single run, archive
2
is
incorporated into archive1, from which we extract and return the problem space elements
at the end of the Hill Climbing process, as deﬁned in Algorithm 26.3.
4 http://en.wikipedia.org/wiki/Stochastic_gradient_descent [accessed 2007-07-03]
Algorithm 26.3: X ←− hillClimbingMORR(OptProb, as)
  Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
  Input: as: the maximum archive size
  Input: [implicit] terminationCriterion: the termination criterion
  Input: [implicit] shouldRestart: the restart criterion
  Data: p_old: the individual used as parent for the next point
  Data: p_new: the new individual generated
  Data: archive1, archive2: the sets of best individuals known
  Output: X: the set of the best elements found
 1 begin
 2   archive1 ←− ()
 3   while ¬terminationCriterion() do
 4     archive2 ←− ()
 5     p_new.g ←− create()
 6     p_new.x ←− gpm(p_new.g)
 7     while ¬(shouldRestart() ∨ terminationCriterion()) do
         // check whether the new individual belongs into the archive
 8       archive2 ←− updateOptimalSet(archive2, p_new, cmp_f)
         // if the archive is too big, delete one individual
 9       archive2 ←− pruneOptimalSet(archive2, as)
         // pick a random element from the archive
10       p_old ←− archive2[⌊randomUni[0, len(archive2))⌋]
11       p_new.g ←− mutate(p_old.g)
12       p_new.x ←− gpm(p_new.g)
       // check whether good individuals were found in this run
13     archive1 ←− updateOptimalSetN(archive1, archive2, cmp_f)
       // if the archive is too big, delete one individual
14     archive1 ←− pruneOptimalSet(archive1, as)
15   return extractPhenotypes(archive1)
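The benefit of restarting can be illustrated with a much simpler single-objective Java sketch, which is our own simplification without the two archives of Algorithm 26.3. The one-dimensional objective below is deliberately deceptive (local optimum at x = −3 with value 1, global optimum at x = 4 with value 0): a single run starting on the wrong side of the valley stalls at the local optimum, while restarting from fresh random points recovers.

```java
import java.util.Random;

/** Single-objective sketch of Hill Climbing with restarts (all names are ours). */
public class HillClimbRestartSketch {

  static final Random RND = new Random(7);

  /** A deceptive 1-d objective: local optimum at x = -3 (value 1), global at x = 4 (value 0). */
  static double f(double x) {
    return Math.min((x + 3) * (x + 3) + 1.0, (x - 4) * (x - 4));
  }

  static double hillClimbingWithRestarts(int runs, int stepsPerRun) {
    double overallBest = Double.NaN;
    for (int r = 0; r < runs; r++) {               // shouldRestart(): fixed step budget
      double best = -10 + 20 * RND.nextDouble();   // create(): fresh random start in [-10, 10]
      for (int s = 0; s < stepsPerRun; s++) {
        double cand = best + 0.1 * RND.nextGaussian();
        if (f(cand) <= f(best)) best = cand;
      }
      if (Double.isNaN(overallBest) || f(best) < f(overallBest))
        overallBest = best;                        // merge the run result into the overall best
    }
    return overallBest;
  }

  public static void main(String[] args) {
    double x = hillClimbingWithRestarts(20, 2000);
    System.out.println("x = " + x + ", f(x) = " + f(x));
  }
}
```

A run that starts left of the crossing point of the two branches can only reach the local optimum with objective value 1; with 20 restarts, the probability that no run ever starts on the right side becomes negligible.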
26.6 General Information on Hill Climbing
26.6.1 Applications and Examples
Table 26.1: Applications and Examples of Hill Climbing.
Area References
Combinatorial Problems [16, 502, 775, 1261, 1486, 1532, 1553, 1558, 1781, 1971, 2656,
2663, 2745, 3027]; see also Table 26.3
Databases [1781, 2745]
Digital Technology [1600, 1665]
Distributed Algorithms and Systems [15, 410, 775, 1532, 1558, 2058, 2165, 2286, 2315, 2554, 2656,
2663, 2903, 2994, 3027, 3085, 3106]
Engineering [410, 1486, 1600, 1665, 2058, 2286, 2656, 2663]
Function Optimization [1927, 3048]
Graph Theory [15, 410, 775, 1486, 1532, 1558, 2058, 2165, 2237, 2286, 2315,
2554, 2596, 2656, 2663, 3027, 3085, 3106]
Logistics [1553, 1971]
Mathematics [1674]
Military and Defense [775]
Security [1486, 2165, 2554]
Shape Synthesis [887]
Software [1486, 1665, 2554, 2994]
Wireless Communication [1486, 2237, 2596]
26.6.2 Books
1. Artiﬁcial Intelligence: A Modern Approach [2356]
2. Introduction to Stochastic Search and Optimization [2561]
3. Evolution and Optimum Seeking [2437]
26.6.3 Conferences and Workshops
Table 26.2: Conferences and Workshops on Hill Climbing.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
26.6.4 Journals
1. Journal of Heuristics published by Springer Netherlands
26.7 Raindrop Method
Only five years ago, Bettinger and Zhu [290, 3083] contributed a new search heuristic for
constrained optimization: the Raindrop Method, which they used for forest planning problems.
In the original description of the algorithm, the search and problem space are identical
(G = X). The algorithm is based on precise knowledge of the components of the candidate
solutions and on how their interaction influences the validity of the constraints. It works as
follows:
1. The Raindrop Method starts out with a single, valid candidate solution x⋆ (i. e., one
that violates none of the constraints). This candidate may be found with a random
search process or provided by a human operator.

2. Create a copy x of x⋆. Set the iteration counter τ to a user-defined maximum value
maxτ of iterations for modifying and correcting x that are allowed without improvements
before reverting to x⋆.

3. Perturb x by randomly modifying one of its components. Let us refer to this randomly
selected component as s. This modification may lead to constraint violations.

4. If no constraint was violated, continue at Point 11, otherwise proceed as follows.

5. Set a distance value d to 0.

6. Create a list L of the components of x that lead to constraint violations. Here we
make use of the knowledge of the interaction of components and constraints.

7. From L, we pick the component c which is physically closest to s, that is, the component
with the minimum distance dist(c, s). In the original application of the Raindrop
Method, physically close was properly defined due to the fact that the candidate
solutions were basically two-dimensional maps. For applications different from forest
planning, appropriate definitions for the distance measure have to be supplied.

8. Set d = dist(c, s).
9. Find the next best value for c which does not introduce any new constraint violations
in components f which are at least as close to s, i. e., with dist(f, s) ≤ dist(c, s). This
change may, however, cause constraint violations in components farther away from s.
If no such change is possible, go to Point 13. Otherwise, modify the component c in x.

10. Go back to Point 4.

11. If x is better than x⋆, that is, x ≺ x⋆, set x⋆ = x. Otherwise, decrease the iteration
counter τ.

12. If the termination criterion has not yet been met, go back to Point 3 if τ > 0 and
to Point 2 if τ = 0.

13. Return x⋆ to the user.
The iteration counter τ here is used to allow the search to explore solutions more distant
from the current optimum x⋆. The higher the initial value maxτ specified by the user, the more
iterations without improvement are allowed before reverting x to x⋆. The name Raindrop
Method comes from the fact that the constraint violations caused by the perturbation of
the valid solution radiate away from the modified component s like waves on a water surface
radiate away from the point where a raindrop hits.
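The perturb-and-repair idea can be illustrated with the following Java toy sketch. Everything here is our own illustrative assumption, not the forest-planning setting of Bettinger and Zhu: the candidate solution is an integer vector, the (hypothetical) constraint is that every pair of adjacent components may sum to at most a capacity, the objective is to maximize the total sum, and repairs radiate outward from the perturbed index.

```java
import java.util.Random;

/** Toy sketch of the Raindrop Method on a hypothetical constrained problem (names are ours). */
public class RaindropSketch {

  static final int CAP = 10;
  static final Random RND = new Random(1);

  static int sum(int[] x) { int s = 0; for (int v : x) s += v; return s; }

  static boolean valid(int[] x) {
    for (int i = 0; i + 1 < x.length; i++) if (x[i] + x[i + 1] > CAP) return false;
    return true;
  }

  /** Repair violations radiating away from the perturbed index s: always fix the violated
      pair closest to s by lowering its endpoint farther from s. Lowering a value can never
      create a new violation here, so the repair terminates. */
  static void repair(int[] x, int s) {
    while (true) {
      int best = -1;
      for (int i = 0; i + 1 < x.length; i++)
        if (x[i] + x[i + 1] > CAP && (best < 0
            || Math.min(Math.abs(i - s), Math.abs(i + 1 - s))
             < Math.min(Math.abs(best - s), Math.abs(best + 1 - s)))) best = i;
      if (best < 0) return;                    // no constraint violated any more
      int far = Math.abs(best - s) >= Math.abs(best + 1 - s) ? best : best + 1;
      x[far] -= x[best] + x[best + 1] - CAP;   // lower the farther endpoint just enough
    }
  }

  static int[] raindrop(int n, int iterations, int maxTau) {
    int[] bestX = new int[n];                  // the all-zero vector is trivially valid
    int[] x = bestX.clone();
    int tau = maxTau;
    for (int it = 0; it < iterations; it++) {
      int s = RND.nextInt(n);
      x[s] += 1 + RND.nextInt(3);              // perturb one component (may violate)
      repair(x, s);
      if (sum(x) > sum(bestX)) { bestX = x.clone(); tau = maxTau; }
      else if (--tau == 0) { x = bestX.clone(); tau = maxTau; }  // revert to x*
    }
    return bestX;
  }

  public static void main(String[] args) {
    int[] x = raindrop(8, 500, 10);
    System.out.println("sum = " + sum(x) + ", valid = " + valid(x));
  }
}
```

In the real Raindrop Method the repair step searches for the “next best value” of the closest violating component; this toy collapses that search to a single deterministic correction, which is enough to show the radiating-wave behavior.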
26.7.1 General Information on Raindrop Method
26.7.1.1 Applications and Examples
Table 26.3: Applications and Examples of Raindrop Method.
Area References
Combinatorial Problems [290, 3083]
26.7.1.2 Books
1. NatureInspired Algorithms for Optimisation [562]
26.7.1.3 Conferences and Workshops
Table 26.4: Conferences and Workshops on Raindrop Method.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
26.7.1.4 Journals
1. Journal of Heuristics published by Springer Netherlands
Tasks T26
64. Use the implementation of Hill Climbing given in Listing 56.6 on page 735 (or an
equivalent own one as in Task 65) in order to ﬁnd approximations of the global minima
of
f1(x) = 0.5 − (1/(2n)) · Σ_{i=1}^{n} cos(ln(1 + |x_i|)) (26.1)

f2(x) = 0.5 − 0.5 · cos(ln(1 + Σ_{i=1}^{n} |x_i|)) (26.2)

f3(x) = 0.01 · max_{i=1}^{n} x_i² (26.3)

in the real interval [−10, 10]^n.
(a) Run ten experiments for n = 2, n = 5, and n = 10 each. Note all parameter
settings of the algorithm you used (the number of algorithm steps, operators,
etc.), the results, the vectors discovered, and their objective values. [20 points]
(b) Compute the median (see Section 53.2.6 on page 665), the arithmetic mean (see
Section 53.2.2, especially Deﬁnition D53.32 on page 661) and the standard devi
ation (see Section 53.2.3, especially Section 53.2.3.2 on page 663) of the achieved
objective values in the ten runs. [10 points]
(c) The three objective functions are designed so that their value is always in [0, 1]
in the interval [−10, 10]^n. Try to order the optimization problems according to
the median of the achieved objective values. Which problems seem to be the
hardest? What is the influence of the dimension n on the achieved objective
values? [10 points]
(d) If you compare f1 and f2, you find striking similarities. Does one of the functions
seem to be harder? If so, which. . . and what, in your opinion, makes it harder?
[5 points]

[45 points]
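For checking an own implementation against Task 64, the three functions of Equations 26.1 to 26.3 can be transcribed directly to Java (the class and method names are ours); all three evaluate to 0 at the origin, their common global minimum in [−10, 10]^n.

```java
/** The three benchmark functions of Task 64, transcribed from Equations 26.1 to 26.3. */
public class Task64Functions {

  /** f1(x) = 0.5 - (1/(2n)) * sum_i cos(ln(1 + |x_i|)) */
  static double f1(double[] x) {
    double s = 0;
    for (double v : x) s += Math.cos(Math.log(1 + Math.abs(v)));
    return 0.5 - s / (2 * x.length);
  }

  /** f2(x) = 0.5 - 0.5 * cos(ln(1 + sum_i |x_i|)) */
  static double f2(double[] x) {
    double s = 0;
    for (double v : x) s += Math.abs(v);
    return 0.5 - 0.5 * Math.cos(Math.log(1 + s));
  }

  /** f3(x) = 0.01 * max_i x_i^2 */
  static double f3(double[] x) {
    double m = 0;
    for (double v : x) m = Math.max(m, v * v);
    return 0.01 * m;
  }

  public static void main(String[] args) {
    double[] zero = new double[5];
    System.out.println(f1(zero) + " " + f2(zero) + " " + f3(zero)); // all three are 0 at the origin
  }
}
```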
65. Implement the Hill Climbing procedure Algorithm 26.1 on page 230 in a programming
language diﬀerent from Java. You can specialize your implementation for only solving
continuous numerical problems, i. e., you do not need to follow the general, interface-based
approach we adhere to in Listing 56.6 on page 735. Apply your implementation
to the Sphere function (see Section 50.3.1.1 on page 581 and Listing 57.1 on page 859)
for n = 10 dimensions in order to test its functionality.
[20 points]
66. Extend the Hill Climbing algorithm implementation given in Listing 56.6 on page 735
to Hill Climbing with restarts, i. e., to a Hill Climbing which restarts the search
procedure every n steps (while preserving the results of the optimization process in an
additional variable). You may use the discussion in Section 26.5 on page 232 as
reference. However, you do not need to provide the multi-objective optimization capabilities
of Algorithm 26.3 on page 233. Instead, only extend the single-objective Hill Climbing
method in Listing 56.6 on page 735 or provide a completely new implementation of
Hill Climbing with restarts in a programming language of your choice. Test your
implementation on one of the previously discussed benchmark functions for real-valued,
continuous optimization (see, e.g., Task 64).
[25 points]
67. Let us now solve a modified version of the bin packing problem from Example E1.1
on page 25. In Table 26.5, we give four datasets from Data Set 1 in [2422]. Each
dataset stands for one bin packing problem and provides the bin size b, the number
n of objects to be put into the bins, and the sizes a_i of the objects (with i ∈ 1..n).
The problem space is the set of all possible assignments of objects to bins, where the
number k of bins is not provided. For each of the four bin packing problems, find
a candidate solution x which contains an assignment of objects to as few as possible
bins of the given size b. In other words: For each problem, we know the size b of
the bin, the number n of objects, and the object sizes a. We want to find out how
many bins we need at least to store all the objects (k ∈ N1, the objective, subject to
minimization) and how we can store the objects in these bins (x ∈ X).
Name [2422] | b | n | a_i | k⋆

N1C1W1 A | 100 | 50 | [50, 100, 99, 99, 96, 96, 92, 92, 91, 88, 87, 86, 85, 76, 74, 72, 69, 67, 67, 62, 61, 56, 52, 51, 49, 46, 44, 42, 40, 40, 33, 33, 30, 30, 29, 28, 28, 27, 25, 24, 23, 22, 21, 20, 17, 14, 13, 11, 10, 7, 7, 3] | 25

N1C2W2 D | 120 | 50 | [50, 120, 99, 98, 98, 97, 96, 90, 88, 86, 82, 82, 80, 79, 76, 76, 76, 74, 69, 67, 66, 64, 62, 59, 55, 52, 51, 51, 50, 49, 44, 43, 41, 41, 41, 41, 41, 37, 35, 33, 32, 32, 31, 31, 31, 30, 29, 23, 23, 22, 20, 20] | 24

N2C1W1 A | 100 | 100 | [100, 100, 99, 97, 95, 95, 94, 92, 91, 89, 86, 86, 85, 84, 80, 80, 80, 80, 80, 79, 76, 76, 75, 74, 73, 71, 71, 69, 65, 64, 64, 64, 63, 63, 62, 60, 59, 58, 57, 54, 53, 52, 51, 50, 48, 48, 48, 46, 44, 43, 43, 43, 43, 42, 41, 40, 40, 39, 38, 38, 38, 38, 37, 37, 37, 37, 36, 35, 34, 33, 32, 30, 29, 28, 26, 26, 26, 24, 23, 22, 21, 21, 19, 18, 17, 16, 16, 15, 14, 13, 12, 12, 11, 9, 9, 8, 8, 7, 6, 6, 5, 1] | 48

N2C3W2 C | 150 | 100 | [100, 150, 100, 99, 99, 98, 97, 97, 97, 96, 96, 95, 95, 95, 94, 93, 93, 93, 92, 91, 89, 88, 87, 86, 84, 84, 83, 83, 82, 81, 81, 81, 78, 78, 75, 74, 73, 72, 72, 71, 70, 68, 67, 66, 65, 64, 63, 63, 62, 60, 60, 59, 59, 58, 57, 56, 56, 55, 54, 51, 49, 49, 48, 47, 47, 46, 45, 45, 45, 45, 44, 44, 44, 44, 43, 41, 41, 40, 39, 39, 39, 37, 37, 37, 35, 35, 34, 32, 31, 31, 30, 28, 26, 25, 24, 24, 23, 23, 22, 21, 20, 20] | 41

Table 26.5: Bin Packing test data from [2422], Data Set 1.
Solve this optimization problem with Hill Climbing. It is not the goal to solve it to
optimality, but to get solutions with Hill Climbing which are as good as possible. For
the sake of completeness and in order to allow comparisons, we provide the lowest
known number k
⋆
of bins for the problems given by Scholl and Klein [2422] in the last
column of Table 26.5.
In order to solve this optimization problem, you may proceed as discussed in Chapter 7
on page 109, except that you have to use Hill Climbing. Below, we give some
suggestions – but you do not necessarily need to follow them:
(a) A possible problem space X for n objects would be the set of all permutations
of the first n natural numbers (X = Π(1..n)). A candidate solution x ∈ X, i. e.,
such a permutation, could describe the order in which the objects have to be put
into bins. If a bin is filled to its capacity (or the current object does not fit into
the bin anymore), a new bin is needed. This process is repeated until all objects
have been packed into bins.
(b) For the objective function, there exist many diﬀerent choices.
i. As objective function f1, the number k(x) of bins required for storing the
objects according to a permutation x ∈ X could be used. This value can
easily be computed as discussed in our outline of the problem space X above.
You can find an example implementation of this function in Listing 57.10 on
page 892.
ii. Alternatively, we could also sum up the squares of the free space left over in
each bin as f2. It is easy to see that if we put each object into a single bin,
the free capacity left over is maximal. If there was a way to perfectly fill each
bin with objects, the free capacity would become 0. Such a configuration
would also require the least number of bins, since using fewer bins would
mean not storing all objects. By summing up the squares of the free spaces,
we punish bins with lots of free space more severely. In other words, we
reward solutions where the bins are filled equally. Naturally, one would “feel”
that such solutions are “good”. However, this approach may prevent us
from finding the optimum if it is a solution where the objects are distributed
unevenly. We implement this objective function as Listing 57.11 on page 893.
iii. Also, a combination of the above would be possible, such as
f3(x) = f1(x) − 1/(1 + f2(x)), for example. An example implementation of
this function is given as Listing 57.12 on page 894.
iv. Then again, we could consider the total number of bins required but
additionally try to minimize the space lastWaste(x) left over in the last bin,
i. e., set f4(x) = f1(x) + 1/(1 + lastWaste(x)). The idea here would be that
the dominating factor of the objective should still be the number of bins
required. But, if two solutions require the same number of bins, we prefer the
one where more space is free in the last bin. If this waste in the last bin is
increased more and more, this means that the volume of objects in that bin
is reduced. It may eventually become 0, which would equal leaving the bin
empty. The empty bin would not be used and hence, we would be saving one
bin. An implementation of such a function for the suggested problem space
is given at Listing 57.13 on page 896.
v. Finally, instead of only considering the remaining empty space in the last
bin, we could consider the maximum empty space over all bins. The idea
is pretty similar: If we have a bin which is almost empty, maybe we can
make it disappear with just a few moves (object swaps). Hence, we could
prefer solutions where there is one bin with much free space. In order to find
such solutions, we go over all the bins required by the solution and check how
much space is free in them. We then take the one bin with the most free space
mf and add this space as a factor to the bin count in the objective function,
similar to what we did in the previous function: f5(x) = f1(x) + 1/(1 + mf(x)).
An implementation of this function for the suggested problem space is given
at Listing 57.14 on page 897.
(c) If the suggested problem space is used, it could also be used as search space,
i. e., G = X. Then, a simple unary search operation (mutation) would swap the
elements of the permutation such as those provided in Listing 56.38 on page 806
and Listing 56.39. A creation operation could draw a permutation according to
the uniform distribution as shown in Listing 56.37.
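To make the decoding in suggestion (a) and the objective f1 concrete, here is a minimal Java sketch. The class and method names are ours, chosen for this illustration; this is not the book's Listing 57.10, but it implements the same idea: walk through the objects in permutation order and open a new bin whenever the current object does not fit.

```java
// Decode a permutation x of object indices into bins of capacity b and
// count the bins needed: f1(x) = k(x), the number of bins, subject to
// minimization. (A sketch of suggestion (a) and objective i above.)
public class BinPackingObjective {

  /** Number of bins needed when objects are packed in the order given by x. */
  static int f1(int[] x, int[] size, int b) {
    int bins = 1;      // at least one bin is always needed
    int free = b;      // free space in the bin currently being filled
    for (int i : x) {
      if (size[i] > free) {   // object i does not fit: open a new bin
        bins++;
        free = b;
      }
      free -= size[i];
    }
    return bins;
  }

  public static void main(String[] args) {
    int[] size = {4, 3, 3, 2};          // hypothetical object sizes
    // identity permutation packs as [4], [3], [3, 2] with b = 5
    System.out.println(f1(new int[]{0, 1, 2, 3}, size, 5)); // prints 3
    // a different order needs more bins: the permutation matters
    System.out.println(f1(new int[]{3, 0, 1, 2}, size, 5)); // prints 4
  }
}
```

The second call shows why the permutation is a sensible search space: reordering the same objects changes the number of bins, which is exactly what the unary search operation in suggestion (c) exploits.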
Use the Hill Climbing Algorithm 26.1 given in Listing 56.6 on page 735 for solving the
task and compare its results with the results produced by equivalent random walks
and random sampling (see Chapter 8 and Section 56.1.2 on page 729). For your
experiments, you may use the program given in Listing 57.15 on page 899.
(a) For each algorithm or objective function you test, perform at least 30 independent
runs to get reliable results. (Remember: Our Hill Climbing algorithm is randomized
and can produce different results in each run.)
(b) For each of the four problems, write down the best assignment of objects to bins
that you have found (along with the number of bins required). You may add your
own ideas as well.
(c) Discuss which of the objective functions and algorithm or algorithm conﬁguration
performed best and give reasons.
(d) Also provide all your source code.
[40 points]
Chapter 27
Simulated Annealing
27.1 Introduction
In 1953, Metropolis et al. [1874] developed a Monte Carlo method for “calculating the
properties of any substance which may be considered as composed of interacting individual
molecules”. With this new “Metropolis” procedure stemming from statistical mechanics,
the manner in which metal crystals reconfigure and reach equilibria in the process of
annealing can be simulated, for example. This inspired Kirkpatrick et al. [1541] to develop
the Simulated Annealing¹ (SA) algorithm for Global Optimization in the early 1980s and
to apply it to various combinatorial optimization problems. Independently around the same
time, Černý [516] employed a similar approach to the Traveling Salesman Problem. Even
earlier, Jacobs et al. [1426, 1427] used a similar method to optimize program code. Simulated
Annealing is a general-purpose optimization algorithm that can be applied to arbitrary
search and problem spaces. Like simple Hill Climbing, Simulated Annealing only needs a
single initial individual as starting point and a unary search operation and thus belongs to
the family of local search algorithms.
27.1.1 Metropolis Simulation
In metallurgy and material science, annealing² [1307] is a heat treatment of material
with the goal of altering properties such as its hardness. Metal crystals have small defects,
dislocations of atoms which weaken the overall structure as sketched in Figure 27.1. It
should be noted that Figure 27.1 is a very rough sketch and the following discussion is very
rough and simplified, too.³ In order to decrease the defects, the metal is heated to, for
instance, 0.4 times its melting temperature, i. e., well below its melting point but usually
already a few hundred degrees Celsius. Adding heat means adding energy to the atoms in the metal.
Due to the added energy, the electrons are freed from the atoms in the metal (which now
become ions). Also, the higher energy of the ions makes them move and increases their
diﬀusion rate. The ions may now leave their current position and assume better positions.
During this process, they may also temporarily reside in conﬁgurations of higher energy, for
example if two ions get close to each other. The movement decreases as the temperature
of the metal sinks. The structure of the crystal is reformed as the material cools down and
¹ http://en.wikipedia.org/wiki/Simulated_annealing [accessed 2007-07-03]
² http://en.wikipedia.org/wiki/Annealing_(metallurgy) [accessed 2008-09-19]
³ I assume that a physicist or chemist would beat me for these simplifications. If you are a physicist or
chemist and have suggestions for improving/correcting the discussion, please drop me an email.
Figure 27.1: A sketch of annealing in metallurgy as role model for Simulated Annealing.
(The figure shows the physical role model of Simulated Annealing: a metal structure with
defects, the increased movement of the atoms, and finally the ions in a low-energy structure
with fewer defects; the temperature falls from T = getTemperature(1) towards
getTemperature(∞) = 0 K.)
approaches its equilibrium state. During this process, the dislocations in the metal crystal
can disappear. When annealing metal, the initial temperature must not be too low and the
cooling must be done sufficiently slowly so as to avoid stopping the ions too early and to
prevent the system from getting stuck in a metastable non-crystalline state representing a
local minimum of energy.
In the Metropolis simulation of the annealing process, each set of positions pos of all
atoms of a system is weighted by its Boltzmann probability factor e^(−E(pos)/(kB∗T)),
where E(pos) is the energy of the configuration pos, T is the temperature measured in
Kelvin, and kB is Boltzmann's constant⁴:

kB ≈ 1.380 650 524 · 10^−23 J/K    (27.1)
The Metropolis procedure is a model of this physical process which could be used to simulate
a collection of atoms in thermodynamic equilibrium at a given temperature. In each
iteration, a new nearby geometry pos_(i+1) is generated as a random displacement from
the current geometry pos_i of the atoms. The energy of the resulting new geometry is
computed and ∆E, the energetic difference between the current and the new geometry, is
determined. The probability P(∆E) that this new geometry is accepted is defined in
Equation 27.3.

∆E = E(pos_(i+1)) − E(pos_i)    (27.2)

P(∆E) = { e^(−∆E/(kB∗T))  if ∆E > 0
        { 1                otherwise    (27.3)
Systems in chemistry and physics always “strive” to attain low-energy states, i. e., the lower
the energy, the more stable is the system. Thus, if the new nearby geometry has a lower
energy level, the change of atom positions is accepted in the simulation. Otherwise, a
uniformly distributed random number r = randomUni[0, 1) is drawn and the step will only
be realized if r is less than or equal to the Boltzmann probability factor, i. e., r ≤ P(∆E).
At high temperatures T, this factor is very close to 1, leading to the acceptance of many
uphill steps.

⁴ http://en.wikipedia.org/wiki/Boltzmann%27s_constant [accessed 2007-07-03]
As the temperature falls, the proportion of accepted steps which would increase the energy
level decreases. Now the system will not escape local regions anymore and (hopefully) comes
to rest in the global energy minimum at temperature T = 0 K.
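The acceptance rule of Equation 27.3 can be sketched in a few lines of Java. As an assumption for this sketch, we already drop Boltzmann's constant kB here, as the optimization variant of the rule does (the reasons are discussed in Section 27.2.1); the class and method names are ours.

```java
import java.util.Random;

// The Metropolis acceptance rule of Equation 27.3 (with k_B dropped, as
// in the Simulated Annealing algorithm): downhill steps are always
// accepted, uphill steps with the Boltzmann probability e^(-deltaE/T).
public class MetropolisRule {

  /** Acceptance probability P(deltaE) at temperature t. */
  static double acceptanceProbability(double deltaE, double t) {
    if (deltaE <= 0) return 1.0;   // downhill: always accept
    if (t <= 0) return 0.0;        // at T = 0 no uphill step is accepted
    return Math.exp(-deltaE / t);  // uphill: Boltzmann factor
  }

  /** One acceptance decision: draw r = randomUni[0,1), accept iff r < P(deltaE). */
  static boolean accept(double deltaE, double t, Random r) {
    return r.nextDouble() < acceptanceProbability(deltaE, t);
  }

  public static void main(String[] args) {
    // high temperature: an uphill step of 1 is accepted almost surely
    System.out.println(acceptanceProbability(1.0, 100.0)); // e^(-0.01), ~0.99
    // low temperature: the same step is almost never accepted
    System.out.println(acceptanceProbability(1.0, 0.1));   // e^(-10), ~0.000045
  }
}
```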
27.2 Algorithm
The abstraction of the Metropolis simulation method in order to allow arbitrary problem
spaces is straightforward – the energy computation E(pos_i) is replaced by an objective
function f or maybe even by the result v of a fitness assignment process. Algorithm 27.1
illustrates the basic course of Simulated Annealing which you can find implemented in
Listing 56.8 on page 740. Without loss of generality, we reuse the definitions from
Evolutionary Algorithms for the search operations and set Op = {create, mutate}.
Algorithm 27.1: x ←− simulatedAnnealing(f)

  Input: f: the objective function to be minimized
  Data: p_new: the newly generated individual
  Data: p_cur: the point currently investigated in problem space
  Data: T: the temperature of the system which is decreased over time
  Data: t: the current time index
  Data: ∆E: the energy (objective value) difference of p_cur.x and p_new.x
  Output: x: the best element found

 1 begin
 2   p_cur.g ←− create()
 3   p_cur.x ←− gpm(p_cur.g)
 4   x ←− p_cur.x
 5   t ←− 1
 6   while ¬terminationCriterion() do
 7     T ←− getTemperature(t)
 8     p_new.g ←− mutate(p_cur.g)
 9     p_new.x ←− gpm(p_new.g)
10     ∆E ←− f(p_new.x) − f(p_cur.x)
11     if ∆E ≤ 0 then
12       p_cur ←− p_new
13       if f(p_cur.x) < f(x) then x ←− p_cur.x
14     else
15       if randomUni[0, 1) < e^(−∆E/T) then p_cur ←− p_new
16     t ←− t + 1
17   return x
Let us discuss in a very simplified way how this algorithm works. During the course of
the optimization process, the temperature T decreases (see Section 27.3). If the temperature
is high in the beginning, this means that, different from a Hill Climbing process, Simulated
Annealing also accepts many worse individuals. Better candidate solutions are always
accepted. In summary, the chance that the search will move to a new candidate solution,
regardless of whether it is better or worse, is high. In its behavior, Simulated Annealing is
then a bit similar to a random walk. As the temperature decreases, the chance that worse
individuals are accepted decreases. If T approaches 0, this probability becomes 0 as well
and Simulated Annealing becomes a pure Hill Climbing algorithm. The rationale of having
an algorithm that initially behaves like a random walk and then becomes a Hill Climber is
to make use of the benefits of the two extreme cases:
A random walk is a maximally-explorative and minimally-exploitive algorithm. It can
never get stuck at a local optimum and will explore all areas in the search space with
the same probability. Yet, it cannot exploit a good solution, i. e., it is unable to make small
modifications to an individual in a targeted way in order to check if there are better solutions
nearby.
A Hill Climber, on the other hand, is minimally-explorative and maximally-exploitive. It
always tries to modify and improve on the current solution and never investigates far-away
areas of the search space.
Simulated Annealing combines both properties: It performs a lot of exploration in order
to discover interesting areas of the search space. The hope is that, during this time, it will
discover the basin of attraction of the global optimum. The more the optimization process
becomes like a Hill Climber, the less likely will it move away from a good solution to a worse
one; instead, it will concentrate on exploiting the solution.
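Algorithm 27.1 can be sketched compactly in Java for the common case where the search space equals the problem space, i. e., the genotype-phenotype mapping is the identity. This is our own minimal illustration, not the book's Listing 56.8; the interface types, parameter names, and the example cooling schedule are assumptions made for this sketch.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

// A minimal sketch of Algorithm 27.1 for G = X (no separate
// genotype-phenotype mapping) with a fixed iteration budget.
public class SimulatedAnnealingSketch {

  /** Runs the annealing loop for maxSteps iterations, returns the best x found. */
  public static <X> X optimize(ToDoubleFunction<X> f, X start,
                               UnaryOperator<X> mutate,
                               DoubleUnaryOperator getTemperature,
                               int maxSteps, Random random) {
    X cur = start;                          // p_cur.x
    double fCur = f.applyAsDouble(cur);
    X best = cur;                           // x, the best element found
    double fBest = fCur;
    for (int t = 1; t <= maxSteps; t++) {
      double temperature = getTemperature.applyAsDouble(t);
      X next = mutate.apply(cur);           // p_new.x
      double fNext = f.applyAsDouble(next);
      double deltaE = fNext - fCur;         // Line 10 of Algorithm 27.1
      if (deltaE <= 0
          || random.nextDouble() < Math.exp(-deltaE / temperature)) {
        cur = next;                         // downhill always, uphill sometimes
        fCur = fNext;
        if (fCur < fBest) { best = cur; fBest = fCur; }
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Toy problem: minimize f(x) = x^2 over the integers,
    // mutation adds -1 or +1, cooling follows a simple 10/t schedule.
    Random r = new Random(42);
    int best = optimize((Integer x) -> (double) x * x, 50,
        x -> x + (r.nextBoolean() ? 1 : -1),
        t -> 10.0 / t,
        10000, r);
    System.out.println(best);               // ends at or very near 0
  }
}
```

Note that the loop tracks the best element x separately from the current point p_cur, exactly as Lines 12 and 13 of Algorithm 27.1 do: acceptance of a worse point never loses the best solution found so far.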
27.2.1 No Boltzmann’s Constant
Notice the difference between Line 15 in Algorithm 27.1 and Equation 27.3 on page 244:
Boltzmann's constant has been removed. The reason is that with Simulated Annealing, we
do not want to simulate a physical process but instead aim to solve an optimization problem.
The nominal value of Boltzmann's constant, given in Equation 27.1 (something around
1.4 · 10^−23), has quite a heavy impact on the Boltzmann probability. In a combinatorial
problem, where the differences in the objective values between two candidate solutions are
often discrete, this becomes obvious: In order to achieve an acceptance probability of 0.001
for a solution which is 1 worse than the best known one, one would need a temperature T
of:
P = 0.001 = e^(−1/(kB∗T))    (27.4)
ln 0.001 = −1/(kB∗T)    (27.5)
−1/ln 0.001 = kB∗T    (27.6)
−1/(kB ∗ ln 0.001) = T    (27.7)
T ≈ 1.05 · 10^22 K    (27.8)
Making such a setting, i. e., using a nominal value of the scale of 1 · 10^22 or more, for such
a problem is not obvious and hard to justify. Leaving the constant kB away leads to

P = 0.001 = e^(−1/T)    (27.9)
ln 0.001 = −1/T    (27.10)
−1/ln 0.001 = T    (27.11)
T ≈ 0.145    (27.12)
In other words, a temperature of around 0.15 would be sufficient to achieve an acceptance
probability of around 0.001. Such a setting is much easier to grasp and allows the algorithm
user to relate objective values, objective value differences, and probabilities in a more
straightforward way. Since “physical correctness” is not a goal in optimization (while
finding good solutions in an easy way is), kB is thus eliminated from the computations in the
Simulated Annealing algorithm.
If you take a close look at Equation 27.8 and Equation 27.12, you will spot another
difference: Equation 27.8 contains the unit K whereas Equation 27.12 does not. The reason
is that the absence of kB in Equation 27.12 removes the physical context and hence, using
the unit Kelvin is no longer appropriate. Technically, we should also no longer speak about
temperature, but for the sake of simplicity and compliance with the literature, we will keep
that name during our discussion of Simulated Annealing.
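The calculation in Equations 27.9 to 27.12 generalizes directly: solving P = e^(−∆E/T) for T gives T = −∆E/ln P. A small helper (the names are ours, chosen for this sketch) makes the relation between acceptance probabilities and temperatures easy to explore:

```java
// Solving P = e^(-deltaE/T) for T, as done in Equations 27.9 to 27.12:
// T = -deltaE / ln P, valid for deltaE > 0 and 0 < P < 1.
public class AcceptanceTemperature {

  /** Temperature at which an uphill step of deltaE is accepted with probability p. */
  static double temperatureFor(double deltaE, double p) {
    return -deltaE / Math.log(p);
  }

  public static void main(String[] args) {
    // reproduces Equation 27.12: deltaE = 1, P = 0.001 gives T ~ 0.145
    System.out.println(temperatureFor(1.0, 0.001));
  }
}
```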
27.2.2 Convergence
It has been shown that Simulated Annealing algorithms with appropriate cooling strategies
will asymptotically converge to the global optimum [1403, 2043]. The probability that
the individual p
cur
as used in Algorithm 27.1 represents a global optimum approaches 1
for t → ∞. This feature is closely related to the property ergodicity
5
of an optimization
algorithm:
Deﬁnition D27.1 (Ergodicity). An optimization algorithm is ergodic if its result x ∈ X
(or its set of results X ⊆ X) does not depend on the point(s) in the search space at which
the search process is started.
Nolte and Schrader [2043] and van Laarhoven and Aarts [2776] provide lists of the most
important works on the issue of guaranteed convergence to the global optimum. Nolte and
Schrader [2043] further list the research providing deterministic, non-infinite boundaries
for the asymptotic convergence by Anily and Federgruen [111], Gidas [1054], Nolte and
Schrader [2042], and Mitra et al. [1917]. In the same paper, they introduce a significantly
lower bound. They conclude that, in general, p_cur.x in the Simulated Annealing process is
a globally optimal candidate solution with a high probability after a number of iterations
which exceeds the cardinality of the problem space. This is a boundary which is, well, slow
from the viewpoint of a practitioner [1403]: In other words, it would be faster to enumerate
all possible candidate solutions in order to find the global optimum with certainty than
to apply Simulated Annealing. Indeed, this makes sense: If this was not the case, Simulated
Annealing could be used to solve NP-complete problems in sub-exponential time. . .
This does not mean that Simulated Annealing is generally slow. It only needs that much
time if we insist on the convergence to the global optimum. Speeding up the cooling process
will result in a faster search, but voids the guaranteed convergence on the other hand. Such
sped-up algorithms are called Simulated Quenching (SQ) [1058, 1402, 2405].
Quenching⁶, too, is a process in metallurgy. Here, a work piece is first heated and then
cooled rapidly, often by dipping it into a water bath. This way, steel objects can be hardened,
for example.
27.3 Temperature Scheduling
Definition D27.2 (Temperature Schedule). The temperature schedule defines how the
temperature parameter T in the Simulated Annealing process (as defined in Algorithm 27.1)
is set. The operator getTemperature : N1 → R+ maps the current iteration index t to a
(positive) real temperature value T.

getTemperature(t) ∈ R+ ∀t ∈ N1    (27.13)

⁵ http://en.wikipedia.org/wiki/Ergodicity [accessed 2010-09-26]
⁶ http://en.wikipedia.org/wiki/Quenching [accessed 2010-10-11]
Equation 27.13 is provided as a Java interface in Listing 55.12 on page 717. As already
mentioned, the temperature schedule has major influence on the probability that the
Simulated Annealing algorithm will find the global optimum. For “getTemperature”, a few
general rules hold. All schedules start with a temperature Ts which is greater than zero.

getTemperature(1) = Ts > 0    (27.14)

If the number of iterations t approaches infinity, the temperature should approach 0. If the
temperature sinks too fast, the ergodicity of the search may be lost and the global optimum
may not be discovered. In order to avoid that the resulting Simulated Quenching algorithm
gets stuck at a local optimum, random restarting can be applied, which has already been
discussed in the context of Hill Climbing in Section 26.5 on page 232.

lim_(t→+∞) getTemperature(t) = 0    (27.15)
There exists a wide range of methods to determine this temperature schedule. Miki et al.
[1896], for example, used Genetic Algorithms for this purpose. The most established methods
are listed below and implemented in Paragraph 56.1.3.2.1 and some examples of them are
sketched in Figure 27.2.

Figure 27.2: Different temperature schedules for Simulated Annealing. (The figure plots the
temperature over t = 0..100 on a scale from 0 to Ts for linear scaling, i. e., polynomial with
α = 1, polynomial scaling with α = 2, exponential scaling with ǫ = 0.05 and ǫ = 0.025, and
logarithmic scaling.)
27.3.1 Logarithmic Scheduling
Scale the temperature logarithmically, where Ts is set to a value larger than the greatest
difference of the objective value of a local minimum and its best neighboring candidate
solution. This value may, of course, be estimated since it usually is not known [1167, 2043].
An implementation of this method is given in Listing 56.9 on page 743.

getTemperature_ln(t) = { Ts          if t < e
                       { Ts / ln t   otherwise    (27.16)
27.3.2 Exponential Scheduling
Reduce T to (1 − ǫ)T in each step (or every m steps, see Equation 27.19), where the exact
value of 0 < ǫ < 1 is determined by experiment [2808]. Such an exponential cooling strategy
will turn Simulated Annealing into Simulated Quenching [1403]. This strategy is implemented
in Listing 56.10 on page 744.

getTemperature_exp(t) = (1 − ǫ)^t ∗ Ts    (27.17)
27.3.3 Polynomial and Linear Scheduling
T is polynomially scaled between Ts and 0 for t̆ iterations according to Equation 27.18 [2808].
α is a constant, maybe 1, 2, or 4, that depends on the positions and objective values of the
local minima. Large values of α will spend more iterations at lower temperatures. For α = 1,
the schedule proceeds in linear steps.

getTemperature_poly(t) = (1 − t/t̆)^α ∗ Ts    (27.18)
27.3.4 Adaptive Scheduling
Instead of basing the temperature solely on the iteration index t, other information may be
used for self-adaptation as well. After every m moves, the temperature T can be set to β
times ∆E_c = f(p_cur.x) − f(x), where β is an experimentally determined constant, f(p_cur.x)
is the objective value of the currently examined candidate solution p_cur.x, and f(x) is the
objective value of the best phenotype x found so far. Since ∆E_c may be 0, we limit the
temperature change to a maximum of T ∗ γ with 0 < γ < 1. [2808]
27.3.5 Larger Step Widths
If the temperature reduction should not take place continuously but only every m ∈ N1
iterations, t′ computed according to Equation 27.19 can be used in the previous formulas,
i. e., getTemperature(t′) instead of getTemperature(t).

t′ = m⌊t/m⌋    (27.19)
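The schedules of Equations 27.16 to 27.19 are small enough to sketch together in one place. The book provides them as Java classes in Paragraph 56.1.3.2.1; the following compact version with our own method names illustrates the formulas directly:

```java
// The temperature schedules of Equations 27.16 to 27.19 as plain functions
// of the iteration index t and the start temperature ts (a sketch; the
// book implements them as classes against its getTemperature interface).
public class TemperatureSchedules {

  /** Equation 27.16: logarithmic schedule (clamped to ts for t < e). */
  static double logarithmic(int t, double ts) {
    return t < Math.E ? ts : ts / Math.log(t);
  }

  /** Equation 27.17: exponential schedule with 0 < epsilon < 1. */
  static double exponential(int t, double ts, double epsilon) {
    return Math.pow(1.0 - epsilon, t) * ts;
  }

  /** Equation 27.18: polynomial schedule over tMax iterations, exponent alpha. */
  static double polynomial(int t, double ts, double alpha, int tMax) {
    return Math.pow(1.0 - (double) t / tMax, alpha) * ts;
  }

  /** Equation 27.19: t' = m * floor(t/m), reduce only every m iterations. */
  static int stepped(int t, int m) {
    return m * (t / m);
  }

  public static void main(String[] args) {
    double ts = 100.0;                                 // start temperature T_s
    System.out.println(logarithmic(2, ts));            // t < e: still T_s
    System.out.println(exponential(10, ts, 0.05));     // 0.95^10 * T_s
    System.out.println(polynomial(50, ts, 1.0, 100));  // linear: half of T_s
  }
}
```

Plugging `stepped(t, m)` into any of the other three functions realizes the larger step widths of Section 27.3.5.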
27.4 General Information on Simulated Annealing
27.4.1 Applications and Examples
Table 27.1: Applications and Examples of Simulated Annealing.
Area References
Combinatorial Problems [190, 308, 344, 382, 516, 667, 771, 1403, 1454, 1455, 1486,
1541, 1550, 1896, 2043, 2105, 2171, 2640, 2770, 2901, 2966]
Computer Graphics [433, 1043, 1403, 2707]
Data Mining [1403]
Databases [771, 2171]
Distributed Algorithms and Systems [616, 1926, 2058, 2068, 2631, 2632, 2901, 2994, 3002]
Economics and Finances [1060, 1403]
Engineering [437, 1058, 1059, 1403, 1486, 1541, 1550, 1926, 2058, 2250,
2405, 3002]
Function Optimization [1018, 1071, 2180, 2486]
Graph Theory [63, 344, 616, 1454, 1455, 1486, 1926, 2058, 2068, 2237, 2631,
2632, 2793, 3002]
Logistics [190, 382, 516, 667, 1403, 1550, 1896, 2043, 2770, 2966]
Mathematics [111, 227, 1043, 1674, 2180]
Military and Defense [1403]
Physics [1403]
Security [1486]
Software [1486, 1550, 2901, 2994]
Wireless Communication [63, 154, 1486, 2237, 2250, 2793]
27.4.2 Books
1. Genetic Algorithms and Simulated Annealing [694]
2. Simulated Annealing: Theory and Applications [2776]
3. Electromagnetic Optimization by Genetic Algorithms [2250]
4. Facts, Conjectures, and Improvements for Simulated Annealing [2369]
5. Simulated Annealing for VLSI Design [2968]
6. Introduction to Stochastic Search and Optimization [2561]
7. Simulated Annealing [2966]
27.4.3 Conferences and Workshops
Table 27.2: Conferences and Workshops on Simulated Annealing.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
KAEiOG: Krajowa Konferencja Algorytmy Ewolucyjne i Optymalizacja Globalna (National Conference on Evolutionary Algorithms and Global Optimization)
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
27.4.4 Journals
1. Journal of Heuristics published by Springer Netherlands
Tasks T27
68. Perform the same experiments as in Task 64 on page 238 with Simulated Annealing.
Therefore, use the Java Simulated Annealing implementation given in Listing 56.8 on
page 740 or develop an equivalent own implementation.
[45 points]
69. Compare your results from Task 64 on page 238 and Task 68. What are the diﬀerences?
What are the similarities? Did you use diﬀerent experimental settings? If so, what
happens if you use the same experimental settings for both tasks? Try to give reasons
for your results.
[10 points]
70. Perform the same experiments as in Task 67 on page 238 with Simulated Annealing and
Simulated Quenching. Therefore, use the Java Simulated Annealing implementation
given in Listing 56.8 on page 740 or develop an equivalent own implementation.
[25 points]
71. Compare your results from Task 67 on page 238 and Task 70. What are the diﬀerences?
What are the similarities? Did you use diﬀerent experimental settings? If so, what
happens if you use the same experimental settings for both tasks? Try to give reasons
for your results. Put your explanations into the context of the discussions provided
in Example E17.4 on page 181.
[10 points]
72. Try solving the bin packing problem again, this time by using the representation given
by Khuri, Schütz, and Heitkötter in [1534]. Use both Simulated Annealing and Hill
Climbing to evaluate the performance of the new representation, similar to what we
do in Section 57.1.2.1 on page 887. Compare the results with the results listed in that
section and obtained by you in Task 70. Which method is better? Provide reasons for
your observation.
[50 points]
73. Compute the probabilities that a new solution with ∆E1 = 0, ∆E2 = 0.001, ∆E3 = 0.01,
∆E4 = 0.1, ∆E5 = 1.0, ∆E6 = 10.0, and ∆E7 = 100 is accepted by the Simulated
Annealing algorithm at the temperatures T1 = 0, T2 = 1, T3 = 10, and T4 = 100. Use
the same formulas used in Algorithm 27.1 (consider the discussion in Section 27.2.1).
[15 points]
74. Compute the temperatures needed so that solutions with ∆E1 = 0, ∆E2 = 0.001,
∆E3 = 0.01, ∆E4 = 0.1, ∆E5 = 1.0, ∆E6 = 10.0, and ∆E7 = 100 are accepted by the
Simulated Annealing algorithm with the probabilities P1 = 0, P2 = 0.1, P3 = 0.25, and
P4 = 0.5. Base your calculations on the formulas used in Algorithm 27.1 and consider
the discussion in Section 27.2.1.
[15 points]
Chapter 28
Evolutionary Algorithms
28.1 Introduction
Definition D28.1 (Evolutionary Algorithm). Evolutionary Algorithms¹ (EAs) are
population-based metaheuristic optimization algorithms which use mechanisms inspired by
biological evolution, such as mutation, crossover, natural selection, and survival of the
fittest, in order to refine a set of candidate solutions iteratively in a cycle. [167, 171, 172]
Evolutionary Algorithms have not been invented in a canonical form. Instead, they are a
large family of different optimization methods which have, to a certain degree, been
developed independently by different researchers. Information on their historic development
is thus largely distributed amongst the chapters of this book which deal with the specific
members of this family of algorithms.
Like other metaheuristics, all EAs share the “black box” optimization character and
make only few assumptions about the underlying objective functions. Their consistent and
high performance in many different problem domains (see, e.g., Table 28.4 on page 314) and
the appealing idea of copying natural evolution for optimization made them one of the most
popular metaheuristics.
As in most metaheuristic optimization algorithms, we can distinguish between single-
objective and multi-objective variants. The latter means that multiple, possibly conflicting
criteria are optimized as discussed in Section 3.3. The general area of Evolutionary
Computation that deals with multi-objective optimization is called EMOO, evolutionary
multi-objective optimization, and the multi-objective EAs are called, well, multi-objective
Evolutionary Algorithms.
Definition D28.2 (Multi-Objective Evolutionary Algorithm). A multi-objective Evo-
lutionary Algorithm (MOEA) is an Evolutionary Algorithm suitable to optimize multiple
criteria at once [595, 596, 737, 740, 954, 1964, 2783].
In this section, we will first outline the basic cycle in which Evolutionary Algorithms proceed
in Section 28.1.1. EAs are inspired by natural evolution, so we will compare the natural
paragons of the single steps of this cycle in detail in Section 28.1.2. Later in this chapter,
we will outline possible algorithms which can be used for the processes in EAs and finally
give some information on conferences, journals, and books on EAs as well as their
applications.

¹ http://en.wikipedia.org/wiki/Artificial_evolution [accessed 2007-07-03]
28.1.1 The Basic Cycle of EAs
All evolutionary algorithms proceed in principle according to the scheme illustrated in
Figure 28.1, which is a specialization of the basic cycle of metaheuristics given in Figure 1.8
on page 35. We define the basic EA for single-objective optimization in Algorithm 28.1 and
the multi-objective version in Algorithm 28.2.

Figure 28.1: The basic cycle of Evolutionary Algorithms. (The cycle consists of: Initial
Population – create an initial population of random individuals; GPM – apply the genotype-
phenotype mapping and obtain the phenotypes; Evaluation – compute the objective values
of the candidate solutions; Fitness Assignment – use the objective values to determine fitness
values; Selection – select the fittest individuals for reproduction; Reproduction – create new
individuals from the mating pool by crossover and mutation.)
1. In the ﬁrst generation (t = 1), a population pop of ps individuals p is created. The
function createPop(ps) produces this population. It depends on its implementation
whether the individuals p have random genomes p.g (the usual case) or whether the
population is seeded (see Paragraph 28.1.2.2.2).
2. The genotypes p.g are translated to phenotypes p.x (see Section 28.1.2.3). This
genotype-phenotype mapping may either be performed directly on all individuals
(performGPM) in the population or can be a part of the process of evaluating the
objective functions.
3. The values of the objective functions f ∈ f are evaluated for each candidate solution
p.x in pop (and usually stored in the individual records) with the operation
“computeObjectives”. This evaluation may incorporate complicated simulations and
calculations. In single-objective Evolutionary Algorithms, f = {f} and, hence, |f| = 1
holds.
4. With the objective functions, the utilities of the different features of the candidate
solutions p.x have been determined and a scalar fitness value v(p) can now be assigned
to each of them (see Section 28.1.2.5). In single-objective Evolutionary Algorithms,
v ≡ f holds.
5. A subsequent selection process selection(pop, v, mps) (or selection(pop, f, mps) if v ≡ f,
as is the case in single-objective Evolutionary Algorithms) filters out the candidate
solutions with bad fitness and allows those with good fitness to enter the mating pool
with a higher probability (see Section 28.1.2.6).
6. In the reproduction phase, oﬀspring are derived from the genotypes p.g of the selected
individuals p ∈ mate by applying the search operations searchOp ∈ Op (which are
28.1. INTRODUCTION 255
called reproduction operations in the context of EAs, see Section 28.1.2.7). There usually are two different reproduction operations: mutation, which modifies one genotype, and crossover, which combines two genotypes into a new one. Whether the whole population is replaced by the offspring or whether they are integrated into the population, as well as which individuals are recombined with each other, depends on the implementation of the operation "reproducePop". We highlight the transition between the generations in Algorithms 28.1 and 28.2 by incrementing the generation counter t and annotating the populations with it, i. e., pop(t).
7. If the terminationCriterion() is met, the evolution stops here. Otherwise, the algorithm continues at Point 2.
Algorithm 28.1: X ←− simpleEA(f, ps, mps)
Input: f: the objective/ﬁtness function
Input: ps: the population size
Input: mps: the mating pool size
Data: t: the generation counter
Data: pop: the population
Data: mate: the mating pool
Data: continue: a Boolean variable used for termination detection
Output: X: the set of the best elements found
1 begin
2 t ←− 1
3 pop(t = 1) ←− createPop(ps)
4 continue ←− true
5 while continue do
6 pop(t) ←− performGPM(pop(t) , gpm)
7 pop(t) ←− computeObjectives(pop(t) , f)
8 if terminationCriterion() then
9 continue ←− false
10 else
11 mate ←− selection(pop(t) , f, mps)
12 t ←− t + 1
13 pop(t) ←− reproducePop(mate, ps)
14 return extractPhenotypes(extractBest(pop(t) , f))
Algorithm 28.1 is strongly simplified. The termination criterion terminationCriterion(), for example, is usually invoked after each and every evaluation of the objective functions. Often, we have a limit in terms of objective function evaluations (see, for instance, Section 6.3.3). Then, it is not sufficient to check terminationCriterion() only after each generation, since a generation involves computing the objective values of multiple individuals. A more precise version of the basic Evolutionary Algorithm scheme (with steady-state population treatment) is specified in Listing 56.11 on page 746.
Also, the return value(s) of an Evolutionary Algorithm often is not the best individual(s) from the last generation, but the best candidate solution(s) discovered so far. This, however, requires maintaining an archive, which can incur considerable additional effort, especially in the case of multi-objective or multi-modal optimization.
Algorithm 28.2: X ←− simpleMOEA(OptProb, ps, mps)
  Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
  Input: ps: the population size
  Input: mps: the mating pool size
  Data: t: the generation counter
  Data: pop: the population
  Data: mate: the mating pool
  Data: v: the fitness function resulting from the fitness assigning process
  Data: continue: a Boolean variable used for termination detection
  Output: X: the set of the best elements found
 1 begin
 2   t ←− 1
 3   pop(t = 1) ←− createPop(ps)
 4   continue ←− true
 5   while continue do
 6     pop(t) ←− performGPM(pop(t), gpm)
 7     pop(t) ←− computeObjectives(pop(t), f)
 8     if terminationCriterion() then
 9       continue ←− false
10     else
11       v ←− assignFitness(pop(t), cmp_f)
12       mate ←− selection(pop, v, mps)
13       t ←− t + 1
14       pop(t) ←− reproducePop(mate, ps)
15 return extractPhenotypes(extractBest(pop, cmp_f))

An example implementation of Algorithm 28.2 is given in Listing 56.12 on page 750.

28.1.2 Biological and Artificial Evolution

The concepts of Evolutionary Algorithms are strongly related to the way in which the development of species in nature could proceed. We will outline the historic ideas behind evolution and put the steps of a multi-objective Evolutionary Algorithm into this context.
28.1.2.1 The Basic Principles from Nature

In 1859, Darwin [684] published his book "On the Origin of Species"², in which he identified the principles of natural selection and survival of the fittest as driving forces behind biological evolution. His theory can be condensed into ten observations and deductions [684, 1845, 2938]:
1. The individuals of a species possess great fertility and produce more oﬀspring than
can grow into adulthood.
2. Under the absence of external inﬂuences (like natural disasters, human beings, etc.),
the population size of a species roughly remains constant.
3. Again, if no external inﬂuences occur, the food resources are limited but stable over
time.
4. Since the individuals compete for these limited resources, a struggle for survival ensues.
5. Especially in sexually reproducing species, no two individuals are equal.
6. Some of the variations between the individuals will aﬀect their ﬁtness and hence, their
ability to survive.
7. Some of these variations are inheritable.
8. Individuals which are less fit are also less likely to reproduce. The fittest individuals are the most likely to survive and produce offspring.
9. Individuals that survive and reproduce will likely pass on their traits to their oﬀspring.
10. A species will slowly change and adapt more and more to a given environment during
this process which may ﬁnally even result in new species.
These historic ideas themselves remain valid to a large degree even today. However, some of Darwin's specific conclusions can be challenged based on data and methods which simply were not available at his time. Point 10, the assumption that change happens slowly over time, for instance, became controversial as fossils indicate that evolution seems to be "at rest" for longer times which alternate with short periods of sudden, fast development [473, 2778] (see Section 41.1.2 on page 493).
28.1.2.2 Initialization
28.1.2.2.1 Biology
Until now, it is not clear how life started on earth. It is assumed that before the first organisms emerged, an abiogenesis³, a chemical evolution, took place in which the chemical components of life formed. This process was likely divided into three steps: (1) In the first step, simple organic molecules such as alcohols, acids, purines⁴ (such as adenine and guanine), and pyrimidines⁵ (such as thymine, cytosine, and uracil) were synthesized from inorganic substances. (2) Then, simple organic molecules were combined into the basic components of complex organic molecules such as monosaccharides, amino acids⁶, pyrroles, fatty acids, and nucleotides⁷. (3) Finally, complex organic molecules emerged. Many different theories exist on how this could have taken place [176, 1300, 1698, 1713, 1837, 2084, 2091, 2432]. An important milestone in the research in this area is likely the Miller-Urey experiment⁸ [1904, 1905], in which it was shown that amino acids can form from inorganic substances under conditions such as those expected to have been present in the early time of the earth. The first single-celled life most probably occurred in the Eoarchean⁹, the earth age 3.8 billion years ago, marking the onset of evolution.

² http://en.wikipedia.org/wiki/The_Origin_of_Species [accessed 2007-07-03]
³ http://en.wikipedia.org/wiki/Abiogenesis [accessed 2009-08-05]
⁴ http://en.wikipedia.org/wiki/Purine [accessed 2009-08-05]
⁵ http://en.wikipedia.org/wiki/Pyrimidine [accessed 2009-08-05]
28.1.2.2.2 Evolutionary Algorithms
Usually, when an Evolutionary Algorithm starts up, there exists no information about what is good or what is bad. Basically, only random genes are coupled together as individuals in the initial population pop(t = 0). This form of initialization can be considered to be similar to the early steps of chemical and biological evolution, where it was unclear which substances or primitive life forms would be successful and nature experimented with many different reactions and concepts.
Besides random initialization, Evolutionary Algorithms can also be seeded [1126, 1139,
1471]. If seeding is applied, a set of individuals which already are adapted to the optimization
criteria is inserted into the initial population. Sometimes, the entire ﬁrst population consists
of such individuals with good characteristics only. The candidate solutions used for this may
either have been obtained manually or by a previous optimization step [1737, 2019].
The goal of seeding is to achieve faster convergence and better results in the evolutionary process. However, it also raises the danger of premature convergence (see Chapter 13 on page 151), since the already-adapted candidate solutions will likely be much more competitive than the random ones and thus, may wipe them out of the population. Care must thus be taken in order to achieve the wanted performance improvement.
28.1.2.3 GenotypePhenotype Mapping
28.1.2.3.1 Biology
TODO
Example E28.1 (Evolution of Fish).
In the following, let us discuss the evolution on the example of ﬁsh. We can consider a ﬁsh as
an instance of its heredity information, as one possible realization of the blueprint encoded
in its DNA.
28.1.2.3.2 Evolutionary Algorithms
At the beginning of every generation in an EA, each genotype p.g is instantiated as a new phenotype p.x = gpm(p.g). Genotype-phenotype mappings are discussed in detail in Section 28.2.
⁶ http://en.wikipedia.org/wiki/Amino_acid [accessed 2009-08-05]
⁷ http://en.wikipedia.org/wiki/Nucleotide [accessed 2009-08-05]
⁸ http://en.wikipedia.org/wiki/MillerUrey_experiment [accessed 2009-08-05]
⁹ http://en.wikipedia.org/wiki/Eoarchean [accessed 2007-07-03]
28.1.2.4 Objective Values
28.1.2.4.1 Biology
There are diﬀerent criteria which determine whether an organism can survive in its environ
ment. Often, individuals are good in some characteristics but rarely reach perfection in all
possible aspects.
Example E28.2 (Evolution of Fish (Example E28.1) Cont.).
The survival of the genes of the fish depends on how well the fish performs in the ocean.
Its ﬁtness, however, is not only determined by one single feature of the phenotype like its
size. Although a bigger ﬁsh will have better chances to survive, size alone does not help if
it is too slow to catch any prey. Also its energy consumption in terms of calories per time
unit which are necessary to sustain its life functions should be low so it does not need to eat
all the time. Other factors inﬂuencing the ﬁtness positively are formae like sharp teeth and
colors that blend into the environment so it cannot be seen too easily by its predators such
as sharks. If its camouﬂage is too good on the other hand, how will it ﬁnd potential mating
partners? And if it is big, it will usually have higher energy consumption. So there may be
conﬂicts between the desired properties.
28.1.2.4.2 Evolutionary Algorithms
To sum it up, we could consider the life of the fish as the evaluation process of its genotype in an environment where good qualities in one aspect can turn out as drawbacks from other perspectives. In multi-objective Evolutionary Algorithms, this is exactly the same. For each problem that we want to solve, we can specify one or multiple so-called objective functions f ∈ f. An objective function f represents one feature that we are interested in.
Example E28.3 (Evolution of a Car).
Let us assume that we want to evolve a car (a pretty weird assumption, but let's stick with it). The genotype p.g ∈ G would be the construction plan and the phenotype p.x ∈ X the real car, or at least a simulation of it. One objective function f_a would definitely be safety. For the sake of our children and their children, the car should also be environment-friendly, so that's our second objective function f_b. Furthermore, a cheap price f_c, fast speed f_d, and a cool design f_e would be good. That makes five objective functions of which, for example, the second and the fourth are contradictory (f_b ≁ f_d).
28.1.2.5 Fitness
28.1.2.5.1 Biology
As stated before, diﬀerent characteristics determine the ﬁtness of an individual. The biolog
ical ﬁtness is a scalar value that describes the number of oﬀspring of an individual [2542].
Example E28.4 (Evolution of Fish (Example E28.1) Cont.).
In Example E28.1, the fish genome was instantiated, and its physical properties were put to the test in several different categories in Example E28.2. Whether one specific fish is fit or not, however, is still not easy to say: fitness is always relative; it depends on the environment. The fitness of the fish is not only defined by its own characteristics but also results from the features of the other fish in the population (as well as from the abilities of its prey and predators). If one fish can beat another one in all categories, i. e., is bigger, stronger, smarter, and so on, it is better adapted to its environment and thus, will have a higher chance to survive. This relation is transitive but only forms a partial order, since a fish which is strong but not very clever and a fish that is clever but not strong may have the same probability to reproduce and hence, are not directly comparable.
It is not clear whether a weak ﬁsh with a clever behavioral pattern is worse or better
than a really strong but less cunning one. Both traits are useful in the evolutionary process
and maybe, one ﬁsh of the ﬁrst kind will sometimes mate with one of the latter and produce
an oﬀspring which is both, intelligent and powerful.
Example E28.5 (Computer Science and Sports).
Fitness depends on the environment. In my department, I may be considered to be of average fitness. If I took a stroll to the department of sports science, however, my relative physical suitability would decrease. Yet, my relative computer science skills would become much better.
28.1.2.5.2 Evolutionary Algorithms
The ﬁtness assignment process in multiobjective Evolutionary Algorithms computes one
scalar ﬁtness value for each individual p in a population pop by building a ﬁtness function
v (see Deﬁnition D5.1 on page 94). Instead of being an a posteriori measure of the number
of oﬀspring of an individual, it rather describes the relative quality of a candidate solution,
i. e., its ability to produce oﬀspring [634, 2987].
Optimization criteria may contradict each other, so the basic situation in optimization very much corresponds to the one in biology. One of the most popular methods for computing the fitness is called Pareto ranking¹⁰. It performs comparisons similar to the ones we just discussed. Some popular fitness assignment processes are given in Section 28.3.
28.1.2.6 Selection
28.1.2.6.1 Biology
Selection in biology has many different facets [2945]. Let us again assume fitness to be a measure for the expected number of offspring, as done in Paragraph 28.1.2.5.1. How fit an individual is then does not necessarily directly determine whether it can produce offspring and, if so, how many. There are two more aspects which have influence on this: the environment (including the rest of the population) and the potential mating partners. These factors are the driving forces behind the three selection steps in natural selection¹¹: (1) environmental selection¹² (also survival selection or ecological selection), which determines whether an individual survives to reach fertility [684], (2) sexual selection, i. e., the choice of the mating partner(s) [657, 2300], and (3) parental selection, the possible choices of the parents against pregnancy or newborn [2300].
¹⁰ Pareto comparisons are discussed in Section 3.3.5 on page 65 and Pareto ranking is introduced in Section 28.3.3.
¹¹ http://en.wikipedia.org/wiki/Natural_selection [accessed 2010-08-03]
¹² http://en.wikipedia.org/wiki/Ecological_selection [accessed 2010-08-03]
Example E28.6 (Evolution of Fish (Example E28.1) Cont.).
An intelligent fish may be eaten by a shark, a strong one can die from a disease, and even the most perfect fish could die from a lightning strike before ever reproducing. The process of selection is always stochastic, without guarantees – even a fish that is small, slow, and lacks any sophisticated behavior might survive and could produce even more offspring than a highly fit one. The degree to which a fish is adapted to its environment therefore can be considered as some sort of basic probability which determines its expected chances to survive environmental selection. This is the form of selection propagated by Darwin [684] in his seminal book "On the Origin of Species".
The sexual or mating selection describes the fact that survival alone does not guarantee reproduction: At least for sexually reproducing species, two individuals are involved in this process. Hence, even a fish which is perfectly adapted to its environment will have low chances to create offspring if it is unable to attract female interest. Sexual selection is considered to be responsible for much of nature's beauty, such as the colorful wings of a butterfly or the tail of a peacock [657, 2300].
Example E28.7 (Computer Science and Sports (Example E28.5) cnt.).
For today's noble computer scientist, survival selection is not the threatening factor. Furthermore, our relative fitness is on par with the fitness of the guys from the sports department, at least in terms of the trade-off between theoretical and physical abilities. Yet, surprisingly, sexual selection is less friendly to some of us. Indeed, parallels can be drawn between human behavior and that of peacocks, which prefer mating partners with colorful feathers over others, regardless of their further qualities.
The third selection factor, parental selection, may lead to examples of reinforcement of certain physical characteristics which are much more extreme than Example E28.7. Rich Harris [2300] argues that after the fertilization, i. e., before or after birth, parents may make a choice for or against the life of their offspring. If the offspring does or is likely to exhibit certain unwanted traits, this decision may be negative. A negative decision may also be caused by environmental or economic considerations. It may, however, also be overruled after birth if the offspring turns out to possess especially favorable characteristics. Rich Harris [2300], for instance, argues that parents in prehistoric European and Asian cultures may in this way have actively selected for hairlessness and pale skin.
28.1.2.6.2 Evolutionary Algorithms
In Evolutionary Algorithms, survival selection is always applied in order to steer the optimization process towards good areas in the search space. From time to time, a subsequent sexual selection is used as well. For survival selection, the so-called selection algorithms mate = selection(pop, v, mps) determine the mps individuals from the population pop which should enter the mating pool mate, based on their scalar fitness v. selection(pop, v, mps) usually picks the fittest individuals and places them into mate with the highest probability. The oldest selection scheme is called Roulette wheel selection and will be introduced in detail in Section 28.4.3 on page 290. In the original version of this algorithm (intended for fitness maximization), the chance of an individual p to reproduce is proportional to its fitness v(p) in relation to the total fitness of all individuals in pop. More information on selection algorithms is given in Section 28.4.
28.1.2.7 Reproduction
28.1.2.7.1 Biology
Last but not least, there is the reproduction. For this purpose, two approaches exist in
nature: the sexual and the asexual method.
Example E28.8 (Evolution of Fish (Example E28.1) Cont.).
Fish reproduce sexually. Whenever a female ﬁsh and a male ﬁsh mate, their genes will
be recombined by crossover. Furthermore, mutations may take place which slightly alter the DNA. Most often, the mutations affect the characteristics of the resulting larvae only slightly [2303]. Since fit fish produce offspring with higher probability, there is a good
chance that the next generation will contain at least some individuals that have combined
good traits from their parents and perform even better than them.
Asexually reproducing species basically create clones of themselves, a process during which, again, mutations may take place. Regardless of how they were produced, the new generation of offspring now enters the struggle for survival and the cycle of evolution starts another round.
28.1.2.7.2 Evolutionary Algorithms
In Evolutionary Algorithms, individuals usually do not have a "gender". Yet, the concept of recombination, i. e., of binary search operations, is one of the central ideas of this metaheuristic. Each individual from the mating pool can potentially be recombined with every other one. In the car example, this means that we could copy the design of the engine of one car and place it into the construction plan of the body of another car.
The concept of mutation is present in EAs as well. Parts of the genotype (and hence,
properties of the phenotype) can randomly be changed with a unary search operation. This
way, new construction plans for new cars are generated. The common deﬁnitions for repro
duction operations in Evolutionary Algorithms are given in Section 28.5.
28.1.2.8 Does the Natural Paragon Fit?
At this point it should be mentioned that the direct reference to Darwinian evolution in Evolutionary Algorithms is somewhat controversial. The strongest argument against this reference is that Evolutionary Algorithms introduce a change in semantics by being goal-driven [2772] while the natural evolution is not.
Paterson [2134] further points out that “neither GAs [Genetic Algorithms] nor GP [Ge
netic Programming] are concerned with the evolution of new species, nor do they use natural
selection." On the other hand, nobody would claim that the idea of selection has not been borrowed from nature, although many additions and modifications have been introduced in favor of better algorithmic performance.
An argument against close correspondence between EAs and natural evolution concerns
the development of diﬀerent species. It depends on the deﬁnition of species: According to
Wikipedia – The Free Encyclopedia [2938], “a species is a class of organisms which are very
similar in many aspects such as appearance, physiology, and genetics." In principle, there is some elbow room for us and we may indeed consider different solutions to a single problem in Evolutionary Algorithms as members of different species – especially if the binary search operation crossover/recombination applied to their genomes cannot produce another valid candidate solution.
In Genetic Algorithms, the crossover and mutation operations may be considered to be
strongly simpliﬁed models of their natural counterparts. In many advanced EAs there may
be, however, operations which work entirely diﬀerently. Diﬀerential Evolution, for instance,
features a ternary search operation (see Chapter 33). Another example is [2912, 2914], where
28.1. INTRODUCTION 263
the search operations directly modify the phenotypes (G = X) according to heuristics and
intelligent decision.
Furthermore, although the concept of fitness¹³ in nature is controversial [2542], it is often
considered as an a posteriori measurement. It then deﬁnes the ratio between the numbers
of occurrences of a genotype in a population after and before selection or the number of
oﬀspring an individual has in relation to the number of oﬀspring of another individual. In
Evolutionary Algorithms, ﬁtness is an a priori quantity denoting a value that determines
the expected number of instances of a genotype that should survive the selection process.
However, one could conclude that biological fitness is just an approximation of the a priori quantity, which has arisen due to the hardness (if not impossibility) of measuring it directly.
My personal opinion (which may as well be wrong) is that the citation of Darwin here is
well motivated since there are close parallels between Darwinian evolution and Evolutionary
Algorithms. Nevertheless, natural and artiﬁcial evolution are still two diﬀerent things and
phenomena observed in either of the two do not necessarily carry over to the other.
28.1.2.9 Formae and Implicit Parallelism
Let us review our introductory ﬁsh example in terms of forma analysis. Fish can, for instance,
be characterized by the properties “clever” and “strong”. For the sake of simplicity, let us
assume that both properties may be true or false for a single individual. They hence
define two formae each. A third property can be the color, for which many different possible variations exist. Some of them may be good in terms of camouflage, others may be good in terms of finding mating partners. Now a fish can be clever and strong at the same time, as well as weak and green. Here, a single living fish allows nature to evaluate the utility of at least three different formae.
This issue has ﬁrst been stated by Holland [1250] for Genetic Algorithms and is termed
implicit parallelism (or intrinsic parallelism). Since then, it has been studied by many different researchers [284, 1128, 1133, 2827]. If the search space and the genotype-phenotype mapping are properly designed, implicit parallelism in conjunction with the crossover/recombination operations is one of the reasons why Evolutionary Algorithms are such a successful class of optimization algorithms. The Schema Theorem discussed in Section 29.5.3 on page 342 gives a further explanation for this form of parallel evaluation.
28.1.3 Historical Classiﬁcation
The family of Evolutionary Algorithms historically encompasses ﬁve members, as illustrated
in Figure 28.2. We will only enumerate them here in short. Indepth discussions will follow
in the corresponding chapters.
1. Genetic Algorithms (GAs) are introduced in Chapter 29 on page 325. GAs subsume all Evolutionary Algorithms which have string-based search spaces G in which they mainly perform combinatorial search.
2. The set of Evolutionary Algorithms which explore the space of real vectors X ⊆ Rⁿ, often using more advanced mathematical operations, is called Evolution Strategies (ES, see Chapter 30 on page 359). Closely related is Differential Evolution, a numerical optimizer using a ternary search operation.
¹³ http://en.wikipedia.org/wiki/Fitness_(biology) [accessed 2008-08-10]
3. For Genetic Programming (GP), which will be elaborated on in Chapter 31 on page 379, we can provide two definitions: On one hand, GP includes all Evolutionary Algorithms that grow programs, algorithms, and the like. On the other hand, all EAs that evolve tree-shaped individuals are also instances of Genetic Programming.
4. Learning Classiﬁer Systems (LCS), discussed in Chapter 35 on page 457, are learning
approaches that assign output values to given input values. They often internally use
a Genetic Algorithm to ﬁnd new rules for this mapping.
5. Evolutionary Programming (EP, see Chapter 32 on page 413) approaches usually fall into the same category as Evolution Strategies: they optimize vectors of real numbers. Historically, such approaches have also been used to derive algorithm-like structures such as finite state machines.
Figure 28.2: The family of Evolutionary Algorithms. (It comprises Genetic Algorithms, Evolution Strategies with the closely related Differential Evolution, Genetic Programming with its flavors SGP, LGP, and GGGP, Evolutionary Programming, and Learning Classifier Systems.)
The early research [718] in Genetic Algorithms (see Section 29.1 on page 325), Genetic Programming (see Section 31.1.1 on page 379), and Evolutionary Programming (see ?? on page ??) dates back to the 1950s and 60s. Besides the pioneering work listed in these sections, at least one other important early contribution should not go unmentioned here: the Evolutionary Operation (EVOP) approach introduced by Box and Draper [376, 377] in the late 1950s. The idea of EVOP was to apply a continuous and systematic scheme of small changes in the control variables of a process. The effects of these modifications are evaluated and the process is slowly shifted into the direction of improvement. This idea was never realized as a computer algorithm, but Spendley et al. [2572] used it as the basis for their simplex method, which then served as progenitor of the Downhill Simplex algorithm¹⁴ by Nelder and Mead [2017] [718, 1719]. Satterthwaite's REVOP [2407, 2408], a randomized Evolutionary Operation approach, however, was rejected at this time [718].
As in Section 1.3 on page 31, we have classified the different algorithms here according to their semantics, in other words, corresponding to their specific search and problem spaces and the search operations [634]. All five major approaches can be realized with the basic scheme defined in Algorithm 28.1. To this simple structure, there exist many general improvements and extensions. Often, these do not directly concern the search or problem spaces.
¹⁴ We discuss Nelder and Mead's Downhill Simplex [2017] optimization method in Chapter 43 on page 501.
Then, they can also be applied to all members of the EA family alike. Later in this chapter, we will discuss the major components of many of today's most efficient Evolutionary Algorithms [593]. Some of the distinctive features of these EAs are:
1. The population size and the number of populations used.
2. The method of selecting the individuals for reproduction.
3. The way the oﬀspring is included into the population(s).
28.1.4 Populations in Evolutionary Algorithms
Evolutionary Algorithms try to simulate processes known from biological evolution in order to solve optimization problems. Species in biology consist of many individuals which live together in a population. The individuals in a population compete with but also mate with each other, generation for generation. In EAs, it is quite similar: the population pop ∈ (G × X)^ps is a set of ps individuals p ∈ G × X, where each individual p possesses heredity information (its genotype p.g) and a physical representation (the phenotype p.x).
There exist various ways in which an Evolutionary Algorithm can process its population. Especially interesting is how the population pop(t + 1) of the next generation is formed from (1) the individuals selected from pop(t), which are collected in the mating pool mate, and (2) the new offspring derived from them.
28.1.4.1 Extinctive/Generational EAs
If the new population only contains the offspring, we speak of extinctive selection [2015, 2478]. Extinctive selection can be compared with ecosystems of small single-cell organisms (protozoa¹⁵) which reproduce in an asexual, fissiparous¹⁶ manner, i. e., by cell division. In this case, of course, the elders will not be present in the next generation. Other comparisons can partly be drawn to sexually reproducing octopi, where the female dies after protecting the eggs until the larvae hatch, or to the black widow spider, where the female devours the male after the insemination. Especially in the area of Genetic Algorithms, extinctive strategies are also known as generational algorithms.
Definition D28.3 (Generational). In Evolutionary Algorithms that are generational [2226], the next generation will only contain the offspring of the current one and no parent individuals will be preserved.
Extinctive Evolutionary Algorithms can further be divided into left and right selection [2987]. In left extinctive selection, the best individuals are not allowed to reproduce in order to prevent premature convergence of the optimization process. Conversely, the worst individuals are not permitted to breed in right extinctive selection schemes in order to reduce the selective pressure, since they would otherwise scatter the fitness too much [2987].
28.1.4.2 Preservative EAs
In algorithms that apply a preservative selection scheme, the population is a combination of
the previous population and its oﬀspring [169, 1461, 2335, 2772]. The biological metaphor for
such algorithms is that the lifespan of many organisms exceeds a single generation. Hence,
parent and child individuals compete with each other for survival.
Even in preservative strategies, it is not guaranteed that the best individuals will always
survive, unless truncation selection (as introduced in Section 28.4.2 on page 289) is applied.
In general, most selection methods are randomized. Even if they pick the best candidate
solutions with the highest probabilities, they may also select worse individuals.
^15 http://en.wikipedia.org/wiki/Protozoa [accessed 2008-03-12]
^16 http://en.wikipedia.org/wiki/Binary_fission [accessed 2008-03-12]
266 28 EVOLUTIONARY ALGORITHMS
28.1.4.2.1 Steady-State Evolutionary Algorithms
Steady-state Evolutionary Algorithms [519, 2041, 2270, 2316, 2641], abbreviated by SSEA,
are preservative Evolutionary Algorithms which usually do not allow inferior individuals to
replace better ones in the population. Therefore, the binary search operator (crossover/recombination)
is often applied exactly once per generation. The new offspring then replaces
the worst member of the population if and only if it has better fitness.
Steady-state Evolutionary Algorithms often produce better results than generational
EAs. Chafekar et al. [519], for example, introduce steady-state Evolutionary Algorithms
that are able to outperform generational NSGA-II (which you can find summarized in ??
on page ??) for some difficult problems. In experiments of Jones and Soule [1463] (primarily
focused on other issues), steady-state algorithms showed better convergence behavior in a
multimodal landscape. Similar results have been reported by Chevreux [558] in the context
of molecule design optimization. The steady-state GENITOR has shown good behavior in
an early comparative study by Goldberg and Deb [1079]. On the other hand, with a badly
configured steady-state approach, we also run the risk of premature convergence. In our own
experiments on a real-world Vehicle Routing Problem discussed in Section 49.3 on page 536,
for instance, steady-state algorithms outperformed generational ones – but only if sharing
was applied [2912, 2914].
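The core of such a steady-state replacement step can be sketched as follows. This is a minimal illustration, not the book's actual framework code: individuals are reduced to their (minimized) fitness values, and the class and method names are chosen for this example.

```java
// Sketch of one steady-state step: a single offspring replaces the
// worst population member if and only if it has better (lower) fitness.
public class SteadyStateStep {

    // returns true if the offspring entered the population
    public static boolean tryReplaceWorst(double[] pop, double offspringFitness) {
        int worst = 0;
        for (int i = 1; i < pop.length; i++) {
            if (pop[i] > pop[worst]) worst = i; // fitness is minimized
        }
        if (offspringFitness < pop[worst]) {
            pop[worst] = offspringFitness;
            return true;
        }
        return false; // an inferior offspring never replaces a better individual
    }

    public static void main(String[] args) {
        double[] pop = { 2.0, 5.0, 3.0 };
        System.out.println(tryReplaceWorst(pop, 4.0)); // true: replaces 5.0
        System.out.println(tryReplaceWorst(pop, 9.0)); // false: worse than all
    }
}
```

Because an offspring only ever displaces the current worst individual, the best individuals found so far can never be lost, which is exactly the preservative property described above.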
28.1.4.3 The µ-λ Notation
The µ-λ notation has been developed to describe the treatment of parent and offspring
individuals in Evolution Strategies (see Chapter 30) [169, 1243, 2437]. It is now often used
in other Evolutionary Algorithms as well. Usually, this notation implies truncation selection,
i. e., the deterministic selection of the µ best individuals from a population. Then,
1. λ denotes the number of offspring (to be) created and
2. µ is the number of parent individuals, i. e., the size of the mating pool (mps = µ).
The population adaptation strategies listed below have partly been borrowed from the German
Wikipedia – The Free Encyclopedia [2938] site for Evolution Strategy^17:
28.1.4.3.1 (µ + λ)-EAs
Using the reproduction operations, λ offspring are created from the µ parent individuals.
From the joint set of offspring and parents, i. e., a temporary population of size ps =
λ + µ, only the µ fittest ones are kept and the λ worst ones are discarded [302, 1244]. (µ + λ)
strategies are thus preservative. The naming (µ + λ) implies that only unary reproduction
operators are used. However, in practice, this part of the convention is often ignored [302].
28.1.4.3.2 (µ, λ)-EAs
For extinctive selection patterns, the (µ, λ)-notation is used, as introduced by Schwefel
[2436]. EAs named according to this pattern create λ ≥ µ child individuals from the µ
available genotypes. From these, they only keep the µ best individuals and discard the µ
parents as well as the λ − µ worst children [295, 2436]. Here, λ corresponds to the population
size ps.
Again, this notation stands for algorithms which employ no crossover operator but mutation
only. Yet, many works on (µ, λ) Evolution Strategies still apply recombination [302].
28.1.4.3.3 (1 + 1)-EAs
The EA only keeps track of a single individual which is reproduced. The elder and the offspring
together form the population pop, i. e., ps = 2. From these two, only the better individual
will survive and enter the mating pool mate (mps = 1). This scheme basically is equivalent
to the Hill Climbing introduced in Chapter 26 on page 229.
^17 http://en.wikipedia.org/wiki/Evolutionsstrategie [accessed 2007-07-03]
28.1.4.3.4 (1, 1)-EAs
The EA creates a single oﬀspring from a single parent. The oﬀspring then replaces its parent.
This is equivalent to a random walk and hence, does not optimize very well, see Section 8.2
on page 114.
28.1.4.3.5 (µ + 1)-EAs
Here, the population contains µ individuals from which one is drawn randomly. This
individual is reproduced. Alternatively, two parents can be chosen and recombined [302, 2278].
From the joint set of its offspring and the current population, the least fit individual is
removed.
28.1.4.3.6 (µ/ρ, λ)-EAs
The notation (µ/ρ, λ) is a generalization of (µ, λ) used to signify the application of ρ-ary
reproduction operators. In other words, the parameter ρ is added to denote the number of
parent individuals of one offspring. Normally, only unary mutation operators (ρ = 1) are
used in Evolution Strategies, the family of EAs the µ-λ notation stems from. If (binary)
recombination is also applied (as is normal in general Evolutionary Algorithms), ρ = 2 holds. A
special case of (µ/ρ, λ) algorithms is the (µ/µ, λ) Evolution Strategy [1843] which, basically,
is similar to an Estimation of Distribution Algorithm.
28.1.4.3.7 (µ/ρ + λ)-EAs
Analogously to the (µ/ρ, λ)-Evolutionary Algorithms, the (µ/ρ + λ)-Evolution Strategies are
generalized (µ + λ) approaches where ρ denotes the number of parents of an offspring
individual.
28.1.4.3.8 (µ′, λ′(µ, λ)^γ)-EAs
Geyer et al. [1044, 1045, 1046] have developed nested Evolution Strategies where λ′ offspring
are created and isolated for γ generations from a population of the size µ′. In each
of the γ generations, λ children are created from which the fittest µ are passed on to the
next generation. After the γ generations, the best individuals from each of the γ isolated
candidate solutions are propagated back to the top-level population, i. e., selected. Then, the
cycle starts again with λ′ new child individuals. This nested Evolution Strategy can be
more efficient than the other approaches when applied to complex multimodal fitness environments [1046, 2279].
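The difference between the plus and the comma schemes above can be sketched as truncation selection over fitness values. This is a minimal illustration, not the book's framework code: individuals are again reduced to their minimized fitness values, and all names are chosen for this example.

```java
import java.util.Arrays;

// Sketch of the truncation selection implied by the mu-lambda notation:
// (mu+lambda) selects the mu best from parents AND offspring, while
// (mu,lambda) selects the mu best from the offspring alone.
public class MuLambdaSelection {

    // (mu+lambda): preservative -- parents compete with their offspring
    public static double[] plusSelection(double[] parents, double[] offspring, int mu) {
        double[] joint = new double[parents.length + offspring.length];
        System.arraycopy(parents, 0, joint, 0, parents.length);
        System.arraycopy(offspring, 0, joint, parents.length, offspring.length);
        Arrays.sort(joint); // fitness is minimized: best values come first
        return Arrays.copyOf(joint, mu);
    }

    // (mu,lambda): extinctive -- the mu parents are always discarded
    public static double[] commaSelection(double[] offspring, int mu) {
        double[] copy = offspring.clone();
        Arrays.sort(copy);
        return Arrays.copyOf(copy, mu);
    }

    public static void main(String[] args) {
        double[] parents = { 1.0, 9.0 };        // mu = 2
        double[] offspring = { 3.0, 4.0, 8.0 }; // lambda = 3
        // plus selection keeps the good parent with fitness 1.0 ...
        System.out.println(Arrays.toString(plusSelection(parents, offspring, 2)));
        // ... comma selection loses it, although it was the best individual
        System.out.println(Arrays.toString(commaSelection(offspring, 2)));
    }
}
```

The example in main shows the characteristic trade-off: the comma scheme discards the best individual because it is a parent, which can help escape local optima but sacrifices the preservative guarantee of the plus scheme.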
28.1.4.4 Elitism
Deﬁnition D28.4 (Elitism). An elitist Evolutionary Algorithm [595, 711, 1691] ensures
that at least one copy of the best individual(s) of the current generation is propagated on
to the next generation.
The main advantage of elitism is that the algorithm is guaranteed to converge: once the
basin of attraction of the global optimum has been discovered, the Evolutionary Algorithm
will converge to that optimum. On the other hand, the risk of converging to a local optimum
is also higher. Elitism is an additional feature of Global Optimization algorithms – a special
type of preservative strategy – which is often realized by using a secondary population (called
the archive) which only contains the non-prevailed individuals. This population is updated
at the end of each iteration. It should be noted that
Algorithm 28.3: X ←− elitistMOEA(OptProb, ps, as, mps)
Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
Input: ps: the population size
Input: as: the archive size
Input: mps: the mating pool size
Data: t: the generation counter
Data: pop: the population
Data: mate: the mating pool
Data: v: the fitness function resulting from the fitness assignment process
Data: continue: a Boolean variable used for termination detection
Output: X: the set of the best elements found
1 begin
2   t ←− 1
3   archive ←− ∅
4   pop(t = 1) ←− createPop(ps)
5   continue ←− true
6   while continue do
7     pop(t) ←− performGPM(pop(t), gpm)
8     pop(t) ←− computeObjectives(pop(t), f)
9     archive ←− updateOptimalSetN(archive, pop(t), cmp_f)
10    archive ←− pruneOptimalSet(archive, as)
11    if terminationCriterion() then
12      continue ←− false
13    else
14      v ←− assignFitness(pop(t), cmp_f)
15      mate ←− selection(pop(t), v, mps)
16      t ←− t + 1
17      pop(t) ←− reproducePop(mate, ps)
18  return extractPhenotypes(extractBest(archive, cmp_f))
elitism usually does not depend on the fitness values but concentrates on the true objective values of the individuals.
Such an archive-based elitism can also be combined with generational approaches. Algorithm 28.3
is an extension of Algorithm 28.2 on page 256 and the basic evolutionary cycle
given in Section 28.1.1:
1. The archive archive is the set of best individuals found by the algorithm. Initially, it is
the empty set ∅. Subsequently, it is updated with the function “updateOptimalSetN”
which inserts only the new, unprevailed elements from the population into it and also
removes archived individuals which become prevailed by those new optima. Algorithms
that realize such updating are defined in Section 28.6.1 on page 308.
2. If the archive becomes too large – it might theoretically contain uncountably many
individuals – “pruneOptimalSet” reduces it to a proper size, employing techniques such as
clustering in order to preserve the element diversity. More about pruning can be found
in Section 28.6.3 on page 311.
3. You should also notice that both the fitness assignment and selection processes of
elitist Evolutionary Algorithms may take the archive as additional parameter. In
principle, such archive-based algorithms can also be used in non-elitist Evolutionary
Algorithms by simply replacing the parameter archive with ∅.
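An archive update in the spirit of "updateOptimalSetN" can be sketched as follows. This is an illustrative simplification, not the algorithm from Section 28.6.1: individuals are represented as plain objective vectors, and the prevalence relation is replaced by Pareto dominance with all objectives minimized.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a non-dominated archive update: a candidate enters the
// archive only if no archived element prevails it, and archived
// elements prevailed by the newcomer are removed.
public class ArchiveUpdate {

    // true iff a Pareto-dominates b (all objectives minimized)
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int k = 0; k < a.length; k++) {
            if (a[k] > b[k]) return false;
            if (a[k] < b[k]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    public static void updateOptimalSet(List<double[]> archive, double[] candidate) {
        for (double[] archived : archive) {
            if (dominates(archived, candidate)) return; // candidate is prevailed
        }
        // remove archived elements prevailed by the new optimum
        archive.removeIf(archived -> dominates(candidate, archived));
        archive.add(candidate);
    }

    public static void main(String[] args) {
        List<double[]> archive = new ArrayList<>();
        updateOptimalSet(archive, new double[] { 3, 3 });
        updateOptimalSet(archive, new double[] { 1, 5 }); // incomparable: kept
        updateOptimalSet(archive, new double[] { 2, 2 }); // removes {3,3}
        updateOptimalSet(archive, new double[] { 4, 4 }); // prevailed: rejected
        System.out.println(archive.size()); // 2
    }
}
```

After the four updates, only the mutually non-dominated vectors {1, 5} and {2, 2} remain, which mirrors the invariant the archive is supposed to maintain.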
28.1.5 Configuration Parameters of Evolutionary Algorithms
Evolutionary Algorithms are very robust and efficient optimizers which can often provide
good solutions even for hard problems. However, their efficiency is strongly influenced by design
choices and configuration parameters. In Chapter 24, we gave some general rules-of-thumb
for the design of the representation used in an optimizer. Here we want to list the set of
configuration and setup choices available to the user of an EA.
[Figure 28.3 shows the following labeled components: Basic Parameters (population size, mutation and crossover rates); Archive/Pruning; Selection Algorithm; Fitness Assignment Process; Search Space and Search Operations; Genotype-Phenotype Mapping.]
Figure 28.3: The conﬁguration parameters of Evolutionary Algorithms.
Figure 28.3 illustrates these parameters. The performance and success of an evolutionary
optimization approach applied to a problem given by a set of objective functions f and a
problem space X is deﬁned by
1. its basic parameter settings such as the population size ps or the crossover rate cr and
mutation rate mr,
2. whether it uses an archive archive of the best individuals found and, if so, the archive
size as and which pruning technology is used to prevent it from overflowing,
3. the fitness assignment process “assignFitness” and the selection algorithm “selection”,
4. the choice of the search space G and the search operations Op, and, finally,
5. the genotype-phenotype mapping “gpm” connecting the search space and the problem
space.
In ??, we go more into detail on how to state the conﬁguration of an optimization algorithm
in order to fully describe experiments and to make them reproducible.
28.2 Genotype-Phenotype Mappings
In Evolutionary Algorithms as well as in any other optimization method, the search space
G is not necessarily the same as the problem space X. In other words, the elements g
processed by the search operations searchOp ∈ Op may differ from the candidate solutions
x ∈ X which are evaluated by the objective functions f : X → R. In such cases, a mapping
between G and X is necessary: the genotype-phenotype mapping (GPM). In Section 4.3 on
page 85 we gave a precise definition of the term GPM. Generally, we can distinguish four
types of genotype-phenotype mappings:
1. If G = X, the genotype-phenotype mapping is the identity mapping. The search
operations then directly work on the candidate solutions.
2. In many cases, direct mappings, i. e., simple functional transformations which directly
relate parts of the genotypes and phenotypes, are used (see Section 28.2.1).
3. In generative mappings, the phenotypes are built step by step in a process directed by
the genotypes. Here, the genotypes serve as programs which direct the phenotypical
developments (see Section 28.2.2.1).
4. Finally, the fourth type of genotype-phenotype mappings additionally incorporates feedback
from the environment or the objective functions into the developmental process
(see Section 28.2.2.2).
Which kind of genotype-phenotype mapping should be used for a given optimization task
strongly depends on the type of the task at hand. For most small-scale and many large-scale
problems, direct mappings are completely sufficient. However, especially if large-scale,
possibly repetitive or recursive structures are to be created, generative or ontogenic mappings
can be the best choice [777, 2735].
28.2.1 Direct Mappings
Direct mappings are explicit functional relationships between the genotypes and the phenotypes [887].
Usually, each element of the phenotype is related to one element in the
genotype [2735] or to a group of genes. Also, the algorithmic complexity of these mappings
usually is not above O(n log n), where n is the length of a genotype. Based on the
work of Devert [777], we define explicit or direct mappings as follows:
Definition D28.5 (Explicit Representation). An explicit (or direct) representation is
a representation for which a computable function exists that takes as argument the size
of the genotype and gives an upper bound for the number of algorithm steps of the genotype-phenotype
mapping until the phenotype-building process has converged to a stable phenotype.
The algorithm steps mentioned in Definition D28.5 can be, for example, the write operations
to the phenotype. In the direct mappings we discuss in the remainder of this section, usually
each element of a candidate solution is written to exactly once. If bit strings are transformed
to vectors of natural numbers, for example, each element of the numerical vector is set to
one value. The same goes when scaling real vectors: the elements of the resulting vectors are
written to once. In the following, we provide some examples for direct genotype-phenotype
mappings.
Example E28.9 (Binary Search Spaces: Gray Coding).
Bit string genomes are sometimes complemented with the application of gray coding^18 during
the genotype-phenotype mapping. If the value of a natural number changes by one, in its
binary representation an arbitrary number of bits may change: 8_10 has the
binary representation 1000_2 but 7_10 has 0111_2. In gray code, only one bit changes: (one
possible) gray code representation of 8_10 is 1100_g, and 7_10 then corresponds to 0100_g.
Using gray code may thus be good to preserve locality (see Section 14.2) and to reduce
hidden biases [504] when dealing with bit-string encoded numbers in optimization. Collins
and Eaton [610] studied further encodings for Genetic Algorithms and found that their
E-code outperforms both gray and direct binary coding in function optimization.
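The (reflected) gray code used above can be computed with simple bit operations. The following is a small sketch for illustration; the class and method names are chosen for this example.

```java
// Minimal sketch of reflected gray coding for use in a genotype-
// phenotype mapping: the search operates on gray-coded genotypes and
// the GPM decodes them back to plain binary numbers.
public class GrayCode {

    // Encode a plain binary number as reflected gray code.
    public static int toGray(int n) {
        return n ^ (n >>> 1);
    }

    // Decode a reflected gray code back to a plain binary number.
    public static int fromGray(int g) {
        int n = g;
        for (int shift = 1; shift < Integer.SIZE; shift <<= 1) {
            n ^= n >>> shift;
        }
        return n;
    }

    public static void main(String[] args) {
        // 7 -> 0100_g and 8 -> 1100_g: adjacent values differ in one bit
        System.out.println(Integer.toBinaryString(toGray(7))); // 100
        System.out.println(Integer.toBinaryString(toGray(8))); // 1100
        System.out.println(fromGray(toGray(8)));               // 8
    }
}
```

The one-bit difference between the codes of 7 and 8 is exactly the locality-preserving property discussed above.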
Example E28.10 (Real Search Spaces: Normalization).
If bounded real-valued, continuous problem spaces X ⊆ R^n are investigated, it is possible
that the bounds of the dimensions differ significantly. Depending on the optimization algorithm
and search operations applied, this may mislead the optimization process into giving more
emphasis to dimensions with large bounds while neglecting dimensions with smaller bounds.
Example E28.11 (Neglecting One Dimension of X).
Assume that a two-dimensional problem space X is searched for the best solution of an
optimization task. The bounds of the first dimension are x̆_1 = −1 and x̂_1 = 1 and the
bounds of the second dimension are x̆_2 = −10^9 and x̂_2 = 10^9. The only search operation
available adds random numbers, normally distributed with expected value µ = 0 and
standard deviation σ = 0.01, to each dimension of the candidate vector. Then, a more or
less efficient search would only be possible in the first dimension.
A genotype-phenotype mapping which allows the search to operate on G = [0, 1]^n instead
of X = [x̆_1, x̂_1] × [x̆_2, x̂_2] × . . . × [x̆_n, x̂_n], where x̆_i ∈ R and x̂_i ∈ R are the lower and upper
bounds of the i-th dimension of the problem space X, respectively, would prevent such pitfalls
from the beginning:

gpm_scale(g) = (x̆_i + g[i] ∗ (x̂_i − x̆_i) ∀i ∈ 1 . . . n) (28.1)
^18 http://en.wikipedia.org/wiki/Gray_coding [accessed 2007-07-03]
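The scaling mapping of Equation 28.1 can be sketched as follows; this is an illustrative implementation whose names are chosen for this example.

```java
// Sketch of the scaling genotype-phenotype mapping of Equation 28.1:
// genotypes are vectors g in [0,1]^n, phenotypes lie in the box
// [lower[i], upper[i]] per dimension.
public class ScaleGPM {

    // Map g in [0,1]^n to x with x[i] in [lower[i], upper[i]].
    public static double[] gpmScale(double[] g, double[] lower, double[] upper) {
        double[] x = new double[g.length];
        for (int i = 0; i < g.length; i++) {
            x[i] = lower[i] + g[i] * (upper[i] - lower[i]);
        }
        return x;
    }

    public static void main(String[] args) {
        // the bounds from Example E28.11
        double[] lower = { -1d, -1e9 };
        double[] upper = { 1d, 1e9 };
        // the search operations only ever see [0,1]^2 ...
        double[] g = { 0.5, 0.75 };
        // ... while the objective functions see the scaled phenotype
        double[] x = gpmScale(g, lower, upper);
        System.out.println(x[0] + " " + x[1]); // 0.0 5.0E8
    }
}
```

With this mapping, a mutation of fixed step size in G moves the phenotype by the same relative amount in every dimension, so the second dimension of Example E28.11 is no longer neglected.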
28.2.2 Indirect Mappings
Different from direct mappings, the phenotype-building process in indirect mappings is
more complex [777]. The basic idea of such developmental approaches is that the phenotypes
are constructed from the genotypes in an iterative way, by stepwise applications of rules and
refinements. These refinements take into consideration some state information such as the
structure of the phenotype. The genotypes usually represent the rules and transformations
which have to be applied to the candidate solutions during the construction phase.
In nature, life begins with a single cell which divides^19 time and again until a mature
individual is formed^20 after the genetic information has been reproduced. The emergence
of a multicellular phenotype from its genotypic representation is called embryogenesis^21
in biology. Many researchers have drawn their inspiration for indirect mappings from this
embryogenesis, from the developmental process which embryos undergo when developing to
a fully-functional being inside an egg or the womb of their mother.
Therefore, developmental mappings received colorful names such as artificial embryogeny [2601],
artificial ontogeny (AO) [356], computational embryogeny [273, 1637], cellular
encoding [1143], morphogenesis [291, 1431], and artificial development [2735].
Natural embryogeny is a highly complex translation from hereditary information to physical
structures. Among other things, the DNA encodes the structural design
information of the human brain. As pointed out by Manos et al. [1829], there are only
about 30 thousand active genes in the human genome (2800 million amino acids) for over
100 trillion neural connections in our cerebrum. A huge manifold of information is hence
decoded from “data” of a much lower magnitude. This is possible because the same
genes can be reused in order to repeatedly create the same pattern. The layout of the light
receptors in the eye, for example, is always the same – just their wiring changes.
If we draw inspiration from such a process, we find that features such as gene reuse and
(phenotypic) modularity seem to be beneficial. Indeed, allowing the same modules (cells,
receptors, . . . ) encoded in the same way to be repeated many times is one of the
reasons why natural embryogeny leads to the development of organisms whose number of
modules surpasses the number of genes in the DNA by far.
The inspiration we get from nature can thus lead to good ideas which, in turn, can lead
to better results in optimization. Yet, we would like to point out that being inspired by
nature does not – by far – mean being like nature. The natural embryogenesis processes
are much more complicated than the indirect mappings applied in Evolutionary Algorithms.
The same goes for approaches which are inspired by chemical diffusion and the like.
Furthermore, many genotype-phenotype mappings based on gene reuse, modularity, and
feedback between the phenotypes and the environment are not related to and not inspired
by nature.
In the following, we divide indirect mappings into two categories. We distinguish GPMs
which solely use state information (such as the partly-constructed phenotype) and those that
additionally incorporate feedback from external information sources, the environment, into
the phenotype-building process.
28.2.2.1 Generative Mappings: Phenotypic Interaction
In the following, we provide some examples for generative genotype-phenotype mappings.
28.2.2.1.1 Genetic Programming
^19 http://en.wikipedia.org/wiki/Cell_division [accessed 2007-07-03]
^20 As a matter of fact, cell division continues until the individual dies. However, this is not important here.
^21 http://en.wikipedia.org/wiki/Embryogenesis [accessed 2007-07-03]
Example E28.12 (Grammar-Guided Genetic Programming).
Genetic Programming, as we will discuss in Chapter 31, is the family of Evolutionary Algorithms
which deals with the evolutionary synthesis of programs. In grammar-guided Genetic
Programming, usually the grammar of the programming language in which these programs
are to be specified is given. Each gene in the genotypes of these approaches usually encodes
the application of a single rule of the grammar. This is, for instance, the case in Gads
(see ?? on page ??), which uses an integer string genome where each integer corresponds to
the ID of a rule. This way, step by step, complex sentences can be built by unfolding the
genotype. In Grammatical Evolution (??), integer strings with genes identifying grammatical
productions are used as well. Depending on the unexpanded variables in the phenotype,
these genes may have different meanings. Applying the rules identified by them may introduce
new unexpanded variables into the phenotype. Hence, the runtime of such a mapping
is not necessarily known in advance.
Example E28.13 (Graph-creating Trees).
In Standard Genetic Programming approaches, the genotypes are trees. Trees are very useful
to express programs, and programs can be used to create many different structures. In a
genotype-phenotype mapping, the programs may, for example, be executed to build graphs
(as in Luke and Spector’s Edge Encoding [1788], discussed here in ?? on page ??) or electrical
circuits [1608]. In the latter case, the structure of the phenotype strongly depends on how
and how often given instructions or automatically defined functions in the genotypes are
called.
28.2.2.2 Ontogenic Mappings: Environmental Interaction
Full ontogenic (or developmental [887]) mappings even go a step farther than just utilizing
state information during the transformation of the genotypes to phenotypes [777]. This
additional information can stem from the objective functions. If simulations are involved
in the computation of the objective functions, the mapping may access the full simulation
information as well.
Example E28.14 (Developmental Wing Construction).
The phenotype could, for instance, be a set of points describing the surface of an airplane’s
wing. The genotype could be a real vector holding the weights of an artificial neural network.
The artificial neural network may then represent rules for moving the surface points in response
to the air pressure on them, given by a simulation of the wing. These rules can describe
a distinct wing shape which develops according to the feedback from the environment.
Example E28.15 (Developmental Termite Nest Construction).
Estévez and Lipson [887] evolved rules for the construction of a termite nest. The phenotype
is the nest, which consists of multiple cells. The fitness depends on how close the
temperature in each cell comes to a given target temperature. The rules which were evolved
in Estévez and Lipson’s work are directly applied to the phenotypes in the simulations.
They decide whether and where cells are added to or removed from the nest, based on the
temperature and density of the cells in the simulation.
28.3 Fitness Assignment
28.3.1 Introduction
In multi-objective optimization, each candidate solution p.x is characterized by a vector
of objective values f(p.x). The early Evolutionary Algorithms, however, were intended to
perform single-objective optimization, i. e., were capable of finding solutions for a single
objective function alone. Historically, many algorithms for survival and mating selection
base their decision on such a scalar value.
A fitness assignment process, as defined in Definition D5.1 on page 94 in Section 5.2, is
thus used to transform the information given by the vector of objective values f(p.x) for
the phenotype p.x ∈ X of an individual p ∈ G × X into a scalar fitness value v(p). This
assignment process takes place in each generation of the Evolutionary Algorithm and, as
stated in Section 5.2, may thus change in each iteration.
Including a fitness assignment step is not only beneficial in a MOEA, but can also
prove useful in a single-objective scenario: The fitness assigned to an individual may not
just reflect the absolute utility of an individual, but its suitability relative to the other
individuals in the population pop. Then, the fitness function has the character of a heuristic
function (see Definition D1.14 on page 34).
Besides such ranking data, fitness can also incorporate density/niching information. This
way, not only the quality of a candidate solution is considered, but also the overall diversity
of the population. For this purpose, either the phenotypes p.x ∈ X or the genotypes p.g ∈ G may
be analyzed and compared. This can improve the chance of finding the global optima as
well as the performance of the optimization algorithm significantly. If many individuals in
the population occupy the same rank or do not dominate each other, for instance, such
information will be very helpful.
The fitness v(p) thus may not only depend on the candidate solution p.x itself, but
on the whole population pop of the Evolutionary Algorithm (and on the archive archive of
optimal elements, if available). In practical realizations, the fitness values are often stored
in a special member variable in the individual records. Therefore, v(p) can be considered as
a mapping that returns the value of such a variable which has previously been stored there
by a fitness assignment process assignFitness(pop, cmp_f).
Definition D28.6 (Fitness Assignment Process). A fitness assignment process “assignFitness”
uses a comparison criterion cmp_f (see Definition D3.20 on page 77) to create a function
v : G × X → R^+ (see Definition D5.1 on page 94) which relates a scalar fitness value to each
individual p in the population pop as stated in Equation 28.2 (and in the archive archive, if an
archive is available, as stated in Equation 28.3).

v = assignFitness(pop, cmp_f) ⇒ v(p) ∈ R^+ ∀p ∈ pop (28.2)

v = assignFitnessArch(pop, archive, cmp_f) ⇒ v(p) ∈ R^+ ∀p ∈ pop ∪ archive (28.3)
Listing 55.14 on page 718 provides a Java interface which enables us to implement different
fitness assignment processes according to Definition D28.6. In the context of this book, we
generally minimize fitness values, i. e., the lower the fitness of a candidate solution, the
better. If no density or sharing information is included and the fitness only reflects the
strict partial order introduced by the Pareto dominance (see Section 3.3.5) or a prevalence relation (see
Section 3.5.2), Equation 28.4 will hold. If further information such as diversity metrics is
included in the fitness, this equation may no longer be applicable.
p_1.x ≺ p_2.x ⇒ v(p_1) < v(p_2) ∀p_1, p_2 ∈ pop ∪ archive (28.4)
28.3.2 Weighted Sum Fitness Assignment
The most primitive fitness assignment strategy is to just use a weighted sum of
the objective values. This approach is very static and brings about the same problems as the
weighted sum-based approach for defining what an optimum is, discussed in Section 3.3.3 on
page 62. It makes no use of Pareto or prevalence relations. For computing the weighted sum
of the different objective values of a candidate solution, we reuse Equation 3.18 on page 62
from the weighted sum optimum definition. The weights have to be chosen in a way that
ensures that v(p) ∈ R^+ holds for all individuals p.

v = assignFitnessWeightedSum(pop) ⇔ (∀p ∈ pop ⇒ v(p) = ws(p.x)) (28.5)
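A weighted sum fitness assignment in the sense of Equation 28.5 can be sketched as follows, applied to precomputed objective vectors; the weights, data, and all names are illustrative only.

```java
// Sketch of the weighted sum fitness assignment: the fitness of an
// individual is a fixed weighted sum of its objective values (fitness
// is subject to minimization). Weights must be chosen so that the
// resulting fitness stays non-negative.
public class WeightedSumFitness {

    // v[p] = sum_i weights[i] * objectives[p][i]
    public static double[] assignFitnessWeightedSum(double[][] objectives, double[] weights) {
        double[] v = new double[objectives.length];
        for (int p = 0; p < objectives.length; p++) {
            double sum = 0;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * objectives[p][i];
            }
            v[p] = sum;
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] objectives = { { 1, 4 }, { 2, 1 } };
        double[] weights = { 0.5, 0.5 };
        double[] v = assignFitnessWeightedSum(objectives, weights);
        System.out.println(v[0] + " " + v[1]); // 2.5 1.5
    }
}
```

Note how the static weights already decide the outcome: the second individual wins regardless of whether the two objectives are actually commensurable, which is exactly the weakness discussed in Section 3.3.3.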
28.3.3 Pareto Ranking
Another simple method for computing fitness values is to let them directly reflect the Pareto
domination (or prevalence) relation. There are two ways for doing this:
1. We can assign to each individual p_1 a value inversely proportional to the number of
other individuals p_2 it prevails:

v_1(p_1) ≡ 1 / (|{p_2 ∈ pop : p_1.x ≺ p_2.x}| + 1) (28.6)

2. or we can assign to it the number of other individuals p_3 it is dominated by:

v_2(p_1) ≡ |{p_3 ∈ pop : p_3.x ≺ p_1.x}| (28.7)
In case 1, individuals that dominate many others will receive a lower fitness value
than those which are prevailed by many. Since fitness is subject to minimization, they will
be strongly preferred. Although this idea looks appealing at first glance, it has a decisive
drawback which can also be observed in Example E28.16: It promotes individuals that reside
in crowded regions of the problem space and underrates those in sparsely explored areas. If an
individual has many neighbors, it can potentially prevail many other individuals. However,
a non-prevailed candidate solution in a sparsely explored region of the search space may not
prevail any other individual and hence receive fitness 1, the worst possible fitness under v_1.
Thus, the fitness assignment process achieves exactly the opposite of what we want.
Instead of exploring the problem space and delivering a wide scan of the frontier of best
possible candidate solutions, it will focus all effort on a small set of individuals. We will
only obtain a subset of the best solutions and it is even possible that this fitness assignment
method leads to premature convergence to a local optimum.
A much better second approach for fitness assignment has first been proposed by Goldberg [1075].
Here, the idea is to assign to each candidate solution the number of individuals it is
prevailed by [1125, 1783]. This way, the previously mentioned negative effects will not
occur and the exploration pressure is applied to a much wider area of the Pareto frontier.
This so-called Pareto ranking can be performed by first removing all non-prevailed (non-dominated)
individuals from the population and assigning the rank 0 to them (since they
are prevailed by no individual and, according to Equation 28.7, should receive v_2 = 0). Then, the
same is performed with the rest of the population. The individuals only dominated by those
on rank 0 (now non-dominated) will be removed and get the rank 1. This is repeated until
all candidate solutions have a proper fitness assigned to them. Since we follow the idea of the
freer prevalence comparators instead of Pareto dominance relations, we will synonymously
refer to this approach as Prevalence ranking and define Algorithm 28.4, which is implemented
in Java in Listing 56.16 on page 759. It should be noted that this simple algorithm has
a complexity of O(ps^2 ∗ |f|). A more efficient approach with complexity O(ps ∗ log^(|f|−1) ps)
has been introduced by Jensen [1441] based on a divide-and-conquer strategy.
Algorithm 28.4: v_2 ←− assignFitnessParetoRank(pop, cmp_f)
Input: pop: the population to assign fitness values to
Input: cmp_f: the prevalence comparator defining the prevalence relation
Data: i, j, cnt: the counter variables
Output: v_2: a fitness function reflecting the Prevalence ranking
1 begin
2   for i ←− len(pop) − 1 down to 0 do
3     cnt ←− 0
4     p ←− pop[i]
5     for j ←− len(pop) − 1 down to 0 do
        // Check whether cmp_f(pop[j].x, p.x) < 0
6       if (j ≠ i) ∧ (pop[j].x ≺ p.x) then cnt ←− cnt + 1
7     v_2(p) ←− cnt
8   return v_2
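Algorithm 28.4 can be sketched in plain Java as follows. This is a simplified stand-in, not Listing 56.16 from the book: individuals are reduced to their objective vectors, and the prevalence comparator cmp_f is replaced by plain Pareto dominance with all objectives minimized.

```java
// Sketch of Algorithm 28.4: assign to each individual the number of
// population members that dominate (prevail) it.
public class ParetoRank {

    // true iff a Pareto-dominates b (all objectives minimized)
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int k = 0; k < a.length; k++) {
            if (a[k] > b[k]) return false;
            if (a[k] < b[k]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    // v2[i] = number of individuals that dominate pop[i]
    public static int[] assignFitnessParetoRank(double[][] pop) {
        int[] v2 = new int[pop.length];
        for (int i = 0; i < pop.length; i++) {
            int cnt = 0;
            for (int j = 0; j < pop.length; j++) {
                if (j != i && dominates(pop[j], pop[i])) cnt++;
            }
            v2[i] = cnt;
        }
        return v2;
    }

    public static void main(String[] args) {
        double[][] pop = { {1, 5}, {2, 2}, {5, 1}, {3, 3}, {4, 4} };
        int[] v2 = assignFitnessParetoRank(pop);
        // {1,5}, {2,2}, {5,1} are non-dominated; {3,3} is dominated by
        // {2,2}; {4,4} is dominated by both {2,2} and {3,3}.
        System.out.println(java.util.Arrays.toString(v2)); // [0, 0, 0, 1, 2]
    }
}
```

As in the algorithm, all non-dominated individuals receive the best possible fitness 0, so the whole Pareto frontier is equally preferred under minimization.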
Example E28.16 (Pareto Ranking Fitness Assignment).
Figure 28.4 and Table 28.1 illustrate the Pareto relations in a population of 15 individuals
and their corresponding objective values f_1 and f_2, both subject to minimization.
[Figure: the 15 individuals plotted over the objective values f_1 and f_2; the non-prevailed individuals 1, 2, 3, and 4 lie on the Pareto frontier.]
Figure 28.4: An example scenario for Pareto ranking.
28.3. FITNESS ASSIGNMENT 277
p.x  dominates                         is dominated by                                v1(p)  v2(p)
 1   {5, 6, 8, 9, 14, 15}              ∅                                              1/7    0
 2   {6, 7, 8, 9, 10, 11, 13, 14, 15}  ∅                                              1/10   0
 3   {12, 13, 14, 15}                  ∅                                              1/5    0
 4   ∅                                 ∅                                              1      0
 5   {8, 15}                           {1}                                            1/3    1
 6   {8, 9, 14, 15}                    {1, 2}                                         1/5    2
 7   {9, 10, 11, 14, 15}               {2}                                            1/6    1
 8   {15}                              {1, 2, 5, 6}                                   1/2    4
 9   {14, 15}                          {1, 2, 6, 7}                                   1/3    4
10   {14, 15}                          {2, 7}                                         1/3    2
11   {14, 15}                          {2, 7}                                         1/3    2
12   {13, 14, 15}                      {3}                                            1/4    1
13   {15}                              {2, 3, 12}                                     1/2    3
14   {15}                              {1, 2, 3, 6, 7, 9, 10, 11, 12}                 1/2    9
15   ∅                                 {1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}   1      13

Table 28.1: The Pareto domination relation of the individuals illustrated in Figure 28.4.
We computed the fitness values of both approaches for the individuals in the population
given in Figure 28.4 and noted them in Table 28.1. In column v1(p), we provide fitness
values which are inversely proportional to the number of individuals dominated by p.
A good example for the problem that approach 1 prefers crowded regions in the search
space while ignoring interesting candidate solutions in less populated regions are the four
non-prevailed individuals {1, 2, 3, 4} on the Pareto frontier. The best fitness is assigned to
element 2, followed by individual 1. Although individual 7 is dominated (by 1), its
fitness is better than the fitness of the non-dominated element 3.
The candidate solution 4 gets the worst possible fitness 1, since it prevails no other
element. Its chances for reproduction are similarly low to those of individual 15, which
is dominated by all other elements except 4. Hence, both candidate solutions will most
probably not be selected and will vanish in the next generation. The loss of individual 4
would greatly decrease the diversity and further increase the focus on the crowded area near
1 and 2.
Column v2(p) in Table 28.1 shows that all four non-prevailed individuals now have the
best possible fitness 0 if the method given as Algorithm 28.4 is applied. Therefore, the focus
is no longer centered on the overcrowded regions.
However, the region around the individuals 1 and 2 has probably already been explored
extensively, whereas the surroundings of candidate solution 4 are rather unknown. More
individuals are present in the top-left area of Figure 28.4, and at least two individuals with
the same fitness as 3 and 4 reside there. Thus, chances are still that more individuals
from this area will be selected than from the bottom-right area. A better fitness assignment
approach should incorporate such information and put a bit more pressure into the
direction of individual 4, in order to make the Evolutionary Algorithm investigate this area
more thoroughly. Algorithm 28.4 cannot fulfill this hope.
28.3.4 Sharing Functions
Previously, we have mentioned that the drawback of Pareto ranking is that it does not
incorporate any information about whether the candidate solutions in the population reside
closely to each other or in regions of the problem space which are only sparsely covered
by individuals. Sharing, as a method for including such diversity information into the
ﬁtness assignment process, was introduced by Holland [1250] and later reﬁned by Deb [735],
Goldberg and Richardson [1080], and Deb and Goldberg [742] and further investigated
by [1899, 2063, 2391].
Definition D28.7 (Sharing Function). A sharing function Sh : R+ → R+ is a function
that maps the distance d = dist(p1, p2) between two individuals p1 and p2 into the real
interval [0, 1]. The value of Sh_σ is 1 if d = 0 and 0 if d exceeds a specific constant value σ.

    Sh_σ(d = dist(p1, p2)) = { 1          if d ≤ 0
                             { ∈ [0, 1]   if 0 < d < σ                          (28.8)
                             { 0          otherwise
Sharing functions can be employed in many different ways and are used by a variety of fitness
assignment processes [735, 1080]. Typically, the simple triangular function "Sh tri" [1268]
or one of its convex (Sh cvex_ξ) or concave (Sh ccav_ξ) pendants with the power
ξ ∈ R+, ξ > 0 is applied. Besides using different powers of the distance-σ-ratio, another
approach is the exponential sharing method "Sh exp".

    Sh tri_σ(d)    = { 1 − d/σ        if 0 ≤ d < σ
                     { 0              otherwise                                 (28.9)

    Sh cvex_σ,ξ(d) = { (1 − d/σ)^ξ    if 0 ≤ d < σ
                     { 0              otherwise                                 (28.10)

    Sh ccav_σ,ξ(d) = { 1 − (d/σ)^ξ    if 0 ≤ d < σ
                     { 0              otherwise                                 (28.11)

    Sh exp_σ,ξ(d)  = { 1                                          if d ≤ 0
                     { 0                                          if d ≥ σ      (28.12)
                     { (e^(−ξd/σ) − e^(−ξ)) / (1 − e^(−ξ))        otherwise
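The four sharing functions translate directly into code. A minimal Java sketch (the method names are ours; "Sh exp" is normalized as in Equation 28.12 so that it is 1 at d = 0 and 0 at d = σ):

```java
/** The sharing functions of Equations 28.9 to 28.12 (distances d >= 0). */
public class SharingFunctions {

  /** Sh tri: linear decay from 1 at d = 0 to 0 at d = sigma. */
  public static double tri(double d, double sigma) {
    return (d >= 0 && d < sigma) ? 1.0 - d / sigma : 0.0;
  }

  /** Sh cvex: convex variant, (1 - d/sigma)^xi. */
  public static double cvex(double d, double sigma, double xi) {
    return (d >= 0 && d < sigma) ? Math.pow(1.0 - d / sigma, xi) : 0.0;
  }

  /** Sh ccav: concave variant, 1 - (d/sigma)^xi. */
  public static double ccav(double d, double sigma, double xi) {
    return (d >= 0 && d < sigma) ? 1.0 - Math.pow(d / sigma, xi) : 0.0;
  }

  /** Sh exp: exponential sharing, normalized to 1 at d = 0 and 0 at d = sigma. */
  public static double exp(double d, double sigma, double xi) {
    if (d <= 0) return 1.0;
    if (d >= sigma) return 0.0;
    return (Math.exp(-xi * d / sigma) - Math.exp(-xi)) / (1.0 - Math.exp(-xi));
  }

  public static void main(String[] args) {
    // all variants equal 1 at d = 0 and drop to 0 at d = sigma
    System.out.println(tri(0.5, 1));       // prints 0.5
    System.out.println(cvex(0.5, 1, 2));   // prints 0.25
    System.out.println(ccav(0.5, 1, 2));   // prints 0.75
  }
}
```

All four obey the shape required by Definition D28.7; they differ only in how quickly the share decays between 0 and σ.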
For sharing, the distance of the individuals in the search space G as well as their distance
in the problem space X or the objective space Y may be used. If the candidate solutions
are real vectors in R^n, we could use the Euclidean distance of the phenotypes of the
individuals directly, i. e., compute dist_eucl(p1.x, p2.x). However, this only makes sense in
lower-dimensional spaces (small values of n), since the Euclidean distance in R^n does not
convey much information for high n.
In Genetic Algorithms, where the search space is the set of all bit strings G = B^n
of length n, another suitable approach would be to use the Hamming distance²²
dist_ham(p1.g, p2.g) of the genotypes. The work of Deb [735] indicates that phenotypical
sharing will often be superior to genotypical sharing.
Definition D28.8 (Niche Count). The niche count m(p, P) [738, 1899] of an individual
p ∈ P is the sum of its sharing values with all individuals in a list P.

    ∀p ∈ P ⇒ m(p, P) = Σ_{p′∈P} Sh_σ(dist(p, p′))                              (28.13)
The niche count “m” is always greater than or equal to 1, since p ∈ P and, hence,
Sh_σ(dist(p, p)) = 1 is computed and added up at least once. The original sharing approach
was developed for fitness assignment in single-objective optimization where only one
objective function f was subject to maximization. In this case, the objective value was simply
divided by the niche count, punishing solutions in crowded regions [1899]. The goal of
22 See ?? on page ?? for more information on the Hamming distance.
sharing was to distribute the population over a number of different peaks in the fitness
landscape, with each peak receiving a fraction of the population proportional to its height [1268].
The results of dividing the fitness by the niche counts strongly depend on the height
differences of the peaks and thus, on the complexity class²³ of f. On an f1 ∈ O(x), for instance,
the influence of “m” is much bigger than on an f2 ∈ O(e^x).
By multiplying predetermined fitness values v′ with the niche count “m”, we can use this
approach for fitness minimization in conjunction with a variety of other fitness
assignment processes, but we also inherit its shortcomings:

    v(p) = v′(p) ∗ m(p, pop) ,   v′ ≡ assignFitness(pop, cmp_f)                (28.14)
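Equation 28.13 and Equation 28.14 can be sketched as follows, assuming the triangular sharing function and the Euclidean distance in the objective space; vPrime stands for fitness values v′ assigned by any other fitness assignment process (the names are ours):

```java
/** Niche counts (Equation 28.13) and sharing-scaled fitness (Equation 28.14). */
public class NicheCount {

  /** Euclidean distance between two objective vectors. */
  static double dist(double[] a, double[] b) {
    double s = 0;
    for (int k = 0; k < a.length; k++) s += (a[k] - b[k]) * (a[k] - b[k]);
    return Math.sqrt(s);
  }

  /** The triangular sharing function Sh tri of Equation 28.9. */
  static double shTri(double d, double sigma) {
    return (d < sigma) ? 1.0 - d / sigma : 0.0;
  }

  /** m(p, P): sum of the sharing values of individual i with all members of pop. */
  public static double nicheCount(double[][] pop, int i, double sigma) {
    double m = 0;
    for (double[] q : pop) m += shTri(dist(pop[i], q), sigma); // includes q = pop[i]
    return m; // always >= 1, since Sh(dist(p, p)) = Sh(0) = 1
  }

  /** v(p) = v'(p) * m(p, pop): punish individuals in crowded regions (minimization). */
  public static double[] scaleFitness(double[] vPrime, double[][] pop, double sigma) {
    double[] v = new double[vPrime.length];
    for (int i = 0; i < v.length; i++) v[i] = vPrime[i] * nicheCount(pop, i, sigma);
    return v;
  }

  public static void main(String[] args) {
    double[][] pop = { {0, 0}, {0.1, 0}, {5, 5} };  // two close points, one remote
    double[] v = scaleFitness(new double[]{1, 1, 1}, pop, 1.0);
    // the two crowded individuals get a worse (larger) fitness than the remote one
    System.out.println(v[0] > v[2] && v[1] > v[2]); // prints true
  }
}
```

The quadratic cost of the nested distance computations is exactly the O(n²) effort mentioned below, which sampling can approximate away.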
Sharing was traditionally combined with fitness proportionate, i. e., roulette wheel
selection²⁴. Oei et al. [2063] have shown that if the sharing function is computed using the
parental individuals of the “old” population and then naïvely combined with the more
sophisticated tournament selection²⁵, the resulting behavior of the Evolutionary Algorithm
may be chaotic. They suggested using the partially filled “new” population to circumvent
this problem. The layout of Evolutionary Algorithms, as defined in this book, bases the
fitness computation on the whole set of “new” individuals and assumes that their objective
values have already been completely determined. In other words, such issues simply
do not exist in multi-objective Evolutionary Algorithms as introduced here and the chaotic
behavior cannot occur.
For computing the niche count “m”, O(n²) comparisons are needed. According to
Goldberg et al. [1085], sampling the population can be sufficient to approximate “m” in
order to avoid this quadratic complexity.
28.3.5 Variety Preserving Fitness Assignment
Using sharing and the niche counts naïvely leads to more or less unpredictable effects. Of
course, it promotes solutions located in sparsely populated niches, but how much their fitness
will be improved is rather unclear. Using distance measures which are not normalized can
lead to strange effects, too. Imagine two objective functions f1 and f2. If the values of f1
span from 0 to 1 for the individuals in the population whereas those of f2 range from 0 to
10 000, the components of f1 will most often be negligible in the Euclidean distance of two
individuals in the objective space Y. Another problem is that the effect of simple sharing
on the pressure into the direction of the Pareto frontier is not obvious either, or depends on
the sharing approach applied. Some methods simply add a niche count to the Pareto rank,
which may cause non-dominated individuals to have a worse fitness than any others in the
population. Other approaches scale the niche count into the interval [0, 1) before adding it,
which not only ensures that non-dominated individuals have the best fitness but also leaves
the relation between individuals at different ranks intact, which, however, does not further
variety very much.
In some of our experiments [2888, 2912, 2914], we applied a Variety Preserving Fitness
Assignment, an approach based on Pareto ranking using prevalence comparators and
well-dosed sharing. We developed it in order to mitigate the previously mentioned side effects
and to balance the evolutionary pressure between optimizing the objective functions and
maximizing the variety inside the population. In the following, we will describe it in detail and
give its specification in Algorithm 28.5.
Before this fitness assignment process can begin, it is required that all individuals with
infinite objective values be removed from the population pop. If such a candidate
solution is optimal, i. e., if it has negatively infinite objectives in a minimization process,
for instance, it should receive fitness zero, since fitness is subject to minimization. If the
23 See Chapter 12 on page 143 for a detailed introduction into complexity and the O-notation.
24 Roulette wheel selection is discussed in Section 28.4.3 on page 290.
25 You can find an outline of tournament selection in Section 28.4.4 on page 296.
Algorithm 28.5: v ←− assignFitnessVariety(pop, cmp_f)

Input: pop: the population
Input: cmp_f: the comparator function
Input: [implicit] f: the set of objective functions
Data: . . .: sorry, no space here, we'll discuss this in the text
Output: v: the fitness function

 1 begin
     /* If needed: remove all elements with infinite objective values from pop and
        assign fitness 0 or len(pop) + √(len(pop)) + 1 to them. Then compute the
        prevalence ranks. */
 2   ranks ←− createList(len(pop), 0)
 3   maxRank ←− 0
 4   for i ←− len(pop) − 1 down to 0 do
 5     for j ←− i − 1 down to 0 do
 6       k ←− cmp_f(pop[i].x, pop[j].x)
 7       if k < 0 then ranks[j] ←− ranks[j] + 1
 8       else if k > 0 then ranks[i] ←− ranks[i] + 1
 9     if ranks[i] > maxRank then maxRank ←− ranks[i]
     // determine the ranges of the objectives
10   mins ←− createList(|f|, +∞)
11   maxs ←− createList(|f|, −∞)
12   foreach p ∈ pop do
13     for i ←− |f| down to 1 do
14       if f_i(p.x) < mins[i − 1] then mins[i − 1] ←− f_i(p.x)
15       if f_i(p.x) > maxs[i − 1] then maxs[i − 1] ←− f_i(p.x)
16   rangeScales ←− createList(|f|, 1)
17   for i ←− |f| − 1 down to 0 do
18     if maxs[i] > mins[i] then rangeScales[i] ←− 1/(maxs[i] − mins[i])
     // base a sharing value on the scaled Euclidean distance of all elements
19   shares ←− createList(len(pop), 0)
20   minShare ←− +∞
21   maxShare ←− −∞
22   for i ←− len(pop) − 1 down to 0 do
23     curShare ←− shares[i]
24     for j ←− i − 1 down to 0 do
25       dist ←− 0
26       for k ←− |f| down to 1 do
27         dist ←− dist + [(f_k(pop[i].x) − f_k(pop[j].x)) ∗ rangeScales[k − 1]]²
28       s ←− Sh exp_{√|f|,16}(√dist)
29       curShare ←− curShare + s
30       shares[j] ←− shares[j] + s
31     shares[i] ←− curShare
32     if curShare < minShare then minShare ←− curShare
33     if curShare > maxShare then maxShare ←− curShare
     // finally, compute the fitness values
34   scale ←− { 1/(maxShare − minShare) if maxShare > minShare; 1 otherwise }
35   for i ←− len(pop) − 1 down to 0 do
36     if ranks[i] > 0 then
37       v(pop[i]) ←− ranks[i] + √maxRank ∗ scale ∗ (shares[i] − minShare)
38     else v(pop[i]) ←− scale ∗ (shares[i] − minShare)
individual is infeasible, on the other hand, its fitness should be set to
len(pop) + √(len(pop)) + 1, which is 1 larger than any other fitness value that may be
assigned by Algorithm 28.5.
In lines 2 to 9, a list ranks is created which is used to efficiently compute the Pareto rank
of every candidate solution in the population. Actually, the term prevalence rank would be
more precise in this case, since we use prevalence comparisons as introduced in Section 3.5.2.
Therefore, Variety Preserving is not limited to Pareto optimization but may also incorporate
External Decision Makers (Section 3.5.1) or the method of inequalities (Section 3.4.5).
The highest rank encountered in the population is stored in the variable maxRank.
This value may be zero if the population contains only non-prevailed elements. The lowest
rank will always be zero, since the prevalence comparators cmp_f define order relations
which are non-circular by definition²⁶. maxRank is used later on to determine the maximum
penalty for solutions in an overly crowded region of the search space.
From line 10 to 18, the maximum and the minimum values that each objective function
takes on when applied to the individuals in the population are obtained. These values are
used to store the inverse of their ranges in the array rangeScales, which will be used to
scale all distances in each dimension (objective) of the individuals into the interval [0, 1].
There are |f| objective functions in f and, hence, the maximum Euclidean distance between
two candidate solutions in the objective space (scaled this way) becomes √|f|. It
occurs if all the distances in the single dimensions are 1.
Between line 19 and 33, the scaled distance from every individual to every other candidate
solution in the objective space is determined. This distance is used to aggregate share values
(in the array shares). Therefore, again two nested loops are needed (lines 22 and 24). The
distance components of two individuals pop[i] and pop[j] are scaled and summed up in a
variable dist in line 27. The Euclidean distance between them is √dist, which is used to
determine a sharing value in line 28. We have decided in favor of exponential sharing, as
introduced in Equation 28.12 on page 278, with power 16 and σ = √|f|. For every individual,
we sum up all the shares (see line 30). While doing so, the minimum and maximum total
shares are stored in the variables minShare and maxShare in lines 32 and 33.
These variables are used to scale all sharing values again into the interval [0, 1] (line 34),
so the individual in the most crowded region always has a total share of 1 and the most
remote individual always has a share of 0. Basically, two things are now known about the
individuals in pop:

1. their Pareto/Prevalence ranks, stored in the array ranks, giving information about
their relative quality according to the objective values, and

2. their sharing values, held in shares, denoting how densely crowded the area around
them is.

With this information, the final fitness value of an individual p is determined as follows: If p
is non-prevailed, i. e., its rank is zero, its fitness is its scaled total share (line 38). Otherwise,
the square root of the maximum rank, √maxRank, is multiplied with the scaled share and
added to its rank (line 37). By doing so, the supremacy of non-prevailed individuals in
the population is preserved. Yet, these individuals can compete with each other based on
the crowdedness of their location in the objective space. All other candidate solutions may
degenerate in rank, but at most by the square root of the worst rank.
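The final combination step of Algorithm 28.5 (lines 34 to 38) reads in Java roughly as follows. This sketch assumes that the prevalence ranks and the raw sharing sums have already been computed as described above; the names are ours:

```java
/** Final fitness combination of Variety Preserving (Algorithm 28.5, lines 34-38). */
public class VarietyPreservingCombine {

  /**
   * ranks:  Pareto/prevalence rank of each individual (0 = non-prevailed)
   * shares: raw sharing sums of each individual (higher = more crowded)
   */
  public static double[] combine(int[] ranks, double[] shares) {
    int maxRank = 0;
    double minShare = Double.POSITIVE_INFINITY, maxShare = Double.NEGATIVE_INFINITY;
    for (int r : ranks) maxRank = Math.max(maxRank, r);
    for (double s : shares) {
      minShare = Math.min(minShare, s);
      maxShare = Math.max(maxShare, s);
    }
    double scale = (maxShare > minShare) ? 1.0 / (maxShare - minShare) : 1.0;
    double[] v = new double[ranks.length];
    for (int i = 0; i < v.length; i++) {
      double scaledShare = scale * (shares[i] - minShare);  // in [0, 1]
      v[i] = (ranks[i] > 0)
          ? ranks[i] + Math.sqrt(maxRank) * scaledShare     // prevailed individuals
          : scaledShare;                                    // non-prevailed: v < 1
    }
    return v;
  }

  public static void main(String[] args) {
    // an illustrative trio: one non-prevailed, one rank-1, one rank-13 individual
    double[] v = combine(new int[]{0, 1, 13}, new double[]{0.022, 0.622, 0.025});
    // the non-prevailed individual keeps a fitness below 1
    System.out.println(java.util.Arrays.toString(v));
  }
}
```

Because the share penalty for ranked individuals is capped at √maxRank, a non-prevailed individual can never be overtaken by a prevailed one, exactly as argued above.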
Example E28.17 (Variety Preserving Fitness Assignment).
Let us now apply Variety Preserving to the examples for Pareto ranking from Section 28.3.3.
In Table 28.2, we again list all the candidate solutions from Figure 28.4 on page 276, this
time with their objective values under f1 and f2 corresponding to their coordinates
in the diagram. In the third column, you can find the Pareto ranks of the individuals as they
have been listed in Table 28.1 on page 277. The columns share/u and share/s correspond to
the total sharing sums of the individuals, unscaled and scaled into [0, 1], respectively.
26 In all order relations imposed on finite sets there is always at least one “smallest” element. See Section 51.8 on page 645 for more information.
[Figure: the sharing potential plotted as a surface over the objective space f1 × f2, with a
steep spike at each of the individuals 1 to 15 and the Pareto frontier marked.]
Figure 28.5: The sharing potential in the Variety Preserving Example E28.17.
 x   f1  f2  rank  share/u  share/s  v(x)
 1    1   7    0    0.71     0.779    0.779
 2    2   4    0    0.239    0.246    0.246
 3    6   2    0    0.201    0.202    0.202
 4   10   1    0    0.022    0        0
 5    1   8    1    0.622    0.679    3.446
 6    2   7    2    0.906    1        5.606
 7    3   5    1    0.531    0.576    3.077
 8    2   9    4    0.314    0.33     5.191
 9    3   7    4    0.719    0.789    6.845
10    4   6    2    0.592    0.645    4.325
11    5   5    2    0.363    0.386    3.39
12    7   3    1    0.346    0.366    2.321
13    8   4    3    0.217    0.221    3.797
14    7   7    9    0.094    0.081    9.292
15    9   9   13    0.025    0.004   13.01

Table 28.2: An example for Variety Preserving based on Figure 28.4.
But first things first; as already mentioned, we know the Pareto ranks of the candidate
solutions from Table 28.1, so the next step is to determine the ranges of values the objective
functions take on for the example population. These can again easily be read off
Figure 28.4. f1 spans from 1 to 10, which leads to rangeScales[0] = 1/9. rangeScales[1] = 1/8,
since the maximum of f2 is 9 and its minimum is 1. With this, we can now compute the
(dimensionally scaled) distances amongst the candidate solutions in the objective space,
the values of √dist in Algorithm 28.5, as well as the corresponding values of
the sharing function Sh exp_{√|f|,16}(√dist). We noted these in Table 28.3, using the upper
triangle of the table for the distances and the lower triangle for the shares.
Upper triangle: distances. Lower triangle: corresponding share values.

       1       2       3       4       5       6       7       8       9      10      11      12      13      14      15
 1         0.391   0.836   1.25    0.125   0.111   0.334   0.274   0.222   0.356   0.51    0.833   0.863   0.667   0.923
 2 0.012           0.51    0.965   0.512   0.375   0.167   0.625   0.391   0.334   0.356   0.569   0.667   0.67    0.998
 3 7.7E-5  0.003           0.462   0.933   0.767   0.502   0.981   0.708   0.547   0.391   0.167   0.334   0.635   0.936
 4 6.1E-7  1.8E-5  0.005           1.329   1.163   0.925   1.338   1.08    0.914   0.747   0.417   0.436   0.821   1.006
 5 0.243   0.003   2.6E-5  1.8E-7          0.167   0.436   0.167   0.255   0.417   0.582   0.914   0.925   0.678   0.898
 6 0.284   0.014   1.7E-4  1.8E-6  0.151           0.274   0.25    0.111   0.255   0.417   0.747   0.765   0.556   0.817
 7 0.023   0.151   0.003   2.9E-5  0.007   0.045           0.512   0.25    0.167   0.222   0.51    0.569   0.51    0.833
 8 0.045   0.001   1.5E-5  1.5E-7  0.151   0.059   0.003           0.274   0.436   0.601   0.933   0.914   0.609   0.778
 9 0.081   0.012   3.3E-4  4.8E-6  0.056   0.284   0.059   0.045           0.167   0.334   0.669   0.67    0.444   0.712
10 0.018   0.023   0.002   3.2E-5  0.009   0.056   0.151   0.007   0.151           0.167   0.502   0.51    0.356   0.67
11 0.003   0.018   0.012   2.1E-4  0.001   0.009   0.081   0.001   0.023   0.151           0.334   0.356   0.334   0.669
12 8E-5    0.002   0.151   0.009   3.2E-5  2.1E-4  0.003   2.6E-5  0.001   0.003   0.023           0.167   0.5     0.782
13 5.7E-5  0.001   0.023   0.007   2.9E-5  1.7E-4  0.002   3.2E-5  0.001   0.003   0.018   0.151           0.391   0.635
14 0.001   0.001   0.001   9.3E-5  4.6E-4  0.002   0.003   0.001   0.007   0.018   0.023   0.003   0.012           0.334
15 2.9E-5  1.2E-5  2.5E-5  1.1E-5  3.9E-5  9.7E-5  8E-5    1.5E-4  3.1E-4  0.001   0.001   1.4E-4  0.001   0.023

Table 28.3: The distance and sharing matrix of the example from Table 28.2.
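As a plausibility check, the Table 28.3 entries for the individuals 1 at (1, 7) and 2 at (2, 4) can be recomputed from the scaling factors 1/9 and 1/8 derived above. This sketch (names are ours) reproduces the distance 0.391 and the share 0.012:

```java
/** Recomputes the distance and share of individuals 1 (1,7) and 2 (2,4) in Table 28.3. */
public class ShareCheck {

  /** Sh exp of Equation 28.12. */
  static double shExp(double d, double sigma, double xi) {
    if (d <= 0) return 1.0;
    if (d >= sigma) return 0.0;
    return (Math.exp(-xi * d / sigma) - Math.exp(-xi)) / (1.0 - Math.exp(-xi));
  }

  public static void main(String[] args) {
    double df1 = (1.0 - 2.0) / 9.0;                    // scaled f1 difference
    double df2 = (7.0 - 4.0) / 8.0;                    // scaled f2 difference
    double dist = Math.sqrt(df1 * df1 + df2 * df2);
    double share = shExp(dist, Math.sqrt(2.0), 16.0);  // sigma = sqrt(|f|), xi = 16
    System.out.printf(java.util.Locale.ROOT,
        "dist = %.3f, share = %.3f%n", dist, share);
    // prints: dist = 0.391, share = 0.012 (matching Table 28.3)
  }
}
```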
The value of the sharing function can be imagined as a scalar field, as illustrated in
Figure 28.5. In this case, each individual in the population can be considered as an electron
that builds an electrical field around it, resulting in a potential. If two electrons come
close, repulsing forces occur, which is pretty much what we want to achieve with Variety
Preserving. Unlike the electrical field, the power of the sharing potential falls off exponentially,
resulting in the relatively steep spikes in Figure 28.5, which gives proximity and density a heavier
influence. Electrons in atoms and on planets are limited in their movement by other influences
like gravity or nuclear forces, which are often stronger than the electromagnetic force. In
Variety Preserving, the prevalence rank plays this role; as you can see in Table 28.2, its
influence on the fitness is often dominant.
By summing up the single sharing potentials for each individual in the example, we
obtain the fifth column of Table 28.2, the unscaled share values. Their minimum is around
0.022 and their maximum around 0.906. Therefore, we must subtract 0.022 from each of these values
and multiply the result with 1.131. By doing so, we build the column share/s. Finally, we
can compute the fitness values v(x) according to lines 37 and 38 in Algorithm 28.5.
The last column of Table 28.2 lists these results. All non-prevailed individuals have
retained a fitness value of less than one, lower than that of any other candidate solution in
the population. Amongst these best individuals, however, candidate solution 4 is strongly
preferred, since it is located in a very remote region of the objective space. Individual
1 is the least interesting non-dominated one, because it has the densest neighborhood in
Figure 28.4. In this neighborhood, the individuals 5 and 6 with the Pareto ranks 1 and 2 are
located. They are strongly penalized by the sharing process and receive the fitness values
v(5) = 3.446 and v(6) = 5.606. In other words, individual 5 becomes less interesting than
candidate solution 7, which has a worse Pareto rank. Individual 6 is now even worse than
individual 8, although its fitness would be better by two than that of 8 if strict Pareto
ranking were applied.
Based on these fitness values, algorithms like Tournament selection (see Section 28.4.4)
or fitness proportionate approaches (discussed in Section 28.4.3) will pick elements in a
way that preserves the pressure into the direction of the Pareto frontier but also leads to a
balanced and sustainable variety in the population. The benefits of this approach have been
shown, for instance, in [2188, 2888, 2912, 2914].
28.3.6 Tournament Fitness Assignment
In tournament fitness assignment, which is a generalization of the q-level (binary) tournament
selection introduced by Weicker [2879], the fitness of each individual is computed by letting
it compete q times against r other individuals (with r = 1 as default) and counting the
number of competitions it loses. For a better understanding of the tournament metaphor,
see Section 28.4.4 on page 296, where the tournament selection scheme is discussed. The
number of losses will approximate the Pareto rank of the individual, but in a somewhat more
randomized fashion. If the number of tournaments won were counted instead of the losses,
the same problems as in the first idea of Pareto ranking would be encountered.
Algorithm 28.6: v ←− assignFitnessTournament(q, r, pop, cmp_f)

Input: q: the number of tournaments per individual
Input: r: the number of other contestants per tournament, normally 1
Input: pop: the population to assign fitness values to
Input: cmp_f: the comparator function providing the prevalence relation
Data: i, j, k, z: counter variables
Data: b: a Boolean variable being true as long as a tournament isn't lost
Data: p: the individual currently examined
Output: v: the fitness function

 1 begin
 2   for i ←− len(pop) − 1 down to 0 do
 3     z ←− q
 4     p ←− pop[i]
 5     for j ←− q down to 1 do
 6       b ←− true
 7       k ←− r
 8       while (k > 0) ∧ b do
 9         b ←− ¬(pop[⌊randomUni[0, len(pop))⌋].x ≺ p.x)
10         k ←− k − 1
11       if b then z ←− z − 1
12     v(p) ←− z
13   return v
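A Java sketch of Algorithm 28.6, again assuming plain Pareto dominance over minimized objective vectors as the prevalence relation; since the opponents are drawn at random, the resulting fitness is itself randomized (all names are ours):

```java
import java.util.Random;

/** Tournament fitness assignment (Algorithm 28.6): v(p) = number of lost tournaments. */
public class TournamentFitness {

  /** true iff a dominates b under minimization. */
  static boolean dominates(double[] a, double[] b) {
    boolean better = false;
    for (int k = 0; k < a.length; k++) {
      if (a[k] > b[k]) return false;
      if (a[k] < b[k]) better = true;
    }
    return better;
  }

  /** q tournaments against r random opponents each; counts the losses. */
  public static int[] assign(double[][] pop, int q, int r, Random rnd) {
    int[] v = new int[pop.length];
    for (int i = 0; i < pop.length; i++) {
      int z = q;                                // start with q potential losses
      for (int j = 0; j < q; j++) {
        boolean notLost = true;
        for (int k = 0; k < r && notLost; k++) {
          double[] opponent = pop[rnd.nextInt(pop.length)];
          notLost = !dominates(opponent, pop[i]);   // lost as soon as one opponent wins
        }
        if (notLost) z--;                       // tournament won: one loss fewer
      }
      v[i] = z;                                 // approximates the Pareto rank
    }
    return v;
  }

  public static void main(String[] args) {
    double[][] pop = { {1, 1}, {2, 2}, {3, 3} }; // a totally ordered population
    int[] v = assign(pop, 20, 1, new Random(42));
    // (1,1) is dominated by nobody, so it can never lose a tournament: v[0] == 0
    System.out.println(java.util.Arrays.toString(v));
  }
}
```

Note that a non-dominated individual always ends up with fitness 0, while dominated individuals accumulate losses roughly in proportion to how many individuals prevail them.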
28.4 Selection
28.4.1 Introduction
Definition D28.9 (Selection). In Evolutionary Algorithms, the selection²⁷ operation
mate = selection(pop, v, mps) chooses mps individuals according to their fitness values v
(subject to minimization) from the population pop and places them into the mating pool
mate [167, 340, 1673, 1912].

    mate = selection(pop, v, mps) ⇒ ∀p ∈ mate ⇒ p ∈ pop
                                    ∀p ∈ pop ⇒ p ∈ G × X
                                    v(p) ∈ R+ ∀p ∈ pop                          (28.15)
                                    (len(mate) ≥ min{len(pop), mps}) ∧ (len(mate) ≤ mps)
Definition D28.10 (Mating Pool). The mating pool mate in an Evolutionary Algorithm
contains the individuals which are selected for further investigation and on which the
reproduction operations (see Section 28.5 on page 306) will subsequently be applied.
27 http://en.wikipedia.org/wiki/Selection_%28genetic_algorithm%29 [accessed 2007-07-03]
Selection may behave in a deterministic or in a randomized manner, depending on the
algorithm chosen and its application-dependent implementation. Furthermore, elitist
Evolutionary Algorithms may incorporate an archive archive in the selection process, as outlined
in ??.
Generally, there are two classes of selection algorithms: those with replacement (annotated
with a subscript r in this book) and those without replacement (here annotated with a
subscript w) [2399]. In Figure 28.6, we present the transition from generation t to t + 1 for
both types of selection methods. In a selection algorithm without replacement, each individual
[Figure: the population Pop(t) with ps = 9 individuals; the top row shows selection with
replacement into a mating pool Mate(t) with mps = 5, where individuals can enter the
mating pool more than once, followed by duplication, mutation, and crossover; the bottom
row shows selection without replacement, where individuals enter the mating pool at most
once; a dashed path indicates that in some EAs (steady-state, (µ + λ)-ES, ...) the parents
compete with the offspring.]
Figure 28.6: Selection with and without replacement.
from the population pop can enter the mating pool mate at most once in each generation.
If a candidate solution has been selected, it cannot be selected a second time in the same
generation. This process is illustrated in the bottom row of Figure 28.6.
The mating pool returned by algorithms with replacement can contain the same individual
multiple times, as sketched in the top row of Figure 28.6. The dashed gray line in
this figure stands for the possibility to include the parent generation in the next selection
process. In other words, the parents selected into mate(t) may compete with their children
in pop(t + 1) and together take part in the next environmental selection step. This is the case
in (µ + λ)-Evolution Strategies and in steady-state Evolutionary Algorithms, for example.
In extinctive/generational EAs, such as, for example, (µ, λ)-Evolution Strategies, the parent
generation simply is discarded, as discussed in Section 28.1.4.1.
    mate = selection_w(pop, v, mps) ⇒ count(p, mate) = 1 ∀p ∈ mate            (28.16)
    mate = selection_r(pop, v, mps) ⇒ count(p, mate) ≥ 1 ∀p ∈ mate            (28.17)
The selection algorithms have a major impact on the performance of Evolutionary Algorithms.
Their behavior has thus been subject to several detailed studies, conducted by, for instance,
Blickle and Thiele [340], Chakraborty et al. [520], Goldberg and Deb [1079], Sastry and
Goldberg [2400], and Zhong et al. [3079], just to name a few.
Usually, fitness assignment processes are carried out before selection and the selection
algorithms base their decisions solely on the fitness v of the individuals. It is possible to
rely on the prevalence or dominance relation, i. e., to write selection(pop, cmp_f, mps) instead
of selection(pop, v, mps), or, in case of single-objective optimization, to use the objective
values directly (selection(pop, f, mps)) and thus to save the costs of the fitness assignment
process. However, this can lead to the same problems that occurred in the first approach of
Pareto-ranking fitness assignment (see Point 1 in Section 28.3.3 on page 275).
As outlined in Paragraph 28.1.2.6.2, Evolutionary Algorithms generally apply survival
selection, sometimes followed by a sexual selection procedure. In this section,
we will focus on the former one.
28.4.1.1 Visualization
In the following sections, we will discuss multiple selection algorithms. In order to make
them easier to understand, we will visualize the expected number of offspring ES(p) of an
individual, i. e., the number of times it will take part in reproduction. If we assume uniform
sexual selection (i. e., no sexual selection), this value will be proportional to the individual's
number of occurrences in the mating pool.
Example E28.18 (Individual Distribution after Selection).
Therefore, we will use the special case where we have a population pop of the constant size
ps(t) = ps(t + 1) = 1000. The individuals are numbered from 0 to 999, i. e.,
p_0..p_999. For the sake of simplicity, we assume that sexual selection from the mating pool
mate is uniform and that only a unary reproduction operation such as mutation is used.
The fitness of an individual strongly influences its chance for reproduction. We consider four
cases with fitness subject to minimization:

1. As sketched in Fig. 28.7.a, the individual p_i has fitness i, i. e., v1(p_0) = 0, v1(p_1) =
1, . . . , v1(p_999) = 999.

2. Individual p_i has fitness (i + 1)³, i. e., v2(p_0) = 1, v2(p_1) = 8, . . . , v2(p_999) =
1 000 000 000, as illustrated in Fig. 28.7.b.
[Figure: four plots of the fitness cases over the individual index i.]
Fig. 28.7.a: Case 1: v1(p_i) = i
Fig. 28.7.b: Case 2: v2(p_i) = (i + 1)³
Fig. 28.7.c: Case 3: v3(p_i) = 0.0001i if i < 999, 1 otherwise
Fig. 28.7.d: Case 4: v4(p_i) = 1 + v3(p_i)
Figure 28.7: The four example fitness cases.
28.4.2 Truncation Selection
Truncation selection²⁸, also called deterministic selection, threshold selection, or breeding
selection, returns the k ≤ mps best elements from the list pop. The name breeding selection
stems from the fact that farmers who breed animals or crops also only choose the best
individuals for reproduction [302]. The selected elements are copied as often as needed until the
mating pool size mps is reached. Most often, k equals the mating pool size mps, which usually
is significantly smaller than the population size ps.
Algorithm 28.7 realizes truncation selection with replacement (i. e., where individuals
may enter the mating pool mate more than once) and Algorithm 28.8 represents truncation
selection without replacement. Both algorithms are implemented in one single
class whose sources are provided in Listing 56.13 on page 754. They first sort the population
in ascending order according to the fitness v. For truncation selection without replacement,
k = mps and k < ps always holds, since this approach would otherwise make no sense: it
would select all individuals. Algorithm 28.8 simply returns the best mps individuals from
the population. Algorithm 28.7, on the other hand, iterates from 0 to mps − 1 after the
sorting and inserts only the elements with indices from 0 to k − 1 into the mating pool.
Algorithm 28.7: mate ←− truncationSelection_r(k, pop, v, mps)

Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: k: cutoff rank, usually k = mps
Data: i: counter variable
Output: mate: the survivors of the truncation which now form the mating pool

1 begin
2   mate ←− ()
3   k ←− min{k, ps}
4   pop ←− sortAsc(pop, v)
5   for i ←− 0 up to mps − 1 do
6     mate ←− addItem(mate, pop[i mod k])
7   return mate
Algorithm 28.8: mate ←− truncationSelection_w(mps, pop, v)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Output: mate: the survivors of the truncation which now form the mating pool
1 begin
2   pop ←− sortAsc(pop, v)
3   return deleteRange(pop, mps, ps − mps)
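Both variants can be sketched in a few lines of Python (an illustrative sketch following the pseudocode, not the book's Java class from Listing 56.13; fitness v is minimized):

```python
def truncation_selection_r(k, pop, v, mps):
    """Truncation selection with replacement (cf. Algorithm 28.7):
    sort ascending by fitness and cycle through the k best slots."""
    k = min(k, len(pop))
    ranked = sorted(pop, key=v)            # lower fitness value = better
    return [ranked[i % k] for i in range(mps)]

def truncation_selection_w(mps, pop, v):
    """Truncation selection without replacement (cf. Algorithm 28.8):
    simply keep the mps best individuals."""
    return sorted(pop, key=v)[:mps]
```

With k = mps both variants fill the mating pool with the same individuals; smaller k increases the selection pressure.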
Truncation selection is typically applied in (µ + λ) and (µ, λ) Evolution Strategies as well
as in steady-state EAs. In general Evolutionary Algorithms, it
28: http://en.wikipedia.org/wiki/Truncation_selection [accessed 2007-07-03]
290 28 EVOLUTIONARY ALGORITHMS
should be combined with a fitness assignment process that incorporates diversity information
in order to prevent premature convergence. Recently, Lässig and Hoffmann [1686] and Lässig
et al. [1687] have proved that threshold selection (a truncation selection method where a
cutoff fitness is defined after which no further individuals are selected) is the optimal
selection strategy for crossover, provided that the right cutoff value is used. In practical
applications, this value is normally not known.
Example E28.19 (Truncation Selection (Example E28.18 cntd)).
In Figure 28.8, we sketch the number of offspring for the individuals p ∈ pop from
Example E28.18 if truncation selection is applied. As stated, k is less than or equal to the
mating pool size mps. Under the assumption that no sexual selection takes place, ES(p)
only depends on k.

Figure 28.8: The number of expected offspring in truncation selection (curves for
k = 0.100·ps, 0.125·ps, 0.250·ps, 0.500·ps, and 1.000·ps).
Under truncation selection, the diagram will look exactly the same regardless of which of
the four example fitness configurations we use: only the order of the individuals matters,
not the numerical relation of their fitness values. If we set k = ps, each individual will
have one offspring on average. If k = mps/2, the top 50% of the individuals will have two
offspring each and the others none. For k = mps/10, only the best 100 of the 1000 candidate
solutions will reach the mating pool and reproduce 10 times on average.
28.4.3 Fitness Proportionate Selection
Fitness proportionate selection²⁹ was the selection method applied in the original Genetic
Algorithms as introduced by Holland [1250] and therefore is one of the oldest selection
schemes. In this Genetic Algorithm, fitness was subject to maximization. Under fitness
proportionate selection, as its name already implies, the probability P(select(p)) of an
individual p ∈ pop to enter the mating pool is proportional to its fitness v(p). This relation
in its original form is defined in Equation 28.18 below: the probability P(select(p)) that
individual p is picked in each selection decision corresponds to the fraction of its fitness in
the sum of the fitness of all individuals.

29: http://en.wikipedia.org/wiki/Fitness_proportionate_selection [accessed 2008-03-19]

28.4. SELECTION 291
P(select(p)) = v(p) / (Σ_{∀p′∈pop} v(p′))    (28.18)
There exists a variety of approaches which realize such probability distributions [1079], like
stochastic remainder selection [358, 409] and stochastic universal selection [188, 1133]. The
most commonly known method is the Monte Carlo roulette wheel selection by De Jong
[711], where we imagine the individuals of a population to be placed on a roulette³⁰ wheel
as sketched in Fig. 28.9.a. The size of the area on the wheel standing for a candidate solution
is proportional to its fitness. The wheel is spun, and the individual where it stops is placed
into the mating pool mate. This procedure is repeated until mps individuals have been
selected.
In the context of this book, fitness is subject to minimization. Here, higher fitness values
v(p) indicate unfit candidate solutions p.x whereas lower fitness denotes high utility. Fur-
thermore, the fitness values are normalized into the range [0, 1], because otherwise fitness
proportionate selection would handle the set of fitness values {0, 1, 2} in a different way than
{10, 11, 12}. Equation 28.22 defines the framework for such a (normalized) fitness propor-
tionate selection. It is realized in Algorithm 28.9 as a variant with and in Algorithm 28.10
without replacement.
minV = min {v(p) ∀p ∈ pop}    (28.19)
maxV = max {v(p) ∀p ∈ pop}    (28.20)
normV(p) = (maxV − v(p)) / (maxV − minV)    (28.21)
P(select(p)) ≈ normV(p) / (Σ_{∀p′∈pop} normV(p′))    (28.22)
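The effect of the normalization in Equations 28.19 to 28.22 can be checked directly. The following sketch (the function name is ours) shows that the fitness sets {0, 1, 2} and {10, 11, 12} now yield identical selection probabilities:

```python
def proportionate_probabilities(pop, v):
    """Selection probabilities for normalized fitness proportionate
    selection under minimization (Equations 28.19-28.22)."""
    fits = [v(p) for p in pop]
    min_v, max_v = min(fits), max(fits)
    if max_v == min_v:                     # all fitnesses equal: pick uniformly
        return [1.0 / len(pop)] * len(pop)
    norm = [(max_v - f) / (max_v - min_v) for f in fits]
    total = sum(norm)
    return [n / total for n in norm]
```

Note that the normalization assigns probability 0 to the worst individual, one of the side effects of this scheme.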
30: http://en.wikipedia.org/wiki/Roulette [accessed 2008-03-20]
Algorithm 28.9: mate ←− rouletteWheelSelection_r(pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Data: i: a counter variable
Data: a: a temporary store for a numerical value
Data: A: the array of fitness values
Data: min, max, sum: the minimum, maximum, and sum of the fitness values
Output: mate: the mating pool
1 begin
2   mate ←− (); A ←− createList(ps, 0)
3   min ←− ∞
4   max ←− −∞
    // initialize fitness array and find extreme fitnesses
5   for i ←− 0 up to ps − 1 do
6     a ←− v(pop[i])
7     A[i] ←− a
8     if a < min then min ←− a
9     if a > max then max ←− a
    // break situations where all individuals have the same fitness
10  if max = min then
11    max ←− max + 1
12    min ←− min − 1
    /* aggregate fitnesses: A = (normV(pop[0]), normV(pop[0]) + normV(pop[1]),
       normV(pop[0]) + normV(pop[1]) + normV(pop[2]), . . . ) */
13  sum ←− 0
14  for i ←− 0 up to ps − 1 do
15    sum ←− sum + (max − A[i]) / (max − min)
16    A[i] ←− sum
    /* spin the roulette wheel: the chance that a uniformly distributed
       random number lands in A[i] is (inversely) proportional to v(pop[i]) */
17  for i ←− 0 up to mps − 1 do
18    a ←− searchItemAS(randomUni[0, sum), A)
19    if a < 0 then a ←− −a − 1
20    mate ←− addItem(mate, pop[a])
21  return mate
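A compact Python sketch of this procedure (our own variable names; the standard `bisect` module plays the role of searchItemAS on the aggregated array):

```python
import bisect
import random

def roulette_wheel_selection_r(pop, v, mps, rng=random):
    """Roulette wheel selection with replacement, fitness v minimized."""
    fits = [v(p) for p in pop]
    lo, hi = min(fits), max(fits)
    if lo == hi:                         # all fitnesses equal: fake a spread
        lo, hi = lo - 1, hi + 1
    acc, total = [], 0.0                 # cumulative normalized fitnesses
    for f in fits:
        total += (hi - f) / (hi - lo)
        acc.append(total)
    mate = []
    for _ in range(mps):
        r = rng.uniform(0.0, total)      # spin the wheel
        i = min(bisect.bisect_right(acc, r), len(pop) - 1)
        mate.append(pop[i])
    return mate
```

A quick run with fitnesses 0..3 (minimization) shows that fitter individuals dominate the mating pool while the worst one gets a zero-width slot on the wheel.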
Algorithm 28.10: mate ←− rouletteWheelSelection_w(pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Data: i, j: counter variables
Data: a, b: temporary stores for numerical values
Data: A: the array of fitness values
Data: min, max, sum: the minimum, maximum, and sum of the fitness values
Output: mate: the mating pool
1 begin
2   mate ←− (); A ←− createList(ps, 0)
3   min ←− ∞
4   max ←− −∞
    // initialize fitness array and find extreme fitnesses
5   for i ←− 0 up to ps − 1 do
6     a ←− v(pop[i])
7     A[i] ←− a
8     if a < min then min ←− a
9     if a > max then max ←− a
    // break situations where all individuals have the same fitness
10  if max = min then
11    max ←− max + 1
12    min ←− min − 1
    /* aggregate fitnesses: A = (normV(pop[0]), normV(pop[0]) + normV(pop[1]),
       normV(pop[0]) + normV(pop[1]) + normV(pop[2]), . . . ) */
13  sum ←− 0
14  for i ←− 0 up to ps − 1 do
15    sum ←− sum + (max − A[i]) / (max − min)
16    A[i] ←− sum
    /* spin the roulette wheel: the chance that a uniformly distributed
       random number lands in A[i] is (inversely) proportional to v(pop[i]) */
17  for i ←− 0 up to min {mps, ps} − 1 do
18    a ←− searchItemAS(randomUni[0, sum), A)
19    if a < 0 then a ←− −a − 1
20    if a = 0 then b ←− 0
21    else b ←− A[a − 1]
22    b ←− A[a] − b
      // remove the element at index a from A
23    for j ←− a + 1 up to len(A) − 1 do
24      A[j] ←− A[j] − b
25    sum ←− sum − b
26    mate ←− addItem(mate, pop[a])
27    pop ←− deleteItem(pop, a)
28    A ←− deleteItem(A, a)
29  return mate
Example E28.20 (Roulette Wheel Selection).
In Figure 28.9, we illustrate the fitness distribution on the roulette wheel for fitness
maximization (Fig. 28.9.a) and normalized fitness minimization (Fig. 28.9.b). In this
example, we compute the expected proportion in which each of the four elements p_1..p_4
will occur in the mating pool mate if the fitness values are v(p_1) = 10, v(p_2) = 20,
v(p_3) = 30, and v(p_4) = 40.

Figure 28.9: Examples for the idea of roulette wheel selection. (Fig. 28.9.a, fitness
maximization: A(p_1) = 1/10 A, A(p_2) = 1/5 A, A(p_3) = 3/10 A, A(p_4) = 2/5 A.
Fig. 28.9.b, normalized fitness minimization: A(p_1) = 30/60 A = 1/2 A,
A(p_2) = 20/60 A = 1/3 A, A(p_3) = 10/60 A = 1/6 A, A(p_4) = 0/60 A = 0.)
Amongst others, Whitley [2929] points out that even ﬁtness normalization as performed here
cannot overcome the drawbacks of ﬁtness proportional selection methods. In order to clarify
these drawbacks, let us visualize the expected results of roulette wheel selection applied to
the special cases stated in Section 28.4.1.1.
Example E28.21 (Roulette Wheel Selection (Example E28.18 cntd)).
Figure 28.10 illustrates the expected number of occurrences ES(p_i) of an individual p_i if
roulette wheel selection is applied with replacement.
In such cases, usually the parents for each offspring are selected into the mating pool,
which then is as large as the population (mps = ps = 1000). Thus, we draw one thousand
times a single individual from the population pop. Each single choice is based on the
proportion of the individual's fitness in the total fitness of all individuals, as defined in
Equation 28.18 and Equation 28.22.
In scenario 1 with the fitness sum Σ_{i=0}^{999} i = (999 · 1000)/2 = 499500, the relation
ES(p_i) = mps · i/499500 holds for fitness maximization and ES(p_i) = mps · (999 − i)/499500
for minimization. As a result (sketched in Fig. 28.10.a), the fittest individuals produce (on
average) two offspring, whereas the worst candidate solutions will always vanish in this
example.
For scenario 2 with v_2(p_i.x) = (i + 1)³, the total fitness sum is approximately
2.505 · 10¹¹ and ES(p_i) = mps · (i + 1)³/(2.505 · 10¹¹) holds for maximization. The resulting
expected values depicted in Fig. 28.10.b are significantly different from those in Fig. 28.10.a.
This means that the design of the objective functions (or the fitness assignment process)
has a very strong influence on the convergence behavior of the Evolutionary Algorithm.
This selection method only works well if the fitness of an individual is indeed something
like a proportional measure for the probability that it will produce better offspring.
The drawback becomes even clearer when comparing the two fitness functions v_3 and
v_4: v_4 is basically the same as v_3, just shifted upwards by 1. From a balanced selection
Figure 28.10: The number of expected offspring in roulette wheel selection.
(Fig. 28.10.a: v_1(p_i.x) = i; Fig. 28.10.b: v_2(p_i.x) = (i + 1)³;
Fig. 28.10.c: v_3(p_i) = 0.0001i if i < 999, 1 otherwise; Fig. 28.10.d: v_4(p_i) = 1 + v_3(p_i).
Each panel sketches ES(p_i) for fitness maximization and, where applicable, minimization.)
method, we would expect that it produces similar mating pools for two functions which are
that similar. However, the result is entirely different. For v_3, the individual at index 999
is totally outstanding with its fitness of 1 (which is more than ten times the fitness of the
next-best individual). Hence, it receives many more copies in the mating pool. In the case
of v_4, however, it is only around twice as good as the next-best individual and thus only
receives around twice as many copies. From this situation, we can draw two conclusions:
1. The number of copies that an individual receives changes significantly with the vertical
shift of the fitness function, even if its other characteristics remain constant.
2. In cases like v_3, where one individual has outstanding fitness, this may degenerate the
Genetic Algorithm into a Hill Climber, since this individual will unite most of the slots
in the mating pool on itself. If the Genetic Algorithm behaves like a Hill Climber, its
benefits, such as the ability to trace more than one possible solution at a time, disappear.
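Both conclusions can be verified numerically. The sketch below (the helper name is ours) computes the expected copies under plain fitness proportionate selection for maximization, ES(p) = mps · v(p)/Σv, for v_3 and its shifted twin v_4:

```python
def expected_copies(fits, mps):
    """Expected copies per individual under fitness proportionate
    selection for maximization (cf. Equation 28.18)."""
    total = sum(fits)
    return [mps * f / total for f in fits]

ps = 1000
v3 = [0.0001 * i if i < 999 else 1.0 for i in range(ps)]
v4 = [1.0 + f for f in v3]               # the same function, shifted up by 1

e3 = expected_copies(v3, ps)
e4 = expected_copies(v4, ps)
```

Under v_3, the outstanding individual p_999 is expected to appear roughly 20 times in the mating pool (about ten times as often as p_998), while under v_4 it gets fewer than two copies, less than twice as many as p_998.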
Roulette wheel selection performs poorly compared to other schemes like tournament
selection or ranking selection [1079]. It is mainly included here for the sake of completeness
and because it is easy to understand and suitable for educational purposes.
28.4.4 Tournament Selection
Tournament selection³¹, proposed by Wetzel [2923] and studied by Brindle [409], is one
of the most popular and effective selection schemes. Its features are well-known and have
been analyzed by a variety of researchers such as Blickle and Thiele [339, 340], Lee et al.
[1707], Miller and Goldberg [1898], Sastry and Goldberg [2400], and Oei et al. [2063]. In
tournament selection, k elements are picked from the population pop and compared with
each other in a tournament. The winner of this competition will then enter the mating pool
mate. Although it is a simple selection strategy, it is very powerful and therefore used in
many practical applications [81, 96, 461, 1880, 2912, 2914].
Example E28.22 (Binary Tournament Selection).
As an example, consider tournament selection (with replacement) with a tournament size of
two [2931]. For each single tournament, the contestants are chosen randomly according to a
uniform distribution and the winners are allowed to enter the mating pool. If we assume
that the mating pool will contain about as many individuals as the population, each
individual will, on average, participate in two tournaments. The best candidate solution of
the population will win all the contests it takes part in and thus, again on average, contributes
approximately two copies to the mating pool. The median individual of the population is
better than 50% of its challengers but will also lose against 50%. Therefore, it will enter the
mating pool roughly one time on average. The worst individual in the population will lose
all its challenges to other candidate solutions and can only score if competing against
itself, which will happen with probability (1/mps)². It will not be able to reproduce in the
average case because mps · (1/mps)² = 1/mps < 1 ∀ mps > 1.
Example E28.23 (Tournament Selection (Example E28.18 cntd)).
For visualization purposes, let us go back to Example E28.18 with a population of 1000
individuals p_0..p_999 and mps = ps = 1000. Again, we assume that each individual has a
unique fitness value of v_1(p_i) = i, v_2(p_i) = (i + 1)³, v_3(p_i) = 0.0001i if i < 999, 1
otherwise, and v_4(p_i) = 1 + v_3(p_i), respectively. If we apply tournament selection with
replacement in
31: http://en.wikipedia.org/wiki/Tournament_selection [accessed 2007-07-03]
this special scenario, the expected number of occurrences ES(p_i) of an individual p_i in the
mating pool can be computed according to Blickle and Thiele [340] as

ES(p_i) = mps · [((1000 − i)/1000)^k − ((1000 − i − 1)/1000)^k]    (28.23)
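Equation 28.23 is easy to evaluate; the sketch below (the function name is ours) also checks that the expected copies over all ranks sum to mps, since the inner terms telescope:

```python
def tournament_expected_copies(i, ps, mps, k):
    """Expected number of copies of the individual at fitness rank i
    (0 = best) under tournament selection with replacement (Eq. 28.23)."""
    return mps * (((ps - i) / ps) ** k - ((ps - i - 1) / ps) ** k)

# for k = 2 the best of 1000 individuals expects about 2 copies
best = tournament_expected_copies(0, 1000, 1000, 2)
```

Only ranks enter the formula, so the absolute fitness values indeed play no role.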
The absolute values of the fitness play no role in tournament selection. The only thing that
matters is whether or not the fitness of one individual is higher than the fitness of another
one, not the fitness difference itself. The expected numbers of offspring are therefore the
same for all four example fitness configurations from Example E28.18. Tournament selection
thus gets rid of the problematic dependency on the relative proportions of the fitness values
in roulette-wheel style selection schemes.

Figure 28.11: The number of expected offspring in tournament selection (curves for
tournament sizes k ∈ {1, 2, 3, 4, 5, 10}).
Figure 28.11 depicts the expected offspring numbers for different tournament sizes k ∈
{1, 2, 3, 4, 5, 10}. If k = 1, tournament selection degenerates to randomly picking individuals
and each candidate solution will occur one time in the mating pool on average. With rising
k, the selection pressure increases: individuals with good fitness values create more and
more offspring whereas the chance of worse candidate solutions to reproduce decreases.
Tournament selection with replacement (TSR) is presented in Algorithm 28.11, which has
been implemented in Listing 56.14 on page 756. Tournament selection without replacement
(TSoR) [43, 1707] can be defined in two forms. In the first variant, specified as Algo-
rithm 28.12, an individual may enter the mating pool at most once. In Algorithm 28.13, on
the other hand, a candidate solution cannot compete against itself.
Algorithm 28.11: mate ←− tournamentSelection_r(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the ﬁtness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] k: the tournament size
Data: i, j, a, b: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1 begin
2 mate ←− ()
3 for i ←− 1 up to mps do
4 a ←− ⌊randomUni[0, ps)⌋
5 for j ←− 1 up to k −1 do
6 b ←− ⌊randomUni[0, ps)⌋
7 if v(pop[b]) < v(pop[a]) then a ←− b
8 mate ←− addItem(mate, pop[a])
9 return mate
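Algorithm 28.11 translates almost line by line into Python (an illustrative sketch; fitness v is minimized):

```python
import random

def tournament_selection_r(k, pop, v, mps, rng=random):
    """Deterministic tournament selection with replacement."""
    ps = len(pop)
    mate = []
    for _ in range(mps):
        a = rng.randrange(ps)              # first contestant
        for _ in range(k - 1):
            b = rng.randrange(ps)          # challenger
            if v(pop[b]) < v(pop[a]):      # lower fitness wins the duel
                a = b
        mate.append(pop[a])
    return mate
```

With k = 5 over 100 ranks, each winner is the best of five uniform picks, so the mean rank in the mating pool drops to roughly 100/6.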
Algorithm 28.12: mate ←− tournamentSelection_w(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] k: the tournament size
Data: i, j, a, b: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1 begin
2   mate ←− ()
3   pop ←− sortAsc(pop, v)
4   for i ←− 1 up to min {len(pop), mps} do
5     a ←− ⌊randomUni[0, len(pop))⌋
6     for j ←− 1 up to min {len(pop), k} − 1 do
7       b ←− ⌊randomUni[0, len(pop))⌋
8       if v(pop[b]) < v(pop[a]) then a ←− b
9     mate ←− addItem(mate, pop[a])
10    pop ←− deleteItem(pop, a)
11  return mate
Algorithm 28.13: mate ←− tournamentSelection_w(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] k: the tournament size
Data: A: the list of contestants per tournament
Data: a: the tournament winner
Data: i, j: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1 begin
2   mate ←− ()
3   pop ←− sortAsc(pop, v)
4   for i ←− 1 up to mps do
5     A ←− ()
6     for j ←− 1 up to min {k, len(pop)} do
7       repeat
8         a ←− ⌊randomUni[0, len(pop))⌋
9       until searchItemUS(a, A) < 0
10      A ←− addItem(A, a)
11    a ←− min A
12    mate ←− addItem(mate, pop[a])
13  return mate
The algorithms specified here should more precisely be entitled deterministic tournament
selection algorithms, since the winner of the k contestants that take part in each tournament
enters the mating pool. In the non-deterministic variants, this is not necessarily the case.
There, a probability η is defined. The best individual in the tournament is selected with
probability η, the second best with probability η(1 − η), the third best with probability
η(1 − η)², and so on. The i-th best individual in a tournament enters the mating pool with
probability η(1 − η)^i. Algorithm 28.14 realizes this behavior for a tournament selection with
replacement. Notice that it becomes equivalent to Algorithm 28.11 on the preceding page
if η is set to 1. Besides the algorithms discussed here, a set of additional tournament-based
selection methods has been introduced by Lee et al. [1707].
Algorithm 28.14: mate ←− tournamentSelectionND_r(η, k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the ﬁtness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] η: the selection probability, η ∈ [0, 1]
Input: [implicit] k: the tournament size
Data: A: the set of tournament contestants
Data: i, j: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1 begin
2 mate ←− ()
3 pop ←− sortAsc(pop, v)
4 for i ←− 1 up to mps do
5 A ←− ()
6 for j ←− 1 up to k do
7 A ←− addItem(A, ⌊randomUni[0, ps)⌋)
8 A ←− sortAsc(A, cmp(a_1, a_2) ≡ (a_1 − a_2))
9 for j ←− 0 up to len(A) −1 do
10 if (randomUni[0, 1) ≤ η) ∨ (j ≥ len(A) −1) then
11 mate ←− addItem(mate, pop[A[j]])
12 j ←− ∞
13 return mate
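A Python sketch of the non-deterministic variant (names are ours); with η = 1 it always picks the tournament winner and therefore behaves like the deterministic Algorithm 28.11:

```python
import random

def tournament_selection_nd_r(eta, k, pop, v, mps, rng=random):
    """Non-deterministic tournament selection with replacement: the j-th
    best contestant (j = 0, 1, ...) wins with probability eta*(1-eta)**j;
    the last contestant wins if nobody before did."""
    ps = len(pop)
    mate = []
    for _ in range(mps):
        A = sorted((rng.randrange(ps) for _ in range(k)),
                   key=lambda idx: v(pop[idx]))      # best contestant first
        for j, idx in enumerate(A):
            if j == len(A) - 1 or rng.random() < eta:
                mate.append(pop[idx])
                break
    return mate
```

Setting η close to 0 inverts the pressure: then almost always the worst contestant of each tournament survives.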
28.4.5 Linear Ranking Selection
Linear ranking selection [340], introduced by Baker [187] and more thoroughly discussed by
Blickle and Thiele [340], Whitley [2929], and Goldberg and Deb [1079], is another approach
for circumventing the problems of fitness proportionate selection methods. In ranking
selection [187, 1133, 2929], the probability of an individual to be selected is proportional to
its position (rank) in the sorted list of all individuals in the population. Using the rank
smoothes out larger differences of the fitness values and emphasizes small ones. Generally,
we can consider the conventional ranking selection method as the application of a fitness
assignment process setting the rank as fitness (which can be achieved with Pareto ranking,
for instance) and a subsequent fitness proportionate selection.
In its original definition discussed in [340], linear ranking selection was used to maximize
fitness. Here, we will discuss it again for fitness minimization. In linear ranking selection
with replacement, the probability P(select(p_i)) that the individual at rank i ∈ 0..ps − 1 is
picked in each selection decision is given in Equation 28.24 below. Notice that each individual
always receives a different rank and even if two individuals have the same fitness, they must
be assigned to different ranks [340].

P(select(p_i)) = (1/ps) · (η⁻ + (η⁺ − η⁻) · (ps − i − 1)/(ps − 1))    (28.24)
In Equation 28.24, the best individual (at rank 0) will have the probability η⁺/ps of being
picked while the worst individual (at rank ps − 1) is chosen with chance η⁻/ps:

P(select(p_0)) = (1/ps) · (η⁻ + (η⁺ − η⁻) · (ps − 0 − 1)/(ps − 1))
 = (1/ps) · (η⁻ + (η⁺ − η⁻)) = η⁺/ps    (28.25)
P(select(p_{ps−1})) = (1/ps) · (η⁻ + (η⁺ − η⁻) · (ps − (ps − 1) − 1)/(ps − 1))    (28.26)
 = (1/ps) · (η⁻ + (η⁺ − η⁻) · 0/(ps − 1)) = η⁻/ps    (28.27)
It should be noted that all probabilities must add up to one, i. e.:

Σ_{i=0}^{ps−1} P(select(p_i)) = 1    (28.28)
1 = Σ_{i=0}^{ps−1} (1/ps) · (η⁻ + (η⁺ − η⁻) · (ps − i − 1)/(ps − 1))    (28.29)
 = (1/ps) · Σ_{i=0}^{ps−1} (η⁻ + (η⁺ − η⁻) · (ps − i − 1)/(ps − 1))    (28.30)
 = (1/ps) · (ps · η⁻ + Σ_{i=0}^{ps−1} (η⁺ − η⁻) · (ps − i − 1)/(ps − 1))    (28.31)
 = (1/ps) · (ps · η⁻ + (η⁺ − η⁻) · Σ_{i=0}^{ps−1} (ps − i − 1)/(ps − 1))    (28.32)
 = (1/ps) · (ps · η⁻ + ((η⁺ − η⁻)/(ps − 1)) · Σ_{i=0}^{ps−1} (ps − i − 1))    (28.33)
 = (1/ps) · (ps · η⁻ + ((η⁺ − η⁻)/(ps − 1)) · (ps · (ps − 1) − Σ_{i=0}^{ps−1} i))    (28.34)
 = (1/ps) · (ps · η⁻ + ((η⁺ − η⁻)/(ps − 1)) · (ps · (ps − 1) − ½ · ps · (ps − 1)))    (28.35)
 = (1/ps) · (ps · η⁻ + ((η⁺ − η⁻)/(ps − 1)) · ½ · ps · (ps − 1))    (28.36)
 = (1/ps) · (ps · η⁻ + ½ · ps · (η⁺ − η⁻))    (28.37)
 = (1/ps) · (½ · ps · η⁻ + ½ · ps · η⁺)    (28.38)
1 = ½ · η⁻ + ½ · η⁺    (28.39)
1 − ½ · η⁻ = ½ · η⁺    (28.40)
2 − η⁻ = η⁺    (28.41)
The parameters η⁻ and η⁺ must therefore be chosen in a way that 2 − η⁻ = η⁺ holds and,
obviously, 0 < η⁻ and 0 < η⁺ must hold as well. If the selection algorithm should furthermore
make sense, i. e., prefer individuals with a lower fitness rank over those with a higher fitness
rank (again, fitness is subject to minimization), 0 ≤ η⁻ < η⁺ ≤ 2 is true.
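The constraint of Equation 28.41 can be baked in directly; this sketch (the function name is ours) computes the rank-based probabilities of Equation 28.24 and confirms the boundary cases of Equations 28.25 and 28.27:

```python
def linear_ranking_probabilities(ps, eta_minus):
    """Selection probabilities for ranks 0..ps-1 (rank 0 = best individual,
    fitness minimized); eta_plus is fixed by Equation 28.41."""
    eta_plus = 2.0 - eta_minus
    return [(eta_minus + (eta_plus - eta_minus) * (ps - i - 1) / (ps - 1)) / ps
            for i in range(ps)]
```

For ps = 10 and η⁻ = 0.5 the best rank gets probability 1.5/10, the worst 0.5/10, and the probabilities sum to one, as derived above.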
Koza [1602] determines the slope of the linear ranking selection probability function by
a factor r_m in his implementation of this selection scheme. According to [340], this factor
can be transformed to η⁻ and η⁺ by the following equations:

η⁻ = 2/(r_m + 1)    (28.42)
η⁺ = 2·r_m/(r_m + 1)    (28.43)

These settings are consistent with Equation 28.41:

2 − η⁻ = η⁺ ⇒ 2 − 2/(r_m + 1) = 2·r_m/(r_m + 1)    (28.44)
2·(r_m + 1) − 2 = 2·r_m    (28.45)
2·r_m = 2·r_m  q.e.d.    (28.46)
As mentioned, we can simply emulate this scheme by (1) first building a secondary fitness
function v′ based on the ranks of the individuals p_i according to the original fitness v
and Equation 28.24 and then (2) applying fitness proportionate selection as given in
Algorithm 28.9 on page 292. We therefore only need to set v′(p_i) = 1 − P(select(p_i))
(since Algorithm 28.9 on page 292 minimizes fitness).
28.4.6 Random Selection
Random selection is a selection method which simply fills the mating pool with randomly
picked individuals from the population. Obviously, using this selection scheme for sur-
vival/environmental selection, i. e., to determine which individuals are allowed to produce
offspring (see Section 28.1.2.6 on page 260), makes little sense. It would turn the Evolu-
tionary Algorithm into some sort of parallel random walk, since it means disregarding all
fitness information.
However, after an actual mating pool has been ﬁlled, it may make sense to use random
selection to choose the parents for the diﬀerent individuals, especially in cases where there
are many parents such as in Evolution Strategies which are introduced in Chapter 30 on
page 359.
Algorithm 28.15: mate ←− randomSelection_r(pop, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: mps: the number of individuals to be placed into the mating pool mate
Data: i: counter variable
Output: mate: the randomly picked parents
1 begin
2 mate ←− ()
3 for i ←− 1 up to mps do
4 mate ←− addItem(mate, pop[⌊randomUni[0, ps)⌋])
5 return mate
Algorithm 28.15 provides a speciﬁcation of random selection with replacement. An
example implementation of this algorithm is given in Listing 56.15 on page 758.
28.4.7 Clearing and Simple Convergence Prevention (SCP)
In our experiments, especially in Genetic Programming [2888] and problems with discrete
objective functions [2912, 2914], we often use a very simple mechanism to prevent premature
convergence. Premature convergence, as outlined in Chapter 13, means that the optimization
process gets stuck at a local optimum and loses the ability to investigate other areas in the
search space, in particular the area where the global optimum resides. Our SCP method
is neither a fitness nor a selection algorithm, but more of a filter which is placed after
computing the objectives and before applying a fitness assignment process. Since our
method actively deletes individuals from the population, we place it into this section.
The idea is simple: the more similar individuals exist in the population, the more likely
the optimization process has converged. It is not clear whether it converged to a global
optimum or to a local one. If it got stuck at a local optimum, the fraction of the population
which resides at that spot should be limited. Then, some individuals necessarily will be
located somewhere else and, if they survive, may find a path away from the local optimum
towards a different area in the search space. In case the optimizer indeed converged to
the global optimum, such a measure does not cause problems, because better points than
the global optimum cannot be discovered anyway.
28.4.7.1 Clearing
The first one to apply an explicit limitation of the number of individuals in an area of
the search space was Pétrowski [2167, 2168], whose clearing approach is applied in each
generation and works as specified in Algorithm 28.16, where fitness is subject to minimization.
Basically, clearing divides the population of an EA into several subpopulations according
Algorithm 28.16: v′ ←− clearingFilter(pop, σ, k, v)
Input: pop: the list of individuals to apply clearing to
Input: σ: the clearing radius
Input: k: the niche capacity
Input: [implicit] v: the fitness values
Input: [implicit] dist: a distance measure in the genome or phenome
Data: n: the current number of winners
Data: i, j: counter variables
Data: P: the population sorted by fitness
Output: v′: the new fitness assignment
1 begin
2   v′ ←− v // start from the original fitness values
3   P ←− sortAsc(pop, v)
4   for i ←− 0 up to len(P) − 1 do
5     if v′(P[i]) < ∞ then
6       n ←− 1
7       for j ←− i + 1 up to len(P) − 1 do
8         if (v′(P[j]) < ∞) ∧ (dist(P[i], P[j]) < σ) then
9           if n < k then n ←− n + 1
10          else v′(P[j]) ←− ∞
11  return v′
to a distance measure "dist" applied in the genotypic (G) or phenotypic space (X) in each
generation. The individuals of each subpopulation have at most the distance σ to the
fittest individual in this niche. Then, the fitness of all but the k best individuals in such a
subpopulation is set to the worst possible value. This effectively prevents a niche from
becoming too crowded. Sareni and Krähenbühl [2391] showed that this method is very
promising. Singh and Deb [2503] suggest a modified clearing approach which shifts
individuals that would be cleared farther away and reevaluates their fitness.
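A Python sketch of clearing (our variable names; here individuals are plain numbers, `dist` is their absolute difference, fitness v is minimized, and infinity marks a cleared individual):

```python
import math

def clearing_filter(pop, sigma, k, v, dist):
    """Petrowski-style clearing (cf. Algorithm 28.16): around each still
    uncleared niche winner, only the k best individuals within radius
    sigma keep their fitness; the rest are set to infinity."""
    vprime = {p: v(p) for p in pop}
    ranked = sorted(pop, key=v)
    for i, winner in enumerate(ranked):
        if vprime[winner] == math.inf:
            continue                        # already cleared, not a niche center
        n = 1
        for other in ranked[i + 1:]:
            if vprime[other] < math.inf and dist(winner, other) < sigma:
                if n < k:
                    n += 1                  # the niche still has capacity
                else:
                    vprime[other] = math.inf
    return vprime
```

With two niches around 0.0 and 5.0 and a capacity of k = 1, only the two niche winners keep a finite fitness.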
28.4.7.2 SCP
We modify this approach in two respects: We measure similarity not in form of a distance
in the search space G or the problem space X, but in the objective space Y ⊆ R^|f|. All
individuals are compared with each other. If two have exactly the same objective values³²,
one of them is thrown away with probability³³ c ∈ [0, 1] and does not take part in any further
comparisons. This way, we weed out similar individuals without making any assumptions
about G or X and make room in the population and mating pool for a wider diversity of
candidate solutions. For c = 0, this prevention mechanism is turned off; for c = 1, all
remaining individuals will have different objective values.
Although this approach is very simple, the results of our experiments were often
significantly better with this convergence prevention method turned on than without
it [2188, 2888, 2912, 2914]. Additionally, in none of our experiments were the outcomes
influenced negatively by this filter, which makes it even more robust than other methods for
convergence prevention like sharing or variety preserving. Algorithm 28.17 specifies how our
simple mechanism works. It has to be applied after the evaluation of the objective values of
the individuals in the population and before any fitness assignment or selection takes place.
Algorithm 28.17: P ←− scpFilter(pop, c, f)
Input: pop: the list of individuals to apply convergence prevention to
Input: c: the convergence prevention probability, c ∈ [0, 1]
Input: [implicit] f: the set of objective function(s)
Data: i, j: counter variables
Data: p: the individual checked in this iteration
Output: P: the pruned population
1 begin
2   P ←− ()
3   for i ←− 0 up to len(pop) − 1 do
4     p ←− pop[i]
5     for j ←− len(P) − 1 down to 0 do
6       if f(p.x) = f(P[j].x) then
7         if randomUni[0, 1) < c then
8           P ←− deleteItem(P, j)
9     P ←− addItem(P, p)
10  return P
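The filter can be sketched in Python as follows (an illustrative sketch; an individual is passed through the objective function f directly):

```python
import random

def scp_filter(pop, c, f, rng=random):
    """Simple Convergence Prevention (cf. Algorithm 28.17): whenever a new
    individual shares its objective values with an already kept one, the
    kept duplicate is dropped with probability c."""
    kept = []
    for p in pop:
        for j in range(len(kept) - 1, -1, -1):
            if f(p) == f(kept[j]) and rng.random() < c:
                kept.pop(j)
        kept.append(p)
    return kept
```

For c = 1 every group of identical objective values collapses to a single survivor; for c = 0 the population passes through unchanged.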
If an individual p occurs n times in the population or if there are n individuals with
exactly the same objective values, Algorithm 28.17 cuts down the expected number of their
occurrences ES(p) to

ES(p) = Σ_{i=1}^{n} (1 − c)^{i−1} = Σ_{i=0}^{n−1} (1 − c)^i
 = ((1 − c)^n − 1)/((1 − c) − 1) = (1 − (1 − c)^n)/c    (28.47)
In Figure 28.12, we sketch the expected number of remaining instances of the individual p
after this pruning process if it occurred n times in the population before Algorithm 28.17
was applied.
From Equation 28.47 follows that even a population of infinite size which has fully con-
verged to one single value will probably not contain more than 1/c copies of this individual
after the simple convergence prevention has been applied. This threshold is also visible in
Figure 28.12.

Figure 28.12: The expected numbers of occurrences for different values of n and c (curves
for c ∈ {0.1, 0.2, 0.3, 0.5, 0.7}).

32: The exactly-the-same criterion makes sense in combinatorial optimization and many Genetic Program-
ming problems but may easily be replaced with a limit imposed on the Euclidean distance in real-valued
optimization problems, for instance.
33: instead of defining a fixed threshold k
lim_{n→∞} ES(p) = lim_{n→∞} (1 − (1 − c)^n)/c = (1 − 0)/c = 1/c   (28.48)
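The closed form of Equation 28.47 and the 1/c bound of Equation 28.48 can be checked numerically; a quick sketch (names are ours):

```python
def expected_survivors(n, c):
    """ES(p) from Equation 28.47 for c > 0: the expected number of copies left
    of an individual that occurred n times before the filter was applied."""
    return (1 - (1 - c) ** n) / c

n, c = 25, 0.3
# the closed form equals the geometric sum of Equation 28.47 ...
assert abs(expected_survivors(n, c) - sum((1 - c) ** i for i in range(n))) < 1e-12
# ... and approaches the 1/c limit of Equation 28.48 as n grows
assert abs(expected_survivors(10 ** 6, c) - 1 / c) < 1e-9
```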
28.4.7.3 Discussion
In Pétrowski's clearing approach [2167], the maximum number of individuals which can survive in a niche was a fixed constant k and, if less than k individuals resided in a niche, none of them would be affected. Different from that, in SCP, an expected value of the number of individuals allowed in a niche is specified with the probability c and may be both exceeded and undercut.
Another difference between the approaches arises from the space in which the distance is computed: Whereas clearing prevents the EA from concentrating too much on a certain area in the search or problem space, SCP stops it from keeping too many individuals with equal utility. The former approach works against premature convergence to a certain solution structure while the latter forces the EA to "keep track" of a trail to candidate solutions with worse fitness which may later evolve to good individuals with traits different from the currently exploited ones. Furthermore, since our method does not need to access any information apart from the objective values, it is independent from the search and problem space and can, once implemented, be combined with any kind of population-based optimizer.
Which of the two approaches is better has not yet been tested with comparative experiments and is part of our future work. At present, we assume that in real-valued search or problem spaces, clearing should be more suitable, whereas we only know from experiments with our approach that SCP performs very well in combinatorial problems [2188, 2914] and Genetic Programming [2888].
28.5 Reproduction

An optimization algorithm uses the information gathered up to step t for creating the candidate solutions to be evaluated in step t + 1. There exist different methods to do so. In Evolutionary Algorithms, the aggregated information corresponds to the population pop and the archive archive of best individuals, if such an archive is maintained. The search operations searchOp ∈ Op used in the Evolutionary Algorithm family are called reproduction operations, inspired by the biological procreation mechanisms^34 of mother nature [2303]. There are four basic operations:

1. Creation simply creates a new genotype without any ancestors or heritage.
2. Duplication directly copies an existing genotype without any modification.
3. Mutation in Evolutionary Algorithms corresponds to small, random variations in the genotype of an individual, exactly like natural mutation^35.
4. Like in sexual reproduction, recombination^36 combines two parental genotypes to a new genotype including traits from both elders.

In the following, we will discuss these operations in detail and provide general definitions for them. When an Evolutionary Algorithm starts, no information about the search space has been gathered yet. Hence, unless seeding is applied (see Paragraph 28.1.2.2.2), we cannot use existing candidate solutions to derive new ones, and search operations with an arity higher than zero cannot be applied. Creation is thus used to fill the initial population pop(t = 1).
Definition D28.11 (Creation). The creation operation create() is used to produce a new genotype g ∈ G with a random configuration.

g = create() ⇒ g ∈ G   (28.49)

For creating the initial population of the size ps, we define the function createPop(ps) in Algorithm 28.18.
Algorithm 28.18: pop ←− createPop(ps)
Input: ps: the number of individuals in the new population
Input: [implicit] create: the creation operator
Data: i: a counter variable
Output: pop: the new population of randomly created individuals (len(pop) = ps)
1 begin
2   pop ←− ()
3   for i ←− 0 up to ps − 1 do
4     p ←− ∅
5     p.g ←− create()
6     pop ←− addItem(pop, p)
7   return pop
Duplication is just a placeholder for copying an element of the search space, i.e., it is what occurs when neither mutation nor recombination are applied. It is useful to increase the share of a given type of individual in a population.

34 http://en.wikipedia.org/wiki/Reproduction [accessed 2007-07-03]
35 http://en.wikipedia.org/wiki/Mutation [accessed 2007-07-03]
36 http://en.wikipedia.org/wiki/Sexual_reproduction [accessed 2008-03-17]
Definition D28.12 (Duplication). The duplication operation duplicate : G → G is used to create an exact copy of an existing genotype g ∈ G.

g = duplicate(g) ∀g ∈ G   (28.50)

In most EAs, a unary search operation, mutation, is provided. This operation takes one genotype as argument and modifies it. The way this modification is performed is application-dependent. It may happen in a randomized or in a deterministic fashion. Together with duplication, mutation corresponds to the asexual reproduction in nature.

Definition D28.13 (Mutation). The mutation operation mutate : G → G is used to create a new genotype g_n ∈ G by modifying an existing one.

g_n = mutate(g) : g ∈ G ⇒ g_n ∈ G   (28.51)
Like mutation, recombination is a search operation inspired by natural reproduction. Its role model is sexual reproduction. The goal of implementing a recombination operation is to provide the search process with some means to join beneficial features of different candidate solutions. This may happen in a deterministic or randomized fashion.
Notice that the term recombination is more general than crossover since it stands for arbitrary search operations that combine the traits of two individuals. Crossover, however, is only used if the elements of the search space G are linear representations. Then, it stands for exchanging parts of these so-called strings.

Definition D28.14 (Recombination). The recombination^37 operation recombine : G × G → G is used to create a new genotype g_n ∈ G by combining the features of two existing ones.

g_n = recombine(g_a, g_b) : g_a, g_b ∈ G ⇒ g_n ∈ G   (28.52)
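For a fixed-length bit-string search space G, the four operations might be sketched as follows; the single bit flip and the one-point crossover are merely illustrative choices, and all names are our own:

```python
import random

rng = random.Random(1)
N = 16  # genotype length: G is the set of bit strings of length N

def create():
    """Creation: a random genotype without ancestors."""
    return [rng.randint(0, 1) for _ in range(N)]

def duplicate(g):
    """Duplication: an exact, independent copy."""
    return list(g)

def mutate(g):
    """Mutation: a small random variation -- here a single bit flip."""
    gn = list(g)
    gn[rng.randrange(len(gn))] ^= 1
    return gn

def recombine(ga, gb):
    """Recombination: one-point crossover of two parental genotypes."""
    cut = rng.randrange(1, len(ga))
    return ga[:cut] + gb[cut:]
```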
It is not unusual to chain the application of search operators, i.e., to perform something like mutate(recombine(g_1, g_2)). The four operators are altogether used to reproduce whole populations of individuals.

Definition D28.15 (reproducePop). The population reproduction operation pop = reproducePop(mate, ps) is used to create a new population pop of ps individuals by applying the reproduction operations to the mating pool mate.

In Algorithm 28.19, we give a specification of how Evolutionary Algorithms usually apply the reproduction operations defined so far to create a new population of ps individuals. This procedure applies mutation and crossover at specific rates mr ∈ [0, 1] and cr ∈ [0, 1], respectively.

37 http://en.wikipedia.org/wiki/Recombination [accessed 2007-07-03]
Algorithm 28.19: pop ←− reproducePopEA(mate, ps, mr, cr)
Input: mate: the mating pool
Input: [implicit] mps: the mating pool size
Input: ps: the number of individuals in the new population
Input: mr: the mutation rate
Input: cr: the crossover rate
Input: [implicit] mutate, recombine: the mutation and crossover operators
Data: i, j: counter variables
Output: pop: the new population (len(pop) = ps)
1 begin
2   pop ←− ()
3   for i ←− 0 up to ps − 1 do
4     p ←− duplicate(mate[i mod mps])
5     if (randomUni[0, 1) < cr) ∧ (mps > 1) then
6       repeat
7         j ←− ⌊randomUni[0, mps)⌋
8       until j ≠ (i mod mps)
9       p.g ←− recombine(p.g, mate[j].g)
10    if randomUni[0, 1) < mr then p.g ←− mutate(p.g)
11    pop ←− addItem(pop, p)
12  return pop
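A Python transcription of Algorithm 28.19, sketched with the operators passed in as parameters (the function names are ours):

```python
import random

def reproduce_pop_ea(mate, ps, mr, cr, mutate, recombine,
                     rng=random.Random(7)):
    """Create ps offspring from the mating pool, applying crossover with
    rate cr and mutation with rate mr (cf. Algorithm 28.19)."""
    mps = len(mate)
    pop = []
    for i in range(ps):
        g = list(mate[i % mps])                 # duplication
        if rng.random() < cr and mps > 1:
            j = i % mps
            while j == i % mps:                 # draw a different partner
                j = rng.randrange(mps)
            g = recombine(g, mate[j])           # crossover
        if rng.random() < mr:
            g = mutate(g)                       # mutation
        pop.append(g)
    return pop
```

With mr = cr = 0 the procedure degenerates to cyclic duplication of the mating pool, which makes the control flow easy to verify.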
28.6 Maintaining a Set of Non-Dominated/Non-Prevailed Individuals

Most multi-objective optimization algorithms return a set X of best solutions x discovered instead of a single individual. Many optimization techniques also internally keep track of the set of best candidate solutions encountered during the search process. In Simulated Annealing, for instance, it is quite possible to discover an optimal element x⋆ and subsequently depart from it to a local optimum x⋆_l. Therefore, optimizers normally carry a list of the non-prevailed candidate solutions ever visited with them.

In scenarios where the search space G differs from the problem space X, it often makes more sense to store the list of individual records P instead of just keeping a set X of the best phenotypes x. Since the elements of the search space are no longer required at the end of the optimization process, we define the simple operation "extractPhenotypes" which extracts them from a set of individuals P.

∀x ∈ extractPhenotypes(P) ⇒ ∃p ∈ P : x = p.x   (28.53)
28.6.1 Updating the Optimal Set

Whenever a new individual p is created, the set P holding the best individuals known so far may change. It is possible that the new candidate solution must be included in the optimal set, or that it even prevails some of the phenotypes already contained therein, which must then be removed.
Definition D28.16 (updateOptimalSet). The function "updateOptimalSet" updates a set of elements P_old with the new individual p_new. It uses knowledge of the prevalence relation ≺ and the corresponding comparator function cmp_f.

P_new = updateOptimalSet(P_old, p_new, cmp_f), P_old, P_new ⊆ G × X, p_new ∈ G × X :
∀p_1 ∈ P_old ∄p_2 ∈ P_old : p_2.x ≺ p_1.x ⇒
P_new ⊆ P_old ∪ {p_new} ∧ ∀p_1 ∈ P_new ∄p_2 ∈ P_new : p_2.x ≺ p_1.x   (28.54)
We define two equivalent approaches in Algorithm 28.20 and Algorithm 28.21 which perform the necessary operations. Algorithm 28.20 creates a new, empty optimal set and successively inserts non-prevailed elements, whereas Algorithm 28.21 removes all elements which are prevailed by the new individual p_new from the old set P_old.
Algorithm 28.20: P_new ←− updateOptimalSet(P_old, p_new, cmp_f)
Input: P_old: the set of non-prevailed individuals as known before the creation of p_new
Input: p_new: a new individual to be checked
Input: cmp_f: the comparator function representing the dominance/prevalence relation
Output: P_new: the set updated with the knowledge of p_new
1 begin
2   P_new ←− ∅
3   foreach p_old ∈ P_old do
4     if p_new.x ⊀ p_old.x then
5       P_new ←− P_new ∪ {p_old}
6       if p_old.x ≺ p_new.x then return P_old
7   return P_new ∪ {p_new}
Algorithm 28.21: P_new ←− updateOptimalSet(P_old, p_new, cmp_f) (2nd version)
Input: P_old: the set of non-prevailed individuals as known before the creation of p_new
Input: p_new: a new individual to be checked
Input: cmp_f: the comparator function representing the dominance/prevalence relation
Output: P_new: the set updated with the knowledge of p_new
1 begin
2   P_new ←− P_old
3   foreach p_old ∈ P_old do
4     if p_new.x ≺ p_old.x then
5       P_new ←− P_new \ {p_old}
6     else if p_old.x ≺ p_new.x then
7       return P_old
8   return P_new ∪ {p_new}
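In Python, the second variant might look like this; a sketch under the assumption that a predicate `prevails(a, b)` returns True iff a ≺ b (the Pareto-dominance example below is one possible prevalence relation):

```python
def update_optimal_set(P_old, p_new, prevails):
    """Update an archive of non-prevailed solutions with one newcomer
    (cf. Algorithm 28.21)."""
    P_new = []
    for p_old in P_old:
        if prevails(p_old, p_new):
            return list(P_old)      # newcomer is prevailed: archive unchanged
        if not prevails(p_new, p_old):
            P_new.append(p_old)     # keep only survivors the newcomer does not prevail
    return P_new + [p_new]

def dominates(a, b):
    """Pareto dominance for minimization of objective tuples."""
    return a != b and all(x <= y for x, y in zip(a, b))

# build a Pareto front incrementally
archive = []
for point in [(2, 2), (1, 3), (1, 1)]:
    archive = update_optimal_set(archive, point, dominates)
print(archive)  # → [(1, 1)]
```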
Especially in the case of Evolutionary Algorithms, not just a single new element is created in each generation but a whole set P. Let us define the operation "updateOptimalSetN" for this purpose. This operation can easily be realized by iteratively applying "updateOptimalSet", as shown in Algorithm 28.22.

Algorithm 28.22: P_new ←− updateOptimalSetN(P_old, P, cmp_f)
Input: P_old: the old set of non-prevailed individuals
Input: P: the set of new individuals to be checked for non-prevailedness
Input: cmp_f: the comparator function representing the dominance/prevalence relation
Data: p: an individual from P
Output: P_new: the updated set
1 begin
2   P_new ←− P_old
3   foreach p ∈ P do P_new ←− updateOptimalSet(P_new, p, cmp_f)
4   return P_new

28.6.2 Obtaining Non-Prevailed Elements

The function "updateOptimalSet" helps an optimizer to build and maintain a list of optimal individuals. When the optimization process finishes, "extractPhenotypes" can then be used to obtain the non-prevailed elements of the problem space and return them to the user. However, not all optimization methods maintain a non-prevailed set all the time. When they terminate, they have to extract all optimal elements from the set of individuals pop currently known.
Definition D28.17 (extractBest). The function "extractBest" extracts a set P of non-prevailed (non-dominated) individuals from any given set of individuals pop according to the comparator cmp_f.

∀P ⊆ pop ⊆ G × X : P = extractBest(pop, cmp_f) ⇒ ∀p_1 ∈ P ∄p_2 ∈ pop : p_2.x ≺ p_1.x   (28.55)

Algorithm 28.23 defines one possible realization of "extractBest". By the way, this approach could also be used for updating an optimal set:

updateOptimalSet(P_old, p_new, cmp_f) ≡ extractBest(P_old ∪ {p_new}, cmp_f)   (28.56)
Algorithm 28.23: P ←− extractBest(pop, cmp_f)
Input: pop: the list to extract the non-prevailed individuals from
Data: p_any, p_chk: candidate solutions tested for supremacy
Data: i, j: counter variables
Output: P: the non-prevailed subset extracted from pop
1 begin
2   P ←− pop
3   for i ←− len(P) − 1 down to 1 do
4     for j ←− i − 1 down to 0 do
5       if P[i] ≺ P[j] then
6         P ←− deleteItem(P, j)
7         i ←− i − 1
8       else if P[j] ≺ P[i] then
9         P ←− deleteItem(P, i)
10  return listToSet(P)
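A compact Python equivalent of Algorithm 28.23 (a filter formulation rather than the index-juggling double loop; the predicate name is our own):

```python
def extract_best(pop, prevails):
    """Return the non-prevailed subset of pop: an element survives iff no
    other element prevails it (cf. Equation 28.55)."""
    return [p for p in pop if not any(prevails(q, p) for q in pop)]

def dominates(a, b):
    """Example prevalence relation: Pareto dominance for minimization."""
    return a != b and all(x <= y for x, y in zip(a, b))

front = extract_best([(1, 1), (2, 2), (0, 3), (3, 0)], dominates)
print(front)  # → [(1, 1), (0, 3), (3, 0)]
```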
28.6.3 Pruning the Optimal Set

In some optimization problems, there may be very many, if not infinitely many, optimal individuals. The set X of the best discovered candidate solutions computed by the optimization algorithms, however, cannot grow infinitely because we only have limited memory. The same holds for the archives archive of non-dominated individuals maintained by several MOEAs. Therefore, we need to perform an action called pruning which reduces the size of the set of non-prevailed individuals to a given limit k [605, 1959, 2645].

There exists a variety of possible pruning operations. Morse [1959] and Taboada and Coit [2644], for instance, suggest to use clustering algorithms^38 for this purpose. In principle, any combination of the fitness assignment and selection schemes discussed in Chapter 28 would also do. It is very important that the loss of generality during a pruning operation is minimized. Fieldsend et al. [912], for instance, point out that if the extreme elements of the optimal frontier are lost, the resulting set may only represent a small fraction of the optima that could have been found without pruning. Instead of working on the set X of best discovered candidate solutions x, we again base our approach on a set of individuals P and we define:
Definition D28.18 (pruneOptimalSet). The pruning operation "pruneOptimalSet" reduces the size of a set P_old of individuals to fit to a given upper boundary k.

∀P_new ⊆ P_old ⊆ G × X, k ∈ N_1 : P_new = pruneOptimalSet(P_old, k) ⇒ |P_new| ≤ k   (28.57)
28.6.3.1 Pruning via Clustering

Algorithm 28.24 uses clustering to provide the functionality specified in this definition and thereby realizes the idea of Morse [1959] and Taboada and Coit [2644]. Basically, any given clustering algorithm could be used as replacement for "cluster" – see ?? on page ?? for more information on clustering.

Algorithm 28.24: P_new ←− pruneOptimalSet_c(P_old, k)
Input: P_old: the set to be pruned
Input: k: the maximum size allowed for the set (k > 0)
Input: [implicit] cluster: the clustering algorithm to be used
Input: [implicit] nucleus: the function used to determine the nuclei of the clusters
Data: B: the set of clusters obtained by the clustering algorithm
Data: b: a single cluster b ∈ B
Output: P_new: the pruned set
1 begin
2   B ←− cluster(P_old)   // obtain k clusters
3   P_new ←− ∅
4   foreach b ∈ B do P_new ←− P_new ∪ nucleus(b)
5   return P_new

38 You can find a discussion of clustering algorithms in ??.
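A sketch of Algorithm 28.24 in Python; the slice-based `cluster` and median `nucleus` below are only naive stand-ins for a real clustering algorithm, and all names are our own:

```python
def prune_optimal_set_c(P_old, k, cluster, nucleus):
    """Prune a non-prevailed set to at most k representatives by keeping
    one nucleus per cluster (cf. Algorithm 28.24)."""
    if len(P_old) <= k:
        return list(P_old)
    return [nucleus(b) for b in cluster(list(P_old), k)]

def slice_cluster(P, k):
    """Stand-in clustering: sort (lexicographically, i.e., primarily by the
    first objective) and cut into at most k contiguous slices."""
    P = sorted(P)
    step = -(-len(P) // k)  # ceiling division
    return [P[i:i + step] for i in range(0, len(P), step)]

def median_nucleus(b):
    """Stand-in nucleus: the median element of a cluster."""
    return b[len(b) // 2]
```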
28.6.3.2 Adaptive Grid Archiving

Let us discuss the adaptive grid archiving algorithm (AGA) as an example for a more sophisticated approach to prune the set of non-dominated/non-prevailed individuals. AGA has been introduced for the Evolutionary Algorithm PAES^39 by Knowles and Corne [1552] and uses the objective values (computed by the set of objective functions f) directly. Hence, it can treat the individuals as |f|-dimensional vectors where each dimension corresponds to one objective function f ∈ f. This |f|-dimensional objective space Y is divided into a grid with d divisions in each dimension. Its span in each dimension is defined by the corresponding minimum and maximum objective values. The individuals with the minimum/maximum values are always preserved. This circumvents the phenomenon of narrowing down the optimal set described by Fieldsend et al. [912] and distinguishes the AGA approach from clustering-based methods. Hence, it is not possible to define maximum set sizes k which are smaller than 2|f|. If individuals need to be removed from the set because it became too large, the AGA approach removes those that reside in the regions which are the most crowded.

The original sources outline the algorithm basically with descriptions and definitions. Here, we introduce a more or less trivial specification in Algorithm 28.25 on the next page and Algorithm 28.26 on page 314.

The function "divideAGA" is internally used to perform the grid division. It transforms the objective values of each individual to grid coordinates stored in the array lst. Furthermore, "divideAGA" also counts the number of individuals that reside in the same coordinates for each individual and makes this number available in cnt. It ensures the preservation of border individuals by assigning a negative cnt value to them. This basically disables their later disposal by the pruning algorithm "pruneOptimalSetAGA", since "pruneOptimalSetAGA" deletes the individuals from the set P_old that have the largest cnt values first.

39 PAES is discussed in ?? on page ??.
Algorithm 28.25: (P_l, lst, cnt) ←− divideAGA(P_old, d)
Input: P_old: the set to be pruned
Input: d: the number of divisions to be performed per dimension
Input: [implicit] f: the set of objective functions
Data: i, j: counter variables
Data: mini, maxi, mul: temporary stores
Output: (P_l, lst, cnt): a tuple containing the list representation P_l of P_old, a list lst assigning grid coordinates to the elements of P_l, and a list cnt containing the number of elements in those grid locations
1 begin
2   mini ←− createList(|f|, ∞)
3   maxi ←− createList(|f|, −∞)
4   for i ←− |f| down to 1 do
5     mini[i − 1] ←− min{f_i(p.x) ∀p ∈ P_old}
6     maxi[i − 1] ←− max{f_i(p.x) ∀p ∈ P_old}
7   mul ←− createList(|f|, 0)
8   for i ←− |f| − 1 down to 0 do
9     if maxi[i] ≠ mini[i] then
10      mul[i] ←− d / (maxi[i] − mini[i])
11    else
12      maxi[i] ←− maxi[i] + 1
13      mini[i] ←− mini[i] − 1
14  P_l ←− setToList(P_old)
15  lst ←− createList(len(P_l), ∅)
16  cnt ←− createList(len(P_l), 0)
17  for i ←− len(P_l) − 1 down to 0 do
18    lst[i] ←− createList(|f|, 0)
19    for j ←− 1 up to |f| do
20      if (f_j(P_l[i].x) ≤ mini[j − 1]) ∨ (f_j(P_l[i].x) ≥ maxi[j − 1]) then
21        cnt[i] ←− cnt[i] − 2
22      lst[i][j − 1] ←− ⌊(f_j(P_l[i].x) − mini[j − 1]) ∗ mul[j − 1]⌋
23    if cnt[i] > 0 then
24      for j ←− i + 1 up to len(P_l) − 1 do
25        if lst[i] = lst[j] then
26          cnt[i] ←− cnt[i] + 1
27          if cnt[j] > 0 then cnt[j] ←− cnt[j] + 1
28  return (P_l, lst, cnt)
Algorithm 28.26: P_new ←− pruneOptimalSetAGA(P_old, d, k)
Input: P_old: the set to be pruned
Input: d: the number of divisions to be performed per dimension
Input: k: the maximum size allowed for the set (k ≥ 2|f|)
Input: [implicit] f: the set of objective functions
Data: i, idx: counter variables
Data: P_l: the list representation of P_old
Data: lst: a list assigning grid coordinates to the elements of P_l
Data: cnt: the number of elements in the grid locations defined in lst
Output: P_new: the pruned set
1 begin
2   if len(P_old) ≤ k then return P_old
3   (P_l, lst, cnt) ←− divideAGA(P_old, d)
4   while len(P_l) > k do
5     idx ←− 0
6     for i ←− len(P_l) − 1 down to 1 do
7       if cnt[i] > cnt[idx] then idx ←− i
8     for i ←− len(P_l) − 1 down to 0 do
9       if (lst[i] = lst[idx]) ∧ (cnt[i] > 0) then cnt[i] ←− cnt[i] − 1
10    P_l ←− deleteItem(P_l, idx)
11    cnt ←− deleteItem(cnt, idx)
12    lst ←− deleteItem(lst, idx)
13  return listToSet(P_l)
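The two algorithms can be condensed into one short Python sketch. This is our own compact restatement, not a line-by-line transcription: boundary individuals holding an objective-wise minimum or maximum are protected, and removal always targets the most crowded grid cell:

```python
def prune_aga(P_old, d, k, fs):
    """Adaptive grid archiving pruning (cf. Algorithms 28.25 and 28.26):
    drop individuals from the most crowded of the grid cells until at most
    k remain, never deleting boundary individuals."""
    P = list(P_old)
    if len(P) <= k:
        return P
    lo = [min(f(p) for p in P) for f in fs]
    hi = [max(f(p) for p in P) for f in fs]

    def cell(p):  # grid coordinates of p, clamped to 0 .. d-1
        return tuple(0 if hi[i] == lo[i] else
                     min(d - 1, int(d * (fs[i](p) - lo[i]) / (hi[i] - lo[i])))
                     for i in range(len(fs)))

    def protected(p):  # boundary individuals are never removed
        return any(fs[i](p) in (lo[i], hi[i]) for i in range(len(fs)))

    while len(P) > k:
        cells = {}
        for p in P:
            cells.setdefault(cell(p), []).append(p)
        # remove one removable member of the most crowded cell
        for _, members in sorted(cells.items(), key=lambda kv: -len(kv[1])):
            victims = [p for p in members if not protected(p)]
            if victims:
                P.remove(victims[0])
                break
        else:
            break  # only protected individuals left
    return P
```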
28.7 General Information on Evolutionary Algorithms
28.7.1 Applications and Examples
Table 28.4: Applications and Examples of Evolutionary Algorithms.
Area References
Art [1174, 2320, 2445]; see also Tables 29.1 and 31.1
Astronomy [480]
Chemistry [587, 678, 2676]; see also Tables 29.1, 30.1, 31.1, and 32.1
Combinatorial Problems [308, 354, 360, 371, 400, 444, 445, 451, 457, 491, 564, 567,
619, 644, 649, 690, 802, 807, 808, 810, 812, 820, 1011–1013,
1117, 1145, 1148, 1149, 1177, 1290, 1311, 1417, 1429, 1437,
1486, 1532, 1618, 1653, 1696, 1730, 1803, 1822–1825, 1830,
1835, 1859, 1871, 1884, 1961, 2025, 2055, 2105, 2144, 2162,
2188, 2189, 2245, 2335, 2448, 2499, 2535, 2633, 2663, 2669,
2772, 2811, 2862, 3027, 3064, 3071, 3101]; see also Tables
29.1, 30.1, 31.1, 33.1, 34.1, and 35.1
Computer Graphics [1452, 1577, 1661, 2096, 2487, 2488, 2651]; see also Tables
29.1, 30.1, 31.1, and 33.1
Control [1710]; see also Tables 29.1, 31.1, 32.1, and 35.1
Cooperation and Teamwork [1996, 2424]; see also Tables 29.1 and 31.1
Data Compression see Table 31.1
Data Mining [144, 360, 480, 1048, 1746, 2213, 2503, 2644, 2645, 2676,
3018]; see also Tables 29.1, 31.1, 32.1, 34.1, and 35.1
Databases [371, 567, 1145, 1696, 1822, 1824, 1825, 2633, 3064, 3071];
see also Tables 29.1, 31.1, and 35.1
Decision Making see Table 31.1
Digital Technology [1479]; see also Tables 29.1, 30.1, 31.1, 34.1, and 35.1
Distributed Algorithms and Systems [89, 112, 125, 156, 202, 328, 352–354, 404, 439, 545, 605,
663, 669, 702, 705, 786, 811, 843, 871, 882, 963, 1006, 1007,
1094, 1225, 1292, 1298, 1532, 1574, 1633, 1718, 1733, 1749,
1800, 1835, 1838, 1863, 1883, 1995, 2020, 2058, 2071, 2100,
2387, 2425, 2440, 2448, 2470, 2493, 2628, 2644, 2645, 2663,
2694, 2791, 2862, 2864, 2893, 2903, 3027, 3061, 3085]; see
also Tables 29.1, 30.1, 31.1, 32.1, 33.1, 34.1, and 35.1
E-Learning [1158, 1298, 2628, 2763]; see also Table 29.1
Economics and Finances [549, 585, 1718, 1825, 2622]; see also Tables 29.1, 31.1, 33.1,
and 35.1
Engineering [112, 156, 267, 517, 535, 567, 579, 605, 644, 669, 677, 777,
786, 787, 853, 871, 882, 953, 1094, 1225, 1310, 1323, 1479,
1486, 1508, 1574, 1710, 1717, 1733, 1746, 1749, 1830, 1835,
1838, 1883, 2058, 2059, 2083, 2111, 2144, 2189, 2320, 2364,
2425, 2448, 2468, 2499, 2535, 2644, 2645, 2663, 2677, 2693,
2694, 2743, 2799, 2858, 2861, 2862, 2864]; see also Tables
29.1, 30.1, 31.1, 32.1, 33.1, 34.1, and 35.1
Function Optimization [742, 795, 853, 855, 1018, 1523, 1727, 1927, 1986, 2099, 2210,
2799]; see also Tables 29.1, 30.1, 32.1, 33.1, 34.1, and 35.1
Games [2350]; see also Tables 29.1, 31.1, and 32.1
Graph Theory [39, 63, 67, 89, 112, 202, 328, 329, 353, 354, 439, 644, 669,
786, 804, 843, 882, 963, 1007, 1094, 1225, 1290, 1486, 1532,
1574, 1733, 1749, 1800, 1835, 1863, 1883, 1934, 1995, 2020,
2025, 2058, 2100, 2144, 2189, 2237, 2387, 2425, 2440, 2448,
2470, 2493, 2663, 2677, 2694, 2772, 2791, 2864, 3027, 3085];
see also Tables 29.1, 30.1, 31.1, 33.1, 34.1, and 35.1
Healthcare [567, 2021, 2537]; see also Tables 29.1, 31.1, and 35.1
Image Synthesis [291, 777, 2445, 2651]
Logistics [354, 444, 445, 451, 517, 579, 619, 802, 808, 810, 812, 820,
1011, 1013, 1148, 1149, 1177, 1429, 1653, 1730, 1803, 1823,
1830, 1859, 1869, 1884, 1961, 2059, 2111, 2162, 2188, 2189,
2245, 2669, 2811, 2912]; see also Tables 29.1, 30.1, 33.1, and
34.1
Mathematics [376, 377, 550, 1217, 1661, 2350, 2407, 2677]; see also Tables
29.1, 30.1, 31.1, 32.1, 33.1, 34.1, and 35.1
Military and Defense [439, 1661, 1872]; see also Table 34.1
Motor Control [489]; see also Tables 33.1 and 34.1
MultiAgent Systems [967, 1661, 2144, 2268, 2693, 2766]; see also Tables 29.1, 31.1,
and 35.1
Multiplayer Games see Table 31.1
Physics [407, 678]; see also Tables 29.1, 31.1, 32.1, and 34.1
Prediction [144, 1045]; see also Tables 29.1, 30.1, 31.1, and 35.1
Security [404, 540, 871, 1486, 1507, 1508, 1661, 1749]; see also Tables
29.1, 31.1, and 35.1
Software [703, 842, 1486, 1508, 1791, 1934, 1995, 2213, 2677, 2863,
2887]; see also Tables 29.1, 30.1, 31.1, 33.1, and 34.1
Sorting [1233]; see also Table 31.1
Telecommunication [820]
Testing [2863]; see also Table 31.3
Theorem Proving and Automatic Verification see Table 29.1
Water Supply and Management [1268, 1986]
Wireless Communication [39, 63, 67, 68, 155, 644, 787, 1290, 1486, 1717, 1934, 2144,
2189, 2237, 2364, 2677]; see also Tables 29.1, 31.1, 33.1, 34.1,
and 35.1
28.7.2 Books
1. Handbook of Evolutionary Computation [171]
2. Variants of Evolutionary Algorithms for Real-World Applications [567]
3. New Ideas in Optimization [638]
4. Multiobjective Problem Solving from Nature – From Concepts to Applications [1554]
5. Natural Intelligence for Scheduling, Planning and Packing Problems [564]
6. Advances in Evolutionary Computing – Theory and Applications [1049]
7. Bio-inspired Algorithms for the Vehicle Routing Problem [2162]
8. Evolutionary Computation 1: Basic Algorithms and Operators [174]
9. Nature-Inspired Algorithms for Optimisation [562]
10. Telecommunications Optimization: Heuristic and Adaptive Techniques [640]
11. Hybrid Evolutionary Algorithms [1141]
12. Success in Evolutionary Computation [3014]
13. Computational Intelligence in Telecommunications Networks [2144]
14. Evolutionary Design by Computers [272]
15. Evolutionary Computation in Economics and Finance [549]
16. Parameter Setting in Evolutionary Algorithms [1753]
17. Evolutionary Computation in Dynamic and Uncertain Environments [3019]
18. Applications of Multi-Objective Evolutionary Algorithms [597]
19. Introduction to Evolutionary Computing [865]
20. Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery: Implications in Business, Science and Engineering [563]
21. Evolutionary Computation [847]
22. Evolutionary Computation [863]
23. Evolutionary Multiobjective Optimization – Theoretical Advances and Applications [8]
24. Evolutionary Computation 2: Advanced Algorithms and Operators [175]
25. Evolutionary Algorithms for Solving Multi-Objective Problems [599]
26. Genetic and Evolutionary Computation for Image Processing and Analysis [466]
27. Evolutionary Computation: The Fossil Record [939]
28. Swarm Intelligence: Collective, Adaptive [1524]
29. Parallel Evolutionary Computations [2015]
30. Evolutionary Computation in Practice [3043]
31. Design by Evolution: Advances in Evolutionary Design [1234]
32. New Learning Paradigms in Soft Computing [1430]
33. Designing Evolutionary Algorithms for Dynamic Environments [1956]
34. How to Solve It: Modern Heuristics [1887]
35. Electromagnetic Optimization by Genetic Algorithms [2250]
36. Advances in Metaheuristics for Hard Optimization [2485]
37. Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems [2685]
38. The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music [2320]
39. Evolutionary Optimization [2394]
40. Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design [460]
41. Multi-Objective Optimization in Computational Intelligence: Theory and Practice [438]
42. Computer Simulation in Genetics [659]
43. Evolutionary Computation: A Unified Approach [715]
44. Genetic Algorithms in Search, Optimization, and Machine Learning [1075]
45. Evolutionary Computation in Data Mining [1048]
46. Introduction to Stochastic Search and Optimization [2561]
47. Advances in Evolutionary Algorithms [1581]
48. Self-Adaptive Heuristics for Evolutionary Computation [1617]
49. Representations for Genetic and Evolutionary Algorithms [2338]
50. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms [167]
51. Search Methodologies – Introductory Tutorials in Optimization and Decision Support Techniques [453]
52. Data Mining and Knowledge Discovery with Evolutionary Algorithms [994]
53. Evolutionäre Algorithmen [2879]
54. Advances in Evolutionary Algorithms – Theory, Design and Practice [41]
55. Evolutionary Algorithms in Molecular Design [587]
56. Multi-Objective Optimization Using Evolutionary Algorithms [740]
57. Evolutionary Dynamics in Time-Dependent Environments [2941]
58. Evolutionary Computation: Theory and Applications [3025]
59. Evolutionary Computation [2987]
28.7.3 Conferences and Workshops
Table 28.5: Conferences and Workshops on Evolutionary Algorithms.
GECCO: Genetic and Evolutionary Computation Conference
History: 2011/07: Dublin, Ireland, see [841]
2010/07: Portland, OR, USA, see [30, 31]
2009/07: Montréal, QC, Canada, see [2342, 2343]
2008/07: Atlanta, GA, USA, see [585, 1519, 1872, 2268, 2537]
2007/07: London, UK, see [905, 2579, 2699, 2700]
2006/07: Seattle, WA, USA, see [1516, 2441]
2005/06: Washington, DC, USA, see [27, 304, 2337, 2339]
2004/06: Seattle, WA, USA, see [751, 752, 1515, 2079, 2280]
2003/07: Chicago, IL, USA, see [484, 485, 2578]
2002/07: New York, NY, USA, see [224, 478, 1675, 1790, 2077]
2001/07: San Francisco, CA, USA, see [1104, 2570]
2000/07: Las Vegas, NV, USA, see [1452, 2155, 2928, 2935, 2986]
1999/07: Orlando, FL, USA, see [211, 482, 2093, 2502]
CEC: IEEE Conference on Evolutionary Computation
History: 2011/06: New Orleans, LA, USA, see [2027]
2010/07: Barcelona, Catalonia, Spain, see [1354]
2009/05: Trondheim, Norway, see [1350]
2008/06: Hong Kong (Xiānggǎng), China, see [1889]
2007/09: Singapore, see [1343]
2006/07: Vancouver, BC, Canada, see [3033]
2005/09: Edinburgh, Scotland, UK, see [641]
2004/06: Portland, OR, USA, see [1369]
2003/12: Canberra, Australia, see [2395]
2002/05: Honolulu, HI, USA, see [944]
2001/05: Gangnam-gu, Seoul, Korea, see [1334]
2000/07: La Jolla, CA, USA, see [1333]
1999/07: Washington, DC, USA, see [110]
1998/05: Anchorage, AK, USA, see [2496]
1997/04: Indianapolis, IN, USA, see [173]
1996/05: Nagoya, Japan, see [1445]
1995/11: Perth, WA, Australia, see [1361]
1994/06: Orlando, FL, USA, see [1891]
EvoWorkshops: RealWorld Applications of Evolutionary Computing
History: 2011/04: Torino, Italy, see [788]
2009/04: Tübingen, Germany, see [1052]
2008/05: Naples, Italy, see [1051]
2007/04: València, Spain, see [1050]
2006/04: Budapest, Hungary, see [2341]
2005/03: Lausanne, Switzerland, see [2340]
2004/04: Coimbra, Portugal, see [2254]
2003/04: Colchester, Essex, UK, see [2253]
2002/04: Kinsale, Ireland, see [465]
2001/04: Lake Como, Milan, Italy, see [343]
2000/04: Edinburgh, Scotland, UK, see [464]
1999/05: Göteborg, Sweden, see [2197]
1998/04: Paris, France, see [1310]
PPSN: Conference on Parallel Problem Solving from Nature
History: 2012/09: Taormina, Italy, see [2675]
2010/09: Kraków, Poland, see [2412, 2413]
2008/09: Birmingham, UK and Dortmund, North Rhine-Westphalia, Germany, see
[2354, 3028]
2006/09: Reykjavik, Iceland, see [2355]
2002/09: Granada, Spain, see [1870]
2000/09: Paris, France, see [2421]
1998/09: Amsterdam, The Netherlands, see [866]
1996/09: Berlin, Germany, see [2818]
1994/10: Jerusalem, Israel, see [693]
1992/09: Brussels, Belgium, see [1827]
1990/10: Dortmund, North Rhine-Westphalia, Germany, see [2438]
WCCI: IEEE World Congress on Computational Intelligence
History: 2012/06: Brisbane, QLD, Australia, see [1411]
2010/07: Barcelona, Catalonia, Spain, see [1356]
2008/06: Hong Kong (Xiānggǎng), China, see [3104]
2006/07: Vancouver, BC, Canada, see [1341]
2002/05: Honolulu, HI, USA, see [1338]
1998/05: Anchorage, AK, USA, see [1330]
1994/06: Orlando, FL, USA, see [1324]
EMO: International Conference on Evolutionary MultiCriterion Optimization
History: 2011/04: Ouro Preto, MG, Brazil, see [2654]
2009/04: Nantes, France, see [862]
2007/03: Matsushima, Sendai, Japan, see [2061]
2005/03: Guanajuato, México, see [600]
2003/04: Faro, Portugal, see [957]
2001/03: Zürich, Switzerland, see [3099]
AE/EA: Artiﬁcial Evolution
History: 2011/10: Angers, France, see [2586]
2009/10: Strasbourg, France, see [608]
2007/10: Tours, France, see [1928]
2005/10: Lille, France, see [2657]
2003/10: Marseilles, France, see [1736]
2001/10: Le Creusot, France, see [606]
1999/11: Dunkerque, France, see [948]
1997/10: Nîmes, France, see [1191]
1995/09: Brest, France, see [75]
1994/09: Toulouse, France, see [74]
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
ICANNGA: International Conference on Adaptive and Natural Computing Algorithms
History: 2011/04: Ljubljana, Slovenia, see [2589]
2009/04: Kuopio, Finland, see [1564]
2007/04: Warsaw, Poland, see [257, 258]
2005/03: Coimbra, Portugal, see [2297]
2003/04: Roanne, France, see [2142]
2001/04: Prague, Czech Republic, see [1641]
1999/04: Portorož, Slovenia, see [801]
1997/04: Vienna, Austria, see [2527]
1995/04: Alès, France, see [2141]
1993/04: Innsbruck, Austria, see [69]
ICCI: IEEE International Conference on Cognitive Informatics
See Table 10.2 on page 128.
ICNC: International Conference on Advances in Natural Computation
History: 2012/05: Chóngqìng, China, see [574]
2011/07: Shànghǎi, China, see [1379]
2010/08: Yāntái, Shāndōng, China, see [1355]
2009/08: Tiānjīn, China, see [2848]
2008/10: Jì'nán, Shāndōng, China, see [1150]
2007/08: Hǎikǒu, Hǎinán, China, see [1711]
2006/09: Xī'ān, Shǎnxī, China, see [1443, 1444]
2005/08: Chángshā, Húnán, China, see [2850–2852]
320 28 EVOLUTIONARY ALGORITHMS
SEAL: International Conference on Simulated Evolution and Learning
See Table 10.2 on page 130.
BIOMA: International Conference on Bio-Inspired Optimization Methods and their Applications
History: 2012/05: Bohinj, Slovenia, see [349]
2010/05: Ljubljana, Slovenia, see [921]
2008/10: Ljubljana, Slovenia, see [920]
2006/10: Ljubljana, Slovenia, see [919]
2004/10: Ljubljana, Slovenia, see [918]
BIONETICS: International Conference on Bio-Inspired Models of Network, Information, and Computing Systems
See Table 10.2 on page 130.
FEA: International Workshop on Frontiers in Evolutionary Algorithms
History: 2005/07: Salt Lake City, UT, USA, see [2379]
2003/09: Cary, NC, USA, see [546]
2002/03: Research Triangle Park, NC, USA, see [508]
2000/02: Atlantic City, NJ, USA, see [2853]
1998/10: Research Triangle Park, NC, USA, see [2293]
1997/03: Research Triangle Park, NC, USA, see [2475]
ICARIS: International Conference on Artificial Immune Systems
See Table 10.2 on page 130.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
MICAI: Mexican International Conference on Artificial Intelligence
See Table 10.2 on page 131.
NICSO: International Workshop on Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications)
See Table 10.2 on page 131.
CIMCA: International Conference on Computational Intelligence for Modelling, Control and Automation
History: 2006/11: Sydney, NSW, Australia, see [1923]
2005/11: Vienna, Austria, see [1370]
CIS: International Conference on Computational Intelligence and Security
History: 2009/12: Běijīng, China, see [1393]
2008/12: Sūzhōu, Jiāngsū, China, see [1392]
2007/12: Harbin, Heilongjiang, China, see [1390]
2006/11: Guǎngzhōu, Guǎngdōng, China, see [557]
2005/12: Xī'ān, Shǎnxī, China, see [1192, 1193]
CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology
See Table 10.2 on page 132.
EUFIT: European Congress on Intelligent Techniques and Soft Computing
See Table 10.2 on page 132.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization
History: 2009/04: Tübingen, Germany, see [645]
2008/03: Naples, Italy, see [2775]
2007/04: València, Spain, see [646]
2006/04: Budapest, Hungary, see [1116]
2005/03: Lausanne, Switzerland, see [2252]
2004/04: Coimbra, Portugal, see [1115]
GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation
History: 2009/06: Shànghǎi, China, see [3000]
GEM: International Conference on Genetic and Evolutionary Methods
History: 2011/07: Las Vegas, NV, USA, see [2552]
2010/07: Las Vegas, NV, USA, see [119]
2009/07: Las Vegas, NV, USA, see [121]
2008/06: Las Vegas, NV, USA, see [120]
2007/06: Las Vegas, NV, USA, see [122]
GEWS: Grammatical Evolution Workshop
See Table 31.2 on page 385.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
SoCPaR: International Conference on SOft Computing and PAttern Recognition
See Table 10.2 on page 133.
Progress in Evolutionary Computation: Workshops on Evolutionary Computation
History: 1993–1994: Australia, see [3023]
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems
History: 2001/11: Dunedin, New Zealand, see [1498]
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
ASC: IASTED International Conference on Artificial Intelligence and Soft Computing
See Table 10.2 on page 134.
EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation
History: 2011/04: Torino, Italy, see [2588]
FOCI: IEEE Symposium on Foundations of Computational Intelligence
History: 2007/04: Honolulu, HI, USA, see [1865]
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and Computational Algebraic Geometry
See Table 10.2 on page 134.
IJCCI: International Conference on Computational Intelligence
History: 2010/10: València, Spain, see [914]
2009/10: Madeira, Portugal, see [1809]
IWNICA: International Workshop on Nature Inspired Computation and Applications
History: 2011/04: Héféi, Ānhuī, China, see [2754]
2010/10: Héféi, Ānhuī, China, see [2758]
2008/05: Héféi, Ānhuī, China, see [2757]
2006/10: Héféi, Ānhuī, China, see [1122]
2004/10: Héféi, Ānhuī, China, see [2755, 2756]
KAEiOG: Krajowa Konferencja Algorytmy Ewolucyjne i Optymalizacja Globalna (Polish National Conference on Evolutionary Algorithms and Global Optimization)
See Table 10.2 on page 134.
NaBIC: World Congress on Nature and Biologically Inspired Computing
History: 2011/11: Salamanca, Spain, see [2368]
2010/12: Kitakyushu, Japan, see [1545]
2009/12: Coimbatore, India, see [12]
WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms
History: 2010/09: Vienna, Austria, see [2311]
2007/07: London, UK, see [905]
2005/06: Oslo, Norway, see [479]
EvoRobots: International Symposium on Evolutionary Robotics
History: 2001/10: Tokyo, Japan, see [1099]
ICEC: International Conference on Evolutionary Computation
History: 2010/10: València, Spain, see [1276]
2009/10: Madeira, Portugal, see [2325]
IWACI: International Workshop on Advanced Computational Intelligence
History: 2010/08: Sūzhōu, Jiāngsū, China, see [2995]
2009/10: Mexico City, México, see [1875]
2008/06: Macau, China, see [2752]
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
SEMCCO: International Conference on Swarm, Evolutionary and Memetic Computing
History: 2010/12: Chennai, India, see [2595]
SSCI: IEEE Symposium Series on Computational Intelligence
Conference series which contains many other symposia and workshops.
History: 2011/04: Paris, France, see [1357]
2009/03: Nashville, TN, USA, see [1353]
2007/04: Honolulu, HI, USA, see [1345]
28.7.4 Journals
1. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society
2. Evolutionary Computation published by MIT Press
3. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics published by IEEE Systems, Man, and Cybernetics Society
4. International Journal of Computational Intelligence and Applications (IJCIA) published by Imperial College Press Co. and World Scientific Publishing Co.
5. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands
6. Natural Computing: An International Journal published by Kluwer Academic Publishers and Springer Netherlands
7. Journal of Heuristics published by Springer Netherlands
8. Journal of Artificial Evolution and Applications published by Hindawi Publishing Corporation
9. International Journal of Computational Intelligence Research (IJCIR) published by Research India Publications
10. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI) published by Idea Group Publishing (Idea Group Inc., IGI Global)
11. IEEE Computational Intelligence Magazine published by IEEE Computational Intelligence Society
12. Journal of Optimization Theory and Applications published by Plenum Press and Springer Netherlands
13. Computational Intelligence published by Wiley Periodicals, Inc.
14. International Journal of Swarm Intelligence and Evolutionary Computation (IJSIEC) published by Ashdin Publishing (AP)
Tasks T28
75. Implement a general steady-state Evolutionary Algorithm. You may use Listing 56.11 as a blueprint and create a new class derived from EABase (given in ?? on page ??) which provides the steady-state functionality described in Paragraph 28.1.4.2.1 on page 266.
[20 points]
76. Implement a general Evolutionary Algorithm with archive-based elitism. You may use Listing 56.11 as a blueprint and create a new class derived from EABase (given in ?? on page ??) which provides the elitism functionality described in Section 28.1.4.4 on page 267.
[20 points]
77. Implement the linear ranking selection scheme given in Section 28.4.5 on page 300 for your version of Evolutionary Algorithms. You may use Listing 56.14 as a blueprint and implement the interface ISelectionAlgorithm given in Listing 55.13 in a new class.
[20 points]
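As an illustration of the steady-state principle behind Task 75 (one offspring per iteration, competing against the current worst individual), here is a stand-alone sketch on the OneMax toy problem. All class and method names here are hypothetical and independent of the book's EABase framework and listings; it only demonstrates the replacement scheme, not the book's API.

```java
import java.util.Random;

/**
 * Minimal steady-state EA sketch on the OneMax toy problem (maximize the
 * number of 1-bits): each iteration creates a single offspring which then
 * competes against the worst individual in the population.
 */
public class SteadyStateEA {

  /** OneMax objective: count the 1-bits in a genotype. */
  static int fitness(boolean[] g) {
    int f = 0;
    for (boolean b : g) { if (b) { f++; } }
    return f;
  }

  /** Run the steady-state EA and return the best fitness found. */
  static int run(long seed, int popSize, int n, int steps) {
    Random rand = new Random(seed);
    boolean[][] pop = new boolean[popSize][n];
    for (boolean[] g : pop) {
      for (int i = 0; i < n; i++) { g[i] = rand.nextBoolean(); }
    }
    for (int t = 0; t < steps; t++) {
      // steady-state step: derive ONE offspring from a random parent ...
      boolean[] child = pop[rand.nextInt(popSize)].clone();
      int i = rand.nextInt(n);
      child[i] = !child[i]; // single-bit-flip mutation
      // ... and let it replace the worst individual if it is no worse
      int worst = 0;
      for (int j = 1; j < popSize; j++) {
        if (fitness(pop[j]) < fitness(pop[worst])) { worst = j; }
      }
      if (fitness(child) >= fitness(pop[worst])) { pop[worst] = child; }
    }
    int best = 0;
    for (boolean[] g : pop) { best = Math.max(best, fitness(g)); }
    return best;
  }

  public static void main(String[] args) {
    // typically converges to the optimum (32) within a few thousand steps
    System.out.println("best fitness = " + run(1L, 20, 32, 20000));
  }
}
```

Contrast this with a generational EA, which would build a complete new population per iteration; the steady-state variant updates the population one individual at a time.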
Chapter 29
Genetic Algorithms
29.1 Introduction
Genetic Algorithms (GAs) are a subclass of Evolutionary Algorithms where (a) the
genotypes g of the search space G are strings of primitive elements (usually all of the
same type) such as bits, integers, or real numbers, and (b) search operations such as
mutation and crossover directly modify these genotypes. Because of the simple structure
of the search spaces of Genetic Algorithms, often a non-identity mapping is used
as genotype-phenotype mapping to translate the elements g ∈ G to candidate solutions
x ∈ X [1075, 1220, 1250, 2931].
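These ingredients can be sketched in a few lines. The following example (helper names are hypothetical, not taken from the book's listings) shows bit-string genotypes, a unary mutation operator, a binary single-point crossover operator, and a simple non-identity genotype-phenotype mapping that decodes a bit string into a real number in [0, 1):

```java
import java.util.Random;

/**
 * Sketch of basic GA ingredients: bit-string genotypes, mutation,
 * single-point crossover, and a non-identity genotype-phenotype mapping.
 */
public class GAOperators {

  /** Mutation: flip each bit independently with probability 1/n. */
  static boolean[] mutate(boolean[] g, Random rand) {
    boolean[] c = g.clone();
    for (int i = 0; i < c.length; i++) {
      if (rand.nextInt(c.length) == 0) { c[i] = !c[i]; }
    }
    return c;
  }

  /** "Sexual" operation: single-point crossover of two parent genotypes. */
  static boolean[] crossover(boolean[] a, boolean[] b, Random rand) {
    int cut = 1 + rand.nextInt(a.length - 1); // cut point in 1..n-1
    boolean[] child = new boolean[a.length];
    for (int i = 0; i < child.length; i++) {
      child[i] = (i < cut) ? a[i] : b[i]; // prefix from a, suffix from b
    }
    return child;
  }

  /** Genotype-phenotype mapping: read the bits as a binary fraction. */
  static double gpm(boolean[] g) {
    double x = 0.0, w = 0.5;
    for (boolean bit : g) {
      if (bit) { x += w; }
      w *= 0.5;
    }
    return x;
  }

  public static void main(String[] args) {
    Random rand = new Random();
    boolean[] a = { true, true, true, true };     // decodes to 0.9375
    boolean[] b = { false, false, false, false }; // decodes to 0.0
    System.out.println(gpm(a));                     // prints 0.9375
    System.out.println(gpm(crossover(a, b, rand))); // 0.5, 0.75, or 0.875
  }
}
```

Here the genotype (a bit string) and the phenotype (a real number) differ, which is exactly the non-identity mapping situation described above; a GA searching G = {0, 1}^n can thereby optimize a real-valued problem over X ⊆ R.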
Genetic Algorithms are the original prototype of Evolutionary Algorithms and adhere to
the description given in Section 28.1.1. They provide search operators which try to closely
copy sexual and asexual reproduction schemes from nature. In such “sexual” search
operations, the genoty