Global Optimization Algorithms

– Theory and Application –
3rd Edition
Author: Thomas Weise
tweise@gmx.de
Edition: third
Version: 2011-12-07-15:31
Newest Version: http://www.it-weise.de/projects/book.pdf
Sources/CVS-Repository: goa-taa.cvs.sourceforge.net
Don't print me! Also think about the trees that would have to die for the paper! This book is
much more useful as an electronic resource: you can search terms and click links, which you
cannot do in a printed book. The book is also frequently updated and improved; printed
versions will simply become outdated. Don't print me!
Preface
This e-book is devoted to Global Optimization algorithms, that is, methods for finding
solutions of high quality for an incredibly wide range of problems. We introduce the basic
concepts of optimization and discuss the features which make optimization problems difficult
and which should therefore be considered when trying to solve them. In this book, we focus on
metaheuristic approaches like Evolutionary Computation, Simulated Annealing, Extremal
Optimization, Tabu Search, and Random Optimization. The Evolutionary Computation
methods in particular, which subsume Evolutionary Algorithms, Genetic Algorithms, Genetic
Programming, Learning Classifier Systems, Evolution Strategies, Differential Evolution, Par-
ticle Swarm Optimization, and Ant Colony Optimization, are discussed in detail.
In this third edition, we try to make the transition from a pure collection of material and
compendium to a more structured book, addressing two major audience groups:
1. Our book may help students, since we try to describe the algorithms in an under-
standable, consistent way. We therefore also provide the fundamentals and much
background knowledge. You can find (short and simplified) summaries of stochastic
theory and theoretical computer science in Part VI on page 638. Additionally, appli-
cation examples are provided which give an idea of how problems can be tackled with
the different techniques and what results can be expected.
2. Fellow researchers and PhD students may find the application examples helpful as well.
For them, in-depth discussions of the individual approaches are included, supported
by a large set of useful literature references.
The contents of this book are divided into three parts. In the first part, different op-
timization technologies are introduced and their features are described; small examples
are often given in order to ease understanding. In the second part, starting at page
530, we elaborate on different application examples in detail. Finally, in the last part,
beginning at page 638, the aforementioned background knowledge is provided.
In order to maximize the utility of this electronic book, it contains automatic, clickable
links. They are shaded in dark gray so that the book is still printable in black and white.
You can click on
1. entries in the table of contents,
2. citation references like “Heitkötter and Beasley [1220]”,
3. page references like “253”,
4. references such as “see Figure 28.1 on page 254” to sections, figures, tables, and listings,
and
5. URLs and links like “http://www.lania.mx/~ccoello/EMOO/ [accessed 2007-10-25]”.¹

¹ URLs are usually annotated with the date we have accessed them, like http://www.lania.mx/~ccoello/EMOO/
[accessed 2007-10-25]. We can neither guarantee that their content remains unchanged, nor
that these sites stay available. We also assume no responsibility for anything we have linked to.
The following scenario is an example of using the book: A student reads the text and
finds a passage that she wants to investigate in depth. She clicks on a citation which seems
interesting and the corresponding reference is shown. For some of the references which are
available online, links are provided in the reference text. By clicking on such a link, the
Adobe Reader® ² will open another window and load the corresponding document (or a browser
window of a site that links to the document). After reading it, the student may use the
“back” button in the Acrobat Reader’s navigation utility to return to the text initially
read in the e-book.
If this book contains something you want to cite or reference in your work, please use
the citation suggestion provided in Chapter A on page 945. Also, I would be very happy
if you provide feedback, report errors or missing things that you have (or have not) found,
criticize something, or have any additional ideas or suggestions. Do not hesitate to contact
me via my email address tweise@gmx.de. As a matter of fact, a large number of people have
helped me to improve this book over time. I have enumerated the most important contributors in
Chapter D – thank you guys, I really appreciate your help! In many places in this book
we refer to Wikipedia – The Free Encyclopedia [2938], which is a great source of knowledge
and contains articles and definitions for many of the aspects discussed in this book. Like
this book, it is updated and improved frequently. Therefore, including the links adds greatly
to the book’s utility, in my opinion.
The updates and improvements will result in new versions of the book, which will
regularly appear at http://www.it-weise.de/projects/book.pdf. The complete
LaTeX source code of this book, including all graphics and the bibliography, is hosted
at SourceForge under http://sourceforge.net/projects/goa-taa/ in the CVS
repository goa-taa.cvs.sourceforge.net. You can browse the repository under
http://goa-taa.cvs.sourceforge.net/goa-taa/ and anonymously download the
complete sources by using the CVS command given in Listing 1.³
cvs -z3 -d:pserver:anonymous@goa-taa.cvs.sourceforge.net:/cvsroot/goa-taa
checkout -P Book
Listing 1: The CVS command for anonymously checking out the complete book sources.
Compiling the sources requires multiple runs of LaTeX, BibTeX, and makeindex because of
the nifty way the references are incorporated. In the repository, an Ant script is provided for
doing this.
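Roughly, the equivalent manual build cycle looks like the sketch below. This is only a sketch
under assumptions: the name of the main file (book.tex here) and the exact number of passes
are not taken from the repository, and the provided Ant script remains the authoritative
build procedure.

# hypothetical manual build cycle; file names are assumptions
pdflatex book.tex     # first pass: write citations, labels, and index entries to auxiliary files
bibtex   book         # resolve the citations collected in book.aux into the bibliography
makeindex book.idx    # generate the index from the collected index entries
pdflatex book.tex     # second pass: pull in the bibliography and the index
pdflatex book.tex     # third pass: settle cross-references and page numbers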
Also, the complete bibliography of this book is stored in a MySQL⁴ database, and
the scripts for creating this database, tools for querying it (written in Java), and a Microsoft
Access⁵ frontend are part of the sources you can download. These resources, as well as all
contents of this book (unless explicitly stated otherwise), are licensed under the GNU Free
Documentation License (FDL, see Chapter B on page 947). Some of the algorithms provided
are made available under the LGPL license (see Chapter C on page 955).
Copyright © 2011 Thomas Weise.
Permission is granted to copy, distribute and/or modify this document under the terms of
the GNU Free Documentation License, Version 1.3 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in Chapter B on page 947, entitled “GNU Free
Documentation License (FDL)”.
² The Adobe Reader® is available for download at http://www.adobe.com/products/reader/ [accessed 2007-08-13].
³ More information can be found at http://sourceforge.net/scm/?type=cvs&group_id=264352
⁴ http://en.wikipedia.org/wiki/MySQL [accessed 2010-07-29]
⁵ http://en.wikipedia.org/wiki/Microsoft_Access [accessed 2010-07-29]
Contents
Title Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Part I Foundations
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.1 What is Global Optimization? . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.1.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.1.2 Algorithms and Programs . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.1.3 Optimization versus Dedicated Algorithms . . . . . . . . . . . . . . . 24
1.1.4 Structure of this Book . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.2 Types of Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.2.1 Combinatorial Optimization Problems . . . . . . . . . . . . . . . . . . 24
1.2.2 Numerical Optimization Problems . . . . . . . . . . . . . . . . . . . . 27
1.2.3 Discrete and Mixed Problems . . . . . . . . . . . . . . . . . . . . . . 29
1.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.3 Classes of Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.1 Algorithm Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.2 Monte Carlo Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.3.3 Heuristics and Metaheuristics . . . . . . . . . . . . . . . . . . . . . . 34
1.3.4 Evolutionary Computation . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3.5 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4 Classification According to Optimization Time . . . . . . . . . . . . . . . . . 37
1.5 Number of Optimization Criteria . . . . . . . . . . . . . . . . . . . . . . . . 39
1.6 Introductory Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2 Problem Space and Objective Functions . . . . . . . . . . . . . . . . . . . . 43
2.1 Problem Space: What does it look like? . . . . . . . . . . . . . . . . . . . . 43
2.2 Objective Functions: Is it good? . . . . . . . . . . . . . . . . . . . . . . . . . 44
3 Optima: What does good mean? . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1 Single Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.1 Extrema: Minima and Maxima of Differentiable Functions . . . . . . 52
3.1.2 Global Extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Multiple Optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3 Multiple Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.2 Lexicographic Optimization . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.3 Weighted Sums (Linear Aggregation) . . . . . . . . . . . . . . . . . . 62
3.3.4 Weighted Min-Max . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.5 Pareto Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4 Constraint Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.1 Genome and Phenome-based Approaches . . . . . . . . . . . . . . . . 71
3.4.2 Death Penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4.3 Penalty Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4.4 Constraints as Additional Objectives . . . . . . . . . . . . . . . . . . 72
3.4.5 The Method Of Inequalities . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.6 Constrained-Domination . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.4.7 Limitations and Other Methods . . . . . . . . . . . . . . . . . . . . . 75
3.5 Unifying Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5.1 External Decision Maker . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5.2 Prevalence Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 Search Space and Operators: How can we find it? . . . . . . . . . . . . . . 81
4.1 The Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.2 The Search Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3 The Connection between Search and Problem Space . . . . . . . . . . . . . . 85
4.4 Local Optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.4.1 Local Optima of single-objective Problems . . . . . . . . . . . . . . . 89
4.4.2 Local Optima of Multi-Objective Problems . . . . . . . . . . . . . . . 89
4.5 Further Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5 Fitness and Problem Landscape: How does the Optimizer see it? . . . . 93
5.1 Fitness Landscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 Fitness as a Relative Measure of Utility . . . . . . . . . . . . . . . . . . . . . 94
5.3 Problem Landscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6 The Structure of Optimization: Putting it together. . . . . . . . . . . . . 101
6.1 Involved Spaces and Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3 Other General Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3.1 Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3.2 Iterations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3.3 Termination Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3.4 Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.3.5 Modeling and Simulating . . . . . . . . . . . . . . . . . . . . . . . . . 106
7 Solving an Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . 109
8 Baseline Search Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.1 Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.2 Random Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.2.1 Adaptive Walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.3 Exhaustive Enumeration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
9 Forma Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10 General Information on Optimization . . . . . . . . . . . . . . . . . . . . . . 123
10.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
10.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
10.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Part II Difficulties in Optimization
11 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
12 Problem Hardness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
12.1 Algorithmic Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
12.2 Complexity Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
12.2.1 Turing Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
12.2.2 P, NP, Hardness, and Completeness . . . . . . . . . . . . . . . . . . 146
12.3 The Problem: Many Real-World Tasks are NP-hard . . . . . . . . . . . . 147
12.4 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
13 Unsatisfying Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
13.2 The Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
13.2.1 Premature Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 152
13.2.2 Non-Uniform Convergence . . . . . . . . . . . . . . . . . . . . . . . . 152
13.2.3 Domino Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
13.3 One Cause: Loss of Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
13.3.1 Exploration vs. Exploitation . . . . . . . . . . . . . . . . . . . . . . . 155
13.4 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
13.4.1 Search Operator Design . . . . . . . . . . . . . . . . . . . . . . . . . . 157
13.4.2 Restarting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
13.4.3 Low Selection Pressure and Population Size . . . . . . . . . . . . . . 157
13.4.4 Sharing, Niching, and Clearing . . . . . . . . . . . . . . . . . . . . . . 157
13.4.5 Self-Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
13.4.6 Multi-Objectivization . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
14 Ruggedness and Weak Causality . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.1 The Problem: Ruggedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.2 One Cause: Weak Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
14.3 Fitness Landscape Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
14.3.1 Autocorrelation and Correlation Length . . . . . . . . . . . . . . . . . 163
14.4 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
15 Deceptiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
15.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
15.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
16 Neutrality and Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
16.2 Evolvability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
16.3 Neutrality: Problematic and Beneficial . . . . . . . . . . . . . . . . . . . . . 172
16.4 Neutral Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
16.5 Redundancy: Problematic and Beneficial . . . . . . . . . . . . . . . . . . . . 174
16.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
16.7 Needle-In-A-Haystack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
17 Epistasis, Pleiotropy, and Separability . . . . . . . . . . . . . . . . . . . . . 177
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.1.1 Epistasis and Pleiotropy . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.1.2 Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
17.1.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
17.2 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
17.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
17.3.1 Choice of the Representation . . . . . . . . . . . . . . . . . . . . . . . 182
17.3.2 Parameter Tweaking . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
17.3.3 Linkage Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
17.3.4 Cooperative Coevolution . . . . . . . . . . . . . . . . . . . . . . . . . 184
18 Noise and Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
18.1 Introduction – Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
18.2 The Problem: Need for Robustness . . . . . . . . . . . . . . . . . . . . . . . 188
18.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
19 Overfitting and Oversimplification . . . . . . . . . . . . . . . . . . . . . . . . 191
19.1 Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
19.1.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
19.1.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
19.2 Oversimplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
19.2.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
19.2.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
20 Dimensionality (Objective Functions) . . . . . . . . . . . . . . . . . . . . . . 197
20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
20.2 The Problem: Many-Objective Optimization . . . . . . . . . . . . . . . . . . 198
20.3 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
20.3.1 Increasing Population Size . . . . . . . . . . . . . . . . . . . . . . . . 201
20.3.2 Increasing Selection Pressure . . . . . . . . . . . . . . . . . . . . . . . 201
20.3.3 Indicator Function-based Approaches . . . . . . . . . . . . . . . . . . 201
20.3.4 Scalarizing Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 202
20.3.5 Limiting the Search Area in the Objective Space . . . . . . . . . . . . 202
20.3.6 Visualization Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 202
21 Scale (Decision Variables) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
21.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
21.2 Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
21.2.1 Parallelization and Distribution . . . . . . . . . . . . . . . . . . . . . 204
21.2.2 Genetic Representation and Development . . . . . . . . . . . . . . . . 204
21.2.3 Exploiting Separability . . . . . . . . . . . . . . . . . . . . . . . . . . 204
21.2.4 Combination of Techniques . . . . . . . . . . . . . . . . . . . . . . . . 205
22 Dynamically Changing Fitness Landscape . . . . . . . . . . . . . . . . . . . 207
23 The No Free Lunch Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
23.1 Initial Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
23.2 The Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
23.3 Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
23.4 Infinite and Continuous Domains . . . . . . . . . . . . . . . . . . . . . . . . 213
23.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
24 Lessons Learned: Designing Good Encodings . . . . . . . . . . . . . . . . . 215
24.1 Compact Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
24.2 Unbiased Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
24.3 Surjective GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
24.4 Injective GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
24.5 Consistent GPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
24.6 Formae Inheritance and Preservation . . . . . . . . . . . . . . . . . . . . . . 217
24.7 Formae in Genotypic Space Aligned with Phenotypic Formae . . . . . . . . . 217
24.8 Compatibility of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
24.9 Representation with Causality . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.10 Combinations of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.11 Reachability of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.12 Influence of Formae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
24.13 Scalability of Search Operations . . . . . . . . . . . . . . . . . . . . . . . 219
24.14 Appropriate Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
24.15 Indirect Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
24.15.1 Extradimensional Bypass: Example for Good Complexity . . . . . . 219
Part III Metaheuristic Optimization Algorithms
25 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
25.1 General Information on Metaheuristics . . . . . . . . . . . . . . . . . . . . . 226
25.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 226
25.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
25.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 227
25.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
26 Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
26.3 Multi-Objective Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . 230
26.4 Problems in Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
26.5 Hill Climbing With Random Restarts . . . . . . . . . . . . . . . . . . . . . . 232
26.6 General Information on Hill Climbing . . . . . . . . . . . . . . . . . . . . . . 234
26.6.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 234
26.6.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
26.6.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 234
26.6.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
26.7 Raindrop Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
26.7.1 General Information on Raindrop Method . . . . . . . . . . . . . . . 236
27 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
27.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
27.1.1 Metropolis Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
27.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
27.2.1 No Boltzmann’s Constant . . . . . . . . . . . . . . . . . . . . . . . . . 246
27.2.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
27.3 Temperature Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
27.3.1 Logarithmic Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 248
27.3.2 Exponential Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 249
27.3.3 Polynomial and Linear Scheduling . . . . . . . . . . . . . . . . . . . . 249
27.3.4 Adaptive Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
27.3.5 Larger Step Widths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
27.4 General Information on Simulated Annealing . . . . . . . . . . . . . . . . . . 250
27.4.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 250
27.4.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
27.4.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 250
27.4.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
28 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
28.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
28.1.1 The Basic Cycle of EAs . . . . . . . . . . . . . . . . . . . . . . . . . . 254
28.1.2 Biological and Artificial Evolution . . . . . . . . . . . . . . . . . . . . 255
28.1.3 Historical Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 263
28.1.4 Populations in Evolutionary Algorithms . . . . . . . . . . . . . . . . . 265
28.1.5 Configuration Parameters of Evolutionary Algorithms . . . . . . . . . 269
28.2 Genotype-Phenotype Mappings . . . . . . . . . . . . . . . . . . . . . . . . . 270
28.2.1 Direct Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
28.2.2 Indirect Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
28.3 Fitness Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
28.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
28.3.2 Weighted Sum Fitness Assignment . . . . . . . . . . . . . . . . . . . . 275
28.3.3 Pareto Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
28.3.4 Sharing Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
28.3.5 Variety Preserving Fitness Assignment . . . . . . . . . . . . . . . . . 279
28.3.6 Tournament Fitness Assignment . . . . . . . . . . . . . . . . . . . . . 284
28.4 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
28.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
28.4.2 Truncation Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
28.4.3 Fitness Proportionate Selection . . . . . . . . . . . . . . . . . . . . . 290
28.4.4 Tournament Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
28.4.5 Linear Ranking Selection . . . . . . . . . . . . . . . . . . . . . . . . . 300
28.4.6 Random Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
28.4.7 Clearing and Simple Convergence Prevention (SCP) . . . . . . . . . . 302
28.5 Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
28.6 Maintaining a Set of Non-Dominated/Non-Prevailed Individuals . . . . . . . 308
28.6.1 Updating the Optimal Set . . . . . . . . . . . . . . . . . . . . . . . . 308
28.6.2 Obtaining Non-Prevailed Elements . . . . . . . . . . . . . . . . . . . . 309
28.6.3 Pruning the Optimal Set . . . . . . . . . . . . . . . . . . . . . . . . . 311
28.7 General Information on Evolutionary Algorithms . . . . . . . . . . . . . . . 314
28.7.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 314
28.7.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
28.7.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 317
28.7.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
29 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
29.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
29.1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
29.2 Genomes in Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 326
29.2.1 Classification According to Length . . . . . . . . . . . . . . . . . . . . 326
29.2.2 Classification According to Element Type . . . . . . . . . . . . . . . . 327
29.2.3 Classification According to Meaning of Loci . . . . . . . . . . . . . . 327
29.2.4 Introns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
29.3 Fixed-Length String Chromosomes . . . . . . . . . . . . . . . . . . . . . . . 328
29.3.1 Creation: Nullary Reproduction . . . . . . . . . . . . . . . . . . . . . 328
29.3.2 Mutation: Unary Reproduction . . . . . . . . . . . . . . . . . . . . . 330
29.3.3 Permutation: Unary Reproduction . . . . . . . . . . . . . . . . . . . . 333
29.3.4 Crossover: Binary Reproduction . . . . . . . . . . . . . . . . . . . . . 335
29.4 Variable-Length String Chromosomes . . . . . . . . . . . . . . . . . . . . . . 339
29.4.1 Creation: Nullary Reproduction . . . . . . . . . . . . . . . . . . . . . 340
29.4.2 Insertion and Deletion: Unary Reproduction . . . . . . . . . . . . . . 340
29.4.3 Crossover: Binary Reproduction . . . . . . . . . . . . . . . . . . . . . 340
29.5 Schema Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
29.5.1 Schemata and Masks . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
29.5.2 Wildcards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
29.5.3 Holland’s Schema Theorem . . . . . . . . . . . . . . . . . . . . . . . . 342
29.5.4 Criticism of the Schema Theorem . . . . . . . . . . . . . . . . . . . . 345
29.5.5 The Building Block Hypothesis . . . . . . . . . . . . . . . . . . . . . 346
29.5.6 Genetic Repair and Similarity Extraction . . . . . . . . . . . . . . . . 346
29.6 The Messy Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 347
29.6.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
29.6.2 Reproduction Operations . . . . . . . . . . . . . . . . . . . . . . . . . 347
29.6.3 Overspecification and Underspecification . . . . . . . . . . . . . . . . 348
29.6.4 The Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
29.7 Random Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
29.8 General Information on Genetic Algorithms . . . . . . . . . . . . . . . . . . 351
29.8.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 351
29.8.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
29.8.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 353
29.8.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
30 Evolution Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
30.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
30.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
30.3 Recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
30.3.1 Dominant Recombination . . . . . . . . . . . . . . . . . . . . . . . . . 362
30.3.2 Intermediate Recombination . . . . . . . . . . . . . . . . . . . . . . . 362
30.4 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
30.4.1 Using Normal Distributions . . . . . . . . . . . . . . . . . . . . . . . . 364
30.5 Parameter Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
30.5.1 The 1/5th Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
30.5.2 Endogenous Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 370
30.6 General Information on Evolution Strategies . . . . . . . . . . . . . . . . . . 373
30.6.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 373
30.6.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
30.6.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 373
30.6.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
31 Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
31.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
31.1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
31.2 General Information on Genetic Programming . . . . . . . . . . . . . . . . . 382
31.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 382
31.2.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
31.2.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 383
31.2.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
31.3 (Standard) Tree Genomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
31.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
31.3.2 Creation: Nullary Reproduction . . . . . . . . . . . . . . . . . . . . . 389
31.3.3 Node Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
31.3.4 Unary Reproduction Operations . . . . . . . . . . . . . . . . . . . . . 393
31.3.5 Recombination: Binary Reproduction . . . . . . . . . . . . . . . . . . 398
31.3.6 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
31.4 Linear Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
31.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
31.4.2 Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . 404
31.4.3 Realizations and Implementations . . . . . . . . . . . . . . . . . . . . 405
31.4.4 Recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
31.4.5 General Information on Linear Genetic Programming . . . . . . . . . 408
31.5 Grammars in Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . 409
31.5.1 General Information on Grammar-Guided Genetic Programming . . . 409
31.6 Graph-based Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
31.6.1 General Information on Graph-based Genetic Programming . . . . . 409
32 Evolutionary Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
32.1 General Information on Evolutionary Programming . . . . . . . . . . . . . . 414
32.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 414
32.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
32.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 414
32.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
33 Differential Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
33.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
33.2 Ternary Recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
33.2.1 Advanced Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
33.3 General Information on Differential Evolution . . . . . . . . . . . . . . . . . 421
33.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 421
33.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
33.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 421
33.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
34 Estimation Of Distribution Algorithms . . . . . . . . . . . . . . . . . . . . . 427
34.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
34.2 General Information on Estimation Of Distribution Algorithms . . . . . . . 428
34.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 428
34.2.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
34.2.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 429
34.2.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
34.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
34.4 EDAs Searching Bit Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
34.4.1 UMDA: Univariate Marginal Distribution Algorithm . . . . . . . . . 434
34.4.2 PBIL: Population-Based Incremental Learning . . . . . . . . . . . . . 438
34.4.3 cGA: Compact Genetic Algorithm . . . . . . . . . . . . . . . . . . . . 439
34.5 EDAs Searching Real Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 442
34.5.1 SHCLVND: Stochastic Hill Climbing with Learning by Vectors of
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
34.5.2 PBILc: Continuous PBIL . . . . . . . . . . . . . . . . . . . . . . . . 444
34.5.3 Real-Coded PBIL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
34.6 EDAs Searching Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
34.6.1 PIPE: Probabilistic Incremental Program Evolution . . . . . . . . . . 447
34.7 Difficulties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
34.7.1 Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
34.7.2 Epistasis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
35 Learning Classifier Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
35.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
35.2 Families of Learning Classifier Systems . . . . . . . . . . . . . . . . . . . . . 457
35.3 General Information on Learning Classifier Systems . . . . . . . . . . . . . . 459
35.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 459
35.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
35.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 459
35.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
36 Memetic and Hybrid Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 463
36.1 Memetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
36.2 Lamarckian Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
36.3 Baldwin Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
36.4 General Information on Memetic Algorithms . . . . . . . . . . . . . . . . . . 467
36.4.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 467
36.4.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
36.4.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 467
36.4.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
37 Ant Colony Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
37.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
37.2 General Information on Ant Colony Optimization . . . . . . . . . . . . . . . 473
37.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 473
37.2.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
37.2.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 473
37.2.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
38 River Formation Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
38.1 General Information on River Formation Dynamics . . . . . . . . . . . . . . 478
38.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 478
38.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
38.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 478
38.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
39 Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
39.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
39.2 The Particle Swarm Optimization Algorithm . . . . . . . . . . . . . . . . . . 481
39.2.1 Communication with Neighbors – Social Interaction . . . . . . . . . . 482
39.2.2 Particle Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
39.2.3 Basic Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
39.3 General Information on Particle Swarm Optimization . . . . . . . . . . . . . 483
39.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 483
39.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
39.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 483
39.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
40 Tabu Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
40.1 General Information on Tabu Search . . . . . . . . . . . . . . . . . . . . . . 490
40.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 490
40.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
40.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 490
40.1.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
41 Extremal Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
41.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
41.1.1 Self-Organized Criticality . . . . . . . . . . . . . . . . . . . . . . . . . 493
41.1.2 The Bak-Sneppen model of Evolution . . . . . . . . . . . . . . . . . . 493
41.2 Extremal Optimization and Generalized Extremal Optimization . . . . . . . 494
41.3 General Information on Extremal Optimization . . . . . . . . . . . . . . . . 496
41.3.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 496
41.3.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
41.3.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 496
41.3.4 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
42 GRASPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
42.1 General Information on GRASPs . . . . . . . . . . . . . . . . . . . . . . 500
42.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 500
42.1.2 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 500
42.1.3 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
43 Downhill Simplex (Nelder and Mead) . . . . . . . . . . . . . . . . . . . . . . 501
43.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
43.2 General Information on Downhill Simplex . . . . . . . . . . . . . . . . . . . 502
43.2.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 502
43.2.2 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 502
43.2.3 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
44 Random Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
44.1 General Information on Random Optimization . . . . . . . . . . . . . . . . . 506
44.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 506
44.1.2 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 506
44.1.3 Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
Part IV Non-Metaheuristic Optimization Algorithms
45 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
46 State Space Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
46.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
46.1.1 The Baseline: A Binary Goal Criterion . . . . . . . . . . . . . . . . . 511
46.1.2 State Space and Neighborhood Search . . . . . . . . . . . . . . . . . . 511
46.1.3 The Search Space as Graph . . . . . . . . . . . . . . . . . . . . . . . . 512
46.1.4 Key Efficiency Features . . . . . . . . . . . . . . . . . . . . . . . . . . 514
46.2 Uninformed Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
46.2.1 Breadth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
46.2.2 Depth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
46.2.3 Depth-Limited Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
46.2.4 Iteratively Deepening Depth-First Search . . . . . . . . . . . . . . . . 516
46.2.5 General Information on Uninformed Search . . . . . . . . . . . . . . . 517
46.3 Informed Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
46.3.1 Greedy Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
46.3.2 A* Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
46.3.3 General Information on Informed Search . . . . . . . . . . . . . . . . 520
47 Branch And Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
47.1 General Information on Branch And Bound . . . . . . . . . . . . . . . . . . 524
47.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 524
47.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
47.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 524
48 Cutting-Plane Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
48.1 General Information on Cutting-Plane Method . . . . . . . . . . . . . . . . . 528
48.1.1 Applications and Examples . . . . . . . . . . . . . . . . . . . . . . . . 528
48.1.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
48.1.3 Conferences and Workshops . . . . . . . . . . . . . . . . . . . . . . . 528
Part V Applications
49 Real-World Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
49.1 Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
49.1.1 Genetic Programming: Genome for Symbolic Regression . . . . . . . 531
49.1.2 Sample Data, Quality, and Estimation Theory . . . . . . . . . . . . . 532
49.1.3 Limits of Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . 535
49.2 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
49.2.1 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
49.3 Freight Transportation Planning . . . . . . . . . . . . . . . . . . . . . . . . . 536
49.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
49.3.2 Vehicle Routing in Theory and Practice . . . . . . . . . . . . . . . . . 538
49.3.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
49.3.4 Evolutionary Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 542
49.3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
49.3.6 Holistic Approach to Logistics . . . . . . . . . . . . . . . . . . . . . . 549
49.3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
50 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
50.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
50.2 Bit-String based Problem Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 553
50.2.1 Kauffman’s NK Fitness Landscapes . . . . . . . . . . . . . . . . . . . 553
50.2.2 The p-Spin Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
50.2.3 The ND Family of Fitness Landscapes . . . . . . . . . . . . . . . . . . 557
50.2.4 The Royal Road . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
50.2.5 OneMax and BinInt . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
50.2.6 Long Path Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
50.2.7 Tunable Model for Problematic Phenomena . . . . . . . . . . . . . . 566
50.3 Real Problem Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
50.3.1 Single-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . 580
50.4 Close-to-Real Vehicle Routing Problem Benchmark . . . . . . . . . . . . . . 621
50.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
50.4.2 Involved Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
50.4.3 Problem Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
50.4.4 Objectives and Constraints . . . . . . . . . . . . . . . . . . . . . . . . 634
50.4.5 Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
Part VI Background
51 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
51.1 Set Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
51.2 Special Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
51.3 Relations between Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
51.4 Operations on Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
51.5 Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
51.6 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
51.7 Binary Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
51.8 Order Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
51.9 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
51.10 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
51.10.1 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
51.11 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
51.11.1 Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
51.11.2 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
51.11.3 Transformations between Sets and Lists . . . . . . . . . . . . . . . . 650
52 Graph Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
53 Stochastic Theory and Statistics . . . . . . . . . . . . . . . . . . . . . . . . . 653
53.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
53.1.1 Probability as defined by Bernoulli (1713) . . . . . . . . . . . . . . 654
53.1.2 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
53.1.3 The Limiting Frequency Theory of von Mises . . . . . . . . . . . . . . 655
53.1.4 The Axioms of Kolmogorov . . . . . . . . . . . . . . . . . . . . . . . . 656
53.1.5 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . 657
53.1.6 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
53.1.7 Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . 658
53.1.8 Probability Mass Function . . . . . . . . . . . . . . . . . . . . . . . . 659
53.1.9 Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . 659
53.2 Parameters of Distributions and their Estimates . . . . . . . . . . . . . . . . 659
53.2.1 Count, Min, Max and Range . . . . . . . . . . . . . . . . . . . . . . . 660
53.2.2 Expected Value and Arithmetic Mean . . . . . . . . . . . . . . . . . . 661
53.2.3 Variance and Standard Deviation . . . . . . . . . . . . . . . . . . . . 662
53.2.4 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
53.2.5 Skewness and Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . 665
53.2.6 Median, Quantiles, and Mode . . . . . . . . . . . . . . . . . . . . . . 665
53.2.7 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
53.2.8 The Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . 668
53.3 Some Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
53.3.1 Discrete Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . 668
53.3.2 Poisson Distribution π_λ . . . . . . . . . . . . . . . . . . . . . . . . 671
53.3.3 Binomial Distribution B(n, p) . . . . . . . . . . . . . . . . . . . . . . 674
53.4 Some Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 676
53.4.1 Continuous Uniform Distribution . . . . . . . . . . . . . . . . . . . . 676
53.4.2 Normal Distribution N(µ, σ²) . . . . . . . . . . . . . . . . . . . . . 678
53.4.3 Exponential Distribution exp(λ) . . . . . . . . . . . . . . . . . . . . . 682
53.4.4 Chi-square Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 684
53.4.5 Student’s t-Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 687
53.5 Example – Throwing a Die . . . . . . . . . . . . . . . . . . . . . . . . . . 689
53.6 Estimation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
53.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
53.6.2 Likelihood and Maximum Likelihood Estimators . . . . . . . . . . . . 693
53.6.3 Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
53.7 Statistical Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
53.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
Part VII Implementation
54 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
55 The Specification Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
55.1 General Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
55.1.1 Search Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
55.1.2 Genotype-Phenotype Mapping . . . . . . . . . . . . . . . . . . . . . . 711
55.1.3 Objective Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
55.1.4 Termination Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
55.1.5 Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 713
55.2 Algorithm Specific Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 716
55.2.1 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 717
55.2.2 Estimation Of Distribution Algorithms . . . . . . . . . . . . . . . . . 719
56 The Implementation Package . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
56.1 The Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 724
56.1.1 Algorithm Base Classes . . . . . . . . . . . . . . . . . . . . . . . . . . 724
56.1.2 Baseline Search Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 729
56.1.3 Local Search Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 735
56.1.4 Population-based Metaheuristics . . . . . . . . . . . . . . . . . . . . . 746
56.2 Search Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
56.2.1 Operations for String-based Search Spaces . . . . . . . . . . . . . . . 784
56.2.2 Operations for Tree-based Search Spaces . . . . . . . . . . . . . . . . 809
56.2.3 Multiplexing Operators . . . . . . . . . . . . . . . . . . . . . . . . . . 818
56.3 Data Structures for Special Search Spaces . . . . . . . . . . . . . . . . . . . 824
56.3.1 Tree-based Search Spaces as used in Genetic Programming . . . . . . 824
56.4 Genotype-Phenotype Mappings . . . . . . . . . . . . . . . . . . . . . . . . . 833
56.5 Termination Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835
56.6 Comparator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
56.7 Utility Classes, Constants, and Routines . . . . . . . . . . . . . . . . . . . . 839
57 Demos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
57.1 Demos for String-based Problem Spaces . . . . . . . . . . . . . . . . . . . . . 859
57.1.1 Real-Vector based Problem Spaces X ⊆ R^n . . . . . . . . . . . . . . . 859
57.1.2 Permutation-based Problem Spaces . . . . . . . . . . . . . . . . . . . 887
57.2 Demos for Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . 905
57.2.1 Demos for Genetic Programming Applied to Mathematical Problems 905
57.3 The VRP Benchmark Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 914
57.3.1 The Involved Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 914
57.3.2 The Optimization Utilities . . . . . . . . . . . . . . . . . . . . . . . . 925
Appendices
A Citation Suggestion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945
B GNU Free Documentation License (FDL) . . . . . . . . . . . . . . . . . . . 947
B.1 Applicability and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 947
B.2 Verbatim Copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
B.3 Copying in Quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
B.4 Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949
B.5 Combining Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951
B.6 Collections of Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951
B.7 Aggregation with Independent Works . . . . . . . . . . . . . . . . . . . . . . 951
B.8 Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952
B.9 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952
B.10 Future Revisions of this License . . . . . . . . . . . . . . . . . . . . . . . . . 952
B.11 Relicensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953
C GNU Lesser General Public License (LGPL) . . . . . . . . . . . . . . . . . 955
C.1 Additional Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 955
C.2 Exception to Section 3 of the GNU GPL . . . . . . . . . . . . . . . . . . . . 955
C.3 Conveying Modified Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
C.4 Object Code Incorporating Material from Library Header Files . . . . . . . . 956
C.5 Combined Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956
C.6 Combined Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957
C.7 Revised Versions of the GNU Lesser General Public License . . . . . . . . . 957
D Credits and Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1190
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213
List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1217
List of Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1221
Part I
Foundations
Chapter 1
Introduction
1.1 What is Global Optimization?
One of the most fundamental principles in our world is the search for optimal states. It begins
in the microcosm in physics, where atomic bonds¹ minimize the energy of electrons [2137]. When molecules combine to solid bodies during the process of freezing, they assume crystal structures which come close to energy-optimal configurations. Such a natural optimization process, of course, is not driven by any higher intention but results from the laws of physics.
The same goes for the biological principle of survival of the fittest [2571] which, together with biological evolution [684], leads to better adaptation of the species to their environment. Here, a local optimum is a well-adapted species that can maintain a stable population in its niche.
Figure 1.1: Terms that hint: “Optimization Problem” (inspired by [428]) – examples: biggest..., smallest..., fastest..., cheapest..., most precise..., most similar to..., maximal..., minimal..., most robust..., ...with least energy, ...with least aerodynamic drag, ...on the smallest possible area, ...that fulfills all timing constraints, best trade-offs between...
¹ http://en.wikipedia.org/wiki/Chemical_bond [accessed 2007-07-12]
As long as humankind has existed, we have striven for perfection in many areas. As sketched in Figure 1.1, there exists an incredible variety of terms that indicate that a situation is actually an optimization problem. For example, we want to reach a maximum degree of happiness with the least amount of effort. In our economy, profit and sales must be maximized and costs should be as low as possible. Therefore, optimization is one of the oldest sciences, which even extends into daily life [2023]. We can also observe the phenomenon of trade-offs between different criteria which are common in many other problem domains: One person may choose to work hard with the goal of securing stability and pursuing wealth for her future life, accepting very little spare time at the current stage. Another person can, deliberately, choose a lax life full of joy in the current phase but a rather insecure one in the long term. Similarly, most optimization problems in real-world applications involve multiple, usually conflicting, objectives.
Furthermore, many optimization problems involve certain constraints. The guy with the relaxed lifestyle, for instance, regardless of how lazy he might become, cannot avoid dragging himself to the supermarket from time to time in order to acquire food. The eager beaver, on the other hand, will not be able to keep up 18 hours of work for seven days per week in the long run without damaging her health considerably and thus, rendering the goal of an enjoyable future unattainable. Likewise, the parameters of the solutions of an optimization problem may be constrained as well.
1.1.1 Optimization
Since optimization covers a wide variety of subject areas, there exist different points of view
on what it actually is. We approach this question by first clarifying the goal of optimization:
solving optimization problems.
Definition D1.1 (Optimization Problem (Economical View)). An optimization problem is a situation which requires deciding on one choice from a set of possible alternatives in order to reach a predefined/required benefit at minimal costs.
Definition D1.2 (Optimization Problem (Simplified Mathematical View)). Solving an optimization problem requires finding an input value x⋆ for which a mathematical function f takes on the smallest possible value (while usually obeying some restrictions on the possible values of x⋆).
Every task which has the goal of approaching certain configurations considered as optimal in the context of pre-defined criteria can be viewed as an optimization problem. After giving these rough definitions, which we will later refine in Section 6.2 on page 103, we can also specify the topic of this book:
Definition D1.3 (Optimization). Optimization is the process of solving an optimization
problem, i. e., finding suitable solutions for it.
Definition D1.4 (Global Optimization). Global optimization is optimization with the goal of finding solutions x⋆ for a given optimization problem which have the property that no other, better solutions exist.
If something is important, general, and abstract enough, there is always a mathematical discipline dealing with it. Global Optimization² [609, 2126, 2181, 2714] is also the branch of applied mathematics and numerical analysis that focuses on, well, optimization.
² http://en.wikipedia.org/wiki/Global_optimization [accessed 2007-07-03]
Closely related and overlapping areas are mathematical economics³ [559] and operations research⁴ [580, 1232].
1.1.2 Algorithms and Programs
The title of this book is Global Optimization Algorithms – Theory and Application and
so far, we have clarified what optimization is so the reader has an idea about the general
direction in which we are going to drift. Before giving an overview about the contents to
follow, let us take a quick look into the second component of the book title, algorithms.
The term algorithm comprises essentially all forms of “directives on what to do to reach a certain goal”. A culinary recipe is an algorithm, for example, since it tells how much of which ingredient is to be added to a meal in which sequence and how everything should be heated. The commands inside an algorithm can be very concise or very imprecise, depending on the area of application. How accurately can we, for instance, carry out the instruction “Add a tablespoon of sugar.”? Hence, algorithms are such a wide field that there exist numerous different, rather fuzzy definitions of the word algorithm [46, 141, 157, 635, 1616]. We base ours on the one given by Wikipedia:
Definition D1.5 (Algorithm (Wikipedia)). An algorithm⁵ is a finite set of well-defined instructions for accomplishing some task. It starts in some initial state and usually terminates in a final state.
The term algorithm is derived from the name of the mathematician Mohammed ibn-Musa al-Khwarizmi, who was part of the royal court in Baghdad and who lived from about 780 to 850. Al-Khwarizmi’s work is the likely source for the word algebra as well. Definition D1.6 provides a basic understanding about what an optimization algorithm is, a “working hypothesis” for the time being. It will later be refined in Definition D6.3 on page 103.
Definition D1.6 (Optimization Algorithm). An optimization algorithm is an algo-
rithm suitable to solve optimization problems.
An algorithm is a set of directions in a representation which is usually understandable for
human beings and thus, independent from specific hardware and computers. A program,
on the other hand, is basically an algorithm realized for a given computer or execution
environment.
Definition D1.7 (Program). A program⁶ is a set of instructions that describe a task or an algorithm to be carried out on a computer.
Therefore, the primitive instructions of the algorithm must be expressed either in a form that the machine can process directly (machine code⁷ [2135]), in a form that can be translated (1:1) into such code (assembly language [2378], Java byte code [1109], etc.), or in a high-level programming language⁸ [1817] which can be translated (n:m) into the latter using special software (a compiler) [1556].
In this book, we will provide several algorithms for solving optimization problems. In our algorithms, we will try to bridge between abstraction, ease of reading, and closeness to the syntax of high-level programming languages such as Java. In the algorithms, we will use self-explaining primitives which are still based on clear definitions (such as the list operations discussed in Section 51.11 on page 647).
³ http://en.wikipedia.org/wiki/Mathematical_economics [accessed 2009-06-16]
⁴ http://en.wikipedia.org/wiki/Operations_research [accessed 2009-06-19]
⁵ http://en.wikipedia.org/wiki/Algorithm [accessed 2007-07-03]
⁶ http://en.wikipedia.org/wiki/Computer_program [accessed 2007-07-03]
⁷ http://en.wikipedia.org/wiki/Machine_code [accessed 2007-07-04]
⁸ http://en.wikipedia.org/wiki/High-level_programming_language [accessed 2007-07-03]
1.1.3 Optimization versus Dedicated Algorithms
Over at least the last fifty-five years, research in computer science has led to the development of many algorithms which solve given problem classes to optimality. Basically, we can distinguish
two kinds of problem-solving algorithms: dedicated algorithms and optimization algorithms.
Dedicated algorithms are specialized to exactly solve a given class of problems in the shortest possible time. They are most often deterministic and always utilize exact theoretical knowledge about the problem at hand. Kruskal’s algorithm [1625], for example, is the best way to find a minimum spanning tree in a graph with weighted edges. The shortest paths from a source node to the other nodes in a network can best be found with Dijkstra’s algorithm [796], and the maximum flow in a flow network can be computed quickly with the algorithm given by Dinitz [800]. There are many problems for which such fast and efficient dedicated algorithms exist. For each optimization problem, we should always check first whether it can be solved with such a specialized, exact algorithm (see also Chapter 7 on page 109).
However, for many optimization tasks no efficient algorithm exists. The reason can be that (1) the problem is too specific – after all, researchers usually develop algorithms only for problems which are of general interest. (2) Furthermore, for some problems (the current scientific opinion is that) no efficient, dedicated algorithm can be constructed at all (see Section 12.2.2 on page 146).
For such problems, the more general optimization algorithms are employed. Most often, these algorithms only need a definition of the structure of possible solutions and a function f which measures the quality of a candidate solution. Based on this information, they try to find solutions for which f takes on the best values. Optimization methods utilize much less information than exact algorithms; they are usually slower and cannot provide the same solution quality. In the cases 1 and 2, however, we have no choice but to use them. And indeed, there exists a giant number of situations where either Point 1, Point 2, or both conditions hold, as you can see in, for example, Table 25.1 on page 226 or Table 28.4 on page 314.
1.1.4 Structure of this Book
In the first part of this book, we will first discuss the things which constitute an optimization
problem and which are the basic components of each optimization process. In Part II,
we take a look at all the possible aspects which can make an optimization task difficult.
Understanding these issues will allow you to anticipate possible problems when solving an
optimization task and, hopefully, to prevent their occurrence from the start. We then discuss
metaheuristic optimization algorithms (and especially Evolutionary Computation methods)
in Part III. Whereas these optimization approaches are clearly the center of this book, we
also introduce traditional, non-metaheuristic, numerical, and/or exact methods in Part IV.
In Part V, we then list some examples of applications for which optimization algorithms can
be used. Finally, Part VI is a collection of definitions and knowledge which can be useful to
understand the terms and formulas used in the rest of the book. It is provided in an effort
to make the book stand-alone and to allow students to understand it even if they have little
mathematical or theoretical background.
1.2 Types of Optimization Problems
Basically, there are two general types of optimization tasks: combinatorial and numerical
problems.
1.2.1 Combinatorial Optimization Problems
Definition D1.8 (Combinatorial Optimization Problem). Combinatorial optimization problems⁹ [578, 628, 2124, 2430] are problems which are defined over a finite (or countably infinite) discrete problem space X and whose candidate solutions’ structure can be expressed as
1. elements from finite sets,
2. finite sequences or permutations of elements x_i chosen from finite sets, i. e., x ∈ X ⇒ x = (x_1, x_2, . . . ), as
3. sets of elements, i. e., x ∈ X ⇒ x = {x_1, x_2, . . . }, as
4. tree or graph structures with node or edge properties stemming from any of the above types, or
5. any form of nesting, combination, partitions, or subsets of the above.
Combinatorial tasks are, for example, the Traveling Salesman Problem (see Example E2.2 on
page 45 [1429]), Vehicle Routing Problems (see Section 49.3 on page 536 and [2162, 2912]),
graph coloring [1455], graph partitioning [344], scheduling [564], packing [564], and satisfia-
bility problems [2335]. Algorithms suitable for such problems are, amongst others, Genetic
Algorithms (Chapter 29), Simulated Annealing (Chapter 27), Tabu Search (Chapter 40),
and Extremal Optimization (Chapter 41).
Example E1.1 (Bin Packing Problem).
The bin packing problem¹⁰ [1025] is a combinatorial optimization problem which is defined
as follows: Given are a number k ∈ N_1 of bins, each of which having size b ∈ N_1, and a number n ∈ N_1 of objects with the weights a_1, a_2, . . . , a_n. The question to answer is: Can the n objects be distributed over the k bins in a way that none of the bins overflows? A configuration which meets this requirement is sketched in Figure 1.2. If another object of size a_11 = 4 was added, the answer to the question would become No.
Figure 1.2: One example instance of a bin packing problem: a distribution of n = 10 objects of sizes a_1 to a_10 (a_1 = 4, a_2 = 4, a_3 = 1, a_4 = 1, a_5 = 1, a_6 = 2, a_7 = 2, a_8 = 2, a_9 = 2, a_10 = 3) into k = 5 bins of size b = 5.
The bin packing problem can be formalized as asking whether a mapping x from the numbers 1..n to 1..k exists so that Equation 1.1 holds. In Equation 1.1, the mapping is represented as a list x of k sets where the i-th set contains the indices of the objects to be packed into the i-th bin.
∃x : ∀i ∈ 0..k−1 : ∑_{j∈x[i]} a_j ≤ b (1.1)
Finding whether this equation holds or not is known to be NP-complete. The question can be transformed into an optimization problem with the goal of finding the smallest number k of bins of size b which can contain all the objects listed in a. Alternatively, we could try to find the mapping x for which the function f_bp given in Equation 1.2 takes on the lowest value, for example.
f_bp(x) = ∑_{i=0}^{k−1} max{0, (∑_{j∈x[i]} a_j) − b} (1.2)
⁹ http://en.wikipedia.org/wiki/Combinatorial_optimization [accessed 2010-08-04]
¹⁰ http://en.wikipedia.org/wiki/Bin_packing_problem [accessed 2010-08-27]
This problem is known to be NP-hard. See Section 12.2 for a discussion on NP-completeness and NP-hardness and Task 67 on page 238 for a homework task concerned with solving the bin packing problem.
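To make Equation 1.2 concrete, the following is a minimal Java sketch of how f_bp could be evaluated for one assignment; it is not taken from the book’s implementation package, and the names BinPackingObjective, fBp, bins, and sizes are illustrative assumptions (bins[i] holds the indices of the objects assigned to bin i):

// Hypothetical sketch: evaluate f_bp from Equation 1.2 for one assignment.
// bins[i] lists the (0-based) indices of the objects packed into bin i,
// sizes[j] is the size a_j of object j, and b is the capacity of each bin.
final class BinPackingObjective {
  static long fBp(int[][] bins, int[] sizes, int b) {
    long total = 0;
    for (int[] bin : bins) {            // iterate over the k bins
      long load = 0;
      for (int j : bin) {               // sum the sizes of the objects in this bin
        load += sizes[j];
      }
      total += Math.max(0, load - b);   // only the overflow of a bin contributes
    }
    return total;                       // 0 means no bin capacity is exceeded
  }
}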
Example E1.2 (Circuit Layout).
Given are a card with a finite set S of slots, a set C : |C| ≤ |S| of circuits to be placed on
that card, and a connection list R ⊆ C × C where (c_1, c_2) ∈ R means that the two circuits c_1, c_2 ∈ C have to be connected. The goal is to find a layout x which assigns each circuit c ∈ C to a location s ∈ S so that the overall length of wiring required to establish all the connections r ∈ R becomes minimal, under the constraint that no wire must cross another one. An example for such a scenario is given in Figure 1.3.
Figure 1.3: An example for circuit layouts: the circuits C = {A, B, C, D, E, F, G, H, I, J, K, L, M} are to be placed on a card with 20 × 11 slots so that the connections R = {(A,B), (B,C), (C,F), (C,G), (C,M), (D,E), (D,J), (E,F), (F,K), (G,J), (G,M), (H,J), (H,I), (H,L), (I,K), (I,L), (K,L), (L,M)} can be wired.
Similar circuit layout problems have existed in a variety of forms for quite a while now [1541]
and are known to be quite hard. They may be extended with constraints such as to also
minimize the heat emission of the circuit and to minimize cross-wire induction and so on.
They may be extended to not only evolve the circuit location but also the circuit types and
numbers [1479, 2792] and thus, can become arbitrarily complex.
Example E1.3 (Job Shop Scheduling).
Job Shop Scheduling¹¹ (JSS) is considered to be one of the hardest classes of scheduling problems and is also NP-complete [1026]. The goal is to distribute a set of tasks to machines
in a way that all deadlines are met.
In a JSS, a set T of jobs t_i ∈ T is given, where each job t_i consists of a number of sub-tasks t_i,j. A partial order is defined on the sub-tasks t_i,j of each job t_i which describes the precedence of the sub-tasks, i. e., constraints on the order in which they must be processed. In the case of a beverage factory, for example, 10 000 bottles of cola (t_1 . . . t_10000) and 5000 bottles of lemonade (t_10001 . . . t_15000) may need to be produced.
For the cola bottles, the jobs would consist of three sub-jobs, t_i,1 to t_i,3 for i ∈ 1..10000, since the bottles first need to be labeled, then they are filled with cola, and finally closed. It is clear that the bottles cannot be closed before they are filled (t_i,2 ≺ t_i,3), whereas the labeling can take place at any time. For the lemonade, all production steps are the same as the ones for the cola bottles, except that step t_i,2 ∀i ∈ 10001..15000 is replaced with filling the bottles with lemonade.
¹¹ http://en.wikipedia.org/wiki/Job_Shop_Scheduling [accessed 2010-08-27]
It is furthermore obvious that the different sub-jobs have to be performed by different machines m_i ∈ M. Each of the machines m_i can only process a single job at a time. Finally, each of the jobs has a deadline until which it must be finished.
The optimization algorithm has to find a schedule S which defines for each job when it has
to be executed and on what machine while adhering to the order and deadline constraints.
Usually, the optimization process would start with random configurations which violate some of the constraints (typically the deadlines) and would reduce the violations step by step until results are found that fulfill all requirements.
The JSS can be considered as a combinatorial problem since the optimizer creates a
sequence and assignment of jobs and the set of possible starting times (which make sense)
is also finite.
Example E1.4 (Routing).
In a network N composed of a given number of nodes, find the shortest path from one node n_1 to another node n_2. This optimization problem consists of composing a sequence of edges which form a path from n_1 to n_2 of the shortest length. This combinatorial problem is rather simple and has been solved by Dijkstra [796] a long time ago. Nevertheless, it is an optimization problem.
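As an aside, one compact formulation of Dijkstra’s algorithm could look as follows in Java; this is only an illustrative sketch (not a listing from this book), and the names Edge, adj, and shortestPaths are assumptions:

import java.util.*;

// Hypothetical sketch of Dijkstra's algorithm for non-negative edge weights.
// adj.get(u) lists the outgoing edges of node u; dist[v] finally holds the
// length of the shortest path from the source node to node v.
final class Edge {
  final int to; final double weight;
  Edge(int to, double weight) { this.to = to; this.weight = weight; }
}

final class Dijkstra {
  static double[] shortestPaths(List<List<Edge>> adj, int source) {
    double[] dist = new double[adj.size()];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[source] = 0.0;
    // priority queue of (node, tentative distance), smallest distance first
    PriorityQueue<double[]> queue =
        new PriorityQueue<>(Comparator.comparingDouble(entry -> entry[1]));
    queue.add(new double[] { source, 0.0 });
    while (!queue.isEmpty()) {
      double[] entry = queue.poll();
      int u = (int) entry[0];
      if (entry[1] > dist[u]) continue;       // outdated queue entry, skip it
      for (Edge e : adj.get(u)) {             // relax all outgoing edges of u
        double candidate = dist[u] + e.weight;
        if (candidate < dist[e.to]) {
          dist[e.to] = candidate;
          queue.add(new double[] { e.to, candidate });
        }
      }
    }
    return dist;
  }
}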
1.2.2 Numerical Optimization Problems
Definition D1.9 (Numerical Optimization Problem). Numerical optimization problems¹² [1281, 2040] are problems that are defined over problem spaces which are subspaces of (uncountably infinite) numerical spaces, such as the real or complex vectors (X ⊆ R^n or X ⊆ C^n).
Numerical problems defined over R^n (or any kind of space containing it) are called continuous optimization problems. Function optimization [194], engineering design optimization tasks [2059], and classification and data mining tasks [633] are common continuous optimization problems. They can often be solved efficiently with Evolution Strategies (Chapter 30), Differential Evolution (Chapter 33), Evolutionary Programming (Chapter 32), Estimation of Distribution Algorithms (Chapter 34), or Particle Swarm Optimization (Chapter 39), for example.
Example E1.5 (Function Roots and Minimization).
The task Find the roots of the function g(x) defined in Equation 1.3 and sketched in Figure 1.4 can – at least with the author’s limited mathematical capabilities – hardly be solved manually.
g(x) = 5 + ln |sin(x^2 + e^x tan x) + cos x| − arctan |x − 3| (1.3)
However, we can transform it into the optimization problem Minimize the objective function f_1(x) = |0 − g(x)| = |g(x)| or Minimize the objective function f_2(x) = (0 − g(x))^2 = (g(x))^2. Different from the original formulation, the objective functions f_1 and f_2 provide us with a measure of how close an element x is to being a valid solution. For the global optima x⋆ of the problem, f_1(x⋆) = f_2(x⋆) = 0 holds. Based on this formulation, we can hand any of the two objective functions to a method capable of solving numerical optimization problems. If this method indeed discovers elements x⋆ with f_1(x⋆) = f_2(x⋆) = 0, we have answered the initial question, since the elements x⋆ are the roots of g(x).
Figure 1.4: The graph of g(x) as given in Equation 1.3 with the four roots x⋆_1 to x⋆_4.
¹² http://en.wikipedia.org/wiki/Optimization_(mathematics) [accessed 2010-08-04]
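As a small illustration of this transformation, the sketch below shows g(x) and f_1(x) = |g(x)| in Java; it assumes that f_1 would then be handed to some generic numerical minimizer, and the names g, f1, and RootsViaMinimization are not from the book:

import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: turning the root-finding task for g(x) into a
// minimization problem by using f1(x) = |g(x)| as the objective function.
final class RootsViaMinimization {
  // The function from Equation 1.3 whose roots we are looking for.
  static double g(double x) {
    return 5.0
        + Math.log(Math.abs(Math.sin(x * x + Math.exp(x) * Math.tan(x)) + Math.cos(x)))
        - Math.atan(Math.abs(x - 3.0));
  }

  // Objective function f1 subject to minimization: it becomes 0 exactly at the roots of g.
  static double f1(double x) {
    return Math.abs(g(x));
  }

  public static void main(String[] args) {
    // Any numerical optimizer expecting a DoubleUnaryOperator objective could now
    // be handed f1; here we merely evaluate it at a sample point.
    DoubleUnaryOperator objective = RootsViaMinimization::f1;
    System.out.println(objective.applyAsDouble(1.0));
  }
}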
Example E1.6 (Truss Optimization).
In a truss design optimization problem [18] as sketched in Figure 1.5, the locations of n
possible beams (light gray), a few fixed points (left hand side of the truss sketches), and a point where a weight will be attached (right hand side of the truss sketches, where the force F is applied) are given. The thickness d_i of each of the i = 1..n beams which leads to the most stable truss whilst not exceeding a maximum volume of material V would be subject to optimization. A structure such as the one sketched on the right hand side of the figure could result from a numerical optimization process: Some beams receive different non-zero thicknesses (illustrated with thicker lines). Some other beams have been erased from the design (they receive thickness d_i = 0), sketched as thin, light gray lines.
Figure 1.5: A truss optimization problem like the one given in [18]: a truss structure with fixed points (left), possible beam locations (gray), and an applied force F (right); a possible optimization result assigns beam thicknesses (dark gray) and leaves some beams unused (thin light gray).
Notice that, depending on the problem formulation, truss optimization can also become a discrete or even combinatorial problem: Not in all real-world design tasks can we choose the thicknesses of the beams in the truss arbitrarily. It may be possible when synthesizing the support structures for air plane wings or motor blocks. However, when building a truss from pre-defined building blocks, such as the beams available for house construction, the set of possible beam thicknesses is finite and, for example, d_i ∈ {10mm, 20mm, 30mm, 50mm, 100mm, 300mm} ∀i ∈ 1..n. In such a case, a problem such as the one sketched in Figure 1.5 effectively becomes combinatorial or, at least, a discrete integer programming task.
1.2.3 Discrete and Mixed Problems
Of course, numerical problems may also be defined over discrete spaces, such as N_0^n.
Problems of this type are called integer programming problems:
Definition D1.10 (Integer Programming). An integer programming problem¹³ is an optimization problem which is defined over a finite or countably infinite numerical problem space (such as N_0^n or Z^n) [1495, 2965].
Basically, combinatorial optimization tasks can be considered as a class of such discrete
optimization problems. Because of the wide range of possible solution structures in combi-
natorial tasks, however, techniques well capable of solving integer programming problems
are not necessarily suitable for a given combinatorial task. The same goes for techniques suited to real-valued vector optimization problems: the class of integer programming problems is, of course, a member of the general numerical optimization tasks. However, methods for real-valued problems are not necessarily good for integer-valued ones (and hence, also not for combinatorial tasks). Furthermore, there exist optimization problems which are combinations of the above classes. For instance, there may be a problem where a graph must be built which has edges with real-valued properties, i. e., a crossover of a combinatorial and a continuous optimization problem.
¹³ http://en.wikipedia.org/wiki/Integer_programming [accessed 2010-08-04]
Definition D1.11 (Mixed Integer Programming). A mixed integer programming problem (MIP) is a numerical optimization problem where some variables stem from finite (or countably infinite) (numerical) spaces while others stem from uncountably infinite (numerical) spaces.
1.2.4 Summary
Figure 1.6 is not intended to provide a precise hierarchy of problem types. Instead, it is only
a rough sketch of the relations between them. Basically, there are smooth transitions between many problem classes and the type of a given problem is often subject to interpretation.
Figure 1.6: A rough sketch of the problem type relations between Numerical Optimization Problems, Continuous Optimization Problems, Mixed Integer Programming Problems, Discrete Optimization Problems, Integer Programming Problems, and Combinatorial Optimization Problems.
The candidate solutions of combinatorial problems stem from finite sets and are discrete. They can thus always be expressed as discrete numerical vectors and could therefore, theoretically, be solved with methods for numerical optimization, such as integer programming techniques.
Of course, every discrete numerical problem can also be expressed as a continuous numerical problem since any vector from R^n can easily be transformed to a vector in Z^n by simply rounding each of its elements. Hence, the techniques for continuous optimization can theoretically be used for discrete numerical problems (integer programming) as well as for MIPs and for combinatorial problems.
However, algorithms for numerical optimization problems often apply numerical meth-
ods, such as adding vectors, multiplying them with some number, or rotating vectors. Com-
binatorial problems, on the other hand, are usually solved by applying combinatorial modi-
fications to solution structures, such as changing the order of elements in a sequence. These
two approaches are fundamentally different and thus, although algorithms for continuous
optimization can theoretically be used for combinatorial optimization, it is quite clear that
this is rarely the case in practice and would generally yield bad results.
Furthermore, the precision with which numbers can be represented on a computer is finite
since the memory available in any possible physical device is limited. From this perspective,
any continuous optimization problem, if solved on a physical machine, is a discrete problem
as well. In theory, it could thus be solved with approaches for combinatorial optimization.
Again, it is quite easy to see that adjusting the values and sequences of bits in a floating
point representation of a real number will probably lead to inferior results or, at least, to
slower optimization speed than using advanced mathematical operations.
Finally, there may be optimization tasks with both combinatorial and continuous characteristics. Example E1.6 provides a problem which can exist in a continuous and in a
discrete flavor, both of which can be tackled with numerical methods whereas the latter one
can also be solved efficiently with approaches for discrete problems.
1.3 Classes of Optimization Algorithms
In this book, we will only be able to discuss a small fraction of the wide variety of Global
Optimization techniques [1068, 2126]. Before digging any deeper into the matter, we will
attempt to classify some of these algorithms in order to give a rough overview of what will
come in Part III and Part IV. Figure 1.7 sketches a rough taxonomy of Global Optimization
methods. Please be aware that the idea of creating a precise taxonomy of optimization
methods makes no sense. Not only are there many hybrid approaches which combine several
different techniques, there also exist various definitions for what things like metaheuristics,
artificial intelligence, Evolutionary Computation, Soft Computing, and so on exactly are
and which specific algorithms do belong to them and which not. Furthermore, many of
these fuzzy collective terms also widely overlap. The area of Global Optimization is a living
and breathing research discipline. Many contributions come from scientists from various
research areas, ranging from physics to economics and from biology to the fine arts. Equally important are the numerous ideas developed by practitioners in engineering, industry, and business. As stated before, Figure 1.7 should thus be considered as just a rough
sketch, as one possible way to classify optimizers according to their algorithm structure. In
the remainder of this section, we will try to outline some of the terms used in Figure 1.7.
1.3.1 Algorithm Classes
Generally, optimization algorithms can be divided into two basic classes: deterministic and
probabilistic algorithms.
1.3.1.1 Deterministic Algorithms
Definition D1.12 (Deterministic Algorithm). In each execution step of a determinis-
tic algorithm, there exists at most one way to proceed. If no way to proceed exists, the
algorithm has terminated.
For the same input data, deterministic algorithms will always produce the same results.
This means that whenever they have to make a decision on how to proceed, they will come
to the same decision each time the available data is the same.
Figure 1.7: Overview of optimization algorithms. Roughly, the figure divides Optimization [609, 2126, 2181, 2714] into Probabilistic Approaches and Deterministic Approaches [1103, 1271, 2040, 2126]. The probabilistic branch covers the Metaheuristics of Part III [1068, 1887]: Hill Climbing (Chapter 26 [2356]), Simulated Annealing (Chapter 27 [1541, 2043]), Tabu Search (Chapter 40 [1069, 1228]), Extremal Optimization (Chapter 41 [345, 725]), GRASP (Chapter 42 [902, 906]), Downhill Simplex (Chapter 43 [1655, 2017]), Random Optimization (Chapter 44 [526, 2926]), Harmony Search [1033], and Evolutionary Computation [715, 847, 863, 1220, 3025, 3043]. Evolutionary Computation in turn comprises the Evolutionary Algorithms (Chapter 28 [167, 171, 599]) with Genetic Algorithms (Chapter 29 [1075, 1912]), Evolution Strategies (Chapter 30 [300, 2279]), Genetic Programming (Chapter 31 [1602, 2199]; SGP, Section 31.3 [1602, 1615]; LGP, Section 31.4 [209, 394]; GGGP, Section 31.5 [1853, 2358]; graph-based GP, Section 31.6 [1901, 2191]), Differential Evolution (Chapter 33 [2220, 2621]), EDAs (Chapter 34 [1683, 2153]), Evolutionary Programming (Chapter 32 [940, 945]), and LCS (Chapter 35 [430, 1256]), as well as Swarm Intelligence [353, 881, 1524] with ACO (Chapter 37 [809, 810]), RFD (Chapter 38 [229, 230]), and PSO (Chapter 39 [589, 853]), and the Memetic Algorithms (Chapter 36 [1141, 1961]). The deterministic branch covers the Search Algorithms (Chapter 46 [635, 2356]), namely Uninformed Search (Section 46.2 [1557, 2140]) with BFS (Section 46.2.1), DFS (Section 46.2.2), and IDDFS (Section 46.2.4 [2356]), and Informed Search (Section 46.3 [1557, 2140]) with Greedy Search (Section 46.3.1 [635]) and A⋆ Search (Section 46.3.2 [758]), as well as Branch & Bound (Chapter 47 [673, 1663]) and the Cutting-Plane Method (Chapter 48 [152, 643]).
Example E1.7 (Decision based on Objective Value).
Let us assume, for example, that the optimization process has to select one of the two candidate solutions x_1 and x_2 for further investigation. The algorithm could base its decision on the value which the objective function f subject to minimization takes on. If f(x_1) < f(x_2), then x_1 is chosen and x_2 otherwise. For any two points x_i, x_j ∈ X, the decision procedure is the same and deterministically chooses the candidate solution with the better objective value.
Deterministic algorithms are thus used when decisions can efficiently be made on the data
available to the optimization process. This is the case if a clear relation between the char-
acteristics of the possible solutions and their utility for a given problem exists. Then, the
search space can efficiently be explored with, for example, a divide and conquer scheme¹⁴.
1.3.1.2 Probabilistic Algorithms
There may be situations, however, where deterministic decision making may prevent the
optimizer from finding the best results. If the relation between a candidate solution and its
“fitness” is not obvious, dynamically changing, too complicated, or the scale of the search
space is very high, applying many of the deterministic approaches becomes less efficient and
often infeasible. Then, probabilistic optimization algorithms come into play. Here lies the
focus of this book.
Definition D1.13 (Probabilistic Algorithm). A randomized¹⁵ (or probabilistic or stochastic) algorithm includes at least one instruction that acts on the basis of random numbers. In other words, a probabilistic algorithm violates the constraint of determinism [1281, 1282, 1919, 1966, 1967].
Generally, exact algorithms can be much more efficient than probabilistic ones in many
domains. Probabilistic algorithms have the further drawback of not being deterministic,
i. e., they may produce different results in each run even for the same input. Yet, in situations like
the one described in Example E1.8, randomized decision making can be very advantageous.
Example E1.8 (Probabilistic Decision Making (Example E1.7 Cont.)).
In Example E1.7, we described a possible decision step in a deterministic optimization algorithm, similar to a statement such as if f(x_1) < f(x_2) then investigate(x_1); else investigate(x_2);. Although this decision may be the right one in most cases, it may be possible that sometimes, it makes sense to investigate a less fit candidate solution. Hence, it could be a good idea to rephrase the decision to “if f(x_1) < f(x_2), then choose x_1 with 90% probability, but in 10% of the situations, continue with x_2”. In the case that f(x_1) < f(x_2) actually holds, we could draw a random number uniformly distributed in [0, 1) and, if this number is less than 0.9, we would pick x_1 and otherwise investigate x_2. As a matter of fact, exactly this kind of randomized decision making distinguishes the Simulated Annealing algorithm (Chapter 27), which has been proven to be able to always find the global optimum (possibly in infinite time), from simple Hill Climbing (Chapter 26), which does not have this feature.
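A minimal sketch of such a randomized decision step could look as follows in Java; the fixed 90/10 split and the names choose and f are illustrative assumptions (Simulated Annealing itself uses a temperature-dependent acceptance probability instead of a constant one):

import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the randomized decision from Example E1.8:
// usually continue with the better candidate, but sometimes accept the worse one.
final class RandomizedChoice {
  static double choose(double x1, double x2, DoubleUnaryOperator f, Random random) {
    double better = (f.applyAsDouble(x1) < f.applyAsDouble(x2)) ? x1 : x2;
    double worse = (better == x1) ? x2 : x1;
    // pick the better candidate with 90% probability, the worse one with 10%
    return (random.nextDouble() < 0.9) ? better : worse;
  }

  public static void main(String[] args) {
    DoubleUnaryOperator f = x -> x * x;               // a simple objective, minimal at 0
    Random random = new Random(42);
    System.out.println(choose(1.0, 2.0, f, random));  // prints 1.0 in roughly 90% of runs
  }
}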
Early work in this area, which has now become one of the most important research fields in optimization, was started at least sixty years ago. The foundations of many of the efficient methods currently in use were laid by researchers such as Bledsoe and Browning [327], Friedberg [995], Robbins and Monro [2313], and Bremermann [407].
¹⁴ http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm [accessed 2007-07-09]
¹⁵ http://en.wikipedia.org/wiki/Randomized_algorithm [accessed 2007-07-03]
1.3.2 Monte Carlo Algorithms
Most probabilistic algorithms used for optimization are Monte Carlo-based approaches (see Section 12.4). They trade in the guaranteed global optimality of the solutions for a shorter runtime. This does not mean that the results obtained by using them are incorrect – they may just not be the global optima. Still, a result a little bit inferior to the best possible solution x⋆ is still better than waiting 10^100 years until x⋆ is found¹⁶. . .
An umbrella term closely related to Monte Carlo algorithms is Soft Computing¹⁷ (SC), which summarizes all methods that find solutions for computationally hard tasks (such as NP-complete problems, see Section 12.2.2) in relatively short time, again by trading in optimality for higher computation speed [760, 1242, 1430, 2685].
1.3.3 Heuristics and Metaheuristics
Most efficient optimization algorithms which search for good individuals in some targeted way, i. e., all methods better than uninformed searches, utilize information from objective (see Section 2.2) or heuristic functions. The term heuristic stems from the Greek heuriskein, which translates to to find or to discover.
1.3.3.1 Heuristics
Heuristics used in optimization are functions that rate the utility of an element and provide support for the decision regarding which element out of a set of possible solutions is to be examined next. Deterministic algorithms usually employ heuristics to define the processing
order of the candidate solutions. Approaches using such functions are called informed search
methods (which will be discussed in Section 46.3 on page 518). Many probabilistic methods,
on the other hand, only consider those elements of the search space in further computations
that have been selected by the heuristic while discarding the rest.
Definition D1.14 (Heuristic). A heuristic¹⁸ [1887, 2140, 2276] function φ is a problem-dependent approximate measure for the quality of one candidate solution.
A heuristic may, for instance, estimate the number of search steps required from a given
solution to the optimal results. In this case, the lower the heuristic value, the better the
solution would be. It is quite clear that heuristics are problem class dependent. They can be
considered as the equivalent of fitness or objective functions in (deterministic) state space
search methods [1467] (see Section 46.3 and the more specific Definition D46.5 on page 518).
An objective function f(x) usually has a clear meaning denoting an absolute utility of a
candidate solution x. Different from that, heuristic values φ(p) often represent more indirect
criteria such as the aforementioned search costs required to reach the global optimum [1467]
from a given individual.
Example E1.9 (Heuristic for the TSP from Example E2.2).
A heuristic function in the Traveling Salesman Problem involving five cities in China as
defined in Example E2.2 on page 45 could, for instance, state “The total distance of this
candidate solution is 1024km longer than the optimal tour.” or “You need to swap no more
than three cities in this tour in order to find the optimal solution”. Such a statement has
a different quality from an objective function just stating the total tour length and being
subject to minimization.
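To contrast the two notions, a sketch of what an objective function and a heuristic for a TSP tour might look like in Java follows; the tour representation, the distance matrix dist, and in particular the knownLowerBound used by the heuristic are purely illustrative assumptions:

// Hypothetical sketch: objective function versus heuristic for a TSP tour.
final class TspMeasures {
  // Objective function: the absolute utility of a tour, i. e., its total length.
  static double tourLength(int[] tour, double[][] dist) {
    double length = 0.0;
    for (int i = 0; i < tour.length; i++) {
      length += dist[tour[i]][tour[(i + 1) % tour.length]]; // close the round trip
    }
    return length;
  }

  // Heuristic: an indirect estimate of how far the tour still is from the optimum,
  // here the gap to an assumed lower bound on the optimal tour length.
  static double heuristic(int[] tour, double[][] dist, double knownLowerBound) {
    return tourLength(tour, dist) - knownLowerBound;
  }
}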
¹⁶ Kindly refer to Chapter 12 for a more in-depth discussion of complexity.
¹⁷ http://en.wikipedia.org/wiki/Soft_computing [accessed 2007-09-17]
¹⁸ http://en.wikipedia.org/wiki/Heuristic_%28computer_science%29 [accessed 2007-07-03]
We will discuss heuristics in more depth in Section 46.3. It should be mentioned that optimization and artificial intelligence¹⁹ (AI) merge seamlessly with each other. This becomes especially clear when considering that one of the most important research areas of AI is the design of good heuristics for search, especially for the state space search algorithms which are here listed as deterministic optimization methods.
Anyway, knowing the proximity of the meaning of the terms heuristic and objective function, it may as well be possible to consider many of the following, randomized approaches to be informed searches (which brings us back to the fuzziness of terms). There is, however, another difference between heuristics and objective functions: heuristics usually involve more knowledge about the relation between the structure of the candidate solutions and their utility, as may have become clear from Example E1.9.
1.3.3.2 Metaheuristics
This knowledge is a luxury often not available. In many cases, the optimization algorithms
have a “horizon” limited to the search operations and the objective functions with unknown
characteristics. They can produce new candidate solutions by applying search operators to
the currently available genotypes. Yet, the exact result of the operators is (especially in
probabilistic approaches) not clear in advance and has to be measured with the objective
functions as sketched in Figure 1.8. Then, the only choices an optimization algorithm has
are:
1. Which candidate solutions should be investigated further?
2. Which search operations should be applied to them? (If more than one operator is
available, that is.)
Figure 1.8: Black box treatment of objective functions in metaheuristics: the objective function f(x ∈ X) is evaluated as a black box for the candidate solutions in the current population Pop(t); based on the computed fitness or heuristic values, the best solutions are selected and new ones are created via search operations, yielding the next population Pop(t + 1).
¹⁹ http://en.wikipedia.org/wiki/Artificial_intelligence [accessed 2007-09-17]
Definition D1.15 (Metaheuristic). A metaheuristic²⁰ is a method for solving general classes of problems. It combines utility measures such as objective functions or heuristics in an abstract and hopefully efficient way, usually without utilizing deeper insight into their structure, i. e., by treating them as black box-procedures [342, 1068, 1103].
Metaheuristics, which you can find discussed in Part III, obtain knowledge on the struc-
ture of the problem space and the objective functions by utilizing statistics obtained from
stochastically sampling the search space. Many metaheuristics are inspired by a model of
some natural phenomenon or physical process. Simulated Annealing, for example, decides
which candidate solution is used during the next search step according to the Boltzmann
probability factor of atom configurations of solidifying metal melts. The Raindrop Method
emulates the waves on a sheet of water after a raindrop fell into it. There also are tech-
niques without direct real-world role model like Tabu Search and Random Optimization.
Unified models of metaheuristic optimization procedures have been proposed by Osman
[2103], Rayward-Smith [2275], Vaessens et al. [2761, 2762], and Taillard et al. [2650].
Obviously, metaheuristics are not necessarily probabilistic methods. The A⋆ search, for instance, a deterministic informed search approach, is considered to be a metaheuristic too (although it incorporates more knowledge about the structure of the heuristic functions than, for instance, Evolutionary Algorithms do). Still, in this book, we are mostly interested in probabilistic metaheuristics.
1.3.3.2.1 Summary: Difference between Heuristics and Metaheuristics
The terms heuristic, objective function, fitness function (which will be introduced in detail in Section 28.1.2.5 and Section 28.3 on page 274), and metaheuristic may easily be mixed up and it is maybe not entirely obvious which is which and where the difference lies. Therefore, we here want to give a short outline of the main differences:
1. Heuristic. A problem-specific measure of the potential of a solution as a step towards the global optimum. It may be absolute/direct (describing the utility of a solution) or indirect (e.g., approximating the distance of the solution to the optimum in terms of search steps).
2. Objective Function. A problem-specific absolute/direct quality measure which rates one aspect of a candidate solution.
3. Fitness Function. A secondary utility measure which may combine information from a set of primary utility measures (such as objective functions or heuristics) into a single rating for an individual. This process may also take place relative to a set of given individuals, i. e., rate the utility of a solution in comparison to another set of solutions (such as a population in Evolutionary Algorithms).
4. Metaheuristic. A general, problem-independent (tertiary) approach to select solutions for further investigation in order to solve an optimization problem. It therefore uses a (primary or secondary) utility measure such as a heuristic, objective function, or fitness function.
1.3.4 Evolutionary Computation
An important class of probabilistic Monte Carlo algorithms is Evolutionary Computation²¹ (EC). It encompasses all randomized metaheuristic approaches which iteratively refine a set (population) of multiple candidate solutions.
²⁰ http://en.wikipedia.org/wiki/Metaheuristic [accessed 2007-07-03]
²¹ http://en.wikipedia.org/wiki/Evolutionary_computation [accessed 2007-09-17]
Some of the most important EC approaches are Evolutionary Algorithms and Swarm
Intelligence, which will be discussed in depth in this book. Swarm Intelligence is inspired by the fact that natural systems composed of many independent, simple agents – ants in the case of Ant Colony Optimization (ACO) and birds or fish in the case of Particle Swarm Optimization (PSO) – are often able to find pieces of food or shortest-distance routes very efficiently.
Evolutionary Algorithms, on the other hand, copy the process of natural evolution and
treat candidate solutions as individuals which compete and reproduce in a virtual envi-
ronment. Generation after generation, these individuals adapt better and better to the
environment (defined by the user-specified objective function(s)) and thus, become more
and more suitable solutions to the problem at hand.
1.3.5 Evolutionary Algorithms
Evolutionary Algorithms can again be divided into very diverse approaches, spanning from Genetic Algorithms (GAs), which try to find bit- or integer-strings of high utility (see, for instance, Example E4.2 on page 83), and Evolution Strategies (ESs), which do the same with vectors of real numbers, to Genetic Programming (GP), where optimal tree data structures or programs are sought. The variety of Evolutionary Algorithms is the focus of this book, also including Memetic Algorithms (MAs), which are EAs hybridized with other search algorithms.
1.4 Classification According to Optimization Time
The taxonomy just introduced classifies the optimization methods according to their algo-
rithmic structure and underlying principles, in other words, from the viewpoint of theory. A
software engineer or a user who wants to solve a problem with such an approach is however
more interested in its “interfacing features” such as speed and precision.
Speed and precision are often conflicting objectives, at least in terms of probabilistic
algorithms. A general rule of thumb is that you can gain improvements in accuracy of
optimization only by investing more time. Scientists in the area of global optimization
try to push this Pareto frontier²² further by inventing new approaches and enhancing or
tweaking existing ones. When it comes to time constraints and hence, the required speed of
the optimization algorithm, we can distinguish two types of optimization use cases.
Definition D1.16 (Online Optimization). Online optimization problems are tasks that
need to be solved quickly in a time span usually ranging between ten milliseconds to a few
minutes.
In order to find a solution in this short time, optimality is often significantly traded in
for speed gains. Examples for online optimization are robot localization, load balancing,
services composition for business processes, or updating a factory’s machine job schedule
after new orders came in.
Example E1.10 (Robotic Soccer).
Figure 1.9 shows some of the robots of the robotic soccer team²³ [179] as they play in the
RoboCup German Open 2009. In the games of the RoboCup competition, two teams, each
consisting of a fixed number of robotic players, oppose each other on a soccer field. The
rules are similar to human soccer and the team which scores the most goals wins.
When a soccer playing robot approaches the goal of the opposing team with the ball, it
has only a very limited time frame to decide what to do [2518]. Should it shoot directly at
the goal? Should it try to pass the ball to a team mate? Should it continue to drive towards the goal in order to get into a better shooting position?
Figure 1.9: The robotic soccer team CarpeNoctem (on the right side) on the fourth day of the RoboCup German Open 2009.
²² Pareto frontiers have been discussed in Section 3.3.5 on page 65.
²³ http://carpenoctem.das-lab.net
To find a sequence of actions which maximizes the chance of scoring a goal in a contin-
uously changing world is an online optimization problem. If a robot takes too long to make
a decision, opposing and cooperating robots may already have moved to different locations
and the decision may have become outdated.
Not only must the problem be solved in a few milliseconds, this process often involves
finding an agreement between different, independent robots each having a possibly different
view on what is currently going on in the world. Additionally, the degrees of freedom in the
real world, the points where the robot can move or shoot the ball to, are basically infinite.
It is clear that in such situations, no complex optimization algorithm can be applied.
Instead, only a smaller set of possible alternatives can be tested or, alternatively, predefined
programs may be processed, thus ignoring the optimization character of the decision making.
Online optimization tasks are often carried out repetitively – new orders will, for instance,
continuously arrive in a production facility and need to be scheduled to machines in a way
that minimizes the waiting time of all jobs.
Definition D1.17 (Offline Optimization). In offline optimization problems, time is not
so important and a user is willing to wait maybe up to days or weeks if she can get an
optimal or close-to-optimal result.
Such problems include, for example, design optimization, some data mining tasks, or creating
long-term schedules for transportation crews. These optimization processes will usually be
carried out only once in a long time.
Before doing anything else, one must be sure to which of these two classes the problem to be solved belongs. There are two measures which characterize whether an optimization algorithm is suited to the timing constraints of a problem:
1. The time consumption of the algorithm itself.
2. The number of evaluations τ until the optimum (or a sufficiently good candidate
solution) is discovered, i. e., the first hitting time FHT (or its expected value).
An algorithm with high internal complexity and a low FHT may be suitable for a problem
where evaluating the objective functions takes long. On the other hand, it may be less
feasible than a simple approach if the quality of a candidate solution can be determined
quickly.
1.5 Number of Optimization Criteria
Optimization algorithms can be divided into
1. those which try to find the best values of single objective functions f (see Section 3.1 on page 51),
2. those which optimize sets f of target functions (see Section 3.3 on page 55), and
3. those which optimize one or multiple objective functions while adhering to additional constraints (see Section 3.4 on page 70).
This distinction between single-objective, multi-objective, and constrained optimization will be discussed in depth in Section 3.3. There, we will also introduce unifying approaches which allow us to transform multi-objective problems into single-objective ones and to gain a joint perspective on constrained and unconstrained optimization.
1.6 Introductory Example
In Algorithm 1.1, we want to give a short introductory example of what metaheuristic optimization algorithms may look like and of which modules they can be composed. Algorithm 1.1 is based on the bin packing Example E1.1 given on page 25. It should be noted that we do not claim that Algorithm 1.1 is any good at solving the problem; as a matter of fact, the opposite is the case. However, it possesses typical components which can be found in most metaheuristics and can relatively easily be understood.
As stated in Example E1.1, the goal of bin packing is to determine whether an assignment x of n objects (having the sizes a_1, a_2, .., a_n) to (at most) k bins, each able to hold a volume of b, exists such that the capacity of no bin is exceeded. This decision problem can be translated into an optimization problem where it is the goal to discover the x⋆ where the bin capacities are exceeded the least, i. e., to minimize the objective function f_bp given in Equation 1.2 on page 26. Objective functions are discussed in Section 2.2; in Algorithm 1.1, f_bp is evaluated in Line 23.
Obviously, if f_bp(x⋆) = 0, the answer to the decision problem is “Yes, an assignment which does not violate the constraints exists.” and otherwise, the answer is “No, it is not possible to pack the n specified objects into the k bins of size b.”
For f
bp
in Example E1.1, we represented a possible assignment of the objects to bins as
a tuple x consisting of k sets. Each set can hold numbers from 1..n. If, for one specific x,
5 ∈ x[0], for example, then the fifth object should be placed into the first bin and 3 ∈ x[3]
means that the third object is to be put into the fourth bin. The set X of all such possible
solutions (s) x is called problem space and discussed in Section 2.1.
Instead of exploring this space directly, Example E1.1 uses a different search space G
(see Section 4.1). Each of these encoded candidate solutions, each so-called genotype, is a
permutation of the first n natural numbers. The sequence of these numbers stands for the
assignment order of objects to bins.
This representation is translated between Lines 13 and 22 to the tuple-of-sets representa-
tion which can be evaluated by the objective function f
bp
. Basically, this genotype-phenotype
mapping gpm : G →X (see Section 4.3) starts with a tuple of k empty sets and sequentially
processes the genotypes g from the beginning to the end. In each processing step, it takes
the next number from the genotype and checks whether the object identified by this number
still fits into the current bin bin. If so, the number is placed into the set in the tuple x at
index bin. Otherwise, if there are empty bins left, bin is increased. This way, the first k −1
bins are always filled to at most their capacity and only the last bin may overflow. The
translation between the genotypes g ∈ G and the candidate solutions x ∈ X is deterministic.
It is known that bin packing is AT-hard, which means that there currently exists no
algorithm which is always better than enumerating all possible solutions and testing each
[Annotations to Algorithm 1.1, naming its building blocks:
  initialization: nullary search operation searchOp : ∅ → G (see Section 4.2);
  search step: unary search operation searchOp : G → G (see Section 4.2);
  genotype-phenotype mapping gpm : G → X (see Section 4.3);
  objective function f : X → R (see Section 2.2);
  “What is good?” (see Chapter 3 and Section 5.2);
  termination criterion (see Section 6.3.3)]
Algorithm 1.1: x ←− optimizeBinPacking(k, b, a)   (see Example E1.1)
  Input: k ∈ N₁: the number of bins
  Input: b ∈ N₁: the size of each single bin
  Input: a₁, a₂, .., aₙ ∈ N₁: the sizes of the n objects
  Data: g, g⋆ ∈ G: the current and the best genotype so far (see Definition D4.2)
  Data: x ∈ X: the current and best phenotype so far (see Definition D2.2)
  Data: y, y⋆ ∈ R⁺: the current and the best objective value (see Definition D2.3)
  Output: x ∈ X: the best phenotype found (see Definition D3.5)

 1  begin
 2    g⋆ ←− (1, 2, .., n)
 3    y⋆ ←− +∞
 4    repeat
        // perform a random search step
 5      g ←− g⋆
 6      i ←− ⌊randomUni[1, n)⌋
 7      repeat
 8        j ←− ⌊randomUni[1, n)⌋
 9      until i ≠ j
10      t ←− g[i]
11      g[i] ←− g[j]
12      g[j] ←− t
        // compute the bins (genotype-phenotype mapping)
13      x ←− (∅, ∅, . . . , ∅)   [k empty sets]
14      bin ←− 0
15      sum ←− 0
16      for i ←− 0 up to n − 1 do
17        j ←− g[i]
18        if ((sum + a_j) > b) ∧ (bin < (k − 1)) then
19          bin ←− bin + 1
20          sum ←− 0
21        x[bin] ←− x[bin] ∪ {j}
22        sum ←− sum + a_j
        // compute the objective value
23      y ←− f_bp(x)
        // the search logic: always choose the better point
24      if y < y⋆ then
25        g⋆ ←− g
26        x ←− x   // remember the current phenotype as the best one found so far
27        y⋆ ←− y
28    until y⋆ = 0
29    return x
Our Algorithm 1.1 tries to explore the search space by starting with putting the objects into the bins in their original order (Line 2). In each optimization step, it randomly exchanges two objects (Lines 5 to 12). This operation basically is a unary search operation which maps elements of the search space G to new elements in G and creates new genotypes g. The initialization step can be viewed as a nullary search operation. The basic definitions for search operations are given in Section 4.2 on page 83.
In Line 24, the optimizer compares the objective value of the best assignment found so far with the objective value of the candidate solution x decoded from the newly created genotype g with the genotype-phenotype mapping “gpm”. If the new phenotype x is better than the best one known so far, the search will from now on use its corresponding genotype (g⋆ ←− g) for creating new candidate solutions with the search operation and remembers x as well as its objective value y. A discussion of what the term better actually means in optimization is given in Chapter 3.
The search process in Algorithm 1.1 continues until a perfect packing, i. e., one with no overflowing bin, is discovered. This termination criterion (see Section 6.3.3), checked in Line 28, is not chosen too wisely, since if no perfect packing exists, the algorithm will run forever. Effectively, this Hill Climbing approach (see Chapter 26 on page 229) is thus a Las Vegas algorithm (see Definition D12.2 on page 148).
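To make the structure more tangible, the following is a minimal Java sketch of Algorithm 1.1. The class, method, and variable names are ours, and the concrete form of f_bp is only a stand-in (total capacity violation over all bins), since Equation 1.2 is not repeated here. For brevity, the sketch returns only the bin loads; reconstructing the full tuple-of-sets phenotype works the same way.

    import java.util.Arrays;
    import java.util.Random;

    /** A minimal sketch of Algorithm 1.1: a random hill climber for bin packing.
     *  Objects are numbered 0..n-1 here; the genotype g is a permutation of these numbers. */
    public class BinPackingHillClimber {

      /** Genotype-phenotype mapping (Lines 13 to 22): decode a permutation into the k bin loads. */
      static int[] binLoads(int[] g, int[] a, int k, int b) {
        int[] load = new int[k];
        int bin = 0;
        for (int obj : g) {
          // if the object does not fit and empty bins remain, advance to the next bin
          if ((load[bin] + a[obj] > b) && (bin < k - 1)) bin++;
          load[bin] += a[obj];
        }
        return load;
      }

      /** Stand-in for f_bp (Equation 1.2): total capacity violation over all bins, 0 = feasible. */
      static int fbp(int[] load, int b) {
        int overflow = 0;
        for (int l : load) overflow += Math.max(0, l - b);
        return overflow;
      }

      /** The hill climber (Lines 1 to 29); as discussed in the text, it never
       *  terminates if no perfect packing exists. */
      public static int[] optimizeBinPacking(int k, int b, int[] a) {
        Random rnd = new Random();
        int n = a.length;
        int[] gBest = new int[n];
        for (int i = 0; i < n; i++) gBest[i] = i;   // Line 2: start with the identity permutation
        int yBest = Integer.MAX_VALUE;              // Line 3
        int[] xBest = null;
        while (yBest != 0) {                        // Lines 4 and 28
          int[] g = gBest.clone();                  // Line 5
          int i = rnd.nextInt(n), j;                // Lines 6 to 9: pick two distinct positions
          do { j = rnd.nextInt(n); } while (j == i);
          int t = g[i]; g[i] = g[j]; g[j] = t;      // Lines 10 to 12: swap the two objects
          int[] loads = binLoads(g, a, k, b);       // Lines 13 to 22
          int y = fbp(loads, b);                    // Line 23
          if (y < yBest) {                          // Lines 24 to 27: keep strictly better points
            gBest = g; yBest = y; xBest = loads;
          }
        }
        return xBest;                               // Line 29: loads of a perfect packing
      }

      public static void main(String[] args) {
        // a tiny instance: two bins of size 5, objects of sizes 3, 2, 4, 1 (perfectly packable)
        System.out.println(Arrays.toString(optimizeBinPacking(2, 5, new int[] {3, 2, 4, 1})));
      }
    }

Task 4 below asks you to apply such an implementation to larger instances and to observe where it struggles.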
In the following chapters, we will put a simple and yet comprehensive theory around
the structure of metaheuristics. We will give clear definitions for the typical modules of an
optimization algorithm and outline how they interact. Algorithm 1.1 served here as an example to show the general direction in which we will be going.
Tasks T1
1. Name five optimization problems which can be encountered in daily life. Describe the criteria which are subject to optimization, whether they should be maximized or minimized, and what the investigated space of possible solutions looks like.
[10 points]
2. Name five optimization problems which can be encountered in engineering, business,
or science. Describe them in the same way as requested in Task 1.
[10 points]
3. Classify the problems from Tasks 1 and 2 into numerical, combinatorial, discrete, and
mixed integer programming, the classes introduced in Section 1.2 and sketched in Fig-
ure 1.6 on page 30. Check if different classifications are possible for the problems and
if so, outline how the problems should be formulated in each case. The truss opti-
mization problem (see Example E1.6 on page 29) is, for example, usually considered
to be a numerical problem since the thickness of each beam is a real value. If there
is, however, a finite set of beam thicknesses we can choose from, it becomes a discrete
problem and could even be considered as a combinatorial one.
[10 points]
4. Implement Algorithm 1.1 on page 40 in a programming language of your choice. Apply
your implementation to the bin packing instance given in Figure 1.2 on page 25 and
to the problem k = 5, b = 6, a = (6, 5, 4, 3, 3, 3, 2, 2, 1) and (hence) n = 9. Does the
algorithm work? Describe your observations, use a step-by-step debugger if necessary.
[20 points]
5. Find at least three features which make Algorithm 1.1 on page 40 rather unsuitable for
practical applications. Describe these drawbacks, their impact, and point out possible
remedies. You may use an implementation of the algorithm as requested in Task 4 for
gaining experience with what is problematic for the algorithm.
[6 points]
6. What are the advantages and disadvantages of randomized/probabilistic optimization
algorithms?
[4 points]
7. In which way do the requirements for online and offline optimization differ? In your opinion, does a need for online optimization influence the choice of the optimization algorithm and the general algorithm structure which can be used?
[5 points]
8. What are the differences between solving general optimization problems and specific ones? Discuss especially the difference between black box optimization and optimizing functions which are completely known and can maybe even be differentiated.
[5 points]
Chapter 2
Problem Space and Objective Functions
2.1 Problem Space: What does it look like?
The first step which has to be taken when tackling any optimization problem is to define the structure of the possible solutions. For minimizing a simple mathematical function, the solution will be a real number, and if we wish to determine the optimal shape of an airplane’s wing, we are looking for a blueprint describing it.
Definition D2.1 (Problem Space). The problem space (phenome) X of an optimization
problem is the set containing all elements x which could be its solution.
Usually, more than one problem space can be defined for a given optimization problem. A few lines before, we said that, as the problem space for minimizing a mathematical function, the real numbers R would be fine. On the other hand, we could as well restrict ourselves to the natural numbers N₀ or widen the search to the whole complex plane C. This choice has a major impact: On one hand, it determines which solutions we can possibly find. On the other hand, it also has a subtle influence on the way in which an optimization algorithm can perform its search. Between any two different points in R, for instance, there are infinitely many other numbers, while in N₀, there are not. Then again, unlike N₀ and R, there is no total order defined on C.
Definition D2.2 (Candidate Solution). A candidate solution (phenotype) x is an ele-
ment of the problem space X of a certain optimization problem.
In this book, we will especially focus on evolutionary optimization algorithms. One of the oldest such methods, the Genetic Algorithm, is inspired by natural evolution and re-uses many terms from biology. Following the terminology of Genetic Algorithms, we synonymously refer to the problem space as phenome and to candidate solutions as phenotypes. The problem space X is often restricted by practical constraints. It is, for instance, impossible to take all real numbers into consideration in the minimization process of a real function. On our off-the-shelf CPUs or with the Java programming language, we can only use 64 bit floating point numbers. With them, numbers can only be expressed up to a certain precision. If the computer can only handle numbers up to 15 decimals, an element x = 1 + 10⁻¹⁶ cannot be discovered.
2.2 Objective Functions: Is it good?
The next basic step when defining an optimization problem is to define a measure for what
is good. Such measures are called objective functions. As parameter, they take a candidate
solution x from the problem space X and return a value which rates its quality.
Definition D2.3 (Objective Function). An objective function f : X → R is a mathe-
matical function which is subject to optimization.
In Listing 55.8 on page 712, you can find a Java interface resembling Definition D2.3.
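Listing 55.8 itself is not reproduced in this chapter; as a rough sketch of our own (so the names may differ from the book’s sources), such an interface could look like this:

    /**
     * A sketch of an objective function in the sense of Definition D2.3: it maps a
     * candidate solution of type X to a real number; under minimization, smaller is better.
     */
    public interface ObjectiveFunction<X> {
      double compute(X candidate);
    }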
Usually, objective functions are subject to minimization. In other words, the smaller f(x),
the better is the candidate solution x ∈ X. Notice that although we state that an objective
function is a mathematical function, this is not always fully true. In Section 51.10 on
page 646 we introduce the concept of functions as functional binary relations (see ?? on
page ??) which assign to each element x from the domain X one and only one element from
the codomain (in case of the objective functions, the codomain is R). However, the evaluation
of the objective functions may incorporate randomized simulations (Example E2.3) or even
human interaction (Example E2.6). In such cases, the functionality constraint may be
violated. For the purpose of simplicity, let us, however, stick with the concept given in
Definition D2.3.
In the following, we will use several examples to build an understanding of what the
three concepts problem space X, candidate solution x, and objective function f mean. These
examples should give an impression of the variety of possible ways in which they can be specified and structured.
Example E2.1 (A Stone’s Throw).
A very simple optimization problem is the question: Which is the best velocity with which I should throw¹ a stone (at an angle of α = 15°) so that it lands exactly 100 m away (assuming uniform gravity and the absence of drag and wind)? Every one of us has solved countless such tasks in school and it probably never came to our mind to consider them as optimization problems.
[Figure: the stone is thrown at angle α, covers the distance d(x), and the deviation from the 100 m target is f(x)]
Figure 2.1: A sketch of the stone’s throw in Example E2.1.
The problem space X of this problem equals the real numbers R and we can define an objective function f : R → R which denotes the absolute difference between the actual throw distance d(x) and the target distance 100 m when a stone is thrown with x m/s, as sketched in Figure 2.1:

f(x) = |100 m − d(x)|    (2.1)
d(x) = (x²/g) ∗ sin(2α)    (2.2)
     ≈ 0.051 s²/m ∗ x²   (for g ≈ 9.81 m/s², α = 15°)    (2.3)

1 http://en.wikipedia.org/wiki/Trajectory [accessed 2009-06-13]
This problem can easily be solved and we can obtain the (globally) optimal solution x⋆, since we know that the best value which f can take on is zero:

f(x⋆) = 0 ≈ |100 m − 0.051 s²/m ∗ (x⋆)²|    (2.4)
0.051 s²/m ∗ (x⋆)² ≈ 100 m    (2.5)
x⋆ ≈ √(100 m / 0.051 s²/m) = √(1960.78 m²/s²) ≈ 44.28 m/s    (2.6)
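For illustration, here is a small Java sketch of ours (not part of the original text) which evaluates the objective function and the closed-form optimum directly:

    /** Evaluates the stone-throw objective of Example E2.1 and its closed-form optimum. */
    public class StoneThrow {
      static final double G = 9.81;                   // gravitational acceleration in m/s^2
      static final double ALPHA = Math.toRadians(15); // throw angle

      static double d(double x) { return x * x / G * Math.sin(2 * ALPHA); } // Equation 2.2
      static double f(double x) { return Math.abs(100.0 - d(x)); }          // Equation 2.1

      public static void main(String[] args) {
        double xStar = Math.sqrt(100.0 * G / Math.sin(2 * ALPHA)); // Equation 2.6, about 44.3 m/s
        System.out.printf("x* = %.2f m/s, f(x*) = %.6f%n", xStar, f(xStar));
      }
    }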
Example E2.1 was very easy. We had a very clear objective function f for which even the
precise mathematical definition was available and, if not known beforehand, could have
easily been obtained from literature. Furthermore, the space of possible solutions, the
problem space, was very simple since it equaled the (positive) real numbers.
Example E2.2 (Traveling Salesman Problem).
The Traveling Salesman Problem [118, 1124, 1694] was first mentioned in 1831 by Voigt
[2816] and a similar (historic) problem was stated by the Irish Mathematician Hamil-
ton [1024]. Its basic idea is that a salesman wants to visit n cities in the shortest possible
time under the assumption that no city is visited twice and that he will again arrive at the
origin by the end of the tour. The pairwise distances of the cities are known. This problem
can be formalized with the aid of graph theory (see Chapter 52) as follows:
Definition D2.4 (Traveling Salesman Problem). The goal of the Traveling Salesman Problem (TSP) is to find a cyclic path of minimum total weight² which visits all vertices of a weighted graph.
[Figure: a map of China showing Beijing, Shanghai, Hefei, Guangzhou, and Chengdu, together with the symmetric distance matrix w(a, b) (in km):
              Beijing   Chengdu   Guangzhou   Hefei
  Chengdu      1854
  Guangzhou    2174      1954
  Hefei        1044      1615       1257
  Shanghai     1244      2095       1529       472 ]
Figure 2.2: An example for Traveling Salesman Problems, based on data from GoogleMaps.
Assume that we have the task to solve the Traveling Salesman Problem given in Figure 2.2, i. e., to find the shortest cyclic tour through Beijing, Chengdu, Guangzhou, Hefei, and Shanghai. The (symmetric) distance matrix³ for computing a tour’s length is given in the figure, too.
Here, the set of possible solutions X is not as trivial as in Example E2.1. Instead of being a mapping from the real numbers to the real numbers, the objective function f takes
2 see Equation 52.3 on page 651
3 Based on the routing utility GoogleMaps (http://maps.google.com/ [accessed 2009-06-16]).
one possible permutation of the five cities as its parameter. Since the tours which we want to find are cyclic, we only need to take the permutations of four cities into consideration. If we, for instance, assume that the tour starts in Hefei, it will also end in Hefei and each possible solution is completely characterized by the sequence in which Beijing, Chengdu, Guangzhou, and Shanghai are visited. Hence, X is the set of all such permutations.

X = Π({Beijing, Chengdu, Guangzhou, Shanghai})    (2.7)

We can furthermore halve the search space by recognizing that it plays no role for the total distance in which direction we take our tour x ∈ X. In other words, a tour (Hefei → Shanghai → Beijing → Chengdu → Guangzhou → Hefei) has the same length as and is equivalent to (Hefei → Guangzhou → Chengdu → Beijing → Shanghai → Hefei). Hence, for a TSP with n cities, there are (1/2) ∗ (n − 1)! possible solutions. In our case, this amounts to 12 unique permutations.

f(x) = w(Hefei, x[0]) + ∑_{i=0}^{2} w(x[i], x[i+1]) + w(x[3], Hefei)    (2.8)

The objective function f : X → R – the criterion subject to optimization – is the total distance of the tour. It equals the sum of the distances between the consecutive cities x[i] in the list x representing the permutation, plus the distances to and from the start and end point Hefei. We again want to minimize it. Computing it is rather simple but, different from Example E2.1, we cannot simply solve it and compute the optimal result x⋆. As a matter of fact, the Traveling Salesman Problem is one of the most complicated optimization problems.
Here, we will determine x⋆ with a so-called brute force⁴ approach: We enumerate all possible solutions and their corresponding objective values and then pick the best one.
      the tour xᵢ                                                      f(xᵢ)
x₁    Hefei → Beijing → Chengdu → Guangzhou → Shanghai → Hefei      6853 km
x₂    Hefei → Beijing → Chengdu → Shanghai → Guangzhou → Hefei      7779 km
x₃    Hefei → Beijing → Guangzhou → Chengdu → Shanghai → Hefei      7739 km
x₄    Hefei → Beijing → Guangzhou → Shanghai → Chengdu → Hefei      8457 km
x₅    Hefei → Beijing → Shanghai → Chengdu → Guangzhou → Hefei      7594 km
x₆    Hefei → Beijing → Shanghai → Guangzhou → Chengdu → Hefei      7386 km
x₇    Hefei → Chengdu → Beijing → Guangzhou → Shanghai → Hefei      7644 km
x₈    Hefei → Chengdu → Beijing → Shanghai → Guangzhou → Hefei      7499 km
x₉    Hefei → Chengdu → Guangzhou → Beijing → Shanghai → Hefei      7459 km
x₁₀   Hefei → Chengdu → Shanghai → Beijing → Guangzhou → Hefei      8385 km
x₁₁   Hefei → Guangzhou → Beijing → Chengdu → Shanghai → Hefei      7852 km
x₁₂   Hefei → Guangzhou → Chengdu → Beijing → Shanghai → Hefei      6781 km
Table 2.1: The unique tours for Example E2.2.
From Table 2.1 we can easily spot the best solution x⋆ = x₁₂: the route over Guangzhou, Chengdu, Beijing, and Shanghai, both starting and ending in Hefei. With a total distance of f(x₁₂) = 6781 km, it is more than 1500 km shorter than the worst solution x₄.
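Such an enumeration is also easy to program. The following Java sketch is ours (the distance values are those from Figure 2.2); it enumerates all 24 permutations of the four intermediate cities and prints the best tour:

    import java.util.ArrayList;
    import java.util.List;

    /** Brute-force enumeration for the five-city TSP of Example E2.2 (illustrative sketch). */
    public class TspBruteForce {
      static final String[] CITY = {"Beijing", "Chengdu", "Guangzhou", "Hefei", "Shanghai"};
      // symmetric distance matrix in km from Figure 2.2, ordered as in CITY
      static final int[][] W = {
          {   0, 1854, 2174, 1044, 1244},
          {1854,    0, 1954, 1615, 2095},
          {2174, 1954,    0, 1257, 1529},
          {1044, 1615, 1257,    0,  472},
          {1244, 2095, 1529,  472,    0}};

      /** Tour length for a permutation of the intermediate cities, starting/ending in Hefei (index 3). */
      static int tourLength(int[] perm) {
        int len = W[3][perm[0]];
        for (int i = 0; i + 1 < perm.length; i++) len += W[perm[i]][perm[i + 1]];
        return len + W[perm[perm.length - 1]][3];
      }

      public static void main(String[] args) {
        int[] cities = {0, 1, 2, 4};              // Beijing, Chengdu, Guangzhou, Shanghai
        int bestLen = Integer.MAX_VALUE;
        int[] best = null;
        for (int[] p : permutations(cities)) {    // enumerate all 4! = 24 tours
          int len = tourLength(p);
          if (len < bestLen) { bestLen = len; best = p; }
        }
        StringBuilder sb = new StringBuilder("Hefei");
        for (int c : best) sb.append(" -> ").append(CITY[c]);
        System.out.println(sb + " -> Hefei: " + bestLen + " km");
      }

      /** All permutations of the given array (simple recursive generation). */
      static List<int[]> permutations(int[] a) {
        List<int[]> out = new ArrayList<>();
        permute(a.clone(), 0, out);
        return out;
      }

      static void permute(int[] a, int k, List<int[]> out) {
        if (k == a.length) { out.add(a.clone()); return; }
        for (int i = k; i < a.length; i++) {
          int t = a[k]; a[k] = a[i]; a[i] = t;
          permute(a, k + 1, out);
          t = a[k]; a[k] = a[i]; a[i] = t;
        }
      }
    }

Note that the enumeration visits both traversal directions of each cyclic tour, which is exactly the redundancy exploited by the halving argument above.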
Enumerating all possible solutions is, of course, the most time-consuming optimization approach possible. For TSPs with more cities, it quickly becomes infeasible since the number of solutions grows more than exponentially ((n/3)ⁿ ≤ n!). Furthermore, in the simple Example E2.1, enumerating all candidate solutions would not even have been possible since X = R. From this example we can learn three things:
4 http://en.wikipedia.org/wiki/Brute-force_search [accessed 2009-06-16]
1. The problem space containing the parameters of the objective functions is not re-
stricted to the real numbers but can, indeed, have a very different structure.
2. There are objective functions which cannot be resolved mathematically. As a matter of fact, most “interesting” optimization problems have this feature.
3. Enumerating all solutions is not a good optimization approach and only applicable in
“small” or trivial problem instances.
Although the objective function of Example E2.2 could not be solved for x, its results can
still be easily determined for any given candidate solution. Yet, there are many optimization
problems where this is not the case.
Example E2.3 (Antenna Design).
Designing antennas is an optimization problem which has been approached by many
researchers such as Chattoraj and Roy [532], Choo et al. [575], John and Ammann
[1450, 1451], Koza et al. [1615], Lohn et al. [1769, 1770], and Villegas et al. [2814], amongst
others. Rahmat-Samii and Michielssen dedicate no less than four chapters to it in their
book on the application of (evolutionary) optimization methods to electromagnetism-related
tasks [2250]. Here, the problem spaces X are the sets of feasible antenna designs with many
degrees of freedom in up to three dimensions, also often involving wire thickness and geom-
etry.
The objective functions usually involve maximizing the antenna gain⁵ and bandwidth
together with constraints concerning, for instance, weight and shape of the proposed con-
structions. However, computing the gain (depending on the signal frequency and radiation
angle) of an arbitrarily structured antenna is not trivial. In most cases, a simulation envi-
ronment and considerable computational power is required. Besides the fact that there are
many possible antenna designs, different from Example E2.2 the evaluation of even small
sets of candidate solutions may take a substantial amount of time.
Example E2.4 (Analog Electronic Circuit Design).
A similar problem is the automated analog circuit design, where a circuit diagram involving
a combination of different electronic components (resistors, capacitors, inductors, diodes,
and/or transistors) is sought which fulfills certain requirements. Here, the tasks span from
the synthesis of amplifiers [1607, 1766], filters [1766], such as Butterworth [1608, 1763, 1764]
and Chebycheff filters [1608], to analog robot controllers [1613].
Computing the features of analog circuits can be done by hand with some effort, but
requires deeper knowledge of electrical engineering. If an analysis of the frequency and timing
behavior is required, the difficulty of the equations quickly gets out of hand. Therefore, in
order to evaluate the fitness of circuits, simulation environments [2249] like SPICE [2815]
are normally used. Typically, a set of server computers is employed to execute these external
programs so the utility of multiple candidate solutions can be determined in parallel.
Example E2.5 (Aircraft Design).
Aircraft design is another domain where many optimization problems can be spotted.
Obayashi [2059] and Oyama [2111], for instance, tried to find wing designs with minimal
drag and weight but maximum tank volume, whereas Chung et al. [579] searched a geometry
with a minimum drag and shock pressure/sonic boom for transonic aircraft and Lee and
Hajela [1704, 1705] optimized the design of rotor blades for helicopters. Like in the electrical
engineering example, the computational costs of computing the objective values rating the
aforementioned features of the candidate solutions are high and involve simulating the air
flows around the wings and blades.
5 http://en.wikipedia.org/wiki/Antenna_gain [accessed 2009-06-19]
Example E2.6 (Interactive Optimization).
Especially in the area of Evolutionary Computation, some interactive optimization methods have been developed. Interactive Genetic Algorithms⁶ (IGAs) [471, 2497, 2529, 2651, 2652] outsource the objective function evaluation to human beings who, for instance, judge whether an evolved graphic⁷ or movie [2497, 2746] is attractive or whether music⁸ [1453] sounds good.
By incorporating this information in the search process, esthetic works of art are created. In
this domain, it is not really possible to devise any automated method for assigning objective
values to the points in the problem space. When it comes to judging art, human beings will
not be replaced by machines for a long time.
In Kosorukoff’s Human Based Genetic Algorithm⁹ (HBGAs, [1583]) and Unemi’s graphic and movie tool [2746], this approach is driven even further by also letting human “agents” select and combine the most interesting candidate solutions step by step.
All interactive optimization algorithms have to deal with the peculiarities of their hu-
man “modules”, such as indeterminism, inaccuracy, fickleness, changing attention and loss
of focus which, obviously, also rub off on the objective functions. Nevertheless, similar fea-
tures may also occur if determining the utility of a candidate solution involves randomized
simulations or measuring data from real-world processes.
Let us now summarize the information gained from these examples. The first step when
approaching an optimization problem is to choose the problem space X, as we will later
discuss in detail in Chapter 7 on page 109. Objective functions are not necessarily mere
mathematical expressions, but can be complex algorithms that, for example, involve multiple
simulations. Global Optimization comprises all techniques that can be used to find the best elements x⋆ in X with respect to such criteria. The next question to answer is: What constitutes these “best” elements?
6 http://en.wikipedia.org/wiki/Interactive_genetic_algorithm [accessed 2007-07-03]
7 http://en.wikipedia.org/wiki/Evolutionary_art [accessed 2009-06-18]
8 http://en.wikipedia.org/wiki/Evolutionary_music [accessed 2009-06-18]
9 http://en.wikipedia.org/wiki/HBGA [accessed 2007-07-03]
Tasks T2
9. In Example E1.2 on page 26, we stated the layout of circuits as a combinatorial optimization problem. Define a possible problem space X for this task that allows us to assign each circuit c ∈ C to one location of an m × n-sized card. Define an objective function f : X → R⁺ which denotes the total length of wiring required for a given element x ∈ X. For this purpose, you can ignore wire crossings and simply assume that a wire going from location (i, j) to location (k, l) has length |i − k| + |j − l|.
[10 points]
10. Name some problems which can occur in circuit layout optimization as outlined in
Task 9 if the mentioned simplified assumption is indeed applied.
[5 points]
11. Define a problem space for general job shop scheduling problems as outlined in Exam-
ple E1.3 on page 26. Do not make the problem specific to the cola/lemonade example.
[10 points]
12. Assume a network N = {n₁, n₂, . . . , nₘ} and that an m × m distance matrix ∆ is given which contains the distance between node nᵢ and nⱼ at position ∆ᵢⱼ. A value of +∞ at ∆ᵢⱼ means that there is no direct connection nᵢ → nⱼ between node nᵢ and nⱼ. These nodes may, however, be connected by a path which leads over other, intermediate nodes, such as nᵢ → n_A → n_B → nⱼ. In this case, nⱼ can be reached from nᵢ over a path leading first to n_A and then to n_B before finally arriving at nⱼ. Hence, there would be a connection of finite length.
We consider the specific optimization task Find the shortest connection between node n₁ and n₂. Develop a problem space X which is able to represent all paths between the first and the second node in an arbitrary network with m ≥ 2. Note that there is not necessarily a direct connection between n₁ and n₂.
Define a suitable objective function for this problem.
[15 points]
Chapter 3
Optima: What does good mean?
We already said that Global Optimization is about finding the best possible solutions for given problems. Let us therefore now discuss what it is that makes a solution optimal¹.
3.1 Single Objective Functions
In the case of optimizing a single criterion f, an optimum is either its maximum or minimum,
depending on what we are looking for. In the case of antenna design (Example E2.3), the
antenna gain was maximized and in Example E2.2, the goal was to find a cyclic route of
minimal distance through five Chinese cities. If we own a manufacturing plant and have to assign incoming orders to machines, we will do this in a way that minimizes the production time. On the other hand, we will arrange the purchase of raw materials, the employment of staff, and the placement of commercials in a way that maximizes the profit.
In Global Optimization, it is a convention that optimization problems are most often
defined as minimization tasks. If a criterion f is subject to maximization, we usually simply
minimize its negation (−f) instead. In this section, we will give some basic definitions on
what the optima of a single (objective) function are.
Example E3.1 (two-dimensional objective function).
Figure 3.1 illustrates such a function f defined over a two-dimensional space X = (X₁, X₂) ⊂ R². As outlined in this graphic, we distinguish between local and global optima. As illustrated, a global optimum is an optimum of the whole domain X while a local optimum is an optimum of only a subset of X.
1 http://en.wikipedia.org/wiki/Maxima_and_minima [accessed 2007-07-03]
[Figure: a surface plot of f over (X₁, X₂), with two local maxima, a local minimum, the global minimum, and the global maximum marked]
Figure 3.1: Global and local optima of a two-dimensional function.
3.1.1 Extrema: Minima and Maxima of Differentiable Functions
A local minimum x̌ₗ is a point which has the smallest objective value amongst its neighbors in the problem space. Similarly, a local maximum x̂ₗ has the highest objective value when compared to its neighbors. Such points are called (local) extrema². In optimization, a local optimum is always an extremum, i. e., either a local minimum or a local maximum.
Back in high school, we learned: “If f : X → R (X ⊆ R) has a (local) extremum at x⋆ₗ and is differentiable at that point, then the first derivative f′ is zero at x⋆ₗ.”

x⋆ₗ is a local extremum ⇒ f′(x⋆ₗ) = 0    (3.1)

Furthermore, if f can be differentiated twice at x⋆ₗ, the following holds:

(f′(x⋆ₗ) = 0) ∧ (f″(x⋆ₗ) > 0) ⇒ x⋆ₗ is a local minimum    (3.2)
(f′(x⋆ₗ) = 0) ∧ (f″(x⋆ₗ) < 0) ⇒ x⋆ₗ is a local maximum    (3.3)

Alternatively, we can also detect a local minimum or maximum when the first derivative undergoes a sign change. If, for a small h, sign(f′(x⋆ₗ − h)) ≠ sign(f′(x⋆ₗ + h)), then we can state that:

(f′(x⋆ₗ) = 0) ∧ (lim_{h→0} f′(x⋆ₗ − h) < 0) ∧ (lim_{h→0} f′(x⋆ₗ + h) > 0) ⇒ x⋆ₗ is a local minimum    (3.4)
(f′(x⋆ₗ) = 0) ∧ (lim_{h→0} f′(x⋆ₗ − h) > 0) ∧ (lim_{h→0} f′(x⋆ₗ + h) < 0) ⇒ x⋆ₗ is a local maximum    (3.5)

2 http://en.wikipedia.org/wiki/Maxima_and_minima [accessed 2010-09-07]
Example E3.2 (Extrema and Derivatives).
The function f(x) = x² + 5 has the first derivative f′(x) = 2x and the second derivative f″(x) = 2. To solve f′(x⋆ₗ) = 0 we set 0 = 2x⋆ₗ and get, obviously, x⋆ₗ = 0. For x⋆ₗ = 0, f″(x⋆ₗ) = 2 > 0 and hence, x⋆ₗ is a local minimum according to Equation 3.2.
If we take the function g(x) = x⁴ − 2, we get g′(x) = 4x³ and g″(x) = 12x². At x⋆ₗ = 0, both g′(x⋆ₗ) and g″(x⋆ₗ) become 0, so we cannot use Equation 3.2. However, if we take a small step to the left, i. e., to x < 0, we find x < 0 ⇒ (4x³) < 0 ⇒ g′(x) < 0. For positive x, however, x > 0 ⇒ (4x³) > 0 ⇒ g′(x) > 0. We have a sign change from − to + and, according to Equation 3.4, x⋆ₗ = 0 is a minimum of g(x).
From many of our previous examples, such as Example E1.1 on page 25, Example E2.2 on page 45, or Example E2.4 on page 47, we know that the objective functions in many optimization problems cannot be expressed in a mathematically closed form, let alone in one that can be differentiated. Furthermore, Equation 3.1 to Equation 3.5 can only be applied directly if the problem space is a subset of the real numbers X ⊆ R. They do not work for discrete spaces or for vector spaces. For the two-dimensional objective function given in Example E3.1, for example, we would need to apply Equation 3.6, the generalized form of Equation 3.1:

x⋆ₗ is a local extremum ⇒ grad(f)(x⋆ₗ) = 0    (3.6)

For distinguishing maxima, minima, and saddle points, the Hessian matrix³ would be needed. Even if the objective function exists in a differentiable form, for problem spaces X ⊆ Rⁿ of larger scale n > 5, computing the gradient, the Hessian, and their properties becomes cumbersome. Derivative-based optimization is hence only feasible for a subset of optimization problems.
In this book, we focus on derivative-free optimization. The precise definition of local optima without using derivatives requires some sort of measure of what neighbor means in the problem space. Since this requires some further definitions, we will postpone it to Section 4.4.
3.1.2 Global Extrema
The definitions of globally optimal points are much more straightforward and can be given based on the terms we have already discussed:
Definition D3.1 (Global Maximum). A global maximum x̂ ∈ X of one (objective) function f : X → R is an input element with f(x̂) ≥ f(x) ∀x ∈ X.
Definition D3.2 (Global Minimum). A global minimum x̌ ∈ X of one (objective) function f : X → R is an input element with f(x̌) ≤ f(x) ∀x ∈ X.
Definition D3.3 (Global Optimum). A global optimum x⋆ ∈ X of one (objective) function f : X → R subject to minimization is a global minimum. A global optimum of a function subject to maximization is a global maximum.
Of course, each global optimum, minimum, or maximum is, at the same time, also a local optimum, minimum, or maximum. As stated before, the definition of local optima is a bit more complicated and will be discussed in detail in Section 4.4.
3 http://en.wikipedia.org/wiki/Hessian_matrix [accessed 2010-09-07]
3.2 Multiple Optima
Many optimization problems have more than one globally optimal solution. Even a one-
dimensional function f : R → R can have more than one global maximum, multiple global
minima, or even both in its domain X.
Example E3.3 (One-Dimensional Functions with Multiple Optima).
If the function f(x) = |x³| − 8x² (illustrated in Fig. 3.2.a) was subject to minimization, we would find that it has two globally optimal points x⋆₁ = x̌₁ = −16/3 and x⋆₂ = x̌₂ = 16/3. Another interesting example is the cosine function sketched in Fig. 3.2.b. It has global maxima x̂ᵢ at x̂ᵢ = 2iπ and global minima x̌ᵢ at x̌ᵢ = (2i + 1)π for all i ∈ Z. The correct solution of such an optimization problem would then be a set X⋆ of all optimal inputs in X rather than a single maximum or minimum.
In Fig. 3.2.c, we sketch a function f(x) which evaluates to x² outside the interval [−5, 5] and has the value 25 inside [−5, 5]. Hence, all the points in the real interval [−5, 5] are minima. If the function is subject to minimization, it has uncountably many global optima.
[Figure 3.2 consists of three plots – Fig. 3.2.a: |x³| − 8x², a function with two global minima; Fig. 3.2.b: the cosine function, with infinitely many optima; Fig. 3.2.c: a function with uncountably many minima.]
Figure 3.2: Examples for functions with multiple optima
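The two minima can be verified with a short calculation (our addition, not part of the original text): for x > 0 the absolute value can be dropped, so

  f(x) = x³ − 8x²,   f′(x) = 3x² − 16x = x(3x − 16) = 0  ⟹  x = 16/3,   f″(16/3) = 6·(16/3) − 16 = 16 > 0,

and since f(−x) = f(x), the mirrored point −16/3 is the second global minimum.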
As mentioned before, the exact meaning of optimal is problem dependent. In single-objective optimization, it means to find the candidate solutions which lead to either minimum or maximum objective values. In multi-objective optimization, there exists a variety of approaches to define optima which we will discuss in depth in Section 3.3.
Definition D3.4 (Optimal Set). The optimal set X⋆ ⊆ X of an optimization problem is the set that contains all its globally optimal solutions.
There are often multiple, sometimes even infinitely many, optimal solutions in many optimization problems. Since the storage space in computers is limited, it is only possible to find a finite (sub-)set of solutions. In the further text, we will denote the result of optimization algorithms which return single candidate solutions as x (typeset in a sans-serif font) and the results of algorithms which are capable of producing sets of possible solutions as X = {x₁, x₂, . . .}.
Definition D3.5 (Optimization Result). The set X ⊆ X contains the output elements x ∈ X of an optimization process.
Obviously, the best outcome of an optimization process would be to find X = X⋆. From Example E3.3, we already know that this might actually be impossible if there are too many global optima. Then, the goal should be to find an X ⊂ X⋆ in such a way that the candidate solutions are evenly distributed amongst the global optima and do not all cling together in a small area of X⋆. We will discuss the issue of diversity amongst the optima in Chapter 13 in more depth.
Besides the fact that we may not be able to find all global optima for a given optimization problem, it may also be possible that we are not able to find any global optimum at all or that we cannot determine whether the results x ∈ X which we have obtained are optimal or not. There exists, for instance, no algorithm which can solve the Traveling Salesman Problem (see Example E2.2) in polynomial time, as described in detail in Section 12.3. Therefore, if such a task involving a high enough number of locations is subject to optimization, an efficient approximation algorithm might produce reasonably good x in a short time. In order to know whether these are global optima, we would, however, need to wait for thousands of years. Hence, in many problems, the goal of optimization is not necessarily to find X = X⋆ or X ⊂ X⋆ but a good approximation X ≈ X⋆ in a reasonable time.
3.3 Multiple Objective Functions
3.3.1 Introduction
Global Optimization techniques are not just used for finding the maxima or minima of single
functions f. In many real-world design or decision making problems, there are multiple
criteria to be optimized at the same time. Such tasks can be defined as in Definition D3.6
which will later be refined in Definition D6.2 on page 103.
Definition D3.6 (Multi-Objective Optimization Problem (MOP)). In a multi-objective optimization problem (MOP), a set f : X → Rⁿ consisting of n objective functions fᵢ : X → R, each representing one criterion, is to be optimized over a problem space X [596, 740, 954].

f = {fᵢ : X → R : i ∈ 1..n}    (3.7)
f(x) = (f₁(x), f₂(x), . . .)ᵀ    (3.8)

In the following, we will treat these sets as vector functions f : X → Rⁿ which return a vector of real numbers of dimension n when applied to a candidate solution x ∈ X. To this Rⁿ we will refer as the objective space Y. Algorithms designed to optimize sets of objective functions are usually named with the prefix multi-objective, like multi-objective Evolutionary Algorithms (see Definition D28.2 on page 253).
Purshouse [2228, 2230] provides a nice classification of the possible relations between
the objective functions which we illustrate here as Figure 3.3. In the best case from the
optimization perspective, two objective functions f₁ and f₂ are in harmony, written as f₁ ∼ f₂, meaning that improving one criterion will also lead to an improvement in the other. Then, it would suffice to optimize only one of them and the MOP can be reduced to a single-objective one.
[Figure: a tree diagram – a relation between two objective functions is either independent or dependent, and a dependency is either a conflict or a harmony]
Figure 3.3: The possible relations between objective functions as given in [2228, 2230].
Definition D3.7 (Harmonizing Objective Functions). In a multi-objective optimization problem, two objective functions fᵢ and fⱼ are harmonizing (fᵢ ∼_X fⱼ) in a subset X ⊆ X of the problem space X if Equation 3.9 holds.

fᵢ ∼_X fⱼ ⇒ [fᵢ(x₁) < fᵢ(x₂) ⇒ fⱼ(x₁) < fⱼ(x₂) ∀x₁, x₂ ∈ X ⊆ X]    (3.9)

The opposite are conflicting objectives, where an improvement in one criterion leads to a degradation of the utility of the candidate solutions according to another criterion. If two objectives f₃ and f₄ conflict, we denote this as f₃ ≁ f₄. Obviously, objective functions do not need to be conflicting or harmonizing over their complete domain X; instead, they may exhibit these properties on certain intervals only.
Definition D3.8 (Conflicting Objective Functions). In a multi-objective optimization problem, two objective functions fᵢ and fⱼ are conflicting (fᵢ ≁_X fⱼ) in a subset X ⊆ X of the problem space X if Equation 3.10 holds.

fᵢ ≁_X fⱼ ⇒ [fᵢ(x₁) < fᵢ(x₂) ⇒ fⱼ(x₁) > fⱼ(x₂) ∀x₁, x₂ ∈ X ⊆ X]    (3.10)

Besides harmonizing and conflicting optimization criteria, there may be criteria which are independent of each other. In a harmonious relationship, the optimization criteria can be optimized simultaneously, i. e., an improvement of one leads to an improvement of the other. If the criteria conflict, the exact opposite is the case. Independent criteria, however, do not influence each other: It is possible to modify candidate solutions in a way that leads to a change in one objective value while the other one remains constant (see Example E3.8).
Definition D3.9 (Independent Objective Functions). In a multi-objective optimization problem, two objective functions fᵢ and fⱼ are independent of each other in a subset X ⊆ X of the problem space X if the MOP can be decomposed into two problems which can be optimized separately and whose solutions can then be composed to solutions of the original problem.
If X = X, these relations are called total and the subscript X is left out of the notation. In the following, we give some examples of typical situations involving harmonizing, conflicting, and independent optimization criteria. We also list some typical multi-objective optimization applications in Table 3.1, before discussing basic techniques suitable for defining what a “good solution” is in problems with multiple, conflicting criteria.
Example E3.4 (Factory Example).
Multi-objective optimization often means compromising between conflicting goals. If we go back to our factory example, we can specify the following objectives that are all subject to optimization:
1. Minimize the time between an incoming order and the shipment of the corresponding product.
2. Maximize the profit.
3. Minimize the costs for advertising, personnel, raw materials, etc.
4. Maximize the quality of the products.
5. Minimize the negative impact of waste and pollution caused by the production process on the environment.
The last two objectives seem to clearly conflict with the goal of cost minimization. Between the personnel costs, the time needed for production, and the product quality there should also be some kind of contradictory relation, since producing many quality goods in a short time naturally requires many hands. The exact mutual influences between optimization criteria can apparently become complicated and are not always obvious.
Example E3.5 (Artificial Ant Example).
Another example for such a situation is the Artificial Ant problem⁴ where the goal is to find the most efficient controller for a simulated ant. The efficiency of an ant is not only
find the most efficient controller for a simulated ant. The efficiency of an ant is not only
measured by the amount of food it is able to pile. For every food item, the ant needs to
walk to some point. The more food it aggregates, the longer the distance it needs to walk.
If its behavior is driven by a clever program, it may take the ant a shorter walk to pile the
same amount of food. This better path, however, would maybe not be discovered by an ant
with a clumsy controller. Thus, the distance the ant has to cover to find the food (or the
time it needs to do so) has to be considered in the optimization process, too.
If two control programs produce the same results and one is smaller (i. e., contains
fewer instructions) than the other, the smaller one should be preferred. Like in the factory
example, these optimization goals conflict with each other: A program which contains only
one instruction will likely not guide the ant to a food item and back to the nest.
From both of these examples, we can gain another insight: To find the global optimum could mean to maximize one function fᵢ ∈ f and to minimize another one fⱼ ∈ f (i ≠ j). Hence, it makes no sense to talk about a global maximum or a global minimum in terms of multi-objective optimization. We will thus retreat to the notion of the set of optimal elements x⋆ ∈ X⋆ ⊆ X in the following discussions.
Since compromises for conflicting criteria can be defined in many ways, there exist multiple approaches to define what an optimum is. These different definitions, in turn, lead to different sets X⋆. We will discuss some of these approaches in the following by using two graphical examples for illustration purposes.
Example E3.6 (Graphical Example 1).
In the first graphical example, pictured in Figure 3.4, we want to maximize the two objective functions of the set f = {f₁, f₂}. Both functions have the real numbers R as their problem space X₁. The maximum (and thus, the optimum) of f₁ is x̂₁ and the largest value of f₂ is found at x̂₂. In Figure 3.4, we can easily see that f₁ and f₂ are partly conflicting: Their maxima are at different locations and there even exist areas where f₁ rises while f₂ falls and vice versa.
4 See ?? on page ?? for more details.
[Figure: the curves y = f₁(x) and y = f₂(x) over x ∈ X₁, with the maximum x̂₁ of f₁ and the maximum x̂₂ of f₂ marked]
Figure 3.4: Two functions f₁ and f₂ with different maxima x̂₁ and x̂₂.
Example E3.7 (Graphical Example 2).
The objective functions f₁ and f₂ in the first example are mappings of a one-dimensional problem space X₁ to the real numbers that are to be maximized. In the second example sketched in Figure 3.5, we instead minimize two functions f₃ and f₄ that map a two-dimensional problem space X₂ ⊂ R² to the real numbers R. Both functions have two global minima; the lowest values of f₃ are x̌₁ and x̌₂ whereas f₄ gets minimal at x̌₃ and x̌₄. It should be noted that x̌₁ ≠ x̌₂ ≠ x̌₃ ≠ x̌₄.
[Figure: two surface plots of f₃ and f₄ over the two-dimensional problem space X₂ = (x₁, x₂), with the minima x̌₁ and x̌₂ of f₃ and x̌₃ and x̌₄ of f₄ marked]
Figure 3.5: Two functions f₃ and f₄ with different minima x̌₁, x̌₂, x̌₃, and x̌₄.
Example E3.8 (Independent Objectives).
Assume that the problem space is the set of three-dimensional real vectors R³ and the two objectives f₁ and f₂ are subject to minimization:

f₁(x) = √((x₁ − 5)² + (x₂ + 1)²)    (3.11)
f₂(x) = x₃² − sin x₃ − 3    (3.12)

This problem can be decomposed into two separate problems, since f₁ can be optimized on the first two vector elements x₁, x₂ alone and f₂ can be optimized by considering only x₃ while the values of x₁ and x₂ do not matter. After solving the two separate optimization problems, their solutions can be combined to a full vector x⋆ = (x⋆₁, x⋆₂, x⋆₃)ᵀ by taking the solutions x⋆₁ and x⋆₂ from the optimization of f₁ and x⋆₃ from the optimization process solving f₂.
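For completeness (our addition, not from the original text), the two sub-problems are indeed easy to solve in isolation: f₁ is minimized by (x⋆₁, x⋆₂) = (5, −1), where f₁ = 0, and setting f₂′(x₃) = 2x₃ − cos x₃ = 0 yields x⋆₃ ≈ 0.45, a numerically obtained value (f₂″ = 2 + sin x₃ > 0 there, so it is a minimum). Composing both partial solutions gives x⋆ ≈ (5, −1, 0.45)ᵀ.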
Example E3.9 (Robotic Soccer Cont.).
In Example E1.10 on page 37, we mentioned that in robotic soccer, the robots must try to decide which action to take in order to maximize the chance of scoring a goal. If decision making in robotic soccer were based only on this criterion, it is likely that most mates in a robot team would jointly attack the goal of the opposing team, leaving their own goal unprotected. Hence, at least one second objective is important, too: to minimize the chance that the opposing team scores a goal.
Table 3.1: Applications and Examples of Multi-Objective Optimization.
Area References
Business-To-Business [1559]
Chemistry [306, 668]
Combinatorial Problems [366, 491, 527, 654, 690, 1034, 1429, 1440, 1442, 1553, 1696,
1756–1758, 1859, 2019, 2025, 2158, 2472, 2473, 2862, 2895];
see also Table 20.1
Computer Graphics [1661]; see also Table 20.1
Data Mining [2069, 2398, 2644, 2645]; see also Table 20.1
Databases [1696]
Decision Making [198, 320, 861, 1299, 1559, 1571, 1801]; see also Table 20.1
Digital Technology [789, 869, 1479]
Distributed Algorithms and Systems [320, 605, 662, 663, 669, 861, 1299, 1488, 1559, 1571, 1574, 1801, 1840, 2071, 2644, 2645, 2862, 2897]
Economics and Finances [320, 662, 857, 1559]
Engineering [93, 519, 547, 548, 579, 605, 669, 737, 789, 869, 953, 1000,
1479, 1488, 1508, 1574, 1622, 1717, 1840, 1873, 2059, 2111,
2158, 2644, 2645, 2862, 2897]
Games [560, 997, 2102]
Graph Theory [669, 1488, 1574, 1840, 2025, 2897]
Logistics [527, 579, 861, 1034, 1429, 1442, 1553, 1756, 1859, 2059,
2111]
Mathematics [560, 662, 857, 869, 997, 1661, 2102]
Military and Defense [1661]
Multi-Agent Systems [1661]
Physics [1000]
Security [1508, 1661]
Software [1508, 1559, 2863, 2897]; see also Table 20.1
Testing [2863]
Water Supply and Management [1268]
Wireless Communication [1717, 1796, 2158]
3.3.2 Lexicographic Optimization
Maybe the simplest way to optimize a set f of objective functions is to solve them according to priorities [93, 860, 2291, 2820]. Without loss of generality, we could, for instance, assume that f₁ is strictly more important than f₂, which, in turn, is more important than f₃, and so on. We would then first determine the optimal set X⋆(1) according to f₁. If this set contains more than one solution, we then retain only those elements in X⋆(1) which have the best objective values according to f₂ compared to the other members of X⋆(1) and obtain X⋆(1,2) ⊆ X⋆(1). This process is repeated until either |X⋆(1,2,...)| = 1 or all criteria have been processed. By doing so, the multi-objective optimization problem has been reduced to n = |f| single-objective ones to be solved iteratively. The problem space Xᵢ of the i-th problem equals X only for i = 1 and is Xᵢ = X⋆(1,2,..,i−1) for all i > 1 [860].
Definition D3.10 (Lexicographically Better). A candidate solution x₁ ∈ X is lexicographically better than another element x₂ ∈ X (denoted by x₁ <lex x₂) if

x₁ <lex x₂ ⇔ (ωᵢ fᵢ(x₁) < ωᵢ fᵢ(x₂)) ∧ (i = min{k : fₖ(x₁) ≠ fₖ(x₂)})    (3.13)

ωⱼ = { 1 if fⱼ should be minimized; −1 if fⱼ should be maximized }    (3.14)

where the ωⱼ in Equation 3.13 only denote whether fⱼ should be maximized or minimized, as specified in Equation 3.14.
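A sketch of ours of how the comparison from Definition D3.10 could be expressed in code, assuming all objectives are subject to minimization (ωᵢ = 1):

    import java.util.Comparator;
    import java.util.List;
    import java.util.function.ToDoubleFunction;

    /** Lexicographic comparison of candidate solutions (Definition D3.10) -- illustrative sketch. */
    public class LexicographicComparator<X> implements Comparator<X> {

      private final List<ToDoubleFunction<X>> objectives; // f1, f2, ... in order of priority

      public LexicographicComparator(List<ToDoubleFunction<X>> objectives) {
        this.objectives = objectives;
      }

      /** Returns a negative value iff x1 <lex x2, i.e., x1 is lexicographically better. */
      @Override
      public int compare(X x1, X x2) {
        for (ToDoubleFunction<X> f : objectives) {        // find the first objective that differs
          int c = Double.compare(f.applyAsDouble(x1), f.applyAsDouble(x2));
          if (c != 0) return c;                           // smaller value = better (minimization)
        }
        return 0;                                         // all objective values are equal
      }
    }

Maximization criteria can be handled as in Equation 3.14 by multiplying their values with ωᵢ = −1 before the comparison.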
Since we generally consider minimization problems, the ωⱼ will be assumed to be 1 if not explicitly stated otherwise. If n objective functions are to be optimized, there are n! = |Π(1..n)| possible permutations of them, where Π(1..n) is the set of all permutations of the natural numbers from 1 to n. For each such permutation, the described approach may lead to different results. We can thus refine the previous definition of “lexicographically better” by putting it into the context of a permutation of the objective functions:

Definition D3.11 (Lexicographically Better (Refined)). A candidate solution x₁ ∈ X is lexicographically better than another element x₂ ∈ X (denoted by x₁ <lex,π x₂) according to a permutation π ∈ Π(1..n) applied to the objective functions if

x₁ <lex,π x₂ ⇔ (ω_π[i] f_π[i](x₁) < ω_π[i] f_π[i](x₂)) ∧ (i = min{k : f_π[k](x₁) ≠ f_π[k](x₂)})    (3.15)

where n = len(f) is the number of objective functions, π[i] is the i-th element of the permutation π, and the ωᵢ retain their meaning from Equation 3.14.
Based on the <lex relation, we can now define optimality.

Definition D3.12 (Lexicographic Optimality). A candidate solution x⋆ is lexicographically optimal in terms of a permutation π ∈ Π(1..n) applied to the objective functions (and therefore, a member of the optimal set X⋆_π) if there is no lexicographically better element in X, i. e.,

x⋆ ∈ X⋆_π ⇔ ∄x ∈ X : x <lex,π x⋆    (3.16)

As the lexicographically global optimal set X⋆ we consider the union of the optimal sets X⋆_π for all possible permutations π of the first n natural numbers:

X⋆ = ⋃_{∀π∈Π(1..n)} X⋆_π    (3.17)

X⋆ can be computed by solving all the |Π(1..n)| = n! different multi-objective problems arising, i. e., the n ∗ n! single-objective problems, separately. This would lead to a more than exponential complexity in n. In case X is a finite set, this can be achieved in polynomial time in n [860]. But which solutions would be contained in the X⋆_π or X⋆ if we use the lexicographical definition of optimality?
Example E3.10 (Example E3.4 Cont. Lexicographically).
In the factory example Example E3.4, we wished to minimize the production time (f₁, ω₁ = 1), maximize the profit (f₂, ω₂ = −1), minimize the running costs (f₃, ω₃ = 1), maximize a product quality measure (f₄, ω₄ = −1), and minimize environmental damage (f₅, ω₅ = 1). Determining the lexicographically optimal set X⋆_π for the identity permutation π = (1, 2, 3, 4, 5) means to first determine all possible configurations of the factory that minimize the production time f₁. Only if there is more than one such configuration (len(X⋆(1)) > 1), we consider f₂. Amongst all elements in X⋆(1), we then copy those to X⋆(1,2) which have the highest f₂-value, and so on. What we get in X⋆_π are configurations which perform exceptionally well in terms of the production time but sacrifice profit, running costs, production quality, and will largely disregard any environmental issues. No trade-off between the objectives will happen at all. Therefore, the set of all lexicographically optimal solutions X⋆ will contain the elements x⋆ ∈ X which have the best objective values regarding one single objective function while largely neglecting the others.
Example E3.11 (Example E3.5 Cont. Lexicographically).
The same would be the case in the Artificial Ant problem. X⋆, if determined lexicographically, contains
1. the smallest possible program (of length 0),
2. the program which leads to the highest possible food collection, and
3. the program which guides the ant along the shortest possible path (which is again the empty program).

Example E3.12 (Example E3.6 Cont. Lexicographically).
There are exactly two lexicographically optimal points on the two functions illustrated in Figure 3.4. The first one, x⋆₁, is the single global maximum x̂₁ of f₁ and the other one, x⋆₂, is the single global maximum x̂₂ of f₂. In other words, X⋆(1,2) = {x⋆₁} and X⋆(2,1) = {x⋆₂}. Since Π(1, 2) = {(1, 2), (2, 1)}, it follows that X⋆ = X⋆(1,2) ∪ X⋆(2,1) = {x⋆₁, x⋆₂}.

Example E3.13 (Example E3.7 Cont. Lexicographically).
The two-dimensional objective functions f₃ and f₄ from Figure 3.5 each have two global minima x̌₁, x̌₂ and x̌₃, x̌₄, respectively. Although it is a bit hard to see in the figure itself, f₄(x̌₁) = f₄(x̌₂) and f₃(x̌₃) = f₃(x̌₄) because of the symmetry of the two objective functions. This means that X⋆(3,4) = {x̌₁, x̌₂}, that X⋆(4,3) = {x̌₃, x̌₄}, and that X⋆ = X⋆(3,4) ∪ X⋆(4,3) = {x̌₁, x̌₂, x̌₃, x̌₄}.
3.3.2.1 Problems with the Lexicographic Method
From these examples it should become clear that the lexicographic approach to defining what is optimal and what is not has some advantages and some disadvantages. On the positive side, we find that it is very easy to realize and that it allows us to treat a multi-objective optimization problem by iteratively solving single-objective ones. On the downside, it does not perform any trade-off between the different objective functions at all and we only find the extreme values of the optimization criteria. In the factory example, for instance, we would surely be interested in solutions which lead to a short production time but do not entirely ignore any budget issues. Such differentiations, however, are not possible with the lexicographic method.
3.3.3 Weighted Sums (Linear Aggregation)
Another very simple method to define what an optimal solution is, is to compute a weighted sum ws(x) of all objective functions fᵢ(x) ∈ f. Since a weighted sum is a linear aggregation of the objective values, this approach is also often referred to as linear aggregation. Each objective fᵢ is multiplied with a weight wᵢ representing its importance. Using signed weights also allows us to minimize one objective while maximizing another one. We can, for instance, apply a weight w_a = 1 to an objective function f_a and the weight w_b = −1 to the criterion f_b. By minimizing ws(x), we then actually minimize the first and maximize the second objective function. If we instead maximize ws(x), the effect would be inverse and f_b would be minimized while f_a is maximized. Either way, multi-objective problems are reduced to single-objective ones with this method. As illustrated in Equation 3.19, a global optimum then is a candidate solution x⋆ ∈ X for which the weighted sum of all objective values is less than or equal to those of all other candidate solutions (if minimization of the weighted sum is assumed).

ws(x) = ∑_{i=1}^{n} wᵢ fᵢ(x) = ∑_{∀fᵢ∈f} wᵢ fᵢ(x)    (3.18)
x⋆ ∈ X⋆ ⇔ ws(x⋆) ≤ ws(x) ∀x ∈ X    (3.19)
With this method, we can also emulate lexicographic optimization in many cases by simply setting the weights to values of different orders of magnitude. If the objective functions fᵢ ∈ f all had the range fᵢ : X → 0..10, for instance, setting the weights to wᵢ = 10^i would effectively achieve this.
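A weighted sum is trivial to implement; the following Java sketch of ours aggregates a list of objective functions into a single one as in Equation 3.18 (it assumes that the weight array and the objective list have the same length):

    import java.util.List;
    import java.util.function.ToDoubleFunction;

    /** Linear aggregation of several objectives into one (Equation 3.18) -- illustrative sketch. */
    public final class WeightedSum {

      /** Builds ws(x) = sum_i w[i] * f_i(x) from parallel arrays/lists of weights and objectives. */
      public static <X> ToDoubleFunction<X> of(double[] w, List<ToDoubleFunction<X>> f) {
        return x -> {
          double sum = 0.0;
          for (int i = 0; i < w.length; i++) {
            sum += w[i] * f.get(i).applyAsDouble(x); // negative weights turn maximization into minimization
          }
          return sum;
        };
      }
    }

The resulting function can be handed to any single-objective optimizer; the weights decide the trade-off, and, as discussed below, choosing them well is the hard part.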
Example E3.14 (Example E3.6 Cont.: Weighted Sums).
Figure 3.6 demonstrates optimization with the weighted sum approach for the example given in Example E3.6. The weights are both set to 1 = w₁ = w₂. If we maximize ws₁(x), we will thus also maximize the functions f₁ and f₂. This leads to a single optimum x⋆ = x̂.
[Figure: the curves y = f₁(x), y = f₂(x), and y = ws₁(x) = f₁(x) + f₂(x) over x ∈ X₁, with the single maximum of the sum at x̂]
Figure 3.6: Optimization using the weighted sum approach (first example).
Example E3.15 (Example E3.7 Cont.: Weighted Sums).
The sum of the two-dimensional functions f₃ and f₄ from the second graphical Example E3.7 is sketched in Figure 3.7. Again, we set the weights w₃ and w₄ to 1. The sum ws₂, however, is subject to minimization. The graph of ws₂ has two especially deep valleys. At the bottoms of these valleys, the two global minima x̌₅ and x̌₆ can be found.
[Figure: surface plot of y = ws2(x) = f3(x) + f4(x) over (x1, x2), with the two global minima x̌5 and x̌6 at the bottoms of the two valleys.]
Figure 3.7: Optimization using the weighted sum approach (second example).
3.3.3.1 Problems with Weighted Sums
This approach has a variety of drawbacks. One is that it cannot properly handle functions that rise or fall with different speed⁵. Whitley [2929] pointed out that the results of an objective function should not be considered as a precise measure of utility. Adding up multiple such values thus does not necessarily make sense.
Example E3.16 (Problematic Objectives for Weighted Sums).
In Figure 3.8, we have sketched the sum ws(x) of the two objective functions f1(x) = −x² and f2(x) = e^(x−2). When minimizing or maximizing this sum, we will always disregard one of the two functions, depending on the interval chosen. For small x, f2 is negligible compared to f1. For x > 5 it begins to outpace f1 which, in turn, will now become negligible. Such functions cannot be added up properly using constant weights. Even if we set w1 to the really large number 10^10, f1 will become insignificant for all x > 40, because

$$\left| \frac{-(40^2) \cdot 10^{10}}{e^{40-2}} \right| \approx 0.0005.$$
⁵ See Definition D12.1 on page 144 or http://en.wikipedia.org/wiki/Asymptotic_notation [accessed 2007-07-03] for related information.
[Figure: plots of y = f1(x), y = f2(x), and y = g(x) = f1(x) + f2(x) for x between −5 and 5.]
Figure 3.8: A problematic constellation for the weighted sum approach.
Weighted sums are only suitable for optimizing functions that at least share the same big-O notation (see Definition D12.1 on page 144). Often, it is not obvious how the objective functions will fall or rise. How can we, for instance, determine whether the objective maximizing the food piled by an Artificial Ant rises in comparison with the objective minimizing the distance walked by the simulated insect?
And even if the shape of the objective functions and their complexity class were clear, the question of how to set the weights w properly still remains open in most cases [689]. In the same paper, Das and Dennis [689] also show that weighted sum approaches cannot necessarily find all elements considered optimal in terms of Pareto domination (see next section). Instead, candidate solutions hidden in concavities of the trade-off curve will not be discovered [609, 1621, 1622] (see Example E3.17 on the next page for different shapes of trade-off curves). Also, the candidate solutions which will finally be contained in the result set X will not necessarily be evenly spaced [1000]. Further authors who discuss the drawbacks of the linear aggregation method include Koski [1582] and Messac and Mattson [1873].
3.3.4 Weighted Min-Max
The weighted min-max approach [1303, 1305, 1554, 1741] for multi-objective optimization compares two candidate solutions x1 and x2 on the basis of the minimum (or maximum) of their weighted objective values. A candidate solution x1 ∈ X is preferred in comparison with x2 ∈ X if Equation 3.20 holds

$$\min\{w_i f_i(x_1) : i \in 1..n\} < \min\{w_i f_i(x_2) : i \in 1..n\}$$ (3.20)

where the w_i are the weights of the objective values. For objective functions subject to minimization, w_i > 0 is used, and w_i is set to a value below zero for functions to be maximized. This approach, which is also called the Weighted Tschebychev Norm, is able to find solutions on both convex and concave trade-off surfaces (see Definition D3.15 on the facing page). The weight vector describes a beam in direction (1/w1, 1/w2, .., 1/wn)^T from the origin of the objective space towards and along which the optimizer should converge.
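The comparison of Equation 3.20 can be sketched in a few lines of Java on precomputed objective values; the class and method names below are hypothetical and only serve as an illustration.

  /** Minimal sketch of the weighted min-max comparison from Equation 3.20. */
  public final class WeightedMinMax {

    /** Returns min_i { weights[i] * objectives[i] } for one candidate solution,
        where objectives[i] = f_i(x) has already been evaluated. */
    public static double weightedMin(double[] objectives, double[] weights) {
      double min = Double.POSITIVE_INFINITY;
      for (int i = 0; i < objectives.length; i++) {
        min = Math.min(min, weights[i] * objectives[i]);
      }
      return min;
    }

    /** True if the candidate with objective vector f1 is preferred over the one with f2. */
    public static boolean isPreferred(double[] f1, double[] f2, double[] weights) {
      return weightedMin(f1, weights) < weightedMin(f2, weights);
    }
  }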
3.3.5 Pareto Optimization
The mathematical foundations for multi-objective optimization which considers conflicting criteria in a fair way have been laid by Edgeworth [857] and Pareto [2127] around one hundred fifty years ago [1642]. Pareto optimality⁶ became an important notion in economics, game theory, engineering, and social sciences [560, 997, 2102, 2938]. It defines the frontier of solutions that can be reached by trading off conflicting objectives in an optimal manner. From this front, a decision maker (be it a human or an algorithm) can finally choose the configurations that, in her opinion, suit best [264, 528, 952, 954, 1010, 1166, 2610]. The notion of optimality in the Pareto sense is strongly based on the definition of domination:
Definition D3.13 (Domination). An element x1 dominates (is preferred to) an element x2 (x1 ⊣ x2) if x1 is better than x2 in at least one objective function and not worse with respect to all other objectives. Based on the set f of objective functions f, we can write:

$$x_1 \dashv x_2 \;\Leftrightarrow\; \left(\forall i \in 1..n \Rightarrow \omega_i f_i(x_1) \leq \omega_i f_i(x_2)\right) \;\wedge\; \left(\exists j \in 1..n : \omega_j f_j(x_1) < \omega_j f_j(x_2)\right)$$ (3.21)

$$\omega_i = \begin{cases} 1 & \text{if } f_i \text{ should be minimized} \\ -1 & \text{if } f_i \text{ should be maximized} \end{cases}$$ (3.22)
We provide a Java version of a comparison operator implementing the domination relation in Listing 56.55 on page 838. Like in Equation 3.13 in the definition of the lexicographical ordering of candidate solutions, the ω_i again denote whether f_i should be maximized or minimized. Thus, Equation 3.22 and Equation 3.14 are the same. Since we generally assume minimization, the ω_i will always be 1 if not explicitly stated otherwise in the rest of this book. Then, if a candidate solution x1 dominates another one (x2), f(x1) is partially less than f(x2).
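For illustration, the following minimal Java sketch checks Equation 3.21 under the general minimization assumption (all ω_i = 1); it is not the comparison operator of Listing 56.55 but only mirrors the definition on precomputed objective vectors, and the names are made up for this sketch.

  /** Sketch of Pareto domination (Equation 3.21), assuming all objectives are minimized. */
  public final class ParetoDomination {

    /** Returns true if the objective vector f1 dominates f2 (f1 ⊣ f2). */
    public static boolean dominates(double[] f1, double[] f2) {
      boolean strictlyBetterSomewhere = false;
      for (int i = 0; i < f1.length; i++) {
        if (f1[i] > f2[i]) {
          return false;                   // worse in at least one objective: no domination
        }
        if (f1[i] < f2[i]) {
          strictlyBetterSomewhere = true; // strictly better in at least one objective
        }
      }
      return strictlyBetterSomewhere;
    }
  }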
The Pareto domination relation defines a strict partial order (an irreflexive, asymmetric, and transitive binary relation, see Definition D51.16 on page 645) on the space of possible objective values Y. In contrast, the weighted sum approach imposes a total order by projecting the objective space Y into the real numbers R. The use of Pareto domination in multi-objective Evolutionary Algorithms was first proposed by [1075] back in 1989 [2228].
Definition D3.14 (Pareto Optimal). An element x⋆ ∈ X is Pareto optimal (and hence, part of the optimal set X⋆) if it is not dominated by any other element in the problem space X. X⋆ is called the Pareto-optimal set or Pareto set.

$$x^{\star} \in X^{\star} \;\Leftrightarrow\; \nexists x \in X : x \dashv x^{\star}$$ (3.23)
The Pareto optimal set will also comprise all lexicographically optimal solutions since, per
definition, it includes all the extreme points of the objective functions. Additionally, it
provides the trade-off solutions which we longed for in Section 3.3.2.1.
Definition D3.15 (Pareto Frontier). For a given optimization problem, the Pareto front(ier) F⋆ ⊂ R^n is defined as the set of results the objective function vector f creates when it is applied to all the elements of the Pareto-optimal set X⋆.

$$F^{\star} = \{ f(x^{\star}) : x^{\star} \in X^{\star} \}$$ (3.24)
Example E3.17 (Shapes of Optimal Sets / Pareto Frontiers).
The Pareto front in Example E13.2 has a convex geometry, but there are other different shapes as well.
⁶ http://en.wikipedia.org/wiki/Pareto_efficiency [accessed 2007-07-03]
[Figure: four example Pareto fronts in the f1-f2 plane – Fig. 3.9.a: Non-Convex (Concave), Fig. 3.9.b: Disconnected, Fig. 3.9.c: Linear, Fig. 3.9.d: Non-Uniformly Distributed.]
Figure 3.9: Examples of Pareto fronts [2915].
In Figure 3.9 [2915], we show some examples, including non-convex (concave), disconnected, linear, and non-uniformly distributed Pareto fronts. Different structures of the optimal sets pose different challenges for achieving good convergence and spread. Optimization processes driven by linear aggregating functions will, for instance, have problems finding all portions of Pareto fronts with non-convex geometry, as shown by Das and Dennis [689].
In this book, we will refer to candidate solutions which are not dominated by any element in a reference set as non-dominated. This reference set may be the whole problem space X itself (in which case the non-dominated elements are global optima) or any subset of it. Many optimization algorithms, for instance, work on populations of candidate solutions and, in such a context, a non-dominated candidate solution would be one not dominated by any other element in the population.
Definition D3.16 (Domination Rank). We define the domination rank #dom(x, X) of a candidate solution x relative to a reference set X as

$$\#\mathrm{dom}(x, X) = \left| \{ x' : (x' \in X) \wedge (x' \dashv x) \} \right|$$ (3.25)

Hence, an element x with #dom(x, X) = 0 is non-dominated in X, and a candidate solution x⋆ with #dom(x⋆, X) = 0 is a global optimum. The higher the domination rank, the worse is the associated element.
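A minimal Java sketch of Equation 3.25, counting how many members of a reference set dominate a given candidate solution, could look as follows (all objectives are assumed to be minimized; the names are illustrative only).

  import java.util.List;

  /** Sketch of the domination rank #dom(x, X) from Equation 3.25. */
  public final class DominationRank {

    /** True if objective vector a dominates objective vector b (a ⊣ b), minimization assumed. */
    static boolean dominates(double[] a, double[] b) {
      boolean strictlyBetter = false;
      for (int i = 0; i < a.length; i++) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictlyBetter = true;
      }
      return strictlyBetter;
    }

    /** Counts how many objective vectors in the reference set dominate fOfX. */
    public static int dominationRank(double[] fOfX, List<double[]> referenceSet) {
      int rank = 0;
      for (double[] fOfOther : referenceSet) {
        if (dominates(fOfOther, fOfX)) rank++;
      }
      return rank; // 0 means: non-dominated within the reference set
    }
  }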
Example E3.18 (Example E3.6 Cont.: Pareto Optimization).
In Figure 3.10, we illustrate the impact of the definition of Pareto optimality on our first example (outlined in Example E3.6). We again assume that f1 and f2 should both be maximized and hence, ω1 = ω2 = −1. The areas shaded with dark gray are Pareto optimal and thus represent the optimal set X⋆ = [x2, x3] ∪ [x5, x6], which contains infinitely many elements since it is composed of intervals on the real axis. In practice, of course, our computers can only handle finitely many elements and we would not be able to find all optimal elements unless, of course, we could apply a method able to mathematically resolve the objective functions and to return the intervals in a form similar to the notation we used here. All candidate solutions not in the optimal set are dominated by at least one element of X⋆.
[Figure: plots of y = f1(x) and y = f2(x) over x ∈ X1 with the points x1 to x6 marked; the Pareto-optimal regions [x2, x3] and [x5, x6] are shaded dark gray, the region between x1 and x2 light gray.]
Figure 3.10: Optimization using the Pareto Frontier approach.
The points in the area between x1 and x2 (shaded in light gray) are dominated by other points in the same region or in [x2, x3], since both functions f1 and f2 can be improved by increasing x. In other words, f1 and f2 harmonize in [x1, x2). If we start at the leftmost point in X (which is position x1), for instance, we can go one small step ∆ to the right and will find a point x1 + ∆ dominating x1 because f1(x1 + ∆) > f1(x1) and f2(x1 + ∆) > f2(x1). We can repeat this procedure and will always find a new dominating point until we reach x2. x2 demarks the global maximum of f2 – the point with the highest possible f2 value – which cannot be dominated by any other element of X by definition (see Equation 3.21).
From here on, f2 will decrease for some time, but f1 keeps rising. If we now go a small step ∆ to the right, we will find a point x2 + ∆ with f2(x2 + ∆) < f2(x2) but also f1(x2 + ∆) > f1(x2). One objective can only get better if another one degenerates, i.e., f1 and f2 conflict in [x2, x3]. In order to increase f1, f2 would have to be decreased and vice versa, and hence, the new point is not dominated by x2. Although some of the f2(x) values of the other points x ∈ [x1, x2) may be larger than f2(x2 + ∆), f1(x2 + ∆) > f1(x) holds for all of them. This means that no point in [x1, x2) can dominate any point in [x2, x4] because f1 keeps rising until x4 is reached.
At x3, however, f2 steeply falls to a very low level, a level lower than f2(x5). Since the f1 values of the points in [x5, x6] are also higher than those of the points in (x3, x4], all points in the set [x5, x6] (which also contains the global maximum of f1) dominate those in (x3, x4]. For all the points in the white area between x4 and x5 and after x6, we can derive similar relations. All of them are also dominated by the non-dominated regions that we have just discussed.
Example E3.19 (Example E3.7 Cont.: Pareto Optimization).
Another method to visualize the Pareto relationship is outlined in Figure 3.11 for our second graphical example. For a certain grid-based resolution X′2 ⊂ X2 of the problem space X2, we counted the number #dom(x, X′2) of elements that dominate each element x ∈ X′2. Again, the higher this number, the worse is the element x in terms of Pareto optimization. Hence, those candidate solutions residing in the valleys of Figure 3.11 are better than those which are part of the hills. This Pareto ranking approach is also used in many optimization algorithms as part of the fitness assignment scheme (see Section 28.3.3 on page 275, for instance). A non-dominated element is, as the name says, not dominated by any other candidate solution. These elements are Pareto optimal and have a domination rank of zero. In Figure 3.11, there are four such areas X⋆1, X⋆2, X⋆3, and X⋆4.
[Figure: the domination count #dom plotted over (x1, x2); the four non-dominated areas X⋆1, X⋆2, X⋆3, and X⋆4 lie at the bottoms of the valleys.]
Figure 3.11: Optimization using the Pareto Frontier approach (second example).
If we compare Figure 3.11 with the plots of the two functions f3 and f4 in Figure 3.5, we can see that hills in the domination space occur at positions where both f3 and f4 have high values. Conversely, regions of the problem space where both functions have small values are dominated by very few elements.
Besides these examples here, another illustration of the domination relation which may help
understanding Pareto optimization can be found in Section 28.3.3 on page 275 (Figure 28.4
and Table 28.1).
3.3.5.1 Problems of Pure Pareto Optimization
Besides the fact that we will usually not be able to list all Pareto optimal elements, the
complete Pareto optimal set is often also not the wanted result of the optimization process.
Usually, we are rather interested in some special areas of the Pareto front only.
Example E3.20 (Example E3.5 – Artificial Ant – Cont.).
We can again take the Artificial Ant example to visualize this problem. In Example E3.5
on page 57 we have introduced multiple conflicting criteria in this problem.
1. Maximize the amount of food piled.
2. Minimize the distance covered or the time needed to find the food.
3. Minimize the size of the program driving the ant.
If we applied pure Pareto optimization to these objectives, we may, for example, obtain:
1. A program x1 consisting of 100 instructions, allowing the ant to gather 50 food items when walking a distance of 500 length units.
2. A program x2 consisting of 100 instructions, allowing the ant to gather 60 food items when walking a distance of 5000 length units.
3. A program x3 consisting of 50 instructions, allowing the ant to gather 61 food items when walking a distance of 1 000 000 length units.
4. A program x4 consisting of 10 instructions, allowing the ant to gather 1 food item when walking a distance of 5 length units.
5. A program x5 consisting of 0 instructions, allowing the ant to gather 0 food items when walking a distance of 0 length units.
The result of the optimization process obviously contains useless but non-dominated individuals which occupy space in the non-dominated set. A program x3 which forces the ant to scan each and every location on the map is not feasible even if it leads to a maximum in food gathering. Also, an empty program x5 is useless because it will lead to no food collection at all. Between x1 and x2, the difference in the amount of piled food is only 20%, but the ant driven by x2 has to cover ten times the distance covered by x1. Therefore, the question of the actual utility arises in this case, too.
If such candidate solutions are created and archived, processing time is invested in evaluating them (and we have already seen that evaluation may be a costly step of the optimization process). Ikeda et al. [1400] show that various elements which are far away from the true Pareto frontier may survive as hardly-dominated solutions (called dominance resistant, see Section 20.2). Also, such solutions may dominate others which are not optimal but fall into the space behind the interesting part of the Pareto front. Furthermore, memory restrictions usually force us to limit the size of the list of non-dominated solutions X found during the search. When this size limit is reached, some optimization algorithms use a clustering technique to prune the optimal set while maintaining diversity. On one hand, this is good since it preserves a broad scan of the Pareto frontier. On the other hand, in this case a short but dumb program is of course very different from a longer, intelligent one. Therefore, it will be kept in the list and other solutions which differ less from each other but are more interesting for us will be discarded.
Moreover, non-dominated elements usually have a higher probability of being explored further. This inevitably leads to the creation of a great proportion of useless offspring.
In the next iteration of the optimization process, these useless offspring will need a good
share of the processing time to be evaluated.
Thus, there are several reasons to force the optimization process into a wanted direction.
In ?? on page ?? you can find an illustrative discussion on the drawbacks of strict Pareto
optimization in a practical example (evolving web service compositions).
3.3.5.2 Goals of Pareto Optimization
Purshouse [2228] identifies three goals of Pareto optimization which also mirror the afore-
mentioned problems:
1. Proximity: The optimizer should obtain result sets X which come as close to the Pareto
frontier as possible.
2. Diversity: If the real set of optimal solutions is too large, the set X of solutions actually
discovered should have a good distribution in both uniformity and extent.
3. Pertinency: The set X of solutions discovered should match the interests of the decision maker using the optimizer. There is no use in delivering candidate solutions with no actual merit for the human operator.
3.4 Constraint Handling
Often, there are feasibility limits for the parameters of the candidate solutions or the objective function values. Such regions of interest are one of the reasons for a further extension of multi-objective optimization problems:

Definition D3.17 (Feasibility). In an optimization problem, p ≥ 0 inequality constraints g and q ≥ 0 equality constraints h may be imposed in addition to the objective functions. A candidate solution x then is feasible if and only if it fulfills all constraints:

$$\mathrm{isFeasible}(x) \;\Leftrightarrow\; \left( g_i(x) \geq 0 \;\;\forall i \in 1..p \right) \;\wedge\; \left( h_j(x) = 0 \;\;\forall j \in 1..q \right)$$ (3.26)

A candidate solution x for which isFeasible(x) does not hold is called infeasible. Obviously, only a feasible individual can be a solution, i.e., an optimum, for a given optimization problem. Constraint handling and the notion of feasibility can be combined with all the methods for single-objective and multi-objective optimization previously discussed. Comprehensive reviews on techniques for such problems have been provided by Michalewicz [1885], Michalewicz and Schoenauer [1890], and Coello Coello et al. [594, 599] in the context of Evolutionary Computation. In Table 3.2, we give some examples of constraints in optimization problems.
Table 3.2: Applications and Examples of Constraint Optimization.

Area                                 References
Combinatorial Problems               [237, 1161, 1735, 1822, 1871, 1921, 2036, 2072, 2658, 3034]
Computer Graphics                    [2487, 2488]
Databases                            [1735, 1822, 2658]
Decision Making                      [1801]
Distributed Algorithms and Systems   [1801, 3034]
Economics and Finances               [530, 531, 3034]
Engineering                          [500, 519, 530, 547, 1881, 3070]
Function Optimization                [1732]
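The feasibility test of Definition D3.17 can be sketched directly in Java; the functional interfaces and the tolerance used for the equality constraints below are illustrative assumptions, since exact equality is rarely attainable with floating-point arithmetic.

  import java.util.List;
  import java.util.function.ToDoubleFunction;

  /** Minimal sketch of the feasibility test from Equation 3.26. */
  public final class Feasibility<X> {
    private static final double EPS = 1e-9; // tolerance for equality constraints (assumption)

    private final List<ToDoubleFunction<X>> inequalities; // g_i(x) >= 0 required
    private final List<ToDoubleFunction<X>> equalities;   // h_j(x) == 0 required

    public Feasibility(List<ToDoubleFunction<X>> inequalities,
                       List<ToDoubleFunction<X>> equalities) {
      this.inequalities = inequalities;
      this.equalities = equalities;
    }

    public boolean isFeasible(X x) {
      for (ToDoubleFunction<X> g : this.inequalities) {
        if (g.applyAsDouble(x) < 0.0) return false;
      }
      for (ToDoubleFunction<X> h : this.equalities) {
        if (Math.abs(h.applyAsDouble(x)) > EPS) return false;
      }
      return true;
    }
  }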
3.4.1 Genome and Phenome-based Approaches
One approach to guarantee that constraints are always obeyed is to make sure that only feasible candidate solutions can occur during the optimization process.
1. Coding: This can be done as part of the genotype-phenotype mapping by using so-called decoders. An example of a decoder is given in [2472, 2473]; see also [2228].
2. Repairing: Infeasible candidate solutions could be repaired, i.e., turned into feasible ones [2228, 3041], as part of the genotype-phenotype mapping as well.
3.4.2 Death Penalty
Probably the easiest way of dealing with constraints in objective and fitness space is to simply reject all infeasible candidate solutions right away and not consider them any further in the optimization process. This death penalty [1885, 1888] can only work in problems where the feasible regions are very large. If the problem space contains only few or non-continuously distributed feasible elements, such an approach will lead the optimization process to stagnate because most effort is spent on untargeted sampling in the hope of finding feasible candidate solutions. Once such solutions have been found, the transition to other feasible elements may be too complicated and the search may only find local optima. Additionally, if infeasible elements are directly discarded, the information which could be gained from them is discarded with them.
3.4.3 Penalty Functions
Maybe one of the most popular approaches for dealing with constraints, especially in the area of single-objective optimization, goes back to Courant [647], who introduced the idea of penalty functions in 1943. Here, the constraints are combined with the objective function f, resulting in a new function f′ which is then actually optimized. The basic idea is that this combination is done in a way which ensures that an infeasible candidate solution always has a worse f′-value than a feasible one with the same objective values. In [647], this is achieved by defining f′ as f′(x) = f(x) + v[h(x)]². Various similar approaches exist. Carroll [500, 501], for instance, chose a penalty function of the form

$$f'(x) = f(x) + v \sum_{i=1}^{p} [g_i(x)]^{-1}$$

in order to ensure that the functions g_i always stay larger than zero.
There are practically no limits for the ways in which a penalty for infeasibility can be integrated into the objective functions. Several researchers suggest dynamic penalties which incorporate the index of the current iteration of the optimizer [1458, 2072] or adaptive penalties which additionally utilize population statistics [237, 1161, 2487]. Thorough discussions on penalty functions have been contributed by Fiacco and McCormick [908] and Smith and Coit [2526].
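In the spirit of Courant's f′(x) = f(x) + v[h(x)]², a static penalty can be sketched in Java as follows; the generalization to several inequality and equality constraints via squared violations, as well as all names, are assumptions made here for illustration only.

  import java.util.List;
  import java.util.function.ToDoubleFunction;

  /** Sketch of a static penalty: f'(x) = f(x) + v * (sum of squared constraint violations). */
  public final class PenalizedObjective<X> {
    private final ToDoubleFunction<X> f;                   // original objective
    private final List<ToDoubleFunction<X>> inequalities;  // g_i(x) >= 0 required
    private final List<ToDoubleFunction<X>> equalities;    // h_j(x) == 0 required
    private final double v;                                // penalty coefficient

    public PenalizedObjective(ToDoubleFunction<X> f,
                              List<ToDoubleFunction<X>> inequalities,
                              List<ToDoubleFunction<X>> equalities, double v) {
      this.f = f;
      this.inequalities = inequalities;
      this.equalities = equalities;
      this.v = v;
    }

    /** The penalized objective value f'(x), subject to minimization. */
    public double evaluate(X x) {
      double penalty = 0.0;
      for (ToDoubleFunction<X> g : this.inequalities) {
        double violation = Math.min(g.applyAsDouble(x), 0.0); // negative iff violated
        penalty += violation * violation;
      }
      for (ToDoubleFunction<X> h : this.equalities) {
        double violation = h.applyAsDouble(x);
        penalty += violation * violation;
      }
      return this.f.applyAsDouble(x) + this.v * penalty;
    }
  }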
72 3 OPTIMA: WHAT DOES GOOD MEAN?
3.4.4 Constraints as Additional Objectives
Another idea for handling constraints is to consider them as new objective functions. If g(x) ≥ 0 must hold, for instance, we can transform this constraint into a new objective function which is subject to minimization:

$$f'(x) = -\min\{g(x), 0\}$$ (3.27)

As long as g is larger than 0, the constraint is met. f′ will be zero for all candidate solutions which fulfill this requirement, since there is no use in maximizing g beyond 0. However, if g becomes negative, the minimum term in f′ becomes negative as well, and the minus sign in front of it turns it positive: the larger the absolute value of the negative g, the higher the value of f′. Since f′ is subject to minimization, we thus apply a force into the direction of meeting the constraint, while no optimization pressure is applied as long as the constraint is met. An equality constraint h can similarly be transformed into a function f=(x), again subject to minimization:

$$f_{=}(x) = \max\{\,|h(x)|\,,\, \epsilon\}$$ (3.28)

f=(x) minimizes the absolute value of h. If h is continuous and real, it may approach 0 arbitrarily closely but never actually reach it. Then, defining a small cut-off value ε (for instance ε = 10⁻⁶) may help the optimizer to satisfy the constraint. An approach similar to the idea of interpreting constraints as objectives is Deb's Goal Programming method [736, 739]. Constraint violations and objective values are ranked together in a part of the MOEA by Ray et al. [2273] [749].
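Equations 3.27 and 3.28 translate directly into code; the following minimal Java sketch (with hypothetical names) wraps a constraint into a new objective function which is subject to minimization.

  import java.util.function.ToDoubleFunction;

  /** Sketch of turning constraints into objectives (Equations 3.27 and 3.28). */
  public final class ConstraintObjectives {

    /** For an inequality constraint g(x) >= 0: returns -min{g(x), 0},
        which is 0 when the constraint is met and positive otherwise. */
    public static <X> ToDoubleFunction<X> fromInequality(ToDoubleFunction<X> g) {
      return x -> -Math.min(g.applyAsDouble(x), 0.0);
    }

    /** For an equality constraint h(x) = 0: returns max{|h(x)|, eps},
        where eps is a small cut-off value such as 1e-6. */
    public static <X> ToDoubleFunction<X> fromEquality(ToDoubleFunction<X> h, double eps) {
      return x -> Math.max(Math.abs(h.applyAsDouble(x)), eps);
    }
  }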
3.4.5 The Method Of Inequalities
General inequality constraints can also be processed according to the Method of Inequalities (MOI) introduced by Zakian [3050, 3052, 3054] in his seminal work on computer-aided control systems design (CACSD) [2404, 2924, 3070]. In the MOI, an area of interest is specified in the form of a goal range [ři, r̂i] for each objective function fi.
Pohlheim [2190] outlines how this approach can be applied to the set f of objective functions and combined with Pareto optimization: Based on the inequalities, three categories of candidate solutions can be defined and each element x ∈ X belongs to one of them:

1. It fulfills all of the goals, i.e.,
$$\check{r}_i \leq f_i(x) \leq \hat{r}_i \;\;\forall i \in 1..|f|$$ (3.29)

2. It fulfills some (but not all) of the goals, i.e.,
$$\left(\exists i \in 1..|f| : \check{r}_i \leq f_i(x) \leq \hat{r}_i\right) \;\wedge\; \left(\exists j \in 1..|f| : (f_j(x) < \check{r}_j) \vee (f_j(x) > \hat{r}_j)\right)$$ (3.30)

3. It fulfills none of the goals, i.e.,
$$(f_i(x) < \check{r}_i) \vee (f_i(x) > \hat{r}_i) \;\;\forall i \in 1..|f|$$ (3.31)

Using these groups, a new comparison mechanism is created (a minimal code sketch of the resulting class assignment is given after this discussion):

1. The candidate solutions which fulfill all goals are preferred over all other individuals that fulfill only some or none of the goals.
2. The candidate solutions which are not able to fulfill any of the goals succumb to those which fulfill at least some goals.
3. Only solutions that are in the same group are compared on the basis of the Pareto domination relation.

By doing so, the optimization process is driven into the direction of the interesting part of the Pareto frontier. Less effort will be spent on creating and evaluating individuals in parts of the problem space that most probably do not contain any valid solution.
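The class assignment of Equations 3.29 to 3.31 can be sketched in Java as follows; the array-based representation of the goal ranges and all names are illustrative assumptions.

  /** Sketch of the Method of Inequalities class assignment (Equations 3.29 to 3.31):
      class 1 fulfills all goal ranges, class 2 some of them, class 3 none. */
  public final class MOIClassifier {

    /** objectives[i] = f_i(x); lower[i] and upper[i] are the goal range bounds of f_i. */
    public static int moiClass(double[] objectives, double[] lower, double[] upper) {
      int fulfilled = 0;
      for (int i = 0; i < objectives.length; i++) {
        if (objectives[i] >= lower[i] && objectives[i] <= upper[i]) {
          fulfilled++;
        }
      }
      if (fulfilled == objectives.length) return 1; // all goals met
      if (fulfilled > 0) return 2;                  // some goals met
      return 3;                                     // no goal met
    }
  }

Candidate solutions would then first be compared by this class number (smaller is better) and, only within the same class, by the Pareto domination relation.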
Example E3.21 (Example E3.6 Cont.: MOI).
In Figure 3.12, we apply the Pareto-based Method of Inequalities to the graphical Example E3.6. We impose the same goal ranges on both objectives: ř1 = ř2 and r̂1 = r̂2. By doing so, the second non-dominated region from the Pareto example in Figure 3.10 suddenly becomes infeasible, since f1 rises over r̂1. Also, the greater part of the first optimal area from this example is now infeasible because f2 drops under ř2. In the whole domain X of the optimization problem, only the regions [x1, x2] and [x3, x4] fulfill all the target criteria. To these elements, Pareto comparisons are applied. It turns out that the elements in [x3, x4] dominate all the elements in [x1, x2] since they provide higher values in f1 for the same values in f2. If we scan through [x3, x4] from left to right, we see that f1 rises while f2 degenerates, which is why the elements in this area cannot dominate each other and hence are all optimal.
[Figure: plots of y = f1(x) and y = f2(x) over x ∈ X1 with the points x1 to x4 and the goal ranges [ř1, r̂1] = [ř2, r̂2] marked.]
Figure 3.12: Optimization using the Pareto-based Method of Inequalities approach (first
example).
Example E3.22 (Example E3.7 Cont.: MOI).
In Figure 3.13 we apply the Pareto-based Method of Inequalities to our second graphical example (Example E3.7). For illustration purposes, we define two different ranges of interest [ř3, r̂3] and [ř4, r̂4] on f3 and f4, as sketched in Fig. 3.13.a.
Like we did in the second example for Pareto optimization, we want to plot the quality of the elements in the problem space. Therefore, we first assign a number c ∈ {1, 2, 3} to each of its elements in Fig. 3.13.b. This number corresponds to the class to which the element belongs, i.e., 1 means that a candidate solution fulfills all inequalities, for an element of class 2 at least some of the constraints hold, and the elements in class 3 fail all requirements. Based on this class division, we can then perform a modified Pareto counting where each candidate solution dominates all the elements in higher classes (Fig. 3.13.c). The result is that multiple single optima x⋆1, x⋆2, x⋆3, etc., and even a set of adjacent, non-dominated elements X⋆9 occur. These elements are, again, situated at the bottoms of the illustrated landscape whereas the worst candidate solutions reside on the hill tops.
[Figure, three panels – Fig. 3.13.a: The ranges [ř3, r̂3] and [ř4, r̂4] applied to f3 and f4; Fig. 3.13.b: The Pareto-based Method of Inequalities class division (classes 1, 2, 3) over (x1, x2); Fig. 3.13.c: The Pareto-based Method of Inequalities ranking (#dom) with the single optima x⋆1 to x⋆8 and the non-dominated region X⋆9.]
Figure 3.13: Optimization using the Pareto-based Method of Inequalities approach (second example).
A good overview on techniques for the Method of Inequalities is given by Whidborne et al.
[2924].
3.4.6 Constrained-Domination
The idea of constrained-domination is very similar to the Method of Inequalities but natively applies to optimization problems as specified in Definition D3.17. Deb et al. [749] define this principle, which can easily be incorporated into the dominance comparison between any two candidate solutions (a minimal code sketch follows the list below). Here, a candidate solution x1 ∈ X is preferred in comparison to an element x2 ∈ X [547, 749, 1297] if
1. x1 is feasible while x2 is not,
2. x1 and x2 are both infeasible but x1 has a smaller overall constraint violation, or
3. x1 and x2 are both feasible but x1 dominates x2.
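A minimal Java sketch of this comparison could look as follows; the precomputed feasibility flags, the aggregated constraint violation values, and all names are illustrative assumptions.

  /** Sketch of the constrained-domination comparison by Deb et al. [749];
      objective vectors are assumed to be minimized. */
  public final class ConstrainedDomination {

    /** Plain Pareto domination on objective vectors (minimization assumed). */
    static boolean dominates(double[] a, double[] b) {
      boolean strictlyBetter = false;
      for (int i = 0; i < a.length; i++) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictlyBetter = true;
      }
      return strictlyBetter;
    }

    /** True if candidate 1 is preferred over candidate 2. */
    public static boolean isPreferred(boolean feasible1, double violation1, double[] f1,
                                      boolean feasible2, double violation2, double[] f2) {
      if (feasible1 && !feasible2) return true;                     // rule 1
      if (!feasible1 && !feasible2) return violation1 < violation2; // rule 2
      if (feasible1 && feasible2) return dominates(f1, f2);         // rule 3
      return false; // candidate 1 infeasible while candidate 2 is feasible
    }
  }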
3.4.7 Limitations and Other Methods
Other approaches for incorporating constraints into optimization are Goal Attainment [951, 2951] and Goal Programming⁷ [530, 531]. Especially interesting in our context are methods which have been integrated into Evolutionary Algorithms [736, 739, 2190, 2393, 2662], such as the popular Goal Attainment approach by Fonseca and Fleming [951], which is similar to the Pareto-MOI we have adopted from Pohlheim [2190]. Again, an overview of this subject is given by Coello Coello et al. in [599].
3.5 Unifying Approaches
3.5.1 External Decision Maker
All approaches discussed so far for defining what optima are and for considering constraints were rather specific and bound to certain mathematical constructs. The more general concept of an External Decision Maker which (or who) decides which candidate solutions prevail has been introduced by Fonseca and Fleming [952, 954].
One of the ideas behind “externalizing” the assessment process on what is good and what is bad is that Pareto optimization as well as, for instance, the Method of Inequalities, impose rigid, immutable partial orders⁸ on the candidate solutions. The first drawback of such a partial order is that elements may exist which neither succeed nor precede each other. As we have seen in the examples discussed in Section 3.3.5, there can, for instance, be two candidate solutions x1, x2 ∈ X with neither x1 ⊣ x2 nor x2 ⊣ x1. Since there can be arbitrarily many such elements, the optimization process may run into a situation where its set of currently discovered “best” candidate solutions X grows quickly, which leads to high memory consumption and declining search speed.
The second drawback of only using rigid partial orders is that they make no statement about the quality relations within X. There is no “best of the best”. We cannot decide which of two candidate solutions not dominating each other is the more interesting one for further investigation. Also, the plain Pareto relation does not allow us to dispose of the “weakest” element from X in order to save memory, since all elements in X are alike.
⁷ http://en.wikipedia.org/wiki/Goal_programming [accessed 2007-07-03]
⁸ A definition of partial order relations is specified in Definition D51.16 on page 645.
One example of introducing a quality measure in addition to the Pareto relation is the Pareto ranking which we already used in Example E3.19 and which we will discuss later in Section 28.3.3 on page 275. Here, the utility of a candidate solution is defined by the number of other candidate solutions it is dominated by. Into this number, additional density information can be incorporated which deems areas of X that have only sparsely been sampled as more interesting than those which have already been investigated extensively.
While such ordering methods are good default approaches, able to direct the search towards the Pareto frontier and to deliver a broad scan of it, they neglect the fact that the user of the optimization most often is not interested in the whole optimal set but has preferences for certain regions of interest [955]. Such a region would exclude, for example, the infeasible (but Pareto optimal) programs for the Artificial Ant as discussed in Section 3.3.5.1. What the user wants is a detailed scan of these areas, which often cannot be delivered by pure Pareto optimization.
[Figure: an EA (an optimizer) passes objective values (acquired knowledge) to a DM (decision maker), which combines them with a priori knowledge into utility/cost values returned to the EA; the EA produces the results.]
Figure 3.14: An external decision maker providing an Evolutionary Algorithm with utility
values.
This is where the External Decision Maker comes into play as an expression of the user’s preferences [949], as illustrated in Figure 3.14. The task of this decision maker is to provide a cost function u : R^n → R (or utility function, if the underlying optimizer is maximizing) which maps the space of objective values (R^n) to the real numbers R. Since there is a total order defined on the real numbers, this process is another way of resolving the “incomparability situation”. The structure of the decision making process u can freely be defined and may incorporate any of the previously mentioned methods. u could, for example, be reduced to computing a weighted sum of the objective values, to performing an implicit Pareto ranking, or to comparing individuals based on pre-specified goal vectors. Furthermore, it may even incorporate forms of artificial intelligence, other forms of multi-criterion Decision Making, and even interaction with the user (similar to Example E2.6 on page 48, for instance). This technique allows focusing the search onto solutions which are not only good in the Pareto sense, but also feasible and interesting from the viewpoint of the user.
Fonseca and Fleming make a clear distinction between fitness and cost values. Cost values have some meaning outside the optimization process and are based on user preferences. Fitness values, on the other hand, are an internal construct of the search with no meaning outside the optimizer (see Definition D5.1 on page 94 for more details). If External Decision Makers are applied in Evolutionary Algorithms or other search paradigms based on fitness measures, the fitness values will be computed using the values of the cost function instead of the objective functions [949, 950, 956].
3.5.2 Prevalence Optimization
We have now discussed various ways to define what optima in terms of multi-objective optimization are and how to steer the search process into their direction. Let us subsume all of them in a general approach. From the concept of Pareto optimization to the Method of Inequalities, the need to compare elements of the problem space in terms of their quality as solutions for a given problem runs like a red thread through this matter. Even the weighted sum approach and the External Decision Maker do nothing else than mapping multi-dimensional vectors to the real numbers in order to make them comparable.
If we compare two candidate solutions x1 and x2, either x1 is better than x2, or vice versa, or the result is undecidable. In the latter case, we assume that both are of equal quality. Hence, there are three possible relations between two elements of the problem space. These results can be expressed with a comparator function cmp_f.
Definition D3.18 (Comparator Function). A comparator function cmp : A² → R maps all pairs of elements (a1, a2) ∈ A² to the real numbers R according to a (strict) partial order⁹ R:

$$R(a_1, a_2) \;\Leftrightarrow\; (\mathrm{cmp}(a_1, a_2) < 0) \wedge (\mathrm{cmp}(a_2, a_1) > 0) \quad \forall a_1, a_2 \in A$$ (3.32)

$$\neg R(a_1, a_2) \wedge \neg R(a_2, a_1) \;\Leftrightarrow\; \mathrm{cmp}(a_1, a_2) = \mathrm{cmp}(a_2, a_1) = 0 \quad \forall a_1, a_2 \in A$$ (3.33)

$$R(a_1, a_2) \;\Rightarrow\; \neg R(a_2, a_1) \quad \forall a_1, a_2 \in A$$ (3.34)

$$\mathrm{cmp}(a, a) = 0 \quad \forall a \in A$$ (3.35)
We provide a Java interface for the functionality of a comparator function in Listing 55.15 on page 719. From the defining equations, many features of “cmp” can be deduced. It is, for instance, transitive, i.e., cmp(a1, a2) < 0 ∧ cmp(a2, a3) < 0 ⇒ cmp(a1, a3) < 0.
Provided with the knowledge of the objective functions f ∈ f, such a comparator function cmp_f can be imposed on the problem spaces of our optimization problems:
Definition D3.19 (Prevalence Comparator Function). A prevalence comparator function cmp_f : X² → R maps all pairs (x1, x2) ∈ X² of candidate solutions to the real numbers R according to Definition D3.18.
The subscript f in cmp_f illustrates that the comparator has access to all the values of the objective functions in addition to the problem space elements which are its parameters. As a shortcut for this comparator function, we introduce the prevalence notation as follows:
Definition D3.20 (Prevalence). An element x1 prevails over an element x2 (x1 ≺ x2) if the application-dependent prevalence comparator function cmp_f(x1, x2) ∈ R returns a value less than 0.

$$x_1 \prec x_2 \;\Leftrightarrow\; \mathrm{cmp}_{f}(x_1, x_2) < 0 \quad \forall x_1, x_2 \in X$$ (3.36)

$$(x_1 \prec x_2) \wedge (x_2 \prec x_3) \;\Rightarrow\; x_1 \prec x_3 \quad \forall x_1, x_2, x_3 \in X$$ (3.37)
It is easy to see that lexicographic orderings, Pareto domination relations, the Method of Inequalities-based comparisons, as well as the weighted sum combination of objective values all fit into this notation. Together with the fitness assignment strategies for Evolutionary Algorithms which will be introduced later in this book (see Section 28.3 on page 274), it covers many of the most sophisticated multi-objective techniques that are proposed, for instance, in [952, 1531, 2662]. By replacing the Pareto approach with prevalence comparisons, all the optimization algorithms (especially many of the evolutionary techniques) relying on domination relations can be used in their original form while offering the additional ability of scanning special regions of interest of the optimal frontier.
⁹ Partial orders are introduced in Definition D51.15 on page 645.
Since the comparator function cmp_f and the prevalence relation impose a partial order on the problem space X like the domination relation does, we can construct the optimal set X⋆ containing globally optimal elements x⋆ in a way very similar to Equation 3.23:

$$x^{\star} \in X^{\star} \;\Leftrightarrow\; \nexists x \in X : x \neq x^{\star} \wedge x \prec x^{\star}$$ (3.38)
For illustration purposes, we will exercise the prevalence approach on the examples of lexicographic optimality (see Section 3.3.2) and define the comparator function cmp_{f,lex}. For the weighted sum method with the weights w_i (see Section 3.3.3) we similarly provide cmp_{f,ws}, and the domination-based Pareto optimization with the objective directions ω_i as defined in Section 3.3.5 can be realized with cmp_{f,⊣}.
$$\mathrm{cmp}_{f,\mathrm{lex}}(x_1, x_2) = \begin{cases} -1 & \text{if } \exists \pi \in \Pi(1..|f|) : (x_1 <_{\mathrm{lex},\pi} x_2) \;\wedge\; \nexists \pi \in \Pi(1..|f|) : (x_2 <_{\mathrm{lex},\pi} x_1) \\ 1 & \text{if } \exists \pi \in \Pi(1..|f|) : (x_2 <_{\mathrm{lex},\pi} x_1) \;\wedge\; \nexists \pi \in \Pi(1..|f|) : (x_1 <_{\mathrm{lex},\pi} x_2) \\ 0 & \text{otherwise} \end{cases}$$ (3.39)

$$\mathrm{cmp}_{f,\mathrm{ws}}(x_1, x_2) = \left( \sum_{i=1}^{|f|} \left( w_i f_i(x_1) - w_i f_i(x_2) \right) \right) \equiv \mathrm{ws}(x_1) - \mathrm{ws}(x_2)$$ (3.40)

$$\mathrm{cmp}_{f,\dashv}(x_1, x_2) = \begin{cases} -1 & \text{if } x_1 \dashv x_2 \\ 1 & \text{if } x_2 \dashv x_1 \\ 0 & \text{otherwise} \end{cases}$$ (3.41)
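As an illustration, the following minimal Java sketch realizes the Pareto-domination-based prevalence comparator cmp_{f,⊣} of Equation 3.41 on precomputed objective vectors (all objectives assumed to be minimized); it only mirrors the idea of the comparator interface of Listing 55.15, and the names are hypothetical.

  import java.util.Comparator;

  /** Sketch of the prevalence comparator cmp_{f,⊣} from Equation 3.41: returns a negative
      value if the first objective vector dominates the second, a positive value if it is
      dominated, and 0 otherwise (minimization assumed). */
  public final class ParetoPrevalenceComparator implements Comparator<double[]> {

    private static boolean dominates(double[] a, double[] b) {
      boolean strictlyBetter = false;
      for (int i = 0; i < a.length; i++) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictlyBetter = true;
      }
      return strictlyBetter;
    }

    @Override
    public int compare(double[] f1, double[] f2) {
      if (dominates(f1, f2)) return -1;
      if (dominates(f2, f1)) return 1;
      return 0;
    }
  }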
Example E3.23 (Example E3.5 – Artificial Ant – Cont.).
With the prevalence comparator, we can also easily solve the problem stated in Section 3.3.5.1 by no longer encouraging the evolution of useless programs for Artificial Ants while retaining the benefits of Pareto optimization. The comparator function can simply be defined in a way that infeasible candidate solutions will always be prevailed by useful programs. It therefore may incorporate knowledge on the importance of the objective functions. Let f1 be the objective function with an output proportional to the food piled, f2 denote the distance covered in order to find the food, and f3 be the program length. Equation 3.42 demonstrates one possible comparator function for the Artificial Ant problem.
$$\mathrm{cmp}_{f,\mathrm{ant}}(x_1, x_2) = \begin{cases} -1 & \text{if } (f_1(x_1) > 0 \wedge f_1(x_2) = 0) \vee (f_2(x_1) > 0 \wedge f_2(x_2) = 0) \vee (f_3(x_1) > 0 \wedge f_1(x_2) = 0) \\ 1 & \text{if } (f_1(x_2) > 0 \wedge f_1(x_1) = 0) \vee (f_2(x_2) > 0 \wedge f_2(x_1) = 0) \vee (f_3(x_2) > 0 \wedge f_1(x_1) = 0) \\ \mathrm{cmp}_{f,\dashv}(x_1, x_2) & \text{otherwise} \end{cases}$$ (3.42)
Later in this book, we will discuss some of the most popular optimization strategies. Al-
though they are usually implemented based on Pareto optimization, we will always introduce
them using prevalence.
Tasks T3
13. In Example E1.2 on page 26, we stated the layout of circuits as a combinatorial opti-
mization problem. Name at least three criteria which could be subject to optimization
or constraints in this scenario.
[6 points]
14. The job shop scheduling problem was outlined in Example E1.3 on page 26. The most basic goal when solving job shop problems is to meet all production deadlines. Would you formulate this goal as a constraint or rather as an objective function? Why?
[4 points]
15. Let us consider the university life of a student as an optimization problem with the goal of obtaining good grades in all the subjects she attends. How could we formulate
(a) a single objective function for that purpose, or
(b) a set of objectives which can be optimized with, for example, the Pareto approach?
Additionally, phrase at least two constraints of daily life which, for example, prevent the student from learning 24 hours per day.
[10 points]
16. Consider a bi-objective optimization problem with X = R as problem space and the two objective functions f1(x) = e^(|x−3|) and f2(x) = |x − 100|, both subject to minimization. What is your opinion about the viability of solving this problem with the weighted sum approach with both weights set to 1?
[5 points]
17. If we perform Pareto optimization of n = |f| objective functions f_i : i ∈ 1..n where each of the objectives can take on m different values, there can be at most m^(n−1) non-dominated candidate solutions at a time. Sketch configurations of m^(n−1) non-dominated candidate solutions in diagrams similar to Figure 28.4 on page 276 for n ∈ {1, 2} and m ∈ {1, 2, 3, 4}. Outline why the statement also holds for n > 2.
[10 points]
18. In your opinion, is it good if most of the candidate solutions under investigation by a Pareto-based optimization algorithm are non-dominated? For answering this question, consider an optimization process that decides which individual should be used for further investigation solely on the basis of its objective values and, hence, the Pareto dominance relation. Consider the extreme case that all points known to such an algorithm are non-dominated to answer the question.
Based on your answer, make a statement about the expected impact of the fact discussed in Task 17 on the quality of solutions that a Pareto-based optimizer can deliver for rising dimensionality of the objective space, i.e., for increasing values of n = |f|.
[10 points]
19. Use Equation 3.1 to Equation 3.5 from page 52 to determine all extrema (optima) for
(a) f(x) = 3x⁵ + 2x² − x,
(b) g(x) = 2x² + x, and
(c) h(x) = sin x.
[15 points]
Chapter 4
Search Space and Operators: How can we find it?
4.1 The Search Space
In Section 2.2, we gave various examples of optimization problems, such as simple real-valued problems (Example E2.1), antenna design (Example E2.3), and tasks in aircraft development (Example E2.5). Obviously, the problem spaces X of these problems are quite different. A solution of the stone’s throw task, for instance, was a real number of meters per second. An antenna can be described by its blueprint, which is structured very differently from the blueprints of aircraft wings or rotors.
Optimization algorithms usually start out with a set of either predefined or automatically generated elements from X which are far from being optimal (otherwise, optimization would not make much sense). From these initial solutions, new elements from X are explored in the hope of improving the solution quality. In order to find these new candidate solutions, search operators are used which modify or combine the elements already known.
Since the problem spaces of the three aforementioned examples are different, this would mean that different sets of search operators are required, too. It would, however, be cumbersome to develop search operations time and again for each new problem space we encounter. Such an approach would not only be error-prone, it would also make it very hard to formulate general laws and to consolidate findings.
Instead, we often reuse well-known search spaces for many different problems. After the problem space is defined, we will always look for ways to map it to such a search space for which search operators are already available. This is not always possible, but in most of the problems we will encounter, it will save a lot of effort.
Example E4.1 (Search Spaces for Example E2.1, E2.3, and E2.5).
The stone’s throw as well as many antenna and wing blueprints, for instance, can all be represented as vectors of real numbers. For the stone’s throw, these vectors would have length one (Fig. 4.1.a). An antenna consisting of n segments could be encoded as a vector of length 3n, where the element at index i specifies a length and the elements at i + 1 and i + 2 are angles denoting into which direction the segment is bent, as sketched in Fig. 4.1.b. The aircraft wing could be represented as a vector of length 3n, too, where the element at index i is the segment width and those at i + 1 and i + 2 denote its disposition and height (Fig. 4.1.c).
[Figure, three panels – Fig. 4.1.a: Stone’s throw, genotype x = g = (g1) with a single real value; Fig. 4.1.b: Antenna design, genotype g = (g0, g1, g2, ...) encoding the antenna segments; Fig. 4.1.c: Wing design, genotype g = (g0, g1, g2, ...) encoding the wing segments.]
Figure 4.1: Examples for real-coding candidate solutions.
Definition D4.1 (Search Space). The search space G (genome) of an optimization prob-
lem is the set of all elements g which can be processed by the search operations.
In dependence on Genetic Algorithms, we often refer to the search space synonymously as genome¹. Genetic Algorithms (see Chapter 29) are optimization algorithms inspired by biological evolution. The term genome was coined by the German biologist Winkler [2958] as a portmanteau of the words gene and chromosome [1701].² In biology, the genome is the whole hereditary information of an organism and the genotype represents the hereditary information of an individual. Since in the context of optimization we refer to the search space as genome, we call its elements genotypes.
Definition D4.2 (Genotype). The elements g ∈ G of the search space G of a given
optimization problem are called the genotypes.
As said, a biological genotype encodes the complete hereditary information of an individual. Likewise, a genotype in optimization encodes a complete candidate solution. If G ≠ X, we distinguish genotypes g ∈ G and phenotypes x ∈ X. In Example E4.1, the genomes are all vectors of real numbers but the problem spaces are either a velocity, an antenna design, or a blueprint of a wing.
The elements of the search space rarely are unstructured aggregations. Instead, they often consist of distinguishable parts, hierarchical units, or well-typed data structures. The same goes for the chromosomes in biology. They consist of genes³, segments of nucleic acid which contain the information necessary to produce RNA strings in a controlled manner and which encode phenotypical or metabolical properties. A fish, for instance, may have a gene for the color of its scales. This gene could have two possible “values”, called alleles⁴, determining whether the scales will be brown or gray. The Genetic Algorithm community has adopted this notation long ago and we use it for specifying the detailed structure of the genotypes.
Definition D4.3 (Gene). The distinguishable units of information in genotypes g ∈ G
which encode properties of the candidate solutions x ∈ X are called genes.
¹ http://en.wikipedia.org/wiki/Genome [accessed 2007-07-15]
² The words gene, genotype, and phenotype have, in turn, been introduced [2957] by the Danish biologist Johannsen [1448].
³ http://en.wikipedia.org/wiki/Gene [accessed 2007-07-03]
⁴ http://en.wikipedia.org/wiki/Allele [accessed 2007-07-03]
Definition D4.4 (Allele). An allele is a value of a specific gene.
Definition D4.5 (Locus). The locus⁵ is the position where a specific gene can be found in a genotype [634].
Example E4.2 (Small Binary Genome).
Genetic Algorithms are a widely-researched optimization approach which, in its original
form, utilizes bit strings as genomes. Assume that we wished to apply such a plain GA to
the minimization of a function f : (0..3 × 0..3) → R such as the trivial f(x = (x[0], x[1])) = x[0] − x[1] with x[0] ∈ X0 and x[1] ∈ X1.
[Figure: a genotype g ∈ G from the genome G (a bit string such as “0111”) is translated by the genotype-phenotype mapping x = gpm(g) into a phenotype x ∈ X of the phenome X = X0 × X1; the first gene at locus 0 (allele “01”) encodes x[0] and the second gene at locus 1 (allele “11”) encodes x[1]; search operations searchOp work on the genotypes.]
Figure 4.2: The relation of genome, genes, and the problem space.
Figure 4.2 illustrates how this can be achieved by using two bits for each of x[0] and x[1]. The first gene resides at locus 0 and encodes x[0], whereas the two bits at locus 1 stand for x[1]. Four bits suffice for encoding the two natural numbers in X0 = X1 = 0..3 and thus for the whole candidate solution (phenotype) x. With this trivial encoding, the Genetic Algorithm can be applied to the problem and will quickly find the solution x = (0, 3).
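A minimal Java sketch of this encoding might look as follows; the class and method names are made up for illustration and do not correspond to the listings referenced in this book.

  /** Sketch of the 4-bit genome from Example E4.2: two genes of two bits each,
      decoded into the phenotype x = (x[0], x[1]) with x[0], x[1] in 0..3. */
  public final class SmallBinaryGenome {

    /** Genotype-phenotype mapping: bits g[0..1] encode x[0], bits g[2..3] encode x[1]. */
    public static int[] gpm(boolean[] g) {
      int x0 = (g[0] ? 2 : 0) + (g[1] ? 1 : 0);
      int x1 = (g[2] ? 2 : 0) + (g[3] ? 1 : 0);
      return new int[] { x0, x1 };
    }

    /** The objective function f(x) = x[0] - x[1], subject to minimization. */
    public static int f(int[] x) {
      return x[0] - x[1];
    }

    public static void main(String[] args) {
      boolean[] g = { false, false, true, true }; // genotype 0011
      int[] x = gpm(g);                           // phenotype (0, 3)
      System.out.println(f(x));                   // prints -3, the global minimum
    }
  }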
4.2 The Search Operations
Definition D4.6 (Search Operation). A search operation “searchOp” takes a fixed number n ∈ N0 of elements g1..gn from the search space G as parameters and returns another element of the search space.

$$\mathrm{searchOp} : G^n \to G$$ (4.1)
Search operations are used by an optimization algorithm in order to explore the search space. n is an operator-specific value, its arity⁶. An operation with n = 0, i.e., a nullary operator, receives no parameters and may, for instance, create randomly configured genotypes (see, for instance, Listing 55.2 on page 708). An operator with n = 1 could randomly modify its parameter – the mutation operator in Evolutionary Algorithms is an example of this behavior.
⁵ http://en.wikipedia.org/wiki/Locus_%28genetics%29 [accessed 2007-07-03]
⁶ http://en.wikipedia.org/wiki/Arity [accessed 2008-02-15]
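For the bit-string genome of Example E4.2, a nullary creation operator and a unary single-bit-flip mutation operator might be sketched in Java as follows; the names are illustrative only and not the operator interfaces of the listings referenced above.

  import java.util.Random;

  /** Sketch of two search operations on a bit-string genome G = {false, true}^n. */
  public final class BitStringOperators {
    private static final Random RANDOM = new Random();

    /** Nullary operator (n = 0): creates a random genotype of the given length. */
    public static boolean[] create(int length) {
      boolean[] g = new boolean[length];
      for (int i = 0; i < length; i++) {
        g[i] = RANDOM.nextBoolean();
      }
      return g;
    }

    /** Unary operator (n = 1): copies the parent and flips exactly one random bit. */
    public static boolean[] mutate(boolean[] parent) {
      boolean[] child = parent.clone();
      int locus = RANDOM.nextInt(child.length);
      child[locus] = !child[locus];
      return child;
    }
  }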
The key idea of all search operations which take at least one parameter (n > 0) (see Listing 55.3) is that
1. it is possible to take a genotype g of a candidate solution x = gpm(g) with good objective values f(x),
2. to modify this genotype g in order to get a new one g′ = searchOp(g, . . .), which
3. can be translated to a phenotype x′ = gpm(g′) with similar or maybe even better objective values f(x′).
This assumption is one of the most basic concepts of iterative optimization algorithms. It is known as causality or locality and discussed in-depth in Section 14.2 on page 161 and Section 24.9 on page 218. If it does not hold, then applying a small change to a genotype leads to large changes in its utility. Instead of modifying a genotype, we could then just as well create a new, random one directly.
Many operations with n > 1 try to combine the genotypes they receive as input. This is done in the hope that positive characteristics of them can be joined into an even better offspring genotype. An example of such operations is the crossover operator in Genetic Algorithms. Differential Evolution utilizes an operator with n = 3, i.e., one which takes three arguments. Here, the idea often is that good characteristics can, step by step, be accumulated and united by the optimization process. This Building Block Hypothesis is discussed in detail in Section 29.5.5 on page 346, where criticism of this point of view is provided as well.
Search operations often are randomized, i.e., they are not functions in the mathematical sense. Instead, the return values of two successive invocations of one search operation with exactly the same parameters may differ. Since optimization processes quite often use more than a single search operation at a time, we define the set Op.
Definition D4.7 (Search Operation Set). Op is the set of search operations searchOp ∈ Op which are available to an optimization process. Op(g1, g2, .., gn) denotes the application of any n-ary search operation in Op to the genotypes g1, g2, .. ∈ G.
$$\forall g_1, g_2, .., g_n \in G \Rightarrow \begin{cases} \mathrm{Op}(g_1, g_2, .., g_n) \in G & \text{if an } n\text{-ary operation exists in Op} \\ \mathrm{Op}(g_1, g_2, .., g_n) = \emptyset & \text{otherwise} \end{cases}$$ (4.2)
It should be noted that, if the n-ary search operation from Op which was applied is a randomized operator, the behavior of Op(. . .) is obviously randomized as well. Furthermore, if more than one n-ary operation is contained in Op, the optimization algorithm using Op usually performs some form of decision on which operation to apply. This decision may either be random or follow a specific schedule, and the decision making process may even change during the course of the optimization process. Writing Op(g) in order to denote the application of a unary search operation from Op to the genotype g ∈ G is thus a simplified expression which encompasses both possible randomness in the search operations and in the decision process on which operation to apply.
With Op^k(g1, g2, . . .) we denote k successive applications of (possibly different) search operators (with respect to their arities). The k-th application receives all possible results of the (k−1)-th application of the search operators as parameters, i.e., it is defined recursively as
$$\mathrm{Op}^k(g_1, g_2, \ldots) = \bigcup_{\forall G \in \Pi(\mathcal{P}(\mathrm{Op}^{k-1}(g_1, g_2, \ldots)))} \mathrm{Op}(G)$$ (4.3)

where Op⁰(g1, g2, . . .) = {g1, g2, . . .}, P(G) denotes the power set of a set G, and Π(A) is the set of all possible permutations of the elements of A.
If the parameter g is omitted, i.e., just Op^k is written, this chain has to start with a search operation of zero arity. In the style of Badea and Stanciu [177] and Skubch [2516, 2517], we can now define:
Definition D4.8 (Completeness). A set Op of search operations “searchOp” is complete if and only if every point g1 in the search space G can be reached from every other point g2 ∈ G by applying only operations searchOp ∈ Op.

$$\forall g_1, g_2 \in G \Rightarrow \exists k \in \mathbb{N}_1 : g_1 \in \mathrm{Op}^k(g_2)$$ (4.4)

Definition D4.9 (Weak Completeness). A set Op of search operations “searchOp” is weakly complete if every point g in the search space G can be reached by applying only operations searchOp ∈ Op.

$$\forall g \in G \Rightarrow \exists k \in \mathbb{N}_1 : g \in \mathrm{Op}^k$$ (4.5)
A weakly complete set of search operations usually contains at least one operation without parameters, i.e., of zero arity. If at least one such operation exists and this operation can potentially return all possible genotypes, then the set of search operations is already weakly complete. However, nullary operations are often only used in the beginning of the search process (see, for instance, the Hill Climbing Algorithm 26.1 on page 230). If Op achieved its weak completeness only because of a nullary search operation, the initialization of the algorithm would determine which solutions can be found and which cannot. If the set of search operations is strongly complete, the search process, in theory, can find any solution regardless of the initialization.
From the opposite point of view, this means that if the set of search operations is not complete, there are elements in the search space which cannot be reached from a random starting point. If it is not even weakly complete, parts of the search space can never be discovered. Then, the optimization process is probably not able to explore the problem space adequately and may not find optimal or satisfyingly good solutions.
Definition D4.10 (Adjacency (Search Space)). A point g2 is ε-adjacent to a point g1 in the search space G if it can be reached by applying a single search operation “searchOp” to g1 with a probability of more than ε ∈ [0, 1). Notice that the adjacency relation is not necessarily symmetric.

$$\mathrm{adjacent}_{\varepsilon}(g_2, g_1) \;\Leftrightarrow\; P(\mathrm{Op}(g_1) = g_2) > \varepsilon$$ (4.6)

The probability parameter ε is defined here in order to allow us to differentiate between probable and improbable transitions in the search space. Normally, we will use the adjacency relation with ε = 0 and abbreviate the notation with adjacent(g2, g1) ≡ adjacent₀(g2, g1).
In the case of Genetic Algorithms with, for instance, single-bit flip mutation operators (see Section 29.3.2.1 on page 330), adjacent(g_2, g_1) would be true for all bit strings g_2 which have a Hamming distance of no more than one from g_1, i. e., which differ by at most one bit from g_1. However, if we assume a real-valued search space G = R and a mutation operator which adds a normally distributed random value to a genotype, then adjacent(g_2, g_1) would always be true, regardless of their distance. The adjacency relation with ε = 0 thus has no meaning in this context. If we set ε to a very small number such as 1·10^−4, we can get a meaningful impression of how likely it is to discover a point g_2 when starting from g_1. Furthermore, this definition allows us to reason about limits such as lim_{ε→0} adjacent_ε(g_2, g_1).
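As an illustration, the following Java sketch (a hypothetical helper, not taken from the book's code) tests the ε = 0 adjacency relation induced by a single-bit flip mutation on bit strings by simply checking the Hamming distance, as described above.

```java
/**
 * A small sketch of the 0-adjacency relation induced by a single-bit flip
 * mutation on bit strings: g2 is adjacent to g1 if it can result from one
 * application of the operator, i.e., if the Hamming distance is at most one.
 * Class and method names are illustrative.
 */
public class BitFlipAdjacency {
  /** Returns true if g2 is 0-adjacent to g1 under single-bit flip mutation. */
  public static boolean adjacent(final boolean[] g2, final boolean[] g1) {
    if (g1.length != g2.length) {
      return false;
    }
    int hammingDistance = 0;
    for (int i = 0; i < g1.length; i++) {
      if (g1[i] != g2[i]) {
        hammingDistance++;
      }
    }
    return hammingDistance <= 1;
  }
}
```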
4.3 The Connection between Search and Problem Space
If the search space differs from the problem space, a translation between them is required. In Example E4.2, we would need to transform the binary strings processed by the
Genetic Algorithm to tuples of natural numbers which can be processed by the objective
functions.
Definition D4.11 (Genotype-Phenotype Mapping). The genotype-phenotype mapping (GPM, or ontogenic mapping [2134]) gpm : G → X is a left-total^7 binary relation which maps the elements of the search space G to elements in the problem space X. In Section 28.2, you can find a thorough discussion of different types of genotype-phenotype mappings used in Evolutionary Algorithms.

∀g ∈ G ∃x ∈ X : gpm(g) = x   (4.7)

In Listing 55.7 on page 711, you can find a Java interface which resembles Equation 4.7. The only hard criterion we impose on genotype-phenotype mappings in this book is left-totality, i. e., that they map each element of the search space to at least one candidate solution. The GPM may be a functional relation if it is deterministic. Nevertheless, it is possible to create mappings which involve random numbers and hence, cannot be considered to be functions in the mathematical sense of Section 51.10 on page 646. In this case, we would rewrite Equation 4.7 as Equation 4.8:

∀g ∈ G ∃x ∈ X : P(gpm(g) = x) > 0   (4.8)
Genotype-phenotype mappings should further be surjective [2247], i. e., relate at least one genotype to each element of the problem space. Otherwise, some candidate solutions can never be found and evaluated by the optimization algorithm even if the search operations are complete. Then, there is no guarantee whether the solution of a given problem can be discovered or not. If a genotype-phenotype mapping is injective, which means that it assigns distinct phenotypes to distinct elements of the search space, we say that it is free from redundancy. There are different forms of redundancy, some are considered to be harmful for the optimization process, others have positive influence^8. Most often, GPMs are not bijective (since they are neither necessarily injective nor surjective). Nevertheless, if a genotype-phenotype mapping is bijective, we can construct an inverse mapping gpm^−1 : X → G.

gpm^−1(x) = g ⇔ gpm(g) = x   ∀x ∈ X, g ∈ G   (4.9)
Based on the genotype-phenotype mapping and the definition of adjacency in the search
space given earlier in this section, we can also define a neighboring relation for the problem
space, which, of course, is also not necessarily symmetric.
Definition D4.12 (Neighboring (Problem Space)). Under the condition that the candidate solution x_1 ∈ X was obtained from the element g_1 ∈ G by applying the genotype-phenotype mapping, i. e., x_1 = gpm(g_1), a point x_2 is ε-neighboring to x_1 if it can be reached by applying a single search operation and a subsequent genotype-phenotype mapping to g_1 with at least a probability of ε ∈ [0, 1). Equation 4.10 provides a general definition of the neighboring relation. If the genotype-phenotype mapping is deterministic (which usually is the case), this definition boils down to Equation 4.11.

neighboring_ε(x_2, x_1) ⇔ [ Σ_{∀g_2: adjacent_0(g_2, g_1)} P(Op(g_1) = g_2) · P(x_2 = gpm(g_2) | Op(g_1) = g_2) ] > ε,   where x_1 = gpm(g_1)   (4.10)

neighboring_ε(x_2, x_1) ⇔ [ Σ_{∀g_2: adjacent_0(g_2, g_1) ∧ x_2 = gpm(g_2)} P(Op(g_1) = g_2) ] > ε,   where x_1 = gpm(g_1)   (4.11)
^7 See Equation 51.29 on page 644 to Point 5 for an outline of the properties of binary relations.
^8 See Section 16.5 on page 174 for more information.
The condition x_1 = gpm(g_1) is relevant if the genotype-phenotype mapping “gpm” is not injective (see Equation 51.31 on page 644), i. e., if more than one genotype can map to the same phenotype. Then, the set of ε-neighbors of a candidate solution x_1 depends on how it was discovered, i. e., on the genotype g_1 it was obtained from, as can be seen in Example E4.3. Example E5.1 on page 96 shows how the search operations and, thus, the definition of neighborhood, influence the performance of an optimization algorithm.
Example E4.3 (Neighborhoods under Non-Injective GPMs).
As an example for neighborhoods under non-injective genotype-phenotype mappings, let us imagine that we have a very simple optimization problem where the problem space X only consists of the three numbers 0, 1, and 2, as sketched in Figure 4.3. Imagine further that we would use the bit strings of length three as search space, i. e., G = B^3 = {0, 1}^3. As genotype-phenotype mapping gpm : G → X, we would apply the function x = gpm(g) = (g_1 + g_2) ∗ g_3. The two genotypes g_A = 010 and g_B = 110 both map to 0 since (0 + 1) ∗ 0 = (1 + 1) ∗ 0 = 0.

Figure 4.3: Neighborhoods under non-injective genotype-phenotype mappings. [Figure: the search space G = B^3 = {0, 1}^3, the problem space X = {0, 1, 2}, the mapping gpm(g) = (g_1 + g_2) ∗ g_3, and the genotypes g_A = 010 and g_B = 110.]

If the only search operation available would be a single-bit flip mutation, then all genotypes which have a Hamming distance of 1 are adjacent in the search space. This means that g_A = 010 is adjacent to 110, 000, and 011 while g_B = 110 is adjacent to 010, 100, and 111.
The candidate solution x = 0 is neighboring to 0 and 1 if it was obtained by applying the genotype-phenotype mapping to g_A = 010. With one search step, gpm(110) = gpm(000) = 0 and gpm(011) = 1 can be reached. However, if x = 0 was the result of applying the genotype-phenotype mapping to g_B = 110, its neighborhood is {0, 2}, since gpm(010) = gpm(100) = 0 and gpm(111) = 2.
Hence, the neighborhood relation under non-injective genotype-phenotype mappings depends on the genotypes from which the phenotypes under investigation originated. For injective genotype-phenotype mappings, this is not the case.
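The behavior described in this example can be reproduced with a few lines of Java. The sketch below (illustrative code, not part of the book's sources) enumerates the phenotypes reachable from a genotype by one single-bit flip followed by the mapping gpm(g) = (g_1 + g_2) ∗ g_3.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * A sketch reproducing Example E4.3: G = {0,1}^3, gpm(g) = (g1 + g2) * g3,
 * single-bit flip mutation. The neighborhood of the phenotype x = 0 depends
 * on whether it was obtained from gA = 010 or from gB = 110.
 */
public class NonInjectiveGpmExample {

  /** The genotype-phenotype mapping gpm(g) = (g[0] + g[1]) * g[2]. */
  static int gpm(final int[] g) {
    return (g[0] + g[1]) * g[2];
  }

  /** Phenotypes reachable from g by one single-bit flip plus the GPM. */
  static Set<Integer> neighborhood(final int[] g) {
    final Set<Integer> neighbors = new TreeSet<>();
    for (int i = 0; i < g.length; i++) {
      final int[] copy = g.clone();
      copy[i] = 1 - copy[i]; // flip bit i
      neighbors.add(gpm(copy));
    }
    return neighbors;
  }

  public static void main(final String[] args) {
    System.out.println(neighborhood(new int[] {0, 1, 0})); // gA = 010 -> [0, 1]
    System.out.println(neighborhood(new int[] {1, 1, 0})); // gB = 110 -> [0, 2]
  }
}
```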
If the search space G and the problem space X are the same (G = X) and the genotype-phenotype mapping is the identity mapping, it follows that g_2 = x_2 and the sum in Equation 4.11 breaks down to P(Op(g_1) = g_2). Then the neighboring relation (defined on the problem space) equals the adjacency relation (defined on the search space), as can be seen when taking a look at Equation 4.6. Since the identity mapping is injective, the condition x_1 = gpm(g_1) is always true, too.

(G = X) ∧ (gpm(g) = g ∀g ∈ G) ⇔ neighboring ≡ adjacent   (4.12)
In the following, we will use neighboring_ε to denote an ε-neighborhood relation, implicitly assuming the dependencies on the search space or that the GPM is injective. Furthermore, with neighboring(x_1, x_2), we denote that x_1 and x_2 are ε = 0-neighbors.
4.4 Local Optima
We now have the means to define the term local optimum more precisely. From Exam-
ple E4.4, it becomes clear that such a definition is not as trivial as the definitions of global
optima given in Chapter 3.
Example E4.4 (Types of Local Optima).
If we assume minimization, the global optimum x⋆ of the given objective function can easily be found.

Figure 4.4: Minima of a single objective function. [Figure: an objective function f(x) over x ∈ X, showing the global optimum x⋆, a locally optimal plateau X⋆_l, and a further set X_N of candidate solutions.]

Obviously, every global optimum is also a local optimum. In the case of single-objective minimization, it is both a local minimum and a global minimum.
In order to define what a local minimum is, as a first step, we could say “A (local) minimum x̌_l ∈ X of one (objective) function f : X → R is an input element with f(x̌_l) < f(x) for all x neighboring to x̌_l.” However, such a definition would disregard locally optimal plateaus, such as the set X⋆_l of candidate solutions sketched in Figure 4.4.
If we simply exchange the “<” in f(x̌_l) < f(x) with a “≤” and obtain f(x̌_l) ≤ f(x), however, we would classify the elements in X_N as local optima as well. This would contradict intuition, since they are clearly worse than the elements on the right side of X_N. Hence, a more fine-grained definition is necessary.
Similar to Handl et al. [1178], let us start with creating a definition of areas in the search
space based on the specification of ε-neighboring (Definition D4.12).
Definition D4.13 (Connection Set). A set X ⊆ X is ε-connected if all of its elements can be reached from each other with consecutive search steps, each of which having at least probability ε ∈ [0, 1), i. e., if Equation 4.13 holds.

x_1 ≠ x_2 ⇒ neighboring_ε(x_1, x_2)   ∀x_1, x_2 ∈ X   (4.13)
4.4.1 Local Optima of Single-Objective Problems

Definition D4.14 (ε-Local Minimal Set). An ε-local minimal set X̌_l ⊆ X under objective function f and problem space X is a connected set for which Equation 4.14 holds.

neighboring_ε(x, x̌_l) ⇒ [(x ∈ X̌_l) ∧ (f(x) = f(x̌_l))] ∨ [(x ∉ X̌_l) ∧ (f(x̌_l) < f(x))]   ∀x̌_l ∈ X̌_l, x ∈ X   (4.14)

An ε-local minimum x̌_l is an element of an ε-local minimal set. We will use the terms local minimum set and local minimum to denote scenarios where ε = 0.
Definition D4.15 (ε-Local Maximal Set). An ε-local maximal set X̂_l ⊆ X under objective function f and problem space X is a connected set for which Equation 4.15 holds.

neighboring_ε(x, x̂_l) ⇒ [(x ∈ X̂_l) ∧ (f(x) = f(x̂_l))] ∨ [(x ∉ X̂_l) ∧ (f(x̂_l) > f(x))]   ∀x̂_l ∈ X̂_l, x ∈ X   (4.15)

Again, an ε-local maximum x̂_l is an element of an ε-local maximal set and we will use the terms local maximum set and local maximum to indicate scenarios where ε = 0.
Definition D4.16 (ε-Local Optimal Set (Single-Objective Optimization)). An ε-local optimal set X⋆_l ⊆ X is either an ε-local minimum set or an ε-local maximum set, depending on the definition of the optimization problem.

Similar to the case of local maxima and minima, we will again use the terms ε-local optimum, local optimum set, and local optimum.
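For ε = 0 and a finite neighborhood, part of Definition D4.14 can be checked mechanically. The following Java sketch (an illustration under the assumption that the neighbors of x are available as a finite collection) tests the necessary condition that no neighbor of x is strictly better; the full definition additionally requires the plateau of equally good neighbors to form a connected set.

```java
import java.util.Collection;
import java.util.function.ToDoubleFunction;

/**
 * A simplified sketch of the membership test behind Definition D4.14 for
 * eps = 0: a candidate x can only belong to a local minimal set if every
 * neighbor either has exactly the same objective value (and then belongs to
 * the same plateau) or a strictly larger one. Illustrative code only.
 */
public class LocalMinimumCheck {
  public static <X> boolean mayBeLocalMinimum(final X x,
      final Collection<X> neighborsOfX, final ToDoubleFunction<X> f) {
    final double fx = f.applyAsDouble(x);
    for (final X neighbor : neighborsOfX) {
      if (f.applyAsDouble(neighbor) < fx) {
        return false; // a strictly better neighbor rules out local minimality
      }
    }
    return true;
  }
}
```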
4.4.2 Local Optima of Multi-Objective Problems
As basis for the definition of local optima in multi-objective problems we use the prevalence
relations and the concepts provided by Handl et al. [1178]. In Section 3.5.2, we showed
that these encompass the other multi-objective concepts as well. Therefore, in the following
definition, “prevails” can as well be read as “dominates”, which is probably the most common
use.
Definition D4.17 (ε-Local Optimal Set (Multi-Objective Optimization)). An ε-local optimal set X⋆_l ⊆ X under the prevalence operator “≺” and problem space X is a connected set for which Equation 4.16 holds.
neighboring_ε(x, x⋆_l) ⇒ [(x ∈ X⋆_l) ∧ (x ⊀ x⋆_l)] ∨ [(x ∉ X⋆_l) ∧ (x⋆_l ≺ x)]   ∀x⋆_l ∈ X⋆_l, x ∈ X   (4.16)
Again, we use the term ε-local optimum to denote candidate solutions which are part of
ε-local optimal sets. If ε is omitted in the name, ε = 0 is assumed. See Example E13.4 on page 158 for an example of local optima in multi-objective problems.
4.5 Further Definitions
In order to ease the discussions of different Global Optimization algorithms, we furthermore define the data structure individual, which is the combination of the genotype and the phenotype of an individual.

Definition D4.18 (Individual). An individual p is a tuple p = (p.g, p.x) ∈ G × X of an element p.g in the search space G and the corresponding element p.x = gpm(p.g) in the problem space X.

Besides this basic individual structure, many practical realizations of optimization algorithms use extended variants of such a data structure to store additional information like the objective and fitness values. Then, we will consider individuals as tuples in G × X × W, where W is the space of the additional information – W = R^n × R, for instance. Such endogenous information [302] often represents local optimization strategy parameters. It then usually coexists with global (exogenous) configuration settings. In the algorithm definitions later in this book, we will often access the phenotypes p.x without explicitly referring to the genotype-phenotype mapping “gpm”, since “gpm” is already implicitly included in the relation of p.x and p.g according to Definition D4.18. Again relating to the notation used in Genetic Algorithms, we will refer to lists of such individual records as populations.

Definition D4.19 (Population). A population pop is a list consisting of ps individuals which are jointly under investigation during an optimization process.

pop ∈ (G × X)^ps ⇒ (∀p = (p.g, p.x) ∈ pop ⇒ p.x = gpm(p.g))   (4.17)
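A minimal Java sketch of these two data structures might look as follows; it only mirrors Definitions D4.18 and D4.19 and is not the richer implementation provided in Chapter 55.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * A minimal sketch of the individual record p = (p.g, p.x) from Definition
 * D4.18 and of a population as a plain list (Definition D4.19). Practical
 * implementations usually add fields for objective values, fitness, and
 * endogenous strategy parameters.
 */
public class Individual<G, X> {
  public final G g; // the genotype p.g in the search space G
  public final X x; // the phenotype p.x = gpm(p.g) in the problem space X

  public Individual(final G genotype, final Function<G, X> gpm) {
    this.g = genotype;
    this.x = gpm.apply(genotype); // gpm is applied once; p.g and p.x stay consistent
  }

  /** Create a population of ps individuals from a list of genotypes. */
  public static <G, X> List<Individual<G, X>> population(
      final List<G> genotypes, final Function<G, X> gpm) {
    final List<Individual<G, X>> pop = new ArrayList<>();
    for (final G genotype : genotypes) {
      pop.add(new Individual<>(genotype, gpm));
    }
    return pop;
  }
}
```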
Tasks T4
20. Assume that we are car salesmen and search for an optimal configuration for a car.
We have a problem space X = X_1 × X_2 × X_3 × X_4 where
• X_1 = {with climate control, without climate control},
• X_2 = {with sun roof, without sun roof},
• X_3 = {5 gears, 6 gears, automatic 5 gears, automatic 6 gears}, and
• X_4 = {red, blue, green, silver, white, black, yellow}.
Define a proper search space for this problem along with a unary search operation and a genotype-phenotype mapping.
[10 points]
21. Define a nullary search operation which creates new random points in the n-dimensional search space G = {0, 1, 2}^n. A nullary search operation has no parameters (see Section 4.2). {0, 1, 2}^n denotes a search space which consists of strings of length n where each element of the string is from the set {0, 1, 2}. An example for such a string for n = 5 is (1, 2, 1, 1, 0). A nullary search operation here would thus just create a new, random string of this type.
[5 points]
22. Define a unary search operation which randomly modifies a point in the n-dimensional search space G = {0, 1, 2}^n. See Task 21 for a description of the search space. A unary search operation for G = {0, 1, 2}^n has one parameter, a string of length n where each element is from the set {0, 1, 2}. A unary search operation takes such a string g ∈ G and modifies it. It returns a new string g′ ∈ G which is a bit different from g. This modification has to be done in a random way, for example by randomly adding to, subtracting from, multiplying, or permuting elements of g.
[5 points]
23. Define a possible unary search operation which modifies a point g ∈ R^2 and which could be used for minimizing a black box objective function f : R^2 → R.
[10 points]
24. When trying to find the minimum of a black box objective function f : R^3 → R, we could use the same search space and problem space G = X = R^3. Instead, we could use G = R^2 and apply a genotype-phenotype mapping gpm : R^2 → R^3. For example, we could set gpm(g = (g[0], g[1])^T) = (g[0], g[1], g[0] + g[1])^T. By this approach, our optimization algorithm would only face a two-dimensional problem instead of a three-dimensional one. What do you think about this idea? Is it good?
[5 points]
25. Discuss some basic assumptions usually underlying the design of unary and binary
search operations.
[5 points]
26. Define a search space and search operations for the Traveling Salesman Problem as
discussed in Example E2.2 on page 45.
[10 points]
27. We can define a search space G = (1..n)^n for a Traveling Salesman Problem with n cities. This search space would clearly be numerical since G ⊆ N_1^n ⊆ R^n. Discuss whether it makes sense to use a pure numerical optimization algorithm (with operations similar to the one you defined in Task 23, for example) to find solutions to the Traveling Salesman Problem in this search space.
[5 points]
28. Assume that you want to solve an optimization problem defined over an n-dimensional vector of natural numbers where each element of the vector stems from the natural interval^9 a..b. In other words, the problem space X is a subset of the n-dimensional vectors of natural numbers (i. e., X ⊆ N_0^n); more specifically: X = (a..b)^n. You have an optimization algorithm available which is suitable for continuous optimization problems defined over m-dimensional real vectors where each vector element stems from the real interval^10 [c, d], i. e., which deals with search spaces G ⊆ R^m; more specifically G = [c, d]^m. m, c, and d can be chosen freely. n, a, and b are fixed constants that are specific to the optimization problem you have.
How can you solve your optimization problem with the available algorithm? Provide an implementation of the necessary module in a programming language of your choice. The implementation should obey the interface definitions in Chapter 55. If you choose a programming language different from Java, you should give the methods names similar to those used in that section and write comments relating your code to the interfaces given there.
[10 points]
^9 The natural interval a..b contains only the natural numbers from a to b. For example 1..5 = {1, 2, 3, 4, 5} and π ∉ 1..4.
^10 The real interval [c, d] contains all the real numbers between c and d. For example π ∈ [1, 4] and c..d ⊆ [c, d].
Chapter 5
Fitness and Problem Landscape: How does the
Optimizer see it?
5.1 Fitness Landscapes
A very powerful metaphor in Global Optimization is the fitness landscape
1
. Like many other
abstractions in optimization, fitness landscapes have been developed and extensively been
researched by evolutionary biologists [701, 1030, 1500, 2985]. Basically, they are visualiza-
tions of the relationship between the genotypes or phenotypes in a given population and
their corresponding reproduction probability. The idea of such visualizations goes back to
Wright [2985], who used level contour diagrams in order to outline the effects of selection,
mutation, and crossover on the capabilities of populations to escape locally optimal config-
urations. Similar abstractions arise in many other areas [2598], like in physics of disordered
systems such as spin glasses [311, 1879], for instance.
In Chapter 28, we will discuss Evolutionary Algorithms, which are optimization methods
inspired by natural evolution. The Evolutionary Algorithm research community has widely
adopted fitness landscapes as the relation between individuals and their objective values [865,
1912]. Langdon and Poli [1673]
2
explain that fitness landscapes can be imagined as a view
on a countryside from far above. Here, the height of each point is analogous to its objective
value. An optimizer can then be considered as a short-sighted hiker who tries to find the
lowest valley or the highest hilltop. Starting from a random point on the map, she wants to
reach this goal by walking the minimum distance.
Evolutionary Algorithms were initially developed as single-objective optimization meth-
ods. Then, the objective values were directly used as fitness and the “reproduction prob-
ability”, i. e., the chance of a candidate solution for being subject to further investigation,
was proportional to them. In multi-objective optimization applications with more sophisti-
cated fitness assignment and selection processes, this simple approach does not reflect the
biological metaphor correctly anymore.
In the context of this book, we therefore deviate from this view. Since it would possibly
be confusing for the reader if we used a different definition for fitness landscapes than the
rest of the world, we introduce the new term problem landscape and keep using the term
fitness landscape in the traditional manner. In Figure 11.1 on page 140, you can find some
examples for fitness landscapes.
^1 http://en.wikipedia.org/wiki/Fitness_landscape [accessed 2007-07-03]
^2 This part of [1673] is also available online at http://www.cs.ucl.ac.uk/staff/W.Langdon/FOGP/intro_pic/landscape.html [accessed 2008-02-15].
5.2 Fitness as a Relative Measure of Utility
When performing a multi-objective optimization, i. e., n = |f| > 1, the utility of a candidate solution is characterized by a vector in R^n. In Section 3.3 on page 55, we have seen how such vectors can be compared directly in a consistent way. Such comparisons, however, only state whether one candidate solution is better than another one or not, but give no information on how much better or worse it is and how interesting it is for further investigation. Often, such a scalar, relative measure is needed.
In many optimization techniques, especially in Evolutionary Algorithms, the partial orders created by Pareto or prevalence comparators are therefore mapped to the positive real numbers R^+. For each candidate solution, this single number represents its fitness as a solution for the given optimization problem. The process of computing such a fitness value often does not solely depend on the absolute objective values of the candidate solutions but also on those of the other individuals known. It could, for instance, be the position of a candidate solution in the list of investigated elements sorted according to the Pareto relation. Hence, fitness values often only have a meaning inside the optimization process [949] and may change in each iteration of the optimizer, even if the objective values stay constant.
This concept is similar to the heuristic functions in many deterministic search algorithms
which approximate how many modifications to an element are likely necessary in order to
reach a feasible solution.
Definition D5.1 (Fitness). The fitness^3 value v : G × X → R^+ maps an individual p to a positive real value v(p) denoting its utility as a solution or its priority in the subsequent steps of the optimization process.
The fitness mapping v is usually built by incorporating the information of a set of individuals, i. e., a population pop. This allows the search process to compute the fitness as a relative utility rating. In multi-objective optimization, a fitness assignment procedure v = assignFitness(pop, cmp_f) as defined in Definition D28.6 on page 274 is typically used to construct v based on the set pop of all individuals which are currently under investigation and a comparison criterion cmp_f (see Definition D3.20 on page 77). This procedure is usually repeated in each iteration t of the optimizer, and different fitness assignments v_{t=1}, v_{t=2}, . . . result.
We can detect a similarity between a fitness function v and a heuristic function φ as
defined in Definition D1.14 on page 34 and the more precise Definition D46.5 on page 518.
Both of them give indicators about the utility of p which, most often, are only valid and only make sense within the optimization process. Both measures reflect some ranking criterion
used to pick the point(s) in the search space which should be investigated further in the next
iteration. The difference between fitness functions and heuristics is that heuristics usually
are based on some insights about the concrete structure of the optimization problem whereas
a fitness assignment process obtains its information primarily from the objective values of
the individuals. They are thus more abstract and general. Also, as mentioned before, the
fitness can change during the optimization process and is often a relative measure, whereas heuristic values for a fixed point in the search space are usually constant and independent of the structure of the other candidate solutions under investigation.
Although it is a bit counter-intuitive that a fitness function is built repeatedly and is not
a constant mapping, this is indeed the case in most multi-objective optimization situations.
In practical implementations, the individual records are often extended to also store the
fitness values. If we refer to v(p) multiple times in an algorithm for the same p and pop,
this should be understood as access to such a cached value which is actually computed only
once. This is indeed a good implementation decision, since some fitness assignment processes
^3 http://en.wikipedia.org/wiki/Fitness_(genetic_algorithm) [accessed 2008-08-10]
have a complexity of O((len(pop))^2) or higher, because they may involve comparing each individual with every other one in pop as well as computing density information or clustering.
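As an illustration of such a pairwise-comparison scheme, the following Java sketch assigns to each individual the number of other individuals in the population that are better according to the comparator cmp_f. This is only one simple possibility with O(ps^2) runtime; it does not reproduce the specific fitness assignment processes of Section 28.3.

```java
import java.util.Comparator;
import java.util.List;

/**
 * A sketch of one simple O(ps^2) fitness assignment: each individual receives
 * as fitness (subject to minimization) the number of other individuals in the
 * population that are better according to cmpF. Illustrative code only.
 */
public class PairwiseFitnessAssignment {
  public static <P> double[] assignFitness(final List<P> pop,
      final Comparator<P> cmpF) {
    final double[] v = new double[pop.size()];
    for (int i = 0; i < pop.size(); i++) {
      int betterOnes = 0;
      for (int j = 0; j < pop.size(); j++) {
        if ((i != j) && (cmpF.compare(pop.get(j), pop.get(i)) < 0)) {
          betterOnes++; // pop.get(j) prevails pop.get(i)
        }
      }
      v[i] = betterOnes; // the fitness is relative to the current population
    }
    return v;
  }
}
```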
The term fitness has been borrowed from biology^4 [2136, 2542] by the Evolutionary Algorithms community. As already mentioned, when the first applications of Genetic Algorithms were developed, the focus was mainly on single-objective optimization. Back then, this single function was called the fitness function and thus, objective value ≡ fitness value was set. This point of view is obsolete in principle, yet you will find many contemporary publications that use this notion. This is partly due to the fact that in simple problems with only one objective function, the old approach of using the objective values directly as fitness, i. e., v(p) = f(p.x), can sometimes actually be applied. In multi-objective optimization processes, this is not possible and fitness assignment processes like those which we are going to elaborate on in Section 28.3 on page 274 are used instead.
In the context of this book, fitness is subject to minimization, i. e., elements with smaller
fitness are “better” than those with higher fitness. Although this definition differs from the
biological perception of fitness, it complies with the idea that optimization algorithms are to
find the minima of mathematical functions (if nothing else has been stated). It makes fitness
and objective values compatible in that it allows us to consider the fitness of a multi-objective
problem to be the objective value of a single-objective one.
5.3 Problem Landscapes
The fitness steers the search into areas of the search space considered promising to contain
optima. The quality of an optimization process is largely defined by how fast it finds these
points. Since most of the algorithms discussed in this book are randomized, meaning that
two consecutive runs of the same configuration may lead to totally different results, we can
define a quality measure based on probabilities.
Definition D5.2 (Problem Landscape). The problem landscape Φ : X × N_1 → [0, 1] maps all the points x in a finite problem space X to the cumulative probability of reaching them until (inclusively) the τ-th evaluation of a candidate solution (for a certain configuration of an optimization algorithm).

Φ(x, τ) = P(x has been visited until the τ-th evaluation)   ∀x ∈ X, τ ∈ N_1   (5.1)
This definition of problem landscape is very similar to the performance measure definition
used by Wolpert and Macready [2963, 2964] in their No Free Lunch Theorem (see Chapter 23
on page 209). In our understanding, problem landscapes are not only closer to the original
meaning of fitness landscapes in biology, they also have another advantage. According to
this definition, all entities involved in an optimization process over a finite problem space
directly influence the problem landscape. The choice of the search operations in the search
space G, the way the initial elements are picked, the genotype-phenotype mapping, the
objective functions, the fitness assignment process, and the way individuals are selected for
further exploration all have impact on the probability to discover certain candidate solutions
– the problem landscape Φ.
Problem landscapes are essentially some form of cumulative distribution functions
(see Definition D53.18 on page 658). As such, they can be considered as the counterparts
of the Markov chain models for the search state used in convergence proofs such as those
in [2043] which resemble probability mass/density functions. On the basis of its cumulative
character, we can furthermore make the following assumptions about Φ(x, τ):
Φ(x, τ_1) ≤ Φ(x, τ_2)   ∀τ_1 < τ_2 ∧ x ∈ X, τ_1, τ_2 ∈ N_1   (5.2)
0 ≤ Φ(x, τ) ≤ 1   ∀x ∈ X, τ ∈ N_1   (5.3)
^4 http://en.wikipedia.org/wiki/Fitness_%28biology%29 [accessed 2008-02-22]
Example E5.1 (Influence of Search Operations on Φ).
Let us now give a simple example for problem landscapes and how they are influenced by the optimization algorithm applied to them. Figure 5.1 illustrates one objective function, defined over a finite subset X of the two-dimensional real plane, which we are going to optimize. In this example, we use the problem space X also as search space G, i. e., the genotype-phenotype mapping is the identity mapping.

Figure 5.1: An example optimization problem. [Figure: the objective function f(x) over X, with the local optimum x⋆_l and the global optimum x⋆.]

For optimization, we will use a very simple Hill Climbing algorithm^5, which initially draws one candidate solution randomly with uniform probability from X. In each iteration, it creates a new candidate solution from the current one by using a unary search operation. The old and the new candidate are compared, and the better one is kept. Hence, we do not need to differentiate between fitness and objective values. In the example, better means having lower fitness.
In Figure 5.1, we can spot one local optimum x⋆_l and one global optimum x⋆. Between them, there is a hill, an area of very bad fitness. The rest of the problem space exhibits a small gradient into the direction of its center. The optimization algorithm will likely follow this gradient and sooner or later discover x⋆_l or x⋆. The chances of discovering x⋆_l first are higher, since it is closer to the center of X and thus has a bigger basin of attraction.
With this setting, we have recorded the traces of two experiments with 1.3 million runs of the optimizer (8000 iterations each). From these records, we can approximate the problem landscapes very well.
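The following Java sketch outlines how such an empirical approximation of Φ(x, τ) can be computed from recorded runs; all names are illustrative and the actual experiment code behind the figures below is not reproduced here.

```java
/**
 * A sketch of how the problem landscape Phi(x, tau) can be approximated
 * empirically: run the optimizer many times and, for every lattice cell,
 * count in how many runs it had been visited by the tau-th evaluation.
 */
public class ProblemLandscapeEstimator {
  private final long[][] visitedUpTo; // [cell][tau]: visit counts over all runs
  private long runs = 0;

  public ProblemLandscapeEstimator(final int cells, final int maxTau) {
    this.visitedUpTo = new long[cells][maxTau + 1];
  }

  /** Record one run: visits[i] is the cell evaluated at the (i+1)-th step. */
  public void recordRun(final int[] visits) {
    this.runs++;
    final boolean[] seen = new boolean[this.visitedUpTo.length];
    for (int tau = 1; tau < this.visitedUpTo[0].length; tau++) {
      if (tau <= visits.length) {
        seen[visits[tau - 1]] = true; // mark the cell visited at step tau
      }
      for (int cell = 0; cell < seen.length; cell++) {
        if (seen[cell]) {
          this.visitedUpTo[cell][tau]++; // cumulative: visited until step tau
        }
      }
    }
  }

  /** Estimate of Phi(cell, tau): the fraction of runs that reached the cell. */
  public double phi(final int cell, final int tau) {
    return this.visitedUpTo[cell][tau] / ((double) this.runs);
  }
}
```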
In the first experiment, depicted in Figure 5.2, the Hill Climbing optimizer used a search operation searchOp_1 : X → X which creates a new candidate solution according to a (discretized) normal distribution around the old one. We divided X into a regular lattice and, in the second experiment series, we used an operator searchOp_2 : X → X which produces candidate solutions that are direct neighbors of the old ones in this lattice. The problem landscape Φ produced by this operator is shown in Figure 5.3. Both operators are complete, since each point in the search space can be reached from each other point by applying them.

searchOp_1(x) ≡ (x_1 + randomNorm(0, 1), x_2 + randomNorm(0, 1))   (5.4)
searchOp_2(x) ≡ (x_1 + randomUni[−1, 1), x_2 + randomUni[−1, 1))   (5.5)
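In Java, the two operators could be sketched as follows (without the discretization and the clipping to the finite lattice X that were used in the actual experiments):

```java
import java.util.Random;

/**
 * A sketch of the two unary operators from Equation 5.4 and Equation 5.5 for
 * points in R^2: searchOp1 adds normally distributed offsets, so every point
 * stays reachable in one step; searchOp2 adds uniform offsets from [-1, 1) and
 * thus only reaches direct neighbors. Illustrative code only.
 */
public class ExampleSearchOperators {
  private static final Random RANDOM = new Random();

  /** Equation 5.4: Gaussian mutation. */
  public static double[] searchOp1(final double[] x) {
    return new double[] {x[0] + RANDOM.nextGaussian(),
                         x[1] + RANDOM.nextGaussian()};
  }

  /** Equation 5.5: uniform mutation within [-1, 1). */
  public static double[] searchOp2(final double[] x) {
    return new double[] {x[0] + (2.0d * RANDOM.nextDouble()) - 1.0d,
                         x[1] + (2.0d * RANDOM.nextDouble()) - 1.0d};
  }
}
```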
In both experiments, the probabilities of the elements in the problem space of being discovered are very low, near to zero, and rather uniformly distributed in the first few iterations. To be precise, since our problem space is a 36 × 36 lattice, this probability is exactly 1/36^2 in the first iteration. Starting with the tenth or so evaluation, small peaks begin to form around the places where the optima are located. These peaks then begin to grow.
^5 Hill Climbing algorithms are discussed thoroughly in Chapter 26.
Figure 5.2: The problem landscape of the example problem derived with searchOp_1. [Twelve panels show Φ(x, τ) over X for τ = 1, 2, 5, 10, 50, 100, 500, 1000, 2000, 4000, 6000, and 8000.]
Figure 5.3: The problem landscape of the example problem derived with searchOp_2. [Twelve panels show Φ(x, τ) over X for τ = 1, 2, 5, 10, 50, 100, 500, 1000, 2000, 4000, 6000, and 8000.]
Since the local optimum x⋆_l resides at the center of a large basin and the gradients point straight into its direction, it has a higher probability of being found than the global optimum x⋆. The difference between the two search operators tested becomes obvious starting with approximately the 2000th iteration. In the process with the operator utilizing the normal distribution, the Φ value of the global optimum rises further and further, finally surpassing the one of the local optimum. Even if the optimizer gets trapped in the local optimum, it will still eventually discover the global optimum, and if we ran this experiment longer, the corresponding probability would converge to 1. The reason for this is that with the normal distribution, all points in the search space have a non-zero probability of being found from all other points in the search space. In other words, all elements of the search space are adjacent.
The operator based on the uniform distribution is only able to create candidate solutions in the direct neighborhood of the known points. Hence, if an optimizer gets trapped in the local optimum, it can never escape. If it arrives at the global optimum, it will never discover the local one. In Fig. 5.3.l, we can see that Φ(x⋆_l, 8000) ≈ 0.7 and Φ(x⋆, 8000) ≈ 0.3. In other words, only one of the two points will be the result of the optimization process whereas the other optimum remains undiscovered.
From Example E5.1 we can draw four conclusions:
1. Optimization algorithms discover good elements with higher probability than elements with bad characteristics, given the problem is defined in a way which allows this to happen.
2. The success of optimization depends very much on the way the search is conducted, i. e., the definition of neighborhoods of the candidate solutions (see Definition D4.12 on page 86).
3. It also depends on the time (or the number of iterations) the optimizer is allowed to use.
4. Hill Climbing algorithms are not Global Optimization algorithms, since they have no general means of preventing getting stuck at local optima.
Tasks T5
29. What are the similarities and differences between fitness values and traditional heuristic values?
[10 points]
30. For many (single-objective) applications, a fitness assignment process which assigns
the rank of an individual p in a population pop according to the objective value f(p.x)
of its phenotype p.x as its fitness v(p) is highly useful. Specify an algorithm for such
an assignment process which sorts the individuals in a list (population pop) according
to their objective value and assigns the index of an individual in this sorted list as its
fitness.
Should the individuals be sorted in ascending or descending order if f is subject to
maximization and the fitness v is subject to minimization?
[10 points]
Chapter 6
The Structure of Optimization: Putting it together.
After discussing the single entities involved in optimization, it is time to put them together into the general structure common to all optimization processes, into an overall picture. This
structure consists of the previously introduced well-defined spaces and sets as well as the
mappings between them. One example for this structure of optimization processes is given in
Figure 6.1 by using a Genetic Algorithm which encodes the coordinates of points in a plane
into bit strings as an illustration. The same structure was already used in Example E4.2.
6.1 Involved Spaces and Sets
At the lowest level, there is the search space G inside which the search operations navigate.
Optimization algorithms can usually only process a subset of this space at a time. The
elements of G are mapped to the problem space X by the genotype-phenotype mapping
“gpm”. In many cases, “gpm” is the identity mapping (if G = X) or a rather trivial, injective
transformation. Together with the elements under investigation in the search space, the
corresponding candidate solutions form the population pop of the optimization algorithm.
A population may contain the same individual record multiple times if the algorithm
permits this. Some optimization methods only process one candidate solution at a time.
In this case, the size of pop is one and a single variable instead of a list data structure is
used to hold the element’s data. Having determined the corresponding phenotypes of the
genotypes produced by the search operations, the objective functions(s) can be evaluated.
In the single-objective case, a single value f(p.x) determines the utility of p.x. In case
of a multi-objective optimization, on the other hand, a vector f (p.x) of objective values
is computed for each candidate solution p.x in the population. Based on these objective
values, the optimizer may compute a real fitness v(p) denoting how promising an individual
p is for further investigation, i. e., as parameter for the search operations, relative to the
other known candidate solutions in the population pop. Either based on such a fitness value,
directly on the objective values, or on other criteria, the optimizer will then decide how to
proceed, how to generate new points in G.
Figure 6.1: Spaces, Sets, and Elements involved in an optimization process. [Figure: the search space G (here bit strings of length four), the problem space X (pairs of coordinates), the objective space Y ⊆ R^n, and the fitness space R^+, together with the population pop ∈ (G × X)^ps of genotypes and corresponding phenotypes, the genotype-phenotype mapping, the objective function(s) f : X → Y, and the fitness assignment process v : G × X → R^+. Fitness and heuristic values normally have a meaning only in the context of a population or a set of candidate solutions.]
6.2 Optimization Problems
An optimization problem is the task we wish to solve with an optimization algorithm. In
Definition D1.1 and D1.2 on page 22, we gave two different views on how optimization
problems can be defined. Here, we will give a more precise discussion which will be the basis
of our considerations later in this book. An optimization problem, in the context of this
book, includes the definition of the possible solution structures (X), the means to determine
their utilities (f ), and a way to compare two candidate solutions (cmp
f
).
Definition D6.1 (Optimization Problem (Mathematical View)). An optimization problem “OptProb” is a triplet OptProb = (X, f, cmp_f) consisting of a problem space X, a set f = {f_1, f_2, . . .} of n = |f| objective functions f_i : X → R ∀i ∈ 1..n, and a comparator function cmp_f : X × X → R which allows us to compare two candidate solutions x_1, x_2 ∈ X based on their objective values f(x_1) and f(x_2) (and/or constraint satisfaction properties or other possible features).
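Expressed as a simplified, illustrative Java class, the triplet of Definition D6.1 might look as follows; the richer interfaces of Chapter 55 cover the same information.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/**
 * A minimal sketch of the triplet OptProb = (X, f, cmpF) from Definition D6.1.
 * The problem space X is represented by the type parameter; the comparator
 * follows the convention that cmpF(x1, x2) < 0 iff x1 prevails x2.
 * Illustrative code only.
 */
public final class OptimizationProblem<X> {
  public final List<ToDoubleFunction<X>> f; // the objective functions f_1, ..., f_n
  public final Comparator<X> cmpF;          // the comparator cmp_f

  public OptimizationProblem(final List<ToDoubleFunction<X>> objectives,
      final Comparator<X> comparator) {
    this.f = objectives;
    this.cmpF = comparator;
  }
}
```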
Many optimization algorithms are general enough to work on different search spaces and
utilize various search operations. Others can only be applied to a distinct class of search
spaces or have fixed operators. Each optimization algorithm is defined by its specification
that states which operations have to be carried out in which order. Definition D6.2 provides
us with a mathematical expression for one specific application of an optimization algorithm
to a specific optimization problem instance.
Definition D6.2 (Optimization Setup). The setup “OptSetup” of an application of an
optimization algorithm is the five-tuple OptSetup = (OptProb, G, Op, gpm, Θ) of the opti-
mization problem OptProb to be solved, the search space G in which the search operations
searchOp ∈ Op navigate and which is translated to the problem space X with the genotype-
phenotype mapping gpm, and the algorithm-specific parameter configuration Θ (such as size
limits for populations, time limits for the optimization process, or starting seeds for random
number generators).
This setup, together with the specification of the algorithm, entirely defines the behavior of
the optimizer “OptAlgo”.
Definition D6.3 (Optimization Algorithm). An optimization algorithm is a mapping OptAlgo : OptSetup → Φ × X⋆ of an optimization setup “OptSetup” to a problem landscape Φ as well as a mapping to the optimization result x ∈ X⋆, an arbitrarily sized subset X⋆ of the problem space.
Apart from this very mathematical definition, we provide a Java interface which unites the
functionality and features of optimization algorithms in Section 55.1.5 on page 713. Here,
the functionality (finding a list of good individuals) and the properties (the objective func-
tion(s), the search operations, the genotype-phenotype mapping, the termination criterion)
characterize an optimizer. This perspective comes closer to the practitioner’s point of view
whereas Definition D6.3 gives a good summary of the theoretical idea of optimization.
If an optimization algorithm can effectively be applied to an optimization problem, it
will find at least one local optimum x

l
if granted infinite processing time and if such an
optimum exists. Usual prerequisites for effective problem solving are a weakly complete
set of search operations Op, a surjective genotype-phenotype mapping “gpm”, and that the
objective functions f do provide more information than needle-in-a-haystack configurations
(see Section 16.7 on page 175).
∃x⋆_l ∈ X : lim_{τ→∞} Φ(x⋆_l, τ) = 1   (6.1)
From an abstract perspective, an optimization algorithm is characterized by
104 6 THE STRUCTURE OF OPTIMIZATION: PUTTING IT TOGETHER.
1. the way it assigns fitness to the individuals,
2. the ways it selects them for further investigation,
3. the way it applies the search operations, and
4. the way it builds and treats its state information.
The best optimization algorithm for a given setup OptSetup is the one with the highest values
of Φ(x

, τ) for the optimal elements x

in the problem space with the lowest corresponding
values of τ. Therefore, finding the best optimization algorithm for a given optimization
problem is, itself, a multi-objective optimization problem.
For a perfect optimization algorithm (given an optimization setup with weakly complete
search operations, a surjective genotype-phenotype mapping, and reasonable objective func-
tion(s)), Equation 6.2 would hold. However, it is more than questionable whether such an
algorithm can actually be built (see Chapter 23).
∀x_1, x_2 ∈ X : x_1 ≺ x_2 ⇒ lim_{τ→∞} Φ(x_1, τ) > lim_{τ→∞} Φ(x_2, τ)   (6.2)
6.3 Other General Features
There are some further common semantics and operations that are shared by most optimiza-
tion algorithms. Many of them, for instance, start out by randomly creating some initial
individuals which are then refined iteratively. Optimization processes which are not allowed
to run infinitely have to find out when to terminate. In this section we define and discuss
general abstractions for such similarities.
6.3.1 Gradient Descent

Definition D6.4 (Gradient). A gradient^1 of a scalar mathematical function (scalar field) f : R^n → R is a vector function (vector field) which points into the direction of the greatest increase of the scalar field. It is denoted by ∇f or grad(f).

lim_{h→0} ||f(x + h) − f(x) − grad(f)(x) · h|| / ||h|| = 0   (6.3)
In other words, if we have a differentiable smooth function with low total variation^2 [2224], the gradient would be the perfect hint for the optimizer, giving the best direction into which the search should be steered. What many optimization algorithms which treat objective functions as black boxes do is to use rough estimations of the gradient for this purpose. The Downhill Simplex (see Chapter 43 on page 501), for example, uses a polytope consisting of n + 1 points in G = R^n for approximating the direction with the steepest descent.
In most cases, we cannot directly differentiate the objective functions with the Nabla operator^3 ∇f because their exact mathematical characteristics are not known or are too complicated.
Also, the search space G is often not a vector space over the real numbers R.
Thus, samples of the search space are used to approximate the gradient. If we compare two elements g_1 and g_2 via their corresponding phenotypes x_1 = gpm(g_1) and x_2 = gpm(g_2) and find x_1 ≺ x_2, we can assume that there is some sort of gradient facing upwards from x_1 to x_2 and hence, from g_1 to g_2. When descending this gradient, we can hope to find an x_3 with x_3 ≺ x_1 and finally the global minimum.
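Where the search space is G = R^n and f can at least be sampled, a crude gradient estimate can be obtained by finite differences, as in the following Java sketch (a simple forward-difference approximation; it is only an illustration, not a method prescribed by this book):

```java
import java.util.function.ToDoubleFunction;

/**
 * A sketch of a forward-difference gradient estimate for a black-box
 * objective function f over R^n. Metaheuristics usually use even coarser
 * estimates, e.g., only the comparison of two sampled points.
 */
public class GradientEstimate {
  public static double[] forwardDifference(final ToDoubleFunction<double[]> f,
      final double[] x, final double h) {
    final double fx = f.applyAsDouble(x);
    final double[] gradient = new double[x.length];
    for (int i = 0; i < x.length; i++) {
      final double[] xPlusH = x.clone();
      xPlusH[i] += h; // perturb one coordinate by h
      gradient[i] = (f.applyAsDouble(xPlusH) - fx) / h;
    }
    return gradient;
  }
}
```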
^1 http://en.wikipedia.org/wiki/Gradient [accessed 2007-11-06]
^2 http://en.wikipedia.org/wiki/Total_variation [accessed 2009-07-01]
^3 http://en.wikipedia.org/wiki/Del [accessed 2008-02-15]
6.3.2 Iterations
Global optimization algorithms often proceed iteratively and step by step evaluate candi-
date solutions in order to approach the optima. We distinguish between evaluations τ and
iterations t.
Definition D6.5 (Evaluation). The value τ ∈ N_0 denotes the number of candidate solutions for which the set of objective functions f has been evaluated.

Definition D6.6 (Iteration). An iteration^4 refers to one round in a loop of an algorithm. It is one repetition of a specific sequence of instructions inside of the algorithm.

Algorithms are referred to as iterative if most of their work is done by cyclic repetition of one main loop. In the context of this book, an iterative optimization algorithm starts with the first step t = 0. The value t ∈ N_0 is the index of the iteration currently performed by the algorithm and t + 1 refers to the following step. One example of an iterative algorithm is Algorithm 6.1. In some optimization algorithms like Genetic Algorithms, for instance, iterations are referred to as generations.
There often exists a well-defined relation between the number of performed evaluations
τ of candidate solutions and the index of the current iteration t in an optimization pro-
cess: Many Global Optimization algorithms create and evaluate exactly a certain number
of individuals per generation.
6.3.3 Termination Criterion
The termination criterion terminationCriterion() is a function with access to all the information accumulated by an optimization process, including the number of performed steps t, the objective values of the best individuals, and the time elapsed since the start of the process. With terminationCriterion(), the optimizers determine when they have to halt.

Definition D6.7 (Termination Criterion). When the termination criterion function terminationCriterion() ∈ {true, false} evaluates to true, the optimization process will stop and return its results.
In Listing 55.9 on page 712, you can find a Java interface resembling Definition D6.7. Some
possible criteria that can be used to decide whether an optimizer should terminate or not
are [2160, 2622, 3088, 3089]:
1. The user may grant the optimization algorithm a maximum computation time. If this time has been exceeded, the optimizer should stop. Here we should note that the time needed for single individuals may vary, and so will the times needed for iterations. Hence, this time threshold can sometimes not be obeyed exactly and may lead to different total numbers of iterations during multiple runs of the same algorithm.
2. Instead of specifying a time limit, an upper limit for the number of iterations t̆ or individual evaluations τ̆ may be specified. Such criteria are most interesting for the researcher, since she often wants to know whether a qualitatively interesting solution can be found for a given problem using at most a predefined number of objective function evaluations. In Listing 56.53 on page 836, we give a Java implementation of the terminationCriterion() operation which can be used for that purpose; a simplified sketch in the same spirit follows after Algorithm 6.1.
^4 http://en.wikipedia.org/wiki/Iteration [accessed 2007-07-03]
3. An optimization process may be stopped when no significant improvement in the
solution quality was detected for a specified number of iterations. Then, the process
most probably has converged to a (hopefully good) solution and will most likely not
be able to make further progress.
4. If we optimize something like a decision maker or classifier based on a sample data
set, we will normally divide this data into a training and a test set. The training set is
used to guide the optimization process whereas the test set is used to verify its results.
We can compare the performance of our solution when fed with the training set to
its properties if fed with the test set. This comparison helps us detect when most
probably no further generalization can be achieved by the optimizer and we should
terminate the process.
5. Obviously, we can terminate an optimization process if it has already yielded a suffi-
ciently good solution.
In practical applications, we can apply any combination of the criteria above in order to determine when to halt. Algorithm 1.1 on page 40 is an example for a badly designed termination criterion: the algorithm will not halt until a globally optimal solution is found. From its structure, however, it is clear that it is very prone to premature convergence (see Chapter 13 on page 151) and thus may get stuck at a local optimum. Then, it will run forever even if an optimal solution exists. This example also shows another thing: The structure of the termination criterion decides whether a probabilistic optimization technique is a Las Vegas or Monte Carlo algorithm (see Definition D12.2 and D12.3). How the termination criterion may be tested in an iterative algorithm is illustrated in Algorithm 6.1.
Algorithm 6.1: Example Iterative Algorithm
  Input: [implicit] terminationCriterion: the termination criterion
  Data: t: the iteration counter
1 begin
2   t ←− 0
    // initialize the data of the algorithm
3   while ¬terminationCriterion() do
      // perform one iteration - here happens the magic
4     t ←− t + 1
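As a concrete illustration of Definition D6.7, the following Java sketch combines the first two criteria from the list above, a maximum runtime and a maximum number of objective function evaluations; it is a simplified stand-in, not the implementation from Listing 56.53.

```java
/**
 * A sketch of a combined termination criterion: stop after a maximum runtime
 * or after a maximum number of objective function evaluations, whichever
 * comes first. Illustrative code in the spirit of Definition D6.7.
 */
public class MaxTimeOrEvaluationsCriterion {
  private final long endTimeMillis;
  private final long maxEvaluations;
  private long evaluations = 0;

  public MaxTimeOrEvaluationsCriterion(final long maxMillis,
      final long maxEvaluations) {
    this.endTimeMillis = System.currentTimeMillis() + maxMillis;
    this.maxEvaluations = maxEvaluations;
  }

  /** Should be invoked once per objective function evaluation. */
  public void countEvaluation() {
    this.evaluations++;
  }

  /** true means: the optimization process has to stop and return its results. */
  public boolean terminationCriterion() {
    return (this.evaluations >= this.maxEvaluations)
        || (System.currentTimeMillis() >= this.endTimeMillis);
  }
}
```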
6.3.4 Minimization

The original form of many optimization algorithms was developed for single-objective optimization. Such algorithms may be used for both minimization and maximization. As previously mentioned, we will present them as minimization processes since this is the most commonly used notation, without loss of generality. An algorithm that maximizes the function f may be transformed into a minimization process by using −f instead.
Note that using the prevalence comparisons as introduced in Section 3.5.2 on page 76, multi-objective optimization processes can be transformed into single-objective minimization processes. Therefore, x_1 ≺ x_2 ⇔ cmp_f(x_1, x_2) < 0.
6.3.5 Modeling and Simulating
While there are a lot of problems where the objective functions are mathematical expressions
that can directly be computed, there exist problem classes far away from such simple function
optimization that require complex models and simulations.
Definition D6.8 (Model). A model^5 is an abstraction or approximation of a system that allows us to reason and to deduce properties of the system.
Models are often simplifications or idealizations of real-world issues. They are defined by
leaving away facts that probably have only minor impact on the conclusions drawn from
them. In the area of Global Optimization, we often need two types of abstractions:
1. The models of the potential solutions shape the problem space X. Examples are
(a) programs in Genetic Programming, for example for the Artificial Ant problem,
(b) construction plans of a skyscraper,
(c) distributed algorithms represented as programs for Genetic Programming,
(d) construction plans of a turbine,
(e) circuit diagrams for logical circuits, and so on.
2. Models of the environment in which we can test and explore the properties of the
potential solutions, like
(a) a map on which the Artificial Ant will move driven by an optimized program,
(b) an abstraction from the environment in which the skyscraper will be built, with
wind blowing from several directions,
(c) a model of the network in which the evolved distributed algorithms can run,
(d) a physical model of air which blows through the turbine,
(e) the model of an energy source and of the other pins which will be attached to the circuit, together with the possible voltages on these pins.
Models themselves are rather static structures of descriptions and formulas. Deriving con-
crete results (objective values) from them is often complicated. It often makes more sense
to bring the construction plan of a skyscraper to life in a simulation. Then we can test the
influence of various wind forces and directions on building structure and approximate the
properties which define the objective values.
Definition D6.9 (Simulation). A simulation^6 is the computational realization of a model. Whereas a model describes abstract connections between the properties of a system, a simulation realizes these connections.
Simulations are executable, live representations of models that can be as meaningful as real
experiments. They allow us to reason if a model makes sense or not and how certain objects
behave in the context of a model.
^5 http://en.wikipedia.org/wiki/Model_%28abstract%29 [accessed 2007-07-03]
^6 http://en.wikipedia.org/wiki/Simulation [accessed 2007-07-03]
Chapter 7
Solving an Optimization Problem
Before considering the following steps, always
first check if a dedicated algorithm exists for
the problem (see Section 1.1.3).
Figure 7.1: Use dedicated algorithms!
There are many highly efficient algorithms for solving special problems. For finding
the shortest paths from one source node to the other nodes in a network, for example, one
would never use a metaheuristic but Dijkstra’s algorithm [796]. For computing the maximum
flow in a flow network, the algorithm by Dinitz [800] beats any probabilistic optimization
method. The key to winning with metaheuristics is to know when to apply them and when not.
The first step before using them is always an internet or literature search in order to check
for alternatives.
If this search did not lead to the discovery of a suitable method to solve the problem,
a randomized or traditional optimization technique can be used. Therefore, the following
steps should be performed in the order they are listed below and sketched in Figure 7.2.
0. Check for possible alternatives to using optimization algorithms!
1. Define the data structures for the possible solutions, the problem space X (see Sec-
tion 2.1).
2. Define the objective functions f ∈ f which are subject to optimization (see Section 2.2).
3. Define the equality constraints h and inequality constraints g, if needed (see Sec-
tion 3.4).
4. Define a comparator function cmp
f
(x
1
, x
2
) able to decide which one of two candidate
solutions x
1
and x
2
is better (based on f , and the constraints h
i
and g
i
, see Sec-
tion 3.5.2). In the unconstrained, single-objective case which is most common, this
function will just select the one individual with the lower objective value.
5. Decide upon the optimization algorithm to be used for the optimization problem (see
Section 1.3).
6. Choose an appropriate search space G (see Section 4.1). This decision may be directly
implied by the choice in Point 5.
Figure 7.2: [Flowchart: check whether dedicated algorithms exist and, if so, use them; otherwise determine the data structure for solutions; define the optimization criteria and constraints; choose the search space, the search operations, and the mapping to the problem space; select and configure an optimization algorithm (e.g., EA, CMA-ES, DE, with population size, crossover rate, etc.); run small-scale test experiments (a few runs per configuration); if the results are satisfying, perform full experiments with enough runs per configuration and problem instance (e.g., 30) to obtain statistically significant results, evaluate them statistically (e.g., with a two-tailed Mann-Whitney test), and enter the production stage by integrating the optimizer into the real application; otherwise, go back to an earlier step.]
7. Define a genotype-phenotype mapping “gpm” between the search space G and the
problem space X (see Section 4.3).
8. Define the set Op of search operations searchOp ∈ Op navigating in the search space (see Section 4.2). This decision may again be directly implied by the choice in Point 5.
9. Configure the optimization algorithm, i. e., set all its parameters to values which look
reasonable.
10. Run a small-scale test with the algorithm and the chosen settings and check whether the results approximately meet the expectations. If not, go back to Point 9, Point 5, or even Point 1, depending on how bad the results are. Otherwise, continue at Point 11.
11. Conduct experiments with several different configurations and perform many runs for each configuration. Obtain enough results to make statistical comparisons between the configurations and with other algorithms (see, e.g., Section 53.7). If the results are bad, again consider going back to a previous step.
12. Now either the results of the optimizer can be used and published or the developed
optimization system can become part of an application (see Part V and [2912] as an
example), depending on the goal of the work.
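To make the roles of these components more tangible, the following Java sketch shows how they could fit together for a toy bit-string problem. All names (ToySetup, flipOneBit, etc.) are hypothetical illustrations and not the listings referenced elsewhere in this book; the search space is assumed to be G = B^16, the problem space X the number of ones in a genotype, and f the distance to the all-ones string.

import java.util.Random;

// Hypothetical toy wiring of the components named in the steps above:
// search space G = boolean[N], problem space X = int, objective f minimized.
public class ToySetup {
  static final int N = 16;                    // dimension of the search space G = B^N
  static final Random RAND = new Random();

  // nullary search operation "create": a random genotype
  static boolean[] create() {
    boolean[] g = new boolean[N];
    for (int i = 0; i < N; i++) g[i] = RAND.nextBoolean();
    return g;
  }

  // unary search operation: flip one bit of a copy of the genotype
  static boolean[] flipOneBit(boolean[] g) {
    boolean[] h = g.clone();
    int i = RAND.nextInt(N);
    h[i] = !h[i];
    return h;
  }

  // genotype-phenotype mapping "gpm": here the phenotype is the number of ones
  static int gpm(boolean[] g) {
    int ones = 0;
    for (boolean b : g) if (b) ones++;
    return ones;
  }

  // objective function f, subject to minimization: distance to the all-ones string
  static double f(int x) {
    return N - x;
  }

  // comparator cmp_f for the unconstrained, single-objective case:
  // negative if x1 is better (has the lower objective value) than x2
  static int cmpF(int x1, int x2) {
    return Double.compare(f(x1), f(x2));
  }

  public static void main(String[] args) {
    int x = gpm(create());
    System.out.println("f(x) = " + f(x) + ", cmpF(x, x) = " + cmpF(x, x));
  }
}

An optimization algorithm chosen in Point 5 would then only operate on create, flipOneBit, gpm, and cmpF, which is precisely why separating these components pays off in practice.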
Tasks T7
31. Why is it important to always look for traditional dedicated and deterministic alterna-
tives before trying to solve an optimization problem with a metaheuristic optimization
algorithm?
[5 points]
32. Why is a single experiment with a randomized optimization algorithm not enough to
show that this algorithm is good (or bad) for solving a given optimization task? Why
do we need comprehensive experiments instead, as pointed out in Point 11 on the
previous page?
[5 points]
Chapter 8
Baseline Search Patterns
Metaheuristic optimization algorithms trade the guaranteed optimality of their results for a shorter runtime. This means that their results can be (but are not necessarily) worse than the global optima of a given optimization problem. The results are acceptable as long as the user is satisfied with them. An interesting question, however, is: “When is an optimization algorithm suitable for a given problem?” The answer to this question is strongly related to how well it can utilize the information it gains during the optimization process in order to find promising regions in the search space.
The primary information source for any optimizer is the (set of) objective functions.
Hence, it makes sense to declare an optimization algorithm as efficient if it can outperform
primitive approaches which do not utilize the information gained from evaluating candidate
solutions with objective functions, such as the random sampling and random walk methods
which will be introduced in this chapter. The comparison criterion should be the solution
quality in terms of objective values.
Definition D8.1 (Effectiveness). An optimization algorithm is effective if it can (statis-
tically significantly) outperform random sampling and random walks in terms of solution
quality according to the objective functions.
A third baseline algorithm is exhaustive enumeration, the testing of all possible solutions. It does not utilize the information gained from the objective functions either. It can be used as a yardstick in terms of optimization speed.
Definition D8.2 (Efficiency). An optimization algorithm is efficient if it can meet the
criterion of effectiveness (Definition D8.1) with a lower-than-exponential algorithmic com-
plexity (see Section 12.1 on page 143), i. e., within fewer steps than required by exhaustive
enumeration.
8.1 Random Sampling
Random sampling is the most basic search approach. It is uninformed, i. e., does not make
any use of the information it can obtain from evaluating candidate solutions with the objec-
tive functions. As sketched in Algorithm 8.1 and implemented in Listing 56.4 on page 729,
it keeps creating genotypes of random configuration until the termination criterion is met.
In every iteration, it therefore utilizes the nullary search operation “create” which builds
a genotype p.g ∈ G with a random configuration and maps this genotype to a phenotype
p.x ∈ X with the genotype-phenotype mapping “gpm”. It checks whether the new pheno-
type is better than the best candidate solution discovered so far (x). If so, it is stored in
x.
Algorithm 8.1: x ←− randomSampling(f)
Input: f: the objective/fitness function
Data: p: the current individual
Output: x: the result, i. e., the best candidate solution discovered
1 begin
2   x ←− ∅
3   repeat
4     p.g ←− create()
5     p.x ←− gpm(p.g)
6     if (x = ∅) ∨ (f(p.x) < f(x)) then
7       x ←− p.x
8   until terminationCriterion()
9   return x
Notice that x is the only variable used for information aggregation and that there is
no form of feedback. The search makes no use of the knowledge about good structures of
candidate solutions nor does it further investigate a genotype which has produced a good
phenotype. Instead, if “create” samples the search space G in a uniform, unbiased way,
random sampling will be completely unbiased as well.
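A compact Java sketch in the spirit of Algorithm 8.1 is given below. It rests on toy assumptions (a bit-string search space, a simple step budget as termination criterion, a made-up gpm and f) and is not the implementation referenced above as Listing 56.4.

import java.util.Random;

// Minimal random sampling sketch: keep drawing random genotypes and remember
// the best phenotype seen; the objective value never influences where we sample.
public class RandomSampling {
  static final int N = 32;
  static final Random RAND = new Random();

  static boolean[] create() {                      // nullary search operation
    boolean[] g = new boolean[N];
    for (int i = 0; i < N; i++) g[i] = RAND.nextBoolean();
    return g;
  }
  static double gpm(boolean[] g) {                 // toy genotype-phenotype mapping
    int ones = 0;
    for (boolean b : g) if (b) ones++;
    return ones;
  }
  static double f(double x) { return N - x; }      // toy objective, minimized

  // maxSteps (assumed > 0) plays the role of the termination criterion
  static double randomSampling(int maxSteps) {
    Double best = null;                            // corresponds to x = ∅
    for (int t = 0; t < maxSteps; t++) {
      double x = gpm(create());                    // sample a new point, no feedback used
      if (best == null || f(x) < f(best)) best = x;
    }
    return best;
  }

  public static void main(String[] args) {
    System.out.println("best f = " + f(randomSampling(1000)));
  }
}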
This unbiasedness and ignorance of available information turn random sampling into one of the baseline algorithms used for checking whether an optimization method is suitable for a given problem. If an optimization algorithm does not perform better than random sampling on a given optimization problem, we can deduce that (a) it is not able to utilize the information gained from the objective function(s) efficiently (see Chapters 14 and 16), (b) its search operations cannot modify existing genotypes in a useful manner (see Section 14.2), or (c) the information given by the objective functions is largely misleading (see Chapter 15).
It is well known from the No Free Lunch Theorem (see Chapter 23) that, averaged over the set of all possible optimization problems, no optimization algorithm can beat random sampling. However, for specific families of problems and for most practically relevant problems in general, outperforming random sampling is not too complicated. Still, a comparison to this primitive algorithm is necessary whenever a new, not yet analyzed optimization task is to be solved.
8.2 Random Walks
Like random sampling, random walks¹ [899, 1302, 2143] (sometimes also called drunkard's walks) can be considered uninformed, undirected search methods. They are the second of the two baseline algorithms to which optimization methods must be compared for specific problems in order to test their utility. We specify the random walk algorithm in Algorithm 8.2 and implement it in Listing 56.5 on page 732.
Algorithm 8.2 and implement it in Listing 56.5 on page 732.
While random sampling algorithms make use of no information gathered so far, random
walks base their next “search move” on the most recently investigated genotype p.g. Instead
of sampling the search space uniformly as random sampling does, a random walk starts at one
(possibly random) point. In each iteration, it uses a unary search operation (in Algorithm 8.2
simply denoted as searchOp(p.g)) to move to the next point.
Because of this structure, random walks are quite similar to basic optimization algorithms such as Hill Climbing (see Chapter 26). The difference is that optimization algorithms make use of the information they gain by evaluating the candidate solutions with the objective functions.
1 http://en.wikipedia.org/wiki/Random_walk [accessed 2010-08-26]
Algorithm 8.2: x ←− randomWalk(f)
Input: f: the objective/fitness function
Data: p: the current individual
Output: x: the result, i. e., the best candidate solution discovered
1 begin
2   x ←− ∅
3   p.g ←− create()
4   repeat
5     p.x ←− gpm(p.g)
6     if (x = ∅) ∨ (f(p.x) < f(x)) then
7       x ←− p.x
8     p.g ←− searchOp(p.g)
9   until terminationCriterion()
10  return x
In any useful configuration for optimization problems which are not too ill-defined, they should thus be able to outperform random walks and random sampling.
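The following Java sketch mirrors Algorithm 8.2 under the same kind of toy assumptions as before (bit-string genotypes, a one-bit flip as the unary search operation, made-up gpm and f); it is only an illustration and not the implementation referenced as Listing 56.5.

import java.util.Random;

// Minimal random walk sketch: the next point is always derived from the most
// recent genotype, but the objective value never influences where the walk goes.
public class RandomWalk {
  static final int N = 32;
  static final Random RAND = new Random();

  static boolean[] create() {                     // nullary search operation
    boolean[] g = new boolean[N];
    for (int i = 0; i < N; i++) g[i] = RAND.nextBoolean();
    return g;
  }
  static boolean[] searchOp(boolean[] g) {        // unary search operation: flip one bit
    boolean[] h = g.clone();
    int i = RAND.nextInt(N);
    h[i] = !h[i];
    return h;
  }
  static double gpm(boolean[] g) {                // toy genotype-phenotype mapping
    int ones = 0;
    for (boolean b : g) if (b) ones++;
    return ones;
  }
  static double f(double x) { return N - x; }     // toy objective, minimized

  static double randomWalk(int maxSteps) {        // maxSteps assumed > 0
    boolean[] g = create();
    Double best = null;                           // best phenotype found so far (x)
    for (int t = 0; t < maxSteps; t++) {
      double x = gpm(g);
      if (best == null || f(x) < f(best)) best = x;
      g = searchOp(g);                            // move on regardless of f(x)
    }
    return best;
  }

  public static void main(String[] args) {
    System.out.println("best f = " + f(randomWalk(1000)));
  }
}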
In cases where they cannot, applying one of these two uninformed, primitive strategies is advised. Indeed, since optimization methods always introduce some bias into the search process and always try to steer into the direction of the search space where they expect a fitness improvement, random walks or random sampling may even outperform them if the problems are deceptive enough (see Chapter 15). Circumstances where random walks can be the search algorithms of choice are, for example,
1. if a search algorithm encounters a state explosion because there are too many states
and transcending to them would consume too much memory.
2. In certain cases of online search, it is not possible to apply systematic approaches like breadth-first search or depth-first search. If the environment, for instance, is only partially observable and each state transition represents an immediate interaction with this environment, we may not be able to navigate back to past states. One example for such a case is discussed in the work of Skubch [2516, 2517] about reasoning agents.
Random walks are often used in optimization theory for determining features of a fitness
landscape. Measures that can be derived mathematically from a walk model include esti-
mates for the number of steps needed until a certain configuration is found, the chance to
find a certain configuration at all, and the average difference of the objective values of two
consecutive populations. From practically running random walks, some information about
the search space can be extracted. Skubch [2516, 2517], for instance, uses the number of encounters of certain state transitions during the walk in order to successively build a heuristic function.
Figure 8.1 illustrates some examples for random walks. The subfigures 8.1.a to 8.1.c each illustrate three random walks in which axis-parallel steps of length 1 are taken. The dimensions range from 1 to 3 from the left to the right. The same goes for Fig. 8.1.d to Fig. 8.1.f, each of which displays three random walks with Gaussian (standard-normally) distributed step widths and arbitrary step directions.
8.2.1 Adaptive Walks
An adaptive walk is a theoretical optimization method which, like a random walk, usually works on a population of size 1.
Fig. 8.1.a: 1d, step length 1; Fig. 8.1.b: 2d, axis-parallel, step length 1; Fig. 8.1.c: 3d, axis-parallel, step length 1; Fig. 8.1.d: 1d, Gaussian step length; Fig. 8.1.e: 2d, arbitrary angle, Gaussian step length; Fig. 8.1.f: 3d, arbitrary angle, Gaussian step length (vertical axis: steps).
Figure 8.1: Some examples for random walks.
It starts at a random location in the search space and proceeds by changing (or mutating) its single individual. For this modification, three methods are available (a small code sketch of the first one follows the list):
1. One-mutant change: The optimization process chooses a single new individual from
the set of “one-mutant change” neighbors. These neighbors differ from the current
candidate solution in only one property, i. e., are in its ε = 0-neighborhood. If the new
individual is better, it replaces its ancestor, otherwise it is discarded.
2. Greedy dynamics: The optimization process chooses a single new individual from the
set of “one-mutant change” neighbors. If it is not better than the current candidate
solution, the search continues until a better one has been found or all neighbors have
been enumerated. The major difference to the previous form is the number of steps
that are needed per improvement.
3. Fitter Dynamics: The optimization process enumerates all one-mutant neighbors of
the current candidate solution and transcends to the best one.
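The following Java sketch illustrates the first of these methods, the one-mutant change, over bit strings. The toy objective (minimize the number of zeros) and all names are assumptions made purely for illustration; this is not a reference implementation from this book.

import java.util.Random;

// Sketch of a one-mutant adaptive walk on bit strings: pick a random one-mutant
// neighbor and accept it only if it is better than the current individual.
public class OneMutantAdaptiveWalk {
  static final Random RAND = new Random();

  static double f(boolean[] g) {                  // toy objective: number of zeros, minimized
    int zeros = 0;
    for (boolean b : g) if (!b) zeros++;
    return zeros;
  }

  static boolean[] adaptiveWalk(int n, int steps) {
    boolean[] cur = new boolean[n];
    for (int i = 0; i < n; i++) cur[i] = RAND.nextBoolean();   // random start
    for (int t = 0; t < steps; t++) {
      boolean[] neighbor = cur.clone();
      neighbor[RAND.nextInt(n)] ^= true;          // one-mutant change: flip a single bit
      if (f(neighbor) < f(cur)) cur = neighbor;   // replace the ancestor only if better
    }
    return cur;
  }

  public static void main(String[] args) {
    System.out.println("f after walk = " + f(adaptiveWalk(32, 500)));
  }
}

The greedy-dynamics and fitter-dynamics variants would differ only in how many one-mutant neighbors are inspected before a move is made.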
From these elaborations, it becomes clear that adaptive walks are very similar to Hill Climbing and Random Optimization. The major difference is that an adaptive walk is a theoretical construct that, very much like random walks, helps us to determine properties of fitness landscapes, whereas the other two are practical realizations of optimization algorithms.
Adaptive walks are a very common construct in evolutionary biology. Biological populations have been evolving for a very long time and so their genetic compositions are assumed to be relatively converged [79, 1057]. The dynamics of such populations in near-equilibrium states with low mutation rates can be approximated with one-mutant adaptive walks [79, 1057, 2528].
8.3 Exhaustive Enumeration
Exhaustive enumeration is an optimization method which is just as primitive as random walks. It can be applied to any finite search space and basically consists of testing every single
possible solution for its utility, returning the best candidate solution discovered after all have
been tested. If the minimum possible objective value y̆ is known for a given minimization problem, the algorithm can also terminate sooner.
Algorithm 8.3: x ←− exhaustiveSearch(f, y̆)
Input: f: the objective/fitness function
Input: y̆: the lowest possible objective value, −∞ if unknown
Data: x′: the current candidate solution
Data: t: the iteration counter
Output: x: the result, i. e., the best candidate solution discovered
1 begin
2   x ←− gpm(G[0])
3   t ←− 1
4   while (t < |G|) ∧ (f(x) > y̆) do
5     x′ ←− gpm(G[t])
6     if f(x) > f(x′) then x ←− x′
7     t ←− t + 1
8   return x
Algorithm 8.3 contains a definition of the exhaustive search algorithm. It is based on the assumption that the search space G can be traversed in a linear order, where the t-th element in the search space can be accessed as G[t]. The second parameter of the algorithm is the lower limit y̆ of the objective value. If this limit is known, the algorithm may terminate before traversing the complete search space if it discovers a global optimum with f(x) = y̆ earlier. If no lower limit for f is known, y̆ = −∞ can be provided.
It is interesting to see that for many hard problems such as NP-complete ones (Section 12.2.2), no algorithms are known which can generally be faster than exhaustive enumeration. For an input size of n, algorithms for such problems need O(2^n) iterations. Exhaustive search requires exactly the same complexity. If G = B^n, i. e., the bit strings of length n, the algorithm will need exactly 2^n steps if no lower limit y̆ for the value of f is known.
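As an illustration only, the following Java sketch enumerates a bit-string search space G = B^n in the order of the integer values of the genotypes and stops early once the known lower limit y̆ is reached. The objective function and all names are assumptions made for this toy example, not part of the book's reference code.

// Exhaustive enumeration sketch for G = B^n, traversed as the integers 0 .. 2^n - 1.
public class ExhaustiveSearch {
  static double f(long x) {                   // toy objective, minimized
    return Long.bitCount(x);                  // e.g., number of one-bits in the genotype
  }

  // yLower: lowest possible objective value, Double.NEGATIVE_INFINITY if unknown
  static long exhaustiveSearch(int n, double yLower) {
    long best = 0;                            // corresponds to gpm(G[0])
    long size = 1L << n;                      // |G| = 2^n
    for (long t = 1; t < size && f(best) > yLower; t++) {
      if (f(t) < f(best)) best = t;           // keep the better of the two candidates
    }
    return best;
  }

  public static void main(String[] args) {
    long x = exhaustiveSearch(16, 0.0);       // here y̆ = 0 is known, so we can stop early
    System.out.println("best x = " + x + ", f(x) = " + f(x));
  }
}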
Tasks T8
33. What are the similarities and differences between random walks, random sampling, and adaptive walks?
[10 points]
34. Why should we consider an optimization algorithm as ineffective for a given problem
if it does not perform better than random sampling or a random walk on the problem?
[5 points]
35. Is it possible that an optimization algorithm such as, for example, Algorithm 1.1 on
page 40 may give us even worse results than a random walk or random sampling? Give
reasons. If yes, try to find a problem (problem space, objective function) where this
could be the case.
[5 points]
36. Implement random sampling for the bin packing problem as a comparison algorithm
for Algorithm 1.1 on page 40 and test it on the same problem instances used to
verify Task 4 on page 42. Discuss your results.
[10 points]
37. Compare your answers to Task 35 and 36. Is the experiment in Task 36 suitable
to validate your hypothesis? If not, can you find some instances of the bin packing
problem where it can be used to verify your idea? If so, apply both Algorithm 1.1
on page 40 and your random sampling implementation to these instances. Did the
experiments support your hypothesis?
[15 points]
38. Can we implement an exhaustive search pattern to find the minimum of a mathematical function f defined over the real interval (−1, 1)? Answer this question both from
from
(a) a theoretical-mathematical perspective in an idealized execution environment
(i. e., with infinite precision) and
(b) from a practical point of view based on real-world computers and requirements
(i. e., under the assumption that the available precision of floating point numbers
and of any given production process is limited).
[10 points]
39. Implement an exhaustive-search based optimization procedure for the bin packing problem as a comparison algorithm for Algorithm 1.1 on page 40 and test it on the same problem instances used to verify Task 4 on page 42 and Task 36. Discuss your results.
[10 points]
40. Implement an exhaustive search pattern to solve the instance of the Traveling Salesman
Problem given in Example E2.2 on page 45.
[10 points]
Chapter 9
Forma Analysis
The Schema Theorem has been stated for Genetic Algorithms by Holland [1250] in his seminal work [711, 1250, 1253]. In this section, we are going to discuss it in the more general version from Weicker [2879] as introduced by Radcliffe and Surry [2241–2243, 2246, 2634].
Definition D9.1 (Property). A property φ is a function which maps individuals, geno-
types, or phenotypes to a set of possible property values.
Each individual p in the population pop of an optimization algorithm is characterized by
its properties φ. Optimizers focus mainly on the phenotypical properties since these are
evaluated by the objective functions. In the forma analysis, the properties of the genotypes
are considered as well, since they influence the features of the phenotypes via the genotype-
phenotype mapping. With Example E9.1, we want to clarify what possible properties could look like.
Example E9.1 (Examples for Properties).
A rather structural property φ1 of formulas z : R → R in symbolic regression¹ would be whether it contains the mathematical expression x + 1 or not: φ1 : X → {true, false}. We can also declare a behavioral property φ2 which is true if |z(0) − 1| ≤ 0.1 holds, i. e., if the result of z is close to a value of 1 for the input 0, and false otherwise: φ2 = |z(0) − 1| ≤ 0.1 ∀z ∈ X.
Assume that the optimizer uses a binary search space G = B^n and that the formulas z are decoded from there to trees that represent mathematical expressions by a genotype-phenotype mapping. A genotypical property φ3 then would be whether a certain sequence of bits occurs in the genotype p.g, and another phenotypical property φ4 would be the number of nodes in the phenotype p.x, for instance.
If we try to solve a graph-coloring problem, for example, a property φ5 : X → {black, white, gray} could denote the color of a specific vertex q as illustrated in Figure 9.1.
In general, we can consider properties φi to be some sort of functions that map the individuals to property values. φ1 and φ2 both map the space of mathematical functions to the set B = {true, false}, whereas φ5 maps the space of all possible colorings for the given graph to the set {white, gray, black} and property φ4 maps trees to the natural numbers.
1 More information on symbolic regression can be found in Section 49.1 on page 531.
Figure 9.1: A graph coloring-based example for properties and formae (the candidate solutions are grouped into the formae A_{φ5=white}, A_{φ5=gray}, and A_{φ5=black} according to the color of the vertex q).
On the basis of the properties φi we can define equivalence relations² ∼φi:

p1 ∼φi p2 ⇔ φi(p1) = φi(p2) ∀p1, p2 ∈ G × X (9.1)
Obviously, for each two candidate solutions x1 and x2, either x1 ∼φi x2 or x1 ≁φi x2 holds. These relations divide the search space into equivalence classes A_{φi=v}.
Definition D9.2 (Forma). An equivalence class A_{φi=v} that contains all the individuals sharing the same characteristic v in terms of the property φi is called a forma [2241] or predicate [2826].

A_{φi=v} = {∀p ∈ G × X : φi(p) = v} (9.2)
∀p1, p2 ∈ A_{φi=v} ⇒ p1 ∼φi p2 (9.3)
The number of formae induced by a property, i. e., the number of its different characteristics, is called its precision [2241]. Two formae A_{φi=v} and A_{φj=w} are said to be compatible, written as A_{φi=v} ⊲⊳ A_{φj=w}, if there can exist at least one individual which is an instance of both.

A_{φi=v} ⊲⊳ A_{φj=w} ⇔ A_{φi=v} ∩ A_{φj=w} ≠ ∅ (9.4)
A_{φi=v} ⊲⊳ A_{φj=w} ⇔ ∃ p ∈ G × X : p ∈ A_{φi=v} ∧ p ∈ A_{φj=w} (9.5)
A_{φi=v} ⊲⊳ A_{φi=w} ⇒ w = v (9.6)
Of course, two different formae of the same property φi, i. e., two different characteristics of φi, are always incompatible.
Example E9.2 (Examples for Formae (Example E9.1 Cont.)).
The precision of φ1 and φ2 in Example E9.1 is 2, for φ5 it is 3. We can define another property φ6 ≡ z(0) denoting the value a mathematical function has for the input 0. This property would have an uncountably infinite precision.
2 See the definition of equivalence classes in Section 51.9 on page 646.
Figure 9.2: An example for formae in symbolic regression (eight example functions z1, ..., z8, among them z1(x) = x + 1 and z2(x) = x² + 1.1, are grouped into the formae induced by the properties from Example E9.1).
In our initial symbolic regression example, A_{φ1=true} ⊲⊳ A_{φ1=false} does not hold, since it is not possible that a function z contains the term x + 1 and at the same time does not contain it. All formae of the properties φ1 and φ2, on the other hand, are compatible: A_{φ1=false} ⊲⊳ A_{φ2=false}, A_{φ1=false} ⊲⊳ A_{φ2=true}, A_{φ1=true} ⊲⊳ A_{φ2=false}, and A_{φ1=true} ⊲⊳ A_{φ2=true} all hold. If we take φ6 into consideration, we will find that there exist some formae compatible with some of φ2 and some that are not: A_{φ2=true} ⊲⊳ A_{φ6=1} and A_{φ2=false} ⊲⊳ A_{φ6=2} hold, for example, but A_{φ2=true} ⊲⊳ A_{φ6=0} and A_{φ2=false} ⊲⊳ A_{φ6=0.95} do not.
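To make these notions concrete in code, the following Java sketch (a hypothetical toy setup loosely following Examples E9.1 and E9.2, not a listing from this book) implements two properties on simple function phenotypes and groups a small population into the formae induced by φ2.

import java.util.*;
import java.util.function.DoubleUnaryOperator;

// Sketch: properties as functions from phenotypes to property values; a forma
// A_{phi=v} is the set of all individuals p with phi(p) = v.
public class FormaSketch {
  // phi2 from Example E9.1: true iff |z(0) - 1| <= 0.1
  static boolean phi2(DoubleUnaryOperator z) {
    return Math.abs(z.applyAsDouble(0.0) - 1.0) <= 0.1;
  }
  // phi6 from Example E9.2: the value z(0) itself (uncountably many characteristics)
  static double phi6(DoubleUnaryOperator z) {
    return z.applyAsDouble(0.0);
  }

  public static void main(String[] args) {
    Map<String, DoubleUnaryOperator> pop = new LinkedHashMap<>();
    pop.put("z1(x)=x+1",     x -> x + 1.0);
    pop.put("z2(x)=x*x+1.1", x -> x * x + 1.1);
    pop.put("z5(x)=tan x",   Math::tan);

    // group the population into the formae induced by phi2
    Map<Boolean, List<String>> formae = new HashMap<>();
    for (Map.Entry<String, DoubleUnaryOperator> e : pop.entrySet())
      formae.computeIfAbsent(phi2(e.getValue()), v -> new ArrayList<>()).add(e.getKey());
    System.out.println("A_{phi2=true}  = " + formae.getOrDefault(true, List.of()));
    System.out.println("A_{phi2=false} = " + formae.getOrDefault(false, List.of()));
    // two individuals are equivalent under ~_{phi6} iff phi6 yields the same value
    System.out.println("phi6(z1) = " + phi6(pop.get("z1(x)=x+1")));
  }
}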
The discussion of formae and their dependencies stems from the Evolutionary Algorithm community, especially from the supporters of the Building Block Hypothesis. The idea is that an optimization algorithm should first discover formae which have a good influence on the overall fitness of the candidate solutions. The hope is that there are many compatible ones among these formae that can gradually be combined in the search process.
In this text we have defined formae and the corresponding terms on the basis of individuals p, which are records that assign an element of the problem space p.x ∈ X to an element of the search space p.g ∈ G. Generally, we will relax this notation and also discuss formae directly in the context of the search space G or problem space X, when appropriate.
Tasks T9
41. Define at least three reasonable properties on the individuals, genotypes, or phenotypes
of the Traveling Salesman Problem as introduced in Example E2.2 on page 45 and
further discussed in Task 26 on page 91.
[6 points]
42. List the relevant formae for each of the properties you have defined based on your definitions in Task 41 and in Task 26 on page 91.
[12 points]
43. List all the members of your formae given in Task 42 for the Traveling Salesman Problem
instance provided in Table 2.1 on page 46.
[10 points]
Chapter 10
General Information on Optimization
10.1 Applications and Examples
Table 10.1: Applications and Examples of Optimization.
Area References
Art see Tables 28.4, 29.1, and 31.1
Astronomy see Table 28.4
Business-To-Business see Table 3.1
Chemistry [623]; see also Tables 3.1, 22.1, 28.4, 29.1, 30.1, 31.1, 32.1,
40.1, and 43.1
Cloud Computing [2301, 2790, 3022, 3047]
Combinatorial Problems [5, 17, 115, 118, 130, 131, 178, 213, 241, 250, 251, 265, 315,
431, 432, 463, 475, 497, 533, 568, 571, 576, 578, 626–628,
671, 680, 682, 783, 806, 822–825, 832, 868, 1008, 1026, 1027,
1042, 1089, 1091, 1096, 1124, 1239, 1264, 1280, 1308, 1313,
1472, 1497, 1501, 1502, 1533, 1585, 1625, 1694, 1723, 1724,
1834, 1850, 1851, 1879, 1998, 2000, 2066, 2124, 2131, 2132,
2172, 2219, 2223, 2248, 2256, 2257, 2261, 2262, 2288–2290,
2330, 2422, 2449, 2508, 2511, 2512, 2667, 2673, 2692, 2717,
2812, 2813, 2817, 2885, 2900, 2990, 3016, 3017, 3047]; see
also Tables 3.1, 3.2, 18.1, 20.1, 22.1, 25.1, 26.1, 27.1, 28.4,
29.1, 30.1, 31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1, 40.1,
41.1, 42.1, 46.1, 46.3, 47.1, and 48.1
Computer Graphics [45, 327, 450, 844, 1224, 2506]; see also Tables 3.1, 3.2, 20.1,
27.1, 28.4, 29.1, 30.1, 31.1, 33.1, and 36.1
Control [1922]; see also Tables 28.4, 29.1, 31.1, 32.1, and 35.1
Cooperation and Teamwork [2238]; see also Tables 22.1, 28.4, 29.1, 31.1, and 37.1
Data Compression see Table 31.1
Data Mining [37, 40, 92, 115, 126, 189, 203, 217, 277, 281, 305, 310, 406,
450, 538, 656, 766, 805, 849, 889, 930, 961, 968, 975, 983,
991, 1035, 1036, 1100, 1168, 1176, 1194, 1259, 1425, 1503,
1959, 2070, 2088, 2202, 2234, 2272, 2312, 2319, 2357, 2459,
2506, 2602, 2671, 2788, 2790, 2824, 2839, 2849, 2939, 2961,
2999, 3047, 3066, 3077, 3078, 3103]; see also Tables 3.1, 20.1,
21.1, 22.1, 27.1, 28.4, 29.1, 31.1, 32.1, 34.1, 35.1, 36.1, 37.1,
39.1, and 46.3
Databases [115, 130, 131, 213, 533, 568, 571, 783, 991, 1096, 1308, 1585,
2692, 2824, 2971, 3016, 3017]; see also Tables 3.1, 3.2, 21.1,
26.1, 27.1, 28.4, 29.1, 31.1, 35.1, 36.1, 37.1, 39.1, 40.1, 46.3,
and 47.1
Decision Making [1194]; see also Tables 3.1, 3.2, 20.1, and 31.1
Digital Technology see Tables 3.1, 26.1, 28.4, 29.1, 30.1, 31.1, 34.1, and 35.1
Distributed Algorithms and Systems
[127, 204, 226, 330–333, 492, 495, 506, 533, 565, 576, 617,
628, 650, 671, 822, 823, 1101, 1168, 1280, 1342, 1425, 1472,
1497, 1561, 1645, 1647, 1650, 1690, 1777, 1850, 1851, 1998,
2000, 2066, 2070, 2172, 2202, 2238, 2256, 2257, 2301, 2319,
2330, 2375, 2511, 2512, 2602, 2638, 2716, 2790, 2839, 2878,
2990, 3017, 3022, 3058]; see also Tables 3.1, 3.2, 18.1, 21.1,
22.1, 25.1, 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1,
35.1, 36.1, 37.1, 39.1, 40.1, 41.1, 46.1, 46.3, and 47.1
E-Learning [506, 565, 2319]; see also Tables 28.4, 29.1, 37.1, and 39.1
Economics and Finances [204, 281, 310, 559, 1194, 1998, 2257, 2301, 2330, 2878, 3022];
see also Tables 3.1, 3.2, 27.1, 28.4, 29.1, 31.1, 33.1, 35.1, 36.1,
37.1, 39.1, 40.1, 46.1, and 46.3
Engineering [189, 199, 312, 1101, 1194, 1219, 1227, 1431, 1690, 1777, 2088,
2375, 2459, 2716, 2839, 2878, 3004, 3047, 3077, 3078]; see also
Tables 3.1, 3.2, 18.1, 21.1, 22.1, 26.1, 27.1, 28.4, 29.1, 30.1,
31.1, 32.1, 33.1, 34.1, 35.1, 36.1, 37.1, 39.1, 40.1, 41.1, 46.3,
and 47.1
Face Recognition [471]
Function Optimization [72, 613, 1136, 1137, 1188, 1942, 2333, 2708]; see also Tables
3.2, 21.1, 26.1, 27.1, 28.4, 29.1, 30.1, 32.1, 33.1, 34.1, 35.1,
36.1, 39.1, 40.1, 41.1, and 43.1
Games [1954, 2383–2385]; see also Tables 3.1, 28.4, 29.1, 31.1, and
32.1
Graph Theory [241, 576, 628, 650, 1219, 1263, 1472, 1625, 1690, 2172, 2375,
2716, 2839]; see also Tables 3.1, 18.1, 21.1, 22.1, 25.1, 26.1,
27.1, 28.4, 29.1, 30.1, 31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 38.1,
39.1, 40.1, 41.1, 46.1, 46.3, and 47.1
Healthcare [45, 3103]; see also Tables 28.4, 29.1, 31.1, 35.1, 36.1, and
37.1
Image Synthesis [1637]; see also Tables 18.1 and 28.4
Logistics [118, 178, 241, 250, 251, 265, 495, 497, 578, 627, 680, 806,
824, 825, 832, 868, 1008, 1027, 1089, 1091, 1124, 1313, 1501,
1625, 1694, 1723, 1724, 1777, 2131, 2132, 2248, 2261, 2262,
2288–2290, 2508, 2673, 2717, 2812, 2813, 2817, 3047]; see
also Tables 3.1, 18.1, 22.1, 25.1, 26.1, 27.1, 28.4, 29.1, 30.1,
33.1, 34.1, 36.1, 37.1, 38.1, 39.1, 40.1, 41.1, 46.3, 47.1, and
48.1
Mathematics [405, 766, 930, 1020, 2200, 2349, 2357, 2418, 2572, 2885, 2939,
3078]; see also Tables 3.1, 18.1, 26.1, 27.1, 28.4, 29.1, 30.1,
31.1, 32.1, 33.1, 34.1, and 35.1
Military and Defense see Tables 3.1, 26.1, 27.1, 28.4, and 34.1
Motor Control see Tables 18.1, 22.1, 28.4, 33.1, 34.1, 36.1, and 39.1
Multi-Agent Systems [1818, 2081, 2667, 2919]; see also Tables 3.1, 18.1, 22.1, 28.4,
29.1, 31.1, 35.1, and 37.1
Multiplayer Games [127]; see also Table 31.1
News Industry [1425]
Nutrition [2971]
Operating Systems [2900]
Physics [1879]; see also Tables 3.1, 27.1, 28.4, 29.1, 31.1, 32.1, and
34.1
Police and Justice [471]
Prediction [405]; see also Tables 28.4, 29.1, 30.1, 31.1, and 35.1
Security [2459, 2878, 3022]; see also Tables 3.1, 22.1, 26.1, 27.1, 28.4,
29.1, 31.1, 35.1, 36.1, 37.1, 39.1, 40.1, and 46.3
Shape Synthesis see Table 26.1
Software [822, 823, 827, 1008, 1027, 1168, 1651, 1850, 1851, 2000, 2066,
2131, 2132, 2330, 2961, 2971, 3058]; see also Tables 3.1, 18.1,
20.1, 21.1, 22.1, 25.1, 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 33.1,
34.1, 36.1, 37.1, 40.1, 46.1, and 46.3
Sorting see Tables 28.4 and 31.1
Telecommunication see Tables 28.4, 37.1, and 40.1
Testing [1168, 1651]; see also Tables 3.1, 28.4, and 31.3
Theorem Proving and Automatic Verification
see Tables 29.1, 40.1, and 46.3
Water Supply and Management see Tables 3.1 and 28.4
Wireless Communication [1219]; see also Tables 3.1, 18.1, 22.1, 26.1, 27.1, 28.4, 29.1,
31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 39.1, 40.1, and 46.3
10.2 Books
1. Handbook of Evolutionary Computation [171]
2. New Ideas in Optimization [638]
3. Scalable Optimization via Probabilistic Modeling – From Algorithms to Applications [2158]
4. Handbook of Metaheuristics [1068]
5. New Optimization Techniques in Engineering [2083]
6. Metaheuristics for Multiobjective Optimisation [1014]
7. Selected Papers from the Workshop on Pattern Directed Inference Systems [2869]
8. Applications of Multi-Objective Evolutionary Algorithms [597]
9. Handbook of Applied Optimization [2125]
10. The Traveling Salesman Problem: A Computational Study [118]
11. Intelligent Systems for Automated Learning and Adaptation: Emerging Trends and Applica-
tions [561]
12. Optimization and Operations Research [773]
13. Readings in Artificial Intelligence: A Collection of Articles [2872]
14. Machine Learning: Principles and Techniques [974]
15. Experimental Methods for the Analysis of Optimization Algorithms [228]
16. Evolutionary Multiobjective Optimization – Theoretical Advances and Applications [8]
17. Evolutionary Algorithms for Solving Multi-Objective Problems [599]
18. Pareto Optimality, Game Theory and Equilibria [560]
19. Pattern Classification [844]
20. The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization [1694]
21. Parallel Evolutionary Computations [2015]
22. Introduction to Global Optimization [2126]
23. Global Optimization Algorithms – Theory and Application [2892]
24. Computational Intelligence in Control [1922]
25. Introduction to Operations Research [580]
26. Management Models and Industrial Applications of Linear Programming [530]
27. Efficient and Accurate Parallel Genetic Algorithms [477]
28. Swarm Intelligence – Focus on Ant and Particle Swarm Optimization [525]
29. Nonlinear Programming: Sequential Unconstrained Minimization Techniques [908]
30. Nonlinear Programming: Sequential Unconstrained Minimization Techniques [909]
31. Global Optimization: Deterministic Approaches [1271]
32. Soft Computing: Methodologies and Applications [1242]
33. Handbook of Global Optimization [1272]
34. New Learning Paradigms in Soft Computing [1430]
35. How to Solve It: Modern Heuristics [1887]
36. Numerical Optimization [2040]
37. Artificial Intelligence: A Modern Approach [2356]
38. Electromagnetic Optimization by Genetic Algorithms [2250]
39. Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems [2685]
40. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
[2961]
41. Multi-Objective Swarm Intelligent Systems: Theory & Experiences [2016]
42. Evolutionary Optimization [2394]
43. Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis
and Design [460]
44. Multi-Objective Optimization in Computational Intelligence: Theory and Practice [438]
45. An Operations Research Approach to the Economic Optimization of a Kraft Pulping Process
[500]
46. Fundamental Methods of Mathematical Economics [559]
47. Einführung in die Künstliche Intelligenz [797]
48. Stochastic and Global Optimization [850]
49. Machine Learning Applications in Expert Systems and Information Retrieval [975]
50. Nonlinear and Dynamic Programming [1162]
51. Introduction to Operations Research [1232]
52. Introduction to Stochastic Search and Optimization [2561]
53. Nonlinear Multiobjective Optimization [1893]
54. Heuristics: Intelligent Search Strategies for Computer Problem Solving [2140]
55. Global Optimization with Non-Convex Constraints [2623]
56. The Nature of Statistical Learning Theory [2788]
57. Advances in Greedy Algorithms [247]
58. Multiobjective Decision Making Theory and Methodology [528]
59. Multiobjective Optimization: Principles and Case Studies [609]
60. Multi-Objective Optimization Using Evolutionary Algorithms [740]
61. Multicriteria Optimization [860]
62. Multiple Criteria Optimization: Theory, Computation and Application [2610]
63. Global Optimization [2714]
64. Soft Computing [760]
10.3 Conferences and Workshops
Table 10.2: Conferences and Workshops on Optimization.
IJCAI: International Joint Conference on Artificial Intelligence
History: 2009/07: Pasadena, CA, USA, see [373]
2007/01: Hyderabad, India, see [2797]
2005/07: Edinburgh, Scotland, UK, see [1474]
2003/08: Acapulco, México, see [1118, 1485]
2001/08: Seattle, WA, USA, see [2012]
1999/07: Stockholm, Sweden, see [730, 731, 2296]
1997/08: Nagoya, Japan, see [1946, 1947, 2260]
1995/08: Montréal, QC, Canada, see [1836, 1861, 2919]
1993/08: Chambéry, France, see [182, 183, 927, 2259]
1991/08: Sydney, NSW, Australia, see [831, 1987, 1988, 2122]
1989/08: Detroit, MI, USA, see [2593, 2594]
1987/08: Milano, Italy, see [1846, 1847]
1985/08: Los Angeles, CA, USA, see [1469, 1470]
1983/08: Karlsruhe, Germany, see [448, 449]
1981/08: Vancouver, BC, Canada, see [1212, 1213]
1979/08: Tokyo, Japan, see [2709, 2710]
1977/08: Cambridge, MA, USA, see [2281, 2282]
1975/09: Tbilisi, Georgia, USSR, see [2678]
1973/08: Stanford, CA, USA, see [2039]
1971/09: London, UK, see [630]
1969/05: Washington, DC, USA, see [2841]
AAAI: Annual National Conference on Artificial Intelligence
History: 2011/08: San Francisco, CA, USA, see [3]
2010/07: Atlanta, GA, USA, see [2]
2008/06: Chicago, IL, USA, see [979]
2007/07: Vancouver, BC, Canada, see [1262]
2006/07: Boston, MA, USA, see [1055]
2005/07: Pittsburgh, PA, USA, see [1484]
2004/07: San José, CA, USA, see [1849]
2002/07: Edmonton, Alberta, Canada, see [759]
2000/07: Austin, TX, USA, see [1504]
1999/07: Orlando, FL, USA, see [1222]
1998/07: Madison, WI, USA, see [1965]
1997/07: Providence, RI, USA, see [1634, 1954]
1996/08: Portland, OR, USA, see [586, 2207]
1994/07: Seattle, WA, USA, see [1214]
1993/07: Washington, DC, USA, see [913]
1992/07: San José, CA, USA, see [2639]
1991/07: Anaheim, CA, USA, see [732]
1990/07: Boston, MA, USA, see [793]
1988/08: St. Paul, MN, USA, see [1916]
1987/07: Seattle, WA, USA, see [960]
1986/08: Philadelphia, PA, USA, see [1512, 1513]
1984/08: Austin, TX, USA, see [381]
1983/08: Washington, DC, USA, see [1041]
1982/08: Pittsburgh, PA, USA, see [2845]
1980/08: Stanford, CA, USA, see [197]
HIS: International Conference on Hybrid Intelligent Systems
History: 2011/12: Melaka, Malaysia, see [14]
2010/08: Atlanta, GA, USA, see [1377]
2009/08: Shěnyáng, Liáoníng, China, see [1375]
2008/10: Barcelona, Catalonia, Spain, see [2993]
2007/09: Kaiserslautern, Germany, see [1572]
2006/12: Auckland, New Zealand, see [1340]
2005/11: Rio de Janeiro, RJ, Brazil, see [2014]
2004/12: Kitakyushu, Japan, see [1421]
2003/12: Melbourne, VIC, Australia, see [10]
2002/12: Santiago, Chile, see [9]
2001/12: Adelaide, SA, Australia, see [7]
IAAI: Conference on Innovative Applications of Artificial Intelligence
History: 2011/08: San Francisco, CA, USA, see [4]
2010/07: Atlanta, GA, USA, see [2361]
2009/07: Pasadena, CA, USA, see [1164]
2006/07: Boston, MA, USA, see [1055]
2005/07: Pittsburgh, PA, USA, see [1484]
2004/07: San José, CA, USA, see [1849]
2003/08: Acapulco, México, see [2304]
2001/04: Seattle, WA, USA, see [1238]
2000/07: Austin, TX, USA, see [1504]
1999/07: Orlando, FL, USA, see [1222]
1998/07: Madison, WI, USA, see [1965]
1997/07: Providence, RI, USA, see [1634]
1996/08: Portland, OR, USA, see [586]
1995/08: Montréal, QC, Canada, see [51]
1994/08: Seattle, WA, USA, see [462]
1993/07: Washington, DC, USA, see [1]
1992/07: San José, CA, USA, see [2439]
1991/06: Anaheim, CA, USA, see [2533]
1990/05: Washington, DC, USA, see [2269]
1989/03: Stanford, CA, USA, see [2428]
ICCI: IEEE International Conference on Cognitive Informatics
History: 2011/08: Banff, AB, Canada, see [200]
2010/07: Běijīng, China, see [2630]
2009/06: Kowloon, Hong Kong / Xiānggǎng, China, see [164]
2008/08: Stanford, CA, USA, see [2857]
2007/08: Lake Tahoe, CA, USA, see [3065]
2006/07: Běijīng, China, see [3029]
2005/08: Irvine, CA, USA, see [1339]
2004/08: Victoria, Canada, see [524]
2003/08: London, UK, see [2133]
2002/08: Calgary, AB, Canada, see [1337]
ICML: International Conference on Machine Learning
History: 2011/06: Seattle, WA, USA, see [2442]
2010/06: Haifa, Israel, see [1163]
2009/06: Montréal, QC, Canada, see [681]
2008/07: Helsinki, Finland, see [603]
2007/06: Corvallis, OR, USA, see [1047]
2006/06: Pittsburgh, PA, USA, see [602]
2005/08: Bonn, North Rhine-Westphalia, Germany, see [724]
2004/07: Banff, AB, Canada, see [426]
2003/08: Washington, DC, USA, see [895]
2002/07: Sydney, NSW, Australia, see [2382]
2001/06: Williamstown, MA, USA, see [427]
2000/06: Stanford, CA, USA, see [1676]
1999/06: Bled, Slovenia, see [401]
1998/07: Madison, WI, USA, see [2471]
1997/07: Nashville, TN, USA, see [926]
1996/07: Bari, Italy, see [2367]
1995/07: Tahoe City, CA, USA, see [2221]
1994/07: New Brunswick, NJ, USA, see [601]
1993/06: Amherst, MA, USA, see [1945]
1992/07: Aberdeen, Scotland, UK, see [2520]
1991/06: Evanston, IL, USA, see [313]
1990/06: Austin, TX, USA, see [2206]
1989/06: Ithaca, NY, USA, see [2447]
1988/06: Ann Arbor, MI, USA, see [1659, 1660]
1987: Irvine, CA, USA, see [1415]
1985: Skytop, PA, USA, see [2519]
1983: Monticello, IL, USA, see [1936]
1980: Pittsburgh, PA, USA, see [2183]
MCDM: Multiple Criteria Decision Making
History: 2011/06: Jyväskylä, Finland, see [1895]
2009/06: Chéngdū, Sìchuān, China, see [2750]
2008/06: Auckland, New Zealand, see [861]
2006/06: Chania, Crete, Greece, see [3102]
2004/08: Whistler, BC, Canada, see [2875]
2002/02: Semmering, Austria, see [1798]
2000/06: Ankara, Turkey, see [1562]
1998/06: Charlottesville, VA, USA, see [1166]
1997/01: Cape Town, South Africa, see [2612]
1995/06: Hagen, North Rhine-Westphalia, Germany, see [891]
1994/08: Coimbra, Portugal, see [592]
1992/07: Taipei, Taiwan, see [2744]
1990/08: Fairfax, VA, USA, see [2551]
1988/08: Manchester, UK, see [1759]
1986/08: Kyoto, Japan, see [2410]
1984/06: Cleveland, OH, USA, see [1165]
1982/08: Mons, Brussels, Belgium, see [1190]
1980/08: Newark, DE, USA, see [1960]
1979/08: Hagen/Königswinter, West Germany, see [890]
1977/08: Buffalo, NY, USA, see [3094]
1975/05: Jouy-en-Josas, France, see [2701]
SEAL: International Conference on Simulated Evolution and Learning
History: 2012/12: Hanoi, Vietnam, see [1179]
2010/12: Kanpur, Uttar Pradesh, India, see [2585]
2008/12: Melbourne, VIC, Australia, see [1729]
2006/10: Héféi, Ānhuī, China, see [2856]
2004/10: Busan, South Korea, see [2123]
2002/11: Singapore, see [2661]
2000/10: Nagoya, Japan, see [1991]
1998/11: Canberra, Australia, see [1852]
1996/11: Taejon, South Korea, see [2577]
AI: Australian Joint Conference on Artificial Intelligence
History: 2010/12: Adelaide, SA, Australia, see [33]
2006/12: Hobart, Australia, see [2406]
2005/12: Sydney, NSW, Australia, see [3072]
2004/12: Cairns, Australia, see [2871]
AISB: Artificial Intelligence and Simulation of Behaviour
History: 2008/04: Aberdeen, Scotland, UK, see [1147]
2007/04: Newcastle upon Tyne, UK, see [2549]
2005/04: Bristol, UK and Hatfield, England, UK, see [2547, 2548]
2003/03: Aberystwyth, Wales, UK and Leeds, UK, see [2545, 2546]
2002/04: London, UK, see [2544]
2001/03: Heslington, York, UK, see [2759]
2000/04: Birmingham, UK, see [2748]
1997/04: Manchester, UK, see [637]
1996/04: Brighton, UK, see [938]
1995/04: Sheffield, UK, see [937]
1994/04: Leeds, UK, see [936]
BIONETICS: International Conference on Bio-Inspired Models of Network, Information, and Computing Systems
History: 2010/12: Boston, MA, USA, see [367]
2009/12: Avignon, France, see [1275]
2008/11: Awaji City, Hyogo, Japan, see [153]
2007/12: Budapest, Hungary, see [1342]
2006/12: Cavalese, Italy, see [509]
ICARIS: International Conference on Artificial Immune Systems
History: 2011/07: Cambridge, UK, see [1203]
2010/07: Edinburgh, Scotland, UK, see [858]
2009/08: York, UK, see [2363]
2008/08: Phuket, Thailand, see [274]
2007/08: Santos, Brazil, see [707]
2006/09: Oeiras, Portugal, see [282]
2005/08: Banff, AB, Canada, see [1424]
2004/09: Catania, Sicily, Italy, see [2035]
2003/09: Edinburgh, Scotland, UK, see [2706]
2002/09: Canterbury, Kent, UK, see [2705]
MICAI: Mexican International Conference on Artificial Intelligence
History: 2010/11: Pachuca, Mexico, see [2583]
2009/11: Guanajuato, México, see [38]
2007/11: Aguascalientes, M´exico, see [1037]
2005/11: Monterrey, México, see [1039]
2004/04: Mexico City, México, see [1929]
2002/04: Mérida, Yucatán, México, see [598]
2000/04: Acapulco, México, see [470]
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
History: 2011/10: Cluj-Napoca, Romania, see [2576]
2010/05: Granada, Spain, see [1102]
2008/11: Puerto de la Cruz, Tenerife, Spain, see [1620]
2007/11: Catania, Sicily, Italy, see [1619]
2006/06: Granada, Spain, see [2159]
SMC: International Conference on Systems, Man, and Cybernetics
History: 2014/10: San Diego, CA, USA, see [2386]
2013/10: Edinburgh, Scotland, UK, see [859]
2012/10: Seoul, South Korea, see [2454]
2011/10: Anchorage, AK, USA, see [91]
2009/10: San Antonio, TX, USA, see [1352]
2001/10: Tucson, AZ, USA, see [1366]
1999/10: Tokyo, Japan, see [1331]
1998/10: La Jolla, CA, USA, see [1364]
WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications)
History: 2010/11: see [1029]
2009/11: see [1015]
2008/11: see [1857]
2007/10: see [151]
2006/09: see [2980]
2005: see [2979]
2004: see [2978]
2003: see [2977]
2002: see [2976]
2001: see [2975]
2000: see [2974]
1999/09: Nagoya, Japan, see [1993]
1998/08: Nagoya, Japan, see [1992]
1997/06: see [535]
1996/08: Nagoya, Japan, see [1001]
AIMS: Artificial Intelligence in Mobile System
History: 2003/10: Seattle, WA, USA, see [1624]
CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology
History: 2008/10: Cergy-Pontoise, France, see [3049]
2005/05: Muroran, Japan, see [11]
ECML: European Conference on Machine Learning
History: 2007/09: Warsaw, Poland, see [2866]
2006/09: Berlin, Germany, see [278]
2005/10: Porto, Portugal, see [2209]
2004/09: Pisa, Italy, see [2182]
2003/09: Cavtat, Dubrovnik, Croatia, see [512]
2002/08: Helsinki, Finland, see [1221]
2001/09: Freiburg, Germany, see [992]
2000/05: Barcelona, Catalonia, Spain, see [215]
1998/04: Chemnitz, Sachsen, Germany, see [537]
1997/04: Prague, Czech Republic, see [2781]
1995/04: Heraclion, Crete, Greece, see [1223]
1994/04: Catania, Sicily, Italy, see [507]
1993/04: Vienna, Austria, see [2809]
EPIA: Portuguese Conference on Artificial Intelligence
History: 2011/10: Lisbon, Portugal, see [1744]
2009/10: Aveiro, Portugal, see [1774]
2007/12: Guimarães, Portugal, see [2026]
2005/12: Covilhã, Portugal, see [275]
EUFIT: European Congress on Intelligent Techniques and Soft Computing
History: 1999/09: Aachen, North Rhine-Westphalia, Germany, see [2807]
1998/09: Aachen, North Rhine-Westphalia, Germany, see [2806]
1997/09: Aachen, North Rhine-Westphalia, Germany, see [3091]
1996/09: Aachen, North Rhine-Westphalia, Germany, see [877]
1995/08: Aachen, North Rhine-Westphalia, Germany, see [2805]
1994/09: Aachen, North Rhine-Westphalia, Germany, see [2803]
1993/09: Aachen, North Rhine-Westphalia, Germany, see [2804]
EngOpt: International Conference on Engineering Optimization
History: 2010/09: Lisbon, Portugal, see [1405]
2008/06: Rio de Janeiro, RJ, Brazil, see [1227]
GEWS: Grammatical Evolution Workshop
See Table 31.2 on page 385.
HAIS: International Workshop on Hybrid Artificial Intelligence Systems
History: 2011/05: Warsaw, Poland, see [2867]
2009/06: Salamanca, Spain, see [2749]
2008/09: Burgos, Spain, see [632]
2007/11: Salamanca, Spain, see [631]
2006/10: Ribeirão Preto, SP, Brazil, see [184]
ICAART: International Conference on Agents and Artificial Intelligence
History: 2011/01: Rome, Italy, see [1404]
2010/01: València, Spain, see [917]
2009/01: Porto, Portugal, see [916]
ICTAI: International Conference on Tools with Artificial Intelligence
History: 2003/11: Sacramento, CA, USA, see [1367]
IEA/AIE: International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems
History: 2004/05: Ottawa, ON, Canada, see [2088]
ISA: International Workshop on Intelligent Systems and Applications
History: 2011/05: Wǔhàn, Húběi, China, see [1380]
2010/05: Wǔhàn, Húběi, China, see [2846]
2009/05: Wǔhàn, Húběi, China, see [2847]
ISICA: International Symposium on Advances in Computation and Intelligence
History: 2007/09: Wǔhàn, Húběi, China, see [1490]
LION: Learning and Intelligent OptimizatioN
History: 2011/01: Rome, Italy, see [2318]
2010/01: Venice, Italy, see [2581]
2009/01: Trento, Italy, see [2625]
2007/02: Andalo, Trento, Italy, see [1273, 1826]
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
History: 2010/09: St Andrews, Scotland, UK, see [780]
2009/09: Lisbon, Portugal, see [779]
2008/09: Sydney, NSW, Australia, see [2011]
2007/09: Providence, RI, USA, see [2225]
2006/09: Nantes, France, see [583]
2005/10: Barcelona, Catalonia, Spain, see [1860]
2004/09: Toronto, ON, Canada, see [2715]
SoCPaR: International Conference on SOft Computing and PAttern Recognition
History: 2011/10: Dàlián, Liáoníng, China, see [1381]
2010/12: Cergy-Pontoise, France, see [515]
2009/12: Malacca, Malaysia, see [1224]
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
History: 1992/09: Tutzing, Germany, see [2742]
1989: Germany, see [245]
1986/07: Neubiberg, Bavaria, Germany, see [244]
1983/06: Neubiberg, Bavaria, Germany, see [243]
AINS: Annual Symposium on Autonomous Intelligent Networks and Systems
History: 2003/06: Menlo Park, CA, USA, see [513]
ALIO/EURO: Workshop on Applied Combinatorial Optimization
History: 2011/05: Porto, Portugal, see [138]
2008/12: Buenos Aires, Argentina, see [137]
2005/10: Paris, France, see [136]
2002/11: Pucón, Chile, see [135]
1999/11: Erice, Italy, see [134]
1996/11: Valparaíso, Chile, see [133]
1989/08: Rio de Janeiro, RJ, Brazil, see [132]
ANZIIS: Australia and New Zealand Conference on Intelligent Information Systems
History: 1994/11: Brisbane, QLD, Australia, see [1359]
ASC: IASTED International Conference on Artificial Intelligence and Soft Computing
History: 2002/07: Banff, AB, Canada, see [1714]
EDM: International Conference on Educational Data Mining
History: 2009/07: Córdoba, Spain, see [217]
EWSL: European Working Session on Learning
Continued as ECML in 1993.
History: 1991/03: Porto, Portugal, see [1560]
1988/10: Glasgow, Scotland, UK, see [1061]
1987/05: Bled, Yugoslavia (now Slovenia), see [322]
1986/02: Orsay, France, see [2097]
FLAIRS: Florida Artificial Intelligence Research Symposium
History: 1994/05: Pensacola Beach, FL, USA, see [1360]
FWGA: Finnish Workshop on Genetic Algorithms and Their Applications
See Table 29.2 on page 355.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
History: 2006/12: Vienna, Austria, see [2024]
ICMLC: International Conference on Machine Learning and Cybernetics
History: 2005/08: Guǎngzhōu, Guǎngdōng, China, see [2263]
IICAI: Indian International Conference on Artificial Intelligence
History: 2011/12: Tumkur, India, see [2736]
2009/12: Tumkur, India, see [2217]
2007/12: Pune, India, see [2216]
2005/12: Pune, India, see [2215]
2003/12: Hyderabad, India, see [2214]
ISIS: International Symposium on Advanced Intelligent Systems
History: 2007/09: Sokcho, Korea, see [1579]
MCDA: International Summer School on Multicriteria Decision Aid
History: 1998/07: Monte Estoril, Lisbon, Portugal, see [198]
PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
History: 2011/05: Shēnzhèn, Guǎngdōng, China, see [1294]
STeP: Finnish Artificial Intelligence Conference
History: 2008/08: Espoo, Finland, see [2255]
TAAI: Conference on Technologies and Applications of Artificial Intelligence
History: 2010/11: Hsinchu, Taiwan, see [1283]
2009/10: Wufeng Township, Taichung County, Taiwan, see [529]
2008/11: Chiao-hsi Shiang, I-lan County, Taiwan, see [2659]
2007/11: Douliou, Yunlin, Taiwan, see [2006]
2006/12: Kaohsiung, Taiwan, see [1493]
2005/12: Kaohsiung, Taiwan, see [2004]
2004/11: Wenshan District, Taipei City, Taiwan, see [2003]
TAI: IEEE Conference on Tools for Artificial Intelligence
History: 1990/11: Herndon, VA, USA, see [1358]
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
History: 1997/09: Rytro, Poland, see [2362]
Machine Intelligence Workshop
History: 1977: Santa Cruz, CA, USA, see [874]
TAINN: Turkish Symposium on Artificial Intelligence and Neural Networks (Türk Yapay Zeka ve Yapay Sinir Ağları Sempozyumu)
History: 1993/06: Istanbul, Turkey, see [1643]
AGI: Conference on Artificial General Intelligence
History: 2011/08: Mountain View, CA, USA, see [2587]
2010/03: Lugano, Switzerland, see [1786]
2009/03: Arlington, VA, USA, see [661]
2008/03: Memphis, TN, USA, see [1414]
AICS: Irish Conference on Artificial Intelligence and Cognitive Science
History: 2007/08: Dublin, Ireland, see [764]
AIIA: Congress of the Italian Association for Artificial Intelligence (AI*IA) on Trends in Artificial
Intelligence
History: 1991/10: Palermo, Italy, see [124]
ECML PKDD: European Conference on Machine Learning and European Conference on Principles
and Practice of Knowledge Discovery in Databases
History: 2009/09: Bled, Slovenia, see [321]
2008/09: Antwerp, Belgium, see [114]
SEA: International Symposium on Experimental Algorithms
History: 2011/05: Chania, Crete, Greece, see [2098]
2010/05: Ischa Island, Naples, Italy, see [2584]
10.4 Journals
1. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics published by
IEEE Systems, Man, and Cybernetics Society
2. Information Sciences – Informatics and Computer Science Intelligent Systems Applications:
An International Journal published by Elsevier Science Publishers B.V.
3. European Journal of Operational Research (EJOR) published by Elsevier Science Publishers
B.V. and North-Holland Scientific Publishers Ltd.
4. Machine Learning published by Kluwer Academic Publishers and Springer Netherlands
5. Applied Soft Computing published by Elsevier Science Publishers B.V.
6. Soft Computing – A Fusion of Foundations, Methodologies and Applications published by
Springer-Verlag
7. Computers & Operations Research published by Elsevier Science Publishers B.V. and Perg-
amon Press
8. Artificial Intelligence published by Elsevier Science Publishers B.V.
9. Operations Research published by HighWire Press (Stanford University) and Institute for
Operations Research and the Management Sciences (INFORMS)
10. Annals of Operations Research published by J. C. Baltzer AG, Science Publishers and Springer
Netherlands
11. International Journal of Computational Intelligence and Applications (IJCIA) published by
Imperial College Press Co. and World Scientific Publishing Co.
12. ORSA Journal on Computing published by Operations Research Society of America (ORSA)
13. Management Science published by HighWire Press (Stanford University) and Institute for
Operations Research and the Management Sciences (INFORMS)
14. Journal of Artificial Intelligence Research (JAIR) published by AAAI Press and AI Access
Foundation, Inc.
15. Computational Optimization and Applications published by Kluwer Academic Publishers and
Springer Netherlands
16. The Journal of the Operational Research Society (JORS) published by Operations Research
Society and Palgrave Macmillan Ltd.
17. Data Mining and Knowledge Discovery published by Springer Netherlands
18. SIAM Journal on Optimization (SIOPT) published by Society for Industrial and Applied
Mathematics (SIAM)
19. Mathematics of Operations Research (MOR) published by Institute for Operations Research
and the Management Sciences (INFORMS)
20. Journal of Global Optimization published by Springer Netherlands
21. AIAA Journal published by American Institute of Aeronautics and Astronautics
22. Journal of Artificial Evolution and Applications published by Hindawi Publishing Corporation
23. Applied Intelligence – The International Journal of Artificial Intelligence, Neural Networks,
and Complex Problem-Solving Technologies published by Springer Netherlands
24. Artificial Intelligence Review – An International Science and Engineering Journal published
by Kluwer Academic Publishers and Springer Netherlands
25. Annals of Mathematics and Artificial Intelligence published by Springer Netherlands
26. IEEE Transactions on Knowledge and Data Engineering published by IEEE Computer Soci-
ety Press
27. IEEE Intelligent Systems Magazine published by IEEE Computer Society Press
28. ACM SIGART Bulletin published by ACM Press
29. Journal of Machine Learning Research (JMLR) published by MIT Press
30. Discrete Optimization published by Elsevier Science Publishers B.V.
31. Journal of Optimization Theory and Applications published by Plenum Press and Springer
Netherlands
32. Artificial Intelligence in Engineering published by Elsevier Science Publishers B.V.
33. Engineering Optimization published by Informa plc and Taylor and Francis LLC
34. Optimization and Engineering published by Kluwer Academic Publishers and Springer
Netherlands
35. International Journal of Organizational and Collective Intelligence (IJOCI) published by Idea
Group Publishing (Idea Group Inc., IGI Global)
36. Journal of Intelligent Information Systems published by Springer-Verlag GmbH
37. International Journal of Knowledge-Based and Intelligent Engineering Systems published by
IOS Press
38. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) published by
IEEE Computer Society
39. Optimization Methods and Software published by Taylor and Francis LLC
40. Journal of Mathematical Modelling and Algorithms published by Springer Netherlands
41. Structural and Multidisciplinary Optimization published by Springer-Verlag GmbH
42. OR/MS Today published by Institute for Operations Research and the Management Sciences
(INFORMS) and Lionheart Publishing, Inc.
43. AI Magazine published by AAAI Press
44. Computers in Industry – An International, Application Oriented Research Journal published
by Elsevier Science Publishers B.V.
45. International Journal of Approximate Reasoning published by Elsevier Science Publishers B.V.
46. International Journal of Intelligent Systems published by Wiley Interscience
47. International Journal of Production Research published by Taylor and Francis LLC
48. International Transactions in Operational Research published by Blackwell Publishing Ltd
49. INFORMS Journal on Computing (JOC) published by HighWire Press (Stanford University)
and Institute for Operations Research and the Management Sciences (INFORMS)
50. OR Spectrum – Quantitative Approaches in Management published by Springer-Verlag
51. SIAG/OPT Views and News: A Forum for the SIAM Activity Group on Optimization pub-
lished by SIAM Activity Group on Optimization – SIAG/OPT
52. Künstliche Intelligenz (KI) published by Böttcher IT Verlag
Part II
Difficulties in Optimization
Chapter 11
Introduction
The classification of optimization algorithms in Section 1.3 and the table of contents of this book enumerate a wide variety of optimization algorithms. Yet, the approaches introduced here represent only a fraction of the actual number of available methods. It is a justified question to ask why there are so many different approaches and why this variety is needed. One possible answer is simply that there are so many different kinds of optimization tasks. Each of them puts different obstacles into the way of the optimizers and comes with its own, characteristic difficulties.
In this part we want to discuss the most important of these complications, the major
problems that may be encountered during optimization [2915]. You may wish to read this
part after reading through some of the later parts on specific optimization algorithms or
after solving your first optimization task. You will then realize that many of the problems
you encountered are discussed here in depth.
Some of the subjects in the following text concern Global Optimization in general (multi-
modality and overfitting, for instance); others apply especially to nature-inspired approaches
like Genetic Algorithms (epistasis and neutrality, for example). Sometimes, neglecting a
single one of these aspects during the development of the optimization process can render
all of the invested effort useless, even if highly efficient optimization techniques are applied.
By giving clear definitions and comprehensive introductions to these topics, we want to raise
the awareness of scientists and practitioners in the industry and hope to help them use
optimization algorithms more efficiently.
In Figure 11.1, we have sketched a set of different types of fitness landscapes (see Sec-
tion 5.1) which we are going to discuss. The objective values in the figure are subject to
minimization and the small bubbles represent candidate solutions under investigation. An
arrow from one bubble to another means that the second individual is found by applying
one search operation to the first one.
Before we go into more detail about what makes these landscapes difficult, we should
establish the term difficulty in the context of optimization. The degree of difficulty of solving a certain
problem with a dedicated algorithm is closely related to its computational complexity. We
will outline this in the next section and show that for many problems, there is no algorithm
which can exactly solve them in reasonable time.
As already stated, optimization algorithms are guided by objective functions. A function
is difficult from a mathematical perspective in this context if it is not continuous, not
differentiable, or if it has multiple maxima and minima. This understanding of difficulty
comes very close to the intuitive sketches in Figure 11.1.
In many real world applications of metaheuristic optimization, the characteristics of the
objective functions are not known in advance. The problems are usually very complex and it
is only rarely possible to derive boundaries for the performance or the runtime of optimizers
in advance, let alone exact estimates with mathematical precision.
In such cases, experience and general rules of thumb are often the best guides available
[Figure: eight sketched fitness landscapes, each plotting the objective values f(x) over x (subject to minimization), with bubbles marking candidate solutions. Fig. 11.1.a: Best Case. Fig. 11.1.b: Low Variation. Fig. 11.1.c: Multi-Modal (multiple (local) optima). Fig. 11.1.d: Rugged (no useful gradient information). Fig. 11.1.e: Deceptive (region with misleading gradient information). Fig. 11.1.f: Neutral (neutral area). Fig. 11.1.g: Needle-In-A-Haystack (neutral area or area without much information; the needle is an isolated optimum). Fig. 11.1.h: Nightmare.]
Figure 11.1: Different possible properties of fitness landscapes (minimization).
(see also Chapter 24). Empirical results based on models obtained from related research
areas such as biology are sometimes helpful as well. In this part, we discuss many such models
and rules, providing a better understanding of when the application of a metaheuristic is
feasible and when it is not, as well as how to avoid defining problems in a way that makes
them difficult.
Chapter 12
Problem Hardness
Most of the algorithms we will discuss in this book, in particular the metaheuristics, usually
give approximations as results but cannot guarantee to always find the globally optimal
solution. However, we are actually able to find the global optima for all given problems (of
course, limited to the precision of the available computers). Indeed, the easiest way to do
so would be exhaustive enumeration (see Section 8.3), to simply test all possible values from
the problem space and to return the best one.
This method, obviously, will take extremely long so our goal is to find an algorithm
which is better. More precisely, we want to find the best algorithm for solving a given class
of problems. This means that we need some sort of metric with which we can compare the
efficiency of algorithms [2877, 2940].
12.1 Algorithmic Complexity
The most important measures obtained by analyzing an algorithm¹ are the time that it takes
to produce the wanted outcome and the storage space needed for internal data [635]. We
call those the time complexity and the space complexity dimensions. The time complexity
denotes how many steps an algorithm needs until it returns its results. The space complexity
determines how much memory an algorithm consumes at most in one run. Of course,
these measures depend on the input values passed to the algorithm.
If we have an algorithm that should decide whether a given number is prime or not, the
number of steps needed to find that out will differ if the sizes of the inputs differ.
For example, it is easy to see that telling whether 7 is prime or not can be done in much
shorter time than verifying the same property for 2^{32 582 657} − 1. The nature of the input
may influence the runtime and space requirements of an algorithm as well. We immediately
know that 1 000 000 000 000 000 is not prime, though it is a big number, whereas the answer
in case of the smaller number 100 000 007 is less clear.² Therefore, often best, average, and
worst-case complexities exist for given input sizes.
In order to compare the efficiency of algorithms, some approximative notations have been
introduced [1555, 1556, 2510]. As we have just seen, the basic criterion influencing the time
and space requirements of an algorithm applied to a problem is the size of the inputs, i. e.,
the number of bits needed to represent the number from which we want to find out whether
it is prime or not, the number of elements in a list to be sorted, the number of nodes or
edges in a graph to be 3-colored, and so on. We can describe this dependency as a function
of this size.
In real systems however, the knowledge of the exact dependency is not needed. If we,
for example, know that sorting n data elements with the Quicksort algorithm³ [1240, 1557]
takes on average about n log n steps, this is sufficient, even if the correct number is
2n ln n ≈ 1.39 n log n.
¹ http://en.wikipedia.org/wiki/Analysis_of_algorithms [accessed 2007-07-03]
² 100 000 007 is indeed a prime, in case you're wondering; so is 2^{32 582 657} − 1.
³ http://en.wikipedia.org/wiki/Quicksort [accessed 2007-07-03]
The Big-O-family notations introduced by Bachmann [163] and made popular by Landau
[1664] allow us to group functions together that rise at approximately the same speed.
Definition D12.1 (Big-O notation). The big-O notation⁴ is a mathematical notation
used to describe the asymptotical upper bound of functions.

f(x) ∈ O(g(x)) ⇔ ∃ x₀, m ∈ ℝ : m > 0 ∧ |f(x)| ≤ m·|g(x)| ∀ x > x₀   (12.1)

In other words, a function f(x) is in O of another function g(x) if and only if there exist a
real number x₀ and a constant, positive factor m so that the absolute value of f(x) is smaller
than (or equal to) m times the absolute value of g(x) for all x that are greater than x₀. f(x)
thus cannot grow (considerably) faster than g(x). Therefore, x³ + x² + x + 1 = f(x) ∈ O(x³),
since for m = 5 and x₀ = 2 it holds that 5x³ > x³ + x² + x + 1 ∀ x ≥ 2.
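Definition D12.1 can be made tangible with a small numerical check. The following Python sketch (the function names are ours, not the book's) verifies the witness pair m = 5, x₀ = 2 from the example above by testing |f(x)| ≤ m·|g(x)| on a finite sample of points x > x₀; such a test cannot prove the asymptotic claim, but it illustrates the role of the constants in Equation 12.1.

# Sketch: empirically checking the Big-O witness (m, x0) from Definition D12.1.
# Only finitely many points are sampled, so this illustrates rather than proves f in O(g).

def f(x):
    return x**3 + x**2 + x + 1   # the example function from the text

def g(x):
    return x**3                  # the claimed upper-bound function

def check_big_o_witness(f, g, m, x0, samples):
    """Return True if |f(x)| <= m*|g(x)| holds for all sampled x > x0."""
    return all(abs(f(x)) <= m * abs(g(x)) for x in samples if x > x0)

if __name__ == "__main__":
    xs = range(1, 10001)
    print(check_big_o_witness(f, g, m=5, x0=2, samples=xs))   # prints: True
    # A witness factor that is too small fails, e.g. m = 1:
    print(check_big_o_witness(f, g, m=1, x0=2, samples=xs))   # prints: False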
In terms of algorithmic complexity, we specify the number of steps or memory units an
algorithm needs, in dependency on the size of its inputs, in the big-O notation. More
generally, this specification is based on key features of the input data. For a graph-processing
algorithm, these may be the number of edges |E| and the number of vertexes |V| of the graph
to be processed, and the algorithmic complexity may be O(|V|² + |E|·|V|). Some examples
for single-parametric functions can be found in Example E12.1.
Example E12.1 (Some examples of the big-O notation).
1. O(1) – examples: f₁(x) = 2²²², f₂(x) = sin x. Algorithms that have constant runtime for all inputs are in O(1).
2. O(log n) – examples: f₃(x) = log x; f₄(x) = f₄(x/2) + 1 with f₄(x < 1) = 0. Logarithmic complexity is often a feature of algorithms that run on binary trees or of search algorithms in ordered sets. Notice that O(log n) implies that only part of the input of the algorithm is read/regarded, since the input has length n and we only perform m·log n steps.
3. O(n) – examples: f₅(x) = 23n + 4, f₆(x) = n/2. Algorithms in O(n) need to access and process their input a constant number of times. This is for example the case when searching in a linked list.
4. O(n log n) – example: f₇(x) = 23x + x·log 7x. Many sorting algorithms like Quicksort and Mergesort are in O(n log n).
5. O(n²) – examples: f₈(x) = 34x², f₉(x) = Σ_{i=0}^{x+3} (x − 2). Some sorting algorithms like selection sort have this complexity. For many problems, O(n²) solutions are acceptably good.
6. O(nⁱ) with i > 1, i ∈ ℝ – example: f₁₀(x) = x⁵ − x². The general polynomial complexity. In this group we find many algorithms that work on graphs.
7. O(2ⁿ) – example: f₁₁(x) = 23·2ˣ. Algorithms with exponential complexity perform slowly and become infeasible with increasing input size. For many hard problems, there exist only algorithms of this class. Their solutions can otherwise only be approximated by means of randomized global optimization techniques.
[Figure: plot of the growth of f(x) = x, x², x⁴, x⁸, x¹⁰, 1.1ˣ, 2ˣ, eˣ, x!, and xˣ on a logarithmic scale, with reference lines marking the number of milliseconds per day and the number of picoseconds since the big bang.]
Figure 12.1: Illustration of some functions illustrating their rising speed, gleaned from [2365].
In Example E12.1, we give some examples for the big-O notation and name kinds of algorithms
which have such complexities. The rising behavior of some functions is illustrated in
Figure 12.1. From this figure it becomes clear that problems for which only algorithms with
high complexities such as O(2ⁿ) exist quickly become intractable even for small input sizes n.
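To get a feeling for these growth rates, the short Python sketch below (our own illustration, under the hypothetical assumption of 10⁹ elementary steps per second) converts the step counts n², n⁵, and 2ⁿ into runtimes for a few input sizes; the exponential column explodes long before the polynomial ones become problematic.

# Sketch: turning step counts into (hypothetical) runtimes, assuming 1e9 steps per second.
STEPS_PER_SECOND = 1e9

def seconds(steps):
    return steps / STEPS_PER_SECOND

for n in (10, 20, 40, 60, 80, 100):
    print(f"n={n:3d}  n^2: {seconds(n**2):.2e}s  "
          f"n^5: {seconds(n**5):.2e}s  2^n: {seconds(2**n):.2e}s")
# For n = 100, the 2^n column corresponds to roughly 4e13 years,
# while the n^5 column still finishes in about ten seconds.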
12.2 Complexity Classes
While the big-O notation is used to state how long a specific algorithm needs (at least/on
average/at most) for input with certain features, the computational complexity⁵ of a problem
is bounded by the best algorithm known for it. In other words, it makes a statement about
how many resources are necessary for solving a given problem, or, from the opposite point
of view, tells us whether we can solve a problem with given available resources.
12.2.1 Turing Machines
The amount of resources required at least to solve a problem depends on the problem
itself and on the machine used for solving it. Among the most basic computer models are
deterministic Turing machines⁶ (DTMs) [2737], as sketched in Fig. 12.2.a. They consist of
a tape of infinite length which is divided into distinct cells, each capable of holding exactly
one symbol from a finite alphabet. A Turing machine has a head which is always located
at one of these cells and can read and write the symbols. The machine has a finite set of
possible states and is driven by rules. These rules decide which symbol should be written
and/or whether the head should move one unit to the left or right, based on the machine's
current state and the symbol on the tape under its head. With this simple machine, any of
our computers and, hence, also the programs executed by them, can be simulated.⁷
⁴ http://en.wikipedia.org/wiki/Big_O_notation [accessed 2007-07-03]
⁵ http://en.wikipedia.org/wiki/Computational_complexity_theory [accessed 2010-06-24]
⁶ http://en.wikipedia.org/wiki/Turing_machine [accessed 2010-06-24]
⁷ Another computational model, the RAM (Random Access Machine), is quite close to today's computer architectures and can be simulated with Turing machines efficiently [1804] and vice versa.
[Figure: binary tapes with read/write heads and the corresponding rule tables mapping (state, tape symbol) to (symbol to print, head motion, next state); in the non-deterministic machine, some (state, symbol) pairs have several applicable rules, so the computation forks into multiple tape configurations.]
Fig. 12.2.a: A deterministic Turing machine.
Fig. 12.2.b: A non-deterministic Turing machine.
Figure 12.2: Turing machines.
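A rule table like the one sketched in Fig. 12.2.a is simply a mapping (state, tape symbol) → (symbol to print, head motion, next state), so a DTM can be simulated in a few lines. The following Python sketch is our own illustration; the rule set is a hypothetical example (it inverts a block of 0/1 symbols and halts at the first blank) and does not reproduce the exact table of the figure.

# Sketch: a minimal deterministic Turing machine (DTM) simulator.
# rules: (state, symbol) -> (symbol_to_print, motion, next_state)
# The rule set below is a hypothetical example: it inverts 0/1 symbols while
# moving right and halts when it reads a blank ("_").

def run_dtm(rules, tape, state="A", head=0, max_steps=10_000):
    tape = dict(enumerate(tape))                 # sparse tape, blank cells default to "_"
    for _ in range(max_steps):
        symbol = tape.get(head, "_")
        if (state, symbol) not in rules:         # no applicable rule: the machine halts
            break
        write, motion, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if motion == "R" else -1
    cells = [tape[i] for i in sorted(tape)]
    return "".join(cells), state

INVERT_RULES = {
    ("A", "0"): ("1", "R", "A"),
    ("A", "1"): ("0", "R", "A"),
}

if __name__ == "__main__":
    print(run_dtm(INVERT_RULES, "0110100"))      # -> ('1001011', 'A')

Incidentally, this tiny rule set is also one possible answer to Task 47 below: it inverts a sequence of zeros and ones while moving right and stops at the first blank symbol.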
Like DTMs, non-deterministic Turing machines⁸ (NTMs, see Fig. 12.2.b) are thought
experiments rather than real computers. They differ from DTMs in that they may
have more than one rule for a state/input pair. Whenever there is an ambiguous situation
where n > 1 rules could be applied, the NTM can be thought of as "forking" into n independent
NTMs running in parallel. The computation of the overall system stops when any of these
reaches a final, accepting state. NTMs can be simulated with DTMs by performing a
breadth-first search on the possible computation steps of the NTM until an accepting
state is found. The number of steps required for this simulation, however, grows exponentially with
the length of the shortest accepting path of the NTM [1483].
12.2.2 P, NP, Hardness, and Completeness
The set of all problems which can be solved by a deterministic Turing machine within
polynomial time is called P [1025]. This means that for the problems within this class,
there exists at least one algorithm (for a DTM) with an algorithmic complexity in O(nᵖ)
(where n is the input size and p ∈ ℕ₁ is some natural number).
Algorithms designed for Turing machines can be translated to algorithms for normal PCs.
Since it is quite easy to simulate a DTM (with finite tape) with an off-the-shelf computer and
vice versa, these are also the problems that can be solved within polynomial time by existing,
real computers. These are the problems which are said to be exactly solvable in a feasible
way [359, 1092].
NP, on the other hand, includes all problems which can be solved by a non-deterministic
Turing machine in a runtime rising polynomially with the input size. It is the set of all problems
for which solutions can be checked in polynomial time [1549].
Clearly, these include all the problems from P, since DTMs can be considered a
subset of NTMs and if a deterministic Turing machine can solve something in polynomial
time, a non-deterministic one can as well. The other way around, i. e., the question whether
NP ⊆ P and, hence, P = NP, is much more complicated and so far, there exists neither a
proof for nor against it [1549]. However, it is strongly assumed that problems in NP need
super-polynomial, exponential, time to be solved by deterministic Turing machines and
therefore also by computers as we know and use them.
⁸ http://en.wikipedia.org/wiki/Non-deterministic_Turing_machine [accessed 2010-06-24]
If problem A can be transformed in a way so that we can use an algorithm for problem
B to solve A, we say A reduces to B. This means that A is no more difficult than B. A
problem is hard for a complexity class if every other problem in this class can be reduced
to it. For classes larger than P, reductions which take no more than polynomial time in the
input size are usually used. The problems which are hard for NP are called NP-hard. A
problem which is in a complexity class C and hard for C is called complete for C, hence the
name NP-complete.
12.3 The Problem: Many Real-World Tasks are NP-hard
From a practical perspective, problems in P are usually friendly. Everything with an algorithmic
complexity in O(n⁵) and less can usually be solved more or less efficiently if n does
not get too large. Searching and sorting algorithms, with complexities between log n and
n log n, are good examples of such problems which, in most cases, are not bothersome.
However, many real-world problems are NP-hard or, without formal proof, can be assumed
to be at least as bad as that. So far, no polynomial-time algorithm for DTMs has
been discovered for this class of problems and the runtime of the algorithms we have for
them increases exponentially with the input size.
In 1962, Bremermann [407] pointed out that "No data processing system whether artificial
or living can process more than 2·10⁴⁷ bits per second per gram of its mass". This is an
absolute limit to computation power which we are very unlikely to ever be able to unleash.
Yet, even with this power, if the number of possible states of a system grows exponentially
with its size, any solution attempt becomes infeasible quickly. Minsky [1907], for instance,
estimates the number of all possible move sequences in chess to be around 10¹²⁰.
The Traveling Salesman Problem (TSP), which we initially introduced in Example E2.2 on
page 45, is a classical example of an NP-complete combinatorial optimization problem.
Since many Vehicle Routing Problems (VRPs) [2912, 2914] are basically iterated variants
of the TSP, they fall at least into the same complexity class. This means that until now,
there exists no algorithm with a less-than-exponential complexity which can be used to find
optimal transportation plans for a logistics company. In other words, as soon as a logistics
company has to satisfy transportation orders of more than a handful of customers, it will
likely not be able to serve them in an optimal way.
The Knapsack problem [463], i. e., the question of how to fit items (of different weight
and value) into a weight-limited container so that the total value of the items in the
container is maximized, is NP-complete as well. Like the TSP, it occurs in
a variety of practical applications.
The first known NP-complete problem is the satisfiability problem (SAT) [626]. Here,
the problem is to decide whether or not there is an assignment of values to a set of Boolean
variables x₁, x₂, . . . , xₙ which makes a given Boolean expression, built from these variables,
parentheses, ∧, ∨, and ¬, evaluate to true.
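The exponential blow-up becomes tangible in a tiny brute-force satisfiability check. The Python sketch below (our own illustration; the example formula is hypothetical) enumerates all 2ⁿ assignments of n Boolean variables, which is exactly the kind of cost that makes NP-hard problems intractable for larger n.

# Sketch: brute-force satisfiability check, enumerating all 2**n assignments.
from itertools import product

def brute_force_sat(formula, n):
    """Return a satisfying assignment (tuple of bools) or None."""
    for assignment in product((False, True), repeat=n):   # 2**n candidates
        if formula(*assignment):
            return assignment
    return None

# Hypothetical example formula over x1, x2, x3:
# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = lambda x1, x2, x3: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)

if __name__ == "__main__":
    print(brute_force_sat(formula, 3))   # prints (False, False, True)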
These few examples and the wide area of practical application problems which can be
reduced to them clearly show one thing: Unless it turns out at some point in time that
P = NP, there will be no way to solve a variety of problems exactly within reasonable time.
12.4 Countermeasures
Solving a problem exactly to optimality is not always possible, as we have learned in the
previous section. If we have to deal with NP-complete problems, we will therefore have
to give up on some of the features which we expect an algorithm or a solution to have. A
randomized algorithm includes at least one instruction that acts on the basis of a random
decision, as stated in Definition D1.13 on page 33. In other words, as opposed to deterministic
algorithms⁹, which always produce the same results when given the same inputs, a randomized
algorithm violates the constraint of determinism. Randomized algorithms are also often
called probabilistic algorithms [1281, 1282, 1919, 1966, 1967]. There are two general classes
of randomized algorithms: Las Vegas and Monte Carlo algorithms.
Definition D12.2 (Las Vegas Algorithm). A Las Vegas algorithm¹⁰ is a randomized
algorithm that never returns a wrong result [141, 1281, 1282, 1966]. It either returns the
correct result, reports a failure, or does not return at all.
If a Las Vegas algorithm returns, its outcome is deterministic (but not the algorithm itself).
The termination, however, cannot be guaranteed. There usually exists an expected runtime
limit for such algorithms – their actual execution however may take arbitrarily long. In
summary, a Las Vegas algorithm terminates with a positive probability and is (partially)
correct.
Definition D12.3 (Monte Carlo Algorithm). A Monte Carlo algorithm¹¹ always
terminates. Its result, however, can be either correct or incorrect [1281, 1282, 1966].
In contrast to Las Vegas algorithms, Monte Carlo algorithms always terminate but are
(partially) correct only with a positive probability. As pointed out in Section 1.3, they are
usually the way to go to tackle complex problems.
⁹ http://en.wikipedia.org/wiki/Deterministic_computation [accessed 2007-07-03]
¹⁰ http://en.wikipedia.org/wiki/Las_Vegas_algorithm [accessed 2007-07-03]
¹¹ http://en.wikipedia.org/wiki/Monte_carlo_algorithm [accessed 2007-07-03]
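To make Definitions D12.2 and D12.3 more concrete, here is a deliberately simplistic Python sketch (our own toy example, not from this book): the Las Vegas variant keeps sampling until it has verifiably found the minimum of a list and thus never returns a wrong result but has no fixed runtime bound, while the Monte Carlo variant always stops after a fixed budget of samples and may return a merely good, i. e., possibly wrong, result.

# Sketch: Las Vegas vs. Monte Carlo behaviour on a toy task (finding the minimum
# of a list by random sampling). The task is trivial; only the two
# termination/correctness trade-offs are of interest here.
import random

def las_vegas_min(data):
    """Never wrong, but the number of iterations is random (no fixed bound)."""
    true_min = min(data)                 # verification oracle for the toy example
    while True:
        guess = random.choice(data)
        if guess == true_min:            # only a verified correct result is returned
            return guess

def monte_carlo_min(data, budget=10):
    """Always terminates after `budget` samples, but may return a wrong result."""
    return min(random.choice(data) for _ in range(budget))

if __name__ == "__main__":
    data = list(range(1000))
    print(las_vegas_min(data))           # always 0, runtime varies
    print(monte_carlo_min(data))         # usually not 0, but always fast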
Tasks T12
44. Find all big-O relationships between the following functions fᵢ : ℕ₁ → ℝ:
• f₁(x) = 9√x
• f₂(x) = log x
• f₃(x) = 7
• f₄(x) = (1.5)ˣ
• f₅(x) = 99/x
• f₆(x) = eˣ
• f₇(x) = 10x²
• f₈(x) = (2 + (−1)ˣ)·x
• f₉(x) = 2 ln x
• f₁₀(x) = (1 + (−1)ˣ)·x² + x
[10 points]
45. Show why the statement "The runtime of the algorithm is at least in O(n²)" makes
no sense.
[5 points]
46. What is the difference between a deterministic Turing machine (DTM) and non-
deterministic one (NTM)?
[10 points]
47. Write down the states and rules of a deterministic Turing machine which inverts a
sequence of zeros and ones on the tape while moving its head to the right and that
stops when encountering a blank symbol.
[5 points]
48. Use literature or the internet to find at least three more NP-complete problems. Define
them as optimization problems by specifying proper problem spaces X and objective
functions f which are subject to minimization.
[15 points]
Chapter 13
Unsatisfying Convergence
13.1 Introduction
Definition D13.1 (Convergence). An optimization algorithm has converged (a) if it cannot
reach new candidate solutions anymore or (b) if it keeps on producing candidate solutions
from a "small"¹ subset of the problem space.
Optimization processes will usually converge at some point in time. In nature, a similar phe-
nomenon can be observed according to Koza [1602]: The niche preemption principle states
that a niche in a natural environment tends to become dominated by a single species [1812].
Linear convergence to the global optimum x* with respect to an objective function f is
defined in Equation 13.1, where t is the iteration index of the optimization algorithm and
x(t) is the best candidate solution discovered until iteration t [1976]. This is basically the
best behavior of an optimization process we could wish for.

|f(x(t + 1)) − f(x*)| ≤ c·|f(x(t)) − f(x*)| ,   c ∈ [0, 1)   (13.1)
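Given a trace of best-so-far objective values and the objective value of the global optimum, Equation 13.1 can be checked directly on data. The Python sketch below (our own helper with hypothetical names) estimates the smallest factor c for which a trace satisfies the condition; a value below 1 indicates behavior at least as good as linear convergence on that trace.

# Sketch: estimating the convergence factor c from Equation 13.1 for a trace of
# best-so-far objective values f(x(t)) and the optimal objective value f_star.

def linear_convergence_factor(trace, f_star):
    """Return the smallest c with |f(x(t+1)) - f*| <= c * |f(x(t)) - f*| for all t."""
    ratios = []
    for current, nxt in zip(trace, trace[1:]):
        err_now, err_next = abs(current - f_star), abs(nxt - f_star)
        if err_now > 0:
            ratios.append(err_next / err_now)
    return max(ratios) if ratios else 0.0

if __name__ == "__main__":
    # Hypothetical trace of a minimization run approaching f* = 0:
    trace = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]
    print(linear_convergence_factor(trace, f_star=0.0))   # 0.5 < 1: linear convergence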
One of the problems in Global Optimization (and basically, also in nature) is that it is often
not possible to determine whether the best solution currently known is situated on a local or
a global optimum and thus, whether convergence is acceptable. In other words, it is usually not
clear whether the optimization process can be stopped, whether it should concentrate on
refining the current optimum, or whether it should examine other parts of the search space
instead. This can, of course, only become cumbersome if there are multiple (local) optima,
i. e., the problem is multi-modal [1266, 2267] as depicted in Fig. 11.1.c on page 140.
Definition D13.2 (Multi-Modality). A mathematical function is multi-modal if it has
multiple (local or global) maxima or minima [711, 2474, 3090]. A set of objective functions
(or a vector function) f is multi-modal if it has multiple (local or global) optima – depending
on the definition of “optimum” in the context of the corresponding optimization problem.
The existence of multiple global optima itself is not problematic and the discovery of only
a subset of them can still be considered as successful in many cases. The occurrence of
numerous local optima, however, is more complicated.
¹ according to a suitable metric, like the number of modifications or mutations which need to be applied to a given solution in order to leave this subset
13.2 The Problems
There are three basic problems which are related to the convergence of an optimization algo-
rithm: premature, non-uniform, and domino convergence. The first one is the most important
in optimization, but the latter two may also cause considerable inconvenience.
13.2.1 Premature Convergence
Definition D13.3 (Premature Convergence). An optimization process has prema-
turely converged to a local optimum if it is no longer able to explore other parts of the
search space than the area currently being examined and there exists another region that
contains a superior solution [2417, 2760].
The goal of optimization is, of course, to find the global optima. Premature convergence,
basically, is the convergence of an optimization process to a local optimum. A prematurely
converged optimization process hence provides solutions which are below the possible or
anticipated quality.
Example E13.1 (Premature Convergence).
Figure 13.1 illustrates two examples for premature convergence. Although a higher peak
exists in the maximization problem pictured in Fig. 13.1.a, the optimization algorithm solely
focuses on the local optimum to the right. The same situation can be observed in the
minimization task shown in Fig. 13.1.b where the optimization process gets trapped in the
local minimum on the left hand side.
[Figure: fitness landscapes with a marked local optimum and global optimum. Fig. 13.1.a: Example 1: Maximization. Fig. 13.1.b: Example 2: Minimization.]
Figure 13.1: Premature convergence in the objective space.
13.2.2 Non-Uniform Convergence
Definition D13.4 (Non-Uniform Convergence). In optimization problems with in-
finitely many possible (globally) optimal solutions, an optimization process has non-
uniformly converged if the solutions that it provides only represent a subset of the optimal
characteristics.
This definition is rather fuzzy, but it becomes much clearer when taking a look at
Example E13.2. The goal of optimization in problems with infinitely many optima is often
not to discover only one or two optima. Instead, we usually want to find a set X of solutions
which has a good spread over all possible optimal features.
Example E13.2 (Convergence in a Bi-Objective Problem).
Assume that we wish to solve a bi-objective problem with f = (f₁, f₂). Then, the goal is to
find a set X of candidate solutions which approximates the set X* of globally (Pareto) optimal
solutions as well as possible. Two issues arise: First, the optimization process should
converge to the true Pareto front and return solutions as close to it as possible. Second,
these solutions should be uniformly spread along this front.
[Figure: three approximation sets plotted over the objectives f₁ and f₂ together with the true Pareto front and the front returned by the optimizer. Fig. 13.2.a: Bad Convergence and Good Spread. Fig. 13.2.b: Good Convergence and Bad Spread. Fig. 13.2.c: Good Convergence and Spread.]
Figure 13.2: Pareto front approximation sets [2915].
Let us examine the three fronts included in Figure 13.2 [2915]. The first picture
(Fig. 13.2.a) shows a result X having a very good spread (or diversity) of solutions, but
the points are far away from the true Pareto front. Such results are not attractive because
they do not provide Pareto-optimal solutions and we would consider the convergence to be
premature in this case. The second example (Fig. 13.2.b) contains a set of solutions which
are very close to the true Pareto front but cover it only partially, so the decision maker
could lose important trade-off options. Finally, the front depicted in Fig. 13.2.c has the two
desirable properties of good convergence and spread.
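The two issues from Example E13.2 can be quantified with simple indicators. The Python sketch below uses our own simplified measures (not the book's formal definitions): convergence is the average distance of the returned points to a sampled true Pareto front, and spread is the largest gap left uncovered along that front; good approximation sets keep both values small.

# Sketch: simple convergence and spread indicators for a bi-objective result set.
# Points are (f1, f2) tuples; `true_front` is a dense sample of the real Pareto front.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def convergence(result, true_front):
    """Average distance of each returned point to its nearest true front point."""
    return sum(min(dist(r, p) for p in true_front) for r in result) / len(result)

def spread(result, true_front):
    """Largest gap left uncovered: gaps between neighbouring result points plus the
    distance from the front's extreme points to their nearest result point."""
    pts = sorted(result)
    gaps = [dist(a, b) for a, b in zip(pts, pts[1:])]
    extremes = [min(true_front), max(true_front)]
    gaps += [min(dist(e, r) for r in result) for e in extremes]
    return max(gaps)

if __name__ == "__main__":
    # Hypothetical true front: f2 = 1 - f1 for f1 in [0, 1].
    front = [(i / 100, 1 - i / 100) for i in range(101)]
    good = [(i / 5, 1 - i / 5) for i in range(6)]            # on the front, well spread
    clustered = [(i / 50, 1 - i / 50) for i in range(6)]     # on the front, bad spread
    print(convergence(good, front), spread(good, front))          # ~0.0, ~0.28
    print(convergence(clustered, front), spread(clustered, front))  # ~0.0, ~1.27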
13.2.3 Domino Convergence
The phenomenon of domino convergence has been brought to attention by Rudnick [2347]
who studied it in the context of his BinInt problem [2347, 2698] which is discussed in
Section 50.2.5.2. In principle, domino convergence occurs when the candidate solutions
have features which contribute to significantly different degrees to the total fitness. If these
features are encoded in separate genes (or building blocks) in the genotypes, they are likely to
be treated with different priorities, at least in randomized or heuristic optimization methods.
Building blocks with a very strong positive influence on the objective values, for instance,
will quickly be adopted by the optimization process (i. e., “converge”). During this time, the
alleles of genes with a smaller contribution are ignored. They do not come into play until
the optimal alleles of the more “important” blocks have been accumulated. Rudnick [2347]
called this sequential convergence phenomenon domino convergence due to its resemblance
to a row of falling domino stones [2698].
Let us consider the application of a Genetic Algorithm, an optimization method working
on bit strings, in such a scenario. Mutation operators from time to time destroy building
blocks with strong positive influence which are then reconstructed by the search. If this
happens with a high enough frequency, the optimization process will never get to optimize
the less salient blocks because repairing and rediscovering those with higher importance
takes precedence. Thus, the mutation rate of the EA limits the probability of finding the
global optima in such a situation.
In the worst case, the contributions of the less salient genes may almost look like noise
and they are not optimized at all. Such a situation is also an instance of premature conver-
gence, since the global optimum which would involve optimal configurations of all building
blocks will not be discovered. In this situation, restarting the optimization process will not
help because it will turn out the same way with very high probability. Example problems
which are often likely to exhibit domino convergence are the Royal Road and the aforemen-
tioned BinInt problem, which you can find discussed in Section 50.2.4 and Section 50.2.5.2,
respectively.
There are some relations between domino convergence and the Siren Pitfall in feature
selection for classification in data mining [961]. Here, if one feature is highly useful in a hard
sub-problem of the classification task, its utility may still be overshadowed by mediocre
features contributing to easy sub-tasks.
13.3 One Cause: Loss of Diversity
In biology, diversity is the variety and abundance of organisms at a given place and
time [1813, 2112]. Much of the beauty and efficiency of natural ecosystems is based on
a dazzling array of species interacting in manifold ways. Diversification is also a good strat-
egy utilized by investors in the economy in order to increase their wealth.
In population-based Global Optimization algorithms, maintaining a set of diverse candi-
date solutions is very important as well. Losing diversity means approaching a state where all
the candidate solutions under investigation are similar to each other. Another term for this
state is convergence. Discussions about how diversity can be measured have been provided
by Cousins [648], Magurran [1813], Morrison [1956], Morrison and De Jong [1958], Paenke
et al. [2112], Routledge [2344], and Burke et al. [454, 458].
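A simple way to observe the loss of diversity is to track a scalar diversity measure of the population over time. The Python sketch below (our own choice of measure, not one of the cited definitions) uses the average pairwise distance between real-valued candidate solutions; a value dropping towards zero signals that the population has collapsed into a single region.

# Sketch: average pairwise Euclidean distance as a simple population diversity measure.
import math
from itertools import combinations

def diversity(population):
    """Average pairwise distance; 0 means all candidate solutions are identical."""
    if len(population) < 2:
        return 0.0
    dists = [math.dist(a, b) for a, b in combinations(population, 2)]
    return sum(dists) / len(dists)

if __name__ == "__main__":
    spread_out = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    collapsed = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.51, 0.51)]
    print(diversity(spread_out))   # ~1.14
    print(diversity(collapsed))    # ~0.011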
Example E13.3 (Loss of Diversity → Premature Convergence).
Figure 13.3 illustrates two possible ways in which a population-based optimization method
may progress. Both processes start with a set of random candidate solutions denoted as
bubbles in the fitness landscape already used in Fig. 13.1.a in their search for the global
maximum. While the upper algorithm preserves a reasonable diversity in the set of elements
under investigation although initially, the best candidate solutions are located in the prox-
imity of the local optimum. By doing so, it also manages to explore the area near the global
optimum and eventually, discovers both peaks. The process illustrated in the lower part of
the picture instead solely concentrates on the area where the best candidate solutions were
sampled in the beginning and quickly climbs up the local optimum. Reaching a state of
premature convergence, it cannot escape from there anymore.
[Figure: starting from the same initial situation, preservation of diversity leads to convergence to the global optimum, whereas loss of diversity leads to premature convergence.]
Figure 13.3: Loss of Diversity leads to Premature Convergence.
Preserving diversity is directly linked with maintaining a good balance between exploitation
and exploration [2112] and has been studied by researchers from many domains, such as
1. Genetic Algorithms [238, 2063, 2321, 2322],
2. Evolutionary Algorithms [365, 366, 1692, 1964, 2503, 2573],
3. Genetic Programming [392, 456, 458, 709, 1156, 1157],
4. Tabu Search [1067, 1069], and
5. Particle Swarm Optimization [2943].
13.3.1 Exploration vs. Exploitation
The operations which create new solutions from existing ones have a very large impact on
the speed of convergence and the diversity of the populations [884, 2535]. The step size in
Evolution Strategy is a good example of this issue: setting it properly is very important and
leads over to the “exploration versus exploitation” problem [1250]. The dilemma of deciding
which of the two principles to apply to which degree at a certain stage of optimization is
sketched in Figure 13.4 and can be observed in many areas of Global Optimization.
2
Definition D13.5 (Exploration). In the context of optimization, exploration means find-
ing new points in areas of the search space which have not been investigated before.
2
More or less synonymously to exploitation and exploration, the terms intensifications and diversification
have been introduced by Glover [1067, 1069] in the context of Tabu Search.
[Figure: initial situation: a good candidate solution has been found; exploitation: search the close proximity of the good candidate; exploration: search also the distant areas.]
Figure 13.4: Exploration versus Exploitation
Since computers have only limited memory, already evaluated candidate solutions usually
have to be discarded in order to accommodate the new ones. Exploration is a metaphor
for the procedure which allows search operations to find novel and maybe better solution
structures. Such operators (like mutation in Evolutionary Algorithms) have a high chance
of creating inferior solutions by destroying good building blocks but also a small chance of
finding totally new, superior traits (which, however, is not guaranteed at all).
Definition D13.6 (Exploitation). Exploitation is the process of improving and combin-
ing the traits of the currently known solutions.
The crossover operator in Evolutionary Algorithms, for instance, is considered to be an
operator for exploitation. Exploitation operations often incorporate small changes into al-
ready tested individuals leading to new, very similar candidate solutions or try to merge
building blocks of different, promising individuals. They usually have the disadvantage that
other, possibly better, solutions located in distant areas of the problem space will not be
discovered.
Almost all components of optimization strategies can either be used for increasing ex-
ploitation or in favor of exploration. Unary search operations that improve an existing
solution in small steps can often be built, hence being exploitation operators. They can also
be implemented in a way that introduces much randomness into the individuals, effectively
making them exploration operators. Selection operations³ in Evolutionary Computation
choose a set of the most promising candidate solutions which will be investigated in the
next iteration of the optimizers. They can either return a small group of best individuals
(exploitation) or a wide range of existing candidate solutions (exploration).
Optimization algorithms that favor exploitation over exploration have higher conver-
gence speed but run the risk of not finding the optimal solution and may get stuck at a local
optimum. Then again, algorithms which perform excessive exploration may never improve
their candidate solutions well enough to find the global optimum or it may take them very
long to discover it “by accident”. A good example for this dilemma is the Simulated An-
nealing algorithm discussed in Chapter 27 on page 243. It is often modified to a form called
simulated quenching which focuses on exploitation but loses the guaranteed convergence to
the optimum. Generally, optimization algorithms should employ at least one search opera-
tion of explorative character and at least one which is able to exploit good solutions further.
There exists a vast body of research on the trade-off between exploration and exploitation
that optimization algorithms have to face [87, 741, 864, 885, 1253, 1986].
³ Selection will be discussed in Section 28.4 on page 285.
13.4 Countermeasures
There is no general approach which can prevent premature convergence. The probability
that an optimization process gets caught in a local optimum depends on the characteristics
of the problem to be solved and the parameter settings and features of the optimization
algorithms applied [2350, 2722].
13.4.1 Search Operator Design
The most basic measure to decrease the probability of premature convergence is to make
sure that the search operations are complete (see Definition D4.8 on page 85), i. e., to make
sure that they can at least theoretically reach every point in the search space from every
other point. Then, it is (at least theoretically) possible to escape arbitrary local optima.
A prime example for bad operator design is Algorithm 1.1 on page 40, which cannot escape
any local optima. The two different operators given in Equation 5.4 and Equation 5.5 on
page 96 are another example for the influence of the search operation design on the
vulnerability of the optimization process to local convergence.
13.4.2 Restarting
A very crude and yet, sometimes effective measure is restarting the optimization process at
randomly chosen points in time. One example for this method is GRASPs, Greedy Random-
ized Adaptive Search Procedures [902, 906] (see Chapter 42 on page 499), which continuously
restart the process of creating an initial solution and refining it with local search. Still, such
approaches are likely to fail in domino convergence situations. Increasing the proportion of
exploration operations may also reduce the chance of premature convergence.
13.4.3 Low Selection Pressure and Population Size
Generally, the higher the chance that candidate solutions with bad fitness are investigated
instead of being discarded in favor of the seemingly better ones, the lower the chance of
getting stuck at a local optimum. This is the exact idea which distinguishes Simulated
Annealing from Hill Climbing (see Chapter 27). It is known that Simulated Annealing can
find the global optimum, whereas simple Hill Climbers are likely to prematurely converge.
In an EA, too, using low selection pressure decreases the chance of premature convergence.
However, such an approach also decreases the speed with which good solutions are exploited.
Simulated Annealing, for instance, takes extremely long if a guaranteed convergence to the
global optimum is required.
Increasing the population size of a population-based optimizer may be useful as well,
since larger populations can maintain more individuals (and hence, also be more diverse).
However, the idea that larger populations will always lead to better optimization results
does not always hold, as shown by van Nimwegen and Crutchfield [2778].
13.4.4 Sharing, Niching, and Clearing
Instead of increasing the population size, it thus makes sense to "gain more" from the smaller
populations. In order to extend the duration of the evolution in Evolutionary Algorithms,
many methods have been devised for steering the search away from areas which have already
been frequently sampled. In steady-state EAs (see Paragraph 28.1.4.2.1), it is common to
remove duplicate genotypes from the population [2324].
More generally, the exploration capabilities of an optimizer can be improved by integrating
density metrics into the fitness assignment process. The most popular of such approaches
are sharing and niching (see Section 28.3.4, [735, 742, 1080, 1085, 1250, 1899, 2391]). The
Strength Pareto Algorithms, which are widely accepted to be highly efficient, use another
idea: they adapt the number of individuals that one candidate solution dominates as density
measure [3097, 3100].
Pétrowski [2167, 2168] introduced the related clearing approach which you can find
discussed in Section 28.4.7, where individuals are grouped according to their distance in the
phenotypic or genotypic space and all but a certain number of individuals from each group
just receive the worst possible fitness. A similar method aiming for convergence prevention
is introduced in Section 28.4.7, where individuals close to each other in objective space are
deleted.
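As a sketch of how such density information can enter the fitness assignment, the following Python snippet implements a basic form of fitness sharing for a minimization problem (our own simplified variant, in the spirit of the sharing and niching literature cited above): each raw objective value is multiplied by the individual's niche count, i. e., by how many close neighbours it has within a sharing radius σ, so that crowded regions become less attractive.

# Sketch: fitness sharing for a minimization problem.
# shared_value(i) = f(i) * niche_count(i), where the niche count sums a triangular
# sharing function sh(d) = 1 - d/sigma over all individuals within distance sigma.

def sharing(d, sigma):
    return max(0.0, 1.0 - d / sigma)

def shared_objectives(population, objectives, distance, sigma):
    shared = []
    for x, fx in zip(population, objectives):
        niche_count = sum(sharing(distance(x, y), sigma) for y in population)
        shared.append(fx * niche_count)   # crowded individuals are penalized
    return shared

if __name__ == "__main__":
    pop = [0.10, 0.11, 0.12, 0.90]                 # three clustered points, one isolated
    objs = [1.0, 1.0, 1.0, 1.0]                    # equal raw objective values
    dist = lambda a, b: abs(a - b)
    print(shared_objectives(pop, objs, dist, sigma=0.1))
    # -> roughly [2.7, 2.8, 2.7, 1.0]: the isolated individual now looks best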
13.4.5 Self-Adaptation
Another approach against premature convergence is to introduce the capability of self-
adaptation, allowing the optimization algorithm to change its strategies or to modify its
parameters depending on its current state. Such behaviors, however, are often implemented
not in order to prevent premature convergence but to speed up the optimization process
(which may lead to premature convergence to local optima) [2351–2353].
13.4.6 Multi-Objectivization
Recently, the idea of using helper objectives has emerged. Here, usually a single-objective
problem is transformed into a multi-objective one by adding new objective functions [1429,
1440, 1442, 1553, 1757, 2025]. Brockhoff et al. [425] proved that in some cases, such changes
can speed up the optimization process. The new objectives are often derived from the main
objective by decomposition [1178] or from certain characteristics of the problem [1429]. They
are then optimized together with the original objective function with some multi-objective
technique, say an MOEA (see Definition D28.2). Problems where the main objective is
some sort of sum of different components contributing to the fitness are especially suitable
for that purpose, since such a component naturally lends itself as helper objective. It has
been shown that the original objective function has at least as many local optima as the
total number of all local optima present in the decomposed objectives for sum-of-parts
multi-objectivizations [1178].
Example E13.4 (Sum-of-Parts Multi-Objectivization).
In Figure 13.5, we provide an example of the multi-objectivization of a sum-of-parts objective
function f defined on the real interval X = G = [0, 1]. The original (single) objective f
(Fig. 13.5.a) can be divided into the two objective functions f₁ and f₂ with f(x) = f₁(x) + f₂(x),
illustrated in Fig. 13.5.b.
If we assume minimization, f has five locally optimal sets according to Definition D4.14
on page 89: The sets X*₁ and X*₅ are globally optimal since they contain the elements with
the lowest function value. x*_{l,2}, x*_{l,3}, and x*_{l,4} are three sets each containing exactly one
locally minimal point.
We now split f up into f₁ and f₂ and optimize either f ≡ (f₁, f₂) or f ≡ (f, f₁, f₂) together.
If we apply Pareto-based optimization (see Section 3.3.5 on page 65) and use the Definition
D4.17 on page 89 of local optima (which is also used in [1178]), we find that there are
only three locally optimal regions remaining: A*₁, A*₂, and A*₃.
A*₁, for example, is a continuous region which borders on elements it dominates. If we
go a small step to the left from the left border of A*₁, we will find an element with the same
value of f₁ but with worse values in both f₂ and f. The same is true for its right border. A*₃
has exactly the same behavior.
Any step to the left or right within A*₂ leads to an improvement of either f₁ or f₂ and a
degeneration of the other function. Therefore, no element within A*₂ dominates any other
element in A*₂. However, at the borders of A*₂, f₁ becomes constant while f₂ keeps growing,
i. e., getting worse. The neighboring elements outside of A*₂ are thus dominated by the
elements on its borders.
[Figure: Fig. 13.5.a: The original objective and its local optima, plotting the main objective f(x) = f₁(x) + f₂(x) over x ∈ X with the locally optimal sets X*₁, x*_{l,2}, x*_{l,3}, x*_{l,4}, and X*₅. Fig. 13.5.b: The helper objectives and the local optima in the objective space, plotting f₁(x) and f₂(x) with the locally optimal regions A*₁, A*₂, and A*₃.]
Figure 13.5: Multi-Objectivization à la Sum-of-Parts.
Multi-objectivization may thus help here to overcome the local optima, since there are
fewer locally optimal regions. However, a look at A*₂ shows us that it contains x*_{l,2}, x*_{l,3}, and
x*_{l,4} and – since it is continuous – also uncountably infinitely many other points. In this
case, it would thus not be clear a priori whether multi-objectivization is beneficial or not.
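The decomposition idea of Example E13.4 can be tried out directly in code. The Python sketch below uses two hypothetical helper objectives f₁ and f₂ of our own choosing (the exact functions plotted in Figure 13.5 are not reproduced); their sum f is multi-modal, and the snippet checks for a few candidate pairs whether one Pareto-dominates the other under (f₁, f₂).

# Sketch: sum-of-parts multi-objectivization with hypothetical helper objectives.
# f(x) = f1(x) + f2(x) is optimized either directly or as the vector (f1, f2).
import math

def f1(x):
    return 0.5 * (1.0 - math.cos(6.0 * math.pi * x))   # hypothetical, multi-modal part

def f2(x):
    return x                                            # hypothetical, monotonous part

def f(x):
    return f1(x) + f2(x)

def dominates(a, b):
    """True if candidate a Pareto-dominates b under (f1, f2) (minimization)."""
    better_or_equal = f1(a) <= f1(b) and f2(a) <= f2(b)
    strictly_better = f1(a) < f1(b) or f2(a) < f2(b)
    return better_or_equal and strictly_better

if __name__ == "__main__":
    for a, b in [(0.0, 1.0 / 3.0), (0.2, 0.4), (0.1, 0.9)]:
        print(a, b, f(a), f(b), dominates(a, b), dominates(b, a))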
Tasks T13
49. Sketch one possible objective function for a single-objective optimization problem
which has infinitely many solutions. Copy this graph three times and paint six points
into each copy which represent:
(a) the possible results of a possible prematurely converged optimization process,
(b) the possible results of a possible correctly converged optimization process with
non-uniform spread (convergence),
(c) the possible results of a possible optimization process which has converged nicely.
[10 points]
50. Discuss the dilemma of exploration versus exploitation. What are the advantages and
disadvantages of these two general search principles?
[5 points]
51. Discuss why the Hill Climbing optimization algorithm (Chapter 26) is particularly
prone to premature convergence and how Simulated Annealing (Chapter 27) mitigates
the cause of this vulnerability.
[5 points]
Chapter 14
Ruggedness and Weak Causality
[Figure: four example landscapes (unimodal, multimodal, somewhat rugged, very rugged); the landscape difficulty increases from left to right.]
Figure 14.1: The landscape difficulty increases with increasing ruggedness.
14.1 The Problem: Ruggedness
Optimization algorithms generally depend on some form of gradient in the objective or fitness
space. The objective functions should be continuous and exhibit low total variation¹, so the
optimizer can descend the gradient easily. If the objective functions become more unsteady
or fluctuating, i. e., go up and down often, it becomes more complicated for the optimization
process to find the right directions to proceed to as sketched in Figure 14.1. The more
rugged a function gets, the harder it gets to optimize it. In short, one could say ruggedness
is multi-modality plus steep ascents and descents in the fitness landscape. Examples of
rugged landscapes are Kauffman’s NK fitness landscape (see Section 50.2.1), the p-Spin
model discussed in Section 50.2.2, Bergman and Feldman’s jagged fitness landscape [276],
and the sketch in Fig. 11.1.d on page 140.
14.2 One Cause: Weak Causality
During an optimization process, new points in the search space are created by the search
operations. Generally we can assume that the genotypes which are the input of the search
operations correspond to phenotypes which have previously been selected. Usually, the
better or the more promising an individual is, the higher are its chances of being selected for
further investigation. Reversing this statement suggests that individuals which are passed
to the search operations are likely to have a good fitness. Since the fitness of a candidate
solution depends on its properties, it can be assumed that the features of these individuals
¹ http://en.wikipedia.org/wiki/Total_variation [accessed 2008-04-23]
are not so bad either. It should thus be possible for the optimizer to introduce slight changes
to their properties in order to find out whether they can be improved any further². Normally,
such exploitive modifications should also lead to small changes in the objective values and
hence, in the fitness of the candidate solution.
Definition D14.1 (Strong Causality). Strong causality (locality) means that small
changes in the properties of an object also lead to small changes in its behavior [2278,
2279, 2329].
This principle (proposed by Rechenberg [2278, 2279]) should not only hold for the search
spaces and operations designed for optimization (see Equation 24.5 on page 218 in Sec-
tion 24.9), but applies to natural genomes as well. The offspring resulting from sexual
reproduction of two fish, for instance, has a different genotype than its parents. Yet, it is
far more probable that these variations manifest in a unique color pattern of the scales, for
example, instead of leading to a totally different creature.
Apart from this straightforward, informal explanation here, causality has been investi-
gated thoroughly in different fields of optimization, such as Evolution Strategy [835, 2278],
structure evolution [1760, 1761], Genetic Programming [835, 1396, 2327, 2329], genotype-
phenotype mappings [2451], search operators [835], and Evolutionary Algorithms in gen-
eral [835, 2338, 2599].
Sendhoff et al. [2451, 2452] introduce a very nice definition for causality which combines
the search space G, the (unary) search operations searchOp₁ ∈ Op, the genotype-phenotype
mapping "gpm", and the problem space X with the objective functions f. First, a distance
measure "dist caus" based on the stochastic nature of the unary search operation searchOp₁
is defined.³

dist caus(g₁, g₂) = −log( (1/P_id) · P(g₂ = searchOp₁(g₁)) )   (14.1)

P_id = P(g = searchOp₁(g))   (14.2)
In Equation 14.1, g₁ and g₂ are two elements of the search space G, P(g₂ = searchOp₁(g₁))
is the probability that g₂ is the result of one application of the search operation searchOp₁ to
g₁, and P_id is the probability that searchOp₁ maps an element to itself. dist caus(g₁, g₂)
falls with a rising probability that g₁ is mapped to g₂. Sendhoff et al. [2451, 2452] then define
strong causality as the state where Equation 14.3 holds:
strong causality as the state where Equation 14.3 holds:
∀g
1
, g
2
, g
3
∈ G : [[f (gpm(g
1
)) −f (gpm(g
2
))[[
2
< [[f (gpm(g
1
)) −f (gpm(g
3
))[[
2
dist caus(g
1
, g
2
) < dist caus(g
1
, g
3
)
P (g
2
= searchOp
1
(g
1
)) > P (g
3
= searchOp
1
(g
1
)) (14.3)
In other words, strong causality means that the search operations have a higher chance of
producing points g₂ which (map to candidate solutions which) are closer in objective space
to their inputs g₁ than of creating elements g₃ which (map to candidate solutions which) are
farther away.
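Because Equation 14.1 only involves probabilities of the unary search operation, dist caus can be estimated empirically by sampling. The Python sketch below (our own illustration; operator and parameter choices are hypothetical) does this for a mutation on bit strings which, with probability P_ID, leaves the genotype unchanged and otherwise flips one uniformly chosen bit; the identity probability is made positive on purpose so that P_id > 0 and the measure is defined (compare footnote ³).

# Sketch: empirically estimating dist_caus from Equation 14.1 for a simple
# mutation operator on bit strings. With probability P_ID the operator does
# nothing (so that P_id > 0), otherwise it flips one uniformly chosen bit.
import math
import random

P_ID = 0.5

def search_op(g):
    g = list(g)
    if random.random() >= P_ID:                   # with probability 1 - P_ID: flip one bit
        i = random.randrange(len(g))
        g[i] = 1 - g[i]
    return tuple(g)

def dist_caus(g1, g2, samples=200_000):
    hits = sum(search_op(g1) == g2 for _ in range(samples))
    p = hits / samples
    if p == 0.0:
        return math.inf                           # g2 not reachable in one step
    return -math.log(p / P_ID)

if __name__ == "__main__":
    g1 = (0, 0, 0, 0)
    print(dist_caus(g1, (1, 0, 0, 0)))            # one bit away: about 1.39
    print(dist_caus(g1, (1, 1, 0, 0)))            # two bits away: inf (unreachable in one step)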
In fitness landscapes with weak (low) causality, small changes in the candidate solutions
often lead to large changes in the objective values, i. e., ruggedness. It then becomes harder
to decide which region of the problem space to explore and the optimizer cannot find reliable
gradient information to follow. A small modification of a very bad candidate solution may
then lead to a new local optimum and the best candidate solution currently known may be
surrounded by points that are inferior to all other tested individuals.
The lower the causality of an optimization problem, the more rugged its fitness landscape
is, which leads to a degeneration of the performance of the optimizer [1563]. This does not
necessarily mean that it is impossible to find good solutions, but it may take very long to
do so.
² We have already mentioned this under the subject of exploitation.
³ This measure has the slight drawback that it is undefined for P(g₂ = searchOp₁(g₁)) = 0 or P_id = 0.
14.3 Fitness Landscape Measures
As measures for the ruggedness of a fitness landscape (or their general difficulty), many
different metrics have been proposed. Wedge and Kell [2874] and Altenberg [80] provide
nice lists of them in their work⁴, which we summarize here:
1. Weinberger [2881] introduced the autocorrelation function and the correlation length
of random walks.
2. The correlation of the search operators was used by Manderick et al. [1821] in con-
junction with the autocorrelation [2267].
3. Jones and Forrest [1467, 1468] proposed the fitness distance correlation (FDC), the
correlation of the fitness of an individual and its distance to the global optimum. This
measure has been extended by researchers such as Clergue et al. [590, 2784].
4. The probability that search operations create offspring fitter than their parents, as
defined by Rechenberg [2278] and Beyer [295] (and called evolvability by Altenberg
[77]), will be discussed in Section 16.2 on page 172 in depth.
5. Simulation dynamics have been researched by Altenberg [77] and Grefenstette [1130].
6. Another interesting metric is the fitness variance of formae (Radcliffe and Surry [2246])
and schemas (Reeves and Wright [2284]).
7. The error threshold method from theoretical biology [867, 2057] has been adopted by
Ochoa et al. [2062] for Evolutionary Algorithms. It is the "critical mutation rate be-
yond which structures obtained by the evolutionary process are destroyed by mutation
more frequently than selection can reproduce them” [2062].
8. The negative slope coefficient (NSC) by Vanneschi et al. [2785, 2786] may be considered
as an extension of Altenberg’s evolvability measure.
9. Davidor [691] uses the epistatic variance as a measure of utility of a certain represen-
tation in Genetic Algorithms. We discuss the issue of epistasis in Chapter 17.
10. The genotype-fitness correlation (GFC) of Wedge and Kell [2874] is a new measure for
ruggedness in fitness landscape and has been shown to be a good guide for determining
optimal population sizes in Genetic Programming.
11. Rana [2267] defined the Static-φ and the Basin-δ metrics based on the notion of
schemas in binary-coded search spaces.
14.3.1 Autocorrelation and Correlation Length
As an example, let us take a look at the autocorrelation function as well as the correlation
length of random walks [2881]. Here we borrow its definition from Vérel et al. [2802]:

Definition D14.2 (Autocorrelation Function). Given a random walk (xᵢ, xᵢ₊₁, . . . ),
the autocorrelation function ρ of an objective function f is the autocorrelation function
of the time series (f(xᵢ), f(xᵢ₊₁), . . . ):

ρ(k, f) = ( E[f(xᵢ)·f(xᵢ₊ₖ)] − E[f(xᵢ)]·E[f(xᵢ₊ₖ)] ) / D²[f(xᵢ)]   (14.4)

where E[f(xᵢ)] and D²[f(xᵢ)] are the expected value and the variance of f(xᵢ).
⁴ Especially the one of Wedge and Kell [2874] is beautiful and far more detailed than this summary here.
The correlation length τ = −1 / log ρ(1, f) measures how the autocorrelation function decreases
and summarizes the ruggedness of the fitness landscape: the larger the correlation length,
the lower the total variation of the landscape. From the works of Kinnear, Jr [1540] and
Lipsitch [1743], however, we also know that correlation measures do not always fully
represent the hardness of a problem landscape.
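Both quantities are easy to estimate from sampled data. The following Python sketch (our own code; the bit-string neighbourhood and the OneMax-like objective are hypothetical choices) performs a random walk, records the objective values, computes ρ(1, f) according to Equation 14.4, and derives the correlation length τ = −1/log ρ(1, f).

# Sketch: estimating the autocorrelation rho(k, f) of a random walk and the
# correlation length tau = -1 / log(rho(1, f)) from Definition D14.2.
import math
import random

def random_walk_values(f, start, neighbor, steps):
    x, values = start, []
    for _ in range(steps):
        values.append(f(x))
        x = neighbor(x)
    return values

def autocorrelation(values, k):
    n = len(values) - k
    mean_a = sum(values[:n]) / n
    mean_b = sum(values[k:k + n]) / n
    var = sum((v - mean_a) ** 2 for v in values[:n]) / n
    cov = sum((values[i] - mean_a) * (values[i + k] - mean_b) for i in range(n)) / n
    return cov / var

if __name__ == "__main__":
    # Hypothetical setup: bit strings, one-bit-flip neighbourhood, OneMax-like objective.
    def f(bits):
        return sum(bits)
    def neighbor(bits):
        bits = list(bits)
        i = random.randrange(len(bits))
        bits[i] ^= 1
        return tuple(bits)

    values = random_walk_values(f, tuple([0] * 64), neighbor, steps=20_000)
    rho1 = autocorrelation(values, 1)
    tau = -1.0 / math.log(rho1)
    print(rho1, tau)   # smooth landscape: rho(1) close to 1, large correlation length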
14.4 Countermeasures
To the knowledge of the author, no viable method which can directly mitigate the effects
of rugged fitness landscapes exists. In population-based approaches, using large population
sizes and applying methods to increase the diversity can reduce the influence of ruggedness,
but only up to a certain degree. Utilizing Lamarckian evolution [722, 2933] or the Baldwin
effect [191, 1235, 1236, 2933], i. e., incorporating a local search into the optimization pro-
cess, may further help to smoothen out the fitness landscape [1144] (see Section 36.2 and
Section 36.3, respectively).
Weak causality is often a home-made problem because it results to some extent from
the choice of the solution representation and search operations. We pointed out that explo-
ration operations are important for lowering the risk of premature convergence. Exploitation
operators are just as important for refining solutions to a certain degree. In order to
apply optimization algorithms in an efficient manner, it is necessary to find representations
which allow for iterative modifications with bounded influence on the objective values, i. e.,
exploitation. In Chapter 24, we present some further rules-of-thumb for search space and
operation design.
Tasks T14
52. Investigate the Weierstraß Function⁵. Is it unimodal, smooth, multimodal, or even
rugged?
[5 points]
53. Why is causality important in metaheuristic optimization? Why are problems with
low causality normally harder than problems with strong causality?
[5 points]
54. Define three objective functions by yourself: a uni-modal one (f_1 : ℝ → ℝ⁺), a simple
multi-modal one (f_2 : ℝ → ℝ⁺), and a rugged one (f_3 : ℝ → ℝ⁺). All three functions
should have the same global minimum (optimum) 0 with objective value 0. Try to solve
all three functions with a trivial implementation of Hill Climbing (see Algorithm 26.1 on
page 230) by applying it 30 times to each function. Note your observations regarding
the problem hardness.
[15 points]
5
http://en.wikipedia.org/wiki/Weierstrass_function [accessed 2010-09-14]
Chapter 15
Deceptiveness
15.1 Introduction
Especially annoying fitness landscapes show deceptiveness (or deceptivity). The gradient
of deceptive objective functions leads the optimizer away from the optima, as illustrated in
Figure 15.1 and Fig. 11.1.e on page 140.
[Figure 15.1 shows five example landscapes of increasing difficulty: unimodal; unimodal with neutrality; multimodal (center and limits); deceptive; very deceptive.]
Figure 15.1: Increasing landscape difficulty caused by deceptivity.
The term deceptiveness [288] is mainly used in the Genetic Algorithm¹ community in the
context of the Schema Theorem [1076, 1077]. Schemas describe certain areas (hyperplanes)
in the search space. If an optimization algorithm has discovered an area with a better
average fitness compared to other regions, it will focus on exploring this region based on
the assumption that highly fit areas are likely to contain the true optimum. Objective
functions where this is not the case are called deceptive [288, 1075, 1739]. Examples for
deceptiveness are the ND fitness landscapes outlined in Section 50.2.3, trap functions (see
Section 50.2.3.1), and the fully deceptive problems given by Goldberg et al. [743, 1076, 1083]
and [288].
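To make the idea of a deceptive objective tangible, the sketch below implements a unitation-based trap function, one common textbook formulation of the trap functions mentioned above. It is only an illustration under that assumption; the exact definition used in Section 50.2.3.1 may differ in its details, and the class and method names are ours.

/** Minimal sketch of a unitation-based trap function (subject to maximization):
 *  the slope of the "deceptive" branch points away from the global optimum at
 *  the all-ones string, so a Hill Climber is led toward the all-zeros string. */
public class TrapSketch {

  /** u = number of ones; the global optimum u = n is surrounded by the worst values. */
  static int trap(boolean[] x) {
    int n = x.length, u = 0;
    for (boolean b : x) if (b) u++;
    return (u == n) ? n : (n - 1 - u); // deceptive slope rewards fewer ones
  }

  public static void main(String[] args) {
    boolean[] x = new boolean[5];
    for (int u = 0; u <= 5; u++) {
      for (int i = 0; i < 5; i++) x[i] = i < u;
      System.out.println("u = " + u + " -> f = " + trap(x));
    }
  }
}

Running the sketch prints the values 4, 3, 2, 1, 0, 5: following the gradient from any point with u < 5 leads away from the single global optimum.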
15.2 The Problem
Definition D15.1 (Deceptiveness). An objective function f : X →R is deceptive if a Hill
Climbing algorithm (see Chapter 26 on page 229) would be steered in a direction leading
away from all global optima in large parts of the search space.
1
We are going to discuss Genetic Algorithms in Chapter 29 on page 325 and the Schema Theorem
in Section 29.5 on page 341.
For Definition D15.1, it is usually also assumed that (a) the search space is also the problem
space (G = X), (b) the genotype-phenotype mapping is the identity mapping, and (c) the
search operations do not violate the causality (see Definition D14.1 on page 162). Still,
Definition D15.1 remains a bit blurry, since the term “large parts of the search space” is
intuitive but not precise. However, it reveals the basic problem caused by deceptiveness.
If the information accumulated by an optimizer actually guides it away from the optimum,
search algorithms will perform worse than a random walk or an exhaustive enumeration
method. This issue has been known for a long time [1914, 1915, 2696, 2868] and has been
subsumed under the No Free Lunch Theorem which we will discuss in Chapter 23.
15.3 Countermeasures
Solving deceptive optimization tasks perfectly involves sampling many individuals with very
bad features and low fitness. This contradicts the basic ideas of metaheuristics and thus,
there are no efficient countermeasures against deceptivity. Using large population sizes,
maintaining a very high diversity, and utilizing linkage learning (see Section 17.3.3) are,
maybe, the only approaches which can provide at least a small chance of finding good
solutions.
A completely deceptive fitness landscape would be even worse than the needle-in-a-
haystack problems outlined in Section 16.7. Exhaustive enumeration of the search
space or searching by using random walks may be the only possible way to go in extremely
deceptive cases. However, the question arises what kind of practical problem would have such a
feature. The solution of a completely deceptive problem would be the one that, when
looking at all possible solutions, seems to be the most improbable one. However, when
actually testing it, this worst-looking point in the problem space turns out to be the best-
possible solution. Such kinds of problems are, as far as the author of this book is concerned,
rather unlikely and rare.
Tasks T15
55. Why is deceptiveness fatal for metaheuristic optimization? Why are non-deceptive
problems usually easier than deceptive ones?
[5 points]
56. Construct two different deceptive objective functions f_1, f_2 : ℝ → ℝ⁺ by yourself. Like
the three functions from Task 54 on page 165, they should have the global minimum
(optimum) 0 with objective value 0. Try to solve these functions with a trivial imple-
mentation of Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times
to both of them. Note your observations regarding the problem hardness and compare
them to your results from Task 54.
[7 points]
Chapter 16
Neutrality and Redundancy
16.1 Introduction
[Figure 16.1 shows four example landscapes of increasing difficulty: unimodal; not much gradient, distant from optimum; with some neutrality; very neutral.]
Figure 16.1: Landscape difficulty caused by neutrality.
Definition D16.1 (Neutrality). We consider the outcome of the application of a search
operation to an element of the search space as neutral if it yields no change in the objective
values [219, 2287].
It is challenging for optimization algorithms if the best candidate solution currently known
is situated on a plane of the fitness landscape, i. e., all adjacent candidate solutions have
the same objective values. As illustrated in Figure 16.1 and Fig. 11.1.f on page 140, an
optimizer then cannot find any gradient information and thus, no direction in which to
proceed in a systematic manner. From its point of view, each search operation will yield
identical individuals. Furthermore, optimization algorithms usually maintain a list of the
best individuals found, which will then overflow eventually or require pruning.
The degree of neutrality ν is defined as the fraction of neutral results among all possible
products of the search operations applied to a specific genotype [219]. We can generalize
this measure to areas G in the search space 𝔾 by averaging over all their elements. Regions
where ν is close to one are considered as neutral.

\forall g_1 \in \mathbb{G} \Rightarrow \nu(g_1) = \frac{|\{g_2 : P(g_2 = Op(g_1)) > 0 \,\wedge\, f(gpm(g_2)) = f(gpm(g_1))\}|}{|\{g_2 : P(g_2 = Op(g_1)) > 0\}|}   (16.1)

\forall G \subseteq \mathbb{G} \Rightarrow \nu(G) = \frac{1}{|G|} \sum_{g \in G} \nu(g)   (16.2)
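In practice, ν(g) from Equation 16.1 can be estimated by sampling the search operation. The sketch below does this for bit-string genotypes under an identity genotype-phenotype mapping and an objective that deliberately ignores half of the genes; these choices and all names are illustrative assumptions, not part of the book's implementation.

import java.util.Random;

/** Minimal sketch: estimate the degree of neutrality nu(g) of Equation 16.1 by
 *  sampling the unary search operation Op. Illustrative assumptions: bit-string
 *  genotypes, identity gpm, and an objective that only counts the ones in the
 *  first half of the string (so the second half is entirely neutral). */
public class NeutralityDegreeSketch {

  static double f(boolean[] x) {
    int s = 0;
    for (int i = 0; i < x.length / 2; i++) if (x[i]) s++;
    return s;
  }

  /** Apply the search operation Op: flip one randomly chosen bit. */
  static boolean[] op(boolean[] g, Random rnd) {
    boolean[] h = g.clone();
    h[rnd.nextInt(h.length)] ^= true;
    return h;
  }

  /** Sample-based estimate of nu(g): fraction of offspring with unchanged objective value. */
  static double nu(boolean[] g, int samples, Random rnd) {
    double fg = f(g);
    int neutral = 0;
    for (int i = 0; i < samples; i++)
      if (f(op(g, rnd)) == fg) neutral++;
    return neutral / (double) samples;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    boolean[] g = new boolean[32];           // all-zeros genotype
    System.out.println("estimated nu(g) = " + nu(g, 10000, rnd)); // roughly 0.5 here
  }
}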
16.2 Evolvability
Another metaphor in Global Optimization which has been borrowed from biological sys-
tems [1289] is evolvability¹ [699]. Wagner [2833, 2834] points out that this word has two
uses in biology: According to Kirschner and Gerhart [1543], a biological system is evolv-
able if it is able to generate heritable, selectable phenotypic variations. Such properties
can then be spread by natural selection and changed during the course of evolution. In its
second sense, a system is evolvable if it can acquire new characteristics via genetic change
that help the organism(s) to survive and to reproduce. Theories about how the ability
of generating adaptive variants has evolved have been proposed by Riedl [2305], Wagner
and Altenberg [78, 2835], Bonner [357], and Conrad [624], amongst others. The idea of
evolvability can be adopted for Global Optimization as follows:
Definition D16.2 (Evolvability). The evolvability of an optimization process in its cur-
rent state defines how likely the search operations will lead to candidate solutions with new
(and eventually, better) objective values.
The direct probability of success [295, 2278], i. e., the chance that search operators produce
offspring fitter than their parents, is also sometimes referred to as evolvability in the context
of Evolutionary Algorithms [77, 80].
16.3 Neutrality: Problematic and Beneficial
The link between evolvability and neutrality has been discussed by many researchers [2834,
3039]. The evolvability of neutral parts of a fitness landscape depends on the optimization
algorithm used. It is especially low for Hill Climbing and similar approaches, since the search
operations cannot directly provide improvements or even changes. The optimization process
then degenerates to a random walk, as illustrated in Fig. 11.1.f on page 140. The work of
Beaudoin et al. [242] on the ND fitness landscapes² shows that neutrality may “destroy”
useful information such as correlation.
Researchers in molecular evolution, on the other hand, found indications that the major-
ity of mutations in biology have no selective influence [971, 1315] and that the transformation
from genotypes to phenotypes is a many-to-one mapping. Wagner [2834] states that neutral-
ity in natural genomes is beneficial if it concerns only a subset of the properties peculiar to
the offspring of a candidate solution while allowing meaningful modifications of the others.
Toussaint and Igel [2719] even go as far as declaring it a necessity for self-adaptation.
The theory of punctuated equilibria³, introduced in biology by Eldredge and Gould
[875, 876], states that species experience long periods of evolutionary inactivity which are
interrupted by sudden, localized, and rapid phenotypic evolutions [185].⁴ It is assumed that
the populations explore neutral layers⁵ during the time of stasis until, suddenly, a relevant
change in a genotype leads to a better adapted phenotype [2780] which then reproduces
quickly. Similar phenomena can be observed in (and are utilized by) EAs [604, 1838].
“Uh?”, you may think, “How does this fit together?” The key to differentiating between
“good” and “bad” neutrality is its degree ν in relation to the number of possible solutions
maintained by the optimization algorithms. Smith et al. [2538] have used illustrative ex-
amples similar to Figure 16.2 showing that a certain amount of neutral reproductions can
foster the progress of optimization. In Fig. 16.2.a, basically the same scenario of premature
1
http://en.wikipedia.org/wiki/Evolvability [accessed 2007-07-03]
2
See Section 50.2.3 on page 557 for a detailed elaboration on the ND fitness landscape.
3
http://en.wikipedia.org/wiki/Punctuated_equilibrium [accessed 2008-07-01]
4
A very similar idea is utilized in the Extremal Optimization method discussed in Chapter 41.
5
Or neutral networks, as discussed in Section 16.4.
convergence as in Fig. 13.1.a on page 152 is depicted. The optimizer is drawn to a local opti-
mum from which it cannot escape anymore. Fig. 16.2.b shows that a little shot of neutrality
could form a bridge to the global optimum. The optimizer now has a chance to escape the
smaller peak if it is able to find and follow that bridge, i. e., the evolvability of the system
has increased. If this bridge gets wider, as sketched in Fig. 16.2.c, the chance of finding the
global optimum increases as well. Of course, if the bridge gets too wide, the optimization
process may end up in a scenario like in Fig. 11.1.f on page 140 where it cannot find any
direction. Furthermore, in this scenario we expect the neutral bridge to lead to somewhere
useful, which is not necessarily the case in reality.
[Figure 16.2 shows a local and a global optimum in three scenarios — Fig. 16.2.a: Premature Convergence; Fig. 16.2.b: Small Neutral Bridge; Fig. 16.2.c: Wide Neutral Bridge.]
Figure 16.2: Possible positive influence of neutrality.
Recently, the idea of utilizing the processes of molecular⁶ and evolutionary⁷ biology as a
complement to Darwinian evolution for optimization has gained interest [212]. Scientists like
Hu and Banzhaf [1286, 1287, 1289] began to study the application of metrics such as the
evolution rate of gene sequences [2973, 3021] to Evolutionary Algorithms. Here, the degree
of neutrality (synonymous vs. non-synonymous changes) seems to play an important role.
Examples for neutrality in fitness landscapes are the ND family (see Section 50.2.3),
the NKp and NKq models (discussed in Section 50.2.1.5), and the Royal Road (see Sec-
tion 50.2.4). Another common instance of neutrality is bloat in Genetic Programming,
which is outlined in ?? on page ??.
16.4 Neutral Networks
From the idea of neutral bridges between different parts of the search space as sketched by
Smith et al. [2538], we can derive the concept of neutral networks.
Definition D16.3 (Neutral Network). Neutral networks are equivalence classes K of
elements of the search space G which map to elements of the problem space X with the
same objective values and are connected by chains of applications of the search operators
Op [219].
\forall g_1, g_2 \in \mathbb{G} : g_1 \in K(g_2) \subseteq \mathbb{G} \Leftrightarrow \exists k \in \mathbb{N}_0 : P\!\left(g_2 = Op^k(g_1)\right) > 0 \,\wedge\, f(gpm(g_1)) = f(gpm(g_2))   (16.3)
6
http://en.wikipedia.org/wiki/Molecular_biology [accessed 2008-07-20]
7
http://en.wikipedia.org/wiki/Evolutionary_biology [accessed 2008-07-20]
Barnett [219] states that a neutral network has the constant innovation property if
1. the rate of discovery of innovations keeps constant for a reasonably large amount of
applications of the search operations [1314], and
2. if this rate is comparable with that of an unconstrained random walk.
Networks with this property may prove very helpful if they connect the optima in the fitness
landscape. Stewart [2611] utilizes neutral networks and the idea of punctuated equilibria
in his extrema selection, a Genetic Algorithm variant that focuses on exploring individuals
which are far away from the centroid of the set of currently investigated candidate solutions
(but still have good objective values). Then again, Barnett [218] showed that populations
in Genetic Algorithms tend to dwell in neutral networks of high dimensions of neutrality
regardless of their objective values, which (obviously) cannot be considered advantageous.
The convergence on neutral networks has furthermore been studied by Bornberg-Bauer
and Chan [361], van Nimwegen et al. [2778, 2779], and Wilke [2942]. Their results show that
the topology of neutral networks strongly determines the distribution of genotypes on them.
Generally, the genotypes are “drawn” to the solutions with the highest degree of neutrality
ν on the neutral network, as observed by Beaudoin et al. [242].
16.5 Redundancy: Problematic and Beneficial
Definition D16.4 (Redundancy). Redundancy in the context of Global Optimization is
a feature of the genotype-phenotype mapping and means that multiple genotypes map to
the same phenotype, i. e., the genotype-phenotype mapping is not injective.
\exists g_1, g_2 : g_1 \neq g_2 \,\wedge\, gpm(g_1) = gpm(g_2)   (16.4)
The role of redundancy in the genome is as controversial as that of neutrality [2880].
There exist many accounts of its positive influence on the optimization process. Ship-
man et al. [2458, 2480], for instance, tried to mimic desirable evolutionary properties of
RNA folding [1315]. They developed redundant genotype-phenotype mappings using voting
(both, via uniform redundancy and via a non-trivial approach), Turing machine-like binary
instructions, Cellular automata, and random Boolean networks [1500]. Except for the trivial
voting mechanism based on uniform redundancy, the mappings induced neutral networks
which proved beneficial for exploring the problem space. Especially the last approach pro-
vided particularly good results [2458, 2480]. Possible adverse effects like epistasis (see
Chapter 17) arising from the new genotype-phenotype mappings have not been considered
in this study.
Redundancy can have a strong impact on the explorability of the problem space. When
utilizing a one-to-one mapping, the translation of a slightly modified genotype will always
result in a different phenotype. If there exists a many-to-one mapping between genotypes
and phenotypes, the search operations can create offspring genotypes different from the
parent which still translate to the same phenotype. The optimizer may now walk along a
path through this neutral network. If many genotypes along this path can be modified to
different offspring, many new candidate solutions can be reached [2480]. One example for
beneficial redundancy is the extradimensional bypass idea discussed in Section 24.15.1.
The experiments of Shipman et al. [2479, 2481] additionally indicate that neutrality
in the genotype-phenotype mapping can have positive effects. In the Cartesian Genetic
Programming method, neutrality is explicitly introduced in order to increase the evolvability
(see ?? on page ??) [2792, 3042].
Yet, Rothlauf [2338] and Shackleton et al. [2458] show that simple uniform redundancy
is not necessarily beneficial for the optimization process and may even slow it down. Ronald
et al. [2324] point out that the optimization may be slowed down if the population of the
individuals under investigation contains many isomorphic genotypes, i. e., genotypes which
encode the same phenotype. If this isomorphism can be identified and removed, a significant
speedup may be gained [2324]. We can thus conclude that there is no use in introducing
encodings which, for instance, represent each phenotypic bit with two bits in the genotype
where 00 and 01 map to 0 and 10 and 11 map to 1. Another example for this issue is given
in Fig. 24.1.b on page 220.
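To make the wastefulness of such an encoding explicit, the sketch below implements exactly the two-bits-per-phenotypic-bit mapping mentioned above (00 and 01 map to 0, 10 and 11 map to 1). The class and method names are ours and the snippet is only an illustration of the point, not code from the book.

/** Sketch of the uniformly redundant encoding discussed above: each phenotypic
 *  bit is represented by two genotypic bits. The mapping doubles the size of
 *  the search space without adding any reachable phenotypes, which is exactly
 *  the kind of redundancy that should be avoided. */
public class UniformRedundancySketch {

  /** Genotype-phenotype mapping: the first bit of each pair decides the phenotypic bit. */
  static boolean[] gpm(boolean[] genotype) {
    boolean[] phenotype = new boolean[genotype.length / 2];
    for (int i = 0; i < phenotype.length; i++)
      phenotype[i] = genotype[2 * i]; // 00,01 -> 0 and 10,11 -> 1
    return phenotype;
  }

  public static void main(String[] args) {
    boolean[] g1 = { true, false, false, true };  // pairs 10 01
    boolean[] g2 = { true, true, false, false };  // pairs 11 00
    // g1 != g2, but both encode the phenotype (1, 0): Equation 16.4 holds.
    System.out.println(java.util.Arrays.equals(gpm(g1), gpm(g2))); // prints true
  }
}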
16.6 Summary
Different from ruggedness, which is always bad for optimization algorithms, neutrality has
aspects that may either further or hinder the process of finding good solutions. Generally,
we can state that degrees of neutrality ν very close to 1 degenerate optimization processes
to random walks. Some forms of neutral networks accompanied by low (nonzero) values of
ν can improve the evolvability and hence, increase the chance of finding good solutions.
Adverse forms of neutrality are often caused by bad design of the search space or
genotype-phenotype mapping. Uniform redundancy in the genome should be avoided where
possible and the amount of neutrality in the search space should generally be limited.
16.7 Needle-In-A-Haystack
Besides fully deceptive problems, one of the worst cases of fitness landscapes is the needle-
in-a-haystack (NIAH) problem sketched in Fig. 11.1.g on page 140, where the optimum
occurs as an isolated spike [1078] in a plane. In other words, small instances of extreme rugged-
ness combine with a general lack of information in the fitness landscape. Such problems
are extremely hard to solve and the optimization processes often will converge prematurely
or take very long to find the global optimum. An example for such fitness landscapes is
the all-or-nothing property often inherent to Genetic Programming of algorithms [2729], as
discussed in ?? on page ??.
Tasks T16
57. Describe the possible positive and negative aspects of neutrality in the fitness land-
scape.
[10 points]
58. Construct two different objective functions f_1, f_2 : ℝ → ℝ⁺ which have neutral parts
in the fitness landscape. Like the three functions from Task 54 on page 165 and the
two from Task 56 on page 169, they should have the global minimum (optimum) 0
with objective value 0. Try to solve these functions with a trivial implementation of
Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times to both of
them. Note your observations regarding the problem hardness and compare them to
your results from Task 54.
[7 points]
Chapter 17
Epistasis, Pleiotropy, and Separability
17.1 Introduction
17.1.1 Epistasis and Pleiotropy
In biology, epistasis¹ is defined as a form of interaction between different genes [2170].
The term was coined by Bateson [231] and originally meant that one gene suppresses the
phenotypical expression of another gene. In the context of statistical genetics, epistasis was
initially called “epistacy” by Fisher [928]. According to Lush [1799], the interaction between
genes is epistatic if the effect on the fitness of altering one gene depends on the allelic state
of other genes.
Definition D17.1 (Epistasis). In optimization, epistasis is the dependency of the con-
tribution of one gene to the value of the objective functions on the allelic state of other
genes. [79, 692, 2007]
We speak of minimal epistasis when every gene is independent of every other gene. Then,
the optimization process equals finding the best value for each gene and can most efficiently
be carried out by a simple greedy search (see Section 46.3.1) [692]. A problem is maximally
epistatic when no proper subset of genes is independent of any other gene [2007, 2562]. Our
understanding of epistasis and its effects comes close to another biological phenomenon:
Definition D17.2 (Pleiotropy). Pleiotropy² denotes that a single gene is responsible for
multiple phenotypical traits [1288, 2944].
Like epistasis, pleiotropy can sometimes lead to unexpected improvements but often is harm-
ful for the evolutionary system [77]. Both phenomena may easily intertwine. If one gene
epistatically influences, for instance, two others which are responsible for distinct phenotyp-
ical traits, it has both epistatic and pleiotropic effects. We will therefore consider pleiotropy
and epistasis together and when discussing the effects of the latter, also implicitly refer to
the former.
Example E17.1 (Pleiotropy, Epistasis, and a Dinosaur).
In Figure 17.1 we illustrate a fictional dinosaur along with a snippet of its fictional genome
1
http://en.wikipedia.org/wiki/Epistasis [accessed 2008-05-31]
2
http://en.wikipedia.org/wiki/Pleiotropy [accessed 2008-03-02]
[Figure 17.1 sketches a fictional genome of four genes and the traits they influence: color, length (hind legs and forelegs), length (forelegs), and shape (bone armor).]
Figure 17.1: Pleiotropy and epistasis in a dinosaur's genome.
consisting of four genes. Gene 1 influences the color of the creature and is neither pleiotropic
nor has any epistatic relations. Gene 2, however, exhibits pleiotropy since it determines the
length of the hind legs and forelegs. At the same time, it is epistatically connected with
gene 3 which too, influences the length of the forelegs – maybe preventing them from looking
exactly like the hind legs. The fourth gene is again pleiotropic by determining the shape of
the bone armor on the top of the dinosaur’s skull.
Epistasis can be considered as the higher-order or non-main effects in a model predicting
fitness values based on interactions of the genes of the genotypes from the point of view of
Design of Experiments (DOE) [2285].
17.1.2 Separability
In the area of optimizing over continuous problem spaces (such as X ⊆ ℝⁿ), epistasis and
pleiotropy are closely related to the term separability. Separability is a feature of the objective
functions of an optimization problem [2668]. A function of n variables is separable if it can
be rewritten as a sum of n functions of just one variable [1162, 2099]. Hence, the decision
variables involved in the problem can be optimized independently of each other, i. e., are
minimally epistatic, and the problem is said to be separable according to the formal definition
given by Auger et al. [147, 1181] as follows:
Definition D17.3 (Separability). A function f : X^n → ℝ (where usually X ⊆ ℝ) is
separable if and only if

\arg\min_{(x_1,..,x_n)} f(x_1, .., x_n) = \left(\arg\min_{x_1} f(x_1, ..)\,, \;..,\; \arg\min_{x_n} f(.., x_n)\right)   (17.1)

Otherwise, f(x) is called a nonseparable function.
If a function f(x) is separable, the parameters x_1, .., x_n forming the candidate solution x ∈ X
are called independent. A separable problem is decomposable, as already discussed under
the topic of epistasis.
Definition D17.4 (Nonseparability). A function f : X^n → ℝ (where usually X ⊆ ℝ)
is m-nonseparable if at most m of its parameters x_i are not independent. A nonseparable
function f(x) is called fully-nonseparable if any two of its parameters x_i are not independent.
The higher the degree of nonseparability, the more difficult a function usually becomes
for optimization [147, 1181, 2635]. Often, the term nonseparable is used in the sense
of fully-nonseparable. In other words, in between separable and fully nonseparable problems,
a variety of partially separable problems exists. Definition D17.5 is, hence, an alternative
to Definition D17.4.
Definition D17.5 (Partial Separability). An objective function f : X^n → ℝ⁺ is partially
separable if it can be expressed as

f(x) = \sum_{i=1}^{M} f_i\!\left(x_{I_{i,1}}, .., x_{I_{i,|I_i|}}\right)   (17.2)

where each of the M component functions f_i : X^{|I_i|} → ℝ depends on a subset I_i ⊆ {1..n} of
the n decision variables [612, 613, 1136, 1137].
17.1.3 Examples
Examples of problems with a high degree of epistasis are Kauffman’s NK fitness land-
scape [1499, 1501] (Section 50.2.1), the p-Spin model [86] (Section 50.2.2), and the tun-
able model by Weise et al. [2910] (Section 50.2.7). Nonseparable objective functions often
used for benchmarking are, for instance, the Griewank function [1106, 2934, 3026] (see Sec-
tion 50.3.1.11) and Rosenbrock’s Function [3026] (see Section 50.3.1.5). Partially separable
test problems are given in, for example, [2670, 2708].
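The contrast between separable and nonseparable objectives can be seen directly in code. The sketch below pairs the separable sphere function (a plain sum of one-variable terms) with Rosenbrock's function, in which every summand couples two neighboring decision variables; both follow the standard textbook forms, and the class name is ours.

/** Sketch contrasting a separable and a nonseparable objective function. */
public class SeparabilitySketch {

  /** Separable: f(x) = sum of x_i^2, so each variable can be minimized on its own. */
  static double sphere(double[] x) {
    double s = 0;
    for (double v : x) s += v * v;
    return s;
  }

  /** Nonseparable: consecutive variables interact in every summand. */
  static double rosenbrock(double[] x) {
    double s = 0;
    for (int i = 0; i < x.length - 1; i++) {
      double a = x[i + 1] - x[i] * x[i];
      double b = 1 - x[i];
      s += 100 * a * a + b * b;
    }
    return s;
  }

  public static void main(String[] args) {
    double[] x = { 0.5, -1.0, 2.0 };
    System.out.println("sphere     = " + sphere(x));
    System.out.println("rosenbrock = " + rosenbrock(x));
  }
}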
17.2 The Problem
As sketched in Figure 17.2, epistasis has a strong influence on many of the previously
discussed problematic features. If one gene can “turn off” or affect the expression of other
genes, a modification of this gene will lead to a large change in the features of the phenotype.
Hence, the causality will be weakened and ruggedness ensues in the fitness landscape. It also
becomes harder to define search operations with exploitive character. Moreover, subsequent
changes to the “deactivated” genes may have no influence on the phenotype at all, which
would then increase the degree of neutrality in the search space. Bowers [375] found that
representations and genotypes with low pleiotropy lead to better and more robust solutions.
Epistasis and pleiotropy are mainly aspects of the way in which objective functions f ∈ f ,
the genome G, and the genotype-phenotype mapping are defined and interact. They should
be avoided where possible.
[Figure 17.2 sketches high epistasis as a cause of ruggedness, multi-modality, weak causality, neutrality, and needle-in-a-haystack situations.]
Figure 17.2: The influence of epistasis on the fitness landscape.
Example E17.2 (Epistasis in Binary Genomes).
Consider the following three objective functions f_1 to f_3, all subject to maximization. All three
are based on a problem space consisting of all bit strings of length five (X_B = B^5 = {0, 1}^5).

f_1(x) = \sum_{i=0}^{4} x[i]   (17.3)

f_2(x) = (x[0] * x[1]) + (x[2] * x[3]) + x[4]   (17.4)

f_3(x) = \prod_{i=0}^{4} x[i]   (17.5)

Furthermore, let the search space be identical to the problem space, i. e., G = X. In the
optimization problem defined by X_B and f_1, the elements of the candidate solutions are
totally independent and can be optimized separately. Thus, f_1 is minimally epistatic. In f_2,
only the first two (x[0], x[1]) and second two (x[2], x[3]) genes of each genotype influence
each other. Both groups are independent from each other and from x[4]. Hence, the problem
has some epistasis. f_3 is a needle-in-a-haystack problem where all five genes directly influence
each other. A single zero in a candidate solution deactivates all other genes and the optimum
– the string containing all ones – is the only point in the search space with a non-zero objective
value.
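A direct implementation of the three objectives makes the differences easy to experiment with; the snippet below simply transcribes Equations 17.3 to 17.5 (the class name and the test genotype are ours).

/** The three maximization objectives of Example E17.2 on bit strings of length 5:
 *  f1 has minimal epistasis, f2 couples two pairs of genes, and f3 is a
 *  needle-in-a-haystack where every gene depends on all others. */
public class EpistasisExample {

  static int f1(int[] x) {                 // Equation 17.3: plain sum
    int s = 0;
    for (int b : x) s += b;
    return s;
  }

  static int f2(int[] x) {                 // Equation 17.4: two epistatic pairs
    return (x[0] * x[1]) + (x[2] * x[3]) + x[4];
  }

  static int f3(int[] x) {                 // Equation 17.5: product, non-zero only for 11111
    int p = 1;
    for (int b : x) p *= b;
    return p;
  }

  public static void main(String[] args) {
    int[] x = { 1, 0, 1, 1, 1 };
    System.out.println(f1(x) + " " + f2(x) + " " + f3(x)); // prints 4 2 0
  }
}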
Example E17.3 (Epistasis in Genetic Programming).
Epistasis can be observed especially often in Genetic Programming. In this area of Evolution-
ary Computation, the problem spaces are sets of all programs in (usually) domain-specific
languages and we try to find candidate solutions which solve a given problem “algorithmi-
cally”.
1 int gcd(int a, int b) { //skeleton code
2 int t; //skeleton code
3 while(b!=0) { // ------------------
4 t = b; //
5 b = a % b; // Evolved Code
6 a = t; //
7 } // ------------------
8 return a; //skeleton code
9 } //skeleton code
Listing 17.1: A program computing the greatest common divisor.
Let us assume that we wish to evolve a program computing the greatest common divisor of
two natural numbers a, b ∈ ℕ₁ in a C-style representation (X = C). Listing 17.1 would be
a program which fulfills this task. As search space G we could, for instance, use all trees
where the inner nodes are chosen from while, =, !=, ==, %, +, -, each accepting two child
nodes, and the leaf nodes are either a, b, t, 1, or 0. The evolved trees could be pasted by
the genotype-phenotype mapping into some skeleton code as indicated in Listing 17.1.
The genes of this representation are the nodes of the trees. A unary search operator
(mutation) could exchange a node inside a genotype with a randomly created one and a
binary search operator (crossover) could swap subtrees between two genotypes. Such a
mutation operator could, for instance, replace the “t = b” in Line 4 of Listing 17.1 with
a “t = a”. Although only a single gene of the program has been modified, its behavior
has now changed completely. Without stating any specific objective functions, it should be
clear that the new program will have very different (functional) objective values. Changing
the instruction has epistatic impact on the behavior of the rest of the program – the loop
declared in Line 3, for instance, may now become an endless loop for many inputs a and b.
Example E17.4 (Epistasis in Combinatorial Optimization: Bin Packing).
In Example E1.1 on page 25 we introduced bin packing as a typical combinatorial optimiza-
tion problem. Later, in Task 67 on page 238 we propose a solution approach to a variant of
this problem. As discussed in this task and sketched in Figure 17.3, we suggest representing
a solution to bin packing as a permutation of the n objects to be packed into the bin and
to set G = X = Π(n). In other words, a candidate solution describes the order in which
the objects have to be placed into the bin and is processed from left to right. Whenever
the current object does not fit into the current bin anymore, a new bin is “opened”. An
implementation of this idea is given in Section 57.1.2.1 and an example output of a test
program applying Hill Climbing, Simulated Annealing, a random walk, and the random
sampling algorithm is listed in Listing 57.16 on page 903.
As can be seen in that listing of results, neither Hill Climbing nor Simulated Annealing
can come very close to the known optimal solutions. One reason for this performance is
that the permutation representation for bin packing exhibits some epistasis. In Figure 17.3,
we sketch how the exchange of two genes near the beginning of a candidate solution x_1 can
lead to a new solution x_2 with a very different behavior: Due to the sequential packing of
objects into bins according to the permutation described by the candidate solutions, object
groupings way after the actually swapped objects may also be broken down. In other words,
the genes at the beginning of the representation have a strong influence on the behavior at
its end. Interestingly (and obviously), this influence is not mutual: the genes at the end of
the permutation have no influence on the genes preceding them.
Suggestions for a better encoding, an extension of the permutation encoding with so-
called separators marking the beginning and ending of a bin, have been made by Jones and
Beltramo [1460] and Stawowy [2604]. Khuri et al. [1534] suggest a method which works
completely without permutations: Each gene of a string of length n stands for one of the
n objects and its (numerical) allele denotes the bin into which the object belongs. Though
this solution allows for infeasible solutions, it does not suffer from the epistasis phenomenon
discussed in this example.
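The following sketch decodes a permutation genotype with the sequential "open a new bin when the current object no longer fits" rule described above. The book's actual implementation is in Section 57.1.2.1; this snippet, including its small illustrative data set (not the data of Figure 17.3), only demonstrates how swapping two genes near the front of the permutation changes every bin grouping that follows.

import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of the permutation decoding for bin packing described above. */
public class BinPackingDecodeSketch {

  /** Decode a permutation of object indices into bins of capacity b. */
  static List<List<Integer>> decode(int[] perm, int[] sizes, int b) {
    List<List<Integer>> bins = new ArrayList<>();
    List<Integer> bin = new ArrayList<>();
    int load = 0;
    for (int obj : perm) {
      if (load + sizes[obj] > b) {   // current object does not fit: open a new bin
        bins.add(bin);
        bin = new ArrayList<>();
        load = 0;
      }
      bin.add(obj);
      load += sizes[obj];
    }
    bins.add(bin);
    return bins;
  }

  public static void main(String[] args) {
    int[] sizes = { 3, 7, 3, 7, 3, 7, 3, 7 };   // illustrative object sizes
    int b = 10;                                  // illustrative bin capacity
    int[] x1 = { 0, 1, 2, 3, 4, 5, 6, 7 };
    int[] x2 = x1.clone();
    int t = x2[1]; x2[1] = x2[2]; x2[2] = t;     // swap the genes at positions 1 and 2
    System.out.println("x1: " + decode(x1, sizes, b)); // 4 bins: [0,1][2,3][4,5][6,7]
    System.out.println("x2: " + decode(x2, sizes, b)); // 5 bins, all later groups changed
  }
}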
[Figure 17.3 sketches 12 objects (with identifiers 0 to 11) and sizes a = (2, 3, 3, 3, 4, 5, 5, 7, 8, 9, 11, 12) which should be packed into bins of size b = 14. The first candidate solution x_1 = (10, 11, 6, 0, 2, 9, 4, 8, 5, 7, 1, 3) requires 6 bins; candidate x_2, a slightly modified version of x_1 obtained by swapping the first and fourth element of the permutation, requires 7 bins and has a very different “behavior”.]
Figure 17.3: Epistasis in the permutation-representation for bin packing.
Generally, epistasis and conflicting objectives in multi-objective optimization should be dis-
tinguished from each other. Epistasis as well as pleiotropy is a property of the influence of
the editable elements (the genes) of the genotypes on the phenotypes. Objective functions
can conflict without the involvement of any of these phenomena. We can, for example, define
two objective functions f_1(x) = x and f_2(x) = −x which are clearly contradicting regardless
of whether they both are subject to maximization or minimization. Nevertheless, if the can-
didate solutions x and the genotypes are simple real numbers and the genotype-phenotype
mapping is an identity mapping, neither epistatic nor pleiotropic effects can occur.
Naudts and Verschoren [2008] have shown for the special case of length-two binary string
genomes that deceptiveness does not occur in situations with low epistasis and also that
objective functions with high epistasis are not necessarily deceptive. [239] state that de-
ceptiveness is a special case of epistatic interaction (while we would rather say that it is
a phenomenon caused by epistasis). Another discussion about different shapes of fitness
landscapes under the influence of epistasis is given by Beerenwinkel et al. [248].
17.3 Countermeasures
We have shown that epistasis is a root cause for multiple problematic features of optimization
tasks. General countermeasures against epistasis can be divided into two groups. The
symptoms of epistasis can be mitigated with the same methods which increase the chance
of finding good solutions in the presence of ruggedness or neutrality.
17.3.1 Choice of the Representation
Epistasis itself is a feature which results from the choice of the search space structure, the
search operations, the genotype-phenotype mapping, and the structure of the problem space.
Avoiding epistatic effects should be a major concern during their design.
Example E17.5 (Bad Representation Design: Algorithm 1.1).
Algorithm 1.1 on page 40 is a good example for bad search space design. There, the
assignment of n objects to k bins in order to solve the bin packing problem is represented
as a permutation of the first n natural numbers. The genotype-phenotype mapping starts
to take numbers from the start of the genotype sequentially and places the corresponding
objects into the first bin. If the object identified by the current gene does not fill into the
bin anymore, the GPM continues with the next bin.
Representing an assignment of objects to bins like this has one drawback: if the genes at
lower loci, i. e., close to the beginning of the genotype, are modified, this may change many
subsequent assignments. If a large object is swapped from the back to an early bin, the
other objects may be shifted backwards and many of them will land in different bins than
before. The meaning of the gene at locus i in the genotype depends on the allelic states of
all genes at loci smaller than i. Hence, this representation is quite epistatic.
Interestingly, using the representation suggested in Example E1.1 on page 25 and in
Equation 1.2 directly as search space would not only save us the work of implementing a
genotype-phenotype mapping, it would also exhibit significantly lower epistasis.
Choosing the problem space and the genotype-phenotype mapping efficiently can lead to a
great improvement in the quality of the solutions produced by the optimization process [2888,
2905]. Introducing specialized search operations can achieve the same [2478].
We outline some general rules for representation design in Chapter 24. Considering these
suggestions may further improve the performance of an optimizer.
17.3.2 Parameter Tweaking
Using larger populations and favoring explorative search operations seems to be helpful in
epistatic problems since this should increase the diversity. Shinkai et al. [2478] showed that
applying (a) higher selection pressure, i. e., increasing the chance to pick the best candidate
solutions for further investigation instead of the weaker ones, and (b) extinctive selection,
i. e., only working with the newest produced set of candidate solutions while discarding
the ones from which they were derived, can also increase the reliability of an optimizer to
find good solutions. These two concepts are slightly contradicting, so careful adjustment of
the algorithm settings appears to be vital in epistatic environments. Shinkai et al. [2478]
observed that higher selection pressure also leads to earlier convergence, a fact that we pointed
out in Chapter 13.
17.3.3 Linkage Learning
According to Winter et al. [2960], linkage is “the tendency for alleles of different genes to
be passed together from one generation to the next” in genetics. This usually indicates that
these genes are closely located in the same chromosome. In the context of Evolutionary
Algorithms, this notation is not useful since identifying spatially close elements inside the
genotypes g ∈ G is trivial. Instead, we are interested in alleles of different genes which have
a joint effect on the fitness [1980, 1981].
Identifying these linked genes, i. e., learning their epistatic interaction, is very helpful
for the optimization process. Such knowledge can be used to protect building blocks³ from
being destroyed by the search operations (such as crossover in Genetic Algorithms), for
instance. Finding approaches for linkage learning has become an especially popular discipline
3
See Section 29.5.5 for information on the Building Block Hypothesis.
in the area of Evolutionary Algorithms with binary [552, 1196, 1981] and real [754] genomes.
Two important methods from this area are the messy GA (mGA, see Section 29.6) by
Goldberg et al. [1083] and the Bayesian Optimization Algorithm (BOA) [483, 2151]. Module
acquisition [108] may be considered as such an effort.
17.3.4 Cooperative Coevolution
Variable Interaction Learning (VIL) techniques can be used to discover the relations be-
tween decision variables, as introduced by Chen et al. [551] and discussed in the context of
cooperative coevolution in Section 21.2.3 on page 204.
Tasks T17
59. What is the difference between epistasis and pleiotropy?
[5 points]
60. Why does epistasis cause ruggedness in the fitness landscape?
[5 points]
61. Why can epistasis cause neutrality in the fitness landscape?
[5 points]
62. Why does pleiotropy cause ruggedness in the fitness landscape?
[5 points]
63. How are epistasis and separability related with each other?
[5 points]
Chapter 18
Noise and Robustness
18.1 Introduction – Noise
In the context of optimization, three types of noise can be distinguished. The first form is
noise in the training data used as basis for learning or the objective functions [3019] (i). In
many applications of machine learning or optimization where a model for a given system is
to be learned, data samples including the input of the system and its measured response are
used for training.
Example E18.1 (Noise in the Training Data).
Some typical examples of situations where training data is the basis for the objective function
evaluation are
1. the usage of Global Optimization for building classifiers (for example for predicting
buying behavior using data gathered in a customer survey for training),
2. the usage of simulations for determining the objective values in Genetic Programming
(here, the simulated scenarios correspond to training cases), and
3. the fitting of mathematical functions to (x, y)-data samples (with artificial neural
networks or symbolic regression, for instance).
Since no measurement device is 100% accurate and there are always random errors, noise is
present in such optimization problems.
Besides inexactnesses and fluctuations in the input data of the optimization process, pertur-
bations are also likely to occur during the application of its results. This category subsumes
the other two types of noise: perturbations that may arise from (ii) inaccuracies in the pro-
cess of realizing the solutions and (iii) environmentally induced perturbations during the
applications of the products.
Example E18.2 (Creating the perfect Tire).
This issue can be illustrated by using the process of developing the perfect tire for a car
as an example. As input for the optimizer, all sorts of material coefficients and geometric
constants measured from all known types of wheels and rubber could be available. Since
these constants have been measured or calculated from measurements, they include a certain
degree of noise and imprecision (i).
The result of the optimization process will be the best tire construction plan discovered
during its course and it will likely incorporate different materials and structures. We would
hope that the tires created according to the plan will not fall apart if, accidentally, an extra
0.0001% of a specific rubber component is used (ii). During the optimization process,
the behavior of many construction plans will be simulated in order to find out about their
utility. When actually manufactured, the tires should not behave unexpectedly when used in
scenarios different from those simulated (iii) and should instead be applicable in all driving
situations likely to occur.
The effects of noise in optimization have been studied by various researchers; Miller and
Goldberg [1897, 1898], Lee and Wong [1703], and Gurin and Rastrigin [1155] are some of
them. Many Global Optimization algorithms and theoretical results have been proposed
which can deal with noise. Some of them are, for instance, specialized
1. Genetic Algorithms [932, 1544, 2389, 2390, 2733, 2734],
2. Evolution Strategies [168, 294, 1172], and
3. Particle Swarm Optimization [1175, 2119] approaches.
18.2 The Problem: Need for Robustness
The goal of Global Optimization is to find the global optima of the objective functions. While
this is fully true from a theoretical point of view, it may not suffice in practice. Optimization
problems are normally used to find good parameters or designs for components or plans to
be put into action by human beings or machines. As we have already pointed out, there will
always be noise and perturbations in practical realizations of the results of optimization.
There is no process in the world that is 100% accurate and the optimized parameters,
designs, and plans have to tolerate a certain degree of imprecision.
Definition D18.1 (Robustness). A system in engineering or biology is robust if it is able
to function properly in the face of genetic or environmental perturbations [1288, 2833].
Therefore, a local optimum (or even a non-optimal element) for which slight disturbances
only lead to gentle performance degenerations is usually favored over a global optimum
located in a highly rugged area of the fitness landscape [395]. In other words, local optima in
regions of the fitness landscape with strong causality are sometimes better than global optima
with weak causality. Of course, the level of this acceptability is application-dependent.
Figure 18.1 illustrates the issue of local optima which are robust vs. global optima which
are not. Table 18.1 gives some examples on problems which require robust optimization.
Example E18.3 (Critical Robustness).
When optimizing the control parameters of an airplane or a nuclear power plant, the global
optimum is certainly not used if a slight perturbation can have hazardous effects on the
system [2734].
Example E18.4 (Robustness in Manufacturing).
Wiesmann et al. [2936, 2937] bring up the topic of manufacturing tolerances in multilayer
optical coatings. It is no use to find optimal configurations if they only perform optimal
when manufactured to a precision which is either impossible or too hard to achieve on a
constant basis.
Example E18.5 (Dynamic Robustness in Road-Salting).
The optimization of the decision process on which roads should be precautionary salted for
areas with marginal winter climate is an example of the need for dynamic robustness. The
global optimum of this problem is likely to depend on the daily (or even current) weather
forecast and may therefore be constantly changing. Handa et al. [1177] point out that it is
practically infeasible to let road workers follow a constantly changing plan and circumvent
this problem by incorporating multiple road temperature settings in the objective function
evaluation.
Example E18.6 (Robustness in Nature).
Tsutsui et al. [2733, 2734] found a nice analogy in nature: The phenotypic characteristics
of an individual are described by its genetic code. During the interpretation of this code,
perturbations like abnormal temperature, nutritional imbalances, injuries, illnesses and so
on may occur. If the phenotypic features emerging under these influences have low fitness,
the organism cannot survive and procreate. Thus, even a species with good genetic material
will die out if its phenotypic features become too sensitive to perturbations. Species robust
against them, on the other hand, will survive and evolve.
[Figure 18.1 plots f(x) over X and marks a robust local optimum next to the unstable global optimum.]
Figure 18.1: A robust local optimum vs. an “unstable” global optimum.
Table 18.1: Applications and Examples of robust methods.

Area                                 References
Combinatorial Problems               [644, 1177]
Distributed Algorithms and Systems   [52, 1401, 2898]
Engineering                          [52, 644, 1134, 1135, 2936, 2937]
Graph Theory                         [52, 63, 67, 644, 1401]
Image Synthesis                      [375]
Logistics                            [1177]
Mathematics                          [2898]
Motor Control                        [489]
Multi-Agent Systems                  [2925]
Software                             [1401]
Wireless Communication               [63, 67, 644]
18.3 Countermeasures
For the special case where the phenome is a real vector space (X ⊆ ℝⁿ), several approaches
for dealing with the need for robustness have been developed. Inspired by Taguchi meth-
ods¹ [2647], possible disturbances are represented by a vector δ = (δ_1, δ_2, .., δ_n)^T, δ_i ∈ ℝ
in the method suggested by Greiner [1134, 1135]. If the distributions and influences of the δ_i
are known, the objective function f(x) : x ∈ X can be rewritten as f̃(x, δ) [2937]. In the
special case where δ is normally distributed, this can be simplified to
f̃((x_1 + δ_1, x_2 + δ_2, .., x_n + δ_n)^T). It would then make sense to sample the probability
distribution of δ a number of t times and to use the mean values of f̃(x, δ) for each objective
function evaluation during the optimization process. In cases where the optimal value y⋆
of the objective function f is known, Equation 18.1 can be minimized. This approach is
also used in the work of Wiesmann et al. [2936, 2937] and basically turns the optimization
algorithm into something like a maximum likelihood estimator (see Section 53.6.2 and Equa-
tion 53.257 on page 695).

f'(x) = \frac{1}{t} \sum_{i=1}^{t} \left( y^{\star} - \tilde{f}(x, \delta_i) \right)^2   (18.1)
This method corresponds to using multiple, different training scenarios during the objective
function evaluation in situations where X ⊈ ℝⁿ. By adding random noise and artificial
perturbations to the training cases, the chance of obtaining robust solutions which are
stable when applied or realized under noisy conditions can be increased.
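The sampling idea of Equation 18.1 can be sketched in a few lines of code. The Gaussian perturbation model, the inner sphere objective, and all names below are illustrative assumptions; the snippet only demonstrates the averaging over t disturbed evaluations.

import java.util.Random;

/** Minimal sketch of the sampling-based robust evaluation of Equation 18.1. */
public class RobustEvaluationSketch {

  /** The undisturbed objective; here a simple sphere function for illustration. */
  static double f(double[] x) {
    double s = 0;
    for (double v : x) s += v * v;
    return s;
  }

  /** f'(x) = (1/t) * sum_{i=1..t} (y* - f~(x, delta_i))^2 with delta_i ~ N(0, sigma^2). */
  static double fRobust(double[] x, double yStar, int t, double sigma, Random rnd) {
    double sum = 0;
    for (int i = 0; i < t; i++) {
      double[] disturbed = new double[x.length];
      for (int j = 0; j < x.length; j++)
        disturbed[j] = x[j] + sigma * rnd.nextGaussian(); // x_j + delta_j
      double d = yStar - f(disturbed);
      sum += d * d;
    }
    return sum / t;
  }

  public static void main(String[] args) {
    Random rnd = new Random(7);
    double[] x = { 0.1, -0.2, 0.05 };
    System.out.println("f'(x) = " + fRobust(x, 0.0, 100, 0.01, rnd));
  }
}

Minimizing fRobust favors candidate solutions whose disturbed objective values stay close to the known optimum y⋆, i. e., robust solutions.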
1
http://en.wikipedia.org/wiki/Taguchi_methods [accessed 2008-07-19]
Chapter 19
Overfitting and Oversimplification
In all scenarios where optimizers evaluate some of the objective values of the candidate
solutions by using training data, two additional phenomena with negative influence can be
observed: overfitting and (what we call) oversimplification. This is especially true if the
solutions represent functional dependencies inside the training data S_t.
Example E19.1 (Function Approximation).
Let us assume that S_t ⊆ S is the training data sample from a space S of possible data.
Let us assume in the following that this space is the cross product S = A × B, where B
contains the values of the variables whose dependencies on the input variables from A
should be learned. The structure of each single element s[i] of the training data sample
S_t = (s[0], s[1], .., s[n−1]) is s[i] = (a[i], b[i]) : a[i] ∈ A, b[i] ∈ B. The candidate solutions would
actually be functions x : A → B and the goal is to find the optimal function x⋆ which exactly
represents the (previously unknown) dependencies between A and B as they occur in S_t,
i. e., a function with x⋆(a[i]) ≈ b[i] ∀i ∈ 1..n.

Artificial neural networks, (normal) regression, and symbolic regression are the most
common techniques to solve such problems, but other techniques such as Particle Swarm
Optimization, Evolution Strategies, and Differential Evolution can be used too. Then, the
candidate solutions may, for instance, be polynomials x(a) = x_0 + x_1 a^1 + x_2 a^2 + . . . or
otherwise structured functions and only the coefficients x_0, x_1, x_2, . . . would be optimized.
When trying to find good candidate solutions x : A → B, possible objective functions (subject
to minimization) are f_E(x) = \sum_{i=0}^{n-1} |x(a[i]) - b[i]| or f_{SE}(x) = \sum_{i=0}^{n-1} (x(a[i]) - b[i])^2.
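The two error objectives f_E and f_SE can be written down directly for the polynomial representation mentioned above; the sketch below does exactly that. The class name, the helper method, and the tiny data set are illustrative assumptions.

/** Minimal sketch of the two error objectives from Example E19.1 for a polynomial
 *  candidate solution x(a) = x0 + x1*a + x2*a^2 + ... over (a[i], b[i]) samples. */
public class TrainingErrorSketch {

  /** Evaluate the candidate polynomial with coefficients c at input a. */
  static double x(double[] c, double a) {
    double result = 0, power = 1;
    for (double coeff : c) {
      result += coeff * power;
      power *= a;
    }
    return result;
  }

  /** f_E: sum of absolute errors over the training samples. */
  static double fE(double[] c, double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) s += Math.abs(x(c, a[i]) - b[i]);
    return s;
  }

  /** f_SE: sum of squared errors over the training samples. */
  static double fSE(double[] c, double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      double d = x(c, a[i]) - b[i];
      s += d * d;
    }
    return s;
  }

  public static void main(String[] args) {
    double[] a = { 0, 1, 2 }, b = { 1, 3, 5 };   // samples of b = 2a + 1
    double[] c = { 1, 2 };                        // candidate x(a) = 1 + 2a
    System.out.println(fE(c, a, b) + " " + fSE(c, a, b)); // prints 0.0 0.0
  }
}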
19.1 Overfitting
19.1.1 The Problem
Overfitting¹ is the emergence of an overly complicated candidate solution in an optimization
process resulting from the effort to provide the best results for as much of the available
training data as possible [792, 1040, 2397, 2531].
Definition D19.1 (Overfitting). A candidate solution x̀ ∈ X optimized based on a set
of training data S_t is considered to be overfitted if a less complicated, alternative solution
x⋆ ∈ X exists which has a smaller (or equal) error for the set of all possible (maybe even
infinitely many), available, or (theoretically) producible data samples S ∈ S.
1
http://en.wikipedia.org/wiki/Overfitting [accessed 2007-07-03]
Hence, overfitting also stands for the creation of solutions which are not robust. Notice
that x⋆ may, however, have a larger error in the training data S_t than x̀. The phenomenon
of overfitting is best known and can often be encountered in the field of artificial neural
networks or in curve fitting² [1697, 1742, 2334, 2396, 2684] as described in Example E19.1.
Example E19.2 (Example E19.1 Cont.).
There exists exactly one polynomial³ of the degree n − 1 that fits to each such training
data and goes through all its points.⁴ Hence, when only polynomial regression is performed,
there is exactly one perfectly fitting function of minimal degree. Nevertheless, there will
also be an infinite number of polynomials with a higher degree than n − 1 that also match
the sample data perfectly. Such results would be considered as overfitted.
In Figure 19.1, we have sketched this problem. The function x⋆(a) = b for A = B = ℝ
shown in Fig. 19.1.b has been sampled three times, as sketched in Fig. 19.1.a. There exists
no other polynomial of a degree of two or less that fits to these samples than x⋆. Optimizers,
however, could also find overfitted polynomials of a higher degree such as x̀ in Fig. 19.1.c
which also match the data. Here, x̀ plays the role of the overly complicated solution which
will perform as good as the simpler model x⋆ when tested with the training sets only, but
will fail to deliver good results for all other input data.
[Figure 19.1 shows three plots of b over a — Fig. 19.1.a: Three sample points of a linear function; Fig. 19.1.b: The optimal solution x⋆; Fig. 19.1.c: The overly complex variant x̀.]
Figure 19.1: Overfitting due to complexity.
A very common cause for overfitting is noise in the sample data. As we have already pointed
out, there exists no measurement device for physical processes which delivers perfect results
without error. Surveys that represent the opinions of people on a certain topic or randomized
simulations will exhibit variations from the true interdependencies of the observed entities,
too. Hence, data samples based on measurements will always contain some noise.
Example E19.3 (Overfitting caused by Noise).
In Figure 19.2 we have sketched how such noise may lead to overfitted results. Fig. 19.2.a
illustrates a simple physical process obeying some quadratic equation. This process has been
measured using some technical equipment and the n = 100 = |S_t| noisy samples depicted in
Fig. 19.2.b have been obtained. Fig. 19.2.c shows a function x̀ resulting from optimization
that fits the data perfectly. It could, for instance, be a polynomial of degree 99 which goes
right through all the points and thus, has an error of zero. Hence, both f_E(x̀) and f_SE(x̀)
from Example E19.1 would be zero. Although being a perfect match to the measurements,
this complicated solution does not accurately represent the physical law that produced the
sample data and will not deliver precise results for new, different inputs.
2
We will discuss overfitting in conjunction with Genetic Programming-based symbolic regression in Sec-
tion 49.1 on page 531.
3
http://en.wikipedia.org/wiki/Polynomial [accessed 2007-07-03]
4
http://en.wikipedia.org/wiki/Polynomial_interpolation [accessed 2008-03-01]
[Figure 19.2 shows three plots of b over a — Fig. 19.2.a: The original physical process; Fig. 19.2.b: The measured training data; Fig. 19.2.c: The overfitted result x̀.]
Figure 19.2: Fitting noise.
From the examples we can see that the major problem that results from overfitted solutions
is the loss of generality.

Definition D19.2 (Generality). A solution of an optimization process is general if it is
not only valid for the sample inputs s[0], s[1], .., s[n−1] ∈ S_t which were used for training
during the optimization process, but also for different inputs s ∉ S_t if such inputs exist.
19.1.2 Countermeasures
There exist multiple techniques which can be utilized in order to prevent overfitting to a
certain degree. It is most efficient to apply multiple such techniques together in order to
achieve best results.
19.1.2.1 Restricting the Problem Space
A very simple approach is to restrict the problem space X in a way that only solutions up
to a given maximum complexity can be found. In terms of polynomial function fitting, this
could mean limiting the maximum degree of the polynomials to be tested, for example.
19.1.2.2 Complement Optimization Criteria
Furthermore, the functional objective functions (such as those defined in Example E19.1)
which solely concentrate on the error of the candidate solutions should be augmented by
penalty terms and non-functional criteria which favor small and simple models [792, 1510].
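One way to realize such an augmented objective is sketched below: the squared training error is combined with a complexity penalty, here simply the number of non-zero polynomial coefficients weighted by a tuning parameter lambda. Both the complexity measure and the weight are illustrative assumptions, not a prescription from the book.

/** Sketch of Section 19.1.2.2: error on the training data plus a complexity penalty. */
public class PenalizedObjectiveSketch {

  static double squaredError(double[] c, double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      double value = 0, power = 1;
      for (double coeff : c) { value += coeff * power; power *= a[i]; }
      double d = value - b[i];
      s += d * d;
    }
    return s;
  }

  /** Penalized objective: f(c) = squared error + lambda * number of non-zero coefficients. */
  static double fPenalized(double[] c, double[] a, double[] b, double lambda) {
    int complexity = 0;
    for (double coeff : c) if (coeff != 0.0) complexity++;
    return squaredError(c, a, b) + lambda * complexity;
  }

  public static void main(String[] args) {
    double[] a = { 0, 1, 2 }, b = { 1, 3, 5 };
    double[] simple = { 1, 2 };            // x(a) = 1 + 2a, fits exactly
    double[] complex = { 1, 2, 0, 1e-9 };  // needlessly high degree
    System.out.println(fPenalized(simple, a, b, 0.1));   // smaller penalized value
    System.out.println(fPenalized(complex, a, b, 0.1));  // larger, despite equal fit
  }
}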
19.1.2.3 Using much Training Data
Large sets of sample data, although slowing down the optimization process, may improve the
generalization capabilities of the derived solutions. If arbitrarily many training datasets or
training scenarios can be generated, there are two approaches which work against overfitting:
1. The first method is to use a new set of (randomized) scenarios for each evaluation
of each candidate solution. The resulting objective values then may differ largely
even if the same individual is evaluated twice in a row, introducing incoherence and
ruggedness into the fitness landscape.
2. At the beginning of each iteration of the optimizer, a new set of (randomized) scenarios
is generated which is used for all individual evaluations during that iteration. This
method leads to objective values which can be compared without bias. They can be
made even more comparable if the objective functions are always normalized into some
fixed interval, say [0, 1].
In both cases it is helpful to use more than one training sample or scenario per evaluation
and to set the resulting objective value to the average (or better median) of the outcomes.
Otherwise, the fluctuations of the objective values between the iterations will be very large,
making it hard for the optimizers to follow a stable gradient for multiple steps.
19.1.2.4 Limiting Runtime
Another simple method to prevent overfitting is to limit the runtime of the optimizers [2397].
It is commonly assumed that learning processes normally first find relatively general solutions
which subsequently begin to overfit because the noise “is learned”, too.
19.1.2.5 Decay of Learning Rate
For the same reason, some algorithms allow the rate at which the candidate solutions are modified to be decreased over time. Such a decay of the learning rate makes overfitting less likely.
19.1.2.6 Dividing Data into Training and Test Sets
If only one finite set of data samples is available for training/optimization, it is common practice to separate it into a set of training data S_t and a set of test cases S_c. During the optimization process, only the training data is used. The resulting solutions x are tested with the test cases afterwards. If their behavior is significantly worse when applied to S_c than when applied to S_t, they are probably overfitted.
The same approach can be used to detect when the optimization process should be
stopped. The best known candidate solutions can be checked with the test cases in each
iteration without influencing their objective values which solely depend on the training data.
If their performance on the test cases begins to decrease, there are no benefits in letting the
optimization process continue any further.
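A minimal sketch of this practice (hypothetical helper names; the split ratio and the patience threshold are assumptions) could look as follows:

import random

def split_data(samples, train_fraction=0.7, seed=1):
    # separate one finite sample set into training data S_t and test cases S_c
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def should_stop(test_errors, patience=10):
    # stop the optimization when the error of the best-known solution on S_c
    # has not improved for 'patience' iterations
    if len(test_errors) <= patience:
        return False
    return min(test_errors) not in test_errors[-patience:]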
19.2 Oversimplification
19.2.1 The Problem
With the term oversimplification (or overgeneralization) we refer to the opposite of overfit-
ting. Whereas overfitting denotes the emergence of overly complicated candidate solutions,
oversimplified solutions are not complicated enough. Although they represent the training
samples used during the optimization process seemingly well, they are rough overgeneral-
izations which fail to provide good results for cases not part of the training.
A common cause for oversimplification is sketched in Figure 19.3: The training sets only
represent a fraction of the possible inputs. As this is normally the case, one should always
be aware that such an incomplete coverage may fail to represent some of the dependencies
and characteristics of the data, which then may lead to oversimplified solutions. Another
possible reason for oversimplification is that ruggedness, deceptiveness, too much neutrality,
or high epistasis in the fitness landscape may lead to premature convergence and prevent
the optimizer from surpassing a certain quality of the candidate solutions. It then cannot
adapt them completely even if the training data perfectly represents the sampled process. A
third possible cause is that a problem space could have been chosen which does not include
the correct solution.
Example E19.4 (Oversimplified Polynomial).
Fig. 19.3.a shows a cubic function. Since it is a polynomial of degree three, four sample
points are needed for its unique identification. Maybe not knowing this, only three samples
have been provided in Fig. 19.3.b. By doing so, some vital characteristics of the function
are lost. Fig. 19.3.c depicts a quadratic function – the polynomial of the lowest degree that fits these samples exactly. Although it is a perfect match, this function does not touch any other point on the original cubic curve and behaves totally differently in the lower parameter range.
However, even if we had included point P in our training data, it is still possible that the
optimization process could yield Fig. 19.3.c as a result. Having training data that correctly
represents the sampled system does not mean that the optimizer is able to find a correct
solution with perfect fitness – the other, previously discussed problematic phenomena can
prevent it from doing so. Furthermore, if it was not known that the system which was to be
modeled by the optimization process can best be represented by a polynomial of the third
degree, one could have limited the problem space X to polynomials of degree two and less.
Then, the result would likely again be something like Fig. 19.3.c, regardless of how many
training samples are used.
Figure 19.3: Oversimplification. (Fig. 19.3.a: the “real system” and the points describing it, including the point P; Fig. 19.3.b: the sampled training data; Fig. 19.3.c: the oversimplified result.)
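The effect can be reproduced with a few lines of Python (the concrete cubic polynomial and the sample positions are arbitrary assumptions for illustration):

def cubic(x):
    # the hypothetical "real system": a polynomial of degree three
    return x**3 - 2*x**2 + x + 1

def interpolate(points):
    # Lagrange interpolation: the polynomial of lowest degree through the points
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

samples = [(0.0, cubic(0.0)), (1.0, cubic(1.0)), (2.0, cubic(2.0))]  # only three samples
quad = interpolate(samples)      # a quadratic that fits the three samples exactly
print(cubic(3.0), quad(3.0))     # 13.0 vs. 7.0: the fit diverges away from the samples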
19.2.2 Countermeasures
In order to counter oversimplification, its causes have to be mitigated.
19.2.2.1 Using many Training Cases
It is not always possible to have training scenarios which cover the complete input space A. If a program which takes a 32-bit integer number as input is synthesized with Genetic Programming, we would need 2^32 = 4 294 967 296 samples to cover all inputs. By using multiple scenarios for each individual evaluation, the chance of missing important aspects is decreased.
can be replaced with new, randomly created ones in each generation, in order to decrease
this chance even more.
19.2.2.2 Using a larger Problem Space
The problem space, i. e., the representation of the candidate solutions, should further be
chosen in a way which allows constructing a correct solution to the problem defined. Then again, relaxing too many constraints on the solution structure increases the risk of overfitting and thus, proceeding carefully is recommended.
Chapter 20
Dimensionality (Objective Functions)
20.1 Introduction
The most common way to define optima in multi-objective problems (MOPs, see Definition D3.6 on page 55) is to use the Pareto domination relation discussed in Section 3.3.5 on page 65. A candidate solution x_1 ∈ X is said to dominate another candidate solution x_2 ∈ X (x_1 ⊣ x_2) if and only if its corresponding vector of objective values f(x_1) is (partially) less than the one of x_2, i. e., ∀i ∈ 1..n : f_i(x_1) ≤ f_i(x_2) and ∃i ∈ 1..n : f_i(x_1) < f_i(x_2) (in minimization problems, see Definition D3.13). The domination rank #dom(x, X) of a candidate solution x ∈ X is the number of elements in a reference set X ⊆ X it is dominated by (Definition D3.16), a non-dominated element (relative to X) has a domination rank of #dom(x, X) = 0, and the solutions considered to be global optima are non-dominated for the whole problem space X (#dom(x, X) = 0, Definition D3.14). These are the elements we would like to find with optimization or, in the case of probabilistic metaheuristics, at least wish to approximate as closely as possible.
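To make the relation concrete, here is a minimal Python sketch (assuming minimization and objective vectors given as equal-length sequences) of the domination test and the domination rank:

def dominates(f1, f2):
    # f1 Pareto-dominates f2 (minimization): no worse in every objective,
    # strictly better in at least one
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def domination_rank(f, reference):
    # number of objective vectors in the reference set that dominate f;
    # non-dominated elements have rank 0
    return sum(1 for g in reference if dominates(g, f))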
Definition D20.1 (Dimension of MOP). We refer to the number n = |f| of objective functions f_1, f_2, .., f_n ∈ f of an optimization problem as its dimension (or dimensionality).
Many studies in the literature consider mainly bi-objective MOPs [1237, 1530]. As a consequence, many algorithms are designed to deal with this kind of problem. However, MOPs
having a higher number of objective functions are common in practice – sometimes the
number of objectives reaches double figures [599] – leading to the so-called many-objective
optimization [2228, 2915]. This term has been coined by the Operations Research commu-
nity to denote problems with more than two or three objective functions [894, 3099]. It is
currently a hot research topic with the first conference on evolutionary multi-objective opti-
mization held in 2001 [3099]. Table 20.1 gives some examples for many-objective situations.
Table 20.1: Applications and Examples of Many-Objective Optimization.
Area References
Combinatorial Problems [1417, 1437]
Computer Graphics [1002, 1577, 1893, 2060]
Data Mining [2213]
Decision Making [1893]
Software [2213]
20.2 The Problem: Many-Objective Optimization
When the dimension of the MOPs increases, the majority of candidate solutions are non-
dominated. As a consequence, traditional nature-inspired algorithms have to be redesigned:
According to Hughes [1303] and on the basis of the results provided by Purshouse and Fleming [2231], it can be assumed that purely Pareto-based optimization approaches (maybe with added diversity preservation methods) will not perform well in problems with four or more
objectives. Results from the application of such algorithms (NSGA-II [749], SPEA-2 [3100],
PESA [639]) to two or three objectives cannot simply be extrapolated to higher numbers
of optimization criteria [1530]. Kang et al. [1491] consider Pareto optimality as unfair
and imperfect in many-objective problems. Hughes [1303] indicates that the following two
hypotheses may hold:
1. An optimizer that produces an entire Pareto set in one run is better than generating the
Pareto set through many single-objective optimizations using an aggregation approach
such as weighted min-max (see Section 3.3.4) if the number of objective function
evaluations is fixed.
2. Optimizers that use Pareto ranking based methods to sort the population will be very
effective for small numbers of objectives, but not perform as effectively for many-
objective optimization in comparison with methods based on other approaches.
The results of Jaszkiewicz [1437] and Ishibuchi et al. [1418] further demonstrate the degener-
ation of the performance of the traditional multi-objective metaheuristics in many-objective
problems in comparison with single-objective approaches. Ikeda et al. [1400] show that vari-
ous elements, which are far from the true Pareto frontier, may survive as hardly-dominated solutions, which leads to a decrease in the probability of producing new candidate solutions dominating existing ones. This phenomenon is called dominance resistance. Deb et al. [750]
recognize this problem of redundant solutions and incorporate an example function provok-
ing it into their test function suite for real-valued, multi-objective optimization.
In addition to these algorithm-side limitations, Bouyssou [374] suggests that, based on, for instance, the limited capabilities of the human mind [1900], an operator will not be able
to make efficient decisions if more than a dozen objectives are involved. Visualizing the
solutions in a human-understandable way becomes more complex with the rising number of
dimensions, too [1420].
Example E20.1 (Non-Dominated Elements in Random Samples).
Deb [740] showed that the number of non-dominated elements in random samples increases
quickly with the dimension. The hyper-surface of the Pareto frontier may increase exponen-
tially with the number of objective functions [1420]. Like Ishibuchi et al. [1420], we want to
illustrate this issue with an experiment.
Imagine that a population-based optimization approach was used to solve a many-
objective problem with n = [f [ objective functions. At startup, the algorithm will fill
the initial population pop with ps randomly created individuals p. If we assume the problem
space X to be continuous and the creation process sufficiently random, no two individuals
in pop will be alike. If the objective functions are locally invertible and sufficiently indepen-
dent, we can assume that there will be no two candidate solutions in pop for which any of the objectives takes on the same value.
Figure 20.1: Approximated distribution of the domination rank in random samples for several population sizes. (Panels: Fig. 20.1.a ps = 5, Fig. 20.1.b ps = 10, Fig. 20.1.c ps = 50, Fig. 20.1.d ps = 100, Fig. 20.1.e ps = 500, Fig. 20.1.f ps = 1000, Fig. 20.1.g ps = 3000; each panel plots P(#dom = k | ps, n) over the rank k and the number of objectives n.)
The distribution P(#dom = k | ps, n) of the probability that the domination rank #dom of a randomly selected individual from the initial, randomly created population is exactly k ∈ 0..ps−1 depends on the population size ps and the number of objective functions n. We have approximated this probability distribution with experiments in which we determined the domination ranks of ps n-dimensional vectors whose elements are drawn from a uniform distribution, for several values of n and ps spanning from n = 2 to n = 50 and from ps = 3 to ps = 3600. (For n = 1, no experiments were performed since there, each domination rank from 0 to ps−1 occurs exactly once in the population.)
Figure 20.1 and Figure 20.2 illustrate the results of these experiments, the arithmetic means of 100 000 runs for each configuration. In Figure 20.1, the approximated distribution of P(#dom = k | ps, n) is illustrated for various n. The value of P(#dom = 0 | ps, n) quickly approaches one for rising n. We limited the z-axis to 0.3 so the behavior of the other ranks k > 0, which quickly becomes negligible, can still be seen. For fixed n and ps, P(#dom = k | ps, n) drops rapidly with k, roughly resembling an exponential distribution (see Section 53.4.3). For a fixed k and ps, P(#dom = k | ps, n) takes on roughly the shape of a Poisson distribution (see Section 53.3.2) along the n-axis, i. e., it rises for some time and then falls again. With rising ps, the maximum of this distribution shifts to the right, meaning that an algorithm with a higher population size may be able to deal with more objective functions. Still, even for ps = 3000, around 64% of the population are already non-dominated in this experiment.
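A much smaller Monte-Carlo sketch of this experiment (uniformly random objective vectors; far fewer repetitions than the 100 000 runs used for the figures, so the estimates will be noisier) could be written as follows:

import random

def dominates(f1, f2):
    # f1 Pareto-dominates f2 under minimization
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def non_dominated_fraction(ps, n, runs=200, seed=1):
    # estimate P(#dom = 0 | ps, n): the expected fraction of non-dominated
    # individuals among ps uniformly random n-dimensional objective vectors
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        pop = [[rng.random() for _ in range(n)] for _ in range(ps)]
        total += sum(1 for f in pop
                     if not any(dominates(g, f) for g in pop if g is not f))
    return total / (runs * ps)

# non_dominated_fraction(5, 2) should land near the value 0.457 reported below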
Figure 20.2: The proportion of non-dominated candidate solutions for several population
sizes ps and dimensionalities n.
The fraction of non-dominated elements in the random populations is illustrated in Figure 20.2. We find that it again seems to rise exponentially with n. The population size ps in Figure 20.2 seems to have only an approximately logarithmically positive influence: If we list the population sizes required to keep the fraction of non-dominated candidate solutions at the same level of P(#dom = 0 | ps = 5, n = 2) ≈ 0.457, we find P(#dom = 0 | ps = 12, n = 3) ≈ 0.466, P(#dom = 0 | ps = 35, n = 4) ≈ 0.447, P(#dom = 0 | ps = 90, n = 5) ≈ 0.455, P(#dom = 0 | ps = 250, n = 6) ≈ 0.452, P(#dom = 0 | ps = 650, n = 7) ≈ 0.460, and P(#dom = 0 | ps = 1800, n = 8) ≈ 0.459. An extremely coarse rule of thumb for this experiment would hence be that around 0.6e^n individuals are required in the population to hold the proportion of non-dominated solutions at around 46%.
To sum things up, the increasing dimensionality of the objective space leads to three main
problems [1420]:
1. The performance of traditional approaches based solely on Pareto comparisons dete-
riorates.
2. The utility of the solutions cannot be understood by the human operator anymore.
3. The number of possible Pareto-optimal solutions may increase exponentially.
Finally, it should be noted that there also exists interesting research where objectives are added to a single-objective problem to make it easier [1553, 2025], which you can find sum-
marized in Section 13.4.6. Indeed, Brockhoff et al. [425] showed that in some cases, adding
objective functions may be beneficial (while it is not in others).
20.3 Countermeasures
Various countermeasures have been proposed against the problem of dimensionality. Nice
surveys on contemporary evolutionary approaches to many-objective optimization, on the difficulties arising in many-objective optimization, and on benchmark problems have been provided by Hiroyasu et al. [1237], Ishibuchi et al. [1420], and López Jaimes and Coello Coello [1775]. In the following we list some of the approaches for many-objective optimization, mostly utilizing the information provided in [1237, 1420].
20.3.1 Increasing Population Size
The most trivial measure, however, has not been introduced in these studies: increasing
the population size. Well, from Example E20.1 we know that this will work only for a
few objective functions and we have to throw in exponentially more individuals in order
to fight the influence of many objectives. For five or six objective functions, we may still
be able to obtain good results if we do this – for the price of many additional objective
function evaluations. If we follow the rough approximation equation from the example, it
would already take a population size of around ps = 100 000 in order to keep the fraction of
non-dominated candidate solutions below 0.5 (in the scenario used there, that is). Hence,
increasing the population size can only get us so far.
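(Read together with the rough rule of thumb from Example E20.1, ps ≈ 0.6e^n, a population of 100 000 individuals corresponds to roughly n = ln(100 000/0.6) ≈ 12 objective functions.)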
20.3.2 Increasing Selection Pressure
A very simple idea for improving the way multi-objective approaches scale with increasing
dimensionality is to increase the exploration pressure into the direction of the Pareto frontier.
Ishibuchi et al. [1420] distinguish approaches which modify the definition of domination in
order to reduce the number of non-dominated candidate solutions in the population [2403] and methods which assign different ranks to non-dominated solutions [636, 829, 830, 1576, 1635, 2629]. Farina and Amato [894] propose to rely on fuzzy Pareto methods instead of pure Pareto
comparisons.
20.3.3 Indicator Function-based Approaches
Ishibuchi et al. [1420] further point out that fitness assignment methods could be used which
are not based on Pareto dominance. According to them, one approach is using indicator
functions such as those involving hypervolume metrics [1419, 2838].
20.3.4 Scalarizing Approaches
Another fitness assignment method is the use of scalarizing functions [1420], which treat many-objective problems with single-objective-style methods. Advanced single-objective optimization methods like those developed by Hughes [1303] could be used instead of Pareto comparisons. Wagner et al. [2837, 2838] support the results and assumptions of Hughes [1303] in terms of single-objective methods being better in many-objective problems than traditional multi-objective Evolutionary Algorithms such as NSGA-II [749] or SPEA 2 [3100], but also refute the general claim that no Pareto-based approach can keep up with their performance. They also show that more recent approaches [880, 2010, 3096] which favor more than just distribution aspects can perform very well. Other scalarizing methods can be found in [1304, 1416, 1417, 1437].
20.3.5 Limiting the Search Area in the Objective Space
Hiroyasu et al. [1237] point out that the search area in the objective space can be lim-
ited. This leads to a decrease of the number of non-dominated points [1420] and can be
achieved by either incorporating preference information [746, 933, 2695] or by reducing the
dimensionality [422–424, 745, 2411].
20.3.6 Visualization Methods
Approaches for visualizing solutions of many-objective problems in order to make them
more comprehensible have been provided by Furuhashi and Yoshikawa [1002], Köppen and Yoshida [1577], Obayashi and Sasaki [2060], and in [1893].
Chapter 21
Scale (Decision Variables)
In Chapter 20, we discussed how an increasing number of objective functions can threaten
the performance of optimization algorithms. We referred to this problem as dimensionality,
i. e., the dimension of the objective space. However, there is another space in optimization
whose dimension can make an optimization problem difficult: the search space G. The
“curse of dimensionality”^1 of the search space stands for the exponential increase of its
volume with the number of decision variables [260, 261]. In order to better distinguish
between the dimensionality of the search and the objective space, we will refer to the former
one as scale.
In Example E21.1, we give an example for the growth of the search space with the scale
of a problem and in Table 21.1, some large-scale optimization problems are listed.
Example E21.1 (Small-scale versus large-scale).
The difference between small-scale and large-scale problems can best be illustrated using discrete or continuous vector-based search spaces as an example. If we search, for instance, in the natural interval G = (1..10), there are ten points which could be the optimal solution. When G = (1..10)^2, there exist one hundred possible results, and for G = (1..10)^3, it is already one thousand. In other words, if the number of points in G is n, then the number of points in G^m is n^m.
Table 21.1: Applications and Examples of Large-Scale Optimization.
Area References
Data Mining [162, 994, 1748]
Databases [162]
Distributed Algorithms and Systems [1748, 3031]
Engineering [2860]
Function Optimization [551, 1927, 2130, 2486, 3020]
Graph Theory [3031]
Software [3031]
21.1 The Problem
Especially in combinatorial optimization, the issue of scale has been considered for a long
time. Here, the computational complexity (see Section 12.1) denotes how many algorithm
1 http://en.wikipedia.org/wiki/Curse_of_dimensionality [accessed 2009-10-20]
steps are needed to solve a problem which consists of n parameters. As sketched in Fig-
ure 12.1 on page 145, the time requirements to solve problems which require step numbers
growing exponentially with n quickly exceed any feasible bound. Here, especially NP-hard problems should be mentioned, since they always exhibit this feature.
Besides combinatorial problems, numerical problems suffer the curse of dimensionality
as well.
21.2 Countermeasures
21.2.1 Parallelization and Distribution
When facing large-scale problems, the problematic symptom is the high runtime requirement. Thus, any measure to use as much computational power as available can be a remedy. Obviously, there (currently) exists no way to solve large-scale NP-hard problems exactly
within feasible time. With more computers, cores, or hardware, only a linear improvement
of runtime can be achieved at most. However, for problems residing in the gray area between
feasible and infeasible, distributed computing [1747] may be the method of choice.
21.2.2 Genetic Representation and Development
The best way to solve a large-scale problem is to solve it as a small-scale problem. For some
optimization tasks, it is possible to choose a search space which has a smaller size than
the problem space (|G| < |X|). Indirect genotype-phenotype mappings can link the spaces
together, as we will discuss later in Section 28.2.2.
Here, one option is generative mappings (see Section 28.2.2.1) which step-by-step con-
struct a complex phenotype by translating a genotype according to some rules. Grammatical
Evolution, for instance, unfolds a symbol according to a BNF grammar with rules identified
in a genotype. This recursive process can basically lead to arbitrarily complex phenotypes.
Another approach is to apply a developmental, ontogenic mapping (see Section 28.2.2.2)
that uses feedback from simulations or objective functions in this process.
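As a toy sketch of such a generative mapping (the tiny grammar and the decoding scheme are simplified assumptions for illustration, not the actual Grammatical Evolution specification):

GRAMMAR = {
    "<expr>": ["( <expr> <op> <expr> )", "x", "1"],
    "<op>": ["+", "*"],
}

def decode(genes, symbol="<expr>", depth=8):
    # unfold 'symbol' by consuming one gene per decision (modulo the number of
    # applicable rules); the depth cap forces terminal rules so the phenotype
    # stays finite; the gene list is consumed in place
    rules = GRAMMAR[symbol]
    gene = genes.pop(0) if genes else 0
    rule = rules[gene % len(rules)] if depth > 0 else rules[-1]
    return "".join(decode(genes, tok, depth - 1) if tok in GRAMMAR else tok
                   for tok in rule.split())

# decode([0, 1, 0, 2, 1]) yields "(x+1)"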
Example E21.2 (Dimension Reduction via Developmental Mapping).
Instead of optimizing (maybe up to a million) points on the surface of an airplane’s wing
(see Example E2.5 and E4.1) in order to find an optimal shape, the weights of an artificial
neural network used to move surface points to a beneficial configuration could be optimized.
The second task would involve significantly fewer variables, since an artificial neural network
with usually less than 1000 or even 100 weights could be applied. Of course, the number
of possible solutions which can be discovered by this approach decreases (it is at most [G[).
However, the problem can become tangible and good results may be obtained in reasonable
time.
21.2.3 Exploiting Separability
Sometimes, parts of candidate solutions are independent from each other and can be opti-
mized more or less separately. In such a case of low epistasis (see Chapter 17), a large-scale
problem can be divided into several problems of smaller-scale which can be optimized sepa-
rately. If solving a problem of scale n takes 2^n algorithm steps, solving two problems of scale 0.5n will clearly lead to a great runtime reduction. Such a reduction may even be worth
sacrificing some solution quality. If the optimization problem at hand exhibits low epistasis
or is separable, such a sacrifice may even be unnecessary.
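A back-of-the-envelope check in Python illustrates the potential gain (n = 60 is an arbitrary example):

n = 60
full = 2 ** n               # solving the whole problem: about 1.2e18 steps
split = 2 * 2 ** (n // 2)   # two independent subproblems of scale n/2: about 2.1e9 steps
print(full // split)        # the separable route is roughly 5e8 times cheaper here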
Coevolution has been shown to be an efficient approach in combinatorial optimization [1311].
If extended with a cooperative component, i. e., to Cooperative Coevolution [2210, 2211],
it can efficiently exploit separability in numerical problems and lead to better results [551,
3020].
21.2.4 Combination of Techniques
Generally, it can be a good idea to concurrently use sets of algorithms [1689] or portfo-
lios [2161] working on the same or different populations. This way, the different strengths
of the optimization methods can be combined. In the beginning, for instance, an algorithm
with good convergence speed may be granted more runtime. Later, the focus can shift
towards methods which can retain diversity and are not prone to premature convergence.
Alternatively, a sequential approach can be performed, which starts with one algorithm and
switches to another one when no further improvements are found [? ]. This way, first an
interesting area in the search space can be found which then can be investigated in more
detail.
Chapter 22
Dynamically Changing Fitness Landscape
It should also be mentioned that there exist problems with dynamically changing fitness
landscapes [396, 397, 400, 1957, 2302]. The task of an optimization algorithm is then to
provide candidate solutions with momentarily optimal objective values for each point in time.
Here we have the problem that an optimum in iteration t will possibly not be an optimum in
iteration t +1 anymore. Problems with dynamic characteristics can, for example, be tackled
with special forms [3019] of
1. Evolutionary Algorithms [123, 398, 1955, 1956, 2723, 2941],
2. Genetic Algorithms [1070, 1544, 1948–1950],
3. Particle Swarm Optimization [316, 498, 499, 1728, 2118],
4. Differential Evolution [1866, 2988], and
5. Ant Colony Optimization [1148, 1149, 1730].
The moving peaks benchmarks by Branke [396, 397] and Morrison and De Jong [1957] are
good examples for dynamically changing fitness landscapes. You can find them discussed
in ?? on page ??. In Table 22.1, we give some examples for applications of dynamic opti-
mization.
Table 22.1: Applications and Examples of Dynamic Optimization.
Area References
Chemistry [1909, 2731, 3007]
Combinatorial Problems [400, 644, 1148, 1446, 1486, 1730, 2120, 2144, 2499]
Cooperation and Teamwork [1996, 1997, 2424, 2502]
Data Mining [3018]
Distributed Algorithms and Systems [353, 786, 962, 964–966, 1401, 1733, 1840, 1909, 1935, 1982, 1995, 2387, 2425, 2694, 2731, 2791, 3007, 3032, 3106]
Engineering [556, 644, 786, 1486, 1733, 1840, 1909, 1982, 2144, 2425,
2499, 2694, 2731, 3007]
Graph Theory [353, 556, 644, 786, 962, 964–966, 1401, 1486, 1733, 1840,
1909, 1935, 1982, 1995, 2144, 2387, 2425, 2502, 2694, 2731,
2791, 3032, 3106]
Logistics [1148, 1446, 1730, 2120]
Motor Control [489]
Multi-Agent Systems [2144, 2925]
Security [1486]
Software [1401, 1486, 1995, 3032]
Wireless Communication [556, 644, 1486, 2144, 2502]
Chapter 23
The No Free Lunch Theorem
By now, we know the most important difficulties that can be encountered when applying an
optimization algorithm to a given problem. Furthermore, we have seen that it is arguable
what actually an optimum is if multiple criteria are optimized at once. The fact that there
is most likely no optimization method that can outperform all others on all problems can,
thus, easily be accepted. Instead, there exists a variety of optimization methods specialized
in solving different types of problems. There are also algorithms which deliver good results
for many different problem classes, but may be outperformed by highly specialized methods
in each of them. These facts have been formalized by Wolpert and Macready [2963, 2964]
in their No Free Lunch Theorems^1 (NFL) for search and optimization algorithms.
23.1 Initial Definitions
Wolpert and Macready [2964] consider single-objective optimization over a finite domain and define an optimization problem φ(g) ≡ f(gpm(g)) as a mapping of a search space G to the objective space R.^2 Since this definition subsumes the problem space and the genotype-phenotype mapping, only skipping the possible search operations, it is very similar to our Definition D6.1 on page 103. They further call a time-ordered set d_m of m distinct visited points in G × R a “sample” of size m and write d_m ≡ {(d^g_m(1), d^y_m(1)), (d^g_m(2), d^y_m(2)), . . . , (d^g_m(m), d^y_m(m))}. d^g_m(i) is the genotype and d^y_m(i) the corresponding objective value visited at time step i. Then, the set D_m = (G × R)^m is the space of all possible samples of length m and D = ∪_{m≥0} D_m is the set of all samples of arbitrary size.
An optimization algorithm a can now be considered to be a mapping of the previously visited points in the search space (i. e., a sample) to the next point to be visited. Formally, this means a : D → G. Without loss of generality, Wolpert and Macready [2964] only regard unique visits and thus define a : d ∈ D → g : g ∉ d.
Performance measures Ψ can be defined independently from the optimization algorithms only based on the values of the objective function visited in the samples d_m. If the objective function is subject to minimization, Ψ(d^y_m) = min {d^y_m(i) : i = 1..m} would be the appropriate measure.
Often, only parts of the optimization problem φ are known. If the minima of the objective
function f were already identified beforehand, for instance, its optimization would be useless.
Since the behavior in wide areas of φ is not obvious, it makes sense to define a probability
P (φ) that we are actually dealing with φ and no other problem. Wolpert and Macready
[2964] use the handy example of the Traveling Salesman Problem in order to illustrate this
1 http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization [accessed 2008-03-28]
2 Notice that we have partly utilized our own notations here in order to be consistent throughout the book.
issue. Each distinct TSP produces a different structure of φ. Yet, we would use the same
optimization algorithm a for all problems of this class without knowing the exact shape of φ.
This corresponds to the assumption that there is a set of very similar optimization problems
which we may encounter here although their exact structure is not known. We act as if there
was a probability distribution over all possible problems which is non-zero for the TSP-alike
ones and zero for all others.
23.2 The Theorem
The performance of an algorithm a iterated m times on an optimization problem φ can then be defined as P(d^y_m | φ, m, a), i. e., the conditional probability of finding a particular sample d^y_m. Notice that this measure is very similar to the value of the problem landscape Φ(x, τ) introduced in Definition D5.2 on page 95 which is the cumulative probability that the optimizer has visited the element x ∈ X until (inclusively) the τth evaluation of the objective function(s).
Wolpert and Macready [2964] prove that the sum of such probabilities over all possible optimization problems φ on finite domains is always identical for all optimization algorithms. For two optimizers a_1 and a_2, this means that

Σ_{∀φ} P(d^y_m | φ, m, a_1) = Σ_{∀φ} P(d^y_m | φ, m, a_2)    (23.1)

Hence, the average over all φ of P(d^y_m | φ, m, a) is independent of a.
23.3 Implications
From this theorem, we can immediately conclude that, in order to outperform a_1 in one optimization problem, a_2 will necessarily perform worse in another. Figure 23.1 visualizes this issue. It shows that general optimization approaches like Evolutionary Algorithms can solve a variety of problem classes with reasonable performance. In this figure, we have chosen a performance measure Φ subject to maximization, i. e., the higher its values, the faster will the problem be solved. Hill Climbing approaches, for instance, will be much faster than Evolutionary Algorithms if the objective functions are steady and monotonous, that is, in a smaller set of optimization tasks. Greedy search methods will perform fast on all problems with matroid^3 structure. Evolutionary Algorithms will most often still be able to solve these problems, it just takes them longer to do so. The performance of Hill Climbing and greedy approaches degenerates in other classes of optimization tasks as a trade-off for their high utility in their “area of expertise”.
One interpretation of the No Free Lunch Theorem is that it is impossible for any opti-
mization algorithm to outperform random walks or exhaustive enumerations on all possible
problems. For every problem where a given method leads to good results, we can construct
a problem where the same method has exactly the opposite effect (see Chapter 15). As
a matter of fact, doing so is even a common practice to find weaknesses of optimization
algorithms and to compare them with each other, see Section 50.2.6, for example.
3 http://en.wikipedia.org/wiki/Matroid [accessed 2008-03-28]
Figure 23.1: A visualization of the No Free Lunch Theorem. (A very crude sketch plotting performance over all possible optimization problems for a random walk or exhaustive enumeration, a general optimization algorithm such as an EA, and two specialized optimization algorithms such as a hill climber and a depth-first search.)
Example E23.1 (Fitness Landscapes for Minimization and the NFL).
Let us assume that the Hill Climbing algorithm is applied to the objective functions given in Figure 23.2, which are both subject to minimization.
Figure 23.2: Fitness Landscapes and the NFL. (Fig. 23.2.a: a landscape good for Hill Climbing, divided into sections I, II, and III; Fig. 23.2.b: a landscape bad for Hill Climbing, divided into sections IV and V; both plot the objective values f(x) over x.)
We divided both fitness landscapes
into sections labeled with roman numerals. A Hill Climbing algorithm will generate points
in the proximity of the currently best solution and transcend to these points if they are
better. Hence, in both problems it will always follow the downward gradient in all three
sections. In Fig. 23.2.a, this is the right decision in sections II and III because the global
optimum is located at the border between them, but wrong in the deceptive section I. In
Fig. 23.2.b, exactly the opposite is the case: Here, a gradient descent is only a good idea in section IV (which corresponds to I in Fig. 23.2.a), whereas it leads away from the global optimum in section V, which has the same extent as sections II and III in Fig. 23.2.a.
Hence, our Hill Climbing algorithm would exhibit very different performance when applied to
Fig. 23.2.a and Fig. 23.2.b. Depending on the search operations available, it may even be outpaced by a random walk on Fig. 23.2.b on average. Since we may not know the nature of
the objective functions before the optimization starts, this may become problematic. How-
ever, Fig. 23.2.b is deceptive in the whole section V, i. e., most of its problem space, which
is not the case to such an extent in many practical problems.
Another interpretation is that every useful optimization algorithm utilizes some form of
problem-specific knowledge. Radcliffe [2244] basically states that without such knowledge,
search algorithms cannot exceed the performance of simple enumerations. Incorporating
knowledge starts with relying on simple assumptions like “if x is a good candidate solution,
then we can expect other good candidate solutions in its vicinity”, i. e., strong causality.
The more (correct) problem specific knowledge is integrated (correctly) into the algorithm
structure, the better will the algorithm perform. On the other hand, knowledge correct
for one class of problems is, quite possibly, misleading for another class. In reality, we use
optimizers to solve a given set of problems and are not interested in their performance when
(wrongly) applied to other classes.
Example E23.2 (Arbitrary Landscapes and Exhaustive Enumeration).
An even simpler thought experiment than Example E23.1 on optimization thus goes as
follows: If we consider an optimization problem of truly arbitrary structure, then any element
in X could be the global optimum. In an arbitrary (or random) fitness landscape where the
objective value of the global optimum is unknown, it is impossible to be sure that the best
candidate solution found so far is the global optimum until all candidate solutions have
been evaluated. In this case, a complete enumeration or uninformed search is the optimal
approach. In most randomized search procedures, we cannot fully prevent that a certain
candidate solution is discovered and evaluated more than once, at least without keeping all
previously discovered elements in memory which is infeasible in most cases. Hence, especially
when approaching full coverage of the problem space, these methods will waste objective
function evaluations and thus, become inferior.
The number of arbitrary fitness landscapes should be at least as high as the number
of “optimization-favorable” ones: Assuming a favorable landscape with a single global op-
timum, there should be at least one position to which the optimum could be moved in
order to create a fitness landscape where any informed search is outperformed by exhaustive
enumeration or a random walk.
The rough meaning of the NFL is that all black-box optimization methods perform equally
well over the complete set of all optimization problems [2074]. In practice, we do not want
to apply an optimizer to all possible problems but to only some, restricted classes. In terms
of these classes, we can make statements about which optimizer performs better.
Today, there exists a wide range of work on No Free Lunch Theorems for many different
aspects of machine learning. The website http://www.no-free-lunch.org/^4 gives a good overview of them. Further summaries, extensions, and criticisms have been provided by Köppen et al. [1578], Droste et al. [837, 838, 839, 840], Oltean [2074], and
Igel and Toussaint [1397, 1398]. Radcliffe and Surry [2247] discuss the NFL in the context
of Evolutionary Algorithms and the representations used as search spaces. The No Free
Lunch Theorem is furthermore closely related to the Ugly Duckling Theorem^5 proposed by
Watanabe [2868] for classification and pattern recognition.
4 accessed: 2008-03-28
5 http://en.wikipedia.org/wiki/Ugly_duckling_theorem [accessed 2008-08-22]
23.4 Infinite and Continuous Domains
Recently, Auger and Teytaud [145, 146] showed that the No Free Lunch Theorem holds only
in a weaker form for countable infinite domains. For continuous domains (i. e., uncountable
infinite ones), it does not hold. This means for a given performance measure and a limit
of the maximally allowed objective function evaluations, it can be possible to construct an
optimal algorithm for a certain family of objective functions [145, 146].
A factor that should be considered when thinking about these issues is that real existing computers can only work with finite search spaces, since they have finite memory.
In the end, even floating point numbers are finite in value and precision. Hence, there is
always a difference between the behavior of the algorithmic, mathematical definition of an
optimization method and its practical implementation for a given computing environment.
Unfortunately, the author of this book finds himself unable to state with certainty whether
this means that the NFL does hold for realistic continuous optimization or whether the
proofs by Auger and Teytaud [145, 146] carry over to actual programs.
23.5 Conclusions
So far, the subject of this chapter was the question about what makes optimization prob-
lems hard, especially for metaheuristic approaches. We have discussed numerous different
phenomena which can affect the optimization process and lead to disappointing results.
If an optimization process has converged prematurely, it has been trapped in a non-
optimal region of the search space from which it cannot “escape” anymore (Chapter 13).
Ruggedness (Chapter 14) and deceptiveness (Chapter 15) in the fitness landscape, often
caused by epistatic effects (Chapter 17), can misguide the search into such a region. Neu-
trality and redundancy (Chapter 16) can either slow down optimization because the appli-
cation of the search operations does not lead to a gain in information or may also contribute
positively by creating neutral networks from which the search space can be explored and
local optima can be escaped from. Noise is present in virtually all practical optimization
problems. The solutions that are derived for them should be robust (Chapter 18). Also,
they should neither be too general (oversimplification, Section 19.2) nor too specifically
aligned only to the training data (overfitting, Section 19.1). Furthermore, many practical
problems are multi-objective, i. e., involve the optimization of more than one criterion at
once (partially discussed in Section 3.3), or concern objectives which may change over time
(Chapter 22).
The No Free Lunch Theorem argues that it is not possible to develop the one optimization
algorithm, the problem-solving machine which can provide us with near-optimal solutions
in short time for every possible optimization task (at least in finite domains). Even though
this proof does not directly hold for purely continuous or infinite search spaces, this must
sound very depressing for everybody new to this subject.
Actually, quite the opposite is the case, at least from the point of view of a researcher.
The No Free Lunch Theorem means that there will always be new ideas, new approaches
which will lead to better optimization algorithms to solve a given problem. Instead of being
doomed to obsolescence, it is far more likely that most of the currently known optimization
methods have at least one niche, one area where they are excellent. It also means that it
is very likely that the “puzzle of optimization algorithms” will never be completed. There
will always be a chance that an inspiring moment, an observation in nature, for instance,
may lead to the invention of a new optimization algorithm which performs better in some
problem areas than all currently known ones.
Figure 23.3: The puzzle of optimization algorithms. (Pieces: Evolutionary Algorithms; GA, GP, ES, DE, EP, ...; Memetic Algorithms; ACO; PSO; Extremal Optimization; Simulated Annealing; EDA; Tabu Search; Branch & Bound; Dynamic Programming; A* Search; IDDFS; Hill Climbing; Downhill Simplex; LCS; RFD; Random Optimization.)
Chapter 24
Lessons Learned: Designing Good Encodings
Most Global Optimization algorithms share the premise that solutions to problems are either
(a) elements of a somewhat continuous space that can be approximated stepwise or (b) that
they can be composed of smaller modules which already have good attributes even when oc-
curring separately. Especially in the second case, the design of the search space (or genome)
G and the genotype-phenotype mapping “gpm” is vital for the success of the optimization
process. It determines to what degree these expected features can be exploited by defining
how the properties and the behavior of the candidate solutions are encoded and how the
search operations influence them. In this chapter, we will first discuss a general theory
on how properties of individuals can be defined, classified, and how they are related. We
will then outline some general rules for the design of the search space G, the search opera-
tions searchOp ∈ Op, and the genotype-phenotype mapping “gpm”. These rules represent
the lessons learned from our discussion of the possible problematic aspects of optimization
tasks. In software engineering, there exists a variety of design patterns^1 that describe good practice and experience values. Utilizing these patterns will help the software engineer to create well-organized, extensible, and maintainable applications.
Whenever we want to solve a problem with Global Optimization algorithms, we need to
define the structure of a genome. The individual representation along with the genotype-
phenotype mapping is a vital part of Genetic Algorithms and has major impact on the
chance of finding good solutions. Like in software engineering, there exist basic patterns
and rules of thumb on how to design them properly.
Based on our previous discussions, we can now provide some general best practices for the
design of encodings from different perspectives. These principles can lead to finding better
solutions or higher optimization speed if considered in the design phase [1075, 2032, 2338].
We will outline suggestions given by different researchers transposed into the more general
vocabulary of formae (introduced in Chapter 9 on page 119).
24.1 Compact Representation
[1075] and Ronald [2323] state that the representations of the formae in the search space
should be as short as possible. The number of different alleles and the lengths of the different
genes should be as small as possible. In other words, the redundancy in the genomes should
be minimal. In Section 16.5 on page 174, we discussed that uniform redundancy slows
down the optimization process.
24.2 Unbiased Representation
The search space G should be unbiased in the sense that all phenotypes are
1 http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29 [accessed 2007-08-12]
represented by the same number of genotypes [2032, 2113, 2115]. This property makes it possible to efficiently select an unbiased starting population, which gives the optimizer the chance of reaching all parts of the problem space.

∀x_1, x_2 ∈ X ⇒ |{g ∈ G : x_1 = gpm(g)}| ≈ |{g ∈ G : x_2 = gpm(g)}|    (24.1)
Beyer and Schwefel [300, 302] furthermore also call for unbiased variation operators. The unary mutation operations in Evolution Strategy, for example, should not give preference to any specific change of the genotypes. The reason for this is that the selection operation, i. e., the method which chooses the candidate solutions, already exploits fitness information and guides the search into one direction. The variation operators thus should not do this job as well but instead try to increase the diversity. They should perform exploration as opposed to the exploitation performed by selection.
Beyer and Schwefel [302] therefore again suggest to use normal distributions to mutate real-valued vectors if G ⊆ R^n, and Rudolph [2348] proposes to use geometrically distributed random numbers to offset genotype vectors in G ⊆ Z^n.
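A hypothetical Python sketch of such unbiased variation operators (the step-size parameters sigma and p are assumptions that would have to be tuned or adapted per problem):

import random

def mutate_real(g, sigma=0.1, rng=random):
    # unbiased mutation for G ⊆ R^n: add zero-mean, normally distributed
    # offsets, so no direction of change is preferred
    return [x + rng.gauss(0.0, sigma) for x in g]

def mutate_int(g, p=0.5, rng=random):
    # unbiased mutation for G ⊆ Z^n in the spirit of Rudolph: offset each
    # component by the difference of two geometrically distributed variables,
    # which is symmetric around zero
    def geom():
        k = 0
        while rng.random() >= p:   # failures before the first success
            k += 1
        return k
    return [x + geom() - geom() for x in g]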
24.3 Surjective GPM
Palmer [2113], Palmer and Kershenbaum [2115] and Della Croce et al. [765] state that a
good search space G and genotype-phenotype mapping “gpm” should be able to repre-
sent all possible phenotypes and solution structures, i. e., be surjective (see Section 51.7 on
page 644) [2032].
∀x ∈ X ⇒ ∃g ∈ G : x = gpm(g) (24.2)
24.4 Injective GPM
In the spirit of Section 24.1 and with the goal of lowering redundancy, the number of genotypes which map to the same phenotype should be low [2322–2324]. In Section 16.5 we provided a thorough discussion of redundancy in the genotypes and its demerits (but also its possible advantages). Combining the idea of achieving low uniform redundancy with the principle stated in Section 24.3, Palmer and Kershenbaum [2113, 2115] conclude that the GPM should ideally be both simple and bijective.
24.5 Consistent GPM
The genotype-phenotype mapping should always yield valid phenotypes [2032, 2113, 2115].
The meaning of valid in this context is that if the problem space X is the set of all possible
trees, only trees should be encoded in the genome. If we use R^3 as the problem space, no vectors with fewer or more than three elements should be produced by the genotype-phenotype mapping. Anything else would not make sense anyway, so this is a pretty basic
rule. This form of validity does not imply that the individuals are also correct solutions in
terms of the objective functions.
Constraints on what a good solution is can, on one hand, be realized by defining them ex-
plicitly and making them available to the optimization process (see Section 3.4 on page 70).
On the other hand, they can also be implemented by defining the encoding, search opera-
tions, and genotype-phenotype mapping in a way which ensures that all phenotypes created
are feasible by default [1888, 2323, 2912, 2914].
24.6 Formae Inheritance and Preservation
The search operations should be able to preserve the (good) formae and (good) configurations
of multiple formae [978, 2242, 2323, 2879]. Good formae should not easily get lost during
the exploration of the search space.
From a binary search operation like recombination (e.g., crossover in GAs), we would
expect that, if its two parameters g_1 and g_2 are members of a forma A, the resulting element will also be an instance of A.

∀g_1, g_2 ∈ A ⊆ G ⇒ searchOp(g_1, g_2) ∈ A    (24.3)

If we furthermore can assume that all instances of all formae A with minimal precision (A ∈ mini) of an individual are inherited by at least one parent, the binary reproduction operation is considered as pure [2242, 2879].

∀g_3 = searchOp(g_1, g_2) ∈ G, ∀A ∈ mini : g_3 ∈ A ⇒ g_1 ∈ A ∨ g_2 ∈ A    (24.4)

If this is the case, all properties of a genotype g_3 which is a combination of two others (g_1, g_2) can be traced back to at least one of its parents. Otherwise, “searchOp” also performs an implicit unary search step, a mutation in a Genetic Algorithm, for instance. Such properties, although discussed here for binary search operations only, can be extended to arbitrary n-ary operators.
24.7 Formae in Genotypic Space Aligned with Phenotypic Formae
The optimization algorithms find new elements in the search space G by applying the search
operations searchOp ∈ Op. These operations can only create, modify, or combine genotypical
formae since they usually have no information about the problem space. Most mathematical
models dealing with the propagation of formae like the Building Block Hypothesis and the
Schema Theorem^2 thus focus on the search space and show that highly fit genotypical formae
will more probably be investigated further than those of low utility. Our goal, however, is to
find highly fit formae in the problem space X. Such properties can only be created, modified,
and combined by the search operations if they correspond to genotypical formae. According
to Radcliffe [2242] and Weicker [2879], a good genotype-phenotype mapping should provide
this feature.
It furthermore becomes clear that useful separate properties in phenotypic space can only
be combined by the search operations properly if they are represented by separate formae
in genotypic space too.
24.8 Compatibility of Formae
Formae of different properties should be compatible [2879]. Compatible Formae in phe-
notypic space should also be compatible in genotypic space. This leads to a low level of
epistasis [240, 2323] (see Chapter 17 on page 177) and hence will increase the chance of
success of the reproduction operations. The representations of different, compatible pheno-
typic formae should not influence each other [1075]. This will lead to a representations with
minimal epistasis.
2 See Section 29.5 for more information on the Schema Theorem.
24.9 Representation with Causality
Besides low epistasis, the search space, GPM, and search operations together should possess
strong causality (see Section 14.2). Generally, the search operations should produce small
changes in the objective values. This would, simplified, mean that:
∀x_1, x_2 ∈ X, g ∈ G : x_1 = gpm(g) ∧ x_2 = gpm(searchOp(g)) ⇒ f(x_2) ≈ f(x_1)    (24.5)
24.10 Combinations of Formae
If genotypes g_1, g_2, . . . which are instances of different but compatible formae A_1 ⊲⊳ A_2 ⊲⊳ . . . are combined by a binary (or n-ary) search operation, the resulting genotype g should be an instance of all of these properties, i. e., the combination of compatible formae should be a forma itself [2879].

∀g_1 ∈ A_1, g_2 ∈ A_2, . . . ⇒ searchOp(g_1, g_2, . . .) ∈ A_1 ∩ A_2 ∩ . . . (≠ ∅)    (24.6)
If this principle holds for many individuals and formae, useful properties can be combined by
the optimization step by step, narrowing down the precision of the arising, most interesting
formae more and more. This should lead the search to the most promising regions of the
search space.
24.11 Reachability of Formae
Beyer and Schwefel [300, 302] suggest that a good representation has the feature of reach-
ability that any other (finite) individual state can be reached from any parent state within
a finite number of (mutation) steps. In other words, the set of search operations should be
complete (see Definition D4.8 on page 85). Beyer and Schwefel state that this property is
necessary for proving global convergence.
The set of available search operations Op should thus include at least one (unary) search
operation which is theoretically able to reach all possible formae within finite steps. If the
binary search operations in Op all are pure, this unary operator is the only one (apart
from creation operations) able to introduce new formae which are not yet present in the
population. Hence, it should be able to find any given forma [2879].
Generally, we just need to be able to generate all formae, not necessarily all genotypes
with such an operator, if formae can be arbitrarily combined by the available n-ary search or
recombination operations.
24.12 Influence of Formae
One rule which, in my opinion, was missing in the lists given by Radcliffe [2242] and Weicker
[2879] is that the absolute contributions of the single formae to the overall objective values
of a candidate solution should not be too different. Let us divide the phenotypic formae
into those with positive and those with negative or neutral contribution and let us, for
simplification purposes, assume that those with positive contribution can be arbitrarily
combined. If one of the positive formae has a contribution with an absolute value much
lower than those of the other positive formae, we will trip into the problem of domino
convergence discussed in Section 13.2.3 on page 153.
Then, the search will first discover the building blocks of higher value. This, itself, is
not a problem. However, as we have already pointed out in Section 13.2.3, if the search is
stochastic and performs exploration steps, chances are that alleles of higher importance get
destroyed during this process and have to be rediscovered. The values of the less salient
formae would then play no role. Thus, the chance of finding them strongly depends on how
frequent the destruction of important formae takes place.
Ideally, we would therefore design the genome and phenome in a way that the different
characteristics of the candidate solution all influence the objective values to a similar degree.
Then, the chance of finding good formae increases.
24.13 Scalability of Search Operations
The scalability requirement outlined by Beyer and Schwefel [300, 302] states that the strength
of the variation operator (usually a unary search operation such as mutation) should be
tunable in order to adapt to the optimization process. In the initial phase of the search,
usually larger search steps are performed in order to obtain a coarse idea in which general
direction the optimization process should proceed. The closer the focus gets to a certain local
or global optimum, the smaller should the search steps get in order to balance the genotypic
features in a fine-grained manner. Ideally, the optimization algorithm should achieve such
behavior in a self-adaptive way in reaction to the structure of the problem it is solving.
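A hypothetical sketch of such a tunable variation strength (the adaptation rule below is one possibility in the spirit of the classic 1/5th success rule from Evolution Strategies; the constants are assumptions):

import random

def mutate(g, sigma, rng=random):
    # Gaussian mutation whose strength is controlled by the step size sigma
    return [x + rng.gauss(0.0, sigma) for x in g]

def adapt_sigma(sigma, success_rate, factor=1.15):
    # widen the search while many mutations succeed (coarse exploration),
    # narrow it otherwise (fine-grained balancing near an optimum)
    return sigma * factor if success_rate > 0.2 else sigma / factor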
24.14 Appropriate Abstraction
The problem should be represented at an appropriate level of abstraction [240, 2323]. The
complexity of the optimization problem should evenly be distributed between the search
space G, the search operations searchOp ∈ Op, the genotype-phenotype mapping “gpm”,
and the problem space X. Sometimes, using a non-trivial encoding which shifts complexity
to the genotype-phenotype mapping and the search operations can improve the performance
of an optimizer and increase the causality [240, 2912, 2914].
24.15 Indirect Mapping
If a direct mapping between genotypes and phenotypes is not possible, a suitable develop-
mental approach should be applied [2323]. Such indirect genotype-phenotype mappings, as
we will discuss in Section 28.2.2 on page 272, are especially useful if large-scale and robust
structure designs should be constructed by the optimization process.
24.15.1 Extradimensional Bypass: Example for Good Complexity
Minimal-sized genomes are not always the best approach. An interesting aspect of genome
design supporting this claim is inspired by the works of the theoretical biologist Conrad
[621, 622, 623, 625]. According to his extradimensional bypass principle, it is possible to
transform a rugged fitness landscape with isolated peaks into one with connected saddle points by increasing the dimensionality of the search space [496, 544]. In [625] he states that the chance of isolated peaks in randomly created fitness landscapes decreases when their dimensionality grows.
This partly contradicts Section 24.1 which states that genomes should be as compact as possible. Conrad [625] does not suggest that nature includes useless sequences in the genome, but rather genes which allow for either
1. new phenotypical characteristics or
2. redundancy providing new degrees of freedom for the evolution of a species.
In some cases, such an increase in freedom makes more than up for the additional “costs”
arising from the enlargement of the search space. The extradimensional bypass can be
considered as an example of positive neutrality (see Chapter 16).
Fig. 24.1.a: Useful increase of dimensionality. Fig. 24.1.b: Useless increase of dimensionality.
Figure 24.1: Examples for an increase of the dimensionality of a search space G (1d) to G′ (2d). (Both panels plot f(x) over G and G′ and mark the local and the global optimum.)
In Fig. 24.1.a, an example for the extradimensional bypass (similar to Fig. 6 in [355])
is sketched. The original problem had a one-dimensional search space G corresponding to
the horizontal axis up front. As can be seen in the plane in the foreground, the objective
function had two peaks: a local optimum on the left and a global optimum on the right, separated by a larger valley. When the optimization process began climbing up the local optimum, it was very unlikely that it could ever escape this hill and reach the global one. Increasing the search space to two dimensions (G′), however, opened up a pathway between them. The two isolated peaks became saddle points on a longer ridge. The global optimum is now reachable from all points on the local optimum.
Generally, increasing the dimension of the search space only makes sense if the added dimension has a non-trivial influence on the objective functions. Simply adding a useless new dimension (as done in Fig. 24.1.b) would be an example of a kind of uniform redundancy, which we already know (see Section 16.5) is not beneficial. Then again, adding useful new dimensions may be hard or impossible to achieve in most practical applications.
A good example for this issue is given by Bongard and Paul [355] who used an EA to evolve a neural network for the motion control of a bipedal robot. They performed runs where the evolution had control over some morphological aspects and runs where it had not. The ability to change the leg width of the robots, for instance, comes at the expense of an increase of the dimension of the search space. Hence, one would expect that the optimization would perform worse. Instead, in one series of experiments, the results were much better with the extended search space. The runs did not converge to one particular leg shape but to a wide range of different structures. This led to the assumption that the morphology itself was not so much the target of the optimization, but that the ability to change it transformed the fitness landscape into a structure more navigable by the evolution. In some other experimental runs performed by Bongard and Paul [355], this phenomenon could not be observed, most likely because
1. the robot configuration led to a problem of too high complexity, i. e., ruggedness in the fitness landscape, and/or
2. the increase in dimensionality this time was too large to be compensated by the gain of evolvability.
Further examples for possible benefits of “gradually complexifying” the search space are
given by Malkin in his doctoral thesis [1818].
Part III
Metaheuristic Optimization Algorithms
Chapter 25
Introduction
In Definition D1.15 on page 36 we stated that a metaheuristic is a black-box optimization technique which combines information from the objective functions, gained by sampling the search space, in a way which (hopefully) leads to the discovery of candidate solutions with high utility. Within the taxonomy given in Section 1.3, these algorithms make up the majority of the methods covered, since they are the focus of this book.
25.1 General Information on Metaheuristics
25.1.1 Applications and Examples
Table 25.1: Applications and Examples of Metaheuristics.
Area References
Art see Tables 28.4, 29.1, and 31.1
Astronomy see Table 28.4
Chemistry see Tables 28.4, 29.1, 30.1, 31.1, 32.1, 40.1, and 43.1
Combinatorial Problems [342, 651, 1033, 1034, 2283, 2992]; see also Tables 26.1, 27.1,
28.4, 29.1, 30.1, 31.1, 33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1,
40.1, 41.1, and 42.1
Computer Graphics see Tables 27.1, 28.4, 29.1, 30.1, 31.1, 33.1, and 36.1
Control see Tables 28.4, 29.1, 31.1, 32.1, and 35.1
Cooperation and Teamwork see Tables 28.4, 29.1, 31.1, and 37.1
Data Compression see Table 31.1
Data Mining see Tables 27.1, 28.4, 29.1, 31.1, 32.1, 34.1, 35.1, 36.1, 37.1,
and 39.1
Databases see Tables 26.1, 27.1, 28.4, 29.1, 31.1, 35.1, 36.1, 37.1, 39.1,
and 40.1
Decision Making see Table 31.1
Digital Technology see Tables 26.1, 28.4, 29.1, 30.1, 31.1, 34.1, and 35.1
Distributed Algorithms and Systems [3001, 3031]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1, 35.1, 36.1, 37.1, 39.1, 40.1, and 41.1
E-Learning see Tables 28.4, 29.1, 37.1, and 39.1
Economics and Finances see Tables 27.1, 28.4, 29.1, 31.1, 33.1, 35.1, 36.1, 37.1, 39.1,
and 40.1
Engineering see Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1,
35.1, 36.1, 37.1, 39.1, 40.1, and 41.1
Function Optimization see Tables 26.1, 27.1, 28.4, 29.1, 30.1, 32.1, 33.1, 34.1, 35.1,
36.1, 39.1, 40.1, 41.1, and 43.1
Games see Tables 28.4, 29.1, 31.1, and 32.1
Graph Theory [3001, 3031]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1,
33.1, 34.1, 35.1, 36.1, 37.1, 38.1, 39.1, 40.1, and 41.1
Healthcare see Tables 28.4, 29.1, 31.1, 35.1, 36.1, and 37.1
Image Synthesis see Table 28.4
Logistics [651, 1033, 1034]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1,
33.1, 34.1, 36.1, 37.1, 38.1, 39.1, 40.1, and 41.1
Mathematics see Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1, 32.1, 33.1, 34.1,
and 35.1
Military and Defense see Tables 26.1, 27.1, 28.4, and 34.1
Motor Control see Tables 28.4, 33.1, 34.1, 36.1, and 39.1
Multi-Agent Systems see Tables 28.4, 29.1, 31.1, 35.1, and 37.1
Multiplayer Games see Table 31.1
Physics see Tables 27.1, 28.4, 29.1, 31.1, 32.1, and 34.1
Prediction see Tables 28.4, 29.1, 30.1, 31.1, and 35.1
Security see Tables 26.1, 27.1, 28.4, 29.1, 31.1, 35.1, 36.1, 37.1, 39.1,
and 40.1
Shape Synthesis see Table 26.1
Software [467, 3031]; see also Tables 26.1, 27.1, 28.4, 29.1, 30.1, 31.1,
33.1, 34.1, 36.1, 37.1, and 40.1
Sorting see Tables 28.4 and 31.1
Telecommunication see Tables 28.4, 37.1, and 40.1
Testing see Tables 28.4 and 31.3
Theorem Proving and Automatic Verification see Tables 29.1 and 40.1
Water Supply and Management see Table 28.4
Wireless Communication see Tables 26.1, 27.1, 28.4, 29.1, 31.1, 33.1, 34.1, 35.1, 36.1,
37.1, 39.1, and 40.1
25.1.2 Books
1. Metaheuristics for Scheduling in Industrial and Manufacturing Applications [2992]
2. Handbook of Metaheuristics [1068]
3. Fleet Management and Logistics [651]
4. Metaheuristics for Multiobjective Optimisation [1014]
5. Large-Scale Network Parameter Configuration using On-line Simulation Framework [3031]
6. Modern Heuristic Techniques for Combinatorial Problems [2283]
7. How to Solve It: Modern Heuristics [1887]
8. Search Methodologies – Introductory Tutorials in Optimization and Decision Support Tech-
niques [453]
9. Handbook of Approximation Algorithms and Metaheuristics [1103]
25.1.3 Conferences and Workshops
Table 25.2: Conferences and Workshops on Metaheuristics.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
ICARIS: International Conference on Artificial Immune Systems
See Table 10.2 on page 130.
MIC: Metaheuristics International Conference
History: 2009/07: Hamburg, Germany, see [1968]
2008/10: Atizapán de Zaragoza, México, see [1038]
2007/06: Montréal, QC, Canada, see [1937]
2005/08: Vienna, Austria, see [2810]
2003/08: Kyoto, Japan, see [1321]
2001/07: Porto, Portugal, see [2294]
1999/07: Angra dos Reis, Rio de Janeiro, Brazil, see [2298]
1997/07: Sophia-Antipolis, France, see [2828]
1995/07: Breckenridge, CO, USA, see [2104]
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
GEWS: Grammatical Evolution Workshop
See Table 31.2 on page 385.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
History: 2012/10: Port El Kantaoui, Sousse, Tunisia, see [2205]
2010/10: Djerba Island, Tunisia, see [873]
2008/10: Hammamet, Tunisia, see [1171]
2006/11: Hammamet, Tunisia, see [1170]
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
25.1.4 Journals
1. European Journal of Operational Research (EJOR) published by Elsevier Science Publishers
B.V. and North-Holland Scientific Publishers Ltd.
2. Computational Optimization and Applications published by Kluwer Academic Publishers and
Springer Netherlands
3. The Journal of the Operational Research Society (JORS) published by Operations Research
Society and Palgrave Macmillan Ltd.
4. Journal of Heuristics published by Springer Netherlands
5. SIAM Journal on Optimization (SIOPT) published by Society for Industrial and Applied
Mathematics (SIAM)
6. Journal of Global Optimization published by Springer Netherlands
7. Discrete Optimization published by Elsevier Science Publishers B.V.
8. Journal of Optimization Theory and Applications published by Plenum Press and Springer
Netherlands
9. Engineering Optimization published by Informa plc and Taylor and Francis LLC
10. Optimization and Engineering published by Kluwer Academic Publishers and Springer
Netherlands
11. Optimization Methods and Software published by Taylor and Francis LLC
12. Journal of Mathematical Modelling and Algorithms published by Springer Netherlands
13. Structural and Multidisciplinary Optimization published by Springer-Verlag GmbH
14. SIAG/OPT Views and News: A Forum for the SIAM Activity Group on Optimization pub-
lished by SIAM Activity Group on Optimization – SIAG/OPT
Chapter 26
Hill Climbing
26.1 Introduction
Definition D26.1 (Local Search). A local search¹ algorithm solves an optimization problem by iteratively moving from one candidate solution to another in the problem space until a termination criterion is met [2356].
In other words, a local search algorithm in each iteration either keeps the current candidate
solution or moves to one element from its neighborhood as defined in Definition D4.12 on
page 86. Local search algorithms have two major advantages: They use very little memory
(normally only a constant amount) and are often able to find solutions in large or infinite
search spaces. These advantages come, of course, with large trade-offs in processing time
and the risk of premature convergence.
Hill Climbing² (HC) [2356] is one of the oldest and simplest local search and optimization algorithms for single objective functions f. In Algorithm 1.1 on page 40, we gave an example of a simple and unstructured Hill Climbing algorithm. Here we will generalize from Algorithm 1.1 to a more basic Monte Carlo Hill Climbing algorithm structure.
26.2 Algorithm
The Hill Climbing procedure usually starts either from a random point in the search space or from a point chosen according to some problem-specific criterion (the create() operator in Algorithm 26.1). Then, in a loop, the currently known best individual p⋆ is modified with the mutate() operator in order to produce one offspring p_new. If this new individual is better than its parent, it replaces it. Otherwise, it is discarded. This procedure is repeated until the termination criterion terminationCriterion() is met. In Listing 56.6 on page 735, we provide a Java implementation which resembles Algorithm 26.1.
In this sense, Hill Climbing is similar to an Evolutionary Algorithm (see Chapter 28) with a population size of 1 and truncation selection. As a matter of fact, it resembles a (1 + 1) Evolution Strategy. In Algorithm 30.7 on page 369, we provide such an optimization algorithm for G ⊆ R^n which additionally adapts the search step size.
¹ http://en.wikipedia.org/wiki/Local_search_(optimization) [accessed 2010-07-24]
² http://en.wikipedia.org/wiki/Hill_climbing [accessed 2007-07-03]
Algorithm 26.1: x ←− hillClimbing(f)
  Input: f: the objective function subject to minimization
  Input: [implicit] terminationCriterion: the termination criterion
  Data: p_new: the new element created
  Data: p⋆: the (currently) best individual
  Output: x: the best element found
1 begin
2   p⋆.g ←− create()
3   p⋆.x ←− gpm(p⋆.g)
4   while ¬terminationCriterion() do
5     p_new.g ←− mutate(p⋆.g)
6     p_new.x ←− gpm(p_new.g)
7     if f(p_new.x) ≤ f(p⋆.x) then p⋆ ←− p_new
8   return p⋆.x
Although the search space G and the problem space X are most often the same in Hill Climbing, we distinguish them in Algorithm 26.1 for the sake of generality. It should further be noted that Hill Climbing can be implemented in a deterministic manner if, for each genotype g1 ∈ G, the set of adjacent points is finite, i. e., |{g2 ∈ G : adjacent(g2, g1)}| ∈ N0.
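To make the abstract notation more tangible, here is a minimal Java sketch of Algorithm 26.1 for the common special case G = X ⊆ R^n. It is not the implementation from Listing 56.6; the ObjectiveFunction interface, the Gaussian mutation, the random initialization in [−10, 10]^n, and the fixed step budget are assumptions made only for this illustration.

import java.util.Random;

/** A minimal Hill Climbing sketch for G = X = R^n (illustrative, not Listing 56.6). */
public class SimpleHillClimber {

  /** Hypothetical objective interface: smaller values are better. */
  public interface ObjectiveFunction {
    double compute(double[] x);
  }

  private final Random random = new Random();

  /** Unary search operation: add Gaussian noise to one dimension of a copy of the parent. */
  private double[] mutate(double[] parent, double sigma) {
    double[] child = parent.clone();
    int i = random.nextInt(child.length);        // modify one randomly chosen dimension
    child[i] += sigma * random.nextGaussian();
    return child;
  }

  /** Run Hill Climbing for a fixed number of steps (simple termination criterion). */
  public double[] climb(ObjectiveFunction f, int n, double sigma, int maxSteps) {
    double[] best = new double[n];
    for (int i = 0; i < n; i++) {                // create(): random start in [-10, 10]^n
      best[i] = -10.0 + 20.0 * random.nextDouble();
    }
    double bestValue = f.compute(best);

    for (int step = 0; step < maxSteps; step++) {
      double[] candidate = mutate(best, sigma);  // produce one offspring
      double value = f.compute(candidate);
      if (value <= bestValue) {                  // accept only non-worsening moves
        best = candidate;
        bestValue = value;
      }
    }
    return best;
  }
}

A call such as new SimpleHillClimber().climb(x -> x[0] * x[0] + x[1] * x[1], 2, 0.1, 10000) would, for instance, approximate the minimum of a two-dimensional sphere function.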
26.3 Multi-Objective Hill Climbing
As illustrated in Algorithm 26.2, we can easily extend Hill Climbing algorithms with support for multi-objective optimization by using some of the methods from Section 3.5.2 on page 76 and some which we will later discuss in the context of Evolutionary Algorithms. This extended approach returns a set X of the best candidate solutions found instead of a single point x from the problem space as done in Algorithm 26.1.
The set archive of the currently known best individuals may contain more than one element. In each round, we pick one individual from the archive at random. We then use this individual to create the next point in the problem space with the "mutate" operator. The method updateOptimalSet(archive, p_new, cmp_f) checks whether the new candidate solution is not prevailed by any element in archive. If so, it enters the archive and all individuals from archive which are now dominated are removed. Since an optimization problem may potentially have infinitely many global optima, we subsequently have to ensure that the archive does not grow too big: pruneOptimalSet(archive, as) makes sure that it contains at most as individuals and, if necessary, deletes some of them. In case of Algorithm 26.2, the archive can grow to at most as + 1 individuals before the pruning, so at most one individual will be deleted. When the loop ends because the termination criterion terminationCriterion() was met, we extract all the phenotypes from the archived individuals (extractPhenotypes(archive)) and return them as the set X. An example implementation of this method in Java can be found in Listing 56.7 on page 737.
Algorithm 26.2: X ←− hillClimbingMO(OptProb, as)
  Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
  Input: as: the maximum archive size
  Input: [implicit] terminationCriterion: the termination criterion
  Data: p_old: the individual used as parent for the next point
  Data: p_new: the new individual generated
  Data: archive: the set of best individuals known
  Output: X: the set of the best elements found
 1 begin
 2   archive ←− ()
 3   p_new.g ←− create()
 4   p_new.x ←− gpm(p_new.g)
 5   while ¬terminationCriterion() do
       // check whether the new individual belongs into the archive
 6     archive ←− updateOptimalSet(archive, p_new, cmp_f)
       // if the archive is too big, delete one individual
 7     archive ←− pruneOptimalSet(archive, as)
       // pick a random element from the archive
 8     p_old ←− archive[⌊randomUni[0, len(archive))⌋]
 9     p_new.g ←− mutate(p_old.g)
10     p_new.x ←− gpm(p_new.g)
11   return extractPhenotypes(archive)
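The heart of Algorithm 26.2 is the archive maintenance. The following sketch shows one straightforward way to realize updateOptimalSet and pruneOptimalSet for individuals evaluated under several objectives which are all subject to minimization. The Individual class, the dominance test, and the pruning rule (delete random members) are assumptions of this illustration; the implementation referenced in Listing 56.7 may proceed differently.

import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** A minimal Pareto archive sketch for the multi-objective Hill Climber (illustrative only). */
public class ParetoArchive {

  /** Hypothetical individual: genotype plus its objective values (all minimized). */
  public static class Individual {
    final double[] genotype;
    final double[] objectives;
    Individual(double[] genotype, double[] objectives) {
      this.genotype = genotype;
      this.objectives = objectives;
    }
  }

  private final Random random = new Random();

  /** Returns true if a dominates b: a is nowhere worse and somewhere strictly better. */
  static boolean dominates(double[] a, double[] b) {
    boolean strictlyBetter = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) return false;
      if (a[i] < b[i]) strictlyBetter = true;
    }
    return strictlyBetter;
  }

  /** updateOptimalSet: add p if it is not dominated; remove members that p dominates. */
  public void updateOptimalSet(List<Individual> archive, Individual p) {
    for (Individual q : archive) {
      if (dominates(q.objectives, p.objectives)) return;   // p is dominated: discard it
    }
    Iterator<Individual> it = archive.iterator();
    while (it.hasNext()) {
      if (dominates(p.objectives, it.next().objectives)) it.remove();  // p dominates this member
    }
    archive.add(p);
  }

  /** pruneOptimalSet: enforce the maximum archive size by deleting random members. */
  public void pruneOptimalSet(List<Individual> archive, int maxSize) {
    while (archive.size() > maxSize) {
      archive.remove(random.nextInt(archive.size()));
    }
  }
}

In practice, pruning by a diversity measure such as crowding is usually preferable to random deletion; random deletion merely keeps the sketch short.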
26.4 Problems in Hill Climbing
The major problem of Hill Climbing is premature convergence. We introduced this phe-
nomenon in Chapter 13 on page 151: The optimization gets stuck at a local optimum and
cannot discover the global optimum anymore.
Both previously specified versions of the algorithm are very likely to fall prey to this problem, since they always move towards improvements in the objective values and thus cannot reach any candidate solution which is situated behind an inferior point. They will only follow a path of candidate solutions if it monotonously³ improves the objective function(s). The function used in Example E5.1 on page 96 is, for example, already problematic for Hill Climbing.
As previously mentioned, Hill Climbing in this form is a local search rather than a Global Optimization algorithm. By making a few slight modifications to the algorithm, however, it can become a valuable technique capable of performing Global Optimization:
1. A tabu-list which stores the elements recently evaluated can be added. By preventing
the algorithm from visiting them again, a better exploration of the problem space X can
be enforced. This technique is used in Tabu Search which is discussed in Chapter 40
on page 489.
2. A way of preventing premature convergence is to not always transcend to the better candidate solution in each step. Simulated Annealing introduces a heuristic based on the physical model of cooling down molten metal to decide whether an inferior offspring should replace its parent or not. This approach is described in Chapter 27 on page 243.
³ http://en.wikipedia.org/wiki/Monotonicity [accessed 2007-07-03]
3. The Dynamic Hill Climbing approach by Yuret and de la Maza [3048] uses the last two visited points to compute unit vectors. With this technique, the directions are adjusted according to the structure of the problem space and a new coordinate frame is created which is more likely to point into the right direction.
4. Randomly restarting the search after a certain number of steps is a crude but efficient method to explore wide ranges of the problem space with Hill Climbing. You can find it outlined in Section 26.5.
5. Using a reproduction scheme that does not necessarily generate candidate solutions directly neighboring p⋆.x, as done in Random Optimization (an optimization approach defined in Chapter 44 on page 505), may prove even more efficient.
26.5 Hill Climbing With Random Restarts
Hill Climbing with random restarts is also called Stochastic Hill Climbing (SH) or Stochastic gradient descent⁴ [844, 2561]. In the following, we will outline how random restarts can be implemented into Hill Climbing and serve as a measure against premature convergence. We will further combine this approach directly with the multi-objective Hill Climbing approach defined in Algorithm 26.2. The new algorithm incorporates two archives for optimal solutions: archive1, the overall optimal set, and archive2, the set of the best individuals in the current run. We additionally define the restart criterion shouldRestart(), which is evaluated in every iteration and determines whether or not the algorithm should be restarted. shouldRestart() could, for example, count the iterations performed or check whether any improvement was produced in the last ten iterations. After each single run, archive2 is incorporated into archive1, from which we extract and return the problem space elements at the end of the Hill Climbing process, as defined in Algorithm 26.3.
⁴ http://en.wikipedia.org/wiki/Stochastic_gradient_descent [accessed 2007-07-03]
Algorithm 26.3: X ←− hillClimbingMORR(OptProb, as)
  Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
  Input: as: the maximum archive size
  Input: [implicit] terminationCriterion: the termination criterion
  Input: [implicit] shouldRestart: the restart criterion
  Data: p_old: the individual used as parent for the next point
  Data: p_new: the new individual generated
  Data: archive1, archive2: the sets of best individuals known
  Output: X: the set of the best elements found
 1 begin
 2   archive1 ←− ()
 3   while ¬terminationCriterion() do
 4     archive2 ←− ()
 5     p_new.g ←− create()
 6     p_new.x ←− gpm(p_new.g)
 7     while ¬(shouldRestart() ∨ terminationCriterion()) do
         // check whether the new individual belongs into the archive
 8       archive2 ←− updateOptimalSet(archive2, p_new, cmp_f)
         // if the archive is too big, delete one individual
 9       archive2 ←− pruneOptimalSet(archive2, as)
         // pick a random element from the archive
10       p_old ←− archive2[⌊randomUni[0, len(archive2))⌋]
11       p_new.g ←− mutate(p_old.g)
12       p_new.x ←− gpm(p_new.g)
       // check whether good individuals were found in this run
13     archive1 ←− updateOptimalSetN(archive1, archive2, cmp_f)
       // if the archive is too big, delete one individual
14     archive1 ←− pruneOptimalSet(archive1, as)
15   return extractPhenotypes(archive1)
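For the simpler single-objective case, restarts can be realized by wrapping the basic climber from the sketch in Section 26.2. The restart rule used here (a fresh run every restartInterval steps) and the reuse of the hypothetical SimpleHillClimber class are assumptions of this illustration, not the book's reference implementation.

/** Single-objective Hill Climbing with random restarts (illustrative sketch). */
public class RestartingHillClimber {

  /**
   * Restarts a basic hill climb every restartInterval steps and keeps the best
   * solution seen over all runs. f, n and sigma are as in SimpleHillClimber.
   */
  public double[] climb(SimpleHillClimber.ObjectiveFunction f, int n, double sigma,
                        int totalSteps, int restartInterval) {
    double[] overallBest = null;
    double overallBestValue = Double.POSITIVE_INFINITY;
    SimpleHillClimber climber = new SimpleHillClimber();

    for (int used = 0; used < totalSteps; used += restartInterval) {
      int budget = Math.min(restartInterval, totalSteps - used);
      double[] runBest = climber.climb(f, n, sigma, budget);   // one independent run
      double runBestValue = f.compute(runBest);
      if (runBestValue < overallBestValue) {                   // preserve the best run result
        overallBest = runBest;
        overallBestValue = runBestValue;
      }
    }
    return overallBest;
  }
}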
26.6 General Information on Hill Climbing
26.6.1 Applications and Examples
Table 26.1: Applications and Examples of Hill Climbing.
Area References
Combinatorial Problems [16, 502, 775, 1261, 1486, 1532, 1553, 1558, 1781, 1971, 2656,
2663, 2745, 3027]; see also Table 26.3
Databases [1781, 2745]
Digital Technology [1600, 1665]
Distributed Algorithms and Systems [15, 410, 775, 1532, 1558, 2058, 2165, 2286, 2315, 2554, 2656, 2663, 2903, 2994, 3027, 3085, 3106]
Engineering [410, 1486, 1600, 1665, 2058, 2286, 2656, 2663]
Function Optimization [1927, 3048]
Graph Theory [15, 410, 775, 1486, 1532, 1558, 2058, 2165, 2237, 2286, 2315,
2554, 2596, 2656, 2663, 3027, 3085, 3106]
Logistics [1553, 1971]
Mathematics [1674]
Military and Defense [775]
Security [1486, 2165, 2554]
Shape Synthesis [887]
Software [1486, 1665, 2554, 2994]
Wireless Communication [1486, 2237, 2596]
26.6.2 Books
1. Artificial Intelligence: A Modern Approach [2356]
2. Introduction to Stochastic Search and Optimization [2561]
3. Evolution and Optimum Seeking [2437]
26.6.3 Conferences and Workshops
Table 26.2: Conferences and Workshops on Hill Climbing.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
26.6.4 Journals
1. Journal of Heuristics published by Springer Netherlands
26.7 Raindrop Method
Only five years ago, Bettinger and Zhu [290, 3083] contributed a new search heuristic for
constrained optimization: the Raindrop Method, which they used for forest planning prob-
lems. In the original description of the algorithm, the search and problem space are identical
(G = X). The algorithm is based on precise knowledge of the components of the candidate
solutions and on how their interaction influences the validity of the constraints. It works as
follows:
1. The Raindrop Method starts out with a single, valid candidate solution x⋆ (i. e., one that violates none of the constraints). This candidate may be found with a random search process or provided by a human operator.
2. Create a copy x of x⋆. Set the iteration counter τ to a user-defined maximum value maxτ of iterations for modifying and correcting x that are allowed without improvements before reverting to x⋆.
3. Perturb x by randomly modifying one of its components. Let us refer to this randomly
selected component as s. This modification may lead to constraint violations.
4. If no constraint was violated, continue at Point 11, otherwise proceed as follows.
5. Set a distance value d to 0.
6. Create a list L of the components of x that lead to constraint violations. Here we make use of the knowledge of the interaction of components and constraints.
7. From L, we pick the component c which is physically closest to s, that is, the component with the minimum distance dist(c, s). In the original application of the Raindrop Method, physically close was properly defined due to the fact that the candidate solutions were basically two-dimensional maps. For applications different from forest planning, appropriate definitions for the distance measure have to be supplied.
8. Set d = dist(c, s).
9. Find the next best value for c which does not introduce any new constraint violations in components f which are at least as close to s, i. e., with dist(f, s) ≤ dist(c, s). This change may, however, cause constraint violations in components farther away from s. If no such change is possible, go to Point 13. Otherwise, modify the component c in x.
10. Go back to Point 4.
11. If x is better than x⋆, that is, x ≺ x⋆, set x⋆ = x. Otherwise, decrease the iteration counter τ.
12. If the termination criterion has not yet been met, go back to Point 3 if τ > 0 and to Point 2 if τ = 0.
13. Return x⋆ to the user.
The iteration counter τ here is used to allow the search to explore solutions more distant from the current optimum x⋆. The higher the initial value maxτ specified by the user, the more iterations without improvement are allowed before reverting x to x⋆. The name Raindrop Method comes from the fact that the constraint violations caused by the perturbation of the valid solution radiate away from the modified component s like waves on a water surface radiate away from the point where a raindrop hits.
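The control flow just described can be captured in a small, problem-independent skeleton. Everything domain-specific (what a component is, how constraint violations are detected, the distance measure, and the repair move) is hidden behind a hypothetical RaindropProblem interface invented for this sketch; the original forest planning application would have to supply these parts.

import java.util.List;

/**
 * A problem-independent skeleton of the Raindrop Method (illustrative only).
 * The RaindropProblem interface and all of its methods are assumptions made for this sketch.
 */
public final class RaindropMethod {

  /** Hypothetical hook for everything domain-specific. */
  public interface RaindropProblem<X> {
    X copy(X x);
    int perturbRandomComponent(X x);            // step 3: modify one component, return its index s
    List<Integer> violatingComponents(X x);     // components that currently violate constraints
    double distance(int c, int s);              // "physical" distance between two components
    boolean repair(X x, int c, double maxDist); // step 9: fix c without new violations at distance <= maxDist from s
    boolean isBetter(X a, X b);                 // a is preferable to b under the objective(s)
  }

  public static <X> X optimize(RaindropProblem<X> prob, X validStart, int maxTau, int maxIterations) {
    X best = validStart;                         // x*: the best valid solution known
    X x = prob.copy(best);
    int tau = maxTau;

    for (int it = 0; it < maxIterations; it++) { // simple termination criterion
      int s = prob.perturbRandomComponent(x);    // step 3
      List<Integer> violating = prob.violatingComponents(x);
      while (!violating.isEmpty()) {             // steps 4-10: repair, closest component first
        int closest = violating.get(0);
        for (int c : violating) {
          if (prob.distance(c, s) < prob.distance(closest, s)) closest = c;
        }
        if (!prob.repair(x, closest, prob.distance(closest, s))) {
          return best;                           // step 9 leads to step 13: no repair possible, stop
        }
        violating = prob.violatingComponents(x);
      }
      if (prob.isBetter(x, best)) {              // step 11
        best = prob.copy(x);
      } else if (--tau == 0) {                   // step 12: revert to x* after maxTau misses
        x = prob.copy(best);
        tau = maxTau;
      }
    }
    return best;                                 // step 13
  }

  private RaindropMethod() { }
}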
26.7.1 General Information on Raindrop Method
26.7.1.1 Applications and Examples
Table 26.3: Applications and Examples of Raindrop Method.
Area References
Combinatorial Problems [290, 3083]
26.7.1.2 Books
1. Nature-Inspired Algorithms for Optimisation [562]
26.7.1.3 Conferences and Workshops
Table 26.4: Conferences and Workshops on Raindrop Method.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
26.7.1.4 Journals
1. Journal of Heuristics published by Springer Netherlands
Tasks T26
64. Use the implementation of Hill Climbing given in Listing 56.6 on page 735 (or an
equivalent own one as in Task 65) in order to find approximations of the global minima
of
f_1(x) = 0.5 − (1/(2n)) · Σ_{i=1}^{n} cos(ln(1 + |x_i|))   (26.1)
f_2(x) = 0.5 − 0.5 · cos(ln(1 + Σ_{i=1}^{n} |x_i|))   (26.2)
f_3(x) = 0.01 · max_{i=1..n} x_i²   (26.3)
in the real interval [−10, 10]^n (a Java rendering of these three functions is sketched after this task).
(a) Run ten experiments for n = 2, n = 5, and n = 10 each. Note all parameter
settings of the algorithm you used (the number of algorithm steps, operators,
etc.), the results, the vectors discovered, and their objective values. [20 points]
(b) Compute the median (see Section 53.2.6 on page 665), the arithmetic mean (see
Section 53.2.2, especially Definition D53.32 on page 661) and the standard devi-
ation (see Section 53.2.3, especially Section 53.2.3.2 on page 663) of the achieved
objective values in the ten runs. [10 points]
(c) The three objective functions are designed so that their value is always in [0, 1] in the interval [−10, 10]^n. Try to order the optimization problems according to the median of the achieved objective values. Which problems seem to be the hardest? What is the influence of the dimension n on the achieved objective values? [10 points]
(d) If you compare f_1 and f_2, you find striking similarities. Does one of the functions seem to be harder? If so, which one, and what, in your opinion, makes it harder?
[5 points]
[45 points]
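If you want to tackle this task, the three benchmark functions translate directly into code. The following sketch gives one possible Java rendering of Equations 26.1 to 26.3; the class and method names are invented for this example.

/** Possible Java renderings of the benchmark functions from Equations 26.1 to 26.3. */
public final class TaskFunctions {

  /** f1(x) = 0.5 - (1/(2n)) * sum_i cos(ln(1 + |x_i|)) */
  public static double f1(double[] x) {
    double sum = 0.0;
    for (double xi : x) {
      sum += Math.cos(Math.log(1.0 + Math.abs(xi)));
    }
    return 0.5 - sum / (2.0 * x.length);
  }

  /** f2(x) = 0.5 - 0.5 * cos(ln(1 + sum_i |x_i|)) */
  public static double f2(double[] x) {
    double sum = 0.0;
    for (double xi : x) {
      sum += Math.abs(xi);
    }
    return 0.5 - 0.5 * Math.cos(Math.log(1.0 + sum));
  }

  /** f3(x) = 0.01 * max_i x_i^2 */
  public static double f3(double[] x) {
    double max = 0.0;
    for (double xi : x) {
      max = Math.max(max, xi * xi);
    }
    return 0.01 * max;
  }

  private TaskFunctions() { }
}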
65. Implement the Hill Climbing procedure Algorithm 26.1 on page 230 in a programming
language different from Java. You can specialize your implementation for only solving
continuous numerical problems, i. e., you do not need to follow the general, interface-
based approach we adhere to in Listing 56.6 on page 735. Apply your implementation
to the Sphere function (see Section 50.3.1.1 on page 581 and Listing 57.1 on page 859)
for n = 10 dimensions in order to test its functionality.
[20 points]
66. Extend the Hill Climbing algorithm implementation given in Listing 56.6 on page 735
to Hill Climbing with restarts, i. e., to a Hill Climbing which restarts the search pro-
cedure every n steps (while preserving the results of the optimization process in an
additional variable). You may use the discussion in Section 26.5 on page 232 as refer-
ence. However, you do not need to provide the multi-objective optimization capabilities
of Algorithm 26.3 on page 233. Instead, only extend the single-objective Hill Climbing
method in Listing 56.6 on page 735 or provide a completely new implementation of
Hill Climbing with restarts in a programming language of your choice. Test your im-
plementation on one of the previously discussed benchmark functions for real-valued,
continuous optimization (see, e.g., Task 64).
[25 points]
67. Let us now solve a modified version of the bin packing problem from Example E1.1 on page 25. In Table 26.5, we give four datasets from Data Set 1 in [2422]. Each dataset stands for one bin packing problem and provides the bin size b, the number n of objects to be put into the bins, and the sizes a_i of the objects (with i ∈ 1..n). The problem space is the set of all possible assignments of objects to bins, where the number k of bins is not provided. For each of the four bin packing problems, find a candidate solution x which contains an assignment of objects to as few bins as possible of the given size b. In other words: For each problem, we do know the size b of the bin, the number n of objects, and the object sizes a. We want to find out how many bins we need at least to store all the objects (k ∈ N_1, the objective, subject to minimization) and how we can store (x ∈ X) the objects in these bins.
Name [2422] | b | n | a_i | k⋆
N1C1W1_A | 100 | 50 | 50, 100, 99, 99, 96, 96, 92, 92, 91, 88, 87, 86, 85, 76, 74, 72, 69, 67, 67, 62, 61, 56, 52, 51, 49, 46, 44, 42, 40, 40, 33, 33, 30, 30, 29, 28, 28, 27, 25, 24, 23, 22, 21, 20, 17, 14, 13, 11, 10, 7, 7, 3 | 25
N1C2W2_D | 120 | 50 | 50, 120, 99, 98, 98, 97, 96, 90, 88, 86, 82, 82, 80, 79, 76, 76, 76, 74, 69, 67, 66, 64, 62, 59, 55, 52, 51, 51, 50, 49, 44, 43, 41, 41, 41, 41, 41, 37, 35, 33, 32, 32, 31, 31, 31, 30, 29, 23, 23, 22, 20, 20 | 24
N2C1W1_A | 100 | 100 | 100, 100, 99, 97, 95, 95, 94, 92, 91, 89, 86, 86, 85, 84, 80, 80, 80, 80, 80, 79, 76, 76, 75, 74, 73, 71, 71, 69, 65, 64, 64, 64, 63, 63, 62, 60, 59, 58, 57, 54, 53, 52, 51, 50, 48, 48, 48, 46, 44, 43, 43, 43, 43, 42, 41, 40, 40, 39, 38, 38, 38, 38, 37, 37, 37, 37, 36, 35, 34, 33, 32, 30, 29, 28, 26, 26, 26, 24, 23, 22, 21, 21, 19, 18, 17, 16, 16, 15, 14, 13, 12, 12, 11, 9, 9, 8, 8, 7, 6, 6, 5, 1 | 48
N2C3W2_C | 150 | 100 | 100, 150, 100, 99, 99, 98, 97, 97, 97, 96, 96, 95, 95, 95, 94, 93, 93, 93, 92, 91, 89, 88, 87, 86, 84, 84, 83, 83, 82, 81, 81, 81, 78, 78, 75, 74, 73, 72, 72, 71, 70, 68, 67, 66, 65, 64, 63, 63, 62, 60, 60, 59, 59, 58, 57, 56, 56, 55, 54, 51, 49, 49, 48, 47, 47, 46, 45, 45, 45, 45, 44, 44, 44, 44, 43, 41, 41, 40, 39, 39, 39, 37, 37, 37, 35, 35, 34, 32, 31, 31, 30, 28, 26, 25, 24, 24, 23, 23, 22, 21, 20, 20 | 41
Table 26.5: Bin Packing test data from [2422], Data Set 1.
Solve this optimization problem with Hill Climbing. It is not the goal to solve it to optimality, but to get solutions with Hill Climbing which are as good as possible. For the sake of completeness and in order to allow comparisons, we provide the lowest known number k⋆ of bins for the problems given by Scholl and Klein [2422] in the last column of Table 26.5.
In order to solve this optimization problem, you may proceed as discussed in Chapter 7 on page 109, except that you have to use Hill Climbing. Below, we give some suggestions – but you do not necessarily need to follow them:
(a) A possible problem space X for n objects would be the set of all permutations of the first n natural numbers (X = Π(1..n)). A candidate solution x ∈ X, i. e., such a permutation, could describe the order in which the objects have to be put into bins. If a bin is filled to its capacity (or the current object does not fit into the bin anymore), a new bin is needed. This process is repeated until all objects have been packed into bins.
(b) For the objective function, there exist many different choices.
i. As objective function f_1, the number k(x) of bins required for storing the objects according to a permutation x ∈ X could be used. This value can easily be computed as discussed in our outline of the problem space X above. You can find an example implementation of this function in Listing 57.10 on page 892; a small illustrative decoding sketch is also given after this task.
ii. Alternatively, we could also sum up the squares of the free space left over in each bin as f_2. It is easy to see that if we put each object into a single bin, the free capacity left over is maximal. If there was a way to perfectly fill each bin with objects, the free capacity would become 0. Such a configuration would also require the least number of bins, since using fewer bins would mean not storing all objects. By summing up the squares of the free spaces, we punish bins with lots of free space more severely. In other words, we reward solutions where the bins are filled equally. Naturally, one would "feel" that such solutions are "good". However, this approach may prevent us from finding the optimum if it is a solution where the objects are distributed unevenly. We implement this objective function as Listing 57.11 on page 893.
iii. Also, a combination of the above would be possible, such as f_3(x) = f_1(x) − 1/(1 + f_2(x)), for example. An example implementation of this function is given as Listing 57.12 on page 894.
iv. Then again, we could consider the total number of bins required but additionally try to minimize the space lastWaste(x) left over in the last bin, i. e., set f_4(x) = f_1(x) + 1/(1 + lastWaste(x)). The idea here would be that the dominating factor of the objective should still be the number of bins required. But, if two solutions require the same number of bins, we prefer the one where more space is free in the last bin. If this waste in the last bin is increased more and more, this means that the volume of objects in that bin is reduced. It may eventually become 0, which would equal leaving the bin empty. The empty bin would not be used and hence, we would be saving one bin. An implementation of such a function for the suggested problem space is given as Listing 57.13 on page 896.
v. Finally, instead of only considering the remaining empty space in the last bin, we could consider the maximum empty space over all bins. The idea is pretty similar: If we have a bin which is almost empty, maybe we can make it disappear with just a few moves (object swaps). Hence, we could prefer solutions where there is one bin with much free space. In order to find such solutions, we go over all the bins required by the solution and check how much space is free in them. We then take the one bin with the most free space mf and add this term to the bin count in the objective function, similar to what we did in the previous function: f_5(x) = f_1(x) + 1/(1 + mf(x)). An implementation of this function for the suggested problem space is given as Listing 57.14 on page 897.
(c) If the suggested problem space is used, it could also be used as search space,
i. e., G = X. Then, a simple unary search operation (mutation) would swap the
elements of the permutation such as those provided in Listing 56.38 on page 806
and Listing 56.39. A creation operation could draw a permutation according to
the uniform distribution as shown in Listing 56.37.
Use the Hill Climbing Algorithm 26.1 given in Listing 56.6 on page 735 for solving the
task and compare its results with the results produced by equivalent random walks
and random sampling (see Chapter 8 and Section 56.1.2 on page 729). For your
experiments, you may use the program given in Listing 57.15 on page 899.
(a) For each algorithm or objective function you test, perform at least 30 independent
runs to get reliable results (Remember: our Hill Climbing algorithm is randomized
and can produce different results in each run.)
(b) For each of the four problems, write down the best assignment of objects to bins that you have found (along with the number of bins required). You may add your own ideas as well.
(c) Discuss which of the objective functions and which algorithm or algorithm configuration performed best, and give reasons.
(d) Also provide all your source code.
[40 points]
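To illustrate suggestion (a) together with objective function f_1, the following sketch decodes a permutation of object indices into bins by packing the objects in the given order and opening a new bin whenever the current object does not fit, then counts the bins used. It is a simplified, hypothetical stand-in for Listing 57.10; the method names are invented for this example.

/** Decoding a permutation into bins and counting them (simplified stand-in for Listing 57.10). */
public final class BinPackingObjective {

  /**
   * f1: number of bins needed when the objects are packed in the order given by
   * the permutation, opening a new bin whenever the current object does not fit.
   *
   * @param permutation a permutation of 0..n-1 (the candidate solution x)
   * @param sizes       the object sizes a_i
   * @param binSize     the bin capacity b
   */
  public static int countBins(int[] permutation, int[] sizes, int binSize) {
    int bins = 0;
    int remaining = 0;                       // free space in the currently open bin
    for (int index : permutation) {
      int size = sizes[index];
      if (size > remaining) {                // object does not fit: open a new bin
        bins++;
        remaining = binSize;
      }
      remaining -= size;
    }
    return bins;
  }

  private BinPackingObjective() { }
}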
Chapter 27
Simulated Annealing
27.1 Introduction
In 1953, Metropolis et al. [1874] developed a Monte Carlo method for “calculating the
properties of any substance which may be considered as composed of interacting individual
molecules”. With this new “Metropolis” procedure stemming from statistical mechanics,
the manner in which metal crystals reconfigure and reach equilibrium in the process of annealing can be simulated, for example. This inspired Kirkpatrick et al. [1541] to develop the Simulated Annealing¹ (SA) algorithm for Global Optimization in the early 1980s and to apply it to various combinatorial optimization problems. Independently, around the same time, Černý [516] applied a similar approach to the Traveling Salesman Problem. Even earlier, Jacobs et al. [1426, 1427] used a similar method to optimize program code. Simulated Annealing is a general-purpose optimization algorithm that can be applied to arbitrary search and problem spaces. Like simple Hill Climbing, Simulated Annealing only needs a single initial individual as starting point and a unary search operation and thus belongs to the family of local search algorithms.
27.1.1 Metropolis Simulation
In metallurgy and material science, annealing² [1307] is a heat treatment of a material with the goal of altering properties such as its hardness. Metal crystals have small defects, dislocations of atoms which weaken the overall structure, as sketched in Figure 27.1. It should be noted that Figure 27.1 is a very rough sketch and the following discussion is very rough and simplified, too.³ In order to decrease the defects, the metal is heated to, for instance, 0.4 times its melting temperature, i. e., well below its melting point but usually already a few hundred degrees Celsius. Adding heat means adding energy to the atoms in the metal.
Due to the added energy, the electrons are freed from the atoms in the metal (which now
become ions). Also, the higher energy of the ions makes them move and increases their
diffusion rate. The ions may now leave their current position and assume better positions.
During this process, they may also temporarily reside in configurations of higher energy, for
example if two ions get close to each other. The movement decreases as the temperature
of the metal sinks. The structure of the crystal is reformed as the material cools down and
¹ http://en.wikipedia.org/wiki/Simulated_annealing [accessed 2007-07-03]
² http://en.wikipedia.org/wiki/Annealing_(metallurgy) [accessed 2008-09-19]
³ I assume that a physicist or chemist would beat me for these simplifications. If you are a physicist or chemist and have suggestions for improving/correcting the discussion, please drop me an email.
Figure 27.1: A sketch of annealing in metallurgy as role model for Simulated Annealing. (The sketch shows the temperature T over time t, from T = getTemperature(1) down to 0 K = getTemperature(∞), alongside three stages of the physical role model: a metal structure with defects, increased movement of the atoms, and ions in a low-energy structure with fewer defects.)
approaches its equilibrium state. During this process, the dislocations in the metal crystal
can disappear. When annealing metal, the initial temperature must not be too low and the cooling must be done sufficiently slowly so as to avoid stopping the ions too early and to prevent the system from getting stuck in a meta-stable, non-crystalline state representing a local minimum of energy.
In the Metropolis simulation of the annealing process, each set of positions pos of all atoms of a system is weighted by its Boltzmann probability factor e^(−E(pos)/(k_B · T)), where E(pos) is the energy of the configuration pos, T is the temperature measured in Kelvin, and k_B is the Boltzmann constant⁴:
k_B ≈ 1.380 650 524 · 10^−23 J/K   (27.1)
The Metropolis procedure is a model of this physical process which could be used to simulate a collection of atoms in thermodynamic equilibrium at a given temperature. In each iteration, a new nearby geometry pos_{i+1} is generated as a random displacement from the current geometry pos_i of the atoms. The energy of the resulting new geometry is computed and ∆E, the energetic difference between the current and the new geometry, is determined. The probability P(∆E) that this new geometry is accepted is defined in Equation 27.3.
∆E = E(pos_{i+1}) − E(pos_i)   (27.2)
P(∆E) = e^(−∆E/(k_B · T)) if ∆E > 0, and P(∆E) = 1 otherwise   (27.3)
Systems in chemistry and physics always "strive" to attain low-energy states, i. e., the lower the energy, the more stable the system is. Thus, if the new nearby geometry has a lower energy level, the change of atom positions is accepted in the simulation. Otherwise, a uniformly distributed random number r = randomUni[0, 1) is drawn and the step will only be realized if r is less than or equal to the Boltzmann probability factor, i. e., r ≤ P(∆E). At high temperatures T, this factor is very close to 1, leading to the acceptance of many uphill steps.
⁴ http://en.wikipedia.org/wiki/Boltzmann%27s_constant [accessed 2007-07-03]
As the temperature falls, the proportion of steps accepted which would increase the energy
level decreases. Now the system will not escape local regions anymore and (hopefully) comes
to a rest in the global energy minimum at temperature T = 0K.
27.2 Algorithm
The abstraction of the Metropolis simulation method in order to allow arbitrary problem spaces is straightforward: the energy computation E(pos_i) is replaced by an objective function f or maybe even by the result v of a fitness assignment process. Algorithm 27.1 illustrates the basic course of Simulated Annealing, which you can find implemented in Listing 56.8 on page 740. Without loss of generality, we reuse the definitions from Evolutionary Algorithms for the search operations and set Op = {create, mutate}.
Algorithm 27.1: x ←− simulatedAnnealing(f)
  Input: f: the objective function to be minimized
  Data: p_new: the newly generated individual
  Data: p_cur: the point currently investigated in problem space
  Data: T: the temperature of the system which is decreased over time
  Data: t: the current time index
  Data: ∆E: the energy (objective value) difference of p_cur.x and p_new.x
  Output: x: the best element found
 1 begin
 2   p_cur.g ←− create()
 3   p_cur.x ←− gpm(p_cur.g)
 4   x ←− p_cur.x
 5   t ←− 1
 6   while ¬terminationCriterion() do
 7     T ←− getTemperature(t)
 8     p_new.g ←− mutate(p_cur.g)
 9     p_new.x ←− gpm(p_new.g)
10     ∆E ←− f(p_new.x) − f(p_cur.x)
11     if ∆E ≤ 0 then
12       p_cur ←− p_new
13       if f(p_cur.x) < f(x) then x ←− p_cur.x
14     else
15       if randomUni[0, 1) < e^(−∆E/T) then p_cur ←− p_new
16     t ←− t + 1
17   return x
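A compact Java rendering of Algorithm 27.1 for the common case G = X ⊆ R^n could look as follows. It is not the implementation from Listing 56.8; the exponential temperature schedule from Section 27.3.2, the Gaussian mutation, and the fixed step budget are assumptions made for this sketch.

import java.util.Random;

/** A minimal Simulated Annealing sketch for G = X = R^n (illustrative, not Listing 56.8). */
public class SimpleSimulatedAnnealing {

  /** Hypothetical objective interface: smaller values are better. */
  public interface ObjectiveFunction {
    double compute(double[] x);
  }

  private final Random random = new Random();

  public double[] anneal(ObjectiveFunction f, double[] start,
                         double startTemperature, double epsilon, int maxSteps) {
    double[] current = start.clone();
    double currentValue = f.compute(current);
    double[] best = current.clone();
    double bestValue = currentValue;

    for (int t = 1; t <= maxSteps; t++) {
      // exponential schedule: T = (1 - epsilon)^t * Ts  (Equation 27.17)
      double temperature = Math.pow(1.0 - epsilon, t) * startTemperature;

      // unary search operation: Gaussian perturbation of one dimension
      double[] candidate = current.clone();
      int i = random.nextInt(candidate.length);
      candidate[i] += random.nextGaussian();

      double deltaE = f.compute(candidate) - currentValue;
      if (deltaE <= 0.0 || random.nextDouble() < Math.exp(-deltaE / temperature)) {
        current = candidate;                  // accept improvements and, sometimes, worse points
        currentValue += deltaE;
        if (currentValue < bestValue) {       // remember the best point ever visited
          best = current.clone();
          bestValue = currentValue;
        }
      }
    }
    return best;
  }
}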
Let us discuss in a very simplified way how this algorithm works. During the course of the optimization process, the temperature T decreases (see Section 27.3). If the temperature is high in the beginning, this means that, different from a Hill Climbing process, Simulated Annealing also accepts many worse individuals. Better candidate solutions are always accepted. In summary, the chance that the search will transcend to a new candidate solution, regardless of whether it is better or worse, is high. In its behavior, Simulated Annealing is then a bit similar to a random walk. As the temperature decreases, the chance that worse individuals are accepted decreases as well. If T approaches 0, this probability becomes 0 too and Simulated Annealing becomes a pure Hill Climbing algorithm. The rationale of having an algorithm that initially behaves like a random walk and then becomes a Hill Climber is to make use of the benefits of the two extreme cases:
A random walk is a maximally-explorative and minimally-exploitive algorithm. It can
never get stuck at a local optimum and will explore all areas in the search space with
the same probability. Yet, it cannot exploit a good solution, i. e., is unable to make small
modifications to an individual in a targeted way in order to check if there are better solutions
nearby.
A Hill Climber, on the other hand, is minimally-explorative and maximally-exploitive. It
always tries to modify and improve on the current solution and never investigates far-away
areas of the search space.
Simulated Annealing combines both properties: it performs a lot of exploration in order to discover interesting areas of the search space. The hope is that, during this time, it will discover the basin of attraction of the global optimum. The more the optimization process becomes like a Hill Climber, the less likely it is to move away from a good solution to a worse one; instead, it will concentrate on exploiting that solution.
27.2.1 No Boltzmann’s Constant
Notice the difference between Line 15 in Algorithm 27.1 and Equation 27.3 on page 244:
Boltzmann’s constant has been removed. The reason is that with Simulated Annealing, we
do not want to simulate a physical process but instead, aim to solve an optimization problem.
The nominal value of Boltzmann's constant, given in Equation 27.1 (something around 1.4 · 10^−23), has quite a heavy impact on the Boltzmann probability. In a combinatorial problem, where the differences in the objective values between two candidate solutions are often discrete, this becomes obvious: In order to achieve an acceptance probability of 0.001 for a solution which is 1 worse than the best known one, one would need a temperature T of:
P = 0.001 = e^(−1/(k_B · T))   (27.4)
ln 0.001 = −1/(k_B · T)   (27.5)
−1/ln 0.001 = k_B · T   (27.6)
−1/(k_B · ln 0.001) = T   (27.7)
T ≈ 1.05 · 10^22 K   (27.8)
Making such a setting, i. e., using a nominal value on the scale of 10^22 or more, for such a problem is not obvious and hard to justify. Leaving the constant k_B away leads to
P = 0.001 = e^(−1/T)   (27.9)
ln 0.001 = −1/T   (27.10)
−1/ln 0.001 = T   (27.11)
T ≈ 0.145   (27.12)
In other words, a temperature of around 0.15 would be sufficient to achieve an acceptance probability of around 0.001. Such a setting is much easier to grasp and allows the algorithm user to relate objective values, objective value differences, and probabilities in a more straightforward way. Since "physical correctness" is not a goal in optimization (while finding good solutions in an easy way is), k_B is thus eliminated from the computations in the Simulated Annealing algorithm.
If you take a close look at Equation 27.8 and Equation 27.12, you will spot another difference: Equation 27.8 contains the unit K whereas Equation 27.12 does not. The reason is that the absence of k_B in Equation 27.12 removes the physical context and hence, using
the unit Kelvin is no longer appropriate. Technically, we should also no longer speak about
temperature, but for the sake of simplicity and compliance with literature, we will keep that
name during our discussion of Simulated Annealing.
27.2.2 Convergence
It has been shown that Simulated Annealing algorithms with appropriate cooling strategies
will asymptotically converge to the global optimum [1403, 2043]. The probability that
the individual p_cur as used in Algorithm 27.1 represents a global optimum approaches 1 for t → ∞. This feature is closely related to the property of ergodicity⁵ of an optimization algorithm:
Definition D27.1 (Ergodicity). An optimization algorithm is ergodic if its result x ∈ X
(or its set of results X ⊆ X) does not depend on the point(s) in the search space at which
the search process is started.
Nolte and Schrader [2043] and van Laarhoven and Aarts [2776] provide lists of the most
important works on the issue of guaranteed convergence to the global optimum. Nolte and
Schrader [2043] further list the research providing deterministic, non-infinite boundaries
for the asymptotic convergence by Anily and Federgruen [111], Gidas [1054], Nolte and
Schrader [2042], and Mitra et al. [1917]. In the same paper, they introduce a significantly
lower bound. They conclude that, in general, p_cur.x in the Simulated Annealing process is a globally optimal candidate solution with a high probability after a number of iterations which exceeds the cardinality of the problem space. This is a boundary which is, well, slow from the viewpoint of a practitioner [1403]: In other words, it would be faster to enumerate all possible candidate solutions in order to find the global optimum with certainty than applying Simulated Annealing. Indeed, this makes sense: If this was not the case, Simulated Annealing could be used to solve NP-complete problems in sub-exponential time. . .
This does not mean that Simulated Annealing is generally slow. It only needs that much
time if we insist on the convergence to the global optimum. Speeding up the cooling process
will result in a faster search, but on the other hand voids the guaranteed convergence. Such sped-up algorithms are called Simulated Quenching (SQ) [1058, 1402, 2405]. Quenching⁶, too, is a process in metallurgy. Here, a work piece is first heated and then cooled rapidly, often by dipping it into a water bath. This way, steel objects can be hardened, for example.
27.3 Temperature Scheduling
Definition D27.2 (Temperature Schedule). The temperature schedule defines how the
temperature parameter T in the Simulated Annealing process (as defined in Algorithm 27.1)
is set. The operator getTemperature : N_1 → R^+ maps the current iteration index t to a (positive) real temperature value T.
getTemperature(t) ∈ [0, +∞) ∀ t ∈ N_1   (27.13)
⁵ http://en.wikipedia.org/wiki/Ergodicity [accessed 2010-09-26]
⁶ http://en.wikipedia.org/wiki/Quenching [accessed 2010-10-11]
Equation 27.13 is provided as a Java interface in Listing 55.12 on page 717. As already
mentioned, the temperature schedule has major influence on the probability that the Sim-
ulated Annealing algorithm will find the global optimum. For “getTemperature”, a few
general rules hold. All schedules start with a temperature T
s
which is greater than zero.
getTemperature(1) = T
s
> 0 (27.14)
If the number of iterations t approaches infinity, the temperature should approach 0. If the
temperature sinks too fast, the ergodicity of the search may be lost and the global optimum
may not be discovered. In order to avoid that the resulting Simulated Quenching algorithm
gets stuck at a local optimum, random restarting can be applied, which already has been
discussed in the context of Hill Climbing in Section 26.5 on page 232.
lim
t→+∞
getTemperature(t) = 0 (27.15)
There exists a wide range of methods to determine this temperature schedule. Miki et al. [1896], for example, used Genetic Algorithms for this purpose. The most established methods are listed below and implemented in Paragraph 56.1.3.2.1; some examples of them are sketched in Figure 27.2.
Figure 27.2: Different temperature schedules for Simulated Annealing. (The plot shows the temperature, between T_s and about 0.2 T_s, over the iterations t = 0..100 for linear scaling (i. e., polynomial with α = 1), polynomial scaling with α = 2, logarithmic scaling, and exponential scaling with ε = 0.05 and ε = 0.025.)
27.3.1 Logarithmic Scheduling
Scale the temperature logarithmically, where T_s is set to a value larger than the greatest difference between the objective value of a local minimum and its best neighboring candidate solution. This value may, of course, be estimated since it usually is not known [1167, 2043]. An implementation of this method is given in Listing 56.9 on page 743.
getTemperature_ln(t) = T_s if t < e, and T_s / ln t otherwise   (27.16)
27.3.2 Exponential Scheduling
Reduce T to (1 − ε)T in each step (or every m steps, see Equation 27.19), where the exact value of 0 < ε < 1 is determined by experiment [2808]. Such an exponential cooling strategy will turn Simulated Annealing into Simulated Quenching [1403]. This strategy is implemented in Listing 56.10 on page 744.
getTemperature_exp(t) = (1 − ε)^t · T_s   (27.17)
27.3.3 Polynomial and Linear Scheduling
T is polynomially scaled between T_s and 0 for t̆ iterations according to Equation 27.18 [2808]. α is a constant, maybe 1, 2, or 4, that depends on the positions and objective values of the local minima. Large values of α will spend more iterations at lower temperatures. For α = 1, the schedule proceeds in linear steps.
getTemperature_poly(t) = (1 − t/t̆)^α · T_s   (27.18)
27.3.4 Adaptive Scheduling
Instead of basing the temperature solely on the iteration index t, other information may be used for self-adaptation as well. After every m moves, the temperature T can be set to β times ∆E_c = f(p_cur.x) − f(x), where β is an experimentally determined constant, f(p_cur.x) is the objective value of the currently examined candidate solution p_cur.x, and f(x) is the objective value of the best phenotype x found so far. Since ∆E_c may be 0, we limit the temperature change to a maximum of T · γ with 0 < γ < 1. [2808]
27.3.5 Larger Step Widths
If the temperature reduction should not take place continuously but only every m ∈ N₁ iterations, the value t′ computed according to Equation 27.19 can be used in the previous formulas, i. e., getTemperature(t′) instead of getTemperature(t).

t′ = m ⌊t/m⌋    (27.19)
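In Java, for non-negative integers, Equation 27.19 reduces to a plain integer division. The helper below is a minimal illustrative fragment, not taken from the book's listings:

// t' = m * floor(t/m); for non-negative int arguments, Java's integer division already floors
static int largerStepIndex(final int t, final int m) {
  return m * (t / m); // pass this value to getTemperature instead of t
}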
27.4 General Information on Simulated Annealing
27.4.1 Applications and Examples
Table 27.1: Applications and Examples of Simulated Annealing.
Area References
Combinatorial Problems [190, 308, 344, 382, 516, 667, 771, 1403, 1454, 1455, 1486,
1541, 1550, 1896, 2043, 2105, 2171, 2640, 2770, 2901, 2966]
Computer Graphics [433, 1043, 1403, 2707]
Data Mining [1403]
Databases [771, 2171]
Distributed Algorithms and Systems [616, 1926, 2058, 2068, 2631, 2632, 2901, 2994, 3002]
Economics and Finances [1060, 1403]
Engineering [437, 1058, 1059, 1403, 1486, 1541, 1550, 1926, 2058, 2250,
2405, 3002]
Function Optimization [1018, 1071, 2180, 2486]
Graph Theory [63, 344, 616, 1454, 1455, 1486, 1926, 2058, 2068, 2237, 2631,
2632, 2793, 3002]
Logistics [190, 382, 516, 667, 1403, 1550, 1896, 2043, 2770, 2966]
Mathematics [111, 227, 1043, 1674, 2180]
Military and Defense [1403]
Physics [1403]
Security [1486]
Software [1486, 1550, 2901, 2994]
Wireless Communication [63, 154, 1486, 2237, 2250, 2793]
27.4.2 Books
1. Genetic Algorithms and Simulated Annealing [694]
2. Simulated Annealing: Theory and Applications [2776]
3. Electromagnetic Optimization by Genetic Algorithms [2250]
4. Facts, Conjectures, and Improvements for Simulated Annealing [2369]
5. Simulated Annealing for VLSI Design [2968]
6. Introduction to Stochastic Search and Optimization [2561]
7. Simulated Annealing [2966]
27.4.3 Conferences and Workshops
Table 27.2: Conferences and Workshops on Simulated Annealing.
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2 on page 134.
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
27.4.4 Journals
1. Journal of Heuristics published by Springer Netherlands
Tasks T27
68. Perform the same experiments as in Task 64 on page 238 with Simulated Annealing.
To this end, use the Java Simulated Annealing implementation given in Listing 56.8 on
page 740 or develop your own equivalent implementation.
[45 points]
69. Compare your results from Task 64 on page 238 and Task 68. What are the differences?
What are the similarities? Did you use different experimental settings? If so, what
happens if you use the same experimental settings for both tasks? Try to give reasons
for your results.
[10 points]
70. Perform the same experiments as in Task 67 on page 238 with Simulated Annealing and
Simulated Quenching. To this end, use the Java Simulated Annealing implementation
given in Listing 56.8 on page 740 or develop your own equivalent implementation.
[25 points]
71. Compare your results from Task 67 on page 238 and Task 70. What are the differences?
What are the similarities? Did you use different experimental settings? If so, what
happens if you use the same experimental settings for both tasks? Try to give reasons
for your results. Put your explanations into the context of the discussions provided
in Example E17.4 on page 181.
[10 points]
72. Try solving the bin packing problem again, this time by using the representation given
by Khuri, Schütz, and Heitkötter in [1534]. Use both Simulated Annealing and Hill
Climbing to evaluate the performance of the new representation, similar to what we
do in Section 57.1.2.1 on page 887. Compare the results with the results listed in that
section and obtained by you in Task 70. Which method is better? Provide reasons for
your observation.
[50 points]
73. Compute the probabilities that a new solution with ∆E_1 = 0, ∆E_2 = 0.001, ∆E_3 = 0.01, ∆E_4 = 0.1, ∆E_5 = 1.0, ∆E_6 = 10.0, and ∆E_7 = 100 is accepted by the Simulated Annealing algorithm at the temperatures T_1 = 0, T_2 = 1, T_3 = 10, and T_4 = 100. Use the same formulas used in Algorithm 27.1 (consider the discussion in Section 27.2.1).
[15 points]
74. Compute the temperatures needed so that solutions with ∆E_1 = 0, ∆E_2 = 0.001, ∆E_3 = 0.01, ∆E_4 = 0.1, ∆E_5 = 1.0, ∆E_6 = 10.0, and ∆E_7 = 100 are accepted by the Simulated Annealing algorithm with probabilities P_1 = 0, P_2 = 0.1, P_3 = 0.25, and P_4 = 0.5. Base your calculations on the formulas used in Algorithm 27.1 and consider the discussion in Section 27.2.1.
[15 points]
Chapter 28
Evolutionary Algorithms
28.1 Introduction
Definition D28.1 (Evolutionary Algorithm). Evolutionary Algorithms¹ (EAs) are
population-based metaheuristic optimization algorithms which use mechanisms inspired by
the biological evolution, such as mutation, crossover, natural selection, and survival of the
fittest, in order to refine a set of candidate solutions iteratively in a cycle. [167, 171, 172]
Evolutionary Algorithms have not been invented in a canonical form. Instead, they are a
large family of different optimization methods which have, to a certain degree, been de-
veloped independently by different researchers. Information on their historic development
is thus largely distributed amongst the chapters of this book which deal with the specific
members of this family of algorithms.
Like other metaheuristics, all EAs share the “black box” optimization character and
make only few assumptions about the underlying objective functions. Their consistent and
high performance in many different problem domains (see, e.g., Table 28.4 on page 314) and
the appealing idea of copying natural evolution for optimization made them one of the most
popular metaheuristics.
As in most metaheuristic optimization algorithms, we can distinguish between single-
objective and multi-objective variants. The latter means that multiple, possibly conflict-
ing criteria are optimized as discussed in Section 3.3. The general area of Evolutionary
Computation that deals with multi-objective optimization is called EMOO, evolutionary
multi-objective optimization and the multi-objective EAs are called, well, multi-objective
Evolutionary Algorithms.
Definition D28.2 (Multi-Objective Evolutionary Algorithm). A multi-objective Evo-
lutionary Algorithm (MOEA) is an Evolutionary Algorithm suitable to optimize multiple
criteria at once [595, 596, 737, 740, 954, 1964, 2783].
In this section, we will first outline the basic cycle in which Evolutionary Algorithms proceed
in Section 28.1.1. EAs are inspired by natural evolution, so we will compare the natural
paragons of the single steps of this cycle in detail in Section 28.1.2. Later in this chapter,
we will outline possible algorithms which can be used for the processes in EAs and finally
give some information on conferences, journals, and books on EAs as well as its applications.
1 http://en.wikipedia.org/wiki/Artificial_evolution [accessed 2007-07-03]
28.1.1 The Basic Cycle of EAs
Figure 28.1: The basic cycle of Evolutionary Algorithms: Initial Population (create an initial population of random individuals), GPM (apply the genotype-phenotype mapping and obtain the phenotypes), Evaluation (compute the objective values of the candidate solutions), Fitness Assignment (use the objective values to determine fitness values), Selection (select the fittest individuals for reproduction), Reproduction (create new individuals from the mating pool by crossover and mutation), and back to GPM.
All evolutionary algorithms proceed in principle according to the scheme illustrated in Figure 28.1, which is a specialization of the basic cycle of metaheuristics given in Figure 1.8 on
page 35. We define the basic EA for single-objective optimization in Algorithm 28.1 and the
multi-objective version in Algorithm 28.2.
1. In the first generation (t = 1), a population pop of ps individuals p is created. The
function createPop(ps) produces this population. It depends on its implementation
whether the individuals p have random genomes p.g (the usual case) or whether the
population is seeded (see Paragraph 28.1.2.2.2).
2. The genotypes p.g are translated to phenotypes p.x (see Section 28.1.2.3). This
genotype-phenotype mapping may either directly be performed to all individuals
(performGPM) in the population or can be a part of the process of evaluating the
objective functions.
3. The values of the objective functions f ∈ f are evaluated for each candidate solu-
tion p.x in pop (and usually stored in the individual records) with the operation
“computeObjectives”. This evaluation may incorporate complicated simulations and
calculations. In single-objective Evolutionary Algorithms, f = {f} and, hence, |f| = 1
holds.
4. With the objective functions, the utilities of the different features of the candidate
solutions p.x have been determined and a scalar fitness value v(p) can now be assigned
to each of them (see Section 28.1.2.5). In single-objective Evolutionary Algorithms,
v ≡ f holds.
5. A subsequent selection process selection(pop, v, mps) (or selection(pop, f, mps), if v ≡ f,
as is the case in single-objective Evolutionary Algorithms) filters out the candidate
solutions with bad fitness and allows those with good fitness to enter the mating pool
with a higher probability (see Section 28.1.2.6).
6. In the reproduction phase, offspring are derived from the genotypes p.g of the selected
individuals p ∈ mate by applying the search operations searchOp ∈ Op (which are
called reproduction operations in the context of EAs, see Section 28.1.2.7). There usu-
ally are two different reproduction operations: mutation, which modifies one genotype,
and crossover, which combines two genotypes to a new one. Whether the whole popu-
lation is replaced by the offspring or whether they are integrated into the population as
well as which individuals recombined with each other depends on the implementation
of the operation “reproducePop”. We highlight the transition between the generations
in Algorithm 28.1 and 28.2 by incrementing a generation counter t and annotating the
populations with it, i. e., pop(t).
7. If the terminationCriterion() is met, the evolution stops here. Otherwise, the algorithm continues at Point 2.
Algorithm 28.1: X ←− simpleEA(f, ps, mps)
Input: f: the objective/fitness function
Input: ps: the population size
Input: mps: the mating pool size
Data: t: the generation counter
Data: pop: the population
Data: mate: the mating pool
Data: continue: a Boolean variable used for termination detection
Output: X: the set of the best elements found
1 begin
2 t ←− 1
3 pop(t = 1) ←− createPop(ps)
4 continue ←− true
5 while continue do
6 pop(t) ←− performGPM(pop(t) , gpm)
7 pop(t) ←− computeObjectives(pop(t) , f)
8 if terminationCriterion() then
9 continue ←− false
10 else
11 mate ←− selection(pop(t) , f, mps)
12 t ←− t + 1
13 pop(t) ←− reproducePop(mate, ps)
14 return extractPhenotypes(extractBest(pop(t) , f))
Algorithm 28.1 is strongly simplified. The termination criterion terminationCriterion(), for example, is usually invoked after each and every evaluation of the objective functions. Often, we have a limit in terms of objective function evaluations (see, for instance, Section 6.3.3). Then, it is not sufficient to check terminationCriterion() after each generation, since a generation involves computing the objective values of multiple individuals. A more precise version of the basic Evolutionary Algorithm scheme (with steady-state population treatment) is specified in Listing 56.11 on page 746.
Also, the return value(s) of an Evolutionary Algorithm often is not the best individual(s) from the last generation, but the best candidate solution discovered so far. This, however, demands maintaining an archive, which can become a non-trivial task, especially in the case of multi-objective or multi-modal optimization.
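To make the data flow of Algorithm 28.1 more concrete, the following is a minimal single-objective EA skeleton in Java. It is only a sketch under simplifying assumptions and is not the book's Listing 56.11: the genotype equals the phenotype and is a plain double[], fitness equals the objective value, selection is plain truncation, and reproduction is reduced to Gaussian mutation of randomly picked parents.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/** A minimal, simplified sketch of the single-objective EA cycle of Algorithm 28.1. */
public class SimpleEA {
  private final Random random = new Random();

  /** f: objective to minimize, dim: genome length, ps: population size, mps: mating pool size. */
  public double[] run(final Function<double[], Double> f, final int dim,
      final int ps, final int mps, final int maxGenerations) {
    // step 1: create the initial population of random genotypes (genotype == phenotype here)
    List<double[]> pop = new ArrayList<>();
    for (int i = 0; i < ps; i++) {
      pop.add(this.random.doubles(dim, -1.0d, 1.0d).toArray());
    }
    double[] best = pop.get(0).clone();
    for (int t = 1; t <= maxGenerations; t++) { // termination criterion: generation limit
      // steps 2-4: evaluate the objective function; fitness == objective value (minimization)
      pop.sort(Comparator.comparingDouble(f::apply));
      if (f.apply(pop.get(0)) < f.apply(best)) {
        best = pop.get(0).clone(); // remember the best candidate solution found so far
      }
      // step 5: truncation selection fills the mating pool with the mps best individuals
      final List<double[]> mate = new ArrayList<>(pop.subList(0, mps));
      // step 6: reproduction creates ps offspring by mutating randomly chosen parents
      final List<double[]> offspring = new ArrayList<>();
      for (int i = 0; i < ps; i++) {
        final double[] child = mate.get(this.random.nextInt(mps)).clone();
        child[this.random.nextInt(dim)] += 0.1d * this.random.nextGaussian();
        offspring.add(child);
      }
      pop = offspring;
    }
    return best;
  }
}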
An example implementation of Algorithm 28.2 is given in Listing 56.12 on page 750.
28.1.2 Biological and Artificial Evolution
The concepts of Evolutionary Algorithms are strongly related to the way in which the
Algorithm 28.2: X ←− simpleMOEA(OptProb, ps, mps)
Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
Input: ps: the population size
Input: mps: the mating pool size
Data: t: the generation counter
Data: pop: the population
Data: mate: the mating pool
Data: v: the fitness function resulting from the fitness assigning process
Data: continue: a Boolean variable used for termination detection
Output: X: the set of the best elements found
1 begin
2 t ←− 1
3 pop(t = 1) ←− createPop(ps)
4 continue ←− true
5 while continue do
6 pop(t) ←− performGPM(pop(t) , gpm)
7 pop(t) ←− computeObjectives(pop(t) , f)
8 if terminationCriterion() then
9 continue ←− false
10 else
11 v ←− assignFitness(pop(t), cmp_f)
12 mate ←− selection(pop, v, mps)
13 t ←− t + 1
14 pop(t) ←− reproducePop(mate, ps)
15 return extractPhenotypes(extractBest(pop, cmp_f))
development of species in nature could proceed. We will outline the historic ideas behind
evolution and put the steps of a multi-objective Evolutionary Algorithm into this context.
28.1.2.1 The Basic Principles from Nature
In 1859, Darwin [684] published his book “On the Origin of Species”² in which he identified
the principles of natural selection and survival of the fittest as driving forces behind the
biological evolution. His theory can be condensed into ten observations and deductions
[684, 1845, 2938]:
1. The individuals of a species possess great fertility and produce more offspring than
can grow into adulthood.
2. Under the absence of external influences (like natural disasters, human beings, etc.),
the population size of a species roughly remains constant.
3. Again, if no external influences occur, the food resources are limited but stable over
time.
4. Since the individuals compete for these limited resources, a struggle for survival ensues.
5. Especially in sexually reproducing species, no two individuals are equal.
6. Some of the variations between the individuals will affect their fitness and hence, their
ability to survive.
7. Some of these variations are inheritable.
8. Individuals which are less fit are also less likely to reproduce. The fittest individuals
will survive and produce offspring with higher probability.
9. Individuals that survive and reproduce will likely pass on their traits to their offspring.
10. A species will slowly change and adapt more and more to a given environment during
this process which may finally even result in new species.
These historic ideas themselves remain valid to a large degree even today. However, some
of Darwin’s specific conclusions can be challenged based on data and methods which simply
were not available at his time. Point 10, the assumption that change happens slowly over
time, for instance, became controversial as fossils indicate that evolution seems to be “at rest”
for longer periods which alternate with short phases of sudden, fast development [473, 2778]
(see Section 41.1.2 on page 493).
28.1.2.2 Initialization
28.1.2.2.1 Biology
Until now, it is not clear how life started on Earth. It is assumed that before the first organisms emerged, an abiogenesis³, a chemical evolution, took place in which the chemical components of life formed. This process was likely divided into three steps: (1) In the first step, simple organic molecules such as alcohols, acids, purines⁴ (such as adenine and guanine), and pyrimidines⁵ (such as thymine, cytosine, and uracil) were synthesized from inorganic
2 http://en.wikipedia.org/wiki/The_Origin_of_Species [accessed 2007-07-03]
3 http://en.wikipedia.org/wiki/Abiogenesis [accessed 2009-08-05]
4 http://en.wikipedia.org/wiki/Purine [accessed 2009-08-05]
5 http://en.wikipedia.org/wiki/Pyrimidine [accessed 2009-08-05]
substances. (2) Then, simple organic molecules were combined to the basic components of complex organic molecules such as monosaccharides, amino acids⁶, pyrroles, fatty acids, and nucleotides⁷. (3) Finally, complex organic molecules emerged. Many different theories exist on how this could have taken place [176, 1300, 1698, 1713, 1837, 2084, 2091, 2432]. An important milestone in the research in this area is likely the Miller-Urey experiment⁸ [1904, 1905] where it was shown that amino acids can form from inorganic substances under conditions such as those expected to have been present in the early time of the earth. The first single-celled life most probably occurred in the Eoarchean⁹, the geological era which began 3.8 billion years ago, marking the onset of evolution.
28.1.2.2.2 Evolutionary Algorithms
Usually, when an Evolutionary Algorithm starts up, there exists no information about what
is good or what is bad. Basically, only random genes are coupled together as individuals
in the initial population pop(t = 0). This form of initialization can be considered to be
similar to the early steps of chemical and biological evolution, where it was unclear which
substances or primitive life form would be successful and nature experimented with many
different reactions and concepts.
Besides random initialization, Evolutionary Algorithms can also be seeded [1126, 1139,
1471]. If seeding is applied, a set of individuals which already are adapted to the optimization
criteria is inserted into the initial population. Sometimes, the entire first population consists
of such individuals with good characteristics only. The candidate solutions used for this may
either have been obtained manually or by a previous optimization step [1737, 2019].
The goal of seeding is to achieve faster convergence and better results in the evolu-
tionary process. However, it also raises the danger of premature convergence (see Chapter 13
on page 151), since the already-adapted candidate solutions will likely be much more com-
petitive than the random ones and thus, may wipe them out of the population. Care must
thus be taken in order to achieve the wanted performance improvement.
28.1.2.3 Genotype-Phenotype Mapping
28.1.2.3.1 Biology
TODO
Example E28.1 (Evolution of Fish).
In the following, let us discuss the evolution on the example of fish. We can consider a fish as
an instance of its heredity information, as one possible realization of the blueprint encoded
in its DNA.
28.1.2.3.2 Evolutionary Algorithms
At the beginning of every generation in an EA, each genotype p.g is instantiated as a
new phenotype p.x = gpm(p.g). Genotype-phenotype mappings are discussed in detail in
Section 28.2.
6 http://en.wikipedia.org/wiki/Amino_acid [accessed 2009-08-05]
7 http://en.wikipedia.org/wiki/Nucleotide [accessed 2009-08-05]
8 http://en.wikipedia.org/wiki/MillerUrey_experiment [accessed 2009-08-05]
9 http://en.wikipedia.org/wiki/Eoarchean [accessed 2007-07-03]
28.1.2.4 Objective Values
28.1.2.4.1 Biology
There are different criteria which determine whether an organism can survive in its environ-
ment. Often, individuals are good in some characteristics but rarely reach perfection in all
possible aspects.
Example E28.2 (Evolution of Fish (Example E28.1) Cont.).
The survival of the genes of the fish depends on how good the fish performs in the ocean.
Its fitness, however, is not only determined by one single feature of the phenotype like its
size. Although a bigger fish will have better chances to survive, size alone does not help if
it is too slow to catch any prey. Also its energy consumption in terms of calories per time
unit which are necessary to sustain its life functions should be low so it does not need to eat
all the time. Other factors influencing the fitness positively are formae like sharp teeth and
colors that blend into the environment so it cannot be seen too easily by its predators such
as sharks. If its camouflage is too good on the other hand, how will it find potential mating
partners? And if it is big, it will usually have higher energy consumption. So there may be
conflicts between the desired properties.
28.1.2.4.2 Evolutionary Algorithms
To sum it up, we could consider the life of the fish as the evaluation process of its genotype
in an environment where good qualities in one aspect can turn out as drawbacks from other
perspectives. In multi-objective Evolutionary Algorithms, this is exactly the same. For each
problem that we want to solve, we can specify one or multiple so-called objective functions
f ∈ f . An objective function f represents one feature that we are interested in.
Example E28.3 (Evolution of a Car).
Let us assume that we want to evolve a car (a pretty weird assumption, but let’s stick with
it). The genotype p.g ∈ G would be the construction plan and the phenotype p.x ∈ X the
real car, or at least a simulation of it. One objective function f_a would definitely be safety. For the sake of our children and their children, the car should also be environment-friendly, so that’s our second objective function f_b. Furthermore, a cheap price f_c, fast speed f_d, and a cool design f_e would be good. That makes five objective functions from which, for example, the second and the fourth are contradictory (f_b ≁ f_d).
28.1.2.5 Fitness
28.1.2.5.1 Biology
As stated before, different characteristics determine the fitness of an individual. The biolog-
ical fitness is a scalar value that describes the number of offspring of an individual [2542].
Example E28.4 (Evolution of Fish (Example E28.1) Cont.).
In Example E28.1, the fish genome was instantiated; in Example E28.2, it proved its physical properties in several different categories. Whether one specific fish is fit or not, however,
is still not easy to say: fitness is always relative; it depends on the environment. The fitness
of the fish is not only defined by its own characteristics but also results from the features of
the other fish in the population (as well as from the abilities of its prey and predators). If
one fish can beat another one in all categories, i. e., is bigger, stronger, smarter, and so on,
it is better adapted to its environment and thus, will have a higher chance to survive. This
relation is transitive but only forms a partial order since a fish which is strong but not very
clever and a fish that is clever but not strong may have the same probability to reproduce
and hence, are not directly comparable.
It is not clear whether a weak fish with a clever behavioral pattern is worse or better
than a really strong but less cunning one. Both traits are useful in the evolutionary process
and maybe, one fish of the first kind will sometimes mate with one of the latter and produce
an offspring which is both, intelligent and powerful.
Example E28.5 (Computer Science and Sports).
Fitness depends on the environment. In my department, I may be considered to be of average fitness. If I took a stroll over to the department of sports science, however, my relative physical suitability would decrease. Yet, my relative computer science skills would become much better.
28.1.2.5.2 Evolutionary Algorithms
The fitness assignment process in multi-objective Evolutionary Algorithms computes one
scalar fitness value for each individual p in a population pop by building a fitness function
v (see Definition D5.1 on page 94). Instead of being an a posteriori measure of the number
of offspring of an individual, it rather describes the relative quality of a candidate solution,
i. e., its ability to produce offspring [634, 2987].
Optimization criteria may contradict, so the basic situation in optimization very much
corresponds to the one in biology. One of the most popular methods for computing the fitness
is called Pareto ranking¹⁰. It performs comparisons similar to the ones we just discussed.
Some popular fitness assignment processes are given in Section 28.3.
28.1.2.6 Selection
28.1.2.6.1 Biology
Selection in biology has many different facets [2945]. Let us again assume fitness to be a
measure for the expected number of offspring, as done in Paragraph 28.1.2.5.1. How fit an
individual is then does not necessarily determine directly if it can produce offspring and, if
so, how many. There are two more aspects which have influence on this: the environment
(including the rest of the population) and the potential mating partners. These factors are
the driving forces behind the three selection steps in natural selection¹¹: (1) environmental selection¹² (or survival selection and ecological selection) which determines whether an in-
(or survival selection and ecological selection) which determines whether an in-
dividual survives to reach fertility [684], (2) sexual selection, i. e., the choice of the mating
partner(s) [657, 2300], and (3) parental selection – possible choices of the parents against
pregnancy or newborn [2300].
10 Pareto comparisons are discussed in Section 3.3.5 on page 65 and Pareto ranking is introduced in Section 28.3.3.
11 http://en.wikipedia.org/wiki/Natural_selection [accessed 2010-08-03]
12 http://en.wikipedia.org/wiki/Ecological_selection [accessed 2010-08-03]
Example E28.6 (Evolution of Fish (Example E28.1) Cont.).
An intelligent fish may be eaten by a shark, a strong one can die from a disease, and even
the most perfect fish could die from a lightning strike before ever reproducing. The process
of selection is always stochastic, without guarantees – even a fish that is small, slow, and
lacks any sophisticated behavior might survive and could produce even more offspring than
a highly fit one. The degree to which a fish is adapted to its environment therefore can
be considered as some sort of basic probability which determines its expected chances to
survive environmental selection. This is the form of selection propagated by Darwin [684] in
his seminal book “On the Origin of Species”.
The sexual or mating selection describes the fact that survival alone does not guarantee
reproduction: At least for sexually reproducing species, two individuals are involved in this
process. Hence, even a fish which is perfectly adapted to its environment will have low
chances to create offspring if it is unable to attract female interest. Sexual selection is
considered to be responsible for much of nature’s beauty, such as the colorful wings of a
butterfly or the tail of a peacock [657, 2300].
Example E28.7 (Computer Science and Sports (Example E28.5) cnt.).
For today’s noble computer scientist, survival selection is not a threatening factor. Furthermore, our relative fitness is on par with the fitness of the guys from the sports department, at least in terms of the trade-off between theoretical and physical abilities. Yet, surprisingly, the
sexual selection is less friendly to some of us. Indeed, parallels with human behavior and
peacocks which prefer mating partners with colorful feathers over others regardless of their
further qualities can be drawn.
The third selection factor, parental selection, may lead to examples for the reinforcement of certain physical characteristics which are much more extreme than Example E28.7. Rich Harris [2300]
argues that after the fertilization, i. e., before or after birth, parents may make a choice
for or against the life of their offspring. If the offspring does or is likely to exhibit certain
unwanted traits, this decision may be negative. A negative decision may also be caused by
environmental or economical decisions. A negative decision, however, may also again be
overruled after birth if the offspring turns out to possess especially favorable characteris-
tics. Rich Harris [2300], for instance, argues that parents in prehistoric European and Asian
cultures may this way have actively selected hairlessness and pale skin.
28.1.2.6.2 Evolutionary Algorithms
In Evolutionary Algorithms, survival selection is always applied in order to steer the op-
timization process towards good areas in the search space. From time to time, also a
subsequent sexual selection is used. For survival selection, the so-called selection algorithms
mate = selection(pop, v, mps) determine the mps individuals from the population pop which
should enter the mating pool mate based on their scalar fitness v. selection(pop, v, mps)
usually picks the fittest individuals and places them into mate with the highest probability.
The oldest selection scheme is called Roulette wheel and will be introduced in Section 28.4.3
on page 290 in detail. In the original version of this algorithm (intended for fitness maxi-
mization), the chance of an individual p to reproduce is proportional to its fitness v(p).
More information on selection algorithms is given in Section 28.4.
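As a small preview of the idea (a minimal sketch only, not the implementation the book presents in Section 28.4.3), fitness-proportionate selection for maximization with non-negative fitness values can be written in Java roughly like this:

import java.util.Random;

/** A minimal sketch of fitness-proportionate ("Roulette wheel") selection for maximization. */
public class RouletteWheel {
  /** Draw the index of one individual with probability proportional to its non-negative fitness. */
  public static int spin(final double[] fitness, final Random random) {
    double sum = 0.0d;
    for (final double v : fitness) {
      sum += v; // total fitness of the population
    }
    double threshold = random.nextDouble() * sum; // a uniform point on the "wheel"
    for (int i = 0; i < fitness.length; i++) {
      threshold -= fitness[i];
      if (threshold <= 0.0d) {
        return i; // the slice containing the point belongs to individual i
      }
    }
    return fitness.length - 1; // numerical fallback
  }
}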
28.1.2.7 Reproduction
28.1.2.7.1 Biology
Last but not least, there is the reproduction. For this purpose, two approaches exist in
nature: the sexual and the asexual method.
Example E28.8 (Evolution of Fish (Example E28.1) Cont.).
Fish reproduce sexually. Whenever a female fish and a male fish mate, their genes will
be recombined by crossover. Furthermore, mutations may take place which slightly al-
ter the DNA. Most often, the mutations affect the characteristics of resulting larva only
slightly [2303]. Since fit fish produce offspring with higher probability, there is a good
chance that the next generation will contain at least some individuals that have combined
good traits from their parents and perform even better than them.
Asexually reproducing species basically create clones of themselves, a process during which,
again, mutations may take place. Regardless of how they were produced, the new generation
of offspring now enters the struggle for survival and the cycle of evolution starts in another
round.
28.1.2.7.2 Evolutionary Algorithms
In Evolutionary Algorithms, individuals usually do not have such a “gender”. Yet, the
concept of recombination, i. e., of binary search operations, is one of the central ideas of this
metaheuristic. Each individual from the mating pool can potentially be recombined with
every other one. In the car example, this means that we could copy the design of the engine
of one car and place it into the construction plan of the body of another car.
The concept of mutation is present in EAs as well. Parts of the genotype (and hence,
properties of the phenotype) can randomly be changed with a unary search operation. This
way, new construction plans for new cars are generated. The common definitions for repro-
duction operations in Evolutionary Algorithms are given in Section 28.5.
28.1.2.8 Does the Natural Paragon Fit?
At this point it should be mentioned that the direct reference to Darwinian evolution in
Evolutionary Algorithms is somehow controversial. The strongest argument against this
reference is that Evolutionary Algorithms introduce a change in semantics by being goal-
driven [2772] while the natural evolution is not.
Paterson [2134] further points out that “neither GAs [Genetic Algorithms] nor GP [Ge-
netic Programming] are concerned with the evolution of new species, nor do they use natural
selection.” On the other hand, nobody would claim that the idea of selection has not been
borrowed from nature although many additions and modifications have been introduced in
favor of better algorithmic performance.
An argument against close correspondence between EAs and natural evolution concerns
the development of different species. It depends on the definition of species: According to
Wikipedia – The Free Encyclopedia [2938], “a species is a class of organisms which are very
similar in many aspects such as appearance, physiology, and genetics.” In principle, there
is some elbowroom for us and we may indeed consider even different solutions to a single
problem in Evolutionary Algorithms as members of a different species – especially if the
binary search operation crossover/recombination applied to their genomes cannot produce
another valid candidate solution.
In Genetic Algorithms, the crossover and mutation operations may be considered to be
strongly simplified models of their natural counterparts. In many advanced EAs there may
be, however, operations which work entirely differently. Differential Evolution, for instance,
features a ternary search operation (see Chapter 33). Another example is [2912, 2914], where
the search operations directly modify the phenotypes (G = X) according to heuristics and
intelligent decisions.
Furthermore, although the concept of fitness¹³ in nature is controversial [2542], it is often
considered as an a posteriori measurement. It then defines the ratio between the numbers
of occurrences of a genotype in a population after and before selection or the number of
offspring an individual has in relation to the number of offspring of another individual. In
Evolutionary Algorithms, fitness is an a priori quantity denoting a value that determines
the expected number of instances of a genotype that should survive the selection process.
However, one could conclude that biological fitness is just an approximation of the a priori
quantity arisen due to the hardness (if not impossibility) of directly measuring it.
My personal opinion (which may as well be wrong) is that the citation of Darwin here is
well motivated since there are close parallels between Darwinian evolution and Evolutionary
Algorithms. Nevertheless, natural and artificial evolution are still two different things and
phenomena observed in either of the two do not necessarily carry over to the other.
28.1.2.9 Formae and Implicit Parallelism
Let us review our introductory fish example in terms of forma analysis. Fish can, for instance,
be characterized by the properties “clever” and “strong”. For the sake of simplicity, let us
assume that both properties may be true or false for a single individual. They hence
define two formae each. A third property can be the color, for which many different possible
variations exist. Some of them may be good in terms of camouflage, others may be good in
terms of finding mating partners. Now a fish can be clever and strong at the same time, as
well as weak and green. Here, a living fish allows nature to evaluate the utility of at least
three different formae.
This issue has first been stated by Holland [1250] for Genetic Algorithms and is termed
implicit parallelism (or intrinsic parallelism). Since then, it has been studied by many dif-
ferent researchers [284, 1128, 1133, 2827]. If the search space and the genotype-phenotype
mapping are properly designed, implicit parallelism in conjunction with the crossover/re-
combination operations is one of the reasons why Evolutionary Algorithms are such a suc-
cessful class of optimization algorithms. The Schema Theorem discussed in Section 29.5.3
on page 342 gives a further explanation for this form of parallel evaluation.
28.1.3 Historical Classification
The family of Evolutionary Algorithms historically encompasses five members, as illustrated
in Figure 28.2. We will only enumerate them here in short. In-depth discussions will follow
in the corresponding chapters.
1. Genetic Algorithms (GAs) are introduced in Chapter 29 on page 325. GAs subsume
all Evolutionary Algorithms which have string-based search spaces G in which they
mainly perform combinatorial search.
2. The set of Evolutionary Algorithms which explore the space of real vectors X ⊆ Rⁿ,
often using more advanced mathematical operations, is called Evolution Strategies (ES,
often using more advanced mathematical operations, is called Evolution Strategies (ES,
see Chapter 30 on page 359). Closely related is Differential Evolution, a numerical
optimizer using a ternary search operation.
13 http://en.wikipedia.org/wiki/Fitness_(biology) [accessed 2008-08-10]
3. For Genetic Programming (GP), which will be elaborated on in Chapter 31 on page 379,
we can provide two definitions: On one hand, GP includes all Evolutionary Algorithms
that grow programs, algorithms, and the like. On the other hand, also all EAs that
evolve tree-shaped individuals are instances of Genetic Programming.
4. Learning Classifier Systems (LCS), discussed in Chapter 35 on page 457, are learning
approaches that assign output values to given input values. They often internally use
a Genetic Algorithm to find new rules for this mapping.
5. Evolutionary Programming (EP, see Chapter 32 on page 413) approaches usually fall
into the same category as Evolution Strategies, they optimize vectors of real numbers.
Historically, such approaches have also been used to derive algorithm-like structures
such as finite state machines.
[Figure 28.2 shows Evolutionary Algorithms subsuming Genetic Algorithms, Evolution Strategy, Differential Evolution, Evolutionary Programming, Learning Classifier Systems, and Genetic Programming (with the flavors GGGP, LGP, and SGP).]
Figure 28.2: The family of Evolutionary Algorithms.
The early research [718] in Genetic Algorithms (see Section 29.1 on page 325), Genetic
Programming (see Section 31.1.1 on page 379), and Evolutionary Programming (see ??
on page ??) dates back to the 1950s and 60s. Besides the pioneering work listed in these
sections, at least one other important early contribution should not go unmentioned here: The
Evolutionary Operation (EVOP) approach introduced by Box and Draper [376, 377] in the
late 1950s. The idea of EVOP was to apply a continuous and systematic scheme of small
changes in the control variables of a process. The effects of these modifications are evaluated
and the process is slowly shifted into the direction of improvement. This idea was never
realized as a computer algorithm, but Spendley et al. [2572] used it as basis for their simplex
method which then served as progenitor of the Downhill Simplex algorithm¹⁴ by Nelder and
Mead [2017]. [718, 1719] Satterthwaite’s REVOP [2407, 2408], a randomized Evolutionary
Operation approach, however, was rejected at this time [718].
Like in Section 1.3 on page 31, we here classified different algorithms according to their
semantics, in other words, corresponding to their specific search and problem spaces and
the search operations [634]. All five major approaches can be realized with the basic scheme
defined in Algorithm 28.1. To this simple structure, there exist many general improve-
ments and extensions. Often, these do not directly concern the search or problem spaces.
14 We discuss Nelder and Mead’s Downhill Simplex [2017] optimization method in Chapter 43 on page 501.
Then, they also can be applied to all members of the EA family alike. Later in this chap-
ter, we will discuss the major components of many of today’s most efficient Evolutionary
Algorithms [593]. Some of the distinctive features of these EAs are:
1. The population size and the number of populations used.
2. The method of selecting the individuals for reproduction.
3. The way the offspring is included into the population(s).
28.1.4 Populations in Evolutionary Algorithms
Evolutionary Algorithms try to simulate processes known from biological evolution in order
to solve optimization problems. Species in biology consist of many individuals which live
together in a population. The individuals in a population compete but also mate with each other, generation for generation. In EAs, it is quite similar: the population pop ∈ (G × X)^ps is a set of ps individuals p ∈ G × X where each individual p possesses heredity information (its genotype p.g) and a physical representation (the phenotype p.x).
There exist various ways in which an Evolutionary Algorithm can process its population.
Especially interesting is how the population pop(t + 1) of the next generation is formed
from (1) the individuals selected from pop(t) which are collected in the mating pool mate
and (2) the new offspring derived from them.
28.1.4.1 Extinctive/Generational EAs
If the new population only contains the offspring, we speak of extinctive selection [2015,
2478]. Extinctive selection can be compared with ecosystems of small single-cell organisms
(protozoa¹⁵) which reproduce in an asexual, fissiparous¹⁶ manner, i. e., by cell division. In this
case, of course, the elders will not be present in the next generation. Other comparisons can
partly be drawn to sexually reproducing octopi, where the female dies after protecting
the eggs until the larvae hatch, or to the black widow spider where the female devours
the male after the insemination. Especially in the area of Genetic Algorithms, extinctive
strategies are also known as generational algorithms.
Definition D28.3 (Generational). In Evolutionary Algorithms that are genera-
tional [2226], the next generation will only contain the offspring of the current one and
no parent individuals will be preserved.
Extinctive Evolutionary Algorithms can further be divided into left and right selec-
tion [2987]. In left extinctive selections, the best individuals are not allowed to reproduce in
order to prevent premature convergence of the optimization process. Conversely, the worst
individuals are not permitted to breed in right extinctive selection schemes in order to reduce
the selective pressure since they would otherwise scatter the fitness too much [2987].
28.1.4.2 Preservative EAs
In algorithms that apply a preservative selection scheme, the population is a combination of
the previous population and its offspring [169, 1461, 2335, 2772]. The biological metaphor for
such algorithms is that the lifespan of many organisms exceeds a single generation. Hence,
parent and child individuals compete with each other for survival.
Even in preservative strategies, it is not guaranteed that the best individuals will always
survive, unless truncation selection (as introduced in Section 28.4.2 on page 289) is applied.
In general, most selection methods are randomized. Even if they pick the best candidate
solutions with the highest probabilities, they may also select worse individuals.
15 http://en.wikipedia.org/wiki/Protozoa [accessed 2008-03-12]
16 http://en.wikipedia.org/wiki/Binary_fission [accessed 2008-03-12]
28.1.4.2.1 Steady-State Evolutionary Algorithms
Steady-state Evolutionary Algorithms [519, 2041, 2270, 2316, 2641], abbreviated by SSEA,
are preservative Evolutionary Algorithms which usually do not allow inferior individuals to
replace better ones in the population. Therefore, the binary search operator (crossover/re-
combination) is often applied exactly once per generation. The new offspring then replaces
the worst member of the population if and only if it has better fitness.
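A minimal sketch of this replacement policy in Java, assuming a population stored as an array with cached objective values and a minimization problem; all names are illustrative and this is not one of the book's listings:

/** A minimal sketch of the steady-state replacement step: one offspring per "generation". */
public final class SteadyStateStep {
  /** Replace the worst individual by the offspring, but only if the offspring is better (minimization). */
  public static void replaceWorstIfBetter(final double[] populationFitness,
      final double[][] population, final double[] offspring, final double offspringFitness) {
    int worst = 0;
    for (int i = 1; i < populationFitness.length; i++) {
      if (populationFitness[i] > populationFitness[worst]) {
        worst = i; // index of the currently worst (highest objective value) individual
      }
    }
    if (offspringFitness < populationFitness[worst]) { // only accept improvements
      population[worst] = offspring;
      populationFitness[worst] = offspringFitness;
    }
  }
}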
Steady-state Evolutionary Algorithms often produce better results than generational
EAs. Chafekar et al. [519], for example, introduce steady-state Evolutionary Algorithms
that are able to outperform generational NSGA-II (which you can find summarized in ??
on page ??) for some difficult problems. In experiments of Jones and Soule [1463] (primarily
focused on other issues), steady-state algorithms showed better convergence behavior in a
multi-modal landscape. Similar results have been reported by Chevreux [558] in the context
of molecule design optimization. The steady-state GENITOR has shown good behavior in
an early comparative study by Goldberg and Deb [1079]. On the other hand, with a badly
configured steady-state approach, we also run the risk of premature convergence. In our own
experiments on a real-world Vehicle Routing Problem discussed in Section 49.3 on page 536,
for instance, steady-state algorithms outperformed generational ones – but only if sharing
was applied [2912, 2914].
28.1.4.3 The µ-λ Notation
The µ-λ notation has been developed to describe the treatment of parent and offspring
individuals in Evolution Strategies (see Chapter 30) [169, 1243, 2437]. Now it is often used
in other Evolutionary Algorithms as well. Usually, this notation implies truncation selection,
i. e., the deterministic selection of the µ best individuals from a population. Then,
1. λ denotes the number of offspring (to be) created and
2. µ is the number of parent individuals, i. e., the size of the mating pool (mps = µ).
The population adaptation strategies listed below have partly been borrowed from the Ger-
man Wikipedia – The Free Encyclopedia [2938] site for Evolution Strategy¹⁷:
28.1.4.3.1 (µ +λ)-EAs
Using the reproduction operations, λ offspring are created from the µ parent individuals. From the joint set of offspring and parents, i. e., a temporary population of size ps =
λ + µ, only the µ fittest ones are kept and the λ worst ones are discarded [302, 1244]. (µ + λ)-
strategies are thus preservative. The naming (µ + λ) implies that only unary reproduction
operators are used. However, in practice, this part of the convention is often ignored [302].
28.1.4.3.2 (µ, λ)-EAs
For extinctive selection patterns, the (µ, λ)-notation is used, as introduced by Schwefel
[2436]. EAs named according to this pattern create λ ≥ µ child individuals from the µ
available genotypes. From these, they only keep the µ best individuals and discard the µ
parents as well as the λ−µ worst children [295, 2436]. Here, λ corresponds to the population
size ps.
Again, this notation stands for algorithms which employ no crossover operator but mu-
tation only. Yet, many works on (µ, λ) Evolution Strategies, still apply recombination [302].
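The difference between the plus and the comma strategy is merely which set truncation selection is applied to. The following Java sketch illustrates both under the assumption of minimization; the generic individual type and the objective accessor are assumptions made here for illustration:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** A minimal sketch of (mu + lambda) versus (mu, lambda) truncation selection (minimization). */
public final class MuLambdaSelection {
  /** (mu + lambda): parents and offspring compete; the mu best of the union survive. */
  public static <T> List<T> plusSelection(final List<T> parents, final List<T> offspring,
      final int mu, final ToDoubleFunction<T> objective) {
    final List<T> union = new ArrayList<>(parents);
    union.addAll(offspring);
    union.sort(Comparator.comparingDouble(objective));
    return new ArrayList<>(union.subList(0, mu));
  }

  /** (mu, lambda): the parents are discarded; the mu best of the lambda >= mu offspring survive. */
  public static <T> List<T> commaSelection(final List<T> offspring, final int mu,
      final ToDoubleFunction<T> objective) {
    final List<T> pool = new ArrayList<>(offspring);
    pool.sort(Comparator.comparingDouble(objective));
    return new ArrayList<>(pool.subList(0, mu));
  }
}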
28.1.4.3.3 (1 + 1)-EAs
The EA only keeps track of a single individual which is reproduced. The elder and the offspring
together form the population pop, i. e., ps = 2. From these two, only the better individual
will survive and enter the mating pool mate (mps = 1). This scheme basically is equivalent
to Hill Climbing introduced in Chapter 26 on page 229.
17
http://en.wikipedia.org/wiki/Evolutionsstrategie [accessed 2007-07-03]
28.1.4.3.4 (1, 1)-EAs
The EA creates a single offspring from a single parent. The offspring then replaces its parent.
This is equivalent to a random walk and hence, does not optimize very well, see Section 8.2
on page 114.
28.1.4.3.5 (µ + 1)-EAs
Here, the population contains µ individuals from which one is drawn randomly. This indi-
vidual is reproduced. Alternatively, two parents can be chosen and recombined [302, 2278].
From the joint set of its offspring and the current population, the least fit individual is
removed.
28.1.4.3.6 (µ/ρ, λ)-EAs
The notation (µ/ρ, λ) is a generalization of (µ, λ) used to signify the application of ρ-ary
reproduction operators. In other words, the parameter ρ is added to denote the number of
parent individuals of one offspring. Normally, only unary mutation operators (ρ = 1) are
used in Evolution Strategies, the family of EAs the µ-λ notation stems from. If (binary)
recombination is also applied (as normal in general Evolutionary Algorithms), ρ = 2 holds. A
special case of (µ/ρ, λ) algorithms is the (µ/µ, λ) Evolution Strategy [1843] which, basically,
is similar to an Estimation of Distribution Algorithm.
28.1.4.3.7 (µ/ρ +λ)-EAs
Analogously to (µ/ρ, λ)-Evolutionary Algorithms, the (µ/ρ + λ)-Evolution Strategies are
generalized (µ + λ) approaches where ρ denotes the number of parents of an offspring indi-
vidual.
28.1.4.3.8 (µ′, λ′(µ, λ)^γ)-EAs
Geyer et al. [1044, 1045, 1046] have developed nested Evolution Strategies where λ′ offspring are created and isolated for γ generations from a population of the size µ′. In each of the γ generations, λ children are created from which the fittest µ are passed on to the next generation. After the γ generations, the best individuals from each of the isolated subpopulations are propagated back to the top-level population, i. e., selected. Then, the cycle starts again with λ′ new child individuals. This nested Evolution Strategy can be more efficient than the other approaches when applied to complex multimodal fitness environments [1046, 2279].
28.1.4.4 Elitism
Definition D28.4 (Elitism). An elitist Evolutionary Algorithm [595, 711, 1691] ensures
that at least one copy of the best individual(s) of the current generation is propagated on
to the next generation.
The main advantage of elitism is that the algorithm is guaranteed to converge: once the basin of attraction of the global optimum has been discovered, the Evolutionary Algorithm will converge to that optimum. On the other hand, the risk
of converging to a local optimum is also higher. Elitism is an additional feature of Global
Optimization algorithms – a special type of preservative strategy – which is often realized
by using a secondary population (called the archive) only containing the non-prevailed in-
dividuals. This population is updated at the end of each iteration. It should be noted that
Algorithm 28.3: X ←− elitistMOEA(OptProb, ps, as, mps)
Input: OptProb: a tuple (X, f, cmp_f) of the problem space X, the objective functions f, and a comparator function cmp_f
Input: ps: the population size
Input: as: the archive size
Input: mps: the mating pool size
Data: t: the generation counter
Data: pop: the population
Data: mate: the mating pool
Data: v: the fitness function resulting from the fitness assigning process
Data: continue: a Boolean variable used for termination detection
Output: X: the set of the best elements found
1 begin
2 t ←− 1
3 archive ←− ()
4 pop(t = 1) ←− createPop(ps)
5 continue ←− true
6 while continue do
7 pop(t) ←− performGPM(pop(t) , gpm)
8 pop(t) ←− computeObjectives(pop(t) , f)
9 archive ←− updateOptimalSetN(archive, pop, cmp_f)
10 archive ←− pruneOptimalSet(archive, as)
11 if terminationCriterion() then
12 continue ←− false
13 else
14 v ←− assignFitness(pop(t), cmp_f)
15 mate ←− selection(pop, v, mps)
16 t ←− t + 1
17 pop(t) ←− reproducePop(mate, ps)
18 return extractPhenotypes(extractBest(archive, cmp_f))
elitism is usually independent of fitness and concentrates on the true objective values of the individuals. Such an archive-based elitism can also be combined with generational approaches. Algorithm 28.3 is an extension of Algorithm 28.2 on page 256 and of the basic evolutionary cycle given in Section 28.1.1:
1. The archive archive is the set of best individuals found by the algorithm. Initially, it is
the empty set ∅. Subsequently, it is updated with the function “updateOptimalSetN”
which inserts only the new, unprevailed elements from the population into it and also
removes archived individuals which become prevailed by those new optima. Algorithms
that realize such updating are defined in Section 28.6.1 on page 308.
2. If the archive becomes too large – it might theoretically contain uncountably many
individuals – “pruneOptimalSet” reduces it to a proper size, employing techniques like
clustering in order to preserve the element diversity. More about pruning can be found
in Section 28.6.3 on page 311.
3. You should also notice that both the fitness assignment and the selection process of
elitist Evolutionary Algorithms may take the archive as additional parameter. In
principle, such archive-based algorithms can also be used in non-elitist Evolutionary
Algorithms by simply replacing the parameter archive with ∅.
28.1.5 Configuration Parameters of Evolutionary Algorithms
Evolutionary Algorithms are very robust and efficient optimizers which can often also provide
good solutions for hard problems. However, their efficiency is strongly influenced by design
choices and configuration parameters. In Chapter 24, we gave some general rules-of-thumb
for the design of the representation used in an optimizer. Here we want to list the set of
configuration and setup choices available to the user of an EA.
[Figure 28.3 groups the configuration choices into: basic parameters (population size, mutation and crossover rates), archive/pruning, selection algorithm, fitness assignment process, search space and search operations, and genotype-phenotype mapping.]
Figure 28.3: The configuration parameters of Evolutionary Algorithms.
Figure 28.3 illustrates these parameters. The performance and success of an evolutionary
optimization approach applied to a problem given by a set of objective functions f and a
problem space X is defined by
1. its basic parameter settings such as the population size ps or the crossover rate cr and
mutation rate mr,
2. whether it uses an archive archive of the best individuals found and, if so, the archive
size as and which pruning technology is used to prevent it from overflowing,
3. the fitness assignment process “assignFitness” and the selection algorithm “selection”,
4. the choice of the search space G and the search operations Op, and, finally
5. the genotype-phenotype mapping “gpm” connecting the search space and the problem
space.
In ??, we go more into detail on how to state the configuration of an optimization algorithm
in order to fully describe experiments and to make them reproducible.
28.2 Genotype-Phenotype Mappings
In Evolutionary Algorithms as well as in any other optimization method, the search space
G is not necessarily the same as the problem space X. In other words, the elements g
processed by the search operations searchOp ∈ Op may differ from the candidate solutions
x ∈ X which are evaluated by the objective functions f : X → R. In such cases, a mapping
between G and X is necessary: the genotype-phenotype mapping (GPM). In Section 4.3 on
page 85 we gave a precise definition of the term GPM. Generally, we can distinguish four
types of genotype-phenotype mappings:
1. If G = X, the genotype-phenotype mapping is the identity mapping. The search
operations then directly work on the candidate solutions.
2. In many cases, direct mappings, i. e., simple functional transformations which directly
relate parts of the genotypes and phenotypes are used (see Section 28.2.1).
3. In generative mappings, the phenotypes are built step by step in a process directed by
the genotypes. Here, the genotypes serve as programs which direct the phenotypical
developments (see Section 28.2.2.1).
4. Finally, the third set of genotype-phenotype mappings additionally incorporates feed-
back from the environment or the objective functions into the developmental process
(see Section 28.2.2.2).
Which kind of genotype-phenotype mapping should be used for a given optimization task
strongly depends on the type of the task at hand. For most small-scale and many large-
scale problems, direct mappings are completely sufficient. However, especially if large-scale,
possibly repetitive or recursive structures are to be created, generative or ontogenic mappings
can be the best choice [777, 2735].
28.2.1 Direct Mappings
Direct mappings are explicit functional relationships between the genotypes and the phe-
notypes [887]. Usually, each element of the phenotype is related to one element in the
genotype [2735] or to a group of genes. Also, the algorithmic complexity of these mappings
is usually not above O(n log n), where n is the length of a genotype. Based on the
work of Devert [777], we define explicit or direct mappings as follows:
Definition D28.5 (Explicit Representation). An explicit (or direct) representation is
a representation for which a computable function exists that takes as argument the size of the genotype and gives an upper bound for the number of algorithm steps the genotype-phenotype mapping requires until the phenotype-building process has converged to a stable phenotype.
The algorithm steps mentioned in Definition D28.5 can be, for example, the write operations
to the phenotype. In the direct mappings we discuss in the remainder of this section, usually
each element of a candidate solution is written to exactly once. If bit strings are transformed
to vectors of natural numbers, for example, each element of the numerical vector is set to
one value. The same goes when scaling real vectors: the elements of the resulting vectors are
written to once. In the following, we provide some examples for direct genotype-phenotype
mappings.
Example E28.9 (Binary Search Spaces: Gray Coding).
Bit string genomes are sometimes complemented with the application of Gray coding^18 during the genotype-phenotype mapping. If the value of a natural number changes by one, an arbitrary number of bits may change in its plain binary representation: the decimal value 8 has the binary representation 1000 whereas 7 has 0111. In Gray code, only one bit changes: (one possible) Gray code representation of 8 is 1100 and 7 then corresponds to 0100. Using Gray code may thus be good to preserve locality (see Section 14.2) and to reduce hidden biases [504] when dealing with bit-string encoded numbers in optimization. Collins and Eaton [610] studied further encodings for Genetic Algorithms and found that their E-code outperforms both Gray and direct binary coding in function optimization.
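To make the idea concrete, here is a small Java sketch of a binary-reflected Gray encoding as it could be used inside a genotype-phenotype mapping. It is only an illustration under the assumption of 32-bit integer genes; the class and method names are invented for this example and do not correspond to the listings referenced in this book.

  /** A small sketch of a binary-reflected Gray code GPM (illustrative only). */
  public final class GrayGPM {

    /** Encode a natural number into its binary-reflected Gray code. */
    public static int toGray(final int value) {
      return value ^ (value >>> 1);
    }

    /** Decode a binary-reflected Gray code back into a natural number. */
    public static int fromGray(final int gray) {
      int value = gray;
      for (int shift = 1; shift < Integer.SIZE; shift <<= 1) {
        value ^= (value >>> shift); // undo the reflection bit by bit
      }
      return value;
    }

    public static void main(final String[] args) {
      // 7 and 8 differ in four bits in plain binary, but their Gray codes
      // differ in exactly one bit: 100 vs. 1100.
      System.out.println(Integer.toBinaryString(toGray(7))); // 100
      System.out.println(Integer.toBinaryString(toGray(8))); // 1100
      System.out.println(fromGray(toGray(8)));               // 8
    }
  }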
Example E28.10 (Real Search Spaces: Normalization).
If bounded real-valued, continuous problem spaces X ⊆ R^n are investigated, it is possible that the bounds of the dimensions differ significantly. Depending on the optimization algorithm and search operations applied, this may deceive the optimization process into giving more emphasis to dimensions with large bounds while neglecting dimensions with smaller bounds.
Example E28.11 (Neglecting One Dimension of X).
Assume that a two-dimensional problem space X is searched for the best solution of an optimization task. The bounds of the first dimension are x̆₁ = −1 and x̂₁ = 1 and the bounds of the second dimension are x̆₂ = −10⁹ and x̂₂ = 10⁹. The only search operation available adds random numbers, normally distributed with expected value µ = 0 and standard deviation σ = 0.01, to each dimension of the candidate vector. Then, a more or less efficient search would only be possible in the first dimension.
A genotype-phenotype mapping which allows the search to operate on G = [0, 1]^n instead of X = [x̆₁, x̂₁] × [x̆₂, x̂₂] × ... × [x̆ₙ, x̂ₙ], where x̆ᵢ ∈ R and x̂ᵢ ∈ R are the lower and upper bounds of the i-th dimension of the problem space X, respectively, would prevent such pitfalls from the beginning:

gpm_scale(g) = (x̆ᵢ + g[i] ∗ (x̂ᵢ − x̆ᵢ) ∀i ∈ 1..n)   (28.1)
18 http://en.wikipedia.org/wiki/Gray_coding [accessed 2007-07-03]
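A direct mapping such as gpm_scale from Equation 28.1 translates almost literally into code. The following Java fragment is a minimal sketch, assuming plain double arrays for genotypes, phenotypes, and bounds; it is not the interface used in the accompanying source code.

  /** Sketch of the scaling GPM of Equation 28.1: G = [0,1]^n is mapped to X. */
  public final class ScalingGPM {

    private final double[] lower; // lower bounds of the problem space dimensions
    private final double[] upper; // upper bounds of the problem space dimensions

    public ScalingGPM(final double[] lower, final double[] upper) {
      this.lower = lower;
      this.upper = upper;
    }

    /** Map a genotype g in [0,1]^n to a phenotype x in X. */
    public double[] map(final double[] g) {
      final double[] x = new double[g.length];
      for (int i = 0; i < g.length; i++) {
        // x[i] = lower_i + g[i] * (upper_i - lower_i), see Equation 28.1
        x[i] = this.lower[i] + (g[i] * (this.upper[i] - this.lower[i]));
      }
      return x;
    }
  }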
28.2.2 Indirect Mappings
Different from direct mappings, the phenotype-building process in indirect mappings is more complex [777]. The basic idea of such developmental approaches is that the phenotypes
are constructed from the genotypes in an iterative way, by stepwise applications of rules and
refinements. These refinements take into consideration some state information such as the
structure of the phenotype. The genotypes usually represent the rules and transformations
which have to be applied to the candidate solutions during the construction phase.
In nature, life begins with a single cell which divides^19 time and again until a mature individual is formed^20 after the genetic information has been reproduced. The emergence of a multi-cellular phenotype from its genotypic representation is called embryogenesis^21 in biology. Many researchers have drawn their inspiration for indirect mappings from this embryogenesis, from the developmental process which embryos undergo when developing to a fully-functional being inside an egg or the womb of their mother.
Therefore, developmental mappings received colorful names such as artificial embryo-
geny [2601], artificial ontogeny (AO) [356], computational embryogeny [273, 1637], cellular
encoding [1143], morphogenesis [291, 1431] and artificial development [2735].
Natural embryogeny is a highly complex translation from hereditary information to phys-
ical structures. Among other things, the DNA, for instance, encodes the structural design
information of the human brain. As pointed out by Manos et al. [1829], there are only
about 30 thousand active genes in the human genome (2800 million amino acids) for over
100 trillion neural connections in our cerebrum. A huge manifold of information is hence
decoded from “data” which is of a much lower magnitude. This is possible because the same
genes can be reused in order to repeatedly create the same pattern. The layout of the light
receptors in the eye, for example, is always the same – just their wiring changes.
If we draw inspiration from such a process, we find that features such as gene reuse and
(phenotypic) modularity seem to be beneficial. Indeed, allowing for the repetition of the
same modules (cells, receptors, . . . ) encoded in the same way many times is one of the
reasons why natural embryogeny leads to the development of organisms whose number of
modules surpasses the number of genes in the DNA by far.
The inspiration we get from nature can thus lead to good ideas which, in turn, can lead to better results in optimization. Yet, we would like to point out that being inspired by nature does not, by far, mean being like nature. The natural embryogenesis processes are much more complicated than the indirect mappings applied in Evolutionary Algorithms. The same goes for approaches which are inspired by chemical diffusion and similar processes.
Furthermore, many genotype-phenotype mappings based on gene reuse, modularity, and
feedback between the phenotypes and the environment are not related to and not inspired
by nature.
In the following, we divide indirect mappings into two categories. We distinguish GPMs
which solely use state information (such as the partly-constructed phenotype) and those that
additionally incorporate feedback from external information sources, the environment, into
the phenotype building process.
28.2.2.1 Generative Mappings: Phenotypic Interaction
In the following, we provide some examples for generative genotype-phenotype mappings.
28.2.2.1.1 Genetic Programming
19 http://en.wikipedia.org/wiki/Cell_division [accessed 2007-07-03]
20 As a matter of fact, cell division continues until the individual dies. However, this is not important here.
21 http://en.wikipedia.org/wiki/Embryogenesis [accessed 2007-07-03]
Example E28.12 (Grammar-Guided Genetic Programming).
Genetic Programming, as we will discuss in Chapter 31, is the family of Evolutionary Algo-
rithms which deals with the evolutionary synthesis of programs. In grammar-guided Genetic
Programming, usually the grammar of the programming language in which these programs
are to be specified is given. Each gene in the genotypes of these approaches usually encodes
for the application of a single rule of the grammar, as is, for instance, the case in Gads
(see ?? on page ??) which uses an integer string genome where each integer corresponds to
the ID of a rule. This way, step by step, complex sentences can be built by unfolding the
genotype. In Grammatical Evolution (??), integer strings with genes identifying grammati-
cal productions are used as well. Depending on the un-expanded variables in the phenotype,
these genes may have different meanings. Applying the rules identified by them may intro-
duce new unexpanded variables into the phenotype. Hence, the runtime of such a mapping
is not necessarily known in advance.
Example E28.13 (Graph-creating Trees).
In Standard Genetic Programming approaches, the genotypes are trees. Trees are very useful
to express programs and programs can be used to create many different structures. In a
genotype-phenotype mapping, the programs may, for example, be executed to build graphs
(as in Luke and Spector’s Edge Encoding [1788] discussed here in ?? on page ??) or electrical
circuits [1608]. In the latter case, the structure of the phenotype strongly depends on how
and how often given instructions or automatically defined functions in the genotypes are
called.
28.2.2.2 Ontogenic Mappings: Environmental Interaction
Full ontogenic (or developmental [887]) mappings even go a step farther than just utilizing
state information during the transformation of the genotypes to phenotypes [777]. This
additional information can stem from the objective functions. If simulations are involved
in the computation of the objective functions, the mapping may access the full simulation
information as well.
Example E28.14 (Developmental Wing Construction).
The phenotype could, for instance, be a set of points describing the surface of an airplane’s
wing. The genotype could be a real vector holding the weights of an artificial neural network.
The artificial neural network may then represent rules for moving the surface points in response to the air pressure acting on them, as given by a simulation of the wing. These rules can describe a distinct wing shape which develops according to the feedback from the environment.
Example E28.15 (Developmental Termite Nest Construction).
Estévez and Lipson [887] evolved rules for the construction of a termite nest. The phenotype is the nest, which consists of multiple cells. The fitness of the nest depends on how close the temperature in each cell comes to a given target temperature. The rules which were evolved in Estévez and Lipson's work are directly applied to the phenotypes in the simulations. They decide whether and where cells are added to or removed from the nest based on the temperature and density of the cells in the simulation.
28.3 Fitness Assignment
28.3.1 Introduction
In multi-objective optimization, each candidate solution p.x is characterized by a vector
of objective values f (p.x). The early Evolutionary Algorithms, however, were intended to
perform single-objective optimization, i. e., were capable of finding solutions for a single
objective function alone. Historically, many algorithms for survival and mating selection
base their decision on such a scalar value.
A fitness assignment process, as defined in Definition D5.1 on page 94 in Section 5.2 is
thus used to transform the information given by the vector of objective values f (p.x) for
the phenotype p.x ∈ X of an individual p ∈ G × X to a scalar fitness value v(p). This
assignment process takes place in each generation of the Evolutionary Algorithm and, as
stated in Section 5.2, may thus change in each iteration.
Including a fitness assignment step is not only beneficial in a MOEA, but can also prove useful in a single-objective scenario: The fitness assigned to an individual may not just reflect the absolute utility of an individual, but its suitability relative to the other individuals in the population pop. Then, the fitness function has the character of a heuristic function (see Definition D1.14 on page 34).
Besides such ranking data, fitness can also incorporate density/niching information. This
way, not only the quality of a candidate solution is considered, but also the overall diversity
of the population. Therefore, either the phenotypes p.x ∈ X or the genotypes p.g ∈ G may
be analyzed and compared. This can improve the chance of finding the global optima as
well as the performance of the optimization algorithm significantly. If many individuals in
the population occupy the same rank or do not dominate each other, for instance, such
information will be very helpful.
The fitness v(p.x) thus may not only depend on the candidate solution p.x itself, but
on the whole population pop of the Evolutionary Algorithm (and on the archive archive of
optimal elements, if available). In practical realizations, the fitness values are often stored
in a special member variable in the individual records. Therefore, v(p) can be considered as
a mapping that returns the value of such a variable which has previously been stored there
by a fitness assignment process assignFitness(pop, cmp_f).
Definition D28.6 (Fitness Assignment Process). A fitness assignment process “assignFitness” uses a comparison criterion cmp_f (see Definition D3.20 on page 77) to create a function v : G × X → R⁺ (see Definition D5.1 on page 94) which relates a scalar fitness value to each individual p in the population pop as stated in Equation 28.2 (and to the archive archive, if an archive is available, as stated in Equation 28.3).

v = assignFitness(pop, cmp_f) ⇒ v(p) ∈ R⁺ ∀p ∈ pop   (28.2)
v = assignFitnessArch(pop, archive, cmp_f) ⇒ v(p) ∈ R⁺ ∀p ∈ pop ∪ archive   (28.3)
Listing 55.14 on page 718 provides a Java interface which enables us to implement different
fitness assignment processes according to Definition D28.6. In the context of this book, we
generally minimize fitness values, i. e., the lower the fitness of a candidate solution, the
better. If no density or sharing information is included and the fitness only reflects the
strict partial order introduced by the Pareto dominance (see Section 3.3.5) or a prevalence relation (see
Section 3.5.2), Equation 28.4 will hold. If further information such as diversity metrics are
included in the fitness, this equation may no longer be applicable.
p₁.x ≺ p₂.x ⇒ v(p₁) < v(p₂) ∀p₁, p₂ ∈ pop ∪ archive   (28.4)
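The Java interface referenced above (Listing 55.14) is not reproduced here. The following fragment merely sketches what a fitness assignment interface in the sense of Definition D28.6 could look like; the type and member names are hypothetical.

  import java.util.List;

  /**
   * A minimal sketch of a fitness assignment process in the sense of
   * Definition D28.6 (hypothetical types, not the book's actual API).
   */
  interface FitnessAssignment<G, X> {

    /**
     * Assign a scalar fitness (subject to minimization) to every individual
     * in the population, based on the comparison criterion cmp_f.
     */
    void assignFitness(List<Individual<G, X>> pop,
                       java.util.Comparator<X> cmpF);
  }

  /** A hypothetical individual record holding genotype, phenotype, and fitness. */
  class Individual<G, X> {
    G g;      // the genotype
    X x;      // the phenotype
    double v; // the fitness value written by the assignment process
  }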
28.3.2 Weighted Sum Fitness Assignment
The most primitive fitness assignment strategy would be to just use a weighted sum of the objective values. This approach is very static and brings about the same problems as the weighted sum-based approach for defining what an optimum is, which is discussed in Section 3.3.3 on page 62. It makes no use of Pareto or prevalence relations. For computing the weighted sum of the different objective values of a candidate solution, we reuse Equation 3.18 on page 62 from the weighted sum optimum definition. The weights have to be chosen in a way that ensures that v(p) ∈ R⁺ holds for all individuals p.

v = assignFitnessWeightedSum(pop) ⇔ (∀p ∈ pop ⇒ v(p) = ws(p.x))   (28.5)
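As a minimal illustration of Equation 28.5, a weighted-sum fitness assignment could be written as follows in Java; the representation of the objective functions and of the weights is an assumption of this sketch.

  import java.util.List;
  import java.util.function.ToDoubleFunction;

  /** Sketch of weighted-sum fitness assignment (Equation 28.5), fitness minimized. */
  final class WeightedSumFitness<X> {

    private final List<ToDoubleFunction<X>> objectives; // the objective functions f
    private final double[] weights;                     // the fixed weights

    WeightedSumFitness(final List<ToDoubleFunction<X>> objectives,
                       final double[] weights) {
      this.objectives = objectives;
      this.weights = weights;
    }

    /** Compute v(p) = ws(p.x) = sum_i weights[i] * f_i(p.x). */
    double fitness(final X phenotype) {
      double sum = 0d;
      for (int i = 0; i < this.objectives.size(); i++) {
        sum += this.weights[i] * this.objectives.get(i).applyAsDouble(phenotype);
      }
      return sum;
    }
  }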
28.3.3 Pareto Ranking
Another simple method for computing fitness values is to let them directly reflect the Pareto
domination (or prevalence) relation. There are two ways for doing this:
1. We can assign to each individual p₁ a value inversely proportional to the number of other individuals p₂ it prevails:

v₁(p₁) ≡ 1 / (|{p₂ ∈ pop : p₁.x ≺ p₂.x}| + 1)   (28.6)
2. or we can assign to each individual p₁ the number of other individuals p₂ it is prevailed by:

v₂(p₁) ≡ |{p₂ ∈ pop : p₂.x ≺ p₁.x}|   (28.7)
In case 1, individuals that dominate many others will receive a lower fitness value than those which are prevailed by many. Since fitness is subject to minimization, they will strongly be preferred. Although this idea looks appealing at first glance, it has a decisive drawback which can also be observed in Example E28.16: It promotes individuals that reside in crowded regions of the problem space and underrates those in sparsely explored areas. If an individual has many neighbors, it can potentially prevail many other individuals. However, a non-prevailed candidate solution in a sparsely explored region of the search space may not prevail any other individual and hence receive fitness 1, the worst possible fitness under v₁.
Thus, the fitness assignment process achieves exactly the opposite of what we want.
Instead of exploring the problem space and delivering a wide scan of the frontier of best
possible candidate solutions, it will focus all effort on a small set of individuals. We will
only obtain a subset of the best solutions and it is even possible that this fitness assignment
method leads to premature convergence to a local optimum.
A much better second approach for fitness assignment has first been proposed by Gold-
berg [1075]. Here, the idea is to assign the number of individuals it is prevailed by to each
candidate solution [1125, 1783]. This way, the previously mentioned negative effects will not
occur and the exploration pressure is applied to a much wider area of the Pareto frontier.
This so-called Pareto ranking can be performed by first removing all non-prevailed (non-dominated) individuals from the population and assigning rank 0 to them (since they are prevailed by no individual and, according to Equation 28.7, should receive v₂ = 0). Then, the same is performed with the rest of the population. The individuals only dominated by those on rank 0 (now non-dominated) are removed and get rank 1. This is repeated until all candidate solutions have a proper fitness assigned to them. Since we follow the idea of the freer prevalence comparators instead of Pareto dominance relations, we will synonymously refer to this approach as Prevalence ranking and define Algorithm 28.4, which is implemented in Java in Listing 56.16 on page 759. It should be noted that this simple algorithm has a complexity of O(ps² ∗ |f|). A more efficient approach with complexity O(ps ∗ (log ps)^(|f|−1)) has been introduced by Jensen [1441] based on a divide-and-conquer strategy.
Algorithm 28.4: v₂ ←− assignFitnessParetoRank(pop, cmp_f)
Input: pop: the population to assign fitness values to
Input: cmp_f: the prevalence comparator defining the prevalence relation
Data: i, j, cnt: the counter variables
Output: v₂: a fitness function reflecting the Prevalence ranking
1 begin
2   for i ←− len(pop) − 1 down to 0 do
3     cnt ←− 0
4     p ←− pop[i]
5     for j ←− len(pop) − 1 down to 0 do
        // Check whether cmp_f(pop[j].x, p.x) < 0
6       if (j ≠ i) ∧ (pop[j].x ≺ p.x) then cnt ←− cnt + 1
7     v₂(p) ←− cnt
8   return v₂
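A direct Java transcription of Algorithm 28.4 could look like the following sketch. It is not the listing referenced above; the comparator convention (a negative return value means that the first argument prevails the second) is an assumption of this example.

  import java.util.Comparator;
  import java.util.List;

  /** Sketch of Prevalence (Pareto) ranking as in Algorithm 28.4: O(ps^2) comparisons. */
  final class ParetoRanking {

    /**
     * Returns rank[i] = number of individuals in pop that prevail pop.get(i).
     * cmpF.compare(a, b) < 0 is read as "a prevails b".
     */
    static <X> int[] assignFitnessParetoRank(final List<X> pop,
                                             final Comparator<X> cmpF) {
      final int ps = pop.size();
      final int[] rank = new int[ps];
      for (int i = 0; i < ps; i++) {
        int cnt = 0;
        for (int j = 0; j < ps; j++) {
          if ((j != i) && (cmpF.compare(pop.get(j), pop.get(i)) < 0)) {
            cnt++; // pop.get(j) prevails pop.get(i)
          }
        }
        rank[i] = cnt; // non-prevailed individuals receive rank (fitness) 0
      }
      return rank;
    }
  }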
Example E28.16 (Pareto Ranking Fitness Assignment).
Figure 28.4 and Table 28.1 illustrate the Pareto relations in a population of 15 individuals and their corresponding objective values f₁ and f₂, both subject to minimization.
[Figure 28.4 plots the 15 individuals over the two objectives f₁ and f₂; the non-prevailed individuals form the Pareto frontier.]
Figure 28.4: An example scenario for Pareto ranking.
p.x   dominates                          is dominated by                                v₁(p)   v₂(p)
1     {5, 6, 8, 9, 14, 15}               ∅                                              1/7     0
2     {6, 7, 8, 9, 10, 11, 13, 14, 15}   ∅                                              1/10    0
3     {12, 13, 14, 15}                   ∅                                              1/5     0
4     ∅                                  ∅                                              1       0
5     {8, 15}                            {1}                                            1/3     1
6     {8, 9, 14, 15}                     {1, 2}                                         1/5     2
7     {9, 10, 11, 14, 15}                {2}                                            1/6     1
8     {15}                               {1, 2, 5, 6}                                   1/2     4
9     {14, 15}                           {1, 2, 6, 7}                                   1/3     4
10    {14, 15}                           {2, 7}                                         1/3     2
11    {14, 15}                           {2, 7}                                         1/3     2
12    {13, 14, 15}                       {3}                                            1/4     1
13    {15}                               {2, 3, 12}                                     1/2     3
14    {15}                               {1, 2, 3, 6, 7, 9, 10, 11, 12}                 1/2     9
15    ∅                                  {1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}   1       13
Table 28.1: The Pareto domination relation of the individuals illustrated in Figure 28.4.
We computed the fitness values for both approaches for the individuals in the population given in Figure 28.4 and noted them in Table 28.1. In column v₁(p), we provide the fitness values which are inversely proportional to the number of individuals dominated by p. A good example for the problem that approach 1 prefers crowded regions of the search space while ignoring interesting candidate solutions in less populated regions are the four non-prevailed individuals {1, 2, 3, 4} on the Pareto frontier. The best fitness is assigned to element 2, followed by individual 1. Although individual 7 is dominated (by 2), its fitness is better than the fitness of the non-dominated element 3.
The candidate solution 4 gets the worst possible fitness 1, since it prevails no other element. Its chances for reproduction are similarly low as those of individual 15, which is dominated by all other elements except 4. Hence, both candidate solutions will most probably not be selected and will vanish in the next generation. The loss of individual 4 would greatly decrease the diversity and further increase the focus on the crowded area near 1 and 2.
Column v₂(p) in Table 28.1 shows that all four non-prevailed individuals now have the best possible fitness 0 if the method given as Algorithm 28.4 is applied. Therefore, the focus is no longer centered on the overcrowded regions.
However, the region around the individuals 1 and 2 has probably already been explored extensively, whereas the surroundings of candidate solution 4 are rather unknown. More individuals are present in the top-left area of Figure 28.4 and at least two individuals with the same fitness as 3 and 4 reside there. Thus, chances are still that more individuals from this area will be selected than from the bottom-right area. A better approach of fitness assignment should incorporate such information and put a bit more pressure into the direction of individual 4, in order to make the Evolutionary Algorithm investigate this area more thoroughly. Algorithm 28.4 cannot fulfill this hope.
28.3.4 Sharing Functions
Previously, we have mentioned that the drawback of Pareto ranking is that it does not
incorporate any information about whether the candidate solutions in the population reside
closely to each other or in regions of the problem space which are only sparsely covered
by individuals. Sharing, as a method for including such diversity information into the
fitness assignment process, was introduced by Holland [1250] and later refined by Deb [735],
Goldberg and Richardson [1080], and Deb and Goldberg [742] and further investigated
by [1899, 2063, 2391].
Definition D28.7 (Sharing Function). A sharing function Sh : R⁺ → R⁺ is a function that maps the distance d = dist(p₁, p₂) between two individuals p₁ and p₂ to the real interval [0, 1]. The value of Sh_σ is 1 if d ≤ 0 and 0 if d exceeds a specific constant value σ.

Sh_σ(d = dist(p₁, p₂)) = { 1          if d ≤ 0
                           ∈ [0, 1]   if 0 < d < σ
                           0          otherwise          (28.8)
Sharing functions can be employed in many different ways and are used by a variety of fitness assignment processes [735, 1080]. Typically, the simple triangular function “Sh tri” [1268] or one of its either convex (Sh cvex_ξ) or concave (Sh ccav_ξ) pendants with the power ξ ∈ R⁺, ξ > 0 are applied. Besides using different powers of the distance-σ-ratio, another approach is the exponential sharing method “Sh exp”.

Sh tri_σ(d) = { 1 − d/σ   if 0 ≤ d < σ
                0         otherwise          (28.9)

Sh cvex_{σ,ξ}(d) = { (1 − d/σ)^ξ   if 0 ≤ d < σ
                     0             otherwise   (28.10)

Sh ccav_{σ,ξ}(d) = { 1 − (d/σ)^ξ   if 0 ≤ d < σ
                     0             otherwise   (28.11)

Sh exp_{σ,ξ}(d) = { 1                                         if d ≤ 0
                    0                                         if d ≥ σ
                    (e^(−ξd/σ) − e^(−ξ)) / (1 − e^(−ξ))       otherwise   (28.12)
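The sharing functions of Equation 28.9 to Equation 28.12 translate directly into code. The following Java sketch is purely illustrative; parameter checking is omitted.

  /** Sketch of the sharing functions from Equations 28.9 to 28.12. */
  final class Sharing {

    /** Triangular sharing, Equation 28.9. */
    static double shTri(final double d, final double sigma) {
      return ((d >= 0d) && (d < sigma)) ? (1d - (d / sigma)) : 0d;
    }

    /** Convex sharing, Equation 28.10. */
    static double shCvex(final double d, final double sigma, final double xi) {
      return ((d >= 0d) && (d < sigma)) ? Math.pow(1d - (d / sigma), xi) : 0d;
    }

    /** Concave sharing, Equation 28.11. */
    static double shCcav(final double d, final double sigma, final double xi) {
      return ((d >= 0d) && (d < sigma)) ? (1d - Math.pow(d / sigma, xi)) : 0d;
    }

    /** Exponential sharing, Equation 28.12. */
    static double shExp(final double d, final double sigma, final double xi) {
      if (d <= 0d) return 1d;
      if (d >= sigma) return 0d;
      return (Math.exp(-(xi * d) / sigma) - Math.exp(-xi)) / (1d - Math.exp(-xi));
    }
  }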
For sharing, the distance of the individuals in the search space G as well as their distance in the problem space X or the objective space Y may be used. If the candidate solutions are real vectors in R^n, we could use the Euclidean distance of the phenotypes of the individuals directly, i. e., compute dist_eucl(p₁.x, p₂.x). However, this only makes sense in lower-dimensional spaces (small values of n), since the Euclidean distance in R^n does not convey much information for high n.
In Genetic Algorithms, where the search space is the set of all bit strings G = B^n of length n, another suitable approach would be to use the Hamming distance^22 dist_ham(p₁.g, p₂.g) of the genotypes. The work of Deb [735] indicates that phenotypical sharing will often be superior to genotypical sharing.
Definition D28.8 (Niche Count). The niche count m(p, P) [738, 1899] of an individual p ∈ P is the sum of its sharing values with all individuals in a list P.

∀p ∈ P ⇒ m(p, P) = Σ_{∀p′∈P} Sh_σ(dist(p, p′))   (28.13)
The niche count “m” is always greater than or equal to 1, since p ∈ P and, hence, Sh_σ(dist(p, p)) = 1 is computed and added up at least once. The original sharing approach was developed for fitness assignment in single-objective optimization where only one objective function f was subject to maximization. In this case, the objective value was simply divided by the niche count, punishing solutions in crowded regions [1899]. The goal of sharing was to distribute the population over a number of different peaks in the fitness landscape, with each peak receiving a fraction of the population proportional to its height [1268]. The result of dividing the fitness by the niche counts strongly depends on the height differences of the peaks and thus, on the complexity class^23 of f. On f₁ ∈ O(x), for instance, the influence of “m” is much bigger than on f₂ ∈ O(eˣ).

22 See ?? on page ?? for more information on the Hamming distance.
By multiplying the niche count “m” with predetermined fitness values v′, we can use this approach for fitness minimization in conjunction with a variety of other fitness assignment processes, but we also inherit its shortcomings:

v(p) = v′(p) ∗ m(p, pop) ,  v′ ≡ assignFitness(pop, cmp_f)   (28.14)
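Definition D28.8 and Equation 28.14 can be combined in a few lines of code. The following Java sketch is an illustration only: the distance function, the pre-computed fitness v′, and the use of the triangular sharing function are assumptions of this example.

  import java.util.List;
  import java.util.function.ToDoubleBiFunction;

  /** Sketch of the niche count (Equation 28.13) and the shared fitness of Equation 28.14. */
  final class NicheCount {

    /** Triangular sharing Sh_tri from Equation 28.9, used here for brevity. */
    static double shTri(final double d, final double sigma) {
      return ((d >= 0d) && (d < sigma)) ? (1d - (d / sigma)) : 0d;
    }

    /** m(p, P) = sum over all p' in P of Sh_sigma(dist(p, p')); always >= 1 since p is in P. */
    static <P> double nicheCount(final P p, final List<P> pop,
                                 final ToDoubleBiFunction<P, P> dist, final double sigma) {
      double m = 0d;
      for (final P other : pop) {
        m += shTri(dist.applyAsDouble(p, other), sigma);
      }
      return m;
    }

    /** v(p) = v'(p) * m(p, pop), fitness subject to minimization (Equation 28.14). */
    static <P> double sharedFitness(final double vPrime, final P p, final List<P> pop,
                                    final ToDoubleBiFunction<P, P> dist, final double sigma) {
      return vPrime * nicheCount(p, pop, dist, sigma);
    }
  }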
Sharing was traditionally combined with fitness proportionate, i. e., roulette wheel selection^24. Oei et al. [2063] have shown that if the sharing function is computed using the parental individuals of the “old” population and then naïvely combined with the more sophisticated tournament selection^25, the resulting behavior of the Evolutionary Algorithm may be chaotic. They suggested to use the partially filled “new” population to circumvent this problem. The layout of Evolutionary Algorithms, as defined in this book, bases the fitness computation on the whole set of “new” individuals and assumes that their objective values have already been completely determined. In other words, such issues simply do not exist in multi-objective Evolutionary Algorithms as introduced here and the chaotic behavior does not occur.
For computing the niche count “m”, O(n²) comparisons are needed. According to Goldberg et al. [1085], sampling the population can be sufficient to approximate “m” in order to avoid this quadratic complexity.
28.3.5 Variety Preserving Fitness Assignment
Using sharing and the niche counts naïvely leads to more or less unpredictable effects. Of course, it promotes solutions located in sparsely populated niches, but how much their fitness will be improved is rather unclear. Using distance measures which are not normalized can lead to strange effects, too. Imagine two objective functions f₁ and f₂. If the values of f₁ span from 0 to 1 for the individuals in the population whereas those of f₂ range from 0 to 10 000, the components of f₁ will most often be negligible in the Euclidean distance of two individuals in the objective space Y. Another problem is that the effect of simple sharing on the pressure into the direction of the Pareto frontier is not obvious either, or depends on the sharing approach applied. Some methods simply add a niche count to the Pareto rank, which may cause non-dominated individuals to have worse fitness than any others in the population. Other approaches scale the niche count into the interval [0, 1) before adding it, which not only ensures that non-dominated individuals have the best fitness but also leaves the relation between individuals at different ranks intact which, however, does not further variety very much.
In some of our experiments [2888, 2912, 2914], we applied a Variety Preserving Fitness
Assignment, an approach based on Pareto ranking using prevalence comparators and well-
dosed sharing. We developed it in order to mitigate the previously mentioned side effects
and balance the evolutionary pressure between optimizing the objective functions and max-
imizing the variety inside the population. In the following, we will describe it in detail and
give its specification in Algorithm 28.5.
Before this fitness assignment process can begin, all individuals with infinite objective values must be removed from the population pop. If such a candidate solution is optimal, i. e., if it has negatively infinite objective values in a minimization process, for instance, it should receive fitness zero, since fitness is subject to minimization.
23 See Chapter 12 on page 143 for a detailed introduction into complexity and the O-notation.
24 Roulette wheel selection is discussed in Section 28.4.3 on page 290.
25 You can find an outline of tournament selection in Section 28.4.4 on page 296.
Algorithm 28.5: v ←− assignFitnessVariety(pop, cmp_f)
Input: pop: the population
Input: cmp_f: the comparator function
Input: [implicit] f: the set of objective functions
Data: . . .: sorry, no space here, we'll discuss this in the text
Output: v: the fitness function
1 begin
    /* If needed: Remove all elements with infinite objective values from pop and assign
       fitness 0 or len(pop) + √(len(pop)) + 1 to them. Then compute the prevalence ranks. */
2   ranks ←− createList(len(pop), 0)
3   maxRank ←− 0
4   for i ←− len(pop) − 1 down to 0 do
5     for j ←− i − 1 down to 0 do
6       k ←− cmp_f(pop[i].x, pop[j].x)
7       if k < 0 then ranks[j] ←− ranks[j] + 1
8       else if k > 0 then ranks[i] ←− ranks[i] + 1
9     if ranks[i] > maxRank then maxRank ←− ranks[i]
    // determine the ranges of the objectives
10  mins ←− createList(|f|, +∞)
11  maxs ←− createList(|f|, −∞)
12  foreach p ∈ pop do
13    for i ←− |f| down to 1 do
14      if f_i(p.x) < mins[i − 1] then mins[i − 1] ←− f_i(p.x)
15      if f_i(p.x) > maxs[i − 1] then maxs[i − 1] ←− f_i(p.x)
16  rangeScales ←− createList(|f|, 1)
17  for i ←− |f| − 1 down to 0 do
18    if maxs[i] > mins[i] then rangeScales[i] ←− 1/(maxs[i] − mins[i])
    // Base a sharing value on the scaled Euclidean distance of all elements
19  shares ←− createList(len(pop), 0)
20  minShare ←− +∞
21  maxShare ←− −∞
22  for i ←− len(pop) − 1 down to 0 do
23    curShare ←− shares[i]
24    for j ←− i − 1 down to 0 do
25      dist ←− 0
26      for k ←− |f| down to 1 do
27        dist ←− dist + [(f_k(pop[i].x) − f_k(pop[j].x)) ∗ rangeScales[k − 1]]²
28      s ←− Sh exp_{√|f|,16}(√dist)
29      curShare ←− curShare + s
30      shares[j] ←− shares[j] + s
31    shares[i] ←− curShare
32    if curShare < minShare then minShare ←− curShare
33    if curShare > maxShare then maxShare ←− curShare
    // Finally, compute the fitness values
34  scale ←− 1/(maxShare − minShare) if maxShare > minShare, 1 otherwise
35  for i ←− len(pop) − 1 down to 0 do
36    if ranks[i] > 0 then
37      v(pop[i]) ←− ranks[i] + √maxRank ∗ scale ∗ (shares[i] − minShare)
38    else v(pop[i]) ←− scale ∗ (shares[i] − minShare)
If the individual is infeasible, on the other hand, its fitness should be set to len(pop) + √(len(pop)) + 1, which is at least 1 larger than every other fitness value that may be assigned by Algorithm 28.5.
In lines 2 to 9, a list ranks is created which is used to efficiently compute the Pareto rank
of every candidate solution in the population. Actually, the word prevalence rank would be
more precise in this case, since we use prevalence comparisons as introduced in Section 3.5.2.
Therefore, Variety Preserving is not limited to Pareto optimization but may also incorporate
External Decision Makers (Section 3.5.1) or the method of inequalities (Section 3.4.5).
The highest rank encountered in the population is stored in the variable maxRank.
This value may be zero if the population contains only non-prevailed elements. The lowest
rank will always be zero since the prevalence comparators cmp_f define order relations which are non-circular by definition^26. maxRank is used to determine the maximum penalty for solutions in an overly crowded region of the search space later on.
From line 10 to 18, the maximum and the minimum values that each objective function
takes on when applied to the individuals in the population are obtained. These values are
used to store the inverse of their ranges in the array rangeScales, which will be used to
scale all distances in each dimension (objective) of the individuals into the interval [0, 1].
There are |f| objective functions in f and, hence, the maximum Euclidean distance between two candidate solutions in the objective space (scaled to normalization) becomes √|f|. It occurs if all the distances in the single dimensions are 1.
Between line 19 and 33, the scaled distance from every individual to every other candidate
solution in the objective space is determined. This distance is used to aggregate share values
(in the array shares). Therefore, again two nested loops are needed (lines 22 and 24). The
distance components of two individuals pop[i] and pop[j] are scaled and summarized in a
variable dist in line 27. The Euclidean distance between them is √dist, which is used to determine a sharing value in line 28. We have decided for exponential sharing, as introduced in Equation 28.12 on page 278, with power ξ = 16 and σ = √|f|. For every individual, we sum up all the shares (see line 30). While doing so, the minimum and maximum total shares are stored in the variables minShare and maxShare in lines 32 and 33.
These variables are used to scale all sharing values again into the interval [0, 1] (line 34),
so the individual in the most crowded region always has a total share of 1 and the most
remote individual always has a share of 0. Basically, two things are now known about the
individuals in pop:
1. their Pareto/Prevalence ranks, stored in the array ranks, giving information about
their relative quality according to the objective values and
2. their sharing values, held in shares, denoting how densely crowded the area around
them is.
With this information, the final fitness value of an individual p is determined as follows: If p is non-prevailed, i. e., its rank is zero, its fitness is its scaled total share (line 38). Otherwise, the square root of the maximum rank, √maxRank, is multiplied with the scaled share and added to its rank (line 37). By doing so, the supremacy of non-prevailed individuals in the population is preserved. Yet, these individuals can compete with each other based on the crowdedness of their location in the objective space. All other candidate solutions may degenerate in rank, but at most by the square root of the worst rank.
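As a compact illustration of this final combination step (lines 34 to 38 of Algorithm 28.5), the following Java fragment merges ranks and shares into fitness values. It assumes that the arrays ranks and shares as well as maxRank, minShare, and maxShare have already been computed as described above; it is a sketch, not the implementation used in our experiments.

  /** Sketch of the final combination step of Algorithm 28.5 (lines 34 to 38). */
  final class VarietyPreservingStep {

    static double[] combine(final int[] ranks, final double[] shares,
                            final int maxRank, final double minShare, final double maxShare) {
      final double scale = (maxShare > minShare) ? (1d / (maxShare - minShare)) : 1d;
      final double[] v = new double[ranks.length];
      for (int i = 0; i < ranks.length; i++) {
        final double scaledShare = scale * (shares[i] - minShare); // lies in [0, 1]
        if (ranks[i] > 0) {
          // prevailed individuals: rank plus a crowding penalty of at most sqrt(maxRank)
          v[i] = ranks[i] + (Math.sqrt(maxRank) * scaledShare);
        } else {
          // non-prevailed individuals keep a fitness below 1 and compete only via crowding
          v[i] = scaledShare;
        }
      }
      return v;
    }
  }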
Example E28.17 (Variety Preserving Fitness Assignment).
Let us now apply Variety Preserving to the example for Pareto ranking from Section 28.3.3. In Table 28.2, we again list all the candidate solutions from Figure 28.4 on page 276, this time with their objective values obtained with f₁ and f₂ corresponding to their coordinates in the diagram. In the third column, you can find the Pareto ranks of the individuals as they have been listed in Table 28.1 on page 277. The columns share/u and share/s correspond to the total sharing sums of the individuals, unscaled and scaled into [0, 1].
26 In all order relations imposed on finite sets there is always at least one “smallest” element. See Section 51.8 on page 645 for more information.
[Figure 28.5 plots the sharing potential as a scalar field over the objective space spanned by f₁ and f₂; each of the 15 individuals contributes a spike to the potential, and the Pareto frontier is marked.]
Figure 28.5: The sharing potential in the Variety Preserving Example E28.17.
x    f₁   f₂   rank   share/u   share/s   v(x)
1    1    7    0      0.71      0.779     0.779
2    2    4    0      0.239     0.246     0.246
3    6    2    0      0.201     0.202     0.202
4    10   1    0      0.022     0         0
5    1    8    1      0.622     0.679     3.446
6    2    7    2      0.906     1         5.606
7    3    5    1      0.531     0.576     3.077
8    2    9    4      0.314     0.33      5.191
9    3    7    4      0.719     0.789     6.845
10   4    6    2      0.592     0.645     4.325
11   5    5    2      0.363     0.386     3.39
12   7    3    1      0.346     0.366     2.321
13   8    4    3      0.217     0.221     3.797
14   7    7    9      0.094     0.081     9.292
15   9    9    13     0.025     0.004     13.01
Table 28.2: An example for Variety Preserving based on Figure 28.4.
But first things first: as already mentioned, we know the Pareto ranks of the candidate solutions from Table 28.1, so the next step is to determine the ranges of values the objective functions take on for the example population. These can again easily be found out from Figure 28.4. f₁ spans from 1 to 10, which leads to rangeScale[0] = 1/9. rangeScale[1] = 1/8 since the maximum of f₂ is 9 and its minimum is 1. With this, we can now compute the (dimensionally scaled) distances amongst the candidate solutions in the objective space, the values of √dist in Algorithm 28.5, as well as the corresponding values of the sharing function Sh exp_{√|f|,16}(√dist). We noted these in Table 28.3, using the upper triangle of the table for the distances and the lower triangle for the shares.
Upper triangle: distances. Lower triangle: corresponding share values.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 0.391 0.836 1.25 0.125 0.111 0.334 0.274 0.222 0.356 0.51 0.833 0.863 0.667 0.923
2 0.012 0.51 0.965 0.512 0.375 0.167 0.625 0.391 0.334 0.356 0.569 0.667 0.67 0.998
3 7.7E-5 0.003 0.462 0.933 0.767 0.502 0.981 0.708 0.547 0.391 0.167 0.334 0.635 0.936
4 6.1E-7 1.8E-5 0.005 1.329 1.163 0.925 1.338 1.08 0.914 0.747 0.417 0.436 0.821 1.006
5 0.243 0.003 2.6E-5 1.8E-7 0.167 0.436 0.167 0.255 0.417 0.582 0.914 0.925 0.678 0.898
6 0.284 0.014 1.7E-4 1.8E-6 0.151 0.274 0.25 0.111 0.255 0.417 0.747 0.765 0.556 0.817
7 0.023 0.151 0.003 2.9E-5 0.007 0.045 0.512 0.25 0.167 0.222 0.51 0.569 0.51 0.833
8 0.045 0.001 1.5E-5 1.5E-7 0.151 0.059 0.003 0.274 0.436 0.601 0.933 0.914 0.609 0.778
9 0.081 0.012 3.3E-4 4.8E-6 0.056 0.284 0.059 0.045 0.167 0.334 0.669 0.67 0.444 0.712
10 0.018 0.023 0.002 3.2E-5 0.009 0.056 0.151 0.007 0.151 0.167 0.502 0.51 0.356 0.67
11 0.003 0.018 0.012 2.1E-4 0.001 0.009 0.081 0.001 0.023 0.151 0.334 0.356 0.334 0.669
12 8E-5 0.002 0.151 0.009 3.2E-5 2.1E-4 0.003 2.6E-5 0.001 0.003 0.023 0.167 0.5 0.782
13 5.7E-5 0.001 0.023 0.007 2.9E-5 1.7E-4 0.002 3.2E-5 0.001 0.003 0.018 0.151 0.391 0.635
14 0.001 0.001 0.001 9.3E-5 4.6E-4 0.002 0.003 0.001 0.007 0.018 0.023 0.003 0.012 0.334
15 2.9E-5 1.2E-5 2.5E-5 1.1E-5 3.9E-5 9.7E-5 8E-5 1.5E-4 3.1E-4 0.001 0.001 1.4E-4 0.001 0.023
Table 28.3: The distance and sharing matrix of the example from Table 28.2.
The value of the sharing function can be imagined as a scalar field, as illustrated in
Figure 28.5. In this case, each individual in the population can be considered as an electron
that will build an electrical field around it resulting in a potential. If two electrons come
close, repulsing forces occur, which is pretty much the same as what we want to achieve with Variety Preserving. Unlike the electrical field, the power of the sharing potential falls off exponentially, resulting in the relatively steep spikes in Figure 28.5, which gives proximity and density a heavier influence. Electrons in atoms on planets are limited in their movement by other influences like gravity or nuclear forces, which are often stronger than the electromagnetic force. In Variety Preserving, the prevalence rank plays this role – as you can see in Table 28.2, its influence on the fitness is often dominant.
By summing up the single sharing potentials for each individual in the example, we obtain the fifth column of Table 28.2, the unscaled share values. Their minimum is around 0.022 and their maximum is 0.906. Therefore, we must subtract 0.022 from each of these values and multiply the result with 1.131. By doing so, we obtain the column share/s. Finally, we can compute the fitness values v(x) according to lines 38 and 37 in Algorithm 28.5.
The last column of Table 28.2 lists these results. All non-prevailed individuals have
retained a fitness value less than one, lower than those of any other candidate solution in
the population. However, amongst these best individuals, candidate solution 4 is strongly
preferred, since it is located in a very remote location of the objective space. Individual
1 is the least interesting non-dominated one, because it has the densest neighborhood in
Figure 28.4. In this neighborhood, the individuals 5 and 6 with the Pareto ranks 1 and 2 are
located. They are strongly penalized by the sharing process and receive the fitness values
v(5) = 3.446 and v(6) = 5.606. In other words, individual 5 becomes less interesting than candidate solution 7, which has the same Pareto rank. Individual 6 is now even worse than individual 8, whose fitness would be worse by two if strict Pareto ranking was applied.
Based on these fitness values, algorithms like tournament selection (see Section 28.4.4)
or fitness proportionate approaches (discussed in Section 28.4.3) will pick elements in a
way that preserves the pressure into the direction of the Pareto frontier but also leads to a
balanced and sustainable variety in the population. The benefits of this approach have been
shown, for instance, in [2188, 2888, 2912, 2914].
28.3.6 Tournament Fitness Assignment
In tournament fitness assignment, which is a generalization of the q-level (binary) tournament selection introduced by Weicker [2879], the fitness of each individual is computed by letting it compete q times against r other individuals (with r = 1 as default) and counting the number of competitions it loses. For a better understanding of the tournament metaphor, see Section 28.4.4 on page 296, where the tournament selection scheme is discussed. The number of losses will approximate the Pareto rank of the individual, but is a bit more randomized than that. If the number of tournaments won were counted instead of the losses, the same problems as in the first idea of Pareto ranking would be encountered.
Algorithm 28.6: v ←− assignFitnessTournament(q, r, pop, cmp_f)
Input: q: the number of tournaments per individual
Input: r: the number of other contestants per tournament, normally 1
Input: pop: the population to assign fitness values to
Input: cmp_f: the comparator function providing the prevalence relation
Data: i, j, k, z: counter variables
Data: b: a Boolean variable being true as long as a tournament isn't lost
Data: p: the individual currently examined
Output: v: the fitness function
1 begin
2   for i ←− len(pop) − 1 down to 0 do
3     z ←− q
4     p ←− pop[i]
5     for j ←− q down to 1 do
6       b ←− true
7       k ←− r
8       while (k > 0) ∧ b do
9         b ←− ¬(pop[⌊randomUni[0, len(pop))⌋].x ≺ p.x)
10        k ←− k − 1
11      if b then z ←− z − 1
12    v(p) ←− z
13  return v
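A Java sketch of this fitness assignment could look as follows. The random source and the comparator convention (a negative return value means that the first argument prevails the second) are assumptions of this example.

  import java.util.Comparator;
  import java.util.List;
  import java.util.Random;

  /** Sketch of tournament fitness assignment (Algorithm 28.6): count lost tournaments. */
  final class TournamentFitness {

    static <X> int[] assignFitnessTournament(final int q, final int r,
                                             final List<X> pop,
                                             final Comparator<X> cmpF,
                                             final Random random) {
      final int[] v = new int[pop.size()];
      for (int i = 0; i < pop.size(); i++) {
        final X p = pop.get(i);
        int losses = q;
        for (int tournament = 0; tournament < q; tournament++) {
          boolean notLost = true;
          for (int k = 0; (k < r) && notLost; k++) {
            final X opponent = pop.get(random.nextInt(pop.size()));
            notLost = cmpF.compare(opponent, p) >= 0; // lost if the opponent prevails p
          }
          if (notLost) {
            losses--; // p survived this tournament
          }
        }
        v[i] = losses; // lower is better: the number of lost tournaments
      }
      return v;
    }
  }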
28.4 Selection
28.4.1 Introduction
Definition D28.9 (Selection). In Evolutionary Algorithms, the selection^27 operation mate = selection(pop, v, mps) chooses mps individuals according to their fitness values v (subject to minimization) from the population pop and places them into the mating pool mate [167, 340, 1673, 1912].

mate = selection(pop, v, mps) ⇒ ∀p ∈ mate ⇒ p ∈ pop
                                 ∀p ∈ pop ⇒ p ∈ G × X
                                 v(p) ∈ R⁺ ∀p ∈ pop
                                 (len(mate) ≥ min{len(pop), mps}) ∧ (len(mate) ≤ mps)   (28.15)
Definition D28.10 (Mating Pool). The mating pool mate in an Evolutionary Algorithm
contains the individuals which are selected for further investigation and on which the repro-
duction operations (see Section 28.5 on page 306) will subsequently be applied.
27 http://en.wikipedia.org/wiki/Selection_%28genetic_algorithm%29 [accessed 2007-07-03]
Selection may behave in a deterministic or in a randomized manner, depending on the algorithm chosen and its application-dependent implementation. Furthermore, elitist Evolutionary Algorithms may incorporate an archive archive in the selection process, as outlined in ??.
Generally, there are two classes of selection algorithms: those with replacement (annotated with a subscript r in this book) and those without replacement (here annotated with a subscript w) [2399]. In Figure 28.6, we present the transition from generation t to t + 1 for
both types of selection methods.

[Figure 28.6 sketches the transition from the population Pop(t) (ps = 9) via the mating pool Mate(t) (mps = 5) to the population Pop(t + 1): with replacement, individuals can enter the mating pool more than once; without replacement, individuals enter the mating pool at most once. In some EAs (steady-state, (µ + λ) Evolution Strategies, ...), the parents compete with the offspring.]
Figure 28.6: Selection with and without replacement.

In a selection algorithm without replacement, each individual
from the population pop can enter the mating pool mate at most once in each generation.
If a candidate solution has been selected, it cannot be selected a second time in the same
generation. This process is illustrated in the bottom row of Figure 28.6.
The mating pool returned by algorithms with replacement can contain the same indi-
vidual multiple times, as sketched in the top row of Figure 28.6. The dashed gray line in
this figure stands for the possibility to include the parent generation in the next selection
process. In other words, the parents selected into mate(t) may compete with their children
in popt+1 and together take part in the next environmental selection step. This is the case
in (µ + λ)-Evolution Strategies and in steady-state Evolutionary Algorithms, for example.
In extinctive/generational EAs, such as, for example, (µ, λ)-Evolution Strategies, the parent
generation simply is discarded, as discussed in Section 28.1.4.1.
mate = selection_w(pop, v, mps) ⇒ count(p, mate) = 1 ∀p ∈ mate   (28.16)
mate = selection_r(pop, v, mps) ⇒ count(p, mate) ≥ 1 ∀p ∈ mate   (28.17)
The selection algorithms have major impact on the performance of Evolutionary Algorithms.
Their behavior has thus been subject to several detailed studies, conducted by, for instance,
Blickle and Thiele [340], Chakraborty et al. [520], Goldberg and Deb [1079], Sastry and
Goldberg [2400], and Zhong et al. [3079], just to name a few.
Usually, fitness assignment processes are carried out before selection and the selection algorithms base their decisions solely on the fitness v of the individuals. It is possible to rely on the prevalence or dominance relation, i. e., to write selection(pop, cmp_f, mps) instead of selection(pop, v, mps), or, in case of single-objective optimization, to use the objective values directly (selection(pop, f, mps)) and thus, to save the costs of the fitness assignment process. However, this can lead to the same problems that occurred in the first approach of Pareto-ranking fitness assignment (see Point 1 in Section 28.3.3 on page 275).
As outlined in Paragraph 28.1.2.6.2, Evolutionary Algorithms generally apply survival selection, sometimes followed by a sexual selection procedure. In this section, we will focus on the former one.
28.4.1.1 Visualization
In the following sections, we will discuss multiple selection algorithms. In order to ease understanding them, we will visualize the expected number of offspring ES(p) of an individual, i. e., the number of times it will take part in reproduction. If we assume uniform sexual selection (i. e., no sexual selection), this value will be proportional to its number of occurrences in the mating pool.
Example E28.18 (Individual Distribution after Selection).
Therefore, we will use the special case where we have a population pop of the constant size of ps(t) = ps(t + 1) = 1000 individuals. The individuals are numbered from 0 to 999, i. e., p₀..p₉₉₉. For the sake of simplicity, we assume that sexual selection from the mating pool mate is uniform and that only a unary reproduction operation such as mutation is used. The fitness of an individual strongly influences its chance for reproduction. We consider four cases with fitness subject to minimization:
1. As sketched in Fig. 28.7.a, the individual pᵢ has fitness i, i. e., v₁(p₀) = 0, v₁(p₁) = 1, . . . , v₁(p₉₉₉) = 999.
2. Individual pᵢ has fitness (i + 1)³, i. e., v₂(p₀) = 1, v₂(p₁) = 8, . . . , v₂(p₉₉₉) = 1 000 000 000, as illustrated in Fig. 28.7.b.
[Figure 28.7 plots the four example fitness functions over the individual index i:
Fig. 28.7.a: Case 1: v₁(pᵢ) = i
Fig. 28.7.b: Case 2: v₂(pᵢ) = (i + 1)³
Fig. 28.7.c: Case 3: v₃(pᵢ) = 0.0001i if i < 999, 1 otherwise
Fig. 28.7.d: Case 4: v₄(pᵢ) = 1 + v₃(pᵢ)]
Figure 28.7: The four example fitness cases.
28.4.2 Truncation Selection
Truncation selection^28, also called deterministic selection, threshold selection, or breeding selection, returns the k ≤ mps best elements from the list pop. The name breeding selection stems from the fact that farmers who breed animals or crops only choose the best individuals for reproduction, too [302]. The selected elements are copied as often as needed until the mating pool size mps is reached. Most often, k equals the mating pool size mps, which usually is significantly smaller than the population size ps.
Algorithm 28.7 realizes truncation selection with replacement (i. e., where individuals may enter the mating pool mate more than once) and Algorithm 28.8 represents truncation selection without replacement. Both algorithms are implemented in one single class whose sources are provided in Listing 56.13 on page 754. They first sort the population in ascending order according to the fitness v. For truncation selection without replacement, k = mps and k < ps always holds, since otherwise this approach would make no sense because it would select all individuals. Algorithm 28.8 simply returns the best mps individuals from the population. Algorithm 28.7, on the other hand, iterates from 0 to mps − 1 after the sorting and inserts only the elements with indices from 0 to k − 1 into the mating pool.
Algorithm 28.7: mate ←− truncationSelection_r(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: k: cut-off rank, usually k = mps
Data: i: counter variable
Output: mate: the survivors of the truncation which now form the mating pool
1 begin
2   mate ←− ()
3   k ←− min{k, ps}
4   pop ←− sortAsc(pop, v)
5   for i ←− 1 up to mps do
6     mate ←− addItem(mate, pop[i mod k])
7   return mate
Algorithm 28.8: mate ←− truncationSelection_w(mps, pop, v)
Input: pop: the list of individuals to select from
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Output: mate: the survivors of the truncation which now form the mating pool
1 begin
2   pop ←− sortAsc(pop, v)
3   return deleteRange(pop, mps, ps − mps)
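In Java, truncation selection with replacement can be sketched in a few lines; the individual type and the fitness accessor are placeholders of this example and not the classes from Listing 56.13.

  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.List;
  import java.util.function.ToDoubleFunction;

  /** Sketch of truncation selection with replacement (cf. Algorithm 28.7). */
  final class TruncationSelection {

    static <P> List<P> select(final List<P> pop, final ToDoubleFunction<P> v,
                              final int mps, final int k) {
      final List<P> sorted = new ArrayList<>(pop);
      sorted.sort(Comparator.comparingDouble(v)); // ascending: best (lowest) fitness first
      final int cut = Math.min(k, sorted.size()); // never cut beyond the population
      final List<P> mate = new ArrayList<>(mps);
      for (int i = 0; i < mps; i++) {
        mate.add(sorted.get(i % cut));            // cycle through the best k individuals
      }
      return mate;
    }
  }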
Truncation selection is typically applied in (µ + λ) and (µ, λ) Evolution Strategies as well as in steady-state EAs. In general Evolutionary Algorithms, it should be combined with a fitness assignment process that incorporates diversity information in order to prevent premature convergence. Recently, Lässig and Hoffmann [1686] and Lässig et al. [1687] have proved that threshold selection, a truncation selection method where a cut-off fitness is defined after which no further individuals are selected, is the optimal selection strategy for crossover, provided that the right cut-off value is used. In practical applications, this value is normally not known.

28 http://en.wikipedia.org/wiki/Truncation_selection [accessed 2007-07-03]
Example E28.19 (Truncation Selection (Example E28.18 cntd)).
In Figure 28.8, we sketch the number of offspring for the individuals p ∈ pop from Example E28.18 if truncation selection is applied. As stated, k is less than or equal to the mating pool size mps. Under the assumption that no sexual selection takes place, ES(p) only depends on k.

[Figure 28.8 plots the expected number of offspring ES(pᵢ) over the individual index i (and the corresponding fitness values v₁..v₄) for cut-off ranks k = 0.100∗ps, 0.125∗ps, 0.250∗ps, 0.500∗ps, and 1.000∗ps.]
Figure 28.8: The number of expected offspring in truncation selection.
Under truncation selection, the diagram will look exactly the same regardless of which of the four example fitness configurations we use. Here, only the order of the individuals, not the numerical relation of their fitness values, is important. If we set k = ps, each individual will have one offspring on average. If k = 1/2 mps, the top-50% individuals will have two offspring and the others none. For k = 1/10 mps, only the best 100 of the 1000 candidate solutions will reach the mating pool and reproduce 10 times on average.
28.4.3 Fitness Proportionate Selection
Fitness proportionate selection^29 was the selection method applied in the original Genetic Algorithms as introduced by Holland [1250] and therefore is one of the oldest selection schemes. In this Genetic Algorithm, fitness was subject to maximization. Under fitness proportionate selection, as its name already implies, the probability P(select(p)) of an individual p ∈ pop to enter the mating pool is proportional to its fitness v(p). This relation in its original form is defined in Equation 28.18 below: The probability P(select(p)) that individual p is picked in each selection decision corresponds to the fraction of its fitness in the sum of the fitness of all individuals.

P(select(p)) = v(p) / Σ_{∀p′∈pop} v(p′)   (28.18)

29 http://en.wikipedia.org/wiki/Fitness_proportionate_selection [accessed 2008-03-19]
There exists a variety of approaches which realize such probability distributions [1079], like
stochastic remainder selection [358, 409] and stochastic universal selection [188, 1133]. The
most commonly known method is the Monte Carlo roulette wheel selection by De Jong
[711], where we imagine the individuals of a population to be placed on a roulette^30 wheel
as sketched in Fig. 28.9.a. The size of the area on the wheel standing for a candidate solution
is proportional to its fitness. The wheel is spun, and the individual where it stops is placed
into the mating pool mate. This procedure is repeated until mps individuals have been
selected.
In the context of this book, fitness is subject to minimization. Here, higher fitness values
v(p) indicate unfit candidate solutions p.x whereas lower fitness denotes high utility. Fur-
thermore, the fitness values are normalized into a range of [0, 1], because otherwise, fitness
proportionate selection will handle the set of fitness values {0, 1, 2} in a different way than {10, 11, 12}. Equation 28.22 defines the framework for such a (normalized) fitness propor-
tionate selection. It is realized in Algorithm 28.9 as a variant with and in Algorithm 28.10
without replacement.
minV = min{v(p) ∀p ∈ pop}   (28.19)
maxV = max{v(p) ∀p ∈ pop}   (28.20)
normV(p) = (maxV − v(p)) / (maxV − minV)   (28.21)
P(select(p)) ≈ normV(p) / Σ_{∀p′∈pop} normV(p′)   (28.22)
30 http://en.wikipedia.org/wiki/Roulette [accessed 2008-03-20]
Algorithm 28.9: mate ←− rouletteWheelSelection_r(pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Data: i: a counter variable
Data: a: a temporary store for a numerical value
Data: A: the array of fitness values
Data: min, max, sum: the minimum, maximum, and sum of the fitness values
Output: mate: the mating pool
1 begin
2   A ←− createList(ps, 0)
3   min ←− ∞
4   max ←− −∞
    // Initialize fitness array and find extreme fitnesses
5   for i ←− 0 up to ps − 1 do
6     a ←− v(pop[i])
7     A[i] ←− a
8     if a < min then min ←− a
9     if a > max then max ←− a
    // break situations where all individuals have the same fitness
10  if max = min then
11    max ←− max + 1
12    min ←− min − 1
    /* aggregate fitnesses: A = (normV(pop[0]), normV(pop[0]) + normV(pop[1]),
       normV(pop[0]) + normV(pop[1]) + normV(pop[2]), . . . ) */
13  sum ←− 0
14  for i ←− 0 up to ps − 1 do
15    sum ←− sum + (max − A[i]) / (max − min)
16    A[i] ←− sum
    /* spin the roulette wheel: the chance that a uniformly distributed random number
       lands in A[i] is (inversely) proportional to v(pop[i]) */
17  for i ←− 0 up to mps − 1 do
18    a ←− searchItemAS(randomUni[0, sum) , A)
19    if a < 0 then a ←− −a − 1
20    mate ←− addItem(mate, pop[a])
21  return mate
Algorithm 28.10: mate ←− rouletteWheelSelection_w(pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Data: i: a counter variable
Data: a, b: temporary stores for numerical values
Data: A: the array of fitness values
Data: min, max, sum: the minimum, maximum, and sum of the fitness values
Output: mate: the mating pool
1 begin
2   A ←− createList(ps, 0)
3   min ←− ∞
4   max ←− −∞
    // Initialize fitness array and find extreme fitnesses
5   for i ←− 0 up to ps − 1 do
6     a ←− v(pop[i])
7     A[i] ←− a
8     if a < min then min ←− a
9     if a > max then max ←− a
    // break situations where all individuals have the same fitness
10  if max = min then
11    max ←− max + 1
12    min ←− min − 1
    /* aggregate fitnesses: A = (normV(pop[0]), normV(pop[0]) + normV(pop[1]),
       normV(pop[0]) + normV(pop[1]) + normV(pop[2]), . . . ) */
13  sum ←− 0
14  for i ←− 0 up to ps − 1 do
15    sum ←− sum + (max − A[i]) / (max − min)
16    A[i] ←− sum
    /* spin the roulette wheel: the chance that a uniformly distributed random number
       lands in A[i] is (inversely) proportional to v(pop[i]) */
17  for i ←− 0 up to min{mps, ps} − 1 do
18    a ←− searchItemAS(randomUni[0, sum) , A)
19    if a < 0 then a ←− −a − 1
20    if a = 0 then b ←− 0
21    else b ←− A[a − 1]
22    b ←− A[a] − b
      // remove the element at index a from A
23    for j ←− a + 1 up to len(A) − 1 do
24      A[j] ←− A[j] − b
25    sum ←− sum − b
26    mate ←− addItem(mate, pop[a])
27    pop ←− deleteItem(pop, a)
28    A ←− deleteItem(A, a)
29  return mate
Example E28.20 (Roulette Wheel Selection).
In Figure 28.9, we illustrate the fitness distribution on the roulette wheel for fitness maximization (Fig. 28.9.a) and for normalized fitness minimization (Fig. 28.9.b). In this example we compute the expected proportion in which each of the four elements p_1..p_4 will occur in the mating pool mate if the fitness values are v(p_1) = 10, v(p_2) = 20, v(p_3) = 30, and v(p_4) = 40.
[Figure 28.9: Examples for the idea of roulette wheel selection. Fig. 28.9.a: Example for fitness maximization. Fig. 28.9.b: Example for normalized fitness minimization.]
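The proportions sketched in Fig. 28.9.b can be cross-checked with a few lines of Python that evaluate Equation 28.21 and Equation 28.22 for these four fitness values (an illustrative snippet with freely chosen names):

```python
fitness = [10, 20, 30, 40]          # v(p1)..v(p4), subject to minimization
lo, hi = min(fitness), max(fitness)
norm = [(hi - a) / (hi - lo) for a in fitness]   # Equation 28.21
total = sum(norm)
probs = [n / total for n in norm]                # Equation 28.22
print(probs)   # [0.5, 0.333..., 0.166..., 0.0] -> wheel areas 1/2, 1/3, 1/6, 0
```

Note that the worst individual p_4 receives area 0 and can never be selected under normalized minimization.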
Amongst others, Whitley [2929] points out that even fitness normalization as performed here
cannot overcome the drawbacks of fitness proportional selection methods. In order to clarify
these drawbacks, let us visualize the expected results of roulette wheel selection applied to
the special cases stated in Section 28.4.1.1.
Example E28.21 (Roulette Wheel Selection (Example E28.18 cntd)).
Figure 28.10 illustrates the expected number of occurrences ES(p_i) of an individual p_i if roulette wheel selection is applied with replacement.
In such cases, the parents for each offspring are usually selected into a mating pool which is as large as the population (mps = ps = 1000). Thus, we draw one thousand times a single individual from the population pop. Each single choice is based on the proportion of the individual's fitness in the total fitness of all individuals, as defined in Equation 28.18 and Equation 28.22.
In scenario 1 with the fitness sum 999·998/2 = 498501, the relation ES(p_i) = mps · i/498501 holds for fitness maximization and ES(p_i) = mps · (999 − i)/498501 for minimization. As a result (sketched in Fig. 28.10.a), the fittest individuals produce (on average) two offspring, whereas the worst candidate solutions will always vanish in this example.
For scenario 2 with v_2(p_i.x) = (i + 1)^3, the total fitness sum is approximately 2.51·10^11 and ES(p_i) = mps · (i + 1)^3 / (2.52·10^11) holds for maximization. The resulting expected values depicted in Fig. 28.10.b are significantly different from those in Fig. 28.10.a.
This means that the design of the objective functions (or the fitness assignment process) has a very strong influence on the convergence behavior of the Evolutionary Algorithm. This selection method only works well if the fitness of an individual is indeed something like a proportional measure for the probability that it will produce better offspring.
The drawback becomes even clearer when comparing the two fitness functions v_3 and v_4: v_4 is basically the same as v_3, just shifted upwards by 1.
[Figure 28.10: The number of expected offspring in roulette wheel selection. Fig. 28.10.a: v_1(p_i.x) = i. Fig. 28.10.b: v_2(p_i.x) = (i + 1)^3. Fig. 28.10.c: v_3(p_i) = 0.0001i if i < 999, 1 otherwise. Fig. 28.10.d: v_4(p_i) = 1 + v_3(p_i).]
From a balanced selection method, we would expect that it produces similar mating pools for two functions which are so similar. However, the result is entirely different. For v_3, the individual at index 999 is totally outstanding with its fitness of 1 (which is more than ten times the fitness of the next-best individual). Hence, it receives many more copies in the mating pool. In the case of v_4, however, it is only around twice as good as the next-best individual and thus only receives around twice as many copies. From this situation, we can draw two conclusions:
1. The number of copies that an individual receives changes significantly when the vertical shift of the fitness function changes – even if its other characteristics remain constant.
2. In cases like v_3, where one individual has outstanding fitness, the Genetic Algorithm may degenerate to little more than a Hill Climber, since this individual will unite most of the slots in the mating pool on itself. If the Genetic Algorithm behaves like a Hill Climber, its benefits, such as the ability to trace more than one possible solution at a time, disappear.
Roulette wheel selection performs poorly compared to other schemes such as tournament selection or ranking selection [1079]. It is mainly included here for the sake of completeness and because it is easy to understand and suitable for educational purposes.
28.4.4 Tournament Selection
Tournament selection^31, proposed by Wetzel [2923] and studied by Brindle [409], is one of the most popular and effective selection schemes. Its features are well-known and have been analyzed by a variety of researchers such as Blickle and Thiele [339, 340], Lee et al. [1707], Miller and Goldberg [1898], Sastry and Goldberg [2400], and Oei et al. [2063]. In tournament selection, k elements are picked from the population pop and compared with each other in a tournament. The winner of this competition will then enter the mating pool mate. Although it is a simple selection strategy, it is very powerful and therefore used in many practical applications [81, 96, 461, 1880, 2912, 2914].
^31 http://en.wikipedia.org/wiki/Tournament_selection [accessed 2007-07-03]
Example E28.22 (Binary Tournament Selection).
As an example, consider a tournament selection (with replacement) with a tournament size of two [2931]. For each single tournament, the contestants are chosen randomly according to a uniform distribution and the winners are allowed to enter the mating pool. If we assume that the mating pool will contain about as many individuals as the population, each individual will, on average, participate in two tournaments. The best candidate solution of the population will win all the contests it takes part in and thus, again on average, contributes approximately two copies to the mating pool. The median individual of the population is better than 50% of its challengers but will also lose against 50%. Therefore, it will enter the mating pool roughly one time on average. The worst individual in the population will lose all its challenges against other candidate solutions and can only score if it competes against itself, which happens with probability (1/mps)^2. It will not be able to reproduce in the average case because mps · (1/mps)^2 = 1/mps < 1 ∀mps > 1.
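These expectations can be checked with a tiny simulation. The Python sketch below (illustrative only; names chosen freely) counts how often the best, the median, and the worst of ps = 1000 individuals enter the mating pool under binary tournament selection with replacement when rank equals fitness.

```python
import random

ps = mps = 1000
trials = 200
counts = {0: 0, ps // 2: 0, ps - 1: 0}   # best, median, worst (rank = fitness)
for _ in range(trials):
    for _ in range(mps):
        a, b = random.randrange(ps), random.randrange(ps)
        winner = min(a, b)               # lower rank = better fitness
        if winner in counts:
            counts[winner] += 1
for rank, c in counts.items():
    # roughly 2.0 copies for the best, 1.0 for the median, about 0.001 for the worst
    print(rank, c / trials)
```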
Example E28.23 (Tournament Selection (Example E28.18 cntd)).
For visualization purposes, let us go back to Example E28.18 with a population of 1000 individuals p_0..p_999 and mps = ps = 1000. Again, we assume that each individual has a unique fitness value of v_1(p_i) = i, v_2(p_i) = (i + 1)^3, v_3(p_i) = 0.0001i if i < 999, 1 otherwise, and v_4(p_i) = 1 + v_3(p_i), respectively. If we apply tournament selection with replacement in
this special scenario, the expected number of occurrences ES(p_i) of an individual p_i in the mating pool can be computed according to Blickle and Thiele [340] as

ES(p_i) = mps · [ ((1000 − i)/1000)^k − ((1000 − i − 1)/1000)^k ]    (28.23)
The absolute values of the fitness play no role in tournament selection. The only thing that matters is whether or not the fitness of one individual is higher than the fitness of another one, not the fitness difference itself. The expected numbers of offspring for all four example cases from Example E28.18 are therefore the same. Tournament selection thus gets rid of the problematic dependency on the relative proportions of the fitness values that is inherent in roulette-wheel style selection schemes.
[Figure 28.11: The number of expected offspring in tournament selection, plotted for tournament sizes k ∈ {1, 2, 3, 4, 5, 10} over the example fitness functions v_1..v_4.]
Figure 28.11 depicts the expected offspring numbers for different tournament sizes k ∈ {1, 2, 3, 4, 5, 10}. If k = 1, tournament selection degenerates to randomly picking individuals and each candidate solution will occur one time in the mating pool on average. With rising k, the selection pressure increases: individuals with good fitness values create more and more offspring whereas the chance of worse candidate solutions to reproduce decreases.
Tournament selection with replacement (TSR) is presented in Algorithm 28.11, which has been implemented in Listing 56.14 on page 756. Tournament selection without replacement (TSoR) [43, 1707] can be defined in two forms. In the first variant, specified as Algorithm 28.12, a candidate solution cannot compete against itself. In Algorithm 28.13, on the other hand, an individual may enter the mating pool at most once.
Algorithm 28.11: mate ←− tournamentSelection_r(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] k: the tournament size
Data: i, j, a, b: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1  begin
2    mate ←− ()
3    for i ←− 1 up to mps do
4      a ←− ⌊randomUni[0, ps)⌋
5      for j ←− 1 up to k−1 do
6        b ←− ⌊randomUni[0, ps)⌋
7        if v(pop[b]) < v(pop[a]) then a ←− b
8      mate ←− addItem(mate, pop[a])
9    return mate
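A compact Python sketch of Algorithm 28.11 could look as follows; it assumes minimization and is an illustration with freely chosen names, not the book's reference implementation.

```python
import random

def tournament_selection_r(pop, v, mps, k=2):
    """Deterministic tournament selection with replacement (cf. Algorithm 28.11)."""
    mate = []
    for _ in range(mps):
        best = random.randrange(len(pop))          # first contestant
        for _ in range(k - 1):                     # k - 1 further contestants
            challenger = random.randrange(len(pop))
            if v(pop[challenger]) < v(pop[best]):  # lower fitness wins
                best = challenger
        mate.append(pop[best])
    return mate
```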
Algorithm 28.12: mate ←− tournamentSelection_w(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] k: the tournament size
Data: i, j, a, b: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1  begin
2    mate ←− ()
3    pop ←− sortAsc(pop, v)
4    for i ←− 1 up to min{len(pop), mps} do
5      a ←− ⌊randomUni[0, len(pop))⌋
6      for j ←− 1 up to min{len(pop), k}−1 do
7        b ←− ⌊randomUni[0, len(pop))⌋
8        if v(pop[b]) < v(pop[a]) then a ←− b
9      mate ←− addItem(mate, pop[a])
10     pop ←− deleteItem(pop, a)
11   return mate
Algorithm 28.13: mate ←− tournamentSelection_w(k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] k: the tournament size
Data: A: the list of contestants per tournament
Data: a: the tournament winner
Data: i, j: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1  begin
2    mate ←− ()
3    pop ←− sortAsc(pop, v)
4    for i ←− 1 up to mps do
5      A ←− ()
6      for j ←− 1 up to min{k, len(pop)} do
7        repeat
8          a ←− ⌊randomUni[0, len(pop))⌋
9        until searchItemUS(a, A) < 0
10       A ←− addItem(A, a)
11     a ←− min A
12     mate ←− addItem(mate, pop[a])
13   return mate
The algorithms specified here should more precisely be entitled deterministic tournament selection algorithms since the winner of the k contestants that take part in each tournament always enters the mating pool. In the non-deterministic variants, this is not necessarily the case. There, a probability η is defined. The best individual in the tournament is selected with probability η, the second best with probability η(1 − η), the third best with probability η(1 − η)^2, and so on. The i-th best individual in a tournament enters the mating pool with probability η(1 − η)^i. Algorithm 28.14 realizes this behavior for a tournament selection with replacement. Notice that it becomes equivalent to Algorithm 28.11 on the preceding page if η is set to 1. Besides the algorithms discussed here, a set of additional tournament-based selection methods has been introduced by Lee et al. [1707].
Algorithm 28.14: mate ←− tournamentSelectionND_r(η, k, pop, v, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: v: the fitness values
Input: mps: the number of individuals to be placed into the mating pool mate
Input: [implicit] η: the selection probability, η ∈ [0, 1]
Input: [implicit] k: the tournament size
Data: A: the set of tournament contestants
Data: i, j: counter variables
Output: mate: the winners of the tournaments which now form the mating pool
1  begin
2    mate ←− ()
3    pop ←− sortAsc(pop, v)
4    for i ←− 1 up to mps do
5      A ←− ()
6      for j ←− 1 up to k do
7        A ←− addItem(A, ⌊randomUni[0, ps)⌋)
8      A ←− sortAsc(A, cmp(a_1, a_2) ≡ (a_1 − a_2))
9      for j ←− 0 up to len(A)−1 do
10       if (randomUni[0, 1) ≤ η) ∨ (j ≥ len(A)−1) then
11         mate ←− addItem(mate, pop[A[j]])
12         j ←− ∞
13   return mate
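The non-deterministic variant can be sketched along the same lines in Python: the contestants are ranked from best to worst and traversed until one passes the η test, the last one being taken unconditionally. This is a hedged illustration of the idea behind Algorithm 28.14, with freely chosen names.

```python
import random

def tournament_selection_nd_r(pop, v, mps, k=2, eta=0.75):
    """Non-deterministic tournament selection with replacement (cf. Algorithm 28.14)."""
    mate = []
    for _ in range(mps):
        contestants = [random.randrange(len(pop)) for _ in range(k)]
        contestants.sort(key=lambda idx: v(pop[idx]))   # best (lowest fitness) first
        for rank, idx in enumerate(contestants):
            # the j-th best survives the eta test with probability eta*(1-eta)^j;
            # the last contestant in the tournament is taken unconditionally
            if random.random() <= eta or rank == len(contestants) - 1:
                mate.append(pop[idx])
                break
    return mate
```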
28.4.5 Linear Ranking Selection
Linear ranking selection [340], introduced by Baker [187] and more thoroughly discussed by
Blickle and Thiele [340], Whitley [2929], and Goldberg and Deb [1079] is another approach
for circumventing the problems of fitness proportionate selection methods. In ranking se-
lection [187, 1133, 2929], the probability of an individual to be selected is proportional to
its position (rank) in the sorted list of all individuals in the population. Using the rank
smoothes out larger differences of the fitness values and emphasizes small ones. Generally,
we can consider the conventional ranking selection method as the application of a fitness
assignment process setting the rank as fitness (which can be achieved with Pareto ranking,
for instance) and a subsequent fitness proportional selection.
In its original definition discussed in [340], linear ranking selection was used to maximize fitness. Here, we will discuss it again for fitness minimization. In linear ranking selection with replacement, the probability P(select(p_i)) that the individual at rank i ∈ 0..ps−1 is picked in each selection decision is given in Equation 28.24 below. Notice that each individual always receives a different rank and even if two individuals have the same fitness, they must be assigned to different ranks [340].
P(select(p_i)) = (1/ps) · [η⁻ + (η⁺ − η⁻) · (ps − i − 1)/(ps − 1)]    (28.24)
In Equation 28.24, the best individual (at rank 0) will have the probability η⁺/ps of being picked while the worst individual (rank ps − 1) is chosen with chance η⁻/ps:

P(select(p_0)) = (1/ps) · [η⁻ + (η⁺ − η⁻) · (ps − 0 − 1)/(ps − 1)] = (1/ps) · [η⁻ + (η⁺ − η⁻)] = η⁺/ps    (28.25)
P(select(p_{ps−1})) = (1/ps) · [η⁻ + (η⁺ − η⁻) · (ps − (ps − 1) − 1)/(ps − 1)]    (28.26)
                    = (1/ps) · [η⁻ + (η⁺ − η⁻) · 0/(ps − 1)] = η⁻/ps    (28.27)
It should be noted that all probabilities must add up to one, i.e.:

Σ_{i=0}^{ps−1} P(select(p_i)) = 1    (28.28)
1 = Σ_{i=0}^{ps−1} (1/ps) · [η⁻ + (η⁺ − η⁻) · (ps − i − 1)/(ps − 1)]    (28.29)
  = (1/ps) · Σ_{i=0}^{ps−1} [η⁻ + (η⁺ − η⁻) · (ps − i − 1)/(ps − 1)]    (28.30)
  = (1/ps) · [ps·η⁻ + Σ_{i=0}^{ps−1} (η⁺ − η⁻) · (ps − i − 1)/(ps − 1)]    (28.31)
  = (1/ps) · [ps·η⁻ + (η⁺ − η⁻) · Σ_{i=0}^{ps−1} (ps − i − 1)/(ps − 1)]    (28.32)
  = (1/ps) · [ps·η⁻ + (η⁺ − η⁻)/(ps − 1) · Σ_{i=0}^{ps−1} (ps − i − 1)]    (28.33)
  = (1/ps) · [ps·η⁻ + (η⁺ − η⁻)/(ps − 1) · (ps(ps − 1) − Σ_{i=0}^{ps−1} i)]    (28.34)
  = (1/ps) · [ps·η⁻ + (η⁺ − η⁻)/(ps − 1) · (ps(ps − 1) − ½·ps(ps − 1))]    (28.35)
  = (1/ps) · [ps·η⁻ + (η⁺ − η⁻)/(ps − 1) · ½·ps(ps − 1)]    (28.36)
  = (1/ps) · [ps·η⁻ + ½·ps·(η⁺ − η⁻)]    (28.37)
  = (1/ps) · [½·ps·η⁻ + ½·ps·η⁺]    (28.38)
1 = ½·η⁻ + ½·η⁺    (28.39)
1 − ½·η⁻ = ½·η⁺    (28.40)
2 − η⁻ = η⁺    (28.41)
The parameters η⁻ and η⁺ must be chosen in a way that 2 − η⁻ = η⁺ and, obviously, 0 < η⁻ and 0 < η⁺ must hold as well. If the selection algorithm should furthermore make sense, i.e., prefer individuals with a lower fitness rank over those with a higher fitness rank (again, fitness is subject to minimization), 0 ≤ η⁻ < η⁺ ≤ 2 is true.
Koza [1602] determines the slope of the linear ranking selection probability function by a factor r_m in his implementation of this selection scheme. According to [340], this factor can be transformed to η⁻ and η⁺ by the following equations:

η⁻ = 2/(r_m + 1)    (28.42)
η⁺ = 2r_m/(r_m + 1)    (28.43)
2 − η⁻ = η⁺ ⇒ 2 − 2/(r_m + 1) = 2r_m/(r_m + 1)    (28.44)
2(r_m + 1) − 2 = 2r_m    (28.45)
2r_m = 2r_m  q.e.d.    (28.46)
As mentioned, we can simply emulate this scheme by (1) first building a secondary fitness function v′ based on the ranks of the individuals p_i according to the original fitness v and Equation 28.24 and then (2) applying fitness proportionate selection as given in Algorithm 28.9 on page 292. We therefore only need to set v′(p_i) = 1 − P(select(p_i)) (since Algorithm 28.9 on page 292 minimizes fitness).
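Alternatively, the rank-based probabilities of Equation 28.24 can be computed directly and sampled from, as in the Python sketch below. It assumes fitness minimization, ps > 1, and η⁻ + η⁺ = 2 as derived above; all names are chosen freely for this illustration.

```python
import random

def linear_ranking_selection(pop, v, mps, eta_minus=0.5):
    """Linear ranking selection with replacement (cf. Equation 28.24), ps > 1."""
    eta_plus = 2.0 - eta_minus                  # required by Equation 28.41
    ps = len(pop)
    ranked = sorted(pop, key=v)                 # rank 0 = best (lowest fitness)
    probs = [(eta_minus + (eta_plus - eta_minus) * (ps - i - 1) / (ps - 1)) / ps
             for i in range(ps)]
    return random.choices(ranked, weights=probs, k=mps)
```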
28.4.6 Random Selection
Random selection is a selection method which simply fills the mating pool with randomly picked individuals from the population. Obviously, using this selection scheme for survival/environmental selection, i.e., to determine which individuals are allowed to produce offspring (see Section 28.1.2.6 on page 260), makes little sense. This would turn the Evolutionary Algorithm into some sort of parallel random walk, since it means disregarding all fitness information.
However, after an actual mating pool has been filled, it may make sense to use random selection to choose the parents for the different individuals, especially in cases where there are many parents such as in Evolution Strategies, which are introduced in Chapter 30 on page 359.
Algorithm 28.15: mate ←− randomSelection_r(pop, mps)
Input: pop: the list of individuals to select from
Input: [implicit] ps: the population size, ps = len(pop)
Input: mps: the number of individuals to be placed into the mating pool mate
Data: i: a counter variable
Output: mate: the randomly picked parents
1  begin
2    mate ←− ()
3    for i ←− 1 up to mps do
4      mate ←− addItem(mate, pop[⌊randomUni[0, ps)⌋])
5    return mate
Algorithm 28.15 provides a specification of random selection with replacement. An
example implementation of this algorithm is given in Listing 56.15 on page 758.
28.4.7 Clearing and Simple Convergence Prevention (SCP)
In our experiments, especially in Genetic Programming [2888] and problems with discrete objective functions [2912, 2914], we often use a very simple mechanism to prevent premature convergence. Premature convergence, as outlined in Chapter 13, means that the optimization process gets stuck at a local optimum and loses the ability to investigate other areas in the search space, in particular the area where the global optimum resides. Our SCP method is neither a fitness nor a selection algorithm, but instead more of a filter which is placed after computing the objectives and before applying a fitness assignment process. Since our method actively deletes individuals from the population, we placed it into this section.
The idea is simple: the more similar individuals exist in the population, the more likely it is that the optimization process has converged. It is not clear whether it converged to a global optimum or to a local one. If it got stuck at a local optimum, the fraction of the population which resides at that spot should be limited. Then, some individuals necessarily will be located somewhere else and, if they survive, may find a path away from the local optimum towards a different area in the search space. In case the optimizer indeed converged to the global optimum, such a measure does not cause problems because, in the end, better points than the global optimum cannot be discovered anyway.
28.4.7.1 Clearing
The first one to apply an explicit limitation of the number of individuals in an area of the search space was Pétrowski [2167, 2168], whose clearing approach is applied in each generation and works as specified in Algorithm 28.16, where fitness is subject to minimization.
Algorithm 28.16: v′ ←− clearingFilter(pop, σ, k, v)
Input: pop: the list of individuals to apply clearing to
Input: σ: the clearing radius
Input: k: the niche capacity
Input: [implicit] v: the fitness values
Input: [implicit] dist: a distance measure in the genome or phenome
Data: n: the current number of winners
Data: i, j: counter variables
Data: P: the list representation of pop, sorted according to v
Output: v′: the new fitness assignment
1  begin
2    v′ ←− v
3    P ←− sortAsc(pop, v)
4    for i ←− 0 up to len(P)−1 do
5      if v(P[i]) < ∞ then
6        n ←− 1
7        for j ←− i+1 up to len(P)−1 do
8          if (v(P[j]) < ∞) ∧ (dist(P[i], P[j]) < σ) then
9            if n < k then n ←− n + 1
10           else v′(P[j]) ←− ∞
11   return v′
Basically, clearing divides the population of an EA into several sub-populations according to a distance measure “dist” applied in the genotypic (G) or phenotypic space (X) in each generation. The individuals of each sub-population have at most the distance σ to the fittest individual in this niche. Then, the fitness of all but the k best individuals in such a sub-population is set to the worst possible value. This effectively prevents a niche from becoming too crowded. Sareni and Krähenbühl [2391] showed that this method is very promising. Singh and Deb [2503] suggest a modified clearing approach which shifts individuals that would be cleared farther away and reevaluates their fitness.
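A rough Python rendering of the clearing idea is given below. It is a sketch only: the representation of the individuals, the distance measure dist, and the bookkeeping via a fitness dictionary are assumptions of this illustration, not part of the original specification.

```python
import math

def clearing_filter(pop, v, dist, sigma, k):
    """Clearing (cf. Algorithm 28.16): keep at most k winners per niche of
    radius sigma; all other niche members get the worst possible fitness."""
    new_v = {id(p): v(p) for p in pop}          # start from the original fitness
    ordered = sorted(pop, key=v)                # best individuals first
    for i, leader in enumerate(ordered):
        if new_v[id(leader)] == math.inf:       # already cleared, skip
            continue
        winners = 1
        for other in ordered[i + 1:]:
            if new_v[id(other)] < math.inf and dist(leader, other) < sigma:
                if winners < k:
                    winners += 1
                else:
                    new_v[id(other)] = math.inf  # cleared
    return new_v
```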
28.4.7.2 SCP
We modify this approach in two respects: We measure similarity not in form of a distance in the search space G or the problem space X, but in the objective space Y ⊆ R^|f|. All individuals are compared with each other. If two have exactly the same objective values^32, one of them is thrown away with probability^33 c ∈ [0, 1] and does not take part in any further comparisons. This way, we weed out similar individuals without making any assumptions about G or X and make room in the population and mating pool for a wider diversity of candidate solutions. For c = 0, this prevention mechanism is turned off; for c = 1, all remaining individuals will have different objective values.
Although this approach is very simple, the results of our experiments were often significantly better with this convergence prevention method turned on than without it [2188, 2888, 2912, 2914]. Additionally, in none of our experiments were the outcomes influenced negatively by this filter, which makes it even more robust than other methods for convergence prevention like sharing or variety preserving. Algorithm 28.17 specifies how our simple mechanism works. It has to be applied after the evaluation of the objective values of the individuals in the population and before any fitness assignment or selection takes place.
Algorithm 28.17: P ←− scpFilter(pop, c, f)
Input: pop: the list of individuals to apply convergence prevention to
Input: c: the convergence prevention probability, c ∈ [0, 1]
Input: [implicit] f: the set of objective function(s)
Data: i, j: counter variables
Data: p: the individual currently being checked
Output: P: the pruned population
1  begin
2    P ←− ()
3    for i ←− 0 up to len(pop)−1 do
4      p ←− pop[i]
5      for j ←− len(P)−1 down to 0 do
6        if f(p.x) = f(P[j].x) then
7          if randomUni[0, 1) < c then
8            P ←− deleteItem(P, j)
9      P ←− addItem(P, p)
10   return P
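The SCP filter translates almost directly into Python. The sketch below assumes that an accessor function objectives(p) returns the objective vector of an individual; this accessor, like all other names, is an assumption of the illustration.

```python
import random

def scp_filter(pop, c, objectives):
    """Simple Convergence Prevention (cf. Algorithm 28.17): whenever a new
    individual has exactly the same objective values as an already kept one,
    one of the two duplicates is discarded with probability c."""
    kept = []
    for p in pop:
        for j in range(len(kept) - 1, -1, -1):
            if objectives(p) == objectives(kept[j]):
                if random.random() < c:
                    del kept[j]          # throw one of the duplicates away
        kept.append(p)
    return kept
```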
If an individual p occurs n times in the population or if there are n individuals with exactly the same objective values, Algorithm 28.17 cuts down the expected number of their occurrences ES(p) to

ES(p) = Σ_{i=1}^{n} (1 − c)^{i−1} = Σ_{i=0}^{n−1} (1 − c)^i = ((1 − c)^n − 1)/(−c) = (1 − (1 − c)^n)/c    (28.47)
In Figure 28.12, we sketch the expected number of remaining instances of the individual p
after this pruning process if it occurred n times in the population before Algorithm 28.17
was applied.
From Equation 28.47 it follows that even a population of infinite size which has fully converged to one single value will probably not contain more than 1/c copies of this individual after the simple convergence prevention has been applied. This threshold is also visible in Figure 28.12.
^32 The exactly-the-same-criterion makes sense in combinatorial optimization and many Genetic Programming problems but may easily be replaced with a limit imposed on the Euclidean distance in real-valued optimization problems, for instance.
^33 instead of defining a fixed threshold k
[Figure 28.12: The expected numbers of occurrences ES(p) for different values of n and for c ∈ {0.1, 0.2, 0.3, 0.5, 0.7}.]
lim_{n→∞} ES(p) = lim_{n→∞} (1 − (1 − c)^n)/c = (1 − 0)/c = 1/c    (28.48)
28.4.7.3 Discussion
In Pétrowski’s clearing approach [2167], the maximum number of individuals which can survive in a niche was a fixed constant k and, if less than k individuals resided in a niche, none of them would be affected. Different from that, in SCP, an expected value of the number of individuals allowed in a niche is specified with the probability c and may be both exceeded and undercut.
Another difference of the approaches arises from the space in which the distance is computed: Whereas clearing prevents the EA from concentrating too much on a certain area in the search or problem space, SCP stops it from keeping too many individuals with equal utility. The former approach works against premature convergence to a certain solution structure while the latter forces the EA to “keep track” of a trail to candidate solutions with worse fitness which may later evolve to good individuals with traits different from the currently exploited ones. Furthermore, since our method does not need to access any information apart from the objective values, it is independent from the search and problem space and can, once implemented, be combined with any kind of population-based optimizer.
Which of the two approaches is better has not yet been tested with comparative experiments and is part of our future work. At the present moment, we assume that in real-valued search or problem spaces, clearing should be more suitable, whereas we know from experiments using our approach only that SCP performs very well in combinatorial problems [2188, 2914] and Genetic Programming [2888].
28.5 Reproduction
An optimization algorithm uses the information gathered up to step t for creating the candidate solutions to be evaluated in step t + 1. There exist different methods to do so. In Evolutionary Algorithms, the aggregated information corresponds to the population pop and the archive archive of best individuals, if such an archive is maintained. The search operations searchOp ∈ Op used in the Evolutionary Algorithm family are called reproduction operations, inspired by the biological procreation mechanisms^34 of mother nature [2303]. There are four basic operations:
1. Creation simply creates a new genotype without any ancestors or heritage.
2. Duplication directly copies an existing genotype without any modification.
3. Mutation in Evolutionary Algorithms corresponds to small, random variations in the genotype of an individual, exactly like natural mutation^35.
4. Like in sexual reproduction, recombination^36 combines two parental genotypes to a new genotype including traits from both elders.
In the following, we will discuss these operations in detail and provide general definitions for them. When an Evolutionary Algorithm starts, no information about the search space has been gathered yet. Hence, unless seeding is applied (see Paragraph 28.1.2.2.2), we cannot use existing candidate solutions to derive new ones and search operations with an arity higher than zero cannot be applied. Creation is thus used to fill the initial population pop(t = 1).
Definition D28.11 (Creation). The creation operation create() is used to produce a new genotype g ∈ G with a random configuration.

g = create() ⇒ g ∈ G    (28.49)
For creating the initial population of the size ps, we define the function createPop(ps) in
Algorithm 28.18.
Algorithm 28.18: pop ←− createPop(ps)
Input: ps: the number of individuals in the new population
Input: [implicit] create: the creation operator
Data: i: a counter variable
Output: pop: the new population of randomly created individuals (len(pop) = ps)
1  begin
2    pop ←− ()
3    for i ←− 0 up to ps−1 do
4      p ←− ∅
5      p.g ←− create()
6      pop ←− addItem(pop, p)
7    return pop
Duplication is just a placeholder for copying an element of the search space, i. e., it is
what occurs when neither mutation nor recombination are applied. It is useful to increase
the share of a given type of individual in a population.
^34 http://en.wikipedia.org/wiki/Reproduction [accessed 2007-07-03]
^35 http://en.wikipedia.org/wiki/Mutation [accessed 2007-07-03]
^36 http://en.wikipedia.org/wiki/Sexual_reproduction [accessed 2008-03-17]
Definition D28.12 (Duplication). The duplication operation duplicate : G → G is used to create an exact copy of an existing genotype g ∈ G.

g = duplicate(g) ∀g ∈ G    (28.50)
In most EAs, a unary search operation, mutation, is provided. This operation takes one
genotype as argument and modifies it. The way this modification is performed is application-
dependent. It may happen in a randomized or in a deterministic fashion. Together with
duplication, mutation corresponds to the asexual reproduction in nature.
Definition D28.13 (Mutation). The mutation operation mutate : G → G is used to create a new genotype g_n ∈ G by modifying an existing one.

g_n = mutate(g) : g ∈ G ⇒ g_n ∈ G    (28.51)
Like mutation, recombination is a search operation inspired by natural reproduction. Its role model is sexual reproduction. The goal of implementing a recombination operation is to provide the search process with some means to join beneficial features of different candidate solutions. This may happen in a deterministic or randomized fashion.
Notice that the term recombination is more general than crossover since it stands for arbitrary search operations that combine the traits of two individuals. Crossover, however, is only used if the elements of the search space G are linear representations. Then, it stands for exchanging parts of these so-called strings.
Definition D28.14 (Recombination). The recombination^37 operation recombine : G × G → G is used to create a new genotype g_n ∈ G by combining the features of two existing ones.

g_n = recombine(g_a, g_b) : g_a, g_b ∈ G ⇒ g_n ∈ G    (28.52)
It is not unusual to chain the application of search operators, i.e., to perform something like mutate(recombine(g_1, g_2)). The four operators are altogether used to reproduce whole populations of individuals.
Definition D28.15 (reproducePop). The population reproduction operation pop =
reproducePop(mate, ps) is used to create a new population pop of ps individuals by applying
the reproduction operations to the mating pool mate.
In Algorithm 28.19, we give a specification on how Evolutionary Algorithms usually would
apply the reproduction operations defined so far to create a new population of ps individuals.
This procedure applies mutation and crossover at specific rates mr ∈ [0, 1] and cr ∈ [0, 1],
respectively.
^37 http://en.wikipedia.org/wiki/Recombination [accessed 2007-07-03]
Algorithm 28.19: pop ←− reproducePopEA(mate, ps, mr, cr)
Input: mate: the mating pool
Input: [implicit] mps: the mating pool size
Input: ps: the number of individuals in the new population
Input: mr: the mutation rate
Input: cr: the crossover rate
Input: [implicit] mutate, recombine: the mutation and crossover operators
Data: i, j: counter variables
Output: pop: the new population (len(pop) = ps)
1  begin
2    pop ←− ()
3    for i ←− 0 up to ps−1 do
4      p ←− duplicate(mate[i mod mps])
5      if (randomUni[0, 1) < cr) ∧ (mps > 1) then
6        repeat
7          j ←− ⌊randomUni[0, mps)⌋
8        until j ≠ (i mod mps)
9        p.g ←− recombine(p.g, mate[j].g)
10     if randomUni[0, 1) < mr then p.g ←− mutate(p.g)
11     pop ←− addItem(pop, p)
12   return pop
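A Python sketch of this reproduction cycle could look as follows. For simplicity, the mating pool is assumed to hold genotypes directly rather than individual records, and the operators mutate and recombine are problem-specific callables supplied by the caller; everything here is an assumption of the illustration.

```python
import random

def reproduce_pop_ea(mate, ps, mr, cr, mutate, recombine):
    """Create ps offspring from the mating pool (cf. Algorithm 28.19)."""
    mps = len(mate)
    pop = []
    for i in range(ps):
        genome = mate[i % mps]                 # a real implementation would copy here
        if random.random() < cr and mps > 1:   # crossover at rate cr
            j = i % mps
            while j == i % mps:                # pick a different second parent
                j = random.randrange(mps)
            genome = recombine(genome, mate[j])
        if random.random() < mr:               # mutation at rate mr
            genome = mutate(genome)
        pop.append(genome)
    return pop
```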
28.6 Maintaining a Set of Non-Dominated/Non-Prevailed Individuals
Most multi-objective optimization algorithms return a set X of best solutions x discovered instead of a single individual. Many optimization techniques also internally keep track of the set of best candidate solutions encountered during the search process. In Simulated Annealing, for instance, it is quite possible to discover an optimal element x⋆ and subsequently depart from it to a local optimum x⋆_l. Therefore, optimizers normally carry a list of the non-prevailed candidate solutions ever visited with them.
In scenarios where the search space G differs from the problem space X, it often makes more sense to store the list of individual records P instead of just keeping a set X of the best phenotypes x. Since the elements of the search space are no longer required at the end of the optimization process, we define the simple operation “extractPhenotypes” which extracts them from a set of individuals P.

∀x ∈ extractPhenotypes(P) ⇒ ∃p ∈ P : x = p.x    (28.53)
28.6.1 Updating the Optimal Set
Whenever a new individual p is created, the set P holding the best individuals known so far
may change. It is possible that the new candidate solution must be included in the optimal
set or even prevails some of the phenotypes already contained therein which then must be
removed.
Definition D28.16 (updateOptimalSet). The function “updateOptimalSet” updates a set of elements P_old with the new individual p_new. It uses knowledge of the prevalence relation ≺ and the corresponding comparator function cmp_f.

P_new = updateOptimalSet(P_old, p_new, cmp_f),  P_old, P_new ⊆ G × X, p_new ∈ G × X :
∀p_1 ∈ P_old ∄p_2 ∈ P_old : p_2.x ≺ p_1.x ⇒ P_new ⊆ P_old ∪ {p_new} ∧ ∀p_1 ∈ P_new ∄p_2 ∈ P_new : p_2.x ≺ p_1.x    (28.54)
We define two equivalent approaches in Algorithm 28.20 and Algorithm 28.21 which perform the necessary operations. Algorithm 28.20 creates a new, empty optimal set and successively inserts non-prevailed elements whereas Algorithm 28.21 removes all elements which are prevailed by the new individual p_new from the old set P_old.
Algorithm 28.20: P_new ←− updateOptimalSet(P_old, p_new, cmp_f)
Input: P_old: a set of non-prevailed individuals as known before the creation of p_new
Input: p_new: a new individual to be checked
Input: cmp_f: the comparator function representing the dominance/prevalence relation
Output: P_new: the set updated with the knowledge of p_new
1  begin
2    P_new ←− ∅
3    foreach p_old ∈ P_old do
4      if ¬(p_new.x ≺ p_old.x) then
5        P_new ←− P_new ∪ {p_old}
6        if p_old.x ≺ p_new.x then return P_old
7    return P_new ∪ {p_new}
Algorithm 28.21: P_new ←− updateOptimalSet(P_old, p_new, cmp_f) (2nd version)
Input: P_old: a set of non-prevailed individuals as known before the creation of p_new
Input: p_new: a new individual to be checked
Input: cmp_f: the comparator function representing the dominance/prevalence relation
Output: P_new: the set updated with the knowledge of p_new
1  begin
2    P_new ←− P_old
3    foreach p_old ∈ P_old do
4      if p_new.x ≺ p_old.x then
5        P_new ←− P_new \ {p_old}
6      else if p_old.x ≺ p_new.x then
7        return P_old
8    return P_new ∪ {p_new}
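Both variants boil down to the same archive update, which can be sketched compactly in Python. Here, prevails(a, b) stands for the prevalence/dominance test a.x ≺ b.x and is assumed to be supplied by the caller; the sketch uses freely chosen names.

```python
def update_optimal_set(archive, p_new, prevails):
    """Update a set of non-prevailed individuals with p_new
    (cf. Algorithm 28.20 and Algorithm 28.21)."""
    survivors = []
    for p_old in archive:
        if prevails(p_old, p_new):
            return archive            # p_new is prevailed: archive stays unchanged
        if not prevails(p_new, p_old):
            survivors.append(p_old)   # p_old is not prevailed by p_new: keep it
    survivors.append(p_new)
    return survivors
```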
Especially in the case of Evolutionary Algorithms, not a single new element is created
in each generation but a set P. Let us define the operation “updateOptimalSetN” for this
purpose. This operation can easily be realized by iteratively applying “updateOptimalSet”,
as shown in Algorithm 28.22.
28.6.2 Obtaining Non-Prevailed Elements
The function “updateOptimalSet” helps an optimizer to build and maintain a list of optimal individuals.
Algorithm 28.22: P_new ←− updateOptimalSetN(P_old, P, cmp_f)
Input: P_old: the old set of non-prevailed individuals
Input: P: the set of new individuals to be checked for non-prevailedness
Input: cmp_f: the comparator function representing the dominance/prevalence relation
Data: p: an individual from P
Output: P_new: the updated set
1  begin
2    P_new ←− P_old
3    foreach p ∈ P do P_new ←− updateOptimalSet(P_new, p, cmp_f)
4    return P_new
When the optimization process finishes, “extractPhenotypes” can then be used to obtain the non-prevailed elements of the problem space and return them to the user. However, not all optimization methods maintain a non-prevailed set all the time. When they terminate, they have to extract all optimal elements from the set of individuals pop currently known.
Definition D28.17 (extractBest). The function “extractBest” extracts a set P of non-prevailed (non-dominated) individuals from any given set of individuals pop according to the comparator cmp_f.

∀P ⊆ pop ⊆ G × X, P = extractBest(pop, cmp_f) ⇒ ∀p_1 ∈ P ∄p_2 ∈ pop : p_2.x ≺ p_1.x    (28.55)
Algorithm 28.23 defines one possible realization of “extractBest”. By the way, this approach could also be used for updating an optimal set:

updateOptimalSet(P_old, p_new, cmp_f) ≡ extractBest(P_old ∪ {p_new}, cmp_f)    (28.56)
Algorithm 28.23: P ←− extractBest(pop, cmp_f)
Input: pop: the list to extract the non-prevailed individuals from
Data: p_any, p_chk: candidate solutions tested for supremacy
Data: i, j: counter variables
Output: P: the non-prevailed subset extracted from pop
1  begin
2    P ←− pop
3    for i ←− len(P)−1 down to 1 do
4      for j ←− i−1 down to 0 do
5        if P[i] ≺ P[j] then
6          P ←− deleteItem(P, j)
7          i ←− i − 1
8        else if P[j] ≺ P[i] then
9          P ←− deleteItem(P, i)
10   return listToSet(P)
28.6.3 Pruning the Optimal Set
In some optimization problems, there may be very many, if not infinitely many, optimal individuals. The set X of the best discovered candidate solutions computed by the optimization algorithms, however, cannot grow infinitely because we only have limited memory. The same holds for the archives archive of non-dominated individuals maintained by several MOEAs. Therefore, we need to perform an action called pruning which reduces the size of the set of non-prevailed individuals to a given limit k [605, 1959, 2645].
There exists a variety of possible pruning operations. Morse [1959] and Taboada and Coit [2644], for instance, suggest to use clustering algorithms^38 for this purpose. In principle, also any combination of the fitness assignment and selection schemes discussed in Chapter 28 would do. It is very important that the loss of generality during a pruning operation is minimized. Fieldsend et al. [912], for instance, point out that if the extreme elements of the optimal frontier are lost, the resulting set may only represent a small fraction of the optima that could have been found without pruning. Instead of working on the set X of best discovered candidate solutions x, we again base our approach on a set of individuals P and we define:
Definition D28.18 (pruneOptimalSet). The pruning operation “pruneOptimalSet” reduces the size of a set P_old of individuals to fit to a given upper boundary k.

∀P_new ⊆ P_old ⊆ G × X, k ∈ N_1 : P_new = pruneOptimalSet(P_old, k) ⇒ |P_new| ≤ k    (28.57)
28.6.3.1 Pruning via Clustering
Algorithm 28.24 uses clustering to provide the functionality specified in this definition and
thereby realizes the idea of Morse [1959] and Taboada and Coit [2644]. Basically, any given
clustering algorithm could be used as replacement for “cluster” – see ?? on page ?? for more
information on clustering.
Algorithm 28.24: P_new ←− pruneOptimalSet_c(P_old, k)
Input: P_old: the set to be pruned
Input: k: the maximum size allowed for the set (k > 0)
Input: [implicit] cluster: the clustering algorithm to be used
Input: [implicit] nucleus: the function used to determine the nuclei of the clusters
Data: B: the set of clusters obtained by the clustering algorithm
Data: b: a single cluster b ∈ B
Output: P_new: the pruned set
1  begin
     // obtain k clusters
2    B ←− cluster(P_old)
3    P_new ←− ∅
4    foreach b ∈ B do P_new ←− P_new ∪ nucleus(b)
5    return P_new
^38 You can find a discussion of clustering algorithms in ??.
28.6.3.2 Adaptive Grid Archiving
Let us discuss the adaptive grid archiving algorithm (AGA) as an example for a more sophisticated approach to prune the set of non-dominated / non-prevailed individuals. AGA has been introduced for the Evolutionary Algorithm PAES^39 by Knowles and Corne [1552] and uses the objective values (computed by the set of objective functions f) directly. Hence, it can treat the individuals as |f|-dimensional vectors where each dimension corresponds to one objective function f ∈ f. This |f|-dimensional objective space Y is divided into a grid with d divisions in each dimension. Its span in each dimension is defined by the corresponding minimum and maximum objective values. The individuals with the minimum/maximum values are always preserved. This circumvents the phenomenon of narrowing down the optimal set described by Fieldsend et al. [912] and distinguishes the AGA approach from clustering-based methods. Hence, it is not possible to define maximum set sizes k which are smaller than 2|f|. If individuals need to be removed from the set because it became too large, the AGA approach removes those that reside in the regions which are the most crowded.
The original sources outline the algorithm basically with textual descriptions and definitions. Here, we introduce a more or less trivial specification in Algorithm 28.25 on the next page and Algorithm 28.26 on page 314.
The function “divideAGA” is internally used to perform the grid division. It transforms the objective values of each individual to grid coordinates stored in the array lst. Furthermore, “divideAGA” also counts the number of individuals that reside in the same coordinates for each individual and makes this number available in cnt. It ensures the preservation of border individuals by assigning a negative cnt value to them. This basically disables their later disposal by the pruning algorithm “pruneOptimalSetAGA”, since “pruneOptimalSetAGA” deletes the individuals from the set P_old that have the largest cnt values first.
^39 PAES is discussed in ?? on page ??
Algorithm 28.25: (P_l, lst, cnt) ←− divideAGA(P_old, d)
Input: P_old: the set to be pruned
Input: d: the number of divisions to be performed per dimension
Input: [implicit] f: the set of objective functions
Data: i, j: counter variables
Data: mini, maxi, mul: temporary stores
Output: (P_l, lst, cnt): a tuple containing the list representation P_l of P_old, a list lst assigning grid coordinates to the elements of P_l, and a list cnt containing the number of elements in those grid locations
1  begin
2    mini ←− createList(|f|, ∞)
3    maxi ←− createList(|f|, −∞)
4    for i ←− |f| down to 1 do
5      mini[i−1] ←− min{f_i(p.x) ∀p ∈ P_old}
6      maxi[i−1] ←− max{f_i(p.x) ∀p ∈ P_old}
7    mul ←− createList(|f|, 0)
8    for i ←− |f|−1 down to 0 do
9      if maxi[i] ≠ mini[i] then
10       mul[i] ←− d / (maxi[i] − mini[i])
11     else
12       maxi[i] ←− maxi[i] + 1
13       mini[i] ←− mini[i] − 1
14   P_l ←− setToList(P_old)
15   lst ←− createList(len(P_l), ∅)
16   cnt ←− createList(len(P_l), 0)
17   for i ←− len(P_l)−1 down to 0 do
18     lst[i] ←− createList(|f|, 0)
19     for j ←− 1 up to |f| do
20       if (f_j(P_l[i]) ≤ mini[j−1]) ∨ (f_j(P_l[i]) ≥ maxi[j−1]) then
21         cnt[i] ←− cnt[i] − 2
22       lst[i][j−1] ←− ⌊(f_j(P_l[i]) − mini[j−1]) ∗ mul[j−1]⌋
23     if cnt[i] > 0 then
24       for j ←− i+1 up to len(P_l)−1 do
25         if lst[i] = lst[j] then
26           cnt[i] ←− cnt[i] + 1
27           if cnt[j] > 0 then cnt[j] ←− cnt[j] + 1
28   return (P_l, lst, cnt)
Algorithm 28.26: P_new ←− pruneOptimalSetAGA(P_old, d, k)
Input: P_old: the set to be pruned
Input: d: the number of divisions to be performed per dimension
Input: k: the maximum size allowed for the set (k ≥ 2|f|)
Input: [implicit] f: the set of objective functions
Data: i: a counter variable
Data: P_l: the list representation of P_old
Data: lst: a list assigning grid coordinates to the elements of P_l
Data: cnt: the number of elements in the grid locations defined in lst
Output: P_new: the pruned set
1  begin
2    if len(P_old) ≤ k then return P_old
3    (P_l, lst, cnt) ←− divideAGA(P_old, d)
4    while len(P_l) > k do
5      idx ←− 0
6      for i ←− len(P_l)−1 down to 1 do
7        if cnt[i] > cnt[idx] then idx ←− i
8      for i ←− len(P_l)−1 down to 0 do
9        if (lst[i] = lst[idx]) ∧ (cnt[i] > 0) then cnt[i] ←− cnt[i] − 1
10     P_l ←− deleteItem(P_l, idx)
11     cnt ←− deleteItem(cnt, idx)
12     lst ←− deleteItem(lst, idx)
13   return listToSet(P_l)
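The core of the grid division performed by “divideAGA”, namely mapping objective vectors to grid cells, can be sketched in a few lines of Python. The snippet below is an illustration only: it omits the border-preservation bookkeeping and the crowding counters, and all names are freely chosen.

```python
def grid_coordinates(objective_vectors, d):
    """Map each |f|-dimensional objective vector to a tuple of grid coordinates
    with d divisions per dimension (cf. Algorithm 28.25, simplified)."""
    dims = len(objective_vectors[0])
    mins = [min(vec[i] for vec in objective_vectors) for i in range(dims)]
    maxs = [max(vec[i] for vec in objective_vectors) for i in range(dims)]
    coords = []
    for vec in objective_vectors:
        cell = []
        for i in range(dims):
            span = maxs[i] - mins[i]
            if span == 0:                      # degenerate dimension: single cell
                cell.append(0)
            else:
                cell.append(min(int((vec[i] - mins[i]) * d / span), d - 1))
        coords.append(tuple(cell))
    return coords

# Cells containing many coordinates are the crowded regions from which
# individuals would be removed when the archive exceeds its capacity.
```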
28.7 General Information on Evolutionary Algorithms
28.7.1 Applications and Examples
Table 28.4: Applications and Examples of Evolutionary Algorithms.
Area References
Art [1174, 2320, 2445]; see also Tables 29.1 and 31.1
Astronomy [480]
Chemistry [587, 678, 2676]; see also Tables 29.1, 30.1, 31.1, and 32.1
Combinatorial Problems [308, 354, 360, 371, 400, 444, 445, 451, 457, 491, 564, 567,
619, 644, 649, 690, 802, 807, 808, 810, 812, 820, 1011–1013,
1117, 1145, 1148, 1149, 1177, 1290, 1311, 1417, 1429, 1437,
1486, 1532, 1618, 1653, 1696, 1730, 1803, 1822–1825, 1830,
1835, 1859, 1871, 1884, 1961, 2025, 2055, 2105, 2144, 2162,
2188, 2189, 2245, 2335, 2448, 2499, 2535, 2633, 2663, 2669,
2772, 2811, 2862, 3027, 3064, 3071, 3101]; see also Tables
29.1, 30.1, 31.1, 33.1, 34.1, and 35.1
Computer Graphics [1452, 1577, 1661, 2096, 2487, 2488, 2651]; see also Tables
29.1, 30.1, 31.1, and 33.1
Control [1710]; see also Tables 29.1, 31.1, 32.1, and 35.1
Cooperation and Teamwork [1996, 2424]; see also Tables 29.1 and 31.1
Data Compression see Table 31.1
Data Mining [144, 360, 480, 1048, 1746, 2213, 2503, 2644, 2645, 2676,
3018]; see also Tables 29.1, 31.1, 32.1, 34.1, and 35.1
Databases [371, 567, 1145, 1696, 1822, 1824, 1825, 2633, 3064, 3071];
see also Tables 29.1, 31.1, and 35.1
Decision Making see Table 31.1
Digital Technology [1479]; see also Tables 29.1, 30.1, 31.1, 34.1, and 35.1
Distributed Algorithms and Sys-
tems
[89, 112, 125, 156, 202, 328, 352–354, 404, 439, 545, 605,
663, 669, 702, 705, 786, 811, 843, 871, 882, 963, 1006, 1007,
1094, 1225, 1292, 1298, 1532, 1574, 1633, 1718, 1733, 1749,
1800, 1835, 1838, 1863, 1883, 1995, 2020, 2058, 2071, 2100,
2387, 2425, 2440, 2448, 2470, 2493, 2628, 2644, 2645, 2663,
2694, 2791, 2862, 2864, 2893, 2903, 3027, 3061, 3085]; see
also Tables 29.1, 30.1, 31.1, 32.1, 33.1, 34.1, and 35.1
E-Learning [1158, 1298, 2628, 2763]; see also Table 29.1
Economics and Finances [549, 585, 1718, 1825, 2622]; see also Tables 29.1, 31.1, 33.1,
and 35.1
Engineering [112, 156, 267, 517, 535, 567, 579, 605, 644, 669, 677, 777,
786, 787, 853, 871, 882, 953, 1094, 1225, 1310, 1323, 1479,
1486, 1508, 1574, 1710, 1717, 1733, 1746, 1749, 1830, 1835,
1838, 1883, 2058, 2059, 2083, 2111, 2144, 2189, 2320, 2364,
2425, 2448, 2468, 2499, 2535, 2644, 2645, 2663, 2677, 2693,
2694, 2743, 2799, 2858, 2861, 2862, 2864]; see also Tables
29.1, 30.1, 31.1, 32.1, 33.1, 34.1, and 35.1
Function Optimization [742, 795, 853, 855, 1018, 1523, 1727, 1927, 1986, 2099, 2210,
2799]; see also Tables 29.1, 30.1, 32.1, 33.1, 34.1, and 35.1
Games [2350]; see also Tables 29.1, 31.1, and 32.1
Graph Theory [39, 63, 67, 89, 112, 202, 328, 329, 353, 354, 439, 644, 669,
786, 804, 843, 882, 963, 1007, 1094, 1225, 1290, 1486, 1532,
1574, 1733, 1749, 1800, 1835, 1863, 1883, 1934, 1995, 2020,
2025, 2058, 2100, 2144, 2189, 2237, 2387, 2425, 2440, 2448,
2470, 2493, 2663, 2677, 2694, 2772, 2791, 2864, 3027, 3085];
see also Tables 29.1, 30.1, 31.1, 33.1, 34.1, and 35.1
Healthcare [567, 2021, 2537]; see also Tables 29.1, 31.1, and 35.1
Image Synthesis [291, 777, 2445, 2651]
Logistics [354, 444, 445, 451, 517, 579, 619, 802, 808, 810, 812, 820,
1011, 1013, 1148, 1149, 1177, 1429, 1653, 1730, 1803, 1823,
1830, 1859, 1869, 1884, 1961, 2059, 2111, 2162, 2188, 2189,
2245, 2669, 2811, 2912]; see also Tables 29.1, 30.1, 33.1, and
34.1
Mathematics [376, 377, 550, 1217, 1661, 2350, 2407, 2677]; see also Tables
29.1, 30.1, 31.1, 32.1, 33.1, 34.1, and 35.1
Military and Defense [439, 1661, 1872]; see also Table 34.1
Motor Control [489]; see also Tables 33.1 and 34.1
Multi-Agent Systems [967, 1661, 2144, 2268, 2693, 2766]; see also Tables 29.1, 31.1,
and 35.1
Multiplayer Games see Table 31.1
Physics [407, 678]; see also Tables 29.1, 31.1, 32.1, and 34.1
Prediction [144, 1045]; see also Tables 29.1, 30.1, 31.1, and 35.1
Security [404, 540, 871, 1486, 1507, 1508, 1661, 1749]; see also Tables
29.1, 31.1, and 35.1
Software [703, 842, 1486, 1508, 1791, 1934, 1995, 2213, 2677, 2863,
2887]; see also Tables 29.1, 30.1, 31.1, 33.1, and 34.1
Sorting [1233]; see also Table 31.1
Telecommunication [820]
Testing [2863]; see also Table 31.3
Theorem Proving and Automatic Verification
see Table 29.1
Water Supply and Management [1268, 1986]
Wireless Communication [39, 63, 67, 68, 155, 644, 787, 1290, 1486, 1717, 1934, 2144,
2189, 2237, 2364, 2677]; see also Tables 29.1, 31.1, 33.1, 34.1,
and 35.1
28.7.2 Books
1. Handbook of Evolutionary Computation [171]
2. Variants of Evolutionary Algorithms for Real-World Applications [567]
3. New Ideas in Optimization [638]
4. Multiobjective Problem Solving from Nature – From Concepts to Applications [1554]
5. Natural Intelligence for Scheduling, Planning and Packing Problems [564]
6. Advances in Evolutionary Computing – Theory and Applications [1049]
7. Bio-inspired Algorithms for the Vehicle Routing Problem [2162]
8. Evolutionary Computation 1: Basic Algorithms and Operators [174]
9. Nature-Inspired Algorithms for Optimisation [562]
10. Telecommunications Optimization: Heuristic and Adaptive Techniques [640]
11. Hybrid Evolutionary Algorithms [1141]
12. Success in Evolutionary Computation [3014]
13. Computational Intelligence in Telecommunications Networks [2144]
14. Evolutionary Design by Computers [272]
15. Evolutionary Computation in Economics and Finance [549]
16. Parameter Setting in Evolutionary Algorithms [1753]
17. Evolutionary Computation in Dynamic and Uncertain Environments [3019]
18. Applications of Multi-Objective Evolutionary Algorithms [597]
19. Introduction to Evolutionary Computing [865]
20. Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery: Implica-
tions in Business, Science and Engineering [563]
21. Evolutionary Computation [847]
22. Evolutionary Computation [863]
23. Evolutionary Multiobjective Optimization – Theoretical Advances and Applications [8]
24. Evolutionary Computation 2: Advanced Algorithms and Operators [175]
25. Evolutionary Algorithms for Solving Multi-Objective Problems [599]
26. Genetic and Evolutionary Computation for Image Processing and Analysis [466]
27. Evolutionary Computation: The Fossil Record [939]
28. Swarm Intelligence: Collective, Adaptive [1524]
29. Parallel Evolutionary Computations [2015]
30. Evolutionary Computation in Practice [3043]
31. Design by Evolution: Advances in Evolutionary Design [1234]
32. New Learning Paradigms in Soft Computing [1430]
33. Designing Evolutionary Algorithms for Dynamic Environments [1956]
34. How to Solve It: Modern Heuristics [1887]
35. Electromagnetic Optimization by Genetic Algorithms [2250]
36. Advances in Metaheuristics for Hard Optimization [2485]
37. Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems [2685]
38. The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music [2320]
39. Evolutionary Optimization [2394]
40. Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis
and Design [460]
41. Multi-Objective Optimization in Computational Intelligence: Theory and Practice [438]
42. Computer Simulation in Genetics [659]
43. Evolutionary Computation: A Unified Approach [715]
44. Genetic Algorithms in Search, Optimization, and Machine Learning [1075]
45. Evolutionary Computation in Data Mining [1048]
46. Introduction to Stochastic Search and Optimization [2561]
47. Advances in Evolutionary Algorithms [1581]
48. Self-Adaptive Heuristics for Evolutionary Computation [1617]
49. Representations for Genetic and Evolutionary Algorithms [2338]
50. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Pro-
gramming, Genetic Algorithms [167]
51. Search Methodologies – Introductory Tutorials in Optimization and Decision Support Tech-
niques [453]
52. Data Mining and Knowledge Discovery with Evolutionary Algorithms [994]
53. Evolutionäre Algorithmen [2879]
54. Advances in Evolutionary Algorithms – Theory, Design and Practice [41]
55. Evolutionary Algorithms in Molecular Design [587]
56. Multi-Objective Optimization Using Evolutionary Algorithms [740]
57. Evolutionary Dynamics in Time-Dependent Environments [2941]
58. Evolutionary Computation: Theory and Applications [3025]
59. Evolutionary Computation [2987]
28.7.3 Conferences and Workshops
Table 28.5: Conferences and Workshops on Evolutionary Algorithms.
GECCO: Genetic and Evolutionary Computation Conference
History: 2011/07: Dublin, Ireland, see [841]
2010/07: Portland, OR, USA, see [30, 31]
2009/07: Montréal, QC, Canada, see [2342, 2343]
2008/07: Atlanta, GA, USA, see [585, 1519, 1872, 2268, 2537]
2007/07: London, UK, see [905, 2579, 2699, 2700]
2006/07: Seattle, WA, USA, see [1516, 2441]
2005/06: Washington, DC, USA, see [27, 304, 2337, 2339]
2004/06: Seattle, WA, USA, see [751, 752, 1515, 2079, 2280]
2003/07: Chicago, IL, USA, see [484, 485, 2578]
2002/07: New York, NY, USA, see [224, 478, 1675, 1790, 2077]
2001/07: San Francisco, CA, USA, see [1104, 2570]
2000/07: Las Vegas, NV, USA, see [1452, 2155, 2928, 2935, 2986]
1999/07: Orlando, FL, USA, see [211, 482, 2093, 2502]
CEC: IEEE Conference on Evolutionary Computation
History: 2011/06: New Orleans, LA, USA, see [2027]
2010/07: Barcelona, Catalonia, Spain, see [1354]
2009/05: Trondheim, Norway, see [1350]
2008/06: Hong Kong (Xiānggǎng), China, see [1889]
2007/09: Singapore, see [1343]
2006/07: Vancouver, BC, Canada, see [3033]
2005/09: Edinburgh, Scotland, UK, see [641]
2004/06: Portland, OR, USA, see [1369]
2003/12: Canberra, Australia, see [2395]
2002/05: Honolulu, HI, USA, see [944]
2001/05: Gangnam-gu, Seoul, Korea, see [1334]
2000/07: La Jolla, CA, USA, see [1333]
1999/07: Washington, DC, USA, see [110]
1998/05: Anchorage, AK, USA, see [2496]
1997/04: Indianapolis, IN, USA, see [173]
1996/05: Nagoya, Japan, see [1445]
1995/11: Perth, WA, Australia, see [1361]
1994/06: Orlando, FL, USA, see [1891]
EvoWorkshops: Real-World Applications of Evolutionary Computing
History: 2011/04: Torino, Italy, see [788]
2009/04: Tübingen, Germany, see [1052]
2008/05: Naples, Italy, see [1051]
2007/04: València, Spain, see [1050]
2006/04: Budapest, Hungary, see [2341]
2005/03: Lausanne, Switzerland, see [2340]
2004/04: Coimbra, Portugal, see [2254]
2003/04: Colchester, Essex, UK, see [2253]
2002/04: Kinsale, Ireland, see [465]
2001/04: Lake Como, Milan, Italy, see [343]
2000/04: Edinburgh, Scotland, UK, see [464]
1999/05: Göteborg, Sweden, see [2197]
1998/04: Paris, France, see [1310]
PPSN: Conference on Parallel Problem Solving from Nature
History: 2012/09: Taormina, Italy, see [2675]
2010/09: Kraków, Poland, see [2412, 2413]
2008/09: Birmingham, UK and Dortmund, North Rhine-Westphalia, Germany, see
[2354, 3028]
2006/09: Reykjavik, Iceland, see [2355]
2002/09: Granada, Spain, see [1870]
2000/09: Paris, France, see [2421]
1998/09: Amsterdam, The Netherlands, see [866]
1996/09: Berlin, Germany, see [2818]
1994/10: Jerusalem, Israel, see [693]
1992/09: Brussels, Belgium, see [1827]
1990/10: Dortmund, North Rhine-Westphalia, Germany, see [2438]
WCCI: IEEE World Congress on Computational Intelligence
History: 2012/06: Brisbane, QLD, Australia, see [1411]
2010/07: Barcelona, Catalonia, Spain, see [1356]
2008/06: Hong Kong (Xiānggǎng), China, see [3104]
2006/07: Vancouver, BC, Canada, see [1341]
2002/05: Honolulu, HI, USA, see [1338]
1998/05: Anchorage, AK, USA, see [1330]
1994/06: Orlando, FL, USA, see [1324]
EMO: International Conference on Evolutionary Multi-Criterion Optimization
History: 2011/04: Ouro Preto, MG, Brazil, see [2654]
2009/04: Nantes, France, see [862]
2007/03: Matsushima, Sendai, Japan, see [2061]
2005/03: Guanajuato, México, see [600]
2003/04: Faro, Portugal, see [957]
2001/03: Zürich, Switzerland, see [3099]
AE/EA: Artificial Evolution
History: 2011/10: Angers, France, see [2586]
2009/10: Strasbourg, France, see [608]
2007/10: Tours, France, see [1928]
2005/10: Lille, France, see [2657]
2003/10: Marseilles, France, see [1736]
2001/10: Le Creusot, France, see [606]
1999/11: Dunkerque, France, see [948]
1997/10: Nîmes, France, see [1191]
1995/09: Brest, France, see [75]
1994/09: Toulouse, France, see [74]
HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2 on page 128.
ICANNGA: International Conference on Adaptive and Natural Computing Algorithms
History: 2011/04: Ljubljana, Slovenia, see [2589]
2009/04: Kuopio, Finland, see [1564]
2007/04: Warsaw, Poland, see [257, 258]
2005/03: Coimbra, Portugal, see [2297]
2003/04: Roanne, France, see [2142]
2001/04: Prague, Czech Republic, see [1641]
1999/04: Portorož, Slovenia, see [801]
1997/04: Vienna, Austria, see [2527]
1995/04: Alès, France, see [2141]
1993/04: Innsbruck, Austria, see [69]
ICCI: IEEE International Conference on Cognitive Informatics
See Table 10.2 on page 128.
ICNC: International Conference on Advances in Natural Computation
History: 2012/05: Chóngqìng, China, see [574]
2011/07: Shànghǎi, China, see [1379]
2010/08: Yāntái, Shāndōng, China, see [1355]
2009/08: Tiānjīn, China, see [2848]
2008/10: Jì’nán, Shāndōng, China, see [1150]
2007/08: Hǎikǒu, Hǎinán, China, see [1711]
2006/09: Xī’ān, Shǎnxī, China, see [1443, 1444]
2005/08: Chángshā, Húnán, China, see [2850–2852]
SEAL: International Conference on Simulated Evolution and Learning
See Table 10.2 on page 130.
BIOMA: International Conference on Bioinspired Optimization Methods and their Applications
History: 2012/05: Bohinj, Slovenia, see [349]
2010/05: Ljubljana, Slovenia, see [921]
2008/10: Ljubljana, Slovenia, see [920]
2006/10: Ljubljana, Slovenia, see [919]
2004/10: Ljubljana, Slovenia, see [918]
BIONETICS: International Conference on Bio-Inspired Models of Network, Information, and Computing Systems
See Table 10.2 on page 130.
FEA: International Workshop on Frontiers in Evolutionary Algorithms
History: 2005/07: Salt Lake City, UT, USA, see [2379]
2003/09: Cary, NC, USA, see [546]
2002/03: Research Triangle Park, NC, USA, see [508]
2000/02: Atlantic City, NJ, USA, see [2853]
1998/10: Research Triangle Park, NC, USA, see [2293]
1997/03: Research Triangle Park, NC, USA, see [2475]
ICARIS: International Conference on Artificial Immune Systems
See Table 10.2 on page 130.
MIC: Metaheuristics International Conference
See Table 25.2 on page 227.
MICAI: Mexican International Conference on Artificial Intelligence
See Table 10.2 on page 131.
NICSO: International Workshop on Nature Inspired Cooperative Strategies for Optimization
See Table 10.2 on page 131.
SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2 on page 131.
WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications)
See Table 10.2 on page 131.
CIMCA: International Conference on Computational Intelligence for Modelling, Control and Automation
History: 2006/11: Sydney, NSW, Australia, see [1923]
2005/11: Vienna, Austria, see [1370]
CIS: International Conference on Computational Intelligence and Security
History: 2009/12: Běijīng, China, see [1393]
2008/12: Sūzhōu, Jiāngsū, China, see [1392]
2007/12: Harbin, Heilongjiang, China, see [1390]
2006/11: Guǎngzhōu, Guǎngdōng, China, see [557]
2005/12: Xī’ān, Shǎnxī, China, see [1192, 1193]
CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology
See Table 10.2 on page 132.
EUFIT: European Congress on Intelligent Techniques and Soft Computing
See Table 10.2 on page 132.
EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.
EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization
History: 2009/04: Tübingen, Germany, see [645]
2008/03: Naples, Italy, see [2775]
2007/04: València, Spain, see [646]
2006/04: Budapest, Hungary, see [1116]
2005/03: Lausanne, Switzerland, see [2252]
2004/04: Coimbra, Portugal, see [1115]
GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation
History: 2009/06: Shànghǎi, China, see [3000]
GEM: International Conference on Genetic and Evolutionary Methods
History: 2011/07: Las Vegas, NV, USA, see [2552]
2010/07: Las Vegas, NV, USA, see [119]
2009/07: Las Vegas, NV, USA, see [121]
2008/06: Las Vegas, NV, USA, see [120]
2007/06: Las Vegas, NV, USA, see [122]
GEWS: Grammatical Evolution Workshop
See Table 31.2 on page 385.
LION: Learning and Intelligent OptimizatioN
See Table 10.2 on page 133.
LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2 on page 133.
SoCPaR: International Conference on SOft Computing and PAttern Recognition
See Table 10.2 on page 133.
Progress in Evolutionary Computation: Workshops on Evolutionary Computation
History: 1993–1994: Australia, see [3023]
WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2 on page 133.
AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems
History: 2001/11: Dunedin, New Zealand, see [1498]
ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2 on page 133.
ASC: IASTED International Conference on Artificial Intelligence and Soft Computing
See Table 10.2 on page 134.
EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation
History: 2011/04: Torino, Italy, see [2588]
FOCI: IEEE Symposium on Foundations of Computational Intelligence
History: 2007/04: Honolulu, HI, USA, see [1865]
GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and
Computational Algebraic Geometry
See Table 10.2 on page 134.
IJCCI: International Conference on Computational Intelligence
History: 2010/10: València, Spain, see [914]
2009/10: Madeira, Portugal, see [1809]
IWNICA: International Workshop on Nature Inspired Computation and Applications
History: 2011/04: Héféi, Ānhuī, China, see [2754]
2010/10: Héféi, Ānhuī, China, see [2758]
2008/05: Héféi, Ānhuī, China, see [2757]
2006/10: Héféi, Ānhuī, China, see [1122]
2004/10: Héféi, Ānhuī, China, see [2755, 2756]
Krajowa Konferencja Algorytmy Ewolucyjne i Optymalizacja Globalna (Polish National Conference on Evolutionary Algorithms and Global Optimization)
See Table 10.2 on page 134.
NaBIC: World Congress on Nature and Biologically Inspired Computing
History: 2011/11: Salamanca, Spain, see [2368]
2010/12: Kitakyushu, Japan, see [1545]
2009/12: Coimbatore, India, see [12]
WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms
History: 2010/09: Vienna, Austria, see [2311]
2007/07: London, UK, see [905]
2005/06: Oslo, Norway, see [479]
EvoRobots: International Symposium on Evolutionary Robotics
History: 2001/10: Tokyo, Japan, see [1099]
ICEC: International Conference on Evolutionary Computation
History: 2010/10: València, Spain, see [1276]
2009/10: Madeira, Portugal, see [2325]
IWACI: International Workshop on Advanced Computational Intelligence
History: 2010/08: Sūzhōu, Jiāngsū, China, see [2995]
2009/10: Mexico City, México, see [1875]
2008/06: Macau, China, see [2752]
META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.
SEA: International Symposium on Experimental Algorithms
See Table 10.2 on page 135.
SEMCCO: International Conference on Swarm, Evolutionary and Memetic Computing
History: 2010/12: Chennai, India, see [2595]
SSCI: IEEE Symposium Series on Computational Intelligence
A conference series which contains many other symposia and workshops.
History: 2011/04: Paris, France, see [1357]
2009/03: Nashville, TN, USA, see [1353]
2007/04: Honolulu, HI, USA, see [1345]
28.7.4 Journals
1. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer
Society
2. Evolutionary Computation published by MIT Press
3. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics published by
IEEE Systems, Man, and Cybernetics Society
4. International Journal of Computational Intelligence and Applications (IJCIA) published by
Imperial College Press Co. and World Scientific Publishing Co.
5. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and
Springer Netherlands
6. Natural Computing: An International Journal published by Kluwer Academic Publishers and
Springer Netherlands
7. Journal of Heuristics published by Springer Netherlands
8. Journal of Artificial Evolution and Applications published by Hindawi Publishing Corporation
9. International Journal of Computational Intelligence Research (IJCIR) published by Research
India Publications
10. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI) published by Idea Group Publishing (Idea Group Inc., IGI Global)
11. IEEE Computational Intelligence Magazine published by IEEE Computational Intelligence
Society
12. Journal of Optimization Theory and Applications published by Plenum Press and Springer
Netherlands
13. Computational Intelligence published by Wiley Periodicals, Inc.
14. International Journal of Swarm Intelligence and Evolutionary Computation (IJSIEC) published by Ashdin Publishing (AP)
Tasks T28
75. Implement a general steady-state Evolutionary Algorithm. You may use Listing 56.11
as a blueprint and create a new class derived from EABase (given in ?? on page ??) which
provides the steady-state functionality described in Paragraph 28.1.4.2.1 on page 266. A
minimal sketch of the steady-state loop is given after this task list.
[20 points]
76. Implement a general Evolutionary Algorithm with archive-based elitism. You may use
Listing 56.11 as a blueprint and create a new class derived from EABase (given in ?? on
page ??) which provides the archive-based elitism described in Section 28.1.4.4 on
page 267.
[20 points]
77. Implement the linear ranking selection scheme given in Section 28.4.5 on page 300
for your version of Evolutionary Algorithms. You may use Listing 56.14 as a blueprint and
implement the interface ISelectionAlgorithm given in Listing 55.13 in a new class. A
minimal sketch of linear ranking selection is given after this task list.
[20 points]
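The two sketches below illustrate the concepts behind Task 75 and Task 77 in plain, self-contained Java. They deliberately do not use the EABase and ISelectionAlgorithm interfaces referenced above; all class, method, and parameter names are illustrative assumptions, and fitness values are assumed to be subject to minimization.

import java.util.Random;
import java.util.function.ToDoubleFunction;

// Minimal steady-state EA on bit strings (assumption: minimization). In every
// step exactly one offspring is created; it replaces the currently worst
// individual if it is not worse. Not the book's EABase API.
public class SteadyStateEASketch {
  public static boolean[] optimize(int n, int popSize, int steps,
      ToDoubleFunction<boolean[]> f, Random rnd) {
    boolean[][] pop = new boolean[popSize][n];
    double[] fit = new double[popSize];
    for (int i = 0; i < popSize; i++) {            // random initial population
      for (int j = 0; j < n; j++) pop[i][j] = rnd.nextBoolean();
      fit[i] = f.applyAsDouble(pop[i]);
    }
    for (int t = 0; t < steps; t++) {
      int a = rnd.nextInt(popSize), b = rnd.nextInt(popSize);
      boolean[] parent = (fit[a] <= fit[b]) ? pop[a] : pop[b]; // binary tournament
      boolean[] child = parent.clone();
      child[rnd.nextInt(n)] ^= true;               // single bit-flip mutation
      double cf = f.applyAsDouble(child);
      int worst = 0;                               // find the replacement slot
      for (int i = 1; i < popSize; i++) if (fit[i] > fit[worst]) worst = i;
      if (cf <= fit[worst]) { pop[worst] = child; fit[worst] = cf; }
    }
    int best = 0;
    for (int i = 1; i < popSize; i++) if (fit[i] < fit[best]) best = i;
    return pop[best];
  }
}

The second sketch realizes linear ranking selection: the population is ordered from worst to best, the individual at rank r (with r = 0 denoting the worst of n individuals) is assigned the probability p(r) = (eta- + (eta+ - eta-) * r/(n-1)) / n with eta- = 2 - eta+, and the mating pool is then filled by roulette-wheel sampling over these rank-based probabilities.

import java.util.Arrays;
import java.util.Random;

// Linear ranking selection over fitness values to be minimized; returns the
// indices of the selected individuals. Names are illustrative only.
public class LinearRankingSketch {
  public static int[] select(double[] fitness, int count, double etaPlus, Random rnd) {
    int n = fitness.length;
    double etaMinus = 2.0d - etaPlus;              // keeps probabilities normalized
    Integer[] order = new Integer[n];              // indices sorted worst first
    for (int i = 0; i < n; i++) order[i] = i;
    Arrays.sort(order, (x, y) -> Double.compare(fitness[y], fitness[x]));

    double[] cum = new double[n];                  // cumulative rank probabilities
    double sum = 0.0d, denom = (n > 1) ? (n - 1) : 1;
    for (int r = 0; r < n; r++) {
      sum += (etaMinus + (etaPlus - etaMinus) * r / denom) / n;
      cum[r] = sum;
    }

    int[] chosen = new int[count];                 // roulette wheel over the ranks
    for (int k = 0; k < count; k++) {
      double u = rnd.nextDouble() * sum;
      int r = 0;
      while (r < n - 1 && cum[r] < u) r++;
      chosen[k] = order[r];
    }
    return chosen;
  }
}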
Chapter 29
Genetic Algorithms
29.1 Introduction
Genetic Algorithms (GAs) are a subclass of Evolutionary Algorithms where (a) the
genotypes g of the search space G are strings of primitive elements (usually all of the
same type) such as bits, integers, or real numbers, and (b) search operations such as
mutation and crossover directly modify these genotypes. Because of the simple struc-
ture of the search spaces of Genetic Algorithms, a non-identity mapping is often used
as genotype-phenotype mapping to translate the elements g ∈ G to candidate solutions
x ∈ X [1075, 1220, 1250, 2931].
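To make this structure concrete, the following minimal sketch shows a bit-string genotype together with a bit-flip mutation operator, a single-point crossover operator, and a simple non-identity genotype-phenotype mapping that reads the bits as a binary fraction, i.e., as a real number in [0, 1). It is only an illustration; the class and method names are hypothetical and do not belong to the framework used in the book's listings.

import java.util.Random;

// Illustrative GA primitives on a fixed-length bit-string genotype.
public class BitStringGASketch {

  // Bit-flip mutation: each bit is inverted with probability 1/n.
  public static boolean[] mutate(boolean[] g, Random rnd) {
    boolean[] child = g.clone();
    for (int i = 0; i < child.length; i++)
      if (rnd.nextInt(child.length) == 0) child[i] ^= true;
    return child;
  }

  // Single-point crossover (assumes length >= 2): prefix of one parent,
  // suffix of the other.
  public static boolean[] crossover(boolean[] p1, boolean[] p2, Random rnd) {
    boolean[] child = p1.clone();
    int cut = 1 + rnd.nextInt(p1.length - 1);
    System.arraycopy(p2, cut, child, cut, p2.length - cut);
    return child;
  }

  // Non-identity genotype-phenotype mapping: decode the bit string into a
  // real-valued candidate solution in [0, 1).
  public static double genotypePhenotypeMapping(boolean[] g) {
    double x = 0.0d, weight = 0.5d;
    for (boolean bit : g) { if (bit) x += weight; weight *= 0.5d; }
    return x;
  }
}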
Genetic Algorithms are the original prototype of Evolutionary Algorithms and adhere to
the description given in Section 28.1.1. They provide search operators which try to closely
copy sexual and asexual reproduction schemes from nature. In such “sexual” search operations, the genoty