Global Optimization Algorithms – Theory and Application

3rd Edition

Author: Thomas Weise (tweise@gmx.de)
Edition: third
Version: 2011-12-07-15:31

Newest Version: http://www.it-weise.de/projects/book.pdf
Sources/CVS-Repository: goa-taa.cvs.sourceforge.net


Don't print me!

This book is much more useful as an electronic resource: you can search terms and click links. You can't do that in a printed book.

The book is also frequently updated and improved. Printed versions will quickly become outdated.

Also think about the trees that would have to die for the paper!

Don't print me!

Preface

This e-book is devoted to Global Optimization algorithms, which are methods for finding solutions of high quality for an incredibly wide range of problems. We introduce the basic concepts of optimization and discuss features which make optimization problems difficult and which should thus be considered when trying to solve them. In this book, we focus on metaheuristic approaches like Evolutionary Computation, Simulated Annealing, Extremal Optimization, Tabu Search, and Random Optimization. Especially the Evolutionary Computation methods, which subsume Evolutionary Algorithms, Genetic Algorithms, Genetic Programming, Learning Classifier Systems, Evolution Strategy, Differential Evolution, Particle Swarm Optimization, and Ant Colony Optimization, are discussed in detail.

In this third edition, we try to make a transition from a pure material collection and compendium to a more structured book. We try to address two major audience groups:

1. Our book may help students, since we try to describe the algorithms in an understandable, consistent way. Therefore, we also provide the fundamentals and much background knowledge. You can find (short and simplified) summaries on stochastic theory and theoretical computer science in Part VI on page 638. Additionally, application examples are provided which give an idea of how problems can be tackled with the different techniques and what results can be expected.

2. Fellow researchers and PhD students may find the application examples helpful too. For them, in-depth discussions of the single approaches are included, supported by a large set of useful literature references.

The contents of this book are divided into three parts. In the first part, different optimization technologies will be introduced and their features described. Often, small examples will be given in order to ease understanding. In the second part, starting at page 530, we elaborate on different application examples in detail. Finally, in the last part, following at page 638, the aforementioned background knowledge is provided.

In order to maximize the utility of this electronic book, it contains automatic, clickable links. They are shaded with dark gray so the book is still b/w printable. You can click on

1. entries in the table of contents,
2. citation references like “Heitkötter and Beasley [1220]”,
3. page references like “253”,
4. references such as “see Figure 28.1 on page 254” to sections, figures, tables, and listings, and
5. URLs and links like “http://www.lania.mx/~ccoello/EMOO/ [accessed 2007-10-25]”.¹
1 URLs are usually annotated with the date we have accessed them, like http://www.lania.mx/~ccoello/EMOO/ [accessed 2007-10-25]. We can neither guarantee that their content remains unchanged, nor that these sites stay available. We also assume no responsibility for anything we linked to.



The following scenario is an example for using the book: A student reads the text and finds a passage that she wants to investigate in depth. She clicks on a citation which seems interesting, and the corresponding reference is shown. For some of the references which are available online, links are provided in the reference text. By clicking on such a link, the Adobe Reader®² will open another window and load the corresponding document (or a browser window of a site that links to the document). After reading it, the student may use the “backwards” button in the Adobe Reader’s navigation utility to go back to the text initially read in the e-book.

If this book contains something you want to cite or reference in your work, please use the citation suggestion provided in Chapter A on page 945.

Also, I would be very happy if you provide feedback, report errors or missing things that you have (or have not) found, criticize something, or have any additional ideas or suggestions. Do not hesitate to contact me via my email address tweise@gmx.de. As a matter of fact, a large number of people have helped me to improve this book over time. I have enumerated the most important contributors in Chapter D – thank you guys, I really appreciate your help!

At many places in this book we refer to Wikipedia – The Free Encyclopedia [2938], which is a great source of knowledge. Wikipedia – The Free Encyclopedia contains articles and definitions for many of the aspects discussed in this book. Like this book, it is updated and improved frequently. Therefore, including the links adds greatly to the book’s utility, in my opinion.

The updates and improvements will result in new versions of the book, which will regularly appear at http://www.it-weise.de/projects/book.pdf. The complete LaTeX source code of this book, including all graphics and the bibliography, is hosted at SourceForge under http://sourceforge.net/projects/goa-taa/ in the CVS repository goa-taa.cvs.sourceforge.net. You can browse the repository under http://goa-taa.cvs.sourceforge.net/goa-taa/ and anonymously download the complete sources by using the CVS command given in Listing 1.³
cvs -z3 -d:pserver:anonymous@goa-taa.cvs.sourceforge.net:/cvsroot/goa-taa checkout -P Book

Listing 1: The CVS command for anonymously checking out the complete book sources.
Compiling the sources requires multiple runs of LaTeX, BibTeX, and makeindex because of the nifty way the references are incorporated. In the repository, an Ant script is provided for doing this; a sketch of the typical command sequence is given below.

Also, the complete bibliography of this book is stored in a MySQL⁴ database, and the scripts for creating this database, as well as tools for querying it (Java) and a Microsoft Access⁵ frontend, are part of the sources you can download.

These resources, as well as all contents of this book (unless explicitly stated otherwise), are licensed under the GNU Free Documentation License (FDL, see Chapter B on page 947). Some of the algorithms provided are made available under the LGPL license (see Chapter C on page 955).
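For readers who want to perform the compilation manually instead of via the Ant script, the following sketch shows one plausible sequence of the commands named above. The main file name book.tex is only an assumption for illustration; the actual file names and the exact number of passes are determined by the build script in the repository.

latex book.tex      # first pass: collect citation, index, and cross-reference data
bibtex book         # resolve the bibliography entries
makeindex book      # build the index from the collected entries
latex book.tex      # second pass: incorporate bibliography and index
latex book.tex      # final pass: settle remaining cross-references and page numbers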

Copyright © 2011 Thomas Weise. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in Chapter B on page 947, entitled “GNU Free Documentation License (FDL)”.
2 The Adobe Reader® is available for download at http://www.adobe.com/products/reader/ [accessed 2007-08-13].
3 More information can be found at http://sourceforge.net/scm/?type=cvs&group_id=264352
4 http://en.wikipedia.org/wiki/MySQL [accessed 2010-07-29]
5 http://en.wikipedia.org/wiki/Microsoft_Access [accessed 2010-07-29]

Contents

Title Page   1
Preface   3
Contents   5

Part I Foundations

1 Introduction   21
  1.1 What is Global Optimization?   21
    1.1.1 Optimization   22
    1.1.2 Algorithms and Programs   23
    1.1.3 Optimization versus Dedicated Algorithms   24
    1.1.4 Structure of this Book   24
  1.2 Types of Optimization Problems   24
    1.2.1 Combinatorial Optimization Problems   24
    1.2.2 Numerical Optimization Problems   27
    1.2.3 Discrete and Mixed Problems   29
    1.2.4 Summary   30
  1.3 Classes of Optimization Algorithms   31
    1.3.1 Algorithm Classes   31
    1.3.2 Monte Carlo Algorithms   34
    1.3.3 Heuristics and Metaheuristics   34
    1.3.4 Evolutionary Computation   36
    1.3.5 Evolutionary Algorithms   37
  1.4 Classification According to Optimization Time   37
  1.5 Number of Optimization Criteria   39
  1.6 Introductory Example   39

2 Problem Space and Objective Functions   43
  2.1 Problem Space: How does it look like?   43
  2.2 Objective Functions: Is it good?   44

3 Optima: What does good mean?   51
  3.1 Single Objective Functions   51
    3.1.1 Extrema: Minima and Maxima of Differentiable Functions   52
    3.1.2 Global Extrema   53
  3.2 Multiple Optima   54
  3.3 Multiple Objective Functions   55
    3.3.1 Introduction   55
    3.3.2 Lexicographic Optimization   59
    3.3.3 Weighted Sums (Linear Aggregation)   62
    3.3.4 Weighted Min-Max   64
    3.3.5 Pareto Optimization   65
  3.4 Constraint Handling   70
    3.4.1 Genome and Phenome-based Approaches   71
    3.4.2 Death Penalty   71
    3.4.3 Penalty Functions   71
    3.4.4 Constraints as Additional Objectives   72
    3.4.5 The Method Of Inequalities   72
    3.4.6 Constrained-Domination   75
    3.4.7 Limitations and Other Methods   75
  3.5 Unifying Approaches   75
    3.5.1 External Decision Maker   75
    3.5.2 Prevalence Optimization   76

4 Search Space and Operators: How can we find it?   81
  4.1 The Search Space   81
  4.2 The Search Operations   83
  4.3 The Connection between Search and Problem Space   85
  4.4 Local Optima   88
    4.4.1 Local Optima of Single-Objective Problems   89
    4.4.2 Local Optima of Multi-Objective Problems   89
  4.5 Further Definitions   90

5 Fitness and Problem Landscape: How does the Optimizer see it?   93
  5.1 Fitness Landscapes   93
  5.2 Fitness as a Relative Measure of Utility   94
  5.3 Problem Landscapes   95

6 The Structure of Optimization: Putting it together   101
  6.1 Involved Spaces and Sets   101
  6.2 Optimization Problems   103
  6.3 Other General Features   104
    6.3.1 Gradient Descend   104
    6.3.2 Iterations   105
    6.3.3 Termination Criterion   105
    6.3.4 Minimization   106
    6.3.5 Modeling and Simulating   106

7 Solving an Optimization Problem   109

8 Baseline Search Patterns   113
  8.1 Random Sampling   113
  8.2 Random Walks   114
    8.2.1 Adaptive Walks   115
  8.3 Exhaustive Enumeration   116

9 Forma Analysis   119

10 General Information on Optimization   123
  10.1 Applications and Examples   123
  10.2 Books   125
  10.3 Conferences and Workshops   126
  10.4 Journals   135

Part II Difficulties in Optimization

11 Introduction   139

12 Problem Hardness   143
  12.1 Algorithmic Complexity   143
  12.2 Complexity Classes   145
    12.2.1 Turing Machines   145
    12.2.2 P, NP, Hardness, and Completeness   146
  12.3 The Problem: Many Real-World Tasks are NP-hard   147
  12.4 Countermeasures   147

13 Unsatisfying Convergence   151
  13.1 Introduction   151
  13.2 The Problems   152
    13.2.1 Premature Convergence   152
    13.2.2 Non-Uniform Convergence   152
    13.2.3 Domino Convergence   153
  13.3 One Cause: Loss of Diversity   154
    13.3.1 Exploration vs. Exploitation   155
  13.4 Countermeasures   157
    13.4.1 Search Operator Design   157
    13.4.2 Restarting   157
    13.4.3 Low Selection Pressure and Population Size   157
    13.4.4 Sharing, Niching, and Clearing   157
    13.4.5 Self-Adaptation   158
    13.4.6 Multi-Objectivization   158

14 Ruggedness and Weak Causality   161
  14.1 The Problem: Ruggedness   161
  14.2 One Cause: Weak Causality   161
  14.3 Fitness Landscape Measures   163
    14.3.1 Autocorrelation and Correlation Length   163
  14.4 Countermeasures   164

15 Deceptiveness   167
  15.1 Introduction   167
  15.2 The Problem   167
  15.3 Countermeasures   168

16 Neutrality and Redundancy   171
  16.1 Introduction   171
  16.2 Evolvability   172
  16.3 Neutrality: Problematic and Beneficial   172
  16.4 Neutral Networks   173
  16.5 Redundancy: Problematic and Beneficial   174
  16.6 Summary   175
  16.7 Needle-In-A-Haystack   175

17 Epistasis, Pleiotropy, and Separability   177
  17.1 Introduction   177
    17.1.1 Epistasis and Pleiotropy   177
    17.1.2 Separability   178
    17.1.3 Examples   179
  17.2 The Problem   179
  17.3 Countermeasures   182
    17.3.1 Choice of the Representation   182
    17.3.2 Parameter Tweaking   183
    17.3.3 Linkage Learning   183
    17.3.4 Cooperative Coevolution   184

18 Noise and Robustness   187
  18.1 Introduction – Noise   187
  18.2 The Problem: Need for Robustness   188
  18.3 Countermeasures   190

19 Overfitting and Oversimplification   191
  19.1 Overfitting   191
    19.1.1 The Problem   191
    19.1.2 Countermeasures   193
  19.2 Oversimplification   194
    19.2.1 The Problem   194
    19.2.2 Countermeasures   195

20 Dimensionality (Objective Functions)   197
  20.1 Introduction   197
  20.2 The Problem: Many-Objective Optimization   198
  20.3 Countermeasures   201
    20.3.1 Increasing Population Size   201
    20.3.2 Increasing Selection Pressure   201
    20.3.3 Indicator Function-based Approaches   201
    20.3.4 Scalarizing Approaches   202
    20.3.5 Limiting the Search Area in the Objective Space   202
    20.3.6 Visualization Methods   202

21 Scale (Decision Variables)   203
  21.1 The Problem   203
  21.2 Countermeasures   204
    21.2.1 Parallelization and Distribution   204
    21.2.2 Genetic Representation and Development   204
    21.2.3 Exploiting Separability   204
    21.2.4 Combination of Techniques   205

22 Dynamically Changing Fitness Landscape   207

23 The No Free Lunch Theorem   209
  23.1 Initial Definitions   209
  23.2 The Theorem   210
  23.3 Implications   210
  23.4 Infinite and Continuous Domains   213
  23.5 Conclusions   213

24 Lessons Learned: Designing Good Encodings   215
  24.1 Compact Representation   215
  24.2 Unbiased Representation   215
  24.3 Surjective GPM   216
  24.4 Injective GPM   216
  24.5 Consistent GPM   216
  24.6 Formae Inheritance and Preservation   217
  24.7 Formae in Genotypic Space Aligned with Phenotypic Formae   217
  24.8 Compatibility of Formae   217
  24.9 Representation with Causality   218
  24.10 Combinations of Formae   218
  24.11 Reachability of Formae   218
  24.12 Influence of Formae   218
  24.13 Scalability of Search Operations   219
  24.14 Appropriate Abstraction   219
  24.15 Indirect Mapping   219
    24.15.1 Extradimensional Bypass: Example for Good Complexity   219

Part III Metaheuristic Optimization Algorithms

25 Introduction   225
  25.1 General Information on Metaheuristics   226
    25.1.1 Applications and Examples   226
    25.1.2 Books   227
    25.1.3 Conferences and Workshops   227
    25.1.4 Journals   228

26 Hill Climbing   229
  26.1 Introduction   229
  26.2 Algorithm   229
  26.3 Multi-Objective Hill Climbing   230
  26.4 Problems in Hill Climbing   231
  26.5 Hill Climbing With Random Restarts   232
  26.6 General Information on Hill Climbing   234
    26.6.1 Applications and Examples   234
    26.6.2 Books   234
    26.6.3 Conferences and Workshops   234
    26.6.4 Journals   235
  26.7 Raindrop Method   235
    26.7.1 General Information on Raindrop Method   236

27 Simulated Annealing   243
  27.1 Introduction   243
    27.1.1 Metropolis Simulation   243
  27.2 Algorithm   245
    27.2.1 No Boltzmann’s Constant   246
    27.2.2 Convergence   247
  27.3 Temperature Scheduling   247
    27.3.1 Logarithmic Scheduling   248
    27.3.2 Exponential Scheduling   249
    27.3.3 Polynomial and Linear Scheduling   249
    27.3.4 Adaptive Scheduling   249
    27.3.5 Larger Step Widths   249
  27.4 General Information on Simulated Annealing   250
    27.4.1 Applications and Examples   250
    27.4.2 Books   250
    27.4.3 Conferences and Workshops   250
    27.4.4 Journals   251

28 Evolutionary Algorithms   253
  28.1 Introduction   253
    28.1.1 The Basic Cycle of EAs   254
    28.1.2 Biological and Artificial Evolution   255
    28.1.3 Historical Classification   263
    28.1.4 Populations in Evolutionary Algorithms   265
    28.1.5 Configuration Parameters of Evolutionary Algorithms   269
  28.2 Genotype-Phenotype Mappings   270
    28.2.1 Direct Mappings   270
    28.2.2 Indirect Mappings   272
  28.3 Fitness Assignment   274
    28.3.1 Introduction   274
    28.3.2 Weighted Sum Fitness Assignment   275
    28.3.3 Pareto Ranking   275
    28.3.4 Sharing Functions   277
    28.3.5 Variety Preserving Fitness Assignment   279
    28.3.6 Tournament Fitness Assignment   284
  28.4 Selection   285
    28.4.1 Introduction   285
    28.4.2 Truncation Selection   289
    28.4.3 Fitness Proportionate Selection   290
    28.4.4 Tournament Selection   296
    28.4.5 Linear Ranking Selection   300
    28.4.6 Random Selection   302
    28.4.7 Clearing and Simple Convergence Prevention (SCP)   302
  28.5 Reproduction   306
  28.6 Maintaining a Set of Non-Dominated/Non-Prevailed Individuals   308
    28.6.1 Updating the Optimal Set   308
    28.6.2 Obtaining Non-Prevailed Elements   309
    28.6.3 Pruning the Optimal Set   311
  28.7 General Information on Evolutionary Algorithms   314
    28.7.1 Applications and Examples   314
    28.7.2 Books   316
    28.7.3 Conferences and Workshops   317
    28.7.4 Journals   322

29 Genetic Algorithms   325
  29.1 Introduction   325
    29.1.1 History   325
  29.2 Genomes in Genetic Algorithms   326
    29.2.1 Classification According to Length   326
    29.2.2 Classification According to Element Type   327
    29.2.3 Classification According to Meaning of Loci   327
    29.2.4 Introns   327
  29.3 Fixed-Length String Chromosomes   328
    29.3.1 Creation: Nullary Reproduction   328
    29.3.2 Mutation: Unary Reproduction   330
    29.3.3 Permutation: Unary Reproduction   333
    29.3.4 Crossover: Binary Reproduction   335
  29.4 Variable-Length String Chromosomes   339
    29.4.1 Creation: Nullary Reproduction   340
    29.4.2 Insertion and Deletion: Unary Reproduction   340
    29.4.3 Crossover: Binary Reproduction   340
  29.5 Schema Theorem   341
    29.5.1 Schemata and Masks   341
    29.5.2 Wildcards   342
    29.5.3 Holland’s Schema Theorem   342
    29.5.4 Criticism of the Schema Theorem   345
    29.5.5 The Building Block Hypothesis   346
    29.5.6 Genetic Repair and Similarity Extraction   346
  29.6 The Messy Genetic Algorithm   347
    29.6.1 Representation   347
    29.6.2 Reproduction Operations   347
    29.6.3 Overspecification and Underspecification   348
    29.6.4 The Process   348
  29.7 Random Keys   349
  29.8 General Information on Genetic Algorithms   351
    29.8.1 Applications and Examples   351
    29.8.2 Books   352
    29.8.3 Conferences and Workshops   353
    29.8.4 Journals   356

30 Evolution Strategies   359
  30.1 Introduction   359
  30.2 Algorithm   360
  30.3 Recombination   362
    30.3.1 Dominant Recombination   362
    30.3.2 Intermediate Recombination   362
  30.4 Mutation   364
    30.4.1 Using Normal Distributions   364
  30.5 Parameter Adaptation   367
    30.5.1 The 1/5th Rule   368
    30.5.2 Endogenous Parameters   370
  30.6 General Information on Evolution Strategies   373
    30.6.1 Applications and Examples   373
    30.6.2 Books   373
    30.6.3 Conferences and Workshops   373
    30.6.4 Journals   376

31 Genetic Programming   379
  31.1 Introduction   379
    31.1.1 History   379
  31.2 General Information on Genetic Programming   382
    31.2.1 Applications and Examples   382
    31.2.2 Books   383
    31.2.3 Conferences and Workshops   383
    31.2.4 Journals   386
  31.3 (Standard) Tree Genomes   386
    31.3.1 Introduction   386
    31.3.2 Creation: Nullary Reproduction   389
    31.3.3 Node Selection   392
    31.3.4 Unary Reproduction Operations   393
    31.3.5 Recombination: Binary Reproduction   398
    31.3.6 Modules   400
  31.4 Linear Genetic Programming   403
    31.4.1 Introduction   403
    31.4.2 Advantages and Disadvantages   404
    31.4.3 Realizations and Implementations   405
    31.4.4 Recombination   407
    31.4.5 General Information on Linear Genetic Programming   408
  31.5 Grammars in Genetic Programming   409
    31.5.1 General Information on Grammar-Guided Genetic Programming   409
  31.6 Graph-based Approaches   409
    31.6.1 General Information on Graph-based Genetic Programming   409

32 Evolutionary Programming   413
  32.1 General Information on Evolutionary Programming   414
    32.1.1 Applications and Examples   414
    32.1.2 Books   414
    32.1.3 Conferences and Workshops   414
    32.1.4 Journals   417

33 Differential Evolution   419
  33.1 Introduction   419
  33.2 Ternary Recombination   419
    33.2.1 Advanced Operators   420
  33.3 General Information on Differential Evolution   421
    33.3.1 Applications and Examples   421
    33.3.2 Books   421
    33.3.3 Conferences and Workshops   421
    33.3.4 Journals   424

34 Estimation Of Distribution Algorithms   427
  34.1 Introduction   427
  34.2 General Information on Estimation Of Distribution Algorithms   428
    34.2.1 Applications and Examples   428
    34.2.2 Books   428
    34.2.3 Conferences and Workshops   429
    34.2.4 Journals   431
  34.3 Algorithm   431
  34.4 EDAs Searching Bit Strings   434
    34.4.1 UMDA: Univariate Marginal Distribution Algorithm   434
    34.4.2 PBIL: Population-Based Incremental Learning   438
    34.4.3 cGA: Compact Genetic Algorithm   439
  34.5 EDAs Searching Real Vectors   442
    34.5.1 SHCLVND: Stochastic Hill Climbing with Learning by Vectors of Normal Distribution   442
    34.5.2 PBILc: Continuous PBIL   444
    34.5.3 Real-Coded PBIL   444
  34.6 EDAs Searching Trees   447
    34.6.1 PIPE: Probabilistic Incremental Program Evolution   447
  34.7 Difficulties   452
    34.7.1 Diversity   452
    34.7.2 Epistasis   454

35 Learning Classifier Systems   457
  35.1 Introduction   457
  35.2 Families of Learning Classifier Systems   457
  35.3 General Information on Learning Classifier Systems   459
    35.3.1 Applications and Examples   459
    35.3.2 Books   459
    35.3.3 Conferences and Workshops   459
    35.3.4 Journals   462

36 Memetic and Hybrid Algorithms   463
  36.1 Memetic Algorithms   463
  36.2 Lamarckian Evolution   464
  36.3 Baldwin Effect   464
  36.4 General Information on Memetic Algorithms   467
    36.4.1 Applications and Examples   467
    36.4.2 Books   467
    36.4.3 Conferences and Workshops   467
    36.4.4 Journals   470

37 Ant Colony Optimization   471
  37.1 Introduction   471
  37.2 General Information on Ant Colony Optimization   473
    37.2.1 Applications and Examples   473
    37.2.2 Books   473
    37.2.3 Conferences and Workshops   473
    37.2.4 Journals   476

38 River Formation Dynamics   477
  38.1 General Information on River Formation Dynamics   478
    38.1.1 Applications and Examples   478
    38.1.2 Books   478
    38.1.3 Conferences and Workshops   478
    38.1.4 Journals   479

39 Particle Swarm Optimization   481
  39.1 Introduction   481
  39.2 The Particle Swarm Optimization Algorithm   481
    39.2.1 Communication with Neighbors – Social Interaction   482
    39.2.2 Particle Update   482
    39.2.3 Basic Procedure   482
  39.3 General Information on Particle Swarm Optimization   483
    39.3.1 Applications and Examples   483
    39.3.2 Books   483
    39.3.3 Conferences and Workshops   483
    39.3.4 Journals   486

40 Tabu Search   489
  40.1 General Information on Tabu Search   490
    40.1.1 Applications and Examples   490
    40.1.2 Books   490
    40.1.3 Conferences and Workshops   490
    40.1.4 Journals   491

41 Extremal Optimization   493
  41.1 Introduction   493
    41.1.1 Self-Organized Criticality   493
    41.1.2 The Bak-Sneppen model of Evolution   493
  41.2 Extremal Optimization and Generalized Extremal Optimization   494
  41.3 General Information on Extremal Optimization   496
    41.3.1 Applications and Examples   496
    41.3.2 Books   496
    41.3.3 Conferences and Workshops   496
    41.3.4 Journals   497

42 GRASPs   499
  42.1 General Information on GRASPs   500
    42.1.1 Applications and Examples   500
    42.1.2 Conferences and Workshops   500
    42.1.3 Journals   500

43 Downhill Simplex (Nelder and Mead)   501
  43.1 Introduction   501
  43.2 General Information on Downhill Simplex   502
    43.2.1 Applications and Examples   502
    43.2.2 Conferences and Workshops   502
    43.2.3 Journals   502

44 Random Optimization   505
  44.1 General Information on Random Optimization   506
    44.1.1 Applications and Examples   506
    44.1.2 Conferences and Workshops   506
    44.1.3 Journals   506

Part IV Non-Metaheuristic Optimization Algorithms

45 Introduction   509

46 State Space Search   511
  46.1 Introduction   511
    46.1.1 The Baseline: A Binary Goal Criterion   511
    46.1.2 State Space and Neighborhood Search   511
    46.1.3 The Search Space as Graph   512
    46.1.4 Key Efficiency Features   514
  46.2 Uninformed Search   514
    46.2.1 Breadth-First Search   515
    46.2.2 Depth-First Search   516
    46.2.3 Depth-Limited Search   516
    46.2.4 Iteratively Deepening Depth-First Search   516
    46.2.5 General Information on Uninformed Search   517
  46.3 Informed Search   518
    46.3.1 Greedy Search   519
    46.3.2 A⋆ Search   519
    46.3.3 General Information on Informed Search   520

47 Branch And Bound   523
  47.1 General Information on Branch And Bound   524
    47.1.1 Applications and Examples   524
    47.1.2 Books   524
    47.1.3 Conferences and Workshops   524

48 Cutting-Plane Method   527
  48.1 General Information on Cutting-Plane Method   528
    48.1.1 Applications and Examples   528
    48.1.2 Books   528
    48.1.3 Conferences and Workshops   528

Part V Applications

49 Real-World Problems   531
  49.1 Symbolic Regression   531
    49.1.1 Genetic Programming: Genome for Symbolic Regression   531
    49.1.2 Sample Data, Quality, and Estimation Theory   532
    49.1.3 Limits of Symbolic Regression   535
  49.2 Data Mining   536
    49.2.1 Classification   536
  49.3 Freight Transportation Planning   536
    49.3.1 Introduction   537
    49.3.2 Vehicle Routing in Theory and Practice   538
    49.3.3 Related Work   540
    49.3.4 Evolutionary Approach   542
    49.3.5 Experiments   545
    49.3.6 Holistic Approach to Logistics   549
    49.3.7 Conclusions   551

50 Benchmarks   553
  50.1 Introduction   553
  50.2 Bit-String based Problem Spaces   553
    50.2.1 Kauffman’s NK Fitness Landscapes   553
    50.2.2 The p-Spin Model   556
    50.2.3 The ND Family of Fitness Landscapes   557
    50.2.4 The Royal Road   558
    50.2.5 OneMax and BinInt   562
    50.2.6 Long Path Problems   563
    50.2.7 Tunable Model for Problematic Phenomena   566
  50.3 Real Problem Spaces   580
    50.3.1 Single-Objective Optimization   580
  50.4 Close-to-Real Vehicle Routing Problem Benchmark   621
    50.4.1 Introduction   621
    50.4.2 Involved Objects   625
    50.4.3 Problem Space   632
    50.4.4 Objectives and Constraints   634
    50.4.5 Framework   635

Part VI Background

51 Set Theory   639
  51.1 Set Membership   639
  51.2 Special Sets   640
  51.3 Relations between Sets   640
  51.4 Operations on Sets   641
  51.5 Tuples   643
  51.6 Permutations   643
  51.7 Binary Relations   644
  51.8 Order Relations   645
  51.9 Equivalence Relations   646
  51.10 Functions   646
    51.10.1 Monotonicity   647
  51.11 Lists   647
    51.11.1 Sorting   649
    51.11.2 Searching   650
    51.11.3 Transformations between Sets and Lists   650

52 Graph Theory   651

53 Stochastic Theory and Statistics   653
  53.1 Probability   653
    53.1.1 Probability as defined by Bernoulli (1713)   654
    53.1.2 Combinatorics   654
    53.1.3 The Limiting Frequency Theory of von Mises   655
    53.1.4 The Axioms of Kolmogorov   656
    53.1.5 Conditional Probability   657
    53.1.6 Random Variables   657
    53.1.7 Cumulative Distribution Function   658
    53.1.8 Probability Mass Function   659
    53.1.9 Probability Density Function   659
  53.2 Parameters of Distributions and their Estimates   659
    53.2.1 Count, Min, Max and Range   660
    53.2.2 Expected Value and Arithmetic Mean   661
    53.2.3 Variance and Standard Deviation   662
    53.2.4 Moments   664
    53.2.5 Skewness and Kurtosis   665
    53.2.6 Median, Quantiles, and Mode   665
    53.2.7 Entropy   667
    53.2.8 The Law of Large Numbers   668
  53.3 Some Discrete Distributions   668
    53.3.1 Discrete Uniform Distribution   668
    53.3.2 Poisson Distribution πλ   671
    53.3.3 Binomial Distribution B(n, p)   674
  53.4 Some Continuous Distributions   676
    53.4.1 Continuous Uniform Distribution   676
    53.4.2 Normal Distribution N(µ, σ²)   678
    53.4.3 Exponential Distribution exp(λ)   682
    53.4.4 Chi-square Distribution   684
    53.4.5 Student’s t-Distribution   687
  53.5 Example – Throwing a Dice   689
  53.6 Estimation Theory   691
    53.6.1 Introduction   691
    53.6.2 Likelihood and Maximum Likelihood Estimators   693
    53.6.3 Confidence Intervals   696
  53.7 Statistical Tests   699
    53.7.1 Introduction   699

Part VII Implementation

54 Introduction   705

55 The Specification Package   707
  55.1 General Definitions   707
    55.1.1 Search Operations   708
    55.1.2 Genotype-Phenotype Mapping   711
    55.1.3 Objective Function   712
    55.1.4 Termination Criterion   712
    55.1.5 Optimization Algorithm   713
  55.2 Algorithm Specific Interfaces   716
    55.2.1 Evolutionary Algorithms   717
    55.2.2 Estimation Of Distribution Algorithms   719

56 The Implementation Package   723
  56.1 The Optimization Algorithms   724
    56.1.1 Algorithm Base Classes   724
    56.1.2 Baseline Search Patterns   729
    56.1.3 Local Search Algorithms   735
    56.1.4 Population-based Metaheuristics   746
  56.2 Search Operation   784
    56.2.1 Operations for String-based Search Spaces   784
    56.2.2 Operations for Tree-based Search Spaces   809
    56.2.3 Multiplexing Operators   818
  56.3 Data Structures for Special Search Spaces   824
    56.3.1 Tree-based Search Spaces as used in Genetic Programming   824
  56.4 Genotype-Phenotype Mappings   833
  56.5 Termination Criteria   835
  56.6 Comparator   837
  56.7 Utility Classes, Constants, and Routines   839

57 Demos   859
  57.1 Demos for String-based Problem Spaces   859
    57.1.1 Real-Vector based Problem Spaces X ⊆ R^n   859
    57.1.2 Permutation-based Problem Spaces   887
  57.2 Demos for Genetic Programming   905
    57.2.1 Demos for Genetic Programming Applied to Mathematical Problems   905
  57.3 The VRP Benchmark Problem   914
    57.3.1 The Involved Objects   914
    57.3.2 The Optimization Utilities   925

Appendices

A Citation Suggestion . . . . . 945

B GNU Free Documentation License (FDL) . . . . . 947
   B.1 Applicability and Definitions . . . . . 947
   B.2 Verbatim Copying . . . . . 949
   B.3 Copying in Quantity . . . . . 949
   B.4 Modifications . . . . . 949
   B.5 Combining Documents . . . . . 951
   B.6 Collections of Documents . . . . . 951
   B.7 Aggregation with Independent Works . . . . . 951
   B.8 Translation . . . . . 952
   B.9 Termination . . . . . 952
   B.10 Future Revisions of this License . . . . . 952
   B.11 Relicensing . . . . . 953

C GNU Lesser General Public License (LGPL) . . . . . 955
   C.1 Additional Definitions . . . . . 955
   C.2 Exception to Section 3 of the GNU GPL . . . . . 955
   C.3 Conveying Modified Versions . . . . . 956
   C.4 Object Code Incorporating Material from Library Header Files . . . . . 956
   C.5 Combined Works . . . . . 956
   C.6 Combined Libraries . . . . . 957
   C.7 Revised Versions of the GNU Lesser General Public License . . . . . 957

D Credits and Contributors . . . . . 959

References . . . . . 961
Index . . . . . 1190
List of Figures . . . . . 1209
List of Tables . . . . . 1213
List of Algorithms . . . . . 1217
List of Listings . . . . . 1221

Part I

Foundations


Chapter 1 Introduction

1.1 What is Global Optimization?

One of the most fundamental principles in our world is the search for optimal states. It begins in the microcosm in physics, where atom bonds1 minimize the energy of electrons [2137]. When molecules combine into solid bodies during the process of freezing, they assume crystal structures which come close to energy-optimal configurations. Such a natural optimization process, of course, is not driven by any higher intention but results from the laws of physics. The same goes for the biological principle of survival of the fittest [2571] which, together with biological evolution [684], leads to better adaptation of the species to their environment. Here, a local optimum is a well-adapted species that can maintain a stable population in its niche.

[Figure 1.1 collects terms that typically indicate an optimization problem, such as "smallest ...", "fastest ...", "biggest ...", "maximal ...", "minimal ...", "cheapest ...", "most robust ...", "most precise ...", "most similar to ...", "best trade-offs between ...", "... with least energy", "... with least aerodynamic drag", "... on the smallest possible area", and "... that fulfills all timing constraints".]
Figure 1.1: Terms that hint: “Optimization Problem” (inspired by [428])
1 http://en.wikipedia.org/wiki/Chemical_bond [accessed 2007-07-12]


For as long as humankind has existed, we have striven for perfection in many areas. As sketched in Figure 1.1, there exists an incredible variety of different terms that indicate that a situation is actually an optimization problem. For example, we want to reach a maximum degree of happiness with the least amount of effort. In our economy, profit and sales must be maximized and costs should be as low as possible. Therefore, optimization is one of the oldest sciences, one which even extends into daily life [2023]. We can also observe the phenomenon of trade-offs between different criteria, which is common in many other problem domains: One person may choose to work hard with the goal of securing stability and pursuing wealth for her future life, accepting very little spare time in the current stage. Another person can, deliberately, choose a lax life full of joy in the current phase but rather insecure in the long term. Similarly, most optimization problems in real-world applications involve multiple, usually conflicting, objectives. Furthermore, many optimization problems involve certain constraints. The guy with the relaxed lifestyle, for instance, regardless of how lazy he might become, cannot avoid dragging himself to the supermarket from time to time in order to acquire food. The eager beaver, on the other hand, will not be able to keep up with 18 hours of work for seven days per week in the long run without damaging her health considerably and thus rendering the goal of an enjoyable future unattainable. Likewise, the parameters of the solutions of an optimization problem may be constrained as well.

1.1.1 Optimization

Since optimization covers a wide variety of subject areas, there exist different points of view on what it actually is. We approach this question by first clarifying the goal of optimization: solving optimization problems.

Definition D1.1 (Optimization Problem (Economical View)). An optimization problem is a situation which requires selecting one choice from a set of possible alternatives in order to reach a predefined/required benefit at minimal costs.

Definition D1.2 (Optimization Problem (Simplified Mathematical View)). Solving an optimization problem requires finding an input value x⋆ for which a mathematical function f takes on the smallest possible value (while usually obeying some restrictions on the possible values of x⋆).

Every task which has the goal of approaching certain configurations considered as optimal in the context of pre-defined criteria can be viewed as an optimization problem. After giving these rough definitions, which we will later refine in Section 6.2 on page 103, we can also specify the topic of this book:

Definition D1.3 (Optimization). Optimization is the process of solving an optimization problem, i. e., finding suitable solutions for it.
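As a small illustration of Definition D1.2 (this instance is our own and not taken from the book): consider the function f(x) = (x − 3)² over the real numbers. Since f(x) ≥ 0 holds for all x and f(3) = 0, the input value x⋆ = 3 is the sought solution; it is the single global minimum of f. The problems treated later are, of course, far harder, because f is usually not given as such a simple closed formula and the set of possible inputs is much more complex.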

Definition D1.4 (Global Optimization). Global optimization is optimization with the goal of finding solutions x⋆ for a given optimization problem which have the property that no other, better solutions exist.

If something is important, general, and abstract enough, there is always a mathematical discipline dealing with it.
2 http://en.wikipedia.org/wiki/Global_optimization [accessed 2007-07-03]


Global Optimization2 [609, 2126, 2181, 2714] is also the branch of applied mathematics and numerical analysis that focuses on, well, optimization. Closely related and overlapping areas are mathematical economics3 [559] and operations research4 [580, 1232].

1.1.2 Algorithms and Programs

The title of this book is Global Optimization Algorithms – Theory and Application and so far, we have clarified what optimization is, so the reader has an idea about the general direction in which we are going to drift. Before giving an overview of the contents to follow, let us take a quick look into the second component of the book title, algorithms. The term algorithm comprises essentially all forms of "directives on what to do to reach a certain goal". A culinary recipe is an algorithm, for example, since it tells how much of which ingredient is to be added to a meal in which sequence and how everything should be heated. The commands inside an algorithm can be very concise or very imprecise, depending on the area of application. How accurately can we, for instance, carry out the instruction "Add a tablespoon of sugar."? Hence, algorithms are such a wide field that there exist numerous different, rather fuzzy definitions for the word algorithm [46, 141, 157, 635, 1616]. We base ours on the one given by Wikipedia:

Definition D1.5 (Algorithm (Wikipedia)). An algorithm5 is a finite set of well-defined instructions for accomplishing some task. It starts in some initial state and usually terminates in a final state.

The term algorithm is derived from the name of the mathematician Mohammed ibn-Musa al-Khwarizmi, who was part of the royal court in Baghdad and who lived from about 780 to 850. Al-Khwarizmi's work is the likely source for the word algebra as well. Definition D1.6 provides a basic understanding about what an optimization algorithm is, a "working hypothesis" for the time being. It will later be refined in Definition D6.3 on page 103.

Definition D1.6 (Optimization Algorithm). An optimization algorithm is an algorithm suitable to solve optimization problems.

An algorithm is a set of directions in a representation which is usually understandable for human beings and thus independent from specific hardware and computers. A program, on the other hand, is basically an algorithm realized for a given computer or execution environment.

Definition D1.7 (Program). A program6 is a set of instructions that describe a task or an algorithm to be carried out on a computer.

Therefore, the primitive instructions of the algorithm must be expressed either in a form that the machine can process directly (machine code7 [2135]), in a form that can be translated (1:1) into such code (assembly language [2378], Java byte code [1109], etc.), or in a high-level programming language8 [1817] which can be translated (n:m) into the latter using special software (a compiler) [1556]. In this book, we will provide several algorithms for solving optimization problems. In our algorithms, we will try to bridge between abstraction, ease of reading, and closeness to the syntax of high-level programming languages such as Java. In the algorithms, we will use self-explaining primitives which are still based on clear definitions (such as the list operations discussed in Section 51.11 on page 647); a small illustration of the difference between an abstract algorithm and a concrete program follows below.
3 http://en.wikipedia.org/wiki/Mathematical_economics [accessed 2009-06-16] 4 http://en.wikipedia.org/wiki/Operations_research [accessed 2009-06-19] 5 http://en.wikipedia.org/wiki/Algorithm [accessed 2007-07-03] 6 http://en.wikipedia.org/wiki/Computer_program [accessed 2007-07-03] 7 http://en.wikipedia.org/wiki/Machine_code [accessed 2007-07-04] 8 http://en.wikipedia.org/wiki/High-level_programming_language [accessed 2007-07-03]
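The following sketch is our own minimal illustration (it is not code from the book's sources): the abstract algorithm "inspect all numbers and remember the largest one seen so far" realized as a Java program, i. e., in a form that, after compilation, a machine can execute.

    // Our own small example: an abstract algorithm turned into a concrete Java program.
    public class FindMaximum {
      /** Return the largest value in the (non-empty) array a. */
      static int findMaximum(final int[] a) {
        int best = a[0];              // remember the largest value seen so far
        for (int i = 1; i < a.length; i++) {
          if (a[i] > best) {          // a larger value was found
            best = a[i];
          }
        }
        return best;
      }

      public static void main(String[] args) {
        System.out.println(findMaximum(new int[] { 4, 1, 4, 2, 2, 1, 2, 2, 1, 3 }));
      }
    }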

1.1.3 Optimization versus Dedicated Algorithms


Over at least the last fifty-five years, research in computer science has led to the development of many algorithms which solve given problem classes to optimality. Basically, we can distinguish two kinds of problem-solving algorithms: dedicated algorithms and optimization algorithms. Dedicated algorithms are specialized to exactly solve a given class of problems in the shortest possible time. They are most often deterministic and always utilize exact theoretical knowledge about the problem at hand. Kruskal's algorithm [1625], for example, is the best way of finding a minimum spanning tree in a graph with weighted edges. The shortest paths from a source node to the other nodes in a network can best be found with Dijkstra's algorithm [796], and the maximum flow in a flow network can be computed quickly with the algorithm given by Dinitz [800]. There are many problems for which such fast and efficient dedicated algorithms exist. For each optimization problem, we should always check first whether it can be solved with such a specialized, exact algorithm (see also Chapter 7 on page 109). However, for many optimization tasks no efficient algorithm exists. The reason can be that (1) the problem is too specific – after all, researchers usually develop algorithms only for problems which are of general interest. (2) Furthermore, for some problems (the current scientific opinion is that) no efficient, dedicated algorithm can be constructed at all (see Section 12.2.2 on page 146). For such problems, the more general optimization algorithms are employed. Most often, these algorithms only need a definition of the structure of possible solutions and a function f which measures the quality of a candidate solution. Based on this information, they try to find solutions for which f takes on the best values (a minimal code sketch of this black-box view follows below). Since optimization methods utilize much less information than exact algorithms, they are usually slower and cannot provide the same solution quality. In the cases 1 and 2, however, we have no choice but to use them. And indeed, there exists a giant number of situations where either Point 1, Point 2, or both conditions hold, as you can see, for example, in Table 25.1 on page 226 or Table 28.4 on page 314.
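The following is a small, hypothetical sketch of the kind of information a general optimization algorithm works with; the names are ours and not the interfaces defined in the book's Part VII. The algorithm only sees candidate solutions and a function f that rates them, not any further problem-specific theory.

    // Our own illustration of the black-box view: the algorithm only knows "rate this candidate".
    import java.util.function.ToDoubleFunction;

    public class BlackBoxView {
      /** Rates candidate solutions of type X; here, smaller values are considered better. */
      interface ObjectiveFunction<X> extends ToDoubleFunction<X> { }

      public static void main(String[] args) {
        // Example: treat a double[] as a candidate solution and rate it by its sum of squares.
        ObjectiveFunction<double[]> f = x -> {
          double sum = 0;
          for (double xi : x) { sum += xi * xi; }
          return sum;
        };
        System.out.println(f.applyAsDouble(new double[] { 1.0, -2.0 })); // prints 5.0
      }
    }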

1.1.4 Structure of this Book

In the first part of this book, we will discuss the things which constitute an optimization problem and which are the basic components of each optimization process. In Part II, we take a look at all the possible aspects which can make an optimization task difficult. Understanding these issues will allow you to anticipate possible problems when solving an optimization task and, hopefully, to prevent their occurrence from the start. We then discuss metaheuristic optimization algorithms (and especially Evolutionary Computation methods) in Part III. Whereas these optimization approaches are clearly the center of this book, we also introduce traditional, non-metaheuristic, numerical, and/or exact methods in Part IV. In Part V, we then list some examples of applications for which optimization algorithms can be used. Finally, Part VI is a collection of definitions and knowledge which can be useful to understand the terms and formulas used in the rest of the book. It is provided in an effort to make the book stand-alone and to allow students to understand it even if they have little mathematical or theoretical background.

1.2 Types of Optimization Problems

Basically, there are two general types of optimization tasks: combinatorial and numerical problems.

1.2.1 Combinatorial Optimization Problems

Definition D1.8 (Combinatorial Optimization Problem). Combinatorial optimization problems9 [578, 628, 2124, 2430] are problems which are defined over a finite (or numerable infinite) discrete problem space X and whose candidate solution structure can either be expressed as
1. elements from finite sets,
2. finite sequences or permutations of elements xi chosen from finite sets, i. e., x ∈ X ⇒ x = (x1, x2, . . . ),
3. sets of elements, i. e., x ∈ X ⇒ x = {x1, x2, . . . },
4. tree or graph structures with node or edge properties stemming from any of the above types, or
5. any form of nesting, combination, partitions, or subsets of the above.

Combinatorial tasks are, for example, the Traveling Salesman Problem (see Example E2.2 on page 45 [1429]), Vehicle Routing Problems (see Section 49.3 on page 536 and [2162, 2912]), graph coloring [1455], graph partitioning [344], scheduling [564], packing [564], and satisfiability problems [2335]. Algorithms suitable for such problems are, amongst others, Genetic Algorithms (Chapter 29), Simulated Annealing (Chapter 27), Tabu Search (Chapter 40), and Extremal Optimization (Chapter 41).

Example E1.1 (Bin Packing Problem). The bin packing problem10 [1025] is a combinatorial optimization problem which is defined as follows: Given are a number k ∈ N1 of bins, each of which having size b ∈ N1, and a number n ∈ N1 of objects with the weights a1, a2, . . . , an. The question to answer is: Can the n objects be distributed over the k bins in a way that none of the bins flows over? A configuration which meets this requirement is sketched in Figure 1.2.

[Figure 1.2: One example instance of a bin packing problem: the distribution of n = 10 objects of sizes a1 = 4, a2 = 4, a3 = 1, a4 = 1, a5 = 1, a6 = 2, a7 = 2, a8 = 2, a9 = 2, a10 = 3 into k = 5 bins of size b = 5.]

The bin packing problem can be formalized to asking whether a mapping x from the numbers 1..n to the bins 1..k exists so that Equation 1.1 holds. In Equation 1.1, the mapping is represented as a list x of k sets where the ith set contains the indices of the objects to be packed into the ith bin.

∃x : ∀i ∈ 0..k−1 : Σ_{j∈x[i]} a_j ≤ b        (1.1)

Finding whether this equation holds or not is known to be NP-complete. If another object of size a11 = 4 was added, the answer to the question would become No. The question could also be transformed into an optimization problem with the goal to find the smallest number k of bins of size b which can contain all the objects listed in a.

9 http://en.wikipedia.org/wiki/Combinatorial_optimization [accessed 2010-08-04]
10 http://en.wikipedia.org/wiki/Bin_packing_problem [accessed 2010-08-27]

Alternatively, we could try to find the mapping x for which the function fbp given in Equation 1.2 takes on the lowest value.

fbp(x) = Σ_{i=0}^{k−1} max{0, (Σ_{j∈x[i]} a_j) − b}        (1.2)

This problem is known to be NP-hard. See Section 12.2 for a discussion on NP-completeness and NP-hardness and Task 67 on page 238 for a homework task concerned with solving the bin packing problem.

Example E1.2 (Circuit Layout). Given are a card with a finite set S of slots, a set C : |C| ≤ |S| of circuits to be placed on that card, and a connection list R ⊆ C × C where (c1, c2) ∈ R means that the two circuits c1, c2 ∈ C have to be connected. The goal is to find a layout x which assigns each circuit c ∈ C to a location s ∈ S so that the overall length of wiring required to establish all the connections r ∈ R becomes minimal, for example under the constraint that no wire must cross another one. An example for such a scenario is given in Figure 1.3. Similar circuit layout problems have existed in a variety of forms for quite a while now [1541] and are known to be quite hard. They may be extended to not only evolve the circuit locations but also the circuit types and numbers [1479, 2792] and thus can become arbitrarily complex. They may also be extended with constraints, such as to additionally minimize the heat emission of the circuit, to minimize cross-wire induction, and so on.

[Figure 1.3: An example for circuit layouts: a card with S = 20×11 slots, circuits C = {A, B, C, . . . , M}, and a connection list R of circuit pairs that have to be wired.]

Example E1.3 (Job Shop Scheduling). Job Shop Scheduling11 (JSS) is considered to be one of the hardest classes of scheduling problems and is also NP-complete [1026]. In a JSS, a set T of jobs ti ∈ T is given, where each job ti consists of a number of sub-tasks ti,j, and constraints define in which order they must be processed. The goal is to distribute the tasks to machines in a way that all deadlines are met. In the case of a beverage factory, for example, 10 000 bottles of cola (t1, . . . , t10 000) and 5000 bottles of lemonade (t10 001, . . . , t15 000) may have to be produced. For the cola bottles, the jobs would consist of three sub-jobs, since the bottles first need to be labeled, then they are filled with cola, and finally closed. It is clear that the bottles cannot be closed before they are filled, i. e., a partial order such as ti,2 ≺ ti,3 for i ∈ 1..10 000 is defined on the sub-tasks ti,j of each job ti which describes the precedence of the sub-tasks, whereas the labeling can take place at any time. For the lemonade, all production steps are the same as the ones for the cola bottles, except that the filling step ti,2 for i ∈ 10 001..15 000 is replaced with filling the bottles with lemonade.

11 http://en.wikipedia.org/wiki/Job_Shop_Scheduling [accessed 2010-08-27]

It is furthermore obvious that the different sub-jobs have to be performed by different machines mi ∈ M, and each of the machines mi can only process a single job at a time. Finally, each of the jobs has a deadline until which it must be finished. The optimization algorithm has to find a schedule S which defines for each job when it has to be executed and on what machine, while adhering to the order and deadline constraints. The JSS can be considered as a combinatorial problem, since the optimizer creates a sequence and assignment of jobs, and the set of possible starting times (which make sense) is also finite. Usually, the optimization process would start with random configurations which violate some of the constraints (typically the deadlines) and step by step reduce the violations until results are found that fulfill all requirements.

Example E1.4 (Routing). In a network N composed of numNodes(N) nodes, find the shortest path from one node n1 to another node n2. This optimization problem consists of composing a sequence of edges which form a path from n1 to n2 of the shortest length. This combinatorial problem is rather simple and has been solved by Dijkstra [796] a long time ago. Nevertheless, it is an optimization problem.

1.2.2 Numerical Optimization Problems

Definition D1.9 (Numerical Optimization Problem). Numerical optimization problems12 [1281, 2040] are problems that are defined over problem spaces which are subspaces of (uncountable infinite) numerical spaces, such as the real or complex vectors (X ⊆ Rn or X ⊆ Cn).

Numerical problems defined over Rn (or any kind of space containing it) are called continuous optimization problems. Function optimization [194], engineering design optimization tasks [2059], and classification and data mining tasks [633] are common continuous optimization problems. They often can efficiently be solved with Evolution Strategies (Chapter 30), Evolutionary Programming (Chapter 32), Differential Evolution (Chapter 33), Estimation of Distribution Algorithms (Chapter 34), or Particle Swarm Optimization (Chapter 39).

Example E1.5 (Function Roots and Minimization). The task "Find the roots of the function g(x) defined in Equation 1.3 and sketched in Figure 1.4" can – at least by the author's limited mathematical capabilities – hardly be solved manually.

g(x) = 5 + (ln(sin(x²)) + e^(tan x) + cos(x) − arctan(|x − 3|)) / x        (1.3)

However, we can transform it into the optimization problem "Minimize the objective function f1(x) = |0 − g(x)| = |g(x)|" or "Minimize the objective function f2(x) = (0 − g(x))² = (g(x))²".

12 http://en.wikipedia.org/wiki/Optimization_(mathematics) [accessed 2010-08-04]

The objective functions f1 and f2 provide us with a measure of how close an element x is to being a valid solution. Based on this formulation, we can hand any of the two objective functions to a method capable of solving numerical optimization problems. For the global optima x⋆ of the problem, f1(x⋆) = f2(x⋆) = 0 holds. If these methods indeed discover elements x⋆ with f1(x⋆) = f2(x⋆) = 0, we have answered the initial question, since the elements x⋆ are the roots of g(x).

[Figure 1.4: The graph of g(x) as given in Equation 1.3 with the four roots x⋆1 to x⋆4, plotted for x roughly between −5 and 5.]
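The transformation from root finding to minimization can be made concrete with a few lines of code. The following sketch is our own illustration and uses a simpler stand-in function instead of the g(x) from Equation 1.3; it minimizes f1(x) = |g(x)| by crude random sampling, which is enough to recover an approximate root.

    // Our own sketch: turn "find a root of g" into "minimize f1(x) = |g(x)|".
    import java.util.Random;

    public class RootsByMinimization {
      static double g(double x) { return x * x * x - 2.0 * x - 5.0; } // stand-in for Equation 1.3
      static double f1(double x) { return Math.abs(g(x)); }           // objective function: |g(x)|

      public static void main(String[] args) {
        Random rnd = new Random(1);
        double best = 0.0, bestF = f1(best);
        for (int i = 0; i < 1_000_000; i++) {          // sample candidate solutions in [-5, 5]
          double x = -5.0 + 10.0 * rnd.nextDouble();
          double fx = f1(x);
          if (fx < bestF) { best = x; bestF = fx; }    // keep the best candidate found so far
        }
        System.out.println("x* ~ " + best + ", |g(x*)| = " + bestF);
      }
    }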

1.2.3 Discrete and Mixed Problems

Of course, numerical problems may also be defined over discrete spaces, such as the Nn. Notice that, depending on the problem formulation, combinatorial optimization tasks can be considered as a class of such discrete optimization problems.

Example E1.6 (Truss Optimization). In a truss design optimization problem [18], as sketched in Figure 1.5, the locations of n possible beams (sketched as thin, light gray lines), a few fixed points (left hand side of the truss sketches), and a point where a weight F will be attached (right hand side of the truss sketches) are given. The thickness di of each of the i = 1..n beams which leads to the most stable truss whilst not exceeding a maximum volume of material V would be subject to optimization.

[Figure 1.5: A truss optimization problem like the one given in [18]: on the left, the truss structure with its fixed points, the possible beam locations (light gray), and the applied force F; on the right, a possible optimization result with chosen beam thicknesses (dark gray) and unused beams (thin light gray).]

Basically, we can choose the thicknesses of the beams in the truss arbitrarily. A structure such as the one sketched on the right hand side of the figure could result from a numerical optimization process: Some beams receive different non-zero thicknesses (illustrated with thicker lines), while some other beams have been erased from the design (they receive thickness di = 0). However, truss optimization can also become a discrete or even combinatorial problem: Not in all real-world design tasks can we choose the thicknesses of the beams arbitrarily. This may be possible when synthesizing the support structures for air plane wings or motor blocks. When building a truss from pre-defined building blocks, however, such as the beams available for house construction, the thicknesses of the beams are finite and, for example, di ∈ {10mm, 20mm, 30mm, 50mm, 100mm, 300mm} ∀i ∈ 1..n. In such a case, a problem such as the one sketched in Figure 1.5 becomes effectively combinatorial or, at least, a discrete integer programming task. Problems of this type are called integer programming problems:

Definition D1.10 (Integer Programming). An integer programming problem13 is an optimization problem which is defined over a finite or countable infinite numerical problem space (such as Nn or Zn) [1495, 2965].

Because of the wide range of possible solution structures in combinatorial tasks, techniques well capable of solving integer programming problems

13 http://en.wikipedia.org/wiki/Integer_programming [accessed 2010-08-04]

are not necessarily suitable for a given combinatorial task. The candidate solutions of combinatorial problems stem from finite sets and are discrete. They can thus always be expressed as discrete numerical vectors and could theoretically be solved with methods for numerical optimization. Combinatorial problems, however, are usually solved by applying combinatorial modifications to solution structures, such as changing the order of elements in a sequence, whereas algorithms for numerical optimization problems often apply numerical methods, such as adding vectors, multiplying them with some number, or rotating vectors. These two approaches are fundamentally different and thus, methods for real-valued problems are not necessarily good for integer-valued ones (and hence, also not for combinatorial tasks). The same holds the other way around: the class of integer programming problems is, of course, a member of the general numerical optimization tasks, but techniques for real-valued vector optimization are not automatically suitable for it. In theory, every discrete numerical problem can also be expressed as a continuous numerical problem, since any vector from the Rn can easily be transformed to a vector in the Zn by simply rounding each of its elements (a minimal code sketch of this rounding idea follows after Figure 1.6). Hence, the techniques for continuous optimization can theoretically be used for discrete numerical problems (integer programming) as well as for MIPs and for combinatorial problems.

Furthermore, there exist optimization problems which are combinations of the above classes. For instance, there may be a problem where a graph must be built which has edges with real-valued properties, i. e., a crossover of a combinatorial and a continuous optimization problem.

Definition D1.11 (Mixed Integer Programming). A mixed integer programming problem (MIP) is a numerical optimization problem where some variables stem from finite (or countable infinite) (numerical) spaces while others stem from uncountable infinite (numerical) spaces.

1.2.4 Summary

Figure 1.6 is not intended to provide a precise hierarchy of problem types; there are smooth transitions between many problem classes, and the type of a given problem is often subject to interpretation. Instead, it is only a rough sketch of the relations between them.

[Figure 1.6: A rough sketch of the problem type relations between Numerical Optimization Problems, Continuous Optimization Problems, Discrete Optimization Problems, Mixed Integer Programming Problems, Integer Programming Problems, and Combinatorial Optimization Problems.]
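The rounding idea mentioned above is simple enough to show in a few lines. The following sketch is our own illustration, not code from the book: a candidate from the continuous space R^n is mapped to the discrete space Z^n by rounding each of its elements.

    // Our own sketch of the R^n -> Z^n rounding mapping discussed in the text.
    import java.util.Arrays;

    public class RoundingMapping {
      static long[] toIntegerVector(final double[] x) {
        long[] z = new long[x.length];
        for (int i = 0; i < x.length; i++) {
          z[i] = Math.round(x[i]); // round each element to the nearest integer
        }
        return z;
      }

      public static void main(String[] args) {
        System.out.println(Arrays.toString(toIntegerVector(new double[] { 1.2, -3.7, 0.49 })));
        // prints [1, -4, 0]
      }
    }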

Hence, although algorithms for continuous optimization can theoretically be used for combinatorial optimization, it is quite clear that this is rarely the case in practice and would generally yield bad results. Furthermore, the precision with which numbers can be represented on a computer is finite, since the memory available in any physical device is limited. From this perspective, any continuous optimization problem, if solved on a physical machine, is a discrete problem as well and could thus be solved with approaches for combinatorial optimization. Again, it is quite easy to see that adjusting the values and sequences of bits in a floating point representation of a real number will probably lead to inferior results or, at least, to slower optimization speed than using advanced mathematical operations. Finally, there may be optimization tasks with both combinatorial and continuous characteristics. Example E1.6 provides a problem which can exist in a continuous and in a discrete flavor, both of which can be tackled with numerical methods, whereas the latter one can also be solved efficiently with approaches for discrete problems.

1.3 Classes of Optimization Algorithms

In this book, we will only be able to discuss a small fraction of the wide variety of Global Optimization techniques [1068, 2126]. Before digging any deeper into the matter, we will attempt to classify some of these algorithms in order to give a rough overview of what will come in Part III and Part IV. Please be aware that the idea of creating a precise taxonomy of optimization methods makes no sense. The area of Global Optimization is a living and breathing research discipline. Many contributions come from scientists from various research areas, ranging from physics to economy and from biology to the fine arts. Just as important are the numerous ideas developed by practitioners from engineering, industry, and business. Not only are there many hybrid approaches which combine several different techniques, there also exist various definitions for what things like metaheuristics, Soft Computing, Evolutionary Computation, artificial intelligence, and so on exactly are and which specific algorithms do belong to them and which do not. Furthermore, many of these fuzzy collective terms also widely overlap. Figure 1.7 sketches a rough taxonomy of Global Optimization methods as one possible way to classify optimizers according to their algorithm structure; it should thus be considered as just a rough sketch. In the remainder of this section, we will try to outline some of the terms used in Figure 1.7.

1.3.1 Algorithm Classes

Generally, optimization algorithms can be divided into two basic classes: deterministic and probabilistic algorithms.

1.3.1.1 Deterministic Algorithms

Definition D1.12 (Deterministic Algorithm). In each execution step of a deterministic algorithm, there exists at most one way to proceed. If no way to proceed exists, the algorithm has terminated.

For the same input data, deterministic algorithms will always produce the same results. This means that whenever they have to make a decision on how to proceed, they will come to the same decision each time the available data is the same.

[Figure 1.7 sketches the taxonomy as a tree: Optimization [609, 2126, 2181, 2714] splits into Deterministic Approaches – containing the state space Search Algorithms (Chapter 46) with uninformed searches (BFS, DFS, IDDFS) and informed searches (Greedy Search, A⋆ Search), Branch & Bound (Chapter 47), and the Cutting-Plane Method (Chapter 48) – and Probabilistic Approaches, most notably the Metaheuristics of Part III: Hill Climbing (Chapter 26), Simulated Annealing (Chapter 27), Evolutionary Computation with Evolutionary Algorithms (Chapter 28) – Genetic Algorithms (Chapter 29), Evolution Strategies (Chapter 30), Genetic Programming (Chapter 31, including SGP, LGP, GGGP, and graph-based GP), Evolutionary Programming (Chapter 32), Differential Evolution (Chapter 33), EDAs (Chapter 34), LCS (Chapter 35), Memetic Algorithms (Chapter 36) – and Swarm Intelligence – ACO (Chapter 37), RFD (Chapter 38), PSO (Chapter 39) – as well as Tabu Search (Chapter 40), Extremal Optimization (Chapter 41), GRASP (Chapter 42), Downhill Simplex (Chapter 43), Random Optimization (Chapter 44), and Harmony Search [1033].]

Figure 1.7: Overview of optimization algorithms.

Example E1.7 (Decision based on Objective Value). Let us assume that the optimization process has to select one of the two candidate solutions x1 and x2 for further investigation. The algorithm could base its decision on the value the objective function f subject to minimization takes on, similar to a statement such as "if f(x1) < f(x2) then investigate(x1), else investigate(x2)". If f(x1) < f(x2), then x1 is chosen and x2 otherwise. For each two points xi, xj ∈ X, the decision procedure is the same and deterministically chooses the candidate solution with the better objective value.

Deterministic algorithms are thus used when decisions can efficiently be made on the data available to the optimization process. This is the case if a clear relation between the characteristics of the possible solutions and their utility for a given problem exists. Then, the search space can efficiently be explored with, for example, a divide and conquer scheme14.

1.3.1.2 Probabilistic Algorithms

There may be situations, however, where deterministic decision making may prevent the optimizer from finding the best results. If the relation between a candidate solution and its "fitness" is not obvious, dynamically changing, or too complicated, or if the scale of the search space is very high, applying many of the deterministic approaches becomes less efficient and often infeasible. Then, probabilistic optimization algorithms come into play. Here lies the focus of this book.

Definition D1.13 (Probabilistic Algorithm). A randomized15 (or probabilistic or stochastic) algorithm includes at least one instruction that acts on the basis of random numbers. In other words, a probabilistic algorithm violates the constraint of determinism [1281, 1282, 1919, 1966, 1967].

Generally, exact algorithms can be much more efficient than probabilistic ones in many domains. Probabilistic algorithms have the further drawback of not being deterministic, i. e., they may produce different results in each run even for the same input. Yet, in situations like the one described in Example E1.8, randomized decision making can be very advantageous.

Example E1.8 (Probabilistic Decision Making (Example E1.7 Cont.)). In Example E1.7, we described a possible decision step in a deterministic optimization algorithm. Although this decision may be the right one in most cases, it may be possible that sometimes, even in the case that f(x1) < f(x2) actually holds, it makes sense to investigate a less fit candidate solution. Hence, it could be a good idea to rephrase the decision to "if f(x1) < f(x2), then choose x1 with 90% probability, but in 10% of the situations, continue with x2". We could then draw a random number uniformly distributed in [0, 1); if this number is less than 0.9, we would pick x1 and otherwise investigate x2. Matter of fact, such kind of randomized decision making distinguishes the Simulated Annealing algorithm (Chapter 27), which has been proven to be able to always find the global optimum (possibly in infinite time), from simple Hill Climbing (Chapter 26), which does not have this feature.

Early work in this area, which now has become one of the most important research fields in optimization, was started at least sixty years ago. The foundations of many of the efficient methods currently in use were laid by researchers such as Bledsoe and Browning [327], Friedberg [995], Robbins and Monro [2313], and Bremermann [407].

14 http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm [accessed 2007-07-09]
15 http://en.wikipedia.org/wiki/Randomized_algorithm [accessed 2007-07-03]
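The randomized decision rule of Example E1.8 can be written down in a few lines of code. The following is our own illustrative sketch; the toy objective function and the candidate values in main() are made up purely for demonstration.

    // Our own sketch of the 90%/10% randomized decision from Example E1.8.
    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    public class RandomizedDecision {
      static double decide(double x1, double x2, DoubleUnaryOperator f, Random rnd) {
        double better = (f.applyAsDouble(x1) < f.applyAsDouble(x2)) ? x1 : x2;
        double worse  = (better == x1) ? x2 : x1;
        // with 90% probability keep the better candidate, otherwise investigate the worse one
        return (rnd.nextDouble() < 0.9) ? better : worse;
      }

      public static void main(String[] args) {
        DoubleUnaryOperator f = x -> (x - 3.0) * (x - 3.0); // toy objective, minimal at x = 3
        System.out.println(decide(2.5, 4.0, f, new Random()));
      }
    }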

1.3.2 Monte Carlo Algorithms

Most probabilistic algorithms used for optimization are Monte Carlo-based approaches (see Section 12.3). They trade guaranteed global optimality of the solutions in for a shorter runtime. This does not mean that the results obtained by using them are incorrect – they may just not be the global optima. Still, a result a little bit inferior to the best possible solution x⋆ is still better than waiting 10^100 years until x⋆ is found16. An umbrella term closely related to Monte Carlo algorithms is Soft Computing17 (SC), which summarizes all methods that find solutions for computationally hard tasks (such as NP-complete problems, see Section 12.2.2) in relatively short time, again by trading in optimality for higher computation speed [760, 2685].

1.3.3 Heuristics and Metaheuristics

Most efficient optimization algorithms which search for good individuals in some targeted way, i. e., all methods better than uninformed searches, utilize information from objective (see Section 2.2) or heuristic functions.

1.3.3.1 Heuristics

Heuristics used in optimization are functions that rate the utility of an element and provide support for the decision regarding which element out of a set of possible solutions is to be examined next. Deterministic algorithms usually employ heuristics to define the processing order of the candidate solutions. Many probabilistic methods, on the other hand, only consider those elements of the search space in further computations that have been selected by the heuristic while discarding the rest.

Definition D1.14 (Heuristic). A heuristic18 [1242, 1887, 2276] function φ is a problem dependent approximate measure for the quality of one candidate solution.

The term heuristic stems from the Greek heuriskein, which translates to to find or to discover. Heuristics can be considered as the equivalent of fitness or objective functions in (deterministic) state space search methods [1467] (see Section 46.3 on page 518). Approaches using such functions are called informed search methods (which will be discussed in Section 46.3 on page 518). A heuristic may, for instance, estimate the number of search steps required from a given solution to the optimal result. An objective function f(x) usually has a clear meaning denoting an absolute utility of a candidate solution x. Different from that, heuristic values φ(p) often represent more indirect criteria such as the aforementioned search costs required to reach the global optimum [1467] from a given individual. It is quite clear that heuristics are problem class dependent.

Example E1.9 (Heuristic for the TSP from Example E2.2). A heuristic function in the Traveling Salesman Problem involving five cities in China, as defined in Example E2.2 on page 45, could, for instance, state "The total distance of this candidate solution is 1024km longer than the optimal tour." or "You need to swap no more than three cities in this tour in order to find the optimal solution". Such a statement has a different quality from an objective function just stating the total tour length and being subject to minimization.

We will discuss heuristics in more depth in Section 46.3.

16 Kindly refer to Chapter 12 for a more in-depth discussion of complexity.
17 http://en.wikipedia.org/wiki/Soft_computing [accessed 2007-09-17]
18 http://en.wikipedia.org/wiki/Heuristic_%28computer_science%29 [accessed 2007-07-03]

1.3.3.2 Metaheuristics

This knowledge is a luxury often not available, since heuristics usually involve more knowledge about the relation of the structure of the candidate solutions and their utility, as may have become clear from Example E1.9. In many cases, the only choices an optimization algorithm has are:
1. Which candidate solutions should be investigated further?
2. Which search operations should be applied to them? (If more than one operator is available.)
The optimization algorithms can produce new candidate solutions by applying search operators to the currently available genotypes. Yet, the exact result of the operators is (especially in probabilistic approaches) not clear in advance and has to be measured with the objective functions, as sketched in Figure 1.8. The optimization algorithms thus have a "horizon" limited to the search operations and the objective functions with unknown characteristics.

[Figure 1.8 sketches a population Pop(t) of candidate solutions being handed to a black box which computes fitness or heuristic values f(x ∈ X), selects the best solutions, and creates new ones via search operations, yielding the next population Pop(t+1).]

Figure 1.8: Black box treatment of objective functions in metaheuristics.

Definition D1.15 (Metaheuristic). A metaheuristic20 is a method for solving general classes of problems. It combines utility measures such as objective functions or heuristics in an abstract and hopefully efficient way, usually without utilizing deeper insight into their structure, i. e., by treating them as black box-procedures [342, 1068, 1103].

Metaheuristics obtain knowledge on the structure of the problem space and the objective functions by utilizing statistics obtained from stochastically sampling the search space. Many metaheuristics are inspired by a model of some natural phenomenon or physical process. Simulated Annealing, for instance, decides which candidate solution is used during the next search step according to the Boltzmann probability factor of atom configurations of solidifying metal melts. The Raindrop Method emulates the waves on a sheet of water after a raindrop fell into it. There also are techniques without a direct real-world role model, like Tabu Search and Random Optimization. Unified models of metaheuristic optimization procedures have been proposed by Osman [2103], Rayward-Smith [2275], Vaessens et al. [2761, 2762], and Taillard et al. [2650]. It should be mentioned that optimization and artificial intelligence19 (AI) merge seamlessly with each other. This becomes especially clear when considering that one of the most important research areas of AI is the design of good heuristics for search, especially for the state space search algorithms which are here listed as deterministic optimization methods. The A⋆ search, a deterministic informed search approach, is considered to be a metaheuristic too (although it incorporates more knowledge about the structure of the heuristic functions than, for example, Evolutionary Algorithms do). Therefore, metaheuristics are not necessarily probabilistic methods; still, in this book, we are interested in probabilistic metaheuristics the most. Anyway, knowing the proximity of the meaning of the terms heuristics and objective functions, it may as well be possible to consider many of the following, randomized approaches to be informed searches (which brings us back to the fuzziness of terms).

1.3.3.2.1 Summary: Difference between Heuristics and Metaheuristics

The terms heuristic, objective function, fitness function, and metaheuristic may easily be mixed up, and it is maybe not entirely obvious which is which and where the difference lies. Therefore, we here want to give a short outline of the main differences:
1. Objective Function. Problem-specific absolute/direct quality measure which rates one aspect of a candidate solution.
2. Heuristic. Problem-specific measure of the potential of a solution as a step towards the global optimum. May be absolute/direct (describe the utility of a solution) or indirect (e. g., approximate the distance of the solution to the optimum in terms of search steps).
3. Fitness Function. Secondary utility measure which may combine information from a set of primary utility measures (such as objective functions or heuristics) into a single rating for an individual. This process may also take place relative to a set of given individuals, i. e., rate the utility of a solution in comparison to another set of solutions (such as a population in Evolutionary Algorithms). The fitness function will be introduced in Section 28.1.2.5 and Section 28.3 on page 274 in detail.
4. Metaheuristic. General, problem-independent approach to select solutions for further investigation in order to solve an optimization problem. It therefore uses a (primary or secondary) utility measure such as a heuristic, objective function, or fitness function.

1.3.4 Evolutionary Computation

An important class of probabilistic Monte Carlo algorithms is Evolutionary Computation21 (EC). It encompasses all randomized metaheuristic approaches which iteratively refine a set (population) of multiple candidate solutions. Some of the most important EC approaches are Evolutionary Algorithms and Swarm Intelligence, which will be discussed in-depth in this book. Swarm Intelligence is inspired by the fact that natural systems composed of many independent, simple agents – ants in case of Ant Colony Optimization (ACO) and birds or fish in case of Particle Swarm Optimization (PSO) – are often able to find pieces of food or shortest-distance routes very efficiently. Evolutionary Algorithms, on the other hand, copy the process of natural evolution and treat candidate solutions as individuals which compete and reproduce in a virtual environment. Generation after generation, these individuals adapt better and better to the environment (defined by the user-specified objective function(s)) and thus become more and more suitable solutions to the problem at hand.

19 http://en.wikipedia.org/wiki/Artificial_intelligence [accessed 2007-09-17]
20 http://en.wikipedia.org/wiki/Metaheuristic [accessed 2007-07-03]
21 http://en.wikipedia.org/wiki/Evolutionary_computation [accessed 2007-09-17]

1.3.5 Evolutionary Algorithms

Evolutionary Algorithms can again be divided into very diverse approaches, spanning from Genetic Algorithms (GAs), which try to find bit- or integer-strings of high utility, and Evolution Strategies (ESs), which do the same with vectors of real numbers, to Genetic Programming (GP), where optimal tree data structures or programs are sought, and also including Memetic Algorithms (MAs), which are EAs hybridized with other search algorithms. The variety of Evolutionary Algorithms is the focus of this book.

1.4 Classification According to Optimization Time

The taxonomy just introduced classifies the optimization methods according to their algorithmic structure and underlying principles, in other words, from the viewpoint of theory. A software engineer or a user who wants to solve a problem with such an approach is, however, more interested in its "interfacing features" such as speed and precision. Speed and precision are often conflicting objectives, at least in terms of probabilistic algorithms. A general rule of thumb is that you can gain improvements in the accuracy of optimization only by investing more time. Scientists in the area of global optimization try to push this Pareto frontier22 further by inventing new approaches and enhancing or tweaking existing ones. When it comes to time constraints and hence the required speed of the optimization algorithm, we can distinguish two types of optimization use cases.

Definition D1.16 (Online Optimization). Online optimization problems are tasks that need to be solved quickly, in a time span usually ranging between ten milliseconds and a few minutes.

In order to find a solution in this short time, optimality is often significantly traded in for speed gains. Examples of online optimization are robot localization, load balancing, services composition for business processes, or updating a factory's machine job schedule after new orders came in. Online optimization tasks are often carried out repetitively – new orders will, for instance, continuously arrive in a production facility and need to be scheduled to machines in a way that minimizes the waiting time of all jobs.

Example E1.10 (Robotic Soccer). In the games of the RoboCup competition, two teams, each consisting of a fixed number of robotic players, oppose each other on a soccer field. The rules are similar to human soccer and the team which scores the most goals wins. Figure 1.9 shows some of the robots of the robotic soccer team CarpeNoctem23 [179] as they play in the RoboCup German Open 2009. When a soccer playing robot approaches the goal of the opposing team with the ball, it has only a very limited time frame to decide what to do [2518].

22 Pareto frontiers have been discussed in Section 3.5 on page 65.
23 http://carpenoctem.das-lab.net

[Figure 1.9 shows a photograph of the robots on the playing field.]
Figure 1.9: The robotic soccer team CarpeNoctem (on the right side) on the fourth day of the RoboCup German Open 2009.

Should it shoot directly at the goal? Should it try to pass the ball to a team mate? Should it continue to drive towards the goal in order to get into a better shooting position? To find a sequence of actions which maximizes the chance of scoring a goal in a continuously changing world is an online optimization problem. Not only must the problem be solved in a few milliseconds, the degrees of freedom in the real world, i. e., the points where the robot can move or shoot the ball to, are basically infinite. It is clear that in such situations, no complex optimization algorithm can be applied. Instead, only a smaller set of possible alternatives can be tested or, alternatively, predefined programs may be processed, thus ignoring the optimization character of the decision making. If a robot takes too long to make a decision, opposing and cooperating robots may already have moved to different locations and the decision may have become outdated. Additionally, this process often involves finding an agreement between different, independent robots, each having a possibly different view on what is currently going on in the world.

Definition D1.17 (Offline Optimization). In offline optimization problems, time is not so important and a user is willing to wait maybe up to days or weeks if she can get an optimal or close-to-optimal result.

Such problems regard, for example, design optimization, some data mining tasks, or creating long-term schedules for transportation crews. These optimization processes will usually be carried out only once in a long time. Before doing anything else, one must be sure about to which of these two classes the problem to be solved belongs. There are two measures which characterize whether an optimization algorithm is suited to the timing constraints of a problem:
1. The time consumption of the algorithm itself.
2. The number of evaluations τ until the optimum (or a sufficiently good candidate solution) is discovered, i. e., the first hitting time FHT (or its expected value).
An algorithm with high internal complexity and a low FHT may be suitable for a problem where evaluating the objective functions takes long. On the other hand, it may be less feasible than a simple approach if the quality of a candidate solution can be determined quickly.

if fbp (x⋆ ) = 0.1 given on 25. it takes the next number from the genotype and checks whether the object identified by this number still fits into the current bin bin. Obviously. it is not possible to pack the n specified objects into the k bins of size b. we want to give a short introductory example on how metaheuristic optimization algorithms may look like and of which modules they can be composed. If so.5. This way.e.4 on page 70).3) starts with a tuple of k empty sets and sequentially processes the genotypes g from the beginning to the end.1 is any good in solving the problem. 5 ∈ x[0].. bin is increased. Each set can hold numbers from 1. Instead of exploring this space directly.2 and in Algorithm 1. Algorithm 1.1. the number is placed into the set in the tuple x at index bin. However.3.1 uses a different search space G (see Section 4. this genotype-phenotype mapping gpm : G → X (see Section 4. each so-called genotype. Otherwise. The translation between the genotypes g ∈ G and the candidate solutions x ∈ X is deterministic. This decision problem can be translated into an optimization problem where it is the goal to discover the x⋆ where the bin capacities are exceeded the least. we will also introduced unifying approaches which allow us to transform multi-objective problems into single-objective ones and to gain a joint perspective on constrained and unconstrained optimization. As stated in Example E1. for example. the opposite is the case. Example E1. an ) to (at most) k bins each able to hold a volume of b exists where the capacity of no bin is exceeded. In each processing step. is a permutation of the first n natural numbers.3 on page 55.2 on page 26. . and constraint optimization will be discussed in depth in Section 3.1. the answer is “No. Each of these encoded candidate solutions. to minimize the objective function fbp given in Equation 1. This representation is translated between Lines 13 and 22 to the tuple-of-sets representation which can be evaluated by the objective function fbp . It should be noted that we do not claim that Algorithm 1. the goal of bin packing is to whether an assignment x of n objects (having the sizes a1 . if there are empty bins left. such that optimize one or multiple objective functions while adhering to additional constraints (see Section 3.. The sequence of these numbers stands for the assignment order of objects to bins.1. 2. we represented a possible assignment of the objects to bins as a tuple x consisting of k sets. It is known that bin packing is N P-hard. the answer to the decision problem is “Yes. Objective functions are discussed in Section 2. Basically.1. an assignment which does not violate the constraints exist..1 is based in the bin packing Example E1. which means that there currently exists no algorithm which is always better than enumerating all possible solutions and testing each .5 Number of Optimization Criteria Optimization algorithms can be divided into 1. a2 . for one specific x. the first k − 1 bins are always filled to at most their capacity and only the last bin may overflow.1. i. matter of fact.1). The set X of all such possible solutions (s) x is called problem space and discussed in Section 2. If.n.1 on page 51. such which try to find the best values of single objective functions f (see Section 3. multi-objective.” and otherwise. In this chapter. NUMBER OF OPTIMIZATION CRITERIA 39 1. 1. fbp is evaluated in Line 23. 
then the fifth object should be placed into the first bin, and 3 ∈ x[3] means that the third object is to be put into the fourth bin. The set X of all such possible solutions x is called the problem space and is discussed in Section 2.1. However, Algorithm 1.1 does not explore this space directly. Instead, it uses a different search space G (see Chapter 4): each encoded candidate solution, each so-called genotype, is a permutation of the first n natural numbers. The sequence of these numbers stands for the assignment order of objects to bins.

The translation between the genotypes g ∈ G and the candidate solutions x ∈ X is deterministic. This genotype-phenotype mapping gpm : G → X (see Section 4.3) is carried out between Lines 13 and 22 of Algorithm 1.1, where the permutation representation is translated to the tuple-of-sets representation which can be evaluated by the objective function fbp. The mapping starts with a tuple of k empty sets and sequentially processes the genotype g from the beginning to the end. In each processing step, it takes the next number from the genotype and checks whether the object identified by this number still fits into the current bin bin. If so, the number is placed into the set in the tuple x at index bin. Otherwise, if there are empty bins left, bin is increased. This way, the first k − 1 bins are always filled to at most their capacity and only the last bin may overflow. fbp is evaluated in Line 23. Objective functions are discussed in Section 2.2.

It should be noted that we do not claim that Algorithm 1.1 is any good in solving the problem – matter of fact, the opposite is the case. However, it possesses typical components which can be found in most metaheuristics and can relatively easily be understood. It is known that bin packing is NP-hard, which means that there currently exists no algorithm which is always better than enumerating all possible solutions and testing each

Algorithm 1.1: x⋆ ←− optimizeBinPacking(k, b, a)

  Input: k ∈ N1: the number of bins
  Input: b ∈ N1: the size of each single bin
  Input: a1, a2, ..., an ∈ N1: the sizes of the n objects
  Data: g, g⋆ ∈ G: the current and best genotype so far
  Data: x, x⋆ ∈ X: the current and best phenotype so far
  Data: y, y⋆ ∈ R+: the current and best objective value
  Output: x⋆ ∈ X: the best phenotype found

   1  begin
   2    g⋆ ←− (1, 2, ..., n)                // initialization: nullary search operation searchOp : ∅ → G (see Section 4.2)
   3    y⋆ ←− +∞
   4    repeat
   5      g ←− g⋆                            // perform a random search step
   6      i ←− ⌊randomUni[1, n)⌋
   7      repeat
   8        j ←− ⌊randomUni[1, n)⌋
   9      until i ≠ j
  10      t ←− g[i]                          // search step: unary search operation searchOp : G → G (see Section 4.2)
  11      g[i] ←− g[j]
  12      g[j] ←− t
  13      x ←− (∅, ∅, ..., ∅)                // k empty sets; genotype-phenotype mapping gpm : G → X (see Section 4.3)
  14      bin ←− 0
  15      sum ←− 0
  16      for i ←− 0 up to n − 1 do
  17        j ←− g[i]
  18        if ((sum + aj) > b) ∧ (bin < (k − 1)) then
  19          bin ←− bin + 1
  20          sum ←− 0
  21        x[bin] ←− x[bin] ∪ {j}
  22        sum ←− sum + aj
  23      y ←− fbp(x)                        // compute objective value: objective function f : X → R (see Section 2.2)
  24      if y < y⋆ then                     // the search logic: always keep the better point (What is good? see Chapter 3)
  25        g⋆ ←− g
  26        x⋆ ←− x
  27        y⋆ ←− y
  28    until y⋆ = 0                         // termination criterion (see Chapter 6)
  29    return x⋆

of them. Our Algorithm 1.1 tries to explore the search space by starting with putting the objects into the bins in their original order (Line 2). This initialization step can be viewed as a nullary search operation. In each optimization step, the algorithm then randomly exchanges two objects (Lines 5 to 12). This operation basically is a unary search operation which maps the elements of the search space G to new elements in G, i.e., creates new genotypes g. The basic definitions for search operations are given in Section 4.2 on page 83. In Line 24, the optimizer compares the objective value of the best assignment found so far with the objective value y of the candidate solution x decoded from the newly created genotype g with the genotype-phenotype mapping "gpm". A discussion on what the term better actually means in optimization is given in Chapter 3. If the new phenotype x is better than the old one, the search will from now on use its corresponding genotype (g⋆ ←− g) for creating new candidate solutions with the search operation and remembers x as well as its objective value y. The search process in Algorithm 1.1 continues until a perfect packing, i.e., one with no overflowing bin, is discovered. This termination criterion (see Chapter 6), checked in Line 28, is not chosen too wisely, since if no solution can be found, the algorithm will run forever. Effectively, this Hill Climbing approach (see Chapter 26 on page 229) is thus a Las Vegas algorithm (see Definition D12.2 on page 148). Algorithm 1.1 here served as an example to show the general direction of where we will be going: In the following chapters, we will put a simple and yet comprehensive theory around the structure of metaheuristics. We will give clear definitions for the typical modules of an optimization algorithm and outline how they interact.
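The following is a compact Java sketch of this hill climber, assembled from the modules just described. It is only an illustration: the bin packing instance, the step limit which replaces the pure "until y⋆ = 0" termination criterion of Algorithm 1.1, and the interpretation of fbp as the total overflow of the decoded bins are assumptions made here, not taken verbatim from the book.

    import java.util.Arrays;
    import java.util.Random;

    // Illustrative Java version of the random-swap hill climber from Algorithm 1.1.
    public class BinPackingHillClimber {

      // objective fbp: total overflow over all bins (0 means a feasible packing)
      static int fbp(int[][] bins, int[] a, int b) {
        int overflow = 0;
        for (int[] bin : bins) {
          int sum = 0;
          for (int obj : bin) sum += a[obj];
          overflow += Math.max(0, sum - b);
        }
        return overflow;
      }

      // genotype-phenotype mapping gpm: decode a permutation g into k bins (cf. Lines 13 to 22)
      static int[][] gpm(int[] g, int k, int b, int[] a) {
        int[] count = new int[k];
        int[][] bins = new int[k][g.length];
        int bin = 0, sum = 0;
        for (int obj : g) {
          if (sum + a[obj] > b && bin < k - 1) { bin++; sum = 0; }
          bins[bin][count[bin]++] = obj;
          sum += a[obj];
        }
        for (int i = 0; i < k; i++) bins[i] = Arrays.copyOf(bins[i], count[i]);
        return bins;
      }

      public static void main(String[] args) {
        int k = 3, b = 10;
        int[] a = {5, 5, 4, 3, 3, 4, 2, 2, 2};   // object sizes, sum = 30 = k*b, so a perfect packing exists
        Random rnd = new Random(42);

        int[] gBest = new int[a.length];          // nullary search operation: the identity permutation
        for (int i = 0; i < gBest.length; i++) gBest[i] = i;
        int yBest = fbp(gpm(gBest, k, b, a), a, b);

        for (int step = 0; step < 100_000 && yBest > 0; step++) {
          int[] g = gBest.clone();                // unary search operation: swap two random positions
          int i = rnd.nextInt(g.length), j;
          do { j = rnd.nextInt(g.length); } while (i == j);
          int t = g[i]; g[i] = g[j]; g[j] = t;

          int y = fbp(gpm(g, k, b, a), a, b);     // evaluate the decoded candidate solution
          if (y < yBest) { gBest = g; yBest = y; }// always keep the better point
        }
        System.out.println("best overflow = " + yBest);
        System.out.println("packing       = " + Arrays.deepToString(gpm(gBest, k, b, a)));
      }
    }

Note that under the decoding of Lines 13 to 22 only the last bin can overflow, so summing the overflow of all bins coincides with measuring the overflow of that last bin.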

Tasks T1

1. Name five optimization problems which can be encountered in daily life. Describe the criteria which are subject to optimization, whether they should be maximized or minimized, and how the investigated space of possible solutions looks like. [5 points]

2. Name five optimization problems which can be encountered in engineering, business, or science. Describe them in the same way as requested in Task 1. [5 points]

3. Classify the problems from Task 1 and 2 into the classes introduced in Section 1.3, i.e., numerical, combinatorial, discrete, and mixed integer programming. Check if different classifications are possible for the problems and, if so, outline how the problems should be formulated in each case. The truss optimization problem (see Example E1.6), for example, is usually considered to be a numerical problem since the thickness of each beam is a real value. If there is, however, a finite set of beam thicknesses we can choose from, it becomes a discrete problem and could even be considered as a combinatorial one. [10 points]

4. Implement Algorithm 1.1 on page 40 in a programming language of your choice. Apply your implementation to the bin packing instance given in Figure 1.2 on page 25, i.e., to the problem with k = 5, b = 6, and the nine object sizes a given there (hence n = 9). Does the algorithm work? Describe your observations; use a step-by-step debugger if necessary. [10 points]

5. Find at least three features which make Algorithm 1.1 on page 40 rather unsuitable for practical applications. Describe these drawbacks, their impact, and point out possible remedies. You may use an implementation of the algorithm as requested in Task 4 for gaining experience with what is problematic for the algorithm. [20 points]

6. What are the advantages and disadvantages of randomized/probabilistic optimization algorithms? [4 points]

7. What are the differences between solving general optimization problems and specific ones? Discuss especially the difference between black box optimization and solving functions which are completely known and maybe even can be differentiated. [10 points]

8. In which way are the requirements for online and offline optimization different? Does a need for online optimization influence the choice of the optimization algorithm and the general algorithm structure which can be used, in your opinion? [5 points]

Chapter 2
Problem Space and Objective Functions

2.1 Problem Space: How does it look like?

The first step which has to be taken when tackling any optimization problem is to define the structure of the possible solutions. If we are minimizing a simple mathematical function, the solution will be a real number, and if we wish to determine the optimal shape of an airplane's wing, we are looking for a blueprint describing it.

Definition D2.1 (Problem Space). The problem space (phenome) X of an optimization problem is the set containing all elements x which could be its solution.

Usually, more than one problem space can be defined for a given optimization problem. A few lines before, we said that as problem space for minimizing a mathematical function, the real numbers R would be fine. Then again, we could as well restrict ourselves to the natural numbers N0 or widen the search to the whole complex plane C. This choice has major impact: On one hand, it determines which solutions we can possibly find. On the other hand, it also has subtle influence on the way in which an optimization algorithm can perform its search. Between each two different points in R, there are infinitely many other numbers, while in N0, there are not. On the other hand, unlike N0 and R, there is no total order defined on C.

The problem space X is often restricted by practical constraints. It is, for instance, impossible to take all real numbers into consideration in the minimization process of a real function. On our off-the-shelf CPUs or with the Java programming language, we can only use 64 bit floating point numbers. With them, numbers can only be expressed up to a certain precision. If the computer can only handle numbers up to 15 decimals, an element x = 1 + 10^-16 cannot be discovered.

Definition D2.2 (Candidate Solution). A candidate solution (phenotype) x is an element of the problem space X of a certain optimization problem.

In this book, we will especially focus on evolutionary optimization algorithms. One of the oldest such methods, the Genetic Algorithm, is inspired by natural evolution and re-uses many terms from biology. In dependence on Genetic Algorithms, we synonymously refer to the problem space as phenome and to candidate solutions as phenotypes.
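As a small illustration of this precision limit (a minimal sketch; the class name is arbitrary and not from the book), the following Java program shows that 1 + 10^-16 is indistinguishable from 1 when represented as a 64 bit floating point number:

    public class PrecisionLimit {
      public static void main(String[] args) {
        double x = 1.0 + 1e-16;            // below the machine epsilon of IEEE 754 doubles (~2.2e-16)
        System.out.println(x == 1.0);      // prints: true -> 1 + 10^-16 collapses to 1
        System.out.println(Math.ulp(1.0)); // the smallest representable step at 1.0: 2.220446049250313E-16
      }
    }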

2.2 Objective Functions: Is it good?

The next basic step when defining an optimization problem is to define a measure for what is good. Such measures are called objective functions. Usually, they take a candidate solution x from the problem space X and return a value which rates its quality.

Definition D2.3 (Objective Function). An objective function f : X → R is a mathematical function which is subject to optimization.

In Listing 55.8 on page 712, you can find a Java interface resembling Definition D2.3. Usually, objective functions are subject to minimization. In other words, the smaller f(x), the better is the candidate solution x ∈ X. Notice that although we state that an objective function is a mathematical function, this is not always fully true. In Section 51.10 on page 646, we introduce the concept of functions as functional binary relations (see ?? on page ??) which assign to each element x from the domain X one and only one element from the codomain (in the case of the objective functions, the codomain is R). However, the evaluation of the objective functions may incorporate randomized simulations (Example E2.3) or even human interaction (Example E2.6). In such cases, the functionality constraint may be violated. For the purpose of simplicity, let us, however, stick with the concept given in Definition D2.3.

In the following, we will use several examples to build an understanding of what the three concepts problem space X, candidate solution x, and objective function f mean. These examples should give an impression of the variety of possible ways in which they can be specified and structured.

Example E2.1 (A Stone's Throw). A very simple optimization problem is the question: Which is the best velocity with which I should throw¹ a stone (at an α = 15° angle) so that it lands exactly 100m away (assuming uniform gravity and the absence of drag and wind)? Everyone of us has solved countless such tasks in school and it probably never came to our mind to consider them as optimization problems. The problem space X of this problem equals the real numbers R and we can define an objective function f : R → R which denotes the absolute difference between the actual throw distance d(x) and the target distance 100m when a stone is thrown with x m/s, as sketched in Figure 2.1:

    f(x) = |100m − d(x)|                                                   (2.1)
    d(x) = x² · sin(2α) / g                                                (2.2)
         ≈ 0.051 s²/m · x²    (with g ≈ 9.81 m/s², α = 15°)                (2.3)

Figure 2.1: A sketch of the stone's throw in Example E2.1.

¹ http://en.wikipedia.org/wiki/Trajectory [accessed 2009-06-13]
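A minimal Java sketch of this objective function is given below. It only illustrates Equations 2.1 to 2.3; the class and method names are chosen here for illustration and do not stem from the book's own code listings.

    // Illustrative objective function for the stone's throw of Example E2.1.
    public class StoneThrow {
      static final double G = 9.81;                    // gravity in m/s^2
      static final double ALPHA = Math.toRadians(15);  // throw angle

      // throw distance d(x) for launch speed x in m/s (no drag, uniform gravity)
      static double d(double x) {
        return x * x * Math.sin(2 * ALPHA) / G;
      }

      // objective function f(x) = |100m - d(x)|, subject to minimization
      static double f(double x) {
        return Math.abs(100.0 - d(x));
      }

      public static void main(String[] args) {
        System.out.println(f(44.28)); // close to 0: 44.28 m/s almost exactly reaches 100m
        System.out.println(f(30.0));  // clearly worse: the stone falls short
      }
    }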

This problem can easily be solved and we can obtain the (globally) optimal solution x⋆, since we know that the best value which f can take on is zero:

    f(x⋆) = 0 ≈ |100m − 0.051 s²/m · (x⋆)²|                                (2.4)
    0.051 s²/m · (x⋆)² ≈ 100m                                              (2.5)
    x⋆ ≈ sqrt(100m / 0.051 s²/m) = sqrt(1960.78 m²/s²) ≈ 44.28 m/s         (2.6)

Example E2.1 was very easy: We had a very clear objective function f for which even the precise mathematical definition was available, and the problem space, the set of possible solutions, was very simple since it equaled the (positive) real numbers.

Example E2.2 (Traveling Salesman Problem). The Traveling Salesman Problem [118, 1124, 1694] was first mentioned in 1831 by Voigt [2816] and a similar (historic) problem was stated by the Irish mathematician Hamilton [1024]. Its basic idea is that a salesman wants to visit n cities in the shortest possible time under the assumption that no city is visited twice and that he will again arrive at the origin by the end of the tour. This problem can be formalized with the aid of graph theory (see Chapter 52) as follows:

Definition D2.4 (Traveling Salesman Problem). The goal of the Traveling Salesman Problem (TSP) is to find a cyclic path of minimum total weight² which visits all vertices of a weighted graph.

Assume that we have the task to solve the Traveling Salesman Problem given in Figure 2.2, i.e., to find the shortest cyclic tour through Beijing, Chengdu, Guangzhou, Hefei, and Shanghai. The pairwise distances of the cities are known or, if not known beforehand, could have easily been obtained from literature. The (symmetric) distance matrix³ for computing a tour's length is given in the figure.

Figure 2.2: An example for Traveling Salesman Problems; the distance matrix w(a, b):

                 Beijing    Chengdu    Guangzhou   Hefei
    Chengdu      1854 km
    Guangzhou    2174 km    1954 km
    Hefei        1044 km    1615 km    1257 km
    Shanghai     1244 km    2095 km    1529 km     472 km

Here, different from Example E2.1, the set of possible solutions X is not as trivial. Instead of being a mapping from the real numbers to the real numbers, the objective function f takes

² see the corresponding equation in Chapter 52.
³ Based on data from the routing utility GoogleMaps (http://maps.google.com/ [accessed 2009-06-16]); see also page 651.

one possible permutation of the five cities as parameter; X is the set of all such permutations:

    X = Π({Beijing, Chengdu, Guangzhou, Hefei, Shanghai})                  (2.7)

Since the tours which we want to find are cyclic, we can, for instance, assume that the tour starts in Hefei. Then it will also end in Hefei, and each possible solution is completely characterized by the sequence in which Beijing, Chengdu, Guangzhou, and Shanghai are visited. Hence, we only need to take the permutations of four cities into consideration. We can furthermore halve the search space by recognizing that it plays no role for the total distance in which direction we take our tour x ∈ X: a tour (Hefei → Shanghai → Beijing → Chengdu → Guangzhou → Hefei) has the same length and is equivalent to (Hefei → Guangzhou → Chengdu → Beijing → Shanghai → Hefei). In our case, this avails to 12 unique permutations.

The objective function f : X → R – the criterion subject to optimization – is the total distance of the tours. We again want to minimize it. Computing it is rather simple: It equals the sum of the distances between the consecutive cities x[i] in the list x representing the permutation, plus the distances to the start and end point Hefei:

    f(x) = w(Hefei, x[0]) + Σ_{i=0}^{2} w(x[i], x[i+1]) + w(x[3], Hefei)   (2.8)

Here, different from Example E2.1, we cannot simply solve the problem analytically and compute the optimal result x⋆. Instead, we will determine x⋆ with a so-called brute force⁴ approach: We enumerate all possible solutions and their corresponding objective values and then pick the best one.

Table 2.1: The unique tours for Example E2.2, both starting and ending in Hefei.

    tour xi                                                                f(xi)
    x1:  Hefei → Beijing → Chengdu → Guangzhou → Shanghai → Hefei          6853 km
    x2:  Hefei → Beijing → Chengdu → Shanghai → Guangzhou → Hefei          7779 km
    x3:  Hefei → Beijing → Guangzhou → Chengdu → Shanghai → Hefei          7739 km
    x4:  Hefei → Beijing → Guangzhou → Shanghai → Chengdu → Hefei          8457 km
    x5:  Hefei → Beijing → Shanghai → Chengdu → Guangzhou → Hefei          7594 km
    x6:  Hefei → Beijing → Shanghai → Guangzhou → Chengdu → Hefei          7386 km
    x7:  Hefei → Chengdu → Beijing → Guangzhou → Shanghai → Hefei          7644 km
    x8:  Hefei → Chengdu → Beijing → Shanghai → Guangzhou → Hefei          7499 km
    x9:  Hefei → Chengdu → Guangzhou → Beijing → Shanghai → Hefei          7459 km
    x10: Hefei → Chengdu → Shanghai → Beijing → Guangzhou → Hefei          8385 km
    x11: Hefei → Guangzhou → Beijing → Chengdu → Shanghai → Hefei          7852 km
    x12: Hefei → Guangzhou → Chengdu → Beijing → Shanghai → Hefei          6781 km

From Table 2.1 we can easily spot the best solution x⋆ = x12, the route over Guangzhou, Chengdu, Beijing, and Shanghai. With a total distance of f(x12) = 6781 km, it is more than 1500 km shorter than the worst solution x4.

Enumerating all possible solutions is, of course, the most time-consuming optimization approach possible. Matter of fact, in the simple Example E2.1, enumerating all candidate solutions would even have been impossible since X = R. Furthermore, for TSPs with more cities, the brute force approach quickly becomes infeasible since the number of solutions grows more than exponentially with the number n of cities: for a TSP with n cities, there are (n − 1)!/2 ≤ n! unique cyclic tours. Although the Traveling Salesman Problem is one of the most complicated optimization problems, the result of its objective function can still be easily determined for any given candidate solution, even though the objective function cannot be solved for x.

From this example we can learn three things:

⁴ http://en.wikipedia.org/wiki/Brute-force_search [accessed 2009-06-16]
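The brute force enumeration for this five-city instance can be written down in a few lines of Java. The sketch below is illustrative (class and method names are not from the book); the distances are those of Figure 2.2, and since the start city Hefei is fixed, it simply enumerates the 4! orderings of the remaining cities, so each tour appears twice, once per direction.

    // Brute-force enumeration for the five-city TSP of Example E2.2.
    public class BruteForceTsp {
      static final String[] CITY = {"Beijing", "Chengdu", "Guangzhou", "Shanghai"}; // Hefei is fixed start/end
      // symmetric distance matrix over {Beijing=0, Chengdu=1, Guangzhou=2, Shanghai=3, Hefei=4} in km
      static final int[][] W = {
        {   0, 1854, 2174, 1244, 1044},
        {1854,    0, 1954, 2095, 1615},
        {2174, 1954,    0, 1529, 1257},
        {1244, 2095, 1529,    0,  472},
        {1044, 1615, 1257,  472,    0}};
      static final int HEFEI = 4;

      // tour length: w(Hefei, x[0]) + sum of w(x[i], x[i+1]) + w(x[last], Hefei), cf. Equation 2.8
      static int tourLength(int[] x) {
        int d = W[HEFEI][x[0]] + W[x[x.length - 1]][HEFEI];
        for (int i = 0; i < x.length - 1; i++) d += W[x[i]][x[i + 1]];
        return d;
      }

      public static void main(String[] args) {
        int[] perm = {0, 1, 2, 3};          // permutation of the four non-Hefei cities
        int[] best = null;
        int bestLen = Integer.MAX_VALUE;
        do {
          int len = tourLength(perm);
          if (len < bestLen) { bestLen = len; best = perm.clone(); }
        } while (nextPermutation(perm));    // enumerate all 4! = 24 orderings

        StringBuilder s = new StringBuilder("Hefei");
        for (int c : best) s.append(" -> ").append(CITY[c]);
        s.append(" -> Hefei");
        System.out.println(s + " : " + bestLen + " km"); // expected: the 6781 km tour x12
      }

      // standard next-permutation in lexicographic order; returns false after the last one
      static boolean nextPermutation(int[] a) {
        int i = a.length - 2;
        while (i >= 0 && a[i] >= a[i + 1]) i--;
        if (i < 0) return false;
        int j = a.length - 1;
        while (a[j] <= a[i]) j--;
        int t = a[i]; a[i] = a[j]; a[j] = t;
        for (int l = i + 1, r = a.length - 1; l < r; l++, r--) { t = a[l]; a[l] = a[r]; a[r] = t; }
        return true;
      }
    }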

Computing the features of analog circuits can be done by hand with some effort. diodes. weight and shape of the proposed constructions. Lohn et al. [1615]. and Villegas et al. Yet. Koza et al. 2. amongst others. 5 http://en. inductors. for instance. for instance. A similar problem is the automated analog circuit design. Example E2. where a circuit diagram involving a combination of different electronic components (resistors. in order to evaluate the fitness of circuits. there are many optimization problems where this is not the case. and/or transistors) is sought which fulfills certain requirements. OBJECTIVE FUNCTIONS: IS IT GOOD? 47 1. the computational costs of computing the objective values rating the aforementioned features of the candidate solutions are high and involve simulating the air flows around the wings and blades. [1769. If an analysis of the frequency and timing behavior is required. Typically.2 the evaluation of even small sets of candidate solutions may take a substantial amount of time. different from Example E2. also often involving wire thickness and geometry. However. tried to find wing designs with minimal drag and weight but maximum tank volume. 1451]. [2814]. the tasks span from the synthesis of amplifiers [1607. Choo et al. a set of server computers is employed to execute these external programs so the utility of multiple candidate solutions can be determined in parallel. most “interesting” optimization problem have this feature.2. 3. 1763. Matter of fact.org/wiki/Antenna_gain [accessed 2009-06-19] . 1770]. the difficulty of the equations quickly goes out of hand. indeed.2. simulation environments [2249] like SPICE [2815] are normally used. such as Butterworth [1608. In most cases. capacitors. John and Ammann [1450. Therefore. a simulation environment and considerable computational power is required. whereas Chung et al. 1766]. Like in the electrical engineering example. Example E2.2 could not be solved for x. The objective functions usually involve maximizing the antenna gain5 and bandwidth together with constraints concerning. 1705] optimized the design of rotor blades for helicopters. computing the gain (depending on the signal frequency and radiation angle) of an arbitrarily structured antenna is not trivial. Enumerating all solutions is not a good optimization approach and only applicable in “small” or trivial problem instances.4 (Analog Electronic Circuit Design). filters [1766]. There are objective functions which cannot be resolved mathematically. Example E2. to analog robot controllers [1613]. 1764] and Chebycheff filters [1608]. Aircraft design is another domain where many optimization problems can be spotted. Here. Here. [575]. Besides the fact that there are many possible antenna designs. its results can still be easily determined for any given candidate solution. Although the objective function of Example E2. but requires deeper knowledge of electrical engineering. Rahmat-Samii and Michielssen dedicate no less than four chapters to it in their book on the application of (evolutionary) optimization methods to electromagnetism-related tasks [2250]. have a very different structure. Designing antennas is an optimization problem which has been approached by many researchers such as Chattoraj and Roy [532]. [579] searched a geometry with a minimum drag and shock pressure/sonic boom for transonic aircrafts and Lee and Hajela [1704.wikipedia. 
the problem spaces X are the sets of feasible antenna designs with many degrees of freedom in up to three dimensions.3 (Antenna Design). Obayashi [2059] and Oyama [2111]. The problem space containing the parameters of the objective functions is not restricted to the real numbers but can.5 (Aircraft Design).

2746] is attractive or music8 [1453] sounds good.wikipedia. fickleness. inaccuracy. The Interactive Genetic Algorithms6 (IGAs) [471. some interactive optimization methods have been developed. involve multiple simulations. Nevertheless. The next question to answer is What constitutes these “best” elements? 6 http://en. esthetic works of art are created. 2651. [1583]) and Unemi’s graphic and movie tool [2746]. 2529.wikipedia. human beings will not be replaced by machines for a long time. it is not really possible to devise any automated method for assigning objective values to the points in the problem space.6 (Interactive Optimization). such as indeterminism. also rub off on the objective functions. The first step when approaching an optimization problem is to choose the problem space X.wikipedia. 2652] outsource the objective function evaluation to human beings who. this approach is driven even further by also letting human “agents” select and combine the most interesting candidate solutions step by step.org/wiki/Evolutionary_art [accessed 2009-06-18] 8 http://en.48 2 PROBLEM SPACE AND OBJECTIVE FUNCTIONS Example E2. By incorporating this information in the search process. Objective functions are not necessarily mere mathematical expressions. changing attention and loss of focus which. obviously. as we will later discuss in detail in Chapter 7 on page 109.org/wiki/HBGA [accessed 2007-07-03] . similar features may also occur if determining the utility of a candidate solution involves randomized simulations or measuring data from real-world processes. 2497.org/wiki/Interactive_genetic_algorithm [accessed 2007-07-03] 7 http://en. but can be complex algorithms that.org/wiki/Evolutionary_music [accessed 2009-06-18] 9 http://en. Global Optimization comprises all techniques that can be used to find the best elements x⋆ in X with respect to such criteria. Especially in the area of Evolutionary Computation. In this domain. for instance.wikipedia. for example. When it comes to judging art. All interactive optimization algorithms have to deal with the peculiarities of their human “modules”. Let us now summarize the information gained from these examples. judge whether an evolved graphic7 or movie [2497. In Kosorukoff’s Human Based Genetic Algorithm9 (HBGAs.

Tasks T2

9. In Example E1.2 on page 26, we stated the layout of circuits as a combinatorial optimization problem. Define a possible problem space X for this task that allows us to assign each circuit c ∈ C to one location of an m × n-sized card. Define an objective function f : X → R+ which denotes the total length of wiring required for a given element x ∈ X. For this purpose, you can ignore wire crossings and simply assume that a wire going from location (i, j) to location (k, l) has length |i − k| + |j − l|. [10 points]

10. Name some problems which can occur in circuit layout optimization as outlined in Task 9 if the mentioned simplified assumption is indeed applied. [5 points]

11. Assume a network N = {n1, n2, ..., nm} and that an m × m distance matrix ∆ is given that contains the distance between node ni and nj at position ∆ij. A value of +∞ at ∆ij means that there is no direct connection ni → nj between node ni and nj. These nodes may, however, be connected by a path which leads over other, intermediate nodes, such as ni → nA → nB → nj. In this case, there would be a connection of finite length: nj can be reached from ni over a path leading first to nA and then to nB before finally arriving at nj. We consider the specific optimization task "Find the shortest connection between node n1 and n2." Note that there is not necessarily a direct connection between n1 and n2. Develop a problem space X which is able to represent all paths between the first and second node in an arbitrary network with m ≥ 2 nodes. Do not make the problem specific to the cola/lemonade example. Define a suitable objective function for this problem. [10 points]

12. Define a problem space for general job shop scheduling problems as outlined in Example E1.3 on page 26. [15 points]


Chapter 3
Optima: What does good mean?

We already said that Global Optimization is about finding the best possible solutions for given problems. Let us therefore now discuss what it is that makes a solution optimal¹.

3.1 Single Objective Functions

In the case of optimizing a single criterion f, an optimum is either its maximum or minimum, depending on what we are looking for. In the case of antenna design (Example E2.3), the antenna gain was maximized; in Example E2.2, the goal was to find a cyclic route of minimal distance through five Chinese cities. If we own a manufacturing plant and have to assign incoming orders to machines, we will do this in a way that minimizes the production time. We will, on the other hand, arrange the purchase of raw material, the employment of staff, and the placement of commercials in a way that maximizes the profit. In Global Optimization, however, it is a convention that optimization problems are most often defined as minimization tasks. If a criterion f is subject to maximization, we usually simply minimize its negation (−f) instead. In this section, we will give some basic definitions on what the optima of a single (objective) function are.

Example E3.1 (Two-dimensional objective function). Figure 3.1 illustrates such a function f defined over a two-dimensional space X = (X1, X2) ⊂ R². As outlined in this graphic, we distinguish between local and global optima: a global optimum is an optimum of the whole domain X while a local optimum is an optimum of only a subset of X.

¹ http://en.wikipedia.org/wiki/Maxima_and_minima [accessed 2007-07-03]


To solve f ′ (x⋆ ) = 0 we set 0 = 2x⋆ and get. We cannot use Equation 3. we focus on derivative-free optimization.1. x < 0 ⇒ (4x3 ) < 0 ⇒ g ′ (x) < 0.2 on page 45.2 Global Extrema The definitions of globally optimal points are much more straightforward and can be given based on the terms we already have discussed: Definition D3. For the two-dimensional objective function given in Example E3.6) For distinguishing maxima.4. l From many of our previous examples such as Example E1. or Example E2. In this book. For positive x. x⋆ is a minimum of g(x). x > 0 ⇒ (4x3 ) > 0 ⇒ g ′ (x) > 0. the he definition of local optima is a bit more complicated and will be discussed in detail in Section 4.2 (Extrema and Derivatives). l both g ′ (x⋆ ) and g ′′ (x⋆ ) become 0.3 (Global Optimum).1 (Global Maximum). however. l l If we take the function g(x) = x4 − 2. A global maximum x ∈ x of one (objective) funcˆ tion f : X → R is an input element with f(ˆ) ≥ f(x) ∀x ∈ X. A global minimum x ∈ X of one (objective) funcˇ tion f : X → R is an input element with f(ˇ) ≤ f(x) ∀x ∈ X. x Definition D3.org/wiki/Hessian_matrix [accessed 2010-09-07] .4..1.3. obviously.1: x⋆ is local extremum ⇒ grad(f)(x⋆ ) = 0 l l (3. each global optimum. For x⋆ = 0.5 can only be directly applied if the problem space is a subset of the real numbers X ⊆ R. for example. Derivative-based optimization is hence only feasible in a subset of optimization problems.4.wikipedia. l l l l f ′′ (x⋆ ) = 2 and hence. minimum. or maximum. the Hessian. Equation 3. if we take a small l l step to the left.6. Furthermore.1 to Equation 3. Even if the objective function exists in a differentiable form. A global optimum x⋆ ∈ X of one (objective) function f : X → R subject to minimization is a global minimum. computing the gradient. minimum. e. We have a sign change from − to + and according to Equation 3.2 (Global Minimum). SINGLE OBJECTIVE FUNCTIONS 53 Example E3. let alone in one that can be differentiated.2. we will postpone it to Section 4. At x⋆ . into x < 0. Of course.4 on page 47. The precise definition of local optima without using derivatives requires some sort of measure of what neighbor means in the problem space.1. also a local optimum. the Hessian matrix3 would be needed. and saddle points. However. i. and their properties becomes cumbersome. 3. They do not work for discrete spaces or for vector spaces.1 on page 25. we know that objective functions in many optimization problems cannot be expressed in a mathematically closed form. x⋆ = 0. or maximum is. the generalized form of Equation 3. Since this requires some further definitions. for problem spaces X ⊆ Rn of larger scales n > 5. minima. 3 http://en. we get g ′ (x) = 4x3 and g ′′ (x) = 12x2 . A global optimum of a function subject to maximization is a global maximum.2. x⋆ is a local minimum according to Equation 3. Example E2. at the same time. The function f (x) = x2 + 5 has the first derivative f ′ (x) = 2x and the second derivative f ′′ (x) = 2. x Definition D3. As stated before. we would need to a apply Equation 3.

Figure 3. we sketch a function f(x) which evaluates x2 outside the interval [−5. Fig.b: The cosine function: infinitely many optima.a: x3 −8x2 : a function with two global minima.2. As mentioned before. The optimal set X ⋆ ⊆ X of an optimization problem is the set that contains all its globally optimal solutions.3 (One-Dimensional Functions with Multiple Optima). 3.2: Examples for functions with multiple optima ˇ would find that it has two global optimal x⋆ = x1 = − 16 and x⋆ = x2 = 16 . 3. it has uncountable infinite global optima.5 0 -0. 5]. If the function f(x) = x3 − 8x2 (illustrated in Fig. we 200 f(x) 100 0 -100 -10 -5 0 xÎX 5 10 x1 ^ 60 40 20 -10 -5 0 X ^ ^ x2 1 0. Even a onedimensional function f : R → R can have more than one global maximum.2.2. Example E3.2 Multiple Optima Many optimization problems have more than one globally optimal solution. 3. In Fig. 100 f(x) 80 xÎX 5 10 Fig. Definition D3. In multi-objective optimization.c: A function with uncountable many minima. Hence. 3. 5] are minima. all the points in the real interval [−5.3. multiple global minima.54 3 OPTIMA: WHAT DOES GOOD MEAN? 3. In single-objective optimization.a) was subject to minimization. there exist a variety of approaches to define optima which we will discuss in-depth in Section 3. 5] and has the value 25 in [−5. the exact meaning of optimal is problem dependent.4 (Optimal Set).2.5 -1 f(x) ^-1 x -10 -5 ^0 x 0 ^1 x xÎX 5 10 Fig.2. Another ˇ 1 2 3 3 ˆ interesting example is the cosine function sketched in Fig. 3.c. or even both in its domain X. it means to find the candidate solutions which lead to either minimum or maximum objective values.b. . If the function is subject to minimization. It has global maxima xi at xi = 2iπ and global minima xi at xi = (2i + 1)π for all i ∈ Z. The correct solution of such ˆ ˇ ˇ an optimization problem would then be a set X ⋆ of all optimal inputs in X rather than a single maximum or minimum. 3.2.

2 on page 253).3. . .3. an efficient approximation algorithm might produce reasonably good x in short time.2 on page 103. Definition D3.7) (3. 954].2) in polynomial time. the goal of optimization is not necessarily to find X = X ⋆ or X ⊂ X ⋆ but a good approximation X ≈ X ⋆ in a reasonable time. We will discuss the issue of diversity amongst the optima in Chapter 13 in more depth. 740. Obviously. . In order to know whether these are global optima.6 which will later be refined in Definition D6. In many real-world design or decision making problems. for instance. as described in detail in Section 12. In the further text. sometimes even infinite many optimal solutions in many optimization problems. }.3. the best outcome of an optimization process would be to find X = X ⋆ . In a multi-objective optimization problem (MOP). Algorithms designed to optimize sets of objective functions are usually named with the prefix multi-objective. x2 . . we would. f2 (x) . however.n} T (3. in many problems. we will treat these sets as vector functions f : X → Rn which return a vector of real numbers of dimension n when applied to a candidate solution x ∈ X. Therefore. no algorithm which can solve the Traveling Salesman Problem (see Example E2. a set f : X → Rn consisting of n objective functions fi : X → R. it is only possible to find a finite (sub-)set of solutions. Definition D3. .3. The set X ⊆ X contains output elements x ∈ X of an optimization process. each representing one criterion. From Example E3. we will denote the result of optimization algorithms which return single candidate solutions (in sans-serif font) as x and the results of algorithms which are capable to produce sets of possible solutions as X = {x1 .3.. MULTIPLE OBJECTIVE FUNCTIONS 55 There are often multiple. Since the storage space in computers is limited. Hence.5 (Optimization Result). if such a task involving a high enough number of locations is subject to optimization.6 (Multi-Objective Optimization Problem (MOP)). we already know that this might actually be impossible if there are too many global optima. Purshouse [2228. To the Rn we will in this case refer to as objective space Y. like multi-objective Evolutionary Algorithms (see Definition D28. ) In the following. is to be optimized over a problem space X [596. the goal should be to find a X ⊂ X ⋆ in a way that the candidate solutions are evenly distributed amongst the global optima and not all of them cling together in a small area of X ⋆ . there are multiple criteria to be optimized at the same time.8) f (x) = (f1 (x) . 2230] provides a nice classification of the possible relations between the objective functions which we illustrate here as Figure 3. Then. Besides that we are maybe not able to find all global optima for a given optimization problem. Such tasks can be defined as in Definition D3. 3. . need to wait for thousands of years. f = {fi : X → R : i ∈ 1. There exists.1 Introduction Global Optimization techniques are not just used for finding the maxima or minima of single functions f. In the best case from the .3 Multiple Objective Functions 3.3. it may also be possible that we are not able to find any global optimum at all or that we cannot determine whether the results x ∈ X which we have obtained are optimal or not.

Then.10) Besides harmonizing and conflicting optimization criteria. two objective functions fi and fj are independent from each other in a subset X ⊆ X of the problem space X if the MOP can be decomposed into two problems which can optimized separately and whose solutions can then be composed to solutions of the original problem. In the following.8 (Conflicting Objective Functions).3: The possible relations between objective functions as given in [2228.8). and independent optimization criteria. We also list some typical multi-objective optimization applications in Table 3. If X = X.9 holds. Definition D3. If the criteria conflict. . these relations are called total and the subscript X is left away in the notations. In a multi-objective ptimization problem. x2 ∈ X ⊆ X] (3. Definition D3. it would suffice to optimize only one of them and the MOP can be reduced to a single-objective one. Obviously. optimization perspective. x2 ∈ X ⊆ X] (3.1. conflicting criteria. If two objectives f3 and f4 conflict. however. we give some examples for typical situations involving harmonizing. In a multi-objective optimization problem.56 3 OPTIMA: WHAT DOES GOOD MEAN? relation dependent conflict harmony independent Figure 3. In a harmonious relationship. objective functions do not need to be conflicting or harmonic for their complete domain X. we denote this as f3 ≁ f4 . two objective functions fi and fj are conflicting (fi ≁X fj ) in a subset X ⊆ X of the problem space X if Equation 3.. e.9) The opposite are conflicting objectives where an improvement in one criterion leads to a degeneration of the utility of the candidate solutions according to another criterion.7 (Harmonizing Objective Functions). but instead may exhibit these properties on certain intervals only.10 holds in a subset X of the problem space X. the exact opposite is the case. fi ∼X fj ⇒ [fi (x1 ) < fi (x2 ) ⇒ fj (x1 ) < fj (x2 ) ∀x1 . fi ≁X fj ⇒ [fi (x1 ) < fi (x2 ) ⇒ fj (x1 ) > fj (x2 ) ∀x1 . 2230]. before discussing basic techniques suitable for defining what is a “good solution” in problems with multiple. two objective functions fi and fj are harmonizing (fi ∼X fj ) in a subset X ⊆ X of the problem space X if Equation 3. i. do not influence each other: It is possible to modify candidate solutions in a way that leads to a change in one objective value while the other one remains constant (see Example E3. two objective functions f1 and f2 are in harmony. the optimization criteria can be optimized simultaneously. written as f1 ∼ f2 . meaning that improving one criterion will also lead to an improvement in the other. an improvement of one leads to an improvement of the other. conflicting. Independent criteria. Definition D3. there may be such which are independent from each other.9 (Independent Objective Functions). In a multi-objective optimization problem.

raw materials etc. Maximize the profit. lead to different sets X ⋆ . 4. it makes no sense to talk about a global maximum or a global minimum in terms of multiobjective optimization. Minimize the costs for advertising. These different definitions. Example E3. would maybe not be discovered by an ant with a clumsy controller. The maximum (and thus. The last two objectives seem to clearly conflict with the goal cost minimization. it may take the ant a shorter walk to pile the same amount of food. the smaller one should be preferred. If we go back to our factory example. Minimize the negative impact of waste and pollution caused by the production process on environment. Multi-objective optimization often means to compromise conflicting goals. Both functions have the real numbers R as problem space X1 . Between the personal costs and the time needed for production and the product quality there should also be some kind of contradictive relation. there exist multiple approaches to define what an optimum is.4. these optimization goals conflict with each other: A program which contains only one instruction will likely not guide the ant to a food item and back to the nest. If two control programs produce the same results and one is smaller (i. however. we can gain another insight: To find the global optimum could mean to maximize one function fi ∈ f and to minimize another one fj ∈ f . We will discuss some of these approaches in the following by using two graphical examples for illustration purposes. since producing many quality goods in short time naturally requires many hands. personal.4. MULTIPLE OBJECTIVE FUNCTIONS 57 Example E3. too. The exact mutual influences between optimization criteria can apparently become complicated and are not always obvious.4 (Factory Example). e. The more food it aggregates. 3. Maximize the quality of the products. Thus. in turn.3. the longer the distance it needs to walk.. If its behavior is driven by a clever program. contains fewer instructions) than the other. the optimum) of f1 is x1 and the largest value of f2 is at x2 . The efficiency of an ant is not only measured by the amount of food it is able to pile. the distance the ant has to cover to find the food (or the time it needs to do so) has to be considered in the optimization process. In the first graphical example pictured in Figure 3. .. From these both examples. Since compromises for conflicting criteria can be defined in many ways. we can specify the following objectives that all are subject to optimization: 1. 2. f2 }. Example E3. (i = j). we want to maximize two independent objective functions f 1 = {f1 . 5. For every food item. We will thus retreat to the notation of the set of optimal elements x⋆ ∈ X ⋆ ⊆ X in the following discussions. ˆ ˆ In Figure 3.6 (Graphical Example 1). Like in the factory example.3.5 (Artificial Ant Example). we can easily see that f1 and f2 are partly conflicting: Their maxima are at different locations and there even exist areas where f1 rises while f2 falls and vice versa. Minimize the time between an incoming order and the shipment of the corresponding product. 4 See ?? on page ?? for more details. Another example for such a situation is the Artificial Ant problem4 where the goal is to find the most efficient controller for a simulated ant. the ant needs to walk to some point. This better path. Hence.

In the second example sketched in Figure 3. their solutions can be combined to a full vector x⋆ = (x⋆ .4: Two functions f1 and f2 with different maxima x1 and x2 . x2 alone and f2 can be optimized by considering only x3 while the values of x1 and x2 do not matter.5: Two functions f3 and f4 with different minima x1 . since f1 can be optimized on the first two vector elements x1 .11) (3.58 3 OPTIMA: WHAT DOES GOOD MEAN? y y=f1(x) y=f2(x) x2 ^ x1 ^ xÎX1 Figure 3. ˇ ˇ ˇ ˇ lem space X1 to the real numbers that are to be maximized.8 (Independent Objectives). ˇ ˇ ˇ ˇ Example E3.12) This problem can be decomposed into two separate problems. we instead minimize two functions f3 and f4 that map a two-dimensional problem space X2 ⊂ R2 to the real numbers R. ^ ^ ^ ^ x2 . Assume that the problem space are the three-dimensional real vectors R3 and the two objectives f1 and f2 would be subject to minimization: f1 (x) = (x1 − 5)2 + (x2 + 1)2 f2 (x) = x2 − sin x − 3 3 (3.7 (Graphical Example 2). ˆ ˆ Example E3. x⋆ ) by taking the 1 2 3 ⋆ ⋆ ⋆ solutions x1 and x2 from the optimization of f1 and x2 from the optimization process solving f2 . x3 . x⋆ . After solving the two separate optimization T problems.5. and x4 . The objective functions f1 and f2 in the first example are mappings of a one-dimensional prob- x1 f3 x2 f4 x3 x4 x1 x2 X2 x1 Figure 3. Both functions have two global minima. x2 . It should be noted ˇ ˇ ˇ ˇ that x1 = x2 = x3 = x4 . the lowest values of f3 are x1 and x2 whereas f4 gets minimal at x3 and x4 .

2644. 1571. 605. 2019. 662. is more important than f3 . 1859. 789. 1840. 1508. 605. 861. 1559. 579. 1488. 953. the robots must try to decide which action to take in order to maximize the chance of scoring a goal. 1873. Hence.1 Testing [2863] Water Supply and Management [1268] Wireless Communication [1717. 1840. 1756. 1488. 2862. 1559] Engineering [93. 1574. see also Table 20. 1661. 1299. 737. 869. 2862.[320.10 on page 37. and ⋆ so on. 1034. assume that f1 is strictly more important than f2 . 1796. 860. 1571. 2025. 2862. Table 3. 690. 2473. 1696. 579. 1000.3. If decision making in robotic soccer was only based on this criteria. 2158. 997. 2863. leaving the own one unprotected. 662. 2111] Mathematics [560. see also Table 20. 2897]. 1622. 654. 2111. 1559. 861.1 Computer Graphics [1661]. 2645. 2059. 2102] Military and Defense [1661] Multi-Agent Systems [1661] Physics [1000] Security [1508. 1574. 2398. 669. In Example E1. we mentioned that in robotic soccer. 1840. 1442.1 Data Mining [2069. 1429. 2820]. for instance. 2291. at least one second objective is important too: to minimize the chance that the opposing team scores a goal. 2158] 3. 527. which. 2895]. 1859. 1442. 997. 2644. 1717.9 (Robotic Soccer Cont. 1479.1 Digital Technology [789. 2644. 1034. 2897] Economics and Finances [320. 320. 1559. If this set . 1488. 2897] Logistics [527. 663. in turn. 2059. 2645]. 1756–1758. 1440.1 Databases [1696] Decision Making [198. 662. 1299. 1553. 2645.). tems 1801. We then would first determine the optimal set X(1) according to f1 . see also Table 20. see also Table 20. 1801]. it is likely that most mates in a robot team would jointly attack the goal of the opposing team. 1553. 669. 2897] Games [560.3. 491. 548. 2025. 519. 869. 861. 1661] Software [1508.2 Lexicographic Optimization The maybe simplest way to optimize a set f of objective functions is to solve them according to priorities [93. 547. 1429. MULTIPLE OBJECTIVE FUNCTIONS 59 Example E3. see also Table 20. we could. 1574. 668] [366. Area Business-To-Business Chemistry Combinatorial Problems References [1559] [306. 1479] Distributed Algorithms and Sys. Without loss of generality.1: Applications and Examples of Multi-Objective Optimization. 2472. 2158. 2102] Graph Theory [669. 2071. 857. 869. 857.3.

Definition D3.14.60 3 OPTIMA: WHAT DOES GOOD MEAN? ⋆ contains more than one solution. the described approach may lead to different results. this can be achieved in ⋆ polynomial time in n [860]. the ωj will be assumed to be 1 if not explicitly stated otherwise. i. Then. This process is repeated until either X(1. the problem space Xi of the ith ⋆ problem equals X only for i = 1 and then is Xi ∀i > 1 = X(1..n)| = n! different multi-objective problems arising.π x⋆ (3. and the ωi retain their meaning from Equation 3. By doing so.11 (Lexicographically Better (Refined)). member of the optimal set Xπ ) if there is no lexicographically better element in X.13 only denote whether fj should be maximized or minimized. A candidate solution x1 ∈ X is lexicographically better than another element x2 ∈ X (denoted by x1 <lex x2 ) if x1 <lex x2 ⇔ (ωi fi (x1 ) < ωi fi (x2 )) ∧ (i = min {k : fk (x1 ) = fk (x2 )}) 1 if fj should be minimized ωj = −1 if fj should be maximized (3. For each such permutation. Since we generally consider minimization problems. the n ∗ n! single-objective problems.13) (3.14) where the ωj in Equation 3.2) ⊆ X(1) . ) = 1 or all criteria have been processed. the multi-objective optimization problem has been reduced to n = |f | single-objective ones to be solved iteratively. If n objective functions are to be optimized. A candidate solution x⋆ is lexicographically optimal in terms of a permutation π ∈ Π(1.n) applied to the objective functions (and ⋆ therefore. e. We can thus refine the previous definition of “lexicographically better” by putting it into the context of a permutation of the objective functions: Definition D3.n) is the set of all permutations of the natural numbers from 1 to n.. A candidate solution x1 ∈ X is lexicographically better than another element x2 ∈ X (denoted by x1 <lex.n)| possible permutations of them where Π(1. separately... we can now define optimality. e. This would lead to a more than exponential complexity in n.10 (Lexicographically Better).. Based on the <lex -relation.16) ⋆ As lexicographically global optimal set X ⋆ we consider the union of the optimal sets Xπ for all possible permutations π of the first n natural numbers: X⋆ = ∀π∈Π(1. there are n! = |Π(1.14.12 (Lexicographic Optimality).15) where n = len(f ) is the number of objective functions.. we will then retain only those elements in X(1) which ⋆ have the best objective values according to f2 compared to the other members of X(1) and ⋆ ⋆ ⋆ obtain X(1.i−1) [860]. π [i] is the ith element of the permutation π.. i..17) X ⋆ can be computed by solving all the |Π(1.. Definition D3.n) applied to the objective functions if x1 <lex.. But which solutions would be contained in the Xπ or X ⋆ if we use the lexicographical definition of optimality? . as specified in Equation 3..2.π x2 ⇔ ωπ[i] fπ[i] (x1 ) < ωπ[i] fπ[i] (x2 ) ∧ i = min k : fk (x1 ) = fπ[k] (x2 ) (3.2.n) ⋆ Xπ (3.. In case that X is a finite set..π x2 ) according to a permutation π ∈ Π(1. ⋆ x⋆ ∈ Xπ ⇔ ∃x ∈ X : x <lex.

12 (Example E3.2) which have the highest f2 -value and so on. if determined lexicographically.5 Cont. however.11 (Example E3.3. are not possible with the lexicographic method. x4 }. X(1. The two-dimensional objective functions f3 and f4 from Figure 3. On the downside. MULTIPLE OBJECTIVE FUNCTIONS 61 Example E3.1) = {x⋆ }. Example E3. Determining lexicographically optimal set Xπ for the identity permutation π = (1. Amongst all elements in X(1) . the program which leads to the highest possible food collection. production quality. Since ˆ 1 2 ⋆ ⋆ Π(1. and will largely disregard any environmental issues.3) = x ˇ x ˇ {ˇ1 . Although it is a bit hard to see in the figure itself. The first one x⋆ is the single global maximum x1 of f1 and the other one (x⋆ ) is 2 1 ⋆ ⋆ the single global maximum x2 of f2 . that X(4. Example E3. minimize the running costs (f3 . we would surely be interested in solutions which lead to a short production time but do not entirely ignore any budget issues. 2. x2 and x3 .4.1) = {x⋆ . contains the 1. x ˇ ˇ ˇ 3.7 Cont. and minimize environmental damage ⋆ (f5 . In other words. 5) means to first determine all possible configurations of the factory the minimize the production time f1 . running costs. we wished to minimize the production time (f1 .5 each have two global minima x1 . (2. maximize a product quality measure (f4 .2. 2) = {(1. In the factory example. Such differentiations.3. the program which guides the ant along the shortest possible path (which is again the empty program). smallest possible program (length 0). Lexicographically). and that X ⋆ = X(3.3) = {ˇ3 . and 3. . ω3 = 1).4. we consider f2 . ω2 = −1). Only if there is more than one such configuration ⋆ ⋆ len X(1) > 1 . for instance. 1)}. respectively. Lexicographically).6 Cont.2) = {x⋆ } and X(2.13 (Example E3. it follows that X ⋆ = X(1.1 Problems with the Lexicographic Method From these examples it should become clear that the lexicographic approach to defining what is optimal and what not has some advantages and some disadvantages.4 Cont. 2. ω5 = 1). There are exactly two lexicographically optimal points on the two functions illustrated in ˆ Figure 3. x⋆ }. 3. Lexicographically). x4 }. On the positive side. x x x x ⋆ ⋆ ⋆ ⋆ This means that X(3. the set of all lexicographically optimal solutions X ⋆ will contain the elements x⋆ ∈ X which have the best objective values regarding to one single objective function while largely neglecting the others. ω4 = −1). ˇ ˇ ˇ ˇ f4 (ˇ1 ) = f4 (ˇ2 ) and f3 (ˇ3 ) = f3 (ˇ4 ) because of the symmetry in the two objective functions. 2) . we find that it is very easy realize and that it allows us to treat a multi-objective optimization problem by iteratively solving single-objective ones. The same would be the case in the Artificial Ant problem. Therefore. it does not perform any trade-off between the different objective functions at all and we only find the extreme values of the optimization criteria.3. ω1 = 1). maximize the profit (f2 . X ⋆ . x4 . 1 2 Example E3.4) ∪ X(4. In the factory example Example E3. x3 .10 (Example E3. What we get in Xπ are configurations which perform exceptionally well in terms of the production time but sacrifice profit. we then copied those ⋆ ⋆ to X(1. x2 .2) ∪ X(2. x2 }.4) = {ˇ1 . Lexicographically). No trade-off between the objectives will happen at all. 4.

10.3. Since a weighted sum is a linear aggregation of the objective values.7. Using signed weights allows us to minimize one objective while maximizing another one. we set the weights w3 and w4 to 1. apply a weight wa = 1 to an objective function fa and the weight wb = −1 to the criterion fb .: Weighted Sums). If we maximize ws1 (x).7 Cont. the effect would be inverse and fb would be minimized and while fa is maximized.19) x⋆ ∈ X ⋆ ⇔ ws(x⋆ ) ≤ ws(x) ∀x ∈ X With this method.62 3. Example E3.6. Either way. Again.6: Optimization using the weighted sum approach (first example). we will thus also maximize the functions f1 and f2 . This leads to a single optimum x⋆ = x.6 Cont. for instance.14 (Example E3.7 is sketched in Figure 3. n ws(x) = i=1 wi fi (x) = ∀fi ∈f wi fi (x) (3. The weights are both set to 1 = w1 = w2 . is to compute a weighted sum ws(x) of all objective functions fi (x) ∈ f . we can also emulate lexicographical optimization in many cases by simply setting the weights to values of different magnitude. The sum ws2 . If we instead maximize ws(x). The sum of the two-dimensional functions f3 and f4 from the second graphical Example E3. is subject to minimization.3 3 OPTIMA: WHAT DOES GOOD MEAN? Weighted Sums (Linear Aggregation) Another very simple method to define what an optimal solution is. If the objective functions fi ∈ f all had the range fi : X → 0.. setting the weights to wi = 10i would effectively achieve this. multi-objective problems are reduced to single-objective ones with this method.6 demonstrates optimization with the weighted sum approach for the example given in Example E3.15 (Example E3. we then actually minimize the first and maximize the second objective function. however. Example E3. the two global minima x5 and x6 can be found. Figure 3.18) (3. ˆ y y=ws1(x)=f1(x)+f2(x) y1=f1(x) y2=f2(x) x ^ xÎX1 Figure 3. As illustrated in Equation 3. At the bottoms of these valleys.19.: Weighted Sums). ˇ ˇ . a global optimum then is a candidate solution x⋆ ∈ X for which the weighted sum of all objective values is less or equal to those of all other candidate solutions (if minimization of the weighted sum is assumed). Each objective fi is multiplied with a weight wi representing its importance. for instance. it is also often referred to as linear aggregating. The graph of ws2 has two especially deep valleys. We can. By minimizing ws(x).

1 on page 144 or http://en.wikipedia. In Figure 3. Such functions cannot be added up properly using constant weights.3. When minimizing or maximizing this sum. we will always disregard one of the two functions. For small x.1 Problems with Weighted Sums This approach has a variety of drawbacks.3. will now become negligible. 3. depending on the interval chosen. we have sketched the sum ws(x) of the two objective functions f1 (x) = −x2 and f2 (x) = ex−2 .3.3. f1 will become insignificant for all x > 40. One is that it cannot handle functions that rise or fall with different speed5 properly. Adding multiple such values up thus does not necessarily make sense.16 (Problematic Objectives for Weighted Sums). f2 is negligible compared to f1 . For x > 5 it begins to outpace f1 which. ^ ^ x2 [ac- . Even if we would set w1 to the really large number 1010 .8.7: Optimization using the weighted sum approach (second example). Whitley [2929] pointed out that the results of an objective function should not be considered as a precise measure of utility.0005. Example E3. because −(402 )∗1010 ≈ 0.org/wiki/Asymptotic_notation for related information. MULTIPLE OBJECTIVE FUNCTIONS 63 x5 x6 y x1 y=ws2(x)=f3(x)+f4(x) Figure 3. in turn. e40−2 5 See cessed 2007-07-03] Definition D12.

not necessarily all elements considered optimal in terms of Pareto domination (see next section) will be found.n} < min {wi fi (x2 ) : i ∈ 1. 1554. 1305. 1621.15 on the facing page). Das and Dennis [689] also show that with weighted sum approaches. A candidate solution x1 ∈ X is preferred in comparison with x2 ∈ X if Equation 3.1 on page 144). . convex and concave trade-off surfaces (see Definition D3. Weighted sums are only suitable to optimize functions that at least share the same big-O notation (see Definition D12.8: A problematic constellation for the weighted sum approach. 1/wn ) from the point of origin in objective space to and along which the optimizer should converge.20) where wi are the weights of the objective values..3. Also.20 holds min {wi fi (x1 ) : i ∈ 1. In the same paper. This approach which is also called Weighted Tschebychev Norm is able to find solutions on both. wi > 0 is used and wi is set to a value below zero for functions to be maximized. determine whether the objective maximizing the food piled by an Artificial Ant rises in comparison to the objective minimizing the distance walked by the simulated insect? And even if the shape of the objective functions and their complexity class were clear. Instead.17 on the next page for different shapes of trade-off curves). The T weight vector describes a beam in direction (1/w1 .4 Weighted Min-Max The weighted min-max approach [1303. the candidate solutions which will finally be contained in the result set X will not necessarily be evenly spaced [1000]. for instance. 1622] (see Example E3.. it is not obvious how the objective functions will fall or rise.. Often.. 1/w2 . How can we. .n} (3. Amongst further authors who discuss the drawbacks of the linear aggregation method are Koski [1582] and Messac and Mattson [1873]. the question about how to set the weights w properly still remains open in most cases [689].64 3 OPTIMA: WHAT DOES GOOD MEAN? 45 y2=f2(x) 35 25 15 5 -5 -3 -1 y=g(x) =f1(x)+f2(x) 1 3 5 -5 -15 -25 y1=f1(x) Figure 3. For objective functions subject to minimization. candidate solutions hidden in concavities of the trade-off curve will not be discovered [609. 3. 1741] for multi-objective optimization compares two candidate solutions x1 and x2 on basis of the minimum (or maximum) of their weighted objective values.

3.3.5 Pareto Optimization

The mathematical foundations for multi-objective optimization which considers conflicting criteria in a fair way have been laid by Edgeworth [857] and Pareto [2127] around one hundred fifty years ago [1642]. Pareto optimality (see http://en.wikipedia.org/wiki/Pareto_efficiency [accessed 2007-07-03]) became an important notion in economics, game theory, engineering, and social sciences [560, 954, 997, 1010, 2610]. It defines the frontier of solutions that can be reached by trading-off conflicting objectives in an optimal manner. From this front, a decision maker (be it a human or an algorithm) can finally choose the configurations that, in her opinion, suit best [264, 528, 952, 1166, 2102, 2938]. The use of Pareto domination in multi-objective Evolutionary Algorithms was first proposed by [1075] back in 1989 [2228]. The notion of optimal in the Pareto sense is strongly based on the definition of domination:

Definition D3.13 (Domination). An element x1 dominates (is preferred to) an element x2 (x1 ⊣ x2) if x1 is better than x2 in at least one objective function and not worse with respect to all other objectives. Based on the set f of objective functions, we can write:

x1 ⊣ x2 ⇔ (∀i ∈ 1..n : ωi fi(x1) ≤ ωi fi(x2)) ∧ (∃j ∈ 1..n : ωj fj(x1) < ωj fj(x2))   (3.21)
ωi = 1 if fi should be minimized, ωi = -1 if fi should be maximized   (3.22)

Like in Equation 3.13 in the definition of the lexicographical ordering of candidate solutions, the ωi again denote whether fi should be maximized or minimized. Since we generally assume minimization, the ωi will always be 1 if not explicitly stated otherwise in the rest of this book. Then, if a candidate solution x1 dominates another one (x2), f(x1) is partially less than f(x2); thus, Equation 3.22 and Equation 3.14 are the same. We provide a Java version of a comparison operator implementing the domination relation in Listing 56.55 on page 838.

The Pareto domination relation defines a strict partial order (an irreflexive, asymmetric, and transitive binary relation, see Definition D51.16 on page 645) on the space of possible objective values Y. In contrast, the weighted sum approach imposes a total order by projecting the objective space Y into the real numbers R.

Definition D3.14 (Pareto Optimal). An element x⋆ ∈ X is Pareto optimal (and hence, part of the optimal set X⋆) if it is not dominated by any other element in the problem space X. X⋆ is called the Pareto-optimal set or Pareto set.

x⋆ ∈ X⋆ ⇔ ¬∃x ∈ X : x ⊣ x⋆   (3.23)

The Pareto optimal set will also comprise all lexicographically optimal solutions since, per definition, it includes all the extreme points of the objective functions. Additionally, it provides the trade-off solutions which we longed for in Section 3.3.2.

Definition D3.15 (Pareto Frontier). For a given optimization problem, the Pareto front(ier) F⋆ ⊂ R^n is defined as the set of results the objective function vector f creates when it is applied to all the elements of the Pareto-optimal set X⋆.

F⋆ = {f(x⋆) : x⋆ ∈ X⋆}   (3.24)
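Besides the Java comparison operator provided in Listing 56.55, the domination relation of Equation 3.21 can be sketched in a few self-contained lines. The following snippet is only an illustration (all names are chosen here) and assumes pure minimization, i.e., ωi = 1 for all objectives.

public final class ParetoDomination {

  /** Returns true if the objective vector a dominates (is preferred to) the objective vector b. */
  public static boolean dominates(final double[] a, final double[] b) {
    boolean strictlyBetterSomewhere = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) {
        return false;                    // worse in at least one objective: no domination
      }
      if (a[i] < b[i]) {
        strictlyBetterSomewhere = true;  // strictly better in at least one objective
      }
    }
    return strictlyBetterSomewhere;
  }
}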

Example E3.17 (Shapes of Optimal Sets / Pareto Frontiers).
The Pareto front in Example E13.2 has a convex geometry, but there are other, different shapes as well. In Figure 3.9 [2915], we show some examples, including non-convex (concave), disconnected, linear, and non-uniformly distributed Pareto fronts. Different structures of the optimal sets pose different challenges to achieve good convergence and spread. Optimization processes driven by linear aggregating functions will, for instance, have problems to find all portions of Pareto fronts with non-convex geometry, as shown by Das and Dennis [689].

[Figure 3.9: Examples of Pareto fronts [2915], each panel plotting f2 over f1. Fig. 3.9.a: Non-Convex (Concave); Fig. 3.9.b: Disconnected; Fig. 3.9.c: Linear; Fig. 3.9.d: Non-Uniformly Distributed.]

In this book, we will refer to candidate solutions which are not dominated by any element in a reference set as non-dominated. This reference set may be the whole problem space X itself (in which case the non-dominated elements are global optima) or any subset X ⊆ X of it. Many optimization algorithms, for instance, work on populations of candidate solutions and in such a context, a non-dominated candidate solution would be one not dominated by any other element in the population.

Definition D3.16 (Domination Rank). We define the domination rank #dom(x, X) of a candidate solution x relative to a reference set X as

#dom(x, X) = |{x' : (x' ∈ X) ∧ (x' ⊣ x)}|   (3.25)

The higher the domination rank, the worse is the associated element. Hence, an element x with #dom(x, X) = 0 is non-dominated in the reference set X, and a candidate solution x⋆ with #dom(x⋆, X) = 0 relative to the whole problem space X is a global optimum.

Example E3.18 (Example E3.6 Cont.: Pareto Optimization).
In Figure 3.10, we illustrate the impact of the definition of Pareto optimality on our first example (outlined in Example E3.6). We again assume that f1 and f2 should both be maximized and hence, ω1 = ω2 = -1.

[Figure 3.10: Optimization using the Pareto Frontier approach: plots of y = f1(x) and y = f2(x) over x ∈ X1, with the points x1 to x6 marked.]

The areas shaded with dark gray are Pareto optimal and thus represent the optimal set X⋆ = [x2, x3] ∪ [x5, x6], which contains infinitely many elements since it is composed of intervals on the real axis. In practice, our computers can only handle finitely many elements and we would not be able to find all optimal elements unless, for instance, we could apply a method able to mathematically resolve the objective functions and to return the intervals in a form similar to the notation we used here. All candidate solutions not in the optimal set are dominated by at least one element of X⋆.

f1 and f2 harmonize in [x1, x2): f1 ∼[x1,x2) f2, since both functions f1 and f2 can be improved by increasing x. If we start at the leftmost point in X (which is position x1), we can go one small step ∆ to the right and will find a point x1 + ∆ dominating x1 because f1(x1 + ∆) > f1(x1) and f2(x1 + ∆) > f2(x1). We can repeat this procedure and will always find a new dominating point until we reach x2. The points in the area between x1 and x2 (shaded in light gray) are dominated by other points in the same region or in [x2, x3]. x2 demarks the global maximum of f2 – the point with the highest possible f2 value – which cannot be dominated by any other element of X by definition (see Equation 3.21).
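The domination rank of Definition D3.16 reduces to a simple count. The following sketch is only an illustration (it reuses the hypothetical ParetoDomination.dominates helper from the earlier snippet and assumes minimization); it counts how many objective vectors in a reference set dominate a given candidate solution's objective vector.

import java.util.List;

public final class DominationRank {

  /**
   * Counts the elements of the reference set (given as their objective vectors) which
   * dominate the objective vector x, as in Equation 3.25. A result of 0 means x is non-dominated.
   */
  public static int dominationRank(final double[] x, final List<double[]> referenceSet) {
    int rank = 0;
    for (final double[] other : referenceSet) {
      if (ParetoDomination.dominates(other, x)) {
        rank++;
      }
    }
    return rank;
  }
}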

we will find a point x2 + ∆ with f2 (x2 + ∆) < f2 (x2 ) but also f1 (x2 + ∆) > f1 (x2 ). For a certain grid-based resolution X′ ⊂ X2 of the problem space X2 . Besides these examples here. those candidate solutions residing in the valleys of Figure 3.68 3 OPTIMA: WHAT DOES GOOD MEAN? From here on. X3 . we can see that hills in the domination space occur at positions where both.11: Optimization using the Pareto Frontier approach (second example). In order to increase f1 . If we compare Figure 3.3. x2 ) can dominate any point in [x2 . f2 steeply falls to a very low level. we 2 counted the number #dom(x. 2 2 the higher this number. At x3 however. One objective can only get better if another one degenerates. there ⋆ ⋆ ⋆ ⋆ are four such areas X1 .3 on page 275. f2 would be decreased and vice versa and hence.11. These elements are Pareto optimal and have a domination rank of zero.5. the new point is not dominated by x2 . x2 ) may be larger than f2 (x2 + ∆). X′ ) of elements that dominate each element x ∈ X′ .11 with the plots of the two functions f3 and f4 in Figure 3. . i. A level lower than f2 (x5 ). This Pareto ranking approach is also used in many optimization algorithms as part of the fitness assignment scheme (see Section 28. another illustration of the domination relation which may help understanding Pareto optimization can be found in Section 28. f2 will decrease for some time. X2 . not dominated by any other candidate solution. All of them are also dominated by the non-dominated regions that we have just discussed. Example E3. Another method to visualize the Pareto relationship is outlined in Figure 3. Since the f1 values of the points in [x5 .3 on page 275 (Figure 28.19 (Example E3. f1 (x2 + ∆) > f1 (x) holds for all of them. as the name says.1). x6 ] are also higher than those of the points in (x3 . Hence. f1 and f2 conflict f1 ≁[x2 . but f1 keeps rising. If we now go a small step ∆ to the right. all points in the set [x5 . we can derive similar relations. x4 ] because f1 keeps rising until x4 is reached.7 Cont. x4 ]. Again. f3 and f4 have high values.3. the worst is the element x in terms of Pareto optimization. and X4 . for instance). regions of the problem space where both functions have small values are dominated by very few elements. x6 ] (which also contains the global maximum of f1 ) dominate those in (x3 .x3 ] f2 . For all the points in the white area between x4 and x5 and after x6 .: Pareto Optimization). Although some of the f2 (x) values of the other points x ∈ [x1 .11 for our second graphical example.4 and Table 28. This means that no point in [x1 .11 are better than those which are part of the hills. e. Conversely. x4 ]. In Figure 3. X1 X2 X3 #dom « « « X4 « x1 x2 Figure 3. A non-dominated element is..

). may survive as hardly-dominated solutions (called dominance resistant. 4. allowing the ant to gather 0 food item when walking a distance of 0 length units. allowing the ant to gather 1 food item when walking a distance of 5 length units. Usually. This then leads inevitably to the creation of a great proportion of useless offspring.5 – Artificial Ant – Cont. the question of the actual utility arises in this case. Minimize the size of the program driving the ant. too. The result of the optimization process obviously contains useless but non-dominated individuals which occupy space in the non-dominated set. which are apart from the true Pareto frontier. On one hand. A program x3 consisting of 50 instructions. Furthermore.3.5 on page 57 we have introduced multiple conflicting criteria in this problem. Between x1 and x2 . intelligent one. Therefore. Therefore. allowing the ant to gather 50 food items when walking a distance of 500 length units. it will be kept in the list and other solutions which differ less from each other but are more interesting for us will be discarded. 1.1 Problems of Pure Pareto Optimization 69 Besides the fact that we will usually not be able to list all Pareto optimal elements. some optimization algorithms use a clustering technique to prune the optimal set while maintaining diversity. an empty program x5 is useless because it will lead to no food collection at all.3. we may yield for example: 1. memory restrictions usually force us to limit the size of the list of non-dominated solutions X found during the search. the complete Pareto optimal set is often also not the wanted result of the optimization process. [1400] show that various elements. allowing the ant to gather 61 food items when walking a distance of 1 000 000 length units. Also. processing time is invested in evaluating them (and we have already seen that evaluation may be a costly step of the optimization process). 3. but the ant driven by x2 has to cover ten times the distance created by x1 . on the other hand.5.20 (Example E3. MULTIPLE OBJECTIVE FUNCTIONS 3. such solutions may dominate others which are not optimal but fall into the space behind the interesting part of the Pareto front. In Example E3. A program x3 which forces the ant to scan each and every location on the map is not feasible even if it leads to a maximum in food gathering. 2. A program x1 consisting of 100 instructions. allowing the ant to gather 60 food items when walking a distance of 5000 length units. In this case. Better yet. we are rather interested in some special areas of the Pareto front only.3. this is good since it will preserve a broad scan of the Pareto frontier. the difference in the amount of piled food is only 20%. 5. Minimize the distance covered or the time needed to find the food. see Section 20. Also. A program x2 consisting of 100 instructions. We can again take the Artificial Ant example to visualize this problem. a short but dumb program is of course very different from a longer. If we applied pure Pareto optimization to these objective. A program x5 consisting of 0 instructions. A program x4 consisting of 10 instructions. If such candidate solutions are created and archived. . Ikeda et al. Example E3. Maximize the amount of food piled. When this size limit is reached. non-dominated elements usually have a higher probability of being explored further. 3. 2.2).

1735. Area Combinatorial Problems Computer Graphics Databases References [237. 2072. if and only if it fulfils all constraints: isFeasible(x) ⇔ gi (x) ≥ 0 ∀i ∈ 1. 1822.5. 3034] [2487.2 Goals of Pareto Optimization Purshouse [2228] identifies three goals of Pareto optimization which also mirror the aforementioned problems: 1.2.17 (Feasibility). Comprehensive reviews on techniques for such problems have been provided by Michalewicz [1885]. 2036. these useless offspring will need a good share of the processing time to be evaluated.26) A candidate solution x for which isFeasible(x) does not hold is called infeasible. 2658] .2: Applications and Examples of Constraint Optimization. a candidate solution x is feasible. Constraint handling and the notation of feasibility can be combined with all the methods for single-objective and multi-objective optimization previously discussed. for a given optimization problem. [594. In Table 3. only a feasible individual can be a solution.p∧ hj (x) = 0 ∀j ∈ 1.. p ≥ 0 inequality constraints g and q ≥ 0 equality constraints h may be imposed additionally to the objective functions. and Coello Coello et al. Obviously. i.. e. 599] in the context of Evolutionary Computation. In ?? on page ?? you can find an illustrative discussion on the drawbacks of strict Pareto optimization in a practical example (evolving web service compositions). 3. 1161. Thus. Diversity: If the real set of optimal solutions is too large. we give some examples for constraints in optimization problems. 1921. Such regions of interest are one of the reasons for one further extension of multi-objective optimization problems: Definition D3. 1871. 2488] [1735. there are feasibility limits for the parameters of the candidate solutions or the objective function values. In an optimization problem. Table 3. 3. 2658. Michalewicz and Schoenauer [1890]. 2. there are several reasons to force the optimization process into a wanted direction. Proximity: The optimizer should obtain result sets X which come as close to the Pareto frontier as possible. 3. an optimum. Pertinency: The set X of solutions discovered should match the interests of the decision maker using the optimizer. 1822. There is no use in containing candidate solutions with no actual merits for the human operator.70 3 OPTIMA: WHAT DOES GOOD MEAN? In the next iteration of the optimization process. the set X of solutions actually discovered should have a good distribution in both uniformity and extent.4 Constraint Handling Often..q (3. Then.3.

1888] can only work in problems where the feasible regions are very large. i. chose a penalty function of the form f′ (x) = f(x) + v i=1 [gi (x)] in order to ensure that the function g always stays larger than zero. 1161. This death penalty [1885. 3041] as part of the genotype-phenotype mapping as well. 2072] or adaptive penalties which additionally utilize population statistics [237. An example for a decoder is given in [2472. 1881. [2228] 2. 3034] Engineering [500. Coding: This can be done as part of the genotype-phenotype mapping by using socalled decoders. turned into feasible ones [2228..1 Genome and Phenome-based Approaches One approach to ensure that constraints are always obeyed is to ensure that only feasible candidate solutions can occur during the optimization process. especially in the area of single-objective optimization. 3034] tems Economics and Finances [530.3.4. Thorough discussions on penalty functions have been contributed by Fiacco and McCormick [908] and Smith and Coit [2526]. Additionally. 519. 3. If the problem space contains only few or non-continuously distributed feasible elements. 2487]. Here. There are practically no limits for the ways in which a penalty for infeasibility can be integrated into the objective functions. p −1 for instance. 547. goes back to Courant [647] who introduced the idea of penalty functions in 1943. Repairing: Infeasible candidate solutions could be repaired. Once this has happened. 3070] Function Optimization [1732] 71 3. the information which could be gained from them is disposed with them. The basic idea is that this combination is done in a way which ensures that an infeasible candidate solution always has a worse f′ -value than a feasible one with the same objective values. 530. 3. CONSTRAINT HANDLING Decision Making [1801] Distributed Algorithms and Sys. resulting in a new function f′ which is then actually optimized. such an approach will lead the optimization process to stagnate because most effort is spent in sampling untargeted in the hope to find feasible candidate solutions. the transition to other feasible elements may be too complicated and the search only finds local optima. if infeasible elements are directly discarded.3 Penalty Functions Maybe one of the most popular approach for dealing with constraints. this is achieved 2 by defining f′ as f′ (x) = f(x) + v [h(x)] .[1801.4. 531. Several researchers suggest dynamic penalties which incorporate the index of the current iteration of the optimizer [1458. . e. 501]. Carroll [500. the constraints are combined with the objective function f. In [647].4.2 Death Penalty Probably the easiest way of dealing with constraints in objective and fitness space is to simply reject all infeasible candidate solutions right away and not considering them any further in the optimization process. 1. 2473].4. Various similar approaches exist.
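As a minimal sketch of the penalty idea in code (the class names, the generic objective interface, and the fixed penalty coefficient are choices made here for illustration and are not taken from the book's listings), an equality constraint h(x) = 0 can be folded into the objective in the spirit of Courant's quadratic penalty f'(x) = f(x) + v * [h(x)]^2:

import java.util.function.ToDoubleFunction;

/** Illustrative quadratic penalty: f'(x) = f(x) + v * h(x)^2. */
public final class PenaltyObjective<X> implements ToDoubleFunction<X> {

  private final ToDoubleFunction<X> f; // original objective, subject to minimization
  private final ToDoubleFunction<X> h; // equality constraint, feasible iff h(x) == 0
  private final double v;              // penalty coefficient (assumed fixed here, not adaptive)

  public PenaltyObjective(final ToDoubleFunction<X> f, final ToDoubleFunction<X> h, final double v) {
    this.f = f;
    this.h = h;
    this.v = v;
  }

  @Override
  public double applyAsDouble(final X x) {
    final double violation = this.h.applyAsDouble(x);
    return this.f.applyAsDouble(x) + (this.v * violation * violation);
  }
}

Dynamic or adaptive penalty schemes as cited above would replace the fixed coefficient v with a value that depends on the iteration index or on population statistics.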

the higher will the value of f≥ become. we can transform this to a new objective function which is subject to minimization. The candidate solutions which are not able to fulfill any of the goals succumb to those which fulfill at least some goals. 2. The higher the absolute value of the negative g.27) As long as g is larger than 0. If h is continuous and real. 3070].. f= (x) = max {|h(x)| .4 3 OPTIMA: WHAT DOES GOOD MEAN? Constraints as Additional Objectives Another idea for handling constraints would be to consider them as new objective functions. 3. for instance. 739]... Then. The minus in front of it will turn it positive. Only the solutions that are in the same group are compared on basis on the Pareto domination relation. (∃i ∈ 1. it may arbitrarily closely approach 0 but never actually reach it. No optimization pressure is applied as long as the constraint is met.4. 3052. e. However. e. |f | ˘ 2. |f | : ri ≤ fi (x) ≤ ri ) ∧ (∃j ∈ 1. the constraint is met. ǫ} (3. ri ≤ fi (x) ≤ ri ∀i ∈ 1. r Pohlheim [2190] outlines how this approach can be applied to the set f of objective functions and combined with Pareto optimization: Based on the inequalities.. ri ] for each objective function fi . an area of interest is specified in form of a goal range [˘i . |f | ˘ Using these groups. An approach similar to the idea of interpreting constraints as objectives is Deb’s Goal Programming method [736. The candidate solutions which fulfill all goals are preferred instead of all other individuals that either fulfill some or no goals. i. defining a small cut-off value ǫ (for instance ǫ = 10−6 ) may help the optimizer to satisfy the constraint. In the MOI. It fulfills all of the goals. 2924. the minimum term in f≥ will become negative as well. An equality constraint h can similarly be transformed to a function f= (x) (again subject to minimization). Constraint violations and objective values are ranked together in a part of the MOEA by Ray et al. (fi (x) < ri ) ∨ (fi (x) > ri ) ∀i ∈ 1. |f | : (fj (x) < rj ) ∨ (fj (x) > rj )) ˘ ˘ 3. 0} (3. If g(x) ≥ 0 must hold.72 3. we thus apply a force into the direction of meeting the constraint. Since it is subject to minimization. i. 3.4. three categories of candidate solutions can be defined and each element x ∈ X belongs to one of them: 1.28) f= (x) minimizes the absolute value of h. It fulfills none of the goals.29) ˘ . It fulfills some (but not all) of the goals. i.30) ˘ (3.. e.. since there is no use in maximizing g further than 0.31) ˘ ˘ (3. f≥ will be zero for all candidate solutions which fulfill this requirement.5 The Method Of Inequalities General inequality constraints can also be processed according to the Method of Inequalities (MOI) introduced by Zakian [3050. 3054] in his seminal work on computer-aided control systems design (CACSD) [2404. f≥ (x) = −min {g(x) . [2273] [749]. ˘ (3. a new comparison mechanism is created: 1. if g gets negative..
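The two transformations of Equation 3.27 and Equation 3.28 translate directly into code. The following sketch is illustrative only (names are freely chosen); it turns an inequality constraint g(x) ≥ 0 and an equality constraint h(x) = 0 into values that can be minimized like ordinary objectives.

public final class ConstraintObjectives {

  /** f>=(x) = -min{g(x), 0}: zero if the constraint g(x) >= 0 is met, positive otherwise. */
  public static double inequalityAsObjective(final double g) {
    return -Math.min(g, 0.0);
  }

  /** f=(x) = max{|h(x)|, eps}: minimizing it drives |h(x)| down towards the cut-off eps (e.g. 1e-6). */
  public static double equalityAsObjective(final double h, final double eps) {
    return Math.max(Math.abs(h), eps);
  }
}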

etc. In the whole domain X of the ˘ optimization problem.4. as sketched in Fig. only the regions [x1 . which is why the elements in this area cannot dominat each other and hence. we apply the Pareto-based Method of Inequalities to the graphical Example E3. In Figure 3. and even a set of adjacent.13. we see that f1 rises while f2 degenerates. Based on this class division.13 we apply the Pareto-based Method of Inequalities to our second graphical example (Example E3. are all optimal. situated at the bottom of the illustrated landscape whereas the worst candidate solutions reside on hill tops. x4 ] fulfill all the target criteria.: MOI). It turns out that the elements in [x3 . 1 means that a candidate solution fulfills all inequalities. r3 ] and [˘4 . 3. Therefore. the greater part of the first optimal area from this example is now infeasible because f2 drops under r2 . CONSTRAINT HANDLING 73 By doing so. Also. we can then perform a modified Pareto counting where each candidate solution dominates all the elements in higher classes Fig. Example E3. r r Like we did in the second example for Pareto optimization.: MOI). These elements are. r4 ] on f3 and f4 . e. x4 ] dominate all the elements [x1 . Less effort will be spent on creating and evaluating individuals in parts of the problem space that most probably do not contain any valid solution. we define two different ranges of interest [˘3 .13. the optimization process will be driven into the direction of the interesting part of the Pareto frontier.21 (Example E3. x4 ] from left to right. the second non-dominated region from the Pareto example Figure 3. since f1 rises over r1 . 3.10 suddenly becomes infeasible.r2 y=f1(x) y=f2(x) r ˘ ˘ 1. x⋆ . This number corresponds to the classes to which the elements belong. for an element of class 2.c. By ˘ ˘ doing so. 3} to each of its elements in Fig.b.22 (Example E3. again. 2. and the elements in class 3 fail all requirements. y r1. Example E3. To these elements. at least some of the constraints hold. Pareto comparisons are applied.3. we first assign a number c ∈ {1. ˘ ˘ ˘ ˘ ˘ ˘ ˘ . x⋆ .r2 xÎX1 x1 x2 x3 x4 Figure 3. 3. The result is that multiple single optima x⋆ . We impose the same goal ranges on both objectives r1 = r2 and r1 = r2 .6.6 Cont. i..12. x2 ] and [x3 .7 Cont. If we scan through [x3 . For illustrating purposes. In Figure 3.12: Optimization using the Pareto-based Method of Inequalities approach (first example). we want to plot the quality of the elements in the problem space. x2 ] since they provide higher values in f1 for same values in f2 .7)..13. non-dominated elements 1 2 3 ⋆ X9 occurs.a.

[Figure 3.13: Optimization using the Pareto-based Method of Inequalities approach (second example). Fig. 3.13.a: The ranges applied to f3 and f4; Fig. 3.13.b: The Pareto-based Method of Inequalities class division (classes 1, 2, 3); Fig. 3.13.c: The Pareto-based Method of Inequalities ranking (#dom), with the single optima x⋆1 to x⋆8 and the optimal area X⋆9.]

2190. Especially interesting in our context are methods which have been integrated into Evolutionary Algorithms [736. immutable partial orders8 on the candidate solutions.7 Limitations and Other Methods Other approaches for incorporating constraints into optimization are Goal Attainment [951. for instance. Here. for instance. Again. which leads to high memory consumption and declining search speed. x1 and x2 are both feasible but x1 dominates x2 . UNIFYING APPROACHES 75 A good overview on techniques for the Method of Inequalities is given by Whidborne et al.3. be two candidate solutions x1 .5. the Method of Inequalities. [2924].5. [749] define this principle which can easily be incorporated into the dominance comparison between any two candidate solutions. We cannot decide which of two candidate solutions not dominating each other is the most interesting one for further 7 http://en.17. an overview on this subject is given by Coello Coello et al. One of the ideas behind “externalizing” the assessment process on what is good and what is bad is that Pareto optimization as well as. x1 and x2 both are infeasible but x1 has a smaller overall constraint violation. 749. in [599]. such as the popular Goal Attainment approach by Fonseca and Fleming [951] which is similar to the Pareto-MOI we have adopted from Pohlheim [2190]. there can. 3. 2662]. a candidate solutions x1 ∈ X is preferred in comparison to an element x2 ∈ X [547. 3. x1 is feasible while x2 is not.3. . 739. 2951] and Goal Programming7 [530.wikipedia. As we have seen in the examples discussed in Section 3. Deb et al. The more general concept of an External Decision Maker which (or who) decides which candidate solutions prevail has been introduced by Fonseca and Fleming [952. 2.1 External Decision Maker All approaches for defining what optima are and how constraints should be considered until now were rather specific and bound to certain mathematical constructs. 954]. Since there can be arbitrarily many such elements. impose rigid.5 Unifying Approaches 3. x2 ∈ X with neither x1 ⊣ x2 nor x2 ⊣ x1 . The first drawback of such a partial order is that elements may exists which neither succeed nor precede each other.org/wiki/Goal_programming [accessed 2007-07-03] 8 A definition of partial order relations is specified in Definition D51. There is no “best of the best”. 1297] if 1.16 on page 645. or 3.4. 3. The second drawback of only using rigid partial orders is that they make no statement of the quality relations within X.5. 2393.4. the optimization process may run into a situation where it set of currently discovered “best” candidate solutions X grows quickly.6 Constrained-Domination The idea of constrained-domination is very similar to the Method of Inequalities but natively applies to optimization problems as specified in Definition D3. 531].

19 and which we will discuss later in Section 28.76 3 OPTIMA: WHAT DOES GOOD MEAN? investigation.3. Here comes the External Decision Maker as an expression of the user’s preferences [949] into play. a priori knowledge DM (decision maker) utility/cost EA (an optimizer) results objective values (acquired knowledge) Figure 3. Also. Furthermore. The structure of the decision making process u can freely be defined and may incorporate any of the previously mentioned methods. if the underlying optimizer is maximizing) which maps the space of objective values (Rn ) to the real numbers R. u could. the infeasible (but Pareto optimal) programs for the Artificial Ant as discussed in Section 3. While such ordering methods are a good default approaches able to direct the search into the direction of the Pareto frontier and delivering a broad scan of it. to perform an implicit Pareto ranking. certain regions of interest [955]. This region would exclude. which often cannot be delivered by pure Pareto optimization. or to compare individuals based on pre-specified goal-vectors. 3. Cost values have some meaning outside the optimization process and are based on user preferences. and even interaction with the user (smilar to Example E2. for example. for example.14: An external decision maker providing an Evolutionary Algorithm with utility values. additional density information which deems areas of X which have only sparsely been sampled as more interesting than those which have already extensively been investigated. One example for introducing a quality measure additional to the Pareto relation is the Pareto ranking which we already used in Example E3. since all elements in X are alike.5. This technique allows focusing the search onto solutions which are not only good in the Pareto sense. the utility of a candidate solution is defined by the number of other candidate solutions it is dominated by.3.3 on page 275. this process is another way of resolving the “incomparabilitysituation”. they neglect the fact that the user of the optimization most often is not interested in the whole optimal set but has preferences. 950. Here.2 Prevalence Optimization We have now discussed various ways to define what optima in terms of multi-objective optimization are and to steer the search process into their direction.1. Fonseca and Fleming make a clear distinction between fitness and cost values. be reduced to compute a weighted sum of the objective values. If External Decision Makers are applied in Evolutionary Algorithms or other search paradigms based on fitness measures. 956]. for instance). the plain Pareto relation does not allow us to dispose the “weakest” element from X in order to save memory. it may even incorporate forms of artificial intelligence. but also feasible and interesting from the viewpoint of the user. these will be computed using the values of the cost function instead of the objective functions [949. The task of this decision maker is to provide a cost function u : Rn → R (or utility function. Fitness values on the other hand are an internal construct of the search with no meaning outside the optimizer (see Definition D5. other forms of multi-criterion Decision Making. Into this number. What the user wants is a detailed scan of these areas.14. as illustrated in Figure 3. Since there is a total order defined on the real numbers.5. Let us subsume all .6 on page 48.1 on page 94 for more details).

a2 ) < 0 ∧ cmp(a2 . Even the weighted sum approach and the External Decision Maker do nothing else than mapping multi-dimensional vectors to the real numbers in order to make them comparable. a2 ) ∈ A2 to the real numbers R according to a (strict) partial order9 R: R(a1 .18 (Comparator Function). it covers many of the most sophisticated multi-objective techniques that are proposed.36) (3.32) ¬R(a1 . x2 . such a comparator function cmpf can be imposed on the problem spaces of our optimization problems: Definition D3. e. a2 ) ⇒ ¬R(a1 . It is. the need to compare elements of the problem space in terms of their quality as solution for a given problem winds like a read thread through this matter. a3 )) < 0. 9 Partial orders are introduced in Definition D51. a3 ) < 0 ⇒ cmp(a1 . transitive. 1531. we assume that both are of equal quality. a2 ) ∀a1 . Provided with the knowledge of the objective functions f ∈ f . a1 ) > 0) ∀a1 . a2 ) ⇔ (cmp(a1 . vice versa. for instance. ∈ X (x1 ≺ x2 ) ∧ (x2 ≺ x3 ) ⇒ x1 ≺ x3 ∀x1 . Together with the fitness assignment strategies for Evolutionary Algorithms which will be introduced later in this book (see Section 28. From the concept of Pareto optimization to the Method of Inequalities. x2 ) < 0 ∀x1 . a2 ) = cmp(a2 . A comparator function cmp : A2 → R maps all pairs of elements (a1 . As shortcut for this comparator function. In the latter case. either x1 is better than x2 . By replacing the Pareto approach with prevalence comparisons. Pareto domination relations. 2662]. in [952. i. From the three defining equations. a2 ∈ A (3. a2 ∈ A cmp(a. The subscript f in cmpf illustrates that the comparator has access to all the values of the objective functions in addition to the problem space elements which are its parameters. the Method of Inequalities-based comparisons. . we introduce the prevalence notation as follows: Definition D3. a1 ) = 0 ∀a1 . x2 ) ∈ X2 of candidate solutions to the real numbers R according to Definition D3. x2 ) ∈ R returns a value less than 0.15 on page 645. x2 .20 (Prevalence). a1 ) ⇔ cmp(a1 . Hence. a) = 0∀a ∈ A (3. a2 ) ∧ ¬R(a2 .3 on page 274). An element x1 prevails over an element x2 (x1 ≺ x2 ) if the application-dependent prevalence comparator function cmpf (x1 .34) (3.18. x1 ≺ x2 ⇔ cmpf (x1 .3.37) It is easy to see that we can fit lexicographic orderings.19 (Prevalence Comparator Function).35) We provide an Java interface for the functionality of a comparator functions in Listing 55. as well as the weighted sum combination of objective values easily into this notation.. x3 ∈ X (3. A prevalence comparator function cmpf : X2 → R maps all pairs (x1 . These two results can be expressed with a comparator function cmpf .15 on page 719. If we compare two candidate solutions x1 und x2 . a2 ) < 0) ∧ (cmp(a2 .5. or the result is undecidable. UNIFYING APPROACHES 77 of them in general approach. all the optimization algorithms (especially many of the evolutionary techniques) relying on domination relations can be used in their original form while offering the additional ability of scanning special regions of interests of the optimal frontier.33) R(a1 . Definition D3. for instance. a2 ∈ A (3. cmp(a1 . there are three possible relations between two elements of the problem space. many features of “cmp” can be deduced.

.ant (x1 .  −1 if (f1 (x1 ) > 0 ∧ f1 (x2 ) = 0) ∨     (f2 (x1 ) > 0 ∧ f2 (x2 ) = 0) ∨     (f3 (x1 ) > 0 ∧ f1 (x2 ) = 0)  1 if (f1 (x2 ) > 0 ∧ f1 (x1 ) = 0) ∨ (3.π x1 )  1 if ∃π ∈ Π(1.3.ws and the domination-based Pareto optimization with the objective directions ωi as defined in Section 3. It therefore may incorporate the knowledge on the importance of the objective functions.1 by no longer encouraging the evolution of useless programs for Artificial Ants while retaining the benefits of Pareto optimization.⊣ .23: x⋆ ∈ X ⋆ ⇔ ¬∃x ∈ X : x = x⋆ ∧ x ≺ x⋆ (3.3. |f |) : (x1 <lex. we will discuss some of the most popular optimization strategies. x2 ) =   (f2 (x2 ) > 0 ∧ f2 (x1 ) = 0) ∨     (f3 (x2 ) > 0 ∧ f1 (x1 ) = 0)    cmpf .41) Example E3. we can also easily solve the problem stated in Section 3..5 can be realized with cmpf .   −1 if ∃π ∈ Π(1.π x2 )    0 otherwise   |f | cmpf . Equation 3.π x2 ) ∧    ¬∃π ∈ Π(1. x2 ) otherwise Later in this book. x2 ) =  0 otherwise i=1 (wi fi (x1 ) − wi fi (x2 )) ≡ ws(x1 ) − ws(x2 ) (3.lex (x1 .39)   ¬∃π ∈ Π(1.3.ws (x1 .⊣ (x1 ..5. f2 denote the distance covered in order to find the food. |f |) : (x2 <lex. Although they are usually implemented based on Pareto optimization.23 (Example E3. we will always introduce them using prevalence.3. and f3 be the program length.π x1 ) ∧ cmpf . The comparator function can simply be defined in a way that infeasible candidate solutions will always be prevailed by useful programs. x2 ) =    −1 if x1 ⊣ x2 1 if x2 ⊣ x1 cmpf .40) (3.⊣ (x1 . we will exercise the prevalence approach on the examples of lexicographic optimality (see Section 3.3) we similarly provide cmpf .42 demonstrates one possible comparator function for the Artificial Ant problem.5 – Artificial Ant – Cont. For the weighted sum method with the weights wi (see Section 3..42) cmpf . |f |) : (x1 <lex.lex . Let f1 be the objective function with an output proportional to the food piled.2) and define the comparator function cmpf .38) For illustration purposes. With the prevalence comparator. |f |) : (x2 <lex.). we can construct the optimal set X ⋆ containing globally optimal elements x⋆ in a way very similar to Equation 3..78 3 OPTIMA: WHAT DOES GOOD MEAN? Since the comparator function cmpf and the prevalence relation impose a partial order on the problem space X like the domination relation does. x2 ) = (3.

Let us consider the university life of a student as optimization problem with the goal of obtaining good grades in all the subjects she attends.. i. Would formulate this goal as a constraint or rather as an objective function? Why? [4 points] 15. 3..1 to Equation 3. Use Equation 3. prevent the student from learning 24 hours per day. What is your opinion about the viability of solving this problem with the weighted sum approach with both weights set to 1? [5 points] 17. the Pareto dominance relation. or (b) a set of objectives which can be optimized with. 2} and m ∈ {1. UNIFYING APPROACHES Tasks T3 79 13. Name at least three criteria which could be subject to optimization or constraints in this scenario. [6 points] 14. [10 points] 19. The most basic goal when solving job shop problems is to meet all production deadlines.5 from page 52 to determine all extrema (optima) for (a) f (x) = 3x5 + 2x2 − x. In Example E1. Outline why the sentence also holds for n > 2.3 on page 26. for example. for example. In your opinion. for increasing values of n = |f |. 2. and (c) h(x) = sin x.3. make a statement about the expected impact of the fact discussed in Task 17 on the quality of solutions that a Pareto-based optimizer can deliver for rising dimensionality of the objective space. [15 points] .2 on page 26. Based on your answer. [10 points] 16. hence. we stated the layout of circuits as a combinatorial optimization problem. The job shop scheduling problem was outlined in Example E1. phrase at least two constraints in daily life which. Additionally. If we perform Pareto optimization of n = |f | objective functions fi : i ∈ 1. there can be at most mn−1 non-dominated candidate solutions at a time. 4}. Consider the extreme case that all points known to such an algorithm are non-dominated to answer the question. the Pareto approach. How could be formulate a (a) single objective function for that purpose.n where each of the objectives can take on m different values. Consider a bi-objective optimization problem with X = R as problem space and the two objective functions f1 (x) = e|x−3| and f2 (x) = |x − 100| both subject to minimization. (b) g(x) = 2x2 + x. [10 points] 18. consider an optimization process that decides which individual should be used for further investigation solely on basis of its objective values and.4 on page 276 for n ∈ {1. is it good if most of the candidate solutions under investigation by a Pareto-based optimization algorithm are non-dominated? For answering this question.5. Sketch configurations of mn−1 nondominated candidate solutions in diagrams similar to Figure 28. e.


2. This is not always possible. new elements from X are explored in the hope to improve the solution quality.1 The Search Space In Section 2. the problem spaces X of these problems are quite different. optimization would make not too much sense).1). Instead. After we the problem space is defined. these vectors would have length one (Fig. where the element at index i is the segment width and i + 1 and i + 2 denote its disposition and height (Fig. Such an approach would not only be error-prone.1.1. but in most of the problems we will encounter. From these initial solutions. Example E4.b. 4. and E2.5). . we often reuse well-known search spaces for many different problems. however.5). such as from simple real-valued problems (Example E2. The stone’s throw as well as many antenna and wing blueprints. In order to find these new candidate solutions. where the element at index i specifies a length and i + 1 and i + 2 are angles denoting into which direction the segment is bent as sketched in Fig. It would. Since the problem spaces of the three aforementioned examples are different. A solution of the stone’s throw task. be cumbersome to develop search operations time and again for each new problem space we encounter. Optimization algorithms usually start out with a set of either predefined or automatically generated elements from X which are far from being optimal (otherwise.3. The aircraft wing could be represent as vector of length 3n.c). for instance. 4. was a real number of meters per second. and tasks in aircraft development (Example E2. search operators are used which modify or combine the elements already known.a). this would mean that different sets of search operators are required.1 (Search Spaces for Example E2. it will save a lot of effort.1. 4. An antenna can be described by its blueprint which is structured very differently from the blueprints of aircraft wings or rotors. it would also make it very hard to formulate general laws and to consolidate findings. we will always look for ways to map it to such a search space for which search operators are already available. An antenna consisting of n segments could be encoded as vector of length 3n.Chapter 4 Search Space and Operators: How can we find it? 4. too.3). E2. antenna design (Example E2. For the stone’s throw. Obviously. we gave various examples for optimization problems. can all be represented as vectors of real numbers. for instance. too.1.

[Figure 4.1: Examples for real-coding candidate solutions. Fig. 4.1.a: Stone's throw (x = g = (g1)); Fig. 4.1.b: Antenna design (g = (g0, g1, g2, ...)); Fig. 4.1.c: Wing design (g = (g0, g1, g2, ...)).]

Definition D4.1 (Search Space). The search space G (genome) of an optimization problem is the set of all elements g which can be processed by the search operations.

In dependence on Genetic Algorithms, we often refer to the search space synonymously as genome (http://en.wikipedia.org/wiki/Genome [accessed 2007-07-15]). Genetic Algorithms (see Chapter 29) are optimization algorithms inspired by the biological evolution. The term genome was coined by the German biologist Winkler [2958] as a portmanteau of the words gene and chromosome [1701]. In biology, the genome is the whole hereditary information of organisms and the genotype represents the heredity information of an individual. The words gene, genotype, and phenotype have been introduced [2957] by the Danish biologist Johannsen [1448]. The Genetic Algorithm community has adopted this notation long ago and we use it for specifying the detailed structure of the genotypes.

Definition D4.2 (Genotype). The elements g ∈ G of the search space G of a given optimization problem are called the genotypes.

A biological genotype encodes the complete heredity information of an individual. Likewise, a genotype in optimization encodes a complete candidate solution. Since in the context of optimization we refer to the search space as genome, we call its elements genotypes; even if G = X, we distinguish genotypes g ∈ G and phenotypes x ∈ X. In Example E4.1, the genomes all are vectors of real numbers, but the problem spaces are either a velocity, an antenna design, or a blueprint of a wing.

Definition D4.3 (Gene). The distinguishable units of information in genotypes g ∈ G which encode properties of the candidate solutions x ∈ X are called genes.

The elements of the search space rarely are unstructured aggregations. Instead, they often consist of distinguishable parts, hierarchical units, or well-typed data structures. The same goes for the chromosomes in biology. They consist of genes (http://en.wikipedia.org/wiki/Gene [accessed 2007-07-03]), segments of nucleic acid, which contain the information necessary to produce RNA strings in a controlled manner and encode phenotypical or metabolical properties. A fish, for instance, may have a gene for the color of its scales. This gene, in turn, could have two possible "values" called alleles (http://en.wikipedia.org/wiki/Allele [accessed 2007-07-03]), determining whether the scales will be brown or gray.

Definition D4.4 (Allele). An allele is a value of a specific gene.

Definition D4.5 (Locus). The locus (http://en.wikipedia.org/wiki/Locus_%28genetics%29 [accessed 2007-07-03]) is the position where a specific gene can be found in a genotype [634].

Example E4.2 (Small Binary Genome).
Genetic Algorithms are a widely-researched optimization approach which, in its original form, utilizes bit strings as genomes. Assume that we wished to apply such a plain GA to the minimization of a function f : (0..3) x (0..3) → R such as the trivial f(x = (x[0], x[1])) = x[0] - x[1] with x[0] ∈ X0 and x[1] ∈ X1. Four bits suffice for encoding the two natural numbers in X0 = X1 = 0..3. Figure 4.2 illustrates how this can be achieved by using two bits for each of them. The first gene resides at locus 0 and encodes x[0], whereas the two bits at locus 1 stand for x[1]. With this trivial encoding, the Genetic Algorithm can be applied to the problem and will quickly find the solution x = (0, 3).

[Figure 4.2: The relation of genome, genotype, and the problem space: a genotype g ∈ G from the genome G (search space), with the allele "01" at locus 0 (1st gene) and the allele "11" at locus 1 (2nd gene), is mapped via x = gpm(g) to the phenotype x ∈ X in the phenome X = X0 x X1 (problem space).]

4.2 The Search Operations

Definition D4.6 (Search Operation). A search operation "searchOp" takes a fixed number n ∈ N0 of elements g1, ..., gn from the search space G as parameter and returns another element from the search space.

searchOp : G^n → G   (4.1)

Search operations are used by an optimization algorithm in order to explore the search space. Here, n is an operator-specific value, its arity (http://en.wikipedia.org/wiki/Arity [accessed 2008-02-15]). An operation with n = 0, i.e., a nullary operator, receives no parameters and may, for instance, create randomly configured genotypes (see, for example, Listing 55.2 on page 708).
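The decoding step of Example E4.2 fits into a few lines of Java. The following sketch is illustrative only (names are chosen here; the book's own genotype-phenotype mapping interfaces appear later in its listings) and assumes that within each gene the first bit is the more significant one:

public final class TwoBitGenes {

  /** Decodes the genotype g (length 4) into the phenotype x = (x[0], x[1]) with x[i] in 0..3. */
  public static int[] decode(final boolean[] g) {
    final int x0 = (g[0] ? 2 : 0) + (g[1] ? 1 : 0); // gene at locus 0
    final int x1 = (g[2] ? 2 : 0) + (g[3] ? 1 : 0); // gene at locus 1
    return new int[] { x0, x1 };
  }

  /** The trivial objective f(x) = x[0] - x[1], minimized by x = (0, 3). */
  public static int f(final int[] x) {
    return x[0] - x[1];
  }
}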

An operator with n = 1 could randomly modify its parameter – the mutation operator in Evolutionary Algorithms is an example of this behavior. Many operations which have n > 1 try to combine the genotypes they receive as input. This is done in the hope that positive characteristics of them can be joined into an even better offspring genotype. An example for such operations is the crossover operator in Genetic Algorithms. Differential Evolution utilizes an operator with n = 3, which takes three arguments.

The key idea of all search operations which take at least one parameter (n > 0) (see Listing 55.3) is that

1. it is possible to take a genotype g of a candidate solution x = gpm(g) with good objective values f(x) and to
2. modify this genotype g to get a new one g' = searchOp(g, ...) which
3. can be translated to a phenotype x' = gpm(g') with similar or maybe even better objective values f(x').

This assumption is one of the most basic concepts of iterative optimization algorithms. It is known as causality or locality and discussed in-depth in Section 14.2 on page 161 and Section 24.9 on page 218. If it does not hold, then applying a small change to a genotype leads to large changes in its utility; instead of modifying a genotype, we then could as well create a new, random one directly. Furthermore, the idea often is that good traits can step by step be accumulated and united by an optimization process. This Building Block Hypothesis is discussed in detail in Section 29.5 on page 346, where also criticism of this point of view is provided.

Search operations often are randomized, i.e., they are not functions in the mathematical sense: the return values of two successive invocations of one search operation with exactly the same parameters may be different. Since optimization processes quite often use more than a single search operation at a time, we define the set Op.

Definition D4.7 (Search Operation Set). Op is the set of search operations searchOp ∈ Op which are available to an optimization process.

Op(g1, g2, ..., gn) denotes the application of any n-ary search operation in Op to the genotypes g1, g2, ... ∈ G:

∀g1, g2, ..., gn ∈ G: Op(g1, g2, ..., gn) ∈ G if an n-ary operation exists in Op, Op(g1, g2, ..., gn) = ∅ otherwise   (4.2)

If more than one n-ary operation is contained in Op, the optimization algorithm using Op usually performs some form of decision on which operation to apply. This decision may either be random or follow a specific schedule, and the decision making process may even change during the course of the optimization process. Furthermore, if the n-ary search operation from Op which was applied is a randomized operator, obviously the behavior of Op(.) is randomized as well. Writing Op(g) in order to denote the application of a unary search operation from Op to the genotype g ∈ G is thus a simplified expression which encompasses both, possible randomness in the search operations and in the decision process on which operation to apply.

In the style of Badea and Stanciu [177] and Skubch [2516, 2517], we now can define: With Opk(g1, g2, ...) we denote k successive applications of (possibly different) search operators (with respect to their arities). The kth application receives all possible results of the (k-1)th application of the search operators as parameter. If the parameters g1, g2, ... are left away, i.e., just Opk is written, this chain has to start with a search operation with zero arity. Opk is defined recursively as

Opk(g1, g2, ...) = union over all G ∈ Π(P(Op^(k-1)(g1, g2, ...))) of Op(G), where Op0(g1, g2, ...) = {g1, g2, ...}   (4.3)

where P(G) is the power set of a set G and Π(A) is the set of all possible permutations of the elements of A.
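As a minimal illustration of a unary search operation over bit-string genotypes (the single-bit flip mutation mentioned above; class and method names are chosen here and do not refer to the book's listings):

import java.util.Random;

/** Illustrative unary search operation: single-bit flip mutation on bit-string genotypes. */
public final class SingleBitFlipMutation {

  private final Random random = new Random();

  /** Copies the parent genotype and flips exactly one randomly chosen bit. */
  public boolean[] mutate(final boolean[] parent) {
    final boolean[] offspring = parent.clone();
    final int locus = this.random.nextInt(offspring.length);
    offspring[locus] = !offspring[locus];
    return offspring;
  }
}

Note that two invocations with the same parent may return different offspring, which is exactly the randomized behavior discussed above.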

Definition D4.8 (Completeness). A set Op of search operations "searchOp" is complete if and only if every point g1 in the search space G can be reached from every other point g2 ∈ G by applying only operations searchOp ∈ Op.

∀g1, g2 ∈ G ⇒ ∃k ∈ N1 : g1 ∈ Opk(g2)   (4.4)

Definition D4.9 (Weak Completeness). A set Op of search operations "searchOp" is weakly complete if every point g in the search space G can be reached by applying only operations searchOp ∈ Op.

∀g ∈ G ⇒ ∃k ∈ N1 : g ∈ Opk   (4.5)

A weakly complete set of search operations usually contains at least one operation without parameters, i.e., of zero arity. If at least one such operation exists and this operation can potentially return all possible genotypes, then the set of search operations is already weakly complete. If the set of search operations is strongly complete, the search process can, in theory, find any solution regardless of the initialization. From the opposite point of view, this means that if the set of search operations is not complete, there are elements in the search space which cannot be reached from a random starting point, i.e., the initialization of the algorithm would determine which solutions can be found and which not. If Op achieved its weak completeness only because of a nullary search operation, this is of limited help, since nullary operations are often only used in the beginning of the search process (see, for instance, the Hill Climbing Algorithm 26.1 on page 230). If the set is not even weakly complete, parts of the search space can never be discovered; the optimization process is then probably not able to explore the problem space adequately and may not find optimal or satisfyingly good solutions.

Definition D4.10 (Adjacency (Search Space)). A point g2 is ε-adjacent to a point g1 in the search space G if it can be reached by applying a single search operation "searchOp" to g1 with a probability of more than ε ∈ [0, 1).

adjacentε(g2, g1) ⇔ P(Op(g1) = g2) > ε   (4.6)

The probability parameter ε is defined here in order to allow us to differentiate between probable and improbable transitions in the search space. If we set ε to a very small number such as 1 * 10^-4, we can get a meaningful impression on how likely it is to discover a point g2 when starting from g1. Furthermore, this definition allows us to reason about limits such as lim(ε→0) adjacentε(g2, g1). Normally, we will use the adjacency relation with ε = 0 and abbreviate the notation with adjacent(g2, g1) ≡ adjacent0(g2, g1). Notice that the adjacency relation is not necessarily symmetric.

In the case of Genetic Algorithms with, for instance, single-bit flip mutation operators (see Section 29.3.1 on page 330), adjacent(g2, g1) would be true for all bit strings g2 which have a Hamming distance of no more than one from g1, i.e., which differ by at most one bit from g1. However, if we assume a real-valued search space G = R and a mutation operator which adds a normally distributed random value to a genotype, then adjacent(g2, g1) would always be true, regardless of the distance of g1 and g2. The adjacency relation with ε = 0 has thus no meaning in this context.

Definition D4.11 (Genotype-Phenotype Mapping). The genotype-phenotype mapping (GPM, or ontogenic mapping [2134]) gpm : G → X is a left-total binary relation which maps the elements of the search space G to elements in the problem space X (see Equation 51.29 on page 644, Point 5, for an outline of the properties of binary relations).

∀g ∈ G ∃x ∈ X : gpm(g) = x   (4.7)

In Listing 55.7 on page 711, you can find a Java interface which resembles Equation 4.7. The only hard criterion we impose on genotype-phenotype mappings in this book is left-totality, i.e., that they map each element of the search space to at least one candidate solution. The GPM may be a functional relation if it is deterministic. Nevertheless, it is possible to create mappings which involve random numbers and hence cannot be considered to be functions in the mathematical sense of Section 51.10 on page 646. In this case, we would rewrite Equation 4.7 as Equation 4.8:

∀g ∈ G ∃x ∈ X : P(gpm(g) = x) > 0   (4.8)

Genotype-phenotype mappings should further be surjective [2247], i.e., relate at least one genotype to each element of the problem space. Otherwise, some candidate solutions can never be found and evaluated by the optimization algorithm even if the search operations are complete, and there is no guarantee whether the solution of a given problem can be discovered or not. If a genotype-phenotype mapping is injective, which means that it assigns distinct phenotypes to distinct elements of the search space, we say that it is free from redundancy. There are different forms of redundancy; some are considered to be harmful for the optimization process, others have positive influence (see Section 16.5 on page 174 for more information). Most often, GPMs are not bijective (since they are neither necessarily injective nor surjective). If, however, a genotype-phenotype mapping is bijective, we can construct an inverse mapping gpm^-1 : X → G:

gpm^-1(x) = g ⇔ gpm(g) = x ∀x ∈ X, g ∈ G   (4.9)

In Section 28.2, you can find a thorough discussion of different types of genotype-phenotype mappings used in Evolutionary Algorithms.

Based on the genotype-phenotype mapping and the definition of adjacency in the search space given earlier in this section, we can also define a neighboring relation for the problem space.

Definition D4.12 (Neighboring (Problem Space)). Under the condition that the candidate solution x1 ∈ X was obtained from the element g1 ∈ G by applying the genotype-phenotype mapping, i.e., x1 = gpm(g1), a point x2 is ε-neighboring to x1 if it can be reached by applying a single search operation and a subsequent genotype-phenotype mapping to g1 with at least a probability of ε ∈ [0, 1).

neighboringε(x2, x1) ⇔ [sum over all g2 with adjacent0(g2, g1) of P(Op(g1) = g2) * P(x2 = gpm(g2) | Op(g1) = g2)] > ε, where x1 = gpm(g1)   (4.10)

Equation 4.10 provides a general definition of the neighboring relation. If the genotype-phenotype mapping is deterministic (which usually is the case), Equation 4.10 boils down to Equation 4.11:

neighboringε(x2, x1) ⇔ [sum over all g2 with adjacent0(g2, g1) ∧ x2 = gpm(g2) of P(Op(g1) = g2)] > ε, where x1 = gpm(g1)   (4.11)
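The book models this mapping with the interface given in Listing 55.7. Purely as a sketch (names are chosen here and the concrete instance is only an example), a deterministic GPM can be expressed as a single-method interface together with a simple bit-string-to-integer decoding:

public interface GenotypePhenotypeMapping<G, X> {

  /** Maps a genotype g from the search space G to a candidate solution x from the problem space X. */
  X map(G genotype);

  /** Example instance: interprets a boolean array as an unsigned binary number (surjective onto 0..2^n - 1). */
  GenotypePhenotypeMapping<boolean[], Integer> BITS_TO_INT = genotype -> {
    int value = 0;
    for (final boolean bit : genotype) {
      value = (value << 1) | (bit ? 1 : 0);
    }
    return value;
  };
}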

The condition x1 = gpm(g1) in Definition D4.12 is relevant if the genotype-phenotype mapping "gpm" is not injective (see Equation 51.31 on page 644), i.e., if more than one genotype can map to the same phenotype. In this case, the set of ε-neighbors of a candidate solution x1 depends on how it was discovered, i.e., on the genotype g1 it was obtained from. In other words, the neighborhood relation under non-injective genotype-phenotype mappings depends on the genotypes from which the phenotypes under investigation originated, as can be seen in Example E4.3. Example E5.1 on page 96 shows an account of how the search operations and, thus, the definition of neighborhood, influence the performance of an optimization algorithm.

Example E4.3 (Neighborhoods under Non-Injective GPMs).
As an example for neighborhoods under non-injective genotype-phenotype mappings, let us imagine that we have a very simple optimization problem where the problem space X only consists of the three numbers 0, 1, and 2, i.e., X = {0, 1, 2}. Imagine further that we would use the bit strings of length three as search space, i.e., G = B^3 = {0, 1}^3 with g = (g1, g2, g3). As genotype-phenotype mapping gpm : G → X, we would apply the function x = gpm(g) = (g1 + g2) * g3, as sketched in Figure 4.3.

[Figure 4.3: Neighborhoods under non-injective genotype-phenotype mappings: the search space G = B^3 = {0, 1}^3 of bit strings g = (g1, g2, g3), the problem space X = {0, 1, 2} of the natural numbers 0, 1, and 2, and the mapping gpm(g) = (g1 + g2) * g3; the genotypes gA = 010 and gB = 110 both map to 0.]

The two genotypes gA = 010 and gB = 110 both map to 0 since (0 + 1) * 0 = (1 + 1) * 0 = 0. If the only search operation available would be a single-bit flip mutation, then all genotypes which have a Hamming distance of 1, i.e., which differ by exactly one bit, are adjacent in the search space. This means that gA = 010 is adjacent to 110, 000, and 011, while gB = 110 is adjacent to 010, 100, and 111. With one search step from gA, gpm(110) = gpm(000) = 0 and gpm(011) = 1 can be reached. Hence, the candidate solution x = 0 is neighboring to 0 and 1 if it was obtained by applying the genotype-phenotype mapping to gA = 010, i.e., its neighborhood is {0, 1}. However, if x = 0 was the result of applying the genotype-phenotype mapping to gB = 110, its neighborhood is {0, 2}, since gpm(010) = gpm(100) = 0 and gpm(111) = 2.

If the search space G and the problem space X are the same (G = X) and the genotype-phenotype mapping is the identity mapping, then the neighboring relation (defined on the problem space) equals the adjacency relation (defined on the search space). Since the identity mapping is injective, the condition x1 = gpm(g1) is always true, it follows that g2 = x2, and the sum in Equation 4.11 breaks down to P(Op(g1) = g2).

(G = X) ∧ (gpm(g) = g ∀g ∈ G) ⇔ neighboring ≡ adjacent   (4.12)

In the following, we will use neighboringε to denote a ε-neighborhood relation while implicitly assuming the dependencies on the search space or that the GPM is injective.
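Example E4.3 above can be reproduced with a few lines of code. The following sketch (illustrative only; identifiers are chosen here) implements the mapping gpm(g) = (g1 + g2) * g3 and enumerates the phenotypes reachable from a given genotype with one single-bit flip, which is exactly the ε = 0 neighborhood discussed above:

import java.util.ArrayList;
import java.util.List;

public final class NonInjectiveGpmExample {

  /** The genotype-phenotype mapping of Example E4.3 over bits g = (g[0], g[1], g[2]). */
  public static int gpm(final int[] g) {
    return (g[0] + g[1]) * g[2];
  }

  /** Phenotypes reachable from g by one single-bit flip, i.e., the neighborhood of gpm(g) via g. */
  public static List<Integer> neighborhood(final int[] g) {
    final List<Integer> result = new ArrayList<>();
    for (int locus = 0; locus < g.length; locus++) {
      final int[] adjacent = g.clone();
      adjacent[locus] = 1 - adjacent[locus]; // flip exactly one bit
      result.add(gpm(adjacent));
    }
    return result;
  }

  public static void main(final String[] args) {
    System.out.println(neighborhood(new int[] {0, 1, 0})); // from gA = 010: [0, 0, 1], i.e., {0, 1}
    System.out.println(neighborhood(new int[] {1, 1, 0})); // from gB = 110: [0, 0, 2], i.e., {0, 2}
  }
}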

Since the identity mapping is injective, the condition x1 = gpm(g1) is always true in this case:

  (G = X) ∧ (gpm(g) = g ∀g ∈ G) ⇔ neighboring ≡ adjacent   (4.12)

In the following, we will use neighboringε to denote an ε-neighborhood relation, implicitly assuming the dependencies on the search space or that the GPM is injective. With neighboring(x1, x2), i.e., if ε is omitted, we denote that x1 and x2 are ε = 0-neighbors. Furthermore,

  x1 = x2 ⇒ neighboringε(x1, x2)   ∀x1, x2 ∈ X   (4.13)

4.4 Local Optima

We now have the means to define the term local optimum more precisely. From Example E4.3, it becomes clear that such a definition is not as trivial as the definitions of global optima given in Chapter 3. In the case of single-objective minimization, we could say "A (local) minimum x̌l ∈ X of one (objective) function f : X → R is an input element with f(x̌l) < f(x) for all x neighboring to x̌l."

Example E4.4 (Types of Local Optima). If we assume minimization, the global optimum x⋆ of the objective function given in Figure 4.4 can easily be found. Obviously, every global optimum is also a local optimum. However, such a definition would disregard locally optimal plateaus, such as the set Xl⋆ of candidate solutions sketched in Figure 4.4. If we simply exchange the "<" in f(x̌l) < f(x) with a "≤" and obtain f(x̌l) ≤ f(x), however, we would classify the elements in XN as local optima as well. This would contradict the intuition, since they are clearly worse than the elements on the right side of XN.

Figure 4.4: Minima of a single objective function f(x), showing a locally optimal set Xl⋆, a region XN, and a global minimum x⋆.

Hence, a more fine-grained definition is necessary. In order to define what a local minimum is, let us, as a first step and similar to Handl et al. [1178], create a definition of areas in the search space based on the specification of ε-neighboring (Definition D4.12).

Definition D4.13 (Connection Set). A set X ⊆ X is ε-connected if all of its elements can be reached from each other with consecutive search steps, each of which having at least probability ε ∈ [0, 1).

4.4.1 Local Optima of Single-Objective Problems

Definition D4.14 (ε-Local Minimal Set). An ε-local minimal set X̌l ⊆ X under objective function f and problem space X is a connected set for which Equation 4.15 holds:

  neighboringε(x, x̌l) ⇒ [(x ∈ X̌l) ∧ (f(x) = f(x̌l))] ∨ [(x ∉ X̌l) ∧ (f(x̌l) < f(x))]   ∀x̌l ∈ X̌l, x ∈ X   (4.15)

An ε-local minimum x̌l is an element of an ε-local minimal set. We will use the terms local minimum set and local minimum to denote scenarios where ε = 0.

Definition D4.15 (ε-Local Maximal Set). An ε-local maximal set X̂l ⊆ X under objective function f and problem space X is a connected set for which Equation 4.16 holds:

  neighboringε(x, x̂l) ⇒ [(x ∈ X̂l) ∧ (f(x) = f(x̂l))] ∨ [(x ∉ X̂l) ∧ (f(x̂l) > f(x))]   ∀x̂l ∈ X̂l, x ∈ X   (4.16)

Again, an ε-local maximum x̂l is an element of an ε-local maximal set, and we will use the terms local maximum set and local maximum to indicate scenarios where ε = 0.

Definition D4.16 (ε-Local Optimal Set (Single-Objective Optimization)). An ε-local optimal set Xl⋆ ⊆ X is either an ε-local minimal set or an ε-local maximal set, depending on the definition of the optimization problem.

Similar to the case of local maxima and minima, we use the term ε-local optimum to denote candidate solutions which are part of ε-local optimal sets. If ε is omitted in the name, ε = 0 is assumed, which is probably the most common use. We will again use the terms ε-local optimum, local optimum set, and local optimum.

4.4.2 Local Optima of Multi-Objective Problems

As basis for the definition of local optima in multi-objective problems, we use the prevalence relations and the concepts provided by Handl et al. [1178]. In Section 3.5.2, we showed that these encompass the other multi-objective concepts as well. Therefore, "prevails" can as well be read as "dominates" in the following definition.

Definition D4.17 (ε-Local Optimal Set (Multi-Objective Optimization)). An ε-local optimal set Xl⋆ ⊆ X under the prevalence operator "≺" and problem space X is a connected set for which Equation 4.14 holds:

  neighboringε(x, x⋆l) ⇒ [(x ∈ Xl⋆) ∧ (¬ x ≺ x⋆l)] ∨ [(x ∉ Xl⋆) ∧ (x⋆l ≺ x)]   ∀x⋆l ∈ Xl⋆, x ∈ X   (4.14)

See, for example, Example E13.4 on page 158 for an example of local optima in multi-objective problems.

4.5 Further Definitions

In order to ease the discussions of different Global Optimization algorithms, we furthermore define the data structure individual, which is the unison of the genotype and phenotype of an individual.

Definition D4.18 (Individual). An individual p is a tuple p = (p.g, p.x) ∈ G × X of an element p.g in the search space G and the corresponding element p.x = gpm(p.g) in the problem space X.

Besides this basic individual structure, many practical realizations of optimization algorithms use extended variants of such a data structure to store additional information like the objective and fitness values. Such endogenous information [302] often represents local optimization strategy parameters, for instance. It then usually coexists with global (exogenous) configuration settings. In the algorithm definitions later in this book, we will therefore consider individuals as tuples in G × X × W, where W is the space of the additional information, for example W = Rn × R.

Again relating to the notation used in Genetic Algorithms, we will refer to lists of such individual records as populations.

Definition D4.19 (Population). A population pop is a list consisting of ps individuals which are jointly under investigation during an optimization process. The genotypes and phenotypes in the population are related according to Definition D4.18:

  pop ∈ (G × X)^ps ⇒ (∀p = (p.g, p.x) ∈ pop ⇒ p.x = gpm(p.g))   (4.17)

In many cases, we will access the phenotypes p.x without explicitly referring to the genotype-phenotype mapping "gpm", since "gpm" is already implicitly included in the relation of p.x and p.g.
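As a minimal illustration of Definitions D4.18 and D4.19, the following Java sketch shows one possible individual record and population list. The field names g and x mirror p.g and p.x; the extra fields for objective and fitness values correspond to the additional information W, and all names are illustrative assumptions rather than the book's own listings.

  import java.util.ArrayList;
  import java.util.List;

  // Minimal sketch of the individual record from Definition D4.18.
  class Individual<G, X> {
    G g;          // the genotype, an element of the search space G
    X x;          // the phenotype, an element of the problem space X
    double[] f;   // optional: cached objective values (part of the "W" information)
    double v;     // optional: cached fitness value
  }

  // A population is simply a list holding ps such records (Definition D4.19).
  class PopulationFactory {
    static <G, X> List<Individual<G, X>> createPopulation(final int ps) {
      return new ArrayList<>(ps);
    }
  }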

Tasks T4

20. Define a search space and search operations for the Traveling Salesman Problem as discussed in Example E2.2 on page 45. [10 points]

21. {0, 1, 2}^n denotes a search space which consists of strings of length n where each element of the string is from the set {0, 1, 2}. An example for such a string for n = 5 is (1, 2, 0, 1, 2). Define a nullary search operation which creates new random points in the n-dimensional search space G = {0, 1, 2}^n. A nullary search operation has no parameters (see Section 4.2); a nullary search operation here would thus just create a new, random string of this type. [5 points]

22. Define a unary search operation which randomly modifies a point in the n-dimensional search space G = {0, 1, 2}^n. See Task 21 for a description of the search space. A unary search operation for G = {0, 1, 2}^n has one parameter, i.e., one string of length n where each element is from the set {0, 1, 2}. It takes such a string g ∈ G and modifies it, by adding or subtracting values or by permutating g randomly, for example. It returns a new string g′ ∈ G which is a bit different from g. This modification has to be done in a random way. [5 points]

23. Define a possible unary search operation which modifies a point g ∈ R2 and which could be used for minimizing a black box objective function f : R2 → R. [10 points]

24. Assume that we are car salesmen and search for an optimal configuration for a car. We have a search space X = X1 × X2 × X3 × X4 where
• X1 = {with climate control, without climate control},
• X2 = {5 gears, 6 gears, automatic 5 gears, automatic 6 gears},
• X3 = {with sun roof, without sun roof}, and
• X4 = {red, blue, silver, black, green, white, yellow}.
Define a proper search space for this problem along with a unary search operation and a genotype-phenotype mapping. [5 points]

25. When trying to find the minimum of a black box objective function f : R3 → R, we could use the same search space and problem space G = X = R3. Instead, we could use G = R2 and apply a genotype-phenotype mapping gpm : R2 → R3; for example, we could set gpm(g) = (g[0], g[1], g[0] + g[1])^T. By this approach, the optimization algorithm would only face a two-dimensional problem instead of a three-dimensional one. What do you think about this idea? Is it good? [5 points]

26. We can define a search space G = (1..n)^n for a Traveling Salesman Problem with n cities. This search space would clearly be numerical since G ⊆ Nn ⊆ Rn. Discuss whether it makes sense to use a pure numerical optimization algorithm (with operations similar to the one you defined in Task 23, for example) to find solutions to the Traveling Salesman Problem in this search space. [10 points]

27. Discuss some basic assumptions usually underlying the design of unary and binary search operations. [5 points]

28. Assume that you want to solve an optimization problem defined over an n-dimensional vector of natural numbers where each element of the vector stems from the natural interval⁹ a..b. In other words, the problem space X is a subset of the natural numbers given by X = (a..b)^n (i.e., X ⊆ Nn). n, a, and b are fixed constants that are specific to the optimization problem you have. You have an optimization algorithm available which is suitable for continuous optimization problems defined over m-dimensional real vectors where each vector element stems from the real interval¹⁰ [c, d], i.e., which deals with search spaces G ⊆ Rm, more specifically G = [c, d]^m. m, c, and d can be chosen freely. How can you solve your optimization problem with the available algorithm? Provide an implementation of the necessary module in a programming language of your choice. The implementation should obey the interface definitions in Chapter 55. If you choose a programming language different from Java, you should give the methods names similar to those used in that section and write comments which relate your code to the interfaces given there. [10 points]

⁹ The natural interval a..b contains only the natural numbers from a to b. For example, 1..5 = {1, 2, 3, 4, 5} and π ∉ 1..5.
¹⁰ The real interval [c, d] contains all the real numbers between c and d, i.e., c, d ∈ [c, d]. For example, π ∈ [1, 4].

Chapter 5
Fitness and Problem Landscape: How does the Optimizer see it?

5.1 Fitness Landscapes

A very powerful metaphor in Global Optimization is the fitness landscape¹. Like many other abstractions in optimization, fitness landscapes have been developed and extensively researched by evolutionary biologists [701, 1030, 2985]. Basically, they are visualizations of the relationship between the genotypes or phenotypes in a given population and their corresponding reproduction probability. The idea of such visualizations goes back to Wright [2985], who used level contour diagrams in order to outline the effects of selection, mutation, and crossover on the capabilities of populations to escape locally optimal configurations. Similar abstractions arise in many other areas [2598], like in the physics of disordered systems such as spin glasses [311, 1879].

In Chapter 28, we will discuss Evolutionary Algorithms, which are optimization methods inspired by natural evolution. Evolutionary Algorithms were initially developed as single-objective optimization methods. Here, the objective values were directly used as fitness and the "reproduction probability", i.e., the chance of a candidate solution for being subject to further investigation, was proportional to them. In multi-objective optimization applications with more sophisticated fitness assignment and selection processes, this simple approach does not reflect the biological metaphor correctly anymore. In the context of this book, we therefore deviate from this view. Since it would possibly be confusing for the reader if we used a different definition for fitness landscapes than the rest of the world, we introduce the new term problem landscape and keep using the term fitness landscape in the traditional manner.

Langdon and Poli [1673]² explain that fitness landscapes can be imagined as a view on a countryside from far above. Here, the height of each point is analogous to its objective value. An optimizer can then be considered as a short-sighted hiker who tries to find the lowest valley or the highest hilltop. Starting from a random point on the map, she wants to reach this goal by walking the minimum distance. The Evolutionary Algorithm research community has widely adopted fitness landscapes as the relation between individuals and their objective values [865, 1500, 1912]. In Figure 11.1 on page 140, you can find some examples of fitness landscapes.

1 http://en.wikipedia.org/wiki/Fitness_landscape [accessed 2007-07-03]
2 This part of [1673] is also online available at http://www.cs.ucl.ac.uk/staff/W.Langdon/FOGP/intro_pic/landscape.html [accessed 2008-02-15].

5.2 Fitness as a Relative Measure of Utility

When performing a multi-objective optimization, i.e., n = |f| > 1, the utility of a candidate solution is characterized by a vector in Rn. In Section 3.3 on page 55, we have seen how such vectors can be compared directly in a consistent way. Such comparisons, however, only state whether one candidate solution is better than another one or not, but give no information on how much better or worse it is and how interesting it is for further investigation. In many optimization techniques, especially in Evolutionary Algorithms, a scalar, relative measure is needed. For each candidate solution, this single number represents its fitness as a solution for the given optimization problem.

Definition D5.1 (Fitness). The fitness³ value v : G × X → R+ maps an individual p to a positive real value v(p) denoting its utility as solution or its priority in the subsequent steps of the optimization process.

The process of computing such a fitness value is often not solely depending on the absolute objective values of the candidate solutions but also on those of the other individuals known. This allows the search process to compute the fitness as a relative utility rating. In multi-objective optimization, the partial orders created by Pareto or prevalence comparators are therefore most often mapped to the positive real numbers R+. The fitness could, for instance, be the position of a candidate solution in the list of investigated elements sorted according to the Pareto relation. The fitness mapping v is usually built by incorporating the information of a set of individuals, a population pop. A fitness assignment procedure v = assignFitness(pop, cmpf) as defined in Definition D28.6 on page 274 is typically used to construct v based on the set pop of all individuals which are currently under investigation and a comparison criterion cmpf (see Definition D3.20 on page 77). This procedure is usually repeated in each iteration t of the optimizer, and different fitness assignments vt=1, vt=2, ... result. Although it is a bit counter-intuitive that a fitness function is built repeatedly and is not a constant mapping, this is indeed the case in most multi-objective optimization situations. Hence, fitness values often only have a meaning inside the optimization process [949] and may change in each iteration of the optimizer, even if the objective values stay constant. Such fitness values are thus only valid and make only sense within the optimization process.

In practical implementations, the individual records are often extended to also facilitate the fitness values. If we refer to v(p) multiple times in an algorithm for the same p and pop, this should be understood as access to such a cached value which is actually computed only once. This is indeed a good implementation decision, since some fitness assignment processes have a complexity of O(len(pop)²) or higher, because they may involve comparing each individual with every other one in pop as well as computing density information or clustering.

We can detect a similarity between a fitness function v and a heuristic function φ as defined in Definition D1.14 on page 34 and the more precise Definition D46.5 on page 518. Both measures reflect some ranking criterion used to pick the point(s) in the search space which should be investigated further in the next iteration, and both give indicators about the utility of p. The difference between fitness functions and heuristics is that heuristics usually are based on some insights about the concrete structure of the optimization problem, whereas a fitness assignment process obtains its information primarily from the objective values of the individuals. Fitness assignment processes are thus more abstract and general. Also, the fitness can change during the optimization process and is often a relative measure, whereas heuristic values for a fixed point in the search space are usually constant and independent from the structure of the other candidate solutions under investigation.

3 http://en.wikipedia.org/wiki/Fitness_(genetic_algorithm) [accessed 2008-08-10]
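To make the relative character of fitness tangible, the following minimal Java sketch assigns rank-based fitness for a single-objective minimization problem, reusing the Individual record sketched earlier. The method name assignFitness only mirrors the v = assignFitness(pop, cmpf) notation; the concrete sorting-based strategy and all other names are illustrative assumptions.

  import java.util.Comparator;
  import java.util.List;

  // Minimal sketch of a relative, rank-based fitness assignment: the better an
  // individual compares under cmpf, the smaller (better) its fitness rank.
  class RankFitness {
    static <G, X> void assignFitness(final List<Individual<G, X>> pop,
                                     final Comparator<Individual<G, X>> cmpf) {
      pop.sort(cmpf);                 // individuals judged better by cmpf come first
      for (int i = 0; i < pop.size(); i++) {
        pop.get(i).v = i;             // fitness = rank in the sorted population
      }
    }
  }

Note that the fitness of one and the same individual depends on the rest of pop: with a different population, the same objective value can yield a different rank, which matches the statement above that fitness values only have a meaning inside the optimization process.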

The term fitness has been borrowed from biology⁴ [2136, 2542] by the Evolutionary Algorithms community. When the first applications of Genetic Algorithms were developed, the focus was mainly on single-objective optimization. Back then, this single function was called fitness function, and thus objective value ≡ fitness value was set. This point of view is obsolete in principle, yet you will find many contemporary publications that use this notion. This is partly due to the fact that in simple problems with only one objective function, the old approach of using the objective values directly as fitness, i.e., vp = f(p.x), can sometimes actually be applied. In multi-objective optimization processes, this is not possible, and fitness assignment processes like those which we are going to elaborate on in Section 28.3 on page 274 are used instead.

In the context of this book, fitness is subject to minimization, i.e., elements with smaller fitness are "better" than those with higher fitness. Although this definition differs from the biological perception of fitness, it makes fitness and objective values compatible in that it allows us to consider the fitness of a multi-objective problem to be the objective value of a single-objective one, and it complies with the idea that optimization algorithms are to find the minima of mathematical functions (if nothing else has been stated).

5.3 Problem Landscapes

The fitness steers the search into areas of the search space considered promising to contain optima. The quality of an optimization process is largely defined by how fast it finds these points. Since most of the algorithms discussed in this book are randomized, meaning that two consecutive runs of the same configuration may lead to totally different results, we can define a quality measure based on probabilities.

Definition D5.2 (Problem Landscape). The problem landscape Φ : X × N1 → [0, 1] maps all the points x in a finite problem space X to the cumulative probability of reaching them until (inclusively) the τth evaluation of a candidate solution (for a certain configuration of an optimization algorithm):

  Φ(x, τ) = P (x has been visited until the τth evaluation)   ∀x ∈ X, τ ∈ N1   (5.1)

This definition of problem landscape is very similar to the performance measure definition used by Wolpert and Macready [2963, 2964] in their No Free Lunch Theorem (see Chapter 23 on page 209). In our understanding, all entities involved in an optimization process over a finite problem space directly influence the problem landscape: the way the initial elements are picked, the choice of the search operations in the search space G, the genotype-phenotype mapping, the objective functions, the fitness assignment process, and the way individuals are selected for further exploration all have impact on the probability to discover certain candidate solutions, i.e., on the problem landscape Φ.

Problem landscapes are essentially some form of cumulative distribution functions (see Definition D53.18 on page 658). On basis of this cumulative character, we can furthermore make the following assumptions about Φ(x, τ):

  0 ≤ Φ(x, τ) ≤ 1   ∀x ∈ X, τ ∈ N1   (5.2)
  Φ(x, τ1) ≤ Φ(x, τ2)   ∀τ1 < τ2 ∧ x ∈ X, τ1, τ2 ∈ N1   (5.3)

As such, problem landscapes are not only closer to the original meaning of fitness landscapes in biology, they can also be considered as the counterparts of the Markov chain models for the search state used in convergence proofs such as those in [2043], which resemble probability mass/density functions.

4 http://en.wikipedia.org/wiki/Fitness_%28biology%29 [accessed 2008-02-22]

Example E5.1 (Influence of Search Operations on Φ). Let us now give a simple example for problem landscapes and how they are influenced by the optimization algorithm applied to them. Figure 5.1 illustrates one objective function, defined over a finite subset X of the two-dimensional real plane, which we are going to optimize. In this example, we do not need to differentiate between fitness and objective values. We can spot one local optimum x̌⋆l and one global optimum x⋆. Between them, there is a hill, an area of very bad fitness. The rest of the problem space exhibits a small gradient into the direction of its center. The optimization algorithm will likely follow this gradient and sooner or later discover x̌⋆l or x⋆. The chances to first discover x̌⋆l are higher, since it is closer to the center of X and thus has a bigger basin of attraction.

Figure 5.1: An example optimization problem f(x) with a local optimum x̌⋆l and a global optimum x⋆.

For optimization, we will use a very simple Hill Climbing algorithm⁵, which initially randomly draws one candidate solution with uniform probability from X. In each iteration, it creates a new candidate solution from the current one by using a unary search operation. The old and the new candidate are compared, and the better one is kept; here, better means having lower fitness. In this example, we use the problem space X also as search space G, i.e., the genotype-phenotype mapping is the identity mapping.

We have recorded the traces of two experiments with 1.3 million runs of the optimizer (8000 iterations each). From these records, we can approximate the problem landscapes very well. In the first experiment, the Hill Climbing optimizer used a search operation searchOp1 : X → X which creates a new candidate solution according to a (discretized) normal distribution around the old one:

  searchOp1(x) ≡ (x1 + randomNorm(0, 1), x2 + randomNorm(0, 1))   (5.4)

We divided X into a regular lattice, and in the second experiment series, we used an operator searchOp2 : X → X which produces candidate solutions that are direct neighbors of the old ones in this lattice:

  searchOp2(x) ≡ (x1 + randomUni[−1, 1), x2 + randomUni[−1, 1))   (5.5)

Both operators are complete, since each point in the search space can be reached from each other point by applying them. The problem landscape Φ produced by the first operator is shown in Figure 5.2, the one produced by the second in Figure 5.3. In the first few iterations, the probabilities of the elements in the problem space of being discovered are very low, near to zero, and rather uniformly distributed. Since our problem space is a 36 × 36 lattice, this probability is exactly 1/36² in the first iteration. Starting with the tenth or so evaluation, small peaks begin to form around the places where the optima are located. These peaks then begin to grow.

5 Hill Climbing algorithms are discussed thoroughly in Chapter 26.
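The following Java sketch outlines the Hill Climber with the normally distributed step operator of Equation 5.4. The concrete objective function, the handling of the 36 × 36 lattice boundaries, and all names are illustrative assumptions; the point is only to show the compare-and-keep-the-better loop described above.

  import java.util.Random;
  import java.util.function.ToDoubleFunction;

  // Minimal sketch of the Hill Climber of Example E5.1 with searchOp1.
  class HillClimberSketch {
    static double[] climb(final ToDoubleFunction<double[]> f, final int iterations,
                          final Random rnd) {
      double[] x = {36d * rnd.nextDouble(), 36d * rnd.nextDouble()}; // uniform start
      double fx = f.applyAsDouble(x);
      for (int t = 0; t < iterations; t++) {
        // searchOp1: add a standard-normally distributed offset to each coordinate
        final double[] xNew = {x[0] + rnd.nextGaussian(), x[1] + rnd.nextGaussian()};
        final double fNew = f.applyAsDouble(xNew);
        if (fNew < fx) {      // keep the better (lower-fitness) candidate
          x = xNew;
          fx = fNew;
        }
      }
      return x;
    }
  }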

Figure 5.2: The problem landscape of the example problem derived with searchOp1; the subfigures 5.2.a to 5.2.l show Φ(x, τ) for increasing numbers of evaluations τ, from the first iterations up to τ = 8000.

Figure 5.3: The problem landscape of the example problem derived with searchOp2; the subfigures 5.3.a to 5.3.l show Φ(x, τ) for the same numbers of evaluations τ as in Figure 5.2.

Since the local optimum x̌⋆l resides at the center of a large basin and the gradients point straight into its direction, it has a higher probability of being found than the global optimum x⋆. In the final subfigures, we can see that Φ(x̌⋆l, 8000) ≈ 0.7, considerably exceeding the corresponding value of the global optimum. The difference between the two search operators tested becomes obvious starting with approximately the 2000th iteration. The operator based on the uniform distribution is only able to create candidate solutions in the direct neighborhood of the known points. In other words, if an optimizer using it gets trapped in the local optimum, it can never escape; if it arrives at the global optimum, it will never discover the local one. Hence, only one of the two points will be the result of the optimization process whereas the other optimum remains undiscovered. In the process with the operator utilizing the normal distribution, in contrast, the Φ value of the global optimum begins to rise farther and farther, finally surpassing the one of the local optimum. The reason for this is that with the normal distribution, all points in the search space have a non-zero probability of being found from all other points in the search space, i.e., all elements of the search space are adjacent. Even if the optimizer gets trapped in the local optimum, it will still eventually discover the global optimum and, if we ran this experiment longer, the according probability would converge to 1.

From Example E5.1 we can draw four conclusions:

1. Optimization algorithms discover good elements with higher probability than elements with bad characteristics, given the problem is defined in a way which allows this to happen.
2. The success of optimization depends very much on the way the search is conducted, i.e., on the definition of the neighborhoods of the candidate solutions (see Definition D4.12 on page 86).
3. It also depends on the time (or the number of iterations) the optimizer is allowed to use.
4. Hill Climbing algorithms are no Global Optimization algorithms since they have no general means of preventing getting stuck at local optima.

Tasks T5

29. What are the similarities and differences between fitness values and traditional heuristic values? [10 points]

30. For many (single-objective) applications, a fitness assignment process which assigns the rank of an individual p in a population pop according to the objective value f(p.x) of its phenotype p.x as its fitness v(p) is highly useful. Specify an algorithm for such an assignment process which sorts the individuals in a list (population pop) according to their objective value and assigns the index of an individual in this sorted list as its fitness. Should the individuals be sorted in ascending or descending order if f is subject to maximization and the fitness v is subject to minimization? [10 points]

Chapter 6
The Structure of Optimization: Putting it together.

After discussing the single entities involved in optimization, it is time to put them together to the general structure common to all optimization processes, to an overall picture. This structure consists of the previously introduced well-defined spaces and sets as well as the mappings between them. One example for this structure of optimization processes is given in Figure 6.1, using a Genetic Algorithm which encodes the coordinates of points in a plane into bit strings as an illustration. The same structure was already used in Example E4.2.

6.1 Involved Spaces and Sets

At the lowest level, there is the search space G inside which the search operations navigate. Optimization algorithms can usually only process a subset of this space at a time. The elements of G are mapped to the problem space X by the genotype-phenotype mapping "gpm". In many cases, "gpm" is the identity mapping (if G = X) or a rather trivial, injective transformation. Together with the elements under investigation in the search space, the corresponding candidate solutions form the population pop of the optimization algorithm. A population may contain the same individual record multiple times if the algorithm permits this. Some optimization methods only process one candidate solution at a time. In this case, the size of pop is one and a single variable instead of a list data structure is used to hold the element's data.

Having determined the corresponding phenotypes of the genotypes produced by the search operations, the objective function(s) can be evaluated. In the single-objective case, a single value f(p.x) determines the utility of p.x. In case of a multi-objective optimization, a vector f(p.x) of objective values is computed for each candidate solution p.x in the population. Based on these objective values, the optimizer may compute a real fitness v(p) denoting how promising an individual p is for further investigation, relative to the other known candidate solutions in the population pop. Either based on such a fitness value, directly on the objective values, or on other criteria, the optimizer will then decide how to proceed, i.e., how to generate new points in G, i.e., which elements to use as parameters for the search operations.

Figure 6.1: Spaces, Sets, and Elements involved in an optimization process. The figure relates the fitness space (fitness values v : G × X → R+ built by the fitness assignment process), the objective space Y ⊆ Rn (objective values f : X → Y produced by the objective functions), the problem space X, the search space G, and the population pop ∈ (G × X)^ps of genotypes and phenotypes connected via the genotype-phenotype mapping. Fitness and heuristic values (normally) have only a meaning in the context of a population or a set of candidate solutions.

6.2 Optimization Problems

An optimization problem is the task we wish to solve with an optimization algorithm. In Definition D1.1 and D1.2 on page 22, we gave two different views on how optimization problems can be defined. Here, in the context of this book, we will give a more precise discussion which will be the basis of our considerations later in this book.

Definition D6.1 (Optimization Problem (Mathematical View)). An optimization problem "OptProb" is a triplet OptProb = (X, f, cmpf) consisting of a problem space X, a set f = {f1, f2, ...} of n = |f| objective functions fi : X → R ∀i ∈ 1..n, and a comparator function cmpf : X × X → R which allows us to compare two candidate solutions x1, x2 ∈ X based on their objective values f(x1) and f(x2) (and/or constraint satisfaction properties or other possible features).

An optimization problem, in the context of this book, thus includes the definition of the possible solution structures (X), the means to determine their utilities (f), and a way to compare two candidate solutions (cmpf). This perspective comes closer to the practitioner's point of view, whereas Definition D6.2 below provides us with a mathematical expression for one specific application of an optimization algorithm to a specific optimization problem instance and Definition D6.3 gives a good summary of the theoretical idea of optimization.

Definition D6.2 (Optimization Setup). The setup "OptSetup" of an application of an optimization algorithm is the five-tuple OptSetup = (OptProb, G, Op, gpm, Θ) of the optimization problem OptProb to be solved, the search space G in which the search operations searchOp ∈ Op navigate and which is translated to the problem space X with the genotype-phenotype mapping gpm, and the algorithm-specific parameter configuration Θ (such as size limits for populations, time limits for the optimization process, or starting seeds for random number generators).

Definition D6.3 (Optimization Algorithm). An optimization algorithm is a mapping OptAlgo : OptSetup → Φ × X∗ of an optimization setup "OptSetup" to a problem landscape Φ as well as to the optimization result x ∈ X∗, an arbitrarily sized subset of the problem space.

Each optimization algorithm is defined by its specification that states which operations have to be carried out in which order. This setup, together with the specification of the algorithm, entirely defines the behavior of the optimizer "OptAlgo". Many optimization algorithms are general enough to work on different search spaces and utilize various search operations; others can only be applied to a distinct class of search spaces or have fixed operators. Thus, the functionality (finding a list of good individuals) and the properties (the objective function(s), the search operations, the genotype-phenotype mapping, the termination criterion) characterize an optimizer. We provide a Java interface which unites the functionality and features of optimization algorithms in Section 55.1.5 on page 713.

If an optimization algorithm can effectively be applied to an optimization problem, it will find at least one local optimum x⋆l if granted infinite processing time and if such an optimum exists:

  ∃x⋆l ∈ X : lim τ→∞ Φ(x⋆l, τ) = 1   (6.1)

Usual prerequisites for effective problem solving are a weakly complete set of search operations Op, a surjective genotype-phenotype mapping "gpm", and objective functions f which do provide more information than needle-in-a-haystack configurations (see Section 16.7 on page 175).
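To relate Definitions D6.1 and D6.2 to code, the following minimal Java sketch expresses the OptProb triplet and the OptSetup five-tuple as plain records. The problem space X appears only as a type parameter, the search operations are reduced to unary operators, and all names are illustrative assumptions rather than the book's own interface definitions.

  import java.util.Comparator;
  import java.util.List;
  import java.util.function.Function;
  import java.util.function.ToDoubleFunction;

  // OptProb = (X, f, cmpf); X is implicit in the type parameter.
  record OptProb<X>(List<ToDoubleFunction<X>> f,        // objective functions f1..fn
                    Comparator<X> cmpf) { }             // comparator cmpf

  // OptSetup = (OptProb, G, Op, gpm, Θ); G is implicit in the type parameter.
  record OptSetup<G, X, Theta>(OptProb<X> problem,
                               List<Function<G, G>> unaryOps, // part of the search operations Op
                               Function<G, X> gpm,            // genotype-phenotype mapping
                               Theta config) { }              // algorithm-specific parameters Θ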

1. the way it applies the search operations,
2. the way it assigns fitness to the individuals,
3. the way it selects them for further investigation, and
4. the way it builds and treats its state information.

The best optimization algorithm for a given setup OptSetup is the one with the highest values of Φ(x⋆, τ) for the optimal elements x⋆ in the problem space at the lowest corresponding values of τ. For a perfect optimization algorithm (given an optimization setup with weakly complete search operations, a surjective genotype-phenotype mapping, and reasonable objective functions), Equation 6.2 would hold:

  ∀x1, x2 ∈ X : x1 ≺ x2 ⇒ lim τ→∞ Φ(x1, τ) > lim τ→∞ Φ(x2, τ)   (6.2)

However, it is more than questionable whether such an algorithm can actually be built (see Chapter 23). Also, finding the best optimization algorithm for a given optimization problem is, itself, a multi-objective optimization problem.

6.3 Other General Features

There are some further common semantics and operations that are shared by most optimization algorithms. Many of them, for instance, start out by randomly creating some initial individuals which are then refined iteratively. In this section we define and discuss general abstractions for such similarities.

6.3.1 Gradient Descend

Definition D6.4 (Gradient). A gradient¹ of a scalar mathematical function (scalar field) f : Rn → R is a vector function (vector field) which points into the direction of the greatest increase of the scalar field. It is denoted by ∇f or grad(f), and it holds that

  lim h→0 ||f(x + h) − f(x) − grad(f)(x) · h|| / ||h|| = 0   (6.3)

If we have a differentiable smooth function with low total variation² [2224], the gradient would be the perfect hint for the optimizer, giving the best direction into which the search should be steered. In most cases, however, we cannot directly differentiate the objective functions with the Nabla operator³ ∇f because their exact mathematical characteristics are not known or too complicated. Also, the search space G is often not a vector space over the real numbers R. Therefore, samples of the search space are used to approximate the gradient. What many optimization algorithms which treat objective functions as black boxes do is to use rough estimations of the gradient for this purpose. If we compare two elements g1 and g2 via their corresponding phenotypes x1 = gpm(g1) and x2 = gpm(g2) and find x1 ≺ x2, we can assume that there is some sort of gradient facing upwards from x1 to x2 and hence from g1 to g2. When descending this gradient, we can hope to find an x3 with x3 ≺ x1 and finally the global minimum. The Downhill Simplex (see Chapter 43 on page 501), for example, uses a polytope consisting of n + 1 points in G = Rn for approximating the direction with the steepest descend.

1 http://en.wikipedia.org/wiki/Gradient [accessed 2007-11-06]
2 http://en.wikipedia.org/wiki/Total_variation [accessed 2009-07-01]
3 http://en.wikipedia.org/wiki/Del [accessed 2008-02-15]
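As a simple illustration of approximating a gradient from samples of a black-box objective function, the following Java sketch uses forward finite differences. The step size h and all names are illustrative assumptions; this is not the Downhill Simplex of Chapter 43, only the basic idea of estimating the direction of steepest change from function evaluations.

  import java.util.function.ToDoubleFunction;

  // Minimal sketch: estimate grad(f)(x) for a black-box f : R^n -> R
  // by forward finite differences.
  class GradientEstimate {
    static double[] approxGrad(final ToDoubleFunction<double[]> f, final double[] x,
                               final double h) {
      final double fx = f.applyAsDouble(x);
      final double[] grad = new double[x.length];
      for (int i = 0; i < x.length; i++) {
        final double[] xh = x.clone();
        xh[i] += h;                                  // perturb one coordinate
        grad[i] = (f.applyAsDouble(xh) - fx) / h;    // forward difference quotient
      }
      return grad; // points (approximately) into the direction of steepest increase
    }
  }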

6.3.2 Iterations

Global Optimization algorithms often proceed iteratively and step by step evaluate candidate solutions in order to approach the optima. We distinguish between evaluations τ and iterations t.

Definition D6.5 (Evaluation). The value τ ∈ N0 denotes the number of candidate solutions for which the set of objective functions f has been evaluated.

Definition D6.6 (Iteration). An iteration⁴ refers to one round in a loop of an algorithm. It is one repetition of a specific sequence of instructions inside of the algorithm.

Algorithms are referred to as iterative if most of their work is done by cyclic repetition of one main loop. In the context of this book, an iterative optimization algorithm starts with the first step t = 0. The value t ∈ N0 is the index of the iteration currently performed by the algorithm and t + 1 refers to the following step. One example of an iterative algorithm is Algorithm 6.1. In some optimization algorithms like Genetic Algorithms, iterations are referred to as generations. There often exists a well-defined relation between the number of performed evaluations τ of candidate solutions and the index of the current iteration t in an optimization process: many Global Optimization algorithms create and evaluate exactly a certain number of individuals per generation.

4 http://en.wikipedia.org/wiki/Iteration [accessed 2007-07-03]

6.3.3 Termination Criterion

Optimization processes which are not allowed to run infinitely have to find out when to terminate. The termination criterion terminationCriterion is a function with access to all the information accumulated by an optimization process, including the number of performed steps t, the objective values of the best individuals, and the time elapsed since the start of the process. With terminationCriterion, the optimizers determine when they have to halt.

Definition D6.7 (Termination Criterion). When the termination criterion function terminationCriterion ∈ {true, false} evaluates to true, the optimization process will stop and return its results.

In Listing 55.9 on page 712, you can find a Java interface resembling Definition D6.7, and in Listing 56.53 on page 836, we give a Java implementation of the terminationCriterion operation which can be used for that purpose.
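A very small Java sketch of such a criterion, combining an upper limit on the number of evaluations with a wall-clock time limit (the first two criteria listed below), could look as follows. The class and field names are illustrative assumptions and not the implementation of Listing 55.9 or 56.53.

  // Minimal sketch of a termination criterion: stop after a maximum number of
  // evaluations or after a maximum runtime, whichever comes first.
  class MaxStepsOrTime {
    private final long maxEvaluations, endTimeMillis;

    MaxStepsOrTime(final long maxEvaluations, final long maxRuntimeMillis) {
      this.maxEvaluations = maxEvaluations;
      this.endTimeMillis = System.currentTimeMillis() + maxRuntimeMillis;
    }

    // returns true as soon as the optimization process should stop
    boolean terminationCriterion(final long tau) {
      return (tau >= this.maxEvaluations)
          || (System.currentTimeMillis() >= this.endTimeMillis);
    }
  }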

Some possible criteria that can be used to decide whether an optimizer should terminate or not are [2160, 2622, 3088, 3089]:

1. The user may grant the optimization algorithm a maximum computation time. If this time has been exceeded, the optimizer should stop. Here we should note that the time needed for single individuals may vary, and so will the times needed for iterations. Hence, this time threshold can sometimes not be abided exactly and may avail to different total numbers of iterations during multiple runs of the same algorithm.
2. Instead of specifying a time limit, an upper limit for the number of iterations t or individual evaluations τ may be specified. Such criteria are most interesting for the researcher, since she often wants to know whether a qualitatively interesting solution can be found for a given problem using at most a predefined number of objective function evaluations.
3. An optimization process may be stopped when no significant improvement in the solution quality was detected for a specified number of iterations. Then, the process most probably has converged to a (hopefully good) solution and will most likely not be able to make further progress.
4. As previously mentioned, we can terminate an optimization process if it has already yielded a sufficiently good solution.
5. If we optimize something like a decision maker or classifier based on a sample data set, we will normally divide this data into a training and a test set. The training set is used to guide the optimization process whereas the test set is used to verify its results. We can compare the performance of our solution when fed with the training set to its properties if fed with the test set. This comparison helps us detect when most probably no further generalization can be achieved by the optimizer and we should terminate the process.

In practical applications, we can apply any combination of the criteria above in order to determine when to halt. Algorithm 1.1 on page 40 is an example of a badly designed termination criterion: the algorithm will not halt until a globally optimal solution is found. From its structure, it is clear that it is very prone to premature convergence (see Chapter 13 on page 151) and thus may get stuck at a local optimum. If this happens, it will run forever even if an optimal solution exists. This example also shows another thing: The structure of the termination criterion decides whether a probabilistic optimization technique is a Las Vegas or Monte Carlo algorithm (see Definitions D12.2 and D12.3). How the termination criterion may be tested in an iterative algorithm is illustrated in Algorithm 6.1.

Algorithm 6.1: Example Iterative Algorithm
Input: [implicit] terminationCriterion: the termination criterion
Data: t: the iteration counter
1 begin
2   t ←− 0
3   // initialize the data of the algorithm
4   while ¬terminationCriterion do
5     // perform one iteration - here happens the magic
6     t ←− t + 1

6.3.4 Minimization

The original form of many optimization algorithms was developed for single-objective optimization. Such algorithms may be used for both minimization and maximization. In the following, we will present them as minimization processes since this is the most commonly used notation, without loss of generality. An algorithm that maximizes the function f may be transformed to a minimization by using −f instead. Note that, using the prevalence comparisons as introduced in Section 3.5.2 on page 76, multi-objective optimization processes can be transformed into single-objective minimization processes. Therefore x1 ≺ x2 ⇔ cmpf(x1, x2) < 0.

6.3.5 Modeling and Simulating

While there are a lot of problems where the objective functions are mathematical expressions that can directly be computed, there exist problem classes far away from such simple function optimization that require complex models and simulations.

Definition D6.8 (Model). A model⁵ is an abstraction or approximation of a system that allows us to reason and to deduce properties of the system.

Models are often simplifications or idealizations of real-world issues. They are defined by leaving away facts that probably have only minor impact on the conclusions drawn from them. In the area of Global Optimization, we often need two types of abstractions:

1. Models of the potential solutions, which shape the problem space X. Examples are
(a) programs in Genetic Programming, for example for the Artificial Ant problem,
(b) construction plans of a skyscraper,
(c) distributed algorithms represented as programs for Genetic Programming,
(d) construction plans of a turbine, and
(e) circuit diagrams for logical circuits.

2. Models of the environment in which we can test and explore the properties of the potential solutions, like
(a) a map on which the Artificial Ant will move driven by an optimized program,
(b) an abstraction from the environment in which the skyscraper will be built, with wind blowing from several directions,
(c) a model of the network in which the evolved distributed algorithms can run,
(d) a physical model of air which blows through the turbine, and
(e) the model of an energy source and the other pins which will be attached to the circuit together with the possible voltages on these pins.

Models themselves are rather static structures of descriptions and formulas. Deriving concrete results (objective values) from them is often complicated. It often makes more sense to bring, for example, the construction plan of a skyscraper to life in a simulation. Then we can test the influence of various wind forces and directions on the building structure and approximate the properties which define the objective values.

Definition D6.9 (Simulation). A simulation⁶ is the computational realization of a model.

Whereas a model describes abstract connections between the properties of a system, a simulation realizes these connections. Simulations are executable, live representations of models that can be as meaningful as real experiments. They allow us to reason whether a model makes sense or not and how certain objects behave in the context of a model.

5 http://en.wikipedia.org/wiki/Model_%28abstract%29 [accessed 2007-07-03]
6 http://en.wikipedia.org/wiki/Simulation [accessed 2007-07-03]


Chapter 7
Solving an Optimization Problem

Before considering the following steps, always first check if a dedicated algorithm exists for the problem. Check for possible alternatives to using optimization algorithms! There are many highly efficient algorithms for solving special problems: for finding the shortest paths from one source node to the other nodes in a network, for example, one would never use a metaheuristic but Dijkstra's algorithm [796]. For computing the maximum flow in a flow network, the algorithm by Dinitz [800] beats any probabilistic optimization method. The key to winning with metaheuristics is to know when to apply them and when not. The first step before using them is therefore always an internet or literature search in order to check for alternatives.

Figure 7.1: Use dedicated algorithms!

If this search did not lead to the discovery of a suitable method to solve the problem, a randomized or traditional optimization technique can be used. In this case, the following steps should be performed in the order they are listed below and sketched in Figure 7.2.

1. Define the data structures for the possible solutions, the problem space X (see Section 2.1).
2. Define the objective functions f ∈ f which are subject to optimization (see Section 2.2).
3. Define the equality constraints h and inequality constraints g, if needed (see Section 3.4).
4. Define a comparator function cmpf(x1, x2) able to decide which one of two candidate solutions x1 and x2 is better (based on f and the constraints hi and gi, see Section 3.5.2). In the unconstrained, single-objective case, which is most common, this function will just select the one individual with the lower objective value (a minimal code sketch for this case follows after this list).
5. Decide upon the optimization algorithm to be used for the optimization problem (see Section 1.3).
6. Choose an appropriate search space G (see Section 4.1). This decision may be directly implied by the choice in Point 5.

Figure 7.2: The process of solving an optimization problem, sketched as a flow chart: check whether dedicated algorithms exist (and use one if so); otherwise determine the data structures for the solutions (the problem space X), define the optimization criteria and constraints (f : X → Rn), select an optimization algorithm (EA? CMA-ES? DE?), choose the search space G, the search operations searchOp : Gm → G, and the mapping gpm : G → X to the problem space, configure the optimization algorithm (e.g., population size = x, crossover rate = y, fitness v(p ∈ Pop, Pop) → R+), and run small-scale test experiments (a few runs per configuration). If the results are satisfying, perform the real experiment consisting of enough runs to get statistically significant results (e.g., 30 runs for each configuration and problem instance) and a statistical evaluation (e.g., "Our approach beats approach xyz with 2% probability to err in a two-tailed Mann-Whitney test"); if those results are satisfying as well, the production stage follows, i.e., integration into the real application.

7. Define the set Op of search operations searchOp ∈ Op navigating in the search space (see Section 4.2). This decision may again be directly implied by the choice in Point 5.
8. Define a genotype-phenotype mapping "gpm" between the search space G and the problem space X (see Section 4.3).
9. Configure the optimization algorithm, i.e., set all its parameters to values which look reasonable.
10. Run a small-scale test with the algorithm and the settings and test whether they approximately meet the expectations. If not, go back to Point 9, 8, or even Point 1, depending on how bad the results are. Otherwise, continue at Point 11.
11. Conduct experiments with some different configurations and perform many runs for each configuration. Obtain enough results to make statistical comparisons between the configurations and with other algorithms (see, e.g., Section 53.7). If the results are bad, again consider going back to a previous step.
12. Now, depending on the goal of the work, either the results of the optimizer can be used and published, or the developed optimization system can become part of an application (see Part V and [2912] as an example).
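Returning to Point 4, the following minimal Java sketch shows the comparator cmpf for the unconstrained, single-objective case: the candidate solution with the lower objective value wins. The class name and the way the objective function is passed in are illustrative assumptions.

  import java.util.Comparator;
  import java.util.function.ToDoubleFunction;

  // Minimal sketch of cmpf for single-objective minimization:
  // a negative result means x1 is better than x2 (x1 ≺ x2 ⇔ cmpf(x1, x2) < 0).
  class SingleObjectiveComparator<X> implements Comparator<X> {
    private final ToDoubleFunction<X> f;

    SingleObjectiveComparator(final ToDoubleFunction<X> f) {
      this.f = f;
    }

    @Override
    public int compare(final X x1, final X x2) {
      return Double.compare(this.f.applyAsDouble(x1), this.f.applyAsDouble(x2));
    }
  }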

Tasks T7

31. Why is it important to always look for traditional, dedicated, and deterministic alternatives before trying to solve an optimization problem with a metaheuristic optimization algorithm? [5 points]

32. Why is a single experiment with a randomized optimization algorithm not enough to show that this algorithm is good (or bad) for solving a given optimization task? Why do we need comprehensive experiments instead, as pointed out in Point 11 on the previous page? [5 points]

Chapter 8
Baseline Search Patterns

Metaheuristic optimization algorithms trade in the guaranteed optimality of their results for shorter runtime. This means that their results can be (but not necessarily are) worse than the global optima of a given optimization problem. The results are acceptable as long as the user is satisfied with them. An interesting question, however, is When is an optimization algorithm suitable for a given problem? The answer to this question is strongly related to how well it can utilize the information it gains during the optimization process in order to find promising regions in the search space. The primary information source for any optimizer is the (set of) objective functions.

Hence, it makes sense to declare an optimization algorithm as efficient if it can outperform primitive approaches which do not utilize the information gained from evaluating candidate solutions with objective functions, such as the random sampling and random walk methods which will be introduced in this chapter. The comparison criterion should be the solution quality in terms of objective values. A third baseline algorithm is exhaustive enumeration, the testing of all possible solutions; it does not utilize the information gained from the objective functions either, and it can be used as a yardstick in terms of optimization speed.

Definition D8.1 (Effectiveness). An optimization algorithm is effective if it can (statistically significantly) outperform random sampling and random walks in terms of solution quality according to the objective functions.

Definition D8.2 (Efficiency). An optimization algorithm is efficient if it can meet the criterion of effectiveness (Definition D8.1) with a lower-than-exponential algorithmic complexity (see Section 12.1 on page 143), i.e., within fewer steps than required by exhaustive enumeration.

8.1 Random Sampling

Random sampling is the most basic search approach. It is uninformed, i.e., it does not make any use of the information it can obtain from evaluating candidate solutions with the objective functions. As sketched in Algorithm 8.1 and implemented in Listing 56.4 on page 729, it keeps creating genotypes of random configuration until the termination criterion is met. In every iteration, it therefore utilizes the nullary search operation "create" which builds a genotype p.g ∈ G with a random configuration and maps this genotype to a phenotype p.x ∈ X with the genotype-phenotype mapping "gpm". It checks whether the new phenotype is better than the best candidate solution discovered so far (x). If so, it is stored in x.

Algorithm 8.1: x ←− randomSampling(f)
Input: f: the objective/fitness function
Data: p: the current individual
Output: x: the result, the best candidate solution discovered
1 begin
2   x ←− ∅
3   repeat
4     p.g ←− create
5     p.x ←− gpm(p.g)
6     if (x = ∅) ∨ (f(p.x) < f(x)) then
7       x ←− p.x
8   until terminationCriterion
9   return x

Notice that x is the only variable used for information aggregation and that there is no form of feedback. The search makes no use of the knowledge about good structures of candidate solutions, nor does it further investigate a genotype which has produced a good phenotype. If "create" samples the search space G in a uniform, unbiased way, random sampling will be completely unbiased as well.

It is well known from the No Free Lunch Theorem (see Chapter 23) that, averaged over the set of all possible optimization problems, no optimization algorithm can beat random sampling. However, for specific families of problems and for most practically relevant problems in general, outperforming random sampling is not too complicated. If an optimization algorithm does not perform better than random sampling on a given optimization problem, we can deduce that (a) it is not able to utilize the information gained from the objective function(s) efficiently (see Chapter 14 and 16), (b) its search operations cannot modify existing genotypes in a useful manner (see Section 14.2), or (c) the information given by the objective functions is largely misleading (see Chapter 15). This unbiasedness and ignorance of available information turns random sampling into one of the baseline algorithms used for checking whether an optimization method is suitable for a given problem. A comparison to this primitive algorithm is necessary whenever a new, not yet analyzed optimization task is to be solved.
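A compact Java sketch of Algorithm 8.1 could look as follows. Here, "create", "gpm", the objective function, and the termination criterion are passed in as functional parameters, which is an illustrative assumption rather than the structure of Listing 56.4.

  import java.util.function.BooleanSupplier;
  import java.util.function.Function;
  import java.util.function.Supplier;
  import java.util.function.ToDoubleFunction;

  // Compact sketch of random sampling: repeatedly create, map, and evaluate
  // random genotypes, remembering only the best phenotype found so far.
  class RandomSampling {
    static <G, X> X randomSampling(final Supplier<G> create, final Function<G, X> gpm,
                                   final ToDoubleFunction<X> f,
                                   final BooleanSupplier terminationCriterion) {
      X best = null;
      double bestF = Double.POSITIVE_INFINITY;
      do {
        final X x = gpm.apply(create.get());   // nullary operation + GPM
        final double fx = f.applyAsDouble(x);
        if (fx < bestF) {                      // keep the best candidate solution
          best = x;
          bestF = fx;
        }
      } while (!terminationCriterion.getAsBoolean());
      return best;
    }
  }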

Measures that can be derived mathematically from a walk model include estimates for the number of steps needed until a certain configuration is found. the best candidate solution discovered 1 2 3 4 5 6 7 8 9 10 115 begin x ←− ∅ p.f. i. It starts at a random location in the search space and pro- . 2. From practically running random walks. random walk or random sampling may even outperform them if the problems are deceptive enough (see Chapter 15). for instance.x ←− gpm(p. applying one of these two uninformed. 8. for example. like a random walk. Figure 8. The same goes for Fig.x) < f(x)) then x ←− p. we are maybe not able to navigate to past states again. and the average difference of the objective values of two consecutive populations.g ←− create repeat p. usually works on a population of size 1.c each illustrate three random walks in which axis-parallel steps of length 1 are taken. 8. 2517]. The subfigures 8.a to 8. if a search algorithm encounters a state explosion because there are too many states and transcending to them would consume too much memory. Random walks are often used in optimization theory for determining features of a fitness landscape. One example for such a case is discussed in the work of Skubch [2516.1. each of which displaying three random walks with Gaussian (standard-normally) distributed step widths and arbitrary step directions.1 illustrates some examples for random walks.1.2. RANDOM WALKS Algorithm 8.2.8.1. since optimization methods always introduce some bias into the search process. uses the number of encounters of certain state transition during the walk in order to successively build a heuristic function.2: x ←− randomWalk(f) Input: f: the objective/fitness function Data: p: the current individual Output: x: the result. Better yet..x p. 2517] about reasoning agents. some information about the search space can be extracted.1 Adaptive Walks An adaptive walk is a theoretical optimization method which. 8.g) if (x = ∅) ∨ (f(p. Skubch [2516. In any useful configuration for optimization problems which are not too ill-defined.d to Fig. e. primitive strategies is advised.g ←− searchOp(p. In certain cases of online search it is not possible to apply systematic approaches like breadth-first search or depth-first search. the chance to find a certain configuration at all.g) until terminationCriterion return x solutions with the objective functions. is only partially observable and each state transition represents an immediate interaction with this environment. they should thus be able to outperform random walks and random sampling. In cases where they cannot. always try to steer into the direction of the search space where they expected a fitness improvement. 1. If the environment. The dimensions range from 1 to 3 from the left to the right. Circumstances where random walks can be the search algorithms of choice are. for instance.1.
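For readers who prefer running code, the following is a minimal Python sketch of the two baseline patterns just specified in Algorithms 8.1 and 8.2. It is not the book's reference implementation (those are the listings referred to above); the create, gpm, search_op, f, and should_terminate callables as well as the toy bit-string instance at the bottom are illustrative placeholders only.

```python
import random

def random_sampling(f, gpm, create, should_terminate):
    """Algorithm 8.1: keep creating random genotypes, remember the best phenotype."""
    best_x, best_y = None, float("inf")
    while not should_terminate():
        g = create()               # nullary search operation "create"
        x = gpm(g)                 # genotype-phenotype mapping
        y = f(x)                   # objective value (minimization)
        if y < best_y:
            best_x, best_y = x, y
    return best_x

def random_walk(f, gpm, create, search_op, should_terminate):
    """Algorithm 8.2: move with a unary search operation; the best phenotype is only
    recorded, never used to influence the next move."""
    best_x, best_y = None, float("inf")
    g = create()
    while not should_terminate():
        x = gpm(g)
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
        g = search_op(g)           # next move depends only on the current genotype
    return best_x

if __name__ == "__main__":
    # toy instance: minimize the number of ones in a bit string of length 16
    n = 16
    def budget(steps):
        left = [steps]
        def done():
            left[0] -= 1
            return left[0] < 0
        return done
    create = lambda: [random.randint(0, 1) for _ in range(n)]
    def flip_one(g):
        h = list(g)
        h[random.randrange(n)] ^= 1
        return h
    identity = lambda g: g         # genotype and phenotype coincide here
    print("random sampling:", sum(random_sampling(sum, identity, create, budget(200))))
    print("random walk:    ", sum(random_walk(sum, identity, create, flip_one, budget(200))))
```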

8.2.1 Adaptive Walks

An adaptive walk is a theoretical optimization method which, like a random walk, usually works on a population of size 1. It starts at a random location in the search space and proceeds by changing (or mutating) its single individual. For this modification, three methods are available:

1. One-mutant change: The optimization process chooses a single new individual from the set of "one-mutant change" neighbors. These neighbors differ from the current candidate solution in only one property, i.e., are in its ε = 0-neighborhood. If the new individual is better, it replaces its ancestor, otherwise it is discarded.
2. Greedy dynamics: The optimization process chooses a single new individual from the set of "one-mutant change" neighbors. If it is not better than the current candidate solution, the search continues until a better one has been found or all neighbors have been enumerated. The major difference to the previous form is the number of steps that are needed per improvement.
3. Fitter Dynamics: The optimization process enumerates all one-mutant neighbors of the current candidate solution and transcends to the best one.

From these elaborations, it becomes clear that adaptive walks are very similar to Hill Climbing and Random Optimization. The major difference is that an adaptive walk is a theoretical construct that, very much like random walks, helps us to determine properties of fitness landscapes, whereas the other two are practical realizations of optimization algorithms.

Adaptive walks are a very common construct in evolutionary biology. Biological populations are running for a very long time and so their genetic compositions are assumed to be relatively converged [79, 1057]. The dynamics of such populations in near-equilibrium states with low mutation rates can be approximated with one-mutant adaptive walks [79, 1057, 2528].
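As a small illustration — not part of the book's algorithm listings — the "fitter dynamics" variant can be sketched in a few lines of Python for bit strings, where the one-mutant neighbors are exactly the strings reachable by flipping a single bit:

```python
def adaptive_walk_fitter(f, start, max_steps=1000):
    """One-mutant adaptive walk with fitter dynamics (minimization): in every step,
    enumerate all one-bit neighbors of the current bit string and move to the best
    one; stop as soon as no neighbor improves the objective value."""
    current = list(start)
    for _ in range(max_steps):
        neighbors = []
        for i in range(len(current)):
            n = list(current)
            n[i] ^= 1                      # one-mutant change: flip exactly one bit
            neighbors.append(n)
        best = min(neighbors, key=f)
        if f(best) >= f(current):          # no improving neighbor: stop
            break
        current = best
    return current

# e.g. adaptive_walk_fitter(sum, [1, 0, 1, 1, 0, 1]) walks down to the all-zero string
```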

8.3 Exhaustive Enumeration

Exhaustive enumeration is an optimization method which is just as primitive as random walks. It can be applied to any finite search space and basically consists of testing every single possible solution for its utility, returning the best candidate solution discovered after all have been tested. Algorithm 8.3 contains a definition of the exhaustive search algorithm. It is based on the assumption that the search space G can be traversed in a linear order, where the tth element in the search space can be accessed as G[t]. The second parameter of the algorithm is the lower limit y̆ of the objective value. If this limit is known, the algorithm may terminate before traversing the complete search space if it discovers a global optimum with f(x) = y̆ earlier. If no lower limit for f is known, y̆ = −∞ can be provided.

Algorithm 8.3: x ←− exhaustiveSearch(f, y̆)
  Input: f: the objective/fitness function
  Input: y̆: the lowest possible objective value, −∞ if unknown
  Data: x′: the current candidate solution
  Data: t: the iteration counter
  Output: x: the result, the best candidate solution discovered
  1 begin
  2   x ←− gpm(G[0])
  3   t ←− 1
  4   while (t < |G|) ∧ (f(x) > y̆) do
  5     x′ ←− gpm(G[t])
  6     if f(x) > f(x′) then x ←− x′
  7     t ←− t + 1
  8   return x

If G = Bⁿ, i.e., the bit strings of length n, the algorithm will need exactly 2ⁿ steps if no lower limit y̆ for the value of f is known. If the minimum possible objective value y̆ is known for a given minimization problem, the algorithm can also terminate sooner. It is interesting to see that for many hard problems such as NP-complete ones (Section 12.2), no algorithms are known which can generally be faster than exhaustive enumeration, i.e., for an input size of n, algorithms for such problems need O(2ⁿ) iterations. Exhaustive search requires exactly the same complexity.
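A minimal Python rendering of Algorithm 8.3, assuming minimization and that the finite search space is simply handed over as a list G; the bit-string example below is only an illustration:

```python
from itertools import product

def exhaustive_search(f, gpm, G, y_lower=float("-inf")):
    """Algorithm 8.3: test every element of the finite, linearly ordered search space G,
    stopping early once the known lower bound y_lower of f has been reached."""
    best_x = gpm(G[0])
    best_y = f(best_x)
    t = 1
    while t < len(G) and best_y > y_lower:
        x = gpm(G[t])
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
        t += 1
    return best_x

# G = B^4: all bit strings of length 4; minimizing the number of ones with the known
# lower bound 0 lets the scan stop as soon as the all-zero string has been visited
G = [list(bits) for bits in product((0, 1), repeat=4)]
print(exhaustive_search(sum, lambda g: g, G, y_lower=0))
```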

Tasks T8

33. What are the similarities and differences between random walks, random sampling, and adaptive walks? [10 points]
34. Why should we consider an optimization algorithm as ineffective for a given problem if it does not perform better than random sampling or a random walk on the problem? [5 points]
35. Is it possible that an optimization algorithm such as, for example, Algorithm 1.1 on page 40 may give us even worse results than a random walk or random sampling? Give reasons. If yes, try to find a problem (problem space, objective function) where this could be the case. [10 points]
36. Implement random sampling for the bin packing problem as a comparison algorithm for Algorithm 1.1 on page 40 and test it on the same problem instances used to verify Task 4 on page 42. Discuss your results. [10 points]
37. Compare your answers to Task 35 and 36. Is the experiment in Task 36 suitable to validate your hypothesis? If not, can you find some instances of the bin packing problem where it can be used to verify your idea? If so, apply both Algorithm 1.1 on page 40 and your random sampling implementation to these instances. Did the experiments support your hypothesis? [15 points]
38. Implement an exhaustive-search based optimization procedure for the bin packing problem as a comparison algorithm for Algorithm 1.1 on page 40 and test it on the same problem instances used to verify Task 4 on page 42 and 36. Discuss your results. [10 points]
39. Implement an exhaustive search pattern to solve the instance of the Traveling Salesman Problem given in Example E2.2 on page 45. [10 points]
40. Can we implement an exhaustive search pattern to find the minimum of a mathematical function f defined over the real interval (−1, 1)? Answer this question both from (a) a theoretical-mathematical perspective in an idealized execution environment (i.e., with infinite precision) and (b) from a practical point of view based on real-world computers and requirements (i.e., under the assumption that the available precision of floating point numbers and of any given production process is limited). [10 points]

Chapter 9 Forma Analysis

The Schema Theorem has been stated for Genetic Algorithms by Holland [1250] in his seminal work [711, 1250, 1253]. In this section, we are going to discuss it in the more general version from Weicker [2879] as introduced by Radcliffe and Surry [2241–2243, 2246, 2634].

Definition D9.1 (Property). A property φ is a function which maps individuals, genotypes, or phenotypes to a set of possible property values.

Each individual p in the population pop of an optimization algorithm is characterized by its properties φ. Optimizers focus mainly on the phenotypical properties since these are evaluated by the objective functions. In the forma analysis, the properties of the genotypes are considered as well, since they influence the features of the phenotypes via the genotype-phenotype mapping. In general, we can consider properties φi to be some sort of functions that map the individuals to property values. With Example E9.1, we want to clarify how possible properties could look like.

Example E9.1 (Examples for Properties). A rather structural property φ1 of formulas z : R → R in symbolic regression¹ would be whether it contains the mathematical expression x + 1 or not: φ1 : X → {true, false}. We can also declare a behavioral property φ2 which is true if |z(0) − 1| ≤ 0.1 holds, i.e., if the result of z is close to a value 1 for the input 0, and false otherwise: φ2(z) ≡ (|z(0) − 1| ≤ 0.1) ∀z ∈ X.

Assume that the optimizer uses a binary search space G = Bⁿ and that the formulas z are decoded from there to trees that represent mathematical expressions by a genotype-phenotype mapping. A genotypical property φ3 then would be whether a certain sequence of bits occurs in the genotype p.g, and another phenotypical property φ4 is the number of nodes in the phenotype p.x. φ1 and φ2 both map the space of mathematical functions to the set B = {true, false}, whereas property φ4 maps trees to the natural numbers. If we try to solve a graph-coloring problem, for instance, a property φ5 : X → {black, white, gray} could denote the color of a specific vertex q as illustrated in Figure 9.1; φ5 maps the space of all possible colorings for the given graph to the set {white, gray, black}.

¹ More information on symbolic regression can be found in Section 49.1 on page 531.

[Figure 9.1 shows a small graph whose vertices G1 to G8 are colored black, white, or gray; the population Pop ⊆ X (here G = X) is partitioned into the formae Aφ5=black, Aφ5=white, and Aφ5=gray according to the color of the vertex q.]
Figure 9.1: A graph coloring-based example for properties and formae.

On the basis of the properties φi we can define equivalence relations² ∼φi:

p1 ∼φi p2 ⇒ φi(p1) = φi(p2) ∀p1, p2 ∈ G × X    (9.1)

Obviously, for each two candidate solutions x1 and x2, either x1 ∼φi x2 or x1 ≁φi x2 holds. These relations divide the search space into equivalence classes Aφi=v.

Definition D9.2 (Forma). An equivalence class Aφi=v that contains all the individuals sharing the same characteristic v in terms of the property φi is called a forma [2241] or predicate [2826].

Aφi=v = {∀p ∈ G × X : φi(p) = v}    (9.2)
∀p1, p2 ∈ Aφi=v ⇒ p1 ∼φi p2    (9.3)

The number of formae induced by a property, i.e., the number of its different characteristics, is called its precision [2241]. The precision of φ1 and φ2 in Example E9.1 is 2, for φ5 it is 3. We can define another property φ6 ≡ z(0) denoting the value a mathematical function has for the input 0. This property would have an uncountably infinitely large precision.

Two formae Aφi=v and Aφj=w are said to be compatible, written as Aφi=v ⊲⊳ Aφj=w, if there can exist at least one individual which is an instance of both.

Aφi=v ⊲⊳ Aφj=w ⇔ Aφi=v ∩ Aφj=w ≠ ∅    (9.4)
Aφi=v ⊲⊳ Aφj=w ⇔ ∃ p ∈ G × X : p ∈ Aφi=v ∧ p ∈ Aφj=w    (9.5)
Aφi=v ⊲⊳ Aφi=w ⇒ w = v    (9.6)

Of course, two different formae of the same property φi, i.e., two different characteristics of φi, are always incompatible.

² See the definition of equivalence classes in Section 51.9 on page 646.

Example E9.2 (Examples for Formae (Example E9.1 Cont.)).

[Figure 9.2 lists eight example formulas — z1(x) = x+1, z2(x) = x²+1.1, z3(x) = x+2, z4(x) = 2(x+1), z5(x) = tan x, z6(x) = (sin x)(x+1), z7(x) = (cos x)(x+1), z8(x) = (tan x)+1 — and shows how the population Pop ⊆ X (G = X) is partitioned under the relations ∼φ1, ∼φ2, ∼φ4, and ∼φ6 into formae such as Aφ1=true and Aφ1=false, Aφ2=true and Aφ2=false, and classes like Aφ6=0, Aφ6=1, Aφ6=1.1, and Aφ6=2.]
Figure 9.2: Example for formae in symbolic regression.

In our initial symbolic regression example, the formae Aφ1=true and Aφ1=false are not compatible, since it is not possible that a function z contains a term x + 1 and at the same time does not contain it. All formae of the properties φ1 and φ2, on the other hand, are compatible: Aφ1=false ⊲⊳ Aφ2=false, Aφ1=false ⊲⊳ Aφ2=true, Aφ1=true ⊲⊳ Aφ2=false, and Aφ1=true ⊲⊳ Aφ2=true all hold. If we take φ6 into consideration, we will find that there exist some formae compatible with some of φ2 and some that are not: Aφ2=true ⊲⊳ Aφ6=1 and Aφ2=false ⊲⊳ Aφ6=2 hold, but neither Aφ2=true ⊲⊳ Aφ6=0 nor Aφ2=false ⊲⊳ Aφ6=0.95 do.

The discussion of formae and their dependencies stems from the Evolutionary Algorithm community and there especially from the supporters of the Building Block Hypothesis. The idea is that an optimization algorithm should first discover formae which have a good influence on the overall fitness of the candidate solutions. The hope is that there are many compatible ones under these formae that can gradually be combined in the search process.

In this text we have defined formae and the corresponding terms on the basis of individuals p, which are records that assign an element of the problem space p.x ∈ X to an element of the search space p.g ∈ G. Generally, we will relax this notation and also discuss formae directly in the context of the search space G or problem space X, when appropriate.
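To make these definitions tangible, here is a small Python sketch (not from the book) that treats properties as plain functions over candidate solutions, derives formae as equivalence classes, and tests compatibility by intersecting them. The expression strings and the use of eval are only a quick stand-in for real genotype/phenotype structures, loosely following Example E9.2.

```python
from collections import defaultdict

def formae(population, prop):
    """Group a population into formae: equivalence classes A_{phi=v} that collect all
    individuals sharing the same characteristic v of the property phi."""
    classes = defaultdict(list)
    for individual in population:
        classes[prop(individual)].append(individual)
    return dict(classes)

def compatible(forma_a, forma_b):
    """Two formae are compatible iff some individual can be an instance of both;
    for a concrete population we simply test for a non-empty intersection."""
    return len(set(forma_a) & set(forma_b)) > 0

# candidate solutions as expression strings, loosely following Example E9.2
population = ["x+1", "x**2+1.1", "x+2", "2*(x+1)", "(x+1)*0"]

phi1 = lambda z: "x+1" in z                      # structural property: contains "x+1"?
phi6 = lambda z: eval(z, {"x": 0})               # behavioral property: value at x = 0

by_phi1 = formae(population, phi1)               # precision 2: {True, False}
by_phi6 = formae(population, phi6)               # one forma per distinct value z(0)

print(by_phi1)
print(by_phi6)
print(compatible(by_phi1[True], by_phi6[1]))     # True: "x+1" is an instance of both
```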

Tasks T9

41. Define at least three reasonable properties on the individuals, genotypes, or phenotypes of the Traveling Salesman Problem as introduced in Example E2.2 on page 45 and further discussed in Task 26 on page 91. [6 points]
42. List the relevant formae for each of the properties you have defined, based on your definitions in Task 41 and in Task 26 on page 91. [12 points]
43. List all the members of your forma given in Task 42 for the Traveling Salesman Problem instance provided in Table 2.1 on page 46. [10 points]

868. 1851.1.3. 1834. 17.4 see Table 3. 29. 432. 28. 40. 2132. 1497. 1308.1.1 Applications and Examples Table 10. 1850. 2790. 671. 250. see also Tables 3. 1264. 2257.1.1. 178. see also Tables 3. 783.4.1. 1091. 2262.1 [2301. 680.1. 1625.1. 46. 3017. 3047]. 1089. 31. 46. 18. 2330. 2717.2.1. 1694. 2512.1.1. 22.1 .1. 568. 1096. 475. 22. 2817. 251. 2422. 130. 2885.1. 2261.1. 31. 3047] [5. 47. 2000. 30. 33.1 see Table 28.1. 2673. 118. 2288–2290. 822–825. 27. 40. 806. 38. 2131. 30. 36.1. 2223. 1879. 1042. 3022. Area Art Astronomy Business-To-Business Chemistry Cloud Computing Combinatorial Problems References see Tables 28. 131.1. 571. 2692.1. 41. 29. 3. 20. 1501.1: Applications and Examples of Optimization. 431. 1533. 1998. and 48.1. 2124. 1313. 34.1. 626–628. 1239. 2219. 2812. 682.4. 37.1. 213. 576.1. 115. 28.1.1. 2256. 2511. 1723. 2172. 265. 497. 2667. 2066. 26. and 43. 2248. 1585. 1472.1 [623]. and 31. 2508.Chapter 10 General Information on Optimization 10. 2813. 35. 1027. 1502.1.4. 2449. 3016. 1026.1.1. 1008. 1124. 1724. 2990. 25.1.1. 533.1. 2900.1. 463. 832. 578. 29. 315. 42. 39. 241. 1280. 32.

1. 27.1. 2511. 250. 28. 29. 41. 115. 497.1. see also Tables 3.1.1. 28. 39. 189. 2839. 2839. 226. 30. 29. 2512.1. 35. 991. 21. 1561. 3078].1.1. 1425. see also Tables 3. 46. 578.1. 2333. and 43. tems 628.1.1.[127. and 37. 277.1.1 Face Recognition [471] Function Optimization [72. 2602. 22. 29. 1585. 29. 29.1. 2716. 281. see also Tables 3.1. 2132.1.1. 2257. 46. 2878. 29. 37. 3. 3.1.1.1.1. 961. 1188. 30. 39. 576. 37. 2812. 32. 1224. 31. 1263. 41. 3047.1. 31. 26. 27. 3017. 2066. 29. 39. 21. 34. 2202. 36. 1280.1. 35. 26. and 39.1.1.1.1.1. 34.1. 3066. 2088. 305.1. 33. 1091. 1431. 32. 28.1. 983. 46. see also Tables 22.1.1.1. 33. 2375. 22. 1472.1. 2971. 2312. 26.1.1. 38. 2790. and 46. and 48. 203. 29. 3004.1. 28. 565. 31.2. 28.1. 281.1.124 Computer Graphics 10 GENERAL INFORMATION ON OPTIMIZATION [45.1.1. 805. 1194. 2238.4. 33. 29. 30. 2272. 39.1 E-Learning [506. 35.1 Digital Technology see Tables 3.1. see also Tables 3. 36. 1625. 33. 1650. 35. 1137. 766. 36. 1501.1. 28. 832.1.1. 29. 2070. 450. 849.2.1.1. 31. 31. 21. 2202. 217. 1089. 3. 46. 32.1.1. 36. 3017]. 28.1. 1942. 36. see also Tables 18. see also Tables 28. 41. 251. 27. 1227. 2288–2290.1 Healthcare [45. 1724. 1625.1 Decision Making [1194].3 Engineering [189.1 Graph Theory [241. and 47.1. 40. 34.1. 506.1 Economics and Finances [204. 178. 30.1. 27.1. see also Tables 3. 29. 27. 806.1. 131. 1342. 26. 40.1.1. 538. 2319. 35. 39. 37. 46.4. 1027. 2301. 533. 1645. 310. 41.1.4. 2708].1. 1100.1. 1998. 38. 3.1 . 3022].1. 975.4. 533.1.2. 3077.1 Data Compression see Table 31. 39. 576. 2839]. 1124. 671.1. 34. 213. 39. 650. 199. 241.1.1. 617.1. 991. 26.1.1.1. 1176.1. 1008.1. 29. 2673. 265. 2824.1. 2506.1 Control [1922]. 18.1. 2319.1.1 Image Synthesis [1637].1. 1425. and 31. 37. 2262. 2602. see also Tables 3. 33.3 Databases [115.1 Games [1954. 2383–2385].1. and 36. 40.1. 130. 25. 40. 1101. 20. 37.1. 968.1.1. 2878. 3103]. 310. 31.1. 2330. 1723. 1308. 450. 1313. 36.1. and 35. 3. 27. 1194. 2692. 495. see also Tables 3.1.1. 3022.1. 2172.1. 1647. 40. 823.1. 2999.1.1.1.1. 37.1. 28.4. 2459. 312.1.3. 25.4 Logistics [118. 930.2. 2716.1. 1472. 29. 568. 92. see also Tables 3.4. 2717.1. 28. 825.1. 40.1 Data Mining [37.3.3. 35. 37. 2813.1.1. 32. 1259. 628. 126.1. see also Tables 28.1. 18. 21.1. 2000.1 Cooperation and Teamwork [2238]. 34. 1998. 28. 330–333.1. 28.2. 2716. 2131. 31. 27. and 47.1.1. 3. 30. 46. 2839. 46.4.1. 37. 1035. 2939. 2990. 2357. 1219. 26.1. 3047]. 21. 2375. 22.4. 1690.1. 30. 46.1. see also Tables 3.1.1. 36. 2257. 2261. 2824. 613. 1851. 36. 1694. 34. see also Tables 28. 31. 27. 1690. 22. 571.1. 2234.1.1. 3103]. 559.3. 2878.4. 1219. 47. see also Tables 3. 26.1. 18. 2788. 889.1.4.1.1. and 32.1. 2256. and 35.1.1.1.1. 27.1. 2319]. 35. 2671. 495.4. 565.1. 627. 1136. 3047.3.1. 1777. 1096. see also Tables 3. 31.1. and 37. 2508. 1168. 680. 868. 29.1.2. 204. 28.1.1 and 28. 2088.2. 2330. 1690. 822. 327.1.4.1.1. 33. 1194. 3016. 2172.4.1.1. 406.1.1. 1168. 40.1. 21. 2638. 1777.1. 2459. 783.1. 29.4. 2301. and 47. 650.1. 2817. 1036. and 46.1.1. 1497.1. and 47. 32.4. 20. 30.1. 31.1. 41.1 Distributed Algorithms and Sys. 2961. 18.1. 1101. 3078. 36.1.1. 2248. 492.1.1. 1777. 824. 3077. 25. 31. 2070. 1503. 2506]. 2375. 3058]. 2849. 40. 1850. 33.1.1. 844.1. 20. 656. 1959. 22. 31. 35. 2790. 39. 34.1.

see also Tables 3. see also Tables 3. 18.1. 15. 31. and 35.4 and 31. 1651]. 32. 34. 22. and 34.1.1.1. 37.1. 3022]. 27. see also Tables 3.1. 22. 12.1.3 Theorem Proofing and Automatic see Tables 29. see also Tables 3.1.1.1.2 1. 28. 20. 2000. 18. 40. 2131. 27.1. 2357. 1027.1.1.4. 22.1.1.4.4. and 46. 34. 21. 1850. 1851. 29.1. 2330.1. 30. 28. 46.1.1 Police and Justice [471] Prediction [405].1.1. 2667. 39. 31.1.10. 39.1. 1020. 32. and 46. 31. 29.1.1.1. 40. 26. 930. see also Table 31.1.1. 36. 36. 3078]. 2200.1 News Industry [1425] Nutrition [2971] Operating Systems [2900] Physics [1879].1. 28.1. 31. 26.1 Military and Defense see Tables 3.1 Motor Control see Tables 18.4. 2.1. 22. and 31. 35. 1168.1.4. 28. 31. and 46. 27. 766. 2971. 2939.1.4. 5.1.3 Verification Water Supply and Management see Tables 3. 29. and 34. 29.1.1. and 37.1. 27.1. 26. 8. 20.1. 6. 29.1. BOOKS Mathematics 125 [405. 37.1. 30.1. 31. 2349. 22. see also Tables 3. 34.1.1 and 28.1. 18. 13.4.1.1.3 Sorting see Tables 28. and 46.1. 36. 26. 27. see also Tables 28. 827.1.1. 18. 31.2. 30.4.1. 37.1 Multi-Agent Systems [1818. 1651. 28.4. 19.1. 14.1. 10. 33. 40. 29.1. 28.4.1. and 35.1. 22. see also Tables 3.1. 3.1. 33. 34.1. 2919].1 Software [822. 26.1. 2961.1.1. 28.1.1.1.1.4. 28.4 Wireless Communication [1219].3 Shape Synthesis see Table 26. 9. 2885. 3058].1 Telecommunication see Tables 28.1 Security [2459.1. 33.1.1. 40. 35. 33. Books Handbook of Evolutionary Computation [171] New Ideas in Optimization [638] Scalable Optimization via Probabilistic Modeling – From Algorithms to Applications [2158] Handbook of Metaheuristics [1068] New Optimization Techniques in Engineering [2083] Metaheuristics for Multiobjective Optimisation [1014] Selected Papers from the Workshop on Pattern Directed Inference Systems [2869] Applications of Multi-Objective Evolutionary Algorithms [597] Handbook of Applied Optimization [2125] The Traveling Salesman Problem: A Computational Study [118] Intelligent Systems for Automated Learning and Adaptation: Emerging Trends and Applications [561] Optimization and Operations Research [773] Readings in Artificial Intelligence: A Collection of Articles [2872] Machine Learning: Principles and Techniques [974] Experimental Methods for the Analysis of Optimization Algorithms [228] Evolutionary Multiobjective Optimization – Theoretical Advances and Applications [8] Evolutionary Algorithms for Solving Multi-Objective Problems [599] Pareto Optimality. 16. 27. 2066. 4.1. 11. 21. see also Tables 3.1. 29.1. 28. 37.1.1. 18. 823. 1008. 2132.3 10. and 40. 2081.1. 35.1.1. 2878. 25. 17. 2418.1. 2572. 36.1. Game Theory and Equilibria [560] Pattern Classification [844] The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization [1694] Parallel Evolutionary Computations [2015] Introduction to Global Optimization [2126] .1 Testing [1168. and 39.1.1 Multiplayer Games [127]. 7.

38. 37. 28. . 40.3 Conferences and Workshops Table 10. 41. 31. 25. 59. 50. 26. 32. 44. 57. 43. Neural. and Fuzzy Systems [2685] Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations [2961] Multi-Objective Swarm Intelligent Systems: Theory & Experiences [2016] Evolutionary Optimization [2394] Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design [460] Multi-Objective Optimization in Computational Intelligence: Theory and Practice [438] An Operations Research Approach to the Economic Optimization of a Kraft Pulping Process [500] Fundamental Methods of Mathematical Economics [559] Einf¨ hrung in die K¨ nstliche Intelligenz [797] u u Stochastic and Global Optimization [850] Machine Learning Applications in Expert Systems and Information Retrieval [975] Nonlinear and Dynamics Programming [1162] Introduction to Operations Research [1232] Introduction to Stochastic Search and Optimization [2561] Nonlinear Multiobjective Optimization [1893] Heuristics: Intelligent Search Strategies for Computer Problem Solving [2140] Global Optimization with Non-Convex Constraints [2623] The Nature of Statistical Learning Theory [2788] Advances in Greedy Algorithms [247] Multiobjective Decision Making Theory and Methodology [528] Multiobjective Optimization: Principles and Case Studies [609] Multi-Objective Optimization Using Evolutionary Algorithms [740] Multicriteria Optimization [860] Multiple Criteria Optimization: Theory. 58. 36. 42. 54. 52. 24.126 23. 35. Computation and Application [2610] Global Optimization [2714] Soft Computing [760] 10. 64. 61. 29. 10 GENERAL INFORMATION ON OPTIMIZATION Global Optimization Algorithms – Theory and Application [2892] Computational Intelligence in Control [1922] Introduction to Operations Research [580] Management Models and Industrial Applications of Linear Programming [530] Efficient and Accurate Parallel Genetic Algorithms [477] Swarm Intelligence – Focus on Ant and Particle Swarm Optimization [525] Nonlinear Programming: Sequential Unconstrained Minimization Techniques [908] Nonlinear Programming: Sequential Unconstrained Minimization Techniques [909] Global Optimization: Deterministic Approaches [1271] Soft Computing: Methodologies and Applications [1242] Handbook of Global Optimization [1272] New Learning Paradigms in Soft Computing [1430] How to Solve It: Modern Heuristics [1887] Numerical Optimization [2040] Artificial Intelligence: A Modern Approach [2356] Electromagnetic Optimization by Genetic Algorithms [2250] Soft Computing: Integrating Evolutionary. 53. 47. 51. 62. 27. 60. 30. 63. 46. 45. 34. 33.2: Conferences and Workshops on Optimization. 48. 49. 56. 39. 55.

Canada. Georgia. see [2281. USA. see [732] 1990/07: Boston. 2919] e 1993/08: Chamb´ry. NSW. 1861. 1954] 1996/08: Portland. USA. see [2593. USA. OR. 449] 1981/08: Vancouver. see [1474] 2003/08: Acapulco. France. see [1846. see [1504] 1999/07: Orlando. 1947. 927. USA. see [1836. BC. India. see [979] 2007/07: Vancouver. Japan. 731. WA. 1988. see [1634. CA. USA. USA. USA. see [586. 2259] e 1991/08: Sydney. 2260] 1995/08: Montr´al. MI. USA. WA. Canada. WI. USSR. PA. M´xico. see [730. see [1965] 1997/07: Providence. 2296] 1997/08: Nagoya. see [1484] 2004/07: San Jos´. QC. see [1214] 1993/07: Washington. 183. MN. MA. see [1849] e 2002/07: Edmonton. CA. CA. USA. see [1055] 2005/07: Pittsburgh. CA. 1987. Canada. USA. see [831. UK. Scotland. Japan. DC. USA. see [2639] e 1991/07: Anaheim. see [373] 2007/01: Hyderabad. FL. CA. see [1946. USA. see [1469. USA. see [2039] 1971/09: London. USA. see [759] 2000/07: Austin. CA. MA. CONFERENCES AND WORKSHOPS IJCAI: International Joint Conference on Artificial Intelligence History: 2009/07: Pasadena. IL. Australia. see [913] 1992/07: San Jos´. WA. USA. BC. Canada. USA. DC. 2710] 1977/08: Cambridge. GA. USA. USA. Alberta. USA. Germany. 2282] 1975/09: Tbilisi. USA. 2122] 1989/08: Detroit. see [2709. 1470] 1983/08: Karlsruhe. 1213] 1979/08: Tokyo. Sweden. Italy. CA. see [3] 2010/07: Atlanta. see [448. see [1212. Paul. 2207] 1994/07: Seattle. RI. see [1118. see [2797] 2005/07: Edinburgh. see [630] 1969/05: Washington. 2594] 1987/08: Milano. see [2678] 1973/08: Stanford. see [1916] 1987/07: Seattle. see [2] 2008/06: Chicago. see [182. 1485] e 2001/08: Seattle. UK. see [960] 127 . USA. see [1222] 1998/07: Madison.10. see [793] 1988/08: St. USA. MA. see [1262] 2006/07: Boston.3. TX. USA. see [2841] AAAI: Annual National Conference on Artificial Intelligence History: 2011/08: San Francisco. see [2012] 1999/07: Stockholm. 1847] 1985/08: Los Angeles. USA. USA.

see [3065] 2006/07: Bˇij¯ e ıng. OR. M´xico. USA. USA. China. see [2857] 2007/08: Lake Tahoe. Malaysia. USA. CA. see [9] 2001/12: Adelaide. USA. USA. USA. see [1375] 2008/10: Barcelona. CA. RJ. USA. see [1238] 2000/07: Austin. see [1504] 1999/07: Orlando. see [2630] 2009/06: Kowloon. USA. see [1339] . see [1965] 1997/07: Providence. PA. see [1] 1992/07: San Jos´. USA. see [2439] e 1991/06: Anaheim. CA. see [1222] 1998/07: Madison. TX. USA. see [586] 1995/08: Montr´al. USA. PA. USA. New Zealand. USA. China. see [1849] e 2003/08: Acapulco. MA. CA. see [381] Washington. see [2993] 2007/09: Kaiserslautern. Canada. see [4] 2010/07: Atlanta. Brazil. see [1421] 2003/12: Melbourne. USA.128 1986/08: 1984/08: 1983/08: 1982/08: 1980/08: 10 GENERAL INFORMATION ON OPTIMIZATION Philadelphia. see [2361] 2009/07: Pasadena. DC. USA. WA. see [1377] 2009/08: Shˇny´ng. see [1634] 1996/08: Portland. Chile. CA. RI. see [2304] e 2001/04: Seattle. see [7] IAAI: Conference on Innovative Applications of Artificial Intelligence History: 2011/08: San Francisco. CA. USA. Hong Kong / Xi¯nggˇng. see [2014] 2004/12: Kitakyushu. USA. see [200] 2010/07: Bˇij¯ e ıng. CA. USA. 1513] Austin. Germany. see [1041] Pittsburgh. Catalonia. TX. Spain. Canada. USA. Australia. Li´on´ e a a ıng. GA. USA. CA. see [2533] 1990/05: Washington. VIC. China. Japan. QC. USA. FL. DC. CA. USA. see [2269] 1989/03: Stanford. see [197] HIS: International Conference on Hybrid Intelligent Systems History: 2011/12: Melaka. WI. CA. GA. China. USA. see [1572] 2006/12: Auckland. see [1512. see [2845] Stanford. AB. see [1340] 2005/11: Rio de Janeiro. see [14] 2010/08: Atlanta. PA. see [2428] ICCI: IEEE International Conference on Cognitive Informatics History: 2011/08: Banff. USA. Australia. see [1055] 2005/07: Pittsburgh. USA. USA. SA. USA. see [164] a a 2008/08: Stanford. WA. DC. see [1484] 2004/07: San Jos´. see [462] 1993/07: Washington. see [51] e 1994/08: Seattle. see [10] 2002/12: Santiago. see [3029] 2005/08: Irvine. see [1164] 2006/07: Boston.

see [2206] 1989/06: Ithaca. AB. AB. USA. USA. USA. see [602] 2005/08: Bonn. USA. TN. see [2520] 1991/06: Evanston. Canada. Taiwan. USA. see [313] 1990/06: Austin. CA. Austria. New Zealand. see [2744] 1990/08: Fairfax. see [603] 2007/06: Corvallis. see [1562] 1998/06: Charlottesville. see [2367] 1995/07: Tahoe City. North Rhine-Westphalia. see [2183] MCDM: Multiple Criteria Decision Making History: 2011/06: Jyv¨skyl¨. Greece. see [1798] 2000/06: Ankara. Canada. Canada. Canada. North Rhine-Westphalia. see [724] 2004/07: Banff. Crete. see [895] 2002/07: Sydney. USA. Germany. BC. see [2612] 1995/06: Hagen. see [1047] 2006/06: Pittsburgh. CA. QC. see [427] 2000/06: Stanford. USA. USA. see [2519] 1983: Monticello. see [2133] 2002/08: Calgary. MA.10. USA. MI. USA. see [3102] 2004/08: Whistler. NSW. Canada. IL. PA. VA. WA. USA. Australia. Finland. USA. USA. see [601] 1993/06: Amherst. NY. MA. USA. see [2750] e u ıchu¯ 2008/06: Auckland. see [1163] 2009/06: Montr´al. Israel. S` an. UK. PA. USA. USA. see [1676] 1999/06: Bled. USA. USA. Germany. see [426] 2003/08: Washington. see [1337] ICML: International Conference on Machine Learning History: 2011/06: Seattle. see [891] 1994/08: Coimbra. TX. see [2447] 1988/06: Ann Arbor. see [861] 2006/06: Chania.3. see [2442] 2010/06: Haifa. UK. USA. see [1945] 1992/07: Aberdeen. South Africa. CA. Portugal. see [1166] 1997/01: Cape Town. IL. see [2471] 1997/07: Nashville. Slovenia. see [926] 1996/07: Bari. see [2875] 2002/02: Semmering. NJ. China. see [2382] 2001/06: Williamstown. see [524] 2003/08: London. VA. see [2221] 1994/07: New Brunswick. Turkey. see [1659. see [1936] 1980: Pittsburgh. DC. USA. OR. CONFERENCES AND WORKSHOPS 2004/08: Victoria. see [2551] 129 . Scotland. Finland. see [681] e 2008/07: Helsinki. Italy. PA. see [1415] 1985: Skytop. see [592] 1992/07: Taipei. USA. see [401] 1998/07: Madison. 1660] 1987: Irvine. WI. see [1895] a a 2009/06: Ch´ngd¯ .

UK. see [2123] 2002/11: Singapore. UK. Australia. see [2544] 2001/03: Heslington. Anhu¯ China. South Korea. DE. see [1190] Newark. see [2363] Phuket. see [1203] Edinburgh. USA. see [367] 2009/12: Avignon. UK. see [2545. UK. VIC. see [2410] Cleveland. see [1991] 1998/11: Canberra. OH. see [1729] 2006/10: H´f´i. Japan. USA. see [1342] 2006/12: Cavalese. Belgium. see [509] ICARIS: International History: 2011/07: 2010/07: 2009/08: 2008/08: Conference on Artificial Immune Systems Cambridge. see [2856] ee ¯ ı. India. see [1165] Mons. Australia. see [2748] 1997/04: Manchester. Vietnam. UK. UK. Hungary. see [2585] 2008/12: Melbourne.130 1988/08: 1986/08: 1984/06: 1982/08: 1980/08: 1979/08: 1977/08: 1975/05: 10 GENERAL INFORMATION ON OPTIMIZATION Manchester. see [936] BIONETICS: International Conference on Bio-Inspired Models of Network. NY. see [890] o Buffalo. see [2871] AISB: Artificial Intelligence and Simulation of Behaviour History: 2008/04: Aberdeen. Australia. see [858] York. see [3094] Jouy-en-Josas. see [2547. UK and Leeds. see [274] . UK. NSW. see [637] 1996/04: Brighton. Australia. see [1275] 2008/11: Awaji City. see [2661] 2000/10: Nagoya. 2004/10: Busan. England. Thailand. MA. SA. France. see [153] 2007/12: Budapest. 2546] 2002/04: London. West Germany. see [2577] AI: Australian Joint Conference on Artificial Intelligence History: 2010/12: Adelaide. UK. see [938] 1995/04: Sheffield. Australia. see [937] 1994/04: Leeds. UK and Hatfield. Japan. UK. Hyogo. see [3072] 2004/12: Cairns. see [2759] 2000/04: Birmingham. Wales. UK. UK. UK. Japan. UK. Scotland. see [1759] Kyoto. see [2549] 2005/04: Bristol. see [33] 2006/12: Hobart. Brussels. France. see [2406] 2005/12: Sydney. 2548] 2003/03: Aberystwyth. UK. Uttar Pradesh. USA. see [1147] 2007/04: Newcastle upon Tyne. Italy. Scotland. and Computing Systems History: 2010/12: Boston. York. see [2701] SEAL: International Conference on Simulated Evolution and Learning History: 2012/12: Hanoi. Information. see [1960] Hagen/K¨nigswinter. see [1852] 1996/11: Taejon. USA. Australia. see [1179] 2010/12: Kanpur. UK. South Korea.

USA. see [470] e NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization History: 2011/10: Cluj-Napoca. Sicily. Scotland. see [1037] e 2005/11: Monterrey. UK. Japan. see [1001] AIMS: Artificial Intelligence in Mobile System 131 . see [1352] 2001/10: Tucson. see [1364] WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) History: 2010/11: see [1029] 2009/11: see [1015] 2008/11: see [1857] 2007/10: see [151] 2006/09: see [2980] 2005: see [2979] 2004: see [2978] 2003: see [2977] 2002: see [2976] 2001: see [2975] 2000: see [2974] 1999/09: Nagoya. see [2386] 2013/10: Edinburgh. see [1102] 2008/11: Puerto de la Cruz. see [1620] 2007/11: Catania. USA. Italy. see [859] 2012/10: Seoul.10. Portugal. Spain. AZ. AB. USA. see [1039] e 2004/04: Mexico City. Man. M´xico. AK. see [707] 2006/09: Oeiras. Japan. Spain. CA. UK. M´xico. Japan. see [1366] 1999/10: Tokyo. see [38] e 2007/11: Aguascalientes. see [2576] 2010/05: Granada. Scotland. CONFERENCES AND WORKSHOPS 2007/08: Santos. M´xico. see [2159] SMC: International Conference on Systems. CA. USA.3. South Korea. Tenerife. USA. see [2035] 2003/09: Edinburgh. Yucat´n. see [1993] 1998/08: Nagoya. Mexico. Kent. UK. see [2706] : Canterbury. see [1929] e 2002/04: M´rida. Japan. Italy. see [2705] MICAI: Mexican International Conference on Artificial Intelligence History: 2010/11: Pachuca. see [598] e a e 2000/04: Acapulco. Romania. Spain. see [1992] 1997/06: see [535] 1996/08: Nagoya. see [1619] 2006/06: Granada. Sicily. see [1424] 2004/09: Catania. and Cybernetics History: 2014/10: San Diego. M´xico. see [2583] 2009/11: Guanajuato. M´xico. Canada. see [2454] 2011/10: Anchorage. see [1331] 1998/10: La Jolla. see [91] 2009/10: San Antonio. TX. see [282] 2005/08: Banff. Brazil. M´xico.

Brazil. RJ. Portugal. see [2182] 2003/09: Cavtat.2 on page 385. Portugal. Portugal. see [1227] GEWS: Grammatical Evolution Workshop See Table 31. see [992] 2000/05: Barcelona. see [2781] 1995/04: Heraclion. Germany. North Rhine-Westphalia. Sachsen. see [2866] 2006/09: Berlin. see [2806] 1997/09: Aachen. HAIS: International Workshop on Hybrid Artificial Intelligence Systems History: 2011/05: Warsaw. North Rhine-Westphalia. see [1223] 1994/04: Catania. see [2805] 1994/09: Aachen. see [631] 2006/10: Ribeir˜o Preto. North Rhine-Westphalia. see [2804] EngOpt: International Conference on Engineering Optimization History: 2010/09: Lisbon. Croatia. North Rhine-Westphalia. Austria. see [3091] 1996/09: Aachen. see [2807] 1998/09: Aachen. Crete. Germany. see [275] a EUFIT: European Congress on Intelligent Techniques and Soft Computing History: 1999/09: Aachen. see [184] a ICAART: International Conference on Agents and Artificial Intelligence History: 2011/01: Rome. Finland. see [2803] 1993/09: Aachen. Spain. Germany. see [215] 1998/04: Chemnitz. see [2809] EPIA: Portuguese Conference on Artificial Intelligence History: 2011/10: Lisbon. USA. see [1405] 2008/06: Rio de Janeiro. see [1744] 2009/10: Aveiro. France. see [11] ECML: European Conference on Machine Learning History: 2007/09: Warsaw. Germany. North Rhine-Westphalia. North Rhine-Westphalia. see [507] 1993/04: Vienna. see [1221] 2001/09: Freiburg. see [512] 2002/08: Helsinki. Japan. Greece. see [3049] 2005/05: Muroran. see [1404] 2010/01: Val`ncia. Germany. Poland. Germany. WA. Poland. Brazil. Italy. see [1624] CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology History: 2008/10: Cergy-Pontoise. see [917] e . see [278] 2005/10: Porto. see [2867] 2009/06: Salamanca. Portugal. Spain. Italy. Germany. Italy.132 10 GENERAL INFORMATION ON OPTIMIZATION History: 2003/10: Seattle. Germany. Portugal. see [1774] 2007/12: Guimar˜es. see [877] 1995/08: Aachen. Dubrovnik. see [2209] 2004/09: Pisa. see [2026] a 2005/12: Covilh˜. Spain. Germany. Portugal. see [632] 2007/11: Salamanca. Spain. Sicily. Germany. see [537] 1997/04: Prague. North Rhine-Westphalia. see [2749] 2008/09: Burgos. Spain. Catalonia. Czech Republic. SP.

see [2742] 1989: Germany. USA. H´ bˇi. Portugal. see [515] 2009/12: Malacca. see [136] 2002/11: Puc´n. see [2011] 2007/09: Providence.10. China. 1826] LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction History: 2010/09: St Andrews. Li´on´ a a a ıng. CA. see [2846] u a u e 2009/05: Wˇ h`n. see [2625] 2007/02: Andalo. Trento. see [1273. France. Scotland. see [1367] IEA/AIE: International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems History: 2004/05: Ottawa. Italy.3. China. see [780] 2009/09: Lisbon. QLD. Spain. Argentina. France. CA. CONFERENCES AND WORKSHOPS 2009/01: Porto. China. Australia. NSW. Germany. see [2581] 2009/01: Trento. see [138] 2008/12: Buenos Aires. H´ bˇi. see [135] o 1999/11: Erice. ON. Portugal. see [2225] 2006/09: Nantes. see [132] ANZIIS: Australia and New Zealand Conference on Intelligent Information Systems History: 1994/11: Brisbane. Malaysia. see [1380] u a u e 2010/05: Wˇ h`n. see [133] ıso. China. Italy. France. UK. see [2318] 2010/01: Venice. Organization and Technology History: 1992/09: Tutzing. see [1381] 2010/12: Cergy-Pontoise. RJ. ON. Canada. H´ bˇi. see [137] 2005/10: Paris. Canada. see [245] 1986/07: Neubiberg. see [134] 1996/11: Valpara´ Chile. Germany. see [583] 2005/10: Barcelona. see [916] 133 ICTAI: International Conference on Tools with Artificial Intelligence History: 2003/11: Sacramento. USA. see [1359] . Catalonia. Brazil. Australia. Bavaria. see [779] 2008/09: Sydney. see [2847] u a u e ISICA: International Symposium on Advances in Computation and Intelligence History: 2007/09: Wˇ h`n. China. see [243] AINS: Annual Symposium on Autonomous Intelligent Networks and Systems History: 2003/06: Menlo Park. H´ bˇi. see [244] 1983/06: Neubiberg. Germany. see [1224] WOPPLOT: Workshop on Parallel Processing: Logic. see [1490] u a u e LION: Learning and Intelligent OptimizatioN History: 2011/01: Rome. Italy. Bavaria. Italy. 1989/08: Rio de Janeiro. see [2715] SoCPaR: International Conference on SOft Computing and PAttern Recognition History: 2011/10: D`li´n. Italy. see [513] ALIO/EURO: Workshop on Applied Combinatorial Optimization History: 2011/05: Porto. see [1860] 2004/09: Toronto. Portugal. RI. see [2088] ISA: International Workshop on Intelligent Systems and Applications History: 2011/05: Wˇ h`n. USA. Chile.

Optimization. Yunlin. see [1493] 2005/12: Kaohsiung. see [1560] 1988/10: Glasgow. China. Taiwan. see [1061] 1987/05: Bled. History: 1991/03: Porto. see [2216] 2005/12: Pune. India. Guˇngd¯ng. Lisbon. see [2003] TAI: IEEE Conference on Tools for Artificial Intelligence History: 1990/11: Herndon. VA. Spain. Austria. see [2004] 2004/11: Wenshan District. Taiwan. Poland. Yugoslavia (now Slovenia). GICOLAG: Global Optimization – Integrating Convexity. see [1283] 2009/10: Wufeng Township. see [2214] ISIS: International Symposium on Advanced Intelligent Systems History: 2007/09: Sokcho. and Computational Algebraic Geometry History: 2006/12: Vienna. see [1360] FWGA: Finnish Workshop on Genetic Algorithms and Their Applications See Table 29. Portugal. see [1714] EDM: International Conference on Educational Data Mining History: 2009/07: C´rdoba.134 10 GENERAL INFORMATION ON OPTIMIZATION ASC: IASTED International Conference on Artificial Intelligence and Soft Computing History: 2002/07: Banff. India. see [198] PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining History: 2011/05: Sh¯nzh`n. AB. see [2217] 2007/12: Pune. Logic Programming. Taiwan. Korea. see [2263] a o a o IICAI: Indian International Conference on Artificial Intelligence History: 2011/12: Tumkur. see [2024] ICMLC: International Conference on Machine Learning and Cybernetics History: 2005/08: Guˇngzh¯u. Scotland. USA. see [322] 1986/02: Orsay. Taipei City. see [2006] 2006/12: Kaohsiung. France. see [2215] 2003/12: Hyderabad. India. see [2736] 2009/12: Tumkur. Taiwan. Finland. FL. I-lan County. India. Taiwan. India. see [2097] FLAIRS: Florida Artificial Intelligence Research Symposium History: 1994/05: Pensacola Beach. see [529] 2008/11: Chiao-hsi Shiang. see [2659] 2007/11: Douliou. Taichung County. China. Guˇngd¯ng. Canada. see [1294] e e a o STeP: Finnish Artificial Intelligence Conference History: 2008/08: Espoo. Taiwan. see [1358] Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna History: 1997/09: Rytro. Portugal. Taiwan. see [217] o EWSL: European Working Session on Learning Continued as ECML in 1993. UK. see [2255] TAAI: Conference on Technologies and Applications of Artificial Intelligence History: 2010/11: Hsinchu. USA.2 on page 355. see [1579] MCDA: International Summer School on Multicriteria Decision Aid History: 1998/07: Monte Estoril. see [2362] .

see [2584] 10. see [1414] AICS: Irish Conference on Artificial Intelligence and Cognitive Science History: 2007/08: Dublin. Science Publishers and Springer Netherlands 11. see [874] TAINN: Turkish Symposium on Artificial Intelligence and Neural Networks (Tu¨rk Yapay Zeka ve a Yapay Sinir A˘ları Sempozyumu) g History: 1993/06: Istanbul. VA. USA. and Cybernetics – Part B: Cybernetics published by IEEE Systems.4 Journals 1. see [321] 2008/09: Antwerp. ORSA Journal on Computing published by Operations Research Society of America (ORSA) 13.V. USA. and Pergamon Press 8. and Cybernetics Society 2. 3. International Journal of Computational Intelligence and Applications (IJCIA) published by Imperial College Press Co. IEEE Transactions on Systems. Switzerland. The Journal of the Operational Research Society (JORS) published by Operations Research Society and Palgrave Macmillan Ltd. Computers & Operations Research published by Elsevier Science Publishers B. see [1786] 2009/03: Arlington. Italy. Naples.10. 15. Operations Research published by HighWire Press (Stanford University) and Institute for Operations Research and the Management Sciences (INFORMS) 10. Management Science published by HighWire Press (Stanford University) and Institute for Operations Research and the Management Sciences (INFORMS) 14. CA. Applied Soft Computing published by Elsevier Science Publishers B. Machine Learning published by Kluwer Academic Publishers and Springer Netherlands 5. and World Scientific Publishing Co. Slovenia. Belgium. USA. Turkey. Data Mining and Knowledge Discovery published by Springer Netherlands . see [114] SEA: International Symposium on Experimental Algorithms History: 2011/05: Chania. see [2587] 2010/03: Lugano. CA. 17. Information Sciences – Informatics and Computer Science Intelligent Systems Applications: An International Journal published by Elsevier Science Publishers B. Italy. see [764] AIIA: Congress of the Italian Association for Artificial Intelligence (AI*IA) on Trends in Artificial Intelligence History: 1991/10: Palermo. Baltzer AG. 4. Crete. Computational Optimization and Applications published by Kluwer Academic Publishers and Springer Netherlands 16. Greece. Man. see [124] ECML PKDD: European Conference on Machine Learning and European Conference on Principles and Practice of Knowledge Discovery in Databases History: 2009/09: Bled. see [2098] 2010/05: Ischa Island. JOURNALS 135 Machine Intelligence Workshop History: 1977: Santa Cruz. 9. Inc. Annals of Operations Research published by J. 12. Man.V. 6.V. and North-Holland Scientific Publishers Ltd. Methodologies and Applications published by Springer-Verlag 7. Soft Computing – A Fusion of Foundations. Artificial Intelligence published by Elsevier Science Publishers B.4. European Journal of Operational Research (EJOR) published by Elsevier Science Publishers B. USA. TN. see [1643] AGI: Conference on Artificial General Intelligence History: 2011/08: Mountain View. see [661] 2008/03: Memphis.V. Journal of Artificial Intelligence Research (JAIR) published by AAAI Press and AI Access Foundation. C.V. Ireland.

International Journal of Knowledge-Based and Intelligent Engineering Systems published by IOS Press 38.V. Journal of Optimization Theory and Applications published by Plenum Press and Springer Netherlands 32. IEEE Transactions on Knowledge and Data Engineering published by IEEE Computer Society Press 27. Journal of Global Optimization published by Springer Netherlands 21. K¨nstliche Intelligenz (KI) published by B¨ttcher IT Verlag u o . ACM SIGART Bulletin published by ACM Press 29. International Transactions in Operational Research published by Blackwell Publishing Ltd 49. Discrete Optimization published by Elsevier Science Publishers B. Optimization and Engineering published by Kluwer Academic Publishers and Springer Netherlands 35. INFORMS Journal on Computing (JOC) published by HighWire Press (Stanford University) and Institute for Operations Research and the Management Sciences (INFORMS) 50. SIAM Journal on Optimization (SIOPT) published by Society for Industrial and Applied Mathematics (SIAM) 19. Artificial Intelligence Review – An International Science and Engineering Journal published by Kluwer Academic Publishers and Springer Netherlands 25. 31.. Inc. 43. 45. Application Oriented Research Journal published by Elsevier Science Publishers B. International Journal of Organizational and Collective Intelligence (IJOCI) published by Idea Group Publishing (Idea Group Inc. Neural Networks.V. Journal of Machine Learning Research (JMLR) published by MIT Press 30. SIAG/OPT Views and News: A Forum for the SIAM Activity Group on Optimization published by SIAM Activity Group on Optimization – SIAG/OPT 52. and Complex Problem-Solving Technologies published by Springer Netherlands 24.V. IEEE Intelligent Systems Magazine published by IEEE Computer Society Press 28. IGI Global) 36. Optimization Methods and Software published by Taylor and Francis LLC 40. OR/MS Today published by Institute for Operations Research and the Management Sciences (INFORMS) and Lionheart Publishing. 46.V. Applied Intelligence – The International Journal of Artificial Intelligence. Artificial Intelligence in Engineering published by Elsevier Science Publishers B. 33. Structural and Multidisciplinary Optimization published by Springer-Verlag GmbH 42. OR Spectrum – Quantitative Approaches in Management published by Springer-Verlag 51. Journal of Mathematical Modelling and Algorithms published by Springer Netherlands 41. Mathematics of Operations Research (MOR) published by Institute for Operations Research and the Management Sciences (INFORMS) 20. Annals of Mathematics and Artificial Intelligence published by Springer Netherlands 26. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) published by IEEE Computer Society 39. Engineering Optimization published by Informa plc and Taylor and Francis LLC 34. AI Magazine published by AAAI Press 44. Journal of Artificial Evolution and Applications published by Hindawi Publishing Corporation 23. International Journal Approximate Reasoning published by Elsevier Science Publishers B. International Journal of Production Research published by Taylor and Francis LLC 48. Computers in Industry – An International. International Journal of Intelligent Systems published by Wiley Interscience 47. Journal of Intelligent Information Systems published by Springer-Verlag GmbH 37. AIAA Journal published by American Institute of Aeronautics and Astronautics 22.136 10 GENERAL INFORMATION ON OPTIMIZATION 18.

Part II Difficulties in Optimization .


Chapter 11 Introduction

The classification of optimization algorithms in Section 1.3 and the table of contents of this book enumerate a wide variety of optimization algorithms. It is a justified question to ask why there are so many different approaches, why is this variety needed? One possible answer is simply because there are so many different kinds of optimization tasks. Each of them puts different obstacles into the way of the optimizers and comes with own, characteristic difficulties. Yet, the approaches introduced here resemble only a fraction of the actual number of available methods.

In this part we want to discuss the most important of these complications, the major problems that may be encountered during optimization [2915]. Some of the subjects in the following text concern Global Optimization in general (multimodality and overfitting, for instance), others apply especially to nature-inspired approaches like Genetic Algorithms (epistasis and neutrality, for example). By giving clear definitions and comprehensive introductions to these topics, we want to raise the awareness of scientists and practitioners in the industry and hope to help them to use optimization algorithms more efficiently. You may wish to read this part after reading through some of the later parts on specific optimization algorithms or after solving your first optimization task. You will then realize that many of the problems you encountered are discussed here in depth.

In Figure 11.1, we have sketched a set of different types of fitness landscapes (see Section 5.1) which we are going to discuss. The objective values in the figure are subject to minimization and the small bubbles represent candidate solutions under investigation. An arrow from one bubble to another means that the second individual is found by applying one search operation to the first one. Before we go more into detail about what makes these landscapes difficult, we should establish the term in the context of optimization. As already stated, optimization algorithms are guided by objective functions. A function is difficult from a mathematical perspective in this context if it is not continuous, not differentiable, or if it has multiple maxima and minima. This understanding of difficulty comes very close to the intuitive sketches in Figure 11.1.

The degree of difficulty of solving a certain problem with a dedicated algorithm is closely related to its computational complexity. We will outline this in the next section and show that for many problems, there is no algorithm which can exactly solve them in reasonable time.

In many real world applications of metaheuristic optimization, the characteristics of the objective functions are not known in advance. The problems are usually very complex and it is only rarely possible to derive boundaries for the performance or the runtime of optimizers in advance, let alone exact estimates with mathematical precision. Sometimes, neglecting a single one of these aspects during the development of the optimization process can render the whole efforts invested useless, even if highly efficient optimization techniques are applied.

[Figure 11.1: Different possible properties of fitness landscapes (minimization). Each panel plots objective values f(x) over x: Fig. 11.1.a: Best Case; Fig. 11.1.b: Low Variation; Fig. 11.1.c: Multi-Modal; Fig. 11.1.d: Rugged; Fig. 11.1.e: Deceptive; Fig. 11.1.f: Neutral; Fig. 11.1.g: Needle-In-A-Haystack; Fig. 11.1.h: Nightmare. Annotations in the panels include "multiple (local) optima", "no useful gradient information", "region with misleading gradient information", "neutral area or area without much information", and "needle (isolated optimum)".]

In such cases, experience and general rules of thumb are often the best guides available (see also Chapter 24). Empirical results based on models obtained from related research areas such as biology are sometimes helpful as well. In this part, we discuss many such models and rules, providing a better understanding of when the application of a metaheuristic is feasible and when not, as well as how to avoid defining problems in a way that makes them difficult.

.

Chapter 12 Problem Hardness

Most of the algorithms we will discuss in this book, in particular the metaheuristics, usually give approximations as results but cannot guarantee to always find the globally optimal solution. Indeed, we are actually able to find the global optima for all given problems (of course, limited to the precision of the available computers): the easiest way to do so would be exhaustive enumeration (see Section 8.3), to simply test all possible values from the problem space and to return the best one. This method, obviously, will take extremely long, so our goal is to find an algorithm which is better. More precisely, we want to find the best algorithm for solving a given class of problems. This means that we need some sort of metric with which we can compare the efficiency of algorithms [2877, 2510].

12.1 Algorithmic Complexity

The most important measures obtained by analyzing an algorithm¹ are the time that it takes to produce the wanted outcome and the storage space needed for internal data [635]. We call those the time complexity and the space complexity dimensions. The time complexity denotes how many steps algorithms need until they return their results. The space complexity determines how much memory an algorithm consumes at most in one run. In real systems however, these measures depend on the input values passed to the algorithm. If we have an algorithm that should decide whether a given number is prime or not, for example, the number of steps needed to find that out will differ if the size of the inputs is different: it is easy to see that telling whether 7 is prime or not can be done in much shorter time than verifying the same property for 2^32 582 657 − 1. The nature of the input may influence the runtime and space requirements of an algorithm as well. We immediately know that 1 000 000 000 000 000 is not prime, though it is a big number, whereas the answer in case of the smaller number 100 000 007 is less clear.²

As we just have seen, the basic criterion influencing the time and space requirements of an algorithm applied to a problem is the size of the inputs, i.e., the number of bits needed to represent the number from which we want to find out whether it is prime or not, the number of elements in a list to be sorted, the number of nodes or edges in a graph to be 3-colored, and so on. We can describe this dependency as a function of this size. Often, best, average, and worst-case complexities exist for given input sizes. Often, too, the knowledge of the exact dependency is not needed. In order to compare the efficiency of algorithms, some approximative notations have been introduced [1555, 1556, 2940].

¹ http://en.wikipedia.org/wiki/Analysis_of_algorithms [accessed 2007-07-03]
² 100 000 007 is indeed a prime, in case you're wondering.
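The dependency of effort on input size and input nature can be made tangible with a tiny experiment (not from the book): a naive trial-division primality test that counts its division steps for the numbers mentioned above.

```python
def is_prime_naive(n):
    """Naive trial division; returns (is_prime, number of division steps performed)."""
    if n < 2:
        return False, 0
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return False, steps       # a divisor was found early
        d += 1
    return True, steps

for n in (7, 100_000_007, 1_000_000_000_000_000):
    prime, steps = is_prime_naive(n)
    print(f"{n}: prime={prime}, trial divisions={steps}")
```

The huge even number is rejected after a single step, while the smaller prime 100 000 007 forces roughly 10 000 divisions — the step count depends on both the size and the nature of the input.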

If we, for example, know that sorting n data elements with the Quicksort algorithm³ [1240, 1557] takes in average something about n log n steps (even if the correct number is 2n ln n ≈ 1.39 n log n), this is sufficient enough. Therefore, we specify the amount of steps or memory units an algorithm needs in dependency on the size of its inputs in the big-O notation.

Definition D12.1 (Big-O notation). The big-O notation⁴ is a mathematical notation used to describe the asymptotical upper bound of functions.

f(x) ∈ O(g(x)) ⇔ ∃ x0, m ∈ R : m > 0 ∧ |f(x)| ≤ m |g(x)| ∀ x > x0    (12.1)

In other words, a function f(x) is in O of another function g(x) if and only if there exists a real number x0 and a constant, positive factor m so that the absolute value of f(x) is smaller (or equal) than m times the absolute value of g(x) for all x that are greater than x0. f(x) thus cannot grow (considerably) faster than g(x). For example, f(x) = x³ + x² + x + 1 ∈ O(x³), since for m = 5 and x0 = 2 it holds that 5x³ > x³ + x² + x + 1 ∀x ≥ 2.

The Big-O-family notations introduced by Bachmann [163] and made popular by Landau [1664] allow us to group functions together that rise at approximately the same speed. Generally, this specification is based on key-features of the input data. For a graph-processing algorithm, this may be the number of edges |E| and the number of vertexes |V| of the graph to be processed, and the algorithmic complexity may be O(|V|² + |E||V|). Some examples for single-parametric functions can be found in Example E12.1.

Example E12.1 (Some examples of the big-O notation).

- class O(1) — examples: f1(x) = 222, f2(x) = sin x — Algorithms that have constant runtime for all inputs are O(1).
- class O(log n) — examples: f3(x) = log x; f4(x < 1) = 0, f4(x) = f4(x/2) + 1 — Logarithmic complexity is often a feature of algorithms that run on binary trees or of search algorithms in ordered sets. Notice that O(log n) implies that only parts of the input of the algorithm are read/regarded, since the input has length n and we only perform m log n steps.
- class O(n) — examples: f5(x) = 23n + 4, f6(x) = n/2 — Algorithms of O(n) require to access and process their input a constant number of times. This is for example the case when searching in a linked list.
- class O(n log n) — example: f7(x) = 23x + x log 7x — Many sorting algorithms like Quicksort and Mergesort are in O(n log n).
- class O(n²) — examples: f8(x) = 34x², f9(x) = Σ_{i=0}^{x+3} (x − 2) — Some sorting algorithms like selection sort have this complexity. In this group we find many algorithms that work on graphs.
- class O(nⁱ), i > 1, i ∈ R — example: f10(x) = x⁵ − x² — The general polynomial complexity. For many problems, solutions of this complexity are acceptably good.
- class O(2ⁿ) — example: f11(x) = 23 · 2ˣ — Algorithms with exponential complexity perform slowly and become infeasible with increasing input size. For many hard problems, there exist only algorithms of this class. Their solution can otherwise only be approximated by the means of randomized global optimization techniques.

The rising behavior of some functions is illustrated in Figure 12.1.

³ http://en.wikipedia.org/wiki/Quicksort [accessed 2007-07-03]
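To make the remark about O(log n) concrete — only part of the input is ever touched — here is a small, illustrative binary-search sketch (not from the book) that counts how many of the n sorted elements it actually inspects.

```python
def binary_search(sorted_list, target):
    """Search in a sorted list; returns (index or None, number of elements inspected).
    The number of inspections grows like log2(n), so only a tiny fraction of the
    input is ever read."""
    lo, hi, inspected = 0, len(sorted_list) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        inspected += 1
        if sorted_list[mid] == target:
            return mid, inspected
        if sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, inspected

for n in (1024, 1_048_576):
    data = list(range(n))
    print(n, "elements ->", binary_search(data, n - 1)[1], "inspections")
```

Doubling the input size a thousandfold only roughly doubles the number of inspections (about 10 versus about 20).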

The rising behavior of some functions is illustrated in Figure 12.1. From this figure it becomes clear that problems for which only algorithms with high complexities such as O(2^n) exist quickly become intractable even for small input sizes n.

[Figure 12.1 plots the functions x^x, x!, e^x, 2^x, 1.1^x, x^10, x^8, x^4, x^2, and x for x from 1 to 2048 on logarithmic axes, together with reference lines at one thousand, one million, one billion, one trillion, the number of milliseconds per day, and the number of picoseconds since the big bang.]
Figure 12.1: Illustration of some functions and their rising speed.

12.2 Complexity Classes

While the big-O notation is used to state how long a specific algorithm needs (at least/in average/at most) for input with certain features, the computational complexity5 of a problem is bounded by the best algorithm known for it. In other words, it makes a statement about how many resources are necessary for solving a given problem, or, from the opposite point of view, tells us whether we can solve a problem with the given available resources.

12.2.1 Turing Machines

The amount of resources required at least to solve a problem depends on the problem itself and on the machine used for solving it. One of the most basic computer models are deterministic Turing machines6 (DTMs) [2737] as sketched in Fig. 12.2.a. They consist of a tape of infinite length which is divided into distinct cells, each capable of holding exactly one symbol from a finite alphabet. A Turing machine has a head which is always located at one of these cells and can read and write the symbols. The machine has a finite set of possible states and is driven by rules. These rules decide which symbol should be written and/or move the head one unit to the left or right, based on the machine’s current state and the symbol on the tape under its head. With this simple machine, any of our computers and, hence, also the programs executed by them, can be simulated.7

4 http://en.wikipedia.org/wiki/Big_O_notation [accessed 2007-07-03]
5 http://en.wikipedia.org/wiki/Computational_complexity_theory [accessed 2010-06-24]
6 http://en.wikipedia.org/wiki/Turing_machine [accessed 2010-06-24]
7 Another computational model, RAM (Random Access Machine), is quite close to today’s computer architectures and can be simulated with Turing machines efficiently [1804] and vice versa.

[The two panels of Figure 12.2 show example rule tables which, for each combination of machine state (A, B, C) and tape symbol (0, 1), specify the symbol to print, the head motion (left/right), and the next state; the non-deterministic machine in Fig. 12.2.b contains more than one rule for some state/symbol pairs.
Fig. 12.2.a: A deterministic Turing machine. Fig. 12.2.b: A non-deterministic Turing machine.]
Figure 12.2: Turing machines.

12.2.2 P, N P, Hardness, and Completeness

The set of all problems which can be solved by a deterministic Turing machine within polynomial time is called P [1025]. This means that for the problems within this class, there exists at least one algorithm (for a DTM) with an algorithmic complexity in O(n^p) (where n is the input size and p ∈ N1 is some natural number). Since it is quite easy to simulate a DTM (with finite tape) with an off-the-shelf computer and vice versa, algorithms designed for Turing machines can be translated to algorithms for normal PCs. Hence, these are also the problems that can be solved within polynomial time by existing, real computers. These are the problems which are said to be exactly solvable in a feasible way [359, 1092].

Non-deterministic Turing machines8 (NTMs, see Fig. 12.2.b), on the other hand, are thought experiments rather than real computers. They are different from DTMs in that they may have more than one rule for a state/input pair. Whenever there is an ambiguous situation where n > 1 rules could be applied, the NTM can be thought of to “fork” into n independent NTMs running in parallel. The computation of the overall system stops when any of these reaches a final, i.e., accepting, state. Like DTMs, NTMs can be simulated with DTMs by performing a breadth-first search on the possible computation steps of the NTM until an accepting state is found. The number of steps required for this simulation, however, grows exponentially with the length of the shortest accepting path of the NTM [1483].

N P includes all problems which can be solved by a non-deterministic Turing machine in a runtime rising polynomially with the input size. It is the set of all problems for which solutions can be checked in polynomial time [1549]. Clearly, since DTMs can be considered as a subset of NTMs and if a deterministic Turing machine can solve something in polynomial time, a non-deterministic one can as well, N P includes all the problems from P. The other way around, however, is much more complicated: it is strongly assumed that problems in N P need super-polynomial, i.e., exponential, time to be solved by deterministic Turing machines and therefore also by computers as we know and use them. So far, the question whether N P ⊆ P and, hence, P = N P, is open – there neither exists proof against nor for it [1549].

8 http://en.wikipedia.org/wiki/Non-deterministic_Turing_machine [accessed 2010-06-24]
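A deterministic Turing machine is easy to simulate in software, which is one way to make the rule-table idea from Figure 12.2 concrete. The following minimal simulator is our own sketch (function names, rule format, and the example machine are assumptions, not material from the book); the example machine simply walks right over its input and appends a 1 at the first blank cell:

def run_dtm(rules, tape, state="A", blank="_", max_steps=1000):
    """rules maps (state, symbol) -> (symbol to write, 'L' or 'R', next state)."""
    cells = dict(enumerate(tape))          # sparse tape: cell index -> symbol
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in rules:   # no applicable rule: the machine halts
            break
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

append_one = {
    ("A", "0"): ("0", "R", "A"),   # skip over the input ...
    ("A", "1"): ("1", "R", "A"),
    ("A", "_"): ("1", "R", "H"),   # ... and write a 1 at the first blank cell
}
print(run_dtm(append_one, "0110"))  # -> 01101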

If problem A can be transformed in a way so that we can use an algorithm for problem B to solve A, we say A reduces to B. This means that A is no more difficult than B; the two fall at least into the same complexity class. For classes larger than P, reductions which take no more than polynomial time in the input size are usually used. A problem is hard for a complexity class if every other problem in this class can be reduced to it. The problems which are hard for N P are called N P-hard. A problem which is in a complexity class C and hard for C is called complete for C – hence the name N P-complete. The first known N P-complete problem is the satisfiability problem (SAT) [626]. Here, the problem is to decide whether or not there is an assignment of values to a set of Boolean variables x1, x2, . . . , xn which makes a given Boolean expression based on these variables, parentheses, ∧, ∨, and ¬ evaluate to true.

12.3 The Problem: Many Real-World Tasks are N P-hard

From a practical perspective, problems in P are usually friendly. Searching and sorting algorithms, with complexities of between log n and n log n, are good examples for such problems which, in most cases, are not bothersome. Everything with an algorithmic complexity in O(n^5) and less can usually be solved more or less efficiently if n does not get too large. However, many real-world problems are N P-hard or, without formal proof, can be assumed to be at least as bad as that. So far, no polynomial-time algorithm for DTMs has been discovered for this class of problems, and the runtime of the algorithms we have for them increases exponentially with the input size.

The Traveling Salesman Problem (TSP), which we initially introduced in Example E2.2 on page 45, is a classical example for N P-complete combinatorial optimization problems. This means that until now, there exists no algorithm with a less-than exponential complexity which can be used to find optimal transportation plans for a logistics company. In other words, as soon as a logistics company has to satisfy transportation orders of more than a handful of customers, it will likely not be able to serve them in an optimal way. Since many Vehicle Routing Problems (VRPs) [2912, 2914] are basically iterated variants of the TSP, any solution attempt becomes infeasible quickly. The Knapsack problem [463], i.e., answering the question of how to fit as many items (of different weight and value) into a weight-limited container so that the total value of the items in the container is maximized, is N P-complete as well. Like the TSP, it occurs in a variety of practical applications.

The number of possible states of such systems grows exponentially with their size. Minsky [1907], for instance, estimates the number of all possible move sequences in chess to be around 1 · 10^120. In 1962, Bremermann [407] pointed out that “No data processing system whether artificial or living can process more than 2 · 10^47 bits per second per gram of its mass”.
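The exponential blow-up becomes visible even in a toy instance of SAT. The sketch below (our own, hypothetical example formula) answers the satisfiability question by exhaustively visiting all 2^n assignments, which is exactly the kind of effort that makes such problems intractable for larger n:

from itertools import product

def brute_force_sat(formula, n):
    """formula: callable taking a tuple of n Booleans; returns a model or None."""
    for assignment in product((False, True), repeat=n):   # 2**n candidates
        if formula(assignment):
            return assignment
    return None

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
f = lambda x: (x[0] or not x[1]) and (x[1] or x[2]) and (not x[0] or not x[2])
print(brute_force_sat(f, 3))   # a satisfying assignment, found within <= 8 tries

For n = 3 this is instantaneous; for n = 60 the same loop would already have to visit about 10^18 assignments.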
This is an absolute limit to computation power which we are very unlikely to ever be able to reach. Yet even with this power, systems whose number of possible states grows exponentially with their size cannot be examined exhaustively. These few examples and the wide area of practical application problems which can be reduced to them clearly show one thing: unless P = N P is proven at some point in time, there will be no way to solve a variety of problems exactly.

12.4 Countermeasures

Solving a problem exactly to optimality is not always possible, as we have learned in the previous section. If we have to deal with N P-complete problems, we will therefore, in most cases, have to give up on some of the features which we expect an algorithm or a solution to have. As pointed out in Section 1.3, a randomized algorithm, for instance, violates the constraint of determinism as stated in Definition D1.13 on page 33: it includes at least one instruction that acts on the basis of a random decision, as opposed to deterministic algorithms9 which always produce the same results when given the same inputs. Randomized algorithms are also often called probabilistic algorithms [1281, 1282, 1919, 1966, 1967]. There are two general classes of randomized algorithms: Las Vegas and Monte Carlo algorithms.

Definition D12.2 (Las Vegas Algorithm). A Las Vegas algorithm10 is a randomized algorithm that never returns a wrong result [141, 1281, 1282, 1966]. It either returns the correct result, reports a failure, or does not return at all.

In other words, a Las Vegas algorithm terminates with a positive probability and is (partially) correct. If a Las Vegas algorithm returns, its outcome is deterministic (but not the algorithm itself). The termination, however, cannot be guaranteed: there usually exists an expected runtime limit for such algorithms – their actual execution, however, may take arbitrarily long.

Definition D12.3 (Monte Carlo Algorithm). A Monte Carlo algorithm11 always terminates. Its result, however, can be either correct or incorrect [1281, 1282, 1966].

In contrast to Las Vegas algorithms, Monte Carlo algorithms thus always terminate but are (partially) correct only with a positive probability. In summary, although randomized algorithms give up some of the guarantees of deterministic, exact methods, they are usually the way to go to tackle complex problems.

9 http://en.wikipedia.org/wiki/Deterministic_computation [accessed 2007-07-03]
10 http://en.wikipedia.org/wiki/Las_Vegas_algorithm [accessed 2007-07-03]
11 http://en.wikipedia.org/wiki/Monte_carlo_algorithm [accessed 2007-07-03]
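The two classes can be contrasted with two toy routines (our own illustrations, not algorithms from the book): a Las Vegas search whose answer is always correct but whose runtime is random, and a Monte Carlo primality check (a simple Fermat test) which always terminates quickly but may be wrong with small probability:

import random

def las_vegas_find(items, target):
    """Las Vegas: keeps sampling random positions until it hits the target.
    Always correct, but the running time is a random variable.
    (Assumes the target is actually present; otherwise it never returns.)"""
    while True:
        i = random.randrange(len(items))
        if items[i] == target:
            return i

def monte_carlo_is_prime(n, trials=20):
    """Monte Carlo (Fermat test): always terminates after `trials` rounds,
    but may wrongly report a composite number as 'probably prime'."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False          # certainly composite
    return True                   # probably prime

print(las_vegas_find([4, 8, 15, 16, 23, 42], 23))
print(monte_carlo_is_prime(100000007))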

Tasks T12

44. Find all big-O relationships between the following functions fi : N1 → R:
• f1(x) = 9√x
• f2(x) = log x
• f3(x) = 7
• f4(x) = (1.5)^x
• f5(x) = 99/x
• f6(x) = e^x
• f7(x) = 10x^2
• f8(x) = (2 + (−1)^x) x
• f9(x) = 2 ln x
• f10(x) = (1 + (−1)^x) x^2 + x
[10 points]

45. Show why the statement “The runtime of the algorithm is at least in O(n^2)” makes no sense. [5 points]

46. Use literature or the internet to find at least three more N P-complete problems. Define them as optimization problems by specifying proper problem spaces X and objective functions f which are subject to minimization. [15 points]

47. What is the difference between a deterministic Turing machine (DTM) and a non-deterministic one (NTM)? [5 points]

48. Write down the states and rules of a deterministic Turing machine which inverts a sequence of zeros and ones on the tape while moving its head to the right and that stops when encountering a blank symbol. [10 points]


Chapter 13
Unsatisfying Convergence

13.1 Introduction

Definition D13.1 (Convergence). An optimization algorithm has converged (a) if it cannot reach new candidate solutions anymore or (b) if it keeps on producing candidate solutions from a “small”1 subset of the problem space.

Optimization processes will usually converge at some point in time. Linear convergence to the global optimum x⋆ with respect to an objective function f is defined in Equation 13.1, where t is the iteration index of the optimization algorithm and x(t) is the best candidate solution discovered until iteration t [1976]:

|f(x(t + 1)) − f(x⋆)| ≤ c |f(x(t)) − f(x⋆)| ,  c ∈ [0, 1)    (13.1)

This is basically the best behavior of an optimization process we could wish for. One of the problems in Global Optimization (and basically also in nature) is that it is often not possible to determine whether the best solution currently known is situated on a local or a global optimum and thus, whether convergence is acceptable. In other words, it is usually not clear whether the optimization process can be stopped, whether it should concentrate on refining the current optimum, or whether it should examine other parts of the search space instead. This can, of course, only become cumbersome if there are multiple (local) optima, i.e., if the problem is multi-modal [1266, 2267] as depicted in Fig. 11.1.c on page 140.

Definition D13.2 (Multi-Modality). A set of objective functions (or a vector function) f is multi-modal if it has multiple (local or global) optima – depending on the definition of “optimum” in the context of the corresponding optimization problem.

A mathematical function is multi-modal if it has multiple (local or global) maxima or minima [711, 2474, 3090]. The existence of multiple global optima itself is not problematic, and the discovery of only a subset of them can still be considered as successful in many cases. The occurrence of numerous local optima, however, is more complicated. In nature, a similar phenomenon can be observed according to Koza [1602]: the niche preemption principle states that a niche in a natural environment tends to become dominated by a single species [1812].

1 According to a suitable metric like the number of modifications or mutations which need to be applied to a given solution in order to leave this subset.
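Equation 13.1 can be checked empirically from the log of a run. The following sketch (our own; the example history values are hypothetical) computes the ratios c(t) = |f(x(t+1)) − f(x⋆)| / |f(x(t)) − f(x⋆)|; if all of them stay below some c < 1, the observed behavior is (at least) linearly convergent:

def convergence_factors(history, f_star):
    """history: best objective value f(x(t)) per iteration; f_star: f(x*)."""
    errors = [abs(f_t - f_star) for f_t in history]
    return [e2 / e1 for e1, e2 in zip(errors, errors[1:]) if e1 > 0]

history = [10.0, 6.0, 3.5, 2.0, 1.1, 0.6]        # hypothetical best-so-far values
print(convergence_factors(history, f_star=0.0))  # all ratios are below 1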

13.2 The Problems

There are three basic problems which are related to the convergence of an optimization algorithm: premature, non-uniform, and domino convergence. The first one is the most important in optimization, but the latter ones may also cause large inconvenience.

13.2.1 Premature Convergence

Definition D13.3 (Premature Convergence). An optimization process has prematurely converged to a local optimum if it is no longer able to explore other parts of the search space than the area currently being examined and there exists another region that contains a superior solution [2417, 2760].

Premature convergence is basically the convergence of an optimization process to a local optimum. The goal of optimization is, of course, to find the global optima. A prematurely converged optimization process hence provides solutions which are below the possible or anticipated quality.

Example E13.1 (Premature Convergence). Figure 13.1 illustrates two examples for premature convergence. Although a higher peak exists in the maximization problem pictured in Fig. 13.1.a, the optimization algorithm solely focuses on the local optimum to the right. The same situation can be observed in the minimization task shown in Fig. 13.1.b, where the optimization process gets trapped in the local minimum on the left hand side.

[Figure 13.1 plots the objective values f(x) over x for both cases and marks the local optimum reached by the process as well as the superior global optimum. Fig. 13.1.a: Example 1: Maximization. Fig. 13.1.b: Example 2: Minimization.]
Figure 13.1: Premature convergence in the objective space.
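The effect from Example E13.1 is easy to reproduce. The following sketch (our own toy landscape and climber, not material from the book) runs a simple hill climber on a bi-modal function with a local maximum near x = −2 and the global one near x = 3:

import random

def f(x):
    # local peak of height 1 at x = -2, global peak of height 4 at x = 3
    return -0.5 * (x + 2) ** 2 + 1 if x < 0.5 else -(x - 3) ** 2 + 4

def hill_climb(x, step=0.1, iterations=2000):
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)
        if f(candidate) > f(x):        # accept only improvements
            x = candidate
    return x

random.seed(1)
starts = [-4.0, 0.0, 2.0]
print([round(hill_climb(x), 2) for x in starts])
# Runs started left of the valley typically stall near the local optimum at
# x = -2; only the right-most start typically reaches the global peak near x = 3.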

13.2.2 Non-Uniform Convergence

Definition D13.4 (Non-Uniform Convergence). In optimization problems with infinitely many possible (globally) optimal solutions, an optimization process has non-uniformly converged if the solutions that it provides only represent a subset of the optimal characteristics.

The goal of optimization in problems with infinitely many optima is often not to discover only one or two optima. Instead, we usually want to find a set X of solutions which has a good spread over all possible optimal features. If these features are encoded in separate genes (or building blocks) in the genotypes, they are likely to be treated with different priorities, at least in randomized or heuristic optimization methods. This definition is rather fuzzy, but it becomes much clearer when taking a look at Example E13.2.

Example E13.2 (Convergence in a Bi-Objective Problem). Assume that we wish to solve a bi-objective problem with f = (f1, f2). In principle, the goal is to find a set X of candidate solutions which approximates the set X⋆ of globally (Pareto) optimal solutions as well as possible. Then, two issues arise: First, the optimization process should converge to the true Pareto front and return solutions as close to it as possible. Second, these solutions should be uniformly spread along this front.

Let us examine the three fronts included in Figure 13.2 [2915]. The first picture (Fig. 13.2.a) shows a result X having a very good spread (or diversity) of solutions, but the points are far away from the true Pareto front. Such results are not attractive because they do not provide Pareto-optimal solutions, and we would consider the convergence to be premature in this case. The second example (Fig. 13.2.b) contains a set of solutions which are very close to the true Pareto front but cover it only partially, so the decision maker could lose important trade-off options. Finally, the front depicted in Fig. 13.2.c has the two desirable properties of good convergence and spread.

[Figure 13.2 shows, in the f1-f2 objective space, the true Pareto front and the front returned by the optimizer in each of the three cases. Fig. 13.2.a: Bad Convergence and Good Spread. Fig. 13.2.b: Good Convergence and Bad Spread. Fig. 13.2.c: Good Convergence and Good Spread.]
Figure 13.2: Pareto front approximation sets [2915].

13.2.3 Domino Convergence

The phenomenon of domino convergence has been brought to attention by Rudnick [2347], who studied it in the context of his BinInt problem [2347, 2698], which is discussed in Section 50.2.5. In principle, domino convergence occurs when the candidate solutions have features which contribute to significantly different degrees to the total fitness. Building blocks with a very strong positive influence on the objective values, for instance, will quickly be adopted by the optimization process (i.e., “converge”). During this time, the alleles of genes with a smaller contribution are ignored. They do not come into play until the optimal alleles of the more “important” blocks have been accumulated.

Rudnick [2347] called this sequential convergence phenomenon domino convergence due to its resemblance to a row of falling domino stones [2698]. In the worst case, the contributions of the less salient genes may almost look like noise and they are not optimized at all. Such a situation is also an instance of premature convergence, since the global optimum, which would involve optimal configurations of all building blocks, will not be discovered. In this situation, restarting the optimization process will not help because it will turn out the same way with very high probability. Mutation operators from time to time destroy building blocks with strong positive influence, which are then reconstructed by the search. If this happens with a high enough frequency, the optimization process will never get to optimize the less salient blocks because repairing and rediscovering those with higher importance takes precedence. Thus, the mutation rate of the EA limits the probability of finding the global optima in such a scenario. Example problems which are likely to exhibit domino convergence are the Royal Road and the aforementioned BinInt problem, which you can find discussed in Section 50.2.4 and Section 50.2.5, respectively. There are some relations between domino convergence and the Siren Pitfall in feature selection for classification in data mining [961]: if one feature is highly useful in a hard sub-problem of the classification task, its utility may still be overshadowed by mediocre features contributing to easy sub-tasks.

13.3 One Cause: Loss of Diversity

In biology, diversity is the variety and abundance of organisms at a given place and time [1813, 2112]. Much of the beauty and efficiency of natural ecosystems is based on a dazzling array of species interacting in manifold ways. Diversification is also a good strategy utilized by investors in the economy in order to increase their wealth. In population-based Global Optimization algorithms, maintaining a set of diverse candidate solutions is very important as well. Losing diversity means approaching a state where all the candidate solutions under investigation are similar to each other. Another term for this state is convergence. Discussions about how diversity can be measured have been provided by Cousins [648], Magurran [1813], Morrison and De Jong [1958], Morrison [1956], Paenke et al. [2112], Routledge [2344], and Burke et al. [454, 458].

Example E13.3 (Loss of Diversity → Premature Convergence). Let us consider the application of a Genetic Algorithm, an optimization method working on bit strings. Figure 13.3 illustrates two possible ways in which such a population-based optimization method may progress. Both processes start with a set of random candidate solutions, denoted as bubbles in the fitness landscape already used in Fig. 13.1.a, in their search for the global maximum. The upper algorithm preserves a reasonable diversity in the set of elements under investigation: although, initially, the best candidate solutions are located in the proximity of the local optimum, it also manages to explore the area near the global optimum and eventually discovers both peaks. The process illustrated in the lower part of the picture instead solely concentrates on the area where the best candidate solutions were sampled in the beginning and quickly climbs up the local optimum, reaching a state of premature convergence from which it cannot escape anymore.
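Whether a population is approaching such a state can be monitored with a simple diversity measure. The sketch below (our own; the distance function and example populations are assumptions) uses the average pairwise distance of the candidate solutions currently under investigation; a value close to zero signals the loss of diversity discussed above:

from itertools import combinations

def diversity(population, dist=lambda a, b: abs(a - b)):
    pairs = list(combinations(population, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

print(diversity([0.1, 4.9, 2.3, 9.7, 7.0]))    # spread-out population
print(diversity([4.9, 5.0, 5.1, 5.0, 4.95]))   # nearly converged population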

[Figure 13.3 contrasts the two runs from Example E13.3: starting from the same initial situation, preservation of diversity leads to convergence to the global optimum (upper row), while loss of diversity leads to premature convergence (lower row).]
Figure 13.3: Loss of Diversity leads to Premature Convergence.

13.3.1 Exploration vs. Exploitation

The operations which create new solutions from existing ones have a very large impact on the speed of convergence and the diversity of the populations [884, 1964, 2063, 2573]. The step size in Evolution Strategy is a good example of this issue: setting it properly is very important and leads over to the “exploration versus exploitation” problem [1250]. The dilemma of deciding which of the two principles to apply to which degree at a certain stage of optimization is sketched in Figure 13.4 and can be observed in many areas of Global Optimization.2 Preserving diversity is directly linked with maintaining a good balance between exploitation and exploration [2112] and has been studied by researchers from many domains, such as

1. Evolutionary Algorithms [365, 366, 2321, 2322, 2535],
2. Genetic Algorithms [238, 456, 458, 1692, 2503],
3. Genetic Programming [392, 709, 1156, 1157],
4. Tabu Search [1067, 1069], and
5. Particle Swarm Optimization [2943].

Definition D13.5 (Exploration). In the context of optimization, exploration means finding new points in areas of the search space which have not been investigated before.

2 More or less synonymously to exploitation and exploration, the terms intensification and diversification have been introduced by Glover [1067, 1069] in the context of Tabu Search.

[Figure 13.4 sketches the dilemma for an initial situation in which a good candidate solution has been found: exploitation searches the close proximity of this good candidate, while exploration also searches the distant areas of the search space.]
Figure 13.4: Exploration versus Exploitation.

Since computers have only limited memory, already evaluated candidate solutions usually have to be discarded in order to accommodate the new ones. Exploration is a metaphor for the procedure which allows search operations to find novel and maybe better solution structures. Such operators (like mutation in Evolutionary Algorithms) have a high chance of creating inferior solutions by destroying good building blocks, but also a small chance of finding totally new, superior traits (which, however, is not guaranteed at all).

Definition D13.6 (Exploitation). Exploitation is the process of improving and combining the traits of the currently known solutions.

Exploitation operations often incorporate small changes into already tested individuals, leading to new, very similar candidate solutions, or try to merge building blocks of different, promising individuals. The crossover operator in Evolutionary Algorithms, for instance, is considered to be an operator for exploitation. Optimization algorithms that favor exploitation over exploration have higher convergence speed but run the risk of not finding the optimal solution and may get stuck at a local optimum. Then again, algorithms which perform excessive exploration may never improve their candidate solutions well enough to find the global optimum, or it may take them very long to discover it “by accident”. A good example for this dilemma is the Simulated Annealing algorithm discussed in Chapter 27 on page 243. It is often modified to a form called simulated quenching which focuses on exploitation but loses the guaranteed convergence to the optimum.

Almost all components of optimization strategies can either be used for increasing exploitation or in favor of exploration. Unary search operations that improve an existing solution in small steps can often be built, hence being exploitation operators. They can also be implemented in a way that introduces much randomness into the individuals, effectively making them exploration operators. Selection operations3 in Evolutionary Computation choose a set of the most promising candidate solutions which will be investigated in the next iteration of the optimizers. They can either return a small group of best individuals (exploitation) or a wide range of existing candidate solutions (exploration). Generally, optimization algorithms should employ at least one search operation of explorative character and at least one which is able to exploit good solutions further. There exists a vast body of research on the trade-off between exploration and exploitation that optimization algorithms have to face [87, 741, 864, 885, 1253, 1986].

3 Selection will be discussed in Section 28.4 on page 285.

The probability that an optimization process gets caught in a local optimum depends on the characteristics of the problem to be solved and the parameter settings and features of the optimization algorithms applied [1250, 2350, 2722].

13.4 Countermeasures

There is no general approach which can prevent premature convergence.

13.4.1 Search Operator Design

The most basic measure to decrease the probability of premature convergence is to make sure that the search operations are complete (see Definition D4.8 on page 85), i.e., to make sure that they can at least theoretically reach every point in the search space from every other point. A prime example for bad operator design is Algorithm 1.1 on page 40, which cannot escape any local optima. The two different operators given in Equation 5.4 and Equation 5.5 on page 96 are another example for the influence of the search operation design on the vulnerability of the optimization process to local convergence. Increasing the proportion of exploration operations may also reduce the chance of premature convergence. This is the exact idea which distinguishes Simulated Annealing from Hill Climbing (see Chapter 27): the higher the chance that candidate solutions with bad fitness are investigated instead of being discarded in favor of the seemingly better ones, the lower the chance of getting stuck at a local optimum, and it is (at least theoretically) possible to escape arbitrary local optima. It is known that Simulated Annealing can find the global optimum, whereas simple Hill Climbers are likely to prematurely converge. Simulated Annealing, however, takes extremely long if a guaranteed convergence to the global optimum is required.

13.4.2 Restarting

A very crude and yet sometimes effective measure is restarting the optimization process at randomly chosen points in time. One example for this method is GRASP, Greedy Randomized Adaptive Search Procedures [902, 906] (see Chapter 42 on page 499), which continuously restart the process of creating an initial solution and refining it with local search. However, such approaches are likely to fail in domino convergence situations.

13.4.3 Low Selection Pressure and Population Size

Generally, using low selection pressure decreases the chance of premature convergence [735, 742, 1899, 2391]. However, such an approach also decreases the speed with which good solutions are exploited. Increasing the population size of a population-based optimizer may be useful as well, since larger populations can maintain more individuals (and hence, also be more diverse). However, the idea that larger populations will always lead to better optimization results does not always hold, as shown by van Nimwegen and Crutchfield [2778]; it thus makes sense to “gain more” from smaller populations.

13.4.4 Sharing, Niching, and Clearing

Instead of increasing the population size, the exploration capabilities of an optimizer can be improved by integrating density metrics into the fitness assignment process. The most popular of such approaches are sharing and niching (see Section 28.3.4) [1080, 1085]. In order to extend the duration of the evolution in Evolutionary Algorithms, in particular in steady-state EAs (see Paragraph 28.1.4.1), it is common to remove duplicate genotypes from the population [2324]. More generally, many methods have been devised for steering the search away from areas which have already been frequently sampled.
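The core idea of fitness sharing can be expressed in a few lines. The following is our own simplified sketch (function names, the triangular sharing kernel, and the example data are assumptions, not the book's formulation): the raw fitness of an individual is degraded by the number of nearby individuals, so crowded regions become less attractive and diversity is preserved:

def shared_fitness(population, raw_fitness, dist, sigma=1.0):
    """Degrade raw fitness by a niche count; assumes maximization."""
    shared = []
    for x in population:
        niche_count = sum(max(0.0, 1.0 - dist(x, y) / sigma) for y in population)
        shared.append(raw_fitness(x) / niche_count)   # niche_count >= 1 (self)
    return shared

pop = [0.0, 0.1, 0.2, 5.0]
fit = lambda x: 10.0 - abs(x - 5.0)            # best raw fitness at x = 5
print(shared_fitness(pop, fit, dist=lambda a, b: abs(a - b)))
# the three crowded individuals near x = 0 lose fitness, the isolated one keeps it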

The Strength Pareto Algorithms, which are widely accepted to be highly efficient, use another idea: they adapt the number of individuals that one candidate solution dominates as density measure [3097, 3100]. Pétrowski [2167, 2168] introduced the related clearing approach (discussed further in Chapter 28), where individuals are grouped according to their distance in the phenotypic or genotypic space and all but a certain number of individuals from each group just receive the worst possible fitness. A similar method aiming for convergence prevention, where individuals close to each other in objective space are deleted, is introduced there as well.

13.4.5 Self-Adaptation

Another approach against premature convergence is to introduce the capability of self-adaptation, allowing the optimization algorithm to change its strategies or to modify its parameters depending on its current state. Such behaviors, however, are often implemented not in order to prevent premature convergence but to speed up the optimization process (which may then lead to premature convergence to local optima) [2351–2353].

13.4.6 Multi-Objectivization

Recently, the idea of using helper objectives has emerged. Here, usually a single-objective problem is transformed into a multi-objective one by adding new objective functions [1429, 1440, 1442, 1553, 1757, 2025]. The new objectives are often derived from the main objective by decomposition [1178] or from certain characteristics of the problem [1429]. They are then optimized together with the original objective function with some multi-objective technique, say an MOEA (see Definition D28.2). Brockhoff et al. [425] proved that in some cases, such changes can speed up the optimization process. Problems where the main objective is some sort of sum of different components contributing to the fitness are especially suitable for that purpose, since such a component naturally lends itself as helper objective. It has been shown that, for sum-of-parts multi-objectivizations, the original objective function has at least as many local optima as the total number of all local optima present in the decomposed objectives [1178].

Example E13.4 (Sum-of-Parts Multi-Objectivization). In Figure 13.5, we provide an example of the multi-objectivization of a sum-of-parts objective function f defined on the real interval X = G = [0, 1]. The original (single) objective f (Fig. 13.5.a) can be divided into the two objective functions f1 and f2 with f(x) = f1(x) + f2(x), as illustrated in Fig. 13.5.b. If we assume minimization, f has five locally optimal sets according to Definition D4.14 on page 89: the sets X1⋆ and X5⋆ are globally optimal since they contain the elements with the lowest function value, and x⋆l,2, x⋆l,3, and x⋆l,4 are three sets each containing exactly one locally minimal point. We now split up f into f1 and f2 and optimize either f ≡ (f1, f2) or f ≡ (f, f1, f2) together. If we apply Pareto-based optimization (see Section 3.3.5 on page 65) and use the Definition D4.17 on page 89 of local optima (which is also used in [1178]), we find that there are only three locally optimal regions remaining: A1⋆, A2⋆, and A3⋆. If we go a small step to the left from the left border of A1⋆, we will find an element with the same value of f1 but with worse values in both f2 and f; at the borders of A1⋆, f1 becomes constant while f2 keeps growing. The same is true for its right border. A2⋆ is a continuous region which borders on elements it dominates: any step to the left or right within A2⋆ leads to an improvement of either f1 or f2 and a degeneration of the other function, and hence no element within A2⋆ dominates any other element in A2⋆. A3⋆ has exactly the same behavior.

Figure 13.13.5: Multi-Objectivization ` la Sum-of-Parts.8 0. In this l.4 0. The neighboring elements outside of A⋆ 2 are thus dominated by the elements on its borders. f1(x) xÎX 0.2 l. COUNTERMEASURES 1 0.2 main objective: f(x)=f1(x)+f2(x) X« 1 X« 5 159 1 0. 13.8 1 Fig. x⋆ . a i. . and 2 l.3 x⋆ and – since it is continuous – also (uncountable) infinitly many other points. a look at A⋆ shows us that it contains x⋆ .6 0.2 l.4 xÎX 0 0.4.2 f2(x) A« 1 A« 2 A« 3 f(x) x« x« x« l. However.4 0. getting worse.4 case.. Multi-objectivization may thus help here to overcome the local optima.6 0.5.4 0.2 0. e.3 l.8 0. since there are fewer locally optimal regions.a: The original objective and its local optima.8 1 0 Fig.5.6 0.2 0. 13. it would thus not be clear whether multi-objectivization would be beneficial or not a priori.b: The helper objectives and the local optima in the objective space.6 0.4 0.

Tasks T13

49. Discuss why the Hill Climbing optimization algorithm (Chapter 26) is particularly prone to premature convergence and how Simulated Annealing (Chapter 27) mitigates the cause of this vulnerability. [10 points]

50. Sketch one possible objective function for a single-objective optimization problem which has infinitely many solutions. Copy this graph three times and paint six points into each copy which represent: (a) the possible results of a possible prematurely converged optimization process, (b) the possible results of a possible correctly converged optimization process with non-uniform spread (convergence), and (c) the possible results of a possible optimization process which has converged nicely. [5 points]

51. Discuss the dilemma of exploration versus exploitation. What are the advantages and disadvantages of these two general search principles? [5 points]

Chapter 14
Ruggedness and Weak Causality

[Figure 14.1 shows four landscapes of increasing difficulty: unimodal, multimodal, somewhat rugged, and very rugged.]
Figure 14.1: The landscape difficulty increases with increasing ruggedness.

14.1 The Problem: Ruggedness

Optimization algorithms generally depend on some form of gradient in the objective or fitness space. The objective functions should be continuous and exhibit low total variation1, so the optimizer can descend the gradient easily. If the objective functions become more unsteady or fluctuating, i.e., go up and down often, it becomes more complicated for the optimization process to find the right directions to proceed to, as sketched in Figure 14.1. For short, one could say ruggedness is multi-modality plus steep ascents and descents in the fitness landscape. The more rugged a function gets, the harder it gets to optimize it. Examples of rugged landscapes are Kauffman’s NK fitness landscape (see Section 50.2.1), the p-Spin model discussed in Section 50.2.2, Bergman and Feldman’s jagged fitness landscape [276], and the sketch in Fig. 11.1.d on page 140.

14.2 One Cause: Weak Causality

During an optimization process, new points in the search space are created by the search operations. Generally we can assume that the genotypes which are the input of the search operations correspond to phenotypes which have previously been selected. Usually, the better or the more promising an individual is, the higher are its chances of being selected for further investigation. Reversing this statement suggests that individuals which are passed to the search operations are likely to have a good fitness. Since the fitness of a candidate solution depends on its properties, it can be assumed that the features of these individuals are not so bad either.

1 http://en.wikipedia.org/wiki/Total_variation [accessed 2008-04-23]

It should thus be possible for the optimizer to introduce slight changes to their properties in order to find out whether they can be improved any further2. Normally, such exploitive modifications should also lead to small changes in the objective values and hence, in the fitness of the candidate solution.

Definition D14.1 (Strong Causality). Strong causality (locality) means that small changes in the properties of an object also lead to small changes in its behavior [2278, 2327, 2329].

This principle (proposed by Rechenberg [2278, 2279]) should not only hold for the search spaces and operations designed for optimization (see Equation 24.5 on page 218 in Section 24.9), but applies to natural genomes as well. The offspring resulting from sexual reproduction of two fish, for instance, has a different genotype than its parents. Yet, it is far more probable that these variations manifest in a unique color pattern of the scales instead of leading to a totally different creature. Apart from this straightforward, informal explanation, causality has been investigated thoroughly in different fields of optimization, such as Evolution Strategy [835, 2278], structure evolution [1760, 1761], Genetic Programming [835, 1396, 2338, 2599], genotype-phenotype mappings [2451], search operators [835], and Evolutionary Algorithms in general [835, 2329].

Sendhoff et al. [2451, 2452] introduce a very nice definition for causality which combines the search space G, the (unary) search operations searchOp1 ∈ Op, the genotype-phenotype mapping "gpm", and the problem space X with the objective functions f. First, a distance measure "dist_caus" based on the stochastic nature of the unary search operation searchOp1 is defined:3

dist_caus(g1, g2) = −log ( P(g2 = searchOp1(g1)) / Pid )    (14.1)
Pid = P (g = searchOp1(g))    (14.2)

In Equation 14.1, g1 and g2 are two elements of the search space G, P(g2 = searchOp1(g1)) is the probability that g2 is the result of one application of the search operation searchOp1 to g1, and Pid is the probability that searchOp1 maps an element to itself. dist_caus(g1, g2) falls with a rising probability that g1 is mapped to g2. Sendhoff et al. [2451, 2452] then define strong causality as the state where Equation 14.3 holds:

∀ g1, g2, g3 ∈ G : ||f(gpm(g1)) − f(gpm(g2))||2 < ||f(gpm(g1)) − f(gpm(g3))||2
⇒ dist_caus(g1, g2) < dist_caus(g1, g3) ⇔ P(g2 = searchOp1(g1)) > P(g3 = searchOp1(g1))    (14.3)

In other words, strong causality means that the search operations have a higher chance to produce points g2 which (map to candidate solutions which) are closer in objective space to their inputs g1 than to create elements g3 which (map to candidate solutions which) are farther away.

In fitness landscapes with weak (low) causality, small changes in the candidate solutions often lead to large changes in the objective values, which leads to a degeneration of the performance of the optimizer [1563]. A small modification of a very bad candidate solution may then lead to a new local optimum, and the best candidate solution currently known may be surrounded by points that are inferior to all other tested individuals. It then becomes harder to decide which region of the problem space to explore, and the optimizer cannot find reliable gradient information to follow. The lower the causality of an optimization problem, the more rugged its fitness landscape is. This does not necessarily mean that it is impossible to find good solutions, but it may take very long to do so.

2 We have already mentioned this under the subject of exploitation.
3 This measure has the slight drawback that it is undefined for P(g2 = searchOp1(g1)) = 0 or Pid = 0.

14.3 Fitness Landscape Measures

As measures for the ruggedness of a fitness landscape (or its general difficulty), many different metrics have been proposed. Wedge and Kell [2874] and Altenberg [80] provide nice lists of them in their work4, which we summarize here:

1. Weinberger [2881] introduced the autocorrelation function and the correlation length of random walks.
2. The correlation of the search operators was used by Manderick et al. [1821] in conjunction with the autocorrelation [2267].
3. Jones and Forrest [1467, 1468] proposed the fitness distance correlation (FDC), the correlation of the fitness of an individual and its distance to the global optimum. This measure has been extended by researchers such as Clergue et al. [590, 2784].
4. Davidor [691] uses the epistatic variance as a measure of utility of a certain representation in Genetic Algorithms. We discuss the issue of epistasis in Chapter 17.
5. Another interesting metric is the fitness variance of formae (Radcliffe and Surry [2246]) and schemas (Reeves and Wright [2284]).
6. Rana [2267] defined the Static-φ and the Basin-δ metrics based on the notion of schemas in binary-coded search spaces.
7. The probability that search operations create offspring fitter than their parents, as defined by Rechenberg [2278] and Beyer [295] (and called evolvability by Altenberg [77]), will be discussed in Section 16.2 on page 172 in depth.
8. The negative slope coefficient (NSC) by Vanneschi et al. [2785, 2786] may be considered as an extension of Altenberg’s evolvability measure.
9. The genotype-fitness correlation (GFC) of Wedge and Kell [2874] is a new measure for ruggedness in fitness landscapes and has been shown to be a good guide for determining optimal population sizes in Genetic Programming.
10. Simulation dynamics have been researched by Altenberg [77] and Grefenstette [1130].
11. The error threshold method from theoretical biology [867, 2057] has been adopted by Ochoa et al. [2062] for Evolutionary Algorithms. It is the “critical mutation rate beyond which structures obtained by the evolutionary process are destroyed by mutation more frequently than selection can reproduce them” [2062].

14.3.1 Autocorrelation and Correlation Length

As example, let us take a look at the autocorrelation function as well as the correlation length of random walks [2881]. Here we borrow its definition from Vérel et al. [2802]:

Definition D14.2 (Autocorrelation Function). Given a random walk (xi, xi+1, . . . ), the autocorrelation function ρ of an objective function f is the autocorrelation function of the time series (f(xi), f(xi+1), . . . ):

ρ(k, f) = ( E[f(xi) f(xi+k)] − E[f(xi)] E[f(xi+k)] ) / D2[f(xi)]    (14.4)

where E[f(xi)] and D2[f(xi)] are the expected value and the variance of f(xi).

4 Especially the one of Wedge and Kell [2874] is beautiful and far more detailed than this summary here.
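The autocorrelation of Definition D14.2 is straightforward to estimate empirically. The sketch below is our own (the bit-string landscape, walk length, and estimator details are assumptions): it performs a random walk with single-bit flips, records the objective values, and estimates ρ(1, f) together with the correlation length τ = −1/log ρ(1, f) discussed next:

import math, random, statistics

def autocorrelation(series, k=1):
    mean, var = statistics.mean(series), statistics.pvariance(series)
    n = len(series) - k
    cov = sum((series[i] - mean) * (series[i + k] - mean) for i in range(n)) / n
    return cov / var

def random_walk_fitness(f, n_bits=16, steps=5000):
    x = [random.randint(0, 1) for _ in range(n_bits)]
    values = []
    for _ in range(steps):
        values.append(f(x))
        x[random.randrange(n_bits)] ^= 1     # flip one random bit
    return values

onemax = lambda bits: sum(bits)              # a smooth test landscape
series = random_walk_fitness(onemax)
rho1 = autocorrelation(series, 1)
print(rho1, -1.0 / math.log(rho1))           # high rho1, long correlation length

On a rugged landscape the same estimate yields a ρ(1, f) much closer to zero and hence a much shorter correlation length.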

The correlation length τ = −1/log ρ(1, f) measures how the autocorrelation function decreases and summarizes the ruggedness of the fitness landscape: the larger the correlation length, the lower the total variation of the landscape.

14.4 Countermeasures

To the knowledge of the author, no viable method which can directly mitigate the effects of rugged fitness landscapes exists. In population-based approaches, using large population sizes and applying methods to increase the diversity can reduce the influence of ruggedness, but only up to a certain degree. We pointed out that exploration operations are important for lowering the risk of premature convergence; exploitation operators are just as important for refining solutions. From the works of Kinnear, Jr [1540] and Lipsitch [1743] from twenty years ago, we also know that correlation measures do not always represent the hardness of a problem landscape in full.

Weak causality is often a home-made problem because it results to some extent from the choice of the solution representation and search operations. In order to apply optimization algorithms in an efficient manner, it is necessary to find representations which allow for iterative modifications with bounded influence on the objective values. In Chapter 24, we present some further rules-of-thumb for search space and operation design. Utilizing Lamarckian evolution [722, 2933] or the Baldwin effect [191, 1235, 1236, 2933], i.e., incorporating a local search into the optimization process, may further help to smoothen out the fitness landscape [1144] (see Section 36.2 and Section 36.3, respectively).

Tasks T14

52. Why is causality important in metaheuristic optimization? Why are problems with low causality normally harder than problems with strong causality? [5 points]

53. Investigate the Weierstraß Function5. Is it unimodal, smooth, multimodal, or even rugged? [5 points]

54. Define three objective functions by yourself: a uni-modal one (f1 : R → R+), a simple multi-modal one (f2 : R → R+), and a rugged one (f3 : R → R+). All three functions should have the same global minimum (optimum) 0 with objective value 0. Try to solve all three functions with a trivial implementation of Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times to each function. Note your observations regarding the problem hardness. [15 points]

5 http://en.wikipedia.org/wiki/Weierstrass_function [accessed 2010-09-14]


Chapter 15
Deceptiveness

15.1 Introduction

Especially annoying fitness landscapes show deceptiveness (or deceptivity). The gradient of deceptive objective functions leads the optimizer away from the optima, as illustrated in Figure 15.1 and Fig. 11.1.e on page 140.

[Figure 15.1 sketches landscapes of increasing difficulty: unimodal, unimodal with neutrality, multimodal (center and limits), deceptive, and very deceptive.]
Figure 15.1: Increasing landscape difficulty caused by deceptivity.

15.2 The Problem

Definition D15.1 (Deceptiveness). An objective function f : X → R is deceptive if a Hill Climbing algorithm (see Chapter 26 on page 229) would be steered in a direction leading away from all global optima in large parts of the search space.

The term deceptiveness [288] is mainly used in the Genetic Algorithm1 community in the context of the Schema Theorem [1076, 1077]. Schemas describe certain areas (hyperplanes) in the search space. If an optimization algorithm has discovered an area with a better average fitness compared to other regions, it will focus on exploring this region based on the assumption that highly fit areas are likely to contain the true optimum. Objective functions where this is not the case are called deceptive [288, 1075, 1083]. Examples for deceptiveness are the ND fitness landscapes outlined in Section 50.3, trap functions (see Section 50.2.3), and the fully deceptive problems given by Goldberg et al. [743, 1076, 1739].

1 We are going to discuss Genetic Algorithms in Chapter 29 on page 325 and the Schema Theorem in Section 29.5 on page 341.
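A classical example of such behavior is the textbook "trap" form of a deceptive bit-string function. The sketch below is our own rendering of that common form (not a function copied from the book): the more one-bits a string has, the worse it looks, except for the all-ones string, which is the isolated global optimum, so any hill-climbing gradient points toward the all-zeros string and away from the optimum:

def trap(bits):
    """Fully deceptive trap function over bit strings; to be maximized."""
    u = sum(bits)                        # number of one-bits
    n = len(bits)
    return n if u == n else n - 1 - u    # optimum only at the all-ones string

print(trap([0, 0, 0, 0]))   # 3 -> the all-zeros string looks very attractive
print(trap([1, 1, 1, 0]))   # 0 -> one step away from the optimum looks worst
print(trap([1, 1, 1, 1]))   # 4 -> the isolated global optimum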

For Definition D15.1, it is usually also assumed that (a) the search space is also the problem space (G = X), (b) the genotype-phenotype mapping is the identity mapping, and (c) the search operations do not violate the causality (see Definition D14.1 on page 162). Definition D15.1 remains a bit blurry, since the term large parts of the search space is intuitive but not precise. However, it reveals the basic problem caused by deceptiveness: if the information accumulated by an optimizer actually guides it away from the optimum, search algorithms will perform worse than a random walk or an exhaustive enumeration method. This contradicts the basic ideas of metaheuristics. This issue has been known for a long time [1914, 1915, 2696, 2868] and has been subsumed under the No Free Lunch Theorem which we will discuss in Chapter 23.

A completely deceptive fitness landscape would be even worse than the needle-in-a-haystack problems outlined in Section 16.7 [? ]. The solution of a completely deceptive problem would be the one that, when looking at all possible solutions, seems to be the most improbable one; only when actually testing it would this worst-looking point in the problem space turn out to be the best-possible solution. However, the question arises what kind of practical problem would have such a feature. Such kinds of problem are, as far as the author of this book is concerned, rather unlikely and rare.

15.3 Countermeasures

Solving deceptive optimization tasks perfectly involves sampling many individuals with very bad features and low fitness. Thus, there are no efficient countermeasures against deceptivity. Using large population sizes, maintaining a very high diversity, and utilizing linkage learning (see Section 17.3) are, maybe, the only approaches which can provide at least a small chance of finding good solutions. Exhaustive enumeration of the search space or searching by using random walks may be the only possible way to go in extremely deceptive cases.

Tasks T15

55. Why is deceptiveness fatal for metaheuristic optimization? Why are non-deceptive problems usually easier than deceptive ones? [5 points]

56. Construct two different deceptive objective functions f1, f2 : R → R+ by yourself. Like the three functions from Task 54 on page 165, they should have the global minimum (optimum) 0 with objective value 0. Try to solve these functions with a trivial implementation of Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times to both of them. Note your observations regarding the problem hardness and compare them to your results from Task 54. [7 points]


Chapter 16
Neutrality and Redundancy

16.1 Introduction

[Figure 16.1 sketches landscapes of increasing difficulty caused by neutrality: unimodal; not much gradient; with some neutrality, distant from the optimum; very neutral.]
Figure 16.1: Landscape difficulty caused by neutrality.

We consider the outcome of the application of a search operation to an element of the search space as neutral if it yields no change in the objective values [219, 2287]. It is challenging for optimization algorithms if the best candidate solution currently known is situated on a plane of the fitness landscape, i.e., all adjacent candidate solutions have the same objective values. As illustrated in Figure 16.1 and Fig. 11.1.f on page 140, an optimizer then cannot find any gradient information and thus, no direction in which to proceed in a systematic manner. From its point of view, each search operation will yield identical individuals. Furthermore, optimization algorithms usually maintain a list of the best individuals found, which will then overflow eventually or require pruning.

Definition D16.1 (Neutrality). The degree of neutrality ν is defined as the fraction of neutral results among all possible products of the search operations applied to a specific genotype [219]:

∀ g1 ∈ G ⇒ ν(g1) = |{g2 : P(g2 = Op(g1)) > 0 ∧ f(gpm(g2)) = f(gpm(g1))}| / |{g2 : P(g2 = Op(g1)) > 0}|    (16.1)

We can generalize this measure to areas G in the search space G by averaging over all their elements. Regions where ν is close to one are considered as neutral:

∀ G ⊆ G ⇒ ν(G) = (1/|G|) Σ_{g∈G} ν(g)    (16.2)
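For bit-string genotypes under a single-bit-flip search operation, Equation 16.1 can be evaluated directly. The sketch below is our own (the "needle in a haystack" objective used here is just an assumed, highly neutral test case): it returns the fraction of neighbours whose objective value is unchanged.

def degree_of_neutrality(g, objective):
    """nu(g) under single-bit-flip moves on a list of 0/1 genes."""
    neutral = 0
    for i in range(len(g)):
        neighbour = g.copy()
        neighbour[i] ^= 1                      # one possible search step
        if objective(neighbour) == objective(g):
            neutral += 1
    return neutral / len(g)

needle = lambda bits: 1 if all(bits) else 0    # flat everywhere except one point
print(degree_of_neutrality([0, 1, 0, 0, 1, 0], needle))   # -> 1.0 (fully neutral)
print(degree_of_neutrality([1, 1, 1, 1, 1, 0], needle))   # -> 5/6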

16.2 Evolvability

Another metaphor in Global Optimization which has been borrowed from biological systems [1289] is evolvability1 [699]. Wagner [2833, 2834] points out that this word has two uses in biology: According to Kirschner and Gerhart [1543], a biological system is evolvable if it is able to generate heritable, selectable phenotypic variations. Such properties can then be spread by natural selection and changed during the course of evolution. In its second sense, a system is evolvable if it can acquire new characteristics via genetic change that help the organism(s) to survive and to reproduce. Theories about how the ability of generating adaptive variants has evolved have been proposed by Riedl [2305], Wagner and Altenberg [78, 2835], Bonner [357], and Conrad [624], amongst others. The idea of evolvability can be adopted for Global Optimization as follows:

Definition D16.2 (Evolvability). The evolvability of an optimization process in its current state defines how likely the search operations will lead to candidate solutions with new (and eventually, better) objective values.

The direct probability of success [295, 2278], i.e., the chance that search operators produce offspring fitter than their parents, is also sometimes referred to as evolvability in the context of Evolutionary Algorithms [77, 80].

16.3 Neutrality: Problematic and Beneficial

The link between evolvability and neutrality has been discussed by many researchers [2834, 3039]. The evolvability of neutral parts of a fitness landscape depends on the optimization algorithm used. It is especially low for Hill Climbing and similar approaches, since the search operations cannot directly provide improvements or even changes: the optimization process then degenerates to a random walk, as illustrated in Fig. 11.1.f on page 140. The work of Beaudoin et al. [242] on the ND fitness landscapes2 shows that neutrality may “destroy” useful information such as correlation.

Researchers in molecular evolution, on the other hand, found indications that the majority of mutations in biology have no selective influence [971, 1315] and that the transformation from genotypes to phenotypes is a many-to-one mapping. Wagner [2834] states that neutrality in natural genomes is beneficial if it concerns only a subset of the properties peculiar to the offspring of a candidate solution while allowing meaningful modifications of the others. Toussaint and Igel [2719] even go as far as declaring it a necessity for self-adaptation. The theory of punctuated equilibria3, in biology introduced by Eldredge and Gould [875, 876], states that species experience long periods of evolutionary inactivity which are interrupted by sudden, localized, and rapid phenotypic evolutions [185].4 It is assumed that the populations explore neutral layers5 during the time of stasis until, suddenly, a relevant change in a genotype leads to a better adapted phenotype [2780] which then reproduces quickly. Similar phenomena can be observed and are utilized in EAs [41, 604, 1838].

“Uh?”, you may think, “How does this fit together?” The key to differentiating between “good” and “bad” neutrality is its degree ν in relation to the number of possible solutions maintained by the optimization algorithms. Smith et al. [2538] have used illustrative examples similar to Figure 16.2 showing that a certain amount of neutral reproductions can foster the progress of optimization.

1 http://en.wikipedia.org/wiki/Evolvability [accessed 2007-07-03]
2 See Section 50.3 on page 557 for a detailed elaboration on the ND fitness landscape.
3 http://en.wikipedia.org/wiki/Punctuated_equilibrium [accessed 2008-07-01]
4 A very similar idea is utilized in the Extremal Optimization method discussed later in this book.
5 Or neutral networks.

the optimization process may end up in a scenario like in Fig. the degree of neutrality (synonymous vs.5). the idea of utilizing the processes of molecular6 and evolutionary7 biology as complement to Darwinian evolution for optimization gains interest [212]. g2 ∈ G : g1 ∈ K (g2 ) ⊆ G ⇔ ∃k ∈ N0 : P g2 = Opk (g1 ) > 0 ∧ 6 http://en. 16. Here. which is not necessarily the case in reality.4.b: Small Neutral Bridge Fig. 16. as sketched in Fig. which is outlined in ?? on page ??. NEUTRAL NETWORKS 173 convergence as in Fig.4).1.wikipedia. If this bridge gets wider. i.2: Possible positive influence of neutrality. 1287.4 Neutral Networks From the idea of neutral bridges between different parts of the search space as sketched by Smith et al. [2538]. 16.2.3 (Neutral Network). global optimum local optimum Fig.f on page 140 where it cannot find any direction. Recently. the evolvability of the system has increased. 13. Another common instance of neutrality is bloat in Genetic Programming.1. 16.wikipedia.a on page 152 is depicted.2. if the bridge gets too wide.c. ∀g1 . in this scenario we expect the neutral bridge to lead to somewhere useful. Scientists like Hu and Banzhaf [1286.org/wiki/Evolutionary_biology [accessed 2008-07-20] f (gpm(g1 )) = f (gpm(g2 )) (16.2.2.c: Wide Neutral Bridge Figure 16. Furthermore. we can derive the concept of neutral networks..a: gence Premature Conver- Fig.2. The optimizer now has a chance to escape the smaller peak if it is able to find and follow that bridge. 11. Definition D16. Neutral networks are equivalence classes K of elements of the search space G which map to elements of the problem space X with the same objective values and are connected by chains of applications of the search operators Op [219]. 16. 3021] to Evolutionary Algorithms. the chance of finding the global optimum increases as well. Examples for neutrality in fitness landscapes are the ND family (see Section 50.2. 16. e.org/wiki/Molecular_biology [accessed 2008-07-20] 7 http://en. and the Royal Road (see Section 50. Of course.b shows that a little shot of neutrality could form a bridge to the global optimum.2. the NKp and NKq models (discussed in Section 50.3). non-synonymous changes) seems to play an important role.1.2.3) .16. The optimizer is drawn to a local optimum from which it cannot escape anymore. 1289] began to study the application of metrics such as the evolution rate of gene sequences [2973. Fig.
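Definition D16.3 can be turned into a small enumeration procedure: starting from one genotype, we repeatedly apply the search operation and keep every reachable genotype that has the same objective value. The sketch below assumes a bit-string search space, a single-bit-flip operator Op, and that the objective value is computed directly on the genotype; these are illustrative simplifications of the general definition.

from collections import deque

def neutral_network(g0, f):
    # Breadth-first enumeration of the neutral network containing g0: all
    # genotypes reachable from g0 via chains of single-bit flips that keep
    # the objective value f constant (cf. Definition D16.3).
    g0 = tuple(g0)
    level = f(g0)
    network = {g0}
    queue = deque([g0])
    while queue:
        g = queue.popleft()
        for i in range(len(g)):
            neighbor = list(g)
            neighbor[i] ^= 1
            neighbor = tuple(neighbor)
            if neighbor not in network and f(neighbor) == level:
                network.add(neighbor)
                queue.append(neighbor)
    return network

# toy example: f only depends on the first bit, so all genotypes sharing that
# bit belong to one neutral network
f = lambda g: g[0]
print(len(neutral_network((1, 0, 0, 0), f)))   # -> 8, i.e., 2^3 genotypes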

The experiments of Shipman et al. One example for beneficial redundancy is the extradimensional bypass idea discussed in Section 24. neutrality is explicitly introduced in order to increase the evolvability (see ?? on page ??) [2792.. g2 : g1 = g2 ∧ gpm(g1 ) = gpm(g2 ) (16. [2458. 2481] additionally indicate that neutrality in the genotype-phenotype mapping can have positive effects. . 2779]. the genotype-phenotype mapping is not injective. tried to mimic desirable evolutionary properties of RNA folding [1315]. 2480]. ∃g1 . Turing machine-like binary instructions. The convergence on neutral networks has furthermore been studied by Bornberg-Bauer and Chan [361]. and random Boolean networks [1500]. 16. In the Cartesian Genetic Programming method. the search operations can create offspring genotypes different from the parent which still translate to the same phenotype. Cellular automata. [242]. Especially the last approach provided particularly good results [2458.1.5 Redundancy: Problematic and Beneficial Definition D16. a Genetic Algorithm variant that focuses on exploring individuals which are far away from the centroid of the set of currently investigated candidate solutions (but have still good objective values). Their results show that the topology of neutral networks strongly determines the distribution of genotypes on them. if this rate is comparable with that of an unconstrained random walk. Networks with this property may prove very helpful if they connect the optima in the fitness landscape. many new candidate solutions can be reached [2480]. When utilizing a one-to-one mapping. the translation of a slightly modified genotype will always result in a different phenotype. Possibly converse effects like epistasis (see Chapter 17) arising from the new genotype-phenotype mappings have not been considered in this study. van Nimwegen et al. Stewart [2611] utilizes neutral networks and the idea of punctuated equilibria in his extrema selection. If many genotypes along this path can be modified to different offspring. and 2.4) The role of redundancy in the genome is as controversial as that of neutrality [2880]. the genotypes are “drawn” to the solutions with the highest degree of neutrality ν on the neutral network Beaudoin et al. Redundancy in the context of Global Optimization is a feature of the genotype-phenotype mapping and means that multiple genotypes map to the same phenotype. for instance. via uniform redundancy and via a non-trivial approach). Redundancy can have a strong impact on the explorability of the problem space. If there exists a many-to-one mapping between genotypes and phenotypes. Then again. Generally. the rate of discovery of innovations keeps constant for a reasonably large amount of applications of the search operations [1314].174 16 NEUTRALITY AND REDUNDANCY Barnett [219] states that a neutral network has the constant innovation property if 1. There exist many accounts of its positive influence on the optimization process. e.15. Except for the trivial voting mechanism based on uniform redundancy. [2778. 3042]. [2479. The optimizer may now walk along a path through this neutral network.4 (Redundancy). Shipman et al. i. and Wilke [2942]. They developed redundant genotype-phenotype mappings using voting (both. 2480]. the mappings induced neutral networks which proved beneficial for exploring the problem space. 
Barnett [218] showed that populations in Genetic Algorithms tend to dwell in neutral networks of high dimensions of neutrality regardless of their objective values, which (obviously) cannot be considered advantageous.

Rothlauf [2338] and Shackleton et al. i. An example for such fitness landscapes is the all-or-nothing property often inherent to Genetic Programming of algorithms [2729]. .6. increase the chance of finding good solutions.. Another example for this issue is given in Fig. Some forms of neutral networks accompanied by low (nonzero) values of ν can improve the evolvability and hence. genotypes which encode the same phenotype. as discussed in ?? on page ??.b on page 220. 16. Uniform redundancy in the genome should be avoided where possible and the amount of neutrality in the search space should generally be limited. 11. small instances of extreme ruggedness combine with a general lack of information in the fitness landscape.7 Needle-In-A-Haystack Besides fully deceptive problems. Adverse forms of neutrality are often caused by bad design of the search space or genotype-phenotype mapping.1.g on page 140. Generally we can state that degrees of neutrality ν very close to 1 degenerate optimization processes to random walks. We can thus conclude that there is no use in introducing encodings which. Ronald et al. Such problems are extremely hard to solve and the optimization processes often will converge prematurely or take very long to find the global optimum. [2458] show that simple uniform redundancy is not necessarily beneficial for the optimization process and may even slow it down. represent each phenotypic bit with two bits in the genotype where 00 and 01 map to 0 and 10 and 11 map to 1. 16. [2324] point out that the optimization may be slowed down if the population of the individuals under investigation contains many isomorphic genotypes. where the optimum occurs as isolated [1078] spike in a plane. 24. for instance.1. one of the worst cases of fitness landscapes is the needlein-a-haystack (NIAH) problem [? ] sketched in Fig. If this isomorphismcan be identified an removed. neutrality has aspects that may further as well as hinder the process of finding good solutions. a significant speedup may be gained [2324]. In other words.16. e. SUMMARY 175 Yet.6 Summary Different from ruggedness which is always bad for optimization algorithms.
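The trivial uniformly redundant encoding criticized above, in which every phenotypic bit is represented by two genotypic bits and 00 and 01 map to 0 while 10 and 11 map to 1, can be written down directly. The sketch below only makes this mapping concrete; the function and variable names are our own.

def redundant_gpm(genotype):
    # Uniformly redundant genotype-phenotype mapping: each pair of genotypic
    # bits encodes one phenotypic bit, 00/01 -> 0 and 10/11 -> 1. Only the
    # first bit of each pair matters; the second bit is completely neutral.
    assert len(genotype) % 2 == 0
    return [genotype[i] for i in range(0, len(genotype), 2)]

# two different genotypes that translate to the same phenotype
print(redundant_gpm([1, 0, 0, 1, 1, 1]))   # -> [1, 0, 1]
print(redundant_gpm([1, 1, 0, 0, 1, 0]))   # -> [1, 0, 1]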

Tasks T16

57. Describe the possible positive and negative aspects of neutrality in the fitness landscape. [10 points]

58. Construct two different objective functions f1, f2 : R → R+ which have neutral parts in the fitness landscape. Like the three functions from Task 54 on page 165 and the two from Task 56 on page 169, they should have the global minimum (optimum) 0 with objective value 0. Try to solve these functions with a trivial implementation of Hill Climbing (see Algorithm 26.1 on page 230) by applying it 30 times to both of them. Note your observations regarding the problem hardness and compare them to your results from Task 54. [7 points]

1 (Pleiotropy. it has both epistatic and pleiotropic effects. 2944].1 (Epistasis).1. In the context of statistical genetics.1 we illustrate a fictional dinosaur along with a snippet of its fictional genome 1 http://en. also implicitly refer to the former. and a Dinosaur). We will therefore consider pleiotropy and epistasis together and when discussing the effects of the latter. 2562]. two others which are responsible for distinct phenotypical traits. In Figure 17. epistasis was initially called “epistacy” by Fisher [928]. A problem is maximally epistatic when no proper subset of genes is independent of any other gene [2007. Both phenomena may easily intertwine.wikipedia. Pleiotropy. the optimization process equals finding the best value for each gene and can most efficiently be carried out by a simple greedy search (see Section 46.1 Introduction 17. If one gene epistatically influences.Chapter 17 Epistasis. Pleiotropy 2 denotes that a single gene is responsible for multiple phenotypical traits [1288.org/wiki/Pleiotropy [accessed 2008-03-02] . Like epistasis. Then.1) [692]. epistasis 1 is defined as a form of interaction between different genes [2170].1 Epistasis and Pleiotropy In biology.3. [79. and Separability 17.2 (Pleiotropy). epistasis is the dependency of the contribution of one gene to the value of the objective functions on the allelic state of other genes. Example E17. Our understanding and effects of epistasis come close to another biological phenomenon: Definition D17.wikipedia. Epistasis. According to Lush [1799].org/wiki/Epistasis [accessed 2008-05-31] 2 http://en. The term was coined by Bateson [231] and originally meant that one gene suppresses the phenotypical expression of another gene. pleiotropy can sometimes lead to unexpected improvements but often is harmful for the evolutionary system [77]. 2007] We speak of minimal epistasis when every gene is independent of every other gene. In optimization. 692. the interaction between genes is epistatic if the effect on the fitness of altering one gene depends on the allelic state of other genes. Definition D17. for instance.

A function f : X n → R (where usually X ⊆ R is separable if and only if arg min(x1 .. consisting of four genes. influences the length of the forelegs – maybe preventing them from looking exactly like the hind legs. exhibits pleiotropy since it determines the length of the hind legs and forelegs. e. . Epistasis can be considered as the higher-order or non-main effects in a model predicting fitness values based on interactions of the genes of the genotypes from the point of view of Design of Experiments (DOE) [2285].. epistasis and pleiotropy are closely related to the term separability.) .. Gene 1 influences the color of the creature and is neither pleiotropic nor has any epistatic relations. A function of n variables is separable if it can be rewritten as a sum of n functions of just one variable [1162. 1181] as follows: Definition D17.2 Separability In the area of optimizing over continuous problem spaces (such as X ⊆ Rn ). At the same time.1: Pleiotropy and epistasis in a dinosaur’s genome. (17. .178 17 EPISTASIS. 2099]...3 (Separability). Gene 2. PLEIOTROPY.. Hence. 17... i. and the problem is said to be separable according to the formal definition given by Auger et al.xn ) f(x1 . AND SEPARABILITY shape color length length gene 1 gene 2 gene 3 gene 4 Figure 17.. xn )) Otherwise. Separbility is a feature of the objective functions of an optimization problem [2668]. . it is epistatically connected with gene 3 which too.1. xn ) = (arg minx1 f(x1 . however. [147.. the decision variables involved in the problem can be optimized independently of each other. arg minxn f(. The fourth gene is again pleiotropic by determining the shape of the bone armor on the top of the dinosaur’s skull. f(x) is called a nonseparable function. are minimally epistatic.1) ..
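Definition D17.3 implies that each decision variable of a separable function can be optimized on its own while all other variables are held fixed. The following sketch illustrates this with a simple sum of one-dimensional terms; the concrete function and the crude grid search are illustrative assumptions only.

def f(x):
    # separable example: a sum of independent one-variable terms
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + (x[2] - 0.5) ** 2

def argmin_1d(f, index, x_template, candidates):
    # minimize f along one coordinate while all other coordinates stay fixed
    def value(v):
        x = list(x_template)
        x[index] = v
        return f(x)
    return min(candidates, key=value)

candidates = [i / 10.0 for i in range(-40, 41)]   # crude grid on [-4, 4]
x = [0.0, 0.0, 0.0]
for i in range(3):                                # optimize each variable independently
    x[i] = argmin_1d(f, i, x, candidates)
print(x)   # -> [1.0, -2.0, 0.5], the global minimum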

5 (Partial Separability). a modification of this gene will lead to a large change in the features of the phenotype.1. If an objective function f : X n → R+ can be expressed as M f(x) = i=1 fi xIi. 17. In other words. [2670. THE PROBLEM 179 If a function f(x) is separable. [2910] (Section 50.1.5 is. the parameters x1 .|I i| (17. Hence.. xn forming the candidate solution x ∈ X are called independent. Partially separable test problems are given in.2.2.5). as already discussed under the topic of epistasis.. Definition D17. for instance.3. . the Griewank function [1106. .n} of the n decision variables [612... subsequent changes to the “deactivated” genes may have no influence on the phenotype at all. Bowers [375] found that representations and genotypes with low pleiotropy lead to better and more robust solutions. and the genotype-phenotype mapping are defined and interact. 1137].7).3. A nonseparable function f(x) is called fully-nonseparable if any two of its parameters xi are not independent.2. 2934. Epistasis and pleiotropy are mainly aspects of the way in which objective functions f ∈ f .4. 2708].1. the p-Spin model [86] (Section 50. the genome G. A separable problem is decomposable. The higher the degree of nonseparability. 1136. Definition D17. for example. the causality will be weakened and ruggedness ensues in the fitness landscape. epistasis has a strong influence on many of the previously discussed problematic features. 3026] (see Section 50. A function f : X n → R (where usually x ⊆ R is m-nonseparable if at most m of its parameters xi are not independent. the term nonseparable is used in the sense of fully-nonseparable. the harder a function will usually become more difficult for optimization [147. hence.2) where each of the M component functions fi : X |Ii | → R depends on a subset Ii ⊆ {1. Definition D17.2. 613. 2635]..2. and the tunable model by Weise et al. They should be avoided where possible. a variety of partially separable problems exists. It also becomes harder to define search operations with exploitive character.11) and Rosenbrock’s Function [3026] (see Section 50. Nonseparable objective functions often used for benchmarking are. If one gene can “turn off” or affect the expression of other genes. .2 The Problem As sketched in Figure 17. xIi.1). 1501] (Section 50.1 . Often. 17. Moreover. an alternative for Definition D17.17. 1181. in between separable and fully nonseparable problems. which would then increase the degree of neutrality in the search space.2).3 Examples Examples of problems with a high degree of epistasis are Kauffman’s NK fitness landscape [1499.4 (Nonseparability).

3 (Epistasis in Genetic Programming). f1 is minimal epistatic.2 (Epistasis in Binary Genomes). //skeleton code while(b!=0) { // -----------------t = b. In the optimization problem defined by XB and f . A single zero in a candidate solution deactivates all other genes and the optimum – the string containing all ones – is the only point in the search space with non-zero objective values.1: A program computing the greatest common divisor. G = X.5) f3 (x) = f2 (x) = (x[0] ∗ x[1]) + (x[2] ∗ x[3]) + x[4] 4x[i] i=0 Furthermore. Example E17. All three are based on a problem space consisting of all bit strings of length five XB = B5 = {1. int b) { //skeleton code int t. // Evolved Code a = t. Thus. // b = a % b. AND SEPARABILITY Needle in a Haystack ruggedness multimodality neutrality high epistasis weak causality º causes Figure 17. Consider the following three objective functions f1 to f3 all subject to maximization. the elements of the candidate solutions are totally independent and can be optimized separately. Both groups are independent from each other and from x[4].4) (17. //skeleton code } //skeleton code Listing 17.3) (17. 1 2 3 4 5 6 7 8 9 int gcd(int a. // } // -----------------return a.2: The influence of epistasis on the fitness landscape. Hence. the search space be identical to the problem space. In f2 . . 0} f1 (x) = i=0 5 . In this area of Evolutionary Computation the problem spaces are sets of all programs in (usually) domain-specific languages and we try to find candidate solutions which solve a given problem “algorithmically”. e. f3 is a needle-in-a-haystack problem where all five genes directly influence each other.180 17 EPISTASIS. the problem has some epistasis. PLEIOTROPY. 4x[i] (17. x[1]) and second two (x[2]. Example E17. Epistasis can be observed especially often Genetic Programming.. x[3]) genes of the each genotype influence each other. only the first two (x[0]. i.
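The three objective functions of Example E17.2 can be written out in code. Because the formulas above were damaged during extraction, the implementations below are reconstructed from the verbal description (f1 treats all five bits independently, f2 couples bits 0 and 1 as well as bits 2 and 3 and leaves bit 4 alone, and f3 is the needle-in-a-haystack in which a single zero cancels everything); the exact weighting used in the original formulas may differ.

# all three functions are subject to maximization over bit strings of length five
def f1(x):
    # minimally epistatic: every bit contributes on its own
    return sum(x)

def f2(x):
    # some epistasis: bits 0-1 and 2-3 only contribute in pairs, bit 4 alone
    return x[0] * x[1] + x[2] * x[3] + x[4]

def f3(x):
    # maximally epistatic needle-in-a-haystack: one zero anywhere yields 0
    p = 1
    for bit in x:
        p *= bit
    return p

x = [1, 1, 1, 0, 1]
print(f1(x), f2(x), f3(x))   # -> 4 2 0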

2. for instance. Example E17.4 (Epistasis in Combinatorial Optimization: Bin Packing). Listing 17.16 on page 903. use all trees where the inner nodes are chosen from while.1. Khuri et al. Though this solution allows for infeasible solutions. As search space G we could. As discussed in this task and sketched in Figure 17. In Figure 17. THE PROBLEM 181 Let us assume that we wish to evolve a program computing the greatest common divisor of two natural numbers a. Such a mutation operator could.3. its behavior has now changed completely.1 with a “t = a”. a random walk. 1. An implementation of this idea is given in Section 57. replace the “t = b” in Line 4 of Listing 17. A unary search operator (mutation) could exchange a node inside a genotype with a randomly created one and a binary search operator (crossover) could swap subtrees between two genotypes. Changing the instruction has epistatic impact on the behavior of the rest of the program – the loop declared in Line 3. Suggestions for a better encoding. Simulated Annealing.1. =. a candidate solution describes the order in which the objects have to be placed into the bin and is processed from left to right. this influence is not mutual: the genes at the end of the permutation have no influence on the genes preceding them. Without stating any specific objective functions. have been made by Jones and Beltramo [1460] and Stawowy [2604]. As can be seen in that listing of results. In Example E1.2. [1534] suggest a method which works completely without permutations: Each gene of a string of length n stands for one of the n objects and its (numerical) allele denotes the bin into which the object belongs. In other words. The genes of this representation are the nodes of the trees. . in Task 67 on page 238 we propose a solution approach to a variant of this problem. a new bin is “opened”. t. object groupings way after the actually swapped objects may also be broken down. the genes at the beginning of the representation have a strong influence on the behavior at its end. !=.17. -. Interestingly (and obviously). b. +.1 on page 25 we introduced bin packing as a typical combinatorial optimization problem. and the random sampling algorithm is listen in Listing 57. The evolved trees could be pasted by the genotype-phenotype mapping into some skeleton code as indicated in Listing 17. neither Hill Climbing nor Simulated Annealing can come very close to the known optimal solutions. an extension of the permutation encoding with socalled separators marking the beginning and ending of a bin. Although only a single gene of the program has been modified. and the leaf nodes are either xa. may now become an endless loop for many inputs a and b. %. ==. we sketch how the exchange of two genes near the beginning of a candidate solution x1 can lead to a new solution x2 with a very different behavior: Due to the sequential packing of objects into bins according to the permutation described by the candidate solutions.1 and an example output of a test program applying Hill Climbing. it should be clear that the new program will have very different (functional) objective values. b ∈ N1 in a C-style representation (X = C). One reason for this performance is that the permutation representation for bin packing exhibits some epistasis. for instance. each accepting two child nodes. Whenever the current object does not fit into the current bin anymore. for instance. 
we suggest representing a solution to bin packing as a permutation of the n objects to be packed into the bin and to set G = X = Π(n). In other words.3. it does not suffer from the epistasis phenomenon discussed in this example.1 would be a program which fulfills this task. Later. or 0.
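The epistasis of this permutation encoding becomes visible directly in the decoding step, as sketched below: objects are packed in the order given by the permutation and a new bin is opened whenever the current one would overflow, so exchanging genes near the front can regroup many of the later objects as well. The decoder follows that description; the object sizes and the bin capacity are those of the example in Figure 17.3, while all names and the chosen permutations are our own illustrative assumptions.

def pack(perm, sizes, capacity):
    # In-order decoding of the permutation encoding: objects are taken in the
    # order given by perm; whenever the current object no longer fits into the
    # current bin, a new bin is opened.
    bins, current, load = [], [], 0
    for obj in perm:
        if load + sizes[obj] > capacity:
            bins.append(current)
            current, load = [], 0
        current.append(obj)
        load += sizes[obj]
    bins.append(current)
    return bins

sizes = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10, 12]   # object sizes a from Figure 17.3
capacity = 14                                    # bin size b from Figure 17.3
perm = list(range(12))
print(pack(perm, sizes, capacity))

# exchanging two genes near the front changes the bins of several later objects,
# the epistatic effect discussed in the text
perm[0], perm[4] = perm[4], perm[0]
print(pack(perm, sizes, capacity))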

3. General countermeasures against epistasis can be divided into two groups. 3. 1.1 Choice of the Representation Epistasis itself is a feature which results from the choice of the search space structure. neither epistatic nor pleiotropic effects can occur.182 17 EPISTASIS. Objective functions can conflict without the involvement of any of these phenomena. 11. 5.3 Countermeasures We have shown that epistasis is a root cause for multiple problematic features of optimization tasks. [248]. Another discussion about different shapes of fitness landscapes under the influence of epistasis is given by Beerenwinkel et al. 17. the . Nevertheless. AND SEPARABILITY 12 objects (with identifiers 0 to 11) and sizes a = (2. a slightly modified version of x1. PLEIOTROPY. Epistasis as well as pleiotropy is a property of the influence of the editable elements (the genes) of the genotypes on the phenotypes. First candidate solution x1 requires 6 bins. 4. 3) 3 5 a5=5 a3=3 g1 = x2 = (0. requires 7 bins and has a very different behavior . 9. 10. 5. 2. 12) should be packed into bins of size b=14 8 a8=8 0 a0=2 a1=3 a2=3 a3=3 1 2 3 4 a4=4 5 a5=5 a6=6 6 7 a7=7 9 a9=9 a10=10 10 a 11 11=12 swap the first and fourth element of the permutation g1 = x1 = (10. 17. define two objective functions f1 (x) = x and f2 (x) = −x which are clearly contradicting regardless of whether they both are subject to maximization or minimization. We can. 9. 8. 3. 3) bin size b = 14 4 a4=4 2 a2=3 1 a1=3 11 a11=12 9 a9=9 8 a8=8 7 a7=7 10 a10=10 11 a11=12 0 a0=2 9 a9=9 10 8 a8=8 6 a6=6 7 a7=7 a10=10 3 a3=3 6 0 a0=2 a6=6 2 a2=3 4 a4=4 5 a5=5 1 a1=3 bin 1 bin 2 bin 3 bin 4 bin 5 bin 6 bin 1 bin 2 bin 3 bin 4 bin 5 bin 6 bin 7 Candidate x2. 5. 5. 7. 8. [239] state the deceptiveness is a special case of epistatic interaction (while we would rather say that it is a phenomenon caused by epistasis). . Naudts and Verschoren [2008] have shown for the special case of length-two binary string genomes that deceptiveness does not occur in situations with low epistasis and also that objective functions with high epistasis are not necessarily deceptive. Generally. 11. 3. 1. 2. for example. if the candidate solutions x and the genotypes are simple real numbers and the genotype-phenotype mapping is an identity mapping. 7. 6. The symptoms of epistasis can be mitigated with the same methods which increase the chance of finding good solutions in the presence of ruggedness or neutrality. 9. 4. 7.3: Epistasis in the permutation-representation for bin packing. epistasis and conflicting objectives in multi-objective optimization should be distinguished from each other. Figure 17. 8. 6. 4. 11. 0. .

2 directly as search space would not only save us the work of implementing a genotype-phenotype mapping. i. The meaning of the gene at locus i in the genotype depends on the allelic states of all genes at loci smaller than i. e. a fact that we pointed out in Chapter 13. 17. There. are modified. this may change many subsequent assignments... e. This usually indicates that these genes are closely located in the same chromosome.5 for information on the Building Block Hypothesis. the other objects may be shifted backwards and many of them will land in different bins than before. The genotype-phenotype mapping starts to take numbers from the start of the genotype sequentially and places the corresponding objects into the first bin. i. e. 2905]. increasing the chance to pick the best candidate solutions for further investigation instead of the weaker ones. Representing an assignment of objects to bins like this has one drawback: if the genes at lower loci. Finding approaches for linkage learning has become an especially popular discipline 3 See Section 29. . Hence. for instance. Such knowledge can be used to protect building blocks3 from being destroyed by the search operations (such as crossover in Genetic Algorithms)..17. 17. Example E17. this representation is quite epistatic. If the object identified by the current gene does not fill into the bin anymore. We outline some general rules for representation design in Chapter 24. [2960]. close to the beginning of the genotype. Identifying these linked genes. 1981]. These two concepts are slightly contradicting.3. If a large object is swapped from the back to an early bin.. this notation is not useful since identifying spatially close elements inside the genotypes g ∈ G is trivial. [2478] observed higher selection pressure also leads to earlier convergence. [2478] showed that applying (a) higher selection pressure. can also increase the reliability of an optimizer to find good solutions. Considering these suggestions may further improve the performance of an optimizer. Choosing the problem space and the genotype-phenotype mapping efficiently can lead to a great improvement in the quality of the solutions produced by the optimization process [2888. linkage is “the tendency for alleles of different genes to be passed together from one generation to the next” in genetics. the GPM continues with the next bin. only working with the newest produced set of candidate solutions while discarding the ones from which they were derived. so careful adjustment of the algorithm settings appears to be vital in epistatic environments. Shinkai et al. the genotype-phenotype mapping. i.3. Avoiding epistatic effects should be a major concern during their design. i. we are interested in alleles of different genes which have a joint effect on the fitness [1980. Algorithm 1.3. Shinkai et al. and (b) extinctive selection. it would also exhibit significantly lower epistasis. is very helpful for the optimization process. using the representation suggested in Example E1. the assignment of n objects to k bins in order to solve the bin packing problem is represented as a permutation of the first n natural numbers.5. Introducing specialized search operations can achieve the same [2478]. Instead.5 (Bad Representation Design: Algorithm 1. COUNTERMEASURES 183 search operations. Interestingly. and the structure of the problem space. learning their epistatic interaction.1). 
e.1 on page 40 is a good example for bad search space design.3 Linkage Learning According to Winter et al.1 on page 25 and in Equation 1.2 Parameter Tweaking Using larger populations and favoring explorative search operations seems to be helpful in epistatic problems since this should increase the diversity. In the context of Evolutionary Algorithms.

3. 1981] and real [754] genomes.184 17 EPISTASIS. [1083] and the Bayesian Optimization Algorithm (BOA) [483. Module acquisition [108] may be considered as such an effort.2. . AND SEPARABILITY in the area of Evolutionary Algorithms with binary [552.6) by Goldberg et al. 2151]. PLEIOTROPY.3 on page 204.4 Cooperative Coevolution Variable Interaction Learning (VIL) techniques can be used to discover the relations between decision variables. [551] and discussed in the context of cooperative coevolution in Section 21. 17. Two important methods from this area are the messy GA (mGA. 1196. see Section 29. as introduced by Chen et al.
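A very simple way to probe whether two decision variables interact, and hence should be grouped together when decomposing a problem for cooperative coevolution, is to test whether the effect of perturbing one variable depends on the current value of the other. The sketch below only illustrates this general idea; it is not the specific VIL procedure of Chen et al. [551], and the test function, step size, and tolerance are arbitrary assumptions.

def interacts(f, x, i, j, step=1.0, tol=1e-9):
    # Heuristic separability test: variables i and j are flagged as interacting
    # if the change in f caused by perturbing x[i] depends on whether x[j] has
    # been perturbed as well.
    def perturbed(*idx):
        y = list(x)
        for k in idx:
            y[k] += step
        return f(y)
    delta_i = perturbed(i) - f(x)                     # effect of moving x[i]
    delta_i_shifted = perturbed(i, j) - perturbed(j)  # same effect after moving x[j]
    return abs(delta_i - delta_i_shifted) > tol

# f couples x[0] and x[1] but leaves x[2] independent of both
f = lambda x: x[0] * x[1] + x[2] ** 2
x0 = [1.0, 2.0, 3.0]
print(interacts(f, x0, 0, 1))   # -> True
print(interacts(f, x0, 0, 2))   # -> False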

Tasks T17

59. What is the difference between epistasis and pleiotropy? [5 points]

60. Why does epistasis cause ruggedness in the fitness landscape? [5 points]

61. Why can epistasis cause neutrality in the fitness landscape? [5 points]

62. Why does pleiotropy cause ruggedness in the fitness landscape? [5 points]

63. How are epistasis and separability related with each other? [5 points]


2. y)-data samples (with artificial neural networks or symbolic regression. the simulated scenarios correspond to training cases). Example E18. three types of noise can be distinguished. the fitting of mathematical functions to (x.2 (Creating the perfect Tire). The first form is noise in the training data used as basis for learning or the objective functions [3019] (i). they include a certain degree of noise and imprecision (i). Example E18. the usage of Global Optimization for building classifiers (for example for predicting buying behavior using data gathered in a customer survey for training). Since no measurement device is 100% accurate and there are always random errors. the usage of simulations for determining the objective values in Genetic Programming (here. This issue can be illustrated by using the process of developing the perfect tire for a car as an example. noise is present in such optimization problems. perturbations are also likely to occur during the application of its results. Some typical examples of situations where training data is the basis for the objective function evaluation are 1. for instance).1 Introduction – Noise In the context of optimization. . Besides inexactnesses and fluctuations in the input data of the optimization process. all sorts of material coefficients and geometric constants measured from all known types of wheels and rubber could be available.1 (Noise in the Training Data). This category subsumes the other two types of noise: perturbations that may arise from (ii) inaccuracies in the process of realizing the solutions and (iii) environmentally induced perturbations during the applications of the products. Since these constants have been measured or calculated from measurements. As input for the optimizer. In many applications of machine learning or optimization where a model for a given system is to be learned.Chapter 18 Noise and Robustness 18. data samples including the input of the system and its measured response are used for training. and 3.

0001% of a specific rubber component is used (ii). Definition D18. Genetic Algorithms [932. a local optimum (or even a non-optimal element) for which slight disturbances only lead to gentle performance degenerations is usually favored over a global optimum located in a highly rugged area of the fitness landscape [395]. 1544. 2937] bring up the topic of manufacturing tolerances in multilayer optical coatings. and Gurin and Rastrigin [1155] are some of them. for instance. Wiesmann et al. 2833]. Optimization problems are normally used to find good parameters or designs for components or plans to be put into action by human beings or machines. and plans have to tolerate a certain degree of imprecision. Of course. and 3. During the optimization process.1 gives some examples on problems which require robust optimization. In other words.1 illustrates the issue of local optima which are robust vs.2 The Problem: Need for Robustness The goal of Global Optimization is to find the global optima of the objective functions. Evolution Strategies [168. 294. global optima which are not. Example E18. When optimizing the control parameters of an airplane or a nuclear power plant. 2119] approaches. an extra 0. .4 (Robustness in Manufacturing). Example E18. 2. 18. it may not suffice in practice.3 (Critical Robustness). Figure 18. Many Global Optimization algorithms and theoretical results have been proposed which can deal with noise. Particle Swarm Optimization [1175. A system in engineering or biology isrobust if it is able to function properly in the face of genetic or environmental perturbations [1288.1 (Robustness). While this is fully true from a theoretical point of view. Miller and Goldberg [1897. local optima in regions of the fitness landscape with strong causality are sometimes better than global optima with weak causality. the level of this acceptability is application-dependent. specialized 1. The effects of noise in optimization have been studied by various researchers. Some of them are. [2936. 2734]. accidently. the behavior of many construction plans will be simulated in order to find out about their utility. We would hope that the tires created according to the plan will not fall apart if. It is no use to find optimal configurations if they only perform optimal when manufactured to a precision which is either impossible or too hard to achieve on a constant basis. 2390. the tires should not behave unexpectedly when used in scenarios different from those simulated (iii) and should instead be applicable in all driving situations likely to occur. 1172]. designs. the global optimum is certainly not used if a slight perturbation can have hazardous effects on the system [2734]. Lee and Wong [1703]. As we have already pointed out. There is no process in the world that is 100% accurate and the optimized parameters. there will always be noise and perturbations in practical realizations of the results of optimization. Table 18. 2389. Therefore. When actually manufactured.188 18 NOISE AND ROBUSTNESS The result of the optimization process will be the best tire construction plan discovered during its course and it will likely incorporate different materials and structures. 2733. 1898].

even a species with good genetic material will die out if its phenotypic features become too sensitive to perturbations. Area References Combinatorial Problems [644. on the other hand. perturbations like abnormal temperature. Table 18. 2898] tems .[52.1: Applications and Examples of robust methods. Handa et al. 1401. THE PROBLEM: NEED FOR ROBUSTNESS 189 Example E18.5 (Dynamic Robustness in Road-Salting).6 (Robustness in Nature). Species robust against them. injuries. 2734] found a nice analogy in nature: The phenotypic characteristics of an individual are described by its genetic code. Tsutsui et al. will survive and evolve. If the phenotypic features emerging under these influences have low fitness. The optimization of the decision process on which roads should be precautionary salted for areas with marginal winter climate is an example of the need for dynamic robustness. f(x) X robust local optimum global optimum Figure 18. Thus.1: A robust local optimum vs. During the interpretation of this code. the organism cannot survive and procreate. The global optimum of this problem is likely to depend on the daily (or even current) weather forecast and may therefore be constantly changing. illnesses and so on may occur. a “unstable” global optimum. nutritional imbalances.2. [1177] point out that it is practically infeasible to let road workers follow a constantly changing plan and circumvent this problem by incorporating multiple road temperature settings in the objective function evaluation. 1177] Distributed Algorithms and Sys.18. Example E18. [2733.

1401] [375] [1177] [2898] [489] [2925] [1401] [63. 2937] [52. 67. δn ) . function evaluation during the optimization process. By adding random noise and artificial perturbations to the training cases. this can be simplified f(x.. several approaches for dealing with the need for robustness have been developed. . δ2 . 644.3 Countermeasures For the special case where the phenome is a real vector space (X ⊆ Rn ).. δi ∈ R in the method suggested by Greiner [1134. Inspired by Taguchi methT ods1 [2647].wikipedia. This approach is also used in the work of Wiesmann et al.6. It would then make sense to sample the probability f distribution of δ a number of t times and to use the mean values of ˜ δ) for each objective f(x. x2 + δ2 . 644.1 can be minimized. [2936. T to ˜ (x1 + δ1 . 1135]. 2936. 67.257 on page 695). 1 http://en. 2937] and basically turns the optimization algorithm into something like a maximum likelihood estimator (see Section 53. 644] 18. 63. xn + δn ) . If the distributions and influences of the δi are known.1) t i=1 This method corresponds to using multiple. 1134. In cases where the optimal value y ⋆ of the objective function f is known. f′ (x) = (18. In the special case where δ is normally distributed. 1135.org/wiki/Taguchi_methods [accessed 2008-07-19] . . t 2 1 y⋆ − ˜ δi ) f(x. Equation 18.. the chance of obtaining robust solutions which are stable when applied or realized under noisy conditions can be increased. the objective function f(x) : x ∈ X can be rewritten as ˜ δ) [2937].. possible disturbances are represented by a vector δ = (δ1 .2 and Equation 53. different training scenarios during the objective function evaluation in situations where X ⊆ Rn .190 Engineering Graph Theory Image Synthesis Logistics Mathematics Motor Control Multi-Agent Systems Software Wireless Communication 18 NOISE AND ROBUSTNESS [52.
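Equation 18.1, i.e., f'(x) = (1/t) * sum over i of (y_star - f~(x, delta_i))^2, can be implemented directly by sampling the perturbation vector delta a number of t times and averaging the squared deviation from the known optimal value y_star. In the sketch below the perturbations are drawn from a normal distribution and f~ simply evaluates f at the perturbed point; the distribution, the value of t, and the example function are assumptions made only for illustration.

import random

def robust_objective(f, x, y_star, t=50, sigma=0.05):
    # Estimate f'(x) = (1/t) * sum_i (y_star - f(x + delta_i))^2 by sampling
    # the perturbation vector delta from a normal distribution N(0, sigma^2).
    total = 0.0
    for _ in range(t):
        delta = [random.gauss(0.0, sigma) for _ in x]
        perturbed = [xi + di for xi, di in zip(x, delta)]
        total += (y_star - f(perturbed)) ** 2
    return total / t

# a sharp optimum at x = 0 and a broad optimum at x = 2, both with value 0
f = lambda x: min(abs(x[0]) * 100.0, abs(x[0] - 2.0))
print(robust_objective(f, [0.0], y_star=0.0))   # large value: fragile solution
print(robust_objective(f, [2.0], y_star=0.0))   # small value: robust solution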

x1 . alternative solution x⋆ ∈ X exists which has a smaller (or equal) error for the set of all possible (maybe even infinitely many). 2397. This is especially true if the solutions represent functional dependencies inside the training data St . e. but other techniques such as Particle Swarm Optimization.1 The Problem Overfitting1 is the emergence of an overly complicated candidate solution in an optimization process resulting from the effort to provide the best results for as much of the available training data as possible [792. and symbolic regression are the most common techniques to solve such problems. two additional phenomena with negative influence can be observed: overfitting and (what we call) oversimplification. b[i] ∈ B. or (theoretically) producible data samples S ∈ S. Let us assume that St ⊆ S be the training data sample from a space S of possible data. The candidate solutions would actually be functions x : A → B and the goal is to find the optimal function x⋆ which exactly represents the (previously unknown) dependencies between A and B as they occur in St . 1 http://en. i. (normal) regression. for instance. or otherwise structured functions and only the coefficients x0 .Chapter 19 Overfitting and Oversimplification In all scenarios where optimizers evaluate some of the objective values of the candidate solutions by using training data.wikipedia.. Artificial neural networks. When trying to good candidate solutions x : A → B.1 (Function Approximation). 1040.1 Overfitting 19.n. Let us assume in the following that this space would be the cross product S = A × B of a space B containing the values of the variables whose dependencies on input variables from A should be learned. available. and Differential Evolution can be used too. possible objective functions (subject n−1 n−1 2 to minimization) fE (x) = i=0 |x(a[i]) − b[i]| or fSE (x) = i=0 (x(a[i]) − b[i]) . be polynomials x(a) = x0 + x1 a1 + x2 a2 + . b[i]) : a[i] ∈ A. s[1]. would be optimized. Example E19. .1 (Overfitting). 2531]. . . Evolution Strategies. a function with x⋆ (a[i]) ≈ b[i] ∀i ∈ 1. the candidate solutions may..org/wiki/Overfitting [accessed 2007-07-03] . 19. The structure of each single element s[i] of the training data sample St = (s[0]. x2 .. . Definition D19. . Then. s[n − 1]) is s[i] = (a[i]..1. A candidate solution x ∈ X optimized based on a set ` of training data St is considered to be overfitted if a less complicated.
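The two functional objective functions of Example E19.1 can be stated directly in code: for a candidate function x they measure the absolute and the squared error over the training sample St. The list of (a, b) pairs and the example candidates below are purely illustrative.

def f_E(x, samples):
    # sum of absolute errors of candidate function x on the training data
    return sum(abs(x(a) - b) for a, b in samples)

def f_SE(x, samples):
    # sum of squared errors of candidate function x on the training data
    return sum((x(a) - b) ** 2 for a, b in samples)

# training data produced by a quadratic law, one candidate that matches it
# exactly and one that only roughly follows it
samples = [(a, a * a) for a in range(-3, 4)]
exact = lambda a: a * a
rough = lambda a: 1.1 * a * a - 0.5
print(f_SE(exact, samples), f_SE(rough, samples))   # -> 0 and a positive value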

2396. The function x⋆ (a) = b for A = B = R shown in Fig. Surveys that represent the opinions of people on a certain topic or randomized simulations will exhibit variations from the true interdependencies of the observed entities. but will fail to deliver good results for all other input data.4 Hence.b has been sampled three times.1.1.a: Three sample points of a linear function.1: Overfitting due to complexity.). 1742. Such results would be considered as overfitted.2. b b x★ b x a Fig.c ` which also match the data. 2684] as described in Example E19. This process has been measured using some technical equipment and the n = 100 = |S| noisy samples depicted in Fig. 2 We will discuss overfitting in conjunction with Genetic Programming-based symbolic regression in Section 49. There exists exactly one polynomial3 of the degree n − 1 that fits to each such training data and goes through all its points.b have been obtained.1 would be zero. overfitting also stands for the creation of solutions which are not robust. In Figure 19.3 (Overfitting caused by Noise). Here. 19.org/wiki/Polynomial [accessed 2007-07-03] 4 http://en. both fE (`) and fSQ (`) x x from Example E19.c: The overly complex variant x. a Fig. 19. there is exactly one perfectly fitting function of minimal degree. this complicated solution does not accurately represent the physical law that produced the sample data and will not deliver precise results for new. Nevertheless.2 (Example E19. In Figure 19. Fig. Although being a perfect match to the measurements. however. Hence. 19. ` Figure 19. 19. a Fig. Fig.1.2 we have sketched how such noise may lead to overfitted results. Optimizers.2.a illustrates a simple physical process obeying some quadratic equation. Hence.b: The optimal solution x⋆ . As we have already pointed out. 19. There exists no other polynomial of a degree of two or less that fits to these samples than x⋆ .wikipedia. 19.c shows a function x resulting from optimization ` that fits the data perfectly.1. however.1. there exists no measurement device for physical processes which delivers perfect results without error.1.a. Example E19. as sketched in Fig. It could. x plays the role of the overly complicated solution which ` will perform as good as the simpler model x⋆ when tested with the training sets only.1 on page 531.2.1 Cont.wikipedia. 3 http://en. for instance. A very common cause for overfitting is noise in the sample data. too. could also find overfitted polynomials of a higher degree such as x in Fig. Notice that x⋆ may.org/wiki/Polynomial_interpolation [accessed 2008-03-01] . we have sketched this problem. The phenomenon ` of overfitting is best known and can often be encountered in the field of artificial neural networks or in curve fitting2 [1697. data samples based on measurements will always contain some noise. be a polynomial of degree 99 which goes right through all the points and thus. have a larger error in the training data St than x. has an error of zero. Example E19. 2334.1. 19. 19. there will also be an infinite number of polynomials with a higher degree than n − 1 that also match the sample data perfectly. when only polynomial regression is performed.192 19 OVERFITTING AND OVERSIMPLIFICATION Hence.1. different inputs. 19.

2.2 Countermeasures There exist multiple techniques which can be utilized in order to prevent overfitting to a certain degree. OVERFITTING 193 b x★ b b x a Fig. . 19. 19.2..2: Fitting noise. 19. a Fig.2.19.1. From the examples we can see that the major problem that results from overfitted solutions is the loss of generality. If arbitrarily many training datasets or training scenarios can be generated.1 Restricting the Problem Space A very simple approach is to restrict the problem space X in a way that only solutions up to a given maximum complexity can be found. A solution of an optimization process is general if it is not only valid for the sample inputs s[0]. may improve the generalization capabilities of the derived solutions.c: The overfitted result x. s[1]. In terms of polynomial function fitting. introducing incoherence and ruggedness into the fitness landscape.1) which solely concentrate on the error of the candidate solutions should be augmented by penalty terms and non-functional criteria which favor small and simple models [792. . this could mean limiting the maximum degree of the polynomials to be tested.2. ` Figure 19. 19.a: The original physical process. 19. The resulting objective values then may differ largely even if the same individual is evaluated twice in a row.2. a Fig.2.2 (Generality). for example. although slowing down the optimization process.b: The measured training data.2 Complement Optimization Criteria Furthermore. Definition D19. but also for different inputs s ∈ St if such inputs exist. 19..1. 1510]. The first method is to use a new set of (randomized) scenarios for each evaluation of each candidate solution. there are two approaches which work against overfitting: 1.3 Using much Training Data Large sets of sample data.1. 19. the functional objective functions (such as those defined in Example E19. It is most efficient to apply multiple such techniques together in order to achieve best results.1. s[n − 1] ∈ St which were used for training during the optimization process.1.
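Augmenting the error measure with a penalty that favors small models, as suggested above, can be sketched as follows for polynomial fitting; the penalty weight and the use of the number of coefficients as the complexity measure are arbitrary illustrative choices.

def f_SE(x, samples):
    return sum((x(a) - b) ** 2 for a, b in samples)

def make_poly(coeffs):
    # polynomial candidate solution given by its list of coefficients
    return lambda a: sum(c * a ** i for i, c in enumerate(coeffs))

def penalized_objective(coeffs, samples, weight=0.1):
    # training error plus a complexity penalty proportional to the number of
    # coefficients, biasing the search towards simple models
    return f_SE(make_poly(coeffs), samples) + weight * len(coeffs)

samples = [(a, 2.0 * a + 1.0) for a in range(5)]
print(penalized_objective([1.0, 2.0], samples))                  # -> 0.2 (simple, exact)
print(penalized_objective([1.0, 2.0, 0.0, 0.0, 0.0], samples))   # -> 0.5 (same fit, larger penalty)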

too much neutrality.2. 19. Such a decay of the learning rate makes overfitting less likely.194 19 OVERFITTING AND OVERSIMPLIFICATION 2. Whereas overfitting denotes the emergence of overly complicated candidate solutions. making it hard for the optimizers to follow a stable gradient for multiple steps. some algorithms allow to decrease the rate at which the candidate solutions are modified by time. Another possible reason for oversimplification is that ruggedness. only the training data is used.2 Oversimplification 19.1. 19. . they are rough overgeneralizations which fail to provide good results for cases not part of the training. the fluctuations of the objective values between the iterations will be very large. 19.1. 19. If their performance on the test cases begins to decrease. one should always be aware that such an incomplete coverage may fail to represent some of the dependencies and characteristics of the data. oversimplified solutions are not complicated enough.1. At the beginning of each iteration of the optimizer. which then may lead to oversimplified solutions. 1]. deceptiveness. a new set of (randomized) scenarios is generated which is used for all individual evaluations during that iteration.2. they are probably overfitted. If their behavior is significantly worse when applied to Sc than when applied to St . They can be made even more comparable if the objective functions are always normalized into some fixed interval.4 Limiting Runtime Another simple method to prevent overfitting is to limit the runtime of the optimizers [2397]. The same approach can be used to detect when the optimization process should be stopped.3: The training sets only represent a fraction of the possible inputs. A common cause for oversimplification is sketched in Figure 19. The resulting solutions x are tested with the test cases afterwards.5 Decay of Learning Rate For the same reason. there are no benefits in letting the optimization process continue any further. In both cases it is helpful to use more than one training sample or scenario per evaluation and to set the resulting objective value to the average (or better median) of the outcomes. The best known candidate solutions can be checked with the test cases in each iteration without influencing their objective values which solely depend on the training data. Otherwise. too. It is commonly assumed that learning processes normally first find relatively general solutions which subsequently begin to overfit because the noise “is learned”. it is common practice to separate it into a set of training data St and a set of test cases Sc . As this is normally the case. say [0. During the optimization process.2.6 Dividing Data into Training and Test Sets If only one finite set of data samples is available for training/optimization.1 The Problem With the term oversimplification (or overgeneralization) we refer to the opposite of overfitting.2. This method leads to objective values which can be compared without bias. Although they represent the training samples used during the optimization process seemingly well.
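The separation into training data St and test cases Sc described above can be sketched as follows: the available samples are shuffled and split once, the optimizer only ever sees the training part, and the withheld part serves to detect overfitted candidate solutions whose test error is much larger than their training error. The split ratio and the decision threshold are assumptions made for this sketch.

import random

def split_samples(samples, train_fraction=0.7, seed=0):
    # shuffle once and split the available data into a training set St
    # (used by the objective functions) and a test set Sc (held back)
    rnd = random.Random(seed)
    data = list(samples)
    rnd.shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

def f_SE(x, samples):
    return sum((x(a) - b) ** 2 for a, b in samples)

def looks_overfitted(x, train, test, factor=3.0):
    # flag a candidate whose per-sample test error is much larger than its
    # per-sample training error
    e_train = f_SE(x, train) / max(len(train), 1)
    e_test = f_SE(x, test) / max(len(test), 1)
    return e_test > factor * max(e_train, 1e-12)

samples = [(a / 10.0, (a / 10.0) ** 2) for a in range(-30, 31)]
train, test = split_samples(samples)
candidate = lambda a: a * a
print(looks_overfitted(candidate, train, test))   # -> False for this exact candidate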

a: The “real system” and the points describing it.2.c. 19. ` Figure 19.3. Having training data that correctly represents the sampled system does not mean that the optimizer is able to find a correct solution with perfect fitness – the other.3. It then cannot adapt them completely even if the training data perfectly represents the sampled process. this function does not touch any other point on the original cubic curve and behaves totally differently at the lower parameter area. By using multiple scenarios for each individual evaluation.a shows a cubic function. 19. the result would likely again be something like Fig. b Fig. regardless of how many training samples are used.3.2. 19. in order to decrease this chance even more. These scenarios can be replaced with new. we would need 232 = 4 294 967 296 samples. 19. if it was not known that the system which was to be modeled by the optimization process can best be represented by a polynomial of the third degree.2. . it is still possible that the optimization process could yield Fig. Example E19.3.2.c depicts a square function – the polynomial of the lowest degree that fits exactly to these samples.19. 19.3: Oversimplification.c as a result. Then. 19. Fig. OVERSIMPLIFICATION 195 or high epistasis in the fitness landscape may lead to premature convergence and prevent the optimizer from surpassing a certain quality of the candidate solutions. its causes have to be mitigated.b. Since it is a polynomial of degree three. some vital characteristics of the function are lost. b b a P a Fig. Fig.3. By doing so. only three samples have been provided in Fig. 19. Although it is a perfect match. However.2 Countermeasures In order to counter oversimplification. randomly created ones in each generation. Maybe not knowing this. Furthermore.4 (Oversimplified Polynomial). a Fig. four sample points are needed for its unique identification. 19.b: The sampled training data. 19.3. A third possible cause is that a problem space could have been chosen which does not include the correct solution. If a program is synthesized Genetic Programming which has an 32bit integer number as input.3.3. 19. the chance of missing important aspects is decreased. previously discussed problematic phenomena can prevent it from doing so. one could have limited the problem space X to polynomials of degree two and less.1 Using many Training Cases It is not always possible to have training scenarios which cover the complete input space A. even if we had included point P in our training data.c: The oversimplified result x.

2. the representation of the candidate solutions.196 19.. Then again. i.2 19 OVERFITTING AND OVERSIMPLIFICATION Using a larger Problem Space The problem space. e. should further be chosen in a way which allows constructing a correct solution to the problem defined. . releasing too many constraints on the solution structure increases the risk of overfitting and thus. careful proceeding is recommended.2.

n : fi (x1 ) < fi (x2 ) (in minimization problems. 2915]. X) = 0. see Definition D3. X) of a candidate solution x ∈ X is the number of elements in a reference set X ⊆ X it is dominated by (Definition D3. in the case of probabilistic metaheuristics. e..14). Many studies in the literature consider mainly bi-objective MOPs [1237.1 (Dimension of MOP). Table 20.1 gives some examples for many-objective situations. The domination rank #dom(x. A candidate solution x1 ∈ X is said to dominate another candidate solution x2 ∈ X (x1 ⊣ x2 ) if and only if its corresponding vector of objective values f (x1 ) is (partially) less than the one of x2 .16). MOPs having a higher number of objective functions are common in practice – sometimes the number of objectives reaches double figures [599] – leading to the so-called many-objective optimization [2228. 1577.5 on page 65. It is currently a hot research topic with the first conference on evolutionary multi-objective optimization held in 2001 [3099]. We refer to the number n = |f | of objective functions f1 . . and the solutions considered to be global optima are non-dominated for the whole problem space X (#dom(x. many algorithms are designed to deal with that kind of problems. Definition D3. at least wish to approximate as closely as possible..n ⇒ fi (x1 ) ≤ fi (x2 ) and ∃i ∈ 1.1 Introduction The most common way to define optima in multi-objective problems (MOPs. f2 .3.13). X) = 0. 1893. Area Combinatorial Problems Computer Graphics References [1417. As a consequence.1: Applications and Examples of Many-Objective Optimization.6 on page 55) is to use the Pareto domination relation discussed in Section 3. i ∈ 1. Definition D20. a non-dominated element (relative to X) has a domination rank of #dom(x.. fn ∈ f of an optimization problem as its dimension (or dimensionality). i.. These are the elements we would like to find with optimization or. This term has been coined by the Operations Research community to denote problems with more than two or three objective functions [894. 1437] [1002. Table 20. However.Chapter 20 Dimensionality (Objective Functions) 20. 1530].. see Definition D3. 3099]. 2060] .
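The Pareto domination relation and the domination rank #dom used above translate into a few lines of code for minimization problems; representing the objective values of each candidate solution as a plain tuple is the only assumption made here.

def dominates(f_a, f_b):
    # a dominates b (minimization): a is no worse in every objective and
    # strictly better in at least one
    return all(x <= y for x, y in zip(f_a, f_b)) and \
           any(x < y for x, y in zip(f_a, f_b))

def domination_rank(f_x, population):
    # #dom(x, X): the number of elements of the reference set X by which x is dominated
    return sum(1 for f_other in population if dominates(f_other, f_x))

population = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
for f_x in population:
    print(f_x, domination_rank(f_x, population))
# (3.0, 3.0) is dominated by (2.0, 2.0) and has rank 1; the other three are non-dominated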

it can be assumed that purely Pareto-based optimization approaches (maybe with added diversity preservation methods) will not perform good in problems with four or more objectives. As a consequence. Kang et al. [1418] further demonstrate the degeneration of the performance of the traditional multi-objective metaheuristics in many-objective problems in comparison with single-objective approaches. [1400] show that various elements. Optimizers that use Pareto ranking based methods to sort the population will be very effective for small numbers of objectives. Ikeda et al. Visualizing the solutions in a human-understandable way becomes more complex with the rising number of dimensions. Deb et al. the algorithm will fill the initial population pop with ps randomly created individuals p. [750] recognize this problem of redundant solutions and incorporate an example function provoking it into their test function suite for real-valued.1 (Non-Dominated Elements in Random Samples). . the limited capabilities of the human mind [1900]. Hughes [1303] indicates that the following two hypotheses may hold: 1.4) if the number of objective function evaluations is fixed. SPEA-2 [3100]. Example E20. Imagine that a population-based optimization approach was used to solve a manyobjective problem with n = |f | objective functions. If we assume the problem space X to be continuous and the creation process sufficiently random. no two individuals in pop will be alike. PESA [639]) to two or three objectives cannot simply be extrapolated to higher numbers of optimization criteria [1530]. The results of Jaszkiewicz [1437] and Ishibuchi et al. an operator will not be able to make efficient decisions if more than a dozen objectives are involved. the majority of candidate solutions are nondominated. which are apart from the true Pareto frontier. we can assume that there will be no two candidate solution in pop for which any of the objectives takes on the same value. At startup. Bouyssou [374] suggests that based on. but not perform as effectively for manyobjective optimization in comparison with methods based on other approaches.198 Data Mining Decision Making Software 20 DIMENSIONALITY (OBJECTIVE FUNCTIONS) [2213] [1893] [2213] 20. we want to illustrate this issue with an experiment. Additionally to this algorithm-sided limitations. may survive as hardly-dominated solutions which leads to a decrease in the probability of producing new candidate solutions dominating existing ones. Deb [740] showed that the number of non-dominated elements in random samples increases quickly with the dimension. An optimizer that produces an entire Pareto set in one run is better than generating the Pareto set through many single-objective optimizations using an aggregation approach such as weighted min-max (see Section 3. This phenomenon is called dominance resistant.2 The Problem: Many-Objective Optimization When the dimension of the MOPs increases. traditional nature-inspired algorithms have to be redesigned: According to Hughes [1303] and on basis of the results provided by Purshouse and Fleming [2231]. If the objective functions are locally invertible and sufficiently independent. [1420].3. multi-objective optimization. Results from the application of such algorithms (NSGA-II [749]. for instance. too [1420]. 2. The hyper-surface of the Pareto frontier may increase exponentially with the number of objective functions [1420]. [1491] consider Pareto optimality as unfair and imperfect in many-objective problems. Like Ishibuchi et al.
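The experiment of Example E20.1 is easy to reproduce: draw ps objective vectors uniformly at random in n dimensions and count how many of them are non-dominated. The sketch below uses far fewer repetitions than the runs behind the figures in the text, so its estimates are rougher; apart from that it follows the description given above.

import random

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fraction(ps, n, runs=200, seed=1):
    # average fraction of non-dominated vectors among ps points drawn uniformly
    # at random from the n-dimensional unit cube (minimization)
    rnd = random.Random(seed)
    total = 0
    for _ in range(runs):
        pop = [[rnd.random() for _ in range(n)] for _ in range(ps)]
        total += sum(1 for p in pop
                     if not any(dominates(q, p) for q in pop if q is not p))
    return total / (runs * ps)

for n in (2, 3, 5, 8):
    print(n, round(nondominated_fraction(ps=100, n=n), 3))
# the non-dominated fraction grows rapidly towards 1 as n increases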

Figure 20.1: Approximated distribution of the domination rank in random samples for several population sizes. (Sub-figures 20.1.a to 20.1.g show the probability P(#dom = k | ps, n), plotted over the domination rank k and the number of objectives n, for ps = 5, 10, 50, 100, 500, 1000, and 3000.)

The distribution P(#dom = k | ps, n) of the probability that the domination rank "#dom" of a randomly selected individual from the initial, randomly created population is exactly k ∈ 0..ps − 1 depends on the population size ps and the number of objective functions n. We have approximated this probability distribution with experiments where we determined the domination rank of ps n-dimensional vectors whose elements are each drawn from a uniform distribution, for several values of n and ps. Figure 20.1 and Figure 20.2 illustrate the results of these experiments, i.e., the arithmetic means of 100 000 runs for each configuration, spanning from n = 2 to n = 50 and ps = 3 to ps = 3600. (For n = 1, no experiments were performed since there, each domination rank from 0 to ps − 1 occurs exactly once in the population.)

In Figure 20.1, the approximated distribution of P(#dom = k | ps, n) is illustrated for various n. We limited the z-axis to 0.3 so the behavior of the other ranks k > 0, which quickly becomes negligible, can still be seen. For fixed n and ps, P(#dom = k | ps, n) drops rapidly with rising k, from the looks roughly similar to an exponential distribution (see Section 53.4.3). For a fixed k and ps, along the n-axis, it rises for some time and then falls again, taking on roughly the shape of a Poisson distribution (see Section 53.4.2). With rising ps, the maximum of this distribution shifts to the right.

The fraction of non-dominated elements in the random populations is illustrated in Figure 20.2. The value of P(#dom = 0 | ps, n) quickly approaches one for rising n: even for ps = 3000, around 64% of the population are already non-dominated in this experiment. The population size ps seems to have only an approximately logarithmically positive influence: If we list the population sizes required to keep the fraction of non-dominated candidate solutions at the same level of P(#dom = 0 | ps = 5, n = 2) ≈ 0.447, we find P(#dom = 0 | ps = 12, n = 3) ≈ 0.452, P(#dom = 0 | ps = 35, n = 4) ≈ 0.455, P(#dom = 0 | ps = 90, n = 5) ≈ 0.457, P(#dom = 0 | ps = 250, n = 6) ≈ 0.459, P(#dom = 0 | ps = 650, n = 7) ≈ 0.460, and P(#dom = 0 | ps = 1800, n = 8) ≈ 0.466. The required population size thus again seems to rise exponentially with n. An extremely coarse rule of thumb for this experiment would hence be that around 0.6 e^n individuals are required in the population to hold the proportion of non-dominated solutions at around 46%, meaning that an algorithm with a higher population size may be able to deal with more objective functions.

Figure 20.2: The proportion of non-dominated candidate solutions for several population sizes ps and dimensionalities n.
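The experiment behind Figure 20.2 can be approximated with a short Monte Carlo simulation. The following Java sketch (an illustration with made-up class and variable names, not the program used to produce the figures) estimates the fraction of non-dominated vectors among ps random points with n minimized objectives.

import java.util.Random;

/** Monte Carlo estimate of the share of non-dominated points in a random population. */
public class NonDominatedFraction {

  /** true if a dominates b: a is no worse in all objectives and better in at least one */
  static boolean dominates(double[] a, double[] b) {
    boolean better = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) return false; // minimization: larger value is worse
      if (a[i] < b[i]) better = true;
    }
    return better;
  }

  public static void main(String[] args) {
    int ps = 100, n = 5, runs = 1000;
    Random rnd = new Random(42);
    double sum = 0;
    for (int r = 0; r < runs; r++) {
      double[][] pop = new double[ps][n];
      for (double[] p : pop)
        for (int i = 0; i < n; i++) p[i] = rnd.nextDouble(); // uniform random objective values
      int nonDominated = 0;
      outer:
      for (int i = 0; i < ps; i++) {
        for (int j = 0; j < ps; j++)
          if (i != j && dominates(pop[j], pop[i])) continue outer;
        nonDominated++;
      }
      sum += nonDominated / (double) ps;
    }
    System.out.printf("ps=%d n=%d: ~%.3f non-dominated%n", ps, n, sum / runs);
  }
}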

To sum things up, the increasing dimensionality of the objective space leads to three main problems [1420]:

1. The number of possible Pareto-optimal solutions may increase exponentially.
2. The performance of traditional approaches based solely on Pareto comparisons deteriorates.
3. The utility of the solutions cannot be understood by the human operator anymore.

20.3 Countermeasures

Various countermeasures have been proposed against the problem of dimensionality. Nice surveys on contemporary approaches to many-objective optimization with evolutionary approaches, on the difficulties arising in many-objective optimization, and on benchmark problems have been provided by Hiroyasu et al. [1237], Ishibuchi et al. [1420], and López Jaimes and Coello Coello [1775]. In the following we list some of the approaches for many-objective optimization, mostly utilizing the information provided in [1237, 1420]. Finally, it should be noted that there exists also interesting research where objectives are added to a single-objective problem to make it easier [1553, 1576, 2025], which you can find summarized in Section 13.4.6. Brockhoff et al. [425] showed that in some cases, adding objective functions may be beneficial (while it is not in others).

20.3.1 Increasing Population Size

The most trivial measure, however, has not been introduced in these studies: increasing the population size. Indeed, from Example E20.1 we know that this will work only for a few objective functions and that we have to throw in exponentially more individuals in order to fight the influence of many objectives. If we follow the rough approximation equation from the example, for five or six objective functions it would already take a population size of around ps = 100 000 in order to keep the fraction of non-dominated candidate solutions below 0.5 (in the scenario used there, that is). Well, increasing the population size can only get us so far. Hence, we may still be able to obtain good results if we do this – for the price of many additional objective function evaluations.

20.3.2 Increasing Selection Pressure

A very simple idea for improving the way multi-objective approaches scale with increasing dimensionality is to increase the exploration pressure into the direction of the Pareto frontier. Ishibuchi et al. [1420] distinguish approaches which modify the definition of domination in order to reduce the number of non-dominated candidate solutions in the population [2403] and methods which assign different ranks to non-dominated solutions [636, 829, 830, 1635, 2629]. Farina and Amato [894] propose to rely on fuzzy Pareto methods instead of pure Pareto comparisons.

20.3.3 Indicator Function-based Approaches

Ishibuchi et al. [1420] further point out that fitness assignment methods could be used which are not based on Pareto dominance. According to them, one approach is using indicator functions such as those involving hypervolume metrics [1419].
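Returning to the rule of thumb of Section 20.3.1: the population sizes suggested by the coarse 0.6 e^n approximation from Example E20.1 can be tabulated in a few lines. The snippet below merely evaluates that formula; it is a quick illustration, not an analysis from the cited studies.

public class RuleOfThumb {
  public static void main(String[] args) {
    // ps ≈ 0.6 * e^n keeps the non-dominated fraction near 46% in the experiment of Example E20.1
    for (int n = 2; n <= 12; n++) {
      double ps = 0.6 * Math.exp(n);
      System.out.printf("n = %2d  ->  ps ≈ %,12.0f%n", n, ps);
    }
  }
}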

Wagner et al. [2837, 2838] support the results and assumptions of Hughes [1303] in terms of single-objective approaches being better in many-objective problems than traditional multi-objective Evolutionary Algorithms such as NSGA-II [749] or SPEA 2 [3100]. They also show that more recent approaches [880, 2010, 3096] which favor more than just distribution aspects can perform very well, but they also refute the general idea that all Pareto-based approaches cannot cope with many objectives.

20.3.4 Scalarizing Approaches

Another fitness assignment method are scalarizing functions [1420], which treat many-objective problems with single-objective-style methods. Advanced single-objective optimization methods like those developed by Hughes [1303] could be used instead of Pareto comparisons. Other scalarizing methods can be found in [1304, 933, 1416, 1417, 1437].

20.3.5 Limiting the Search Area in the Objective Space

Hiroyasu et al. [1237] point out that the search area in the objective space can be limited. This leads to a decrease of the number of non-dominated points [1420] and can be achieved by either incorporating preference information [746, 745, 2695] or by reducing the dimensionality [422–424, 2411].

20.3.6 Visualization Methods

Approaches for visualizing solutions of many-objective problems in order to make them more comprehensible have been provided by Furuhashi and Yoshikawa [1002], Köppen and Yoshida [1577], Obayashi and Sasaki [2060] and in [1893].
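A common textbook instance of the scalarizing functions mentioned in Section 20.3.4 is the weighted min-max (weighted Chebyshev) scalarization. The sketch below assumes minimized objectives and a reference (ideal) point z; it is a generic illustration, not one of the specific methods cited above.

/** Weighted min-max (Chebyshev) scalarization: smaller is better for minimized objectives. */
public final class WeightedMinMax {

  /** f: objective values, w: positive weights, z: reference (ideal) point, all of equal length */
  public static double scalarize(double[] f, double[] w, double[] z) {
    double worst = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < f.length; i++) {
      worst = Math.max(worst, w[i] * (f[i] - z[i])); // the largest weighted deviation dominates
    }
    return worst;
  }

  public static void main(String[] args) {
    double[] f = {0.4, 0.9, 0.2};
    double[] w = {1.0, 1.0, 1.0};
    double[] z = {0.0, 0.0, 0.0};
    System.out.println(scalarize(f, w, z)); // prints 0.9
  }
}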

Chapter 21 Scale (Decision Variables)

In Chapter 20, we discussed how an increasing number of objective functions can threaten the performance of optimization algorithms. We referred to this problem as dimensionality, i.e., the dimension of the objective space. However, there is another space in optimization whose dimension can make an optimization problem difficult: the search space G. In order to better distinguish between the dimensionality of the search and the objective space, we will refer to the former one as scale. The "curse of dimensionality"¹ of the search space stands for the exponential increase of its volume with the number of decision variables [260, 261]. In Example E21.1, we give an example for the growth of the search space with the scale of a problem, and in Table 21.1, some large-scale optimization problems are listed.

1 http://en.wikipedia.org/wiki/Curse_of_dimensionality [accessed 2009-10-20]

Table 21.1: Applications and Examples of Large-Scale Optimization.

Area                                  References
Data Mining                           [162, 1748]
Databases                             [162]
Distributed Algorithms and Systems    [1748, 2486, 3031]
Engineering                           [2860]
Function Optimization                 [551, 994, 1927, 2130, 3020]
Graph Theory                          [3031]
Software                              [3031]

21.1 The Problem

Especially in combinatorial optimization, the issue of scale has been considered for a long time. The problem of small-scale versus large-scale problems can best be illustrated using discrete or continuous vector-based search spaces as example.

Example E21.1 (Small-scale versus large-scale).
If we search, for instance, in the natural interval G = (1..10), there are ten points which could be the optimal solution. When G = (1..10)², there exist one hundred possible results and for G = (1..10)³, it is already one thousand. In other words, if the number of points in G is n, then the number of points in G^m is n^m.
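To make Example E21.1 concrete, the following short sketch prints how the number of candidate points |G^m| = n^m explodes with the scale m (here n = 10 values per decision variable, as in the example).

public class SearchSpaceGrowth {
  public static void main(String[] args) {
    int n = 10; // points per decision variable, as in Example E21.1
    for (int m = 1; m <= 18; m++) {
      // n^m candidate points for m decision variables
      System.out.printf("m = %2d  ->  |G^m| = %.3e%n", m, Math.pow(n, m));
    }
  }
}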

Here, the computational complexity (see Section 12.1) denotes how many algorithm steps are needed to solve a problem which consists of n parameters. Especially NP-hard problems should be mentioned, since they always exhibit this feature. As sketched in Figure 12.1 on page 145, the time requirements to solve problems which require step numbers growing exponentially with n quickly exceed any feasible bound. Thus, there (currently) exists no way to solve large-scale NP-hard problems exactly within feasible time. Besides combinatorial problems, numerical problems suffer the curse of dimensionality as well.

21.2 Countermeasures

21.2.1 Parallelization and Distribution

When facing problems of large scale, the problematic symptom is the high runtime requirement. Thus, any measure to use as much computational power as available can be a remedy. With more computers, cores, or hardware, and for problems residing in the gray area between feasible and infeasible, distributed computing [1747] may be the method of choice. Obviously, only a linear improvement of runtime can be achieved at most.

21.2.2 Genetic Representation and Development

The best way to solve a large-scale problem is to solve it as a small-scale problem. For some optimization tasks, it is possible to choose a search space which has a smaller size than the problem space (|G| < |X|). Indirect genotype-phenotype mappings can link the spaces together, as we will discuss later in Section 28.2.2. Here, one option are generative mappings (see Section 28.2.2.1) which step-by-step construct a complex phenotype by translating a genotype according to some rules. Grammatical Evolution, for instance, unfolds a symbol according to a BNF grammar with rules identified in a genotype. This recursive process can basically lead to arbitrarily complex phenotypes. Another approach is to apply a developmental, ontogenic mapping (see Section 28.2.2.2) that uses feedback from simulations or objective functions in this process. However, the number of possible solutions which can be discovered by this approach decreases (it is at most |G|).

Example E21.2 (Dimension Reduction via Developmental Mapping).
Instead of optimizing (maybe up to a million) points on the surface of an airplane's wing (see Example E2.5 and E4.1) in order to find an optimal shape, the weights of an artificial neural network used to move surface points to a beneficial configuration could be optimized. The second task would involve significantly fewer variables, since an artificial neural network with usually less than 1000 or even 100 weights could be applied. This way, the problem can become tangible and good results may be obtained in reasonable time.

21.2.3 Exploiting Separability

Sometimes, parts of candidate solutions are independent from each other and can be optimized more or less separately. In such a case of low epistasis (see Chapter 17), a large-scale problem can be divided into several problems of smaller scale which can be optimized separately. If solving a problem of scale n takes 2^n algorithm steps, solving two problems of scale 0.5n will clearly lead to a great runtime reduction. Such a reduction may even be worth sacrificing some solution quality.
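A minimal sketch of the idea behind Example E21.2 above: a small genotype (a handful of weights) generates a much larger phenotype (offsets for many surface points). The basis-function form, constants, and names below are illustrative assumptions, not the encoding used in the example.

/** Illustrative indirect genotype-phenotype mapping: few weights generate many surface offsets. */
public class DevelopmentalMapping {

  /** genotype: k weights of a tiny radial-basis model; phenotype: offsets for m surface points */
  static double[] gpm(double[] weights, int m) {
    double[] phenotype = new double[m];
    for (int p = 0; p < m; p++) {
      double x = p / (double) (m - 1);                  // normalized position on the surface
      double sum = 0;
      for (int k = 0; k < weights.length; k++) {
        double c = k / (double) (weights.length - 1);   // center of the k-th basis function
        sum += weights[k] * Math.exp(-50 * (x - c) * (x - c));
      }
      phenotype[p] = sum;                               // offset applied to surface point p
    }
    return phenotype;
  }

  public static void main(String[] args) {
    double[] genotype = {0.1, -0.3, 0.7, 0.0, 0.2}; // only 5 decision variables ...
    double[] surface = gpm(genotype, 100_000);      // ... control 100 000 surface points
    System.out.println("first offsets: " + surface[0] + ", " + surface[1]);
  }
}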

If the optimization problem at hand exhibits low epistasis or is separable, such a sacrifice may even be unnecessary. Coevolution has shown to be an efficient approach in combinatorial optimization [1311]. If extended with a cooperative component, i.e., to Cooperative Coevolution [2210, 2211], it can efficiently exploit separability in numerical problems and lead to better results [551, 3020].

21.2.4 Combination of Techniques

Generally, it can be a good idea to concurrently use sets of algorithms [1689] or portfolios [2161] working on the same or different populations. This way, the different strengths of the optimization methods can be combined. In the beginning, for instance, an algorithm with good convergence speed may be granted more runtime. Later, the focus can shift towards methods which can retain diversity and are not prone to premature convergence. This way, first an interesting area in the search space can be found which then can be investigated in more detail. Alternatively, a sequential approach can be performed, which starts with one algorithm and switches to another one when no further improvements are found [? ].
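The sequential variant just mentioned can be sketched in a few lines: run a fast-converging optimizer until it stalls, then spend the remaining budget on a more explorative one. Optimizer is an assumed minimal interface used only for this illustration, not an API from this book.

/** Assumed minimal optimizer interface: one improvement attempt per call, returns best-so-far quality. */
interface Optimizer {
  double step(); // performs one iteration and returns the best objective value found so far
}

public class SequentialPortfolio {

  /** Runs 'first' until it fails to improve for 'patience' steps, then runs 'second' for the rest. */
  static double run(Optimizer first, Optimizer second, int patience, int budget) {
    double best = Double.POSITIVE_INFINITY;
    int sinceImprovement = 0, used = 0;
    while (used < budget && sinceImprovement < patience) {
      double v = first.step();
      used++;
      if (v < best) { best = v; sinceImprovement = 0; } else { sinceImprovement++; }
    }
    while (used < budget) {            // switch: spend the remaining budget on the second method
      best = Math.min(best, second.step());
      used++;
    }
    return best;
  }
}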


Chapter 22 Dynamically Changing Fitness Landscape

It should also be mentioned that there exist problems with dynamically changing fitness landscapes [396, 397, 398, 1909, . . . , 2941]. Here we have the problem that an optimum in iteration t will possibly not be an optimum in iteration t + 1 anymore. The task of an optimization algorithm is then to provide candidate solutions with momentarily optimal objective values for each point in time. The moving peaks benchmarks by Branke [396, 397] and Morrison and De Jong [1957] are good examples for dynamically changing fitness landscapes. You can find them discussed in ?? on page ??. In Table 22.1, we give some examples for applications of dynamic optimization. Problems with dynamic characteristics can, for example, be tackled with special forms [3019] of

1. Evolutionary Algorithms [123, . . . ],
2. Genetic Algorithms [1070, . . . ],
3. Particle Swarm Optimization [316, . . . ],
4. Differential Evolution [1866, 2988], and
5. Ant Colony Optimization [1148, . . . ].

Table 22.1: Applications and Examples of Dynamic Optimization. (Areas covered: Chemistry, Combinatorial Problems, Cooperation and Teamwork, Data Mining, Distributed Algorithms and Systems, Engineering, Graph Theory, Logistics, Motor Control, Multi-Agent Systems, Security, Software, and Wireless Communication.)

. There are also algorithms which deliver good results for many different problem classes.org/wiki/No_free_lunch_in_search_and_optimization [accessed 200803-28] 2 Notice that we have partly utilized our own notations here in order to be consistent throughout the book.. it is very similar to our Definition D6. Ψ(dy ) = min {dy : i = 1. Wolpert and Macready [2964] only regard unique visits and thus define a : d ∈ D → g : g ∈ d. . we know the most important difficulties that can be encountered when applying an optimization algorithm to a given problem. Wolpert and Macready [2964] use the handy example of the Traveling Salesman Problem in order to illustrate this 1 http://en.1 on page 103. Furthermore. its optimization would be useless. They further call a time-ordered set dm of m distinct visited points in G × R a “sample” of size m and write dm ≡ {(dg (1).. thus. dy (m))}. . Formally. Performance measures Ψ can be defined independently from the optimization algorithms only based on the values of the objective function visited in the samples dm .1 Initial Definitions Wolpert and Macready [2964] consider single-objective optimization over a finite domain and define an optimization problem φ(g) ≡ f(gpm(g)) as a mapping of a search space G to the objective space R. easily be accepted. . 2964] in their No Free Lunch Theorems1 (NFL) for search and optimization algorithms. it makes sense to define a probability P (φ) that we are actually dealing with φ and no other problem. Often. this means a : D → G. (dg (m). Then. there exists a variety of optimization methods specialized in solving different types of problems. These facts have been formalized by Wolpert and Macready [2963.Chapter 23 The No Free Lunch Theorem By now.wikipedia. dy (2)) . Without loss of generality.m} would be the appropriate m m measure. but may be outperformed by highly specialized methods in each of them. dy (1)) . Since the behavior in wide areas of φ is not obvious. An optimization algorithm a can now be considered to be a mapping of the previously visited points in the search space (i. only parts of the optimization problem φ are known. Instead. the set Dm = (G × R) is the space of all possible samples of length m and D = ∪m≥0 Dm is the set of all samples of arbitrary size. dg (i) is the genotype and dy (i) m m m m m m m m m the corresponding objective value visited at time step i. a sample) to the next point to be visited. . (dg (2). 23. The fact that there is most likely no optimization method that can outperform all others on all problems can. only skipping the possible search operations. If the objective function is subject to minimization.2 Since this definition subsumes the problem space and the genotype-phenotype mapping. we have seen that it is arguable what actually an optimum is if multiple criteria are optimized at once. for instance. If the minima of the objective function f were already identified beforehand. e.

a ). 23. see Section 50. m. in a smaller set of optimization tasks. Yet.3 Implications From this theorem.org/wiki/Matroid [accessed 2008-03-28] .6. i. For two optimizers a1 and a2 . We act as if there was a probability distribution over all possible problems which is non-zero for the TSP-alike ones and zero for all others. we would use the same optimization algorithm a for all problems of this class without knowing the exact shape of φ. it just takes them longer to do so. for example. e. Hill Climbing approaches. the average over all φ of P (dy |φ. For every problem where a given method leads to good results.1) ∀φ ∀φ Hence. this means that P (dy |φ. This corresponds to the assumption that there is a set of very similar optimization problems which we may encounter here although their exact structure is not known. the conditional probability of finding a particular m sample dy . that is. One interpretation of the No Free Lunch Theorem is that it is impossible for any optimization algorithm to outperform random walks or exhaustive enumerations on all possible problems. e.wikipedia. It shows that general optimization approaches like Evolutionary Algorithms can solve a variety of problem classes with reasonable performance.2. will be much faster than Evolutionary Algorithms if the objective functions are steady and monotonous. m. Figure 23. m. doing so is even a common practice to find weaknesses of optimization algorithms and to compare them with each other. Each distinct TSP produces a different structure of φ. As a matter of fact. a ) is independent of a. a2 ) m (23. for instance. 3 http://en. in order to outperform a1 in one optimization problem. In this figure. i. a2 will necessarily perform worse in another. Wolpert and Macready [2964] prove that the sum of such probabilities over all possible optimization problems φ on finite domains is always identical for all optimization algorithms. Evolutionary Algorithms will most often still be able to solve these problems. m.. the higher its values. we can construct a problem where the same method has exactly the opposite effect (see Chapter 15). the faster will the problem be solved. Notice that this measure is very similar to the value of the problem landscape m Φ(x.2 on page 95 which is the cumulative probability that the optimizer has visited the element x ∈ X until (inclusively) the τ th evaluation of the objective function(s). we can immediately follow that.1 visualizes this issue.2 The Theorem The performance of an algorithm a iterated m times on an optimization problem φ can then be defined as P (dy |φ. we have chosen a performance measure Φ subject to maximization. m 23. τ ) introduced in Definition D5. The performance of Hill Climbing and greedy approaches degenerates in other classes of optimization tasks as a trade-off for their high utility in their “area of expertise”. a1 ) = m P (dy |φ.. Greedy search methods will perform fast on all problems with matroid3 structure.210 23 THE NO FREE LUNCH THEOREM issue.

for instance specialized optimization algorithm 1.2.b: Climbing. We divided both fitness landscapes into sections labeled with roman numerals.a. Figure 23. a hill climber. Let us assume that the Hill Climbing algorithm is applied to the objective functions given objective values f(x) objective values f(x) I on g II III rig ht IV righ t V wr rig ht wro ng x A landscape good for Hill x A landscape bad for Hill Fig. for instance Figure 23. a gradient descend is only a good idea in section IV (which corresponds to I in Fig. Fig.. Hence. IMPLICATIONS 211 performance very crude sketch all possible optimization problems random walk or exhaustive enumeration or . a depth-first search.2 which are both subject to minimization.3. 23.an EA. Example E23. Hence.23.2.a: Climbing.2. 23. for instance specialized optimization algorithm 2. In Fig. general optimization algorithm . in both problems it will always follow the downward gradient in all three sections. 23.2. A Hill Climbing algorithm will generate points in the proximity of the currently best solution and transcend to these points if they are better. 23.2. 23.a..2.1 (Fitness Landscapes for Minimization and the NFL).2: Fitness Landscapes and the NFL.b.1: A visualization of the No Free Lunch Theorem. our Hill Climbing would exhibit very different performance when applied to . 23. In Fig. exactly the opposite is the case: Here.a) whereas it leads away from the global optimum in section V which has the same extend as sections II and III in Fig. but wrong in the deceptive section I. this is the right decision in sections II and III because the global optimum is located at the border between them. in Figure 23.

23. this may become problematic. Hence. [837. An even simpler thought experiment than Example E23.. i.2. Since we may not know the nature of the objective functions before the optimization starts. 839. a complete enumeration or uninformed search is the optimal approach. become inferior. than we can expect other good candidate solutions in its vicinity”. these methods will waste objective function evaluations and thus.2. and o Igel and Toussaint [1397. Today. The more (correct) problem specific knowledge is integrated (correctly) into the algorithm structure. there exists a wide range of work on No Free Lunch Theorems for many different aspects of machine learning. most of its problem space. extensions.b in average.2.wikipedia. there should be at least one position to which the optimum could be moved in order to create a fitness landscape where any informed search is outperformed by exhaustive enumeration or a random walk. especially when approaching full coverage of the problem space. i. then any element in X could be the global optimum. restricted classes. 23. 838. misleading for another class. Another interpretation is that every useful optimization algorithm utilizes some form of problem-specific knowledge. The website http://www. Example E23. 840]. Depending on the search operations available. Oltean [2074]. at least without keeping all previously discovered elements in memory which is infeasible in most cases.org/4 gives a good overview about them. In practice. it is impossible to be sure that the best candidate solution found so far is the global optimum until all candidate solutions have been evaluated. In terms of these classes. it may be even outpaced by a random walk on Fig. The rough meaning of the NLF is that all black-box optimization methods perform equally well over the complete set of all optimization problems [2074]. which is not the case to such an extent in many practical problems. In a arbitrary (or random) fitness landscape where the objective value of the global optimum is unknown.212 23 THE NO FREE LUNCH THEOREM Fig. e. quite possibly. However. e. search algorithms cannot exceed the performance of simple enumerations. 23. In reality. we cannot fully prevent that a certain candidate solution is discovered and evaluated more than once. The number of arbitrary fitness landscapes should be at least as high as the number of “optimization-favorable” ones: Assuming a favorable landscape with a single global optimum. we do not want to apply an optimizer to all possible problems but to only some. Further summaries. knowledge correct for one class of problems is. The No Free Lunch Theorem is furthermore closely related to the Ugly Duckling Theorem5 proposed by Watanabe [2868] for classification and pattern recognition.no-free-lunch.. Radcliffe [2244] basically states that without such knowledge. 4 accessed: 2008-03-28 5 http://en. On the other hand. Incorporating knowledge starts with relying on simple assumptions like “if x is a good candidate solution.2 (Arbitrary Landscapes and Exhaustive Enumeration). 1398].org/wiki/Ugly_duckling_theorem [accessed 2008-08-22] . Fig.a and Fig. strong causality.b. [1578]. 23. we can make statements about which optimizer performs better. the better will the algorithm perform.b is deceptive in the whole section V. Droste et al. In most randomized search procedures.2. we use optimizers to solve a given set of problems and are not interested in their performance when (wrongly) applied to other classes. 
Radcliffe and Surry [2247] discuss the NFL in the context of Evolutionary Algorithms and the representations used as search spaces.1 on optimization thus goes as follows: If we consider an optimization problem of truly arbitrary structure. and criticisms have been provided by K¨ppen et al. In this case.

23.4 Infinite and Continuous Domains

Recently, Auger and Teytaud [145, 146] showed that the No Free Lunch Theorem holds only in a weaker form for countable infinite domains. For continuous domains (i.e., uncountable infinite ones), it does not hold. This means that, for a given performance measure and a limit of the maximally allowed objective function evaluations, it can be possible to construct an optimal algorithm for a certain family of objective functions [145, 146]. A factor that should be considered when thinking about these issues is that real existing computers can only work with finite search spaces, since they have finite memory; even floating point numbers are finite in value and precision. In the end, there is always a difference between the behavior of the algorithmic, mathematical definition of an optimization method and its practical implementation for a given computing environment. Even though this proof does not directly hold for purely continuous or infinite search spaces, the author of this book finds himself unable to state with certainty whether this means that the NFL does hold for realistic continuous optimization or whether the proofs by Auger and Teytaud [145, 146] carry over to actual programs.

23.5 Conclusions

The No Free Lunch Theorem argues that it is not possible to develop the one optimization algorithm, the problem-solving machine which can provide us with near-optimal solutions in short time for every possible optimization task (at least in finite domains). Unfortunately, this must sound very depressing for everybody new to this subject. Actually, at least from the point of view of a researcher, quite the opposite is the case: instead of being doomed to obsolescence, it is far more likely that most of the currently known optimization methods have at least one niche, one area where they are excellent. The No Free Lunch Theorem means that there will always be new ideas, new approaches which will lead to better optimization algorithms to solve a given problem. There will always be a chance that an inspiring moment, an observation in nature, for instance, may lead to the invention of a new optimization algorithm which performs better in some problem areas than all currently known ones. It also means that it is very likely that the "puzzle of optimization algorithms" will never be completed.

So far, the subject of this chapter was the question about what makes optimization problems hard, especially for metaheuristic approaches. We have discussed numerous different phenomena which can affect the optimization process and lead to disappointing results. If an optimization process has converged prematurely, it has been trapped in a non-optimal region of the search space from which it cannot "escape" anymore (Chapter 13). Ruggedness (Chapter 14) and deceptiveness (Chapter 15) in the fitness landscape, often caused by epistatic effects (Chapter 17), can misguide the search into such a region. Neutrality and redundancy (Chapter 16) can either slow down optimization because the application of the search operations does not lead to a gain in information, or may also contribute positively by creating neutral networks from which the search space can be explored and local optima can be escaped from. Noise is present in virtually all practical optimization problems. The solutions that are derived for them should be robust (Chapter 18). Also, they should neither be too general (oversimplification, Section 19.3) nor too specifically aligned only to the training data (overfitting, Section 19.2). Furthermore, many practical problems are multi-objective, i.e., involve the optimization of more than one criterion at once (partially discussed in Section 3.3), or concern objectives which may change over time (Chapter 22).

Figure 23.3: The puzzle of optimization algorithms (a collage of method names such as Evolutionary Algorithms, GA, ES, DE, GP, EP, EDA, LCS, Memetic Algorithms, Extremal Optimization, Simulated Annealing, Tabu Search, Random Optimization, Hill Climbing, Downhill Simplex, Branch & Bound, A* Search, Dynamic Programming, IDDFS, ACO, PSO, and RFD).

Chapter 24 Lessons Learned: Designing Good Encodings

Most Global Optimization algorithms share the premise that solutions to problems are either (a) elements of a somewhat continuous space that can be approximated stepwise or (b) that they can be composed of smaller modules which already have good attributes even when occurring separately. Especially in the second case, the design of the search space (or genome) G and the genotype-phenotype mapping "gpm" is vital for the success of the optimization process. The individual representation along with the genotype-phenotype mapping is a vital part of Genetic Algorithms and has major impact on the chance of finding good solutions. It determines to what degree the expected features can be exploited by defining how the properties and the behavior of the candidate solutions are encoded and how the search operations influence them. Whenever we want to solve a problem with Global Optimization algorithms, we need to define the structure of a genome.

In software engineering, there exists a variety of design patterns¹ that describe good practice and experience values. Utilizing these patterns will help the software engineer to create well-organized, extensible, and maintainable applications. Like in software engineering, there exist basic patterns and rules of thumb on how to design genomes properly. In this chapter, we will first discuss a general theory on how properties of individuals can be defined, classified, and how they are related. We will then outline some general rules for the design of the search space G, the search operations searchOp ∈ Op, and the genotype-phenotype mapping "gpm". These rules represent the lessons learned from our discussion of the possible problematic aspects of optimization tasks. We will outline suggestions given by different researchers transposed into the more general vocabulary of formae (introduced in Chapter 9 on page 119). Based on our previous discussions, we can now provide some general best practices for the design of encodings from different perspectives. These principles can lead to finding better solutions or higher optimization speed if considered in the design phase [1075, 2032, 2338].

24.1 Compact Representation

[1075] and Ronald [2323] state that the representations of the formae in the search space should be as short as possible. The number of different alleles and the lengths of the different genes should be as small as possible. In other words, the redundancy in the genomes should be minimal. In Section 16.5 on page 174, we discuss that uniform redundancy slows down the optimization process.

24.2 Unbiased Representation

The search space G should be unbiased in the sense that all phenotypes are represented by the same number of genotypes.

1 http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29 [accessed 2007-08-12]

∀x1, x2 ∈ X ⇒ |{g ∈ G : x1 = gpm(g)}| ≈ |{g ∈ G : x2 = gpm(g)}|        (24.1)

This property allows to efficiently select an unbiased starting population which gives the optimizer the chance of reaching all parts of the problem space. Beyer and Schwefel [300, 302] furthermore also request for unbiased variation operators. The unary mutation operations in Evolution Strategy, for example, should not give preference to any specific change of the genotypes. On the one hand, the method which chooses the candidate solutions already exploits fitness information and guides the search into one direction. The variation operators thus should not do this job as well, but should instead try to increase the diversity: They should perform exploration as opposed to the exploitation performed by selection. Beyer and Schwefel [302] therefore again suggest to use normal distributions to mutate real-valued vectors if G ⊆ R^n, and Rudolph [2348] proposes to use geometrically distributed random numbers to offset genotype vectors in G ⊆ Z^n.

24.3 Surjective GPM

Palmer [2113], Palmer and Kershenbaum [2115], and Della Croce et al. [765] state that a good search space G and genotype-phenotype mapping "gpm" should be able to represent all possible phenotypes and solution structures, i.e., be surjective (see Section 51.7 on page 644) [2032]:

∀x ∈ X ⇒ ∃g ∈ G : x = gpm(g)        (24.2)

24.4 Injective GPM

In the spirit of Section 16.5 and with the goal to lower redundancy, the number of genotypes which map to the same phenotype should be low [2322–2324]. In Section 16.5 we provided a thorough discussion of redundancy in the genotypes and its demerits (but also its possible advantages). Combining the idea of achieving low uniform redundancy with the principle stated in Section 24.3, Palmer and Kershenbaum [2113, 2115] conclude that the GPM should ideally be both, simple and bijective.

24.5 Consistent GPM

The genotype-phenotype mapping should always yield valid phenotypes [2032, 2113, 2115]. The meaning of valid in this context is that if the problem space X is the set of all possible trees, only trees should be encoded in the genome. If we use the R³ as problem space, no vectors with fewer or more elements than three should be produced by the genotype-phenotype mapping. Anything else would not make sense anyway, so this is a pretty basic rule. This form of validity does not imply that the individuals are also correct solutions in terms of the objective functions. Constraints on what a good solution is can, on one hand, be realized by defining them explicitly and making them available to the optimization process (see Section 3.4 on page 70). On the other hand, they can also be implemented by defining the encoding, search operations, and genotype-phenotype mapping in a way which ensures that all phenotypes created are feasible by default [1888, 2912, 2914].
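Whether a given encoding is (approximately) unbiased in the sense of Equation 24.1 can be probed empirically by sampling random genotypes and counting how often each phenotype occurs. The sketch below does this for a deliberately simple toy mapping (an 8-bit genotype mapped to an integer via modulo, a made-up example); a heavily skewed histogram would signal bias or non-uniform redundancy.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Samples a genotype-phenotype mapping and reports how evenly the phenotypes are hit. */
public class GpmBiasCheck {

  /** toy mapping: 8-bit genotype -> phenotype in 0..9 via modulo (introduces non-uniform redundancy) */
  static int gpm(int genotype) {
    return genotype % 10;
  }

  public static void main(String[] args) {
    Random rnd = new Random(1);
    Map<Integer, Integer> histogram = new HashMap<>();
    int samples = 1_000_000;
    for (int i = 0; i < samples; i++) {
      int genotype = rnd.nextInt(256);          // uniform over the search space G = {0, ..., 255}
      histogram.merge(gpm(genotype), 1, Integer::sum);
    }
    // Phenotypes 0..5 are represented by 26 genotypes each, 6..9 by only 25: a (mild) bias.
    histogram.forEach((x, count) ->
        System.out.printf("phenotype %d: %.4f%n", x, count / (double) samples));
  }
}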

24.6 Formae Inheritance and Preservation

The search operations should be able to preserve the (good) formae and (good) configurations of multiple formae [978, 2242, 2879]. Good formae should not easily get lost during the exploration of the search space. From a binary search operation like recombination (e.g., crossover in GAs), we would expect that all properties of a genotype g3 which is a combination of two others (g1, g2) can be traced back to at least one of its parents. Otherwise, "searchOp" also performs an implicit unary search step, like a mutation in Genetic Algorithms. According to Radcliffe [2242] and Weicker [2879], the binary reproduction operation is considered as pure if, whenever its two parameters g1 and g2 are members of a forma A, the resulting element will also be an instance of A:

∀g1, g2 ∈ A ⊆ G ⇒ searchOp(g1, g2) ∈ A        (24.3)

If we furthermore can assume that all instances of all formae A with minimal precision (A ∈ mini) of an individual are inherited by at least one parent, i.e.,

∀g3 = searchOp(g1, g2), ∀A ∈ mini : g3 ∈ A ⇒ g1 ∈ A ∨ g2 ∈ A        (24.4)

then good formae are preserved. Such properties, although discussed here for binary search operations only, can be extended to arbitrary n-ary operators.

24.7 Formae in Genotypic Space Aligned with Phenotypic Formae

The optimization algorithms find new elements in the search space G by applying the search operations searchOp ∈ Op. These operations can only create, modify, or combine genotypical formae since they usually have no information about the problem space. Our goal, however, is to find highly fit formae in the problem space X. Such properties can only be created, modified, and combined by the search operations if they correspond to genotypical formae, and a good genotype-phenotype mapping should provide this feature. It furthermore becomes clear that useful separate properties in phenotypic space can only be combined by the search operations properly if they are represented by separate formae in genotypic space too. This will lead to a representation with minimal epistasis. Most mathematical models dealing with the propagation of formae, like the Building Block Hypothesis and the Schema Theorem², thus focus on the search space and show that highly fit genotypical formae will more probably be investigated further than those of low utility.

24.8 Compatibility of Formae

Formae of different properties should be compatible [2879]. Compatible formae in phenotypic space should also be compatible in genotypic space, i.e., useful separate properties in phenotypic space should not influence each other [1075]. This leads to a low level of epistasis [240, 2242, 2323] (see Chapter 17 on page 177) and hence will increase the chance of success of the reproduction operations.

2 See Section 29.5 for more information on the Schema Theorem.

24.9 Representation with Causality

Besides low epistasis, the search space, the GPM, and the search operations together should possess strong causality (see Section 14.2): the search operations should produce small changes in the objective values. This would, simplified, mean that:

∀x1, x2 ∈ X, g ∈ G : x1 = gpm(g) ∧ x2 = gpm(searchOp(g)) ⇒ f(x2) ≈ f(x1)        (24.5)

24.10 Combinations of Formae

If genotypes g1, g2, . . . , which are instances of different but compatible formae A1 ⊲⊳ A2 ⊲⊳ . . . , are combined by a binary (or n-ary) search operation, the resulting genotype g should be an instance of both properties, i.e., the combination of compatible formae should be a forma itself [2879]:

∀g1 ∈ A1, g2 ∈ A2, · · · ⇒ searchOp(g1, g2, . . .) ∈ A1 ∩ A2 ∩ . . . (≠ ∅)        (24.6)

If this principle holds for many individuals and formae, useful properties can be combined by the optimization step by step, narrowing down the precision of the arising, most interesting formae more and more. This should lead the search to the most promising regions of the search space.

24.11 Reachability of Formae

The set of search operations should be complete (see Definition D4.8 on page 85), i.e., it should be able to find any given forma [2879]. Beyer and Schwefel [300, 302] suggest that a good representation has the feature of reachability: any other (finite) individual state can be reached from any parent state within a finite number of (mutation) steps. Beyer and Schwefel state that this property is necessary for proving global convergence. The set of available search operations Op should thus include at least one (unary) search operation which is theoretically able to reach all possible formae within finite steps. Generally, we just need to be able to generate all formae, not necessarily all genotypes, with such an operator. If the binary search operations in Op all are pure, this unary operator is the only one (apart from creation operations) able to introduce new formae which are not yet present in the population.

24.12 Influence of Formae

One rule which, in my opinion, was missing in the lists given by Radcliffe [2242] and Weicker [2879] is that the absolute contributions of the single formae to the overall objective values of a candidate solution should not be too different. Let us divide the phenotypic formae into those with positive and those with negative or neutral contribution and let us, for simplification purposes, assume that those with positive contribution can be arbitrarily combined. If one of the positive formae has a contribution with an absolute value much lower than those of the other positive formae, we will trip into the problem of domino convergence discussed in Section 13.2.3 on page 153: the search will first discover the building blocks of higher value. This, as we have already pointed out in Section 13.2.3, is not a problem if formae can arbitrarily be combined by the available n-ary search or recombination operations. However, if the search is

stochastic and performs exploration steps, chances are that alleles of higher importance get destroyed during this process and have to be rediscovered. The values of the less salient formae would then play no role, and the chance of finding them strongly depends on how frequently the destruction of important formae takes place. Ideally, we would therefore design the genome and phenome in a way that the different characteristics of the candidate solution all influence the objective values to a similar degree. Then, the chance of finding good formae increases.

24.13 Scalability of Search Operations

The scalability requirement outlined by Beyer and Schwefel [300, 302] states that the strength of the variation operator (usually a unary search operation such as mutation) should be tunable in order to adapt to the optimization process. In the initial phase of the search, usually larger search steps are performed in order to obtain a coarse idea in which general direction the optimization process should proceed. The closer the focus gets to a certain local or global optimum, the smaller should the search steps get in order to balance the genotypic features in a fine-grained manner. Ideally, the optimization algorithm should achieve such behavior in a self-adaptive way in reaction to the structure of the problem it is solving.

24.14 Appropriate Abstraction

The problem should be represented at an appropriate level of abstraction [240, 2323]. The complexity of the optimization problem should evenly be distributed between the search space G, the search operations searchOp ∈ Op, the genotype-phenotype mapping "gpm", and the problem space X. Sometimes, using a non-trivial encoding which shifts complexity to the genotype-phenotype mapping and the search operations can improve the performance of an optimizer and increase the causality [240, 2912, 2914].

24.15 Indirect Mapping

If a direct mapping between genotypes and phenotypes is not possible, a suitable developmental approach should be applied [2323]. Such indirect genotype-phenotype mappings, as we will discuss in Section 28.2.2 on page 272, are especially useful if large-scale and robust structure designs should be constructed by the optimization process.

24.15.1 Extradimensional Bypass: Example for Good Complexity

Minimal-sized genomes are not always the best approach; this partly contradicts Section 24.1, which states that genomes should be as compact as possible. An interesting aspect of genome design supporting this claim is inspired by the works of the theoretical biologist Conrad [621, 622, 623, 625]. According to his extradimensional bypass principle, it is possible to transform a rugged fitness landscape with isolated peeks into one with connected saddle points by increasing the dimensionality of the search space [496, 544]. In [625] he states that the chance of isolated peeks in randomly created fitness landscapes decreases when their dimensionality grows. Conrad [625] does not suggest that nature includes useless sequences in the genome, but rather genes which allow for 1. new phenotypical characteristics or 2. redundancy providing new degrees of freedom for the evolution of a species.
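Returning to the scalability requirement of Section 24.13: a classic example of a tunable variation strength is the 1/5th success rule known from Evolution Strategies. The sketch below shows such a self-adapting Gaussian mutation for a real-valued genome; the constants and structure are illustrative, not prescribed by the text above.

import java.util.Random;

/** Gaussian mutation whose step size adapts itself with the 1/5th success rule. */
public class AdaptiveMutation {
  private double sigma = 1.0;     // current mutation strength
  private int successes = 0, trials = 0;
  private final Random rnd = new Random();

  public double[] mutate(double[] parent) {
    double[] child = parent.clone();
    for (int i = 0; i < child.length; i++) child[i] += sigma * rnd.nextGaussian();
    return child;
  }

  /** call after evaluating the offspring; adapts sigma every 20 trials */
  public void feedback(boolean offspringWasBetter) {
    trials++;
    if (offspringWasBetter) successes++;
    if (trials == 20) {
      double rate = successes / (double) trials;
      if (rate > 0.2) sigma *= 1.22;       // many successes: take larger, coarser steps
      else if (rate < 0.2) sigma /= 1.22;  // few successes: take smaller, finer steps
      successes = 0;
      trials = 0;
    }
  }
}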

Such an increase in freedom can make more than up for the additional "costs" arising from the enlargement of the search space. In Figure 24.1.a, an example for the extradimensional bypass (similar to Fig. 6 in [355]) is sketched. The original problem had a one-dimensional search space G corresponding to the horizontal axis up front. As can be seen in the plane in the foreground, the objective function had two peeks: a local optimum on the left and a global optimum on the right, separated by a larger valley. When the optimization process began climbing up the local optimum, it was very unlikely that it ever could escape this hill and reach the global one. Increasing the search space to two dimensions (G′) opened up a path way between them: the two isolated peeks became saddle points on a longer ridge, and the global optimum is now reachable from all points on the local optimum.

Figure 24.1: Examples for an increase of the dimensionality of a search space G (1d) to G′ (2d). (Fig. 24.1.a: Useful increase of dimensionality. Fig. 24.1.b: Useless increase of dimensionality.)

Generally, increasing the dimension of the search space makes only sense if the added dimension has a non-trivial influence on the objective functions. Simply adding a useless new dimension (as done in Fig. 24.1.b) would be an example for some sort of uniform redundancy, from which we already know (see Section 16.5) that it is not beneficial. Then again, adding useful new dimensions may be hard or impossible to achieve in most practical applications. The extradimensional bypass can be considered as an example of positive neutrality (see Chapter 16).

A good example for this issue is given by Bongard and Paul [355], who used an EA to evolve a neural network for the motion control of a bipedal robot. They performed runs where the evolution had control over some morphological aspects and runs where it had not. The ability to change the leg width of the robots, for instance, comes at the expense of an increase of the dimensions of the search space. Hence, one would expect that the optimization would perform worse. Instead, in one series of experiments, the results were much better with the extended search space. The runs did not converge to one particular leg shape but to a wide range of different structures. This led to the assumption that the morphology itself was not so much target of the optimization, but that the ability of changing it transformed the fitness landscape into a structure more navigable by the evolution. In some other experimental runs performed by Bongard and Paul [355], however, this phenomenon could not be observed, most likely because 1. the ruggedness in the fitness landscape and/or 2. the robot configuration led to a problem of too high complexity, i.e., the increase in dimensionality this time was too large to be compensated by the gain of evolvability.

Further examples for possible benefits of "gradually complexifying" the search space are given by Malkin in his doctoral thesis [1818].


Part III Metaheuristic Optimization Algorithms .


Chapter 25 Introduction

In Definition D1.15 on page 36 we stated that a metaheuristic is a black-box optimization technique which combines information from the objective functions, gained by sampling the search space, in a way which (hopefully) leads to the discovery of candidate solutions with high utility. In the taxonomy given in Section 1.3, these algorithms are the majority, since they are the focus of this book.

4.1.1. 30.1. 41.1.1 Multi-Agent Systems see Tables 28.1 Engineering see Tables 26. 27. and 43. and 39. 1034.1. 31. 39.[3001.1. 35.1. 39. 27. Area Art Astronomy Chemistry Combinatorial Problems References see Tables 28. 31.4. 30. and 35.1. and 40.1. 37.1 25. and 31.4.1.1.1.1 Image Synthesis see Table 28. 32. 36.1.1. 31.1.1.1. 31.1. 30.1.1.1. see also Tables 26.1 [342.1.1 Military and Defense see Tables 26.1.1.4. 27. 31. and 42. 30. 31.4.1. 33.1.1.1. 36. 28.1. 31. 33. 31. 33. 29.1. 36.1. and 35.1 Cooperation and Teamwork see Tables 28.1 Data Mining see Tables 27.1.1. 28. 2992]. 38.1. 34. 34. see also Tables 26. 40. 32.1. 37.1.1. 34.1. 29.1. 34. 36.1. 31.1.4.1. see also Tables 26.1.4.1.1. 28.1: Applications and Examples of Metaheuristics. 29. 28.1 Motor Control see Tables 28.1.1 Games see Tables 28. 37. 29. 30. 31.1. 3031].1. and 40.1.1.1. 37. 35.1 Healthcare see Tables 28. 29.4 see Tables 28.1.4.1.4.1. 40.1 Graph Theory [3001.1.1.1.1. 39. 33.1.1.1. 32. 37. 1034]. 31.1. 39. 32.1.1 Control see Tables 28. 37.1. 40.1.1. 28.1.1. 38. 27. 40.1 Mathematics see Tables 26. 27.1. 37. 34.1. and 35.1.1.1. 39. and 35.1. 34. 29.4. 36.1.1 tems E-Learning see Tables 28.1 Decision Making see Table 31.1 General Information on Metaheuristics Applications and Examples Table 25.1.1. 28.1. 33. 40. and 43. 30.1. 31.1. 39. 33. and 41.1. 28.1 Distributed Algorithms and Sys. 28. 29.1. 27.1. 33.4. 35.1 Databases see Tables 26.1.1.1 Function Optimization see Tables 26.1.1. 34. 35. 27.1.4. and 37.1 Data Compression see Table 31. 29.4. 35.1 Computer Graphics see Tables 27. 1033.1.4. 29. 30.1.1.1.1. 34.1.1. 33. 31. 28.1. 31. 32.1.4. 33.1. 27. 31.1. 37. 32.1.1. and 41.1.1.1. 35. 28.1 .1.1.1. and 41. 29.4.1. 28.1.1.1. 29. and 41.4. see also Tables 26. and 37. 29. and 34.1.1.1. 34.226 25 INTRODUCTION 25. 28.1.1. 29. 29.1.1. 36.1. 37.1.1. 29.1. 31. 36. 2283. 35.1. 34.1. 36.1 Physics see Tables 27.1. 29.1. 35. 37.4.1 Digital Technology see Tables 26. 29.1. 35. 40.4.4 Logistics [651.1.1. 30.1. 36.4. 35.1.4. and 39. 32.1.1 see Table 28. 27.1. 29. 29.1 Security see Tables 26.1.4. and 37.1. 30.1.1.1.1.1.1. 28.4. 39. 29.1. 41. 27.1. 28.4.1. 30. 651.1. 3031].1.4.1. 28. 36. 36.1.1. 39. and 32.1.1 Prediction see Tables 28.1. 40. 31.1 Economics and Finances see Tables 27. 38. 31.1 Multiplayer Games see Table 31. 36. 35. and 36. and 39. 33.1. 30. 29.1. 31.1. 29. 39.1. and 40. 32. and 34.1.1.1. 29.1.1.1. 1033.

1.1 25. 2. see [1968] 2008/10: Atizap´n de Zaragoza. 28. 30. 36. and 40. 31. .25. see also Tables 26. 3031]. HIS: International Conference on Hybrid Intelligent Systems See Table 10. and 40. France. 5. 29. Austria. 28.1.4 and 31. USA. 37.2 on page 131.4 Wireless Communication see Tables 26. see [2294] 1999/07: Angra dos Reis. 4.2 on page 130. 34.1. see [1937] e 2005/08: Vienna.1. 29. Handbook of Approximation Algorithms and Metaheuristics [1103] 25. see [2298] 1997/07: Sophia-Antipolis. 39. MIC: Metaheuristics International Conference History: 2009/07: Hamburg.2 on page 131. GENERAL INFORMATION ON METAHEURISTICS Shape Synthesis Software 227 see Table 26.1.1. 6. 27. SMC: International Conference on Systems. Rio de Janeiro.1 [467. see [2104] NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10.1.1 Telecommunication see Tables 28.3 Theorem Proofing and Automatic see Tables 29.1.2 on page 128. 33.1.1. 27.1 and 40. Germany. see [1321] 2001/07: Porto.1. see [2810] 2003/08: Kyoto.1. 34.4.1. and Cybernetics See Table 10. 33. Brazil. see [1038] a e 2007/06: Montr´al.4. Japan. 3. ICARIS: International Conference on Artificial Immune Systems See Table 10.2: Conferences and Workshops on Metaheuristics. 8.1 Sorting see Tables 28.2 1.1.1.1. Books Metaheuristics for Scheduling in Industrial and Manufacturing Applications [2992] Handbook of Metaheuristics [1068] Fleet Management and Logistics [651] Metaheuristics for Multiobjective Optimisation [1014] Large-Scale Network Parameter Configuration using On-line Simulation Framework [3031] Modern Heuristic Techniques for Combinatorial Problems [2283] How to Solve It: Modern Heuristics [1887] Search Methodologies – Introductory Tutorials in Optimization and Decision Support Techniques [453] 9. Canada. M´xico.1. 37.1 Testing see Tables 28. CO.1. 37.1.1.3 Conferences and Workshops Table 25. and 40. 35. Man.1 Verification Water Supply and Management see Table 28. 31.1. Portugal. 7. QC.4.4 and 31.1. 36.1. see [2828] 1995/07: Breckenridge.

Optimization and Engineering published by Kluwer Academic Publishers and Springer Netherlands 11. GEWS: Grammatical Evolution Workshop See Table 31. see [1171] 2006/11: Hammamet. 8. SIAM Journal on Optimization (SIOPT) published by Society for Industrial and Applied Mathematics (SIAM) 6. and North-Holland Scientific Publishers Ltd.2 on page 385. see [2205] 2010/10: Djerba Island.V. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10. Tunisia. WOPPLOT: Workshop on Parallel Processing: Logic. see [1170] SEA: International Symposium on Experimental Algorithms See Table 10. 25. Logic Programming.2 on page 135. European Journal of Operational Research (EJOR) published by Elsevier Science Publishers B. 25 INTRODUCTION LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10. Optimization Methods and Software published by Taylor and Francis LLC 12. LION: Learning and Intelligent OptimizatioN See Table 10.2 on page 133.2 on page 133.2 on page 133. Tunisia. GICOLAG: Global Optimization – Integrating Convexity.2 on page 133.2 on page 134. Sousse. META: International Conference on Metaheuristics and Nature Inspired Computing History: 2012/10: Port El Kantaoui.2 on page 134. and Computational Algebraic Geometry See Table 10.228 EngOpt: International Conference on Engineering Optimization See Table 10.2 on page 132. Discrete Optimization published by Elsevier Science Publishers B. SIAG/OPT Views and News: A Forum for the SIAM Activity Group on Optimization published by SIAM Activity Group on Optimization – SIAG/OPT . Journal of Mathematical Modelling and Algorithms published by Springer Netherlands 13. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10. The Journal of the Operational Research Society (JORS) published by Operations Research Society and Palgrave Macmillan Ltd.1.4 Journals 1.V. Organization and Technology See Table 10. 2. Tunesia. see [873] 2008/10: Hammamet. Structural and Multidisciplinary Optimization published by Springer-Verlag GmbH 14. Optimization. Journal of Heuristics published by Springer Netherlands 5. Journal of Global Optimization published by Springer Netherlands 7. Engineering Optimization published by Informa plc and Taylor and Francis LLC 10. Journal of Optimization Theory and Applications published by Plenum Press and Springer Netherlands 9. Computational Optimization and Applications published by Kluwer Academic Publishers and Springer Netherlands 3. 4. Tunesia.

Chapter 26 Hill Climbing

26.1 Introduction

Definition D26.1 (Local Search). A local search1 algorithm solves an optimization problem by iteratively moving from one candidate solution to another in the problem space until a termination criterion is met [2356].

In other words, a local search algorithm in each iteration either keeps the current candidate solution or moves to one element from its neighborhood as defined in Definition D4.12 on page 86. Local search algorithms have two major advantages: They use very little memory (normally only a constant amount) and are often able to find solutions in large or infinite search spaces. These advantages come, of course, with large trade-offs in processing time and the risk of premature convergence.

Hill Climbing2 (HC) [2356] is one of the oldest and simplest local search and optimization algorithms for single objective functions f. In Algorithm 1.1 on page 40, we gave an example for a simple and unstructured Hill Climbing algorithm. Here we will generalize from Algorithm 1.1 to a more basic Monte Carlo Hill Climbing algorithm structure.

26.2 Algorithm

The Hill Climbing procedure usually starts either from a random point in the search space or from a point chosen according to some problem-specific criterion (visualized with the operator "create" in Algorithm 26.1). Then, in a loop, the currently known best individual p⋆ is modified (with the mutate() operator) in order to produce one offspring pnew. If this new individual is better than its parent, it replaces it. Otherwise, it is discarded. This procedure is repeated until the termination criterion terminationCriterion is met. In this sense, Hill Climbing is similar to an Evolutionary Algorithm (see Chapter 28) with a population size of 1 and truncation selection. As a matter of fact, it resembles a (1 + 1) Evolution Strategy.

In Listing 56.6 on page 735, we provide a Java implementation which resembles Algorithm 26.1. In Algorithm 30.7 on page 369, we provide such an optimization algorithm for G ⊆ Rn which additionally adapts the search step size.

1 http://en.wikipedia.org/wiki/Local_search_(optimization) [accessed 2010-07-24]
2 http://en.wikipedia.org/wiki/Hill_climbing [accessed 2007-07-03]
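To make the loop structure concrete, the following is a minimal Java sketch of such a Monte Carlo hill climber for a real-valued search space. It is an illustrative example only, not the book's Listing 56.6; the class and method names as well as the mutation step size are assumptions made for this sketch.

import java.util.Random;

/** Minimal Hill Climbing sketch for G = X = R^n (illustrative only). */
public class SimpleHillClimber {
  public static double[] climb(java.util.function.ToDoubleFunction<double[]> f,
                               int n, int maxSteps) {
    Random rand = new Random();
    double[] best = new double[n];               // "create": random start point
    for (int i = 0; i < n; i++) best[i] = rand.nextGaussian();
    double bestF = f.applyAsDouble(best);

    for (int t = 0; t < maxSteps; t++) {         // termination criterion: step budget
      double[] cand = best.clone();              // "mutate": small random perturbation
      cand[rand.nextInt(n)] += 0.1 * rand.nextGaussian();
      double candF = f.applyAsDouble(cand);
      if (candF <= bestF) {                      // keep the offspring only if it is not worse
        best = cand;
        bestF = candF;
      }                                          // otherwise discard it
    }
    return best;
  }

  public static void main(String[] args) {
    // minimize the sphere function f(x) = sum of x_i^2
    double[] x = climb(v -> { double s = 0; for (double d : v) s += d * d; return s; },
                       5, 100000);
    System.out.println(java.util.Arrays.toString(x));
  }
}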

Algorithm 26.1: x ←− hillClimbing(f)
Input: f: the objective function subject to minimization
Input: [implicit] terminationCriterion: the termination criterion
Data: pnew: the new element created
Data: p⋆: the (currently) best individual
Output: x: the best element found

begin
  p⋆.g ←− create
  p⋆.x ←− gpm(p⋆.g)
  while ¬terminationCriterion do
    pnew.g ←− mutate(p⋆.g)
    pnew.x ←− gpm(pnew.g)
    if f(pnew.x) ≤ f(p⋆.x) then p⋆ ←− pnew
  return p⋆.x

Although the search space G and the problem space X are most often the same in Hill Climbing, we distinguish them in Algorithm 26.1 for the sake of generality. It should further be noted that Hill Climbing can be implemented in a deterministic manner if, for each genotype g1 ∈ G, the set of adjacent points is finite (∀g1 ∈ G : |adjacent(g1)| ∈ N0).

26.3 Multi-Objective Hill Climbing

As illustrated in Algorithm 26.2, we can easily extend Hill Climbing algorithms with support for multi-objective optimization by using some of the methods from Section 3.5.2 on page 76 and some which we will later discuss in the context of Evolutionary Algorithms. This extended approach will then return a set X of the best candidate solutions found instead of a single point x from the problem space as done in Algorithm 26.1.

The set archive of currently known best individuals may contain more than one element. In each round, we pick one individual from the archive at random. We then use this individual to create the next point in the problem space with the "mutate" operator. The method updateOptimalSet(archive, pnew, cmpf) checks whether the new candidate solution is non-prevailed by any element in archive. If so, it will enter the archive and all individuals from archive which now are dominated are removed. Since an optimization problem may potentially have infinitely many global optima, we subsequently have to ensure that the archive does not grow too big: pruneOptimalSet(archive, as) makes sure that it contains at most as individuals and, if necessary, deletes some individuals. In case of Algorithm 26.2, the archive can grow to at most as + 1 individuals before the pruning, so at most one individual will be deleted. An example implementation of this method in Java can be found in Listing 56.7 on page 737. When the loop ends because the termination criterion terminationCriterion was met, we extract all the phenotypes from the archived individuals (extractPhenotypes(archive)) and return them as the set X.
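The archive maintenance just described boils down to a Pareto dominance test plus a size cap. The following Java sketch illustrates one possible realization of these two operations; it is not the book's Listing 56.7, and the class, method names, and the crude random pruning strategy are assumptions of this sketch.

import java.util.List;

/** Illustrative archive maintenance for a multi-objective hill climber. */
public class ParetoArchive {
  /** true if a dominates b: no worse in all objectives, strictly better in at least one */
  static boolean dominates(double[] a, double[] b) {
    boolean strictly = false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) return false;
      if (a[i] < b[i]) strictly = true;
    }
    return strictly;
  }

  /** updateOptimalSet: insert cand unless it is dominated; drop archive members it dominates */
  static void updateOptimalSet(List<double[]> archive, double[] cand) {
    for (double[] p : archive) if (dominates(p, cand)) return; // cand is prevailed
    archive.removeIf(p -> dominates(cand, p));
    archive.add(cand);
  }

  /** pruneOptimalSet: keep at most as entries (here simply by dropping random ones) */
  static void pruneOptimalSet(List<double[]> archive, int as) {
    while (archive.size() > as)
      archive.remove((int) (Math.random() * archive.size()));
  }
}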

Algorithm 26.2: X ←− hillClimbingMO(OptProb, as)
Input: OptProb: a tuple (X, f, cmpf) of the problem space X, the objective functions f, and a comparator function cmpf
Input: as: the maximum archive size
Input: [implicit] terminationCriterion: the termination criterion
Data: pold: the individual used as parent for the next point
Data: pnew: the new individual generated
Data: archive: the set of best individuals known
Output: X: the set of the best elements found

begin
  archive ←− ()
  pnew.g ←− create
  pnew.x ←− gpm(pnew.g)
  while ¬terminationCriterion do
    // check whether the new individual belongs into the archive
    archive ←− updateOptimalSet(archive, pnew, cmpf)
    // if the archive is too big, delete one individual
    archive ←− pruneOptimalSet(archive, as)
    // pick a random element from the archive
    pold ←− archive[⌊randomUni[0, len(archive))⌋]
    pnew.g ←− mutate(pold.g)
    pnew.x ←− gpm(pnew.g)
  return extractPhenotypes(archive)

26.4 Problems in Hill Climbing

The major problem of Hill Climbing is premature convergence. We introduced this phenomenon in Chapter 13 on page 151: The optimization gets stuck at a local optimum and cannot discover the global optimum anymore. Both previously specified versions of the algorithm are very likely to fall prey to this problem, since they always move towards improvements in the objective values and thus cannot reach any candidate solution which is situated behind an inferior point. They will only follow a path of candidate solutions if it is monotonously3 improving the objective function(s). The function used in Example E5.1 on page 96 is, for example, already problematic for Hill Climbing. Hill Climbing in this form is a local search rather than a Global Optimization algorithm. By making a few slight modifications to the algorithm, however, it can become a valuable technique capable of performing Global Optimization:

1. A way of preventing premature convergence is to not always transcend to the better candidate solution in each step. Simulated Annealing introduces a heuristic based on the physical model of cooling down molten metal to decide whether a superior offspring should replace its parent or not. This approach is described in Chapter 27 on page 243.

2. A tabu-list which stores the elements recently evaluated can be added. By preventing the algorithm from visiting them again, a better exploration of the problem space X can be enforced. This technique is used in Tabu Search, which is discussed in Chapter 40 on page 489.

3 http://en.wikipedia.org/wiki/Monotonicity [accessed 2007-07-03]

3. Using a reproduction scheme that does not necessarily generate candidate solutions directly neighboring p⋆, as done in Random Optimization (an optimization approach defined in Chapter 44 on page 505), may prove even more efficient.

4. The Dynamic Hill Climbing approach by Yuret and de la Maza [3048] uses the last two visited points to compute unit vectors. With this technique, the directions are adjusted according to the structure of the problem space and a new coordinate frame is created which points more likely into the right direction.

5. Randomly restarting the search after so-and-so many steps is a crude but efficient method to explore wide ranges of the problem space with Hill Climbing. You can find it outlined in Section 26.5.

26.5 Hill Climbing With Random Restarts

Hill Climbing with random restarts is also called Stochastic Hill Climbing (SH) or Stochastic gradient descent4 [844, 2561]. In the following, we will outline how random restarts can be implemented into Hill Climbing and serve as a measure against premature convergence. We will further combine this approach directly with the multi-objective Hill Climbing approach defined in Algorithm 26.2. The new algorithm, defined in Algorithm 26.3, incorporates two archives for optimal solutions: archive1, the overall optimal set, from which we extract and return the problem space elements at the end of the Hill Climbing process, and archive2, the set of the best individuals in the current run. After each single run, archive2 is incorporated into archive1. We additionally define the restart criterion shouldRestart which is evaluated in every iteration and determines whether or not the algorithm should be restarted. shouldRestart could, for example, count the iterations performed or check if any improvement was produced in the last ten iterations.

4 http://en.wikipedia.org/wiki/Stochastic_gradient_descent [accessed 2007-07-03]
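A single-objective version of this restart scheme is easy to realize; the following Java sketch illustrates one way to do so. It is an assumption-laden example (class name, restart criterion as a fixed step budget per run, mutation step size), not the book's implementation, and it omits the archive handling of Algorithm 26.3 by simply keeping the best point found over all runs.

import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Illustrative sketch: single-objective Hill Climbing with random restarts. */
public class RestartingHillClimber {
  public static double[] optimize(ToDoubleFunction<double[]> f, int n,
                                  int totalSteps, int stepsPerRun) {
    Random rand = new Random();
    double[] globalBest = null;
    double globalBestF = Double.POSITIVE_INFINITY;

    for (int done = 0; done < totalSteps; done += stepsPerRun) {
      double[] best = new double[n];              // restart: new random start point
      for (int i = 0; i < n; i++) best[i] = 10 * rand.nextDouble() - 5;
      double bestF = f.applyAsDouble(best);

      for (int t = 0; t < stepsPerRun; t++) {     // one hill climbing run
        double[] cand = best.clone();
        cand[rand.nextInt(n)] += 0.1 * rand.nextGaussian();
        double candF = f.applyAsDouble(cand);
        if (candF <= bestF) { best = cand; bestF = candF; }
      }
      if (bestF < globalBestF) {                  // merge the run result into the overall best
        globalBest = best; globalBestF = bestF;
      }
    }
    return globalBest;
  }
}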

Algorithm 26.3: X ←− hillClimbingMORR(OptProb, as)
Input: OptProb: a tuple (X, f, cmpf) of the problem space X, the objective functions f, and a comparator function cmpf
Input: as: the maximum archive size
Input: [implicit] terminationCriterion: the termination criterion
Input: [implicit] shouldRestart: the restart criterion
Data: pold: the individual used as parent for the next point
Data: pnew: the new individual generated
Data: archive1, archive2: the sets of best individuals known
Output: X: the set of the best elements found

begin
  archive1 ←− ()
  while ¬terminationCriterion do
    archive2 ←− ()
    pnew.g ←− create
    pnew.x ←− gpm(pnew.g)
    while ¬(shouldRestart ∨ terminationCriterion) do
      // check whether the new individual belongs into the archive
      archive2 ←− updateOptimalSet(archive2, pnew, cmpf)
      // if the archive is too big, delete one individual
      archive2 ←− pruneOptimalSet(archive2, as)
      // pick a random element from the archive
      pold ←− archive2[⌊randomUni[0, len(archive2))⌋]
      pnew.g ←− mutate(pold.g)
      pnew.x ←− gpm(pnew.g)
    // check whether good individuals were found in this run
    archive1 ←− updateOptimalSetN(archive1, archive2, cmpf)
    // if the archive is too big, delete one individual
    archive1 ←− pruneOptimalSet(archive1, as)
  return extractPhenotypes(archive1)

26.6 General Information on Hill Climbing

26.6.1 Applications and Examples

Table 26.1: Applications and Examples of Hill Climbing.
Areas: Combinatorial Problems, Databases, Digital Technology, Distributed Algorithms and Systems, Engineering, Function Optimization, Graph Theory, Logistics, Mathematics, Military and Defense, Security, Shape Synthesis, Software, Wireless Communication.
References: [15, 16, 410, 502, 775, 887, 1261, 1486, 1532, 1553, 1558, 1600, 1665, 1674, 1781, 1927, 1971, 2058, 2165, 2237, 2286, 2315, 2554, 2596, 2656, 2663, 2745, 2903, 2994, 3027, 3048, 3085, 3106].

26.6.2 Books

1. Artificial Intelligence: A Modern Approach [2356]
2. Introduction to Stochastic Search and Optimization [2561]
3. Evolution and Optimum Seeking [2437]

26.6.3 Conferences and Workshops

Table 26.2: Conferences and Workshops on Hill Climbing.

MIC: Metaheuristics International Conference
See Table 25.2 on page 227.

HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2.

SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2.

NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2.

EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.

LION: Learning and Intelligent OptimizatioN
See Table 10.2.

ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2.

LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2.

SEA: International Symposium on Experimental Algorithms
See Table 10.2.

META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.

GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and Computational Algebraic Geometry
See Table 10.2.

WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2.

Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2.

26.6.4 Journals

1. Journal of Heuristics published by Springer Netherlands

26.7 Raindrop Method

Only five years ago, Bettinger and Zhu [290, 3083] contributed a new search heuristic for constrained optimization: the Raindrop Method, which they used for forest planning problems. The algorithm is based on precise knowledge of the components of the candidate solutions and on how their interaction influences the validity of the constraints. In the original application of the Raindrop Method, the search and problem space are identical (G = X). The Raindrop Method starts out with a single, valid candidate solution x⋆ (i.e., one that violates none of the constraints). This candidate may be found with a random search process or provided by a human operator. It works as follows:

1. Set the iteration counter τ to a user-defined maximum value maxτ of iterations for modifying and correcting x that are allowed without improvements before reverting to x⋆.
2. Create a copy x of x⋆.
3. Perturb x by randomly modifying one of its components. Let us refer to this randomly selected component as s. This modification may lead to constraint violations.
4. Set a distance value d to 0.
5. Create a list L of the components of x that lead to constraint violations.
6. If no constraint was violated, continue at Point 11; otherwise proceed as follows.
7. From L, pick the component c which is physically closest to s, that is, the component with the minimum distance dist(c, s). Here we make use of the knowledge of the interaction of components and constraints. In the original application of the Raindrop Method, "physically close" was properly defined due to the fact that the candidate solutions were basically two-dimensional maps. For applications different from forest planning, appropriate definitions for the distance measure have to be supplied.
8. Set d = dist(c, s).

9. Modify the component c in x, that is, find the next best value for c which does not introduce any new constraint violations in components f which are at least as close to s, i.e., with dist(f, s) ≤ dist(c, s). This change may, however, cause constraint violations in components farther away from s. If no such change is possible, go to Point 13.
10. Go back to step Point 4.
11. If x is better than x⋆, i.e., x ≺ x⋆, set x⋆ = x. Otherwise, decrease the iteration counter τ.
12. If the termination criterion has not yet been met, go back to step Point 3 if τ > 0 and to Point 2 if τ = 0.
13. Return x⋆ to the user.

The name Raindrop Method comes from the fact that the constraint violations caused by the perturbation of the valid solution radiate away from the modified component s like waves on a water surface radiate away from the point where a raindrop hits. The iteration counter τ here is used to allow the search to explore solutions more distant from the current optimum x⋆. The higher the initial value maxτ specified by the user, the more iterations without improvement are allowed before reverting x to x⋆.

26.7.1 General Information on the Raindrop Method

26.7.1.1 Applications and Examples

Table 26.3: Applications and Examples of the Raindrop Method.
Area: Combinatorial Problems. References: [290, 3083].

26.7.1.2 Books

1. Nature-Inspired Algorithms for Optimisation [562]

26.7.1.3 Conferences and Workshops

Table 26.4: Conferences and Workshops on the Raindrop Method.

MIC: Metaheuristics International Conference
See Table 25.2 on page 227.

HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2.

NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2.

SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2.
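Because the Raindrop Method relies on problem-specific knowledge (components, constraints, and a distance between components), any implementation has to be parameterized by that knowledge. The following Java sketch shows one possible way to capture it behind a hypothetical interface and to drive the repair loop described above; it is only an assumption-based illustration, not Bettinger and Zhu's original code, and it simplifies the handling of τ and of unrepairable perturbations.

import java.util.Comparator;
import java.util.List;

/** Hypothetical problem knowledge needed by the Raindrop Method. */
interface RaindropProblem<X> {
  X copy(X x);
  int perturbRandomComponent(X x);            // returns the index s of the changed component
  List<Integer> violatingComponents(X x);     // components currently violating a constraint
  double distance(int a, int b);              // "physical" distance between two components
  boolean repairComponent(X x, int c, int s); // next best value for c without new violations
                                              // closer to s; false if no such change exists
  boolean isBetter(X a, X b);                 // a strictly better than b
}

/** Illustrative sketch of the Raindrop Method loop. */
class Raindrop {
  static <X> X run(RaindropProblem<X> p, X xStar, int maxTau, int maxIterations) {
    int tau = maxTau;
    X x = p.copy(xStar);
    for (int it = 0; it < maxIterations; it++) {
      int s = p.perturbRandomComponent(x);              // perturb one component
      boolean repairable = true;
      List<Integer> l = p.violatingComponents(x);       // list of violated components
      while (!l.isEmpty() && repairable) {
        // pick the violating component closest to s
        int c = l.stream().min(Comparator.comparingDouble(i -> p.distance(i, s))).get();
        repairable = p.repairComponent(x, c, s);        // repair it (may move violations outward)
        l = p.violatingComponents(x);
      }
      if (repairable && p.isBetter(x, xStar)) {         // improvement: accept as new x*
        xStar = p.copy(x);
      } else {
        tau--;
        if (tau <= 0) { x = p.copy(xStar); tau = maxTau; } // revert to x* after maxTau failures
      }
    }
    return xStar;
  }
}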

ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2.

EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.

GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and Computational Algebraic Geometry
See Table 10.2.

Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2.

LION: Learning and Intelligent OptimizatioN
See Table 10.2.

LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2.

META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.

SEA: International Symposium on Experimental Algorithms
See Table 10.2.

WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2.

26.7.1.4 Journals

1. Journal of Heuristics published by Springer Netherlands

Tasks T26

64. Implement the Hill Climbing procedure of Algorithm 26.1 on page 230 in a programming language different from Java. You can specialize your implementation for only solving continuous numerical problems, i.e., you do not need to follow the general, interface-based approach we adhere to in Listing 56.6 on page 735. Also, you do not need to provide the multi-objective optimization capabilities of Algorithm 26.2. Apply your implementation to the Sphere function (see Section 50.3.1.1 on page 859) for n = 10 dimensions in order to test its functionality.

65. Extend the Hill Climbing algorithm implementation given in Listing 56.6 on page 735 to Hill Climbing with restarts, i.e., to a Hill Climbing which restarts the search procedure every n steps (while preserving the results of the optimization process in an additional variable). You may use the discussion in Section 26.5 and Algorithm 26.3 on page 233 as reference. However, you do not need to provide the multi-objective optimization capabilities of Algorithm 26.3. Instead, only extend the single-objective Hill Climbing method in Listing 56.6 on page 735 or provide a completely new implementation of Hill Climbing with restarts in a programming language of your choice. Test your implementation on one of the previously discussed benchmark functions for real-valued, continuous optimization (see, e.g., Task 64).

66. Use the implementation of Hill Climbing given in Listing 56.6 on page 735 (or an equivalent own one as in Task 65) in order to find approximations of the global minima of the three benchmark functions f1, f2, and f3 defined in Equations 26.1 to 26.3, which are to be minimized over the real interval [−10, 10]^n.
(a) Run ten experiments for n = 2, n = 5, and n = 10 each. Note all parameter settings of the algorithm you used (the number of algorithm steps, operators, etc.) as well as the results, i.e., the vectors discovered and their objective values. [20 points]
(b) Compute the median, the arithmetic mean, and the standard deviation (see Section 53.2) of the achieved objective values in the ten runs. [10 points]
(c) The three objective functions are designed so that their value is always in [0, 1]. Try to order the optimization problems according to the median of the achieved objective values. Which problems seem to be the hardest? What is the influence of the dimension n on the achieved objective values? [10 points]
(d) If you compare f1 and f2, you find striking similarities. Does one of the functions seem to be harder? If so, which, and what, in your opinion, makes it harder? [5 points]
[45 points]

67. Let us now solve a modified version of the bin packing problem from Example E1.1 on page 25.
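For subtask 66(b), the summary statistics over the ten runs can be computed in a few lines. The snippet below is an illustrative Java helper with hypothetical objective values; the class name and the example numbers are assumptions, and the sample standard deviation is used.

import java.util.Arrays;

/** Illustrative helper for Task 66(b): statistics over ten independent runs. */
public class RunStatistics {
  public static void main(String[] args) {
    double[] best = {0.12, 0.08, 0.15, 0.09, 0.11, 0.25, 0.07, 0.13, 0.10, 0.18};
    Arrays.sort(best);
    double median = (best[4] + best[5]) / 2.0;      // ten runs: mean of the two middle values
    double mean = Arrays.stream(best).average().orElse(Double.NaN);
    double var = Arrays.stream(best).map(v -> (v - mean) * (v - mean)).sum() / (best.length - 1);
    System.out.printf("median=%.4f mean=%.4f stddev=%.4f%n", median, mean, Math.sqrt(var));
  }
}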

Each dataset stands for one bin packing problem and provides the bin size b, the number n of objects, and the sizes ai of the objects (with i ∈ 1..n). Different from Example E1.1, we do not know the number k of bins in advance; we only know the size b of the bins. We want to find out how many bins we need at least to store all the objects (k ∈ N1, the objective, subject to minimization) and how we can store (x ∈ X) the objects in these bins. For the sake of completeness and in order to allow comparisons, we provide the lowest known number k⋆ of bins for the problems given by Scholl and Klein [2422] in the last column of Table 26.5.

Table 26.5: Bin Packing test data from [2422], Data Set 1 (the object sizes ai are those of the respective instances).
Name      | b   | n   | k⋆
N1C1W1_A  | 100 | 50  | 25
N1C2W2_D  | 120 | 50  | 24
N2C1W1_A  | 100 | 100 | 48
N2C3W2_C  | 150 | 100 | 41

Solve this optimization problem with Hill Climbing. It is not the goal to solve it to optimality, but to get solutions with Hill Climbing which are as good as possible. In other words: For each problem, find a candidate solution x which contains an assignment of objects to as few as possible bins of the given size b. In order to solve this optimization problem, you may proceed as discussed in Chapter 7 on page 109. Below, we give some suggestions – but you do not necessarily need to follow them:

(a) A possible problem space X for n objects would be the set of all permutations of the first n natural numbers (Π(1..n)). A candidate solution x ∈ X, i.e., such a permutation, could describe the order in which the objects have to be put into bins. If a bin is filled to its capacity (or the current object does not fit into the bin anymore), a new bin is needed. This process is repeated until all objects have been packed into bins.

(b) For the objective function, there exist many different choices. As objective function f1, the number k(x) of bins required for storing the objects according to a permutation x ∈ X could be used. This value can easily be computed as discussed in our outline of the problem space X above. You can find an example implementation of this function in Listing 57.10 on page 892. The suggestions below, as sketched after this list, describe further refinements of this objective.
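The decoding of a permutation into a bin count mentioned in suggestion (a) can be written in a handful of lines. The following Java sketch is an illustration, not the book's Listing 57.10; the class name, method name, and the example data in main are assumptions.

/** Illustrative first-fit-like decoding: bins needed when packing in the order of permutation x. */
public class BinCount {
  /** x: permutation of 0..n-1, a: object sizes, b: bin size */
  static int countBins(int[] x, int[] a, int b) {
    int bins = 1;            // the currently open bin
    int used = 0;            // space used in the current bin
    for (int i : x) {
      if (used + a[i] > b) { // object does not fit anymore: open a new bin
        bins++;
        used = 0;
      }
      used += a[i];
    }
    return bins;
  }

  public static void main(String[] args) {
    int[] sizes = {42, 63, 67, 57, 93, 90, 38};  // hypothetical object sizes
    int[] perm  = {0, 1, 2, 3, 4, 5, 6};         // identity permutation
    System.out.println(countBins(perm, sizes, 100));
  }
}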

ii. Alternatively, we could also sum up the squares of the free space left over in each bin as f2. By summing up the squares of the free spaces, we punish bins with lots of free space more severely, i.e., we reward solutions where the bins are filled equally. Naturally, one would "feel" that such solutions are "good". But this approach may prevent us from finding the optimum if it is a solution where the objects are distributed unevenly. A combination with the bin count, such as f3(x) = f1(x) − 1/(1 + f2(x)), is therefore useful: the dominating factor of the objective should still be the number of bins required, since using fewer bins would mean to not store all objects.

iii. We could also consider the total number of bins required but additionally try to minimize the space lastWaste(x) left over in the last bin. The idea is that, if two solutions require the same amount of bins, we prefer the one where more space is free in the last bin. If this waste in the last bin is increased more and more, this means that the volume of objects in that bin is reduced. It may eventually become 0, which would equal leaving the bin empty. The empty bin would not be used and hence, we would be saving one bin. Then, set f4(x) = f1(x) + 1/(1 + lastWaste(x)).

iv. Instead of only considering the remaining empty space in the last bin, we could consider the maximum empty space over all bins, i.e., prefer solutions where there is one bin with much free space. The idea is pretty similar: If we have a bin which is almost empty, maybe we can make it disappear with just a few moves (object swaps). We then take the one bin with the most free space mf and add this space as a factor to the bin count in the objective function, similar to what we did in the previous function: f5(x) = f1(x) + 1/(1 + mf(x)).

v. Finally, a combination of the above would be possible. You may add own ideas as well.

Example implementations of these objective functions for the suggested problem space are given in Listings 57.11 to 57.15 on pages 893 to 899.

(c) If the suggested problem space is used, it could also be used as search space, i.e., G = X. A creation operation could draw a permutation according to the uniform distribution as shown in Listing 56.38 on page 806, and a simple unary search operation (mutation) would swap the elements of the permutation, such as those provided in Listing 56.39.

Use the Hill Climbing Algorithm 26.1 given in Listing 56.6 on page 735 for solving the task and compare its results with the results produced by equivalent random walks and random sampling (see Chapter 8 and Section 56.1.2 on page 729).

(a) For each algorithm or objective function you test, perform at least 30 independent runs to get reliable results. (Remember: our Hill Climbing algorithm is randomized and can produce different results in each run.)

(b) For each of the four problems, write down the best assignment of objects to bins that you have found (along with the number of bins required).
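To make the refined objective of suggestion (iii) concrete, the following Java sketch computes f4, i.e., the bin count plus a tie-breaking term for the waste in the last bin. It is an assumption-based illustration, not the book's Listing 57.13; names are made up.

/** Illustrative sketch of the refined objective f4 = f1 + 1/(1 + lastWaste). */
public class BinObjectives {
  /** returns {bins, freeSpaceInLastBin} for permutation x, object sizes a, bin size b */
  static int[] pack(int[] x, int[] a, int b) {
    int bins = 1, used = 0;
    for (int i : x) {
      if (used + a[i] > b) { bins++; used = 0; }
      used += a[i];
    }
    return new int[] {bins, b - used};
  }

  static double f4(int[] x, int[] a, int b) {
    int[] r = pack(x, a, b);
    // the bin count dominates; the waste in the last bin only breaks ties between
    // solutions that need the same number of bins
    return r[0] + 1.0 / (1.0 + r[1]);
  }
}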

(c) Discuss which of the objective functions and which algorithm or algorithm configuration performed best and give reasons.

(d) Also provide all your source code.

[40 points]


Chapter 27 Simulated Annealing

27.1 Introduction

In 1953, Metropolis et al. [1874] developed a Monte Carlo method for "calculating the properties of any substance which may be considered as composed of interacting individual molecules". With this new "Metropolis" procedure stemming from statistical mechanics, the manner in which metal crystals reconfigure and reach equilibriums in the process of annealing can be simulated. This inspired Kirkpatrick et al. [1541] to develop the Simulated Annealing1 (SA) algorithm for Global Optimization in the early 1980s and to apply it to various combinatorial optimization problems. Independently around the same time, Černý [516] employed a similar approach to the Traveling Salesman Problem. Even earlier, Jacobs et al. [1426, 1427] used a similar method to optimize program code.

Simulated Annealing is a general-purpose optimization algorithm that can be applied to arbitrary search and problem spaces. Like simple Hill Climbing, Simulated Annealing only needs a single initial individual as starting point and a unary search operation and thus belongs to the family of local search algorithms.

27.1.1 Metropolis Simulation

In metallurgy and material science, annealing2 [1307] is a heat treatment of material with the goal of altering properties such as its hardness. Metal crystals have small defects, dislocations of atoms which weaken the overall structure as sketched in Figure 27.1. In order to decrease the defects, the metal is heated to, for example, 0.4 times of its melting temperature, i.e., well below its melting point but usually already a few hundred degrees Celsius. Adding heat means adding energy to the atoms in the metal. Due to the added energy, the electrons are freed from the atoms in the metal (which now become ions). Also, the higher energy of the ions makes them move and increases their diffusion rate. The ions may now leave their current position and assume better positions. During this process, they may also temporarily reside in configurations of higher energy, for example if two ions get close to each other. The structure of the crystal is reformed as the material cools down and approaches its equilibrium state. During this process, the dislocations in the metal crystal can disappear. The movement decreases as the temperature of the metal sinks. It should be noted that Figure 27.1 is a very rough sketch and the following discussion is very rough and simplified.3

1 http://en.wikipedia.org/wiki/Simulated_annealing [accessed 2007-07-03]
2 http://en.wikipedia.org/wiki/Annealing_(metallurgy) [accessed 2008-09-19]
3 I assume that a physicist or chemist would beat me for these simplifications. If you are a physicist or chemist and have suggestions for improving/correcting the discussion, please drop me an email.

Figure 27.1: A sketch of annealing in metallurgy as role model for Simulated Annealing. (The sketch shows the metal structure with defects, the increased movement of the atoms at high temperature, and the ions in a low-energy structure with fewer defects, while the temperature falls from Ts = getTemperature(1) towards 0 K = getTemperature(∞) over the time t.)

In the Metropolis simulation of the annealing process, each set of positions pos of all atoms of a system is weighted by its Boltzmann probability factor e^(−E(pos)/(kB·T)), where E(pos) is the energy of the configuration pos, T is the temperature measured in Kelvin, and kB is Boltzmann's constant4:

kB ≈ 1.380 650 524 · 10^−23 J/K    (27.1)

The Metropolis procedure is a model of this physical process which could be used to simulate a collection of atoms in thermodynamic equilibrium at a given temperature. A new nearby geometry posi+1 was generated as a random displacement from the current geometry posi of the atoms in each iteration. The energy of the resulting new geometry is computed and ∆E, the energetic difference between the current and the new geometry, was determined. The probability that this new geometry is accepted, P(∆E), is defined in Equation 27.3.

∆E = E(posi+1) − E(posi)    (27.2)

P(∆E) = e^(−∆E/(kB·T)) if ∆E > 0, and P(∆E) = 1 otherwise    (27.3)

Thus, if the new nearby geometry has a lower energy level, the change of atom positions is accepted in the simulation. Otherwise, a uniformly distributed random number r = randomUni[0, 1) is drawn and the step will only be realized if r is less than or equal to the Boltzmann probability factor, i.e., r ≤ P(∆E). At high temperatures T, this factor is very close to 1, leading to the acceptance of many uphill steps. As the temperature falls, the proportion of accepted steps which would increase the energy level decreases. Now the system will not escape local regions anymore and (hopefully) comes to a rest in the global energy minimum at temperature T = 0 K.

Systems in chemistry and physics always "strive" to attain low-energy states: the lower the energy, the more stable the system is. When annealing metal, the initial temperature must not be too low and the cooling must be done sufficiently slowly so as to avoid stopping the ions too early and to prevent the system from getting stuck in a meta-stable, non-crystalline state representing a local minimum of energy.

4 http://en.wikipedia.org/wiki/Boltzmann%27s_constant [accessed 2007-07-03]
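The acceptance rule of Equation 27.3 is the core of the whole method and fits into a few lines of code. The following Java sketch illustrates it; it is an assumption-based example (class and method names are made up), and it already omits kB as the optimization variant of the algorithm does in Section 27.2.1.

import java.util.Random;

/** Illustrative sketch of the Metropolis acceptance rule (kB omitted, see Section 27.2.1). */
public class MetropolisAcceptance {
  static final Random RAND = new Random();

  /** true if a step with objective difference deltaE is accepted at temperature t */
  static boolean accept(double deltaE, double t) {
    if (deltaE <= 0) return true;                       // better or equal solutions: always accept
    return RAND.nextDouble() < Math.exp(-deltaE / t);   // worse ones: with Boltzmann probability
  }
}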

27.2 Algorithm

The abstraction of the Metropolis simulation method in order to allow arbitrary problem spaces is straightforward – the energy computation E(posi) is replaced by an objective function f or maybe even by the result v of a fitness assignment process. Without loss of generality, we reuse the definitions from Evolutionary Algorithms for the search operations and set Op = {create, mutate}. Algorithm 27.1 illustrates the basic course of Simulated Annealing, which you can find implemented in Listing 56.8 on page 740.

Algorithm 27.1: x ←− simulatedAnnealing(f)
Input: f: the objective function to be minimized
Data: pnew: the newly generated individual
Data: pcur: the point currently investigated in problem space
Data: T: the temperature of the system which is decreased over time
Data: t: the current time index
Data: ∆E: the energy (objective value) difference of pcur.x and pnew.x
Output: x: the best element found

 1  begin
 2    pcur.g ←− create
 3    pcur.x ←− gpm(pcur.g)
 4    x ←− pcur.x
 5    t ←− 1
 6    while ¬terminationCriterion do
 7      T ←− getTemperature(t)
 8      pnew.g ←− mutate(pcur.g)
 9      pnew.x ←− gpm(pnew.g)
10      ∆E ←− f(pnew.x) − f(pcur.x)
11      if ∆E ≤ 0 then
12        pcur ←− pnew
13        if f(pcur.x) < f(x) then x ←− pcur.x
14      else
15        if randomUni[0, 1) < e^(−∆E/T) then pcur ←− pnew
16      t ←− t + 1
17    return x

Let us discuss in a very simplified way how this algorithm works. Better candidate solutions are always accepted. If the temperature is high in the beginning, the chance that the search will transcend to a new candidate solution, regardless whether it is better or worse, is high. During the course of the optimization process, the temperature T decreases (see Section 27.3). As the temperature decreases, the chance that worse individuals are accepted decreases as well. If T approaches 0, this probability becomes 0 and Simulated Annealing becomes a pure Hill Climbing algorithm. In summary, this means that, different from a Hill Climbing process, Simulated Annealing also accepts many worse individuals; in its behavior, it is then initially a bit similar to a random walk. The rationale of having an algorithm that initially behaves like a random walk and then becomes a Hill Climber is to make use of the benefits of the two extreme cases:

A random walk is a maximally-explorative and minimally-exploitive algorithm. It can never get stuck at a local optimum and will explore all areas in the search space with the same probability. Yet, it cannot exploit a good solution, i.e., it is unable to make small modifications to an individual in a targeted way in order to check if there are better solutions nearby. A Hill Climber, on the other hand, is minimally-explorative and maximally-exploitive. It always tries to modify and improve on the current solution and never investigates far-away areas of the search space. Simulated Annealing combines both properties: initially, it performs a lot of exploration in order to discover interesting areas of the search space. The hope is that, during this time, it will discover the basin of attraction of the global optimum. The more the optimization process becomes like a Hill Climber, the less likely will it move away from a good solution to a worse one, and it will instead concentrate on exploiting the solution.

27.2.1 No Boltzmann's Constant

Notice the difference between Line 15 in Algorithm 27.1 and Equation 27.3 on page 244: Boltzmann's constant has been removed. The reason is that with Simulated Annealing, we do not want to simulate a physical process but instead aim to solve an optimization problem. The nominal value of Boltzmann's constant, given in Equation 27.1 (something around 1.4 · 10^−23), has quite a heavy impact on the Boltzmann probability. In a combinatorial problem, where the differences in the objective values between two candidate solutions are often discrete, this becomes obvious: In order to achieve an acceptance probability of 0.001 for a solution which is 1 worse than the best known one, one would need a temperature T of:

P = 0.001 = e^(−1/(kB·T))    (27.4)
ln 0.001 = −1/(kB·T)    (27.5)
kB·T = −1/ln 0.001    (27.6)
T = −1/(kB · ln 0.001)    (27.7)
T ≈ 1.05 · 10^22 K    (27.8)

Making such a setting, i.e., using a nominal value of the scale of 1 · 10^22 or more, for such a problem is not obvious and hard to justify. Since "physical correctness" is not a goal in optimization (while finding good solutions in an easy way is), kB is thus eliminated from the computations in the Simulated Annealing algorithm. Leaving the constant kB away leads to

P = 0.001 = e^(−1/T)    (27.9)
ln 0.001 = −1/T    (27.10)
T = −1/ln 0.001    (27.11)
T ≈ 0.145    (27.12)

In other words, a temperature of around 0.15 would be sufficient to achieve an acceptance probability of around 0.001. Such a setting is much easier to grasp and allows the algorithm user to relate objective values, objective value differences, temperatures, and probabilities in a more straightforward way. If you take a close look at Equation 27.8 and Equation 27.12, you will spot another difference: Equation 27.8 contains the unit K whereas Equation 27.12 does not. The reason is that the absence of kB in Equation 27.12 removes the physical context and hence, using the unit Kelvin is no longer appropriate.
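The two temperature values derived above can be checked numerically in two lines; the following tiny Java snippet is an illustrative verification of Equations 27.8 and 27.12 (class name made up).

/** Quick numerical check of Equations 27.8 and 27.12. */
public class TemperatureForAcceptance {
  public static void main(String[] args) {
    double kB = 1.380650524e-23;
    System.out.println(-1.0 / (kB * Math.log(0.001))); // approx. 1.05e22 (with Boltzmann's constant)
    System.out.println(-1.0 / Math.log(0.001));        // approx. 0.145  (without it)
  }
}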

27.2.2 Convergence

It has been shown that Simulated Annealing algorithms with appropriate cooling strategies will asymptotically converge to the global optimum [1403, 2043]: the probability that the individual pcur as used in Algorithm 27.1 represents a global optimum approaches 1 for t → ∞. Nolte and Schrader [2043] and van Laarhoven and Aarts [2776] provide lists of the most important works on the issue of guaranteed convergence to the global optimum. Nolte and Schrader [2043] further list the research providing deterministic, non-infinite boundaries for the asymptotic convergence by Anily and Federgruen [111], Gidas [1054], Nolte and Schrader [2042], and Mitra et al. [1917]. They conclude that, in general, pcur.x in the Simulated Annealing process is a globally optimal candidate solution with a high probability only after a number of iterations which exceeds the cardinality of the problem space. Indeed, this makes sense: If this was not the case, Simulated Annealing could be used to solve NP-complete problems in sub-exponential time. In the same paper, they introduce a significantly lower bound. This is a boundary which is, well, slow from the viewpoint of a practitioner [1403]: In other words, it would be faster to enumerate all possible candidate solutions in order to find the global optimum with certainty than applying Simulated Annealing. This does not mean that Simulated Annealing is generally slow. It only needs that much time if we insist on the convergence to the global optimum.

This feature is closely related to the property of ergodicity5 of an optimization algorithm:

Definition D27.1 (Ergodicity). An optimization algorithm is ergodic if its result x ∈ X (or its set of results X ⊆ X) does not depend on the point(s) in the search space at which the search process is started.

Speeding up the cooling process will result in a faster search on one hand, but voids the guaranteed convergence on the other hand. Such speeded-up algorithms are called Simulated Quenching (SQ) [1058, 1402, 2405]. Quenching6 is a process in metallurgy in which a work piece is first heated and then cooled rapidly, often by holding it into a water bath; this way, steel objects can be hardened, for example. Technically, we should then no longer speak about temperature, but for the sake of simplicity and compliance with literature, we will keep that name during our discussion of Simulated Annealing.

27.3 Temperature Scheduling

Definition D27.2 (Temperature Schedule). The temperature schedule defines how the temperature parameter T in the Simulated Annealing process (as defined in Algorithm 27.1) is set. The operator getTemperature : N1 → R+ maps the current iteration index t to a (positive) real temperature value T:

getTemperature(t) ∈ (0, +∞) ∀t ∈ N1    (27.13)

5 http://en.wikipedia.org/wiki/Ergodicity [accessed 2010-09-26]
6 http://en.wikipedia.org/wiki/Quenching [accessed 2010-10-11]

Figure 27.2: Different temperature schedules for Simulated Annealing. (The plot compares logarithmic scaling, exponential scaling for two values of ε, polynomial scaling with α = 2, and linear scaling, i.e., polynomial scaling with α = 1, over t = 0..100 iterations, starting from Ts.)

As already mentioned, the temperature schedule has major influence on the probability that the Simulated Annealing algorithm will find the global optimum. If the temperature sinks too fast, the ergodicity of the search may be lost and the global optimum may not be discovered. In order to avoid that the resulting Simulated Quenching algorithm gets stuck at a local optimum, random restarting can be applied, which already has been discussed in the context of Hill Climbing in Section 26.5 on page 232. Miki et al. [1896], for example, used Genetic Algorithms for determining good temperature schedules.

There exists a wide range of methods to determine the temperature schedule, but a few general rules hold. All schedules start with a temperature Ts which is greater than zero:

getTemperature(1) = Ts > 0    (27.14)

If the number of iterations t approaches infinity, the temperature should approach 0:

lim (t → +∞) getTemperature(t) = 0    (27.15)

The operator "getTemperature" of Equation 27.13 is provided as a Java interface in Listing 55.12 on page 717. The most established methods are listed below; some examples of them are sketched in Figure 27.2.

27.3.1 Logarithmic Scheduling

Scale the temperature logarithmically, where Ts is set to a value larger than the greatest difference of the objective value of a local minimum and its best neighboring candidate solution:

getTemperature_ln(t) = Ts if t < e, and Ts / ln t otherwise    (27.16)

This value usually has to be estimated since it is not known. [1167, 2043]

27.3.2 Exponential Scheduling

Reduce T to (1 − ε)T in each step (or every m steps, see Equation 27.19), where the exact value of 0 < ε < 1 is determined by experiment [2808]. Such an exponential cooling strategy will turn Simulated Annealing into Simulated Quenching [1403].

getTemperature_exp(t) = (1 − ε)^t · Ts    (27.17)

27.3.3 Polynomial and Linear Scheduling

T is polynomially scaled between Ts and 0 for t̆ iterations according to Equation 27.18 [2808]. α is a constant, maybe 1, 2, or 4, that depends on the positions and objective values of the local minima. Large values of α will spend more iterations at lower temperatures. For α = 1, the schedule proceeds in linear steps.

getTemperature_poly(t) = (1 − t/t̆)^α · Ts    (27.18)

27.3.4 Adaptive Scheduling

Instead of basing the temperature solely on the iteration index t, other information may be used for self-adaptation as well. After every m moves, the temperature T can be set to β times ∆Ec = f(pcur.x) − f(x), where β is an experimentally determined constant, f(pcur.x) is the objective value of the currently examined candidate solution pcur.x, and f(x) is the objective value of the best phenotype x found so far. Since ∆Ec may be 0, we limit the temperature change to a maximum of T · γ with 0 < γ < 1. [2808]

27.3.5 Larger Step Widths

If the temperature reduction should not take place continuously but only every m ∈ N1 iterations, t′ computed according to Equation 27.19 can be used in the previous formulas, i.e., getTemperature(t′) instead of getTemperature(t).

t′ = m⌊t/m⌋    (27.19)

Implementations of these scheduling strategies can be found in Listing 56.9 on page 743 and Listing 56.10 on page 744.
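The schedules above translate directly into code. The following Java sketch shows one possible set of implementations of Equations 27.16 to 27.19; it is an illustrative example with made-up names, not the book's Listing 56.9 or 56.10.

/** Illustrative implementations of the temperature schedules of Equations 27.16 to 27.19. */
public class TemperatureSchedules {
  static double logarithmic(long t, double ts) {
    return (t < Math.E) ? ts : ts / Math.log(t);             // Equation 27.16
  }
  static double exponential(long t, double ts, double epsilon) {
    return Math.pow(1.0 - epsilon, t) * ts;                  // Equation 27.17
  }
  static double polynomial(long t, long tMax, double ts, double alpha) {
    return Math.pow(1.0 - ((double) t) / tMax, alpha) * ts;  // Equation 27.18
  }
  static long largerSteps(long t, long m) {
    return m * (t / m);                                      // Equation 27.19 (integer division = floor)
  }
}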

27.4 General Information on Simulated Annealing

27.4.1 Applications and Examples

Table 27.1: Applications and Examples of Simulated Annealing.
Areas: Combinatorial Problems, Computer Graphics, Data Mining, Databases, Distributed Algorithms and Systems, Economics and Finances, Engineering, Function Optimization, Graph Theory, Logistics, Mathematics, Military and Defense, Physics, Security, Software, Wireless Communication.
References: [63, 111, 154, 190, 227, 308, 344, 382, 433, 437, 516, 616, 667, 771, 1018, 1043, 1058, 1059, 1060, 1071, 1403, 1454, 1455, 1486, 1541, 1550, 1674, 1896, 1926, 2043, 2058, 2068, 2105, 2171, 2180, 2237, 2250, 2405, 2486, 2631, 2632, 2640, 2707, 2770, 2793, 2901, 2966, 2994, 3002].

27.4.2 Books

1. Genetic Algorithms and Simulated Annealing [694]
2. Simulated Annealing: Theory and Applications [2776]
3. Electromagnetic Optimization by Genetic Algorithms [2250]
4. Facts, Conjectures, and Improvements for Simulated Annealing [2369]
5. Simulated Annealing for VLSI Design [2968]
6. Introduction to Stochastic Search and Optimization [2561]
7. Simulated Annealing [2966]

27.4.3 Conferences and Workshops

Table 27.2: Conferences and Workshops on Simulated Annealing.

MIC: Metaheuristics International Conference
See Table 25.2 on page 227.

HIS: International Conference on Hybrid Intelligent Systems
See Table 10.2.

NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization
See Table 10.2.

SEA: International Symposium on Experimental Algorithms
See Table 10.2.

SMC: International Conference on Systems, Man, and Cybernetics
See Table 10.2.

ALIO/EURO: Workshop on Applied Combinatorial Optimization
See Table 10.2.

EngOpt: International Conference on Engineering Optimization
See Table 10.2 on page 132.

GICOLAG: Global Optimization – Integrating Convexity, Optimization, Logic Programming, and Computational Algebraic Geometry
See Table 10.2.

Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna
See Table 10.2.

LION: Learning and Intelligent OptimizatioN
See Table 10.2.

LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction
See Table 10.2.

META: International Conference on Metaheuristics and Nature Inspired Computing
See Table 25.2 on page 228.

WOPPLOT: Workshop on Parallel Processing: Logic, Organization and Technology
See Table 10.2.

27.4.4 Journals

1. Journal of Heuristics published by Springer Netherlands

Tasks T27

68. Perform the same experiments as in Task 64 on page 238 with Simulated Annealing. Therefore, use the Java Simulated Annealing implementation given in Listing 56.8 on page 740 or develop an equivalent own implementation.

69. Compare your results from Task 64 on page 238 and Task 68. What are the differences? What are the similarities? Did you use different experimental settings? If so, what happens if you use the same experimental settings for both tasks? Try to give reasons for your results.

70. Perform the same experiments as in Task 67 on page 238 with Simulated Annealing and Simulated Quenching. Therefore, use the Java Simulated Annealing implementation given in Listing 56.8 on page 740 or develop an equivalent own implementation.

71. Compare your results from Task 67 on page 238 and Task 70. What are the differences? What are the similarities? Did you use different experimental settings? If so, what happens if you use the same experimental settings for both tasks? Try to give reasons for your results.

72. Compute the probabilities that a new solution with ∆E1 = 0.0, ∆E2 = 0.001, ∆E3 = 0.01, ∆E4 = 0.1, ∆E5 = 1.0, ∆E6 = 10, and ∆E7 = 100 is accepted by the Simulated Annealing algorithm at temperatures T1 = 0.5, T2 = 1, T3 = 10, and T4 = 100. Base your calculations on the formulas used in Algorithm 27.1 and consider the discussion in Section 27.2.1.

73. Compute the temperatures needed so that solutions with ∆E1 = 0.0, ∆E2 = 0.001, ∆E3 = 0.01, ∆E4 = 0.1, ∆E5 = 1.0, ∆E6 = 10, and ∆E7 = 100 are accepted by the Simulated Annealing algorithm with probabilities P1 = 0.25, P2 = 0.1, P3 = 0.01, and P4 = 0.001. Use the same formulas as in Algorithm 27.1 (and consider the discussion in Section 27.2.1).

74. Try solving the bin packing problem again, this time by using the representation given by Khuri, Schütz, and Heitkötter in [1534]. Use both Simulated Annealing and Hill Climbing to evaluate the performance of the new representation, similar to what we do in Section 57.1.2.1 on page 887. Compare the results with the results listed in that section and obtained by you in Task 70. Which method is better? Provide reasons for your observation. Put your explanations into the context of the discussions provided in Example E17.4 on page 181.
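For Tasks 72 and 73, the acceptance probabilities can simply be tabulated. The snippet below is an illustrative Java helper that prints the probabilities for the example values used above, applying the rule of Line 15 of Algorithm 27.1 (i.e., without Boltzmann's constant); the class name is made up.

/** Illustrative helper for Tasks 72 and 73: acceptance probabilities e^(-dE/T). */
public class AcceptanceTable {
  public static void main(String[] args) {
    double[] deltaE = {0.0, 0.001, 0.01, 0.1, 1, 10, 100};
    double[] temps = {0.5, 1, 10, 100};
    for (double t : temps)
      for (double dE : deltaE)
        System.out.printf("T=%6.2f dE=%7.3f P=%.6f%n", t, dE, (dE <= 0) ? 1.0 : Math.exp(-dE / t));
  }
}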

Chapter 28 Evolutionary Algorithms

28.1 Introduction

Definition D28.1 (Evolutionary Algorithm). Evolutionary Algorithms1 (EAs) are population-based metaheuristic optimization algorithms which use mechanisms inspired by biological evolution, such as mutation, crossover, natural selection, and survival of the fittest, in order to refine a set of candidate solutions iteratively in a cycle.

Evolutionary Algorithms have not been invented in a canonical form. Instead, they are a large family of different optimization methods which have, to a certain degree, been developed independently by different researchers [167, 171, 172]. Information on their historic development is thus largely distributed amongst the chapters of this book which deal with the specific members of this family of algorithms. Like other metaheuristics, all EAs share the "black box" optimization character and make only few assumptions about the underlying objective functions. Their consistent and high performance in many different problem domains (see, e.g., Table 28.4 on page 314) and the appealing idea of copying natural evolution for optimization made them one of the most popular metaheuristics. EAs are inspired by natural evolution, so we will compare the natural paragons of the single steps of their cycle in detail in Section 28.1.2.

As in most metaheuristic optimization algorithms, we can distinguish between single-objective and multi-objective variants. The latter means that multiple, possibly conflicting criteria are optimized as discussed in Section 3.3.

Definition D28.2 (Multi-Objective Evolutionary Algorithm). A multi-objective Evolutionary Algorithm (MOEA) is an Evolutionary Algorithm suitable to optimize multiple criteria at once [595, 596, 737, 740, 954, 1964, 2783].

The general area of Evolutionary Computation that deals with multi-objective optimization is called EMOO, evolutionary multi-objective optimization, and the corresponding algorithms are called multi-objective Evolutionary Algorithms. In this section, we will first outline the basic cycle in which Evolutionary Algorithms proceed. Later in this chapter, we will outline possible algorithms which can be used for the processes in EAs and finally give some information on conferences, journals, and books on EAs as well as their applications.

1 http://en.wikipedia.org/wiki/Artificial_evolution [accessed 2007-07-03]

28.1.1 The Basic Cycle of EAs

All Evolutionary Algorithms proceed in principle according to the scheme illustrated in Figure 28.1, which is a specialization of the basic cycle of metaheuristics given in Figure 1.8 on page 35.

Figure 28.1: The basic cycle of Evolutionary Algorithms. (Initial Population: create an initial population of random individuals; GPM: apply the genotype-phenotype mapping and obtain the phenotypes; Evaluation: compute the objective values of the solution candidates; Fitness Assignment: use the objective values to determine fitness values; Selection: select the fittest individuals for reproduction; Reproduction: create new individuals from the mating pool by crossover and mutation.)

1. In the first generation (t = 1), a population pop of ps individuals p is created. The function createPop(ps) produces this population. It depends on its implementation whether the individuals p have random genomes p.g (the usual case) or whether the population is seeded.

2. The genotypes p.g are translated to phenotypes p.x. This genotype-phenotype mapping may either be performed directly for all individuals (performGPM) in the population or can be a part of the process of evaluating the objective functions.

3. The values of the objective functions f ∈ f are evaluated for each candidate solution p.x in pop (and usually stored in the individual records) with the operation "computeObjectives". This evaluation may incorporate complicated simulations and calculations. In single-objective Evolutionary Algorithms, f = {f} and, hence, |f| = 1 holds.

4. With the objective functions, the utilities of the different features of the candidate solutions p.x have been determined and a scalar fitness value v(p) can now be assigned to each of them. In single-objective Evolutionary Algorithms, v ≡ f holds.

5. A subsequent selection process selection(pop, v, mps) (or selection(pop, f, mps), if v ≡ f, as is the case in single-objective Evolutionary Algorithms) filters out the candidate solutions with bad fitness and allows those with good fitness to enter the mating pool with a higher probability.

6. In the reproduction phase, offspring are derived from the genotypes p.g of the selected individuals p ∈ mate by applying the search operations searchOp ∈ Op (which are called reproduction operations in the context of EAs). There usually are two different reproduction operations: mutation, which modifies one genotype, and crossover, which combines two genotypes to a new one. Whether the whole population is replaced by the offspring or whether they are integrated into the population, as well as which individuals are recombined with each other, depends on the implementation of the operation "reproducePop".

7. If the terminationCriterion is met, the evolution stops here. Otherwise, the algorithm continues at Point 2.

We define the basic EA for single-objective optimization in Algorithm 28.1 and the multi-objective version in Algorithm 28.2.

Algorithm 28.1: X ←− simpleEA(f, ps, mps)
Input: f: the objective/fitness function
Input: ps: the population size
Input: mps: the mating pool size
Data: t: the generation counter
Data: pop: the population
Data: mate: the mating pool
Data: continue: a Boolean variable used for termination detection
Output: X: the set of the best elements found

begin
  t ←− 1
  pop(t = 1) ←− createPop(ps)
  continue ←− true
  while continue do
    pop(t) ←− performGPM(pop(t), gpm)
    pop(t) ←− computeObjectives(pop(t), f)
    if terminationCriterion then
      continue ←− false
    else
      mate ←− selection(pop(t), f, mps)
      t ←− t + 1
      pop(t) ←− reproducePop(mate, ps)
  return extractPhenotypes(extractBest(pop(t), f))
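To illustrate how these abstract operations fit together, the following Java sketch implements the cycle of Algorithm 28.1 for real-valued genomes with truncation selection, uniform crossover, and Gaussian mutation. It is a simplified, assumption-based example (all names are made up, and it returns the best individual of the last generation rather than maintaining a best-so-far archive); it is not the book's Listing 56.11.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Illustrative single-objective EA cycle for real-valued genomes. */
public class SimpleEA {
  static final Random RAND = new Random();

  static double[] evolve(java.util.function.ToDoubleFunction<double[]> f,
                         int n, int ps, int mps, int generations) {
    List<double[]> pop = new ArrayList<>();
    for (int i = 0; i < ps; i++) pop.add(randomGenome(n));          // createPop

    for (int t = 1; t <= generations; t++) {
      pop.sort(Comparator.comparingDouble(f::applyAsDouble));       // evaluation + ranking
      List<double[]> mate = new ArrayList<>(pop.subList(0, mps));   // truncation selection
      List<double[]> next = new ArrayList<>();
      while (next.size() < ps) {                                    // reproducePop
        double[] a = mate.get(RAND.nextInt(mps));
        double[] b = mate.get(RAND.nextInt(mps));
        next.add(mutate(crossover(a, b)));
      }
      pop = next;
    }
    pop.sort(Comparator.comparingDouble(f::applyAsDouble));
    return pop.get(0);                                              // extractBest
  }

  static double[] randomGenome(int n) {
    double[] g = new double[n];
    for (int i = 0; i < n; i++) g[i] = 10 * RAND.nextDouble() - 5;
    return g;
  }
  static double[] crossover(double[] a, double[] b) {               // uniform crossover
    double[] c = new double[a.length];
    for (int i = 0; i < c.length; i++) c[i] = RAND.nextBoolean() ? a[i] : b[i];
    return c;
  }
  static double[] mutate(double[] g) {                              // Gaussian mutation of one gene
    g[RAND.nextInt(g.length)] += 0.1 * RAND.nextGaussian();
    return g;
  }
}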

Algorithm 28.2: X ←− simpleMOEA(OptProb, ps, mps)
  Input: OptProb: a tuple (X, f, cmpf) of the problem space X, the objective functions f, and a comparator function cmpf
  Input: ps: the population size
  Input: mps: the mating pool size
  Data: t: the generation counter
  Data: pop: the population
  Data: mate: the mating pool
  Data: v: the fitness function resulting from the fitness assignment process
  Data: continue: a Boolean variable used for termination detection
  Output: X: the set of the best elements found
   1  begin
   2    t ←− 1
   3    pop(t = 1) ←− createPop(ps)
   4    continue ←− true
   5    while continue do
   6      pop(t) ←− performGPM(pop(t), gpm)
   7      pop(t) ←− computeObjectives(pop(t), f)
   8      if terminationCriterion then
   9        continue ←− false
  10      else
  11        v ←− assignFitness(pop(t), cmpf)
  12        mate ←− selection(pop(t), v, mps)
  13        t ←− t + 1
  14        pop(t) ←− reproducePop(mate, ps)
  15    return extractPhenotypes(extractBest(pop, cmpf))

28.1.2 Biological and Artificial Evolution

The concepts of Evolutionary Algorithms are strongly related to the way in which the development of species in nature could proceed.

We will outline the historic ideas behind evolution and put the steps of a multi-objective Evolutionary Algorithm into this context.

28.1.2.1 The Basic Principles from Nature

In 1859, Darwin [684] published his book "On the Origin of Species"2, in which he identified the principles of natural selection and survival of the fittest as driving forces behind the biological evolution. His theory can be condensed into ten observations and deductions [684, 1845, 2938]:

1. The individuals of a species possess great fertility and produce more offspring than can grow into adulthood.
2. Under the absence of external influences (like natural disasters, human beings, etc.), the population size of a species roughly remains constant.
3. Again, if no external influences occur, the food resources are limited but stable over time.
4. Since the individuals compete for these limited resources, a struggle for survival ensues.
5. Especially in sexually reproducing species, no two individuals are equal.
6. Some of the variations between the individuals will affect their fitness and, hence, their ability to survive.
7. Some of these variations are inheritable.
8. Individuals which are less fit are also less likely to reproduce. The fittest individuals will survive and produce offspring more probably.
9. Individuals that survive and reproduce will likely pass on their traits to their offspring.
10. A species will slowly change and adapt more and more to a given environment during this process, which may finally even result in new species.

These historic ideas themselves remain valid to a large degree even today. However, some of Darwin's specific conclusions can be challenged based on data and methods which simply were not available at his time. Point 10, for instance – the assumption that change happens slowly over time – became controversial, as fossils indicate that evolution seems to be "at rest" for longer times which alternate with short periods of sudden, fast developments [473, 2778] (see Section 41.2.2 on page 493).

28.1.2.2 Initialization

28.1.2.2.1 Biology

Until now, it is not clear how life started on earth. It is assumed that before the first organisms emerged, an abiogenesis3, a chemical evolution, took place in which the chemical components of life formed. This process was likely divided into three steps: (1) In the first step, simple organic molecules such as alcohols, acids, purines4 (such as adenine and guanine), and pyrimidines5 (such as thymine, cytosine, and uracil) were synthesized from inorganic substances.

2 http://en.wikipedia.org/wiki/The_Origin_of_Species [accessed 2007-07-03]
3 http://en.wikipedia.org/wiki/Abiogenesis [accessed 2009-08-05]
4 http://en.wikipedia.org/wiki/Purine [accessed 2009-08-05]
5 http://en.wikipedia.org/wiki/Pyrimidine [accessed 2009-08-05]

(2) Then, simple organic molecules were combined to the basic components of complex organic molecules such as monosaccharides, amino acids6, pyrroles, fatty acids, and nucleotides7. (3) Finally, complex organic molecules emerged. Many different theories exist on how this could have taken place [176, 2432]. An important milestone in the research in this area is likely the Miller-Urey experiment8 [1904, 1905], where it was shown that amino acids can form from inorganic substances under conditions such as those expected to be present in the early time of the earth. The first single-celled life most probably occurred in the Eoarchean9, the earth age 3.8 billion years ago, marking the onset of evolution.

28.1.2.2.2 Evolutionary Algorithms

Usually, when an Evolutionary Algorithm starts up, there exists no information about what is good or what is bad. Basically, only random genes are coupled together as individuals in the initial population. This form of initialization can be considered to be similar to the early steps of chemical and biological evolution, where it was unclear which substances or primitive life forms would be successful and nature experimented with many different reactions and concepts.

Besides random initialization, Evolutionary Algorithms can also be seeded [1126, 1139, 1300, 1471, 1698, 1713, 1837, 2084, 2091]. If seeding is applied, a set of individuals which already are adapted to the optimization criteria is inserted into the initial population. The candidate solutions used for this may either have been obtained manually or by a previous optimization step [1737, 2019]. The goal of seeding is to achieve faster convergence and better results in the evolutionary process. However, it also raises the danger of premature convergence (see Chapter 13 on page 151), since the already-adapted candidate solutions will likely be much more competitive than the random ones and may thus wipe them out of the population. Sometimes the entire first population consists of such individuals with good characteristics only. Care must thus be taken in order to achieve the wanted performance improvement.

28.1.2.3 Genotype-Phenotype Mapping

28.1.2.3.1 Biology

TODO

Example E28.1 (Evolution of Fish). In the following, let us discuss evolution using the example of fish. We can consider a fish as an instance of its heredity information, i.e., as one possible realization of the blueprint encoded in its DNA.

28.1.2.3.2 Evolutionary Algorithms

At the beginning of every generation in an EA, each genotype p.g is instantiated as a new phenotype p.x = gpm(p.g). Genotype-phenotype mappings are discussed in detail in Section 28.2.

6 http://en.wikipedia.org/wiki/Amino_acid [accessed 2009-08-05]
7 http://en.wikipedia.org/wiki/Nucleotide [accessed 2009-08-05]
8 http://en.wikipedia.org/wiki/MillerUrey_experiment [accessed 2009-08-05]
9 http://en.wikipedia.org/wiki/Eoarchean [accessed 2007-07-03]
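As a small illustration of the seeded initialization described in Section 28.1.2.2.2 above, consider the following Java sketch. The method name createSeededPop and the representation of genotypes as double[] vectors are assumptions made for this example only; they are not part of the book's source code.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    /** Illustrative sketch of seeding an initial population (names are hypothetical). */
    public class SeededInit {
      static final Random RND = new Random();

      /** Creates ps individuals: the given seeds first, the rest random genotypes. */
      static List<double[]> createSeededPop(int ps, int dim, List<double[]> seeds) {
        List<double[]> pop = new ArrayList<>();
        for (double[] s : seeds) {
          if (pop.size() >= ps) break;
          pop.add(s.clone());               // insert already-adapted candidate solutions
        }
        while (pop.size() < ps) {           // fill up with random genotypes
          double[] g = new double[dim];
          for (int j = 0; j < dim; j++) g[j] = RND.nextDouble();
          pop.add(g);
        }
        return pop;
      }
    }

Keeping the fraction of seeded individuals small is one simple way to reduce the risk of premature convergence mentioned above.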

28.1.2.4 Objective Values

28.1.2.4.1 Biology

There are different criteria which determine whether an organism can survive in its environment. Often, different characteristics determine the fitness of an individual.

Example E28.2 (Evolution of Fish (Example E28.1) Cont.). The survival of the genes of the fish depends on how good the fish performs in the ocean. Its fitness, however, is not only determined by one single feature of the phenotype like its size. Although a bigger fish will have better chances to survive, it will usually also have a higher energy consumption. Its energy consumption in terms of calories per time unit necessary to sustain its life functions should be low, so that it does not need to eat all the time. Other factors influencing the fitness positively are formae like sharp teeth and colors that blend into the environment so that it cannot be seen too easily by its predators, such as sharks. So there may be conflicts between the desired properties: If its camouflage is too good, how will it find potential mating partners? And if it is big, size alone does not help if it is too slow to catch any prey. Hence, we could consider the life of the fish as the evaluation process of its genotype in an environment where good qualities in one aspect can turn out to be drawbacks from other perspectives. Usually, individuals are good in some characteristics but rarely reach perfection in all possible aspects.

28.1.2.4.2 Evolutionary Algorithms

In Evolutionary Algorithms, this is exactly the same. For each problem that we want to solve, we can specify one or multiple so-called objective functions f ∈ f. An objective function f represents one feature that we are interested in.

Example E28.3 (Evolution of a Car). Let us assume that we want to evolve a car (a pretty weird assumption, but let's stick with it). The genotype p.g ∈ G would be the construction plan and the phenotype p.x ∈ X the real car, or at least a simulation of it. One objective function fa would definitely be safety. For the sake of our children and their children, the car should also be environment-friendly, so that's our second objective function fb. Furthermore, a cheap price fc, fast speed fd, and a cool design fe would be good. That makes five objective functions, from which, for example, the second and the fourth are contradictory (fb ≁ fd).

28.1.2.5 Fitness

28.1.2.5.1 Biology

As stated before, different characteristics determine the fitness of an individual. The biological fitness is a scalar value that describes the number of offspring of an individual [2542]. Fitness, however, always depends on the environment.

Example E28.4 (Evolution of Fish (Example E28.1) Cont.). In Example E28.2, the fish genome was instantiated and proved its physical properties in several different categories. Whether one specific fish is fit or not, however, is still not easy to say: fitness is always relative. The fitness of the fish is not only defined by its own characteristics but also results from the features of the other fish in the population (as well as from the abilities of its prey and predators). If one fish can beat another one in all categories, i.e., is bigger, stronger, smarter, and so on, it is better adapted to its environment and thus will have a higher chance to survive. This relation is transitive but only forms a partial order, since a fish which is strong but not very clever and a fish that is clever but not strong may have the same probability to reproduce and are hence not directly comparable. It is not clear whether a weak fish with a clever behavioral pattern is worse or better than a really strong but less cunning one. Both traits are useful in the evolutionary process, and maybe one fish of the first kind will sometimes mate with one of the latter and produce an offspring which is both.

Example E28.5 (Computer Science and Sports). Let us again assume fitness to be a measure for the expected number of offspring. In my department, I may be considered as average fit, i.e., intelligent and powerful. If I took a stroll to the department of sports science, my relative physical suitability would decrease. Yet, my relative computer science skills would become much better. Fitness depends on the environment.

28.1.2.5.2 Evolutionary Algorithms

The fitness assignment process in multi-objective Evolutionary Algorithms computes one scalar fitness value for each individual p in a population pop by building a fitness function v (see Definition D5.1 on page 94). It performs comparisons similar to the ones we just discussed. Instead of being an a posteriori measure of the number of offspring of an individual, as in the biological notion sketched in Paragraph 28.1.2.5.1, fitness here rather describes the relative quality of a candidate solution, i.e., its ability to produce offspring [634, 2987]. Optimization criteria may contradict, so the basic situation in optimization very much corresponds to the one in biology. One of the most popular methods for computing the fitness is called Pareto ranking10. Some popular fitness assignment processes are given in Section 28.3.

28.1.2.6 Selection

28.1.2.6.1 Biology

Selection in biology has many different facets [2945]. How fit an individual is does not necessarily determine directly whether it can produce offspring and, if so, how many. There are two more aspects which have influence on this: the environment (including the rest of the population) and the potential mating partners. These factors are the driving forces behind the three selection steps in natural selection11: (1) environmental selection12 (also survival selection or ecological selection), which determines whether an individual survives to reach fertility [684], (2) sexual selection, i.e., the choice of the mating partner(s) [657, 2987], and (3) parental selection – possible choices of the parents against pregnancy or newborn [2300].

10 Pareto comparisons are discussed in Section 3.3.5 on page 65 and Pareto ranking is introduced in Section 28.3.3.
11 http://en.wikipedia.org/wiki/Natural_selection [accessed 2010-08-03]
12 http://en.wikipedia.org/wiki/Ecological_selection [accessed 2010-08-03]

Environmental selection is the form of selection propagated by Darwin [684] in his seminal book "On the Origin of Species".

Example E28.6 (Evolution of Fish (Example E28.1) Cont.). The degree to which a fish is adapted to its environment can therefore be considered as some sort of basic probability which determines its expected chances to survive the environmental selection. The process of selection is always stochastic, without guarantees – even a fish that is small, slow, and lacks any sophisticated behavior might survive and could produce even more offspring than a highly fit one. An intelligent fish may be eaten by a shark, a strong one can die from a disease, and even the most perfect fish could die from a lightning strike before ever reproducing.

The sexual or mating selection describes the fact that survival alone does not guarantee reproduction: At least for sexually reproducing species, two individuals are involved in this process. Hence, even a fish which is perfectly adapted to its environment will have low chances to create offspring if it is unable to attract female interest. Sexual selection is considered to be responsible for much of nature's beauty, such as the colorful wings of a butterfly or the tail of a peacock [657, 2300]. Indeed, parallels can be drawn between human behavior and peacocks, which prefer mating partners with colorful feathers over others regardless of their further qualities.

Example E28.7 (Computer Science and Sports (Example E28.5) Cont.). For today's noble computer scientist, surprisingly, survival selection is not the threatening factor. Our relative fitness is on par with the fitness of the guys from the sports department, at least in terms of the trade-off between theoretical and physical abilities. Yet, the sexual selection is less friendly to some of us.

The third selection factor, parental selection, takes effect after the fertilization: before or after birth, parents may make a choice for or against the life of their offspring, as argued by Rich Harris [2300]. If the offspring does or is likely to exhibit certain unwanted traits, this decision may be negative. A negative decision may also be caused by environmental or economical conditions. It may, however, also again be overruled after birth if the offspring turns out to possess especially favorable characteristics. Parental selection may lead to examples for the reinforcement of certain physical characteristics which are much more extreme than those caused by sexual selection: Rich Harris [2300], for instance, argues that parents in prehistoric European and Asian cultures may this way have actively selected hairlessness and pale skin.

28.1.2.6.2 Evolutionary Algorithms

In Evolutionary Algorithms, survival selection is always applied in order to steer the optimization process towards good areas in the search space. For survival selection, the so-called selection algorithms mate = selection(pop, v, mps) determine the mps individuals from the population pop which should enter the mating pool mate, based on their scalar fitness v. selection(pop, v, mps) usually picks the fittest individuals and places them into mate with the highest probability. The oldest selection scheme is called Roulette wheel and will be introduced in Section 28.4.3 on page 290 in detail. In the original version of this algorithm (intended for fitness maximization), the chance of an individual p to reproduce is proportional to its fitness v(p). Sometimes, also a subsequent sexual selection is used. More information on selection algorithms is given in Section 28.4 on page 290.
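To illustrate the fitness-proportionate idea behind the Roulette wheel scheme just mentioned, here is a small Java sketch in its original, fitness-maximizing form; the book's precise algorithm follows in Section 28.4.3, and the class and method names here are illustrative only.

    import java.util.Random;

    /** Sketch of fitness-proportionate (Roulette wheel) selection for fitness maximization. */
    public class RouletteWheel {
      /** Returns the index of one selected individual; fitness[i] must be >= 0. */
      static int spin(double[] fitness, Random rnd) {
        double sum = 0;
        for (double v : fitness) sum += v;
        if (sum <= 0) return rnd.nextInt(fitness.length); // degenerate case: uniform choice
        double r = rnd.nextDouble() * sum;                // random point on the wheel
        double acc = 0;
        for (int i = 0; i < fitness.length; i++) {
          acc += fitness[i];
          if (r < acc) return i;                          // slice proportional to fitness[i]
        }
        return fitness.length - 1;                        // numerical fallback
      }
    }

Filling the mating pool simply means calling spin mps times. Since this book treats fitness as subject to minimization, a real implementation would first transform the values, e.g., by using the difference to the worst fitness in the population.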

28.1.2.7 Reproduction

28.1.2.7.1 Biology

Last but not least, there is the reproduction, a process during which new individuals are created from existing ones. For this purpose, two approaches exist in nature: the sexual and the asexual method. Asexually reproducing species basically create clones of themselves. Fish, however, reproduce sexually.

Example E28.8 (Evolution of Fish (Example E28.1) Cont.). Whenever a female fish and a male fish mate, their genes will be recombined by crossover. Furthermore, mutations may take place which slightly alter the DNA. Most often, these mutations affect the characteristics of the resulting larva only slightly [2303]. Since fit fish produce offspring with higher probability, there is a good chance that the next generation will contain at least some individuals that have combined good traits from their parents and perform even better than them. Now a fish can be clever and strong at the same time. Regardless of how they were produced, the new generation of offspring then enters the struggle for survival and the cycle of evolution starts in another round.

28.1.2.7.2 Evolutionary Algorithms

In Evolutionary Algorithms, individuals usually do not have such a "gender": Each individual from the mating pool can potentially be recombined with every other one. In the car example, this means that we could copy the design of the engine of one car and place it into the construction plan of the body of another car. The concept of recombination, i.e., of binary search operations, is one of the central ideas of this metaheuristic. The concept of mutation is present in EAs as well: Parts of the genotype (and hence, properties of the phenotype) can randomly be changed with a unary search operation. This way, new construction plans for new cars are generated. The crossover and mutation operations may be considered to be strongly simplified models of their natural counterparts. The common definitions for reproduction operations in Evolutionary Algorithms are given in Section 28.5.

28.1.2.8 Does the Natural Paragon Fit?

At this point it should be mentioned that the direct reference to Darwinian evolution in Evolutionary Algorithms is somewhat controversial. The strongest argument against this reference is that Evolutionary Algorithms introduce a change in semantics by being goal-driven [2772], while the natural evolution is not. In many advanced EAs there may, furthermore, be operations which work entirely differently from their natural counterparts: Differential Evolution, for instance, features a ternary search operation (see Chapter 33). Another example is [2912, 2914], where the search operations directly modify the phenotypes (G = X) according to heuristics and intelligent decisions.

An argument against a close correspondence between EAs and natural evolution concerns the development of different species. Paterson [2134] points out that "neither GAs [Genetic Algorithms] nor GP [Genetic Programming] are concerned with the evolution of new species, nor do they use natural selection." It depends on the definition of species: According to Wikipedia – The Free Encyclopedia [2938], "a species is a class of organisms which are very similar in many aspects such as appearance, physiology, and genetics." In principle, there is some elbowroom for us, and we may indeed consider even different solutions to a single problem in Evolutionary Algorithms as members of different species – especially if the binary search operation crossover/recombination applied to their genomes cannot produce another valid candidate solution. On the other hand, nobody would claim that the idea of selection has not been borrowed from nature, although many additions and modifications have been introduced in favor of better algorithmic performance.

My personal opinion (which may as well be wrong) is that the citation of Darwin here is well motivated, since there are close parallels between Darwinian evolution and Evolutionary Algorithms. Nonetheless, natural and artificial evolution are still two different things, and phenomena observed in either of the two do not necessarily carry over to the other.

One such difference concerns the notion of fitness. In Evolutionary Algorithms, fitness is an a priori quantity denoting a value that determines the expected number of instances of a genotype that should survive the selection process. In nature, although the concept of fitness13 is controversial [2542], it is often considered as an a posteriori measurement. It then defines the ratio between the numbers of occurrences of a genotype in a population after and before selection, or the number of offspring an individual has in relation to the number of offspring of another individual. Here, one could conclude that biological fitness is just an approximation of the a priori quantity, arisen due to the hardness (if not impossibility) of directly measuring it.

28.1.2.9 Formae and Implicit Parallelism

Let us review our introductory fish example in terms of forma analysis. Fish can, for instance, be characterized by the properties "clever" and "strong". For the sake of simplicity, let us assume that both properties may be true or false for a single individual. They hence define two formae each. A third property can be the color, for which many different possible variations exist. Some of the fish may be good in terms of camouflage, others maybe good in terms of finding mating partners. If the search space and the genotype-phenotype mapping are properly designed, a living fish allows nature to evaluate the utility of at least three different formae at once. This issue has first been stated by Holland [1250] for Genetic Algorithms and is termed implicit parallelism (or intrinsic parallelism). Since then, it has been studied by many different researchers [284, 1128, 1133, 2827]. The Schema Theorem discussed in Section 29.5.3 on page 342 gives a further explanation for this form of parallel evaluation. Implicit parallelism in conjunction with the crossover/recombination operations is one of the reasons why Evolutionary Algorithms are such a successful class of optimization algorithms.

28.1.3 Historical Classification

The family of Evolutionary Algorithms historically encompasses five members, as illustrated in Figure 28.2. We will only enumerate them here in short; in-depth discussions will follow in the corresponding chapters.

1. Genetic Algorithms (GAs) are introduced in Chapter 29 on page 325. GAs subsume all Evolutionary Algorithms which have string-based search spaces G in which they mainly perform combinatorial search.

2. The set of Evolutionary Algorithms which explore the space of real vectors X ⊆ Rn, often using more advanced mathematical operations, is called Evolution Strategies (ES, see Chapter 30 on page 359). Closely related is Differential Evolution, a numerical optimizer using a ternary search operation (see Chapter 33).

13 http://en.wikipedia.org/wiki/Fitness_(biology) [accessed 2008-08-10]

3. For Genetic Programming (GP), which will be elaborated on in Chapter 31 on page 379, we can provide two definitions: On one hand, GP includes all Evolutionary Algorithms that grow programs, algorithms, and similar structures. On the other hand, also all EAs that evolve tree-shaped individuals are instances of Genetic Programming.

4. Learning Classifier Systems (LCS), discussed in Chapter 35 on page 457, are learning approaches that assign output values to given input values. They often internally use a Genetic Algorithm to find new rules for this mapping.

5. Evolutionary Programming (EP, see Chapter 32 on page 413) approaches usually fall into the same category as Evolution Strategies, i.e., they optimize vectors of real numbers. Historically, such approaches have also been used to derive algorithm-like structures such as finite state machines.

[Figure 28.2: The family of Evolutionary Algorithms – Evolutionary Algorithms subsume Genetic Algorithms, Genetic Programming (with GGGP, LGP, and SGP), Evolutionary Programming, Learning Classifier Systems, Evolution Strategy, and Differential Evolution.]

The early research [718] in Genetic Algorithms (see Section 29.1 on page 325), Genetic Programming (see Section 31.1 on page 379), and Evolutionary Programming (see ?? on page ??) dates back to the 1950s and 60s. Besides the pioneering work listed in these sections, at least one other important early contribution should not go unmentioned here: the Evolutionary Operation (EVOP) approach introduced by Box and Draper [376, 377] in the late 1950s. The idea of EVOP was to apply a continuous and systematic scheme of small changes in the control variables of a process. The effects of these modifications are evaluated and the process is slowly shifted into the direction of improvement. This idea was never realized as a computer algorithm, but Spendley et al. [2572] used it as the basis for their simplex method [1719], which then served as progenitor of the Downhill Simplex algorithm14 by Nelder and Mead [2017]. Satterthwaite's REVOP [2407, 2408], a randomized Evolutionary Operation approach, was rejected at this time [718].

All five major approaches can be realized with the basic scheme defined in Algorithm 28.1. To this simple structure, there exist many general improvements and extensions; often, these do not directly concern the search or problem spaces. Like in Section 1.3 on page 31, we here classified the different algorithms according to their semantics, i.e., corresponding to their specific search and problem spaces and the search operations [634].

14 We discuss Nelder and Mead's Downhill Simplex [2017] optimization method in Chapter 43 on page 501.

28.1.4 Populations in Evolutionary Algorithms

Evolutionary Algorithms try to simulate processes known from biological evolution in order to solve optimization problems. Species in biology consist of many individuals which live together in a population. The individuals in a population compete but also mate with each other, generation for generation. In EAs, it is quite similar: the population pop is a set of ps individuals p ∈ G × X, where each individual p possesses heredity information (its genotype p.g) and a physical representation (the phenotype p.x).

There exist various ways in which an Evolutionary Algorithm can process its population. Especially interesting is how the population pop(t + 1) of the next generation is formed from (1) the individuals selected from pop(t), which are collected in the mating pool mate, and (2) the new offspring derived from them. Some of the distinctive features of these EAs are:

1. The way the offspring is included into the population(s).
2. The method of selecting the individuals for reproduction.
3. The population size and the number of populations used.

Later in this chapter, we will discuss the major components of many of today's most efficient Evolutionary Algorithms [593].

28.1.4.1 Extinctive/Generational EAs

If the new population only contains the offspring, we speak of extinctive selection [1461, 2015, 2478]. In this case, the elders will not be present in the next generation. Especially in the area of Genetic Algorithms, extinctive strategies are also known as generational algorithms.

Definition D28.3 (Generational). In Evolutionary Algorithms that are generational [2226], the next generation will only contain the offspring of the current one and no parent individuals will be preserved.

Extinctive Evolutionary Algorithms can further be divided into left and right selection [2987]. In left extinctive selections, the best individuals are not allowed to reproduce in order to prevent premature convergence of the optimization process. Conversely, the worst individuals are not permitted to breed in right extinctive selection schemes in order to reduce the selective pressure, since they would otherwise scatter the fitness too much [2987]. Extinctive selection can be compared with ecosystems of small single-cell organisms (protozoa15), which reproduce in an asexual, fissiparous16 manner, i.e., by cell division. Other comparisons can partly be drawn to sexually reproducing octopi, where the female dies after protecting the eggs until the larvae hatch, or to the black widow spider, where the female devours the male after the insemination.

28.1.4.2 Preservative EAs

In algorithms that apply a preservative selection scheme, the population is a combination of the previous population and its offspring [169, 2335, 2772]. The biological metaphor for such algorithms is that the lifespan of many organisms exceeds a single generation; parent and child individuals compete with each other for survival. Even in preservative strategies, it is not granted that the best individuals will always survive, unless truncation selection (as introduced in Section 28.4.2 on page 289) is applied. In general, most selection methods are randomized: Even if they pick the best candidate solutions with the highest probabilities, they may also select worse individuals.

The population adaptation strategies listed in the following have partly been borrowed from the German Wikipedia – The Free Encyclopedia [2938] site for Evolution Strategy17.

15 http://en.wikipedia.org/wiki/Protozoa [accessed 2008-03-12]
16 http://en.wikipedia.org/wiki/Binary_fission [accessed 2008-03-12]
17 http://en.wikipedia.org/wiki/Evolutionsstrategie [accessed 2007-07-03]

28.1.4.2.1 Steady-State Evolutionary Algorithms

Steady-state Evolutionary Algorithms [519, 2041, 2437, 2641], abbreviated by SSEA, are preservative Evolutionary Algorithms which usually do not allow inferior individuals to replace better ones in the population. Therefore, the binary search operator (crossover/recombination) is often applied exactly once per generation. The new offspring then replaces the worst member of the population if and only if it has better fitness. Steady-state Evolutionary Algorithms often produce better results than generational EAs. The steady-state GENITOR has shown good behavior in an early comparative study by Goldberg and Deb [1079]. Chafekar et al. [519], for example, introduce steady-state Evolutionary Algorithms that are able to outperform the generational NSGA-II (which you can find summarized in ?? on page ??) for some difficult problems. Similar results have been reported by Chevreux [558] in the context of molecule design optimization. In experiments of Jones and Soule [1463] (primarily focused on other issues), steady-state algorithms showed better convergence behavior in a multi-modal landscape. In our own experiments on a real-world Vehicle Routing Problem discussed in Section 49.3 on page 536, steady-state algorithms outperformed generational ones – but only if sharing was applied [2912, 2914]. Yet, with a badly configured steady-state approach, we also run the risk of premature convergence.

28.1.4.3 The µ-λ Notation

The µ-λ notation has been developed to describe the treatment of parent and offspring individuals in Evolution Strategies (see Chapter 30) [169, 2316, 2436], the family of EAs this notation stems from. Now it is often used in other Evolutionary Algorithms as well. Here,

1. µ is the number of parent individuals, i.e., the size of the mating pool (mps = µ), and
2. λ denotes the number of offspring (to be) created.

EAs named according to this pattern create λ ≥ µ child individuals from the µ available genotypes. Usually, this notation implies truncation selection, i.e., the deterministic selection of the µ best individuals from a population. In practice, however, this part of the convention is often ignored [302].

28.1.4.3.1 (µ + λ)-EAs

Using the reproduction operations, λ offspring are created from the µ parent individuals. The elders and the offspring together form the population pop, i.e., a temporary population of size ps = λ + µ. From this joint set of offspring and parents, only the µ fittest ones are kept and the λ worst ones are discarded [302, 1243, 1244, 2270]. (µ + λ)-strategies are thus preservative. Strictly speaking, the naming (µ + λ) implies that only unary reproduction operators are used, i.e., that no crossover operator but mutation only is employed. Yet, many works on (µ + λ) Evolution Strategies, for example, still apply recombination [302].

28.1.4.3.2 (µ, λ)-EAs

For extinctive selection patterns, the (µ, λ)-notation is used, as introduced by Schwefel [2436]. Here, λ ≥ µ children are created from the µ available genotypes. From these, they only keep the µ best individuals and discard the µ parents as well as the λ − µ worst children [295].

28.1.4.3.3 (1 + 1)-EAs

The EA only keeps track of a single individual, which is reproduced (ps = 2 during reproduction). From these two, only the better individual will survive and enter the mating pool mate (mps = 1). This scheme basically is equivalent to Hill Climbing, introduced in Chapter 26 on page 229.
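The survivor-selection rule shared by the (µ + λ) and (µ, λ) schemes described above is plain truncation. The following Java sketch illustrates it; the class and method names are assumptions for this example, and fitness is taken to be subject to minimization as throughout this book.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    /** Sketch of truncation-based survivor selection for (mu+lambda) and (mu,lambda) schemes. */
    public class MuLambdaSelection {
      /** Keeps the mu best individuals. For the plus strategy, pass parents and offspring
          together; for the comma strategy, pass the lambda offspring only. Lower fitness wins. */
      static <T> List<T> truncate(List<T> candidates, Comparator<T> byFitness, int mu) {
        List<T> sorted = new ArrayList<>(candidates);
        sorted.sort(byFitness);
        return new ArrayList<>(sorted.subList(0, Math.min(mu, sorted.size())));
      }
    }

With this helper, truncate(unionOf(parents, offspring), cmp, mu) would realize a (µ + λ) step, while truncate(offspring, cmp, mu) realizes a (µ, λ) step.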

28.1.4.3.4 (1, 1)-EAs

The EA creates a single offspring from a single parent. The offspring then replaces its parent. This is equivalent to a random walk (see Section 8.2 on page 114) and hence does not optimize very well.

28.1.4.3.5 (µ + 1)-EAs

Here, the population contains µ individuals from which one is drawn randomly. This individual is reproduced. Alternatively, two parents can be chosen and recombined [302]. From the joint set of its offspring and the current population, the least fit individual is removed.

28.1.4.3.6 (µ/ρ, λ)-EAs

The notation (µ/ρ, λ) is a generalization of (µ, λ) approaches where ρ denotes the number of parents of an offspring individual; in other words, (µ/ρ, λ) signifies the application of ρ-ary reproduction operators. Normally, only unary mutation operators (ρ = 1) are used in Evolution Strategies. If (binary) recombination is also applied (as is normal in general Evolutionary Algorithms), ρ = 2 holds. A special case of (µ/ρ, λ) algorithms is the (µ/µ, λ) Evolution Strategy [1843], which, basically, is similar to an Estimation of Distribution Algorithm.

28.1.4.3.7 (µ/ρ + λ)-EAs

Analogously to (µ/ρ, λ)-Evolutionary Algorithms, the (µ/ρ + λ)-Evolution Strategies are generalized (µ + λ) approaches where ρ denotes the number of parents of an offspring individual.

28.1.4.3.8 (µ′, λ′(µ, λ)γ)-EAs

Geyer et al. [1044, 1045, 1046] have developed nested Evolution Strategies where λ′ offspring are created and isolated for γ generations from a population of the size µ′. In each of the γ generations, λ children are created, from which the fittest µ are passed on to the next generation. After the γ generations, the best individuals from the isolated sub-populations are propagated back to the top-level population. Then, the cycle starts again with λ′ new child individuals. This nested Evolution Strategy can be more efficient than the other approaches when applied to complex multimodal fitness environments [2278, 2279].

28.1.4.4 Elitism

Definition D28.4 (Elitism). An elitist Evolutionary Algorithm [595, 711, 1691] ensures that at least one copy of the best individual(s) of the current generation is propagated on to the next generation.

The main advantage of elitism is that it is guaranteed that the algorithm will converge, meaning that once the basin of attraction of the global optimum has been discovered, the Evolutionary Algorithm will converge to that optimum. On the other hand, the risk of converging to a local optimum is also higher. Elitism is an additional feature of Global Optimization algorithms – a special type of preservative strategy – which is often realized by using a secondary population (called the archive) only containing the non-prevailed individuals. This population is updated at the end of each iteration.
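The (µ + 1) scheme just described (and, similarly, the steady-state insertion from Section 28.1.4.2.1) boils down to a simple replace-the-worst rule. The following hypothetical Java sketch illustrates it; all names are illustrative and lower fitness is assumed to be better.

    import java.util.List;

    /** Sketch of a (mu+1)-style insertion: the joint set of population and one offspring
        is reduced by removing the least fit element. Lower fitness is better. */
    public class MuPlusOneInsertion {
      static void insert(List<double[]> pop, double[] offspring,
                         java.util.function.ToDoubleFunction<double[]> fitness) {
        pop.add(offspring);
        int worst = 0;
        for (int i = 1; i < pop.size(); i++) {
          if (fitness.applyAsDouble(pop.get(i)) > fitness.applyAsDouble(pop.get(worst))) worst = i;
        }
        pop.remove(worst); // the least fit individual (possibly the new offspring) is discarded
      }
    }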

Algorithm 28.3: X ←− elitistMOEA(OptProb, ps, as, mps)
  Input: OptProb: a tuple (X, f, cmpf) of the problem space X, the objective functions f, and a comparator function cmpf
  Input: ps: the population size
  Input: as: the archive size
  Input: mps: the mating pool size
  Data: t: the generation counter
  Data: pop: the population
  Data: mate: the mating pool
  Data: archive: the archive of best individuals
  Data: v: the fitness function resulting from the fitness assignment process
  Data: continue: a Boolean variable used for termination detection
  Output: X: the set of the best elements found
   1  begin
   2    t ←− 1
   3    archive ←− ()
   4    pop(t = 1) ←− createPop(ps)
   5    continue ←− true
   6    while continue do
   7      pop(t) ←− performGPM(pop(t), gpm)
   8      pop(t) ←− computeObjectives(pop(t), f)
   9      archive ←− updateOptimalSetN(archive, pop(t), cmpf)
  10      archive ←− pruneOptimalSet(archive, as)
  11      if terminationCriterion then
  12        continue ←− false
  13      else
  14        v ←− assignFitness(pop(t), cmpf)
  15        mate ←− selection(pop(t), v, mps)
  16        t ←− t + 1
  17        pop(t) ←− reproducePop(mate, ps)
  18    return extractPhenotypes(extractBest(archive, cmpf))
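The archive-maintenance operations used in Algorithm 28.3 are defined precisely in Section 28.6; the following Java sketch only illustrates their assumed behavior. For brevity, individuals are represented directly by their objective vectors, objectives are minimized, and the pruning step is a crude stand-in (a real implementation would use clustering or density information to preserve diversity).

    import java.util.ArrayList;
    import java.util.List;

    /** Sketch of an elitist archive update: new non-dominated individuals enter the archive,
        archived individuals that become dominated are removed. Objectives are minimized. */
    public class ArchiveSketch {
      static boolean dominates(double[] a, double[] b) {
        boolean better = false;
        for (int i = 0; i < a.length; i++) {
          if (a[i] > b[i]) return false;
          if (a[i] < b[i]) better = true;
        }
        return better;
      }

      /** Corresponds roughly to updateOptimalSetN in Algorithm 28.3 (behavior assumed). */
      static void update(List<double[]> archive, List<double[]> population) {
        for (double[] cand : population) {
          boolean dominated = false;
          for (double[] a : archive) if (dominates(a, cand)) { dominated = true; break; }
          if (dominated) continue;
          archive.removeIf(a -> dominates(cand, a)); // drop archived elements now prevailed
          archive.add(cand);
        }
      }

      /** Crude stand-in for pruneOptimalSet: keeps at most as elements. */
      static void prune(List<double[]> archive, int as) {
        while (archive.size() > as) archive.remove(archive.size() - 1);
      }
    }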

It should be noted that elitism is usually independent of the fitness and concentrates on the true objective values of the individuals. You should also notice that both the fitness assignment and the selection processes of elitist Evolutionary Algorithms may take the archive as an additional parameter. Algorithm 28.3 is an extension of Algorithm 28.2 on page 256 and of the basic evolutionary cycle given in Section 28.1.1:

1. The archive archive is the set of best individuals found by the algorithm. Initially, it is the empty set ∅. Subsequently, it is updated with the function "updateOptimalSetN", which inserts only the new, unprevailed elements from the population into it and also removes archived individuals which become prevailed by those new optima. Algorithms that realize such updating are defined in Section 28.6.1 on page 308.

2. If the archive becomes too large – it might theoretically contain uncountably many individuals – "pruneOptimalSet" reduces it to a proper size, employing techniques like clustering in order to preserve the element diversity. More about pruning can be found in Section 28.6.3 on page 311.

3. Such an archive-based elitism can also be combined with generational approaches. In principle, archive-based algorithms can also be used in non-elitist Evolutionary Algorithms by simply replacing the parameter archive with ∅.

28.1.5 Configuration Parameters of Evolutionary Algorithms

Evolutionary Algorithms are very robust and efficient optimizers which can often provide good solutions even for hard problems. However, their efficiency is strongly influenced by design choices and configuration parameters. In Chapter 24, we gave some general rules-of-thumb for the design of the representation used in an optimizer. Here we want to list the set of configuration and setup choices available to the user of an EA. Figure 28.3 illustrates these parameters.

[Figure 28.3: The configuration parameters of Evolutionary Algorithms – search space and search operations, genotype-phenotype mapping, objective/fitness and fitness assignment process, selection algorithm, archive and pruning, and basic parameters such as population size and crossover/mutation rates.]

The performance and success of an evolutionary optimization approach applied to a problem given by a set of objective functions f and a problem space X is defined by

1. the choice of the search space G and the search operations Op,
2. the genotype-phenotype mapping "gpm" connecting the search space G and the problem space X,
3. the fitness assignment process "assignFitness" and the selection algorithm "selection",
4. whether it uses an archive archive of the best individuals found and, if so, the archive size as and which pruning technology is used to prevent it from overflowing, and, finally,
5. its basic parameter settings such as the population size ps or the crossover rate cr and mutation rate mr.

In ??, we go more into detail on how to state the configuration of an optimization algorithm in order to fully describe experiments and to make them reproducible.

28.2 Genotype-Phenotype Mappings

In Evolutionary Algorithms, as well as in any other optimization method, the search space G is not necessarily the same as the problem space X. In other words, the elements g processed by the search operations searchOp ∈ Op may differ from the candidate solutions x ∈ X which are evaluated by the objective functions f : X → R. In such cases, a mapping between G and X is necessary: the genotype-phenotype mapping (GPM). In Section 4.3 on page 85, we gave a precise definition of the term GPM. Based on the work of Devert [777], we can distinguish four types of genotype-phenotype mappings:

1. If G = X, the genotype-phenotype mapping is the identity mapping. The search operations then directly work on the candidate solutions.
2. In direct mappings, simple functional transformations which directly relate parts of the genotypes and phenotypes are used (see Section 28.2.1).
3. In generative mappings, the phenotypes are built step by step in a process directed by the genotypes, i.e., the genotypes serve as programs which direct the phenotypical development (see Section 28.2.2.1).
4. Finally, the third set of indirect genotype-phenotype mappings additionally incorporates feedback from the environment or the objective functions into the developmental process (see Section 28.2.2.2).

Which kind of genotype-phenotype mapping should be used for a given optimization task strongly depends on the type of the task at hand. For most small-scale and many large-scale problems, direct mappings are completely sufficient. Usually, the algorithmic complexity of these mappings is not above O(n log n), where n is the length of a genotype. Generally, generative or ontogenic mappings can be the best choice [777, 2735], especially if large-scale, possibly repetitive or recursive structures are to be created.

28.2.1 Direct Mappings

Direct mappings are explicit functional relationships between the genotypes and the phenotypes [887]. Here, usually each element of the phenotype is related to one element in the genotype [2735] or to a group of genes.

Definition D28.5 (Explicit Representation). An explicit (or direct) representation is a representation for which a computable function exists that takes as argument the size of the genotype and gives an upper bound for the number of algorithm steps of the genotype-phenotype mapping until the phenotype-building process has converged to a stable phenotype.

The algorithm steps mentioned in Definition D28.5 can be, for example, the write operations to the phenotype. In the direct mappings we discuss in the remainder of this section, usually each element of a candidate solution is written to exactly once. If bit strings are transformed to vectors of natural numbers, each element of the numerical vector is set to one value. The same goes when scaling real vectors: the elements of the resulting vectors are written to once. In the following, we provide some examples for direct genotype-phenotype mappings.

Example E28.9 (Binary Search Spaces: Gray Coding). Bit string genomes are sometimes complemented with the application of gray coding18 during the genotype-phenotype mapping. In the binary complement representation, an arbitrary number of bits may change if the value of a natural number changes by one: 810, for example, has the binary representation 10002, but 710 has 01112. In gray code, only one bit changes: (one possible) gray code representation of 810 is 1100g2 and 710 then corresponds to 0100g2. Using gray code may thus be good to preserve locality (see Section 14.2) and to reduce hidden biases [504] when dealing with bit-string encoded numbers in optimization. Collins and Eaton [610] studied further encodings for Genetic Algorithms and found that their E-code outperforms both gray and direct binary coding in function optimization.

Example E28.10 (Real Search Spaces: Normalization). If bounded real-valued, continuous problem spaces X ⊆ Rn are investigated, it is possible that the bounds of the dimensions differ significantly. Depending on the optimization algorithm and the search operations applied, this may deceive the optimization process to give more emphasis to dimensions with large bounds while neglecting dimensions with smaller bounds. A genotype-phenotype mapping which allows the search to operate on G = [0, 1]^n instead of X = [x̆1, x̂1] × [x̆2, x̂2] × ... × [x̆n, x̂n], where x̆i ∈ R and x̂i ∈ R are the lower and upper bounds of the ith dimension of the problem space X, respectively, would prevent such pitfalls from the beginning:

    gpmscale(g) = x with x[i] = x̆i + g[i] ∗ (x̂i − x̆i) ∀i ∈ 1..n    (28.1)

Example E28.11 (Neglecting One Dimension of X). Assume that a two-dimensional problem space X was searched for the best solution of an optimization task. The bounds of the first dimension are x̆1 = −1 and x̂1 = 1 and the bounds of the second dimension are x̆2 = −10^9 and x̂2 = 10^9. The only search operation available would add random numbers, normally distributed with expected value µ = 0 and standard deviation σ = 0.01, to each dimension of the candidate vector. Then, a more or less efficient search in the first dimension would be possible, whereas the second dimension could effectively not be explored at all, since steps of size 0.01 are negligible relative to its range. Normalizing both dimensions to [0, 1] via Equation 28.1 removes this imbalance.

18 http://en.wikipedia.org/wiki/Gray_coding [accessed 2007-07-03]
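The two direct mappings from Examples E28.9 and E28.10 are small enough to show in code. The following Java sketch is illustrative only; the class and method names are assumptions and do not correspond to the book's sources.

    /** Sketches of two direct genotype-phenotype mappings: decoding a gray-coded integer
        and normalizing a [0,1]^n genotype into the bounded problem space (Equation 28.1). */
    public class DirectMappings {
      /** Converts an ordinary binary value into (one possible) gray code. */
      static int binaryToGray(int value) {
        return value ^ (value >>> 1);
      }

      /** Converts a gray-coded value back to its ordinary binary value. */
      static int grayToBinary(int gray) {
        int value = gray;
        for (int shift = 1; shift < Integer.SIZE; shift <<= 1) {
          value ^= value >>> shift;   // prefix XOR undoes the gray encoding
        }
        return value;
      }

      /** gpm_scale: maps g in [0,1]^n to x with x[i] = lo[i] + g[i] * (hi[i] - lo[i]). */
      static double[] scale(double[] g, double[] lo, double[] hi) {
        double[] x = new double[g.length];
        for (int i = 0; i < g.length; i++) x[i] = lo[i] + g[i] * (hi[i] - lo[i]);
        return x;
      }
    }

For instance, binaryToGray(8) yields 0b1100 and binaryToGray(7) yields 0b0100, matching the values given in Example E28.9.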

28.2.2 Indirect Mappings

Different from direct mappings, the phenotype-building process in indirect mappings is more complex [777]. The basic idea of such developmental approaches is that the phenotypes are constructed from the genotypes in an iterative way, by stepwise applications of rules and refinements. The genotypes usually represent the rules and transformations which have to be applied to the candidate solutions during the construction phase. These refinements take into consideration some state information, such as the structure of the phenotype built so far. We distinguish GPMs which solely use such state information (e.g., the partly-constructed phenotype) and those which additionally incorporate feedback from external information sources into the phenotype-building process.

Many researchers have drawn their inspiration for indirect mappings from the developmental process which embryos undergo when developing to a fully-functional being inside an egg or the womb of their mother. In nature, life begins with a single cell which divides19 time and again until a mature individual is formed20 after the genetic information has been reproduced. The emergence of a multi-cellular phenotype from its genotypic representation is called embryogenesis21 in biology. Among other things, developmental mappings therefore received colorful names such as artificial embryogeny [2601], artificial ontogeny (AO) [356], computational embryogeny [273, 1637], cellular encoding [1143], morphogenesis [291, 1431], and artificial development [2735].

Natural embryogeny is a highly complex translation from hereditary information to physical structures. The DNA, for instance, encodes the structural design information of the human brain. As pointed out by Manos et al. [1829], there are only about 30 thousand active genes in the human genome (2800 million amino acids) for over 100 trillion neural connections in our cerebrum. A huge manifold of information is hence decoded from "data" which is of a much lower magnitude. This is possible because the same genes can be reused in order to repeatedly create the same pattern. The layout of the light receptors in the eye, for example, is always the same – just their wiring changes. Allowing for the repetition of the same modules (cells, receptors, ...) encoded in the same way many times is one of the reasons why natural embryogeny leads to the development of organisms whose number of modules surpasses the number of genes in the DNA by far.

If we draw inspiration from such a process, we find that features such as gene reuse and (phenotypic) modularity seem to be beneficial and can lead to better results in optimization. However, we would like to point out that being inspired by nature does – by far – not mean to be like nature: The natural embryogenesis processes are much more complicated than the indirect mappings applied in Evolutionary Algorithms. Indeed, many genotype-phenotype mappings based on gene reuse, modularity, and feedback between the phenotypes and the environment are neither related to nor inspired by nature. The same goes for approaches which are inspired by chemical diffusion and the like. The inspiration we get from nature can thus lead to good ideas which, in turn, can lead to better results in optimization.

19 http://en.wikipedia.org/wiki/Cell_division [accessed 2007-07-03]
20 Matter of fact, cell division will continue until the individual dies; this is not important here.
21 http://en.wikipedia.org/wiki/Embryogenesis [accessed 2007-07-03]

28.2.2.1 Generative Mappings: Phenotypic Interaction

In the following, we provide some examples for generative genotype-phenotype mappings.

28.2.2.1.1 Genetic Programming
In Grammatical Evolution (??).12 (Grammar-Guided Genetic Programming). These rules can describe a distinct wing shape which develops according to the feedback from the environment.28. be a set of points describing the surface of an airplane’s wing. This additional information can stem from the objective functions. the mapping may access the full simulation information as well. The phenotype could. GENOTYPE-PHENOTYPE MAPPINGS 273 Example E28. as is. for instance. Each gene in the genotypes of these approaches usually encodes for the application of a single rule of the grammar. for example. e The decide whether and where cells were added to or removed from the nest based on the temperature and density of the cells in the simulation. integer strings with genes identifying grammatical productions are used as well. step by step. complex sentences can be built by unfolding the genotype. be executed to build graphs (as in Luke and Spector’s Edge Encoding [1788] discussed here in ?? on page ??) or electrical circuits [1608]. Example E28.15 (Developmental Termite Nest Construction). In the latter case. This way. Depending on the un-expanded variables in the phenotype. The fitness of the cell depends on how close the temperature in each cell comes to a given target temperature. Genetic Programming. usually the grammar of the programming language in which these programs are to be specified is given. Applying the rules identified by them may introduce new unexpanded variables into the phenotype. The rules which were evolved in Est´vez and Lipson’s work are directly applied to the phenotypes in the simulations. the genotypes are trees. The genotype could be a real vector holding the weights of an artificial neural network. If simulations are involved in the computation of the objective functions. Est´vez and Lipson [887] evolved rules for the construction of a termite nest. In grammar-guided Genetic Programming. . 28.2. In a genotype-phenotype mapping. the programs may. the case in Gads (see ?? on page ??) which uses an integer string genome where each integer corresponds to the ID of a rule. the runtime of such a mapping is not necessarily known in advance. Trees are very useful to express programs and programs can be used to create many different structures. as we will discuss in Chapter 31. these genes may have different meanings. the structure of the phenotype strongly depends on how and how often given instructions or automatically defined functions in the genotypes are called.2 Ontogenic Mappings: Environmental Interaction Full ontogenic (or developmental [887]) mappings even go a step farther than just utilizing state information during the transformation of the genotypes to phenotypes [777]. Hence. The phenotype e is the nest which consists of multiple cells.2. is the family of Evolutionary Algorithms which deals with the evolutionary synthesis of programs. Example E28.13 (Graph-creating Trees).14 (Developmental Wing Construction). for instance. Example E28.2. In Standard Genetic Programming approaches. The artificial neural network may then represent rules for moving the surface points as feedback to the air pressure on them given by a simulation of the wing.

28.3 Fitness Assignment

28.3.1 Introduction

In multi-objective optimization, each candidate solution p.x is characterized by a vector of objective values f(p.x). Many algorithms for survival and mating selection, however, base their decisions on a single scalar value per individual. Equation 28.2 is thus used to transform the information given by the vector of objective values f(p.x) for the phenotype p.x ∈ X of an individual p ∈ G × X into a scalar fitness value v(p). This assignment process takes place in each generation of the Evolutionary Algorithm, and its result may thus change in each iteration.

Definition D28.6 (Fitness Assignment Process). A fitness assignment process "assignFitness" uses a comparison criterion cmpf (see Definition D3.20 on page 77) to create a function v : G × X → R+ (see Definition D5.1 on page 94) which relates a scalar fitness value to each individual p in the population pop as stated in Equation 28.2 (and in the archive archive, if available, as stated in Equation 28.3).

  v = assignFitness(pop, cmpf) ⇒ v(p) ∈ R+ ∀p ∈ pop                                 (28.2)
  v = assignFitnessArch(pop, archive, cmpf) ⇒ v(p) ∈ R+ ∀p ∈ pop ∪ archive          (28.3)

In the context of this book, we generally minimize fitness values, i.e., the lower the fitness of a candidate solution, the better. Historically, the early Evolutionary Algorithms were intended to perform single-objective optimization, i.e., were capable of finding solutions for a single objective function alone. In such a scenario, the fitness function has the character of a heuristic function (see Definition D1.14 on page 34).

A fitness assignment process is especially important in multi-objective optimization, but can also prove useful in a single-objective scenario: The fitness assigned to an individual may not just reflect the absolute utility of an individual, but its suitability relative to the other individuals in the population pop. The fitness v(p.x) thus may not only depend on the candidate solution p.x itself, but on the whole population pop of the Evolutionary Algorithm (and on the archive archive of optimal elements, if available). For this comparison, either the phenotypes p.x ∈ X or the genotypes p.g ∈ G may be analyzed and compared. Besides such ranking data, fitness can also incorporate density/niching information. This way, not only the quality of a candidate solution is considered, but also the overall diversity of the population. This can improve the chance of finding the global optima as well as the performance of the optimization algorithm significantly. Especially if many individuals in the population occupy the same rank or do not dominate each other, such information will be very helpful.

If no density or sharing information is included and the fitness only reflects the strict partial order introduced by the Pareto dominance or a prevalence relation (see Chapter 3), Equation 28.4 will hold. If further information such as diversity metrics is included in the fitness, this equation may no longer be applicable.

  p1.x ≺ p2.x ⇒ v(p1) < v(p2) ∀p1, p2 ∈ pop ∪ archive                               (28.4)

In practical realizations, the fitness values are often stored in a special member variable in the individual records. Therefore, v(p) can be considered as a mapping that returns the value of such a variable which has previously been stored there by a fitness assignment process assignFitness(pop, cmpf). Listing 55.14 on page 718 provides a Java interface which enables us to implement different fitness assignment processes according to Definition D28.6.
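As an illustration of what such an interface can look like, here is a minimal Java sketch; the type and method names (FitnessAssignmentProcess, Individual, assignFitness) are illustrative assumptions and not the book's actual Listing 55.14.

import java.util.List;

/**
 * A minimal sketch of a fitness assignment process (Definition D28.6).
 * The names used here are illustrative assumptions, not the book's code.
 */
interface FitnessAssignmentProcess<G, X> {

  /**
   * Assign a scalar fitness (subject to minimization) to each individual of
   * the population, possibly using the whole population for ranking and
   * density information.
   *
   * @param pop the population of individuals to be evaluated
   */
  void assignFitness(List<Individual<G, X>> pop);
}

/** A simple individual record: genotype, phenotype, objectives, fitness. */
class Individual<G, X> {
  G genotype;          // the point in the search space G
  X phenotype;         // the corresponding candidate solution in X
  double[] objectives; // the vector f(p.x) of objective values
  double fitness;      // the scalar fitness v(p), stored in the record
}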

28.3.2 Weighted Sum Fitness Assignment

The most primitive fitness assignment strategy would be to just use a weighted sum of the objective values. This approach is very static and brings about the same problems as the weighted sum-based approach for defining what an optimum is, discussed in Section 3.3 on page 62. It makes no use of Pareto or prevalence relations. For computing the weighted sum of the different objective values of a candidate solution, we reuse Equation 3.18 on page 62 from the weighted sum optimum definition. The weights have to be chosen in a way that ensures that v(p) ∈ R+ holds for all individuals p.

  v(p) = assignFitnessWeightedSum(pop) ⇔ (∀p ∈ pop ⇒ v(p) = ws(p.x))                (28.5)

With this method, we will only obtain a subset of the best solutions and it is even possible that this fitness assignment method leads to premature convergence to a local optimum.

28.3.3 Pareto Ranking

Another simple method for computing fitness values is to let them directly reflect the Pareto domination (or prevalence) relation. There are two ways for doing this:

1. We can assign to each individual p1 a value inversely proportional to the number of other individuals p2 it prevails:

   v1(p1) ≡ 1 / (|{p2 ∈ pop : p1.x ≺ p2.x}| + 1)                                     (28.6)

2. Or we can assign to each candidate solution p1 the number of other individuals p3 it is prevailed by [1125, 1783]:

   v2(p1) ≡ |{p3 ∈ pop : p3.x ≺ p1.x}|                                               (28.7)

Although the first idea looks appealing at first glance, it has a decisive drawback which can also be observed in Example E28.16: It promotes individuals that reside in crowded regions of the problem space and underrates those in sparsely explored areas. If an individual has many neighbors, it can potentially prevail many other individuals. Since fitness is subject to minimization, such individuals will strongly be preferred. A non-prevailed candidate solution in a sparsely explored region of the search space, however, may not prevail any other individual and hence receive fitness 1, the worst possible fitness under v1. Thus, the fitness assignment process achieves exactly the opposite of what we want: Instead of exploring the problem space and delivering a wide scan of the frontier of best possible candidate solutions, it will focus all effort on a small set of individuals.

A much better second approach for fitness assignment has first been proposed by Goldberg [1075]: the idea is to assign to each candidate solution the number of individuals it is prevailed by [1125, 1783]. Individuals that dominate many others will here receive a lower fitness value than those which are prevailed by many. This way, the previously mentioned negative effects will not occur and the exploration pressure is applied to a much wider area of the Pareto frontier.

This so-called Pareto ranking can be performed by first removing all non-prevailed (non-dominated) individuals from the population and assigning the rank 0 to them (since they are prevailed by no individual and, according to Equation 28.7, should receive v2 = 0). Then, the same is performed with the rest of the population: the individuals only dominated by those of rank 0 (and now non-dominated) are removed and get the rank 1. This is repeated until all candidate solutions have a proper fitness assigned to them. It should be noted that this simple algorithm has a complexity of O(ps² ∗ |f|). A more efficient approach with complexity O(ps log^(|f|−1) ps) has been introduced by Jensen [1441] based on a divide-and-conquer strategy. Since we follow the idea of the freer prevalence comparators instead of pure Pareto dominance relations, we will synonymously refer to this approach as Prevalence ranking and define it as Algorithm 28.4, which is implemented in Java in Listing 56.16 on page 759.
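For illustration, a compact Java sketch of the second variant (Equation 28.7) for the plain Pareto dominance case with minimized objectives could look as follows; the names are illustrative assumptions and this is not the book's Listing 56.16.

/**
 * Sketch of Prevalence/Pareto ranking (Equation 28.7) for minimized
 * objectives. objs[i] holds the objective vector of individual i; the result
 * contains, for each individual, the number of individuals dominating it.
 */
static int[] paretoRank(double[][] objs) {
  int ps = objs.length;
  int[] rank = new int[ps];
  for (int i = 0; i < ps; i++) {
    for (int j = 0; j < ps; j++) {
      if (i != j && dominates(objs[j], objs[i])) {
        rank[i]++; // individual i is prevailed by individual j
      }
    }
  }
  return rank;
}

/** True if a Pareto-dominates b (all objectives minimized). */
static boolean dominates(double[] a, double[] b) {
  boolean strictlyBetter = false;
  for (int k = 0; k < a.length; k++) {
    if (a[k] > b[k]) return false;      // worse in one objective: no domination
    if (a[k] < b[k]) strictlyBetter = true;
  }
  return strictlyBetter;
}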

Figure 28.x) then cnt ←− cnt + 1 return v2 v2 (p) ←− cnt Example E28.4: An example scenario for Pareto ranking.x.4 and Table 28. both subject to minimization.1 illustrate the Pareto relations in a population of 15 individuals and their corresponding objective values f1 and f2 .x) < 0 if (j = i) ∧ (pop[j].x ≺ p.276 28 EVOLUTIONARY ALGORITHMS Algorithm 28. .4: v2 ←− assignFitnessParetoRank(pop.16 (Pareto Ranking Fitness Assignment). p. cmpf ) Input: pop: the population to assign fitness values to Input: cmpf : the prevalence comparator defining the prevalence relation Data: i. cnt: the counter variables Output: v2 : a fitness function reflecting the Prevalence ranking 1 2 3 4 5 6 7 8 begin for i ←− len(pop) − 1 down to 0 do cnt ←− 0 p ←− pop[i] for j ←− len(pop) − 1 down to 0 do // Check whether cmpf (pop[j]. j. 10 f2 8 7 6 5 4 3 2 1 0 1 5 1 8 6 9 10 7 2 12 Pareto Frontier 2 3 4 5 3 11 13 14 15 4 6 7 8 9 10 f1 12 Figure 28.

   x | dominates                        | is dominated by                              | v1(p) | v2(p)
  ---+----------------------------------+----------------------------------------------+-------+------
   1 | {5, 6, 8, 9, 14, 15}             | ∅                                            | 1/7   |  0
   2 | {6, 7, 8, 9, 10, 11, 13, 14, 15} | ∅                                            | 1/10  |  0
   3 | {12, 13, 14, 15}                 | ∅                                            | 1/5   |  0
   4 | ∅                                | ∅                                            | 1     |  0
   5 | {8, 15}                          | {1}                                          | 1/3   |  1
   6 | {8, 9, 14, 15}                   | {1, 2}                                       | 1/5   |  2
   7 | {9, 10, 11, 14, 15}              | {2}                                          | 1/6   |  1
   8 | {15}                             | {1, 2, 5, 6}                                 | 1/2   |  4
   9 | {14, 15}                         | {1, 2, 6, 7}                                 | 1/3   |  4
  10 | {14, 15}                         | {2, 7}                                       | 1/3   |  2
  11 | {14, 15}                         | {2, 7}                                       | 1/3   |  2
  12 | {13, 14, 15}                     | {3}                                          | 1/4   |  1
  13 | {15}                             | {2, 3, 12}                                   | 1/2   |  3
  14 | {15}                             | {1, 2, 3, 6, 7, 9, 10, 11, 12}               | 1/2   |  9
  15 | ∅                                | {1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14} | 1     | 13

Table 28.1: The Pareto domination relation of the individuals illustrated in Figure 28.4.

We computed the fitness values for both approaches for the individuals in the population given in Figure 28.4 and noted them in Table 28.1. In column v1(p), we provide the fitness values which are inversely proportional to the number of individuals dominated by p; column v2(p) gives the number of individuals which dominate p.

A good example for the problem that approach 1 prefers crowded regions in the search space while ignoring interesting candidate solutions in less populated regions are the four non-prevailed individuals {1, 2, 3, 4} on the Pareto frontier. Under v1, the best fitness is assigned to the element 2, followed by individual 1. The candidate solution 4 gets the worst possible fitness 1, since it prevails no other element. Its chances for reproduction are similarly low as those of individual 15, which is dominated by all other elements except 4. Although individual 7 is dominated (by one individual), its fitness is better than the fitness of the non-dominated element 3. Hence, the candidate solutions 3 and 4 will most probably not be selected and vanish in the next generation. The loss of individual 4 would greatly decrease the diversity and further increase the focus on the crowded area near 1 and 2. However, the region around the individuals 1 and 2 has probably already been explored extensively, whereas the surrounding of candidate solution 4 is rather unknown. A better approach of fitness assignment should incorporate such information and put a bit more pressure into the direction of individual 4, in order to make the Evolutionary Algorithm investigate this area more thoroughly.

Column v2(p) in Table 28.1 shows that all four non-prevailed individuals now have the best possible fitness 0 if the method given as Algorithm 28.4 is applied. Hence, the focus is no longer centered on the overcrowded regions. The hope that more pressure is put into the direction of individual 4, however, cannot be fulfilled by Algorithm 28.4 either: more individuals are present in the top-left area of Figure 28.4, and at least two individuals with the same fitness as 3 and 4 reside there. Thus, chances are still that more individuals from this area will be selected than from the bottom-right area.

28.3.4 Sharing Functions

Previously, we have mentioned that the drawback of Pareto ranking is that it does not incorporate any information about whether the candidate solutions in the population reside closely to each other or in regions of the problem space which are only sparsely covered by individuals. Sharing, as a method for including such diversity information into the fitness assignment process, was introduced by Holland [1250] and later refined by Deb [735], Goldberg and Richardson [1080], and Deb and Goldberg [742], and further investigated in [1899, 2063, 2391].

Definition D28.7 (Sharing Function). A sharing function Sh : R+ → R+ is a function that maps the distance d = dist(p1, p2) between two individuals p1 and p2 to the real interval [0, 1]. The value of Shσ is 1 if d ≤ 0 and 0 if d exceeds a specific constant value σ:

  Shσ(d = dist(p1, p2)) = 1        if d ≤ 0
                          ∈ [0, 1] if 0 < d < σ                                      (28.8)
                          0        otherwise

Sharing functions can be employed in many different ways and are used by a variety of fitness assignment processes [735, 1080]. Typically, the simple triangular function "Sh tri" [1268] or one of its either convex (Sh cvex) or concave (Sh ccav) pendants with the power ξ ∈ R+, ξ > 0 is applied. Besides using different powers of the distance-σ-ratio, another approach is the exponential sharing method "Sh exp":

  Sh triσ(d)     = 1 − d/σ         if 0 ≤ d < σ, and 0 otherwise                     (28.9)
  Sh cvexσ,ξ(d)  = 1 − (d/σ)^ξ     if 0 ≤ d < σ, and 0 otherwise                     (28.10)
  Sh ccavσ,ξ(d)  = 1 − (d/σ)^(1/ξ) if 0 ≤ d < σ, and 0 otherwise                     (28.11)

  Sh expσ,ξ(d)   = 1                                         if d ≤ 0
                   0                                         if d ≥ σ                (28.12)
                   (e^(−ξd/σ) − e^(−ξ)) / (1 − e^(−ξ))       otherwise

Definition D28.8 (Niche Count). The niche count m(p, P) [738, 1899] of an individual p ∈ P is the sum of its sharing values with all individuals in a list P:

  m(p, P) = Σ_{∀p′∈P} Shσ(dist(p, p′))                                               (28.13)

The niche count "m" is always greater than or equal to 1, since p ∈ P and, hence, Shσ(dist(p, p)) = 1 is computed and added up at least once.

For sharing, the distance of the individuals in the search space G as well as their distance in the problem space X or the objective space Y may be used. The work of Deb [735] indicates that phenotypical sharing will often be superior to genotypical sharing. If the candidate solutions are real vectors in the Rn, we could use the Euclidean distance of the phenotypes of the individuals directly, i.e., compute dist eucl(p1.x, p2.x). However, this makes only sense in lower dimensional spaces (small values of n), since the Euclidean distance in Rn does not convey much information for high n. In Genetic Algorithms, where the search space is the set of all bit strings G = Bn of the length n, another suitable approach would be to use the Hamming distance22 dist ham(p1.g, p2.g) of the genotypes.

The original sharing approach was developed for fitness assignment in single-objective optimization where only one objective function f was subject to maximization. In this case, the objective value was simply divided by the niche count, punishing solutions in crowded regions [1899]. The goal of sharing was to distribute the population over a number of different peaks in the fitness landscape, with each peak receiving a fraction of the population proportional to its height [1268].

22 See ?? on page ?? for more information on the Hamming distance.
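As a small illustration, the exponential sharing function of Equation 28.12 and the niche count of Equation 28.13 can be sketched in Java as follows, here applied to points in the objective space; the method names are illustrative assumptions.

/** Exponential sharing function Sh_exp (Equation 28.12). */
static double shExp(double d, double sigma, double xi) {
  if (d <= 0.0) return 1.0;
  if (d >= sigma) return 0.0;
  return (Math.exp(-xi * d / sigma) - Math.exp(-xi)) / (1.0 - Math.exp(-xi));
}

/** Euclidean distance between two objective vectors. */
static double dist(double[] a, double[] b) {
  double s = 0.0;
  for (int k = 0; k < a.length; k++) {
    double diff = a[k] - b[k];
    s += diff * diff;
  }
  return Math.sqrt(s);
}

/**
 * Niche count m(p, P) (Equation 28.13): the sum of the sharing values of
 * individual p with all members of the population P (including itself,
 * which is why the result is always >= 1).
 */
static double nicheCount(double[] p, double[][] population,
                         double sigma, double xi) {
  double m = 0.0;
  for (double[] q : population) {
    m += shExp(dist(p, q), sigma, xi);
  }
  return m;
}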

Sharing was traditionally combined with fitness proportionate selection, i.e., roulette wheel selection24. Oei et al. [2063] have shown that if the sharing function is computed using the parental individuals of the "old" population and then naïvely combined with the more sophisticated tournament selection25, the resulting behavior of the Evolutionary Algorithm may be chaotic. They suggested to use the partially filled "new" population to circumvent this problem. The layout of Evolutionary Algorithms as defined in this book bases the fitness computation on the whole set of "new" individuals and assumes that their objective values have already been completely determined. In other words, such issues simply do not exist in multi-objective Evolutionary Algorithms as introduced here, and the chaotic behavior does not occur. For computing the niche count "m", O(n²) comparisons are needed. According to Goldberg et al. [1085], sampling the population can be sufficient to approximate "m" in order to avoid this quadratic complexity.

28.3.5 Variety Preserving Fitness Assignment

Using sharing and the niche counts naïvely leads to more or less unpredictable effects. Of course, since fitness is subject to minimization, we can use this approach in conjunction with a variety of other fitness assignment processes by multiplying the niche count "m" with predetermined fitness values v′, but then we also inherit its shortcomings:

  v(p) = v′(p) ∗ m(p, pop),   v′ ≡ assignFitness(pop, cmpf)                          (28.14)

The results of dividing the fitness by the niche counts strongly depend on the height differences of the peaks and thus on the complexity class23 of f. On an f1 ∈ O(x), the influence of "m" is much bigger than on an f2 ∈ O(e^x). Using distance measures which are not normalized can lead to strange effects, too. Imagine two objective functions f1 and f2. If the values of f1 span from 0 to 1 for the individuals in the population whereas those of f2 range from 0 to 10 000, the components of f1 will most often be negligible in the Euclidian distance of two individuals in the objective space Y. Another problem is that the effect of simple sharing on the pressure into the direction of the Pareto frontier is not obvious either, or depends on the sharing approach applied: it promotes solutions located in sparsely populated niches, but how much their fitness will be improved is rather unclear. Some methods simply add a niche count to the Pareto rank, which may cause non-dominated individuals having worse fitness than others in the population. Other approaches scale the niche count into the interval [0, 1) before adding it, which not only ensures that non-dominated individuals have the best fitness but also leaves the relation between individuals at different ranks intact – which, however, does not further variety very much.

In some of our experiments [2888, 2912, 2914], we applied a Variety Preserving Fitness Assignment, an approach based on Pareto ranking using prevalence comparators and well-dosed sharing. We developed it in order to mitigate the previously mentioned side effects and to balance the evolutionary pressure between optimizing the objective functions and maximizing the variety inside the population. In the following, we will describe it in detail and give its specification in Algorithm 28.5.

Before this fitness assignment process can begin, all individuals with infinite objective values must be removed from the population pop. If such a candidate solution is optimal, i.e., if it has negative infinitely large objectives in a minimization process, it should receive fitness zero, since fitness is subject to minimization.

23 See Chapter 12 on page 143 for a detailed introduction into complexity and the O-notation.
24 Roulette wheel selection is discussed in Section 28.4.3 on page 290.
25 You can find an outline of tournament selection in Section 28.4.4 on page 296.

Algorithm 28.5: v ←− assignFitnessVariety(pop, cmpf)
  Input: pop: the population
  Input: cmpf: the comparator function
  Input: [implicit] f: the set of objective functions
  Data: (sorry, no space here – the numerous local variables are discussed in the text)
  Output: v: the fitness function

 1  begin
      /* If needed: Remove all elements with infinite objective values from pop and
         assign fitness 0 or len(pop) + √len(pop) + 1 to them.
         Then compute the prevalence ranks. */
 2    ranks ←− createList(len(pop), 0)
 3    maxRank ←− 0
 4    for i ←− len(pop) − 1 down to 0 do
 5      for j ←− i − 1 down to 0 do
 6        k ←− cmpf(pop[i].x, pop[j].x)
 7        if k < 0 then ranks[j] ←− ranks[j] + 1
 8        else if k > 0 then ranks[i] ←− ranks[i] + 1
 9      if ranks[i] > maxRank then maxRank ←− ranks[i]
      // determine the ranges of the objectives
10    mins ←− createList(|f|, +∞)
11    maxs ←− createList(|f|, −∞)
12    foreach p ∈ pop do
13      for i ←− |f| down to 1 do
14        if fi(p.x) < mins[i − 1] then mins[i − 1] ←− fi(p.x)
15        if fi(p.x) > maxs[i − 1] then maxs[i − 1] ←− fi(p.x)
16    rangeScales ←− createList(|f|, 1)
17    for i ←− |f| − 1 down to 0 do
18      if maxs[i] > mins[i] then rangeScales[i] ←− 1/(maxs[i] − mins[i])
      // base a sharing value on the scaled Euclidean distance of all elements
19    shares ←− createList(len(pop), 0)
20    minShare ←− +∞
21    maxShare ←− −∞
22    for i ←− len(pop) − 1 down to 0 do
23      curShare ←− shares[i]
24      for j ←− i − 1 down to 0 do
25        dist ←− 0
26        for k ←− |f| down to 1 do
27          dist ←− dist + [(fk(pop[i].x) − fk(pop[j].x)) ∗ rangeScales[k − 1]]²
28        s ←− Sh exp√|f|,16(√dist)
29        shares[j] ←− shares[j] + s
30        curShare ←− curShare + s
31      shares[i] ←− curShare
32      if curShare < minShare then minShare ←− curShare
33      if curShare > maxShare then maxShare ←− curShare
      // finally, compute the fitness values
34    scale ←− 1/(maxShare − minShare) if maxShare > minShare, 1 otherwise
35    for i ←− len(pop) − 1 down to 0 do
36      if ranks[i] > 0 then
37        v(pop[i]) ←− ranks[i] + √maxRank ∗ scale ∗ (shares[i] − minShare)
38      else v(pop[i]) ←− scale ∗ (shares[i] − minShare)
39    return v
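A condensed Java sketch of the main steps of this algorithm is given below (the detailed walk-through of the pseudocode follows in the text). It uses plain Pareto dominance for minimized objectives instead of a general prevalence comparator and reuses the dominates and shExp helpers sketched earlier; all names are illustrative assumptions, and this is a simplification rather than the book's implementation.

/**
 * Condensed sketch of Variety Preserving Fitness Assignment for minimized
 * objectives. objs[i] is the objective vector of individual i; the result
 * holds the fitness values v (subject to minimization).
 */
static double[] assignFitnessVariety(double[][] objs) {
  int ps = objs.length, m = objs[0].length;

  // 1. Prevalence/Pareto ranks: how many individuals dominate each one.
  int[] ranks = new int[ps];
  int maxRank = 0;
  for (int i = 0; i < ps; i++) {
    for (int j = 0; j < ps; j++) {
      if (i != j && dominates(objs[j], objs[i])) ranks[i]++;
    }
    maxRank = Math.max(maxRank, ranks[i]);
  }

  // 2. Scale each objective dimension into [0, 1] over the population.
  double[] scale = new double[m];
  for (int k = 0; k < m; k++) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    for (double[] o : objs) { min = Math.min(min, o[k]); max = Math.max(max, o[k]); }
    scale[k] = (max > min) ? 1.0 / (max - min) : 1.0;
  }

  // 3. Sum exponential sharing values over all scaled pairwise distances.
  double sigma = Math.sqrt(m), xi = 16.0;
  double[] shares = new double[ps];
  for (int i = 0; i < ps; i++) {
    for (int j = 0; j < i; j++) {
      double d = 0.0;
      for (int k = 0; k < m; k++) {
        double diff = (objs[i][k] - objs[j][k]) * scale[k];
        d += diff * diff;
      }
      double s = shExp(Math.sqrt(d), sigma, xi);
      shares[i] += s;
      shares[j] += s;
    }
  }

  // 4. Scale the total shares into [0, 1] and combine them with the ranks.
  double minSh = Double.POSITIVE_INFINITY, maxSh = Double.NEGATIVE_INFINITY;
  for (double s : shares) { minSh = Math.min(minSh, s); maxSh = Math.max(maxSh, s); }
  double shScale = (maxSh > minSh) ? 1.0 / (maxSh - minSh) : 1.0;

  double[] v = new double[ps];
  for (int i = 0; i < ps; i++) {
    double scaledShare = shScale * (shares[i] - minSh);
    v[i] = (ranks[i] > 0)
         ? ranks[i] + Math.sqrt(maxRank) * scaledShare // prevailed individuals
         : scaledShare;                                // non-prevailed: fitness < 1
  }
  return v;
}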

If the individual is infeasible, its fitness should be set to len(pop) + √len(pop) + 1, which is 1 larger than every other fitness value that may be assigned by Algorithm 28.5.

In lines 2 to 9, a list ranks is created which is used to efficiently compute the Pareto rank of every candidate solution in the population. Actually, the word prevalence rank would be more precise in this case, since we use prevalence comparisons instead of pure Pareto dominance. Therefore, Variety Preserving is not limited to Pareto optimization but may also incorporate External Decision Makers or the method of inequalities (see Chapter 3). The highest rank encountered in the population is stored in the variable maxRank; this value may be zero if the population contains only non-prevailed elements. The lowest rank will always be zero, since the prevalence comparators cmpf define order relations which are non-circular by definition.26 maxRank is used later on to determine the maximum penalty for solutions in an overly crowded region of the search space.

From line 10 to 18, the maximum and the minimum values that each objective function takes on when applied to the individuals in the population are obtained. These values are used to store the inverse of their ranges in the array rangeScales, which will be used to scale all distances in each dimension (objective) of the individuals into the interval [0, 1]. There are |f| objective functions in f and, hence, the maximum Euclidian distance between two candidate solutions in the objective space (scaled to normalization) becomes √|f|; it occurs if all the distances in the single dimensions are 1.

Between line 19 and 33, the scaled distance from every individual to every other candidate solution in the objective space is determined. For every individual, again two nested loops are needed (lines 22 and 24). The distance components of two individuals pop[i] and pop[j] are scaled and summed up in a variable dist in line 27. The Euclidian distance between them is √dist, which is used to determine a sharing value. We have decided for exponential sharing, as introduced in Equation 28.12, with power 16 and σ = √|f|, which gives proximity and density a heavier influence. These sharing values are aggregated in the array shares: for every individual we sum up all its shares (see line 30), denoting how densely crowded the area around it is. While doing so, the minimum and maximum total shares are stored in the variables minShare and maxShare in lines 32 and 33. These variables are then used to scale all sharing values again into the interval [0, 1] (line 34), so the individual in the most crowded region always has a total share of 1 and the most remote individual always has a share of 0.

With this information, two things are now known about the individuals in pop:

1. their Pareto/Prevalence ranks, stored in the array ranks, giving information about their relative quality according to the objective values, and
2. their total sharing values, held in shares, denoting how densely crowded the area around them is.

Basically, the final fitness value of an individual p is determined as follows: If p is non-prevailed, i.e., its rank is zero, its fitness is its scaled total share (line 38). Otherwise, the square root of the maximum rank, √maxRank, is multiplied with the scaled share and added to its rank (line 37). By doing so, the supremacy of non-prevailed individuals in the population is preserved. All other candidate solutions may degenerate in rank, but at most by the square root of the worst rank. Therefore, these individuals can compete with each other based on the crowdedness of their location in the objective space.

26 In all order relations imposed on finite sets there is always at least one "smallest" element. See Section 51.8 on page 645 for more information.

Example E28.17 (Variety Preserving Fitness Assignment).
Let us now apply Variety Preserving to the example for Pareto ranking from Section 28.3.3. In Table 28.2, we again list all the candidate solutions from Figure 28.4 on page 276, this time with their objective values obtained with f1 and f2 corresponding to their coordinates in the diagram. In the third column, you can find the Pareto ranks of the individuals as they have been listed in Table 28.1 on page 277. The columns share/u and share/s correspond to the total sharing sums of the individuals, unscaled and scaled into [0, 1].

[Figure 28.5: The sharing potential in the Variety Preserving Example E28.17 – the share potential plotted over the f1-f2 plane, with spikes at the 15 individuals and the Pareto frontier marked.]

[Table 28.2: An example for Variety Preserving based on Figure 28.4 – for each individual x, its objective values f1 and f2, its Pareto rank, its unscaled and scaled total shares (share/u, share/s), and the resulting fitness v(x).]

[Table 28.3: The distance and sharing matrix of the example from Table 28.2. Upper triangle: the scaled distances between the individuals in the objective space. Lower triangle: the corresponding share values of Sh exp√|f|,16.]

But first things first: we know the Pareto ranks of the candidate solutions from Table 28.1, so the next step is to determine the ranges of values the objective functions take on for the example population. These can again easily be found in Figure 28.4: f1 spans from 1 to 10, which leads to rangeScale[0] = 1/9, and rangeScale[1] = 1/8, since the maximum of f2 is 9 and its minimum is 1. With this, we can now compute the (dimensionally scaled) distances amongst the candidate solutions in the objective space, i.e., the values of dist in Algorithm 28.5, as well as the corresponding values of the sharing function Sh exp√|f|,16(√dist). We noted these in Table 28.3, using the upper triangle of the table for the distances and the lower triangle for the shares.

The value of the sharing function can be imagined as a scalar field, as illustrated in Figure 28.5: each individual in the population can be considered as an electron that builds an electrical field around it, resulting in a potential.

If two electrons come close, repulsing forces occur, which is pretty much the same as what we want to achieve with Variety Preserving. Electrons in atoms or on planets are limited in their movement by other influences like gravity or nuclear forces, which are often stronger than the electromagnetic force; in Variety Preserving, the prevalence rank plays this role, and its influence on the fitness is often dominant. Unlike the electrical field, the power of the sharing potential falls exponentially, resulting in relatively steep spikes in Figure 28.5, which gives proximity and density a heavier influence.

By summing up the single sharing potentials for each individual in the example, we obtain the fifth column of Table 28.2, the unscaled share values. Their minimum is around 0.022 and their maximum around 0.131. To obtain the scaled shares in column share/s, we subtract the minimum share from each of these values and multiply the result with 1/(maxShare − minShare). Based on these values, we can compute the fitness values v(x) according to lines 37 and 38 in Algorithm 28.5; the last column of Table 28.2 lists these results.

All non-prevailed individuals have retained a fitness value of less than one, lower than those of any other candidate solution in the population. Amongst these best individuals, candidate solution 4 is strongly preferred, since it is located in a very remote location of the objective space. Individual 1 is the least interesting non-dominated one, because it has the densest neighborhood in Figure 28.4. In this neighborhood, also the individuals 5 and 6 with the Pareto ranks 1 and 2 are located. They are strongly penalized by the sharing process and receive the fitness values v(5) = 3.446 and v(6) = 5.606. Thereby, individual 5 becomes less interesting than candidate solution 7, which has a worse Pareto rank, and 6 now is even worse than individual 8, which would have a fitness better by two if strict Pareto ranking was applied. Based on these fitness values, algorithms like Tournament selection (see Section 28.4.4) or fitness proportionate approaches (discussed in Section 28.4.3) will pick elements in a way that preserves the pressure into the direction of the Pareto frontier but also leads to a balanced and sustainable variety in the population. The benefits of this approach have been shown, for instance, in [2188, 2888, 2912, 2914].

28.3.6 Tournament Fitness Assignment

In tournament fitness assignment, which is a generalization of the q-level (binary) tournament selection introduced by Weicker [2879], the fitness of each individual is computed by letting it compete q times against r other individuals (with r = 1 as default) and counting the number of competitions it loses. For a better understanding of the tournament metaphor, see Section 28.4.4 on page 296, where the tournament selection scheme is discussed. The number of losses will approximate the Pareto rank of the individual, but it is a bit more randomized than that. If the number of tournaments won was counted instead of the losses, the same problems as in the first idea of Pareto ranking would be encountered.
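A small Java sketch of this scheme could look as follows; it counts, for each individual, how many of its q tournaments against r random opponents it loses, with the prevalence comparison replaced by a simple scalar objective comparison. The names are illustrative assumptions.

import java.util.Random;

/**
 * Sketch of tournament fitness assignment: individual i competes q times
 * against r randomly picked opponents; its fitness is the number of
 * tournaments it loses (fitness is subject to minimization). A tournament
 * is lost as soon as one opponent has a better (smaller) objective value.
 */
static int[] assignFitnessTournament(double[] objective, int q, int r, Random rnd) {
  int ps = objective.length;
  int[] losses = new int[ps];
  for (int i = 0; i < ps; i++) {
    for (int t = 0; t < q; t++) {
      boolean lost = false;
      for (int c = 0; c < r && !lost; c++) {
        int opponent = rnd.nextInt(ps);
        if (objective[opponent] < objective[i]) lost = true; // opponent wins
      }
      if (lost) losses[i]++;
    }
  }
  return losses;
}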

Algorithm 28.6: v ←− assignFitnessTournament(q, r, pop, cmpf)
  Input: q: the number of tournaments per individual
  Input: r: the number of other contestants per tournament, normally 1
  Input: pop: the population to assign fitness values to
  Input: cmpf: the comparator function providing the prevalence relation
  Data: i, j, k, z: counter variables
  Data: b: a Boolean variable being true as long as a tournament is not lost
  Data: p: the individual currently examined
  Output: v: the fitness function

  begin
    for i ←− len(pop) − 1 down to 0 do
      z ←− q
      p ←− pop[i]
      for j ←− q down to 1 do
        b ←− true
        k ←− r
        while (k > 0) ∧ b do
          b ←− ¬(pop[⌊randomUni[0, len(pop))⌋].x ≺ p.x)
          k ←− k − 1
        if b then z ←− z − 1
      v(p) ←− z
    return v

28.4 Selection

28.4.1 Introduction

Definition D28.9 (Selection). In Evolutionary Algorithms, the selection27 operation mate = selection(pop, v, mps) chooses mps individuals according to their fitness values v (subject to minimization) from the population pop and places them into the mating pool mate [167, 340, 1673, 1912].

  mate = selection(pop, v, mps) ⇒
    ∀p ∈ mate ⇒ p ∈ pop
    ∀p ∈ pop ⇒ p ∈ G × X
    v(p) ∈ R+ ∀p ∈ pop
    (len(mate) ≥ min {len(pop), mps}) ∧ (len(mate) ≤ mps)                            (28.15)

Definition D28.10 (Mating Pool). The mating pool mate in an Evolutionary Algorithm contains the individuals which are selected for further investigation and on which the reproduction operations (see Section 28.5 on page 306) will subsequently be applied.

27 http://en.wikipedia.org/wiki/Selection_%28genetic_algorithm%29 [accessed 2007-07-03]
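In code, the selection operation of Definition D28.9 can be captured by a small interface, reusing the Individual record from the sketch above; as before, the names are illustrative assumptions rather than the book's actual API.

import java.util.List;

/**
 * A minimal sketch of the selection operation of Definition D28.9: choose
 * mps individuals from pop according to their (minimized) fitness values
 * and place them into the mating pool.
 */
interface SelectionAlgorithm<G, X> {

  /**
   * @param pop     the population to select from
   * @param fitness the scalar fitness values v(p), subject to minimization
   * @param mps     the desired size of the mating pool
   * @return the mating pool mate
   */
  List<Individual<G, X>> select(List<Individual<G, X>> pop,
                                double[] fitness, int mps);
}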

[Figure 28.6: Selection with and without replacement – the transition from the population Pop(t) (ps = 9) via the mating pool Mate(t) (mps = 5) to Pop(t+1); with replacement, individuals can enter the mating pool more than once, without replacement at most once; in some EAs (steady-state, (µ+λ) ES, ...) the parents compete with the offspring.]

Selection may behave in a deterministic or in a randomized manner, depending on the algorithm chosen and its application-dependent implementation. The selection algorithms have major impact on the performance of Evolutionary Algorithms. Their behavior has thus been subject to several detailed studies, conducted by, for example, Blickle and Thiele [340], Chakraborty et al. [520], Goldberg and Deb [1079], Sastry and Goldberg [2400], and Zhong et al. [3079], just to name a few. Usually, fitness assignment processes are carried out before selection, and the selection algorithms base their decisions solely on the fitness v of the individuals. Furthermore, elitist Evolutionary Algorithms may incorporate an archive archive in the selection process, as outlined in ??.

Generally, there are two classes of selection algorithms: such with replacement (annotated with a subscript r in this book) and such without replacement (here annotated with a subscript w) [2399]. In a selection algorithm without replacement, each individual from the population pop can enter the mating pool mate at most once in each generation: if a candidate solution has been selected, it cannot be selected a second time in the same generation. The mating pool returned by algorithms with replacement, on the other hand, can contain the same individual multiple times.

  mate = selectionw(pop, v, mps) ⇒ count(p, mate) = 1 ∀p ∈ mate                      (28.16)
  mate = selectionr(pop, v, mps) ⇒ count(p, mate) ≥ 1 ∀p ∈ mate                      (28.17)

In Figure 28.6, we present the transition from generation t to t + 1 for both types of selection methods. The dashed gray line in this figure stands for the possibility to include the parent generation in the next selection process. In generational EAs, the parent generation simply is discarded, as sketched in the top row of Figure 28.6. In other EAs, the parents selected into mate(t) may compete with their children in pop(t+1) and together take part in the next environmental selection step, as illustrated in the bottom row of Figure 28.6. This is the case in (µ + λ)-Evolution Strategies and in steady-state Evolutionary Algorithms, whereas, for example, (µ, λ)-Evolution Strategies discard the parents.

It is possible to rely on the prevalence or dominance relation, i.e., to write selection(pop, cmpf, mps) instead of selection(pop, v, mps), or, in case of single-objective optimization, to use the objective values directly (selection(pop, f, mps)) and thus to save the costs of the fitness assignment process. However, this can lead to the same problems that occurred in the first approach of Pareto-ranking fitness assignment (see Point 1 in Section 28.3.3 on page 275).

As outlined before, Evolutionary Algorithms generally apply survival selection, sometimes followed by a sexual selection procedure. In this section, we will focus on the former one.

28.4.1.1 Visualization

In the following sections, we will discuss multiple selection algorithms. In order to ease understanding them, we will visualize the expected number of offspring ES(p) of an individual p, i.e., the number of times it will take part in reproduction. If we assume uniform sexual selection (i.e., no sexual selection), this value will be proportional to its number of occurrences in the mating pool.

Example E28.18 (Individual Distribution after Selection).
For the sake of simplicity, we will use the special case where we have a population pop of the constant size ps(t) = ps(t + 1) = 1000 individuals p0, ..., p999. The individuals are numbered from 0 to 999, and we assume that sexual selection from the mating pool mate is uniform and that only a unary reproduction operation such as mutation is used. The fitness of an individual strongly influences its chance for reproduction. We consider four cases with fitness subject to minimization, as illustrated in Figure 28.7:

1. The individual pi has fitness i: v1(p0) = 0, v1(p1) = 1, ..., v1(p999) = 999 (Fig. 28.7.a).
2. The individual pi has fitness (i + 1)³: v2(p0) = 1, v2(p1) = 8, ..., v2(p999) = 1 000 000 000 (Fig. 28.7.b).
3. The individual pi has fitness v3(pi) = 0.0001i if i < 999 and 1 otherwise (Fig. 28.7.c).
4. The individual pi has fitness v4(pi) = 1 + v3(pi) (Fig. 28.7.d).

28. .288 28 EVOLUTIONARY ALGORITHMS 1000 800 600 400 200 v1(pi) 1e9 8e8 6e8 4e8 2e8 i v2(pi) i 0 200 400 600 800 Fig. 28.2 0.7.b: Case 2: v2 (pi ) = (i + 1)3 0 200 400 600 800 Fig.6 1. 1 otherwise Fig.0 1.2 0.7: The four example fitness cases.0 1.a: Case 1: v1 (pi ) = i 2.d: Case 4: v4 (pi ) = 1 + v3 (pi ) Figure 28. 28.4 v4(pi) i 0 = 200 400 600 800 200 400 600 800 Fig.7.6 1.8 i 0.4 0 v3(pi) 2.7. 28.c: Case 3: v3 (pi ) 0.0001i if i < 999.8 0.7.

28.4.2 Truncation Selection

Truncation selection28, also called deterministic selection, threshold selection, or breeding selection, returns the k ≤ mps best elements from the list pop. Both variants given below first sort the population in ascending order according to the fitness v. Algorithm 28.7 realizes truncation selection with replacement (i.e., where individuals may enter the mating pool mate more than once): the k best elements are copied as often as needed until the mating pool size mps is reached. Most often, k equals the mating pool size mps, which usually is significantly smaller than the population size ps. Algorithm 28.8 represents truncation selection without replacement; here k = mps and k < ps always holds, since otherwise this approach would make no sense (it would select all individuals), and it simply returns the best mps individuals from the population. You can find both algorithms implemented in one single class whose sources are provided in Listing 56.13 on page 754.

Algorithm 28.7: mate ←− truncationSelectionr(k, pop, v, mps)
  Input: pop: the list of individuals to select from
  Input: [implicit] ps: the population size, ps = len(pop)
  Input: v: the fitness values
  Input: mps: the number of individuals to be placed into the mating pool mate
  Input: k: the cut-off rank, usually k = mps
  Data: i: counter variable
  Output: mate: the survivors of the truncation which now form the mating pool

  begin
    mate ←− ()
    k ←− min {k, ps}
    pop ←− sortAsc(pop, v)
    for i ←− 1 up to mps do
      mate ←− addItem(mate, pop[i mod k])
    return mate

Algorithm 28.8: mate ←− truncationSelectionw(mps, pop, v)
  Input: pop: the list of individuals to select from
  Input: v: the fitness values
  Input: mps: the number of individuals to be placed into the mating pool mate
  Output: mate: the survivors of the truncation which now form the mating pool

  begin
    pop ←− sortAsc(pop, v)
    return deleteRange(pop, mps, ps − mps)

Truncation selection is typically applied in (µ + λ) and (µ, λ) Evolution Strategies as well as in steady-state EAs. The name breeding selection stems from the fact that farmers who breed animals or crops only choose the best individuals for reproduction, too [302].

28 http://en.wikipedia.org/wiki/Truncation_selection [accessed 2007-07-03]
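A short Java sketch of the with-replacement variant could look as follows; the names and the generic comparator are illustrative assumptions.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of truncation selection with replacement (Algorithm 28.7): sort the
 * population by ascending fitness and fill the mating pool by cycling over
 * the k best individuals.
 */
static <P> List<P> truncationSelection(List<P> pop, Comparator<P> byFitness,
                                       int k, int mps) {
  List<P> sorted = new ArrayList<>(pop);
  sorted.sort(byFitness);                    // best (lowest fitness) first
  k = Math.min(k, sorted.size());
  List<P> mate = new ArrayList<>(mps);
  for (int i = 0; i < mps; i++) {
    mate.add(sorted.get(i % k));             // copy the k best as often as needed
  }
  return mate;
}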

In general Evolutionary Algorithms, truncation selection should be combined with a fitness assignment process that incorporates diversity information in order to prevent premature convergence. Recently, Lässig and Hoffmann [1686] and Lässig et al. [1687] have proved that threshold selection, i.e., a truncation selection method where a cutoff fitness is defined after which no further individuals are selected, is the optimal selection strategy for crossover, provided that the right cut-off value is used. In practical applications, this value is normally not known.

Example E28.19 (Truncation Selection (Example E28.18 cntd)).
In Figure 28.8, we sketch the number of expected offspring for the individuals p ∈ pop from Example E28.18 if truncation selection is applied. As stated, k is less than or equal to the mating pool size mps. Under the assumption that no sexual selection takes place, ES(p) only depends on k. Under truncation selection, only the order of individuals and not the numerical relation of their fitness values is important; therefore, the diagram will look exactly the same regardless which of the four example fitness configurations we use. If we set k = ps, each individual will have one offspring on average. If k = (1/2)·mps, the top-50% individuals will have two offspring each and the others none. For k = (1/10)·mps, only the best 100 of the 1000 candidate solutions will reach the mating pool and reproduce 10 times on average.

[Figure 28.8: The number of expected offspring ES(pi) in truncation selection for k = 0.100·ps, 0.125·ps, 0.250·ps, 0.500·ps, and 1.000·ps, plotted over the individual index i (identical for all four example fitness cases).]

28.4.3 Fitness Proportionate Selection

Fitness proportionate selection29 was the selection method applied in the original Genetic Algorithms as introduced by Holland [1250] and therefore is one of the oldest selection schemes.

29 http://en.wikipedia.org/wiki/Fitness_proportionate_selection [accessed 2008-03-19]

In this original Genetic Algorithm, fitness was subject to maximization. Under fitness proportionate selection, the probability P(select(p)) of an individual p ∈ pop to enter the mating pool is proportional to its fitness v(p). This relation in its original form is defined in Equation 28.18 below: the probability P(select(p)) that individual p is picked in each selection decision corresponds to the fraction of its fitness in the sum of the fitness of all individuals.

  P(select(p)) = v(p) / Σ_{∀p′∈pop} v(p′)                                            (28.18)

There exists a variety of approaches which realize such probability distributions [1079], like stochastic remainder selection [358, 409] and stochastic universal selection [188, 1133]. The most commonly known method is the Monte Carlo roulette wheel selection by De Jong [711], where we imagine the individuals of a population to be placed on a roulette30 wheel as sketched in Fig. 28.9.a. The size of the area on the wheel standing for a candidate solution is proportional to its fitness. The wheel is spun, and the individual where it stops is placed into the mating pool mate. This procedure is repeated until mps individuals have been selected.

In the context of this book, fitness is subject to minimization: higher fitness values v(p) indicate unfit candidate solutions p.x, whereas lower fitness denotes high utility. Furthermore, fitness proportionate selection will handle the set of fitness values {0, 1, 2} in a different way than {10, 11, 12}. Therefore, the fitness values are first normalized into a range of [0, 1]. Equation 28.22 defines the framework for such a (normalized) fitness proportionate selection:

  minV = min {v(p) ∀p ∈ pop}                                                         (28.19)
  maxV = max {v(p) ∀p ∈ pop}                                                         (28.20)
  normV(p) = (maxV − v(p.x)) / (maxV − minV)                                         (28.21)
  P(select(p)) ≈ normV(p) / Σ_{∀p′∈pop} normV(p′)                                    (28.22)

It is realized in Algorithm 28.9 as a variant with replacement and in Algorithm 28.10 as a variant without replacement.

30 http://en.wikipedia.org/wiki/Roulette [accessed 2008-03-20]

Algorithm 28.9: mate ←− rouletteWheelSelectionr(pop, v, mps)
  Input: pop: the list of individuals to select from
  Input: [implicit] ps: the population size, ps = len(pop)
  Input: v: the fitness values
  Input: mps: the number of individuals to be placed into the mating pool mate
  Data: i: a counter variable
  Data: a: a temporary store for a numerical value
  Data: A: the array of aggregated normalized fitness values
  Data: min, max, sum: the minimum, maximum, and sum of the fitness values
  Output: mate: the mating pool

  begin
    mate ←− ()
    A ←− createList(ps, 0)
    min ←− ∞
    max ←− −∞
    // initialize fitness array and find extreme fitnesses
    for i ←− 0 up to ps − 1 do
      a ←− v(pop[i])
      A[i] ←− a
      if a < min then min ←− a
      if a > max then max ←− a
    // break situations where all individuals have the same fitness
    if max = min then
      max ←− max + 1
      min ←− min − 1
    /* aggregate fitnesses: A = (normV(pop[0]), normV(pop[0]) + normV(pop[1]),
       normV(pop[0]) + normV(pop[1]) + normV(pop[2]), ...) */
    sum ←− 0
    for i ←− 0 up to ps − 1 do
      sum ←− sum + (max − A[i]) / (max − min)
      A[i] ←− sum
    /* spin the roulette wheel: the chance that a uniformly distributed random
       number lands in the segment of A[i] is (inversely) proportional to v(pop[i]) */
    for i ←− 0 up to mps − 1 do
      a ←− searchItemAS(randomUni[0, sum), A)
      if a < 0 then a ←− −a − 1
      mate ←− addItem(mate, pop[a])
    return mate

Algorithm 28.10: mate ←− rouletteWheelSelectionw(pop, v, mps)
  Input: pop: the list of individuals to select from
  Input: [implicit] ps: the population size, ps = len(pop)
  Input: v: the fitness values
  Input: mps: the number of individuals to be placed into the mating pool mate
  Data: i, j: counter variables
  Data: a, b: temporary stores for numerical values
  Data: A: the array of aggregated normalized fitness values
  Data: min, max, sum: the minimum, maximum, and sum of the fitness values
  Output: mate: the mating pool

  begin
    mate ←− ()
    A ←− createList(ps, 0)
    min ←− ∞
    max ←− −∞
    // initialize fitness array and find extreme fitnesses
    for i ←− 0 up to ps − 1 do
      a ←− v(pop[i])
      A[i] ←− a
      if a < min then min ←− a
      if a > max then max ←− a
    // break situations where all individuals have the same fitness
    if max = min then
      max ←− max + 1
      min ←− min − 1
    /* aggregate fitnesses: A = (normV(pop[0]), normV(pop[0]) + normV(pop[1]),
       normV(pop[0]) + normV(pop[1]) + normV(pop[2]), ...) */
    sum ←− 0
    for i ←− 0 up to ps − 1 do
      sum ←− sum + (max − A[i]) / (max − min)
      A[i] ←− sum
    /* spin the roulette wheel: the chance that a uniformly distributed random
       number lands in the segment of A[i] is (inversely) proportional to v(pop[i]);
       each selected individual is removed so it cannot be chosen again */
    for i ←− 0 up to min {mps, ps} − 1 do
      a ←− searchItemAS(randomUni[0, sum), A)
      if a < 0 then a ←− −a − 1
      if a = 0 then b ←− 0
      else b ←− A[a − 1]
      b ←− A[a] − b
      // remove the element at index a from A
      for j ←− a + 1 up to len(A) − 1 do
        A[j] ←− A[j] − b
      sum ←− sum − b
      A ←− deleteItem(A, a)
      mate ←− addItem(mate, pop[a])
      pop ←− deleteItem(pop, a)
    return mate
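As a runnable illustration of the normalized scheme of Equations 28.19 to 28.22 and Algorithm 28.9 (fitness minimized), consider this Java sketch; names are illustrative and a simple linear scan replaces the binary search of the pseudocode.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Sketch of normalized fitness proportionate (roulette wheel) selection with
 * replacement for fitness subject to minimization. Returns the indices of
 * the selected individuals.
 */
static List<Integer> rouletteWheelSelection(double[] fitness, int mps, Random rnd) {
  int ps = fitness.length;
  double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
  for (double f : fitness) { min = Math.min(min, f); max = Math.max(max, f); }
  if (max == min) { max += 1.0; min -= 1.0; }   // avoid division by zero

  // cumulative normalized fitness: better (lower) fitness => larger slice
  double[] cumulative = new double[ps];
  double sum = 0.0;
  for (int i = 0; i < ps; i++) {
    sum += (max - fitness[i]) / (max - min);
    cumulative[i] = sum;
  }

  List<Integer> mate = new ArrayList<>(mps);
  for (int s = 0; s < mps; s++) {
    double r = rnd.nextDouble() * sum;          // spin the wheel
    int picked = ps - 1;
    for (int i = 0; i < ps; i++) {
      if (r <= cumulative[i]) { picked = i; break; }
    }
    mate.add(picked);
  }
  return mate;
}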

10.52 · 1011 holds for maximization. Example E28. For scenario 2 with v2 (pi . whereas the worst candidate solutions will always vanish in this example.9. In order to clarify these drawbacks.20 (Roulette Wheel Selection).a). The drawback becomes even clearer when comparing the two fitness functions v3 and v4 .9. the relation ES (pi ) = mps ∗ 2 999−i i 498501 holds for fitness maximization and ES (pi ) = mps 498501 for minimization. In Figure 28. Each single choice is based on the proportion of the individual fitness in the total fitness of all individuals. Whitley [2929] points out that even fitness normalization as performed here cannot overcome the drawbacks of fitness proportional selection methods. v(p2 ) = 20.p4 will occur in the mating pool mate if the fitness values are v(p1 ) = 10.a) and normalized minimization (Fig.9.b).x) = (i + 1)3 . From a balanced selection .18 and Equation 28. Fig. The meaning of this is that the design of the objective functions (or the fitness assignment process) has a very strong influence on the convergence behavior of the Evolutionary Algorithm.10. 28.b: Example for normalized fitness minimization.18 cntd)).9. In such cases. we illustrate the fitness distribution on the roulette wheel for fitness max- v(p2)=20 A(p2)= 1/5 A v(p3)=30 A(p3)= 1/3 A v(p1)=10 A(p1)= 1/10 A v(p1)=10 A(p1)= 30/60 A= 1/2 A v(p4)=40 A(p4)= 0/60 A =0A v(p4)=40 A(p4)= 2/5 A v(p3)=30 A(p3)= 10/60 A v(p2)=20 = 1/6 A 20 1 A(p2)= /60 A= /3 A Fig.1. As result (sketched in Fig. v(p3 ) = 30. Thus we draw one thousand times a single individual from the population pop. as defined in Equation 28.b are significantly different from those in Fig.4.10. and v(p4 ) = 40.10 illustrates the expected number of occurrences ES (pi ) of an individual pi if roulette wheel selection is applied with replacement. The resulting expected values depicted in Fig.9. 28. 28.51 · 1011 (i+1)3 and ES (pi ) = mps 2. 28. which then is as large as the population (mps = ps = 1000). In scenario 1 with the fitness sum 999∗998 = 498501..21 (Roulette Wheel Selection (Example E28. the fittest individuals produce (on average) two offspring.a: Example for fitness maximization. Amongst others. In this example we compute the expected proportion in which each of the four element p1 .9: Examples for the idea of roulette wheel selection.1. Figure 28. This selection method only works well if the fitness of an individual is indeed something like a proportional measure for the probability that it will produce better offspring.a. Figure 28.294 28 EVOLUTIONARY ALGORITHMS Example E28. let us visualize the expected results of roulette wheel selection applied to the special cases stated in Section 28. imimization (Fig. 28. v4 is basically the same as v3 – just shifted upwards by 1. 28. the total fitness sum is approximately 2.22. usually parents for each offspring are selected into the mating pool. 28.

[Figure 28.10: The number of expected offspring ES(pi) in roulette wheel selection for the four example fitness cases, each for fitness maximization and minimization. Fig. 28.10.a: v1(pi) = i; Fig. 28.10.b: v2(pi) = (i + 1)³; Fig. 28.10.c: v3(pi) = 0.0001i if i < 999, 1 otherwise; Fig. 28.10.d: v4(pi) = 1 + v3(pi).]

From a balanced selection method, we would expect that it produces similar mating pools for two functions which are that similar. However, the result is entirely different. For v3, the individual at index 999 is totally outstanding with its fitness of 1 (which is more than ten times the fitness of the next-best individual), and it receives many more copies in the mating pool than any other individual, whereas the worst candidate solutions will always vanish in this example. In the case of v4, it is only around twice as good as the next best individual and thus only receives around twice as many copies. From this situation, we can draw two conclusions:

1. The number of copies that an individual receives changes significantly when the vertical shift of the fitness function changes – even if its other characteristics remain constant. The meaning of this is that the design of the objective functions (or the fitness assignment process) has a very strong influence on the convergence behavior of the Evolutionary Algorithm. This selection method only works well if the fitness of an individual is indeed something like a proportional measure for the probability that it will produce better offspring.

2. In cases like v3, where one individual has outstanding fitness, this may degenerate the Genetic Algorithm to a better Hill Climber, since this individual will unite most of the slots in the mating pool on itself. If the Genetic Algorithm behaves like a Hill Climber, it loses most of its benefits, such as the ability to trace more than one possible solution at a time.

Roulette wheel selection has a bad performance compared to other schemes like tournament selection or ranking selection [1079]. It is mainly included here for the sake of completeness and because it is easy to understand and suitable for educational purposes.

28.4.4 Tournament Selection

Tournament selection31, proposed by Wetzel [2923] and studied by Brindle [409], is one of the most popular and effective selection schemes. Its features are well-known and have been analyzed by a variety of researchers such as Blickle and Thiele [339, 340], Lee et al. [1707], Miller and Goldberg [1898], Sastry and Goldberg [2400], and Oei et al. [2063]. In tournament selection, k elements are picked from the population pop and compared with each other in a tournament. The winner of this competition will then enter the mating pool mate. Although being a simple selection strategy, it is very powerful and therefore used in many practical applications [81, 96, 461, 1880].

Example E28.22 (Binary Tournament Selection).
As example, consider a tournament selection (with replacement) with a tournament size of two [2931]. For each single tournament, the contestants are chosen randomly according to a uniform distribution, and the winners will be allowed to enter the mating pool. If we assume that the mating pool will contain about as many individuals as the population, each individual will, on average, participate in two tournaments. The best candidate solution of the population will win all the contests it takes part in and thus, again on average, contributes approximately two copies to the mating pool. The median individual of the population is better than 50% of its challengers but will also lose against 50%; hence, it will enter the mating pool roughly one time on average. The worst individual in the population will lose all its challenges to other candidate solutions and can only score even if competing against itself, which will happen with probability (1/mps)². It will not be able to reproduce in the average case, because mps ∗ (1/mps)² = 1/mps < 1 ∀mps > 1.

Example E28.23 (Tournament Selection (Example E28.18 cntd)).
Again, let us go back to Example E28.18 with a population of 1000 individuals p0, ..., p999 and mps = ps = 1000. For visualization purposes, we assume that each individual has a unique fitness value of v1(pi) = i, v2(pi) = (i + 1)³, v3(pi) = 0.0001i if i < 999, 1 otherwise, and v4(pi) = 1 + v3(pi), respectively.

31 http://en.wikipedia.org/wiki/Tournament_selection [accessed 2007-07-03]

If we apply tournament selection with replacement in this special scenario, the expected number of occurrences ES(pi) of an individual pi in the mating pool can be computed according to Blickle and Thiele [340] as

  ES(pi) = mps ∗ [ ((1000 − i)/1000)^k − ((1000 − i − 1)/1000)^k ]                   (28.23)

The absolute values of the fitness play no role in tournament selection. The only thing that matters is whether or not the fitness of one individual is higher than the fitness of another one – not the fitness difference itself. Tournament selection thus gets rid of the problematic dependency on the relative proportions of the fitness values in roulette-wheel style selection schemes, and the expected numbers of offspring for the four example cases from Example E28.18 are all the same.

Figure 28.11 depicts the expected offspring numbers for different tournament sizes k ∈ {1, 2, 3, 4, 5, 10}. If k = 1, tournament selection degenerates to randomly picking individuals, and each candidate solution will occur one time in the mating pool on average. With rising k, the selection pressure increases: individuals with good fitness values create more and more offspring, whereas the chance of worse candidate solutions to reproduce decreases.

[Figure 28.11: The number of expected offspring ES(pi) in tournament selection for the tournament sizes k = 1, 2, 3, 4, 5, and 10.]

Tournament selection with replacement (TSR) is presented in Algorithm 28.11, which has been implemented in Listing 56.14 on page 756. Tournament selection without replacement (TSoR) [43, 1707] can be defined in two forms: in the first variant, specified as Algorithm 28.12, a selected individual is removed from the population and may thus enter the mating pool at most once; in the second variant, given as Algorithm 28.13, the contestants within a tournament are distinct, i.e., a candidate solution cannot compete against itself.

Algorithm 28.11: mate ←− tournamentSelectionr(k, pop, v, mps)
  Input: pop: the list of individuals to select from
  Input: [implicit] ps: the population size, ps = len(pop)
  Input: v: the fitness values
  Input: mps: the number of individuals to be placed into the mating pool mate
  Input: [implicit] k: the tournament size
  Data: i, j, a, b: counter variables
  Output: mate: the winners of the tournaments which now form the mating pool

  begin
    mate ←− ()
    for i ←− 1 up to mps do
      a ←− ⌊randomUni[0, ps)⌋
      for j ←− 1 up to k − 1 do
        b ←− ⌊randomUni[0, ps)⌋
        if v(pop[b]) < v(pop[a]) then a ←− b
      mate ←− addItem(mate, pop[a])
    return mate

Algorithm 28.12: mate ←− tournamentSelectionw(k, pop, v, mps)
  Input: pop: the list of individuals to select from
  Input: v: the fitness values
  Input: mps: the number of individuals to be placed into the mating pool mate
  Input: [implicit] k: the tournament size
  Data: i, j, a, b: counter variables
  Output: mate: the winners of the tournaments which now form the mating pool

  begin
    mate ←− ()
    pop ←− sortAsc(pop, v)
    for i ←− 1 up to min {len(pop), mps} do
      a ←− ⌊randomUni[0, len(pop))⌋
      for j ←− 1 up to min {len(pop), k} − 1 do
        b ←− ⌊randomUni[0, len(pop))⌋
        if v(pop[b]) < v(pop[a]) then a ←− b
      mate ←− addItem(mate, pop[a])
      pop ←− deleteItem(pop, a)
    return mate
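A Java sketch of the with-replacement variant could look as follows; names are illustrative assumptions, and the result is the list of indices of the selected individuals.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Sketch of deterministic tournament selection with replacement: for each of
 * the mps slots, pick k random contestants and copy the one with the lowest
 * fitness into the mating pool.
 */
static List<Integer> tournamentSelection(double[] fitness, int k, int mps, Random rnd) {
  int ps = fitness.length;
  List<Integer> mate = new ArrayList<>(mps);
  for (int s = 0; s < mps; s++) {
    int best = rnd.nextInt(ps);                 // first contestant
    for (int j = 1; j < k; j++) {
      int challenger = rnd.nextInt(ps);
      if (fitness[challenger] < fitness[best]) {
        best = challenger;                      // lower fitness wins the contest
      }
    }
    mate.add(best);
  }
  return mate;
}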

a set of additional tournament-based selection methods has been introduced by Lee et al. a) a ←− minA mate ←− addItem(mate. SELECTION Algorithm 28. [1707]. mps) Input: pop: the list of individuals to select from Input: v: the fitness values Input: mps: the number of individuals to be placed into the mating pool mate Input: [implicit] k: the tournament size Data: A: the list of contestants per tournament Data: a: the tournament winner Data: i. v) for i ←− 1 up to mps do A ←− () for j ←− 1 up to min{k. . pop. a probability η is defined. v. The best individual in the tournament is selected with probability η.4. In the non-deterministic variants. len(pop)} do repeat searchItemUS(a. len(pop))⌋ A ←− addItem(A. Notice that it becomes equivalent to Algorithm 28. The ith best individual in a tournament enters the mating pool with probability η(1 − η)i .11 on the preceding page if η is set to 1. the second best with probability η(1 − η). Besides the algorithms discussed here. this is not necessarily the case. There.28.14 realizes this behavior for a tournament selection with replacement. pop[a]) return mate The algorithms specified here should more precisely be entitled as deterministic tournament selection algorithms since the winner of the k contestants that take part in each tournament enters the mating pool. A)¡0 until a ←− ⌊randomUni[0. the third best with probability η(1 − η)2 and so on.13: mate ←− tournamentSelectionw (k. Algorithm 28. j: counter variables Output: mate: the winners of the tournaments which now form the mating pool 1 2 3 4 5 6 7 8 9 10 11 12 13 299 begin mate ←− () pop ←− sortAsc(pop.

24. we can consider the conventional ranking selection method as the application of a fitness assignment process setting the rank as fitness (which can be achieved with Pareto ranking. a2 ) ≡ (a1 − a2 )) for j ←− 0 up to len(A) − 1 do if (randomUni[0. 1 ps ps − i − 1 ps − 1 η+ ps P (select(pi )) = η− + η+ − η− (28. v) for i ←− 1 up to mps do A ←− () for j ←− 1 up to k do A ←− addItem(A. In linear ranking selection with replacement. j: counter variables Output: mate: the winners of the tournaments which now form the mating pool 1 2 3 4 5 6 7 8 9 10 11 12 begin mate ←− () pop ←− sortAsc(pop. ps)⌋) A ←− sortAsc(A. we will discuss it again for fitness minimization. introduced by Baker [187] and more thoroughly discussed by Blickle and Thiele [340]. 1] Input: [implicit] k: the tournament size Data: A: the set of tournament contestants Data: i. the best individual (at rank 0) will have the probability of being . mps) Input: pop: the list of individuals to select from Input: [implicit] ps: the population size. k. η ∈ [0. 1) ≤ η) ∨ (j ≥ len(A) − 1) then mate ←− addItem(mate. v. the probability P (select(pi )) that the individual at rank i ∈ 0. linear ranking selection was used to maximize fitness. they must be assigned to different ranks [340]. ps = len(pop) Input: v: the fitness values Input: mps: the number of individuals to be placed into the mating pool mate Input: [implicit] η: the selection probability. ⌊randomUni[0. 1133. for instance) and a subsequent fitness proportional selection. Here. Using the rank smoothes out larger differences of the fitness values and emphasizes small ones. 2929]. pop[A[j]]) j ←− ∞ 13 return mate 28. pop. the probability of an individual to be selected is proportional to its position (rank) in the sorted list of all individuals in the population.5 Linear Ranking Selection Linear ranking selection [340].14: mate ←− tournamentSelectionNDr (η. Whitley [2929]. Generally.4.300 28 EVOLUTIONARY ALGORITHMS Algorithm 28. cmp(a1 . In its original definition discussed in [340]. Notice that each individual always receives a different rank and even if two individuals have the same fitness.24) In Equation 28.ps is picked in each selection decision is given in Equation 28.24 below. In ranking selection [187. and Goldberg and Deb [1079] is another approach for circumventing the problems of fitness proportionate selection methods..

e.28) 1 ps η− + η+ − η− η− + η+ − η− ps−1 1= i=0 ps − i − 1 ps − 1 ps − i − 1 ps − 1 ps − i − 1 ps − 1 ps − i − 1 ps − 1 (28.31) = 1 ps η − + η + − η − ps 1 η+ − η− ps η − + ps ps − 1 (28. If the selection algorithm should furthermore make sense. e. 0 ≤ η − < η + ≤ 2 is true.29) = 1 ps ps−1 (28.41) 1 ps (ps − 1) − ps (ps − 1) 2 1 η+ − η− 1 ps η − + ps (ps − 1) ps ps − 1 2 1 1 ps η − + ps η + − η − ps 2 1 1 1 ps η − + ps η + ps 2 2 1 − 1 + η + η 2 2 1 + η 2 η+ The parameters η − and η + must be chosen in a way that 2 − η − = η + and.28. this factor .26) (28. fitness is subject to minimization).39) (28.36) (28.27) 0 ps − 1 ps − 0 − 1 1 − = η + η+ − η− ps − 1 ps ps − (ps − 1) − 1 ps − 1 = η− ps It should be noted that all probabilities must add up to one. Koza [1602] determines the slope of the linear ranking selection probability function by a factor rm in his implementation of this selection scheme.37) (28. : ps−1 P (select(pi )) = 1 i=0 ps−1 (28.25) ps (28..34) (28.. According to [340]. SELECTION picked while the worst individual (rank ps − 1) is chosen with 1 ps 1 P (select(pps−1 )) = ps P (select(p0 )) = = 1 ps η− + η+ − η− η− + η+ − η− η− + η+ − η− η ps − 301 chance: η+ = (28. obviously. prefer individuals with a lower fitness rank over those with a higher fitness rank (again.40) (28.35) (28.38) (28. 0 < η − and 0 < η + must hold as well.33) 1 η+ − η− = ps η − + ps ps − 1 = = = = 1= 1 1 − η− = 2 2 − η− = η+ − η− 1 ps η − + ps ps − 1 ps (ps − 1) − i i=0 (28. i.32) i=0 ps−1 = i=0 ps − i − 1 ps−1 (28.4. i.30) i=0 = 1 ps η − + ps i=0 η+ − η− ps−1 (28.
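Read this way, Equation 28.24 together with the constraint η+ = 2 − η− derived above translates directly into a small selection routine. The sketch below (standalone Python for fitness minimization with rank 0 as the best individual; all names are mine and it assumes a population size greater than one) computes the per-rank probabilities (1/ps)·(η− + (η+ − η−)(ps − i − 1)/(ps − 1)) and samples the mating pool from them; the assertion checks that the probabilities indeed sum to one, as shown in the derivation.

```python
import random

def linear_ranking_select(pop, fitness, mps, eta_minus=0.5):
    """Linear ranking selection with replacement for fitness minimization.

    Rank 0 is the best individual; with eta_plus = 2 - eta_minus the rank-i
    probability is (eta_minus + (eta_plus - eta_minus)*(ps-i-1)/(ps-1)) / ps.
    Assumes ps = len(pop) > 1.
    """
    eta_plus = 2.0 - eta_minus
    ps = len(pop)
    ranked = sorted(pop, key=fitness)          # best first, so rank == index
    probs = [(eta_minus + (eta_plus - eta_minus) * (ps - i - 1) / (ps - 1)) / ps
             for i in range(ps)]
    assert abs(sum(probs) - 1.0) < 1e-9        # probabilities add up to one
    return random.choices(ranked, weights=probs, k=mps)

population = [random.uniform(-5.0, 5.0) for _ in range(10)]
pool = linear_ranking_select(population, fitness=lambda x: x * x, mps=6)
```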

However. Premature convergence.45) (28.4.42) (28. ps)⌋]) return mate Algorithm 28. We therefore only need to set v′ (pi ) = 1 − P (select(pi )) (since Algorithm 28. to determine which individuals are allowed to produce offspring (see Section 28. e. Algorithm 28.d.e.15: mate ←− randomSelectionr (pop. especially in cases where there are many parents such as in Evolution Strategies which are introduced in Chapter 30 on page 359.9 on page 292 minimizes fitness)..1. mps) Input: pop: the list of individuals to select from Input: [implicit] ps: the population size.9 on page 292. 2914]. it may make sense to use random selection to choose the parents for the different individuals. we often use a very simple mechanism to prevent premature convergence. η− = η+ 2 rm + 1 2rm = rm + 1 2rm 2 = rm + 1 rm + 1 2 (rm + 1) − 2 = 2rm (28. as outlined in Chapter 13.46) 2 − η− = η+ ⇒ 2 − 2rm = 2rm q. to use this selection scheme for survival/environmental selection. pop[⌊randomUni[0.15 on page 758. 28. An example implementation of this algorithm is given in Listing 56. 28.7 Clearing and Simple Convergence Prevention (SCP) In our experiments. means that the optimization process gets stuck at a local optimum and loses the ability to investigate other areas in the .302 28 EVOLUTIONARY ALGORITHMS can be transformed to η − and η + by the following equations.4. we can simply emulate this scheme by (1) first building a secondary fitness function v′ based on the ranks of the individuals pi according to the original fitness v and Equation 28.2. Obviously. ps = len(pop) Input: mps: the number of individuals to be placed into the mating pool mate Data: i: counter variable Output: mate: the randomly picked parents 1 2 3 4 5 begin mate ←− () for i ←− 1 up to mps do mate ←− addItem(mate. since it means to disregard all fitness information.6 on page 260) makes little sense.6 Random Selection Random selection is a selection method which simply fills the mating pool with randomly picked individual from the population. after an actual mating pool has been filled.15 provides a specification of random selection with replacement. This would turn the Evolutionary Algorithm into some sort of parallel random walk.43) (28.24 and then (2) applying fitness proportionate selection as given in Algorithm 28.44) (28. As mentioned. especially in Genetic Programming [2888] and problems with discrete objective functions [2912. i.
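Random parent selection as in Algorithm 28.15 needs no fitness information at all and reduces to a few lines in practice. The sketch below (standalone Python; names are mine) simply draws mps parents uniformly with replacement, which is, for example, how the parents for the recombination step of an Evolution Strategy may be chosen.

```python
import random

def random_select(pop, mps):
    """Random selection with replacement: every individual is equally likely
    to enter the mating pool, regardless of its fitness."""
    return [random.choice(pop) for _ in range(mps)]

parents = random_select(list(range(100)), mps=7)
```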

a u Singh and Deb [2503] suggest a modified clearing approach which shifts individuals that would be cleared farther away and reevaluates their fitness.7. Then. v) for i ←− 0 up to len(P ) − 1 do if v(P [i]) < ∞ then n ←− 1 for j ←− i + 1 up to len(P ) − 1 do if (v(P [j]) < ∞) ∧ (dist(P [i]. . Sareni and Kr¨henb¨hl [2391] showed that this method is very promising.4. some individuals necessarily will be located somewhere else and. P [j]) < σ) then if n < k then n ←− n + 1 else v′ (P [j]) ←− ∞ return v′ 11 to a distance measure “dist” applied in the genotypic (G) or phenotypic space (X) in each generation. because better points than the global optimum cannot be discovered anyway. Then. 2168] whose clearing approach is applied in each e generation and works as specified in Algorithm 28. In case that the optimizer indeed converged to the global optimum. clearing divides the population of an EA into several sub-populations according Algorithm 28. we placed it into this section. if they survive. This effectively prevents that a niche can get too crowded. Since our method actively deletes individuals from the population. v) Input: pop: the list of individuals to apply clearing to Input: σ: the clearing radius Input: k: the nieche capacity Input: [implicit] v: the fitness values Input: [implicit] dist: a distance measure in the genome or phenome Data: n: the current number of winners Data: i. the fitness of all but the k best individuals in such a sub-population is set to the worst possible value. the fraction of the population which resides at that spot should be limited. 28.16: v′ ←− clearingFilter(pop. Basically. OurSCP method is neither a fitness nor a selection algorithm.16 where fitness is subject to minimization. j: counter variables Data: P : counter variables Output: v′ : the new fitness assignment 1 2 3 4 5 6 7 8 9 10 begin v′ ←− f itnf P ←− sortAsc(pop.4. SELECTION 303 search space. may find a path away from the local optimum towards a different area in the search space. The idea is simple: the more similar individuals exist in the population. such a measure does not cause problems because in the end. the more likely the optimization process has converged. k. in particular the area where the global optimum resides. σ.28. but instead more of a filter which is placed after computing the objectives and before applying a fitness assignment process.1 Clearing The first one to apply an explicit limitation of the number of individuals in an area of the search space was P´trowski [2167. The individuals of each sub-population have at most the distance σ to the fittest individual in this niche. If it got stuck at a local optimum. It is not clear whether it converged to a global optimum or to a local one.
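The clearing procedure can be sketched as follows in standalone Python (for fitness minimization; the distance function, the clearing radius sigma, and the niche capacity k are passed in as parameters, and all names are chosen for this illustration, not taken from the book's code). Individuals are visited in ascending order of fitness; around each individual that has not been cleared yet, at most k members within distance sigma keep their fitness, and all further members of that niche receive the worst possible value, here infinity.

```python
import math

def clearing_filter(pop, fitness, dist, sigma, k):
    """Clearing in the spirit of Algorithm 28.16 (fitness minimization):
    within each niche of radius sigma around a fit individual, at most k
    individuals keep their fitness; all others are set to infinity."""
    order = sorted(range(len(pop)), key=lambda i: fitness[i])   # best first
    new_fit = {i: fitness[i] for i in range(len(pop))}
    for pos, a in enumerate(order):
        if math.isinf(new_fit[a]):
            continue                        # already cleared by a better niche winner
        winners = 1                         # the niche winner itself
        for b in order[pos + 1:]:
            if math.isinf(new_fit[b]):
                continue
            if dist(pop[a], pop[b]) <= sigma:
                if winners < k:
                    winners += 1            # b survives inside the niche
                else:
                    new_fit[b] = math.inf   # niche is full: clear b
    return new_fit

# toy usage: 1-d real individuals, absolute distance, sphere objective
pts = [0.0, 0.05, 0.1, 0.12, 3.0, 3.1]
fit = {i: x * x for i, x in enumerate(pts)}
print(clearing_filter(pts, fit, dist=lambda a, b: abs(a - b), sigma=0.5, k=2))
```

In the toy example, the four points clustered around 0 form one niche of which only the two best keep a finite fitness, while the two points around 3 form a second, uncrowded niche.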

1] Input: [implicit] f : the set of objective function(s) Data: i. j: counter variables Data: p: the individual checked in this generation Output: P : the pruned population 1 2 3 4 5 6 7 8 begin P ←− () for i ←− 0 up to len(P ) − 1 do p ←− pop[i] for j ←− len(P ) − 1 down to 0 do if f (p.47) In Figure 28. this prevention mechanism is turned off. From Equation 28.x) = f (pop′ [j]. one of them is thrown away with probability33 c ∈ [0. for c = 1.47 follows that even a population of infinite size which has fully converged to one single value will probably not contain more than 1 copies of this individual c 32 The exactly-the-same-criterion makes sense in combinatorial optimization and many Genetic Programming problems but may easily be replaced with a limit imposed on the Euclidian distance in real-valued optimization problems. j) return P P ←− addItem(P.304 28. 2914].x) then if randomUni[0.17 was applied. For c = 0. It has to be applied after the evaluation of the objective values of the individuals in the population and before any fitness assignment or selection takes place. the outcomes were influenced negatively by this filter. 2912. 1) < c then P ←− deleteItem(P. 33 instead of defining a fixed threshold k . All individuals are compared with each other. Algorithm 28. we sketch the expected number of remaining instances of the individual p after this pruning process if it occurred n times in the population before Algorithm 28. 1] and does not take part in any further comparisons. This way. in none of our experiments. we weed out similar individuals without making any assumptions about G or X and make room in the population and mating pool for a wider diversity of candidate solutions. c. If two have exactly the same objective values32 . Although this approach is very simple.2 SCP 28 EVOLUTIONARY ALGORITHMS We modify this approach in two respects: We measure similarity not in form of a distance in the search space G or the problem space X. f ) Input: pop: the list of individuals to apply convergence prevention to Input: c: the convergence prevention probability. Additionally.7.17: P ←− scpFilter(pop. c ∈ [0. 2888.4. but in the objective space Y ⊆ R|f | .12. Algorithm 28. all remaining individuals will have different objective values. the results of our experiments were often significantly better with this convergence prevention method turned on than without it [2188.17 cuts down the expected number of their occurrences ESp to n n−1 ES (p) = i=1 (1 − c) i−1 = i=0 (1 − c) = i 1 − (1 − c) (1 − c) − 1 = −c c n n (28. for instance. Algorithm 28. which makes it even more robust than other methods for convergence prevention like sharing or variety preserving. p) 9 10 If an individual p occurs n times in the population or if there are n individuals with exactly the same objective values.17 specifies how our simple mechanism works.
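A compact standalone Python sketch of this filter (again not the book's Java code; names are mine) is given below. Every time a new individual is appended, each already-kept individual with exactly the same objective values is discarded with probability c, which reproduces the expectation of Equation 28.47: if an objective vector occurs n times, about (1 − (1 − c)^n)/c copies remain.

```python
import random

def scp_filter(pop, objectives, c):
    """Simple Convergence Prevention: individuals sharing exactly the same
    objective values are thinned out with probability c (cf. Algorithm 28.17)."""
    kept = []
    for p in pop:
        y = objectives(p)
        # every already-kept duplicate of p's objective vector is removed with probability c
        kept = [q for q in kept if objectives(q) != y or random.random() >= c]
        kept.append(p)                      # the newest copy is always kept
    return kept

def expected_survivors(n, c):
    """Expected number of surviving copies of an n-fold duplicated objective
    vector, (1 - (1 - c)**n) / c, which approaches 1/c for large n."""
    return (1.0 - (1.0 - c) ** n) / c

pop = [1.0] * 10                            # ten candidate solutions with identical objectives
print(len(scp_filter(pop, objectives=lambda p: (p,), c=0.3)),
      "expected:", round(expected_survivors(10, 0.3), 2))
```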

3 cp=0. This threshold is also visible in Figure 28. clearing should be more suitable whereas we know from experiments using our approach only that SCP performs very good in combinatorial problems [2188. we assume that in real-valued search or problem spaces. Different from that. 2914] and Genetic Programming [2888]. The former approach works against premature convergence to a certain solution structure while the latter forces the EA to “keep track” of a trail to candidate solutions with worse fitness which may later evolve to good individuals with traits different from the currently exploited ones. . the maximum number of individuals which can e survive in a niche was a fixed constant k and.4. none of them would be affected.3 Discussion In P´trowski’s clearing approach [2167].7.5 cp=0. n→∞ lim ES (p) = lim 1 − (1 − c) 1−0 1 = = n→∞ c c c n (28. if less than k individuals resided in a niche.48) 28. combined with any kind of population-based optimizer. Furthermore.12. since our method does not need to access any information apart from the objective values.4. once implemented.28. an expected value of the number of individuals allowed in a niche is specified with the probability c and may be both. SCP stops it from keeping too many individuals with equal utility.7 25 Figure 28. exceeded or undercut.2 cp=0. in SCP. Another difference of the approaches arises from the space in which the distance is computed: Whereas clearing prevents the EA from concentrating too much on a certain area in the search or problem space. after the simple convergence prevention has been applied. Which of the two approaches is better has not yet been tested with comparative experiments and is part of our future work.12: The expected numbers of occurrences for different values of n and c. SELECTION 305 9 ES(p) 7 6 5 4 3 2 1 0 5 10 15 n cp=0.1 cp=0. it is independent from the search and problem space and can. At the present moment.
after the simple convergence prevention has been applied:

lim_{n→∞} ES(p) = lim_{n→∞} (1 − (1 − c)^n) / c = (1 − 0) / c = 1/c        (28.48)

This threshold is also visible in Figure 28.12.

[Figure 28.12: The expected numbers of occurrences ES(p) over n for different values of c (curves for c = 0.1, 0.2, 0.5, and 0.7).]

28.4.7.3 Discussion

In Pétrowski's clearing approach [2167], the maximum number of individuals which can survive in a niche was a fixed constant k and, if less than k individuals resided in a niche, none of them would be affected. Different from that, in SCP, an expected value of the number of individuals allowed in a niche is specified with the probability c and may be both exceeded and undercut. Another difference of the two approaches arises from the space in which the distance is computed: whereas clearing prevents the EA from concentrating too much on a certain area in the search or problem space, SCP stops it from keeping too many individuals with equal utility. The former approach works against premature convergence to a certain solution structure while the latter forces the EA to "keep track" of a trail to candidate solutions with worse fitness which may later evolve to good individuals with traits different from the currently exploited ones. Furthermore, since our method does not need to access any information apart from the objective values, it is independent from the search and problem space and can, once implemented, be combined with any kind of population-based optimizer.

Which of the two approaches is better has not yet been tested with comparative experiments and is part of our future work. At the present moment, we assume that in real-valued search or problem spaces, clearing should be more suitable, whereas we know from experiments using our approach only that SCP performs very well in combinatorial problems [2188, 2914] and Genetic Programming [2888].

org/wiki/Sexual_reproduction [accessed 2008-03-17] . recombination 36 combines two parental genotypes to a new genotype including traits from both elders. The search operations searchOp ∈ Op used in the Evolutionary Algorithm family are called reproduction operation.org/wiki/Mutation [accessed 2007-07-03] 36 http://en. There exist different methods to do so. unless seeding is applied (see Paragraph 28.18: pop ←− createPop(ps) Input: ps: the number of individuals in the new population Input: [implicit] create: the creation operator Data: i: a counter variable Output: pop: the new population of randomly created individuals (len(pop) = s) 1 2 3 4 5 6 7 begin pop ←− () for i ←− 0 up to ps − 1 do p ←− ∅ p. e. random variations in the genotype of an individual. Creation simply creates a new genotype without any ancestors or heritage.306 28 EVOLUTIONARY ALGORITHMS 28. i. no information about the search space has been gathered yet. 34 http://en. 4.2. In Evolutionary Algorithms. It is useful to increase the share of a given type of individual in a population.11 (Creation). exactly natural mutation35 .wikipedia. Mutation in Evolutionary Algorithms corresponds to small.wikipedia.. The creation operation create genotype g ∈ G with a random configuration. Hence.1. 3. p) return pop Duplication is just a placeholder for copying an element of the search space. it is what occurs when neither mutation nor recombination are applied. In the following. 2.2. There are four basic operations: 1. we will discuss these operations in detail and provide general definitions form them. Like in sexual reproduction. Definition D28.18.g ←− create pop ←− addItem(pop. Duplication directly copies an existing genotype without any modification. we define the function createPop(ps) in Algorithm 28. g = create ⇒g∈G is used to produce a new (28. Creation is thus used to fill the initial population pop(t = 1). inspired by the biological procreation mechanisms34 of mother nature [2303]. When an Evolutionary Algorithm starts.5 Reproduction An optimization algorithm uses the information gathered up to step t for creating the candidate solutions to be evaluated in step t + 1.org/wiki/Reproduction [accessed 2007-07-03] 35 http://en. the aggregated information corresponds to the population pop and the archive archive of best individuals. Algorithm 28.49) For creating the initial population of the size ps. if such an archive is maintained. we cannot use existing candidate solutions to derive new ones and search operations with an arity higher than zero cannot be applied.2.wikipedia.
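In code, the creation of the initial population is little more than a loop around the nullary creation operator. The following standalone Python sketch (names are mine; individuals are plain dictionaries holding a genotype under the key "g") corresponds to the loop of Algorithm 28.18, with a random bit-string creator used as an example operator.

```python
import random

def create_pop(ps, create):
    """Create the initial population: ps individuals, each holding a genotype
    produced by the nullary creation operator (cf. Algorithm 28.18)."""
    return [{"g": create()} for _ in range(ps)]

# example creation operator: uniformly random bit strings of length 8
pop = create_pop(ps=4, create=lambda: [random.randint(0, 1) for _ in range(8)])
```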

to perform something like mutate(recombine(g1 . The way this modification is performed is applicationdependent. gn = recombine(ga . is only used if the elements search space G are linear representations. a unary search operation.12 (Duplication). REPRODUCTION 307 Definition D28. Its role model is sexual reproduction.13 (Mutation).19. Crossover. The population reproduction operation pop = reproducePop(mate. This operation takes one genotype as argument and modifies it. The mutation operation mutate : G → G is used to create a new genotype gn ∈ G by modifying an existing one.5. Together with duplication. mutation. The goal of implementing a recombination operation is to provide the search process with some means to join beneficial features of different candidate solutions. g2 )). gb ) : ga . gb ∈ G ⇒ gn ∈ G (28. it stands for exchanging parts of these so-called strings.. however. In Algorithm 28. gn = mutate(g) : g ∈ G ⇒ gn ∈ G (28. 1] and cr ∈ [0. Definition D28. we give a specification on how Evolutionary Algorithms usually would apply the reproduction operations defined so far to create a new population of ps individuals.15 (reproducePop). 1]. mutation corresponds to the asexual reproduction in nature. g = duplicate(g) ∀g ∈ G (28. the recombination is a search operation inspired by natural reproduction. The duplication operation duplicate : G → G is used to create an exact copy of an existing genotype g ∈ G. i. Definition D28. Notice that the term recombination is more general than crossover since it stands for arbitrary search operations that combines the traits of two individuals.wikipedia.14 (Recombination).52) It is not unusual to chain the application of search operators.org/wiki/Recombination [accessed 2007-07-03] .50) In most EAs. Then. 37 http://en. The four operators are altogether used to reproduce whole populations of individuals.51) Like mutation. is provided. respectively. Definition D28. This procedure applies mutation and crossover at specific rates mr ∈ [0. It may happen in a randomized or in a deterministic fashion. e. The recombination37 operation recombine : G × G → G is used to create a new genotype gn ∈ G by combining the features of two existing ones. This may happen in a deterministic or randomized fashion.28. ps) is used to create a new population pop of ps individuals by applying the reproduction operations to the mating pool mate.
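For a fixed-length bit-string genome, one common concrete choice of these operators is duplication as a plain copy, mutation as a single random bit flip, and recombination as single-point crossover; the reproduction procedure of Algorithm 28.19 then applies crossover with rate cr and mutation with rate mr when building the next population. The standalone Python sketch below illustrates exactly this combination (the operator choices are standard textbook examples rather than the only possibilities, and all names are mine).

```python
import random

def duplicate(g):
    """Duplication: return an exact copy of the genotype."""
    return list(g)

def mutate(g):
    """Mutation: copy the genotype and flip one randomly chosen bit."""
    h = list(g)
    i = random.randrange(len(h))
    h[i] ^= 1
    return h

def recombine(ga, gb):
    """Recombination as single-point crossover: a prefix of one parent
    followed by the matching suffix of the other."""
    cut = random.randrange(1, len(ga))
    return ga[:cut] + gb[cut:]

def reproduce_pop_ea(mate, ps, mr, cr):
    """Create ps offspring from the mating pool, applying crossover with rate
    cr and mutation with rate mr (in the spirit of Algorithm 28.19)."""
    mps = len(mate)
    pop = []
    for i in range(ps):
        g = duplicate(mate[i % mps])
        if mps > 1 and random.random() < cr:
            j = i % mps
            while j == i % mps:                 # pick a second, different parent
                j = random.randrange(mps)
            g = recombine(g, mate[j])
        if random.random() < mr:
            g = mutate(g)
        pop.append(g)
    return pop

mating_pool = [[random.randint(0, 1) for _ in range(8)] for _ in range(5)]
offspring = reproduce_pop_ea(mating_pool, ps=10, mr=0.1, cr=0.7)
child = mutate(recombine(mating_pool[0], mating_pool[1]))   # chained operators
```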

In scenarios where the search space G differs from the problem space X it often makes more sense to store the list of individual records P instead of just keeping a set x of the best phenotypes X. cr) Input: mate: the mating pool Input: [implicit] mps: the mating pool size Input: ps: the number of individuals in the new population Input: mr: the mutation rate Input: cr: the crossover rate Input: [implicit] mutate. ∀x ∈ extractPhenotypes(P ) ⇒ ∃p ∈ P : x = p. It uses knowledge of the prevalence .g) return pop if randomUni[0. 1) < cr) ∧ (mps > 1) then repeat j ←− ⌊randomUni[0. recombine: the mutation and crossover operators Data: i.x (28. for instance.308 28 EVOLUTIONARY ALGORITHMS Algorithm 28.53) 28. optimizers normally carry a list of the l non-prevailed candidate solutions ever visited with them.g ←− mutate(p. Definition D28. we define the simple operation “extractPhenotypes” which extracts them from a set of individuals P . 1) < mr then p.6 Maintaining a Set of Non-Dominated/Non-Prevailed Individuals Most multi-objective optimization algorithms return a set X of best solutions x discovered instead of a single individual.g ←− recombine(p. Many optimization techniques also internally keep track of the set of best candidate solutions encountered during the search process.1 Updating the Optimal Set Whenever a new individual p is created.19: pop ←− reproducePopEA(mate. mr. mate[j]. the set P holding the best individuals known so far may change. The function “updateOptimalSet” updates a set of elements Pold with the new individual pnew .g) pop ←− addItem(pop.x.16 (updateOptimalSet). ps. p) 28. It is possible that the new candidate solution must be included in the optimal set or even prevails some of the phenotypes already contained therein which then must be removed.6. mps)⌋ until j = (i mod mps) p. Since the elements of the search space are no longer required at the end of the optimization process. Therefore.g. j: counter variables Output: pop: the new population (len(pop) = ps) 1 2 3 4 5 6 7 8 9 10 11 12 begin pop ←− () for i ←− 0 up to ps do p ←− duplicate(mate[i mod mps]) if (randomUni[0. it is quite possible to discover an optimal element x⋆ and subsequently depart from it to a local optimum x⋆ . In Simulated Annealing.

x ≺ pnew . MAINTAINING A SET OF NON-DOMINATED/NON-PREVAILED INDIVIDUALS309 relation ≺ and the corresponding comparator function cmpf .20 creates a new.21 which perform the necessary operations. Algorithm 28.x then Pnew ←− Pnew \ {pold } else if pold . pnew ∈ G × X : ∀p1 ∈ Pold ∃p2 ∈ Pold : p2 .22. Pnew = updateOptimalSet(Pold .x ≺ pold . empty optimal set and successively inserts non-prevailed elements whereas Algorithm 28.x) then Pnew ←− Pnew ∪ {pold } if pold .54) We define two equivalent approaches in Algorithm 28.20: Pnew ←− updateOptimalSet(Pold .6.2 Obtaining Non-Prevailed Elements The function “updateOptimalSet” helps an optimizer to build and maintain a list of optimal .x ≺ p1 .x then return Pold return Pnew ∪ {pnew } Especially in the case of Evolutionary Algorithms. Pnew ⊆ G × X. This operation can easily be realized by iteratively applying “updateOptimalSet”. cmpf ) Input: Pold : a set of non-prevailed individuals as known before the creation of pnew Input: pnew : a new individual to be checked Input: cmpf : the comparator function representing the domainance/prevalence relation Output: Pnew : the set updated with the knowledge of pnew 1 2 3 4 5 6 7 begin Pnew ←− ∅ foreach pold ∈ Pold do if ¬ (pnew . pnew .20 and Algorithm 28. 28.x then return Pold return Pnew ∪ {pnew } Algorithm 28. pnew . pnew .28. cmpf ) (nd Version) Input: Pold : a set of non-prevailed individuals as known before the creation of pnew Input: pnew : a new individual to be checked Input: cmpf : the comparator function representing the domainance/prevalence relation Output: Pnew : the set updated with the knowledge of pnew 1 2 3 4 5 6 7 8 begin Pnew ←− Pold foreach pold ∈ Pold do if pnew .x ⇒ Pnew ⊆ Pold ∪ {pnew } ∧ ∀p1 ∈ Pnew ∃p2 ∈ Pnew : p2 .21: Pnew ←− updateOptimalSet(Pold . as shown in Algorithm 28. Algorithm 28.6.x ≺ pnew . Let us define the operation “updateOptimalSetN” for this purpose. not a single new element is created in each generation but a set P .x ≺ (28.x ≺ pold . . ) Pold .21 removes all elements which are prevailed by the new individual pnew from the old set Pold .

310 28 EVOLUTIONARY ALGORITHMS Algorithm 28. they have to extracts all optimal elements the set of individuals pop currently known.22: Pnew ←− updateOptimalSetN(Pold .17 (extractBest). However.23: P ←− extractBest(pop. not all optimization methods maintain an non-prevailed set all the time.56) Algorithm 28.x (28. cmpf ) Input: pop: the list to extract the non-prevailed individuals from Data: pany . cmpf ) return Pnew individuals. cmpf ) (28. P = extractBest(pop.55) Algorithm 28.23 defines one possible realization of “extractBest”. cmpf ) ≡ extractBest(Pold cuppnew . j: counter variables Output: P : the non-prevailed subset extracted from pop 1 2 3 4 5 6 7 8 9 begin P ←− pop for i ←− len(P ) − 1 down to 1 do for j ←− i − 1 down to 0 do if P [i] ≺ P [j] then P ←− deleteItemP j i ←− i − 1 else if P [j] ≺ P [i] then P ←− deleteItemP i return listToSetP 10 . this approach could also be used for updating an optimal set. When the optimization process finishes. By the way. updateOptimalSet(Pold . the “extractPhenotypes” can then be used to obtain the non-prevailed elements of the problem space and return them to the user. The function “extractBest” function extracts a set P of non-prevailed (non-dominated) individuals from any given set of individuals pop according to the comparator cmpf . pchk : candidate solutions tested for supremacy Data: i. ∀P ⊆ pop ⊆ G × X.x ≺ p1 . cmpf ) ⇒ ∀p1 ∈ P ∃p2 ∈ pop : p2 . When they terminate. Definition D28. cmpf ) Input: Pold : the old set of non-prevailed individuals Input: P : the set of new individuals to be checked for non-prevailedness Input: cmpf : the comparator function representing the domainance/prevalence relation Data: p: an individual from P Output: Pnew : the updated set 1 2 3 4 begin Pnew ←− Pold foreach p ∈ P do Pnew ←− updateOptimalSet(Pnew . pnew . P. p.
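For objective vectors under Pareto dominance (all objectives minimized), the archive maintenance described here boils down to a handful of lines. The standalone Python sketch below (names are mine) mirrors the logic of the update and extraction routines: a new element is rejected if some archived element dominates it, otherwise it is added and every archived element it dominates is removed; extracting the non-dominated subset of a whole population is then just repeated updating of an initially empty archive.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_optimal_set(archive, new):
    """Insert one new objective vector into an archive of non-dominated vectors
    (cf. Algorithms 28.20/28.21): reject it if it is dominated, otherwise add
    it and drop every archive member it dominates."""
    if any(dominates(old, new) for old in archive):
        return archive
    return [old for old in archive if not dominates(new, old)] + [new]

def extract_best(pop):
    """Extract all non-dominated vectors from a population (cf. Algorithm 28.23)
    by feeding them one by one into an initially empty archive."""
    archive = []
    for p in pop:
        archive = update_optimal_set(archive, p)
    return archive

points = [(1, 5), (2, 2), (3, 1), (2, 4), (4, 4)]
print(extract_best(points))     # [(1, 5), (2, 2), (3, 1)]
```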

The pruning operation “pruneOptimalSet” reduces the size of a set Pold of individuals to fit to a given upper boundary k. point out that if the extreme elements of the optimal frontier are lost.1 Pruning via Clustering Algorithm 28. It is very important that the loss of generality during a pruning operation is minimized. The same holds for the archives archive of non-dominated individuals maintained by several MOEAs. we again base our approach on a set of individuals P and we define: Definition D28. for instance.3. ∀Pnew ⊆ Pold subseteqG × X.24: Pnew ←− pruneOptimalSetc (Pold .28. for instance.6.6. there may be very many if not infinite many optimal individuals. k) Input: Pold : the set to be pruned Input: k: the maximum size allowed for the set (k > 0) Input: [implicit] cluster: the clustering algorithm to be used Input: [implicit] nucleus: the function used to determine the nuclei of the clusters Data: B: the set of clusters obtained by the clustering algorithm Data: b: a single cluster b ∈ B Output: Pnew : the pruned set 1 2 3 4 5 begin // obtain k clusters B ←− cluster(Pold ) Pnew ←− ∅ foreach b ∈ B do Pnew ←− Pnew ∪ nucleus(b) return Pnew 38 You can find a discussion of clustering algorithms in ??. the resulting set may only represent a small fraction of the optima that could have been found without pruning. Fieldsend et al. k) ⇒ |Pnew | ≤ k (28.24 uses clustering to provide the functionality specified in this definition and thereby realizes the idea of Morse [1959] and Taboada and Coit [2644].57) 28. Basically. [912]. Morse [1959] and Taboada and Coit [2644]. Instead of working on the set X of best discovered candidate solutions x. any given clustering algorithm could be used as replacement for “cluster” – see ?? on page ?? for more information on clustering. cannot grow infinitely because we only have limited memory. MAINTAINING A SET OF NON-DOMINATED/NON-PREVAILED INDIVIDUALS311 28.18 (pruneOptimalSet). 1959. we need to perform an action called pruning which reduces the size of the set of non-prevailed individuals to a given limit k [605.6. The set X of the best discovered candidate solutions computed by the optimization algorithms. Therefore. . suggest to use clustering algorithms38 for this purpose. also any combination of the fitness assignment and selection schemes discussed in Chapter 28 would do. There exists a variety of possible pruning operations. however. k ∈ N1 : Pnew = pruneOptimalSet(Pold .3 Pruning the Optimal Set In some optimization problems. Algorithm 28. 2645]. In principle.
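As an illustration of this idea, the standalone Python sketch below prunes an archive of objective vectors to at most k entries by clustering them and keeping one representative per cluster. A tiny k-means written from scratch stands in for the "cluster" operation (any clustering algorithm would do, as noted above), and the "nucleus" of a cluster is taken here to be the member closest to the cluster centroid; both choices, like all names, are assumptions made only for this example.

```python
import math
import random

def prune_optimal_set_clustering(archive, k, iters=20):
    """Prune an archive of objective vectors down to at most k members in the
    spirit of Algorithm 28.24: cluster the vectors into k groups and keep the
    member closest to each cluster centroid as that cluster's nucleus."""
    if len(archive) <= k:
        return list(archive)
    centroids = random.sample(archive, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in archive:        # assign each vector to its nearest centroid
            clusters[min(range(k), key=lambda i: math.dist(p, centroids[i]))].append(p)
        for i, c in enumerate(clusters):   # move centroids to the cluster means
            if c:
                centroids[i] = tuple(sum(x) / len(c) for x in zip(*c))
    pruned = []
    for i, c in enumerate(clusters):
        if c:                    # nucleus = member closest to the centroid
            pruned.append(min(c, key=lambda p: math.dist(p, centroids[i])))
    return pruned

archive = [(random.random(), random.random()) for _ in range(30)]
print(prune_optimal_set_clustering(archive, k=5))
```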

This circumvents the phenomenon of narrowing down the optimal set described by Fieldsend et al. If individuals need to be removed from the set because it became too large.6. The individuals with the minimum/maximum values are always preserved.312 28. [912] and distinguishes the AGA approach from clusteringbased methods. This basically disables their later disposal by the pruning algorithm “pruneOptimalSetAGA” since “pruneOptimalSetAGA” deletes the individuals from the set Pold that have largest cnt values first.25 on the next page and Algorithm 28. It ensures the preservation of border individuals by assigning a negative cnt value to them. the AGA approach removes those that reside in regions which are the most crowded. we introduce a more or less trivial specification in Algorithm 28. 39 PAES is discussed in ?? on page ?? . it is not possible to define maximum set sizes k which are smaller than 2|f |. “divideAGA” also counts the number of individuals that reside in the same coordinates for each individual and makes it available in cnt. AGA has been introduced for the Evolutionary Algorithm PAES39 by Knowles and Corne [1552] and uses the objective values (computed by the set of objective functions f ) directly.3. Hence.26 on page 314. The original sources outline the algorithm basically with with descriptions and definitions. Its span in each dimension is defined by the corresponding minimum and maximum objective values. Hence. The function “divideAGA” is internally used to perform the grid division.2 Adaptive Grid Archiving 28 EVOLUTIONARY ALGORITHMS Let us discuss the adaptive grid archiving algorithm (AGA) as example for a more sophisticated approach to prune the set of non-dominated / non-prevailed individuals. Furthermore. Here. This |f |-dimensional objective space Y is divided in a grid with d divisions in each dimension. It transforms the objective values of each individual to grid coordinates stored in the array lst. it can treat the individuals as |f |-dimensional vectors where each dimension corresponds to one objective function f ∈ f .

cnt) lst[i][j − 1] ←− ⌊(fj (Pl [i]) − mini[j − 1]) ∗ mul[j − 1]⌋ 28 . lst. lst. ∅) cnt ←− createList(len(Pl ) .28. mul: temporary stores Output: (Pl . −∞) for i ←− |f | − 1 down to 0 do mini[i − 1] ←− min{fi (p. 0) for j ←− 1 up to |f | do if (fj (Pl [i]) ≤ mini[j − 1]) ∨ (fj (Pl [i]) ≥ maxi[j − 1]) then cnt[i] ←− cnt[i] − 2 if cnt[i] > 0 then for j ←− i + 1 up to len(Pl ) − 1 do if lst[i] = lst[j] then cnt[i] ←− cnt[i] + 1 if cnt[j] > 0 then cnt[j] ←− cnt[j] + 1 return (Pl .25: (Pl . cnt): a tuple containing the list representation Pl of Pold . ∞) maxi ←− createList(|f |. j: counter variables Data: mini. lst.x) ∀p ∈ Pold } mul ←− createList(|f |. maxi. MAINTAINING A SET OF NON-DOMINATED/NON-PREVAILED INDIVIDUALS313 Algorithm 28.x) ∀p ∈ Pold } maxi[i − 1] ←− max{fi (p. d) Input: Pold : the set to be pruned Input: d: the number of divisions to be performed per dimension Input: [implicit] f : the set of objective functions Data: i. 0) for i ←− |f | − 1 down to 0 do if maxi[i] = mini[i] then d mul[i] ←− maxi[i] − mini[i] else maxi[i] ←− maxi[i] + 1 mini[i] ←− mini[i] − 1 Pl ←− setToListPold lst ←− createList(len(Pl ) .6. a list lst assigning grid coordinates to the elements of Pl and a list cnt containing the number of elements in those grid locations 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 begin mini ←− createList(|f |. cnt) ←− divideAGA(Pold . 0) for i ←− len(Pl ) − 1 down to 0 do lst[i] ←− createList(|f |.
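The core of the adaptive grid archiving idea, stripped of the bookkeeping of the two algorithms, can be sketched as follows in standalone Python (names are mine). Each objective vector is mapped to a grid cell with d divisions per dimension, and members of the most crowded cells are removed first until the size bound k is met. The border-preservation refinement, i.e., protecting the extreme individuals of the frontier via negative counter values, is deliberately omitted here to keep the sketch short.

```python
from collections import Counter

def prune_adaptive_grid(archive, k, d):
    """Crowding-based pruning in the spirit of adaptive grid archiving
    (Algorithms 28.25/28.26), without the border-preservation refinement."""
    if len(archive) <= k:
        return list(archive)
    m = len(archive[0])                                   # number of objectives
    lo = [min(p[j] for p in archive) for j in range(m)]
    hi = [max(p[j] for p in archive) for j in range(m)]
    def cell(p):                                          # grid coordinates of p
        return tuple(min(d - 1, int((p[j] - lo[j]) / (hi[j] - lo[j] + 1e-12) * d))
                     for j in range(m))
    kept = list(archive)
    while len(kept) > k:
        counts = Counter(cell(p) for p in kept)
        crowded = max(counts, key=counts.get)             # most crowded grid cell
        for i, p in enumerate(kept):                      # drop one of its members
            if cell(p) == crowded:
                del kept[i]
                break
    return kept

pts = [(0.1, 0.9), (0.12, 0.88), (0.11, 0.91), (0.5, 0.5), (0.9, 0.1)]
print(prune_adaptive_grid(pts, k=3, d=4))
```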

1. 3064. 1884. 3101].1 [308. 2535. 2320. 649. 2663. 1730. 354. 2676].26: Pnew ←− pruneOptimalSetAGA(Pold . 1835. 690. 2772. 1011–1013. lst. 30. 2448. 3071. 31. 30. 2025. 1177. 1532. 444. 1437. 33. 1148.1. 812.1. 1859. 2188. 2445].314 28 EVOLUTIONARY ALGORITHMS Algorithm 28. 2669. 451. 1803. 567. 1149. 802. 808. k) Input: P old: the set to be pruned Input: d: the number of divisions to be performed per dimension Input: k: the maximum size allowed for the set (k ≥ 2|f |) Input: [implicit] f : the set of objective functions Data: i: a counter variable Data: Pl : the list representation of Pold Data: lst: a list assigning grid coordinates to the elements of Pl Data: cnt: the number of elements in the grid locations defined in lst Output: Pnew : the pruned set 1 2 3 4 5 6 7 8 9 10 11 12 13 begin if len(Pold ) ≤ k then return Pold (Pl . 1653. 445. 644. 1429. 2144. 1696. 371. 1486.4: Applications and Examples of Evolutionary Algorithms. 678. 2245. 34.1 . 457. and 35. 1311. 2811.7. 1961. 1830. 2499. 2633.1 and 31.7 28. 1417. 2162. see also Tables 29.1 [480] [587. 1145. 820. 31. 1822–1825. cnt) ←− divideAGA(Pold . 619.1.1. d) while len(Pl ) > k do idx ←− 0 for i ←− len(Pl ) − 1 down to 1 do if cnt[i] > cnt[idx] then idx ←− i return listToSetPl Pl ←− deleteItemPl idx cnt ←− deleteItemcntidx lst ←− deleteItemlstidx for i ←− len(Pl ) − 1 down to 0 do if (lst[i] = lst[idx]) ∧ (cnt[i] > 0) then cnt[i] ←− cnt[i] − 1 28.1. d. see also Tables 29.1 General Information on Evolutionary Algorithms Applications and Examples Table 28. 564. 360. 1117.1. 400. 491. see also Tables 29. 2189. 1871. 2335. and 32. 1290. 1618. 810. 2055. 2105. 3027. 2862. 807.1. Area Art Astronomy Chemistry Combinatorial Problems References [1174.

1 Control [1710]. 1717. 1532. 2861. 2645. 2470. 34. see also Tables 29.1 Data Mining [144. 1883. 329. 2677. 33. 777.1. 30. 31. 1823. 1800. 2111. 1733. 2237. 1733. 2645. 2189.1. 89. 439. 1869. 404. see also Tables 29. 32. 1217. 2189. see also Tables 29. 30. 30. 1872]. 2633. 2893. 1995.1. 2099. and 35.1. 3018]. 810.1 Databases [371. 2364. 328. 1883. 2628. 953. 31. 1429. 31. 2864.7. 439.1. 669. 2693. 2766]. 353. 33. 1006.1.1.1. 1479. 1177. 444. 1523. 31. 3085]. 2677. see also Tables 29. 2188. 1961. 32. 34.1 Military and Defense [439. 377. 2493. 2210. 545. 1574. 33.1. 33.1. 605. 1835. 31. 31. 3027.1. 2083. 820.1. 1830.1. see also Tables 29.1 and 31. 2644. 579. 1149. 156. 605.1. 2425. 2663. 202. 1727. 1310. 30. 2440.1 Mathematics [376. 2213. 2772. 1298. 34. 2503. 2020. 1013. 963. 2100.1. and 34. 619.1.1.1. and 34. 1225. 2189. 2811.1.1. 63. see also Tables 29. 125. 34.1. see also Tables 29. 2058. 1749. 871. 1661. 2440.1. 2651]. 1574.1 Healthcare [567. 2862. 2445. 1718. 2059. 2407.1. 480. 804. 30. 2021. 2669. and 35. 1803. 535. see also Tables 29. 1094. see also Tables 29. see also Tables 29. 2059. see also Tables 29. 352–354. 2268. 1661. see also Tables 29.1 Physics [407. 786. 3064. 30. GENERAL INFORMATION ON EVOLUTIONARY ALGORITHMS Computer Graphics 315 [1452. 1574.[89. 1653. 871. 2537]. 808. 2799].1. 1007. 32. 2111. 1733. and 35. 34. 1508.1. 1225. 2645. see also Tables 29.1. 882. 2144. 31. and 35.1 Distributed Algorithms and Sys.1 Engineering [112. 644.1. 1323. 2448. and 35. 1830. 1749. 1577. 1486. 31. 1292. 1838.1. 31. see also Tables 33.1 Image Synthesis [291. 517. 853.1 Data Compression see Table 31.1. 1486. 644. 1838. 2644.1 and 34.1. 33. see also Table 29. 702. 67. see also Tables 29.1. 1746. 34.1. 2694. 2020. 1986.1. 31.1.1 Economics and Finances [549.1 . 32.1.1. 34. 2499.28. 2470. and 33. 2096. 33. 2387. 1863. 2144. 3085]. 33. 2858. 30. 2493. 882. 2162. 1710.1. 678]. 811. 795. 812.1. 882. 1298. and 35.1 Cooperation and Teamwork [1996. 267. 1884.1 Digital Technology [1479].1 Graph Theory [39. 2743. 1934. 2663. 328.1. 1148. 1011. 2663.1. 1718. 1863. 2763]. 2791. see also Tables 29.1.1 Motor Control [489]. 1094. 112. 1048. 843. 2862. 2651] Logistics [354. 445. 2058. 567. 1532.1. 2425.1. 31. 1883. 2245. 1825. 3027.1. 156.1. 1094.1 Multi-Agent Systems [967. 1007.1. 2425. 451.1. 567. 2025. 1730. 2488. and 35. 1696. 786. 1145. 777. see also Table 34. 1290. 2058. 1825. 31. see also Tables 29. 517.1 Games [2350]. 963. 1835. 705. 2424]. 2144. 1749. tems 663. 1661. 2320. 1225. 32. 2071. 1746. see also Tables 29. and 35. 787. 786.1. 2100.1. 32. 2912]. 31.1.1. and 35.1.1 Multiplayer Games see Table 31.1 Decision Making see Table 31. 2791. 585. 1661. 2676.1 Function Optimization [742. 1927. and 32. 2694. 30. and 35.1. 2799. 32. 31. 669. 2903. 354. 1633. 1824.1. 2693. 2387. and 35. 855.1 E-Learning [1158. 3061. 3071]. 2350. 2468. 677. 2535. 1995. 112. 579. 202. and 35.1. 2694.1. 802. 2622].1. 550. 2864].1. 1835. 2644. 1859.1. 853. 2628. 2864. 2448. 2448. 2677]. 669. 2487. 1018. 360. 843. 1800. 1822.

Science and Engineering [563] Evolutionary Computation [847] Evolutionary Computation [863] Evolutionary Multiobjective Optimization – Theoretical Advances and Applications [8] Evolutionary Computation 2: Advanced Algorithms and Operators [175] Evolutionary Algorithms for Solving Multi-Objective Problems [599] Genetic and Evolutionary Computation for Image Processing and Analysis [466] Evolutionary Computation: The Fossil Record [939] Swarm Intelligence: Collective. 30. 1791. 15.1 Sorting [1233]. 30. 1486. 9. 22.1 Telecommunication [820] Testing [2863]. 787. 1934. 18. 20. 13. 1486.1. 2144. see also Table 31. 1661. 2863. 16.1.1 Software [703. 10. 1934. Books Handbook of Evolutionary Computation [171] Variants of Evolutionary Algorithms for Real-World Applications [567] New Ideas in Optimization [638] Multiobjective Problem Solving from Nature – From Concepts to Applications [1554] Natural Intelligence for Scheduling. 11. 27.1. 7.2 1. 33. 1995. 30.1. 12. 31. 29. Neural. 2677]. 31. 19. 1290.1. 68. 8.316 Prediction Security 28 EVOLUTIONARY ALGORITHMS [144.7. 24. 540. 1986] Wireless Communication [39. and 35.1. 1507. 31. 17. see also Tables 29. see also Tables 29. 1045].1 28.1. 23.1. 34.1 Verification Water Supply and Management [1268.1 [404. 67. 6. 36. 2677.1.1. see also Table 31. 842. 33. 37. 155. and 35. 2.1. and 34. and 35. 2364. 1717. 2237. 14.1. Planning and Packing Problems [564] Advances in Evolutionary Computing – Theory and Applications [1049] Bio-inspired Algorithms for the Vehicle Routing Problem [2162] Evolutionary Computation 1: Basic Algorithms and Operators [174] Nature-Inspired Algorithms for Optimisation [562] Telecommunications Optimization: Heuristic and Adaptive Techniques [640] Hybrid Evolutionary Algorithms [1141] Success in Evolutionary Computation [3014] Computational Intelligence in Telecommunications Networks [2144] Evolutionary Design by Computers [272] Evolutionary Computation in Economics and Finance [549] Parameter Setting in Evolutionary Algorithms [1753] Evolutionary Computation in Dynamic and Uncertain Environments [3019] Applications of Multi-Objective Evolutionary Algorithms [597] Introduction to Evolutionary Computing [865] Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery: Implications in Business. 35. 32. 31. Adaptive [1524] Parallel Evolutionary Computations [2015] Evolutionary Computation in Practice [3043] Design by Evolution: Advances in Evolutionary Design [1234] New Learning Paradigms in Soft Computing [1430] Designing Evolutionary Algorithms for Dynamic Environments [1956] How to Solve It: Modern Heuristics [1887] Electromagnetic Optimization by Genetic Algorithms [2250] Advances in Metaheuristics for Hard Optimization [2485] Soft Computing: Integrating Evolutionary. 21. 2189. see also Tables 29. 1486. 4.3 Theorem Proofing and Automatic see Table 29. 34. 5. 31. and Fuzzy Systems [2685] . 26. 63. 1508. 1749]. see also Tables 29.1. 871. 2213. 25. 33. 644. 3. 1508. 28. 2887].

2502] CEC: IEEE Conference on Evolutionary Computation History: 2011/06: New Orleans. NV. 2280] 2003/07: Chicago. 2268. Self-Adaptive Heuristics for Evolutionary Computation [1617] 49. 1515. 304. 2700] 2006/07: Seattle. 2079. see [2027] 2010/07: Barcelona. Data Mining and Knowledge Discovery with Evolutionary Algorithms [994] 53. USA. Multi-Objective Optimization in Computational Intelligence: Theory and Practice [438] 42. Evolutionary Computation: Theory and Applications [3025] 59. 2155. USA. GA. Evolutionary Computation: A Unified Approach [715] 44. 752. Evolutionary Computation in Data Mining [1048] 46. see [1354] 2009/05: Trondheim. see [905. 2093. 1675. see [585. OR. Introduction to Stochastic Search and Optimization [2561] 47. see [1516. 2935. 2077] 2001/07: San Francisco. 1519. Spain. WA. Advances in Evolutionary Algorithms – Theory. 2986] 1999/07: Orlando. 2579. DC. USA. Design and Practice [41] 55. see [1104. USA. 478. CA. see [30. 1790. 2339] 2004/06: Seattle. Evolution¨re Algorithmen [2879] a 54. Representations for Genetic and Evolutionary Algorithms [2338] 50. Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design [460] 41. Genetic Algorithms [167] 51. Optimization. USA. USA. Evolutionary Computation [2987] 28. Evolutionary Algorithms in Theory and Practice: Evolution Strategies. Norway. 482. QC. see [484. IL. USA. Search Methodologies – Introductory Tutorials in Optimization and Decision Support Techniques [453] 52. 2537] 2007/07: London. NY. 2699. see [841] 2010/07: Portland. GECCO: Genetic and Evolutionary Computation Conference History: 2011/07: Dublin. see [751. Advances in Evolutionary Algorithms [1581] 48. 485. Genetic Algorithms in Search. Canada. USA.7. see [2342. 2343] e 2008/07: Atlanta. Catalonia. UK. see [27.5: Conferences and Workshops on Evolutionary Algorithms. The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music [2320] 39. Evolutionary Programming. 2337. Evolutionary Optimization [2394] 40. 2441] 2005/06: Washington.28. Computer Simulation in Genetics [659] 43. see [224. 2578] 2002/07: New York. LA.7. 1872. GENERAL INFORMATION ON EVOLUTIONARY ALGORITHMS 317 38. 31] 2009/07: Montr´al. USA. see [211. Ireland. 2570] 2000/07: Las Vegas. USA. see [1452. FL. see [1350] . 2928. WA.3 Conferences and Workshops Table 28. Evolutionary Algorithms in Molecular Design [587] 56. Evolutionary Dynamics in Time-Dependent Environments [2941] 58. and Machine Learning [1075] 45. USA. Multi-Objective Optimization Using Evolutionary Algorithms [740] 57.

Australia. Essex. UK. 2413] o 2008/09: Birmingham. see [1333] Washington. see [2354. see [2355] 2002/09: Granada. USA. Scotland. The Netherlands. see [173] Nagoya. Iceland. Scotland. OR. USA. Australia. Spain. Poland. see [1445] Perth. see [788] 2009/04: T¨ bingen. see [3104] a a 2006/07: Vancouver. see [1052] u 2008/05: Naples. Italy. IN. USA. see [641] Portland. Korea. Catalonia. Milan. see [1334] La Jolla. see [2438] WCCI: IEEE World Congress on Computational Intelligence History: 2012/06: Brisbane. North Rhine-Westphalia. see [944] Gangnam-gu. see [1051] 2007/04: Val`ncia. see [1411] 2010/07: Barcelona. BC. China. Canada. see [2421] 1998/09: Amsterdam. DC. see [2340] 2004/04: Coimbra. UK and Dortmund. see [465] 2001/04: Lake Como. UK. China. Ireland. France. CA. Canada. see [3033] Edinburgh. Germany. see [1310] PPSN: Conference on Parallel Problem Solving from Nature History: 2012/09: Taormina. see [2253] 2002/04: Kinsale. see [1891] EvoWorkshops: Real-World Applications of Evolutionary Computing History: 2011/04: Torino. see [2395] Honolulu. USA. see [1889] a a Singapore. see [2254] 2003/04: Colchester. see [1369] Canberra. Japan. Spain. see [2412. see [866] 1996/09: Berlin. Germany.318 2008/06: 2007/09: 2006/07: 2005/09: 2004/06: 2003/12: 2002/05: 2001/05: 2000/07: 1999/07: 1998/05: 1997/04: 1996/05: 1995/11: 1994/06: 28 EVOLUTIONARY ALGORITHMS Hong Kong (Xi¯nggˇng). Australia. see [1361] Orlando. France. QLD. see [1827] 1990/10: Dortmund. Switzerland. UK. see [1050] e 2006/04: Budapest. see [343] 2000/04: Edinburgh. see [2341] 2005/03: Lausanne. WA. see [1343] Vancouver. Israel. Portugal. Seoul. see [110] Anchorage. Italy. Sweden. BC. Germany. see [1870] 2000/09: Paris. Belgium. USA. North Rhine-Westphalia. see [2675] 2010/09: Krak´w. see [2197] o 1998/04: Paris. see [1341] . AK. see [2496] Indianapolis. see [1356] 2008/06: Hong Kong (Xi¯nggˇng). USA. Spain. FL. see [2818] 1994/10: Jerusalem. Italy. Italy. Hungary. see [464] 1999/05: G¨teborg. 3028] 2006/09: Reykjavik. USA. HI. see [693] 1992/09: Brussels. Germany.

see [1443. H´ n´n. see [1150] ı’n´ a o 2007/08: Hˇikˇu. Switzerland. France. see [1191] 1995/09: Brest. see [2142] 2001/04: Prague. see [2061] 2005/03: Guanajuato. see [2586] 2009/10: Strasbourg. Poland. USA. Hˇin´n. France. see [606] 1999/11: Dunkerque. see [1711] a o a a 2006/09: X¯ an. see [608] 2007/10: Tours.28. 2008/10: J` an.7. see [600] e 2003/04: Faro. Austria. Brazil. Czech Republic. see [1379] a a 2010/08: Y¯nt´i. see [2589] 2009/04: Kuopio. see [2850–2852] a a u a 319 . see [2297] 2003/04: Roanne. see [257. see [1324] EMO: International Conference on Evolutionary Multi-Criterion Optimization History: 2011/04: Ouro Preto. China. HI. Finland. see [801] z 1997/04: Vienna. China. France. see [69] ICCI: IEEE International Conference on Cognitive Informatics See Table 10. Portugal. France. China. see [2527] 1995/04: Al`s. France. MG. China. Slovenia. see [1564] 2007/04: Warsaw. see [862] 2007/03: Matsushima. France. Shˇnx¯ China. ICANNGA: International Conference on Adaptive and Natural Computing Algorithms History: 2011/04: Ljubljana. China. France. see [2657] 2003/10: Marseilles. Sh¯nd¯ng. see [1641] 1999/04: Protoroˇ. France. see [1330] 1994/06: Orlando. M´xico. Austria. see [957] 2001/03: Z¨ rich. Portugal. ICNC: International Conference on Advances in Natural Computation History: 2012/05: Ch´ngq` o ıng. FL. France. see [948] 1997/10: Nˆ ımes. see [1355] a a a o 2009/08: Ti¯nj¯ China.2 on page 128. see [2654] 2009/04: Nantes. France. see [2141] e 1993/04: Innsbruck. Sh¯nd¯ng. France. GENERAL INFORMATION ON EVOLUTIONARY ALGORITHMS 2002/05: Honolulu. Slovenia. 258] 2005/03: Coimbra.2 on page 128. 2005/08: Ch´ngsh¯. see [75] 1994/09: Toulouse. 1444] ı’¯ a ı. see [74] HIS: International Conference on Hybrid Intelligent Systems See Table 10. see [2848] a ın. see [1736] 2001/10: Le Creusot. USA. France. Japan. France. see [574] 2011/07: Sh`nghˇi. see [1338] 1998/05: Anchorage. Sendai. see [1928] 2005/10: Lille. AK. USA. China. see [3099] u AE/EA: Artificial Evolution History: 2011/10: Angers.

see [557] a o a o 2005/12: X¯ an. see [1393] 2008/12: S¯ zh¯u. MIC: Metaheuristics International Conference See Table 25.2 on page 132. Slovenia. EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10. Slovenia. see [2475] ICARIS: International Conference on Artificial Immune Systems See Table 10. . NC. Slovenia. see [920] 2006/10: Ljubljana. see [2293] 1997/03: Research Triangle Park. see [546] 2002/03: Research Triangle Park. NC. USA. Slovenia. China. Heilongjiang China. USA. see [1192. EngOpt: International Conference on Engineering Optimization See Table 10.2 on page 131. see [1392] u o a u 2007/12: Harbin. see [508] 2000/02: Atlantic City. China. USA. NSW. Ji¯ngs¯ . Slovenia. see [919] 2004/10: Ljubljana. see [2853] 1998/10: Research Triangle Park. USA. NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10. see [2379] 2003/09: Cary. MICAI: Mexican International Conference on Artificial Intelligence See Table 10. see [349] 2010/05: Ljubljana. WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10. Austria.2 on page 130. NC. NJ.320 28 EVOLUTIONARY ALGORITHMS SEAL: International Conference on Simulated Evolution and Learning See Table 10.2 on page 131.2 on page 131.2 on page 132.2 on page 132.2 on page 130. Man.2 on page 227. and Cybernetics See Table 10. see [1370] CIS: International Conference on Computational Intelligence and Security History: 2009/12: Bˇij¯ e ıng. Information. Guˇngd¯ng. Australia. see [921] 2008/10: Ljubljana.2 on page 130. see [918] BIONETICS: International Conference on Bio-Inspired Models of Network. Shˇnx¯ China. SMC: International Conference on Systems. CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10. FEA: International Workshop on Frontiers in Evolutionary Algorithms History: 2005/07: Salt Lake City. China. UT. NC. see [1390] 2006/11: Guˇngzh¯u. USA. Control and Automation History: 2006/11: Sydney. CIMCA: International Conference on Computational Intelligence for Modelling. see [1923] 2005/11: Vienna. 1193] ı’¯ a ı.2 on page 131. BIOMA: International Conference on Bioinspired Optimization Methods and their Applications History: 2012/05: Bohinj. and Computing Systems See Table 10. USA.

IJCCI: International Conference on Computational Intelligence History: 2010/10: Val`ncia. HI. ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10. see [2754] ee ¯ ı. see [121] 2008/06: Las Vegas. USA. Optimization. Progress in Evolutionary Computation: Workshops on Evolutionary Computation History: 1993—1994: Australia. see [2552] 2010/07: Las Vegas. USA. China. see [914] e 2009/10: Madeira. see [1116] 2005/03: Lausanne. see [1115] GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation History: 2009/06: Sh`nghˇi.7. NV. USA. New Zealand.28. NV. Anhu¯ China. NV. Logic Programming. and Computational Algebraic Geometry See Table 10. see [646] e 2006/04: Budapest. USA. NV. see [3023] WOPPLOT: Workshop on Parallel Processing: Logic.2 on page 133. Spain. see [120] 2007/06: Las Vegas. USA. USA.2 on page 133. see [3000] a a GEM: International Conference on Genetic and Evolutionary Methods History: 2011/07: Las Vegas. see [119] 2009/07: Las Vegas. Portugal. Spain.2 on page 133. see [2775] 2007/04: Val`ncia. Italy.2 on page 134. see [645] u 2008/03: Naples. EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation History: 2011/04: Torino. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems History: 2001/11: Dunedin. .2 on page 133. NV. see [1865] GICOLAG: Global Optimization – Integrating Convexity.2 on page 134. Italy. see [2588] FOCI: IEEE Symposium on Foundations of Computational Intelligence History: 2007/04: Honolulu. LION: Learning and Intelligent OptimizatioN See Table 10. see [122] GEWS: Grammatical Evolution Workshop See Table 31. SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10. Switzerland.2 on page 385.2 on page 133. see [1498] ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10. Portugal. GENERAL INFORMATION ON EVOLUTIONARY ALGORITHMS 321 EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization History: 2009/04: T¨ bingen. LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10. Organization and Technology See Table 10. see [2252] 2004/04: Coimbra. Hungary. Germany. see [1809] IWNICA: International Workshop on Nature Inspired Computation and Applications History: 2011/04: H´f´i.

Austria. see [2995] u o a u 2009/10: Mexico City. see [1099] ICEC: International Conference on Evolutionary Computation History: 2010/10: Val`ncia. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10. see [1122] ee ¯ ı. Evolutionary Computation published by MIT Press 3. Anhu¯ China. see [1345] 28. see [2311] 2007/07: London. Anhu¯ China. IEEE Transactions on Systems. NaBIC: World Congress on Nature and Biologically Inspired Computing History: 2011/11: Salamanca. SEA: International Symposium on Experimental Algorithms See Table 10. see [1545] 2009/12: Coimbatore. China. and World Scientific Publishing Co. HI. History: 2011/04: Paris. Spain. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands 6. TN. see [2595] SSCI: IEEE Symposium Series on Computational Intelligence Conference series which contains many other symposia and workshops. Japan. Japan.7. India.. Man. Anhu¯ China. Natural Computing: An International Journal published by Kluwer Academic Publishers and Springer Netherlands 7. see [1357] 2009/03: Nashville. SEMCCO: International Conference on Swarm. see [1875] e 2008/06: Macau. see [2757] ee ¯ ı. see [905] 2005/06: Oslo. M´xico. Norway. see [1276] e 2009/10: Madeira. Evolutionary and Memetic Computing History: 2010/12: Chennai. Spain. see [2752] META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25. 2756] ee ¯ ı. 5. Ji¯ngs¯ . Man. see [479] EvoRobots: International Symposium on Evolutionary Robotics History: 2001/10: Tokyo. and Cybernetics – Part B: Cybernetics published by IEEE Systems. see [2755. Journal of Artificial Evolution and Applications published by Hindawi Publishing Corporation 9. see [12] WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms History: 2010/09: Vienna.322 28 EVOLUTIONARY ALGORITHMS 2010/10: H´f´i. China.4 Journals 1. see [2325] IWACI: International Workshop on Advanced Computational Intelligence History: 2010/08: S¯ zh¯u. International Journal of Computational Intelligence and Applications (IJCIA) published by Imperial College Press Co. and Cybernetics Society 4. USA. IGI Global) . see [2368] 2010/12: Kitakyushu. Anhu¯ China. International Journal of Computational Intelligence Research (IJCIR) published by Research India Publications 10. see [2758] ee ¯ ı. India.2 on page 134.2 on page 228. France.2 on page 135. 2006/10: H´f´i. see [1353] 2007/04: Honolulu. Journal of Heuristics published by Springer Netherlands 8. USA. 2004/10: H´f´i. 2008/05: H´f´i. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI) published by Idea Group Publishing (Idea Group Inc. Portugal. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2. UK.

28. Inc. IEEE Computational Intelligence Magazine published by IEEE Computational Intelligence Society 12. GENERAL INFORMATION ON EVOLUTIONARY ALGORITHMS 323 11. International Journal of Swarm Intelligence and Evolutionary Computation (IJSIEC) published by Ashdin Publishing (AP) . Computational Intelligence published by Wiley Periodicals. 14.7. Journal of Optimization Theory and Applications published by Plenum Press and Springer Netherlands 13.

Tasks T28

75. Implement the linear ranking selection scheme given in Section 28.4.5 on page 300 for your version of Evolutionary Algorithms. You may use Listing 56.13 or Listing 56.14 as blueprint, implement the interface ISelectionAlgorithm given in Listing 55.1 on page 266, and place your implementation in a new class. [20 points]

76. Implement a general steady-state Evolutionary Algorithm. You may use Listing 56.11 as blueprint and create a new class derived from EABase (given in ?? on page ??) which provides the steady-state functionality described in Section 28.1.4 on page 267. [20 points]

77. Implement a general Evolutionary Algorithm with archive-based elitism. You may use Listing 56.11 as blueprint and create a new class derived from EABase (given in ?? on page ??) which provides the elitism functionality described in Paragraph 28.1.4.1. [20 points]

223] and the computer scientist Fraser [941. In asexual reproduction. 324. Because of the simple structure of the search spaces of Genetic Algorithms. 2232] that can be tackled with Genetic Algorithms. 221. the objective functions f here are often referred to as fitness functions. 989] began to apply computer-aided simulations in order to gain more insight into genetic processes and the natural evolution and selection. and (b) search operations such as mutation and crossover directly modify these genotypes.wikipedia.1 Introduction Genetic Algorithms1 (GAs) are a subclass of Evolutionary Algorithms where (a) the genotypes g of the search space G are strings of primitive elements (usually all of the same type) such as bits. 1220.org/wiki/Genetic_algorithm [accessed 2007-07-03] . because of the close relation to biology and since Genetic Algorithms were originally applied to single-objective optimization. This is a historically grown 1 http://en. 222. It should further be mentioned that. 988. and research and development [564. economy.1. 326] used evolutionary approaches based on binary string genomes for solving inequalities. 2931]. Today. and Frantz [985] – all under the advice of Holland at the University of Michigan. They provide search operators which try to closely copy sexual and asexual reproduction schemes from nature. 1250. Bremermann [407] and Bledsoe [323. important research on such search spaces was contributed by Bagley [180] (who introduced the term Genetic Algorithm). 29. there are many applications in science.Chapter 29 Genetic Algorithms 29. often a non-identity mapping is used as genotype-phenotype mapping to translate the elements g ∈ G to candidate solutions x ∈ X [1075. As a result of Holland’s work [1247–1250] Genetic Algorithms as a new approach for problem solving could be formalized finally became widely recognized and popular. Jr. 511]. [510. Cavicchio. As same as important was the work of De Jong [711] on benchmarking GAs for function optimization from 1975 preceded by Hollstien’s PhD thesis [1258]. Genetic Algorithms are the original prototype of Evolutionary Algorithms and adhere to the description given in Section 28.1.1 History The roots of Genetic Algorithms go back to the mid-1950s. or real numbers.1. At the end of that decade. the genotypes of the two parents genotypes will recombine. where biologists like Barricelli [220. and for determining the weights in neural networks in the early 1960s. 325. In such “sexual” search operations.. mutations are the only changes that occur. Rosenberg [2331]. for function optimization. integer.

1: A Sketch of a DNA Sequence. the positions of the genes may shift when the reproduction operations are applied. a string chromosome can either be a fixed-length tuple (Equation 29. . g [1].2) This is not given in variable-length string genomes. G = {∀lists g : g [i] ∈ GT ∀0 ≤ i < len(g)} G = GT ∗ (29. Thymine Cytosine Adenine Guanine Hydrogen Bond Phosphate Deoxyribose (sugar) rough simplification gÎG 1 0 1 0 0 1 1 0 0 1 1 1 0 1 0 0 0 1 Search Space G: all bit strings of a given length Genotypes g: specific bit strings Figure 29. G = {∀ (g [0].n − 1} G = G0 × G1 × . Then.. Genotypes in nature encompass the whole hereditary information of an organism encoded in the DNA illustrated of the DNA. Here.2 Genomes in Genetic Algorithms Most of the terminology which we have defined in Part I and used throughout this book stems from the GA sector.. Genetic Algorithms worked on bit strings of fixed length.1 Classification According to Length In the first case.1. 29. Like their natural prototypes. the genomes in Genetic Algorithms are strings. are referred to genomes and their elements are called genotypes because of their historical development. 1912].. the loci i of the genes g [i] are constant and hence. each gene can have a fixed meaning.. the tuples may contain elements of different types Gi .2. Today.326 29 GENETIC ALGORITHMS misnaming which should not be mixed up with the fitness assignment processes discussed in Section 28. Originally. all elements of such genotypes must have the same type GT .3) of elements.3 on page 274 and the fitness values v used in the context of this book. as sketched in Figure 29. these genotypes are also often called chromosomes. Thus.1) or a variable-length list (Equation 29.4) .1) (29. 29. for instance. linear sequences of certain data types [1075. The DNA is a string of base pairs that encodes the phenotypical characteristics of the creature it belongs to. The search spaces G of Genetic Algorithms.3) (29. g [n − 1]) : g [i] ∈ Gi ∀i ∈ 0. 1253. Because of the linear structure. × Gn−1 (29.

a gene is a segment of nucleic acid that contains the information necessary to produce a functional RNA product in a controlled manner. 1684. Messy genomes (see Section 29.29. GAs with bit string genome are referred to as binary-coded if no re-coding genotype-phenotype mapping is applied.3 on page 82) is the basic informational unit in a genotype g. each gene may be an integer number identifying a city within China and the genotype. (2) vectors of integer numbers. in a fixed-length string genome..1). A gene (see Definition D4. An allele (see Definition D4. each locus may have a direct connection to an element of the phenotype. the genes do not only code for a specific phenotypic element. In biology. would represent the order in which they are visited. are called real-coded [1509].2 on page 45. the Algorithmic Chemistry programs in execution are not interpreted sequentially. In their use in linear Genetic Programming developed by Lasarczyk and Banzhaf [207. there exist indirect genotype-phenotype mappings (Section 28. gray coding is used as discussed in Example E28. but also for the position in the phenotype where this element should be placed. i. Hence. e. Finally. for example. In many applications.9.2) and encodings where there is no relation between the location of a gene and its meaning or interpretation. we would refer to the GA as gray-coded. for instance. 29. the phenotypic features have more uniform meanings. similar structures can be observed. where G ⊆ Rn .2. Here.6) where introduced to improve locality by linkage learning: Here. 1685].. the genes do not code for distinct and different features of the phenotypes. Yet. for example. there are also parts of natural genomes which have no (obvious) function [1072. a permutation of these identifiers.4) is a value of specific gene in nature and in EAs alike. When. e..2.2.1. the sequence of the genes is still important. the meaning of a gene (an instruction) here only depends on its allelic state (its value) and not on its position (its locus) inside the chromosome. The locus (see Definition D4. 29. in fixed-length chromosomes there may be a specific meaning assigned to each locus in the genotype (if direct genotype-phenotype mappings are applied as outlined in Section 28.1 (Intron). In some representations in Evolutionary Algorithms.1. each gene stands for an instruction. 2 http://en. 2870]. gene 1 stands for the color of the dinosaur’s neck whereas gene 4 codes the shape of its head. GENOMES IN GENETIC ALGORITHMS 327 29. i. Definition D29. Genetic Algorithms with numeric vector genomes. especially in variable-length genomes.org/wiki/Intron [accessed 2007-07-05] . As stated in Section 29.3 Classification According to Meaning of Loci Let us shortly recapitulate the structure of the elements g of the search space G given in Section 4.2 Classification According to Element Type The most common string chromosomes are (1) bit strings. Instead. A good example for the latter are Algorithmic Chemistries which we will discuss in ?? on page ??. this connection of locus to meaning is loose.2.1 on page 177.2.2. An intron is a part of a genotype g ∈ G that does not contribute to the phenotype x = gpm(g). Unlike normal computer programs. at each step. In Example E17.2. the processor randomly chooses one instruction from the program.wikipedia.5) is the position where a specific gene can be found in a chromosome. In. and (3) vectors of real numbers.4 Introns Besides the functional genes and their alleles. 
The American biochemist Gilbert [1056] coined the term intron for such parts, i.e., for parts of a genotype which play no role during the genotype-phenotype mapping. Although this was an example from nature, similar structures can be observed in Genetic Algorithms as well, for instance when solving the Traveling Salesman Problem given in Example E2.2 on page 45.

i. 29.3 Fixed-Length String Chromosomes Especially widespread in Genetic Algorithms are search spaces based on fixed-length chromosomes. createGAFL(n.n (29. parts of the genome that were translated to proteins in evolutionary past. 29. g [2]. Bn ) which creates bit strings g ∈ G of length n (G = Bn ). In reference to Equation 29. 1) < 0.1 Creation: Nullary Reproduction 29... we show how this general equation can be implemented for fixed-length bit-string genomes..1: g ←− createGAFLBin(n) Input: n: the length of the individuals Data: i ∈ 1. i. They represent a form of redundancy which may have positive as well as negative effects. in Equation 29. The properties of their reproduction operations such as crossover and mutation are well known and an extensive body of research on them is available [1075. e. for instance. g [n]) : g [i] = Gi [⌊randomUni[0.1 on page 326. . Instead. Currently though. the role introns in Genetic Algorithms is mysterious as well. In Algorithm 29.1.328 29 GENETIC ALGORITHMS Biological introns have often been thought of as junk DNA or “old code”.5 then v ←− 0 else v ←− 1 i ←− i − 1 g ←− addItem(g. as outlined in Section 16. v) return g . Similarly.1.1 Fully-Random Genotypes The creation of random fixed-length string individuals means to create a new tuple of the structure defined by the genome and initialize it with random values. G) ≡ (g [1].n: the running index Data: v ∈ B: the running index Output: g: a new bit string of length n with random contents 1 2 3 4 5 6 7 8 9 begin i ←− n g ←− () while i > 0 do if randomUni[0.3. Although creation is a nullary reproduction operation. but now are not used anymore..5) Algorithm 29.3.5 for fixedlength genomes where the elementary types Gi from G0 × G1 × . len(Gi ))⌋] ∀i ∈ 1. as well as the search space G to the operation in order to keep the definition general. the dimension of the search space. they seem to provide support for efficient splicing. e. we could roughly describe this process with Equation 29.. i. in some cases.5 on page 174 and ??. e... we define an algorithm createGAFLBin(n) ≡ createGAFL(n. × Gn−1 = G are finite sets and their j th elements can be accessed with an index Gi [j].. many researchers assume that introns are maybe not as useless as initially assumed [660].5 we provide n. 1253]. one which does not receive any genotypes g ∈ G as parameter.
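A compact Java sketch of this fully-random creation of fixed-length bit strings follows; the class name, the boolean[] representation of G = B^n, and the use of java.util.Random are assumptions of this sketch, not one of the book's listings:

import java.util.Random;

public class CreateFixedLengthBits {
  // Create a random genotype g of dimension n from G = B^n: every gene receives
  // one of its two possible alleles with probability 0.5, in the spirit of
  // Algorithm 29.1 (createGAFLBin).
  static boolean[] create(int n, Random random) {
    boolean[] g = new boolean[n];
    for (int i = 0; i < n; i++) {
      g[i] = random.nextBoolean();
    }
    return g;
  }
}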

. FIXED-LENGTH STRING CHROMOSOMES 29....37 on page 804.n − 1.. of course. the allelic value g [i] of the gene is the index of the element sg[i] ∈ S which will appear at that position in the phenotype x. i.6 on page 643.n − 1). gpmΠ (g. tmp[j]) tmp[j] ←− tmp[i] return g From a more general perspective.29. Genetic Algorithms 0 can also work on permutations. Usually. the gene at locus i in the genotype g ∈ G defines which element s from the set S should appear at position i in the permutation x ∈ X = Π(S). Nn . as defined in Section 51. Algorithm 29.. n − 1) i ←− n g ←− () while i > 0 do j ←− ⌊randomUni[0.3. 1. More precisely.. The genotypes g in the search space G are vectors of natural numbers from 0 to n − 1.6) In other words. we can create a new genotype g ∈ G using the algorithm Algorithm 29. . n − 2.n − 1 and si = i ∀i ∈ 0.6. S) = x : x[i] = sg[i] ∀i ∈ 0. . .3. . or Rn ). j: indices into tmp and g Output: g: the new random permutation of the numbers 0. The mapping of these vectors to the permutations can be done implicitly or explicitly according to the following Equation 29.2 Permutations 329 Besides on arbitrary vectors over some vector space (be it Bn . we can consider the problem space X to be the space X = Π(S) of all permutations of the elements si : i ∈ 0.2 which is implemented in Listing 56.n − 1 of a given set S. If G is the space of all possible permutations of the n natural numbers from 0 to n − 1. i)⌋ i ←− i − 1 g ←− addItem(g.1. S = 0. e.n − 1 (29.2: g ←− createGAPerm(n) Input: n: the number of elements to permutate Data: tmp: a temporary list Data: i.n − 1 1 2 3 4 5 6 7 8 9 10 begin tmp ←− (0.. G = Π(0.
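In the same hedged spirit, the creation of a random permutation of the numbers 0..n−1 sketched in Algorithm 29.2 could be written in Java roughly as follows (the names and the int[] representation are illustrative assumptions):

import java.util.Random;

public class CreateRandomPermutation {
  // Build a random permutation of the numbers 0..n-1 by drawing elements from a
  // shrinking candidate list, so that each number occurs exactly once.
  static int[] create(int n, Random random) {
    int[] candidates = new int[n];
    for (int i = 0; i < n; i++) {
      candidates[i] = i;
    }
    int[] g = new int[n];
    for (int remaining = n; remaining > 0; remaining--) {
      int j = random.nextInt(remaining);          // pick one of the remaining candidates
      g[n - remaining] = candidates[j];           // append it to the genotype
      candidates[j] = candidates[remaining - 1];  // remove it from the candidate list
    }
    return g;
  }
}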

. 29.2. 29.2.b. creates a neighborhood as defined in Equation 29. If parts of the genotype interact epistatically (see Chapter 17). The method sketched in Fig. Fig.b. 29.a.8: depending on the probability ε(l) of modifying exactly 0 < l ≤ n elements..2.2 Multi-Point Mutation It is also possible to perform a multi-point mutation. the method sketched in Fig. Fig. such a multi-point mutation may be the only way to escape local optima. If the genome consists of bit strings. 29.b and Fig. Of course. 29.d: Complete mutation. In this case.a: Single-gene mutation.2. 29.3.c: Uniform multi-gene mutation (ii).2. Figure 29. adjacentε(l) (g2 .2. g1 ) = l (29.3.c).b: Consecutive multi-gene mutation (i). the number of bits to be modified at once can be drawn from a (bounded) exponential distribution (see Section 53. This becomes clear when assuming that the search space was the bit strings of length n. 29.330 29. all individuals in Hamming distance of exactly l become adjacent (with a probability of exactly exactly ε(l)).2. adjacentε (g2 . G = Bn . to change 0 < n ≤ len(gp ) locations in the genotype gp are changed at once (Fig.2. this can be achieved by randomly modifying the value (allele) of a gene. g1 ) = 1 (29.4. as illustrated in Fig. 29.3 on page 682).2. i. illustrated in Fig.2. this single-point mutation is called single-bit flip mutation. Fig.c. In this case.1 Single-Point Mutation In fixed-length string chromosomes.c is far more efficient than the one in Fig.2. 29. random changes into them. this operation creates a neighborhood where all points which have a Hamming distance of exactly one are adjacent (see Definition D4.2 Mutation: Unary Reproduction 29 GENETIC ALGORITHMS Mutation is an important method for preserving the diversity of the candidate solutions by introducing small.2. e.7) 29.3.2. 29. g1 ) = dist ham(g2 . can only put strings into adjacency which have a Hamming distance of l and where the modified bits additionally from a consecutive sequence. Fig. similar considerations would hold for any kind of string-based search space. The former method.8) If a multi-point mutation operator is applied. 29. 29. on the other hand.10 on page 85). g1 ) = dist ham(g2 .2: Value-altering mutation of string chromosomes.

1. This would make little sense in binary-coded search space. len(g) − 1) repeat i ←− ⌊randomUni[0. 29. 29.2.3. e. exactly two genes are changed. e.4. With probability (1 − 0.5)2 ∗ 0. It stops if it has toggled all bits (i.3.. σ 2 which we illustrated in Fig. e.2. With probability (1 − 0.125. j) g [j] ←− ¬g [j] until (len(q) = 0) ∨ (randomUni[0. .2... 1) < 0. means that the value at a given locus is simply inverted as sketched in Fig. it is common to instead modify all genes as sketched in Fig. This operation is referred to as bit-flip mutation.3 Vector Mutation In real-coded genomes. encodings where each gene is a single bit.1 on page 162) in the search space.5..n − 1 1 2 3 4 5 6 7 8 9 10 begin g ←− gp q ←− (0.5 = 0. However.2) with expected value g [i] and some standard deviation σ. i. FIXED-LENGTH STRING CHROMOSOMES 331 This would increase the probability ε(l) that only few genes are changed (l is low). 29. 29. n) Input: gp : the parent individual Input: n ∈ 1. 1) becomes lower than 0. the operator modifies three genes and so on. bigger modifications (l → n in search spaces of dimension n) would still be possible and happen from time to time with such a choice..b and outline in detail in Section 30. 29. we provide an implementation of the mutation operation sketched in Fig. which makes sense since one of the basic assumptions in Genetic Algorithms is that there exists at least some locality ( Definition D14. len(g))⌋ j ←− q [i] q ←− deleteItem(q.c which modifies a roughly exponentially distributed number of genes (bits) in a genotype from a bit-string based search space (G ⊆ Bn )..5) ∗ 0. since the inversion of all bits means to completely discard all information of the parent individual.3. i. e.. The latter condition has the effect that only one gene is modified with probability 0. For real-encoded genomes. j: temporary variables Data: q: the first len(gp ) natural numbers Output: g: the new random permutation of the numbers 0.a.4 Mutation: Binary and Real Genomes Changing a single gene in binary-coded chromosomes.2.3.5.3: g ←− mutateGAFLBinRand(gp . all loci have been used) or a random number uniformly distributed in [0. It does so by randomly selecting loci and toggling the alleles (values) of the genes (bits) at these positions. 2. Algorithm 29.29.5) return g In Algorithm 29.3.25.4.3.d.1 on page 364. . g [i]new ∼ N g [i]. i.len(gp ): the highest possible number of bits to change Data: i. that some small changes in the genotype also lead to small changes in the phenotype.5 = 0.. modifying an element g [i] can be done by replacing it with a number drawn from a normal distribution (see Section 53. 29.
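The two mutation flavors discussed here can be sketched in Java as follows; the geometric continuation with probability 0.5 mirrors Algorithm 29.3, the Gaussian variant mirrors the N(g[i], σ²) rule for real-coded genomes, and all names as well as the use of java.util.Random are merely illustrative assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Mutation {
  // Toggle a roughly exponentially (geometrically) distributed number of distinct
  // bits: always flip at least one gene, then continue with probability 0.5 until
  // all loci have been used (compare Algorithm 29.3).
  static boolean[] flipBits(boolean[] parent, Random random) {
    boolean[] g = parent.clone();
    List<Integer> unusedLoci = new ArrayList<>();
    for (int i = 0; i < g.length; i++) {
      unusedLoci.add(i);
    }
    do {
      int j = unusedLoci.remove(random.nextInt(unusedLoci.size()));
      g[j] = !g[j]; // toggle the allele at the chosen locus
    } while (!unusedLoci.isEmpty() && random.nextDouble() >= 0.5);
    return g;
  }

  // Real-coded mutation: replace one gene by a normally distributed value with
  // expected value g[i] and standard deviation sigma.
  static double[] gaussian(double[] parent, double sigma, Random random) {
    double[] g = parent.clone();
    int i = random.nextInt(g.length);
    g[i] = g[i] + sigma * random.nextGaussian();
    return g;
  }
}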

Fig.3.and real-coded GAs . 29.a: Mutation in binary genomes.3: Mutation in binary.332 29 GENETIC ALGORITHMS g[i]-s g[i]+s g[i] g[i]’ Fig.b: Mutation in real genomes. 29.3. Figure 29.

This operation also preserves the validity of the conditions for permutations given in Definition D51. Algorithm 29.3.4.3.4. useful when solving problems that involve finding an optimal sequence of items. Permutation is.4: g ←− mutateSinglePerm(n) Input: gp : the parental genotype Data: i. j: indices into g Data: t: a temporary variable Output: g: the new. FIXED-LENGTH STRING CHROMOSOMES 29.38 on page 806.3.3).2 Multiple Permutations Different from Algorithm 29. len(g)) repeat j ←− randomUni[0.13 on page 643.4. we provide a simple unary search operation which exchanges the alleles of two genes.4.29. Here. If the genotype was a proper permutation obeying the conditions given in Definition D51. permutated genotype 1 2 3 4 5 6 7 8 9 10 begin g ←− gp i ←− randomUni[0. like the Traveling Salesman Problem as mentioned in Example E2. Figure 29. This. makes only sense if all genes have similar data types and the location of a gene has influence on its meaning. The number of elements which are swapped is roughly exponentially distributed (see Section 53.3.4: Permutation applied to a string chromosome. of course.3.3. Exchanging two alleles then equals of switching two cities in the route. these conditions will hold for the new offspring as well. Algorithm 29.3 Permutation: Unary Reproduction 333 The permutation operation is an alternative mutation method where the alleles of two genes are exchanged as sketched in Figure 29. 29.39 on page 808) repetitively exchanges elements. .4 which is implemented in Listing 56.1 Single Permutation In Algorithm 29. for instance.5 (implemented in Listing 56. exactly as sketched in Figure 29. len(g)) until i = j t ←− g [i] g [i] ←− g [j] g [j] ←− t return g 29.2. a genotype g could encode the sequence in which the cities are visited.13 on page 643.
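A minimal Java sketch of this single swap (assuming a genotype of at least two genes and using illustrative names; not one of the book's listings):

import java.util.Random;

public class SwapMutation {
  // Exchange the alleles of two different, randomly chosen genes, as in
  // Algorithm 29.4. Only existing values change places, so the offspring is
  // again a valid permutation of the same elements.
  static int[] mutate(int[] parent, Random random) {
    int[] g = parent.clone();         // assumes parent.length >= 2
    int i = random.nextInt(g.length);
    int j;
    do {
      j = random.nextInt(g.length);   // redraw until a second, different locus is found
    } while (j == i);
    int t = g[i];
    g[i] = g[j];
    g[j] = t;
    return g;
  }
}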

Algorithm 29.5: g ←− mutateMultiPerm(gp)
Input: gp: the parental genotype
Data: i, j: indices into g
Data: t: a temporary variable
Output: g: the new, permutated genotype
begin
  g ←− gp
  repeat
    i ←− ⌊randomUni[0, len(g))⌋
    repeat
      j ←− ⌊randomUni[0, len(g))⌋
    until i ≠ j
    t ←− g[i]
    g[i] ←− g[j]
    g[j] ←− t
  until randomUni[0, 1) ≥ 0.5
  return g

29.29. also called multi-point crossover (MPX). By setting the parameter k to 1 or 2.5 outlines the most basic methods to recombine two string chromosomes.5. 29.a.3 Multi-Point Crossover (MPX. 29. Amongst all Evolutionary Algorithms.3. becomes uniform crossover (see Section 29.4.3. and the middle part from the second parent chromosome.5. both parental chromosomes are split at a randomly determined crossover point.4). () Fig.5 p=0. Subsequently. 1-PX). which is performed by swapping parts of two genotypes. 3 This abbreviation is also used for simplex crossover. maybe by picking it with a bounded exponential distribution. [1159]). they at least mirror the school book view on it. [1159]).4. Gwiazda mentions alone 11 standard. . Figure 29. it is possible that two offspring are created at once from the two parents.5.5 p=0. 1-PX) When performing single-point crossover (SPX3 . With crossover.5 p=0.5 () Fig. Here. some of the recombination operations which probably comes closest to the natural paragon can be found in Genetic Algorithms.5. The second offspring is shown in the long parentheses.5 p=0. () Fig.4.a: Single-point Crossover (SPX. 66 binary-coded.5 p=0. Figure 29. Doing so it.n − 1 for each application of the operator. This way.5 p=0. FIXED-LENGTH STRING CHROMOSOMES 29.3.5: Crossover (recombination) operators for fixed-length string genomes. () Fig. Although these still cannot nearly resemble the idea of natural crossover.5 p=0. many different recombination operations can be performed with the same algorithm.2 Two-Point Crossover (TPX) In two-point crossover (TPX. sketched in Fig.5 p=0. and 89 real-coded crossover operators. the generalized form of this technique is specified: The k-point crossover operation (k-PX. 2-PX. see ??. a new child genotype is created by appending the second part of the second parent to the first part of the first parent as illustrated in Fig. we only can discuss a small fraction of those. k-PX) In Algorithm 29.5. 1-PX.b: Two-point Crossover (TPX. The operator takes the two parent genotypes gp1 and gp2 from an n-dimensional search space as well as the number 0 < k < n of crossover points as parameter.3. In his 2006 book [1159].6. p=0. single-point and two-point crossover can be emulated and with k = n − 1. 29. 2-PX). 29. The second possible offspring of this operation is again displayed in parentheses. It furthermore is possible to choose k randomly from 1. both parental genotypes are split at two points and a new offspring is created by using parts number one and three from the first. the so-called crossover. not mandatory.3.1 Single-Point Crossover (SPX.d: Uniform Crossover (UX). 29.5 p=0. however. 29.3.b)..4 Crossover: Binary Reproduction 335 There is an incredible wide variety of crossover operators for Genetic Algorithms.4. 29.5. k-PX). 29.c: Multi-point Crossover (MPX.
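Single-point and two-point crossover can be sketched in Java as follows (one offspring per call, both parents assumed to be bit strings of the same length n ≥ 2; the names are our own, not the book's listings):

import java.util.Random;

public class PointCrossover {
  // Single-point crossover (SPX, 1-PX): copy the head of the first parent up to a
  // random crossover point and append the tail of the second parent.
  static boolean[] singlePoint(boolean[] p1, boolean[] p2, Random random) {
    int cut = 1 + random.nextInt(p1.length - 1); // crossover point in 1..n-1
    boolean[] g = new boolean[p1.length];
    for (int i = 0; i < g.length; i++) {
      g[i] = (i < cut) ? p1[i] : p2[i];
    }
    return g;
  }

  // Two-point crossover (TPX, 2-PX): take the outer parts from the first parent
  // and the middle part between the two crossover points from the second parent.
  static boolean[] twoPoint(boolean[] p1, boolean[] p2, Random random) {
    int a = 1 + random.nextInt(p1.length - 1);
    int b = 1 + random.nextInt(p1.length - 1);
    int lo = Math.min(a, b);
    int hi = Math.max(a, b);
    boolean[] g = new boolean[p1.length];
    for (int i = 0; i < g.length; i++) {
      g[i] = (i >= lo && i < hi) ? p2[i] : p1[i];
    }
    return g;
  }
}

Swapping the roles of the two parents in either method yields the second possible offspring mentioned above.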

Algorithm 29.6: g ←− recombineMPX(gp1, gp2, k)
Input: gp1, gp2 ∈ G: the parental genotypes
Input: [implicit] n: the dimension of the search space G
Input: k: the number of crossover points
Data: i: a counter variable
Data: cp: a randomly chosen crossover point candidate
Data: CP: the list of crossover points
Data: s, e: the segment start and end indices
Data: b: a Boolean variable for choosing the segment
Data: gu ∈ G: the currently used parent
Output: g ∈ G: a new genotype
 1 begin
     // find k crossover points from 0 to n − 1
 2   CP ←− (n)
 3   for i ←− 1 up to k do
 4     repeat
 5       cp ←− ⌊randomUni[0, n − 1)⌋ + 1
 6     until cp ∉ CP
 7     CP ←− addItem(CP, cp)
 8   CP ←− sortAsc(CP, <)
     // perform the crossover by copying sub-strings
 9   g ←− ()
10   s ←− 0
11   b ←− true
12   gu ←− gp1
13   for i ←− 0 up to k do
14     e ←− CP[i]
15     if b then gu ←− gp1
16     else gu ←− gp2
       // add substring of length e − s from gu starting at index s to g
17     g ←− appendList(g, subList(gu, s, e − s))
18     b ←− ¬b
19     s ←− e
20   return g

5 ∗ (run )) where n is a natural number in order to prefer points which are somewhat centered between the original coordinates and for ±. gp2 ) Input: gp1 .5 (Weighted) Average Crossover The previously discussed crossover methods have primarily been developed for bit string genomes G = Bn . gp2 ∈ G: the parental genotypes Data: i: a counter variable Output: g ∈ G: a new genotype 1 2 3 4 5 begin g ←− gp1 for i ←− 0 up to len(g) − 1 do if randomUni[0. specified in Algorithm 29. For pure average crossover. 1). the algorithm could be redesigned to use different crossover points for the two parents. Notice that n.8.29 on page 792.c. the crossover points for both parents are always identical whereas for variablelength strings (discussed in Section 29. as weight γ we can use 1.3.3.9 for search spaces G ⊆ Rn . either a portion from the first parent gp1 or the second parent gp2 is copied.4 Uniform Crossover (UX) Uniform crossover (UX [1159. In binary-coded Genetic Algorithms.6. this makes indeed sense. i. to an operator which creates one child fromρ parents. Algorithm 29. Depending on whether it is in an odd (b = 1) or even (b = 0) iteration.29.5. 29.5 ± (0. e. the offspring will necessarily be located on one of the corners of the hypercube described by both parents [2099] and sketched in Figure 29. One way to do so is Equation 29. is also part of the list CP since we need to copy the sub-sequence between the last crossover point and the end of one of parent genotypes as well. we can use a (weighted) average of the two parent vectors as offspring. each element of the two parents g1 and g2 are combined by a weighted average to gnew . the allelic value if chosen with the same probability (50%) from the same locus of the first parent or from the same locus of the second parent.d. either + or − is chosen randomly (as done in Listing 56.5.4.3. Therefore. Then. It copies string sub-sequences between each two crossover points. 29.5 then g [i] ←− gp2 [i] return g 29.c on page 341). 2. Dominant (discrete) recombination which is introduced in Section 30. If the average is randomly weighted. for example). An example for the application of this algorithm illustrated in Fig. 0.6. the dimension of the search space. and exemplarily implemented in Listing 56.7: g ←− recombineUX(gp1 . a new random number ru uniformly distributed in (0. FIXED-LENGTH STRING CHROMOSOMES 337 In Algorithm 29.4. 29. 29. or . 1) < 0.36 on page 803) follows another scheme: Here.. If they combine two strings g1 and g2 . 2563]. for each locus in the offspring chromosome.1 on page 362 is a generalization of UX to an ρ-ary operator.4 and sketched in Fig.3. sketched in Fig. the algorithm iterates through the sorted list CP of crossover points. the weight γ will always be set to γ = 0. first k different crossover points cp are picked according to the uniform distribution and the offspring is initialized as empty list. we would like to explore the volume inside the hypercube as well.5. For fixedlength strings. For real-valued genomes.7. In this equation.
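Uniform crossover and the (weighted) average crossover for real-coded genomes can be sketched as follows; gamma corresponds to the weight γ in the rule g_new[i] = γ·g1[i] + (1−γ)·g2[i], and everything else (class and method names, java.util.Random) is an assumption of this sketch:

import java.util.Random;

public class UniformAndAverageCrossover {
  // Uniform crossover (UX): for every locus, take the allele from either parent
  // with probability 0.5 (compare Algorithm 29.7).
  static boolean[] uniform(boolean[] p1, boolean[] p2, Random random) {
    boolean[] g = new boolean[p1.length];
    for (int i = 0; i < g.length; i++) {
      g[i] = random.nextBoolean() ? p1[i] : p2[i];
    }
    return g;
  }

  // (Weighted) average crossover for real-coded genomes:
  // g_new[i] = gamma * p1[i] + (1 - gamma) * p2[i]. With gamma fixed to 0.5 this
  // is the plain average; drawing gamma anew per locus also samples the inside of
  // the hypercube spanned by the two parents rather than only its corners.
  static double[] weightedAverage(double[] p1, double[] p2,
                                  boolean randomWeights, Random random) {
    double[] g = new double[p1.length];
    for (int i = 0; i < g.length; i++) {
      double gamma = randomWeights ? random.nextDouble() : 0.5;
      g[i] = gamma * p1[i] + (1.0 - gamma) * p2[i];
    }
    return g;
  }
}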

g2[2]) GÍ 2 R 1- . k-PK) as introduced here in Algorithm 29. HX) According to Banzhaf et al.4. Homologous chromosomes5 are either chromosomes in a biological cell that pair during meiosis or non-identical chromosomes which code for the same functional feature by containing similar genes in different allelic states.9) The intermediate recombination operator discussed in Section 30.org/wiki/Homologous_chromosome [accessed 2008-06-17] . g2[1]. Park et al. an approximately normally distributed random number with mean µ = 0. homology 4 of protein-coding DNA sequences means that they code for the same protein which may indicate common functionality.5. homologous genetic material is very similar and in nature.wikipedia.6. In other words. gnew [i] = γg1 [i] + (1 − γ)g2 [i] ∀i ∈ 1. g2[2]) (g1[0].. As basis for their analysis they take multi-point crossover (MPX. g1[2]) GÍ 2 R G0ÍR W Cr eig os hte so d ve A r.3. Park et al. natural crossover is very restricted and usually exchanges only genes that express the same functionality and are located at the same positions (loci) on the chromosomes.2 (Homology). [2128] propose that a similar approach should be useful in Genetic Algorithms. 29. g2[1].org/wiki/Homology_(biology) [accessed 2008-06-17] 5 http://en.3.n (29. They thus make exchange of genetic information conditional: 4 http://en. [209].338 29 GENETIC ALGORITHMS G1ÍR X . again within the bounds (0. only such material is exchanged in sexual reproduction. ge G1ÍR (g2[0].6: The points sampled by different crossover operators. PX P 2- X X -P . g1[1]. In genetics.wikipedia. ve etc ra .6 Homologous Crossover (HRO. 1) for each locus i. 3. k > 0 crossover points are chosen randomly and subsequences between two points (as well as the start of the string and the first crossover point and the last crossover point and the end of the parent strings) are exchanged. g1[2]) G0ÍR parents possible offspring Figure 29.M GÍ 2 R G0ÍR (g1[0]. Definition D29.2 on page 362 is an extension of average crossover to ρ parents. [2128] point out that such an exchange is likely to destroy schemas which extend over crossover points while leaving only those completely embedded in a sequence subject to exchange intact. g1[1]. In MPX.U G1ÍR (g2[0].

Homologous crossover can be implemented by changing the condition in the alternative in Line 15 in Algorithm 29. g6 and uses the internal operator recombinei to recombine each of them with one of the parents.10. e. combining the two subsequences with xor and then counting the number of 1-bits is suggested as similarity measure.10. i.4 Variable-Length String Chromosomes Variable-length genomes for Genetic Algorithms where first proposed by Smith in his PhD thesis [2536].10. g4 ) ←− recombineR (g1 .6 on page 346. (g3 . b ∨ ((e − s) < w) ∨ dist ham(subList(gp1 . it internally creates two new (random) genotypes g5 . again the sequence from gp1 is used. g2 ) uses an internal crossover operation recombinei and a genotype creation routine “create”.4. subList(gp2 . it does not perform “real” crossover.. we deviate from the suggestions given in [1159. 29. . or if 3. he introduced a new variant of classifier systems (see Chapter 35) with the goal of evolving programs for playing poker [2240. Actually. Jones introduces an experiment to test whether a crossover operator is efficient or not.11) If the performance of Genetic Algorithm which uses recombinei as crossover operator is not better than with recombineR using recombinei . . 2128]. another argument for the measured better performance of homologous crossover [2128] would be the discussion on similarity extraction given in Section 29. His random crossover or headless chicken crossover [101] (g3 . than the efficiency of recombinei is questionable. we permit an information exchange between the parental chromosomes only if it is below a threshold u while using the genes from the first parent if it is above or equal the threshold. There.5. Therefore. the Hamming distance “dist ham” divided by the sequence length (e − s). .29. e − s)) ≥u e−s (29. Instead. g2 ) ⇒ (recombinei g1 . its length would be below a given threshold w. if 2. two-point crossover outperformed random crossover only in problems where there are well-defined building blocks while not leading to better results in problems without this feature.3. create recombinei g2 . in Equation 29. However. Either the quoted works just contain a small typo regarding that issue or the author of this book misunderstood the concept of homology. It receives two parent individuals g1 and g2 and produces two offspring g3 and g4 . If this value is above a threshold u.7 Random Crossover In his 1995 paper [1466].10) Besides the schema-preservation which was the reason for the invention of HRO. HX) always chooses the subsequence from the first parent gp1 if 1. create . 2536]. the exchanged sequence would be too short. s. the sequences are not different enough.6 on Line 15). s. There. g4 ) = recombineR (g1 . it would be the turn of gp1 anyway (b = true in Algorithm 29. In the experiments of Jones [1466] with recombinei = TPX.6 to Equation 29.4. e − s) . VARIABLE-LENGTH STRING CHROMOSOMES 339 their HRO (Homologous Recombination Operator. The reader should be informed that in Point 3 of the enumeration and in Equation 29. like the Hamming distance it is a dissimilarity measure. ) (29. 29. As measure for difference.
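The homology condition of Equation 29.10 can be illustrated by a small Java predicate which decides whether the segment [s, e) is copied from the first parent; the thresholds u and w follow the text, while the names and the boolean[] genome representation are assumptions of this sketch:

public class HomologousCondition {
  // Return true if the segment [s, e) should be taken from the first parent gp1:
  // either it is gp1's turn anyway (b), or the segment is shorter than the
  // threshold w, or the two parental segments are too dissimilar, i.e., their
  // normalized Hamming distance reaches the threshold u (Equation 29.10).
  static boolean useSegmentFromFirstParent(boolean[] gp1, boolean[] gp2,
                                           int s, int e, boolean b,
                                           int w, double u) {
    if (b || (e - s) < w) {
      return true;
    }
    int differing = 0;
    for (int i = s; i < e; i++) {
      if (gp1[i] != gp2[i]) {
        differing++;
      }
    }
    return ((double) differing) / (e - s) >= u;
  }
}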

29. for example.4.b. this operation can be reversed by deleting elements from the string as shown in Fig. A special case of this type of recombination is the homologous crossover. insertion and deletion.7.4. createGAVL(GT ) = createGAFL l.4.1 on page 407. Figure 29.b: Deletion of genes. are also implicitly be performed by crossover. The crossover of different strings may turn out as an insertion of new genes into an individual. as discussed in Section 29. as sketched in Figure 29. 29. the same crossover operations are available as for fixed-length strings except that the strings are no longer necessarily split at the same loci. It should be noted that both. of genes with randomly chosen alleles at any given position into a chromosome as sketched in Fig. we could insert a couple Fig. Recombining two identical strings with each other can. .1 Creation: Nullary Reproduction 29 GENETIC ALGORITHMS Variable-length strings can be created by first randomly drawing a length l > 0 and then creating a list of that length filled with random elements.4.7.340 29.3).3.3. Second.7. 29. lead to deletion of genes.12) 29.4. First.7.2 Insertion and Deletion: Unary Reproduction If the string chromosomes are of variable length.8.a: Insertion of random genes. where only genes at the same loci and with similar content are exchanged.2 and Section 29. Fig.3 can be extended by two additional methods. GT l l ∼ exp (29.6 on page 338 and in Section 31.7: Search operators for variable-length strings (additional to those from Section 29. the set of mutation operations introduced in Section 29. 29. 29. l could be drawn from a bounded exponential distribution.3.4. The lengths of the new strings resulting from such a cut and splice operation may differ from the lengths of the parents.3 Crossover: Binary Reproduction For variable-length string chromosomes.a.
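Insertion and deletion for variable-length string chromosomes can be sketched as follows (boolean genes and java.util.Random are illustrative assumptions; the two operators correspond to Fig. 29.7.a and Fig. 29.7.b):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class InsertDeleteMutation {
  // Insert a gene with a randomly chosen allele at a random position; the
  // genotype grows by one element (Fig. 29.7.a).
  static List<Boolean> insertRandomGene(List<Boolean> parent, Random random) {
    List<Boolean> g = new ArrayList<>(parent);
    int position = random.nextInt(g.size() + 1); // any position, including the end
    g.add(position, random.nextBoolean());
    return g;
  }

  // Delete the gene at a random position; the genotype shrinks by one element
  // (Fig. 29.7.b). Genotypes of length one are returned unchanged here.
  static List<Boolean> deleteRandomGene(List<Boolean> parent, Random random) {
    List<Boolean> g = new ArrayList<>(parent);
    if (g.size() > 1) {
      g.remove(random.nextInt(g.size()));
    }
    return g;
  }
}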

5 Schema Theorem The Schema Theorem is a special instance of forma analysis (discussed in Chapter 9 on page 119) for Genetic Algorithms. 29.8: Crossover of variable-length string chromosomes.c: Multi-Point Crossover Figure 29.. Definition D29.29. 29.5 (Defined Length). Σ is the binary alphabet Σ = {true.a: Single-Point Crossover () Fig. i. e.6 (Schema). power set you can find described in Definition D51. 29. There are two basic principles on defining such properties: masks and do not care symbols. Definition D29.8. false} = {0. Matter of fact.15) A mask contains the indices of all elements in a string that are interesting in terms of the property it defines. masks.b: Two-Point Crossover () Fig. and wildcards before going into detail about the Schema Theorem itself.8.9 on page 642. 1250] 6 Alphabets 7 The and such and such are defined in ?? on page ??. the set of all genotypic masks M (l) is the power set7 of the valid loci M (l) = P (0.. Normally. For fixed-length string genomes.14) Definition D29.l − 1) [2879]. and the related Building Block Hypothesis. k ∈ mi } (29.1 Schemata and Masks Assume that the genotypes g in the search space G of Genetic Algorithms are strings of a fixed-length l over an alphabet6 Σ. Every mask mi ∈ M (l) defines a property φi and an equivalence relation: g ∼φi h ⇔ g [j] = h[j] ∀j ∈ mi (29. The defined length δ (mi ) of a mask mi is the maximum distance between two indices in the mask: δ (mi ) = max {|j − k| ∀j.4 (Order). it is older than its generalization and was first stated by Holland back in 1975 [711. A forma defined on a string genome concerning the values of the characters at specified loci is called Schema [558. 29. its criticism.8.13) Definition D29. 1250.3 (Mask). 29. we know that properties can be defined on the genotypic or the phenotypic space. The order order(mi ) of the mask mi is the number of loci defined by it: order(mi ) = |mi | (29. . Here we will first introduce the basic concepts of schemata. From forma analysis. For a fixed-length string genome G = Σl . G = Σl .5. SCHEMA THEOREM 341 () Fig. we can consider the values at certain loci as properties of a genotype. 1253]. 1}.5.
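As a small illustration of these definitions, the order, the defined length, and the equivalence relation induced by a mask can be computed as follows (a sketch only; masks are represented as arrays of distinct loci and genotypes as boolean[], which are our own assumptions):

public class SchemaMask {
  // order(m): the number of loci fixed by the mask.
  static int order(int[] mask) {
    return mask.length;
  }

  // delta(m): the defined length, i.e., the maximum distance between two loci in
  // the mask (assumes a non-empty mask).
  static int definedLength(int[] mask) {
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (int locus : mask) {
      min = Math.min(min, locus);
      max = Math.max(max, locus);
    }
    return max - min;
  }

  // Two genotypes are equivalent under the property defined by the mask iff they
  // agree on every locus the mask contains.
  static boolean equivalent(boolean[] g, boolean[] h, int[] mask) {
    for (int locus : mask) {
      if (g[locus] != h[locus]) {
        return false;
      }
    }
    return true;
  }
}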

1)}. ∗). 0. for example..0) Figure 29. Therefore. H1=(0. 0) .2 Wildcards 29 GENETIC ALGORITHMS The second method of specifying such schemata is to use don’t care symbols (wildcards) to create “blueprints” H of their member individuals. 1)}. 1. 1} . 2} .3 Holland’s Schema Theorem The Schema Theorem8 was defined by Holland [1250] for Genetic Algorithms which use fitness-proportionate selection (see Section 28. The mask m1 = {1.wikipedia.*) g0 (0.0) = {(1.*) (1. ∗). Aφ1 =(1. definitions like the defined length and order can easily be transported into their context. 29.1) ≡ H2 = (0. These schemata mark hyperplanes in the search space G. it defines four formae Aφ1 =(0. 0) .1) (1. specifies that the values at the loci 1 and 2 of a genotype denote the value of a property φ1 and the value of the bit at position 3 is irrelevant. {0. 1.*) g2 (0.342 29. 0.4. 1. and Aφ1 =(1. The set of valid masks M (3) is then M (3) = {{0} .1.1) ≡ H4 = (1.16) (29.. 0. 1.0. {1.0. 0.5.17) H (j) ∈ Σ ∪ {∗} ∀j ∈ 0. 2}. (0.1. {0.0. These formae can be defined with masks as well: Aφ1 =(0. 0)}.l − 1 ⇒ H (j) = g [j] if j ∈ mi ∗ otherwise (29. we place the don’t care symbol * at all irrelevant positions and the characterizing values of the property at the others.*) (0.1) H2=(0. Aφ1 =(0.0.0) ≡ H3 = (1.1.1.0.1.0) g1 (1.9 for the three bit genome.1. Aφ1 =(1.l − 1 Schemas correspond to masks and thus. {2} . Assume we have bit strings of the length l = 3 as genotypes (G = B3 ). Example E29. 0) . Therefore. (0. ∀j ∈ 0. 1. 2}}.0) ≡ H1 = (0. ∗).1) H5=(1. 1)}.org/wiki/Holland%27s_Schema_Theorem [accessed 2007-07-29] .0. (1. 1.0) (1. 1. 0.*) H4=(1.0) = {(0.*.9: An example for schemata in a three bit genome. {1} . Aφ1 =(0. and Aφ1 =(1.1) H3=(1. 2} . ∗). 0) .1 (Masks and Schemas on Bit Strings). (1. as illustrated in Figure 29.1) = {(0. {0.1) = {(1. 0.0) (0.5.3 on page 290) where fitness is subject to 8 http://en.

pop(t)) vt (H) = ps ps vt count(H. pop(t + 1)) of instances of the schema H selected into pop(t + 1) of generation t + 1 then evaluates to: E [count(H. pop(t + 1))] = ps P (select(pH )) count(H.i ) v(p′ ) ∀p′ ∈pop(t) (29. is thus simply the sum of the probabilities of selecting any of the count(H.24) (29. pop(t + 1))] = vt (29. pop(t + 1))] of the number count(H. 1253]. For fitness proportionate selection in this case.i ) = count(H. We here print this equation in a slightly adapted form. Let us first only consider selection. The expected value E [count(H.pop(t)) count(H.i is an instance of the schema H.21) (29. fitness proportionate selection for fitness maximization and with replacement in particular.19) We can rewrite this equation to represent the arithmetic mean vt (H) of the fitness of all instances of the schema in the population: 1 vt (H) = count(H.26) (29. and ignore reproduction operations such as mutation and crossover.5.25 exactly ps times.18) The chance to select any instance of schema H.25) v(p′ ) = ps vt ∀p′ ∈pop(t) P (select(pH )) = count(H. we need to select ps individuals. SCHEMA THEOREM 343 maximization [711. pop(t)) vt (H) i=1 (29.i ) P (select(pH.20) v(pH.23) (29. If the genotype of pH.i )) = i=1 P (select(pH )) = i=1 v(p′ ) ∀p′ ∈pop(t) (29.22) P (select(pH )) = count(H.18 on page 291 defines the probability to pick an individual in a single selection choice according its fitness. In other words.29. pop(t)) vt (H) E [count(H.pop(t)) v(pH. pop(t)) instances of this schema in the current population: count(H. pop(t)) count(H. its probability of being selected with this method equals its fitness divided by the sum of the fitness of all other individuals (p′ ) in the population pop(t) at generation t.27) (29. here denoted as P (select(pH )). P (select(pH. pop(t)) vt (H) ps vt In order to fill the new population pop(t + 1) of generation t + 1. pop(t)) vt (H) ps−1 v(pj ) j=0 We can do a similar thing to compute the mean fitness vt of all the ps individuals p′ in the population of generation t: vt = 1 ps v(p′ ) ∀p′ ∈pop(t) (29. Equation 28.28) .pop(t)) count(H.i )) = v(pH. we have repeat the decision given in Equation 29.pop(t)) v(pH.i ) i=1 (29.

e. If we assume that either mutation is applied (with probability mr). or neither operation is used and the individual survives without modification. for example. e.31. crossover is applied (with probability cr). Finally. we can expect that schemas with lower average fitness are likely to have fewer instances after selection. E [count(H. Create a new instance of the schema. by substituting it into Equation 29. simply assume Pdif f = 1. the probability ζm to do this with mutation is given in Equation 29. If the parent was not instance of the schema. Holland [1250] was only interested in finding a lower bound for the expected value and thus considered only the first two cases.. this can lead to the loss of the schema only if the crossover point separates the bits which are not declared as * in H. the order order(H). Leave the schema untouched when reproducing an individual.32 and. In other words. In a search space G = Bl .30) For single-point mutation. If ζ is the probability that an instance of the schema is destroyed. pop(t + 1))] = count(H. we get Equation 29. but the child is not. we would have to pick a second parent which has at least one wrong value in the genes it provides for the non-don’t care points it covers after crossover.33) . the worst possible luck when choosing the parent.33. the child will be neither. then the child is as well. The lower the defined length of a schema and the lower its order. the parent was not a schema instance. pop(t)) vt (H) 1 − mr − cr E [count(H. Equation 29.29) If we apply single-point crossover to an individual which is already a member of the schema H.344 29 GENETIC ALGORITHMS From this equation we can see that schemas with better (higher) fitness are likely to be present to a higher proportion in the next generation.. 2.29. and between l − 2 and l − 1.31) What we see is that both reproduction operations favor short schemas over long ones. Also. Furthermore. pop(t)) vt (H) (1 − ζ) vt (29. on the other hand. . Figure 29.32) (29. solely the number of non-don’t care points in the schema H is interesting. in order to actually destroy the schema instance. ζc = δ (H) δ (H) Pdif f ≤ l−1 l−1 (29. the probability to destroy an instance of H with crossover. since we want to find a lower bound. only δ (H) points are dangerous. the higher is its chance to survive mutation and crossover.30 and Equation 29. i. pop(t + 1))] ≥ vt l l−1 ζ ≤ mr ζm + cr ζc = mr (29. The defined length δ (H) of H is the number of possible crossover points lying within the schema (see. amongst the l loci we could hit (with uniform probability). we have l − 1 possible crossover points: between loci 0 and 1. Here.28 changes to Equation 29. . 3. The next thing to consider in the Schema Theorem are reproduction operations. .10 on page 347). but the child is. we now can put Equation 29. Each individual application search operation may either: 1. ζm = order(H) l (29.30. Under these circumstances ζc . exactly order(H) will lead to the destruction of the schema instance. . If the parent was instance of the schema. Destroy an occurrence of the schema: The parent individual was an instance of the schema. Amongst them. Equation 29. becomes Equation 29.31 together. between 1 and 2. We assume that this probability would be Pdif f and. i. order(H) δ (H) + cr l l−1 order(H) δ (H) count(H.29.
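In compact form, the destruction probabilities and the resulting lower bound of the Schema Theorem derived above read as follows (a restatement in the notation of this chapter, not an additional result):

\zeta_m = \frac{\operatorname{order}(H)}{l}, \qquad
\zeta_c = \frac{\delta(H)}{l-1}\,P_{\mathit{diff}} \le \frac{\delta(H)}{l-1}, \qquad
\zeta \le mr\,\frac{\operatorname{order}(H)}{l} + cr\,\frac{\delta(H)}{l-1}

E\left[\operatorname{count}(H,\operatorname{pop}(t+1))\right] \ge
\operatorname{count}(H,\operatorname{pop}(t))\,\frac{v_t(H)}{v_t}
\left(1 - mr\,\frac{\operatorname{order}(H)}{l} - cr\,\frac{\delta(H)}{l-1}\right)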

1133]. highly fit schemas would spread exponentially in the population since their number would multiply with a certain factor in every generation.3. 6. Obviously.5. this just means that the schema fitness vt (H) must be higher than the average population fitness vt .9.29 will shift over time.5. Thus. if instances of the schema H would never get lost. maybe even misleading. Since we only have samples of the schemata H. This limits the reproduction of the schemata and also makes statements about probabilities in general more complicated. so we do not get further information about its real utility. the Schema Theorem represents a lower bound that will only hold for one generation [2931]. i. 9. For non-zero ζs. Representing them as binary strings will lead to D3 = {0011.1. can lead to schema incompatibilities. }.29 and Equation 29. we cannot be sure if vt (H) really represents the average fitness of all the members of the schema. 29. This deduction is however a very optimistic assumption and not generally true. 0110. 12. the probabilities in Equation 29.35) (29. If a highly fit schema has many offspring with good fitness.4 Criticism of the Schema Theorem From Equation 29. the schema fitness must be higher in order to compensate for the destroyed instances appropriately. . we cannot know if it is really good if one specific schema spreads fast. 1001. (H) by time. Take the set D3 of numbers divisible by three for example D3 = {3. If the relation of vt (H) to vt would remain constant for multiple generations (and of such a quality that leads to an increase of instances). for instance. Additionally. } if we have a bit-string genome of the length 4. i. this will also improve the overall fitness of the population.2.33 hold for any possible schema at any given point in time accounts for what is called implicit parallelism and introduced in Section 28. . . e. Another issue is that we implicitly assume that most schemata are compatible and can be combined.29.33.36) (29. In Equation 29... .1 on page 155. we can already see that the number of instances of a schema is likely to increase as long as vt (H) (1 − ζ) vt 1−ζ 1 < vt (H) vt vt 1 1> vt (H) 1 − ζ vt vt (H) > 1−ζ 1< (29. we cannot seize these genotypes in a schema using the discussed approach. . one could assume that short.. Hence. The expressiveness of masks and blueprints even is limited and can be argued that there are properties which we cannot specify with them. They . Furthermore. Trying to derive predictions for more than one or two generations using the Schema Theorem as is will lead to deceptive or wrong results [1129. that there is low interaction between different genes. even reproduction operators which preserve the instances of the schema may lead to a decrease of vt+. This is also not generally valid: Epistatic effects. That is why we annotate it with t instead of writing v(H). SCHEMA THEOREM 345 The fact that Equation 29.. the population of a Genetic Algorithm only represents a sample of limited size of the search space G.29. .37) If ζ was 0.29 or Equation 29. the number of schema instances would grow exponentially (since it would multiply with a constant factor in each generation). e. Generally.34) (29. Remember that we have already discussed the exploration versus exploitation topic and the importance of diversity in Section 13. It is also possible that parts of the population already have converged and other members of a schema will not be explored anymore. 1100. even it is very fit.

the inheritance of the sequence or subsequence.a [i] = gp. in a maximally random fashion [302]. The Schema Theorem. seemingly has good fitness. low-defining length schemata with above-average fitness (the so-called building blocks). When a Genetic Algorithm solves a problem. Uniform crossover preserves these components.. we mentioned that the Schema Theorem gives rise to implicit parallelism since it holds for any possible schema at the same time. it should be noted that (a) only a small subset of schemas is usually actually instantiated within the population and (b) only a few samples are usually available per schema (if any). or (c) a combination of inherited sequences of length m less than l and a random coincidence of l − m equal bits (resulting. It states that not the different schemas are combined from the parent individuals to form the child. for example. The latter two cases.a and gp. The Building Block Hypothesis (BBH) proposed by Goldberg [1075]. If a sequence of genes carries the same alleles in both parents.346 29 GENETIC ALGORITHMS may.3. although it is a very nice model. Also consider the criticisms of the Schema Theorem mentioned in the previous section which turns higher abstractions like the Building Block Hypothesis into a touchy subject. the value at that locus in gc is either 1 or 0 – both with a probability of 50%. (b) both parents inherited the sequence. If the two parents disagree in one locus j.b . By using the building blocks instead of testing any possible binary configuration. but instead the similarities are inherited. In UX. Here. however.5. Holland [1250] is based on two assumptions: 1. 29. 29. [1075] Although it seems as if the Building Block Hypothesis is supported by the Schema Theorem.a [i] = gp. uniform crossover preserves the positions where the parents are equal and randomly fills up the remaining genes. 2. as introduced in Section 29. be gathered in a forma.b [i] ⇒ gc [i] = gp. In other words.6 Genetic Repair and Similarity Extraction The GR Hypothesis by Beyer [298] based on the Genetic Repair (GR) Effect [296] takes a counter position to the Building Block Hypothesis [302]. there exists much criticism of the Building Block Hypothesis and. imply that the sequence (or parts thereof) was previously selected. the substructure of a genotype which allows it to match to a schema is called a building block. there exist some low-order. However. This means that if the parents have the same value in locus i. In general.5.4 on page 337. so they can be replaced. Finally.4. the value for each locus of the child individual gc is chosen randomly from the values at the same locus in one of the two parents gp. however. Experiments that originally were intended to proof this theory often did not work out as planned [1913].b [i]. Genetic Algorithms efficiently decrease the complexity of the problem.5 The Building Block Hypothesis According to Harik [1196]. These schemata are combined step by step by the Genetic Algorithm in order to form larger and better strings. this assumption (like many others) about the Schema Theorem makes only sense if the population size is assumed to be infinite and the sampling of the population and the schema would be really uniform. maybe even harmful. from mutation).Generally. this is done with maximum entropy. then the child will inherit this value too gp. there are three possible reasons: (a) either it emerged randomly which has a probability of 2−l for length-l sequences – which is thus very unlikely. 
cannot hold for such a forma since the probability p of destruction may be different from instance to instance. however. it cannot yet be considered as proven sufficiently. e. Let us take a look on uniform crossover (UX ) for bit-string based search spaces. this cannot be verified easily. The remaining bits are uninteresting. . i.

1 Representation The idea behind the genomes used in messy GAs goes back to the work Bagley [180] from 1967 who first introduced a representation where the ordering of the genes was not fixed. The useless sequences. it is not generally possible to devise such a design. 0).6. i. it can place the genes of a building block spatially close together. low defined length) and is of low order [180]. the one with the lesser defined length will be propagated to more offspring.1 Inversion: Unary Reproduction The inversion operator reverses the order of genes between two randomly chosen loci [180.11 illustrates. The messy Genetic Algorithms (mGAs) developed by Goldberg et al.33. 0) . how the possible building block components (1.29 and Equation 29. 1196].. 0) . Hence. e. according to Equation 29.6 The Messy Genetic Algorithm According to the schema theorem specified in Equation 29. from two schemas of the same average fitness and order. With this operation. however. 1) . 29.3.. (4. sets of epistatically linked genes.2. gpm(g1 ) = gpm(g2 ).29. e. is short (i. (2. Figure 29.10: crossover. 0) . are not known at design time – otherwise the problem would already be solved.10. since it is less likely to be destroyed by crossover.6.6. Because of this feature. (0. For instance.33. any particular ordering can be produced in a relatively small number of steps.. 1)) where both genotypes map to the same phenotype. Instead. 0) . This theory has been explained for bit-string based as well as real-valued vector based search spaces alike [298. Thus. 0) can be brought together in two steps. These building blocks. . (4.2 Reproduction Operations 29. This method of linkage learning may thus increase the probability that building blocks. 0). destroyed in 6 out of 9 cases by crossover rearrange destroyed in 1 out of 9 cases by crossover Figure 29. the bit string 000111 can be represented as g1 = ((0. Two linked genes and their destruction probability under single-point 29. (3. THE MESSY GENETIC ALGORITHM 347 Hence. (2. 985]. however. are preserved during crossover operations. It thus mitigates the effects of epistasis as discussed in Section 17. 29.6. as sketched in Figure 29. [1083] use a coding scheme which is intended to allow the Genetic Algorithm to re-arrange genes at runtime. (1. (5. 1) . 0) . the effects of the inversion operation were rather disappointing [180. (1. γ) with its position (locus) φ and value (allele) γ was used. (4. Nevertheless. 1) . 0). 302]. 1)) but as well as g2 = ((5. a schema is likely to spread in the population if it has above-average fitness. i. for each gene a tuple (φ. Therefore. the idea of Genetic Repair is that useful gene sequences which step-by-step are built by mutation (or the variation resulting from crossover) are aggregated. 1) . are dampened or destroyed by the same operator. for example. (3. (3. e. 0) . placing dependent genes close to each other would be a good search space design approach since it would allow good building blocks to proliferate faster. and (6.3.
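A sketch of the messy representation and of the inversion operator in Java follows; the Gene class, the array representation, and the method names are illustrative assumptions, not the book's own code:

import java.util.Arrays;

public class MessyInversion {
  // In a messy GA, each gene is a (locus, allele) tuple, so the genes of a
  // chromosome may be stored in any order.
  static final class Gene {
    final int locus;       // phi: the phenotypic position this gene codes for
    final boolean allele;  // gamma: the value it contributes
    Gene(int locus, boolean allele) {
      this.locus = locus;
      this.allele = allele;
    }
  }

  // Inversion: reverse the order of the genes between positions a and b
  // (inclusive). Only the ordering inside the chromosome changes; the loci and
  // alleles, and therefore the encoded phenotype, stay the same.
  static Gene[] invert(Gene[] parent, int a, int b) {
    Gene[] g = Arrays.copyOf(parent, parent.length);
    for (int i = a, j = b; i < j; i++, j--) {
      Gene t = g[i];
      g[i] = g[j];
      g[j] = t;
    }
    return g;
  }
}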

all possible building blocks of a particular order k are generated. (0.1) (7. (1.0) (1. g6 from above codes for 000 and the second value for locus 2 is discarded. .4 The Process In a simple Genetic Algorithm. 29.1) (1. for instance. 29. 0) .0) (4. (4.1) (2.348 29 GENETIC ALGORITHMS (0. (2. 0) . 0) . Splicing g2 = ((5. In the messy GA [1083. 0) . 0) . 0)) is overspecified since it contains two (in this example.0) (6. the genotype g6 = ((2.1) (7. 0) .1) (3. g7 would code for 000. leads to g5 = ((5. 0)). (5. 1)). In other words. 0) . (3.1) first inversion (0.0) (3. (4.1) (6. 1) . 1) . different) alleles for the third gene (at locus 2). 0) . too.1) (5. (3.0) (2. If we assume a three bit genome. (1. (0.1 = 0. (5. building blocks are identified and recombined simultaneously.6. 0) .5. (1.6. and the first allele found for a specific locus wins. 0) . (1. Via selection. this race is avoided by separating the evolutionary process into two stages: 1. 1)) has a cut probability of pc = (6 − 1)∗0. (4. in turn. (3. the g1 = ((0.0) (4. 0) . building blocks are identified. (2. 0) . the best ones are identified and spread in the population. 1)). In the primordial phase.0) (3. 0) .6. (5. 1)) and g4 = ((4. With pK = 0.3.3 Splice: Binary Reproduction The splice operator joins two genotypes with a predefined probability ps by simply attaching one to the other [1551]. (0. In the original conception of the messy GA.1. (1.1) (6. 0) . 1) . 29. is underspecified since it does not contain any value for the gene in the middle (at locus 1).6. Dealing with overspecification is rather simple [847.3 Overspecification and Underspecification The genotypes in messy GAs have a variable length and the cut and splice operators can lead to genotypes being over or underspecified.2 Cut: Unary Reproduction The cut operator splits a genotype g into two with the probability pc = (len(g) − 1) pK where pK is a bitwise probability and len(g) the length of the genotype [1551]. 1)) and g4 = ((4. 1)).11: An example for two subsequent applications of the inversion operation [1196]. (2.0) (4.1) (7.1) second inversion (0. the application of two cut and a subsequent splice operation to two genotypes has roughly the same effect as a single-point crossover operator in variable-length string chromosomes Section 29. A cut at position 4 would lead to g3 = ((0.0) (5. If this string was 000.0) (5.2. 1551]: The genes are processed from left to right during the genotype-phenotype mapping.2.0) (1. 0) . (2. 1084]. In summary.4. 29. g7 = ((2. 1) . 1) . (2. The loci left open during the interpretation of underspecified genes are filled with values from a template string [1551]. 1) . (5. which leads to a race between recombination and selection [1196]. 1) . (3.1) Figure 29. (0. 1) . 1) . (4.0) (2. 1) . 1) .
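The handling of over- and underspecification during decoding can be sketched as follows; the chromosome is given as parallel arrays of loci and alleles, and the template string fills all loci that remain unspecified (names and representation are our assumptions):

public class MessyDecode {
  // Decode a messy chromosome into a fixed-length bit string. Genes are processed
  // from left to right and the first allele found for a locus wins, which resolves
  // overspecification; loci never mentioned in the chromosome keep the value of
  // the template string, which resolves underspecification.
  static boolean[] decode(int[] loci, boolean[] alleles, boolean[] template) {
    boolean[] phenotype = template.clone();
    boolean[] specified = new boolean[template.length];
    for (int k = 0; k < loci.length; k++) {
      if (!specified[loci[k]]) {
        phenotype[loci[k]] = alleles[k];
        specified[loci[k]] = true;
      }
    }
    return phenotype;
  }
}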

i.6 nor Algorithm 29.3. g [i]) x ←− insertItem(x. j.6 on page 329.1. as defined in the genotype-phenotype mapping introduced in Equation 29. The gene at locus i stands for the element si .7 Random Keys In Section 29. e.7 to such genotypes.8: x ←− gpmRK (g.n and specifies that element sj should be at position i in the permutation. the position at which the element si appears in the phenotype. The Random Keys encoding defined by Bean [236] is a simple idea how to circumvent this problem. 1] of the real numbers. sn−1 }: The set of elements in the permutation. s1 . 2. g ′ ) if j < 0 then j ←− ((−j) − 1) g ′ ←− insertItem(g ′ . most commonly. 5. These building blocks are recombined with the cut and splice operators in the subsequent juxtapositional phase. It is quite clear that every number in 1. G = [0.n − 1 is thus a real number or. 3. . Algorithm 29. sj ) return x .. 4. s1 ... The gene g [i] at locus i of the genotype g ∈ G hence has a value j = g [i] ∈ 1... 1) could lead to the offspring (1.7.. e. 2. This process was later improved by using a probabilistic complete initialization algorithm [1087] instead. 3. 5. we present an algorithm which performs the Random Keys genotype-phenotype mapping. An example implementation of the algorithm in Java is given in Listing 56. Performing a simple single-point crossover on two different genotypes is hence not feasible.8. is defined by the position of the gene g [i] in the sorted list of all genes. 4) and g2 = (4. The two genotypes g1 = (1.. The permutation of these elements. j. 5. We can therefore not apply Algorithm 29.n must occur exactly once in each genotype. This direct representation is achieved by using a search space G which contains the n-dimensional vectors over the natural numbers from 0 to n − 1. 4). S) Input: g ∈ G ⊆ Rn : the genotype Input: S = {s0 .2 and 29. 1) or (4.. This bootstrapping was done by applying the primordial and juxtapositional phases for all orders from 1 to k − 1. . i. we introduce operators for processing strings which directly represent permutations of the elements from a given set S = {s0 .3. It introduces a genotype-phenotype mapping which translates n-dimensional vectors of real numbers (G ⊆ Rn ) to permutations of length n. Often S ⊂ N0 and si = i Data: i: a counter variable Data: j: the position index of the allele of the ith gene Data: g ′ : a temporary genotype Output: x ∈ X = Π(S): the corresponding phenotype 1 2 3 4 5 6 7 8 9 begin g ′ ←− () x ←− () for i ←− 0 up to len(g) − 1 do j ←− searchItemAS(g [i]. 2. RANDOM KEYS 349 2. Each gene g [i] of a genotype g with i ∈ 0. 3.n − 1]n .3. 29. sn−1 }. 2. 3.. both of which containing one number twice while missing another one. The complexity of the original mGA needed a bootstrap phase in order to identify the orderk building blocks which required to identify the order-k − 1 blocks first.52 on page 834..29. an element of the subset [0. In Algorithm 29.

2.350 29 GENETIC ALGORITHMS In other words. Finally. and so on. since 0.84. we can translate a combinatorial problem where a given set of elements must be ordered into a numerical optimization problem defined over a continuous domain. s0 . Snyder and Daskin [2540] combine a Genetic Algorithm.45. Traveling Salesman Problems. the gene at locus 3 has the largest allele (g3 [3] = 0. s3 is the last element in the permutation. s1 .59) would be translated to the phenotype x3 = (s2 .2. Hence. scheduling and resource allocation problems. Bean [236] used the Random Keys method to solve Vehicle Routing Problems. e. .17 (i. 0. g3 [2]) is the smallest number and hence. 0. s3 ). s4 .17. 0. s0 will occur at the second place of the phenotype. as well. s2 comes first. The next smaller number is 0. tackle generalized Traveling Salesman Problems. With this genotype-phenotype mapping. the genotype g3 = (0. the value of the gene at locus 0.84) and thus. quadratic assignment problems. the Random Keys encoding. 0. and a local search method to..
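The Random Keys genotype-phenotype mapping essentially sorts the loci by their real-valued keys; a compact Java sketch under the assumption S = {0, ..., n−1} with s_i = i (names are illustrative, and this is not the book's own listing):

import java.util.Arrays;
import java.util.Comparator;

public class RandomKeysDecode {
  // Map a real-valued genotype (the "keys") to a permutation of 0..n-1: the locus
  // with the smallest key comes first in the phenotype, the locus with the
  // second-smallest key comes second, and so on (compare Algorithm 29.8).
  static int[] decode(double[] keys) {
    Integer[] order = new Integer[keys.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    Arrays.sort(order, Comparator.comparingDouble(i -> keys[i]));
    int[] permutation = new int[keys.length];
    for (int i = 0; i < order.length; i++) {
      permutation[i] = order[i];
    }
    return permutation;
  }
}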

3079] Games [998] Graph Theory [15. 711. 1767–1770. 1074] Cooperation and Teamwork [2922] Data Mining [633. 3024. 240. 556. 1488. 2959. 2018. 1894. 527. 1081. 1686. 2472. 2148. 1440. 978. 1657. 2795. 2114. 2865. 713. 2933. 1581. 558. 2655. 3076] Economics and Finances [1060. 2729. 3076] E-Learning [71. 2796. 1126.1 General Information on Genetic Algorithms Applications and Examples Table 29. 3062. 1269. 1447. 2656. Area Art Chemistry Combinatorial Problems References [1953. 911. 769. 2267. 190. 1706. 1284. 1829. 2324.8. 721. 2473. 1538. 2114. 2658. 2068. 872. 2732. 1082. 1657. 2270. 1779–1781. 765. 2989. 2338. 3062. 1640. 2370. 1841. GENERAL INFORMATION ON GENETIC ALGORITHMS 351 29. 2923. 2068. 1750. 190. 3002. 1779–1781. 2729. 71. 65. 532. 2795. 769. 2168. 1558. 575. 3002.1: Applications and Examples of Genetic Algorithms. 429. 3024. 2658. 436. 2498. 2147. 1284. 237. 2177. 511. 2270. 409. 2338. 1442. 757. 789] Distributed Algorithms and Sys. 402. 2732] Control [541. 1840. 757. 519. 2498. 668. 1460. 1704. 2819. 2643. 2732. 3084] . 2163. 994. 382. 1756. 629. 973. 1211.8. 1487. 2373. 2529] [214. 1446. 541. 1105. 1840. 556. 3084] Computer Graphics [56. 640. 3015. 1971. 410. 1442. 436. 1709. 1709. 1896. 1910. 1757. 1657. 2814. 2909. 355. 789. 2372. 1763–1765.29. 1270. 1258. 1896. 1971. 1636. 2250. 1291. 575. 1450. 654. 1558. 1982. 1636. 2686. 2498. 721. 2656. 1494. 2901. 640. 1935. 683. 1080. 2796. 629. 2894. 3076] Databases [640. 2656. 1636. 610. 2981. 1558. 1488. 2713. 2128. 1581. 1471. 1538. 2922. 2729. 2950. 1705. 1471. 236. 541. 2338. 1447. 717. 2656. 1269. 735. 2113. 410. 714. 2540. 575. 2178. 629. 978. 65. 1767–1770. 1969. 3063] Digital Technology [240. 2950. 3055. 3044. 2177. 1581. 1450. 2720. 1725. 41. 2540. 1446. 2370. 769. 872. 1709. 573. 2729] [16. 410. 2796. 169. 896. 236.8 29. 1737. 2901. 2686. 2865. 2019. 81. 81. 2814. 2113. 1935. 1241. 2324. 1750. 3056] Function Optimization [103. 240. 2374. 1534. 1161. tems 1537.[15. 1982. 2720. 2374. 1906] Engineering [56. 402. 1756. 3044. 2902. 896. 2713. 335. 2497. 382. 1706. 1840. 640. 527. 2232. 1704. 466. 1488. 3002] Healthcare [2022] Logistics [41. 2324. 1126. 1270. 303. 1906. 2075. 2075. 1705. 1982.

Foundations of Learning Classifier Systems [443] 19. 790. 2374. Evolutionary Algorithms in Theory and Practice: Evolution Strategies. 679. 1535. Control. Genetic and Evolutionary Computation for Image Processing and Analysis [466] 11. Handbook of Evolutionary Computation [171] 2. Practical Handbook of Genetic Algorithms: New Frontiers [522] 10.2 Books 1. Genetic Algorithms + Data Structures = Evolution Programs [1886] 35. 1450. 532. Handbook of Genetic Algorithms [695] 8. Advances in Evolutionary Algorithms [1581] 24.352 Mathematics Multi-Agent Systems Physics Prediction Security Software Theorem Proofing and Automatic Verification Wireless Communication 29 GENETIC ALGORITHMS [240. 520. 998. 2285. Control. 2372.8. 2416] [1494] [532. 1284. 556. Efficient and Accurate Parallel Genetic Algorithms [477] 15. Evaluation of the Effectiveness of Genetic Algorithms in Combinatorial Optimization [2923] 27. 2814] 29. Evolutionary Computation: A Unified Approach [715] 21. Cellular Genetic Algorithms [66] 32. 2655. Genetic Algorithm and its Application (Y´ an Su`nfˇ J´ Q´ Y` ıchu´ a a ı ı ıngy`ng) [541] o 30. Practical Handbook of Genetic Algorithms: Applications [521] . Evolution¨re Algorithmen [2879] a 29. Optimization. Genetic Algorithms for Applied CAD Problems [1640] 13. Electromagnetic Optimization by Genetic Algorithms [2250] 16. Genetic Algorithms and Simulated Annealing [694] 7. 652. 652. Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases [634] 17. 998. 2250. Practical Handbook of Genetic Algorithms: Complex Coding Systems [523] 20. Evolutionary Computation 1: Basic Algorithms and Operators [174] 4. Genetic Algorithms Reference – Volume 1 – Crossover for Single-Objective Numerical Optimization Problems [1159] 18. 2284. 575. 2371. An Introduction to Genetic Algorithms [1912] 34. Representations for Genetic and Evolutionary Algorithms [2338] 25. 826. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology. Genetic Algorithms and Evolution Strategy in Engineering and Computer Science: Recent Advances and Industrial Applications [2163] 3. Genetic Algorithms in Search. 575] [1285] [1725] [620. and Machine Learning [1075] 22. and Artificial Intelligence [1250] 9. Design and Practice [41] 31. Data Mining and Knowledge Discovery with Evolutionary Algorithms [994] 28. Extending the Scalability of Linkage Learning Genetic Algorithms – Theory & Practice [552] 14. 1767–1770. Advances in Evolutionary Algorithms – Theory. Evolutionary Programming. Genetic Algorithms and Genetic Programming at Stanford [1601] 5. 1494. Genetic Algorithms [167] 26. Evolutionary Computation: The Fossil Record [939] 12. Telecommunications Optimization: Heuristic and Adaptive Techniques [640] 6. 2901] [640] [154. 1796. 640. Introduction to Stochastic Search and Optimization [2561] 23. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology. and Artificial Intelligence [1252] 33.

2: Conferences and Workshops on Genetic Algorithms. ICANNGA: International Conference on Adaptive and Natural Computing Algorithms See Table 28. see [969] 1991/07: San Diego. GENERAL INFORMATION ON GENETIC ALGORITHMS 29. HIS: International Conference on Hybrid Intelligent Systems See Table 10. ICCI: IEEE International Conference on Cognitive Informatics See Table 10. USA. USA. see [719] 2001/07: Charlottesville. see [883] 1993/07: Urbana-Champaign. USA. USA. FL.8. see [2414] 1987/07: Cambridge.5 on page 318. USA. MI. ICGA: International Conference on Genetic Algorithms and their Applications History: 1997/07: East Lansing. WI. Austria. USA. USA. 353 . CA. EMO: International Conference on Evolutionary Multi-Criterion Optimization See Table 28. see [256] 1994/07: Estes Park. see [208] 1996/08: San Diego.5 on page 318.2 on page 128.8. see [29] 2005/01: Aizu-Wakamatsu City.5 on page 319. Japan. see [2562] AE/EA: Artificial Evolution See Table 28. USA. see [2932] 1992/07: Vail. IN. USA. see [301] 2009/01: Orlando. CEC: IEEE Conference on Evolutionary Computation See Table 28. see [2927] 1990/07: Bloomington.5 on page 318.29. USA. see [166] 1995/07: Pittsburgh. USA. GECCO: Genetic and Evolutionary Computation Conference See Table 28.3 Conferences and Workshops Table 29.5 on page 319. IL. VA. FOGA: Workshop on Foundations of Genetic Algorithms History: 2011/01: Schwarzenberg. USA. MA. see [2983] 2002/09: Torremolinos. CO. PA. EvoWorkshops: Real-World Applications of Evolutionary Computing See Table 28. see [1131] PPSN: Conference on Parallel Problem Solving from Nature See Table 28.5 on page 317. CA. WCCI: IEEE World Congress on Computational Intelligence See Table 28. CO. Spain. USA.5 on page 317. PA. see [1132] 1985/06: Pittsburgh. see [2565] 1998/07: Madison.2 on page 128. USA. VA. see [254] 1989/06: Fairfax.5 on page 319.

2 on page 130. MICAI: Mexican International Conference on Artificial Intelligence See Table 10. Czech Republic. Czech Republic.354 29 GENETIC ALGORITHMS ICNC: International Conference on Advances in Natural Computation See Table 28.5 on page 319. NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10. BIONETICS: International Conference on Bio-Inspired Models of Network. WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10. Finland. Greece. see [2232] 1995/12: Las Palmas de Gran Canaria. CIMCA: International Conference on Computational Intelligence for Modelling. Man.2 on page 131.5 on page 320. Czech Republic. see [419] 2001/06: Brno.5 on page 320. Germany. . SMC: International Conference on Systems. EUROGEN: Evolutionary Methods for Design Optimization and Control History: 2007/06: Jyv¨skyl¨. and Computing Systems See Table 10. see [1894] a a 1997/11: Triest. and Cybernetics See Table 10. 418] 1996/06: Brno. see [421] 2004/06: Brno. see [2751] a a 2005/09: Munich. Czech Republic. Czech Republic. Czech Republic. Czech Republic. Spain. Catalonia. Czech Republic. BIOMA: International Conference on Bioinspired Optimization Methods and their Applications See Table 28. Czech Republic. see [414] MIC: Metaheuristics International Conference See Table 25. see [415] 1995/09: Brno.2 on page 227. see [216] 2001/09: Athens. Czech Republic. Information. Finland. see [416. Bavaria.5 on page 320. see [2107] 2006/05: Brno. Spain.2 on page 130. Italy. SEAL: International Conference on Simulated Evolution and Learning See Table 10. see [417] 1997/06: Brno. see [411. see [420] 2003/06: Brno. see [1842] 2002/06: Brno.2 on page 131. 412] 2007/09: Prague. see [2959] FEA: International Workshop on Frontiers in Evolutionary Algorithms See Table 28. see [2419] 2003/09: Barcelona. see [2106] 1998/06: Brno. Czech Republic. see [413] 2005/06: Brno. see [1053] 1999/05: Jyv¨skyl¨.2 on page 131. MENDEL: International Mendel Conference on Genetic Algorithms History: 2009/06: Brno.2 on page 131. Czech Republic. see [2522] 2000/06: Brno. Control and Automation See Table 28. Czech Republic.

2 on page 132.5 on page 321. COGANN: International Workshop on Combinations of Genetic Algorithms and Neural Networks History: 1992/06: Baltimore.2 on page 133. LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10. Finland.2 on page 134.5 on page 321.5 on page 321. 355 CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10.5 on page 321. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems See Table 28.2 on page 133. EngOpt: International Conference on Engineering Optimization See Table 10. see [2415] EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation See Table 28. UK. Organization and Technology See Table 10.2 on page 132. Progress in Evolutionary Computation: Workshops on Evolutionary Computation See Table 28. see [3056] 1995/09: Sheffield. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10. GEM: International Conference on Genetic and Evolutionary Methods See Table 28.2 on page 133.29.5 on page 320. EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10. MD. FOCI: IEEE Symposium on Foundations of Computational Intelligence See Table 28. WOPPLOT: Workshop on Parallel Processing: Logic.2 on page 133. Finland. ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10. GENERAL INFORMATION ON GENETIC ALGORITHMS CIS: International Conference on Computational Intelligence and Security See Table 28.5 on page 321. History: 1995/06: Vaasa. GALESIA: International Conference on Genetic Algorithms in Engineering Systems History: 1997/09: Glasgow. Scotland. LION: Learning and Intelligent OptimizatioN See Table 10. FWGA: Finnish Workshop on Genetic Algorithms and Their Applications Later continued as NWGA.5 on page 321.2 on page 133. SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10.8. UK.2 on page 132.5 on page 321. USA. see [3055] GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation See Table 28. see [59] 1994/03: Vaasa. see [58] . EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization See Table 28.

Finland. Finland. 29.5 on page 322. IWACI: International Workshop on Advanced Computational Intelligence See Table 28.5 on page 322.5 on page 322. see [62] 1997/08: Helsinki. Logic Programming. Finland.5 on page 322. EvoRobots: International Symposium on Evolutionary Robotics See Table 28. Optimization.8. see [61] 1996/08: Vaasa.5 on page 321. SSCI: IEEE Symposium Series on Computational Intelligence See Table 28. Evolutionary and Memetic Computing See Table 28. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10. see [59] NaBIC: World Congress on Nature and Biologically Inspired Computing See Table 28.5 on page 321. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands . SEA: International Symposium on Experimental Algorithms See Table 10. ICEC: International Conference on Evolutionary Computation See Table 28. see [57] GICOLAG: Global Optimization – Integrating Convexity. IJCCI: International Conference on Computational Intelligence See Table 28.2 on page 228.5 on page 322. NWGA: Nordic Workshop on Genetic Algorithms History: 1998/07: Trondheim.4 Journals 1. see [60] 1995/06: Vaasa.2 on page 134. Norway.5 on page 322.2 on page 134.2 on page 135. and Computational Algebraic Geometry See Table 10. WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms See Table 28.5 on page 322. SEMCCO: International Conference on Swarm. IWNICA: International Workshop on Nature Inspired Computation and Applications See Table 28.356 29 GENETIC ALGORITHMS 1992/11: Espoo. META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25. Finland. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2.

and 78.2. it would be more economically to represent bit strings as array of int or long and access specific bits via bit arithmetic (via. what happens if you use the same experimental settings for both tasks? Try to give reasons for your results.8.29. what happens if you use the same experimental settings for all three tasks? Try to give reasons for your results.. 8. this time by reproducing the experiments given in Khuri. Each boolean will take up at least 1 byte. If you use a permutation-based search space. <<. What are the differences? What are the similarities? Did you use different experimental settings? If so. we provide a Java implementation of some of the standard operators of Genetic Algorithms based on arrays of boolean. e.2. by either using byte[]. int[]. one eighth of a byte. Compare your results from Task 67 on page 238. Perform the same experiments as in Task 64 on page 238 (and 68) with a Genetic Algorithm. Use a real-vector based search space. Sch¨tz. What are the differences? What are the similarities? Did you use different experimental settings? If so. Perform the same experiments as in Task 67 on page 238 (and 70) with a Genetic Algorithm. Implement an appropriate crossover operator. [10 points] 83. probably even 4. [50 points] 84. or >>>).4 on page 181.2. [30 points] . or maybe even 16 bytes. [10 points] 81. ˆ.2. using boolean[] is a bit inefficient.1 on page 801. e. Obviously.util. [40 points] 79.. 70 and 81. or long[]. In Paragraph 56. Yet. &. Compare the results with the results u o listed in that section and obtained by you in Task 70 and 70. Which method is better? Provide reasons for your observation. since it wastes memory. [40 points] 80. you need to define a recombination operation which does not violate the permutation-character of the genotypes. Put your explanations into the context of the discussions provided in Example E17. Try solving the bin packing problem again.1. Please re-implement the operators from Paragraph 56. Perform the same experiments as in Task 64 on page 238 (and 68 and 78) with a Genetic Algorithm. and Heitk¨tter in [1534]. |. you may re-use the nullary and unary search operations we already have developed for this task. Since it only represents a single bit. GENERAL INFORMATION ON GENETIC ALGORITHMS Tasks T29 357 78. or a specific class such as java .BitSet for storing the bit strings instead of boolean[]. Compare your results from Task 64 on page 238. 68. [40 points] 82. Use a bit-string based search space and utilize an appropriate genotypephenotype mapping.g.1 on page 801 in a more efficient way.1. i.
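One possible starting point for the more memory-efficient bit-string representation asked for in the task above is sketched below. The class is hypothetical, i.e., it is not one of the listings referenced in the tasks; it packs a bit string into a long[] and flips single bits with shift and XOR operations.

import java.util.Random;

// Hypothetical sketch (not one of the book's listings): a bit string packed
// into a long[] instead of a boolean[]. Each long stores 64 bits; bit i lives
// in word i >>> 6 at position i & 63.
public class PackedBitString {
  final long[] words;
  final int length;

  public PackedBitString(final int length) {
    this.length = length;
    this.words = new long[(length + 63) >>> 6];
  }

  public boolean get(final int i) {
    return ((this.words[i >>> 6] >>> (i & 63)) & 1L) != 0L;
  }

  public void flip(final int i) {
    this.words[i >>> 6] ^= (1L << (i & 63));
  }

  // single-bit mutation: flip one uniformly chosen bit
  public void mutate(final Random random) {
    this.flip(random.nextInt(this.length));
  }
}

Alternatively, java.util.BitSet already offers get, set, and flip methods that achieve the same effect without manual bit arithmetic, at the cost of a little object overhead per bit string.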


Chapter 30

Evolution Strategies

30.1 Introduction

Evolution Strategies1 (ES), introduced by Rechenberg [2277–2279] and Schwefel [2433–2435, 2437], are a heuristic optimization technique based on the ideas of adaptation and evolution, a special form of Evolutionary Algorithms [170, 300, 302, 1220, 1617]. The first Evolution Strategies were quite similar to Hill Climbing methods. They did not yet focus on evolving real-valued, numerical vectors and only featured two rules for finding good solutions: (a) slightly modifying one candidate solution and (b) keeping it only if it is at least as good as its parent. Since there was only one parent and one offspring in this form of Evolution Strategy, this early method more or less resembles a (1 + 1) population as discussed in Paragraph 28.1.4.3.3 on page 266. From there on, (µ + λ) and (µ, λ) Evolution Strategies were developed – a step into a research direction which was not very welcome back in the 1960s [302]. ESs normally follow one of the population handling methods classified in the µ-λ notation given in Section 28.1.4.3.

The search spaces of today's Evolution Strategies normally are vectors from the Rn, but bit strings or integer strings are common as well [302]. Here, we will focus mainly on the former, i.e., the real-vector based search spaces. In this case, the problem space is often identical with the search space, i.e., X = G ⊆ Rn.

Mutation and selection are the primary reproduction operators in Evolution Strategies; recombination is used less often. Most often, normally distributed random numbers are used for mutation. The parameter of the mutation is the standard deviation of these random numbers. The Evolution Strategy may either

1. maintain a single standard deviation parameter σ and use identical normal distributions for generating the random numbers added to each element of the solution vectors,
2. use a separate standard deviation (from a standard deviation vector σ) for each element of the genotypes, i.e., create random numbers from different normal distributions for mutation in order to facilitate different strengths and resolutions of the decision variables (see, for instance, Example E28.11), or
3. use a complete covariance matrix C for creating random vectors distributed in a hyperellipse and thus also taking into account binary interactions between the elements of the solution vectors.

The standard deviations are governed by self-adaptation [1185, 1878] and may result from a stochastic analysis of the elements in the population [1182–1184, 1436]. They are often treated as endogenous strategy parameters which can directly be stored in the individual records p and evolve along with them [302].

1 http://en.wikipedia.org/wiki/Evolution_strategy [accessed 2007-07-03]
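As a minimal illustration of this coupling of solution and strategy parameters, the following sketch bundles a real-valued genotype with its mutation strength in one record. The names are of our own choosing and do not refer to the listings of this book; later sections show how the endogenous parameter is recombined and mutated.

// Minimal sketch (names of our own choosing) of an ES individual record: the
// genotype is accompanied by its endogenous strategy parameter. For the second
// and third variant listed above, sigma would become a double[] or a full
// covariance matrix double[][] instead of a scalar.
public class ESIndividual {
  double[] g;       // genotype, an element of G = R^n
  double sigma;     // endogenous strategy parameter: the mutation strength
  double fitness;   // objective value, filled in during evaluation

  public ESIndividual(final double[] g, final double sigma) {
    this.g = g.clone();
    this.sigma = sigma;
  }
}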

Evolution Strategies are Evolutionary Algorithms which usually add the step length vector as endogenous information to the individuals. This vector undergoes selection, recombination, and mutation together with the individual. The parameters µ and λ, on the other hand, are called exogenous parameters and are usually kept constant during the evolutionary optimization process [302].

The idea of self-adaptation can be justified with the following thoughts. Assume we have a Hill Climbing algorithm which uses a fixed mutation step size σ to modify the (single) genotype it currently has under investigation. If a large value is used for σ, the Hill Climber will initially quickly find interesting areas in the search space. However, due to its large step size, it will often jump in and out of smaller highly fit areas without being able to investigate them thoroughly. If a small value is used for σ, the Hill Climber will progress very slowly, but once it finds the basin of attraction of an optimum, it will be able to trace that optimum and deliver good solutions. It thus makes sense to adapt the step width of the mutation operation: starting with a larger step size, which helps the algorithm to explore wide areas in the search space, we may reduce it while approaching the global optimum.

In Evolution Strategies, the adaptation of the step length of the search operations is governed by the same selection and reproduction mechanisms which are applied in the Evolutionary Algorithm to find good solutions. Since we expect these mechanisms to work for this purpose, it makes sense to also assume that they can do well to govern the adaptation of the search steps. We can describe the idea of endogenous parameters for governing the search step width as follows. When the EA selects an individual which is good, it also selects the step width that led to its creation. This step width can be considered to be good since it was responsible for creating a good individual. Bad step widths, on the other hand, will lead to the creation of inferior candidate solutions and thus perish together with them during selection. We can further expect that, initially, larger steps are favored by the evolution. Later, when the basin(s) of attraction of the optimum/optima are reached, only small search steps can lead to improvements in the objective values of the candidate solutions. In this case, individuals created by search operations which perform small modifications will be selected. These individuals will hold endogenous information which leads to small step sizes. In other words, mutation and recombination will lead to the construction of endogenous information which favors smaller and smaller search steps – the optimization process converges.

30.2 Algorithm

In Algorithm 30.1 (and in the corresponding Java implementation in Listing 56.17 on page 760) we present the basic single-objective Evolution Strategy algorithm as specified by Beyer and Schwefel [302]. The algorithm can facilitate the two basic strategies (µ/ρ, λ) (Paragraph 28.1.4.3.6) and (µ/ρ + λ) (Paragraph 28.1.4.3.7). These general strategies also encompass the (µ, λ) ≡ (µ/1, λ) and (µ + λ) ≡ (µ/1 + λ) population treatments. The algorithm starts with the creation of a population of µ random individuals in Line 3, maybe by using a procedure similar to the Java code given in Listing 56.7. In every generation (except the first one with t = 1), there are λ individuals in the population pop. In each generation, first a genotype-phenotype mapping may be performed (Line 6) and then the objective function is evaluated. We then have to select µ of the individuals into the mating pool mate. The way in which this selection is performed depends on the type of the Evolution Strategy: in the (µ/ρ, λ) strategy, only the individuals of generation t in pop(t) are considered (Line 13), whereas in a (µ/ρ + λ) strategy, the previously selected individuals may be chosen as well. A compact Java sketch of this generational loop is given below; the formal specification follows as Algorithm 30.1.
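Under the assumption of helper operators for creation, recombination, mutation, and evaluation (the Operators class and all names below are ours and purely illustrative, reusing the ESIndividual record sketched in the introduction), one possible rendering of this loop in Java looks as follows.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Sketch of a (mu/rho, lambda) / (mu/rho + lambda) generation loop in the
// spirit of Algorithm 30.1. The helper operators in the hypothetical Operators
// class are assumed to create, recombine, mutate, and evaluate individuals.
public class SimpleES {
  public static ESIndividual run(final int mu, final int lambda, final int rho,
      final boolean plusStrategy, final int generations, final Random random) {
    List<ESIndividual> pop = new ArrayList<>();
    for (int i = 0; i < mu; i++) {
      pop.add(Operators.newRandomIndividual(random)); // mu random individuals
    }
    pop.forEach(Operators::evaluate);

    for (int t = 1; t <= generations; t++) {
      final List<ESIndividual> offspring = new ArrayList<>();
      for (int i = 0; i < lambda; i++) {
        // draw rho parents uniformly from the mating pool
        final List<ESIndividual> parents = new ArrayList<>();
        for (int j = 0; j < rho; j++) {
          parents.add(pop.get(random.nextInt(pop.size())));
        }
        final ESIndividual child = Operators.recombine(parents); // genotype and strategy parameters
        Operators.mutate(child, random); // mutate the strategy parameters first, then the genotype
        Operators.evaluate(child);
        offspring.add(child);
      }
      // environmental selection: truncation to the mu best (minimization, lambda >= mu)
      final List<ESIndividual> pool = new ArrayList<>(offspring);
      if (plusStrategy) {
        pool.addAll(pop); // (mu + lambda): parents compete with their offspring
      }
      pool.sort(Comparator.comparingDouble(ind -> ind.fitness));
      pop = new ArrayList<>(pool.subList(0, mu));
    }
    return pop.get(0); // best individual of the final population
  }
}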

f.w ∀p ∈ P ) pn . either (µ/ρ. ρ) Input: f: the objective/fitness function Input: µ: the mating pool size Input: λ: the number of offsprings Input: ρ: the number of parents Input: [implicit] strategy: the population strategy.2. ALGORITHM 361 Algorithm 30. µ.w ←− mutateW (w′ ) pn . gpm) pop(t) ←− computeObjectives(pop(t) . ρ) pn ←− ∅ g ′ ←− recombineG (p. f.w) pop(t + 1) ←− addItem(pop(t + 1) .g ←− mutateG (g ′ . f.30. λ) or (µ/ρ + λ) Data: t: the generation counter Data: pop: the population Data: mate: the mating pool Data: P : a set of ρ parents pool for a single offspring Data: pn : a single offspring Data: g ′ . λ) then mate(t) ←− truncationSelectionw (pop(t) . . µ) else mate(t) ←− truncationSelectionw (pop(t) ∪ mate(t − 1) . pn ) t ←− t + 1 return extractPhenotypes(extractBest(pop(t) ∪ pop(t − 1) ∪ . . pn . µ) else mate(t) ←− pop(t) pop(t + 1) ←− ∅ for i ←− 1 up to λ do P ←− selectionw (mate(t) . .g ∀p ∈ P ) w′ ←− recombineW (p. f)) . w′ : temporary variables Data: continue: a Boolean variable used for termination detection Output: X: the set of the best elements found 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 begin t ←− 1 pop(t = 1) ←− createPop(µ) continue ←− true while continue do pop(t) ←− performGPM(pop(t) . λ.1: X ←− generalES(f. f) if terminationCriterion then continue ←− false else if t > 1 then if strategy = (µ/ρ.

4. be constructed. We specify this procedure as Algorithm 30. we chose this notation. a vector. for example. However. the λ new individuals can be built.32 on page 798.3. e. is a generalization of uniform crossover (see Section 29. Like in uniform crossover. Of course. but even then. In the next step Line 24. for the ith locus in the child chromosome g′ . no one would implement it like this in reality.ρ . . Here. Furthermore. keep track of the best candidate solutions found so far. .2 . during all the iterations.4. 30.w could be the covariance matrix of all parent genotype vectors. In any way. we speak of multi-recombination. each candidate solution can be selected at most once. . Finally.1 . a random number normally distributed according to the covariance stored in pn . pn . The optimizer can maintain such a set during the generations by using an archive and pruning techniques.. we randomly pick an allele from the same locus from any of the ρ parents gp. .33 on page 799.362 30 EVOLUTION STRATEGIES from mate(t − 1) can be selected. as specified in Algorithm 30. i. the value of the ith gene of the offspring g′ is the arithmetic mean of the alleles of genes at the same locus in the parents. For the sake of simplicity. If ρ > 2. This genotype.g of the offspring pn is then created by recombining the genotypes of all of the parents with an operator “recombine” (Line 22). 30. The endogenous information pn . if the information was mutated after the genotype was created.. e. we sketched that the best solutions found in all populations over all generations would be returned in Line 28 which symbolizes such bookkeeping. it would not be clear whether its features are still good – after mutation. The order of first mutating the information record and then the genotype is important for the selection round in the following generation since we want information records which create good individuals to be selected.gn.w could be added to p. a whole set X of equally good candidate solutions may be discovered. it might have become bad. Here. This selection is usually a uniform random process which does not consider the objective values. This mutated information record is then used to mutate the newly created genotype in Line 25 with the operator “mutate”. this information is modified.2 and implemented in Listing 56. most often a single result will be returned.3.5 on page 337) to ρ parents. The algorithm should. After the mating pool mate has been filled. there are ρ parents for each offspring.w ∈ W to be stored in pn is created in a similar fashion with the operator “recombine” in Line 22. In single-objective optimization.3. 30. we therefore first select ρ parents into a parent pool P in Line 20. gp. gp. i.3. The genotype pn .2 Intermediate Recombination The intermediate recombination operator can be considered to be an extension of average crossover (discussed in Section 29. . again a selection method without replacement is generally used. For each of the λ offspring pn to . . for instance. be the arithmetic mean of all parent genotype vectors.3 and implement it in Listing 56.1 Dominant Recombination Dominant (discrete) recombination.3 Recombination The standard recombination operators in Evolution Strategy basically resemble generalizations of uniform crossover and average crossover to ρ ≥ 1 parents for (µ/ρ + λ) strategies. mutated. the new individual can be added to the next population popt+1 in Line 26 and a new generation can begin. 
but in order to keep the algorithm definition a bit smaller and to not have too many variables. the selection method chosen for this usually is truncation selection without replacement.4 on page 337) to ρ parents. may. In a (µ/ρ + λ) strategy.

2: g′ ←− recombineDiscrete(P ) Input: P : the list of parent individuals.g[i] s g′ [i] ←− n return g′ . RECOMBINATION 363 Algorithm 30.3. ρ)⌋] g′ [i] ←− p.g[i] return g′ Algorithm 30. j: counter variables Data: s: the sum of the values Data: p: a parent individual Output: g′ : the offspring of the parents 1 2 3 4 5 6 7 8 begin for i ←− 0 up to n − 1 do s ←− 0 for j ←− 0 up to ρ − 1 do p ←− P [j] s ←− s + p.3: g′ ←− recombineIntermediate(P ) Input: P : the list of parent individuals. len(P ) = ρ Data: i.30. len(P ) = ρ Data: i: a counter variable Data: p: a parent individual Output: g′ : the offspring of the parents 1 2 3 4 5 begin for i ←− 0 up to n − 1 do p ←− P [⌊randomUni[0.
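A straightforward Java sketch of these two operators, applied to real-valued genotype vectors, is given below. The names are ours and the code is meant as an illustration of Algorithms 30.2 and 30.3, not as the implementation referenced in the text.

import java.util.List;
import java.util.Random;

// Illustrative Java versions of the two standard ES recombination operators,
// applied to the genotype vectors of rho parents.
public class Recombination {

  // dominant (discrete) recombination: copy each gene from a randomly chosen parent
  public static double[] recombineDiscrete(final List<double[]> parents, final Random random) {
    final int n = parents.get(0).length;
    final double[] child = new double[n];
    for (int i = 0; i < n; i++) {
      child[i] = parents.get(random.nextInt(parents.size()))[i];
    }
    return child;
  }

  // intermediate recombination: each gene is the arithmetic mean of the parents' genes
  public static double[] recombineIntermediate(final List<double[]> parents) {
    final int n = parents.get(0).length;
    final double[] child = new double[n];
    for (int i = 0; i < n; i++) {
      double sum = 0;
      for (final double[] p : parents) {
        sum += p[i];
      }
      child[i] = sum / parents.size();
    }
    return child;
  }
}

In an Evolution Strategy, the same two schemes are applied not only to the genotypes but also to the endogenous strategy parameters, for example to the σ vectors of the parents.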

30.4 Mutation

30.4.1 Using Normal Distributions

If Evolution Strategies are applied to real-valued (vector-based) search spaces G = Rn, the (recombined intermediate) genotypes g′ are often mutated by using normally distributed random numbers ∼ N(µ, σ²). Normal distributions have the advantage that (a) small changes are preferred, since values close to the expected value µ are the most likely ones, but (b) basically all numbers have a non-zero probability of being drawn, i.e., all points in the search space are adjacent with ε = 0 (see Definition D4.10 on page 85), and, finally, (c) the dispersion of the distribution can easily be increased or decreased by increasing or decreasing the standard deviation σ. A univariate normal distribution thus has two parameters, the mean µ and the (scalar) standard deviation σ.

If a real-valued, intermediate genotype g′ ∈ R is mutated, we want the possible results to be distributed around this genotype. We then have two equivalent options: either we draw the numbers around the mean µ = g′, i.e., use randomNorm(g′, σ²), or we add a normal distribution with mean 0 to the genotype, i.e., compute g′ + randomNorm(0, σ²). For the multi-dimensional case, where the genotypes are vectors g′ ∈ Rn, the procedure is exactly the same; a small Java sketch of the simplest variant follows below.

[Figure 30.1: Sampling Results from Normal Distributions. Panels A to D show the selected parents g′ and point clouds sampled from a single isotropic normal distribution, from one normal distribution per axis, and from a multivariate normal distribution governed by a covariance matrix C.]
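As a concrete illustration of the simplest case, a single mutation strength σ shared by all dimensions (formalized below as Algorithm 30.4), the following Java sketch adds an N(0, σ²) offset to every gene. The class and method names are ours, not those of the listings referenced in the text.

import java.util.Random;

// Illustrative sketch of isotropic mutation: every gene receives an offset
// drawn from N(0, sigma^2), which is equivalent to drawing from N(g[i], sigma^2).
public class IsotropicMutation {
  public static double[] mutate(final double[] g, final double sigma, final Random random) {
    final double[] result = new double[g.length];
    for (int i = 0; i < g.length; i++) {
      result[i] = g[i] + (sigma * random.nextGaussian());
    }
    return result;
  }
}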

a vector σ.30.. e..4. w = σ) Input: g′ ∈ Rn : the input vector Input: σ ∈ R: the standard deviation of the mutation Data: i: a counter variable Output: g ∈ Rn : the mutated version of g ′ begin for i ←− 0 up to n − 1 do g[i] ←− g′ [i] + σrandomNorm(0. In this figure. For the normal distribution in part (B). Figure 30. may differ largely.25 on page 786. MUTATION 365 For the parameter σ of the normal distribution. Algorithm 30. i.1. Basically. it may even change depending on which area of the search space is currently investigated by the optimization process.1. Furthermore. the influence of the dimensions of the search space. w = σ. the surfaces of constant density form concentric spheres around g′ . we chose a standard deviation which equals the mean of the standard deviations along both axes G0 and G1 .4: g ←− mutateG. there exist three simple methods for defining the second parameter of the normal distribution(s) for mutation. is quite common. For each dimension i ∈ 0. i. i. 30. 30.4. we assume that the new genotype g′ is the mean vector of a set of selected parent individuals (A) and we assume that the standard deviations are derived from these parents as well. They are illustrated exemplarily in Figure 30.4. The advantage of this mutation operator is that it only needs a single. where the selected individuals form a shape in the search space which is not aligned with the main axes. σ 2 return g 1 2 3 4 rithm 30. things are a little bit more complicated. 1) // Line 3 ≡ g[i] ←− randomNorm g′ [i]. the endogenous information is a scalar number w = σ. this value is usually stored as endogenous strategy parameter and adapted over time.σ (g′ .1.11 on page 271 we pointed out that the influence of the genes of a genotype on the objective values. Here.. however.1 Single Normal Distribution The simplest way to mutate a vector g′ ∈ Rn with a normal distribution is given in AlgoAlgorithm 30.1.2 n Univariate Normal Distributions By using n different standard deviations. Such a situation.1. scalar endogenous parameter. The mutants are symmetrically distributed around g′ .n − 1..25 on page 786 again provides a .5 illustrates how this can be done and Listing 56. σ 2 is drawn and added to g′ [i]. The distribution is isotropic. we can scale the cloud of points along the axes of the coordinate system as can be seen in part (C) of Figure 30.4 and implemented in Listing 56. Usually. e. a random value normally distributed with N µ. this operation creates an unbiased distribution of possible genotypes g. The drawback is that the resulting genotype distribution is always isotropic and cannot reflect different scales along the axes. As illustrated in part (B) of Figure 30. In Example E28.1 is the result of an experiment where we illustrate the distribution of a sampled point cloud. e. We illustrate the two-dimensional case where the parents which have been selected based on their fitness reside in a (rotated) ellipse.

1) // Line 3 ≡ g[i] ←− randomNorm g′ [i]. Even though the problem of axis scale has been resolved. . σ [i]2 return g 1 2 3 4 suitable implementation of the idea.1 part (C).n − 1.M (g′ . Instead of sampling all dimensions from the same normal distribution N 0. randomNorm(0. 1) . j: a counter variable Data: t: a temporary vector Output: g ∈ Rn : the mutated version of g′ begin for i ←− 0 up to n − 1 do t[i] ←− randomNorm(0.5: g ←− mutateG.6 shows how to mutate vectors sespel′ by adding numbers which Algorithm 30. 1) .3 n-dimensional Multivariate Normal Distribution Algorithm 30. .1. 30. part (D). 1) 1 2 3 4 5 6 7 8 g ←− g′ // g ←− g + Mt for i ←− 0 up to n − 1 do for j ←− 0 up to n − 1 do g[i] ←− g[i] + M[i. w = M) Input: g′ ∈ Rn : the input vector Input: M ∈ Rn×n : an (orthogonal) rotation matrix Data: i.4. We are thus able to. σ 2 . randomNorm(0.. 1)) . j] ∗ t[j] return g follow normally distributions with rotated iso-density ellipsoids. The advantage of this sampling method is imminent: we can create points which are similarly distributed than a selected cloud of parents and thus better represent the information gathered so far. w = σ) 30 EVOLUTION STRATEGIES Input: g′ ∈ Rn : the input vector Input: σ ∈ Rn : the standard deviation of the mutation Data: i: a counter variable Output: g ∈ Rn : the mutated version of g′ begin for i ←− 0 up to n − 1 do g[i] ←− g′ [i] + σ [i]randomNorm(0. For Figure 30. . The result of the mul′ tiplication Mt is then shifted by g . .σ (g′ . we sampled points with the standard deviations of the parent points along the axes. for instance. σ [i]) with i ∈ 0. we now use n distinct univariate normal distributions N (0. With a matrix M we scale and rotate standard-normal distributed random vectors t = T (randomNorm(0.6: g ←− mutateG.366 Algorithm 30. sample points similar distributions than the selected parents as sketched in Figure 30. we still cannot sample an arbitrarily rotated cloud of points – the main axes of the iso-density ellipsoids of the distribution are still always parallel to the main axes of the coordinate system. . The endogenous parameter w = σ now has n dimensions as well.1.
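The following Java sketch illustrates both the axis-parallel mutation with a standard deviation vector σ (Algorithm 30.5) and the matrix-based mutation g′ + Mt of Algorithm 30.6. All names are illustrative; the code is a simplified sketch, not the implementation referenced in the text.

import java.util.Random;

// Illustrative sketches of the mutation operators of Algorithm 30.5 (one
// standard deviation per dimension) and Algorithm 30.6 (multiplication of a
// standard-normal vector t by a matrix M, which yields a multivariate normal
// distribution with covariance M M^T around the genotype).
public class CorrelatedMutation {

  // axis-parallel mutation with a standard deviation vector sigma
  public static double[] mutatePerAxis(final double[] g, final double[] sigma, final Random random) {
    final double[] result = new double[g.length];
    for (int i = 0; i < g.length; i++) {
      result[i] = g[i] + (sigma[i] * random.nextGaussian());
    }
    return result;
  }

  // mutation with a rotation/scaling matrix M: return g + M t with t ~ N(0, I)
  public static double[] mutateWithMatrix(final double[] g, final double[][] m, final Random random) {
    final int n = g.length;
    final double[] t = new double[n];
    for (int i = 0; i < n; i++) {
      t[i] = random.nextGaussian();
    }
    final double[] result = g.clone();
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        result[i] += m[i][j] * t[j];
      }
    }
    return result;
  }
}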

. The distribution chosen for this purpose depends on endogenous strategy parameters such as σ. The more information it acquires by evaluating the candidate solutions. the rotation and scaling matrix M to be the matrix of all the normalized and scaled (n dimensional) Eigenvectors Λi (C) : i ∈ 1. By multiplying a vector of n independent standard normal distributions with the matrix of normalized Eigenvectors. obtaining M is another interesting question. One of the general. Otherwise. If this last mutation operator is indeed applied with a matrix M directly derived from the covariance matrix C estimated from the parent individuals.g [j] (30. PARAMETER ADAPTATION 367 There are also three disadvantages: (a) First.. M[i. the dispersion of the mutation results should decrease. e. it is rotated to be aligned to the covariance matrix C..5 on page 442. by ||Λi (C)||2 . 3} but becomes computationally intense for higher dimensions [1097].4.1) C[i. as can easily be seen in Algorithm 30.5. computing the Eigenvectors and Eigenvalues is only easy for dimensions n ∈ {1.n − 1 can be used.1: p.6. the size of the endogenous Parameter w increases to n(n+1) if G ⊆ Rn : the main diagonal of M has n elements. [302] .30. however. need to be stored only once. We therefore utilize Definition D53. At the beginning.41 on page 664. i. j] = ∀p∈P We then can set. for example. Thus.g [j] − p. These vectors are normalized by dividing them by their length.5 Parameter Adaptation In Section 30. The matrix elements 2 above and below the diagonal are the same and thus. By scaling the Eigenvectors with the square roots λi (C) of the corresponding Eigenvalues. we discussed different methods to mutation genotypes g′ . The matrix M can be constructed from the covariance matrix C of the individuals selected for becoming parents P . j] = C[i.g [i] − p. As it turns out. basic principles of metaheuristics is that the optimization process should converge to areas of the search space which contain highly-fit individuals. they are scaled with the square roots of the corresponding Eigenvalues λi (C). the smaller the interesting area should become. it makes sense to initially sample a wide area around the intermediate genotypes g′ with mutation. (b) The second problem is that sampling now has a complexity of O n2 . there must be a change in these parameters as well. j ∈ 0. the best a metaheuristic algorithm can do is to uniformly sample the search space. This change in dispersion can only be achieved by a change in the parameters controlling it – the endogenous strategy parameters of the ES. we stretch the standard normal distributions from spheres to ellipsoids with the same shape as described by the covariance matrix. i. Step by step. e. 30. the Evolution Strategy basically becomes a real-valued Estimation of Distribution Algorithm as introduced in Section 34.g [i] |P | p. as the information gathered increases. and M. From this perspective. computing M has a higher complexity than using it for sampling. M= λi (C) Λi (C) ||Λi (C)||2 for i from 1 to n (30. Then. 2. The mutation operators randomly picked points in the search space which are distributed around g′ .n of the covariance matrix C. by applying Equation 30.. Such a change of the parameter values is called adaption. σ. j] ∀i.2) √ If C is a diagonal matrix. (c) Finally.
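As a small, hedged sketch of the first step of this construction, the following Java code estimates the covariance matrix C of Equation 30.1 from the genotypes of the selected parents. Deriving M according to Equation 30.2 additionally requires the eigenvectors and eigenvalues of C, which are best obtained from a linear algebra library; the names below are ours.

import java.util.List;

// Sketch: estimate the covariance matrix C (Equation 30.1) from the genotype
// vectors of the selected parents P. The mean vector is computed first, then
// the (biased, 1/|P|) sample covariance is accumulated.
public class CovarianceEstimator {
  public static double[][] covariance(final List<double[]> parents) {
    final int n = parents.get(0).length;
    final double[] mean = new double[n];
    for (final double[] p : parents) {
      for (int i = 0; i < n; i++) {
        mean[i] += p[i];
      }
    }
    for (int i = 0; i < n; i++) {
      mean[i] /= parents.size();
    }
    final double[][] c = new double[n][n];
    for (final double[] p : parents) {
      for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
          c[i][j] += (p[i] - mean[i]) * (p[j] - mean[j]);
        }
      }
    }
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        c[i][j] /= parents.size();
      }
    }
    return c;
  }
}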

5. for 0 < σ < +∞. tune the mutation strength σ in such a way that the (estimated) success rate PS is about 1/5 [302]. e.. let us discuss the behavior of these parameters on the Sphere function fRSO1 discussed in Section 50.1 (1/5th Rule). PS and ϕ. depend on the value of σ. i. σ is the parameter we can tune in order to achieve this goal.5. i. PS can be estimated as feedback from the optimizer: the estimation of PS is the number of times a mutated individual replaced its parent in the population divided the total number of performed mutations. Since we consider the Sphere function as benchmark. As a compromise.5) lim ϕ = 0 If σ → +∞. The goal of adaptation is to have a high expected progress towards the global optimum.5 σ→0 (30. we can refer to it as mutation strength. on the other hand.3) The isotropic mutation operator samples points in a sphere around the intermediate genotype g′ . they conceived the 1/5th Rule: Definition D30. Since PS is zero.28] for different mathematical benchmark functions.5. 0.4. σ→+∞ lim PS = 0 lim ϕ = 0 (30. to maximize ϕ. the expected distance gain towards the optimum also disappears. If ϕ is large.6) (30.1. the Evolution Strategy will converge quickly to the global optimum.7) σ→+∞ In between these two extreme cases.1 on page 581 for n → ∞. All points which have a lower Euclidean distance to the coordinate center 0 than g′ have a better objective value and all points with larger distance have a worse objective value. Beyer [300] and Rechenberg [2278] considered two key performance metrics in their works on (1 + 1)-ESs: the success probability PS of the mutation operation and progress rate ϕ describing the expected distance gain towards the optimum. the success probability PS becomes zero since mutation will likely jump over the iso-fitness sphere of g′ even if it samples towards the right direction.1. Like Beyer [300] and Rechenberg [2278]. This means if success probability PS is larger than 0. we need to increase the mutation strength σ in order to maintain a near-optimal progress towards the optimum.3. the chance PS of sampling a point g with a smaller norm ||g||2 than g′ is 0.2.18. Beyer and Schwefel [302] and Rechenberg [2278] found theoretically that ϕ becomes maximal for values of PS in [0. g′ will always lie on a iso-fitness sphere. σ→0 lim PS = 0. on the . If σ is very small.. PS is monotonously decreasing with rising σ from limσ→0 PS = 0.4) (30.1 The 1/5th Rule 30 EVOLUTION STRATEGIES One of the most well-known adaptation rules for Evolution Strategies is the 1/5th Rule which adapts the single parameter σ of the isotropic mutation operator introduced in Section 30. n fRSO1 (x = g) = i=1 xi 2 with G = X ⊆ Rn (30. the expected distance ϕ for moving towards the optimum also becomes zero. If. In order to obtain nearly optimal (local) performance of the (1 + 1)-ES with isotropic mutation in real-valued search spaces.5 to limσ→+∞ PS = 0. ϕ begins to decline. Since σ defines the dispersion of the mutation results. e. lies an area where both ϕ > 0 and 0 < PS < 0.368 30. However.1. Both. For larger as well as for lower values of PS .

2 then σ ←− σ/a s ←− 0 pn . 1): the adaptation value Input: σ0 > 0: the initial mutation strength Data: t: the generation counter Data: p⋆ : the current individual Data: pn : a single offspring Data: PS ∈ [0.30.x ←− gpm(pn .σ (p⋆ σ) .g p⋆ ←− gpm(p⋆ .x) < f(p⋆ then .g ←− mutateG.7: x ←− (1+1)-ES1/5 (f.x . t ←− t + 1 return p⋆ . PARAMETER ADAPTATION 369 Algorithm 30.5.x) p⋆ ←− pn s ←− s + 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 if (t mod L) = 0 then s PS ←− L if PS < 0.x .g.g) pn ←− p⋆ continue ←− true while continue do pn . σ0 ) Input: f: the objective/fitness function Input: L > 1: the generation limit for adaptation Input: a ∈ (0. 1]: the estimated success probability of mutation Data: continue: a Boolean variable used for termination detection Output: x: the best element found begin t ←− 1 s ←− 0 σ ←− σ0 p⋆ ←− create .2 then σ ←− σ ∗ a else if PS > 0.g) if terminationCriterion then continue ←− false else if f(pn . L. a.
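To make the interplay of mutation, acceptance, and σ-adaptation concrete, here is a minimal Java sketch of a (1 + 1)-ES with the 1/5th rule applied to the sphere function. The constants chosen here (a = 0.9, L = n, the start interval) are illustrative values within the ranges discussed in the text, and all names are ours.

import java.util.Random;

// Sketch of a (1+1)-ES with the 1/5th rule in the spirit of Algorithm 30.7,
// applied to the sphere function f(x) = sum(x_i^2). Every L generations the
// estimated success rate is compared with 1/5 and sigma is scaled accordingly.
public class OnePlusOneES {
  public static double[] optimize(final int n, final int maxSteps, final Random random) {
    final int L = n;       // adaptation interval, as suggested for larger n
    final double a = 0.9;  // adaptation factor, 0.85 < a < 1
    double sigma = 1.0;
    int successes = 0;

    double[] x = new double[n];
    for (int i = 0; i < n; i++) {
      x[i] = (random.nextDouble() * 10.0) - 5.0; // random start in [-5, 5]^n
    }
    double fx = sphere(x);

    for (int t = 1; t <= maxSteps; t++) {
      final double[] y = new double[n];
      for (int i = 0; i < n; i++) {
        y[i] = x[i] + (sigma * random.nextGaussian()); // isotropic mutation
      }
      final double fy = sphere(y);
      if (fy < fx) { // the offspring replaces the parent only if it is better
        x = y;
        fx = fy;
        successes++;
      }
      if ((t % L) == 0) { // 1/5th rule: adapt the mutation strength
        final double ps = successes / (double) L;
        if (ps > 0.2) {
          sigma /= a; // too many successes: increase the step size
        } else if (ps < 0.2) {
          sigma *= a; // too few successes: decrease the step size
        }
        successes = 0;
      }
    }
    return x;
  }

  static double sphere(final double[] x) {
    double s = 0;
    for (final double v : x) {
      s += v * v;
    }
    return s;
  }
}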

1 on page 361.5. we gave little evidence why this is actually is the case. If we look back at the 1/5th rule.1 as well as in the beginning of this section.x). it will perish and doom its genotype p. Finally.2. They are part of the individual p whose genotype p.4. If it falls below 0. it adapts only one single strategy parameter σ and thus cannot support more sophisticated mutation operators such as those discussed in Section 30. Schwefel [2435] suggests to keep a in 0. It is suitable for (1 + 1)-Evolution Strategies and requires a certain level of causality in the search space. Beyer and Schwefel [302] propose that if the dimension n of the search space G ⊆ Rn is sufficiently large (n ≥ 30). and 3. setting L = n would be a good choice. 2. it will take the endogenous parameters p. this adaptation mechanism may be misled by the information gained from sampling.x has bad characteristics. what implications this design has. the control parameters are considered to be endogenous information of each individual in the population. It does not allow the optimization algorithm to trace down different local or global optima with potentially different topology.1. the step size is too large and σ must be reduced. Beyer and Schwefel [302] hence propose to use Algorithm 30.7 we present a (1 + 1)-Evolution Strategy (1+1)-ES which implements the 1/5th Rule. it only covers one special case.2. i.g has a corresponding phenotype p.w to peril as well. If the fitness landscape is too rugged (see Chapter 14 on page 161). undergo selection together with the individuals owning them. if p. especially if many parents are used. As can be seen in the basic Evolution Strategy algorithm given in Algorithm 30.g and the control parameters p.w are responsible for the final individual creation step (mutation). the endogenous strategy parameters p.) In order to find a more general way to perform self-adaptation. but could be done with an extension of the rule to (µ + λ)-Evolution Strategies. we stated that the some of the control parameters of Evolution Strategies. the parameters are now local.370 30 EVOLUTION STRATEGIES other hand.5.2 on page 362 and Section 29.3 on page 363 as well.x = gpm(p.2 Endogenous Parameters In the introductory Section 30. So far.w which lead to its creation with it.g) with a good objective value f(p. e. the fraction of accepted mutations falls below 0.1 Recombining the Mutation Strength (Vectors) Beyer and Schwefel [302] state that mutating the strategy parameters can lead to fluctuations which slow down the optimization process. e. such as the mutation strength.. If the genotype p.1. 30. In Algorithm 30.2. First. In this case.3.2. 1). it has a high chance of surviving the selection step. i. if the mutation steps are too long.4.3. reduced. The similarity-preserving effect of intermediate recombination (see Section 30. 3.. For recombining the mutation strengths.g they helped building. (Which is not possible with a (1 + 1)-ES anyway. 1. are endogenous. we find three drawbacks. the reverse operation is performed and the mutation strength is divided by a. and what it is actually good for.6 on page 346) can work against this phenomenon. Instead of a global parameter set. Second. The algorithm estimates the success probability PS every L generations. are processed by reproduction operations.5. the adapted parameter holds for the complete population. If the mutations are too weak and PS > 0. the mutation strength σ is multiplied with a factor a ∈ (0. 2. . On the other hand. 
This step leads to three decisive changes in the adaptation process: 1. 30.85 < a < 1 for good results. Mutation steps which were involved into the successful production of good offspring will thus be selected and may lead to the creation of more good candidate solutions.2 and 30.

√ Beyer and Schwefel [302] that it should√ to τ = 1/ n for problems with G ⊆ Rn . all values in R+ \ {0} are adjacent under eN (0.1 on page 361 according to Equation 30. 1) < 0.8. we first compute a general scaling factor ν which is log-normally distributed. nearly optimal behavior of the Evolution Strategy can be achieved if α is set according to Equation 30.5.1) 1 τ ∝ √ N (30. In Algorithm 30.10 on page 85 to the space of endogenous information W. σ ′ may be the result from recombination of several parents. Only the direct predecessor and successor of an element in this series can be found in one step by mutation.2 Mutating the Mutation Strength σ 371 The mutation of the single strategy parameter σ.11) The exogenous parameter τ is used to scale the standard deviation of the random numbers.1) ∗ σ [i]′ return σ Input: σ ′ ∈ Rn : the old mutation strength vector Data: i: a counter variable Output: σ ∈ Rn : the new mutation strength vector strengths.1) σ ←− 0 for i ←− 0 up to n − 1 do σ [i] ←− ν ∗ eτ randomNorm(0. We can then define the operation mutateW.9) α = 1+τ According to Beyer [297].σ (for the recombined mutation strength σ ′ ) given in Line 24 of Algorithm 30.3 Mutating the Strength Vector σ Algorithm 30. The interesting question is how to create this number.8 presents a way to mutate vectors σ ′ which represent axis-parallel mutation Algorithm 30.10) (30. the mutation strength. only the values σ0 ∗ αi : i ∈ Z can be reached at all if the mutation strength is initialized with σ0 .8) (30.5.8 is that it creates an extremely limited neighborhood of σ-values if we extend the concept of adjacency from Definition D4.30.2. mutateW. The ε = 0 adjacency for the strategy parameter can be extended to the whole R+ by setting the multiplication parameter ν to e raised to the power of a normally distributed number.8.5 σ′ α otherwise (30. 1 Rechenberg’s Two-Point Rule [2279] sets ν = α with 50% probability and to α otherwise.2.σ (σ ′ ) = ασ ′ if randomUni[0. mutateW. If the be fitness landscape is multimodal. α > 1 is a fixed exogenous strategy parameter.σ (σ ′ ) = σ ′ ∗ eτ randomNorm(0. 30.8: σ ←− mutateW.9. The interesting thing about Equation 30. the mutation will become more volatile. Since all real numbers are ε = 0-adjacent under a normal distribution. Matter of fact. τ = 1/ 2n can be used. If 0 < ν < 1. can be achieved by multiplying it a positive number ν. the strength decreases and if 1 < ν.11. PARAMETER ADAPTATION 30.σ (σ) ′ 1 2 3 4 5 6 begin ν ←− eτ0 randomNorm(0. Each field i of .5. The meaning of τ will be discussed below and values for it will be given in Equation 30.1) .

. Schwefel [2437] further suggests to use c = 1.372 30 EVOLUTION STRATEGIES the new mutation strength vector σ then is the old value σ ′ [i] multiplied with another lognormally distributed random number and ν. For a (10. the settings in Equation 30. This idea has exemplarily been implemented in Listing 56. 100)-Evolution Strategy.12 and Equation 30.12) (30.13) For σ ∈ Rn .26 on page 787. c τ0 = √ 2n c τ = √ 2 n (30.13 are recommended by Beyer and Schwefel [302].

1586. 1186. 1183. The Theory of Evolution Strategies [300] 9.5 on page 317. 2434.6. 2959.1: Applications and Examples of Evolution Strategies. 1135. 778. 2163. EvoWorkshops: Real-World Applications of Evolutionary Computing . Area References Chemistry [1573] Combinatorial Problems [292. 3042. 1617. Genetic Algorithms [167] 8. 2279] Prediction [1044] Software [2279] 30.30. 2433. Numerical Optimization of Computer Models [2436] 30.3 Conferences and Workshops Table 30. 299. 2683. 1894. 1617. Evolutionsstrategie ’94 [2279] 6. GECCO: Genetic and Evolutionary Computation Conference See Table 28. 520. Evolution and Optimum Seeking [2437] 7. 778.6. 2604] Computer Graphics [3087] Digital Technology [3042] Distributed Algorithms and Sys. 1187. 1881. 2279.6.2: Conferences and Workshops on Evolution Strategies. 1182. 2937.5 on page 317. Genetic Algorithms and Evolution Strategy in Engineering and Computer Science: Recent Advances and Industrial Applications [2163] 3. 302. Self-Adaptive Heuristics for Evolutionary Computation [1617] 4.2 Books 1. 2232.6 30. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution [2278] 5. 1436. Cybernetic Solution Path of an Experimental Problem [2277] 10.1 General Information on Evolution Strategies Applications and Examples Table 30. 2936. 3087] Graph Theory [2683] Logistics [292. Evolutionary Programming. Handbook of Evolutionary Computation [171] 2. 3087] Function Optimization [169. 1586. GENERAL INFORMATION ON EVOLUTION STRATEGIES 373 30.6. 2266.[2683] tems Engineering [303. 2279] Mathematics [227. Evolutionary Algorithms in Theory and Practice: Evolution Strategies. CEC: IEEE Conference on Evolutionary Computation See Table 28. 299.

5 on page 318. BIONETICS: International Conference on Bio-Inspired Models of Network.374 See Table 28. Information.2 on page 131.5 on page 320. SEAL: International Conference on Simulated Evolution and Learning See Table 10. PPSN: Conference on Parallel Problem Solving from Nature See Table 28. WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10.2 on page 354. 30 EVOLUTION STRATEGIES EMO: International Conference on Evolutionary Multi-Criterion Optimization See Table 28. WCCI: IEEE World Congress on Computational Intelligence See Table 28. SMC: International Conference on Systems. .2 on page 131.5 on page 320.2 on page 131. MICAI: Mexican International Conference on Artificial Intelligence See Table 10. NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10.5 on page 319. ICNC: International Conference on Advances in Natural Computation See Table 28.2 on page 130. FEA: International Workshop on Frontiers in Evolutionary Algorithms See Table 28.5 on page 319. ICANNGA: International Conference on Adaptive and Natural Computing Algorithms See Table 28.5 on page 319.2 on page 130.2 on page 132.2 on page 128.5 on page 318.5 on page 320. BIOMA: International Conference on Bioinspired Optimization Methods and their Applications See Table 28.5 on page 318. CIS: International Conference on Computational Intelligence and Security See Table 28. Man. CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10. HIS: International Conference on Hybrid Intelligent Systems See Table 10.5 on page 320.2 on page 131. MIC: Metaheuristics International Conference See Table 25. AE/EA: Artificial Evolution See Table 28.2 on page 128. EUROGEN: Evolutionary Methods for Design Optimization and Control See Table 29. and Computing Systems See Table 10. ICCI: IEEE International Conference on Cognitive Informatics See Table 10.5 on page 319.2 on page 227. Control and Automation See Table 28. and Cybernetics See Table 10. CIMCA: International Conference on Computational Intelligence for Modelling.

Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10.2 on page 134. IWNICA: International Workshop on Nature Inspired Computation and Applications See Table 28. Organization and Technology See Table 10.2 on page 134.2 on page 132. Logic Programming. and Computational Algebraic Geometry See Table 10.30. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems See Table 28.2 on page 132. ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10. WOPPLOT: Workshop on Parallel Processing: Logic.5 on page 321. GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation See Table 28. GENERAL INFORMATION ON EVOLUTION STRATEGIES EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10. Progress in Evolutionary Computation: Workshops on Evolutionary Computation See Table 28.2 on page 133.5 on page 321.5 on page 321.6. LION: Learning and Intelligent OptimizatioN See Table 10.5 on page 322. EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation See Table 28. Optimization. EvoRobots: International Symposium on Evolutionary Robotics .5 on page 321.5 on page 321. GEM: International Conference on Genetic and Evolutionary Methods See Table 28.2 on page 133.2 on page 133. FOCI: IEEE Symposium on Foundations of Computational Intelligence See Table 28. 375 EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization See Table 28.2 on page 133.5 on page 322. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10. LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10.5 on page 321. WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms See Table 28. IJCCI: International Conference on Computational Intelligence See Table 28.5 on page 321. SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10. EngOpt: International Conference on Engineering Optimization See Table 10. GICOLAG: Global Optimization – Integrating Convexity.2 on page 133. NaBIC: World Congress on Nature and Biologically Inspired Computing See Table 28.5 on page 321.5 on page 321.2 on page 134.

6.5 on page 322. SSCI: IEEE Symposium Series on Computational Intelligence See Table 28. IWACI: International Workshop on Advanced Computational Intelligence See Table 28.5 on page 322. SEMCCO: International Conference on Swarm. 30 EVOLUTION STRATEGIES ICEC: International Conference on Evolutionary Computation See Table 28.2 on page 135.5 on page 322. SEA: International Symposium on Experimental Algorithms See Table 10. Evolutionary and Memetic Computing See Table 28.376 See Table 28. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2.5 on page 322. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands .4 Journals 1. 30.5 on page 322. META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25.2 on page 228.

2. [10 points] 88.7 on page 369 in a programming language of your choice. and ρ of an Evolution Strategy relate to the population size ps and the mating pool size mps used in general Evolutionary Algorithms? You may use Section 28. Such permutations can be created and optimized in a way very similar to what we already did with the bin packing problem a few times.30.2 on page 329.3.7 on page 349 to the Traveling Salesman Problem instance att48 taken from [2288] and listed in Paragraph 57.4 on page 337) in Genetic Algorithms? What are the similarities? What are the differences? [10 points] 89. you can find the necessary code for that. Apply an Evolution Strategy using the Random Keys encoding given in Section 29.1 on page 362) relate to the uniform crossover operator (see Section 29.1.1.1 on page 361? What is the general difference between endogenous and exogenous parameters and the underlying ideas of adaptation? [10 points] 87. How does the Evolution Strategy perform in comparison with the other algorithms and previous experimental results? What conclusions can we draw about the utility of the Random Keys genotype-phenotype mapping? [10 points] 90. Additionally to the Evolution Strategy. Perform the same experiments as in Task 67 on page 238 (and 70 and Task 81 on page 357) with a Evolution Strategy using the Random Keys encoding given in Section 29.1. How does the Evolution Strategy perform in comparison with the Hill Climber on this problem? What conclusions can we draw about the utility of the Random Keys genotype-phenotype mapping? How close do the algorithms come to the optimal. also apply a simple Hill Climber to the same problem.3. In Section 57.7 on page 369 differ from the method applied in Algorithm 30.7 on page 349. a solution to an n-city Traveling Salesman Problem can be represented by a permutation of length n − 1. GENERAL INFORMATION ON EVOLUTION STRATEGIES Tasks T30 377 85. How does the parameter adaptation method in Algorithm 30. [40 points] 86. shortest path for this problem which has the length 10 628? [50 points] . µ. Only the objective function needs to be re-defined.3.4 on page 265 as a reference. Implement the simple Evolution Strategy Algorithm 30.2. As already discussed in Example E2.1.2 on page 45.1. using a method similar to what we did in Task 67 on page 238 and a representation according to Section 29. How do the parameters λ.1 on page 887.6.4. How does the n-ary domination recombination algorithm in Evolution Strategies (see Section 30.2.

.

algorithms. it can also be defined as the set of all Evolutionary Algorithms that breed programs. The well-known input-processing-output model from computer science states that a running instance of a program uses its input information to compute and return output data. The goal then is to find a program that connects them or that exhibits some kind of desired behavior according to the specified situations. 31. Friedberg did not use an evolutionary. The program was represented as a sequence of instructions for a theoretical computer called Herman [995. 996].Chapter 31 Genetic Programming 31. and similar constructs.1. usually some inputs and corresponding output data samples are known. In 1957. Friedberg [995] left the first footprints in this area by using a learning algorithm to stepwise improve a program.2. population-based approach for searching the programs – the idea of Evolutionary Computation had not been developed yet at that time. In this chapter. (a) First. In Genetic Programming. samples are known input Process (running Program) output to be found with genetic programming Figure 31. 1602. or simulated. 2199] has two possible meanings. as you can see 1 http://en.org/wiki/Genetic_programming [accessed 2007-07-03] .1 History The history of Genetic Programming [102] goes back to the early days of computer science as sketched in Figure 31. it is often used to subsume all Evolutionary Algorithms that have tree data structures as genotypes. as sketched in Figure 31. can be produced. (b) Second.1: Genetic Programming in the context of the IPO model.1 Introduction The term Genetic Programming1 (GP) [1220. we focus on the latter definition which still includes discussing tree-shaped genomes.wikipedia.1.

The mid-1980s were a very productive period for the development of Genetic Programming. Genetic Programming became fully accepted at the end of this productive decade mainly because of the work of Koza [1590. New results were reported by Smith [2536] in his PhD thesis in 1980.1. 1993) 31 GENETIC PROGRAMMING Standard GP Strongly-typed GP Evolutionary Programming (Fogel et al. the computational capacity of the computers of that era was very limited and would not have allowed for experiments an program synthesis driven by a (population-based) Evolutionary Algorithm. 2007) First Steps (Friedberg. 1995) Cartesian GP (Poli. The Evolutionary Programming approach for evolving Finite State Machine by Fogel et al.2: Some important works on Genetic Programming illustrated on a time line. Cramer [652] applied a Genetic Algorithm in order to evolve a program written in a subset of the programming language PL in 1985. Hicklin [1230] and Fujuki [999] implemented reproduction operations for manipulating the if-then clauses of LISP programs consisting of single COND-statements. 2001) Grammatical Evolution (Ryan et al. 1996) (Miller/Thomson. he suggested that effort could be spent into allowing the (checkers) program to learn scoring polynomials – an activity which would be similar to symbolic regression. 975]. 1958. the Artificial Ant . 1596]. 1995) TAG3P (Nguyen. In order to build predictors.1 on page 325. such as learning of Boolean functions [1592. 2001) String-to-Tree Mappings (Cramer. Fujiko and Dickinson [998] evolved strategies for playing the iterated prisoner’s dilemma game. Forsyth [972] evolved trees denoting fully bracketed Boolean expressions for classification problems in 1981 [972. 1994) (Brameier/Banzhaf. In the future development section of his 1959 paper [2383]. 973. 2004) LOGENPRO PDGP (Wong/Leung.. he could not report any progress in this issue. 1988) (Montana. 1591].. We discuss this approach in ?? on page ??. 1995) Grammars (Antonisse. 1998) 1960 1970 1980 1985 1990 1995 2000 2005 Figure 31. Samuel applied machine learning to the game of checkers and. Samuel 1959) Linear GP (Nordin. Bickel and Bickel [309] evolved sets of rules which were represented as trees using tree-based mutation crossover operators. in Section 29. At the same time. the next generation of scientists began to look for ways to evolve programs. This GA used a string of integers as genome and employed a genotype-phenotype mapping that recursively transformed them into program trees. Fourteen years later. He re-implemented his approach in Prolog at the TU Munich in 1987 [790. With this approach. created the world’s first self-learning program. Yet. [945] (see also Chapter 32 on page 413) dates back to 1966. by doing so. different forms of mutation (but no crossover) were used for creating offspring from successful individuals. Also. the undergraduate student Schmidhuber [2420] also used a Genetic Algorithm to evolve programs at the Siemens AG. 1985) Gene Expression Programming (Ferreira. 2420]. Around the same time. 1966) RBGP (Weise. 1981) (Koza.380 Tree-based (Forsyth. Koza studied many benchmark applications of Genetic Programming. 1990) GADS 1/2 (Paterson/Livesey. in his 1967 follow-up work [2385].

Koza studied many benchmark applications of Genetic Programming, such as the learning of Boolean functions [1592, 1593], the Artificial Ant problem (as discussed in ?? on page ??) [1594, 1595, 1602], and symbolic regression [1596, 1602], a method for obtaining mathematical expressions that match given data samples (see Section 49.1 on page 531). Koza formalized (and patented [1590, 1598]) the idea of employing genomes purely based on tree data structures rather than string chromosomes as used in Genetic Algorithms. We will discuss this form of Genetic Programming in-depth in Section 31.3.

In the first half of the 1990s, researchers such as Banzhaf [205], Perkis [2164], Openshaw and Turton [2085, 2086], and Crepeau [653], on the other hand, began to evolve programs in a linear shape. Similar to programs written in machine code, the string genomes used in linear Genetic Programming (LGP) consist of single, simple commands. Each gene identifies an instruction and its parameters. Linear Genetic Programming is discussed in Section 31.4.

Also in the first half of the 1990s, Genetic Programming systems which are driven by a given grammar were developed. Here, the candidate solutions are sentences in a language defined by, for example, an EBNF. Antonisse [113], Stefanski [2605], Roston [2336], and Mizoguchi et al. [1920] were the first ones to delve into grammar-guided Genetic Programming (GGGP, G3P), which is outlined in Section 31.5.

1591. and 31. 1789. 2566]. 1195. 2239. 2560. 869. 2681. 1591. 581. 1769. 1602. 2500–2502. 1909. 2854. 973. 2239. 1317.4.382 31 GENETIC PROGRAMMING 31. see also Tables 31. 455. 1596. 2568] Prediction [351. 1600.3 Mathematics [149. 869. 1665. 1591. 1785. see also Table 31. 2904. 99. 1423.3. 1721. 1120. 2997]. 1615. 1855. 1157. 2500. 1607.5 Multi-Agent Systems [98. 1120.3 and 31. see also Tables 31. see also Tables 31.3 Software [652. 958. 271. 543. 1022. 1542. 1592. 1658. 481. 2954. 2501.4 . see also Table 31. see also Table 31. 1674. 2338. 1909. 2513. 1456. 1596. 2729. tems 1320. 2567. 1805.4. 1317. 798. see also Tables 31. 1456.5 Games [998. 2377. 1401. 1685. 1591. 972. 704. 2854]. 1478. 1320.1 General Information on Genetic Programming Applications and Examples Table 31. 999. 542. 3007]. 2514. 2904] Decision Making [2728] Digital Technology [336. 2889. 1985. 1856. 1195. 772.4. 1613–1615. 1789. 1401. 1932. 798. see also Table 31. 846. 1230. 704. 615. 658.3 Combinatorial Problems [2338] Computer Graphics [97.[52. 1721. 1600.4 Cooperation and Teamwork [98. 581] Physics [487. 1772. 581. 351. 1021. 2731]. 2897. 1401. 455. 1726. 1120. 658. 351]. 2500. 790. 3009].2 31. 1805. 1785. 31. 2377. 1593. 76. 994. 3009]. 924. 1787. 2109. 2513. 2560. 2819. 1721. 2728. 1320. 1894. see also Table 31. 1787. 1514. 2728. 1805. 1478. 2501. 998. 1456. 1726. 3080]. 76. 1602. 581.3 Healthcare [350. 2090.5 Distributed Algorithms and Sys. 2729–2731. 3007. 1785. 1684.1: Applications and Examples of Genetic Programming. 799. see also Table 31.3 Economics and Finances [455. 1867. 2239. 1909.3. 1021. 269. and 31. 1909. 652.4 Databases [993. 2514. 2338.4 Engineering [52. 303. 270. 704. 2532. 1623. 31. 614. 337. 1805. 2502] Data Compression [336]. 2855. 2970]. 2996.2. 1453. 1785. 2731.3 and 31. see also Tables 31. 846. 1320. 1542. 1216. 2090. 1234.3 Data Mining [76. 31. 1602. 1449.3 and 31. see also Table 31. 1787. 2954]. 2240] Multiplayer Games [98. 1120. 1597. 869. 1021. 1215. 2513. 2728]. 1665. 2646]. 1317. see also Table 31. 1422.3. 1542. 799. 98. and 31. 455. 2905. 993. 1769. 1022.3 Security [76. 2729. 1215. 3007. 350. 1608. 2514]. Area Art Chemistry References [1234. 1317. 2251. 2566. 998. 2682. see also Tables 31. see also Table 31. 1456. 581. 2328] Graph Theory [52. 2729–2731.3 Control [2541. 1665. 2746] [487. 972. 1932. 1592. 337. 150. 870.

16. Hungary. Italy. UK.3 383 31. Turkey.5 on page 317. see also Table 31. 4. see [1517] 2003/04: Colchester. see [856] e 2006/04: Budapest. GENERAL INFORMATION ON GENETIC PROGRAMMING Sorting Testing Wireless Communication [1610]. EuroGP: European Workshop on Genetic Programming History: 2011/04: Torino. Portugal. 14. Italy. see [2080] 2007/04: Val`ncia.3 [1615. see [1518] 2004/04: Coimbra. and Chaotic Programming: The Sixth-Generation [2559] Genetic Programming: An Introduction – On the Automatic Evolution of Computer Programs and Its Applications [209] Design by Evolution: Advances in Evolutionary Design [1234] Linear Genetic Programming [394] A Field Guide to Genetic Programming [2199] Representations for Genetic and Evolutionary Algorithms [2338] Automatic Quantum Computer Programming – A Genetic Programming Approach [2568] Data Mining and Knowledge Discovery with Evolutionary Algorithms [994] Evolution¨re Algorithmen [2879] a Foundations of Genetic Programming [1673] Genetic Programming II: Automatic Discovery of Reusable Programs [1599] 31. see [977] 2001/04: Lake Como.2 1. see [886] 2009/04: T¨ bingen. Essex. 11. 3. 8.3 Conferences and Workshops Table 31. Italy. Spain.2: Conferences and Workshops on Genetic Programming.31.5 on page 317. 10. 1769.5 on page 318. Switzerland. Ireland. see also Table 31. 5.5 see Table 31. Germany. 12.2.2. see [1903] . see [607] 2005/03: Lausanne.2. 6. 13. 15. GECCO: Genetic and Evolutionary Computation Conference See Table 28. 9. CEC: IEEE Conference on Evolutionary Computation See Table 28. Genetic. 7. Milan. Books Advances in Genetic Programming II [106] Handbook of Evolutionary Computation [171] Advances in Genetic Programming [2569] Genetic Algorithms and Genetic Programming at Stanford [1601] Genetic Programming IV: Routine Human-Competitive Machine Intelligence [1615] Dynamic. 2502]. 2. see [2492] 2010/04: Istanbul. see [2360] 2002/04: Kinsale. EvoWorkshops: Real-World Applications of Evolutionary Computing See Table 28. see [2787] u 2008/03: Naples.

MIC: Metaheuristics International Conference See Table 25. Scotland. BIONETICS: International Conference on Bio-Inspired Models of Network.5 on page 318. USA. 1612] 1997/07: Stanford.5 on page 319. and Computing Systems See Table 10. 1611] 1996/07: Stanford. MI. see [2326] HIS: International Conference on Hybrid Intelligent Systems See Table 10. CA.5 on page 320. see [2753] 2010/05: Ann Arbor. GP: Annual Conference of Genetic Programming History: 1998/06: Madison. USA. ICCI: IEEE International Conference on Cognitive Informatics See Table 10. WI.5 on page 318. CA. MI. USA. see [210. MI. GPTP: Genetic Programming Theory and Practice History: 2011/05: Ann Arbor. 31 GENETIC PROGRAMMING WCCI: IEEE World Congress on Computational Intelligence See Table 28. BIOMA: International Conference on Bioinspired Optimization Methods and their Applications See Table 28. USA. MI. ICNC: International Conference on Advances in Natural Computation See Table 28. MI. see [2307] 2004/05: Ann Arbor. USA. see [1605. MI. USA.5 on page 319.2 on page 128. USA. SEAL: International Conference on Simulated Evolution and Learning See Table 10. see [2309] 2008/05: Ann Arbor. EMO: International Conference on Evolutionary Multi-Criterion Optimization See Table 28. see [2575] 2006/05: Ann Arbor. 1609] AE/EA: Artificial Evolution See Table 28. USA. FEA: International Workshop on Frontiers in Evolutionary Algorithms See Table 28. Sweden. USA. see [2308] 2007/05: Ann Arbor. see [2089] 2003/05: Ann Arbor. see [1603.5 on page 319. Information. CA. see [1604.2 on page 130. 2195] PPSN: Conference on Parallel Problem Solving from Nature See Table 28. see [2198] 1999/05: G¨teborg. ICANNGA: International Conference on Adaptive and Natural Computing Algorithms See Table 28.2 on page 128.2 on page 130. see [2196] o 1998/04: Paris. see [2310] 2009/05: Ann Arbor.5 on page 319.2 on page 227.5 on page 320. France. USA. USA. USA. MI.384 2000/04: Edinburgh. MICAI: Mexican International Conference on Artificial Intelligence . see [2306] 1995/07: Tahoe City. MI. UK.

LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10. Man.2 on page 133. USA. . SMC: International Conference on Systems.5 on page 321. USA.31. IL. EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10. see [2079] 2003/07: Chicago. GENERAL INFORMATION ON GENETIC PROGRAMMING See Table 10. 385 WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10.2 on page 131. Control and Automation See Table 28.2 on page 133. CIS: International Conference on Computational Intelligence and Security See Table 28. SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10.2 on page 131. GEM: International Conference on Genetic and Evolutionary Methods See Table 28. see [2077] LION: Learning and Intelligent OptimizatioN See Table 10. Organization and Technology See Table 10.5 on page 321.2 on page 133. CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10. USA. WOPPLOT: Workshop on Parallel Processing: Logic.2 on page 133.5 on page 320. EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization See Table 28. GEWS: Grammatical Evolution Workshop History: 2004/06: Seattle. WA.2. NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10. Progress in Evolutionary Computation: Workshops on Evolutionary Computation See Table 28. see [2078] 2002/07: New York.2 on page 132.5 on page 321.2 on page 134.2 on page 133. EngOpt: International Conference on Engineering Optimization See Table 10. CIMCA: International Conference on Computational Intelligence for Modelling.2 on page 131.5 on page 320. GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation See Table 28. NY.2 on page 131.2 on page 132.2 on page 132.5 on page 321. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems See Table 28. ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10. and Cybernetics See Table 10.5 on page 321.

a mathematical expression [1602]. Logic Programming.3. WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms See Table 28. ICEC: International Conference on Evolutionary Computation See Table 28.5 on page 322.2 on page 135. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands 31.1 Introduction Tree-based Genetic Programming (TGP).5 on page 321.5 on page 322. Generally. IJCCI: International Conference on Computational Intelligence See Table 28.2 on page 134. EvoRobots: International Symposium on Evolutionary Robotics See Table 28. Optimization. SGP) is the most widespread Genetic Programming variant. META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25. both for historical reasons and because of its efficiency in many problem domains.5 on page 322.386 31 GENETIC PROGRAMMING EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation See Table 28.2.2 on page 134. a decision tree [1597]. NaBIC: World Congress on Nature and Biologically Inspired Computing See Table 28. 31.5 on page 321.5 on page 322.5 on page 321.5 on page 322.4 Journals 1. Evolutionary and Memetic Computing See Table 28. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10. and Computational Algebraic Geometry See Table 10.3 (Standard) Tree Genomes 31. Here. SEA: International Symposium on Experimental Algorithms See Table 10.5 on page 321. IWNICA: International Workshop on Nature Inspired Computation and Applications See Table 28. usually referred to as Standard Genetic Programming. a tree can represent a rule set [1867]. SSCI: IEEE Symposium Series on Computational Intelligence See Table 28. FOCI: IEEE Symposium on Foundations of Computational Intelligence See Table 28.5 on page 322. the genotypes are tree data structures. . GICOLAG: Global Optimization – Integrating Convexity. IWACI: International Workshop on Advanced Computational Intelligence See Table 28.5 on page 322. SEMCCO: International Conference on Swarm.2 on page 228. or even the blueprint of an electrical circuit [1608]. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2.

31.3.1.1 Function and Terminal Nodes

In tree-based Genetic Programming, every node is characterized by a type i which defines how many child nodes it can have. We distinguish between terminal and non-terminal nodes (or node types).

Definition D31.1 (Terminal Node Types). The set Σ contains the node types i which do not allow their instances t to have child nodes.

typeOf(t) ∈ i ∧ i ∈ Σ ⇒ numChildren(t) = 0   (31.1)

The nodes t with typeOf(t) ∈ i ∧ i ∈ Σ are thus always leaf nodes.

Definition D31.2 (Non-Terminal Node Types). Each node type i from the set N of non-terminal (function) node types in a Genetic Programming system has a definite number k ∈ N1 of children (k > 0).

typeOf(t) ∈ i ∧ i ∈ N ⇒ (numChildren(t) = k) ∧ k > 0   (31.2)

The nodes t which belong to a non-terminal node type typeOf(t) ∈ i ∧ i ∈ N thus are always inner nodes of a tree.

In Section 56.1.1 on page 824, we provide simple Java implementations for basic tree node classes (Listing 56.48). In this implementation, each node belongs to a specific type (Listing 56.49). In our example implementation, we go one step further and also define the possible types of children a node can have by specifying a type set (Listing 56.50) for each possible child node location. Such a node type does not only describe the number of children that its instance nodes can have but also which types these children may have. We thus provide a fully-fledged Strongly-Typed Genetic Programming (STGP) implementation in the sense of Montana [1930, 1931, 1932].

Example E31.1 (GP of Mathematical Expressions: Symbolic Regression). In Standard Genetic Programming, there usually is a semantic distinction between inner nodes and leaf nodes of the genotypes. Trees also provide a very intuitive representation for mathematical functions, and SGP has initially been used by Koza to evolve them. Here, an inner node stands for a mathematical operation and its child nodes are the parameters of the operation. Leaf nodes then are terminal symbols like numbers or variables. This method is called symbolic regression and is discussed in detail in Section 49.1 on page 531; it can serve here as a good example for the abilities of Genetic Programming. With

1. a function set of N = {+, −, ∗, /, sin, cos, a^b}, where a^b means "a to the power of b", and
2. a terminal set of Σ = {x, e, R}, where x is a variable, e is Euler's number, and R stands for a random-valued constant,

we can already express and evolve a large number of mathematical formulas such as the one given in Figure 31.3.

Figure 31.3: The formula e^(sin x) + 3·|x|^0.5 expressed as a tree.
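Purely as an illustration of these node types – and not the book's reference implementation from Listings 56.48 to 56.50 – the following minimal Java sketch models the function and terminal sets of Example E31.1 as small classes. All names (Node, eval, Constant, and so on) are chosen for this example only.

// A minimal expression-tree sketch for Example E31.1 (illustrative names only).
abstract class Node {
  final Node[] children;                  // non-terminals own k > 0 children, terminals own none
  Node(Node... children) { this.children = children; }
  abstract double eval(double x);         // depth-first evaluation: children first, then this node
  int numChildren() { return children.length; }
}

final class Constant extends Node {       // terminal: a (possibly random) constant R
  final double value;
  Constant(double value) { this.value = value; }
  double eval(double x) { return value; }
}

final class Variable extends Node {       // terminal: the variable x
  double eval(double x) { return x; }
}

final class Add extends Node {            // non-terminal with exactly two children
  Add(Node a, Node b) { super(a, b); }
  double eval(double x) { return children[0].eval(x) + children[1].eval(x); }
}

final class Mul extends Node {
  Mul(Node a, Node b) { super(a, b); }
  double eval(double x) { return children[0].eval(x) * children[1].eval(x); }
}

final class Sin extends Node {            // non-terminal with a single child
  Sin(Node a) { super(a); }
  double eval(double x) { return Math.sin(children[0].eval(x)); }
}

final class Pow extends Node {            // "a to the power of b"
  Pow(Node a, Node b) { super(a, b); }
  double eval(double x) { return Math.pow(children[0].eval(x), children[1].eval(x)); }
}

The tree from Figure 31.3 could then be assembled as new Add(new Pow(new Constant(Math.E), new Sin(new Variable())), new Mul(new Constant(3), new Pow(new Abs(new Variable()), new Constant(0.5)))) – assuming an additional Abs node analogous to Sin – and evaluated for any x by calling eval(x) on the root, which performs exactly the depth-first traversal described below.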

In Figure 31.3, the root node is a + operation. Since trees naturally comply with the infix2 notation in which mathematical formulas are commonly written, they are especially well suited for representing such expressions. The + would be the last operation which is evaluated when computing the value of the formula. Before this final step, first the value of its left child (e^(sin x)) and then the value of the right child (3 ∗ |x|^0.5) will be determined. Evaluating a tree-encoded function thus requires a depth-first traversal of the tree, where the value of a node can be computed after the values of all of its children have been calculated.

Figure 31.4: The AST representation of algorithms/programs. (Left: the createPop routine, given schematically and as Java code; right: its abstract syntax tree, whose inner nodes represent the statement block, the while loop, the comparison i > 0, the calls to appendList and create, and the decrement of i.)

Algorithm: Pop = createPop(s)
  Input: s – the size of the population to be created
  Data: i – a counter variable
  Output: Pop – the new, random population
  begin
    Pop ←− ()
    i ←− s
    while i > 0 do
      Pop ←− appendList(Pop, create())
      i ←− i − 1
    return Pop
  end

List<IIndividual> createPop(int s) {
  List<IIndividual> Xpop;
  Xpop = new ArrayList<IIndividual>(s);
  for (int i = s; i > 0; i--) {
    Xpop.add(create());
  }
  return Xpop;
}

31.3.1.2 Similarity to High-Level Programming Languages

Trees are very (and deceptively) close to the natural structure of algorithms and programs. The syntax of most of the high-level programming languages, for example, leads to a certain hierarchy of modules and alternatives. Not only does this form normally constitute a tree – compilers even use tree representations internally. When reading the source code of a program, compilers first split it into tokens3.

2 http://en.wikipedia.org/wiki/Infix_notation [accessed 2010-08-21]
3 http://en.wikipedia.org/wiki/Lexical_analysis [accessed 2007-07-03]

Then they parse these tokens4. Finally, an abstract syntax tree (AST)5 is created [1278, 1462]. The internal nodes of ASTs are labeled by operators and the leaf nodes contain the operands of these operators. In principle, we can illustrate almost every6 program or algorithm as such an AST (see Figure 31.4). Tree-based Genetic Programming directly evolves individuals in this form.

4 http://en.wikipedia.org/wiki/Parse_tree [accessed 2007-07-03]
5 http://en.wikipedia.org/wiki/Abstract_syntax_tree [accessed 2007-07-03]
6 Excluding such algorithms and programs that contain jumps (the infamous "goto") that would produce crossing lines in the flowchart (http://en.wikipedia.org/wiki/Flowchart [accessed 2007-07-03]).

31.3.1.3 Structure and Human-Readability

Another interesting aspect of the tree genome is that it has no natural role model. While Genetic Algorithms match their direct biological metaphor, the DNA, to some degree, Genetic Programming introduces completely new characteristics and traits. Genetic Programming is one of the few techniques that are able to learn structures of potentially unbounded complexity. It can be considered as more general than Genetic Algorithms because it makes fewer assumptions about the shape of the possible solutions. Furthermore, it often offers white-box solutions that are human-interpretable. Many other learning algorithms, like artificial neural networks, for example, usually generate black-box outputs, which are highly complicated if not impossible to fully grasp [1853]. It should, however, be pointed out that Genetic Programming is not suitable to evolve programs such as the one illustrated in Figure 31.4. As we will discuss in ??, programs in representations which we know from our everyday life as students or programmers exhibit too much epistasis. Synthesizing them is thus not possible with GP.

31.3.2 Creation: Nullary Reproduction

Before the evolutionary process in Standard Genetic Programming can begin, an initial population composed of randomized individuals is needed. In Genetic Algorithms, a set of random bit strings is created for this purpose. In Genetic Programming, random trees instead of such one-dimensional sequences are constructed. There are three basic ways for realizing the create and "createPop" operations for trees, which can be distinguished according to the depth of the produced individuals. Normally, there is a maximum depth d̂ specified that the tree individuals are not allowed to surpass. Then, the creation operation will return only trees where the path between the root and the most distant leaf node is not longer than d̂.

Figure 31.5: Tree creation by the full method.

31.3.2.1 Full

The full method illustrated in Figure 31.5 creates trees where each (non-backtracking)

path from the root to the leaf nodes has exactly the length d̂. Algorithm 31.1 illustrates this process: For each node of a depth below d̂, the type is chosen from the non-terminal symbols (the functions N). A chain of function nodes is recursively constructed until the maximum depth minus one. If we reach d̂, only leaf nodes (with types from Σ) can be attached.

Algorithm 31.1: g ←− createGPFull(d̂, N, Σ)
  Input: N: the set of function types
  Input: Σ: the set of terminal types
  Input: d̂: the maximum depth
  Data: j: a counter variable
  Data: i ∈ N ∪ Σ: the selected tree node type
  Output: g ∈ G: a new genotype
  begin
    if d̂ > 1 then
      i ←− N[⌊randomUni[0, len(N))⌋]
      g ←− instantiate(i)
      for j ←− 1 up to numChildren(i) do
        g ←− addChild(g, createGPFull(d̂ − 1, N, Σ))
    else
      i ←− Σ[⌊randomUni[0, len(Σ))⌋]
      g ←− instantiate(i)
    return g
  end

31.3.2.2 Grow

The grow method depicted in Figure 31.6 and specified in Algorithm 31.2, of course, also creates trees where each (non-backtracking) path from the root to the leaf nodes is not longer than d̂ but may be shorter. This is achieved by deciding randomly for each node if it should be a leaf or not when it is attached to the tree. Of course, to nodes of the depth d̂ − 1, only leaf nodes can be attached.

Figure 31.6: Tree creation by the grow method.

31.3.2.3 Ramped Half-and-Half

Koza [1602] additionally introduced a mixture method called ramped half-and-half, defined here in Algorithm 31.3. For each tree to be created, this algorithm draws a number r uniformly distributed between 2 and d̂ (r = ⌊randomUni[2, d̂ + 1)⌋). Now either full or

createGPFull d − 1. N.31. N.3: g ←− createGPRamped d. N.3. 1) < 0.5 then return createGPGrow(r. d + 1 if randomUni[0. len(V ))⌋] g ←− instantiate(i) for j ←− 1 up to numChildren(i) do else i ←− Σ[⌊randomUni[0. Σ Input: N : the set of function types Input: Σ: the set terminal types ˆ Input: d: the maximum depth Data: j: a counter variable Data: V = N ∪ Σ: the joint list of function and terminal node types Data: i ∈ V : the selected tree node type Output: g: a new genotype 1 2 3 4 5 6 7 8 9 10 11 begin ˆ if d > 1 then V ←− N ∪ Σ i ←− V [⌊randomUni[0. Σ Input: N : the set of function types Input: Σ: the set terminal types ˆ Input: d: the maximum depth Data: r: the maximum depth chosen for the genotype Output: g ∈ G: a new genotype 1 2 3 4 begin ˆ r ←− randomUni 2. Σ ˆ Algorithm 31.2: g ←− createGPGrow d. len(Σ))⌋] g ←− instantiate(i) return g ˆ g ∈ G ←− addChild g. Σ) else return createGPFull(r. N. N. Σ) . (STANDARD) TREE GENOMES 391 ˆ Algorithm 31.
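Condensing Algorithms 31.1 to 31.3 into Java gives a compact picture of how the three creation schemes differ. The sketch below builds on the Node class from the earlier example and introduces an assumed NodeFactory helper – not part of the book's code – that knows the arity of a node type and can instantiate it.

import java.util.Random;

// Illustrative sketch of the full, grow, and ramped-half-and-half creation schemes.
interface NodeFactory {
  int arity();                        // 0 for terminal types, k > 0 for function types
  Node create(Node[] children);       // instantiate a node of this type
}

final class TreeCreation {
  static Node createFull(int maxDepth, NodeFactory[] functions, NodeFactory[] terminals, Random rnd) {
    if (maxDepth > 1) {               // below the depth limit: always pick a function type
      NodeFactory f = functions[rnd.nextInt(functions.length)];
      Node[] children = new Node[f.arity()];
      for (int j = 0; j < children.length; j++)
        children[j] = createFull(maxDepth - 1, functions, terminals, rnd);
      return f.create(children);
    }
    return terminals[rnd.nextInt(terminals.length)].create(new Node[0]);
  }

  static Node createGrow(int maxDepth, NodeFactory[] functions, NodeFactory[] terminals, Random rnd) {
    if (maxDepth > 1) {
      // pick uniformly from the joint list of function and terminal types
      int v = rnd.nextInt(functions.length + terminals.length);
      if (v < functions.length) {
        NodeFactory f = functions[v];
        Node[] children = new Node[f.arity()];
        for (int j = 0; j < children.length; j++)
          children[j] = createGrow(maxDepth - 1, functions, terminals, rnd);
        return f.create(children);
      }
      return terminals[v - functions.length].create(new Node[0]);
    }
    return terminals[rnd.nextInt(terminals.length)].create(new Node[0]);
  }

  static Node createRamped(int maxDepth, NodeFactory[] functions, NodeFactory[] terminals, Random rnd) {
    int r = 2 + rnd.nextInt(maxDepth - 1);            // depth drawn uniformly from {2, ..., maxDepth}; assumes maxDepth >= 2
    return rnd.nextBoolean() ? createGrow(r, functions, terminals, rnd)
                             : createFull(r, functions, terminals, rnd);
  }
}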

The Ramped Half-and-Half algorithm for Strongly-Typed Genetic Programming is implemented in Listing 56. d: two Boolean variables Data: w: a value uniformly distributed in [0. nodeWeight(tc ))⌋ if w ≥ nodeWeight(tc ) − 1 then b ←− false else i ←− numChildren(tc ) − 1 while i ≥ 0 do w ←− w − nodeWeight(tc . We introduce an operator “selectNode” for choosing these nodes.3. nodeWeight(tc )] Data: i: an index Output: tc : the selected node 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 begin b ←− true tc ←− tr while b do w ←− ⌊randomUni[0.42 on page 814. . In order to apply mutation.3. Definition D31. These nodes are then exchanged. in mutation as well as in recombination. 31.4: tc ←− selectNodeuni (tr ) Input: tr : the (root of the) tree to select a node from Data: b. we need one node in each parent tree. For recombination. The operator tc chooses one node tc from the tree whose root is given by tr .3 Node Selection 392 In most of the reproduction operations for tree genomes.children[i] i ←− −1 else i ←− i − 1 return tc 17 same probability as done by the method selectNodeuni given in Algorithm 31. certain nodes in the trees need to be selected.3.1 Uniform Selection A good method for doing so could select all nodes tc in the tree tc with exactly the Algorithm 31.4. for instance. P (selectNodeuni (tr ) = tc ) = P (selectNodeuni (tr ) = tn ) for all nodes tc and tn in the tree with root tr . This method is often preferred since it produces an especially wide range of different tree depths and shapes and thus provides a great initial diversity. we first need to find the node which is to be altered.3 (Node Selection Operator). = selectNode(tr ) 31. There.children[i]) if w < 0 then tc ←− tc .31 GENETIC PROGRAMMING ˆ grow is chosen to finally create a tree with the maximum depth r (in place of d).
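The weight-based descent of Algorithm 31.4 can equivalently be expressed as drawing a random index into the subtree and walking down to it. The following sketch does exactly that; selectNodeUniform and nodeWeight are illustrative names, and for brevity the subtree sizes are recomputed on every call instead of being cached.

// A compact way to realize uniform node selection (cf. Algorithm 31.4): every node of the
// tree rooted at "root" is returned with the same probability 1/nodeWeight(root).
final class NodeSelection {
  static int nodeWeight(Node t) {                 // total number of nodes in the subtree rooted at t
    int w = 1;
    for (Node c : t.children) w += nodeWeight(c);
    return w;
  }

  static Node selectNodeUniform(Node root, java.util.Random rnd) {
    Node t = root;
    int w = rnd.nextInt(nodeWeight(t));           // index of the node to return, counted in subtree order
    while (true) {
      if (w == 0) return t;                       // index 0 denotes the current node itself
      w--;                                        // skip the current node
      for (Node c : t.children) {
        int cw = nodeWeight(c);
        if (w < cw) { t = c; break; }             // the target lies inside this child's subtree
        w -= cw;                                  // otherwise skip the whole subtree and go on
      }
    }
  }
}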

In order to achieve such a behavior, we first define the weight nodeWeight(tn) of a tree node tn to be the total number of nodes in the subtree with tn as root, i.e., the tree which contains tn itself, its children, grandchildren, grand-grandchildren, and so on:

nodeWeight(t) = 1 + Σ_{i=0}^{numChildren(t)−1} nodeWeight(t.children[i])   (31.3)

Thus, the weight of the root of a tree is the number of all nodes in the tree and the weight of each of the leaves is exactly 1. When applying Algorithm 31.4, the probability for a node of being selected in a tree with root tr is thus 1/nodeWeight(tr). In selectNodeuni, there exist no regions in the trees that have lower selection probabilities than others. We provide an example implementation of such a node selection method in Listing 56.41 on page 811.

We could, of course, also proceed differently. We could, for example, descend the tree by starting at the root t and return the current node with probability 0.5 or recursively go to one of its children (also with 50% probability). Then, the root tr would have a 50:50 chance of being the starting point of the reproduction operation. Its direct children have at most probability 0.5²/numChildren(tr) each, their children would have a base multiplier of 0.5³ for their probabilities, and so on. Hence, the leaves would almost never take actively part in reproduction, although leaves in different branches of the tree are not chosen with the same probabilities if the branches differ in depth. Often, this approach is favored by selection methods. We could also choose other probabilities which strongly prefer going down to the children of the tree, but then, the nodes near to the root will most likely be left untouched during reproduction. In both cases, the reproduction operators will prefer accessing some parts of the trees while very rarely altering the other regions. A tree descent with probabilities different from the ones used in selectNodeuni may thus lead to unbalanced node selection probability distributions.

31.3.4 Unary Reproduction Operations

31.3.4.1 Mutation: Unary Reproduction

Tree genotypes may undergo small variations during the reproduction process in the Evolutionary Algorithm. Such a mutation is usually defined as the random selection of a node in the tree, removing this node and all of its children, and finally replacing it with another node [1602]. Algorithm 31.5 shows one possible way to realize such a mutation. First, a copy g ∈ G of the original genotype gp ∈ G is created. Then, a random parent node t is selected from g. One child of this node is replaced with a newly created subtree. For creating this tree, we pick the ramped half-and-half method with a maximum depth similar to the one of the child which we replace by it. The possible outcome of this algorithm is illustrated in Fig. 31.7.a, and an example implementation in Java for Strongly-Typed Genetic Programming is given in Listing 56.43 on page 815. Furthermore, in the special case that the tree nodes may have a variable number of children, two additional operations for mutation become available: (a) the insertion of new nodes or small trees (Fig. 31.7.b), and (b) the deletion of nodes, as illustrated in Fig. 31.7.c. The effects of insertion and deletion can, of course, also be achieved with replacement.

treeDepth(t. . 31.7: Possible tree mutation operations. 31. Σ) return g maximum depth Fig.7.7.children[j])} .5: g ←− mutateGP(gp ) Input: gp ∈ G: the parent genotype Input: [implicit] N : the set of function types Input: [implicit] Σ: the set terminal types Data: t: the selected tree node Data: j: the selected child index Output: g ∈ G: a new genotype 1 2 3 4 5 6 7 8 begin g ←− gp repeat t ←− selectNode(g) until numChildren(t) > 0 j ←− ⌊randomUni[0.394 31 GENETIC PROGRAMMING Algorithm 31. numChildren(t))⌋ t. Figure 31. 31.c: Subtree deletion.7. maximum depth Fig.b: Subtree insertions. maximum depth Fig. N.children[j] ←− createGPRamped(max {2.a: Subtree replacement.
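With the selection and creation sketches from above, a subtree mutation in the spirit of Algorithm 31.5 needs only a few lines of Java. The copyTree helper (a deep copy of a genotype) is an assumption of this sketch, not the book's API.

final class Mutation {
  static int depth(Node t) {                         // length of the longest root-to-leaf path
    int d = 0;
    for (Node c : t.children) d = Math.max(d, depth(c));
    return 1 + d;
  }

  static Node mutate(Node parent, NodeFactory[] functions, NodeFactory[] terminals, java.util.Random rnd) {
    Node g = copyTree(parent);                       // copyTree: assumed deep-copy helper
    if (g.numChildren() == 0)                        // degenerate single-node tree: just create a fresh subtree
      return TreeCreation.createRamped(2, functions, terminals, rnd);
    Node t;
    do {                                             // pick a node that has at least one child
      t = NodeSelection.selectNodeUniform(g, rnd);
    } while (t.numChildren() == 0);
    int j = rnd.nextInt(t.numChildren());            // which child to replace
    int d = Math.max(2, depth(t.children[j]));       // aim for a similar maximum depth
    t.children[j] = TreeCreation.createRamped(d, functions, terminals, rnd);
    return g;
  }
}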

If the tree represents a mathematical formula and the operation represented by the node is commutative. Figure 31. (STANDARD) TREE GENOMES 395 Example E31.9: Tree permutation – (asexually) shuffling subtrees. Figure 31. + 31.3. From this tree.6.1. The child nodes attached to that node are then shuffled randomly.1 Cont. it is not applied [1602]. this has no direct effect. Like mutation. The resulting genotype then stands for the formula (x/12) ∗ (3 + |x|). the portion (11 + x) is replaced with a randomly created subtree which resembles the expression x/12.2 (Mutation – Example E31. permutated.3. It first selects an internal node of the parental tree. The main goal of this operation is to re-arrange the nodes in highly fit subtrees in order to make them less fragile for other operations such as recombination. The initial genotype represents the mathematical formula (11 + x) ∗ (3 + |x|).8 shows one possible application of Algo- * + 11 x 3 x + randomly created / x 12 cut random node replace a randomly chosen node with a randomly created sub tree * / x 12 3 x Figure 31. rithm 31.9 resembles the permutation operation of string genomes or the inversion used in messy GA (Section 29. it is used to reproduce one single tree.31. [1602]). e.8: Mutation of a mathematical expression represented as a tree. Based on Example E31..).1 on page 387. i.2 Permutation: Unary Reproduction The tree permutation operation illustrated in Figure 31.5 to a mathematical expression encoded as a tree. The effects of this operation are questionable and most often. .2.4.

so the overall benefits of this operation are not fully clear [1602]. 31. in size) but equivalent to its parent in terms of functional aspects. for instance. this operation has no substantial effect but may be useful in special applications like the evolution of artificial neural networks [1602]. Let us take the mathematical formula sketched in Figure 31. Editing a tree means to create a new offspring tree which is more efficient (e.396 31. A negative aspect would be if (in our example) a fitter expression was (7 − (4 ∗ a)) and a is a variable close to 1.4. It is thus a very domain-specific operation. Besides the aspects mentioned in Example E31. In m.11. The new terminal may spread throughout the population in the further course of the evolution. This expression clearly can be written in a shorter way by replacing (7 − 4) with 3 and (1 ∗ a) with a. Genetic Programming with and without editing showed equal performance. According to Koza.3 Editing: Unary Reproduction 31 GENETIC PROGRAMMING Editing is to trees in Genetic Programming what simplifying is to formulas in mathematics. The positive aspect of editing here is that it reduces the number of nodes in the tree by removing useless expressions.. transforming (7 − 4) into 3 prevents an evolutionary transition to the fitter expression. editing also reduces the diversity in the genome which could degrade the performance by decreasing the variety of structures available. they will no longer be subject to potential damage by other reproduction operations.3. To put it plain.).4.g. Then.5 Wrapping: Unary Reproduction Applying the wrapping operation means to first select an arbitrary node n in the tree.3. Similar measures can often be applied to algorithms and program code. At the same time.3.3 (Editing – Example E31.4. Example E31. the expression (7 − 4) is now less likely to be destroyed by the reproduction processes since it is replaced by the single terminal node 3. In Koza’s experiments. we improve its readability and also decrease the computational time needed to evaluate the expression for concrete values of a and b.10 x = b + (7 − 4) + (1 ∗ a) + + b 7 4 1 * a b + 3 + a Figure 31. we create new terminal symbols that (internally hidden) are trees with multiple nodes. This way. A new non-terminal node m is created outside of the tree. By doing so.4 Encapsulation: Unary Reproduction The idea behind the encapsulation operation is to identify potentially useful subtrees and to turn them into atomic building block as sketched in Figure 31.10: Tree editing – (asexual) optimization.3. This makes it more easy for recombination operations to pick “important” building blocks. 31.1 Cont. at least one child node position is .

(b + (a + 2)) ∗ 5. which becomes obvious when comparing Figure 31. The wrapping operation is used by the author in is experiments and so far. The parent node inclusively all of its child nodes (except n) are removed from the tree.31.6 Lifting: Unary Reproduction While wrapping allows nodes to be inserted in non-terminal positions with small change of the tree’s semantic. Simple mutation would. however. . Finally. place a into an expression such as (a + 2). cut n from the tree or replace it with another expression. so it is likely used in similar ways by a variety of researchers.12 and Figure 31.12: An example for tree wrapping.13.4. A simple mutation operator could replace the node a with a randomly created subtree such √ √ as ( a ∗ 7). With lifting. I have not seen another source where it is used.4 (Wrapping – Example E31. lifting is able to remove them in the same way. 31. m is hung into the tree position that formerly was occupied by n.12 is to allow modifications of non-terminal nodes that have a high probability of being useful. left unoccupied. n (and all its potential child nodes) is cut from the original tree and append it to m by plugging it into the free spot. The purpose of this reproduction method illustrated in Figure 31. Then. Then again. which is quite different from (b + a) ∗ a. The idea of wrapping is to introduce a search operation with high locality which allows smaller changes to genotypes. a candidate solution could be the formula (b + a) ∗ 5. The result. Lifting begins with selecting an arbitrary inner node n of the tree. a tree that represents the mathematical formula (b + a) ∗ 3 can be transformed to b ∗ 3 in a single step. Wrapping may. (STANDARD) TREE GENOMES 397 Figure 31. If we evolve mathematical expressions.3.1 Cont.3. the idea of this operation is especially appealing to the evolution of executable structures such as mathematical formulas and quite simple. Figure 31. for example. is closer to the original formula.11: An example for tree encapsulation. This will always change the meaning of the whole subtree below n dramatically. It is the inverse operation to wrapping. arriving at (b + ( a ∗ 7)) ∗ 5. This node then replaces its parent node.). Example E31.

5. Therefore.13: An example for tree lifting. An example implementation of this operation in Java for Strongly-Typed Genetic Programming is given in Listing 56. both. but because of its simplicity. Over many generations. gp2 ) Input: gp1 . single subtrees are selected randomly from each of the parents and subsequently are cut out and reinserted in the partner genotype.3. The new trees they create must not exceed it. it would be strange if it had not been used by various researchers in the past.children[j] ←− selectNode(gp2 ) return g If a depth restriction is imposed on the genome. 31.5 Recombination: Binary Reproduction The mating process in nature – the recombination of the genotypes of two individuals – is also copied in tree-based Genetic Programming. the mutation and the recombination operation have to respect them. so far I have not seen a similar operation defined explicitly in other works. . Again.398 31 GENETIC PROGRAMMING Figure 31. The intent of using the recombination operation in Genetic Programming is the same as in Genetic Algorithms. 31. numChildren(t))⌋ t. Exchanging parts of genotypes is not much different in tree-based representations than in string-based ones. Lifting is used by the author in his experiments with Genetic Programming (see for example ?? on page ??).6: g ←− recombineGP(gp1 .1 Subtree Crossover Applying the default subtree exchange recombination operator to two parental trees means to swap subtrees between them as defined in Algorithm 31. Algorithm 31.14.44 on page 817. gp2 ∈ G: the parental genotypes Data: t: the selected tree node Data: j: the selected child index Output: g ∈ G: a new genotype 1 2 3 4 5 6 7 8 begin g ←− gp1 repeat t ←− selectNode(g) until numChildren(t) > 0 j ←− ⌊randomUni[0. successful building blocks – for example a highly fit expression in a mathematical formula – should spread throughout the population and be combined with good genes of different candidate solutions.6 and illustrated in Figure 31.3.
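A plain subtree exchange along the lines of Algorithm 31.6 can be sketched in the same style; copyTree is again an assumed deep-copy helper and the method names are illustrative only.

// Subtree exchange recombination: copy the first parent, pick a random node with children,
// and replace one of its children by a copied subtree chosen randomly from the second parent.
final class Recombination {
  static Node recombine(Node parent1, Node parent2, java.util.Random rnd) {
    Node g = copyTree(parent1);                      // copyTree: assumed deep-copy helper
    if (g.numChildren() == 0) return g;              // degenerate parent: nothing to splice into
    Node t;
    do {
      t = NodeSelection.selectNodeUniform(g, rnd);
    } while (t.numChildren() == 0);
    int j = rnd.nextInt(t.numChildren());
    Node donor = NodeSelection.selectNodeUniform(parent2, rnd);
    t.children[j] = copyTree(donor);                 // insert a copy so the second parent stays untouched
    return g;
  }
}

If a maximum depth is enforced, such an operator would additionally reject or retry offspring whose depth exceeds the limit, as discussed above.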

each parent tree is recombined with a new randomly created tree and the offspring is returned. In Strong Headless Chicken Crossover (SHCC). Example E31.3.5..4.14: Tree recombination by exchanging subtrees.3. i. Angeline [101] argues that it performs no better than mutation and causes bloat [104] and used Jones’s headless chicken crossover [1466] discussed in Section 29.2 Headless Chicken Crossover Recombination in Standard Genetic Programming can also have a very destructive effect on the individual fitness [2032].). (STANDARD) TREE GENOMES 399 () maximum depth Figure 31.15 shows one possible application of Algo- ab sin x e e x 3 Select node to replace with from Parent 2 / 3 Select node to replace in Parent 1 ab sin e x e Offspring / 3 Figure 31. The first parent tree gp1 represents the formula (sin(3/e))x and the second one (gp2 ) denotes (e − x) − |3|.5 (Recombination – Example E31. The new child will normally contain a relatively small amount of random nodes. The node which stands for the variable x in the first parent is selected as well as the subtree e − x in the second parent. Angeline [101] attacked Koza’s [1602]’s claims about the efficiency of subtree crossover by constructing two new operators: 1. 31.1 on page 387. .7 on page 339. Two mathematical expressions encoded as trees are recombined by the exchange of subtrees.1 Cont. mathematical expressions.6. Figure 31. Based on Example E31.15: Recombination of two mathematical expressions represented as trees. e.3. yielding an offspring which encodes the formula (sin(3/e))e−x . rithm 31.31. x is cut from gp1 and replaced with e − x.

however. SHCC is very close to subtree mutation.3. 31. It resembles a macromutation. This contradicts the common (prior) believe that subtree crossover would be a building block combining facility. D’Haeseleer [782] obtained modest improvements with his strong context preserving crossover that permitted only the exchange of subtrees that occupied the same positions in the parents. Poli and Langdon [2193.400 31 GENETIC PROGRAMMING 2.3. When ADFs are employed. The root of the tree usually loses its functional responsibility and now serves only as glue that holds the individual together and has a fixed number n of children.5. too. often only this first branch is taken into consideration whereas the root and the ADFs are ignored. since a sub-tree in the parent is replaced with a randomly created one.6 (ADFs – Example E31.).3 Advanced Techniques Several techniques have been proposed in order to mitigate the destructive effects of recombination in tree-based representations. When evaluating the fitness of an individual.1 on page 407. In problems where a constant incoming stream of new genetic material and variation is necessary.4. Finding a way to evolve modules and reusable building blocks is one of the key issues in using GP to derive higher-level abstractions and solutions to more complex problems [107. 2194] define the similar single-point crossover for tree genomes with the same purpose: increasing the probability of exchanging genetic material which is structural and functional akin and thus decreasing the disruptiveness. a certain structure is defined for the genome. Interestingly. His conclusion is that subtree crossover actually works similar to a macromutation operator whose contents are limited by the subtrees available in the population. How this works can maybe best illustrated . the parent tree is recombined with a new randomly created tree and vice versa. [982] for linear Genetic Programming is discussed in Section 31. they could not withstand a thorough investigation [1600].6 Modules 31.1 Cont.6. Lang [1665] claimed that Hill Climbing algorithm with a similar macromutation could outperform Genetic Programming a few years prior but were concluded from non-conclusive and ill-designed experiments. Angeline [101] showed that SHCC performed at least as good if not better than subtree crossover on a variety of problems and that the performance of WHCC is good. If ADFs are used.5 and 31. may use any of the automatically defined functions to produce its output. As a consequence. typically not only their number must be specified beforehand but also the number of arguments of each of them. the pure macromutation provided by SHCC thus outperforms crossover. 31.1 Automatically Defined Functions The concept of automatically defined functions (ADFs) as introduced by Koza [1602] provides some sort of pre-specified modularity for Genetic Programming. In Weak Headless Chicken Crossover (WHCC). A related approach define by Francone et al.6) in Genetic Programming was thus first profoundly pointed out by Angeline [101]. The result-generating branch.3. 1599]. One of the offspring randomly selected and returned. Example E31. 108.4. there will be a relatively small amount of non-random code. from which n−1 are automatically defined functions and one is the result-generating branch. In 1994. In half of the returned offspring. The similarity of crossover and mutation (which also becomes obvious when comparing Algorithm 31.

x is incremented one time and 1 is passed to ADF which then returns 2=1+1.4) (31. (STANDARD) TREE GENOMES 7 401 by using the example given in ??.2 Automatically Defined Macros Spector’s idea of automatically defined macros (ADMs) complements the ADFs of Koza [2566. It stems from function approximation . 2240].5) (31.3. or the evolution of robotic behavior [98].6. f0 has a single formal parameter a and f1 has two formal parameters a and b. Genetic Programming-based symbolic regression. they can also be applied to a variety of other problems like in the evolution of agent behaviors [96. is discussed in Section 49. we have illustrated the pseudo-code of two programs – one with a function (called ADF) and one with a macro (called ADM).3. Although ADFs were first introduced for symbolic regression by Koza [1602]. the ADM resembles to two calls to y().1 on page 531. electrical circuit design [1608]. Both. the function and the macro. Therefore. f0 (x)) ∗ (f0 (x) + 3) f0 (a) = a + 7 f1 (a. The parameter of ADF is evaluated before ADF is invoked. The genotype ?? encodes the following mathematical functions: g(x) = f1 (4. The parameters in automatically defined functions are always values whereas automatically defined macros work on code expressions. 31.16. Assume that the goal of GP is to approximate a function g with the one parameter x and that a genome is used where two functions (f0 and f1 ) are automatically defined. 7 A very common example for function approximation. Both concepts are very similar and only differ in the way that their parameters are handled. return a sum containing their parameter a two times. .7 (ADFs vs. The parameter of the macro. In Figure 31. This difference shows up only when side-effects come into play. since this is the area where many early examples of the idea of ADFs come from. Each program has a variable x which is initially zero. resulting in x being incremented two times and in 3=1+2 being returned. is the invocation of y(). however. not its result. The number of children of the function calls in the result-generating branch must be equal to the number of the parameters of the corresponding ADF.6) Hence. g(x) ≡ ((−4) ∗ (x + 7)) ∗ ((x + 7) + 3).31. ADM). Hence. The function y() has the side-effect that it increments x and returns its new value. 2567]. b) = (−a) ∗ b (31. Example E31.

Figure 31.16: Comparison of functions and macros. (Left: a program using the function ADF(a) ≡ (a+a) – the call ADF(y()) roughly resembles storing the result of a single call to y() in a temporary variable and adding it to itself, so executing the main program prints "out: 2". Right: a program using the macro ADM(a) ≡ (a+a) – the call ADM(y()) roughly resembles evaluating y() twice and adding the two results, so executing the main program prints "out: 3". In both programs, the shared variable x starts at 0 and the subroutine y() increments x and returns its new value.)
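The behavioral difference summarized in Figure 31.16 can be reproduced in a few lines of ordinary Java. The sketch below emulates the macro by passing the argument as a java.util.function.Supplier, so the argument expression – and its side effect – is re-evaluated on every use; this only illustrates the semantics and is not how ADMs are actually represented in a GP genome.

import java.util.function.Supplier;

public class AdfVersusAdm {
  static int x = 0;                                  // global state touched by the side-effecting subroutine

  static int y() { return ++x; }                     // increments x and returns its new value

  static int adf(int a) { return a + a; }            // function: the argument is evaluated exactly once

  static int adm(Supplier<Integer> a) {              // "macro": the argument expression runs on each use
    return a.get() + a.get();
  }

  public static void main(String[] args) {
    x = 0;
    System.out.println("out: " + adf(y()));          // y() runs once  -> prints "out: 2" (1 + 1)
    x = 0;
    System.out.println("out: " + adm(() -> y()));    // y() runs twice -> prints "out: 3" (1 + 2)
  }
}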

The ideas of automatically defined macros and automatically defined functions are very close to each other. Automatically defined macros are likely to be useful in scenarios where context-sensitive or side-effect-producing operators play important roles [2566, 2567]. In other scenarios, there is not much difference between the application of ADFs and ADMs [2567]. Finally, it should be mentioned that the concepts of automatically defined functions and macros are not restricted to the standard tree genomes but are also applicable in other forms of Genetic Programming, such as linear Genetic Programming (see Section 31.4) or PADO (see ?? on page ??).

31.4 Linear Genetic Programming

31.4.1 Introduction

In the beginning of this chapter, it was stated that one of the two meanings of Genetic Programming is the automated, evolutionary synthesis of programs that solve a given set of problems. It was discussed that tree genomes are suitable to encode programs or formulas and how the genetic operators can be applied to them. Nevertheless, trees are not the only way for representing programs. As a matter of fact, a computer processes programs as sequences of simple machine instructions instead. Every possible flowchart describing the behavior of a program can be translated into such a sequence. These sequences may contain branches in form of jumps to other places in the code. It is therefore only natural that the first approach to automated program generation developed by Friedberg [995] at the end of the 1950s used a fixed-length instruction sequence genome [995, 996]. The area of Genetic Programming focused on such instruction string genomes is called linear Genetic Programming (LGP). Here, instruction strings (or bit/integer strings encoding them) are the center of the evolution and contain the program code directly. This distinguishes LGP from many grammar-guided Genetic Programming approaches like Grammatical Evolution (see ?? on page ??), where strings are just genotypic, intermediate representations that encode the program trees.

The general idea of LGP, as opposed to SGP, is that tree-based GP is close to high-level representations such as LISP S-expressions, whereas linear Genetic Programming breeds instruction sequences resembling programs in assembler language or machine code.

Example E31.8 (LGP versus SGP: Syntax). This distinction in philosophy goes as far as down to the naming conventions of the instructions and variables: In symbolic regression in Standard Genetic Programming, function names such as +, −, ∗, and sin are common and the variables are usually called x (or x1, x2, ...). Here, + would have two child nodes and return the result of the addition to the calling node without explicitly storing it somewhere. In linear Genetic Programming, for the same purpose, the instructions would usually be named like ADD, SUB, MUL, or SIN and, instead of variables, the data is stored in registers such as R[0], R[1], and so on. An ADD instruction in LGP could have, for example, three parameters: the two input registers and the register to store the result of the computation.

Some of the most important early contributions to LGP come from [2199]:

9 (Linear Genetic Programming with Jumps).4. Like a normal computer program. Openshaw and Turton [2086] (1994) who also used Perkis’s approach but already represented mathematical equations as fixed-length bit string back in the 1980s [2085]. Banzhaf [205]. For processing a linear instruction sequence. Example E31.3. for example. Crepeau [653]. 2048] point out that standard crossover is highly disruptive. [2048. the required status information is of O1 since only the index of the currently executed instruction needs to be stored. it is also easy to limit the runtime in the program evaluation and even simulating parallelism. such an instruction set can be extended with alternatives (if-then-else) and loops.4. [642. Then. who used a genotype-phenotype mapping with repair mechanisms to translate a bit string to nested simple arithmetic instructions in 1993. 31. that the alternatives and loops which we know from high-level programming languages are mapped to conditional and unconditional jump instructions in machine code. As can be seen in Fig. in comparison. 2592]. who developed a machine code GP system around an emulator for the Z80 processor. [536. jump instructions are highly epistatic (see Chapter 17 and ??). These jumps target to either absolute or relative addresses inside the program.a. 31. and the MicroGP (µGP) system for test program induction by Corno et al. Like Standard Genetic Programming. Let us consider the insertion of a single. is no longer feasible. which randomly insert. the commercial system by Foster [976]. because a recursive depth-first traversal of the trees is required. Besides the methods discussed in this section (and especially in Section 31. LGP is most often applied with a simple function set comprised of primitive instructions and some mathematical operators.17. In SGP. whose stack-based GP evaluated arithmetic expressions in Reverse Polish Notation (RPN). it moves to the second instruction and so on. tree-based genomes are less vulnerable in this aspect. other interesting approaches to linear Genetic Programming are the LGP variants developed by Eklund [870] and Leung et al.4 on page 339). the drawback arises that simply reusing the genetic operators for variable-length string genomes (discussed in Section 29. or toggle bits.b. and 4. it is well possible that the resulting shift of the absolute addresses of the subsequent instructions in the program invalidates the control flow and renders the whole program useless. Even though the subtree crossover in tree-genomes is shown to be not very efficient either (see [101] and Section 31.404 31 GENETIC PROGRAMMING 1. Perkis [2164] (1994).5. 31.3).17. a (simulated) processor starts at the beginning of the instruction string. The size of the status information needed by an interpreter for tree-based programs is in Od (where d is the depth of the tree). 3. Nordin et al. in linear Genetic Programming jump instructions may be added to allow for the evolution of a more complex control flow. Then.2 Advantages and Disadvantages The advantage of linear Genetic Programming lies in the straightforward evaluation of the evolved algorithms. delete. Similarly. This issue is illustrated in Fig. It reads the first instruction and its operands from the string and carries it out.2 on page 399). the insertion of an instruction into a loop represented as a tree in Standard Genetic Programming is less critical. . maybe as result of a mutation or recombination operation. new command into the instruction string. Because of this structure. 
If we do not perform any further corrections after this insertion. 2. We can visualize. 1715] on specialized hardware.
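To make the register-machine view of Example E31.8 and the jump problem of Example E31.9 concrete, the following sketch interprets a linear program over a small register file. The opcode set, the three-operand encoding, and the relative conditional jump are assumptions chosen for illustration rather than the instruction set of any particular LGP system.

// A tiny linear-GP interpreter sketch: each instruction is (opcode, dst, src1, src2).
// Jumps use a relative offset, which is why inserting or deleting instructions without
// correcting offsets can invalidate the whole control flow.
final class LinearInterpreter {
  static final int ADD = 0, SUB = 1, MUL = 2, JMP_IF_POS = 3;

  static double[] run(int[][] program, double[] registers, int maxSteps) {
    double[] r = registers.clone();
    int pc = 0, steps = 0;
    while (pc >= 0 && pc < program.length && steps++ < maxSteps) {   // a step limit bounds runtime
      int[] ins = program[pc];
      switch (ins[0]) {
        case ADD: r[ins[1]] = r[ins[2]] + r[ins[3]]; break;
        case SUB: r[ins[1]] = r[ins[2]] - r[ins[3]]; break;
        case MUL: r[ins[1]] = r[ins[2]] * r[ins[3]]; break;
        case JMP_IF_POS:                                             // conditional relative jump
          if (r[ins[1]] > 0) { pc += ins[2]; continue; }
          break;
      }
      pc++;                                                          // fall through to the next instruction
    }
    return r;
  }
}

Note that the interpreter only has to track the program counter – the O(1) status information mentioned above – and that any instruction inserted between a jump and its target silently changes what the stored offset points to, which is exactly the epistasis problem discussed in Example E31.9.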

Figure 31.17: The impact of insertion operations in Genetic Programming. (Fig. 31.17.a: Inserting into an instruction string – before insertion / after insertion: the loop begin is shifted. Fig. 31.17.b: Inserting in a tree representation – after insertion.)

One approach to increase the locality in linear Genetic Programming with jumps is to use intelligent mutation and crossover operators which preserve the control flow of the program when inserting or deleting instructions. Such operations could, for instance, analyze the program structure and automatically correct jump targets. Operations which are restricted to have only minimal effect on the control flow from the start can also easily be introduced. In Section 31.4.4, we shortly outline some of the work of Brameier and Banzhaf [391], who define some interesting approaches to this issue. Section 31.4.4.1 discusses the homologous crossover operation, which represents another method for decreasing the destructive effects of reproduction in LGP.

31.4.3 Realizations and Implementations

31.4.3.1 The Compiling Genetic Programming System

One of the early linear Genetic Programming approaches was developed by Nordin [2044], who was dissatisfied with the performance of GP systems written in an interpreted language which, in turn, interpret the programs evolved using a tree-shaped genome. In 1994, he published his work on a new Compiling Genetic Programming System (CGPS) written in the C programming language8 [1525], directly manipulating individuals represented as machine code. CGPS evolves linear programs which accept some input data and produce output data as result. Each candidate solution in CGPS consists of (a) a prologue for shoveling the input data from the stack into registers, (b) a set of instructions for information processing, and (c) an epilogue for terminating the function [2045]. The prologue and epilogue were never modified by the genetic operations. As instructions for the middle part, the

8 http://en.wikipedia.org/wiki/C_(programming_language) [accessed 2008-09-16]

1 (a slightly modified version of the example given in [391]). The genotypes are transformed with a genotype-phenotype mapping into methods of a Java class which then can be loaded into the JVM.4. A genotype in JBGP contains the maximum allowed stack depth together with a linear list of instruction descriptors. thus. 1207].406 31 GENETIC PROGRAMMING Genetic Programming system had arithmetical operations and bit-shift operators at its disposal in [2044]. an individual is represented as a linear sequence of simple C instructions as outlined in the example Listing 31. Furthermore. [1206.3 Java Bytecode Evolution Besides AIMGP. instructions not influencing the result (see Definition D29. AIMGP). 1207] introduce byte code GP (bcGP). and evaluated.wikipedia. 9 http://en. executed. e. The dynamic parts.3. [1792–1795] is written in Java.3. i. there exist numerous other approaches to the evolution of linear Java bytecode functions.4. are modified by the genetic operations which add new byte code [1206. A new interesting application for linear Genetic Programming tackled with AIMGP is the evolution of robot behavior such as obstacle avoiding and wall following [2052]. making this LGP approach Turing-complete. and Francone [2050.org/wiki/Complex_Instruction_Set_Computing [accessed 2008-09-16] . each individual is a linear sequence of Java bytecode and is surrounded by a prologue and epilogue.wikipedia. The static parts contain things like version information are not affected by the reproduction operations.wikipedia. [1206. containing the methods.1 will store its outputs in the registers v[0] and v[1].1 and ??).org/wiki/Bytecode [accessed 2008-09-16] 11 http://en. the whole population can be kept inside a byte array of a fixed size. The JAPHET system of Klahold et al. This had the advantage that all instructions have the same size. Given that the output of the program defined in Listing 31. by adding buffer space. Due to reproduction operations like as mutation and crossover. all the lines marked with (I) do not contribute to the overall functional fitness. linearly encoded programs may contain introns. [1547]. where the whole population of each generation is represented by one class file. also Java Method Evolver.. 31. 2053]. Each instruction descriptor holds information such as the corresponding bytecode and a branch offset.2 Automatic Induction of Machine Code by Genetic Programming CGPS originally evolved code for the Sun Sparc processors. Another interesting application of his system was the compression of images and audio data [2047]. each individual has the same size and. He found that it had approximately the same capability for growing classifiers as artificial neural networks but performed much faster. 31. the successor of CGPS. 31.org/wiki/Reduced_Instruction_Set_Computing [accessed 2008-09-16] 10 http://en. Like in AIMGP. JME) by Lukschandl et al. too. Banzhaf.4. Classes are divided into a static and a dynamic part. the user provides an initial Java class at startup. Harvey et al. These were added in [2046] along with ADFs.3. the support for multiple other architectures was added by Nordin. In the Automatic Induction of Machine Code with GP system (AIM-GP. which is a member of the RISC9 processor class. including Java bytecode10 and CISC11 CPUs with variable instruction widths such as Intel 80x86 processors. The Java Bytecode Genetic Programming system (JBGP. 1207]. but no control flow manipulation primitives like jumps or procedure calls. 
Nordin [2044] used the classification of Swedish words as the task in the first experiments with this new system.

31.4.3.4 Brameier and Banzhaf: LGP with Implicit Intron Removal

In the Genetic Programming system developed by Brameier and Banzhaf [391] based on former experience with AIMGP, an individual is represented as a linear sequence of simple C instructions, as outlined in the example Listing 31.1 (a slightly modified version of the example given in [391]).

1: A genotype of an individual in Brameier and Banzhaf’s LGP system. In his doctoral dissertation. if(v[0] > v[1]) v[3] = v[5] * v[5]. only genes that express the same functionality and are located at the same loci. v[6] = v[4] .3.4. before the fitness evaluation. the concept of explicitly defined introns (EDIs) proposed by Nordin et al. [209]. v[2] = v[5] + v[4].4. In [388]. The programs which evolve in linear Genetic Programming are encoded in string chromosome.3 on page 396) for linear Genetic Programming. and Boolean function evolution benchmarks.. In the earlier work of Brameier and Banzhaf [391] mentioned just a few lines ago.4.59.31. thus.6 on page 338. regression. similar to nature Banzhaf et al. v[5] = v[7] + 115. v[1] = sin(v[6]). v[0] = v[5] + 73.4. a homologous crossover mechanism could here be beneficial as well. introns were only excluded by the genotype-phenotype mapping but preserved in the genotypes because they were expected to make the programs robust against variations.4. Normal crossover operations like MPX may have a disruptive effect on building blocks. v[6] = v[0] * 25. his linear Genetic Programming approach performed better during experiments with classification.3. v[7] = v[6] * 2. usually of variable length.. should be avoided or at least minimized by them. Brameier and Banzhaf [391] introduce an algorithm which removes these introns during the genotype-phenotype mapping. [2051] is utilized in form of something like nop instructions in order to decrease the destructive effect of crossover. It is assumed that such a recombination operator would have a lower disruptiveness than plain MPX. we discussed homologous crossover for Genetic Algorithms.4. Brameier [388] elaborates that the control flow of linear Genetic Programming more equals a graph than a tree because of jump and call instructions. if(v[1] <= v[6]) v[1] = sin(v[7]). } (I) (I) (I) (I) (I) (I) Listing 31. LINEAR GENETIC PROGRAMMING 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 407 void ind(double[8] v) { . Brameier finds that introducing EDIs decreases the proportion of introns arising from unreachable or ineffective code and leads to better results. The idea of homologous crossover is that. Brameier concludes that such implicit introns representing unreachable or ineffective code have no real protective effect but reduce the efficiency of the reproduction operations and. 31. In comparison with standard tree-based GP.4. if(v[1] > 0) if(v[5] > 23) v[4] = v[2] * v[1]. v[7] = v[0] .4 Recombination 31. Instead. this approach can be considered as an editing operation (see Section 31. function approximation and Boolean function synthesis [393]. Hence.1 Sticky (Homologous) Crossover In Section 29. . The GP system based on this approach was successfully tested with several classification tasks [390–392]. are exchanged between parental genotypes. Basically.
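The removal of such structurally ineffective code can be pictured as a single backward pass over the instruction list. The following fragment is only a minimal sketch under simplifying assumptions (plain three-address register instructions without branches, a fixed set of output registers); the class and method names are illustrative and do not stem from the systems cited above.

import java.util.BitSet;

/* A three-address instruction: r[dst] = r[srcA] op r[srcB]. */
record Instruction(int dst, int srcA, int srcB) { }

public class IntronMarker {
  /*
   * Marks the structurally effective instructions of a linear program by a
   * backward pass: an instruction is kept only if its destination register is
   * still needed by a later effective instruction or belongs to the outputs.
   */
  public static BitSet markEffective(Instruction[] program, int... outputRegisters) {
    BitSet effective = new BitSet(program.length);
    BitSet needed = new BitSet();            // registers whose values are still required
    for (int r : outputRegisters) {
      needed.set(r);
    }
    for (int i = program.length - 1; i >= 0; i--) {
      Instruction ins = program[i];
      if (needed.get(ins.dst())) {           // this instruction contributes to the result
        effective.set(i);
        needed.clear(ins.dst());             // the value is (re)defined here ...
        needed.set(ins.srcA());              // ... so its source registers are required instead
        needed.set(ins.srcB());
      }                                      // otherwise: an intron, skipped by the GPM
    }
    return effective;
  }
}

Instructions whose bit remains unset correspond to the lines marked with (I) in Listing 31.1; a full implementation additionally has to treat conditional statements and jumps, which this sketch omits.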

3: Applications and Examples of Linear Genetic Programming. 1301. 2555. If a fitness plateau is reached. In other words. is less destructive. 391] Mathematics [390. 2050. The idea of such an approach is that initially. 3. 982. if z = 12. z is increased to the next divisor of z. If z = z and the GP process stagnates again.4. 1207. is not locally bound. the building blocks in the system are likely to be simple and small. Crossover exchanges exactly a single page between the parents but.5. 2050] introduce a sticky crossover operator which resembles homology by allowing the exchange of instructions between two genotypes (programs) only if they reside at the same loci.5 31. 1180. 1229. 2896] Healthcare [388. Each page includes the same number z of instructions. In sticky crossover. 1977. 31. an upper limit z for this size is defined. 947. 1793. Since exactly one page is exchanged. different from the homologous operators. 2164. 1671. The Genetic Programming process starts with a working page size of z = 1. 392.408 31 GENETIC PROGRAMMING Francone et al..’s sticky crossover is the Page-based linear Genetic Programming by Heywood and Zincir-Heywood [1229]. 2899. 2896] Graph Theory [947. first a sequence of code in the first parent genotype is choosen. 2050. 1793. 1207. 2898] ˘ ˘ ˘ ˘ . longer useful code sequences will be discovered.4. 1180. 31. 2164] Distributed Algorithms and Sys. This way of crossover is useful to identify and spread building blocks and thus. 2916] tems Engineering [392. 393. 1763–1767. 1977. Over time. Here. 2555. this iteration starts all over again at z = 1. 1229. 2164. 1180. 2896.2 Page-based LGP An approach similar to Francone et al. 1229. In other words. calling for a larger page size in order to be preserved. 1206. 1671.4. e. Area References Chemistry [207. 2044] Digital Technology [392. 2555. and 12. 1793. 2888.4. 1671. Heywood and Zincir-Heywood [1229] also device a dynamic page size adaptation method for their crossover operator. 1229. 1682. This sequence is then swapped with the sequence at exactly the same position in the second parent. 1977. i. the third page of the first parent may be swapped with the fifth page of the second parent. no further improvement of the best fitness can be found for some time. [982. 6. programs are described as sequences of pages. the length of the programs remains the same and the number of their instructions does not change. 393.1 General Information on Linear Genetic Programming Applications and Examples Table 31.[947. 2. 2052. 1715. Instead of keeping the page size z constant throughout the evolution. 1682. 2888. 2891. 2898. 4. 2899] Computer Graphics [1671] Control [2052] Data Compression [2047] Data Mining [388–392. 1767. z would go through 1.

2359] [1920] 31. 1180. 380] [1920. 2050.1.4.5.2 Books 1.1 General Information on Grammar-Guided Genetic Programming Applications and Examples Table 31. 1547.6 Graph-based Approaches 31.6. 2046.5. 1207.5: Applications and Examples of Grahp-based Genetic Programming. Data Mining Using Grammar Based Genetic Programming and Applications [2969] 31. 390. 2336] [2032–2034.1 31. 1671. 2032.1. Evolutionary Program Induction of Binary Machine Code and its Applications [2045] 31.1 General Information on Grahp-based Genetic Programming Applications and Examples Table 31. 1977. 1669.31.5 Grammars in Genetic Programming 31.1. Advances in Genetic Programming [2569] 2. Linear Genetic Programming [394] 4. Area Control Data Mining Digital Technology Economics and Finances Engineering Mathematics Software References [2336] [2969] [2032] [379. 2555] [976. 1682. 2053. 2916] [642. 2592] [1682.5.4: Applications and Examples of Grammar-Guided Genetic Programming.1 31. . 1793. 1792. 1206.5.2 Books 1. 1671. Genetic Programming: An Introduction – On the Automatic Evolution of Computer Programs and Its Applications [209] 3. 1767] 31.6. GRAMMARS IN GENETIC PROGRAMMING Prediction Security Software Testing Wireless Communication 409 [205.5. 2086] [947.

3042] [2482] [2482]

3170909127233424 0.9278045504520468 0. 92.760834074066516 0.5854806945538868 0.29744194372200355 0. while keeping the structure of the formula F intact.3893919925542615 0.14129105849778467 0.6595800013242682 0.3168517955439009 0.31472423502162195 Listing 31. In a second step.1 on page 387) on the data given in Listing 31.3220146762426389 0.1 on page 905 and the other sources provided with this book to construct your optimization process.34977791509449296 0. and then adapt it in the best .1.3) and optimizes their value with a numerical optimization procedure.05332752398796371 0.05645469964408556 0.315436888525947 0.12226631044115759 0.30073566724237905 0.31955612939384237 0.018694563401923325 0. Use an objective function which could be roughly similar to Listing 57.2: The data for Task 91 [50 points] 92.3077087939606112 0. Combine what you have learned from Task 91. [20 points] 94. but instead of the polynomial.31172078461868413 0. . GRAPH-BASED APPROACHES Tasks T31 411 91.8823767703419055 0.2 T from Task 91. .31.6451020098521425 0. this system should discover all nodes which represent ephemeral random constants (see Definition D49.24247840720311337 0. g1 .008830073673861905 0.2802156496978409 0.7470926929713728 0.0766136391753689 0.7513516563209482 0.5294420401690347 0.018347254459257438 0.2.7571482262199194 0.2571847111870792 0.008752332965198615 0.31946827608018985 0.35201052061616 0.3147394303666216 0.30675072726587754 0. .9482259113015682 0.587999637880455 0.5794622522099054 0.32213477030839227 0.30809978088344814 0.241542619578675 0. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 a b ============================================= 0. Use any of the optimization algorithms for real-valued problem spaces to match the polynomial g5 a5 + g4 a4 + g3 a3 + g2 a2 + g1 a + g0 to the data given in Listing 31. .799102390437177 0.2. In other words.32191168989382 0.36438318566633743 0.6305855626214137 0.1 on page 531 and Example E31. Do the same as in Task 92. Which mathematical function φ with b = φ(a) relates the values of a to the values of b? You can use the files from Section 57. first discover a good formula structure.2475456173823204 0. Use Genetic Programming to perform symbolic regression (see Section 49.20 on page 908.3138417822317378 0.3164575071414033 0.45985453110101393 0.545693694874572 0.0708938021131889 0. Do not use any tree-related data structures. g5 ) by considering it as genotype 6 in the six-dimensional search space G ⊆ R . Optimize the vector g = (g0 . try to fit the function g2 a2 + g4 a + sin(g3 a + g2 ) + g1 ∗ ea∗g0 .6622167712763556 0.8839162433815917 0. [50 points] 93.9483969161261779 0. and 93: Develop a system which in a first optimization run synthesizes a formula F via symbolic regression and Genetic Programming.32219920416927167 0.6.6134151841062928 0.32233694519750067 0.

possible way to the available data. Both steps should be performed by an automatic procedure and provide you with the final result without needing manual interaction. Repeat the experiments multiple times. Try to find functions that fit different data created by yourself. What conclusions can you draw about the result? [70 points]

95. Instead of a tree-based representation for programs and formulas, implement a linear Genetic Programming (LGP) system which provides mathematical operations in a style similar to assembler code. Repeat the experiment defined in Task 91 with this LGP system and compare the performance with the tree-based, standard Genetic Programming approach. [50 points]

96. Replace the Evolutionary Algorithm as underlying optimization method for Genetic Programming for symbolic regression, as used in Task 91 to 95, with Simulated Annealing or a Hill Climber and apply the resulting system to Task 91 or 94. How does the result change? What are your conclusions about GP, especially in terms of the binary reproduction operation crossover (which is not used in Simulated Annealing or Hill Climbing)? [20 points]
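For Task 91, the objective function essentially has to measure how well the polynomial reproduces the data from Listing 31.2. A minimal sketch is given below; it assumes that the two columns of the listing have been loaded into the arrays a and b, and the class name is purely illustrative (it is not the implementation referred to in Listing 57.1).

/*
 * Sum of squared errors of the polynomial g[5]*a^5 + ... + g[1]*a + g[0] over
 * the data pairs (a[i], b[i]); smaller values are better (minimization).
 */
public final class PolynomialFitObjective {
  public static double error(double[] g, double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double value = 0.0;
      for (int j = g.length - 1; j >= 0; j--) {   // Horner scheme
        value = value * a[i] + g[j];
      }
      double diff = value - b[i];
      sum += diff * diff;
    }
    return sum;
  }
}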

Chapter 32 Evolutionary Programming .

5 on page 319. Genetic Algorithms + Data Structures = Evolution Programs [1886] 32.5 on page 317.5 on page 317.5 on page 319.[2711] tems Engineering [303.5 on page 318. .5 on page 318. Evolutionary Programming. 2711] Games [940] Mathematics [1702] Physics [848] 32.1. EMO: International Conference on Evolutionary Multi-Criterion Optimization See Table 28. 2018] Function Optimization [169. 1536. EvoWorkshops: Real-World Applications of Evolutionary Computing See Table 28.5 on page 318.2: Conferences and Workshops on Evolutionary Programming. Area References Chemistry [848] Control [1536] Data Mining [2392] Distributed Algorithms and Sys. 2.3 Conferences and Workshops Table 32. Genetic Algorithms [167] 5. 3. WCCI: IEEE World Congress on Computational Intelligence See Table 28. 1894. CEC: IEEE Conference on Evolutionary Computation See Table 28. GECCO: Genetic and Evolutionary Computation Conference See Table 28.1 32.1 General Information on Evolutionary Programming Applications and Examples Table 32.2 1.1: Applications and Examples of Evolutionary Programming.414 32 EVOLUTIONARY PROGRAMMING 32. PPSN: Conference on Parallel Problem Solving from Nature See Table 28. 4.1.1. Books Evolutionary Computation: The Fossil Record [939] Artificial Intelligence through Simulated Evolution [945] Blondie24: Playing at the Edge of AI [940] Evolutionary Algorithms in Theory and Practice: Evolution Strategies. AE/EA: Artificial Evolution See Table 28.

EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10. CA.2 on page 130.2 on page 132. CA. USA.5 on page 320.2 on page 131.2 on page 227. 415 BIOMA: International Conference on Bioinspired Optimization Methods and their Applications See Table 28.5 on page 319. SMC: International Conference on Systems.2 on page 128. MIC: Metaheuristics International Conference See Table 25. MICAI: Mexican International Conference on Artificial Intelligence See Table 10.32. FEA: International Workshop on Frontiers in Evolutionary Algorithms See Table 28. NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10. see [943] 1992/02: La Jolla. CA. CIS: International Conference on Computational Intelligence and Security See Table 28. CA. and Cybernetics See Table 10. Information. EngOpt: International Conference on Engineering Optimization . USA.2 on page 132.5 on page 320. see [942] HIS: International Conference on Hybrid Intelligent Systems See Table 10. ICCI: IEEE International Conference on Cognitive Informatics See Table 10. IN. CIMCA: International Conference on Computational Intelligence for Modelling. ICANNGA: International Conference on Adaptive and Natural Computing Algorithms See Table 28. USA.2 on page 130. CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10. Man. WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10. see [946] 1995/03: San Diego. Control and Automation See Table 28.5 on page 320.2 on page 131.5 on page 320.2 on page 128. see [2444] 1993/02: La Jolla. USA. see [2208] 1997/04: Indianapolis.1. GENERAL INFORMATION ON EVOLUTIONARY PROGRAMMING EP: Annual Conference on Evolutionary Programming History: 1998/05: San Diego. USA. and Computing Systems See Table 10. ICNC: International Conference on Advances in Natural Computation See Table 28. see [1848] 1994/02: San Diego. BIONETICS: International Conference on Bio-Inspired Models of Network. USA.5 on page 319.2 on page 131. SEAL: International Conference on Simulated Evolution and Learning See Table 10. CA.2 on page 131. USA. CA. see [109] 1996/02: San Diego.

5 on page 322. EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation See Table 28.5 on page 321. GEM: International Conference on Genetic and Evolutionary Methods See Table 28. FOCI: IEEE Symposium on Foundations of Computational Intelligence See Table 28. LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10.2 on page 133. ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10. NaBIC: World Congress on Nature and Biologically Inspired Computing See Table 28. ICEC: International Conference on Evolutionary Computation See Table 28. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems See Table 28. IWNICA: International Workshop on Nature Inspired Computation and Applications See Table 28. WOPPLOT: Workshop on Parallel Processing: Logic. GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation See Table 28. EvoRobots: International Symposium on Evolutionary Robotics See Table 28.5 on page 321. Organization and Technology See Table 10.2 on page 133. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10.416 See Table 10.5 on page 321. WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms See Table 28. Optimization. GICOLAG: Global Optimization – Integrating Convexity.2 on page 134.5 on page 322.2 on page 134.5 on page 321. IJCCI: International Conference on Computational Intelligence See Table 28.5 on page 321. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10.5 on page 322.5 on page 321. LION: Learning and Intelligent OptimizatioN See Table 10.2 on page 134. 32 EVOLUTIONARY PROGRAMMING EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization See Table 28.5 on page 321.5 on page 321.2 on page 133. Logic Programming.2 on page 133. .2 on page 133.5 on page 321. and Computational Algebraic Geometry See Table 10. Progress in Evolutionary Computation: Workshops on Evolutionary Computation See Table 28.2 on page 132.5 on page 322. SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10.

2 on page 228. SEA: International Symposium on Experimental Algorithms See Table 10.1.1. 417 32. GENERAL INFORMATION ON EVOLUTIONARY PROGRAMMING IWACI: International Workshop on Advanced Computational Intelligence See Table 28. SSCI: IEEE Symposium Series on Computational Intelligence See Table 28. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands .32. META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2.5 on page 322.4 Journals 1.5 on page 322.5 on page 322.2 on page 135. SEMCCO: International Conference on Swarm. Evolutionary and Memetic Computing See Table 28.


Chapter 33 Differential Evolution

33.1 Introduction

Differential Evolution1 (strategy) (DE, DES) is a method for the mathematical optimization of multidimensional functions [286, 408, 903, 1662, 1866, 1882, 2220, 2620]. Developed by Storn and Price [2621], the DE technique was invented in order to solve the Chebyshev polynomial fitting problem. It has proven to be a very reliable optimization strategy for many different tasks whose parameters can be encoded in real vectors.

In numerical optimization, at the beginning there exists very little knowledge about what is good or bad. Points are usually sampled uniformly distributed in the search space. The search operations need to perform steps of larger sizes in order to move towards the regions with best fitness. Vice versa, the closer the optimization algorithm gets to an optimum, the smaller the steps should become in order to approach it closer and closer. In other words, the convergence to an optimum should lead to smaller search steps.

Differential Evolution is an Evolutionary Algorithm which uses the assumption that selection will make the population converge to the basin of attraction of an optimum as the basis for self-adaption. It does not need to use explicit endogenous information in the sense of Evolution Strategies (see Section 30.5.2); instead it uses the information stored in the whole population. As the points in the population approach an optimum (or better: when only those points close to the optimum survive selection), their distances to each other get smaller. This distance information is used to derive a proper step width for the search operations: initially large inter-point distances lead to larger search steps, and as the basins of attraction of the optima are reached and the population becomes spatially denser, the search steps get smaller, hence allowing the optimization process to converge to the optimum.

33.2 Ternary Recombination

The essential idea behind Differential Evolution is the way in which the (ternary) recombination operator “recombineDE” is defined for creating new genotypes. It takes three parental individuals g1, g2, and g3 as parameters. The difference between g1 and g2 in the search space G is computed, weighted with a weight F ∈ R, and added to the third parent g3:

recombineDE(g1, g2, g3) = g3 + F · (g1 − g2)    (33.1)

1 http://en.wikipedia.org/wiki/Differential_evolution [accessed 2007-07-03]
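Equation 33.1 translates directly into a few lines of code. The following is only an illustrative sketch (it is not taken from the sources accompanying this book): genotypes are plain double arrays of equal length and the weight F is passed in as a parameter.

/* Ternary DE recombination according to Equation 33.1: g3 + F * (g1 - g2). */
public final class DERecombination {
  public static double[] recombineDE(double[] g1, double[] g2, double[] g3, double F) {
    double[] child = new double[g3.length];
    for (int i = 0; i < child.length; i++) {
      child[i] = g3[i] + F * (g1[i] - g2[i]);  // weighted difference added to the base parent
    }
    return child;
  }
}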

Except for determining F, no additional probability distribution has to be used in the basic Differential Evolution scheme. The algorithm is completely self-organizing. There exists a common nomenclature for Differential Evolution algorithms: DE/A/p/B, where
1. DE stands for Differential Evolution,
2. A is the way the individuals are selected to compute the differential information,
3. p is the number of individual pairs used for computing the difference information used in the recombination operation, and
4. B identifies the recombination operator to be used.

33.2.1 Advanced Operators

Further improvements to the basic Differential Evolution scheme have been contributed, for instance, by Kaelo and Ali. This classical reproduction strategy has been complemented with new ideas like triangle mutation and alternations with weighted directed strategies. Their DERL and DELB algorithms outperformed [1475–1477] standard DE on the test benchmark compiled by Ali et al. [72]. Gao and Wang [1017] emphasize the close similarities between the reproduction operators of Differential Evolution and the search step of the Downhill Simplex. Thus, it is only logical to combine or to compare the two methods, as we show in ?? on page ??.
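As an illustration of this nomenclature, the sketch below shows how a trial vector is built in the widely used DE/rand/1/bin scheme: the base vector and one difference pair (p = 1) are chosen at random and the weighted difference is mixed into the target vector by binomial crossover with crossover rate CR. This is the generic textbook variant given only as an example; it is not one of the specific algorithms discussed above and it assumes a population of at least four real vectors.

import java.util.Random;

/* One trial vector of the DE/rand/1/bin scheme for a population of real vectors. */
public final class DERand1Bin {
  public static double[] trialVector(double[][] pop, int targetIndex,
                                     double F, double CR, Random rnd) {
    int n = pop[targetIndex].length;
    // choose three mutually distinct indices, all different from the target
    int r1, r2, r3;
    do { r1 = rnd.nextInt(pop.length); } while (r1 == targetIndex);
    do { r2 = rnd.nextInt(pop.length); } while (r2 == targetIndex || r2 == r1);
    do { r3 = rnd.nextInt(pop.length); } while (r3 == targetIndex || r3 == r1 || r3 == r2);

    double[] trial = pop[targetIndex].clone();
    int jRand = rnd.nextInt(n);                  // guarantees at least one changed component
    for (int j = 0; j < n; j++) {
      if (j == jRand || rnd.nextDouble() < CR) { // binomial ("bin") crossover
        trial[j] = pop[r3][j] + F * (pop[r1][j] - pop[r2][j]);
      }
    }
    return trial;
  }
}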

3 Conferences and Workshops Table 33.3. HIS: International Conference on Hybrid Intelligent Systems . EMO: International Conference on Evolutionary Multi-Criterion Optimization See Table 28.3. PPSN: Conference on Parallel Problem Solving from Nature See Table 28. 3020] Graph Theory [2315.5 on page 319. Area References Combinatorial Problems [286. 527] Computer Graphics [2707] Distributed Algorithms and Sys.2 Books 1. Differential Evolution – A Practical Approach to Global Optimization [2220] 2. 2619. 551. GECCO: Genetic and Evolutionary Computation Conference See Table 28.5 on page 318. 2793] Logistics [527] Mathematics [2620] Motor Control [490] Software [2620] Wireless Communication [2793] 33. 1017.[2315] tems Economics and Finances [1060] Engineering [2860] Function Optimization [408.33.5 on page 317. AE/EA: Artificial Evolution See Table 28.2: Conferences and Workshops on Differential Evolution.5 on page 318. Differential Evolution – In Search of Solutions [903] 33. EvoWorkshops: Real-World Applications of Evolutionary Computing See Table 28.5 on page 318.3. 2621. 2130.1: Applications and Examples of Differential Evolution. WCCI: IEEE World Congress on Computational Intelligence See Table 28.5 on page 317. CEC: IEEE Conference on Evolutionary Computation See Table 28.3. GENERAL INFORMATION ON DIFFERENTIAL EVOLUTION 421 33.3 33.5 on page 319.1 General Information on Differential Evolution Applications and Examples Table 33. 1476.

SEAL: International Conference on Simulated Evolution and Learning See Table 10. EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10.2 on page 128.5 on page 321.2 on page 128. CIMCA: International Conference on Computational Intelligence for Modelling. BIONETICS: International Conference on Bio-Inspired Models of Network. MIC: Metaheuristics International Conference See Table 25.5 on page 320. CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10. LION: Learning and Intelligent OptimizatioN See Table 10. and Cybernetics See Table 10.2 on page 131. CIS: International Conference on Computational Intelligence and Security See Table 28. ICCI: IEEE International Conference on Cognitive Informatics See Table 10. EngOpt: International Conference on Engineering Optimization See Table 10. and Computing Systems See Table 10. BIOMA: International Conference on Bioinspired Optimization Methods and their Applications See Table 28.2 on page 133.422 See Table 10.2 on page 130.2 on page 131. GEM: International Conference on Genetic and Evolutionary Methods See Table 28.2 on page 132.5 on page 320. 33 DIFFERENTIAL EVOLUTION ICANNGA: International Conference on Adaptive and Natural Computing Algorithms See Table 28. EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization See Table 28.2 on page 130. MICAI: Mexican International Conference on Artificial Intelligence See Table 10.5 on page 321.2 on page 132. GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation See Table 28. NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10. ICNC: International Conference on Advances in Natural Computation See Table 28. Man.2 on page 131.2 on page 227.5 on page 320. . WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10.2 on page 131.5 on page 319. FEA: International Workshop on Frontiers in Evolutionary Algorithms See Table 28.5 on page 321.2 on page 132. SMC: International Conference on Systems.5 on page 319. Information. Control and Automation See Table 28.5 on page 320.

Optimization.5 on page 322.2 on page 134. WOPPLOT: Workshop on Parallel Processing: Logic.2 on page 134. NaBIC: World Congress on Nature and Biologically Inspired Computing See Table 28. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10.5 on page 322.5 on page 321.2 on page 228.2 on page 133. and Computational Algebraic Geometry See Table 10. GENERAL INFORMATION ON DIFFERENTIAL EVOLUTION LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10. FOCI: IEEE Symposium on Foundations of Computational Intelligence See Table 28. SEA: International Symposium on Experimental Algorithms See Table 10.33. GICOLAG: Global Optimization – Integrating Convexity. Logic Programming. WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms See Table 28. Progress in Evolutionary Computation: Workshops on Evolutionary Computation See Table 28. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10.2 on page 135.5 on page 321.5 on page 321.5 on page 321.2 on page 133. ICEC: International Conference on Evolutionary Computation See Table 28. IWNICA: International Workshop on Nature Inspired Computation and Applications See Table 28. EvoRobots: International Symposium on Evolutionary Robotics See Table 28.5 on page 321.5 on page 322.5 on page 321. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems See Table 28. SEMCCO: International Conference on Swarm. IJCCI: International Conference on Computational Intelligence See Table 28. 423 EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation See Table 28.5 on page 322. SSCI: IEEE Symposium Series on Computational Intelligence . Organization and Technology See Table 10. SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10.2 on page 133. META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25.2 on page 134. Evolutionary and Memetic Computing See Table 28. ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10.3.5 on page 322.5 on page 322. IWACI: International Workshop on Advanced Computational Intelligence See Table 28.2 on page 133.

Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands . 33 DIFFERENTIAL EVOLUTION 33.3.4 Journals 1.5 on page 322.424 See Table 28. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2.

19 on page 771. However.1.3.1. as discussed by Mezura-Montes et al. In Task 90 on page 377. but now test some Differential Evolution variants instead of Evolution Strategy.3. Repeat this experiment.4 on page 794. [1882].2. which in turn. [50 points] .33. it may take p point pairs. In Paragraph 56. Compare your results with those obtained with the other two algorithms in Task 90. the search operators are no longer ternary but (1 + 2p)-ary. an Evolution Strategy was applied to an instance of the Traveling Salesman Problem and its results were compared to what we obtain with a simple Hill Climber that uses the straightforward integer-string permutation representation discussed in Section 29. Re-implement the operators and Differential Evolution algorithm for this general case. you can find implemented in Listing 56. You may use a programming language of your choice.1. In this case. [50 points] 98. Differential Evolution does not necessary only take two points in the search space to compute the difference information. we give the implementation of the ternary search operations of the simple Differential Evolution algorithm.2 on page 329. Instead. GENERAL INFORMATION ON DIFFERENTIAL EVOLUTION Tasks T33 425 97.
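As a starting point for Task 98, the ternary operator of Equation 33.1 generalizes to a (1 + 2p)-ary operator by summing p weighted difference pairs. The sketch below assumes that the p minuend and subtrahend vectors have already been selected (mutually distinct and distinct from the base vector); all names are illustrative.

/* (1 + 2p)-ary DE recombination: base + F * sum of the p difference pairs. */
public final class DERecombinationP {
  public static double[] recombine(double[] base, double[][] minuends,
                                   double[][] subtrahends, double F) {
    double[] child = base.clone();
    for (int k = 0; k < minuends.length; k++) {       // the p difference pairs
      for (int i = 0; i < child.length; i++) {
        child[i] += F * (minuends[k][i] - subtrahends[k][i]);
      }
    }
    return child;
  }
}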


1683. try to estimate the probability distribution M directly. 34. In case of an Estimation of Distribution Algorithm solving a continuous optimization problem.1 we sketch the information held in a population of.f. thus decreasing the number of likely sampled genotypes.1. the variance D2 M of the distribution corresponding to the parameters of M should decrease and.1. In Genetic Algorithms or Genetic Programming. 1974. The spatially denser the population becomes. of the figure. [126. they try to learn how a perfect solution should look like. the goal is that D2 [M(t)] = 0 and E [M(t)] = x⋆ are approached for t → ∞ (ideally. global optimum x⋆ . in case that there exists a single.1. promising regions. which compete for survival (selection) and reproduce if successful. This is done from the point of view stochastic: Initially.Chapter 34 Estimation Of Distribution Algorithms 34. initially. In Figure 34. From the stochastic point of view. so the probability of each point which could be investigated to be the optimum is basically the same. an Evolution Strategy in Fig. 1 http://en. this corresponds to the assumption that the probability that the optimum may be contained in these regions is estimated to be high. for instance. 2153. from the left to the right. 34.wikipedia. it can be seen how the population of the ES converges. earlier) [1197]. Step by step. 2158]) are different.a to ?? and the information held in a model of an Estimation of Distribution Algorithm in Fig. In the first row. nothing is known about where the optimal solution is located in the search space. EDAs abandon the biological example of repeated sexual and asexual reproduction [2153] and instead.org/wiki/Estimation_of_distribution_algorithm [accessed 2009-08-20] . the population begins to move to certain. its parameters may shrink down and become smaller and smaller. In this case. Estimation of Distribution Algorithms1 (EDAs. With every candidate solution which has actually been evaluated with the objective functions. Instead of improving possible solutions step by step. The candidate solutions are considered to be equivalent to living organisms in nature.1 Introduction Traditional Evolutionary Algorithms are based directly on the natural paragon of Darwinian evolution. more information is gained. the higher this probability is assumed.d to 34. The model used in a real-valued EDA may converge similarly. the probability model M(t = 0) at time step t = 0 may be estimated with a uniform distribution.

534. 34. 2917] Digital Technology [3093] Distributed Algorithms and Sys. 2794] 34. Scalable Optimization via Probabilistic Modeling – From Algorithms to Applications [2158] . 1435.1.1. 34. 2158. Area References Combinatorial Problems [366.d: Model 1 Fig.1.2 General Information on Estimation Of Distribution Algorithms Applications and Examples 34.[1435. 2632] tems Engineering [117.2.2. 2793.b: Population 3 Fig. 34. 1784. 2793.a: Population 1 Fig. 1910. 2632.1: Comparison between information in a population and in a model. 2794] Logistics [1121.c: Population 4 m2 m1 s1 s2 m2 m1 s1 s2 m2 m1 s1 s2 Fig. 34. 34. 1435. 2860. 3093] Function Optimization [2038] Graph Theory [534.e: Model 3 Fig. 2456.1: Applications and Examples of Estimation Of Distribution Algorithms.428 34 ESTIMATION OF DISTRIBUTION ALGORITHMS Fig.1.1. 2158. 2158] Data Mining [42.1 Table 34. 2455. 2631.f: Model 4 Figure 34. 1683. 2455.2 Books 1. 2631. 2456. 34. 1683. 1683] Mathematics [2038] Military and Defense [3046] Motor Control [664] Physics [2187] Software [117] Wireless Communication [534.1. 2069. 2398. 34.

3 Conferences and Workshops Table 34. AE/EA: Artificial Evolution See Table 28. HIS: International Conference on Hybrid Intelligent Systems See Table 10. MIC: Metaheuristics International Conference See Table 25.5 on page 317. and Computing Systems See Table 10.2 on page 128. ICCI: IEEE International Conference on Cognitive Informatics See Table 10. WCCI: IEEE World Congress on Computational Intelligence See Table 28.5 on page 318. Estimation of Distribution Algorithms – A New Tool for Evolutionary Computation [1683] 3.2. SEAL: International Conference on Simulated Evolution and Learning See Table 10.2 on page 130.5 on page 319.5 on page 320.2.2 on page 227.5 on page 318. Hierarchical Bayesian Optimization Algorithm – Toward a New Generation of Evolutionary Algorithms [2145] 34.5 on page 318.5 on page 317. EvoWorkshops: Real-World Applications of Evolutionary Computing See Table 28. FEA: International Workshop on Frontiers in Evolutionary Algorithms See Table 28.2: Conferences and Workshops on Estimation Of Distribution Algorithms.2 on page 130.2 on page 128.5 on page 319. EMO: International Conference on Evolutionary Multi-Criterion Optimization See Table 28. BIOMA: International Conference on Bioinspired Optimization Methods and their Applications See Table 28. ICNC: International Conference on Advances in Natural Computation See Table 28. PPSN: Conference on Parallel Problem Solving from Nature See Table 28. CEC: IEEE Conference on Evolutionary Computation See Table 28.34. GECCO: Genetic and Evolutionary Computation Conference See Table 28.5 on page 319.5 on page 319. ICANNGA: International Conference on Adaptive and Natural Computing Algorithms See Table 28. Towards a New Evolutionary Computation – Advances on Estimation of Distribution Algorithms [1782] 4. Information. BIONETICS: International Conference on Bio-Inspired Models of Network. GENERAL INFORMATION ON ESTIMATION OF DISTRIBUTION ALGORITHMS429 2. MICAI: Mexican International Conference on Artificial Intelligence .5 on page 320.

ASC: IASTED International Conference on Artificial Intelligence and Soft Computing See Table 10. GEM: International Conference on Genetic and Evolutionary Methods See Table 28.2 on page 131. GEC: ACM/SIGEVO Summit on Genetic and Evolutionary Computation See Table 28. Progress in Evolutionary Computation: Workshops on Evolutionary Computation See Table 28. EvoNUM: European Event on Bio-Inspired Algorithms for Continuous Parameter Optimisation . LSCS: International Workshop on Local Search Techniques in Constraint Satisfaction See Table 10.2 on page 133.2 on page 133. CIMCA: International Conference on Computational Intelligence for Modelling.430 See Table 10.5 on page 321.2 on page 132.5 on page 321.5 on page 320. see [2155] SoCPaR: International Conference on SOft Computing and PAttern Recognition See Table 10.5 on page 320.5 on page 321. ALIO/EURO: Workshop on Applied Combinatorial Optimization See Table 10. CSTST: International Conference on Soft Computing as Transdisciplinary Science and Technology See Table 10. CIS: International Conference on Computational Intelligence and Security See Table 28. EUFIT: European Congress on Intelligent Techniques and Soft Computing See Table 10. WOPPLOT: Workshop on Parallel Processing: Logic. WSC: Online Workshop/World Conference on Soft Computing (in Industrial Applications) See Table 10. Control and Automation See Table 28. SMC: International Conference on Systems. USA.2 on page 134. Organization and Technology See Table 10.2 on page 131.2 on page 132. AJWS: Australia-Japan Joint Workshop on Intelligent & Evolutionary Systems See Table 28. LION: Learning and Intelligent OptimizatioN See Table 10. NV. Man.2 on page 133.5 on page 321.2 on page 131.2 on page 133. EvoCOP: European Conference on Evolutionary Computation in Combinatorial Optimization See Table 28. CA. EngOpt: International Conference on Engineering Optimization See Table 10.2 on page 133. 34 ESTIMATION OF DISTRIBUTION ALGORITHMS NICSO: International Workshop Nature Inspired Cooperative Strategies for Optimization See Table 10.2 on page 131. OBUPM: Optimization by Building and Using Probabilistic Models Workshop History: 2001/07: San Francisco. USA. and Cybernetics See Table 10. see [2149] 2000/07: Las Vegas.2 on page 132.5 on page 321.

431 FOCI: IEEE Symposium on Foundations of Computational Intelligence See Table 28. 34.5 on page 321. Evolutionary and Memetic Computing See Table 28.2 on page 134.2 on page 135.5 on page 321. The problem spaces X of the EDAs are usually the real vectors Rn or bit strings Bn and they also search in these spaces (G ≡ X). Optimization. we define the basic structure “PlainEDA” of Estimation of Distribution Algorithms and in Algorithm 34.4 Journals 1. In the general case.2 on page 228. NaBIC: World Congress on Nature and Biologically Inspired Computing See Table 28. Logic Programming. the actual loop of the EDA begins. Both algorithms begin with creating an initial a population pop (via “createPop”). as we assume in Algorithm 34. IWNICA: International Workshop on Nature Inspired Computation and Applications See Table 28.2 on page 134.5 on page 322.5 on page 321.x which belong to the initially created elements p.2. This . After this is done. ALGORITHM See Table 28.5 on page 322. SEA: International Symposium on Experimental Algorithms See Table 10.5 on page 322.3. IJCCI: International Conference on Computational Intelligence See Table 28. IEEE Transactions on Evolutionary Computation (IEEE-EC) published by IEEE Computer Society 2. Genetic Programming and Evolvable Machines published by Kluwer Academic Publishers and Springer Netherlands 34.5 on page 322.5 on page 322. META: International Conference on Metaheuristics and Nature Inspired Computing See Table 25.34. ICEC: International Conference on Evolutionary Computation See Table 28.g in the search space for all p ∈ pop.2. GICOLAG: Global Optimization – Integrating Convexity.1. the more general structure “EDA MO” is given. “createPop” might just produce a set of individuals representing random points in the search space or may as well create the points systematically following some grid. we wish to compute the objective values of the candidate solutions.5 on page 321. WPBA: Workshop on Parallel Architectures and Bioinspired Algorithms See Table 28. First. EvoRobots: International Symposium on Evolutionary Robotics See Table 28.2. SEMCCO: International Conference on Swarm.5 on page 322. SSCI: IEEE Symposium Series on Computational Intelligence See Table 28.5 on page 322. and Computational Algebraic Geometry See Table 10. we first need to derive the points p. IWACI: International Workshop on Advanced Computational Intelligence See Table 28.3 Algorithm In Algorithm 34. Krajowej Konferencji Algorytmy Ewolucyjne i Optymalizacja Globalna See Table 10. where the search space G is not necessarily the same as the problem space X.

ps) return x . ps.432 34 ESTIMATION OF DISTRIBUTION ALGORITHMS Algorithm 34. uniform distribution x ←− pop[0] continue ←− true while continue do pop ←− computeObjectives(pop. f) [0] if terminationCriterion then continue ←− f alse else mate ←− selection(mate. mps) Input: f: the objective function Input: ps: the population size Input: mps: the mating pool size Input: [implicit] terminationCriterion: the termination criterion Data: pop: the population Data: mate: the matingPool Data: M: the stochastic model Data: continue: a variable holding the termination criterion Output: x: the best individuals 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 begin pop ←− createPop(ps) M ←− initial model. mps) M ←− buildModel(mate. M) pop ←− sampleModel(M.1: x ←− PlainEDA(f. f) x ←− extractBest(pop ∪ {x} . f.

mps) Input: OptProb: a tuple (X. gpm) pop ←− computeObjectives(pop. ps)) return extractPhenotypes(archive) . f ) archive ←− extractBest(pop ∪ archive. v. f . cmpf ) if terminationCriterion then continue ←− f alse else v ←− assignFitness(pop. ps. and a comparator function cmpf Input: ps: the population size Input: mps: the mating pool size Input: [implicit] gpm: the genotype-phenotype mapping Input: [implicit] terminationCriterion : the termination criterion Data: pop: the population Data: mate: the matingPool Data: v: the fitness function Data: M: the stochastic model Data: continue: a variable holding the termination criterion Data: archive: the archive with the best individuals Output: X: the set of best candidate solutions discovered 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 begin pop ←− createPop(ps) M ←− initial model.3. cmpf ) mate ←− selection(mate. uniform distribution archive ←− ∅ continue ←− true while continue do pop ←− performGPM(pop. ALGORITHM 433 Algorithm 34. cmpf ) of the problem space X. the objective functions f .34. sampleModel(M. mps) M ←− buildModel(mate.2: X ←− EDA MO(OptProb. M) pop ←− appendList(pop.
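Stripped of the multi-objective bookkeeping (fitness assignment, archive, genotype-phenotype mapping), the loop formalized above reduces to "sample, evaluate, select, re-estimate". The following single-objective sketch is meant only to show how the sampleModel and buildModel steps plug together; the Model interface and class name are illustrative placeholders, not part of the Java framework shipped with this book.

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/* Bare-bones single-objective EDA loop in the spirit of Algorithm 34.1. */
public final class PlainEDALoop {

  /* The model can be sampled and re-estimated from the selected genotypes. */
  interface Model<G> {
    G[] sample(int ps, Random rnd);      // sampleModel
    Model<G> rebuild(G[] matingPool);    // buildModel
  }

  public static <G> G run(Model<G> model, ToDoubleFunction<G> f,
                          int ps, int mps, int generations, Random rnd) {
    G best = null;
    double bestF = Double.POSITIVE_INFINITY;
    for (int t = 0; t < generations; t++) {
      G[] pop = model.sample(ps, rnd);                  // sample new candidate solutions
      Arrays.sort(pop, Comparator.comparingDouble(f));  // minimization: best first
      double firstF = f.applyAsDouble(pop[0]);
      if (firstF < bestF) {                             // remember the best solution so far
        bestF = firstF;
        best = pop[0];
      }
      model = model.rebuild(Arrays.copyOf(pop, mps));   // truncation selection + model update
    }
    return best;
  }
}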

which is.1) or the fitness (Algorithm 34. In the latter case outlined in Algorithm 34. the cycle starts all over again unless the termination criterion terminationCriterion was met. 1974]. we can check whether we discovered a new candidate solution with better performance than the best one known until now (x. Notice that until now. M represents a specific probability distribution which can be sampled by using a random number generator creating elements following the parameters defined in M. however. With the population of new individuals. not dominated. archive pruning strategies may be necessary in order to prevent the archive of best individuals from overflowing. After the objective values are known. for instance.1 UMDA: Univariate Marginal Distribution Algorithm Maybe the simplest EDA for binary search spaces is the Univariate Marginal Distribution Algorithm (UMDA) [1973. This may be done with any given selection algorithm. but generally. In the usual. the set f contains more than one function f ∈ f . In the end. obviously. 34. This procedure allows us to use. we can determine the objective values (here sketched as applying “computeObjectives” to pop). In the general case where |f | ≥ 1. a selection step “selection” chooses mps of the best individuals from pop and puts them into the mating pool mate. 34.20) in Paragraph 56. there may be more than one candidate solution which can be considered as optimal from the perspective of the current knowledge. an additional fitness assignment process “assignFitness” may be used which breaks down the objective values to a single scalar fitness v(p) for each individual p denoting its performance in relation to the other individuals in the population pop.4.x) (Algorithm 34. which is illustrated by providing it with the previous model as parameter.4.g belonging to the optimal . we deal with a single objective function f (Algorithm 34.1). the model M is used therefore.4.4 EDAs Searching Bit Strings Most of the research on EDAs is performed in areas where the search spaces G are bit strings g ∈ G = Bn of a fixed length n.1 on page 781.434 34 ESTIMATION OF DISTRIBUTION ALGORITHMS would be done with a genotype-phenotype mapping gpm : G → X whose application we here denote as “performGPM” in the generalized algorithm variant Algorithm 34. The UMDA constructs a model M storing one real number M[i] corresponding to the estimated probability that the bit at index i in the genotype p⋆ . the EDA proceeds exactly like any normal Evolutionary Algorithm.1). a probabilistic model M representing the traits elements in mate is computed with an EDA-specific procedure “buildModel”. In the next step.2.3 and provide a simple example implementation of its modules in Java on top of our basic EDA (see Listing 56. Algorithm 34. We therefore define here the model type specific “sampleModel”operation whose results then can either replace the old population completely or (as in case of Algorithm 34. but truncation or fitness proportional selection are often preferred in EDAs. Instead of creating a new population based on elements in mate by. Based on the candidate solutions p. this is not required. for instance.2. multi-objective optimization may be performed as well.2). for instance. applying mutation and crossover operators as it would be done in Genetic Algorithms or Genetic Programming.1. Then.2) be merged into it. an EDA for bit strings Bn while actually optimizing antenna designs [3046].x. We define it here as “UMDA” in Algorithm 34. Notice that in the latter case.2. 
Based on the objective values f(p. the single best candidate solution discovered is returned in the simple case Algorithm 34.1 or the archive of best solutions in the general case Algorithm 34. Normally. We use the operation “extractBest” in both algorithm variants do sketch this process. This operation may also involve previously accumulated knowledge such as models created in past generations. single-objective case.

4.4.3. 0.6.2: A rough sketch of the UMDA process.4 P(g[1]=1)=0. 0. 0. 1.1)=0. 1) Repeat 5/6 ps (population size) times. (0.7 randUni[0. 1. . 0.1)=0. (0. (1. 1. then go back to 1.1)=0.5 randUni[0. 1. 1. 0. Illustrated: Indivduals in the mating pool: g1 g2 g3 g4 g5 g6 g7 g8 g9 g10 = = = = = = = = = = (1. 1.1)=0.3 randUni[0. 0. (1. 0. 1. 1. 0.34. 1) 1) 1) 0) 0) 1) 1) 1) 0) 1) 2 3 4 Model building: #0 #1 2 6 3 4 3 8 4 7 6 7 Model: M = (0. 0.6 P(g[3]=1)=0. 0.2 P(g[4]=1)=0.3) P(g[4]=0)=0.8 P(g[2]=0)=0.7 Rules for Sampling: P(g[3]=0)=0. 0. 0. 1. 1. (1. 0. 0. 1. 1. 0.2 6 7 New genotype: g = (1. 0. 1. 1. (1. 0. (1. 1. EDAS SEARCHING BIT STRINGS 435 1 Select individuals. 1.1)=0.2.9 randUni[0. (1.6 5 Model sampling: randUni[0.4 P(g[1]=0)=0. 0.3 P(g[0]=1)=0. (1. 1. Figure 34. 0.7 P(g[0]=0)=0. 1.3 P(g[2]=1)=0.

3: x ←− UMDA(f. . ps) pop ←− performGPM(pop.436 34 ESTIMATION OF DISTRIBUTION ALGORITHMS Algorithm 34. .21 on page 781 x ←− null continue ←− true while continue do pop ←− sampleModelUMDA(M. gpm) pop ←− computeObjectives(pop. . mps) Input: f: the objective function Input: ps: the population size Input: mps: the mating pool size Input: [implicit] terminationCriterion: the termination criterion Input: [implicit] n: the number of bits per genotype Data: pop: the population Data: mate: the matingPool Data: M: the stochastic model Data: continue: a variable holding the termination criterion Output: x: the best individual 1 2 3 4 5 6 7 8 9 10 11 12 13 14 begin M ←− (0. f. f) [0] if terminationCriterion then continue ←− f alse else M ←− buildModelUMDA(mate) return x . . 0. f) mate ←− rouletteWheelSelectionr (pop. 0.5. f) [0] else optres ←− extractBest(mate ∪ {x} . ps.5) // see Listing 56. mps) if x = null then optres ←− extractBest(mate.5.
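The two UMDA-specific steps invoked in Algorithm 34.3, namely sampling genotypes from the probability vector and re-estimating that vector from the mating pool, are spelled out in Algorithms 34.4 and 34.5 below. The following fragment mirrors them under the convention of Figure 34.2 (M[j] is the probability of a 1 at locus j); genotypes are represented as boolean arrays and the class name is illustrative, i.e., this is not the implementation referenced in Listing 56.20.

import java.util.Random;

/* Model sampling and model building of the UMDA for bit-string genotypes. */
public final class UMDAModel {
  /* Draws ps genotypes; bit j becomes 1 with probability model[j]. */
  public static boolean[][] sample(double[] model, int ps, Random rnd) {
    boolean[][] pop = new boolean[ps][model.length];
    for (int i = 0; i < ps; i++) {
      for (int j = 0; j < model.length; j++) {
        pop[i][j] = rnd.nextDouble() < model[j];
      }
    }
    return pop;
  }

  /* Estimates model[j] as the relative frequency of 1-bits at locus j in the mating pool. */
  public static double[] build(boolean[][] mate) {
    double[] model = new double[mate[0].length];
    for (boolean[] g : mate) {
      for (int j = 0; j < g.length; j++) {
        if (g[j]) { model[j] += 1.0; }
      }
    }
    for (int j = 0; j < model.length; j++) {
      model[j] /= mate.length;
    }
    return model;
  }
}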

for instance.4: pop ←− sampleModelUMDA(M.4 as “sampleModelUMDA” and implement it in Listing 56. . We illustrate this process in Algorithm 34. EDAS SEARCHING BIT STRINGS 437 candidate solution p⋆ .g.4. 1) ≤ M[j] then p.g. each element of the vector M is 0.23 on page 783. The model is sampled in order to fill the population with individuals. Therefore.g if g [j] = 1 then c ←− c + 1 c M[j] ←− len(mate) The model building process Algorithm 34. UMDA prescribes no specific selection scheme. Initially. j. the objective values of the candidate solutions p. .5 of the UMDA is very simple and you can .5: M ←− buildModelUMDA(mate) Input: mate: the selected individuals Input: [implicit] n: the number of bits per genotype Data: i.g. p) A genotype-phenotype mapping may be performed which transforms the genotypes p. 1) pop ←− addItem(pop. Algorithm 34. . 0) for j ←− 0 up to n − 1 do c ←− 0 for i ←− 0 up to len(mate) − 1 do g ←− mate[i]. A rough sketch of the UMDA process is given in Figure 34. j: counter variables Data: p: an individual record and its corresponding genotype p.34. For the jth bit in a new genotype p.g Output: pop: the new population 1 2 3 4 5 6 7 8 begin pop ←− () for i ←− 1 up to ps do p. 1) uniformly distributed between (inclusively) 0 and (exclusively) 1 are used. real-valued random numbers randomUni[0. ps) Input: M: the model Input: ps: the target population size Input: [implicit] n: the number of bits per genotype Data: i. 0.3.2.g ←− addItem(p. the case in circuit design [3093]. as is.5. With any of these operators (we chose the fitness proportionate variant in Algorithm 34.x is 1. but setups with truncation selection are used as well [3093]. c: counter variables Data: g: an element from the search space g ∈ G Output: M: the new model 1 2 3 4 5 6 7 8 begin M ←− (0. mps individuals are copied into the mating pool.x if needed. The original papers [1973] discuss fitness proportionate and tournament selection. 0) else p. . Then.g of the individuals in the population pop to their corresponding phenotypes p.g ←− addItem(p.x can be determined. such a number is drawn and the bit becomes 0 if it is smaller or equal to M[j]. Algorithm 34.g ←− () for j ←− 0 up to n − 1 do if randomUni[0.

x[j] = 1. Here. It should be noted that in the original version of the algorithm given in [193]. too. 0. ps) pop ←− performGPM(pop. ps. the cycle starts again until the termination criterion terminationCriterion is met.5) x ←− null continue ←− true while continue do pop ←− sampleModelUMDA(M. c divided the number of individuals in the mating pool hence is the relative frequency h(p.6: x ←− PBIL(f. . The model is therefore sampled like in the UMDA and the algorithm itself also proceeds pretty much in the same way.5. 34.4. gpm) pop ←− computeObjectives(pop. The main difference is the model building step presented as Algorithm 34. the single probabilities were randomly 2 http://en. mps) if x = null then x ←− extractBest(mate. mps) Input: f: the objective function Input: ps: the population size Input: mps: the mating pool size Input: [implicit] n: the number of bits per genotype Input: [implicit] terminationCriterion: the termination criterion Data: pop: the population Data: mate: the matingPool Data: M: the stochastic model Data: continue: a variable holding the termination criterion Output: x: the best individual 1 2 3 4 5 6 7 8 9 10 11 12 13 14 begin M ←− (0.5. .2 PBIL: Population-Based Incremental Learning One of the first steps into EDA research was the population-based incremental learning algorithm2 PBIL published in 1994 by Baluja [193]. truncation selection is used which allows only the best mps individuals to enter mating pool. . f. 195].6 in the version given in [195]. Algorithm 34. f) [0] else x ←− extractBest(mate ∪ {x} .438 34 ESTIMATION OF DISTRIBUTION ALGORITHMS find a Java example implementation in Listing 56. For the bit at index j. Here we define it as Algorithm 34. With the new probability vector. a simple mutation was applied to the models. Here. f) mate ←− truncationSelection(pop.22 on page 782. M) return x The structure of the models in PBIL is exactly the same as in the UMDA: For each bit in the genotypes. f) [0] if terminationCriterion then continue ←− f alse else M ←− buildModelPBIL(mate. the number c of times it was 1 is counted over all genotypes g stored the individual records in the mating pool mate. a new model M is created by merging the current model M′ and the information provided by the mating pool mate. 0. [194. len(mate)) and serves as the estimate of the probability that the locus j should be 1 in the optimal solution. The learning rate λ defines how much each bit in the genotypes g of the selected individuals p ∈ mate contributes to the new model.wikipedia. . In PBIL.7.org/wiki/Population-based_incremental_learning [accessed 2009-08-20] . a real number denoting the estimated probability that that bit should be 1 in the optimal genotype.

i. In Algorithm 34. because fp1 . gpm) mate ←− computeObjectives(mate. . .. In the single-objective case. [1198.. e. e.5) while ¬terminationCriterionCGA(M) do mate ←− sampleModelUMDA(M. only the best individual had influence on M.5.8: x ←− cGA(f. M′ ) Input: mate: the selected individuals Input: M′ : the old model Input: [implicit] λ: the learning rate Input: [implicit] n: the number of bits per genotype Data: i: a counter variable Data: g: an element from the search space g ∈ G Output: MT : a temporary model Output: M: the new model 1 2 3 4 439 begin MT ←− buildModelUMDA(mate) for i ←− 0 up to n − 1 do M[i] ←− (1 − λ) ∗ M′ [i] + λ ∗ MT [i] changed by a very small number. for instance. EDAS SEARCHING BIT STRINGS Algorithm 34. Also. p1 . the absolute frequency of its corresponding genotype p1 . M. The idea of the cGA is the following: Assume that we have a normal Genetic Algorithm and two individuals p1 and p2 compete in a binary tournament. Then. HCwL) and M¨hlenbein [1973] (Incremental Univariate u Marginal Distribution Algorithm. [1652] (Hill Climbing with Learning. usually one of them will win and reproduce whereas the other vanishes. Similar algorithms have been introduced by Kvasnicka et al.g in the population will increase by one.3 cGA: Compact Genetic Algorithm The compact Genetic Algorithm (cGA) by Harik et al.x < fp2 . 2) mate ←− performGPM(mate. . ps) return transformModelCompGA(M) algorithm which modifies this vector M so that there is a direct relation between it and the population it represents [2153]. the mating pool size was one. IUMDA). 0.x. 1199] again uses a vector of (real-valued) probabilities to represent the estimation of the solutions with the best features according to the objective function f.7: M ←− buildModelPBIL(mate.x ≺ p2 .8 we provide an outline of this Algorithm 34.5. . 34. For a population of size . i. f) M ←− buildModelCGA(mate. ps) Input: f: the objective function Input: ps: the population size Input: [implicit] n: the number of bits per genotype Data: mate: the matingPool Data: M: the stochastic model Output: x: the best individual 1 2 3 4 5 6 7 8 begin M ←− (0.x in case of minimization.4.34.4. 0. Assume further that p1 prevails in the competition.

since the individual is copied as a whole.4). ps) Input: mate: the two individuals Input: M′ : the old model Input: ps: the simulated population size Input: [implicit] n: the number of bits per genotype Data: p1 . 2). Hence.g. Algorithm 34.9. Further theoretical results concerning the efficiency of the cGA can be found in [833. M′ .440 34 ESTIMATION OF DISTRIBUTION ALGORITHMS 1 ps. Algorithm 34. This obviously also holds for every single gene in p1 . see Algorithm 34. only genotypes which exactly equal to the model can be sampled and the vector will not change anymore. this means that its relative frequency increases by ps . false): whether the cGA should terminate 1 2 3 4 begin for i ←− n − 1 down to 0 do if (M[i] > 0) ∧ (M[i] < 1) then return false return true The result of the optimization process is the phenotype gpm(g) corresponding to the genotype g ≡ M defined by the model M in the converged state.11.g [i] = 0 then M[i] ←− M′ [i] + else M[i] ←− M′ [i] − 1 ps 1 ps 11 The optimization process has converged if all elements of the model M became either 1 or 0.10: (true.9: M ←− buildModelCGA(mate.g [i] then if p1 . The trivial way to produce this element is specified here as Algorithm 34. .x then p1 ←− p2 p2 ←− mate[0] for i ←− n − 1 down to 0 do if p1 . as termination criterion “terminationCriterionCGA” we simply need to check whether this convergence already has taken place or not.g [i] = p2 . The cGA tries to emulate this situation by creating exactly two new genotypes in each iteration by using the same sampling method already utilized in the UMDA (sampleModelUMDA(M.x ≺ p1 . Then. p2 : the individual variables Data: i: counter variable Output: M: the new model 1 2 3 4 5 6 7 8 9 10 begin M ←− M′ p1 ←− mate[0] p2 ←− mate[1] if p2 . false) ←− terminationCriterionCGA(M) Input: M: the model Input: [implicit] n: the number of bits per genotype Data: i: a counter Output: (true. These two individuals then compete with each other and the genes of the winner then are used to update the model M as sketched in Algorithm 34.
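The per-bit model update of the cGA is small enough to be shown in a few lines. The sketch below assumes bit-string genotypes stored as boolean arrays and a simulated population size ps; it only illustrates the update rule of Algorithm 34.9 (with an additional clamping of the probabilities to [0, 1]) and is not the hardware-oriented implementation referred to above.

/* cGA model update: shift each probability by 1/ps towards the winner's bits. */
public final class CompactGAUpdate {
  public static void update(double[] model, boolean[] winner, boolean[] loser, int ps) {
    double step = 1.0 / ps;
    for (int i = 0; i < model.length; i++) {
      if (winner[i] != loser[i]) {              // only loci where the two genotypes differ
        model[i] += winner[i] ? step : -step;   // move towards the winning bit value
        if (model[i] < 0.0) { model[i] = 0.0; } // safeguard: keep the probability in [0, 1]
        if (model[i] > 1.0) { model[i] = 1.0; }
      }
    }
  }
}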


34.5 EDAs Searching Real Vectors

A variety of Estimation of Distribution Algorithms has been developed for finding optimal real vectors from the Rn. Mühlenbein et al. [1976], for instance, introduce one of the first real-valued EDAs, and in the following, we discuss multiple such approaches.

34.5.1 SHCLVND: Stochastic Hill Climbing with Learning by Vectors of Normal Distribution

The Stochastic Hill Climbing with Learning by Vectors of Normal Distribution (SHCLVND) by Rudlof and Köppen [2346] is a kind of real-valued version of the PBIL. For search spaces G ⊆ Rn, it builds models M which consist of a vector of expected values M.µ and a vector of standard deviations M.σ, where n is the number of elements of the genotypes. We introduce it here as Algorithm 34.12. Initially, the expected value vector M.µ is set to the point right in the middle of the search space, which is delimited by the vector of minimum values Ǧ and the vector of maximum values Ĝ. Notice that these limits are soft and genotypes may be created which lie outside them (due to the unconstrained normal distribution used for sampling).

Algorithm 34.12: x ←− SHCLVND(f, ps, mps)
  Input: f: the objective function
  Input: ps: the population size
  Input: mps: the mating pool size
  Input: [implicit] terminationCriterion: the termination criterion
  Input: [implicit] n: the number of elements of the genotypes
  Input: [implicit] Ǧ, Ĝ: the minimum and maximum vector delimiting the problem space G = [Ǧ[0], Ĝ[0]] × [Ǧ[1], Ĝ[1]] × …
  Input: [implicit] rangeToStdDev: a factor for converting the range to the initial standard deviations σ
  Data: pop: the population
  Data: mate: the mating pool
  Data: M = (µ, σ): the stochastic model
  Data: continue: a variable holding the termination criterion
  Output: x: the best individual
  begin
    M.µ ←− (Ǧ + Ĝ) / 2
    M.σ ←− rangeToStdDev ∗ (Ĝ − Ǧ)
    x ←− null
    continue ←− true
    while continue do
      pop ←− sampleModelSHCLVND(M, ps)
      pop ←− performGPM(pop, gpm)
      pop ←− computeObjectives(pop, f)
      mate ←− truncationSelection(pop, f, mps)
      if x = null then x ←− extractBest(mate, f)[0]
      else x ←− extractBest(mate ∪ {x}, f)[0]
      if terminationCriterion then continue ←− false
      else M ←− buildModelSHCLVND(mate, M)
    return x

p) Algorithm 34.σ return M After sampling a population pop of ps individuals. one univariate normally distributed random number randomNorm(µ[j]. σ [j])is created for each locus j of the genotypes according to the value µ[j] denoting the expected location of the optimum in the jth dimension and the prescribed standard deviation σ [j]. a process here defined as “buildModelSHCLVND” in Algorithm 34.14: M ←− buildModelSHCLVND(mate.5. ps) Input: M = (µ. and computing the objective values for the individual records in the “computeObjectives” step.µ + λ ∗ g M. mps elements are selected using truncation selection – exactly like in the PBIL.g. σ) are sampled in order to create new points in the search space. These mps best elements are then used to update the model.σ is initialized by multiplying a constant rangeToStdDev with the ˆ ˇ range spanned by G − G. EDAS SEARCHING REAL VECTORS 443 standard deviations M.σ for i ←− 1 up to ps do p. Algorithm 34.g ←− addItem(p. 1]: the standard deviation reduction rate Data: g: the arithmetical mean of the selected vectors Output: M: the new model 1 begin g ←− 1 len(mate) len(mate)−1 2 3 4 5 mate[i].g ←− () for j ←− 0 up to n − 1 do p.34.5 in the original paper by Rudlof and K¨ppen [2346]. M′ ) Input: mate: the selected individuals Input: M′ : the old model Input: [implicit] n: the number of elements of the genotypes Input: [implicit] λ: the learning rate Input: [implicit] σred ∈ [0. Like PBIL’s model . j: counter variables Data: p: an individual record and its corresponding genotype p. Therefore. performing a genotype-phenotype mapping “gpm” if necessary.13 illustrates how the models M = (µ.13: pop ←− sampleModelSHCLVND(M. randomNorm(µ[j].µ ←− (1 − λ) ∗ M .σ ←− σred ∗ M′ .14.g i=0 ′ M. σ [j])) pop ←− addItem(pop. The constant rangeToStdDev is set to rangeToStdDev = 0. o Algorithm 34. σ): the model Input: ps: the target population size Input: [implicit] n: the number of elements of the genotypes Data: i.g Data: µ: the vector of estimated expected values Data: σ: the vector of standard deviations Output: pop: the new population 1 2 3 4 5 6 7 8 9 begin pop ←− () µ ←− M.µ σ ←− M.

Like PBIL's model updating step (Algorithm 34.7), "buildModelSHCLVND" uses a learning rate λ for updating the expected value vector model component M.µ, which defines how much this vector is shifted into the direction of the mean vector ḡ of the genotypes p.g of the individuals p in the mating pool mate. The M.σ model component is not learned. Instead, it is simply multiplied with a simple reduction factor σred ∈ [0, 1]. Rudlof and Köppen [2346] assume a goal value σ̂G which the standard deviations are to take on after t generations of the algorithm and set

  σred = (σ̂G)^(1/t)                                                     (34.1)

which, with t = 2500 generations and σ̂G = 1/1000, amounts to σred ≈ 0.997 241 in their experiment [2346].

34.5.2 PBILC: Continuous PBIL

Sebag and Ducoulombier [2443] suggest a similar extension of the PBIL to continuous search spaces: the PBILC. For updating the center vector M.µ of the estimated distribution, they consider the genotypes gb,1 and gb,2 of the two best and the genotype gw of the worst individual discovered in the current generation. These vectors are combined in a way similar to the reproduction operator used in Differential Evolution [2220, 2621], as listed in Equation 34.2:

  M.µ = (1 − λ) ∗ M′.µ + λ ∗ (gb,1 + gb,2 − gw)                          (34.2)

For updating M.σ, they suggest four possibilities:
1. to set all of its elements to the same constant value, which would, however, either prevent the search from becoming very specific or slow it down significantly, depending on the value,
2. to set each of its elements to the variance found in the corresponding loci of the K best genotypes of the current generation,
3. to adjust each of its elements to the variance found in the corresponding loci of the K best genotypes of the current generation using the same learning mechanism as for the center vector, or
4. to allow evolution to adjust M.σ itself in a way similar to Evolution Strategy.
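As a concrete illustration of Equation 34.2, the following Python fragment performs one PBILC-style model update, assuming that the population has already been evaluated and that the second of the four options above (per-locus dispersion of the K best genotypes) is used for M.σ; the function name and parameter defaults are illustrative and not taken from the original sources.

def pbilc_update(mu, pop, f, lam=0.05, k=10):
    # One PBILC model update for minimization; pop is a list of real-valued genotypes.
    ranked = sorted(pop, key=f)
    gb1, gb2, gw = ranked[0], ranked[1], ranked[-1]   # two best and the worst genotype
    n = len(mu)
    # Equation 34.2: shift mu towards gb1 + gb2 - gw, a Differential-Evolution-like move
    mu = [(1 - lam) * mu[j] + lam * (gb1[j] + gb2[j] - gw[j]) for j in range(n)]
    # standard deviation of the K best genotypes per locus (cf. option 2 above)
    best_k = ranked[:k]
    sigma = []
    for j in range(n):
        m_j = sum(g[j] for g in best_k) / len(best_k)
        var_j = sum((g[j] - m_j) ** 2 for g in best_k) / len(best_k)
        sigma.append(var_j ** 0.5)
    return mu, sigma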

34.5.3 Real-Coded PBIL

The real-coded PBIL by Servet et al. [2455, 2456] follows a different approach. Here, the model M consists of three vectors: the upper and lower boundary vectors M.Ĝ and M.Ǧ limiting the current region of interest, and the vector M.P denoting, for each locus j, the probability that a good genotype is located in the upper half of the interval [M.Ǧ[j], M.Ĝ[j]]. In the startup phase, the boundary vectors M.Ǧ and M.Ĝ of the model M are initialized to the problem-specific boundaries of the search space and the vector M.P is set to (0.5, 0.5, …, 0.5). We define this real-coded PBIL as Algorithm 34.15, which basically proceeds almost exactly like the PBIL for binary search spaces (see Algorithm 34.6). The model can be sampled using, for instance, independent uniformly distributed random numbers drawn from the interval prescribed by [M.Ǧ[j], M.Ĝ[j]] for the jth gene, as done in Algorithm 34.17. After a possible genotype-phenotype mapping and determining the objective values, the model M is to be updated with Algorithm 34.16: first, M.P is updated by merging it with the information of whether good traits can be found in the upper halves of the current intervals.

Algorithm 34.15: x ←− RCPBIL(f, ps, mps)
  Input: f: the objective function
  Input: ps: the population size
  Input: mps: the mating pool size
  Input: [implicit] Ǧ, Ĝ: the minimum and maximum vector delimiting the problem space G = [Ǧ[0], Ĝ[0]] × [Ǧ[1], Ĝ[1]] × …
  Data: pop: the population
  Data: mate: the mating pool
  Data: M = (P, Ǧ, Ĝ): the stochastic model
  begin
    M.P ←− (0.5, 0.5, …, 0.5)
    M.Ǧ ←− Ǧ
    M.Ĝ ←− Ĝ
    x ←− null
    continue ←− true
    while continue do
      pop ←− sampleModelRCPBIL(M, ps)