Optimization - An Introduction

Lecture 1


Outline
– Basic optimization background
– Introduction to Genetic Algorithms (GA)

Optimization
Optimization is the act of obtaining the best result under given circumstances. In engineering problems the emphasis is on maximizing or minimizing a certain goal.

Optimization (Contd.)
(Figure: plots of f(x) and –f(x) against x, showing that the point which minimizes f(x) is the same point that maximizes –f(x).)

Optimization (contd.)
Optimum-seeking methods are also known as mathematical programming techniques and are generally studied as a part of operations research.

Methods of Operations Research
– Mathematical programming techniques
– Stochastic process techniques
– Statistical methods

Mathematical Programming Techniques
– Calculus methods
– Calculus of variations
– Nonlinear programming
– Geometric programming
– Quadratic programming
– Linear programming
– Dynamic programming
– Integer programming
– Stochastic programming
– Separable programming
– Multi-objective programming
– Network methods: CPM/PERT
– Game theory

Stochastic Process Techniques
– Statistical decision theory
– Markov processes
– Queueing theory
– Renewal theory
– Simulation methods
– Reliability theory

Statistical Methods
– Regression analysis
– Cluster analysis, pattern recognition
– Design of experiments
– Discriminant analysis (factor analysis)

Engineering Applications
– Design of aircraft and aerospace structures for minimum weight
– Finding the optimal trajectories of space vehicles
– Design of civil engineering structures (frames, foundations, bridges, towers, chimneys and dams) for minimum cost

Engineering Applications (Contd.)
– Minimum weight design of structures for earthquake, wind and other types of random loading
– Design of water resources systems for maximum benefit
– Optimal plastic design of structures
– Optimum design of linkages, cams, gears, machine tools and other mechanical components

Engineering Applications (Contd.)
– Selection of machining conditions in metal cutting processes for minimum production cost
– Design of material handling equipment like conveyors, trucks and cranes for minimum cost
– Design of pumps, turbines and heat transfer equipment for maximum efficiency

Engineering Applications (Contd.)
– Optimum design of electrical machinery like motors, generators and transformers
– Optimum design of electrical networks
– Shortest route taken by a salesman visiting different cities during one tour
– Optimal production planning, controlling and scheduling

Engineering Applications (Contd.)
– Analysis of statistical data and building empirical models from experimental results to obtain the most accurate representation of the physical phenomenon
– Optimum design of chemical processing equipment and plants
– Design of optimum pipeline networks for process industries

Engineering Applications (Contd.)
– Selection of site for an industry
– Planning of maintenance and replacement of equipment
– Inventory control
– Allocation of resources or services among several activities to maximize the benefit
– Optimum design of control systems

Statement of an Optimization Problem
An optimization or mathematical programming problem can be stated as follows: find the vector X = (x1, x2, …, xn)T which minimizes f(X), subject to the constraints gj(X) ≤ 0, j = 1, 2, …, m, and Lj(X) = 0, j = 1, 2, …, p.

Statement of Optimization Problem (Contd.)
Here X is an n-dimensional vector called the design vector, f(X) is the objective function, gj(X) are the inequality constraints and Lj(X) are the equality constraints; n is the number of variables, m the number of inequality constraints and p the number of equality constraints.

Statement of Optimization Problems (Contd.)
The problem stated above is known as a constrained optimization problem.

Definition of Terminology
– Design vector
– Design constraints
– Constraint surface
– Objective function

Design Vector

Design Vector
– Pre-assigned parameters
– Design or decision variables
Example: gear pair – face width b, number of teeth T1 and T2, center distance d, pressure angle ψ, the tooth profile, material.

Design Vector (Contd.)
Pre-assigned variables: pressure angle ψ, the tooth profile, material.
Design vector X = {x1 x2 x3} = {b, T1, T2}.
Design space: N-dimensional.
Design point {1.0, 20, 40} – possible solution; design point {1.0, −20, 40} – impossible solution.

Design Constraints
Design variables cannot be chosen arbitrarily; they have to satisfy certain functional and other requirements.
Types of constraints:
– Behavior or functional constraints
– Geometric or side constraints

Behavior/Functional Constraints
They represent limitations on the behavior or performance of the system.
Examples:
– Face width b cannot be taken smaller than a certain value due to the strength requirement.
– The ratio T1/T2 is dictated by the speeds of the input and output shafts N1 and N2.

Geometric/Side Constraints
They represent physical limitations on the design variables such as availability, fabricability and transportability.
Examples:
– T1 and T2 cannot be real (non-integer) numbers.
– Upper and lower bounds on T1 and T2 due to manufacturing limitations.

Constraint Surface
(Figure: the x1–x2 design space bounded by behavior constraints and a side constraint, showing a free acceptable point, a free unacceptable point, a bound acceptable point and a bound unacceptable point.)

Objective Function Surfaces
The locus of all points satisfying f(X) = c = constant forms a hyper-surface in the design space.
(Figure: contours f(X) = C1, C2, C3 with C1 < C2 < C3 in the x1–x2 plane; the optimum point lies on the contour of lowest value, C1.)

Major Hurdle
When the number of design variables exceeds two or three, the constraint and objective function surfaces become too complex even to visualize, and the problem has to be solved purely as a mathematical problem.

Optimization Procedure
Need for optimization → choose design variables → formulate constraints → formulate objective function → set up variable bounds → choose optimization algorithm → obtain solutions.

References
– S. S. Rao, Optimization – Theory and Applications, Wiley Eastern Limited.
– Kalyanmoy Deb, Optimization for Engineering Design – Algorithms and Examples, PHI, India.

Optimization and Genetic Algorithm (GA) – Introduction

Lecture 2

Recap of Lecture 1
– Statement of an optimization problem: find the design vector X = (x1, x2, …, xn)T which minimizes the objective function f(X), subject to the inequality constraints gj(X) ≤ 0, j = 1, 2, …, m, and the equality constraints Lj(X) = 0, j = 1, 2, …, p. This is a constrained optimization problem.
– Terminology: design vector, design constraints, constraint surface, objective function; contours f(X) = C in the design space locate the optimum point.
– Major hurdle: once the number of design variables exceeds two or three, the constraint and objective function surfaces become too complex even to visualize, and the problem has to be solved purely mathematically.
– Optimization procedure: need for optimization → choose design variables → formulate constraints → formulate objective function → set up variable bounds → choose optimization algorithm → obtain solutions.

Nontraditional Optimization Algorithms
– Genetic Algorithms (GAs): mimic the principles of natural genetics and natural selection to constitute search and optimization procedures – an evolutionary computing algorithm.
– Simulated annealing: mimics the cooling phenomenon of molten metals to constitute a search procedure.

Background
The field is based on and inspired by observation and computer simulation of natural processes in the real world.
Main inspiration – Darwin's theory of evolution.
Application of these ideas to complex optimisation problems and machine learning.

Darwin's Theory of Evolution
– During reproduction, traits found in parents are passed on to their offspring.
– Variations (mutations) are present naturally in all species, producing new traits.
– A process called natural selection 'selects' the individuals best adapted to the environment.
– Over long periods of time, variations can accumulate and produce new species.

Natural Selection
– The fittest survive longest.
– Characteristics encoded in genes are transmitted to offspring and tend to propagate into new generations.
– In sexual reproduction, the chromosomes of offspring are a mix of their parents'.
– An offspring's characteristics are partly inherited from parents and partly the result of new genes created during the reproduction process.

Motivation
– Evolutionary computing as engineering: to find solutions to complex optimisation problems and machine learning problems.
– Evolutionary computing as science: to simulate real-life phenomena (e.g. cellular automata, artificial life).

Nature-to-Computer Mapping
– Individual → solution to a problem
– Population → set of solutions
– Fitness → quality of a solution
– Chromosome → encoding for a solution
– Gene → part of the encoding of a solution
– Crossover and mutation → search operators
– Natural selection → reuse of good (sub-)solutions

Search Space
– The set of all possible encodings of solutions defines the search space; one measure of the complexity of the problem is the size of the search space.
– Crossover and mutation implement a pseudorandom walk through the search space.
– The walk is random because crossover and mutation are non-deterministic.
– The walk is directed in the sense that the algorithm aims to maximise the quality of solutions using a fitness function.

Local and Global Search
– Local search: looking for solutions near existing solutions in the search space – crossover is the main operator for achieving this.
– Global search: looking at completely new areas of the search space – mutation is the main operator for achieving this.

A Generic Evolutionary Algorithm
– Initialise a population
– Evaluate the population
– While the termination condition is not met, do:
  – Select a sub-population based on fitness
  – Produce offspring of the population using crossover
  – Mutate offspring stochastically
  – Select survivors based on fitness
(A code sketch of this loop follows after the next slide.)

Evolutionary Computing
On the basis of:
– the structures undergoing optimisation or evolution
– the reproduction strategy
– the genetic operators used
there are five groups of algorithms:
– Evolutionary programming
– Evolutionary strategies
– Classifier systems
– Genetic algorithms
– Genetic programming
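The generic loop above can be written compactly. The following Python sketch is only an illustration under assumptions of my own: the problem-specific pieces (random_solution, fitness, crossover, mutate) are hypothetical callables the user must supply, crossover is assumed to return a single child, selection is simple truncation, and termination is a fixed generation budget rather than any scheme prescribed by the slides.

```python
import random

def evolve(random_solution, fitness, crossover, mutate,
           pop_size=20, generations=50):
    """Generic evolutionary loop: evaluate, select, recombine, mutate, repeat."""
    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):                        # termination: fixed generation budget
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]                # select a sub-population based on fitness
        offspring = []
        while len(offspring) < pop_size:
            a, b = random.sample(parents, 2)            # pick two parents
            offspring.append(mutate(crossover(a, b)))   # crossover, then stochastic mutation
        population = offspring                          # offspring become the next generation
    return max(population, key=fitness)
```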

Genetic Algorithm
– Directed search algorithms based on the mechanics of biological evolution.
– Developed by John Holland, University of Michigan (1970s):
  – to understand the adaptive processes of natural systems
  – to design artificial systems software that retains the robustness of natural systems

Genetic Algorithm (Contd.)
– Provide efficient, effective techniques for optimization and machine learning applications.
– Widely used today in business, scientific and engineering circles.

recombination) Parameter settings (practice and art) 22 11 .Classes of Search Techniques Search techniques Calculus-based techniques Guided random search techniques Enumerative techniques Direct methods Indirect methods Evolutionary algorithms Simulated annealing Dynamic programming Finonacci Newton Evolutionary strategies Genetic algorithms Parallel Sequential Centralized Distributed Steady-state Generational 21 Components of GA A problem to solve. and . Encoding technique (gene... chromosome) Initialization procedure (creation) Evaluation function (environment) Selection of parents (reproduction) Genetic operators (mutation.

A Simple GA Example
Problem: find the value of x which maximises the function f(x) = x^2 on the integer range x in [0, 31].
Method:
– 1) Use 5 bits to represent a solution.
– 2) Choose a population of 4 individuals (small by GA standards), with values chosen at random.

Sample population: 01101, 11000, 01000, 10011
Fitness values: 169, 576, 64, 361 (average 293)
Reproduction (roulette wheel) probabilities: 14.4%, 49.2%, 5.5%, 30.9%

Selection for crossover: 01101 and 11000, crossover after bit 4 (random) → 01100 and 11001
Another selection for crossover: 11000 and 10011, crossover after bit 2 (random) → 11011 and 10000

New population after the first generation: 01100, 11001, 11011, 10000
Fitness values: 144, 625, 729, 256 (average 439)
The next generation will start with this population.

Mutation: change one bit probabilistically, e.g. pm = 0.001
– Expected probability of a mutation of one individual is 0.005
– Expected probability of a mutation in the whole family is 0.02

Eventually we get convergence, with best individuals of 11111 and a fitness of 961.
This problem was very easy because:
– Very small search space
– Only one parameter
– Very smooth, steep function with no local maxima
– The encoding was obvious
We will look at each stage of a genetic algorithm in detail, considering some of the different methods and problems that can be encountered.

Brighter Side of GA
– Concept is easy to understand
– Modular, separate from application
– Supports multi-objective optimization
– Good for "noisy" environments
– Always an answer; the answer gets better with time
– Inherently parallel, easily distributed

Brighter Side of GA (Contd.)
– Many ways to speed up and improve a GA-based application as knowledge about the problem domain is gained
– Easy to exploit previous or alternate solutions
– Flexible building blocks for hybrid applications
– Substantial history and range of use

When to Use a GA?
– Alternate solutions are too slow or overly complicated
– Need an exploratory tool to examine new approaches
– Problem is similar to one that has already been successfully solved by using a GA
– Want to hybridize with an existing solution
– Benefits of the GA technology meet key problem requirements

GA Applications (domain – application types)
– Control: gas pipeline, pole balancing, missile evasion, pursuit
– Design: semiconductor layout, aircraft design, keyboard configuration, communication networks
– Scheduling: manufacturing, facility scheduling, resource allocation
– Robotics: trajectory planning
– Machine learning: designing neural networks, improving classification algorithms, classifier systems
– Signal processing: filter design
– Game playing: poker, checkers, prisoner's dilemma
– Combinatorial optimization: set covering, travelling salesman, routing, bin packing, graph colouring and partitioning

References
Genetic Algorithms
– Genetic Algorithms in Search, Optimisation and Machine Learning, D. E. Goldberg, Addison-Wesley, 1989 (recommended)
– Introduction to Genetic Algorithms, M. Mitchell, MIT Press, 1996
– Special Issue on Genetic Algorithms, Machine Learning Journal, Vol. 3, No. 2/3, Kluwer Academic Publishers, 1988 (reference)

References (Contd.)
Genetic Programming
– Genetic Programming: On the Programming of Computers by Means of Natural Selection, John R. Koza, MIT Press, 1992 (recommended)
– Genetic Programming: An Introduction, W. Banzhaf, R. E. Keller and F. D. Francone, Morgan Kaufmann, 1998 (reference)

Other Sources of Material
Online set of notes, Introduction to Evolutionary Computation: Riccardo Poli, University of Birmingham.
Web pages at http://www.cs.bham.ac.uk/~rmp/slide_book/slide_book.html

"Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things, solutions you might not otherwise find in a lifetime." – Salvatore Mangano, Computer Design, May 1995


Genetic Algorithm (GA)

Lecture 3

Recap
– Statement of an optimization problem: find the design vector X = (x1, x2, …, xn)T which minimizes the objective function f(X), subject to the inequality constraints gj(X) ≤ 0, j = 1, 2, …, m, and the equality constraints Lj(X) = 0, j = 1, 2, …, p (a constrained optimization problem).
– Terminology: design vector, design constraints, constraint surface, objective function.
– Optimization procedure: need for optimization → choose design variables → formulate constraints → formulate objective function → set up variable bounds → choose optimization algorithm → obtain solutions.

Classes of Search Techniques (recap)
As before, search techniques divide into calculus-based, guided random search (simulated annealing and evolutionary algorithms, including genetic algorithms) and enumerative techniques.

More Comparisons
Nature
– not very efficient: at least a 20-year wait between generations
– not all mating combinations are possible
Genetic algorithm
– efficient and fast: optimisation complete in a matter of minutes
– mating combinations governed only by "fitness"

The Genetic Algorithm Approach
– Define limits of variable parameters
– Generate a random population of designs
– Assess "fitness" of designs
– Mate selection
– Crossover
– Mutation
– Reassess fitness of new population

Working Principle
Let us consider the following maximization problem:
Maximize f(x), subject to (xi)L ≤ xi ≤ (xi)U, i = 1, 2, …, n.

Coding
– Variables xi are coded in some string structures.
– The coding of variables is not absolutely necessary.
– Binary-coded strings having 1's and 0's are mostly used.
– The length of the string is determined according to the desired solution accuracy.

Coding (Contd.)
For example, if 4 bits are used to code each variable in a two-variable function optimization problem, the strings (0000 0000) and (1111 1111) would represent the points ((x1)L, (x2)L)T and ((x1)U, (x2)U)T respectively.

Coding (Contd.)
Any other eight-bit string can be found to represent a point in the search space according to a fixed mapping rule. Usually the following linear mapping rule is used:
xi = (xi)L + [((xi)U − (xi)L) / (2^li − 1)] × (decoded value of si)

Coding (Contd.)
The variable xi is coded in a sub-string si of length li. The decoded value of a binary sub-string si is calculated as
Σ (i = 0 to l−1) 2^i si
where si ∈ (0, 1) and the string s is represented as (sl−1 sl−2 …… s2 s1 s0).

Coding (Contd.)
Example: the four-bit string (0111) has a decoded value equal to (1)2^0 + (1)2^1 + (1)2^2 + (0)2^3 = 7.
With four bits to code each variable, there are only 2^4 = 16 distinct sub-strings possible; the possible accuracy is 1/16th of the search space.

Coding (Contd.)
Generalization: with li-bit coding for a variable, the obtainable accuracy in that variable is approximately ((xi)U − (xi)L) / 2^li.
It is not necessary to code all variables with equal sub-string lengths.
(A short decoding sketch in code follows.)
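As an illustration of the decoding and linear mapping rules above, the Python sketch below decodes a binary sub-string and maps it into the interval [(xi)L, (xi)U]. The function names are my own; only the two formulas come from the slides.

```python
def decoded_value(substring):
    """Decoded value of a binary sub-string: sum over i of s_i * 2^i,
    with s_0 taken as the rightmost bit (string written s_{l-1} ... s_1 s_0)."""
    return sum(int(bit) * 2**i for i, bit in enumerate(reversed(substring)))

def map_to_interval(substring, x_low, x_high):
    """Linear mapping rule: x = x_low + (x_high - x_low)/(2^l - 1) * decoded value."""
    l = len(substring)
    return x_low + (x_high - x_low) / (2**l - 1) * decoded_value(substring)

# Example from the slides: (0111) decodes to 7.
assert decoded_value("0111") == 7
# A 10-bit sub-string mapped into [0, 6] has a resolution of 6/1023, about 0.006.
print(map_to_interval("1100100000", 0.0, 6.0))   # about 4.692
```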

A "Population" (figure)

Ranking by Fitness (figure)

) Minimization Problem Convert to maximization problem using suitable transformation Optimum point remains unchanged Commonly used fitness function F(x) = 1/ (1+ f(x)) 20 10 . 19 Fitness Function (Contd.Fitness Function Fitness function F(x) is first derived from the objective function and used in successive genetic operations. For maximization problem. the fitness function can be considered as a the objective function F(x) = f(x).

GA Operators
– Reproduction
– Crossover
– Mutation

Reproduction

Reproduction
– Selects the good strings in the population and forms a mating pool; also known as the selection operator.
– Many types of reproduction operators exist in the GA literature.
– Essential idea: above-average strings must be picked from the current population and multiple copies of them inserted in the mating pool in a probabilistic manner.

Reproduction (Contd.)
Commonly used approach: the i-th string in the population is selected with a probability proportional to Fi. Population size is usually kept fixed in a simple GA, so the sum of the probabilities of each string being selected for the mating pool must be one. The probability of selecting the i-th string is
pi = Fi / Σ (j = 1 to n) Fj.

Reproduction (Contd.)
A simple selection approach: a roulette wheel with its circumference marked for each string in proportion to the string's fitness.
(Figure: roulette wheel divided into sectors of 38%, 25%, 16%, 11%, 7%, 3% and 0%.)

Reproduction (Contd.)
Since the circumference of the wheel is marked according to a string's fitness, this roulette-wheel mechanism is expected to make Fi / F copies of the i-th string in the mating pool, where F = Σ Fi / n is the average fitness of the population. (A code sketch of this selection scheme follows.)
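A minimal sketch of roulette-wheel (fitness-proportionate) selection as described above, assuming a list of fitness values Fi; function and variable names are illustrative, not from the slides.

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability p_i = F_i / sum of all F_j."""
    total = sum(fitnesses)
    pick = random.uniform(0.0, total)          # a point on the wheel's circumference
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]                      # guard against floating-point round-off

def make_mating_pool(population, fitnesses):
    """Mating pool of the same size as the population (size kept fixed in a simple GA)."""
    return [roulette_wheel_select(population, fitnesses) for _ in range(len(population))]
```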

Reproduction (Contd.)
Good strings in a population are probabilistically assigned a larger number of copies and a mating pool is formed. No new strings are formed in the reproduction phase.

Crossover

Crossover
– New strings are created by exchanging information among strings of the mating pool.
– Many approaches are available in the literature.
– Common approach: two strings are picked from the mating pool at random and some portions of the strings are exchanged between them (a code sketch follows below).

Crossover (Contd.)
P1 (0 1 1 0 1 0 0 0)      C1 (0 1 0 0 1 0 0 0)
P2 (1 1 0 1 1 0 1 0)  →   C2 (1 1 1 1 1 0 1 0)
(the portion around the marked site * is exchanged between the two parents)
Crossover is a critical feature of genetic algorithms:
– It greatly accelerates search early in the evolution of a population.
– It leads to effective combination of schemata (sub-solutions on different chromosomes).
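A minimal sketch of single-point crossover on bit strings, with the crossover site chosen at random as in the slides; names are illustrative.

```python
import random

def single_point_crossover(parent1, parent2):
    """Exchange the tails of two equal-length bit strings after a random cut point."""
    assert len(parent1) == len(parent2)
    site = random.randint(1, len(parent1) - 1)   # cut between positions site-1 and site
    child1 = parent1[:site] + parent2[site:]
    child2 = parent2[:site] + parent1[site:]
    return child1, child2

# Example: crossing 01101 and 11000 after bit 4 gives 01100 and 11001,
# as in the simple GA example earlier in these slides.
```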

Mutation

Mutation (Contd.)
The need for mutation is to create a point in the neighborhood of the current point, thereby achieving a local search around the current solution. Mutation is also used to maintain diversity in the population. (A code sketch of bit-wise mutation follows.)
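A minimal sketch of bit-wise mutation with probability pm per bit, as described in these slides; names are illustrative.

```python
import random

def bitwise_mutation(chromosome, pm=0.05):
    """Flip each bit of a bit string independently with probability pm."""
    mutated = []
    for bit in chromosome:
        if random.random() < pm:                 # coin flip for every bit
            mutated.append('1' if bit == '0' else '0')
        else:
            mutated.append(bit)
    return ''.join(mutated)

# Only mutation can introduce a 1 into a position where every string in the
# population currently has a 0 (the situation in the example on the next slide).
```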

Mutation (Contd. then neither reproduction nor crossover operator will be able to create 1 in that position. 33 Summary of GA Operators Reproduction Operator – Selects good strings Crossover Operator – Recombines good sub-strings from good strings together to hopefully create a better sub-string Mutation Operator – Alters a string locally to obtain better sub-string 34 17 .) Example – Four eight-bit strings • 0110 1011 • 0011 1101 • 0001 0110 • 0111 1100 If true optimum requires 1 in 0 position.

Initialize a random population of strings of size l. a cross-over operator. Choose a maximum allowable generation number tmax 36 18 . a selection operator.Best Design 35 Recap .Algorithm Step 1: Choose a coding to represent problem parameters. Crossover probability (pc). Choose population size (n). and mutation operator. and mutation probability (pm).

37 Simple Problem The objective is to minimize the function f(x1. x2) = (x12 + x2 –11)2 + (x1 + x22 –7)2 In the interval 0 ≤ x1. Set t = t + 1 and go to Step 3.Algorithm (contd.) Step 2: Evaluate each string in the population Step 3: If t > tmax or other termination criteria is satisfied. TERMINATE Step 4: Perform reproduction on the population Step 5: Perform crossover on random pairs of strings Step 6: Perform mutation on every string Step 7: Evaluate strings in the new population. x2 ≤ 6 True solution is (3.2) 38 19 .

Step 1
Choose binary coding for x1 and x2: 10 bits are chosen for each variable, so the total string length equals 20. With 10 bits we can get a solution accuracy of (6 − 0)/(2^10 − 1) ≈ 0.006 in the interval (0, 6).
Choose roulette-wheel selection.
Crossover probability = 0.8; mutation probability = 0.05; population size = 20; tmax = 30.

(Table: the randomly initialized population of 20 strings.)

Step 2
Evaluate each string in the population.
First sub-string: 1100100000 → decoded value 2^9 + 2^8 + 2^5 = 800 → x1 = 0 + (6 − 0) × 800/(1024 − 1) = 4.692.
Second sub-string: 1110010000 → x2 = 5.349.
f(x) = 959.680.
Fitness function value using the transformation rule F(x) = 1.0/(1.0 + f(x)): F(x) = 1.0/(1.0 + 959.680) = 0.001.
Calculate similarly for the other population members.

Step 3
Since t < tmax, we proceed to Step 4.

Step 4
Picking up good strings – use the roulette-wheel selection procedure.
Calculate the average fitness of the population: F = Σ Fi / n = 0.008.
Column A: expected count of each string, F(x)/F.
Column B: probability of each string being copied into the mating pool = value of Column A / population size.

Step 4 (Contd.)
Column C: cumulative probability of selection.
Column D: in order to form the mating pool, create random numbers between zero and one and identify the particular string specified by each of these random numbers; e.g. the first random number is 0.472, which falls in the interval (0.401, 0.549) shown in Column C, so string 10 is identified for Column E.

Step 4 (Contd.)
Column F: true count in the mating pool (from Column E).
Observation: Columns A and F reveal that the theoretical expected count and the true count of each string more or less agree with each other.

Step 5
The mating pool is used for crossover. In single-point crossover, two strings are selected at random and crossed at a random site. First a coin is flipped with probability pc = 0.8 to check whether a crossover is desired or not. At random, select strings 3 and 10 for the first crossover operation. Let us say the outcome is YES.

Step 5 (Contd.)
Find the cross-site at random: we choose a site by creating a random number between (0, l−1), i.e. (0, 19) in this case (string length 20). Store the revised strings as the intermediate population.

Step 6
Mutation is performed on the intermediate population. For bit-wise mutation, flip a coin with probability pm = 0.05 for every bit. In this case, at most about 0.05 × 20 × 20 = 20 bits are expected to be mutated in the population (population size 20, string length 20).

Step 7
The resulting population becomes the new population. Carry out steps 2–6 repeatedly until we find a point at which the fitness value is close to 1.0. (A code sketch pulling these steps together is given below.)

The GA Cycle (figure)
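The walkthrough above (Steps 1–7) can be collected into a single program. The Python sketch below is illustrative only, not the exact implementation behind the slides' table: it uses the same coding (two 10-bit sub-strings on [0, 6]), the fitness transformation F = 1/(1 + f), roulette-wheel selection, single-point crossover with pc = 0.8 and bit-wise mutation with pm = 0.05; all function names are my own.

```python
import random

def decode(bits, low=0.0, high=6.0):
    return low + (high - low) * int(bits, 2) / (2**len(bits) - 1)

def himmelblau(x1, x2):
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

def fitness(chrom):
    x1, x2 = decode(chrom[:10]), decode(chrom[10:])
    return 1.0 / (1.0 + himmelblau(x1, x2))          # F(x) = 1/(1 + f(x))

def select(pop, fits):                               # roulette-wheel selection
    pick, running = random.uniform(0, sum(fits)), 0.0
    for chrom, f in zip(pop, fits):
        running += f
        if running >= pick:
            return chrom
    return pop[-1]

def crossover(a, b, pc=0.8):
    if random.random() < pc:
        site = random.randint(1, len(a) - 1)
        return a[:site] + b[site:], b[:site] + a[site:]
    return a, b

def mutate(chrom, pm=0.05):
    return ''.join(b if random.random() >= pm else ('1' if b == '0' else '0') for b in chrom)

pop = [''.join(random.choice('01') for _ in range(20)) for _ in range(20)]
for t in range(30):                                   # tmax = 30
    fits = [fitness(c) for c in pop]
    pool = [select(pop, fits) for _ in range(len(pop))]
    next_pop = []
    for i in range(0, len(pool), 2):
        c1, c2 = crossover(pool[i], pool[i + 1])
        next_pop += [mutate(c1), mutate(c2)]
    pop = next_pop

best = max(pop, key=fitness)
print(decode(best[:10]), decode(best[10:]))           # should approach (3, 2)
```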

Example: Traveling Salesman Problem

Traveling Salesman Problem
Find a tour of a given set of cities so that
– each city is visited only once
– the total distance traveled is minimized
(A sketch of a tour-length evaluation for a GA is given below.)
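For a GA on the TSP, a natural chromosome is a permutation of the city indices. The sketch below shows only the evaluation (tour length) and a random tour, assuming a list of (x, y) city coordinates; permutation-preserving crossover and mutation operators are beyond what these slides cover, and all names here are illustrative.

```python
import math
import random

def tour_length(tour, cities):
    """Total distance of a closed tour; each city index appears exactly once in `tour`."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = cities[tour[i]]
        x2, y2 = cities[tour[(i + 1) % len(tour)]]   # wrap around back to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def random_tour(n):
    tour = list(range(n))
    random.shuffle(tour)
    return tour

# Example: 30 random cities on a 100 x 100 grid, as in the plots that follow.
cities = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(30)]
print(tour_length(random_tour(30), cities))
```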

TSP Example: 30 Cities
(Figure: the 30 city locations plotted on a 100 × 100 grid.)

Solution i (Distance = 941)
(Figure: TSP30 tour with performance = 941.)

Solution j (Distance = 800)
(Figure: TSP30 tour with performance = 800.)

Solution k (Distance = 652)
(Figure: TSP30 tour with performance = 652.)

O v e rv i e w o f P e r fo rm a n c e 1800 1600 1400 1200 Distance 1000 800 600 400 200 0 11 13 15 17 19 21 23 25 27 29 31 1 3 5 7 9 G e n e r a t io n s ( 1 0 0 0 ) Bes t W ors t A v erage 58 29 .Best Solution (Distance = 420) 42 38 35 26 21 35 32 7 38 46 44 58 60 69 76 78 71 69 67 62 84 94 TSP30 Solution (Performance = 420) 120 100 80 60 y 40 20 0 0 10 20 30 40 50 x 60 70 80 90 100 57 Overview of Performance T S P 3 0 .


a genetic algorithm is any population-based model that uses selection and recombination operators to generate new sample points in a search space. including parallel island models and parallel cellular genetic algorithms. 1993). Keywords: Genetic Algorithms. as well as variations on what will be referred to in this paper as the canonical genetic algorithm. This particular description of a genetic algorithm is intentionally abstract because in some sense. Many genetic algorithm models have been introduced by researchers largely working from 1 . In a broader usage of the term.. 1975). One then evaluates these structures and allocates reproductive opportunities in such a way that those chromosomes which represent a better solution to the target problem are given more chances to \reproduce" than those chromosomes which are poorer solutions.A Genetic Algorithm Tutorial Darrell Whitley Computer Science Department. although the range of problems to which genetic algorithms have been applied is quite broad. In a strict interpretation. the genetic algorithm refers to a model introduced and investigated by John Holland (1975) and by students of Holland (e. The \goodness" of a solution is typically de ned with respect to the current population. Parallel Algorithms 1 Introduction Genetic Algorithms are a family of computational models inspired by evolution. Colorado State University Fort Collins. It is still the case that most of the existing theory for genetic algorithms applies either solely or primarily to the model introduced by Holland. The tutorial also illustrates genetic search by hyperplane sampling. These algorithms encode a potential solution to a speci c problem on a simple chromosome-like data structure and apply recombination operators to these structures so as to preserve critical information.edu Abstract This tutorial covers the canonical genetic algorithm as well as more experimental forms of genetic algorithms. Search.colostate. include the schema theorem as well as recently developed exact models of the canonical genetic algorithm. Recent theoretical advances in modeling genetic algorithms also apply primarily to the canonical genetic algorithm (Vose. CO 80523 whitley@cs. An implementation of a genetic algorithm begins with a population of (typically random) chromosomes. DeJong. the term genetic algorithm has two meanings.g. The theoretical foundations of genetic algorithms are reviewed. Genetic algorithms are often viewed as function optimizers.

an experimental perspective. Many of these researchers are application oriented and are typically interested in genetic algorithms as optimization tools.

The goal of this tutorial is to present genetic algorithms in such a way that students new to this field can grasp the basic concepts behind genetic algorithms as they work through the tutorial. It should allow the more sophisticated reader to absorb this material with relative ease. The tutorial also covers topics, such as inversion, which have sometimes been misunderstood and misused by researchers new to the field.

The tutorial begins with a very low level discussion of optimization to both introduce basic ideas in optimization as well as basic concepts that relate to genetic algorithms. In section 2 a canonical genetic algorithm is reviewed. In section 3 the principle of hyperplane sampling is explored and some basic crossover operators are introduced. In section 4 various versions of the schema theorem are developed in a step by step fashion and other crossover operators are discussed. In section 5 binary alphabets and their effects on hyperplane sampling are considered. In section 6 a brief criticism of the schema theorem is considered and in section 7 an exact model of the genetic algorithm is developed. The last three sections of the tutorial cover alternative forms of genetic algorithms and evolutionary computational models, including specialized parallel implementations.

1.1 Encodings and Optimization Problems

Usually there are only two main components of most genetic algorithms that are problem dependent: the problem encoding and the evaluation function. Consider a parameter optimization problem where we must optimize a set of variables either to maximize some target such as profit, or to minimize cost or some measure of error. We might view such a problem as a black box with a series of control dials representing different parameters; the only output of the black box is a value returned by an evaluation function indicating how well a particular combination of parameter settings solves the optimization problem. The goal is to set the various parameters so as to optimize some output. In more traditional terms, we wish to minimize (or maximize) some function F(X1, X2, ..., XM).

Most users of genetic algorithms typically are concerned with problems that are nonlinear. This also often implies that it is not possible to treat each parameter as an independent variable which can be solved in isolation from the other variables. There are interactions such that the combined effects of the parameters must be considered in order to maximize or minimize the output of the black box. In the genetic algorithm community, the interaction between variables is sometimes referred to as epistasis.

The first assumption that is typically made is that the variables representing parameters can be represented by bit strings. This means that the variables are discretized in an a priori fashion, and that the range of the discretization corresponds to some power of 2. For example, with 10 bits per parameter, we obtain a range with 1024 discrete values. If the parameters are actually continuous then this discretization is not a particular problem. This assumes, of course, that the discretization provides enough resolution to make it possible to adjust the output with the desired level of precision. It also assumes that the discretization is in some sense representative of the underlying function.

If some parameter can only take on an exact finite set of values then the coding issue becomes more difficult. For example, what if there are exactly 1200 discrete values which can be assigned to some variable Xi? We need at least 11 bits to cover this range, but this codes for a total of 2048 discrete values. The 848 unnecessary bit patterns may result in no evaluation, a default worst possible evaluation, or some parameter settings may be represented twice so that all binary strings result in a legal set of parameter values. Solving such coding problems is usually considered to be part of the design of the evaluation function.

Aside from the coding issue, the evaluation function is usually given as part of the problem description. On the other hand, developing an evaluation function can sometimes involve developing a simulation. In other cases, the evaluation may be performance based and may represent only an approximate or partial evaluation. For example, consider a control application where the system can be in any one of an exponentially large number of possible states. Assume a genetic algorithm is used to optimize some form of control strategy. In such cases, the state space must be sampled in a limited fashion and the resulting evaluation of control strategies is approximate and noisy (c.f., Fitzpatrick and Grefenstette, 1988).

The evaluation function must also be relatively fast. This is typically true for any optimization method, but it may particularly pose an issue for genetic algorithms. Since a genetic algorithm works with a population of potential solutions, it incurs the cost of evaluating this population. Furthermore, the population is replaced (all or in part) on a generational basis. The members of the population reproduce, and their offspring must then be evaluated. If it takes 1 hour to do an evaluation, then it takes over 1 year to do 10,000 evaluations. This would be approximately 50 generations for a population of only 200 strings.

1.2 How Hard is Hard?

Assuming the interaction between parameters is nonlinear, the size of the search space is related to the number of bits used in the problem encoding. For a bit string encoding of length L the size of the search space is 2^L and forms a hypercube. The genetic algorithm samples the corners of this L-dimensional hypercube.

Generally, most test functions are at least 30 bits in length and most researchers would probably agree that larger test functions are needed. Anything much smaller represents a space which can be enumerated. (Considering for a moment that the national debt of the United States in 1993 is approximately 2^42 dollars, 2^30 does not sound quite so large.) Of course, the expression 2^L grows exponentially with respect to L. Consider a problem with an encoding of 400 bits. How big is the associated search space? A classic introductory textbook on Artificial Intelligence gives one characterization of a space of this size. Winston (1992:102) points out that 2^400 is a good approximation of the effective size of the search space of possible board configurations in chess. (This assumes the effective branching factor at each possible move to be 16 and that a game is made up of 100 moves; 16^100 = (2^4)^100 = 2^400.) Winston states that this is "a ridiculously large number. In fact, if all the atoms in the universe had been computing chess moves at picosecond rates since the big bang (if any), the analysis would be just getting started." The point is that as long as the number of "good solutions" to a problem are sparse with respect to the size of the search space, then random search or search by enumeration of a large search space is not a practical form of problem solving. On the other hand, any search other than random search imposes some bias in terms of how it looks for better solutions and where it looks in the search space. Genetic algorithms indeed introduce a particular bias in terms of what new points in the space will be sampled. Nevertheless, a genetic algorithm belongs to the class of methods known as "weak methods" in the Artificial Intelligence community because it makes relatively few assumptions about the problem that is being solved.

Of course, there are many optimization methods that have been developed in mathematics and operations research. What role do genetic algorithms play as an optimization tool? Genetic algorithms are often described as a global search method that does not use gradient information. Thus, nondifferentiable functions as well as functions with multiple local optima represent classes of problems to which genetic algorithms might be applied. Genetic algorithms, as a weak method, are robust but very general. If there exists a good specialized optimization method for a specific problem, then a genetic algorithm may not be the best optimization tool for that application. On the other hand, some researchers work with hybrid algorithms that combine existing methods with genetic algorithms.

2 The Canonical Genetic Algorithm
The first step in the implementation of any genetic algorithm is to generate an initial population. In the canonical genetic algorithm each member of this population will be a binary string of length L which corresponds to the problem encoding. Each string is sometimes referred to as a "genotype" (Holland, 1975) or, alternatively, a "chromosome" (Schaffer, 1987). In most cases the initial population is generated randomly. After creating an initial population, each string is then evaluated and assigned a fitness value. The notions of evaluation and fitness are sometimes used interchangeably. However, it is useful to distinguish between the evaluation function and the fitness function used by a genetic algorithm. In this tutorial, the evaluation function, or objective function, provides a measure of performance with respect to a particular set of parameters. The fitness function transforms that measure of performance into an allocation of reproductive opportunities. The evaluation of a string representing a set of parameters is independent of the evaluation of any other string. The fitness of that string, however, is always defined with respect to other members of the current population. In the canonical genetic algorithm, fitness is defined by fi/f̄, where fi is the evaluation associated with string i and f̄ is the average evaluation of all the strings in the population. Fitness can also be assigned based on a string's rank in the population (Baker, 1985; Whitley, 1989) or by sampling methods, such as tournament selection (Goldberg, 1990). It is helpful to view the execution of the genetic algorithm as a two stage process. It starts with the current population. Selection is applied to the current population to create an intermediate population. Then recombination and mutation are applied to the intermediate population to create the next population. The process of going from the current population to the next population constitutes one generation in the execution of a genetic algorithm. Goldberg (1989) refers to this basic implementation as a Simple Genetic Algorithm (SGA).

(Figure 1 shows one generation as two phases: selection (duplication) copies strings from the current generation t — String 1, String 2, String 3, String 4 — into an intermediate generation t containing String 1, String 2, String 2, String 4; recombination (crossover) then pairs them to produce the next generation t + 1: Offspring-A (1 × 2), Offspring-B (1 × 2), Offspring-A (2 × 4), Offspring-B (2 × 4).)

Figure 1: One generation is broken down into a selection phase and recombination phase. This figure shows strings being assigned into adjacent slots during selection. In fact, they can be assigned slots randomly in order to shuffle the intermediate population. Mutation (not shown) can be applied after crossover.

We will first consider the construction of the intermediate population from the current population. In the first generation the current population is also the initial population. After calculating fi/f̄ for all the strings in the current population, selection is carried out. In the canonical genetic algorithm the probability that strings in the current population are copied (i.e., duplicated) and placed in the intermediate generation is proportional to their fitness.

There are a number of ways to do selection. We might view the population as mapping onto a roulette wheel, where each individual is represented by a space that proportionally corresponds to its fitness. By repeatedly spinning the roulette wheel, individuals are chosen using "stochastic sampling with replacement" to fill the intermediate population.

A selection process that will more closely match the expected fitness values is "remainder stochastic sampling." For each string i where fi/f̄ is greater than 1.0, the integer portion of this number indicates how many copies of that string are directly placed in the intermediate population. All strings (including those with fi/f̄ less than 1.0) then place additional copies in the intermediate population with a probability corresponding to the fractional portion of fi/f̄. For example, a string with fi/f̄ = 1.36 places 1 copy in the intermediate population, and then receives a 0.36 chance of placing a second copy. A string with a fitness of fi/f̄ = 0.54 has a 0.54 chance of placing one string in the intermediate population.

"Remainder stochastic sampling" is most efficiently implemented using a method known as Stochastic Universal Sampling. Assume that the population is laid out in random order as in a pie graph, where each individual is assigned space on the pie graph in proportion to fitness. Next an outer roulette wheel is placed around the pie with N equally spaced pointers. A single spin of the roulette wheel will now simultaneously pick all N members of the intermediate population. The resulting selection is also unbiased (Baker, 1987).

After selection has been carried out the construction of the intermediate population is complete and recombination can occur. This can be viewed as creating the next population from the intermediate population. Crossover is applied to randomly paired strings with a probability denoted pc. (The population should already be sufficiently shuffled by the random selection process.) Pick a pair of strings. With probability pc "recombine" these strings to form two new strings that are inserted into the next population.

Consider the following binary string: 1101001100101101. The string would represent a possible solution to some parameter optimization problem. New sample points in the space are generated by recombining two parent strings. Consider the string 1101001100101101 and another binary string, yxyyxyxxyyyxyxxy, in which the values 0 and 1 are denoted by x and y. Using a single randomly chosen recombination point, 1-point crossover occurs as follows.
11010 \/ 01100101101
yxyyx /\ yxxyyyxyxxy

Swapping the fragments between the two parents produces the following offspring:

11010yxxyyyxyxxy and yxyyx01100101101

After recombination, we can apply a mutation operator. For each bit in the population, mutate with some low probability pm. Typically the mutation rate is applied with less than 1% probability. In some cases, mutation is interpreted as randomly generating a new bit, in which case, only 50% of the time will the "mutation" actually change the bit value. In other cases, mutation is interpreted to mean actually flipping the bit. The difference is no more than an implementation detail as long as the user/reader is aware of the difference and understands that the first form of mutation produces a change in bit values only half as often as the second, and that one version of mutation is just a scaled version of the other.

After the process of selection, recombination and mutation is complete, the next population can be evaluated. The process of evaluation, selection, recombination and mutation forms one generation in the execution of a genetic algorithm.

2.1 Why does it work? Search Spaces as Hypercubes.
The question that most people who are new to the field of genetic algorithms ask at this point is why such a process should do anything useful. Why should one believe that this is going to result in an effective form of search or optimization? The answer which is most widely given to explain the computational behavior of genetic algorithms came out of John Holland's work. In his classic 1975 book, Adaptation in Natural and Artificial Systems, Holland develops several arguments designed to explain how a

"genetic plan" or "genetic algorithm" can result in complex and robust search by implicitly sampling hyperplane partitions of a search space.

Perhaps the best way to understand how a genetic algorithm can sample hyperplane partitions is to consider a simple 3-dimensional space (see Figure 2). Assume we have a problem encoded with just 3 bits; this can be represented as a simple cube with the string 000 at the origin. The corners in this cube are numbered by bit strings and all adjacent corners are labelled by bit strings that differ by exactly 1 bit. An example is given in the top of Figure 2. The front plane of the cube contains all the points that begin with 0. If "*" is used as a "don't care" or wild card match symbol, then this plane can also be represented by the special string 0**. Strings that contain * are referred to as schemata; each schema corresponds to a hyperplane in the search space. The "order" of a hyperplane refers to the number of actual bit values that appear in its schema. Thus 1** is order-1 while 1**1******0** would be of order-3.

The bottom of Figure 2 illustrates a 4-dimensional space represented by a cube "hanging" inside another cube. The points can be labeled as follows. Label the points in the inner cube and outer cube exactly as they are labeled in the top 3-dimensional space. Next, prefix each inner cube labeling with a 1 bit and each outer cube labeling with a 0 bit. This creates an assignment to the points in hyperspace that gives the proper adjacency in the space between strings that are 1 bit different. The inner cube now corresponds to the hyperplane 1*** while the outer cube corresponds to 0***. It is also rather easy to see that *0** corresponds to the subset of points that corresponds to the fronts of both cubes. The order-2 hyperplane 10** corresponds to the front of the inner cube.

A bit string matches a particular schema if that bit string can be constructed from the schema by replacing the "*" symbols with the appropriate bit values. In general, all bit strings that match a particular schema are contained in the hyperplane partition represented by that schema. Every binary encoding is a "chromosome" which corresponds to a corner in the hypercube and is a member of 2^L − 1 different hyperplanes, where L is the length of the binary encoding. This can be shown by taking a bit string and looking at all the possible ways that any subset of bits can be replaced by "*" symbols: there are L positions in the bit string and each position can be either the bit value contained in the string or the "*" symbol. It is also relatively easy to see that 3^L − 1 hyperplane partitions can be defined over the entire search space: for each of the L positions in the bit string we can have either the value *, 1 or 0, which results in 3^L combinations. (The string of all * symbols corresponds to the space itself and is not counted as a partition of the space (Holland 1975:72).)

Establishing that each string is a member of 2^L − 1 hyperplane partitions doesn't provide very much information if each point in the search space is examined in isolation. This is why the notion of a population based search is critical to genetic algorithms. A population of sample points provides information about numerous hyperplanes; furthermore, low order hyperplanes should be sampled by numerous points in the population. (This issue is reexamined in more detail in subsequent sections of this paper.) A key part of a genetic algorithm's intrinsic or implicit parallelism is derived from the fact that many hyperplanes are sampled when a population of strings is evaluated (Holland 1975); in fact, it can be argued that far more hyperplanes are sampled than the number of strings contained in the population.
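To make the schema terminology concrete, here is a small editorial sketch (not from the tutorial) that checks whether a bit string matches a schema and enumerates the 2^L − 1 schemata a string belongs to; the function names are assumptions.

```python
from itertools import combinations

def matches(bitstring, schema):
    """True if the string can be obtained from the schema by filling in the '*' positions."""
    return all(s == '*' or s == b for b, s in zip(bitstring, schema))

def schemata_of(bitstring):
    """All 2^L - 1 schemata (hyperplanes) that a string of length L is a member of."""
    L = len(bitstring)
    result = []
    for k in range(1, L + 1):                 # choose which k positions keep their bit value
        for keep in combinations(range(L), k):
            result.append(''.join(bitstring[i] if i in keep else '*' for i in range(L)))
    return result

print(matches('110', '1**'))                  # True: 110 lies in the order-1 hyperplane 1**
print(len(schemata_of('110')))                # 7 = 2^3 - 1
```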

(Figure 2 consists of two diagrams: a 3-dimensional cube whose corners are labelled with the 3-bit strings 000 through 111, adjacent corners differing by exactly one bit, and a 4-dimensional hypercube drawn as an inner cube "hanging" inside an outer cube, with selected corners labelled by 4-bit strings.)

Figure 2: A 3-dimensional cube and a 4-dimensional hypercube. The corners of the inner cube and outer cube in the bottom 4-D example are numbered in the same way as in the upper 3-D cube, except a 1 is added as a prefix to the labels of the inner cube and a 0 is added as a prefix to the labels of the outer cube. Only select points are labeled in the 4-D hypercube.

Many different hyperplanes are evaluated in an implicitly parallel fashion each time a single string is evaluated (Holland 1975:74), but it is the cumulative effects of evaluating a population of points that provides statistical information about any particular subset of hyperplanes. [1] Implicit parallelism implies that many hyperplane competitions are simultaneously solved in parallel. The theory suggests that through the process of reproduction and recombination, the schemata of competing hyperplanes increase or decrease their representation in the population according to the relative fitness of the strings that lie in those hyperplane partitions. Because genetic algorithms operate on populations of strings, one can track the proportional representation of a single schema representing a particular hyperplane in a population and indicate whether that hyperplane will increase or decrease its representation in the population over time when fitness based selection is combined with crossover to produce offspring from existing strings in the population.

3 Two Views of Hyperplane Sampling
Another way of looking at hyperplane partitions is presented in Figure 3. A function over a single variable is plotted as a one-dimensional space, with function maximization as a goal. The hyperplane 0****...** spans the rst half of the space and 1****...** spans the second half of the space. Since the strings in the 0****...** partition are on average better than those in the 1****...** partition, we would like the search to be proportionally biased toward this partition. In the second graph the portion of the space corresponding to **1**...** is shaded, which also highlights the intersection of 0****...** and **1**...**, namely, 0*1*...**. Finally, in the third graph, 0*10**...** is highlighted. One of the points of Figure 3 is that the sampling of hyperplane partitions is not really e ected by local optima. At the same time, increasing the sampling rate of partitions that are above average compared to other competing partitions does not guarantee convergence to a global optimum. The global optimum could be a relatively isolated peak, for example. Nevertheless, good solutions that are globally competitive should be found. It is also a useful exercise to look at an example of a simple genetic algorithm in action. In Table 1, the rst 3 bits of each string are given explicitly while the remainder of the bit positions are unspeci ed. The goal is to look at only those hyperplanes de ned over the rst 3 bit positions in order to see what actually happens during the selection phase when strings are duplicated according to tness. The theory behind genetic algorithms suggests that the new distribution of points in each hyperplane should change according to the average tness of the strings in the population that are contained in the corresponding hyperplane partition. Thus, even though a genetic algorithm never explicitly evaluates any particular hyperplane partition, it should change the distribution of string copies as if it had.
[1] Holland initially used the term intrinsic parallelism in his 1975 monograph, then decided to switch to implicit parallelism to avoid confusion with terminology in parallel computing. Unfortunately, the term implicit parallelism in the parallel computing community refers to parallelism which is extracted from code written in functional languages that have no explicit parallel constructs. Implicit parallelism does not refer to the potential for running genetic algorithms on parallel hardware, although genetic algorithms are generally viewed as highly parallelizable algorithms.

(Figure 3 consists of three panels plotting a function F(X), scaled to the range 0 to 1, over a single variable X on [0, K]; the shaded regions show, from top to bottom, the hyperplane partitions 0***...*, **1*...* and 0*10*...*.)

Figure 3: A function and various partitions of hyperspace. Fitness is scaled to a 0 to 1 range in this diagram.

(Table 1, reproduced here only in outline: it lists 21 partially specified strings, with the first three bits given explicitly and the remaining bits unspecified, together with the rank-based fitness assigned to each string, a random number used to award the fractional remainder, and the resulting number of copies placed in the intermediate population.)

Table 1: A population with fitness assigned to strings according to rank.

The example population in Table 1 contains only 21 (partially specified) strings. Since we are not particularly concerned with the exact evaluation of these strings, the fitness values are assigned according to rank. (The notion of assigning fitness by rank rather than by fitness proportional representation has not been discussed in detail, but the current example relates to change in representation due to fitness and not how that fitness is assigned.) The table includes information on the fitness of each string and the number of copies to be placed in the intermediate population. In this example, the number of copies produced during selection is determined by automatically assigning the integer part, then assigning the fractional part by generating a random value between 0.0 and 1.0 (a form of remainder stochastic sampling). Random is a random number which determines whether or not a copy of a string is awarded for the fractional remainder of the fitness: if the random value is greater than (1 − remainder) then an additional copy is awarded to the corresponding individual.

Genetic algorithms appear to process many hyperplanes implicitly in parallel when selection acts on the population. The genetic algorithm uses the population as a sample for estimating the fitness of a hyperplane partition; the true fitness of the hyperplane partition corresponds to the average fitness of all strings that lie in that hyperplane partition. Of course, the only time the sample is random is during the first generation. After this, the sample of new strings should be biased toward regions that have previously contained strings that were above average with respect to previous populations. If the genetic algorithm works as advertised, the number of copies of strings that actually fall in a particular hyperplane partition after selection should approximate the expected number of copies that should fall in that partition.

(Table 2, reproduced here only in outline: for each of the 27 schemata definable over the first three bit positions it lists the mean fitness of the sampled strings, the count of strings in that hyperplane before selection, and the expected and observed number of copies after selection.)

Table 2: The average fitnesses (Mean) associated with the samples from the 27 hyperplanes defined over the first three bit positions are explicitly calculated. Count refers to the number of strings in hyperplane H before selection. The Expected representation (Expect) and Observed representation (Obs) are shown.

Theoretically, the expected number of strings sampling a hyperplane partition after selection can be calculated by multiplying the number of hyperplane samples in the current population before selection by the average fitness of the strings in the population that fall in that partition. The observed number of copies actually allocated by selection is also given. In most cases the match between the expected and observed sampling rate is fairly good: the error is a result of sampling error due to the small population size.

It is useful to begin formalizing the idea of tracking the potential sampling rate of a hyperplane, H. Let M(H, t) be the number of strings sampling H at the current generation t in some population. Let (t + intermediate) index the generation t after selection (but before crossover and mutation), and let f(H, t) be the average evaluation of the sample of strings in partition H in the current population. Formally, the change in representation according to fitness associated with the strings that are drawn from hyperplane H is expressed by

    M(H, t + intermediate) = M(H, t) f(H, t) / f̄

Of course, when strings are merely duplicated no new sampling of hyperplanes is actually occurring, since no new samples are generated. We would like to have a sample of new points with this same distribution; in practice, however, this is generally not possible. Recombination and mutation provide a means of generating new sample points while partially preserving the distribution of strings across hyperplanes that is observed in the intermediate population.

3.1 Crossover Operators and Schemata

The observed representation of hyperplanes in Table 2 corresponds to the representation in the intermediate population after selection but before recombination. What does recombination do to the observed string distributions? Clearly, order-1 hyperplane samples are not affected by recombination, since the single critical bit is always inherited by one of the offspring. However, the observed distribution of potential samples from hyperplane partitions of order-2 and higher can be affected by crossover. Furthermore, all hyperplanes of the same order are not necessarily affected with the same probability. Consider 1-point crossover. This recombination operator is nice because it is relatively easy to quantify its effects on different schemata representing hyperplanes. To keep things simple, assume we are working with a string encoded with just 12 bits. Now consider the following two schemata:

11********** and 1**********1

The probability that the bits in the first schema will be separated during 1-point crossover is only 1/(L - 1), since in general there are L - 1 crossover points in a string of length L. The probability that the bits in the second schema are disrupted by 1-point crossover, however, is (L - 1)/(L - 1), or 1.0, since each of the L - 1 crossover points separates the bits in the schema. This leads to a general observation: when using 1-point crossover the positions of the bits in the schema are important in determining the likelihood that those bits will remain together during crossover.

3.1.1 2-point Crossover

What happens if a 2-point crossover operator is used? A 2-point crossover operator uses two randomly chosen crossover points, and strings exchange the segment that falls between these two points. Ken DeJong first observed (1975) that 2-point crossover treats strings and schemata as if they form a ring:

[Ring illustration: the 12 bits b1 through b12 arranged in a circle, so that b12 is adjacent to b1; the two fixed bits of the schema 1**********1 occupy adjacent positions on this ring.]

where b1 to b12 represent bits 1 to 12. When viewed in this way, 1-point crossover is a special case of 2-point crossover where one of the crossover points always occurs at the wrap-around position between the first and last bit. Maximum disruption for order-2 schemata now occurs when the 2 bits are at complementary positions on this ring. For 1-point and 2-point crossover it is clear that schemata which have bits that are close together on the string encoding (or ring) are less likely to be disrupted by crossover. More precisely, hyperplanes represented by schemata with more compact representations should be sampled at rates that are closer to those potential sampling distribution targets achieved under selection alone. For current purposes a compact representation with respect to schemata is one that minimizes the probability of disruption during crossover.
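The two operators discussed here can be sketched in a few lines. The following Python fragment is a minimal illustration assuming bit strings, not a reference implementation from the text.

    import random

    def one_point_crossover(p1, p2):
        """1-point crossover on two equal-length bit strings."""
        c = random.randint(1, len(p1) - 1)       # one of the L - 1 crossover points
        return p1[:c] + p2[c:], p2[:c] + p1[c:]

    def two_point_crossover(p1, p2):
        """2-point crossover: exchange the segment between two cut points.

        Conceptually, 1-point crossover is the special case in which one cut
        is fixed at the wrap-around point between the last and first bit.
        """
        c1, c2 = sorted(random.sample(range(1, len(p1)), 2))
        return (p1[:c1] + p2[c1:c2] + p1[c2:],
                p2[:c1] + p1[c1:c2] + p2[c2:])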

3.1.2 Linkage and Defining Length

Linkage refers to the phenomenon whereby a set of bits act as "coadapted alleles" that tend to be inherited together as a group. In this case an allele would correspond to a particular bit value in a specific position on the chromosome. Of course, linkage can be seen as a generalization of the notion of a compact representation with respect to schemata. Linkage is sometimes defined by physical adjacency of bits in a string encoding; this implicitly assumes that 1-point crossover is the operator being used. Linkage under 2-point crossover is different and must be defined with respect to distance on the chromosome when treated as a ring. Nevertheless, linkage usually is equated with physical adjacency on a string, as measured by defining length.

The defining length of a schema is based on the distance between the first and last bits in the schema with value either 0 or 1 (i.e., not a * symbol). Given that each position in a schema can be 0, 1 or *, then scanning left to right, if Ix is the index of the position of the rightmost occurrence of either a 0 or 1 and Iy is the index of the leftmost occurrence of either a 0 or 1, the defining length is merely Ix - Iy. Thus, the defining length of ****1**0**10** is 12 - 5 = 7. The defining length of a schema representing a hyperplane H is denoted here by Δ(H). The defining length is a direct measure of how many possible crossover points fall within the significant portion of a schema. If 1-point crossover is used, then Δ(H)/(L - 1) is also a direct measure of how likely crossover is to fall within the significant portion of a schema during crossover. Note that this definition is operator dependent, since both of the two order-2 schemata examined in section 3.1 are equally and maximally compact with respect to 2-point crossover, but are maximally different with respect to 1-point crossover.

3.1.3 Linkage and Inversion

Along with mutation and crossover, inversion is often considered to be a basic genetic operator. Inversion can change the linkage of bits on the chromosome such that bits with greater nonlinear interactions can potentially be moved closer together on the chromosome. Typically, inversion is implemented by reversing a random segment of the chromosome. However, before one can start moving bits around on the chromosome to improve linkage, the bits must have a position independent decoding. A common error that some researchers make when first implementing inversion is to reverse bit segments of a directly encoded chromosome. But just reversing some random segment of bits is nothing more than large scale mutation if the mapping from bits to parameters is position dependent. A position independent encoding requires that each bit be tagged in some way. For example, consider the following encoding composed of pairs where the first number is a bit tag which indexes the bit and the second represents the bit value:

((9 0) (6 0) (2 1) (7 1) (5 1) (8 1) (3 0) (1 0) (4 0))
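A minimal sketch of these two schema measures, together with the disruption estimate for 1-point crossover, is given below. The function names are our own and are used only for illustration.

    def defining_length(schema):
        """Distance between the leftmost and rightmost non-'*' positions (Ix - Iy)."""
        significant = [i for i, s in enumerate(schema, start=1) if s != '*']
        return significant[-1] - significant[0]

    def order(schema):
        """Number of positions with a fixed value (0 or 1)."""
        return sum(1 for s in schema if s != '*')

    def disruption_prob_1pt(schema):
        """Upper bound on 1-point crossover disruption: delta(H) / (L - 1)."""
        return defining_length(schema) / (len(schema) - 1)

    # Example from the text: the defining length of ****1**0**10** is 12 - 5 = 7.
    assert defining_length("****1**0**10**") == 7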

The linkage can now be changed by moving around the tag-bit pairs, but the string remains the same when decoded: 010010110. One must now also consider how recombination is to be implemented. Goldberg and Bridges (1990) and Whitley (1991), as well as Holland (1975), discuss the problems of exploiting linkage and the recombination of tagged representations.

4 The Schema Theorem

A foundation has been laid to now develop the fundamental theorem of genetic algorithms. The schema theorem (Holland, 1975) provides a lower bound on the change in the sampling rate for a single hyperplane from generation t to generation t + 1.

Consider again what happens to a particular hyperplane, H, when only selection occurs:

M(H, t + intermediate) = M(H, t) f(H, t) / f̄.

To calculate M(H, t+1) we must consider the effects of crossover as the next generation is created from the intermediate generation. First we consider that crossover is applied probabilistically to a portion of the population. For that part of the population that does not undergo crossover, the representation due to selection is unchanged. When crossover does occur, we must calculate losses due to its disruptive effects:

M(H, t+1) = (1 - pc) M(H, t) f(H, t)/f̄ + pc [ M(H, t) f(H, t)/f̄ (1 - losses) + gains ]

In the derivation of the schema theorem a conservative assumption is made at this point. It is assumed that crossover within the defining length of the schema is always disruptive to the schema representing H. In fact, this is not true and an exact calculation of the effects of crossover is presented later in this paper. For example, assume we are interested in the schema 11*****. If a string such as 1110101 were recombined between the first two bits with a string such as 1000000 or 0100000, no disruption would occur in hyperplane 11***** since one of the offspring would still reside in this partition. Also, if 1000000 and 0100000 were recombined exactly between the first and second bit, a new independent offspring would sample 11*****; this is the source of the gains that is referred to in the above calculation. To simplify things, gains are ignored and the conservative assumption is made that crossover falling in the significant portion of a schema always leads to disruption. Thus,

M(H, t+1) ≥ (1 - pc) M(H, t) f(H, t)/f̄ + pc [ M(H, t) f(H, t)/f̄ (1 - disruptions) ]

where disruptions overestimates losses. We might wish to consider one exception: if two strings that both sample H are recombined, then no disruption occurs. Let P(H, t) denote the proportional representation of H obtained by dividing M(H, t) by the population size. The probability that a randomly chosen mate samples H is just P(H, t). Recall that Δ(H) is the defining length associated with 1-point crossover. Disruption is therefore given by:

(Δ(H)/(L - 1)) (1 - P(H, t)).

At this point, the inequality can be simplified. Both sides can be divided by the population size to convert this into an expression for P(H, t+1), the proportional representation of H at generation t + 1. Furthermore, the expression can be rearranged with respect to pc:

P(H, t+1) ≥ P(H, t) f(H, t)/f̄ [ 1 - pc (Δ(H)/(L - 1)) (1 - P(H, t)) ]

We now have a useful version of the schema theorem (although it does not yet consider mutation), but it is not the only version in the literature. For example, both parents are typically chosen based on fitness. This can be added to the schema theorem by merely indicating that the alternative parent is chosen from the intermediate population after selection:

P(H, t+1) ≥ P(H, t) f(H, t)/f̄ [ 1 - pc (Δ(H)/(L - 1)) (1 - P(H, t) f(H, t)/f̄) ]

Finally, mutation is included. Let o(H) be a function that returns the order of the hyperplane H. The order of H exactly corresponds to a count of the number of bits in the schema representing H that have value 0 or 1. Let the mutation probability be pm, where mutation always flips the bit. Thus the probability that mutation does not affect the schema representing H is (1 - pm)^o(H). This leads to the following expression of the schema theorem:

P(H, t+1) ≥ P(H, t) f(H, t)/f̄ [ 1 - pc (Δ(H)/(L - 1)) (1 - P(H, t) f(H, t)/f̄) ] (1 - pm)^o(H)

4.1 Crossover, Mutation and Premature Convergence

Clearly the schema theorem places the greatest emphasis on the role of crossover and hyperplane sampling in genetic search. To maximize the preservation of hyperplane samples after selection, the disruptive effects of crossover and mutation should be minimized. This suggests that mutation should perhaps not be used at all, or at least used at very low levels.

The motivation for using mutation, then, is to prevent the permanent loss of any particular bit or allele. After several generations it is possible that selection will drive all the bits in some position to a single value: either 0 or 1. If this happens without the genetic algorithm converging to a satisfactory solution, then the algorithm has prematurely converged. This may particularly be a problem if one is working with a small population. Without a mutation operator, there is no possibility for reintroducing the missing bit value. Also, if the target function is nonstationary and the fitness landscape changes over time (which is certainly the case in real biological systems), then there needs to be some source of continuing genetic diversity. Mutation, therefore, acts as a background operator, occasionally changing bit values and allowing alternative alleles (and hyperplane partitions) to be retested.

This particular interpretation of mutation ignores its potential as a hill-climbing mechanism: from the strict hyperplane sampling point of view imposed by the schema theorem, mutation is a necessary evil. But this is perhaps a limited point of view. There are several experimental researchers who point out that genetic search using mutation and no crossover often produces a fairly robust search. And there is little or no theory that has addressed the interactions of hyperplane sampling and hill-climbing in genetic search.
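The bound just derived can be written directly as a function. The following sketch assumes 1-point crossover and the quantities named in the text; it computes the right-hand side of the inequality (a lower bound), not an exact prediction, and the parameter names are our own.

    def schema_theorem_bound(p_h, f_h, f_bar, delta_h, order_h, L, pc, pm):
        """Lower bound on P(H, t+1) from the schema theorem (1-point crossover).

        p_h     : P(H, t), proportion of the population sampling H
        f_h     : f(H, t), average fitness of the strings sampling H
        f_bar   : average fitness of the population
        delta_h : defining length of the schema representing H
        order_h : order of the schema (number of fixed bits)
        L       : string length; pc, pm: crossover and mutation probabilities
        """
        selection = p_h * f_h / f_bar
        survive_crossover = 1.0 - pc * (delta_h / (L - 1)) * (1.0 - selection)
        survive_mutation = (1.0 - pm) ** order_h
        return selection * survive_crossover * survive_mutation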

Another problem related to premature convergence is the need for scaling the population fitness. As the average evaluation of the strings in the population increases, the variance in fitness decreases in the population. There may be little difference between the best and worst individual in the population after several generations, and the selective pressure based on fitness is correspondingly reduced. This problem can partially be addressed by using some form of fitness scaling (Grefenstette, 1986; Goldberg, 1989). In the simplest case, one can subtract the evaluation of the worst string in the population from the evaluations of all strings in the population. One can now compute the average string evaluation as well as fitness values using this adjusted evaluation, which will increase the resulting selective pressure. Alternatively, one can use a rank based form of selection.

4.2 How Recombination Moves Through a Hypercube

The nice thing about 1-point crossover is that it is easy to model analytically. It is also relatively easy to show analytically that if one is interested in minimizing schema disruption, then 2-point crossover is better, and that operators that use many crossover points should be avoided because of extreme disruption to schemata. This is again a point of view imposed by a strict interpretation of the schema theorem. On the other hand, disruption may not be the only factor affecting the performance of a genetic algorithm.

4.2.1 Uniform Crossover

The operator that has received the most attention in recent years is uniform crossover. Uniform crossover was studied in some detail by Ackley (1987) and popularized by Syswerda (1989). Uniform crossover works as follows: for each bit position 1 to L, randomly pick each bit from either of the two parent strings. This means that each bit is inherited independently from any other bit and that there is, in fact, no linkage between bits. It also means that uniform crossover is unbiased with respect to defining length. In general the probability of disruption is 1 - (1/2)^(o(H)-1), where o(H) is the order of the schema we are interested in. (It doesn't matter which offspring inherits the first critical bit, but all other critical bits must be inherited by that same offspring.) This is also a worst case probability of disruption which assumes no alleles found in the schema of interest are shared by the parents. Thus, for any order-3 schema the probability of uniform crossover separating the critical bits is always 1 - (1/2)^2 = 0.75. Consider for a moment a string of 9 bits. The defining length of a schema must equal 6 before the disruptive probabilities of 1-point crossover match those associated with uniform crossover (6/8 = .75). We can define 84 different order-3 schemata over any particular string of 9 bits (i.e., 9 choose 3). Of these schemata, only 19 of the 84 order-3 schemata have a disruption rate higher than 0.75 under 1-point crossover. Another 15 have exactly the same disruption rate, and 50 of the 84 order-3 schemata have a lower disruption rate. Thus, while uniform crossover is unbiased with respect to defining length, it is also generally more disruptive than 1-point crossover. Spears and DeJong (1991) have shown that uniform crossover is in every case more disruptive than 2-point crossover for order-3 schemata for all defining lengths.
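A minimal sketch of uniform crossover follows (Python; an illustration rather than a reference implementation). Each bit position is decided independently, and the two offspring receive complementary choices, which is the property used in the discussion of complementary sample points below.

    import random

    def uniform_crossover(p1, p2):
        """Uniform crossover: each bit is inherited independently from a parent."""
        c1, c2 = [], []
        for a, b in zip(p1, p2):
            if random.random() < 0.5:
                c1.append(a); c2.append(b)
            else:
                c1.append(b); c2.append(a)
        return ''.join(c1), ''.join(c2)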

Despite these analytical results, several researchers have suggested that uniform crossover is sometimes a better recombination operator. One can point to its lack of representational bias with respect to schema disruption as a possible explanation, but this is unlikely since uniform crossover is uniformly worse than 2-point crossover. Spears and DeJong (1991:314) speculate that, "With small populations, more disruptive crossover operators such as uniform or n-point (n > 2) may yield better results because they help overcome the limited information capacity of smaller populations and the tendency for more homogeneity." Eshelman (1991) has made similar arguments outlining the advantages of disruptive operators.

There is another sense in which uniform crossover is unbiased. Assume we wish to recombine the bit strings 0000 and 1111. Note that the number of bits that are different between two strings is just the Hamming distance, H. Not including the original parent strings, uniform crossover can generate 2^H - 2 different strings, while 1-point crossover can generate 2(H - 1) different strings, since there are H - 1 crossover points that produce unique offspring (see the discussion in the next section) and each crossover produces 2 offspring. The 2-point crossover operator can generate 2 (H choose 2) = H^2 - H different offspring, since there are H choose 2 different pairs of crossover points that will result in offspring that are not copies of the parents and each pair of crossover points generates 2 strings.

[Figure 4: the 16 strings of the 4-dimensional hypercube, with 0000 at the bottom and 1111 at the top, laid out by Hamming distance; edges connect strings that differ in a single bit. This graph illustrates paths through 4-D space.]

We can conveniently lay out the 4-dimensional hypercube as shown in Figure 4. We can also view these strings as being connected by a set of minimal paths through the hypercube: pick one parent string as the origin and the other as the destination. Now change a single bit in the binary representation corresponding to the point of origin. Any such move will reach a point that is one move closer to the destination. In Figure 4 it is easy to see that changing a single bit is a move up or down in the graph.

All of the points between 0000 and 1111 are reachable by some single application of uniform crossover. However, 1-point crossover only generates strings that lie along two complementary paths (in the figure, the leftmost and rightmost paths) through this 4-dimensional hypercube: a 1-point crossover of 1111 and 0000 can only generate offspring that reside along the dashed paths at the edges of this graph. In general, uniform crossover will draw a complementary pair of sample points with equal probability from all points that lie along any complementary minimal paths in the hypercube between the two parents, while 1-point crossover samples points from only two specific complementary minimal paths between the two parent strings. It is also easy to see that 2-point crossover is less restrictive than 1-point crossover.

4.3 Reduced Surrogates

Consider crossing the following two strings, shown together with a "reduced" version of the same strings in which the bits the strings share in common have been removed:

0001111011010011        ----11---1-----1
0001001010010010        ----00---0-----0

Both strings lie in the hyperplane 0001**101*01001*. The flip side of this observation is that crossover is really restricted to a subcube defined over the bit positions that are different. We can isolate this subcube by removing all of the bits that are equivalent in the two parent structures. Booker (1987) refers to strings such as ----11---1-----1 and ----00---0-----0 as the "reduced surrogates" of the original parent chromosomes.

When viewed in this way, it is clear that recombination of these particular strings occurs in a 4-dimensional subcube, more or less identical to the one examined in the previous example. Uniform crossover is unbiased with respect to this subcube in the sense that uniform crossover will still sample in an unbiased, uniform fashion from all of the pairs of points that lie along complementary minimal paths in the subcube defined between the two original parent strings. On the other hand, simple 1-point or 2-point crossover will not. For example, simple 1-point crossover will generate offspring ----11---1-----0 and ----00---0-----1 with a probability of 6/15, since there are 6 crossover points in the original parent strings between the third and fourth bits that define the subcube, and L - 1 = 15. On the other hand, ----10---0-----0 and ----01---1-----1 are sampled with a probability of only 1/15, since there is only a single crossover point in the original parent structures that falls between the first and second bits that define the subcube.

One can remove this particular bias, however, by applying crossover to the reduced surrogates. Crossover can now exploit the fact that there is really only 1 crossover point between any significant bits that appear in the reduced surrogate forms. There is also another benefit: if at least 1 crossover point falls between the first and last significant bits in the reduced surrogates, the offspring are guaranteed not to be duplicates of the parents (this assumes the parents differ by at least two bits). Thus, when we recombine the original strings but examine the offspring in their "reduced" forms, new sample points in hyperspace are generated.

The debate on the merits of uniform crossover and operators such as 2-point reduced surrogate crossover is not a closed issue. To fully understand the interaction between hyperplane sampling, population size, premature convergence, crossover operators, genetic diversity and the role of hill-climbing by mutation requires better analytical methods.

5 The Case for Binary Alphabets

The motivation behind the use of a minimal binary alphabet is based on relatively simple counting arguments. A minimal alphabet maximizes the number of hyperplane partitions directly available in the encoding for schema processing.
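Computing reduced surrogates is a simple masking operation. The following sketch (the function name is ours) reproduces the example above.

    def reduced_surrogates(p1, p2):
        """Return the 'reduced surrogate' forms of two parent bit strings.

        Positions where the parents agree are masked with '-', leaving only
        the subcube of differing bits in which crossover actually operates.
        """
        r1 = ''.join(a if a != b else '-' for a, b in zip(p1, p2))
        r2 = ''.join(b if a != b else '-' for a, b in zip(p1, p2))
        return r1, r2

    # The example from the text:
    # reduced_surrogates("0001111011010011", "0001001010010010")
    # returns ("----11---1-----1", "----00---0-----0")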

These low order hyperplane partitions are also sampled at a higher rate than would occur with an alphabet of higher cardinality. Any set of order-1 schemata such as 1*** and 0*** cuts the search space in half, so 50% of the population falls in any particular order-1 partition. In general there are L pairs of order-1 schemata, and in a population of size N there should be N/2 samples of each of the 2L order-1 hyperplane partitions. For order-2 schemata, there are (L choose 2) ways to pick the 2 locations in which to place the 2 critical bit positions, and there are 2^2 possible ways to assign values to those bits; each order-2 partition is sampled by 25% of the population. In general, each hyperplane of order i is sampled by (1/2)^i of the population, and if we wish to count how many schemata representing hyperplanes exist at some order i, this value is given by 2^i (L choose i), where (L choose i) counts the number of ways to pick i positions that will have significant bit values in a string of length L and 2^i is the number of ways to assign values to those positions. These numbers are based on the assumption that we are interested in hyperplane representations associated with the initial random population, since selection changes the distributions over time.

This ideal can be illustrated for order-1 and order-2 schemata over the first four bit positions as follows:

Order-1 Schemata
0***  1***  *0**  *1**  **0*  **1*  ***0  ***1

Order-2 Schemata
00**  01**  10**  11**  0*0*  0*1*  1*0*  1*1*  0**0  0**1  1**0  1**1
*00*  *01*  *10*  *11*  *0*0  *0*1  *1*0  *1*1  **00  **01  **10  **11

These counting arguments naturally lead to questions about the relationship between population size and the number of hyperplanes that are sampled by a genetic algorithm.

5.1 The N^3 Argument

These counting arguments set the stage for the claim that a genetic algorithm processes on the order of N^3 hyperplanes when the population size is N. The derivation used here is based on work found in the appendix of Fitzpatrick and Grefenstette (1988). One can take a very simple view of this question and ask how many schemata of order-1 are sampled and how well they are represented in a population of size N. We wish to have at least φ samples of a hyperplane before claiming that we are statistically sampling that hyperplane. The highest order of hyperplane, θ, which is represented in a population of size N by at least φ copies is given by θ = log(N/φ). Recall that the number of different hyperplane partitions of order-θ is given by 2^θ (L choose θ), which is just the number of different ways to pick θ different positions and to assign all possible binary values to each subset of the positions. To make the N^3 argument, we now need to show that 2^θ (L choose θ) ≥ N^3.
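These counts are straightforward to compute. A small sketch follows, using Python's standard-library comb function; the helper name is ours.

    from math import comb

    def hyperplane_partitions(L, i):
        """Number of distinct hyperplane partitions of order i over length-L strings.

        2^i * C(L, i): choose the i significant positions, then assign their bit values.
        """
        return (2 ** i) * comb(L, i)

    # For example, over 4-bit strings there are 8 order-1 and 24 order-2 partitions:
    # hyperplane_partitions(4, 1) == 8 and hyperplane_partitions(4, 2) == 24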

Since θ = log(N/φ) and N = 2^θ φ, this is equivalent to showing 2^θ (L choose θ) ≥ (2^θ φ)^3. Fitzpatrick and Grefenstette now make the following arguments. Assume L ≥ 64 and 2^6 ≤ N ≤ 2^20. Pick φ = 8, which implies 3 ≤ θ ≤ 17. By inspection, the number of schemata processed is then greater than N^3.

This argument does not hold in general for any population of size N. In general, the population size can be chosen arbitrarily; however, the number of hyperplanes in the space is finite. The total number of schemata associated with a string of length L is 3^L. Thus if we pick a population size where N = 3^L, then at most N hyperplanes can be processed (Michael Vose, personal communication), and N must be chosen with respect to L to make the N^3 argument reasonable. Still, the range of values 2^6 ≤ N ≤ 2^20 does represent a wide range of practical population sizes.

Notice that the current derivation counts only those schemata that are exactly of order-θ. By only counting schemata that are exactly of order-θ we might hope to avoid arguments about interactions with lower order schemata. The sum of all schemata from order-1 to order-θ that should be well represented in a random initial population is given by the sum over x = 1 to θ of 2^x (L choose x).

At the same time, the argument that N^3 hyperplanes are usefully processed assumes that all of these hyperplanes are processed with some degree of independence. A simple static count of the number of schemata available for processing fails to consider the dynamic behavior of the genetic algorithm, so all the N^3 argument really shows is that there may be as many as N^3 hyperplanes that are well represented given an appropriate population size. As discussed later in this tutorial, dynamic models of the genetic algorithm now exist (Vose and Liepins, 1991; Whitley, Das and Crabb, 1992). There has not yet, however, been any real attempt to use these models to look at complex interactions between large numbers of hyperplane competitions.

5.2 The Case for Nonbinary Alphabets

There are two basic arguments against using higher cardinality alphabets. First, there will be fewer explicit hyperplane partitions. Second, the alphabetic characters (and the corresponding hyperplane partitions) associated with a higher cardinality alphabet will not be as well represented in a finite population. This either forces the use of larger population sizes or the effectiveness of statistical sampling is diminished.

The arguments for using binary alphabets assume that the schemata representing hyperplanes must be explicitly and directly manipulated by recombination. Antonisse (1989) has argued that this need not be the case and that higher order alphabets offer as much richness in terms of hyperplane samples as lower order alphabets. Antonisse argues that one can look at all the subsets of the power set of schemata as also defining hyperplanes. For example, given a string of length L and an alphabet of the four characters A, B, C, D, one can define all the same hyperplane partitions available in a binary alphabet by defining partitions such as (A and B), (C and D), etc. Viewed in this way, higher cardinality alphabets yield more hyperplane partitions than binary alphabets.

It is obvious in some vacuous sense that knowing the distribution of the initial population as well as the fitnesses of these strings (and the strings that are subsequently generated by the genetic algorithm) is sufficient information for modeling the dynamic behavior of the genetic algorithm (Vose, 1993).
This suggests that we only need information about those strings sampled by the genetic algorithm.

However, this micro-level view of the genetic algorithm does not seem to explain its macro-level processing power. Antonisse's arguments fail to show that the hyperplanes that correspond to the subsets defined in his scheme actually provide new independent sources of information which can be processed in a meaningful way by a genetic algorithm. This does not disprove Antonisse's claims, but does suggest that there are unresolved issues associated with this hypothesis. Goldberg (1991) suggests that virtual minimal alphabets that facilitate hyperplane sampling can emerge from higher cardinality alphabets.

There are other arguments for nonbinary encodings. Davis (1991) argues that the disadvantages of nonbinary encodings can be offset by the larger range of operators that can be applied to problems, and that more problem dependent aspects of the coding can be exploited. Schaffer and Eshelman (1992) as well as Wright (1991) present interesting arguments for real-valued encodings.

6 Criticisms of the Schema Theorem

There are some obvious limitations of the schema theorem which restrict its usefulness. First, it is an inequality. By ignoring string gains and undercounting string losses, a great deal of information is lost, and the schema theorem provides a lower bound that holds for only one generation into the future. The inexactness of the inequality is such that if one were to try to use the schema theorem to predict the representation of a particular hyperplane over multiple generations, the resulting predictions would in many cases be useless or misleading (e.g., Grefenstette, 1993; Vose, 1993).

Second, the observed fitness of a hyperplane H at time t can change dramatically as the population concentrates its new samples in more specialized subpartitions of hyperspace. Thus, looking at the average fitness of all the strings in a particular hyperplane (or using a random sample to estimate this fitness) is only relevant to the first generation or two (Grefenstette and Baker, 1989). After this, the sampling of strings is biased and the inexactness of the schema theorem makes it impossible to predict computational behavior. In general, one cannot predict the representation of a hyperplane H over multiple generations without considering what is simultaneously happening to the other hyperplanes being processed by the genetic algorithm. The schema theorem does not provide an exact picture of the genetic algorithm's behavior and cannot predict how a specific hyperplane is processed over time.

These criticisms imply that the views of hyperplane sampling presented in section 3 of this tutorial may be good rhetorical tools for explaining hyperplane sampling, but they fail to capture the full complexity of the genetic algorithm. This is partly because the discussion in section 3 focuses on the impact of selection without considering the disruptive and generative effects of crossover. In the next section, an introduction to an exact version of the schema theorem is presented.

7 An Executable Model of the Genetic Algorithm

Consider the complete version of the schema theorem before dropping the gains term and simplifying the losses calculation:

P(Z, t+1) = P(Z, t) f(Z, t)/f̄ (1 - pc · losses) + pc · gains

In the current formulation, Z will refer to a string. Since modeling strings models the highest order schemata, the model implicitly includes all lower order schemata. Also, the fitnesses of strings are constants in the canonical genetic algorithm using fitness proportional reproduction, and one need not worry about changes in the observed fitness of a hyperplane as represented by the current population.

Losses occur when a string crosses with another string and the resulting offspring fails to preserve the original string. Gains occur when two different strings cross and independently create a new copy of some string. For example, if Z = 000 then recombining 100 and 001 will always produce a new copy of 000. Assume we apply this equation to each string in the search space. Assuming 1-point crossover is used as an operator, the probability of "losses" and "gains" for the string Z = 000 are calculated as follows:

losses = PI0 (f(111)/f̄) P(111, t) + PI0 (f(101)/f̄) P(101, t)
       + PI1 (f(110)/f̄) P(110, t) + PI2 (f(011)/f̄) P(011, t)

gains  = PI0 (f(001)/f̄) P(001, t) (f(100)/f̄) P(100, t)
       + PI1 (f(010)/f̄) P(010, t) (f(100)/f̄) P(100, t)
       + PI1 (f(011)/f̄) P(011, t) (f(100)/f̄) P(100, t)
       + PI2 (f(001)/f̄) P(001, t) (f(110)/f̄) P(110, t)
       + PI2 (f(001)/f̄) P(001, t) (f(010)/f̄) P(010, t)

The use of PI0 in the preceding equations represents the probability of crossover in any position on the corresponding string or string pair; in the relevant cases crossover will always produce either a loss or a gain (depending on the expression in which the term appears), so it follows that PI0 = 1.0. The probability that one-point crossover will fall between the first and second bit will be denoted by PI1. Likewise, PI2 will denote the probability that one-point crossover will fall between the second and third bit, and the use of PI1 or PI2 in the computation implies that crossover must fall in exactly this position with respect to the corresponding strings to result in a loss or a gain. In this case, PI1 = PI2 = 0.5.

The equations can be generalized to cover the remaining 7 strings in the space. This translation is accomplished using bitwise addition modulo 2 (i.e., a bitwise exclusive-or, denoted by ⊕; the transform is illustrated in Figure 5). The function ⊕(Si, Z) is applied to each bit string Si contained in the equation presented in this section to produce the appropriate corresponding strings for generating an expression for computing all terms of the form P(Z, t+1). The result is an exact model of the computational behavior of a genetic algorithm.
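The same bookkeeping can be carried out mechanically for every string by enumerating parent pairs and crossover points. The sketch below is a generic illustration of such an exact infinite-population model under the assumptions stated in its docstring (fitness-proportional selection, 1-point crossover producing one offspring per mating, no mutation); it is not the generator-based formulation developed in the following sections, and all names are ours.

    from itertools import product

    def exact_next_proportions(P, fitness, pc):
        """One generation of an infinite-population exact model.

        P       : maps every L-bit string (including those with proportion 0)
                  to its current proportion; proportions sum to 1.
        fitness : maps strings to constant fitness values.
        pc      : crossover probability; 1-point crossover, no mutation.
        Returns the expected string proportions at the next generation.
        """
        strings = sorted(P)
        L = len(strings[0])
        f_bar = sum(P[s] * fitness[s] for s in strings)
        # Probability each string is selected as a parent.
        select = {s: P[s] * fitness[s] / f_bar for s in strings}

        nxt = {s: 0.0 for s in strings}
        for a, b in product(strings, repeat=2):
            pair = select[a] * select[b]
            # With probability (1 - pc) the first parent is copied unchanged.
            nxt[a] += (1 - pc) * pair
            # Otherwise a crossover point is chosen uniformly from the L - 1
            # positions and the first offspring is kept (the complementary
            # offspring is produced by the pair taken in the other order).
            for c in range(1, L):
                child = a[:c] + b[c:]
                nxt[child] += pc * pair / (L - 1)
        return nxt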

Das and Crabb. L . it is only practical to develop equations for encodings of approximately 15 bits.2 Generating String Losses for 1-point crossover Consider two strings 00000000000 and 00010000100. 24 . however. All bits outside the sentry . In standard form. The equations need only be de ned once for one string in the space the standard form of the equation is always de ned for the string composed of all zero bits. the root of this graph is de ned by a string with a sentry bit in the rst and last bit positions. Consider strings B and B0 where the rst x bits are equal. We will refer to this string as a generator: it is like a schema. no disruption will occur. positions are \0" bits. . Because the number of terms in the equations is greater than the number of strings in the search space. indexed by i. will produce disruption: neither parent will survive crossover. the last (L . the middle ( + 1) bits have the pattern b##:::#b for B and b##:::#b for B0. if the crossover occurs before the rst \1" bit or after the last \1" bit. Bridges and Goldberg (1987) formalize the notion of a generator as follows. but # is used instead of * to better distinguish between a generator and the corresponding hyperplane. A move down and to the left in the graph causes the leftmost sentry bit to be shifted right a move down and to the right causes the rightmost sentry bit to be shifted left. Summing over the graph. The development of a general form for these equations is illustrated by generating the loss and gain terms in a systematic fashion (Whitley. 1###1 / \ / \ 01##1 1##10 / \ / \ / \ / \ 001#1 01#10 1#100 / \ / \ / \ / \ / \ / \ 00011 00110 01100 11000 The graph structure allows one to visualize the set of all generators for string losses. x .j. Let S represent the set of binary strings of length L. and the generator token \#" in all other intermediate positions. The b bits are referred to as \sentry bits" and they are used to de ne the probability of disruption. Also note that recombining 00000000000 with any string of the form 0001####100 will produce the same pattern of disruption. Given that the strings are of length L. Using 1-point crossover. the string composed of all zero bits is denoted S0. 1) strings generated as potential sources of string losses. one can see that there are PL=11 j 2L. B = S0 and the sentry bits must be 1.1 A Generalized Form Based on Equation Generators The 3 bit equations are similar to the 2 bit equations developed by Goldberg (1987). 1992). 7. Any crossover between the 1 bits.7. In general.1 j or (2L . 1) bits are equivalent. In general. The following directed acyclic graph illustrates all generators for \string losses" for the standard form of a 5 bit equation for S0.

1 f where (Si) is a function that counts the number of crossover points between sentry bits in string Si . 7. Sentry bits are located such that 1-point crossover between sentry bits produces a new copy of B. assuming L .11L. ! > . 1.For each string Si produced by one of the \middle" generators in the above graph structure.1 and S0 !] = 00:::0L.3 Generating String Gains for 1-point crossover Bridges and Goldberg (1987) note that string gains for a string B are produced from two strings Q and R which have the following relationship to B. where for the standard form of the equations: A S0 ] = ##:::##1 . 10000 00001 / \ / \ #1000 10000 00001 0001# / \ / \ / \ / \ ##100 #1000 10000 00001 0001# 001## / \ / \ / \ / \ / \ / \ ###10 ##100 #1000 10000 00001 0001# 001## 01### 25 . Bridges and Goldberg de ne a beginning function A B.! ##:::##: These generators can again be presented as a directed acyclic graph structure composed of paired templates which will be referred to as the upper A-generator and lower -generator. The following are the generators in a 5 bit problem. a term of the following form is added to the losses equations: (Si) f (Si ) P (S t) i L. Region -> beginning middle end Length -> a r w Q Characteristics ##:::#b = = = = b#:::# R Characteristics The \=" symbol denotes regions where the bits in Q and R match those in B again B = S0 for the standard form of the equations. ] and ending function B !].10 :::0L.!. while crossover of Q and R outside the sentry bits will not produce a new copy of B.

the vector st < represents the t th generation of the genetic algorithm and the i th component of st is the probability that the string i is selected for the gene pool. ( . The A-generator of the root has a \1" bit as the sentry bit in the rst position. A simple transformation function maps the equations to all other strings in the space. 7. can be absorbed by the term.1 for each pair of generators (the root is level 1). (The x and y terms are correction factors added to and ! in order to uniquely index a string in S . The generators are used as part of a two stage computation where the generators are rst used to create an exact equation in standard form.1 where (S +x S!+y ) counts the number of crossover points that fall in the critical region de ned by the sentry bits located at . also indexed by i: This notation will be used where appropriate to avoid confusion. the total number of string pairs that must be included in the equations to calculate string gains . In the Vose and Liepins model. and all other bits are \0. but before recombination). sti P (Si t)f (Si) where is the equivalence relation such that x y if and only if 9 > 0 j x = y: In this formulation.) Let the critical crossover region associated with S +x and S!+y be computed by the function (S +x S!+y ) = L . which would represent the average population tness normally associated with tness proportional reproduction. 1 and S!+y was produced by the -generator with a sentry bit at L. the term 1=f .In this case. 1): For each string pair S +x and S!+y a term of the following form is added to the gains equations: (S +x S!+y ) f (S +x) P (S t) f (S!+y ) P (S t) +x !+y f f L. For any level k of the directed graph there are k generators and the number of string pairs generated at that level is 2k.1 : k Let S +x and S!+y be two strings produced by a generator pair.!. The symbol S has already been used to denote the set of binary strings. such that S +x was produced by the A-generator and has a sentry bit at location . for S0 of length L is PL=11 k 2k. Therefore. Note that st corresponds to the expected distribution of strings in the intermediate population in the generational reproduction process (after selection has occurred. Using i to refer to a string in s can sometimes be confusing." A move down and left in the graph is produced by shifting the left sentry bit of the current upper A-generator to the right. and all other bits are \0.4 The Vose and Liepins Models The executable equations developed by Whitley (1993a) represent a special case of the model of a simple genetic algorithm introduced by Vose and Liepins (1991). ! ." The -generator of the root has a \1" bit as the sentry bit in the last position. 1 and L . In the Vose and Liepins formulation. 26 . A move down and right is produced by shifting the right sentry bit of the current lower -generator to the left. the root of the directed acyclic graph is de ned by starting with the most speci c generator pair. Each vacant bit position outside of the sentry bits which results from a shift operation is lled using the # symbol. !.

The remainder of the matrix entries are given by 0:5 (S +x. the function ri j (k) is used to construct a mixing matrix M where the i j th entry mi j = ri j (0). given the assumption of no mutations. Vose and Liepins formalize the notion that bitwise exclusive-or can be used to remap all the strings in the search space. For current purposes. The matrix M is symmetric and is zero everywhere on the diagonal except for entry m0 0 which is 1. thus. each entry is given by 1 .Let V = 2L. then the standard form of the executable equations corresponds to the following portion of the Liepins and Vose model (T denotes transpose): sT Ms: An alternative form of M denoted M 0 can be de ned by having only a single entry for each string pair i j where i 6= j .0. the de nition of ri j (k) assumes that exactly one o spring is produced. This is done by doubling the value of the enties in the lower triangle and setting the entries in the upper triangle of the matrix to 0. assume no mutation is used and 1-point crossover is used as the recombination operator. Therefore. E ptk+1 = X ij sti stj ri j (k): To further generalize this model. mj k = 0).S!+y ) : For each pair of strings produced by the string gains generators L 1 determine their index and enter the value returned by the function into the corresponding location in M: For completeness. Once de ned M does not change since it is not a ected by variations in tness or proportional representation in the population. the number of strings in the search space. Finally let ri j (k) be the probability that string k results from the recombination of strings i and j: Now. The vector pt <V is de ned such that the k th component of the vector is equal to the proportional representation of string k at generation t before selection occurs. The k component of pt would be the same as P (Sk t) in the notation more commonly associated with the schema theorem. which is equivalent to producing two o spring. But note that M has two entries for each string pair i j where i 6= j .0. the rst row and column of the matrix has entries inversely related to the string losses probabilities. Not including the above subcomputation. For completeness. They show that if 27 .0. (Sj Sk ) = 0 for all pairs of strings not generated by the string gains generators (i. the probability of obtaining S0 during reproduction is 1. Assuming each component of s is given by si = P (Si t)(f (Si)=f )).. using E to denote expectation. (0:5 (Si)=L . Note that this matrix gives the probabilities that crossing strings i and j will produce the string S0: Technically. Note that M is expressed entirely in terms of string gain information. this has the rhetorical advantage that sT M 0 (: 1)s0 = P (S0 t)(f (S0)=f )(1 . the remainder of the computation of sT M 0s calculates string gains. losses): where M 0(: 1) is the rst column of M 0 and s0 is the rst component of s. Thus. that s is updated each generation to correct for changes in the population average. (Si) for strings not produced by the string loss generators is 0 and. 1). where each string in the set S is crossed with S0. in this case represented by the vector s. and that 1-point crossover is used.e.

.1 >T = < sj 0 ::: sj (V . The transformation from the vector pt+1 to the next intermediate population represented by st+1 is given as follows: st+1 F M(st): Vose and Liepins give equations for calculating the mixing matrix M which not only includes probabilities for 1-point crossover.1 s >T Recall that s denoted the representation of strings in the population during the intermediate phase as the genetic algorithm goes from generation t to t + 1 (after selection.1 s )T M V . Let ri j (k) be the probability that k results from the recombination of strings i and j. A general operator M can be de ned over s which remaps sT Ms to cover all strings in the space. A permutation function. If recombination is a combination of crossover and mutation then ri j (k 0) = ri k j k (0). Vose (1993) surveys the current state of this research. This reordering is equivalent to remapping the variables in the executable equations (See Figure 4). M(s) = < ( 0 s)T M 0 s ::: ( V . More complex extension of the Vose and Liepins model include nite population models using Markov chains (Nix and Vose. To complete the cycle and reach a point at which the Vose and Liepins models can be executed in an iterative fashion. A tness matrix F is de ned such that tness information is stored along the diagonal the i i th element is given by f (i) where f is the evaluation function. but also mutation.A Transform Function to Rede ne Equations 000 001 010 011 010 ) 010 010 ) 011 010 ) 000 010 ) 001 100 101 110 111 010 ) 110 010 ) 111 010 ) 100 010 ) 101 Figure 5: The operator is bit-wise exclusive-or. tness information is now explicitly introduced to transform the population at the beginning of iteration t + 1 to the next intermediate population. 1992). recombination is a combination of crossover and mutation then ri j (k q) = ri k j k (q ) and speci cally ri j (k) = ri j (k 0) = ri k j k (0): This allows one to reorder the elements in s with respect to any particular point in the space. The strings are reordered with respect to 010. the size of the search space.1) >T where the vectors are treated as columns and V = 2L. but before recombination). 28 . is de ned as follows: j < s0 ::: sV .

8 Other Models of Evolutionary Computation

There are several population based algorithms that are either spin-offs of Holland's genetic algorithm, or which were developed independently. Evolution Strategies and Evolutionary Programming refer to two computational paradigms that use a population based search. Evolutionary Programming is based on the early book by L. Fogel, Owens and Walsh (1966) entitled, Artificial Intelligence Through Simulated Evolution. The individuals, or "organisms," in this study were finite state machines. Organisms that best solved some target function obtained the opportunity to reproduce, and parents were mutated to create offspring. There has been renewed interest in Evolutionary Programming as reflected by the 1992 First Annual Conference on Evolutionary Programming (Fogel and Atmar, 1992).

Evolution Strategies are based on the work of Rechenberg (1973) and Schwefel (1975; 1981) and are discussed in a survey by Back, Hoffmeister and Schwefel (1991). Two examples of Evolution Strategies (ES) are the (μ+λ)-ES and the (μ,λ)-ES. In the (μ+λ)-ES, μ parents produce λ offspring; the population is then reduced again to μ parents by selecting the best solutions from among both the parents and offspring. Thus, parents survive until they are replaced by better solutions. The (μ,λ)-ES is closer to the generational model used in canonical genetic algorithms: offspring replace parents and then undergo selection. Recombination operators for evolution strategies also tend to differ from Holland-style crossover, allowing operations such as averaging parameters, for example, to create an offspring.

8.1 Genitor

Genitor (Whitley, 1988; 1989) was the first of what Syswerda (1989) has termed "steady state" genetic algorithms. The name "steady state" is somewhat misleading, since these algorithms show more variance than canonical genetic algorithms in terms of hyperplane sampling behavior (Syswerda, 1991) and are therefore more susceptible to sample error and genetic drift.

There are three differences between Genitor-style algorithms and canonical genetic algorithms. First, reproduction produces one offspring at a time: two parents are selected for reproduction and produce an offspring that is immediately placed back into the population. The second major difference is in how that offspring is placed back in the population. Offspring do not replace parents, but rather the least fit (or some relatively less fit) member of the population. In Genitor, the worst individual in the population is replaced. Goldberg and Deb (1991) have shown that replacing the worst member of the population generates much higher selective pressure than random replacement. But higher selective pressure is not the only difference between Genitor and the canonical genetic algorithm. The third difference between Genitor and most other forms of genetic algorithms is that fitness is assigned according to rank rather than by fitness proportionate reproduction. Ranking helps to maintain a more constant selective pressure over the course of search. To borrow terminology used by the Evolution Strategy community (as suggested by Larry Eshelman), Genitor is a (μ+λ) strategy while the canonical genetic algorithm is a (μ,λ) strategy. The advantage is that the best points found in the search are maintained in the population, so the accumulation of improved strings in the population is monotonic. This results in a more aggressive search that in practice is often quite effective.

8.2 CHC

Another genetic algorithm that monotonically collects the best strings found so far is the CHC algorithm developed by Larry Eshelman (1991). CHC stands for Cross generational elitist selection, Heterogeneous recombination (by incest prevention) and Cataclysmic mutation, which is used to restart the search when the population starts to converge. CHC explicitly borrows from the (μ+λ) strategy of Evolution Strategies. After recombination, the N best unique individuals are drawn from the parent population and offspring population to create the next generation, and duplicates are removed from the population. As Goldberg has shown with respect to Genitor, this kind of "survival of the fittest" replacement method already imposes considerable selective pressure, so that there is no real need to use any other selection mechanism. Thus CHC uses random selection, except that restrictions are imposed on which strings are allowed to mate: strings with binary encodings must be a certain Hamming distance away from one another before they are allowed to reproduce. This form of "incest prevention" is designed to promote diversity. Eshelman also uses a form of uniform crossover called HUX where exactly half of the differing bits are swapped during crossover. The rationale behind CHC is to have a very aggressive search (by using monotonic selection through survival of the best strings) and to offset the aggressiveness of the search by using highly disruptive operators such as uniform crossover.

With such small population sizes, however, the population converges to the point that it begins to more or less reproduce many of the same strings. At this point the CHC algorithm uses cataclysmic mutation: all strings undergo heavy mutation, except that the best string is preserved intact. After mutation, genetic search is restarted using only crossover. CHC is typically run using small population sizes (e.g., 50); using uniform crossover in this context is therefore consistent with DeJong and Spears' (1991) conjecture that uniform crossover can provide better sampling coverage in the context of small populations.

8.3 Hybrid Algorithms

L. "Dave" Davis states in the Handbook of Genetic Algorithms, "Traditional genetic algorithms, although robust, are generally not the most successful optimization algorithm on any particular domain" (1991:59). Davis argues that hybridizing genetic algorithms with the most successful optimization methods for particular problems gives one the best of both worlds: correctly implemented, these algorithms should do no worse than the (usually more traditional) method with which the hybridizing is done. Of course, it also introduces the additional computational overhead of a population based search.

Davis often uses real valued encodings instead of binary encodings, and employs "recombination operators" that may be domain specific. Other researchers, such as Michalewicz (1992), also use nonbinary encodings and specialized operations in combination with a genetic based model of search. Muhlenbein takes a similar opportunistic view of hybridization. In a description of a parallel genetic algorithm Muhlenbein (1991:320) states, "The offspring does local hill-climbing," and furthermore, "Each individual does local hill-climbing."

Experimental researchers and theoreticians are particularly divided on the issue of hybridization. By adding hill-climbing or hybridizing with some other optimization method, learning is being added to the evolution process. Coding the learned information back onto the chromosome means that the search utilizes a form of Lamarckian evolution. The chromosomes improved by local hill-climbing or other methods are placed in the genetic population and allowed to compete for reproductive opportunities.

The main criticism is that if we wish to preserve the schema processing capabilities of the genetic algorithm, then Lamarckian learning should not be used. Changing information in the offspring inherited from the parents results in a loss of inherited schemata. This alters the statistical information about hyperplane partitions that is implicitly contained in the population. Therefore using local optimization to improve each offspring undermines the genetic algorithm's ability to search via hyperplane sampling.

Despite the theoretical objections, hybrid genetic algorithms typically do well at optimization tasks. There may be several reasons for this. First, using local optimization to improve the initial population of strings only biases the initial hyperplane samples, but does not interfere with subsequent hyperplane sampling. Second, a hybrid strategy impairs hyperplane sampling, but does not disrupt it entirely. Third, in general hill-climbing may find a small number of significant improvements, but may not dramatically change the offspring; in this case, the effects on schemata and hyperplane sampling may be minimal. Unless the objective function is severely multimodal it may be likely that some strings (offspring) will be in the basin of attraction of the global solution, in which case hill-climbing is a fast and effective form of search; the hybrid genetic algorithm is then hill-climbing from multiple points in the search space.

9 Hill-climbers or Hyperplane Samplers?

In a recent paper entitled, "How genetic algorithms really work: I. Mutation and Hillclimbing," Muhlenbein shows that an Evolution Strategy algorithm using only mutation works quite well on a relatively simple test suite. Muhlenbein states that for many problems "many nonstandard genetic algorithms work well and the standard genetic algorithm performs poorly" (1992:24). For example, some researchers argue that crossover is unnecessary and that mutation is sufficient for robust and effective search.

This raises a very interesting issue. When is a genetic algorithm a hyperplane sampler and when is it a hill-climber? This is a nontrivial question, since it is the hyperplane sampling abilities of genetic algorithms that are usually touted as the source of global sampling. All the theory concerning hyperplane sampling has been developed with respect to the canonical genetic algorithm, while alternative forms of genetic algorithms often use mechanisms such as monotonic selection of the best strings which could easily lead to increased hill-climbing.

Vose's work (personal communication, June 1993) with exact models of the canonical genetic algorithm indicates that even low levels of mutation can have a significant impact on convergence and change the number of fixed points in the space. (For the functions Vose has examined so far, mutation always reduces the number of fixed points.)

In practice there may be clues as to when hill-climbing is a dominant factor in a search. First, hyperplane sampling requires larger populations, while small populations are much more likely to rely on hill-climbing. A population of 20 individuals just doesn't provide very much information about hyperplane partitions, except perhaps very low order hyperplanes (there are only 5 samples of each order-2 hyperplane in a population of 20). Second, very high selective pressure suggests hill-climbing may dominate the search. If the 5 best individuals in a population of 100 strings reproduce 95% of the time, then the effective population size may not be large enough to support hyperplane sampling.

10 Parallel Genetic Algorithms

Part of the biological metaphor used to motivate genetic search is that it is inherently parallel. In natural populations, thousands or even millions of individuals exist in parallel. This suggests a degree of parallelism that is directly proportional to the population size used in genetic search. In this paper, three different ways of exploiting parallelism in genetic algorithms will be reviewed. First, a parallel genetic algorithm similar to the canonical genetic algorithm will be reviewed; next an Island Model using distinct subpopulations will be presented; finally, a fine grain massively parallel implementation that assumes one individual resides at each processor will be explored. While these fine grain algorithms have been referred to by a number of somewhat awkward names (e.g., fine grain genetic algorithms, or massively parallel genetic algorithms), the name cellular genetic algorithm is used in this tutorial. It can be shown that the fine grain models are a subclass of cellular automata (Whitley, 1993b). In each of the following models, strings are mapped to processors in a particular way. Usually this is done in a way that maximizes parallelism while avoiding unnecessary processor communication. What tends to be different is the role of local versus global communication.

10.1 Global Populations with Parallelism

The most direct way to implement a parallel genetic algorithm is to implement something close to a canonical genetic algorithm. Recall that the implementation of one generation in a canonical genetic algorithm can be seen as a two step process. First, selection is used to create an intermediate population of duplicate strings selected according to fitness. Second, crossover and mutation are applied to produce the next generation. The only change that will be made is that selection will be done by Tournament Selection (Goldberg, 1990; Goldberg and Deb, 1991). Instead of using fitness proportionate reproduction or directly using ranking, tournaments are held to fill the intermediate population. Assume two strings are selected out of the current population after evaluation. The best of the two strings is then placed in the intermediate population. This process of randomly selecting two strings from the current population and placing the best in the intermediate population is repeated until the intermediate population is full. Tournament selection implements a noisy form of ranking.
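Tournament selection as just described can be sketched in a few lines; the function name and the tournament-size parameter are illustrative assumptions.

    import random

    def tournament_selection(population, fitnesses, k=2):
        """Fill an intermediate population by repeated size-k tournaments.

        k strings are sampled at random and the best one is copied into the
        intermediate population; this is repeated until it is full.
        """
        intermediate = []
        n = len(population)
        while len(intermediate) < n:
            contestants = random.sample(range(n), k)
            winner = max(contestants, key=lambda i: fitnesses[i])
            intermediate.append(population[winner])
        return intermediate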

10.2 Island Models

One motivation for using Island Models is to exploit a more coarse grain parallel model. Assume we wish to use 16 processors and have a population of 1,600 strings, or we might wish to use 64 processors and 6,400 strings. One way to do this is to break the total population down into subpopulations of 100 strings each. Each one of these subpopulations could then execute as a normal genetic algorithm. It could be a canonical genetic algorithm, or Genitor, or CHC. Each subpopulation is an island, and there is some designated way in which genetic material is moved from one island to another. Occasionally, perhaps every five generations or so, the subpopulations would swap a few strings. This migration allows subpopulations to share genetic material (Whitley and Starkweather, 1990; Gorges-Schleuter, 1991; Tanese, 1989; Starkweather et al., 1991).

Assume for a moment that one executes 16 separate genetic algorithms, each using a population of 100 strings without migration. In this case, 16 independent searches occur. Each search will be somewhat different, since the initial populations will impose a certain sampling bias; also, genetic drift will tend to drive these populations in different directions. Sampling error and genetic drift are particularly significant factors in small populations and, as previously noted, are even more pronounced in genetic algorithms such as Genitor and CHC when compared to the canonical genetic algorithm. By introducing migration, the Island Model is able to exploit differences in the various subpopulations; this variation in fact represents a source of genetic diversity. However, if a large number of strings migrate each generation, then global mixing occurs and local differences between islands will be driven out. If migration is too infrequent, it may not be enough to prevent each small subpopulation from prematurely converging.
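The island structure described above can be sketched as a simple driver loop. This is an illustrative skeleton only: the ring migration topology, the choice of two migrants, the five-generation interval, and the run_one_generation callback are assumptions of the example; the tutorial itself leaves the per-island algorithm and the migration scheme open.

```python
import random

def island_model(islands, run_one_generation, generations=100,
                 migration_interval=5, migrants=2):
    """Coarse grain Island Model: each subpopulation evolves independently and
    occasionally passes a few strings to the next island on a ring."""
    for gen in range(1, generations + 1):
        # Each island runs one generation of whatever GA it uses (canonical, Genitor, CHC, ...).
        islands = [run_one_generation(pop) for pop in islands]
        if gen % migration_interval == 0:
            for k, pop in enumerate(islands):
                neighbour = islands[(k + 1) % len(islands)]   # ring topology (an assumption)
                for _ in range(migrants):
                    # A randomly chosen emigrant overwrites a random resident of the neighbour island.
                    neighbour[random.randrange(len(neighbour))] = random.choice(pop)
    return islands
```

Raising migrants or lowering migration_interval pushes the model toward global mixing, while very infrequent migration leaves each island to converge on its own, matching the trade-off described above.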

[Figure 6: An example of both an island model and a cellular genetic algorithm. The coloring of the cells in the cellular genetic algorithm represents genetically similar material that forms virtual islands isolated by distance. The arrows in the cellular model indicate that the grid wraps around to form a torus. Panels: An Island Model Genetic Algorithm; A Cellular Genetic Algorithm.]

10.3 Cellular Genetic Algorithms

Assume we have 2,500 simple processors laid out on a 50x50 2-dimensional grid. Processors communicate only with their immediate neighbors (e.g., north, south, east and west: NSEW). Processors on the edge of the grid wrap around to form a torus. How should one implement a genetic algorithm on such an architecture? One can obviously assign one string per processor or cell. Several people have proposed this type of computational model (Manderick and Spiessens, 1989; Collins and Jefferson, 1991; Hillis, 1990; Davidor, 1991; Gorges-Schleuter, 1991; Muhlenbein, 1991). But global random mating would now seem inappropriate given the communication restrictions. Instead, it is much more practical to have each string (i.e., processor) seek a mate close to home. Each processor can pick the best string in its local neighborhood to mate with, or alternatively, some form of local probabilistic selection could be used. In either case, only one offspring is produced and becomes the new resident at that processor.

After the first random population is evaluated, the pattern of strings over the set of processors should also be random. After a few generations, there emerge many small local pockets of similar strings with similar fitness values. After several generations, competition between local groups will result in fewer and larger neighborhoods.

The common theme in cellular genetic algorithms is that selection and mating are typically restricted to a local neighborhood. There are no explicit islands in the model, but there is the potential for similar effects. Assuming that mating is restricted to adjacent processors, if one neighborhood of strings is 20 or 25 moves away from another neighborhood of strings, these neighborhoods are just as isolated as two subpopulations on separate islands. This kind of separation is referred to as isolation by distance (Wright, 1932). Of course, neighbors that are only 4 or 5 moves away have a greater potential for interaction. Local mating and selection creates local evolutionary trends, again due to sampling effects in the initial population and genetic drift.
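A single synchronous update of such a cellular genetic algorithm might look like the following sketch. The helper names are assumptions of the example: fitness scores a string, crossover is assumed to return a pair of children from which one is chosen at random, and mutate returns a possibly modified copy. Some cellular GAs instead replace the resident only when the offspring is better; the version below always installs the offspring.

```python
import random

def cellular_ga_step(grid, fitness, crossover, mutate):
    """One synchronous update of a cellular GA on a toroidal grid.

    grid[r][c] holds one string per cell; each cell mates with the best of its
    four NSEW neighbours (indices wrap, so the grid is a torus) and the single
    offspring produced becomes the new resident of that cell.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [grid[(r - 1) % rows][c], grid[(r + 1) % rows][c],
                          grid[r][(c - 1) % cols], grid[r][(c + 1) % cols]]
            mate = max(neighbours, key=fitness)                  # best string close to home
            child = random.choice(crossover(grid[r][c], mate))   # crossover assumed to return two children
            new_grid[r][c] = mutate(child)
    return new_grid
```

Because each cell reads only its four NSEW neighbours, every cell's update can run on its own processor with purely local communication.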

11 Conclusions

One thing that is striking about genetic algorithms and the various parallel models is the richness of this form of computation. What may seem like simple changes in the algorithm often result in surprising kinds of emergent behavior. Recent theoretical advances have also improved our understanding of genetic algorithms and have opened the door to using more advanced analytical methods.

Many other timely issues have not been covered in this tutorial. In particular, the issue of deception has not been discussed. The notion of deception, in simplistic terms, deals with conflicting hyperplane competitions that have the potential to either mislead the genetic algorithm, or to simply confound the search because the conflicting hyperplane competitions interfere with the search process. For an introduction to the notion of deception see Goldberg (1987) and Whitley (1991); for a criticism of the work on deception see Grefenstette (1993).

Acknowledgements: This tutorial not only represents information transmitted through scholarly works, but also through conference presentations, personal discussions, debates and even disagreements. My thanks to the people in the genetic algorithm community who have educated me over the years. Any errors or errant interpretations of other works are my own. Work presented in the tutorial was supported by NSF grant IRI-9010546 and the Colorado Advanced Software Institute.

References

Ackley, D. (1987) A Connectionist Machine for Genetic Hillclimbing. Kluwer Academic Publishers.
Antonisse, J. (1989) A New Interpretation of the Schema Notation that Overturns the Binary Encoding Constraint. Proc. 3rd International Conf. on Genetic Algorithms. Morgan Kaufmann.
Back, T., Hoffmeister, F. and Schwefel, H.P. (1991) A Survey of Evolution Strategies. Proc. 4th International Conf. on Genetic Algorithms. Morgan Kaufmann.
Baker, J. (1985) Adaptive selection methods for genetic algorithms. Proc. International Conf. on Genetic Algorithms and Their Applications. J. Grefenstette, ed. Lawrence Erlbaum.
Baker, J. (1987) Reducing Bias and Inefficiency in the Selection Algorithm. Genetic Algorithms and Their Applications: Proc. 2nd International Conf. Lawrence Erlbaum.
Booker, L. (1987) Improving Search in Genetic Algorithms. In Genetic Algorithms and Simulated Annealing, L. Davis, ed. Morgan Kaufmann, pp. 61-73.
Bridges, C. and Goldberg, D. (1987) An analysis of reproduction and crossover in a binary-coded genetic algorithm. Genetic Algorithms and Their Applications: Proc. 2nd International Conf. Lawrence Erlbaum.
Collins, R. and Jefferson, D. (1991) Selection in Massively Parallel Genetic Algorithms. Proc. 4th International Conf. on Genetic Algorithms. Morgan Kaufmann, pp. 249-256.
Davidor, Y. (1991) A Naturally Occurring Niche & Species Phenomenon: The Model and First Results. Proc. 4th International Conf. on Genetic Algorithms. Morgan Kaufmann, pp. 257-263.
Davis, L., ed. (1991) Handbook of Genetic Algorithms. Van Nostrand Reinhold.
DeJong, K. (1975) An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD Dissertation, Dept. of Computer and Communication Sciences, Univ. of Michigan, Ann Arbor.

Eshelman, L. (1991) The CHC Adaptive Search Algorithm. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann.
Eshelman, L. and Schaffer, J.D. (1993) Real-Coded Genetic Algorithms and Interval Schemata. Foundations of Genetic Algorithms -2-. D. Whitley, ed. Morgan Kaufmann.
Fitzpatrick, J.M. and Grefenstette, J.J. (1988) Genetic Algorithms in Noisy Environments. Machine Learning, 3(2/3): 101-120.
Fogel, D. and Atmar, W., eds. (1992) First Annual Conference on Evolutionary Programming.
Fogel, L., Owens, A. and Walsh, M. (1966) Artificial Intelligence Through Simulated Evolution. John Wiley, New York.
Goldberg, D. (1987) Simple Genetic Algorithms and the Minimal, Deceptive Problem. In Genetic Algorithms and Simulated Annealing, L. Davis, ed. Pitman.
Goldberg, D. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
Goldberg, D. (1990) A Note on Boltzmann Tournament Selection for Genetic Algorithms and Population-oriented Simulated Annealing. TCGA Report 90003, Engineering Mechanics, Univ. of Alabama.
Goldberg, D. (1991) The Theory of Virtual Alphabets. Parallel Problem Solving from Nature. Springer Verlag.
Goldberg, D. and Bridges, C. (1990) An Analysis of a Reordering Operator on a GA-Hard Problem. Biological Cybernetics, 62:397-405.
Goldberg, D. and Deb, K. (1991) A Comparative Analysis of Selection Schemes Used in Genetic Algorithms. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann, pp. 69-93.
Gorges-Schleuter, M. (1991) Explicit Parallelism of Genetic Algorithms through Population Structures. Parallel Problem Solving from Nature. Springer Verlag, pp. 150-159.
Grefenstette, J.J. (1986) Optimization of Control Parameters for Genetic Algorithms. IEEE Trans. Systems, Man, and Cybernetics, 16(1): 122-128.
Grefenstette, J.J. (1993) Deception Considered Harmful. Foundations of Genetic Algorithms -2-. D. Whitley, ed. Morgan Kaufmann.
Grefenstette, J.J. and Baker, J. (1989) How Genetic Algorithms Work: A Critical Look at Implicit Parallelism. Proc. 3rd International Conf. on Genetic Algorithms. Morgan Kaufmann.
Hillis, W.D. (1990) Co-Evolving Parasites Improve Simulated Evolution as an Optimizing Procedure. Physica D 42, pp. 228-234.
Holland, J. (1975) Adaptation In Natural and Artificial Systems. University of Michigan Press.
Liepins, G. and Vose, M. (1990) Representation Issues in Genetic Algorithms. Journal of Experimental and Theoretical Artificial Intelligence, 2:101-115.
Manderick, B. and Spiessens, P. (1989) Fine Grained Parallel Genetic Algorithms. Proc. 3rd International Conf. on Genetic Algorithms. Morgan Kaufmann, pp. 428-433.
Michalewicz, Z. (1992) Genetic Algorithms + Data Structures = Evolution Programs. Springer Verlag, AI Series.
Muhlenbein, H. (1991) Evolution in Time and Space - The Parallel Genetic Algorithm. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann, pp. 316-337.
Muhlenbein, H. (1992) How genetic algorithms really work: I. Mutation and Hillclimbing. Parallel Problem Solving from Nature -2-. R. Manner and B. Manderick, eds. North Holland.
Nix, A. and Vose, M. (1992) Modeling Genetic Algorithms with Markov Chains. Annals of Mathematics and Artificial Intelligence, 5:79-88.

Rechenberg, I. (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart.
Schaffer, J.D. (1987) Some Effects of Selection Procedures on Hyperplane Sampling by Genetic Algorithms. In Genetic Algorithms and Simulated Annealing, L. Davis, ed. Pitman.
Schwefel, H.P. (1975) Evolutionsstrategie und numerische Optimierung. Dissertation, Technische Universitat Berlin.
Schwefel, H.P. (1981) Numerical Optimization of Computer Models. Wiley.
Spears, W. and DeJong, K. (1991) An Analysis of Multi-Point Crossover. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann.
Starkweather, T., Whitley, D. and Mathias, K. (1991) Optimization Using Distributed Genetic Algorithms. Parallel Problem Solving from Nature. Springer Verlag.
Syswerda, G. (1989) Uniform Crossover in Genetic Algorithms. Proc. 3rd International Conf. on Genetic Algorithms. Morgan Kaufmann, pp. 2-9.
Syswerda, G. (1991) A Study of Reproduction in Generational and Steady-State Genetic Algorithms. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann, pp. 94-101.
Tanese, R. (1989) Distributed Genetic Algorithms. Proc. 3rd International Conf. on Genetic Algorithms. Morgan Kaufmann, pp. 434-439.
Vose, M. (1993) Modeling Simple Genetic Algorithms. Foundations of Genetic Algorithms -2-. D. Whitley, ed. Morgan Kaufmann, pp. 63-73.
Vose, M. and Liepins, G. (1991) Punctuated Equilibria in Genetic Search. Complex Systems, 5:31-44.
Whitley, D. (1989) The GENITOR Algorithm and Selective Pressure. Proc. 3rd International Conf. on Genetic Algorithms. Morgan Kaufmann, pp. 116-121.
Whitley, D. (1991) Fundamental Principles of Deception in Genetic Search. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann.
Whitley, D. (1993a) An Executable Model of a Simple Genetic Algorithm. Foundations of Genetic Algorithms -2-. D. Whitley, ed. Morgan Kaufmann.
Whitley, D. (1993b) Cellular Genetic Algorithms. Proc. 5th International Conference on Genetic Algorithms. Morgan Kaufmann.
Whitley, D., Das, R. and Crabb, C. (1992) Tracking Primary Hyperplane Competitors During Genetic Search. Annals of Mathematics and Artificial Intelligence, 6:367-388.
Whitley, D. and Kauth, J. (1988) GENITOR: a Different Genetic Algorithm. Proceedings of the Rocky Mountain Conference on Artificial Intelligence, Denver, CO, pp. 118-130.
Whitley, D. and Starkweather, T. (1990) Genitor II: a Distributed Genetic Algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 2:189-214.
Winston, P. (1992) Artificial Intelligence, Third Edition. Addison-Wesley.
Wright, A. (1991) Genetic Algorithms for Real Parameter Optimization. Foundations of Genetic Algorithms. G. Rawlins, ed. Morgan Kaufmann.
Wright, S. (1932) The Roles of Mutation, Inbreeding, Crossbreeding, and Selection in Evolution. Proc. 6th Int. Congr. on Genetics, pp. 356-366.

[Table: one generation of the worked canonical genetic algorithm example. For each of the 20 strings it lists the two 10-bit substrings, the decoded parameter values x1 and x2, the raw fitness f(x), the scaled fitness F(x), the selection statistics A through F defined below, and the substrings of the strings placed in the mating pool.]

A = Expected Count
B = Probability of Selection
C = Cumulative Probability of Selection
D = Random Number between 0 and 1
E = String Number
F = True Count in the Mating Pool
