Chapter 12
Introduction
A genetic algorithm (GA) models the genetic process that gives rise to evolution.
It models sexual reproduction, where both parents pass some genetic information to their offspring.
It works very well in practice, but it is not guaranteed to find the best solution.
It is a very popular algorithm to use when no other way of finding a reasonable solution is known.
It performs both exploitation and exploration, so that it can make incremental improvements to current good solutions but also find radically new solutions, some of which may be better than the current best.
Genetic Algorithm
A GA is a computational approximation to how evolution performs search, which is by altering the genome and thus changing the fitness of individuals. To turn the theory into an algorithm we need:
a method for representing problems as chromosomes
a way to calculate the fitness of a solution
a selection method to choose parents
a way to generate offspring by breeding the parents
To describe these methods we will use an NP-complete example, known as the knapsack problem, in the following slides.
Knapsack problem
Suppose that you are packing for your holidays.
You've bought the biggest and best rucksack that was for sale, but there is still no way that you are going to fit in everything you want to take (camera, money, addresses of friends, etc.) and the things that your mum is insisting you take (spare underwear, phrasebook, stamps to write home with, etc.). As a good computer scientist you decide to assign a value to each item, and measure how much space it takes up.
Then you want to maximize the value of the items you will take with you, with the constraint that everything has to fit into the bag.
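Stated as an optimization problem, the packing task above might be written as follows. The symbols are assumptions introduced here for illustration: $v_i$ is the value of item $i$, $s_i$ the space it takes up, $C$ the capacity of the bag, and $x_i$ is 1 if item $i$ is taken and 0 otherwise.

```latex
\max_{x \in \{0,1\}^n} \; \sum_{i=1}^{n} v_i x_i
\quad \text{subject to} \quad
\sum_{i=1}^{n} s_i x_i \le C
```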
String Representation
We need a way to represent an individual solution as a string, in analogy to the chromosome.
These strings are composed of symbols from some chosen alphabet; most of the time the alphabet is binary.
The values that each element of the string can take are analogous to alleles.
We create a set of random strings to be our initial population.
For example, if there were four things we wanted to take, then (0, 1, 1, 0) would mean that we take the middle two, but not the first or last.
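A minimal sketch of this representation in Python, assuming a binary alphabet stored as a list of 0/1 integers. The item names and the helper functions `random_string`, `random_population` and `items_taken` are hypothetical and chosen only for illustration.

```python
import random

def random_string(n_items, rng):
    """Return a random binary string (a list of 0/1) with one element per item."""
    return [rng.randint(0, 1) for _ in range(n_items)]

def random_population(pop_size, n_items, seed=0):
    """Create the initial population as a list of random binary strings."""
    rng = random.Random(seed)
    return [random_string(n_items, rng) for _ in range(pop_size)]

def items_taken(string, names):
    """Decode a string: keep the names whose corresponding element is 1."""
    return [name for bit, name in zip(string, names) if bit == 1]

# Hypothetical item names, used only for this example.
names = ["camera", "money", "phrasebook", "stamps"]
print(items_taken([0, 1, 1, 0], names))  # ['money', 'phrasebook']

population = random_population(pop_size=100, n_items=len(names))
```

A list of integers is used rather than a text string so that individual elements can be flipped or swapped easily by the genetic operators.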
Evaluating Fitness
A fitness function is a particular type of objective function that is used to summarize, as a single figure of merit, how close a given design solution is to achieving the set aims.
The fitness function can be seen as an oracle that takes a string as an argument and returns a value for that string.
It is the problem-specific part of the algorithm.
Clearly, the best string should have the highest fitness value, and the fitness value should decrease as the strings do less well on the problem.
The quality of a population depends on the fitness values of its strings.
However, a fitness function that only adds up the value of the chosen items does not tell us anything about whether they will fit into the bag; with such a fitness function the optimal solution is to take everything.
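One common way to make the fitness function respect the size constraint is to subtract a penalty for the amount by which the bag is overfull. The sketch below is an assumption, not the fitness function from the slides: the item values, sizes, capacity and the `penalty` factor are all made up for illustration.

```python
def knapsack_fitness(string, values, sizes, capacity, penalty=5.0):
    """Total value of the chosen items, minus a penalty if they overflow the bag.

    The penalty term is an assumed design choice; without it, taking
    everything would always score highest.
    """
    total_value = sum(v for bit, v in zip(string, values) if bit)
    total_size = sum(s for bit, s in zip(string, sizes) if bit)
    overflow = max(0, total_size - capacity)
    return total_value - penalty * overflow

# Hypothetical data: four items, bag capacity 7.
values = [10, 5, 8, 3]
sizes = [4, 2, 5, 1]

fits = knapsack_fitness([0, 1, 1, 0], values, sizes, capacity=7)        # size 7, no penalty
everything = knapsack_fitness([1, 1, 1, 1], values, sizes, capacity=7)  # size 12, overflow 5
print(fits, everything)
```

Note that a penalized fitness can become negative, which matters for selection methods that interpret fitness as a probability.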
Population
Usually, the first population in a GA is created randomly. After the fitness of every string is evaluated, the first generation is bred together to make the second generation, which is then used to generate the third, and so on. After the initial population is chosen randomly, the algorithm evolves in such a way that the fitness of individuals in the population increases over the generations.
Unlike real evolution, the population in this problem stays the same size at every iteration (here, 100).
Parent Selection
We need some method of selecting the best possible parents in order to improve the fitness of the new generation.
The idea is that fitness will improve if we select strings that are already relatively fit compared to the other members of the population (exploitation).
However, it is also good to allow some exploration, which means that weak strings must have some chance of being considered.
The basic idea is to choose strings in proportion to their fitness, so that fitter strings are more likely to be chosen to enter the mating pool.
Functional Selection
This is a comparatively simple method, commonly called truncation selection: just pick some fraction f of the best strings and ignore the rest. It is very easy to implement, but it limits the amount of exploration that is done, biasing the GA towards exploitation.
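Picking the best fraction f of the strings might be sketched as follows. The function name `truncation_selection` and the default `f=0.5` are assumptions for illustration.

```python
def truncation_selection(population, fitnesses, f=0.5):
    """Keep the fraction f of strings with the highest fitness; ignore the rest."""
    n_keep = max(1, int(len(population) * f))
    # Sort by fitness only, highest first.
    ranked = sorted(zip(fitnesses, population), key=lambda pair: pair[0], reverse=True)
    return [string for _, string in ranked[:n_keep]]

population = [[0, 0], [0, 1], [1, 0], [1, 1]]
fitnesses = [1, 4, 2, 3]
mating_pool = truncation_selection(population, fitnesses, f=0.5)
print(mating_pool)  # [[0, 1], [1, 1]]
```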
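In fitness proportional selection, string $\alpha$ is selected with probability $p^{\alpha} = F^{\alpha} / \sum_{\alpha'} F^{\alpha'}$, where $F^{\alpha}$ is the fitness of string $\alpha$. This probabilistic interpretation is the reason why fitness should be positive.
If the fitnesses are not guaranteed to be positive, then Boltzmann selection can be used to make them so ($s$ is the selection strength): $p^{\alpha} = \exp(s F^{\alpha}) / \sum_{\alpha'} \exp(s F^{\alpha'})$.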
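The two probabilistic selection schemes might be sketched as below. This assumes the usual forms: selection probability proportional to the fitness itself, and, for Boltzmann selection, proportional to exp(s * fitness). The function names are hypothetical.

```python
import math
import random

def proportional_probabilities(fitnesses):
    """Selection probability of each string is its share of the total fitness.

    Only meaningful when every fitness is positive.
    """
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def boltzmann_probabilities(fitnesses, s=1.0):
    """Selection probabilities proportional to exp(s * fitness).

    Works for negative fitnesses too. Subtracting the maximum before
    exponentiating avoids overflow and does not change the probabilities.
    """
    shift = max(fitnesses)
    weights = [math.exp(s * (f - shift)) for f in fitnesses]
    total = sum(weights)
    return [w / total for w in weights]

def select_parents(population, probabilities, k, rng):
    """Draw k strings (with replacement) according to the given probabilities."""
    return rng.choices(population, weights=probabilities, k=k)

rng = random.Random(0)
population = [[0, 0], [0, 1], [1, 0], [1, 1]]
probs = boltzmann_probabilities([-2.0, 1.0, 0.5, 3.0], s=1.0)
mating_pool = select_parents(population, probs, k=4, rng=rng)
```

A larger selection strength `s` concentrates the probability on the fittest strings (more exploitation); a smaller `s` flattens the distribution (more exploration).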
Generating Offspring
After selecting the parents from the current generation, we move on to generating offspring. We now decide how to combine the two parent strings to produce offspring, which is the genetics part of the algorithm. A few genetic operators are used for this.
Genetic Operators
There are various genetic operators in practice. We will discuss the following:
1. Crossover
2. Mutation
3. Elitism, Tournaments, and Niching
Crossover
Crossover is analogous to reproduction and biological crossover, upon which genetic algorithms are based. It is the process of taking more than one parent solution and producing a child solution from them: we generate the new string as part of the first parent and part of the second. There are several ways to choose which parts come from which parent:
Single Point Crossover
Multi Point Crossover
Uniform Crossover
In two-point crossover, everything between the two chosen points is swapped between the parent strings, yielding two child strings.
Uniform Crossover
Uniform crossover uses a fixed mixing ratio between the two parents.
Unlike single-point and multi-point crossover, uniform crossover lets the parents contribute at the gene level rather than the segment level.
If the mixing ratio is 0.5, the offspring has approximately half of its genes from the first parent and the other half from the second parent, with the positions taken from each parent chosen at random.
Crossover in String
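The three crossover variants named above might be sketched on list-based strings as follows. The function names are assumptions, and each function returns two children so that the second child receives whatever the first did not.

```python
import random

def single_point_crossover(a, b, rng):
    """Swap the tails of the two parents after one random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point_crossover(a, b, rng):
    """Swap everything between two random cut points (needs length >= 3)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform_crossover(a, b, rng, mixing_ratio=0.5):
    """Decide gene by gene which parent contributes to which child."""
    child1, child2 = [], []
    for gene_a, gene_b in zip(a, b):
        if rng.random() < mixing_ratio:
            child1.append(gene_a)
            child2.append(gene_b)
        else:
            child1.append(gene_b)
            child2.append(gene_a)
    return child1, child2

rng = random.Random(1)
mum = [0, 0, 0, 0, 0, 0]
dad = [1, 1, 1, 1, 1, 1]
print(single_point_crossover(mum, dad, rng))
print(two_point_crossover(mum, dad, rng))
print(uniform_crossover(mum, dad, rng))
```

Using all-zero and all-one parents makes it easy to see in the printed children which positions came from which parent.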
Mutation
Exploitation of the current best strings is performed by the mutation operator, which effectively performs local random search. The value of each element of the string is changed with some (usually low) probability p. For the knapsack problem, a mutation is a bit-flip.
Mutation (contd..)
For chromosomes with real values, some random number is generally added to or subtracted from the current value. Often p ≈ 1/L, where L is the string length, so that there is approximately one mutation in each string. This might seem quite high, but it is often found to be a good choice, given that the mutation rate has to trade off doing lots of local search against the risk of disrupting good solutions.
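Bit-flip mutation with the default rate of one expected flip per string might look like this. The function name `mutate` and the choice to return a new list rather than modify the string in place are assumptions.

```python
import random

def mutate(string, rng, p=None):
    """Flip each bit independently with probability p (default 1/L)."""
    if p is None:
        p = 1.0 / len(string)
    return [1 - bit if rng.random() < p else bit for bit in string]

rng = random.Random(0)
original = [0, 1, 1, 0, 1, 0, 0, 1]
mutated = mutate(original, rng)
n_flipped = sum(a != b for a, b in zip(original, mutated))
print(mutated, n_flipped)
```

With p = 1/L the number of flipped bits is one on average, so any single call may flip none, one, or several.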
Niching by fitness sharing biases the fitness function towards uncommon strings, but can also mean that very common good solutions are selected against.
Having made those choices, we can let the GA run on the problem, with a possible population and its offspring shown in the figure on the next slide, and look at the best solutions after some preset number of iterations.
Output for the knapsack problem that we defined earlier in the slides. We use the same fitness function that we defined earlier.
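Putting the pieces together, a complete run might be sketched as below. Everything specific here is an assumption made for illustration: the item values and sizes, the overflow penalty in the fitness function, the use of fitness proportional selection with shifted weights, single-point crossover, and a mutation rate of 1/L. It is not the program that produced the output on the slide.

```python
import random

def fitness(string, values, sizes, capacity, penalty=5.0):
    """Value of the chosen items minus an assumed penalty for overflowing the bag."""
    value = sum(v for bit, v in zip(string, values) if bit)
    size = sum(s for bit, s in zip(string, sizes) if bit)
    return value - penalty * max(0, size - capacity)

def run_ga(values, sizes, capacity, pop_size=100, n_generations=50, seed=0):
    rng = random.Random(seed)
    n = len(values)
    p_mutation = 1.0 / n  # roughly one bit-flip per string

    def score(s):
        return fitness(s, values, sizes, capacity)

    # Random initial population of binary strings.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=score)

    for _ in range(n_generations):
        scores = [score(s) for s in population]
        # Shift scores so every selection weight is positive.
        low = min(scores)
        weights = [sc - low + 1e-9 for sc in scores]

        next_population = []
        while len(next_population) < pop_size:
            mum, dad = rng.choices(population, weights=weights, k=2)
            point = rng.randint(1, n - 1)  # single-point crossover
            for child in (mum[:point] + dad[point:], dad[:point] + mum[point:]):
                child = [1 - b if rng.random() < p_mutation else b for b in child]
                next_population.append(child)
        population = next_population[:pop_size]  # population size stays fixed

        # Remember the best string seen in any generation.
        candidate = max(population, key=score)
        if score(candidate) > score(best):
            best = candidate
    return best, score(best)

# Hypothetical items: six values and sizes, bag capacity 9.
values = [10, 5, 8, 3, 7, 2]
sizes = [4, 2, 5, 1, 3, 2]
best_string, best_score = run_ga(values, sizes, capacity=9)
print(best_string, best_score)
```

Because the run is stochastic, a different `seed` can give a different best string, and nothing guarantees that the result is the true optimum.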
Limitations of GA
A GA can be very slow. It may take a long time to escape from local maxima.
There is no absolute assurance that a genetic algorithm will find a global optimum.
Because we do not know anything about the fitness landscape, we cannot see how well the GA is doing.
A basic criticism of genetic algorithms is that it is very hard to analyze their behavior.
A GA can be used to choose the topology of a neural network. Mutation is used to find the most suitable topology, using the following mutations:
Delete a neuron
Delete a weight connection
Add a neuron
Add a connection
Genetic Programming
Genetic programming was introduced by John Koza. It combines the ideas of machine learning and evolved tree structures. Tree-based variants of mutation and crossover are defined, such as replacing sub-trees by other sub-trees that are either randomly generated or swapped in from another tree. The genetic program then runs just like a normal genetic algorithm, but acting on these program trees rather than strings.
[Figure: arithmetic trees, including trees for the expressions 2x-3 and x^2+1]
The search space is unbelievably large and the mutation operator is not especially useful, so a lot depends upon the initial population.
A set of possibly useful sub-trees is usually chosen by the system developer first, in order to give the system a head start.
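A small sketch of arithmetic program trees and a sub-tree mutation, assuming a nested-tuple representation that is not taken from the slides: a tree is the variable "x", a number, or a tuple (operator, left, right). The helper names and the probability 0.3 are hypothetical.

```python
import random

def evaluate(tree, x):
    """Evaluate an arithmetic tree at the given value of x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, x), evaluate(right, x)
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        raise ValueError(f"unknown operator: {op!r}")
    return x if tree == "x" else tree  # leaf: the variable or a constant

def random_tree(rng, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", rng.randint(0, 3)])
    return (rng.choice("+-*"), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def subtree_mutation(tree, rng, depth=2):
    """Replace one randomly chosen sub-tree by a randomly generated one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, subtree_mutation(left, rng, depth), right)
    return (op, left, subtree_mutation(right, rng, depth))

two_x_minus_3 = ("-", ("*", 2, "x"), 3)        # 2x-3
x_squared_plus_1 = ("+", ("*", "x", "x"), 1)   # x^2+1

print(evaluate(two_x_minus_3, 5))     # 7
print(evaluate(x_squared_plus_1, 3))  # 10

rng = random.Random(0)
mutant = subtree_mutation(two_x_minus_3, rng)
print(mutant, evaluate(mutant, 5))
```

Tree crossover would work the same way, except that the replacement sub-tree is taken from a second parent tree instead of being generated at random.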
Initially, each value of this probability vector is 0.5, so that each element has an equal chance of being 0 or 1.
where best and second represent the best and second-best elements of the population