
Exercise Genetic Algorithms

In this Exercise
1. Theory (in brief)

2. Things you have to consider
3. GAs and Matlab
4. Part #1 (fitness function, variables, representation, plots)
5. Part #2 (population diversity – size – range, fitness scaling)
6. Part #3 (selection, elitism, mutation)
7. Part #4 (global vs. local minima)

Duration: 120 min

1. Theory (in brief) (5 min)

A Genetic Algorithm is an optimization technique that is based on the evolution theory. Instead of
searching for a solution to a problem in the "state space" (like the traditional search algorithms do), a
GA works in the "solution space" and builds (or better, "breeds") new, hopefully better solutions
based on existing ones.
The general idea behind GAs is that we can build a better solution if we somehow combine the
"good" parts of other solutions (schemata theory), just like nature does by combining the DNA of liv-
ing beings. The overall idea of a GA is depicted in Figure 1 (you should refer to the theory for the very
details).

Figure 1: The outline of a Genetic Algorithm. Parents are selected from population Pi and mated to
produce descendants; mutation and elitism then shape the next population Pi+1.
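The loop in Figure 1 can be sketched in code. The following Python sketch only illustrates the outline; tournament selection, uniform crossover and Gaussian mutation are arbitrary choices made here for the illustration, not the operators used in the rest of the exercise:

```python
import random

def next_generation(population, fitness, n_elite=2, crossover_fraction=0.8):
    """One GA generation: elitism, selection, mating, mutation.

    Illustrative sketch only; operator choices are simplified stand-ins
    for the steps in Figure 1, not Matlab's implementation.
    """
    ranked = sorted(population, key=fitness)   # minimization: lower is better
    elite = ranked[:n_elite]                   # elite children survive unchanged
    n_rest = len(population) - n_elite
    n_xover = round(crossover_fraction * n_rest)

    def select():                              # tournament selection of size 2
        a, b = random.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    def mate(p1, p2):                          # uniform crossover
        return [random.choice(pair) for pair in zip(p1, p2)]

    def mutate(p):                             # small Gaussian perturbation
        return [g + random.gauss(0, 0.1) for g in p]

    children = [mate(select(), select()) for _ in range(n_xover)]
    children += [mutate(select()) for _ in range(n_rest - n_xover)]
    return elite + children
```

Because the elite pass through unchanged, the best fitness value in the population can never get worse from one generation to the next.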

2. Things you have to consider (and be aware of) (5 min)

The first thing you must do in order to use a GA is to decide if it is possible to automatically build
solutions to your problem. For example, in the Traveling Salesman Problem, every route that passes
through the cities in question is potentially a solution, although probably not the optimal one. You
must be able to do that because a GA requires an initial population P of solutions.

Kokkoras F. | Paraskevopoulos K.
International Hellenic University, Genetic Algorithms

Then you must decide what "gene" representation you will use. You have a few alternatives like
binary, integer, double, permutation, etc., with binary and double being the most commonly used
since they are the most flexible. After having selected the representation you must decide, in order:
 the method to select parents from the population P (Cost Roulette Wheel, Stochastic Uni-
versal Sampling, Rank Roulette Wheel, Tournament Selection, etc.)
 the way these parents will "mate" to create descendants (too many methods to mention
here – just note that your available options are a result of the representation decided earlier)
 the mutation method (optional but useful – again, options are representation dependent)
 the method you will use to populate the next generation (Pi+1) (age based, quality based,
etc. – you will probably use elitism as well)
 the algorithm's termination condition (number of generations, time limit, acceptable quality
threshold, improvement stall, etc. – a combination of these is commonly used)
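The termination bullet above combines several criteria. A minimal, language-independent check could look like the Python sketch below; the parameter names and default thresholds are illustrative, not Matlab's option names:

```python
import time

def should_stop(generation, start_time, best_history,
                max_generations=100, time_limit=60.0,
                fitness_limit=1e-6, stall_generations=50):
    """Combined termination test mirroring the bullet list above.

    best_history holds the best fitness value of every generation so far
    (lower is better, since we minimize). All thresholds are illustrative.
    """
    if generation >= max_generations:                       # generation cap
        return True
    if time.time() - start_time >= time_limit:              # time limit
        return True
    if best_history and best_history[-1] <= fitness_limit:  # quality threshold
        return True
    # improvement stall: no progress over the last N generations
    if len(best_history) > stall_generations and \
       best_history[-1] >= best_history[-1 - stall_generations]:
        return True
    return False
```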

3. GAs and Matlab (10 min)

Figure 2: GAs in Matlab's Optimization Toolbox

Matlab provides an optimization toolbox that includes a GA-based solver. You start the toolbox
by typing optimtool in the Matlab command line and pressing enter. As soon as the optimization
window appears, you select the solver ga – Genetic Algorithm and you are ready to go. Matlab does
not provide every method available in the literature for every step, but it does have a lot of options
for fine tuning and also provides hooks for customization. The user should program (by writing M-files)
any extended functionality required. Take your time and explore the window. If you mess up
the settings, close the window and run the toolbox again before you proceed.
Matlab R2008a (v.7.6) was used for this tutorial. Earlier versions are OK, as long as the proper
toolbox is present and installed.

4. Part #1 (fitness function, variables, representation, plots) (15 min)

The first thing you have to do is to provide the fitness function, that is, the function that calculates
the quality of each member of the population (or in plain mathematics, the function you have to op-
timize). Let's use one provided by Matlab: type @rastriginsfcn in the proper field and set the
Number of variables to 2. The representation used is
defined in the Options-Population section. The default
selection Double Vector is fine.
To get an idea of what we are looking for, check
the equation of this function and its plot on the right.
Ras(x, y) = 20 + x^2 + y^2 - 10(cos 2πx + cos 2πy)
We want to find the absolute minimum, which is 0
at (0, 0). Note that, by default, only minimization is
supported. If, for example, you want to maximize the
f1(x,y) function, then build and minimize the following custom function: f2(x,y) = -f1(x,y).
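As a sanity check, Rastrigin's equation and the minimize-the-negation trick are easy to express in code. This is a Python sketch for illustration; Matlab's @rastriginsfcn is the vectorized equivalent:

```python
import math

def rastrigin(x, y):
    """Rastrigin's function: global minimum 0 at (0, 0)."""
    return 20 + x**2 + y**2 - 10 * (math.cos(2 * math.pi * x)
                                    + math.cos(2 * math.pi * y))

def negated(f):
    """Wrap a function so that maximizing f becomes minimizing -f,
    which is the form the solver expects."""
    return lambda *args: -f(*args)
```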
Although you are ready to run, let's ask for some plots, so we will be able to better figure out
what happens. Go to the Options section, scroll down
to the Plot functions and check the Best Fitness and
Distance checkboxes.
Now you are ready (the default settings for every-
thing else are adequate). Press the Start button. The
algorithm starts, the plots pop up and soon you
have the results at the bottom left of the window.
The best fitness function value (the smallest one,
since we minimize) and the termination condition
met are printed, together with the solution (Final
Point – it is very close to (0, 0)). Since the method is
stochastic, don't expect to be able to reproduce any
result found in a different run.
Now check the two plots on the left. It is obvious
that the population converges, since the average
distance between individuals (solutions) in terms of
the fitness value is reduced as the generations pass. This is a measure of the diversity of a popula-
tion. Convergence is hard to avoid, but keeping it low or postponing its appearance is better: having
diversity in the population allows the GA to search the solution space more thoroughly.
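The "average distance between individuals" can be made concrete. The Python sketch below computes the mean pairwise Euclidean distance; this is one reading of what the Distance plot shows (Matlab may estimate it differently, e.g. by sampling pairs):

```python
import math

def average_distance(population):
    """Mean Euclidean distance between all pairs of individuals.
    A larger value means more diversity in the population."""
    n = len(population)
    total = sum(math.dist(population[i], population[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```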


Check also the fitness value as it gradually gets smaller. This is expected – it is an indication that
optimization is taking place. Not only was the fitness value of the best individual reduced, but the
mean (average) fitness of the population was reduced as well (that is, in terms of the fitness value,
the whole population improved – we have better solutions in the population at the end).
All the above together are a good indication that the GA did its job well, but we are really happy
only because we know where the solution is (at (0, 0), with fitness value 0).
Note however that the nature of the GA prevents it from finding the exact best solution (0, 0). It can get
very close to this value, but reaching exactly (0, 0) is hard and happens only by luck. This is OK, since in this
kind of problem (optimization) we are happy even with a good (and not the perfect) solution. If not, the hybrid
function option should be used (not discussed here).

Generally speaking, to get the best results from the GA requires experimentation with the differ-
ent options. Let's see how some of these affect the performance of the GA.

5. Part #2 (population diversity – size – range, fitness scaling) (35 min)

The performance of a GA is affected by the diversity of the initial population. If the average distance
between individuals is large, the diversity is high; if the average distance is small, the diversity is low.
You should experiment to get the right amount of diversity. If the diversity is too high or too low, the
genetic algorithm might not perform well. We will demonstrate this in the following.
By default, the Optimization Tool creates a random initial population using a creation function.
You can limit this by setting the Initial range field in Population options. Set it to [1; 1.1]. By this we
actually make it harder for the GA to search equally well in the whole solution space. We do not prevent
it, though: the genetic algorithm can find the solution even if it does not lie in the initial range, pro-
vided that the populations have enough diversity.
Note: The initial range only restricts the range of the points in the initial population by specifying the lower
and upper bounds. Subsequent generations can contain points whose entries do not lie in the initial range. If
you want to bound all the individuals in all generations to a range, use the lower and upper
bound fields in the constraints panel, on the left.
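What the creation function does with the Initial range can be sketched as follows (Python, illustrative only; the real creation function is Matlab's, and later generations are not clipped to this range):

```python
import random

def initial_population(pop_size, n_vars, initial_range):
    """Create a random initial population inside Initial range.

    initial_range = (lower, upper) applies to every variable, as in the
    Population options. Only the first generation is restricted to it.
    """
    lo, hi = initial_range
    return [[random.uniform(lo, hi) for _ in range(n_vars)]
            for _ in range(pop_size)]
```

With a range like (1, 1.1) every starting individual is nearly identical, which is exactly the low-diversity situation demonstrated below.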

Leave the rest of the settings as in Part #1, except Options-Stopping Criteria-Stall Generations,
which should be set to 100. This will let the algorithm run for 100 generations, providing us with
better results (and plots). Now click the Start button.
The GA returns the best fitness function value of
approximately 2 and displays the plots in the figure
on the right.
The upper plot, which displays the best fitness at
each generation, shows little progress in lowering the
fitness value (black dots). The lower plot shows the
average distance between individuals at each genera-
tion, which is a good measure of the diversity of a
population. For this setting of initial range, there is too little diversity for the algorithm to make pro-
gress. The algorithm was trapped in a local minimum due to the initial range restriction!


Next, set Initial range to [1; 100] and run
the algorithm again. The GA returns the best
fitness value of approximately 3.3 and displays
the following plots:
This time, the genetic algorithm makes
progress, but because the average distance
between individuals is so large, the best indi-
viduals are far from the optimal solution. Note
though that if we let the GA run for more
generations (by setting Generations and Stall
Generations in Stopping Criteria to 200) it will
eventually find a better solution.
Note: If you try this, please return the settings
to their previous values before you proceed (de-
fault and 100, respectively).

Finally, set Initial range to [1; 2] and run
the GA. This returns the best fitness value of
approximately 0.012 and displays the plots that follow.
The diversity in this case is better suited to the problem, so the genetic algorithm returns a much
better result than in the previous two cases.
In all the examples above, we had the
Population Size (Options-Population) set to
20 (the default). This value determines the
size of the population at each generation.
Increasing the population size enables the
genetic algorithm to search more points and
thereby obtain a better result. However, the
larger the population size, the longer the ge-
netic algorithm takes to compute each gen-
eration.
Note though that you should set Popula-
tion Size to be at least the value of Number of
variables, so that the individuals in each
population span the space being searched.
You can experiment with different settings
for Population Size that return good results
without taking a prohibitive amount of time to run.
Finally, another parameter that affects the diversity of the population (remember, it's vital to
have good diversity in the population) is the Fitness Scaling (in Options). If the fitness values vary too
widely (Figure 3), the individuals with the lowest values (recall that we minimize) reproduce too rapid-
ly, taking over the population pool too quickly and preventing the GA from searching other areas of
the solution space. On the other hand, if the values vary only a little, all individuals have approxi-
mately the same chance of reproduction and the search will progress very slowly.


Figure 3: Raw fitness values (lower is better) vary too widely on the left. Scaled values (right) do not
alter the selection advantage of the good individuals (except that now bigger is better). They just
reduce the diversity we have on the left. This prevents the GA from converging too early.

The Fitness Scaling adjusts the fitness values (scaled values) before the selection step of the GA.
This is done without changing the ranking order; that is, the best individual based on the raw fitness
value remains the best in the scaled rank as well. Only the values are changed, and thus the proba-
bility of an individual being selected for mating by the selection procedure. This prevents the GA
from converging too fast, which allows the algorithm to better search the solution space.
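Matlab's default scaling is rank-based. The Python sketch below illustrates the idea, assuming (as the Matlab documentation suggests for its Rank option) a scaled value proportional to 1/sqrt(rank); treat the exact formula as an assumption:

```python
import math

def rank_scaled(raw_scores):
    """Rank-based fitness scaling. The lowest raw score gets rank 1
    (best, since we minimize) and scaled value 1/sqrt(rank). Only the
    ordering of the raw values matters, so wildly varying raw scores
    lose their disproportionate selection pull."""
    order = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
    scaled = [0.0] * len(raw_scores)
    for rank, i in enumerate(order, start=1):
        scaled[i] = 1.0 / math.sqrt(rank)
    return scaled
```

Note that the best raw value stays the best scaled value, as the text requires; only the gaps between values shrink.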

6. Part #3 (selection, elitism, mutation) (35 min)

We continue this GA tutorial using Rastrigin's function. Use the following settings, leaving every-
thing else at its default value (Fitness function: @rastriginsfcn, Number of Variables: 2, Initial Range:
[1; 20], Plots: Best Fitness, Distance).
The Selection panel in Options controls the Selection Function, that is, how individuals are se-
lected to become parents. Note that this mechanism works on the scaled values, as described in the
previous section. The most well-known methods are provided (uniform, roulette and tournament). An
individual can be selected more than once as a parent, in which case it contributes its genes to more
than one child.

Figure 4: The stochastic uniform selection method. For 6 parents, we step along the
selection line in steps of 15/6.

The default selection option, Stochastic Uniform, lays out a line (Figure 4) on which each parent
corresponds to a section of length proportional to its scaled value. The algorithm moves
along the line in steps of equal size. At each step, the algorithm allocates a parent from the section it
lands on. For example, assume a population of 4 individuals with scaled values 7, 4, 3 and 1. The indi-
vidual with the scaled value of 7 is the best and should contribute its genes more than the rest. We
create a line of length 7+4+3+1=15. Now, let's say that we need to select 6 individuals as parents.
We step over this line in steps of 15/6 and select the individual whose section we land on (Figure 4).
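The walk along the line can be sketched as follows (Python). In the real algorithm the first pointer's offset is random; it is exposed as a parameter here only so the 7, 4, 3, 1 example is reproducible:

```python
import random

def stochastic_uniform(scaled_values, n_parents, start=None):
    """Stochastic uniform selection over a line whose sections have
    lengths proportional to the scaled values. Returns the index of the
    individual each of the n_parents pointers lands on."""
    total = sum(scaled_values)
    step = total / n_parents
    if start is None:
        start = random.uniform(0, step)   # random offset of the first pointer
    picks, cumulative, i = [], scaled_values[0], 0
    for k in range(n_parents):
        pointer = start + k * step
        while pointer >= cumulative:      # advance to the section we land on
            i += 1
            cumulative += scaled_values[i]
        picks.append(i)
    return picks
```

With scaled values 7, 4, 3, 1 and 6 parents, the best individual is picked about 6·7/15 ≈ 2.8 times, and every individual is picked either the floor or the ceiling of its expected count.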
The Reproduction panel in Options controls how the GA creates the next generation. Here you
specify the amount of elitism and the fraction of the next generation's population that is gener-
ated through mating (the rest is generated by mutation). The options are:
 Elite Count: the number of individuals with the best fitness values in the current generation
that are guaranteed to survive to the next generation. These individuals are called elite chil-
dren. The default value of Elite count is 2.
Try to solve the Rastrigin's problem by changing only this parameter. Try values of 10, 3 and
1. You will get results like those depicted in Figure 5. It is obvious that you should keep this
value low. 1 (or 2 - depending on the population size) is OK. (Why?)

Figure 5: Elite count 10 (left), 3 (middle) and 1 (right). Too much elitism results in
early convergence which can make the search less effective.

 Crossover Fraction: the fraction of individuals in the next generation, other than elite chil-
dren, that are created by crossover (that is, mating). The rest are generated by mutation. A
crossover fraction of 1 means that all children other than elite individuals are crossover chil-
dren. A crossover fraction of 0 means that all children are mutation children.
The following example shows that neither of these extremes is an effective strategy for
optimizing a function. You will now change the problem (you had better restart the optimization
toolbox to have everything set to default values). You will optimize this function:
f(x1, x2, ..., x10) = |x1| + |x2| + ... + |x10|
Use the following settings:
o Fitness Function: @(x) sum(abs(x))
o Number of variables: 10
o Initial range: [-1; 1]
o Plots: Best fitness and Distance
Run the example with the default value of 0.8 for Crossover fraction, in the Options > Re-
production panel. This returns a best fitness value of approximately 0.25 and displays plots
like those in Figure 6 (left). Note though that for another fitness function, a different setting
for Crossover fraction might yield the best result.

Figure 6: Plots for Crossover fraction set to 0.8 (left) and 1 (right).

To see how the genetic algorithm performs when there is no mutation, set Crossover
fraction to 1.0 and click Start. This returns a best fitness value of approximately 1.1 and
displays plots similar to those in Figure 6 (right).
In this case, the algorithm selects genes from the individuals in the initial population and
recombines them. The algorithm cannot create any new genes because there is no mutation.
The algorithm generates the best individual that it can from these genes at around genera-
tion 15, where the best fitness plot becomes level. After this, it creates new copies of the
best individual, which are then selected for the next generation. By around generation 19,
all individuals in the population are the same, namely the best individual. When this oc-
curs, the average distance between individuals is 0. Since the algorithm cannot improve the
best fitness value after generation ~15, it terminates because the average change in the fit-
ness value is less than the threshold set in the termination conditions.
To see how the genetic algorithm performs
when there is no crossover, set Crossover fraction
to 0 and click Start. This returns a best fitness
value of approximately 2.7 and displays plots like
those on the right.
In this case, all children are generated through
mutation. The random changes that the algorithm
applies never improve on the fitness value of the
best individual of the first generation. While mu-
tation does improve the genes of other individuals,
as you can see in the upper plot by the decrease
in the mean value of the fitness function, these
improved genes are never combined with the
genes of the best individual because there is no
crossover. As a result, the best fitness plot is level and the algorithm stalls at generation
number 50.
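The way Elite count and Crossover fraction split a generation can be summarized in a few lines (a Python sketch; Matlab may round the crossover count slightly differently, e.g. to keep mating parents paired):

```python
def reproduction_counts(pop_size, elite_count=2, crossover_fraction=0.8):
    """How the Reproduction options split the next generation:
    elite children pass through unchanged, the crossover fraction of the
    remainder are mating children, and whatever is left are mutation
    children."""
    rest = pop_size - elite_count
    n_crossover = round(crossover_fraction * rest)
    n_mutation = rest - n_crossover
    return elite_count, n_crossover, n_mutation
```

For the default population size of 20, the two extremes above correspond to splits of (2, 18, 0) and (2, 0, 18): no mutation children in the first case, no crossover children in the second.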

7. Part #4 (global vs. local minima) (15 min)

Optimization algorithms sometimes return a local minimum instead of the global one, that is, a point
where the function value is smaller than at nearby
points, but possibly greater than at a distant point in
the solution space. The genetic algorithm can sometimes
overcome this deficiency with the right settings. As an ex-
ample, consider the following function, whose plot is
depicted on the right:

f(x) = -exp(-(x/20)^2)                if x <= 20
f(x) = -exp(-1) + (x - 20)(x - 22)    if x > 20
The function has two local minima, one at x = 0, where
the function value is –1, and the other at x = 21, where the function value is about -1.37. Since the
latter value is smaller, the global minimum occurs at x = 21.
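A sketch of this function in Python, useful for checking the two minima numerically. The exact piecewise formula is our reconstruction from the stated minima (-1 at x = 0, about -1.37 at x = 21), so treat it as an assumption; the exercise itself defines the function as an M-file:

```python
import math

def two_min(x):
    """Piecewise function with a shallow local minimum at x = 0
    (value -1) and the global minimum at x = 21 (value about -1.37)."""
    if x <= 20:
        return -math.exp(-(x / 20.0) ** 2)
    return -math.exp(-1) + (x - 20) * (x - 22)
```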
Let us now see how we can define custom fitness functions. Go to Matlab and select
File > New > M-File. Define the function in the editor window as shown in the picture. Then save the file
to your desktop using the suggested filename
two_min (do not change it!).
Now, in Matlab's main toolbar, set the cur-
rent directory to the Desktop. This way, your M-file
will be visible to Matlab.
In the Optimization Toolbox set Fitness func-
tion to @two_min, Number of variables to 1,
Stopping criteria>Stall Generations to 100 and click
Start. The genetic algorithm returns a point very
close to the local minimum at x = 0.
The problem here is the default initial range of [0; 1] (in the Options > Population panel). This
range is not large enough to explore points near the global minimum at x = 21.
One way to make the GA explore a wider range of points (that is, to increase the diversity of the
populations) is to increase the Initial range. It does not have to include the point x = 21, but it must
be large enough that the algorithm will be able to generate individuals near x = 21. So, set Initial
range to [0; 15] and click Start once again. Now the GA returns a point very close to 21.

