
Evolution Strategies

Chapter 4
Adapted from A.E. Eiben and J.E. Smith,
Evolutionary Computing

ES Quick Overview

• Developed: Germany in the 1970’s.
• Early names: I. Rechenberg and H.-P. Schwefel, Technical
  University of Berlin.
• Typically applied to:
  – Numerical optimisation.
• Attributed features:
  – Fast.
  – Good model for real-valued optimisation.
  – Relatively well-developed theory.
• Special:
  – Standard version: self-adaptation of (mutation) parameters.

ES Technical Summary Table

Representation        Real-valued vectors
Recombination         Discrete or intermediary
Mutation              Gaussian perturbation
Parent selection      Uniform random
Survivor selection    (µ,λ) or (µ+λ)
Specialty             Self-adaptation of mutation step sizes

Introductory Example

• Task: minimise f : Rⁿ → R.
• Algorithm: “two-membered ES” using
  – Vectors from Rⁿ directly as chromosomes.
  – Population size 1.
  – Mutation as the only operator, creating one child.
  – Greedy selection.

Introductory Example: Pseudocode

• Set t = 0
• Create initial point xt = 〈 x1t,…,xnt 〉
• REPEAT UNTIL (TERMIN.COND satisfied) DO
  – Draw zi from a normal distribution for all i = 1,…,n
  – yit = xit + zi
  – IF f(xt) < f(yt) THEN xt+1 = xt
  – ELSE xt+1 = yt
  – FI
  – Update t = t + 1
• OD
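
A minimal Python sketch of this loop (illustrative only: the objective f, the initial point, the fixed step size σ and the iteration budget are assumptions, not taken from the slides):

```python
import numpy as np

def two_membered_es(f, x0, sigma=0.1, max_iters=1000):
    """Greedy (1+1) loop from the pseudocode: mutate, keep the better point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):                     # TERMIN.COND: iteration budget
        z = np.random.normal(0.0, sigma, size=x.shape)
        y = x + z                                  # one child by Gaussian mutation
        if f(y) < f(x):                            # greedy selection (minimisation)
            x = y
    return x

# Example: minimise the sphere function in R^5
best = two_membered_es(lambda v: np.sum(v**2), x0=np.ones(5))
```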

Introductory Example: Mutation Mechanism

• z values are drawn from a normal distribution N(ξ,σ):
  – Mean ξ is set to 0.
  – Standard deviation σ is called the mutation step size.
• σ is varied on the fly by the “1/5 success rule”: this rule
  resets σ after every k iterations, aiming at a success rate of 1/5:
  – σ = σ / c if ps > 1/5.
  – σ = σ • c if ps < 1/5.
  – σ = σ if ps = 1/5.
• ps is the fraction of successful mutations; 0.817 ≤ c ≤ 1.
  – In a successful mutation the child is fitter than the parent.
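
A sketch of the update, assuming ps has already been measured over the last k mutations (function and variable names are illustrative):

```python
def update_step_size(sigma, ps, c=0.9):
    """Rescale sigma according to the 1/5 success rule.
    ps is the observed fraction of successful mutations; 0.817 <= c <= 1."""
    if ps > 1/5:
        return sigma / c      # too many successes: take larger steps
    if ps < 1/5:
        return sigma * c      # too few successes: take smaller steps
    return sigma              # exactly 1/5: leave sigma unchanged
```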

Illustration of Normal Distribution

[Figure: probability density of the normal distribution.]

Another Historical Example: The Jet Nozzle Experiment

• Task: optimise the shape of a jet nozzle.
• Approach: random mutations to the shape + selection.

[Figures: initial nozzle shape and final (evolved) nozzle shape.]

Representation

• Chromosomes consist of three parts:
  – Object variables: x1,…,xn.
  – Strategy parameters:
    • Mutation step sizes: σ1,…,σnσ, for uncorrelated mutation.
    • Rotation angles: α1,…,αnα, for correlated mutation.
• Not every component is always present.
• Full-size chromosome:
  〈 x1,…,xn, σ1,…,σn, α1,…,αnα 〉
  where nα = n • (n−1)/2 (the number of i,j pairs).
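
As an illustrative sketch (a layout not prescribed by the slides), the full chromosome could be stored like this:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ESIndividual:
    x: np.ndarray        # object variables x1..xn
    sigma: np.ndarray    # mutation step sizes (1 or n values)
    alpha: np.ndarray = field(default_factory=lambda: np.empty(0))
    # rotation angles, n*(n-1)/2 values, used only for correlated mutation
```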

Mutation

• Main mechanism: change a value by adding random noise
  drawn from a normal distribution. The new value of a
  variable is:
  – x’i = xi + N(0,σ)
• Key idea:
  – σ is part of the chromosome 〈 x1,…,xn, σ 〉.
  – σ is also mutated into σ’ (see later how).
• Thus: the mutation step size σ coevolves with the
  solution vector x.

Order of Mutation

• Whole mutation effect: 〈 x, σ 〉 → 〈 x’, σ’ 〉
• Order is important:
  – First σ → σ’ (see later how).
  – Second x → x’ = x + N(0,σ’).
• Rationale: the new 〈 x’, σ’ 〉 is evaluated twice.
  – Primary: x’ is good if f(x’) is good.
  – Secondary: σ’ is good if the x’ it created is good.
• Reversing the mutation order would not work.

Case 1: Uncorrelated Mutation with One Step Size (σ)

• Chromosomes: 〈 x1,…,xn, σ 〉.
• σ’ = σ • exp(τ • N(0,1)).
  – A single step size per individual, obtained by multiplying
    the step size by a lognormally distributed factor.
  – The mutation step size is the same in each direction.
• x’i = xi + σ’ • N(0,1).
• Typically the “learning rate” τ ∝ 1/n½.
• A boundary rule σ’ < ε0 ⇒ σ’ = ε0 prevents step sizes
  from becoming too small.
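
A sketch of this mutation, taking τ = 1/n½ exactly and an illustrative ε0 (both assumptions for illustration):

```python
import numpy as np

def mutate_one_sigma(x, sigma, eps0=1e-6):
    """Uncorrelated mutation with a single step size."""
    n = len(x)
    tau = 1.0 / np.sqrt(n)                      # learning rate ~ 1/n^(1/2)
    sigma_new = sigma * np.exp(tau * np.random.normal())
    sigma_new = max(sigma_new, eps0)            # boundary rule
    x_new = x + sigma_new * np.random.normal(size=n)
    return x_new, sigma_new
```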

Mutants with Equal Likelihood

Circle: mutants having the same chance to be created.



Case 2: Uncorrelated Mutation with n Step Sizes (σi)

• Chromosomes: 〈 x1,…,xn, σ1,…,σn 〉
• σ’i = σi • exp(τ’ • N(0,1) + τ • Ni(0,1))
• x’i = xi + σ’i • Ni(0,1)
• Two learning rate parameters:
  – τ’, the overall learning rate: its random factor is drawn once and
    shared by the whole individual, changing the overall mutability.
  – τ, the coordinate-specific learning rate: it allows different mutation
    strategies in different directions.
• τ’ ∝ 1/(2n)½ and τ ∝ 1/(2n½)½.
• Lowest allowed standard deviation: σ’i < ε0 ⇒ σ’i = ε0.
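
A sketch of this scheme; the proportionality constants for τ’ and τ are set to 1, which is an assumption for illustration:

```python
import numpy as np

def mutate_n_sigmas(x, sigma, eps0=1e-6):
    """Uncorrelated mutation with one step size per coordinate."""
    n = len(x)
    tau_prime = 1.0 / np.sqrt(2 * n)            # overall learning rate
    tau = 1.0 / np.sqrt(2 * np.sqrt(n))         # coordinate-specific learning rate
    common = tau_prime * np.random.normal()     # one draw shared by all coordinates
    sigma_new = sigma * np.exp(common + tau * np.random.normal(size=n))
    sigma_new = np.maximum(sigma_new, eps0)     # boundary rule
    x_new = x + sigma_new * np.random.normal(size=n)
    return x_new, sigma_new
```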

Mutants with Equal Likelihood

Ellipse: mutants having the same chance to be created.


Note the difference of dimensions.

Case 3: Correlated Mutation

• Chromosomes: 〈 x1,…,xn, σ1,…,σn, α1,…,αnα 〉,
  where nα = n • (n−1)/2.
• The covariance matrix C is defined as:
  – cii = σi².
  – cij = 0 if i and j are not correlated.
  – cij = ½ • (σi² − σj²) • tan(2αij) if i and j are correlated.
• Note the numbering / indices of the α‘s.

Case 3: Correlated Mutation

The mutation mechanism is then:

• σ’i = σi • exp(τ’ • N(0,1) + τ • Ni(0,1))
• α’j = αj + β • N(0,1)
• x’ = x + N(0,C’)
  – x stands for the vector 〈 x1,…,xn 〉.
  – C’ is the covariance matrix C after mutation of the α values.
• τ’ ∝ 1/(2n)½ and τ ∝ 1/(2n½)½ and β ≈ 5°.
• σ’i < ε0 ⇒ σ’i = ε0 and
• |α’j| > π ⇒ α’j = α’j − 2π • sign(α’j).
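
A sketch of correlated mutation. Instead of forming C’ explicitly, the draw from N(0,C’) is realised by rotating an axis-parallel Gaussian step through the mutated angles, plane by plane; this is one common way to implement the scheme and is an implementation choice, not taken from the slides:

```python
import numpy as np

def correlated_mutation(x, sigma, alpha, beta=0.0873, eps0=1e-6):
    """Correlated mutation sketch; alpha holds the n*(n-1)/2 rotation angles."""
    n = len(x)
    tau_prime = 1.0 / np.sqrt(2 * n)
    tau = 1.0 / np.sqrt(2 * np.sqrt(n))
    # 1) Mutate the step sizes (same log-normal scheme as Case 2).
    common = tau_prime * np.random.normal()
    sigma_new = np.maximum(sigma * np.exp(common + tau * np.random.normal(size=n)), eps0)
    # 2) Mutate the rotation angles and map them back into (-pi, pi].
    alpha_new = alpha + beta * np.random.normal(size=alpha.shape)
    alpha_new = np.where(np.abs(alpha_new) > np.pi,
                         alpha_new - 2.0 * np.pi * np.sign(alpha_new), alpha_new)
    # 3) Draw an uncorrelated Gaussian step, then rotate it in each (i, j) plane.
    dx = sigma_new * np.random.normal(size=n)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            c, s = np.cos(alpha_new[k]), np.sin(alpha_new[k])
            dx_i, dx_j = dx[i], dx[j]
            dx[i] = c * dx_i - s * dx_j
            dx[j] = s * dx_i + c * dx_j
            k += 1
    return x + dx, sigma_new, alpha_new
```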

Mutants with Equal Likelihood

Ellipse: mutants having the same chance to be created.


Note the difference of dimensions and angle.

Recombination

• Two parents create one child.
• Acts per variable / position:
  – Intermediate recombination: average the parental values. Often
    used to recombine the strategy parameters.
  – Discrete recombination: select one of the parental values. Often
    used to recombine the object variables.
• The parents are chosen in one of two ways (named in the table
  on the next slide):
  – Use the same two selected parents for the whole child.
  – Select two parents anew for each position.

Names of Recombinations

                                    Two fixed parents     Two parents selected for each i
zi = (xi + yi)/2                    Local intermediary    Global intermediary
zi is xi or yi, chosen randomly     Local discrete        Global discrete
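
A sketch combining global discrete recombination for the object variables with global intermediary recombination for the step sizes, the pairing described as common on the previous slide (function and parameter names are illustrative):

```python
import numpy as np

def global_recombine(pop_x, pop_sigma):
    """Create one child from (mu, n) arrays of object variables and step sizes."""
    mu, n = pop_x.shape
    cols = np.arange(n)
    # Global discrete: for each position, draw two parents anew, copy one at random.
    p1, p2 = np.random.randint(mu, size=n), np.random.randint(mu, size=n)
    take_first = np.random.random(n) < 0.5
    child_x = np.where(take_first, pop_x[p1, cols], pop_x[p2, cols])
    # Global intermediary: for each position, average two freshly drawn parents.
    q1, q2 = np.random.randint(mu, size=n), np.random.randint(mu, size=n)
    child_sigma = (pop_sigma[q1, cols] + pop_sigma[q2, cols]) / 2.0
    return child_x, child_sigma
```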

Parent Selection

• Parents are selected with uniform random probability
  whenever an operator needs one or more.
• Thus: ES parent selection is unbiased - every
  individual has the same probability of being
  selected.
• Note that in ES “parent” means a population
  member, while in GAs a parent is a population
  member selected to undergo variation.

Survivor Selection

• Applied after creating λ children from the µ
  parents by mutation and recombination.
• Deterministically chops off the “bad stuff”:
  – The µ fittest individuals are chosen.
  – Often λ >> µ.
• The basis of selection is either:
  – The set of children only: (µ,λ)-selection.
  – The set of parents and children: (µ+λ)-selection.
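
A sketch of both schemes, assuming minimisation and populations stored as lists of (individual, fitness) pairs (an illustrative data layout):

```python
def comma_selection(children, mu):
    """(µ,λ)-selection: keep the µ best children; the parents are discarded."""
    return sorted(children, key=lambda pair: pair[1])[:mu]

def plus_selection(parents, children, mu):
    """(µ+λ)-selection: keep the µ best of parents and children together."""
    return sorted(parents + children, key=lambda pair: pair[1])[:mu]
```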

Survivor Selection

• (µ+λ)-selection is an elitist strategy.
• (µ,λ)-selection can “forget” former individuals.
• Often (µ,λ)-selection is preferred, because:
  – It is better at leaving local optima, since it discards all parents.
  – It is better at following moving optima, since it does not preserve
    outdated solutions.
  – With the + strategy, bad σ values can survive in 〈x,σ〉 too long if
    their host x is very fit.
• Selective pressure in ES is very high (λ ≈ 7 • µ is the
  common setting).

Survivor Selection

• The takeover time t* of a given selection mechanism is
  defined as the number of generations needed to completely fill
  the population with copies of the best individual, given
  one initial copy. Goldberg and Deb estimated the
  takeover time for:
  – A typical evolution strategy:
    t* = ln(λ) / ln(λ/µ).
    For µ = 15 and λ = 100, t* ≈ 2.
  – Proportional selection in a GA:
    t* = λ ln(λ).
    For λ = 100, t* ≈ 460.
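
The two estimates can be checked directly (a quick arithmetic check, not part of the slides):

```python
import math

mu, lam = 15, 100
t_es = math.log(lam) / math.log(lam / mu)   # ~ 2.4 generations (the ~2 quoted above)
t_ga = lam * math.log(lam)                  # ~ 460 generations
print(t_es, t_ga)
```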

Self-adaptation

• A self-adaptive ES has been shown (in simple cases)
  to outperform an ES without self-adaptation, because the
  mutation step size can track the state of the search:
  – At the beginning of the search, exploration is high in
    order to locate promising regions; later, smaller
    mutations are required for fine-tuning (exploitation).
  – In changing fitness landscapes, the mutation step
    needs to increase whenever the landscape changes,
    because further exploration is then necessary.

Self-adaptation Illustrated

Changes in the fitness values (left) and the mutation step sizes (right) for a
moving-optimum ES experiment on the sphere function with n = 30. The location
of the optimum changes every 200 generations.

Prerequisites for Self-adaptation

• µ > 1, so that different strategies are present.
• λ > µ, to generate an offspring surplus.
• Not “too” strong selection, e.g., λ ≈ 7 • µ.
• (µ,λ)-selection, to get rid of misadapted σ‘s.
• Mixing of strategy parameters, by applying (intermediary)
  recombination to them.

Example Application: The Ackley Function (Bäck et al. ’93)

• The Ackley function (here used with n = 30):

  f(x) = −20 • exp( −0.2 • √( (1/n) Σi=1..n xi² ) )
         − exp( (1/n) Σi=1..n cos(2πxi) ) + 20 + e

• Evolution strategy (coincidence of 30’s!):
  – Representation:
    • Range of valid values: −30 < xi < 30.
    • 30 step sizes (one for each element of the individual).
  – (µ,λ)-selection: (30,200)-selection.
  – Termination: after 200,000 fitness evaluations.
  – Results: the average best solution in the final generation is
    7.48 • 10⁻⁸ (very good).
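
A sketch of the Ackley function as written above (only NumPy is assumed):

```python
import numpy as np

def ackley(x):
    """Ackley function; global minimum f(0) = 0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    term1 = -20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / n))
    term2 = -np.exp(np.sum(np.cos(2 * np.pi * x)) / n)
    return term1 + term2 + 20.0 + np.e

print(ackley(np.zeros(30)))   # 0.0 up to floating-point error
```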

Example Application: The Cherry Brandy Experiment

• Task: to create a colour mix matching a target colour (that
  of a well-known cherry brandy).
• Ingredients: water + red, yellow, and blue dye.
• Representation: 〈 w, r, y, b 〉 with no self-adaptation.
• Values are scaled to give a predefined total volume (30 ml).
• Mutation: low / medium / high σ values used with equal probability.
• Selection: (1,8) strategy.

Example Application: The Cherry Brandy Experiment (continued)

• Fitness: evaluated by students, who actually make the mix
  and compare it with the target colour.
• Termination criterion: the student is satisfied with the
  mixed colour.
• A solution is mostly found within 20 generations.
• Accuracy is very good.