
Evolution Strategies

Chapter 4
Adapted from A.E. Eiben and J.E. Smith,
Evolutionary Computing

ES Quick Overview

• Developed: Germany in the 1970’s.
• Early names: I. Rechenberg and H.-P. Schwefel, Technical
  University of Berlin.
• Typically applied to:
  – Numerical optimisation.
• Attributed features:
  – Fast.
  – Good model for real-valued optimisation.
  – Relatively well-developed theory.
• Special:
  – Standard version: self-adaptation of (mutation) parameters.

ES Technical Summary Table

Representation        Real-valued vectors
Recombination         Discrete or intermediary
Mutation              Gaussian perturbation
Parent selection      Uniform random
Survivor selection    (µ,λ) or (µ+λ)
Specialty             Self-adaptation of mutation step sizes

Introductory Example

• Task: minimise f : Rⁿ → R.
• Algorithm: “two-membered ES” using
  – Vectors from Rⁿ directly as chromosomes.
  – Population size 1.
  – Mutation as the only operator, creating one child.
  – Greedy selection.

Introductory Example: Pseudocode

• Set t = 0
• Create initial point xt = 〈 x1t,…,xnt 〉
• REPEAT UNTIL (TERMIN.COND satisfied) DO
  – Draw zi from a normal distribution for all i = 1,…,n
  – yit = xit + zi
  – IF f(xt) < f(yt) THEN xt+1 = xt
  – ELSE xt+1 = yt
  – FI
  – Update t = t + 1
• OD
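
A minimal Python sketch of this loop (illustrative only: the objective f, the initial point, the fixed step size σ and the iteration budget are assumptions, not taken from the slides):

```python
import numpy as np

def two_membered_es(f, x0, sigma=0.1, max_iters=1000):
    """Greedy (1+1) loop from the pseudocode: mutate, keep the better point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):                     # TERMIN.COND: iteration budget
        z = np.random.normal(0.0, sigma, size=x.shape)
        y = x + z                                  # one child by Gaussian mutation
        if f(y) < f(x):                            # greedy selection (minimisation)
            x = y
    return x

# Example: minimise the sphere function in R^5
best = two_membered_es(lambda v: np.sum(v**2), x0=np.ones(5))
```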

Introductory Example: Mutation Mechanism

• z values are drawn from a normal distribution N(ξ,σ):
  – Mean ξ is set to 0.
  – Standard deviation σ is called the mutation step size.
• σ is varied on the fly by the “1/5 success rule”: this rule
  resets σ after every k iterations, aiming at a success rate of 1/5:
  – σ = σ / c if ps > 1/5.
  – σ = σ • c if ps < 1/5.
  – σ = σ if ps = 1/5.
• ps is the fraction of successful mutations; 0.817 ≤ c ≤ 1.
  – In a successful mutation the child is fitter than the parent.
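
A sketch of the update, assuming ps has already been measured over the last k mutations (function and variable names are illustrative):

```python
def update_step_size(sigma, ps, c=0.9):
    """Rescale sigma according to the 1/5 success rule.
    ps is the observed fraction of successful mutations; 0.817 <= c <= 1."""
    if ps > 1/5:
        return sigma / c      # too many successes: take larger steps
    if ps < 1/5:
        return sigma * c      # too few successes: take smaller steps
    return sigma              # exactly 1/5: leave sigma unchanged
```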

Illustration of Normal Distribution

[Figure: probability density of the normal distribution.]

Another Historical Example: The Jet Nozzle Experiment

• Task: optimise the shape of a jet nozzle.
• Approach: random mutations to the shape + selection.

[Figures: initial nozzle shape and final (evolved) nozzle shape.]

Representation

• Chromosomes consist of three parts:
  – Object variables: x1,…,xn.
  – Strategy parameters:
    • Mutation step sizes: σ1,…,σnσ, for uncorrelated mutation.
    • Rotation angles: α1,…,αnα, for correlated mutation.
• Not every component is always present.
• Full-size chromosome:
  〈 x1,…,xn, σ1,…,σn, α1,…,αnα 〉
  where nα = n • (n−1)/2 (the number of i,j pairs).
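
As an illustrative sketch (a layout not prescribed by the slides), the full chromosome could be stored like this:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ESIndividual:
    x: np.ndarray        # object variables x1..xn
    sigma: np.ndarray    # mutation step sizes (1 or n values)
    alpha: np.ndarray = field(default_factory=lambda: np.empty(0))
    # rotation angles, n*(n-1)/2 values, used only for correlated mutation
```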

Mutation

• Main mechanism: change a value by adding random noise
  drawn from a normal distribution. The new value of a
  variable is:
  – x’i = xi + N(0,σ)
• Key idea:
  – σ is part of the chromosome 〈 x1,…,xn, σ 〉.
  – σ is also mutated into σ’ (see later how).
• Thus: the mutation step size σ coevolves with the
  solution vector x.

Order of Mutation

• Whole mutation effect: 〈 x, σ 〉 → 〈 x’, σ’ 〉
• Order is important:
  – First σ → σ’ (see later how).
  – Second x → x’ = x + N(0,σ’).
• Rationale: the new 〈 x’, σ’ 〉 is evaluated twice.
  – Primary: x’ is good if f(x’) is good.
  – Secondary: σ’ is good if the x’ it created is good.
• Reversing the mutation order would not work.

Case 1: Uncorrelated Mutation with One Step Size (σ)

• Chromosomes: 〈 x1,…,xn, σ 〉.
• σ’ = σ • exp(τ • N(0,1)).
  – A single step size per individual, obtained by multiplying
    the step size by a lognormally distributed factor.
  – The mutation step size is the same in each direction.
• x’i = xi + σ’ • N(0,1).
• Typically the “learning rate” τ ∝ 1/n½.
• A boundary rule σ’ < ε0 ⇒ σ’ = ε0 prevents step sizes
  from becoming too small.
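
A sketch of this mutation, taking τ = 1/n½ exactly and an illustrative ε0 (both assumptions for illustration):

```python
import numpy as np

def mutate_one_sigma(x, sigma, eps0=1e-6):
    """Uncorrelated mutation with a single step size."""
    n = len(x)
    tau = 1.0 / np.sqrt(n)                      # learning rate ~ 1/n^(1/2)
    sigma_new = sigma * np.exp(tau * np.random.normal())
    sigma_new = max(sigma_new, eps0)            # boundary rule
    x_new = x + sigma_new * np.random.normal(size=n)
    return x_new, sigma_new
```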

Mutants with Equal Likelihood

Circle: mutants having the same chance to be created.



Case 2: Uncorrelated Mutation with n Step Sizes (σi)

• Chromosomes: 〈 x1,…,xn, σ1,…,σn 〉
• σ’i = σi • exp(τ’ • N(0,1) + τ • Ni(0,1))
• x’i = xi + σ’i • Ni(0,1)
• Two learning rate parameters:
  – τ’, the overall learning rate: its random factor is drawn once and
    shared by the whole individual, changing the overall mutability.
  – τ, the coordinate-specific learning rate: it allows different mutation
    strategies in different directions.
• τ’ ∝ 1/(2n)½ and τ ∝ 1/(2n½)½.
• Lowest allowed standard deviation: σ’i < ε0 ⇒ σ’i = ε0.
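
A sketch of this scheme; the proportionality constants for τ’ and τ are set to 1, which is an assumption for illustration:

```python
import numpy as np

def mutate_n_sigmas(x, sigma, eps0=1e-6):
    """Uncorrelated mutation with one step size per coordinate."""
    n = len(x)
    tau_prime = 1.0 / np.sqrt(2 * n)            # overall learning rate
    tau = 1.0 / np.sqrt(2 * np.sqrt(n))         # coordinate-specific learning rate
    common = tau_prime * np.random.normal()     # one draw shared by all coordinates
    sigma_new = sigma * np.exp(common + tau * np.random.normal(size=n))
    sigma_new = np.maximum(sigma_new, eps0)     # boundary rule
    x_new = x + sigma_new * np.random.normal(size=n)
    return x_new, sigma_new
```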

Mutants with Equal Likelihood

Ellipse: mutants having the same chance to be created.


Note the difference of dimensions.

Case 3: Correlated Mutation

• Chromosomes: 〈 x1,…,xn, σ1,…,σn, α1,…,αnα 〉,
  where nα = n • (n−1)/2.
• The covariance matrix C is defined as:
  – cii = σi².
  – cij = 0 if i and j are not correlated.
  – cij = ½ • (σi² − σj²) • tan(2αij) if i and j are correlated.
• Note the numbering / indices of the α‘s.

Case 3: Correlated Mutation

The mutation mechanism is then:

• σ’i = σi • exp(τ’ • N(0,1) + τ • Ni(0,1))
• α’j = αj + β • N(0,1)
• x’ = x + N(0,C’)
  – x stands for the vector 〈 x1,…,xn 〉.
  – C’ is the covariance matrix C after mutation of the α values.
• τ’ ∝ 1/(2n)½ and τ ∝ 1/(2n½)½ and β ≈ 5°.
• σ’i < ε0 ⇒ σ’i = ε0 and
• |α’j| > π ⇒ α’j = α’j − 2π • sign(α’j).
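
A sketch of correlated mutation. Instead of forming C’ explicitly, the draw from N(0,C’) is realised by rotating an axis-parallel Gaussian step through the mutated angles, plane by plane; this is one common way to implement the scheme and is an implementation choice, not taken from the slides:

```python
import numpy as np

def correlated_mutation(x, sigma, alpha, beta=0.0873, eps0=1e-6):
    """Correlated mutation sketch; alpha holds the n*(n-1)/2 rotation angles."""
    n = len(x)
    tau_prime = 1.0 / np.sqrt(2 * n)
    tau = 1.0 / np.sqrt(2 * np.sqrt(n))
    # 1) Mutate the step sizes (same log-normal scheme as Case 2).
    common = tau_prime * np.random.normal()
    sigma_new = np.maximum(sigma * np.exp(common + tau * np.random.normal(size=n)), eps0)
    # 2) Mutate the rotation angles and map them back into (-pi, pi].
    alpha_new = alpha + beta * np.random.normal(size=alpha.shape)
    alpha_new = np.where(np.abs(alpha_new) > np.pi,
                         alpha_new - 2.0 * np.pi * np.sign(alpha_new), alpha_new)
    # 3) Draw an uncorrelated Gaussian step, then rotate it in each (i, j) plane.
    dx = sigma_new * np.random.normal(size=n)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            c, s = np.cos(alpha_new[k]), np.sin(alpha_new[k])
            dx_i, dx_j = dx[i], dx[j]
            dx[i] = c * dx_i - s * dx_j
            dx[j] = s * dx_i + c * dx_j
            k += 1
    return x + dx, sigma_new, alpha_new
```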

Mutants with Equal Likelihood

Ellipse: mutants having the same chance to be created.


Note the difference of dimensions and angle.

Recombination

• Two parents create one child.
• Acts per variable / position:
  – Intermediate recombination: average the parental values. Often
    used to recombine the strategy parameters.
  – Discrete recombination: select one of the parental values. Often
    used to recombine the object variables.
• The parents are chosen in one of two ways (named in the table
  on the next slide):
  – Use the same two selected parents for the whole child.
  – Select two parents anew for each position.

Names of Recombinations

                                    Two fixed parents     Two parents selected for each i
zi = (xi + yi)/2                    Local intermediary    Global intermediary
zi is xi or yi, chosen randomly     Local discrete        Global discrete
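
A sketch combining global discrete recombination for the object variables with global intermediary recombination for the step sizes, the pairing described as common on the previous slide (function and parameter names are illustrative):

```python
import numpy as np

def global_recombine(pop_x, pop_sigma):
    """Create one child from (mu, n) arrays of object variables and step sizes."""
    mu, n = pop_x.shape
    cols = np.arange(n)
    # Global discrete: for each position, draw two parents anew, copy one at random.
    p1, p2 = np.random.randint(mu, size=n), np.random.randint(mu, size=n)
    take_first = np.random.random(n) < 0.5
    child_x = np.where(take_first, pop_x[p1, cols], pop_x[p2, cols])
    # Global intermediary: for each position, average two freshly drawn parents.
    q1, q2 = np.random.randint(mu, size=n), np.random.randint(mu, size=n)
    child_sigma = (pop_sigma[q1, cols] + pop_sigma[q2, cols]) / 2.0
    return child_x, child_sigma
```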

Parent Selection

• Parents are selected with uniform random probability
  whenever an operator needs one or more.
• Thus: ES parent selection is unbiased - every
  individual has the same probability of being
  selected.
• Note that in ES “parent” means a population
  member, while in GAs a parent is a population
  member selected to undergo variation.

Survivor Selection

• Applied after creating λ children from the µ
  parents by mutation and recombination.
• Deterministically chops off the “bad stuff”:
  – The µ fittest individuals are chosen.
  – Often λ >> µ.
• The basis of selection is either:
  – The set of children only: (µ,λ)-selection.
  – The set of parents and children: (µ+λ)-selection.
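
A sketch of both schemes, assuming minimisation and populations stored as lists of (individual, fitness) pairs (an illustrative data layout):

```python
def comma_selection(children, mu):
    """(µ,λ)-selection: keep the µ best children; the parents are discarded."""
    return sorted(children, key=lambda pair: pair[1])[:mu]

def plus_selection(parents, children, mu):
    """(µ+λ)-selection: keep the µ best of parents and children together."""
    return sorted(parents + children, key=lambda pair: pair[1])[:mu]
```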

Survivor Selection

• (µ+λ)-selection is an elitist strategy.
• (µ,λ)-selection can “forget” former individuals.
• Often (µ,λ)-selection is preferred, because:
  – It is better at leaving local optima, since it discards all parents.
  – It is better at following moving optima, since it does not preserve
    outdated solutions.
  – With the + strategy, bad σ values can survive in 〈x,σ〉 too long if
    their host x is very fit.
• Selective pressure in ES is very high (λ ≈ 7 • µ is the
  common setting).

Survivor Selection

• The takeover time t* of a given selection mechanism is
  defined as the number of generations needed to completely fill
  the population with copies of the best individual, given
  one initial copy. Goldberg and Deb estimated the
  takeover time for:
  – A typical evolution strategy:
    t* = ln(λ) / ln(λ/µ).
    For µ = 15 and λ = 100, t* ≈ 2.
  – Proportional selection in a GA:
    t* = λ ln(λ).
    For λ = 100, t* ≈ 460.
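
The two estimates can be checked directly (a quick arithmetic check, not part of the slides):

```python
import math

mu, lam = 15, 100
t_es = math.log(lam) / math.log(lam / mu)   # ~ 2.4 generations (the ~2 quoted above)
t_ga = lam * math.log(lam)                  # ~ 460 generations
print(t_es, t_ga)
```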

Self-adaptation

• A self-adaptive ES has been shown (in simple cases)
  to outperform an ES without self-adaptation, because the
  mutation step size can track the state of the search:
  – At the beginning of the search, exploration is high in
    order to locate promising regions; later, smaller
    mutations are required for fine-tuning (exploitation).
  – In changing fitness landscapes, the mutation step
    needs to increase whenever the landscape changes,
    because further exploration is then necessary.

Self-adaptation Illustrated

Changes in the fitness values (left) and the mutation step sizes (right) for a
moving-optimum ES experiment on the sphere function with n = 30. The location
of the optimum changes every 200 generations.

Prerequisites for Self-adaptation

• µ > 1, so that different strategies are present.
• λ > µ, to generate an offspring surplus.
• Not “too” strong selection, e.g., λ ≈ 7 • µ.
• (µ,λ)-selection, to get rid of misadapted σ‘s.
• Mixing of strategy parameters, by applying (intermediary)
  recombination to them.

Example Application: The Ackley Function (Bäck et al. ’93)

• The Ackley function (here used with n = 30):

  f(x) = −20 • exp( −0.2 • √( (1/n) Σi=1..n xi² ) )
         − exp( (1/n) Σi=1..n cos(2πxi) ) + 20 + e

• Evolution strategy (coincidence of 30’s!):
  – Representation:
    • Range of valid values: −30 < xi < 30.
    • 30 step sizes (one for each element of the individual).
  – (µ,λ)-selection: (30,200)-selection.
  – Termination: after 200,000 fitness evaluations.
  – Results: the average best solution in the final generation is
    7.48 • 10⁻⁸ (very good).
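
A sketch of the Ackley function as written above (only NumPy is assumed):

```python
import numpy as np

def ackley(x):
    """Ackley function; global minimum f(0) = 0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    term1 = -20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / n))
    term2 = -np.exp(np.sum(np.cos(2 * np.pi * x)) / n)
    return term1 + term2 + 20.0 + np.e

print(ackley(np.zeros(30)))   # 0.0 up to floating-point error
```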

Example Application: The Cherry Brandy Experiment

• Task: to create a colour mix matching a target colour (that
  of a well-known cherry brandy).
• Ingredients: water + red, yellow, and blue dye.
• Representation: 〈 w, r, y, b 〉 with no self-adaptation.
• Values are scaled to give a predefined total volume (30 ml).
• Mutation: low / medium / high σ values used with equal probability.
• Selection: (1,8) strategy.

Example Application: The Cherry Brandy Experiment (continued)

• Fitness: evaluated by students, who actually make the mix
  and compare it with the target colour.
• Termination criterion: the student is satisfied with the
  mixed colour.
• A solution is mostly found within 20 generations.
• Accuracy is very good.