
Bioinspired and soft-computing methods in data analysis and optimization

Estimation of Distribution Algorithms

Krzysztof Michalak
krzysztof.michalak@ue.wroc.pl
https://krzysztof-michalak.pl/

Evolutionary Algorithms (EAs) – reminder

  Main population → Mating pool → Evolutionary operators → Next population

Estimation of Distribution Algorithms

- Population-based algorithms
- Good solutions are represented by a probabilistic model
- In comparison to EAs:
  - Specimens are used for evaluation
  - No genetic operators; instead:
    - A probabilistic model is built
    - New specimens are drawn from this model

- Main steps:
  - Select specimens from the population (the "mating pool" in EAs)
  - Update the probability model
  - Draw a new population from the model
  - Evaluate

  Population Gn → Selected specimens → Model update → Population Gn+1

Larrañaga P., Lozano J. A., "Estimation of Distribution Algorithms", Genetic Algorithms and Evolutionary Computation series vol. 2, Springer, 2002.
https://link.springer.com/book/10.1007/978-1-4615-1539-5

A step back… to data mining

- Before we go to the EDAs… a question for you!
- What is the underlying assumption of the Naive Bayes classifier?
- Answer: the probability distributions of the attributes (features) are assumed to be independent, so we can model the joint probability distribution using the marginals:

  P(x | o_j ∈ c_i) = ∏_{k=1}^{n} P(x_k | o_j ∈ c_i)

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA) [1]
- Models binary vectors in Ω = { 0, 1 }^n
- The joint probability distribution is modelled as:

  P(x) = ∏_{k=1}^{n} P(x_k)

- A difference w.r.t. the Naive Bayes: we update the model based on the mating pool Q, not based on a training sample

- Model update
  - Performed using solutions from the mating pool Q
  - We only need to model the marginals for k = 1, …, n, because we have assumed independence
  - The model is a vector of probabilities p_k for k = 1, …, n
  - Each p_k represents the probability of setting the k-th genotype element to 1:

  p_k = P(x_k = 1) = |{ x ∈ Q : x_k = 1 }| / |Q|

[1] Mühlenbein H., Paaß G., "From recombination of genes to the estimation of distributions I. Binary parameters", in: Voigt H. M., Ebeling W., Rechenberg I., Schwefel H. P. (eds) Parallel Problem Solving from Nature — PPSN IV, PPSN 1996, Lecture Notes in Computer Science, vol. 1141, Springer, Berlin, Heidelberg, 1996.
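The UMDA update rule above can be sketched in a few lines of Python (a minimal illustration; the function name `update_model` is my own, and the mating pool is assumed to be a list of 0/1 lists):

```python
def update_model(mating_pool):
    """UMDA model update: p_k is the fraction of solutions in the
    mating pool Q whose k-th genotype element equals 1."""
    n = len(mating_pool[0])
    size = len(mating_pool)
    # p_k = |{ x in Q : x_k = 1 }| / |Q|, for k = 1, ..., n
    return [sum(x[k] for x in mating_pool) / size for k in range(n)]

p = update_model([[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]])
# p = [1.0, 0.5, 0.5]
```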

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Model update: Mating pool Q → Model p_k, with

  p_k = P(x_k = 1) = |{ x ∈ Q : x_k = 1 }| / |Q|

- Sampling solutions for the next generation
  - Because we assumed that the probability distributions for different positions in the genotype are independent…
  - …we can draw the value for each position in the genotype independently of the others. For k = 1, …, n:
    - Draw a number ξ uniformly from [0, 1]
    - If ξ < P(x_k = 1), set x_k = 1
    - Otherwise, set x_k = 0
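The sampling step can be written directly from the bullet points above (a sketch; the model is assumed to be a plain list of probabilities p_k):

```python
import random

def sample_solution(p):
    """Draw one binary genotype: independently for each position k,
    set x_k = 1 if a uniform draw xi < p_k, and x_k = 0 otherwise."""
    return [1 if random.random() < pk else 0 for pk in p]

x = sample_solution([1.0, 0.5, 0.0])
# x[0] is always 1 and x[2] is always 0; x[1] is 1 about half the time
```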

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Sampling solutions for the next generation: for each position k,

  if rand([0, 1]) < p_k then x_k = 1 else x_k = 0

- Pseudocode
  - Start from a population of random binary vectors
  - Until the stopping condition is met:
    - Evaluate solutions in the population
    - Select the mating pool
    - Update the model
    - Sample the next generation from the model

  Population Gn → Selected specimens → Model update → Sample the next generation from the model → Population Gn+1
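Putting the pseudocode together gives a compact UMDA loop. This is a minimal sketch, not a tuned implementation: truncation selection is my own choice here (any selection method from the earlier lectures would do), and the stopping condition is simply a fixed number of generations:

```python
import random

def umda(fitness, n, pop_size=50, pool_size=25, generations=100):
    """Minimal UMDA: evaluate, select a mating pool, update the
    marginal model, sample the next generation from the model."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and select the mating pool (truncation selection)
        pop.sort(key=fitness, reverse=True)
        pool = pop[:pool_size]
        # Update the model: p_k = frequency of 1 at position k in the pool
        p = [sum(x[k] for x in pool) / pool_size for k in range(n)]
        # Sample the next generation from the model
        pop = [[1 if random.random() < pk else 0 for pk in p]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# OneMax: maximize the number of ones in the genotype
best = umda(sum, n=20)
```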

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Pseudocode (as on the previous slide)
  - Start from a population of random binary vectors
  - Until the stopping condition is met:
    - Evaluate solutions in the population
    - Select the mating pool
    - Update the model
    - Sample the next generation from the model

Time for a little refresher! Name three selection methods we discussed in the previous lectures:
1) Roulette wheel
2) Binary tournament
3) Elitist selection

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Generalizations – how to apply a similar approach to:
  - Discrete, but not binary, values, e.g. 1, 2, …, h
  - Continuous distributions
- Follow the same approach as for the Naive Bayes:
  - Model each marginal distribution separately (for each x_k with k = 1, …, n)
  - A wide variety of models is available (some examples on the following slides)

EDAs for integer vectors

- Generalization to discrete values: j = 1, 2, …, h
- The model for each variable x_k is a histogram of the probabilities P(x_k = j) for j ∈ { 1, …, h }

  (Figure: example probability distribution for x_k, with P(x_k = j) plotted for j = 1, …, 5)

- Model update: approximate the probability that x_k = j with the frequency with which j occurs at position k:

  for j = 1, 2, …, h:   P(x_k = j) = |{ x ∈ Q : x_k = j }| / |Q|
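The histogram update is the direct generalization of the UMDA rule (a sketch; `update_histograms` is my own name, and solutions are assumed to hold values 1..h):

```python
def update_histograms(mating_pool, h):
    """For each position k, approximate P(x_k = j) by the frequency
    of value j at position k in the mating pool Q."""
    n = len(mating_pool[0])
    size = len(mating_pool)
    return [[sum(1 for x in mating_pool if x[k] == j) / size
             for j in range(1, h + 1)]
            for k in range(n)]

hist = update_histograms([[1, 2], [1, 3], [2, 2], [1, 2]], h=3)
# hist[0] = [0.75, 0.25, 0.0]  and  hist[1] = [0.0, 0.75, 0.25]
```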

EDAs for integer vectors

- Model update: the histograms represent the frequencies

  for j = 1, 2, …, h:   P(x_k = j) = |{ x ∈ Q : x_k = j }| / |Q|

- Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with probability P(x_k = j)

EDAs for integer vectors

- The model for each variable x_k is a histogram
- Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with probability P(x_k = j)
- Do you remember, from the previous lectures, a method which does that?
- You can use roulette-wheel selection: just use the probabilities P(x_k = j) for the values j = 1, 2, …, h

  (Figure: a roulette wheel with sectors proportional to P(x_k = 1), …, P(x_k = 6))
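A roulette-wheel draw from a histogram reduces to the cumulative-sum trick (a sketch; `draw_value` is my own name, and `probs[j-1]` is assumed to hold P(x_k = j)):

```python
import random

def draw_value(probs):
    """Roulette wheel: return j in {1, ..., h} with probability
    probs[j-1], by walking the cumulative distribution."""
    r = random.random()
    cumulative = 0.0
    for j, p in enumerate(probs, start=1):
        cumulative += p
        if r < cumulative:
            return j
    return len(probs)  # guard against floating-point rounding at the edge

v = draw_value([0.0, 1.0, 0.0])
# always returns 2 for this degenerate histogram
```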

EDAs for integer vectors

- Sampling solutions for the next generation: each x_k is drawn from its own histogram

EDAs for real vectors

- Generalization to continuous values
- Parametric approach – model update
  - Assume the type of distribution (e.g. Gaussian)
  - Estimate the parameters for this type of distribution
  - Note that we only need univariate distributions
- Sampling
  - Sample x_k for k = 1, …, n from the distribution with the estimated parameters (for each k separately)
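For the Gaussian case, estimating and sampling a single marginal takes only the standard library (a sketch; the function names are my own, and the population standard deviation is used as the estimate of sigma):

```python
import random
import statistics

def fit_gaussian_marginal(mating_pool, k):
    """Estimate (mu, sigma) of the k-th univariate marginal from the
    solutions in the mating pool."""
    values = [x[k] for x in mating_pool]
    return statistics.mean(values), statistics.pstdev(values)

def sample_position(mu, sigma):
    """Draw a value for x_k from the estimated Gaussian."""
    return random.gauss(mu, sigma)

mu, sigma = fit_gaussian_marginal([[1.0], [3.0], [5.0]], k=0)
# mu = 3.0; sigma is the population standard deviation of 1, 3, 5
```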

EDAs for real vectors

- An analogy to the parametric estimation in the Naive Bayes classifier: the marginal distributions
- Remember an important difference:
  - we do not try to model multiple classes
  - instead, we model the distribution of solutions in the mating pool
  - so, for example, we construct a model only for the green points (the selected specimens) in the figure
- The same analogy holds for the joint distribution

EDAs for binary vectors

- Population-Based Incremental Learning (PBIL) [1]
- Models binary vectors in Ω = { 0, 1 }^n
- The model is a vector of probabilities p_k for k = 1, …, n
- Each p_k represents the probability of setting the k-th genotype element to 1

[1] Baluja S., "Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning", Carnegie Mellon University, Pittsburgh, PA, USA, Technical report, 1994.

EDAs for binary vectors

- Population-Based Incremental Learning (PBIL)
- Model update
  - Performed using the best solution x(+) in the mating pool and the worst solution x(−)
  - The second step of the update is mutation
- Sampling
  - Performed in the same way as in UMDA
  - Each element of the new genotype x_k is set to 1 with probability p_k:
    - For k = 1, …, n draw a number ξ uniformly from [0, 1]
    - If ξ < P(x_k = 1), set x_k = 1; otherwise, set x_k = 0
- Parameters (and their default values from the S. Baluja paper)
  - Positive learning rate: η+ (0.1)
  - Negative learning rate: η− (0.075)
  - The sum of the learning rates: η = η+ + η−
  - Mutation probability: P_mut (0.02)
  - Mutation shift: α (0.05)

EDAs for binary vectors

- Population-Based Incremental Learning (PBIL)
- Model update using the best solution x(+) and the worst solution x(−); for each k:
  - If x_k(−) = x_k(+), set:

    p_k ← p_k · (1 − η+) + x_k(+) · η+

  - If x_k(−) ≠ x_k(+), set:

    p_k ← p_k · (1 − η) + x_k(+) · η

- Mutation
  - Independently for each position k = 1, …, n in the genotype, with probability P_mut:
    - Draw a value γ ∈ { 0, 1 } with P(0) = P(1) = 0.5
    - Set p_k ← p_k · (1 − α) + γ · α
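Both update rules and the mutation step fit into one small function (a sketch; `pbil_update` is my own name, and the default parameter values are the ones listed above):

```python
import random

def pbil_update(p, best, worst, eta_plus=0.1, eta_minus=0.075,
                p_mut=0.02, alpha=0.05):
    """One PBIL model update from the best solution x(+) and the
    worst solution x(-), followed by the mutation step."""
    eta = eta_plus + eta_minus
    new_p = []
    for pk, bk, wk in zip(p, best, worst):
        if bk == wk:                      # x_k(-) = x_k(+): small step
            pk = pk * (1 - eta_plus) + bk * eta_plus
        else:                             # x_k(-) != x_k(+): larger step
            pk = pk * (1 - eta) + bk * eta
        if random.random() < p_mut:       # mutation of the model itself
            gamma = random.randint(0, 1)
            pk = pk * (1 - alpha) + gamma * alpha
        new_p.append(pk)
    return new_p

# With mutation disabled the update is deterministic:
p = pbil_update([0.5, 0.5], best=[1, 1], worst=[1, 0], p_mut=0.0)
# p[0] = 0.5 * 0.9   + 0.1   = 0.55    (positions agree)
# p[1] = 0.5 * 0.825 + 0.175 = 0.5875  (positions differ)
```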

EDAs for the TSP – models

- The EH-PBIL model
  - Model used in the Edge Histogram-Based Sampling Algorithm (EHBSA): an [n × n] matrix with p_i,j ∈ [0, 1]
  - p_i,j – how probable it is that j is placed right after i in good permutations

  (example for n = 4)
        j
    0.1 0.8 0   0.1
  i 0   0   0.1 0.9
    0.9 0   0   0.1
    0.1 0.1 0.8 0

- Model update
  - From the mating pool select the shortest tour (+) and the longest one (−)
  - Encode the permutations (+) and (−) as permutation matrices x(+) and x(−); e.g. π = 3, 1, 4, 2 becomes the 0/1 matrix with a 1 in row i, column j for every edge "j follows i" in the tour
  - Apply the update rule used in the PBIL algorithm using x(+) and x(−)

EDAs for the TSP – models

- The EH-PBIL model: model sampling
  - The model is the probability matrix:

    0.1 0.8 0   0.1
    0   0   0.1 0.9
    0.9 0   0   0.1
    0.1 0.1 0.8 0

  - Select the first city in the tour uniformly at random (e.g. j = 3)
  - Now the j-th city is our starting point (set i = j)
  - Tour: [ 3 ? ? ? ]

- We do not want to revisit cities! Set the probabilities in the j-th column to 0 (j = 3):

    0.1 0.8 0 0.1
    0   0   0 0.9
    0.9 0   0 0.1
    0.1 0.1 0 0

- Select the next city in the tour with the probabilities given in the i-th row (i = 3): selected city j = 1
  - Now the j-th city is our starting point (set i = j)
  - Tour: [ 3 1 ? ? ]

- Zero the j-th column again (j = 1):

    0 0.8 0 0.1
    0 0   0 0.9
    0 0   0 0.1
    0 0.1 0 0

- Select the next city with the probabilities in the i-th row (i = 1): selected city j = 2
  - Now the j-th city is our starting point (set i = j)
  - Tour: [ 3 1 2 ? ]
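The sampling procedure described above can be sketched as a single loop (an illustration only: `sample_tour` is my own name, cities are 0-based here rather than 1-based as on the slides, and a uniform fallback is added for the case where every remaining edge probability has been zeroed):

```python
import random

def sample_tour(model):
    """Sample one tour from an edge-probability matrix: pick the first
    city uniformly at random, then repeatedly zero the visited city's
    column and roulette-select the next city from the current row."""
    n = len(model)
    m = [row[:] for row in model]        # work on a copy; keep the model
    j = random.randrange(n)              # first city, chosen uniformly
    tour = [j]
    while len(tour) < n:
        for row in m:                    # we never revisit a city:
            row[j] = 0.0                 # zero the j-th column
        i = j                            # the j-th city is the new start
        total = sum(m[i])
        if total == 0.0:                 # all remaining edges zeroed
            j = random.choice([c for c in range(n) if c not in tour])
        else:                            # roulette wheel over row i
            r = random.random() * total
            cumulative = 0.0
            for j, p in enumerate(m[i]):
                cumulative += p
                if r < cumulative:
                    break
        tour.append(j)
    return tour

model = [[0.1, 0.8, 0, 0.1],
         [0, 0, 0.1, 0.9],
         [0.9, 0, 0, 0.1],
         [0.1, 0.1, 0.8, 0]]
t = sample_tour(model)   # a permutation of the cities 0..3
```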

EDAs for the TSP – models

- The EH-PBIL model: model sampling (continued)
  - Select the next city with the probabilities in the i-th row (i = 2):

    0 0 0 0.1
    0 0 0 0.9
    0 0 0 0.1
    0 0 0 0

  - Selected city j = 4
  - Tour: [ 3 1 2 4 ]

EDAs for the TSP – models

- The Mallows model
  - An exponential, unimodal distribution over permutations
  - Analogous to the Gaussian distribution
  - The probability distribution has two parameters:
    - A central permutation π0
    - A spread parameter θ
  - Needs a definition of a distance between permutations D(σ, π)
  - Mallows models are used in EDAs for permutation-based optimization problems, like, for example, the TSP
EDAs for the TSP – models

- The Kendall's-τ distance D(σ, π)
  - Measures the number of pairs of items for which σ and π have the opposing ordering
  - Example:
    σ = 1 2 3 4 5
    π = 3 4 1 2 5
    D(σ, π) = 4
  - Has properties which make model update and sampling simpler
  - Right invariance
    - For any permutation τ: D(σ, π) = D(σ∘τ, π∘τ)
    - Thus D(σ, π) = D(σ∘π⁻¹, id), where id = 1 2 … n is the identity permutation
    - Instead of D(σ∘π⁻¹, id) we write D(ζ), where ζ = σ∘π⁻¹
  - Decomposition:

    D(ζ) = Σ_{j=1}^{n−1} V_j(ζ),   where V_j(ζ) = Σ_{i=j+1}^{n} I[ ζ(j) > ζ(i) ]

    i.e. V_j(ζ) is the number of positions of the permutation ζ to the right of j with values smaller than ζ(j)
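The decomposition and the right-invariance trick translate directly into code (a sketch; the function names are my own, and permutations are represented as 1-based value lists, e.g. π = [3, 4, 1, 2, 5]):

```python
def v_values(zeta):
    """V_j(zeta): the number of positions to the right of j that hold
    values smaller than zeta(j), for j = 1, ..., n-1."""
    n = len(zeta)
    return [sum(1 for i in range(j + 1, n) if zeta[j] > zeta[i])
            for j in range(n - 1)]

def kendall_distance(sigma, pi):
    """D(sigma, pi) = D(sigma o pi^-1) = sum of the V_j values."""
    n = len(pi)
    inv = [0] * n
    for pos in range(n):
        inv[pi[pos] - 1] = pos + 1            # inv[x-1] = pi^-1(x)
    zeta = [sigma[inv[x] - 1] for x in range(n)]   # zeta = sigma o pi^-1
    return sum(v_values(zeta))

d = kendall_distance([1, 2, 3, 4, 5], [3, 4, 1, 2, 5])
# d = 4, matching the example on the slide
```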

EDAs for the TSP – models

- The Mallows model – model update (overview)
  - Calculate the centroid π0 using the Borda algorithm
  - For each permutation π in the mating pool, decompose D(π, π0) into the sum of n − 1 values V_j(π∘π0⁻¹)
  - Average the V_j(π∘π0⁻¹) values over the mating pool
  - Calculate the spread parameter θ and the normalization constant ψ(θ)
  - The model is:

    P(σ) = (1 / ψ(θ)) · exp( −θ · Σ_{j=1}^{n−1} V_j(σ∘π0⁻¹) )

- Note the similarity to the Gaussian distribution:

    f(x) = (1 / (σ√(2π))) · exp( −(1/2) · ((x − μ)/σ)² )

  - The exponents measure the distance from the centroid (Mallows) and the mean (Gaussian)
  - ψ(θ) and σ√(2π) are the normalization constants
EDAs for the TSP – models

- The Mallows model – sampling (overview)
  - Draw the V_j(σ∘π0⁻¹) values randomly
  - Reconstruct a permutation using the algorithm proposed by Meila et al. [1]
  - Many more details can be found in the paper by Josu Ceberio [2]

[1] Meila M., Phadnis K., Patterson A., Bilmes J., "Consensus ranking under the exponential model", in: Proc. 22nd Conf. Uncertainty Artif. Intell., Vancouver, BC, USA, Jul. 2007, pp. 285–294.

[2] Ceberio J., Irurozki E., Mendiburu A., "A Distance-Based Ranking Model Estimation of Distribution Algorithm for the Flowshop Scheduling Problem", IEEE Transactions on Evolutionary Computation, vol. 18, no. 2, 2014.
