
Bioinspired and soft-computing methods in data analysis and optimization

Estimation of Distribution Algorithms

Krzysztof Michalak
krzysztof.michalak@ue.wroc.pl
https://krzysztof-michalak.pl/

Evolutionary Algorithms (EAs) – reminder

  Main population → Mating pool → Evolutionary operators → Next population

Estimation of Distribution Algorithms

- Population-based algorithms
- Good solutions are represented by a probabilistic model
- In comparison to EAs:
  - Specimens are used for evaluation
  - No genetic operators; instead:
    - A probabilistic model is built
    - New specimens are drawn from this model

- Main steps:
  - Select specimens from the population (the "mating pool" in EAs)
  - Update the probability model
  - Draw a new population from the model
  - Evaluate

  Population Gn → Selected specimens → Model update → Population Gn+1

Larrañaga P., Lozano J. A., "Estimation of Distribution Algorithms", Genetic Algorithms and Evolutionary Computation series vol. 2, Springer, 2002.
https://link.springer.com/book/10.1007/978-1-4615-1539-5

A step back… to data mining

- Before we go to the EDAs… a question for you!
- What is the underlying assumption of the Naive Bayes classifier?
- Answer: the probability distributions of the attributes (features) are assumed to be independent, so we can model the joint probability distribution using the marginals:

  P(x | o_j ∈ c_i) = ∏_{k=1}^{n} P(x_k | o_j ∈ c_i)

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA) [1]
- Models binary vectors in Ω = { 0, 1 }^n
- The joint probability distribution is modelled as:

  P(x) = ∏_{k=1}^{n} P(x_k)

- A difference w.r.t. the Naive Bayes: we update the model based on the mating pool Q, not based on a training sample

- Model update
  - Performed using solutions from the mating pool Q
  - We only need to model the marginals for k = 1, …, n, because we have assumed independence
  - The model is a vector of probabilities p_k for k = 1, …, n
  - Each p_k represents the probability of setting the k-th genotype element to 1:

  p_k = P(x_k = 1) = |{ x ∈ Q : x_k = 1 }| / |Q|

[1] Mühlenbein H., Paaß G., "From recombination of genes to the estimation of distributions I. Binary parameters", in: Voigt H. M., Ebeling W., Rechenberg I., Schwefel H. P. (eds) Parallel Problem Solving from Nature — PPSN IV, PPSN 1996, Lecture Notes in Computer Science, vol. 1141, Springer, Berlin, Heidelberg, 1996.
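The UMDA update rule above can be sketched in a few lines of Python (a minimal illustration; the function name `update_model` is my own, and the mating pool is assumed to be a list of 0/1 lists):

```python
def update_model(mating_pool):
    """UMDA model update: p_k is the fraction of solutions in the
    mating pool Q whose k-th genotype element equals 1."""
    n = len(mating_pool[0])
    size = len(mating_pool)
    # p_k = |{ x in Q : x_k = 1 }| / |Q|, for k = 1, ..., n
    return [sum(x[k] for x in mating_pool) / size for k in range(n)]

p = update_model([[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]])
# p = [1.0, 0.5, 0.5]
```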

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Model update: Mating pool Q → Model p_k, with

  p_k = P(x_k = 1) = |{ x ∈ Q : x_k = 1 }| / |Q|

- Sampling solutions for the next generation
  - Because we assumed that the probability distributions for different positions in the genotype are independent…
  - …we can draw the value for each position in the genotype independently of the others. For k = 1, …, n:
    - Draw a number ξ uniformly from [0, 1]
    - If ξ < P(x_k = 1), set x_k = 1
    - Otherwise, set x_k = 0
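The sampling step can be written directly from the bullet points above (a sketch; the model is assumed to be a plain list of probabilities p_k):

```python
import random

def sample_solution(p):
    """Draw one binary genotype: independently for each position k,
    set x_k = 1 if a uniform draw xi < p_k, and x_k = 0 otherwise."""
    return [1 if random.random() < pk else 0 for pk in p]

x = sample_solution([1.0, 0.5, 0.0])
# x[0] is always 1 and x[2] is always 0; x[1] is 1 about half the time
```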

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Sampling solutions for the next generation: for each position k,

  if rand([0, 1]) < p_k then x_k = 1 else x_k = 0

- Pseudocode
  - Start from a population of random binary vectors
  - Until the stopping condition is met:
    - Evaluate solutions in the population
    - Select the mating pool
    - Update the model
    - Sample the next generation from the model

  Population Gn → Selected specimens → Model update → Sample the next generation from the model → Population Gn+1
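Putting the pseudocode together gives a compact UMDA loop. This is a minimal sketch, not a tuned implementation: truncation selection is my own choice here (any selection method from the earlier lectures would do), and the stopping condition is simply a fixed number of generations:

```python
import random

def umda(fitness, n, pop_size=50, pool_size=25, generations=100):
    """Minimal UMDA: evaluate, select a mating pool, update the
    marginal model, sample the next generation from the model."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and select the mating pool (truncation selection)
        pop.sort(key=fitness, reverse=True)
        pool = pop[:pool_size]
        # Update the model: p_k = frequency of 1 at position k in the pool
        p = [sum(x[k] for x in pool) / pool_size for k in range(n)]
        # Sample the next generation from the model
        pop = [[1 if random.random() < pk else 0 for pk in p]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# OneMax: maximize the number of ones in the genotype
best = umda(sum, n=20)
```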

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Pseudocode (as on the previous slide)
  - Start from a population of random binary vectors
  - Until the stopping condition is met:
    - Evaluate solutions in the population
    - Select the mating pool
    - Update the model
    - Sample the next generation from the model

Time for a little refresher! Name three selection methods we discussed in the previous lectures:
1) Roulette wheel
2) Binary tournament
3) Elitist selection

EDAs for binary vectors

- Univariate Marginal Distribution Algorithm (UMDA)
- Generalizations – how to apply a similar approach to:
  - Discrete, but not binary, values, e.g. 1, 2, …, h
  - Continuous distributions
- Follow the same approach as for the Naive Bayes:
  - Model each marginal distribution separately (for each x_k with k = 1, …, n)
  - A wide variety of models is available (some examples on the following slides)

EDAs for integer vectors

- Generalization to discrete values: j = 1, 2, …, h
- The model for each variable x_k is a histogram of the probabilities P(x_k = j) for j ∈ { 1, …, h }

  (Figure: example probability distribution for x_k, with P(x_k = j) plotted for j = 1, …, 5)

- Model update: approximate the probability that x_k = j with the frequency with which j occurs at position k:

  for j = 1, 2, …, h:   P(x_k = j) = |{ x ∈ Q : x_k = j }| / |Q|
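The histogram update is the direct generalization of the UMDA rule (a sketch; `update_histograms` is my own name, and solutions are assumed to hold values 1..h):

```python
def update_histograms(mating_pool, h):
    """For each position k, approximate P(x_k = j) by the frequency
    of value j at position k in the mating pool Q."""
    n = len(mating_pool[0])
    size = len(mating_pool)
    return [[sum(1 for x in mating_pool if x[k] == j) / size
             for j in range(1, h + 1)]
            for k in range(n)]

hist = update_histograms([[1, 2], [1, 3], [2, 2], [1, 2]], h=3)
# hist[0] = [0.75, 0.25, 0.0]  and  hist[1] = [0.0, 0.75, 0.25]
```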

EDAs for integer vectors

- Model update: the histograms represent the frequencies

  for j = 1, 2, …, h:   P(x_k = j) = |{ x ∈ Q : x_k = j }| / |Q|

- Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with probability P(x_k = j)

EDAs for integer vectors

- The model for each variable x_k is a histogram
- Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with probability P(x_k = j)
- Do you remember, from the previous lectures, a method which does that?
- You can use roulette-wheel selection: just use the probabilities P(x_k = j) for the values j = 1, 2, …, h

  (Figure: a roulette wheel with sectors proportional to P(x_k = 1), …, P(x_k = 6))
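A roulette-wheel draw from a histogram reduces to the cumulative-sum trick (a sketch; `draw_value` is my own name, and `probs[j-1]` is assumed to hold P(x_k = j)):

```python
import random

def draw_value(probs):
    """Roulette wheel: return j in {1, ..., h} with probability
    probs[j-1], by walking the cumulative distribution."""
    r = random.random()
    cumulative = 0.0
    for j, p in enumerate(probs, start=1):
        cumulative += p
        if r < cumulative:
            return j
    return len(probs)  # guard against floating-point rounding at the edge

v = draw_value([0.0, 1.0, 0.0])
# always returns 2 for this degenerate histogram
```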

EDAs for integer vectors

- Sampling solutions for the next generation: each x_k is drawn from its own histogram

EDAs for real vectors

- Generalization to continuous values
- Parametric approach – model update
  - Assume the type of distribution (e.g. Gaussian)
  - Estimate the parameters for this type of distribution
  - Note that we only need univariate distributions
- Sampling
  - Sample x_k for k = 1, …, n from the distribution with the estimated parameters (for each k separately)
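For the Gaussian case, estimating and sampling a single marginal takes only the standard library (a sketch; the function names are my own, and the population standard deviation is used as the estimate of sigma):

```python
import random
import statistics

def fit_gaussian_marginal(mating_pool, k):
    """Estimate (mu, sigma) of the k-th univariate marginal from the
    solutions in the mating pool."""
    values = [x[k] for x in mating_pool]
    return statistics.mean(values), statistics.pstdev(values)

def sample_position(mu, sigma):
    """Draw a value for x_k from the estimated Gaussian."""
    return random.gauss(mu, sigma)

mu, sigma = fit_gaussian_marginal([[1.0], [3.0], [5.0]], k=0)
# mu = 3.0; sigma is the population standard deviation of 1, 3, 5
```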

EDAs for real vectors

- An analogy to the parametric estimation in the Naive Bayes classifier: the marginal distributions
- Remember an important difference:
  - we do not try to model multiple classes
  - instead, we model the distribution of solutions in the mating pool
  - so, for example, we construct a model only for the green points (the selected specimens) in the figure
- The same analogy holds for the joint distribution

EDAs for binary vectors

- Population-Based Incremental Learning (PBIL) [1]
- Models binary vectors in Ω = { 0, 1 }^n
- The model is a vector of probabilities p_k for k = 1, …, n
- Each p_k represents the probability of setting the k-th genotype element to 1

[1] Baluja S., "Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning", Carnegie Mellon University, Pittsburgh, PA, USA, Technical report, 1994.

EDAs for binary vectors

- Population-Based Incremental Learning (PBIL)
- Model update
  - Performed using the best solution x(+) in the mating pool and the worst solution x(−)
  - The second step of the update is mutation
- Sampling
  - Performed in the same way as in UMDA
  - Each element of the new genotype x_k is set to 1 with probability p_k:
    - For k = 1, …, n draw a number ξ uniformly from [0, 1]
    - If ξ < P(x_k = 1), set x_k = 1; otherwise, set x_k = 0
- Parameters (and their default values from the S. Baluja paper)
  - Positive learning rate: η+ (0.1)
  - Negative learning rate: η− (0.075)
  - The sum of the learning rates: η = η+ + η−
  - Mutation probability: P_mut (0.02)
  - Mutation shift: α (0.05)

EDAs for binary vectors

- Population-Based Incremental Learning (PBIL)
- Model update using the best solution x(+) and the worst solution x(−); for each k:
  - If x_k(−) = x_k(+), set:

    p_k ← p_k · (1 − η+) + x_k(+) · η+

  - If x_k(−) ≠ x_k(+), set:

    p_k ← p_k · (1 − η) + x_k(+) · η

- Mutation
  - Independently for each position k = 1, …, n in the genotype, with probability P_mut:
    - Draw a value γ ∈ { 0, 1 } with P(0) = P(1) = 0.5
    - Set p_k ← p_k · (1 − α) + γ · α
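Both update rules and the mutation step fit into one small function (a sketch; `pbil_update` is my own name, and the default parameter values are the ones listed above):

```python
import random

def pbil_update(p, best, worst, eta_plus=0.1, eta_minus=0.075,
                p_mut=0.02, alpha=0.05):
    """One PBIL model update from the best solution x(+) and the
    worst solution x(-), followed by the mutation step."""
    eta = eta_plus + eta_minus
    new_p = []
    for pk, bk, wk in zip(p, best, worst):
        if bk == wk:                      # x_k(-) = x_k(+): small step
            pk = pk * (1 - eta_plus) + bk * eta_plus
        else:                             # x_k(-) != x_k(+): larger step
            pk = pk * (1 - eta) + bk * eta
        if random.random() < p_mut:       # mutation of the model itself
            gamma = random.randint(0, 1)
            pk = pk * (1 - alpha) + gamma * alpha
        new_p.append(pk)
    return new_p

# With mutation disabled the update is deterministic:
p = pbil_update([0.5, 0.5], best=[1, 1], worst=[1, 0], p_mut=0.0)
# p[0] = 0.5 * 0.9   + 0.1   = 0.55    (positions agree)
# p[1] = 0.5 * 0.825 + 0.175 = 0.5875  (positions differ)
```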

EDAs for the TSP – models

- The EH-PBIL model
  - Model used in the Edge Histogram-Based Sampling Algorithm (EHBSA): an [n × n] matrix with p_i,j ∈ [0, 1]
  - p_i,j – how probable it is that j is placed right after i in good permutations

  (example for n = 4)
        j
    0.1 0.8 0   0.1
  i 0   0   0.1 0.9
    0.9 0   0   0.1
    0.1 0.1 0.8 0

- Model update
  - From the mating pool select the shortest tour (+) and the longest one (−)
  - Encode the permutations (+) and (−) as permutation matrices x(+) and x(−); e.g. π = 3, 1, 4, 2 becomes the 0/1 matrix with a 1 in row i, column j for every edge "j follows i" in the tour
  - Apply the update rule used in the PBIL algorithm using x(+) and x(−)

EDAs for the TSP – models

- The EH-PBIL model: model sampling
  - The model is the probability matrix:

    0.1 0.8 0   0.1
    0   0   0.1 0.9
    0.9 0   0   0.1
    0.1 0.1 0.8 0

  - Select the first city in the tour uniformly at random (e.g. j = 3)
  - Now the j-th city is our starting point (set i = j)
  - Tour: [ 3 ? ? ? ]

- We do not want to revisit cities! Set the probabilities in the j-th column to 0 (j = 3):

    0.1 0.8 0 0.1
    0   0   0 0.9
    0.9 0   0 0.1
    0.1 0.1 0 0

- Select the next city in the tour with the probabilities given in the i-th row (i = 3): selected city j = 1
  - Now the j-th city is our starting point (set i = j)
  - Tour: [ 3 1 ? ? ]

- Zero the j-th column again (j = 1):

    0 0.8 0 0.1
    0 0   0 0.9
    0 0   0 0.1
    0 0.1 0 0

- Select the next city with the probabilities in the i-th row (i = 1): selected city j = 2
  - Now the j-th city is our starting point (set i = j)
  - Tour: [ 3 1 2 ? ]
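The sampling procedure described above can be sketched as a single loop (an illustration only: `sample_tour` is my own name, cities are 0-based here rather than 1-based as on the slides, and a uniform fallback is added for the case where every remaining edge probability has been zeroed):

```python
import random

def sample_tour(model):
    """Sample one tour from an edge-probability matrix: pick the first
    city uniformly at random, then repeatedly zero the visited city's
    column and roulette-select the next city from the current row."""
    n = len(model)
    m = [row[:] for row in model]        # work on a copy; keep the model
    j = random.randrange(n)              # first city, chosen uniformly
    tour = [j]
    while len(tour) < n:
        for row in m:                    # we never revisit a city:
            row[j] = 0.0                 # zero the j-th column
        i = j                            # the j-th city is the new start
        total = sum(m[i])
        if total == 0.0:                 # all remaining edges zeroed
            j = random.choice([c for c in range(n) if c not in tour])
        else:                            # roulette wheel over row i
            r = random.random() * total
            cumulative = 0.0
            for j, p in enumerate(m[i]):
                cumulative += p
                if r < cumulative:
                    break
        tour.append(j)
    return tour

model = [[0.1, 0.8, 0, 0.1],
         [0, 0, 0.1, 0.9],
         [0.9, 0, 0, 0.1],
         [0.1, 0.1, 0.8, 0]]
t = sample_tour(model)   # a permutation of the cities 0..3
```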

EDAs for the TSP – models

- The EH-PBIL model: model sampling (continued)
  - Select the next city with the probabilities in the i-th row (i = 2):

    0 0 0 0.1
    0 0 0 0.9
    0 0 0 0.1
    0 0 0 0

  - Selected city j = 4
  - Tour: [ 3 1 2 4 ]

EDAs for the TSP – models

- The Mallows model
  - An exponential, unimodal distribution over permutations
  - Analogous to the Gaussian distribution
  - The probability distribution has two parameters:
    - A central permutation π0
    - A spread parameter θ
  - Needs a definition of a distance between permutations D(σ, π)
  - Mallows models are used in EDAs for permutation-based optimization problems, like, for example, the TSP
EDAs for the TSP – models

- The Kendall's-τ distance D(σ, π)
  - Measures the number of pairs of items for which σ and π have the opposing ordering
  - Example:
    σ = 1 2 3 4 5
    π = 3 4 1 2 5
    D(σ, π) = 4
  - Has properties which make model update and sampling simpler
  - Right invariance
    - For any permutation τ: D(σ, π) = D(σ∘τ, π∘τ)
    - Thus D(σ, π) = D(σ∘π⁻¹, id), where id = 1 2 … n is the identity permutation
    - Instead of D(σ∘π⁻¹, id) we write D(ζ), where ζ = σ∘π⁻¹
  - Decomposition:

    D(ζ) = Σ_{j=1}^{n−1} V_j(ζ),   where V_j(ζ) = Σ_{i=j+1}^{n} I[ ζ(j) > ζ(i) ]

    i.e. V_j(ζ) is the number of positions of the permutation ζ to the right of j with values smaller than ζ(j)
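The decomposition and the right-invariance trick translate directly into code (a sketch; the function names are my own, and permutations are represented as 1-based value lists, e.g. π = [3, 4, 1, 2, 5]):

```python
def v_values(zeta):
    """V_j(zeta): the number of positions to the right of j that hold
    values smaller than zeta(j), for j = 1, ..., n-1."""
    n = len(zeta)
    return [sum(1 for i in range(j + 1, n) if zeta[j] > zeta[i])
            for j in range(n - 1)]

def kendall_distance(sigma, pi):
    """D(sigma, pi) = D(sigma o pi^-1) = sum of the V_j values."""
    n = len(pi)
    inv = [0] * n
    for pos in range(n):
        inv[pi[pos] - 1] = pos + 1            # inv[x-1] = pi^-1(x)
    zeta = [sigma[inv[x] - 1] for x in range(n)]   # zeta = sigma o pi^-1
    return sum(v_values(zeta))

d = kendall_distance([1, 2, 3, 4, 5], [3, 4, 1, 2, 5])
# d = 4, matching the example on the slide
```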

EDAs for the TSP – models

- The Mallows model – model update (overview)
  - Calculate the centroid π0 using the Borda algorithm
  - For each permutation π in the mating pool, decompose D(π, π0) into the sum of n − 1 values V_j(π∘π0⁻¹)
  - Average the V_j(π∘π0⁻¹) values over the mating pool
  - Calculate the spread parameter θ and the normalization constant ψ(θ)
  - The model is:

    P(σ) = (1 / ψ(θ)) · exp( −θ · Σ_{j=1}^{n−1} V_j(σ∘π0⁻¹) )

- Note the similarity to the Gaussian distribution:

    f(x) = (1 / (σ√(2π))) · exp( −(1/2) · ((x − μ)/σ)² )

  - The exponents measure the distance from the centroid (Mallows) and the mean (Gaussian)
  - ψ(θ) and σ√(2π) are the normalization constants
EDAs for the TSP – models

- The Mallows model – sampling (overview)
  - Draw the V_j(σ∘π0⁻¹) values randomly
  - Reconstruct a permutation using the algorithm proposed by Meila et al. [1]
  - Many more details can be found in the paper by Josu Ceberio [2]

[1] Meila M., Phadnis K., Patterson A., Bilmes J., "Consensus ranking under the exponential model", in: Proc. 22nd Conf. Uncertainty Artif. Intell., Vancouver, BC, USA, Jul. 2007, pp. 285–294.

[2] Ceberio J., Irurozki E., Mendiburu A., "A Distance-Based Ranking Model Estimation of Distribution Algorithm for the Flowshop Scheduling Problem", IEEE Transactions on Evolutionary Computation, vol. 18, no. 2, 2014.
