Krzysztof Michalak
krzysztof.michalak@ue.wroc.pl
https://krzysztof-michalak.pl/
Larrañaga P., Lozano J. A., "Estimation of Distribution Algorithms", Genetic Algorithms and Evolutionary Computation series, vol. 2, Springer, 2002.
https://link.springer.com/book/10.1007/978-1-4615-1539-5
What is the underlying assumption of the Naive Bayes classifier?
EDAs for binary vectors
Univariate Marginal Distribution Algorithm (UMDA) [1]

The joint probability distribution is modelled as:

  P(x) = Π_{k=1..n} P(x_k)

We only need to model the marginals for k = 1, …, n, because we have assumed independence.

A difference w.r.t. the Naive Bayes classifier: we update the model based on the mating pool Q, and not based on a training sample.

Model update
Performed using solutions from the mating pool Q. The model is a vector of probabilities p_k for k = 1, …, n. Each p_k represents the probability of setting the k-th genotype element to 1:

  p_k = P(x_k = 1) = |{x ∈ Q : x_k = 1}| / |Q|

[1] Mühlenbein H., Paaß G., "From recombination of genes to the estimation of distributions I. Binary parameters", in: Voigt H. M., Ebeling W., Rechenberg I., Schwefel H. P. (eds) Parallel Problem Solving from Nature — PPSN IV. PPSN 1996, Lecture Notes in Computer Science, vol. 1141, Springer, Berlin, Heidelberg, 1996.
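The UMDA model update can be sketched in a few lines of Python (function and variable names are illustrative, not from the slides): each p_k is the fraction of mating-pool members that carry a 1 at position k.

```python
def umda_update(mating_pool):
    """Estimate p_k = P(x_k = 1) as the fraction of mating-pool members
    (equal-length binary vectors) with a 1 at position k."""
    m = len(mating_pool)
    n = len(mating_pool[0])
    return [sum(x[k] for x in mating_pool) / m for k in range(n)]
```

For example, `umda_update([[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]])` returns `[1.0, 0.5, 0.5]`.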
Model update
(figure: the mating pool Q is used to update the model, a vector of probabilities p_k = P(x_k = 1) = |{x ∈ Q : x_k = 1}| / |Q|, one per genotype position k)

Sampling solutions for the next generation
Because we assumed that the probability distributions for different positions in the genotype are independent, we can draw the value for each position in the genotype independently of the others. For k = 1, …, n:
  Draw a number u uniformly from [0, 1]
  If u < P(x_k = 1), set x_k = 1
  Otherwise, set x_k = 0
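The sampling step is a direct translation of the rule above into Python (a minimal sketch; the fixed seed is only for reproducibility):

```python
import random

def umda_sample(p, rng=random.Random(0)):
    """Draw one genotype: each x_k is set to 1 with probability p[k],
    independently of the other positions."""
    return [1 if rng.random() < pk else 0 for pk in p]
```

With degenerate probabilities the output is fully determined, e.g. `umda_sample([1.0, 1.0, 0.0])` gives `[1, 1, 0]`.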
EDAs for binary vectors
Univariate Marginal Distribution Algorithm (UMDA)

Pseudocode
  Start from a population of random binary vectors
  Until the stopping condition is met:
    Evaluate solutions in the population
    Select the mating pool
    Update the model
    Sample the next generation

(figure: population G_n → selected specimens → model update → sampling → population G_{n+1})

Time for a little refresher! Name three selection methods we discussed in the previous lectures:
1) Roulette wheel
2) Binary tournament
3) Elitist selection
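Putting the pseudocode together, here is a minimal UMDA sketch in Python. It assumes the OneMax fitness (maximize the number of ones) and truncation selection for the mating pool; any of the selection methods listed above would work as well, and all parameter values are illustrative.

```python
import random

def umda_onemax(n=20, pop_size=50, pool_size=25, generations=30, seed=1):
    """Minimal UMDA on OneMax; returns the best fitness in the final
    population. Truncation selection: keep the best pool_size solutions."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)                  # evaluate (fitness = sum of bits)
        pool = pop[:pool_size]                           # select the mating pool Q
        p = [sum(x[k] for x in pool) / pool_size         # update the model
             for k in range(n)]
        pop = [[1 if rng.random() < p[k] else 0          # sample G_{n+1}
                for k in range(n)] for _ in range(pop_size)]
    return max(sum(x) for x in pop)
```

On this easy problem the probabilities quickly drift towards 1 and the best solution approaches the all-ones vector.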
Generalizations – how to apply a similar approach to:
  Discrete values, but not binary, e.g. 1, 2, …, h
  Continuous distributions

The model for each variable x_k is a histogram.
(figure: an example histogram of the probability distribution P(x_k = j) for j ∈ {1, …, h}, with h = 5)

Model update: approximate the probability that x_k = j with the frequency with which j occurs at position k:

  for j = 1, 2, …, h:  P(x_k = j) = |{x ∈ Q : x_k = j}| / |Q|
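The histogram update generalizes the binary case directly; a short sketch (names are illustrative, genotype values are assumed to be integers 1..h):

```python
def histogram_update(mating_pool, h):
    """For each position k, estimate P(x_k = j) as the frequency of
    value j at position k in the mating pool; returns an n-by-h table."""
    m = len(mating_pool)
    n = len(mating_pool[0])
    return [[sum(1 for x in mating_pool if x[k] == j) / m
             for j in range(1, h + 1)]
            for k in range(n)]
```

For instance, `histogram_update([[1, 2], [1, 3], [2, 2], [1, 2]], h=3)` yields `[[0.75, 0.25, 0.0], [0.0, 0.75, 0.25]]`.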
EDAs for integer vectors
The model for each variable x_k is a histogram. Model update: the histograms represent the frequencies:

  for j = 1, 2, …, h:  P(x_k = j) = |{x ∈ Q : x_k = j}| / |Q|
Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with the probability P(x_k = j).

Do you remember, from the previous lectures, a method which does that? You can use roulette-wheel selection: just use the probabilities P(x_k = j) for the values j = 1, 2, …, h.
(figure: a roulette wheel with sectors P(x_k = 1), …, P(x_k = 6))
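A roulette-wheel draw over the histogram can be sketched as follows (a minimal version; in practice `random.choices` with a `weights` argument does the same job):

```python
import random

def sample_position(probs, rng=random.Random(0)):
    """Roulette wheel: return a value j in {1, ..., h}, where j is drawn
    with probability probs[j - 1] (probs are assumed to sum to 1)."""
    u = rng.random()
    cumulative = 0.0
    for j, p in enumerate(probs, start=1):
        cumulative += p
        if u < cumulative:
            return j
    return len(probs)  # guard against floating-point round-off
```

With a degenerate histogram the outcome is fixed: `sample_position([0.0, 1.0, 0.0])` returns `2`.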
Sampling
Sample x_k for k = 1, …, n from the distribution with the estimated parameters (for each k separately).
EDAs for real vectors
An analogy to the parametric estimation in the Naive Bayes classifier: we model the marginal distributions.

Remember an important difference: we do not try to model multiple classes; instead we model the distribution of solutions in the mating pool. So, for example, we construct a model only for the green points (the solutions selected into the mating pool).
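As a sketch, assume Gaussian marginals (one common parametric choice; the slides do not fix a specific family): estimate a mean and standard deviation per coordinate from the mating pool, then sample each coordinate independently.

```python
import random
from statistics import mean, pstdev

def fit_gaussian_marginals(mating_pool):
    """Estimate (mu_k, sigma_k) for each coordinate from a list of
    equal-length real vectors (the mating pool)."""
    cols = list(zip(*mating_pool))
    return [(mean(c), pstdev(c)) for c in cols]

def sample_real(params, rng=random.Random(0)):
    """Draw each coordinate independently from its estimated Gaussian."""
    return [rng.gauss(mu, sigma) for mu, sigma in params]
```

For example, `fit_gaussian_marginals([[0.0, 2.0], [2.0, 2.0]])` gives `[(1.0, 1.0), (2.0, 0.0)]`, and sampling a coordinate with sigma = 0 always returns its mean.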
EDAs for binary vectors
Population-Based Incremental Learning (PBIL) [1]

Model sampling: each element of the new genotype x_k is set to 1 with the probability p_k. For k = 1, …, n, draw a number u uniformly from [0, 1]; if u < P(x_k = 1), set x_k = 1, otherwise set x_k = 0.

Model update using the best solution x(+) and the worst solution x(−). If x_k(−) = x_k(+), move p_k towards the common value (the standard PBIL update with a learning rate λ):

  p_k ← (1 − λ)·p_k + λ·x_k(+)

Mutation: independently for each position k = 1, …, n in the genotype, with the probability P_mut, shift p_k towards a randomly drawn bit b ∈ {0, 1} by the mutation shift α (e.g. α = 0.05):

  p_k ← (1 − α)·p_k + α·b

[1] Baluja S., "Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning", Carnegie Mellon University, Pittsburgh, PA, USA, Technical report, 1994.
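One PBIL iteration of model update plus mutation can be sketched as below. This is an illustrative variant (learn towards the best solution at positions where it agrees with the worst); the learning rate, P_mut, and the shift value are assumptions, not fixed by the slides.

```python
import random

def pbil_step(p, best, worst, lr=0.1, p_mut=0.02, shift=0.05,
              rng=random.Random(0)):
    """One PBIL-style step: update p towards `best` where best and worst
    agree, then mutate each p_k with probability p_mut by shifting it
    towards a random bit b."""
    p = list(p)
    for k in range(len(p)):
        if best[k] == worst[k]:                   # model update
            p[k] = (1 - lr) * p[k] + lr * best[k]
        if rng.random() < p_mut:                  # mutation of the model
            b = rng.randint(0, 1)
            p[k] = (1 - shift) * p[k] + shift * b
    return p
```

With mutation switched off the update is deterministic: starting from p = [0.5, 0.5] with best = [1, 0] and worst = [1, 1], only the first position (where best and worst agree) moves, to 0.55.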
EDAs for the TSP – models
The EH-PBIL model

Model used in the Edge Histogram-Based Sampling Algorithm (EHBSA): a matrix [n × n] with p_i,j ∈ [0, 1], where p_i,j tells how probable it is that j is placed right after i in good permutations. For example (n = 4):

       j=1  j=2  j=3  j=4
  i=1  0.1  0.8  0    0.1
  i=2  0    0    0.1  0.9
  i=3  0.9  0    0    0.1
  i=4  0.1  0.1  0.8  0

Model update
  From the mating pool select the shortest tour (+) and the longest one (−)
  Encode the permutations (+) and (−) as permutation matrices x(+) and x(−); e.g. π = (3, 1, 4, 2) becomes a 0/1 matrix with x_i,j = 1 iff city j is placed right after city i in π
  Apply the update rule used in the PBIL algorithm using x(+) and x(−)
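The permutation-matrix encoding can be sketched as follows (a small helper with illustrative names; the tour is assumed to be cyclic, so the last city is followed by the first):

```python
def edge_matrix(tour):
    """Encode a tour (1-based city labels) as a 0/1 matrix with
    x[i][j] = 1 iff city j+1 is placed right after city i+1."""
    n = len(tour)
    x = [[0] * n for _ in range(n)]
    for a, b in zip(tour, tour[1:] + tour[:1]):   # consecutive pairs, wrapping
        x[a - 1][b - 1] = 1
    return x
```

For π = (3, 1, 4, 2) this marks the edges 3→1, 1→4, 4→2 and 2→3.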
EDAs for the TSP – models
The EH-PBIL model

Model sampling
The model is the probability matrix:

  0.1  0.8  0    0.1
  0    0    0.1  0.9
  0.9  0    0    0.1
  0.1  0.1  0.8  0

Select the first city in the tour uniformly at random (e.g. j = 3). Now, the j-th city is our starting point (set i = j). Tour: [ 3 ? ? ? ]

We do not want to revisit cities! Set the probabilities for the j-th city (column j = 3) to 0:

  0.1  0.8  0  0.1
  0    0    0  0.9
  0.9  0    0  0.1
  0.1  0.1  0  0

Select the next city in the tour with the probabilities given in the i-th row (i = 3). Selected city: j = 1. Now, the j-th city is our starting point (set i = j). Tour: [ 3 1 ? ? ]

Set the probabilities for the j-th city (column j = 1) to 0:

  0  0.8  0  0.1
  0  0    0  0.9
  0  0    0  0.1
  0  0.1  0  0

Select the next city in the tour with the probabilities given in the i-th row (i = 1). Selected city: j = 2. Now, the j-th city is our starting point (set i = j). Tour: [ 3 1 2 ? ]

Set the probabilities for the j-th city (column j = 2) to 0:

  0  0  0  0.1
  0  0  0  0.9
  0  0  0  0.1
  0  0  0  0
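The whole sampling procedure can be sketched in Python. This follows the slides' steps (uniform first city, row-proportional draws, zeroing visited columns); the all-zero-row fallback is an added guard not spelled out on the slides.

```python
import random

def sample_tour(p, rng=random.Random(3)):
    """Sample a tour (1-based city labels) from an n-by-n matrix of
    edge probabilities, zeroing the column of each visited city."""
    n = len(p)
    p = [row[:] for row in p]              # work on a copy of the model
    j = rng.randrange(n)                   # first city, uniform (0-based here)
    tour = [j + 1]
    for _ in range(n - 1):
        i = j
        for row in p:                      # never revisit city j + 1
            row[j] = 0.0
        weights = p[i]
        if sum(weights) == 0.0:            # dead end: pick any unvisited city
            weights = [0.0 if (c + 1) in tour else 1.0 for c in range(n)]
        j = rng.choices(range(n), weights=weights)[0]
        tour.append(j + 1)
    return tour
```

Whatever the random draws, the result is always a valid permutation of the cities.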
Select the next city in the tour with the probabilities given in the i-th row (i = 2). Selected city: j = 4. Tour: [ 3 1 2 4 ]

EDAs for the TSP – models
The Mallows model
  An exponential, unimodal distribution, analogous to the Gaussian distribution
  The probability distribution has two parameters:
    A central permutation π0
    A spread parameter Θ
  Needs a definition of a distance between permutations D(·, ·)

Mallows models are used in EDAs for permutation-based optimization problems, like, for example, the TSP.
EDAs for the TSP – models
The Kendall's-τ distance D(σ, π)

Measures the number of pairs of items for which σ and π have the opposing ordering.

Example:
  σ = (1, 2, 3, 4, 5)
  π = (3, 4, 1, 2, 5)
  D(σ, π) = 4

Has properties which make model update and sampling simpler.

Right invariance: for any permutation γ, D(σ, π) = D(σγ, πγ). Thus D(σ, π) = D(σπ⁻¹, id), where id = (1, 2, …, n) is the identity permutation. Instead of D(σπ⁻¹, id) we write D(ζ), where ζ = σπ⁻¹.

Decomposition:

  D(ζ) = Σ_{j=1..n−1} V_j(ζ),  where  V_j(ζ) = Σ_{i=j+1..n} I( ζ(j) > ζ(i) )

V_j(ζ) is the number of positions of the permutation ζ to the right of j with values smaller than ζ(j).
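The distance and its decomposition translate directly into code (a minimal sketch with 1-based permutations stored as Python lists):

```python
def v_values(zeta):
    """V_j(zeta) = number of positions to the right of j holding values
    smaller than zeta(j), for j = 1..n-1."""
    n = len(zeta)
    return [sum(1 for i in range(j + 1, n) if zeta[i] < zeta[j])
            for j in range(n - 1)]

def kendall_distance(sigma, pi):
    """D(sigma, pi) computed via D = sum_j V_j(sigma ∘ pi^{-1})."""
    n = len(pi)
    pi_inv = [0] * n
    for pos, val in enumerate(pi):
        pi_inv[val - 1] = pos + 1          # 1-based inverse of pi
    zeta = [sigma[p - 1] for p in pi_inv]  # sigma ∘ pi^{-1}
    return sum(v_values(zeta))
```

On the slide's example this reproduces D((1,2,3,4,5), (3,4,1,2,5)) = 4.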
EDAs for the TSP – models
The Mallows model – model update (overview)

  Calculate the centroid π0 using the Borda algorithm
  For each permutation π in the mating pool, decompose D(π, π0) into the sum of n − 1 values V_j(ππ0⁻¹)
  Average the V_j(ππ0⁻¹) values over the mating pool
  Calculate the spread parameter θ and the normalization constant ψ(θ)

Note the similarity to the Gaussian distribution! The model is:

  P(σ) = (1 / ψ(θ)) · exp( −θ · Σ_{j=1..n−1} V_j(σπ0⁻¹) )

compared to:

  f(x) = (1 / (σ√(2π))) · exp( −(1/2)·((x − μ)/σ)² )

In both, ψ(θ) and σ√(2π) are normalization constants, and the exponents measure the distance from the centroid (Mallows) and from the mean (Gaussian).
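The first two update steps can be sketched as follows (illustrative names; the Borda centroid simply orders items by their average position across the mating pool, and the V_j values are averaged as in the Kendall decomposition):

```python
def borda_centroid(mating_pool):
    """Borda: the item with the smallest average position across the
    pool comes first in the centroid pi0, and so on."""
    n = len(mating_pool[0])
    total_pos = [0] * (n + 1)              # indexed by 1-based item value
    for perm in mating_pool:
        for pos, item in enumerate(perm):
            total_pos[item] += pos
    return sorted(range(1, n + 1), key=lambda item: total_pos[item])

def mean_v(mating_pool, pi0):
    """Average the V_j(pi ∘ pi0^{-1}) values over the mating pool."""
    n = len(pi0)
    inv = [0] * n                          # inv[v-1] = 0-based position of v in pi0
    for pos, item in enumerate(pi0):
        inv[item - 1] = pos
    sums = [0] * (n - 1)
    for perm in mating_pool:
        zeta = [perm[inv[v]] for v in range(n)]   # pi ∘ pi0^{-1}
        for j in range(n - 1):
            sums[j] += sum(1 for i in range(j + 1, n) if zeta[i] < zeta[j])
    return [s / len(mating_pool) for s in sums]
```

The averaged V_j values then feed the estimation of the spread parameter θ, which is not shown here.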
EDAs for the TSP – models
The Mallows model – sampling (overview)

Draw the V_j(σπ0⁻¹) values randomly, then reconstruct the sampled permutation from them.

[1] Meila M., Phadnis K., Patterson A., Bilmes J., "Consensus ranking under the exponential model", in: Proc. 22nd Conf. Uncertainty Artif. Intell., Vancouver, BC, USA, Jul. 2007, pp. 285–294.
[2] Ceberio J., Irurozki E., Mendiburu A., "A Distance-Based Ranking Model Estimation of Distribution Algorithm for the Flowshop Scheduling Problem", IEEE Transactions on Evolutionary Computation, vol. 18, no. 2, 2014.
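A sampling sketch, using the fact that under the Mallows model with the Kendall decomposition the V_j are independent with P(V_j = r) proportional to exp(−θ·r) for r = 0, …, n − j; the permutation ζ is then decoded from its V-vector (a Lehmer-code-style decode) and composed with π0:

```python
import math
import random

def sample_mallows(pi0, theta, rng=random.Random(0)):
    """Draw sigma from a Mallows model with centroid pi0 (1-based list)
    and spread theta: sample each V_j independently, decode zeta from
    the V-vector, return sigma = zeta ∘ pi0."""
    n = len(pi0)
    v = []
    for j in range(1, n):                  # V_j for j = 1..n-1
        weights = [math.exp(-theta * r) for r in range(n - j + 1)]
        v.append(rng.choices(range(n - j + 1), weights=weights)[0])
    v.append(0)                            # V_n is always 0
    # Decode zeta: zeta(j) is the (V_j + 1)-th smallest unused value.
    remaining = list(range(1, n + 1))
    zeta = [remaining.pop(vj) for vj in v]
    # sigma = zeta ∘ pi0, i.e. sigma(i) = zeta(pi0(i))
    return [zeta[pi0[i] - 1] for i in range(n)]
```

For a very large θ the mass concentrates on V_j = 0, so the sampler returns the centroid itself; for small θ it spreads over permutations close to π0 in Kendall distance.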