Differential Evolution: A Survey of Theoretical Analyses
PII: S2210-6502(17)30422-4
DOI: 10.1016/j.swevo.2018.06.010
Reference: SWEVO 421
Please cite this article as: K.R. Opara, J. Arabas, Differential Evolution: A survey of theoretical analyses, Swarm and Evolutionary Computation (2018), doi: 10.1016/j.swevo.2018.06.010.
ACCEPTED MANUSCRIPT
Karol R. Opara (a), Jarosław Arabas (b)
(a) Systems Research Institute, Polish Academy of Sciences
(b) Institute of Computer Science, Warsaw University of Technology
Abstract

Differential Evolution (DE) is a state-of-the-art global optimization technique. Considerable research effort has been made to improve this algorithm and apply it to a variety of practical problems. Nevertheless, analytical studies concerning DE are rather rare. This paper surveys the theoretical results obtained so far for DE. A discussion of the genetic operators characteristic of DE is coupled with an overview of the population diversity and dynamics models. A comprehensive view on the current-day understanding of the underlying mechanisms of DE is complemented by a list of promising research directions.

Keywords: DE, population dynamics, diversity, convergence, global optimization, evolutionary algorithm
Notation
Scalar values are written in italics. Lowercase bold letters are column vectors and uppercase bold letters denote matrices. Textual subscripts are set in upright type.

Greek symbols
λ — weight of the target vector, see equations (4) and (5)
Φ — CDF of the standard normal random variable
Email address: karol.opara@ibspan.waw.pl (Karol R. Opara)
Preprint submitted to Swarm and Evolutionary Computation, June 21, 2018
Latin symbols
Cr — crossover rate (parameter of DE)
C — covariance matrix
Cov, Ĉov — covariance matrix operator and its estimator
E, Ê — expectation vector operator and its estimator
f — objective function
F — scaling factor (parameter of DE)
g(F) — generalized scaling factor, see (25)
h — PDF of the infinite population
h_m — PDF of mutants in the infinite population model, see (32)
k — number of difference vectors
n — search space dimension
N(m, v) — normal distribution with mean m and variance v
Np — population size (parameter of DE)
O — offspring population
𝒪 — big O notation
o_i — vector encoding the ith offspring
p_c — probability of the cth realization of crossover
p_f — probability of strategy selection in DE/either-or, see (6)
p_m — probability of mutation (a function of Cr), see (15)
Var, V̂ar — variance operator (in one dimension) and its estimator
v — variance of a one-dimensional random variable
x_i — vector encoding the ith individual in the population
x_best — best (fittest) individual in the population
1. Introduction

Differential Evolution (DE) is a simple and effective evolutionary algorithm used to solve global optimization problems in a continuous domain [1, 2]. It was proposed by Price and Storn in 1995 in a series of papers [3, 4, 5] and has since attracted the interest of researchers and practitioners. Comprehensive survey papers [6, 7] provide an up-to-date view of this algorithm and discuss its various modifications, improvements and uses.

A growing interest in DE is reflected in the number of publications citing the most distinguished original paper [5]. The top plot in Figure 1 illustrates the citation count of this paper retrieved from the Scopus database. For comparison, the bottom plot shows the relative increase in the number of articles about evolutionary algorithms (EA) indexed in the database. As the reference level of 100%, we chose the number of publications about EA from 1999. The above-average increase in interest in DE has been noticeable since 2004. In the last few years, the number of citations has stabilized at over 1000 papers a year, which shows the importance of this metaheuristic.

Roughly half of the papers citing the original DE article [5] describe its application in various domains. The rest offer modifications and improvements of the algorithm. Despite this massive amount of research, very few papers concern the theoretical foundations of DE. These studies lead to a better understanding of the algorithm and also provide tips for tuning parameters.
[Figure 1 omitted. Top panel: annual citation count of [5] (1998-2016), with the citation structure by discipline: Computer Science 32%, Engineering 23%, Mathematics 17%, Physics and Astronomy 4%, Energy 3%, Decision Sciences 3%, Materials Science 3%, Chemical Engineering 2%, Biochemistry and Molecular Biology 2%, Environmental Science 2%, Other 9%. Bottom panel: EA citations relative to the 1999 level (100%).]

Figure 1: Top Figure: Number of citations of paper [5] introducing DE and the division of citations into disciplines based on the Scopus bibliography database; Bottom Figure: Number of citations within evolutionary algorithms in percent
2. Differential Evolution

The DE algorithm processes the population X = {x1, x2, ..., xNp} consisting of Np individuals.

Pseudocode 1: Differential Evolution (main loop)
while stop condition not met do
    for all i ∈ {1, 2, ..., Np} do
        u_i ← differential mutation(F; i, X)
        o_i ← crossover(Cr; x_i^t, u_i)
        if f(o_i) ≤ f(x_i^t) then
            x_i^{t+1} ← o_i
        else
            x_i^{t+1} ← x_i^t
        end if
    end for
    t ← t + 1
end while
return arg min_{x_i^t} f(x_i^t)
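The loop above can be sketched in Python. This is a minimal illustration of DE/rand/1/bin, not the paper's reference implementation; the function and parameter names, and the sphere objective, are our own choices:

```python
import random

def de_rand_1_bin(f, bounds, Np=20, F=0.7, Cr=0.9, G_max=300, seed=0):
    """Minimal DE/rand/1/bin sketch: mutation, binomial crossover, greedy selection."""
    rng = random.Random(seed)
    n = len(bounds)
    # Initialize the population uniformly within the box constraints.
    X = [[rng.uniform(lo, hi) for (lo, hi) in bounds] for _ in range(Np)]
    fX = [f(x) for x in X]
    for _ in range(G_max):
        for i in range(Np):
            # DE/rand/1: three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([j for j in range(Np) if j != i], 3)
            u = [X[r1][j] + F * (X[r2][j] - X[r3][j]) for j in range(n)]
            # Binomial crossover; j_rand forces at least one mutant element.
            j_rand = rng.randrange(n)
            o = [u[j] if (rng.random() <= Cr or j == j_rand) else X[i][j]
                 for j in range(n)]
            # Greedy one-to-one selection.
            fo = f(o)
            if fo <= fX[i]:
                X[i], fX[i] = o, fo
    best = min(range(Np), key=lambda i: fX[i])
    return X[best], fX[best]

sphere = lambda x: sum(v * v for v in x)
x_best, f_best = de_rand_1_bin(sphere, [(-5.0, 5.0)] * 5)
```

On a convex function such as the sphere, this sketch converges rapidly; the parameter values F = 0.7, Cr = 0.9 are common defaults, not prescriptions from the survey.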
2.1. Differential Mutation

The differential mutation operator has a few basic variants, which are described in [5, 9]. The most common one, denoted as DE/rand/1, consists of randomly choosing three individuals from the population and adding to the first of them, x_r1 (known as the base vector), a scaled difference between the two others, x_r2 and x_r3:

u_i ← x_r1 + F · (x_r2 − x_r3). (1)
ACCEPTED MANUSCRIPT
Price and Storn also defined the DE/rand/k variant, which uses a larger number of difference vectors:

u_i ← x_r1 + F_1 · (x_r2 − x_r3) + ... + F_k · (x_r2k − x_r2k+1). (2)

In the DE/best/1 variant, the best individual in the population serves as the base vector:

u_i ← x_best + F · (x_r1 − x_r2). (3)
Mutation operators such as DE/rand/1 and DE/best/1 can be treated as special cases of a generalized formula which uses a sum of k scaled difference vectors and a weighted mean between the best individual and a randomly selected one:

u_i ← λ·x_best + (1 − λ)·x_r1 + Σ_{j=1..k} F_j · (x_r2j − x_r2j+1), (4)

u_i ← λ·x_best + (1 − λ)·x_i + Σ_{j=1..k} F_j · (x_r2j−1 − x_r2j). (5)
The DE/either-or variant [15] generates the trial vector according to the relation:

u_i ← x_r1 + F · (x_r2 − x_r3)            if rand(0, 1) < p_f,
u_i ← x_r1 + K · (x_r2 + x_r3 − 2·x_r1)   otherwise, (6)

where K = (F + 1)/2. This operator replaces both the differential mutation and crossover operators, ensuring rotational invariance of the resulting algorithm.
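A sketch of the either-or trial-vector generation of equation (6) in Python (variable names are ours; one list element per coordinate):

```python
import random

def either_or_trial(X, F=0.8, p_f=0.5, rng=random):
    """DE/either-or, eq. (6): pick three distinct individuals, then apply either
    a scaled difference (mutation) or recombination with K = (F + 1) / 2."""
    K = (F + 1) / 2
    n = len(X[0])
    r1, r2, r3 = rng.sample(range(len(X)), 3)
    x1, x2, x3 = X[r1], X[r2], X[r3]
    if rng.random() < p_f:
        # Differential-mutation branch.
        return [x1[j] + F * (x2[j] - x3[j]) for j in range(n)]
    # Recombination branch: move x_r1 towards the midpoint of x_r2 and x_r3.
    return [x1[j] + K * (x2[j] + x3[j] - 2 * x1[j]) for j in range(n)]
```

Note that both branches leave a fully converged population unchanged: if all individuals are identical, the difference terms vanish.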
2.2. Crossover

In DE, the crossover operator is based on exchanging elements between the vectors encoding the parent and the mutant. It is introduced to promote population diversity [6]. In n-dimensional space, an offspring o_i = [o_1, ..., o_n]^T is created through exchanging elements of the vectors that represent its parent x_i = [x_1, ..., x_n]^T and the mutant u_i = [u_1, ..., u_n]^T. The most common variant is binomial crossover, denoted as DE/X/Y/bin. The offspring is created by randomly choosing elements either from the parent or from the mutant:

o_j = u_j if rand(0, 1) ≤ Cr, and o_j = x_j otherwise, for j = 1, ..., n. (7)
The number of elements inherited from the mutant follows a binomial distribution with n independent repeats and success probability Cr. The offspring is also required to contain at least one element from the mutant; however, this assumption is often neglected in theoretical studies, cf. [10, 11]. Under a unit crossover rate Cr = 1, the offspring is equal to the mutant, o_i = u_i.
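The truncated-binomial claim can be checked numerically. The sketch below (our own illustration) simulates binomial crossover with the at-least-one rule and recovers the expected number of mutant elements, (n − 1)·Cr + 1, quoted later in the survey:

```python
import random

def inherited_count(n, Cr, rng):
    """One binomial-crossover draw: count elements taken from the mutant,
    with one position (j_rand) forced to come from the mutant."""
    j_rand = rng.randrange(n)
    return sum(1 for j in range(n) if rng.random() <= Cr or j == j_rand)

rng = random.Random(42)
n, Cr, reps = 10, 0.5, 100_000
mean_count = sum(inherited_count(n, Cr, rng) for _ in range(reps)) / reps
expected = (n - 1) * Cr + 1   # = 5.5 for n = 10, Cr = 0.5
```

The forced position contributes 1 deterministically and each of the other n − 1 positions contributes Cr on average, which is where the (n − 1)·Cr + 1 formula comes from.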
2.3. Selection
Selection consists of comparing an offspring with its parent. Only the better of them is chosen for the next generation. The greediness of this mechanism induces elitism, but it operates locally, which promotes the maintenance of the population spread. This approach is distinct from population-based methods of proportional and rank reproduction [16].

Selection in DE can be implemented either after the completion of the whole population of mutants or in a steady-state manner, through immediately replacing the losing vector. Most of the theoretical studies address the former approach.
2.4. Generalizations of DE

A large number of modifications to the original DE algorithm have been proposed so far. These include:

1. Parameter and strategy adaptation in DE [17], e.g. randomization and self-adaptation of parameters [18, 19], adaptation of the population size [20, 21], and strategy adaptation [22].
These and many more algorithms are surveyed in [6, 7]. The theoretical analyses are typically done for classical DE. Sometimes, the obtained results can be easily generalized to describe more cases. However, the introduction of new evolutionary mechanisms makes this task more challenging and creates a need for tailored theoretical studies.
Black-Box Differential Evolution (BBDE) consists only of differential mutation and local selection. The current individual x_i is used as the base vector and generates an offspring through the addition of a randomly scaled difference vector:

o_i ← x_i + F_r · (x_r1 − x_r2),

where F_r is sampled from a log-normal distribution. Next, the offspring competes with its parent for survival to the following generation.

Scrapping all non-essential elements of DE was done in an attempt to introduce as many invariances as possible to the search algorithm in order to improve its robustness. This approach seems to be inspired by Hansen's work on another state-of-the-art evolutionary algorithm, CMA-ES [34].
these algorithms, such as the encoding of individuals [35], adaptation mechanisms [36] or constraint handling techniques [37]. Many papers address individual algorithms; their descriptions can be found e.g. in [38, 39]. The usefulness of these studies is limited by strong assumptions and significant differences between particular algorithms.

In the following sections, we indicate the main areas of theoretical studies of evolutionary algorithms to present the analyses conducted for DE in a wider context.
operations is proportional to the maximal iteration number G_max (the outer loop in Pseudocode 1) and the population size Np (the inner loop):

𝒪(n · Np · G_max). (10)

the objective function evaluation c(n) is at most linearly dependent on the search space dimension n. Otherwise, the complexity of DE is not driven by the execution of genetic operators but rather by the number of objective function evaluations:

𝒪(c(n) · Np · G_max). (11)
is one of such methods. When the whole population is initialized within a sufficiently large attraction basin of a single local optimum, the population cannot leave this basin because of elitist selection. A formal proof of this statement is given by Hu et al. [44], using drift analysis for a specially created "deceptive" function.

Hu et al. [45] show that convergence with probability 1 can be obtained after softening the selection in DE and adding a mutation strategy that samples from the whole feasible set. Another way to introduce the global optimization property to DE is to re-initialize the population, or a part of it, every k_tol iterations [46]. Although the global convergence property is desirable for evolutionary algorithms, its practical usefulness is typically limited by the prohibitively long time necessary to ensure that the optimum is reached.
incorporation of invariances is argued to be a fundamental design criterion for the development of search algorithms [47]. Classical DE is characterized by scale and translation invariance: DE behaves in the same way for an objective function f : R^n → R and its transform f_1(x) = f(a·x + b), provided the initial population is scaled accordingly, where a ≠ 0 is a scalar and b ∈ R^n a translation vector.

DE is also invariant to order-preserving transformations. The selection mechanism of DE requires only an indication of which of two solutions is preferable. Therefore, DE behaves in the same way for an objective function f and its order-preserving transformations, such as shifting f_2(x) = a + f(x), scaling by a positive number a > 0, f_3(x) = a · f(x), or composition with a strictly increasing function s : R → R, f_4(x) = s(f(x)). Consequently, results obtained for simple models generalize to whole classes of problems. For instance, DE behaves in exactly the same way for the quadratic function f(x) = x^T x as for all radial functions whose values depend solely on the distance from the origin and are strictly increasing. DE uses only the order between individuals and operates on an ordinal rather than an interval scale. This property might be useful in the rare cases when it is possible to compare two solutions, but it is not feasible to quantify the objective function.
211 two solutions, but it is not feasible to quantify the objective function, e.g.
TE
219 crossover along the eigenvectors of the population rather than the axes of the
220 coordinate system. This approach makes use of the experimentally observed
AC
221 adaptation of the population to the local shape of the objective function,
222 which is known as “contour fitting”.
223 Price [15] analysed both classical DE and DE/either-or. He showed that
224 these algorithms suffer from mutation bias, i.e. a situation in which the
225 mutants are not centre-symmetrically distributed around the current vector.
226 The requirement of central symmetry is stronger than the coincidence of the
227 expectation of the mutant distribution with the current vector.
11
ACCEPTED MANUSCRIPT
Price [15] also underlined the presence of selection bias, i.e. a situation in which trial vectors are not symmetric around the current individual but shifted towards the population midpoint:

x_mean = (1/Np) · Σ_{i=1..Np} x_i. (12)

Selection bias disappears if the current individual x_i is used as the base vector. These ideas have led to a drift-free DE, in which "each operation
3.4. Universal optimization algorithm

The no free lunch theorem for optimization was proved by Wolpert and Macready in [49] and further discussed by Köppen et al. [50]. It shows that in a finite search space, for any non-repeating optimization algorithm and any performance measure, obtaining above-average performance for one class of problems correlates with below-average performance for another class. There are some discussions about the assumptions necessary to generalize the no free lunch theorem to continuous domains [51]. No single "best" universal optimizer exists, but for particular problem classes, some algorithms give

no structure [55]. On the other hand, practical problems are most often characterized by a large number of regularities, such as continuity. Consequently, the no free lunch theorem does not undermine the sense of developing new optimizers [53]. Instead, it shows the necessity to appropriately match the problems with the algorithms [52].

Properties of the optimization problem may themselves become the subject of analyses. Characterizing the influence of the problem properties on the algorithm's performance is a way to overcome the limitations imposed by the no free lunch theorem by appropriately assigning the algorithm to a given task [56].
assume that the optimization problem is separable to some degree. This echoes in the exponential crossover operator in DE, which exchanges only consecutive elements of the vectors that encode the parent and the mutant. However, the building block hypothesis does not explain the performance-related issues of the search process [39, 59].

Population drift in evolutionary algorithms can also be modelled using supermartingales [60, 61]. This kind of analysis can be used to derive the lower bound of the expected runtime of the analysed algorithm [62].
4. Population diversity in DE

Before discussing the results of studies concerning population diversity and dynamics in DE, it is worth summarizing the assumptions commonly made by researchers:

1. Individuals for differential mutation are drawn with replacement, i.e. the requirement of pairwise distinctness is abandoned to make these vectors independent [10, 11, 12, 13, 63, 64, 65, 66].
2. It is not required that at least one element of the mutant must be crossed over with the parent vector [10, 11, 12, 64].
E(V̂ar(U)) = (2F² + 1 − 1/Np) · V̂ar(X), (13)
where V̂ar is an estimate of the variance. The expectation of the empirical variance is taken with respect to all possible mutant populations generated by differential mutation. This expectation can be estimated with repetitive runs of the algorithm. After crossover between the mutants and the parents, the expected variance of the offspring population is given by

E(V̂ar(O)) = (1 + 2·p_m·F² − 2·p_m/Np + p_m²/Np) · V̂ar(X), (14)

where p_m denotes the crossover probability, defined for a general, n-dimensional search space as [63]:

p_m = Cr·(n − 1)/n + 1/n            for binomial crossover,
p_m = (1 − Cr^n) / (n·(1 − Cr))     for exponential crossover. (15)
For binomial crossover, the relationship between the crossover probability p_m and the crossover rate Cr is linear, whereas for exponential crossover it is nonlinear.
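The mutation-variance result (13) can be checked by simulation. The sketch below is our own construction and makes explicit assumptions: the two difference indices form a distinct pair, the base index is drawn independently with replacement, and V̂ar is the unbiased sample variance. Under these assumptions the averaged empirical variance of the mutants approaches (2F² + 1 − 1/Np)·V̂ar(X):

```python
import random
import statistics

rng = random.Random(7)
Np, F = 10, 0.6
X = [rng.uniform(-3, 3) for _ in range(Np)]    # a fixed parent population

reps, acc = 20_000, 0.0
for _ in range(reps):
    U = []
    for _ in range(Np):
        a = rng.randrange(Np)                   # base index, drawn independently
        b, c = rng.sample(range(Np), 2)         # distinct pair for the difference
        U.append(X[a] + F * (X[b] - X[c]))
    acc += statistics.variance(U)               # unbiased (ddof = 1) estimator

empirical = acc / reps
theory = (2 * F**2 + 1 - 1 / Np) * statistics.variance(X)   # cf. eq. (13)
```

The averaging over many mutant populations mirrors the expectation in equation (13), which is taken over all possible realizations of the mutation operator.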
Analogous formulas for several other DE variants are derived in [8, 69] and summarized in [70]. In particular, for DE/best/1, the expected variance after mutation and crossover reads:

E(V̂ar(O)) = (1 + 2·p_m·F² − p_m − p_m·(1 − p_m)/Np) · V̂ar(X) + p_m·(1 − p_m) · ((Np − 1)/Np) · (x_best − Ê(X))², (16)

and for the DE/either-or operator:

E(V̂ar(O)) = V̂ar(X) · [ p_f² · (1 + 2F² − 1/Np) + 2·p_f·(1 − p_f) · ((Np − 1)/Np + F² + 3K² − 2K) + (1 − p_f)² · ((Np − 1)/Np + 2·((Np − 2)/Np)·(3K² − 2K)) ]. (17)
These relationships can be summarized as

E(V̂ar(O)) = c · V̂ar(X) + d, (18)

where c and d are scalars [70].

Zaharie did not show how the variance of the population changes after selection, i.e. after completing the whole iteration of the algorithm. Difficulties in computing this quantity are related primarily to its dependence on the objective function.
Formulas (13), (14), (16), and (17) directly depend on the population size Np. For large populations, the quantities (Np − 1)/Np and 1/Np tend to 1 and 0 respectively, so the dependence weakens towards insignificance, which simplifies the diversity equations. On the other hand, for low Np values, the actual variances of the mutant and offspring populations may largely differ from their expectations. This is due to the poor efficiency of small-sample statistics.
are evolved according to the same rule, hence the analysis can be conducted componentwise [71]. Therefore, the expected values and variances transform and provide bounds for setting the parameters. For instance, comparing the population variance before and after applying mutation and crossover allows for determining the critical values of the parameters for which the offspring population will have lower diversity than the parent population. Such a diversity loss without imposing selective pressure makes DE prone to premature convergence for every objective function [12]. For DE/rand/1/bin, equating the empirical variance of the population V̂ar(X) to the expected empirical variance of the offspring E(V̂ar(O)) yields
2·F²·p_m − 2·p_m/Np + p_m²/Np = 0. (19)

Equation (19) allows for the calculation of the critical value of the scaling factor F_crit below which premature convergence is inevitable:

F_crit = √((1 − p_m/2) / Np). (20)
The critical regions were further analysed by Zaharie and Micota in [70] and presented as several plots in the F-Cr domain for a few DE variants. To avoid undesired behaviour, such as the premature collapse of population diversity, the parameters have to be chosen outside of the critical regions.
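As a numerical illustration (the parameter values Np = 50, Cr = 0.9, n = 30 are our own example, not recommendations from the survey), the critical scaling factor follows from equations (15) and (20):

```python
def p_m_binomial(n, Cr):
    """Mutation probability for binomial crossover, eq. (15)."""
    return Cr * (n - 1) / n + 1 / n

def F_crit(Np, p_m):
    """Critical scaling factor below which diversity decays, eq. (20)."""
    return ((1 - p_m / 2) / Np) ** 0.5

p_m = p_m_binomial(n=30, Cr=0.9)      # ≈ 0.9033
F_c = F_crit(Np=50, p_m=p_m)          # ≈ 0.1047
```

Commonly used scaling factors (roughly F ∈ [0.4, 1]) lie well above this threshold, which is consistent with the advice to stay outside the critical regions.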
Zaharie [8] equated the expected variances of the DE/either-or operator, described by formula (6), for two different p_f settings. The resulting equation was solved with respect to K. To make the operator independent of the
The crossover operator in DE was investigated by Zaharie [63, 65]. Her comparison of the two most common methods, binomial DE/X/Y/bin and exponential DE/X/Y/exp, was based on the crossover probability p_m provided in equation (15). She concluded that the difference in the performance of these operators mainly follows from the number of elements exchanged between the vectors that encode the offspring and the parent.

For binomial crossover, the difference between p_m and Cr is small and vanishes completely if the individuals taking part in DE are not required to be pairwise distinct. These parameters are therefore approximately equal, p_m ≈ Cr. The expected number of exchanged elements is n·p_m = (n − 1)·Cr + 1.
In the exponential crossover operator, the crossover probability depends nonlinearly on the crossover rate. The numerator in formula (15) quickly vanishes due to the exponentiation of Cr ∈ [0, 1]. The expected number of

n. In the limit, when n → ∞, the mutation probability is zero for all crossover rates smaller than one [63]:

lim_{n→∞} p_m = Cr                    for binomial crossover,
lim_{n→∞} p_m = 0    for exponential crossover and 0 ≤ Cr < 1, (21)
lim_{n→∞} p_m = 1    for exponential crossover and Cr = 1.

These results show that a crossover rate Cr ≈ 1 is necessary for exponential crossover to avoid very low crossover probabilities p_m in high-dimensional search spaces.
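The vanishing of p_m for exponential crossover, equation (21), is easy to tabulate; the short script below is our own illustration with Cr = 0.9:

```python
def p_m(n, Cr, kind):
    """Crossover probability from eq. (15)."""
    if kind == "bin":
        return Cr * (n - 1) / n + 1 / n
    # Exponential crossover; for Cr < 1 and large n this behaves like
    # 1 / (n * (1 - Cr)), since Cr**n quickly becomes negligible.
    return (1 - Cr**n) / (n * (1 - Cr))

Cr = 0.9
rows = [(n, p_m(n, Cr, "bin"), p_m(n, Cr, "exp")) for n in (2, 10, 30, 100)]
# Binomial p_m stays near Cr; exponential p_m decays roughly like 1/n.
```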
E(Ê(U)) = x_base, (22)

E(Ĉov(U)) = (1 + 2·Σ_{j=1..k} F_j²) · Ĉov(X), (23)

where Ĉov(X) denotes the empirical covariance matrix of the current population. Weak convergence means that for large k, differential mutation (2) can be approximated with Gaussian mutation:

u_i ← x_r1 + √(2k)·F · q_∞, where q_∞ ∼ 𝒩(0, Ĉov(X)). (24)
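A quick simulation (our own, in one dimension, with indices drawn with replacement as in assumption 1) checks the 2kF² variance factor behind approximation (24):

```python
import random

rng = random.Random(3)
Np, k, F = 40, 5, 0.5
X = [rng.gauss(0, 1) for _ in range(Np)]
mu = sum(X) / Np
pop_var = sum((x - mu) ** 2 for x in X) / Np      # population variance of X

def perturbation():
    """Sum of k scaled differences, indices drawn with replacement."""
    return sum(F * (X[rng.randrange(Np)] - X[rng.randrange(Np)])
               for _ in range(k))

samples = [perturbation() for _ in range(50_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# For large k the sum is approximately N(0, 2 k F^2 Var(X)) -- cf. eq. (24).
theory = 2 * k * F**2 * pop_var
```

Each difference of two independently drawn individuals has variance 2·Var(X), so the k independent scaled differences add up to the 2kF²·Var(X) factor.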
mutation range but keep the same search directions. This is described by a generalized scaling factor g(F):

E(Ĉov(U)) = g(F) · Ĉov(X). (25)
One can compensate for the differences among mutation ranges with an appropriate modification of the scaling factor [72]. The generalized scaling factors show that many variants of differential mutation can be modelled with a base variant with an appropriately transformed, equivalent scaling factor. For instance, DE/current-to-best/1, defined by formula (5), corresponds to DE/best/1 with generalized scaling factor F^(DE/best/1) = √((1/2)·[(1 − F_1)² + 2·F_2²]) and to DE/rand/1 with F^(DE/rand/1) = √((1/2)·[(1 − F_1)² + 2·F_2² − 1]).

The generalized scaling factors for several differential mutation operators are listed in a table in paper [72]. This allows for a common, synthetic analysis of many DE variants. Moreover, the choice of a particular mutation operator does not change the optimization performance much when the generalized scaling factor is kept at the same value. This effect is particularly well observed for mutation operators using a larger number of random elements, such as DE/rand/2.
Evolutionary algorithms are often modelled as Markov chains [73, 74, 38], which enables the population diversity to be analysed in consecutive iterations. Most DE variants, including the classical one, satisfy the Markov property, i.e. the distribution of the next population depends only on the

the global minimum is already found in the first iteration. Consequently, properties such as "convergence speed" must be redefined. This is typically done through analysing statistics that describe the population distribution, such as its expectation vector or covariance matrix. Convergence is not understood as hitting the neighbourhood of the global optimum, but rather as the growth of sampling density in that region.
The infinite population model describes the expected location of the population midpoint, which does not necessarily coincide with observations obtained for a single run of the algorithm with a finite population. Consider a bimodal, symmetric objective function. With a low mutation range, the whole population (and its mean) converges to either of the optima. However, the expected population distribution is bimodal, with a mean located halfway between the optima [78]. The infinite population model thus describes not a single run of the algorithm but rather an average of many independent runs.
5.1. Dynamics of a one-dimensional population

The mutant distribution is described by a sum of random variables corresponding to choosing the individuals that take part in differential mutation. Ali [79] derived the probability density of the mutant distribution for DE. He used direct integration for a uniformly distributed population in a one-dimensional search space, based on an infinite population model. Wang and Huang [67] advanced Ali's results by analysing crossover and computing the distribution of the next population for a monotonous objective function. They also provided illustrative examples showing how the population distribution changes over the first iteration.

Ali and Fatti [80] approximated differential mutation with a beta distribution, which allowed them to avoid generating mutants outside the box constraints. Wang and Huang [67] explicitly analysed the projection of unfeasible solutions onto the box constraints. This aspect is rarely considered in theoretical analyses of DE, despite its significant influence on optimization
were greatly simplified because the analysed algorithm did not use a selection operator. The populations in consecutive iterations were approximately Gaussian. The following condition on the parameters is necessary to avoid

(26)

Inequality (26) was derived from a condition describing the exponential vanishing of the population covariance matrix without selective pressure.
from probability theory and calculus. In their approach, each individual was treated as a particle moving with a certain velocity. The iterative character of DE was interpreted as a discrete approximation of continuous time. Their analysis was performed in the neighbourhood of the minimum, assuming that the objective function was twice differentiable and did not change too rapidly (i.e. fulfilled the Lipschitz condition). In this case, the expected velocity of individual x_i^t in iteration t could be approximated with

E(dx_i^t/dt) ≈ −(k/8)·Cr·((2F² + 1)·V̂ar(X^t) + (x_mean^t − x_i^t)²)·(df(x_i^t)/dx) + (1/2)·Cr·(x_mean^t − x_i^t), (27)
between the population midpoint and the current individual, x_mean^t − x_i^t, appearing in formula (27) is a symptom of the selection bias.
In the above work, a step function modelling the "greedy selection" was approximated with a logistic, sigmoidal function. This was then substituted with a linear term from the Maclaurin series. Therefore, in equation (27), the factor k appeared, which is a "moderate value" of the constant from the exponent of the logistic function. These transformations implicitly modify the replacement mechanism in DE and lower the selective pressure: they introduce a possibility that a weaker offspring replaces a stronger parent.
Dasgupta et al. [10, 64] underlined the similarity between the form of equation (27) and gradient descent optimization described with the formula

dx/dt = −a · df(x)/dx + b, (28)

where a is the learning rate and b is the momentum. In equation (27), the learning rate equals

a_DE = (k/8)·Cr·((2F² + 1)·V̂ar(X^t) + (x_mean^t − x_i^t)²), (29)

and the momentum

b_DE = (1/2)·Cr·(x_mean^t − x_i^t). (30)

The term −a_DE · df(x_i^t)/dx is responsible for guiding the population in the direction of the decreasing gradient, whereas b_DE represents the velocity towards the population midpoint, which can be noted in a reformulation of equation (27):

dx_i^t/dt = −a_DE(x_i^t, X^t) · df(x_i^t)/dx + b_DE(x_i^t, x_mean^t). (31)
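The drift model (29)-(31) can be integrated directly. The sketch below is our own construction: the Euler step size dt, the "moderate" constant k = 4, and the quadratic objective f(x) = x² are arbitrary illustrative choices. It shows the population mean and spread both shrinking, as the gradient-like and midpoint-attraction terms predict:

```python
import random

def drift_step(X, F=0.5, Cr=0.9, k=4.0, dt=0.05):
    """One Euler step of the expected-velocity model (31) for f(x) = x^2."""
    Np = len(X)
    mean = sum(X) / Np
    var = sum((x - mean) ** 2 for x in X) / Np
    new = []
    for x in X:
        a = (k / 8) * Cr * ((2 * F**2 + 1) * var + (mean - x) ** 2)  # eq. (29)
        b = 0.5 * Cr * (mean - x)                                    # eq. (30)
        new.append(x + dt * (-a * 2 * x + b))   # df/dx = 2x for f(x) = x^2
    return new

rng = random.Random(0)
X = [rng.uniform(1.0, 2.0) for _ in range(10)]
m0 = sum(X) / len(X)
v0 = sum((x - m0) ** 2 for x in X) / len(X)
for _ in range(400):
    X = drift_step(X)
m1 = sum(X) / len(X)
v1 = sum((x - m1) ** 2 for x in X) / len(X)
```

Note how the midpoint-attraction term b collapses the spread much faster than the gradient term moves the mean, so the "speed" of the modelled population is indeed governed by its spread.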
DE is invariant under monotonic (order-preserving) transformations of the objective function. Weakening this assumption through modelling selection with a linear approximation of the logistic function implicitly changes the selection mechanism from elitist to quasi-proportional. This allows the mechanisms of DE to be highlighted. Informally, these mechanisms are described as performing "gradient optimization without computing gradient" and adjusting the speed of DE using the spread of the population.
h_m,i^t(x) = (1/(F²·P_{3,Np−1})) · Σ_{i1≠i2≠i3≠i} h_i1^t(x) * h_i2^t(x/F) * h_i3^t(−x/F), (32)

where the constant P_{3,Np−1} denotes the number of permutations of three elements out of Np − 1.
either the mutant or the parent are denoted as a_c and b_c. The notation x_ac, y_bc describes a vector formed from the elements of vector x at positions a_c and of vector y at positions b_c. For example, for

x = [x_1, x_2, x_3, x_4, x_5]^T, (33)
y = [y_1, y_2, y_3, y_4, y_5]^T, (34)
a_c = {1, 2, 4}, (35)
b_c = {3, 5}, (36)

we obtain

x_ac, y_bc = [x_1, x_2, y_3, x_4, y_5]^T. (37)
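The (x_ac, y_bc) notation translates directly into code; a tiny helper (our own) reproduces example (33)-(37):

```python
def combine(x, y, a_c):
    """Build (x_{a_c}, y_{b_c}): take x at the 1-based positions listed in a_c,
    and y at all remaining positions."""
    return [x[j] if (j + 1) in a_c else y[j] for j in range(len(x))]

x = ["x1", "x2", "x3", "x4", "x5"]
y = ["y1", "y2", "y3", "y4", "y5"]
assert combine(x, y, a_c={1, 2, 4}) == ["x1", "x2", "y3", "x4", "y5"]
```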
This notation facilitates the common analysis of binomial and exponential crossover. Consider a particular crossover execution in which elements with indices a_c originate from the mutant and elements with indices b_c from the parent. To generate an offspring at point o = (o_ac, o_bc), we need a mutant with elements of o at coordinates a_c and a parent with elements of o at coordinates b_c. The remaining coordinates do not influence the offspring. Therefore, the PDF of an offspring, h_o^t, is given by an integral of a product of the PDFs of the mutant, h_m^t(o_ac, u_bc), and the parent, h^t(y_ac, o_bc):

h_o^t(o_ac, o_bc) = ∫_{R^|ac| × R^|bc|} h_m^t(o_ac, u_bc) · h^t(y_ac, o_bc) dy_ac du_bc, (38)
where |a_c| and |b_c| denote the numbers of elements originating from the mutant and the parent, |a_c| + |b_c| = n. The variables y_ac and u_bc denote the irrelevant elements of the vectors encoding the parent and the mutant, which are integrated out.

The choice of an individual that survives to the next iteration depends on the comparison of the objective function values for the parent and the offspring. This can be described through integration over the level sets, i.e. the regions for which the value of the objective function is better (or worse) than for the offspring, f(x) = f(x_ac, x_bc) ≥ f(o_ac, x_bc). In this way, the population PDF in the next iteration becomes:
$$h^{t+1}_{p_c}(\mathbf{x}) = h^t(\mathbf{x}) \int_{f(\mathbf{x}) < f(\mathbf{o}_{a_c}, \mathbf{x}_{b_c})} h^t_m(\mathbf{o}_{a_c}, \mathbf{u}_{b_c})\, \mathrm{d}\mathbf{o}_{a_c}\, \mathrm{d}\mathbf{u}_{b_c} \;+\; \int_{f(\mathbf{x}) \leq f(\mathbf{y}_{a_c}, \mathbf{x}_{b_c})} h^t_m(\mathbf{x}_{a_c}, \mathbf{u}_{b_c})\, h^t(\mathbf{y}_{a_c}, \mathbf{x}_{b_c})\, \mathrm{d}\mathbf{y}_{a_c}\, \mathrm{d}\mathbf{u}_{b_c}. \qquad (39)$$

The first summand in equation (39) describes the situation in which the offspring is rejected in selection and its parent passes to the next population. The integral describes the probability of such an event. The second integral denotes the opposite case: the offspring generated at point $\mathbf{x} = \mathbf{x}_{a_c}, \mathbf{x}_{b_c}$ wins the competition against the parent $\mathbf{y}_{a_c}, \mathbf{x}_{b_c}$.
The derivation of formula (39) was conducted for a fixed choice of index sets $\mathbf{a}_c$ and $\mathbf{b}_c$. To describe the general case, Ghosh et al. [68] summed density (39) over all possible executions of the crossover operator, using the probabilities of particular crossover outcomes $p_c$ as weights:

$$h^{t+1}(\mathbf{x}) = \sum_{c=1}^{2^n - 1} p_c\, h^{t+1}_{p_c}(\mathbf{x}). \qquad (40)$$
The summation runs over all $2^n - 1$ possible realizations of the crossover operator (at least one element is required to come from the mutant). Hence, the model's computational complexity grows exponentially with the search space dimension $n$, which significantly constrains the scope of its application.
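For the standard binomial crossover, in which one uniformly chosen coordinate is always inherited from the mutant and every other one with probability $CR$, the weights $p_c$ have a simple closed form: a mutant index set of size $k$ occurs with probability $(k/n)\, CR^{k-1} (1-CR)^{n-k}$. The sketch below is our own illustration (not code from [68]); it enumerates all $2^n - 1$ subsets and checks that the weights form a probability distribution:

```python
from itertools import combinations

def p_binomial(subset, n, cr):
    """Probability that binomial crossover takes exactly `subset` from the mutant.

    Assumes the usual DE rule: one uniformly chosen coordinate always comes
    from the mutant, every other one independently with probability cr.
    """
    k = len(subset)
    return (k / n) * cr ** (k - 1) * (1 - cr) ** (n - k)

n, cr = 4, 0.7
weights = [p_binomial(s, n, cr)
           for k in range(1, n + 1)
           for s in combinations(range(n), k)]
print(len(weights))  # 2**n - 1 = 15 crossover realizations
print(sum(weights))  # sums to 1 (up to floating-point rounding)
```

The exponential growth of the number of realizations with `n` is exactly the complexity barrier mentioned above.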
After the substitution of equation (39) into (40), the final dynamics formula becomes:

$$h^{t+1}(\mathbf{x}) = \sum_{c=1}^{2^n - 1} p_c \left( h^t(\mathbf{x}) \int_{f(\mathbf{x}) < f(\mathbf{o}_{a_c}, \mathbf{x}_{b_c})} h^t_m(\mathbf{o}_{a_c}, \mathbf{u}_{b_c})\, \mathrm{d}\mathbf{o}_{a_c}\, \mathrm{d}\mathbf{u}_{b_c} \;+ \int_{f(\mathbf{x}) \leq f(\mathbf{y}_{a_c}, \mathbf{x}_{b_c})} h^t_m(\mathbf{x}_{a_c}, \mathbf{u}_{b_c})\, h^t(\mathbf{y}_{a_c}, \mathbf{x}_{b_c})\, \mathrm{d}\mathbf{y}_{a_c}\, \mathrm{d}\mathbf{u}_{b_c} \right). \qquad (41)$$
Lyapunov's second method provides conditions which ensure that a system with constantly decreasing energy will reach the zero-energy state identified by a stationary point. The main advantage of this method is the possibility of analysing the stability of a solution without solving the dynamics equations.
Dasgupta et al. [64] used Lyapunov's second method to show that a population located close to an isolated optimum approaches it in a stable way and without any kind of oscillatory behaviour. These results were further extended by Ghosh et al. [68].
Ghosh et al. [68] did not solve equation (41). Instead, they used it to derive a non-constructive proof of the convergence of DE. They defined a Lyapunov function as the difference between the expected values of the objective function with respect to the measures given by the population distributions in two consecutive iterations. Next, they showed that this function equals zero at the global minimum and is positive for all other arguments. Its changes between consecutive iterations, which approximate a time derivative, decrease during the execution of the algorithm. Lyapunov's second method then ensures convergence of the population to a one-point distribution located at the global minimum.
The above theoretical result seems to contradict experimentally observed cases in which DE does not find the global minimum but converges to a local one. However, Ghosh et al. [68] use an important additional assumption of an infinite population initialized in an area containing the global minimum. Therefore, for every individual outside the neighbourhood of the minimum, there is a non-zero probability that an offspring falls into that neighbourhood and then replaces its parent. The theorem proved in [68] means that the infinite population distribution almost surely converges to the global minimum, which is hit already in the first iteration.
Similar results are derivable with standard probability theory. However, the use of Lyapunov's second method potentially opens new possibilities. For instance, one can analyse the convergence speed by investigating the values of the Lyapunov function in consecutive iterations [68].
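In symbols, the Lyapunov candidate described verbally above can be sketched as follows (our notation, based on the description of [68], not the exact formula from that paper):

```latex
% Sketch of the Lyapunov candidate: the drop of the mean objective value
% between two consecutive population distributions h^t and h^{t+1}.
\mathcal{L}(h^t) \;=\;
  \int_{\mathbb{R}^n} f(\mathbf{x})\, h^{t}(\mathbf{x})\,\mathrm{d}\mathbf{x}
  \;-\;
  \int_{\mathbb{R}^n} f(\mathbf{x})\, h^{t+1}(\mathbf{x})\,\mathrm{d}\mathbf{x}.
```

Since parent–offspring selection never increases an individual's objective value, the mean objective value cannot grow, which suggests $\mathcal{L}(h^t) \geq 0$; showing that $\mathcal{L}$ vanishes only for a distribution concentrated at the global minimum is the substance of the proof in [68].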
Zhang and Sanderson [13] modelled the dynamics of DE for a radially symmetric quadratic objective function. Without loss of generality, it was assumed that the population midpoint was located on the first coordinate axis. This property held for the expected distributions in consecutive populations. It was also assumed that the population in each iteration $t = 1, 2, 3, \ldots$ had an uncorrelated, multivariate Gaussian distribution

$$\mathbf{X}^t \sim \mathcal{N}(\mathbf{x}^t_{\mathrm{mean}}, \mathbf{C}^t), \qquad (42)$$

with expectation vector $\mathbf{x}^t_{\mathrm{mean}} = [m^t, 0, \ldots, 0]^T$ and diagonal covariance matrix

$$\mathbf{C}^t = \begin{bmatrix} v_1^t & 0 & \cdots & 0 \\ 0 & v_2^t & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & v_2^t \end{bmatrix}. \qquad (43)$$
Gaussian approximation (42) accurately represents the axial symmetry of the considered case, but it neglects the skewness of the distribution of a population approaching the minimum.
Zhang and Sanderson derived equations describing the parameters of the expected population distribution in iteration $t + 1$ based on their values in iteration $t$, which constituted the system:

$$\left(m^{t+1}\right)^2 = \left(m^t - \Delta m^t\right)^2 + \frac{v_1^{t+1} + (n-1)\, v_2^{t+1}}{N_p}, \qquad (44)$$

$$v_1^{t+1} = \frac{\pi - 1}{\pi} \left(v_1^t + w_1^t\right), \qquad (45)$$

$$v_2^{t+1} = v_2^t\, \Phi\!\left(-\frac{m_-^t}{\sqrt{v_+^t}}\right) + w_2^t\, \Phi\!\left(\frac{m_-^t}{\sqrt{v_+^t}}\right) - \frac{\left(v_2^t\right)^2 + \left(w_2^t\right)^2}{\sqrt{2\pi v_+^t}} \exp\!\left(-\frac{\left(m_-^t\right)^2}{2 v_+^t}\right), \qquad (46)$$
where $\Phi$ denotes the cumulative distribution function (CDF) of the standard normal distribution, $w_1^t$ and $w_2^t$ describe the variances of the mutant population, and $m_-^t$ and $v_+^t$ are auxiliary quantities defined in [13].
Equations (44)–(46) describe the elements of the expectation vector and the covariance matrix of the expected population distribution. This model adequately describes the dynamics of large populations, in which the empirical distribution is close to the expected one. Based on a simulation study, Zhang and Sanderson [13] conjectured that after a certain time the rate of change of the parameters of the Gaussian distribution stabilizes.
The results of Zhang and Sanderson allow practitioners to compute the critical value of the scaling factor $F$, below which the undesirable phenomenon of premature convergence occurs. Far away from the minimum, the quadratic objective function locally resembles a hyperplane and the critical value is $F = \sqrt{1/(\pi - 1)} \approx 0.683$, whereas in the neighbourhood of the minimum it drops to $F \approx 0.48$.
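The arithmetic behind the quoted thresholds is easy to check; the sketch below only verifies the numbers, and the closed form suggested for the near-optimum value is our own observation, flagged as such in the comments:

```python
import math

# Critical scaling factor far from the minimum, as reported for
# Zhang and Sanderson's model: F = sqrt(1 / (pi - 1)).
f_far = math.sqrt(1.0 / (math.pi - 1.0))
print(round(f_far, 3))  # 0.683

# Near the minimum the reported threshold drops to about 0.48; numerically
# this coincides with sqrt(1 / (2 * (pi - 1))) -- our observation, not a
# formula quoted by the model.
f_near = math.sqrt(1.0 / (2.0 * (math.pi - 1.0)))
print(round(f_near, 2))  # 0.48
```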
Ter Braak [84] combined DE with Markov chain Monte Carlo methods. The goal was to draw samples with a density proportional to the objective function values rather than to perform standard optimization. The objective function was assumed to be the density of some continuous probability distribution, i.e. a non-negative function with a unit integral.
The DE-MC algorithm yields a Markov chain which has a unique stationary distribution with density $f$. In the proof, Ter Braak first shows that the resulting chain is reversible, and that therefore the stationary distribution is proportional to $f$. To derive the uniqueness of the chain, the unboundedness of the support of $\mathbf{e}$ is necessary, so that the population can move between all of the local optima by means of macromutation.

From the Bayesian point of view, the significance of Ter Braak's results lies in the fact that the DE-MC algorithm adapts the direction and range of the sampling distribution and effectively accommodates heavy-tailed and multimodal target distributions [85].

The resulting population dynamics of DE-MC is quite different from that of other DE variants. Rather than converging to a single optimum, DE-MC maintains population diversity. The population traverses the whole feasible set, with a sampling density proportional to the objective function. The resulting algorithm has an exploratory character and seems more appropriate for Monte Carlo sampling than for optimization.
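The mechanism can be sketched in a few lines: each population member is a Markov chain, the DE difference vector provides the proposal, and a Metropolis test keeps the target density $f$ stationary. This is a minimal one-dimensional sketch with our own choice of target and parameter values, not Ter Braak's reference implementation:

```python
import math
import random

random.seed(1)

def target_density(x):
    # Unnormalized density to sample from (standard normal here); in DE-MC
    # the objective function itself plays this role.
    return math.exp(-0.5 * x * x)

def de_mc(n_chains=10, n_iters=3000, eps=1e-4):
    d = 1                            # problem dimension
    gamma = 2.38 / math.sqrt(2 * d)  # step size recommended by Ter Braak
    pop = [random.uniform(-5.0, 5.0) for _ in range(n_chains)]
    samples = []
    for t in range(n_iters):
        for i in range(n_chains):
            r1, r2 = random.sample([j for j in range(n_chains) if j != i], 2)
            # DE proposal plus a small perturbation e with unbounded support
            prop = pop[i] + gamma * (pop[r1] - pop[r2]) + random.gauss(0.0, eps)
            # Metropolis acceptance keeps the chain reversible w.r.t. f
            if target_density(prop) >= random.random() * target_density(pop[i]):
                pop[i] = prop
        if t >= n_iters // 2:        # discard the first half as burn-in
            samples.extend(pop)
    return samples

samples = de_mc()
mean = sum(samples) / len(samples)
```

Rather than collapsing onto the maximizer of $f$, the population keeps wandering with visit frequency proportional to $f$, which is exactly the exploratory behaviour described above.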
6. Discussion and further study

6.1. Comparison of theoretical results on DE
D
552 times, certain modifications are added, such as the introduction of soft se-
553 lection. The latter makes it possible to prove global convergence. It also
554 facilitates the use of calculus by smoothing the discontinuous step function
C
27
ACCEPTED MANUSCRIPT
PT
| Authors | Algorithm | Main results | Model and methods |
|---|---|---|---|
| Zaharie [11, 63, 70] | – | Mutation and crossover, critical regions for parameters | Population distribution |
| Zhang and Sanderson [13] | DE/rand/k | Dynamics for the radial objective function | Gaussian distribution of populations, geometrical approach |
| Dasgupta et al. [10, 64] | DE/rand/1, soft selection | Analogy with gradient descent, stability near the optimum | Dynamic system, non-elitist selection |
| Ghosh et al. [68] | DE/rand/1 | General dynamics equation, asymptotic global convergence | Dynamic system, non-elitist selection, Lyapunov's second method for stability |
| Ter Braak [84, 85] | DE-MC, soft selection | Characterization of a stationary population distribution | Existence and uniqueness of a stationary Markov chain |
Theoretical results on DE are still rather distant from their practical use. However, it is worth mentioning approaches which have successfully bridged these two areas. Among the results that have influenced applications is Zaharie's analysis of critical values for the scaling factor and the crossover rate. The developments of DE/either-or and BBDE by Price were also motivated by theoretical considerations. Recently, Arabas and Biedrzycki improved the efficiency of various EAs, including several types of DE, by monitoring the population midpoint [86].
1. The role of selection in DE remains largely unexplored. Direct modelling of selection in DE is a difficult task due to its elitist, local character and its dependence on the objective function. Nevertheless, the impact of the selection mechanism on the search dynamics is profound. Investigations into this issue could clarify whether DE resembles an adaptive swarm optimizer or rather a group of loosely related hill-climbers.

2. Population dynamics models for DE need further development. The ones currently available provide little operational advice. Moreover, the models are much more mathematically complicated than the original DE algorithm. Consequently, conclusions are less straightforward, which hinders their translation into new algorithmic developments. Population dynamics models are a possible way to compute statistics such as the expected first hitting time, i.e. the average time necessary to find an optimal solution.

3. Population size N_p has gained the least attention of all the control parameters of DE.
5. The structural complexity of DE and its numerous variants makes modelling troublesome. From the theoretical perspective it is advisable to look for minimalistic, analysis-friendly variants of DE. Low complexity facilitates theoretical studies and allows stronger conclusions with clearer interpretations to be drawn. Algorithmic simplicity does not necessarily hinder performance [31, 32]. The search for a general, synthetic framework may concern not only whole algorithms but also their elements, such as genetic operators.
6. DE would benefit from the transfer of theoretical results developed for other metaheuristic optimization methods, such as Evolution Strategies or Estimation of Distribution Algorithms. Adaptation of these results would also reveal similarities and contrasts between DE and other global optimizers.

7. It would be useful to characterize the properties of classes of optimization problems that are particularly easy, or particularly hard, for DE to solve. The existence of such classes of tasks is a consequence of the no free lunch theorem. Although some guidelines in this area are available [87], they are of experimental origin and have a general, descriptive character. These guidelines would benefit from being grounded in mathematical derivation. However, it is not clear to what extent it is feasible to formalize non-trivial features such as deceptiveness or a lack of clear structure [88]. Apart from the algorithm as a whole, guidelines could also address the use of particular elements of DE.
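Several of the directions above can already be probed empirically. For instance, the expected first hitting time mentioned in direction 2 is easy to estimate by simulation; the sketch below runs a textbook DE/rand/1/bin (with our own illustrative parameter values, not ones prescribed by the survey) on the sphere function and reports the first generation at which a target accuracy is reached:

```python
import random

def sphere(x):
    return sum(v * v for v in x)

def de_first_hit(n=5, n_p=30, f_scale=0.7, cr=0.9, target=1e-6,
                 max_gen=2000, seed=0):
    """Return the first generation at which the best f drops below `target`."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n)] for _ in range(n_p)]
    fit = [sphere(x) for x in pop]
    for gen in range(1, max_gen + 1):
        for i in range(n_p):
            # DE/rand/1 mutation: three distinct individuals, all != i
            r1, r2, r3 = rng.sample([j for j in range(n_p) if j != i], 3)
            j_rand = rng.randrange(n)  # binomial crossover: forced coordinate
            trial = [pop[r1][j] + f_scale * (pop[r2][j] - pop[r3][j])
                     if (rng.random() < cr or j == j_rand) else pop[i][j]
                     for j in range(n)]
            f_trial = sphere(trial)
            if f_trial <= fit[i]:      # elitist one-to-one selection
                pop[i], fit[i] = trial, f_trial
        if min(fit) < target:
            return gen
    return None

hit = de_first_hit()
```

Averaging `hit` over many seeds gives an empirical estimate of the expected first hitting time that analytical dynamics models could eventually predict.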
7. Conclusions

DE is a state-of-the-art global optimization technique and one of the most intensively studied evolutionary algorithms. The importance of theoretical studies lies in the possibility of better understanding the principles and mechanics of the optimizer. Analytical results often inspire improvements and modifications of the algorithm. Finally, well-developed theoretical foundations build trust in the analysed methods.

This paper surveyed the theoretical results concerning DE. It provided an overview of the convergence studies and invariances, as well as of the investigations into differential mutation, crossover operators, and population diversity. A detailed presentation of the population dynamics models stressed their advantages and weaknesses. Despite the huge research effort put into DE, there are still many open problems awaiting analytical study, the most promising of which have been outlined in this paper.
Acknowledgment

Karol Opara's study was supported by a research fellowship within "Information technologies: research and their interdisciplinary applications", agreement number POKL.04.01.01-00-051/10-00.
[1] F. Neri, V. Tirronen, Recent advances in Differential Evolution: a survey and experimental analysis, Artificial Intelligence Review 33 (1–2) (2010) 61–106. doi:10.1007/s10462-009-9137-2.

[2] N. Pham, A. Malinowski, T. Bartczak, Comparative study of derivative free optimization algorithms, IEEE Transactions on Industrial Informatics 7 (4) (2011) 592–600. doi:10.1109/TII.2011.2166799.

[3] K. Price, Differential Evolution: a fast and simple numerical optimizer, in: 1996 Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS), 1996, pp. 524–527. doi:10.1109/NAFIPS.1996.534790.

[4] R. Storn, On the usage of Differential Evolution for function optimization, in: 1996 Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS), 1996, pp. 519–523. doi:10.1109/NAFIPS.1996.534789.

[5] R. Storn, K. Price, Differential Evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11 (1997) 341–359. doi:10.1023/A:1008202821328.

[9] K. V. Price, R. M. Storn, J. A. Lampinen, Differential Evolution: A practical approach to global optimization, Natural Computing Series, Springer, 2005. doi:10.1007/3-540-31306-0.

[10] [...] dynamics of Differential Evolution: A mathematical model, in: IEEE Congress on Evolutionary Computation, CEC 2008 (IEEE World Congress on Computational Intelligence), 2008, pp. 1439–1446. doi:10.1109/CEC.2008.4630983.

[11] D. Zaharie, On the explorative power of Differential Evolution algorithms, Analele Universitatii din Timisoara XXXIX (2) (2001) 1–12.

[12] D. Zaharie, Critical values for the control parameters of Differential Evolution algorithms, in: Proceedings of MENDEL, 2002, pp. 62–67.

[13] J. Zhang, A. C. Sanderson, Adaptive Differential Evolution: A Robust Approach to Multimodal Problem Optimization, Springer, 2009. doi:10.1007/978-3-642-01527-4.

[14] Y. Zhou, W. Yi, L. Gao, X. Li, Analysis of mutation vectors selection mechanism in Differential Evolution, Applied Intelligence 44 (4) (2016) 904–912. doi:10.1007/s10489-015-0738-y.

[15] K. V. Price, Eliminating drift bias from the Differential Evolution algorithm, in: U. K. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, Heidelberg, 2008, pp. 33–88. doi:10.1007/978-3-540-68830-3_2.

[16] T. Bäck, D. B. Fogel, Z. Michalewicz (Eds.), Handbook of Evolutionary Computation, 1st Edition, IOP Publishing Ltd., Bristol, UK, 1997.

[17] E.-N. Dragoi, V. Dafinescu, Parameter control and hybridization techniques in Differential Evolution: a survey, Artificial Intelligence Review 45 (4) (2016) 447–470. doi:10.1007/s10462-015-9452-8.

[19] R. Tanabe, A. Fukunaga, Success-history based parameter adaptation for Differential Evolution, in: 2013 IEEE Congress on Evolutionary Computation, 2013, pp. 71–78. doi:10.1109/CEC.2013.6557555.

[20] J. Brest, M. Sepesy Maučec, Population size reduction for the Differential Evolution algorithm, Applied Intelligence 29 (3) (2008) 228–247. doi:10.1007/s10489-007-0091-x.

[21] [...] doi:10.1016/j.swevo.2016.05.003.

[22] A. Qin, V. Huang, P. Suganthan, Differential Evolution algorithm with strategy adaptation for global numerical optimization, IEEE Transactions on Evolutionary Computation 13 (2) (2009) 398–417. doi:10.1109/TEVC.2008.927706.

[25] J. Zhang, A. C. Sanderson, JADE: Adaptive Differential Evolution with optional external archive, IEEE Transactions on Evolutionary Computation 13 (5) (2009) 945–958.

[29] A. Caponio, F. Neri, V. Tirronen, Super-fit control adaptation in memetic Differential Evolution frameworks, Soft Computing 13 (8–9) (2009) 811–831. doi:10.1007/s00500-008-0357-1.

[31] G. Iacca, F. Neri, E. Mininno, Y.-S. Ong, M.-H. Lim, Ockham's razor in memetic computing: Three stage optimal memetic exploration, Information Sciences 188 (2012) 17–43. doi:10.1016/j.ins.2011.11.025.

[32] A. P. Piotrowski, J. J. Napiórkowski, Some metaheuristics should be simplified, Information Sciences 427 (2018) 32–62. doi:10.1016/j.ins.2017.10.039.

[33] K. Price, How symmetry constrains evolutionary optimizers: Black Box Differential Evolution – a case study, in: Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), 2017.

[34] N. Hansen, The CMA evolution strategy: a comparing review, in: J. A. Lozano, P. Larrañaga, I. Inza, E. Bengoetxea (Eds.), Towards a new evolutionary computation. Advances on estimation of distribution algorithms, Springer, 2006, pp. 75–102.

[37] [...] state of the art, Computer Methods in Applied Mechanics and Engineering 191 (11) (2002) 1245–1287. doi:10.1016/S0045-7825(01)00323-1.

[38] H.-G. Beyer, H.-P. Schwefel, I. Wegener, How to analyse evolutionary algorithms, Theoretical Computer Science 287 (1) (2002) 101–130. doi:10.1016/S0304-3975(02)00137-8.

[39] A. E. Eiben, G. Rudolph, Theory of evolutionary algorithms: a bird's eye view, Theoretical Computer Science 229 (1) (1999) 3–9. doi:10.1016/S0304-3975(99)00089-4.

[40] K. Zielinski, R. Laur, Stopping criteria for Differential Evolution in constrained single-objective optimization, in: U. K. Chakraborty (Ed.), Advances in Differential Evolution, Springer, Berlin, Heidelberg, 2008, pp. 111–138. doi:10.1007/978-3-540-68830-3_4.

[41] K. Opara, J. Arabas, Benchmarking procedures for continuous optimization algorithms, Journal of Telecommunications and Information Technology 2011 (4) (2011) 73–80.

[42] K. Zielinski, D. Peters, R. Laur, Run time analysis regarding stopping criteria for Differential Evolution and particle swarm optimization, [...], pp. 1–8.

[45] Z. Hu, S. Xiong, Q. Su, Z. Fang, Finite Markov chain analysis of classical Differential Evolution algorithm, Journal of Computational and Applied Mathematics 268 (2014) 121–134. doi:10.1016/j.cam.2014.02.034.

[46] Y.-J. Wang, J.-S. Zhang, Global optimization by an improved differential evolutionary algorithm, Applied Mathematics and Computation 188 (1) (2007) 669–680. doi:10.1016/j.amc.2006.10.021.

[47] N. Hansen, Invariance, self-adaptation and correlated mutations in evolution strategies, in: M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. J. Merelo, H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature PPSN VI: 6th International Conference, Paris, France, September 18–20, 2000, Proceedings, Springer, Berlin, Heidelberg, 2000, pp. 355–364. doi:10.1007/3-540-45356-3_35.

[48] [...] IEEE Transactions on Evolutionary Computation 19 (1) (2015) 31–49. doi:10.1109/TEVC.2013.2297160.

[49] D. Wolpert, W. Macready, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 67–82. doi:10.1109/4235.585893.

[50] M. Köppen, D. H. Wolpert, W. G. Macready, Remarks on a recent paper on the "No Free Lunch" theorems, IEEE Transactions on Evolutionary Computation 5 (3) (2001) 295–296. doi:10.1109/4235.930318.

[51] [...] free lunch: Continuous lunches are not free, Evolutionary Computation 25 (3) (2017) 503–528. doi:10.1162/EVCO_a_00196.

[55] T. English, Optimization is easy and learning is hard in the typical function, in: Proceedings of the 2000 Congress on Evolutionary Computation (CEC 2000), 2000, pp. 924–931. doi:10.1109/CEC.2000.870741.

[56] W. B. Langdon, R. Poli, Evolving problems to learn about particle swarm optimizers and other search algorithms, IEEE Transactions on Evolutionary Computation 11 (5) (2007) 561–578. doi:10.1109/TEVC.2006.886448.

[57] J. H. Holland, Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence, University of Michigan Press, 1975.

[58] D. E. Goldberg, Genetic Algorithms in search, optimization, and machine learning, Addison-Wesley, Reading, MA, 1989.

[59] L. Altenberg, The schema theorem and Price's theorem, in: Foundations of Genetic Algorithms, Vol. 3, 1994, pp. 23–49.

[60] J. He, X. Yao, Drift analysis and average time complexity of evolutionary algorithms, Artificial Intelligence 127 (1) (2001) 57–85.

[61] J. He, X. Yao, A study of drift analysis for estimating computation time of evolutionary algorithms, Natural Computing 3 (1) (2004) 21–35.

[66] [...] Springer, 2010, pp. 114–123. doi:10.1007/978-3-642-15844-5_12.

[67] L. Wang, F. Huang, Parameter analysis based on stochastic model for Differential Evolution algorithm, Applied Mathematics and Computation 217 (7) (2010) 3263–3273. doi:10.1016/j.amc.2010.08.060.

[68] S. Ghosh, S. Das, A. V. Vasilakos, K. Suresh, On convergence of Differential Evolution over a class of continuous functions with unique global optimum, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42 (1) (2012) 107–124. doi:10.1109/TSMCB.2011.2160625.

[69] E. Zhabitskaya, Constraints on control parameters of asynchronous Differential Evolution, in: G. Adam, J. Buša, M. Hnatič (Eds.), Mathematical Modeling and Computational Science: International Conference, MMCP 2011, Stará Lesná, Slovakia, July 4–8, 2011, Revised Selected Papers, Springer, Berlin, Heidelberg, 2012, pp. 322–327. doi:10.1007/978-3-642-28212-6_40.

[70] [...] doi:10.1109/CEC.2017.7969521.

[71] [...] random search algorithms, in: P. Brito (Ed.), COMPSTAT 2008: Proceedings in Computational Statistics, Physica-Verlag HD, Heidelberg, 2008, pp. 473–485. doi:10.1007/978-3-7908-2084-3_39.

[73] A. E. Nix, M. D. Vose, Modeling genetic algorithms with Markov chains, Annals of Mathematics and Artificial Intelligence 5 (1) (1992) 79–88. doi:10.1007/BF01530781.

[75] X. Qi, F. Palmieri, Theoretical analysis of evolutionary algorithms with an infinite population size in continuous space. Part I: Basic properties of selection and mutation, IEEE Transactions on Neural Networks 5 (1) (1994) 102–119. doi:10.1109/72.265965.

[76] I. Karcz-Duleba, Dynamics of infinite populations evolving in a landscape of uni and bimodal fitness functions, IEEE Transactions on Evolutionary Computation 5 (4) (2001) 398–409. doi:10.1109/4235.942533.

[77] I. Karcz-Duleba, Dynamics of two-element populations in the space of population states, IEEE Transactions on Evolutionary Computation 10 (2) (2006) 199–209. doi:10.1109/TEVC.2005.856070.

[78] J. Arabas, Approximating the genetic diversity of populations in the quasi-equilibrium state, IEEE Transactions on Evolutionary Computation 16 (5) (2012) 632–644. doi:10.1109/TEVC.2011.2166157.

[79] M. Ali, Differential Evolution with preferential crossover, European Journal of Operational Research 181 (3) (2007) 1137–1147. doi:10.1016/j.ejor.2005.06.077.

[80] M. M. Ali, L. P. Fatti, A differential free point generation scheme in the Differential Evolution algorithm, Journal of Global Optimization 35 (4) (2006) 551–572. doi:10.1007/s10898-005-3767-y.

[81] [...] Parallel Problem Solving from Nature, PPSN XI: 11th International Conference, Kraków, Poland, September 11–15, 2010, Proceedings, Part II, Springer, Berlin, Heidelberg, 2010.

[83] [...] doi:10.1109/CEC.2005.1554757.

[84] C. J. Ter Braak, A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces, Statistics and Computing 16 (3) (2006) 239–249. doi:10.1007/s11222-006-8769-1.

[85] J. A. Vrugt, C. Ter Braak, C. Diks, B. A. Robinson, J. M. Hyman, D. Higdon, Accelerating Markov chain Monte Carlo simulation by Differential Evolution with self-adaptive randomized subspace sampling, International Journal of Nonlinear Sciences and Numerical Simulation 10 (3) (2009) 273–290. doi:10.1515/IJNSNS.2009.10.3.273.

[86] J. Arabas, R. Biedrzycki, Improving evolutionary algorithms in a continuous domain by monitoring the population midpoint, IEEE Transactions on Evolutionary Computation 21 (5) (2017) 807–812. doi:10.1109/TEVC.2017.2673962.

[87] [...] doi:10.1109/CEC.2017.7969387.

[88] J. He, T. Chen, X. Yao, On the easiest and hardest fitness functions, IEEE Transactions on Evolutionary Computation 19 (2) (2015) 295–305.