Metaheuristic Optimization: Algorithm Analysis and Open Problems

Xin-She Yang
National Physical Laboratory, UK
@ SEA 2011
Intro

Computational science is now the third paradigm of science, complementing theory and experiment.
- Ken Wilson (Cornell University), Nobel Laureate.

All models are wrong, but some are useful.
- George Box, Statistician

All algorithms perform equally well on average over all possible functions. Not quite! (more later)
- No-free-lunch theorems (Wolpert & Macready)
Overview

Introduction
Metaheuristic Algorithms
Applications
Markov Chains and Convergence Analysis
Exploration and Exploitation
Free Lunch or No Free Lunch?
Open Problems
Metaheuristic Algorithms

Essence of an Optimization Algorithm
To move to a new, better point x_{i+1} from an existing known location x_i.

[Figure: a search path x_1, x_2, ..., x_i, with the next move x_{i+1} still to be determined.]

Population-based algorithms use multiple, interacting paths.

Different algorithms: different strategies/approaches in generating these moves!
Optimization Algorithms

Deterministic
Newton's method (1669, published in 1711), Newton-Raphson (1690), hill-climbing/steepest descent (Cauchy 1847), least-squares (Gauss 1795), linear programming (Dantzig 1947), conjugate gradient (Lanczos et al. 1952), interior-point method (Karmarkar 1984), etc.
Stochastic/Metaheuristic

Genetic algorithms (1960s/1970s), evolutionary strategy (Rechenberg & Schwefel 1960s), evolutionary programming (Fogel et al. 1960s).

Simulated annealing (Kirkpatrick et al. 1983), Tabu search (Glover 1980s), ant colony optimization (Dorigo 1992), genetic programming (Koza 1992), particle swarm optimization (Kennedy & Eberhart 1995), differential evolution (Storn & Price 1996/1997),
harmony search (Geem et al. 2001), honeybee algorithm (Nakrani & Tovey 2004), ..., firefly algorithm (Yang 2008), cuckoo search (Yang & Deb 2009), ...
Steepest Descent/Hill Climbing

Gradient-Based Methods
Use gradient/derivative information: very efficient for local search.
Newton's Method

x_{n+1} = x_n - H^{-1} \nabla f, \qquad
H = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.

Quasi-Newton
If H is replaced by I, we have

x_{n+1} = x_n - \alpha I \nabla f(x_n).

Here \alpha controls the step length.

Generation of new moves by gradient.
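For concreteness, a minimal sketch of the quasi-Newton move above with H replaced by I, i.e. x_{n+1} = x_n - \alpha \nabla f(x_n); the 2-D sphere test function, the value of \alpha and the iteration count are illustrative assumptions, not part of the slides.

```python
# Minimal sketch: quasi-Newton move with H replaced by the identity I,
# i.e. x_{n+1} = x_n - alpha * grad f(x_n). Test function and parameters
# are illustrative assumptions.

def grad_sphere(x):
    """Gradient of f(x) = sum(x_i^2)."""
    return [2.0 * xi for xi in x]

def descend(x, alpha=0.1, n_iter=100):
    for _ in range(n_iter):
        g = grad_sphere(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

if __name__ == "__main__":
    print(descend([3.0, -2.0]))   # approaches the minimum at (0, 0)
```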
Simulated Annealing

Metal annealing to increase strength \Rightarrow simulated annealing.

Probabilistic Move: p \propto \exp[-\Delta E / (k_B T)].

k_B = Boltzmann constant (e.g., k_B = 1), T = temperature, E = energy.

\Delta E \propto \Delta f(x), \qquad T = T_0 \alpha^t (cooling schedule), \quad 0 < \alpha < 1.

T \to 0 \Rightarrow p \to 0 \Rightarrow hill climbing.

This is essentially a Markov chain.
Generation of new moves by Markov chain.
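A minimal simulated-annealing sketch following the ingredients above (Boltzmann-type acceptance with k_B = 1 and geometric cooling T = T_0 \alpha^t); the 1-D test function, step size and parameter values are illustrative assumptions.

```python
import math
import random

# Minimal simulated annealing sketch: Boltzmann-type acceptance with k_B = 1
# and geometric cooling T = T0 * alpha**t. The test function and all
# parameter values are illustrative assumptions.

def f(x):
    return x * x + 4.0 * math.sin(3.0 * x)   # a simple multimodal function

def simulated_annealing(x0, T0=10.0, alpha=0.95, n_iter=500, step=1.0):
    x, fx, T = x0, f(x0), T0
    for _ in range(n_iter):
        x_new = x + random.gauss(0.0, step)          # random trial move
        delta = f(x_new) - fx                        # Delta E ~ Delta f
        if delta < 0 or random.random() < math.exp(-delta / T):
            x, fx = x_new, f(x_new)                  # accept the move
        T *= alpha                                   # cooling schedule
    return x, fx

if __name__ == "__main__":
    print(simulated_annealing(5.0))
```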
Genetic Algorithms

[Figure: crossover and mutation operators.]

Generation of new solutions by crossover, mutation and elitism.
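A toy sketch of the three generation mechanisms named above (crossover, mutation, elitism); the binary one-max fitness and all parameter values are illustrative assumptions.

```python
import random

# Toy GA sketch: one-point crossover, bit-flip mutation and elitism.
# The binary "one-max" fitness and all parameters are illustrative assumptions.

def fitness(bits):
    return sum(bits)                      # maximise the number of ones

def evolve(n_pop=20, length=16, p_mut=0.05, n_gen=50):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        new_pop = pop[:2]                                  # elitism: keep the two best
        while len(new_pop) < n_pop:
            p1, p2 = random.sample(pop[:n_pop // 2], 2)    # parents from the better half
            cut = random.randint(1, length - 1)
            child = p1[:cut] + p2[cut:]                    # one-point crossover
            child = [b ^ 1 if random.random() < p_mut else b for b in child]  # mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```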
Swarm Intelligence

Ants, bees, birds, fish ...
Simple rules lead to complex behaviour.

Swarming starlings (video).
PSO

[Figure: a particle x_i, a neighbouring particle x_j, and the current global best g^*.]

Particle swarm optimization (Kennedy and Eberhart 1995):

v_i^{t+1} = v_i^t + \alpha \epsilon_1 (g^* - x_i^t) + \beta \epsilon_2 (x_i^* - x_i^t),
x_i^{t+1} = x_i^t + v_i^{t+1}.

\alpha, \beta = learning parameters, \epsilon_1, \epsilon_2 = random numbers.

Without randomness, generation of new moves by weighted average or pattern search.
Adding randomization to increase the diversity of new solutions.
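A minimal PSO sketch implementing the two update equations above; the sphere objective, the parameter values and the simple velocity limit (added only to keep the sketch stable) are illustrative assumptions.

```python
import random

# Minimal PSO sketch for the updates
#   v <- v + alpha*eps1*(g* - x) + beta*eps2*(x* - x),   x <- x + v.
# Objective, parameter values and the velocity clamp are illustrative assumptions.

def f(x):
    return sum(xi * xi for xi in x)                 # sphere function

def pso(n=20, dim=2, alpha=1.5, beta=1.5, vmax=4.0, n_iter=100):
    X = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    P = [x[:] for x in X]                           # personal bests x_i^*
    g = min(P, key=f)[:]                            # current global best g^*
    for _ in range(n_iter):
        for i in range(n):
            for d in range(dim):
                e1, e2 = random.random(), random.random()
                V[i][d] += alpha * e1 * (g[d] - X[i][d]) + beta * e2 * (P[i][d] - X[i][d])
                V[i][d] = max(-vmax, min(vmax, V[i][d]))   # clamp velocity
                X[i][d] += V[i][d]
            if f(X[i]) < f(P[i]):
                P[i] = X[i][:]
                if f(P[i]) < f(g):
                    g = P[i][:]
    return g, f(g)

if __name__ == "__main__":
    print(pso())
```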
PSO Convergence

Consider a 1D system without randomness (Clerc & Kennedy 2002):

v_i^{t+1} = v_i^t + \alpha (x_i^* - x_i^t) + \beta (g - x_i^t), \qquad x_i^{t+1} = x_i^t + v_i^{t+1}.

Considering only one particle and defining

p = \frac{\alpha x_i^* + \beta g}{\alpha + \beta}, \qquad \gamma = \alpha + \beta,

and setting y_t = p - x_i^t, we have

v_{t+1} = v_t + \gamma y_t, \qquad y_{t+1} = -v_t + (1 - \gamma) y_t.

This can be written as

U_t = \begin{pmatrix} v_t \\ y_t \end{pmatrix}, \quad
A = \begin{pmatrix} 1 & \gamma \\ -1 & 1 - \gamma \end{pmatrix}
\;\Longrightarrow\; U_{t+1} = A U_t,

a simple dynamical system whose eigenvalues are

\lambda_{1,2} = 1 - \frac{\gamma}{2} \pm \frac{\sqrt{\gamma^2 - 4\gamma}}{2}.

Periodic or quasi-periodic behaviour, depending on \gamma. Convergence for \gamma \le 4.
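A quick numerical check of the reduced system U_{t+1} = A U_t: the script below computes the eigenvalues from \lambda^2 - (2-\gamma)\lambda + 1 = 0 and iterates the map for a few sample values of \gamma (the sample values and initial state are assumptions). For 0 < \gamma < 4 the eigenvalues lie on the unit circle and the orbit stays bounded; for \gamma > 4 one eigenvalue exceeds 1 in magnitude and the orbit grows.

```python
import cmath

# Check of the reduced PSO system U_{t+1} = A U_t with
# A = [[1, gamma], [-1, 1 - gamma]]; eigenvalues solve
# lambda^2 - (2 - gamma)*lambda + 1 = 0. Sample gammas are assumptions.

def eigenvalues(gamma):
    disc = cmath.sqrt(gamma * gamma - 4.0 * gamma)
    return 1 - gamma / 2 + disc / 2, 1 - gamma / 2 - disc / 2

def iterate(gamma, v=0.1, y=1.0, n=50):
    for _ in range(n):
        v, y = v + gamma * y, -v + (1 - gamma) * y   # one step of U -> A U
    return v, y

if __name__ == "__main__":
    for gamma in (0.5, 3.9, 4.5):
        l1, l2 = eigenvalues(gamma)
        print(gamma, abs(l1), abs(l2), iterate(gamma))
```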
Ant and Bee Algorithms

Ant colony optimization (Dorigo 1992).
Bee algorithms & many variants (Nakrani & Tovey 2004, Karaboga 2005, Yang 2005, Afshar et al. 2007, ...), and others.

Advantages
Very promising for combinatorial optimization, but for continuous problems, it may not be the best choice.

Pheromone based
Each agent follows paths with higher pheromone concentration (quasi-randomly).
Pheromone evaporates (exponentially) with time.
Firefly Algorithm

Firefly Algorithm by Xin-She Yang (2008)
(Xin-She Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).)

Firefly Behaviour and Idealization
Fireflies are unisex and brightness varies with distance.
Less bright ones will be attracted to bright ones.
If no brighter firefly can be seen, a firefly will move randomly.

x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \epsilon_i^t.

Generation of new solutions by random walk and attraction.
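A minimal firefly-algorithm sketch following the update equation above; the sphere objective, the parameter values and the gradual reduction of the randomness strength are illustrative assumptions.

```python
import math
import random

# Minimal firefly algorithm sketch:
#   x_i <- x_i + beta0*exp(-gamma*r_ij^2)*(x_j - x_i) + alpha*eps
# Objective and parameters are illustrative assumptions.

def f(x):
    return sum(xi * xi for xi in x)

def firefly(n=15, dim=2, beta0=1.0, gamma=1.0, alpha=0.2, n_iter=100):
    X = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            for j in range(n):
                if f(X[j]) < f(X[i]):                 # move i towards the brighter j
                    r2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    X[i] = [a + beta * (b - a) + alpha * random.gauss(0, 1)
                            for a, b in zip(X[i], X[j])]
        alpha *= 0.97                                  # gradually reduce randomness
    return min(X, key=f)

if __name__ == "__main__":
    best = firefly()
    print(best, f(best))
```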
FA Convergence

For the firefly motion without the randomness term, we focus on a single agent and replace x_j^t by g:

x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_i^2} (g - x_i^t),

where the distance r_i = ||g - x_i^t||_2.

In the 1-D case, we set y_t = g - x_i^t and u_t = \sqrt{\gamma}\, y_t, and we have

u_{t+1} = u_t \big[ 1 - \beta_0 e^{-u_t^2} \big].

Analyzing this using the same methodology as for the logistic map u_{t+1} = \lambda u_t (1 - u_t), we have a corresponding chaotic map, focusing on the transition from periodic multiple states to chaotic behaviour.

Convergence can be achieved for \beta_0 < 2. There is a transition from periodic to chaos at \beta_0 \approx 4.
Chaotic characteristics can often be used as an efficient mixing technique for generating diverse solutions.
Too much attraction may cause chaos :)
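The 1-D map u_{t+1} = u_t [1 - \beta_0 e^{-u_t^2}] can be iterated directly to observe this behaviour: orbits settle towards zero for small \beta_0 and become increasingly erratic for larger \beta_0. A short sketch; the sample \beta_0 values and the starting point are assumptions.

```python
import math

# Iterate the 1-D firefly map u_{t+1} = u_t * (1 - beta0 * exp(-u_t**2)).
# Sample beta0 values and the starting point are illustrative assumptions.

def iterate_map(beta0, u0=1.5, n=40):
    u, orbit = u0, []
    for _ in range(n):
        u = u * (1.0 - beta0 * math.exp(-u * u))
        orbit.append(u)
    return orbit

if __name__ == "__main__":
    for beta0 in (0.5, 1.5, 3.0, 4.5):
        tail = iterate_map(beta0)[-5:]          # last few iterates of the orbit
        print(beta0, [round(v, 4) for v in tail])
```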
Cuckoo Breeding Behaviour

Evolutionary Advantages
Dumps eggs in the nests of host birds and lets these host birds raise their chicks.

Cuckoo Video (BBC)
Cuckoo Search

Cuckoo Search by Xin-She Yang and Suash Deb (2009)
(Xin-She Yang and Suash Deb, Cuckoo search via Levy flights, in: Proceedings of World Congress on Nature & Biologically Inspired Computing (NaBIC 2009, India), IEEE Publications, USA, pp. 210-214 (2009). Also, Xin-She Yang and Suash Deb, Engineering Optimization by Cuckoo Search, Int. J. Mathematical Modelling and Numerical Optimisation, Vol. 1, No. 4, 330-343 (2010).)

Cuckoo Behaviour and Idealization
Each cuckoo lays one egg (solution) at a time, and dumps its egg in a randomly chosen nest.
The best nests with high-quality eggs (solutions) will carry over to the next generation.
The egg laid by a cuckoo can be discovered by the host bird with a probability p_a, and a new nest will then be built.

Local random walk:

x_i^{t+1} = x_i^t + s \otimes H(p_a - \epsilon) \otimes (x_j^t - x_k^t).

[x_i, x_j, x_k are 3 different solutions, H(u) is a Heaviside function, \epsilon is a random number drawn from a uniform distribution, and s is the step size.]

Global random walk via Levy flights:

x_i^{t+1} = x_i^t + \alpha L(s, \lambda), \qquad
L(s, \lambda) = \frac{\lambda \Gamma(\lambda) \sin(\pi \lambda / 2)}{\pi} \, \frac{1}{s^{1+\lambda}}, \quad (s \gg s_0).

Generation of new moves by Levy flights, random walk and elitism.
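A sketch of the two moves above: the local random walk with the Heaviside switch H(p_a - \epsilon), and a heavy-tailed global step drawn with Mantegna's algorithm as a common stand-in for generating Levy-flight step lengths (Mantegna's scheme, the helper names and all parameter values are assumptions, not prescribed by the slides).

```python
import math
import random

# Sketch of the two cuckoo-search moves: a local random walk gated by the
# Heaviside switch H(p_a - eps), and a heavy-tailed global step generated by
# Mantegna's algorithm as a stand-in for the Levy-flight distribution above.
# Parameter values are illustrative assumptions.

def heaviside(u):
    return 1.0 if u > 0 else 0.0

def local_step(xi, xj, xk, pa=0.25):
    eps = random.random()
    s = random.random()                       # random step-size factor
    return [a + s * heaviside(pa - eps) * (b - c)
            for a, b, c in zip(xi, xj, xk)]

def levy_step(lam=1.5):
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u, v = random.gauss(0, sigma), random.gauss(0, 1)
    return u / abs(v) ** (1 / lam)            # heavy-tailed step length

def global_step(xi, alpha=0.01, lam=1.5):
    return [a + alpha * levy_step(lam) for a in xi]

if __name__ == "__main__":
    x = [1.0, 2.0]
    print(local_step(x, [0.5, 0.5], [2.0, 1.0]))
    print(global_step(x))
```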
Applications

Design optimization: structural engineering, product design ...
Scheduling, routing and planning: often discrete, combinatorial problems ...
Applications in almost all areas (e.g., finance, economics, engineering, industry, ...)
Pressure Vessel Design Optimization

[Figure: cylindrical pressure vessel with hemispherical ends; the labels show the radius r, the length L, and the two thicknesses d_1 and d_2.]
Optimization

This is a well-known test problem for optimization (e.g., see Cagnina et al. 2008) and it can be written as

minimize f(x) = 0.6224 d_1 r L + 1.7781 d_2 r^2 + 3.1661 d_1^2 L + 19.84 d_1^2 r,

subject to

g_1(x) = -d_1 + 0.0193 r \le 0,
g_2(x) = -d_2 + 0.00954 r \le 0,
g_3(x) = -\pi r^2 L - \frac{4\pi}{3} r^3 + 1296000 \le 0,
g_4(x) = L - 240 \le 0.

The simple bounds are

0.0625 \le d_1, d_2 \le 99 \times 0.0625, \qquad 10.0 \le r, L \le 200.0.

The best solution found so far:

f^* = 6059.714, \qquad x^* = (0.8125, 0.4375, 42.0984, 176.6366).
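As a quick sanity check, the objective and constraints can be evaluated at the reported best solution; a short sketch, assuming the variable ordering (d_1, d_2, r, L) of the solution vector above.

```python
import math

# Evaluate the pressure-vessel objective and constraints at the reported
# best solution x* = (d1, d2, r, L). Variable ordering follows the slide.

def objective(d1, d2, r, L):
    return (0.6224 * d1 * r * L + 1.7781 * d2 * r ** 2
            + 3.1661 * d1 ** 2 * L + 19.84 * d1 ** 2 * r)

def constraints(d1, d2, r, L):
    return [
        -d1 + 0.0193 * r,                                                    # g1 <= 0
        -d2 + 0.00954 * r,                                                   # g2 <= 0
        -math.pi * r ** 2 * L - (4.0 / 3.0) * math.pi * r ** 3 + 1296000.0,  # g3 <= 0
        L - 240.0,                                                           # g4 <= 0
    ]

if __name__ == "__main__":
    x_best = (0.8125, 0.4375, 42.0984, 176.6366)
    print(objective(*x_best))       # close to the reported 6059.714
    print(constraints(*x_best))     # all entries approximately <= 0
```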
Dome Design

[Figure: 120-bar dome structure.]

120-bar dome: divided into 7 groups, 120 design elements, about 200 constraints (Kaveh and Talatahari 2010; Gandomi and Yang 2011).

Tower Design

26-storey tower: 942 design elements, 244 nodal links, 59 groups/types, > 4000 nonlinear constraints (Kaveh & Talatahari 2010; Gandomi & Yang 2011).
Monte Carlo Methods

Random walk: a drunkard's walk,

u_{t+1} = \mu + u_t + w_t,

where w_t is a random variable, and \mu is the drift.
For example, w_t \sim N(0, \sigma^2) (Gaussian).

[Figures: a sample 1-D random walk trajectory and a sample 2-D random walk path.]
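A one-line-per-step sketch of the drifted random walk u_{t+1} = \mu + u_t + w_t with Gaussian w_t; the drift and variance values are illustrative assumptions.

```python
import random

# Drifted 1-D random walk u_{t+1} = mu + u_t + w_t with w_t ~ N(0, sigma^2).
# Drift and sigma are illustrative assumptions.

def random_walk(n_steps=500, mu=0.02, sigma=1.0, u0=0.0):
    u, path = u0, [u0]
    for _ in range(n_steps):
        u = mu + u + random.gauss(0.0, sigma)
        path.append(u)
    return path

if __name__ == "__main__":
    path = random_walk()
    print(path[:5], "...", path[-1])
```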
Markov Chains

Markov chain: the next state only depends on the current state and the transition probability.

P(i, j) \equiv P(V_{t+1} = S_j \mid V_0 = S_p, ..., V_t = S_i) = P(V_{t+1} = S_j \mid V_t = S_i),

\Longrightarrow P_{ij} \pi_i = P_{ji} \pi_j, \qquad \pi = stationary probability distribution.

Examples: Brownian motion,

u_{i+1} = \mu + u_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).

Monopoly (board games): Monopoly Animation (video).
Markov Chain Monte Carlo

Landmarks: Monte Carlo method (1930s, 1945, from 1950s), e.g., Metropolis Algorithm (1953), Metropolis-Hastings (1970).

Markov Chain Monte Carlo (MCMC) methods: a class of methods.

Really took off in the 1990s, now applied to a wide range of areas: physics, Bayesian statistics, climate change, machine learning, finance, economy, medicine, biology, materials and engineering ...
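As one concrete member of the MCMC family mentioned above, here is a minimal Metropolis sampler with a symmetric Gaussian proposal; the target density (an unnormalised standard normal) and the proposal width are illustrative assumptions.

```python
import math
import random

# Minimal Metropolis sampler with a symmetric Gaussian proposal.
# Target density and proposal width are illustrative assumptions.

def target(x):
    return math.exp(-0.5 * x * x)          # unnormalised standard normal

def metropolis(n_samples=10000, step=1.0, x0=0.0):
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + random.gauss(0.0, step)
        if random.random() < min(1.0, target(x_new) / target(x)):
            x = x_new                      # accept, otherwise keep the old x
        samples.append(x)
    return samples

if __name__ == "__main__":
    s = metropolis()
    mean = sum(s) / len(s)
    var = sum((v - mean) ** 2 for v in s) / len(s)
    print(mean, var)                       # roughly 0 and 1
```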
Convergence Behaviour

As the MCMC runs, convergence may be reached.
When does a chain converge? When to stop the chain ... ?
Are multiple chains better than a single chain?

[Figure: a sample MCMC trace.]

[Figure: multiple chains evolving from t = 0; by t = n the chains have converged.]

Multiple, interacting chains
Multiple agents trace multiple, interacting Markov chains during the Monte Carlo process.
Analysis

Classifications of Algorithms
Trajectory-based: hill-climbing, simulated annealing, pattern search ...
Population-based: genetic algorithms, ant & bee algorithms, artificial immune systems, differential evolution, PSO, HS, FA, CS, ...

Ways of Generating New Moves/Solutions
Markov chains with different transition probabilities.
Trajectory-based = a single Markov chain;
Population-based = multiple, interacting chains.
Tabu search (with memory) = self-avoiding Markov chains.
Ergodicity

Markov Chains & Markov Processes
Most theoretical studies use Markov chains/processes as a framework for convergence analysis.
A Markov chain is said to be regular if some positive power k of the transition matrix P has only positive elements.
A chain is called time-homogeneous if its transition matrix P is the same after each step; thus the transition probability after k steps becomes P^k.
A chain is ergodic or irreducible if it is aperiodic and positive recurrent: it is possible to reach every state from any state.
Convergence Behaviour

As k \to \infty, we have the stationary probability distribution \pi:

\pi = \pi P \;\Longrightarrow\; the first eigenvalue is always 1.

Asymptotic convergence to optimality:

\lim_{k \to \infty} x_k \to x_* \quad (with probability one).

The rate of convergence is usually determined by the second eigenvalue 0 < \lambda_2 < 1.

An algorithm can converge, but may not necessarily be efficient, as the rate of convergence is typically low.
Convergence of GA

Important studies by Aytug et al. (1996) [1], Aytug and Koehler (2000) [2], Greenhalgh and Marshall (2000) [3], Gutjahr (2010) [4], etc.

The number of iterations t(\zeta) in GA with a convergence probability of \zeta can be estimated by

t(\zeta) \le \left\lceil \frac{\ln(1 - \zeta)}{\ln\big(1 - \min[(1 - \mu)^{Ln}, \, \mu^{Ln}]\big)} \right\rceil,

where \mu = mutation rate, L = string length, and n = population size.

[1] H. Aytug, S. Bhattacharyya and G. J. Koehler, A Markov chain analysis of genetic algorithms with power of 2 cardinality alphabets, Euro. J. Operational Research, 96, 195-201 (1996).
[2] H. Aytug and G. J. Koehler, New stopping criterion for genetic algorithms, Euro. J. Operational Research, 126, 662-674 (2000).
[3] D. Greenhalgh and S. Marshall, Convergence criteria for genetic algorithms, SIAM J. Computing, 30, 269-282 (2000).
[4] W. J. Gutjahr, Convergence Analysis of Metaheuristics, Annals of Information Systems, 10, 159-187 (2010).
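The estimate above is easy to evaluate numerically; a short sketch (the sample values of \zeta, \mu, L and n are assumptions). Even for modest string lengths and population sizes the resulting iteration count is astronomically large, which echoes the later remark that convergence alone does not imply efficiency.

```python
import math

# Evaluate the GA iteration estimate from the slide:
#   t(zeta) = ceil( ln(1 - zeta) / ln(1 - min[(1-mu)**(L*n), mu**(L*n)]) )
# Sample parameter values are illustrative assumptions.

def t_estimate(zeta, mu, L, n):
    q = min((1.0 - mu) ** (L * n), mu ** (L * n))
    # math.log1p keeps the tiny probability q from being lost to rounding.
    return math.ceil(math.log(1.0 - zeta) / math.log1p(-q))

if __name__ == "__main__":
    print(t_estimate(zeta=0.99, mu=0.05, L=10, n=5))   # a very large number
```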
Multiobjective Metaheuristics

Asymptotic convergence of metaheuristics for multiobjective optimization (Villalobos-Arias et al. 2005) [6]

The transition matrix P of a metaheuristic algorithm has a stationary distribution \pi such that

|P^k_{ij} - \pi_j| \le (1 - \xi)^{k-1}, \quad \forall i, j, \quad (k = 1, 2, ...),

where \xi is a function of the mutation probability \mu, string length L and population size n. For example, \xi = 2^{nL} \mu^{nL}, so \xi < 0.5.

Note: An algorithm satisfying this condition may not converge (for multiobjective optimization).
However, an algorithm with elitism, obeying the above condition, does converge!

[6] M. Villalobos-Arias, C. A. Coello Coello and O. Hernandez-Lerma, Asymptotic convergence of metaheuristics for multiobjective optimization problems, Soft Computing, 10, 1001-1005 (2005).
Other results

Limited results on convergence analysis exist, concerning (finite states/domains):
ant colony optimization,
generalized hill-climbers and simulated annealing,
best-so-far convergence of cross-entropy optimization,
nested partition method, Tabu search, and
of course, combinatorial optimization.

However, more challenging tasks remain for infinite states/domains and continuous problems.
Many, many open problems need satisfactory answers.
Converged?

Converged: often the best-so-far convergence, not necessarily at the global optimality.
In theory, a Markov chain can converge, but the number of iterations tends to be large.
In practice, with a finite (hopefully small) number of generations, even if the algorithm converges, it may not reach the global optimum.

How to avoid premature convergence
Equip an algorithm with the ability to escape a local optimum.
Increase diversity of the solutions.
Enough randomization at the right stage.
.... (unknown, new) ....
All

So many algorithms: what are the common characteristics?
What are the key components?
How to use and balance different components?
What controls the overall behaviour of an algorithm?
Exploration and Exploitation

Characteristics of Metaheuristics
Exploration and Exploitation, or Diversification and Intensification.

Exploitation/Intensification
Intensive local search, exploiting local information. E.g., hill-climbing.

Exploration/Diversification
Exploratory global search, using randomization/stochastic components. E.g., hill-climbing with random restart.
Summary

[Figure: algorithms placed in the exploration-exploitation plane, with exploitation along the horizontal axis and exploration along the vertical axis. Uniform search sits at the exploration extreme; steepest descent and Newton-Raphson sit at the exploitation extreme; Tabu search, Nelder-Mead, CS, PSO/FA, EP/ES, SA, ant/bee algorithms and genetic algorithms lie in between.]

Best? Free lunch?
No-Free-Lunch (NFL) Theorems

Algorithm Performance
Any algorithm is as good/bad as random search, when averaged over all possible problems/functions.

Finite domains
No universally efficient algorithm!

Any free taster or dessert?
Yes and no. (more later)
NFL Theorems (Wolpert and Macready 1997)

Search space is finite (though quite large), thus the space of possible cost values is also finite. Objective function f : X \to Y, with F = Y^X (space of all possible problems).

Assumptions: finite domain, closed under permutation (c.u.p.).

For m iterations, m distinct visited points form a time-ordered set

d_m = \{ (d_m^x(1), d_m^y(1)), ..., (d_m^x(m), d_m^y(m)) \}.

The performance of an algorithm a iterated m times on a cost function f is denoted by P(d_m^y | f, m, a).

For any pair of algorithms a and b, the NFL theorem states

\sum_f P(d_m^y | f, m, a) = \sum_f P(d_m^y | f, m, b).

Any algorithm is as good (bad) as a random search!
Proof Sketch

Wolpert and Macready's original proof by induction:

For m = 1, d_1 = \{ (d_1^x, d_1^y) \}, so the only possible value of d_1^y is f(d_1^x), and thus \delta(d_1^y, f(d_1^x)). This means

\sum_f P(d_1^y | f, m = 1, a) = \sum_f \delta(d_1^y, f(d_1^x)) = |Y|^{|X| - 1},

which is independent of algorithm a. [|Y| is the size of Y.]

If it is true for m, or \sum_f P(d_m^y | f, m, a) is independent of a, then for m + 1 we have d_{m+1} = d_m \cup \{x, f(x)\} with d_{m+1}^x(m+1) = x and d_{m+1}^y(m+1) = f(x). Thus, we get (Bayesian approach)

P(d_{m+1}^y | f, m+1, a) = P(d_{m+1}^y(m+1) | d_m, f, m+1, a) \, P(d_m^y | f, m+1, a).

So

\sum_f P(d_{m+1}^y | f, m+1, a) = \sum_{f,x} \delta(d_{m+1}^y(m+1), f(x)) \, P(x | d_m^y, f, m+1, a) \, P(d_m^y | f, m+1, a).

Using P(x | d_m, a) = \delta(x, a(d_m)) and P(d_m | f, m+1, a) = P(d_m | f, m, a), this leads to

\sum_f P(d_{m+1}^y | f, m+1, a) = \frac{1}{|Y|} \sum_f P(d_m^y | f, m, a),

which is also independent of a.
Free Lunches

NFL not true for continuous domains (Auger and Teytaud 2009)
Continuous free lunches \Rightarrow some algorithms are better than others!
For example, for a 2D sphere function, an efficient algorithm only needs 4 iterations/steps to reach the optimality (global minimum). [7]

Revisiting algorithms
NFL assumes that the time-ordered set has m distinct points (non-revisiting). For revisiting points, it breaks the closed-under-permutation assumption, so NFL does not hold (Marshall and Hinton 2010) [8]

[7] A. Auger and O. Teytaud, Continuous lunches are free plus the design of optimal optimization algorithms, Algorithmica, 57, 121-146 (2010).
[8] J. A. Marshall and T. G. Hinton, Beyond no free lunch: realistic algorithms for arbitrary problem classes, WCCI 2010 IEEE World Congress on Computational Intelligence, July 18-23, Barcelona, Spain, pp. 1319-1324.
More Free Lunches

Coevolutionary algorithms
A set of players (agents?) in self-play problems work together to produce a champion, like training a chess champion;
free lunches exist (Wolpert and Macready 2005). [9]
[A single player tries to pursue the best next move, or for two players, the fitness function depends on the moves of both players.]

Multiobjective
Some multiobjective optimizers are better than others (Corne and Knowles 2003). [10] [results for finite domains only]
Free lunches due to archiver and generator.

[9] D. H. Wolpert and W. G. Macready, Coevolutionary free lunches, IEEE Trans. Evolutionary Computation, 9, 721-735 (2005).
[10] D. Corne and J. Knowles, Some multiobjective optimizers are better than others, Evolutionary Computation, CEC'03, 4, 2506-2512 (2003).
Open Problems

Framework: Need to develop a unified framework for algorithmic analysis (e.g., convergence).
Exploration and exploitation: What is the optimal balance between these two components? (50-50 or what?)
Performance measure: What are the best performance measures? Statistically? Why?
Convergence: Convergence analysis of algorithms for infinite, continuous domains requires systematic approaches.
More Open Problems

Free lunches: Unproved for infinite or continuous domains for multiobjective optimization. (possible free lunches!)
What are the implications of NFL theorems in practice?
If free lunches exist, how to find the best algorithm(s)?
Knowledge: Does problem-specific knowledge always help to find appropriate solutions? How to quantify such knowledge?
Intelligent algorithms: Any practical way to design truly intelligent, self-evolving algorithms?
Thanks

Yang X. S., Engineering Optimization: An Introduction with Metaheuristic Applications, Wiley, (2010).
Yang X. S., Introduction to Computational Mathematics, World Scientific, (2008).
Yang X. S., Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).
Yang X. S., Introduction to Mathematical Optimization: From Linear Programming to Metaheuristics, Cambridge Int. Science Publishing, (2008).
Yang X. S., Applied Engineering Optimization, Cambridge Int. Science Publishing, (2007).
IJMMNO

International Journal of Mathematical Modelling and Numerical Optimisation (IJMMNO)
http://www.inderscience.com/ijmmno

Thank you!
Questions?