
Evolutionary Methods in Multi-Objective Optimization

- Why do they work? -

Lothar Thiele

Computer Engineering and Networks Laboratory


Dept. of Information Technology and Electrical Engineering
Swiss Federal Institute of Technology (ETH) Zurich

Overview

- introduction
- limit behavior
- run-time
- performance measures
Black-Box Optimization
objective function (e.g. simulation model):
decision vector x → objective vector f(x)

Optimization algorithm:
only allowed to evaluate f (direct search)
Issues in EMO

How to maintain a diverse Pareto set approximation?
→ Diversity: density estimation

How to prevent nondominated solutions from being lost?
→ environmental selection

How to guide the population towards the Pareto set?
→ Convergence: fitness assignment
Multi-objective Optimization
A Generic Multiobjective EA
[Diagram: population and archive]

Loop: evaluate the population → update the archive → truncate the archive →
sample (mating selection) from the archive → vary (mutation/recombination) →
new population and new archive
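A minimal Python sketch of this generation loop. The operator and parameter names (random_solution, evaluate, vary, truncate) are hypothetical placeholders and minimization is assumed; this illustrates the diagram, not the author's implementation.

import random

def dominates(fa, fb):
    # fa dominates fb (minimization): not worse in any objective, better in at least one
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def nondominated(entries):
    # keep only (solution, objectives) pairs whose objectives no other entry dominates
    return [(x, fx) for x, fx in entries
            if not any(dominates(fy, fx) for _, fy in entries)]

def generic_moea(random_solution, evaluate, vary, truncate, pop_size, max_archive, generations):
    # generation loop of the diagram: evaluate -> update archive -> truncate ->
    # sample (mating selection) -> vary -> new population
    population = [random_solution() for _ in range(pop_size)]
    archive = []
    for _ in range(generations):
        evaluated = [(x, evaluate(x)) for x in population]
        archive = nondominated(archive + evaluated)                      # update
        archive = truncate(archive, max_archive)                         # truncate (e.g. density based)
        parents = [random.choice(archive)[0] for _ in range(pop_size)]   # sample
        population = [vary(p) for p in parents]                          # vary
    return archive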
Comparison of Three Implementations
2-objective knapsack problem
[Plot: Pareto set approximations obtained by SPEA2, VEGA, and extended VEGA]

Trade-off between distance and diversity?
Performance Assessment: Approaches

Which technique is suited for which problem class?

- Theoretically (by analysis): difficult
  - limit behavior (unlimited run-time resources)
  - running time analysis

- Empirically (by simulation): standard
  - problems: randomness, multiple objectives
  - issues: quality measures, statistical testing, benchmark problems,
    visualization, …
Overview

- introduction
- limit behavior
- run-time
- performance measures
Analysis: Main Aspects

Evolutionary algorithms are random search heuristics.

Qualitative: limit behavior
Probability{Optimum found} for t → ∞

Quantitative: expected running time E(T)

[Plot: probability that Algorithm A applied to Problem B has found the
optimum, as a function of computation time (number of iterations)]
Archiving

optimization: generates candidate solutions; every solution is passed to the
archiver at least once
archiving: update and truncate a finite-memory, finite-size archive A

Requirements:
1. Convergence to the Pareto front
2. Bounded size of the archive
3. Diversity among the stored solutions
Problem: Deterioration

[Figure: bounded archive (size 3) at time t and t+1; when a new solution
arrives, a previously stored solution is discarded]

The new solution accepted at time t+1 is dominated by a solution found
previously (and “lost” during the selection process).
Problem: Deterioration
Goal: maintain a “good” front (distance + diversity)

[Plot: fronts obtained by NSGA-II and SPEA]

But: most archiving strategies may forget Pareto-optimal solutions…
Limit Behavior: Related Work

Requirements for the archive:
1. Convergence
2. Diversity
3. Bounded size

“store all” (impractical): [Rudolph 98,00] [Veldhuizen 99]
“store one” (not nice): [Rudolph 98,00] [Hanne 99]
in this work: an archive that satisfies all three requirements
Solution Concept: Epsilon Dominance

Definition 1: ε-Dominance
A ε-dominates B iff (1+ε) · f(A) ≥ f(B) in every objective
(known since 1987)

Definition 2: ε-Pareto set
A subset of the Pareto-optimal set which ε-dominates all Pareto-optimal
solutions

[Figure: Pareto set and ε-Pareto set; dominated and ε-dominated regions]
Keeping Convergence and Diversity

Goal: maintain an ε-Pareto set

Idea: ε-grid, i.e. maintain a set of non-dominated boxes (one solution per
box)

Algorithm (ε-update): accept a new solution f iff
- the corresponding box is not dominated by any box represented in the
  archive A, AND
- any other archive member in the same box is dominated by the new solution

[Figure: objective space (y1, y2) partitioned into boxes whose corners lie at
powers of (1+ε)]
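A minimal Python sketch of the ε-update rule. It assumes maximization and objective values ≥ 1 so that the (1+ε)-grid can be indexed via logarithms; the function names are ours, not the original implementation.

import math

def box(f, eps):
    # box index of an objective vector on the (1+eps)-grid
    # (assumes maximization and objective values >= 1)
    return tuple(int(math.log(v) / math.log(1.0 + eps)) for v in f)

def dominates(fa, fb):
    # fa dominates fb (maximization): not worse anywhere, better somewhere
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

def eps_update(archive, f, eps):
    # one step of the eps-update rule sketched above
    b = box(f, eps)
    boxes = {box(g, eps): g for g in archive}
    # reject f if its box is dominated by a box already represented in the archive
    if any(dominates(bb, b) for bb in boxes):
        return archive
    if b in boxes:
        # same box: accept f only if it dominates the current occupant
        if dominates(f, boxes[b]):
            return [g for g in archive if box(g, eps) != b] + [f]
        return archive
    # new non-dominated box: insert f, drop members whose boxes are now dominated
    return [g for g in archive if not dominates(b, box(g, eps))] + [f]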
Correctness of Archiving Method

Theorem:
Let F = (f1, f2, f3, …) be an infinite sequence of objective vectors passed
one by one to the ε-update algorithm, and let Ft be the union of the first t
objective vectors of F.

Then for any t > 0, the following holds:
- the archive A at time t contains an ε-Pareto front of Ft
- the size of the archive A at time t is bounded by
  (log K / log(1+ε))^(m−1)
  (K = “maximum objective value”, m = “number of objectives”)
Correctness of Archiving Method

Sketch of Proof:

Part 1 (At is an ε-Pareto set of Ft): indirect proof over the 3 possible
failures:
- at time k ≤ t a necessary solution was missed
- at time k ≤ t a necessary solution was expelled
- At contains a solution f that does not belong to an ε-Pareto set of Ft

Part 2 (size bound):
- number of total boxes in objective space: (log K / log(1+ε))^m
- at most one solution per box is accepted
- partition into (log K / log(1+ε))^(m−1) chains of boxes
Simulation Example
[Plots: archive of Rudolph and Agapie (2000) vs. the ε-archive]
Overview

- introduction
- limit behavior
- run-time
- performance measures
Running Time Analysis: Related Work

problem domain → type of results

Single-objective EAs, discrete search spaces:
- expected running time (bounds), running time with high probability (bounds)
  [Mühlenbein 92] [Rudolph 97] [Droste, Jansen, Wegener 98,02]
  [Garnier, Kallel, Schoenauer 99,00] [He, Yao 01,02]

Single-objective EAs, continuous search spaces:
- asymptotic convergence rates [Beyer 95,96,…] [Rudolph 97]
- exact convergence rates [Jägersküpper 03]

Multiobjective EAs: [none]
Methodology

Typical “ingredients” of a running time analysis — here:

- Simple algorithms: SEMO, FEMO, GEMO (“simple”, “fair”, “greedy”)
- Simple problems: mLOTZ, mCOCZ (m-objective pseudo-Boolean problems)
- Analytical methods & tools: general upper bound technique & graph search
  process

1. Rigorous results for specific algorithm(s) on specific problem(s)
2. General tools & techniques
3. General insights (e.g., is a population beneficial at all?)
Three Simple Multiobjective EAs
Common loop: select an individual from the population → flip a randomly
chosen bit → insert it into the population if not dominated → remove
dominated individuals from the population

Variant 1: SEMO — each individual in the population is selected with the
same probability (uniform selection)
Variant 2: FEMO — select the individual with the minimum number of mutation
trials (fair selection)
Variant 3: GEMO — priority of convergence if there is progress (greedy
selection)
SEMO

uniform selection of x from the population P → single point mutation → x’ is
included if not dominated; dominated and duplicated members are removed
from P
Example Algorithm: SEMO

Simple Evolutionary Multiobjective Optimizer

1. Start with a random solution
2. Choose a parent randomly (uniform)
3. Produce a child by mutating the parent
4. If the child is not dominated, then
   • add it to the population
   • discard all dominated members
5. Goto 2
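A minimal Python sketch of these steps. The function and parameter names, the maximization convention, and the handling of duplicate objective vectors are assumptions made for illustration, not the original implementation.

import random

def dominates(fa, fb):
    # fa dominates fb (maximization): not worse anywhere, better somewhere
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

def semo(objective, n, iterations, seed=0):
    # population maps bit strings (tuples) to their objective vectors
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(n))          # 1. random initial solution
    population = {x: objective(x)}
    for _ in range(iterations):
        parent = rng.choice(list(population))                # 2. uniform parent selection
        i = rng.randrange(n)                                 # 3. flip one randomly chosen bit
        child = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        fc = objective(child)
        # 4. insert only if no member dominates the child (or duplicates its objectives)
        if not any(dominates(fp, fc) or fp == fc for fp in population.values()):
            population = {y: fy for y, fy in population.items() if not dominates(fc, fy)}
            population[child] = fc
    return population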
Run-Time Analysis: Scenario

Problem: leading ones, trailing zeros (LOTZ)
Example bit string: 1 1 0 1 0 0 0

Variation: single point mutation — one randomly chosen bit per individual is
flipped, e.g. 1 1 0 1 0 0 0 → 1 1 1 1 0 0 0
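A small Python sketch of the LOTZ objective (the function name is ours). It can be plugged into the SEMO sketch above, e.g. semo(lotz, n=7, iterations=1000).

def lotz(x):
    # LOTZ: (number of leading ones, number of trailing zeros), both maximized
    n = len(x)
    leading_ones = next((i for i, bit in enumerate(x) if bit == 0), n)
    trailing_zeros = next((i for i, bit in enumerate(reversed(x)) if bit == 1), n)
    return (leading_ones, trailing_zeros)

# the bit string above: 1 1 0 1 0 0 0  ->  (2, 3)
assert lotz((1, 1, 0, 1, 0, 0, 0)) == (2, 3)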
The Good News
SEMO behaves like a single-objective EA until the Pareto set has been
reached…

[Plot: objective space with leading 1s and trailing 0s on the axes]

Phase 1: only one solution is stored in the archive
Phase 2: only Pareto-optimal solutions are stored in the archive
SEMO: Sketch of the Analysis I

Phase 1: until the first Pareto-optimal solution has been found

Example: 1 1 0 1 1 0 0 (leading 1s, trailing 0s)

i = number of ‘incorrect’ bits
- i → i−1: probability of a successful mutation ≥ 1/n,
  expected number of mutations ≤ n
- i = n → i = 0: at most n−1 steps (i = 1 is not possible)
- expected overall number of mutations = O(n²)
SEMO: Sketch of the Analysis II

Phase 2: from the first to all Pareto-optimal solutions

j = number of Pareto-optimal solutions in the population
- j → j+1: probability of choosing an outer solution ≥ 1/j, ≤ 2/j;
  probability of a successful mutation ≥ 1/n, ≤ 2/n;
  expected number Tj of trials (mutations) ≥ nj/4, ≤ nj
- j = 1 → j = n: at most n steps, and
  n³/8 + n²/8 ≤ Σ Tj ≤ n³/2 + n²/2
- expected overall number of mutations = Θ(n³)
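Summing the per-step bounds above over j = 1, …, n reproduces the stated totals (a short worked calculation under exactly those bounds):

\[
\frac{n}{4}\sum_{j=1}^{n} j \;\le\; \sum_{j=1}^{n} T_j \;\le\; n\sum_{j=1}^{n} j,
\qquad \sum_{j=1}^{n} j = \frac{n(n+1)}{2},
\]
\[
\Rightarrow\quad \frac{n^3}{8}+\frac{n^2}{8} \;\le\; \sum_{j=1}^{n} T_j \;\le\; \frac{n^3}{2}+\frac{n^2}{2}
\;=\; \Theta(n^3).
\]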
SEMO on LOTZ
Can we do even better?
The bottleneck is the exploration of the Pareto front: uniform sampling is
unfair, since Pareto-optimal points found early are sampled more frequently.
FEMO on LOTZ

Sketch of Proof:

The Pareto-optimal strings form a chain, e.g. for n = 6:
111111 – 111110 – 111100 – 111000 – 110000 – 100000 – 000000

Probability that a parent has not generated a given successor within
(c/p)·log n trials (p = success probability of a single mutation):
(1 − p)^((c/p)·log n) ≤ n^(−c)

n individuals must be produced; the probability that any of them needs more
than (c/p)·log n trials is therefore bounded by n^(1−c).
FEMO on LOTZ

- A single-objective (1+1) EA with a multistart strategy (ε-constraint
  method) has running time Θ(n³).
- The EMO algorithm with fair sampling (FEMO) has running time Θ(n² log n).
Generalization: Randomized Graph Search

 Pareto front can be modeled as a graph


 Edges correspond to mutations
 Edge weights are mutation probabilities

How long does it take to explore the whole graph?


How should we select the “parents”?
Randomized Graph Search
Running Time Analysis: Comparison

[Table: running times of the algorithms on the example problems]

A population-based approach can be more efficient than a multistart of a
single-membered strategy.
Overview

- introduction
- limit behavior
- run-time
- performance measures
Performance Assessment: Approaches

Which technique is suited for which problem class?

- Theoretically (by analysis): difficult
  - limit behavior (unlimited run-time resources)
  - running time analysis

- Empirically (by simulation): standard
  - problems: randomness, multiple objectives
  - issues: quality measures, statistical testing, benchmark problems,
    visualization, …
The Need for Quality Measures

[Figure: two pairs of Pareto set approximations A and B]

Is A better than B?
- independent of user preferences: Yes (strictly) / No
- dependent on user preferences: How much? In what aspects?

Ideal: quality measures allow to make both types of statements
Independent of User Preferences
Pareto set approximation (algorithm outcome) = set of incomparable solutions
Ω = set of all Pareto set approximations

[Figure: approximations A, B, C, D]

A weakly dominates B = not worse in all objectives (sets not equal)
C dominates D = better in at least one objective
A strictly dominates C = better in all objectives
B is incomparable to C = neither set weakly better
Dependent on User Preferences

Goal: quality measures compare two Pareto set approximations A and B.

application of quality measures (here: unary):

              A        B
hypervolume   432.34   420.13
distance      0.3308   0.4532
diversity     0.3637   0.3463
spread        0.3622   0.3601
cardinality   6        5

comparison and interpretation of the quality values → “A better”
Quality Measures: Examples
Unary: hypervolume measure, e.g. S(A) = 60%
Binary: coverage measure, e.g. C(A,B) = 25%, C(B,A) = 75%

[Plots: approximations A and B in a cheapness/performance objective space]

[Zitzler, Thiele: 1999]
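A minimal Python sketch of the binary coverage measure described above: the fraction of objective vectors in B that are weakly dominated by at least one vector in A. Function names and the maximization convention are our assumptions.

def weakly_dominates(fa, fb):
    # fa is not worse than fb in any objective (maximization)
    return all(a >= b for a, b in zip(fa, fb))

def coverage(A, B):
    # C(A, B): fraction of vectors in B weakly dominated by some vector in A
    return sum(any(weakly_dominates(fa, fb) for fa in A) for fb in B) / len(B)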
Previous Work on Multiobjective Quality Measures
Status:
- Visual comparisons were common until recently
- Numerous quality measures have been proposed since the mid-1990s
  [Schott: 1995] [Zitzler, Thiele: 1998] [Hansen, Jaszkiewicz: 1998]
  [Zitzler: 1999] [Van Veldhuizen, Lamont: 2000] [Knowles, Corne, Oates: 2000]
  [Deb et al.: 2000] [Sayin: 2000] [Tan, Lee, Khor: 2001] [Wu, Azarm: 2001] …
- Most popular: unary quality measures (diversity + distance)
- No common agreement on which measures should be used

Open questions:
- What kind of statements do quality measures allow?
- Can quality measures detect whether or that a Pareto set approximation is
  better than another?

[Zitzler, Thiele, Laumanns, Fonseca, Grunert da Fonseca: 2003]
Comparison Methods and Dominance Relations

Compatibility of a comparison method C:
C yields true ⇒ A is (weakly, strictly) better than B
(C detects that A is (weakly, strictly) better than B)

Completeness of a comparison method C:
A is (weakly, strictly) better than B ⇒ C yields true

Ideal: compatibility and completeness, i.e.
A is (weakly, strictly) better than B ⇔ C yields true
(C detects whether A is (weakly, strictly) better than B)
Limitations of Unary Quality Measures

Theorem:
There exists no unary quality measure that is able to detect whether A is
better than B. This statement holds even if we consider a finite combination
of unary quality measures.

In other words: no (finite combination of) unary measures achieves this for
arbitrary problems.
Power of Unary Quality Indicators

[Diagram: relations between approximations — strictly dominates, dominates,
weakly dominates, doesn’t dominate, doesn’t weakly dominate]
Quality Measures: Results
Basic question: can we say, on the basis of the quality measures, whether or
that an algorithm outperforms another?

application of quality measures:

              S        T
hypervolume   432.34   420.13
distance      0.3308   0.4532
diversity     0.3637   0.3463
spread        0.3622   0.3601
cardinality   6        5

- There is no combination of unary quality measures such that “S is better
  than T in all measures” is equivalent to “S dominates T”.
- Unary quality measures usually cannot tell that S dominates T; at best
  that S does not dominate T.

[Zitzler et al.: 2002]
A New Measure: ε-Quality Measure

Two solutions:
E(a,b) = max over 1 ≤ i ≤ n of min{ ε | ε · fi(a) ≥ fi(b) }

Two approximations:
E(A,B) = max over b ∈ B of min over a ∈ A of E(a,b)

Examples (figure): E(a,b) = 2, E(b,a) = ½;  E(A,B) = 2, E(B,A) = ½

Advantage: allows all kinds of statements (complete and compatible)
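A Python sketch of this indicator under the same maximization reading. The concrete example points are assumptions chosen only to reproduce values like 2 and ½; the actual points in the slide's figure are not recoverable.

def eps_solution(fa, fb):
    # E(a, b): smallest eps with eps * fa[i] >= fb[i] for every objective i
    # (maximization, strictly positive objective values assumed)
    return max(b / a for a, b in zip(fa, fb))

def eps_indicator(A, B):
    # E(A, B) = max over b in B of (min over a in A of E(a, b))
    return max(min(eps_solution(fa, fb) for fa in A) for fb in B)

# illustrative values: (1, 1) needs factor 2 to cover (2, 2),
# while (2, 2) covers (1, 1) already with factor 1/2
assert eps_solution((1, 1), (2, 2)) == 2
assert eps_solution((2, 2), (1, 1)) == 0.5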
Selected Contributions
How to apply (evolutionary) optimization algorithms to large-scale
multiobjective optimization problems?

Algorithms:
° Improved techniques [Zitzler, Thiele: IEEE TEC 1999]
  [Zitzler, Teich, Bhattacharyya: CEC 2000]
  [Zitzler, Laumanns, Thiele: EUROGEN 2001]
° Unified model [Laumanns, Zitzler, Thiele: CEC 2000]
  [Laumanns, Zitzler, Thiele: EMO 2001]
° Test problems [Zitzler, Thiele: PPSN 1998, IEEE TEC 1999]
  [Deb, Thiele, Laumanns, Zitzler: GECCO 2002]

Theory:
° Convergence/diversity [Laumanns, Thiele, Deb, Zitzler: GECCO 2002]
  [Laumanns, Thiele, Zitzler, Welzl, Deb: PPSN-VII]
° Performance measures [Zitzler, Thiele: IEEE TEC 1999]
  [Zitzler, Deb, Thiele: EC Journal 2000]
  [Zitzler et al.: GECCO 2002]
