
On Measuring Multiobjective Evolutionary Algorithm Performance

David A. Van Veldhuizen and Gary B. Lamont

Dept. of Electrical and Computer Engineering, Graduate School of Engineering
Air Force Institute of Technology, Wright-Patterson AFB, OH 45433-7765
david.vanveldhuizen@brooks.af.mil, gary.lamont@afit.af.mil

Abstract- Solving optimization problems with multiple (often conflicting) objectives is generally a quite difficult goal. Evolutionary Algorithms (EAs) were initially extended and applied during the mid-eighties in an attempt to stochastically solve problems of this generic class. During the past decade a multiplicity of Multiobjective EA (MOEA) techniques have been proposed and applied to many scientific and engineering applications. Our discussion's intent is to rigorously define and execute a quantitative MOEA performance comparison methodology. Almost all comparisons cited in the current literature visually compare algorithmic results, resulting in only relative conclusions. Our methodology gives a basis for absolute conclusions regarding MOEA performance. Selected results from its execution with four MOEAs are presented and described.

1 Introduction

Multiobjective Evolutionary Algorithms (MOEAs) are now a well-established field within Evolutionary Computation, initially developed in the mid-eighties when the first MOEA specifically designed to solve Multiobjective Optimization Problems (MOPs) was implemented. Over 500 publications (Coello, 1999b) have since proposed various MOEA implementations and applications, and to a much lesser extent, underlying MOEA theory.

Solving MOPs with MOEAs presents several unique challenges (e.g., vector-valued fitness and sets of solutions). Space does not permit a detailed presentation of all material necessary to understand this dynamic and rapidly growing research field. For a good introduction to the relevant issues and past EA-based approaches, see Van Veldhuizen (Van Veldhuizen, 1999), Coello Coello (Coello, 1999a), and Fonseca and Fleming (Fonseca and Fleming, 1995). This paper instead focuses on rigorously defining and executing a quantitative MOEA performance comparison methodology.

The remainder of this paper is organized as follows. Section 2 presents relevant MOP concepts while Section 3 outlines a quantitative MOEA experimental methodology. Sections 4 and 5 discuss experimental results and observations, followed by our conclusions in Section 6.

2 MOP Background

MOPs (as a rule) present a possibly uncountable set of solutions, which when evaluated produce vectors whose components represent trade-offs in decision space. We have defined an MOP's global optimum to be the Pareto front determined by evaluating each member of the Pareto optimal solution set (Van Veldhuizen, 1999). Since Pareto concepts are crucial components of the MOP domain and MOEA implementations, their incarnations are now briefly described.

During MOEA execution, a "local" set of Pareto optimal solutions (with respect to the current MOEA generational population) is determined at each EA generation and termed P_current(t), where t represents the generation number. Many MOEA implementations also use a secondary population, storing all/some Pareto optimal solutions found through the generations (Van Veldhuizen, 1999). This secondary population is termed P_known(t), also annotated with t (representing completion of t generations) to reflect possible changes in its membership during MOEA execution. P_known(0) is defined as the empty set, and P_known alone denotes the final, overall set of Pareto optimal solutions returned by an MOEA. Of course, the true Pareto optimal solution set (termed P_true) is not explicitly known for MOPs of any difficulty. P_true is defined by the functions composing an MOP; it is fixed and does not change.

P_current(t), P_known, and P_true are sets of MOEA genotypes where each set's phenotypes form a Pareto front. We term the associated Pareto front for each of these solution sets PF_current(t), PF_known, and PF_true. Thus, when using an MOEA to solve MOPs, one implicitly assumes that one of the following conditions holds: PF_known ⊆ PF_true, or that over some norm (Euclidean, RMS, etc.), PF_known ∈ [PF_true, PF_true + ε].
3 MOEA Experiments



The major goal of these experiments is to compare well-engineered MOEAs in terms of effectiveness and efficiency in regards to carefully selected test problems from the same class. Jackson et al. (1991) imply this should suffice to show MOEA feasibility and promise. We also wish to determine how well the test problems and proposed metrics capture and report essential MOP/MOEA characteristics and performance, as well as analyzing any interesting observations resulting from experiment execution and result analysis.

We report relevant quantitative MOEA performance based on appropriate experiments. This is noteworthy as almost all comparisons cited in the current literature visually compare algorithmic results. As experimental numeric MOPs' P_true and PF_true are most often not known, these visually-derived results are then relative. The methodology described in the next section gives a basis for absolute conclusions regarding MOEA performance - a goal actively pursued by few other MOEA researchers (e.g., Zitzler and Thiele (1999) and Shaw et al. (1999)).

3.1 Experimental Methodology

Although test suite or benchmark functions do provide a common basis for MOEA comparisons, results are empirical unless the global optima are known. We note that finding a general MOP's Pareto optimal solution set is NP-complete (Van Veldhuizen, 1999). However, there is a way to approximate P_true for specific problem cases. Teaming this data with appropriate metrics then allows for the desired quantitative MOEA comparisons.

When the real- (continuous) world is modeled (e.g., via objective functions) on a computer (a discrete machine), there is a fidelity loss between the (possibly) continuous mathematical model and its discrete representation. Any formalized MOP being computationally solved suffers this fate. However, at a "standardized" computational resolution and genetic representation, MOEA results can be quantitatively compared not only against each other but against certain MOPs' PF_true. Thus, whether or not these selected MOPs' PF_true is actually continuous or discrete is not of experimental concern, as the representable P_true and PF_true are fixed based on certain assumptions.

For purposes of these experiments we define a computational grid by placing an equidistantly spaced grid over decision variable space, allowing a uniform sampling of possible solutions. Each grid intersection point (computable solution) is then assigned successive numbers using a binary representation. Given a fixed length binary string, decision variable values are then determined by mapping the binary (sub)string to an integer.

Even though a binary representation restricts a search space's size, it does allow for a "fair" and quantitative MOEA comparison, determination of an MOP's PF_true (at some computational resolution), and an enumeration method for deterministically searching a solution space. The underlying computational grid may be increased/decreased as desired, at least up to some point where the enumerative computation becomes impractical or intractable. This methodology is designed for experimentation and used to make judgments about proposed MOEAs and their implementations.
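As a concrete illustration of this grid-based encoding, the sketch below decodes a fixed-length binary substring into one decision variable value on an equidistantly spaced grid. The function name and the example bounds are our own illustrative choices; only the overall idea of a fixed-resolution grid is taken from the experiments.

```python
def decode(bits, lower, upper):
    """Map a binary (sub)string to one of 2**len(bits) equidistant grid points
    in the interval [lower, upper]."""
    k = int(bits, 2)                                  # grid index
    step = (upper - lower) / (2 ** len(bits) - 1)     # grid resolution
    return lower + k * step

# Example: a 12-bit substring sampling a decision variable on [-2.0, 2.0].
print(decode("000000000000", -2.0, 2.0))   # -2.0 (first grid point)
print(decode("111111111111", -2.0, 2.0))   #  2.0 (last grid point)
```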
3.2 MOEA Test Algorithms

Four MOEAs were selected for testing. These algorithms and their original raisons d'être are:

1. MOGA. Implemented by Fonseca and Fleming (1998). Initially used to explore incorporation of decision maker's goals and priorities in the MOP solution process. Employs an original Pareto ranking scheme and incorporates fitness sharing.

2. MOMGA. Implemented by Van Veldhuizen and Lamont (2000) (Van Veldhuizen, 1999). Initially used to explore the relationship between MOP solution Building Blocks (BBs) and their use in MOEA search. Incorporates fitness sharing and Horn et al.'s (1994) tournament selection.

3. NPGA. Implemented by Horn et al. (1994). Initially used to explore benefits of providing P_known as input to a decision analysis technique. Uses tournament selection based on Pareto optimality. Incorporates fitness sharing.

4. NSGA. Implemented by Srinivas and Deb (1994). Initially used to explore bias prevention towards certain regions of the Pareto front. Employs a unique Pareto ranking scheme and incorporates fitness sharing.

We note here that these algorithms were selected because they specifically incorporate what appear to be key theoretical problem/algorithm domain aspects such as Pareto ranking, niching, and fitness sharing. Other researchers appear to share these thoughts as the MOGA, NPGA, and NSGA (or variants thereof) are the literature's most cited and imitated (Van Veldhuizen, 1999).

The MOGA, NPGA, and NSGA are based on "traditional" GAs; the MOMGA is based on the messy GA (mGA) and can be considered non-standard. However, the conceptual evolutionary process modeled by each algorithm is the same and gives the basis for their direct comparison. Other algorithms were considered but not included in these experiments (for various reasons); those and other alternative MOEAs may be considered in later experiments.

Although the NFL theorems (Wolpert and Macready, 1997) show there is no overall "best" EA, certain EAs have been experimentally shown to be more likely effective than others for some real-world problems. However, attempts to experimentally examine MOEA effectiveness are few.
3.3 MOEA Key Parameters

Many EA experiments vary key algorithmic parameters in an attempt to determine the most effective and efficient implementation for a particular problem instantiation or class. A parameter analysis investigating effects of differing parameter values is beyond the scope of these experiments; we do not wish to "tune" MOEAs for good performance on some problem class. The experimental algorithms are then executed with default parameter values as reported in the literature, implementing each MOEA "out of the box" as it were. However, the term "default" is somewhat of a misnomer as no formal MOEA parameter (value) studies are known.

Existing empirical experimental results sometimes indicate mating restriction is necessary for good performance - at other times various MOEA implementations seem to operate well without it. These empirical results indicate the NFL theorems are alive and well (Wolpert and Macready, 1997). As incorporating mating restriction in some experimental software required major code modifications, and because of its uncertain usefulness in the MOP domain, mating restriction is not incorporated in any experimental MOEA.

The mGA's "cut and splice" EVOPs' effect (when both are used) is intended to be similar to recombination's (Goldberg et al., 1989). The MOMGA used mGA default parameters for these operators, namely p_cut = 0.2 (only one cut allowed per string) and p_splice = 1.0. There is not yet a "default" MOEA crossover rate, but past experiments have used crossover probabilities in the range p_c ∈ [0.7, 1.0] (Van Veldhuizen, 1999). Thus, the other experimental MOEAs used single-point crossover with p_c = 1.0. All but the MOMGA used a mutation rate of p_m = 1/l, where l is the number of binary digits. The MOMGA did not employ mutation (i.e., p_m = 0). As in the original mGA presentation (Goldberg et al., 1989), this results in the most stringent possible testing. As mutation is not available to provide diversity and "recreate" BBs, losing a BB from the population means it is gone forever.

We compare experimental MOEA results derived after an identical number of solution evaluations are performed, using that factor as a measure of common computational effort. However, the number of executed solution evaluations often differs between MOMGA runs (even those solving the same MOP) because of internal parameters dynamically governing its operation. Thus, in these experiments the MOMGA is set to execute for three eras and to contain 100 individuals in each juxtapositional population. All MOMGA runs are terminated before the total number of solution evaluations for a run exceeds 65,536 (2^16). The total fraction of explored search space is then bounded above by 2^16 / 2^24 = 0.39% (the computational grid used for all test MOPs results in 2^24 possible solutions). Historically, EAs execute at most tens of thousands of fitness evaluations, and this experimental limit is within that range. As it explores only a small fraction of the search space, an MOEA's effectiveness should be readily apparent in how well its results (P_known and PF_known) compare to P_true and PF_true (if known).

Thus, for all test MOPs, the MOMGA was executed first and the number of solutions evaluated per run determined. The other MOEAs (each with population size N = 50) were then set to execute almost the same number of evaluations (N multiplied by the number of generations), ensuring a very nearly equivalent computational effort for each tested MOEA. Again, note these experiments' purpose is to explore MOEA performance and not to determine ideal parameter settings for solving the test functions.

3.4 MOEA Experimental Metrics

What metrics might adequately measure an MOEA's results or allow meaningful comparisons of specific MOEA implementations? Appropriate metrics must be selected upon which to base MOEA performance claims, and as the literature offers few quantitative MOEA metrics, proposed metrics must be carefully defined to be useful. Additionally, no single metric can entirely capture total MOEA performance, as some measure algorithm effectiveness and others efficiency. Temporal effectiveness and efficiency may also be judged, e.g., measuring an MOEA's progress each generation.

The metrics identified in this section measure performance in the phenotype domain. Whereas many operations researchers attempt to generate P_true (and thus implicitly measure performance in genotype space) (Van Veldhuizen, 1999), MOEA researchers have mainly focused on generating PF_true (and thus measure performance in phenotype space). As there is a direct correspondence between solutions in P_true and vectors in PF_true, one method may not be "better" than another. However, note that multiple solutions may map to the same vector. Also, we here discuss only primary metrics used in our experiments - a more complete discussion is found elsewhere (Van Veldhuizen, 1999).

3.4.1 Final Generational Distance

This metric is a value representing how "far" PF_known is from PF_true and is defined as:

    G = ( Σ_{i=1}^{n} d_i^p )^(1/p) / n,                                    (1)

where n is the number of vectors in PF_known, p = 2, and d_i is the Euclidean distance (in objective space) between each vector and the nearest member of PF_true. A result of 0 indicates PF_true = PF_known; any other value indicates PF_known deviates from PF_true.
Figure 1: PF_known / PF_true example (plot not reproduced; labeled points include (1.5, 10), (2.5, 9), and (2, 8)).

The example in Figure 1 has d_1 = sqrt((2.5 - 2)^2 + (9 - 8)^2), d_2 = sqrt((3 - 3)^2 + (6 - 6)^2), d_3 = sqrt((5 - 4)^2 + (4 - 4)^2), and G = sqrt(1.118^2 + 0^2 + 1^2) / 3 = 0.5.
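A minimal sketch of this computation follows, checked against the Figure 1 numbers quoted above; the function name and the explicit point lists are our reconstruction of the example, not the experiments' code.

```python
import math

def generational_distance(pf_known, pf_true, p=2):
    """Final generational distance G between PF_known and PF_true (Eq. 1)."""
    dists = [min(math.dist(v, u) for u in pf_true) for v in pf_known]
    return sum(d ** p for d in dists) ** (1.0 / p) / len(pf_known)

# Figure 1 example as reconstructed from the text; expected result G = 0.5.
pf_known = [(2.5, 9.0), (3.0, 6.0), (5.0, 4.0)]
pf_true = [(1.5, 10.0), (2.0, 8.0), (3.0, 6.0), (4.0, 4.0)]
print(generational_distance(pf_known, pf_true))
```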
3.4.2 Spacing

This metric is a value measuring the spread (distribution) of vectors throughout PF_known. Because PF_known's "beginning" and "end" are known, a suitably defined metric judges how well PF_known is distributed. Schott (Schott, 1995) proposes such a metric measuring the range (distance) variance of neighboring vectors in PF_known. Called spacing, he defines this metric as:

    S = sqrt( (1 / (n - 1)) Σ_{i=1}^{n} (d̄ - d_i)^2 ),                      (2)

where d_i = min_j ( |f_1^i(x) - f_1^j(x)| + |f_2^i(x) - f_2^j(x)| ), i, j = 1, ..., n, d̄ is the mean of all d_i, and n is the number of vectors in PF_known. A value of zero for this metric indicates all members of PF_known are equidistantly spaced. However, note that vectors composing PF_true are not necessarily uniformly spaced. The example in Figure 1 has d_1 = 3.5, d_2 = 4, d̄ = 3.75, and an S = 0.25. Srinivas and Deb (Srinivas and Deb, 1994) also define a similar measure expressing how well an MOEA has distributed Pareto optimal solutions over the Pareto optimal set (genotype space).
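A corresponding sketch of the spacing computation is given below, under the same bi-objective assumption as before; the function name is ours and this is an illustrative rendering of Schott's formula rather than the experiments' code.

```python
import math

def spacing(pf_known):
    """Schott's spacing metric S over a list of bi-objective vectors (Eq. 2)."""
    n = len(pf_known)
    d = [min(abs(v[0] - u[0]) + abs(v[1] - u[1])
             for j, u in enumerate(pf_known) if j != i)
         for i, v in enumerate(pf_known)]
    d_bar = sum(d) / n
    return math.sqrt(sum((d_bar - di) ** 2 for di in d) / (n - 1))
```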
3.4.3 Overall Nondominated Vector Generation and Ratio

The tested MOEAs add PF_current(t) to PF_known(t - 1) each generation, most probably resulting in different cardinalities for PF_known(t) as t changes. This metric then measures the total number of nondominated vectors found during MOEA execution and is defined as:

    ONVG ≜ |PF_known|.                                                      (3)

Schott (Schott, 1995) uses a like metric (although defined over the Pareto optimal set, i.e., |P_known|). Genotypically or phenotypically defining this metric is probably a matter of preference, but we again note multiple solutions may map to the same vector, i.e., |P_known| ≥ |PF_known|. Although counting the number of nondominated solutions gives some feeling for how effective the MOEA is in generating desired solutions, it does not reflect on how "far" from PF_true the vectors in PF_known are. Additionally, too few vectors and PF_known's representation may be poor; too many vectors may overwhelm the decision maker. The example in Figure 1 has an ONVG = 3.

3.5 MOEA Computational Environment

All MOEAs are executed on the same computational platform for consistency. The host is a Sun Ultra 60 workstation with dual 300 MHz processors and 512 MB RAM, running Solaris 2.5.1. Many other computational platforms would suffice but this high-end host offers exclusive access. The MOMGA and NPGA, extensions of existing algorithms and specific software, are written in "C" and compiled using the Sun Workshop Compiler version C 4.2. Much of our associated research and related experimentation employs a specialized MATLAB EA-based toolbox; the MOGA and NSGA are written as self-contained "m-files" leveraging pre-defined toolbox routines. These MOEAs are also executed on the Sun platform described previously but within the MATLAB 5.2 environment. Timing results are not of specific experimental concern here; however, for all experimental MOPs, each MOEA run executes in a matter of minutes.

3.6 MOEA Experimental MOPs and Approach

Several MOPs are elsewhere substantiated, formulated, and proposed for use in an MOEA test function suite (Van Veldhuizen, 1999; Deb, 1999). For these experiments' test functions we select the following MOPs from the suite: MOP1, MOP2, MOP3, MOP4, and MOP6. These are all bi-objective MOPs exhibiting key characteristics of the MOP domain; they are described in great detail elsewhere (Van Veldhuizen, 1999).

The purpose of these experiments is to compare well-engineered algorithms in terms of effectiveness as regards carefully selected test problems. We wish to determine selected MOEA performance over the MOP domain class, to evaluate the usefulness of proposed test functions and metrics, and to record other germane observations arising during experiment execution and result analysis. Thus, the metrics described in Section 3.4 are selected for use because they initially appear to be the most appropriate indicators of MOEA performance and thus provide a validated basis for MOEA comparison.
4 MOEA Experimental Analyses

All MOP test functions used in these experiments are formulated as stated elsewhere (Van Veldhuizen, 1999, pp. 5-13 - 5-14); other experimental and algorithmic parameters are as discussed in Sections 3.1 to 3.3. For comparative purposes the four experimental MOEAs were each executed ten times for each MOP, providing a statistical sample with which to derive metric values.

4.1 Experimental Metrics and MOPs

Although the experimental metrics and MOPs were previously theoretically validated, these experiments highlight practical implementation difficulties. We discuss elsewhere that some MOEA metrics appear not as valuable as others (Van Veldhuizen, 1999); currently we consider spacing, generational distance, and ONVG as the most meaningful metrics for analysis. Although generational distance is dependent upon objective space size (and may vary widely between MOPs), spacing and ONVG values may not be as much so.

Taken overall, the selected test suite MOPs appear useful in practice as well as in theory. We do make recommendations to modify some problem formulations, however (Van Veldhuizen, 1999).

4.2 Experimental Statistical Analyses

Preliminary results imply each algorithm's observations are not normally distributed, and that the variance is noticeably different for different MOEAs (Van Veldhuizen, 1999). This data may not then satisfy necessary assumptions for parametric mean comparisons and we thus consider non-parametric statistical techniques for analyzing these experimental results.

The Kruskal-Wallis H-Test requires no assumptions about the probability distributions being compared (McClave et al., 1998). However, other assumptions must be satisfied in order to apply this test: that five or more measurements are in each sample; that the samples are random and independent; and that the probability distributions from which the samples are drawn are continuous. Our initial results meet these criteria so we test the following hypotheses:

    H0: The probability distributions of MOGA, MOMGA, NPGA, and NSGA results applied to MOPX are identical.

    Ha: At least two of the experimental MOEAs' result distributions differ.

Kruskal-Wallis H-Test results are termed p-values, also called observed significance levels. We reject the null hypothesis whenever p ≤ α. Using a significance level α = 0.1, we in all cases see there is enough evidence to support the alternative hypothesis and conclude that for each MOP and recorded metric, at least two MOEAs' results' distributions differ (Van Veldhuizen, 1999).

This result allows use of the Wilcoxon rank sum test in comparing the results of MOEA "pairs," attempting now to determine which of a given two MOEAs does "better." This test assumes the sample of differences is randomly selected and that the probability distribution from which the sample of paired differences is drawn is continuous (McClave et al., 1998). Our initial results meet these criteria so we test the following hypotheses:

    H0: The probability distributions of MOEA1 and MOEA2 results applied to MOPX are identical.

    Ha: The results' distributions differ for the two MOEAs.

There are six possible MOEA pairings: MOGA and MOMGA, MOGA and NPGA, MOGA and NSGA, MOMGA and NPGA, MOMGA and NSGA, and NPGA and NSGA. If c Wilcoxon rank sum tests are performed with an overall level of significance α, the Bonferroni technique (Neter et al., 1990) allows us to conduct each individual test at a level of significance α* = α / c (McClave et al., 1998). For these tests we select an overall significance level α = 0.2 due to the fact we have only 10 data points per MOEA. Thus, α* = 0.2/6 = 0.03333.

Wilcoxon rank sum test results are also termed p-values, or observed significance levels. We reject the null hypothesis whenever p ≤ α*. These tests show the majority of pairwise MOEA comparisons provide enough evidence to support the alternative hypothesis, and we conclude in those cases that there is a significant statistical difference between the MOEAs (Van Veldhuizen, 1999). We are then able to make conclusions about each MOEA's results as regards each of the three metrics. For each metric, a figure presenting mean metric performance (μ) is plotted for each MOP by algorithm, with error bars 2σ in length (μ + σ, μ - σ, with σ the standard deviation).

4.2.1 Generational Distance Statistical Analysis

Figure 2 presents MOEA performance as regards generational distance. The MOP1 values for both the NPGA and NSGA were large enough to skew the results when viewed in this format - the graph truncates those two bars (their respective results are G ≈ 66 and G ≈ 11). The pairwise Wilcoxon rank sum tests allow us to state that in general, when considering generational distance the MOGA, MOMGA, and NPGA gave better results than the NSGA over the test suite problems. This is certainly true for MOP4; the MOGA and MOMGA perform much better than the other two MOEAs on MOP6.

Figure 2: Overall Generational Distance Performance (plot not reproduced).
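The overall testing procedure of Section 4.2 can be sketched as below, using SciPy's Kruskal-Wallis and Wilcoxon rank sum implementations with a Bonferroni-corrected per-test level; the dummy data and variable names are ours, and the actual analyses were performed elsewhere (Van Veldhuizen, 1999).

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Dummy stand-ins for the ten recorded metric values per MOEA on one MOP
# (illustration only; these are not the experimental data).
rng = np.random.default_rng(0)
results = {name: rng.random(10) for name in ("MOGA", "MOMGA", "NPGA", "NSGA")}

# Kruskal-Wallis H-test across all four MOEAs (overall significance level 0.1).
h_stat, p = stats.kruskal(*results.values())
if p <= 0.1:
    # Pairwise Wilcoxon rank sum tests at the Bonferroni-corrected level
    # alpha* = 0.2 / 6 for the six MOEA pairings.
    pairs = list(combinations(results, 2))
    alpha_star = 0.2 / len(pairs)
    for a, b in pairs:
        _, p_pair = stats.ranksums(results[a], results[b])
        print(a, b, p_pair, "differ" if p_pair <= alpha_star else "no evidence")
```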
4.2.2 Spacing Statistical Analysis

Figure 3 presents overall MOEA performance as regards spacing. This graph does not report MOP1 results as they are (possibly) somewhat misleading (Van Veldhuizen, 1999). The pairwise Wilcoxon rank sum tests allow us to state that in general, when considering spacing the NSGA gave worse results over the test suite problems. This is certainly true for MOP2 and MOP4; the MOGA and MOMGA perform much better than the other two MOEAs on MOP6.

Figure 3: Overall Spacing Performance (plot not reproduced).

4.2.3 ONVG Statistical Analysis

Figure 4 presents overall MOEA performance as regards ONVG. The pairwise Wilcoxon rank sum tests allow us to state that in general, when considering ONVG the NSGA again gave worse results over the test suite problems. Disregarding MOP1 allows us to state that these three algorithms always outperformed the NSGA. This is a surprising result. Although sometimes performing worse (when considering spacing and generational distance), the NSGA generally returned values close to the other algorithms. In this case the NSGA consistently returns fewer (often less than half) the number of nondominated vectors than the other algorithms. We highlight this result as we are attempting to provide a decision maker with a number of choices represented by the nondominated vectors composing PF_known.

Figure 4: Overall ONVG Performance (plot not reproduced).

5 MOEA Experimental Observations

A number of related experiments are executed in support of those analyzed in this paper. Selected results and observations gleaned through the experimental process are reported in this section.

5.1 Experimental Timing Analysis

As previously indicated, the MOGA and NSGA are implemented within a MATLAB toolbox. Part of the toolbox's default output includes CPU and wall-clock execution time; these times are shown in Figure 5, which reports mean CPU timing results (μ) for each MOP tested with error bars as above.

Figure 5: MOEA Timing (plot not reproduced).

It is seen that the MOGA's mean values are generally less than those of the NSGA, although MOP3 is an exception. A straightforward reason exists for this. All MOEAs employed a secondary population. Whereas the MOMGA and NPGA append PF_current(t) to a file containing PF_known(t - 1) each generation, the MOGA and NSGA update PF_known(t) as part of their generational loops. As determination of Pareto optimality is O(n^2), and as the MOGA always returns more nondominated vectors than the NSGA (see Figure 4), its actual execution times are most likely actually much lower (as are the NSGA's).
However, even if the MOGA's and NSGA's logic is changed to reflect that of the MOMGA and NPGA (or vice versa), we believe the MOMGA and NPGA are still the faster running MOEAs because they are machine executables.
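The two secondary-population strategies contrasted in this subsection can be sketched as follows; the function names are ours, and the in-loop variant shows where the O(n^2) dominance checking discussed above is incurred each generation.

```python
def archive_to_file(run_log, pf_current):
    """MOMGA/NPGA style: append PF_current(t) to a log each generation and
    defer the full Pareto filtering until after the run."""
    run_log.extend(pf_current)            # cheap per-generation bookkeeping
    return run_log

def archive_in_loop(pf_known, pf_current, dominates):
    """MOGA/NSGA style: rebuild PF_known(t) inside the generational loop;
    the pairwise dominance check over the merged set is O(n^2) per generation."""
    merged = pf_known + pf_current
    return [v for v in merged
            if not any(dominates(u, v) for u in merged if u is not v)]
```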

5.2 Additional Experimental Metrics

As discussed in Section 3.4, some experimental metrics may also be used to track MOEA generational performance. We provide examples here for further insight. These results are from an arbitrary MOP7 experimental run (MOP7 is defined elsewhere (Van Veldhuizen, 1999)); note that the NPGA is not used in solving this MOP.

Generational Nondominated Vector Generation (GNVG) is a metric tracking the number of nondominated vectors produced each MOEA generation (Van Veldhuizen, 1999). Figure 6 presents GNVG results for MOP7. As the MOMGA records PF_current(t) each juxtapositional generation, we see values for 13 generations each in eras 1 and 2, and 14 in era 3. Note that the number of nondominated vectors is higher near the algorithm's end and that it may produce two to three times the number of vectors as the MOGA and NSGA. This is most likely due to the MOMGA's larger juxtapositional population size (100 individuals vs. 50). The latter MOEAs record results every generation. Their results are plotted using identical scales; only three NSGA generations reported a GNVG above 8 (two with GNVG = 9 and one where GNVG = 11).

Figure 6: GNVG (plots not reproduced).

Only one citation in the known literature (Van Veldhuizen, 1999) reports any quantitative results concerning the number of "waves" or fronts produced (per generation) by implementing Goldberg's Pareto ranking scheme. We present a similar graph (Figure 7) showing the number of waves may vary significantly between generations. As this computation may be significant, it is most likely a contributor to why the NSGA is the slowest MOEA tested (see Section 5.1).

Figure 7: NSGA "Waves" (plot not reproduced).
Figure 8 shows Nondominated Vector Addition (NVA) values recorded during a MOGA run. This metric is defined as the change in cardinality between PF_known(t) and PF_known(t + 1) (Van Veldhuizen, 1999). This metric's value may remain steady for several generations and sometimes even shows a net loss of solutions from PF_known.

Figure 8: MOGA NVA (plot not reproduced).
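Both generational tracking metrics can be read directly off the per-generation archives, as sketched below; the function names are ours and this is an illustrative reading of the definitions, not the experiments' instrumentation.

```python
def gnvg(pf_current_by_gen):
    """GNVG: number of nondominated vectors produced at each generation."""
    return [len(pf) for pf in pf_current_by_gen]

def nva(pf_known_by_gen):
    """NVA: change in |PF_known| between successive generations; the value can
    stay flat or go negative when vectors are displaced from the archive."""
    sizes = [len(pf) for pf in pf_known_by_gen]
    return [b - a for a, b in zip(sizes, sizes[1:])]
```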
The final generational distance metric (see Section 3.4.1) indicates an MOEA's convergence to PF_true, but it may also be used to measure each generational population's convergence. In this case, one expects G to generally decrease during MOEA execution if convergence is occurring, although the contextual nature of determining Pareto dominance means metric values may sometimes increase.

6 Conclusion

This paper proposes and implements a quantitative MOEA performance comparison methodology, presenting experimental results and analyses. Using three practical MOP comparative metrics (generational distance, overall nondominated vector generation, and spacing), nonparametric statistical analyses then show NSGA performance to be statistically worse than the other tested MOEAs (over the test problems). It concludes with selected results and observations of related experiments.
References

Coello, C. A. C. (1999a). A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques. Knowledge and Information Systems, 1(3):269-308.

Coello, C. A. C. (1999b). List of References on Evolutionary Multiobjective Optimization. Online. Available: http://www.lania.mx/EMOO.

Deb, K. (1999). Multi-Objective Genetic Algorithms: Problem Difficulties and Construction of Test Problems. Evolutionary Computation, 7(3):205-230.

Fonseca, C. M. and Fleming, P. J. (1995). An Overview of Evolutionary Algorithms in Multiobjective Optimization. Evolutionary Computation, 3(1):1-16.

Fonseca, C. M. and Fleming, P. J. (1998). Multiobjective Optimization and Multiple Constraint Handling with Evolutionary Algorithms - Part I: A Unified Formulation. IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans, 28(1):26-37.

Goldberg, D. E., Korb, B., and Deb, K. (1989). Messy Genetic Algorithms: Motivation, Analysis, and First Results. Complex Systems, 3:493-530.

Horn, J., Nafpliotis, N., and Goldberg, D. E. (1994). A Niched Pareto Genetic Algorithm for Multiobjective Optimization. In Michalewicz, Z., editor, Proceedings of the First IEEE Conference on Evolutionary Computation, pages 82-87, Piscataway, NJ. IEEE Service Center.

Jackson, R. H. F., Boggs, P. T., Nash, S. G., and Powell, S. (1991). Guidelines for Reporting Results of Computational Experiments - Report of the Ad Hoc Committee. Mathematical Programming, 49:413-425.

McClave, J. T., Benson, P. G., and Sincich, T. (1998). Statistics for Business and Economics. Prentice Hall, Upper Saddle River, NJ, 7th edition.

Neter, J., Wasserman, W., and Kutner, M. H. (1990). Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs. Irwin, Boston, MA, 3rd edition.

Schott, J. R. (1995). Fault Tolerant Design Using Single and Multicriteria Genetic Algorithm Optimization. Master's thesis, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts.

Shaw, K. J., Nortcliffe, A. L., Thompson, M., Love, J., Fonseca, C. M., and Fleming, P. J. (1999). Assessing the Performance of Multiobjective Genetic Algorithms for Optimization of a Batch Process Scheduling Problem. In 1999 Congress on Evolutionary Computation, pages 37-45, Washington, D.C. IEEE Service Center.

Srinivas, N. and Deb, K. (1994). Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation, 2(3):221-248.

Van Veldhuizen, D. A. (1999). Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations. PhD thesis, AFIT/DS/ENG/99-01, Air Force Institute of Technology, Wright-Patterson AFB.

Van Veldhuizen, D. A. and Lamont, G. B. (2000). Multiobjective Optimization with Messy Genetic Algorithm. In Proceedings of the 2000 Symposium on Applied Computing, pages 470-476. ACM.

Wolpert, D. H. and Macready, W. G. (1997). No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation, 1(1):67-82.

Zitzler, E. and Thiele, L. (1999). Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271.
